
Blog Posts

Sharing my views, opinions and learnings on AI Engineering, RAG Systems, Venture Capital, and more.

Categories

  • AI Engineering & LLMs
  • RAG Systems
  • Venture Capital
  • Technical Notes
  • Career & Personal Growth

Subscribe

You can subscribe to my blog updates through:

  • RSS Feed
  • Twitter
  • LinkedIn

Introduction to LLM Evaluations

Large Language Model (LLM) evaluations are the processes and metrics used to measure how well an LLM performs on a given task or meets certain quality criteria. Evaluating an LLM means defining what "good" output looks like—whether that's accuracy, relevance, or safety—and then checking the model's outputs against those expectations. Robust evaluation is critical because it's the only way to know if your model is working as intended and to continuously improve it. In my experience, many AI products that have failed share one common root cause: they never built a reliable evaluation system. Conversely, teams that evaluate early and often can iterate faster and catch problems before users do.

The Virtuous Cycle of LLM Development

At the heart of LLM development is a virtuous cycle where continuous evaluation and curation enable fast iteration. By testing outputs, identifying weaknesses, and rapidly iterating on improvements, you turn a decent prototype into a trustworthy, high-performing system.

Why LLM Evaluations Matter

LLMs can behave unpredictably or degrade as you tweak prompts or scale usage. Relying on initial "vibe-checks" is often misleading. Systematic evaluations provide ground truth signals about quality and ensure you're not just shipping a cool demo, but a reliable product. Evaluation results not only help in debugging but also guide improvements, much like software tests catch bugs. In short, a solid evaluation process creates a feedback loop: test, identify flaws, fix, and test again.

Quantitative vs. Qualitative Evaluations

Broadly speaking, there are two main approaches to evaluating LLMs:

  • Quantitative Evaluations: These are automated and numeric. For example, for a classification task you might compute accuracy or F1 scores; for translation, metrics like BLEU; and for summarization, metrics like ROUGE. More recently, some teams have started using LLM-based evaluators that assign numeric scores or make pairwise comparisons. Quantitative methods are fast and scalable, but they work best only when the metric truly correlates with real quality.

  • Qualitative Evaluations: These rely on human judgment. Domain experts or end users assess the outputs for aspects such as correctness, clarity, and overall usefulness. While human evaluations are considered the gold standard for subjective tasks, they are slow, expensive, and do not scale as easily.

In practice, a combination of both methods works best. You might use automated tests to filter out the obvious issues and then apply human review on a smaller subset for a more nuanced assessment.
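
As a rough illustration, here is a minimal Python sketch of that combination, assuming a hypothetical llm_generate() helper and a tiny labelled dataset: an automated exact-match accuracy check over everything, plus a small random sample of failures routed to human review.

import random

def llm_generate(prompt):
    # Placeholder for your model call (API client, local model, etc.).
    return "positive"

dataset = [
    {"prompt": "Classify the sentiment: 'I love this product'", "expected": "positive"},
    {"prompt": "Classify the sentiment: 'Terrible experience'", "expected": "negative"},
]

results = []
for example in dataset:
    output = llm_generate(example["prompt"]).strip().lower()
    results.append({**example, "output": output, "correct": output == example["expected"]})

# Quantitative signal: exact-match accuracy over the whole set.
accuracy = sum(r["correct"] for r in results) / len(results)
print(f"Accuracy: {accuracy:.2%}")

# Qualitative signal: hand a small sample of failures to a human reviewer.
failures = [r for r in results if not r["correct"]]
for r in random.sample(failures, min(5, len(failures))):
    print("Needs human review:", r["prompt"], "->", r["output"])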

Using LLM Evaluations in Production

Evaluating LLMs isn't a one-time research exercise—it's an ongoing process, especially when models are deployed in production. Setting up an evaluation pipeline early saves you from nasty surprises after deployment. Here's a common layered approach:

Level 1: Unit Tests

Unit tests are assertion-based checks on model outputs, much like traditional software unit tests. You define test inputs and expected outputs or properties, and then automatically verify that the model meets those expectations. For example, if you're building a chatbot, you might assert that when a user asks for pricing info, the response contains a dollar amount. These tests run quickly and cheaply and can catch regressions immediately.

Pseudo-code Example:
# Pseudo-code: simple unit test for a summarization prompt
# (assumes `llm` is your model client exposing a summarize() helper)
input_text = "Long article about climate science..."
summary = llm.summarize(input_text)
# Expect the summary to mention key entities from the article
assert "climate" in summary and "carbon" in summary, "Missing key info in summary"

Level 2: Model-Driven and Human Evaluations

This level involves deeper evaluation, conducted periodically (for example, nightly or with each model update). It may include logging model outputs and having either humans or an LLM-based judge score them. Comparing these scores helps determine the reliability of automated evaluations. While more insightful, these evaluations require additional time and resources.
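
As a sketch of what this can look like in practice, assuming a hypothetical judge_llm() call and a logged_outputs.jsonl file of logged interactions (some of which carry human scores):

import json, statistics

def judge_llm(question, answer):
    # Hypothetical judge: prompt a strong model to rate the answer 1-5
    # and parse the numeric score out of its reply.
    return 4  # placeholder

# Load last night's logged interactions (one JSON object per line).
with open("logged_outputs.jsonl") as f:
    records = [json.loads(line) for line in f]

scores = [judge_llm(r["question"], r["answer"]) for r in records]
print("Mean judge score:", statistics.mean(scores))

# Spot-check the judge itself against whatever human labels exist.
labelled = [r for r in records if "human_score" in r]
disagreements = [r for r in labelled
                 if abs(judge_llm(r["question"], r["answer"]) - r["human_score"]) >= 2]
print(f"{len(disagreements)} large judge/human disagreements to review")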

Level 3: A/B Tests and User Metrics

Ultimately, the best judge is your end-user. In production, monitoring how model changes impact user behavior and key performance metrics is essential. A/B tests—deploying a new model version to a fraction of users—can reveal differences in engagement, task success, error rates, and overall satisfaction. Although this approach is the gold standard for measuring business impact, it is also the most resource-intensive and requires thorough vetting with Level 1 and Level 2 evaluations beforehand.
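
A minimal sketch of the plumbing, assuming a hypothetical user ID scheme and a 10% rollout: deterministic bucketing keeps each user in the same arm, and the arm gets logged next to the user metrics you already track.

import hashlib

def assign_variant(user_id, experiment="model-v2-rollout", treatment_share=0.10):
    # Deterministic bucketing: the same user always lands in the same arm.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "new_model" if bucket < treatment_share * 100 else "current_model"

# At request time, route to the model for this user's arm and log the arm
# alongside task success, error and satisfaction events for later comparison.
print(assign_variant("user_123"))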

Handling Real-World Constraints

When deploying LLMs, it's important to balance evaluation rigor with practical constraints:

  • Latency: For applications that need real-time responses (such as live chatbots), inline evaluations must be lightweight. Complex checks might be better suited for offline analysis or asynchronous processing.
  • Cost: Running evaluations, especially with large models, can become expensive. Mitigate this by sampling a subset of traffic for deep evaluation or by caching results (see the sketch after this list).
  • Feedback Loops: Production systems generate a continuous stream of real user data. Logging inputs, outputs, and user interactions—such as clicks or retries—provides invaluable data that can be fed back into the evaluation pipeline for ongoing improvement.
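
Here's a minimal sketch of that pattern, assuming a hypothetical llm_call function and an offline run_deep_evaluation step: only a sampled fraction of traffic is queued for heavier evaluation, and the work happens off the request path.

import random, queue, threading

eval_queue = queue.Queue()

def run_deep_evaluation(record):
    # Placeholder for heavier checks, an LLM judge, or human-review routing.
    print("Deep-evaluating:", record["input"][:50])

def eval_worker():
    # Runs outside the request path, so slow checks don't add user-facing latency.
    while True:
        record = eval_queue.get()
        run_deep_evaluation(record)
        eval_queue.task_done()

threading.Thread(target=eval_worker, daemon=True).start()

def handle_request(user_input, llm_call, sample_rate=0.05):
    response = llm_call(user_input)
    # Keep the request path fast: only enqueue a small sample for deep evaluation.
    if random.random() < sample_rate:
        eval_queue.put({"input": user_input, "output": response})
    return response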

Pitfalls and Challenges in LLM Evaluations

Evaluating LLMs is a nuanced art, and there are several common pitfalls to avoid:

Metric Obsession Without Purpose

Collecting a plethora of metrics can result in a data overload that provides little actionable insight. It's important to focus on a few key criteria that directly tie to user needs or system goals. Vague or uncalibrated scores often end up being "nice numbers" that are hard to interpret or act upon.

Ignoring Domain Expertise

Designing evaluation criteria without input from domain experts can lead to missing what really matters for your specific application. Whether you're dealing with legal documents, medical advice, or any other specialized field, the evaluation should reflect what actual users care about.

No Systematic Evaluation

Relying solely on ad-hoc "vibe checks" can leave your system vulnerable to unexpected failures. Establishing a systematic evaluation suite—even if it's just a dozen representative test cases—is essential before shipping a product.

Overfitting to Benchmarks

Focusing too narrowly on a single benchmark can lead to models that excel in that narrow area but fail in real-world applications. The goal should be balanced performance across all relevant aspects rather than chasing a single metric.

Trusting Automated Judges Blindly

Automated evaluators, including LLM-based judges, can have their own biases. It's crucial to regularly calibrate these evaluations against human feedback to ensure that the scores reflect true quality.

Lack of Statistical Rigor

LLM outputs can be highly variable. Testing on a small sample may lead to conclusions that are simply the result of chance. Always ensure that your experiments are statistically significant, using larger sample sizes and appropriate statistical tests.
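
For example, a simple bootstrap over pass/fail results (illustrative numbers only) shows whether an apparent improvement between two prompt variants is distinguishable from noise:

import random

# Pass/fail results (1 = passed the eval) for two prompt variants.
variant_a = [1] * 78 + [0] * 22   # 78% pass rate over 100 runs
variant_b = [1] * 84 + [0] * 16   # 84% pass rate over 100 runs

def bootstrap_diff(a, b, iterations=10_000):
    diffs = []
    for _ in range(iterations):
        resample_a = [random.choice(a) for _ in a]
        resample_b = [random.choice(b) for _ in b]
        diffs.append(sum(resample_b) / len(b) - sum(resample_a) / len(a))
    diffs.sort()
    # 95% confidence interval on the difference in pass rates.
    return diffs[int(0.025 * iterations)], diffs[int(0.975 * iterations)]

low, high = bootstrap_diff(variant_a, variant_b)
print(f"95% CI for (B - A) pass-rate difference: [{low:.3f}, {high:.3f}]")
# If the interval contains 0, the apparent improvement may just be noise.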

Rule-Based Workflows for LLM Evaluations

One powerful approach to LLM evaluation is the use of rule-based workflows. Instead of relying solely on learned metrics or subjective human judgment, rule-based evaluations use explicit rules or tests to verify that outputs meet specific criteria. For example, if an LLM generates SQL queries, you can actually execute the query to see if it runs without errors. For summarization, you might enforce that all proper nouns from the source appear in the summary.
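
The SQL case is easy to make concrete. A minimal sketch using Python's built-in sqlite3 (the schema and queries here are just illustrative): run the generated query against a throwaway in-memory database built from the expected schema, and treat any exception as a failed check.

import sqlite3

def sql_runs_without_errors(generated_sql, schema_sql):
    # Execute the generated query against an in-memory database built from
    # the expected schema; any exception fails the check.
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        conn.execute(generated_sql)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

schema = "CREATE TABLE orders (id INTEGER, amount REAL, created_at TEXT);"
print(sql_runs_without_errors("SELECT SUM(amount) FROM orders;", schema))   # True
print(sql_runs_without_errors("SELECT amount FROM order;", schema))         # False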

Why Rule-Based Evaluations?

  • Speed and Determinism: They're fast, deterministic, and highly interpretable.
  • Cost-Effectiveness: Simple string or structural checks can run in milliseconds, making them suitable for real-time applications.
  • Clear Guardrails: They ensure that critical requirements are met, while more nuanced qualities can be assessed with model-based evaluations or human review.

Hybrid Evaluation Workflows

The best evaluation systems combine multiple methods to capture both objective criteria and subjective quality. A typical hybrid workflow might include:

  1. Rule-Based Checks: Filter out outputs that fail obvious requirements (e.g., format, required keywords, disallowed content).
  2. LLM-Based or Automated Scoring: Assess more subjective qualities such as coherence, relevance, or helpfulness.
  3. Human Review: Provide a final layer of quality control by reviewing a sample of outputs, especially those flagged by automated systems.

For instance, in a chatbot application, rules can enforce politeness and required phrases, an LLM evaluator can score the overall response quality, and human reviewers can examine borderline cases to ensure the system truly aligns with user needs.
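
A compact sketch of that flow, with hypothetical rule checks and a stubbed LLM judge standing in for a real scoring call:

def rule_checks(response):
    # Hard guardrails: cheap, deterministic, run on every response.
    issues = []
    if len(response) > 2000:
        issues.append("too long")
    if "as an ai language model" in response.lower():
        issues.append("boilerplate phrase")
    return issues

def llm_quality_score(question, response):
    # Hypothetical LLM judge returning a 1-5 quality score.
    return 4

def evaluate(question, response):
    issues = rule_checks(response)
    if issues:
        return {"verdict": "fail", "issues": issues}
    score = llm_quality_score(question, response)
    # Borderline scores get routed to a human reviewer instead of auto-passing.
    if score <= 3:
        return {"verdict": "needs_human_review", "score": score}
    return {"verdict": "pass", "score": score}

print(evaluate("What is your refund policy?", "Refunds are processed within 5 business days."))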

Case Studies and Industry Learnings

Over the years, I've learned several valuable lessons from both successes and failures in LLM evaluations:

  • Early and Frequent Evaluation: Building a domain-specific evaluation system from the start is crucial. A layered approach—with unit tests, periodic deep evaluations, and A/B testing—enables rapid iteration and early detection of issues.
  • Continuous Feedback Loops: Leveraging real-world data to continuously refine evaluation criteria is essential. A system that constantly logs and learns from each interaction can drive ongoing improvements.
  • Balanced Metrics: Avoid over-optimizing for a single benchmark. Ensure that improvements in one area do not lead to regressions in others.
  • Calibrated Automation: Regularly compare automated evaluation results with human judgment. This ensures that your evaluation methods remain aligned with what users actually value.

Conclusion

LLM evaluations are not just an academic exercise—they are a fundamental component of building reliable, user-friendly AI systems. By defining clear success criteria, using a blend of quantitative and qualitative methods, and integrating evaluations into your production workflow, you can catch issues early and continuously improve your models. Though evaluations may not be the most glamorous part of AI development, they are the unsung hero that transforms an impressive demo into a trustworthy, production-ready product.

Investing in a robust evaluation system unlocks superpowers for any AI team, enabling rapid fine-tuning, effective debugging, and, ultimately, the delivery of AI products that users can trust.

Evaluation Driven Development

As a software engineer, I've been a huge fan of test-driven development (TDD), but how do we test something like LLMs? LLMs are a far cry from classical ML models: a stark difference is that traditional ML models are usually good at one task, whereas LLMs can do a wide variety of them.

Thinking about it as an engineer, it used to scare me that each API call returns a different response. Not a very reliable system, is it?

That's why I believe everyone building with LLMs needs to adopt the practice of using evals while building their LLM-powered applications.

For any software application, backward compatibility is a vital aspect to consider. It ensures that newer versions of the software will work with older ones, maintaining functionality and preventing inconveniences that can arise when making upgrades. When it comes to LLMs, however, ensuring backward compatibility can be a substantial challenge.

LLM software, by its very nature, is continually evolving, with newer and more advanced models being released regularly. While one might assume that these improvements follow a linear upward trajectory in performance and usability, the truth is that this may not hold for certain specific use cases. Upgrading from an earlier model to a newer one might cause discrepancies in results, since each model has its unique characteristics and behaviours.

Ensuring backward compatibility, while challenging, is not an impossible task. Below are some strategies that can help in achieving this:

  • Prompts Standardization: Developing a standardized syntax or nomenclature for prompts can go a long way in ensuring backward compatibility. This would imply a set of universally recognized and accepted guidelines for writing prompts that apply to all existing models and also newer ones.
  • Thorough Testing: A robust testing process can help mitigate compatibility issues. Older prompts should be rigorously tested with newer models. Any prompt that does not achieve the desired or expected result should be adjusted or rewritten.
  • Establish Clear Documentation: Detailed documentation of all prompts is crucial. When a developer understands the original intentions and structures behind a prompt, they will be better equipped to make necessary adjustments with newer models.
  • Creating Model-Specific Code Paths: For critical applications where backward compatibility must be maintained, developers could consider running two versions of the model (the older and the newer one) and switch between them based on the situation. The decision on which model to use could depend on the prompts or the quality of responses.

How do you actually use evals?

How you use evals largely depends on what you're building.

The best starting point while building is to think evals first: how will the user prompt the LLM, and what are the odds of it returning the correct text?

There is also the issue of 'prompt compatibility'. A prompt that worked perfectly with an older model may not necessarily evoke the intended response from a newer one, compelling developers to re-engineer their prompts, which can be quite a time-consuming task.

IMO, starting from evals and going into the application behaviour is a great way to replicate TDD while building applications! Some things to consider:

  • User behaviour
  • Type of application
  • Complexity: are you building agents or chains?
  • Tools provided to the base LLM
  • Prompt engineering: setting roles is a good hack and often leads to better eval results!

Metrics that matter

In GenAI, user stickiness seems to be the largest issue: whether you're a devtool or a consumer app, retention is a pain. Evals can help address this in part by ensuring completion quality.

Some key KPIs (a minimal sketch for computing a couple of these follows the list):

  • faithfulness: the factual consistency of the answer with the retrieved context, given the question.
  • context_precision: how relevant the retrieved context is to the question; conveys the quality of the retrieval pipeline.
  • answer_relevancy: how relevant the answer is to the question.
  • context_recall: measures the ability of the retriever to retrieve all the information needed to answer the question.
  • Harmfulness: reducing harmful outputs.
  • PII: making sure sensitive user data is not leaked by mistake.
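
These names mirror the metrics popularized by RAG evaluation libraries such as Ragas. Here is a simplified sketch of the idea behind two of them, with a stubbed llm_judge() call standing in for a real judge model; this is an assumption-laden illustration, not any library's actual implementation.

def llm_judge(prompt):
    # Hypothetical judge call returning "yes" or "no"; replace with a real model.
    return "yes"

def faithfulness(answer, contexts):
    # Fraction of answer sentences the judge says are supported by the context.
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    supported = sum(
        llm_judge(f"Context: {' '.join(contexts)}\nClaim: {s}\nSupported? yes/no") == "yes"
        for s in sentences
    )
    return supported / len(sentences) if sentences else 0.0

def context_precision(question, contexts):
    # Fraction of retrieved chunks the judge finds relevant to the question.
    relevant = sum(
        llm_judge(f"Question: {question}\nChunk: {c}\nRelevant? yes/no") == "yes"
        for c in contexts
    )
    return relevant / len(contexts) if contexts else 0.0

sample = {
    "question": "When was the company founded?",
    "contexts": ["The company was founded in 2015 in Berlin."],
    "answer": "It was founded in 2015.",
}
print(faithfulness(sample["answer"], sample["contexts"]))
print(context_precision(sample["question"], sample["contexts"]))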

How does this affect your product KPIs?

Evals help increase the NPS you get from users, while making it less critical to log every single interaction, since there's an established metric for output quality.

Think of using evals as having a QA engineer in the form of an SDK!

LLMOps 101

When we integrate LLMs into our applications, we get a major boost in what those apps can do. However, as with any complex system, it is crucial to have mechanisms in place that allow us to monitor and evaluate their performance over time.

Logging is a fundamental aspect of this oversight. This post explores why logging is essential for anyone shipping LLMs in their products, and how to follow an evaluation framework and a production monitoring approach.

Building apps with LLMs follows the same principles as software engineering, with a focus on building reliable and scalable apps.

Understanding LLM Behavior

LLMs are inherently non-deterministic; the same input can lead to different outputs depending on various factors, including the model's training and the context provided. Logging helps us to record each interaction with the LLM, enabling us to understand its behavior by analyzing its responses over time. This is the foundation of building a reliable AI-powered application.

Some key KPIs to observe (a minimal logging sketch follows the list):

  • Cost Analysis: By tracking the cost of requests in real time, developers can manage budgets effectively and forecast expenses with greater accuracy.
  • Token Metrics: Understanding token usage helps optimize prompt design, potentially lowering costs and improving response quality.
  • Latency Averages: Performance is key in user experience. Mean latency metrics are crucial for identifying lags and making necessary optimizations.
  • Success and Failure Rates: Real-time assessment of request outcomes enables developers to swiftly address failures and enhance success rates.
  • User Engagement: Identifying your user base and their usage patterns allows for better targeting and personalization strategies.
  • Model Popularity: Knowing which models are most used can guide decisions about future integrations or deprecations.
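
A minimal sketch of a logging wrapper that captures several of these per request; the client, model name, price, and log file are hypothetical placeholders, not a real provider's API.

import json, time

def logged_completion(client_call, prompt, model="example-model", price_per_1k_tokens=0.0005):
    # Wrap every LLM call so cost, tokens, latency and outcome get recorded.
    record = {"model": model, "prompt_chars": len(prompt), "ts": time.time()}
    start = time.perf_counter()
    try:
        response_text, tokens_used = client_call(prompt)   # hypothetical client
        record.update(success=True, tokens=tokens_used,
                      est_cost=tokens_used / 1000 * price_per_1k_tokens)
    except Exception as exc:
        response_text = None
        record.update(success=False, error=str(exc))
    record["latency_s"] = round(time.perf_counter() - start, 3)
    with open("llm_requests.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return response_text

# Example with a fake client; swap in a real API call in practice.
fake_client = lambda prompt: (f"Echo: {prompt}", 42)
print(logged_completion(fake_client, "Summarize our refund policy."))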

Identifying and Resolving Errors

Even the most advanced LLMs can produce errors, including misunderstandings, non sequiturs, and "hallucinations," where the model confidently presents incorrect information. Logging is essential for identifying when and why these errors occur. By analyzing the logs, we can tweak our prompts or the model's parameters to reduce the incidence of such errors and improve the overall accuracy of the system.

  • Error Rate Diagnosis: A high-level view of errors can pinpoint systemic issues that need immediate attention.
  • Error Type Distribution: Classifying errors helps in understanding what kind of issues are most prevalent and how to prioritize fixes.
  • Error Trend Analysis: Observing the trends of errors over time can indicate the robustness of newly released features or models.
  • Rescue Success: Automatic retries and fallbacks are a safety net; monitoring their success helps ensure reliability even in failure scenarios.

Measuring Performance and Reliability

In order to ensure that an LLM is performing optimally and reliably within an application, it's vital to track specific KPIs that can provide actionable insights into its behavior. These KPIs help in understanding how the LLM interacts with users and handles various queries, thus informing decisions on system improvements and optimizations.

Some key KPIs for measuring performance and reliability include the following (a small sketch for computing a few of them from request logs follows the list):

  • Response Time: The average time taken for the LLM to respond to a query. This KPI is crucial for user satisfaction as it directly impacts the user experience.
  • Uptime / Availability: The percentage of time the LLM is operational and available for use without any outages, indicating system reliability.
  • Error Rate: The ratio of the number of failed requests to the total number of requests, which helps in pinpointing stability issues.
  • Success Rate: The percentage of queries handled successfully without any errors or interventions, showcasing the efficacy of the LLM.
  • Recovery Time: The average time it takes for the LLM to recover from an error or failure, reflecting the resilience of the system.
  • Quality of Responses: Measure the accuracy and relevance of the LLM's responses through qualitative analysis or user ratings.
  • Throughput: The number of requests processed by the LLM in a given time frame, indicating the system's capacity to handle load.
  • Fallback Rate: The frequency with which the system needs to resort to fallback mechanisms due to the LLM's inability to provide an appropriate response.
  • Repeat Interaction Rate: The rate at which users need to ask follow-up questions to get satisfactory answers, shedding light on the clarity and completeness of the LLM's responses.
  • Benchmark Against Goals: How the LLM's performance aligns with predefined benchmarks or objectives for various metrics, reflecting whether the system meets the set performance goals.
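
As a rough sketch, several of these can be derived directly from the request log produced by the logging wrapper above (llm_requests.jsonl is the hypothetical file from that sketch):

import json, statistics

# Aggregate the request log into a few of the KPIs listed here.
with open("llm_requests.jsonl") as f:
    records = [json.loads(line) for line in f]

total = len(records)
errors = [r for r in records if not r.get("success", False)]
latencies = sorted(r["latency_s"] for r in records if "latency_s" in r)

kpis = {
    "error_rate": len(errors) / total if total else 0.0,
    "success_rate": 1 - len(errors) / total if total else 0.0,
    "mean_latency_s": statistics.mean(latencies) if latencies else None,
    "p95_latency_s": latencies[int(0.95 * (len(latencies) - 1))] if latencies else None,
    "throughput_per_min": None,  # needs timestamps over a window; omitted here
}
print(json.dumps(kpis, indent=2))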

Feedback Loops for Machine Learning

For LLMs to improve, they need high-quality, structured data to learn from. Logging provides invaluable data about the model's inputs and outputs, which can be used for further training and refinement. This creates a feedback loop where the performance of the LLM is continuously improved based on actual usage data.

  • Feedback Volume: Keeping a pulse on how much feedback you're receiving is essential for understanding user engagement.
  • Feedback Quality: Inspecting the scores and sentiments in feedback helps gauge user happiness and areas needing improvement.
  • Trend Analysis: A trend line of feedback over time allows developers to measure the impact of changes and maintain a trajectory of improvement.
  • Engaged User Base: Knowing who is providing feedback can help develop a community of testers and brand advocates.

Enhancing User Experience

User feedback is a vital aspect of improving AI applications. By combining user feedback with detailed logs of LLM interactions, we can understand how users perceive the LLM's responses and identify areas where the user experience can be enhanced.

  • Unique Users: Quantifying individual users helps in understanding the reach of your app and catering to diverse needs.
  • Top Users: Identifying power users can help in community building and finding champions for your product.
  • Usage Frequency: Tracking how often users engage with your app sheds light on its stickiness and daily relevance.
  • Feedback Interaction: User feedback serves as a direct line to customer satisfaction and is critical for iterative development.

Business and Operational Insights

Logs can reveal trends in how users interact with the application, providing business insights such as the peak times for LLM usage, the most common types of queries, or areas where users frequently encounter difficulties. This information is essential for operational planning and for developing strategies to encourage more effective use of the AI system.


VC Modelling

Venture Capital financial modelling notebook

I've always wanted to cover how captables are modelled. Using this GitHub repo, I show how we can build captables, calculate expenses and EBIT, and run company valuations.

Link to github repo.

Free Cash Flow

Free cash flow (FCF) represents the cash a company generates after accounting for cash outflows to support operations and maintain its capital assets.

There are two main approaches to calculating FCF. The first approach uses cash flow from operating activities as the starting point, and then makes adjustments for interest expense, the tax shield on interest expense, and any capital expenditures (CapEx) undertaken that year.

The second approach uses earnings before interest and taxes (EBIT) as the starting point, then adjusts for income taxes, non-cash expenses such as depreciation and amortization, changes in working capital, and CapEx. In both cases, the resulting numbers should be identical, but one approach may be preferred over the other depending on what financial information is available.

EBIT

Earnings before interest and taxes (EBIT) is an indicator of a company's profitability. EBIT can be calculated as revenue minus expenses excluding tax and interest. EBIT is also referred to as operating earnings, operating profit, and profit before interest and taxes.
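
A small Python sketch tying these two sections together, with purely illustrative numbers (the notebook in the repo uses its own inputs): compute EBIT from revenue and operating costs, then derive FCF via both the cash-from-operations route and the EBIT route and confirm they match.

# Illustrative figures only -- plug in real inputs from the notebook.
revenue           = 1_000_000
operating_costs   =   700_000   # includes depreciation, excludes interest and tax
depreciation      =    50_000   # non-cash portion of operating costs
interest_expense  =    10_000
tax_rate          = 0.25
capex             =    80_000
change_in_nwc     =    20_000   # increase in net working capital

# EBIT: revenue minus expenses, excluding interest and tax.
ebit = revenue - operating_costs                      # 300,000

# Approach 1: start from cash flow from operating activities.
net_income = (ebit - interest_expense) * (1 - tax_rate)
cash_from_ops = net_income + depreciation - change_in_nwc
fcf_from_cfo = cash_from_ops + interest_expense * (1 - tax_rate) - capex

# Approach 2: start from EBIT.
fcf_from_ebit = ebit * (1 - tax_rate) + depreciation - change_in_nwc - capex

print(ebit, fcf_from_cfo, fcf_from_ebit)   # both FCF figures come out to 175,000.0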

Valuation

A business valuation is the process of determining the economic value of a business, giving owners an objective estimate of the value of their company. Typically, a business valuation happens when an owner is looking to sell all or a part of their business, or merge with another company.

Pre/Post Valuations

Pre-money and post-money differ in the timing of valuation. Pre-money valuation refers to the value of a company not including external funding or the latest round of funding. Post-money valuation includes outside financing or the latest capital injection.

Captable

A capitalization table is a table providing an analysis of a company's percentages of ownership, equity dilution, and value of equity in each round of investment by founders, investors, and other owners.
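
To make the pre/post-money and captable mechanics concrete, here is a minimal sketch with made-up numbers: a $2M investment at an $8M pre-money valuation, issued as new shares on top of a simple existing cap table.

# Illustrative round: a $2M investment at an $8M pre-money valuation.
pre_money  = 8_000_000
investment = 2_000_000
post_money = pre_money + investment                      # 10,000,000
investor_ownership = investment / post_money             # 20%

# Simple cap table before the round (fully diluted shares).
cap_table = {"Founder A": 600_000, "Founder B": 300_000, "ESOP": 100_000}
existing_shares = sum(cap_table.values())                # 1,000,000

# New shares issued so the investor ends up with 20% post-round.
new_shares = existing_shares * investor_ownership / (1 - investor_ownership)
cap_table["Investor"] = new_shares
total_shares = existing_shares + new_shares

for holder, shares in cap_table.items():
    print(f"{holder:<10} {shares / total_shares:.1%}")
# Founders and the ESOP are each diluted by 20% relative to their pre-round stakes.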

Todo

  • Add how to model captables
  • How EBIT is measured
  • Formulas for measurements
  • Add info about termsheets

How to Cold Email

How to write a good cold email?

If you're looking for a job, an internship, customers, or fundraising, a cold email is your best shot at reaching out (unless you already know people at the organization).

The beauty of cold emailing is that it's a direct and upfront ask: you know what you want; if you don't, define it before you send that email.

  • One line about who you are and what you do.
  • Hello {receiver}, I'm {name}, studying/working with {name} and I wish to {request}
  • Share a bit about yourself.
  • Add 2 to 5 lines about the key highlights that make you a good fit for your request.
  • Be direct, short, and crisp. Writing essays isn't saving anyone's time. Be concise and respectful.
  • Quantify and share your proof of work. If you worked on a feature, show the time it saved or the costs it reduced. If you gained 100 customers, add that. Show numbers so people understand you better.
  • Add all relevant links in the email. The reader isn't going to spend time looking you up. Respect their time.
  • Close the email with a call to action. Add a personal touch and get as personalized as possible!

Sounds interesting? Write a cold email and get what you want! Drop me a text on LinkedIn if it helps.

Venture Capital Memos

A very common question from aspiring VCs is: where do we learn how investments are made? VC is largely a learn-on-the-job role, and feedback cycles take years at a minimum.

How do you think like a VC?

Enter investment memos: these short documents capture the typical decision-making process of a venture capital fund. Oftentimes, fundraises come with memos as well.

List of memos to learn

Memo templates

Note: I am in no way involved with any of these funds. If you wish to learn more about VC, drop me a line on LinkedIn.

Venture Capital investments in Open Source

Problem

Understanding FOSS technologies and investing in open-source tech.

Commercial platforms put you at the mercy of closed-source companies. You are subject to vendor lock-in and rent-seeking behavior.

There is no way to push back on prices, request features, or get community support.

Solution

Monetized open-source projects give some or all of the code away for free, along with the ability to change it. Projects are monetized via services, premium features, hosting, and more. Contributors are users themselves.

Freemium plans drive user acquisition, lower CAC, and better retention.

Key Players

Projects

  • WordPress: The most popular website builder
  • Plausible: Google Analytics alternative
  • Elastic: Search, analyze and visualize data
  • Medusa: Shopify alternative
  • Builder: Visual website builder
  • Posthog: Alternative to Mixpanel and Amplitude
  • Supabase: Firebase alternative
  • Semgrep: Static analysis tool
  • AnonAddy: Anonymous email forwarding
  • Penpot: Figma alternative
  • Canonical: Linux OS

India:

  • Appsmith: Raised from Accel India
  • Tooljet: Raised from Nexus Venture Partners

Predictions

  • Monetized open source projects will become more ambitious.
    • NocoDB and Baserow are Airtable alternatives.
    • n8n is an alternative to Zapier and Integromat.
    • Medusa is a Shopify alternative.
    • Supabase is a Firebase alternative.
    • Appsmith is a Retool alternative.
  • Networks will become public goods. Centralized networks have been one of the most effective ways to generate wealth.
  • The Global Open Source Services Market size is expected to reach $60 billion by 2027, rising at a 17% CAGR during the forecast period. The term 'open source' refers to a kind of licensing agreement that enables users to independently alter a work, combine it with larger projects, use it in different ways, or develop something new based on the original.
  • The rising number of skilled developers means India is increasingly building for the world; more developers have contributed to open source from India than from any other country.

Growth

  • Self-service selling dramatically reduces the cost of selling and servicing transactional, lower-revenue deals. The product becomes a vehicle for allowing customers to expand their spending through executing upsells, particularly in usage-based pricing models. Chargebee offers 2 self-service plans in addition to a free offering, which enables SMB and mid-market customers to grow with the platform without needing to talk to sales.
  • Data-driven targeting uses product data in sales targeting and upsells motion; for instance, providing your sales team with a list of customers who are above their usage limits and ready to pay. Nearpod collects data around which teachers are actively using its platform and then leverages this to encourage school districts to become customers.
  • New or premium feature adoption can be improved by guiding users to that feature with product popups and callouts based on their usage patterns and use-case. MURAL surfaces new features in-app with contextual triggers, while also providing an in-depth changelog to demonstrate all of the value they are continuously adding to their product.
  • Community development involves fostering a community of users who can help each other understand the product and develop new innovations in usage. This community will form a key source of product advocacy and evangelism, with paying customers advocating to free users, and free users evangelizing the product to prospective users. You should support and foster the community by providing forums and events (user conferences and smaller gatherings) where they can interact, uplifting the most active users of the community, and ensuring company employees interact and become members of the community themselves. Postman has fostered an active community of developers using the product and continues to invest in it.

Opportunities

  • Counter-positioning to compete with incumbents: make it hard for them to mimic your strategy, because copying would mean cannibalizing their existing business. Medusa is a Shopify alternative; it's unlikely that Shopify will open source its platform, even if Medusa becomes a formidable competitor.
  • Permissionless contribution to open-source projects: use open-source contributions to build a portfolio and find jobs. Laszlo Bock, former VP of People Operations at Google, says the number of Google employees without a college education is rising.
  • Open-source alternatives avoid platform risk, vendor lock-in and rent-seeking behavior. Use code that you can fork and self-host. Platforms can raise prices without providing more value because switching costs are high.
  • Turn complaints into contributions: accept pull requests along with feature requests, and augment your dev team with open-source contributions.

Security

  • Securing serverless in the public cloud, perhaps by isolating serverless workloads in the public cloud with granular account-level segmentation, and limiting exposure through the use of blast-radius architecture.
  • Rethinking authentication for transient serverless workloads by using ephemeral credentials and short-lived tokens, which are key risk mitigators for credential exposure.
  • Protecting your availability in a serverless landscape with robust perimeter security that deploys public and internal functions at discrete gateways.
  • Upgrading risk assessment, governance, and awareness by, for example, adopting policy as code for the codification of organizational policies; using regulatory frameworks in automated governance pipelines for cloud-service provisioning; and deploying all serverless workloads using an embedded DevSecOps pipeline.

Risks

  • Consulting: Services are less scalable than hosting and dual-license models. Consider marginal costs before you choose this strategy.
  • Community Backlash: Moving free features to paid tiers may lead to a backlash.

Key Lessons

  • Users are the real winners of monetized open source. Public goods have less lock-in.
  • Value creation does not automatically lead to value capture. Two-way rating systems, proof-of-stake protocols, and airdrops are innovations that no one has cornered yet.

Issues

  • IP rights will be hard to enforce in the new world. The CryptoPunks creators are under fire for an inconsistent approach to derivatives, while Bored Apes are slightly more lenient. NFTs gave us digital scarcity, but legitimacy and social consensus will rule the day.
  • Wealthy benefactors will use open-source projects to prop up closed ecosystems and attract talent. See Apple with Swift and Meta with React.

Blog Post Title From File Name

Hello World

The main motive behind this blog is to share what I write and who I am, in a minimalist way. Instead of going for a fancy framework, I chose GH Pages + Jekyll for ease of maintaining it.

Stay connected, stranger.