Conf42 Prompt Engineering 2025 - Online

- Premiere: 5PM GMT

Designing Data Science Careers Through A/B Testing and Experimentation Mastery

Abstract

Master the art of A/B testing to supercharge your data science career! Learn how top-tier experimentation, statistical precision, and real-world impact can set you apart in AI-driven roles—from optimizing products to fine-tuning LLMs. Don’t just analyze—experiment!

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. I'm Bhavana. Thanks for joining me for this session. Today I'm going to talk about how to accelerate your career via experimentation. Think of this session as a field guide. We'll hit experiment design patterns that work in the wild, the platform pieces that make experimentation fast and trustworthy, and the career moves that turn your work into impact. My goal is that you leave with a playbook you can start using this week.

Before we jump in, let me take a minute to talk about myself and why this topic matters to me. I've spent over a decade helping teams design and scale experimentation, using data to build products and solve business problems, not just report on them. My core craft is experimentation and causal inference: choosing the right design for the constraints, measuring impact credibly, and translating results into decisions. On the systems side, I design and operationalize experimentation platforms so that tests are reproducible, observable, and cheap to run. And strategically, I'm obsessed with connecting insights to outcomes, turning analysis into decisions leaders can trust, and building a culture where data-informed is the default. That combination of methodology, platform, and business impact is what I'll share with you today.

Let's quickly look at the agenda for today. We'll start with why A/B testing matters for your career, where experiments create value, and how to choose the right experiment design. We'll dive into some basic statistical concepts and some of the techniques you can use to speed up your experimentation, then pivot to how you can build an experimentation portfolio and some strategies for interview success. We will end with an action plan.

Now let's start by talking about why expertise in experimentation matters. This is a highly sought-after skill: a combination of data science, experiment design, and business strategy in a measurable way. Companies across all industries are desperately seeking professionals who can design, implement, and interpret experiments that drive real business outcomes. Additionally, with advancements in large language models, LLMs are being used in every part of life, and it becomes crucial to validate results through experimentation. Experimentation helps evaluate prompt effectiveness, measure changes in response quality, detect biases, and understand overall model behavior at scale. Professionals who have these skills are indispensable.

Understanding the core principles of A/B testing benefits everyone, not just data analysts. Engineers can build smarter, experiment-ready systems. Product managers can make evidence-based feature and roadmap decisions. It also has strategic impact: through cross-functional collaboration, experimentation bridges data, engineering, design, and product teams, helping you communicate insights effectively and influence decision making across the organization. This skill makes you indispensable as the translator between technical complexity and business value.

Okay, now let's see where experiments create value. The answer is that every department in a modern organization can leverage experimentation to make data-driven decisions. You can see some examples here on the slide. These are only a few, but the key is understanding which experimentation approach works best for each use case and business context.
Now, how do we choose the right experiment design? Let's look at the experiment designs available. The first is A/B testing, the classic: it compares variants simultaneously and is great for feature releases and UI changes. Multivariate and factorial designs are used to test multiple elements and learn interaction effects. Holdout studies measure long-term and network effects that accrue over weeks, months, and sometimes a year.

Business constraints should pick the design, not the other way around. When traffic patterns and platform complexity complicate things, switch designs. Switchback experiments rotate treatment over time so that the same unit cycles through both variants; ideal for marketplaces and two-sided platforms. Geo experiments randomize by region when user-level randomization risks spillovers; useful for location-based features and marketing. If true randomization is not possible, use credible alternatives such as difference-in-differences, interrupted time series, and synthetic controls. Difference-in-differences compares before-and-after changes in the treated group against a similar control group that didn't get the change. Interrupted time series looks for a structural break right when the intervention happened; if the level or the slope jumps at that point and nowhere else, that's the signal. Synthetic control: when you change something in one big unit, say a state or a city, build a weighted virtual control from the other units that didn't change, so the pre-period matches yours. Expert judgment matters: choose designs based on constraints, technical capability, and the intervention's nature, and know when to deviate from pure randomization.

Okay, now let's take a look at statistical concepts. I want to take a different direction here. There are a lot of excellent resources online that can teach you exactly what these concepts are, but today I want to spend some time tying them to the business: why each matters and how you can think along those lines.

First, hypothesis testing. This drives clarity. Knowing how to form a good hypothesis teaches structured thinking. It's not about "is it better?" but "what outcomes, and why?" This mindset helps align experiments with business strategy and ensures a test has clear success criteria before launch. Second, confidence and uncertainty. This builds trust. Communicating uncertainty transparently builds stakeholder confidence. Leaders don't expect perfection; they expect honest ranges and risks. This skill separates strategic thinkers from report generators. Third, effect size and power. This helps you prioritize what matters. Understanding these concepts helps you focus on impactful changes, not statistically trivial ones. Fourth, multiple-testing discipline. Today we run hundreds or thousands of experiments every month; with that many experiments and variants, there comes a need to control the false-positive rate. This protects the business from false wins and bad rollouts. Last but not least, the fifth is sequential thinking. In today's world we see a need for faster experimentation. To improve the velocity of experimentation and get continuous learnings, we need modern techniques like sequential testing to support adaptive learning.
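To make the effect size and power point concrete, here is a quick sample size calculation using statsmodels; the effect size, alpha, and power numbers are illustrative, not from the talk:

```python
from statsmodels.stats.power import tt_ind_solve_power

# Users needed per variant to detect a small standardized effect
# (Cohen's d = 0.05) with a two-sided test at alpha = 0.05 and 80% power.
n_per_variant = tt_ind_solve_power(
    effect_size=0.05, alpha=0.05, power=0.8, alternative="two-sided"
)
print(round(n_per_variant))  # roughly 6,280 users per variant
```

Since the required n scales with sigma^2 / delta^2, halving the metric's variance halves the sample you need, which is exactly what the variance reduction techniques discussed next exploit.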
Now that we have looked at sequential testing, which drives experimentation velocity, let's also look at other techniques. In a traditional experiment setup, we have to wait for larger samples. This delays time to insight and thereby product development. If you look at the sample size formula, the sample size is directly proportional to the variance of the metric; if you reduce the variance by half, you reduce the required sample size by half. By implementing variance reduction techniques, you can achieve statistical significance faster with fewer users. This translates to quicker decision making, accelerated product iteration, and more experiments run overall, driving rapid innovation and growth.

Now that we have the concept of variance reduction, let's look at some variance reduction techniques. The first is regression adjustment with pre-period covariates. This is an umbrella that covers CUPED and related pre-post adjustments: we pick one or two strong pre-treatment predictors and adjust the outcome, which lowers the variance. We'll look at CUPED in detail on the next slide. The second is balanced assignment at launch, that is, blocking or stratification while you're randomizing, so treatment and control start equal on the things that matter. The third is metric engineering. Noisy numerators and denominators and heavy tails blow up variance; aggregate at the user or cluster level, winsorize extreme outliers, and log-transform right-skewed metrics. Fourth, cluster-aware estimation for geo or switchback tests: analyze at the cluster or block level, or use cluster-robust standard errors. Fifth, exposure and eligibility hygiene: count only truly eligible and actually exposed users, and freeze event definitions prior to the experiment. Those are a few variance reduction techniques.

Now let's look at CUPED in detail. CUPED is one of the most widely used variance reduction techniques. In CUPED, we reduce variance by using pre-experiment data that helps explain variance post-experiment. Let's look at an example. Say we want to run a test to see if people run slower with weights attached to them. The experiment mile time is the metric we get when we run the experiment, and the corresponding row tells you whether weights were added. Looking at the data on the right, you can already see that the experiment mile time is influenced by how fast the runners already were, which is the baseline mile time. In this case, we can leverage the average baseline mile time to help explain the difference in variance, and use the change column, which is the difference between the baseline mile time and the experiment mile time, to understand the impact of adding weights. What we are doing here is using pre-experiment data to remove the variance that was explainable.

Some best practices: use a covariate that has high correlation with the variable of interest. When applying CUPED, there is no need to stick with just one variable from the pre-experiment period; use multiple variables if you believe they could reduce the variance effectively. Use baselines that reflect normal behavior. Make sure the data comes only from before the experiment period, not from the period when the experiment is running. You can handle missing data by imputing values.
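As a rough sketch of the CUPED adjustment just described, here is a minimal version in Python; the variable names and toy runner data are my own illustration, assuming a single pre-experiment covariate such as the baseline mile time:

```python
import numpy as np

def cuped_adjust(y, x):
    # CUPED: subtract the part of metric y that is explained by the
    # pre-experiment covariate x. theta is the OLS slope of y on x.
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
baseline = rng.normal(8.0, 1.0, 1000)               # pre-experiment mile times
experiment = baseline + rng.normal(0.2, 0.3, 1000)  # mile times during the test

adjusted = cuped_adjust(experiment, baseline)
print(experiment.var(ddof=1), adjusted.var(ddof=1))  # variance drops sharply
```

Because x comes entirely from before the experiment and the same theta is applied to both arms, the treatment effect estimate stays unbiased while its standard error shrinks.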
A callout for people who are interested: there are some advancements to this methodology, called CUPAC, where in place of a regression model with one or more covariates from the pre-experiment period, CUPAC uses machine learning models to predict the baseline metric by leveraging multiple variables from the pre-experiment period. For those interested, I have attached a link at the end of the presentation; please feel free to take a look.

Now, in this new era, it's also essential to master experimentation on LLMs and prompt engineering. Let's take a look at a few use cases where we can leverage experimentation to optimize large language models and the prompts we give them. The first is prompt optimization: systematically comparing templates, few-shot examples, and system instructions to achieve superior outcome quality and relevance. The second is model benchmarking: evaluating performance across different LLM architectures or fine-tuned versions of effective models. The third is RAG pipeline enhancement: optimizing RAG by testing retrieval strategies, chunking methods, and ranking algorithms. The fourth is personalization and contextualization: assessing how dynamic context windows and user-specific data refine models and boost user engagement. The last is proactively identifying and mitigating harmful outputs, biases, and hallucinations through comprehensive and systematic testing. So there are a lot of use cases even within large language models and prompt optimization. As for which experimentation methodologies to apply, you can leverage standard A/B testing, multivariate testing, or contextual bandits, depending on the requirement and the evaluation you want to perform. For offline evaluation, you can use automated frameworks, synthetic data can be leveraged, and we can use the LLM-as-a-judge method for faster iterations.

Okay, now that we have an idea of the basics of experimentation and how to improve its speed, let's look at some common pitfalls. If it seems too good to be true, it probably is. Whenever you find results that are extreme, it's always good to check against common pitfalls. We can think of common pitfalls in four broad categories. The first is technical implementation issues: sample ratio mismatch and logging gaps; also, when the unit of randomization is different from the unit of analysis, it inflates variance. The second is experiment design flaws: interference and contamination, because units in the control and treatment have an impact on one another. The third is statistical analysis errors: if you have a need to peek at results, adjust the false-positive rate so that there is no room for p-hacking. The fourth is contextual challenges, plus some additional challenges like metric drift, seasonality, et cetera.
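As a guardrail against the first pitfall, sample ratio mismatch, a common approach is a chi-square test comparing observed assignment counts to the expected split; a minimal sketch with hypothetical counts:

```python
from scipy.stats import chisquare

# Observed assignment counts vs. the counts expected under a 50/50 split.
observed = [50_310, 49_220]
total = sum(observed)
expected = [total * 0.5, total * 0.5]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
# A tiny p-value (e.g. < 0.001) signals a sample ratio mismatch:
# debug the assignment and logging pipeline before trusting the results.
print(stat, p_value)
```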
Now that we have covered common pitfalls in A/B testing, let's look at how to use all of this to accelerate your career. Understanding and practicing these concepts will enhance your analytical maturity and shift your mindset from reactive analysis to a proactive experimentation strategy. It will also help you challenge leaders' opinions with evidence and guide strategic decisions. You will become the person bridging the gap between cross-functional teams, translating statistics into business impact. All of this will differentiate you in the market and make you invaluable.

Now let's look at how to build an experimentation portfolio. If you're just starting out, there are three key steps. First, work on and document realistic scenarios: document the process, showing problem identification, hypothesis development, experimental design choices, and statistical methods, with the clear business rationale. Second, contribute to open source: contribute to experimentation frameworks and statistical libraries like statsmodels, PyMC, et cetera. Third, show business impact: think about end-to-end demonstration, and develop a mindset of iterating on an idea instead of running one-and-done experiments.

I also want to share some success strategies for interviews. Interview success in the experimentation field requires not only technical expertise but also mastery of the concepts and the ability to explain them in simple terms to stakeholders. Prepare specific examples where your experimentation work drove measurable business outcomes, and bring end-to-end experimentation knowledge: how experimentation fits into the larger ecosystem and how to run experiments at scale. A combination of these three will help you be successful in interviews.

For anybody starting out, you must be curious how the trajectory of an IC looks in the data science field specializing in experimentation, so here is an outlook. This is a broad view of how the trajectory would look. An entry-level candidate would be expected to have rock-solid A/B testing fundamentals: clean hypotheses, reproducible analysis, and clear writing. At the mid level, the person in this role will be leading multiple experiments, mentoring analysts, diving into sequential testing, and standardizing guardrails. A senior would set experimentation strategy, partner with engineering to drive the experimentation platform, and drive cross-functional, cross-org decisions with quantified impact. A principal or staff IC would shape the system and platform, develop roadmaps, contribute to open source and conference talks, and teach the organization how to learn faster. So that is a general idea of how the trajectory would look.

Now, to summarize everything we have looked at, there are three key takeaways. The first is to develop deep technical expertise. The second is to pair it with strong business acumen. Last but not least, master the end-to-end experimentation ecosystem. Together, these three form the foundation of a successful career.

I will end my talk with an action plan. I would highly encourage people exploring data science careers to specialize in experimentation and start today: engage with the community, share your learnings, and learn from the community. Stay current with emerging trends; the experimentation field evolves rapidly with new statistical methods, and staying current will differentiate you from others. And with that, I'll end the presentation. Thanks for being here and for taking the time. Please reach out to me on LinkedIn if you have any questions or simply want to chat about experimentation, careers, or anything else. I'll be happy to connect. Thanks, everyone.

Bhavana Reddy Chadagonda

Staff Data Scientist @ Independent Researcher



