Transcript
Hi everyone.
I'm bna.
Thanks for joining me for the session.
Today I'm going to talk about how to accelerate your data science
career via experimentation.
Think of the session as a field guide.
We'll hit experiment design patterns that work in the wild and the platform
pieces that make experimentation fast and trustworthy and the career moves
that turn your work into impact.
My goal is to leave you with a playbook that you can start using this week.
Before we jump into the agenda, let me introduce myself and take a minute to
talk about why this topic matters to me.
I've spent over a decade helping teams design and scale experimentation using
data to build products and solve business problems, not just to report on them.
My core craft is experimentation and causal inference: choosing the
right design for the constraints, measuring impact credibly, and
translating results into decisions.
On the systems side, I design and operationalize cloud native
experimentation platforms, so tests are reproducible, observable, and cheap to run.
And strategically, I'm obsessed with connecting insights to outcomes,
turning analysis into decisions leaders can trust, and building a culture
where being data informed is the default.
That combination, methodology, platform, and business impact is
what I'll share with you today.
Let's jump into the agenda.
We will start by looking at why AB testing matters for your career,
where experimentation can create value, and some basic concepts about
experimentation and some common pitfalls.
After that, we will jump into how you can leverage these concepts
to accelerate your career, along with a few tips on interview strategy.
Then we will end with an action plan.
Okay, let's get started.
Why does A/B testing matter for your career?
It's a high-demand skill, and it's highly sought after.
Companies across industries are seeking professionals who can design,
implement, and interpret experiments that drive real business outcomes.
And this is not only valuable for data analysts.
Understanding this skill can help engineers build smarter, experiment-ready
systems, and product managers make evidence-based feature and roadmap decisions.
It also bridges the gap between data, engineering, design, and product,
helping you communicate insights effectively and influence decision
making across the organization.
This skill makes you indispensable as a translator between technical
complexity and business value.
Now let's look at where experimentation can create value.
The short answer is that every organization, and every department in a
modern organization, can leverage experimentation to make data-driven decisions.
You can see some examples here on this slide, and this is maybe only half
of the examples we could cover.
The key is understanding which experimentation approach works best for
each use case and business context.
Okay.
And how do we choose the right experiment design?
Now let's look at commonly used experiment designs.
Let's start with between-subjects A/B testing.
This is the classic approach: it compares variants simultaneously
and is great for feature releases and UI changes.
Multivariate and factorial designs are used to test multiple elements and
multiple variations, where we can learn about and capture interaction effects.
Holdout studies are used in long-term experiments to measure effects
that occur over weeks and even years.
Business constraints should pick the design, not the other way around.
When traffic patterns or the platform complicate things, switch designs.
And now let's go to switchback experiments.
Switchback experiments rotate the treatment by time, so each unit cycles
through A and B. They are ideal for marketplaces and two-sided platforms,
and are mostly leveraged by companies like DoorDash, Uber, and Lyft.
Geo experiments randomize by region when user-level randomization risks
spillovers; they are useful for location-based features and marketing.
If true randomization isn't possible, use credible alternatives,
for example difference-in-differences, interrupted time series,
and synthetic control.
These are called causal inference methods.
I have three of them here.
Difference-in-differences compares before-and-after changes in the
treated group against a similar control group that didn't get the change.
Interrupted time series looks for a structural break right when the
intervention happens; if the level or the slope jumps at that point
and not anywhere else, that's the signal.
Synthetic control: when you change something in one big unit, say a state
or a city, build a weighted virtual control from the other units that
didn't change, so the pre-treatment trend matches yours.
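As a rough sketch of the first of these methods (my own illustration, not code from the talk), here is a minimal difference-in-differences estimate using statsmodels, assuming a hypothetical panel DataFrame with made-up columns `unit_id`, `treated`, `post`, and `metric`.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel data: one row per unit and period, with
#   unit_id - the unit (e.g. a region or store)
#   treated - 1 if the unit is in the treated group, else 0
#   post    - 1 for periods after the change, else 0
#   metric  - the outcome we care about
df = pd.read_csv("panel.csv")  # placeholder path

# Difference-in-differences: the coefficient on treated:post is the estimated
# effect, valid under the parallel-trends assumption. Standard errors are
# clustered by unit because we observe each unit repeatedly.
model = smf.ols("metric ~ treated + post + treated:post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit_id"]}
)
print(model.summary())
```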
Expert judgment matters: choose the design based on the constraints,
technical capabilities, and the nature of the intervention,
and know when to deviate from pure randomization.
Now let's look at some basic concepts of experimentation.
I wanna take a different direction in talking about basic statistical concepts.
There are a lot of excellent resources online if you wanna
learn about the topics here.
But today I wanna spend some time tying these concepts to business:
why they matter for the business and how you can think along those lines.
First, hypothesis testing.
This drives clarity.
Knowing how to form a good hypothesis teaches structured thinking.
It's not just about "is it better," but what outcome and why.
This mindset helps align experiments with business strategy and ensures
the test has clear success criteria before launch.
Second, confidence intervals and uncertainty build trust.
Communicating uncertainty transparently builds stakeholder confidence.
Leaders don't expect perfection.
They expect honest ranges and risk awareness.
This skill separates a strategic thinker from a report generator.
Third, effect size and power.
Understanding these concepts helps you prioritize impactful changes,
not statistically trivial ones.
Fourth, multiple testing discipline.
Today we run hundreds or thousands of experiments every month.
With the number of experiments and variants we test, there comes a need
to control the false positive rate.
This protects the business from false wins and bad rollouts.
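As one common illustration of multiple testing discipline (my own sketch, with made-up p-values), the Benjamini-Hochberg procedure in statsmodels controls the false discovery rate across many simultaneous comparisons.

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from several variants tested at once.
p_values = [0.001, 0.012, 0.034, 0.21, 0.048]

# Benjamini-Hochberg keeps the false discovery rate at ~5% across all tests,
# instead of letting false positives accumulate as the number of tests grows.
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for p_raw, p_adj, significant in zip(p_values, p_adjusted, reject):
    print(f"raw p={p_raw:.3f}  adjusted p={p_adj:.3f}  significant={significant}")
```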
Last but not least are sequential and Bayesian methods.
In today's world, we see the need for faster experimentation.
To improve the velocity of experimentation and get continuous learnings,
we need modern techniques like sequential and Bayesian testing to support adaptive learning.
Now that we have looked at sequential and Bayesian methods, which
drive experimentation velocity, let's also look at other techniques.
In a traditional experiment setup, we have to wait for larger samples.
This delays the experiment, the insight, and thereby product development.
But if you look at the formula here for calculating the sample size,
it is directly proportional to the variance.
If you reduce the variance by half, you effectively halve the required sample size.
By implementing variance reduction techniques, you can
achieve statistical significance faster with fewer users.
This translates to quicker decision making, accelerated product iterations,
and more experiments run overall, driving rapid innovation and growth.
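To make the proportionality concrete, here is a small sketch (my own illustration, not from the slides) of the standard two-sample size approximation, roughly n per arm = 2(z_(1-alpha/2) + z_(1-beta))^2 * sigma^2 / delta^2; the numbers below are made up, and halving the variance halves the required sample.

```python
from scipy.stats import norm

def samples_per_arm(sigma_sq: float, delta: float,
                    alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate samples per arm for a two-sided, two-sample test on means."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma_sq / delta ** 2

# Hypothetical numbers: metric variance 4.0, minimum detectable effect 0.1.
print(samples_per_arm(sigma_sq=4.0, delta=0.1))  # baseline sample size
print(samples_per_arm(sigma_sq=2.0, delta=0.1))  # variance halved -> n halved
```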
Now that we have looked at the concept of variance reduction, let's look
at some variance reduction techniques.
Variance reduction techniques broadly fall into five different buckets.
The first one is regression adjustment with pre-period covariates.
This is the umbrella for CUPED and ANCOVA-style pre-post adjustment.
We pick one or two strong pre-treatment predictors and adjust the outcome;
this helps lower the variance.
We will touch on CUPED from this umbrella in detail in the next slide.
The second one is balanced assignment at launch.
This can be achieved by blocking, stratified randomization,
or pair matching for geos,
so treatment and control start equal on the stuff that matters.
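As a minimal sketch of stratified randomization (my own illustration, with made-up strata), we randomize separately within each stratum so both arms start out balanced on the stratification variable.

```python
import random
from collections import defaultdict

def stratified_assignment(users, stratum_of, seed=42):
    """Randomize to control/treatment within each stratum so both groups
    are balanced on the stratification variable."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for user in users:
        strata[stratum_of(user)].append(user)

    assignment = {}
    for _, members in strata.items():
        rng.shuffle(members)
        half = len(members) // 2
        for user in members[:half]:
            assignment[user] = "treatment"
        for user in members[half:]:
            assignment[user] = "control"
    return assignment

# Hypothetical usage: stratify users by a made-up pre-period activity tier.
def activity_tier(user_id: str) -> str:
    return "high" if int(user_id.split("_")[1]) % 3 == 0 else "low"

users = [f"user_{i}" for i in range(1000)]
groups = stratified_assignment(users, activity_tier)
```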
And the third one is metric engineering.
Noisy numerators and denominators and heavy tails blow up the variance.
So instead, aggregate at the per-user or cluster level, winsorize the
extreme outliers, and log-transform the right-skewed metrics.
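Here is a quick sketch of that metric engineering (my own illustration on synthetic data): capping outliers at a high percentile and log-transforming a right-skewed metric both shrink its variance.

```python
import numpy as np

# Hypothetical per-user revenue metric with a heavy right tail.
rng = np.random.default_rng(7)
revenue = rng.lognormal(mean=2.0, sigma=1.5, size=10_000)

# Winsorize: cap extreme outliers at the 99th percentile so a few whales
# don't dominate the variance.
cap = np.percentile(revenue, 99)
winsorized = np.minimum(revenue, cap)

# Log-transform: compress the right skew; log1p handles zeros safely.
log_revenue = np.log1p(winsorized)

print(f"raw variance:        {revenue.var():.1f}")
print(f"winsorized variance: {winsorized.var():.1f}")
print(f"log-metric variance: {log_revenue.var():.3f}")
```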
The fourth is cluster-aware estimation, which is mainly used for geo and
switchback testing.
Analyze at the cluster or block level and use cluster-robust standard errors.
The last one is more of an operational thing: make sure to count only
truly eligible and actually exposed users, and freeze the definition of
the events before the experiment starts.
These are a few variance reduction techniques.
Now let's look at CUPED in detail.
CUPED is one of the most widely used variance reduction techniques.
In CUPED, we reduce the variance by using pre-experiment data that can help
explain the variance post experiment.
Let's look at an example.
Say we wanna run a test to see if people run slower with weights attached to them.
The experiment mile time is the metric that we get when we run the experiment,
and the corresponding row will tell us if the weights were added or not.
By looking at the data on the right side, you can see that the experiment
mile time is already influenced by how fast a runner each person is.
You can adjust the experiment mile time by using the average base mile time,
and the average base mile time from the pre-experiment period
can explain some of the variance.
If you use the change column, which is the difference between the experiment
mile time and the base mile time, that can help you
estimate the impact of adding weights.
So what we are doing here is using pre-experiment data to reduce
the variance that was explainable.
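As a rough sketch of the CUPED adjustment itself (my own illustration on synthetic data, not the slide's numbers), we subtract the part of the in-experiment metric explained by the pre-experiment covariate and then compare the adjusted means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical data: each runner's pre-experiment (baseline) mile time and
# their mile time measured during the experiment.
base_mile_time = rng.normal(8.0, 1.5, n)            # pre-period covariate X
treated = rng.integers(0, 2, n)                     # 1 = weights attached
true_effect = 0.5                                   # made-up effect in minutes
experiment_mile_time = (base_mile_time + true_effect * treated
                        + rng.normal(0, 0.3, n))

# CUPED: theta is the regression slope of Y on X; subtracting theta * (X - mean(X))
# removes the variance explained by the pre-period covariate while leaving the
# treatment effect untouched.
theta = (np.cov(experiment_mile_time, base_mile_time)[0, 1]
         / np.var(base_mile_time, ddof=1))
y_cuped = experiment_mile_time - theta * (base_mile_time - base_mile_time.mean())

print("variance, raw metric:  ", experiment_mile_time.var())
print("variance, CUPED metric:", y_cuped.var())
print("estimated effect:      ",
      y_cuped[treated == 1].mean() - y_cuped[treated == 0].mean())
```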
Okay.
Some of the best practices for CUPED: use variables that have a high
correlation with the variable of interest.
When applying CUPED, there is no need to stick with just one variable
from the pre-experiment period; we can use multiple variables if we
believe that could reduce the variance effectively.
Use a baseline that reflects normal behavior, and make sure to use only
data from prior to the experiment that is not impacted by the experiment.
You can handle missing values by imputing them.
As a callout for people who are interested, there is an advancement of the
CUPED methodology called CUPAC, where in place of a regression model with
single or multiple covariates from the pre-experiment period, CUPAC uses
machine learning models to predict the baseline metric by leveraging
multiple variables from the pre-experiment period.
For those interested, I've attached a link to this
at the end of the presentation.
Please feel free to take a look.
Now let's look at common pitfalls.
If something seems too good to be true, it probably is.
Whenever you find results that are extreme, it's always good to
check against common pitfalls.
These can be broadly divided into four sections.
The first is technical implementation issues, which cover sample ratio
mismatch, logging gaps, and unit-of-randomization mismatch.
When the unit of randomization is different from the unit of analysis,
it inflates the variance.
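As one concrete illustration (my own, with made-up counts), a quick sample ratio mismatch check is a chi-square goodness-of-fit test against the intended split.

```python
from scipy.stats import chisquare

# Hypothetical observed assignment counts for an intended 50/50 experiment.
observed = [100_480, 99_120]
total = sum(observed)
expected = [total * 0.5, total * 0.5]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
# A very small p-value (e.g. < 0.001) signals a sample ratio mismatch:
# something in assignment or logging is broken, so don't trust the results.
print(f"chi-square={stat:.1f}, p={p_value:.2e}")
```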
The second bucket is experiment design flaws.
These are mainly caused by interference and contamination, where units in
the control and treatment groups have an impact on one another.
The third is statistical analysis issues: if you need to peek, adjust
the false positive rate so that there is no room for p-hacking.
The fourth is contextual challenges, where there are additional issues
like metric drift and seasonality.
Now that we've covered some basic concepts of A/B testing, let's look at how
to use them to accelerate your career.
Understanding and practicing these concepts will enhance
your analytical maturity.
You shift your mindset from reactive to proactive experimentation strategy.
This will help you challenge opinions of leaders with evidence
and guide strategic decisions.
You will become the person bridging the gap between cross-functional teams,
translating stats into business impact, and this will differentiate you in
the market and make you invaluable.
To do this repeatedly, you need platform support.
Okay, let's walk through an end-to-end cloud native experimentation architecture.
A successful A/B testing strategy relies on a robust, scalable cloud native
architecture that supports the entire experimentation lifecycle, from initial
idea to validation and iteration, enabling a continuous learning path.
A general cloud native architecture for an end-to-end experimentation flow
can be categorized into a few layers.
First, we could have an infrastructure layer, which acts as the bedrock
providing compute and building services.
The second layer we could categorize as the data layer, where the typical
processing, along with checks and bounds, is set in place.
And finally, on top, we would have the experimentation platform, which
enables us to control the test design, routing, and feature flagging.
This layer can run hypothesis tests to enable us to make informed decisions
and iterate.
Now let's jump into tips on building an experimentation portfolio.
There are three key steps in building an experimentation portfolio.
First, work on and document realistic scenarios: document the process showing
problem identification, hypothesis development, experimental design choice,
and statistical methodologies, with a clear business rationale.
Second, contribute to open source experimentation frameworks and statistical
libraries such as statsmodels and PyMC.
Third, show business impact and recommendations about next steps to complete
the end-to-end flow.
Develop a mindset to iterate on ideas instead of running one-and-done experiments.
I also want to share some success strategies for interviews.
Interview success in the experimentation field requires not only technical
expertise, but also mastery of the concepts and the ability to explain them
in simple terms to stakeholders.
Prepare specific examples where your experimentation work drove
measurable business outcomes.
End-to-end experimentation knowledge, how experimentation fits into the
larger ecosystem, and how to run experiments at scale will definitely add to this.
A combination of these three will help you be successful in interviews.
Now that we have spoken about interview success strategies, let's look at
what the career trajectory of an IC in data science specializing in
experimentation looks like.
An entry-level candidate is expected to have rock-solid A/B testing
fundamentals: clean hypotheses, reproducible analysis, and clear writing.
A mid-level candidate is expected to lead multiple tests, mentor analysts,
have knowledge of sequential and Bayesian testing, and standardize guardrails.
A senior would be setting experimentation strategy, partnering with
engineering to drive the experimentation platform, and driving cross-org
decisions with quantified impact.
A principal or staff level IC would shape the system through platform
roadmaps, contributing to open source and conference talks, and teaching
the organization how to learn faster.
To summarize, there are three key takeaways: develop the technical expertise,
pair it with strong business acumen, and master the end-to-end
experimentation ecosystem.
Together these form the foundation for a successful career.
I will end my talk with the action plan.
Start today by building something small.
Engage with the community, share your learnings, and learn from the community.
Stay current with emerging trends in experimentation.
These three will differentiate you from the rest.
Okay, that's all I have for today.
Thanks for joining me.
Please reach out to me on LinkedIn if you have any questions, or if you
simply wanna chat about experimentation or get advice on your career;
I'll be happy to connect there.
All the best to everyone for all your future endeavors.
Thank you.