Transcript
Hi everyone.
I'm bna.
Thanks for joining me for the session.
Today I'm going to talk about how to accelerate your data science
career via experimentation.
Think of the session as a field guide.
We'll hit experiment design patterns that work in the wild and the platform
pieces that make experimentation fast and trustworthy and the career moves
that turn your work into impact.
My goal is to leave you with a playbook that you can start using this week.
Before we jump into the agenda, let me introduce myself and take a minute to
talk about why this topic matters to me.
I've spent over a decade helping teams design and scale experimentation using
data to build products and solve business problems, not just to report on them.
My core craft is experimentation and causal inference: choosing the
right design for the constraints, measuring impact credibly, and
translating results into decisions.
On the systems side, I design and operationalize cloud native
experimentation platforms, so tests are reproducible, observable, and cheap to run.
And strategically, I'm obsessed with connecting insights to outcomes,
turning analysis into decisions leaders can trust, and building a culture
where being data informed is the default.
That combination, methodology, platform, and business impact is
what I'll share with you today.
Let's jump into the agenda.
We will start by looking at why AB testing matters for your career,
where experimentation can create value, and some basic concepts about
experimentation and some common pitfalls.
After that, we will jump into how you can leverage these concepts
to accelerate your career, along with a few tips on interview strategy.
Then we will end with an action plan.
Okay, let's get started.
Why does A/B testing matter for your career?
It's a high-demand skill, and it's highly sought after.
Companies across industries are seeking professionals who can design,
implement, and interpret experiments that drive real business outcomes.
And this is not only valuable for data analysts.
Understanding this skill can help engineers build smarter, experiment-ready
systems, and product managers make evidence-based feature and roadmap decisions.
It also bridges the gap between data, engineering, design, and product,
helping you communicate insights effectively and influence decision
making across the organization.
This skill makes you indispensable as a translator between technical
complexity and business value.
Now let's look at where experimentation can create value.
The short answer is that every organization, and every department in a
modern organization, can leverage experimentation to make data-driven decisions.
You can see some examples here on this slide, and this is maybe only half
of the examples we could cover.
The key is understanding which experimentation approach works best for
each use case and business context.
Okay.
And how do we choose the right experiment design?
Now let's look at commonly used experiment designs.
Let's start with between-subjects A/B testing.
This is the classic approach: it compares variants simultaneously
and is great for feature releases and UI changes.
Multivariate and factorial designs are used to test multiple elements and
multiple variations, where we can learn about and capture interaction effects.
Holdout studies are used in long-term experiments to measure effects
that occur over weeks and even years.
Business constraints should pick the design, not the other way around.
When traffic patterns or the platform complicate things, switch designs.
And now let's go to switchback experiments.
Switchback experiments rotate the treatment by time, so each unit cycles
through A and B. They are ideal for marketplaces and two-sided platforms,
and are mostly leveraged by companies like DoorDash, Uber, and Lyft.
Geo experiments randomize by region when user-level randomization risks
spillovers; they are useful for location-based features and marketing.
If true randomization isn't possible, use credible alternatives,
for example difference-in-differences, interrupted time series,
and synthetic control.
These are called causal inference methods.
I have three of them here.
Difference-in-differences compares before-and-after changes in the
treated group against a similar control group that didn't get the change.
Interrupted time series looks for a structural break right when the
intervention happens; if the level or the slope jumps at that point
and not anywhere else, that's the signal.
Synthetic control: when you change something in one big unit, say a state
or a city, build a weighted virtual control from the other units that
didn't change, so the pre-treatment trend matches yours.
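As a rough sketch of the first of these methods (my own illustration, not code from the talk), here is a minimal difference-in-differences estimate using statsmodels, assuming a hypothetical panel DataFrame with made-up columns `unit_id`, `treated`, `post`, and `metric`.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel data: one row per unit and period, with
#   unit_id - the unit (e.g. a region or store)
#   treated - 1 if the unit is in the treated group, else 0
#   post    - 1 for periods after the change, else 0
#   metric  - the outcome we care about
df = pd.read_csv("panel.csv")  # placeholder path

# Difference-in-differences: the coefficient on treated:post is the estimated
# effect, valid under the parallel-trends assumption. Standard errors are
# clustered by unit because we observe each unit repeatedly.
model = smf.ols("metric ~ treated + post + treated:post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit_id"]}
)
print(model.summary())
```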
Expert judgment matters: choose the design based on the constraints,
technical capabilities, and the nature of the intervention,
and know when to deviate from pure randomization.
Now let's look at some basic concepts of experimentation.
I wanna take a different direction in talking about basic statistical concepts.
There are a lot of excellent resources online if you wanna
learn about the topics here.
But today I wanna spend some time tying these concepts to business:
why they matter for the business and how you can think along those lines.
First, hypothesis testing.
This drives clarity.
Knowing how to form a good hypothesis teaches structured thinking.
It's not just about "is it better," but what outcome and why.
This mindset helps align experiments with business strategy and ensures
the test has clear success criteria before launch.
Second, confidence intervals and uncertainty build trust.
Communicating uncertainty transparently builds stakeholder confidence.
Leaders don't expect perfection.
They expect honest ranges and risk awareness.
This skill separates a strategic thinker from a report generator.
Third, effect size and power.
Understanding these concepts helps you prioritize impactful changes,
not statistically trivial ones.
Fourth, multiple testing discipline.
Today we run hundreds or thousands of experiments every month.
With the number of experiments and variants we test, there comes a need
to control the false positive rate.
This protects the business from false wins and bad rollouts.
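As one common illustration of multiple testing discipline (my own sketch, with made-up p-values), the Benjamini-Hochberg procedure in statsmodels controls the false discovery rate across many simultaneous comparisons.

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from several variants tested at once.
p_values = [0.001, 0.012, 0.034, 0.21, 0.048]

# Benjamini-Hochberg keeps the false discovery rate at ~5% across all tests,
# instead of letting false positives accumulate as the number of tests grows.
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for p_raw, p_adj, significant in zip(p_values, p_adjusted, reject):
    print(f"raw p={p_raw:.3f}  adjusted p={p_adj:.3f}  significant={significant}")
```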
Last but not least are sequential and Bayesian methods.
In today's world, we see the need for faster experimentation.
To improve the velocity of experimentation and get continuous learnings,
we need modern techniques like sequential and Bayesian testing to support adaptive learning.
Now that we have looked at sequential and Bayesian methods, which
drive experimentation velocity, let's also look at other techniques.
In a traditional experiment setup, we have to wait for larger samples.
This delays the experiment, the insight, and thereby product development.
But if you look at the formula here for calculating the sample size,
it is directly proportional to the variance.
If you reduce the variance by half, you effectively halve the required sample size.
By implementing variance reduction techniques, you can
achieve statistical significance faster with fewer users.
This translates to quicker decision making, accelerated product iterations,
and more experiments run overall, driving rapid innovation and growth.
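To make the proportionality concrete, here is a small sketch (my own illustration, not from the slides) of the standard two-sample size approximation, roughly n per arm = 2(z_(1-alpha/2) + z_(1-beta))^2 * sigma^2 / delta^2; the numbers below are made up, and halving the variance halves the required sample.

```python
from scipy.stats import norm

def samples_per_arm(sigma_sq: float, delta: float,
                    alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate samples per arm for a two-sided, two-sample test on means."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma_sq / delta ** 2

# Hypothetical numbers: metric variance 4.0, minimum detectable effect 0.1.
print(samples_per_arm(sigma_sq=4.0, delta=0.1))  # baseline sample size
print(samples_per_arm(sigma_sq=2.0, delta=0.1))  # variance halved -> n halved
```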
Now that we have looked at the concept of variance reduction, let's look
at some variance reduction techniques.
Variance reduction techniques broadly fall into five different buckets.
The first one is regression adjustment with pre-period covariates.
This is the umbrella for CUPED and ANCOVA-style pre-post adjustment.
We pick one or two strong pre-treatment predictors and adjust the outcome;
this helps lower the variance.
We will touch on CUPED from this umbrella in detail in the next slide.
The second one is balanced assignment at launch.
This can be achieved by blocking, stratified randomization,
or pair matching for geos,
so treatment and control start equal on the stuff that matters.
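As a minimal sketch of stratified randomization (my own illustration, with made-up strata), we randomize separately within each stratum so both arms start out balanced on the stratification variable.

```python
import random
from collections import defaultdict

def stratified_assignment(users, stratum_of, seed=42):
    """Randomize to control/treatment within each stratum so both groups
    are balanced on the stratification variable."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for user in users:
        strata[stratum_of(user)].append(user)

    assignment = {}
    for _, members in strata.items():
        rng.shuffle(members)
        half = len(members) // 2
        for user in members[:half]:
            assignment[user] = "treatment"
        for user in members[half:]:
            assignment[user] = "control"
    return assignment

# Hypothetical usage: stratify users by a made-up pre-period activity tier.
def activity_tier(user_id: str) -> str:
    return "high" if int(user_id.split("_")[1]) % 3 == 0 else "low"

users = [f"user_{i}" for i in range(1000)]
groups = stratified_assignment(users, activity_tier)
```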
And the third one is metric engineering.
Noisy numerators and denominators and heavy tails blow up the variance.
So instead, aggregate at the per-user or cluster level, winsorize the
extreme outliers, and log-transform the right-skewed metrics.
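Here is a quick sketch of that metric engineering (my own illustration on synthetic data): capping outliers at a high percentile and log-transforming a right-skewed metric both shrink its variance.

```python
import numpy as np

# Hypothetical per-user revenue metric with a heavy right tail.
rng = np.random.default_rng(7)
revenue = rng.lognormal(mean=2.0, sigma=1.5, size=10_000)

# Winsorize: cap extreme outliers at the 99th percentile so a few whales
# don't dominate the variance.
cap = np.percentile(revenue, 99)
winsorized = np.minimum(revenue, cap)

# Log-transform: compress the right skew; log1p handles zeros safely.
log_revenue = np.log1p(winsorized)

print(f"raw variance:        {revenue.var():.1f}")
print(f"winsorized variance: {winsorized.var():.1f}")
print(f"log-metric variance: {log_revenue.var():.3f}")
```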
The fourth is cluster-aware estimation, which is mainly used for geo and
switchback testing.
Analyze at the cluster or block level and use cluster-robust standard errors.
The last one is more of an operational thing: make sure to count only
truly eligible and actually exposed users, and freeze the definition of
the events before the experiment starts.
These are a few variance reduction techniques.
Now let's look at CUPED in detail.
CUPED is one of the most widely used variance reduction techniques.
In CUPED, we reduce the variance by using pre-experiment data that can help
explain the variance post experiment.
Let's look at an example.
Say we wanna run a test to see if people run slower with weights attached to them.
The experiment mile time is the metric that we get when we run the experiment,
and the corresponding row will tell us if the weights were added or not.
By looking at the data on the right side, you can see that the experiment
mile time is already influenced by how fast a runner each person is.
You can adjust the experiment mile time by using the average base mile time,
and the average base mile time from the pre-experiment period
can explain some of the variance.
If you use the change column, which is the difference between the experiment
mile time and the base mile time, that can help you
estimate the impact of adding weights.
So what we are doing here is using pre-experiment data to reduce
the variance that was explainable.
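As a rough sketch of the CUPED adjustment itself (my own illustration on synthetic data, not the slide's numbers), we subtract the part of the in-experiment metric explained by the pre-experiment covariate and then compare the adjusted means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical data: each runner's pre-experiment (baseline) mile time and
# their mile time measured during the experiment.
base_mile_time = rng.normal(8.0, 1.5, n)            # pre-period covariate X
treated = rng.integers(0, 2, n)                     # 1 = weights attached
true_effect = 0.5                                   # made-up effect in minutes
experiment_mile_time = (base_mile_time + true_effect * treated
                        + rng.normal(0, 0.3, n))

# CUPED: theta is the regression slope of Y on X; subtracting theta * (X - mean(X))
# removes the variance explained by the pre-period covariate while leaving the
# treatment effect untouched.
theta = (np.cov(experiment_mile_time, base_mile_time)[0, 1]
         / np.var(base_mile_time, ddof=1))
y_cuped = experiment_mile_time - theta * (base_mile_time - base_mile_time.mean())

print("variance, raw metric:  ", experiment_mile_time.var())
print("variance, CUPED metric:", y_cuped.var())
print("estimated effect:      ",
      y_cuped[treated == 1].mean() - y_cuped[treated == 0].mean())
```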
Okay.
Some of the best practices for CUPED: use variables that have a high
correlation with the variable of interest.
When applying CUPED, there is no need to stick with just one variable
from the pre-experiment period; we can use multiple variables if we
believe that could reduce the variance effectively.
Use a baseline that reflects normal behavior, and make sure to use only
data from prior to the experiment that is not impacted by the experiment.
You can handle missing values by imputing them.
As a callout for people who are interested, there is an advancement of the
CUPED methodology called CUPAC, where in place of a regression model with
single or multiple covariates from the pre-experiment period, CUPAC uses
machine learning models to predict the baseline metric by leveraging
multiple variables from the pre-experiment period.
For those interested, I've attached a link to this
at the end of the presentation.
Please feel free to take a look.
Now let's look at common pitfalls.
If something seems too good to be true, it probably is.
Whenever you find results that are extreme, it's always good to
check against common pitfalls.
These can be broadly divided into four sections.
The first is technical implementation issues, which cover sample ratio
mismatch, logging gaps, and unit-of-randomization mismatch.
When the unit of randomization is different from the unit of analysis,
it inflates the variance.
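As one concrete illustration (my own, with made-up counts), a quick sample ratio mismatch check is a chi-square goodness-of-fit test against the intended split.

```python
from scipy.stats import chisquare

# Hypothetical observed assignment counts for an intended 50/50 experiment.
observed = [100_480, 99_120]
total = sum(observed)
expected = [total * 0.5, total * 0.5]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
# A very small p-value (e.g. < 0.001) signals a sample ratio mismatch:
# something in assignment or logging is broken, so don't trust the results.
print(f"chi-square={stat:.1f}, p={p_value:.2e}")
```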
The second bucket is experiment design flaws.
These are mainly caused by interference and contamination, where units in
the control and treatment groups have an impact on one another.
The third is statistical analysis issues: if you need to peek, adjust
the false positive rate so that there is no room for p-hacking.
The fourth is contextual challenges, where there are additional issues
like metric drift and seasonality.
Now that we've covered some basic concepts of A/B testing, let's look at how
to use them to accelerate your career.
Understanding and practicing these concepts will enhance
your analytical maturity.
You shift your mindset from reactive to proactive experimentation strategy.
This will help you challenge opinions of leaders with evidence
and guide strategic decisions.
You will become the person bridging the gap between cross-functional teams,
translating stats into business impact, and this will differentiate you in
the market and make you invaluable.
To do this repeatedly, you need platform support.
Okay, let's walk through an end-to-end cloud native experimentation architecture.
A successful A/B testing strategy relies on a robust, scalable cloud native
architecture that supports the entire experimentation lifecycle, from initial
idea to validation and iteration, enabling a continuous learning path.
A general cloud native architecture for an end-to-end experimentation flow
can be categorized into a few layers.
First, we could have an infrastructure layer, which acts as the bedrock
providing compute and building services.
The second layer we could categorize as the data layer, where the typical
processing, along with checks and bounds, is set in place.
And finally, on top, we would have the experimentation platform, which
enables us to control the test design, routing, and feature flagging.
This layer can run hypothesis tests to enable us to make informed decisions
and iterate.
Now let's jump into tips on building an experimentation portfolio.
There are three key steps in building an experimentation portfolio.
First, work on and document realistic scenarios: document the process showing
problem identification, hypothesis development, experimental design choice,
and statistical methodologies, with a clear business rationale.
Second, contribute to open source experimentation frameworks and statistical
libraries such as statsmodels and PyMC.
Third, show business impact and recommendations about next steps to complete
the end-to-end flow.
Develop a mindset to iterate on ideas instead of running one-and-done experiments.
I also want to share some success strategies for interviews.
Interview success in the experimentation field requires not only technical
expertise, but also mastery of the concepts and the ability to explain them
in simple terms to stakeholders.
Prepare specific examples where your experimentation work drove
measurable business outcomes.
End-to-end experimentation knowledge, how experimentation fits into the
larger ecosystem, and how to run experiments at scale will definitely add to this.
A combination of these three will help you be successful in interviews.
Now that we have spoken about interview success strategies, let's look at
what the career trajectory of an IC in data science specializing in
experimentation looks like.
An entry-level candidate is expected to have rock-solid A/B testing
fundamentals: clean hypotheses, reproducible analysis, and clear writing.
A mid-level candidate is expected to lead multiple tests, mentor analysts,
have knowledge of sequential and Bayesian testing, and standardize guardrails.
A senior would be setting experimentation strategy, partnering with
engineering to drive the experimentation platform, and driving cross-org
decisions with quantified impact.
A principal or staff level IC would shape the system through platform
roadmaps, contributing to open source and conference talks, and teaching
the organization how to learn faster.
To summarize, there are three key takeaways: develop the technical expertise,
pair it with strong business acumen, and master the end-to-end
experimentation ecosystem.
Together these form the foundation for a successful career.
I will end my talk with the action plan.
Start today by building something small.
Engage with the community, share your learnings, and learn from the community.
Stay current with emerging trends in experimentation.
These three will differentiate you from the rest.
Okay, that's all I have for today.
Thanks for joining me.
Please reach out to me on LinkedIn if you have any questions, or if you
simply wanna chat about experimentation or get advice on your career;
I'll be happy to connect there.
All the best to everyone for all your future endeavors.
Thank you.