Transcript
Hi everyone.
I'm bna.
Thanks for joining me for the session.
Today I'm going to talk about how to accelerate your
career via experimentation.
Think of the session as a field guide.
We'll hit experiment design patterns that work in the wild, the platform
pieces that make experimentation fast and trustworthy, and the career moves
that turn your work into impact.
My goal is that you leave with a playbook you can start using this week.
Before we jump into the session, let me take a minute to talk about myself and why this topic matters to me.
I've spent over a decade helping teams design and scale experimentation using
data to build products and solve business problems, not just report on them.
My core craft is experimentation and causal inference: choosing the right design for the constraints, measuring impact credibly, and translating results into decisions. On the systems side, I design and operationalize experimentation platforms so the tests are reproducible, observable, and cheap to run.
And strategically, I'm obsessed with connecting insights to outcomes, turning analysis into decisions leaders can trust, and building a culture where data-informed is the default.
That combination of methodology, platform, and business impact is what I'll share with you today.
Let's quickly look at the agenda.
For today, we'll start with why A/B testing matters for your career, where experiments create value, and how to choose the right experiment design. Then we'll dive into some basic statistical concepts and some of the techniques you can use to speed up your experimentation, and pivot to how you can build an experimentation portfolio and some tips for interview success. We will end with an action plan.
Now let's start the session by talking about why expertise in experimentation matters.
This is a highly sought-after skill: a combination of data science, experiment design, and business strategy applied in a measurable way.
Companies across all industries are desperately seeking professionals who can design, implement, and interpret experiments that drive real business outcomes.
And additionally, with advancements in large language models, LLMs are being used in every part of life, and it becomes crucial to validate the results through experimentation. Experimentation helps evaluate prompt effectiveness, measure changes in response quality, detect biases, and understand overall model behavior at scale. Professionals who have these skills are indispensable.
And the value of understanding the core principles of A/B testing benefits everyone, not just data analysts.
Engineers can build smarter, experiment-ready systems.
Product managers can make evidence-based feature and roadmap decisions.
It also has strategic impact through cross-functional collaboration: experimentation bridges data, engineering, design, and product teams, helping you communicate insights effectively and influence decision making across the organization.
This skill makes you indispensable as the translator between technical complexity and business value.
Okay.
Now let's see where experimentation creates value. The answer is that every department in a modern organization can leverage experimentation to make data-driven decisions.
You can see some of the examples here on the slide.
These are only a few examples, but the key here is understanding which experimentation approach works best for each use case and business context.
And now, how do we choose the right experiment design? Let's look at the experiment designs available. The first one is A/B testing, the classic: it compares variants simultaneously, and it's great for feature releases and UI changes.
Multivariate and factorial designs are used to test multiple elements and learn interaction effects.
Holdout studies are used to measure long-term and network effects that accrue over weeks, months, and sometimes a year.
Business constraints should pick the design, not the other way around. When traffic patterns and the platform complicate things, switch designs: switchback experiments rotate treatment over time so that the same unit cycles through both variants. Ideal for marketplaces and two-sided platforms.
Geo experiments randomize by region when user-level randomization risks spillovers. Useful for location-based features and marketing.
If true randomization is not possible, use credible alternatives such as difference-in-differences, interrupted time series, and synthetic controls. Difference-in-differences compares before-and-after changes in the treated group against a similar control group that didn't get the change. Interrupted time series looks for a structural break right when the intervention happened; if the level or the slope jumps at that point and nowhere else, that's the signal. Synthetic control: when you change something in one big unit, say a state or a city, build a weighted virtual control from the other units that didn't change, so the pre-period matches yours.
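For the quasi-experimental case, here's a hedged difference-in-differences sketch in Python with statsmodels, on synthetic, illustrative data; the coefficient on the treated-by-post interaction is the DiD estimate.

```python
# A minimal difference-in-differences sketch: regress the outcome on the
# group flag, the period flag, and their interaction. Data is simulated
# with a true effect of 0.5 for treated units in the post period.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2_000
treated = rng.integers(0, 2, n)   # 1 = in the treated group
post = rng.integers(0, 2, n)      # 1 = after the change
y = 1.0 * treated + 0.8 * post + 0.5 * treated * post + rng.normal(0, 1, n)
df = pd.DataFrame({"y": y, "treated": treated, "post": post})

model = smf.ols("y ~ treated + post + treated:post", data=df).fit()
print(model.params["treated:post"])  # estimated DiD effect, ~0.5
```

The interaction term is what isolates the treatment effect: group and period main effects soak up the baseline difference and the common trend.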
Expert judgment matters.
Choose designs based on constraints, technical capability, and the intervention's nature, and know when to deviate from pure randomization.
Okay, now let's jump in and take a look at the statistical concepts.
I wanna take a different direction here.
When it comes to basic statistical concepts, there are a lot of excellent resources online that can teach you exactly what each concept is. But today I wanna spend some time tying these concepts to the business: why they matter for the business, and how you can think along those lines.
First, hypothesis testing. This drives clarity. Knowing how to form a good hypothesis teaches structured thinking. It's not about "is it better," but "what outcome, and why." This mindset helps align experiments with business strategy and ensures a test has clear success criteria before launch.
The second one is confidence and uncertainty.
This builds trust.
Communicating uncertainty transparently builds stakeholder confidence.
Leaders don't expect perfection.
They expect honest ranges and risk.
This skill separates strategic thinkers from report generators. The third one is effect size and power.
This helps you prioritize what matters.
Understanding these concepts helps you focus on impactful changes, not statistically trivial ones.
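To make that concrete, here's a minimal power calculation sketch with statsmodels (the effect sizes are illustrative): the smaller the effect you care about detecting, the more users you need.

```python
# Required sample size per arm for a two-sided, two-sample t-test at
# alpha = 0.05 and 80% power, across a few standardized effect sizes.
from statsmodels.stats.power import tt_ind_solve_power

for effect_size in (0.05, 0.1, 0.2):  # Cohen's d
    n = tt_ind_solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
    print(f"d={effect_size}: ~{n:.0f} users per arm")
```

Halving the effect size roughly quadruples the sample size, which is why prioritizing impactful changes matters so much.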
The fourth one is multiple testing discipline. Today we run hundreds or thousands of experiments every month, and with the number of experiments and variants we test comes a need to control the false positive rate. This protects the business from false wins and bad rollouts.
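As one common way to impose that discipline, here's a minimal sketch of the Benjamini-Hochberg false discovery rate correction using statsmodels; the p-values are made up for illustration.

```python
# Adjust a batch of experiment p-values so the expected share of false
# discoveries among the declared wins stays at 5%.
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.001, 0.008, 0.039, 0.041, 0.27, 0.60])
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
print(reject)  # which results survive the correction
print(p_adj)   # BH-adjusted p-values
```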
Last but not least, the fifth one is sequential testing. In today's world, we see a need for faster experimentation. To improve the velocity of experimentation and get continuous learnings, we need modern sequential methods to support adaptive learning.
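To make the peeking problem and the sequential fix concrete, here's a minimal Monte Carlo sketch (all parameters are illustrative): checking the usual 1.96 threshold at every interim look inflates the false positive rate, while a single calibrated, Pocock-style boundary restores it.

```python
# Simulate the null (no true difference) and compute the z statistic for
# the treatment-control mean difference at each of K interim looks.
import numpy as np

rng = np.random.default_rng(0)
K = 5              # number of interim looks
n_per_look = 100   # new users per arm between looks
alpha = 0.05
sims = 10_000

t_sums = rng.standard_normal((sims, K, n_per_look)).sum(axis=2)
c_sums = rng.standard_normal((sims, K, n_per_look)).sum(axis=2)
diff = np.cumsum(t_sums - c_sums, axis=1)    # running total difference
n = n_per_look * np.arange(1, K + 1)         # users per arm so far
z = np.abs(diff / np.sqrt(2 * n))            # |z| at each look

# Naive peeking: declare a win if |z| > 1.96 at ANY look
naive_fpr = (z > 1.96).any(axis=1).mean()
print(f"naive peeking false positive rate: {naive_fpr:.3f}")  # >> 0.05

# Sequential fix: one boundary, calibrated so the overall error stays at 5%
boundary = np.quantile(z.max(axis=1), 1 - alpha)
print(f"calibrated boundary: {boundary:.2f}")  # wider than 1.96
```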
Now that we have looked at sequential testing, which drives experimentation velocity, let's also look at other techniques. In a traditional experiment setup, we have to wait for larger samples. This delays experiment time, insight, and thereby product development.
If you look at the sample size formula (a standard form is written out below), the sample size is directly proportional to the variance. Thereby, if you reduce the variance by half, you'll reduce the sample size by half. By implementing variance reduction techniques, you can achieve statistical significance faster with fewer users. This translates to quicker decision making, accelerated product iteration, and more experiments run overall, driving rapid innovation and growth.
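The slide's formula isn't reproduced in the transcript; a standard per-arm sample size formula for a two-sided, two-sample test, with significance level $\alpha$, power $1-\beta$, minimum detectable effect $\delta$, and outcome variance $\sigma^2$, is:

$$ n \;=\; \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}\,\sigma^{2}}{\delta^{2}} $$

Since $n$ scales with $\sigma^2$, halving the variance halves the required sample size.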
Now that we have looked at the concept of variance reduction, let's look at some variance reduction techniques.
The first one is regression adjustment with pre-period covariates. This is an umbrella for CUPED and CUPAC: we pick one or two strong pre-treatment predictors and adjust the outcome, which helps lower the variance. We will look at CUPED in detail in the next slide.
Now let's go to the second one, which is balanced assignment at launch, that is, blocking or stratification while you're randomizing, so treatment and control start equal on the stuff that matters.
The third one is metric engineering. Noisy numerators and denominators and heavy tails blow up variance: aggregate per user or at the cluster level, winsorize extreme outliers, and log-transform right-skewed metrics. The fourth is cluster-aware estimation for geo or switchback tests: analyze at the cluster or block level, or use cluster-robust standard errors. The fifth is exposure and eligibility hygiene: count only truly eligible and actually exposed users, and freeze event definitions prior to the experiment.
These are a few variance reduction techniques.
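To make the metric engineering item concrete, here's a quick sketch on synthetic, illustrative data: winsorizing heavy tails and log-transforming a right-skewed metric, using scipy and numpy.

```python
# Two simple tail treatments for a right-skewed metric like revenue:
# cap the top 1% (winsorize) or compress the whole tail (log transform).
import numpy as np
from scipy.stats.mstats import winsorize

revenue = np.random.default_rng(2).lognormal(mean=3.0, sigma=1.5, size=10_000)

capped = np.asarray(winsorize(revenue, limits=(0, 0.01)))  # cap top 1%
logged = np.log1p(revenue)                                 # compress the tail

print(revenue.std(), capped.std(), logged.std())  # variance drops sharply
```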
Now let's look at CUPED in detail. CUPED is one of the most widely used variance reduction techniques. In CUPED, we reduce the variance by using pre-experiment data that can help explain variance post-experiment.
Let's look at an example. Say we want to run a test to see if people run slower with weights attached to them. The experiment mile time is the metric we get when we run the experiment, and the corresponding row tells us whether the weights were added. By looking at the data on the right, you can already see that the experiment mile time is influenced by how fast the runners already were, which is the baseline mile time. In this case, we can leverage the average baseline mile time to help explain the difference in the variance, and use the change column, which is the difference between the baseline mile time and the experiment mile time, to understand the impact of adding weights.
What we are doing here is using pre-experiment data to remove the variance that was explainable.
Some best practices: use a variable that has high correlation with the variable of interest. When applying CUPED, there is no need to stick with just one variable from the pre-experiment period; we can use multiple variables if we believe they could reduce the variance effectively. Use baselines that reflect normal behavior. Make sure the data is only from the period prior to the experiment, not from the period where the experiment is running. You can handle missing data by imputing values.
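Here's a minimal CUPED sketch in Python based on the running example above; the column names and synthetic data are illustrative, not a fixed API.

```python
# CUPED: subtract from the metric the part explained by a pre-experiment
# covariate. The treatment effect is preserved because the covariate was
# measured before assignment.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1_000
baseline = rng.normal(8.0, 1.0, n)      # pre-experiment mile times
treated = rng.random(n) < 0.5           # weights attached or not
mile_time = baseline + 0.3 * treated + rng.normal(0, 0.3, n)
df = pd.DataFrame({"baseline_mile_time": baseline,
                   "mile_time": mile_time,
                   "treated": treated})

def cuped_adjust(df, metric, covariate):
    # theta is the OLS slope of the metric on the pre-experiment covariate
    cov = df[[metric, covariate]].cov()
    theta = cov.loc[metric, covariate] / cov.loc[covariate, covariate]
    return df[metric] - theta * (df[covariate] - df[covariate].mean())

df["mile_time_cuped"] = cuped_adjust(df, "mile_time", "baseline_mile_time")
lift = (df.loc[df.treated, "mile_time_cuped"].mean()
        - df.loc[~df.treated, "mile_time_cuped"].mean())
print(f"estimated slowdown from weights: {lift:.3f} (true effect 0.3)")
print(df["mile_time"].var(), df["mile_time_cuped"].var())  # variance drops
```

The adjusted metric has the same expected treatment effect but much lower variance, which is exactly what shrinks the required sample size.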
A callout for people who are interested: there is an advancement to this methodology called CUPAC, where, in place of using a regression model with single or multiple covariates from the pre-experiment period, CUPAC uses machine learning models to predict the baseline metric by leveraging multiple variables from the pre-experiment period.
For those interested, I have attached a link to this at
the end of the presentation.
Please feel free to take a look.
Now, in this new era, it's also essential to master experimentation on LLMs and prompt engineering. So let's take a look at a few use cases where we can leverage experimentation to optimize large language models and the prompts we are giving them.
The first one is prompt optimization: systematically comparing templates, few-shot examples, and system instructions to achieve superior outcome quality and relevance.
The second use case for experimentation is model benchmarking: evaluating performance across different LLM architectures or fine-tuned model versions.
The third one is RAG pipeline enhancement: optimizing RAG by testing retrieval strategies, chunking methods, and ranking algorithms.
The fourth one is personalization and contextualization: assessing how dynamic context windows and user-specific data refine model outputs and boost user engagement.
The last one is proactively identifying and mitigating harmful outputs,
biases and hallucinations through comprehensive and systematic testing.
So there are a lot of use cases, even within large language
models and prompt optimization.
Thinking about which experimentation methodologies you can apply: you can leverage normal A/B testing, multivariate testing, or contextual bandits, depending on the requirement and the evaluation you want to perform. For offline evaluation, you can use automated frameworks, synthetic data can be leveraged, and we can use the LLM-as-a-judge method for faster iterations.
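As one hedged illustration of offline prompt evaluation, here's a sketch of a paired LLM-as-a-judge comparison of two prompt templates; judge() is a hypothetical stub standing in for whatever judge-model call your stack provides, and the win rate is tested against 50% with a proportion z-test.

```python
# Compare two prompt templates on the same inputs: have a judge model pick
# the better response per pair, then test whether A's win rate differs
# from the 50% you'd expect if the prompts were equally good.
from statsmodels.stats.proportion import proportions_ztest

def judge(response_a: str, response_b: str) -> str:
    """Hypothetical stub: ask a judge model which response is better ('A' or 'B')."""
    raise NotImplementedError("plug in your judge-model call here")

def compare_prompts(pairs):
    # pairs: list of (response_from_prompt_A, response_from_prompt_B)
    wins_a = sum(judge(a, b) == "A" for a, b in pairs)
    stat, p = proportions_ztest(count=wins_a, nobs=len(pairs), value=0.5)
    return wins_a / len(pairs), p  # win rate for A and its p-value
```

Pairing responses on identical inputs is itself a variance reduction move: each comparison controls for the difficulty of the input.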
Okay.
Now that we have an idea of what experimentation is, the basics of experimentation, and how to improve its speed, let's look at some common pitfalls.
If it seems too good to be true, it probably is.
Whenever you find results that are extreme, it's always good to
check against common pitfalls.
Now, we can think of common pitfalls in four broad categories.
The first one is technical implementation issues: sample ratio mismatch and logging gaps (a minimal SRM check is sketched after this list). Also, when the unit of randomization is different from the unit of analysis, it inflates variance. The second one is experiment design flaws: interference and contamination, where units in control and treatment have an impact on one another. The third one is statistical analysis errors: if you have a need to peek, adjust the false positive rate so that there is no room for p-hacking. The fourth one is contextual challenges: things like metric drift, seasonality, et cetera.
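Here's the minimal sample ratio mismatch check mentioned above: a chi-square goodness-of-fit test of observed assignment counts against the intended split (the counts and alert threshold are illustrative).

```python
# SRM check: under a healthy 50/50 split, the observed counts should be
# consistent with the expected counts; a tiny p-value flags a broken funnel.
from scipy.stats import chisquare

observed = [50_912, 49_088]            # users seen in control, treatment
expected = [sum(observed) / 2] * 2     # intended 50/50 split
stat, p = chisquare(observed, f_exp=expected)
if p < 0.001:  # a deliberately conservative alert threshold
    print(f"possible SRM: p={p:.2e}; investigate before trusting results")
```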
Now that we have covered common pitfalls in A/B testing, let's look at how to use all this to accelerate your career.
Understanding and practicing these concepts will enhance your analytical maturity and shift your mindset from reactive to proactive experimentation strategy. This will also help you challenge the opinions of leaders with evidence and guide strategic decisions. You will become the person bridging the gap between cross-functional teams, translating stats into business impact.
All this will differentiate you in the market and make you invaluable.
Now let's look at how to build an experimentation portfolio. If you're just starting out, there are three key steps to building one.
First, work on and document realistic scenarios. Document the process, showing problem identification, hypothesis development, experimental design choices, and statistical methods, along with the clear business rationale. Second, start by contributing to open source: experimentation frameworks and statistical libraries like statsmodels, PyMC, et cetera. Third, show business impact.
Think about end-to-end demonstration. Develop a mindset to iterate on an idea instead of one-and-done experiments.
I also wanna share some success strategies for interviews.
Interview success in the experimentation field requires not only technical expertise but also mastery of concepts and being able to explain them in simple terms to stakeholders. Prepare specific examples where your experimentation work drove measurable business outcomes. Show end-to-end experimentation knowledge: how experimentation fits into the larger ecosystem and how to run experiments at scale. A combination of these three will help you be successful in interviews.
For anybody who is starting out, you must be curious how the trajectory of an IC looks in the data science field specializing in experimentation. So here is an outlook, a broad view of how the trajectory would look.
An entry-level candidate would be expected to have rock-solid A/B testing fundamentals, clean hypotheses, reproducible analysis, and clear writing.
At the mid-level, the person in this role will be leading multiple experiments, mentoring analysts, diving into sequential methods, and standardizing guardrails.
A senior would set experimentation strategy, partner with engineering to drive the experimentation platform, and drive cross-functional, cross-org decisions with quantified impact. A principal or staff would shape the system: develop platform roadmaps, contribute to open source and conference talks, and teach the organization how to learn faster.
So this is a general idea of how the trajectory would look.
Now, to summarize everything we have looked at, there are three key takeaways. The first one is to develop deep technical expertise. The second one is to pair it with strong business acumen. Last but not least, master the end-to-end experimentation ecosystem. Together, these three form the foundation of a successful career.
I will end my talk with the action plan.
I would highly encourage people exploring data science careers to specialize in experimentation and to start today: engage with the community, share your learnings, and learn from the community.
Stay current with emerging trends in experimentation.
The experimentation field evolves rapidly with new statistical methods.
Staying current would differentiate you from others.
And with that I'll end the presentation.
Thanks for being here.
Thanks for taking the time. Please reach out to me on LinkedIn if you have any questions, or simply wanna chat about experimentation, careers, or anything else. I'll be happy to connect. Thanks, everyone.