Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
Thank you so much for being here today at Ative Conference 2025.
My name is Venus Kerr and I'm deeply honored to speak about a subject that is very close to my heart: how to build ethical AI in healthcare, and how we can do that within a Kubernetes-native architecture.
In recent years, I have been fortunate to work at the intersection of machine learning, cloud infrastructure, and governance.
Through this journey, I have seen how easy it is to scale models, deploy them, and
push them to production, and yet we bake in bias, opacity, and compliance risk.
In healthcare, this risk becomes much more serious.
Decisions made by AI can directly affect diagnosis, trust, and lives.
So today I want to share strategies, patterns, and cautionary lessons for making AI in healthcare not just powerful, but fair, transparent, and trustworthy, all while leveraging Kubernetes as the infrastructure.
Let's begin.
Let's start with the backdrop.
Healthcare is being transformed by AI, from imaging diagnosis and risk scoring to personalized treatment planning. Hospitals and medical institutions are increasingly adopting AI-powered tools that analyze radiology images, support patient care, or optimize workflows.
Underneath many of these tools lies a Kubernetes-native stack. Why? Because Kubernetes offers scalability, orchestration, and resource isolation, things that traditional monolithic systems struggle with. You can distribute training workloads, autoscale serving, and manage dependencies across microservices, all elegantly in one cluster. But the same attributes that make Kubernetes powerful, its distributed, dynamic nature, also introduce complexity. Training pipelines, data ingestion, model updates, inference services, all spread across pods and nodes. Observability, auditability, and governance become harder, not easier.
Which brings us to our central question: as we adopt Kubernetes stacks for AI, are we preparing them to uphold ethics, fairness, privacy, and trust?
Let's explore.
We have to acknowledge the risk upfront.
AI systems in healthcare have shown troubling biases.
For instance, models trained on dermatology images have misclassified darker skin tones more often, leading to misdiagnosis, and predictive models have undervalued risk in underserved or minority patient populations.
These are not hypothetical.
These are documented real world failures.
Part of the problem is that many AI systems operate as black boxes.
You feed input in, you get output out, but you cannot easily inspect why the
model arrived at a particular decision.
That opacity is deeply problematic when you are dealing with human lives and decisions. Clinicians, patients, and regulators demand explainability.
Why did you predict that?
And on what basis?
So we have two core issues, actually. One is bias, unfair outcomes across groups; the other is opacity, a lack of transparency. In healthcare, we can't ignore those.
The question is, can Kubernetes help us address them?
The answer is yes.
If we treat ethical behavior as part of the infrastructure itself, we can address them.
Let's see. Kubernetes, at first glance, is just a container orchestration system, but I see it as ethical infrastructure, a canvas onto which we can embed governance, audits, and controls.
Consider the features Kubernetes gives us: controlled rollouts, canary deployments, admission controllers, automatic rollbacks, namespace isolation, and rich observability with logging, tracing, and metrics.
Each of these features becomes an opportunity to insert checks,
not just for performance, but for fairness, privacy, and transparency.
So what does that mean?
For example, before a model is promoted to serving, an admission webhook could enforce that a fairness audit has passed. A rollback could be triggered if drift is detected. Logs and audit trails become an immutable record of what was deployed, when, and with what credentials.
Thus Kubernetes gives us guardrails, checkpoints, and hooks by design. But hooks alone do not guarantee ethics.
You still need to build the logic.
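To make that concrete, here is a minimal sketch of what such logic could look like as a validating admission webhook in Python with Flask. The annotation key is a hypothetical convention, not a standard, and a real setup would serve this over TLS and register it with the API server via a ValidatingWebhookConfiguration.

```python
# Sketch of a validating admission webhook: deny model Deployments
# that are not annotated with a passing fairness audit.
# The annotation key is an illustrative assumption, not a standard.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/validate", methods=["POST"])
def validate():
    review = request.get_json()
    req = review["request"]
    annotations = req["object"]["metadata"].get("annotations", {})
    allowed = annotations.get("ethics.example.com/fairness-audit") == "passed"
    response = {"uid": req["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {"message": "Fairness audit missing or failed; deployment refused."}
    return jsonify({
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    })

if __name__ == "__main__":
    # In production this must be served over TLS and registered with the API server.
    app.run(host="0.0.0.0", port=8443)
```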
So let's break this down into three pillars that I believe are essential: fairness, explainability, and privacy. After that, we'll see how they can be composed and scaled.
Let's first look into fairness.
When we talk about fair models, we mean models that don't systematically disadvantage particular demographic groups. But fairness is not a one-size-fits-all concept.
There are multiple mathematical definitions, demographic parity, equalized odds, predictive parity, and trade-offs between them.
In practice, what I do is bake fairness audits into my deployment pipeline.
After each training job completes, a fairness audit task runs, for example using Fairlearn or IBM AIF360.
The task calculates metrics across subgroups.
Are false negative rates significantly higher for one group? Does the model's error vary across populations?
If any metric violates a threshold, the pipeline halts, and the model is flagged for review or rollback. In Kubernetes, this audit can run as a Job or as part of an Argo workflow. Because it's versioned, automated, and repeatable, we avoid human error or oversight.
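As a rough sketch of such an audit step, here is what it could look like in Python with Fairlearn. The artifact paths, the sensitive-feature column, and the ten percent threshold are illustrative assumptions; your pipeline would plug in its own data and policy.

```python
# Minimal sketch of a fairness audit step (hypothetical paths and threshold).
# Exits non-zero so a Kubernetes Job / Argo step fails and halts the pipeline.
import sys
import joblib
import pandas as pd
from fairlearn.metrics import MetricFrame, false_negative_rate

# Hypothetical artifacts produced by the training step.
model = joblib.load("/artifacts/model.joblib")
data = pd.read_csv("/artifacts/validation.csv")

X = data.drop(columns=["label", "ethnicity"])
y_true = data["label"]
y_pred = model.predict(X)

# False negative rate per demographic subgroup.
audit = MetricFrame(
    metrics=false_negative_rate,
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=data["ethnicity"],
)
print("FNR by group:\n", audit.by_group)

# Halt the pipeline if the gap between subgroups exceeds a chosen threshold.
FNR_GAP_THRESHOLD = 0.10  # assumption: tune per use case and policy
if audit.difference() > FNR_GAP_THRESHOLD:
    print(f"Fairness audit FAILED: FNR gap {audit.difference():.3f}")
    sys.exit(1)
print("Fairness audit passed.")
```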
Let me share a story.
In one of our deployments, our fairness audit caught a subtle shift: over time, one subgroup's error rate had drifted upward. The system automatically rolled back to the last known good model, giving us time to retrain and rebalance.
Without automation, that drift might have gone unnoticed until the harm was done.
Fairness checks don't need to happen only at training time. They can happen continuously, especially if our system retrains or adapts. In production, monitoring fairness drift is just as crucial as monitoring performance metrics.
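One lightweight way to do that continuous monitoring is to export per-subgroup metrics that Prometheus can scrape and alert on. Here is a small sketch with the Prometheus Python client; the metric name, the subgroups, and the helper that computes error rates are assumptions for illustration.

```python
# Sketch: expose per-subgroup error rates as Prometheus metrics so
# alerting rules can fire on fairness drift (names are illustrative).
import time
import random
from prometheus_client import Gauge, start_http_server

SUBGROUP_ERROR_RATE = Gauge(
    "model_subgroup_error_rate",
    "Observed error rate of the serving model per demographic subgroup",
    ["model_version", "subgroup"],
)

def compute_subgroup_error_rates():
    # Placeholder: in reality this would aggregate recent labeled feedback.
    return {"group_a": random.uniform(0.05, 0.15),
            "group_b": random.uniform(0.05, 0.15)}

if __name__ == "__main__":
    start_http_server(9100)  # scraped by Prometheus
    while True:
        for subgroup, rate in compute_subgroup_error_rates().items():
            SUBGROUP_ERROR_RATE.labels(model_version="v1", subgroup=subgroup).set(rate)
        time.sleep(60)
```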
Alright, we have built fairness into training, but what about explainability?
Let's talk about explainability next.
Explainability asks: why did the model decide what it did? In healthcare, clinicians and patients deserve that transparency. There are two levels of explanation. Local explanations cover a single prediction, for example, why did the model assign this patient a high risk? Techniques like LIME or SHAP can compute per-feature contributions.
Global explanations are about understanding the model's behavior in aggregate: feature importance, sensitivity analysis, pattern discovery.
In a Kubernetes-native architecture, a robust pattern is to deploy an explainability sidecar or microservice alongside the model server. When an inference request arrives, the main service returns the prediction, and the sidecar computes and returns the explanation. Because these services scale independently, you can isolate resources as needed.
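Here is a minimal sketch of such an explanation sidecar in Python using Flask and SHAP. The model path, the background sample, and the port are hypothetical; a production service would add authentication, batching, and resource limits.

```python
# Sketch of an explanation sidecar: the model server returns predictions,
# this container returns per-feature SHAP contributions for the same input.
# Paths, port, and payload shape are illustrative assumptions.
import joblib
import numpy as np
import shap
from flask import Flask, jsonify, request

app = Flask(__name__)

model = joblib.load("/models/risk_model.joblib")
background = np.load("/models/background_sample.npy")  # small reference sample
explainer = shap.Explainer(model.predict, background)

@app.route("/explain", methods=["POST"])
def explain():
    features = np.array(request.get_json()["features"], dtype=float).reshape(1, -1)
    shap_values = explainer(features)
    return jsonify({
        "prediction": float(model.predict(features)[0]),
        "contributions": shap_values.values[0].tolist(),
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8081)
```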
You may worry about performance impact.
Yes, explainability does incur latency or compute cost, but you can mitigate that.
For example, only compute explanations for flagged requests, like edge cases or high-uncertainty predictions. You can cache explanations for repeated inputs. You can make explanations asynchronous: return predictions first and explanations later.
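A tiny sketch of that gating-and-caching idea is below. How you measure uncertainty is model specific; the threshold and the hashing scheme here are just illustrative assumptions.

```python
# Sketch: only compute (and cache) explanations for high-uncertainty requests.
# The uncertainty threshold and hashing scheme are illustrative assumptions.
import hashlib
import json

UNCERTAINTY_THRESHOLD = 0.2  # assumption: flag predictions near the decision boundary
_explanation_cache = {}

def maybe_explain(features: dict, probability: float, explain_fn):
    # Skip the expensive path for confident predictions.
    uncertainty = 1.0 - abs(probability - 0.5) * 2
    if uncertainty < UNCERTAINTY_THRESHOLD:
        return None
    # Cache by a stable hash of the input features.
    key = hashlib.sha256(json.dumps(features, sort_keys=True).encode()).hexdigest()
    if key not in _explanation_cache:
        _explanation_cache[key] = explain_fn(features)  # e.g. call the explanation sidecar
    return _explanation_cache[key]
```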
One caveat: sometimes explanations themselves can reveal sensitive input data, so you have to manage access and guard them. Treat explanation output with the same care as prediction output. Keep that in mind. By combining fairness and explainability, we build systems that are not just accurate, but comprehensible and justifiable as well.
Next, let's look into the third pillar: privacy.
In healthcare, patient data is extremely sensitive. We must design systems that minimize the risk of exposure by design.
One powerful approach is federated learning.
Instead of centralizing all patient data, we keep data in the local institutions, for example hospitals or clinics. Models are trained locally; only model updates or gradients are shared. The central orchestrator aggregates the updates and produces a global model. Raw data never leaves the premises.
To strengthen privacy further, we can incorporate differential privacy: before aggregating updates, add calibrated noise so that individual contributions cannot be reverse engineered. This gives us a mathematical guarantee that no individual patient's data can be deduced, even from the aggregate.
Other techniques, secure enclaves, multi-party computation, and homomorphic encryption, are promising, though trade-offs in performance and complexity remain. But combining federated learning and differential privacy gives a compelling baseline.
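Here is a rough NumPy sketch of that client-side step in the style of the Gaussian mechanism. The clipping norm and noise multiplier are illustrative; a real deployment should use a vetted differential privacy library, such as Opacus or TensorFlow Privacy, with proper privacy accounting.

```python
# Sketch: clip a local model update and add calibrated Gaussian noise
# before it leaves the hospital. Constants are illustrative assumptions;
# real systems should rely on audited DP libraries and accounting.
import numpy as np

CLIP_NORM = 1.0         # bound each update's L2 norm (its sensitivity)
NOISE_MULTIPLIER = 1.1  # noise scale relative to the clipping norm

def privatize_update(local_update: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # Clip so no single institution's update dominates the aggregate.
    norm = np.linalg.norm(local_update)
    clipped = local_update * min(1.0, CLIP_NORM / (norm + 1e-12))
    # Add Gaussian noise calibrated to the clipping norm.
    noise = rng.normal(0.0, NOISE_MULTIPLIER * CLIP_NORM, size=clipped.shape)
    return clipped + noise

# Example: the update is the difference between local and global weights.
rng = np.random.default_rng()
update = rng.standard_normal(1024) * 0.01
private_update = privatize_update(update, rng)
```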
In a Kubernetes-native setting, we can spin up local training pods per hospital cluster. These pods send encrypted, noise-added updates to the central aggregator pod. The aggregator logs all updates and enforces the DP algorithm. Crucially, every step is versioned, audited, and transparent. Because data never leaves its origin, jurisdictional, regulatory, and governance compliance challenges are minimized. And since the updates are aggregated, we still learn from a federation of data.
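On the aggregator side, the core logic is a weighted average of the privatized updates, roughly like this sketch; the sample-count weighting and the learning rate are assumptions for illustration.

```python
# Sketch of the central aggregator pod: average the privatized updates
# (weighted by local sample counts) and apply them to the global model.
import numpy as np

def aggregate(global_weights: np.ndarray,
              updates: list[np.ndarray],
              sample_counts: list[int],
              learning_rate: float = 1.0) -> np.ndarray:
    weights = np.array(sample_counts, dtype=float)
    weights /= weights.sum()
    averaged = sum(w * u for w, u in zip(weights, updates))
    # Every aggregation round would also be logged for the audit trail.
    return global_weights + learning_rate * averaged
```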
Let's look into scalability.
How can we scale it?
Because it's one thing to build a fair, private, explainable model in development,
but another to run it reliably in production at scale, across clusters.
First, don't treat the ethics modules as a monolith. Separate them: fairness auditor, explanation service, privacy aggregator. Each can autoscale independently in Kubernetes.
Second, cache and batch. If many requests are repeated or have similar inputs, cache explanations or reuse fairness audit results. Don't recompute from scratch each time.
Third, leverage asynchronous processing. Some checks can run in the background while you serve predictions quickly. If an edge case is flagged later, you can retract, rescore, or notify.
Monitor drift continuously, not just on performance metrics but on fairness metrics and privacy violations. Use sidecar collectors, metrics pipelines, and dashboards. If fairness or privacy drift occurs, trigger alerts or rollbacks. Take advantage of multi-tenant clusters with isolation. You might host models for multiple departments or institutions on a shared cluster, but with strong namespace isolation, policy enforcement, and audit segmentation.
Also, think about error handling strategies. If your fairness engine fails, fall back to a safe mode. If your explanation service lags, degrade gracefully. The system must be resilient and safe by design.
When done right, ethical behavior scales with the system rather than being a drag on it.
Let's look into regulatory and ethical alignment. Ethics does not live in a vacuum, especially in healthcare. There are laws, regulations, and standards to respect: HIPAA in the US, GDPR in Europe, FDA rules for medical devices, and emerging AI governance frameworks like the EU AI Act. Kubernetes helps here too.
Because your infrastructure is immutable, version controlled, and auditable, you can trace back every deployed model, every fairness audit, every explanation request, and every rollback. That creates an audit trail regulators will appreciate. Admission controllers like OPA Gatekeeper can enforce policy before deployment.
Check that the fairness audit passed, that privacy guarantees are met, or
that explanation mechanisms are active.
If not, deployment is refused. By integrating ethics into the deployment pipeline, you reduce friction, because compliance becomes part of engineering, not a separate afterthought.
When clinicians ask, can I trust this model?, you can answer: yes, you can, and here is why. Here is the audit trail, here is the explainability log, and here are the fairness metrics. That is powerful, because now you have insight.
Let's look into practical implementation strategies. Let me walk you through a phased roadmap you can use in your own organization. Let's break it into phases.
The first phase is assessment.
This phase is about understanding your current state.
Before adding any ethical AI layer, start by auditing existing AI/ML models. Check for demographic bias, missing explainability, or weak privacy controls. Use open source tools like Fairlearn, AIF360, or SHAP to generate baseline metrics.
Identify where these tools can fit into your Kubernetes ML stack, for example adding fairness checks into a Kubeflow pipeline or monitoring bias via Prometheus metrics.
For example: a hospital readmission prediction model was audited and found to underperform for older patients, and integration points were identified to add a fairness evaluation step in the Kubeflow training pipeline.
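As a sketch of that kind of baseline audit, here is what it could look like with AIF360 on hypothetical readmission data, using age sixty-five and over as the protected attribute. The column names and file paths are made up for illustration.

```python
# Sketch: baseline bias metrics for an existing readmission model with AIF360.
# Column names, the age-based protected attribute, and paths are illustrative.
import joblib
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import ClassificationMetric

df = pd.read_csv("readmission_holdout.csv")
df["age_over_65"] = (df["age"] >= 65).astype(int)

model = joblib.load("readmission_model.joblib")
features = df.drop(columns=["readmitted_30d"])

# Ground-truth dataset with the protected attribute and label.
dataset_true = BinaryLabelDataset(
    df=df[["age_over_65", "readmitted_30d"]],
    label_names=["readmitted_30d"],
    protected_attribute_names=["age_over_65"],
)

# Same structure, but with the model's predictions as labels.
dataset_pred = dataset_true.copy()
dataset_pred.labels = model.predict(features).reshape(-1, 1)

metric = ClassificationMetric(
    dataset_true,
    dataset_pred,
    unprivileged_groups=[{"age_over_65": 1}],
    privileged_groups=[{"age_over_65": 0}],
)
print("Equal opportunity difference:", metric.equal_opportunity_difference())
print("False negative rate (65+):", metric.false_negative_rate(privileged=False))
```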
The second phase is integration. Here you operationalize what you found during assessment.
Deploy fairness and explainability tools as sidecar services or Kubernetes Jobs, for example LIME pods running explanations in real time, or Fairlearn jobs running during retraining.
Automate fairness and transparency testing as part of CI/CD. Use Kubernetes monitoring like Prometheus and Grafana to visualize fairness drift or explainability coverage.
For example, a hospital deployed SHAP as a Kubernetes Job to generate batch explanations for every new model version, automatically logging feature importances to a compliance dashboard.
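Here is a sketch of what such a batch explanation Job might run: compute mean absolute SHAP values per feature for the new model version and write a summary the compliance dashboard can pick up. The paths and output format are assumptions.

```python
# Sketch of a batch explanation step run as a Kubernetes Job:
# compute mean |SHAP| per feature for a new model version and write a
# summary the compliance dashboard can ingest. Paths are illustrative.
import json
import os
import joblib
import numpy as np
import pandas as pd
import shap

model = joblib.load("/artifacts/model-v2.joblib")
sample = pd.read_csv("/artifacts/explanation_sample.csv")

explainer = shap.Explainer(model.predict, sample)
shap_values = explainer(sample)

importance = np.abs(shap_values.values).mean(axis=0)
summary = {
    "model_version": "v2",
    "feature_importance": dict(zip(sample.columns, importance.round(4).tolist())),
}

os.makedirs("/artifacts/explanations", exist_ok=True)
with open("/artifacts/explanations/v2-summary.json", "w") as f:
    json.dump(summary, f, indent=2)
```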
The third phase is important: scaling and monitoring. Once pilot implementations are proven effective, scale them across departments or product lines.
Define reusable Helm charts or operators that automatically apply fairness checks or privacy settings to any new ML workload. Implement organization-wide governance rules via admission controllers to enforce AI policies before deployment.
This creates consistency, traceability, and compliance at scale.
For example, after success with readmission models, the hospital extended fairness
and explainability checks to imaging AI triage models, all governed by
standardized Kubernetes policies.
Hope that helps.
So here, let me give you a hypothetical use case.
A hospital cluster trains a local risk prediction model daily. After training, a fairness audit job runs; if it passes, the model is packaged and promoted to inference. In serving, each prediction passes through an explanation sidecar, which flags and logs metrics.
Updates flow through federated aggregation with differential privacy. Meanwhile, monitoring dashboards track drift, fairness, and system health.
Because each component is modular, you can grow capabilities iteratively: maybe start with the fairness audit, then add explainability, then introduce federated learning. You don't need to enable every pillar on day one.
So, key takeaways for healthcare engineers. Treat ethics as infrastructure, not an afterthought. Embed fairness, explainability, and privacy deep into your stack.
Automate everything.
Manual gates break down at scale. Use pipelines, admission controllers, and automated checks.
Roll out incrementally: start small with quick wins, and then expand. Continuously monitor drift, fairness, privacy, and performance. They all change over time, so it's important to monitor them.
Create human-in-the-loop feedback loops. Clinicians and patients must stay in the loop; they cannot be excluded.
Ethics is not the enemy of speed.
It's the foundation of trust and sustainability.
AI holds tremendous promise in healthcare, more accurate diagnosis, earlier
interventions, better patient outcomes, but without care, we risk amplifying
bias, eroding trust, and causing harm.
Kubernetes gives us a flexible, scalable base. What I urge you to do is elevate ethics, fairness, privacy, and explainability from afterthoughts to first-class citizens in your infrastructure.
Start small.
Bake in one fairness check tomorrow. Deploy one explanation sidecar. Pilot federated training in a controlled environment. Measure them.
Learn, iterate, and share your learnings.
Let's not just build smarter AI in healthcare.
Let's build AI that people can trust.
I want to thank you all. Please reach out to me. If you have any questions, feel free to reach out via LinkedIn or by any other means.
Thank you.