Transcript
Hi everyone.
My name is Satish Manna and I'm very happy to be part of the Conf42 Platform Engineering
2025 session, and I'm here to talk about compliance-aware platform engineering.
Just a little bit about me before we go into the session.
I have over 25 years of experience across multiple continents.
I was in Asia Pacific for the initial part of my career, about 14 years, and for
the last 11 years I have been in the US and Canada. Of my overall
experience, 20-plus years have been very focused on financial organizations.
As I begin this presentation, I'd like to immediately acknowledge
that most platform engineers already understand how to automate
infrastructure; the harder part is doing it in regulated environments.
What I have learned over the years is that compliance isn't something that
can be added on at the end of the build. Just like we architect for high
availability or performance, we need to architect specifically for
provability, meaning that we must not only apply controls, but also demonstrate
how and when they were applied.
That's the central idea behind compliance-aware platform engineering.
In my experience, the cultural mindset is just as important as the
technical approach.
Rather than thinking of compliance teams as blockers, we need to engage them early
as stakeholders in platform design. That means mapping the regulatory controls
to practical, testable assertions that can be implemented in code.
Once the collaboration is in place, the platform team becomes
an enabler by providing a clean, reusable way for
application teams to consume compliant infrastructure
the moment they need it.
This talk is therefore structured around real-world experience from
financial institutions that operate in multiple regulatory zones.
You will see that we moved away from siloed manual tasks and towards
a model where compliance is embedded directly in automation pipelines,
delivered as part of the platform, and continuously validated.
Let's start by outlining today's agenda.
To make this concrete, I will start with the regulatory landscape
we have had to work through and design for.
It's important to understand this first, because tooling decisions
and architectural patterns only make sense when you look at the
type of evidence regulators expect us to produce.
For example, under MiFID II, which is one of the regulations in Europe, like
SOX in the USA, RBI in India, and MAS in Singapore, we are required to log
all timestamped transactions and prove synchronization of the
time source for those logs.
That has very real design implications when you start to design
or even think about this solution. Once the context is set,
I will take you step by step through the evolution of our delivery
model, from ad hoc DevOps pipelines into a scalable platform model.
I'll spend most of the time on four concrete building blocks and
show how each one contributes to compliance by design.
Number one is provisioning frameworks.
Number two is hardened golden images.
Number three is CI/CD gate integration, and the last one is policy as code.
Right after that, I will show how we exposed all of these to the
wider organization via self-service, because it's not enough
to build compliant pipelines if people aren't consuming them easily, right?
Then I'll share what worked and what didn't in case-study format,
and finally walk through a practical roadmap
that others can use to get started in their own environment.
Even though the examples and use cases are predominantly from financial
organizations, this can be applied in any regulated industry: healthcare,
retail, insurance, you name it, right?
Wherever regulations are applied by external auditors or
regulatory organizations, this can be applied.
So what does the compliance challenge look like in financial services?
Let's look into that in this next slide.
Let me drill into the landscape a bit deeper.
Financial services organizations operate under multiple overlapping regulations.
The reason being, they operate across various continents and countries.
SOX governs internal controls over financial reporting in the US,
whereas GDPR governs personal data.
As I mentioned earlier, the European regulation MiFID II mandates
granular record keeping of order and trade events, and DORA introduces
operational resilience requirements across hybrid infrastructure.
What's key here is that each regulation not only tells you what must be
protected, but often how and when that protection must be verified.
Those requirements become technical.
For example, MiFID II explicitly requires timestamps to millisecond
accuracy for the financial transactions that are performed.
That means a simple "NTP is running" isn't a good answer for a regulator, right?
You must implement timestamp logging, regular validation of NTP
drift, and retain those logs for audit review.
This is where manual process breaks down.
Nobody can manually verify at that level of detail across
hundreds of thousands of servers on a daily or regular cadence.
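As an illustration, here is a minimal Python sketch of what such an automated clock-drift check with evidence retention could look like. It assumes the third-party ntplib package, and the threshold, NTP server, and log path are illustrative placeholders, not production values.

```python
# Minimal sketch of an automated clock-drift check with audit evidence.
# Assumes the third-party "ntplib" package; the 1 ms threshold, NTP server,
# and log path are illustrative values, not a production configuration.
import json
import time

import ntplib

DRIFT_THRESHOLD_SECONDS = 0.001  # MiFID II-style millisecond accuracy
EVIDENCE_LOG = "ntp_drift_evidence.jsonl"

def check_clock_drift(server: str = "pool.ntp.org") -> bool:
    """Measure local clock offset against an NTP server and record evidence."""
    response = ntplib.NTPClient().request(server, version=3)
    compliant = abs(response.offset) <= DRIFT_THRESHOLD_SECONDS
    evidence = {
        "checked_at_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "ntp_server": server,
        "offset_seconds": response.offset,
        "threshold_seconds": DRIFT_THRESHOLD_SECONDS,
        "compliant": compliant,
    }
    # Retain the result as an append-only JSON Lines record for audit review.
    with open(EVIDENCE_LOG, "a") as log:
        log.write(json.dumps(evidence) + "\n")
    return compliant

if __name__ == "__main__":
    print("compliant" if check_clock_drift() else "drift exceeds threshold")
```

Run on a regular cadence, a check like this produces exactly the kind of retained, timestamped evidence a regulator can review.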
So what we discussed early on is that compliance is
fundamentally a data problem.
It's about having the right telemetry, stored in the right way at the right
time, that proves you enforced your policies. That means the only scalable
solution is to automate all of it, including the evidence of compliance,
and make it part of the platform. To manage this, platform
engineering itself had to evolve.
Let's see how platform engineering evolved over time.
When I reflect on my own journey, we started in very
traditional infrastructure silos.
Like many other organizations across the globe, at that point in time,
operations and security had their own tools, processes, and approval workflows.
Application teams were often frustrated with the delays.
So they built shadow scripts, a PowerShell script here, an Ansible
role there just to bypass bottlenecks.
That approach gave us speed in isolated pockets, but it didn't give us
consistency. Worse, it created zero shared accountability for compliance.
Every team was doing their own automation, but no two teams did it the same way,
and the results weren't reusable, shareable, or traceable.
The shift to DevOps helped at that point in time, right?
It broke down the wall between dev and ops and
gave us automation pipelines, but again, each pipeline was custom.
One team integrated with Jenkins, another used GitLab, some deployed straight into
public cloud, and some went hybrid, into AWS and VMware as an example.
While that gave velocity, it also created dozens of
fragmented delivery patterns, none of them fully aligned with
regulatory policies. At audit time, we had no easy way to prove
compliance across environments because there was no single baseline.
So platform engineering was the logical next step. Instead of every team
reinventing the wheel, we began packaging proven practices into reusable,
versioned infrastructure services that the whole organization could consume.
This is where the big change for compliance happened. Once the
platform team owns the baseline and exposes it as a service,
every team that uses the platform automatically
inherits the correct controls.
Developers don't have to waste time interpreting, say,
policy documents that run to multiple pages,
or waiting for manual risk approvals.
The compliance rules are built right into the self-service offering itself.
That's also where the cultural shift happened.
We stopped being just operators running servers and became
product owners of a platform.
Internally, we started calling the platform a compliance-aware product,
and when teams consume that product, they get
both velocity and compliance by default.
So compliance is no longer a checkpoint at the end of a pipeline.
It's baked into the pipeline itself, into the images, into the provisioning path,
and into the self-service experience.
So how do we actually build systems with compliance baked in?
Let's look at how we built secure hybrid IaaS provisioning
frameworks in our next slide.
So when we implemented our provisioning framework, the
first objective was very clear.
No resource should ever be deployed outside of a policy-enforced path.
In other words, if someone wants to spin up a VM or a database or a
storage bucket, they can't just create it directly in the cloud console.
Everything must be declared as infrastructure as code, typically with
Terraform; in Azure, we use Bicep and ARM templates. Every provisioning
request is passed through a policy engine that validates it against
centrally defined rules before anything gets built from this IaC.
To make this real, let me give you some examples.
If encryption at rest is disabled, the deployment is rejected.
Or if someone tries to attach a public IP address to a subnet
designated for regulated workloads, the request fails;
the policy engine blocks it.
These are very simple controls, but when they are enforced
automatically, they save an enormous amount of remediation effort later.
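To illustrate the idea, here is a hedged Python sketch of that kind of pre-deployment admission check. The field names and rules are hypothetical; in practice this logic would live in a policy engine such as OPA or Sentinel rather than hand-written code.

```python
# Illustrative sketch of a pre-deployment policy check; the rule names and
# request fields are hypothetical stand-ins for a real policy engine.

def validate_request(request: dict) -> list[str]:
    """Return a list of policy violations; an empty list means allow."""
    violations = []
    if not request.get("encryption_at_rest", False):
        violations.append("encryption-at-rest must be enabled")
    if request.get("public_ip") and request.get("subnet_class") == "regulated":
        violations.append("public IPs are not allowed on regulated subnets")
    return violations

# A storage request with encryption disabled is rejected before provisioning.
request = {"type": "storage", "encryption_at_rest": False, "subnet_class": "regulated"}
violations = validate_request(request)
if violations:
    raise SystemExit("deployment rejected: " + "; ".join(violations))
```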
The second major capability we added was predefined landing zones.
These aren't just empty networks.
They're blueprints that already contain our overall network topology, identity
and access management model, monitoring hooks, and logging configuration.
So when a workload is deployed into a public cloud or onto the private
cloud, it automatically lands in a secure, pre-approved environment.
So teams don't even have to think about whether logging is enabled or whether
backup policies are attached. They just inherit it.
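As a rough sketch of the landing-zone idea, the following Python snippet shows a blueprint whose controls are inherited by every workload deployed into it. Every field name and value here is a hypothetical placeholder, not our actual configuration.

```python
# Hedged sketch of a landing-zone blueprint; all fields are hypothetical
# placeholders for the approved topology, IAM model, and logging hooks.
LANDING_ZONE = {
    "network_topology": "hub-and-spoke",      # pre-approved network layout
    "iam_model": "least-privilege-roles",     # identity and access baseline
    "monitoring": {"agent": "enabled", "alert_channel": "soc-queue"},
    "logging": {"forwarding": "central-siem", "retention_days": 365},
    "backup_policy": "daily-encrypted",
}

def deploy_workload(workload: dict) -> dict:
    """Merge a workload into the landing zone so it inherits the controls."""
    return {**LANDING_ZONE, **workload}

# The team only specifies the workload; logging and backups come for free.
vm = deploy_workload({"name": "app-vm-01", "size": "medium"})
print(vm["logging"]["forwarding"])  # -> central-siem
```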
One of the most important lessons we have learned along the way is that the
policy engine needs to remain independent from the IaC engine.
Policies evolve faster than infrastructure as code, right?
For example, if regulators update encryption requirements, I don't want
to go back and rewrite Terraform modules across dozens of repositories.
So by separating policy into its own repository and enforcing it
as an admission control layer, we can update controls globally
and immediately, without breaking infrastructure code.
So the end result is a framework that gives developers what they
need, the ability to provision infrastructure quickly, but
only through a compliant path.
What I have observed is that it doesn't reduce flexibility at all.
In fact, it increases confidence: teams know that what they
provision is already compliant, and operations teams know they are
not going to be chasing exceptions later. Because this framework works
consistently across both cloud and on-prem, we have been
able to unify governance in a way that scales globally.
So we learned a lot: alongside applying compliance to the
provisioning process, we also need to ensure the new workloads themselves
start with a compliant baseline, right?
So what does that mean?
That takes us into golden image creation with compliance, right?
Golden images are often treated as a convenience, something to quickly
provision from because everything is embedded into them.
But in a regulated environment, they're the foundation of compliance.
Our pipeline starts with a vendor ISO, say a Windows Server or RHEL image,
and immediately applies our hardening baselines: CIS Level 1 as an example,
internal account and lockout policies, and configurations for audit logging
and centralized log forwarding. Once the base hardening is done,
we embed operational agents for anti-malware and file integrity monitoring,
along with time synchronization settings, backup configuration, and endpoint
protection. The next phase of the pipeline runs vulnerability scans
and checks the image against all known high-severity vulnerabilities.
Only if the image passes these checks does it get versioned
and published into the catalog, right?
Every release is signed and timestamped, giving us an
immutable audit record of exactly what was approved and when.
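A minimal sketch of that publish step might look like the following Python snippet; the SHA-256 hash stands in for a real signature from a signing service, and the catalog file and baseline name are illustrative assumptions.

```python
# Sketch of the publish step of a golden-image pipeline: version, timestamp,
# and fingerprint the approved image so there is an immutable audit record.
# The hash stands in for a real signature (e.g. from a signing service).
import hashlib
import json
import time

def publish_image(image_bytes: bytes, version: str, scan_passed: bool) -> dict:
    if not scan_passed:
        raise ValueError("image failed vulnerability scan; not publishing")
    record = {
        "version": version,
        "published_at_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "hardening_baseline": "CIS Level 1",
    }
    # Append-only catalog entry: exactly what was approved, and when.
    with open("image_catalog.jsonl", "a") as catalog:
        catalog.write(json.dumps(record) + "\n")
    return record

entry = publish_image(b"...image contents...", version="2025.10.1", scan_passed=True)
```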
Another key point is that the golden image lifecycle is automated,
so we rebuild on a fixed cadence, every two weeks for example,
with exceptions in situations like WannaCry or the emergence
of a critical CVE. It means we never mass-patch live workloads.
We just replace them using a new trusted base image.
That's how we have eliminated drift and provided consistent audit reports for
every server in the estate.
For configuration drift, we have used PowerShell DSC and Azure DSC desired
state configurations for Windows, and Ansible playbooks for Linux.
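For illustration only, here is a tiny Python sketch of the drift-detection idea: compare what a server reports against the golden baseline and surface any deviation. In practice PowerShell DSC or Ansible does this declaratively; the settings shown are hypothetical.

```python
# Minimal sketch of a drift check: compare a server's reported settings to
# the golden baseline and report differences. Field names are hypothetical.
GOLDEN_BASELINE = {"ntp": "enabled", "log_forwarding": "central-siem", "firewall": "on"}

def detect_drift(actual: dict) -> dict:
    """Return the settings that deviate from the golden baseline."""
    return {
        key: {"expected": expected, "actual": actual.get(key)}
        for key, expected in GOLDEN_BASELINE.items()
        if actual.get(key) != expected
    }

drift = detect_drift({"ntp": "enabled", "log_forwarding": "disabled", "firewall": "on"})
print(drift)  # -> {'log_forwarding': {'expected': 'central-siem', 'actual': 'disabled'}}
```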
But compliance shouldn't just stop at provisioning.
As I mentioned earlier, it must be extended into CI/CD pipelines, right?
So let's see how automated CI/CD compliance can be baked into
the entire process of compliance-aware platform engineering.
We have code commits, building the pipelines, approval gates,
then deployment, and monitoring at runtime.
Once teams started using the provisioning framework and golden images,
the next logical step was to move compliance
checks directly into the CI/CD pipeline.
That means we embed policy testing just like we embed unit tests.
If someone proposes a change to an IaC module, the pipeline will run a
policy test suite that validates things like IAM principals, whether it has
tags and is compliant with respect to those tags, and whether it has the
mandatory logging, encryption, and network rules.
These are just examples.
If anything fails the policy tests, the pipeline stops and gives the
engineer precise feedback on which policy failed and why.
That not only prevents non-compliant changes, it also
educates the engineers: over time, they start to understand
the regulatory intent behind the policy.
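Here is a hedged sketch of what embedding policy testing like unit testing can look like, written as pytest-style tests in Python. The rules mirror the hypothetical checks sketched earlier and are not our actual policy suite.

```python
# Sketch of policy checks embedded as ordinary unit tests, runnable with
# pytest inside the CI/CD pipeline; rules here are hypothetical examples.

def validate_request(request: dict) -> list[str]:
    """Return a list of violations; an empty list means the change may proceed."""
    violations = []
    if not request.get("encryption_at_rest", False):
        violations.append("encryption-at-rest must be enabled")
    if not request.get("tags", {}).get("cost_center"):
        violations.append("mandatory cost_center tag is missing")
    return violations

def test_compliant_request_passes():
    request = {"encryption_at_rest": True, "tags": {"cost_center": "fin-ops"}}
    assert validate_request(request) == []

def test_untagged_request_fails_with_precise_feedback():
    # The failure message tells the engineer exactly which policy failed and why.
    request = {"encryption_at_rest": True, "tags": {}}
    assert "mandatory cost_center tag is missing" in validate_request(request)
```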
We have also extended this into runtime validation.
For example, once a workload is deployed, we run periodic compliance
scanning using the same policies.
That allows us to detect and alert if anything drifts from the
approved standard baseline.
The key point is that compliance becomes a continuous activity,
not a point-in-time audit.
So next, let's look into policy as code.
This is a cornerstone of compliance ops, right?
Policy as code is really what binds everything together.
Rather than storing governance rules in documents, we translate them
into machine-readable policy.
A simple example is: storage buckets must have server-side encryption
and block public access, right?
We spoke about this example earlier, so I'm continuing with the
same example here in the policy scope.
We write that policy alongside the Bicep and ARM templates and commit it to Git.
When a developer attempts to provision a bucket, the provisioning
engine queries the policy runtime, which evaluates the request and returns
allow or deny with an explanation. Because those policies live in Git, we can
apply the same software development lifecycle to them: we have pull
requests, peer reviews, automated tests, and approvals before a particular
pull request is signed off.
And we have traceability around what changed, right?
This also gives us the ability to version policies and roll them back if we
need to. When the regulator asks us when a particular rule was implemented
or why it changed, we have a full audit trail in Git.
One of the added benefits is that policy as code allows the
platform team and the compliance team to collaborate on a shared artifact.
The risk team doesn't need to read Terraform, and the engineering team
doesn't need to read documents that contain multiple policies.
They both look at the policy-as-code repository that's shared across teams.
That shared source of truth changed the entire dynamic between those teams,
in a nutshell.
For example, if I come to the technologies we use: HashiCorp Sentinel,
OPA, AWS Config, and Azure Policy for IaC, right?
The process involves creating a policy registry mapped to regulations,
version controlling the policies, and embedding test logic in them.
We also automate remediation and evidence generation. Policy as code makes
compliance verifiable and scalable, transforming rules into enforceable
and testable code.
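As a sketch of what a policy registry mapped to regulations could look like, here is an illustrative Python structure; the regulation mappings and version numbers are examples, not legal guidance.

```python
# Hedged sketch of a policy registry: each entry maps an enforceable rule to
# the regulation that motivates it, with a version for Git-style traceability.
POLICY_REGISTRY = [
    {
        "id": "storage-encryption",
        "regulation": "GDPR / SOX",
        "version": "1.3.0",
        "rule": "storage buckets must enable server-side encryption",
    },
    {
        "id": "clock-sync",
        "regulation": "MiFID II",
        "version": "2.0.1",
        "rule": "NTP offset must stay within 1 ms and be logged",
    },
]

def policies_for(regulation: str) -> list[dict]:
    """Answer the auditor's question: which rules implement this regulation?"""
    return [p for p in POLICY_REGISTRY if regulation in p["regulation"]]

print([p["id"] for p in policies_for("MiFID II")])  # -> ['clock-sync']
```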
So now that we have made everything compliant with all these frameworks,
how can we make this a balanced self-service model for the customers,
the developers, to use?
Let's delve into our next slide:
Enabling self-service while preserving compliance, right?
I think we have a very strong base so far: building the framework,
building the golden images, adding the compliance, putting them into CI/CD
gates so they can be validated.
Now, engineers want autonomy, right?
While compliance teams need governance. The sweet spot is at
the intersection of automation, self-service, and governance.
Embedding compliance into self-service platforms enables speed without
sacrificing regulatory alignment.
At this point, once the building blocks were in place, we focused on
how to make them consumable, right?
If they cannot be consumed, there's no point in having
infrastructure as code at all.
So we built a self-service catalog where development teams can select from a
set of pre-approved environment types.
So for example, a Windows Server with SQL installed, a Linux server with
an app stack installed, containers with runtimes, and other things
in the stack, right?
Each of those catalog entries is simply a wrapper around the approved
provisioning modules, the golden image, and the policy runtime, right?
From a developer perspective, it's just a click if they're using a UI,
or an API call if they're coming through some programming language,
and they get the environment within minutes.
From a compliance perspective, they're forced through the compliant provisioning
path, which triggers the compliance gates and applies the golden image.
The beauty of this model is that the faster we make the self-service
experience, the less incentive there is for teams to bypass it.
In fact, they prefer the platform because it saves them time.
We also track consumption and attach metadata to every provisioned resource.
That means we can answer auditors' questions like: who requested this workload?
What policy version was enforced, and what controls were applied?
Immediately, without digging through old emails or change tickets, et cetera.
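A minimal sketch of that metadata, with hypothetical field names, might look like this in Python:

```python
# Sketch of the metadata attached to every provisioned resource so audit
# questions can be answered directly; field names are hypothetical.
import time

def provision_record(requester: str, catalog_item: str, policy_version: str) -> dict:
    return {
        "requester": requester,
        "catalog_item": catalog_item,
        "policy_version": policy_version,       # which policy set was enforced
        "controls": ["golden-image", "ci-gates", "landing-zone"],
        "provisioned_at_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

# "Who requested this workload, and under which policy version?"
record = provision_record("dev-team-a", "linux-app-stack", "2025.09")
print(record["requester"], record["policy_version"])
```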
So that's how we enabled self-service while preserving compliance.
Let's look at a case study in a leading financial organization:
what challenges they had, what approach we took, and how the results
showed a reduction in compliance exceptions, et cetera.
The biggest pain point for the financial organization wasn't that they
didn't know how these regulations translate into technical requirements.
It was that every region and every business unit interpreted
them slightly differently.
That meant the same type of server could be built in at least six different
ways depending on where it was deployed.
That created a massive headache at audit time because each region had
to justify why its configuration was different from the others.
By consolidating hardening into a single golden image pipeline and
enforcing provisioning through a central policy engine,
we unified all those regional variations into one global standard.
The result was not only faster provisioning, down from days to a few
hours or less than an hour, right? But also a reduction in audit findings,
because there was finally a single source of truth for
configuration and compliance.
Another important outcome was improved collaboration with the risk function.
Once they saw that the policies were implemented in code and enforced
consistently, they were much more comfortable delegating control to
the engineering teams. That in turn sped up project delivery and
reduced the policy debate cycles during large platform rollouts.
So now how can you take this into your organization?
Let's look at the roadmap, the implementation roadmap.
So if you're trying to get started, start with a small pilot
project to prove the concept.
Next, assess your current compliance requirements and
automation opportunities.
Define a compliance ops strategy: roadmap, tools, processes,
and any required organizational changes.
Then implement the foundation, say using golden images and policy as code.
Then scale across environments and integrate with existing workflows;
use CI/CD policy gates, right?
Finally, measure compliance metrics and continuously improve.
So this is what an implementation roadmap could look like for any
organization that wants to implement this whole framework.
So before we close, let me leave you with some key takeaways.
First, compliance must be a design principle, not an afterthought, right?
Second, automation is essential because manual processes cannot keep up.
Third, compliance ops is the natural next step for regulated environments.
Finally, self-service and compliance can coexist if platforms are designed well.
So that concludes the key takeaways and the entire presentation.
I would like to say thanks. Thank you for your time.
I hope this session gave you a clear roadmap to scaling compliance
automation in your respective industries, even though I have
specifically focused on financial services.
Thank you very much for this opportunity.
Looking forward to more sessions.