Conf42 Platform Engineering 2025 - Online

- premiere 5PM GMT

Compliance-Aware Platform Engineering: Scaling Hybrid Cloud Automation in Regulated Financial Environments

Abstract

This talk explores the current regulatory challenges in the fintech industry and how the platform engineering framework can be extended to implement policy-driven provisioning, automate compliance gates in CI/CD, and deliver scalable IaaS frameworks in hybrid cloud environments, with real-world use cases.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. My name is Satish Manchana, and I'm very happy to be part of the Conf42 Platform Engineering 2025 session. I'm here to talk about compliance-aware platform engineering. Just a little bit about me before we go into the session: I have over 25 years of experience across multiple continents. I spent the first 14 years of my career in Asia Pacific and the last 11 years in the US and Canada, and more than 20 of those years have been focused on financial organizations.

As I begin this presentation, I'd like to immediately acknowledge that most platform engineers already understand how to automate infrastructure; the harder part is doing it in regulated environments. What I have learned over the years is that compliance isn't something that can be added on at the end of the build. Just like we architect our systems for high availability or performance, we need to architect specifically for provability, meaning that we not only apply controls, but also demonstrate how and when they were applied. That's the central idea behind compliance-aware platform engineering.

In my experience, the cultural mindset is just as important as the technical approach. Rather than thinking of compliance teams as blockers, we need to engage them early as stakeholders in platform design. That means mapping regulatory controls to practical, testable assertions that can be implemented in code. Once that collaboration is in place, the platform team becomes an enabler by providing a clean, reusable way for application teams to consume compliant infrastructure the moment they need it.

This talk is therefore structured around real-world experience from financial institutions that operate in multiple regulatory zones. You will see how we moved away from siloed manual tasks and towards a model where compliance is embedded directly in automation pipelines, delivered as part of the platform, and continuously validated.

Let's start by outlining today's agenda. To make this concrete, I will start with the regulatory landscape we have had to design against. It's important to understand this first, because tooling decisions and architectural patterns only make sense when you look at the type of evidence regulators expect us to produce. For example, under MiFID II, one of the European regulations (just as there is SOX in the USA, RBI in India, and MAS in Singapore), we are required to log all timestamped transactions and prove synchronization of the time source for those logs. That has very real design implications when you start to design this solution. Once the context is set, I will take you step by step through the evolution of our delivery model, from ad hoc DevOps pipelines into a scalable platform model. I'll spend most of the time on four concrete building blocks and show how each one contributes to compliance by design: number one is provisioning frameworks, number two is hardened golden images, number three is CI/CD gate integration, and the last one is policy as code. Right after that, I will show how we exposed all of these to the wider organization via self-service, because it's not enough to build a compliant pipeline if people aren't consuming it easily, right? Then I'll share what worked and what didn't, in case study format, and finally walk through a practical roadmap that others can use to get started in their own environments.
Even though the examples and use cases are predominantly from financial organizations, this can be applied in any regulated industry: healthcare, retail, insurance, you name it. Wherever there are regulations applied by external auditors or regulatory bodies, this approach can be used.

So what does the compliance challenge look like in financial services? Let me drill into the landscape a bit deeper. Financial services organizations operate under multiple overlapping regulations, because they operate across various continents and countries. SOX governs internal controls over financial reporting in the US, whereas GDPR governs personal data. As I mentioned earlier, the European regulation MiFID II mandates granular record keeping of order and trade events, and DORA introduces operational resilience requirements across hybrid infrastructure. What's key here is that each regulation not only tells you what must be protected, but often how and when that protection must be verified, and those requirements become technical. For example, MiFID II explicitly requires timestamps with millisecond accuracy for the financial transactions that are performed. That means a simple "NTP is running" isn't a good answer for a regulator, right? You must implement timestamp lag logging, regular validation of NTP drift, and retention of those logs for audit review. This is where manual processes break down: nobody can manually verify that level of detail across hundreds of thousands of servers on a daily or regular cadence. So what we realized early on is that compliance is fundamentally a data problem. It's about having the right telemetry, stored in the right way, at the right time, that proves you enforced your policies. Which means the only scalable solution is to automate all of it, including the evidence of compliance, and make it part of the platform.
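To make "automating the evidence" concrete, here is a minimal sketch of the kind of drift check just described. It is an illustration rather than the talk's actual implementation: the server, tolerance, and log location are assumptions, and it uses the third-party ntplib package.

```python
# ntp_drift_evidence.py -- minimal sketch: measure NTP offset and emit an
# append-only, timestamped audit record. All values below are illustrative.
import json
from datetime import datetime, timezone

import ntplib  # pip install ntplib

MAX_DRIFT_SECONDS = 0.001  # millisecond-level tolerance (assumed)
NTP_SERVER = "pool.ntp.org"  # placeholder; use your approved time source
EVIDENCE_LOG = "ntp_drift.jsonl"  # assumed evidence location

def check_drift() -> dict:
    response = ntplib.NTPClient().request(NTP_SERVER, version=3)
    return {
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "server": NTP_SERVER,
        "offset_seconds": response.offset,  # local clock vs. NTP time
        "within_tolerance": abs(response.offset) <= MAX_DRIFT_SECONDS,
    }

if __name__ == "__main__":
    record = check_drift()
    with open(EVIDENCE_LOG, "a") as f:  # append-only evidence trail
        f.write(json.dumps(record) + "\n")
    if not record["within_tolerance"]:
        raise SystemExit(f"NTP drift out of tolerance: {record}")
```

Run on a regular cadence (via cron, for example), this produces the kind of retained, timestamped evidence an auditor can review, rather than a one-off "NTP is running" assertion.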
To manage this, platform engineering itself had to evolve. Let's see how platform engineering evolved over time. When I reflect on my own journey, we started in very traditional infrastructure silos, like many other organizations across the globe. At that point in time, operations and security had their own tools, processes, and approval workflows. Application teams were often frustrated with the delays, so they built shadow scripts (a PowerShell script here, an Ansible role there) just to bypass bottlenecks. That approach gave us speed in isolated pockets, but it didn't give us consistency. Worse, it created zero shared accountability for compliance. Every team was doing its own automation, but no two teams did it the same way, and the results weren't reusable, shareable, or traceable. The shift to DevOps helped at that point, because it broke down the wall between dev and ops and gave us automation pipelines. But again, each pipeline was custom: one team integrated with Jenkins, another used GitLab; some deployed straight into public cloud, some into hybrid setups such as AWS plus VMware. While that gave velocity, it also created dozens of fragmented delivery patterns, none of them fully aligned with regulatory policies. At audit time, we had no easy way to prove compliance across environments, because there was no single baseline. So platform engineering was the logical next step: instead of every team reinventing the wheel, we began packaging proven practices into reusable, versioned infrastructure services that the whole organization could consume.

This is where the big change for compliance happened. Once the platform team owns the baseline and exposes it as a service, every team that uses the platform automatically inherits the correct controls. Developers don't have to waste time interpreting, say, dozens of pages of policy documents or waiting for manual risk approvals; the compliance rules are built right into the service itself. That's also where the cultural shift happened: we stopped being just operators running servers and became product owners of a platform. Internally, we started calling the platform a compliance-aware product, and when teams consume that product, they get both velocity and compliance by default. So compliance is no longer a checkpoint at the end of a pipeline; it's baked into the pipeline itself, into the images, into the provisioning layer, and into the self-service experience.

So how do we actually build systems with compliance baked in? Let's look at how to build secure hybrid IaaS provisioning frameworks in our next slide. When we implemented our provisioning framework, the first objective was very clear: no resource should ever be deployed outside of a policy-enforced path. In other words, if someone wants to spin up a VM or a database or a storage bucket, they can't just create it directly in the cloud console. Everything must be declared as infrastructure as code, typically with Terraform (in Azure we use Bicep and ARM templates), and every Terraform request is passed through a policy engine that validates it against centrally defined rules before anything is built. To make this real, let me give you some examples. If encryption at rest is disabled, the deployment is rejected. Or if someone tries to attach a public IP address to a subnet designated for regulated workloads, the request fails; the policy engine blocks it. These are very simple controls, but when they are enforced automatically, they save an enormous amount of remediation effort later.

The second major capability we added was predefined landing zones. These aren't just empty networks; they're blueprints that already contain our overlay network topology, identity and access management model, monitoring hooks, and logging configuration. So when a workload is deployed into the public cloud or onto the private cloud, it automatically lands in a secure, pre-approved environment. Teams don't even have to think about whether logging is enabled or whether backup policies are attached; they just inherit them. One of the most important lessons we learned along the way is that the policy engine needs to remain independent from the IaC engine, because policies evolve faster than infrastructure code. For example, if regulators update encryption requirements, I don't want to go back and rewrite Terraform modules across dozens of repositories. By separating policy into its own repository and enforcing it as an admission control layer, we can update controls globally and immediately, without breaking infrastructure code. The end result is a framework that gives developers what they need, the ability to provision infrastructure quickly, but only through a compliant path. What I have observed is that it doesn't reduce flexibility at all; in fact, it increases confidence. Teams know what they provision is already compliant, and operations teams know they are not going to be chasing exceptions later. And because this framework works consistently across both cloud and on-prem, we have been able to unify governance in a way that scales globally.
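As a hedged illustration of the admission-control idea, here is a minimal sketch of a policy gate that inspects a Terraform JSON plan before anything is applied. The specific rules, resource types, and attribute names are examples rather than our production policies; in practice this role would be played by a dedicated policy engine such as OPA or Sentinel.

```python
# plan_gate.py -- sketch of a pre-apply policy gate over a Terraform JSON plan:
#   terraform plan -out=plan.out && terraform show -json plan.out > plan.json
# The rules below are illustrative, not a real policy set.
import json
import sys

REQUIRED_TAGS = {"owner", "data-classification"}  # assumed mandatory tags

def violations(plan: dict):
    for change in plan.get("resource_changes", []):
        after = (change.get("change") or {}).get("after") or {}
        address = change["address"]
        # Example rule 1: no public IPs on regulated instances (simplified).
        if change["type"] == "aws_instance" and after.get("associate_public_ip_address"):
            yield f"{address}: public IP not allowed for regulated workloads"
        # Example rule 2: resources that support tags must carry the mandatory set.
        if "tags" in after and not REQUIRED_TAGS <= set(after.get("tags") or {}):
            yield f"{address}: missing mandatory tags"

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        problems = list(violations(json.load(f)))
    for p in problems:
        print(f"DENY: {p}")
    sys.exit(1 if problems else 0)  # non-zero exit blocks the deployment
```

Because the rules live outside the Terraform modules (ideally in their own policy repository), they can be tightened without touching any infrastructure code, which is exactly the separation of policy engine and IaC engine described above.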
Alongside applying compliance to the provisioning process, we also need to ensure the new workloads themselves start from a compliant baseline. What does that mean? That takes us into golden image creation with compliance. Golden images are often treated as a convenience, something to provision quickly because everything is embedded into them. But in a regulated environment, they are the foundation of compliance. Our pipeline starts with a vendor ISO, say a Windows Server or RHEL image, and immediately applies our hardening baselines, for example CIS Level 1, internal account and lockout policies, and configurations for audit logging with centralized log forwarding. Once the base hardening is done, we embed operational agents: antimalware, file integrity monitoring, time synchronization settings, backup configuration, and endpoint protection. The next phase of the pipeline runs vulnerability scans and checks the image against all known high-severity vulnerabilities. Only if the image passes these checks does it get versioned and published into the catalog. Every release is signed and timestamped, giving us an immutable audit record of exactly what was approved and when.

Another key point is that the golden image lifecycle is automated. We rebuild on a fixed cadence, say every two weeks, with exceptions in situations like WannaCry or the emergence of a critical CVE. It means we never manually patch live workloads; we just replace them using the new trusted base image. That's how we have eliminated drift and provided consistent audit reports for every server in the estate. For configuration drift, we have used PowerShell DSC and Azure DSC configurations for Windows and Ansible playbooks for Linux.
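Here is a minimal sketch of the "publish only if clean" step at the end of such an image pipeline. The scan report format, severity labels, and the idea of hashing the report into the release record are assumptions for illustration; a real pipeline would wire in its actual scanner and signing service.

```python
# image_release_gate.py -- sketch: block publication of a golden image if the
# vulnerability scan found blocking issues; otherwise emit an approval record.
import hashlib
import json
import sys
from datetime import datetime, timezone

BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}  # assumed policy threshold

def gate(scan_report_path: str, image_id: str) -> dict:
    with open(scan_report_path, "rb") as f:
        raw = f.read()
    findings = json.loads(raw)  # assumed: a list of {"id": ..., "severity": ...}
    blockers = [finding for finding in findings
                if finding.get("severity") in BLOCKING_SEVERITIES]
    if blockers:
        raise SystemExit(f"{len(blockers)} blocking findings; image not published")
    # Timestamped approval record: what was approved, when, against which report.
    return {
        "image_id": image_id,
        "approved_at": datetime.now(timezone.utc).isoformat(),
        "report_sha256": hashlib.sha256(raw).hexdigest(),
    }

if __name__ == "__main__":
    print(json.dumps(gate(sys.argv[1], sys.argv[2]), indent=2))
```

In a real pipeline this record would also be cryptographically signed before the image lands in the catalog, which is what makes the "what was approved and when" trail tamper-evident.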
But compliance shouldn't stop at provisioning. As I mentioned earlier, it must be extended into CI/CD pipelines. So let's see how automated compliance can be baked into the whole process: we have code commits, build pipelines, approval gates, then deployment, and monitoring at runtime. Once teams started using the provisioning framework and golden images, the next logical step was to move compliance checks directly into the CI/CD pipeline. That means we embed policy tests just like we embed unit tests. If someone proposes a change to an IaC module, the pipeline runs a policy test suite that validates things like IAM principals, whether the resource carries compliant tags, and whether it has the mandatory logging, encryption, and network rules; these are just examples. If anything fails the policy tests, the pipeline stops and gives the engineer precise feedback on which policy failed and why. That not only prevents non-compliant changes, it also educates the engineers over time: they start to understand the regulatory intent behind the policy. We have also extended this into runtime validation. For example, once a workload is deployed, we run periodic compliance scanning using the same policies. That allows us to detect and alert if anything drifts from the approved standard baseline. The key point is that compliance becomes a continuous activity, not a point-in-time audit.

So next, let's look into policy as code. This is a cornerstone of compliance ops; policy as code is really what binds everything together. Rather than storing governance in documents, we translate it into machine-readable policy. A simple example is "storage buckets must have server-side encryption and block public access"; we spoke about this example earlier, so I'm continuing with it here in the policy scope. We write that rule as code and commit it to Git. When a developer attempts to provision a bucket, the provisioning engine queries the policy runtime, which evaluates the request and returns allow or deny with an explanation. Because those policies live in Git, we can apply the same software development lifecycle to them: we have pull requests, peer reviews, automated tests, and approvals before a particular pull request is signed off, and we have traceability around what changed. This also gives us the ability to version policies and roll them back if we need to. When a regulator asks us when a particular rule was implemented, or why it changed, we have a full audit trail in Git. One of the added benefits is that policy as code allows the platform team and the compliance team to collaborate on a shared artifact. The risk team doesn't need to read Terraform, and the engineering team doesn't need to read documents containing multiple policies; they both look at the policy-as-code repository as the shared source of truth, which changed the entire dynamic between those teams. In a nutshell, if I have to talk about the technologies we use: HashiCorp Sentinel, OPA, AWS Config, and Azure Policy for IaC. The process involves creating a policy registry mapped to regulations, version controlling the policies with test logic embedded in them, and automating remediation and evidence generation. Policy as code makes compliance verifiable and scalable, transforming rules into enforceable and testable code.
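To illustrate the "same software development lifecycle" point, here is a minimal sketch of the bucket rule kept as a policy artifact with its unit tests alongside it. In practice this would likely be Rego evaluated by OPA (or Sentinel); plain Python is used here purely for illustration, and the field names are assumptions.

```python
# policy_s3_encryption.py -- sketch of a policy kept in its own Git repository
# together with its tests. Field names are illustrative; a real setup would
# express this in Rego (OPA) or Sentinel rather than Python.

def evaluate(bucket: dict) -> tuple[bool, str]:
    """Return (allow, reason) for a requested storage bucket."""
    if not bucket.get("server_side_encryption"):
        return False, "server-side encryption must be enabled"
    if not bucket.get("block_public_access"):
        return False, "public access must be blocked"
    return True, "compliant"

# Tests live next to the policy (run with `pytest`), so a pull request that
# weakens a control fails automatically before peer review even starts.
def test_rejects_unencrypted_bucket():
    allow, reason = evaluate({"block_public_access": True})
    assert not allow and "encryption" in reason

def test_allows_compliant_bucket():
    allow, _ = evaluate({"server_side_encryption": True, "block_public_access": True})
    assert allow
```

Because the policy and its tests are versioned together, the Git history itself answers the regulator's question of when a rule was introduced and why it changed.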
So now that we have made everything compliant with these frameworks, how can we turn this into a balanced self-service model for its customers, the developers, to use? Let's move into our next slide: enabling self-service while preserving compliance. I think we have a very strong base so far: building the framework, building the golden images, adding the compliance checks, and putting them into CI/CD gates so they can be validated. Now, engineers want autonomy, while compliance teams need governance. The sweet spot is at the intersection of automation, self-service, and governance: embedding compliance into self-service platforms enables speed without sacrificing regulatory alignment. At this point, once the building blocks were in place, we focused on how to make them consumable, because if they cannot be consumed, there's no point in having infrastructure as code. So we built a self-service catalog where development teams can select from a set of pre-approved environment types: for example, a Windows server with SQL installed, a Linux server with an application stack installed, or containers with runtimes, all in stock. Each of those catalog entries is simply a wrapper around the approved provisioning modules, the golden image, and the policy runtime. From a developer's perspective, it's just a click if they're using a UI, or an API call if they're working through a programming language, and they get the environment within minutes. From a compliance perspective, they're routed through the compliant provisioning path, which triggers the compliance gates and applies the golden image. The beauty of this model is that the faster we make the self-service experience, the less incentive there is for teams to bypass it. In fact, they prefer the platform because it saves them time. We also track consumption and attach metadata to every provisioned resource. That means we can answer auditors immediately: who requested this workload, what policy version was enforced, and what controls were applied, without digging through old emails or change tickets. So that's how we enabled self-service while preserving compliance.

Let's look at a case study from a leading financial organization: what challenges they had, what approach we took, and the results, such as the reduction in compliance exceptions. The biggest pain point for the organization wasn't that they didn't know how these regulations translate into technical requirements; it was that every region and every business unit interpreted them slightly differently. That meant the same type of server could be built in at least six different ways depending on where it was deployed. It created massive headaches at audit time, because each region had to justify why its configuration was different from the others. By consolidating hardening into a single golden image pipeline and enforcing provisioning through a central policy engine, we unified all those regional variations into one global standard. The result was not only faster provisioning, down from days to a few hours or less than an hour, but also a reduction in audit findings, because there was finally a single source of truth for configuration and compliance. Another important outcome was improved collaboration with the risk function. Once they saw that the policies were implemented in code and enforced consistently, they were much more comfortable delegating control to the engineering teams, which in turn sped up project delivery and reduced policy debate cycles during large platform rollouts.

So how can you take this into your organization? Let's look at the implementation roadmap. If you're trying to get started, start with a small pilot project to prove the concept. Next, assess your current compliance requirements and automation opportunities. Define a compliance ops strategy: roadmap, tools, processes, and any required organizational changes. Then implement the foundation, say using golden images and policy as code. Then scale across environments and integrate with existing workflows using CI/CD policy gates. Finally, measure compliance metrics and continuously improve, as sketched below. This is what an implementation roadmap could look like for any organization that wants to implement this whole framework.
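As a small illustration of that final "measure and improve" step, here is a hedged sketch that aggregates policy-evaluation evidence into a per-policy compliance rate. The record shape (one JSON object per line) is an assumption carried over from the earlier sketches, not a format from the talk.

```python
# compliance_metrics.py -- sketch: turn an evidence log of policy evaluations
# into per-policy compliance rates. Assumes one JSON record per line, shaped
# like {"policy": "s3-encryption", "compliant": true}.
import json
import sys
from collections import defaultdict

def compliance_rates(evidence_path: str) -> dict:
    passed = defaultdict(int)
    total = defaultdict(int)
    with open(evidence_path) as f:
        for line in f:
            record = json.loads(line)
            total[record["policy"]] += 1
            passed[record["policy"]] += int(record["compliant"])
    return {policy: round(100 * passed[policy] / total[policy], 1) for policy in total}

if __name__ == "__main__":
    for policy, rate in sorted(compliance_rates(sys.argv[1]).items()):
        print(f"{policy}: {rate}% compliant")
```

Tracking numbers like these over time is what turns "continuously improve" from a slogan into a measurable feedback loop.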
So before we close, let me leave you with some key takeaways. First, compliance must be a design principle, not an afterthought. Second, automation is essential, because manual processes cannot keep up. Third, compliance ops is the natural next step for regulated environments. Finally, self-service and compliance can coexist if platforms are designed well. That concludes the key takeaways of this presentation. Thank you for your time. I hope this session gave you a clear roadmap for scaling compliance-aware automation in your respective industries, even though I have specifically focused on financial services. Thank you very much for this opportunity. Looking forward to more sessions in
...

Satish Manchana

Product Owner Windows @ UBS



