Conf42 DevSecOps 2025 - Online


DevSecOps for AI: Embedding Security and Ethics into the Machine Learning Pipeline


Abstract

Learn how to embed security, compliance, and ethics into your AI pipelines. This talk shows how DevSecOps teams can safeguard ML systems from design to deployment using secure data flows, privacy-aware training, policy-as-code, and audit-ready pipelines for trustworthy AI.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, good morning, good evening. This is Rajkumar Sukumar. I have close to 21 years of industry experience, and I'm currently working as a lead for data and AI engineering at AT&T Services. Today I want to talk about the critical challenges that every organization faces while building and implementing AI and machine learning pipelines. Before I get into my topic, I would like to thank Conf42 DevSecOps for giving me the opportunity to speak at this conference.

Let me begin with a recent example. A couple of months ago, a large retail company in the US deployed an automated visual quality inspection model. After the deployment everything worked fine: they got the outcomes they expected, and people were happy. But after several weeks, product defects started skyrocketing and customers started complaining. The business leaders launched an investigation, and the team discovered that someone had intentionally uploaded manipulated images into the training data. It was a very small poisoning of the training data, but it compromised a multimillion-dollar system. The worst part is that nobody noticed, because once the model was deployed through the CI/CD pipeline, there were no controls in that pipeline for AI security. And this story is not only about retail; it applies everywhere, from telecom to insurance to banking. Traditional DevOps pipelines have no control over AI security, so we need a strong DevSecOps practice for AI. That's what we're going to discuss.

Let me move to the next slide. Nowadays, deploying AI systems into production is very fast: it no longer takes weeks or months, it takes days. But that speed comes bundled with a set of issues: data poisoning, which we just discussed, adversarial attacks, prompt injection, ethical risks, and privacy leaks.

In 2023, a big financial institution deployed an AI-based loan approval system, but they had to stop it within a few weeks because it kept rejecting loan applications from one particular segment of people. When they investigated, they found no technical defect; the problem was that the model had been trained on one particular population, so it could not handle applicants from other groups. That is bias, and that is exactly why we need to build an AI-native DevSecOps framework.

Let me talk about the AI-native DevSecOps framework. It comes with several layers: secure data ingestion, privacy-preserving training, automated security testing, ethical governance gates, protected inference, and finally continuous monitoring. We should not walk away after deployment; we should keep monitoring why the model is approving or denying, for example, a loan application, for at least several months, until the product becomes stable. As a first taste of what these layers look like in practice, below is a minimal sketch of the kind of data integrity gate that could have caught the poisoned images in the retail example.
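The sketch below is only illustrative, not part of the talk's slides. It assumes a hypothetical directory layout (`data/train`) and a hypothetical `train_manifest.json` of SHA-256 hashes published by a trusted data owner; the pipeline stage refuses to train on any file that is new, missing, or modified relative to that manifest.

```python
import hashlib
import json
import sys
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large datasets never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_training_data(data_dir: str, manifest_path: str) -> bool:
    """Compare every training file against the hashes recorded in a trusted manifest.

    Any unexpected, missing, or tampered file (e.g. a manipulated image slipped
    into the dataset) fails this stage before training ever starts.
    """
    manifest = json.loads(Path(manifest_path).read_text())  # {"relative/path.png": "sha256..."}
    actual = {
        str(p.relative_to(data_dir)): sha256_of(p)
        for p in Path(data_dir).rglob("*") if p.is_file()
    }
    unexpected = set(actual) - set(manifest)
    missing = set(manifest) - set(actual)
    tampered = {name for name in set(manifest) & set(actual) if manifest[name] != actual[name]}
    for problem, files in [("unexpected", unexpected), ("missing", missing), ("tampered", tampered)]:
        for name in sorted(files):
            print(f"[data-integrity] {problem}: {name}")
    return not (unexpected or missing or tampered)


if __name__ == "__main__":
    # Example CI/CD usage: exit non-zero so the pipeline stops before training.
    ok = verify_training_data("data/train", "data/train_manifest.json")
    sys.exit(0 if ok else 1)
```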
Let me break these layers into smaller chunks. First, secure data ingestion. Trust starts with data ingestion: we should only ingest data from sources we trust, and we need cryptographic verification of those data sources. We also need real-time anomaly detection and automatic PII detection and redaction; if the data contains personal information such as email addresses, we need to redact or mask it. We need to enable audit logs and track data lineage. Just to underline why lineage matters: one enterprise in the US traced a model failure back to a single corrupted CSV file. Nobody knew where that file came from; it should have come from a trusted source, but because of that corrupted file they had a model failure.

Next, privacy-preserving training. Social media, telecom, and financial companies all deal with sensitive data, so we need to apply multiple privacy techniques. First, differential privacy, which adds noise during training so the model cannot memorize individuals; it learns from the population rather than from any single person. Second, federated learning, where the data stays at the source and only the model updates travel; think of it as learning from ten hospitals without ever moving patient data. Third, homomorphic encryption: training on encrypted data, so nobody ever sees the plaintext. Banks, for example, collaborate on fraud detection this way; they share only encrypted patterns instead of the raw customer data. That's the power of privacy-preserving AI.

Next, security testing for machine learning pipelines. We bring CI/CD automation with multiple layers of testing: adversarial testing that generates hostile inputs, static code analysis, model hardening, input validation, and dependency scanning for the ML frameworks. These stages make sure that issues are caught, and can be fixed, before deployment; a minimal sketch of such a gate follows below.
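Here is a small, self-contained sketch of that idea using scikit-learn on synthetic data. The model, the noise budget, the 10-point threshold, and the use of random perturbations as a cheap stand-in for a real adversarial attack are all assumptions made for the example; the point is the CI gate, not the attack itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def perturbation_accuracy(model, X, y, epsilon: float, trials: int = 20) -> float:
    """Worst observed accuracy when inputs are nudged by random noise of size epsilon."""
    rng = np.random.default_rng(0)
    worst = 1.0
    for _ in range(trials):
        noise = rng.uniform(-epsilon, epsilon, size=X.shape)
        worst = min(worst, model.score(X + noise, y))
    return worst


if __name__ == "__main__":
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    clean = model.score(X_test, y_test)
    robust = perturbation_accuracy(model, X_test, y_test, epsilon=0.2)
    print(f"clean accuracy:  {clean:.3f}")
    print(f"robust accuracy: {robust:.3f}")

    # CI gate: fail the pipeline if perturbed accuracy drops more than 10 points.
    assert clean - robust <= 0.10, "Model failed the adversarial robustness gate"
```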
That is how we avoid those breaches and the customer complaints. Let me move to the next one: protecting inference endpoints, which is about defense against real-time threats. Once the model is deployed, we have to secure it against model extraction, bot-driven inference abuse, and prompt injection. There are multiple ways to do this; one common approach is to put a web application firewall and security protocols in front of the endpoint. The firewall inspects each incoming request, checks that it comes from a trusted source and not from bots, examines the request headers, and filters out anything malicious, such as SQL injection or malformed requests. That's how we protect the endpoints.

Next, governance by design, with ethical AI checkpoints. Security alone is not enough; we must build ethics and accountability into the pipeline. We need governance gates: a pre-training ethical review, and bias testing across demographic groups, not just within one isolated group. And at least for some decisions we should keep human-in-the-loop approval. To give one example: in a recent deployment, the bias test revealed that the model performed twenty percent worse for a minority group. The data scientists who built the model did not know why it went wrong, and neither did the business, but the root cause was the training data: they had not provided enough training data for those minority groups.

To scale governance, we can use policy as code and blockchain-backed audit trails. We codify the policies, keep the rules under version control, and implement automated compliance validation so the policies are enforced across every ML pipeline. Adding a blockchain-style trail gives us immutable audit logs that nobody can tamper with, transparent lineage, so we know exactly where the training data came from, and regulator-ready evidence; whenever a regulator asks a question, everything is recorded and the answer is ready. That is how we build trust across the company. A simple sketch of the tamper-evidence idea follows below.
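The talk describes blockchain-backed audit trails; the sketch below is a much simpler hash-chained, append-only log in plain Python that illustrates the same tamper-evidence idea. The field names and events are made up for illustration. Each entry commits to the hash of the previous one, so editing any earlier entry breaks the chain.

```python
import hashlib
import json
import time


class AuditTrail:
    """Append-only, hash-chained audit log: a lightweight stand-in for a
    blockchain-backed trail."""

    def __init__(self):
        self.entries = []

    def record(self, event: dict) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"timestamp": time.time(), "event": event, "prev_hash": prev_hash}
        payload = json.dumps(body, sort_keys=True).encode()
        body["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(body)

    def verify(self) -> bool:
        """Recompute every hash; tampering with any earlier entry is detected."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: entry[k] for k in ("timestamp", "event", "prev_hash")}
            if entry["prev_hash"] != prev_hash:
                return False
            if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


if __name__ == "__main__":
    trail = AuditTrail()
    trail.record({"stage": "bias_test", "result": "pass", "model": "loan-approval-v3"})
    trail.record({"stage": "human_review", "approver": "risk-team"})
    print("intact:", trail.verify())              # True

    trail.entries[0]["event"]["result"] = "fail"  # simulate tampering
    print("after tampering:", trail.verify())     # False
```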
Because of this, a healthcare company recently implemented the strategy above, the DevSecOps framework we discussed in the previous slides. The results: zero security incidents, much faster deployments through automated CI/CD and compliance pipelines, successfully completed audits such as HIPAA and SOC, and higher trust across their healthcare systems. This shows that once we apply DevSecOps to AI innovation, with multiple safeguards, we can reap the benefits.

So what strategies can every organization adopt, based on these real-world success stories? Start with an AI risk assessment. Integrate adversarial and bias testing into CI/CD. Keep a human in the loop at least until the model becomes stable. Implement policy as code, enable cross-functional collaboration, deploy runtime monitoring for inference, and maintain immutable audit trails so nobody can tamper with them.

And what are the key takeaways? Security and ethics cannot be bolted onto AI systems; they must be embedded. That is how we build AI that is secure, private, fair, trustworthy, transparent, and ready for production at scale. I think that's all I have. Thank you so much for joining me today. If you have any questions, please let me know. Thank you.

Rajkumar Sukumar

Lead Data/AI Engineering @ AT&T Services Inc



