Conf42 DevSecOps 2022 - Online

DevSecOps for AI: Introducing MLSecOps for Software 2.0


AI algorithms are vulnerable by design and companies are catastrophically unprepared to defend their AI products from cyber threats. AI security expert reveals how to protect the AI development lifecycle and prevent the AI Apocalypse in the near future with many tech companies becoming AI companies.


  • In the next five to ten years, most tech companies will become AI companies. Adversa was the first commercial company researching AI vulnerabilities. The talk covers the security threats to large-scale AI systems and how to implement security in AI pipelines.
  • The first stage is planning, where you collect all types of requirements to understand what you are building. What is often overlooked during the planning stage is an understanding of risks and security requirements. Not knowing your data and business requirements can get you into big trouble, especially in regulated industries.
  • Production-level machine learning is a new area for many teams. One of the problems most unique to AI security is non-robust model training. The most common problem in productizing and protecting AI systems is that they're barely tested.
  • The most common problem in monitoring is a lack of monitoring in the first place. You should start with activity logging and usable, actionable dashboards. The most advanced step is automated detection and, where possible, automated response.
  • Every AI system is vulnerable by design and it expands the attack surface of software. Traditional security solutions cannot protect AI systems. You should think about securing the entire AI system, not just an AI algorithm. Start building this internal infrastructure and best practices sooner.


This transcript was autogenerated. To make changes, submit a PR.
Hey everyone, and thank you for watching my presentation. Today I'll tell you about the MLSecOps framework for protecting AI systems. Before I jump into details, I want to tell you the story of how I started in this field and witnessed the entire industry being born. Like many of you at this event, I am a cybersecurity guy. For more than ten years I've been doing security research in various product startups. After a few years in application security, I became a product leader for security monitoring and threat detection. We decided to use machine learning for enterprise security monitoring, and we were quite successful with our models for insider threat detection. But closer to the release, we tested our product for vulnerabilities and realized how easy it was to fool our own security models. So we started diving deeper into the world of AI vulnerabilities and discovered an entire field called adversarial machine learning. During that period I was interested in DevSecOps practices, and at the same time I was building an MLSecOps engine for our security platform. So I coined the name MLSecOps, which followed naturally from my two interests, to separate two work streams: one for operationalizing our security models and another for protecting our own models. Then I witnessed how the field of AI security grew, with many new attacks, tools, and regulations. So we decided to start a research company focused exclusively on AI vulnerabilities, and we founded Adversa in 2019. We were the first commercial company researching AI vulnerabilities, and we also did many other things that could be called firsts in the industry. We started on a mission to protect AI from cyber threats by applying the latest attacks from academia to real-world industry systems.
I won't go into details here, but if you're interested in this topic, I encourage you to visit our website, where you can read a lot of research materials. In this presentation, I'm summarizing my ideas from years of research. I'll tell you why AI is almost certainly the future of software, what the security threats to large-scale AI systems are, and how to implement security in AI pipelines, all while shifting security left, with the MLSecOps framework as an example. So why do I call AI the most likely future of software? Well, you can already find many exciting AI applications in almost every industry, and many of them offer quality improvements and large-scale automation. A simple example is that AI is widely used in cybersecurity: instead of writing regular expressions and signature rules, we can use machine learning to decide whether a binary is malicious or benign, or whether security events are normal or anomalous. I believe that in the next five to ten years, most tech companies will become AI companies, similar to how many traditional businesses had to go online and become tech companies just to stay relevant. AI also enables completely new ideas, like generating visuals or even writing real programs. And many unexpected things can happen at the intersection, like turning speech into text, then that text into visual art, and then those visuals into code that becomes a really well-designed web page. Just as the most popular applications we use today didn't exist ten to twenty years ago, in the coming years a similar shift will happen, with new types of companies that we cannot even predict now. So why does this happen? If we dive deeper, we can see that AI offers not just cool applications but a paradigm shift: an entirely new way of developing software. With traditional software, we create clear algorithms and behaviors with programming languages.
But with AI in the form of machine learning, we can train the algorithms: instead of defining all the rules, we reduce the number of instructions and replace them with a trained model. So a deterministic algorithm designed by software developers is replaced or augmented with a probabilistic algorithm designed by AI developers. Of course, this comes with new roles and processes. For example, data engineers collect and prepare data, then data scientists run experiments and train models, then machine learning engineers focus on model packaging and productizing, and finally operations teams handle deployment and monitoring. This requires a whole new process, called MLOps. There are many similarities with DevSecOps and even some shared infrastructure and tools. But if we zoom in, we can see that models and data sets introduce new types of artifacts, like data set files, model weights, model code, metrics, experiments, and so on. All of these things increase complexity and could be potential attack vectors, so attacks could target data, models, or the system in general. The field of attacks against AI is called adversarial machine learning. As you can imagine, with so much value locked in AI systems, they are a very valuable target for cyberattacks. In fact, attacks against AI are already happening: you can read in the media or in incident databases how hackers bypassed malware detectors, spam filters, or smart firewalls. In our own penetration testing projects, attacks on AI systems have had a 100% success rate in industries such as automotive, biometrics, internet, and finance. Just recently, we at Adversa, together with our partners, organized the MLSec AI hacking competition. My team created the facial recognition track, and all of the top ten contestants used very different strategies and all of them successfully fooled the AI models.
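To make the contrast between a hand-written rule and a trained model concrete, here is a minimal Python sketch. The task (flagging a user by failed logins per hour), the data, and the threshold are all hypothetical; the learned model is a one-feature logistic regression fitted with plain gradient descent, chosen only because it fits in a few lines.

```python
import math

# Hypothetical toy data: (failed_logins_per_hour, label), where 1 = suspicious.
DATA = [(0, 0), (1, 0), (2, 0), (3, 0), (8, 1), (9, 1), (12, 1), (15, 1)]


def rule_based(failed_logins: int) -> bool:
    """Software 1.0: a developer hand-writes the decision rule."""
    return failed_logins > 5


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(data, lr=0.05, epochs=20000):
    """Software 2.0: learn a weight and bias from labeled examples
    with plain batch gradient descent on the logistic loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def learned(failed_logins: float, w: float, b: float) -> float:
    """The trained model returns a probability, not a hard yes/no."""
    return sigmoid(w * failed_logins + b)


w, b = train_logistic(DATA)
for x in (2, 10):
    print(x, rule_based(x), round(learned(x, w, b), 3))
```

Both versions agree on these inputs, but the second one's behavior comes from the training data rather than from code, which is why data and model artifacts become part of the attack surface.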
If you're interested in learning more about this contest, you can read the blog post with the technical details. In adversarial machine learning today, there are over 5,000 research papers about attacking and defending AI systems. In this presentation we don't look at individual attacks, so if you're interested in details, you can watch one of my previous talks that covers the past ten years of AI vulnerability research. But I want to give you a quick idea of such attacks so you know what we should protect against. The most common AI threats that we see are infection, manipulation, and exfiltration. Infection happens at the development stage, when attackers maliciously modify training data with poisoning attacks or simply trick developers into using infected models. Manipulation happens at runtime, when attackers fool model decisions or bypass detections with adversarial examples, or simply perform a denial-of-service attack. Exfiltration happens at runtime as well, when attackers steal model algorithm details or extract private data from the model with various types of inference attacks. If you want to know more, you can read our report that covers all known AI vulnerabilities and explains them from different angles: industries, applications, data types, and so on. I want to conclude this section by reminding you that AI essentially expands the attack surface. On the one hand, AI systems are just another example of software, so you can expect them to have all the traditional types of vulnerabilities, such as input validation problems, access control issues, or misconfigurations. On the other hand, AI systems have new types of behaviors, so on top of the traditional vulnerabilities we should worry about unique AI threats like infection, manipulation, and exfiltration. So, as you can see, the security of AI is a fundamentally new problem, and current application security and other security solutions cannot help with AI vulnerabilities.
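As an illustration of manipulation with adversarial examples, the sketch below applies the fast gradient sign method (FGSM) to a hypothetical linear malware detector. The weights, bias, and feature vector are invented for the example, and real detectors are nonlinear and constrain which features an attacker can change, so treat this as a toy that shows the idea: small, bounded input changes chosen along the model's gradient can flip its decision.

```python
import math

# Hypothetical linear malware detector: P(malicious) = sigmoid(w . x + b).
# The weights, bias, and feature vector are made up for illustration.
WEIGHTS = [2.0, -1.0, 3.0, 0.5]
BIAS = -2.0


def malicious_probability(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_evasion(x, epsilon):
    """Fast gradient sign method (FGSM) against the linear model above.

    The gradient of the logit with respect to the input is just WEIGHTS, so
    stepping each feature by -epsilon * sign(w) lowers the malicious score as
    much as possible while changing no feature by more than epsilon.
    """
    return [xi - epsilon * sign(w) for xi, w in zip(x, WEIGHTS)]


sample = [1.0, 0.0, 1.0, 1.0]
adversarial = fgsm_evasion(sample, epsilon=0.6)
print(round(malicious_probability(sample), 3))       # flagged: above 0.5
print(round(malicious_probability(adversarial), 3))  # evades: below 0.5
```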
So new approaches are needed. But despite these unique problems, the core security principles still apply. Security should enable AI teams to build and scale AI software reliably and with more trust. Because AI is fundamentally vulnerable, when you scale it without security, you actually scale problems for your company. Instead, AI teams should be confident in AI reliability because it's backed by preparation and security processes, and then they can scale it successfully. Just as security shouldn't slow down AI development, manual model validation shouldn't slow down deployment, so automation, playbooks, and best practices are also required here. Essentially, we want to build security into the way software is built, not add it after software is built. That's called shifting left, because we start doing security earlier in the pipeline, which is on the left side. This idea is also very practical because, for AI systems, it's deeply connected to the complexity added at each stage. For instance, if at a later stage we decide that the model should be fixed with retraining, then we have to redo all the stages: starting from data packaging, redoing all the experiments, choosing the best model, and so on. So the earlier you start, the more reliable the AI systems you can build, because you dramatically decrease the number of problems at later stages. It's also beneficial to combine security controls at different stages of the model lifecycle. Now I want to give you a high-level overview of my MLSecOps framework. I'll focus more on concepts than on technologies, so that you can build intuition about this topic based on your knowledge of DevSecOps; later you can do your own research on specific tools. First, I will visualize the original MLOps tasks this way.
Then I will highlight key security problems like this, and finally I will suggest solutions for the MLSecOps pipeline. The first stage is planning, where you collect all types of requirements to understand what you are building and why, including business and technical requirements, cross-functional dependencies across departments or even across companies, and internal and external policies, regulation, and compliance. The final outcome of this stage is a set of plans and policies shaping all the later stages. What is often overlooked during the planning stage is an understanding of risks and security requirements. If you don't do a risk assessment, you don't know which assets are worth protecting, what you want to protect them from, and what risks you can accept. Sometimes you won't even think about what assets you'll create during development, and therefore you won't be able to plan protection for them. If you don't do threat modeling, you'll be blind to what attackers can do to your system. This step doesn't require any advanced knowledge about attacking algorithms or the ability to perform such attacks. It is about finding the weakest links and bottlenecks in the AI system design, in how users interact with the AI, and in how AI outputs can impact users or business decisions. Finally, if you don't manage security, you basically don't have it. Depending on the scale, this could be a team-level or system-level security process, or a wider program affecting multiple teams, departments, or the entire company. When you start implementing the practices of risk assessment, threat modeling, and security governance, you'll start producing artifacts that affect the entire MLSecOps pipeline. For example, a risk register helps you track AI risks and mitigations, keeps you up to date with the latest potential threats, and helps you make decisions later during development.
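A risk register can start as a simple data structure. This is a minimal sketch under our own assumptions: the fields, the three-level scale, and scoring a risk as likelihood times impact are a common convention, not something the MLSecOps framework prescribes, and the asset names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Risk:
    asset: str        # what we protect: a data set, model, or endpoint
    threat: str       # e.g. "poisoning", "evasion", "model extraction"
    likelihood: Level
    impact: Level
    mitigations: List[str] = field(default_factory=list)
    accepted: bool = False  # explicitly accepted by the risk owner

    @property
    def score(self) -> int:
        return int(self.likelihood * self.impact)


def open_risks(register, min_score=4):
    """Risks that are neither mitigated nor accepted, highest score first."""
    pending = [
        r for r in register
        if not r.mitigations and not r.accepted and r.score >= min_score
    ]
    return sorted(pending, key=lambda r: r.score, reverse=True)


register = [
    Risk("fraud-model-v1", "evasion", Level.HIGH, Level.HIGH),
    Risk("training-set-2022", "poisoning", Level.MEDIUM, Level.HIGH, ["hash manifest"]),
    Risk("scoring-api", "model extraction", Level.MEDIUM, Level.MEDIUM, accepted=True),
]
for risk in open_risks(register):
    print(risk.asset, risk.threat, risk.score)  # fraud-model-v1 evasion 9
```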
Knowing your attack surface means having all the parts of your models, data sets, and user interactions analyzed for attack scenarios; this could be linked to the risk register or tracked in the model management system with notes and references. Having security baselines means that you know how to quantify security risks and can decide whether these metrics mean a go or a no-go for deployment. The next stage is data collection and preparation. The process starts with understanding the business requirements and what data should be collected for them, then understanding the existing data and whether something is missing. Then, according to the requirements, the data is collected, cleaned, and enriched; alternatively, it could be sourced from data vendors, partners, or contractors. Finally, all this data is packaged for reuse later in the pipeline. So the outcome of this stage is essentially releasing a data set as a little product that will be used for model training later. Clearly, not knowing your data and business requirements can get you into big trouble, especially in regulated industries. You should be very knowledgeable about the structure of your data. Very often you have private information that is absolutely not required for model training but brings many privacy and compliance risks. The next common problem is unreliable data sources, which is essentially supply chain risk. Regardless of whether you collected the data on your own or sourced it from data vendors, often you can't guarantee that the data collection methods were correct, that the data was collected without violations, or that the actual data matches the data specification. Another, more advanced risk is data poisoning, which injects malicious entries into the data set. It could happen through compromised data set providers or by injecting individual examples that eventually end up in the data set.
So the risk is that you don't verify the integrity of individual entries and don't, or cannot, confirm that the data sets were not maliciously modified. The main overall principle here is careful curation of data sets. For data privacy, the main piece of advice is to think about whether you need this private data at all. Often it's absolutely not required for model training, so sensitive entries can simply be removed. If that's not feasible, private details can be anonymized or tokenized, and sometimes other methods like differential privacy can also help. To address supply chain risks and ensure data integrity, you should use reliable data sources, ones that you can control or verify, and the data should have verifiable quality based on specifications, which you can check with individual entries, metadata, and metrics. Finally, against data poisoning, you should avoid inconsistencies in your data. That starts with filtering out anomalous entries, which also catches many malicious entries, since those are often anomalous too. You should also use secure data set version control to ensure that the data was not maliciously modified after you previously checked that it was correct. The next stage is model building. At this stage, we take a specific data set version and start building a model, including experiments with processing the data in different ways, known as feature engineering, and experiments with model parameters and architectures. It's important to remember that we are working with a data set that was supplied earlier, so technically it's an external artifact for the model. Also, in many cases, for highly effective big neural networks there are already pretrained models, so the model's "brains" could be external too. This model building stage essentially becomes writing code that connects the data set and the model.
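One way to get the integrity guarantee behind secure data set versioning is a manifest of cryptographic hashes, recorded when the data set is released and re-checked before training. The sketch below uses only the Python standard library; the file name and contents are hypothetical, and in practice the manifest itself has to live somewhere attackers cannot write to, such as a signed commit or a protected artifact store.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data set files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's relative path to its SHA-256 digest."""
    root = Path(data_dir)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(data_dir, manifest):
    """List files that were modified, added, or removed since the manifest was built."""
    current = build_manifest(data_dir)
    modified = {f for f in manifest.keys() & current.keys() if manifest[f] != current[f]}
    added = current.keys() - manifest.keys()
    removed = manifest.keys() - current.keys()
    return sorted(modified | added | removed)


# Demo with a hypothetical one-file data set.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "train.csv").write_text("x,label\n1,0\n")
    manifest = build_manifest(tmp)
    untouched = verify_manifest(tmp, manifest)
    Path(tmp, "train.csv").write_text("x,label\n1,1\n")  # simulated label flip
    tampered = verify_manifest(tmp, manifest)

print(untouched, tampered)  # [] ['train.csv']
```

A hash manifest only proves the data has not changed since it was reviewed; it does not detect poisoned entries that were already present, which is what the anomaly filtering above is for.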
At the next stage, we focus on training the model and hoping that the results meet our expectations. Otherwise, we repeat experiments with features and architectures and train again until the results comply with the business requirements. Finally, for model packaging, we finalize the code from the earlier experiments and model training, and if necessary convert it from Jupyter notebooks to production-ready code. The outcome of this entire stage is that the model is ready and can be reused: it can be imported into the main application and used just like any other module. This stage is probably the most dangerous one, because it's so easy to make expensive mistakes here. Let's start with the model itself. If you're using an external model, you should be really mindful of where you take it from. Just like with external data sets, you can't just download a big model from GitHub and hope that it's reliable. It's becoming quite common to use models released by big corporations like Google and Microsoft, who spent millions of dollars training them, and it's a very realistic scenario for cybercriminals to spread malicious models that look benign. If the model is created from scratch, it's also quite easy to get into trouble. Production-level machine learning is a new area for many teams, and sometimes, due to a lack of resources or a misunderstanding of roles, data scientists are expected to write secure, production-ready code, which is often disconnected from reality. On top of this, many tools and frameworks emerged from academia and were not designed for production or with security in mind. One of the most widespread examples is loading models from Python pickle files: pickle is an object serialization format, and unpickling untrusted files is a well-known insecure practice because it can execute arbitrary code. Also, core machine learning frameworks like TensorFlow and other libraries may have their own vulnerabilities.
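Because unpickling can execute arbitrary code, Python's own pickle documentation suggests restricting which globals an unpickler may load by overriding Unpickler.find_class. The sketch below follows that pattern; the allowlist and the harmless Payload stand-in (which references os.getcwd instead of something destructive) are assumptions for illustration. An allowlist reduces risk but is not a full guarantee, so weight formats that store only tensors, such as safetensors, are a safer default where your framework supports them.

```python
import io
import os
import pickle

# Hypothetical allowlist: the only globals a model file may reference.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Refuse to resolve any global that is not explicitly allowlisted."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    """Stand-in for a malicious pickle: on load it would call os.getcwd().
    A real attacker would reference something like os.system instead."""

    def __reduce__(self):
        return (os.getcwd, ())


malicious_blob = pickle.dumps(Payload())
try:
    restricted_loads(malicious_blob)
    blocked = False
except pickle.UnpicklingError as err:
    blocked = True
    print("rejected:", err)

# Plain containers of numbers need no globals, so they still load.
weights = restricted_loads(pickle.dumps({"layer1": [0.1, -0.2]}))
print(weights)
```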
Finally, one of the problems most unique to AI security is non-robust model training, which enables model manipulation and data exfiltration attacks. There are various fundamental design flaws of neural networks, like weak decision boundaries and a lack of diverse data examples, that lead to non-robust learning. To address all these risks, the main principle is to treat external models like external code and conduct your own due diligence. For model integrity and supply chain risks, you should use only reliable sources of pretrained models to reduce the risk of backdoors, and validate that the models you downloaded match the hashes of the original models. You should also use secure model version control to ensure that models were not maliciously modified, because it's easy to hijack model artifacts, especially when you work in different environments and switch back and forth between experimenting and productizing. Next, secure coding is the closest to traditional application security. Model code is often written by experts in data science who are not necessarily experts in software engineering, and the most common place where problems arise is converting experiment code into production code. You should avoid known unsafe functions like insecure pickling and object deserialization, use only known good libraries and versions, and check dependencies for vulnerabilities. Finally, for robust and secure machine learning, you should keep algorithm-based threat scenarios in mind. If your primary concern is manipulation of decisions with adversarial examples, you should consider training the model with adversarial examples. If you care more about privacy, you could try federated learning or even training on encrypted data. The next stage is model validation.
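Checking that dependencies match known good versions can be automated. This sketch compares installed package versions against a reviewed allowlist using importlib.metadata from the standard library; the package names and version numbers are placeholders. It only detects drift from what you reviewed: finding known vulnerabilities in those versions is a separate step, for example with a scanner such as pip-audit.

```python
from importlib import metadata

# Hypothetical allowlist of reviewed ("known good") package versions.
PINNED_VERSIONS = {
    "numpy": "1.23.5",
    "example-unreviewed-package": "0.1.0",
}


def check_pins(pinned):
    """Report packages that are missing or differ from the reviewed version."""
    problems = []
    for package, wanted in sorted(pinned.items()):
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed")
            continue
        if installed != wanted:
            problems.append(f"{package}: installed {installed}, reviewed {wanted}")
    return problems


for problem in check_pins(PINNED_VERSIONS):
    print(problem)
```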
At this stage, we take the model built in the previous stage and conduct model evaluation and acceptance testing in terms of model accuracy and business requirements. If it meets the target baseline, that's usually it. In some rare cases, we see robustness testing, mostly focused on non-standard inputs or backtesting. Our experience shows that the most mature industries are banks in the United States, because of model risk management regulations, and self-driving companies, because of the obvious safety concerns; we also see some biometric and internet companies fighting real attacks. Finally, some companies do compliance checks, but again mostly in regulated industries. The outcome of this stage is essentially a green or red light for deployment based on test reports with evaluation metrics. The most common problem in productizing and protecting AI systems is that they're barely tested, because there is very little knowledge and there are no actionable frameworks or best practices. It's very rare that testing goes beyond model performance metrics. Even when there is robustness testing, we still see a few problems: not all models are covered by tests, there may be a limited number of tests or limited test depth, and testing is often a formality, even when companies do it for regulatory or safety reasons. It's even rarer that they do adversarial testing, which is obviously very dangerous. Another common problem is that AI relies heavily on traditional infrastructure and software environments. Unless companies use a proprietary end-to-end platform, it's very common to put models into Docker containers and host them somewhere in the Amazon cloud. As you know, many containers are vulnerable by default, and cloud providers are not responsible for securing what you deploy on them. There are several ways to address this.
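The green or red light at the end of validation can be expressed as an automated gate that compares measured metrics with the security baselines agreed during planning. In this sketch the metric names, comparison directions, and thresholds are hypothetical; a missing measurement counts as a failure so that an untested model cannot pass by default.

```python
import operator

# Hypothetical baselines agreed during planning; names and numbers are illustrative.
BASELINES = {
    "clean_accuracy": (operator.ge, 0.95),
    "adversarial_accuracy": (operator.ge, 0.70),
    "pii_fields_in_training_data": (operator.le, 0),
}
SYMBOLS = {operator.ge: ">=", operator.le: "<="}


def deployment_gate(metrics):
    """Return (go, failures): go only if every baseline is measured and met."""
    failures = []
    for name, (compare, threshold) in BASELINES.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: not measured")
        elif not compare(value, threshold):
            failures.append(f"{name}={value}, required {SYMBOLS[compare]} {threshold}")
    return not failures, failures


go, failures = deployment_gate(
    {"clean_accuracy": 0.97, "adversarial_accuracy": 0.41, "pii_fields_in_training_data": 0}
)
print("GO" if go else "NO GO", failures)
```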
To improve testing coverage, you really need security governance: testing should be informed by asset registers of models and data with assigned risk levels, and by the security specification, including the attack surface and threat models created before development started. It should also check compliance with functional specifications like model cards, datasheets, or service information. All of these artifacts define the AI system's behavior and should shape the scope of security testing. For better security validation, you should have testing playbooks. The scenarios I can suggest start with basic adversarial testing as a sanity check: if the model fails the most basic tests, you probably shouldn't test further. Then you can use a bigger repository of the most common attacks, similar to the OWASP Top Ten in application security, and finish with custom, threat-based scenarios, a kind of real-world penetration testing. And of course you need to secure the environment and infrastructure, as for any other application. This depends a lot on your AI system design, but commonly you should scan containers for known vulnerabilities, be conscious of any external dependencies in the system, and ideally scan all your infrastructure as code for security and compliance. The next stage is model deployment. If the validation stage gave us a green light, the model code with fixed weights, probably packaged in a container, is deployed and ready to accept connections. Here, model inference is essentially the model runtime processing inputs and responding with outputs that AI users incorporate into their decision making, and by model serving I mean the general operations of the AI system. So the final outcome of this stage is that a concrete model is deployed, the model behaves as expected, and the system is in good condition. Let's see what can go wrong here.
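The tiered playbook described above (basic sanity checks first, then common attacks, then custom threat-based scenarios) can be organized as a runner that stops at the first failing tier. Everything model-specific here is hypothetical: the toy threshold classifier, a random-noise stability check as the sanity test, and a single bounded perturbation as a stand-in for a real attack library.

```python
import random


def toy_model(x):
    """Hypothetical classifier under test: class 1 if the feature sum exceeds 1."""
    return int(sum(x) > 1.0)


def sanity_noise_stability(model, trials=200, noise=0.01, seed=0):
    """Basic tier: tiny random noise should not flip clear-cut decisions."""
    rng = random.Random(seed)
    for x in ([0.1, 0.1], [2.0, 2.0]):
        expected = model(x)
        for _ in range(trials):
            noisy = [v + rng.uniform(-noise, noise) for v in x]
            if model(noisy) != expected:
                return False
    return True


def bounded_shift_attack(model, epsilon=0.3):
    """Common-attack tier stand-in: shift a borderline input by epsilon per feature."""
    x = [0.6, 0.6]
    shifted = [v - epsilon for v in x]
    return model(shifted) == model(x)  # True means the decision survived


def run_playbook(model, tiers):
    """Run tiers in order and stop at the first tier with a failing test."""
    report = []
    for tier_name, tests in tiers:
        results = {test.__name__: test(model) for test in tests}
        report.append((tier_name, results))
        if not all(results.values()):
            break
    return report


TIERS = [
    ("basic sanity", [sanity_noise_stability]),
    ("common attacks", [bounded_shift_attack]),
    ("custom threat scenarios", []),  # derived from your own threat model
]

report = run_playbook(toy_model, TIERS)
for tier_name, results in report:
    print(tier_name, results)
```

In this run the toy model passes the sanity tier, but the bounded shift flips a borderline decision, so the runner stops before the custom scenarios, extending the same stop-on-failure rule to every tier.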
First of all, the model that is actually deployed could be different from what you wanted to deploy. Before deployment, the model could have been maliciously modified in the model store by changing its code or replacing its weights, so you could deploy a Trojan instead of the model you developed. The next problem is the classic problem of adversarial attacks. The three main attack groups here are manipulation attacks, which include evasion, reprogramming, and resource exhaustion (essentially a denial-of-service attack); exfiltration attacks, which steal model algorithm details or extract private data; and infection attacks. Even though infection is usually considered a training-time attack, infected examples can also be received during model runtime, for example when the system collects all inputs and later uses them to update data sets. Finally, unrestricted access can help attackers develop various types of attacks. This could be the absence of any access controls or of user accounts at all, unlimited model queries, unrestricted data submissions, and many other things. So what can we do here? You obviously should check model authenticity. Depending on the pipeline configuration, this could be verification of model artifacts from the model store or code hosting, or a wider model governance approach where you track the entire model lifecycle in a single platform. On top of this, you should control who can roll out models to production. To protect against attacks on AI algorithms, you should keep your threat model in mind, because it's impossible to have a model that is fully secure against every existing attack, but it is possible to harden the model against concrete threat scenarios.
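For model authenticity, one option is to have the training pipeline sign the released artifact and the deployment step verify that signature before loading it. The sketch uses an HMAC with a shared secret from the Python standard library; the key and the byte string standing in for model weights are placeholders. Asymmetric signatures would let the deployment side verify without being able to sign, which is usually preferable, but the verify-before-load flow is the same.

```python
import hashlib
import hmac


def sign_artifact(data: bytes, key: bytes) -> str:
    """Training pipeline side: HMAC-SHA256 tag over the released artifact."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_artifact(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Deployment side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_artifact(data, key), expected_tag)


# Placeholders: load the key from a secrets manager and the bytes from the model store.
SIGNING_KEY = b"replace-with-a-real-secret"
released_weights = b"\x00\x01\x02 weights exported by the training pipeline"
tag = sign_artifact(released_weights, SIGNING_KEY)

swapped_weights = released_weights.replace(b"\x02", b"\x03")  # simulated tampering
print(verify_artifact(released_weights, SIGNING_KEY, tag))  # True
print(verify_artifact(swapped_weights, SIGNING_KEY, tag))   # False
```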
There are three main directions you can go here: secure training in the first place, with adversarial data sets or differential privacy; runtime defenses, like input preprocessing or output postprocessing; and what I call operational defenses, things like rate limiting. The last point is about secure communications and access controls. The first piece of advice is to control authorized usage based on the intended model usage, so you can introduce rate limits by design. Then, mutual authentication of requests and responses lets you check that requests come from eligible users and protects users from man-in-the-middle attacks, and encryption of model inputs and outputs in transit protects both parties from eavesdropping. The final stage is model monitoring. After the model starts producing predictions, you should monitor it carefully, not just because every serious system should be monitored, but also because of the non-deterministic nature of AI systems: a model can change its behavior over time or with unexpected inputs. That's why model performance monitoring is a good start. Common use cases include tracking prediction accuracy and detecting model drift, which indicates that the real-world data is now different from the data we trained on. Anomaly detection in model inputs and outputs, or in general request patterns, can be another indicator of model health. All of this should be looped back to the AI development teams for continuous improvement. The outcome of this stage is peace of mind: you know that things are working as expected. The most common problem here is a lack of monitoring in the first place, and I'm talking not just about basic logs but about usable monitoring infrastructure. There is no way you can do something useful with logs that are only written to API or proxy log files or saved in container storage.
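Model drift, mentioned above, is often measured by comparing the distribution of a feature or score in live traffic against the training data. This sketch computes the Population Stability Index (PSI), one common drift metric, over equal-width bins; the bin count, the synthetic Gaussian data, and the usual rule-of-thumb reading (below about 0.1 stable, above about 0.25 significant shift) are conventions and assumptions rather than part of the talk.

```python
import math
import random


def psi(reference, live, bins=10, eps=1e-6):
    """Population Stability Index of `live` relative to `reference`.

    Bins are equal-width over the reference range; live values outside that
    range are clamped into the first or last bin. eps avoids log(0).
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


rng = random.Random(42)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same_dist = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

print(round(psi(training, same_dist), 3))  # small: no meaningful drift
print(round(psi(training, shifted), 3))    # large: the mean moved by one std dev
```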
Another problem is analytical capability: when you have all those logs, do you really know what to look for, how to group or correlate incidents, and what is normal and what is not? And finally, with all those monitoring capabilities, are you passively watching or actively protecting your AI system? You should start with activity logging and usable, actionable dashboards that cover model performance and metrics, request metadata, and input and output errors, and you should also monitor requests at the user level for access control and behavioral anomalies. Once you have those monitoring capabilities, you need to add an analytics layer so you can act on the data. The basic advice is to have event scoring so you can filter events by importance and prioritize investigations. It's also useful to have error types so you can identify and group events for correlation. Ideally, you should have more advanced capabilities for incident forensics and traceability, like sessions, request hashes, and other types of behavior profiling inside your AI application. Finally, the most advanced step is automated detection and, where possible, automated response. For regular performance incidents, it could be as simple as alerting. For input threats or unexpected outputs, as with the recent prompt injection attacks, it could be safety filters or custom responses with custom errors. For classic adversarial attacks, it could be activating defenses, returning custom responses, or even blocking accounts. And of course, for all these activities it's important to have feedback loops, so that not only are AI system issues addressed immediately, but best practices and processes are updated and additional training is conducted. So now you can see the whole picture of what an MLSecOps pipeline should look like. Of course, I've simplified some things to introduce this topic for the first time.
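The step from logging to analytics to automated response can be sketched as a small triage function: each event gets a severity score, and the score decides whether to log, alert, return a safe fallback response, or block a repeat offender. The event kinds, severity values, and the three-strikes blocking threshold are hypothetical choices for illustration, and the request hash field stands in for the traceability data mentioned above.

```python
from dataclasses import dataclass

# Hypothetical severity scores per event kind (1 = low, 3 = high).
SEVERITY = {
    "latency_spike": 1,
    "input_schema_error": 2,
    "prediction_drift": 2,
    "prompt_injection_suspected": 3,
    "adversarial_probe_suspected": 3,
}


@dataclass
class Event:
    user: str
    kind: str
    request_hash: str  # kept for forensics and traceability


def respond(event, strikes, block_after=3):
    """Choose an automated response; repeat high-severity offenders are blocked."""
    severity = SEVERITY.get(event.kind, 1)  # unknown kinds default to low
    if severity >= 3:
        strikes[event.user] = strikes.get(event.user, 0) + 1
        if strikes[event.user] >= block_after:
            return "block_account"
        return "safe_fallback_response"
    if severity == 2:
        return "alert_on_call"
    return "log_only"


strikes = {}
events = [
    Event("alice", "latency_spike", "a1"),
    Event("mallory", "adversarial_probe_suspected", "m1"),
    Event("mallory", "adversarial_probe_suspected", "m2"),
    Event("mallory", "adversarial_probe_suspected", "m3"),
    Event("bob", "prediction_drift", "b1"),
]
actions = [respond(e, strikes) for e in events]
print(actions)
```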
It's also worth mentioning that I didn't cover things related to traditional cybersecurity that are obviously also important. For instance, work with data and models happens on data scientists' machines, so it's important to secure those workspaces. They often use collaboration tools like Google Colab or Jupyter notebooks that have remote shared access and could expose the operating system and other critical system functions, or provide direct access to data. There are known ransomware attacks against Jupyter setups. It's also a well-known cybersecurity problem that entire data stores can be exposed in misconfigured Amazon or Elasticsearch instances. The security of the pipeline itself is another important topic. What you should pay attention to is access control over who can commit code, publish models and data sets, pull artifacts, approve releases, promote models for deployment, and so on. And of course, secure configuration of machine learning infrastructure, like code hosting, feature stores, experiment trackers, and so on, is also important. After this presentation, I want you to remember a few key ideas. First, every AI system is vulnerable by design, and it expands the attack surface of software. There are already real attacks against AI systems in almost every industry, application, and data type, so you should deliberately work on protecting against these threats. Next, traditional security solutions cannot protect AI systems, so don't expect your firewalls or vulnerability scanners to solve the problem, because the problem of AI systems is unique. Last, you should think about securing the entire AI system, not just the AI algorithm, as is often discussed in the context of adversarial attacks. Also remember that defenses are often more operational than algorithmic. So start building this internal infrastructure and these best practices sooner.
If you work in DevSecOps and you have AI in your product or in any other product teams, you should definitely share your security concerns with AI developers, because they rarely know about the real threat landscape. If you find this presentation useful, please share it with your colleagues; I hope it will be useful for them too. So this is it. I appreciate your attention. If you're interested in this topic or in any form of collaboration on MLSecOps, search my name on LinkedIn or Twitter and drop me a message. Thank you.

Eugene Neelou

Co-Founder & CTO @ Adversa
