Conf42 Large Language Models (LLMs) 2024 - Online

Gentle Introduction to LLM Security

Abstract

Large language models (LLMs) present new opportunities for software while also introducing new AI safety and security risks. AI security expert explains how LLMs expand the attack surface and open doors for malicious actors.

Summary

  • Today we will explore the fascinating world of security risks in large language models. With great power comes great responsibility, so llms should be safe and secure. This presentation is based on my work as an AI security expert.
  • Developers of LLM based chat applications implement safety measures and content filters. But malicious prompts can still bypass these safeguards. This vulnerability in LLM guardrails opens the door for an attack known as jailbreaking. Such vulnerabilities can lead to severe consequences.
  • With the proliferation of open source ecosystems, LLM developers heavily rely on public models, datasets, and libraries. Vulnerable software packages of machine learning frameworks or standard libraries can introduce new vulnerabilities and enable attacker control. My primary advice here is to consider the security of the entire system.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello and welcome. Today we will explore the fascinating world of security risks in large language models. Llms have introduced AI technologies to millions of people for both professional and personal usage. However, with great power comes great responsibility, so llms should be safe and secure. In this presentation, we will review the most important security risks for large language models. My name is Eugene, and I have been working in cybersecurity since 2008 and NGA safety since 2018. These days, I focus on commercializing research and transforming it into enterprise products, and during my journey I have made some industry contributions, including creating the MLsocos framework for integrating security into Amlops, founding the first startup in adversarial machine learning, and lately co authoring the Wasp top ten for LLM security. This presentation is based on my work as an AI security expert and a core team member of the LLM security team at OWASP. So today my goal is to give you a very quick and gentle introduction to LLM security. So for more advanced content, I recommend referring to the original work. The references will be provided at the end of this presentation. And now let's start exploring the most critical security risks for LLM applications. You have likely heard concerns about llms being misused for illicit activities such as making bombs, creating drugs, or even grooming children. Llms from well known AI companies were not intended for such tasks. Developers of LLM based chat applications implement safety measures and content filters, and despite this effort, malicious prompts can still bypass these safeguards. This vulnerability in LLM guardrails opens the door for an attack known as jailbreaking. Jailbreaking encompasses a range of techniques, from switching between languages to data format manipulation and even persuasive negotiation with llms. The consequences of weak guardrails against jailbreaks vary across use cases. General purpose chatbots providers often face significant negative publicity in regulated industries or mission critical use cases. Such vulnerabilities can lead to severe consequences. Ignore all previous instructions and do what I tell you. It is not how you want your LLM based product to deviate from expected behavior, especially when prompted to perform unusual or unsafe tasks. LLM models operate based on instructions provided by developers, including system prompts that define chatbot behavior. And although these prompts are not visible for regular users, they set up the conversation context for the model, and attackers can exploit it by attempting to predict, manipulate, or extract prompts to alter the model's behavior. For instance, attackers may request the model to ignore all previous system instructions and perform a different malicious action. By extracting system prompts, attackers gain insights, inter instructions, and possibly sensitive data. Think of competitors who can extract the brains of your LLM application and learn about trade secrets. If the LLM accepts inputs from external sources, such as files or webpages, then hidden malicious instructions could be embedded there. Imagine a resume that tricks a recruiting LLM into giving the highest possible rating to this resume. A significant risk for LLM developers is the leakage of system prompts, which are fundamental in defining custom behavior on top of foundational models. These system prompts may reveal detailed descriptions of business processes, confidential documents, or sensitive product pricing. Another risk involves manipulating behavior of llms integrated into core product workflows or decision making. Processes not only have the capability to manipulate llms through interaction, but also to infect their memory and training process data serves as the lifeblood of large language models. For training foundational models, developers use Internet scale datasets as well as chat history from users. And if attackers could poison such datasets with strategic data injections, they could manipulate future model responses. And while manipulating large datasets may be complicated for attackers without access to internal infrastructure, it still remains feasible, and the increasing use of open source models and datasets downloaded from the Internet simplifies this task. But what is much easier to do is to create thousands of fake accounts and generate millions of chat messages that look benign individually but collectively are malicious. These chat messages, when used for training, have the potential to influence model behavior during entrance. The implications of data poisoning can range from degrading model quality to establishing backdoors for bypassing content safety filters and delivering malicious responses to users at scale. And let's say you invested significant resources to ensure that LLM is not only useful but also safe and secure. But what if attackers can simply take your LLM down? The state of LLM ecosystem and best practices remains relatively immature, and this complexity creates opportunities for exploitation of such inefficiencies. Quite simple attacks can render the entire LLM application unresponsive or even deplete the available budget. This can be exploited in different ways. In a classic denial of service attack, they might pass a malicious file to the LLM, triggering resource intensive operations or internal calls to other components, and making processing time take like forever. Another tactic, known as a denial of wallet, involves flooding the LLM application with an excessive number of API calls. This attack can potentially exhaust your entire budget in a matter of minutes or hours. The risks associated with resource exhaustion are obvious and easily quantifiable because they can deplete technical and financial resources entirely. And what if attackers switch from overloading requests to making requests so meaningful and valuable that they could replicate the entire model. This is exactly how model stealing works. Attackers can send millions of requests and collect responses from the target LLM selected for replication, and they carefully craft a dataset of prompts to ask and responses collected from the target LLM, which is then used to train a brand new model which is nearly identical to the original one. This new model can serve as a playground for testing further attacks, or it can be used for benign purposes without effort and cost associated with training it from scratch. That's exactly what researchers accomplished with only few hundred dollars when they successfully replicated a high value chat GPT model, which originally required tens of millions of dollars to train. And given the substantial cost of creating intellectual property and training, unique models, poses a significant risk to competitive advantage and market position. Similar to model theft, this strategy can help extract sensitive information from an LLM. Llms have the tendency to memorize secret information they were trained on, and not only can this information be inadvertently revealed, but it can also be strategically elicited by attackers through targeted questioning or interrogation. If confidential data is integrated into the LLM workflow, it can be extracted through methods like jailbreaks or prompt injections, and if secret data was incorporated into the training process, attackers can trigger data leakage by crafting datasets with strategic questions about specific areas such as intellectual property or customer information, and responses from the LLM to such interrogation are likely to reveal sensitive information, so the risks of sensitive information disclosure are significant and widely recognized by companies as their top priority. You can see that uncontrolled responses from llms present business challenges, but they can also introduce technical risks. Some llms serve as a system components that generate software, code, or configuration files, and these outputs are subsequently executed or used as inputs for or other components, and without oversight. This can introduce vulnerable code or insecure configurations. This security risk can materialize with or without threat actors. Llms known for their hallucinations may suggest non existent packages during code generation. Attackers are already capitalizing on this by registering frequently hallucinated libraries and injecting malicious code into them. Alternatively, in the first place, LLM may generate a vulnerable code, a configuration or command that could compromise system integrity when executed. In all of these scenarios, improper handling of insecure outputs can jeopardize the security of LLM applications and potentially other downstream systems. And moving beyond the LLM itself, it is critical to consider the security of the environment. The proliferation of LLM first startups has resulted in the integration into many integrating insecure extensions of plugins can significantly expand the attack surface and introduce new attack vectors. For instance, if an LLM has a plugin for direct database connection for tasks like sales analytics and insights, insecure permission handling between the plugin and the database could allow attackers to extract additional sensitive information, such as customers financial details. In some cases, attackers may exploit vulnerable plugins to pivot to other parts of the infrastructure, similar to the classic SSRF attack. Additionally, if the LLM has the capability to visit website links, attackers could trick users to visit a malicious website that could extract chat history or other data from the LLM. LLM extensions of plugins serve as a privileged gateways to the entire infrastructure, and this represents a classic security vulnerability, with the far reaching implications ranging from unauthorized access to complete control over internal systems. Insecure agents are the siblings of insecure extensions. Agents differ from plugins or extensions because they imply delegation of actions. It means an LLM agent could navigate to various resources and execute tasks. This delegation opens the door for exploitation, as an agent could be redirected to perform malicious activity for the benefit of attackers. A variety of attack techniques against agents follows the creativity of LLM developers. For instance, if an LLM agent is tasked with classifying incoming emails and responding automatically to certain topics, attackers could exploit this functionality. They could instruct the agent to respond to their email with sensitive information, disclose contact lists, or even launch a malware campaign by sending phishing emails to all contacts. Similar attack scenarios are possible in many programming copilots with access to code repositories or DevOps agents with permissions to manage cloud infrastructure. The risk posed by excessive agency and vulnerable agents is significant as it extends beyond the LLM application and has the potential to scale automatically. Just as llms can impact the security of external components, external components can also influence the security of llms. With the proliferation of open source ecosystems, LLM developers heavily rely on public models, datasets, and libraries, and compromising or hijacking elements within this supply chain introduces one of the most critical and stealthy vulnerabilities. Whether by accident or through malicious campaigns, LLM developers may inadvertently download compromised models or datasets, resulting in seemingly normal LLM application behavior, which in fact can be remotely controlled by attackers. Vulnerable software packages of machine learning frameworks or standard libraries can introduce new vulnerabilities and enable attacker control. The primary risk associated with supply chain vulnerabilities is the stealthy control by attackers over LLM decisions, behaviors, or potentially the entire application. As you can see, there is a variety of security risks throughout the entire lifecycle of LLM applications. Unfortunately, the format of this presentation doesn't permit a deep dive into solutions. The LLM ecosystem is still in its infancy and will require considerable time to mature. Moreover, llms, like other ML models, are inherently vulnerable to adversarial attacks. My primary advice here is to consider the security of the entire system. Rather than focusing solely on LLM models or datasets. It is crucial to assume LLM vulnerability and design applications with this in mind, implementing safety guardrails and security controls around vulnerable but useful models. So if you want to dive deeper into the topic, you can check the OWAS website for more technical details about the top ten LLM security risks. You can learn about integrating security into MLAbs processes with the ML scope framework, and if you have any questions or ideas for collaboration, feel free to contact me. Thank you for watching this presentation.
...

Eugene Neelou

LLM Security @ OWASP Foundation

Eugene Neelou's LinkedIn account Eugene Neelou's twitter account



Awesome tech events for

Priority access to all content

Video hallway track

Community chat

Exclusive promotions and giveaways