Conf42 Chaos Engineering 2021 - Online

Securing the Cloud: Empowering Developers to practice Security Chaos Engineering


Cloud platforms must face all kinds of security issues that are frequently a matter for security engineers, not for developers. As a result, security is treated as separate from development. Although sponsors have promoted the integration of security practices into all stages of software development, many developers think security is a topic for other engineering fields. As a result, despite having tools such as Snyk and Black Duck, developers are missing the benefits they could get from their cloud platforms.

This talk will show the benefits of practicing security chaos engineering (SCE) by empowering developers to leverage the power of security topics directly. SCE offers many advantages, including reduced remediation costs, less disruption to end users, and greater confidence in production systems. In this talk, we are going to show how this practice has helped us to develop a culture based on security among software developers.

Methodology:
* Present the foundation of the software development life cycle (SDLC).
* Explore the integration of SDLC, resilience, and security using tools such as Snyk and Black Duck.
* Analyze why developers do not include security topics in their activities.
* Present a novel practice titled Security Chaos Engineering.
* Show how democratizing security among software developers has shown us the benefits of the distributed, immutable and ephemeral (DIE) model.
* Show some of the experiments that we are trying in ADL to promote a culture based on security using SCE.


  • Yury Nino and Jhonnatan Gil present "Securing the Cloud: Empowering Developers to Practice Security Chaos Engineering". Cloud platforms are facing security issues that are frequently a matter for security engineers, not for developers. Jhonnatan shows how security can be democratized among software developers.
  • Jhonnatan: You need to know how to work security into everything that you adopt. From AWS to GCP, the providers define infrastructure protection in equivalent terms. Automate the deployment of sensitive tasks in all cloud providers. With that you generate best practices for your whole model.
  • Survey: 14% of engineers don't show interest in security issues. 23% of developers don't have security steps enabled in pipelines. The first and most important step is to educate developers.
  • A 2019 study of 32 web applications found that 82% of vulnerabilities were located in the application code itself. Security should be front of mind for both security engineers and developers. Organizations must offer training and culture internally. Use security chaos engineering.
  • Examples for practicing security chaos engineering in a game day. One experiment is to introduce latency on security controls. How these practices can generate more value in your teams: the objectives of using chaos and the impact it can have on the teams.
  • The first focus, obviously, is the customers and the new features that generate value for them. The second focus for the team is security and how it plays a big part in SCE. An opportunity to involve the business.
  • Use algorithms with the largest encryption keys: not 256-bit keys; it could be in the thousands. Don't leave clear data in logs. Use MFA for critical application actions. Enable security steps in CI and CD.
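The recommendation not to leave clear data in logs can be illustrated with a small sketch. This is not the speakers' implementation; it assumes Python's standard `logging` module, and the patterns, the `redact` helper, and the `RedactingFilter` class are hypothetical names chosen for illustration. A real service would tune the patterns to its own data.

```python
import logging
import re

# Hypothetical patterns: key=value secrets and long digit runs that may be card numbers.
SENSITIVE_PATTERNS = [
    (re.compile(r"(password|token|secret)=\S+", re.IGNORECASE), r"\1=[REDACTED]"),
    (re.compile(r"\b\d{13,16}\b"), "[REDACTED-NUMBER]"),
]


def redact(text: str) -> str:
    """Replace anything matching a sensitive pattern with a placeholder."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Logging filter that redacts the fully formatted message of each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first, so values passed as arguments are redacted as well.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # Keep the record; only its content changes.


# Attaching the filter to a logger (the logger name is an assumption).
logger = logging.getLogger("payments")
logger.addFilter(RedactingFilter())
```

A filter like this is only a safety net: the safer habit is still to avoid passing sensitive values to the logger in the first place.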


This transcript was autogenerated. To make changes, submit a PR.
Hi everybody, we are really excited to be here. Thank you very much for attending. The title of our talk is Securing the Cloud: Empowering Developers to Practice Security Chaos Engineering. Nice to meet you. We are Yury Nino and Jhonnatan Gil. We work as site reliability engineers for ADL Digital Labs, a company in Colombia that provides technology and innovation services. We are chaos engineering advocates, and we are promoting the adoption of this discipline in our country. Cloud platforms are facing security issues that are frequently a matter for security engineers, not for developers. As a result, security is treated as separate from development. Today we are talking about security chaos engineering, a novel discipline that offers a methodology to empower developers to leverage the power of security in their roles. These are the topics for today. Jhonnatan will provide a foundation on cloud and security; he is going to show some keys from the Well-Architected Framework. I am going to explore the integration between software development, reliability and security. At that point I am going to analyze why developers don't include security topics in their activities. With this context, I am going to present a novel practice named security chaos engineering. Finally, Jhonnatan is going to show us how security can be democratized among software developers, and the benefits of the distributed, immutable and ephemeral framework based on security chaos engineering. So go ahead, Jhonnatan.
Thank you, Yury, for this great summary. Hi everyone, my name is Jhonnatan. I am going to talk about the cloud, security, and some scenarios. We start with the Well-Architected Framework. You may know this slide from AWS: it covers operational excellence, security, reliability, performance and cost optimization, the big pillars that every cloud defines.
You need to know how to work security into everything that you adopt, right? Then we talk about the bigger cloud providers, starting with AWS. AWS talks about implementing a strong identity foundation: implement the principle of least privilege and enforce separation of duties with appropriate authorization for each interaction with AWS resources; centralize identity management and use IAM to eliminate reliance on long-term static credentials, right? Next, enable traceability: monitor everything that you do in your cloud. It is very necessary to define this in your infrastructure, right? Next, apply security at all layers. In AWS you have VPCs, subnets, EC2 instances, S3 buckets. You can define security at all of these levels, in your IAM groups and in your security groups, which control who is granted access to these resources. The next one is about automating security alerts. A security best practice is to try to automate alerts for all the things that fall outside your security policies; if you do this, you will be better at security, right? Protect data in transit and at rest: be very careful if you manage very sensitive data, correct? Next, prepare for security events: train your team, train your people so they know what security is, how to do security, what the business is, and how security supports the business, right? Because you have a very big problem if your reputation is involved in a security incident. So practice that. Next is Azure security, which talks about defense in depth: protect your information from the beginning, from the firewalls you install to who can access the data.
It is very important to define who can access these objects in the Azure cloud. Next, identity management: take advantage of single sign-on and manage all your access with controls, with AD, your Active Directory, your company directory. That is a very good practice for access to this cloud. Next, infrastructure protection, defined the same way AWS defines it: how to protect, in depth, everything that exists in this ecosystem. That is a very nice practice. Encryption: encrypt your data at rest and in transit; it is a very good thing to do in your infrastructure. Network security: control who can access your data and who can access your application, and apply security controls; that is very good to include in every part of your software. Application security: define it starting from requirements gathering, training your people to produce these requirements for the applications you build, so that they are safe from the beginning. And the last one is GCP security, which talks about these stages: implement least privilege with identity and authorization controls, which is the same thing, in other words, as what the other cloud infrastructure providers define. Build a layered security approach: implement security at each level of your application and infrastructure, applying a defense-in-depth approach; use the features of each product to limit access, and use encryption, right? Automate deployment of sensitive tasks: if you have tasks on your data, for example generating reports, you may need to automate them so that people don't execute these scripts themselves, because if people have access to these scripts they have access to the data, and that is very difficult to control if your controls are not very granular. If your deployment is automated, these tasks are automated too. That is very nice to have because your security is better.
Beyond automating the deployment of sensitive tasks, they talk about implementing security monitoring. In all cloud providers, define how you deploy your infrastructure and your application. If you define pipelines to create the resources for your application, your infrastructure and your business objectives, you can define tasks that protect these steps from a security point of view. That's right. And with that you generate best practices for the whole model that you define in your infrastructure and your business.
Thank you, Jhonnatan. The cloud computing models presented by Jhonnatan are dynamic and complex, which makes it difficult to detect threats and consequently to predict cyber attacks. As a result, different systems are designed to respond to failures in quite different ways. In the absence of an adversary, systems often fail safe. Fail-safe behavior can lead to obvious security vulnerabilities. To defend against an adversary who might exploit a power failure, we could design a door to fail secure and remain closed when not powered. The primary reliability risks are not malicious in nature, for example a bad software update or a physical device failure. On the other side, security risks come from adversaries who are actively trying to exploit system vulnerabilities. When designing for reliability, we assume that some things will go wrong at some point. When designing for security, I think it's different, because we must assume that an adversary could be trying to make things go wrong at any point. Both security and reliability are concerned with the confidentiality, integrity and availability of our systems, but they view these properties through different lenses. They have traditionally been considered fundamental attributes of secure systems. The key difference between the two viewpoints is the presence or lack of a malicious adversary.
A reliable system must not breach confidentiality accidentally, while a secure system must prevent an active adversary from accessing, tampering with or destroying confidential data. Confidentiality, integrity and availability are related to these two concepts, reliability and security. According to Google, reliability is the most important feature of our systems. To reach it, systems must be secure. Probably you are wondering where to begin integrating security and reliability principles into your systems. The first and most important step in addressing security and reliability issues is to educate developers. However, even the best-trained engineers can make mistakes: security experts can write insecure code and SREs can miss reliability issues. Considering that it's difficult to keep in mind the many considerations and trade-offs involved in building a culture based on secure and reliable systems, we started by evaluating the situation in our company. So we ran a survey among developers with the aim of finding out how much they know about security and what their perception is of the importance of this topic. We interviewed 130 engineers in ADL, of which 25% were software architects, 16% said they were front-end engineers, 60% were backend and DevOps engineers, and just 2% or 3% said they were full-stack and quality engineers respectively. Although we were expecting that a percentage of them wouldn't show interest in security topics, to the first question, do you have interest in security topics, almost 15% didn't have interest in those topics. That is an important percentage if we consider that the group is mostly composed of backend engineers. OWASP is an online community that produces freely available articles, methodologies, documentation, tools and technologies in the field of web application security. It is a great reference that every DevOps engineer building a digital solution should know.
So we considered it made sense to ask about its practice. To our surprise, the percentage of people who didn't practice OWASP exceeded one third of the respondents. About static analysis: static analysis is about analyzing and understanding computer programs by inspecting their source code without executing or running them. Static analyzers parse the source code and build an internal representation of the program that is suitable for automated analysis. This approach can discover potential bugs in source code, but it is also a great tool to discover software vulnerabilities, preferably before the code is checked in or deployed to production. To the question, do you run static analysis, 23% of developers don't have security steps enabled in pipelines. This group includes five software architects, 14 backend engineers and only one front-end engineer. 100 people have security integrated in the continuous integration and continuous deployment process. Finally, we asked them about the tools that they integrated into their development environment. In ADL, our Jenkins pipelines run steps for measuring the quality of the code using Sonar. That is the reason for the first value: 49 of them use Sonar. But when we asked about tools or plugins for identifying vulnerabilities in the dependencies of the code, the values are low. Just three people use Black Duck and two use Snyk. Finally, just one person uses Fortify and Veracode, two tools for building secure software fast, finding security issues early and fixing them. This is our conclusion from the survey: 14% of engineers don't show interest in security issues. That is really, really important because it poses us a challenge: motivate and create a culture of security among software developers. This group of people is mostly formed by backend engineers. That is an important thing to think about, and we need to motivate them and generate motivation strategies for them. So let me move to another section of this presentation.
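The description of static analysis above (parse the source, build an internal representation, inspect it without running the program) can be made concrete with a minimal sketch. It assumes Python's standard `ast` module; the rule set `RISKY_CALLS`, the function `find_risky_calls`, and the `SAMPLE` snippet are hypothetical and only illustrate the idea. Tools named in the talk, such as Sonar, apply far richer rule sets.

```python
import ast
from typing import List, Tuple

# Hypothetical rule set for illustration: built-ins that execute arbitrary code.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line number, function name) for each risky call found in source.

    The source is parsed into a syntax tree and inspected; it is never executed.
    """
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Only direct calls by name are detected, e.g. eval(...), not aliases.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)


# A small program under analysis, held as text.
SAMPLE = "x = input()\nresult = eval(x)\nprint(result)\n"

if __name__ == "__main__":
    for line, name in find_risky_calls(SAMPLE):
        print(f"line {line}: call to {name}()")
```

Run as a pipeline step, a check like this could fail the build when the list of findings is not empty, which is the kind of security step the survey asked about.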
It is clear that we have a problem in development which can be extrapolated to the cloud. Considering that there is a group of people who don't show interest in security topics, most probably an exploited vulnerability is only a matter of time. Software development is a dynamic profession. Source code changes daily, and once in a while so does the way local development environments need to be set up. There are many benefits to integrating security in the development process. We have to rethink how developers see the cloud in terms of software development and adoption. In the early days, we performed the software development lifecycle stages offline and on premises. If you remember, developers used their computers as terminals to access early versions of the World Wide Web, helping them find answers to problems. All right, so far. Thanks to the Internet, software-as-a-service solutions quickly brought significant security vulnerabilities. Nowadays, digital reliance, trust and open business have served to highlight how important the secure software development lifecycle is for businesses, customers and society. A common security bug can lead to catastrophic breaches if undetected. A 2019 study of 32 web applications found that 82% of vulnerabilities were located in the application code itself. Hackers can attack users in nine out of ten web applications. Attacks include redirecting users to a hacker-controlled resource, stealing credentials in phishing attacks and infecting computers with malware. Unauthorized access to applications is possible on 30% of sites. In 2019, full control of the system could be obtained on 16% of web applications. On 8% of systems, full control of the web application server allowed attacking the local network. On average, each system contained 22 vulnerabilities, of which four were of high severity. It is a fact: we need to secure and guarantee reliability.
This has led to the growth of chaos engineering, where resilience is built into code by design and methodology. Security should be front of mind for both security engineers and developers. That is a fact. That is our conclusion of this first part. Organizations must offer training and culture internally. What can we do if there are developers who don't like security, as I mentioned in the survey? We have a proposal here: use security chaos engineering. Security chaos engineering is the identification of security control failures through proactive experimentation to build confidence in the system's ability to defend against malicious conditions in production. This definition was promoted in the book Security Chaos Engineering, published in April of last year. I have highlighted six words that are valuable in the definition provided by Aaron: security; failures; experiments, which is super important because this discipline is based on the scientific method; confidence and defense, because it is about achieving resilience; and lastly production, because the theory says that we should run experiments in the production environment, although it does not necessarily have to be so. We can't expect traditional teaching methods such as classroom-based learning to change our developers' mindset on secure coding. Gamified developer programs are a great way to engage developers and actively test their secure coding skills. Chaos game days are based on game days, and now I am going to provide some definitions related to that. A definition from AWS says that game days are interactive, team-based learning exercises designed to give players a chance to put their skills to the test in a real-world, gamified, risk-free environment. Most importantly, they are an extremely fun way to learn more about the potential of a technology. As a form of game day, a chaos game day is a practice event that can take a whole day, although it usually requires only a few hours.
The goal of a game day is to practice how you, your team and your supporting systems deal with real-world turbulent conditions. That is the objective of this practice. This is a framework provided by Russ Miles. The framework has three phases: before, during and after. Before, we pick a hypothesis, pick a style, and decide who, where and when the event will run. After that, in the during phase, detection is the objective. Other activities include taking a deep breath, communicating, visiting dashboards in the observability tools, analyzing data, proposing solutions and applying them, and solving the incident. Finally, the last phase is for writing a postmortem. In this phase we analyze what happened, what the impact of the incident was, what the duration was, what the resolution time was and what action items are included. So let me talk about some examples for practicing security chaos engineering in a game day. One experiment is to introduce latency on security controls. Others: drop a folder like a script would do in non-production; software secret clear-text disclosure; disable service event logging; permission collisions, for example in AWS; API gateway shutdown; create an unencrypted S3 bucket; or finally, disable multifactor authentication.
Impact of security chaos in general: in the previous slides we talked about chaos and how these practices could generate more value in your teams. Now we are talking about how our teams could generate this value: the objectives of using chaos and the impact it could have on our teams, right? One big problem here is that you need to take a large amount of data, summarize and correlate it, and generate patterns from it. You could use BigQuery or some other strategy from your cloud to generate these patterns.
For me that is not easy to use, but you can use it and define what is useful and what is not for your cloud provider. There are a lot of resources that you can use, and the cloud can generate this part for you so that you don't need to build very big tools. I can use some AI to process the data that I have in my storage, to correlate these logs and to make part of my job easier, right? In the next slide we talk about the impact. What is the impact on my teams? What happens with my teams? The most important thing is requirements gathering and architecture design for security. This is the part to focus on in your software cycle, because you can prepare your people and generate plans for it. But if your architect or the person who accompanies your business doesn't generate this value for the company, it is very difficult. That view must come from the whole team, because every part of the team is very important in every part of the software, and you need to generate this value for your team, right? Then you need to generate plans to train your people so that they know these topics and write these requirements the correct way, right? Then we talk about continuous testing, which is very important for the team. The first focus, obviously, is the customers and the new features that generate value for the customers. But the second focus for this team is security, and how security plays a big part in this SCE, because that team will generate more requirements for my development team than my business does. If my QA team can create plans and tests that surface security issues, my development team is stronger when it builds software for my customer, so it is a very big part of my team. But every part of the team is important.
Obviously it is very important. But if you focus on this part, the other parts of your team pick it up too, because it carries across all the teams, to the developers as well, right? An opportunity to involve the business: for example, asking whether to allow logging in from more than one browser at once, which is one of the ways, more or less, that an attacker could find a way in. In that part you need to involve your customer in these requirements, right? We highlight the importance of secure dependencies over time in software design and implementation. If you decide in your software team to use open source libraries, be very careful, because you need to secure them. The developers may use an open source library that opens holes in your software, and that is very difficult to identify when your software is in production. But if you define how to secure these dependencies and how to scan these libraries, you generate more value for your customers and all the value that you generate with your developer team, right? That is a very nice way to generate this impact. Now for some recommendations that we in ADL developed with Yury and all the security teams, which are very important at this moment, right? Use algorithms to protect sensitive data. Use algorithms with the largest encryption keys: not 256-bit keys; it could be in the thousands. Use a popular algorithm that is easy for your team to decipher. If your security team is very mature, it could create algorithms to protect your data. If you have this on your teams, very nice, because it is part of the software too. Don't leave clear data in logs.
Some developers use clear data to debug the application, but in production it is very different and very painful, because if you leave clear data in logs and these logs are stolen, you have a big problem, right? So your developer team could include more or less data in the logs as needed. For debugging in production there are other ways to debug and to let your developers work with the production application, right? Use MFA for critical application actions. Multifactor authentication is very useful for actions, not only for login. You can use this strategy to make your software more secure, right? Use short-lived links for documents to be delivered. If you generate a PDF for your customers, the document link could carry an OTP: one access to the document and that's it. Remove it from your storage. That is a good practice because you don't have to keep all of these PDFs. Another strategy is to generate a hash: when the customer presents the hash, the system automatically generates the PDF, the customer downloads it, you remove it from your storage and that's it. You don't store this data, because it is very sensitive and very hard to store and maintain, right? Use session management from the front end. Here you can separate your application more, into back end and front end. If you can, put more layers between your front end and your back end, because that creates difficult patterns for hackers and makes it very hard to attack your application. If you use this strategy, your application is more secure, right? Do not use cookies and browser storage for sensitive data. If you use storage and cookies, do not put sensitive data there. You could put just a session ID, a list of products, or the list of access rights for a role.
But no sensitive data, because it is very hard to take this data back from the customers' computers and replace it, right? And if an attacker takes these cookies or storage from the browser, it is very difficult to undo, and it is very painful for the customers, because the customer has to create a new user and a new password and go through controls that can be disheartening. Make all software activities auditable. If every action in your software is audited, it is very nice because you can identify what the customer is doing in your software. This way you can establish what happened, which customer accessed an unexpected link and what they tried to access, right? Then you need to build on this audit trail to generate alerts and patterns for your customers, and to determine what is not a normal pattern for a customer, right? Perform vulnerability scans of the software. As we said about dependencies, this is a good way to cover that part. Another point: enable CI and CD steps. If you are delivering your application to your customers right now, you need to enable security steps there. That is very good for your application because you can define policies about the software, about roles, about how to deploy, about how to secure your artifacts and your application, and about how to make these steps very secure. That comes before releasing software to production, right? Use hash validation of software elements. You could use hashing in your artifact repository when you generate your artifact: put it in your artifact repository with its hash, and when you download it, check whether the hash has changed. That is a very easy control for your repository and the artifacts you generate, right? Generate container images in a secure way.
You could use a least-privilege strategy for these containers. It takes more work for your containers, but it is the right way for your customers and your application, because you don't have to build these containers with root access: they don't need it. The application really doesn't need root in its environment. If an application does need that, you could redefine how it accesses resources in the container or elsewhere, right? Separate environments for applications, and separate databases too. This is very useful if you have a production environment, a development environment and a QA environment. It is easy if you separate and isolate these environments from each other. That is a very good way to improve security, right? At the least, separate production from the other environments. That is fine too, because if you don't have a lot of money it is a good way to start. Block users after unsuccessful login attempts. If you detect that a customer's login has failed three, four, five times, you could block it. It is a very good practice and your customers will be grateful, because you can send an email advising that the user has been blocked after some attempts, which is an alert for your customer. They are very grateful because it's a nice alert, and that's it, right? Thank you.
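The last recommendation, blocking a user after several unsuccessful login attempts, could be sketched as follows. This is a minimal in-memory illustration, not the speakers' implementation: the class `LoginAttemptTracker`, its parameters, and the default of five failures within fifteen minutes are assumptions, and a real service would persist the counters and send the notification email mentioned in the talk.

```python
import time
from typing import Callable, Dict, List


class LoginAttemptTracker:
    """Locks a user after too many failed logins inside a sliding time window."""

    def __init__(
        self,
        max_failures: int = 5,
        window_seconds: float = 900.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._clock = clock  # Injectable so tests can control time.
        self._failures: Dict[str, List[float]] = {}

    def _recent(self, user: str) -> List[float]:
        """Drop failures older than the window and return the remaining ones."""
        cutoff = self._clock() - self.window_seconds
        recent = [t for t in self._failures.get(user, []) if t > cutoff]
        self._failures[user] = recent
        return recent

    def is_locked(self, user: str) -> bool:
        return len(self._recent(user)) >= self.max_failures

    def record_failure(self, user: str) -> bool:
        """Record a failed attempt; return True if the user is now locked."""
        self._recent(user).append(self._clock())
        return self.is_locked(user)

    def record_success(self, user: str) -> None:
        """A successful login clears the failure history for the user."""
        self._failures.pop(user, None)
```

The caller would check `is_locked` before verifying credentials and, when `record_failure` returns `True`, trigger the alert to the customer. Using a sliding window means the block expires on its own, which is one possible design choice; a permanent block that requires a manual reset is another.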

Yury Nino

DevOps Engineer @ Aval Digital Labs

Jhonnatan Gil Chaves

DevOps Engineer @ Aval Digital Labs
