Conf42 DevSecOps 2022 - Online

Role of SRE in DevSecOps

Abstract

Large enterprise customers are on a journey to modernise their products and platforms, and they depend heavily on DevOps practices to deploy reliable, resilient, and secure product features. SREs play a crucial role in shifting security left, along with providing visibility, control, and automation.

Summary

  • Kalyan Dhokte: I am going to talk about the role of SRE in DevSecOps. SRE focuses on the reliability and resiliency of the system. It also helps to engineer high-performance, scalable, fault-tolerant applications. Let's look at the principles that DevOps and SRE implement.
  • Modern microservice-based architecture increases speed and scale, along with complexity. It is very hard to track what is going on in the system. Security is a context-dependent discipline. You need to be careful when striking the balance between security and reliability.
  • Let's look at the common symptoms of failing DevSecOps and SRE models. The diagnosis is misaligned teams and reinventing the security and reliability wheels. The remedy is to embed a security-first mindset, and reliability as well, in development teams.
  • Enhancing DevSecOps operations has become the SRE's top priority. Embed security in development, and adopt reliability-driven development, automation, and full-stack observability. Value stream management plays a big role from planning to operations, along with security.
  • There are automated, advanced DevSecOps and SRE techniques for automating the whole deployment pipeline. These techniques take advantage of AI and machine learning to streamline, simplify, and speed up complex DevSecOps stacks. Overall, these advanced automation techniques offer considerable cost-saving potential.
  • SRE helps bridge the gap between the operations and development teams. SRE tasks span different phases to improve reliability, resiliency, and security. The initial adjustment to a DevSecOps model requires a change in mindset for developers and SREs.
  • This is a case study on building security and reliability into the DevSecOps pipeline. In the plan and develop phases you start with threat modeling. Before going to production, there are chaos engineering and security experiments. Finding issues early greatly reduces the total cost of development and deployment.
  • Hope you enjoyed this session. I will leave you with two quotes. SRE work is like being a part of the world's most intense pit crew. What's dangerous is not to evolve. It is very important to understand the new changes and to continuously evolve with them.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, I am Kalyan Dhokte, and welcome to Conf42 DevSecOps 2022. I am from Cognizant Technology Solutions. Thanks for joining my session. I am going to talk about the role of SRE in DevSecOps. Before starting, a quick introduction: I am a practice lead, responsible for helping customers build highly reliable, resilient enterprise systems by implementing modern digital engineering techniques like SRE and DevSecOps. Let's dive into the session. First we will understand the relationship between DevSecOps and SRE. If I take a minute on DevSecOps: it is integrating security as a part of your pipeline. As you are aware, in DevOps we focus on continuous integration, continuous delivery, and continuous deployment, so all the phases from design to deployment are automated. DevSecOps now integrates security as a part of the entire lifecycle, so the benefit of DevSecOps is detecting security issues much earlier in the lifecycle. What does DevSecOps have to do with SRE? As you know, SRE focuses on the reliability and resiliency of the system. It also helps to engineer high-performance, scalable, fault-tolerant applications and helps customers build an always-on business. Reliability also means that you are protecting your systems from unauthorized attacks, preventing vulnerabilities and threats, protecting customer data, and finally managing and maintaining the privacy of the entire ecosystem. So for all practical purposes, SRE implements DevSecOps principles and security practices. Let's look at the principles that DevOps and SRE implement. Eliminating silos, and also eliminating toil, is one of the important principles. Then there is shared responsibility for security, embedding security in development teams, accepting failure as normal, making gradual changes, automating everything, and measuring everything.
So it is important to reduce the cost of failure through incremental changes and blameless postmortems, adopting security best practices, and creating a uniform security posture. Automation also helps improve reliability, and automating security monitoring and testing is one of the important aspects of DevSecOps. SRE helps enforce robust monitoring of reliability and security throughout, also making sure that SLOs and SLIs are in place and that actionable alerts are triggered for any such issue. Our systems have evolved beyond the human ability to hold a mental model of them, and understanding their behavior is very difficult. So why are these modern cloud technologies challenging? Modern microservice-based, cloud-native architecture increases speed and scale, along with complexity, while bringing improved cost efficiency, accelerated innovation, faster time to market, and the ability to scale applications on demand. With CI/CD, DevOps, and cloud-native microservice architecture, we are building systems that are very large and that change continuously. We keep changing them for innovations and new features, deploying to production at a very fast pace, and it is very hard to track what is going on in the system. For example, Salesforce CRM has hundreds or thousands of microservices, and it is very difficult to track all of them, which increases the complexity. As you add more features, the complexity also increases. So where is this complexity coming from? We are creating and deploying new features with continuous delivery, cloud-native and serverless architecture, APIs and microservices, service mesh, raise engine. All of those are very good for making sure you are continuously innovating and continuously deploying new features. But they also increase the complexity.
After a month, and again at the end of the year, you will find it very difficult to manage and maintain all that complexity, because the complexity is continuously increasing. As you are upgrading Kubernetes systems and cloud services, you are also fixing outages, continuously fixing security-related issues and misconfigurations, refactoring code, moving many applications to cloud-native and serverless architectures, and changing deployment tools as well. All those factors increase the complexity. If you want to simplify a complex system, you have to change it, but in changing a system to simplify it you add additional complexity. Complexity is not easy to simplify and is very difficult to understand, so dealing with complexity is more about navigating it. So how will you navigate the complexity? Security is a context-dependent discipline, and by deploying these DevSecOps and SRE practices you can navigate the complexity much more easily: you keep the flexibility to change the system rapidly while applying security context to system changes. You need to understand that security is stateful in nature. We can see there is a clear conundrum: developers constantly implement new features and move rapidly toward customer value, while security does not move along the same curve. You constantly have to verify that security and reliability are implemented as per the design. So continuous security and reliability have to be embedded, and you need to continuously check and verify them with a security-first and reliability-first mindset. That's very important. You need to infuse all of this into development, delivery, and production.
SRE helps secure the infrastructure and applications when you deploy into production, and helps track availability, SLOs, and error budgets, all of which includes automated security monitoring as well. So you need to combine the DevSecOps and SRE frameworks with continuous security to assess such complex distributed microservice architectures, because a lot of bad actors are continuously chasing such complex enterprises with various attacks, and the race to defend against attackers must be accelerated. Compared to functional failures, reliability and security failures frequently impact entire enterprise systems and also the brand value. Hence special attention is required for security and reliability. There are some trade-offs between reliability and security as well. While redundancy increases reliability, it also increases the attack surface. There are also trade-offs with respect to incident management: when you want to improve MTTR, reliability incidents benefit from verbose logging, but those logs are in turn a target for attackers. So you need to be very careful when striking the balance between security and reliability. Let's look at the common symptoms of failing DevSecOps and SRE models. One important symptom is avoidance and blame. The security team is blamed for slowing down projects, the entire pipeline slows down because of security reviews, and security is frequently sidestepped by project teams trying to avoid it. Another symptom is slow and ineffective security reviews: security reviews of new applications take weeks and produce little actionable information.
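The SLO and error-budget tracking mentioned above comes down to simple arithmetic. The following is a minimal sketch and was not shown in the talk: the `SloWindow` class, its field names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO measurement window (hypothetical shape)."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int
    failed_requests: int

    def error_budget(self) -> float:
        """Number of requests that may fail before the SLO is violated."""
        return (1.0 - self.slo_target) * self.total_requests

    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        budget = self.error_budget()
        if budget == 0:
            return 0.0
        return 1.0 - self.failed_requests / budget


# Illustrative numbers only.
window = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"{window.budget_remaining():.0%} of the error budget remains")
```

A team might slow down risky releases when the remaining fraction gets close to zero. That policy is a common convention, not something stated in the session.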
Another symptom is a bad user experience: the user experience for authentication and authorization is sometimes very inconsistent. There are gaps in the risk profile as well: large parts of your technology portfolio have undiscovered security vulnerabilities. All these things increase the complexity, and the complexity becomes unmanageable: your vulnerability list is very large and complex, and it grows rapidly as you shift more applications to production. So what is the diagnosis? We need to understand why these symptoms occur. It is misaligned teams and reinventing the security and reliability wheels. Security and reliability are involved very late in the game, and many times there is a lack of automation. The remedy is to embed a security-first mindset, and reliability as well, in development teams. Create security and reliability standards and accelerate them, along with automation of security scans, continuous security testing, observability, and resiliency. We need to understand how to make DevSecOps a healthy operating model. Embed SREs into dev teams to help implement security and reliability considerations early in the design and development phases. So let's look at the common practices for the success of the DevSecOps and SRE model. Enhancing DevSecOps operations has become the SRE's top priority in light of digital transformation and the continuous escalation of attacks. All these aspects are very important to making DevSecOps and SRE models successful: embedding security in development, reliability-driven development, automation, and full-stack observability. That is going to enhance DevSecOps operations and make the DevSecOps and SRE models succeed. Moving to the modernised service model, with DevSecOps you need to embed the reliability and security stack.
I personally believe value stream management will play a big role from planning to operations, along with security, and not just the CI/CD pipeline, in the digital transformation journey. Value stream management is a business practice that helps identify areas for improvement in a process, making operations more efficient so that you drive business value. It will help with security governance, automation, security dashboards, analytics, policy as code, and pipelines. From the planning and design phase to the monitoring phase, security will be embedded: starting from threat modeling, you need to implement SAST and DAST as well as chaos security experiments and security policy compliance. Then you need to continuously schedule audits, fuzz tests, WAF and penetration testing, and automated security monitoring, all the way from planning and design through operations, in pre-production as well as production. That is how security will be implemented in the production pipeline, and reliability is equally important; together they help make the DevSecOps production pipeline more secure and more reliable. Combining value stream management with the implementation and expansion of DevSecOps and SRE practices is becoming the best-of-breed approach to optimizing every stage of development and deployment and embedding security end to end. By having everything in a single pipeline, with optimized value streams and checks and controls for vulnerabilities and any kind of misconfiguration, you eliminate time-consuming manual reviews and help accelerate the production pipeline. Here, if you are shifting left and also moving some applications to the cloud, there are existing processes and tools, and you need to understand them: with respect to people, what kind of skills are there, what kind of culture, what processes are available?
Are they implementing policy as code? What are the different security practices, and what is the technology? You can't just start from scratch. You can learn and document the existing process, adopt and optimize it, use the existing processes, technology, and skill sets, and then implement the DevSecOps pipeline. There are automated, advanced DevSecOps and SRE techniques for automating the whole deployment pipeline. SREs have also devoted a lot of their time and attention to shifting right: automating security monitoring in production and giving continuous feedback to developers based on security monitoring data. So you need to help and support developers by automating with these advanced techniques. There are security gates that you can automate to keep the workflow from slowing down. Then you bake monitoring and analytics into the pipeline and automate code verification checks into the DevSecOps framework. Reliability-driven development, code profiling, and tool integration are all very important techniques you can automate, and there are AI and ML techniques for threat analysis. Advanced DevSecOps frameworks take advantage of AI and machine learning to streamline, simplify, and speed up complex DevSecOps stacks. I would give two examples. One is collecting and analyzing software logs while users are logging into the system; that logging information will identify which aspects of the software bad actors are attempting to target. This information is given to the AI/ML algorithm, which can suggest code alterations and additions, or any architecture changes required, to proactively identify code vulnerabilities.
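An automated security gate of the kind mentioned above can be as simple as counting scanner findings by severity and failing the pipeline stage when a limit is exceeded. This is a minimal sketch under stated assumptions: the finding format, the `DEFAULT_LIMITS` thresholds, and the `evaluate_gate` function are hypothetical and do not correspond to any specific scanner or to the tooling used in the case study.

```python
from collections import Counter

# Hypothetical policy: maximum number of findings tolerated per severity.
DEFAULT_LIMITS = {"critical": 0, "high": 0, "medium": 5}


def evaluate_gate(findings, limits=None):
    """Return (passed, violations) for a list of scanner findings.

    Each finding is assumed to be a dict with a "severity" key; real
    scanners use their own report formats.
    """
    limits = DEFAULT_LIMITS if limits is None else limits
    counts = Counter(f["severity"].lower() for f in findings)
    violations = [
        f"{severity}: {counts[severity]} found, {limit} allowed"
        for severity, limit in limits.items()
        if counts[severity] > limit
    ]
    return not violations, violations


# Made-up findings for illustration.
findings = [
    {"id": "DEMO-1", "severity": "HIGH"},
    {"id": "DEMO-2", "severity": "medium"},
]
passed, violations = evaluate_gate(findings)
print("gate passed" if passed else f"gate failed: {violations}")
```

In a pipeline, the script would typically exit with a non-zero status when `passed` is false so that the stage stops without waiting for a manual review.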
From a testing perspective, code changes can be run through finely tuned machine learning tools to identify how a particular change might affect other aspects of the application. These are examples where you can have AI-backed threat analysis. You also have to make sure that chaos engineering, with respect to security experiments, is integrated into the automated DevSecOps pipeline. Tools like Gremlin, ChaoSlingr, and CloudStrike can be used. We'll provide more details about ChaoSlingr in the case study. On the modernisation side, you need to make sure automated security monitoring and analytics are implemented: tracking of reliability gates, operational excellence dashboards, automated auditing and compliance tools that streamline a lot of compliance-related reporting, and validation of deployment scripts. There are certain security contexts and flags you need to add when deploying any pods in Kubernetes, and a security policy framework will continuously monitor and assess those security flags. Overall, these advanced techniques for automating DevSecOps and SRE offer considerable cost-saving potential. Once complete, DevSecOps and SRE automation helps lower the likelihood of catastrophic cybersecurity incidents and reduce the number of operational staff needed, and it improves your MTTR and cuts down a lot of P1 tickets. So let's understand the SRE role in DevSecOps according to team topology. Effective software teams are essential for any organization to deliver value continuously and sustainably. But how do you build the best team for your organization and your specific goals, with respect to culture, the skill sets you have, the leadership, and the development and operations teams? There is a gap between Dev and Ops, and SRE helps bridge that gap between the operations and development teams.
SRE helps prioritize reliability and resiliency features, and helps automate reliability engineering, security observability, constant security feedback, and the implementation of golden signals. So SRE helps bridge the gap between the operations and development teams. The role of SRE is to collaborate, engage in value-added activities, and create results that contribute to measurable reliability improvements. Now let's look at the different tasks of the SRE role in DevSecOps. What are the backlogs? The initial adjustment to a DevSecOps model requires a change in mindset for developers and SREs, but building in vulnerability detection ahead of app deployment ultimately lowers their stress around security. SRE tasks span different phases to improve reliability, resiliency, and security. With respect to security, you need to secure the build pipeline, secure the entire deployment, and provide runtime protection. The security side is mainly enabling secure code scanning, enforcing security policy whenever pods are running in an AKS Kubernetes cluster, and configuring and validating deployment scripts for security context. Overall, reliability and dependability consist of seven attributes, and security is a combination of confidentiality, integrity, and availability. Confidentiality is ensuring that information is inaccessible to unauthorized people, commonly enforced through encryption, IDs and passwords, and two-factor authentication. Integrity is safeguarding information and systems from being modified by unauthorized people. And availability is ensuring that authorized people have access to the information when needed. Safety, resiliency, and maintainability then contribute to the overall dependability of the system.
These are some of the tasks and backlogs you need to implement continuously during every phase, from build through deployment and operations, for the specific areas of reliability, resiliency, and security. Next is the plan-to-develop flow, with threat modeling across all the phases. This is a case study on building security and reliability into the DevSecOps pipeline. In the plan and develop phases you start with threat modeling. There are threat modeling tools you can adopt, and they help identify the impact on the various design components. IDE security plugins start the code scanning immediately, along with pre-commit hooks and secure coding standards. Then comes committing code: when you commit code to a repository, you need to check whether you are using a secure repository or not. In the build and test phases there is static code analysis, then dynamic security testing alongside unit and functional tests, dependency management, and securing the entire pipeline. Infrastructure scanning and cloud configuration validation are also important, all the way from the plan and develop phases to the operate phase. This is the ideal security model. Apart from that, in build and test there are plugins created for this case study: a YAML validator and a security context schema. In this case the pipeline is Azure Pipelines. We have to make sure that when developers write the YAML files and scripts, they add the relevant security context and security flags: for example, run as a non-root user, use a read-only root file system, and set allow privilege escalation to false. All those flags need to be checked and validated when you execute the Azure pipeline.
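The security context check described above can be sketched as a small validator. This is only an illustration, not the plugin from the case study: `check_security_context` and `REQUIRED_FLAGS` are hypothetical names, the manifest is assumed to be already parsed from YAML into a dictionary, and only container-level `securityContext` fields are inspected, although Kubernetes also allows some of them at pod level.

```python
from __future__ import annotations

# Flags named in the talk, using the Kubernetes container securityContext
# field names. The required values are an assumed policy.
REQUIRED_FLAGS = {
    "runAsNonRoot": True,
    "readOnlyRootFilesystem": True,
    "allowPrivilegeEscalation": False,
}


def check_security_context(pod_manifest: dict) -> list[str]:
    """Return one message per container flag that is missing or wrong."""
    problems = []
    containers = pod_manifest.get("spec", {}).get("containers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        context = container.get("securityContext") or {}
        for flag, expected in REQUIRED_FLAGS.items():
            if context.get(flag) is not expected:
                problems.append(f"{name}: {flag} must be {str(expected).lower()}")
    return problems


# Made-up manifest: one flag is correct, one is missing, one is wrong.
manifest = {
    "kind": "Pod",
    "spec": {
        "containers": [
            {
                "name": "api",
                "securityContext": {
                    "runAsNonRoot": True,
                    "allowPrivilegeEscalation": True,
                },
            }
        ]
    },
}
for problem in check_security_context(manifest):
    print(problem)
```

A pipeline step could run such a check on every manifest in the repository and send the resulting messages back to developers as the report.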
Continuously checking that schema and providing the report directly to developers can be automated as well. Before going to production, when you move your code into the non-production and then the production environment, chaos engineering and security experiments can be conducted using security experimentation tools like ChaoSlingr and Gremlin, which help you understand the complexity. You need to navigate the complexity and be prepared for any changes or turbulent conditions that arise, which gives confidence to the developers and the operations team. Security vulnerabilities are one area you can simulate with fault injection, and the same goes for the reliability side. At that stage the issues are relatively easy and cost-effective to resolve, so this greatly reduces the total cost of development and deployment, because you are identifying security issues early in the lifecycle. We also need to design tools with non-security experts in mind, making threat modeling easier for developers by providing clear guidance on creating and analyzing the threat model. Now let's look at how some of these security policies can be enforced. A security center like Defender continuously discovers new resources being deployed across your workloads and assets and checks whether they are configured according to security best practices. Issues are flagged, and you get a prioritized list of recommendations. So this is enforcing security policy using a security center: it helps you know your posture, continuously assess and identify vulnerabilities, harden resources and services against a security benchmark, and detect and resolve threats to resources and services.
The major motivation for chaos engineering with security experimentation is to gain confidence for when systems are exposed to real-life attack scenarios, and it can be applied to any cloud platform with its various APIs and attack surfaces. There are open source security experimentation tools like ChaoSlingr, and then CloudStrike. SREs design security test scenarios through risk-based fault and vulnerability assessment and then create the various test scenarios. There are several components. The controller coordinates the chaos injection experiments. The manager receives instructions to conduct attacks based on specified attack modes. The fault engine holds the knowledge about cloud compliance and best practices. The fault injector is responsible for executing the security faults against the target cloud assets. The chaos monitor is an important component that continuously monitors the progress of attacks and detects early any effects of fault injection, which helps control the blast radius. The chaos analyzer analyzes the scores and generates the report. Possible recommendations include updating security rules for security groups, adding or enhancing NSG rules based on what you learned from the simulation, restricting access, and making sure access control policies are intact. All of these important recommendations can be derived from chaos engineering experiments.
There are various test categories: spoofing of user identity and other entities; tampering with S3, RDS, or Redshift data stores; privacy breaches or data leaks; misconfigured default security groups; denial of service; destroying cloud service configurations and data stores; and elevation of privilege, where you add users and accounts to existing roles or groups with higher privileges and then check whether you get security alerts. Checking the security alerts is an important part of validating all those scenarios. For validating baseline security requirements, you assign a public IP to your components to compromise internal resources and then check whether the security center detects this and raises alerts. There are also scenarios for ChaoSlingr and CloudStrike. One scenario is a misconfigured port change, and another is S3 bucket permission changes. For the port change, you have a list of NSGs, and you select those NSGs that are tagged with the opt-in tag for chaos. Randomly select an NSG and apply a random open or close action based on the port configuration. ChaoSlingr applies those configuration changes: its generator component starts the experiment and changes the port. Then you need to track the changes and verify whether events are triggered as alerts in the security center. That is one scenario you can test. Next is S3 bucket permission changes. You create a new user and get a list of all the buckets in the cloud, whether AWS, Azure, or GCP. Then select a random bucket from that set and configure attack points using these chaos tools.
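The port change scenario above follows a simple loop: pick an opted-in security group at random, flip a port, check whether an alert fired, and restore the original state. The sketch below simulates that loop entirely in memory. It does not call any cloud API and it is not the ChaoSlingr or CloudStrike implementation: the tag name, the data shapes, and the `alert_seen` callback are all assumptions made for illustration.

```python
import random

OPT_IN_TAG = "chaos-opt-in"  # hypothetical tag name


def pick_target(groups, rng):
    """Randomly pick one security group that has opted in to chaos."""
    candidates = [g for g in groups if g["tags"].get(OPT_IN_TAG) == "true"]
    return rng.choice(candidates) if candidates else None


def toggle_port(group, port):
    """Close the port if it is open, open it if it is closed."""
    if port in group["open_ports"]:
        group["open_ports"].discard(port)
        return "closed"
    group["open_ports"].add(port)
    return "opened"


def run_port_change(groups, port, alert_seen, seed=None):
    """Inject one port change, check for an alert, then roll back."""
    rng = random.Random(seed)
    target = pick_target(groups, rng)
    if target is None:
        return None  # nothing opted in, so nothing is touched
    before = set(target["open_ports"])
    action = toggle_port(target, port)
    try:
        detected = alert_seen(target["name"], port, action)
    finally:
        target["open_ports"] = before  # roll back to limit the blast radius
    return {"group": target["name"], "action": action, "detected": detected}


groups = [
    {"name": "web", "tags": {OPT_IN_TAG: "true"}, "open_ports": {443}},
    {"name": "db", "tags": {}, "open_ports": {5432}},
]
# Stand-in for querying a security center: pretend no alert ever fires.
result = run_port_change(groups, port=22, alert_seen=lambda *_: False, seed=1)
print(result)
```

A result with `detected` set to false is the interesting outcome: the change went unnoticed, which points to a gap in alerting that should be fixed before a real attacker finds it.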
Then simulate bucket unavailability by changing the bucket ACL from allow to deny. The tool applies the configuration changes and starts the security experiment, and you get real-time insights from the chaos engineering tool set for the experiment. Then you can monitor AWS CloudWatch and verify whether events are triggered in the security center. So there are two things. One is testing, with penetration testing and so on. It is also very important to create automated dashboards and continuously monitor security and reliability. Apart from that, you have to create customized SLO monitoring dashboards and use automated root cause analysis. Without automation this is very difficult, because a lot of data is generated and it is very hard to pinpoint a component or an issue in the production system. Automated root cause analysis uses anomaly detection and suspect ranking. Automated machine learning is applied using Python, SciPy, scikit-learn, and so on. So these are the automated dashboards you can create to see where an anomaly is, identify it, and detect it very rapidly. In summary: the cultural aspect of SRE is to embed security along with reliability; implement SRE practices in DevSecOps, like security chaos engineering, to improve organizational defense and zero-trust networking; and make sure reliability and security are integrated into the design and deployment process. Eliminate reliability- and security-related bottlenecks with continuous delivery in the production pipeline, and bridge the gaps with security practices while ensuring quick and safe deliverables. Eliminate siloed teams through increased collaboration, shared security responsibility, and reliability-driven development. These are the takeaways from this session.
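The anomaly detection step mentioned above can be illustrated with a very small example. The talk refers to machine learning libraries such as scikit-learn; this sketch deliberately swaps in a plain rolling z-score that uses only the Python standard library, so it should be read as a simplified stand-in. The function name, window size, threshold, and latency values are all assumptions.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples far from the mean of the preceding window.

    A sample is flagged when it lies more than `threshold` standard
    deviations away from the mean of the `window` samples before it.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: no spread to compare against
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up latency samples in milliseconds with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 450, 101, 100]
print(find_anomalies(latencies_ms))
```

Suspect ranking would then order the components that show anomalies at about the same time. That part is not covered by this sketch.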
Hope you enjoyed this session. I will leave you with two quotes. "SRE work is like being a part of the world's most intense pit crew. We change the tires of a race car as it's going 100 mph." That's a quote from Andrew, an SRE at Google. The other quote is about the need to continuously evolve and change: "What's dangerous is not to evolve." That's a quote from Jeff Bezos. It is very important to understand the new changes and continuously evolve with them. So thank you very much. Hope you enjoyed this session. If you have any questions, please reach out to me on LinkedIn, Twitter, or Klokta at 9987 on Discord. Thank you very much.

Kalyan Dhokte

Practice Head - SRE, Digital Engineering, Pune @ Cognizant Technology Solutions



