Conf42 Chaos Engineering 2021 - Online

Role of Quality Engineers in SRE


Abstract

The role of the traditional tester has evolved to validating the resilience of modern applications and infrastructure. In this session, the speaker shares insights and lessons learned while helping customers through their quality engineering transformation journey.

This talk covers the following points:
  • Applying observability in testing processes (functional testing, performance testing, etc.)
  • Automating resilience (chaos) tests along with performance tests

Summary

  • Reuben Rajan George is a cloud reliability architect at Accenture. Today's talk is directed at quality engineers looking to make a career switch into SRE, QE engineers who are being asked to take up SRE roles, and product managers who would like to engage their existing quality engineering pool in operations or SRE activities.
  • Modern-day software development engineers in test (SDETs) are highly skilled automation engineers. Unlike traditional testers, SDETs today have varied responsibilities, ranging across functional, performance, security, usability and accessibility validation. Quality engineers come with a mindset of curiosity, adaptability and exploration.
  • Testers today and tomorrow will be involved in four areas: autonomous functional validation; machine learning for optimization of testing; continuous performance optimization; and, as there is a great impetus to make systems more reliable and resilient, reliability evaluation.
  • In autonomous functional validation, quality engineers will be involved in setting up self-maintaining test automation frameworks. They will also handle environment provisioning template validation, which is essentially testing deployment templates. The main goal here is to reduce as much toil as possible.
  • The goal of continuous performance optimization is to automatically configure applications, runtimes, databases and cloud environments individually (a minimal sketch of such an automated performance gate follows this summary). Quality engineers will also be involved in determining potential single points of failure and failure modes. There are cost benefits to any sort of performance optimization.
  • That's all I have for today. You are free to ping me on LinkedIn under my name, Reuben Rajan George, or on Twitter. See you later.
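
As a rough illustration of the automated performance gate mentioned above, here is a minimal Python sketch that compares the p95 latency of the current build against a stored baseline by querying a Prometheus-compatible endpoint. The endpoint URL, metric name, baseline file and threshold are illustrative assumptions, not something prescribed in the talk.

```python
# Minimal sketch: compare the p95 latency of the current build against a
# stored baseline by querying a Prometheus-compatible endpoint.
# The endpoint URL, query, baseline file and threshold are assumptions.
import json
import requests

PROMETHEUS_URL = "http://prometheus.example.internal:9090"   # assumed endpoint
QUERY = (
    'histogram_quantile(0.95, '
    'sum(rate(http_request_duration_seconds_bucket{job="checkout"}[5m])) by (le))'
)
BASELINE_FILE = "p95_baseline.json"   # produced by a previous, accepted run
ALLOWED_REGRESSION = 0.10             # fail the gate if p95 degrades by >10%

def current_p95() -> float:
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": QUERY})
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    # Assumes the query returns at least one series for the job under test.
    return float(result[0]["value"][1])

def main() -> None:
    with open(BASELINE_FILE) as fh:
        baseline = json.load(fh)["p95_seconds"]
    p95 = current_p95()
    if p95 > baseline * (1 + ALLOWED_REGRESSION):
        raise SystemExit(
            f"Performance regression: p95={p95:.3f}s vs baseline {baseline:.3f}s"
        )
    print(f"p95={p95:.3f}s is within {ALLOWED_REGRESSION:.0%} of baseline")

if __name__ == "__main__":
    main()
```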

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello. I'm Reuben Rajan George, and I work as a cloud reliability architect with Accenture. I've been involved in quality engineering roles for over ten years, primarily in the area of performance engineering and optimization. Now, today's talk is directed to the following folks: quality engineers who are currently looking to make a career switch into SRE, QE engineers who are currently being asked to take up SRE roles, and finally, product managers who would like to engage their existing quality engineering pool in operations or SRE activities. I hope this is going to be helpful. Modern-day SDETs, or in other words software development engineers in test, or in other words quality engineers, are actually highly skilled automation engineers with not just functional knowledge of the applications, but a clear understanding of how systems function under the hood. Now, these engineers must be aware of all the black box testing techniques, along with hands-on development or coding skills. Many of these SDETs are equipped with programming skills, making them capable of reading, analyzing, troubleshooting and recommending appropriate fixes and optimized algorithms. SDETs today are actually embedded throughout the entire application development process, right from the very beginning. Here are some of the roles played by SDETs today. Unlike traditional testers, SDETs today have varied responsibilities. Traditionally, quality engineers used to focus on manual functional testing, functional validation, manual testing of all the application screen flows, and so on. However, today they have varied responsibilities ranging across functional, performance, security, usability and accessibility validation. The second thing is that they have a clear understanding of the end-to-end functionality of an application from domain and user points of view, and they always strive to add a bit of sense and context to everything, which can be seen in the way they do their documentation, their test scenarios, test planning and so on. They also have strong programming skills for automating repetitive tasks. This could be your test scenario creation, test data preparation, or creating frameworks for parallel execution, reporting and even dashboarding. And they have an understanding of potential edge case scenarios. They're trained to think outside the box, and they help in identifying those edge case scenarios that are usually missed in application design and development. They are also trained to create the what-if scenarios that are actually the basis of any hypothesis-based testing technique. Quality engineers come with a particular mindset; there are three words that I use: curiosity, adaptability and exploration. Let me explain. They come with a mindset that is very curious to know things and to query whatever is placed before them, and that actually breeds a positive culture of asking why a certain functionality or feature works a certain way. This leads to more questioning as to why one even does the same task over and over again. In other words, in modern SRE terms, they always wonder: why are we continuously doing these manual tasks? We automate those tasks and thereby reduce toil. They also help create what-if scenarios that probe how systems would perform if a certain parameter is changed.
The next attitude they have is adaptability. And I'm talking from a third-person point of view, so don't mind me taking that angle. They have an open mindset. First and foremost, since there are a lot of new technologies coming out every month or quarter, they're open to learning new scripting tools or languages to enable them to do thorough testing of the applications they have, irrespective of the technology stack, whether it be Python, Java, et cetera. As architectures evolve, these SDETs are able to develop cloud fluency and validate an architecture's effectiveness, performance and security. They are also open, and adaptable enough, to switch roles between functional validation, which includes API testing and so on; database validation to test data consistency; performance validation covering load and stress; resilience and fault tolerance of the system; and chaos testing. And most importantly, they also take up the hat of the end user and understand how system response actually affects them, so they see it from an end-to-end perspective rather than only looking at a particular architecture component. And finally, they come with an exploratory mindset: they look out for what can be automated, so they automate the repetitive tasks, like I said earlier, all that test scenario creation, test execution and result analysis, test data generation, result analytics and inference, and so on. They would also be helpful in architecting and automating test infrastructures to integrate and orchestrate environment setup, validation activities, and even monitoring setup. Now, testers today and tomorrow will be involved in the following areas, and they are mentioned as four bullet points. First and foremost is autonomous functional validation, and there are a couple of things under that. They would be able to create automation tests and automation frameworks that are self-maintaining or autonomous. They would also be involved in validating deployments being pushed to production, basically your CloudFormation and your Terraform templates. They are also involved in setting up observability in the testing space, in such a way that they are able to monitor what happens under the hood for your test sessions. The second thing they would be involved in, and have to build skills in, is the area of machine learning and so on, for optimization of testing, to derive analytics and decisions based on the data captured. We'll talk more about that in the coming slides. Rather than following a very reactive performance testing approach, they will be more involved in setting up autonomous frameworks for continuous performance optimization, and I'll cover some of the key points in this area as well. And finally, as there is a great impetus to make our systems more reliable and resilient to any sort of fault, they would be involved in the evaluation of resilient design patterns. They would also be involved in injecting failure scenarios into production and validating the system behavior. Now let's go on to the first point: autonomous functional validation. In autonomous functional validation, the key point is that they would be involved in setting up test automation frameworks that are autonomous or self-maintaining.
Basically, what this means is that these frameworks should identify changes in the architecture with every new deployment or code push, and automatically update their test code to match the updated architecture and business knowledge. So they would have automation scripts running, and as soon as a new deployment is pushed in, the framework senses the new deployment, the new variables, environment variables and so on, and updates the test code automatically. The second thing is environment provisioning template validation, which basically is testing deployment templates. These could be CloudFormation, Terraform and so on. What they would do is validate how the template ties the various resources together into an integrated application. So what the QE would do here is create the entire stack, a CloudFormation stack for example, using the SDK, and validate whether the stack outputs match the expected behavior; the test passes if the stack creation is successful, and the deployment fails if the stack creation integration test fails. Test session observability: what we are trying to do here is evaluate performance regression across application builds while doing the functional validation and identify any contributing parameters, making use of your APM tools and applying usage patterns. This involves setting up observability even in lower environments. There is a trade-off with cost and licensing and so on, but there are a lot of open source solutions that your QE engineers use today that facilitate capturing performance data during your functional validation. This could be changes in the number of DB calls made, the API calls across microservices, the time it takes for the DB to read, and so on. And since these environments are generated using CloudFormation or Terraform, they can be torn down after execution, so the test automation engineer is able to automate even that piece of activity; it's purely autonomous in a way. Now, the second thing that functional or quality engineers are able to do is identify latent failure causal chains that result in an outage or poor experience, because they are now able to look under the hood: mining usage data from logs and analyzing key failure patterns and failure propagation across the stack. In test optimization, analytics and decisioning, what you see on the right-hand side, the main goal is to reduce as much toil as possible. The way they do this is that quality engineers should be skilled in statistics and NLP or machine learning techniques to optimize the test suite by performing requirement risk analysis, removing redundant scenarios, merging defects, identifying opportunities to sequence test cases, prioritizing test scenarios, and improving test coverage. It also involves testing failure propensity, criticality and so on. And one of the other outcomes of this activity is optimizing the test data repository. All right, so let me move on to the next slide: continuous performance optimization. Now, this is an interesting area because it applies to my line of work.
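
To make the environment provisioning template validation step concrete, here is a minimal sketch, assuming boto3 and pytest-style assertions, of creating a CloudFormation stack from a template, waiting for it to complete, checking the stack outputs, and tearing the environment down afterwards. The stack name, template file and output keys are hypothetical and only illustrate the pattern described in the talk.

```python
# Minimal sketch of environment-provisioning template validation:
# create a CloudFormation stack from a template, wait for completion,
# and assert that the stack outputs match what the tests expect.
# Stack name, template path and output keys are illustrative assumptions.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

def deploy_and_collect_outputs(stack_name: str, template_path: str) -> dict:
    with open(template_path) as fh:
        template_body = fh.read()

    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # The waiter raises if the stack does not reach CREATE_COMPLETE,
    # which fails the deployment exactly as described in the talk.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)

    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}

def test_stack_outputs():
    outputs = deploy_and_collect_outputs("qe-integration-stack", "app-stack.yaml")
    try:
        # Hypothetical output keys used only to illustrate the assertion step.
        assert "ApiEndpoint" in outputs
        assert outputs["ApiEndpoint"].startswith("https://")
    finally:
        # Tear the environment down after execution, keeping the flow autonomous.
        cfn.delete_stack(StackName="qe-integration-stack")
```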
The goal of these autonomous performance optimization techniques is to automatically configure applications, runtimes, databases and cloud environments individually, because each of these has its own separate parameters, and to analyze the relationships between these configuration parameters, across databases, cloud environments and so on, by continuously doing performance testing and tuning activities. For this, performance engineers today are skilled in a couple of things. Earlier they used to be skilled in performance testing tools like JMeter and NeoLoad, along with pipeline tools and monitoring tools like Prometheus, Dynatrace and Splunk. But today they should also be skilled with configuration management tools like Ansible and so on, be able to do bash scripting, and be able to work with APIs. In addition to automating the performance tests, they're also involved in automating dashboard creation, which involves automated monitoring of environments, deployments, build version control, and applying dashboards-as-code tools and techniques. Now, there are obviously cost benefits to any sort of performance optimization: first and foremost, improved system utilization, optimized infrastructure and license cost, improved customer experience, and improved observability. Finally, one of my favorite topics is reliability evaluation. There are two parts to this. The first is evaluation of the resilient design patterns that are applied. This involves creating automated scripts to validate the effectiveness of resilience patterns, applied through whichever library you use today, Hystrix or Resilience4j and so on, and validating patterns like circuit breaking, rate limiting, bulkheading, timeout handling, and result caching. Quality engineers would also be involved in determining potential single points of failure and failure modes using FMEA and STPA techniques. They would evaluate potential failover scenarios for multi-region deployments and test fallback scenarios. They would also be involved in identifying potential dependency failure scenarios, and in evaluating the effectiveness of recovery playbooks from a functional end-to-end perspective. Not only that, they would be involved in conducting dry runs of failure scenarios to validate them. And finally, chaos testing: there is a huge ocean of information in this area, but quality engineers will be involved in two key things. One is to automate the identification, the scenario creation, and the injection of failure scenarios for pre-production and production environments, and the other is to verify whether the testing results match expected system behavior. This also involves testing failure of all components and external dependencies, and these failures could be network brownouts, instance failures, et cetera. So that's all I have for today. I hope this quick lightning talk has given you some ideas. You are free to ping me on LinkedIn under my name, Reuben Rajan George, or on Twitter; I'm available on the handle Ruben Rajan. Thank you very much. See you later.
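
As a rough sketch of what an automated resilience pattern check along these lines could look like, the test below injects a dependency outage through a hypothetical fault-injection API, then asserts that the service under test fails fast and serves a fallback response. All URLs, endpoints and the fallback marker are illustrative assumptions, standing in for whatever chaos tooling and resilience library a team actually uses.

```python
# Minimal sketch of an automated resilience check: black-hole a downstream
# dependency, then verify that the service degrades gracefully (fast failure
# plus a fallback response) instead of hanging or returning a hard error.
# The fault-injection API and service URLs are hypothetical placeholders.
import time
import requests

SERVICE_URL = "https://app.example.internal/recommendations"   # assumed service
FAULT_API = "https://chaos.example.internal/faults"            # hypothetical API

def inject_dependency_outage(dependency: str) -> None:
    # Hypothetical call asking the chaos tooling to black-hole a dependency.
    resp = requests.post(FAULT_API, json={"target": dependency, "action": "blackhole"})
    resp.raise_for_status()

def clear_faults() -> None:
    requests.delete(FAULT_API).raise_for_status()

def test_fallback_when_catalog_is_down():
    inject_dependency_outage("catalog-service")
    try:
        start = time.monotonic()
        resp = requests.get(SERVICE_URL, timeout=5)
        elapsed = time.monotonic() - start

        # Circuit breaking / timeout handling: the call should fail fast...
        assert elapsed < 2.0, "request should not hang on the broken dependency"
        # ...and the fallback (e.g. result caching) should still return a usable response.
        assert resp.status_code == 200
        assert resp.json().get("source") == "cache"   # assumed fallback marker
    finally:
        clear_faults()
```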

Reuben Rajan George

Cloud Reliability Architect @ Accenture



