Conf42 DevOps 2024 - Online

Efficiency in Motion: Mastering Continuous Delivery without Compromising Stability

Abstract

Unlock microservices success! Join our session on efficient deployment, shifting from all-in-one releases to a single domain service. Perfect for beginners to intermediates: discover common pitfalls, why change is crucial, and practical continuous delivery improvements.

Summary

  • Efficiency in Motion: Mastering Continuous Delivery without Compromising Stability. I work as a senior architect at Simpplr, a company in the domain of intranet platforms. I specialize in creating products using microservices and event-driven architecture. We will look at the changes a team had to go through to embrace continuous delivery.
  • Most companies now go with a microservices architecture by default. A microservice is an independently deployable unit modeled around a business domain. The key advantage is team autonomy. You want to ensure that whatever you deliver does not break any existing feature.
  • A deployment of a service into production is not equal to a feature release. Every feature I develop goes behind a feature toggle, also called a feature flag. Even half-built code can make it into production because the flag is turned off by default. This is what we are trying to do with continuous delivery.
  • You are able to take advantage of a microservices architecture. The risk is literally zero. You have near-zero-downtime deployment. It gives you enough flexibility on the feature release. You can embrace trunk-based development. But it is not going to be easy.
  • When you work with feature flags, you need to be very careful about how you name them. All the feature flags you add to the code eventually have to be removed, and removing them without breaking anything is key. Observability cannot be an afterthought.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, a very good morning, good afternoon, or good evening, depending on the location you are connecting from, and thank you so much for joining me in this session, Efficiency in Motion: Mastering Continuous Delivery without Compromising Stability, where I will be talking about my experience working with teams who thought they were doing microservices. Eventually they realized that it was actually the trap of a distributed monolith, and they couldn't take full advantage of a microservices architecture, which tells you that you can ship a feature as soon as it is developed. So what we will look at is the changes this team had to go through to ensure that they can really embrace continuous delivery of a single microservice as a shippable unit. About me: I work as a senior architect with a company, Simpplr, which is in the domain of intranet platforms and has been positioned as a leader in the 2023 Gartner Magic Quadrant for intranet packaged solutions. I specialize in creating products using microservices and event-driven architecture. I also publish blogs on the site that you see on the screen, and if you want to connect with me, these are my LinkedIn credentials, and this is how I look as of today. So let's dive in and see what we have got in this session for you. This diagram is pretty self-explanatory. If you look at any product organization, you have multiple stakeholders who work together to deliver a product. You have a product team who works with the end users and the customers to understand the requirements and produce documents, BRDs and SRDs, and then it's the dev team who works with the product team to understand the requirements better, works with the architects, comes up with the design, and tries hard to produce a feature which can be released into the prod environment as soon as possible. Now, "as soon as possible" is the crux here.
So the intent is that you release the feature soon and wait for feedback from the users who will be using it. If they find everything okay, nothing like it, but we generally do not see that happening. So they will have some feedback and pass it on to the product team. The product team will come back to the developers, and this cycle goes on and on. As I said, the key is that you want to deliver the feature as soon as possible. And you can sense the team is trying to do continuous delivery, which is a software development process of getting code changes deployed into production quickly, safely, and with high quality. So we understand all these buzzwords, not really buzzwords but key things: you want to do it as soon as possible, you want to ensure that whatever you are delivering is not breaking any of the existing features, and of course it should be bug free, which is where the high quality comes in. One of the software architectures that complements this kind of development process is the microservices architecture, where a microservice is nothing but an independently deployable unit modeled around a business domain. Generally these microservices collaborate with each other to deliver a larger business use case. The majority of us have worked with a monolithic architecture and have felt the pain it brings, and I would say the majority of companies are now going with a microservices architecture by default because they understand the advantages it gives them: smaller units that are easy to comprehend, easy to develop, test, and deploy. That is the real advantage of a microservices architecture, right? And one of the key advantages is team autonomy. I can have multiple services owned by different teams, and because these teams own the service, which is typically modeled around a business domain, they understand it inside out.
So tomorrow, if a change is needed in one of the services, it is easy for them to go and make the change and deploy the service without impacting any other service. In this case, you see there is service A, which is on version one, and there is a change to be made in that service. So the team will simply make the changes, do enough testing, ensure that things are not breaking, and deploy version two of the same service. They don't have to coordinate with any of the other services because they know that the requirement is only for them. So they have full authority to make a change and deploy as and when they feel confident. This is what is expected: that I can go and deploy my service as and when I like. But for many companies, when you look at the way they ship their features, things really go crazy, and that is what I am going to talk about in this session. For the majority of companies, if you look at the deployment pattern, you will see multiple services that have their artifacts in a single artifact repository. But when they are trying to ship a feature, it's not just one service that is being deployed; multiple services are being deployed into the prod environment. There could be many reasons for that. But when you look at this kind of architecture, it gives you a feeling that this is not microservices, this is more like a distributed monolith. And yes, you're absolutely right, this is a distributed monolith. Honestly, there can be many reasons, which are quite technical. Maybe the service boundary of a microservice is not correct. Maybe a single microservice was broken down further into so many services that a single requirement in that domain leads to a lot of changes across multiple services.
And maybe for that reason you're forced to deploy these services together. Another reason could be that you have an underlying data store which is shared across multiple services, a kind of anti-pattern, as we say in microservices, where a change in the shared data store can trigger a cascading impact on the majority of the services. This could also be one of the reasons why there are so many changes happening and so many things getting deployed even for a single feature to be shipped. One more could be too many shared libraries. You are trying to share some code by creating libraries, so with one change in a library you are forced to update all the services that use it, and then you have to deploy those services once again when you're doing a release. But there is one more reason for such a deployment pattern, and it is pretty common for SaaS companies that follow a specific release cadence. They have a predefined feature release schedule where, let's say, they ship once a month, and they accumulate all the features, even those developed in the first week of the month. They just hold on to them, waiting for the four-week slot to be over and the final day of the release to come. Then they ship multiple features on the same day, and there could be valid reasons for it. One of the nontechnical reasons is that maybe you want to create a market buzz. Your customers are asking for a feature which is very high in demand, and you want to create a buzz that, okay, we are shipping this feature in such-and-such release, and you want to have that craze going in the market. Another reason could be that your customers don't really have the appetite.
So as a company, you are okay to release many features, maybe daily, if not weekly or monthly, but your customers are saying, no, let me absorb whatever you have already shipped; we are not in a position to absorb more. Or maybe, for a compliance reason, you are required to share some report with every release that you make. One example could be a pen test that you run for your product, where you want to publish the security posture of the product. And maybe you have to train your customer support teams, publish user guides, and a lot of other stuff, because of which you are doing releases on a very specific cadence. So we understand now that we wanted to do microservices. We wanted to embrace deploying one service at a time, with the assumption that one feature fits one service. That may not always be the case, but in the majority of cases, that's how it should be. If that's not the case, then you have done something wrong in breaking up your domain and assigning the domains to services. But then what is the problem that we are trying to address here? What exactly are the issues that this distributed monolith causes? Well, a very high risk with this bulky deployment is that if something goes wrong, you do not know why it went wrong. This negative trait of the distributed monolith comes from the monolith world, where you're doing a bulky deployment, releasing so many things in a single release package, and then you really do not know what went wrong if things are not working as expected, and it consumes a lot of debugging time just to understand what could have gone wrong. The second is high deployment time. With microservices, the expectation is zero downtime, or near-zero downtime, as we say.
But in this case, with so many microservices getting released together, your deployment time is eventually going to increase, which might have an impact on the user experience for the customers who are using your product, when you are actually doing a release. And the most important is what I would say, it literally burns out the engineering team. Now, some of the teams which I have worked with who were doing this kind of deployment. So every team will have a designated person on a release day. They will get into a conference room or maybe onto a bridge where one person represents one service, or maybe more than one service the person or the team owns. And then there will be a kind of checklist that will go first service a, followed by b, followed by c, and followed by d, and so on. And once every service is deployed, then you give a signal to your automation team that you know what, we are done with all kind of deployments, with all the services in all the predefined sequence. Now it's time for you to go and just perform the sanity. And then you just pray, maybe the release God that things work as expected, which generally may not be the case always. And then you just have to wait for the outcome. If things pass, you are one of those lucky ones. But if things fail, then you do not know what went wrong or which service resulted into such kind of situation. And honestly, some of the reasons which I have also heard by the teams who are not very proficient with microservices, it's like they tend to avoid, they assume that my network is always available. And once I am able to somehow pass this kind of engagement where every service is deployed together, all my use cases are passing. Things will never fail after that. Right? So a kind of myth, that network is always available, but just a comment I wanted to make. But then overall, you see everyone is so occupied on the release day that the entire day is gone and the productivity of the team, it goes to literally zero. 
And just imagine the pressure on the QA team, followed by the dev team, who just have to wait for the results to come out. If something is not working as expected, they have to work on it, and if they cannot fix it in a given amount of time, they will have to roll back. But roll back what? Literally roll back everything. That is something we do not want, and that is not what we use microservices for. Right. All right, so we looked at the situation the teams are in. We looked at the problems you are going to have if you go with this deployment model. But then what's the solution? The solution is pretty simple, right? You just deploy the service as soon as a feature is developed, and I'm not kidding, that is the solution. Logically, you're done, just go and deploy it. The task is done, and let people use it if they want to. Again, I'm saying deploy the service as soon as a feature is developed. What exactly that means, we'll understand in a bit. But the moment you make this statement, you immediately get two questions, which I generally get. One, what happens to the release cadence? We just discussed it, right? I want to create a market buzz. What happens to the market buzz, which I cannot create with this? And second, with whatever branching strategy you follow, there will always be a situation where multiple developers in the same service are working on different features. They check in the code; as a best practice, they should check in the code to the branch on a daily basis. Feature X is done, but feature Y is not yet done. I want to release feature X, but the dev who's working on feature Y comes and says, you cannot do it. You cannot just go and deploy the artifact, because my feature is not ready.
And if it is not ready, end users can see the buttons on the UI, they can click them, and things will not work. In that case, what's going to happen? Right? Very fair and important questions, I would say, and the answer is this: a deployment of a service into production is not equal to a feature release. All I said was that we deploy the service into the prod environment, not necessarily release a feature. What exactly that means is that I will ensure every feature I develop goes behind something called a feature toggle, a feature flag. Whichever language you have been programming in, you always have an if condition. So if the flag for feature X is enabled, my code will execute. If the flag is off, my code will not execute. And I assume we all trust the if condition, right? If something is true, it will execute. If something is false, it will not execute; it will be skipped. And if I can ensure that every feature I write is behind a feature toggle, my job is done. So if I go to the next slide and you look at this diagram, it shows trunk-based development, the branching strategy where multiple features are being worked on at the same time. Every developer just ensures that whatever feature they are working on goes behind a feature flag. By default, the flags are turned off. When you are done with the feature, you deploy into production and still keep the feature flag off. Only when you think it's time to turn the feature on do you simply go and turn the feature flag on. You don't have to redeploy the service, and your job is done. So now, with this, you are not waiting for all the dependencies to be available. Even half-built code can still make it into production because the flag is turned off for everyone by default. And if you see, this is what we are trying to do: continuous delivery.
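As a rough illustration of the "if condition" described above, here is a minimal Python sketch. The flag name `new_checkout`, the in-memory `FLAGS` dictionary, and the `checkout` function are all hypothetical; a real product would read flags from a dedicated flag service rather than a module-level dictionary.

```python
# Hypothetical in-memory flag store; every flag is off by default.
FLAGS = {"new_checkout": False}


def is_enabled(flag_name: str) -> bool:
    """Return the flag state; unknown flags are treated as off."""
    return FLAGS.get(flag_name, False)


def checkout(cart_total: float) -> str:
    if is_enabled("new_checkout"):
        # New (possibly half-built) code path: deployed to production,
        # but unreachable for users while the flag stays off.
        return f"new flow: {cart_total:.2f}"
    # Existing behaviour keeps running until the flag is turned on.
    return f"legacy flow: {cart_total:.2f}"
```

Turning the feature on is then a data change (`FLAGS["new_checkout"] = True` in this sketch), not a redeployment of the service.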
So in this case, multiple service owners can work on their feature sets and keep the feature flags on or off, depending on whether they want to release the features to the end users or just deploy into production and let the right time come to release a feature. That means simply go and turn the feature flag on for the customers to use. If I go to the next slide, this is important: how it helps. We are looking at the solution, and there are some advantages to it. You are able to take advantage of a microservices architecture. Now you are talking about deploying a microservice as a single unit of deployment, so the risk is literally zero. You have near-zero-downtime deployment. The risk has gone absolutely to the bottom, because now, if things work, nothing like it. If things don't work, you know why they are not working: it is because of your service. You can either turn the feature flag off or, if for some reason you have to roll back, you just have to roll back your service. You don't have to depend on other services, no coordination is needed, and all is good. It gives you enough flexibility on the feature release. We discussed that the majority of SaaS companies have this cadence of deployment, so they generally define that, okay, one release four weeks down the line, and all these features will go. But with this approach, if you have something which is ready upfront and you want to ship it, because maybe you have customer churn happening and you want to make customers happy, or you are getting enough push from customers who want this release as soon as possible, and you had planned to ship it maybe in the second month, then because the feature has been developed, you can just ship it right away. You don't even have to wait for the cadence to come. You can do dark launches.
One of the beauty I would say of this approach is you can do beta launches, you can do selective customer launch. Maybe you want to understand, you are not very clear what the impact would be in terms of the performance, in terms of the infrastructure, rollout, the need and all. So you can do a silent launch, maybe for one customer, two customers who can be your test bed as well. So you understand the feature, how this is being used, if this is being acknowledged, the load and all, you understand, you fix the performance issues with these customers and then your customers are actually the live testers for you if you think the other way around. And once you really harden it, you can just deploy it for maybe across regions or maybe for bigger enterprise customers. And this way you are launching something which is already hardened by some of your real users. You can test in production, right? Maybe in some cases you have situations where you cannot test some kind of integrations in a lower environment. Maybe you do not have enough licenses, or there could be lot of other reasons as well. So you want to deploy into production and you want to test in production. So maybe you can just open the feature flag only for a specific user, maybe a specific test user. You can perform enough testing in your production and then get a certification that yes, whatever you have done is working, though you could not test it in the lower environments. You can embrace trunk based development. We all understand there are different kind of the branching strategies. We all have gone through the pain and hassle of all the merges and conflicts, a lot of other stuff. Trunk based development is something that helps you eliminate those kind of issues. So with this approach, you can also embrace trunk based development. And because you are doing deployment so many times in the production, you kind of master the art of deployment. 
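The selective launches described above (beta customers, a single test user in production, then general availability) could be sketched roughly as follows. The `Flag` class and the customer identifiers are illustrative assumptions, not the speaker's implementation or any particular flag product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Flag:
    """Hypothetical flag with a GA switch and a per-customer allow list."""

    enabled_for_all: bool = False  # flipped when the feature goes GA
    allowed_customers: set[str] = field(default_factory=set)  # beta or test tenants

    def is_enabled(self, customer_id: str) -> bool:
        # On for everyone after GA, otherwise only for listed customers.
        return self.enabled_for_all or customer_id in self.allowed_customers


# Silent launch for one beta customer and one internal test user.
flag = Flag(allowed_customers={"beta-customer-1", "internal-test-user"})
```

Hardening the feature with the allow-listed customers first and setting `enabled_for_all` later mirrors the beta-then-GA rollout from the talk.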
So tomorrow, if something goes wrong, because you are doing things day in, day out, you know what needs to be done, what the next course of action is: should I roll back, should I fix it, should I turn the feature flag off? Compare that to doing a deployment to prod once, maybe in a week, maybe in a month. Then if something really goes wrong, you generally do not know what is to be done, and you end up maybe simply doing a rollback, right? But as the saying goes, nothing comes for free. We looked at the solution, we all understand and acknowledge it, but it is not going to be that easy. There are quite a few challenges that you have to address before you say that this is something you want to adopt. You need to be very thorough about what you're getting into. It is going to be fruitful in the long run, but initially there will be hiccups, and it will take a lot of effort. That is what this slide talks about. First, low fault tolerance. Now your entire product has feature flags everywhere. So whatever service you are using for feature flags, a homegrown product or maybe a third-party product, the availability of that service has to be pretty high. If that one service goes down, your entire product goes for a toss. So that is pretty important. Next, high testing and validation effort. We all understand that when we write a microservice, or as a matter of fact any piece of code, we write unit test cases for whatever functions and conditions we have written. Then we write integration test cases, API test cases, end-to-end test cases, all sorts of sanities and regressions, and a lot of other things from an automation perspective, which is itself a very effort-consuming task. But with feature flags, I'm now adding more complexity.
Now the complexity is all about the branching that will be happening. Let's say you have a feature. The feature can be either on or off, and there could be other services that depend on this feature. When you execute your test suite, you need to ensure that your overall product is not breaking whether the feature flag is off or on. Things should work as expected even if you have half-baked code lying in production behind a feature flag which is turned off. So there is a lot of branching happening, and for every test case you will have a lot of situations, a lot of feature flags, a lot of combinations. And there comes a very important question: do I need to test all the permutations and combinations? Every feature flag will have, let's say, a state of true or false, and I have hundreds of feature flags. Do I need to test every combination of them? In that case, I will never be able to release anything, forget about fast, because it will take a lot of time to execute. That's where you have to make a smart decision. You need to identify some of your core use cases and core services that are behind feature flags and make sure that at least those are not breaking and things are working as expected. Then, if you have more CPU capacity and more time, you can write and execute more test cases; nothing like it. But at least have some core combinations of feature flags to ensure that your key critical components are not breaking and your key use cases are passing as expected. Next, enforced governance. When you work with feature flags, you need to be very careful about how you name them. Looking at a feature flag, can I identify which service is using it?
How many feature flags do you want to keep in your product? You cannot just keep on adding feature flags, right? Once your feature is released, maybe at an early stage with the beta customers, eventually it will go GA. So you have to decide upfront that the lifecycle of this feature flag is going to be, let's say, two weeks or maybe two months, and then you have to write enough tests to identify that this feature flag was supposed to have a lifecycle of two weeks, the two weeks are gone, and the feature flag still exists. Now something is wrong, right? So you have to write what are called fitness functions around these feature flags that help you identify that the life of a feature flag has expired, so the dev team has to take some action to get it cleaned up. Next, operational ownership. This is pretty important: with so many feature flags being turned on and off for some of the customers, for some of the beta customers, who is going to take responsibility for turning the feature flags on and off? There are many ways this can be done. You have many stakeholders who can be involved, who can take action, who can streamline the process. What we have realized is that, at least in the production environment, it works well if this ownership goes to the product team, because they understand well which customer the feature is to be rolled out for, or maybe which beta customer or test customer or friendly customer, whatever name you want to call it. They know when the feature will go GA and for whom to release it at what point in time, and, if a new customer is getting signed up, whether this feature is to be released for that customer or not. So if the product owners own the feature flags, at least in the prod environment, that will be good to have. In lower environments, it depends.
You can have the pod leads or the function leads, the architects, they can own and they can just play around with the feature flags. Or maybe your QA architect, they can also do the same thing. By default you are actually accumulating the tech debt. So all the feature flags that you add in the code, they eventually have to be removed, right? Once your feature goes GA, and that is what you have to be on toes to understand which feature flag has expired and it's time for you to go and remove the code. And again, it depends on how you write the code involving feature flag. And that's why there is a high learning curve. You don't want to scatter the feature flags around your product. With that it will be really hard to understand which feature flag is being used, where, in which service and how do I clean it up. So there are well defined patterns that you can use to ensure that the feature flags is easy to remove. I would say is easy to add for sure, but then it's easy to remove as well. Again, without breaking anything is a key. Now, observability cannot be an afterthought. We all understand the importance of observability, why it exists and why it should be there. Especially with Microservices, you cannot take a chance and it has to be a day zero thought process. But the moment you add feature flags, it cannot be a day one, it has to be a day zero thought process. Because now with feature flags being added, different services will be owning their own feature flags. Something being turned on, turned off by mistake. You want to know that something is breaking and you want to react like, you have to be proactive in identifying something has gone wrong, rather than your customer coming to you and saying that this service was working up till now, and now it is not working. And then you go and okay, somehow the feature flag got turned on or turned off, let me reverse it. And then you go back saying that, okay, now just try it out. And things work. 
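The governance points from the talk (a flag name that reveals the owning service, and a fitness function that detects flags that have outlived their planned lifecycle) might look something like this in Python. The `service.feature` naming convention, the registry contents, and the dates are assumptions made for illustration only.

```python
from datetime import date

# Hypothetical registry: flag name -> planned expiry date.
# Assumed naming convention: "<owning service>.<feature>".
FLAG_REGISTRY = {
    "billing.new_invoice_layout": date(2024, 3, 1),
    "search.semantic_ranking": date(2024, 6, 1),
}


def owning_service(flag_name: str) -> str:
    """Derive the owning service from the flag name prefix."""
    return flag_name.split(".", 1)[0]


def expired_flags(registry: dict[str, date], today: date) -> list[str]:
    """Fitness function: list flags whose planned lifecycle is over."""
    return sorted(name for name, expires in registry.items() if expires < today)
```

Run in a build pipeline, a non-empty result from `expired_flags` could fail the build or raise a ticket for the owning team, so that stale flags are cleaned up instead of silently accumulating as tech debt.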
So you want things to be identified by you and not by somebody else. And resiliency cannot be an afterthought. Resiliency is very important in the world of microservices. A microservice has to be resilient. But many times what we have seen is that people tend to ignore the resiliency part. They assume the service they depend on will always be available, the network will always behave well, and the bandwidth will always be enough. So they generally tend to skip that kind of design and handling in the code. But with feature flags, you really cannot take those chances. That's why resiliency and observability just cannot be an afterthought. They have to go in as a day-zero implementation, I would say. Well, the last piece is to summarize whatever we discussed. Use feature flags and define an operational and governance model for them. This will help you embrace the microservice architecture and deploy a microservice as a unit of deployment. The moment you are done with the development of your feature and you have tested enough, without depending on others and without coordinating with other teams, you can confidently deploy the service into the prod environment. Again, you can deploy, and when you think it's time to release, you just turn the feature flag on and the feature is released as well. You can embrace trunk-based development and get rid of all the pain that comes from merge conflicts and a lot of other stuff. And finally, have enough test automation in place for the key components and key use cases, playing with the flags on and off, to ensure that things are always working as expected even when you have half-baked code in the production environment. I can assure you that once you follow whatever is mentioned here, you are all set to embrace continuous delivery for a microservice as a unit of deployment.
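The testing trade-off raised in the talk, hundreds of on/off flags versus a small set of core combinations, can be made concrete with a short sketch. The flag names below are hypothetical, and treating three flags as "core" is only an example of the kind of judgment call the speaker describes.

```python
from itertools import product

# Hypothetical flag inventory: 100 flags in total, 3 of them considered core.
ALL_FLAGS = [f"flag_{i}" for i in range(100)]
CORE_FLAGS = ["search.new_ranking", "feed.v2", "auth.sso"]


def exhaustive_count(flags: list[str]) -> int:
    """Number of on/off assignments if every combination were tested."""
    return 2 ** len(flags)


def core_combinations(core_flags: list[str]):
    """Yield every on/off assignment for the core flags only."""
    for states in product([False, True], repeat=len(core_flags)):
        yield dict(zip(core_flags, states))
```

With 100 flags the exhaustive count is 2**100 assignments, which is not practical to run, while three core flags give only 8 assignments that a test suite can loop over on every build.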
Thank you so much for joining me in this session.
...

Naresh Waswani

Senior Architect @ Simpplr
