Conf42 DevSecOps 2022 - Online

Level Up Your CI/CD With Smart AWS Feature Flags



In this talk, I present the added value of using feature flags as part of your CI/CD process and showcase a rule-based, open-source feature flags SDK I donated to AWS Lambda Powertools. The talk also covers feature flags best practices across the entire CI/CD process, from development to production.


  • Ran Isenberg: How you can level up your CI/CD pipeline with AWS smart feature flags. Use cases include canary deployments, which gradually roll out a new feature and change the behavior of your service, and A/B testing. We're going to discuss the best practices for using feature flags from development to testing to production.
  • Feature flags are a type of configuration. A configuration is a collection of settings that influence and change the behavior of your service. We can use either dynamic or static configuration for feature flags. Since dynamic configuration allows really quick changes in service behavior, it is the winner.
  • We're going to use dynamic configuration for our feature flags implementation. Why did I choose AWS AppConfig? What's so great about it? It's an AWS integrated service, it's fully managed, and it has great features that answer many of my requirements.
  • AWS Lambda Powertools defines best practices for AWS Lambda logging, tracing, and input validation. It supports both regular and smart feature flags. Smart feature flags change value according to your input, which allows you to do A/B testing.
  • The feature flags best practices apply across all the stages of our pipeline: build, testing, deployment, and production. At some point we want to retire the feature flags and remove their code.


This transcript was autogenerated. To make changes, submit a PR.
Hi everybody, my name is Ran Isenberg, and today I want to talk to you about how you can level up your CI/CD pipeline with AWS smart feature flags. So let's start. Let's say that you've just deployed your new service, your new feature, to your AWS production account, and everything seems fine at the beginning. However, as time goes by, you realize that you have a problem: something is not working. You need to revert the feature, and you need to do it as soon as possible. What you're essentially trying to do is change the behavior of your service, and that is a very important capability. I can think of two other cases where it is very useful. One is canary deployments: gradually deploying a new feature and changing the behavior of your service step by step. Let's say first for 10% of the customers, then 20%, all the way to 100%. If there's an error along the way, you want to revert the behavior change automatically and quickly. The other use case is A/B testing. In A/B testing, you enable a feature, that is, change the behavior of your service, for a subset of customers. For example, you have a set of premium customers and you want to enable a set of premium features just for them. So now comes the question: how do you do all three of these things? The answer, of course, is feature flags. This is the main topic of my talk today, and I'm going to show you how to do it in your AWS account using AWS AppConfig and an SDK that I wrote and contributed to AWS Lambda Powertools. A little bit about myself: my name is Ran Isenberg, and I'm a principal software architect at CyberArk.
I'm an AWS Community Builder, and I write and maintain my serverless blog, ranthebuilder.cloud, where I share my serverless knowledge and experience. So what are we going to talk about today? First, the functional and non-functional requirements for a solution that provides these capabilities. Since feature flags are configuration, we'll discuss the two configuration types, dynamic and static, and how we're going to implement the feature flags. I'm going to give you a deep dive into the AWS AppConfig and Lambda Powertools solution. We'll talk about smart feature flags and what's smart about them. And lastly, we'll discuss best practices for using feature flags from development to testing to production. So let's start with the requirements. As I said, you want the ability to quickly roll back any feature and change the behavior as soon as possible. We want gradual deployment of features with automatic rollback in case of an issue, and we want A/B testing. In addition, since this is an AWS-only solution, we wanted to support both Lambda functions and containers. Another requirement that was important to my company, and I think should be important to you too, is FedRAMP High certification. And lastly, there's a non-functional requirement: any solution should be really easy to use and integrate into my service and my CI/CD pipeline, and I want it to be fully managed and resilient. I don't want to worry about backups or high availability of the feature flags solution. Feature flags are a type of configuration, and a configuration is essentially a collection of settings that influence and change the behavior of your service. In this example, you can see a naive feature flags implementation that I wrote.
I have a simple function that calls a magic function which evaluates the feature flag for me (we'll discuss what it does later on) and returns a boolean. Then I have a simple if/else: if the feature flag is enabled, I handle the new feature logic; otherwise, I run the same old service logic and the behavior doesn't change. It's a very naive implementation, but it works. Now let's discuss the configuration types we can use for feature flags: static and dynamic. What is a static configuration? I'm going to use Lambda functions as the example because that's what I use, but the same applies to containers. When my service CI/CD pipeline uploads a Lambda function to my AWS account, it bundles my handler code, defines the environment variables, and can also bundle static configuration files, which could be just JSON files, into the zip. They're part of the zip file that goes to AWS and gets deployed. If I want to change the static configuration, I need to run the service CI/CD pipeline again and go through all the gates, tests, and so on, to build the zip file and deploy it to my production account. Dynamic configuration, on the other hand, is a bit different. I still have my service CI/CD pipeline, I still create my Lambda zip file, and I deploy it to AWS. However, the Lambda function does not have the configuration statically in its zip file. It uses an API call to fetch the configuration from an external configuration resource, which is deployed by another, dedicated CI/CD pipeline just for the configuration.
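The naive implementation and the static configuration described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the slides: the flag name `new_feature`, the bundled JSON string, and the `FEATURE_FLAGS_JSON` environment variable are hypothetical stand-ins for a JSON file and environment variables shipped inside the Lambda zip.

```python
import json
import os

# Hypothetical static configuration, standing in for a JSON file bundled in the Lambda zip.
BUNDLED_CONFIG_JSON = '{"new_feature": {"default": false}}'


def load_static_config() -> dict:
    """Read the flags from bundled data; a hypothetical env var may override them.

    Changing either value requires redeploying the service through its CI/CD pipeline.
    """
    raw = os.environ.get("FEATURE_FLAGS_JSON", BUNDLED_CONFIG_JSON)
    return json.loads(raw)


def evaluate_feature_flag(config: dict, name: str) -> bool:
    """The 'magic function': the flag's value, or False if the flag is missing."""
    return bool(config.get(name, {}).get("default", False))


def handle_request(config: dict) -> str:
    # Naive if/else on a single boolean flag.
    if evaluate_feature_flag(config, "new_feature"):
        return "new feature logic"
    return "old service logic"
```

In a handler, `load_static_config()` would typically run once at cold start, so the flag value stays fixed until the next deployment of the service.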
With dynamic configuration, if I want to change the Lambda function's behavior, all I need to do is run the configuration CI/CD pipeline, which is much quicker: it has fewer tests and fewer resources to deploy. When the Lambda function checks for new configuration, it gets the new values and changes its behavior accordingly. So let's sum up static versus dynamic. With static, we read the configuration from bundled resources: JSON files in the zip or environment variables. With dynamic, we use an API call. With static, to make a change you need to rerun the service CI/CD pipeline; with dynamic, you run the configuration CI/CD pipeline, which is quicker. Dynamic does add the complexity of another pipeline to manage, but since it allows really quick changes in service behavior, it's the winner. We're going to use dynamic configuration for our feature flags implementation. Now that we understand how to implement feature flags, let's go over the solution. We'll write a JSON configuration file during the development stage and deploy it to AWS AppConfig with its own CI/CD pipeline; as we said, it's a dynamic configuration. Then we'll use the feature flags SDK in Lambda Powertools to fetch the feature flags from AWS AppConfig and evaluate them at runtime. Here is a sample JSON file with a single premium features flag whose default value is false, so the feature is disabled by default. Bringing up the dynamic diagram from before, we can see that we're now deploying a JSON file that is translated into an AWS AppConfig configuration resource. My Lambda function checks AppConfig for new configuration and fetches the values at runtime with an API call. So why did I choose AWS AppConfig? What's so great about it?
First of all, it's an AWS integrated service. I don't need to add a third-party service outside my AWS account, and no traffic needs to leave my account, so it's more secure. I don't need to go through security evaluations and all the corporate processes that come with adding third-party integrations. It's part of AWS, and I can just use it. It's one of the few solutions, if not the only one I believe, that has FedRAMP High certification for feature flags. It's fully managed, so I don't need to care about backups and high availability; it's always there and always working. It has a great feature for validating JSON schemas: I can define a schema for my configuration, and if somebody tries to upload a malformed or problematic configuration, the deployment simply fails and my environment stays fine. And it has deployment strategies. When you deploy configuration, you can choose canary deployments out of the box, which, if you recall, is one of our functional requirements. I can do canary deployments and define Amazon CloudWatch alarms; if they trigger during the canary deployment, AppConfig automatically rolls back to the previous version of my configuration. All in all, it has great features that answer many of my requirements. This is what the AppConfig console looks like. You need to define an application, which can be your microservice or service; in this case, it's called test service. Each application has environments, such as dev, test, production, and so on. Each environment has a configuration, which, as you can see, has a name on the left and, on the bottom right, a version and a deployment status if you chose canary deployments. So now we know how to deploy the configuration with the AppConfig dynamic pipeline.
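For the dynamic path, a Lambda function can fetch the latest configuration from AppConfig through the AppConfigData API, using the application, environment, and configuration profile shown in the console. The sketch below is an illustration under assumptions, not what Powertools does internally: the identifiers `test-service`, `dev`, and `features` are hypothetical, and the boto3 client is injectable so the function can be exercised without AWS credentials.

```python
import json
from typing import Any, Optional


def fetch_feature_flags(application: str, environment: str, profile: str,
                        client: Optional[Any] = None) -> dict:
    """Fetch the current feature flags configuration from AWS AppConfig at runtime.

    Uses the AppConfigData API: start a configuration session, then ask for the
    latest configuration. `client` is injectable so tests can pass a fake.
    """
    if client is None:
        import boto3  # imported lazily so this module loads without boto3 installed

        client = boto3.client("appconfigdata")

    session = client.start_configuration_session(
        ApplicationIdentifier=application,
        EnvironmentIdentifier=environment,
        ConfigurationProfileIdentifier=profile,
    )
    response = client.get_latest_configuration(
        ConfigurationToken=session["InitialConfigurationToken"]
    )
    body = response["Configuration"].read()  # streaming body holding the JSON document
    # An empty body means there is no configuration content to return for this poll.
    return json.loads(body) if body else {}
```

A long-running process would normally keep the `NextPollConfigurationToken` from each response and reuse it rather than starting a new session on every call; the sketch starts a fresh session for simplicity.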
Let's talk about evaluating the feature flags at runtime. We're going to use AWS Lambda Powertools for Python, the language I developed it in. But it's a very simple solution, so you could write it in your own language of choice using AWS APIs and some code. Since the solution is Python based, the examples are in Python. For those who don't know it, AWS Lambda Powertools is an amazing repository. It defines best practices for AWS Lambda logging, tracing, input validation, and feature flags, and you can use its utilities to implement them. It has over 1 million downloads per month, so it's very popular. We're going to use the feature flags utility, which I designed and contributed to AWS Lambda Powertools. Essentially, it fetches configuration from AppConfig, stores it in an in-memory cache, and evaluates the feature flag values for you. And it has something very interesting: support for both regular and smart feature flags. I'll discuss smart feature flags later on. Just to clarify, even though the name says Lambda, you can also use it in containers. So let's go back to the simple use case. We have a regular feature flag, ten percent off campaign, and let's say its default value is true, meaning the feature is enabled by default. This is how you use it in code. In line 3, we define the AppConfig configuration: the environment, the application, and the configuration name. In line 9, we initialize the instance of our SDK with the in-memory cache. In line 12, we call the magic function to evaluate the ten percent off feature flag and get back a boolean value, apply discount. Then in line 15 you can see the naive implementation again.
If apply discount is true, change the behavior and do something new; otherwise, keep the old behavior. Something important to note: in line 13 I'm passing default=False. Why is that? What if somebody deployed a new configuration and removed the ten percent off campaign feature from it? I don't want my Lambda function to crash, so I have a fallback: if the feature flag isn't found in the configuration, the default value is used. Now I'm going to show you smart feature flags, which are very cool. First of all, they enable A/B testing, the final requirement we haven't addressed yet. How do they do that? The feature flag's value changes according to your input. You provide a context input, and a rule engine checks whether a rule matches; if one does, it returns the value that the rule defines. So for one input the feature flag's value can be false, but for a different input it can be true: one configuration, different behavior. That's what enables A/B testing, and I'm going to show you how in a second. Let's take a look at this sample configuration. On the left we have the input event to our Lambda function: we have usernames, and each user has a tier. In this case the tier is premium, but it can also be standard. On the right we see the configuration. In line 17 we have the regular feature flag, and in line 2 we have the smart feature flag. It has a default value of false in line 3, but then it has the smart rule engine: the rules defined in line 4. There's one rule, customer tier equals premium, and if the customer tier is premium, line 6 says the feature flag's value is true. For a rule to match, all of its conditions need to evaluate to true.
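To make the rule engine idea concrete, here is a small pure-Python evaluator over a configuration shaped like the one on the slide (`default`, `rules`, `when_match`, `conditions`). It is a simplified sketch, not the Powertools implementation: it supports only the `EQUALS` action, and the snake_case flag names are assumed for illustration. In Powertools itself, the equivalent call is `FeatureFlags.evaluate(name=..., context=..., default=...)`.

```python
from typing import Any, Optional

# Modeled on the slide: a regular flag and a smart flag with one rule.
CONFIG = {
    "ten_percent_off_campaign": {"default": True},
    "premium_features": {
        "default": False,
        "rules": {
            "customer tier equals premium": {
                "when_match": True,
                "conditions": [{"action": "EQUALS", "key": "tier", "value": "premium"}],
            }
        },
    },
}


def evaluate(config: dict, name: str, default: Any, context: Optional[dict] = None) -> Any:
    """Simplified rule engine: only the EQUALS action is supported."""
    feature = config.get(name)
    if feature is None:
        # The flag was removed from the configuration: fall back instead of crashing.
        return default
    context = context or {}
    for rule in feature.get("rules", {}).values():
        conditions = rule.get("conditions", [])
        # A rule matches only when it has conditions and ALL of them match.
        if conditions and all(
            cond["action"] == "EQUALS" and context.get(cond["key"]) == cond["value"]
            for cond in conditions
        ):
            return rule["when_match"]
    return feature["default"]
```

For example, `evaluate(CONFIG, "premium_features", default=False, context={"tier": "premium"})` returns `True`, while a standard-tier context falls through to the flag's default, `False`.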
In this sample, the set of conditions has just one condition. It means that the key tier in the input needs to have the value premium: because the action is EQUALS, the value of tier must equal premium. Let's see it in this example. The code is the same as before, but in line 13 we build the input context: the key tier and its value, which can be standard or premium. If tier is standard, the rule doesn't match and the feature flag gets its default value, false. If tier is premium, the rule matches, and has premium features in line 17 is true. In line 17 we call the same evaluate function, but we provide the optional context. So for a premium tier user, line 19 triggers and we enable the premium features; other users get different behavior. That way you can do A/B testing between different users with the same configuration. There are over ten actions that you can use, such as STARTSWITH and KEY_IN_VALUE; you can find the full list on the documentation website. You also have non-boolean feature flags: the value can be any valid JSON value, such as a list of strings or integers. In this case I'm using a list of strings for special actions that I apply to premium tier accounts, like remove limits and remove ads, while the default for non-premium users is that no special action is applied. You can use this for all sorts of rules: enable a feature for a specific customer or maybe an admin of a customer, apply a discount to specific types of products, or offer free shipping if the cost is higher than some amount.
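Non-boolean flags and additional actions fit the same model. The sketch below extends a toy evaluator with a small action table and a list-valued flag for the premium special actions. The action names mirror `STARTSWITH` and `KEY_IN_VALUE` from Powertools, but their semantics here are a simplified assumption, as are the flag name `special_actions`, the second rule, and the context keys `customer_id` and `username`. In this sketch the first matching rule wins.

```python
from typing import Any, Callable, Dict, Optional

# Simplified versions of a few actions; names mirror Powertools, semantics are assumed.
ACTIONS: Dict[str, Callable[[Any, Any], bool]] = {
    "EQUALS": lambda ctx, val: ctx == val,
    "STARTSWITH": lambda ctx, val: isinstance(ctx, str) and ctx.startswith(val),
    "KEY_IN_VALUE": lambda ctx, val: ctx is not None and ctx in val,
}

# A non-boolean flag: the value is a list of special actions to apply to an account.
CONFIG = {
    "special_actions": {
        "default": [],
        "rules": {
            "premium tier": {
                "when_match": ["remove_limits", "remove_ads"],
                "conditions": [{"action": "EQUALS", "key": "tier", "value": "premium"}],
            },
            "admins of a specific customer": {
                "when_match": ["remove_limits"],
                "conditions": [
                    {"action": "KEY_IN_VALUE", "key": "customer_id", "value": ["acme"]},
                    {"action": "STARTSWITH", "key": "username", "value": "admin"},
                ],
            },
        },
    }
}


def condition_matches(cond: dict, context: dict) -> bool:
    action = ACTIONS.get(cond["action"])
    # Unknown actions never match in this sketch.
    return action is not None and action(context.get(cond["key"]), cond["value"])


def evaluate(config: dict, name: str, default: Any, context: Optional[dict] = None) -> Any:
    """Return the first matching rule's value, else the flag's default."""
    feature = config.get(name)
    if feature is None:
        return default  # missing flag: use the caller's fallback
    context = context or {}
    for rule in feature.get("rules", {}).values():
        conditions = rule.get("conditions", [])
        if conditions and all(condition_matches(c, context) for c in conditions):
            return rule["when_match"]
    return feature["default"]
```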
There are so many possibilities here, and it's very flexible. As I said, we're going to use it for A/B testing: different users get different experiences from a single configuration that doesn't change. Now, I mentioned that there's an in-memory cache. Why is that important? Each call to AWS AppConfig to fetch configuration costs money, and we want to save some. With the in-memory cache, as long as the cache hasn't expired, we don't fetch the configuration again, and we save money. You can define how many seconds the cache lives. Remember that this is a balance between cost savings and having the service change its behavior as soon as possible, because until the cache expires, the service won't fetch the new configuration. By the way, very soon, hopefully this month, I'm adding time-based rules, where you can enable rules and feature flags at specific times, enable features for a specific duration, or enable them during specific days. And lastly, let's discuss the feature flags best practices across all the stages of our pipeline and development: build, testing, deployment, and production. In my eyes, the development team needs to own the process from start to end. They need to write the configuration JSON files and the code that evaluates them and behaves accordingly. They should start with the features enabled in the test and dev accounts but disabled in production. When it comes to tests, we're going to mock the configuration so we have better control over the outcome. Obviously, we'll mock the feature as enabled and test that all the side effects work just fine.
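The cache trade-off described above can be illustrated with a tiny time-to-live wrapper around any fetch function. This is a hypothetical helper (`CachedConfigStore` and its `max_age` parameter are illustrative names), not the Powertools class; the clock is injectable so the expiry behavior can be checked deterministically.

```python
import time
from typing import Callable, Optional


class CachedConfigStore:
    """Hypothetical in-memory TTL cache in front of a configuration fetch function."""

    def __init__(self, fetch: Callable[[], dict], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # e.g. a call to AppConfig; each call costs money
        self._max_age = max_age      # seconds a fetched configuration stays fresh
        self._clock = clock
        self._value: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Only call the remote store when nothing is cached or the copy has expired.
        if self._value is None or now - self._fetched_at >= self._max_age:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

A larger `max_age` means fewer paid calls but a longer delay before a configuration change, such as disabling a broken feature, reaches the service.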
It's just as important to mock the feature as disabled, because sometimes you don't have a simple "if the feature is enabled, do something" statement; sometimes the logic is more complicated, and it's really important that it's tested. We want to assert that the function that handles the feature does not run when the feature is actually disabled. We actually had a case where our feature was set to false, but due to a bug in a complicated if statement, the feature code ran anyway and we had a problem in production. So it's very important to test that. Once you decide that the feature is stable in the non-production environments, you can run a deployment strategy to production and use canary deployments; AppConfig has you covered there. You should define CloudWatch alarms on errors for your service so that your configuration is automatically reverted if there's an error. Now, what happens if, at some later time, you do get errors in your feature, things you didn't catch in the tests? You should disable the feature as soon as possible by running the configuration CI/CD pipeline again. Then update the tests, add the missing use cases, and go through the whole process again: deploy and release again. I also suggest holding a retro meeting to identify how you missed those use cases in the tests and how the bug reached production. Eventually, you need to retire the feature flags. Why? Because feature flags add code complexity: more tests around them, more mocks, more if statements and branching in your code. So at some point we want to retire the feature flags and remove their code.
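The testing advice, mocking the flag as both enabled and disabled, might look like this with `unittest.mock`. The handler, the `apply_discount` logic, and the extra price check standing in for "more complicated" branching are hypothetical; the flags object is any object with an `evaluate(name=..., default=...)` method, mirroring how the Powertools utility is called.

```python
from unittest.mock import MagicMock


def apply_discount(order: dict) -> dict:
    """New feature logic (hypothetical): 10% off."""
    return {**order, "price": round(order["price"] * 0.9, 2)}


def handle_order(order: dict, feature_flags, new_logic=apply_discount) -> dict:
    enabled = feature_flags.evaluate(name="ten_percent_off_campaign", default=False)
    # Slightly more than a bare "if enabled:"; this is the kind of branch worth testing.
    if enabled and order.get("price", 0) > 0:
        return new_logic(order)
    return order


def test_feature_enabled_runs_new_logic():
    flags = MagicMock()
    flags.evaluate.return_value = True
    assert handle_order({"price": 100}, flags) == {"price": 90.0}


def test_feature_disabled_does_not_run_new_logic():
    flags = MagicMock()
    flags.evaluate.return_value = False
    new_logic = MagicMock()
    assert handle_order({"price": 100}, flags, new_logic=new_logic) == {"price": 100}
    # The key assertion: the feature code must not run when the flag is off.
    new_logic.assert_not_called()
    flags.evaluate.assert_called_once_with(name="ten_percent_off_campaign", default=False)
```

Both test functions follow pytest naming, so a test runner would discover them automatically.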
To retire them, we meet once a month to discuss and select candidate feature flags for removal. Then all we need to do is run the configuration CI/CD pipeline again and monitor that everything is okay. How do we select candidates for removal? If the feature has been enabled for all customers for several weeks, it's been stable with no bugs around it, customer feedback has been very positive, there are no open issues, and you don't expect any code changes in that area, then you should retire the feature flag and make your code simpler. So let's sum it all up. We created feature flags, smart and regular. We deployed them with AppConfig. We used Lambda Powertools to fetch and evaluate the feature flags. We had canary deployments. We learned how to do A/B testing, and we learned the feature flags best practices from the development stage all the way to production. Thank you very much, that's been my talk. You can follow me on Twitter and LinkedIn, and check out my website, ranthebuilder.cloud, where I write about all things serverless. Thank you very much and have a good day.
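The retirement criteria from the monthly review can be written down as an explicit checklist. The `FlagStatus` fields and the three-week threshold standing in for "several weeks" are assumptions for illustration, not a prescribed policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlagStatus:
    """Facts gathered at the monthly review (hypothetical fields)."""
    name: str
    enabled_for_all_customers: bool
    weeks_fully_enabled: int
    open_issues: int
    positive_feedback: bool
    code_changes_expected: bool


def is_retirement_candidate(flag: FlagStatus, min_weeks: int = 3) -> bool:
    # Every criterion from the talk must hold before a flag is retired.
    return (
        flag.enabled_for_all_customers
        and flag.weeks_fully_enabled >= min_weeks
        and flag.open_issues == 0
        and flag.positive_feedback
        and not flag.code_changes_expected
    )


def retirement_candidates(flags: List[FlagStatus]) -> List[str]:
    return [f.name for f in flags if is_retirement_candidate(f)]
```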

Ran Isenberg

Principal Software Architect @ CyberArk

