Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
              Everybody, my name is Ran Isenberg and I want to talk to you today
            
            
            
              about how you can level up your CI/CD pipeline
            
            
            
              with AWS smart feature flags.
            
            
            
              So let's start. Let's say that you've
            
            
            
              just deployed your new service, your new feature to your AWS
            
            
            
              account, your production account, and everything seems fine
            
            
            
              at the beginning. However, as time goes by, you realize
            
            
            
              that you have a problem, something is not working. You need to
            
            
            
              revert the feature and you need to do it as soon as possible.
            
            
            
              What you're trying to do essentially is to change the behavior of your
            
            
            
              service. And this capability is a very important one,
            
            
            
              changing the behavior of your service. And I can think of
            
            
            
              another two cases where this is very useful. One case
            
            
            
              is canary deployments, gradually deploying
            
            
            
              a new feature and changing the behavior of your service gradually.
            
            
            
              Let's say at the beginning for 10% of the customers, then 20% of
            
            
            
              the customers, all the way to 100. And during that time
            
            
            
              if there's an error, you basically want to revert the
            
            
            
              behavior change automatically and quickly.
            
            
            
              And lastly, another use case is A/B testing.
            
            
            
              And in A/B testing, what you want to do is basically enable
            
            
            
              a feature, changing the behavior of your service, for a subset
            
            
            
              of customers. So let's say you have a premium set of
            
            
            
              customers for whom you want to enable a premium set
            
            
            
              of features, right? So this is how you can do it with
            
            
            
              A/B testing. So now comes the question,
            
            
            
              how do you do that? How do you get all these three capabilities?
              Well, the answer is obviously feature
            
            
            
              flags. And this is the main topic of my talk
            
            
            
              today, and I'm going to show you how you can do it on your AWS
            
            
            
              account. And we're going to use AWS AppConfig and
            
            
            
              an SDK that I wrote and contributed to AWS Lambda
            
            
            
              Powertools. So, a little bit about myself: my name
            
            
            
              is Ran Isenberg. I'm a principal software architect
            
            
            
              at CyberArk. I'm an AWS Community Builder
            
            
            
              and I maintain and write at my serverless
            
            
            
              blog website, ranthebuilder.cloud, where I share my
            
            
            
              serverless knowledge and experience. So what
            
            
            
              are we going to talk about today?
            
            
            
              We're going to talk about what are the requirements for these
            
            
            
              capabilities. We're going
            
            
            
              to discuss the functional and non-functional requirements for a
            
            
            
              solution. And since feature flags are configuration,
            
            
            
              we're going to discuss the configuration types, how we're going to implement
            
            
            
              the feature flags. We have dynamic and static configurations.
            
            
            
              I'm going to show you, in a deep dive, the AWS AppConfig
              and Lambda Powertools solution. We're going to talk
            
            
            
              about smart feature flags and what's smart about them. And lastly,
            
            
            
              we're going to discuss the best practices for using feature flags
            
            
            
              from development to testing to production.
            
            
            
              So let's start with the requirements.
            
            
            
              So if I recall, I said that you
            
            
            
              want to have the ability to quickly roll back any feature
            
            
            
              to change the behavior as soon as possible. We want to have
            
            
            
              the gradual deployment of features and an automatic rollback in
            
            
            
              case of an issue, and we want to have A/B testing.
            
            
            
              In addition, since this is an AWS solution only, we wanted to
            
            
            
              support both Lambda functions and containers. And another
            
            
            
              requirement that was important to my company, but I think it should
            
            
            
              also be important to you, is FedRAMP High certification.
            
            
            
              And lastly, there's a non-functional requirement.
            
            
            
              Any solution should be really easy to use and integrate
            
            
            
              into my service and my CI/CD pipeline, and I want it to
            
            
            
              be self managed and resilient. I don't want to worry about
            
            
            
              backups or high availability of the feature flags solution.
            
            
            
              So feature flags are a type of configuration,
            
            
            
              and a configuration is essentially a collection of
            
            
            
              settings that influence and change the behavior of your service.
            
            
            
              And in this example, you can see a naive feature flags
            
            
            
              implementation that I wrote. I have a simple
            
            
            
              function, and I have a magic function
            
            
            
              that evaluates feature flags for me. We're going
            
            
            
              to discuss what it does later on. It returns
            
            
            
              a boolean, and then I have a simple if/else: if the feature
            
            
            
              flag is enabled, I'm going to handle the new feature logic.
            
            
            
              Otherwise I'm going to do the same old service logic
            
            
            
              and it will not change my behavior. So this is a very naive implementation,
            
            
            
              but it works. So let's
            
            
            
              discuss the configuration types. We have dynamic and
            
            
            
              static configurations that we can use for feature flags.
            
            
            
              What is a static configuration? So a static configuration,
            
            
            
              in this case I'm going to use the example of Lambda functions
            
            
            
              because this is what I use, but it can also be containers.
            
            
            
              So in this case, when I upload my
            
            
            
              Lambda function, when my CI/CD pipeline, my service
            
            
            
              CI/CD pipeline, uploads a Lambda function to the cloud, to my account,
            
            
            
              it bundles my handler code with environment
              variables. It defines the environment variables, and also it
            
            
            
              can bundle static configuration files in the zip, which could be
            
            
            
              just JSON files. So they're part of the zip file that
            
            
            
              goes to AWS and it's deployed. And if I
            
            
            
              want to make a change to
            
            
            
              the static configuration, I just need to run the CI/CD pipeline again
            
            
            
              and go through all the gates and the tests, et cetera,
            
            
            
              to build the zip file and deploy it to my production account.
            
            
            
              Dynamic configurations, on the other hand, are a bit different. So I still have my
            
            
            
              service CI/CD pipeline and I still create my Lambda files,
            
            
            
              my Lambda zip file, and I deploy it to AWS.
            
            
            
              However, the Lambda does not have the
            
            
            
              configuration statically in its zip file.
            
            
            
              It uses an API call to fetch the configuration from
            
            
            
              an external resource, some configuration resource that is deployed
            
            
            
              by another CI/CD pipeline, a dedicated CI/CD pipeline,
            
            
            
              just for the configuration. Okay,
            
            
            
              so in this case, if I want to make a change to the Lambda behavior,
            
            
            
              all I need to do is run the configuration CI/CD
            
            
            
              pipeline, which is much quicker: it has fewer tests and
            
            
            
              fewer resources to deploy. And then when
            
            
            
              the Lambda checks for the new configuration, it's going to get the new values and
            
            
            
              it's going to change the behavior accordingly.
            
            
            
              So let's sum it up, static versus dynamic.
            
            
            
              So static again, we're reading the configuration from the bundled
            
            
            
              resources, the JSON files in the zip or environment variables.
            
            
            
              In dynamic, we're using an API call. In
            
            
            
              static, if you want to make a change, you need to rerun the service CI/CD
            
            
            
              pipeline. And in the dynamic we need to run
            
            
            
              the configuration CI/CD pipeline, which is quicker.
            
            
            
              We do have the complexity in dynamic of
            
            
            
              another pipeline to manage, but since it allows for
            
            
            
              really quick changes in service behavior, this is a winner.
            
            
            
              We're going to use dynamic configuration for our feature
            
            
            
              flags implementation.
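The difference between the two configuration types can be sketched in a few lines of Python. This is only an illustration of the comparison above: the file name `config.json`, the key `feature_enabled`, and the `fetch` callable are hypothetical stand-ins, and in the setup described in this talk the dynamic fetch would be an API call to AWS AppConfig rather than a dictionary lookup.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def load_static_config(path: Path) -> Dict[str, Any]:
    """Static: read a JSON file that was bundled with the deployed code."""
    return json.loads(path.read_text())


def load_dynamic_config(fetch: Callable[[], str]) -> Dict[str, Any]:
    """Dynamic: ask an external configuration resource at runtime."""
    return json.loads(fetch())


# Stand-in for the external resource that a dedicated pipeline would update.
remote_store = {"payload": '{"feature_enabled": false}'}

# Static: the file ships inside the bundle, so changing it means redeploying.
with tempfile.TemporaryDirectory() as tmp:
    bundled = Path(tmp) / "config.json"
    bundled.write_text('{"feature_enabled": false}')
    static_config = load_static_config(bundled)

# Dynamic: the service code stays the same while the remote value changes.
dynamic_before = load_dynamic_config(lambda: remote_store["payload"])
remote_store["payload"] = '{"feature_enabled": true}'
dynamic_after = load_dynamic_config(lambda: remote_store["payload"])
```

The service code is identical before and after the change; only the value held by the external resource differs, which is why the dynamic route allows quicker behavior changes.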
            
            
            
              So now that we understand how to do the feature flags,
            
            
            
              how to implement them, let's go over the solution.
              We're going to use a JSON configuration file as part of
            
            
            
              the development stage. We're going to deploy it to AWS AppConfig with its
            
            
            
              own CI/CD pipeline. Like we said, it's a dynamic configuration.
              And then we're going to use the SDK
            
            
            
              in Lambda Powertools for feature flags to evaluate at runtime
            
            
            
              and get the feature flags from AWS AppConfig.
            
            
            
              So this is a sample JSON file with just a premium
            
            
            
              features flag, where the default value is false. The feature is disabled
            
            
            
              by default in this case.
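The slide itself is not part of this transcript, so the following is a guess at what such a file might look like, loaded with Python's standard `json` module. The flag name `premium_features` and the `default` key follow the description above; the exact layout is an assumption.

```python
import json

# Assumed shape of the sample file: one flag, disabled by default.
SAMPLE_CONFIG = """
{
  "premium_features": {
    "default": false
  }
}
"""

flags = json.loads(SAMPLE_CONFIG)
is_enabled_by_default = flags["premium_features"]["default"]
```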
            
            
            
              Now we're going to once again
            
            
            
              bring up the dynamic diagram from
            
            
            
              before, and here we can see that now we're deploying a JSON file
            
            
            
              that is translated into an AWS AppConfig configuration
            
            
            
              resource. And my Lambda is going to check for a new
            
            
            
              configuration from AppConfig and fetch the
            
            
            
              values at runtime with an API call.
            
            
            
              So why did I choose AWS AppConfig?
            
            
            
              What's so great about it? Okay, so first of all,
            
            
            
              it's an AWS integrated service. I don't need to add another
            
            
            
              third-party service outside of my AWS account. I don't need
            
            
            
              to have any traffic going outside my account, so it's more
            
            
            
              secure. I don't need to go through
            
            
            
              the process of security evaluations and
            
            
            
              all those corporate processes
            
            
            
              that come into play when you're adding third-party integrations.
            
            
            
              It's part of AWS and I can just use it. It's one of the few
            
            
            
              solutions, if not the only one I believe, that has FedRAMP
            
            
            
              High certification for feature flags. It's fully managed,
            
            
            
              so I don't need to care about backups and high availability.
            
            
            
              It's always there, it's always working. It has
            
            
            
              a great feature for validating JSON schemas, so I can define
            
            
            
              a schema for my configuration. So if somebody
            
            
            
              tries to upload a malformed or some problematic
            
            
            
              configuration, it will just fail the deployment and
            
            
            
              my environment will be just fine.
            
            
            
              And it has deployment strategies. So when you deploy configuration,
            
            
            
              you can choose canary deployments, which if you recall,
            
            
            
              is one of our functional requirements. So it has it out of
            
            
            
              the box. So it's great. I can do canary deployments
            
            
            
              and define AWS CloudWatch
            
            
            
              alarms that if they trigger during the canary deployments,
            
            
            
              I'm going to have the automatic rollback and go back to the previous
            
            
            
              version of my configuration. So all in all, it has great
            
            
            
              features that answer many of my requirements.
            
            
            
              So this is what the console looks like in AppConfig.
            
            
            
              You need to define an application. An application can
            
            
            
              be just your microservice or service. In this case, it's called
            
            
            
              a test service. And each application has
            
            
            
              an environment, and an environment
            
            
            
              can be dev, test, production, et cetera.
            
            
            
              And each environment has the configuration, which on
            
            
            
              the bottom right, you can see it has a version, it has a name on
            
            
            
              the left, and it has a deployment status if you chose canary
            
            
            
              deployments. So now that we
            
            
            
              know how to deploy the configuration (we're going to use the AppConfig
              dynamic pipeline), let's talk about the
            
            
            
              evaluation of the feature flags at runtime.
            
            
            
              We're going to use AWS Lambda Powertools,
            
            
            
              we're going to use Python, which is what I
            
            
            
              developed it in. But it's a very simple solution,
            
            
            
              so you can really write it in your own language of
            
            
            
              choice. We're using AWS APIs here, and some Python
            
            
            
              code. So the examples are going to be Python, since the solution is Python
            
            
            
              based. So for those who don't know, AWS
            
            
            
              Lambda Powertools is an amazing repository. It basically defines
            
            
            
              all the best practices for AWS Lambda: logging,
            
            
            
              tracing, input validation, and feature flags
            
            
            
              are defined and you can use their utilities to do
            
            
            
              that. It has over 1 million downloads per month, so it's
            
            
            
              very popular. And we're going to use the feature
            
            
            
              flags utility, which I designed and contributed to AWS
              Lambda Powertools. And what it essentially does is this: it fetches
            
            
            
              configurations from AppConfig. It stores them
            
            
            
              in an in-memory cache, it evaluates the feature
            
            
            
              flags value for you, and it has something very interesting. It has
            
            
            
              support for regular and smart feature flags. And I'm
            
            
            
              going to discuss smart feature flags later on. And just
            
            
            
              to clarify, it's not just for a Lambda function, even though
            
            
            
              the name says Lambda, you can use it also in containers.
            
            
            
              So let's go back to the simple use case.
            
            
            
              We have a regular feature flag,
            
            
            
              a 10% off campaign, and the default value is going
            
            
            
              to be, let's say the feature is enabled by default.
            
            
            
              And this is how you're going to use the code. In line three,
            
            
            
              we're going to define the app config configuration, the environment,
            
            
            
              the application, and the configuration name. In line nine,
            
            
            
              we're going to define the instance of
            
            
            
              our SDK with the in-memory cache. We're going to initialize
            
            
            
              it. And then in line 12 we're going to evaluate, right, this is the magic
            
            
            
              function. We're going to evaluate the feature flags,
            
            
            
              10% off, and we're going to get a boolean value back,
            
            
            
              apply discount, and then you can see the naive implementation again
            
            
            
              in line 15. If apply discount, change the behavior,
            
            
            
              do something new. Otherwise do the
            
            
            
              old behavior. And something
            
            
            
              important to note is that in line 13 I'm using the default value
            
            
            
              equals false. Why is that?
            
            
            
              Well, what if somebody deployed
            
            
            
              a new configuration and just removed the feature 10%
            
            
            
              off campaign from the configuration? I don't want my
            
            
            
              code, my Lambda function, to crash,
            
            
            
              so I'm going to have a fallback, a default value. So in case it
            
            
            
              doesn't find the feature flag in the
            
            
            
              configuration, it's going to have a default value.
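The slide code is not included in this transcript, so here is a stand-alone sketch of the same flow in plain Python rather than the SDK itself. The `evaluate` helper, the `CONFIG` dictionary, and the flag name `ten_percent_off_campaign` are assumptions for illustration. In the actual Powertools utility the equivalent call is `FeatureFlags.evaluate(name=..., default=...)`, backed by an `AppConfigStore`.

```python
from typing import Any, Dict

# Stand-in for the configuration that would be fetched from AWS AppConfig.
CONFIG: Dict[str, Dict[str, Any]] = {
    "ten_percent_off_campaign": {"default": True},
}


def evaluate(config: Dict[str, Dict[str, Any]], name: str, default: Any) -> Any:
    """Return the flag's value, or `default` when the flag is missing."""
    flag = config.get(name)
    if flag is None:
        # The flag was removed from the configuration: fall back, don't crash.
        return default
    return flag.get("default", default)


def handle_request(config: Dict[str, Dict[str, Any]]) -> str:
    apply_discount = evaluate(config, "ten_percent_off_campaign", default=False)
    if apply_discount:
        return "price with 10% discount"  # new behavior
    return "regular price"  # old behavior
```

Calling `handle_request({})` simulates someone deploying a configuration without the flag: the code path falls back to the old behavior because the caller passed `default=False`.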
            
            
            
              So now I'm going to show you smart feature flags,
            
            
            
              which are very cool. First of
            
            
            
              all, they enable A/B testing, which is the final requirement
            
            
            
              that we haven't answered yet. So how does
            
            
            
              it do that? Basically, the feature flags will
            
            
            
              change value according to your input. You have a context input
            
            
            
              that you provide, and it has a rule engine that checks
            
            
            
              if a rule matches. And if it does, it returns the value that the
            
            
            
              rule defines. So for one input, the
              feature flag's value can be false.
            
            
            
              But if you provide a different input value, it can
            
            
            
              be true. So one configuration and different behavior and
            
            
            
              it allows you to do A/B testing, and I'm going to show you how
            
            
            
              in a second.
            
            
            
              So let's take a look at this sample configuration. Let's assume
            
            
            
              that we have on the left our input event to our Lambda,
            
            
            
              we have usernames and each user has a tier. In this
            
            
            
              case the tier
            
            
            
              is premium, but it can also be standard.
            
            
            
              And on the right we can see the configuration that we have.
            
            
            
              In line 17 we have the regular feature flags. And in line two
            
            
            
              we have the smart feature flag. So again it has a default value of
            
            
            
              false in line three, but then it has the smart
            
            
            
              rule engine. It has the rules defined in line four. It has one
            
            
            
              rule, it says customer tier equals premium.
            
            
            
              And if the customer tier is premium,
            
            
            
              then line six says the feature flag is going to be
            
            
            
              true. Right? And in order for the rule to match,
            
            
            
              all the conditions need to apply, need to match,
            
            
            
              need to evaluate to true. So here we have a set of conditions,
            
            
            
              just one. And it means that the
            
            
            
              tier, which is the key in the input needs to have
            
            
            
              a value of premium. And the key tier
            
            
            
              needs to be equal to the value premium, right?
            
            
            
              Because the action is equal. So tier and
            
            
            
              the value need to be equal.
            
            
            
              So let's see it here in this example.
            
            
            
              So the same code applies here. It's the same thing
            
            
            
              as we had before, but we have the context in line 13
            
            
            
              where we are building the input context. So we have the key
            
            
            
              tier and then we have the value. It can be standard or premium.
            
            
            
              So if you recall, if tier, the key is
            
            
            
              going to be standard, then the rule does
              not match, and the feature flag is going to be the default, false. If tier has a value
            
            
            
              of premium,
            
            
            
              then the rule is going to match, and has premium features
            
            
            
              in line 17 is going to be true. So in line
            
            
            
              17 we just call the same evaluate function, but we provide
            
            
            
              the optional context. Okay, so then
            
            
            
              if it's premium tier, line 19 is going to trigger and
            
            
            
              we're going to enable the premium features. Otherwise for
            
            
            
              another user we're going to have different behavior.
            
            
            
              So that way you can do A/B testing between different users
            
            
            
              with the same configuration.
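To make the rule matching described above concrete, here is a small stand-alone rule engine in Python. It is a sketch of the idea, not the Powertools implementation: the `evaluate` helper, the two entries in `ACTIONS`, and the configuration layout (`default`, `rules`, `when_match`, `conditions` with `action`, `key`, `value`) are assumptions modeled on the description of the slide.

```python
from typing import Any, Callable, Dict

# Two illustrative actions; the real utility is described as offering over ten.
ACTIONS: Dict[str, Callable[[Any, Any], bool]] = {
    "EQUALS": lambda actual, expected: actual == expected,
    "STARTSWITH": lambda actual, expected: str(actual).startswith(expected),
}

CONFIG: Dict[str, Any] = {
    "premium_features": {
        "default": False,
        "rules": {
            "customer tier equals premium": {
                "when_match": True,
                "conditions": [
                    {"action": "EQUALS", "key": "tier", "value": "premium"}
                ],
            }
        },
    }
}


def evaluate(config: Dict[str, Any], name: str,
             context: Dict[str, Any], default: Any) -> Any:
    """Return the value of the first rule whose conditions all match the context."""
    flag = config.get(name)
    if flag is None:
        return default
    for rule in flag.get("rules", {}).values():
        conditions = rule.get("conditions", [])
        # A rule matches only when every one of its conditions is true.
        if conditions and all(
            c["key"] in context
            and ACTIONS[c["action"]](context[c["key"]], c["value"])
            for c in conditions
        ):
            return rule["when_match"]
    return flag.get("default", default)


premium_user = evaluate(CONFIG, "premium_features", {"tier": "premium"}, False)
standard_user = evaluate(CONFIG, "premium_features", {"tier": "standard"}, False)
```

The same configuration yields a different value per caller, which is what makes A/B testing possible without redeploying anything.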
            
            
            
              And there are over ten actions that you can use.
            
            
            
              You can see more on the website. You have starts with,
              key in value, et cetera, over ten actions.
            
            
            
              And also you have non-boolean feature flags.
            
            
            
              You can use any
            
            
            
              valid JSON value: it can be a list of strings, integers, et cetera. In this
            
            
            
              case I'm using a list of strings where I
            
            
            
              want the premium tier to have special actions that I
            
            
            
              do on their account, like remove limits and remove ads.
            
            
            
              But the default for the non-premium users is going to
            
            
            
              be that no special action is applied.
            
            
            
              So you can use this for all sorts of sample rules. You can enable
            
            
            
              it for a specific customer, maybe an admin of a customer,
            
            
            
              apply discount for specific types of products, offer free shipping
            
            
            
              if the cost is higher than some number. You can
            
            
            
              have so many possibilities here, and it's very flexible.
            
            
            
              So like I said, we're going to use it for A/B testing,
            
            
            
              and you can have different user experiences for different users with
            
            
            
              just one single configuration which does not change.
            
            
            
              So if I recall, I've mentioned that there is an
            
            
            
              in-memory cache. Why is that important?
            
            
            
              Because each call to AWS AppConfig to fetch configuration
            
            
            
              costs money and we want to save some money.
            
            
            
              So the in-memory cache means that if the cache has
            
            
            
              not expired, we do not fetch a new configuration and
            
            
            
              we save money. And you can define what number of seconds you want to
            
            
            
              have. And it's important to remember that it's
            
            
            
              a balance between cost saving
            
            
            
              and having the service change its behavior
            
            
            
              as soon as possible. Because if the cache hasn't expired,
            
            
            
              the service will not fetch a new configuration.
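The trade-off described above can be illustrated with a small time-to-live cache. This is a sketch under stated assumptions, not the library's code: `CachedConfig`, `max_age_seconds`, and the injectable `clock` are hypothetical names, and `fetch` stands in for whatever call retrieves the configuration from AppConfig.

```python
import time
from typing import Any, Callable, Optional


class CachedConfig:
    """Keep the last fetched configuration until it is older than max_age_seconds."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch              # the paid remote call
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so tests can control time
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        # Fetch only on first use or once the cached copy has expired.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._max_age
        return self._value
```

A larger `max_age_seconds` means fewer paid fetches, but a flag change can take up to that many seconds to reach the service.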
            
            
            
              And by the way, I'm adding very soon,
            
            
            
              hopefully this month, time-based rules where you can enable
            
            
            
              rules and feature flags at specific times,
            
            
            
              enable features for a specific duration,
            
            
            
              or enable them during specific days.
            
            
            
              And now lastly, we're going to discuss the feature flags
            
            
            
              best practices that we're going
            
            
            
              to use across all the stages of our pipeline of our
            
            
            
              development, from the build to
            
            
            
              the testing to deployment and production.
            
            
            
              So in my eyes, the development team needs to own the process from
            
            
            
              start to end. They need to write the
            
            
            
              configuration JSON files, they need to write the code that
            
            
            
              evaluates it and behaves accordingly.
            
            
            
              And they need to start where the features are enabled
            
            
            
              in test and dev accounts, but disabled in production.
            
            
            
              And when it comes to tests, well, we're going to use mocks,
            
            
            
              we're going to mock the configuration in our tests so we have better
            
            
            
              control over the outcome. And obviously we're
            
            
            
              going to mock the feature when it's enabled and test
            
            
            
              all the side effects and that everything is working just fine. But it's very important to
            
            
            
              also mock the feature as disabled, because sometimes
            
            
            
              you don't have a simple if statement, if feature is
            
            
            
              enabled, do something. Sometimes it's more complicated, and it's really important
            
            
            
              that that part of logic is tested.
            
            
            
              We want to assert that the logic,
            
            
            
              the function that handles the feature flag when it is enabled,
            
            
            
              does not run when the feature is actually disabled.
            
            
            
              We actually had a bug where our feature was marked as
            
            
            
              false, but due to a bug in the if statement (it was
            
            
            
              a complicated one), the feature actually ran and we had
            
            
            
              some problem in production. So it's very important to test that.
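A test for the disabled path might look like the sketch below. Everything here is hypothetical: `handle_order`, `apply_discount`, the `discount` flag, and the `flags` object are illustrative names, and the mocked `evaluate(name=..., context=..., default=...)` call only mirrors the general shape of a feature flag lookup.

```python
from unittest.mock import MagicMock


def handle_order(order, flags, apply_discount):
    """Hypothetical handler: run the discount logic only when the flag is on."""
    enabled = flags.evaluate(
        name="discount", context={"tier": order["tier"]}, default=False
    )
    # A compound condition like this is where a disabled flag can slip through.
    if enabled and order["total"] > 0:
        return apply_discount(order)
    return order["total"]


def run_with_flag(flag_value, order):
    """Run the handler with the flag mocked to flag_value; return result and mock."""
    flags = MagicMock()
    flags.evaluate.return_value = flag_value
    apply_discount = MagicMock(return_value=90)
    result = handle_order(order, flags, apply_discount)
    return result, apply_discount


order = {"tier": "premium", "total": 100}

# Enabled: the feature logic runs exactly once.
result, discount = run_with_flag(True, order)
discount.assert_called_once_with(order)

# Disabled: the feature logic must not run at all.
result, discount = run_with_flag(False, order)
discount.assert_not_called()
```

The second assertion is the one that would have caught the bug described above, because it fails whenever the feature function runs while the flag is false.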
            
            
            
              Then once you decide that the feature is stable in the non production
            
            
            
              environments, you can go ahead and run a deployment strategy
            
            
            
              to production and use canary deployments.
            
            
            
              AppConfig has you covered for
            
            
            
              that. And you should define CloudWatch alarms on
            
            
            
              errors for your service so you can automatically
            
            
            
              revert your configuration if there's
            
            
            
              an error.
            
            
            
              Now, what happens if for some reason at some
            
            
            
              later time you do have some errors in your feature,
            
            
            
              things that you didn't find in the tests? Well, you should disable the
            
            
            
              feature as soon as possible and run the configuration CI/CD
            
            
            
              pipeline again. You should update the tests and
            
            
            
              add the missing use cases and just do the
            
            
            
              whole thing again. Just deploy and release again. And I suggest that
            
            
            
              you also do a retro meeting where you identify why,
            
            
            
              how come you missed those use cases in the test, how come you had this
            
            
            
              bug in production. And eventually,
            
            
            
              you need to retire the feature flags. And why should you do that? Well,
            
            
            
              because feature flags add code complexity: you have more tests
            
            
            
              around it, you have more mocks, you have more if statements and branching
            
            
            
              in your code. It's more complicated. So at
            
            
            
              some point we want to retire the features and remove the code. How do we
            
            
            
              do that? We meet once a month
            
            
            
              to discuss and select feature flags that are candidates
            
            
            
              for removal. And then all we need to do is just run
            
            
            
              the configuration CI/CD pipeline again and monitor
            
            
            
              that everything is okay. How do we
            
            
            
              select candidates for removal? Well, if the
            
            
            
              feature has been enabled for all the customers for several weeks and it's been
            
            
            
              stable, there are no bugs around it, the feedback
            
            
            
              of the customers has been very positive and you don't have any open
            
            
            
              issues. And if you don't expect any changes in
            
            
            
              the code around that area, then you should totally just retire the
            
            
            
              feature and make your code simpler.
            
            
            
              So let's sum it all up. We created feature flags,
            
            
            
              smart and regular. We deployed AppConfig. We used Lambda
            
            
            
              Powertools to fetch and evaluate the feature flags configuration.
            
            
            
              We had canary deployments. We learned how to do A/B testing,
            
            
            
              and we learned the feature flags
            
            
            
              best practices in the development stages
            
            
            
              all the way to production. So thank you
            
            
            
              very much. That's been my talk. And you can follow me
            
            
            
              on Twitter and LinkedIn, and check out my website,
            
            
            
              ranthebuilder.cloud, where I talk about all things serverless.
            
            
            
              Thank you very much and have a good day.