Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
              Hi everybody, my name is Ran Isenberg, and today we're going to talk about AWS
            
            
            
              CDK best practices. So AWS CDK allows
            
            
            
              you to write infrastructure as code to describe your resources in
            
            
            
              the cloud with code, not JSON or YAML
            
            
            
              files, but actual code. And to me, as a developer,
            
            
            
              it feels right at home to write code.
            
            
            
              Now, I had the pleasure of talking at the first AWS
            
            
            
              CDK day about three years ago, and I came there
            
            
            
              from a perspective of a newbie with CDK.
            
            
            
              And I wanted to share the experience of working with CDK and to share
            
            
            
              how it helped to accelerate the development at CyberArk with CDK.
            
            
            
              So in this use case, I described an audit service
            
            
            
              PoC. You can think of an audit service as a
            
            
            
              type of an ETL log service, where you take a log, you alter
            
            
            
              it, extend the information and save it into a bucket.
            
            
            
              And here you can see we have lots of serverless services, we have IoT,
            
            
            
              we have Kinesis, we have Lambdas, we have API Gateways,
            
            
            
              we have Elasticsearch, which is now OpenSearch, we have buckets.
            
            
            
              And we didn't have a lot of knowledge and experience with these
            
            
            
              services back then. And since there's not a
            
            
            
              lot of business domain logic here, it's just a matter of connecting
            
            
            
              all the Lego pieces here and configuring all the event driven architecture.
            
            
            
              So it took us with CDK, it was me and
            
            
            
              two other guys, just three days, three days to get all this
            
            
            
              working, which to me was mind blowing. It really shows how CDK
            
            
            
              can really accelerate your development. It's a very powerful
            
            
            
              tool. And as Uncle Ben says,
            
            
            
              with great power comes great responsibility. Since CDK
            
            
            
              is such a powerful and flexible utility, it's really easy to make mistakes
            
            
            
              when you're writing code. You can basically do whatever you want.
            
            
            
              I'm using Python, the Python
            
            
            
              variant of CDK, and you can do basically whatever you
            
            
            
              want. So it's really easy to make mistakes, and these mistakes can be quite
            
            
            
              costly. So this brings up the agenda
            
            
            
              of this talk. We're going to talk about best practices so you don't
            
            
            
              make mistakes. So we're going to cover CDK app guidelines,
            
            
            
              constructs guidelines, CI/CD guidelines, security and resilience,
            
            
            
              general development tips. And we're going to summarize it all. So let's
            
            
            
              start. So, a little bit about myself. My name is Ran Isenberg.
            
            
            
              I'm a principal software architect at CyberArk,
            
            
            
              in the platform engineering group. I'm an AWS Community
            
            
            
              Builder. I'm the owner of the website ranthebuilder.cloud,
            
            
            
              where I share my serverless knowledge, and you can see
            
            
            
              the QR code. You're going to see these QR codes throughout the
            
            
            
              presentation. This is the link to my website where again I share all my serverless
            
            
            
              knowledge. Okay, so let's talk about
            
            
            
              CDK app guidelines. So usually when
            
            
            
              you go ahead and start your first service,
            
            
            
              you start with a CDK application.
            
            
            
              Usually it should be around one business domain and it
            
            
            
              should have one stack, not several. I'm going to explain why.
            
            
            
              And this stack is going to have at least one construct. Each construct
            
            
            
              is going to have multiple resources with configurations
            
            
            
              between them, all their relationships, and sometimes even
            
            
            
              between constructs you can have relationships between the items.
            
            
            
              Maybe there's an event driven mechanism here. And I
            
            
            
              usually view constructs as a micro or nanoservice within
            
            
            
              the application itself. So I
            
            
            
              think it's usually best practice to have one stack,
            
            
            
              one business domain maintained by one team so you don't have
            
            
            
              conflicts. One CI/CD pipeline. And the reason I
            
            
            
              said one stack is because assuming
            
            
            
              you have several stacks in this application, so you deploy
            
            
            
              the first stack and then the second stack. What happens if you
            
            
            
              have a serious issue that you want to fix in the second stack,
            
            
            
              but for some reason the first stack fails deployment.
            
            
            
              So now you're stuck. You have increased your blast radius. So you have
            
            
            
              a bug in your first stack and you must solve it before you can
            
            
            
              solve the really critical issue that you have in your second stack. So I think
            
            
            
              it's really for the better to have just one stack in your application
            
            
            
              and to have a smaller blast radius.
            
            
            
              Okay. So however, in some
            
            
            
              cases you need to split the application into
            
            
            
              another application, another repository, another stack,
            
            
            
              and I can think of two use cases. One use
            
            
            
              case is when a different team will maintain the new application or
            
            
            
              it's a different business domain. It started as a small part of
            
            
            
              your service and then over time you realize hey, it's going to
            
            
            
              be its own domain, so you're going to move it to its
            
            
            
              own repository and maintain it there. And you should keep
            
            
            
              in mind that you shouldn't oversplit, right? Balance is the key here
            
            
            
              because there is a complexity added when you
            
            
            
              have multiple repositories, because sometimes services
            
            
            
              depend on each other. Sometimes you need to develop
            
            
            
              features that are cross repository, cross application.
            
            
            
              Then you need to develop it with feature flags and coordinate
            
            
            
              the enablement of these feature flags. Sometimes there is a deployment
            
            
            
              time dependency, maybe there's an HTTP API
            
            
            
              gateway that one
            
            
            
              application builds and the other one needs to know it.
            
            
            
              So then you can use things like SSM and Cloud Map, where one
            
            
            
              stack publishes to the other, and use it at deployment time.
            
            
            
              So it gets more complicated
            
            
            
              than just having it all in the same repository.
            
            
            
              Let's talk about project structure. So I believe that you should
            
            
            
              have three folders. You should have CDK, service, and
            
            
            
              tests. CDK will obviously contain the application. You have
            
            
            
              the application on the root folder, CDK will contain the stack
            
            
            
              and all the constructs. Then you have the service,
            
            
            
              which is the business domain logic, all your lambda function code, things like
            
            
            
              that. And then you have tests, and tests we're going to cover. You're going to
            
            
            
              have unit, integration, end-to-end, security, and CDK
            
            
            
              infrastructure examples that we're going to cover later on. And as
            
            
            
              you can see, I'm a true believer in the DevOps
            
            
            
              mentality: that infrastructure as code and the business
            
            
            
              domain code should reside together because the developer should
            
            
            
              have the ownership and the understanding of everything together,
            
            
            
              from the development stages to the production and the monitoring.
            
            
            
              Okay, so let's say that you've created your amazing
            
            
            
              template, your amazing application,
            
            
            
              you have all the best practices, you have an amazing CI/CD
            
            
            
              pipeline, your project structure is amazing and it
            
            
            
              works. Now you want to create the second
            
            
            
              service. So what do you do? Do you just
            
            
            
              copy all the code from there, just duplicate the first repository
            
            
            
              and manually change it to create another repository? No,
            
            
            
              it's a lot of work. So what we saw, like I said,
            
            
            
              I'm part of the platform engineering group, and we saw that by
            
            
            
              creating a CDK template project, a self service
            
            
            
              project, teams can just start and create their
            
            
            
              service really fast and they can get started
            
            
            
              with something that works, that has all the best practices, all the project
            
            
            
              structure, the CI/CD pipeline, everything just as it
            
            
            
              should be. And they can just focus on writing the business domain. They usually go
            
            
            
              and we provide them internal training where we tell them
            
            
            
              about the internal SDKs that we use, all the best practices for writing Lambda
            
            
            
              functions, how the CI/CD pipeline works, things like that.
            
            
            
              And we saw that it really helps to reduce the cognitive load on the
            
            
            
              developers and accelerate the development. So once I
            
            
            
              saw that it really works for us at CyberArk, I decided, why not make an open
            
            
            
              source project out of it? So I created the AWS Lambda Handler Cookbook
            
            
            
              project, which is found in the QR code on the
            
            
            
              top right. And it's basically a serverless service that
            
            
            
              allows you to create an API Gateway with a Lambda that writes to a DynamoDB
            
            
            
              table and uses feature flags based on AppConfig configurations.
            
            
            
              And it has the CI/CD pipeline and observability and all
            
            
            
              the best practices for writing lambda functions and the testing and
            
            
            
              everything. So you should check it out.
            
            
            
              So now let's talk about construct guidelines.
            
            
            
              So the same way that you don't write all your code in one
            
            
            
              function, right? You don't just have one file with 10,000
            
            
            
              lines. You shouldn't do that in your stack either. You shouldn't define
            
            
            
              all of the resources in your stack. You should use constructs.
            
            
            
              And constructs are really easy to share between
            
            
            
              teams. You can
            
            
            
              have a best practice construct that
            
            
            
              you can share between teams and save time. So use
            
            
            
              constructs. Usually I see them as a microservice or
            
            
            
              a nanoservice. And one exception to
            
            
            
              the resources on the stack is the Lambda layer. If you're using a Lambda layer that
            
            
            
              is used in multiple constructs, I think that's okay to
            
            
            
              define on a stack level. But usually you should just have
            
            
            
              different constructs that define the micro or nanoservices
            
            
            
              of your application.
            
            
            
              So why did I mention that constructs are really great to
            
            
            
              share? Usually platform engineers will create
            
            
            
              and maintain the shareable construct. You can think about
            
            
            
              it as organization approved,
            
            
            
              security approved constructs or patterns that you can use
            
            
            
              across the organization without reinventing the wheel. You can create a library.
            
            
            
              In my case it's Python, because we use Python. It's a Python library
            
            
            
              of CDK constructs and you can import and use it in
            
            
            
              your CDK code and just use it as a black box, so to speak,
            
            
            
              so it saves time for developers. However,
            
            
            
              since it's a library, it has a version and you might need to upgrade.
            
            
            
              And in upgrades you need to be careful: whoever
            
            
            
              maintains it needs to be careful not to change the logical ID of
            
            
            
              stateful resources, so you don't get your database deleted.
            
            
            
              And we're going to talk about it later on. But when
            
            
            
              you're doing it, you should
            
            
            
              be careful when upgrading and writing, when you're changing logical
            
            
            
              IDs. And we're going to talk about it later on and I'm going to explain
            
            
            
              it in more detail. Okay,
            
            
            
              some examples of shareable constructs: you
            
            
            
              have maybe WAF rules that you want to use for your API
            
            
            
              Gateway or CloudFront distributions. Maybe you have an SNS
            
            
            
              to SQS subscription pattern with
            
            
            
              encryption at rest which is not enabled by default. You might want
            
            
            
              to have an AWS AppConfig dynamic configuration construct.
            
            
            
              Maybe you want to have a Datadog log shipper or PII
            
            
            
              sanitizers. And you can find more examples at the following links:
            
            
            
              constructs.dev, Serverless Land, CDK Patterns, and the
            
            
            
              AWS Solutions Constructs.
            
            
            
              So now that we understand that we need to write
            
            
            
              constructs, how do we take an application and split
            
            
            
              it into constructs? So I think that
            
            
            
              it should be business domain driven. Let's take a look
            
            
            
              at the following service.
            
            
            
              So we have the CRUD API. We have an API Gateway that
            
            
            
              invokes two lambda functions that write and read
            
            
            
              to an Aurora serverless database. It has
            
            
            
              its own VPC networks and all the fun stuff.
            
            
            
              There's an Aurora stream that triggers a lambda function that sends a message
            
            
            
              via SNS. There's an incoming message via the SNS
            
            
            
              to an SQS queue that triggers a Lambda function and again reads
            
            
            
              from the Aurora database. So how do you go about
            
            
            
              and split it into constructs? So like I said, I think it
            
            
            
              should be business domain driven. We have the CRUD
            
            
            
              part and we have the database part. I think the database part,
            
            
            
              even though it's defined in the CRUD,
            
            
            
              it is an internal contract because the
            
            
            
              Lambdas there are the only ones that write into the database.
            
            
            
              So I think they own the database, so to speak.
            
            
            
              You should still create the Aurora database as its own construct
            
            
            
              because it's a very complicated construct and it's really easy to share it
            
            
            
              across the organization. So once you do it, you create it once and then
            
            
            
              you can share an Aurora database across the whole organization and just have
            
            
            
              a best-practice, secured Aurora Serverless
            
            
            
              database. And on the other hand, you have the messaging,
            
            
            
              the asynchronous part, you have the SNS and the queue
            
            
            
              and the two lambda functions. And again there's connections between the
            
            
            
              two constructs. As you can see, the lambda functions they need
            
            
            
              to access the Aurora database. So it's
            
            
            
              important to understand that there is no right and wrong in this case
            
            
            
              because it's all defined under the same stack.
            
            
            
              There isn't really right or wrong, they're going to get deployed the
            
            
            
              same way. However, I think it makes more sense to split like this
            
            
            
              because it makes it easier to find the code, to find the resources
            
            
            
              in the project itself. It makes it easier to maintain and improves the
            
            
            
              readability of the code. But you can choose whatever
            
            
            
              type of construct split that you want.
            
            
            
              But I think this is a good example of how to do it that makes
            
            
            
              sense. Okay, let's talk about
            
            
            
              CI/CD guidelines.
            
            
            
              Okay, so usually you'd
            
            
            
              like to model your CI/CD stages in code.
            
            
            
              Different environments have different configuration and that's okay. And CDK
            
            
            
              needs to know how to apply these configuration changes to
            
            
            
              your environment. And usually in my case we use Jenkins
            
            
            
              that sets environment variables and
            
            
            
              injects them into the CDK application.
            
            
            
              We call it a profile; it can be dev, test, production, whatever,
            
            
            
              and then CDK code knows how to address this parameter and
            
            
            
              make the different configuration. So you can see also
            
            
            
              in this example that I'm using different accounts. I'm using
            
            
            
              a dev account, a test account, and a production account, and they're different accounts.
            
            
            
              And the reason for that is that you want to
            
            
            
              have a small blast radius in case of a breach. If somebody hacks
            
            
            
              into your dev account, you don't want to have your production account jeopardized.
            
            
            
              Another reason is the AWS resource quota limits.
            
            
            
              You don't want to reach it. So by using different accounts you're
            
            
            
              probably not going to get there. So let's see an example
            
            
            
              of how it works in CDK. So in this case I want to define a
            
            
            
              table. I have the profile environment variable
            
            
            
              that I'm going to get, the one that
            
            
            
              Jenkins sets. So in this case I'm defining a DynamoDB
            
            
            
              table. And you can see the point-in-time recovery
            
            
            
              argument. So if I'm in the dev environment,
            
            
            
              I don't want to enable it, I don't care about this database, it's going to
            
            
            
              be ephemeral. Users use it just for branch development
            
            
            
              and feature development and I don't want to back up the database.
            
            
            
              However, if it's production, I do want to have backups.
            
            
            
              Right. I want to be able to return to
            
            
            
              a point in time, a recovery point, in case of a crisis or a disaster.
            
            
            
              The same thing goes for the removal policy. If I'm
            
            
            
              in a development environment, I want to remove the database
            
            
            
              when I finish with the stack. But in production, if for
            
            
            
              some reason there is a mistake and the stack is removed, I want
            
            
            
              to keep my data. I want to keep my database. Okay, so this
            
            
            
              is an example of how you can use different configurations
            
            
            
              in your CDK code. Let's talk about security guidelines.
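As a rough sketch of the profile-driven table configuration just described: the snippet below is plain Python, not the speaker's actual code. The environment variable name `PROFILE`, the helper `table_settings`, and the choice to treat every non-production profile like dev are all assumptions made for illustration. In real CDK Python code these two values would typically be passed on to the DynamoDB table construct (for example as its point-in-time recovery and removal policy arguments, depending on the CDK version in use).

```python
import os


def table_settings(profile: str) -> dict:
    """Return per-environment table settings (hypothetical helper)."""
    # Assumption: only the "production" profile is treated as production;
    # dev, test, and anything else get the throwaway settings.
    is_production = profile == "production"
    return {
        # Backups only where the data matters: enabled in production,
        # disabled for ephemeral dev environments.
        "point_in_time_recovery": is_production,
        # Keep the data if a production stack is removed by mistake;
        # delete the table together with a dev stack.
        "removal_policy": "RETAIN" if is_production else "DESTROY",
    }


# PROFILE is an assumed name for the value the CI/CD job injects.
profile = os.environ.get("PROFILE", "dev")
settings = table_settings(profile)
```

Keeping the decision in one small function means the dev and production differences sit in a single place and can be unit tested without deploying anything.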
            
            
            
              So, secrets: never ever write
            
            
            
              secrets hard coded in plain text in CDK or config files.
            
            
            
              You should store them in GitHub, Jenkins, or some sort of secrets store,
            
            
            
              whatever you're using for internal secrets. And then you
            
            
            
              can inject it into CDK as an environment variable or parameter
            
            
            
              into the constructor of the stack. And then CDK
            
            
            
              will use this parameter to deploy it into
            
            
            
              Secrets Manager or SSM Parameter Store as an encrypted string,
            
            
            
              and then the Lambda will consume it from SSM or
            
            
            
              Secrets Manager. And it's going to have an environment variable
            
            
            
              that tells it the secret name. And of course the correct permissions
            
            
            
              to get the secret. This is how you should do it. And
            
            
            
              in the Lambda functions, don't use environment variables for storing secrets.
            
            
            
              Don't do that. So this is the proper way to do that.
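A minimal sketch of the Lambda side of this flow, under stated assumptions: the environment variable holds only the secret's name, and the value is fetched at runtime. The variable name `SECRET_NAME` and the helper `load_secret` are hypothetical. The client is passed in so the function can be exercised without AWS; inside a real Lambda it would be a boto3 Secrets Manager client, whose `get_secret_value` call returns the value under `SecretString` for string secrets.

```python
import os


def load_secret(secrets_client, env=os.environ) -> str:
    """Fetch a secret value by the name stored in an environment variable."""
    # Only the *name* of the secret lives in the environment (assumed
    # variable name: SECRET_NAME). The value itself is never stored there.
    secret_name = env["SECRET_NAME"]
    # Same call shape as the boto3 Secrets Manager client.
    response = secrets_client.get_secret_value(SecretId=secret_name)
    return response["SecretString"]
```

In a handler this might be called as `load_secret(boto3.client("secretsmanager"))`, with the function's IAM role granted permission to read only that one secret.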
            
            
            
              Okay, let's talk about resource security configurations.
            
            
            
              So as you can see, AWS is really thinking about security.
            
            
            
              Since January, new S3 objects are encrypted
            
            
            
              by default, and DynamoDB has supported encryption at rest
            
            
            
              for quite a while now, but it's not all the
            
            
            
              same for all resources. What about SNS encryption at rest? You can see
            
            
            
              that it's disabled by default and you need to know it and
            
            
            
              enable it yourself in the CDK code.
            
            
            
              So security defaults differ by the service itself.
            
            
            
              AWS gets better at it, but it's your responsibility in the end.
            
            
            
              You have the shared responsibility model where
            
            
            
              AWS handles the security of the cloud,
            
            
            
              but you need to make sure that the security in the cloud
            
            
            
              is properly defined because these are your resources and you
            
            
            
              own them and you need to make them secure. It's your responsibility. Nobody else
            
            
            
              is going to do that for you. Okay?
            
            
            
              So you should make sure that all your
            
            
            
              configurations really follow security
            
            
            
              best practices. You should have security review, you should
            
            
            
              schedule a penetration test from time to time, and you should
            
            
            
              also use CDK security tests. And that's what I'm going to show you now.
            
            
            
              We're going to use a tool called cdk-nag.
            
            
            
              And these are tests that you run prior to deployment.
            
            
            
              So you're not going to deploy a stack that has
            
            
            
              a security misconfiguration, so you don't expose yourself to
            
            
            
              a security hazard. You're going to run it before
            
            
            
              the deployment to your account. And these tests,
            
            
            
              what they do is actually synthesize the CloudFormation
            
            
            
              template of your stack and then they run a bunch of
            
            
            
              assertions and security checks against that stack.
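To make the idea concrete, here is a minimal sketch in plain Python of what such a pre-deployment check does conceptually. It does not use cdk-nag itself. The template dictionary, the resource names, and the function `find_unencrypted_topics` are assumptions made for illustration, and the single rule (an SNS topic must set `KmsMasterKeyId`) stands in for the much larger rule packs a real tool applies.

```python
from typing import Any

# Hypothetical synthesized template, shaped like CloudFormation JSON.
TEMPLATE: dict[str, Any] = {
    "Resources": {
        "AuditTopic": {"Type": "AWS::SNS::Topic", "Properties": {}},
        "SecureTopic": {
            "Type": "AWS::SNS::Topic",
            "Properties": {"KmsMasterKeyId": "alias/aws/sns"},
        },
        "AuditBucket": {"Type": "AWS::S3::Bucket", "Properties": {}},
    }
}


def find_unencrypted_topics(template: dict[str, Any]) -> list[str]:
    """Return the logical IDs of SNS topics that do not set a KMS key."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::SNS::Topic":
            continue  # this rule only looks at SNS topics
        if not resource.get("Properties", {}).get("KmsMasterKeyId"):
            findings.append(logical_id)
    return sorted(findings)


if __name__ == "__main__":
    # A test would fail the build when this list is not empty.
    print(find_unencrypted_topics(TEMPLATE))
```

Run as a unit test before deployment, a non-empty result would stop the pipeline, which is the behavior described above.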
            
            
            
              So in this case we have two tests. The first test is going to check
            
            
            
              for AWS solutions architects' best practices for security measures.
            
            
            
              And the second one is the HIPAA standard for security
            
            
            
              checks. And if you did something wrong, like an overly privileged role
            
            
            
              or a bucket that is open to the world, a public bucket,
            
            
            
              it's going to tell you hey, it's going to fail and you're not going to
            
            
            
              push the code and deploy something that is risky. So that's
            
            
            
              very important. Okay.
            
            
            
              Another thing that I think is very important is to write your own IAM
            
            
            
              policies. In this example, I want to define a DynamoDB
            
            
            
              table and I want to provide a Lambda role with the permissions to
            
            
            
              get an item and put an item into the table.
            
            
            
              So in many cases you can see that people tell
            
            
            
              you, hey, you should apply the table's grant_read_write_data
            
            
            
              to your role. It's really easy, it's very readable and it works.
            
            
            
              But what happened is that I wanted to have
            
            
            
              two permissions added to my Lambda function, but by using
            
            
            
              this function I actually provided something like, I think there's like
            
            
            
              eight or ten permissions here that I don't need. So my
            
            
            
              role is not least privileged. Okay, so if somebody gains
            
            
            
              access to this role, they can do a lot more damage to
            
            
            
              my DynamoDB table than
            
            
            
              we intended to allow. So what you should
            
            
            
              do is use the CDK
            
            
            
              to write your own inline policies. And this way you
            
            
            
              understand the IAM policies better. And you can see you
            
            
            
              write a policy document and a policy statement. You say, I want only put
            
            
            
              item and get item on a specific resource, the
            
            
            
              ARN of my specific table, right? You're not going
            
            
            
              to use an asterisk here and I'm going to allow it. So this way we
            
            
            
              have just the permission that we wanted and I think
            
            
            
              it's going to make you a better developer since you understand IAM policies better.
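As a rough illustration of the least-privilege statement described above, the sketch below builds the policy as a plain Python dictionary in IAM JSON form. In CDK the same statement would be expressed with `iam.PolicyDocument` and `iam.PolicyStatement`. The table ARN and the helper names `build_table_policy` and `has_wildcard` are assumptions made for this example.

```python
import json

# Hypothetical table ARN, used only for illustration.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/orders"


def build_table_policy(table_arn: str) -> dict:
    """Build an IAM policy document that allows only GetItem and PutItem."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
                "Resource": [table_arn],  # one specific table, no asterisk
            }
        ],
    }


def has_wildcard(policy: dict) -> bool:
    """Return True if any action or resource in the policy contains '*'."""
    for statement in policy["Statement"]:
        values = statement["Action"] + statement["Resource"]
        if any("*" in value for value in values):
            return True
    return False


if __name__ == "__main__":
    policy = build_table_policy(TABLE_ARN)
    print(json.dumps(policy, indent=2))
    print("contains wildcard:", has_wildcard(policy))
```

A check like `has_wildcard` can also run as a unit test, so a policy that quietly grows an asterisk is caught before deployment.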
            
            
            
              Let's talk about resilience.
            
            
            
              Okay, so sometimes in
            
            
            
              CDK people can go ahead and refactor the code
            
            
            
              and move resources from one construct to another, maybe rename
            
            
            
              the construct. And sometimes they don't realize that by doing that they change
            
            
            
              the logical ID of the resource. That means that CDK
            
            
            
              and CloudFormation are going to delete the resource and create it anew with
            
            
            
              the new logical ID. And that could be a big issue,
            
            
            
              a serious issue if we're talking about stateful resources such as
            
            
            
              tables with data, actual production data, or maybe a
            
            
            
              cross-account trust role whose ARN you changed,
            
            
            
              and now you don't have access to production in the
            
            
            
              other account. So you can
            
            
            
              have serious issues by doing something that seems very simple
            
            
            
              and naive. And another issue that I've encountered, only once,
            
            
            
              to be honest, is that if you have your CDK code
            
            
            
              raise an exception that somehow doesn't fail the entire process
            
            
            
              of deployment, you can have entire resources deleted from your stack.
            
            
            
              So you can basically deploy and remove an API Gateway or a
            
            
            
              bucket and things like that, which is not very great. So one
            
            
            
              way to avoid that is
            
            
            
              to write CDK infrastructure unit tests. So let's see how you can
            
            
            
              do that. So in this use case, again,
            
            
            
              this runs prior to deployment. So you know you're going
            
            
            
              to keep your code safe. And here we're going to
            
            
            
              create and synthesize, again, the CloudFormation
            
            
            
              template. And we're going to make some checks. We're going to make sure
            
            
            
              that our critical resource, the API Gateway, the REST API,
            
            
            
              is going to be there. The same thing for the DynamoDB table,
            
            
            
              and we can also add checks to make sure that the logical ID is
            
            
            
              there and it hasn't changed. So if it changes, we know
            
            
            
              that we're going to basically create a new table with zero data
            
            
            
              there. So it's not great. So this can be a nice safeguard to
            
            
            
              prevent that. Another cool utility
            
            
            
              that you can use is CDK diff. It's an open source tool that you
            
            
            
              can add to your pipeline and what
            
            
            
              it does basically is it visualizes
            
            
            
              new resources and changes to your stack. You can see that
            
            
            
              a new resource is shown in green and
            
            
            
              a deleted resource is shown in red. So it makes it easier to
            
            
            
              understand if there is a critical change or maybe somebody is doing something that
            
            
            
              they shouldn't be doing and changing critical resources.
            
            
            
              It just gives you better visibility.
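The two safeguards above, asserting that critical resources keep their logical IDs and diffing the stack before deployment, can be sketched in plain Python over template dictionaries. In a real project the template would come from synthesizing the stack, for example with `Template.from_stack` in `aws_cdk.assertions`. Here the templates, the simplified logical IDs (real ones carry a hash suffix), and the function names are all assumptions made for illustration.

```python
from typing import Any

Template = dict[str, Any]

# Hypothetical template of the stack as it is deployed today.
DEPLOYED: Template = {
    "Resources": {
        "OrdersApi": {"Type": "AWS::ApiGateway::RestApi"},
        "OrdersTable": {"Type": "AWS::DynamoDB::Table"},
    }
}

# Hypothetical template after a refactor moved the table to another construct.
REFACTORED: Template = {
    "Resources": {
        "OrdersApi": {"Type": "AWS::ApiGateway::RestApi"},
        "StorageOrdersTable": {"Type": "AWS::DynamoDB::Table"},
    }
}

# Logical IDs that must never change, mapped to their resource types.
CRITICAL = {
    "OrdersApi": "AWS::ApiGateway::RestApi",
    "OrdersTable": "AWS::DynamoDB::Table",
}


def assert_critical_resources(template: Template, expected: dict[str, str]) -> None:
    """Raise AssertionError if a critical logical ID is missing or retyped."""
    resources = template.get("Resources", {})
    for logical_id, resource_type in expected.items():
        actual = resources.get(logical_id, {}).get("Type")
        if actual != resource_type:
            raise AssertionError(
                f"{logical_id}: expected {resource_type}, found {actual}"
            )


def diff_logical_ids(old: Template, new: Template) -> tuple[list[str], list[str]]:
    """Return (added, removed) logical IDs between two templates."""
    old_ids = set(old.get("Resources", {}))
    new_ids = set(new.get("Resources", {}))
    return sorted(new_ids - old_ids), sorted(old_ids - new_ids)


if __name__ == "__main__":
    added, removed = diff_logical_ids(DEPLOYED, REFACTORED)
    print("added:", added)
    print("removed:", removed)
    assert_critical_resources(DEPLOYED, CRITICAL)  # passes on the deployed stack
```

A removed table paired with an added one is the signature of a changed logical ID: CloudFormation would delete the old table and create an empty replacement.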
            
            
            
              Backups. For backups, you should use retain policies, like
            
            
            
              what we saw earlier. It's better to be safe than sorry.
            
            
            
              You should have the ability to retain the database.
            
            
            
              Then you can restore the data into
            
            
            
              the new table in case you deleted it, in case you created a new table
            
            
            
              instead. And you should always back up your resources.
            
            
            
              DynamoDB has point-in-time recovery. Same thing for Aurora databases.
            
            
            
              You can use AWS Backup for other resources so you can
            
            
            
              recover your lost data in case of a disaster.
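A check for these backup settings can also run against the synthesized template before deployment. The sketch below is plain Python over a hypothetical template dictionary. `DeletionPolicy`, `UpdateReplacePolicy`, and `PointInTimeRecoverySpecification` are the CloudFormation names, while the table names and `find_unprotected_tables` are assumptions made for this example.

```python
from typing import Any

# Hypothetical synthesized template with one protected and one unprotected table.
TEMPLATE: dict[str, Any] = {
    "Resources": {
        "OrdersTable": {
            "Type": "AWS::DynamoDB::Table",
            "DeletionPolicy": "Retain",
            "UpdateReplacePolicy": "Retain",
            "Properties": {
                "PointInTimeRecoverySpecification": {
                    "PointInTimeRecoveryEnabled": True
                }
            },
        },
        "SessionsTable": {
            "Type": "AWS::DynamoDB::Table",
            "DeletionPolicy": "Delete",
            "Properties": {},
        },
    }
}


def find_unprotected_tables(template: dict[str, Any]) -> dict[str, list[str]]:
    """Map each DynamoDB table's logical ID to its missing safeguards."""
    findings: dict[str, list[str]] = {}
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::DynamoDB::Table":
            continue
        problems = []
        if resource.get("DeletionPolicy") != "Retain":
            problems.append("DeletionPolicy is not Retain")
        if resource.get("UpdateReplacePolicy") != "Retain":
            problems.append("UpdateReplacePolicy is not Retain")
        pitr = resource.get("Properties", {}).get(
            "PointInTimeRecoverySpecification", {}
        )
        if pitr.get("PointInTimeRecoveryEnabled") is not True:
            problems.append("point-in-time recovery is not enabled")
        if problems:
            findings[logical_id] = problems
    return findings


if __name__ == "__main__":
    for table, problems in find_unprotected_tables(TEMPLATE).items():
        print(table, "->", "; ".join(problems))
```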
            
            
            
              Let's talk about some general tips and guidelines.
            
            
            
              So usually when I'm
            
            
            
              using a new service in CDK and I'm not
            
            
            
              really sure how to define it, I can go ahead into the console,
            
            
            
              the AWS console, and play around with it and just try
            
            
            
              to understand how the resources and entities play
            
            
            
              together, maybe what the relationship between them is. And then
            
            
            
              it makes it easier for me to write the CDK code because I understand the
            
            
            
              service much better.
            
            
            
              The second tip is that sometimes the higher level constructs,
            
            
            
              the abstractions that CDK provides, do not expose all
            
            
            
              the configuration that you might need. Sometimes you need to use the lower
            
            
            
              abstraction, the low-level Cfn resources.
            
            
            
              They're, let's say, less easy to use or less fun
            
            
            
              to use, but they usually expose all the CloudFormation aspects
            
            
            
              and configuration and you can use them to define pretty much whatever you want.
            
            
            
              The third tip is tags. Tags are super important
            
            
            
              because you can use tags on the stack level and
            
            
            
              they're added to all the resources. So it's really easy to understand
            
            
            
              all the resources that you see in AWS, who created them,
            
            
            
              when they created them, what service they belong to, and it's really
            
            
            
              easy to manage your services, to manage your resources like
            
            
            
              that, or maybe to understand why you have some orphan
            
            
            
              resources because they have tags on them. And lastly,
            
            
            
              I think the most important tip is that we're developers
            
            
            
              and we like to have cool abstractions and cool factory methods.
            
            
            
              And my tip for you is don't do it. This is CDK code,
            
            
            
              infrastructure code. It should be as simple as possible,
            
            
            
              okay? It should be really readable and easy
            
            
            
              to use, and you shouldn't make it too
            
            
            
              complicated; you really shouldn't. I'm okay with more code duplication
            
            
            
              if it's really easier to read.
            
            
            
              So let's summarize it. Like we said,
            
            
            
              CDK is very powerful, but you need to be responsible.
            
            
            
              And we covered all the best practices for CDK
            
            
            
              apps, stacks, and constructs, and how to share constructs.
            
            
            
              We talked about the CDK template and self-service mechanism,
            
            
            
              security and resilience and that's it.
            
            
            
              I hope you found it interesting and helpful. And thank you very
            
            
            
              much. You can follow me on Twitter, LinkedIn, and my website, runthebuilder.com.
            
            
            
              Thank you very much.