Conf42 DevOps 2023 - Online

V-Net Vs. Public Network: How to Choose What's Right For You

Abstract

DevOps is all about orchestrating development and deployment pipelines, and in order to orchestrate your pipelines you need to secure your agent. Which option is the better choice: V-Net or Public Network? V-Net takes more upfront work but offers customization. Public Network requires less work to implement but has limits when applied to individual projects. In this talk, Hong Bu will discuss the pros and cons of securing an agent with V-Net or a Public Network, and when you might want to go with one over the other. Every security agent has limitations, but with this guide you will be able to choose what is best for you.

Summary

  • In a customer project, we were using Azure DevOps to orchestrate automated development and deployment, or CI/CD. However, the Microsoft-hosted agent that runs all the pipelines for the automation orchestration is located in the public network. How can we solve this problem?
  • Microsoft-hosted agents provide many virtual machine images to choose from, but they run on the public network. The solution was to use a self-hosted agent that we built ourselves. But something unexpected happened in the middle of the project.
  • I use a customized image to satisfy my pipelines' requirements. Next, I use this image to build my virtual machine or virtual machine scale set. In order to build this virtual machine on Azure, I need to publish the image to a place.
  • Instead of using the default Microsoft agent, I'm going to create a new agent pool and link it to the virtual machine scale set I just created. Let's see what the result is when the pipelines run.
  • Security is always the first thing we should consider in any project. Cloud security is definitely a shared responsibility. Building a self-hosted agent is definitely more complicated than using the Microsoft-hosted one. But thinking about it in the long run, we can avoid big asset loss.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Now let's see how we can secure our DevOps agent. Before that, we also need to know why we need to secure the DevOps agent, and what issues might happen if we don't. My team inside Microsoft does a lot of code-with engagements with strategic or large customers to help launch their solutions on our cloud. As a program manager of these projects, my biggest goal is to ensure that every project is delivered on time and with high quality. That is where understanding the time spent on security design, implementation, and planning comes in. Everything I share here, including the artifacts and learnings, comes from my teammates and me, from a previous customer engagement.
So what happened in that project? In that customer project, we were using Azure DevOps, or ADO, to orchestrate automated development and deployment, or CI/CD. This is quite a common practice, and ADO is widely used by many of our customers. We were using a Microsoft-hosted agent, which runs in the Microsoft public network, while all the customer resources were located inside their virtual network, or VNet, which is protected. They were using an Azure web application at the front to accept all the requests from internet users and then route them back to the VNet resources to respond to the end users' requests. At the same time, they had access control on the web application's SCM site.
For people who may not be aware of the SCM site, it is the engine behind any Azure web application's deployment. If you want to deploy an upgraded or new version of the web application, you first deploy it to the SCM site of the web application, which then upgrades the app. So you can imagine how crucial it is to protect and secure access to the SCM site. It was the same for our customer.
So they set a restriction on the SCM site of their web application so that only requests from their virtual network are allowed to access it. However, our Microsoft-hosted agent, which runs all the pipelines for the automation orchestration, is located in the public network. This is by design, and we couldn't change it. So how could we solve this problem?
Before that, you may have a question in mind: why were we using a Microsoft-hosted agent, which runs on the public network and caused this challenge? The reason is that it is so easy to start with. Microsoft-hosted agents provide many virtual machine images to choose from, so you don't need to worry about building different images from scratch, and you can quickly start your development and deployment work. Many of the Microsoft-hosted agent images also come with preinstalled tools, runtimes, dependencies, and libraries, so you don't need to worry about installing them and can focus on your own pipelines and build orchestration. What's more, the Microsoft-hosted agent is a managed service on Azure. That means it is fully maintained by Microsoft: the Defender systems, the security mechanisms, and the software upgrades and updates are all handled by Microsoft. You don't need to worry about any of the maintenance, which saves a lot of time and effort when starting your work.
So we chose the Microsoft-hosted agent for good reasons, but we also ran into the challenge that this agent could not deploy anything to the SCM site of the web application because of the customer's security rules. We came up with our first solution, which seemed to be a very easy one. We know that every Microsoft-hosted agent, when it is running, is allocated an IP address.
If we add that IP address at runtime to the allow list in the web application's access control configuration, then we temporarily get a pass to the SCM site and can make our deployment. After the deployment, we remove the IP address from the allow list, so that the SCM site is protected from then on. We only need that window of allowance.
You may ask, did this work? Yes, it did, but something unexpected happened in the middle of the project. We were in the middle phase of the engagement when I received an urgent call from our customer's project manager. He told me that his company's security team had raised a high-severity ticket against our project. They had detected that the web application we were running was temporarily using an allow policy for an IP address coming from the public network. Even though it was added only temporarily, it was regarded as a big security leak under their security policy, and this operation needed to be stopped immediately and never happen again.
Well, I was shocked. The CI/CD pipelines were running all the time, and my team was automating deployments using the Microsoft-hosted agent and these pipelines to ship upgrades all the time. If the allow list was forbidden, we could not make any further deployments to the web application, and our whole work would be suspended. But we were in such a critical phase of the project that we could not afford any delay. What could I do? I gave the customer's security team a quick call to try to understand the reason. It turned out that the security team ran regular scans of all the services running on their cloud network, and that is how they found this security leak.
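The temporary allow-list window described at the start of this passage (add the agent's IP, deploy, then remove it) could be sketched roughly as follows. This is only an illustration of the pattern, not the script used in the project: `AccessRestrictions`, `temporary_allow`, and `deploy` are hypothetical in-memory stand-ins rather than Azure APIs, and the IP address is a made-up documentation-range value.

```python
from contextlib import contextmanager


class AccessRestrictions:
    """In-memory stand-in for a site's allow list (hypothetical, not an Azure API)."""

    def __init__(self):
        self.allowed_ips = set()

    def add(self, ip):
        self.allowed_ips.add(ip)

    def remove(self, ip):
        self.allowed_ips.discard(ip)


@contextmanager
def temporary_allow(restrictions, ip):
    """Allow `ip` only for the duration of the `with` block."""
    restrictions.add(ip)
    try:
        yield
    finally:
        # Runs even if the deployment raises, so the rule never lingers.
        restrictions.remove(ip)


def deploy(restrictions, agent_ip):
    """Hypothetical deployment step: fails unless the agent IP is allowed."""
    if agent_ip not in restrictions.allowed_ips:
        raise PermissionError("403 IP forbidden")
    return "deployed"


scm = AccessRestrictions()
agent_ip = "203.0.113.10"  # made-up address standing in for the agent's runtime IP

with temporary_allow(scm, agent_ip):
    result = deploy(scm, agent_ip)

print(result, sorted(scm.allowed_ips))
```

The `finally` clause is the point of the sketch: the rule is removed whether or not the deployment succeeds. Even so, the agent's public address is allowed for the length of the window, which is exactly what the customer's security team objected to.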
I explained to them that the allow-list entry was added only temporarily during the deployment window, that it was removed after the deployment, and that it was only for our testing and development purposes. I also explained that the Microsoft-hosted agent is maintained, backed, and supported by all the Microsoft security tools and policies, so they didn't need to worry about it. But the customer's security team was also very firm. They said that even though we used this only temporarily during the deployment window, the Microsoft-hosted agent runs in the public network, and no one can guarantee that it won't be attacked by a hacker during that window. In that case, their enterprise network would be at risk at that moment, and that is not allowed under their zero-trust policy.
I knew this was something I could not negotiate further. What I needed to do was find a workaround or a solution as soon as possible. So I immediately held an urgent meeting with our engineering team, and we came up with another solution. It is also straightforward. Instead of using the Microsoft-hosted agent, which can only run in the public network, we would use a self-hosted agent pool, a virtual machine scale set that we build ourselves, running inside the same virtual network as the customer's other resources. In that case, it is covered by the allow list and is permitted by default to deploy to the SCM site of the web application.
But the self-hosted agent requires all of our own effort, starting from zero: building the customized image ourselves and installing all the tools, dependencies, runtimes, and libraries ourselves. And we need to run it and test it to ensure it works.
And the worst thing is that sometimes you don't know what is missing, which library is missing or which tool needs to be installed, until the pipeline fails on the agent. That is very time consuming. But did we have another choice? Not really.
The next week was a very critical phase of our project. We divided the project team into two. One team continued development in our own subscription so that we wouldn't fall too far behind. At the same time, the other team built the self-hosted agent and did all the testing to ensure that the pipelines worked without issues. There was a lot of back and forth because, as I mentioned, you don't know what is missing or what is wrong until the pipeline fails on the newly built agent, and the troubleshooting really took time. It was a painful journey. But after one week, we finally had a stable agent on which all of our CI/CD pipelines could run successfully without any issues. We immediately moved all of our work onto this newly built agent, started orchestrating our pipelines from there, and continued deploying to the SCM site of the application, which resumed all of our work. Luckily, it didn't have a big impact on our project, and we finally met our timeline. So, a good ending.
To illustrate this procedure, I worked out a demo beforehand and made a recording of it. Now let me play the video and explain the procedure step by step at the same time. This is a very simple application I made. When I type in my name, it sends a greeting back to me. As I mentioned, I used access control to protect my application. If you look here, this is my web application. Inside the network settings, there are the access control policy settings. This is the detailed setting. If you look here, there are two tabs. One is for the main site.
This is about access control for internet requests to this application. By default, all requests from the internet are allowed. On the other hand, there is the advanced tool site. As explained before, this part defines access to the SCM site, that is, which requests are allowed to deploy to the web application. By default, any request that doesn't match a rule is denied, and only access from the VNet is allowed to reach the SCM site of the web application.
So let's see how my pipeline works. This is my CI/CD pipeline, and this is the YAML file I wrote to run my automated deployment. Here I'm using the Microsoft-hosted agent, which is configured by default. Let's see what the result of the pipeline run is. Here is the result. If I look further, you see that the failure reason is "IP forbidden", and the URL it was trying to access is the SCM site of the Azure web application. As I explained, the SCM site only allows requests from the virtual network, so the IP address of a Microsoft-hosted agent running in the public network is not allowed to make the deployment.
Now let's see how I go on with my first solution, which is to add the IP address to the access control list of the web application. Here, in the same place, the advanced tool site settings of the web application, I add a new rule to allow the IP address of my Microsoft-hosted agent, which I detected beforehand. Since I know the IP address the agent is running on, I add it directly to the rule list. Here you see I entered all the settings and added the rule. Now let's rerun the failed pipeline. I made the deployment to the web application again. We see that the pipeline run is successful, so the deployment is successful as well. Just one tip: in reality, it's not feasible to add the IP address in a static way.
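To make the rule behavior in this part of the demo concrete, here is a minimal sketch of how an allow list with a default deny might be evaluated. It is a simplified model, not how Azure implements access restrictions: the `Rule` type, the `evaluate` function, the priorities, and the addresses are all assumptions made for illustration, and the VNet is modeled as a plain CIDR range.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    cidr: str
    action: str    # "Allow" or "Deny"
    priority: int  # lower number is evaluated first


def evaluate(rules, source_ip, default_action="Deny"):
    """Return the action of the first matching rule, by ascending priority."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if ip in ipaddress.ip_network(rule.cidr):
            return rule.action
    # No rule matched: fall back to the default, which is deny here.
    return default_action


# Hypothetical values: the VNet range and the agent address are made up.
scm_rules = [Rule("allow-vnet", "10.0.0.0/16", "Allow", 100)]
agent_ip = "203.0.113.10"

print(evaluate(scm_rules, "10.0.1.4"))  # request from inside the VNet
print(evaluate(scm_rules, agent_ip))    # public agent, before any extra rule

# Adding a single-address rule mirrors the manual step shown in the demo.
scm_rules.append(Rule("temp-agent", agent_ip + "/32", "Allow", 200))
print(evaluate(scm_rules, agent_ip))    # public agent, after the extra rule
```

In this model the public agent is denied until the single-address rule is added, which corresponds to the failed run followed by the successful rerun in the demo.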
So in practice, I used a PowerShell script like this instead. I use this script to fetch the running IP of my Microsoft-hosted agent and then add the rule directly with the same PowerShell script, which is much easier. After the deployment runs, the IP address is removed from the access restriction rules of the web application. This is just one small tip for running this in reality. But as you know, this solution was forbidden by the customer anyway, so I won't use it, and I will now remove this IP address from the allow list again.
Now let's see how I move on with the second solution. I start building my self-hosted agent by creating the image. Here you can see the image I created, a customized image that satisfies my pipelines' requirements. It's quite a large one, about 8 GB, and this is the JSON file that describes the image. You can find a lot of documents and guidance that tell you how to create your own customized image. What I recommend is this GitHub repository, which I found very helpful. It provides detailed step-by-step guidance and instructions for creating a customized image. I followed it and successfully created my own image like this, so I recommend you use it as well.
Next, I'm going to use this image to build my virtual machine or virtual machine scale set. But before that, there's a very critical step: in order to build this virtual machine on Azure, I need to publish the created image to Azure. That means I need to publish it to some place. Where is that place? Here it is. This is the self-hosted agent gallery. You can think of the agent gallery as the place that hosts your image. This is the image I just published to this agent gallery. What I can do now is use the published image to create a virtual machine or a virtual machine scale set.
What I'm going to do is use this image to create my virtual machine scale set. A virtual machine scale set is a pool with a flexible number of virtual machines, as you define, so it gives you the convenience and flexibility to scale your virtual machines in or out based on the workload of the running pipelines. This is the virtual machine scale set I created. If you look here, the image it uses is the one I just published to the agent gallery. And if we look at the settings, it's within the same virtual network that is in the allow list of the web application's SCM site configuration.
With these settings, I now move back to Azure DevOps. What I'm going to do is create a new agent pool. Instead of using the default Microsoft agent, I'm going to create a new agent pool and link it to the virtual machine scale set I just created. Inside the Azure DevOps organization and project settings, there's a menu item on the left called Agent pools. Starting from here, you can configure an existing agent pool or add a new one, as I'm going to describe. So I add a new agent pool. Here you see I select my subscription and then link the newly created agent pool to the virtual machine scale set I just created. Now I select the virtual machine scale set I created, which means I bind this new agent pool to this scale set. I name it ADO Pool and finish the other settings.
Now you will see there is a new pool called ADO Pool. Let's click on it. The Agents tab lists all the virtual machines that are in running status, but right now there's no agent. Why? Because the virtual machines I created in the virtual machine scale set were not yet in running status at that moment.
So I move back to my virtual machine scale set, check the status of all the virtual machines, and ensure that they're in running status. That means my ADO pool is ready to use. Now I move back to my ADO pool and check the agent status again. Let's see. The two virtual machines listed under the agent pool are exactly the ones you just saw in the virtual machine scale set, which are in running status. Why are these two agents idle? Because there's no pipeline running, so there's no job on them. But this is a very good sign. It means our ADO pool is ready to work and our pipeline is ready to rerun.
Before running the pipeline, remember that the agent now taking on the deployment is not the default one, so we need to change the setting. In the YAML file, look at the deployment job description and, in the pool settings, change the name of the pool from the default Microsoft-hosted agent to the new one I just created. This is very important. Then let's rerun the pipeline. Now you see the result: the deployment to the Azure web application has finished successfully.
To wrap up the procedure of my demo: I created a web application and configured the access rules so that only access or deployment requests from the VNet are allowed to reach the SCM site of the web application. To meet that requirement, I built my own customized image and published it to the agent gallery. Starting from there, I created a virtual machine scale set using that customized image. Then I created the ADO pool inside my project and linked it to the virtual machine scale set, so that I can use virtual machines running inside the virtual network with my custom image.
Then I updated my pipelines to use this newly created agent pool, redeployed my application, and everything worked.
The last part is my takeaways from the whole engagement. The first is that security is never the last thing but always the first thing we should consider in any project. The reason is that cloud security is definitely a shared responsibility. When we work with a customer, a big enterprise team, we need to work closely with their security team or experts to understand and define the requirements from their side, because every organization may have specific requirements, and you can imagine how busy this type of team will be, as in our case. So try to start the conversation with them as early as possible to avoid unexpected issues, risks, or violations that give you a big shock later in the project.
Last but not least, as you can see from my demo, building a self-hosted agent is definitely more complicated than using the Microsoft-hosted one, and factoring security into our design and implementation adds more complexity and even more challenges. But thinking about it in the long run, we can avoid big asset loss because everything is protected and secured. It may take extra effort and cause some inconvenience, but it ensures that the customer's project will be successful and that all your resources and assets are protected, which avoids big asset loss in the long run. No pain, no gain: we put in the effort to build a secure workload and a secure network, and in the long run we get a successful project, so all the pain pays off. That's all my sharing today. I hope this is helpful, and thank you for listening.
...

Hong Bu

Senior Program Manager @ Microsoft
