Conf42 Platform Engineering 2025 - Online

- premiere 5PM GMT

Scaling Enterprise Development with Cloud IDEs: Security and Performance at Scale

Abstract

Learn how cloud IDEs and remote development environments can help businesses safely increase developer productivity. Discover practical methods to integrate with enterprise systems, expedite onboarding, and maximize performance, all without sacrificing velocity or control.

Summary

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. My name is Jayant Tyagi. I'm a software engineer at Salesforce, part of the Slack engineering team, and today I'm going to be talking about scaling enterprise development with cloud IDEs: security and performance at scale. This presentation is inspired by a project that I led at Slack, which involved building a new remote development environment platform that allowed engineers here to move from developing on their laptops to a shared remote development environment. If you've used products like Gitpod and other cloud IDEs, you might be familiar with the idea. But this is about building something from the ground up that meets your unique needs. Alright, with that, I'll get started.
First, why do we even need it? What are the challenges that are unique to enterprise software development? We often have to navigate huge codebases and distributed teams. Everybody is working on certain subsets of the codebase, and that makes local development extremely slow and hard to manage. If you're working on your laptop, it might take minutes, if not hours, to finish your builds, because a local laptop just doesn't have the hardware or the compute capacity required to build these large codebases. Even with the move away from monoliths to microservices, you still have extremely complicated codebases and development environments, and they require extremely heavy toolchains. If you're doing iOS development, Android development, web development, or some ML or AI work, you need to install and set up those tools, and you need to make sure they continue to work, and that is a huge undertaking. That's why the move to cloud IDEs is preferred.
Cloud IDEs are remote, cloud-hosted development environments. They're usually accessed via a remote IDE that is connected to your machine over a secure SSH channel. So you get access to a terminal, just as you would on your laptop.
So you can run commands: you can run Git commands and any local CLI tools that you would run on your laptop, except you'll run them on the virtual machine. You get a numbered development environment, so you can use any regular browser to test your changes. The idea is that this offloads the heavy computing to the cloud, which scales really well, thereby making for a much more pleasant development experience.
I think I did cover this, but just to enumerate it further: the motivation for moving to cloud IDEs is mostly to navigate large codebases and these heavy workloads, and to be able to work in a more performant, efficient way. But I think the more important point is reliability. If you're working off of your laptop, there's going to be some toolchain that breaks with every patch, every update that you make. And everyone will have their own unique set of challenges, so you end up with things like "it works on my machine but not on yours." The idea is to have a consistent development experience across the enterprise, where the development setup is more reliable because the VM and the OS version don't change, and if they change, they change for everyone. That means remediation and changes can be made to make sure your toolchain stays stable.
So let's talk about the particular problem at Slack. We already had a rather complex development setup. But then what happens when you acquire new companies? Slack itself is an acquisition, but this is about when you're integrating with other novel products that were not built as part of your codebase. Now you're left with features that have a footprint in multiple codebases, often in different GitHub accounts, even different GitHub Enterprise accounts.
You still need to provide your developers a manageable development environment that lets them navigate these different codebases and do end-to-end testing, so that development can be smooth. It's going to be tough if you make changes to one repo, wait for it to sync, and then make changes on the other side. If you use feature flags or versioning to test things together, that's going to slow you down considerably. We had a product with exactly this problem: developing it required managing both codebases. So what we did was build a custom development environment in which two Docker containers ran both codebases side by side, and we relied on port-to-port communication so that the two containers could talk to each other. That gave us an environment where the product could be tested holistically, and we had the toolchains to manage both codebases as well. This was really transformational, in that suddenly you could iterate and develop features for this product much quicker.
What are the benefits over traditional local setups? For one, scalability and speed. Like I said, you get high-end compute, and it's much more elastic. Depending on your use case, the VM that is provisioned for you might have much higher compute, whereas when you're doing something simple, like some HTML and JavaScript development, you might get a lightweight machine. It depends on what you're doing. It also makes for a consistent development experience: your entire development team works on a setup that is consistent, so everyone's experience is the same. It leads to a much more secure and auditable development environment, because you're not pulling often-confidential company artifacts onto your local laptop; rather, you're working on a VM that is much more hardened when it comes to security practices. You have the principle of least privilege, and all the other IAM role-based access controls are enabled there. And most importantly, it leads to faster onboarding. New engineers are not spending time setting up their laptops. With one CLI command, they get a VM that is fully provisioned and fully set up with everything required for development, so they can be productive from day one.
So what does a high-level architecture overview look like for these remote development environments? Each developer gets a personal remote workspace with isolated resources. It shares resources with the other VMs, but it doesn't share any logic or data. So it's completely isolated, and it's as good as working on your own laptop, except it's faster. You're connected to these VMs through encrypted tunnels, usually secure SSH. The VMs that power this development experience are usually built from standard images. Let's say you're supporting three or four flavors of development environments. You can identify the commonalities: maybe you need Linux installed on all of them, maybe you need some other binaries installed on all of them. So you distill all of the common parts and bake them into an image, and you use these base images to build the development environments. That means those things are hardened and not left to be dealt with by the developer, which gives you a much smoother development experience.
It's also going to be a much more secure experience by default, just by the fact that you're not using your own laptop for development. Most of these development platforms are going to have SSO enabled, and these VMs are mostly going to be in your corporate intranet, behind a firewall.
So without proper access, they're not going to be accessible to anyone. And again, the usual privileges and access control mechanisms are also available here.
In terms of performance optimization, you're not going to have cold starts, because you'll be able to prebuild things and have things like a remote cache. Usually your machine is ready in a couple of minutes, because with the shared VM pool you have prebuilt images with those dependencies already installed. You will have a warm pool of VMs where these environments are ready to go: depending on your usage, you might have a set of VMs that are already provisioned, and they're just handed to the developer when one is requested. So you don't really need to wait for the developer to issue the command before you start building these machines. At Slack, we used AWS Auto Scaling groups, and with that we have sub-two-minute, around 90-second, startup times. Like I said, you can cache dependencies and builds so you don't have to do everything every time, only when things change.
So what is the developer flow while using these remote development environments? First, you request a new development environment. You do it with a simple CLI command. You go to your terminal and maybe you write "get remote dev", your branch name, and some configuration. Let's say you want to do frontend development: you'll say "get remote dev", the branch name, and "frontend", or "backend", or "ML", or any other configuration that you want to provide. Depending on that configuration, the system (our platform, at least at Slack) automatically provisions these VMs and installs all the required dependencies.
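As a rough illustration of how a request might be served from the warm pool described above, here is a minimal Python sketch. It is not Slack's implementation: the `WarmPool` class, its method names, and the VM id format are all hypothetical, and the `_provision` method only stands in for the slow path of booting a VM from a base image.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count


@dataclass
class WarmPool:
    """Keeps a few pre-provisioned VMs per flavor so requests can skip the cold start."""

    target_size: int = 2
    _ready: dict = field(default_factory=dict)  # flavor -> deque of ready VM ids
    _ids: count = field(default_factory=count)  # counter used to make unique ids

    def _provision(self, flavor: str) -> str:
        # Hypothetical stand-in for the slow path: boot from a base image
        # and run the setup scripts.
        return f"{flavor}-vm-{next(self._ids)}"

    def refill(self, flavor: str) -> None:
        """Top the pool for this flavor back up to target_size."""
        pool = self._ready.setdefault(flavor, deque())
        while len(pool) < self.target_size:
            pool.append(self._provision(flavor))

    def claim(self, flavor: str) -> tuple[str, bool]:
        """Return (vm_id, was_warm); fall back to a cold start if the pool is empty."""
        pool = self._ready.setdefault(flavor, deque())
        if pool:
            return pool.popleft(), True
        return self._provision(flavor), False
```

In a real platform the refill step would run in the background (for example, driven by the autoscaling layer), so that a developer's request usually finds a warm VM waiting.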
So if the development environment requires Xcode utilities, or some secure cryptographic utilities for key encryption and data encryption, Git pre-commit hooks, post-commit hooks, any linters, or any other tools that you might need, those kinds of things will be installed. Usually you have to write a script for these, and every enterprise will have things specific to them. At Slack, we had our own shell scripts and Chef recipes so that when these VMs came up, the volumes were mounted, the secrets were fetched from the secret store, and everything that needed to be set up and installed on these machines was done.
When it's provisioned, which like I mentioned takes about 90 seconds, your remote environment is ready for use. You can connect to it via SSH from VS Code, Cursor, wherever, and you get access to a terminal. So it's like working on your own laptop, except you have access to a VM, which is much more reliable and much more powerful. And then you can code away. You get a branch checked out with the branch name you provided, and you can start checking things in and merging code, because the VM is integrated with Git, your source control system.
At Slack, at least, we have dynamic provisioning, which means that in the config in the CLI command, we indicate what kind of development environment we need. If it's frontend development, maybe you get a machine with different parameters; maybe you'll get something like an xlarge instance (I'm talking about Amazon EC2 instances). And if you indicate you need machine learning, data science, or backend, you can predefine how many CPU cores and how many gigabytes of RAM should be provisioned. You can set these parameters according to your enterprise development needs.
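The dynamic provisioning idea above, where a flavor named on the command line maps to predefined CPU and memory, could be sketched as follows. The flavor names come from the talk, but the `MachineSpec` type, the `resolve_spec` function, and every number in the table are illustrative assumptions, not Slack's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineSpec:
    vcpus: int
    memory_gib: int


# Hypothetical sizes: each organization would tune these to its own workloads.
FLAVORS = {
    "frontend": MachineSpec(vcpus=4, memory_gib=16),
    "backend": MachineSpec(vcpus=8, memory_gib=32),
    "ml": MachineSpec(vcpus=32, memory_gib=128),
}


def resolve_spec(flavor: str) -> MachineSpec:
    """Map the flavor given on the command line to a predefined machine spec."""
    try:
        return FLAVORS[flavor]
    except KeyError:
        known = ", ".join(sorted(FLAVORS))
        raise ValueError(f"unknown flavor {flavor!r}; expected one of: {known}") from None
```

Keeping the table in one place means the platform team, rather than each developer, decides which machine sizes are available, which also helps with the cost controls discussed later in the talk.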
So this gives you that level of control, where if you're doing something really compute intensive, you can ask for a bigger, much more powerful machine.
In terms of accessing code on these VMs: once everything is set up, the first thing you want to do is download your code, right? You want to check out your Git repository. There are a few ways to do that. On laptops, you basically run your Git commands. When we were thinking about this at Slack, we evaluated two approaches. One is SSH agent forwarding. When we work on a laptop, we use our SSH keys, which are added to GitHub, and we're able to fetch, pull, push, and merge in our GitHub repos. So we just forward our SSH agent to the VM, and when we run these commands from the VM, they use the SSH credentials from the laptop. It's done over the secure SSH connection, so it's encrypted and safe. And it doesn't require any other setup, because it's as if you're issuing these Git commands from your local laptop. The other way to authenticate to GitHub is to create a GitHub OAuth app. It's a managed way of doing things, with very easy setup and integration. But you do have to be mindful of token scopes and revocation, and make sure the tokens are not expiring, or that you're renewing your secrets and keys. And with tokens there is always the potential for misuse if they're compromised. So we ended up going with SSH agent forwarding for all these reasons.
Now, because you're building this development environment yourself, you can build some unique features that cater to your use cases. I'm going to talk about one of them here. One of the features that we implemented is frontend grafting.
Frontend grafting allows us to graft the frontend assets or bundles (HTML, JavaScript, React, all of that) from one VM onto a different development environment, or even onto staging or prod code. Oftentimes we build something but we're not able to test it with production-like data, shapes, or configuration, and the development environments we use just lack that sort of data for us to test things confidently. So we built a grafting mechanism to test with real-world data: you go to your prod environment and, using special query params, you tell it to fetch all the frontend assets and bundles from this VM, the remote development environment that you're using to build the frontend. Again, this is all secure. You have to be within the company firewall, so there's no potential for misuse here. But it's a very innovative way of using a frontend that is served from one environment with a backend that is from another. And there are obvious benefits. For our product we had to do it, because the frontend lived in one codebase, at least some of the assets did, and the backend services were in another. So that was one innovative solution that we came up with, and building our own custom remote development environment is what enabled us to build it. No off-the-shelf cloud IDE would provide this.
These development environments are also integrated with the usual enterprise systems that you need for development: version control and PR workflows. I already mentioned that when the VMs come up and you connect with Visual Studio Code, you already have your Git code checked out, so you're able to look at Git history, create branches and forks, create pull requests, and do all the usual things.
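To make the frontend grafting idea from earlier in this section a little more concrete, here is a minimal Python sketch of how a server might choose where to load frontend assets from based on a query parameter. The talk does not describe the actual mechanism, so everything here is an assumption: the parameter name `graft_host`, the internal domain suffix, the `/assets` path, and the `asset_base_url` function are all hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

GRAFT_PARAM = "graft_host"                # hypothetical query parameter name
ALLOWED_SUFFIX = ".dev.example.internal"  # hypothetical internal-only domain


def asset_base_url(page_url: str, default_base: str) -> str:
    """Return the base URL that frontend bundles should be loaded from.

    If the page URL names a dev VM via the graft parameter, and that host is
    a single label under the internal domain, serve assets from the VM.
    Otherwise keep the environment's normal asset location.
    """
    query = parse_qs(urlsplit(page_url).query)
    host = query.get(GRAFT_PARAM, [""])[0].lower()
    label = host[: -len(ALLOWED_SUFFIX)]
    label_ok = bool(label) and all(c.isalnum() or c == "-" for c in label)
    if host.endswith(ALLOWED_SUFFIX) and label_ok:
        return f"https://{host}/assets"
    return default_base
```

The allowlist check reflects the point made in the talk that grafting only works inside the company network: a request that names an outside host simply falls back to the default assets.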
They're also integrated with the CI/CD systems, because you're doing it yourself. It's like a bare-metal VM, and you can do anything on it. You can have integration with CI/CD pipelines. You can issue commands to run tests in Jenkins, if that's what you use. You can trigger CI workflows from the cloud IDE. The same goes for any other tool you might need: custom tools for code reviews, issue trackers like Jira, triggers or hooks, or any other dashboards. Everything can be accessed, because it's basically another computer that you've been given access to. Then there's the integration with enterprise secret management. At most enterprise software companies there is some sort of secret store, where keys are pulled from the store at runtime and rotated at a cadence. That integration is also done, and we did that for our use case.
There also needs to be an operational playbook and a rollout strategy. If you build this from the ground up, it's not going to be like you suddenly announce it and it's GA. You have to treat it like a product that just happens to be used internally. So maybe start with a small pilot team and get them to use it. Because developers are your users, they're going to have feedback that you can incorporate into the product lifecycle. You can prioritize the features that developers need the most. Maybe they need some tooling that is missing, or maybe they need linters, which are extremely important, or the ability to run tests from these IDEs. This is the kind of feedback you'll get. You then onboard teams in phases, and that gives you the time and the feedback necessary to build something useful.
You need well-defined playbooks for operational tasks, because this is going to be a new VM. Maybe some of the developers have only worked on Windows or Mac, and suddenly they have to work with Unix or Ubuntu, depending on what your VM runs. So you need to have these operational tasks, these scripts, maybe even Docker lifecycle commands, everything documented in a playbook so that developers can do them. That is connected to the training and documentation bit. As you iterate, you can add more features, because there is some barrier to entry here. There is a considerable initial investment, so you're not going to be able to build everything at once. You have to prioritize which parts of the codebase or the development flow you can support first, and that is where the feedback is going to be important.
For something like this, you have VMs and you have these Auto Scaling groups that are being shared, but you want to make sure you do it in a cost-effective manner. So what are some strategies you can use to make sure there is no overspend? There are a few that we use. These development environments are provisioned on demand and they are ephemeral. By default they have a lifecycle of, let's say, a week or two weeks; then you get warnings, then they're killed, and that space is available for someone else. Beyond that, you can have dev environments suspend or go to sleep when idle time is detected, like 30 minutes or one hour. You also have to be mindful of what size of instances you're using, and that's where the config comes in. It's important that you map the hardware you're providing to the use case that the developer has. If they're doing some simple frontend development, you have to use the right-sized instance to make sure you're not just throwing hardware at a problem that doesn't require it. This is multi-tenant by default, because behind the scenes it's basically one machine using virtualization: you have multiple VMs, and they're all running the Docker containers that help you develop. You also need to keep an eye on cloud costs. You have to have billing dashboards and some alerts set up to make sure you're not crossing your budgets. And if you are, you might need to provision more capacity. It's a good problem to have if you get that kind of adoption, but it certainly requires preparedness. Like I said, realistically speaking, you cannot just throw more hardware at things. So while you want faster performance for your developers, you also have to balance it with cost. Every organization will find the trade-off that is right for them, depending on where they are. It's just something to keep in mind.
With these enterprise cloud IDE platforms, because this is a consistent development experience that provides parity, you can enforce policy guardrails, and you can have auditing, access, and compliance management. Because you're creating an audit trail, you can be SOC 2 compliant by default. Admins will be able to restrict the images, the machine sizes, and the network access. There's a lot more control when development is happening on these remote development environments. There's going to be some level of developer autonomy too, because eventually you want to give them a VM with a terminal where they can install things if they need to for their use case.
If there's an IDE extension they want, or a custom linter or a debugging package, they can do all of that. So there's that level of autonomy too. But the core pieces of the product are going to be hardened.
So let's talk about some of the lessons learned and some recommendations that came out of this project. Because there is some investment and some barrier to entry here, it's probably more suited to bigger codebases. For it to be a justifiable decision, you have to have real issues scaling your laptop development flow. If you're having those issues, this is the right choice for you. But if things are still manageable, the effort is non-trivial and it might not be justified right away, though if your company grows and your codebase grows, eventually you will end up using this. Always iterate with feedback. Prioritize the things that matter to the developers in your organization first. Provide environment flavors, which means having different categories of development environments where you have the right size of VM and compute resources for the kind of work you're doing. That will help you be more cost-effective. Just as a side note, Uber created six flavors of these dev environments to cater to different needs. At Slack we also have at least four that I can think of, for different things.
Now a real-world success story. I'm proud to say that at Slack we've been very successful with these. I don't think there are many developers who use local laptop development for this product in particular, because it's just very difficult to manage those two codebases on a single laptop. So, at around a 90% adoption rate, this has been very successful for us. And again, some other benchmarks.
There's around a 75% build time reduction, because you're using a bigger machine, and multiple machines that have cached builds and a remote cache. So you're able to reduce that time, which is a huge boost. Then there is a case study that came out of Uber, where they had an internal Devpod system. It allows choosing large machines with up to 48 CPU cores. That must be for a specific use case, and obviously a laptop can never scale to that kind of requirement. And the biggest plus is that your engineers are productive from day one, because they're not spending time just setting up their machines.
To conclude, I think the key takeaway is that moving enterprise development to cloud IDEs and remote development environments is going to speed up your engineering while making your development workflows extremely secure. It enables you to leverage powerful infrastructure and have some sort of centralized control, to solve all those reliability issues where things work on one developer's laptop but not on another's. Those are the kinds of things you can completely take out by moving to remote development environments. And you can architect it and cater it to your own requirements. I'd say it's a very transformative project. If it's successful, it's a complete overhaul of your development experience. It might even bring long-lasting changes to your engineering culture, because if it improves development velocity and productivity, it's a good thing for the organization overall. That's all I had for today. Thank you for allowing me the opportunity to present at Conf42 Platform Engineering. Thank you.
...

Jayant Tyagi

Software Engineer @ Salesforce



