Conf42 Site Reliability Engineering (SRE) 2025 - Online

- premiere 5PM GMT

From Complexity to Clarity: Our Serverless Journey That Slashed Costs While Boosting Developer Velocity


Abstract

Discover how we transformed our infrastructure with serverless, slashing costs while boosting developer productivity. I’ll share our journey—challenges, solutions, and wins—giving you a practical roadmap to similar results. No theory, just battle-tested strategies you can implement today.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. Today I'm excited to share with you the story of our transition from traditional cloud services to serverless architectures, a journey that not only simplified our operations but also delivered real and measurable benefits. Our shift to serverless was not just about keeping up with technology trends; it was driven by the need to tackle complexity, reduce cost, and accelerate our development process. Along the way we experienced significant cost savings, a drastic reduction in incidents, and a boost in our deployment frequency. In the next few slides I will walk you through the key challenges we faced, how we implemented serverless, and the impact it had on our costs, incident management, and overall developer productivity. Let's dive in.

Let's move to the second slide, where I'll discuss our initial infrastructure challenges. At the outset, we were facing mounting challenges with our traditional infrastructure. As our business grew, so did our infrastructure expenses; unfortunately, these costs were growing faster than our actual business needs. Underutilized servers were draining resources, leading to inefficiency and unnecessary expense. Our teams were spending an overwhelming amount of time managing servers and infrastructure, and this constant operational work took focus away from what truly mattered: core innovation and building new features for our users. On top of that, our traditional architecture was simply not built to handle sudden traffic spikes. These limitations caused significant issues, including customer-facing failures that impacted user experience and trust. These challenges made it clear that we needed a change.

Now let's move to the third slide, where I'll discuss why we chose serverless. There are a couple of reasons.
We chose serverless as a strategic force multiplier, not because it's a one-size-fits-all solution, but because it gives us a distinct advantage in terms of speed, cost efficiency, and scalability. By adopting serverless, our engineers can focus on building features and solving business problems instead of spending time managing and maintaining infrastructure. Serverless architectures automatically scale to meet traffic demand, ensuring optimal resource utilization without manual intervention. Then there is the pay-per-use model: with serverless, we only pay for the compute resources we actually use. That is the most important point, because it provides cost efficiency and reduces the need for over-provisioning. This combination of agility, cost effectiveness, and scalability made serverless the right choice for our needs.

Next, I'll discuss the technical approach we took. For the initial project we had decided to go with AWS services, so to address our challenges we turned to the AWS suite of powerful serverless services. AWS Lambda is the core compute service for running our business logic in a serverless environment; Lambda allows us to execute code without managing servers, automatically scaling based on demand. Another service we chose is DynamoDB, a fully managed NoSQL database with auto-scaling capabilities. DynamoDB adjusts to traffic changes in real time, ensuring high availability, performance, and seamless data management. There is also API Gateway, a fully managed service for creating, managing, and securing API endpoints at scale. API Gateway simplifies API deployment, allowing us to connect our frontend and backend services effortlessly.
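To make the Lambda-plus-API-Gateway pattern concrete, here is a minimal sketch of what one of these functions looks like. This is an illustrative example, not the speaker's actual code: API Gateway (proxy integration) passes the HTTP request in as `event`, and the dictionary the handler returns becomes the HTTP response. The `name` parameter is purely hypothetical.

```python
import json

def handler(event, context):
    """A minimal AWS Lambda handler behind an API Gateway proxy integration.

    API Gateway delivers the HTTP request as `event`; the dict returned here
    becomes the HTTP response. There are no servers to manage: the platform
    runs the function on demand and scales it with traffic.
    """
    # Query string parameters arrive as a dict (or None when absent).
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

Because the handler is just a function of its input event, it can be invoked locally in tests exactly as API Gateway would invoke it, which is part of what made the pilot projects low-risk.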
S3 is another major component whenever we go through a development cycle or build an application that needs to store data: highly durable object storage for static assets and data. S3 provides reliable, cost-effective storage for everything from images to backups, ensuring seamless data retrieval and scalability. Another important piece is CloudWatch Logs, a monitoring and logging service that helps us track and analyze log data in real time. CloudWatch provides insights into system performance and operational health, enabling us to detect issues early and optimize our infrastructure; it is a really important component for logging. Though it is not mentioned on the slide, there are a couple of other components, such as Secrets Manager, a service to securely store and manage sensitive information like API keys, credentials, and other secrets. Secrets Manager helps us ensure sensitive data is handled securely without risk of exposure. For containerizing applications, we use Docker to ensure consistency across environments, and Kubernetes, specifically Amazon Elastic Kubernetes Service (EKS), allows us to scale and manage microservices efficiently. Together, these services enabled us to build a flexible and reliable serverless architecture. With them in place, we were able to achieve the scalability and efficiency we needed.

Let's move to the next slide, where I'll discuss overcoming initial resistance. Like any major change, we faced some resistance along the way. Many engineers initially doubted the real-world benefits of serverless architecture; there were concerns about performance, scalability, and the potential risks of moving away from traditional infrastructure.
To address these concerns, we conducted a couple of proof-of-concept (POC) projects. These hands-on demonstrations helped the team see the practical value of serverless and provided clarity on how it could solve our challenges. We also identified supporters and early adopters within the teams who embraced serverless and drove the first successful implementations. Their positive experience helped build confidence across the organization. Finally, we used metrics to validate the business impact of the serverless transition: data on cost savings, incident reduction, and improved deployment frequency provided concrete evidence of the benefits. Over time this helped shift the mindset across the teams and built strong support for our journey.

Let's move to the next slide. Our implementation strategy was a step-by-step process to ensure a smooth transition. The first step was to break down our monolithic application into smaller, function-sized pieces. By identifying key components, we were able to transition to a more modular and scalable architecture. Next came pilot projects: we began by launching pilot projects with non-critical workloads. This allowed us to experiment and refine our serverless approach without risking core operations, while also gaining valuable insight for larger implementations. Our next step was to refactor existing applications to make them stateless, ensuring they were optimized for serverless execution. This process involved adapting our code and workflows to align with the serverless model. Then there are CI/CD pipelines: to ensure smooth deployments, we focused on building optimized, serverless-specific CI/CD pipelines.
This allowed for continuous integration and delivery tailored to a serverless environment, enabling fast and reliable updates. This strategy allowed us to succeed and scale progressively, and ensured the transition was as seamless as possible.

Moving to the next slide: as with any complex system, observability became a significant challenge. Distributed tracing across multiple serverless functions was inherently more complex, and the execution context for each function was more limited, making it harder to gather detailed insights. We also had to contend with unpredictable cold-start latency, which impacted performance, as well as highly variable resource consumption, which made consistent monitoring difficult. To tackle these challenges, we implemented several key solutions. First, we introduced end-to-end correlation IDs to track requests across functions, providing a full view of each request's journey. We then deployed centralized logging with structured data, ensuring we could capture and analyze logs effectively. Custom function-specific dashboards allowed us to monitor performance at a granular level, and dynamic alerting based on statistical baselines helped us detect issues early. Today, our monitoring infrastructure provides comprehensive visibility into our serverless ecosystem, allowing us to proactively detect issues and resolve them rapidly, ensuring smooth and efficient operations.

Next, to ensure our serverless journey remained cost effective, we implemented several key cost-optimization strategies. Right-sizing function resources is one of them. Memory allocation has a direct impact on both performance and cost. To optimize this, we implemented automated testing to determine the optimal memory setting for each function, ensuring that we were using resources efficiently without over-provisioning.
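The memory right-sizing idea can be sketched in a few lines. This is an illustrative helper, not the actual tooling described in the talk: Lambda bills compute in GB-seconds (memory allocated times execution time), and raising memory often shortens duration, so the cheapest setting is not always the smallest. The price constant below is an example figure, not a quoted rate.

```python
# Illustrative right-sizing helper. Given measured average durations for a
# function at several memory settings, pick the cheapest setting that still
# meets a latency target. The rate below is an example, not a price quote.
PRICE_PER_GB_SECOND = 0.0000166667  # assumed compute rate in USD

def cost_per_invocation(memory_mb, duration_ms):
    """Cost of one invocation: billed GB-seconds times the compute rate."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND

def right_size(measurements, latency_target_ms):
    """measurements maps memory_mb -> avg duration_ms from load tests.

    Returns the memory setting with the lowest per-invocation cost among
    those that meet the latency target.
    """
    candidates = {
        mem: cost_per_invocation(mem, dur)
        for mem, dur in measurements.items()
        if dur <= latency_target_ms
    }
    if not candidates:
        raise ValueError("no memory setting meets the latency target")
    return min(candidates, key=candidates.get)
```

For example, with measurements `{128: 800, 256: 380, 512: 200, 1024: 150}` (ms) and a 400 ms target, 256 MB wins: doubling memory from there halves the duration but still raises the GB-seconds billed.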
Another one is optimizing cold starts. Cold-start latency can significantly affect performance, especially during traffic spikes. To mitigate this, we reduced the size of our deployment packages and implemented provisioned concurrency for critical paths. This strategy ensured minimal cold-start impact even during traffic surges. By closely monitoring execution times, we were able to identify functions approaching their timeout thresholds. This proactive monitoring prevented costly timeout loops and helped us avoid execution inefficiencies, saving both time and money. We also introduced tagging standards and deployment policies to keep our serverless architecture organized and avoid unnecessary resources. This governance ensured that serverless resources were used strategically and that the infrastructure remained manageable and cost effective. Together, these strategies allowed us to optimize both performance and cost across our serverless architecture.

This slide covers the measurable results. Looking at the results of our serverless transition, we saw some truly impactful outcomes. We successfully decreased our infrastructure expenses by 62 percent, thanks to the efficiency of the serverless architecture and optimized use of resources. Operational failures were reduced by 78 percent, leading to a more reliable and stable environment with fewer disruptions to our services and customer experience. Our deployment frequency increased by 3.5 times, enabling faster iterations and more frequent releases, driving innovation and improvements for our users, which is a big achievement. Developer satisfaction increased to 94 percent, as the teams spent more time building features and less time managing infrastructure, contributing to improved user experience and productivity.
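Going back to the observability slide for a moment, the end-to-end correlation IDs and structured logging described there can be sketched as follows. This is a minimal illustration of the pattern, not the production implementation; the `x-correlation-id` header name and the log field names are assumptions for the example.

```python
import json
import uuid

def with_correlation_id(event):
    """Reuse the caller's correlation ID if present, else mint a new one.

    Every function in the chain calls this first and forwards the header
    downstream, so a single ID follows the whole request journey.
    """
    headers = event.setdefault("headers", {})
    cid = headers.get("x-correlation-id") or str(uuid.uuid4())
    headers["x-correlation-id"] = cid
    return cid

def log_line(correlation_id, message, **fields):
    """Emit one structured (JSON) log line for centralized logging.

    Because every line carries the shared correlation_id, log entries from
    different functions can be stitched back into one request trace.
    """
    return json.dumps({"correlation_id": correlation_id,
                       "message": message, **fields})
```

With structured lines like these in centralized logging, filtering on one correlation ID reconstructs the full path of a request across functions, which is what restores the end-to-end view that distributed serverless execution otherwise obscures.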
These results highlight the tangible benefits of serverless, both in terms of cost and in terms of productivity. As we reflect on our serverless journey, we have learned a few key lessons. The flexibility of serverless architecture allowed us to adapt quickly to changing business needs, and we have learned that this agility is key to staying competitive in a fast-paced environment. Serverless is not a one-time solution: we are constantly iterating on our architecture and processes, ensuring that we are always optimizing performance, cost, and efficiency. By freeing up resources from infrastructure management, our teams have been able to focus more on product development and innovation, which is driving the future of our business. Looking ahead, we plan to extend our serverless architecture to more workloads, continue refining our monitoring systems, and explore new AWS services that can drive even more efficiency and capability. The journey is far from over, and the future looks bright.

As we continue our serverless journey, I want to provide resources to help you on your own path; think of these as next steps. First, access our comprehensive serverless templates and architecture patterns. These resources provide a solid foundation for your own serverless implementation, saving you time and effort. Second, review the architecture decisions we made throughout the journey. These records will help guide your decision-making process and provide insight into the challenges we overcame. Third, use open-source utilities to test and benchmark performance; this is especially important, as these tools are designed to help you measure the efficiency of your serverless architecture and optimize it for your own needs.
Finally, join the community channels I mentioned here to connect with experts and peers who can provide valuable implementation advice, share best practices, collaborate, answer your questions, and help you learn from others in the community. I followed the same approach during my entire journey, and these resources are here to help you succeed. I also look forward to supporting your serverless endeavors in the future; feel free to contact me. I appreciate your time today, and I hope our journey has inspired you to explore the power of serverless architecture for your own projects. If you have any questions or want to learn more, feel free to reach out; my contact information and LinkedIn are here, so you can message me anytime, and I will respond as time permits. Thank you so much.

Tarun Kumar Chatterjee

.NET Senior Lead Developer @ Presidio

Tarun Kumar Chatterjee's LinkedIn account


