Observability Starts Before the Outage: Synthetic Monitoring for Modern Systems

Abstract

What if you are asleep and your website fails? By the time you wake up, customers are frustrated and revenue is gone. In this talk, discover how synthetic monitoring lets you catch issues before users do, turning observability into a proactive shield, not just a postmortem tool.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Hello everyone. I'm Palak Bhawsar. I'm a DevOps engineer and a technical blogger. Today I'm so excited to talk about synthetic monitoring for modern systems.

So let's first understand why synthetic monitoring matters. What problem does it really solve? Imagine you wake up to a flood of emails, support tickets, frustrated users, and lost revenue. That's what happened to us once. We had a payment gateway API which crashed at 2 AM, and we found out only when 50 customers complained about it. So what if we could catch all those problems before they happen? That's what synthetic monitoring does, and I'll show you how.

So now, what is synthetic monitoring? Synthetic monitoring simulates users' interactions with your system. Think of it as a robot acting like your customers: clicking through your website, submitting forms, calling your APIs 24/7, even when you are asleep. And it runs all these tests every one minute, five minutes, or whatever time frame you choose, and from different places around the world.

Now, what is the problem with traditional monitoring? Traditional monitoring waits for something to break; it's a reactive process. On the other hand, synthetic monitoring is proactive. It doesn't wait for real users to find the problem. So that's the power of synthetic monitoring.

Now, why does synthetic monitoring matter? I've got three key reasons for that. First, in today's world, our systems are getting super complex. We have lots of microservices, serverless applications, and APIs. Second, users' expectations are higher than ever. Users want everything to be instant. A 2024 study said that just a two-second delay can make 20% of your customers leave your website. And for online stores, the data says that downtime can even cost you a hundred thousand dollars per hour, and that's a lot of money. Third, we have got our customers all around the world.
Your app might be used by customers in Canada, the USA, or India, so we want to make sure it works for everyone from everywhere.

Now, synthetic monitoring trends in 2025. As of 2024, the global synthetic monitoring market was valued at $1.42 billion, and it is expected to reach about $3.78 billion by 2033, with an annual growth rate of 11.5%. Now, what's driving this growth? Let's see these points. First, rapid cloud adoption. Companies are moving to the cloud because it's faster, easier, and can help save cost compared to data centers. Also, companies now want new solutions and better ways of monitoring, so they are adopting synthetic monitoring. Second, monitoring tools have integrated AI and ML, so they now find issues much faster and more accurately. And third, there's a big shift going on from reactive to proactive monitoring. Companies are aiming to identify and fix issues before real customers even find them.

So now let's see what types of synthetic monitoring we have. First, we have API monitors. They test your backend endpoints and make sure they are up and running. They test your APIs and make sure each one returns a successful 200 response and is not failing, so you can monitor your critical APIs using API monitors.

Second is browser-based monitoring. It simulates what a real user would do. It will click on the add-to-cart button, enter data, and make the payment. All those browser-related tests you can do using browser-based monitoring.

Now, scripted monitors. This type of monitor follows some fixed steps, like what a customer's user journey looks like. It will log into an account, add to cart, place the order, and go through that whole flow one by one, step by step. End-to-end flows like that, you can test with scripted monitors.

Now, ping monitors. Ping monitors are used to check the availability of your servers. Are they online? Is there any
network error? You can check that using ping monitors.

Now come SSL and TLS certificate monitors. This monitor checks the security certificate of your website. Whether the certificate has expired or is still valid, you can check that using an SSL and TLS certificate monitor.

Now, we have learned so many things about synthetic monitoring. Let's see how to implement it. I've got five steps for you. The first step is to identify critical user journeys, transactions, and APIs. Identify the problem and what your users want. For example, for healthcare applications, users want them to be always up and running. They want access to their prescriptions, and they want to book appointments with their doctors. This is a critical user journey, and you can't afford downtime in it. So make sure you test such scenarios.

Once you identify the critical user journeys, the next step is to create the scripts. Create the scripts for the end-to-end flow, what users would do, one by one.

The third step is to select the monitoring tools. There are several monitoring tools available in the market. See what works for your use case.

The fourth step is to schedule tests to run 24/7 from global locations. The flow that we have seen, you can test it every hour, and if it failed in the last hour, you can set up alerts so that your team can take action on it. So schedule tests, set up alerts, and monitor 24/7 from global locations. If your users are based in the USA, then test it from the USA. If you have got users globally, then test it from multiple locations.

The last step is to analyze the data regularly to optimize performance. Once you have data for the last 24 hours, you can see how many times your system failed and what the reasons were. Once you find the issue in your system, you can plan to fix it so it doesn't break next time.

Now, tools for synthetic monitoring.
There are many tools available for synthetic monitoring, such as New Relic, Dynatrace, and Datadog, but all these tools have some features in common that you should look for. First is multi-region testing, so you can test your APIs and systems from multiple locations around the world. Then observability integrations, so you can bring your other data into your observability stack, like your system's logs, traces, and metrics.

Then AI-driven insights. You can use AI to find patterns in your logs. If you have huge logs and some application is failing, and you cannot pinpoint the problem, then you can use AI to pinpoint it. I have used this feature once, for one of my Java applications. There was a NullPointerException and a huge code base, and I couldn't identify where it was failing. Using AI, it just gave me the exact method name where it was failing. So use AI. It also gives you some tips on how you can solve the problem and improve your system's performance.

Now, custom scripting support. All these tools have scripting support. You can write the scripts for your tests in Node.js, Python, and different programming languages.

Now, tips for writing synthetic tests. There are some best practices you should follow when you are writing scripts. First, test from multiple regions. We already talked about this: you should test from multiple regions if your users are based globally.

Then, set realistic thresholds. This is very important. For your non-critical applications, you can afford it if they fail one or two times in a day. But for critical ones like payment or banking, you don't want them to be down even once. So set thresholds realistically.

Now, focus alerts on critical issues. If you keep setting up alerts for everything, then you will be overwhelmed by the alerts and there is a chance you will miss the critical one. So always set up alerts only for critical issues.
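The threshold and alerting tips above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular monitoring tool works: the names `CheckResult`, `check_api`, `should_alert`, and `ALLOWED_FAILURES` are hypothetical, the monitor names are made up, and the daily failure budgets simply mirror the talk's example (zero tolerated failures for payments, up to two for a non-critical app).

```python
import time
from dataclasses import dataclass
from typing import Dict, List
from urllib.error import URLError
from urllib.request import urlopen


@dataclass
class CheckResult:
    ok: bool          # True only if the endpoint answered with HTTP 200
    latency_s: float  # wall-clock time the check took, in seconds


def check_api(url: str, timeout_s: float = 5.0) -> CheckResult:
    """Run one synthetic API check against a URL (hypothetical helper)."""
    start = time.monotonic()
    try:
        with urlopen(url, timeout=timeout_s) as response:
            ok = response.status == 200
    except (URLError, OSError):
        # HTTP errors, DNS failures, refused connections, and timeouts
        # all count as a failed check.
        ok = False
    return CheckResult(ok=ok, latency_s=time.monotonic() - start)


# Assumed failure budgets per monitor for one day of checks.
ALLOWED_FAILURES: Dict[str, int] = {"payments": 0, "blog": 2}


def should_alert(monitor: str, results: List[CheckResult]) -> bool:
    """Alert only when failures exceed the monitor's budget."""
    failures = sum(1 for result in results if not result.ok)
    return failures > ALLOWED_FAILURES[monitor]


# Sample results with one failed check out of three.
todays_results = [
    CheckResult(ok=True, latency_s=0.12),
    CheckResult(ok=False, latency_s=5.0),
    CheckResult(ok=True, latency_s=0.15),
]
print(should_alert("payments", todays_results))  # True: critical, zero tolerance
print(should_alert("blog", todays_results))      # False: within its budget
```

In practice a scheduler would call something like `check_api` every few minutes from each region and feed the collected results to the alert rule; the sample list here stands in for that.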
Now, validate end-to-end flows. It is not enough if only the add-to-cart feature is working or only order placing is working. You have to make sure the user is able to log in, add to cart, place the order, and complete the payment. So always validate the end-to-end user journey.

Okay, now key takeaways. The main problem that synthetic monitoring solves is this: it catches issues before your customers or users do. So have it implemented, and focus on the critical user journey. What is critical for you? As I said, for healthcare, it may be booking an appointment with a doctor or accessing a prescription. For banking, it may be transferring funds from one account to another.

And integrate with your observability stack. You may have logs from your AWS cloud, Kubernetes, or other sources, so you can integrate them with your observability stack.

And last but not least, leverage AI for predictive insights. AI is really helpful when you have got lots of logs, metrics, and traces. It can pinpoint the problem, and it can suggest tips to improve the performance of your system.

Thank you so much for your time. If you have got any questions, please don't hesitate to reach out to me on LinkedIn. Here is the QR code on the screen. Thank you so much.
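The end-to-end validation described in the talk (log in, add to cart, place the order, complete the payment) can be sketched as a small scripted monitor. This is only an illustration under stated assumptions: `run_journey` and the four step functions are hypothetical stand-ins that return fixed values, where a real monitor would drive a browser or call APIs.

```python
from typing import Callable, Optional, Sequence, Tuple

# A step is a name plus a callable that returns True when the step passes.
Step = Tuple[str, Callable[[], bool]]


def run_journey(steps: Sequence[Step]) -> Optional[str]:
    """Run steps in order and stop at the first failure.

    Returns the name of the failing step, or None if the whole journey passed.
    """
    for name, action in steps:
        try:
            passed = action()
        except Exception:
            # Any unexpected error in a step counts as a failure of that step.
            passed = False
        if not passed:
            return name
    return None


# Hypothetical stand-ins for real browser or API interactions.
def log_in() -> bool:
    return True


def add_to_cart() -> bool:
    return True


def place_order() -> bool:
    return True


def complete_payment() -> bool:
    return False  # simulate a payment gateway outage


checkout_journey = [
    ("log in", log_in),
    ("add to cart", add_to_cart),
    ("place order", place_order),
    ("complete payment", complete_payment),
]

failed_step = run_journey(checkout_journey)
print(failed_step or "journey passed")  # prints "complete payment"
```

Reporting the first failing step, rather than a single pass or fail flag, tells the on-call engineer where in the journey to start looking.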

Conf42 Observability 2025 - Online

June 05 2025 - premiere 5PM GMT

Palak Bhawsar

DevOps Engineer @ Telstra
