Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and welcome.
My name is Ani and I'm the founding member at Cloud.
First of all, I would like to thank Mark and the team for providing
me with the opportunity to talk about observability in S 2025.
As the name of the topic suggests: whether to observe or not.
I don't think that's a question.
Everybody is deep into observability.
The real question is whether it's giving me the output that I'm expecting, whether
I'm running my organization's systems correctly and looking at the data correctly.
So when we deep dive into the word observability, we need to look at not
just what it is, but how it plays into different contexts across my organization.
Whether I work in security, in compliance, or somewhere else,
observability has something for me. But what is it? So before we deep dive
into observability and its different contexts, let's start with the basics.
Observability is the ability to understand what's happening inside your
system just by looking at the outputs.
The outputs in this context are called signals, such as logs.
It's about answering the question of why something is happening: not
just what went wrong in my application, but what do I do about it.
So when we look at it, observability has a very broad spectrum of different contexts
within an organization, so let's deep dive into the different contexts of observability.
First and foremost is the technical context, which revolves around the
core data and the tools that send the data.
This is the heart of observability.
This is where you collect all the telemetry data, like metrics, logs,
and traces, and you define how to use that data and how to track
what is sending the data.
For example, when you look at the technical data for a recently
deployed application, you'll look at the spikes.
You might want to check request duration metrics.
You might want to check logs, and you might want to look at specific traces
for the services that are failing.
So this is the core data, and it is critical for observability.
This defines the basis on which the systems are built.
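
As a rough illustration of the post-deployment check described above, here is a minimal Python sketch that compares a request duration percentile before and after a deployment. The sample durations, the nearest-rank percentile method, and the 1.2x tolerance are all assumptions made for this example, not something stated in the talk.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_regressed(before_ms, after_ms, pct=95, tolerance=1.2):
    """True if the chosen percentile grew by more than `tolerance` times."""
    return percentile(after_ms, pct) > tolerance * percentile(before_ms, pct)


# Hypothetical request durations (ms) sampled before and after a deployment.
before = [100, 110, 120, 105, 115, 108, 112, 118, 102, 125]
after = [100, 110, 400, 105, 450, 108, 112, 118, 102, 500]

print(percentile(before, 95), percentile(after, 95))
print(latency_regressed(before, after))
```

In practice the same comparison would usually run as a query in the metrics backend; the point here is only that a duration metric gives you a concrete number to compare across a deployment.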
But beyond this technical context, there are specific use cases.
Not all users want to use the depth of all the data that you're sending in, but they
want to look at dashboards and alerts.
One of those teams is your operations team, and they have a very defined
operational context for your system.
They live and breathe on the data that is sent by these observability tools,
and they look at system reliability, incident response, and objectives.
They want to define different SLAs, SLOs, SLIs, and error budgets for the
application, and track those for the reliability of the system.
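
To make the error budget idea concrete, here is a small Python sketch under stated assumptions: the SLI is a simple success ratio over requests, and the SLO target and request counts are made-up numbers. The function name `error_budget` and its return fields are hypothetical, chosen only for this example.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Summarize how much of the error budget has been spent.

    slo_target is a fraction strictly between 0 and 1, e.g. 0.999.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    allowed_failures = (1 - slo_target) * total_requests
    return {
        "sli": 1 - failed_requests / total_requests,
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "budget_consumed": failed_requests / allowed_failures,
    }


# Hypothetical month: 1,000,000 requests, 250 failures, 99.9% SLO.
report = error_budget(0.999, 1_000_000, 250)
print(report)
```

With these assumed numbers roughly a quarter of the budget is spent; an operations team would typically alert on how fast that fraction is growing rather than on single failures.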
For instance, let's say you have an operational dashboard that shows spikes
or different errors at different service levels. The ops teams want
to understand how the system is degrading based on these errors.
They not only want to detect these errors, but also want to understand the
impact on the uptime of the system.
This directly goes hand in hand with developer experience.
The major outcome of an observability system is the feedback loop that operations
teams can provide to development.
So the development context is where the developers understand how things go
from writing and shipping code to running it in production: how different
tests are performed for my application, how my CI pipeline is working,
how a feature flag performs in different stages,
and how different services perform.
This helps users to define the production performance of a certain
application and its features, and tune them.
It also helps to understand edge cases for the different feature flags that
have been created, so the development context and DevOps go hand in hand.
But it doesn't stop here.
A user-facing application is of no use with a hundred percent working
backend system and no user experience that works for the customers.
So from a business perspective, a hundred percent backend on its own has no use.
So they define different observability requirements
for user-facing applications.
And that basically goes into the UX context: how the user is
experiencing the product, how you monitor how your frontend behaves, and
how your user flows are working.
They use tools like real user monitoring and synthetic checks to help answer
different questions about user experience.
Are my customers abandoning carts because of frontend timeouts? Does my site
load in five seconds instead of two?
How is my bounce rate, and what are the red flags that affect different user flows?
Frontend monitoring helps improve the user's interactions with the system.
Business users are concerned about observability and its outcomes.
If the technical details that you're getting are not helping the business context, then
this is a huge drawback to your system.
The system needs to bridge the gap between technical health and the business impact.
You need to be able to correlate that impact with 5xx errors in the checkout service
or in some backend service that is not able to work with the database.
So if you can determine what is causing the revenue loss because of
the application issues, then the business can get very good insight into the application.
Piggybacking on that are security and compliance for your application health.
When you talk about security, the security users of the system
need to ingest the logs and traces to detect anomalies
and investigate any breaches of the application in use.
For example, a spike in login attempts from a single IP
can be spotted in the observability data.
It can also help you look at the different traces for applications
or for different services within the application and find compromised users.
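
The login-spike example can be sketched in a few lines of Python. This is only an illustration under assumptions: the input is a list of `(timestamp_seconds, ip)` login attempts already parsed from logs, attempts are counted in fixed 60-second buckets, and the threshold of 5 is arbitrary. The IP addresses come from the ranges reserved for documentation.

```python
from collections import Counter


def ips_with_login_spike(attempts, window_seconds=60, threshold=5):
    """Return IPs that made at least `threshold` attempts in one fixed window."""
    counts = Counter((ip, ts // window_seconds) for ts, ip in attempts)
    return sorted({ip for (ip, _bucket), n in counts.items() if n >= threshold})


# Hypothetical parsed login attempts: (timestamp in seconds, source IP).
attempts = [(t, "203.0.113.7") for t in range(0, 12, 2)]  # 6 attempts in 12 s
attempts += [(0, "198.51.100.4"), (90, "198.51.100.4"), (200, "198.51.100.4")]

print(ips_with_login_spike(attempts))
```

Fixed buckets can miss a burst that straddles a window boundary; a sliding window would be the stricter choice, at the cost of a little more code.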
So these security incidents are not just application downtime.
They cost the business and carry different penalties in different regions.
So to avoid this, different teams use different compliance methods.
Think of GDPR, HIPAA, and SOC 2, which you eventually hear about in compliance.
The system also helps to maintain audit data and access logs, and to track configuration
changes at all times, which are essential for these compliance and audit requirements.
It also helps to define policies like longer retention and authenticated
access to the sensitive data.
Enhancing observability in different contexts helps users look at
the data in different contexts.
But what is the use of these contexts
when you want to utilize the data?
Correlation is the key. Application teams might be generating and
utilizing infrastructure metrics that send details
about how the software is performing.
It's helpful to monitor this performance and resource usage
for the applications.
But for businesses, the data needs to be domain specific.
What is my user experience, and does it help me achieve the business goals?
So correlation of these metrics helps
the business leaders and other users make use of the data.
If you track latency against the number of active users, or if you track
throughput against signups or checkouts, then this is more
helpful to the users than just looking at the different dashboards.
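
One simple way to put a number on this kind of relationship is a correlation coefficient. The sketch below computes a Pearson correlation between latency and checkouts in plain Python. The per-interval samples are invented for illustration, and Pearson correlation is just one assumed choice of measure, not something the talk prescribes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples, one per time interval.
latency_ms = [120, 130, 250, 400, 380, 140]
checkouts = [50, 48, 30, 12, 15, 47]

r = pearson(latency_ms, checkouts)
print(round(r, 3))
```

A strongly negative value on data like this would suggest that checkouts fall as latency rises, which is the kind of statement a business user can act on. Correlation alone does not prove the latency is the cause.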
The key difference between application metrics and business metrics is that business
metrics need to go beyond system performance to define the business performance.
They need to go beyond latency, CPU, and error rates to define things like conversion
and churn. They need to go from what is affecting the
system uptime to what is affecting the revenue and product decisions.
So correlating the application and business metrics is the key.
Apart from isolating the technical issues, you need to be able to measure satisfaction.
So if I'm getting a drop in checkouts, I need to be able to find out which
error is causing this, and that needs to be defined with
different tools that businesses can use.
So if we want to look at different business outcomes, we need
to have a visual correlation between the metrics that we're ingesting,
in a consistent way,
and give fine-grained access to the issues that you're getting in the application.
For example, I'm able to track which users are getting bad
checkout experiences, which region is having the most latency, which
region has different plans for mobile and, let's say, browser users,
which users are getting a given feature,
which product IDs are getting lower checkouts, and whatnot.
So this is a very granular filter that businesses can look at.
And essentially this goes into providing alerts for specific things. When we monitor
not only the infrastructure and the application,
but the business and the application together, we get composite alerts.
For example, if an error rate spike is causing a drop in conversions,
or is causing issues in purchase checkouts, or is causing issues in
user signups, this is the alert that businesses want to look at,
rather than just "okay, I got a 500" or "I got a 50% error rate in the
application for the last five minutes."
The SLO for that should be a mix of system health and user experience.
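
A composite alert of this kind can be sketched as a small rule that looks at a system signal and a business signal together. The thresholds, the severity names, and the function name `composite_alert` below are all assumptions for illustration; a real setup would express the same rule in the alerting tool's own rule language.

```python
def composite_alert(error_rate, conversion, baseline_conversion,
                    max_error_rate=0.05, max_conversion_drop=0.20):
    """Combine a system signal and a business signal into one severity.

    error_rate and the conversion values are fractions between 0 and 1.
    """
    if baseline_conversion <= 0:
        raise ValueError("baseline_conversion must be positive")

    error_spike = error_rate > max_error_rate
    drop = (baseline_conversion - conversion) / baseline_conversion
    conversion_dropped = drop > max_conversion_drop

    if error_spike and conversion_dropped:
        return "critical"  # errors are visibly hurting the business
    if error_spike or conversion_dropped:
        return "warning"   # only one of the two signals is off
    return "ok"


# Hypothetical readings for the last five minutes.
print(composite_alert(error_rate=0.50, conversion=0.01, baseline_conversion=0.04))
print(composite_alert(error_rate=0.50, conversion=0.04, baseline_conversion=0.04))
print(composite_alert(error_rate=0.01, conversion=0.04, baseline_conversion=0.04))
```

The design choice is that an error spike alone is only a warning; it becomes critical when the business metric moves with it.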
So going beyond that, when we are using an observability system, it needs to
have unified system insight.
That's where correlation helps.
That's where you can correlate business metrics to application metrics.
That's where, when you look at certain logs for a 500, you can look at the time
series for it and, at the same time, see the events that are causing it.
A system needs to have a unified view of the different data that it is ingesting
and provide you with the capability to use that view to better utilize the data.
Application metrics are primarily technical.
They talk about things like latency, error rate, usage,
database query time, and whatnot.
But business metrics are more product driven.
Am I able to retain my daily active users? What is the
adoption of my new features?
What is my revenue per transaction?
What is my TCO for the total infrastructure that I'm investing in?
These are different things that different roles in the system are looking at.
And they're critical for everyone.
So business metrics will tell you where the business is, based on the details that
you provide with the application metrics.
Correlating the two is essential, and it turns observability from a mere backend
system into a full-stack superpower.
Let me give you a very real example:
standard payment processing.
The payment processing works across multiple regions, but I'm getting
5xx errors, which is affecting my overall SLO.
When I look at this, I see that it can be in different places. But if I
have custom labels that give metrics about successful signups or purchases
in detailed granularity, with user type, region, plan, or feature flags,
I can aggregate based on those and get the details.
So if a user can see that, for this region and for this device type, users are
affected, they can easily change that.
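
To illustrate aggregating by custom labels, here is a minimal Python sketch that computes the 5xx rate per label combination. The request records, the label names `region` and `device`, and the helper name `error_rate_by` are hypothetical; real data would come from the metrics or tracing backend rather than an in-memory list.

```python
from collections import defaultdict


def error_rate_by(requests, *labels):
    """Share of 5xx responses for each combination of the given labels."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for request in requests:
        key = tuple(request[label] for label in labels)
        totals[key] += 1
        if 500 <= request["status"] <= 599:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Hypothetical payment requests carrying custom labels.
requests = [
    {"region": "eu-west", "device": "mobile", "status": 503},
    {"region": "eu-west", "device": "mobile", "status": 500},
    {"region": "eu-west", "device": "mobile", "status": 200},
    {"region": "eu-west", "device": "browser", "status": 200},
    {"region": "us-east", "device": "mobile", "status": 200},
    {"region": "us-east", "device": "browser", "status": 200},
]

rates = error_rate_by(requests, "region", "device")
worst = max(rates, key=rates.get)
print(worst, rates[worst])
```

Because the labels are passed as arguments, the same function answers "which region?" and "which region and device type?" without changing the data.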
So correlating that is key.
But at the same time, the observability system needs to have the
capability to correlate that.
An observability system with correlation can give you more
output than one without it.
So at the end of it, the system has a defined cycle.
It'll gather application and business metrics based on
the tools that you're using.
They can be at different granularities, but when you ingest the metrics, the key is how
the system correlates them, creates relationships, and visualizes the data in a
way that any user of the system, based on their role, can define different data sets,
define different dashboards and alerts,
and generate insights from the ingested data.
So to sum it up, observability has many contexts:
technical, operational, user experience, business, security, and compliance.
They may all be necessary, or some might not be, for different enterprises.
Correlating application and business metrics
brings observability to life.
It turns the raw data into real insight.
Thank you for listening in.
I'm happy to answer any questions offline or deep dive into specific things.
Thank you very much.