Transcript
Hey, everyone.
Today I'll be presenting on cloud native observability: enhancing monitoring and performance.
So when we think of cloud applications, the first things that come to mind are probably SaaS software and high-volume internet sites like YouTube and Netflix.
In order for these services to be delivered to customers and users efficiently, there are, constantly behind the scenes, a ton of engineers continuously monitoring them.
They're trying to identify potential bottlenecks and optimizations, and basically trying to see what the health of all these systems is, as the systems mostly run in data centers delivering these services.
So let's get into some details of what I'll be presenting today.
So we'll go over.
What is cloud observability?
Why is it important?
Some key observability tools and technologies.
What are the benefits?
Best practices?
We'll go through some real world use cases.
What are our current challenges?
And what lies ahead?
What is the future of cloud observability?
So, what is cloud native observability?
By definition, observability is the ability to measure the internal state of a system based on its outputs.
In cloud native environments, it involves logs, metrics, and traces.
The key components are metrics, which give insights into how a system is performing; logs, which are discrete events recording what the system is running or doing; and traces, which capture the overall flow through the system: how it fits into the overall topology, where requests are coming in, and where they are going.
So in a nutshell, it is a way to gather logs, metrics, and traces from a system and use them for monitoring purposes.
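To make the three signals concrete, here is a minimal, vendor-neutral sketch; the service name, log fields, and in-memory metric list are purely illustrative:

```python
# Illustrative only: a toy service emitting all three signals for one request.
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("checkout-service")   # hypothetical service name

request_latency_seconds = []  # metric: a numeric time series we could export

def handle_request(payload):
    trace_id = uuid.uuid4().hex  # trace: an ID correlating this request end to end
    start = time.monotonic()
    logger.info("request started", extra={"trace_id": trace_id})  # log: discrete event
    # ... business logic would run here ...
    elapsed = time.monotonic() - start
    request_latency_seconds.append(elapsed)  # metric: how the system is performing
    logger.info("request finished in %.4fs", elapsed, extra={"trace_id": trace_id})

handle_request({"item": "book"})
```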
So why is observability important for cloud native applications and services?
Modern services are mostly microservices.
As microservices keep evolving, they start getting complicated: lots of APIs get embedded into them, lots of CRUD operations.
There is a level of complexity that kicks into these overall architectures, with all the interdependencies across different microservices in their distributed architecture.
In order to get visibility into those kinds of architectures, observability is really important, so that we can pinpoint any performance issues at the level of a specific microservice or module.
The next one is real-time monitoring.
Observability is key to identifying potential issues in the system.
It helps us gain insights into how the system is actually performing and gives us a lot of visibility into the application, based on the metrics that have been configured for different kinds of applications.
Better visibility gives us faster issue-resolving capabilities.
So that's another reason why observability is so important.
Covering scalability and performance, observability ensures smooth auto scaling and resource allocation.
As we constantly monitor these services, servers, and applications, we have better insights and are better equipped to make decisions about capacity.
If the system is constantly running at a high load, it helps us make the decision to auto scale and beef up the system; in case of low utilization, we can always ramp down.
So it gives us a lot of visibility that eventually helps us design the system better on the scalability side of the architecture as well.
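As a rough illustration of that decision loop, here is a hedged sketch; the thresholds and replica bounds are made-up tuning knobs, and a real system would act on sustained averages pulled from its metrics backend:

```python
# Hypothetical scaling policy: scale out on sustained high load, scale in on low.
SCALE_OUT_THRESHOLD = 0.80   # sustained CPU above 80% -> add capacity
SCALE_IN_THRESHOLD = 0.30    # sustained CPU below 30% -> ramp down

def autoscale(current_replicas: int, avg_cpu: float,
              min_replicas: int = 2, max_replicas: int = 20) -> int:
    """Return the desired replica count for the observed average CPU."""
    if avg_cpu > SCALE_OUT_THRESHOLD and current_replicas < max_replicas:
        return current_replicas + 1
    if avg_cpu < SCALE_IN_THRESHOLD and current_replicas > min_replicas:
        return current_replicas - 1
    return current_replicas

print(autoscale(current_replicas=4, avg_cpu=0.92))  # -> 5
```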
The same goes for improved debugging.
With modern applications and services, there tend to be constant issues we try to identify: performance issues, bugs, customer issues, latency-related issues.
Improved monitoring helps engineers identify the issue, so it reduces the mean time to identify the issue as well as the mean time to eventually resolve it.
The turnaround from identifying an issue to delivering the fix back to the customer is much faster with cleaner, higher visibility into these services and applications.
Now, some of the key observability tools and technologies in the modern world.
The first is Prometheus.
It's primarily used to scrape metrics from different systems and applications, so it both collects metrics and helps monitor them.
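For example, with the official prometheus_client library, an application exposes an HTTP endpoint that Prometheus scrapes on its own schedule; the metric names and port below are illustrative choices:

```python
# Minimal sketch of exposing metrics for Prometheus to scrape.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled")
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://<host>:8000/metrics
    while True:
        with LATENCY.time():                        # records how long the block takes
            time.sleep(random.uniform(0.01, 0.2))   # stand-in for real work
        REQUESTS.inc()
```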
Grafana is another open source visualization tool that is really popular, and getting more popular with lots of third-party plugin integrations.
It primarily helps in creating visualizations based on the metrics stored in backends like Prometheus or InfluxDB.
It lets us query those metrics continuously and have alerting in place to notify the user in case there are anomalies or something exceeds a certain threshold.
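Under the hood, Grafana issues PromQL queries against Prometheus's HTTP API; this hedged sketch does the same thing directly and applies a simple threshold the way an alert rule would. The URL, query, and threshold are illustrative:

```python
# Query Prometheus directly and flag series above a threshold.
import requests

PROM_URL = "http://localhost:9090/api/v1/query"
QUERY = "rate(app_requests_total[5m])"   # requests per second over 5 minutes
THRESHOLD = 100.0

resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    _timestamp, value = series["value"]
    if float(value) > THRESHOLD:
        print(f"ALERT: {series['metric']} at {value} req/s exceeds {THRESHOLD}")
```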
One thing that's becoming popular is OpenTelemetry.
Earlier, pre-OpenTelemetry, there were multiple vendors in the market.
Everybody had their own APIs and their own agents that were constantly collecting metrics from applications or systems, building a vendor-specific schema, and sending the data to their own backends.
So once an organization started using a specific vendor, they were pretty much married to that vendor; migrating to a different vendor was extremely hard.
There was very limited vendor portability.
So, as an industry, everybody came up with standards for how we collect these metrics and, at the same time, how we report them back to the backends, so the backend has a defined schema.
That is what OpenTelemetry provides.
It also provides standardization for tracing across different applications, or different tiers of an application, to stitch together how a request flows from one tier to another or from one system to another, giving us a lot more insight into the overall end-to-end picture.
Instrumentation is the process of instrumenting the applications via agents, and OpenTelemetry is really popular for it.
Jaeger, again, is very popular with OpenTelemetry.
It is one of the backends where, when you use the agents, you can export the data back onto the Jaeger platform.
Once different applications report into the Jaeger exporter, Jaeger has a really smart way to stitch all those spans together and present an overall end-to-end picture of how a request is actually flowing within the system, building out the dependencies in the system.
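As a sketch of that flow, here is how an application might be instrumented with the OpenTelemetry Python SDK and export spans over OTLP, which recent Jaeger versions accept directly; the endpoint, service name, and span names are illustrative assumptions:

```python
# Hedged sketch: instrument two nested operations and export spans via OTLP.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "checkout-service"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("handle-order"):        # parent span: one request
    with tracer.start_as_current_span("charge-payment"):  # child span: one tier/step
        pass  # a downstream call would happen here; Jaeger stitches the spans
```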
Next is the Elastic Stack.
It's mostly about log aggregation and analytics: it helps collect the logs and provides analytics on top of them.
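In production, shippers like Filebeat or Logstash usually forward the logs, but indexing a structured log document directly shows the idea; the index name and fields in this sketch are illustrative:

```python
# Hedged sketch: index one structured log document into Elasticsearch.
from datetime import datetime, timezone

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.index(
    index="app-logs",
    document={
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "level": "ERROR",
        "service": "checkout-service",
        "message": "payment gateway timed out",
        "trace_id": "abc123",   # lets us pivot from logs to the matching trace
    },
)
```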
Regarding unified monitoring, centralized visibility across cloud native environments is very important, as are proactive issue detection and adaptability to dynamic workloads, especially on the scalability front.
Also, when we say cloud native, it has to be very DevOps friendly, because modern cloud native software is delivered iteratively, every two or three weeks, maybe once a quarter.
So there should be some kind of continuous integration, continuous deployment model.
So the best practices there are: use an open standard, which is OpenTelemetry; use Grafana dashboards for visualizations; and have automation in place to find anomalies based on the trend observed in the metrics.
Any spikes or anomalies should be quickly detected and reported back, to potentially find any bottlenecks or performance issues.
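A hedged sketch of that kind of automation: compare each new sample against the recent trend and flag sudden spikes. The window size and the 3x multiplier are illustrative tuning knobs:

```python
# Flag a sample that jumps well above the recent rolling average.
from collections import deque

WINDOW = deque(maxlen=60)  # last 60 samples of the metric

def check_sample(value: float) -> bool:
    """Return True when `value` spikes well above the recent average."""
    is_spike = bool(WINDOW) and value > 3 * (sum(WINDOW) / len(WINDOW))
    WINDOW.append(value)
    return is_spike

for v in [10, 11, 9, 12, 10, 95]:
    if check_sample(v):
        print(f"spike detected: {v}")   # -> fires for 95
```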
Now, some of the real-world use cases.
As I mentioned, Netflix uses OpenTelemetry for distributed tracing across microservices.
All the microservice interdependencies are measured using OpenTelemetry to identify, across the handling of an overall request, at which specific microservice there is a potential issue.
Similarly, Uber implements Jaeger for end-to-end transaction monitoring.
Airbnb uses Prometheus and Grafana for real-time performance monitoring.
Challenges.
These systems keep getting bigger and bigger, and the number of available metrics keeps increasing year over year.
So there are huge volumes of telemetry data being exported, and the cost goes up.
On top of that, different cloud environments have different ways of presenting the data and different kinds of metrics.
So there is a standard on the collection side, but there is no standard on the visualization front, and that remains a challenge.
On future trends: AI-driven anomaly detection is a big thing right now.
Most of the observability vendors are primarily focused on how to leverage AI, using it as a pass over every metric, or having some kind of machine learning models constantly ingest the data, analyze it, and then report any anomalies.
So it is exciting to see how that's going to shape up.
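At its simplest, the statistical flavor of this means modeling a metric's recent distribution and flagging points far outside it; real vendor systems use far richer models, and the classic three-sigma rule below is just a baseline sketch:

```python
# Flag points more than `sigmas` standard deviations from the rolling mean.
import statistics

def anomalies(samples: list[float], window: int = 30, sigmas: float = 3.0):
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev and abs(samples[i] - mean) > sigmas * stdev:
            yield i, samples[i]

series = [100 + (i % 5) for i in range(60)] + [180]
print(list(anomalies(series)))  # flags the final jump to 180
```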
Similarly, expect more widespread adoption of eBPF for deep observability, as well as increased integration of observability with security tools.
When it comes to security services, not much is observable today, or there are a lot of restrictions.
So how this turns out in the future, especially for security-related tools, is going to be something to watch.
So that's pretty much it.
Thank you so much.