OpenTelemetry or OpenTeleMessy? Solving Observability Problems While Creating New Ones
Abstract
OpenTelemetry promises unified observability, but often delivers chaos. This talk explores real-world lessons in taming OTel’s complexity—highlighting hidden pitfalls, what works, what doesn’t, and how to avoid turning your telemetry into a Telemessy nightmare.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone.
I'm Marmo.
I'm currently working as a senior software engineer at Nvidia, and
I'm very excited about this talk.
And the topic is OpenTelemetry or OpenTeleMessy: solving observability problems while creating new ones.
Let's get started.
The agenda for this talk: I will start with a basic introduction to OpenTelemetry, and later we'll go into its promise and the kinds of issues it solves.
Then we'll go over the reality of OpenTelemetry and the new problems it creates, and we'll also discuss whether OpenTelemetry makes sense and the strategies for taming some of the issues it opens up.
And finally, the future outlook and the conclusion.
So first, the introduction. OpenTelemetry is a collection of APIs and SDKs that help users instrument, generate, collect, and export telemetry data.
It covers all the basic observability signals: metrics, logs, and traces.
If you look at the picture, on the left-hand side OpenTelemetry provides the different APIs and SDKs to instrument your application.
In the middle, you have the OpenTelemetry Collector. The Collector is a very powerful aggregator that can collect your telemetry and send it to different observability backends.
For example, metrics can be sent to third-party vendors like Grafana Cloud or Datadog. Similarly, logs can be sent to Elasticsearch or Splunk, and traces can be sent to various other vendors.
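To make that concrete, here is a minimal sketch of a Collector configuration that wires receivers, processors, and exporters into pipelines. The `otlp` receiver, `batch` processor, and `otlphttp` exporter are standard core components, but the endpoint is a placeholder you would replace with your own backend:

```yaml
receivers:
  otlp:                      # accept OTLP data from instrumented apps
    protocols:
      grpc:
      http:

processors:
  batch: {}                  # batch telemetry before export

exporters:
  otlphttp:                  # example backend; endpoint is a placeholder
    endpoint: https://otel.example.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

Swapping backends means changing only the exporter section; the instrumented applications don't change at all.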
Now, the promise, and the core problems OpenTelemetry solves.
The first and most important is vendor lock-in elimination. In the pre-OpenTelemetry world, every third-party vendor had different proprietary agents, SDKs, and APIs; OpenTelemetry solves this problem.
You can instrument your application once, and you can send the data to whatever third-party vendor or whatever backend you want to send it to.
The other problem it solved is unified standards. It unified different open standards (it combined OpenTracing and OpenCensus), and it also standardized telemetry collection across different languages.
The other problem it solved is complete signal coverage. As I mentioned before, it supports instrumenting and collecting metrics, traces, and logs. It also solves the problem of correlating these signals so you can analyze them more deeply.
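The correlation mechanism can be sketched in plain Python (this is an illustration of the idea, not the OpenTelemetry API): a trace ID is carried in context, and every log record emitted inside the span is stamped with it, so a backend can join logs to traces.

```python
import contextvars
import uuid

# Context variable holding the current trace ID. OpenTelemetry's real
# context propagation is far more elaborate; this is a minimal sketch.
current_trace_id = contextvars.ContextVar("trace_id", default=None)

def start_span():
    """Start a 'span' by generating a trace ID and storing it in context."""
    trace_id = uuid.uuid4().hex
    current_trace_id.set(trace_id)
    return trace_id

def log(message):
    """Emit a log record stamped with the active trace ID, if any."""
    return {"message": message, "trace_id": current_trace_id.get()}

trace_id = start_span()
record = log("payment processed")
# The log record carries the same trace ID as the span, so the two
# signals can be joined in the observability backend.
assert record["trace_id"] == trace_id
```

In real deployments this stamping is what lets you jump from a slow trace straight to the logs that were written while it ran.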
The other problem it solved is separation of concerns: it decoupled telemetry collection from analysis. As I mentioned, the OpenTelemetry Collector is a very powerful aggregator, and you can choose any compatible backend you want.
And the other important problem it solved is auto-instrumentation. Before OpenTelemetry, if you wanted to instrument your code to collect the different signals, you needed to write a lot of boilerplate and custom code. OpenTelemetry, in many cases, provides auto-instrumentation capabilities: instead of writing any code, you can just start exporting your telemetry.
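As an illustration, the Python auto-instrumentation agent works roughly like this (assuming the `opentelemetry-distro` package is installed; the service name and endpoint are placeholders). Configuration comes from spec-defined `OTEL_*` environment variables rather than from code:

```shell
# Spec-defined OTel environment variables (names from the OTel spec)
export OTEL_SERVICE_NAME=checkout-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# Run the app under the auto-instrumentation wrapper -- no code changes
opentelemetry-instrument python app.py
```

The wrapper patches supported libraries at startup, so HTTP clients, frameworks, and database drivers start emitting telemetry without any edits to the application.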
It also has very good community adoption: it's the second most active CNCF project, and it has a very active roadmap. But given all the advantages and problems it solved, it has also contributed to some new problems.
So in reality, users and community adopters have been facing different problems.
The first one is the technology maze. If you look at the OpenTelemetry documentation when you want to get started, you'll be overwhelmed by the different technologies, like the SDKs and APIs. It has too many layers of indirection and abstraction, and architecturally it's quite complex. Each component has its own unique configuration, and there are a lot of components, exporters, and processors. It's not very easy to get started.
The other problem is language implementation inconsistencies. One of the problems OpenTelemetry solves is that if your dev environment is polyglot, it unifies how you collect and instrument your telemetry. But in reality, when you are implementing OpenTelemetry in your environment, each language has its own tips and tricks, and there are implementation inconsistencies.
The other problem is the maturity spectrum. Different components are at different maturity levels. For example, among the broader signals, tracing is the most mature, then come metrics, and then logging; logging is a very recent entrant and was stabilized only recently. Getting started is also cited as a top barrier to adoption. And it's very difficult to debug when something goes wrong, because problems can arise across instrumentation, collection, and export.
The other issue is Collector configuration. You'll see very big YAML configurations and mismatched component names, sometimes you need to configure your own authentication, and you'll run into lots of connectivity issues. And if you want to get something resolved with the community and try to file an issue, you'll be bounced between different repositories.
So given the problems it solves and the kinds of implementation challenges we have seen, when does OpenTelemetry make sense?
It makes sense when your organization has very large and diverse technology stacks and you want to standardize across these environments. It helps when the value exceeds the implementation complexity, so you need to make a judgment call and look at the trade-offs there.
Also, if your organization is looking for strongly vendor-neutral telemetry collection and export, then OpenTelemetry makes sense: it's a long-term investment, and you will have a lot of flexibility in observability providers.
It also makes a lot of sense in cloud-native environments: it is ideal for distributed systems and for monitoring containerized applications, for example on Kubernetes.
It also makes sense where you have data flexibility requirements: where you need good control over your data collection and routing, need filtering capabilities to reduce noise and costs, and want to add custom tags for searching within your organization.
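As one hedged example of that filtering, the Collector's `filter` processor can drop noisy spans before they reach (and cost you at) the backend; the route value here is a placeholder for whatever your health checks use:

```yaml
processors:
  filter/drop-healthchecks:
    error_mode: ignore
    traces:
      span:
        # Drop health-check spans to reduce noise and export costs
        - attributes["http.route"] == "/healthz"
```

Added to a traces pipeline, this keeps load-balancer probe traffic out of your trace backend entirely.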
So how do you strategize and tame the OpenTeleMessy issues?
You can start with the auto-instrumentation options: begin with language-specific auto-instrumentation. For example, EKS, AWS's managed Kubernetes service, provides OpenTelemetry collectors that you can use by default.
You can also start with the more mature signals. As I mentioned before, tracing is the most mature OpenTelemetry signal, so you can start with that and later expand to metrics and logs. Also try to follow the latest best practices: leverage semantic conventions, and optimize the attributes and labels you're using.
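The attribute-optimization advice can be sketched as a small helper (plain Python, no OpenTelemetry dependency): keep a curated allow-list of low-cardinality attributes and drop everything else before it inflates your metric cardinality. The attribute names follow OTel semantic conventions, but the allow-list itself is an illustrative assumption.

```python
# Allow-list of low-cardinality attributes worth keeping on metrics.
# Names follow OTel semantic conventions; the selection is illustrative.
ALLOWED_ATTRIBUTES = {
    "http.request.method",
    "http.response.status_code",
    "service.name",
}

def trim_attributes(attributes):
    """Drop attributes not on the allow-list (e.g. high-cardinality IDs)."""
    return {k: v for k, v in attributes.items() if k in ALLOWED_ATTRIBUTES}

raw = {
    "http.request.method": "GET",
    "http.response.status_code": 200,
    "user.id": "4821",            # high-cardinality: dropped
    "session.token": "abc123",    # high-cardinality: dropped
}
trimmed = trim_attributes(raw)
assert trimmed == {"http.request.method": "GET",
                   "http.response.status_code": 200}
```

In production you would usually do this trimming in the Collector (e.g. with an attributes or transform processor) rather than in application code, but the principle is the same.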
And to make troubleshooting faster, you can also use local exporters.
It also makes sense to adopt incrementally: instead of going for one big-bang adoption, you can start small and iterate. You can focus on critical user flows at first and expand gradually as your expertise grows.
You can also depend on community resources. Instead of depending only on the documentation, you can look into community resources like blogs and different talks.
Vendor distributions can also ease adoption. For example, Grafana Cloud recently came up with their own OpenTelemetry distribution called Grafana Alloy, which makes implementation and adoption a little easier.
Also, there is a lot of active community support and there are many extensions, so you can make use of those to overcome some of the challenges and some of the mess that OpenTelemetry creates.
For the future outlook and beyond, there are lots of promising developments.
One is that the semantic conventions are reaching stability; this is a very promising development. The OpenTelemetry Collector is also approaching version one. And besides traces, logs, and metrics, profiling has been becoming very popular; OpenTelemetry is adopting that signal and planning to support it. GenAI observability is also being integrated into the OpenTelemetry ecosystem.
That is also promising, though there are ongoing challenges. There are still many, like documentation problems, complexity management, and standardizing different use cases. Given all this, OpenTelemetry is still the number-one solution for your observability needs if you're looking for a strongly vendor-neutral, comprehensive solution.
Thank you.
Thank you all for your attention.