Transcript
This transcript was autogenerated.
Hello everyone.
I'm working as a network security engineer at SoFi.
Today I'll be talking about a critical but often overlooked
security challenge: threat modeling for observability systems.
While we have become experts at securing our applications and infrastructure,
monitoring and observability tools often operate in a security blind spot.
This presentation will show you how to identify, prioritize, and
mitigate security risks that could turn your observability systems from
security assets into attack vectors.
So, understanding the observability blind spot: let's start with
the uncomfortable truth.
Adversaries are strategically targeting the gaps in our observability coverage.
So think about it from an attacker's perspective.
If you can't see it, you can't defend it.
The first major issue is that threat actors methodically exploit
areas where observability coverage is insufficient or non-existent.
They're not randomly probing.
They're specifically looking for the dark corners of your infrastructure where their
activities won't be logged or monitored.
Second, your observability data itself becomes intelligence for
attackers. Those publicly accessible metrics, logs, and traces that help you
troubleshoot inadvertently reveal your system architecture, service
dependencies, and potential entry points.
It's like leaving a blueprint of your house on the front porch.
Finally, this requires continuous reassessment.
Even if you have robust observability today, hidden vulnerabilities
evolve with your system.
That dashboard you deployed six months ago might now be exposing sensitive data
you didn't anticipate.
Now, the traditional observability triad.
So before we dive into the security threats, let's establish our foundation.
The traditional observability triad consists of three pillars.
Metrics provide numerical representations of system behavior over time.
These include performance indicators, resource utilization, and error rates.
They're great for understanding trends and setting alerts.
Logs are timestamped records of discrete events within your system.
They capture error messages, access attempts, and state changes.
Logs are your detailed forensic trail.
Traces show end-to-end request flows across distributed systems.
They reveal service dependencies, latency bottlenecks, and how errors
propagate through your architecture.
Here's the security challenge.
Each of these pillars can become an attack vector if not properly secured.
Your metrics can leak architectural information, your logs might contain
sensitive data, and your traces can reveal business logic and data flows.
So, coming to the security risks in observability pipelines,
let's examine where the vulnerabilities typically emerge.
I want you to think of your observability
infrastructure as a three-layer cake.
Each layer represents its own security challenge.
At the collection points, unsecured agents and collectors expose entry points into your infrastructure.
We see outdated agent software running with excessive privileges
and transmitting data over unencrypted channels.
These agents often have deep access to your systems to collect telemetry,
making them attractive targets.
The transport layer is where data in transit becomes vulnerable to
interception. Missing TLS encryption, poor certificate management, and weak
cipher suites create opportunities for man-in-the-middle attacks.
Remember, this data is flowing continuously across your network.
Finally, storage systems create centralized, high-value targets.
Insufficient access controls mean that compromising your monitoring
system could expose data from across your entire infrastructure, and
unpatched vulnerabilities in time series databases and improper data retention
policies compound these risks.
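To make that transport-layer point concrete, here is a minimal Python sketch of an agent pushing telemetry to a collector over mutual TLS. The collector URL, certificate paths, and payload shape are illustrative assumptions, not a reference to any particular product:

```python
import json
import ssl
import urllib.request

# Hypothetical collector endpoint and certificate paths -- adjust for your environment.
COLLECTOR_URL = "https://collector.internal.example:4318/v1/metrics"
CA_BUNDLE = "/etc/telemetry/ca.pem"
CLIENT_CERT = "/etc/telemetry/agent.crt"
CLIENT_KEY = "/etc/telemetry/agent.key"

def build_tls_context() -> ssl.SSLContext:
    # Verify the collector's certificate against a pinned internal CA and
    # present a client certificate (mutual TLS) so the collector can
    # authenticate the agent as well.
    ctx = ssl.create_default_context(cafile=CA_BUNDLE)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # reject weak protocol versions
    ctx.load_cert_chain(certfile=CLIENT_CERT, keyfile=CLIENT_KEY)
    return ctx

def push_metrics(payload: dict) -> None:
    req = urllib.request.Request(
        COLLECTOR_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Telemetry never leaves the host unencrypted: the request fails if the
    # TLS handshake or certificate validation fails.
    with urllib.request.urlopen(req, context=build_tls_context(), timeout=5) as resp:
        resp.read()

if __name__ == "__main__":
    push_metrics({"metric": "cpu_usage", "value": 0.42, "host": "web-01"})
```

The same idea applies whatever agent or exporter you actually run: verify the collector's certificate, present a client identity, and refuse to fall back to plaintext.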
Now moving to threat actor motivations. Understanding why attackers
target observability systems helps us defend more effectively.
Let me walk you through four primary motivations.
Reconnaissance is often their first goal.
Attackers leverage exposed metrics to map your infrastructure.
They can identify server locations, software versions, and potential entry
points through misconfigured dashboards.
Your Grafana dashboard might be showing them exactly what they
need to plan their attack.
Data exfiltration becomes possible when sensitive information
gets embedded in logs.
PII, secrets, and access tokens get accidentally logged,
creating both compliance violations and security breaches.
I've seen API keys, database credentials, and confidential data flowing through
log streams in plain text.
Alert fatigue is a particularly insidious tactic.
Threat actors deliberately trigger false positives to overwhelm your alert systems.
When your team is dealing with hundreds of false alarms,
they'll miss the one real security incident happening simultaneously.
Finally, living-off-the-land attacks
use your legitimate observability tools as attack infrastructure.
Monitoring systems with elevated privileges become perfect
vehicles for lateral movement.
The attacker doesn't need to bring their own tools.
They use yours.
Now, mapping the attack surface. Effective threat
modeling requires systematic attack surface mapping.
This isn't a one-time exercise.
It's an ongoing process that evolves with your infrastructure.
Start by identifying assets.
Document all observability components in your architecture.
Map the data flows between collectors, processors, and visualization tools.
Include third-party services, cloud monitoring tools, and custom dashboards.
Next, define the trust boundaries.
Establish where data crosses security domains, and
determine which components have privileged access to sensitive systems.
Your Prometheus server might need access to multiple environments.
That's a trust boundary worth scrutinizing.
Then enumerate entry points: catalog all the interfaces exposed
by your monitoring tools.
Consider API endpoints, dashboards, agent communication
channels, and webhook receivers.
Each one is a potential attack vector.
Finally, prioritize threats based on potential impact and likelihood,
and focus your remediation efforts on critical observability components
first. A compromised central logging system has different
implications than a compromised edge monitoring agent.
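As a rough illustration of that prioritization step, here is a small Python sketch of an entry-point inventory ranked by impact and likelihood. The components and scores are made up for the example; the point is to make the ranking explicit and repeatable:

```python
from dataclasses import dataclass

@dataclass
class EntryPoint:
    component: str                 # e.g. "Grafana dashboard", "Prometheus API"
    interface: str                 # how it is exposed
    crosses_trust_boundary: bool
    impact: int                    # 1 (low) .. 5 (critical) if compromised
    likelihood: int                # 1 (unlikely) .. 5 (broadly exposed)

    @property
    def risk_score(self) -> int:
        # Simple impact x likelihood ranking; a trust-boundary crossing bumps
        # the score because compromise reaches another security domain.
        score = self.impact * self.likelihood
        return score + 5 if self.crosses_trust_boundary else score

# Illustrative inventory -- replace with the components in your own architecture.
inventory = [
    EntryPoint("Grafana dashboard", "public HTTPS", True, 4, 4),
    EntryPoint("Prometheus scrape API", "internal HTTP", True, 5, 3),
    EntryPoint("Log shipper agent", "host daemon", False, 3, 2),
    EntryPoint("Alert webhook receiver", "inbound HTTPS", True, 4, 3),
]

# Highest-risk entry points come first, so remediation can start there.
for ep in sorted(inventory, key=lambda e: e.risk_score, reverse=True):
    print(f"{ep.risk_score:>3}  {ep.component:<25} {ep.interface}")
```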
Now let's dive into STRIDE for observability systems.
Let's apply the STRIDE threat modeling framework specifically
to observability systems.
This table shows how each STRIDE category manifests in monitoring infrastructure and
provides targeted mitigation strategies.
Spoofing in observability means false metric injection.
Attackers could send fabricated metrics to skew your monitoring
data or hide their activities.
Strong authentication for all agents prevents unauthorized data submission.
Tampering involves modifying telemetry data in transit or at rest; cryptographic
integrity checks ensure data hasn't been altered between collection and
storage. Repudiation covers cases where
audit logs are deleted or modified; immutable logging pipelines prevent
attackers from covering their tracks.
Information disclosure happens when sensitive metrics are
exposed to unauthorized users.
Strict access controls on dashboards and APIs prevent data leakage.
Denial of service targets your observability infrastructure itself;
overloaded collectors can blind you to actual attacks.
Rate limiting and redundancy maintain monitoring availability.
Elevation of privilege often involves compromised monitoring agents
that have excessive permissions.
The least-privilege principle limits the damage from an agent compromise.
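To make the spoofing and tampering mitigations concrete, here is a minimal Python sketch of per-agent authentication: each metric submission is signed with a shared secret so the collector can reject forged or altered data. The agent IDs, key handling, and payload shape are illustrative assumptions:

```python
import hashlib
import hmac
import json
import time

# Hypothetical per-agent secrets; in practice these would come from a secret store.
AGENT_KEYS = {"agent-web-01": b"replace-with-a-randomly-generated-key"}

def sign_submission(agent_id: str, payload: dict) -> dict:
    # The signature covers the agent id, a timestamp, and the payload, so a
    # collector can reject forged metrics (spoofing) and detect modification
    # in transit (tampering).
    body = json.dumps(payload, sort_keys=True)
    ts = str(int(time.time()))
    mac = hmac.new(AGENT_KEYS[agent_id], f"{agent_id}.{ts}.{body}".encode(), hashlib.sha256)
    return {"agent_id": agent_id, "timestamp": ts, "payload": payload,
            "signature": mac.hexdigest()}

def verify_submission(msg: dict, max_skew_seconds: int = 300) -> bool:
    key = AGENT_KEYS.get(msg["agent_id"])
    if key is None:
        return False                      # unknown agent: reject
    if abs(time.time() - int(msg["timestamp"])) > max_skew_seconds:
        return False                      # stale message: likely a replay
    body = json.dumps(msg["payload"], sort_keys=True)
    expected = hmac.new(key, f'{msg["agent_id"]}.{msg["timestamp"]}.{body}'.encode(),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(expected, msg["signature"])

if __name__ == "__main__":
    msg = sign_submission("agent-web-01", {"metric": "error_rate", "value": 0.01})
    print(verify_submission(msg))         # True
    msg["payload"]["value"] = 0.0          # tampered in transit
    print(verify_submission(msg))         # False
```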
Now, let's dive into a real-world attack scenario.
Let me walk you through a real-world attack scenario that
demonstrates how an observability system can become the attack vector
rather than the defense mechanism. The attack begins with
initial access through an outdated Grafana instance.
The attacker exploits a known vulnerability and gains access
to public dashboards that expose internal infrastructure details.
From those dashboards, they can learn about your network topology,
service dependencies, and technology stack.
Next comes credential theft.
The attacker discovers that API keys are being logged in plain
text in your application logs.
Because your monitoring service accounts have excessive permissions,
perhaps granted broad access for easier configuration,
these credentials provide significant access.
The Prometheus server then becomes the
pivot point for lateral movement. Because it needs to scrape metrics from
across your infrastructure,
it has network access to multiple environments.
The attacker uses this legitimate access to move from your monitoring
network into production systems.
Finally, data exfiltration occurs through custom metric
queries that extract sensitive data.
The beauty of this attack, from the adversary's perspective,
is that the exfiltration blends
perfectly with normal monitoring traffic.
Your network monitoring tools see legitimate Prometheus
queries, not data theft.
Now, coming to securing observability pipelines.
Let's establish security baselines for observability pipelines.
These aren't just suggestions.
They're the requirements for secure monitoring.
First, 100% encrypted telemetry.
All monitoring data must use TLS in transit, no exceptions.
I've seen too many environments where metrics flow unencrypted because
it's "just internal monitoring data."
Second, two-factor authentication for dashboard access.
Require multi-factor authentication for all monitoring interfaces.
Your observability dashboards contain as much sensitive information
as your production systems.
Third, a 30-day maximum rotation schedule for monitoring credentials. Service
accounts and API keys should rotate regularly; long-lived credentials
in monitoring systems are security risks. Fourth, zero secrets in logs.
The tolerable number of credentials in telemetry data is zero.
Implement log scrubbing and secret detection to prevent
accidental credential exposure.
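Here is a minimal Python sketch of what log scrubbing can look like in practice: a logging filter that redacts secret-looking substrings before a record is ever written. The patterns are illustrative and deliberately not exhaustive:

```python
import logging
import re

# Illustrative patterns -- extend for the credential formats used in your stack.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|password|secret)\s*[=:]\s*\S+"),
    re.compile(r"AKIA[0-9A-Z]{16}"),             # AWS access key id shape
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]+"),   # bearer tokens
]

class SecretScrubbingFilter(logging.Filter):
    """Redact secret-looking substrings before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for pattern in SECRET_PATTERNS:
            message = pattern.sub("[REDACTED]", message)
        record.msg, record.args = message, None   # replace with the scrubbed text
        return True                               # keep the record, just sanitized

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.addFilter(SecretScrubbingFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("db connect password=hunter2 host=db-01")   # password is redacted
logger.info("calling API with Bearer eyJabc.def")        # token is redacted
```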
Now let's dive into building alert resilience.
Alert resilience is crucial because attackers often target your notification
systems to hide their activities.
Here's how to build robust alerting. Implement signal-to-noise filtering
by developing correlation rules that reduce false positives,
and group related alerts to prevent alert fatigue.
When your team stops trusting alerts, the attacker wins. Establish
alert tiers with severity-based routing for notifications; critical
security alerts should bypass normal throttling mechanisms.
Your security incidents shouldn't get lost in a queue of
performance alerts.
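As a sketch of that severity-based routing, here is a small Python example where critical alerts bypass the normal duplicate-suppression window. The channel names and window length are illustrative assumptions:

```python
import time
from collections import defaultdict
from typing import Optional

# Illustrative routing table: severity -> notification channel.
ROUTES = {"critical": "pagerduty", "high": "slack-secops", "low": "email-digest"}

THROTTLE_WINDOW_SECONDS = 300          # suppress repeats of the same alert for 5 minutes
_last_sent = defaultdict(float)

def route_alert(name: str, severity: str, now: Optional[float] = None) -> Optional[str]:
    """Return the channel to notify, or None if the alert is throttled."""
    now = time.time() if now is None else now

    # Critical security alerts bypass throttling entirely: a real incident
    # must never be suppressed just because a similar alert fired recently.
    if severity != "critical" and now - _last_sent[name] < THROTTLE_WINDOW_SECONDS:
        return None

    _last_sent[name] = now
    return ROUTES.get(severity, "email-digest")

if __name__ == "__main__":
    print(route_alert("disk-usage-high", "low"))             # email-digest
    print(route_alert("disk-usage-high", "low"))             # None (duplicate, throttled)
    print(route_alert("grafana-auth-bypass", "critical"))    # pagerduty
    print(route_alert("grafana-auth-bypass", "critical"))    # pagerduty (never throttled)
```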
Employ anomaly detection, using machine learning or statistical detection
of unusual patterns. Baseline normal behavior before deploying anomaly alerting.
Attackers often operate within normal parameters to avoid detection.
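And as a sketch of that baselining idea, here is a simple rolling z-score detector in Python. Real deployments would use richer models, but the principle of learning normal behavior before alerting is the same; the window size and threshold are illustrative starting points:

```python
import statistics
from collections import deque

class BaselineDetector:
    """Flag metric values that deviate sharply from a rolling baseline."""

    def __init__(self, window: int = 500, threshold: float = 4.0):
        self.history = deque(maxlen=window)   # sliding window of recent observations
        self.threshold = threshold            # how many standard deviations counts as anomalous

    def observe(self, value: float) -> bool:
        is_anomaly = False
        # Only start judging once a baseline of normal behavior exists.
        if len(self.history) >= 30:
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history) or 1e-9
            is_anomaly = abs(value - mean) / stdev > self.threshold
        self.history.append(value)
        return is_anomaly

detector = BaselineDetector()
for v in [100, 102, 98, 101, 99] * 10:     # build the baseline from normal traffic
    detector.observe(v)
print(detector.observe(101))                # False: within normal variation
print(detector.observe(400))                # True: sharp deviation from the baseline
```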
Most importantly, protect your alert mechanisms themselves.
Treat notification systems as critical infrastructure.
Secure your Slack webhooks, email servers, and paging systems with the
same rigor as your public-facing applications.
Now let's dive into the action plan: from blind spots to insight.
Let me leave you with a concrete action plan to transform your
observability from a potential liability into a security asset.
First, conduct an observability threat assessment.
Map your current monitoring architecture using the frameworks we discussed today.
Identify all the components, data flows, and the trust boundaries.
Second, implement security controls systematically. Secure the data at the
collection, transport, and storage phases.
Don't try to do everything at once.
Prioritize based on risk.
Third, integrate with security operations.
Align your monitoring and security teams.
Your observability data should feed your security operations center, and
your security team should help secure your monitoring infrastructure.
Finally, measure your security effectiveness.
Track security metrics for continuous improvement: monitor failed
authentication attempts on dashboards, credential rotation compliance, and
the time to detect security events.
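As a final sketch, here is one way you might start tracking those metrics in Python. The event shapes, timestamps, and log format are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass
class SecurityEvent:
    occurred_at: datetime     # when the incident actually began
    detected_at: datetime     # when monitoring first alerted on it

def mean_time_to_detect(events: List[SecurityEvent]) -> timedelta:
    # Average gap between an incident starting and the first alert firing.
    gaps = [e.detected_at - e.occurred_at for e in events]
    return sum(gaps, timedelta()) / len(gaps)

def failed_dashboard_logins(log_lines: List[str]) -> int:
    # Count failed authentication attempts against monitoring dashboards;
    # the log format here is purely illustrative.
    return sum(1 for line in log_lines if "dashboard auth failed" in line)

events = [
    SecurityEvent(datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 40)),
    SecurityEvent(datetime(2024, 5, 7, 14, 0), datetime(2024, 5, 7, 14, 10)),
]
print(mean_time_to_detect(events))                                        # 0:25:00
print(failed_dashboard_logins(["dashboard auth failed user=admin", "ok"]))  # 1
```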
The goal is transformation.
Turn your observability infrastructure from a blind spot
into a security force multiplier.
So, thank you for your attention.
Remember, secure observability isn't just about protecting your monitoring tools.
It's about ensuring that your ability to detect and respond to threats remains
intact when you need it the most.
Thank you all.