Conf42 Observability 2025 - Online

- premiere 5PM GMT

Threat Modeling for Observability: From Blind Spots to Actionable Insight


Abstract

Think your observability stack is secure? Think again. This talk shows how threat modeling reveals hidden risks in logs, metrics, and traces. Learn to spot blind spots, think like an attacker, and build observability systems that are not just insightful—but secure by design.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. This is Srikanth Potla, and I work as a security engineer at SoFi. Today I'll be talking about a critical but often overlooked security challenge: threat modeling for observability systems. While we have become experts at securing our applications and infrastructure, monitoring and observability tools often operate in a security blind spot. This presentation will show you how to identify, prioritize, and mitigate security risks that could turn your observability systems from security assets into attack vectors.

First, understanding the observability blind side. Let's start with an uncomfortable truth: adversaries are strategically targeting the gaps in our observability coverage. Think about it from an attacker's perspective: if you can't see it, you can't defend it. The first major issue is that threat actors methodically exploit areas where observability coverage is insufficient or nonexistent. They're not randomly probing; they're specifically looking for the dark corners of your infrastructure where their activities won't be logged or monitored. The second issue is that your observability data itself becomes intelligence for attackers. Those publicly accessible metrics, logs, and traces that help you troubleshoot inadvertently reveal your system architecture, service dependencies, and potential entry points. It's like leaving a blueprint of your house on the front porch. Finally, this requires continuous reassessment. Even if you have robust observability today, hidden vulnerabilities evolve with your system. That dashboard you deployed six months ago might now be exposing sensitive data you didn't anticipate.

Before we dive into the security threats, let's establish our foundation: the traditional observability triad, which consists of three pillars. Metrics provide numerical representations of system behavior over time; these include performance indicators, resource utilization, and error rates, and they're great for understanding trends and setting alerts. Logs are timestamped records of discrete events within your system; they capture error messages, access attempts, and state changes, and they're your detailed forensic trail. Traces show end-to-end request flows across distributed systems; they reveal service dependencies, latency bottlenecks, and how errors propagate through your architecture. Here's the security challenge: each of these pillars can become an attack vector if not properly secured. Your metrics can leak architectural information, your logs might contain sensitive data, and your traces can reveal business logic and data flows.

Now, the security risks in observability pipelines. Let's examine where vulnerabilities typically emerge. I want you to think of your observability infrastructure as a three-layer cake, where each layer presents its own security challenge. At the collection points, unsecured agents and collectors expose entry points into your infrastructure. We see outdated agent software running with excessive privileges and transmitting data over unencrypted channels; these agents often have deep access to your systems to collect telemetry, making them attractive targets. The transport layer is where data in transit becomes vulnerable to interception: missing TLS encryption, poor certificate management, and weak cipher suites create opportunities for man-in-the-middle attacks. Remember, this data is flowing continuously across your network.
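As a concrete illustration of locking down that transport layer, here is a minimal sketch of an agent shipping metrics over TLS with certificate verification and a client certificate. The endpoint, certificate paths, and payload shape are all hypothetical; a real deployment would more likely use an SDK such as an OpenTelemetry exporter rather than raw HTTP.

```python
import json
import requests

# Hypothetical collector endpoint and certificate paths; adjust for your environment.
COLLECTOR_URL = "https://collector.internal.example:4318/v1/metrics"
CA_BUNDLE = "/etc/telemetry/ca.pem"          # pin your internal CA, not the system store
CLIENT_CERT = ("/etc/telemetry/agent.crt",   # mutual TLS: the agent proves its identity too
               "/etc/telemetry/agent.key")

def ship_metrics(payload: dict) -> None:
    """Send a metrics payload over TLS with server verification and a client cert."""
    resp = requests.post(
        COLLECTOR_URL,
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
        verify=CA_BUNDLE,   # never verify=False, even for "just internal" traffic
        cert=CLIENT_CERT,
        timeout=5,
    )
    resp.raise_for_status()

ship_metrics({"service": "checkout", "metric": "latency_ms", "value": 42})
```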
Finally, storage systems create centralized, high-value targets. Insufficient access controls mean that compromising your monitoring system could expose data from across your entire infrastructure, and unpatched vulnerabilities in time-series databases and improper data retention policies compound these risks.

Now, moving to threat actor motivations. Understanding why attackers target observability systems helps us defend more effectively. Let me walk you through four primary motivations. Reconnaissance is often the first goal. Attackers leverage exposed metrics to map your infrastructure; they can identify server locations, software versions, and potential entry points through misconfigured dashboards. Your Grafana dashboard might be showing them exactly what they need to plan their attack. Data exfiltration becomes possible when sensitive information gets embedded in logs. PII, secrets, and access tokens that get accidentally logged create both compliance violations and security breaches; I've seen API keys, database credentials, and sensitive data flowing through log streams in plain text. Alert fatigue is another tactic: threat actors deliberately trigger false positives to overwhelm your alerting systems. When your team is dealing with hundreds of false alarms, they'll miss the one real security incident happening simultaneously. Finally, living-off-the-land attacks use your legitimate observability tools as attack infrastructure. Monitoring systems with elevated privileges become perfect vehicles for lateral movement; the attacker doesn't need to bring their own tools, they use yours.

Now, mapping the attack surface. Effective threat modeling requires systematic attack surface mapping. This isn't a one-time exercise; it's an ongoing process that evolves with your infrastructure. Start by identifying assets: document all observability components in your architecture, map the data flows between collectors, processors, and visualization tools, and include third-party services, cloud monitoring tools, and custom dashboards. Next, define trust boundaries: establish where data crosses security domains and determine which components have privileged access to sensitive systems. Your Prometheus server might need access to multiple environments; that's a trust boundary worth scrutinizing. Then enumerate entry points: catalog all the interfaces exposed by your monitoring tools, considering API endpoints, dashboards, agent communication channels, and webhook receivers. Each one is a potential attack vector. Finally, prioritize threats based on potential impact and likelihood, and focus your remediation efforts on critical observability components first; a compromised central logging system has different implications than a compromised edge monitoring agent.

Now let's apply the STRIDE threat modeling framework specifically to observability systems. This table shows how each STRIDE category manifests in monitoring infrastructure and provides targeted mitigation strategies. Spoofing in observability means false metric injection: attackers could send fabricated metrics to skew your monitoring data or hide their activities. Strong authentication for all agents prevents unauthorized data submission.
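To make "strong authentication for all agents" concrete before we go through the remaining STRIDE categories, here is one minimal way to do it: HMAC-signed metric submissions that the collector can verify for both origin and integrity. This is an illustrative sketch, not a prescribed protocol; the key handling, envelope fields, and function names are all assumptions.

```python
import hashlib
import hmac
import json
import os
import time

# Hypothetical per-agent key; in practice it would come from a secret store, not an env var.
AGENT_KEY = os.environ.get("AGENT_HMAC_KEY", "dev-only-key").encode()

def sign_payload(metrics: dict, agent_id: str) -> dict:
    """Agent side: attach a timestamp and HMAC so the collector can verify the sender."""
    envelope = {"agent_id": agent_id, "ts": int(time.time()), "metrics": metrics}
    body = json.dumps(envelope, sort_keys=True).encode()
    envelope["sig"] = hmac.new(AGENT_KEY, body, hashlib.sha256).hexdigest()
    return envelope

def verify_payload(envelope: dict, max_skew_s: int = 300) -> bool:
    """Collector side: reject unsigned, tampered, or stale (replayed) submissions."""
    sig = envelope.pop("sig", "")
    body = json.dumps(envelope, sort_keys=True).encode()
    expected = hmac.new(AGENT_KEY, body, hashlib.sha256).hexdigest()
    fresh = abs(time.time() - envelope.get("ts", 0)) <= max_skew_s
    return hmac.compare_digest(sig, expected) and fresh
```

The timestamp check doubles as a crude replay defense; the same signature also covers the tampering category we turn to next.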
Tampering involves modifying telemetry data in transit or at rest; cryptographic integrity checks ensure data hasn't been altered between collection and storage. Repudiation covers cases where audit logs are deleted or modified; immutable logging pipelines prevent attackers from covering their tracks. Information disclosure happens when sensitive metrics are exposed to unauthorized users; strict access controls on dashboards and APIs prevent data leakage. Denial of service targets your observability infrastructure itself: overloaded collectors can blind you to actual attacks, so rate limiting and redundancy maintain monitoring availability. Elevation of privilege often involves compromised monitoring agents that have excessive permissions; the least-privilege principle limits the damage from an agent compromise.

Now let's dive into a real-world attack scenario, one that demonstrates how an observability system can become the attack vector rather than the defense mechanism. The attack begins with initial access through an outdated Grafana instance. The attacker exploits a known vulnerability and gains access to public dashboards that expose internal infrastructure details; from those dashboards, they learn about your network topology, service dependencies, and technology stack. Next comes credential theft. The attacker discovers that API keys are being logged in plain text in your application logs, and because your monitoring service accounts have excessive permissions (perhaps they were granted broad access for easier configuration), these credentials provide significant access. The Prometheus server then becomes the pivot point for lateral movement: because it needs to scrape metrics from across your infrastructure, it has network access to multiple environments, and the attacker uses this legitimate access to move from your monitoring network into production systems. Finally, data exfiltration occurs through custom metric queries that extract sensitive data. The beauty of this attack, from the adversary's perspective, is that the exfiltration blends perfectly with normal monitoring traffic: your network monitoring tools see legitimate Prometheus queries, not data theft.

Now, securing observability pipelines. Let's establish security baselines. These aren't suggestions; they're requirements for secure monitoring. First, 100% encrypted telemetry: all monitoring data must use TLS in transit, no exceptions. I've seen too many environments where metrics flow unencrypted because it's "just internal monitoring data." Second, two-factor authentication for dashboard access: require multi-factor authentication for all monitoring interfaces, because your observability dashboards contain information as sensitive as your production systems. Third, a 30-day maximum rotation schedule for monitoring credentials: service accounts and API keys should rotate regularly, since long-lived credentials in monitoring systems are a security risk. Fourth, zero secrets in logs: the tolerable number of credentials in telemetry data is zero. Implement log scrubbing and secret detection to prevent accidental credential exposure.
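As one concrete example of that last baseline, here is a minimal log-scrubbing filter that redacts credential-shaped strings before a record ever reaches a handler. The regex patterns and the logger name are illustrative only; production setups usually pair pattern matching with entropy-based secret scanners.

```python
import logging
import re

# Illustrative patterns; real deployments maintain a much larger, tested set.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|password|secret)\s*[=:]\s*\S+"),
    re.compile(r"AKIA[0-9A-Z]{16}"),           # AWS access key id shape
    re.compile(r"(?i)bearer\s+[a-z0-9._-]+"),  # bearer tokens in copied headers
]

class SecretScrubbingFilter(logging.Filter):
    """Redact credential-shaped strings before the record reaches any handler."""
    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        for pattern in SECRET_PATTERNS:
            msg = pattern.sub("[REDACTED]", msg)
        record.msg, record.args = msg, ()  # freeze the scrubbed message
        return True

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")
logger.addFilter(SecretScrubbingFilter())

logger.info("retrying with api_key=sk_live_abc123")  # emits: retrying with [REDACTED]
```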
Now let's dive into building alert resilience. Alert resilience is crucial because attackers often target your notification systems to hide their activities. Here's how to build robust alerting. Implement signal-to-noise filtering by developing correlation rules that reduce false positives, and group related alerts to prevent alert fatigue; when your team stops trusting alerts, the attacker wins. Establish alert tiers with severity-based routing for notifications: critical security alerts should bypass normal throttling mechanisms, so your security incidents don't get lost in a queue of performance alerts (a minimal routing sketch follows at the end of this transcript). Employ anomaly detection and machine-learning-based detection of unusual patterns, but baseline normal behavior before deploying the alerting, because attackers often operate within normal parameters to avoid detection. Most importantly, protect the alerting mechanisms themselves. Treat notification systems as critical infrastructure: secure your Slack webhooks, email servers, and paging systems with the same rigor as your public applications.

Now, the action plan: from blind spots to insight. Let me leave you with a concrete action plan to transform your observability from a potential liability into a security asset. First, conduct an observability threat assessment: map your current monitoring architecture using the frameworks we discussed today, identifying all the components, data flows, and trust boundaries. Second, implement security controls systematically: secure the data at the collection, transport, and storage phases. Don't try to do everything at once; prioritize based on risk. Third, integrate with security operations: align your monitoring and security teams. Your observability data should feed your security operations center, and your security team should help secure your monitoring infrastructure. Finally, measure your security effectiveness: track security metrics for continuous improvement, and monitor failed authentication attempts on dashboards, credential rotation compliance, and time to detect security events.

The goal is transformation: turn your observability infrastructure from a blind spot into a security force multiplier. So thank you for your attention. Remember, secure observability isn't just about protecting your monitoring tools; it's about ensuring that your ability to detect and respond to threats remains intact when you need it the most. Thank you all for paying attention to this.
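To make the severity-based routing idea from the alerting section concrete, here is a minimal sketch of throttling duplicate alerts while letting critical security alerts bypass the throttle. The window, fingerprinting scheme, and notify hook are illustrative assumptions, not a production design.

```python
import time
from collections import defaultdict

THROTTLE_WINDOW_S = 300          # suppress duplicate alerts inside this window
_last_sent = defaultdict(float)  # fingerprint -> last notification time

def route_alert(fingerprint: str, severity: str, notify) -> bool:
    """Throttle noisy duplicates, but never throttle critical security alerts."""
    now = time.time()
    if severity != "critical" and now - _last_sent[fingerprint] < THROTTLE_WINDOW_S:
        return False              # grouped/suppressed as a duplicate
    _last_sent[fingerprint] = now
    notify(fingerprint, severity) # e.g., page on critical, queue the rest
    return True
```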
...

Srikanth Potla

Senior Product Security Engineer @ Sofi



