Conf42 Internet of Things (IoT) 2025 - Online

- premiere 5PM GMT

Taming IoT Data Streams: Lessons in Managing Scale, Complexity, and Trust


Abstract

IoT devices generate massive streams of data, but the challenge is making that data reliable, secure, and usable at scale. Engineers face messy pipelines, outages, and governance needs. By improving how we manage streaming systems like Kafka, teams can focus on building IoT apps that deliver real impact.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Everyone, my name is Rehmat Kharal and I'm here today to talk to you a little bit about IoT data streaming and the complexities that come alongside it. So let's go ahead and get started.

Let's talk about IoT, right? The rise of IoT has transformed the way we think about data today. We've got billions of connected devices, everything from wearables to industrial sensors, that are generating streams of events that must be captured, processed, and trusted in real time. For many engineering teams, moving from devices producing data to data producing value is where the hard problems begin. In today's session, we're going to explore the hidden challenges of IoT data pipelines. Let's go ahead and jump in.

The challenges we're seeing today typically stem from the unique nature of the data source rather than just standard big data problems. Key hidden challenges include ensuring data integrity, managing device heterogeneity, and navigating complex security and compliance issues. So how do you make streaming data reliable when devices are noisy, unpredictable, or go offline? How do you enforce security, governance, and data quality without slowing innovation? And how can platform engineers give developers self-service access to streaming data without creating chaos?

Let's dig into it with challenge number one. The first hidden challenge is that sensor data is inherently unreliable: it's noisy, and it can be inaccurate. Sensors can produce erroneous, irrelevant, or noisy data due to environmental factors such as temperature fluctuations and vibrations, hardware limitations, or even simple malfunctions. This creates a lot of added complications, right? The data that comes off the sensor is used for multiple business applications, so when the data coming off those devices is unreliable, it's going to have an entire domino effect of negative consequences.

Let's look at the manufacturing industry, for instance. In this environment, data could be coming off of different robotic arms. One robot may be placing windows on cars, another may be painting the car, and a third would be placing the tires on the car. So what happens when the number of windows placed is incorrect and the data being captured is incorrect? It throws off data accuracy completely. If the transmitted data says 500 windows were placed on the cars, but in reality 5,000 windows were placed, the chain of command for the rest of the robots is off. The data that goes into the procurement and shipping department is off. The data that is exchanged with the manufacturer is wrong. This calibration drift is throwing off the entire supply chain.

Then you've got not only calibration issues but also missing data or packet loss, which can have a similar effect. This typically takes place when you have inconsistent network connectivity, but the lost data leads to an incomplete picture of what is actually taking place, and that missing data can lead to significant disruptions, all the way from production halts to compromised quality control and safety issues, which in turn could cost the business millions of dollars.

Now, as teams moved away from legacy architectures, ETL, and broken or one-off integrations and onto data streaming, they did it because there were new business models that needed to be fed this data. With the legacy systems, the data was always getting stuck.
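To make the calibration-drift and packet-loss issues above a bit more concrete, here is a minimal sketch of a plausibility gate placed in front of downstream consumers. It assumes the confluent-kafka Python client and a local broker; the topic names, field names, and thresholds are hypothetical illustrations, not part of any specific product.

```python
# Minimal sketch: plausibility checks on sensor events before they reach
# downstream consumers. Topic names, fields, and thresholds are illustrative.
import json
import time

from confluent_kafka import Consumer, Producer

PLAUSIBLE_RANGE = (0, 200)      # e.g., windows placed per shift (hypothetical)
MAX_EVENT_AGE_SECONDS = 300     # stale events suggest packet loss or buffering

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "sensor-quality-gate",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})

consumer.subscribe(["factory.sensor.raw"])   # hypothetical source topic


def is_plausible(event: dict) -> bool:
    """Reject readings that are out of range or too old to trust."""
    value = event.get("windows_placed")
    ts = event.get("timestamp", 0)
    in_range = isinstance(value, (int, float)) and PLAUSIBLE_RANGE[0] <= value <= PLAUSIBLE_RANGE[1]
    fresh = (time.time() - ts) <= MAX_EVENT_AGE_SECONDS
    return in_range and fresh


while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    if is_plausible(event):
        producer.produce("factory.sensor.clean", msg.value())   # trusted downstream topic
    else:
        producer.produce("factory.sensor.dlq", msg.value())     # quarantined for review
    producer.poll(0)
```

The point of the dead-letter topic is that questionable readings are kept for investigation rather than silently dropped or allowed to skew procurement and shipping numbers downstream.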
You've got to ask yourself: what's worse, the data getting stuck or transmitting inaccurate data? Both of these things can cause business disruption and cost the business a lot of money. We'll dig into how to solve that in just a few minutes.

But first, let's get into the second hidden challenge of managing IoT streaming: managing device heterogeneity. The primary goal of managing this diversity is to ensure that all devices can work together seamlessly and efficiently toward a common objective without being bottlenecked by the least capable component. What does that mean? Well, you need a multifaceted technical approach here to tame the complexity of all these different systems. You need to ensure interoperability, allowing different systems to communicate and exchange information effectively using standardized interfaces and protocols. You have to optimize performance by assigning tasks to the most suitable processor, or adapt workloads to improve scalability and flexibility. And of course, you have to handle challenges like varying performance, security vulnerabilities arising from complexity, and power constraints across different devices.

But the reality is that there are system management challenges as well. Performance prediction: accurately predicting the performance of an application on a complex heterogeneous system is very difficult. Increased security vulnerabilities: the added complexity and shared resources in heterogeneous systems expand the potential attack surface. And then you get data integration and consistency: in environments involving multiple data sources, whether databases or IoT sensors, challenges include everything from schema mismatches to data inconsistency and ensuring data accuracy and consistency across all platforms.

The final challenge of streaming IoT data is data provenance. As enterprises expand their footprint in terms of geography, programming languages, deployment types, data types, and data points of entry, it's becoming harder to trust the data's integrity, or even establish a verifiable chain of custody in regulated environments. Who actually gathered that data? When was it gathered? Where was it gathered from? What did the handoff look like? What server was used? Was the server actually secure? What systems were involved? Which users had access to it? Did they transform it? Was the data tested? Who touched it last? Data provenance is more necessary than ever before. It ensures that there is no unauthorized access, alteration, or contamination of data. It verifies the origin of the data. It ensures that audits will be passed and paints a clear picture of who is liable for the data at any given point in time.
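As a small illustration of both heterogeneity and provenance, here is a sketch that maps two invented vendor payload shapes onto one canonical record and wraps the result in a provenance envelope answering the who/when/where questions above. The vendor formats, field names, and service name are assumptions made for the example.

```python
# Minimal sketch: normalize payloads from heterogeneous devices into one
# canonical record, then stamp it with provenance metadata. The vendor payload
# shapes, canonical fields, and service name are hypothetical.
import socket
import uuid
from datetime import datetime, timezone


def normalize(raw: dict, vendor: str) -> dict:
    """Map vendor-specific fields onto one canonical shape."""
    if vendor == "vendor_a":               # e.g. {"temp_c": 21.4, "dev": "A-17"}
        return {"device_id": raw["dev"], "temperature_c": raw["temp_c"]}
    if vendor == "vendor_b":               # e.g. {"temperatureF": 70.5, "deviceId": "B-03"}
        return {
            "device_id": raw["deviceId"],
            "temperature_c": (raw["temperatureF"] - 32) * 5 / 9,
        }
    raise ValueError(f"unknown vendor: {vendor}")


def with_provenance(record: dict, source_topic: str) -> dict:
    """Wrap the record in an envelope recording who gathered it, when, and where."""
    return {
        "payload": record,
        "provenance": {
            "record_id": str(uuid.uuid4()),
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "ingested_by": "edge-gateway-service",   # hypothetical service name
            "ingest_host": socket.gethostname(),
            "source_topic": source_topic,
            "transformations": ["vendor-normalization", "unit-conversion"],
        },
    }


canonical = with_provenance(
    normalize({"temperatureF": 70.5, "deviceId": "B-03"}, "vendor_b"),
    source_topic="devices.vendor_b.raw",
)
print(canonical)
```

Carrying the envelope alongside the payload is one way to preserve a chain of custody as records move between systems, since every hop can append to the transformations list rather than overwrite it.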
We've talked about a lot of different challenges. So what are the solutions? How do you tame this IoT giant that has invaded the streaming world? As IoT data grows and flows continuously from an expanding universe of devices and systems, the real challenge becomes achieving true end-to-end visibility across your entire data stream. This is where disciplined data management stops being optional and becomes essential. It's where chaos and failover testing turn into routine practice, where fast data exploration becomes a critical workflow, and where security and governance are designed into the system, not patched on later. This is where the shift should happen. This is where your team shifts from reacting to emergencies to anticipating them.

So what do you need in place to be able to anticipate the emergencies that might come up? How do you deal with these hidden challenges of data integrity, device heterogeneity, and data provenance? You need an end-to-end solution. You need something that gives you a unified control plane to unlock Kafka's full potential, a platform that provides a single pane of glass so you can see the entire ecosystem, transforming it into a managed, secure, and efficient enterprise asset. That's how Kafka streaming should be used.

So what does that start with? You've got to have centralized Kafka management. What that means is there are core components of managing data, and there's intelligence at scale for how you use that data. When we're talking about the core components of managing data, that's everything you want to see in one central place: topics, schemas, consumer groups, Kafka Connect, et cetera. At the same time, you want to be able to add intelligence at scale, because there is so much data. You can have metadata and labels to make resources easy to discover and organize, while real-time cluster and broker monitoring provides immediate insight into health and performance.

You also need to give developers what they need so they can move quickly. How do you do that? You need something with self-service tooling, so you can abstract away the complexity that comes with Kafka and provide developers with the tools they need to build, test, and debug applications quickly and safely without needing to be Kafka experts.

Of course, we also need an end-to-end zero-trust framework for our data in motion, right? You need data quality and governance. You need traffic and cluster protection, and you need to know who is entering and using the data and who has access to what data when it comes to ensuring compliance and protecting your most critical data streams.

So how do you get that? Let's break it down. You need to make sure you can see who does what, with granular access control, auditing, and data-centric security. On the left-hand side here, it's about identity and access management: teams can define fine-grained role-based access for different users and groups, integrate with SSO and existing LDAP or AD systems, and automatically manage users over time. Pretty cool. On the right-hand side, you've got data protection and auditing. Every user and application action is captured in immutable audit logs that can be exported. Data is protected end to end with message encryption, including field-level and schema-based encryption for sensitive data. And when teams need controlled access, you've got dynamic data masking, ensuring that sensitive information is only visible to authorized users or applications. Together, this allows organizations to meet security and compliance requirements while still enabling teams to safely self-service streaming data. No bottleneck, right?
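To illustrate the idea of dynamic data masking in isolation, here is a minimal, platform-agnostic sketch of role-aware field masking applied before records are exposed to consumers. The roles, field names, and policy shape are hypothetical stand-ins for the kind of policy a governance layer would enforce.

```python
# Minimal sketch: role-aware field masking before records are shown to
# consumers. Roles, fields, and masking rules are hypothetical.
import copy

# Which fields each role may see in the clear (assumed policy).
MASKING_POLICY = {
    "ops-engineer":   {"visible": {"device_id", "firmware_version"}},
    "data-scientist": {"visible": {"device_id", "temperature_c"}},
    "auditor":        {"visible": {"device_id", "temperature_c", "operator_badge_id"}},
}


def mask_for_role(record: dict, role: str) -> dict:
    """Return a copy of the record with non-authorized fields redacted."""
    policy = MASKING_POLICY.get(role, {"visible": set()})
    masked = copy.deepcopy(record)
    for field in record:
        if field not in policy["visible"]:
            masked[field] = "***MASKED***"
    return masked


record = {
    "device_id": "A-17",
    "temperature_c": 21.4,
    "firmware_version": "3.2.1",
    "operator_badge_id": "EMP-00421",   # sensitive: only auditors see it
}

for role in MASKING_POLICY:
    print(role, "->", mask_for_role(record, role))
```

The key design choice is that masking is driven by a central policy table rather than by each consuming application, so authorization decisions stay auditable in one place.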
So how do you actually enforce data quality before data ever becomes a problem, while also giving the enterprise a single source of truth for all streaming data? On the left-hand side, it's about what you can do: you can ensure data is valid before it enters the system. Teams can define quality gates, enforce schema usage, and apply advanced validation rules to make sure only compliant, well-structured data is produced. If data doesn't meet those standards, it's blocked at the source, preventing bad data from propagating downstream. On the right-hand side, it's about maintaining an enterprise-wide catalog. This gives teams a searchable, centralized view of data streams, topics, and schemas. It also enables secure sharing through data sharing in partner zones, and supports chargeback by tracking resource usage by team or application. Together, this allows organizations to scale streaming with confidence, protecting data quality while improving visibility, governance, and collaboration across teams.

Finally, it's about protecting cluster stability by putting automated guardrails around how data is produced, consumed, and managed. On the topic management side, all create or alter requests go through a governed workflow, ensuring changes are reviewed and approved before they impact the cluster. Topics can also be safely renamed without disrupting producers or consumers. For producers, policies enforce that all messages have a valid schema and apply rate limiting to prevent runaway producers from overwhelming brokers and causing outages. And on the consumer side, group-level policies control how consumers connect and behave, limiting group membership, join rates, and offset commits to prevent misbehaving consumers from destabilizing the cluster. Together, these automated policies shift Kafka operations from reactive firefighting to proactive protection, keeping clusters stable as usage scales.
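Here is a minimal sketch of the two producer-side guardrails just described, a quality gate plus rate limiting, reduced to standard-library Python. In a real deployment this enforcement would live in schema validation and broker-side quotas rather than application code; the required fields and rate limits below are assumptions for illustration.

```python
# Minimal sketch: a client-side quality gate and a token-bucket rate limiter
# illustrating producer-side guardrails. Required fields and rates are assumed.
import time

REQUIRED_FIELDS = {"device_id": str, "temperature_c": float, "timestamp": float}


def passes_quality_gate(event: dict) -> bool:
    """Block events that are missing fields or carry the wrong types."""
    return all(
        field in event and isinstance(event[field], expected)
        for field, expected in REQUIRED_FIELDS.items()
    )


class TokenBucket:
    """Allow at most `rate` events per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


limiter = TokenBucket(rate=100.0, capacity=200.0)   # hypothetical limits


def try_send(event: dict) -> str:
    if not passes_quality_gate(event):
        return "rejected: failed quality gate"
    if not limiter.allow():
        return "rejected: rate limit exceeded"
    return "accepted"   # hand off to the actual producer here


print(try_send({"device_id": "A-17", "temperature_c": 21.4, "timestamp": time.time()}))
```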
So let me leave you with this one final thought: what's your disciplined approach to data management? Do you have a disciplined approach to data management? IoT data streaming is only growing. There's more and more data coming down that needs to feed different business models and drive different decisions. So as the IoT bubble keeps expanding, do you and your team have the right tools and foundation to ensure that this increasingly fragile bubble doesn't burst? Thanks for your time.

Rehmat Kharal

Driving Strategic Growth for Global Customers @ Conduktor

Rehmat Kharal's LinkedIn account


