Conf42 Python 2023 - Online

Building an IoT Monitoring App with InfluxDB, Python, and Flask with Edge-to-Cloud Replication


Abstract

The Internet of Things (IoT) is increasingly driven by sensor data, with devices taking measured actions based on everything from wind speed and direction to vital body functions, illumination intensity, and temperature. Learn how to build a functional IoT monitoring application on InfluxDB.

Summary

  • Anais Dotis-Georgiou: We're going to be talking about building a plant monitoring application with InfluxDB, Python, and Flask, with edge to cloud replication. If you have any questions about time series, InfluxDB, Flask, Python, or building this Plant Buddy application, I encourage you to reach out to me on LinkedIn.
  • You need at least one IoT sensor for your plant and a breadboard with jump wires and terminal strips. For our plant monitoring application, we decided to go with four sensors. Edge to cloud replication is just a way of consolidating data from multiple edge devices to the cloud.
  • Flask is a micro web framework that's written in Python. We're also going to be using InfluxDB for storage. Telegraf is an open source, plugin-driven collection agent for metrics and events. Plotly is a Python graphing library that makes interactive, publication-quality graphs.
  • Time series data is any data that has a timestamp associated with it. Examples include weather conditions, stock prices, and customer monitoring. It appears in almost any application across a lot of different spaces.
  • There was a unique need for databases that can specifically handle time series. You need ways to interact with your time series data effectively, and ideally you have a database that also contains additional features. That's what InfluxData and InfluxDB aim to do.
  • You can either use Telegraf or the client libraries to ingest data into your OSS instance. In InfluxDB Cloud, Flux is being replaced with SQL, mainly because most users don't want to take on the burden of learning a new language.
  • Flux is a data scripting language that comes embedded with InfluxDB OSS. It allows you to build data pipelines to query, analyze, and transform your data. It's kind of JavaScript-esque in its syntax, but functionally it operates more like pandas.
  • We recently launched InfluxDB Cloud powered by our IOx storage engine. It allows storage in Parquet file format with unlimited cardinality. The new cloud version also supports SQL. The plan is eventually to roll IOx and SQL capabilities out to our open source offering as well.
  • Edge data replication is the process of replicating data from an edge instance of OSS to the cloud using the edge data replication tool. Using a hybrid solution for our application provides the flexibility to move mundane tasks to the edge. This leaves more scope for the more interesting analysis and data storage to occur in the cloud.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everybody, and welcome. Today we're going to be talking about building a plant monitoring application with InfluxDB, Python, and Flask, with edge to cloud replication, which is an InfluxDB feature. As you may know already, my name is Anais Dotis-Georgiou and I'm a developer advocate at InfluxData, and I encourage you to connect with me on LinkedIn. If you have any questions about time series, InfluxDB, Flask, Python, or building this Plant Buddy application, which we're going to discuss today, I encourage you to reach out to me there and ask any questions you have. So before we begin, let's go over a quick agenda. First and foremost, I will be talking about the IoT hardware setup that we use, meaning all the devices that we use to monitor our plants for our Plant Buddy, our plant monitoring application. Next, I'll be going over the tools we used to build this application. Then I'll be giving an overview of InfluxDB, followed by a data ingestion setup overview. Then I'm going to talk about Flux and SQL, which are two languages that you can use to query InfluxDB. I'll follow that with an understanding of how to set up edge data replication and explain what edge data replication is. I guess a spoiler: edge data replication is just the process of replicating data from an OSS edge instance of InfluxDB to a cloud instance to consolidate your data there. Then we'll talk about the data requests for building the application, and last but not least, I'll share the code base. And with that, let's begin. Let's begin with talking about the setup for our IoT devices. So this is a diagram of roughly how our Plant Buddy application and system works on the edge. As you can see, we store and manipulate some of our data here in the open source build, and then we send a downsampled version of that data to our cloud instance. Downsampling data is the process of taking high precision raw time series data and then creating lower precision aggregates of that data, as a sort of summary of your data, because oftentimes we don't care about having that high precision data, especially over long periods of time. And it's just a way of consolidating data from multiple edge devices to the cloud. So in order to successfully build this application, you need the following things. First and foremost is a plant, preferably, although maybe you could monitor your pet instead. We used a Particle microcontroller board for this, but you could use any other compatible microcontroller. You need at least one IoT sensor for your plant and a breadboard with jump wires and terminal strips. Here's a look at the breadboard schematics. These schematics are for hooking up our four sensors to our breadboard, and this diagram is just to help break down which ports the microcontroller and the sensors are connected to. Hopefully this will make it easier to recreate the exact same setup, or a similar one, if you're not familiar with microcontrollers. For our plant monitoring application, we decided to go with four sensors: the first sensor monitors temperature and humidity, the second monitors light, then soil moisture, and then temperature. You'll notice that all of those measurements are time series data, which is why we use InfluxDB to store them: InfluxDB is a time series database, so it's a good use case for it.
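To give a rough feel for that first hop from sensor to software, here is a minimal Python sketch of reading raw readings off the microcontroller's serial port; the port name, baud rate, and line format are illustrative assumptions, not the project's exact code.

    import serial  # pyserial: pip install pyserial

    # The port name is machine-specific (an assumption here); the board is
    # plugged directly into a port on the computer.
    PORT = "/dev/ttyACM0"  # e.g. "COM3" on Windows

    with serial.Serial(PORT, baudrate=9600, timeout=2) as conn:
        while True:
            raw = conn.readline().decode("utf-8", errors="ignore").strip()
            if raw:
                # One raw sensor line, e.g. "soil_moisture:0.43" (hypothetical format)
                print(raw)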
So now let's talk about the tools that we use to build our application. At the front and center is Flask. Flask is a micro web framework that's written in Python, and it's going to be doing the heavy lifting for this project and help run our local application and routing. We're also going to be using InfluxDB for storage. InfluxDB is a time series data storage engine, but it's also much more than that: it contains APIs and various tools for working with real-time data and applications, and it also has a massive community and ecosystem. There are forums, Slack channels, and Reddit, where you can go and get community support. I myself will be on those channels helping people, as well as other developer advocates and engineers. There is a ton of support on GitHub as well, and all of our products have open source offerings. So that's InfluxDB in a nutshell, and it's an ideal solution for storing our sensor data because it is a time series database and our data is time series data. Then we will be using Telegraf, which is the other product that InfluxData, the creators of InfluxDB, creates. Telegraf is an open source collection agent that is plugin driven, for metrics and events. There are over 300 plugin options to choose from. So if you have the task of ingesting data from a source and sending it somewhere else, while taking advantage of buffering and caching capabilities and other agent control capabilities, with a lightweight agent that is open source, go check out Telegraf. It's a pretty cool tool, and it's very simple to configure: it's configurable in a single TOML config file and downloadable as a single binary, as is InfluxDB OSS. Next we'll talk about our client library suite. We have client libraries available in multiple languages, and you can read and write data into InfluxDB with them. So if you don't want to use Telegraf to ingest data, you can use a client library. We'll be using the Python client library to query our data and write our data to InfluxDB Cloud. You can also use client libraries if there isn't a Telegraf plugin, like I mentioned, and next I'll be showing a code example for how to do this. Also, I want you to be aware of the Visual Studio Code Flux extension. Flux is a query language in InfluxDB OSS, and it allows you to execute sophisticated data analysis on your time series data with sophisticated queries, and also to create checks and alerts and a bunch of different things. This extension is particularly useful for executing some of those queries, so that you don't have to go back and forth between VS Code, where you're probably building your application, and the InfluxDB UI to build your Flux queries. You can just stay in VS Code, which is really helpful for developers. And last but not least, we will be using Plotly. Plotly is a Python graphing library that makes interactive, publication-quality graphs. It's open source, free, and easy to use. And look at all the beautiful graphs that you can create with Plotly.
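To make the stack concrete before moving on, here is a minimal, hedged sketch of how Flask and Plotly fit together in an app like this one; the route, sample data, and labels are illustrative, not the project's actual code.

    import plotly.express as px
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def air_temperature_graph():
        # Placeholder readings; in the real app these come from an InfluxDB query.
        fig = px.line(
            x=["2023-05-01 10:00", "2023-05-01 11:00", "2023-05-01 12:00"],
            y=[21.5, 22.0, 21.8],
            labels={"x": "time", "y": "air temperature"},
            title="Air temperature (placeholder data)",
        )
        # to_html() returns a self-contained page embedding the interactive chart.
        return fig.to_html()

    if __name__ == "__main__":
        app.run(debug=True)

Run it and visit http://localhost:5000 to see the rendered graph.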
So now I'll give an overview of InfluxDB. I've already talked some about InfluxDB, but let's just make sure we're on the same page. In order to do that, we need to establish some context, and that context is answering the question: what exactly is time series data? Essentially, time series data is any data that has a timestamp associated with it. Examples include weather conditions, stock prices, customer monitoring, even point of sale transactions, healthcare logs, and traces. And we usually think of time series data as existing in two categories. The first is metrics, which is time series data that's gathered at a regular interval. So when we measure the temperature of our house, for example, we consider that to be a metric, because we're measuring it regularly. Then we have events, which are measurements that are gathered at irregular time intervals, maybe when something happens or when something's triggered, for example. So where does time series appear? Well, the short answer is that it appears in almost any application across a lot of different spaces. The first is consumer and industrial IoT: manufacturing, industrial platforms, renewable energy, and fleet management all contain, provide, and create a lot of time series data. When we think of things like pressure, concentration, flow rate, rotations per minute, temperature, humidity, vibrations, et cetera, these are all things that sensors might measure in those spaces and data that's collected in those categories. Then we have software infrastructure: you want to be able to monitor your API endpoints and also your developer tools. DevOps monitoring is a huge source of time series, as are your containers and Kubernetes. And last but not least, we can think of real-time applications. Things like gaming applications and fintech are really huge sources of time series data, but so is network monitoring. So now let's talk about the emergence of the time series database category. We came from relational databases, then document databases like MongoDB came on the scene, and more recently search databases. But there was a unique need for databases that can specifically handle time series. So what are these other database categories missing that time series databases have and address? The first thing is that when we deal with time series data, we're typically only concerned with ingesting that data and the queryability of that data. It's very rare, when you are collecting really high throughput time series data, that you are interested in performing single-point deletes or updates. So time series databases should make design assumptions around that, and essentially make trade-offs in their design that prioritize really high ingest and really high read performance over updates and deletes, which are things those other databases are better at performing. Additionally, you need ways to interact with your time series data effectively. It's very helpful out of the box to have a visualization component to a database as well, and a UI for it, just because time series data isn't really that well understood without graphs. The second component is the ability to manage your time series data lifecycle. You might need to be able to automatically expire old data as soon as it's become irrelevant, and also reduce your data, or downsample it, from raw, high precision data to lower precision aggregates, so that when you view your historical data, you can view it as a snapshot and see overall trends more effectively. You also want tools that help you work with timestamps, and across time zones, more easily. So ideally you have a database that also contains additional features and is a whole platform that makes working with time series data easier, and that's what InfluxData and InfluxDB aim to do.
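To make that downsampling idea concrete, here is a hedged sketch using the Python client; the bucket and measurement names are assumptions for illustration, and Flux's aggregateWindow does the reduction from raw points to hourly means.

    from influxdb_client import InfluxDBClient

    # Placeholder connection details for a local OSS instance.
    client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

    # Downsample raw readings into 1-hour mean aggregates (names are illustrative).
    downsample_flux = '''
    from(bucket: "plant_buddy")
      |> range(start: -30d)
      |> filter(fn: (r) => r._measurement == "sensor_data")
      |> aggregateWindow(every: 1h, fn: mean)
    '''

    for table in client.query_api().query(downsample_flux):
        for record in table.records:
            print(record.get_time(), record.get_field(), record.get_value())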
So this is the architecture diagram for InfluxData, and more specifically the OSS version. With the OSS version, InfluxDB itself is a storage engine, but so much more: it also has that visualization layer and a query and task engine. Then we have Telegraf, which we've talked about, with over 300-plus plugins; listed there are some of the input plugins that you can use, for example. The main goal of InfluxDB and InfluxData, just to re-summarize, is that you can get data from a lot of different sources, pull that data into InfluxDB with a variety of different tools, and then use InfluxDB itself to not only transform and downsample that data, but create triggers and alerts, and even use Flux, which is the query and scripting language for InfluxDB, to work with that data. And then you want to be able to send that data on for additional application workflows, to gain infrastructure insights and even perform IoT actions. So now let's talk about our data ingestion setup. I'm not going to go into depth on how to set up the microcontroller, only because each one is unique and its setup structure varies, so just follow the appropriate instructions for yours. But for the sake of understanding the code from here on out, I want you to be aware that mine is running on a port on my computer because it's plugged in directly, and this is an example of how the data comes in: when I run the command particle serial monitor, it shows me the data that is coming in. The sensor data is also highly varied, so I'll be skipping the details of how to clean up that data and tag it, which we need to do for our sensors. But all this code is available on GitHub, so you can check it out there in more detail. When you have your open source InfluxDB installed and running, even on localhost, you will see a UI like this, where you can set up your bucket and token. You can also do this via the CLI, but I find that using the UI is easier, so I wanted to show that for this demo. This video goes over how to create a bucket; that's where you're going to store your data. In this section you also have the ability to set your retention policy. A retention policy just describes the amount of time that you want to actually retain that data, and when you want to automatically expire it. This video also shows you how to set up your API token, which we're doing right now. We normally suggest that you just use an all-access token, but be careful with it. For development it can be useful at first, when you're getting started, but you can also set up a specific read and write token scoped specifically to just your bucket, to protect your data and make sure that none of your tokens are the same. So we've already seen how to set up a bucket and token in the UI, but at this point in the code, I have set up my own bucket on my cloud account, and I've put in the appropriate credentials and tokens to receive data into InfluxDB. Here we're using the InfluxDB Python client library, which allows you to write a few lines of code to begin streaming data into InfluxDB. The point here is that a data point is being added to the database, and all of these values change based on the device and the readings. We'll add tags to this point that we're writing to InfluxDB to help us differentiate between temperature and humidity, or we could use a tag to differentiate between multiple users or multiple plants that we were monitoring. We create a point with the Point method, encapsulate this in a write to influx function, and use the write method to actually write this point to our bucket, within our organization. So that's pretty much it.
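As a minimal sketch of that write path (the URL, token, org, bucket, tag, and field names below are placeholders, not the project's exact values):

    from influxdb_client import InfluxDBClient, Point
    from influxdb_client.client.write_api import SYNCHRONOUS

    # Placeholder credentials -- substitute your own URL, token, org, and bucket.
    client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
    write_api = client.write_api(write_options=SYNCHRONOUS)

    def write_to_influx(device_id, field_name, value):
        """Build a point for one sensor reading and write it to the bucket."""
        point = (
            Point("sensor_data")        # the measurement name
            .tag("device", device_id)   # tag to tell devices/plants apart
            .field(field_name, value)   # e.g. "soil_moisture", 0.43
        )
        write_api.write(bucket="plant_buddy", org="my-org", record=point)

    write_to_influx("plant-001", "air_temperature", 22.5)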
And this is what writing data to InfluxDB with Telegraf looks like; this is the TOML configuration file. So this is a Telegraf config file. The whole thing is quite large, so I'm not going to go in depth on the entire thing, but this is a small part of the configuration file, and it's the part that you're actually interested in. You can either use Telegraf or the client libraries to ingest data into your OSS instance. In the project example we are using Telegraf, but we also have that client library code available for those who prefer to use it instead. Each Telegraf plugin has its own documentation, but most are very straightforward to install and set up, and you run Telegraf with a single command: to run this file, you would simply say telegraf --config and then specify the path to the file. Here we're using the execd plugin, and we are essentially passing in a command that says we want to execute this Python 3 script, with the path to that script and the serial port to use, to generate or collect our time series data with Python and then take advantage of the agent to write it. So this is an example of how our data appears in table format once we've written our data to InfluxDB and actually queried it. We have one measurement, which is sensor data; fields, which include light and soil moisture, and there are more fields including temperature and humidity; and we also have the corresponding value for each field, as well as a timestamp associated with it. It's also worth noting that InfluxData is making a big push to support SQL. Flux is still supported in OSS, but in InfluxDB Cloud, Flux is being replaced with SQL. The main reason why this is the case is that we find most users don't want to take on the burden of learning a new language that is proprietary to a single piece of technology they use; they're more comfortable with SQL. So that's why we're working to provide users of the cloud instances with SQL. However, we recognize that existing users are still taking advantage of Flux. So if you are an OSS user, you can still continue to use Flux, but we will be using SQL to query data from our InfluxDB Cloud account. But I do want to just quickly introduce Flux, for those of you who are confused by that; querying a database with SQL probably makes a lot of sense to a lot of people, but maybe not with Flux. Flux is a data scripting language that comes embedded with InfluxDB OSS, and it allows you to build data pipelines to query, analyze, and transform your data. So that's a quick example of what Flux looks like. It's kind of JavaScript-esque in its syntax, but functionally it operates more like pandas, where the output of one line gets pipe-forwarded into the next function, and each function progressively changes or provides some sort of analysis or transformation on your data. This is the most basic Flux query that we use to retrieve our data out of InfluxDB. Specifically, the device ID and field are both variables that we can change; for example, the device ID could be one and the field could be air temperature, and by having these values be variables, we can call the same Flux query for all of our graphs. Our range is currently set to the past 24 hours, so that's how much data we're going to be displaying on the graphs, but we could change that as well. And similarly, our bucket is also something that we can change.
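Here is a hedged sketch of calling that parameterized query from Python; the bucket, tag key, and variable values are illustrative assumptions.

    from influxdb_client import InfluxDBClient

    client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

    def fetch_sensor_data(bucket, device_id, field):
        """Run the shared Flux query, swapping in the bucket, device, and field."""
        flux = f'''
        from(bucket: "{bucket}")
          |> range(start: -24h)
          |> filter(fn: (r) => r._measurement == "sensor_data")
          |> filter(fn: (r) => r.device == "{device_id}")
          |> filter(fn: (r) => r._field == "{field}")
        '''
        # query_data_frame returns the result as a pandas DataFrame,
        # which is convenient for graphing later.
        return client.query_api().query_data_frame(flux)

    df = fetch_sensor_data("plant_buddy", "1", "air_temperature")
    print(df)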
So let's talk a little bit more about the change that is happening from Flux to SQL, the future of InfluxDB Cloud, and the future of open source. We recently launched InfluxDB Cloud powered by our IOx storage engine, and it allows storage in Parquet file format with unlimited cardinality. So if you choose to use the edge to cloud replication version of this project, you will most likely connect with InfluxDB Cloud's new SQL version; the new cloud version also supports SQL. And if you choose to stay completely in the open source version, then you'll probably be using Flux. The plan is eventually to roll IOx and SQL capabilities out to our open source offering as well. But we can use Flight SQL plugins, presently and in the future, with InfluxDB Cloud powered by IOx to take advantage of Apache Superset, Tableau, Power BI, and Grafana as well. Another big reason to change the storage engine in InfluxDB Cloud was to offer more interoperability, because it's largely built on the Apache ecosystem, on things like DataFusion, Parquet, and Arrow, which is all really exciting. But now let's talk about edge data replication. Edge data replication is the process of replicating data from an edge instance of OSS to the cloud using the edge data replication tool. So what are the advantages of edge data replication? The first is that you reduce the bandwidth cost of sending high fidelity data to the cloud, and it also provides resilience against intermittent failures in network connectivity to the cloud. To summarize, using a hybrid solution for our application provides the flexibility to move mundane tasks, such as downsampling, to the edge, and you can do that downsampling as needed for each type of device that you are gathering data from. This leaves more scope for the more interesting analysis and data storage to occur in the cloud. So that's why we're looking at this hybrid solution and using both the edge and the cloud instances of InfluxDB. Tangibly, the feature of edge data replication consists of two new API endpoints, the remotes and replications endpoints, and two new CLI commands, the remote and replication commands. Each replicated bucket also gets a disk-backed queue for buffering data safely in case of any disruptions. So now we have our setup instructions here, which can be found in our GitHub readme. As you can see, here we have a command to set up our edge device, which for this project is the open source localhost instance I'm running. So just follow these commands and you can get it started for yourself. And these are the two commands that you have to run to have your edge connected to your cloud instance. Create your cloud bucket with the exact same steps as in your open source instance, just in the cloud. We also have full documentation for edge data replication, or EDR, that you can check out, which goes into more detail about the configuration setup. But basically it's two steps: you first create a remote connection, and then you create a replication rule between localhost and cloud. So you describe basically what you want and how you want data to be replicated to your cloud instance.
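To make those two steps concrete, here is a hedged Python sketch that hits the remotes and replications API endpoints mentioned above; every URL, ID, and token is a placeholder, and the request body shapes are assumptions based on the v2 API (the equivalent influx remote create and influx replication create CLI commands are covered in the project readme and the EDR docs).

    import requests

    # All values are placeholders -- substitute your own instance details.
    EDGE_URL = "http://localhost:8086"   # local OSS (edge) instance
    HEADERS = {"Authorization": "Token my-edge-token"}
    EDGE_ORG_ID = "0000000000000000"

    # Step 1: register the remote (cloud) connection on the edge instance.
    remote = requests.post(
        f"{EDGE_URL}/api/v2/remotes",
        headers=HEADERS,
        json={
            "name": "plant-buddy-cloud",
            "orgID": EDGE_ORG_ID,
            "remoteURL": "https://us-east-1-1.aws.cloud2.influxdata.com",
            "remoteAPIToken": "my-cloud-token",
            "remoteOrgID": "1111111111111111",
        },
    ).json()

    # Step 2: create a replication rule from a local bucket to the cloud bucket.
    replication = requests.post(
        f"{EDGE_URL}/api/v2/replications",
        headers=HEADERS,
        json={
            "name": "plant-buddy-replication",
            "orgID": EDGE_ORG_ID,
            "remoteID": remote["id"],
            "localBucketID": "2222222222222222",
            "remoteBucketID": "3333333333333333",
        },
    ).json()

    print("replication id:", replication["id"])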
So now we're ready for data requests and visualization. In this step we're calling the previous Flux query and filling in the variables with our selections, including bucket, sensor, and device, and we return the result, which allows us to graph our incoming data. We use the query_data_frame method from the Python client library, which pulls our data back in a DataFrame format that I find easier to work with in a lot of Python libraries, especially for visualization. There are a few different data output options that you can choose from if you prefer a different style; this is just my preferred way of working with the Python client library. This part of the demo, querying with SQL, is currently under construction. We are working on redoing this project for the SQL support, as well as updating the documentation to go along with the project. That should be done by the end of this talk, so I encourage you to go check it out. Basically, all you do is use Arrow Flight SQL instead, and it's just a couple of lines; you can query directly with SQL and return a DataFrame as well. So the process is almost identical, just a couple of lines different. But yeah, go check out the GitHub repo, because that's all up to date there. And this is the end result of querying data for the data points, and we can now graph them. As you can see here, this is an example of the hard-coded graphs, but in the demo I'll also show the selectable graphs. So this is what our Plant Buddy dashboard looks like. Here we have one graph where we are looking at the light, but we can also see the soil and room temperature and the humidity and soil moisture. So here's what the soil and room temperature look like, and here's what the room humidity and soil moisture look like. So now let's go over some further resources, so you can run this yourself and get familiar with everything that we talked about today. To try it for yourself, follow the following links. Like I said, it will be updated with the SQL example as well, and should already be updated, honestly, so I encourage you to go take a look at it and try both the purely OSS version and OSS to cloud. I should mention there's also a free tier cloud version of InfluxDB, so you don't have to pay for anything to try InfluxDB Cloud powered by IOx. And last but not least, I encourage you to please join us on either our Slack or our Discourse forums at community.influxdata.com, and to participate in any conversations around InfluxDB, IoT, or InfluxDB Cloud powered by IOx; specifically, join the influxdb_iot channel. So again, get started yourself: you can visit our website, and our InfluxDB community organization contains a bunch of examples from the developer advocates at InfluxData, and also from community members, on different projects using InfluxDB. That's just a good place to get inspired as well if you're just wanting to check out InfluxDB. And here are some further resources; I've mentioned a lot of these, and the last one worth knowing about is InfluxDB University, there at the bottom, where you can get free instructional courses and earn badges on all things Influx. Thank you so much.
...

Anais Dotis-Georgiou

Developer Advocate @ InfluxData

Anais Dotis-Georgiou's LinkedIn account


