Conf42 Python 2021 - Online

Using Open Earth Observations for measuring economic indicators

Abstract

Economists generally struggle to quantify the impact of policies at the ground level. We illustrate how open Earth Observation data (or satellite imagery) can be used to measure economic activity.

The alarming speed with which the COVID-19 pandemic spread across the world, and the subsequent quarantine and work-from-home (WFH) measures initiated by various governments, have created an unprecedented situation. There has been a sharp decline in economic activity, and global GDP is projected to contract by 3% (IMF projections, April 2020). That is equivalent to $3tn of lost economic output and precious jobs. The World Bank believes that the pandemic will also push the poorer sections of society into extreme poverty. However, the pandemic has also given us an opportunity to observe the impact: EO/satellite data is being used to quantify the impact on the global economy. We discuss the various methods with which economic activity can be measured. As an example, we will also illustrate how NO2 data can be extracted from multi-spectral satellite imagery, and highlight changes in pollution levels across major industrial belts.

Summary

  • In this talk we'll discuss how we can use Earth observation data or satellite images to quantify and measure economic activity. During this talk I will try to simplify the concepts, but if you have doubts, drop me a mail and I will be happy to answer your queries.
  • Economic activity causes pollution, especially manufacturing activities like steel or cement production. Measuring pollution can be a very good way of identifying ongoing economic activity. This is very useful for countries like Brazil, which are struggling between having economic activity and protecting the environment.
  • Abha Porwal is an intern at ThinkEvolve Consulting. She wants to research machine learning in defense and remote sensing. She uses open Earth observations to find changes in economic activity, with a Python library named xcube that provides Earth observation data in an analysis-ready form.
  • OpenStreetMap has an API for fetching and saving raw geodata from or to the OpenStreetMap database. With the help of the Python library and the xcube_sh plugin, we are going to make and configure a cube. This will be very helpful in a pandemic like COVID to analyze the lockdown situation.
  • Apsal is a computer science engineering student from Kerala, working on a project to monitor surface water using Google Earth Engine. The NDWI metric is used to calculate the surface water area. It could be used to monitor the change in water area during any period of time.
  • The most important limitation is the availability of data. You may notice a large gap between revisit times, and you may not get an image for the period of interest. Sometimes the sensors do not work properly. Cloud cover can cause problems, especially around the equatorial belts or over rainforests. Local laws and regulations may differ from region to region.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hey, this is Aakash Gupta, and in this talk we'll discuss how we can use Earth observation data, or satellite images, to quantify and measure economic activity. I am the CEO and co-founder of ThinkEvolve Consulting. We are building products for the next billion users. One of our projects involves creating global infrastructure to track and identify changes in economic activity at a global scale. We are using satellite images, social media and other alternative data sets for this purpose. If you have any questions or doubts you can reach me on LinkedIn or at my work email ID, which is akash Thinkevolve consulting.com. During this talk I will try to simplify the concepts, but if you have doubts, drop me a mail and I will be happy to answer your queries.

For measuring economic activity we generally use proxies, like measuring air pollution over major industrial belts or measuring the quality of water. Or you could look at some ESG parameters, like the rate of change in deforestation levels. You could look at surface water monitoring to identify water levels, or at the change in, or the destruction of, mangroves along the shorelines. Another means is looking at highways and airports, which some consider the lifelines of today's connected economies. Measuring the car density or vehicle population in these areas gives very important economic indicators. During this talk we will mostly use openly available data, which can be accessed via AWS or Google. I have two of my colleagues, Abha and Apsal, who will discuss two interesting use cases. Abha will talk about how she used open satellite data to detect and count the number of vehicles on highways, while Apsal will discuss how we used Google Earth Engine to measure the change in surface water as well as count the number of water bodies in any given area of interest. I hope you will find this interesting.
So our first use case is where we have suggested that economic activity causes pollution. Manufacturing activities like steel or cement production, the leather industry and fertilizers release a lot of pollutants into the air. In the same manner, a high number of vehicles on the roads causes pollution. This leads to an increase in NO2 levels in the atmosphere, and we can use the sensors on the Sentinel-5P satellite to identify changes in the NO2 levels in the upper atmosphere. Let me show you this composite image of the Indian subcontinent. The red areas that you see are areas of high concentration of NO2. You can see these red areas over the New Delhi-Ghaziabad belt, which is quite an industrial belt; to the south of Varanasi, which has a lot of leather and tanning industries; over the Bhilai-Jamshedpur belt, which is a major steel production and coal producing area in east India; as well as over Mumbai and Gujarat, which have major cities. Similarly, we can see red spots over Hyderabad and Karachi in Pakistan. This was taken in April 2019. A similar composite taken in April 2020 showed a major reduction in pollution levels across the Indian subcontinent. This was because in late March a strict lockdown was initiated. This led to many factories being shut down. Small and medium enterprises had stopped working. There were quarantine measures and hence less traffic on the roads. And as seen in this image, there is much less pollution. In fact, we did a small experiment with a lot of countries across different continents and found that there was a significant change in NO2 levels after a lockdown was implemented. This was observed in Germany as well as in Wuhan, China. But where a country had lax implementation of a lockdown, we didn't see this effect. Measuring pollution can be a very good way of identifying ongoing economic activity.
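The year-on-year NO2 comparison described above boils down to comparing the mean of the valid pixels in two composites. The following is a minimal sketch of that arithmetic only. The grids, their values and the helper names `mean_no2` and `percent_change` are made up for illustration and are not taken from the talk or from real Sentinel-5P data; `None` stands in for pixels with no valid retrieval.

```python
from statistics import mean


def mean_no2(grid):
    """Mean over valid pixels; None marks a pixel with no valid retrieval."""
    valid = [value for row in grid for value in row if value is not None]
    return mean(valid)


def percent_change(before, after):
    """Relative change of the mean NO2 level, in percent."""
    base = mean_no2(before)
    return 100.0 * (mean_no2(after) - base) / base


# Hypothetical NO2 column values over the same area of interest.
april_2019 = [[120.0, 150.0], [130.0, None]]
april_2020 = [[80.0, 90.0], [None, 70.0]]

change = percent_change(april_2019, april_2020)
print(f"Change in mean NO2: {change:.1f}%")
```

A negative result indicates a drop in the mean NO2 level between the two periods. Real composites would also need consistent cloud and quality filtering before the two means are comparable.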
Let's look at measuring the quality of rivers. During the lockdown there was a lot of chatter on social media regarding sightings of wildlife in the canals of Venice. This was mostly fake news. But if you look at the satellite images that we have, you'll notice that before the lockdown you could see boats using the Venice canals. After the lockdown, since no boats were traversing the canals, you do not see them. You see less turbidity in the water and hence less pollution. This used high resolution imagery over the city of Venice, but we can also use Sentinel data to calculate the floating algae indicator, which indicates the presence of aquatic life in the waters. This false color image shows the waters of Lake Victoria in Africa. The green color shows the floating algae indicator, which is calculated using multispectral images obtained from Sentinel-2 L2A. The red color over here shows turbid water.

But most of the sensors that we have used till now are in the optical spectrum. So if you have cloudy weather or bad weather conditions, most of these satellite images are of no use. It's here where we can use SAR, or synthetic aperture radar, which can penetrate through bad weather conditions. This is useful to track building construction, road construction and military establishments. In fact, we used SAR for tracking the construction of a field hospital in Spain as well as in Wuhan, even in bad weather conditions.

Let's look at some ESG parameters, like the rate of deforestation. During the Planet Hack 2020 last year, my team created an app which helped users quickly identify, in real time, logging events taking place in the Amazon. This is very useful for countries like Brazil, which are struggling between having economic activity and protecting the environment.
This is a dicey and political subject for them, and they are also facing budgetary constraints, because of which the number of indicators has reduced and fewer resources are available. This causes a lot of strain on their existing resources. But our app was able to provide them quick insights so that they can identify areas where they could focus their limited resources.

As we discussed earlier, highways and airports are the lifelines of most economies. As such, we can use the detection of cars, or the vehicle density in these areas, to understand the vitality of that region. Cars and vehicles are normally very small in size, so we need to use very high resolution imagery like that provided by PlanetScope satellites, which provide imagery at 30-centimeter resolution. This helps us to identify cars which are parked in parking lots. We used high resolution imagery and then an open source model which allowed us to detect more than 60 different classes of vehicles. A similar approach can be used to count the number of airplanes which are parked as well as in flight. Analyzing temporal signals can be used to identify anomalies, and any unexpected changes have their own economic signals. We also use Picterra, which is a low-code platform for training and deployment of machine learning models. It's specifically suited for training machine learning models for detecting objects in satellite images. And if you are a fan of the low-code/no-code approach, I would suggest having a look at their platform, which is very easy to use. But in many cases, citizen data scientists and researchers do not have the budget to access very high resolution imagery. All that you have is access to open data sets, which are often of low resolution. Sentinel data, especially, has a resolution of around 10 meters or coarser, depending upon the sensor.
And at such low resolution you cannot really detect commercial vehicles or even count the number of cars. So we have used a very unique approach to this problem, and we have Abha, who will discuss it.

Hello everyone. Before diving in, I would like to introduce myself. I'm Abha Porwal and I am an intern at ThinkEvolve Consulting. Currently I am pursuing my undergraduate degree, a Bachelor of Technology. I have a keen interest in machine learning and want to research its use in defense as well as in remote sensing. This project is based on exactly that. Have you ever heard that we can use open Earth observations to find the change in economic activities based on the detection of various elements? Today I would like to discuss the project on which I have worked as an intern. Let me give you a quick introduction first. As we know, Earth observation is delivering significant insights that help us to understand the consequences of a drastic crisis on the environment, the people and the economy. Going forward from that, we are going to use Sentinel-2 data from Sentinel Hub to detect the number of trucks present at a particular time in a particular place. Sentinel-2 carries a push-broom sensor to cover the 290-kilometer field of view. Twelve detectors are arranged in two parallel arrays on a focal plane. Those detectors, acquiring the visual and the near infrared, cover ten spectral bands each. This configuration causes two main effects, namely the inter-detector and the inter-band parallax angle. In simple words, consider a single pixel. The parallax angle results in a band-specific viewing geometry of this pixel, and the multispectral instrument sees it at slightly different times depending on the band. With the help of these two effects, the satellite data reveals moving objects. The offset between the different wavelengths that moving objects have in Sentinel-2 data causes a reflectance in RGB which looks like a rainbow.
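To make the parallax effect above concrete: because the bands see the same ground pixel at slightly different times, a moving vehicle shifts by its speed multiplied by the time lag between bands. The sketch below is a worked calculation only. The one-second lag and the 80 km/h speed are assumptions chosen for illustration, not figures from the talk, and `displacement_pixels` is a hypothetical helper name.

```python
def displacement_pixels(speed_kmh, band_lag_s, pixel_size_m=10.0):
    """How many pixels an object moves between the acquisition of two bands."""
    speed_ms = speed_kmh * 1000.0 / 3600.0  # km/h to m/s
    return speed_ms * band_lag_s / pixel_size_m


# Assumed values: a truck at 80 km/h and a lag of about one second between bands.
shift = displacement_pixels(speed_kmh=80.0, band_lag_s=1.0)
print(f"Shift between bands: {shift:.2f} pixels")
```

Under these assumptions the vehicle appears displaced by roughly two 10-meter pixels between bands, which is why a fast-moving object shows up as a short colored streak while a stationary one does not.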
Now, moving ahead, we will directly jump to the working and the results. For our process we have used a Python library named xcube, which has been developed to provide Earth observation data in an analysis-ready form to users. An xcube dataset contains one or more variables whose values are stored in cells of a common multi-dimensional, spatio-temporal grid. The dimensions are usually time, latitude and longitude; however, other dimensions may also be present. It is based on popular data science packages such as Zarr, xarray and Dask. At first we generate an xcube-compatible Zarr data cube. For accessing Sentinel-2 data we are using the xcube_sh plugin, which adds support for the Sentinel Hub cloud API. It extends xcube by a new Python API function, xcube_sh.cube.open_cube, to create data cubes from Sentinel Hub on the fly. It also adds a new CLI command to generate and write data cubes created from Sentinel Hub into the file system.

This will be the overview of the process. Moving towards our first step, we are going to select the region of interest and then find the bounding box coordinates of that region with the help of bboxfinder as well as the Sentinel Hub request builder. Here I took the example of Visakhapatnam. Next we are going to extract the OSM road map. OpenStreetMap has an API for fetching and saving raw geodata from or to the OpenStreetMap database. The Overpass API is a read-only API that serves custom selected parts of the OSM map data. It acts as a database over the web. The client sends a query to the API and gets back the data set that corresponds to the query. We can use various keywords for obtaining different maps, like a road map, a water body map, etc. For our purpose we are using keywords like highway, motorway, roads, etc., which are predefined in OSM, so as to obtain the road map of our region of interest. We are going to build a query using the Overpass API specifically to extract the OSM map for our region of interest.
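The road extraction step above can be sketched as building an Overpass QL query string for a bounding box. This is a minimal sketch, not the query used in the project: the helper name `build_overpass_query`, the chosen road types and the bounding box (a rough box around Visakhapatnam) are assumptions for illustration. Overpass expects the bounding box in south, west, north, east order.

```python
def build_overpass_query(south, west, north, east, road_types, timeout_s=60):
    """Build an Overpass QL query for ways tagged with the given highway values."""
    pattern = "|".join(road_types)
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f'way["highway"~"^({pattern})$"]({bbox});\n'
        "out geom;"
    )


# Assumed bounding box and road classes, for illustration only.
query = build_overpass_query(
    south=17.6, west=83.1, north=17.8, east=83.4,
    road_types=["motorway", "trunk", "primary"],
)
print(query)
```

The resulting string would be sent to an Overpass endpoint with an HTTP client, and the returned road geometries can then be rasterized into a road mask on the same grid as the satellite image. The network call is left out here so that the sketch runs offline.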
With the help of the Python library and the xcube_sh plugin, we are going to make and configure a cube. A cube is basically a data set that contains the required features and parameters. Generating a cube simply means extracting a part of a particular data set with the required features. Thus, we pass parameters like the data set, that is S2L2A, the time range, the time period, the spatial resolution, the bands, et cetera. The satellite has a finest resolution of 10 meters, and we are going to extract that. We are extracting six different bands: B02, the blue band; B03, the green band; B04, the red band; B08, the near-infrared band; B11, the shortwave infrared band; and SCL, the scene classification map. After that we are going to process the obtained timestamps and then calculate the thresholds. At first we are going to obtain the road mask by masking the Sentinel images with the OSM map. After that, by using these values and formulas, we are going to calculate different indices like NDVI, NDWI and NDSI, so as to exclude vegetation, water bodies and snow. After that we calculate the blue-green ratio and the blue-red ratio to find the rainbow-like reflectance, and then we compare it with the threshold to detect whether a truck is present or not. Each truck is going to be represented by two to three pixels in the images. To visualize this, we rasterize the image and then count the number of trucks for that day and ROI. For the date 19 June, the number of trucks is 195 for the given ROI. We have analyzed this for different regions of interest and thus got to know that, for a particular month in 2019, the number of trucks in Visakhapatnam is 771. That is, the number of trucks reduced up to 26% during the lockdown. This is the comparison chart for the number of vehicles in 2019, that is before the pandemic, and during the pandemic, that is 2020.
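The per-pixel rule described above (keep road pixels, exclude vegetation, water and snow, then look for a blue-dominated signal) can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: all threshold values, the sample reflectances and the helper names are hypothetical, and plain band ratios are used for the blue-green and blue-red tests.

```python
def norm_diff(a, b):
    """Normalized difference (a - b) / (a + b), guarded against division by zero."""
    total = a + b
    return (a - b) / total if total else 0.0


def ratio(a, b):
    """Plain band ratio a / b, guarded against division by zero."""
    return a / b if b else 0.0


# Hypothetical thresholds; real values would have to be calibrated.
THRESHOLDS = {"ndvi": 0.2, "ndwi": 0.2, "ndsi": 0.4, "blue_green": 1.2, "blue_red": 1.2}


def is_truck_candidate(px, on_road, t=THRESHOLDS):
    """Apply the road mask, the exclusion indices and the blue ratio tests to one pixel."""
    if not on_road:
        return False
    ndvi = norm_diff(px["B08"], px["B04"])  # vegetation
    ndwi = norm_diff(px["B03"], px["B08"])  # water
    ndsi = norm_diff(px["B03"], px["B11"])  # snow
    if ndvi > t["ndvi"] or ndwi > t["ndwi"] or ndsi > t["ndsi"]:
        return False
    blue_green = ratio(px["B02"], px["B03"])
    blue_red = ratio(px["B02"], px["B04"])
    return blue_green > t["blue_green"] and blue_red > t["blue_red"]


# Made-up reflectances paired with a road-mask flag.
pixels = [
    ({"B02": 0.20, "B03": 0.12, "B04": 0.10, "B08": 0.12, "B11": 0.15}, True),   # bluish, on road
    ({"B02": 0.10, "B03": 0.10, "B04": 0.10, "B08": 0.11, "B11": 0.12}, True),   # plain asphalt
    ({"B02": 0.04, "B03": 0.08, "B04": 0.05, "B08": 0.50, "B11": 0.20}, True),   # vegetation
    ({"B02": 0.20, "B03": 0.12, "B04": 0.10, "B08": 0.12, "B11": 0.15}, False),  # bluish, off road
]
truck_pixels = sum(is_truck_candidate(px, on_road) for px, on_road in pixels)
print(truck_pixels)
```

Since a truck covers two to three pixels, the flagged pixels would still need to be grouped into objects before they can be reported as a truck count.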
Like this we have an economic analysis based on the number of vehicles in different years. The major drawback is cloud cover, so we have to select the dates carefully, choosing ones with fewer clouds to get proper results. Also, the number of trucks obtained is not accurate, as the finest spatial resolution is 10 meters. But yes, we can get an approximate number, which can tell us about the relative truck density. Thus this will be very helpful in a pandemic like COVID to analyze the lockdown situation. Thank you.

Hello, I'm Apsal. I'm an intern at ThinkEvolve Consulting. I'm a computer science engineering student from Kerala. My areas of interest are mostly NLP and privacy-preserving machine learning. Currently my work is around geospatial data. I'm here to discuss a project to monitor surface water using Google Earth Engine. Google Earth Engine is a cloud computing platform for processing satellite imagery and other geospatial and observation data. It provides access to a large database of satellite imagery and the computational power needed to analyze those images. It also has Python and JavaScript APIs, which help to bring these facilities into our applications. Another resource we found quite useful was geemap, which is a Python wrapper around the Google Earth Engine API, making use of ipyleaflet and ipywidgets. If you want to integrate Earth Engine with your Jupyter notebook, check out geemap. It's so helpful. We use the NDWI metric to calculate the surface water area. The NDWI (Normalized Difference Water Index) method is an index for delineating and monitoring content changes in surface water, as water bodies strongly absorb light in the visible to infrared electromagnetic spectrum. Green and shortwave infrared bands are used for this. In the case of Landsat 8, they are represented by band three and band six.
NDWI values lie between minus one and one, and generally water bodies have an NDWI value greater than 0.3. Let's see how this works. Okay, so a rectangle is drawn on the map using the Jupyter widget around the required area, and then a layer appears on top of the rectangle where the deep blue shades indicate surface water. The normalized difference of band three and band six is calculated to get the NDWI value of the pixels. Then we get the number of pixels having an NDWI value greater than the threshold and multiply it by the area of a single pixel to get the total surface water area. An NDWI value greater than zero indicates water, and the threshold value can be anything more than that. Conventionally it's 0.2 or 0.3. Here is a plot of the water area from January to May of this year, and this can be used to monitor the change in water area during any period of time, for any area. The same can be done with different bands to get the vegetation index or moisture index. A limitation of this method is that NDWI is sensitive to built-up land, which can result in an overestimation of water bodies.

The next part of the project is to count the number of water bodies. We use OpenCV and scikit-learn for this. First we download the NDWI image of the required area from the previous process; then it is converted to grayscale and some smoothing is done to reduce the high-frequency noise. After thresholding the image, we get an image like this, where the water bodies are in white and the rest in black, and then we apply a method called connected component analysis. Connected component labeling is used in computer vision to detect connected regions in binary digital images, and it solves the problem of finding the parts of an image that are connected physically, irrespective of color. The connected components, also known as blobs, can be counted and filtered. The nearest neighbors of a pixel are labeled the same to form a blob.
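The NDWI thresholding, the area calculation and the blob counting described above can be sketched end to end on a tiny grid. This is a dependency-free sketch, not the project's code: the talk uses Earth Engine for the index and OpenCV for the labeling (where a call such as `cv2.connectedComponentsWithStats` would do the same job), while here plain Python lists and a breadth-first flood fill stand in. The reflectance values, the 30-meter pixel size assumed for Landsat 8, the 4-neighbor connectivity and the `min_pixels` size filter are assumptions for illustration.

```python
from collections import deque


def ndwi(green, swir):
    """Normalized difference of the green and shortwave infrared bands."""
    total = green + swir
    return (green - swir) / total if total else 0.0


def water_mask(green_band, swir_band, threshold=0.3):
    """True where the NDWI value exceeds the threshold."""
    return [[ndwi(g, s) > threshold for g, s in zip(g_row, s_row)]
            for g_row, s_row in zip(green_band, swir_band)]


def water_area_m2(mask, pixel_size_m=30.0):
    """Number of water pixels multiplied by the area of a single pixel."""
    return sum(cell for row in mask for cell in row) * pixel_size_m ** 2


def count_water_bodies(mask, min_pixels=2):
    """Count 4-connected blobs of water pixels that have at least min_pixels pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    bodies = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, size = deque([(r, c)]), 0
            seen[r][c] = True
            while queue:  # flood fill one blob
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                bodies += 1
    return bodies


# Made-up scene: W is a water-like pixel, L is a land-like pixel.
layout = ["WWLL", "WLLL", "LLLW", "LLLL"]
green = [[0.30 if ch == "W" else 0.10 for ch in row] for row in layout]
swir = [[0.05 if ch == "W" else 0.25 for ch in row] for row in layout]

mask = water_mask(green, swir)
area = water_area_m2(mask)
bodies = count_water_bodies(mask, min_pixels=2)
print(area, bodies)
```

In this toy scene the single isolated water pixel is dropped by the size filter, which illustrates why small water bodies can be missed by this approach.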
If the number of pixels exceeds a predefined threshold, then we consider the blob large enough and it is added as a water body. This approach has limitations when used to count water bodies having an area smaller than the threshold. Also, complex shapes and irregular areas of water bodies may affect the accuracy. Although the accuracy of the technique isn't perfect, it can still be used to analyze the change in the number of water bodies in a region over a period of time. That's it from my part. I hope you guys enjoyed it. Thank you.

So that was very cool, and I hope a lot of you found that interesting and inspiring. Maybe it inspired you to work on your own projects. However, before finishing this talk, I would like to highlight some of the assumptions. When you start working very closely with these data sets, you need to understand the assumptions and the limitations that come with them. So let me just talk about a few of them. The most important of them is, of course, the availability of data. You may notice a large gap between revisit times, and you may not get an image for the period of interest. This is because a single satellite is revolving around the world and taking images as it moves. Around the equatorial belt, and especially for rural areas or remote regions, you might not have good revisit times or historical data sets to work on. Sometimes the sensors do not work properly. Most of the satellites which are openly accessible are quite old. The sensors may not work properly and you get bad data. You need to use some kind of anomaly detection algorithm to ensure that you have valid information. And of course, as we discussed earlier, cloud cover can cause problems, especially around the equatorial belts or over rainforests, and especially in Southeast Asia, in countries like Indonesia, Myanmar and Cambodia. You have cloud cover for most of the year and it becomes difficult to acquire optical imagery.
Data providers are for-profit organizations. They do work with a lot of NGOs, but since they are for profit, they have to ensure that they are covering areas of interest which are in demand. These are mostly industrial regions, high density urban areas, industrial complexes like oil and gas facilities and, of course, defense facilities. NDWI is a very sensitive index. If there is built-up land, it can cause an overestimation of the water bodies, so you should be aware of the disadvantages of any index that you choose. With complex shapes and irregular areas of water bodies, a lot of object detection algorithms can fail and give a lot of false positives. The precision might be low. These limitations should be known, and you should understand them when you start working on any project. And of course there's bureaucracy and access to sensitive data sets. In India, for example, we have a national remote sensing organization which regulates the flow of high resolution data sets. So if you need access to any images with a resolution finer than 50 centimeters, you need to route your request through them. And of course, if your area of interest involves a defense establishment, that needs to be blurred out. The other thing is that you should be aware of the local laws and regulations, which may differ from region to region. Each country has its own laws. But in any case, there are a lot of openly available data sets, and you can use them to quickly get started and gain good insights. So, yeah, thanks for listening. Thank you.

Aakash Gupta

CEO @ ThinkEvolve Consulting

Aakash Gupta's LinkedIn account Aakash Gupta's twitter account

Abha Porwal & Apsal K

Interns @ ThinkEvolve Consulting


