Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey, this is Aakash Gupta and in this talk we'll discuss
how we can use earth observation data or satellite images
to quantify and measure economic activity.
I am the CEO and co-founder of
Thinkevolve Consultancy. We are building products
for the next billion users. One of our projects
involves creating infrastructure to
track and identify changes in economic
activity at a global scale. We are using
satellite images, social media and other alternative data sets
for this purpose. If you have any questions or doubts, you
can reach me on LinkedIn or at my
work email ID, which is akash Thinkevolve consulting.com.
During this talk I will try to simplify the concepts, but if you have doubts,
drop me a mail and I will be happy to answer your queries.
To measure economic activity, we generally use proxies such as
air pollution over major industrial belts
or the quality of water. You could also look at ESG
parameters like the rate of deforestation,
use surface water monitoring to track
water levels, or look at the
destruction of mangroves along
shorelines. Another approach is to look at highways
and airports, which some consider the lifeline of
today's connected economies. Measuring the car density
or vehicle population in these areas
gives very important economic indicators.
During this talk we will mostly use openly
available data, which can be accessed via AWS
or Google's public data sets.
I have two of my colleagues, Abha and Absal,
who will discuss two interesting use cases.
Abha will talk about how she used open satellite data
to detect and count the number of vehicles on highways,
while Absal will discuss how we used Google Earth
Engine to monitor changes in surface water
as well as count the number of water bodies in any given area of interest.
I hope you will find this interesting. Our first
use case rests on the premise that
economic activity causes pollution. Manufacturing activities
such as steel or cement production, the leather industry and fertilizers
release a lot of pollutants into
the air. In the same manner, a high number of
vehicles on roads causes pollution. This leads
to an increase in NO2 levels in the atmosphere,
and we can use the sensor on the Sentinel-5P
satellite to identify changes in
NO2 levels in the atmosphere. Let me show you this
composite image of the Indian subcontinent.
The red areas that you see are areas of high concentration of
NO2. You can see these red areas over the
New Delhi-Ghaziabad belt, which is quite an industrial
belt; to the south of Varanasi, which has a lot
of leather and tanning industries; over the Jamshedpur belt,
which is a major steel-producing and coal-producing
area in east India; as well as over Mumbai
and Gujarat, which are major industrial hubs.
Similarly, we can see red spots over Hyderabad and
Karachi in Pakistan. This was taken in
April 2019. A similar composite taken
in April 2020 showed a major reduction
in pollution levels across the Indian subcontinent.
This was because a strict lockdown was imposed
in late March. This led to many factories being
shut down. Small and medium enterprises
had stopped working. There were quarantine measures
and hence less traffic on the roads. As seen
in this image, there is very little pollution.
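The before/after comparison boils down to a simple calculation. The sketch below is a minimal illustration, assuming you have already extracted NO2 column values for the same region from two Sentinel-5P composites into plain Python lists; the function names and the 20% drop threshold are hypothetical choices, not values from our experiments.

```python
from statistics import mean


def percent_change(before, after):
    """Percentage change of the mean value from `before` to `after`."""
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("baseline mean is zero; percentage change is undefined")
    return (mean(after) - baseline) / baseline * 100.0


def lockdown_signal(before, after, drop_threshold_pct=20.0):
    """Return (change_pct, significant_drop).

    significant_drop is True when the mean fell by at least
    drop_threshold_pct percent (a hypothetical cut-off).
    """
    change = percent_change(before, after)
    return change, change <= -drop_threshold_pct


# Hypothetical NO2 values (mol/m^2) sampled over the same region
april_2019 = [1.2e-4, 1.5e-4, 1.1e-4]
april_2020 = [0.8e-4, 0.9e-4, 0.7e-4]
change, dropped = lockdown_signal(april_2019, april_2020)
print(f"{change:.1f}% change, significant drop: {dropped}")
```

Comparing matching periods (April against April) helps avoid confusing the seasonal cycle of NO2 with a lockdown effect.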
In fact, we did a small experiment with a lot of countries across different continents
and found that there was a significant change in NO2 levels
after a lockdown was implemented. This was observed in
Germany as well as in Wuhan, in the Chinese province
of Hubei. But where a lockdown was only loosely enforced,
we didn't see this effect. Measuring pollution
can therefore be a very good way of identifying
ongoing economic activity. Let's look
at measuring the quality of rivers during the lockdown.
There was a lot of chatter on social media regarding
sightings of wildlife in the canals of Venice. This was
mostly fake news. But if you look
at the satellite images that we have, you'll notice
that before the lockdown you could see boats
using the Venice canals. After the
lockdown, since no boats were
traversing the canals, you do not see them. You also see less turbidity
in the water and hence less pollution. This used
high resolution imagery over the city
of Venice, but we can also use Sentinel data.
We can use Sentinel data to calculate the floating algae
index, which indicates the presence of floating algae
in the water. This false color image
shows the waters of Lake Victoria in Africa.
The green color shows the floating algae index,
which is calculated from multispectral Sentinel-2
L2A images. The red color over here shows turbid
water. However, all the sensors that
we have used so far operate in the optical spectrum.
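The floating algae index is commonly defined as the near infrared reflectance minus a baseline interpolated linearly between the red and short wave infrared bands. The talk does not say which band combination was used, so the Sentinel-2 bands (B04, B8A, B11) and their approximate central wavelengths below are assumptions; this is a per-pixel sketch, not our production code.

```python
# Approximate Sentinel-2 central wavelengths in nanometres. These are assumed
# values; the exact figures differ slightly between Sentinel-2A and 2B.
WL_RED = 665.0    # B04
WL_NIR = 865.0    # B8A
WL_SWIR = 1610.0  # B11


def floating_algae_index(red, nir, swir,
                         wl_red=WL_RED, wl_nir=WL_NIR, wl_swir=WL_SWIR):
    """Floating algae index for one pixel (surface reflectance inputs).

    The baseline is the reflectance a straight red-to-SWIR spectrum would
    have at the NIR wavelength; FAI is how far NIR rises above it.
    """
    baseline = red + (swir - red) * (wl_nir - wl_red) / (wl_swir - wl_red)
    return nir - baseline


# Hypothetical reflectances: open water vs. a pixel with floating algae
print(floating_algae_index(red=0.03, nir=0.02, swir=0.01))
print(floating_algae_index(red=0.04, nir=0.15, swir=0.05))
```

Open water absorbs strongly in the near infrared, so it sits at or below the baseline, while floating vegetation reflects near infrared and pushes FAI above zero; the exact cut-off is scene-dependent.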
So if you have cloudy weather or
bad weather conditions, most of these satellite images are of
no use. This is where we can use SAR,
or synthetic aperture radar, which can see
through bad weather conditions. This is useful to
track building construction, road construction and
military establishments. In fact, we used SAR
to track the construction of field hospitals in Spain as
well as in Wuhan, even in bad weather conditions.
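We don't go into the change detection method here. One common and simple approach for SAR is the log-ratio: compare backscatter from two dates pixel by pixel and flag pixels whose ratio, in decibels, exceeds a threshold. The sketch below is a generic illustration of that technique, not necessarily what we used; it assumes two co-registered, calibrated intensity images as nested lists, and the 3 dB threshold is an assumed example value.

```python
import math


def log_ratio_change(before, after, threshold_db=3.0, eps=1e-10):
    """Boolean change mask from two co-registered SAR intensity images.

    Inputs are 2-D lists of backscatter in linear power units. A pixel is
    flagged when |10 * log10(after / before)| exceeds threshold_db;
    eps avoids division by zero and the log of zero.
    """
    mask = []
    for row_before, row_after in zip(before, after):
        mask.append([
            abs(10.0 * math.log10((a + eps) / (b + eps))) > threshold_db
            for b, a in zip(row_before, row_after)
        ])
    return mask


# Hypothetical 2x2 scenes: one pixel brightens sharply (e.g. a new structure)
before = [[0.02, 0.02], [0.05, 0.05]]
after = [[0.02, 0.15], [0.05, 0.045]]
print(log_ratio_change(before, after))
```

New buildings and other man-made structures often appear as a strong increase in backscatter. Real SAR scenes are noisy because of speckle, so some speckle filtering is usually applied before a comparison like this.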
Let's look at some ESG parameters, like the rate
of deforestation. During Planet Hack 2020
last year, my team created an app which helped
users quickly identify, in real time,
logging events taking place in the Amazon.
This is very useful for countries like Brazil,
which are struggling to balance
economic activity and protecting the environment.
This is a dicey and political subject for them, and they
are also facing budgetary constraints, because of which their
monitoring capacity has been reduced and available
resources are scarce. This puts a lot of strain
on their existing resources. Our app was able to give
them quick insights so that
they could identify areas on which to focus
their limited resources. As we
discussed earlier, highways and airports
are the lifeline of most economies.
As such, we can use the
detection of cars, or vehicle density, in
these areas to understand the vitality of a region.
Cars and vehicles are normally very
small, so we need very high resolution
commercial imagery, at around
30 centimeter resolution.
This helps us to identify cars
parked in parking lots. We used high resolution imagery and
an open-source model which allowed us to detect
more than 60 different classes of vehicles.
A similar approach can be used to
count the number of airplanes, both parked
and in flight. Analyzing these counts over time
can help identify anomalies, and any unexpected
changes carry economic signals.
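One simple way to look for anomalies in a series of daily counts (cars in a lot, planes on an apron) is a rolling z-score: compare each day with the mean and standard deviation of the preceding days. This is a minimal sketch under that assumption; the window length and the 3-sigma threshold are hypothetical defaults, and the counts in the example are made up.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=7, z_threshold=3.0):
    """Return indices whose count deviates from the preceding `window`
    observations by more than z_threshold standard deviations."""
    flags = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as an anomaly
            if counts[i] != mu:
                flags.append(i)
            continue
        if abs(counts[i] - mu) / sigma > z_threshold:
            flags.append(i)
    return flags


# Hypothetical daily car counts for one parking lot
daily_cars = [100, 102, 98, 101, 99, 100, 103, 40, 101]
print(flag_anomalies(daily_cars))
```

A rolling window adapts to slow trends, but it also means an anomaly contaminates the next few windows; more robust statistics such as the median can reduce that effect.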
We also use Picterra, which is
a low-code platform for training and deploying
machine learning models. It's specifically suited
for training machine learning models that detect objects
in satellite images. If you are a fan
of the low-code/no-code approach, I would
suggest having a look at their platform, which is
very easy to use. But in many cases,
citizen data scientists and researchers do not have
the budget to access very high resolution imagery.
All they have is access to open
data sets, which are often of low resolution.
Sentinel-2 data, for example, has a resolution of 10,
20 or 60 meters,
depending on the band. At
such low resolution you cannot really detect commercial
vehicles or even count the number of cars.
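A quick back-of-the-envelope calculation shows why. The sketch below estimates how many pixels a vehicle covers at a given ground resolution; the vehicle dimensions (about 4.5 m by 1.8 m for a car, 16 m by 2.5 m for a truck) are assumed typical values, not measurements from the talk.

```python
def pixels_covered(length_m, width_m, pixel_size_m):
    """Approximate number of pixels covered by an object, by area ratio."""
    return (length_m * width_m) / (pixel_size_m ** 2)


# Assumed typical vehicle sizes (length, width) in metres
VEHICLES = {"car": (4.5, 1.8), "truck": (16.0, 2.5)}

for name, (length, width) in VEHICLES.items():
    for res in (0.3, 10.0):
        print(f"{name} at {res} m: {pixels_covered(length, width, res):.2f} pixels")
```

At 10 meters, even a truck covers well under one pixel by area (although its length can straddle two or more pixels), so vehicles cannot be resolved directly; the approach Abha describes relies instead on a spectral signature of motion.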
So we have used a rather unusual approach
to this problem, and Abha
will discuss it. Hello everyone. Before diving in,
I would like to introduce myself. I'm Abha Purval and
I am an intern at Thinkevolve Consulting.
I am currently pursuing my undergraduate degree, a Bachelor of
Technology. I have a keen interest in machine learning and want
to research its use in defense as well as in remote
sensing. This project builds on that interest. Have you
ever heard that we can use open Earth observation data to find
changes in economic activity based on the detection of
various elements? Today I would like to discuss the project
I have worked on as an intern. Let me give
you a quick introduction first. As we know, Earth observation
is delivering significant insights that help us understand the
consequences of drastic crises on the environment,
people and the economy. Building on that,
we are going to use Sentinel-2 data from Sentinel Hub to
detect the number of trucks present at a particular time in
a particular place. Sentinel-2 carries a push broom sensor
covering a 290 kilometer field of view. Twelve detectors
are arranged in two parallel rows on a focal plane.
The detectors acquiring the visible and near infrared
cover ten spectral bands each. This configuration causes
two main effects, namely inter-detector and inter-band parallax
angles. In simple words, consider a single pixel.
The parallax angle results in a band-specific viewing geometry for
this pixel, and the multispectral instrument sees it
at slightly different times depending on the band. Thanks to these
two effects, moving objects can be detected in the imagery.
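To see why a moving truck spreads over a few pixels, multiply its speed by the time lag between two band acquisitions. The lag used below (about one second between the blue and red bands) is an assumed, approximate figure for illustration only; the exact inter-band delays should be taken from the Sentinel-2 instrument documentation.

```python
def offset_pixels(speed_kmh, time_lag_s, pixel_size_m=10.0):
    """Ground displacement, in pixels, of an object moving at speed_kmh
    between two acquisitions separated by time_lag_s seconds."""
    speed_ms = speed_kmh / 3.6  # km/h -> m/s
    return speed_ms * time_lag_s / pixel_size_m


ASSUMED_BLUE_RED_LAG_S = 1.0  # illustrative value, not an official figure

for speed in (40, 72, 100):
    shift = offset_pixels(speed, ASSUMED_BLUE_RED_LAG_S)
    print(speed, "km/h ->", round(shift, 2), "pixels")
```

Under this assumption, highway speeds give an offset of roughly two to three pixels, consistent with the rainbow pattern. A parked vehicle shows no offset, which is why this method picks up moving traffic only.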
Because a moving object is at a slightly different position each time a band is acquired, it appears offset across the bands in
Sentinel-2 data, which produces a reflectance pattern in RGB
that looks like a rainbow. Now, moving ahead, we will
jump directly to the workflow and the results. For our process we
have used a Python library named xcube, which has been
developed to provide Earth observation data in an analysis-ready
form to users. An xcube dataset
contains one or more variables whose values are
stored in cells of a common multi-dimensional spatio-temporal
grid. The dimensions are usually time,
latitude and longitude; however, other dimensions
may also be present. It is based on popular data science
packages such as Zarr, xarray and Dask. First,
we generate an xcube-compatible Zarr data cube.
To access Sentinel-2 data, we are using the xcube-sh
plugin, which adds support for the Sentinel Hub cloud API.
It extends xcube with a new Python API function,
xcube_sh.cube.open_cube, to create data
cubes from Sentinel Hub on the fly. It also
adds a new CLI command to generate and write data cubes
created from Sentinel Hub into the file system.
This is an overview of the process. Moving to our
first step, we select the region of interest and
then find the bounding box coordinates of that region
with the help of bboxfinder as well as the Sentinel Hub
Request Builder. Here I took the example of Visakhapatnam.
Next, we extract the OSM road map.
OpenStreetMap has an API for fetching and saving raw geodata
from and to the OpenStreetMap database.
The Overpass API is a read-only API that serves
custom selected parts of the OSM map data.
It acts as a database over the web: the client
sends a query to the API and gets back the data set
that corresponds to the query. We can use various keywords to obtain
different maps, like a road map, a water body map,
etc. For our purpose, we use OSM tags like
highway, motorway, roads, etc., which are predefined
in OSM, to obtain the road map of our
region of interest. We build a query with the Overpass
API specifically to extract the
OSM maps for our region of interest. With the help
of the xcube Python library and the xcube-sh plugin, we
make and configure a cube. A cube is basically a data set containing
the required features and parameters. Generating a cube
simply means extracting part of a particular data set
with the required features. So we pass parameters like
the data set, that is S2L2A, the time range,
the time period, the spatial resolution, the bands, et cetera.
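As a rough sketch of the road-map step described above, the following builds an Overpass QL query for road ways inside a bounding box using only the Python standard library. The tag values (motorway, trunk, primary), the timeout and the example coordinates are assumptions for illustration; the talk does not list the exact query we used.

```python
import urllib.parse

# Main public Overpass instance; other instances exist.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_road_query(min_lon, min_lat, max_lon, max_lat,
                     highway_types=("motorway", "trunk", "primary")):
    """Overpass QL query for OSM ways tagged highway=<one of highway_types>.

    bboxfinder reports boxes as (min_lon, min_lat, max_lon, max_lat) by
    default; Overpass expects (south, west, north, east), so we reorder.
    """
    pattern = "|".join(highway_types)
    return (
        "[out:json][timeout:60];"
        f'way["highway"~"^({pattern})$"]'
        f"({min_lat},{min_lon},{max_lat},{max_lon});"
        "out geom;"
    )


def encode_request(query):
    """Form-encode the query as the `data` field of a POST body."""
    return urllib.parse.urlencode({"data": query}).encode("utf-8")


# Hypothetical bounding box roughly around Visakhapatnam
query = build_road_query(83.20, 17.65, 83.35, 17.80)
print(query)
# Sending it requires network access, e.g.:
# import json, urllib.request
# with urllib.request.urlopen(OVERPASS_URL, data=encode_request(query)) as resp:
#     roads = json.load(resp)
```

`out geom;` returns the coordinates of each way inline, which makes it straightforward to rasterize the roads onto the Sentinel-2 grid as a road mask.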
The satellite's best resolution is 10 meters,
and we extract data at that resolution. We are extracting
six different bands: B02, the blue band;
B03, the green band; B04, the red
band; B08, the near infrared band; B11, the short
wave infrared band; and SCL, the scene classification
map. After that, we process the obtained timestamps
and then calculate the thresholds. First, we
obtain the road mask by applying the OSM map to
the Sentinel images. After that, using
these values and formulas, we calculate different band
indices like NDVI, NDWI and
NDSI, so as to exclude vegetation,
water bodies and snow. After that, we
calculate the blue/green ratio and the blue/red ratio to
find the rainbow-like reflectance, and then we compare
them with thresholds to detect whether a truck is present
or not. Each truck is represented by
two to three pixels in the images. To visualize this,
we rasterize the image, and then we count the
number of trucks for that day and the ROI. For the date
19 June, the number of trucks is 195.
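The per-pixel logic can be sketched as follows. This is a simplified illustration, assuming reflectance values for B02, B03, B04, B08 and B11 plus a road-mask flag for each pixel; every threshold value is a hypothetical placeholder, since the talk does not give the ones we used, and the blue/green and blue/red ratios are written here as normalized differences. The sketch counts flagged pixels; because one truck spans two to three pixels, a real pipeline would group adjacent flagged pixels before counting trucks.

```python
def nd(a, b):
    """Normalized difference (a - b) / (a + b); 0.0 when a + b == 0."""
    total = a + b
    return (a - b) / total if total else 0.0


# Hypothetical thresholds, for illustration only; tune them per scene.
MAX_NDVI = 0.5   # above this: vegetation
MAX_NDWI = 0.0   # above this: water
MAX_NDSI = 0.4   # above this: snow
MIN_BLUE_GREEN = 0.05
MIN_BLUE_RED = 0.10


def is_truck_pixel(b02, b03, b04, b08, b11, on_road):
    """True if a pixel on the road mask shows the blue-shifted signature."""
    if not on_road:
        return False
    if nd(b08, b04) > MAX_NDVI:  # NDVI: exclude vegetation
        return False
    if nd(b03, b08) > MAX_NDWI:  # NDWI: exclude water
        return False
    if nd(b03, b11) > MAX_NDSI:  # NDSI: exclude snow
        return False
    # The motion offset makes blue stand out against green and red
    return nd(b02, b03) > MIN_BLUE_GREEN and nd(b02, b04) > MIN_BLUE_RED


def count_candidate_pixels(pixels):
    """pixels: iterable of (b02, b03, b04, b08, b11, on_road) tuples."""
    return sum(is_truck_pixel(*p) for p in pixels)


# Hypothetical pixels: truck-like, plain road, off-road, vegetation
sample = [
    (0.20, 0.15, 0.12, 0.18, 0.20, True),
    (0.10, 0.11, 0.12, 0.18, 0.20, True),
    (0.20, 0.15, 0.12, 0.18, 0.20, False),
    (0.05, 0.08, 0.04, 0.40, 0.20, True),
]
print(count_candidate_pixels(sample))
```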
We have analyzed this for different
regions of interest and found that, for a particular month
in 2019, the number of trucks in Visakhapatnam
was 771, and that the number of trucks fell by
up to 26% during the lockdown. This is the
comparison chart of the number of vehicles in 2019,
that is, before the pandemic, and during the pandemic, that is,
2020. In this way, we get an economic analysis
based on the number of vehicles in different years. The major
drawback is cloud cover, so we have to select
dates carefully, with few clouds, to get
proper results. Also, the number of trucks on the road obtained
is not exact, as the best spatial resolution
is 10 meters. But we can get an approximate
number, which tells us the relative truck density.
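Since the SCL band is already part of the cube, one way to pick dates with few clouds is to compute the cloudy fraction of each scene over the ROI from its scene classification codes. This sketch assumes the SCL values for each date are available as a flat list; the choice of which classes count as cloudy and the 10% cut-off are our assumptions.

```python
# Sentinel-2 L2A Scene Classification (SCL) codes treated as cloudy here:
# 3 = cloud shadows, 8 = cloud medium probability,
# 9 = cloud high probability, 10 = thin cirrus.
CLOUDY_CLASSES = {3, 8, 9, 10}
NO_DATA = 0


def cloud_fraction(scl_values):
    """Share of valid (non-no-data) pixels classified as cloud or shadow."""
    valid = [v for v in scl_values if v != NO_DATA]
    if not valid:
        return 1.0  # nothing usable: treat as fully cloudy
    return sum(v in CLOUDY_CLASSES for v in valid) / len(valid)


def usable_dates(scenes, max_cloud=0.1):
    """scenes: mapping of date string -> flat list of SCL codes over the ROI."""
    return sorted(d for d, scl in scenes.items() if cloud_fraction(scl) <= max_cloud)


# Hypothetical SCL samples for two dates
scenes = {
    "2019-06-19": [4, 5, 5, 4, 5, 5, 4, 5, 5, 4],
    "2019-06-24": [8, 9, 9, 4, 5, 8, 9, 10, 4, 5],
}
print(usable_dates(scenes))
```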
This will be very helpful in a pandemic like
Covid to analyze the lockdown situation. Thank you.
Hello, I'm Absal. I'm an intern at Thinkevolve Consulting.
I'm a computer science engineering student from Kerala.
My areas of interest are mostly NLP and privacy-preserving
machine learning. Currently my work is around geospatial
data. I'm here to discuss a project to
monitor surface water using Google Earth Engine.
Google Earth Engine is a cloud computing platform
for processing satellite imagery and other geospatial
and observation data. It provides access to a large database
of satellite imagery and the computational power needed to
analyze those images. It also has Python and JavaScript
APIs, which help bring these facilities into our
applications. Another resource we found quite useful was geemap,
which is a Python wrapper around the Google Earth Engine API,
making use of ipyleaflet and ipywidgets.
If you want to integrate Earth Engine with your Jupyter notebook, check out
geemap. It's very helpful.
We use the NDWI metric to calculate the surface water
area. The NDWI, or normalized difference water index,
is an index for delineating and monitoring content
changes in surface water, as water bodies strongly
absorb infrared light while reflecting relatively more green light.
Green and short wave infrared bands are used for this.
In the case of Landsat 8, they are band 3 and band 6. (Strictly speaking, this green/short wave infrared variant is known as the modified NDWI, or MNDWI.)
NDWI values lie between minus one and one, and
generally water bodies have an NDWI value greater
than 0.3. Let's see how this works.
Okay, so a rectangle is drawn on the map using the
Jupyter widget around the required area, and then a
layer appears on top of the rectangle where the deep blue shades indicate surface
water. The normalized difference of band 3 and band 6
is calculated to get the NDWI value of each pixel.
Then we count the pixels with an NDWI
value greater than a threshold and multiply that count by
the area of a single pixel to get the total surface
water area. An NDWI value greater than zero indicates
water, and the threshold can be any value above
that; conventionally it's 0.2 or 0.3. Here
is a plot of the water area
from January to May of this year; this can be used to
monitor the change in water area during any period of time
for any area. The same can be done with different
bands to get a vegetation index or moisture index.
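Outside Earth Engine, the same calculation reduces to a few lines. This sketch assumes green (band 3) and short wave infrared (band 6) reflectance grids as nested lists and Landsat 8's 30 m pixel size; in Earth Engine itself you would typically use `ee.Image.normalizedDifference` and a region reduction instead. The example grids are made up.

```python
def ndwi(green, swir):
    """(green - swir) / (green + swir) for one pixel; 0.0 if both are 0."""
    total = green + swir
    return (green - swir) / total if total else 0.0


def water_area_km2(green_img, swir_img, threshold=0.3, pixel_size_m=30.0):
    """Count pixels with NDWI above threshold and convert to km^2."""
    water_pixels = sum(
        1
        for g_row, s_row in zip(green_img, swir_img)
        for g, s in zip(g_row, s_row)
        if ndwi(g, s) > threshold
    )
    return water_pixels * pixel_size_m ** 2 / 1e6


# Hypothetical 2x2 reflectance grids: left column water, right column land
green = [[0.30, 0.10], [0.30, 0.10]]
swir = [[0.05, 0.20], [0.05, 0.20]]
print(water_area_km2(green, swir))
```

Running this for each date in a time series gives the kind of water-area plot shown in the talk.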
A limitation of this method is that NDWI is sensitive to built-up
land and can result in an overestimation of water bodies.
The next part of the project is to count the number of
water bodies. We use OpenCV and scikit-image for this.
First we download the NDWI image of the required area from the
previous process; then it is converted to grayscale
and smoothed to reduce high frequency
noise. After thresholding the image, we get an image
like this, where the water bodies are in white and the rest in
black, and then we apply a method called connected component analysis.
Connected component labeling is used in computer vision to detect
connected regions in binary digital images;
it solves the problem of finding the parts of an image that are
physically connected, irrespective of color. The connected components,
also known as blobs, can be counted and filtered.
Neighboring foreground pixels are given the same label to
form a blob. If the number of pixels exceeds a predefined
threshold, we consider the blob large enough
and count it as a water body. This approach
has limitations when counting water bodies with an
area smaller than the threshold. Also,
complex shapes and irregular areas of water bodies may
affect accuracy. Although the accuracy of the technique isn't perfect,
it can still be used to analyze the change in the number of water bodies in
a region over a period of time.
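For readers without OpenCV at hand, the blob counting can be sketched in plain Python. This assumes the grayscale, smoothing and thresholding steps have already produced a binary image (1 = water, 0 = land) as nested lists. It labels 8-connected regions with a breadth-first search, the same idea as OpenCV's `cv2.connectedComponentsWithStats`, and the minimum blob size is a hypothetical value.

```python
from collections import deque


def count_water_bodies(binary, min_pixels=4, connectivity=8):
    """Count connected foreground regions (value 1) with >= min_pixels pixels."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    if connectivity == 8:
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one blob, measuring its size
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:  # ignore blobs below the size threshold
                count += 1
    return count


# Hypothetical thresholded image with a 4-pixel, a 1-pixel and a 5-pixel blob
grid = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
]
print(count_water_bodies(grid))
```

The single-pixel blob is dropped by the size filter, which illustrates the limitation mentioned above: genuinely small water bodies are discarded along with noise.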
That's it from my part. I hope you guys enjoyed it. Thank you.
So that was very cool, and I hope a
lot of you found it interesting and inspiring.
Maybe it inspired you to work on your own projects.
However, before finishing this talk, I would like
to highlight some of the assumptions and limitations
that you come across when you start working very closely with
these data sets. So let me just talk about a
few of them. The most important of them is, of course, the availability
of data. You may notice
large gaps between revisits, and you may
not get an image for the period
of interest. This is because a single satellite is
orbiting the Earth and taking images as it moves,
and coverage can be sparse, especially along the equatorial
belt. Especially for rural areas or remote regions,
you might not have good revisit times or historical
data sets to work on. Sometimes the sensors
do not work properly: many of the satellites
whose data is openly
accessible are quite old, so the sensors may
malfunction and you get bad
data. You need to use some kind of anomaly detection algorithm
to ensure that you have valid information. And of course,
as we discussed earlier, cloud cover can cause
problems, especially around the equatorial belt
or over rainforests, particularly in Southeast
Asia, in countries like Indonesia, Myanmar and Cambodia.
There is cloud cover for most of the year there, and
it becomes difficult to acquire optical imagery.
Data providers are for-profit organizations. They do work
with a lot of NGOs, but since they are for profit,
they have to ensure that they cover areas of interest which
are in demand. These are mostly industrial
regions, high density urban areas,
industrial complexes like oil and gas facilities,
and of course defense facilities.
NDWI is a very sensitive
index. Built-up land, for instance, can cause
overestimation of water bodies, so you should be aware of
the disadvantages of any index that you choose.
With complex shapes and irregular areas of water bodies,
a lot of object detection algorithms can fail or
give a lot of false positives, so the precision
might be low. You should know these limitations
and keep them in mind when you start working on
any project. And of course there's bureaucracy around access
to sensitive data sets. In India, for example,
we have a national remote sensing organization which regulates
the flow of high resolution data sets. So if you need access to
any images finer than 50 centimeter resolution,
you need to route your request to them.
And of course, if your area of interest involves
a defense establishment, that needs to be blurred out.
The other thing, of course, is that you should be aware of
local laws and regulations, which may differ from region to
region. Each country has its own laws
governing this. But in any case,
there are a lot of openly available data sets
that you can use to
get started quickly and gain good insights.
So, yeah, thanks for listening and have a cup of
tea. Thank you.