Conf42 Machine Learning 2021 - Online

History meets AI: Unveiling the secrets of ancient coins

Abstract

The University of Oxford houses 21 million objects in the collections of its Gardens, Libraries & Museums.

Preserving these assets requires great care. In this talk, we will review how AWS helped them build a sector-leading ML solution that increased access to its collections for students, researchers, and public visitors while saving its staff and volunteers a massive amount of work.

This talk will show recent work related to the new AWS Case Study titled “University of Oxford introduces a sector leading Machine Learning prototype to augment Digitisation in Numismatics”.

Summary

  • History meets AI: unveiling the secrets of ancient coins. The second part of this session is dedicated to hands-on examples, showing some ways you can build your own application to solve a similar challenge.
  • The University of Oxford houses 21 million objects in the collections of its Gardens, Libraries and Museums. To optimize access to these collections for digital teaching and research, GLAM asked the question: can we maybe use machine learning to help us?
  • The Ashmolean Museum has built the world's largest digital collection of Roman Provincial Coinage, open to anyone to browse online for free. Getting an item into this collection requires expert input from curators. What we wanted to do is not automate this task, but augment the humans behind it.
  • There is a new open-source solution in the works. It's not going to be restricted only to coins, but rather any collection object. Anything that is released will be added in the comments below this video.
  • Using AWS for this kind of project, volunteers can easily digitize large collections of coins. The process allows very quick and easy experimentation, and you can build your own algorithm. This improves workflow and productivity and adds value for the public.
  • SageMaker Studio is an end-to-end platform for machine learning from AWS. You can train a model directly from this screen. Once it is trained, you can deploy it as an endpoint and use the model for inference.
  • With only three lines of code (the Spot Instance settings) you can save roughly 60% to 90% of the cost. The task we want to solve is saliency detection. We use a custom library called IceVision, which is built on top of PyTorch and uses both PyTorch Lightning and fastai for training. Think of Spot Instances as a way of going into an auction for unused compute capacity.
  • Building a visual search application with Amazon SageMaker and Elasticsearch, using an open-source fashion dataset. We run a convolutional neural network against these images to extract feature vectors and, once we have tested that this actually works, index them for search.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, everyone, and welcome to this session, History meets AI: unveiling the secrets of ancient coins. My name is Nico. I work with the EMEA public sector at Amazon Web Services. Today we're going to split this session into two parts. The first part is about unveiling the secrets of ancient coins: we are going to explore the challenge that we had at hand, and we are also going to explore the solution that we came up with. The second part of this session is dedicated to hands-on examples, where I'm going to show you some ways that you can build your own application to solve a challenge similar to this one. We're going to focus on three things. Number one is image classification. Number two is background removal and image segmentation. And number three is how to build a visual search engine. So with that, let's dive right into the challenge that we had at hand.

First, let's start with the why. The University of Oxford houses 21 million objects in the collections of its Gardens, Libraries and Museums, GLAM for short. One aspect of their mission is that they want to preserve these assets and make them accessible to the world for education and research. But of course, there is only so much space available for this: the organization only has enough room to display about 10% of its holdings at any single time, and there's an enormous backlog of artifacts still waiting to be cataloged. So to optimize access to these collections for digital teaching and research, GLAM asked the question: can we maybe use machine learning to help us? If we are successful, that will reduce the time a research department needs to identify and catalog an object. But before we even think of that, the first thing we had to identify is a suitable, well-cataloged collection that would become the prototype candidate. That candidate was the Roman Provincial Coinage digital collection, a world-renowned research project in numismatics. The team included a curator with previous experience in developing digital collections from the ground up. That person is Jerome Mairat, the curator of the Heberden Coin Room in the Ashmolean Museum.

The first step in any machine learning project is to decide what you want to predict. In this case, Anjanesh Babu, the systems architect and network manager from GLAM, wanted to predict a very simple outcome: heads or tails. That is, is the specimen in front of me, the one I'm looking at in this photograph, the obverse or the reverse of a coin? Which is another way of saying: given known training data, can a machine learning solution predict the right side of a coin with a high degree of accuracy? So now that we have the why, let's move on to what are the actual things we want to solve for. This is the moment when the Ashmolean Museum came to AWS, and together we started discussing what a normal day looks like for the people working at these museums, what the challenges are that they are facing, and what limitations and constraints they have. We knew from before that the Ashmolean Museum has built the world's largest digital collection of Roman Provincial Coinage, open to anyone to browse online for free. Now, getting an item into this collection requires expert input from curators, but these people are highly skilled and very scarce, making this task very difficult to scale.
So the way this works is that, for example, you may have a multitude of physical specimens and you want to catalog them as items. Maybe the item you want to catalog this as already exists in the collection, and this is just another specimen of that item. Or maybe the item is completely new to the digital collection. In some cases you may have all the information available for the specimen, and in other cases you may lack some of it. Also, something that might happen is that other research institutions, or even individuals, might reach out to the Ashmolean Museum with a simple question: look, we have this item, do you know what it is? The answer to this question is at times very complex because of the sheer volume of items that need to be processed. Oftentimes, groups of people who want to support the mission of the university volunteer to help out with this task. But normally, because of the way this is established, in some cases even the simplest task cannot be accomplished by a single person or a small group of individuals. And when I say task, I mean getting a specimen and identifying the right item that this specimen belongs to.

So what we wanted to do is not automate this task, but augment the humans behind it: build tools that can support the people who work with this every day, so that they can focus on more relevant tasks and avoid, for example, spending hours and hours rotating photos so that they are aligned perfectly before moving on to the next task. In this case, the customer objective is to reduce the time it takes for the correct appraisal of a single specimen. Currently, this is estimated at between 10 minutes and several hours per item. You can imagine that you get an item and you want to spend some time corroborating that the information you have at hand matches the one in the collection. And if it doesn't match, then you need to figure out what that missing information is. For this, you can have a multitude of combinations, making it exponentially difficult when you have items that are not the standard ones. So the difficult items will require an enormous amount of time, and normally they will require an expert, who is very scarce in this sense.

So let's dig deeper into what we are talking about. These are two screenshots from the digital collection. The image that you see on the left shows three items, three coins. You can see that we have the obverse and the reverse on the left, and then we have information on the right, where you can see, for example, the inscription written on the obverse and the reverse, the city, the region, the province, and even the person that is in the image. Now, the items that you see on the right are different specimens of this same item, and you can see how the quality of the specimens varies in a big way. Here, what you can see are four photos of the same coin. We have a very high quality photo on the left, where we can make out the text that is written and very easily identify the person. But when we look at examples like the ones in the middle, at the top and the bottom, we have a pretty difficult time discerning what is what in the picture. So when you're presented with an item like this and you have to match it to the image on the left, it quickly becomes a very difficult task.
So this comes back to the question: can we use machine learning to solve this? We now know the why and the what; let's move on to how we solved it. The first thing you can see is that these images, say the image on the right, don't quite look the same as the one on the left. The one on the left, for example, has been taken with professional equipment and without a background, so the illumination is very constant, the resolution is very high, and there is no blurriness; everything is in focus. This is not always the case, especially when the Ashmolean Museum gets images from individuals or other research institutions who might not have the same resources for capturing this information. So some of the technical challenges we can face are: first, the image may be very low resolution, for example if it was taken with a smartphone, and it may be blurry or noisy. There may also be very inconsistent illumination across the image, so some areas might be darker than others. Also, the physical condition of the coin itself might be highly deteriorated, which works against finding similar items. And the problem is inherently hard, because we are talking about objects that are, in some cases, more than 2,000 years old. In short, photos taken by non-museum personnel look very different from images within the digital collection, making visual search very challenging.

So the way we thought about this is that we should split the task into two. The first part is: let's improve the base image quality and make this coin look as similar as possible to the ones we have in the digital collection. For example, in this case we have a blurry background, a rotation, and low resolution, so we want to account for all of those things and create an image that is very similar to the ones on the right. Once we have that image, we can extract features out of it and search the collection to bring back the most similar-looking items. This is an example of what we're doing here: you can see that we are detecting the shape of the coin, and then we are doing all of these activities at the same time, removing the background, rotating the image, and also increasing its resolution. That way we end up with the item on the right, which looks much more like the collection images than the one we started with on the left. Once we have this image, we come back to the metadata that we have also extracted from it. So we know whether the image is heads or tails with, say, 95% accuracy; in this case we know we are looking at the obverse of a coin, so we can scan through all the images in the collection but only look at the obverse, without needing to look at the back as well. We can also use other information, like the material, the region, the city, the province, and the person who is on the coin, to make the task of identifying this item easier. So with this, let's move to a very quick demo of what this proof of concept was. I just have to say that this demo was produced more than a year ago, and there is a new open-source solution that is in the works.
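As a brief technical aside before the demo: to give a feel for what that normalization step involves, here is a minimal sketch of the kind of post-processing described above, assuming the segmentation mask and rotation angle have already been produced by upstream models. The function name and the plain resize standing in for deblurring/super-resolution are illustrative assumptions, not the project's actual code.

```python
from PIL import Image

def normalise_coin(photo: Image.Image, mask: Image.Image,
                   angle: float, scale: int = 4) -> Image.Image:
    """Rough sketch: cut out the coin, straighten it, and upscale it."""
    # Keep only the coin pixels; `mask` is assumed to be a binary (mode "L")
    # image produced by a segmentation model such as Mask R-CNN.
    cutout = Image.new("RGBA", photo.size, (0, 0, 0, 0))
    cutout.paste(photo.convert("RGBA"), mask=mask)
    # Straighten the coin using the angle predicted by a rotation model.
    rotated = cutout.rotate(angle, expand=True, resample=Image.BICUBIC)
    # Placeholder for the learned deblurring / super-resolution step: a resize.
    width, height = rotated.size
    return rotated.resize((width * scale, height * scale), resample=Image.LANCZOS)
```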
That new solution is not going to be restricted only to coins, but rather any collection object you might have, either physical or digital, say gems or fossils: any object where the core concept is that you want to visually search for similar items inside a collection. If you're interested in something like this, keep posted to this video; we're going to add any news that comes out, and anything that is released will be added in the comments below this video.

So with this, here is a web application created using Streamlit. The idea is that we can interact with the models we have created: for example, we can upload a picture. In this case, the first thing we want to do is either choose an example from a library or upload a picture; here we choose the image that we saw before, with the blurry background, the rotation, and the low resolution. What we want to do is first find a region of interest in this image, remove the background, auto-rotate it, and then finally apply some deblurring and upscaling. Once we have finished this, and this is all happening in real time, you will have this image, which is the output of the process. Once we have it, we also want to extract metadata out of the image: for example, is this the obverse, who is the person in the image, what material is this coin made of, what region does it belong to, and so on. Once we have this metadata, we are going to use the features that we have extracted out of this image, for example the faces, the eyes, and the way these are placed in the image, to look inside our collection. And you can see that we come back with eight results that are similar to the image we are looking for. So in this case a volunteer, for example, doesn't have to go through thousands of images; they only have to focus on eight images, and they also have information that can point them in the right direction: the region, the person who is in the picture, and also similar items. So when they see that an item already exists and this is just another specimen, they can quickly attach this specimen to that item.

So what are the benefits of using AWS for this kind of project? The first one is very quick and easy experimentation: they built and deployed eleven machine learning models in about ten weeks. There is also a smaller workload: you can imagine that saving minutes on every task in a pipeline, when you have a large volume of items to digitize, adds up to a lot of time. In this case, it's estimated that they will save up to three years of work cataloging a collection of 300,000 coins. Less time: the coin analysis is expected to take just a few minutes, versus times ranging from 10 minutes to maybe hours. And also more value: this is complementing the work that is already being carried out by volunteers. This is not automating anything; this is augmenting the people, the humans, who are behind it. Here are some quotes about this. "I thought this project would be complex and time consuming, but using AWS made it easy." Another one, this time from Jerome: "Now we can focus our volunteers on other steps that add value. The machine learning process improves the workflow and productivity and adds value for the public." With this, let's have a look at one very small task as an example: we want to remove the background of the coin image.
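Before that, a quick aside on the front end: a demo app like the one just shown takes surprisingly little code with Streamlit. Here is a minimal sketch of that upload-then-search flow; the `coin_demo` module with `preprocess_coin`, `extract_metadata`, and `search_collection` is a hypothetical placeholder for the models and index described above, not the actual project code.

```python
import streamlit as st
from PIL import Image

# Hypothetical helpers standing in for the deployed models and search index.
from coin_demo import preprocess_coin, extract_metadata, search_collection

st.title("Ancient coin visual search (sketch)")

uploaded = st.file_uploader("Upload a coin photo", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    photo = Image.open(uploaded)
    st.image(photo, caption="Original upload")

    # Background removal, auto-rotation, deblurring / upscaling.
    cleaned = preprocess_coin(photo)
    st.image(cleaned, caption="Cleaned-up image")

    # Obverse/reverse, material, region, person, etc.
    st.json(extract_metadata(cleaned))

    # Show the eight most similar items from the collection.
    for hit in search_collection(cleaned, top_k=8):
        st.image(hit.image, caption=hit.item_id)
```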
Now, back to that background-removal task: we are going to see in one of the hands-on examples how we can actually do this technically. And it doesn't all have to be done by yourself. For example, there are solutions available in the AWS Marketplace that you can use out of the box for background removal. In this case, with this one for example, you have a price for every API call and you can just subscribe to it. So if you have images whose background you want to remove, you subscribe to this API and run them through this service, through this endpoint. Another way you can do it is, of course, to build your own algorithm, and we're going to see an example of exactly that, where you pick up a dataset such as Open Images, an image segmentation dataset with more than 600 classes for which segmentation masks are available. Then you use an algorithm, in this case Mask R-CNN, and you can use different machine learning frameworks, PyTorch, TensorFlow, MXNet, together with SageMaker and different ways of doing training. With this, you can very easily build your own custom pipeline. This slide shows some resources, which I'm going to add to the description of the video anyway, but just to give you an idea of other things you can do: there is a recent collaboration between Hugging Face and SageMaker that is very useful, robust, and secure, and I also added some documents and repositories for deploying your very own web application for machine learning.

So with that, I'm going to change the focus to the second part of the presentation. We now want to explore the idea of building your own machine learning solution. We want to build the same thing: create the heads-versus-tails classifier, remove the background, and visually search for these images in a collection. How can we do it? Let's focus on the first one: image classification. For that, I'm going to show you SageMaker Studio, an end-to-end platform for machine learning from AWS. I'm not going to go into a lot of detail about what each piece does, but I want to show you that there is something called JumpStart. What is that? Basically, when you click here, let's take the first one, a popular image classification model. That's exactly what we want to do: build an image classifier. Let's do some more exploration. When we explore it, you see that, for example, I particularly like this architecture, EfficientNet, which has very good performance, and you can see different versions of it available out of the box. Some of these are feature vector extractors; we'll get to why that is important in a second, so just keep them in mind for now. What we want to choose here is the biggest variant, the B7, the most performant one, and we want to use it for our model. Once you have selected this model, you can either deploy the version that is available without any changes, since this model has been trained on ImageNet, or fine-tune it, which we'll look at next. So let's go back for a second. What is this? JumpStart is a repository of solutions and models that you can quickly deploy with one click.
In this case, we want to look at vision models, and we want to solve the image classification task. We also see the dataset that each model has been trained on, and we know whether the model is fine-tunable or not; in this case it is, the same as the model we had here. So how can you fine-tune it? Well, you just go here to "Fine-tune model", you choose the data source, you find your S3 bucket, choose it, choose the directory name where you have your data, and then you choose the instance that you want to use to train and the parameters you want to use, and you train it. Once it is trained, you can deploy it as an endpoint and use the model for inference. So how should you organize your data? You would have your input directory, the S3 bucket we were talking about before, and in there you will have two folders: the first one will be the obverse, with your example images inside, and the second one the reverse, with its examples (a rough sketch of this layout follows below). With that, you don't have to do anything else; you can train the model directly from this screen. Once it is trained, you can deploy it, and what you're going to see is something like this. You can see that it takes maybe around 10 minutes to deploy, or even less; this is using a CPU instance in this case, and you don't have to worry about choosing GPU or CPU, you can use either. You then have an endpoint, and you have a notebook that shows you how to use this endpoint. In this one you see that we have two pictures. In this case we are using the original model, so the only thing it has to do is pick up that this is a cat and a dog, and you can see here the top five model predictions, tabby, et cetera. If you were using your own model, these classes would have been obverse and reverse.

So with that, let's move to the second model we want to build. We've finished an image classifier, and now we want to move to a segmentation model: we want to remove the background. You can use your own segmentation model, or you can check other solutions that are open and available. This is a website that I really like, Papers with Code. The task we want to solve for is saliency detection, and you can see that you also have things like U2-Net available here, which is very successful at detecting the background and removing it, that is, choosing the most important object in the image and removing everything else. This is also something you can use with SageMaker and then deploy as an endpoint. Because we want to build something very custom, we are going to go a different route and use an open dataset, in this case the Open Images dataset. We're going to look for the class "coin", but it could be other things as well, and you can see that the segmentation masks are available here. We are going to use these segmentation masks to train our model. For this, we're going to use this repo; I'm going to put it in the links that are available in the presentation, and it will be made available in the description of the video. I'm just going to walk you through some of the steps you would do here. We're going to use a custom library called IceVision. This is built on top of PyTorch, and it sits on top of PyTorch Lightning and also fastai, so it uses both of them for training.
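Coming back to the classifier for a moment, here is a rough sketch of the obverse/reverse data layout and upload step described above, using the SageMaker Python SDK. The bucket, prefix, and folder names are placeholders rather than the project's actual values.

```python
import sagemaker
from sagemaker.s3 import S3Uploader

# Expected local layout for the heads-vs-tails fine-tuning data
# (one sub-folder per class; the folder names become the class labels):
#
#   coins/
#   ├── obverse/   -> obverse_001.jpg, obverse_002.jpg, ...
#   └── reverse/   -> reverse_001.jpg, reverse_002.jpg, ...

session = sagemaker.Session()
bucket = session.default_bucket()          # or your own bucket name
train_data_s3 = f"s3://{bucket}/coins-classification/"

# Upload the folder structure to S3; this prefix is what you point the
# JumpStart "Fine-tune model" screen at as the data source.
S3Uploader.upload("coins", train_data_s3)
print("Training data uploaded to:", train_data_s3)
```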
At the same time, IceVision has many, many algorithms available out of the box, for example Faster R-CNN or Mask R-CNN, and Mask R-CNN is the one I'm going to be using for training in this session. The first thing we want to do is download that dataset and all the images, but only for the class "coin". There are more than 600 classes available, things like person, piano, et cetera, but we only want to use this one and train the model on coins. So first we download the data, extract the images and the segmentation masks, and save them locally, and then we convert the annotations, because originally they use one format and we want to move to another: we go from the Pascal VOC format to COCO, Common Objects in Context. Once we do this, we upload the data with one line of code to Amazon S3, which is our object storage, and we define what resources we want to use for training. In this case we want a GPU instance, so we use an ml.p3.2xlarge. I'm not going to use Spot Instances here, but you can think of Spot as a way of going into an auction for unused compute capacity: you bid for this unused capacity, and normally the savings range from 60% to 90%, so whatever the on-demand price is, you pay 60% to 90% less than that. The only caveat is that, because you are bidding for these resources, once someone wants to use on-demand resources your capacity will be taken away and given to them, so effectively your training will stop. The good thing is that all of this is already taken care of on AWS, and you are saving checkpoints as you go, so if your training suddenly stops, once the compute capacity becomes available again you can pick up one more time from where you left off. I would recommend using this, because with only three lines of code you can save maybe 60% to 90% of the cost.

Once we have set up that configuration, we create something called an estimator. We take our training script, which is this one, and the source directory where everything is; in this case it's only two files, requirements.txt and train.py, and we pass arguments and parameters to this training job. What this is effectively going to do is spin up a new container, different from what you're seeing here, on another instance used only for this task, and you only pay for the amount of time you've been training, not more. So you can see that we create this estimator and then we fit it to the data we had; the inputs are the data we downloaded and then uploaded to S3. After some time this will finish and tell us that it was successful. Of course, you can also track this in the AWS console, where you can see the training jobs and, for example, how long the training took. This one took around 22 minutes, and we were charged for 22 minutes. If we had been using Spot Instances, we would have had a reduction of around 70% of the cost in this case. Once training has finished, we want to deploy the model and run our predictions, and for that we can use this other example, where what I'm actually doing is creating a container and running this model.
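For reference, the estimator setup just described typically boils down to something like the following with the SageMaker Python SDK. This is a sketch under the assumptions above (a train.py entry point with a requirements.txt alongside it, and placeholder bucket paths, role ARN, and hyperparameters), not the exact notebook from the demo.

```python
from sagemaker.pytorch import PyTorch

# Training data previously uploaded to S3 (placeholder path).
inputs = {"training": "s3://my-bucket/open-images-coins/"}

estimator = PyTorch(
    entry_point="train.py",            # IceVision / Mask R-CNN training script
    source_dir="src",                  # contains train.py and requirements.txt
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    framework_version="1.8.1",
    py_version="py36",
    instance_count=1,
    instance_type="ml.p3.2xlarge",     # single-GPU training instance
    hyperparameters={"epochs": 10, "batch_size": 8},
    # The Spot-related settings the talk refers to (roughly "three lines of code"):
    use_spot_instances=True,
    max_run=2 * 60 * 60,               # cap on actual training time, in seconds
    max_wait=3 * 60 * 60,              # training time plus time spent waiting for Spot
    checkpoint_s3_uri="s3://my-bucket/checkpoints/coins/",  # resume point if interrupted
)

estimator.fit(inputs)  # launches the managed training job
```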
In that deployment example you can see all the steps; I don't want to stay on the details too much, and you can explore it in your own time, but I do want to show you the results. You can see that the actual time it takes for a prediction is quite quick, and the quality is quite good: we have the image on the left, and we only want to pick up one coin, so we pick up the one on the right, and you can see how the background has been removed completely and the image is clean.

So with that, and conscious of time, let's move to the last item for today: how can we build a visual search engine? For that, we are going to follow this blog post, building a visual search application with Amazon SageMaker and Elasticsearch. It uses an open-source dataset of fashion clothing images, but of course you can imagine swapping these images for the images that you have, for example coins. What this is going to do is run a convolutional neural network against these images and extract feature vectors, and this goes back to that JumpStart model we were talking about, the feature vector extractor: we can deploy that one directly and not do any type of custom modeling, since the model is right there. Once we have these vectors, we put all of them into Elasticsearch, and then we do something called k-nearest-neighbors search: we look for the images whose feature vectors have the lowest distance to the feature vector of the reference image. If you go through these steps, you will see that clicking "Launch stack" here opens up this screen, where we basically just create the resources needed to run this: it will create an S3 bucket and a SageMaker notebook instance, and then the only thing you need to do is open the notebook that was just created, and you are going to be presented with this repo, which is this one; again, you can find the link in the description of the video. Let's dive right into it. We have this visual image search notebook, and the first thing we want to do is get the training data, almost 10,000 high-resolution images in this case. In your use case, these would be your images, not these 10,000. You can see that the first step is getting this data, doing some transformations, and uploading it to S3; that will be the location we read from when we run the model over the images. Once we have these images, we are going to use a pretrained model that comes included with the Keras library, in this case ResNet-50. But like we were seeing before, instead of doing all of these steps you can just use the JumpStart model we saw earlier: once you deploy it, you will be presented with an endpoint URL, and that is the one you can use for this task. Otherwise, let's continue with this custom implementation. So we take this ResNet-50 and we want to deploy it as an endpoint, and that's what we do now, using this piece of script.
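The heart of that script is a network that returns embeddings instead of class labels. Here is a minimal sketch of the idea with Keras ResNet-50; it shows the general pattern, not the exact code from the blog post, and the file name is a placeholder.

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

# ResNet-50 without its classification head; global average pooling turns the
# final feature map into a single 2048-dimensional feature vector per image.
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(path: str) -> np.ndarray:
    """Return the feature vector for one image file."""
    img = image.load_img(path, target_size=(224, 224))
    batch = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(batch)[0]          # shape: (2048,)

# These vectors are what get indexed in Elasticsearch for k-NN search.
vector = extract_features("example_coin.jpg")  # placeholder file name
print(vector.shape)
```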
That script is the one we are going to use to pick up the model, load it into memory, run all of the images through it, and return only the feature vectors, not the actual labels. So that is what we do here: in this step we deploy the model as a SageMaker endpoint, which normally takes around 10 minutes. You can see that I'm using a CPU instance, and in this case I'm going to deploy only one, so all requests will be routed to a single instance, but you can change that if you want this to be quicker, for example. Here we use an example image, and this is the result that comes back from that input: the feature vectors. Once we have tested that this actually works, we want to build the index. We first get all the images, all the keys of these files on S3, and then we process all of them: we get the feature vectors out of each image and ingest them into the Elasticsearch index. Once that is done, you can see that this is what we are doing here, importing these features into Elasticsearch. The next thing we can do is a test. You see that we have the first image, the query image, and now we say: okay, bring me back examples out of your index, bring me back the most similar-looking images. You can see that we only get back outfits with the same kind of patterns, so they are very similar to each other; and when we use a different method, it is the same result, and you can see what this looks like. In your case, using your own data, this would be presenting one coin as the reference image, the input image, and then returning the most similar-looking items in the collection. The good thing about this example is that it also involves deploying a full-stack visual search application, which is great if you're doing a demo. There are several steps for creating the architecture, but basically, once you run them all, you are presented with an application that looks like this, and I actually have it running locally here. You choose how many items you want returned, then you choose an image, submit your request, and get the results back. Let me show you, for example, here; this is what it looks like. At the end of your experimentation, if you want, you can delete all of the resources that we created, so you finish your experimentation without any extra cost. So with that, I wanted to come back to the original presentation and thank you for staying with us so long. I hope you found this presentation useful. Thank you very much.
...

Nicolas Metallo

Senior Data Scientist @ AWS

Nicolas Metallo's LinkedIn account


