Conf42 DevSecOps 2023 - Online

Supply Chain Attacks: Focused on NPM attacks


Abstract

Supply chain attacks are spreading like no other disease. This talk focuses on the account takeover vulnerability of NPM (JavaScript) packages, prevention techniques, and our scripts. It includes research that scanned over 2.1 million packages for account takeover vulnerability (non-intrusive).

Summary

  • This talk covers supply chain attacks, focused on NPM, and remediation of such attack vectors. Nothing presented in this talk is meant to be illegal, unethical or malicious in any way.
  • NPM packages are used by developers on a regular basis. The maintainers of those packages can push out updates. If a maintainer's account gets hacked, that NPM package could be affected. Last year an attacker was able to take over an NPM library that had 6 million downloads.
  • To measure this NPM attack vector on a global level, we gathered packages from different publicly available sources, used our in-house servers and some scripting, and researched how many of those packages are actually vulnerable. Hassan then presents how we did that and what we found out.
  • Khan: This research was about the account takeover vulnerability. We used multiple technologies and multiple scripts to extract the email addresses from these NPM packages, and from these packages we collected 6.7 million email addresses. In the re-attribution phase we attribute the expired domains back to their email addresses and then finally identify the vulnerable packages.
  • On average, a single email address is used in eleven NPM packages. If one domain expires and someone claims it, it can affect 3,000-plus packages.
  • We did not only research NPM; we also did research on RubyGems. We scraped and extracted the gems, and then we identified dependency confusion vulnerabilities in these gems. 16% of the scanned gems were found vulnerable at that time. How many other packages could be vulnerable right now?
  • So we have talked about the problems of account takeover and dependency confusion. What are the solutions to these problems? MFA has been around for a long time now. SBOM is standing out in the market right now.
  • Anyone who wants to connect with us can scan the QR codes, come to our LinkedIn, and ask questions or simply stay connected. Stay tuned for our upcoming research.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Welcome to our talk. Today we will be talking about supply chain attacks, focused on NPM attacks, and the remediation of such supply chain attack vectors. My name is Danish, and I'm a security researcher who has been working in cybersecurity for the last eight or nine years. We have done cybersecurity research on different open source attack vectors, especially related to NPM, hard-coded secrets and a lot of other such scenarios. We have also been invited to conferences like Black Hat, Sascon and some other global conferences. This is the research we did last year, and we are going to present it to you today. First of all, the disclaimer: nothing presented in this talk is meant to be illegal, unethical or malicious in any way, and we expect the same from you. So keep that in mind. So first of all, supply chain. Traditionally, a supply chain means a network of suppliers, raw materials and manufacturers that produces a final product and then supplies it to the final consumer. That could involve people, entities, information, resources and activities. So this is the traditional supply chain, right? For example, take this delicacy. If you look at it, not only did the chef make it, but different ingredients were involved, whether that was butter, honey, pistachio or other ingredients. If any of those ingredients goes bad, the final product or the final consumer could be affected. The same goes for the car industry. A car is not manufactured in a single manufacturing unit. Different parts get outsourced from different countries, from different factories. It is similar for software. Software is not developed or coded, you could say, by a single in-house team. Dependencies, binaries and other components are involved to prevent the reinvention of the wheel, right?
If we look at the software development lifecycle, it looks like this, and there is a huge involvement of dependencies in it. We will be talking about the risk factors linked to those types of third-party dependencies. There is a famous proverb that a chain is only as strong as its weakest link. And this is a little bit of a meme that is going to be very relevant in the upcoming slides. So the security issues we are going to discuss today are focused on dependencies. There are different types of supply chain attacks and different scenarios or possibilities. First of all, vulnerabilities: if you are going to use third-party code, what if that code has a vulnerability? For example, there was the huge case of Log4Shell in the past, right? If you are using a third-party dependency and that dependency is vulnerable, the vulnerability inherently comes into your own code base, into your own production application. Then we have typosquatting. It means mimicking the name of a trustworthy package to trick developers into trusting a malicious package. Then we have repo jacking: attackers could claim a repository's username when the actual owner changes their name. It's similar to subdomain takeover. Then we have account takeover. The focus of this talk is on account takeover, and we will show you how effective it is and how it works. Then there is dependency confusion, and the effectiveness of this type of attack is that a security researcher was able to breach Microsoft, Uber, Apple and Tesla to make a point, right? And this is another example from the SS, where 500,000 systems were affected because of supply chain attack vectors. And obviously Log4Shell: all of these companies were affected one way or another by Log4Shell.
So, coming to the Node Package Manager. NPM is the world's largest software registry. What is a software registry? A software registry is a platform where third-party libraries, dependencies or code snippets are placed for others to use in an open source scenario. JavaScript has been the most used language for the last nine or ten years according to Stack Overflow. We are going to focus on JavaScript dependencies, NPM dependencies, in this research and this attack vector. Let's take an example of a package that lives on NPM. This is a very famous NPM package called Express. You can see it has 31 dependencies, that is, Express depends on 31 other packages, and these are the dependents, the packages that depend on Express. So what does that essentially mean? We will have a visualization of that in the upcoming slides. NPM packages are used by developers on a regular basis, obviously. And there are maintainers of those packages who can push out updates. What that essentially means is that any package, third-party dependency or code that lives on NPM has a maintainer, an open source contributor, or several of them, who can push out updates and so on. This is a screenshot from NPM from last year, and we can see how many packages there were and that the download numbers were huge. So you can get an idea of how widely NPM dependencies are used in the real world. Let's take the same package, Express, and visualize what the web of dependencies looks like. Express depends on a lot of dependencies, and those dependencies depend on other dependencies, and so on. So if you are using a dependency, you are not only depending on that one; you are essentially depending on a lot of other dependencies.
And that is basically a supply chain: if any one of those goes bad, you are going to be affected. So let's move forward. As I've already mentioned, there are maintainers on NPM. What if the accounts of those maintainers get hacked? Take the example of Facebook. There are pages on Facebook. What if the admin of a page gets hacked, right? Obviously that page would be affected. It is similar on NPM: if the account of the maintainer of an NPM package gets hacked, that package could be affected, right? If we look at the possibilities, there are two common ones. What if their email addresses can be taken over? We will get to that in the upcoming slides. And what if their passwords are leaked in some breach? In both cases, an attacker could obviously take over the account, then pivot and tamper with the code. So moving forward, this is a little bit of a workflow. A package is maintained by maintainers, and those maintainers can make changes, as already mentioned. Maintainer accounts are linked to an email address, like social media and other accounts. On NPM, accounts are linked to email addresses, and those email addresses are linked to a domain or a mailbox. So what if these domains expire, for example? If the maintainer or developer is using some custom domain, what if that custom domain expires? We will look into the possibilities. Last year an attacker was able to take over an NPM library that had 6 million downloads, which makes the point of how significant the takeover of a maintainer email or a maintainer account is for the security of the supply chain. So let's take an example of a package that has 36,000 dependent projects. That package is on a software registry, which is NPM. The NPM account has a maintainer email address, and that email address has a domain. The most common ones are Gmail and the like.
But look at the custom domains. What if that domain expires? Obviously, an attacker could take over the expired domain and then the email address, reset the password on the software registry, NPM, and take over the package. Then those 36,000 projects can be affected. How would the attacker actually do that? The attacker would look at the maintainers of the package, pull out all of the maintainer email addresses that are available on NPM, and then look into the WHOIS data for all the domains of those maintainers. If any domain is expired, he would just buy that domain, claim the mailbox, use the forgot-password flow on the NPM registry, and publish malicious updates of those packages to affect anyone who is using them, right? Last year that attack was at its peak, there was a boom of that attack, but there were no defensive strategies, manual or automated, wherever you googled. So we were able to be in the spotlight and spread awareness of how to find NPM dependencies that are vulnerable to account hijacking, so that you can secure your ecosystem from that type of attack, right? So how can you prevent those manually? You can do it by listing all the packages that are used in your company. You could use the package-lock.json file and similar files to pull out all the packages that are being used in your project. Then, for each package, run the command npm view [package name] maintainers.email, maybe on your personal computer, to find the maintainers of those packages one by one. If you do that manually, obviously that is not efficient. But then you can take all the email addresses, separate out the domains, and look up the WHOIS data of all the domains of the maintainers of the NPM packages that you are using in a project. Right.
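The first and last manual steps described here (listing the packages from a lock file, and separating the domains out of the maintainer email addresses) can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical and not taken from the speakers' tool, the lock file is assumed to follow the standard npm package-lock.json layouts, and the email addresses are assumed to have been collected separately, for example with the npm view command mentioned in the talk.

```python
def package_names_from_lockfile(lock: dict) -> set[str]:
    """Collect every package name listed in a parsed package-lock.json."""
    names: set[str] = set()

    # lockfileVersion 2 and 3: keys look like
    # "node_modules/a/node_modules/@scope/b"; the root project is "".
    for path in lock.get("packages", {}):
        if "node_modules/" in path:
            names.add(path.rsplit("node_modules/", 1)[1])

    # lockfileVersion 1: nested "dependencies" objects keyed by package name.
    def walk(deps: dict) -> None:
        for name, info in deps.items():
            names.add(name)
            walk(info.get("dependencies", {}))

    walk(lock.get("dependencies", {}))
    return names


def email_domains(emails: list[str]) -> set[str]:
    """Return the lower-cased domain of each well-formed email address."""
    domains: set[str] = set()
    for email in emails:
        local, sep, domain = email.strip().rpartition("@")
        if sep and local and "." in domain:
            domains.add(domain.lower())
    return domains
```

In practice the lock file would be loaded with json.load, and the resulting domains would then be checked against WHOIS data, which this sketch leaves out.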
And then identify the vulnerable ones. But that's not efficient. So we worked out automated ways, because typically hundreds of packages are used by a single organization on a project. It's preferable to have some kind of cron job and automation to do that on a regular basis, and not just copy-paste a command and do it manually one by one. So we scripted a mini tool that you can use in your pipeline to look for takeover-prone NPM packages in your code base, so that you can get rid of them, turn off auto-updates, or be vigilant. You can install that automation script from here and use it. You can use this command after adding your packages to the packages text file, and then use it to find the vulnerable ones. So now, getting to the real jewel: we wondered how big the effect of this NPM attack vector is on a global level. So what we did was gather packages from different publicly available sources, essentially all of the NPM packages that were available at that time. Then we used our in-house servers, did some scripting and did some research to find out how many of those packages are actually vulnerable. We are talking about millions of packages, gathered from different available sources. So now Hassan will come and present how we did that and what we found out. And that's the most exciting part of this talk. Hassan, can you please come? Yeah, sure. Danish, those were really great insights. Now let me share my screen so we can jump into the research. Danish, can you give me permissions so that I can share my screen? Yeah, sure. Yes. Now you can do that. One second. Okay. Danish, can you see my screen? Yes, it's perfect. Okay. Amazing. So, yeah, as Danish mentioned, we are going to focus on the at-scale research that we have performed.
So before I jump into the research, I just wanted to quickly introduce myself. I am Khan, and I've been a security researcher and security engineer. I have multiple CVEs under my name. I've had the chance to present supply chain attacks and this research at multiple conferences like Black Hat, Sascon, DevSecCon and other conferences as well. And I really love to perform mass scans at scale. This is a QR code for my LinkedIn if you want to connect. So, yeah, about this research. Initially, we started by collecting all of the NPM packages. At that time there were 2.1 million NPM packages available on the NPM registry. We used multiple technologies and multiple scripts to extract the email addresses from these NPM packages, because this research was about the account takeover vulnerability, as Danish explained in his previous slides. When we performed the extraction, we came up with 6.7 million email addresses. This is just a graphical representation that shows step one: we collected packages, and from these packages we collected 6.7 million email addresses. This is the Python script that was used, and it's publicly available. It uses the NPM public API to extract the email addresses from the packages. Of course, when we extracted the email addresses, it was obvious that multiple packages were being maintained by a single person with a single email address. So we started sorting out the email addresses, and we came up with about 600k unique email addresses. So in this representation, you can see that we collected packages, then we extracted email addresses, and from the email addresses we collected the unique ones. Then we had to look for expired domains, because to take over an account you have to claim the expired domain first and then go to the NPM registry.
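The extraction and deduplication steps can be illustrated with a small Python sketch. It is not the speakers' published script. It assumes the public registry endpoint at registry.npmjs.org and reads only the "maintainers" field of each package metadata document, while the talk does not say which fields its script used. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

REGISTRY = "https://registry.npmjs.org"  # public npm registry


def fetch_metadata(name: str) -> dict:
    """Download the registry metadata document for one package (network call)."""
    # Scoped names such as "@scope/pkg" need the slash percent-encoded.
    url = f"{REGISTRY}/{urllib.parse.quote(name, safe='@')}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def maintainer_emails(metadata: dict) -> set[str]:
    """Extract normalized maintainer email addresses from one metadata document."""
    emails: set[str] = set()
    for person in metadata.get("maintainers", []):
        if isinstance(person, dict) and person.get("email"):
            emails.add(person["email"].strip().lower())
    return emails


def index_by_email(metadata_by_package: dict[str, dict]) -> dict[str, set[str]]:
    """Map each unique email address to the set of packages that list it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, metadata in metadata_by_package.items():
        for email in maintainer_emails(metadata):
            index[email].add(name)
    return dict(index)
```

The number of keys in the returned index corresponds to the unique email count, and the size of each value shows how many packages depend on a single mailbox.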
We extracted all of the domains from these email addresses, and upon finding the unique domains we came up with about 132k domains. When we started looking into their expiry, we used multiple resources, including APIs and WHOIS lookups. We came up with 675 domains that were actually expired across the whole NPM registry. From this perspective of the research, we can see that we started with 2.1 million NPM packages and now we are down to only 675 domains. And let me add one more thing here. This research is a two-way research. In the first phase, we go from packages down to expired domains. Then, when we start the re-attribution, we attribute those domains back to their email addresses, and finally we identify the vulnerable packages. And special thanks to one of my colleagues, Yelp, for helping us find the expiration of several domains and define the procedure. Yeah, this is the whole procedure that we followed for the extraction of the expired domains, as you can see. So now we are on to the reverse part, we can say from domain attribution to email attribution, and we found that there were 845 separate unique email addresses that were used with these expired domains. This is just the graphical representation of the complete process: we started with the packages, went down to the expired domains, then started the research from the back and attributed those domains to their email addresses, and now we are on the path of attributing them to their vulnerable packages.
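The two-way process described here, forward from packages to expired domains and then back from expired domains to emails and packages, can be sketched as follows. The sketch assumes the expiry dates have already been obtained from WHOIS or a similar source and are passed in as a plain dictionary. That input format and the function names are assumptions for illustration, not the procedure the speakers used.

```python
from datetime import date
from typing import Optional


def expired_domains(expiry_by_domain: dict[str, Optional[date]], today: date) -> set[str]:
    """Return domains whose known expiry date lies before `today`.

    Domains with an unknown expiry date (None) are skipped, not flagged.
    """
    return {
        domain
        for domain, expiry in expiry_by_domain.items()
        if expiry is not None and today > expiry
    }


def vulnerable_packages(
    packages_by_email: dict[str, set[str]], expired: set[str]
) -> dict[str, set[str]]:
    """Re-attribute expired domains to packages.

    Returns a mapping from package name to the maintainer emails on that
    package whose domain is in `expired`.
    """
    result: dict[str, set[str]] = {}
    for email, packages in packages_by_email.items():
        domain = email.rpartition("@")[2].lower()
        if domain in expired:
            for package in packages:
                result.setdefault(package, set()).add(email)
    return result
```

Keeping the email-to-packages index from the first phase is what makes the reverse step cheap: only the few emails on expired domains need to be looked up again.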
So before we jump into the conclusion of how many vulnerable packages we identified, let's look at some fun and very impactful stats. If we divide the total number of email addresses by the number of unique email addresses, we get eleven. This means that on average a single email address is used in eleven NPM packages. Here in this screenshot, let's look at the first sample. We can see it's like 3,800, and then we have an email in front of it. This means a single email address has been used in that many packages. So just imagine: if the domain of this email expires and someone just claims it, it's going to affect 3,000-plus packages. And if you go to the last number, you can see we have 9,000-plus packages on a single email address. This number is huge. So just imagine how much impact one expired domain can have on NPM packages. Another quick bit of math: if we take the eleven that we calculated before and multiply it by 845, which is the number of unique email addresses we found, we come up with a number like 9,499. This represents the estimated total of vulnerable packages. But when we did the actual research, we found that the total number of vulnerable packages was 2,843, which is a really small number. Again, we started with a huge number, 2.1 million, and now we have come down to 2,000 and something. If you look at it from this perspective, you might be thinking, okay, this research has no impact, the number is very low. But now let me show you some really good stats that show how impactful this research is. If we look at the total packages, we had about 2,800. And if we look at the dependent repos, we can see there are about 250k dependent repos.
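The quick math in this part of the talk can be reproduced with the rounded figures that were spoken. Because these inputs are rounded, the product comes out somewhat lower than the 9,499 quoted on the slide. The assumption here is that the slide used unrounded counts, which the transcript does not give.

```python
# Rounded figures quoted in the talk.
total_emails = 6_700_000           # email addresses extracted from all packages
unique_emails = 600_000            # unique email addresses
emails_on_expired_domains = 845    # unique emails tied to expired domains
actual_vulnerable = 2_843          # vulnerable packages actually found

# Average number of packages that a single email address appears in.
average_packages_per_email = total_emails / unique_emails

# Naive estimate: every affected email maintains an average number of packages.
estimated_vulnerable = round(average_packages_per_email) * emails_on_expired_domains  # 11 * 845 = 9295

# Share of the naive estimate that the real scan confirmed.
confirmed_share = actual_vulnerable / estimated_vulnerable

print(f"average packages per email: {average_packages_per_email:.2f}")
print(f"estimated vulnerable packages: {estimated_vulnerable}")
print(f"confirmed share of estimate: {confirmed_share:.0%}")
```

The gap between the estimate and the measured 2,843 is expected, since emails on expired custom domains need not maintain an average number of packages.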
These packages have 250k dependent repos. And as we have discussed, every package has multiple dependencies and multiple dependents. So if one package is affected, it's going to affect other ones as well. If we look at the dependent packages cumulatively across all of these vulnerable packages, we come up with 93k. And if we look at the forks and contributors, the numbers are really astonishing. For forks, we come up with 400k, which means there are 400k people who have actually cloned these vulnerable packages. When you fork something, it gives you an idea that the code might get used on other users' computers. And if you look at the number of contributors, you can see 50k people are actually contributing to these packages. So the number is huge. If you look at the vulnerable packages, the number is 2,843, which is really small. But look at the impact: how many forks, how many contributors, how many dependents these packages are actually affecting right now. So yeah, this is really huge. There are millions of downloads happening on single NPM packages. If we just look at other packages, like the NPM package and maybe the Express package, security packages. You guys see many? Some. Hassan, can you hear me? Yes, I can hear you. Hello, can you hear me? Can you hear me? Yes, I can. Can you hear me? Hello, guys. So hold on for now. Hassan will be joining us in one or two minutes, and then he will continue with the remaining part. So no worries. Welcome back, Hassan. Yeah, Danish, can you hear me? Yes, that's perfect. Now let me share my screen once again so we can continue. Danish, can you see my screen? Perfect. This is the slide we have to continue from, right? Yeah. Okay. Amazing. So, yeah, we were talking about the impact of this research.
As you know, in this research we extracted the email addresses. But email addresses can also be found in many places, like dark web dumps or data leaks. So what if these emails have been in data breaches and their credentials have been leaked? Just imagine if these leaked credentials are actual NPM credentials or GitHub credentials. So, yeah, the impact is really huge here. We did not only research NPM. We also did research on RubyGems. We extracted all of these Ruby gems. Initially there were 160k. For this research we did not just download all the gems; we did something different. We started scraping the packages that are publicly available on the Internet. For this, we used multiple resources like GitHub, Bitbucket, GitLab and others, and scraped all the publicly available gems from the Internet. This was the process we used: we scraped, we extracted the gems, and then we identified dependency confusion vulnerabilities in these gems. This research has a very tricky part, because for the extraction and the identification of dependency confusion we used multiple scripts, which are linked below, and several techniques, and we created a vulnerable Ruby gem as part of the dependency confusion research. Once we had this script, the hardest part was to exfiltrate the data from the vulnerable gem. We used multiple techniques: we used Burp Collaborator with, as you can see, the nslookup, whoami and hostname commands. We extracted as much information as we could for further exploitation of the packages. And the fun part of this research: we tested just a very small chunk of gems. About 1,700 gems were scanned, and out of these we found 285 gems were vulnerable, which means 16% of the gems were found vulnerable at that time.
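One common way to spot a dependency confusion candidate is to check whether a gem name referenced in a project exists on the public registry at all. A name that does not exist publicly could be registered by anyone. The sketch below illustrates that idea only. The talk does not spell out the speakers' detection criteria, so this check is an assumption, and the function names are hypothetical. The public lookup uses the RubyGems.org gem information endpoint, which answers 404 for unknown names.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Iterable

# Matches lines such as:  gem 'rails', '~> 7.0'   or   gem "my-internal-gem"
GEM_LINE = re.compile(r"""^\s*gem\s+["']([^"']+)["']""", re.MULTILINE)


def gem_names(gemfile_text: str) -> list[str]:
    """Return gem names declared in a Gemfile, skipping commented-out lines."""
    return GEM_LINE.findall(gemfile_text)


def exists_on_rubygems(name: str) -> bool:
    """Check whether a gem name is registered on RubyGems.org (network call)."""
    url = f"https://rubygems.org/api/v1/gems/{name}.json"
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def unclaimed_gems(
    names: Iterable[str], exists: Callable[[str], bool] = exists_on_rubygems
) -> list[str]:
    """Return names that are not registered publicly and could be claimed."""
    return [name for name in names if not exists(name)]
```

The exists parameter is injectable so that the logic can be exercised offline with a stub, and so that a cached or rate-limited lookup can be swapped in for large scans.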
So just imagine: there were 160k gems in total and we scanned only 1,700. Just imagine how this number will go up and how many other packages could be vulnerable right now to a dependency confusion attack. This is the script that we created and used for the identification of the dependency confusion vulnerability. Yeah. So, another tool for another problem. For gems, we created this tool called Gem Scanner. If you have dependencies or gems in your code, you can use this tool. It will identify vulnerable packages and outdated packages and print them on the terminal. Excuse me. This is just an example of the output of the tool. As you can see, some gems are labeled as already on the latest version, and for others the current version identifies that we have to update these gems. So we have talked about the problems of account takeover and dependency confusion. What are the solutions to these problems? First of all, MFA. MFA has been around for a long time now. NPM and even RubyGems maintainers are implementing these types of protections. They started enforcing MFA for the top packages, and now they are extending it to other packages as well. As for other solutions: we have to keep an eye on the latest updates of the packages. We have to keep an eye on the CVEs and the latest security patches for these dependencies. We have to perform manual audits and use automation, even in CI/CD pipelines, to protect our infrastructure from this third-party code. What I prefer is to validate the checksums of these packages, so you know exactly what code you are importing into your own code. And of course we have to mature our CI/CD pipelines and our pre-commit checks. We have to secure the development lifecycle as well. And these are some solutions that can help you.
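The checksum validation mentioned above can be illustrated with the integrity strings that npm lock files record for each package, which follow the Subresource Integrity format of an algorithm name, a hyphen, and a base64 digest. The sketch below is a minimal check under that assumption. The function name is hypothetical, and legacy sha1 entries are deliberately treated as non-matching.

```python
import base64
import hashlib

# Digest algorithms accepted by this sketch.
ACCEPTED = ("sha256", "sha384", "sha512")


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Check bytes against an integrity string such as 'sha512-BASE64DIGEST'.

    The string may hold several space-separated entries; one match is enough.
    """
    for entry in integrity.split():
        algorithm, sep, expected_b64 = entry.partition("-")
        if not sep or algorithm not in ACCEPTED:
            continue
        digest = hashlib.new(algorithm, data).digest()
        if base64.b64encode(digest).decode("ascii") == expected_b64:
            return True
    return False
```

A pipeline step could download each package tarball, call matches_integrity with the value recorded in the lock file, and fail the build on any mismatch.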
For example, if you are using Ruby infrastructure, Ruby on Rails MVC, you can use Dependabot with GitHub, you can use bundler-audit, and you can use Brakeman to identify vulnerabilities in dependencies. If you are using Node.js, you can use npm audit, NodeJsScan and Retire.js to check integrity and find vulnerabilities in dependencies, et cetera. Other tools can obviously be used depending on your infrastructure. And if you are looking for a commercial solution, we know that SBOM is standing out in the market right now. It's really at its peak, and it's definitely worth having an SBOM in your organization to protect you from such attacks. This is one of the SBOM solutions that has been really good in the market and can be used on your own code as well for protection from these open source attacks. I was reading the news and I came to know that it was made compulsory to have an SBOM solution in your own company, and this act was made by none other than Joe Biden. So, yeah, I think that's pretty much it about this research. If you have any questions, you can reach out to us on LinkedIn or any other platform you like. Yeah, so that's all from my side. So, Hassan, can you move to the last slide? If anyone wants to connect with us, they can just scan the QR codes. The last slide, after this one. After this one? There's no slide after this one. Okay, I just updated that on my slides. Let me just recheck that. Okay. Let me just share the screen for the sake of the audience, so that if they want to connect with us they can do that easily. So if anyone wants to connect with us, they can just scan these QR codes, come to our LinkedIn and ask any questions, or simply stay connected. If you scan the QR code on the left, you will go to my LinkedIn, and if you scan the one on the right, you will go to Hassan's LinkedIn. Okay, Hassan, do you have anything to share after this one?
Actually, I do. If anyone is interested in our upcoming research or the research that we have already done: we did at-scale scanning for hard-coded secrets, which included AWS private keys, Stripe private keys and a lot of other private credentials in the open source landscape. That included WordPress plugins and NPM packages. We scanned all of those packages and plugins for these types of secrets to find the mistakes developers make when they publish hard-coded secrets in public code. So, yeah, stay tuned, and you may be able to see our research someday at another conference or maybe this one. Yeah, this is all from my side. Same. So, guys, it was really nice to have this session with you, and I hope you found it really productive and insightful. And again, if you have any questions, feel free to reach out to us. Thank you.

Danish Tariq

Security engineer & Security Researcher

Danish Tariq's LinkedIn account

Hassan Khan Yusufzai

Hassan Khan Yusufzai's twitter account


