Conf42 JavaScript 2023 - Online

Building a Scalable Multi-Tenant Frontend Architecture for an E-Commerce Platform

Abstract

Dive into the world of multi-tenant frontend architecture with us. Discover how we built a secure, compliant e-commerce platform using React and Next.js, and gain practical insights for your own development journey.

Summary

  • Guilherme is the CTO and co-founder at MerCloud, currently based in London. MerCloud develops an e-commerce platform that specializes in the B2B market, working with companies across different industries to help them provide a digital sales channel.
  • A B2B e-commerce scenario is one where the goal is to establish a long-term relationship with your customers rather than a one-time transaction. It involves complex topics like customized pricing models based on the customer profile. The volume of these orders is quite big, and they usually involve a multilayered approval process.
  • The first version of the solution was a very traditional React application, built to take advantage of things like server-side rendering and caching at the CDN level. The stack had to be replicated once per customer, which caused high costs, and as the architecture grew it was harder to keep it fully automated.
  • A multi-tenant architecture is one where a single instance of your application serves multiple customers. This model helps you maximize resource utilization efficiently. Data isolation is something you need to consider when building such a solution.
  • To develop it, the team chose Next.js, which offers a great developer experience with zero config in a matter of minutes. To deploy it, they chose Vercel, which helps them stick to a serverless-first approach.
  • Next.js middleware allows you to run code before a request is completed. You can modify the request and the response by rewriting, redirecting, or simply modifying the request headers. If you make any API fetch requests here, you should cache the response.
  • Next.js uses a file-system-based router where folders are used to define the routes, and special notation defines dynamic route paths. This router is simple to use and allows caching as well, but it can also lead to mixing cached resources from multiple tenants.
  • Next.js can be used to create multi-tenant applications, identifying multiple tenants based on a domain even when running on localhost. Feel free to use the demo repo and raise pull requests with improvements.
  • The first lesson learned from this journey: always look into adopting tools and technologies that help you focus on business value. Experience and performance are the main things you need to consider for your users.
  • And that's the end of the session. Hope you enjoyed it. Please feel free to reach out on social media, and check out the website at mercloud.io.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Thanks for joining the session today, where we'll be talking about the journey we had at MerCloud of implementing a multi-tenant frontend architecture for our e-commerce platform. But let me start by introducing myself. My name is Guilherme. I'm the CTO and co-founder at MerCloud, currently based in London, and I've been working for 17 years in the industry, most of those as a software engineer. If you'd like to reach out to me or simply follow the content I'm always sharing online, you can find the links to my social media on this slide. Also, let me tell you about MerCloud and what we do here. We develop an e-commerce platform that specializes in the B2B market and handles all the complexities of sales between companies. We work with companies across different industries to help them provide a digital sales channel to their customers, where those customers can access the catalog of products and also make and manage their orders. If you're interested in knowing a bit more about our company and the product we built, you can check that out on our website, mercloud.io, or you can reach out to me on my social media. All right, let's start by understanding what exactly a B2B e-commerce is. When we talk about e-commerce, the first thing that comes to mind is the traditional B2C model. I'll give you an example: I want to buy a product, so I go to an online shop, browse some products, compare different options and prices, add to my basket, make the payment, and after a few days I get it delivered at my doorstep. In a B2B scenario, the actor behind the purchase is a person acting on behalf of a company. Let's say you have a store where you sell electronics, so you need to constantly reach out to your suppliers to buy products to refill your stock as you sell your products. That's a B2B e-commerce scenario, where the goal is often to establish a long-term relationship with your customers rather than a one-time transaction.
If I go online and buy a television, I won't be coming back to that same online store to buy another television for a couple of years. But if I'm refilling stock for my store, I'll be using it very often. In B2B e-commerce, we need to handle complex topics like customized pricing models based on the customer profile. The profile here can be the geolocation or the customer tier; you might have a VIP customer to whom you apply special pricing. There are also complex tax regimes, where the policies and rates depend on the product and on the customer who is buying. A product might have a reduced VAT rate, for example, or it might have extra taxes applied on top. And because we're talking about refilling stock, we're talking about big transactions: the volume of these orders is quite big, and they usually involve a multilayered approval process, where people need to approve each transaction in the chain. And who exactly are the customers of such a solution? To understand who they are, first we need to understand what the product supply chain looks like: from getting the raw material to the manufacturer, then distributing the products to retailers and wholesale suppliers, who will then sell these products to the final customers. All the transactions we see here before the final one are B2B transactions, and this is where B2B e-commerce is used. The final one is the traditional B2C transaction we already know, which is a different type of e-commerce from the one we're talking about. All right, so let's see where we started. The first version of our solution was a very traditional React application, where we wanted to take advantage of things like server-side rendering and caching at the CDN level. The architecture of our MVP looked like this: we had it deployed on AWS, the React application was running on a Fargate cluster, and in front of the load balancer we had a CDN.
We also had our static assets, not only JavaScript but also product images, hosted on S3 so we could serve them to users. That was a quite simple architecture, but there were constraints. We had to replicate this stack once per customer, and that was causing high costs. Each tenant had to have their own infrastructure with their own resources, and each of those resources had to be sized to accommodate the demand of that particular customer. This also made our onboarding process rather complicated: not only deploying a new stack per customer, but also configuring it. And this was getting harder over time. As the architecture grew, it was harder to keep it fully automated, and it was also challenging to deliver new features with a very slow deployment process, because we had all these stacks to maintain and update. On top of that, it was difficult to monitor, and we had some poor performance numbers. Even though this was not perfect, it was pretty okay for our MVP and helped us onboard our first customers. But the next step was to think about the next generation of this application and reimagine the architecture. We wanted to rebuild the application not only to modernize it, but also to support the increasing customer base we had. The main points we wanted to focus on in this re-architecture were increasing scalability and reducing management overhead: fewer things to maintain and configure. We also wanted an architecture that would allow quicker deployments and a faster onboarding experience. We also wanted to reduce latency for our customers. We had customers spread across different regions of the globe: many customers in South America and the US, and also a few in Europe. So we wanted the data and the service as close as possible to our customers.
We also wanted to increase observability, to understand better what was going on in our application. The solution to simplify this architecture was to transform it into a multi-tenant one. Okay, what exactly is a multi-tenant architecture? It's an architecture where you have a single instance of your application serving multiple customers, and these customers are known as tenants. This model helps you maximize resource utilization in an efficient way, because you're sharing the same infrastructure across all your customers. Here we have an example comparing the two different models. In the single-tenant one on the left, each of our customers, our tenants, has their own installation, their own instance of the application, talking to their own data. On the right, we have the multi-tenant one, where you can see one single instance of the application serving everyone, while we can still keep the data isolated. And speaking of data isolation, this is a quite complex topic, but it's something very important that you need to consider when building such a solution. You need to pick an isolation strategy. You can start from a fully isolated model, where everything is isolated and nothing is shared between tenants, or you can go to a fully shared one, where you have single instances of, for example, your database or your data lake, shared with all your customers. In the middle, you can also have a hybrid model where you share only some of these resources. You have to pick the best strategy, but this is not based only on your own needs; it will also be based on your tenants' needs, for example compliance and GDPR. So this is something you really need to weigh in order to pick the better strategy. Let's talk about the benefits of a multi-tenant architecture. First, it's cost efficient, because resources are shared among the tenants.
We no longer need to deploy a different stack for each of them. It's also scalable: you can scale horizontally to accommodate the increase in demand, not only from new tenants but also the increased demands of your current ones. You also usually have one single pipeline that handles the deployment of the whole architecture, which allows you to be more efficient and deploy your changes quicker. It also helps with security and compliance, because you end up with a centralized management solution with uniform policies applied across all your customers. This kind of solution also increases developer productivity. Like I said, you have one single pipeline that handles the full deployment, and in most cases you also end up with a single code base, so it's easier to iterate. That also increases business agility: it allows you to adapt to new demands rapidly and quickly launch new features in a short period of time. Okay, so let's talk now about the technology choices we made for this project. To develop it, we chose Next.js. Why did we choose Next.js? Initially, because it has a great developer experience with zero config: in a matter of minutes you can clone a template repository, start coding, and deploy it. That's great; there's not much management involved. It also comes with a simplified routing solution built in, and other tools that help in your day-to-day as a developer, for example hot code reloading. It also comes with rich built-in features that help you with server-side rendering and static generation. Also, because we already had React expertise in house, it was easier for us to stick to the ecosystem and keep using React. Next.js also comes with some performance optimizations out of the box, for example automatic code splitting, which is something you would usually need to do manually with webpack.
It also comes with image optimization and URL prefetching. Not to mention that they also have a big community and great documentation; it's very easy to find resources and examples online. Okay, so we chose Next.js to build it, and to deploy and run it, we decided to use Vercel. The first reason is that Vercel is the company behind Next.js, so we could expect this marriage to take the most advantage of both solutions. But not only that: we also wanted to take advantage of compute at the edge. The global edge network allows us to deploy to multiple locations, and this comes with multiple availability zones and automatic failover out of the box. This way we can ensure that our application runs as close as possible to the geographical location of our users. There's also no infrastructure as code to maintain; it's just code. All you need to do is connect your GitHub repo to Vercel, and then you get automatic deployments with cache invalidation. And my favorite feature is preview deployments. How do they work? Every time you create a new branch on your repo and push changes to it, Vercel creates a new isolated environment, deploys that code, and provides you a temporary URL you can use to validate your deployment and also for testing. Another thing Vercel helps us with is sticking to the serverless-first approach we had at MerCloud. For those who are not familiar with serverless: it's a way to run your application in the cloud without having to manage servers. There are servers, of course, but you just don't need to manage them. You have small units of code, your functions, which are triggered by events, and you only pay for what you use. So if your application is idle because you have few customers, or it's overnight or the weekend, you're not paying for it because you're not consuming. And serverless comes with some good benefits.
It helps you focus on business logic and lets the cloud manage the infrastructure for you. This also increases your team's agility, and not to mention you get automatic scalability: the cloud manages that for you, so you don't need to worry about it. Of course this is not a universal solution, but it works pretty well in many situations. All right, so after the technology choices, we came up with a draft of how our new architecture would look. We can see here that we have a frontend sitting behind the CDN, running as close as possible to the users, and this Next.js application communicates with our API. This API was already built, deployed, and hosted on AWS, so all we had to do was consume it. But before starting to code anything, we had to solve some challenges, and the first one was: how could we identify each of our tenants? It's pretty common to see SaaS products handle multiple tenants by giving each tenant a subdomain. Every time you need to identify which tenant a particular request belongs to, you parse the Host header of the request and simply extract the tenant identification from that domain. But in our case, our customers expose this application to their own customers, and we wanted to allow them to configure and run it with their own domains. So this approach no longer works, because we cannot just parse the URL and extract the tenant identification from it. We also have situations where we need to handle multiple domains pointing to the same tenant. To handle this, we need a mapping table where we can correlate which domain belongs to which tenant. Every time I need to identify which tenant a request belongs to, I can simply do a lookup on this table and get the correlation there. And the way our tenants can configure their custom domains for this setup is by using our admin application.
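The mapping-table lookup described above can be sketched as a small TypeScript function. The table shape, domain names, and tenant identifiers here are illustrative, not MerCloud's actual schema; the point is that several domains can map to the same tenant, and the Host header (which may carry a port) drives the lookup.

```typescript
// Hypothetical shape of the domain → tenant mapping table described in the talk.
type TenantMapping = Record<string, string>;

const mappingTable: TenantMapping = {
  "shop.acme.com": "tenant-a",
  "store.acme.com": "tenant-a", // several domains can point to the same tenant
  "portal.globex.com": "tenant-b",
};

// Resolve a tenant from the Host header of an incoming request.
function resolveTenant(hostHeader: string, table: TenantMapping): string | null {
  // Strip an optional port (e.g. "localhost:3000") and normalize case
  // before the lookup, since Host headers are case-insensitive.
  const host = hostHeader.split(":")[0].toLowerCase();
  return table[host] ?? null;
}
```

In production this table would live in a database or an edge key-value store rather than in code, but the lookup itself stays this simple.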
Once they configure a domain, we save this information in that mapping table I just showed you, and this triggers a routine that configures the custom domain in Vercel using the Domains API. What exactly is this Domains API? If you've used Vercel already and you go to the settings of your project, you'll see there's a tab where you can configure custom domains for your project, and this is the same API we're using in the solution. Once you link a new domain, you need to somehow validate that you own that domain, and the way you do this with Vercel is by creating a CNAME entry in the DNS configuration of the domain; then your traffic is redirected to that project. You might be wondering if this Domains API of Vercel has any limit. If you used it in the past, you probably heard about or faced the issue where there was a limit on the number of domains you could point to a single Vercel project, but this is no longer the case: it's been almost two years now since this limit was removed, so you can now use unlimited domains on a single project. All right, so here we can see visually what happens when a user makes a request to an application. The user types the URL into their browser, and the browser reaches out to a DNS server over the Internet to find out which IP is linked to that web address. Once it knows the IP, it makes the request to the correct server. This is a very simplified overview of the process, and in reality it's a bit more complex than this, but the illustration helps us understand the basic flow of DNS resolution. But in a multi-tenant scenario, we end up with multiple domains resolving to the same IP address, and once our application receives this traffic, we start asking: all right, which tenant does that domain belong to?
Which tenant does the request I just received belong to? To solve this problem, we need to add some intelligence to our application so it can resolve this information and tell which tenant a particular request belongs to. For this, we use Next.js middleware. Middleware allows you to run code before a request is completed. It sits in front of your application, runs at the edge, and you can use it to modify the request and the response by rewriting, redirecting, or simply modifying the request headers. Here's an illustration of how we do this at MerCloud. Our middleware is responsible for extracting the Host header of the request and then making a request to our API. This API does the lookup on that mapping table I showed you before, and once it knows which tenant the domain belongs to, we inject a header on the request. So now, every time my application needs to know which tenant a request belongs to, all it needs to do is check that header. And this is how our middleware implementation looks: you can see here that we extract the host of the request, make that fetch request, and then, if it succeeds, we simply inject the tenant into the header on line 13. One of the questions people usually ask about this is: is it really performant? Is it a best practice to do fetch requests in middleware? Of course, everything you do here adds to the latency of the response, so here's something we recommend: if you're doing any API fetch requests here, you should be caching the response. You can use something like a key-value store at the edge, so you only reach out to the real API if you don't have that information cached. That's a good performance tip I can give you here. All right, the next thing we had to think about was how to do the routing of our application. But first, let's understand how the built-in router of Next.js works.
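The cached lookup recommended above can be sketched independently of the framework. In real Next.js middleware you would read `req.headers.get("host")` and attach the header via `NextResponse`, and the cache would be an edge key-value store; here both are simulated with a plain function and an in-memory `Map` so the caching behavior stands on its own. All names are illustrative.

```typescript
// The upstream lookup: in the talk's setup this is a fetch to the tenant API
// that consults the domain → tenant mapping table.
type LookupFn = (host: string) => Promise<string | null>;

function makeCachedTenantResolver(lookup: LookupFn) {
  // Stand-in for an edge key-value store: we only hit the real API on a miss,
  // so the per-request latency cost of the middleware stays low.
  const cache = new Map<string, string | null>();

  return async function resolveTenant(host: string): Promise<string | null> {
    if (cache.has(host)) return cache.get(host) ?? null; // cache hit
    const tenant = await lookup(host); // cache miss: ask the tenant API
    cache.set(host, tenant); // negative results are cached too
    return tenant;
  };
}
```

Note that unknown hosts are also cached (`null`), so a flood of requests for an unmapped domain doesn't hammer the API; a real implementation would add a TTL so new domain mappings are eventually picked up.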
Next.js uses a file-system-based router, where folders are used to define the routes and files are used to create the UI shown for those route segments. We can also use special notation to define dynamic route paths based on a path parameter, as we can see here with the products; once that is compiled, you get nested routes with a path parameter. This router is pretty simple to use and allows us to do caching as well. The way caching works: a user makes a request to a page, it gets rendered on the server side, and before we return the response to the user, we cache that output. The next time a request is made to that same URL, we can serve the cached content, so we don't need to regenerate the page. But at MerCloud we do something more sophisticated, called incremental static regeneration. The principle is pretty much the same, but you can also set a TTL on the cached response. The next time a user makes a request to that same URL after the TTL has expired, we still serve the old, stale version of the page, but in the background a process is triggered that refreshes the cached content of the page. The next time a user makes a request to that same URL, they'll be served the new version. This works pretty well; it's great. But let's bring this to the context of multi-tenancy. I might have user one here, who belongs to tenant A, making that request, so they get served the old version of the page and the background process is triggered. Now user two comes along and accesses that same URL. The question here: which version of the page will be served to user two? The answer: user two will be served the version of the page that was generated for user one, who belongs to tenant A, a different tenant.
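The stale-while-revalidate behavior described above can be captured in a small decision function. This is not Next.js's internal code, just a sketch of the three cases under assumed names: no cache entry yet, a fresh entry within the TTL, and an expired entry that is still served while a background regeneration runs (in Next.js itself you'd get this by returning `revalidate: 60` from `getStaticProps`).

```typescript
// A cached page render and the second it was generated.
type CacheEntry = { html: string; generatedAt: number };

// Decide how to serve a request under incremental static regeneration.
// Timestamps and TTL are in seconds.
function decide(
  entry: CacheEntry | undefined,
  now: number,
  ttlSeconds: number
): "miss" | "fresh" | "stale-while-revalidate" {
  if (!entry) return "miss"; // render on demand, then cache the output
  if (now - entry.generatedAt < ttlSeconds) return "fresh"; // serve cache as-is
  // TTL expired: serve the stale copy now, regenerate in the background.
  return "stale-while-revalidate";
}
```

The multi-tenancy bug in the talk comes from the cache key: if the key is just the URL path, the "fresh" and "stale" branches can return HTML generated for a different tenant, which is exactly what the tenant-aware routing below the fold fixes.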
So this is not good, because now we're mixing content from two different tenants. They have private data that shouldn't be shared; it should be isolated. But here we run the risk of serving the wrong content of that page to the wrong user. How can we fix this issue? Is there a way to fix it? The first step is to look again at how we structure the routing and think: how could I make each route tenant-aware, so it knows which tenant context it belongs to? The solution here is to add a dynamic path segment at the very root of our router, so every route underneath it is under the context of that tenant. Now I can say it's safe to cache any content, because even if a different tenant ends up with a request to the same URL, I know my content is cached in a different path segment, in a different context. So we avoid mixing cached resources from multiple tenants. This is how the routing configuration looks in the URL. We can clearly see that, as we're adding a new path parameter to the route, the tenant identification now shows up in the URL. And this is not something we really want, because remember, we're giving our tenants the possibility to use their own domains on the platform, so why would we still need to put an identification in the URL? For sure we can improve this. There's quite a long discussion about this topic on the Next.js GitHub that took a while to get an official answer, and the recommendation is to use some sort of identification in the route, like we did. In that example, they suggest using the hostname of the request. This works exactly the same way as what we do with the tenant, because the hostname is a unique identifier for each tenant. And yes, you'll see in that thread that they mention this will be reflected, shown, in the URL. But luckily there's a solution for that.
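The tenant-aware routing described above amounts to one extra dynamic segment at the root of the route tree. A hypothetical layout, using the App Router convention of folders-as-routes (with the older Pages Router the idea is the same, with `pages/[tenant]/...`):

```
app/
  [tenant]/
    layout.tsx                → everything below is scoped to one tenant
    page.tsx                  → /:tenant
    products/
      [productId]/
        page.tsx              → /:tenant/products/:productId
```

Because the tenant is part of the path, `/tenant-a/products/42` and `/tenant-b/products/42` are distinct cache entries, so incremental static regeneration can never hand one tenant's cached page to another.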
We can use URL rewrites to handle that dirty job. The rewrite will be responsible for adding that identifier to the route, but it will also mask the URL presented to the user. So the request is still routed to the correct segment, but the segment is simply not shown in the browser. The way you can do URL rewrites in Next.js is by setting these rules in your next.config file, or you can also use middleware for that if you want. And after we apply these rewrite changes, here are the results: we no longer have the tenant identification in the URL path, but the request is still routed to the correct path segment. I've prepared a quick demo here to show you this working, so let's hope everything works fine. Let me change my screen. We have a repo here with a very simple Next.js application. You can see here in our router we have that dynamic route that represents the tenant, and we have a page here. All this page does is make a request to this time API, which returns the current time, and print it on the screen. So it will print hello, say which tenant the session belongs to, and print the time the page was generated. We also have a middleware here where, based on the Host header of the request, we identify which tenant it is. This is a very dummy example: I'm just checking if we have tenant A or B and setting it, and otherwise, if it can't resolve the tenant, we just set a default tenant. Also, in our next.config file we have the rewrite rules. You can see here that we take pretty much anything on the request and proxy it to a tenant path, with the tenant extracted from this header, x-tenant, which is exactly the header we're setting in the middleware. So if I run this application, go to my browser, and open localhost, you can see here that, okay, I got a hello world, the default tenant, and the date and time this page was generated.
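A sketch of what such a rewrite rule could look like in next.config.js. The `x-tenant` header name follows the demo; the catch-all source pattern and the named capture group in the header matcher are assumptions about how one might wire it up, not the repo's exact code.

```javascript
// next.config.js — rewrite every incoming path into the tenant-scoped route
// tree, using the x-tenant header injected by the middleware. The rewrite is
// internal: the browser URL the user sees stays unchanged.
module.exports = {
  async rewrites() {
    return [
      {
        source: "/:path*",
        has: [
          // Capture the value of the x-tenant request header as :tenant.
          { type: "header", key: "x-tenant", value: "(?<tenant>.*)" },
        ],
        destination: "/:tenant/:path*",
      },
    ];
  },
};
```

So a request for `/products/42` with `x-tenant: tenant-a` is served by `/tenant-a/products/42`, while the address bar still shows `/products/42`.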
And if I refresh, you can see the time is not changing, which proves I'm being served the cached version. But how can I identify multiple tenants here based on a domain if I'm running this on localhost? What I've done on my machine is create two local domains that point to localhost, so I can use them to simulate other domains. So if I access my application with one of them... okay, I forgot to set the port. All right, you can see here that now it's able to identify which tenant the request belongs to based on the domain, along with the time the page was generated. If I do the same with the other tenant, you see now that I have tenant B and the time that its page was generated. So for each of the tenants, including the default, it's a different time that the page was generated, and if I keep refreshing, I'm being served the cached version of the page. In this solution we're also setting a TTL of 60 seconds, so if I come back to this page after 60 seconds and make a request, that background process of regenerating the page is triggered, and if I refresh the browser again, I get a new version of the page. That's what happened here with the first request: we can see it just got updated again. All right, that was the demo. This is a public repo; you can find it at this URL or QR code, and you can use it as a template to create this kind of multi-tenant application. You'll find two branches in this repo. The one I showed you is called using-middleware, where we solve this problem by using the middleware like we do at MerCloud; on the main branch you'll find a solution where we simply use the Host header of the request to identify different tenants, so we don't need a middleware there. Feel free to use this repo and raise any pull requests with improvements; any contribution will be very much appreciated.
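One common way to set up the local domains used in the demo is through the hosts file. The domain names below are made up for illustration; the repo may use different ones.

```
# /etc/hosts (C:\Windows\System32\drivers\etc\hosts on Windows)
# Two local domains resolving to localhost, to simulate tenant domains:
127.0.0.1   tenant-a.local
127.0.0.1   tenant-b.local
```

With the dev server on port 3000, browsing to http://tenant-a.local:3000 and http://tenant-b.local:3000 then exercises the Host-header-based tenant resolution without any real DNS.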
All right. If today you research how to build a multi-tenant application with Next.js, you'll quickly hear about the Vercel Platforms Starter Kit. It's a template for a full-stack Next.js application with multi-tenancy and custom domain support. This is great, but it came a bit too late for us: when it came out, we had already implemented our solution. Even so, we said, okay, let's check it out and see how they implemented it and how they solved the problems we had. We found out that they do domain-based routing, which is pretty much what we do with the tenant, in a slightly different way: they do the URL rewrite using the middleware, while we use the next.config file for that. One of the reasons we don't use middleware for this is that the first iteration of this application was built with Next.js version 10, when middleware wasn't a thing yet, and once we migrated to the latest version of Next.js, we didn't bother refactoring this part of the rewrite. That's the main reason we don't use middleware there. We also found that their solution uses the Vercel Domains API. So we were pretty happy, and we thought, okay, we did a great job here; we didn't do anything completely different from their solution and where we landed. This is a high-level overview of our architecture today. You can see that the frontend is hosted on the Vercel infrastructure. We didn't cover it in this talk, but we handle authentication with Auth0, and the middleware talks to our API on the backend. We have everything hosted on AWS, but we have multiple versions of this API hosted in different regions, for compliance and data isolation for our customers. And we have one API, the tenant API, that runs in a global region, and that's the API the middleware uses to do the correlation between the Host header of a request, the domain of that request, and the mapping to tenants.
All right, the outcomes of our implementation: we got great improvements in performance. We're taking advantage of edge networking and CDN caching, and because we're running at the edge now, we reduced latency a lot; the servers are much closer to our users. We also increased the agility of our dev team: we no longer need to maintain a very complex infrastructure and multiple deployment pipelines, so it's much easier today for us to build and release new features without overhead. It's also much easier for us to onboard new tenants, and this process is fully automated: all we need to do is add a record to the admin system we have, and all the resources required for that tenant are created automatically. And the lessons learned from this journey for us: the first one is to always look into adopting tools and technologies that help you focus on business value, rather than spending days or weeks at the beginning of a project just setting up a very complex infrastructure and code structure. Look into adopting tools that, with very minimal effort, allow you to jump straight into coding and deploy easily. Also, think about your users: they want the best experience, and performance is the main thing you need to consider to achieve this. You want to serve pages as fast as possible to your users, and for this you need to take advantage of things like the Jamstack and incremental static regeneration, like we do at MerCloud. And just be careful with server-side rendering: anything you do there will slow down your page load, because it needs to be re-executed on every request, and it makes it much more difficult for you to cache the response. Also, observability is a must, and it needs to be tenant-aware: create consumption metrics that help you identify who's using what, and how much of it they're using.
So when you're monitoring the health of your application, you can easily identify who's using more resources. And remember, each tenant you onboard will bring with them a different persona and different usage patterns. You'll have very small tenants that don't require much, but also big ones that bring a huge demand to your application, so you want to easily identify who the noisy tenants are. Let's say one of your tenants is under a DDoS attack, and suddenly the performance of your whole application is impacted, affecting other tenants. You want to identify the noisy tenants there, so you can quickly find and mitigate any bottlenecks they're causing. And to wrap it up, a very important recommendation we can give you: don't do early optimization. You'll probably get it wrong and have to redo it later. Use metrics to drive it: first have the problem, and the metrics will tell you where your problems and bottlenecks are; then you can use this information to attack the problem once you have it, instead of trying to guess what your future problems will be. Yes, and that's the end of the session today. Hope you have enjoyed it. Please feel free to reach out to me on my social media and also check out our website at mercloud.io. Thank you very much; it was a pleasure to share this with you.

Guilherme Dalla Rosa

Co-founder & CTO @ MerCloud



