Conf42 Rustlang 2022 - Online

My Blog is Hilariously Overengineered to the Point People Think it's a Static Site

Abstract

Blog engines are a fantastic opportunity for self-expression and overengineering things. This is the story of my blog engine through the years as I have made it faster and faster. I currently serve pages so fast that timing how long it took ended up taking longer than serving the pages in the first place. In this talk, I will go over how and why my blog engine in Rust is so fast and what you can learn from it to make your web applications even faster.

Summary

  • Xe Iaso explains how the blog behind this talk works and why people often mistake it for a static website. It uses a templating engine called ructe that takes a weird meta syntax on top of HTML and then spits out Rust code. The blog is one of the best resources for learning Nix and NixOS.
  • My website is stateless and it farms out its state to a stateful microservice. This allows me to move it around to any server I want very easily. I can also take all this data in RAM and then transform it into whatever kind of feed I want.
  • The biggest thing that you can take away from this is that dynamic web apps can be very fast. I'm going to stick around in the chat to answer any questions I haven't answered already. If you have questions, please speak up. I love answering them.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Speed. Safety. Development experience. Fearless concurrency. These are all things that you associate with programs written in Rust. How about some more buzzwords? Words like elegant? Oh, that's a good one. I'm Xe Iaso and I'm going to share the gory details of how my blog works and why people often mistake it for a static website. Buckle up and kick back. We're going to learn about the Internet today.
I'm Xe Iaso. You've probably seen my blog on that orange website or that other orange website. I also study philosophy and have been writing a novel. I work at Tailscale as the Archmage of Infrastructure and I do developer relations. My blog is somehow one of the best resources for learning Nix and NixOS. This talk will contain opinions about website design and the like. These opinions are my own and are not the opinions of my employer.
Websites are social constructs. There are only servers that speak this weird HTTP protocol and then sometimes spit out a markup language called HTML, if you're lucky. This HTML is then understood by very principled humans or web browsers, and then it all gets transformed into roughly what the writer or designer wants it to look like.
I write all my posts for the blog in Markdown in Emacs. Sometimes I do brain dumping or initial drafting in Apple Notes on my MacBook or iPad, but it all gets into Emacs eventually for publication. This Markdown has some front matter in YAML. This has metadata like tags, the series it is in, stream recordings related to posts, and when it is scheduled to be posted publicly. This is all used by the templates to make sure that I can't forget to put it in articles, but the main focus is on the contents of the post, the words I type out.
In the process, I've organically grown my own custom Markdown dialect on top of a Markdown parser named comrak. comrak is made by a friend and it is the most important part of this website. However, over the years I've found that vanilla Markdown just isn't good enough for my needs.
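As a rough illustration of the front matter idea described above, here is a minimal sketch of splitting a post into metadata and Markdown body. The field names, the `split_post` function, and the flat `key: value` parsing are all assumptions made for this example; the talk does not show the real parser, and a real blog engine would use a proper YAML library rather than this hand-rolled subset.

```rust
/// Metadata pulled from the front matter block (hypothetical field set).
#[derive(Debug, Default, PartialEq)]
pub struct FrontMatter {
    pub title: String,
    pub date: String,
    pub series: Option<String>,
    pub tags: Vec<String>,
}

/// Split a post into its front matter and Markdown body.
/// Returns None when the post does not start with a `---` delimited block.
pub fn split_post(raw: &str) -> Option<(FrontMatter, &str)> {
    let rest = raw.strip_prefix("---\n")?;
    let end = rest.find("\n---\n")?;
    let header = &rest[..end];
    let body = &rest[end + "\n---\n".len()..];

    let mut fm = FrontMatter::default();
    for line in header.lines() {
        // Only flat `key: value` pairs are handled; real YAML is far richer.
        let (key, value) = match line.split_once(':') {
            Some(pair) => pair,
            None => continue,
        };
        let value = value.trim();
        match key.trim() {
            "title" => fm.title = value.trim_matches('"').to_string(),
            "date" => fm.date = value.to_string(),
            "series" => fm.series = Some(value.to_string()),
            "tags" => {
                // Accepts the inline list form: tags: [rust, blog]
                fm.tags = value
                    .trim_start_matches('[')
                    .trim_end_matches(']')
                    .split(',')
                    .map(|t| t.trim().to_string())
                    .filter(|t| !t.is_empty())
                    .collect();
            }
            _ => {} // unknown keys are ignored in this sketch
        }
    }
    Some((fm, body))
}
```

Calling `split_post` on a file's contents yields the metadata for the templates and the untouched Markdown body for the renderer, which keeps the two concerns separate.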
I've grown features on my blog that require fancier things, like the conversation snippets and the newly added AI-generated hero images. At first, I just implemented a hacky Markdown extension. It applied the conversation snippet logic to anything that matched a Markdown link with a weird URL scheme. Unfortunately, that ended up not scaling well as the conversation snippets got more complicated, like when I needed to add links. So I brought in a library called lol_html. I use this to transform my custom HTML elements into a bunch of other HTML using a bank of templates. This allows me to write Markdown with occasional HTML for the things Markdown can't express. Then I can rely on my blog engine to translate those shortcodes to what people see on the site.
In my blog, I use a templating engine called ructe that takes a weird meta syntax on top of HTML and then spits out Rust code. This means that when you load a page like my homepage, you're hitting a function that renders that homepage to a string buffer. That string buffer is what my website throws back into the void, and hopefully it all comes back to you on your end.
As a side effect of doing all this, it happens fast. Really fast. So fast that it's faster than a static website. Turns out serving things out of RAM is very fast. And when I say fast, I mean that I have tried so hard to find some static file server that could beat what my site does. I tried really hard. I compared my site to nginx, OpenResty, Tengine, Apache, the Go standard library, warp in Rust, axum in Rust, and finally a Go standard library HTTP server that had the site data compiled into RAM. None of them were faster, save the precompiled Go binary, which was like 200 megabytes and not viable for my needs. It was a hilarious benchmarking session. I have accidentally created something that is so efficient that it is hard to express how fast it is.
This is efficient and fast, but the syntax of ructe is awful.
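To make the "a page is a function that renders into a buffer" idea concrete, here is a hand-written sketch of roughly the shape such a compiled template takes. This is not ructe's actual generated output; `PostSummary`, `index_html`, and `escape_html` are hypothetical names chosen for the example, and only the standard library is used.

```rust
use std::io::{self, Write};

/// Hypothetical data a homepage template might receive.
pub struct PostSummary {
    pub title: String,
    pub link: String,
}

/// Escape the characters that are unsafe in HTML text and attribute values.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}

/// Render the homepage straight into any writer, such as a Vec<u8> that is
/// then sent back as the HTTP response body. There is no template parsing
/// at request time: the "template" is ordinary compiled Rust.
pub fn index_html<W: Write>(
    out: &mut W,
    site_title: &str,
    posts: &[PostSummary],
) -> io::Result<()> {
    write!(
        out,
        "<!DOCTYPE html><html><head><title>{}</title></head><body>",
        escape_html(site_title)
    )?;
    out.write_all(b"<ul>")?;
    for post in posts {
        write!(
            out,
            "<li><a href=\"{}\">{}</a></li>",
            escape_html(&post.link),
            escape_html(&post.title)
        )?;
    }
    out.write_all(b"</ul></body></html>")
}
```

Because the posts are already in memory and the function only appends bytes to a buffer, a request involves no disk reads and no parsing, which is where most of the speed described above would come from.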
I have to specify the types of my code in the template itself. I have to be sure that the automatically generated template code is importing any of the non-default traits I need. It works, but it kind of sucks. So I've been playing with Maud instead. Maud is a procedural macro library that lets you transform its own domain-specific language into HTML at compile time. You can make your components normal Rust functions. I use Maud for all of my shortcodes, and I've been slowly converting my site over to use it. It's pretty great, you should check it out.
One of the biggest things you see me use these for is the little conversation snippets that I have in blog posts. This was originally created to absolutely dunk on homophobes that were angry that someone put furry art in an information security blog post, but this also lets me experiment with a more Socratic dialogue style for helping to explain things in more detail. I now write everything with this style and have to go back and edit it out for the work blog. My coworkers can confirm this. This flexibility also lets me add things like hero images generated with AI. I use these to help make my posts more visually interesting. I'm still refining my style and trying to make things better, but I'm just absolutely terrible at CSS.
One of my favorite parts of how this site works is something that will probably make the theoretical computer scientists in the crowd start crying. When my blog loads everything from the disk into RAM, it stores all the posts in the moral equivalent of a linked list. When you, as a reader, look at one of my posts, it's doing a big O of n lookup on potentially every one of my posts to figure out which post to display. Normally, this would be terrifying, especially with the amount of traffic my blog gets, as represented by this handy graph here.
You'd think that something that does a lookup on potentially every post, in the worst case for the most common thing on the biggest data set, would make performance terrifyingly slow. You'd also think that with the amount of traffic that I get, it would be an active detriment and I'd be trying to remove it. However, this is when I play my trap card. When you look at the analytics, you can see that the most frequently read article is the most recently posted one. This means that it's not actually a big O of n lookup. Most of the time it's constant time complexity. In theory, this design is the terrifying type of thing that you'd normally find out after you accepted a job offer and had your first day of work, but in practice it's fine. It is a bit weird though, and I may need to rethink this in the future, but this has scaled to almost 300 posts for now, so I think it's okay.
When my site starts up, it reads every post from the disk into RAM. Rust makes that really easy. With Tokio, I can schedule a bunch of jobs and then wait for them all to finish. This lets me spread the load out to every CPU core so that the posts can load up to twelve times as fast as they would if everything was done iteratively. Once it's done loading them, it sorts them and then puts them into the list for the blog's data structures. I can do this in one line of Rust and it would be something like 50 lines of Go. Rust allows me to have a lower cognitive complexity because I can just rely on things being taken care of for me instead of having to reinvent the wheel all the time. I think in high level logic and let the compiler take care of the lower level details of making it work. It's great.
An amusing part of all of this loading things into RAM stuff is that my website is actually stateless. This allows me to move it around to any server I want very easily in case something very bad happens. I can also take all this data in RAM and then transform it into whatever kind of feed I want.
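The load, sort, and linear lookup steps described above can be sketched as follows. This is an illustration under stated assumptions, not the blog's code: it swaps Tokio tasks for `std::thread::scope` so the example needs no external crates, it loads from in-memory strings instead of the disk, and `Post`, `load_one`, `load_all`, and `find_post` are hypothetical names.

```rust
use std::thread;

#[derive(Debug, Clone, PartialEq)]
pub struct Post {
    pub slug: String,
    /// ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
    pub date: String,
    pub body: String,
}

/// Stand-in for reading and rendering one post. In a real engine this is
/// where the file I/O and Markdown rendering would happen.
fn load_one(slug: &str, date: &str, source: &str) -> Post {
    Post {
        slug: slug.to_string(),
        date: date.to_string(),
        body: source.trim().to_string(),
    }
}

/// Load every post on its own scoped thread, wait for all of them, then
/// sort newest first. One thread per post keeps the sketch short; a real
/// loader would use a task runtime or a bounded pool instead.
pub fn load_all(sources: &[(&str, &str, &str)]) -> Vec<Post> {
    let mut posts: Vec<Post> = thread::scope(|scope| {
        let handles: Vec<_> = sources
            .iter()
            .map(|&(slug, date, src)| scope.spawn(move || load_one(slug, date, src)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("loader thread panicked"))
            .collect()
    });
    posts.sort_by(|a, b| b.date.cmp(&a.date));
    posts
}

/// Linear scan starting at the newest post. Also returns how many entries
/// were inspected, to show why the common case is cheap.
pub fn find_post<'a>(posts: &'a [Post], slug: &str) -> Option<(&'a Post, usize)> {
    posts
        .iter()
        .enumerate()
        .find(|(_, p)| p.slug == slug)
        .map(|(i, p)| (p, i + 1))
}
```

Because the list is sorted newest first, a request for the latest post stops after inspecting a single entry, while only requests for the oldest posts pay the full linear cost. That matches the traffic pattern the talk describes, where the most recent article is the most frequently read.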
I currently support RSS, Atom and JSON Feed so that you can subscribe to my blog with whatever reader you normally use. JSON Feed allows for custom extensions, and I have played with one that gives you some of the extra metadata in my front matter that isn't exposed in JSON Feed itself. Normally, this doesn't show much of anything useful. It's where I put things like the Twitch and YouTube links associated with a post, the link to the slides and talk pages, or the name of the blog post series, if one exists. I don't know if anyone uses these, but I've been starting to use them for some of my internal pipeline things.
I mentioned my website was stateless, right? Turns out that's not totally the case. It's mostly stateless, sure, but it also has a stateful component that organically grew to meet my needs. This stateful component sort of started out as a personal API for other things. I named it mi after the Toki Pona word for me. I use this daily to track some personal things, but it became really useful once I found the IndieWeb concept of POSSE: Publish (on your) Own Site, Syndicate Elsewhere. This concept allows me to post things on my blog and then have something else take over to announce those posts on Twitter and Mastodon with messages like this. Everything is automated. I don't have to lift a finger, except for Patreon. Patreon's API doesn't allow you to generate posts, and sometimes I can forget to link the post to my patrons. I'm trying to get better about this, but I would really love to just hand this over to a machine and stop having to care about it.
The other major thing I use this for is Webmentions. Webmentions are kind of like @-mentions on Twitter, but generalized for any website on the Internet. It's another IndieWeb protocol that a surprising number of websites support, along with bridges for things like Twitter and Mastodon. mi receives and stores all of the Webmentions I get.
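As a rough picture of what "receives and stores Webmentions" could involve, here is a minimal in-memory sketch. It is not how mi is implemented; `MentionStore`, `MentionError`, and both method names are hypothetical. A Webmention carries a `source` URL (the page doing the mentioning) and a `target` URL (the page being mentioned); a real receiver would also fetch the source to confirm it links to the target, which this sketch skips.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum MentionError {
    /// Source or target is not an http(s) URL.
    NotHttp,
    /// Source and target are the same URL.
    SameUrl,
}

/// Stores the source URLs that mention each target URL.
#[derive(Default)]
pub struct MentionStore {
    by_target: HashMap<String, Vec<String>>,
}

fn is_http(url: &str) -> bool {
    url.starts_with("https://") || url.starts_with("http://")
}

impl MentionStore {
    /// Record that `source` mentions `target`. Duplicate notifications for
    /// the same pair are accepted but stored only once.
    pub fn receive(&mut self, source: &str, target: &str) -> Result<(), MentionError> {
        if !is_http(source) || !is_http(target) {
            return Err(MentionError::NotHttp);
        }
        if source == target {
            return Err(MentionError::SameUrl);
        }
        let sources = self.by_target.entry(target.to_string()).or_default();
        if !sources.iter().any(|s| s == source) {
            sources.push(source.to_string());
        }
        Ok(())
    }

    /// Everything that mentions `target`, in arrival order.
    pub fn mentions_for(&self, target: &str) -> &[String] {
        self.by_target
            .get(target)
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }
}
```

Keying the store by target URL means a blog engine can ask for the mentions of one post at a time, which suits a site that loads each post into memory at startup.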
When my site starts up, it reaches out to mi and gets a list of Webmentions for every post it loads into memory. This means that there's potentially some delay from you sending the Webmention to it showing up on my blog, but in practice that's okay. I would like it to be faster, but that would mean having to move the Webmentions database into my main blog app, and I don't know if I'm ready to do that or not because it would make moving my website around a lot more complicated.
I've mentioned on my blog before that I host everything on one big NixOS server. Now, this means that I would be able to store things on that server fairly durably. But I also have mentioned that my site is stateless and it farms out its state to a stateful microservice. You may be wondering something like: why would you do that to yourself? I have a good reason for it, but in order to explain why, I want to take a moment to trace over the history of my website's hosting.
Heroku's free tier was one of the things I used to break into tech. When I started my job in Mountain View and got my former domain name, I was likely using Heroku to host that website. I don't have notes from back then; I'm going off of my gut feeling and some projects that I have on GitHub. At that point, my website was a showcase of my ability to write things using a web framework called Lapis. You can think of it as Rails for Lua built into the side of Nginx. This variant of my website was in use for a few years until I rewrote it in late 2016.
A huge part of how that website worked was that it parsed the Markdown for each post every time the page loaded. This let me edit and test things very quickly, which made writing posts and previewing them in real time possible. I didn't fix this before my first article got to the front page of Hacker News, which meant that my website was a bit slow, but it did survive the load, barely. After that, I set up a cache server named OlegDB.
OlegDB is a key-value store written in C by some friends, and it is a joke about mayonnaise that has been taken way too far. I used OlegDB in my website to cache the rendered HTML for each Markdown post. When you loaded a page, it made another request to the OlegDB server to grab the contents from the cache. This was faster than parsing the Markdown on every page load, and it ended up being the thing that made my site survive the wrath of Hacker News.
Some time after my site was deployed on Heroku, I moved it over to a server running Dokku. Dokku is a self-hostable Heroku clone that lets you run a Heroku-like environment with Docker on a server you own and operate. I've used Dokku for years since, and for a very long time it was the first thing I reached for when trying to deploy anything to the cloud. It's got templates for spinning up basically any database you could think of at the time, and it was trivial to just spin up infra when I went to experiment and kill it off when I was done, no additional cost required. I was very price sensitive back then. Being able to host many apps on the same $5 per month server was a huge advantage compared to hosting one app on one $5 per month Heroku app.
I've also been a member of the Go community Slack since it was founded. Time and time again while helping people with Go, I had seen people wanting an example of a web application that used the Go standard library as its framework, and there was no really good example for it. I had also reached a performance optimization point where I didn't know how to make my site on Lapis run faster, so I kind of got nerd sniped and decided to rewrite my site in Go. The first iteration used a Go backend with PureScript and React on the front end.
This worked for some time, but after I realized that my target audience sometimes uses weird browsers that don't support JavaScript, I removed the client-side rendering entirely and had the server spit out HTML to the client like a traditional website. This allowed me to survive Hacker News hugs of death gracefully and is why I started putting everything into RAM in the first place. The Go port of my website handled the load like a champ. This is also when I started putting everything into one giant linked list. It was so much faster than using a cache server, but the main downside was that it made the site slower to start up. At the time it wasn't a practical issue.
I admit my blog is an exceptional use case. My website gets a lot more traffic than you could possibly imagine. It usually gets more than 100 GB per month. This is really impressive because my site mostly contains text and small images. When my articles get popular, they get very popular very fast, and then that starts people looking at other pages on my website. I have really unique performance requirements. The number on the slide is the number of times I've been on the front page of news aggregators or have made other posts that have gone viral. At nearly 300 posts written, this means that my posts have a less than one out of ten chance of getting a lot of page views in a very short amount of time. So I need to be sure that the website code runs as fast as it can for the most common use of the most common routes.
At one point, my blog was starting to get loaded enough that it started to make my Dokku server fall over from plain text, HTML responses and RSS replies. Something had to give. So in a moment of weakness, I made a pact with the devil. I put my blog on Kubernetes as a part of me learning how to use Kubernetes for work. I'm a very hands-on person. I need a local copy of things in order to really feel like I understand how to use them.
So I decided to commission a freight train to mail a letter and I set up a Kubernetes cluster with DigitalOcean. This worked pretty great once I got past the initial teething issues, and it worked for a long time. I was disappointed by how many alpha components I needed to serve web apps reliably. I was able to do continuous deployment using GitHub Actions and it made my blog minimal effort at most. I was focused on writing. Publishing was relegated to the machines. However, sometimes it blew up, and when it did, it was worse than when the single server blew up. I didn't have access to root on the servers. I had just enough apps on the Kubernetes cluster that I couldn't scale the cluster up and down to unbreak issues. Sometimes a file system mount would get stuck and I didn't have a "reboot that sucker" button to unstick it. When that happened, my Git server would stop working. This is a very annoying thing to debug while you should be focusing on your day job. After a while I gave up.
Then I got nerd sniped again with NixOS. With NixOS I could just directly specify what should run and where. I had power beyond what mere mortals could attain with Docker and Kubernetes alone. I could shape the universe of the applications in question and then proceed with that. Instead of trying to kitbash things into shape based on overly generic tools, I could just use Nginx to route to the Unix socket, and then I did not have to care about the overly generic Turing-complete YAML hell that is Kubernetes. I think it's pretty great, but I'm a VTuber, so take my opinions with an appropriately sized grain of salt.
The biggest thing that you can take away from this is that dynamic web apps can be very fast, especially if they are built to purpose. If you keep your goals in mind as you develop things out, it'll do everything you need very quickly. My blog stands on the shoulders of giants.
Every one of these people gets a special shout out for helping either make my blog or this talk shine. Thanks, you all really helped more than you can imagine. And thank you for watching. I'm going to stick around in the chat to answer any questions I haven't answered already. If I miss your question, or if you really want an answer to your question outside of the chat, please email it to howimadeblog at xeserv.us. I'll have a written version of this talk, including my slides, a recording of the talk, and everything I said today on my blog soon. If you have questions, please speak up. I love answering them and I am more than happy to take the time to give a detailed answer. Be well, all.

Xe Iaso

Archmage of Infrastructure @ Tailscale
