Skip to content

Latest commit

 

History

History
161 lines (92 loc) · 14.5 KB

Scaling Mastodon in the Face of an Exodus.md

File metadata and controls

161 lines (92 loc) · 14.5 KB

Skip to main content

Nora Codes

home blog tutorials projects rss

Leonora Tindall 2022/11/10

TL;DR: Mastodon’s Sidekiq deferred execution jobs are the limiting factor for scaling federated traffic in a single-server or small cluster deployment. Sidekiq performance scales poorly under a single process model, and can be limited by database performance in a deployment of the default Dockerized configuration.

If you are an embattled instance admin, go here.

I recently moved to a well-established Mastodon (Hometown fork) server called weirder.earth, a wonderful community with several very dedicated moderators and longstanding links to other respected servers in the fediverse. Overall, it’s been lovely; the community is great, the Hometown fork has some nice ease-of-use features, and the moderators tend to squash spam and bigotry pretty quickly.

Then Elon Musk bought Twitter.

The Setup

Just as tens of thousands of people1 fled Twitter to join the fediverse, Weirder Earth had a storage migration pending. It did not go well, for reasons unrelated to this post, and the instance was down for several hours.

Normally, this would not be a problem; ActivityPub2, the protocol that powers Mastodon and other fediverse software, is very resilient to outages, and most implementations have plenty of built-in mechanisms like exponential backoff and so forth to get their neighbors up to date when they come back online. This particular outage, however, coincided with a huge amount of traffic, so when the server did come back online, there were many tens of thousands of messages waiting for it. Each of these messages necessitates a Sidekiq task which integrates it into the database, users’ timelines and notification lists, and so forth.

Sidekiq did not succeed in handling this load.

the past six months of Sidekiq data. Most of it is steady, maybe a slowdecline, and labeled “In which Elon Musk has not yet bought Twitter”. The lastlittle bit is labeled “OH SHII-” and shows instance load spiking massively.A screenshot from the Weirder Earth Sidekiq dashboard, showing processed and failed jobs over time. Created and edited by Packbats.

Users began to experience multi-hour delays in incoming federation, though outgoing federation was unaffected. To users of other instances, nothing was wrong; weirder.earth posts showed up within seconds, or about two minutes at the worst. Even to weirder.earth users, the web interface worked perfectly well; it’s just that there was little new content to populate it.

The Sidekiq queue grew and grew over time, eventually reaching a peak of over 200,000 queued jobs. In conversation with the admin of another instance of a roughly similar size, I learned that processing a 6-digit number of jobs is a day is only a fairly recent phenomenon. So, this was an issue.

The Cause

I became involved with this situation during the initial outage, helping with Docker problems, alongside another user, the Packbats,3 and another instance admin. As the Sidekiq queue grew and grew, myself and two of the instance’s admins looked into every possible cause, mostly sifting through documentation, since it was not an area of expertise for any of us.

It was immediately clear that at least part of the problem was Sidekiq’s inability to successfully use the 8GB of memory and 4 CPU cores available to the server; the RSS of the single Sidekiq process was around 350 MB, and the load factor was averaging around 3 – not even fully utilizing the 4 available cores.

So, we started increasing the number of workers available to Sidekiq. This was done by increasing MAX_THREADS and passing the -c (“concurrency”) argument to Sidekiq. (We later learned that DB_POOL is more appropriate; see below.) We increased first to 15, then to 25, and finally to 50, which is the mythical “stable limit” of Sidekiq according to several StackOverflow posts.

It helped, but the queue depth was still growing, so we increased our database container’s max connections to 200 (by adding -c ‘max_connections = 200’ to the invocation in docker-compose.yml) and increased to 150 Sidekiq threads. Again, this helped, but overnight the queue continued to grow.

At this point, I realized that the bottleneck was not in Sidekiq anymore; individual jobs had gone from completing in seconds to tens of seconds and even sometimes over a minute. One of the instance admins pulled up PgHero and we realized that, indeed, the database was being incredibly conservative with memory. By default, it was using only 128MB for shared buffers, which is dramatically low on a server with 4GB of memory and a workload like Mastodon, where a huge volume of traffic is just fetching the same few posts and conversations over and over again. So, we increased that to the aggressive value of 2GB, and again saw an improvement in performance. Rather than growing, the queue was now stable, at 194,000 events and a latent time of about 14 hours.

We had conquered the first derivative.

The Solution

Ultimately, the real solution as a combination of three things:

  • Increasing the memory available to the database for shared buffers
  • Increasing the number of threads available to Sidekiq (and, accordingly, the database connection pool size and maximum number of connections via DB_POOL and max_connections)
  • Splitting Sidekiq’s queues off into their own processes.

See, Mastodon uses 5 Sidekiq queues: defaultpushpullmailers, and scheduler. For us, push was conquered early on, and mailers and scheduler were never an issue, having very low volume. pull, for getting incoming federation events, and default, for building notifications and timelines, were the real trouble. So, we split them into their own Sidekiq processes.

We had gone up to 200 max_connections and a concurrency of 150, so I guesstimated that the following split would work well:

  • a single container/process for the pushmailers, and scheduler queues with 25 threads
  • a container/process each for pull and default with 60 threads each

This worked great for pull. We went from holding steady to dropping about 500 events every minute, and rapidly cleared the entire pull queue – but the default queue was still holding on. 4

This was harder to diagnose, but eventually we realized that the Sidekiq instance working on the default queue wasn’t at its full potential. Looking at the PgHero connection status dashboard, we could see that many of our 200 connections were held by inactive or low-activity Sidekiq processes with names like sidekiq 6.2.1 mastodon [0 of 25 busy] while the active ones were only holding a few connections. But that should be okay, right? 60 + 60 + 25 is much less than 200, so there should be plenty of connections to go around.

As it turned out, Puma was also configured to use lots of database connections (MAX_THREADS = 150), so it was hogging many that it didn’t need. We sorted that out, refined our Postgres config to allow even more connections, split the Sidekiq processes up even more, with fallbacks and various different ordered combinations of queues. We all went to sleep and woke up to an empty queue!

We’ve been experimenting and optimizing since, but the server has been coping with the load handily since.

Lessons Learned

The default Mastodon configuration is broken. It’s fine for a tiny instance on a tiny server, but once you start to grow, you must scale it up. Here’s what we did:

Increase Your Database’s Resources

By default, especially in the default Docker setup, Postgres is not using any appreciable percentage of your memory. I suggest using PgTune, choosing an “online transaction processing” or OLTP workload, setting about half of your server’s memory, and applying whatever it says.

In Docker, it’s easiest to apply this through adding -c "option=value" arguments to the invocation in docker-compose.yml; this can also work if you’re invoking Postgres through a SystemD unit. Or, you can use postgresql.conf. Don’t forget to restart the database.

Split Your Sidekiq Queues

Sidekiq can scale to as many as hundreds of threads on a single process, but using multiple processes is much more efficient. Before increasing concurrency (that’s the next step!), split out the queues into different processes.

All the queues in Mastodon can be split among multiple processes, except for scheduler, which must only ever be on a single process. I recommend having a single Sidekiq process each for the mailers and scheduler queues, and then processes that prioritize pullpush, and default. You can specify queues with the -q option in docker-compose.yml or your SystemD unit files.

Specifying multiple -q options gives Sidekiq a fallback order for working on jobs, so for instance -q default -q pull means that when all default jobs are done, that process will start working on pull jobs. Make sure that there is at least one process prioritizing every queue, or else processing that queue could stall entirely!

Because the queues you want to prioritize are defaultpush, and pull, it might be sensible to make processes with all the permutations of defaultpush, and pull:

  • defaultpushpull
  • defaultpullpush
  • pulldefaultpush
  • pullpushdefault *
  • pushdefaultpull
  • pushpulldefault *

However, in reality, push is generally much lower volume than the others, so it makes more sense to drop the number of queues to four, removing the fourth and final orderings (marked with *).

To do this in Docker, simply copy and paste the sidekiq service definition, change the name, and add arguments to the invocation.

For instance, we might go from this default definition: To these definitions: 

Fully Utilize Your Database

Now that it’s got more to work with, take note of the max_connections parameter you set for Postgres. Ideally, those connections would be mostly, but not completely, utilized. Database contention will slow you down.

The DB_POOL variable controls how many database connections a Ruby on Rails process will use. (MAX_THREADS controls this for Puma, the server used in web.) In Mastodon, the relevant Rails processes are in the Docker-Compose services webstreaming, and whatever Sidekiq services you’ve configured. In docker-compose.yml, you can change these variable by adding a section like the following to each relevant service:

environment:
  - DB_POOL=value

On sidekiq services, also add the -c option to the sidekiq invocation, with the same value as the DB_POOL variable.

command: bundle exec sidekiq -q default -q push -q pull -c 25
environment:
  - DB_POOL=25

In addition, the web service takes a variable called WEB_CONCURRENCY to control how many processes it runs. Similarly, streaming has STREAMING_CLUSTER_NUM to control the number of processes.

The sum of MAX_THREADS times WEB_CONCURRENCY in webSTREAMING_CLUSTER_NUM times DB_POOL in streaming, and all the sidekiq DB_POOL variables, must be less than max_connections in your Postgres config. If it’s more, you’ll experience database contention.

In the example above, assuming the rest of the configuration is default and you have 200 database connections available, I’d set the following:

  • webMAX_THREADS = 10WEB_CONCURRENCY=3 for 30 connections
  • streamingSTREAMING_CLUSTER_NUM = 3DB_POOL = 15 for 45 connections
  • sidekiq-default-push-pullDB_POOL = 25-c 25 for 25 connections
  • sidekiq-default-pull-pushDB_POOL = 25-c 25 for 25 connections
  • sidekiq-pull-default-pushDB_POOL = 25-c 25 for 25 connections
  • sidekiq-push-default-pullDB_POOL = 25-c 25 for 25 connections
  • sidekiq-push-schedulerDB_POOL = 5-c 5 for 5 connections
  • sidekiq-push-mailersDB_POOL = 5-c 5 for 5 connections

For a sum of 185 connections. This means there will be 15 loose database connections for things like migrations and manually connecting to the database to do queries and maintenance.

Into the Future

These numbers may not work perfectly for you, and you may have to tweak things as workloads change.

There is more work left to do, and we’ll only know if this is successful once the next big wave hits, but it’s definitely better than it was before. We also have not figured out how to monitor the instance for future events, though the top candidate is, of course, Prometheus and Grafana.

Good luck. May your timelines run smoothly and your processes never be OOM killed.


  1. Wilfred Chan, “Mastodon gained 70,000 users after Musk’s Twitter takeover. I joined them”, The Guardian (2022 11 02). ↩︎

  2. ActivityPub is a W3C Recommendation. See https://activitypub.rocks/ ↩︎

  3. @[email protected] ↩︎

  4. I also typo’d mailers as mailer, which lead to some stalled emails. Don’t do that. ↩︎


Last updated Nov 11 2022

All text CC-BY-SA & code AGPL | MONKEYCHEF theme by null | Berkeley Mono | No AMP