Speeding up a Django web site without touching the code

I’ve recently been tweaking my server setup for a Django 1.3 web site with the goal of making it a bit faster. Of course, there is a lot of speed to gain by improving e.g. the number of database queries needed to render a web page, but the server setup also has an effect on the web site performance. This is a log of my findings.

All measurements were made with the ab tool from Apache, using the arguments -n 200 -c 20, which means that each case was tested with 200 requests in total, 20 of them concurrent at any time. The tests were run from a different machine than the web server, with around 45 ms RTT to the server. This is not a scientific measurement, but it is good enough to let me quickly test my assumptions about what increases or decreases performance.

The Django app isn’t particularly optimized in itself, so I don’t care much about the low number of requests per second (req/s) that it manages to process. The main point here is the relative improvement with each change to the server setup.

The baseline setup is a Linode 1024 VPS (Referral link: I get USD 20 off my bill if you sign up and remain a customer for 90 days), running Apache 2.2.14 with mpm-itk, mod_wsgi in daemon mode with a maximum of 50 threads and a restart every 10,000 requests, SSL using mod_ssl, and PostgreSQL 8.4.8 as the database. For the given Django app and hardware, this setup strolls along at 4.0 req/s.

With this blog post as reference, I switched from Apache+mod_wsgi to nginx 0.7.5 as SSL terminator, static media server, and proxy in front of Gunicorn 0.13.4. Gunicorn is a WSGI HTTP server, and it hosts the Django site. The Linode VPS has access to four CPU cores (n=4), so I set up nginx with 4 workers (n) and Gunicorn with 9 workers (2n+1). Different values for these settings are sometimes recommended, but this is what I’m currently using. This setup resulted in an increase to 9.0 req/s.
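A minimal sketch of the 2n+1 rule as a Gunicorn configuration file, which is plain Python. This is not the exact configuration used for the site: the file name and the bind address are assumptions for illustration.

```python
# gunicorn.conf.py (hypothetical file name)
import multiprocessing

# Assumed address that nginx proxies to; adjust to your own setup.
bind = "127.0.0.1:8000"

# 2n+1 workers, where n is the number of CPU cores.
# On a machine with four cores this gives 9 workers.
workers = multiprocessing.cpu_count() * 2 + 1
```

Gunicorn can load a file like this through its -c option.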

A nice improvement, but I changed multiple components here, so I don’t know exactly what helped. It would be interesting to test e.g. Apache with mod_proxy in front of Gunicorn, as well as different numbers of nginx and Gunicorn workers. The nginx version is also a bit old, because I used the one packaged in Ubuntu 10.04 LTS. I should give nginx 1.0.x a spin.

Next up, I added pgbouncer 1.3.1 (as packaged in Ubuntu 10.04 LTS, latest is 1.4.2) as a PostgreSQL connection pooler. I let pgbouncer do session pooling, which is the safest choice and the default. Then I changed the Django app settings to use pgbouncer at port 6432, instead of connecting directly to PostgreSQL’s port 5432. This increased the performance further to 10.5 req/s.
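Pointing Django at pgbouncer only takes a port change in the database settings. The fragment below is a sketch for a Django 1.3-style settings file; the database name, user, password, and host are placeholders, not the real values.

```python
# settings.py (fragment); names and credentials are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mysite",      # hypothetical database name
        "USER": "mysite",      # hypothetical database user
        "PASSWORD": "secret",  # placeholder
        "HOST": "127.0.0.1",   # assumes pgbouncer runs on the same host
        "PORT": "6432",        # pgbouncer, instead of PostgreSQL's 5432
    }
}
```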

Then I started looking at SSL performance, even though SSL wasn’t the bottleneck. I learned a lot about SSL performance, but didn’t improve the test results at all. Some key points were:

nginx defaults to offering Diffie-Hellman Ephemeral (DHE), which takes a lot of resources. Notably, the SSL terminators stud and stunnel do not use DHE. See this blog post for more details and how to turn off DHE in nginx.
If you’re using AES, you can process five times as many requests with a 1024-bit certificate key as with a 2048-bit key. I use a 2048-bit key.
A 64-bit OS and userland double the connections per second compared to 32-bit. My VPS is stuck at 32-bit for historical reasons.
SSL session reuse eliminates one round-trip for subsequent connections. I set this up, but my test setup only uses fresh connections, so this improvement isn’t visible in the test results.
Browsers will go a long way to get hold of missing certificates in the certificate chain between known CA certificates and the site’s certificate. To keep the browser from making requests to other sites to find missing certificates, make sure all certificates in the chain are provided by your server.

If you’re switching from Apache to nginx, note that Apache uses separate files for your SSL certificate and the SSL certificate chain, while nginx wants these two files concatenated into a single file, with your SSL certificate first.
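The concatenation can be sketched in a few lines of Python. The function name and file paths are hypothetical. The only points that matter are that the site certificate is written first and that each PEM block ends with a newline.

```python
def build_nginx_cert_bundle(cert_path, chain_path, bundle_path):
    """Write the site certificate, then the chain, into one file."""
    with open(bundle_path, "wb") as bundle:
        for path in (cert_path, chain_path):  # site certificate first
            with open(path, "rb") as part:
                data = part.read()
            bundle.write(data)
            if not data.endswith(b"\n"):
                bundle.write(b"\n")  # keep PEM blocks on separate lines


# Example call with hypothetical file names:
# build_nginx_cert_bundle("example.crt", "chain.crt", "example-bundle.crt")
```

The resulting file is the one nginx’s ssl_certificate directive should point to.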

Next, I read about transaction management and the use of autocommit in Django. The Django site I’m testing is read-heavy, with almost no database writes at all. It doesn’t use Django’s transaction middleware, which means that each select/update/insert happens in its own transaction instead of having one database transaction spanning the entire Django view function.

Since I’m using PostgreSQL >= 8.2, which supports INSERT ... RETURNING, I can turn on autocommit in the Django settings, and keep the transaction semantics of a default Django setup without the transaction middleware. Turning on autocommit makes PostgreSQL wrap each query with a transaction, instead of Django adding explicit BEGIN, and COMMIT or ROLLBACK statements around each and every query. Somewhat surprisingly, this reduced the performance to 9.2 req/s. Explanations as to why this reduced the performance are welcome.
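For reference, this is roughly what turning on autocommit looks like in a Django 1.3-style settings file with the postgresql_psycopg2 backend. It is a sketch with a placeholder database name. The option belongs to Django versions of that era and was removed in Django 1.6, where autocommit became the default.

```python
# settings.py (fragment); the database name is a placeholder.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mysite",  # hypothetical database name
        "OPTIONS": {
            # Each query runs in its own implicit transaction instead of
            # being wrapped in explicit BEGIN/COMMIT statements.
            "autocommit": True,
        },
    }
}
```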

Reverting the autocommit change, I got back to 10.5 req/s. Then I tried tuning the PostgreSQL configuration using the pgtune tool. I went for the web profile, with autodetection of the amount of memory (1024 MB):

pgtune -i /etc/postgresql/8.4/main/postgresql.conf -o postgresql-tuned.conf -T Web
mv postgresql-tuned.conf /etc/postgresql/8.4/main/postgresql.conf

pgtune changed the following settings:

maintenance_work_mem = 60MB        # From default 16MB
checkpoint_completion_target = 0.7 # From default 0.5
effective_cache_size = 704MB       # From default 128MB
work_mem = 5MB                     # From default 1MB
wal_buffers = 4MB                  # From default 64kB
checkpoint_segments = 8            # From default 3
shared_buffers = 240MB             # From default 28MB
max_connections = 200              # From default 100

After restarting PostgreSQL with the updated settings, the performance increased to 11.7 req/s.

To summarize: in a few hours, I’ve learned a lot about SSL performance tuning and, without touching any application code, almost tripled the number of requests per second that the site can handle. The performance still isn’t great, but it’s a lot better than what I started with, and the setup is still far from perfect.
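The relative improvement of each step can be computed from the req/s numbers reported above. This small script is only a convenience for that arithmetic; the step labels are my own shorthand.

```python
# Requests per second measured after each step, as reported in this post.
results = [
    ("Apache + mod_wsgi (baseline)", 4.0),
    ("nginx + Gunicorn", 9.0),
    ("+ pgbouncer", 10.5),
    ("+ pgtune", 11.7),
]

baseline = results[0][1]
for name, rps in results:
    # Print each step's throughput and its ratio to the baseline.
    print("%-30s %5.1f req/s  %.2fx baseline" % (name, rps, rps / baseline))
```

The final ratio is 11.7 / 4.0, a bit over 2.9 times the baseline.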

To get further speed improvements, I would mainly look into three areas: adding page (or block) caching where appropriate, logging database queries and tweaking the numerous or slow ones, and looking further into tweaking the PostgreSQL settings. But that’s for another time.

If you have suggestions for other server setup tweaks, please share them in the comments, and I’ll try them out.

Updated: Removed the “mean response time” numbers, which are simply (time of full test run) / (number of requests). They told us the same thing as req/s in a less intuitive way. The other interesting number here is the perceived latency for a single user/request. I’ll make sure to include it in future posts.
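To make the relation concrete: the removed number is the reciprocal of req/s. The latency estimate below is only a rough sketch. It assumes all 20 concurrent slots stay busy for the whole run, which is not something I measured. The function names are made up for this example.

```python
def mean_time_per_request_ms(requests_per_second):
    """(time of full test run) / (number of requests), in milliseconds."""
    return 1000.0 / requests_per_second


def estimated_latency_ms(requests_per_second, concurrency):
    """Rough per-request latency, assuming `concurrency` requests are
    always in flight (an assumption, not a measurement)."""
    return concurrency * 1000.0 / requests_per_second


print(mean_time_per_request_ms(4.0))   # 250.0
print(estimated_latency_ms(4.0, 20))   # 5000.0
```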