We got a story on the front page of Slashdot… and got our asses handed to us literally 2 minutes later, with the server load jumping from 0.56 to 10 and then off into the 40s, at which point the server just decided to melt instead of serving pages.
For the last few hours we’ve been troubleshooting what went wrong and why we couldn’t get the server load down even though we had everything cached statically. We finally got it all settled down, so here are 4 tips for those of you running WordPress blogs who want to know exactly what to turn off and on when you get Dugg or Slashdotted.
1. Have the WP-Super-Cache Plugin Installed
It’s an excellent plugin. It extends the page-caching functionality of WP-Cache with the ability to write the entire page out to a static HTML file on disk, which is served straight back to the visitor. This freezes the state of the page (new comments won’t appear and the login prompt is always shown), but it will save your WordPress site and server from melting.
Not too impressed?
The saving grace for us during our Slashdotting was the ability to set up directly cached pages on the fly. For example, as soon as we saw hits coming in for the Lego Algorithm story, we popped it into WP-Super-Cache and had it create a permanent cache of the page to serve, instead of dynamically regenerating the page in case it had changed.
As it turns out, there is also support for putting your site into “Lockdown” mode, which stops WP-Super-Cache from refreshing its cached pages whenever new comments roll in.
So if you have a story that people are commenting on like crazy and it’s killing your server, you can lock it down and let the cache refresh at the expiry interval you have set up instead of on every single comment.
TIP: If your site is normally pretty busy and you have a handful of most-popular posts, consider keeping a list of them around so that when your site gets hit you can immediately static-cache them and soften the blow to the server.
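For readers curious what “lockdown” and pre-caching boil down to, here is a minimal Python sketch of a file-based page cache. It is an illustration under our own assumptions, not WP-Super-Cache’s code: `render` is a hypothetical stand-in for WordPress building a page, and `max_age` plays the role of the plugin’s expiry interval. With lockdown off, a new comment deletes the cached file; with lockdown on, the existing copy keeps being served until it expires, and `precache()` corresponds to the “keep a list of your popular posts” tip.

```python
import os
import tempfile
import time
from typing import Callable, Iterable, Optional


class StaticPageCache:
    """Toy file-based page cache illustrating the WP-Super-Cache idea (not its actual code).

    Rendered pages are written to disk as plain HTML and served from there.
    In lockdown mode a new comment does NOT invalidate the cached copy; the page
    is only regenerated once the file is older than `max_age` seconds.
    """

    def __init__(self, render: Callable[[str], str],
                 cache_dir: Optional[str] = None, max_age: float = 3600.0):
        self.render = render            # hypothetical: slug -> HTML (the expensive, DB-backed path)
        self.cache_dir = cache_dir or tempfile.mkdtemp(prefix="page-cache-")
        self.max_age = max_age
        self.lockdown = False
        self.render_count = 0           # how many times we fell through to `render`
        os.makedirs(self.cache_dir, exist_ok=True)

    def _path(self, slug: str) -> str:
        return os.path.join(self.cache_dir, slug + ".html")

    def _write(self, slug: str) -> str:
        html = self.render(slug)
        self.render_count += 1
        with open(self._path(slug), "w", encoding="utf-8") as f:
            f.write(html)
        return html

    def get(self, slug: str) -> str:
        path = self._path(slug)
        if os.path.exists(path) and time.time() - os.path.getmtime(path) < self.max_age:
            with open(path, encoding="utf-8") as f:
                return f.read()         # cache hit: no rendering, no DB queries
        return self._write(slug)

    def on_new_comment(self, slug: str) -> None:
        # Normal mode: drop the cached copy so the next visitor triggers a re-render.
        # Lockdown mode: keep serving the existing copy until it expires on its own.
        if not self.lockdown:
            try:
                os.remove(self._path(slug))
            except FileNotFoundError:
                pass

    def precache(self, slugs: Iterable[str]) -> None:
        # Warm the cache for a known list of popular posts before the flood arrives.
        for slug in slugs:
            self._write(slug)
```

Every cache hit is just a file read, which is why a static cache can absorb traffic that would otherwise turn into PHP and MySQL work on every request.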
2. Be Aware of DB-Heavy WordPress Plugins
This tip is admittedly vague, but I think we happen to use some of the most popular DB-intensive WordPress plugins out there, so we’ll just list what we had to turn off to get back to sanity:
- Popularity Contest: This plugin tracks stats on each post to get an idea of how “popular” it is. Unfortunately, that means extra DB activity every time a post is viewed, so if you are getting Dugg or Slashdotted, turn it off temporarily. The plugin has a nice feature that lets you go in later (the day after, for example) and manually edit a post’s popularity to make up for anything it missed during the crunch.
- WordPress Related Posts: Almost everyone uses this plugin. It uses the tags, categories, etc. of the post being displayed to list other posts related to it. Unfortunately, that means a lot of taxonomy lookups every time a post is displayed. Be sure to disable this puppy too.
- Simple Tags: I have no idea why this hits the DB so heavily on the front end; I thought it was just an admin-enhancement tool. Having this plugin enabled was the difference between a server load of 21 and a load of 4. It was just incredible, and I still have no idea why.
- WP-User Online: This isn’t quite as intense as the plugins listed above, but it still accounted for about 1 point of Unix load on the server (roughly 3.0 versus 4.0).
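We simply turned these plugins off, but the underlying problem is that each one does extra DB work on every single page view. One general way around that, if you want to keep a feature like related posts, is to remember the per-post result for a few minutes. The Python sketch below shows that idea; `lookup` is a hypothetical stand-in for the plugin’s taxonomy queries, and nothing here claims to be how these plugins actually work.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLMemo:
    """Remember an expensive per-post lookup for `ttl` seconds.

    `lookup` is a hypothetical stand-in for something like a related-posts
    query (several taxonomy joins per call); it is not a WordPress API.
    """

    def __init__(self, lookup: Callable[[int], List[int]], ttl: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.lookup = lookup
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[int, Tuple[float, List[int]]] = {}
        self.misses = 0                 # number of times the real lookup ran

    def __call__(self, post_id: int) -> List[int]:
        now = self.clock()
        cached = self._store.get(post_id)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]            # fresh enough: skip the DB entirely
        self.misses += 1
        result = self.lookup(post_id)
        self._store[post_id] = (now, result)
        return result


# 1,000 views of the same post cost one lookup instead of 1,000.
related = TTLMemo(lambda post_id: [post_id + 1, post_id + 2])
for _ in range(1000):
    related(42)
print(related.misses)  # prints 1
```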
TIP: After disabling plugins, if you are using WP-Super-Cache, clear your cache; otherwise, in some cases the plugin code will keep executing.
3. Using DB Tools to See What is Running
This is a very cool MySQL trick that Marc Chung pointed me toward tonight while I was trying to figure out exactly which plugin, or which query, was pegging MySQL so hard:
mysqladmin -u root -p processlist
This dumps every thread connected to MySQL along with the statement it is currently running and how long it has been running (the Time column, in seconds). In our case, this query:
SELECT t.*, tt.*
was by far the most frequently executed and the most time-consuming. Our search for more information about that query, and which plugin produced it, brought us to this WordPress.org thread. It wasn’t terribly clear where the query originates (maybe Taxonomy.php), but the thread did make it clear that plugins working with additional meta-information about page content were the culprits (hence the list in Section #2).
Your troubleshooting methods may differ, and this may be second nature to you or the first time it has ever dawned on you to try it; either way, we found it hugely helpful.
Similar tools exist for PostgreSQL (the pg_stat_activity view, for example) and, I’m sure, for any of the other big databases out there.
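Eyeballing processlist output gets old fast when hundreds of connections are open. The Python sketch below summarizes a saved snapshot by grouping queries that differ only in their literal values and summing the Time column, so the worst offender floats to the top. It assumes MySQL’s usual column layout (Id, User, Host, db, Command, Time, State, Info) and that no query text contains a `|`. Note that plain `processlist` truncates long statements; `mysqladmin --verbose processlist` shows the full text, like SHOW FULL PROCESSLIST.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def fingerprint(sql: str) -> str:
    """Replace literals with '?' so queries that differ only in IDs/strings group together."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)   # quoted string literals
    sql = re.sub(r"\b\d+\b", "?", sql)              # bare numbers
    return re.sub(r"\s+", " ", sql).strip()


def summarize_processlist(text: str) -> List[Tuple[str, int, int]]:
    """Summarize the table printed by `mysqladmin processlist`.

    Returns (fingerprint, running_count, total_seconds) tuples, busiest first.
    Assumes the first '|' row is the header and no query text contains '|'.
    """
    header: List[str] = []
    stats: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue                                # skip the +----+ border lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if not header:
            header = cells                          # column names
            continue
        row = dict(zip(header, cells))
        info = row.get("Info", "")
        if row.get("Command") != "Query" or info in ("", "NULL"):
            continue                                # idle (Sleep) connections etc.
        if info.lower().startswith("show") and "processlist" in info.lower():
            continue                                # our own monitoring query
        entry = stats[fingerprint(info)]
        entry[0] += 1
        entry[1] += int(row.get("Time") or 0)
    return sorted(((fp, n, secs) for fp, (n, secs) in stats.items()),
                  key=lambda r: r[2], reverse=True)
```

To use it, save a snapshot with something like mysqladmin -u root -p --verbose processlist > snapshot.txt (the filename is just an example), read the file, and print the first few tuples returned by summarize_processlist(). A few snapshots taken seconds apart give a better picture than a single one.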
4. Host with a Great Company
This is really important.
We’ve hosted with a few other companies and had some really crappy experiences. It seems like getting Dugg or Slashdotted is an excellent way to test out the type of hosting company you are dealing with (regardless of policies).
With previous hosting companies, our account was simply rebooted until the traffic calmed down (basically 404-ing every visitor), or it was immediately disabled and couldn’t be re-enabled until we called the hosting company to explain why traffic had spiked so high.

With RimuHosting, even though the server load climbed to 41 for a sustained period at one point, they hunkered down with us for about 3 hours to solve the problem and help us out… for free.
I would say this has been the norm for RimuHosting, though. Other times when we’ve had stories Dugg or Reddit’ed, they’ve simply tried to optimize the machine so pages served faster, for example by adding more RAM at no charge or increasing our bandwidth to help out. That has been the case all the way from when we started on a Mira-1 VPS account through our dedicated solutions… their support has always been this way, regardless of how much we were paying them each month for hosting.
That’s a far cry from some other hosts that will simply disable your account, so any dreams of growing your readership with a big story go out the window.
You really do get what you pay for… or in our case, you get well beyond what you pay for and you appreciate the hell out of it.
Conclusion
As we found out first-hand, when your site gets hammered by unplanned traffic, there are changes you can roll out to a WordPress site in stages, enabling some plugins and turning others off, that will get some pretty excellent performance out to the thousands of people hitting you. You don’t necessarily need to generate a single unthemed static page and point everyone at it, as we’ve seen a lot of sites do.
That said, a lot of the flexibility you have depends heavily on the hosting company you are with. If your hosting company is going to throttle you at a server load of 1 or 2, or simply shut off your site as soon as the load gets too high, there isn’t a whole hell of a lot you can do.
I’d also like to point out that since we made these changes yesterday, the story (along with a few others from our other sites) has gone gangbusters on StumbleUpon, in addition to still being on the front page of Slashdot. So if anything, traffic is heavier today than it was yesterday, yet we are sitting at a server load of around 1.0-3.0 while hosting an unholy amount of traffic so far today.
WordPress is going strong and performing well. So to all the hosting companies and developers who say “WordPress is a resource hog”: it can be, depending on the plugins you are using… but you can also tune your WordPress install to be pretty damn performant.
If you guys have any tips or tricks from your own experiences, please post them below. Favorite WordPress plugins you use, how you tuned the SQL of a particular query to run a bit faster, or indices you created to speed up certain plugins/tables?
All tips are welcome and appreciated.

October 11th, 2008 at 11:23 pm
Thanks man, can you post traffic profiles and lead times from Reddit and the other two?
Cheers matey skipper you old sailor!
October 12th, 2008 at 3:35 am
Right on with WP Super Cache. We had some pretty substantial Digg and StumbleUpon traffic a few weeks back and actually used lockdown mode for the first time. We clocked over 10k visitors over the weekend, and Super Cache didn’t even flinch; quite simply a brilliant plugin. Thanks for the tip on Simple Tags. I didn’t know it was such a heavy plugin, and agreed… I don’t really see why it should be DB-heavy. Perhaps some code modification is in order.
October 12th, 2008 at 3:45 am
Thanks for this; I had many problems with this on my old site.
October 12th, 2008 at 7:48 am
Your story was about Legos, Java, and annealing algorithms with a hint of secret Google insider knowledge, and then it got posted on Slashdot. Be grateful you didn’t also work Dr. Who into the story. Heh heh. Actually, though, great article (that one and this one), and congratulations on making the Slashdot front page.
October 12th, 2008 at 9:30 am
Zoran & John,
Thanks guys. Really caught us off guard but we are very thankful to Justin for the story idea as well as all the readers checking it out and leaving feedback.
Also glad this article was helpful. I was writing it *as* I had a few terminal windows up watching the load, mucking with the plugins, and keeping track of which ones did what and how to keep things running.
It was pretty interesting. I just assumed we would have to serve static pages only for a day or so; I had no idea that some of those handy meta-information plugins (most popular, related, etc.) could cause so much ancillary DB traffic. In hindsight it makes sense, but at the time I was bamboozled.
October 12th, 2008 at 3:34 pm
Great article! I’ve had several posts on the Digg, Delicious, or StumbleUpon front page and never had any problems, thanks to WP Super Cache!
October 13th, 2008 at 12:19 am
Just curious: how much traffic did you pull in the 24-hour timeframe (page views, to be specific), if you don’t mind disclosing that?
We have hit the Digg front page about 5 or 6 times and easily pulled 90k page views on dynamic pages without any problems (highly optimized Java/Hibernate backend, async writes for the view counter).
-Victor
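For anyone wondering what “async writes for the view counter” can look like, here is a minimal Python sketch of the general idea (not victori’s actual Java code): page views are counted in memory and written to the database in one batch every few seconds, instead of one write per hit. `flush_to_db` is a hypothetical callback.

```python
import threading
from collections import Counter
from typing import Callable, Dict


class BufferedViewCounter:
    """Count page views in memory and write them to the database in batches.

    `flush_to_db` is a hypothetical callable that receives {post_id: new_views};
    in practice it might run one UPDATE ... SET views = views + n per post.
    """

    def __init__(self, flush_to_db: Callable[[Dict[int, int]], None]):
        self._flush_to_db = flush_to_db
        self._pending: Counter = Counter()
        self._lock = threading.Lock()

    def hit(self, post_id: int) -> None:
        with self._lock:                 # cheap in-memory increment on every request
            self._pending[post_id] += 1

    def flush(self) -> None:
        with self._lock:                 # swap the buffer out quickly...
            batch, self._pending = self._pending, Counter()
        if batch:
            self._flush_to_db(dict(batch))   # ...and do the slow DB write outside the lock

    def run_every(self, seconds: float, stop: threading.Event) -> threading.Thread:
        """Flush on a background thread every `seconds` until `stop` is set, then flush once more."""
        def loop() -> None:
            while not stop.wait(seconds):
                self.flush()
            self.flush()
        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread
```

The trade-off is that a crash loses whatever hasn’t been flushed yet, which is usually acceptable for a view counter.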
October 13th, 2008 at 2:01 am
Did you also learn, from that little article, not to abuse your readers in the comments? I was pretty impressed with your site until I saw you trying to burn your own readers in the comments. It isn’t funny, and it isn’t clever.
Despite that, this is a good article - thanks for posting it.
October 13th, 2008 at 8:34 am
victori,
It was right around there, and the server load stayed between 2 and 3 once those offending plugins were disabled. I was really surprised by that, because this is a mostly default WordPress install; we haven’t dug into it and customized any of the SQL the platform emits.
I can imagine that with a highly customized solution like the one you describe, you could eke out some awesome performance under load, especially with some intelligent caching techniques.
John,
I’m glad you found the article helpful, thanks for posting! As far as abusing the readership goes (certainly a bad idea for any publication), I think you mean our response to James’s post?
It’s really hard to stomach internet know-it-alls after a long day… someone who drops into a genuinely interesting story and, instead of contributing or just not saying anything, takes the time to fill out the comment form *just* to leave some asinine remark about how unimpressed they are, with an indirect reference to how smart they are.
I try and discourage that with sarcasm… I can see how my response was a bit more biting than it should have been though. I’ll tone it down in the future.
October 13th, 2008 at 12:31 pm
Excellent post. I’ve had a WPMU server on a quad 2.33 Xeon get totally melted down in the same way, so thanks for this!
October 13th, 2008 at 3:14 pm
Ah, correction: I checked our statistics, and we pulled 123k page views at our highest Digg peak. Load was about 0.85-1.0, not too bad. Hibernate’s second-level cache is amazing stuff. We ran out of file descriptors during our first few Digg effects.
Lessons learned:
Keep the keep-alive timeout to ~10-15 seconds, and make sure your process has a high enough rlimit/max-file count to support enough concurrent clients. Slow clients kill your httpd.
On Solaris, for example, pfiles `pgrep nginx` will give you your max file descriptor count.
Can’t stress enough how much nginx kicks ass under load.
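victori’s file-descriptor point is easy to check. pfiles is Solaris-specific; from inside a process on most Unix systems, Python’s standard resource module can read and raise the same limit, as in the sketch below. For nginx itself, the relevant directives are keepalive_timeout and worker_rlimit_nofile, and at the shell ulimit -n shows the soft limit. The 4096 below is just an example value.

```python
import resource


def raise_nofile_limit(wanted: int) -> tuple:
    """Raise this process's soft RLIMIT_NOFILE toward `wanted`, never above the hard limit.

    Every open client connection costs a file descriptor, so a server holding many
    slow keep-alive connections can run out of descriptors long before CPU.
    Returns the (soft, hard) pair in effect afterwards. Unix-only.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard                          # already unlimited
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass                                   # some systems cap the soft limit below `hard`
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("fd limits (soft, hard):", raise_nofile_limit(4096))
```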
October 13th, 2008 at 3:38 pm
@JetteroHeller, did you have particular plugins that were killing performance? The hardware sounds pretty good… good enough to handle a spike with some of the plugins above maybe… were there other culprits?
@victori, that’s insanely nice man… a server load of 1.0 while serving up that many pages. Do you just stick an unholy amount of RAM in that machine and set the heap space to something like 4GB so Hibernate’s second-level cache can get as big as it wants?
October 13th, 2008 at 3:49 pm
4 GB of RAM, quad-core Xeon X5355, OpenSolaris snv_98 (snv_90 at the time).
Offload as much into RAM as possible. Do the obvious database optimizations, such as indexes on large tables whose columns you filter or join on. Use an 8 KB filesystem recordsize for efficient database writes with PostgreSQL and 16 KB for MySQL InnoDB (matching their default page sizes). ZFS really kicks ass in this department: you are able to create partitions and set recordsizes on the fly.
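To see what “indexes on large tables you filter or join on” buys you, here is a small Python example using the standard sqlite3 module. The table is a made-up, simplified relationship table (not WordPress’s real schema); EXPLAIN QUERY PLAN shows how SQLite intends to run the lookup before and after adding an index. MySQL and PostgreSQL have their own EXPLAIN for the same purpose.

```python
import sqlite3

# Toy table loosely shaped like a tag/category relationship table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term_relationships (object_id INTEGER, term_id INTEGER)")
conn.executemany("INSERT INTO term_relationships VALUES (?, ?)",
                 [(post_id, post_id % 50) for post_id in range(10000)])


def query_plan(sql: str, params: tuple = ()) -> list:
    """Return the 'detail' text (last column) of each step EXPLAIN QUERY PLAN reports."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


lookup = "SELECT object_id FROM term_relationships WHERE term_id = ?"
before = query_plan(lookup, (7,))   # no index yet: SQLite has to scan every row
conn.execute("CREATE INDEX idx_tr_term ON term_relationships (term_id)")
after = query_plan(lookup, (7,))    # with the index it can jump straight to term_id = 7
print(before)
print(after)
```

The first plan should report a SCAN of the whole table and the second a SEARCH using idx_tr_term; the exact wording varies between SQLite versions.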
I had two Jetty application server instances load-balancing the site on that server. To avoid heavy garbage collections, I would kill off a process when its RSS got high (~510 MB). This was done via a cron script that ran at 5-minute intervals, and Solaris’s SMF would restart the process automatically.
All in all, it worked great after the first two complete meltdowns from the Digg effect. Optimizing the code base and configuring the right settings does wonders.
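The cron-based recycling victori describes can be sketched in a few lines of Python. This version reads RSS from Linux’s /proc rather than using Solaris tools, takes the ~510 MB threshold from his comment, and assumes something like SMF (or systemd) restarts the process after it is sent SIGTERM. recycle.py is a hypothetical script name.

```python
import os
import signal
import sys

RSS_LIMIT_KB = 510 * 1024   # ~510 MB, the threshold victori mentions


def rss_kb(pid: int) -> int:
    """Resident set size of `pid` in kB, read from /proc (Linux-specific; Solaris would use ps or prstat)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])          # line looks like "VmRSS:   123456 kB"
    raise ValueError(f"no VmRSS entry for pid {pid}")


def recycle_if_bloated(pid: int, limit_kb: int = RSS_LIMIT_KB) -> bool:
    """SIGTERM `pid` if its RSS exceeds `limit_kb`; a supervisor is assumed to restart it."""
    if rss_kb(pid) > limit_kb:
        os.kill(pid, signal.SIGTERM)
        return True
    return False


if __name__ == "__main__":
    # e.g. from cron every 5 minutes:  */5 * * * *  python3 recycle.py <pid> [<pid> ...]
    for arg in sys.argv[1:]:
        print(arg, "recycled" if recycle_if_bloated(int(arg)) else "ok")
```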
October 13th, 2008 at 3:53 pm
Oh, concerning the Hibernate second-level cache: I just dedicated 128 MB of RAM to memcached.
That’s more than enough room to cache the hot spots on the site. Since Hibernate caches the queries itself, it knows when to expire those cache entries automatically, so you get cache expiration for free without any intervention.
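The “free cache expiration” works because Hibernate’s query cache keeps track of when each table was last updated and treats a cached result as stale if any table it read has changed since. The toy Python class below mimics that idea with a counter instead of real timestamps; it is an illustration, not Hibernate’s implementation.

```python
import itertools
from typing import Any, Callable, Dict, FrozenSet, Hashable, Tuple


class QueryCache:
    """Toy query cache whose entries expire automatically when a table they read is written."""

    def __init__(self) -> None:
        self._clock = itertools.count(1)               # monotonically increasing "timestamps"
        self._table_stamp: Dict[str, int] = {}         # table -> stamp of its last write
        self._entries: Dict[Hashable, Tuple[int, FrozenSet[str], Any]] = {}
        self.misses = 0

    def query(self, key: Hashable, tables: FrozenSet[str], run: Callable[[], Any]) -> Any:
        entry = self._entries.get(key)
        if entry is not None:
            stamp, deps, result = entry
            if all(self._table_stamp.get(t, 0) < stamp for t in deps):
                return result                          # nothing it depends on changed since
        self.misses += 1
        result = run()                                 # really hit the database
        self._entries[key] = (next(self._clock), tables, result)
        return result

    def table_written(self, table: str) -> None:
        """Call on every INSERT/UPDATE/DELETE against `table`."""
        self._table_stamp[table] = next(self._clock)
```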
October 13th, 2008 at 4:00 pm
Editor,
In fairness, I know where you’re coming from; there’s always one, isn’t there!
Thanks for replying to my comment.
October 13th, 2008 at 4:53 pm
victori,
Great tips, really appreciate you taking the time to write all that up.
Out of curiosity, have you guys dabbled with JPA or other persistence technologies like iBatis, or has Hibernate always given you what you needed, and done it well enough, that you never had to stray?
October 13th, 2008 at 8:00 pm
Well, I have used many ORM technologies. I went from ActiveRecord (Rails) and DBIx::Class (Perl/Catalyst) to Hibernate (Wicket/Spring/Java). I have tried JPA and iBatis to some limited extent. JPA (EJB3) works exactly like Hibernate but has fewer features, such as cascade deletes, and depends on a full-blown J2EE application server.
If it were up to me, I would love an ActiveRecord-style ORM that is compatible with HQL/EJB-QL. This might actually be possible with JRuby by merging the HQL parser with ActiveRecord. The one critical thing ActiveRecord lacks is a distributed SQL cache. Sure, you can use page fragment caching, but then you have to manage it manually.
Anyway, good luck with Digg; you have a nice site here.
October 14th, 2008 at 7:43 am
victori,
Sorry to drag this on, but you’re tantalizing the CS-degree in me… is fab40 all done in Wicket/Spring/Hibernate?
I’m a big Wicket fan, haven’t decided on ORM preferences yet, and never had an “ah-ha!” moment with Spring where I FINALLY understood what all the hubbub is about.
I’m curious what you had to do on a Wicket site to survive a Digg, namely keeping session sizes down by carefully crafting models/detachable models for everything.
That is one issue I had with ShoutStuff (Wicket): I was either being sloppy with my models or forgetting that the state can get versioned unless you use IDetachableModel for every single model except those that cannot re-attach.
Anyway, I digress… like I said, you got my programming brain all excited.
October 14th, 2008 at 8:23 am
1. Subclassed LoadableDetachableModel that uses Memcached
2. Hibernate 2nd level cache backed by Memcached
3. Jetty session handler backed by memcachedb (BerkeleyDB storage)
notice a trend?
Configure Wicket to use HttpSessionStore to keep all PageMap data in the session, i.e. your memcachedb storage. This lets you kill off Wicket application servers at will without expiring any sessions.
Configure Wicket to use the ONE_PASS_RENDER render strategy so it does not buffer response data locally; essentially, you can then bounce users from server to server without expiring their sessions.
Running two Jetty instances on a single server can easily handle a full-blown Digging without any problems. You also get the added benefit of stability through redundancy and the ability to roll out rolling updates without interrupting your service.
Stateful web applications do scale!
As for state size, I wrote a Jetty file-based session manager. It lets me keep an eye on session sizes with the ‘du -hs *’ command. Nice and simple.
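The LoadableDetachableModel trick keeps sessions small because only an ID has to survive between requests; the object itself is reloaded when needed. Below is a rough Python sketch of that idea (Wicket itself is Java, and this is not its API): `cache` stands in for memcached and `load_from_db` for a Hibernate lookup, both hypothetical.

```python
from typing import Any, Callable, Dict, Optional


class DetachableModel:
    """Keeps only `entity_id` between requests; reloads the object on first use in each request.

    `cache` (a dict standing in for memcached) and `load_from_db` (standing in
    for a Hibernate lookup) are hypothetical stand-ins.
    """

    def __init__(self, entity_id: Any, cache: Dict[Any, Any],
                 load_from_db: Callable[[Any], Any]):
        self.entity_id = entity_id
        self._cache = cache
        self._load_from_db = load_from_db
        self._obj: Optional[Any] = None
        self._attached = False

    def get_object(self) -> Any:
        if not self._attached:
            obj = self._cache.get(self.entity_id)      # try the shared cache first
            if obj is None:
                obj = self._load_from_db(self.entity_id)
                self._cache[self.entity_id] = obj      # later requests/servers can reuse it
            self._obj, self._attached = obj, True
        return self._obj

    def detach(self) -> None:
        """Call at the end of each request: drop the loaded object so only the id is kept."""
        self._obj, self._attached = None, False

    def session_state(self) -> Dict[str, Any]:
        """What would need to be stored in the session for this model: just the id."""
        return {"entity_id": self.entity_id}
```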
Some J2EE/Hibernate stuff does not scale vertically, so dropping down to SQL is a must. For example, user.getFriends().add(foo); is highly inefficient: it fetches all your friends just to add a single entity to the collection. That works great when you have 2-5 friends, not so much when you have 10,000-30,000.
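victori’s getFriends().add(foo) example is Java/Hibernate, and whether Hibernate loads the whole collection depends on how it is mapped. The Python sketch below only mimics the two access patterns with the standard sqlite3 module: loading every friend row to add one, versus a single INSERT with the primary key rejecting duplicates. The schema is hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per (user, friend) pair.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE friendships (
                    user_id   INTEGER NOT NULL,
                    friend_id INTEGER NOT NULL,
                    PRIMARY KEY (user_id, friend_id))""")
conn.executemany("INSERT INTO friendships VALUES (1, ?)", [(i,) for i in range(2, 20001)])


def add_friend_via_collection(user_id: int, friend_id: int) -> int:
    """Mimics a collection-style add: load every existing friend, then add one. Returns rows loaded."""
    friends = {row[0] for row in conn.execute(
        "SELECT friend_id FROM friendships WHERE user_id = ?", (user_id,))}
    if friend_id not in friends:
        conn.execute("INSERT INTO friendships VALUES (?, ?)", (user_id, friend_id))
    return len(friends)


def add_friend_direct(user_id: int, friend_id: int) -> None:
    """Drop down to SQL: a single INSERT, with the primary key rejecting duplicates."""
    conn.execute("INSERT OR IGNORE INTO friendships VALUES (?, ?)", (user_id, friend_id))


print(add_friend_via_collection(1, 20001))   # prints 19999: rows pulled just to add one friend
add_friend_direct(1, 20002)                  # no rows pulled at all
```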
Off to bed; I pulled an all-nighter. Sorry for any grammar mistakes or typos in my post.
October 14th, 2008 at 10:07 am
victori, you are a badass, thanks for the follow-up.