A Review of Digg.com Traffic and CDNs
Thanks to all those who read my recent post about networking 3 Mac Minis. Hopefully it can help you create a network that is as close as possible to the administrator's second home, the NOC. As an aside from the posts I had planned, I wanted to share some of the statistics that the digg.com exposure generated, along with some "simple" tips for those interested in surviving high traffic/bandwidth peaks. I'll cover more "technical" ways of setting up your server(s) to handle this too, but that is for my next post.
Bandwidth Chart (7 Days)
The story made it to the front page of Digg about one week ago on Monday, February 6th. It was posted to Digg about 36 hours earlier on Saturday evening, and it took approximately 60 diggs to reach the home page. Within 1 hour of getting there, the bandwidth capacity here at ActionMoniker.com really took a dive. Below is a 7-day chart that goes from Sunday, February 5th to Saturday, February 11th.
The first big spike in the chart was the initial onslaught of traffic on Monday afternoon from the digg.com homepage. That small gap you see there before the next big spike is where the WebSvr Mini had to be rebooted. In fact, I even upgraded the RAM to one gigabyte. You can see another spike towards the end that correlates to the secondary traffic generated from the blogosphere, basically when other bloggers and news portals got around to posting the information about my article and then linking back to my site. Neat statistics, but let's take a look at why that WebSvr Mini crapped out.
We All Need Cache
Apache seemed to do well with the quantity of requests coming at it from Digg. That said, my post did have one HUGE drawback: the 10 large images on the article's page, which totaled 124 kilobytes. My assumption is that when the server started taking requests en masse, my 2.6 megabit connection was not fast enough to serve those images in a timely manner. This caused a cascading effect on Apache by keeping memory-intensive threads open for longer and longer until the server could not take it any more.
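To put rough numbers on that assumption, here is a back-of-the-envelope sketch. It assumes the whole 2.6 megabits is available for upstream traffic, that 1 kilobyte is 1000 bytes, and it ignores the HTML itself, protocol overhead, and browser caching, so treat the result as an optimistic ceiling rather than a measurement. The function name is just something I made up for illustration.

```python
PAGE_IMAGES_KB = 124   # total image weight per page view (from the post)
UPLINK_MBIT = 2.6      # connection speed (from the post, assumed to be upstream)


def max_page_views_per_second(payload_kb: float, link_mbit: float) -> float:
    """Upper bound on full page loads per second the link can carry."""
    # Convert kilobytes to megabits: 8 bits per byte, 1000 kilobits per megabit.
    payload_megabits = payload_kb * 8 / 1000
    return link_mbit / payload_megabits


if __name__ == "__main__":
    rate = max_page_views_per_second(PAGE_IMAGES_KB, UPLINK_MBIT)
    print(f"about {rate:.1f} page loads per second, {rate * 60:.0f} per minute")
```

Under those assumptions the images alone cap the link at roughly 2.6 full page loads per second, which is not much headroom for a front-page story.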
This is where a good cache comes in handy. Not browser or proxy caches, those are for my next post too, but rather what are called gateway caches, a role that Content Delivery Networks (CDNs) fill on a large scale. These are caching servers distributed throughout the internet that typically help deliver content to a client by geotargeting a visitor's IP address and then finding a server close to that IP with the requested content. The best example of a commercial CDN is Akamai. I think Apple invested in these guys some years ago, but I digress. The point is that CDNs can take load off of your local server by copying your content, be it a whole web page or just certain media types, and serving it up to your visitor's browser.
The most popular CDN that is freely used by many digg.com readers is a project called Coral. Many times a Digg or Slashdot article will take down an unsuspecting server (known as the Slashdot effect), forcing prospective visitors to view your content from a cached version on a server very far away from your own. Often readers on both Slashdot and Digg try to "coralize" your article by linking to it from a comment thread using the Coral network as a cache. This is easily done by adding .nyud.net:8090 to the end of any hostname in a URL. For instance, http://www.metaskills.net/article.php would become http://www.metaskills.net.nyud.net:8090/article.php. However, this is really only helpful to prospective readers if the coralized link was used as the initial link on Digg or Slashdot.
In my case it was not, and I did not like the idea of users having to read my article on somebody else's server, nor of relying on them to find a coralized link to my site buried in a comment thread. If that had happened, I would have lost traffic statistics, user comments, RSS subscriptions, lateral page visitation, and many visitors altogether. I had to get the server running again and cope with my limited bandwidth. Wow, I just said 2.6 megabits was limited :)
Just Cache Images or Large Media
When I rebooted the WebSvr Mini, I immediately changed the <img src=""> tags of the 10 images in the content area that had the largest file sizes so that they would be served from the Coral network. This was done by changing the image tags from <img src="/images/mac-mini-network.jpg"> to a fully qualified source on the Coral server, which ended up looking something like this: <img src="http://www.metaskills.net.nyud.net:8090/images/mac-mini-network.jpg">. That ensured any future visits to my article received those images from one of the 260 Coral network servers. In this way the heaviest part of my page was completely offloaded, leaving my local Apache to handle the page source and other media requests in a timely fashion. Best of all, this was completely transparent to the end user and no features were lost in their browsing experience.
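I made that change by hand, but with more than a handful of images you could script it. The sketch below is an assumption-laden illustration, not what I actually ran: it uses a regular expression (fine for simple, double-quoted markup like mine, not a real HTML parser), it only touches root-relative src values, and the names SITE_HOST, CORAL_BASE, and coralize_images are made up for the example.

```python
import re

SITE_HOST = "www.metaskills.net"                 # host used in the example above
CORAL_BASE = f"http://{SITE_HOST}.nyud.net:8090"  # coralized form of that host

# Match <img ... src="/path"> where the path is root-relative ("/x", not "//x").
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=")(/(?!/)[^"]*)(")', re.IGNORECASE)


def coralize_images(html: str) -> str:
    """Point root-relative <img> sources at the Coral cache."""
    return IMG_SRC.sub(
        lambda m: m.group(1) + CORAL_BASE + m.group(2) + m.group(3),
        html,
    )


if __name__ == "__main__":
    before = '<p><img src="/images/mac-mini-network.jpg"></p>'
    print(coralize_images(before))
```

Images that already point at another host are left untouched, so running it twice over the same page does no harm.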
I highly suggest that if you know ahead of time that an article of yours will be Slashdotted or Dugg, you coralize as many of the images or large media files on your web server as possible. Which media types you choose may vary, but starting with the files that have the largest size is usually the best idea. It can save your site from going down or slowing to a crawl.
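To find those largest files quickly, something like the following sketch would do. The list of extensions and the function name largest_media are my own assumptions for illustration, so adjust them to whatever media your site actually serves.

```python
from pathlib import Path

# Assumed set of "heavy" media types; change to suit your own site.
MEDIA_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".mov", ".mp3"}


def largest_media(root: str, limit: int = 10) -> list[tuple[int, Path]]:
    """Return up to `limit` (size_in_bytes, path) pairs, biggest first."""
    files = [
        (p.stat().st_size, p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in MEDIA_SUFFIXES
    ]
    return sorted(files, key=lambda item: item[0], reverse=True)[:limit]


if __name__ == "__main__":
    # "." stands in for your web server's document root.
    for size, path in largest_media("."):
        print(f"{size:>10}  {path}")
```

The files at the top of that list are the first candidates to coralize.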
Do You Digg It?
Well I do, and to date we have had 34,000 reads of the Mac Mini Network article. With a success like that it was time to sport some gear. I had already changed my default home page from my customized Google with Slashdot articles to digg.com, and now all I needed was a hat to tell the world. For the past 5 months I have been sporting a /. hat and matching retro LED digital watch from ThinkGeek.com, and it was time for a change. I thought for sure that there would be a vendor out there selling an "I got Dugg" or "Digg It" hat, or even one with an embroidered logo, but I could not find one at all! I would have to say that this company has sure missed out on some sweet merchandising opportunities. So I will have to stay with my /. hat.
Above is a comp I put together at CafePress.com. I even mailed it to Digg.com and asked what they thought. They kindly asked that I refrain from using their mark on resale items until their lawyers finished helping them with their copyrights and licensing agreements. Maybe I will go back to /.