Not so long ago, Fight Aging! was subject to a seemingly random distributed denial of service (DDOS) attack - which may have been no more than the flailing of a badly configured spam bot network, but it still managed to knock the site offline. One of the ways in which less ethical folk can make money online is by building up networks of compromised machines, a mix of vulnerable servers found and taken over, coupled with personal computers that fall victim to malign websites and browser vulnerabilities. These networks can be rented out to spammers and used to execute commands to write comments, register at forums to make spam posts, and send email - spam remains a profitable industry, sadly, which is why we see so much of it.
Fight Aging! is built upon Movable Type, a popular blogging software platform. Because it is popular, it is targeted by most spammers and spam-enabling toolkits, which largely focus on leaving spam comments. Fight Aging! is not a well trafficked blog, as I outlined yesterday, but it does touch on topics that are associated with high value keywords in the world of search engine advertising. This means that there are plenty of people and software programs out there who want to insert links into Fight Aging! pages - such as in the comments to a post - that point to their site. By doing so, they gain authority in some search engines, and also pick up a few additional visitors. Given that it costs next to nothing for the spammers to try to submit comments, they don't spend too much time thinking about whether it's worth it or not for any individual site - they'll just spam everyone they can find who might be remotely relevant to whatever moneymaking scheme lies at the base of it all.
The bottom line is that I see a lot of hits to the comment posting pages of Fight Aging! as a matter of course. The DDOS event was a very large step up from business as usual, coupled with what amounted to a download of the entire site - possibly to identify higher value pages to try to spam with comments. But who knows; one of the realities of managing a site is that it is rarely worth the cost to try to find out exactly why this sort of thing happened, provided that it stops.
At the time, I was hosting Fight Aging! on a shared server at getNetworks, who, I should note, have provided unfailingly good service for very little money since this site launched back in 2004. The support folk there did the sensible thing and blocked access to Fight Aging! to protect the other paying customers on the same server. The upside of a shared server web hosting arrangement is that it is cheap. The downside is that you have comparatively little control over the configuration of the server and its software. The DDOS attack was a final prompt urging me to up and move Fight Aging! to a new home, one that gave me enough control to add armor against problems of this nature and tinker with the hosting environment to make it more to my liking.
Today, Fight Aging! runs on a small EBS boot instance in Amazon EC2 running Fedora 14. Web pages are served by Apache 2, Perl 5, and PHP 5, and data is stored in MySQL 5.
But what on earth does that all mean? Stick around, and find out. EC2 is the Amazon Elastic Compute Cloud, a service that allows people to register accounts and launch and manage a fleet of virtual servers (or "instances"). When I log in to one of my virtual servers, it appears to me just like a physical server, but it is in fact drawing on the resources from any number of physical machines, and where the physical processing actually happens may vary from moment to moment. This is cloud computing, where the "servers" we customers interact with are abstractions.
On EC2, server uptime, bandwidth, and storage space are metered like water. If I use more, I pay more. Virtual server instances come in various sizes; "small" in this case means the equivalent of a single processor machine with 1.7 GB of RAM - though the comparison is slightly fuzzy, given that this is only an abstraction of a machine. It works out to be considerably cheaper than renting a physical server, and I gain some useful tools, such as the ability to clone the entire Fight Aging! server whenever I want and spin up a backup or test version in a matter of minutes should I want to tinker.
Most virtual server instances on EC2 do not persist their hard drive data when they are shut down - they were launched from a server disk image, will do their work, and no-one really cares about the additional data they collect while running. For those of us who need something that behaves more like a real server, where data is kept around even if the server restarts, there is the Elastic Block Store (EBS) option. Clearly I care about ongoing storage of data for Fight Aging! - posts, comments, logs, and so on - and so the Fight Aging! instance uses EBS.
The instance runs Fedora 14, which is an open source Linux operating system variant. It, like most modern Linuxes, gives me the freedom to install more or less anything a server would need using simple commands. Gone are the days in which you had to fight your machine tooth and nail to install anything; package management software has come a long way, and installing new software is as simple as typing "yum install [name of package here]" and pressing enter a couple of times.
- MySQL is the database, in which all of the data on posts, categories, comments, and the like is stored in ways that make it efficient to arrange, query, and rearrange.
- Apache is the web server, the software that handles the process of responding to your browser and handing out the files packed with data that the browser forms into web pages.
- PHP is a programming language that is most often used to write dynamic web applications and web pages. Fight Aging! has little of the dynamic in its pages - at least in the grand scheme of things - but it is there. So the pages most often accessed by visitors, such as this one, are written in PHP.
- Perl is also a programming language that sees much of its use in web applications.
In essence, and somewhat dumbed down, the Movable Type blogging software is a set of web pages written in the Perl language that allow me to use a web browser to manage the process of creating a whole set of other web pages written in PHP - like the page you're reading now. I enter the information needed to create these PHP pages - the text and template information like the header and footer - and then Movable Type code churns along to write out PHP files that Apache and PHP can turn into web pages when you request them.
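To make that publishing step concrete, here is a minimal sketch, written in Python rather than Movable Type's Perl. It is not Movable Type's code: the template, the `render_page` and `publish` functions, the `footer.php` include, and the post data are all hypothetical, and exist only to illustrate the idea of writing files out once so that the web server can hand them out later.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical page template: a shared header and footer around a post body.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body><h1>$title</h1>\n$body\n"
    "<?php include 'footer.php'; ?>\n"
    "</body></html>\n"
)


def render_page(title: str, body: str) -> str:
    """Fill the template with one post's content."""
    return PAGE_TEMPLATE.substitute(title=title, body=body)


def publish(posts: dict, out_dir: Path) -> list:
    """Write one .php file per post and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, (title, body) in sorted(posts.items()):
        path = out_dir / f"{slug}.php"
        path.write_text(render_page(title, body), encoding="utf-8")
        written.append(path)
    return written


# Illustrative content only.
posts = {
    "hello": ("Hello", "<p>First post.</p>"),
    "hosting": ("On Hosting", "<p>Notes on servers.</p>"),
}
out_dir = Path(tempfile.mkdtemp())
written = publish(posts, out_dir)
```

The point of the design is that the expensive part, assembling a page from templates and stored content, happens when I publish rather than every time a visitor asks for the page.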
The process by which a webserver reads a PHP or Perl file, processes its instructions, accesses the database, and marshals the end result of all that activity into a stream of data to feed to your browser is not instant. It can, however, be made considerably faster than the default situation on a shared server allows for. That in turn improves the server's robustness when faced with a sudden influx of visitors or a DDOS attack.
On the Fight Aging! server I employ layers of caching provided by the MySQL query cache, Memcached, the Alternative PHP Cache, and FastCGI. The basic idea behind caching is that if you need the same result more than once, a web page for example, why take all the effort to build it from scratch multiple times? The same goes for the result of asking the database for a particular piece of data, any array of data you build in code, and even the machine-level instructions that hand-written code compiles into when it is prepared for execution. Do it once, and save the result for the next time you need it. The four items I list above are all open source, freely available technologies that considerably speed up the operation of the Fight Aging! web server through caching - by saving it from performing the same expensive tasks over and over again.
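The "do it once, and save the result" idea can be sketched in a few lines of Python. This uses `functools.lru_cache` from the Python standard library as a stand-in; it says nothing about how the MySQL query cache, Memcached, or the Alternative PHP Cache work internally, and `build_page` is a made-up placeholder for an expensive step.

```python
import functools

# Counts how often the "expensive" work actually runs.
build_count = 0


@functools.lru_cache(maxsize=128)
def build_page(post_id: int) -> str:
    """Stand-in for an expensive step, such as a database query plus rendering."""
    global build_count
    build_count += 1
    return f"<html><body>Post {post_id}</body></html>"


# Three requests for the same page, then one request for a different page.
pages = [build_page(1), build_page(1), build_page(1), build_page(2)]
info = build_page.cache_info()
```

Four requests come in, but the expensive function body runs only twice; the other two requests are answered from the saved result.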
Despite that, I have to say that the administrative interface of Movable Type 4.35 is still horribly slow. You folk get to see the blindingly fast front end of the site, and I'm stuck with the bloat that Six Apart have coded into their later versions. Before I made the move to EC2, I was still using Movable Type 3.3 - an old version, but solid, and I had customized it over the years to remove a number of the issues it had. Movable Type gets the job done, but I have been consistently disappointed with each new version: every upgrade has proved slower and more bloated than the last, while adding very little that was of benefit to my needs. Unfortunately, complaining about blogging software is much like complaining about political parties - they are all variations on terrible, and you either learn to live with it or move to an island (which in this analogy would be writing your own blogging software, a process that has a whole host of its own drawbacks).
There is more to managing a web site than the web server, however - there is also the minor matter of email. In a shared hosting environment, this is managed for you by the host, but there is no such provision in EC2; you have to roll up your sleeves and manage it yourself. Thus next door to the Fight Aging! web server instance in my EC2 account there stands a small Ubuntu Linux mail server instance. Having spent some time assembling the thing, I'll say this about building mail servers: it's very much like piling up an abstract sculpture made of balanced, loaded weapons that you don't fully understand, all the while hoping that the instruction manuals are up to date and accurate. The nature of the email ecosystem makes mail servers far more complex and consequential than web servers.
If you fail to correctly configure your web server, the worst that can happen is that it doesn't work at all, or hands out copies of your unprocessed code to any visitor who happens by. By way of comparison, the worst that can happen if you incorrectly configure a mail server is that spammers will descend upon you, turn it into a spam proxy, and get your domain onto blacklists from which it is very hard to be removed. Also, your server will reveal a range of personal information about you and anyone else who uses it, will lose random emails and leave no record that it did so, and will tell you that it delivered emails that it in fact didn't deliver. Additionally, many domains will reject your email outright or flag you as a spam source based on nothing more than the way in which your mail server responds to requests. I could go on - there's more, but you get the point.
Email is hard. So I was cautious and built the Fight Aging! mail server from a great set of instructions - which I should add still required me to come back and spend some days configuring yet more information into the system once I had the basics sorted out. For example, I use Sender Policy Framework to try to ensure that no-one can forge emails from @fightaging.org addresses, and just getting that to work correctly required testing across several days of operation.
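For the curious, the core idea of Sender Policy Framework is that a domain publishes a DNS TXT record listing which machines may send mail on its behalf, and the receiving server checks the sending address against it. The Python sketch below is a heavily simplified illustration, not a real SPF implementation: the record is hypothetical (it uses a documentation-only address range, not the actual fightaging.org record), `check_spf` is a made-up helper, and it handles only the `ip4:`, `ip6:`, and `all` mechanisms, skipping everything that would need a DNS lookup.

```python
import ipaddress

# Hypothetical SPF record, as it might be published in a DNS TXT record.
# 192.0.2.0/24 is a range reserved for documentation and examples.
SPF_RECORD = "v=spf1 ip4:192.0.2.0/24 -all"


def check_spf(record: str, sender_ip: str) -> str:
    """Very simplified SPF check: handles only ip4:, ip6:, and all."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"
    ip = ipaddress.ip_address(sender_ip)
    qualifiers = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    for term in terms[1:]:
        qualifier = "+"  # a mechanism with no qualifier means "pass"
        if term[0] in qualifiers:
            qualifier, term = term[0], term[1:]
        if term == "all":
            return qualifiers[qualifier]
        if term.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip.version == network.version and ip in network:
                return qualifiers[qualifier]
        # Other mechanisms (a, mx, include, ...) need DNS lookups; skipped here.
    return "neutral"


allowed = check_spf(SPF_RECORD, "192.0.2.25")
forged = check_spf(SPF_RECORD, "198.51.100.7")
```

With this record, mail from an address inside the listed range passes, and mail from anywhere else hits the closing `-all` and fails. What the receiving server then does with a failure is up to that server, which is part of why testing takes a while.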
Internally, a mail server is a collaboration between numerous pieces of distinct software that hand pieces of mail off to one another: the local delivery mechanism, the outbound delivery mechanism, the virus checker, the spam checker, the greylister, and so forth. Each of these is written by an entirely different group, often years apart in time, and so has completely different modes of configuration and operation. The pipes that pass mail and other information between these pieces of software are complex and infinitely configurable - and have to be done exactly right, or else. The "or else" here is usually of the "and now I discard this random piece of important mail without telling you" variety.
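The hand-off between components can be pictured as a chain of stages, each of which either passes a message along or stops it. The Python sketch below is a toy model only: the stage functions, their one-line rules, and the `Message` class are hypothetical and bear no relation to how any real virus checker or spam filter is configured. Its one deliberate design choice is that every rejection is written to a log, which is exactly the property that is so easy to lose in a misconfigured real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Message:
    sender: str
    recipient: str
    body: str
    notes: List[str] = field(default_factory=list)


# A stage returns None to pass the message on, or a reason string to stop it.
Stage = Callable[[Message], Optional[str]]


def virus_check(msg: Message) -> Optional[str]:
    # Placeholder rule; a real scanner inspects attachments.
    return "virus signature found" if "EICAR" in msg.body else None


def spam_check(msg: Message) -> Optional[str]:
    # Placeholder rule; real filters score many signals.
    return "looks like spam" if "cheap pills" in msg.body.lower() else None


def run_pipeline(msg: Message, stages: List[Stage], log: List[str]) -> bool:
    """Hand the message from stage to stage; log every rejection."""
    for stage in stages:
        reason = stage(msg)
        if reason is not None:
            log.append(f"{stage.__name__}: rejected mail from {msg.sender}: {reason}")
            return False
        msg.notes.append(f"passed {stage.__name__}")
    return True


log: List[str] = []
stages = [virus_check, spam_check]
ok = run_pipeline(Message("a@example.org", "b@example.org", "Hello there"), stages, log)
bad = run_pipeline(Message("x@example.org", "b@example.org", "Cheap pills here"), stages, log)
```

The first message passes both stages; the second is stopped by the spam stage and leaves a single line in the log saying so.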
It was a challenge, but fortunately I now have a functional, defensively configured mail server, fingers crossed.
Beyond the modest cost of using the EC2 platform, all of the high-powered, complex software I've mentioned in this post is free - open source and gratis. The results of countless programmer-years of work can be downloaded and worked with in a matter of seconds, with no outlay other than my time. We live in a fascinating age.