Skip to main content

Infrastructure projects

I've been running my own server for a year and a half now, and have been surprised at how trouble free it's been. I attribute this to:

  1. luck
  2. good planning
  3. a decent upstream provider
  4. the maturity of linux distribution maintenance tools (e.g. yum)

In this case, good planning means:

  1. keeping it as simple as possible
  2. doing things one at a time
  3. i'm the only one mucking about on it
And so this month, inspired by some Drupal camp sessions, I decided to take some time to make a good thing better. My goals were:
  1. Optimizing my web servicing for more traffic.
  2. Simplifying my Drupal maintenance.
  3. Automating my backups.

And here's the results ...

Web Servicing Optimizations

This was relatively easy - I just finished off the work from here:

Specifically, i discovered that I hadn't actually setup a mysql query cache, so I did that. And then I discovered that it was pretty easy and not dangerous to remove a bunch of the default apache modules. All I had to do was comment out the lines of the httpd.conf file. I took out some other gunk in there that isn't useful for Drupal sites (multilingual icons, auto indexing).

I like to think that between those two, the response time is even better, though the difference is relatively marginal without much load. The real reason to do this is to increase the number of available servers in apache without the risk of going into swap death. So I can now add more sites with out fear.

Simplifying Drupal Maintenance

I was converted to SVN (a version control program) 3 years ago and still love it. I've been using it to methodically track all the code, with individual repositories for each of my major projects, using the full trunk and vendor branch setup and the magic of svn_load_dirs.

But after a project starts using a lot of contributed modules, or when there are several code security updates each year and you have several projects, this starts getting time consuming.

So I've started NOT putting drupal core or contributed modules into my svn projects, and I'm using one multi-site install for most of my sites. Along with the fabulous update_status module for Drupal 5 (which is in core for Drupal 6), keeping up-to-date is now much more manageable. It's also a change of mind set - I'm now more committed (pun intented) to the Drupal community. I means I can no longer hack core (at least not without a lot of work).

And so -- I also tested this whole scheme out by moving all my simple projects to a new document root that's controlled entirely via cvs to the server, with symlinks out to my individual site roots (which still go in svn, so i can keep track of themes, files and custom modules), and it worked well. There's actually a performance issue here as well - by keeping all my sites on the same document root, the php cache doesn't fill up so fast, because there's less code running. And it's more easily kept secured.

And as a final hurrah, I converted up to Drupal 6. In the process, I've given up on the 'links' module which I thought had some promise, and am now just using the 'link' module that defines link fields for cck. I also started learning about the famed Drupal 6 theming, and tweaked the theme for fun.


I backup to an offsite-server using rsync, which seems to be a common and highly efficient way to do things for a server like this. Rsync is clever to only send file diffs, so load and bandwidth are kept to a minimum. My backups are not for users, they're only for emergencies, so I don't need to do hourly snapshots, only daily rsyncs.

Well, this works well for code, but not so much for mysql. I'd been doing full mysqldumps, and then copying them to my backup server, but this was not very efficient. So finally this week, I've set it up with help from some simple scripts to use the --tab parameter to mysqldump - which dumps the tables in each database to separate files. This means that now when I run rsync on them, it's clever enough to only worry about the tables that have changed, which are relatively few each day. So now I've got daily mysql backups as well, without huge load/bandwidth!

And that also means, I can now use my backup as a place to pull copies of code and database when I want to setup a development environment.


Which takes me almost to a new topic, but it's also about infrastructure, so here it is. I've been running little development servers for several years. My main one I actually found being thrown out (it was a Pentium II). They have served me well, but I was rethinking my strategy mainly on power issues: I'm not happy that I have to use so much electricity for them (and as older servers, the power supplies aren't very efficient), and since one of them is actually in my office, it's fine in the winter when my office is cold, but really not good in the summer when I'm trying to stay cool.

And so the promise of virtualization lured me into believing I could run a little virtual server off my desktop. I tried XEN, but it broke my wireless card (because I have to run it using ndiswrapper), so I finally gave up and installed VMWare (because it was in an ubuntu-compatible repository), even though it's not really open source.

Does it work? Well, so far so good.

Popular posts from this blog

Me and varnish win against a DDOS attack.

This past month one of my servers experienced her first DDOS - a distributed denial of service attack. A denial of service attack (or DOS) just means an attempt to shut down an internet-based service by overwhelming it with requests. A simple DOS attack is usually relatively easy to deal with using the standard linux firewall called iptables.  The way iptables works is by filtering the traffic based on the incoming request source (i.e., the IP of the attacking machine). The attacking machine's IP can be added into your custom ip tables 'blacklist' to block all traffic from it, and it's quite scalable so the only thing that can be overwhelmed is your actual internet connection, which is hard to do.

The reason a distributed DOS is harder is because the attack is distributed from multiple machines. I first noticed an increase in my traffic about a day after it had started - it wasn't slowing down my machine, but it did show up as a spike in traffic. I quickly saw that…

CiviCRM's invoice_id field and why you should love the hash

I've been banging my head against a distracted cabal of developers who seem to think that a particular CiviCRM core design, which I'm invested in via my contributed code, is bad, and that it's okay to break it.

This post is my attempt to explain why it was a good idea in the first place.

The design in question is the use of a hash function to populate a field called 'invoice_id' in CiviCRM's contribution table. The complaint was that this string is illegible to humans, and not necessary. So a few years ago some code was added to core, that ignores the current value of invoice_id and will overwrite it, when a human-readable invoice is generated.

The complaint about human-readability of course is valid, and the label on the field is misleading, but the solution is terrible for several reasons I've already written about.

In this post, I'd like to explain why the use of the hash value in the invoice_id field is actually a brilliant idea and should be embrac…

Upgrading to Drupal 8 with Varnish, Time to Upgrade your Mental Model as well

I've been using Varnish with my Drupal sites for quite a long while (as a replacement to the Drupal anonymous page cache). I've just started using Drupal 8 and naturally want to use Varnish for those sites as well. If you've been using Varnish with Drupal in the past, you've already wrapped your head around the complexities of front-end anonymous page caching, presumably, and you know that the varnish module was responsible for translating/passing the Drupal page cache-clear requests to Varnish - explicitly from the performance page, but also as a side effect of editing nodes, etc.

But if you've been paying attention to Drupal 8, you'll know that it's much smarter about cache clearing. Rather than relying on explicit calls to clear specific or all cached pages, it uses 'cache tags' which require another layer of abstraction in your brain to understand.

Specifically, the previous mechanism in Drupal 7 and earlier was by design 'conservative' …