
An Apache OOM (out of memory) emergency in a container

Last Sunday, a (Linux) server in my infrastructure that was running a fairly conservative number of Docker containers in production was brought to its knees. The monitoring data (from Prometheus) showed that CPU usage jumped from an average of less than 2% to a steady 75% or so, and stayed there until the server was rebooted. Notably, disk usage and throughput went down during the event, and memory usage did not change noticeably, nor was it notably high.

On review of the messages log, one of the last entries before the event documented an Apache OOM (out of memory) event. On this server, Apache only runs inside containers, which are generally limited to 500 MB (by Docker). So presumably, a Docker container running Apache ran out of memory and tried to recover some, and that was what triggered the event.

Reviewing the log of requests before the emergency, it's not clear which container, or which URL or URLs, might have been generating so much memory use. It is a fast server, but the Apache containers are all running mod_php, so it's entirely possible that some sequence of requests could generate a lot of parallel Apache workers, all bloated with PHP memory. This particular infrastructure design has been in production for a year and a half, and this server is the fastest and most lightly loaded, so it makes me think that it's not an obvious general design problem, but perhaps an edge case specific to one of my recently adopted sites.

When reviewing the URLs that were accessed in the minutes before the OOM event, three possibilities stand out: an old Drupal 6 site with a calendar view that is being spidered, a small custom-code app inside that same site, and a new WordPress site showing lots of admin-ajax.php calls. For all of these cases, reducing the PHP memory limit, reducing the maximum number of workers and/or increasing the memory of the Docker containers would all be reasonable strategies.

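
To make the trade-off between those three strategies concrete, here is a rough back-of-the-envelope sketch in Python. Only the 500 MB container limit comes from the setup described above. The 128 MB per-worker figure (PHP's default `memory_limit`), the 50 MB of fixed overhead and the 256-worker ceiling are assumptions for illustration, not measurements from this server, and the function names are hypothetical.

```python
def worst_case_mb(workers: int, per_worker_mb: int, overhead_mb: int = 0) -> int:
    """Worst-case memory if every worker grows to its per-worker ceiling."""
    return workers * per_worker_mb + overhead_mb


def max_workers(container_limit_mb: int, per_worker_mb: int, overhead_mb: int = 0) -> int:
    """Largest worker count whose worst case still fits inside the container limit."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = container_limit_mb - overhead_mb
    return max(usable_mb // per_worker_mb, 0)


# Assumed numbers: 500 MB limit (from the post), 128 MB per worker, 50 MB overhead.
print(max_workers(500, 128, 50))    # workers that fit at a 128 MB per-worker ceiling
print(max_workers(500, 64, 50))     # halving the per-worker ceiling
print(worst_case_mb(256, 128, 50))  # worst case with an assumed 256-worker ceiling
```

This is only a ceiling calculation: real workers rarely all peak at once, and the PHP limit does not cover the Apache process's own footprint, so measured per-worker sizes would give a better estimate.
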
But what surprises me the most is that this kind of event isn't better handled already. Specifically, a much better strategy for an out of memory event would be to shut down the container and restart it. Obviously, that wouldn't be the responsibility of the Apache process itself, but it does surprise me that the standard Apache image doesn't have some magic in it to help facilitate this kind of option. Of course, this example is also a reasonable argument for the use of php-fpm, but up until now I'd been crossing my fingers that Varnish in front of my containers might go some way to handling the common mod_php complaints about memory usage. I'll also confess that I have not been monitoring container memory usage (because cAdvisor is such a resource hog), but that would be smart.
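
One way the restart-on-OOM idea might be approximated from outside the container is a small watcher on the host. This is a minimal sketch, not something I run in production: it assumes the `docker` CLI is on the path, that `docker events` reports container OOM events with the `Type`, `Action` and `Actor.ID` JSON fields used here (worth verifying against your Docker version), and the function names are hypothetical.

```python
import json
import subprocess
from typing import Optional


def oom_container_id(event_line: str) -> Optional[str]:
    """Return the container ID if this JSON event line is a container OOM event."""
    try:
        event = json.loads(event_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    # Field names are an assumption about the `docker events` JSON output.
    if event.get("Type") != "container" or event.get("Action") != "oom":
        return None
    actor = event.get("Actor")
    if not isinstance(actor, dict):
        return None
    return actor.get("ID") or None


def watch_and_restart() -> None:
    """Stream Docker events and restart any container that reports an OOM."""
    cmd = [
        "docker", "events",
        "--filter", "type=container",
        "--filter", "event=oom",
        "--format", "{{json .}}",
    ]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            container_id = oom_container_id(line)
            if container_id:
                # check=False: a failed restart should not stop the watcher.
                subprocess.run(["docker", "restart", container_id], check=False)
```

Calling `watch_and_restart()` from a small supervised script on the host would be one way to run it. The reason for watching events rather than relying on a Docker restart policy is that the kernel may kill only one Apache child rather than the container's main process, in which case the container keeps running and a restart policy never fires.
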
