Skip to main content

An apache OOM (out of memory) emergency in a container

On Sunday last, a (Linux) server in my infrastructure that was running a fairly conservative number of docker containers in production was brought to its knees. The monitoring data (from prometheus) showed that cpu was all gobbled up (from an average of less than 2% to a steady 75%-ish) and remained gobbled up until the server was rebooted. Notably, the disk usage and throughput went down during the event, and memory usage did not change notably, nor was it notable high. On review of the messages log, one of the last entries before the event was documentation of an apache OOM (out of memory) event. On this server, apache is only running inside containers, which are generally limited to 500Mb (by docker). So presumably, a docker container running apache ran out of memory and tried to recover some memory and that was what triggered the event. Reviewing the log of requests before the emergency, it's not clear which container or url or urls might have been generating so much memory use. It is a fast server, but the apache containers are all running mod_php, so it's entirely possible that some sequence of requests could generate a lot of parallel apache workers all bloated with php memory. This particular infrastructure design has been in production for a year and a half, and this server is the fastest and most lightly loaded, so it makes me think that it's not an obvious general design problem, but perhaps an edge case specific one to one of my recently adopted sites. When reviewing the urls that were accessed in the minutes before the oom event, three possibilities stand out: an old Drupal 6 site with a calendar view that is being spidered, the same site that has a small custom code app in it, and a new wordpress site showing lots of admin-ajax.php calls. For all of these cases, reducing php memory, reducing the maximum number of workers and/or increasing the memory of the docker containers would all be reasonable strategies. BUT - what surprises me the most is that this kind of event isn't better handled already. Specifically - a much better strategy for an out of memory event would be to shut down the container and restart it. Obviously, that wouldn't be the responsiblity of the apache process itself, but it does surprise me that the standard apache image doesn't have some magic in it to help facilitate this kind of option. Of course, this example is also a reasonable argument for the use of php-fpm, but up until now I'd been crossing my fingers that varnish in front of my containers might go some way to handling the common mod-php complaints of memory usage. I'll also confess that I have not been monitoring container memory usage (because cadvisor is such a resource hog), but that would be smart.

Popular posts from this blog

The Tyee: Bricolage and Drupal Integration

The Tyee is a site I've been involved with since 2006 when I wrote the first, 4.7 version of a Drupal module to integrate Drupal content into a static site that was being generated from bricolage. About a year ago, I met with Dawn Buie and Phillip Smith and we mapped out a number of ways to improve the Drupal integration on the site, including upgrading the Drupal to version 5 from 4.7. Various parts of that grand plan have been slowly incorporated into the site, but as of next week, there'll be a big leap forward that coincides with a new design [implemented in Bricolage by David Wheeler who wrote and maintains Bricolage] as well as a new Drupal release of the Bricolage integration module . Plans Application integration is tricky, and my first time round had quite a few issues. Here's a list of the improvements in the latest version: File space separation. Before, Drupal was installed in the apache document root, which is where bricolage was publishing it's co...

A Strange Passion for Security

I'm not a computer security expert, but it's been part of my work for many years, in different forms.  A very long time ago, a friend hired me to write up a primer for internet security, and ever since then it's been a theme that's sat in the background and pops up every now and then . But lately, it's started to feel like more than a theme, and but indeed a passion. You may consider computer and internet security to be a dry subject, or maybe you imagine feelings of smugness or righteousness, but "passion" is the right word for what I'm feeling. Here's google's definition: Passion: 1. a strong and barely controllable emotion. 2. the suffering and death of Jesus. Okay, let's just go with number 1. for now. If you followed my link above to other posts about security, you'll notice one from eight years ago where I mused on the possibility of the discovery of a flaw in how https works. Weirdly enough, a flaw in https was discovered shortly...

Orchestrating Drupal + CiviCRM containers into a working site: describing the challenge

In my previous posts, I've provided my rationale for making use of Docker and the microservices model for a boutique-sized Drupal + CiviCRM hosting service. I've also described how to build and maintain images that could be used for the web server (micro) service part of such a service. The other essential microservice for a Drupal + CiviCRM website is a database, and fortunately, that's reasonably standard. Here's a project that minimally tweaks the canonical Mariadb container by adding some small configuration bits:  https://github.com/BlackflySolutions/mariadb That leaves us now with the problem of "orchestration", i.e. how would you launch a collection of such containers that would serve a bunch of Drupal + CiviCRM sites. More interestingly, can we serve them in the real world, over time, in a way that is sustainable? i.e. handle code updates, OS updates, backups, monitoring, etc? Not to mention the various crons that need to run, and how about things ...