An apache OOM (out of memory) emergency in a container

Last Sunday, a (Linux) server in my infrastructure that was running a fairly conservative number of Docker containers in production was brought to its knees. The monitoring data (from Prometheus) showed that the CPU was all gobbled up (from an average of less than 2% to a steady 75% or so) and stayed that way until the server was rebooted. Notably, disk usage and throughput went down during the event, and memory usage neither changed much nor was particularly high.

On review of the messages log, one of the last entries before the event recorded an Apache OOM (out of memory) event. On this server, Apache only runs inside containers, which are generally limited (by Docker) to 500MB. So presumably a Docker container running Apache ran out of memory and tried to recover some, and that was what triggered the event. Reviewing the request logs from before the emergency, it's not clear which container or URL(s) might have been generating so much memory use. It is a fast server, but the Apache containers all run mod_php, so it's entirely possible that some sequence of requests could spawn a lot of parallel Apache workers, each bloated with PHP memory.

This particular infrastructure design has been in production for a year and a half, and this server is the fastest and most lightly loaded. That makes me think this isn't an obvious general design problem, but rather an edge case specific to one of my recently adopted sites. Among the URLs accessed in the minutes before the OOM event, three possibilities stand out: an old Drupal 6 site with a calendar view that is being spidered, a small custom code app inside that same site, and a new WordPress site showing lots of admin-ajax.php calls. For all of these cases, reducing PHP memory, reducing the maximum number of workers and/or increasing the memory of the Docker containers would all be reasonable strategies.
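To make the worker/memory trade-off concrete, here is a back-of-the-envelope sketch in Python. Nothing in this setup requires Python. It assumes the Apache prefork MPM, which is what mod_php is normally paired with, so each worker is its own process. It then estimates how many workers fit under a container limit like the 500MB one above. The function name and the baseline and per-worker figures are hypothetical. You'd want to measure real worker sizes (for example, resident memory from ps inside the container) rather than trust PHP's memory_limit, which caps the PHP heap but not the whole process.

```python
def max_request_workers(limit_mb: int, baseline_mb: int, per_worker_mb: int,
                        headroom_pct: int = 10) -> int:
    """Estimate how many prefork Apache workers fit under a container memory limit.

    Hypothetical helper; all inputs are assumptions you would measure yourself.
    limit_mb      -- the Docker memory limit (e.g. 500 for --memory=500m)
    baseline_mb   -- memory used by the Apache parent process and shared pages
    per_worker_mb -- typical resident size of one worker with mod_php loaded
    headroom_pct  -- percentage of the limit kept free as a safety margin
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    # Integer arithmetic only: usable memory after headroom and baseline.
    usable_mb = limit_mb * (100 - headroom_pct) // 100 - baseline_mb
    # 0 means the limit is too small for even one worker of that size.
    return max(0, usable_mb // per_worker_mb)
```

With the assumed numbers, max_request_workers(500, 50, 64) works out to 6. So a MaxRequestWorkers setting (MaxClients in Apache 2.2) well above that would leave a 500MB container exposed to exactly this kind of OOM whenever a burst of requests fills every worker slot.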
BUT: what surprises me most is that this kind of event isn't already handled better. Specifically, a much better strategy for an out-of-memory event would be to shut down the container and restart it. Obviously, that wouldn't be the responsibility of the Apache process itself, but it does surprise me that the standard Apache image doesn't include some magic to facilitate this kind of option. Of course, this example is also a reasonable argument for using php-fpm, but until now I'd been crossing my fingers that Varnish in front of my containers might go some way toward addressing the common mod_php complaints about memory usage. I'll also confess that I have not been monitoring container memory usage (because cAdvisor is such a resource hog), but doing so would be smart.
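One way to get restart-on-OOM behaviour without changing the Apache image is a small watcher on the host. This is a minimal sketch, assuming Python 3 on the Docker host and the docker CLI on the PATH. It streams docker events filtered to OOM events as JSON and runs docker restart on the affected container. The function names oom_container and watch_and_restart are hypothetical. Docker's own restart policies probably wouldn't help here, because the kernel's OOM killer usually takes out an Apache child rather than the container's main process, so the container never actually exits.

```python
import json
import subprocess
import sys


def oom_container(event_line: str):
    """Return (container_id, name) if a `docker events` JSON line is a container OOM event, else None."""
    try:
        event = json.loads(event_line)
    except json.JSONDecodeError:
        return None
    if event.get("Type") != "container" or event.get("Action") != "oom":
        return None
    actor = event.get("Actor", {})
    container_id = actor.get("ID")
    if not container_id:
        return None
    # Fall back to the short ID if the event carries no container name.
    name = actor.get("Attributes", {}).get("name", container_id[:12])
    return container_id, name


def watch_and_restart() -> None:
    """Block forever, restarting any container that reports an OOM event."""
    # Ask the Docker daemon for OOM events only, one JSON object per line.
    proc = subprocess.Popen(
        ["docker", "events", "--filter", "event=oom", "--format", "{{json .}}"],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        hit = oom_container(line)
        if hit is None:
            continue
        container_id, name = hit
        print(f"OOM in {name}, restarting", file=sys.stderr, flush=True)
        # check=False: a failed restart is logged by docker but shouldn't kill the watcher.
        subprocess.run(["docker", "restart", container_id], check=False)
```

Calling watch_and_restart() from a systemd unit or a tiny supervisor would keep it running. In practice you'd probably also want some rate limiting, so that a container which keeps running out of memory doesn't get restarted in a tight loop.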

The kernel of my home office
