The outsourcing question

I run a web development business, and I'm always wrestling with the question of how many of my supporting services I should contract out and how many I should run myself. For what I don't do myself, who can I trust to deliver that service reliably to my clients? And what do I do when that service fails?

This is not an academic debate this week for me.

On Sunday, my server-hardware supplier failed me miserably. On the Friday before, I had notified them of errors showing up in my logs related to one of my disks (the one that held the database data and backup files). They diagnosed it as a controller issue and scheduled a replacement for early Sunday morning. So far so good. It took longer than they had expected, but the server came back and seemed to check out on first report, so I thought we were done. It was Sunday morning and I wasn't going to dig too deep into what I thought was a responsible service provider's area of responsibility.

On Sunday evening, Karin (my business associate at Blackfly) called me at home (which she normally never does) to alert me that the server was failing. By that point, the disk was unreadable, so we scheduled a disk replacement and I resigned myself to using my offsite backups, which were now a day older than normal because the hardware replacement had run over the hour when the backup schedule runs normally (why didn't I manually run it after the hardware "upgrade"? yes).

That server has been much too successful of late, and loading all the data from my offsite server was much slower than I'd anticipated (it took 2 hours), and then running all the database restores took a while. To make it worse, I decided that it was a good opportunity to update my MariaDB (MySQL) version from 5.2 to 5.5. That added unexpected extra stress and complications (beware the character configuration changes!!!!), which I can mostly only blame myself for, but at least I suffered for it correspondingly with lack of sleep.

But then on Monday, after sweeping up a bit, I discovered that the hardware swap done on Sunday morning had not only failed to address the problem, it had made things much harder by postponing what could have been a simple backup to the other disk. Worse, they had actually swapped good hardware for older hardware of lesser capacity. In other words, their response to the problem had been to make it considerably worse. I had a few words with them, and I'll give them an opportunity to come up with something before I shame them publicly.

Now it's Tuesday morning and the one other major piece of infrastructure that I outsource (DNS/registration, to hover.com) is down, and has been for the last hour.

In cases like this, my instinct is to circle the wagons and start hosting out of my basement (just kidding!) and run my own DNS service (also kidding, though less so). On the other hand, the advantage of not being responsible is that it gives me time to write on my blog when they're messed up.

Conclusion: there are no easy answers to the outsourcing question. By nature, I take my responsibilities a little bit too close to heart, and have a corresponding outlook on what healthy 'growth' looks like. Finding a reliable partner is tough. It's what I try to be.

Update: here's an exchange with my server host after they asked when they could schedule time to put the right CPU back in, and whether I wanted to keep the same ticket or use a different one:

Me:

Thanks for this. I don't care if it's this ticket or another one. Having a senior technician to help sounds good, and I wonder if you could also tell me what you plan to do - are you going to put back my original chassis + cpu or try to swap in my old cpus into this chassis? Or are you just going to see what's available at the time?

The cavalier swapping of mislabelled parts after a misdiagnosis of the original problem points to more than a one-off glitch, particularly in light of previous errors I've had with this server - it sounds to me like you've got a bigger problem and having a few extra hands around doesn't convince me that you've addressed it.

What I have experienced is that you are claiming and charging for a premium service and delivering it like a bargain basement shop.

Them:

We will check available options prior to starting work during the maintenance window. 
 
We are currently thinking we would like to avoid the old chassis in case there are any SCSI issues and move the disks to another, tested chassis. As an option, we could add a CPU to the current server. 
 
If you have any preference on these options, we will make it the priority. 
 
I apologize again for the mistakes made, and the resulting downtime you have experienced. 


Is it just me, or did they just confirm what I was afraid of?
