Friday, October 16, 2009

CiviCRM Import: Advanced Techniques

I'm almost finished with a big new CiviCRM import/installation that I've been working on for longer than I'd planned. That's pretty normal, and of course there were a number of things that should have been warnings:

  1. it's a huge mass of data from an established non-profit: more than 5000 constituents, more than 50,000 contributions, from more than 10 years.
  2. it's being imported from Raiser's Edge, the Cadillac of desktop fundraising software.
  3. the clients are very attached to the detailed minutiae they have accumulated about their constituents, all of which were faithfully entered into Raiser's Edge.

After a sample import and a look at what the data might look like in CiviCRM, we were able to whittle it down a little, but for the last month I've been working on the big final import and have developed some techniques that should be generally useful for big CiviCRM imports.

Two Scripting Techniques

For my first test import, I pulled the CSV files into an OpenOffice spreadsheet and manually cleaned things up [e.g. dates, bad characters, etc.]. That works for small imports, but not for large ones, because:

  1. it's not reliably reproducible [and/or it takes a long time]
  2. there are some things it doesn't do well [like splitting fields off to separate imports].

For my scripts in this project, I used two simple techniques:

a. add a column type row

Before processing the CSV files, I added a row underneath the header and put one of several 'column types' in each column, which my script uses to decide how to parse the data. For example, "Date" columns need to be converted into ISO-8601 format. "Select" columns are custom fields with option groups, so the labels in these columns get cleaned up into values, and those label/value pairs get added to the database as valid options. "Note" fields get written to a different file and then imported as activities. I wrote a separate script for each of the client files [i.e. constituents, notes, gifts, and extra attributes]. Some of the scripts also refer to the column header.
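Here's a minimal sketch of the column-type idea, written in Python purely for illustration (the actual scripts for this project aren't shown here). It assumes the type labels are exactly "Date", "Select" and "Note", guesses at the date formats in the export, and uses a hypothetical label-to-value rule for select options; any other type is passed through untouched. Writing the notes to their own file and loading the options into CiviCRM are left out.

```python
import csv
import re
from datetime import datetime

# Assumed input date formats in the export; adjust to what the files actually contain.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def to_iso_date(value):
    """Convert a date string to ISO-8601 (YYYY-MM-DD); blank stays blank."""
    value = value.strip()
    if not value:
        return ""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            pass
    raise ValueError(f"unrecognized date: {value!r}")


def label_to_value(label):
    """Hypothetical rule for turning an option label into an option value."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def process(lines):
    """Parse CSV lines whose second row holds a column type for each column.

    Returns (rows, notes, options):
      rows    - header plus cleaned data rows, with 'Note' columns removed
      notes   - (data row index, column header, text) for each non-empty note
      options - {column header: {value: label}} for each 'Select' column
    """
    reader = csv.reader(lines)
    header = next(reader)
    types = next(reader)
    keep = [i for i, t in enumerate(types) if t != "Note"]
    rows, notes, options = [[header[i] for i in keep]], [], {}
    for n, row in enumerate(reader):
        row += [""] * (len(header) - len(row))  # pad short rows
        for i, col_type in enumerate(types):
            value = row[i].strip()
            if col_type == "Date":
                row[i] = to_iso_date(value)
            elif col_type == "Select" and value:
                row[i] = label_to_value(value)
                options.setdefault(header[i], {})[row[i]] = value
            elif col_type == "Note" and value:
                notes.append((n, header[i], value))
        rows.append([row[i] for i in keep])
    return rows, notes, options
```

Because the type row travels with the data, rerunning the whole import after a fresh export is just a matter of re-adding that one row and running the script again.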

b. chunking

I still use the standard CiviCRM import [because I'm too lazy to write a script that writes directly to the database, and I'd also rather not reinvent my wheels each time the database structures change], but the files I got from the client are far too big - the import would time out. I probably should have looked into the option of importing from a database via SQL, but instead my script just breaks the files up into chunks of no more than 1000 or 2000 records [reinserting the header each time]. That means manually uploading a lot of files [e.g. 17 contribution files], so I keep a little piece of paper handy on which I write each upload file number with the time beside it, so I can keep track while I'm editing this blog between uploads.
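A minimal Python sketch of the chunking step, assuming plain UTF-8 CSV input; the file naming scheme and the default of 1000 rows are illustrative choices, not necessarily the ones used in this project:

```python
import csv
import os


def chunk_rows(rows, chunk_size=1000):
    """Yield lists of at most chunk_size data rows, each preceded by the header
    (the first row of `rows`)."""
    it = iter(rows)
    header = next(it, None)
    if header is None:
        return
    chunk = []
    for row in it:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield [header] + chunk
            chunk = []
    if chunk:
        yield [header] + chunk


def split_csv_file(path, out_dir, chunk_size=1000):
    """Write path's rows into out_dir/<name>-001.csv, -002.csv, ... and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    written = []
    with open(path, newline="", encoding="utf-8") as src:
        for n, chunk in enumerate(chunk_rows(csv.reader(src), chunk_size), start=1):
            out_path = os.path.join(out_dir, f"{base}-{n:03d}.csv")
            with open(out_path, "w", newline="", encoding="utf-8") as out:
                csv.writer(out).writerows(chunk)
            written.append(out_path)
    return written
```

For example, `split_csv_file("gifts.csv", "upload", 2000)` would write upload/gifts-001.csv, upload/gifts-002.csv and so on; the zero-padded numbers keep the files in upload order, which helps with the piece-of-paper bookkeeping.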

I should also say that the first time I did the sample import, I was really stretching my little staging machine. I'm now doing this import on a virtual server running off my desktop, which I upgraded in the interim, so it's a little less painful, though I still get a fair amount of time to edit my blog. You certainly don't want to do big imports on a live server!

Documentation

Importing into CiviCRM usually involves some degree of judgement calls. Some of the fields migrate naturally into CiviCRM fields [last name, province, ...] but all imports I've done include fields that have no built-in CiviCRM equivalent. For that, the standard advice is to turn it into a custom field, which can then be manipulated further after the import [e.g. turn it into a group, create equivalent functionality via a smart group, etc.].

This is where it usually pays to hire a professional - you have to consider not only how to get the data into CiviCRM, but also what kind of functionality you need from that data, so that you import it somewhere you can use it properly.

But more than that - what happens in a couple of months? In this case, there are so many fields and functions being imported that the mass of detail is almost overwhelming. So documenting which export field goes into which field/function in CiviCRM is essential, not only for me [since I have to redo things with gaps of a month or two between imports], but even more for the client's current and future staff, who are going to ask "where did such and such go?"

So my second technique is a simple one for documentation. I set up a new node type [I'm using Drupal with CCK and Views, of course], with these fields:


$content['type']  = array (
 'name' => 'CiviCRM Field Migration',
 'type' => 'civimigrate',
 'description' => 'Description of mapping of raiser\'s edge fields to civicrm fields.',
 'title_label' => 'Raiser\'s Edge Field',
 'body_label' => 'Description',
 'min_word_count' => '0',
 'help' => '',
 'node_options' =>
 array (
   'status' => true,
   'promote' => false,
   'sticky' => false,
   'revision' => false,
 ),
 'language_content_type' => 0,
 'upload' => '0',
 'scheduler' => 0,
 'scheduler_touch' => 0,
 'i18n_node' => '0',
 'old_type' => 'civimigrate',
 'orig_type' => '',
 'module' => 'node',
 'custom' => '1',
 'modified' => '1',
 'locked' => '0',
 'image_attach' => '0',
 'image_attach_size_teaser' => 'thumbnail',
 'image_attach_size_body' => 'thumbnail',
);
$content['fields']  = array (
 0 =>
 array (
   'label' => 'civicrm_field',
   'field_name' => 'field_civicrm_field',
   'type' => 'text',
   'widget_type' => 'text_textfield',
   'change' => 'Change basic information',
   'weight' => '1',
   'rows' => '1',
   'size' => 60,
   'description' => '',
   'default_value' =>
   array (
     0 =>
     array (
       'value' => '',
     ),
   ),
   'default_value_php' => '',
   'default_value_widget' => NULL,
   'group' => false,
   'required' => 0,
   'multiple' => '0',
   'text_processing' => '0',
   'max_length' => '',
   'allowed_values' => '',
   'allowed_values_php' => '',
   'op' => 'Save field settings',
   'module' => 'text',
   'widget_module' => 'text',
   'columns' =>
   array (
     'value' =>
     array (
       'type' => 'text',
       'size' => 'big',
       'not null' => false,
       'sortable' => true,
       'views' => true,
     ),
   ),
   'display_settings' =>
   array (
     'label' =>
     array (
       'format' => 'inline',
     ),
     'teaser' =>
     array (
       'format' => 'hidden',
       'exclude' => 0,
     ),
     'full' =>
     array (
       'format' => 'default',
       'exclude' => 0,
     ),
   ),
 ),
 1 =>
 array (
   'label' => 'civicrm_note',
   'field_name' => 'field_civicrm_note',
   'type' => 'text',
   'widget_type' => 'text_textfield',
   'change' => 'Change basic information',
   'weight' => '2',
   'rows' => '1',
   'size' => 60,
   'description' => '',
   'default_value' =>
   array (
     0 =>
     array (
       'value' => '',
     ),
   ),
   'default_value_php' => '',
   'default_value_widget' =>
   array (
     'field_civicrm_note' =>
     array (
       0 =>
       array (
         'value' => '',
         '_error_element' => 'default_value_widget][field_civicrm_note][0][value',
       ),
     ),
   ),
   'group' => false,
   'required' => 0,
   'multiple' => '0',
   'text_processing' => '0',
   'max_length' => '',
   'allowed_values' => '',
   'allowed_values_php' => '',
   'op' => 'Save field settings',
   'module' => 'text',
   'widget_module' => 'text',
   'columns' =>
   array (
     'value' =>
     array (
       'type' => 'text',
       'size' => 'big',
       'not null' => false,
       'sortable' => true,
       'views' => true,
     ),
   ),
   'display_settings' =>
   array (
     'label' =>
     array (
       'format' => 'inline',
     ),
     'teaser' =>
     array (
       'format' => 'hidden',
       'exclude' => 0,
     ),
     'full' =>
     array (
       'format' => 'default',
       'exclude' => 0,
     ),
   ),
 ),
 2 =>
 array (
   'label' => 'civicrm_transformation',
   'field_name' => 'field_civicrm_transformation',
   'type' => 'text',
   'widget_type' => 'text_textfield',
   'change' => 'Change basic information',
   'weight' => '3',
   'rows' => '1',
   'size' => 60,
   'description' => '',
   'default_value' =>
   array (
     0 =>
     array (
       'value' => '',
     ),
   ),
   'default_value_php' => '',
   'default_value_widget' =>
   array (
     'field_civicrm_transformation' =>
     array (
       0 =>
       array (
         'value' => '',
         '_error_element' => 'default_value_widget][field_civicrm_transformation][0][value',
       ),
     ),
   ),
   'group' => false,
   'required' => 0,
   'multiple' => '0',
   'text_processing' => '0',
   'max_length' => '',
   'allowed_values' => '',
   'allowed_values_php' => '',
   'op' => 'Save field settings',
   'module' => 'text',
   'widget_module' => 'text',
   'columns' =>
   array (
     'value' =>
     array (
       'type' => 'text',
       'size' => 'big',
       'not null' => false,
       'sortable' => true,
       'views' => true,
     ),
   ),
   'display_settings' =>
   array (
     'label' =>
     array (
       'format' => 'inline',
     ),
     'teaser' =>
     array (
       'format' => 'hidden',
       'exclude' => 0,
     ),
     'full' =>
     array (
       'format' => 'default',
       'exclude' => 0,
     ),
   ),
 ),
 3 =>
 array (
   'label' => 'civicrm_todo',
   'field_name' => 'field_civicrm_todo',
   'type' => 'text',
   'widget_type' => 'optionwidgets_onoff',
   'change' => 'Change basic information',
   'weight' => '4',
   'description' => '',
   'default_value' =>
   array (
     0 =>
     array (
       'value' => 0,
     ),
   ),
   'default_value_php' => '',
   'default_value_widget' =>
   array (
     'field_civicrm_todo' =>
     array (
       'value' => false,
     ),
   ),
   'group' => false,
   'required' => 0,
   'multiple' => '0',
   'text_processing' => '0',
   'max_length' => '',
   'allowed_values' => '0|
1|*',
   'allowed_values_php' => '',
   'op' => 'Save field settings',
   'module' => 'text',
   'widget_module' => 'optionwidgets',
   'columns' =>
   array (
     'value' =>
     array (
       'type' => 'text',
       'size' => 'big',
       'not null' => false,
       'sortable' => true,
       'views' => true,
     ),
   ),
   'display_settings' =>
   array (
     'label' =>
     array (
       'format' => 'hidden',
     ),
     'teaser' =>
     array (
       'format' => 'hidden',
       'exclude' => 0,
     ),
     'full' =>
     array (
       'format' => 'default',
       'exclude' => 0,
     ),
   ),
 ),
);

And then I imported the headers of my spreadsheets, grouping them with a taxonomy term corresponding to the spreadsheet name. Then a simple view allows me to list all the fields in each spreadsheet, and document the following information for each input field:


a. Input field name [the node title]
b. Note - description of the use of the field
c. CiviCRM field - name of the field in CiviCRM it's being migrated to
d. CiviCRM note - additional information about the CiviCRM field [e.g. custom, etc.]
e. CiviCRM transformation - processing of the original field before putting it into CiviCRM
f. Todo [bonus field - just use it as you need to for keeping track of loose ends].
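Getting the spreadsheet headers in as nodes can be scripted too. The sketch below (Python, illustrative only) reads just the header row of each CSV and produces one row per input field, with the spreadsheet name for the taxonomy term and blank columns named after the CCK fields above. The column layout is an assumption, and how the resulting file gets loaded into Drupal is not covered here.

```python
import csv
import os

# Assumed layout: node title, taxonomy term, body ("Description"), then the CCK fields.
DOC_COLUMNS = ["title", "spreadsheet", "body", "field_civicrm_field",
               "field_civicrm_note", "field_civicrm_transformation", "field_civicrm_todo"]


def read_headers(paths):
    """Read only the header row of each CSV, keyed by file name without extension."""
    result = {}
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            name = os.path.splitext(os.path.basename(path))[0]
            result[name] = next(csv.reader(f), [])
    return result


def doc_rows(headers_by_sheet):
    """One row per input field, documentation columns left blank to fill in later."""
    rows = []
    for sheet, headers in sorted(headers_by_sheet.items()):
        for field in headers:
            row = dict.fromkeys(DOC_COLUMNS, "")
            row["title"] = field
            row["spreadsheet"] = sheet
            row["field_civicrm_todo"] = "0"  # matches the 0|1 on/off allowed values
            rows.append(row)
    return rows


def write_doc_csv(rows, out_path):
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=DOC_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```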

At the end of this, I've got a nice table for each input spreadsheet documenting where that value is ending up in CiviCRM and what happens to it along the way.

Thursday, October 15, 2009

Open Source Showcase for Non-Profits in Toronto

On Monday October 26th, I'll be at the "Open Source Showcase for Non-Profits" at the Centre for Social Innovation here in Toronto.

The showcase is a low-cost event where members of the non-profit sector can learn about open source projects relevant to their work. I'm helping organize and will do some presentations as well.

The idea came from Julian Egelstaff about a month ago, and he, Jane Zhang, Joe Murray, Reema Tarzi, and I met just a couple of weeks ago and have put it all together with remarkably little work. That's a tribute to the viability of the idea, the excellent organizing tools that are now available for such events, and the quality of the organizing committee. My own contribution was to set up a CiviCRM install with CiviEvents to do registration, which was much easier than I expected - part of my motivation was that I'd never set up a CiviEvents page, and now I'm not afraid of it anymore.

So, visit the information page and register.

Tuesday, September 15, 2009

Toronto CiviCRM Coaching Sessions for Mozilla Week

For the Mozilla Service Week, I'll be at the Centre for Social Innovation on Wednesday morning, to provide 1-1 coaching for anyone interested in using CiviCRM.

Yes, that's tomorrow, Wednesday September 16, 2009, starting at 10 am, and I hope you can come. You're supposed to sign up for 15-minute sessions (sign-up starts as early as 9:30), but if you just want to drop by, you can join whoever's there.

Details about where and more details about what are here.

Friday, July 17, 2009

Toronto Drupalcamp 2009

I'm sad to say that Toronto's Drupal Camp [which I helped organize for its first 3 years] is happening while I'm out of town. It's kind of a good thing, since I had decided to take a little sabbatical from the organizing anyway. But in case you're breathlessly wondering, check out the 2009 Toronto Drupal Camp site. It's not ready yet, but hopefully will be by the time you read this. The dates are set for the weekend of Aug 15.

Friday, July 03, 2009

The Tyee: Bricolage and Drupal Integration

The Tyee is a site I've been involved with since 2006, when I wrote the first (Drupal 4.7) version of a Drupal module to integrate Drupal content into a static site generated from Bricolage. About a year ago, I met with Dawn Buie and Phillip Smith and we mapped out a number of ways to improve the Drupal integration on the site, including upgrading Drupal from version 4.7 to 5. Various parts of that grand plan have been slowly incorporated into the site, but as of next week there'll be a big leap forward that coincides with a new design [implemented in Bricolage by David Wheeler, who wrote and maintains Bricolage] as well as a new Drupal release of the Bricolage integration module.

Plans

Application integration is tricky, and my first time round had quite a few issues. Here's a list of the improvements in the latest version:

  • File space separation. Before, Drupal was installed in the Apache document root, which is where Bricolage was publishing its content. This was dangerous and confusing because of the risk of Bricolage overwriting a Drupal file and vice versa, and because of the mess it left us for version control, since Bricolage versioning was best maintained within the Bricolage application on another machine. So in the new version, Drupal is installed in its own directory outside the document root, and Drupal pages are accessible via an Apache Alias directive like this:
    Alias /cms /var/www/drupal-dir
    This change also allows us to be more specific about which Bricolage URLs get passed through Drupal, because that mechanism has its own mod_rewrite rule, something like:
    RewriteRule ^(.*)\.html$ /index.php?fid=%{REQUEST_FILENAME}&q=$1 [L,QSA]
  • Drupal file discovery. Drupal 'discovers' Bricolage files using Drupal's custom 'page not found' (404) mechanism. This is probably not always the best way to do it - instead, Bricolage could publish a CSV file of new articles for Drupal to process, or maybe even push data directly into the Drupal database. But file discovery is the mechanism that we inherited on this site, and it's robust and relatively simple. When Drupal does discover a new page, there are a few pieces of information that Drupal wants to know, such as the page title, a unique Bricolage id (so that if a page gets republished with a new name, Drupal knows how to move the comments over), and whether comments are allowed on the page, to name just a few. In my first version, these bits of information were passed via some PHP defines, which, aside from being ugly, meant that the Bricolage page had to be PHP. So in the new version, all these values are now in meta tags.
  • Template files. The best thing this version does is get rid of the extra file that was required for each Bricolage page. Previously, because I was reimplementing the integration on a live site with existing comments in vb3, I resorted to getting Bricolage to output separate template files in addition to the original HTML files. Because we were starting fresh here, Bricolage can now just output one page per file and use the meta tag mechanism for all its Drupal-specific stuff. The nice result is that to remove the Drupal integration, you can just update the Apache mod_rewrite rule and the site suddenly becomes a regular PHP or HTML site.
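To make the meta tag approach concrete, here is a small Python sketch of what the Drupal side has to do when it discovers a page: pull a few named meta tags out of the HTML. The real module is PHP, and the tag names used here (bricolage-id, bricolage-title, bricolage-comments) are hypothetical placeholders, not the module's actual names.

```python
from html.parser import HTMLParser

# Hypothetical meta tag names; the module's real names may differ.
WANTED = {"bricolage-id", "bricolage-title", "bricolage-comments"}


class MetaCollector(HTMLParser):
    """Collect content="..." values of the wanted <meta name="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in WANTED and attrs.get("content") is not None:
            self.meta[name] = attrs["content"]


def page_metadata(html):
    """Return the id, title and comment flag a discovered page declares."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return {
        "id": parser.meta.get("bricolage-id"),
        "title": parser.meta.get("bricolage-title"),
        "comments": parser.meta.get("bricolage-comments", "0") == "1",
    }
```

Since meta tags are plain HTML, the same page works whether or not it passes through Drupal, which is what makes the "just change the rewrite rule" exit possible.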

Mainstreaming?

So in spite of failing to release early or often, I'm hoping that the new release will be appreciated and used outside of The Tyee. In that spirit, here are some step-by-step instructions for a simple install that adds static page integration to an existing Drupal installation.

  1. Create the static page directory if you don't already have one. I just added a subdirectory called 'static' to my site directory, and then added an alias so I could address pages within that directory more simply as /static/. By default, these pages would not be processed by Drupal because they actually exist.
  2. Download and install the module. This won't break or do anything.
  3. Add a mod_rewrite rule to send your 'static' files through the Drupal Bricolage module. See above for an example, which maps static URLs like /static/pathname/filename.html to the Drupal path 'pathname/filename'. For The Tyee, all the filenames are index.html, so we remove that (each index.html file has a print.html version, which doesn't need to go through Drupal).
  4. Set up the discovery mechanism. In the Drupal admin -> site config -> error reporting, put in "bricolage/notfound" as the 404 page.
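As a quick way to sanity-check a rewrite pattern before putting it into Apache, the two mappings described in step 3 can be expressed as Python regular expressions. These patterns are hypothetical stand-ins, assuming the pages live under /static/; Apache's own matching (e.g. prefix stripping in per-directory context) can differ, so the real rule still needs testing on the server.

```python
import re

# Hypothetical equivalents of the two mappings described in step 3.
GENERAL = re.compile(r"^/static/(.*)\.html$")            # any .html page
INDEX_ONLY = re.compile(r"^/static/(.*)/index\.html$")   # Tyee-style: only index.html


def drupal_path(url_path, index_only=False):
    """Return the Drupal path ('q' value) a static URL would be rewritten to,
    or None if the URL is left for Apache to serve directly."""
    match = (INDEX_ONLY if index_only else GENERAL).match(url_path)
    return match.group(1) if match else None
```

With `index_only=True`, a print.html page falls through to Apache untouched, which is the behaviour described for The Tyee.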

With those steps complete, URLs like /static/test/blah.html that correspond to an actual HTML page will get mapped via the url_alias mechanism to an internal Drupal path like 'bricolage/id', and those pages will be displayed as if they were phptemplate pages, after running through the Drupal bootstrap and generating the appropriate values (e.g. the user, blocks, etc.). To get commenting on your static pages, you'd also need to add the appropriate meta tags to them.

How is this useful?

The original use case of this module is to add Drupal commenting to a static site. Since you get a full Drupal bootstrap for each page, you also get blocks and users, and nodes if you want. Which means that really, you're injecting any Drupal-generated dynamic content into a site whose design and primary content can be controlled via another mechanism [like Bricolage].

Of course, integration is always complicated, and The Tyee example is instructive in that Bricolage is outputting PHP, which, without Drupal, would be reinterpreted on each page load. By running it through Drupal, you can use Drupal's page cache for anonymous users, which has the potential to also speed up the site [but you have to consider whether the dynamic content in the page's PHP really should be cached ...].

Monday, January 26, 2009

CentOS4 and CiviCRM 2.1

With the new year comes a new resolution to upgrade some sites to CiviCRM 2.1. CiviCRM 2.1 is particularly significant because it's the first version that supports Drupal 6, and it requires it. So upgrades of existing Drupal 5 sites are difficult, particularly if any custom modules or themes are involved.

As it turned out, my procrastination was justified. I asked my friend Rob Ellis to help with Maquila Solidarity Network, who I've been working with for a few months, and who decided that the new features in 2.1 were too good to postpone any longer. Rob did the upgrade and discovered two issues on my CentOS 4 server:

  • The CiviCRM installer insists on PHP 5.2.x
  • CiviCRM requires a version of PCRE with unicode

None of this sounds very interesting, and I wouldn't post about it, except that I would have thought it wouldn't be as hard to fix as it was. So here's what I did, in case there's someone else out there with CentOS4 (or RHEL4) trying to run CiviCRM 2.1.

Running CiviCRM 2.1 on a normal CentOS4

The original RHEL4 (and hence CentOS4) comes with php4, which is really not okay any more, but the CentOS 'extras' repository has php 5.1, which is what I've been using for the past 2 years on this server. Unfortunately, there don't seem to be any plans to upgrade this to 5.2.

From my brief reading, it looked like there wasn't a big difference from 5.1 to 5.2, and the CiviCRM maintainers didn't say that it wouldn't work on 5.1. So the first thing Rob did was just fiddle with a couple of installation files to allow the installation to proceed with 5.1. Not too surprisingly, it worked, almost.

What actually created errors was a problem with PCRE, the Perl Compatible Regular Expressions library. So Rob found the file that was generating the errors (packages/IDS/Converter.php) and patched it in the few places where it thought it cared about Unicode (hey, it's an external library), and voila, it worked.

Conclusion: CiviCRM 2.1 can run with slight modifications on CentOS4 (and RHEL4). Yay Rob!

On The Other Hand

But I really didn't relish maintaining these modifications to CiviCRM through multiple installs and upgrades, and updating my php to 5.2 and my pcre to unicode both seemed like sensible things to do. Whether it was sensible remains to be seen, but here's how I did it.

PHP 5.2.x on CentOS4

When I googled this, I ended up being pointed to a repository called "utter ramblings", which I tried to use. While I appreciate the work Jason did on this, it didn't work. The problem was that his upgrade also required a version of a library that subversion depended on, and he chose the standard CentOS/RHEL4 version of subversion, so his upgrade was incompatible with my up-to-date subversion package from Dag. I also just wasn't quite convinced that he really intended to keep his repository going for a long time. As an aside, the PHP upgrade also thought it needed to upgrade my Apache to 2.2, which wasn't a bad thing, but made the whole upgrade a little more risky and complicated.

There was another repository by a french-speaking guy called 'remi' that many people praised, but I found myself shying away from it also, based on a fear of language confusion and the fact that he was hosting his repository on a domain called 'family collette'. This was probably unfounded paranoia on my part, and his repository might have been completely adequate.

What I did end up finding, though it's much less prominent, is the 'atomic rocket turtle' repository, which you might think I'd avoid even more because of its name. But it was very impressive technically - he must have a pretty good understanding of what he's doing and a good automated build environment, because he had the latest version of PHP 5.2 compiled shortly after it came out in December.

So I just followed the instructions and ran the update, and it did a nice clean minimal update to PHP 5.2 plus a couple of small dependencies, without breaking anything else.

Of course, I had to upgrade my version of APC, but that was to be expected since I don't maintain it from YUM.

PCRE with Unicode on CENTOS4

I had hoped my php 5.2 upgrade would solve the pcre problem, but it didn't. That was because the php installed (as well as the previous one - presumably a CentOS/RHEL standard) uses the option that tells php to use the installed OS library. So I had to go learn about my CentOS version of PCRE (which was dated 2003) and why it didn't support unicode.

That turned out to be confusing to google, because it seems I'm not the only one confused about the difference between UTF-8 support and Unicode support. The version I had did support UTF-8, but not Unicode properties (the \p{...} escapes).

The solution turned out to be a combination of these two posts:

http://devblog.jasonhuck.com/2009/01/08/installing-lasso-on-centos-5/

Look for "Add Unicode Properties Support to PCRE".

The only problem with it was it was for the wrong version of CentOS, so I found:

http://www.centos.org/modules/newbb/viewtopic.php?topic_id=6833

which pointed me at a broken link for a Fedora Core 6 source RPM, which I eventually found here:

http://archives.fedoraproject.org/pub/archive/fedora/linux/core/6/source/SRPMS/pcre-6.6-1.1.src.rpm

Using that, with jason huck's instructions, turned out to work just fine.

Conclusion: you can update your CentOS4 to run CiviCRM 2.1 without modification, though some assembly is required. Specifically: recompiling source RPMs.

Thursday, November 06, 2008

Eating my dog food

I was carrying home a bag of dog food recently for my dogs when the neighbour made jokes about eating dog food and the coming recession. I think recessions are like winter - you know it'll come eventually, but it's hard to imagine in the depths of summer.

But my point is really about dog food, and eating it. The woman who sells me Nutromax claims the salespeople eat it to prove it's good. As a computer-geeky guy, I'm familiar with the expression "eating your own dog food" to mean using your own software. I just looked it up on Wikipedia and discovered that the original idea did indeed come from an advertisement about dog food, and that it's now used mainly about software. Here's what Wikipedia says about the idea:


Using one's own products has four primary benefits:

1. The product's developers are familiar with using the products they develop.
2. The company's members have direct knowledge and experience with its products.
3. Users see that the company has confidence in its own products.
4. Technically savvy users in the company, with perhaps a very wide set of business requirements and deployments, are able to discover and report bugs in the products before they are released to the general public.

A disadvantage is that if taken to an extreme, a company's desire to eat its own dog food can turn into Not Invented Here syndrome, in which the company refuses to use any product which was not developed in-house.

So, that's my introduction to say that I've finally created myself a Drupal site for my business. It had previously been hosted at googlepages, because it was free and easy and I thought Web 2.0 was cool (just kidding about that last one). Also because I didn't have a server or domain name, because I thought I'd just be a consultant.

After three years, I'm still working as an independent consultant. What I've changed is:

  1. I've got my hands full with Drupal and CiviCRM for Canadian non-profits. I may do some projects outside that scope, but I've now got a more specific niche.
  2. I'm not just a "consultant", but a full service shop - i.e. websites from beginning to end, even mail. I use the "keep it as simple as possible, but no simpler" rule, and working on other people's servers turned out to be more complicated than running my own server (no, not in my basement, I use a commercial Canadian service for the hardware and network).
  3. I'm committed to remaining "aggressively small" [credits to Mark Surman and Phillip Smith]. There's an assumption in the technical world that you have to "grow" your business to be competitive (yes, not just the technical world). I think that ideology is wrong in a general way from economic and environmental points of view, but specifically wrong for most Drupal websites. Big shops with layers of management do not make better websites, and certainly not cheaper ones - the big shops are not driven by real 'economies of scale' but by the owners' delusions of money and/or fame. You know who you are ...

That's my story so far, now go visit my new site.

Friday, July 04, 2008

Infrastructure projects

I've been running my own server for a year and a half now, and have been surprised at how trouble free it's been. I attribute this to:

  1. luck
  2. good planning
  3. a decent upstream provider
  4. the maturity of linux distribution maintenance tools (e.g. yum)

In this case, good planning means:

  1. keeping it as simple as possible
  2. doing things one at a time
  3. I'm the only one mucking about on it

And so this month, inspired by some Drupal camp sessions, I decided to take some time to make a good thing better. My goals were:

  1. Optimizing my web servicing for more traffic.
  2. Simplifying my Drupal maintenance.
  3. Automating my backups.

And here are the results ...

Web Servicing Optimizations

This was relatively easy - I just finished off the work from here: http://homeofficekernel.blogspot.com/2008/02/drupal-centos-optimization.html

Specifically, I discovered that I hadn't actually set up a MySQL query cache, so I did that. Then I discovered that it was pretty easy and not dangerous to remove a bunch of the default Apache modules: all I had to do was comment out their lines in the httpd.conf file. I also took out some other gunk that isn't useful for Drupal sites (multilingual icons, auto indexing).

I like to think that between those two, the response time is even better, though the difference is fairly marginal without much load. The real reason to do this is to increase the number of available Apache server processes without the risk of going into swap death. So I can now add more sites without fear.

Simplifying Drupal Maintenance

I was converted to SVN (a version control program) 3 years ago and still love it. I've been using it to methodically track all the code, with individual repositories for each of my major projects, using the full trunk and vendor branch setup and the magic of svn_load_dirs.

But after a project starts using a lot of contributed modules, or when there are several code security updates each year and you have several projects, this starts getting time consuming.

So I've started NOT putting Drupal core or contributed modules into my svn projects, and I'm using one multi-site install for most of my sites. Along with the fabulous update_status module for Drupal 5 (which is in core for Drupal 6), keeping up to date is now much more manageable. It's also a change of mindset - I'm now more committed (pun intended) to the Drupal community. It means I can no longer hack core (at least not without a lot of work).

And so I also tested this whole scheme by moving all my simple projects to a new document root that's controlled entirely via CVS checkouts from the drupal.org server, with symlinks out to my individual site roots (which still go in svn, so I can keep track of themes, files and custom modules), and it worked well. There's actually a performance benefit here as well: by keeping all my sites on the same document root, the PHP cache doesn't fill up as fast, because there's less distinct code to cache. And it's more easily kept secure.

And as a final hurrah, I upgraded http://community.civicrm.ca/ to Drupal 6. In the process, I gave up on the 'links' module, which I thought had some promise, and am now just using the 'link' module, which defines link fields for CCK. I also started learning about the famed Drupal 6 theming, and tweaked the community.civicrm.ca theme for fun.

Backups

I back up to an offsite server using rsync, which seems to be a common and highly efficient way to do things for a server like this. Rsync cleverly sends only the differences between files, so load and bandwidth are kept to a minimum. My backups are not for users, only for emergencies, so I don't need hourly snapshots, only daily rsyncs.

Well, this works well for code, but not so well for MySQL. I'd been doing full mysqldumps and then copying them to my backup server, but this was not very efficient. So finally this week, with help from some simple scripts, I've set it up to use the --tab parameter to mysqldump, which dumps the tables in each database to separate files. This means that now when I run rsync on them, it's clever enough to only worry about the tables that have changed, which are relatively few each day. So now I've got daily MySQL backups as well, without huge load/bandwidth!
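A minimal Python sketch of that nightly job, not the actual scripts. It assumes MySQL credentials come from ~/.my.cnf, that the dump directory is writable by the MySQL server process (with --tab the server itself writes the .txt data files, which also requires the FILE privilege), and that the database names, paths and backup host are placeholders.

```python
import os
import subprocess


def dump_commands(databases, dump_root):
    """One mysqldump --tab command per database. --tab writes a .sql file
    (table structure) and a .txt file (tab-separated data) per table."""
    return [["mysqldump", f"--tab={os.path.join(dump_root, db)}", db]
            for db in databases]


def rsync_command(src, dest):
    # -a preserves permissions and times; --delete removes files for dropped tables.
    # The trailing slash syncs the directory's contents rather than the directory itself.
    return ["rsync", "-a", "--delete", src.rstrip("/") + "/", dest]


def run_backup(databases, dump_root, dest):
    for db, cmd in zip(databases, dump_commands(databases, dump_root)):
        # The MySQL server, not this script, must be able to write here.
        os.makedirs(os.path.join(dump_root, db), exist_ok=True)
        subprocess.run(cmd, check=True)
    subprocess.run(rsync_command(dump_root, dest), check=True)
```

Something like `run_backup(["drupal", "civicrm"], "/var/backups/mysql", "backuphost:/srv/mysql")` from cron would then do the whole thing; because each table is its own file, rsync only has to send changes for the tables that actually changed.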

And that also means I can now use my backup as a place to pull copies of code and databases when I want to set up a development environment.

Virtualization

Which takes me almost to a new topic, but it's also about infrastructure, so here it is. I've been running little development servers for several years; my main one I actually found being thrown out (it was a Pentium II). They have served me well, but I was rethinking my strategy, mainly because of power: I'm not happy about using so much electricity for them (as older servers, their power supplies aren't very efficient), and since one of them is in my office, it's fine in the winter when my office is cold, but really not good in the summer when I'm trying to stay cool.

And so the promise of virtualization lured me into believing I could run a little virtual server off my desktop. I tried Xen, but it broke my wireless card (which I have to run using ndiswrapper), so I finally gave up and installed VMware (because it was in an Ubuntu-compatible repository), even though it's not really open source.

Does it work? Well, so far so good.

Wednesday, May 14, 2008

Toronto Drupal Camp 2008

I thought I'd have some time for some house renovations before Drupal Camp this year, but planning Drupal projects is always harder than you'd think. In any case, I'm also helping plan Drupal Camp, and I've even got a couple of session proposals that have to do with planning Drupal websites. So come find out what all the fuss is about.

Friday, April 18, 2008

CiviCRM Case Study: Fairvote.ca

These are my notes from a CiviCRM data import for Fair Vote Canada I did on April 16/17, 2008.

Fair Vote Canada is a small NGO that has been around for about 7 years; it's a public interest lobby group for proportional representation-type voting systems in Canada. If you care about democracy, then they're worth supporting. One thing I find particularly interesting and important is that they're cross-party. Depending on whether they're in power or not, parties have very biased opinions about proportional representation, and regardless of their statements of principle, that's not going to change with a change of government, since parties exist to win power, or they don't last long. So Fair Vote Canada decided early on to be strictly non-partisan, and they have some energetic and high-profile supporters from across the political spectrum.

On the technical side of things, they've had a Drupal site for a while, but were still using Excel spreadsheets to manage their relationships with their members (about 3000 of them), which was getting unwieldy and time-consuming.

They had tried to set up CiviCRM and import the data earlier this year, but the import had been done as if CiviCRM were a custom relational database (like the thousands of FoxPro/FileMaker desktop installs out there), so it wasn't very useful. For example, donations and householding information were imported as custom fields. The installation did have some customizations (fields, profiles) that needed to be kept, but the data was all considered suspect.

1. Sample Imports

Before I did anything on the live server, I created a vanilla CiviCRM site on my development server and imported a sample Excel sheet provided by Fair Vote, testing my ideas about how to do this. The data was saved as one household per row, with multiple columns detailing date/amount of donations, as well as one or two individuals associated with the household.

The key idea was to generate 'external ids' in order to maintain the relationships between the contact information and the donation and membership data. Then I could import the same sheet several times, both as contact information (possibly multiple times for households) and as donation information, retaining the relationship through this external id key, which CiviCRM supports well.
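To make the idea concrete, here's a small Python sketch of it: tag each spreadsheet row with a unique key, then write different column subsets of the same rows to separate CSV files, one per import. The prefix "fv", the column headers and the sample row are made up; in CiviCRM's import wizard you'd map the key column to the external identifier field.

```python
import csv
import io


def with_external_ids(rows, prefix):
    """Copy each row and tag it with a unique key such as 'fv-1', 'fv-2', ..."""
    tagged = []
    for n, row in enumerate(rows, start=1):
        tagged.append({**row, "External Identifier": f"{prefix}-{n}"})
    return tagged


def to_csv(rows, columns):
    """Write just the chosen columns, so one sheet can feed several imports."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative data only: one row holding a contact and one donation.
sheet = with_external_ids(
    [{"First Name": "Ann", "Last Name": "Lee",
      "Donation 1 Date": "2007-05-01", "Donation 1 Amount": "50"}],
    prefix="fv",
)
contacts_csv = to_csv(sheet, ["External Identifier", "First Name", "Last Name"])
donations_csv = to_csv(sheet, ["External Identifier", "Donation 1 Date", "Donation 1 Amount"])
print(contacts_csv)
print(donations_csv)
```

Both files carry the same key, so the donation import can find the contact created by the first import.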

2. Server survey and backup

I looked at all the existing server code and backed up the relevant databases.

3. CiviCRM Install

I started out by creating stage.fairvote.ca in the /sites directory of the current site and cloning fairvote.ca to it. I then edited the two public CiviCRM related pages to say 'coming soon' and turned off the CiviCRM module on the live site. Then I edited the settings in the stage site so that it used that old CiviCRM database – so I had full access to the old CiviCRM data while I rebuilt the new one on a clean install. I installed v. 202 in /sites/all/modules where it's happiest and ran the usual new install routines, and then copied over the global configuration stuff from the old install (locale, etc.).

4. Global spreadsheet cleanup

My sample import exercise had revealed a few global spreadsheet cleanups that I knew I had to do. These were:

a. convert dates to ISO 8601 (yyyy-mm-dd) - using the cell formatting feature in OpenOffice, with some manual and automated cleanup when dates had been entered erratically.

b. remove dollar signs from currency (simple format)

c. generate "household names" for spreadsheet rows with more than one contact ID per address. I used a macro that combined the last names.

d. fix various misspelled country/provinces (e.g. USA -> US, NF -> NL, etc.)

e. modify gender from "m" and "f" to "Male" and "Female" (using a spreadsheet macro). I did this with some other columns as well (e.g. French).

f. add a dummy column that has just the word "Donation" in it for when I import the donation columns of a sheet.

g. after all that, I had to split the membership spreadsheet because it included rows with 1 membership and 1 name, 1 membership and 2 names, and 2 memberships with 2 names. It also had some other special membership entries that I wanted to mark separately. So I ended up with 4 spreadsheets from this one (more details about this below).
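I did these cleanups in OpenOffice, but if you'd rather script them, here's a rough Python sketch of cleanups a to f. The date formats it accepts (month before day), the fix-up tables, the " & " household-name format and the column names are all assumptions; fill them in from what you actually find in your data.

```python
from datetime import datetime

# Date layouts seen in hand-entered cells (assumption: month before day).
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%d-%b-%Y", "%B %d, %Y"]

# Hypothetical fix-up tables; fill them from the values you actually find.
COUNTRY_FIXES = {"USA": "US"}
PROVINCE_FIXES = {"NF": "NL"}
GENDER_LABELS = {"m": "Male", "f": "Female"}


def to_iso_date(text):
    """Normalize a date cell to ISO 8601 (yyyy-mm-dd) or raise ValueError."""
    text = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def strip_currency(text):
    """'$1,250.00' -> '1250.00'"""
    return text.replace("$", "").replace(",", "").strip()


def household_name(last_names):
    """Combine distinct last names, e.g. ['Smith', 'Jones'] -> 'Smith & Jones'."""
    distinct = list(dict.fromkeys(n.strip() for n in last_names if n.strip()))
    return " & ".join(distinct)


def clean_row(row, date_columns=("Date Entered",), money_columns=()):
    """Apply the global cleanups to one spreadsheet row (a dict of strings)."""
    row = dict(row)
    for col in date_columns:
        if row.get(col):
            row[col] = to_iso_date(row[col])
    for col in money_columns:
        if row.get(col):
            row[col] = strip_currency(row[col])
    if "Country" in row:
        row["Country"] = COUNTRY_FIXES.get(row["Country"], row["Country"])
    if "Province" in row:
        row["Province"] = PROVINCE_FIXES.get(row["Province"], row["Province"])
    if "Gender" in row:
        row["Gender"] = GENDER_LABELS.get(row["Gender"].strip().lower(), row["Gender"])
    row["Contribution Type"] = "Donation"  # the dummy column for donation imports
    return row
```

Raising an error on unrecognized dates, rather than guessing, keeps the erratically entered ones visible so they can be fixed by hand.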

5. CiviCRM Customization

I created a few custom fields after looking through the old installation and the data I was importing. Not all of the old customizations were useful - some looked like accumulated cruft and I had no corresponding data in my spreadsheets. With my external id trick, I could also rely on being able to re-import any data that I didn't import the first time (at least, for the custom fields - re-importing relationships wasn't going to be as easy).

I also had the two public CiviCRM-related pages, the newsletter signup and the petition, which needed their own custom fields, profiles and groups.

6. Data Import

The bulk of the work now should have been relatively straightforward, but ended up being fiddly.

a. members spreadsheet

This was the hardest and most important, so I started with it. It had 2747 entries. As per the above note, I split it into:

mv - 'vip', steering committee memberships (15)
m22 - rows with 2 names and 2 memberships (156)
m12 - rows with 2 names and 1 membership (87)
m11 - rows with 1 name and 1 membership (2489)

For the householding sheets (m12 and m22), I created 3 extra columns in which I generated ("external") ids for the household and the individuals, looking like:
m22-h-140 m22-i1-140 m22-i2-140
i.e.: <sheet>-<(household or individual 1 or 2)>-<number>

Fortunately, the other sheets later could all be simpler with just one external id per row, since there was no householding involved.
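Here's a Python sketch of the split and the id generation. The sheet codes and the id pattern come from above; the column names (VIP, Second Last Name, Memberships) are hypothetical, and numbering each row by its position within its own sheet is an assumption about where the number comes from.

```python
from collections import defaultdict

HOUSEHOLD_SHEETS = {"m12", "m22"}


def sheet_code(row):
    """mv for steering-committee rows, else m<memberships><names>."""
    if row.get("VIP"):  # hypothetical flag column
        return "mv"
    names = 2 if row.get("Second Last Name", "").strip() else 1
    memberships = 2 if row.get("Memberships", "1").strip() == "2" else 1
    return f"m{memberships}{names}"


def split_members(rows):
    """Split rows into sheets and add household/individual ids like 'm22-h-140'."""
    sheets = defaultdict(list)
    for row in rows:
        code = sheet_code(row)
        n = len(sheets[code]) + 1  # assumption: ids count rows within each sheet
        row = dict(row)
        if code in HOUSEHOLD_SHEETS:
            row["Household ID"] = f"{code}-h-{n}"
            row["Individual 1 ID"] = f"{code}-i1-{n}"
            row["Individual 2 ID"] = f"{code}-i2-{n}"
        sheets[code].append(row)
    return dict(sheets)
```

Note that a row with 2 memberships and only 1 name would land in an "m21" sheet, which didn't occur in this data but is worth checking for.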

Each of these sheets was then exported to CSV format, and now I did the imports.

First each sheet got imported at least once for the contact information, and 3 times in the case of m12 and m22 (once for the household and twice for the two individuals in the household). When importing the individuals with households (i.e. m12 and m22), I chose not to import the mailing address of their household, to avoid duplicate mailings, but did import the phone number to all three. Instead, I used my 'external id' trick to relate the individuals to the household, which does contain their mailing address info. In these imports, I also imported the recurring donation information into a custom field of the first individual per record.

For each of these imports, I generated a new 'group' for the import, using the codes above. This is somewhat redundant, because you can regenerate these groups based on the external id, but I've left them in temporarily so you can check over the data more easily. Since they're ugly and distracting, they should be deleted eventually.

Then I imported all the donation information by importing each sheet up to 8 times, once for each donation amount/date pair. I imported the date as the 'received date' and the amount as the 'total amount', and set the donation type to 'donation': i.e. only three fields, plus the external id to relate the donation to the first individual of each row.
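If you want each pass to be a separate, identically mapped file, here's a Python sketch that writes one CSV per donation slot. The source column names ("Donation 1 Date", "Individual 1 ID") and output headers are hypothetical, and skipping rows with an empty slot is my own choice to avoid rejected rows.

```python
import csv
import io

FIELDS = ["External Identifier", "Received Date", "Total Amount", "Contribution Type"]


def donation_pass(rows, slot, id_column="Individual 1 ID"):
    """Rows for one import pass: donation `slot` of each sheet row, if present."""
    date_col, amount_col = f"Donation {slot} Date", f"Donation {slot} Amount"
    return [
        {
            "External Identifier": row[id_column],  # first individual of the row
            "Received Date": row.get(date_col, ""),
            "Total Amount": row[amount_col],
            "Contribution Type": "Donation",
        }
        for row in rows
        if row.get(amount_col, "").strip()  # skip empty donation slots
    ]


def write_passes(rows, slots=8):
    """One CSV per donation slot, each mapped the same way in the import wizard."""
    files = {}
    for slot in range(1, slots + 1):
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(donation_pass(rows, slot))
        files[f"donations-{slot}.csv"] = buf.getvalue()
    return files
```

Because every file has the same headers, the import mapping only has to be set up once and can be reused for all eight passes.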

Then I imported the membership data, which was just the 'date entered' as 'membership since' and the later of date entered and date renewed as 'membership start'. I imported the m22 sheet twice, once for each individual. There's some automation for renewing a membership when a donation comes in, but it didn't do anything during the import. Subsequent (manually entered) donations should automatically update the membership status.
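The date logic is simple enough to sketch in a few lines of Python. It relies on the dates already being in ISO 8601 form, where comparing the strings gives the same order as comparing the dates. The function names and the output column names are hypothetical.

```python
def membership_dates(date_entered, date_renewed=""):
    """Return (member_since, start_date) from ISO yyyy-mm-dd strings.

    ISO 8601 dates sort lexically, so max() on the strings picks the later date.
    """
    since = date_entered
    start = max(date_entered, date_renewed) if date_renewed else date_entered
    return since, start


def membership_row(row, id_column="Individual 1 ID"):
    """Build one membership import row; use id_column to pick the individual."""
    since, start = membership_dates(row["Date Entered"], row.get("Date Renewed", ""))
    return {"External Identifier": row[id_column], "Member Since": since, "Start Date": start}
```

For the m22 sheet you'd build the rows twice, once with "Individual 1 ID" and once with "Individual 2 ID".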

The rest of the sheets were similar but much simpler; notes follow.

b. Non-member donors sheet

744 records. Here I used the external id format d-. I imported the donations as a special 'MMP-Donation' since they were marked specially on the sheet and didn't seem to bestow membership like a normal donation. I didn't generate a group for them.

c. non-member volunteers

Originally 749 records; only 721 were imported after cleaning out ones with bad addresses. No external id; I just tagged all imports with the 'volunteer' tag that already existed.

d. newsletter list - non-members

Originally 862 records, only 859 valid; added to group 'FVC Newsletter' and used external id n-.

e. organizations

Originally 61, imported 60, no external id.

f. petition - online signers - non members.

4501 records - put into FVC Petition Group - no external id - also imported 'Email newsletter?' custom field, petition sign date and party fields. Didn't put them into the newsletter list!

Final tally: 9,831 contacts imported.

7. Conclusion

CiviCRM and its import facility impressed me by fully capturing all the variety of data in these spreadsheets. It's now all there, with excellent functionality that wasn't in the original sheets.

I encountered a number of little bugs as I went along, but the biggest one to note was that a few times the import would claim success but not do anything. That caused me hours of grief as I tried various ways of tricking it into thinking it was a new import (believing the problem to be a caching issue), but eventually I looked in files/civicrm/upload and discovered a log file containing a fatal PHP error that hadn't been reported on screen (related to an invalid value for a custom field).
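To save someone else those hours, here's a small Python sketch that checks every file under the upload directory for fatal errors after each import. I don't know the exact log file name, so it scans everything. Matching on the string "Fatal error" (PHP's usual wording) and the function names are assumptions.

```python
from pathlib import Path


def fatal_lines(lines, pattern="Fatal error"):
    """Yield (line number, text) for lines mentioning a PHP fatal error."""
    for lineno, line in enumerate(lines, start=1):
        if pattern in line:
            yield lineno, line.rstrip()


def scan_upload_dir(upload_dir="files/civicrm/upload", pattern="Fatal error"):
    """Check every file under CiviCRM's upload directory for fatal errors."""
    root = Path(upload_dir)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            with path.open(errors="replace") as fh:
                hits.extend((path, n, text) for n, text in fatal_lines(fh, pattern))
    return hits


if __name__ == "__main__":
    for path, n, text in scan_upload_dir():
        print(f"{path}:{n}: {text}")
```

Run it from the Drupal root after any import that reports success but doesn't seem to have done anything.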

Like all such projects, it took longer than I'd hoped, but the result is actually better than I'd feared: there was very little lost in translation. The total time was about 3 days.

Here's hoping that the tool helps the cause.