Friday, April 18, 2008

CiviCRM Case Study:

These are my notes from a CiviCRM data import for Fair Vote Canada I did on April 16/17, 2008.
Fair Vote Canada is a small NGO, has been around for about 7 years, and is a public interest lobby group for proportional representation-type voting systems in Canada. If you care about democracy, then they're worth supporting. One thing I find particularly interesting and important is that they're cross-party. Obviously, depending on whether they're in power or not, parties have a very biased opinion about proportional representation, and regardless of their statements of principles, that's not going to change with any changes of government, since parties exist to win power, or they don't last long. So Fair Vote Canada decided early on to be strictly non-partisan, and they have some energetic and high-profile supporters from across the political spectrum.
On the technical side of things, they've had a Drupal site for a while, but were still using Excel spreadsheets to manage their relationships with their members (about 3000 of them), which was getting unwieldy and time-consuming.
They had tried to setup CiviCRM and import the data earlier this year, but the import had been done as if CiviCRM was a custom relational database (like the thousands of FoxPro/Filemaker desktop installs out there) - so it wasn't very useful. For example, donations and householding stuff were imported as custom fields. The installation did have some customization (fields, profiles) that needed to be kept, but the data was all considered suspect.
1. Sample Imports
Before I did anything on the live server, I created a vanilla CiviCRM site on my development server and imported a sample Excel sheet provided by Fair Vote, testing my ideas about how to do this. The data was saved as one household per row, with multiple columns detailing date/amount of donations, as well as one or two individuals associated with the household.
The key idea was to generate 'external ids', in order to maintain the relationships between the contact information and the donation and membership data. Then I could import the same sheet several times - both as contact information (possibly multiple times for households) and then as donation information, retaining the relationship through the use of this external id key which is well supported by CiviCRM.
2. Server survey and backup
I looked at all the existing server code and backed up the relevant databases.
3. CiviCRM Install
I started out by creating in the /sites directory of the current site and cloning to it. I then edited the two public CiviCRM related pages to say 'coming soon' and turned off the CiviCRM module on the live site. Then I edited the settings in the stage site so that it used that old CiviCRM database – so I had full access to the old CiviCRM data while I rebuilt the new one on a clean install. I installed v. 202 in /sites/all/modules where it's happiest and ran the usual new install routines, and then copied over the global configuration stuff from the old install (locale, etc.).
4. Global spreadsheet cleanup
My sample import exercise had provided me with a few global spreadsheet cleanups that I knew I had to do. These were:
a. convert dates to ISO 8601 (yyyy-mm-dd) - using the cell formatting feature in OpenOffice, with some manual and automated cleanup when dates had been entered erratically.
b. remove dollar signs from currency (simple format)
c. generate "household names" for spreadsheet rows with more than one contact ID per address. I used a macro that combined the last names.
d. fix various misspelled country/provinces (e.g. USA -> US, NF -> NL, etc.)
e. modify gender from "m" and "f" to "Male" and "Female" (using a spreadsheet macro). I did this with some other columns as well (e.g. French).
f. add a dummy column that has just the word "Donation" in it for when I import the donation columns of a sheet.
g. after all that, I had to split the membership spreadsheet because it included rows with 1 membership and 1 name, 1 membership and 2 names, and 2 memberships with 2 names. It also had some other special membership entries that I wanted to mark separately. So I ended up with 4 spreadsheets from this one (more details below about this).
5. CiviCRM Customization
I created a few custom fields after looking through the old installation and the data I was importing. Not all of the old customizations were useful - some looked like accumulated cruft and I had no corresponding data in my spreadsheets. With my external id trick, I could also rely on being able to re-import any data that I didn't import the first time (at least, for the custom fields - re-importing relationships wasn't going to be as easy).
I also had the two public CiviCRM-related pages: the newsletter signup and the petition - they needed their own custom fields and profiles and groups.
6. Data Import
The bulk of the work now should have been relatively straightforward, but ended up being fiddly.
a. members spreadsheet
This was the hardest and most important, so i started with it. It had 2747 entries. As per the above note, I split it into:
mv - 'vip', steering committee memberships (15)
m22 - rows with 2 names and 2 membership (156)
m12 - rows with 2 names and 1 membership (87)
m11 - rows with 1 name and 1 membership. (2489)
for the householding (m11 & m12) I created an 3 extra columns in which i generated ("external") ids for the household and individuals, looking like:
m22-h-140 m22-i1-140 m22-i2-140
i.e.: -<(household or individual 1 or 2)>-
Fortunately, the other sheets later could all be simpler with just one external id per row, since there was no householding involved.
Each of these sheets was then exported to CSV format, and now I did the imports.
First each sheet got imported at least once for the contact information, and 3 times in the case of m12 and m22 (once for the household and twice for the two individuals in the household). When importing the individuals with households (i.e. m12 and m22), I chose not to import the mailing address address of their household to avoid duplicated mailings, but did import the phone number to all three. Instead, I used my 'external id' trick to relate the individuals to the household, which does contain their mailing address info. In these imports, I also imported the recurring donation information into a custom field of the first individual per record.
For each of these imports, I generate a new 'group' for the import, using the codes above. This somewhat redundant, because you can regenerate these groups based on the external id, but I've left them in temporarily so you can check over the data more easily. Since they're ugly and distracting, they should be deleted eventually.
Then I imported all the donation information by importing it up to 8 times - once for each Donation amount/date. I imported the date as the 'recieved date' and the amount as the 'total amount' and set the donation type as 'donation' - i.e. only three fields, plus I used the external id to relate the donation to the first individual of each row.
Then I imported the membership data - which was just the 'date entered' as 'membership since' and the max of date entered and date renewed as 'membership start'. I imported the m22 sheet twice - once for each individual. There is some automated stuff about renewing membership automatically when getting a donation, but this didn't do anything during the import. Subsequent donations (manually input) should automatically update the membership status.
The rest of the sheets were similar, but much simpler, notes following.
b. Non-member donors sheet
744 records. Here I used the external id format d-. I imported the donations as a special 'MMP-Donation' since they were marked specially on the sheet and didn't seem to bestow membership like a normal donation. I didn't generate a group for them.
c. non-member volunteers
Originally 749 records, only 721 imported after cleaning out ones with bad addresses - no external id, I just tagged all imports with the 'volunteer' tag that already exists.
d. newsletter list - non-members
Originally 862 records only 859 valid, added to group 'FVC Newsletter' and used external id n-.
e. organizations
originally 61, imported 60, no external id.
f. petition - online signers - non members.
4501 records - put into FVC Petition Group - no external id - also imported 'Email newsletter?' custom field, petition sign date and party fields. Didn't put them into the newsletter list!
Final tally: 9,831 contacts imported.
7. Conclusion
CiviCRM and it's import facility was impressive for fully capturing all the variety of data available on these spreadsheets. It's now all there, with excellent functionality that wasn't in the original sheets.
I encountered a number of little bugs as I went along, but the biggest one to note was a few times when the import would claim success but not do anything. That caused me hours of grief as I tried various ways of tricking it into thinking it was a new import (believing the problem to be a caching issue), but eventually I looked into files/civicrm/upload and discovered a log file that had a fatal PHP error that wasn't reported on the screen (related to an invalid value for a custom field).
Like all projects like this, it took longer than I'd hoped for, but the result is actually better than I'd feared - there was very little lost in translation. The total time was about 3 days.
Here's hoping that the tool helps the cause.