CiviCRM's invoice_id field and why you should love the hash

I've been banging my head against a distracted cabal of developers who seem to think that a particular CiviCRM core design, which I'm invested in via my contributed code, is bad, and that it's okay to break it.

This post is my attempt to explain why it was a good idea in the first place.

The design in question is the use of a hash function to populate a field called 'invoice_id' in CiviCRM's contribution table. The complaint was that this string is illegible to humans, and not necessary. So a few years ago some code was added to core that ignores the current value of invoice_id and overwrites it when a human-readable invoice is generated.

The complaint about human-readability of course is valid, and the label on the field is misleading, but the solution is terrible for several reasons I've already written about.

In this post, I'd like to explain why the use of the hash value in the invoice_id field is actually a brilliant idea and should be embraced. And sure, let's give it a different label.

The key issue is reconciliation with payment processors. PayPal was the first payment processor to be implemented, and still relies on that invoice_id as far as I know.

Reconciliation between CiviCRM and any payment processor can (and usually should) be done in two directions - you'll want to ensure that payments in CiviCRM have matching ones in the payment processor's records, and vice-versa. There are also cases where you really need reconciliation of some kind - e.g. when a visitor who has paid by PayPal comes back to CiviCRM, it needs to confirm the contribution, and after an ACH/EFT payment request, there needs to be a confirmation a few days later, when the payment has actually been attempted. In fact, instant, done-and-forever payments are really just a dangerous illusion that comes from the use of credit cards. If your mental model of payment processing is exclusively credit card based, you're going to mess up eventually.

Now, most payments match up just fine, and for an on-site processor doing credit cards, reconciliation is often ignored, but there are a number of situations where it is important (assuming the data in your CiviCRM is important).

Case 1. A payment processes fine, but is later reversed (manually, or by the donor, for example) in the payment processor interface.

Case 2. A payment completes in the payment processor, but that information doesn't get communicated back to CiviCRM. This can happen with externally hosted payment pages like PayPal, but equally with an on-site payment processor that makes a request for a payment that goes through, but fails to capture the result (e.g. due to networking or server issues).

Case 3. A payment is made manually through the payment processor.

CiviCRM has two fields to help us do reconciliation:

a. the invoice string that it sends to the payment processor (or at least, it does by default; any individual processor plugin may choose not to) and saves into the invoice_id field.
b. the transaction string that it gets back. That gets saved in both the contribution table and the transaction table (because now you can have more than one payment per contribution).

The two contribution table fields in CiviCRM have a hard-coded requirement to be unique (or empty), which means that when used properly, and with enough tools provided by the payment processor, we can do reconciliation, both manual and automated.
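To illustrate what "unique (or empty)" buys us, here's a minimal sketch using Python and an in-memory SQLite table. It is not CiviCRM's actual schema: the table layout and the helper names make_contribution_table and try_insert are made up for illustration, and I'm assuming the second field is called trxn_id. The point is just that any number of rows can leave a field empty, while a repeated value gets rejected.

```python
import sqlite3


def make_contribution_table():
    """Create a toy contribution table (hypothetical, not CiviCRM's schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contribution ("
        " id INTEGER PRIMARY KEY,"
        " invoice_id TEXT UNIQUE,"  # unique, but NULL (empty) is allowed
        " trxn_id TEXT UNIQUE)"     # same rule for the transaction string
    )
    return conn


def try_insert(conn, invoice_id, trxn_id):
    """Return True if the row was accepted, False if uniqueness was violated."""
    try:
        conn.execute(
            "INSERT INTO contribution (invoice_id, trxn_id) VALUES (?, ?)",
            (invoice_id, trxn_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Two rows with both fields empty are accepted, but a second row reusing an existing invoice_id or trxn_id is refused, which is what lets either field act as a reliable key during reconciliation.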

The important thing is that we need BOTH these fields if we are to cover all three of those cases - we need to identify matching entries, as well as unique unmatched entries in CiviCRM and unique unmatched entries in the processor.

And yes, if there is more than one payment against a contribution, we don't have uniqueness of the invoice number at the payment processor end of things for each payment, but that actually doesn't break anything.
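Here's a rough sketch of the two-way matching, in Python. Everything in it is an assumption for illustration: the reconcile function, the shape of the rows (dicts with 'invoice_id' and 'trxn_id' keys), and the idea that you can export a list like that from your processor. It tries the transaction string first, then falls back to the invoice string, and reports whatever is left over on each side.

```python
def reconcile(civi_rows, processor_rows):
    """Match CiviCRM rows to processor rows (hypothetical row shape).

    Returns (matched, only_civi, only_processor), where matched is a
    list of (civi_row, processor_row) pairs.
    """
    # Index CiviCRM rows by each field, skipping empty values.
    by_trxn = {r["trxn_id"]: r for r in civi_rows if r.get("trxn_id")}
    by_invoice = {r["invoice_id"]: r for r in civi_rows if r.get("invoice_id")}

    matched, only_processor = [], []
    seen = set()
    for p in processor_rows:
        # Prefer the transaction string; fall back to the invoice string.
        c = by_trxn.get(p.get("trxn_id")) or by_invoice.get(p.get("invoice_id"))
        if c is None:
            only_processor.append(p)  # e.g. a manual payment (case 3)
        else:
            matched.append((c, p))
            seen.add(id(c))

    # Anything in CiviCRM that no processor row pointed at.
    only_civi = [r for r in civi_rows if id(r) not in seen]
    return matched, only_civi, only_processor
```

A case 2 payment (CiviCRM never heard back, so it has an invoice_id but no trxn_id) is still matched through the invoice string, which is exactly why one field alone isn't enough. Several processor payments sharing one invoice string all match the same contribution here, so that doesn't break anything either.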

Okay, having established that we need both fields, now the question is just - why do we need such an ugly string to send to the payment processor as an invoice id?

For that, there are actually two good answers:

1. When a payment via a payment processor is attempted in CiviCRM, typically no contribution record has been created. So there is no nice integer id that we can use to generate a human-friendly invoice id. We could do some gymnastics and add some extra code so that we were generating incremental ids each time, but that's not an easy problem to solve, and the numbers we generated would not be matched up with the contribution id numbers.

2. Global uniqueness is a good thing. The hash method for generating unique strings is also used, for example, in git. If we were to use some kind of incremental id and a different system (e.g. Drupal Commerce?) was connecting to the same payment processor, we could have overlaps of invoice numbers, making reliable reconciliation impossible.
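Both answers come down to the same property: you can mint the string before any contribution record exists, and it won't collide with anyone else's. Here's a minimal Python sketch of the idea. It is not CiviCRM's code (CiviCRM is PHP), and the function name new_invoice_id and the choice of MD5 over 16 random bytes are my assumptions for illustration only.

```python
import hashlib
import secrets


def new_invoice_id():
    """Return a 32-character hex string that needs no database id.

    Hypothetical helper: hashes 16 random bytes, so the result does not
    depend on any contribution record or counter.
    """
    return hashlib.md5(secrets.token_bytes(16)).hexdigest()
```

Because nothing here depends on a counter, two unrelated systems sending invoice strings to the same processor account would, for all practical purposes, never produce the same value.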

Okay, so enough already about the invoice_id field?

Addendum, April 28:

1. I discovered that the invoice_id field dates from CiviCRM version 1.3, so about the end of 2005.

2. My answer about not having a nice integer id available is incomplete - since 4.7, Eileen has been fixing stuff so that contributions get created as pending contributions before any payment attempts are made, and that almost allows us to change the default way that invoice_ids are generated, if we were to decide that makes sense (e.g. it might make sense for pay later contributions). But actually, there's still at least one code pathway where payments are attempted without contribution ids, so answer 1. is still a valid answer/reason.
