I've been banging my head against a distracted cabal of developers who seem to think that a particular CiviCRM core design, one my own contributed code relies on, is bad, and that it's okay to break it.
This post is my attempt to explain why it was a good idea in the first place.
The design in question is the use of a hash function to populate a field called 'invoice_id' in CiviCRM's contribution table. The complaint was that this string is illegible to humans, and not necessary. So a few years ago, code was added to core that ignores the current value of invoice_id and overwrites it whenever a human-readable invoice is generated.
The complaint about human readability is, of course, valid, and the field's label is misleading, but the solution is terrible for several reasons I've already written about.
In this post, I'd like to explain why the use of the hash value in the invoice_id field is actually a brilliant idea and should be embraced. And sure, let's give it a different label.
The key issue is reconciliation with payment processors. PayPal was the first payment processor to be implemented, and as far as I know it still relies on that invoice_id.
Reconciliation between CiviCRM and any payment processor can (and usually should) be done in both directions: you'll want to ensure that payments in CiviCRM have matching ones in the payment processor's records, and vice versa. There are also cases where you really need reconciliation of some kind. For example, when a visitor who has paid via PayPal comes back to CiviCRM, CiviCRM needs to confirm the contribution; and after an ACH/EFT payment request, there needs to be a confirmation a few days later, when the payment has actually been attempted. In fact, "instant, done and forever" payments are really just a dangerous illusion created by credit cards. If your mental model of payment processing is exclusively credit-card based, you're going to mess up eventually.
Now, most payments match up just fine, and for an on-site processor doing credit cards, reconciliation is often ignored, but there are several situations where it matters (assuming the data in your CiviCRM matters).
Case 1. A payment processes fine, but is later reversed (manually, or by the donor, for example) in the payment processor interface.
Case 2. A payment completes in the payment processor, but that information doesn't get communicated back to CiviCRM. This can happen with externally hosted payment pages like PayPal, but equally with an on-site payment processor that requests a payment that goes through but fails to capture the result (e.g. due to networking or server issues).
Case 3. A payment is made manually through the payment processor.
CiviCRM has two fields to help us do reconciliation:
a. the invoice string that it sends to the payment processor (or at least it does by default; any individual processor plugin may choose not to) and saves into the invoice_id field.
b. the transaction string that it gets back. That gets saved both in the contribution table (as trxn_id) and in the financial transaction table (because now you can have more than one payment per contribution).
Both of these contribution table fields have a hard-coded requirement to be unique (or empty), which means that, when they are used properly and the payment processor provides enough tools, we can do reconciliation, both manual and automated.
The important thing is that we need BOTH these fields if we are to cover all three of those cases: we need to identify matching entries, as well as unique unmatched entries in CiviCRM and unique unmatched entries in the processor.
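To make the two-way matching concrete, here is a minimal Python sketch. It assumes you've already exported records from both sides into a hypothetical, simplified `Record` shape with `invoice_id`, `trxn_id` and a normalized `status` string; the status values are placeholders, not CiviCRM's or any processor's real status names. It matches on the transaction id first, falls back to the invoice id when CiviCRM never got a transaction id back (Case 2), and reports what's left over on each side (Case 3 shows up as a processor-only record), plus matched pairs whose statuses disagree (Case 1). It also assumes one payment per invoice id; the multiple-payments-per-contribution case would need a list of processor records per invoice_id.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical, simplified record shape used on both sides of the comparison.
# Status strings are placeholders, not real CiviCRM or processor status names.
@dataclass(frozen=True)
class Record:
    invoice_id: Optional[str]  # the invoice string CiviCRM sent, if any
    trxn_id: Optional[str]     # the transaction string the processor returned, if any
    status: str                # e.g. "pending", "completed", "reversed"


Pair = Tuple[Record, Record]


def reconcile(civi: List[Record], processor: List[Record]):
    """Two-way match: returns (matched, civi_only, processor_only, status_mismatch).

    Assumes at most one processor record per invoice_id (one payment per contribution).
    """
    # Index processor records by each key; both keys are unique when present.
    by_trxn = {p.trxn_id: i for i, p in enumerate(processor) if p.trxn_id}
    by_invoice = {p.invoice_id: i for i, p in enumerate(processor) if p.invoice_id}

    matched: List[Pair] = []
    civi_only: List[Record] = []
    used = set()  # indexes of processor records already matched

    for c in civi:
        # Prefer the processor's transaction id when CiviCRM recorded one.
        i = by_trxn.get(c.trxn_id) if c.trxn_id else None
        # Case 2: the result was never captured, so fall back to the invoice id.
        if i is None and c.invoice_id:
            i = by_invoice.get(c.invoice_id)
        if i is None or i in used:
            civi_only.append(c)
        else:
            used.add(i)
            matched.append((c, processor[i]))

    # Case 3: manual processor payments carry no CiviCRM invoice id and end up here.
    processor_only = [p for i, p in enumerate(processor) if i not in used]
    # Case 1: matched, but one side has since been reversed (or otherwise changed).
    status_mismatch = [(c, p) for c, p in matched if c.status != p.status]
    return matched, civi_only, processor_only, status_mismatch


# Usage with made-up placeholder ids:
civi = [
    Record("hash-a", "T1", "completed"),
    Record("hash-b", None, "pending"),    # processor result never captured
]
processor = [
    Record("hash-a", "T1", "reversed"),   # donor reversed it later
    Record("hash-b", "T2", "completed"),
    Record(None, "T3", "completed"),      # entered manually at the processor
]
matched, civi_only, processor_only, status_mismatch = reconcile(civi, processor)
# Two matched pairs (both with status mismatches), nothing CiviCRM-only,
# and one processor-only record (T3).
```

Note that without the invoice_id fallback, the pending "hash-b" contribution could never be tied to its processor transaction, which is exactly why both fields are needed.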
And yes, if there is more than one payment against a contribution, the invoice number won't be unique per payment on the processor's side, but that doesn't actually break anything.
Okay, having established that we need both fields, the only remaining question is: why do we need such an ugly string as the invoice id we send to the payment processor?
For that, there are actually two good answers:
1. When a payment via a payment processor is attempted in CiviCRM, typically no contribution record has been created yet. So there is no nice integer id that we can use to generate a human-friendly invoice id. We could do some gymnastics and add extra code to generate incremental ids each time, but that's not an easy problem to solve, and the numbers we generated wouldn't match the contribution ids anyway.
2. Global uniqueness is a good thing. Git, for example, also uses hashes as unique identifiers. If we used some kind of incremental id and a different system (e.g. Drupal Commerce?) was connecting to the same payment processor, invoice numbers could overlap, making reliable reconciliation impossible.
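Both answers can be illustrated with a minimal Python sketch. `new_invoice_id` is my own illustration of the idea, not CiviCRM core's PHP code, which I'm not claiming to reproduce exactly: it hashes random bytes, so it needs no contribution id (answer 1), and ids generated independently by different systems collide with negligible probability (answer 2). `incremental_ids` is a hypothetical per-system counter, shown only for contrast.

```python
import hashlib
import os


def new_invoice_id() -> str:
    """Return a 32-character hex string usable as a processor invoice id.

    Built from 16 random bytes, so it needs no contribution row, database
    counter or coordination with other systems. MD5 is used here only to
    format an identifier, not for any security purpose.
    """
    return hashlib.md5(os.urandom(16)).hexdigest()


def incremental_ids(count: int, start: int = 1) -> list:
    """Hypothetical per-system counter, for contrast."""
    return [str(n) for n in range(start, start + count)]


# Two systems sharing one processor account, each with its own counter...
civi_ids = incremental_ids(3)
commerce_ids = incremental_ids(3)  # hypothetical second system, e.g. a Drupal Commerce store
overlap = set(civi_ids) & set(commerce_ids)  # every id collides

# ...versus random hash-style ids generated independently by each system.
hashed_a = {new_invoice_id() for _ in range(1000)}
hashed_b = {new_invoice_id() for _ in range(1000)}
```

The trade-off is the one this post is about: the hash-style id is ugly to read, but it can be generated before any database row exists and never needs to be coordinated with anyone else's numbering.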
Okay, so enough already about the invoice_id field?
Addendum, April 28:
1. I discovered that the invoice_id field dates from CiviCRM version 1.3, so about the end of 2005.
2. My answer about not having a nice integer id available is incomplete: since 4.7, Eileen has been fixing things so that contributions get created as pending contributions before any payment attempt is made, and that almost allows us to change the default way invoice_ids are generated, if we decide that makes sense (e.g. it might make sense for pay later contributions). But there's still at least one code pathway where payments are attempted without a contribution id, so answer 1 still holds.