Approximate joins

There are two main ways to reconcile payments against charges: The Open Item method needs a foolproof way to match payments to charges, but what if the customer neglected to return a copy of the invoice, or to write the invoice number on the cheque? Reconciliation staff spend much of their time resolving such problems.

Can we help? Yes! It won't be entirely foolproof, but it will drastically cut down the onerous work of reconciliation.

Here is DDL for a test case:

CREATE SCHEMA approx;
USE approx;
CREATE TABLE charges (
  ID INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  custID INT UNSIGNED,
  amount DECIMAL(10,2) NOT NULL
);
CREATE TABLE payments (
  ID INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  custID INT UNSIGNED,
  amount DECIMAL( 10,2) NOT NULL
);

Both tables carry a custID column to identify whose charge or payment it is, but there is no foreign key linking payments to specific charges--that is the link we are going to approximate.

Now populate the tables with a few rows of sample charges and payments for customer #1, ensuring that you have a variety of payments – some that match the charge exactly, some that are close but not enough, and some that are slight overpayments.

INSERT INTO approx.charges VALUES 
(NULL,1,100),(NULL,1,12),(NULL,1,56),(NULL,1,43),(NULL,1,59),(NULL,1,998);
INSERT INTO approx.payments VALUES 
(NULL,1,99),(NULL,1,62),(NULL,1,40),(NULL,1,50),(NULL,1,12),(NULL,1,1000);

SELECT * FROM charges;
+----+--------+--------+
| ID | custID | amount |
+----+--------+--------+
|  1 |      1 | 100.00 |
|  2 |      1 |  12.00 |
|  3 |      1 |  56.00 |
|  4 |      1 |  43.00 |
|  5 |      1 |  59.00 |
|  6 |      1 | 998.00 |
+----+--------+--------+
SELECT * FROM payments;
+----+--------+---------+
| ID | custID | amount  |
+----+--------+---------+
|  1 |      1 |   99.00 |
|  2 |      1 |   62.00 |
|  3 |      1 |   40.00 |
|  4 |      1 |   50.00 |
|  5 |      1 |   12.00 |
|  6 |      1 | 1000.00 |
+----+--------+---------+

The first thing to do is define an approximation threshold: how close must the amount paid be to the amount charged before we conclude that the amounts are related? For this example we define the proximity threshold as 2. In a real-world example, it might be 10, or 50, or perhaps percentage of the charge. It all depends on the nature of the organisation and the typical total purchase. A house builder may make frequent purchases valued at $1000 and more. You scale the threshold to the typical situation.

Since the amount paid might be more or less or even equal to the amount charged, to link a payment to a charge we need not an equi-join but a theta-join that tests a range both below and above the charge amount. That might suggest a BETWEEN clause. Here is a better idea: use the ABS() function:

SET  @proximity = 2;   -- change this value to suit your situation
SELECT
  c.ID AS ChargeNo,
  c.Amount AS Charge,
  p.ID AS PaymentNo,
  p.Amount AS Payment
FROM charges c
JOIN payments p
  ON c.custID = p.custID 
  AND ABS(c.amount - p.amount) <= @proximity
WHERE c.custID = 1;

Before you run this query, look at the data to anticipate the result.

Here it is:

+----------+--------+-----------+---------+
| ChargeNo | Charge | PaymentNo | Payment |
+----------+--------+-----------+---------+
|        1 | 100.00 |         1 |   99.00 |
|        2 |  12.00 |         5 |   12.00 |
|        6 | 998.00 |         6 | 1000.00 |
+----------+--------+-----------+---------+

The solution is correct, as far as it goes, but it doesn’t go far enough. We correctly identified the three situations: underpayment, exact payment and overpayment, but we suppressed all charges that don’t have a matching payment. Reconciliation staff are probably interested in a bigger picture of the situation. Fix this by changing the INNER JOIN to a LEFT JOIN:

SET @proximity = 2;
SELECT 
  c.ID AS ChargeNo, 
  c.amount AS Charge, 
  p.ID AS PaymentNo, 
  p.amount AS Payment
FROM
  charges c
LEFT JOIN payments p
  ON c.custID = p.custID 
  AND ABS(c.amount - p.amount) <= @proximity
WHERE c.custID = 1;
+----------+--------+-----------+---------+
| ChargeNo | Charge | PaymentNo | Payment |
+----------+--------+-----------+---------+
|        1 | 100.00 |         1 |   99.00 |
|        2 |  12.00 |         5 |   12.00 |
|        3 |  56.00 |      NULL |    NULL |
|        4 |  43.00 |      NULL |    NULL |
|        5 |  59.00 |      NULL |    NULL |
|        6 | 998.00 |         6 | 1000.00 |
+----------+--------+-----------+---------+

Much better! The reconciliation people now know that three charges have no matching payment.

What if the customer mistakenly pays for something twice? Add a row to the Payments table with an amount of $1000, then re-run the last query:

+----------+--------+-----------+---------+
| ChargeNo | Charge | PaymentNo | Payment |
+----------+--------+-----------+---------+
|        1 | 100.00 |         1 |   99.00 |
|        2 |  12.00 |         5 |   12.00 |
|        3 |  56.00 |      NULL |    NULL |
|        4 |  43.00 |      NULL |    NULL |
|        5 |  59.00 |      NULL |    NULL |
|        6 | 998.00 |         6 | 1000.00 |
|        6 | 998.00 |         7 | 1000.00 |
+----------+--------+-----------+---------+

How convenient! We can see at once that charge number 6 was paid for twice.

Somebody in the reconciliation department owes you lunch.