The Use of Statistics and Software Code
This post was first published on 26/06/2019.
Author: Stephen Mason.
As part of my LLM at the University of London, University College in 1990/1991, I took the ‘Proof’ component with Professor Twining. Professor Twining did not let anybody join his course until they passed an exam in statistics.
My knowledge of statistics is basic at best, which is why I asked Professor Peter Bishop, School of Mathematics, Computer Science & Engineering, Department of Computer Science, City, University of London (https://www.city.ac.uk/people/academics/peter-bishop) and Roger Porkess for their observations on the comments made by Anthony de Garr Robinson QC in his opening speech for the Post Office in the current trial of Bates v Post Office Limited TLQ17/0455 before Mr Justice Fraser in London (a transcript of the trial is available at https://www.postofficetrial.com).
Mixing statistics and causation when discussing software code is dangerous. Statistics are not discussed in the practitioner text Electronic Evidence (4th edition, Institute of Advanced Legal Studies for the SAS Humanities Digital Library, School of Advanced Study, University of London, 2017), but I have written a comprehensive analysis, in chapter 6, of the presumption that computers are ‘reliable’ and the assertion by judges that software code can be trusted via ‘judicial notice.’
Below is what Anthony de Garr Robinson QC said in his opening speech for the Post Office:
Day 1: 11 March 2019, 98 – 101
Now, in their submissions the claimants say that they will challenge Dr Worden’s numerical analyses. That is to be welcomed. It will assist your Lordship to assist the soundness of his calculations. At the moment there is no engagement really by Mr Coyne with any questions of likelihood or extent, there are just some criticisms made of some of the assumptions that Dr Worden makes in his report.
Now, it is worth noting that Dr Worden has a number of different calculations, some of which are more complicated and some of which involve more assumptions than others. Let me just deal with one very simple calculation. This requires no understanding of statistics or mathematics. It is set out in section 8.5 of Dr Worden’s first report which starts at {D3/1/148} and it has changed a little bit in Dr Worden’s second report but we don’t need to address that in any detail at this stage.
MR JUSTICE FRASER: I should just tell you for interest I do understand mathematics and statistics. I’m not being funny, but I do.
MR DE GARR ROBINSON: No, that’s very helpful, my Lord. I thought I did, my Lord, I have several maths A-levels, but I realised that my own sense of my own mathematical abilities was rather greater than it turned out to be.
MR JUSTICE FRASER: I mean this one is just a simple multiplication, isn’t it?
MR DE GARR ROBINSON: Exactly.
MR JUSTICE FRASER: I think most school children would probably follow this one.
MR DE GARR ROBINSON: Exactly. It is one I understand: Over the period 2000 to 2018 the Post Office has had on average 13,650 branches. That means that over that period it has had more than 3 million sets of monthly branch accounts. It is nearly 3.1 million but let’s call it 3 million and let’s ignore the fact for the first few years branch accounts were weekly. That doesn’t matter for the purposes of this analysis.
Against that background let’s take a substantial bug like the Suspense Account bug which affected 16 branches and had a mean financial impact per branch of £1,000. The chances of that bug affecting any branch is tiny. It is 16 in 3 million, or 1 in 190,000-odd. The chances of affecting a claimant branch are even tinier because the claimant branches tended to be smaller than ordinary branches. One could engage in all sorts of calculations, but your Lordship may recall from Dr Worden’s second report that he ends up with a calculation of a chance of about 1 in 427,000-odd. So for there to be a 1 in 10 chance for a bug of this scale to affect one set of monthly account for a claimant branch, one would need something like 42,000 such bugs.
Of course there’s a much simpler way of doing it which really is just a straight calculation. There have been 3 million sets of monthly accounts so the chances of the Suspense Account bug affecting any given set of monthly accounts is 60 in 3 million or about 5 in a million, so to get a one in 10 chance of such a bug you would need to have 50,000 bugs like it.
But, my Lord, all the roads lead to the same basic result which is that even for a significant bug of that sort, the number of bugs that would need to exist in order to have any chance of generating even a portion of the losses that are claimed by the claimants would be a wild number that’s beyond the dreams of avarice. It is untenable to suggest that there are 40,000 or 50,000 bugs of that scale going undetected in Horizon for 20 years.
Dr Worden explains that in paragraphs 643 and 644 of his first report and the reference to that is {D3/1/152}. And it is interesting, my Lord, that the claimants very sensibly do not suggest that there will have been bugs of that scale in that number operating — lurking secretly in Horizon for the last 20 years and they don’t suggest it because they can’t. It’s a matter of common sense. And in my respectful submission just that calculation demonstrates that the claim made at the end of paragraph 17.1 of the claimants’ submissions is untenable. A combination of Horizon’s impressions with the volume of transactions done in Horizon is not entirely consistent with the errors reflected in the claimants’ case. In my respectful submission it is obviously inconsistent with that.
Just to be clear, that’s not to say that a claimant could not have been hit by a bug. As I hope I have made clear to your Lordship, Horizon is not perfect. It remains a possibility, but the important point is how unlikely it is. But of course the question of whether an individual claimant has suffered an impact as a result of a bug is not a point for this trial. That is a breach issue to be dealt with in an individual case. This trial is about setting a baseline for Horizon’s reliability, not a final conclusion that will govern every single breach case that comes before your Lordship.
Roger has offered a number of comments below, which are reproduced with his agreement.
Roger and I worked on a paper entitled ‘Looking at debit and credit card fraud’, which was published in Teaching Statistics, Volume 34, Number 3, Autumn 2012, 87 – 91.
This article was awarded the G Oswald George prize for 2012 (I gave my part of the award to the Felix Fund: https://www.felixfund.org.uk/about/who-is-felix/ – I was an Ammunition Technician in the British Army, 1973 – 1982, including bomb disposal).
The article was subsequently translated into German: Betrug mit Kundenkarten und Kreditkarten, Stochastik in der Schule, 34 (2014) 2, S. 15–18.
Roger Porkess was Chief Executive of Mathematics in Education and Industry (MEI) for 20 years, and is the author or co-author of national reports on mathematics and statistics, including ‘A world full of data’ (Royal Statistical Society), as well as a very large number of mathematics and statistics textbooks.
Anthony de Garr Robinson QC: ‘I have several maths A-levels’
This is another example of imprecise language. He means several maths A-level modules.
Anthony de Garr Robinson QC: ‘nearly 3.1 million’
It is actually slightly over 3.1 million. It does not make any difference, but the imprecision is sloppy.
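As a rough check of the ‘slightly over 3.1 million’ figure, the sketch below multiplies counsel’s average of 13,650 branches by the number of months in the period. It assumes the period is 19 full calendar years (2000 to 2018 inclusive) and, as counsel did, treats every branch account as monthly.

```python
# Counsel's stated average number of branches over 2000-2018
AVERAGE_BRANCHES = 13_650
# Assumption: the period runs from January 2000 to December 2018 inclusive
YEARS = 2018 - 2000 + 1  # 19 years
MONTHS_PER_YEAR = 12


def monthly_branch_accounts(branches: int, years: int) -> int:
    """Total branch-months, treating all accounts as monthly (counsel's simplification)."""
    return branches * years * MONTHS_PER_YEAR


total = monthly_branch_accounts(AVERAGE_BRANCHES, YEARS)
print(f"{total:,}")  # 3,112,200
```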
Anthony de Garr Robinson QC:
Against that background let’s take a substantial bug like the Suspense Account bug which affected 16 branches and had a mean financial impact per branch of £1,000. The chances of that bug affecting any branch is tiny. It is 16 in 3 million, or 1 in 190,000-odd. The chances of affecting a claimant branch are even tinier because the claimant branches tended to be smaller than ordinary branches. One could engage in all sorts of calculations, but your Lordship may recall from Dr Worden’s second report that he ends up with a calculation of a chance of about 1 in 427,000-odd. So for there to be a 1 in 10 chance for a bug of this scale to affect one set of monthly account for a claimant branch, one would need something like 42,000 such bugs.
Anthony de Garr Robinson QC: ‘16 branches’
He means 16 branch accounts.
Anthony de Garr Robinson QC: ‘branch is tiny’
This is not right. He means ‘any particular branch account’.
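The ‘1 in 190,000-odd’ figure can be reproduced by dividing the total number of monthly branch accounts by the 16 affected accounts. The sketch below assumes a total of either 3 million (counsel’s rounded figure) or 3.1 million; both give roughly 1 in 190,000 for any particular branch account.

```python
AFFECTED_ACCOUNTS = 16  # branch accounts hit by the Suspense Account bug


def one_in(affected: int, total: int) -> float:
    """Return N such that the chance for a single branch account is '1 in N'."""
    return total / affected


print(f"1 in {one_in(AFFECTED_ACCOUNTS, 3_000_000):,.0f}")  # 1 in 187,500
print(f"1 in {one_in(AFFECTED_ACCOUNTS, 3_100_000):,.0f}")  # 1 in 193,750
```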
Anthony de Garr Robinson QC:
The chances of affecting a claimant branch are even tinier because the claimant branches tended to be smaller than ordinary branches.
There is a major assumption here which may well not be justified. It is that the probability of an account being compromised by an error is proportional to the size of the account. However, it could be that the circumstances which give rise to an error surfacing are more likely to occur in small accounts.
Anthony de Garr Robinson QC: ‘1 in 427,000-odd’
A consequence of the previous point is that changing 1 in 190,000 to 1 in 427,000 cannot be justified. It does not actually make much difference, but it should not have been included in the case.
Anthony de Garr Robinson QC: ‘1 in 10’
Now we come to the serious point. He produces the figure ‘1 in 10’ out of a hat with no justification. Nothing that has been said so far leads to a figure anything like this.
Anthony de Garr Robinson QC: ‘42,000 such bugs’
This figure too is based on the invalid 1 in 10 probability.
This is then compounded by an assumption that each observed malfunction is caused by a different error in the code. This may well not be the case, particularly if the errors in the code are not fully understood and corrected following their manifestation through malfunctions.
So the figure of 42,000 is completely spurious. The subsequent argument based on it is consequently less than worthless.
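It may help to see how a figure of about 42,000 can be reached. The sketch below assumes counsel simply divided the target probability by the per-account chance of 1 in 427,000 (a linear scaling, which is only an approximation for small probabilities); the method is an assumption, since counsel did not show his working. Whatever the exact method, the output is directly proportional to the 1 in 10 target, so it is only as meaningful as that arbitrary input.

```python
PER_ACCOUNT_ODDS = 427_000  # Dr Worden's '1 in 427,000-odd' for a claimant branch account


def bugs_needed(target_probability: float, one_in_n: float) -> float:
    """Number of bugs, each with the same per-account chance, whose chances sum to the target.

    Assumption: linear scaling (target / per-bug probability), which is a
    reasonable approximation only while the target probability is small.
    """
    per_bug_probability = 1 / one_in_n
    return target_probability / per_bug_probability


for target in (0.1, 0.01):
    print(target, round(bugs_needed(target, PER_ACCOUNT_ODDS)))
```

Changing the target from 1 in 10 to 1 in 100 divides the answer by ten; nothing in the underlying data fixes the target.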
Anthony de Garr Robinson QC:
Of course there’s a much simpler way of doing it which really is just a straight calculation. There have been 3 million sets of monthly accounts so the chances of the Suspense Account bug affecting any given set of monthly accounts is 60 in 3 million or about 5 in a million, so to get a one in 10 chance of such a bug you would need to have 50,000 bugs like it.
5 in a million is wrong. 60 in 3 million is 20 in a million or 1 in 50,000. However, it is not clear where the number 60 has come from; if Mr de Garr Robinson actually meant 16, that would give a ratio of about 1 in 190,000.
However, since the purpose of this calculation is to apply the fictitious 1 in 10 figure, the calculations are of no value anyway. The whole paragraph is invalid.
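The ratios discussed above can be checked in a few lines. The sketch divides the number of affected accounts by 3 million, both for 60 (as spoken) and for 16 (the figure used elsewhere). It also shows that 16 in 3 million is roughly 5.3 in a million, so ‘about 5 in a million’ may reflect the figure of 16 rather than 60; that reading is an assumption, not something counsel said.

```python
TOTAL_ACCOUNTS = 3_000_000  # counsel's rounded number of monthly branch accounts


def per_million(affected: int, total: int = TOTAL_ACCOUNTS) -> float:
    """Affected accounts expressed as a rate per million accounts."""
    return affected / total * 1_000_000


def one_in(affected: int, total: int = TOTAL_ACCOUNTS) -> float:
    """The same rate expressed as '1 in N'."""
    return total / affected


for affected in (60, 16):
    print(f"{affected} in 3 million: {per_million(affected):.1f} per million, "
          f"1 in {one_in(affected):,.0f}")
```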
Anthony de Garr Robinson QC:
But, my Lord, all the roads lead to the same basic result which is that even for a significant bug of that sort, the number of bugs that would need to exist in order to have any chance of generating even a portion of the losses that are claimed by the claimants would be a wild number that’s beyond the dreams of avarice. It is untenable to suggest that there are 40,000 or 50,000 bugs of that scale going undetected in Horizon for 20 years.
This complete argument can and should be discounted.
Professor Peter Bishop’s comments are set out below with his agreement:
Dr Worden’s report cites the Suspense Account bug, which had 16 failures in 3.1 million submissions. This information was used to calculate the submission failure probability for the bug (around 5 × 10⁻⁶). It was then stated that:
… for there to be a 1 in 10 chance for a bug of this scale to affect one set of monthly account for a claimant branch, one would need something like 42,000 such bugs.
The claim here is badly phrased; I think the intended phrase was:
… for there to be a 1 in 10 chance for bugs of this scale to affect one set of monthly account for a claimant branch, one would need something like 42,000 such bugs.
This is a complete red herring. What is actually being calculated is the number of bugs needed for a 1 in 10 chance that one set of monthly accounts is affected for any branch.
The 1 in 10 criterion is a completely arbitrary figure and implies that 1 in every 10 submissions will fail (i.e. a 10⁻¹ failure rate for Horizon submissions across all branches). This is equivalent to expecting 310,000 of the total 3.1 million submissions to fail. If this were the case, it would imply an average of 23 submission failures for every Post Office branch in the UK.
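The figures in this part of the analysis can be reproduced with simple arithmetic. The sketch takes the 16 failures in 3.1 million submissions quoted above, applies the ‘1 in 10’ criterion to the same 3.1 million submissions, and spreads the implied failures across counsel’s average of 13,650 branches; using that average as the branch count is an assumption.

```python
FAILURES_OBSERVED = 16          # Suspense Account bug
TOTAL_SUBMISSIONS = 3_100_000   # monthly branch account submissions, 2000-2018
AVERAGE_BRANCHES = 13_650       # counsel's average number of branches (assumed branch count)
IMPLIED_FAILURE_RATE = 1 / 10   # the arbitrary '1 in 10' criterion

# Per-submission failure probability for the one observed bug
p_fail = FAILURES_OBSERVED / TOTAL_SUBMISSIONS

# What the '1 in 10' criterion would mean across all submissions and branches
implied_failures = TOTAL_SUBMISSIONS * IMPLIED_FAILURE_RATE
failures_per_branch = implied_failures / AVERAGE_BRANCHES

print(f"observed per-submission probability: {p_fail:.2e}")
print(f"implied failed submissions: {implied_failures:,.0f}")
print(f"implied failures per branch: {failures_per_branch:.1f}")
```

The per-branch figure of about 22.7 rounds to the 23 quoted above.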
Counsel later states that:
… in order to have any chance of generating even a portion of the losses that are claimed by the claimants would be a wild number that’s beyond the dreams of avarice. It is untenable to suggest that there are 40,000 or 50,000 bugs of that scale going undetected in Horizon for 20 years.
But as we can see, this calculation does not correspond to reality, as nobody is claiming that Horizon’s reliability is so poor. This is a strawman argument, in which an infeasible scenario is posited and then demolished.
There is no rationale that relates a scenario in which every one of the 13,650 branches has to experience 23 submission errors to a situation in which around 500 branch postmasters are falsely accused of fraud.
Alternative analysis
The approach used in the Worden report is completely irrelevant. The relevant statistical measure is the chance that a branch will be wrongly accused of fraud – not how likely it is that an individual submission will go wrong.
For a branch to become a suspect, it needs only one out of possibly a hundred account submissions to be incorrectly processed.
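The difference between the two measures can be illustrated numerically. The sketch assumes a single bug with the Suspense Account bug’s per-submission rate (16 in 3.1 million), about 100 submissions per branch as suggested above, and independence between submissions; all three are simplifying assumptions for illustration only.

```python
P_SUBMISSION = 16 / 3_100_000   # per-submission chance of being hit by one such bug
SUBMISSIONS_PER_BRANCH = 100    # 'possibly a hundred account submissions'


def branch_hit_probability(p: float, k: int) -> float:
    """Chance that at least one of k independent submissions is hit."""
    return 1 - (1 - p) ** k


p_branch = branch_hit_probability(P_SUBMISSION, SUBMISSIONS_PER_BRANCH)
print(f"per submission: {P_SUBMISSION:.2e}")
print(f"per branch:     {p_branch:.2e}")
print(f"ratio:          {p_branch / P_SUBMISSION:.0f}x")
```

Under these assumptions the chance that a given branch is affected is roughly a hundred times the chance that a given submission is affected, which is why the per-submission figure on its own understates the risk to a branch.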
From the Worden analysis, the Suspense Account bug caused 16 branch submission errors, so the number of similar bugs needed to produce 500 branch submission errors is:
500 / 16 = 31.25, i.e. about 31 bugs
This is three orders of magnitude smaller than the 40,000 to 50,000 bugs claimed to be needed by Dr Worden using the flawed criterion of the probability of failure per submission.
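The arithmetic behind the 31 bugs, and the comparison with the 40,000 to 50,000 figure, can be checked directly. Like the analysis above, the sketch assumes that each similar bug causes 16 branch submission errors and that the branches affected by different bugs do not overlap.

```python
import math

CLAIMANT_BRANCHES = 500
HITS_PER_BUG = 16  # branch submission errors caused by the Suspense Account bug

bugs_needed = CLAIMANT_BRANCHES / HITS_PER_BUG
print(f"bugs needed: {bugs_needed:.2f}")

for counsel_figure in (40_000, 50_000):
    ratio = counsel_figure / bugs_needed
    print(f"{counsel_figure:,} is {ratio:,.0f} times larger "
          f"(about 10^{math.log10(ratio):.1f})")
```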
Are 31 residual bugs credible after 20 years? – sadly yes.
With a million lines of code and typical coding best practice, we might start with 1,000 to 3,000 bugs (though concurrent transaction processing software is particularly difficult to get right, as it is prone to transient, non-reproducible failures). So there could easily be 31 bugs remaining undetected after 20 years.
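The starting range quoted above corresponds to roughly 1 to 3 defects per thousand lines of code. The sketch below takes that range and the 31-bug figure as its only inputs and shows what fraction of the initial defects would need to survive for 31 to remain; the densities are the text’s own estimate, not measured values for Horizon.

```python
LINES_OF_CODE = 1_000_000
RESIDUAL_BUGS = 31


def initial_bugs(defects_per_kloc: float, lines: int = LINES_OF_CODE) -> float:
    """Estimated defects at the start, given a density per thousand lines."""
    return lines / 1_000 * defects_per_kloc


def surviving_fraction(defects_per_kloc: float) -> float:
    """Share of the initial defects that must remain undetected to leave RESIDUAL_BUGS."""
    return RESIDUAL_BUGS / initial_bugs(defects_per_kloc)


for density in (1, 3):
    print(f"{density}/KLOC: {initial_bugs(density):,.0f} initial bugs, "
          f"{surviving_fraction(density):.1%} must survive")
```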
Discussion and conclusions
I find it amazing that Dr Worden’s seriously flawed analysis could be viewed as credible evidence in a court of law.
Looking at the probability that an account submission can fail and saying it is tiny is meaningless on its own. By analogy, it is illogical to say that if there is only a 1 in a million chance of winning a lottery, ergo any person who claims to have won the lottery must be lying. This argument ignores the fact that increasing the number of people who buy tickets will increase the probability that somebody will win (even if your own chances remain the same). For example, if we know that 10 million people buy tickets, we would not be at all surprised to hear that 10 people won the lottery that week.
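The lottery analogy can be made concrete. The sketch assumes each ticket independently has a 1 in a million chance of winning and that 10 million tickets are sold; it computes the expected number of winners and the chance that at least one person wins, alongside the standard Poisson approximation for comparison.

```python
import math

P_WIN = 1 / 1_000_000
TICKETS = 10_000_000

expected_winners = TICKETS * P_WIN
p_at_least_one = 1 - (1 - P_WIN) ** TICKETS  # complement of 'nobody wins'

print(f"expected winners: {expected_winners:.0f}")
print(f"chance somebody wins: {p_at_least_one:.5f}")
print(f"Poisson approximation: {1 - math.exp(-expected_winners):.5f}")
```

Each individual’s chance stays at 1 in a million, yet it is almost certain that somebody wins.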
To perform a statistical analysis to determine whether the claimants’ claims are credible, we should start from the hypothesis that all branches are potential victims of random Horizon failures, then ask what conditions are needed to produce 500 victims, and then consider whether these conditions are credible.
We showed that only 31 bugs similar to the Suspense Account bug are needed to cause submission failure in 500 branches. This number of residual bugs is entirely credible for a complex real-time system, and in practice there could be many more than this (even in a mature 20-year-old system).
As a result of these analyses we consider that it is entirely credible that issues experienced by the 500 claimants could have been caused by flaws in the Horizon software.
This guest post was written by Stephen Mason. This post therefore reflects the views of the author, and not those of the IALS.