“What Are the Chances? Interpretation of a DNA Match” is a guest blog post by Sue Carney.
Didn’t someone once say, “There are three kinds of lies: lies, damned lies, and statistics”? Whoever said it is another story, but the point I’m making is that when numbers are used to support a point of view, there may be doubt or even a lack of understanding.
In a recent #CClivechat I said that elimination of a person with non-matching DNA is definitive but matching DNA is not. This warrants further explanation, so here I’ll explain how a number can describe the strength of a matching DNA profile, where those numbers come from, what they mean and, perhaps more importantly, what they don’t mean.
DNA profiling is not conclusive. People are often under the misapprehension that if a DNA profile matches a person – let’s say the profile’s from a blood stain at a crime scene and it matches the suspect in a murder enquiry – then it follows that the blood stain has originated from that suspect. This seems a logical assumption, but DNA evidence is not nearly as clear-cut and the assumption is not correct. Sure, there is a very, very good chance that the blood is from the suspect, but there are other factors to consider: Could anyone else have the same DNA profile as our suspect? DNA is inherited, so could a relative be expected to have the same DNA profile? How often might we expect to see those matching DNA results in the DNA profiles of the general population? Who are the general population and would they have had opportunity to leave their blood at the crime scene?
Suddenly the certainties attached to a matching DNA profile are less clear. To help us avoid some potential pitfalls, we’ll examine some of the principles forensic scientists use to interpret DNA evidence.
The forensic scientist always considers the findings (the evidence) in terms of two alternatives or propositions.
These two alternatives show the prosecution and defence versions of events. Continuing with our example, the prosecution version is that the blood stain is from the suspect. The defence version may vary but that’s OK. If the blood at the crime scene isn’t from the suspect, then the defence proposition helps us to decide who to consider as a possible alternative source of the matching DNA. For example, the suspect might say that the blood isn’t his, but from his brother or some other relative. As discussed earlier, relatives are much more likely to have similarities in their DNA profiles, so this might be a possible defence. If the brother’s reference DNA sample is available, his profile can be compared to the crime stain and if the DNA profiles don’t match, the brother can be eliminated as a possible source of the blood. However, if this classic defence is used in a trial, it’s likely that a reference profile from the brother won’t be provided, and the scientist will be forced to offer a modified figure to show the likelihood of the match if the blood is not from the suspect but from his brother. More about this later.
If the ‘brother’s defence’ is not suggested, then the defence view might simply be that the blood is not from the suspect and could be from any other male in the general population. In such cases, the scientist might describe the defence proposition as the blood being from someone other than and unrelated to the suspect.
The forensic scientist always describes the probability of the findings (the evidence) and never the probability of the proposition.
This principle is drummed into forensic scientists during training. Getting this wrong is taken very seriously. UK appeal court rulings such as R v Doheny & Adams give very specific guidance on the presentation of DNA evidence and define the roles of the expert witness and the jury. The expert witness provides opinion on the likelihood of the evidence given a particular proposition, whilst the jury use the opinion of the forensic expert to make up their minds about the likelihood of the proposition. In other words, to decide on guilt or innocence.
Let’s recap here on our DNA case example: The prosecution proposition is that the blood at the crime scene is from the suspect. The defence proposition is that the blood isn’t from the suspect but it’s from someone other than and unrelated to him. The scientist has used DNA profiling to try to address the issue of who the blood could be from, and explains that a full DNA profile was obtained from the blood. The DNA results show no signs of a mixture of DNA from more than one person, and the profile matches that of the suspect. The chance of obtaining the matching DNA profile if the blood is not from the suspect but from someone other than and unrelated to him, is estimated to be in the order of one in a billion (a UK billion = a thousand million.) Put another way, the matching DNA profiles are a billion times more likely if the blood is from the suspect than if it is from another unrelated male.
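The two ways of stating the result are linked by a simple ratio, which can be sketched in a few lines of Python. This is only an illustration: the function name is my own, and it assumes that a matching profile is certain if the blood really is from the suspect.

```python
from fractions import Fraction

def likelihood_ratio(p_evidence_if_prosecution, p_evidence_if_defence):
    """How many times more likely the evidence is under the prosecution
    proposition than under the defence proposition."""
    return p_evidence_if_prosecution / p_evidence_if_defence

# Assumption: if the blood is from the suspect, the profiles are certain to match.
p_match_if_suspect = Fraction(1)

# From the example: a one in a billion chance of the match if the blood is
# from someone other than and unrelated to the suspect.
p_match_if_unrelated = Fraction(1, 1_000_000_000)

lr = likelihood_ratio(p_match_if_suspect, p_match_if_unrelated)
print(f"The match is {lr} times more likely if the blood is from the suspect")
```

Note that both inputs are probabilities of the evidence, each under a different proposition. Neither is the probability of a proposition.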
The media often report the chance of the blood being from someone other than the suspect as one in a billion. This is wrong! The commentator has referred to the chance of the defence proposition and not the chance of the evidence. (Check the structure of the sentence and refer to the definitions of our propositions in the previous paragraph.) Lawyers often make this mistake too, although less so these days as DNA evidence has become more commonplace. As scientists we refer to this error as ‘The Transposed Conditional’ or ‘The Prosecutor’s Fallacy.’
Here’s an analogy to explain the prosecutor’s fallacy. Imagine a trustworthy person, such as the Archbishop of Canterbury, plays a round of poker. (We may or may not agree with his righteous & trustworthy status, but let’s assume it for the purposes of the example.) The Archbish has an excellent hand and wins the round, but he’s accused of cheating. One might ask what’s the chance of him being dealt such a good hand if he’d cheated. Others may ask the chance that he’s cheated, given that he’d had such a good hand. They’re the same, right? Wrong! The first question is the correct way to consider the problem. We could calculate the chance that he’d be dealt that particular hand and win – this is the probability of what was observed or ‘the evidence.’ The second question is transposed. In trying to calculate the likelihood of cheating, we’d be trying to calculate the probability of a proposition. A classic example of the prosecutor’s fallacy. More importantly, even if we could decide both values, we’d find that they are not the same.
This is Bayesian thinking, so called after Bayes’ theorem, a formula to calculate how the chances of a particular event are affected by a new piece of evidence. A useful summary of the appeal court rulings affecting DNA evidence can be found here.
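To see why the transposed figure can be so different, here is a minimal sketch of Bayes’ theorem in its odds form, applied to our blood stain example. The prior is entirely invented for illustration (a pool of 60 million other unrelated people, each treated as no more or less likely a source than the suspect before the DNA result is known). In court, that kind of judgement belongs to the jury and not to the scientist.

```python
from fractions import Fraction

def posterior_odds(likelihood_ratio, prior_odds):
    """Bayes' theorem in odds form: posterior odds = likelihood ratio x prior odds."""
    return likelihood_ratio * prior_odds

def odds_to_probability(odds):
    """Convert odds in favour of a proposition into a probability."""
    return odds / (1 + odds)

# From the example: the chance of the match if the blood is from an unrelated person.
match_probability = Fraction(1, 1_000_000_000)
# Assumption: the match is certain if the blood is from the suspect.
likelihood_ratio = 1 / match_probability

# Hypothetical prior, for illustration only: 60 million other possible sources.
other_possible_sources = 60_000_000
prior_odds = Fraction(1, other_possible_sources)

p_suspect = odds_to_probability(posterior_odds(likelihood_ratio, prior_odds))
p_someone_else = 1 - p_suspect

print(f"Chance of the evidence if the blood is from someone else: 1 in {1 / match_probability}")
print(f"Chance the blood is from someone else, given the evidence: {float(p_someone_else):.3f}")
```

Under this made-up prior, the chance that the blood is from someone else comes out at about 0.06, or roughly one in eighteen. That is a long way from one in a billion, which is why the two statements must never be swapped.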
The appeal court rulings use the term ‘random occurrence ratio’ to describe the one in a billion statistic. Forensic scientists prefer to call this the match probability, arguing that the term random occurrence ratio has no real scientific pedigree. So how is the probability of a DNA profile calculated and why is it one in a billion? Standard UK DNA profiling examines ten areas of DNA plus a sex test. DNA inside cells is packed tightly into discrete subunits called chromosomes. Since each of the areas of interest is located on a different chromosome, the chance of inheriting one particular DNA profiling result has no effect on the chance of inheriting any of the other possible results. Inheritance of each result is an independent and random event, based on the results that your parents have in their profiles. In the same way, if one had six (unloaded) dice, throwing a six with any one of them would have no effect on the chances of throwing a six with any of the others.
The laws of probability state that the probability of a set of multiple events all happening can be calculated by multiplying together the probabilities of each separate event. This is called the product rule. For example, the chance of throwing a six is one in six or one sixth. The chance of throwing two sixes is one sixth multiplied by one sixth, which equals one in thirty six. The chance of throwing three sixes is one sixth multiplied by one sixth, multiplied by one sixth, which equals one in two hundred and sixteen… You get the picture.
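For anyone who would like to check the dice arithmetic, here is a short Python sketch of the product rule. The function name is my own, and exact fractions are used so that the answers appear as ‘one in’ figures.

```python
from fractions import Fraction

def chance_of_all_sixes(number_of_dice):
    """Product rule for independent events: multiply one sixth by itself
    once for each die thrown."""
    chance_of_a_six = Fraction(1, 6)
    return chance_of_a_six ** number_of_dice

for dice in (1, 2, 3):
    chance = chance_of_all_sixes(dice)
    print(f"{dice} dice: one in {chance.denominator}")
```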
To calculate the probability of a DNA profile, we first need to know how often every possible result is observed within the general population. Since we haven’t DNA profiled everyone in the UK, we must make a conservative estimate of how frequent each DNA result is. Databases of volunteers’ DNA profiles help here. They contain sufficient profiles to be representative of the three most frequent racial groups in the UK. These databases are completely separate from The National DNA Database, and are only used for these calculations. Frequency databases containing relevant samples are used to calculate match probabilities in other nations.
The probabilities of each DNA result in our crime stain profile are multiplied together using the product rule and minor adjustments are made to the final match probability figure to take account of possible relatedness and other factors. Why then, is the answer always one in a billion? Well, erm… it isn’t!
In the early days of DNA evidence and statistical interpretation, match probabilities were calculated for many real and some made-up DNA profiles. Think of this as a testing or validation phase. Even for the most common DNA profile – a created profile containing the most frequent results at each area – the match probability is a very small fraction of one in a billion (or one in an inconceivably large number.) In other words, match probabilities show that all profiles are considerably rarer than one in a billion.
It was agreed at that time to use ‘one in a billion’ as a cut-off point. Match probabilities rarer than this might become incomprehensible to jury members, and this has been accepted as standard practice in the UK, whilst scientists in other countries quote the true match probability. Could this affect court proceedings? Recent research in the US suggests that the so-called CSI effect (jurors being more likely to reach a guilty verdict if the case includes forensic evidence) is not as pronounced as anticipated, but the effect of quoting such inexplicable numbers remains unclear.
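The calculation and the cut-off can be sketched in Python as follows. Every frequency below is invented for illustration and does not come from any real frequency database. The sketch also leaves out the minor adjustments for relatedness and other factors, so treat it as a picture of the principle and not the method used in casework.

```python
from fractions import Fraction
from math import prod

# The agreed UK cut-off: one in a billion (a thousand million).
CUT_OFF = Fraction(1, 1_000_000_000)

def match_probability(area_frequencies):
    """Product rule: multiply together the frequency of the result at each area."""
    return prod(area_frequencies)

def reported_match_probability(area_frequencies):
    """Quote the calculated figure, unless it is rarer than the cut-off."""
    return max(match_probability(area_frequencies), CUT_OFF)

# Hypothetical frequencies for the results at each of the ten areas.
full_profile = [Fraction(1, n) for n in (10, 12, 8, 15, 20, 9, 11, 14, 16, 13)]
# A hypothetical partial profile with results at only the first five areas.
partial_profile = full_profile[:5]

for name, profile in (("full", full_profile), ("partial", partial_profile)):
    calculated = match_probability(profile)
    reported = reported_match_probability(profile)
    print(f"{name}: calculated one in {1 / calculated}, reported one in {1 / reported}")
```

With these invented figures, the full profile is far rarer than the cut-off and so would be reported as one in a billion, whilst the partial profile falls short of the cut-off and is quoted as calculated.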
A match probability is less rare (one in a smaller number) for partial profiles, or if relatives are being considered in the defence proposition. For example, the DNA match is estimated to be ten thousand times more likely if the blood is from the suspect rather than from his brother.
Plans are afoot to introduce a new sixteen-area DNA profiling system to the UK. In my view, the estimated match probability should remain at one in a billion, even though the system will have greater discriminating power, i.e. less chance of an adventitious match. To illustrate this, consider a huge car park. The chance of observing a red car might be relatively high, since red cars are not that unusual. What if the red car had also to have leather upholstery? This might decrease the chances of finding a vehicle matching our criteria. If we are then asked to estimate the chances of a red car with leather upholstery and a particular brand of audio equipment, the chances decrease further. In fact, the more criteria there are, the lower the chance of observing a vehicle matching all of them. Therefore, by adding six additional criteria to the DNA profiling tests, the chance of finding a match should be lower if the DNA is from anyone other than the matching suspect.
A DNA match may be a very powerful piece of evidence, but the question the courts really want the answer to is ‘How did the DNA get there?’ There should always be context. If the blood isn’t from the suspect, which members of the general population were in the vicinity of the crime scene at the time? More importantly, the suspect might have a valid reason for the presence of his blood at the scene.
Finally, if real match probabilities are so minute, why isn’t it better to simply say that the matching DNA is from that person? Well, unless a scientist has compared the entire DNA sequence from a crime stain, with the genome of the suspect (his entire DNA) and found that every base pair matches, they could never be 100% sure, especially if the suspect has an identical twin. Aside from the legal reasons, it just wouldn’t be accurate, and scientists hate inaccuracy.