Let’s assume the following hypothesis:
if the reliability of a dichotomous test is f, then the probability that it
gives a wrong result is 1-f.
The following question arises: below what reliability does a positive test
result have a probability of less than 0.5 of being correct?
Let P be the number of elements in the population, a the (known) probability
that an element of this population has a given feature K, and f the
reliability of the test. The number of K-elements detected by the test is
a f P. The number of non-K elements detected (wrongly) is (1-a) (1-f) P.
The probability that an element detected by the test really is a K-element is
therefore a f / (a f + (1-a) (1-f)). It equals 0.5 when
a f P = (1-a) (1-f) P, which is equivalent to f = 1-a. So, as
soon as f ≤ 1-a, the test becomes meaningless.
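The derivation above can be checked numerically. The following is a minimal Python sketch, not part of the original argument; the function names `prob_detection_valid` and `break_even_reliability` are illustrative choices.

```python
def prob_detection_valid(a: float, f: float) -> float:
    """Probability that a detected element really is a K-element.

    a: proportion of K-elements in the population
    f: reliability of the test (assumed identical for K and non-K elements)
    """
    valid = a * f                # share of the population correctly detected
    invalid = (1 - a) * (1 - f)  # share of the population wrongly detected
    return valid / (valid + invalid)


def break_even_reliability(a: float) -> float:
    """Reliability at which a detection is valid with probability 0.5."""
    return 1 - a


a = 0.01
f_star = break_even_reliability(a)
print(f_star, prob_detection_valid(a, f_star))
```

The population size P cancels out of the ratio, which is why the function works with proportions only. The sketch does not guard against the degenerate case where nothing at all is detected (for example a = 0 with f = 1).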
The rarer the feature a test attempts to detect, the more reliable the test
must be. This simple fact is very often neglected.
Let's take an example: the alcohol test.
We assume that one driver out of 100 is at '0.8 or more' (the European norm
for a serious offence is a level in excess of 0.8 g/l). In the following
table, we examine, for several reliabilities of the test, the probability
that somebody with a positive test is actually over the limit. We take a
population of 100,000 persons, of which 1,000 are assumed to be 'at 0.8 or
more.'
| Reliability of the test | Valid detections | Invalid detections | Probability a "detection" is valid |
|---|---|---|---|
| .999 | 999 | 99 | 0.91 |
| .99 | 990 | 990 | 0.5 |
| .95 | 950 | 4950 | 0.16 |
| .9 | 900 | 9900 | 0.08 |
| .8 | 800 | 19800 | 0.04 |
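The figures in the table can be recomputed with a few lines of Python. This is a sketch under the article's assumptions (100,000 persons, 1 in 100 over the limit, the same reliability f for both groups); the name `table_row` is illustrative.

```python
POPULATION = 100_000
A = 0.01  # assumed proportion of drivers at "0.8 or more"


def table_row(f: float, population: int = POPULATION, a: float = A):
    """Return (valid detections, invalid detections, P(detection is valid))."""
    k = round(population * a)                    # drivers actually over the limit
    valid = round(k * f)                         # correctly detected
    invalid = round((population - k) * (1 - f))  # wrongly detected
    return valid, invalid, valid / (valid + invalid)


for f in (0.999, 0.99, 0.95, 0.9, 0.8):
    valid, invalid, p = table_row(f)
    print(f"{f:<6} {valid:>6} {invalid:>8} {p:>6.2f}")
```

Counts are rounded to whole persons so that floating-point noise in 1-f does not show up in the output.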
One can imagine the dangers of misinterpreting test results in, for example,
the medical field.
In the first part, we assumed the following hypothesis: if the reliability
of a dichotomous test is f, then the probability that it gives a wrong result
is 1-f.
Let’s now see what happens if we drop this hypothesis.
Let P be the number of elements in the population, a the (known) probability
that an element of this population has a given feature K, f1 the probability
that a K-element is actually detected as a K-element, and f2 the probability
that a non-K-element is correctly left undetected by the test. In practical
cases, we have f1 < f2.
The number of K-elements detected by the test is a f1 P. The number of non-K elements detected incorrectly is (1-a) (1-f2) P. The probability that an element detected by the test really is a K-element equals 0.5 when a f1 P = (1-a) (1-f2) P. This is equivalent to the special test condition f2 = 1 + a f1/(a-1).
The test becomes meaningless if f2 < 1 + a f1/(a-1).
The ratio a/(a-1) is usually very small in magnitude (for a = 0.01, this ratio is -1/99 and the special test condition becomes f2 = 1 - f1/99).
For a = 0.01 and any reasonable value of f1 between 0.8 and 0.999, the threshold lies between about 0.990 and 0.992, so the test makes no sense once f2 falls below roughly 0.99!
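To see how little the threshold depends on f1, the special test condition can be evaluated directly. The following Python sketch is illustrative only; `prob_valid_two_rates` and `break_even_f2` are hypothetical names, not taken from the article.

```python
def prob_valid_two_rates(a: float, f1: float, f2: float) -> float:
    """Probability that a detected element really is a K-element.

    a:  proportion of K-elements in the population
    f1: probability that a K-element is detected
    f2: probability that a non-K-element is not detected
    """
    valid = a * f1                # K-elements detected
    invalid = (1 - a) * (1 - f2)  # non-K elements detected by mistake
    return valid / (valid + invalid)


def break_even_f2(a: float, f1: float) -> float:
    """Value of f2 at which a detection is valid with probability 0.5."""
    return 1 + a * f1 / (a - 1)


a = 0.01
for f1 in (0.8, 0.9, 0.99, 0.999):
    threshold = break_even_f2(a, f1)
    print(f"f1={f1}: threshold f2={threshold:.4f}, "
          f"P(valid) at threshold={prob_valid_two_rates(a, f1, threshold):.3f}")
```

For a = 0.01 the computed thresholds all fall in a narrow band around 0.99, which is why f1 hardly matters in the argument.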
This can easily be shown with the following example: in a population of 100,000 elements, let a = 0.01, so that 1,000 elements are actually K-elements. If f2 < 0.99, more than one percent of the 99,000 non-K elements (that is, more than 990) will be wrongly detected as K-elements. Even if f1 had the value 1 (all the K-elements are detected), we would still have only 1,000 valid detections against more than 990 invalid ones, so a detection would be valid with a probability of about 0.5 at best.
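The same example can be worked through in counts rather than probabilities. This is a small Python sketch assuming the article's population of 100,000 with 1,000 K-elements and the best case f1 = 1; the f2 values tried and the name `detections` are illustrative.

```python
POPULATION = 100_000
K_ELEMENTS = 1_000  # a = 0.01


def detections(f1: float, f2: float):
    """Return (valid, invalid) detection counts for the example population."""
    non_k = POPULATION - K_ELEMENTS
    valid = K_ELEMENTS * f1     # K-elements detected
    invalid = non_k * (1 - f2)  # non-K elements detected by mistake
    return valid, invalid


# Best case for the test: every K-element is detected (f1 = 1).
for f2 in (0.995, 0.99, 0.98, 0.95):
    valid, invalid = detections(1.0, f2)
    share = valid / (valid + invalid)
    print(f"f2={f2}: valid={valid:.0f}, invalid={invalid:.0f}, P(valid)={share:.2f}")
```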
* : I want to thank Fred Vaughan for his help in writing this article in good
English.