DATA DREDGING AND THE PROBLEM OF MULTIPLE COMPARISONS
Whenever we accept a conclusion because the data support it, we must recognize
the chance that a highly unlikely random result is being misinterpreted as a
true effect. A coin tossed six times and coming up all heads might lead us to conclude
the coin was not fairly balanced, yet that will happen once in every 2^6
= 64 attempts even with an entirely fair coin.
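The 1-in-64 figure follows directly from the independence of the tosses; a minimal sketch of the arithmetic:

```python
# Probability that a fair coin lands heads on all six tosses.
# Each toss is independent with P(heads) = 1/2, so six heads in a
# row occurs with probability (1/2)**6 = 1/64.
p_all_heads = 0.5 ** 6
print(p_all_heads)  # 0.015625, i.e. 1 in 64
```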
When we pick an alpha value of .05, we are saying that about 5% of the time
we will accept a misleading random error as though it were true. Equivalently,
for every 20 experiments we do, perhaps 1 will be
wrongly accepted as true based solely on bad luck and random chance. If we analyze
a large set of data in many ways, some random results may appear statistically significant
just based on this sort of chance. This is the error of "data
dredging."
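This can be seen in simulation. The sketch below (a simple two-sample z-test, an assumption chosen for illustration) runs 200 comparisons in which the null hypothesis is true by construction, since both groups are drawn from the same distribution; roughly 5% of them still come out "significant":

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible

def looks_significant(a, b):
    """Two-sample z-test on means; True if |z| > 1.96 (p < .05)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / (n - 1)
    vb = sum((x - mb) ** 2 for x in b) / (n - 1)
    z = (ma - mb) / math.sqrt(va / n + vb / n)
    return abs(z) > 1.96

# 200 comparisons of pure noise: both groups come from the SAME
# standard normal distribution, so every "significant" hit is false.
hits = sum(
    looks_significant([random.gauss(0, 1) for _ in range(50)],
                      [random.gauss(0, 1) for _ in range(50)])
    for _ in range(200)
)
print(f"{hits} of 200 comparisons look 'significant' by chance alone")
```

With 200 comparisons at alpha = .05, around 10 spurious hits are expected, which is exactly the dredging hazard described above.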
In a similar manner, if a very large number of independent comparisons
are made between two groups of subjects and we accept an error rate of 5%, we might
get a false but statistically "significant" result about once for every 20 comparisons
made. This is the error of "multiple comparisons."
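The cumulative risk grows quickly with the number of comparisons. Assuming independent tests, the chance of at least one false positive across m comparisons is 1 - (1 - alpha)^m:

```python
# Family-wise chance of at least one false positive across m
# independent comparisons, each run at alpha = .05.
alpha = 0.05
for m in (1, 5, 20, 100):
    fwer = 1 - (1 - alpha) ** m
    print(f"m = {m:3d}: chance of >= 1 false positive = {fwer:.3f}")
```

At m = 20 the chance of at least one spurious "significant" result is already about 64%.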
Mathematical methods are available to correct for these errors; what is most
relevant for the consumer of the statistical analysis is to recognize that the
error is possible and to look for the correction.
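One widely used correction (the Bonferroni correction, named here as an example; the passage above does not specify which method) simply divides alpha by the number of comparisons:

```python
# Bonferroni correction: divide alpha by the number of comparisons
# so the family-wise error rate stays near the original alpha.
alpha, m = 0.05, 20
per_test_alpha = alpha / m                 # threshold for each test
fwer = 1 - (1 - per_test_alpha) ** m       # resulting family-wise rate
print(f"per-test alpha = {per_test_alpha}, family-wise rate = {fwer:.3f}")
```

With 20 comparisons, each individual test is held to p < .0025, which keeps the overall chance of a false positive just under the intended 5%.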