One of the unfortunate realities of science is that small data sets often produce unreliable results, as any minor, random fluctuations can have a large impact. One solution to this issue has been building ever-larger data sets, where these fluctuations tend to be small compared to any actual effects. One of the notable sources of big data is the UK Biobank; brain scans from people in the Biobank were recently used to identify changes in the brain driven by SARS-CoV-2 infection.
Now, a large team of researchers has turned this idea upside down in a new paper. They took some of the biggest data sets available and divided them into smaller pieces to figure out how small a data set could get before the results became unreliable. And for at least one type of experiment, the answer is that brain studies need thousands of participants before they’re likely to be reliable. And even then, we shouldn’t expect to see many dramatic effects.
Associate all the things
The research team behind the study termed the type of work they were interested in “brain-wide association studies,” or BWAS. It’s a pretty simple approach. Take a bunch of people and score them for a behavioral trait. Then give them all brain scans and see if any brain structures have differences that consistently correlate with the behavioral trait.
By analyzing the whole brain at once, we avoid any biases that might come from what we think individual brain regions do. The downside is that we’ve defined a lot of brain structures, increasing the chance of a spurious association. And researchers have published BWAS with just a few dozen participants, meaning random chance could play a large role in any results.
For the current study, the research team combined three large data sets to create a total population of over 50,000. They then ran every possible association they could, given the behavioral traits that had been scored in the participants.
The simplest thing they did was search for the strongest correlation they could find. There’s a measure of the strength of a correlation, termed r, where a value of 1 represents a perfect correlation and zero represents no correlation (-1 is anti-correlation). In terms of r, the largest association the researchers found among billions of tests was 0.16—which is not especially strong. In fact, a correlation as weak as r = 0.06 was enough to get something into the top 1 percent of all correlations. (The same was true for anti-correlations.)
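To make the r values above concrete, here’s a minimal Pearson correlation calculation in plain Python (my own illustration, not code from the study): r is the covariance of two variables divided by the product of their standard deviations, which pins it between -1 and 1.

```python
# Pearson's r: covariance of x and y divided by the product of
# their standard deviations. Always falls between -1 and 1.
def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect correlation: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect anti-correlation: -1.0
```

On that scale, the study’s strongest brain-behavior association, r = 0.16, sits much closer to zero than to a perfect correlation.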
Unsurprisingly, many studies have already reported correlations stronger than these. The new findings suggest that we should be treating those earlier reports pretty skeptically.
Where things go wrong
To further explore the potential issues with association studies, the researchers divided the study population into much smaller groups, ranging from only 25 participants to as many as 32,000, and then reran the BWAS in these smaller populations. In the smallest studies, associations could reach as high as r = 0.52. That’s much stronger than we would expect to see based on the full data set, and it suggests some pretty severe problems with small studies.
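The subsampling effect is easy to reproduce with synthetic data. The sketch below (my own simulation, not the paper’s code or data) builds a large population with a true correlation near 0.06, then reruns the "study" on many 25-person subsamples and keeps the strongest correlation any one of those small studies would have reported:

```python
import random

random.seed(0)

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Synthetic "population" with a weak true effect: r is roughly 0.06.
N = 20_000
x = [random.gauss(0, 1) for _ in range(N)]
y = [0.06 * xi + random.gauss(0, 1) for xi in x]

pop_r = pearson_r(x, y)

# Rerun the "study" on 200 random 25-person subsamples and record
# the strongest correlation any single small study would report.
max_small_r = 0.0
for _ in range(200):
    idx = random.sample(range(N), 25)
    r = abs(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    max_small_r = max(max_small_r, r)

print(f"population r:        {pop_r:.3f}")
print(f"largest 25-person r: {max_small_r:.3f}")
```

The largest small-sample correlation comes out several times stronger than the true population value: with only 25 participants, sampling noise alone can manufacture an apparently impressive effect, which is exactly the failure mode the researchers documented.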
But the researchers had to go much larger for these issues to go away. “Statistical errors were pervasive across BWAS sample sizes,” the researchers write. Even with populations in the area of 1,000, false-negative rates were very high, meaning that associations present in the full data set often went undetected. And real associations sometimes appeared to be twice as strong as they were in the full population.
Overall, it appears that we need multiple thousands of participants before BWAS-style studies are likely to produce reliable, reproducible results.
The researchers caution that this work applies to a specific type of brain study. It doesn’t mean that all brain studies with low populations are unreliable—in fact, the paper shows that we’ve learned a lot about brain function from many small studies. I’d note that much of what we understand about the function of different areas of the brain comes from studying injuries that affect a single individual. The authors also find that some related analyses—using functional MRI or performing multivariate analysis—tended to produce more robust results using their data set.
Still, the paper provides a clear and important caution to people doing research in the field. The question is how that caution will be acted on. For this idea to change the standards upon which papers are published, journal editors will need to pay attention, as will other researchers in the field who act as peer reviewers. Fortunately, the growth of large, public data sets like the Biobank will make it easier for everyone to demand larger, more rigorous studies.