In January of 1986 the Space Shuttle *Challenger* exploded 73 seconds after liftoff, destroying the spacecraft and killing the crew. Lots of finger-pointing followed, but eventually the **infamous O-rings** became the focus of the investigation. Panel member **Richard Feynman famously dropped the O-ring material into his ice water** during the hearings. And he got a clear and unambiguous result—the goddamn stuff hardened!

Feynman was a “let’s see what happens” kind of guy, and that was one of those moments where a simple experiment could cut through the B.S.

But big science isn’t so simple. Things like clinical trials not only consume an awful lot of money and time but also generate mountains of data. All that information is hard to handle. It requires quite a bit of analysis before any real conclusions can be drawn, and even then the results can be interpreted in multiple ways.

This is frustrating for regular folks. We want clear and unambiguous evidence that acai berries will protect us from cancer or that meditation will lower our blood pressure. Big science is particularly hard on journalists who want to write about “breakthroughs” and other dramatic things. Big science isn’t very dramatic. It is slow and incremental. Conclusions are couched in vague or conditional language (“the evidence seems to suggest . . .”). Scientists are like everyone else: they want fame, fortune, and glory, but as a general rule they are circumspect about grand, far-reaching statements.

Part of the problem is that scientists are not statisticians. Most experiments require significant statistical analysis. The data have to be examined and processed, and these mathematical skills are often outside the realm of the scientists’ expertise.

Designing experiments is hard. A scientist has to have the free-thinking, synthetic brain of the artist to explore all the necessary questions. And he or she has to have the constraint-driven, analytical mind of the engineer in order to turn those questions into proper experiments. You have to be imaginative and rigorous at the same time.

A lot of clinical trials use something called NHST—null hypothesis significance testing. Here’s how it goes: you create a hypothesis, something like “drug XYZ will shrink bladder tumors.” But you don’t really test that, at least not at first. You create the *null hypothesis*, which is just saying that your experimental hypothesis is invalid. “Drug XYZ will have no effect on bladder tumors.” Then you run your test.
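The logic can be sketched in a few lines of Python. The tumor-size numbers below are made up purely for illustration—“drug XYZ” is the essay’s hypothetical, not a real trial—and a permutation test is one simple way to do the significance step:

```python
# Hypothetical tumor-size data for "drug XYZ" -- made-up numbers,
# purely to illustrate the NHST logic, not real trial results.
import random
from statistics import mean

control = [10.1, 9.8, 10.4, 9.9, 10.2, 10.0]   # tumor sizes without the drug
treated = [6.2, 6.8, 5.9, 6.5, 6.1, 6.4]       # tumor sizes with the drug

observed = mean(control) - mean(treated)        # the shrinkage we measured

# Null hypothesis: the drug does nothing, so the "control"/"treated"
# labels are arbitrary. Shuffle the labels thousands of times and see
# how often chance alone produces a difference at least this large.
random.seed(0)
pooled = control + treated
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:6]) - mean(pooled[6:])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed:.2f}")
print(f"p-value: {p_value:.4f}")   # small p: the null is not supported
```

With numbers this cleanly separated, the shuffled labels almost never reproduce the observed gap, so the p-value comes out tiny and the null hypothesis is rejected.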

This seems sort of backwards but it is just a way of cross-checking. You don’t want to get ahead of yourself. If the null hypothesis is not supported, that is, it is refuted by the results, you can go forward. It’s like the train conductor checking your ticket and saying it is OK and that you can continue your journey.

The problem is that you knew this already. You knew the null hypothesis was invalid. After all, you’d played around with drug XYZ beforehand and knew it had promise for shrinking tumors. That’s why you designed the trial! You aren’t going to waste valuable lab time on a fruitless endeavor.

The problem comes when you get the result you expected. That is, the null hypothesis is not supported. Now you have to see if the data back up the experimental hypothesis. And this is where you need the math geeks. There is a lot of noise out there. It is not always easy to get the signal, to pull it out of the background.
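To see how noise swamps signal, here is the same kind of permutation test—again with invented numbers—but now the effect is small and buried in scatter:

```python
import random
from statistics import mean

# Same hypothetical drug, but the effect is small and the data are noisy.
control = [10.1, 9.2, 10.9, 9.5, 10.6, 9.8]
treated = [9.7, 10.3, 9.1, 10.0, 9.4, 9.9]

observed = mean(control) - mean(treated)   # a modest difference

# Shuffle the group labels and count how often chance alone
# produces a difference at least as large as the one we saw.
random.seed(1)
pooled = control + treated
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:6]) - mean(pooled[6:])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed:.2f}")
print(f"p-value: {p_value:.4f}")   # large p: could easily be chance
```

Here the shuffled labels produce a gap this big quite often, so the p-value lands well above the usual 0.05 cutoff. The effect may be real, but from this data alone you cannot tell it from noise.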

Researchers don’t go into these investigations blind. They have mathematical models of the phenomena they study, and they use these to make projections. They know, in advance, roughly what the data ought to look like. They make predictions with the models. And they test the “significance” of the data, that is, they check whether the effect they measured could plausibly be explained by chance alone. If they get the “statistical significance” they hoped for, they can conclude their experimental hypothesis is valid.

Maybe. Sometimes there are flaws in the experimental design. There are variables that were not accounted for. There are alternative explanations for the results. And it is easy to fall into the logical trap of “the null hypothesis is false so that must mean my hypothesis is true.” Which of course is nonsense as there could be many useful competing hypotheses.

So they learn two things from all this. One, they confirm what they suspected all along. Two, they have to run another experiment to see whether the results were just noise!

That’s not very exciting. **Demonstrating an unequivocal outcome in front of a bunch of bureaucrats and politicians is much cooler**.

Big science does not live up to its name. Its results are usually pretty damn small!