Wednesday, January 22, 2014

Is falsifiability a scientific idea due for retirement?

Sean Carroll argues that it is.

He characterises the belief that "theories should be falsifiable" as a "fortune-cookie-sized motto"; it's a position adopted only by "armchair theorizers" and "amateur philosophers", and people who have no idea how science really works. He thinks we need to move beyond the idea that scientific theories need to be falsifiable; this appears to be because he wants to argue that string theory and the idea of the multiverse are not falsifiable ideas, but are still scientific.

This position is not just wrong, it's ludicrous. 

What's more, I think deep down Sean – who is normally a clear, precise thinker – realises that it is ludicrous. Midway through his essay, therefore, he flaps around trying to square the circle and get out of the corner he has painted himself into: a scientific theory must, apparently, still be "judged on its ability to account for the data", and it's still true that "nature is the ultimate guide". But somehow it isn't necessary for a theory to be falsifiable to be scientific.

Now, I'm not a philosopher by training. Therefore what follows could certainly be dismissed as "amateur philosophising". I'm almost certain that what I say has been said before, and said better, by other people in other places. Nevertheless, as a practising scientist with an argumentative tendency, I'm going to have to rise to the challenge of defending the idea of falsifiability as the essence of science. Let's start by dismantling the alternatives.

Can we use some other words instead?

Sean tries to substitute "falsifiability" with two of his own criteria: according to him, to be judged to be scientific a theory must be "definite", and it must be "empirical". 

Apparently a theory is definite so long as it makes some clear and unambiguous statement about how reality works, but this is such a broad definition as to include basically any idea about the world, from Plato to creationism to string theory and everything in between. Nor is "empirical" a better starting point, because this apparently merely means that theories should "account for the data" in some way. 

To see what's wrong with this, let's look at a concrete example. In 1857, Philip Henry Gosse wrote a book called Omphalos: An Attempt to Untie the Geological Knot, in which he put forward his theory of how to reconcile the existing evidence of fossilised animals (on which he, as a geologist, was an acknowledged expert), and his belief in the literal accuracy of the Biblical story of creation.

According to Gosse, the fossils that were being uncovered from geological strata were not evidence of creatures that had existed many millions of years ago, but rather had been put there by God when he created the world, in order to convey the impression that they were the remains of animals that had lived many millions of years ago. Other people might claim that the fossilized remains of partially digested food that was sometimes found in the stomachs of these animals indicated that they had actually once lived, and eaten, and digested food. But for Gosse this was just another part of God's master plan of deception: misleading evidence cleverly placed to further our mistaken impression. And indeed, we would expect God to not overlook even the tiniest detail, wouldn't we?

Now, clearly Gosse's theory was completely unscientific tosh, and no one has ever taken it seriously.1 But what exactly is wrong with it? After all, it is "definite": it certainly makes perfectly unambiguous statements about reality. It is also "empirical" in the sense that Sean uses the word. The theory is constructed precisely in order to explain data (the fossils), and it does so in a perfectly coherent manner. If it meets both of these new criteria for scientific validity, does that mean we must treat it as scientific?

Obviously not. As Stephen Jay Gould argued beautifully in his essay Adam's Navel, the reason that Gosse's theory is unscientific tosh is because it is not falsifiable. It is a Theory of Anything. There is no possible fossil data that it could fail to explain. It is based on some assumptions, but it is impossible to use evidence to decide whether these assumptions are true or not. Therefore it is not science.

The point, clearly, is that the criterion of falsifiability is still a useful and necessary one for distinguishing theories that are scientific from theories that are not. You can't go around arguing that we should do away with falsifiability as a standard criterion for scientific acceptability when your suggested alternatives would certify Gosse's theory as scientific too.

Does the meaning of the word 'falsifiability' need to be updated?

Perhaps the problem is that our notion of what makes a theory "falsifiable" is just too simplistic, too Popperian. Maybe if we had a more nuanced view of what the word "falsifiable" should mean, we would be able to sort the theories and ideas worth pursuing from their unscientific relatives.

This is an idea I actually have a bit of sympathy with. The armchair philosopher's view of how falsification works in science – you come up with a theory, use it to make a definite prediction for what should be seen in an experiment, and if you see something different, you kill the theory – is far too simplistic. [Edit: correction inserted here after being called out in the comments for misrepresenting Karl Popper!] That's simply not how science works, or even how it should work.

For a start, experiments are often wrong, so a single experimental result is rarely enough to overturn a theory. Take for instance the case of the neutrinos that apparently travelled faster than light. Almost no serious physicist believed the theory needed to be changed, despite the very impressive quoted statistical significance of 6 standard deviations (i.e., a chance of roughly one in a billion that it was due to a fluke).2 As it turned out, quite rightly so.
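
For reference, the tail probability behind a sigma-quote like that is easy to check with nothing beyond the Python standard library. The sketch below assumes the usual one-sided Gaussian tail convention:

```python
# One-sided Gaussian tail probability at 6 sigma, i.e. P(Z > 6) for a
# standard normal variable, computed via the complementary error function.
from math import erfc, sqrt

sigma = 6.0
p_value = 0.5 * erfc(sigma / sqrt(2))  # P(Z > sigma)

print(f"P(Z > {sigma:g} sigma) = {p_value:.2e}")  # about 9.9e-10: roughly one in a billion
```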

Even in other, less extreme circumstances, the simple picture of falsification doesn't work, because scientists are all Bayesians in practice. What happens is that we start with a prior belief in the truth or otherwise of a theory (informed partly by prejudice and partly by previous experimental results), and we update this assigned probability in light of each new piece of evidence. Given a sufficiently strong prior, even conflicting empirical evidence may not cause us to reject a theory outright, but only slightly reduce the posterior probability we assign to it.
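
This updating process is just Bayes' theorem applied repeatedly. As a minimal sketch – the probabilities here are illustrative numbers of my own choosing, not from any real experiment:

```python
# Bayesian belief updating for a theory M. A strong prior in M survives a
# single piece of conflicting evidence: the posterior drops, but nowhere
# near to zero.

def update(prior, p_data_given_m, p_data_given_not_m):
    """Bayes' theorem: P(M|D) = P(D|M) P(M) / P(D)."""
    p_data = p_data_given_m * prior + p_data_given_not_m * (1 - prior)
    return p_data_given_m * prior / p_data

belief = 0.99  # strong prior belief in the theory
# A surprising result: unlikely if M is true, quite likely otherwise.
belief = update(belief, p_data_given_m=0.05, p_data_given_not_m=0.8)
print(f"after conflicting evidence: {belief:.3f}")  # prints 0.861
```

With these numbers the belief falls from 0.99 to about 0.86 – reduced, as it should be, but a long way from "falsified" in the naive sense.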

However – and this is where I'm going to volunteer my own definition – to be scientific a theory must admit the possibility that experimental evidence will reduce the strength of our belief in it. It may never be possible to rule a theory out, but if experiments can never make it less believable then it's not scientific. We will have to stretch the meaning of the word "falsifiability" to include such cases, where experimental data only allow us to shift the balance of probabilities rather than make definite yes-or-no statements, but that's not unreasonable. It is then still OK to insist that to be scientific, a theory must be falsifiable. If there is no earthly event that could ever occur that would make you reconsider your belief in some idea, it's not scientific, as simple as that.

Note that it isn't enough to claim that an idea is falsifiable in principle, but just not in any practically accessible realm of experiment. By that standard, Gosse's Omphalos would still be scientific, because in principle it could be falsified if God were to make himself visible to us and explain that he didn't put the fossils there to fool us but that evolution really happened. A theory is only scientific if there is some realistic possibility of experimental results that might cause us to alter the probability we assign to it being true.

This doesn't necessarily mean that those experiments must be possible to perform today. It isn't even necessary that we should be able to immediately imagine the form of the experiments that would test the theory. Because of the difficult nature of theorizing and the long time it may take to make progress in a field, or for technological advances to catch up with the state of the art, we should be willing to grant each idea a grace period: we accept that it is not yet clear whether it is falsifiable, but that it might turn out to be, and in the meantime it may be considered worthy of scientific endeavour. The length of the grace period we are willing to grant is influenced by our prior level of belief in the theory, which will in turn depend on such factors as its internal logical coherence, the number of new assumptions it requires, and the rate of progress towards establishing falsifying tests. But no grace period is infinitely long.

So what about the multiverse and string theory? 

Of course we were always going to end up here, because the whole point of Sean's piece was to argue that although the multiverse and string theory are not falsifiable ideas, they should still count as science. I hope I've convinced you that this is rubbish. I'm not necessarily saying that the multiverse idea isn't falsifiable in the slightly broader sense defined above; but if it isn't, it certainly should not be considered a scientific theory.

One line of defence that people sometimes produce in favour of the multiverse – Sean's original piece does this, but so does Shaun Hotchkiss here – is that it may not be falsifiable, but it still could be true. [Edit: Shaun states that this is not quite his position, see the clarification in this comment below.] I think this misses the point. Of course it could be true. So could Omphalos. If you think about it, there is no way to prove that God did not cunningly create the universe just to appear as if inflation and the Big Bang and evolution and all the rest happened, therefore the theory that he/she/it did so could be true. But this does not make it scientific, nor does it make it worthy of study. In fact we would almost certainly still gain more concrete knowledge of the nature of reality by spending our time studying other, falsifiable, theories, even if they are subsequently falsified.

On the other hand, some ideas are worthy of study irrespective of whether they can be tested, and irrespective of whether they are "true" or not. For instance, proofs of theorems that require the axiom of choice to be false could be just as mathematically interesting as those which require it to be true. A thorough evaluation of the logical consequences of a clearly stated set of untestable assumptions is a fine intellectual pursuit. But it is then mathematics we are doing, not science.

If the assumptions underlying string theory or the multiverse cannot ever be tested by experiment, then that is what they should be classified as – mathematics. We need the clear demarcation between what is science and what isn't in order to keep out intruders like Omphalos. As scientists, we face a constant struggle against unscientific nonsense in the world at large. Falsifiability is our great weapon in this struggle, and we shouldn't give up that discriminatory power in order to fit someone's favourite square peg into a round hole.


That said, there is a way in which the multiverse idea that arises from string theory could potentially be falsifiable. This is through its "prediction" that the parameters that appear in the laws of physics should lie close to what are called "catastrophic boundaries". For a good introduction to how this reasoning works, you could read Shaun's posts on the multiverse here and here. If these parameters could be shown to lie close to catastrophic boundaries, this could tip the probabilities in favour of the multiverse. Or if they did not, it would strongly argue against it.

In such a sense, the multiverse would be falsifiable and therefore scientific, even if we could never conclusively say if it were true or not. (It seems to me that this is the sensible approach for defenders of the multiverse to take, rather than ridiculously glorifying in its lack of falsifiability, as Andrei Linde does here.)

In fact it is often argued (as Shaun does) that the strongest point in favour of the multiverse is that it explains the extremely small value of the cosmological constant we observe. This would indeed be a very strong point in its favour! However, I don't think this issue is quite so simple as is made out. In my next blog post, I will try to explain why.

1 Gosse's contemporaries rejected his ideas out of hand, though they did so not necessarily because they understood evolution, or the scientific method, but because they – perhaps understandably! – could not bring themselves to believe in a God who would be so deceitful.

2 The fact that they didn't believe the theory needed to be changed didn't of course stop them from immediately speculating on ways in which it could be changed. This is because theoretical physicists are opportunists, who will take any chance to write a newsworthy paper. The fact that they expected the result to go away soon just added an urgency to try to publish quickly.


  1. Given that we had a discussion in the comments at my internet hang out location I thought I'd leave a comment at yours saying that I actually agree with basically everything in this article.

    When I write "what if it is true?" I have in mind that we therefore need to find a way to test whether it is true, even if the test is not possible today. And I strongly approve of your two points: 1) falsification should be viewed in a more Bayesian sense, to just mean testable in a way that would reduce our confidence that a theory is true, and 2) falsification doesn't have to occur within our lifetime (although it does need to be conceivably possible, at some point).

    Given your good definition of falsifiability, I think both string theory and the multiverse are falsifiable. So, I'm very curious to read your comments about catastrophic boundaries, where perhaps we will disagree.

    1. Actually, I wouldn't mind if you clarified in your post that I don't think that, even if a multiverse isn't falsifiable, it should still be called science. I appreciate that my words can be read that way, but they can also be read the way I explain in the comment above, which was my intention. I did write that we can infer its existence, which was my way of alluding to the ideas of determining that the correct model of inflation generates a multiverse and/or the existence/discovery of catastrophic boundaries.

    2. Sure - I have added a clarification and a link to your comment. The issue with the cosmological constant and catastrophic boundaries is essentially one of details, so it will take a little while to write that post!

  2. @Sesh I agree that naive falsificationism should be challenged. Unfortunately, however, you have repeated the now-mythological view in philosophy of science texts that Karl Popper was a naive falsificationist. Even a cursory reading of his "Logic of Scientific Discovery" (1959) or "Logik der Forschung" (1934) discounts this unfortunate myth. If ever there was a less naive philosopher, it would be great to be shown one.

    I understand that arguments are built on rungs but this particular one is unfortunate. I could direct you to Rafe Champion's Amazon Kindle Guide to The Logic of Scientific Discovery for a digestible but accurate summary.

    With respect to subjective Bayesian heuristics some vigilance is also required. We must sharply distinguish between the question of whether a statement is decidable (whether we believe we can prove it true or false) and the question of its truth.

    1. Thanks for the correction. Being only an armchair philosopher it's perhaps not surprising that I misrepresented the real thing. I guess in that respect my impression of Popper being a naive falsificationist is not uncommon among scientists. But I'll accept I was probably arguing a straw man.

      I agree with the distinction between whether a statement is decidable and whether it is true. But I don't think it is necessary that only strictly decidable statements be scientific. So long as experimental data can alter the balance of probabilities it is allowable, even if a "proof" of falsehood is not possible (even in principle).

  3. This attitude---changing the rules every time you find yourself being restricted by them---gives science a bad name among anti-science philosophers of the first kind. (There are two kinds of philosophers: those who love wisdom and those whom wisdom also loves.) I have a book by a philosophy professor who claims that the history of the philosophy of science is nothing but a series of apologies: every generation of philosophers apologizes for the previous generation's demarcation of science, which acting scientists obviously transgress, yet still offers a new, more spacious demarcation; and this has been going on since Francis Bacon, to this day. His conclusion is, of course, that philosophers should stop apologizing for scientists and, instead, place science in the same category as religion, art, politics, and all other fallible human intellectual endeavors.
    Thus, I wouldn't endorse qualifying falsifiability. I would, instead, qualify science-related activities and statements. It's OK to spend whole careers on string theory, and it's also OK for the taxpayer to pay for it, as long as the limitations---current and in-principle---of these activities are clear to all involved.

    1. Not being a string theorist, I'm not personally restricted by any current understanding of falsifiability so that's certainly not a motivation for my argument. I would argue for the qualification of the (pop-science view of) falsifiability on the grounds that logically no experiment can reduce the posterior probability assigned to a theory to exactly zero. Having accepted that, I then realise that any non-zero probability cutoff I could choose to apply to determine when "falsification" has occurred is necessarily arbitrary, so it seems sensible not to set a fixed cutoff.

  4. Hi Sesh, good to see you blogging again!

    I don't really know much about string theory, but to me there seems to be a middle ground, or potential problem.

    One thing I don't really know is whether string theory reproduces GR / the standard model in some limits. I'll assume that it does. Then, in this case, it certainly seems to have much more scientific validity than Gosse's theory. If GR and the standard model can be united in string theory, but it is not able to make any testable further predictions beyond what they can manage, then this seems to be a slightly different issue.

    I guess there has been work on what happens when you have two different theories that give the same predictions, but differ in other, perhaps un-testable, ways. However, I don't know enough about the subject.

    However, if string theory really does produce GR and the standard model, but no new testable predictions, I don't see how this makes it less "scientific" than the current theories. If a theory makes many falsifiable predictions, and one unfalsifiable prediction, does it stop being scientific?

    1. Hi Simon. I'm not an expert on string theory, so if the statement I am about to make is incorrect, I hope someone will come along and tell us. My understanding of the current state of the art is that amongst the $10^{500}$ vacua of the string landscape, no one has been able to locate an example which definitely does contain the Standard Model. However, no one has proven that no such example exists either, and the usual assumption is that one does.

  5. A fellow armchair philosopher, I'd take a slightly different path, to conclusions not a million miles away.

    In the first part of your post, there are two issues which I feel you step between: how science is done, and how science should be done. This is a distinction that's pretty important (if only because it's a reasonable way of looking at the Kuhn vs Popper controversy -- Kuhn focused on how science was done: the sociology; Popper looked at how it should be done: the logical underpinnings). There's a reason Popper called his book "The Logic of Scientific Discovery".

    Now both questions can be informed by actual scientific practice, but we should be careful how far we go in objecting to e.g., the falsification criterion because scientists don't always behave like this. (Compare: arguing that formal logic is incorrect because people don't think logically in all situations. In both cases, we've mistaken the subject of study).

    So I would be wary of ramming together Bayesian and Popperian thinking just on the practical evidence that scientists are often Bayesian in the course of their work. Rather, let's separate the two: Bayesianism could be an alternative account of scientific reasoning (a new logic of scientific discovery that replaces Popper's -- which, it's worth noting, it does have the potential to be: John Earman's "Bayes or Bust" is good here). Falsification is redundant in this case: doing science is applying Bayes' rule to experiments where your priors are theories.

    Or alternatively it could be that you use Bayesianism to assess how much you believe the reported experimental result is showing a real effect. i.e., whether you accept it as a falsifying instance, whereupon you can go Popper's route. I prefer the second option, and this requires no adjustment to the falsification criterion: you just accept that in the messy actual world, you need some way of working out which of the experimental reports you accept. Bayes is helpful.

    So, over in the second half of the post, to the central question of whether landscape-y string theory is science. No, I agree, and I think it's maths. Pure maths inspired by physics is a huge field (e.g., differential equations, which are only interesting because of their physics implications, can be studied as pure maths).

    The thing I find weird is that results from that pure maths can have measurable implications, and then they become science. So how do I tell, while doing this maths, whether I'm going to come up against something that suddenly, magically, makes it science for a little while, and then dives back into maths again?

    I'd appeal to Popper again (yes, I'm a fan). He's engagingly blunt about the process of coming up with theories/hypotheses/etc. They can be discovered in any way: induced from a large number of experimental results, worked through on a theoretician's blackboard, or they can be revealed through tantric meditation. Doesn't matter. What matters is the content of the hypothesis, and whether there's a falsifiable statement. So again, I think there's a confusion: between the process of creating hypotheses, and their content. I can be as science-y as I like when I create hypotheses (wear a lab-coat, use lots of maths), it's irrelevant. What matters to whether it's science is the content of the hypothesis.

    So that leads us to a weird situation. Let's say we start with a set of science-y processes that up until now have given us scientific hypotheses to test. And they're powerful ones: QFT, for example. So we extend these ideas to what seems to us to be their logical conclusion, and ... we end up with something that's not science any more. Even if we're right, and this is indeed the right path to take and we've uncovered the right facts about the uni/multi-verse, we've reasoned ourselves out of science altogether.

    1. Regarding your first argument, I think we are talking about slightly different things here. You're imagining using a Bayesian framework to evaluate how much you trust an experimental result which conflicts with a theory, given an absolute belief in that theory. That's actually not at all what I meant – though I suppose in certain special cases it could be useful in determining when a particular experimental result is due to an unnoticed systematic error (or incompetence/fraud).

      My scenario for the use of Bayes is different. You start with assigning a prior probability $P(M)$ to the given theory/model, which is a way of quantifying your belief in it, given the available prior information (which includes all previous experiments, and the merits of other competing theories). You then obtain new information, i.e. the data from the latest experiment. This is generally accompanied by information on the probability of the data given the model, $P(D|M)$, which is what is usually used to quote sigmas, significances, confidence limits and so on.

      Now, a really naive falsificationist might insist on using $P(D|M)$ alone. But actually in order to make any judgements based on the new information, what you really need is the probability of the model given the new data, $P(M|D)$, which is calculated simply enough from Bayes' theorem. This is now your updated level of belief in the model. When evaluating the impact of the next experimental result, it is this probability that you use as your prior.

      Anyway, the use of Bayes is primarily because it is the only completely logically consistent approach (though it is often difficult to sensibly quantify the starting prior probability). The question is then just what do we require from $P(M|D)$ to classify a theory as "falsifiable"?

      The thing is, $P(M|D)$ can never be exactly zero: demanding that this be possible means no theory is ever falsifiable, so nothing is scientific. But what other level should we demand? That it be possible to imagine data such that $P(M|D)<0.01$, or $<0.001$, or $<10^{-10}$? All of these are arbitrary limits which are possibly inappropriate in some circumstances. What I'm trying to say is simply that a theory should be considered "falsifiable" if it is possible for $P(M|D)$ to be less than $P(M)$, i.e. for the level of belief in a theory or model to decrease in response to new data.

    2. Out of curiosity, why do you focus on P(M|D) potentially being *less* than P(M), instead of just *different*. One's belief in the correctness of the "God faked the fossils" model won't change at all, whatever the measurement, but would you claim something isn't scientific if the probability of the model given the data could only increase or stay the same?

    3. Actually, maybe that's a dumb question. It seems conceivable to me that such a situation might actually be impossible, or at least highly contrived... (i.e. if P(M|D) can increase for one type of result of a measurement, getting the other result must decrease it, even if only negligibly).

  6. To complement your thoughtful comments, Popper did stress that we should not let go of theories lightly. In the 1950s he developed the concept of metaphysical research programs. Publication was unfortunately delayed until, for example, the Postscript "Quantum Theory and the Schism in Physics" in 1982, but private papers were circulated earlier, e.g. to Lakatos, who refashioned the idea into what he called scientific research programs. Popper saw an importance in the role of metaphysical (unfalsifiable) theories in the selection and articulation of theoretical problems. Metaphysics and ideas of all kinds may lead to testable scientific theories, and testable theories are bathed in such programs. Such research programs are indispensable to science.

  7. An intelligent piece, Sesh. I imagine Sean Carroll is wishing he'd put as much thought into his Edge response.

    Mind you, I think the Bayesian approach is its own slippery slope. It's associated with subjectivity and belief, when scientists ought to be striving for objectivity and no "belief". I'm only an interested amateur, but I'm a taxpayer too, and I narrow my eyes whenever I hear the word sigma. I'm also astonished at how selective physicists can be with evidence. Show them that constants are "running constants" and therefore not constant, and it goes in one ear and out the other. Then in the next breath they're back to talking about the fine-tuned constants and the goldilocks multiverse. You know how you mentioned creationism? Those creationists aren't creationists because they're religious, but because they are people. And people are very good at believing in things for which there is no evidence, so much so that they will dismiss any evidence that challenges that belief.

    John Duffield

    1. Thanks for the compliment, but I don't agree with your characterisation of Bayesianism. It certainly doesn't have to be associated with any loss of objectivity. It is simply the logically consistent method of quantifying how well a given theory is supported by a variety of experimental data, some of which may be contradictory.

      Quantitative Bayesianism may be difficult or impossible in some circumstances, depending on the difficulty of sensibly assigning a prior. However, even in these circumstances, I think it should always be possible to determine whether $P(M|D)<P(M)$ in the sense I explained in the comment above. This falsifiability criterion is then independent of the prior.

  8. I agree with Carroll. I've been saying forever that we have good reasons to conclude that extraterrestrials are out there. Why should we have to prove it absolutely? We are never gonna be able to travel out that far.
    And I know it's more controversial, but ghosts too. It's not like we don't have plenty of pictures, and I'm sure I can balance an equation that works on paper for the public.
    I know when I die I would feel a lot better knowing everyone has been wrong about God. I admit the big bang, DNA code, and now the fine tuning has got me worried, but it's good to know my atheist bros are out there workin' hard to get the multiverse into the textbooks. Screw proof. I need to get a good night's sleep without worrying that all my anti-god tirades are penalty-free.