Friday, March 21, 2014

BICEP2: reasons to be sceptical, part 1

As the dust begins to settle following the amazing announcement of the discovery of gravitational waves by the BICEP2 experiment, physicists around the world are taking stock and scrutinizing the results.

Remember that the claimed detection is enormously significant, in more ways than one. The BICEP team have apparently detected an exceedingly faint B-mode polarization pattern in the CMB, at an order of magnitude better sensitivity than any previous experiment probing the same scales. They have then claimed to have been able to ascribe this B-mode signal unambiguously to cosmological gravitational waves, rather than any astrophysical effects due to intervening dust or other sources of radiation. And finally they have interpreted these results as direct evidence for the theory of inflation, which is really the source of all the excitement, because if true it would pin down the energy scale of inflation at an incredibly high level, with extensive and dramatic consequences for our understanding of high energy particle physics.

However, as all physicists have been saying, with results of this magnitude it is important to be very careful indeed. Speculating about who should get the Nobel Prize (or Prizes) for this is still premature. The paper containing the results will of course be subjected to anonymous peer review when it is submitted to a journal, but it has also already faced a rather extraordinary open peer review by social media, with a live group on Facebook, and all sorts of other discussion on blogs, Twitter and the like. (And to the great credit of the scientists on the BICEP team, they have patiently responded to questions and comments on these forums, and the whole process has been carried out very civilly!)

What I want to do today is contribute to that discussion by gathering together all the main points of concern and reasons to be sceptical of the BICEP result. This is partly for my own purposes, since writing things down helps to clarify my thoughts. I will divide these concerns into three main categories, addressing the following questions:

  • how certain can we be that BICEP2 observed a real B-mode signal?
  • how certain can we be that this B-mode signal is cosmological in origin, i.e. that it is due to gravitational waves rather than something less exciting?
  • how certain can we be that these gravitational waves were caused by inflation?

I'll discuss the first category of concerns in part 1 of this post, and the other two in parts 2 and 3. I do not claim that any of the concerns I raise here are original; any mistakes, however, are definitely mine alone. I'd like to encourage discussion of any of these points via the comments below.

How certain can we be that BICEP2 observed a real B-mode signal?

This is obviously the most basic issue. The general reason for concern here (and this applies to any B-mode detection experiment) is that the experimental pipeline has to be able to decompose the observed polarization signal into two components, the E-mode and the B-mode, and the level of the signal in the B-mode is orders of magnitude smaller than in E. Now, as Peter Coles explains here, the E and B polarization components are in principle orthogonal to each other when the spherical harmonic decomposition can be performed over the whole sky, but this is in practice impossible. BICEP observes only a small portion of the sky, and therefore there is the possibility of "leakage" from E to B when separating out the components. It would not take much leakage to spoil the B-mode observation.

Obviously the BICEP team implemented many tests of the obtained maps to check for such systematics. One of the ways to do this is to cross-correlate the E and B maps: if there is no leakage the cross-correlation should be consistent with zero. Another important test is the jackknife technique, also nicely explained here: you split your data into two equal halves, and subtract the signal found in one half from that in the other; the answer should also be consistent with zero.
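To make the idea of a jackknife null test concrete, here is a minimal sketch in Python. It is not the BICEP pipeline: the band powers and error bars are made-up numbers, the function names are hypothetical, and the factor of 1/2 in the difference is just one common convention that I am assuming here.

```python
def jackknife_difference(half_a, half_b):
    """Per-bin jackknife difference (a - b) / 2 of two halves of the data."""
    if len(half_a) != len(half_b):
        raise ValueError("both halves must have the same number of bins")
    return [(a - b) / 2.0 for a, b in zip(half_a, half_b)]


def chi_squared_vs_zero(values, errors):
    """Chi-squared of the values against the null hypothesis of zero signal."""
    return sum((v / e) ** 2 for v, e in zip(values, errors))


# Hypothetical band powers measured from each half (illustrative numbers only).
half_a = [1.0, 2.0]
half_b = [0.0, 4.0]
# Assumed 1-sigma errors on the jackknife difference in each bin.
errors = [0.5, 0.5]

diff = jackknife_difference(half_a, half_b)
chi2 = chi_squared_vs_zero(diff, errors)
print("difference per bin:", diff)
print("chi-squared against zero:", chi2)
```

Any real sky signal is common to both halves and cancels in the difference, so what is left should look like noise. A chi-squared much larger than the number of bins would indicate a failed test; one much smaller than the number of bins would suggest that the assumed errors are too generous.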

Now one source of concern arises because of a combination of these two tests. The blue points in the following figure show the results of a jackknife test on the BB power:

These points are consistent with zero ... but they are possibly too consistent with zero! The $1\sigma$ error bar of every one of them passes through zero, whereas it would be more natural to expect some more scatter. In fact from the number on the plot you can see that there is only a 1% chance that all 9 blue points should be so close to zero.
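A cruder way to see why this looks odd is to ask how often nine points would all land within their own $1\sigma$ error bars of zero purely by chance. The sketch below assumes the points are independent and Gaussian, which is my simplification. It is not the statistic behind the number on the plot, which also accounts for how close each point is to zero.

```python
import math


def prob_all_within(n_points, n_sigma=1.0):
    """Chance that n independent Gaussian points all lie within n_sigma of zero."""
    # Probability for a single point: about 0.683 for 1 sigma.
    p_single = math.erf(n_sigma / math.sqrt(2.0))
    return p_single ** n_points


p_all_nine = prob_all_within(9)
print(f"chance that all 9 points lie within 1 sigma: {p_all_nine:.3f}")
```

Under these assumptions the answer comes out at roughly 3%, so even this rough estimate says that nine out of nine is uncommon if the error bars are correctly sized.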

This raises the possibility, pointed out by Hans Kristian Eriksen, that the error bars on the blue points are overestimated. It may then be the case that the error bars on other points in other jackknife tests are also too large. If that were the case, then reducing those errors might mean that some of the other jackknife tests now fail, that is, the points are no longer consistent with zero. As it happens, of the 168 jackknife test results listed in the table in the paper, quite a few (about 7) already "fail" by the stricter standards (2% probability) that some other experiments such as QUIET might apply. Obviously some number of tests are always expected to fail, but more than 7 out of 168 starts to look like quite a large number. This then becomes a little worrying.
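For a rough sense of scale, one can ask how many failures at the 2% level would be expected from 168 tests by chance alone. The sketch below treats the tests as independent, which is an assumption on my part and surely not true in detail, since many of the jackknife splits share data.

```python
import math


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), i.e. n independent tests."""
    below = sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k)
    )
    return 1.0 - below


n_tests = 168     # jackknife results listed in the table in the paper
threshold = 0.02  # the stricter 2% criterion
expected_failures = n_tests * threshold
p_seven_or_more = prob_at_least(7, n_tests, threshold)

print(f"expected failures by chance: {expected_failures:.2f}")
print(f"chance of 7 or more failures: {p_seven_or_more:.3f}")
```

With these inputs one expects only three or four failures by chance, and seven or more should happen in just a few percent of cases. That is unusual but not outrageous, and correlations between the tests would change the number.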

On the other hand, this extrapolation may be a little exaggerated, because we are surmising that the error bars might be too large purely on the basis of the one figure above. Clearly, if you do a large number of jackknife tests, it becomes less remarkable that one of them gives an unusual result. Looking through the table for the other BB jackknife results, the particular example from the figure is the only one that stands out as being odd, so it is hard to conclude from this that the error bars are too large. Overall I'm not convinced that there is necessarily a problem here, but it is something that deserves a little more quantitative attention.

The second source of concern that has been highlighted is that the data at large multipole values appear to be doing something odd. Look at the 5th, 6th and 7th black points from the figure above, which are quite a long way from the theoretical expectation. Peter Coles helpfully drew a little blue circle around them:

The worry here is that even if the data appear to be passing jackknife tests for internal consistency and null tests for EB cross power, the fact that these points are so high suggests that there is still some undetected systematic that has crept in somewhere. This hypothesized systematic could account for the measured values of the crucial first four points, which constitute the detection of the gravitational waves.

Similarly, people are worried about the EE power spectrum, which appears to be too high in the $50 < \ell < 100$ region. Again, this could be a sign of leakage from temperature into polarization, which could perhaps be contaminating the B-mode maps despite not explicitly showing up in the jackknife consistency checks.

Now, the BICEP response to this is that you shouldn't judge things simply "by eye". The EE excess does not appear to be statistically significant. It's also not incredibly unlikely that the final two of the circled BB data points could simultaneously be as high as they are just due to random chance — they say "their joint significance is $<3\sigma$", which means that the chance is about 1%. (Of course the chance that all three of the circled points could simultaneously be high is smaller than that, and so presumably less than 1% ... )
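For reference, the translation between "number of sigma" and a chance probability can be sketched as below. This assumes Gaussian statistics and the two-sided convention, neither of which is stated in the quoted sentence, so treat it as a rough guide rather than a reconstruction of the BICEP calculation.

```python
import math


def two_sided_p_value(n_sigma):
    """Chance of a Gaussian fluctuation at least n_sigma from the mean, either sign."""
    return math.erfc(n_sigma / math.sqrt(2.0))


for n_sigma in (2.0, 2.58, 3.0):
    p = two_sided_p_value(n_sigma)
    print(f"{n_sigma:.2f} sigma -> p = {p:.4f} (about 1 in {1.0 / p:.0f})")
```

Under these assumptions a full $3\sigma$ fluctuation corresponds to a chance of roughly 0.3%, and 1% corresponds to about $2.6\sigma$, so "about 1%" is best read as an order-of-magnitude figure for something a little below $3\sigma$.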

Another justification some people have been providing (mostly people from outside the BICEP collaboration to be fair, though some from within it as well) is that the preliminary data from the Keck Array, which is a similar instrument to BICEP but with higher sensitivity, appear to show no anomaly in that region. I think this is a somewhat dangerous argument, because the Keck data also don't seem to be quite so high in the region of the crucial first four bandpowers! In any case, the "official" word from BICEP is that any such speculation on the basis of Keck is to be discouraged, because the Keck data are still very preliminary and have not been properly checked.


I'm a little bit worried about the various issues raised here, though overall I would say the odds are in favour of the B-mode detection being secure (this is a different issue to whether this detected signal is due to gravitational waves! More on that in the next post). I would not, however, put those odds at anywhere near 1 in 300,000,000,000 against there being an error, which is the headline significance claimed for the detection of a non-zero tensor-to-scalar ratio ($7\sigma$). If I were forced to quantify my belief, I would say something more like 1 or 2 in 100. That's not particularly secure, but luckily there are follow-up experiments, such as Keck and Planck itself, which should be able to reassure us on that score soon.

A final point: seeing the preliminary Keck data shown in a figure in the paper suggests to me that perhaps the final analysis of Keck data will now not be done "blind". I hope that's not the case; it would be very disturbing indeed if it were.


  1. There is some LaTeX typo or whatever in this sentence: "Similarly, people are worried about the EE power spectrum, which appears to be too high in the Undefined control sequence \lesssim region — again this could be a sign of leakage from temperature into polarization, which could perhaps be contaminating the B-mode maps despite not explicitly showing up in the jackknife consistency checks.".

    1. Great, thanks for pointing that out. Now fixed.

  2. Thanks for the nice write up.

    I would like to see more data before getting too excited. Looks like there are two bumps in the B-mode. Like a two-humped camel.

    With more data we should get a nice curve like the AMS did on the space lab. Just wait a year or two.

    Back to work we go.