Tuesday, March 19, 2013

A real puzzle in cosmology: part II

(This post continues the discussion of the very puzzling observation of the integrated Sachs-Wolfe effect, the first part of which is here. Part II is a bit more detailed: many of these questions are real ones that have been put to me in seminars and other discussions.)

Update: I've been informed that one of the papers mentioned in this discussion has just been withdrawn by the authors pending some re-investigation of the results. I'll leave the original text of the post up here for comparison, but add some new material in the relevant sections.

Last time you told me about what you said was an unusual observation of the ISW effect made in 2008.

Yes. Granett, Neyrinck and Szapudi looked at the average CMB temperature anisotropies along directions on the sky where they had previously identified large structures in the galaxy distribution. For 100 structures (50 "supervoids" and 50 "superclusters"), after applying a specific aperture photometry method, they found an average temperature shift of almost 10 microkelvin, which was more than 4 standard deviations away from zero.
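The "specific aperture photometry method" isn't spelled out here. As far as I know, the 2008 analysis used a compensated top-hat filter: the mean temperature inside a disk of radius $\theta$ minus the mean in the surrounding ring of equal area (out to $\sqrt{2}\,\theta$), so that fluctuations much larger than the structure cancel. Here is a minimal sketch, assuming a flat-sky patch stored as a NumPy array in μK; the function name and pixel convention are illustrative, and the real analysis worked on the curved sky.

```python
import numpy as np

def compensated_top_hat(cmb_map, pixel_deg, center, radius_deg):
    """Aperture photometry on a flat-sky patch (an assumption; real maps are spherical).

    Returns the mean temperature inside a disk of angular radius `radius_deg`
    minus the mean in the ring radius_deg <= r < sqrt(2)*radius_deg,
    which has the same area as the disk.
    """
    ny, nx = cmb_map.shape
    y, x = np.indices((ny, nx))
    cy, cx = center                               # (row, column) of the structure
    r = np.hypot(x - cx, y - cy) * pixel_deg      # angular distance in degrees
    disk = r < radius_deg
    ring = (r >= radius_deg) & (r < np.sqrt(2.0) * radius_deg)
    return cmb_map[disk].mean() - cmb_map[ring].mean()

# Toy check: a 4-degree cold disk of -10 uK on an otherwise flat map.
y, x = np.indices((201, 201))
r = np.hypot(x - 100, y - 100) * 0.1              # 0.1-degree pixels
toy_map = np.where(r < 4.0, -10.0, 0.0)
print(compensated_top_hat(toy_map, 0.1, (100, 100), 4.0))  # -10.0
```

Stacking then just means averaging this number over the 50 supercluster directions and, separately, over the 50 supervoid directions. Any constant offset in the map cancels between disk and ring.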

Then you claimed that this observed value was five times too large: that if our understanding of the universe is correct, they should not have seen a value clearly different from zero at all. Theory and observation grossly disagree.

Right again. Our theoretical calculation showed the signal should have been at most around 2 microkelvin, which is pretty much the same size as the contamination from random noise.
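The "five times too large" claim and the remark about noise follow from simple arithmetic on the numbers quoted above, treating "almost 10 μK" and "more than 4σ" as round values:

```python
observed_uK = 10.0    # "almost 10 microkelvin" (rounded)
n_sigma = 4.0         # "more than 4 standard deviations" (rounded)
predicted_uK = 2.0    # theoretical expectation, "at most around 2 microkelvin"

noise_uK = observed_uK / n_sigma               # implied 1-sigma noise: about 2.5 uK
predicted_n_sigma = predicted_uK / noise_uK    # expected signal in units of the noise: about 0.8
excess_factor = observed_uK / predicted_uK     # observation / theory: about 5

print(noise_uK, predicted_n_sigma, excess_factor)
```

Since the measured value was slightly below 10 μK and its significance slightly above 4σ, the implied noise is a little under 2.5 μK, so the predicted signal would be roughly a one-sigma effect, indistinguishable from noise.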

But you used a simple theoretical model for your calculations. I don't like your model. I think it is too simple. That's the answer to your problem – your calculation is wrong.

That could be true – though I don't think so. Why don't we go over your objections one by one?

First of all, your calculations assume some values for the main cosmological parameters: the matter density, the Hubble parameter, $\sigma_8$ and so on. But these parameters can take on different values, so essentially you've only ruled out one point in the parameter space of the standard model.

One of the referees of our paper raised the same point, and we did test it. It's true that for our main results we reported only calculations based on a single point in parameter space. But actually the dependence of the theoretical result on the parameter values is very small, and the freedom to vary them is quite constrained by lots of other experimental results in any case. If you want to keep the $\Lambda$CDM model consistent with all other observations, there just isn't enough room for manœuvre. The discrepancy remains.

OK. But your calculation is entirely linear. You even explicitly make an assumption about the linear growth of structures under gravity. We know that non-linear gravitational effects can be important; by ignoring them you're simplifying too much.

Obviously non-linear effects become important at some length scales. At very large scales, the universe is close to homogeneous; fluctuations away from the mean density are small and can be described by linear perturbation theory. At scales of tens of Mpc or smaller, fluctuations are no longer so small. If we were dealing with objects that small, our linear approximation would be questionable, I agree.

But the structures the Hawai'i group were using for their study were hundreds of Mpc across. If density fluctuations were large enough at those scales that linear theory didn't work, our understanding of cosmology would be rather dramatically wrong anyway! By the way, that's why I keep calling these structures "superclusters" and "supervoids": ordinary "clusters" and "voids" are very different things, much smaller in size, but with large (and non-linear) density fluctuations.

Yes, well, I still don't trust any theoretical calculation that isn't confirmed by explicit numerical N-body simulations. Also, your model assumed all structures were spherical. That's got to be unrealistic.

You've got a good point about sphericity, though it's a subtle one. Even if each individual structure is some non-spherical – probably elongated and filamentary – shape, so long as they are as a whole oriented randomly with respect to the axis along which the photon travels to us, the average effect of many structures should still be the same as the effect of one spherical structure.

But uncertainties in the measured redshifts of the galaxies could lead to a selection effect, meaning that the structures actually chosen for use in the observation weren't randomly oriented; rather, the sample was naturally biased towards those structures that contribute higher-than-average temperature shifts (Shaun explains how this works here). So our assumption could have caused a problem.
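To make the selection effect concrete, here is a toy Monte Carlo. It is not the model from our paper or from Shaun's explanation. It assumes, purely for illustration, that each structure is a prolate spheroid, that its ISW contribution is proportional to its depth along the line of sight, and that redshift smearing makes deep structures look more prominent in projection. All parameter values (`a`, `b`, `noise`, sample sizes) are hypothetical.

```python
import numpy as np

def los_path_length(a, b, cos_theta):
    """Length of the line of sight through the centre of a prolate spheroid with
    semi-axes (a, b, b), a > b, viewed at angle theta to its long axis."""
    sin2 = 1.0 - cos_theta**2
    return 2.0 * a * b / np.sqrt(b**2 * cos_theta**2 + a**2 * sin2)

def selection_bias(n=20000, k=50, a=3.0, b=1.0, noise=0.5, seed=1):
    rng = np.random.default_rng(seed)
    cos_theta = rng.uniform(0.0, 1.0, n)        # random (isotropic) orientations
    depth = los_path_length(a, b, cos_theta)    # toy stand-in for each structure's ISW signal
    # Toy detection statistic: with smeared redshifts, structures that are deep along
    # the line of sight look more prominent in projection, plus some scatter.
    prominence = depth + noise * rng.standard_normal(n)
    chosen = np.argsort(prominence)[-k:]        # keep the k most prominent, like a top-50 cut
    return depth.mean(), depth[chosen].mean()

overall_mean, chosen_mean = selection_bias()
print(overall_mean, chosen_mean)  # the selected sample is biased towards line-of-sight alignment
```

Averaged over all orientations the structures behave as expected; only the selected subsample is biased, which is exactly why a spherical model could in principle underestimate the stacked signal.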

As it turns out, it didn't. We know this because we decided we should listen to people like you and compare our model predictions with N-body simulations, which addresses both lingering doubts at once. Actually simulating the gravitational interactions of a large chunk of the universe on a computer is obviously an enormously difficult task, and we couldn't do it, at least not on our own. But Yan-Chuan Cai and others working at Durham University had already done so, and had put their data online for others to use (thanks!).

This is what Cai and others had produced:

Simulations of the ISW contributions to the temperature anisotropies of the CMB from matter fluctuations in two chunks of the universe lying at different distances from us. The patterns are different from the primordial anisotropies, which would normally drown the ISW contributions out. Image credit: Cai et al., arXiv:1003.0974.  
That is, a map of the ISW temperature shifts produced by the matter fluctuations obtained in a very large simulation of part of the universe.

What my collaborators in Helsinki and I did in a paper published this February was to take these maps and apply to them the same observational procedure that the Hawai'i group had used on the real CMB maps to measure the ISW signal, with one key difference. The original procedure had been to first identify structures in a galaxy survey, and then look at the real map in the directions of these structures. What we did was to effectively reverse the order of things, by first looking at the simulated map to see which (simulated) structures gave us the maximum temperature shift, and then seeing what that temperature was. This is always going to overestimate what the real observation procedure could see, because in real life you can't work backwards from the answer!
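Why the reversed procedure is guaranteed to overestimate can be seen with a toy example. The arrays below are hypothetical stand-ins, not the Cai et al. maps: `isw` plays the role of the filtered ISW temperature at candidate sites, and `tracer` a noisy galaxy-survey estimate of the same structures. Picking the k largest values directly maximises the average over all possible choices of k sites, so any survey-based selection can only do as well or worse.

```python
import numpy as np

def forward_vs_reverse(n_sites=10000, k=50, tracer_noise=2.0, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical filtered ISW temperature at each candidate site (arbitrary units).
    isw = rng.standard_normal(n_sites)
    # Hypothetical survey estimate of the same structures: correlated with the
    # ISW value but noisy, as a real galaxy survey would be.
    tracer = isw + tracer_noise * rng.standard_normal(n_sites)

    forward = np.argsort(tracer)[-k:]   # real procedure: pick by the survey, then read the map
    reverse = np.argsort(isw)[-k:]      # reversed procedure: pick the k hottest ISW spots
    return isw[forward].mean(), isw[reverse].mean()

forward_mean, reverse_mean = forward_vs_reverse()
print(forward_mean, reverse_mean)   # the reverse selection is always at least as large
```

Only hot spots (superclusters) are shown; for supervoids the same argument applies to the coldest spots.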

But what we showed was that even an overestimate like this didn't give a value anywhere near as big as was seen. In fact it was almost exactly what our theoretical model had said it would be:

The prediction from the theoretical model in dashed lines (red for superclusters, blue for supervoids) together with the values obtained from simulations in solid lines. The quantity on the $x$-axis is the filtering angle used; in the original observation this was $4^\circ$. Figure from arXiv:1212.0776

Has anyone else independently confirmed your results?

Yes – in fact a paper by Hernandez-Monteagudo and Smith, which came out just a day after our second one and used a different large simulation and different reasoning, came to the same conclusions: in a $\Lambda$CDM cosmology, the expected effect cannot be greater than about $2\;\mu{\rm K}$, i.e. $\langle\Delta T_{\rm avg}\rangle \lesssim 2\;\mu{\rm K}$.

OK, I suppose I can grant you that the theory calculation is correct then. But what about the observation itself? Has anyone checked that it wasn't wrong? Could there be some unknown systematic error?

Well, yes, that is clearly the most obvious thing to be concerned about.

One possible source of error is the galactic foreground. The Milky Way obscures our view of much of the sky, which complicates cosmology. Suppose some fraction of the things the Hawai'ian group thought were galaxies were actually foreground stars from our own galaxy contaminating the observed sample. Then some of the "superstructures" they thought they had seen might not really have been cosmological structures, but things nearby. Our galaxy also emits lots of radiation at CMB wavelengths, so there could easily be some correlation between these stars mimicking galaxies and the measured temperatures. If those foregrounds had affected the observation, all bets would be off.

This is what the CMB sky really looks like before foreground contamination from the Milky Way is removed! 
Of course, you can take precautions against such contamination, by masking out the parts of the sky where part or most of the microwave radiation might be coming from our galaxy or from other known foreground sources. Naturally, this precaution had been taken. Nevertheless, some foreground radiation might have slipped through unnoticed. But such foreground noise would not have a black-body spectrum as perfect as that of the real CMB, and so would contribute very differently to measurements at different frequencies (WMAP measures the temperature of the microwave sky in five different frequency bands; the ones of cosmological interest are called Q, V and W). The thing to check is then whether this apparent measurement of the ISW effect depends on the frequency of the underlying measurement.
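The logic of the frequency test can be sketched numerically. A signal with a black-body spectrum (primordial CMB or ISW) has the same thermodynamic temperature in every band, while a foreground with a power-law antenna temperature does not. The sketch assumes approximate WMAP band centres and a synchrotron-like spectral index of −3; both are illustrative values, and this is only the principle, not how WMAP or the papers above performed the check.

```python
import math

H = 6.62607015e-34      # Planck constant [J s]
K_B = 1.380649e-23      # Boltzmann constant [J/K]
T_CMB = 2.7255          # CMB monopole temperature [K]

# Approximate WMAP band centres [GHz] (assumed values for illustration)
BANDS = {"Q": 41.0, "V": 61.0, "W": 94.0}

def rj_to_thermo(nu_ghz):
    """Factor converting a Rayleigh-Jeans (antenna) temperature fluctuation
    into a CMB thermodynamic temperature fluctuation at frequency nu."""
    x = H * nu_ghz * 1e9 / (K_B * T_CMB)
    return math.expm1(x) ** 2 / (x ** 2 * math.exp(x))

def foreground_thermo(nu_ghz, amp_rj_at_v=1.0, beta=-3.0):
    """Thermodynamic-temperature contribution of a hypothetical power-law
    foreground (antenna temperature ~ nu**beta), normalised at V band."""
    rj = amp_rj_at_v * (nu_ghz / BANDS["V"]) ** beta
    return rj * rj_to_thermo(nu_ghz)

# A black-body (CMB or ISW) signal is identical in every band in thermodynamic units.
cmb_signal = {band: 10.0 for band in BANDS}
fg_signal = {band: foreground_thermo(nu) for band, nu in BANDS.items()}

for band in BANDS:
    print(band, cmb_signal[band], round(fg_signal[band], 3))
```

With these assumptions the foreground term changes by roughly a factor of ten between Q and W, so a stacked signal that is the same in all three bands is hard to attribute to this kind of contamination.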

It doesn't. This was already noted back in the original paper, was rechecked subsequently by the same authors, and has since been independently checked by Hernandez-Monteagudo and Smith and also by Ilic, Langer and Douspis. Hernandez-Monteagudo and Smith also showed that even if there were some foreground contamination that managed to sneak past this test of frequency dependence, it would behave very differently from what was actually observed after filtering.

So we can be pretty confident the discrepancy isn't because of foregrounds then?

Yes, if we understand how foregrounds work (and thanks to extensive work by WMAP, we think we do), then they should not be the answer. Of course, the Planck team will announce their results very soon, and they might have discovered something revolutionary about foregrounds that has escaped everyone until now, but I doubt it.

But even if it isn't foregrounds, couldn't it be just a statistical fluke? Something like: you make lots of measurements, most don't show anything, just by chance one of them does, you end up only publishing the interesting result? That's related to the "look-elsewhere effect" isn't it?

That certainly could be a problem, I agree. In particular, two rather arbitrary aspects of the original measurement were the number of structures included (50 of each), and the size of the photometry filter used (a $4^\circ$ radius). Why those particular numbers? In fact if you changed those numbers, the observed size of the signal decreased – and therefore became more compatible with the theory expectation.
The variation in the size of the measured signal for superclusters (top) and supervoids (bottom) with the radius of the photometry filter. The three different measurements at each point are for the three frequency bands (Q,V,W). The signal is only measurably different from zero for certain angles. Figure from arXiv:1212.1174.
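A back-of-the-envelope look-elsewhere correction shows why these arbitrary choices matter. The sketch assumes, hypothetically, that 20 effectively independent combinations of sample size and filter radius were tried. In reality the trials reuse the same sky and the same structures, so they are strongly correlated and the true penalty would be smaller; quantifying it properly needs simulations.

```python
import math

def one_sided_p(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def global_p(z_local, n_trials):
    """Chance that at least one of n independent trials reaches z_local by luck."""
    p = one_sided_p(z_local)
    return 1.0 - (1.0 - p) ** n_trials

def p_to_z(p, lo=0.0, hi=40.0):
    """Invert one_sided_p by bisection (one_sided_p decreases with z)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if one_sided_p(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z_local = 4.0
n_trials = 20   # hypothetical number of effectively independent choices tried
p_global = global_p(z_local, n_trials)
print(p_global, p_to_z(p_global))   # a 4-sigma local result drops to roughly 3.2 sigma
```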

Well that's it then! If you'd properly accounted for the look-elsewhere effect, the observation wouldn't look so discrepant. It's just a statistical fluke after all! There's nothing wrong with our understanding of the universe!

Maybe … unless of course the same discrepancy shows up in a different measurement using independent data! Like how the two LHC experiments ATLAS and CMS combined gave more convincing evidence of the existence of the Higgs boson than either one alone.
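One simple rule of thumb for why independent data help is Stouffer's method, which combines independent Gaussian significances as $Z = \sum_i w_i Z_i / \sqrt{\sum_i w_i^2}$. This is an assumed, simplified stand-in: as far as I know, real combinations such as the LHC ones use a joint likelihood fit instead.

```python
import math

def stouffer(z_scores, weights=None):
    """Combine independent significances: Z = sum(w_i * Z_i) / sqrt(sum(w_i**2))."""
    if weights is None:
        weights = [1.0] * len(z_scores)
    num = sum(w * z for w, z in zip(weights, z_scores))
    den = math.sqrt(sum(w * w for w in weights))
    return num / den

print(stouffer([3.0, 3.0]))  # about 4.24: two independent 3-sigma results
```

The catch is the word "independent": a second look at the same data with different choices adds little, whereas a genuinely independent data set showing the same discrepancy adds a lot.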

You're going to tell me someone has seen the same discrepancy using a different data set.

I am. (Edit: I was – but this result was in the paper that has just been withdrawn.) In the last year or so, new catalogues of thousands of voids found in galaxy surveys have been published, which could be used to essentially repeat the original observation using independent data. These voids – and in this case most are actually "voids" and not "supervoids" as in the original data – have been identified from a different survey of galaxies, and are much closer to us (though covering a similar region on the sky). The redshifts of galaxies in this survey are much more precisely determined too, so the redshift smearing effect isn't a problem even in principle. Nor is there any chance some of them were actually interloping nearby stars.

The blue, cyan, green, yellow, orange and red voids are part of the new Sutter catalogue of 1146 voids. The purple ones are the 50 used by Granett et al. in the original measurement in 2008. Figure taken from arXiv:1301.5849.

Using these 1146 voids and the same CMB stacking procedure can provide a test of the original measurement that used only 50 voids. That is what is done in a recent very interesting paper (Cai et al.) which is the product of a collaboration between two of the authors of the original paper and the Durham cosmologists who produced the N-body simulations.

(Note: Cai et al. have just withdrawn this paper because of some labelling errors in the void catalogue, which may have affected their results. Pending further investigation on their part everything in the next few paragraphs could be subject to change.)

There were a couple of problems they had to overcome. Firstly, there are actually too many voids! The Sutter catalogue includes some voids that are not quite as empty as the rest – their categorization as voids is a little dubious in the first place, and they would contribute more noise than signal. So the sample needs to be thinned a bit, so that it includes only voids that definitely deserve the name, of which there are 776.

Secondly, these remaining 776 voids vary much more dramatically in size on the sky than the original 50, as you can see from the figure. So using a one-size-fits-all filter for the photometry would wash out much of the signal: a filter much larger than a given void dilutes its signal with unrelated fluctuations, while one much smaller misses part of it. Instead Cai et al. rescaled the filter radius appropriately for each of the 776 voids. This allows the full statistical power of the larger number of voids to come into play.
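Rescaling the filter starts from each void's angular size, which depends on its physical size and its distance. A minimal sketch, assuming a flat ΛCDM model with hypothetical parameters ($H_0 = 70$ km/s/Mpc, $\Omega_m = 0.3$), comoving void radii in Mpc, the small-angle approximation, and a hypothetical `scale` factor relating filter radius to void radius (the factor Cai et al. actually used is not given here):

```python
import numpy as np

C_KM_S = 299792.458  # speed of light [km/s]

def comoving_distance_mpc(z, h0=70.0, omega_m=0.3, n_steps=2000):
    """Line-of-sight comoving distance in a flat LCDM model (trapezoid rule)."""
    zs = np.linspace(0.0, z, n_steps + 1)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + (1.0 - omega_m))
    dz = z / n_steps
    integral = dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))
    return C_KM_S / h0 * integral

def filter_radius_deg(void_radius_mpc, z, scale=1.0, **cosmo):
    """Angular radius of a void of given comoving radius, times a hypothetical
    rescaling factor, in degrees (small-angle approximation)."""
    theta_rad = void_radius_mpc / comoving_distance_mpc(z, **cosmo)
    return np.degrees(scale * theta_rad)

# The same 30 Mpc void subtends roughly 2.1 degrees at z = 0.2 but only about 0.9 at z = 0.5.
print(filter_radius_deg(30.0, 0.2), filter_radius_deg(30.0, 0.5))
```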

The results are quite dramatic:

The average ISW temperature signal due to voids. The paper containing this figure has been withdrawn by its authors. 

Notice, first of all, that the size of the signal is very similar in each frequency band (the red, yellow and green curves in the plot). Then notice that it is much larger than the level of background noise (the black dotted curve): when including all 776 voids the S/N ratio is slightly higher than 3. If only the largest voids are included, the signal is larger, but so (obviously) is the noise.
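The trade-off in the last sentence is the usual one for stacking. Assuming, as a simplification, that every aperture carries independent noise of the same size, the noise on the average of $N$ apertures falls like $1/\sqrt{N}$, so keeping only the largest voids raises the mean signal but also raises the noise. The numbers below are hypothetical, chosen only to illustrate this, and are not taken from Cai et al.

```python
import numpy as np

def stacked_snr(signal_amplitudes, noise_per_aperture):
    """S/N of an equal-weight stack: mean signal divided by the standard error of
    the mean, assuming independent noise of the same size in every aperture."""
    s = np.asarray(signal_amplitudes, dtype=float)
    return s.mean() / (noise_per_aperture / np.sqrt(s.size))

# Hypothetical numbers, for illustration only.
all_voids = stacked_snr([1.0] * 776, noise_per_aperture=10.0)     # weak signal, many voids
largest_only = stacked_snr([2.0] * 100, noise_per_aperture=10.0)  # stronger signal, fewer voids
print(all_voids, largest_only)

# Monte Carlo check that the noise on a stack of 776 apertures shrinks like 1/sqrt(N).
rng = np.random.default_rng(0)
noise_only_stacks = rng.normal(0.0, 10.0, size=(2000, 776)).mean(axis=1)
print(noise_only_stacks.std(), 10.0 / np.sqrt(776))
```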

And then notice the difference between the coloured curves showing the observed temperatures, and the black dashed theoretical prediction, obtained from a simulation of the universe. In this case, our theory says the signal should be essentially zero. Instead it is much larger!

OK, I'm finally convinced. Theory disagrees with experiment. But what does it mean? How could we explain this?

Since the paper you just told me about has been withdrawn, I've decided I'm not convinced after all. I still think the result could just be a statistical fluke of no greater importance.

Fair enough. I'm a bit disappointed too, but I can't argue with that at the moment.

Still, I see you took a lot of time writing this blog post. So why not tell me anyway: suppose I were to pretend to be convinced that experiment and theory disagreed, what would this mean and how could we explain it?

This is the difficult – but also potentially exciting! – part. We don't have any answers yet.

The most mundane possible explanation would be that despite everything we thought, we didn't really understand foregrounds well enough, and ultimately they were contaminating the signal in some subtle way. Although this explanation doesn't involve any "new physics" of the theoretical kind, it would still be a pretty radical discovery, and would probably have rather important consequences for a lot of other stuff we thought we had understood about the CMB. I guess the much-awaited release of Planck data on Thursday will go some way to answering this.

(Edit: of course given the sudden withdrawal of further experimental confirmation, an even more mundane possible explanation is that it is just a somewhat unlikely – but not completely impossible – statistical fluke.)

If it's not an experimental problem, it must be a theoretical one. Somehow, our theoretical model of the universe either predicts too few of the very large density fluctuations that produce large ISW signals, or underestimates the size of the signal produced by the smaller fluctuations that do exist. On the face of it, the second of these two options is unlikely. I don't know of any way to construct a theory in which gravitational potentials cause larger changes to the energies of photons traversing them than in General Relativity, without violating all sorts of other experimental constraints. If you do, please tell me!

The first option – finding "new physics" which increases the number of the largest density fluctuations – might be more viable. This could happen if the primordial distribution of density fluctuations created by inflation were non-Gaussian; a non-Gaussian distribution could differ from a Gaussian one more in the extremes than in the mean. Or it could be that the primordial distribution was as expected, but over the intervening billions of years matter clumped together at a rate different from the one our models predict. Perhaps this is because on very large scales, GR doesn't correctly describe gravity? Or maybe the behaviour of dark energy at late times is the source of the problem, because it isn't the cosmological constant we believe it to be?
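A toy illustration of how a distribution can differ "more in the extremes than in the mean": take Gaussian numbers $g$, add a small quadratic term, $x = g + \alpha\,(g^2 - 1)$, and rescale $x$ to unit variance. The result has the same mean and variance as $g$ but very different tails. The parameter `alpha` is hypothetical and deliberately exaggerated; it is not the cosmological $f_{\rm NL}$ and nothing here is fitted to data. Flipping the sign of `alpha` moves the excess to the other tail.

```python
import numpy as np

def tail_fractions(alpha=0.3, threshold=3.0, n=1_000_000, seed=0):
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n)
    x = g + alpha * (g**2 - 1.0)   # toy quadratic (local-type) non-Gaussianity, zero mean
    x /= x.std()                    # rescale to unit variance, like g
    return {
        "mean": x.mean(),
        "gauss_hi": np.mean(g > threshold), "ng_hi": np.mean(x > threshold),
        "gauss_lo": np.mean(g < -threshold), "ng_lo": np.mean(x < -threshold),
    }

print(tail_fractions())  # similar mean and variance, very different 3-sigma tails
```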

All of these speculations have problems. The magnitude of the ISW discrepancy is too large, and the agreement of the standard theory with all other data is too good: it's hard to get a consistent model that affects only this experimental prediction and not any others. But these problems are also opportunities. It could be that this is the key clue that leads us to completely new cosmological theories, or it could be that it turns out to be a damp squib.

Either way, the journey towards understanding should be fun. (Edit: or disappointing.)


  1. I like this style.

    However that's not why I'm commenting.

    Yesterday Cai et al. withdrew their paper from the arXiv. Or, more specifically, uploaded a new abstract saying that they didn't trust their results any more. http://arxiv.org/abs/1301.6136

    I don't know what to make of the situation.

    I'm going to add an addendum to my post describing their paper. And possibly a new, short, post pointing out this development.

  2. That's interesting. Also annoying, because I spent a lot of time writing this blog post, and now it is out of date immediately! Still, better to learn the truth I suppose. I look forward to seeing their updated results.

    I'll adjust this post accordingly.

    1. Yeah, I was going to make a comment myself about your bad timing.

      I have to admit that I'm somewhat confused by their new abstract. The anomalous nature of their result comes from the fact that the CMB along the chosen lines of sight is colder than along randomly selected lines of sight. Their abstract indicates that their problems were in how they chose the lines of sight. This wouldn't change the fact that the previous method they used to choose the lines of sight gave an anomalous result.

      Still, it's all guesswork until they upload a new version...

    2. Yes, if that were the only change, I agree it would be hard to understand how it could enhance the signal seen.

      But this reminded me that Ilic et al. had also noted that the Sutter void catalogue had changed between August and November 2012. I went back and checked their paper; they say that although Sutter et al described the changes as "minor", with "no impact" on their conclusions, in fact it contained 50% more voids, mostly small. Voids common to both August and November versions "have seen modifications in their redshifts, sizes and positions on the sky" and "the entire lrgbright subsample has changed almost completely with a much larger scatter in sizes and redshifts."

      So this would affect the rescaling of the filters, in which case all bets are off.

    3. No, I'm still confused.

      When they compare to the randomly chosen lines of sight they also rescale them. Therefore, still, even if they somehow chose the "wrong" lines of sight and "wrong" rescaling weights, they got a result that was much bigger than could have happened randomly.

      The only thing that I would have expected could have removed the statistical significance would have been that they'd done something wrong to the CMB, because that's where the signal is. But that doesn't seem to be the case at all.

      It's interesting to know the method that chose the lines of sight to work out what might be causing this, but to measure the statistical significance of the result it doesn't matter (so long as only one method was tried).

      At least that's my understanding of the matter.

    4. Ilic et al. found that using the newer catalogue completely wiped out the (already small) signal they were seeing with the older version. I understand your reasoning, but if Cai et al. find the same thing, it'll be hard to argue with the facts on the ground. (We will then have two puzzles to solve rather than just one.)