When using a detector to observe a process in an experiment it is important to understand what effects the detector has on the measurements one makes. While this understanding will always get better there are effects like the energy resolution about which you can not really do much. It would be desirable to have a way to "undo" all the effects a detector has on the measurements and get results which are directly comparable to a theoretical prediction.
Unfolding is attractive as it allows just this!
Contents
Unfolding 101
Unfolding then is the mathematical machinery that allows one to reverse the detector effects. Assume you are trying to measure the distribution of a quantity x, f(x), however your detector smears out (say you mis-measure energies) your measurements so that you are really measuring the distribution of a quantity y, g(y), which is x but "smeared out" by your detector. You can write this in matrix form as . Here the matrix
describes the effect that the detector has on the distribution f(x). If we knew this matrix A, then a first attempt to "undo" the effect of the detector would be to inverse it:
,
and apply it to the measured distribution and recover the true distribution. This turns out to be much trickier than appears at first. If we assume we fully understand the detector, which is usually not true, we can simulate our physics processes with a Monte-Carlo and then apply all the detector effects. Now you have two distributions the "truth", f(x) and your simulated "measurement", g(y). By comparing these you can get A, simple right? It turns out that in practice this is a rather tricky business, even if you assume you fully understand your detector. For a longer and more detailed discussion have a look at Roger Barlow's lecture on unfolding.
In particle physics the distributions f and g are usually thought of as a histograms.
Bayesian Unfolding
Unfolding based on Bayes' theorem uses the two distributions generated by a Monte-Carlo to obtain a "smearing matrix", given a certain physics process occurred what are the probabilities for each measured outcome. This is then used to calculate P(physics|measurement), the probability that given a measurement a certain physics process caused it. The method is described in more detail in A multidimensional unfolding method based on Bayes' theorem by G. D'Agostini.
The pyunfold program is based on this paper. It is meant to be a starting point for people trying to get their head round unfolding. In the following I will briefly discuss the various steps of the algorithm using normally distributed random numbers as my "data".
This code is an adventure in the dangerous jungle of unfolding, do not believe everything you see.
Training
For the training step I used a Gaussian as the "true" distribution (blue in the next figure) and then added a second Gaussian to this distribution to obtain a "measured" distribution (green pluses). This looks a bit like what one might get from a detector that smears out a "sharp" peak because it has a low resolution. Note that both of them are centred on zero, this will be important later on when unfolding data.
This probably correspond to the situation when we think that the only effect our detector will have is to "smear out" our measurements but nothing else. Not consistently over-measuring the energy or anything like that.
Undoing detector effects
In the real world there will usually be some effects which we did not considered which means that what we measure in our detector (green pluses in the next figure) might not look anything like the one we expected from Monte Carlo (green pluses in the previous figure). The question is now, is this because there is some physics or because we do not understand our detector? A way to recover the "true" distribution from the measured one is Bayesian unfolding. In the following image the best estimate of the Bayesian unfolding method are the yellow triangles. Compare this to the true distribution (blue) that underlies the observed one. Note that this true distribution is only available to us because we generated this data set ourselves, in the real world this is not available to us. Comparing the two by eye suggests that they are compatible with each other, our unfolding process successfully undid the effects our detector had and the data tells us that our theory expectation (mean zero, variance one) is in disagreement with the data (mean one, variance 1.3).
Observe that our "training data" was centred on zero and has a standard deviation of one. Our Monte Carlo measured distribution is also centred on zero and has a standard deviation of 1.411. We assumed that the only thing our detector would do is "smear out" the measurements. Compare this to our measured distribution which is centred on three! The question is now, will our unfolding process move everything around so that it agrees with what was used to train it (this would be a disaster) or will it tend towards the distribution used to generate these measurements?
The true data distribution used to generate this is centred on one and has a standard deviation of 1.3. This means that our understanding of the detector is not very good at all but also that the physics process which gave rise to our measurements is different to the one we assumed when training our unfolding process. It is clear from the figure that our unfolding process iterates towards the distribution used to generate this data set.
Software implementations or The magic revealed
If you want to learn how such impressive feats can be accomplished have a read of A multidimensional unfolding method based on Bayes' theorem by G. D'Agostini. You can read a scanned in version of A multidimensional unfolding method based on Bayes' theorem. While the method described in this paper has been implemented by the author and also by ROOunfold I decided to implement this method again in Python as an exercise and because it intrigued me to see how it worked. My code is available from a Mercurial repository called: Bayesian unfolding in Python. It is quite short but requires SciPy and NumPy to be installed. It generated the graphs you looked at above.
This code is an adventure in the dangerous jungle of unfolding, do not believe everything you see. Make sure you have your own snake repellent and look before you leap, it might be a trap
You can download an archive of the code: pyunfold-0.0.1.tar.bz2.
D'Agostini has implemented his algorithm in Fortran and it is available from http://www.roma1.infn.it/~dagos/bayes_distr.txt. It would be a good idea to pit this against pyunfold (using f2py) in order to convince ourselves that they come to the same answer. More about the Bayes' theorem you can read in the custom writing.


