SFB-1491 Graduate School: Astrostatistics

Section 3: Signifcance & Modelling

Dr Angus H Wright

2023-02-02

Section 4: Introduction

Parameter Simulation, Optimisation, & Inference

(or “Applying statistics in modern scientific analyses”)

We apply our understanding of Bayesian statistics to the common problems of parameter simulation, optimisation, and inference. Students will learn the fundamentals of hypothesis testing, quantifying goodness-of-fit, and parameter inference. We discuss common errors in parameter inference, including standard physical and astrophysical biases that corrupt statistical analyses.

The reason we’re here

Our goal in this course is to formulate a basis for performing statistical analyses, in the natural sciences, that you can use for the rest of your academic careers.

To do that, you need to be able to do the following:

Be able to explore and understand complex datasets (Section 1)

Understand the probabilistic nature of experiments and have access to tools that allow you to estimate models from data (Section 2)

Understand how to interpret models/results to perform accurate statistical inference (Section 3).

A Significant Conundrum

Modern and future experiments will never produce data that covers the entire population \(\Omega\) of possible observations.

We will always be attempting to analyses models of variables \(\theta\) using samples of data, and attempting inference using estimates of \(\theta\) that are random variables.

As a result, regardless of the experiment being undertaken, it is generally relevant to ask whether or not an observed relationship, parameter estimate, and/or measurement is “significantly” different from previous work and/or expectations from (e.g.) theory.

Said differently: whenever we measure a variable, it is sensible for us to ask whether or not the estimated value is consistent with our model and/or previous estimates, given the expected random fluctuations of a random variable.

A simple demonstration:

Suppose we have a theory that the true average height of all human beings is \(184\)cm.

Measuring the height of every human being is naturally unfeasible, so we are forced to take a sample of \(n\) humans and just measure their average height.

This estimate of the average height is a random variable, as it will vary from sample-to-sample.

We require a method for determining whether or not any difference between our estimate of the average height \(\theta\) and \(184\)cm is caused by random variation due to our sampling, or whether it demonstrates that the true average height is unlikely to be \(184\)cm.

A simple demonstration:

One method would be to construct some interval (given the data) within which you expect the true value of \(\theta\) to reside with some (quite high) probability: say \(95\%\).

If you construct this interval and find that our hypothesised value of \(184\)cm resides outside it, then we can draw one of two conclusions:

the value of \(\theta=184\)cm is unlikely to be correct; or
we just got very unlucky with our chosen sample.

This procedure provides us with a mechanism for determining whether the data that we have provides evidence to contradict a particular hypothesis.

Aside: the merits of contradiction

Why not come up with a measure of whether or not the data agrees with some hypothesis?

How much evidence does it take to prove something is true?
How much evidence does it take to prove something is false?

This is somewhat the nature of scientific inquiry:

No amount of evidence can give absolute certainty that a hypothesis is true, it can only fail to show that it is false.
However you only need one piece of evidence to disprove a hypothesis.

Significance

Given our observed sample of human heights, we want to assess the significance of the evidence against our particular hypothesis.

We can do this by calculating the fraction of samples of \(n\) humans that would produce a sample mean that is as extreme as the one we observe if the hypothesis is true.

Our hypothesis is that the population mean is \(\hat{\mu}\), and we observe some mean \(\bar{x}\) from our sample of \(n\) observations.

We approximate the variance of \(\mu\) using the variance of our sample \(s^2\), which gives us an estimate of the standard error on \(\hat\mu\): \(s/\sqrt{n}\).

We can then define a new random variable \(t\) which we call our test statistic: \[ t = \frac{\bar{x}-\mu}{s/\sqrt{n}}, \]

Significance

\[ t = \frac{\bar{x}-\mu}{s/\sqrt{n}}, \]

This is the student t-statistic. The distribution of this variable follows the student t-distribution, which is nicely analytic.

This distribution has an analytic PDF, and means that we can trivially calculate the probability of observing a sample of data that have mean \(\bar{x}\) given \(\mu\), \(n\), and \(s\).

Recall that the PDF of the student t-distribution is: \[ p_t(t; \nu) = \frac{\Gamma(\frac{\nu+1}{2})}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi \nu}}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2} \] where \(\nu=N-1\) is the degrees of freedom.

Significance

The fraction of samples that have less extreme values of \(\bar{x}\) if the true population mean is \(\mu=\hat\mu\) is: \[ f=\int_{-|t|}^{|t|} p_t(x;\nu)\textrm{d}x \]

Consider a very positive (or very negative) value of \(t\); in these circumstances, \(f\) will be close to \(1\) (as \(-|t|\leq x \leq |t|\) contains all the probability mass of the distribution).

This indicates that essentially all random samples with \(\nu\) degrees of freedom would have less extreme values of \(t\) given the hypothesis that \(\mu=\hat\mu\).

A more convenient method of formulating this number is to look at its complement: \[ p=1-f \] which is the fraction of samples that have as extreme a value of \(t\) given \(\mu=\hat\mu\). This is known as the p-value.

The p-value

In this lecturers opinion, the p-value is easily the most misunderstood, misused, and/or misrepresented concept in statistics. So what is the p-value, and what is it not.

What does the p-value tell you: The p-value represents the fraction of samples that would produce a test statistic that is as extreme as the one observed, given that the proposed (generally null) hypothesis is true.
What is the p-value not tell you: The p-value does not tell you anything about the probability that the proposed hypothesis is correct, nor about whether or not the data can be explained as being produced by random chance.

The p-value

Nonetheless, the p-value is widely used in the academic literature as a tool for (classical) hypothesis testing, and/or for justification that experimental evidence is incompatible with the null hypothesis (i.e. that there is no underlying effect/difference).

Let’s assume that we are willing to believe an effect if it has a p-value of \(\alpha\) or less (otherwise you reject the effect in favour of the null hypothesis). The probability that you accept a hypothesis that is actually false is the fraction of samples that would give you a `satisfactory’ p-value even though the null hypothesis was true. But this value is just \(\alpha\). So you can consider the p-value as being the probability that you have accepted a hypothesis that is false.

Using the p-value

In his original \(1920\) publication, Fisher used \(p<0.05\) as an example of a value that might be used to justify rejection of the null hypothesis, when taken in the context of the entire experimental landscape.

We’ve said already what the p-value describes. So now a question:

Given purely random measurement bias, what distribution does the p-value take under many realisations of an experiment?

Using the p-value

\[\begin{aligned} p&=1-\int_{-|t|}^{|t|} p_t(x;\nu)\textrm{d}x\\ &=1-\left[\int_{-\infty}^{|t|} p_t(x;\nu)\textrm{d}x - \int_{-\infty}^{-|t|} p_t(x;\nu)\textrm{d}x\right] \end{aligned}\]

Using the p-value

Which makes sense; the p-value describes the fraction of samples that have more extreme values than that which we observed, assuming the null hypothesis. If the null hypothesis is true, then we should see a p-value as extreme as \(\alpha\) occur \(\alpha\%\) of the time.

But this begs an important question: if every scientist were to use \(p<0.05\) as a metric for “a significant result worthy of publication”, what fraction of published results ought to be false positives?

How much published research is wrong?

Let’s assume that we’re looking at a field of research where there are \(N\) ongoing experiments, all exploring different possible physical relationships. Of those \(N\) experiments, \(f_{\rm true}\) of them are real physical relationships. Finally, each experiment has a statistical power of \(s_{\rm p}\).

If all researchers use a metric of \(p<0.05\) as their determination for whether an effect is real or not, and researchers only publish when they find a significant result, what will the fraction of published results that are wrong?

How much published research is wrong?

\[ P({\rm False}|{\rm pub})=\frac{p\times f}{p\times f+s\times (1-f)} \]

So if \(p=0.05\), \(f=0.9\), and \(s=0.8\):

\[\begin{align} P({\rm False}|{\rm pub})&=\frac{0.05\times 0.9}{0.05\times 0.9+0.8\times (1-0.9)} \\ &\approx 0.36 \end{align}\]

Pratical Statistical Inference

So it’s clear that the use of a standard threshold for p-values as a measure of significance can lead to problematic numbers of incorrect results being published in the literature.

However, as we just saw, this effect can be calculated simply. So why is it a problem, provided that we can easily demonstrate the effect, and so account for it?

Why not, for example, use the high-energy physics mantra of \(p<0.001\) and be done with it?

published s0.8

The Problem is Choice

image label

At it’s simplest level, when provided an arbitrary dataset, our statistical analyses will involve two steps:

Data mining: where we explore and summarise the data; and
Data modelling: where we extract model parameters/trends, and test hypotheses.

However in reality each of these steps involves many stages: With experimental data:

Samples must be defined;
Observations must be taken;
Defective data must be identified and removed; and more.

When modelling the data:

formulate our hypotheses;
construct the likelihood;
perform our inference; and more.

In this lecture, we’re going to explore some of the dangers inherent to these processes. We will establish some of the fundamentals of hypothesis testing, specifically with respect to determining the significance of evidence.

However:

While we could simply go through the definitions, standards, and best practices for determining the significance of evidence, I think it is more educational (and shocking, and fun) to go about this from the opposite direction.

As such, now we’re going to discuss…

Bad Statistics:
(Non-exhaustive) Examples of what not to do

We will use simulated data and real experiments to show how poor use of statistics can lead to pathologically incorrect conclusions, in a (hopefully light-hearted!) effort to demonstrate the pitfalls that careless scientists can find themselves falling into.

This discussion of bad statistics will focus on a few main areas:

Variable Selection

Sample Selection

Data Modification

Additional Observation

Confirmation

Importantly: for the sake of this lecture, we are going to completely ignore the concept of confounding variables (which we spoke about at the beginning of the lecture course). This effect, in reality, makes much of what we are about to discuss much worse.

Variable Selection

We are scientists working to determine any interesting relationships present in our data.

Our dataset contains \(n=1e2\) observations (of galaxies, or particle collisions, etc), and we measured \(20\) different variables for each observation.

##              V1            V2          V3            V4           V5           V6           V7
## 1    1.37095845  1.200965e+00 -2.00092924 -0.0046207678  1.334912585  1.029140719 -0.248482933
## 2   -0.56469817  1.044751e+00  0.33377720  0.7602421677 -0.869271764  0.914774868  0.422320386
## 3    0.36312841 -1.003209e+00  1.17132513  0.0389909129  0.055486955 -0.002456267  0.987653294
## 4    0.63286260  1.848482e+00  2.05953924  0.7350721416  0.049066913  0.136009552  0.835568172
## 5    0.40426832 -6.667734e-01 -1.37686160 -0.1464726270 -0.578355728 -0.720153545 -0.660521859
## 6   -0.10612452  1.055138e-01 -1.15085557 -0.0578873354 -0.998738656 -0.198124330  1.564069493
## 7    1.51152200 -4.222559e-01 -0.70582139  0.4823694661 -0.002432780 -1.029208806 -1.622975935
## 8   -0.09465904 -1.223502e-01 -1.05405578  0.9929436368  0.655511883 -0.966955896  0.863896373
## 9    2.01842371  1.881930e-01 -0.64574372 -1.2463954980  1.476842279 -1.220813089 -0.511602773
## 10  -0.06271410  1.191610e-01 -0.18537797 -0.0334875248 -1.909152788  0.836207704 -1.917365025
## 11   1.30486965 -2.509255e-02 -1.20122205 -0.0709621812 -0.702439473  1.114971798 -1.865813849
## 12   2.28664539  1.080727e-01  2.03697217 -0.7589206537 -0.311430218 -0.412944853  0.245179063
## 13  -1.38886070 -4.854352e-01  0.10777474 -1.0343593609 -1.663157031 -1.128977398  2.223534302
## 14  -0.27878877 -5.042171e-01 -0.08410810 -0.6307319540 -0.750533442 -0.087931332  0.273376064
## 15  -0.13332134 -1.661099e+00  0.49561964  0.5868077200 -0.777351759  2.241903693  1.130784795
## 16   0.63595040 -3.823337e-01  0.03741519 -0.4163226562 -0.722569700  2.041313171  0.838669374
## 17  -0.28425292 -5.126503e-01 -0.13208804 -0.7848878095 -2.188834599 -1.719800337 -0.654615278
## 18  -2.65645542  2.701891e+00  1.47678742  0.1634163195  0.213418550 -0.356906656  0.953960978
## 19  -2.44046693 -1.362116e+00 -0.21703021 -1.2367142351 -0.631922936  1.533280492  0.352951465
## 20   1.32011335  1.372562e-01 -1.28360220  1.0458737762  1.520491194 -0.038244097  0.206599293
## 21  -0.30663859 -1.493625e+00  0.38566789 -0.4845954162  0.795955949  1.597413242  1.001113234
## 22  -1.78130843 -1.470436e+00 -0.35151287  0.1891288117 -1.453529565 -0.333585393  0.747451992
## 23  -0.17191736  1.247024e-01 -0.52179609  0.0510063316  0.098395421  0.604985924 -0.626574837
## 24   1.21467470 -9.966391e-01 -1.06813120 -0.0002406689 -0.593770984  0.224241239  0.395223686
## 25   1.89519346 -1.822614e-03  0.42836590  1.8093820424  0.888281169  3.229069495 -0.892167954
## 26  -0.43046913 -4.282589e-01 -0.17401823 -0.8253279571  0.053070415  0.920452567  0.630818410
## 27  -0.25726938 -6.136716e-01  0.51566773  1.1454704522 -0.557023626 -1.206539069 -0.432705183
## 28  -1.76316309 -2.024678e+00 -0.23436528  0.0315731876  0.438397036 -0.603881169  0.452138644
## 29   0.46009735 -1.224748e+00 -0.65850343 -0.8352058053  0.152608159  0.370235285  0.367999045
## 30  -0.63999488  1.795164e-01  1.25023660 -0.0687636490 -0.164617582 -1.901000647 -0.270387490
## 31   0.45545012  5.676206e-01 -0.27176372  0.7467716870  2.019890621 -1.804409945  0.465512618
## 32   0.70483734 -4.928774e-01  0.94795200 -0.4255187344 -0.529385886 -1.121822606  0.574356226
## 33   1.03510352  6.288407e-05 -1.20158243 -0.7720822352 -0.470786973 -0.347929607 -0.230699990
## 34  -0.60892638  1.122890e+00 -0.46611610  0.1527641067 -1.545936924  1.238902149  1.172242508
## 35   0.50495512  1.439856e+00 -0.26935140  0.9885968452 -0.040526723 -0.274197565  1.392702707
## 36  -1.71700868 -1.097114e+00 -0.39096541 -0.0734583347  0.890356305  0.162380716 -0.661991256
## 37  -0.78445901 -1.173196e-01  1.34870701 -1.3870265536 -2.071387851 -0.064606910 -0.777369207
## 38  -0.85090759  1.201498e+00 -0.02276470 -1.3066759044 -0.250065120 -0.705237097  0.513538579
## 39  -2.41420765 -4.697296e-01  0.24422585 -0.7683953251 -1.181650427  1.362197269 -0.913312238
## 40   0.03612261 -5.246948e-02 -0.94237171 -0.5271081254  1.441937265 -1.096513188 -0.449423805
## 41   0.20599860 -8.610730e-02 -0.72921728 -0.0214270650  1.357895539 -0.228433519  0.802932699
## 42  -0.36105730 -8.876790e-01  0.99806891  0.6704980710  0.334502847 -0.347828081 -0.573476851
## 43   0.75816324 -4.446840e-01  1.25848166 -0.4346170386  1.429338080  0.532128620 -1.928125168
## 44  -0.72670483 -2.944488e-02  1.24886369 -1.1138797833 -0.867317851  1.607234562  0.664390834
## 45  -1.36828104 -4.138688e-01 -1.38063705  0.6071059949  0.950651725  0.513814526 -1.602540224
## 46   0.43281803  1.113386e+00  2.04996069  0.2754569687 -0.585011509  1.382373161 -1.354600257
## 47  -0.81139318 -4.809928e-01  1.01687283  1.1573470694  0.320957523  0.763097049 -3.017932679
## 48   1.44410126 -4.331690e-01 -0.02671746 -1.6824808595 -0.299396017 -0.624584545  0.831237822
## 49  -0.43144620  6.968626e-01  0.70360778  0.0873190889 -0.278543083  0.081543800  0.251097089
## 50   0.65564788 -1.056368e+00 -0.97138523  1.3533618939  0.546115158  1.376079111  0.462293466
## 51   0.32192527 -4.069848e-02 -1.09615624  0.7241738007 -1.303821154 -1.561265786  0.844792223
## 52  -0.78383894 -1.551545e+00  0.04905045 -0.8325528258 -0.250914465  0.324770661 -0.041971524
## 53   1.57572752  1.167170e+00 -1.19849586  0.7325284868  0.171007374 -0.156790317 -1.105575906
## 54   0.64289931 -2.736457e-01  0.19001900 -0.8719268700 -0.403467479  0.877783861  0.563775652
## 55   0.08976065 -4.678453e-01  1.29770590 -0.4533975112  0.104659441  0.750166575  1.303364756
## 56   0.27655075 -1.238252e+00 -1.03387372  1.1875342786 -0.318880787  0.301053404 -1.500220938
## 57   0.67928882 -7.762034e-03 -0.73844075 -0.2901453118  1.618343936  1.492811117 -0.606989235
## 58   0.08983289 -8.002822e-01  0.04656394  0.8285461450  0.714188601 -1.525493800 -0.292245066
## 59  -2.99309008 -5.334923e-01 -1.01759612 -0.2912277088  2.965865370  0.910717558 -1.289683343
## 60   0.28488295  1.287675e+00 -0.38328396 -1.5763624047 -0.795077605 -1.579532028  0.694105849
## 61  -0.36723464 -1.755259e-01  0.87275541 -0.8488156969  0.814365915  0.587716257 -0.599181903
## 62   0.18523056 -1.071782e+00  0.96954501 -1.0885198620  2.098030810  0.089642296  1.256906644
## 63   0.58182373  1.632069e-01  0.38384667 -0.4842905711  0.300980055  0.967261657  0.053508014
## 64   1.39973683 -3.627384e-01 -1.85155566 -0.3363112090 -1.083075142  0.078812353  0.728092518
## 65  -0.72729206  5.900135e-01 -0.05399674 -0.1533578907 -1.006322502 -1.568699836  1.561098068
## 66   1.30254263  1.432422e+00  1.06477321 -0.2432472286 -0.035414565 -2.007823184  0.265624754
## 67   0.33584812 -9.926925e-01  0.81319504  1.8922020416  1.309124361  0.540973524  1.076726259
## 68   1.03850610  4.546503e-01 -0.19081647 -1.3859983373  0.750400487 -0.073376598  0.210697932
## 69   0.92072857  8.489806e-02 -2.69992981 -0.4148243007 -2.138368328 -0.571018394 -1.511673526
## 70   0.72087816  8.955656e-01  0.06096664  0.3490815281 -0.700354109 -0.311068460  0.022402260
## 71  -1.04311894 -2.297781e-01  0.57375170  1.6284422658 -0.009056475 -0.671415067  0.718136206
## 72  -0.09018639  8.366191e-01  0.04580358  0.0885218957 -1.458133487 -0.157340452  0.489457018
## 73   0.62351816 -1.745056e+00  0.15741254  1.2391507084  0.694529646 -0.931305071 -0.173888346
## 74  -0.95352336  1.689459e+00  0.43156537 -1.6445555358 -2.461335475 -1.983009479 -1.217699489
## 75  -0.54282881  8.647780e-01 -0.39654974  1.4463565254  0.143289764 -0.219599804  0.646398369
## 76   0.58099650 -1.507760e-01  1.30997823 -0.6905601717 -0.391222119  1.045275869 -0.916456028
## 77   0.76817874 -1.449007e+00  0.47039340 -0.2764310855 -0.491164086  1.877329566 -1.251823442
## 78   0.46376759  6.430087e-01 -1.24267027 -1.1094187599 -0.283647452  0.002606196  0.594927718
## 79  -0.88577630  4.831939e-01  1.38157546  0.1338693164  0.314794805 -0.080669935 -1.232810665
## 80  -1.09978090 -6.355626e-03  1.20445894  1.7853390517  0.396326578  0.962982281  0.244364415
## 81   1.51270701  1.514559e-01  0.82407396  2.4221633553 -0.225603711  0.053571017  0.002772185
## 82   0.25792144 -5.841090e-01 -1.66262940 -1.0768289021 -1.924950430 -0.434898409 -1.328209686
## 83   0.08844023  3.688067e-01 -0.56930634  0.4859411104 -1.439229298 -1.737297255  1.179696412
## 84  -0.12089654  2.946543e-01  0.63551382  1.3885217387 -1.469657774 -1.263695600 -0.592804999
## 85  -1.19432890 -2.792594e-01  0.04372201 -0.1956568173  0.761863447  0.406308512  1.199978316
## 86   0.61199690 -1.336237e+00  0.34801230 -0.2181747977 -0.243614982 -1.459653968 -0.475033680
## 87  -0.21713985  7.007488e-01  2.45959355 -0.3047779545  0.269676607  1.048457370 -0.575057096
## 88  -0.18275671  5.541966e-01 -0.81838032  0.5978327241 -1.558927521 -1.346430539 -0.031226025
## 89   0.93334633 -8.363066e-01 -2.11320011  1.3974294108 -0.535588007 -0.193570558 -0.358056996
## 90   0.82177311 -1.594588e+00  0.27369527  0.6876197612  0.562451973 -0.002335957 -0.356601078
## 91   1.39211638  2.049586e-01 -0.68759684  0.3201880143 -0.178326123 -0.012829734 -0.877664383
## 92  -0.47617392 -3.450880e-01  0.44604105 -0.3018699258 -0.115135986  0.151947322 -1.212897125
## 93   0.65034856  2.526117e-01 -0.81238472  0.4983486856 -0.072061472  0.598511131  0.613286619
## 94   1.39111046 -1.294002e+00  2.21205548 -0.5495369179  1.210909807 -0.126212437 -0.806203341
## 95  -1.11078888 -9.591704e-01 -0.12370597 -0.2792565036 -0.614896901 -0.248535566 -1.376456930
## 96  -0.86079259  1.085775e+00 -0.47733551  1.0965134431  0.676126458  0.160327395 -0.507847899
## 97  -1.13173868  4.037749e-01 -0.16626149  0.4420130882  0.898599606 -0.433641942 -0.800935487
## 98  -1.45921400  5.864875e-01  0.86256338  0.2410162940 -1.189317904  1.537412419 -2.192785686
## 99   0.07998255  1.815228e+00  0.09734049 -0.2556076554  0.121258850 -2.170246577 -0.290937149
## 100  0.65320434  1.288214e-01 -1.62561674  0.9310329015 -0.011221686  1.027004619  0.167174121
##               V8           V9          V10          V11         V12          V13          V14
## 1    0.294692356  0.688807752  0.941924217  2.325058494 -0.65028443 -0.746516452 -1.185546937
## 2    0.392741265  0.725083018 -0.248614037  0.524122181 -1.00318323  0.036606120 -0.658389291
## 3   -1.000843713  0.217380211  0.096478860  0.970733416 -0.53511391  0.323309624  1.089507949
## 4   -0.325727120 -0.201656732 -0.433930941  0.376973397 -0.11041457  0.379676032  0.508785941
## 5   -1.008348805 -1.365689861  2.178667867 -0.995933397  0.60043017  0.876556496 -0.135906626
## 6   -0.635431482 -0.308937609 -2.958779619 -0.597482913  0.41584506  0.933387988 -0.108782737
## 7   -1.209840688 -0.452902887  0.080888239  0.165251423 -0.10575065 -2.428807523  0.754900160
## 8   -1.116463801  0.663229145  0.110137805 -2.928477179 -0.85656254  1.727994023 -0.223811322
## 9    0.629881163  1.308629537  0.213448326 -0.847914227  1.12732733  0.456002646  0.074955171
## 10  -0.272521570  0.501040310 -1.557819613  0.798584512  0.91628162 -0.570360346 -1.645954802
## 11  -0.258841168 -1.128288535  0.216211546 -0.298455988 -0.72437959 -1.114624402  1.774009365
## 12   1.729558180  1.670997305  0.187663817 -0.283611377  0.68677909  0.905064266  0.765967916
## 13  -0.058392165  1.010353032  1.258622125  0.869519307  0.45008579  0.328096015  0.832287843
## 14  -0.537063785  0.223521216  0.523517781 -0.544355285  1.04522161  1.078089960 -1.905457522
## 15   0.747286696 -2.206484593  1.115454277  0.628803235  0.14962903 -0.060315554 -0.020836676
## 16  -0.487257835 -0.954585619 -0.957502242 -1.422334458 -1.14106715 -0.243518762 -0.408933912
## 17   1.372907781 -0.068573067 -0.124405580 -1.227512633  0.80249940  2.241422561 -1.376386307
## 18  -0.377672361  0.761305902  0.191737829 -1.674105516 -0.72871653 -2.035992971  0.727430374
## 19  -0.616152614 -1.179904200  0.272217251  0.084398482  0.52720929  0.390914465  1.234130781
## 20  -1.168125051  3.211198957 -0.693814464 -0.206125664  1.67347877  0.384813343 -0.553224406
## 21   0.328640359 -2.553824851  1.479171921  1.441871640 -1.41882802  0.438695600 -0.229190178
## 22   1.466510627 -0.235933949 -0.611767022 -0.041782103  1.56191080  0.558140902  1.711461989
## 23  -0.356009545 -0.259562643 -1.614309974  1.353754478  1.35316101 -0.276406383 -1.649807903
## 24   0.261467642 -0.663366917  0.402490111  1.945225276 -0.52869678  1.166288079  0.744220785
## 25   0.333328855 -0.318990711  0.664555447 -0.490938234 -0.25127278 -2.454277235 -0.314021324
## 26   1.422193240  0.742395158  0.944127071  0.388439075 -0.68584769 -0.805566259 -0.115310841
## 27   0.663876601 -0.874292676 -0.946260798 -0.844893259 -0.57036765 -0.119144551 -0.610974343
## 28  -1.073655156 -2.082813766 -0.248288449  0.737990368  0.57963138  0.163216234  1.096695379
## 29  -0.696901775  0.093767903 -1.328436469 -1.079760331 -0.89895901  0.406479508 -1.127167611
## 30  -0.746130457 -0.001819812  0.844763742 -1.026473876 -0.19032989  0.639339996  0.962609492
## 31   0.141572872 -0.013100652  0.347696732  0.288793362 -0.14390338 -1.508517637  1.406464016
## 32  -0.003947221  0.667968075  0.913762586  0.090811319 -0.09657265  0.007670604 -1.641649509
## 33   0.367937501 -0.013167703  0.608274553  0.262623177  0.18144672  0.524168022 -1.126904193
## 34  -0.657342915  0.776047030 -0.607547988  0.069334913  1.59657421  1.326335872  0.591545481
## 35  -0.376346670 -2.010735333  0.179658071 -0.528643819 -0.45133174 -0.113567734 -0.956309120
## 36   0.741360060 -1.128180495 -1.908409419 -0.130202165 -0.79488022  1.599149478 -0.691529886
## 37  -0.099606756  0.348799572  0.502946258  1.620158744  0.79228435 -0.281528099 -0.445352767
## 38  -0.654289879 -0.352898482  0.968899153 -0.017895123  0.38047230 -0.049898139 -0.344484228
## 39   0.971164374  0.944774598 -0.875737126 -1.318489399  0.23578798  0.160214127  0.703698185
## 40   0.013496312 -1.004719853 -2.136024539 -0.844560161  0.59098826 -0.502380562 -0.603005765
## 41  -0.916534667  0.723902732 -1.522827582 -1.101814872 -1.41193020  0.715938049  0.089350412
## 42   1.709688561 -0.668832741 -1.113118208 -0.900090143  1.06329579 -1.345787303  0.737494637
## 43  -1.168100990 -1.113040194  1.240983896 -1.261083920  0.45391788 -0.005643565 -1.540379445
## 44  -1.781036260 -0.342805189  0.003481935 -2.625848902  0.95131298 -0.540542550 -0.689632293
## 45  -2.253132449  0.049779323 -1.237794586  0.669065749 -0.59851771 -0.547073741  0.859793460
## 46   0.651125970 -1.227681962  0.555699119  0.660041509 -1.84194005  1.151445387 -0.376519741
## 47  -0.532832626 -0.764005911 -2.183149349 -0.250600435 -0.48458953  1.052219682 -2.535083470
## 48  -0.275559357 -1.246182062 -0.247024457 -0.723797317 -0.90684682  1.598026529 -0.857809229
## 49   0.289626973  1.016773978  1.112856928 -0.812181632  2.22276245 -0.263796793  0.816026407
## 50  -0.466484062  0.723612624 -0.341673237  0.398734336 -0.41346658  0.032400788 -0.426588909
## 51  -1.608060175 -1.032527137  1.305438342  0.226563574 -0.21567385 -1.551583308 -0.404636840
## 52  -1.949783796  0.557346004  1.795323871  0.421643580  1.06297852 -0.388179430 -0.145769224
## 53  -0.340699918 -0.255581423 -1.167835227  0.004837944  0.62034653  1.764148122  0.719466368
## 54   0.174725527 -1.113391757 -1.152287625  0.618983915  0.67306665 -0.416527728 -0.491090752
## 55  -2.277777561  0.421197440 -0.914455469  0.431133944  2.63370956 -1.166338689  0.475958438
## 56   0.290524481 -0.322950312  0.528728033  0.558312370  0.16611440  0.041894566  0.573153668
## 57   0.422306357  0.880179901  0.835078927  0.495961281  1.90517554  1.246489130 -1.018859578
## 58   1.294737363 -0.194863607  1.209136974  1.661885741  0.25371840 -1.577971304 -0.061872234
## 59   0.164714019  1.188054982  1.299472545 -1.055034612 -1.45902197 -0.253820472 -0.538305102
## 60   0.204953841 -0.509764512  1.075732304  1.508324687 -2.22724824  0.467830542  0.416954726
## 61   0.604603627  0.219548011  0.939801024 -0.334208657  0.64968355  1.977529994 -1.508276656
## 62  -0.019140769  0.370294421  0.163374693 -0.064980744 -0.99924986 -0.615605795  1.539169928
## 63  -0.079714347  0.279108394 -0.888138559  0.081081539  0.04197238  1.519421601  1.346018089
## 64   0.115984464  0.201942437 -0.726160983 -0.448214796  1.77704370 -0.432895194 -0.842855778
## 65   0.744173364 -0.012996542  0.407597006 -2.553806950  0.34718268 -1.031017708 -0.669874228
## 66  -0.431290604 -0.090865006  1.683536339 -0.312932535 -0.28709745 -1.134334498 -0.908487448
## 67  -0.499528952  1.365031365  1.705979824  0.041383152 -1.92457665 -0.221098125 -0.136540642
## 68  -0.865162077  0.908130569 -1.926166701 -1.737728061 -0.80064911 -0.022414699 -0.501037963
## 69  -0.957757283 -0.609888171 -0.833998015  0.549939660 -0.57561797 -0.322901828  0.112638447
## 70   0.326799971  1.396437221  1.419132516 -0.597273693  0.68420573  1.052223160 -1.875217563
## 71   1.547372226  0.144798373 -1.064109715  1.447741806  0.48345735  0.189541166  0.158546714
## 72  -0.968859569 -0.640606772 -1.613577472  0.264868114  0.01396341 -0.105827609  0.015562671
## 73  -0.188440426  0.169802260  0.599811089 -0.788782808  0.18024373 -0.602554473 -1.600603461
## 74  -1.030001161 -0.157187487  1.732964705  0.167732221 -0.28812325 -1.296015931  0.275973089
## 75   0.908086442  0.100939595  1.934504562  0.257583108  0.03022722 -1.959326639  0.054952640
## 76  -0.317381865 -0.973394154  0.532820576  0.912047367  0.27167411  0.213411731 -0.230573355
## 77   0.179003967 -0.819825275  0.719665756 -0.156872383  0.44915843  0.025335870 -0.013216291
## 78   0.348028188  1.362918969  0.560093386 -2.033375708  1.82762766  1.365023188  0.384949196
## 79  -1.054279021  0.961370931 -0.018086932 -1.299665823 -1.11202371  0.291093808  0.346723717
## 80  -0.104744288 -0.883724474 -3.371739083 -0.857250660  1.47480743  0.797217701 -0.431612761
## 81  -0.228342617 -0.900092115  0.760703245  0.426265295  0.67151246  0.040986374  1.150178039
## 82   0.675355562  1.723333133 -0.391226852 -0.284836263 -1.54532854 -0.818324050 -0.036532193
## 83  -1.233244694  1.909042162  1.009901224 -0.614368542 -0.27704723  0.289640643 -0.135801422
## 84  -1.199961544 -0.777141196 -0.626973147 -0.844890832 -0.62934299  0.339787111  0.944618046
## 85   0.765866577 -1.302305263 -1.634577370  0.153694957 -0.93300045 -0.817392731  0.992041362
## 86  -0.588097861  2.623495167  1.245242652 -0.851432276 -1.53788648 -0.028749315 -0.700669755
## 87  -0.660295830  0.229628914  0.181377447 -0.397921073 -0.49410205  2.000781651 -0.991058672
## 88   0.113013522  0.186749892  3.495304264 -0.939977226 -0.43536002 -1.129266063 -0.705618053
## 89  -0.320398800  0.076136635  0.915591913 -1.388253741 -0.32354468  1.472178592 -1.175775893
## 90   1.866381497  1.404860186  1.048506517  0.955184589 -2.06111362 -0.227068916 -1.781196952
## 91   0.259531456 -0.191722235  0.763824937 -1.317495470  0.44150256  1.482310596  0.310607522
## 92   0.161560071  1.459392164 -0.603387377  0.073029146  0.74635125 -0.843053174 -0.145110361
## 93   0.931074909 -0.220655451 -0.370429663 -0.568185849  0.78964287 -0.066195327  0.642549754
## 94  -0.059946749  0.505145886  1.059825938 -1.058450669  0.77074051  0.311233498 -0.001659362
## 95   0.048739700 -1.033497006  1.055110530  0.259313538  0.20050632 -0.112590228  0.409162962
## 96  -1.072875401  0.170473471  0.582972161  0.373134505  1.46835003  0.658738839 -0.496058147
## 97  -2.292971430  1.200668161 -0.986567436  1.264666769 -0.87645548 -0.166861500 -0.960107959
## 98  -1.207206850 -0.163405915  1.684621812  0.325224996 -1.22660469  1.208873465  0.778132225
## 99   0.114109430  1.282475863 -1.366841046 -0.138467320  0.33783792  0.170419434  0.761120616
## 100 -1.033297079  2.727196388 -0.433214349  2.602468897  0.44082409  0.692419636 -1.733712184
##              V15          V16         V17           V18         V19         V20
## 1    0.877294652 -0.601382998  0.02931665  1.4482016746 -1.22133337  0.09782704
## 2   -1.773371439 -0.135816137  1.93431434 -0.0858556606 -0.45298599  1.46502103
## 3   -0.045687324 -0.987272785  0.83751088 -1.5737698017 -0.70231993  0.99043585
## 4   -0.394872254  0.831925015 -0.18534633  1.2924761595 -0.67438039  1.41438675
## 5   -0.128056266 -0.795059516  0.53332918  0.0844677556 -2.30302319 -1.33589777
## 6    1.096237732  0.340464612  1.78495624  0.9670459965 -0.14933290  0.57958364
## 7   -1.255217861  0.870429820  0.18962782  1.9391129942  0.94868959 -0.36182121
## 8   -0.265483695 -1.182161005 -0.56908751  1.0798726382  1.15636246 -0.04685245
## 9    2.553302490  1.022893976  1.45189294 -0.3535935268  2.07435249 -0.59947747
## 10  -1.478306306 -2.108434576  1.86899345 -0.1349231247  0.26528997  0.21510449
## 11  -0.626587568  0.229762638 -0.98455489 -0.9871324333 -1.05200383  0.09363149
## 12  -0.041052022  1.511710792  0.71875676 -0.8217081461  0.50234224 -2.11983442
## 13   0.199953220  0.555437989  0.12934730  0.4895309225  1.09984502  0.85244309
## 14  -0.533305601  0.914878801  0.70164131 -0.2992774589  1.88650483  0.15437457
## 15  -0.266032318 -0.553396624  0.52669783  0.6582548986  0.81125350 -0.97186164
## 16   1.084147814  0.298376090 -0.02051310  0.8324095497  1.41824381 -1.10142784
## 17  -0.180745480  1.106379540  0.43825105  0.3634632612  0.43916990 -0.69024489
## 18   0.251555319 -1.329149585 -1.21052857 -0.3986456586 -1.27750161 -1.58363492
## 19   0.369542235  1.001694129 -1.40441994 -1.4979299895 -0.86137321  0.13776607
## 20   0.407034259 -1.409303874  0.72074916  0.1146281753  1.88698587  1.50234395
## 21   0.492505019 -0.498781025  0.08603550 -1.0378166173  0.70408731  0.19288159
## 22   0.054632579  2.622756852 -0.13161704  0.6459788713 -0.66772053  0.52715473
## 23  -0.708674076  0.510665566 -1.67605198  0.7728454001 -0.70571434 -1.48320629
## 24  -0.058270752 -0.090843194 -0.56949799  0.3160141581 -0.92071858 -1.53920072
## 25  -0.963421822  0.978765679  0.19220515 -1.0381128248  0.14847329 -0.89643289
## 26   0.093393297 -0.243808657 -0.48632292 -0.7551372901 -0.25001736  0.72106998
## 27  -0.347901133  0.621808959  0.08128401 -0.4915043097 -0.94176895 -0.88450194
## 28  -0.118334438 -0.344595220  0.38489725  0.1472257451  1.21419902  0.55859853
## 29  -1.099090957 -2.355224546 -0.13750197  2.3565030433  0.01856643 -0.21873750
## 30  -0.283076412  0.472387633  0.42759311 -0.9206687671  0.75783170  0.38362348
## 31  -1.089734673 -0.511480195 -0.24306445 -2.1568477832 -0.38006270 -1.03912011
## 32  -0.405665544 -0.573602200 -0.31835283 -0.1204000496  0.17494871 -0.94899225
## 33   0.526907260  0.003797204 -0.25410639 -0.8392443296  1.03352331 -0.10416095
## 34   0.240419982  1.342550930  0.63888743 -2.1724399087  0.67572400 -0.20871110
## 35   0.898362457  0.295461512  0.82106538 -0.8743079762  0.94128124 -1.78873020
## 36   0.487968127  0.841786944  2.33547909 -0.3735793489 -1.82663007  0.53708442
## 37   0.111991365  0.629064298 -0.98330756 -1.1021977121  0.17720438 -1.94969607
## 38   0.998313347  0.101598399 -0.70745744  0.7156598607 -0.19180786 -0.41523446
## 39   1.191716809  0.671120875  0.64414910  0.9076403057 -0.65019871  0.05512967
## 40  -1.399387567 -0.138316375  1.63766533 -0.3248051338 -0.56069634 -0.70199972
## 41   0.892147737 -1.249312253  1.22664817 -1.4815859323  1.67576579 -1.40383903
## 42   0.777958970  0.347635244 -0.46770562  0.5941711389  0.43053460 -0.39882393
## 43   0.941914516 -2.277813659 -0.67823106 -0.8548746871 -0.14460662 -0.11042180
## 44  -0.908802712 -0.986209181  1.69640227 -0.8299175366  0.93902571 -0.92407845
## 45   2.408930844 -0.059541344  1.54284685  0.1491068150 -0.54549675 -1.35724300
## 46  -2.594250174 -0.160759490  0.21452759 -0.9761460414  0.13885036 -1.19460448
## 47   0.006863038 -0.420361520  1.70068830 -0.2946844077  1.44686231 -0.52848778
## 48   0.118106669  0.988591016 -0.00798683 -0.7709749158 -0.33262809 -0.45262689
## 49  -0.149697326 -1.069613791 -0.81790036  0.3922837071 -0.79915627  1.41948519
## 50   0.747873288 -2.613643467 -0.05061017 -0.8463059049 -0.54829275 -1.28568114
## 51   0.124034807  0.780451898  0.07992109 -0.8437599309 -1.05079620  0.38114894
## 52  -0.778054747  0.045105074 -1.55957110 -0.9618884443  0.25266249 -0.41746860
## 53   1.410033305 -0.108489159 -0.17149796  0.0110496534  0.32786185  0.09286142
## 54  -0.293911808 -0.391224310 -0.41860709  1.2149489654  0.85567240 -0.66358454
## 55   0.610678645 -1.686999504  0.49128682 -0.8888785157  1.80176227  0.80282739
## 56  -0.666510080 -0.952704455 -0.53122843  0.2181012614  1.40524081 -1.70308265
## 57  -1.059882878  0.507414112  0.48222012 -0.6561169996 -1.25079179  0.15390707
## 58  -0.605838699  0.018816099  0.53629104 -1.0885158473  1.99910885  0.02607550
## 59  -1.056841421 -1.418910374 -0.27932421  1.0911308981  0.13927849  1.05586338
## 60   0.533106473  0.598171931 -0.04350726 -0.5692273217 -0.38666910 -0.22446711
## 61  -1.552092615  0.786379622  2.38151529  1.6208101306  1.85563569  0.49659639
## 62  -0.978489465 -0.134633046  1.54803111 -0.6869884456  0.42885538  0.39963395
## 63   0.027218355 -0.016386518  1.55476200  0.6569967036  1.73079014 -0.64691275
## 64  -1.715572317  1.409140408  0.05811123  1.1106922152 -0.04352367 -0.08376276
## 65  -0.504888026  0.394838699 -0.83977892  2.1407207301 -0.73108581 -0.13362120
## 66  -0.304922681  1.219875128  0.57112345 -0.8309718091 -1.96596408  1.11960386
## 67   1.346078235 -0.045077213  0.06604778  0.0764058562  1.12686532  0.23922097
## 68  -0.257536691  1.479487845  3.58465949  0.9358244462 -0.88808439  0.29060222
## 69  -0.523287301 -0.075683787 -1.93189555 -0.5016667029  0.87135926  2.34360142
## 70  -0.663217799  0.917748457 -0.20283250  0.5948534639  0.11989357  0.41629811
## 71   0.447221040 -0.344778714  1.48371520 -0.4306511691  0.47243494  0.36248108
## 72  -0.319450982 -1.077002656 -2.09851640 -1.1333390527  1.26663991  0.53168263
## 73  -0.543778850  1.001604727 -0.25964867 -2.2371715598 -0.48531638  1.54804224
## 74  -0.279673442  2.296004041  0.33406794  0.5146395724 -0.46153359  0.96148830
## 75   1.065232600 -0.325878115 -0.21963759 -0.3419071152 -0.23175834  0.33843618
## 76   1.567828749  1.468856129 -0.93177180  1.7178774288  1.06035377 -0.61095300
## 77  -0.643112079 -0.236908933 -0.12224022 -1.1163779087 -0.47478570  0.35705205
## 78   0.020521749  0.923493118  1.33852842 -0.4884796192 -0.17551028  0.14150868
## 79  -0.324184458 -0.989191807 -1.34833119  0.8073322934  0.79632450 -1.83629134
## 80   1.857396022  0.275646278  1.41483572 -0.2610087417  0.77874532 -1.54103105
## 81   0.590763995 -0.763661094  0.91404895  0.4880599178  0.79680257 -1.79438286
## 82  -1.780626717 -1.352111984 -1.23668792  1.3527838210  0.04905157  1.41451491
## 83  -0.434108700  0.655024410 -1.50076035  2.1768155913  0.76261293  0.61833789
## 84  -2.292749712  1.609646760  0.03453868  0.0513871112  0.28599307 -2.35045539
## 85  -0.092965238 -0.788506966  0.28629965 -0.8380954452 -0.63281504  0.46703159
## 86   0.703964453 -0.883876113  1.18620809 -1.0846150093  1.80438448  0.06383194
## 87  -0.894489621 -1.072514612 -0.03754937 -0.5465327691 -1.86301097 -0.25626044
## 88  -1.282506792 -0.458140981  0.42037159 -0.4265097047  0.84552838 -0.74452204
## 89   1.255784174  0.665647016  1.28924648 -1.2109975554  0.34166768 -1.26902528
## 90  -0.964629980  0.712228212  1.17068919  0.5977929978  0.02277761  0.86262801
## 91   0.907783881  0.442954227  0.82609365  1.4238957891 -0.61139999 -0.57693467
## 92   0.214073614  0.759587312  0.32391707 -0.4432986256 -0.26052677 -0.13226735
## 93   0.881023123 -0.468877666  0.38095041 -1.0403334657  1.80567031  0.58682810
## 94   1.002544998 -0.172709978  1.20261958 -1.2738970784 -0.09699927  0.53500028
## 95   1.139910505  0.682049899  1.10220575  0.7701516120 -0.38448670 -0.39340994
## 96   1.236035852 -0.281577328 -0.06151494  0.0315226220 -0.87836214 -0.54060942
## 97   0.295401510 -0.961622574  0.36458991 -1.9048547514 -0.49557852  0.25945092
## 98   0.171761024 -1.377650145  1.69255230 -0.1237551771 -0.08429245 -1.17827876
## 99  -0.955389283  2.697586637  1.47136701  0.0003283772 -1.35772207  0.30613848
## 100 -1.210409121 -1.699715056 -0.97724766 -0.8437783447  1.14269782  0.17348123

We have theoretical expectations of what the data ought to show for each of our variables, which we have already subtracted from each column.

Variable Selection

So the null hypothesis in these data is always \(\theta_i=0\), and we can compare how our data differs from the null hypothesis using a t-test.

So let’s look at our first variable:

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -2.99309 -0.61669  0.08980  0.03251  0.66156  2.28665

## 
##  One Sample t-test
## 
## data:  obs$V1
## t = 0.31224, df = 99, p-value = 0.7555
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.1741130  0.2391426
## sample estimates:
##  mean of x 
## 0.03251482

Nothing significant there… what about for our second variable?

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -2.02468 -0.59150 -0.06929 -0.08748  0.46179  2.70189

## 
##  One Sample t-test
## 
## data:  obs$V2
## t = -0.96755, df = 99, p-value = 0.3356
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.26689135  0.09192394
## sample estimates:
##   mean of x 
## -0.08748371

Also nothing… let’s keep going…

Variable Selection

The fourth variable:

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -1.68248 -0.53272 -0.04569  0.03294  0.67478  2.42216

## 
##  One Sample t-test
## 
## data:  obs$V4
## t = 0.3759, df = 99, p-value = 0.7078
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.1409202  0.2067931
## sample estimates:
##  mean of x 
## 0.03293646

Variable Selection

… the ninth…

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -2.55382 -0.66473  0.02398  0.06146  0.72420  3.21120

## 
##  One Sample t-test
## 
## data:  obs$V9
## t = 0.59221, df = 99, p-value = 0.5551
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.1444566  0.2673706
## sample estimates:
##  mean of x 
## 0.06145701

Variable Selection

… the fourteenth…

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.5351 -0.6938 -0.1362 -0.1486  0.6043  1.7740

## 
##  One Sample t-test
## 
## data:  obs$V14
## t = -1.6282, df = 99, p-value = 0.1067
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.32963497  0.03248961
## sample estimates:
##  mean of x 
## -0.1485727

Variable Selection

… the seventeenth…

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.0985 -0.2646  0.1909  0.2741  0.8566  3.5847

## 
##  One Sample t-test
## 
## data:  obs$V17
## t = 2.6839, df = 99, p-value = 0.008531
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.07145775 0.47674746
## sample estimates:
## mean of x 
## 0.2741026

Aha!! We’ve found a significant relationship! The \(17^{\textrm{th}}\) variable is discrepant from the null hypothesis with a p-value of 0.0085308.

We write up our discovery, publish the result, and our discovery is enshrined in the literature forever.

What is the problem with this?

The process I’ve described above is known as data-dredging, the look-elsewhere effect, or the problem of multiple comparisons.

The core issue is that we’re looking at many different chunks of the data, any not taking that into account when we decide whether what we’ve found is significant.

Recall the p-value for many experiments:

We have used in this example a threshold of \(p<0.05\). We therefore expect to find this p-value given random fluctuations in \(1\) out of every \(20\) cases. In our example we have \(20\) variables. So it makes sense that we found a “significant” effect for \(1\) variable.

How can we combat this?

We can correct for this effect by modifying the threshold that is required for determining “significance”, to account for the fact that many variables are under analysis.

The simplest example is the Bonferroni correction, which simply states that the threshold for significance when analysing \(m\) different variables ought to be \(\alpha^\prime=\alpha / m\).

However there are many possible corrections. In R there are a number of them inbuilt, which we can run over our simulated data:

Real World Example: An Empathetic Fish

Do fish feel empathy?

This was a question posed by a group of researches working within the functional magnetic resonance imaging (fMRI) community in 2009.

fMRI studies use the magnetic resonance to produce highly detailed internal images of people (or in this case, fish).

The field uses analysis techniques that are designed to identify activity within (particularly) the brain that can be correlated with an external stimulus, in order to identify parts of the brain that are responsible for different things, or to just demonstrate that comprehension is occurring.

The case of this experiment was to show whether or not an Atlantic Salmon would react differently when shown images of people, rather than images of inanimate objects.

Real World Example: An Empathetic Fish

The researchers placed the fish in an MRI, and presented it with images of humans and other pictures.

They analysed the data using standard processing tools, and found a significant discovery of activity in the brain of the salmon that correlated with the researchers presenting the fish with images of humans, as opposed to objects.

The problem?

Real World Example: An Empathetic Fish

The salmon was frozen at the time of study

It was a frozen Atlantic salmon, bought from a fish-monger.

Data Modification

Data modification can take a number of forms, however the most common are selecting specific subsets of data and/or rejecting certain portions of the data that are deemed to be “outliers”.

Data modification need not be malicious, or even intentional. At its weakest, we may simply discard data that we expect to be outliers.

At its most malicious, it involves hand-selecting data that suit your hypothesis. These processes are generally referred to as cherry-picking.

Data Modification

Suppose now that we set a more strict requirement on our p-value, \(p<0.01\), and that this is the first variable that we looked at (so no modification to our threshold is required).

We’re not quite there with our dataset:

## 
##  One Sample t-test
## 
## data:  obs
## t = -1.8105, df = 99, p-value = 0.07326
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.38179663  0.01748116
## sample estimates:
##  mean of x 
## -0.1821577

But what about those two pesky data points at \(\sim 2\)?

Maybe we can convince ourselves that one of those is an error, because of something that went wrong in our experiment? We convince ourselves to drop one of those data points (after all, it’s only \(1\%\) of the data!). What happens to our p-value?

## 
##  One Sample t-test
## 
## data:  obs
## t = -2.1162, df = 98, p-value = 0.03686
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.4028849 -0.0129442
## sample estimates:
##  mean of x 
## -0.2079145

Off to the journal we go!

Data Modification

This is an example of cherry picking that is very easy for researchers to fall into.

This is because experimental data is messy; it’s easy to fool yourself into thinking random fluctuations are bad data, and thereby justify their removal.

This has a significant influence on determinations of significance, though, as we’ve just seen.

Real World Example: Climate Denialism

The practice of maliciously cherry-picking data is also an important one to understand.

This is the realm of people who wish to use statistics to push an agenda, and one of the most common places to find examples of this practice is in climate change denialism.

In an effort to provide evidence that the globe is not warming, one climate change denier claimed in a newspaper article in 2011 that:

“In fact, National Snow and Ice Data Center records show conclusively that in April 2009, Arctic sea ice extent had indeed returned to and surpassed 1989 levels.”

The implication of this statement is that there is no cause for alarm because there is no systematic reduction in sea ice between the two years.

The assumption being that the lack of difference in April can be used to infer systematic difference over the whole year (or longer).

Real World Example: Climate Denialism

Can you see the problem with this argument?

image label

Additional Observations

A significant statistical fallacy in significance estimation comes from the ability of researchers to adaptively observe more data.

Consider an experiment where we make \(n\) observations of a variable \(X\). We compute our statistic of choice, say the t.test, and calculate a p-value.

We find that our p-value is on the cusp of being “significant”.

We therefore decide to perform some additional observations, and find that the p-value decreases below our required threshold.

Confident that these additional data have confirmed our effect is real:

We Publish

Can you see a problem with this process?

Simulating the effect:

Let us create a dataset of \(n\) observations, and compute the p-value.

## 
##  One Sample t-test
## 
## data:  obs
## t = -1.8105, df = 99, p-value = 0.07326
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.38179663  0.01748116
## sample estimates:
##  mean of x 
## -0.1821577

We now decide to observe more data, in a batch of \(10\) observations.

## 
##  One Sample t-test
## 
## data:  obs
## t = -2.3083, df = 109, p-value = 0.02287
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.40244042 -0.03061316
## sample estimates:
##  mean of x 
## -0.2165268

Bingo! We cross the threshold of \(p<0.05\) and we rush straight to the publisher.

Simulating the effect:

But what happens if we were to continue observing data?

This effect is known most colloquially as p-hacking (although that term can be applied to many of the practices that we discuss here).

Generally speaking the problem is that we can decide when to stop taking observations based on the significance threshold we want to achieve.

This allows us to keep observing data until we work our way down to a significant result.

Simulating the effect:

We can ask the question: how often can I hack my way to significance with up to \(1000\) observations taken \(10\) at a time?

## published
## FALSE  TRUE 
##  0.71  0.29

So by selectively observing more data, we publish \(30\%\) of the time given a statistical significance threshold of \(0.05\).

Confirmation

Finally, we consider the influence of conscious and subconscious human biases on measurements.

Experiments do not happen in windowless rooms in the depths of space. They are performed by human researchers who work in laboratories, and have a keen understanding of the context in which their experiment takes place.

In our discussion of bayesian statistics, we formulated this as a good thing.

The prior knowledge that we bring to an experiment can play an important role in improving our statistical inference.

However there is a dark side to prior knowledge: the (generally sub-)conscious drive to be “consistent”.

Confirmation bias

The last significant statistical fallacy that we will discuss today is one that is extremely important: confirmation bias.

Confirmation bias is the tendency for researchers to continue adapting their results until they agree with some prior belief.

Take, as an example, measurements of the coefficient of charge-parity violation:

image label

Confirmation bias

image label

The figure above was taken from Jeng (2005), and was originally printed in Franklin (1984): “Forging, cooking, trimming, and riding on the bandwagon”.

The figure demonstrates the problem nicely. Prior to 1973, there was a consensus on the value that \(|\eta_\pm|\) ought to hold.

However in the early seventies, there was a shift in the consensus: and all observations began to cluster around that new, different, particular value.

The pre- and post-1973 distributions of \(|\eta_\pm|\) are catastrophically inconsistent with one-another. The cause: confirmation bias.

Similar effects have been seen in measurements of the speed of light, and in the build-up to the discovery of the Higgs Boson.

Real World Example: the penta-quark

Confirmation bias, however, need not require previous measurements. Humans can have a prior belief about a particular result, and simply analyse their data until that result is observed.

Such was the case with the discovery of the \(\theta^+\) penta-quark.

In 2002, a japanese lab published the discovery of the \(\theta^+\) penta-quark at greater than \(5\sigma\) significance (a false positive rate of 1 in ~20 million).

Subsequently over the next \(4\) years \(11\) other research groups searched for and found high-significance detections of the same penta-quark.

However, subsequent searches with more sensitive equipment failed to find any evidence for the penta-quark.

In the same year, one group quoted an \(8\sigma\) detection of the pentaquark, while another group performing the exact same experiment at a different lab with comparable statistical power found nothing.

The problem here is that researchers were not blinded to their data.

That is: they knew the signal that they were trying to detect, and they found it.

As such blind analyses are now a staple in many fields within the natural sciences, including cosmology and high-energy particle physics.

What have we learned

This has been an incomplete discussion of statistical fallacies. There are many more. Notable omissions include:

Regression to the mean

Spurious Correlation

Survivor Bias

Generally, the lesson here is to be very sceptical of using a p-value as a mechanism for determining whether or not something is “interesting”, or “significant”.

Bayesian Hypothesis Testing

As Bayesian statistics is concerned with determining estimates of underlying model parameters given the data, model comparison and hypothesis testing between different models becomes a natural extension of standard Bayesian methods.

Take the simplest possible example: > + Simple null vs Simple Alternative hypotheses: \(H_0=\theta_0\) vs \(H_1= \theta_1\).

We have two hypotheses about the model that generates our data: \(H_0=\theta_0\) vs \(H_1= \theta_1\).

These hypotheses are mutually exclusive and exhaustive (that is, \(\{H_0,H_1\}=\Omega\)). Next assume we have some appropriate test statistic \(T=T(X_1,\dots,X_n)\).

Bayesian Hypothesis Testing

By Bayes Theorem, we have: \[ P(H_0|T) = \frac{P(T|H_0)P(H_0)}{P(T|H_0)P(H_0) + P(T|H_1)P(H_1)} \]

Given that the hypotheses are mutual exclusivity and exhaustive: \[ P(H_1|T)=1-P(H_0|T) \] so \[ \frac{P(H_0|T)}{P(H_1|T)} = \frac{P(H_0)}{P(H_1)} \times \frac{P(T|H_0)}{P(T|H_1)} \] This is the posterior odds ratio, and the last ratio is known as the Bayes factor.

Notice that, therefore, if the prior odds ratio is unity (i.e. that \(\frac{P(H_0)}{P(H_1)}=1\)), then the posterior odds equals the Bayes factor.

Jeffery’s Hypothesis Tests

“If the posterior odds ratio exceeds unity, we accept \(H_0\). Otherwise, we reject \(H_0\) in favour of \(H_1\).”

The Jeffreys Hypothesis testing criterion above has a few important benefits over classical methods of hypothesis testing.

There is no specification of a “significance level” that determines whether or not a hypothesis is accepted/rejected.
It is easily generalisable to many many hypotheses: you just accept the one with the highest posterior probability.

There is one important philosophical difference as well: in accepting \(H_0\) as the preferred model, we do not assume that it is the true model.

We are simply stating that, with the currently available data, \(H_0\) is the more probable alternative.

Jeffery’s Hypothesis Tests

The “Jeffreys Scale” gives a slightly larger dynamic range to the amount of evidence that is encapsulated in the posterior odds ratio:

\(\frac{P(H_0\|T)}{P(H_1\|T)}\)	Strength of evidence
\(< 10^0\)	Negative (supports \(H_1\))
\(10^0\) to \(10^{\frac12}\)	Barely worth mentioning
\(10^{\frac12}\) to \(10^1\)	Substantial
\(10^1\) to \(10^{\frac32}\)	Strong
\(10^{\frac32}\) to \(10^2\)	Very strong

Bayesian Model Comparison

Simple Null vs Simple Alternative

Suppose we have \(X|\theta\sim N(\theta,1)\), and \(H_0: \theta=0\) vs \(H_1: \theta=1\).

We observe the random sample \(X_1,\dots,X_n\), and form the sufficient test statistic \(T=\bar{X} = \frac{1}{N}\sum_{i=1}^N X_i\).

We have \(T|H_0\sim N\left(0,\frac{1}{N}\right)\) and \(T|H_1\sim N\left(1,\frac{1}{N}\right)\).

Assume a priori that we have no prior preference over the models \(P(H_0)=P(H_1)=\frac{1}{2}\).

Therefore the posterior odds ratio is: \[\begin{align} \frac{P(H_0|T)}{P(H_1|T)}&= \frac{P(H_0)}{P(H_1)} \times \frac{P(T|H_0)}{P(T|H_1)}\\ &=\frac{0.5}{0.5}\times\frac{ \left(\frac{N}{2\pi}\right)^\frac{1}{2}\exp\left(-0.5N\bar{X}^2\right)}{ \left(\frac{N}{2\pi}\right)^\frac{1}{2}\exp\left[-0.5N(\bar{X}-1)^2\right]}\\ &=\exp\left[-0.5N(2\bar{X}-1)\right] \end{align}\]

Simple Null vs Simple Alternative

Let’s now look at some simulated data:

x1<-rnorm(10,mean=1.2)
x2<-rnorm(10,mean=0.2)
post_ratio<-function(x) exp(-0.5*length(x)*(2*mean(x)-1))
print(post_ratio(x1)); print(post_ratio(x2))

## [1] 0.02911078

## [1] 1.127367

We can now look at what the posterior odds ratio looks like, in this case, for a range of means and values of \(N\):

Bayesian Modelling

The previous examples of how to calculate model preferences is all well and good, but this is where the magic happens.

Because there is some uncertainty in expressing/specifying any single model: \[ f(\theta,x)=f(x|\theta)f(\theta) \] we can instead construct a single model that we define as being the union of all alternative models that we might wish to entertain.

We will then provide a prior over the suite of encompassed models.

Bayesian Modelling

Take an example where we have two models that we think might be appropriate for our dataset, both of which fall within the general “Gamma” family of distributions.

Recall that the Gamma family of distributions all take the format: \[ f(x|\alpha,\beta,\gamma)=\frac{\gamma\beta^\alpha}{\Gamma(\alpha)}x^{\alpha\gamma-1}\exp\left(-\beta x^\gamma\right) \] Our two hypothesised models are a Weibull distribution: \[ f_1(x|\beta,\gamma)=\gamma\beta x^{\gamma-1}\exp\left(-\beta x^\gamma\right) \] and a two-parameter Gamma distribution: \[ f(x|\alpha,\beta)=\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}\exp\left(-\beta x\right) \] We can analyse these two models in the same way as previously. Require that these two hypotheses be exhaustive (\(P(m_1)=1-P(m_2)\)), and formulate the values of \(\alpha,\beta,\gamma\).

Bayesian Modelling

However, we could alternatively specify a single encompassing model that is just a generalised gamma distribution.

This distribution contains both of the previous \(2\) models and many many more.

Nominally it is no more or less sensible to formulate our model comparison using priors on \(\alpha,\beta,\gamma\) instead of on \(m_1,m_2\), and we can construct priors that recover the behaviour of having only the two models in any case: \[ f(\alpha,\beta,\gamma)= \begin{cases} f(\alpha,\beta,\gamma) & \textrm{if}\;\; \alpha=1 \\ f(\alpha,\beta,\gamma) & \textrm{if}\;\; \gamma=1 \\ 0 & \textrm{otherwise} \\ \end{cases} \]

What did we just do?

We just demonstrated that we can perform model comparison within the bayesian framework by specifying a generic model and providing priors on the parameters that govern that model.

In this way, the likelihood that we specified was general: we didn’t pick particular values for the models in the likelihood, rather we specified a distribution of possible likelihoods and gave (possibly broad) priors on the variables that govern the distribution of possible models.

This leads us to an interesting class of models known (appropriately) as Bayesian hierarchical models (BHMs).

Start small

As a demonstration of the power of BHMs, we’re going to take an initially simple model, and with few logical steps, construct an much more complex model that has exceptional explanatory power.

The dataset that we’re going to explore today to demonstrate this process is one from the US, where \(8\) high-schools reported on the possible benefits of giving students additional coaching prior to the SAT-V (“scholastic aptitude test - verbal”) exams.

Under randomisation, students at each school were given either extra coaching or not. SATs are designed to be resilient to short-term efforts, however all schools think that their program is useful/effective nonetheless.

There’s no prior reason to expect that any program is more or less effective than the others.

The data

We have \(J=8\) independent experiments, with coaching effects \(\theta_j\) being judged by \(n_j\) i.i.d. normally distributed observations \(y_{ij}\), each with (assumed) known error variance \(\sigma^2\).

That is: \[ y_{ij}|\theta_j \sim N(\theta_j,\sigma^2),\;\; i=1,\dots,n_j;\;\; j=1,\dots,J \] The estimated effect of the coaching at each school is given by the mean \(\bar{y}_j\), with the standard error on the estimate \(\sigma_j^2\). \[\begin{align} \bar{y}_j &= \frac{1}{n_j}\sum_{i=1}^{n_j} y_{ij} \\ \sigma_j &= \sqrt\frac{\sigma^2}{n_j} \end{align}\]

The data

The likelihood for each \(\theta_j\) can be expressed in terms of the sufficient statistic: \[ \bar{y}_{j}|\theta_j\sim N(\theta_j,\sigma_j^2) \]

dat<-data.frame(Estimated_Effect=c(28,8,-3,7,-1,1,18,12),
                Standard_Error=c(15,10,16,11,9,11,10,18))
rownames(dat)<-LETTERS[1:8]
print(dat)

##   Estimated_Effect Standard_Error
## A               28             15
## B                8             10
## C               -3             16
## D                7             11
## E               -1              9
## F                1             11
## G               18             10
## H               12             18

Methods of analysis

print(dat)

##   Estimated_Effect Standard_Error
## A               28             15
## B                8             10
## C               -3             16
## D                7             11
## E               -1              9
## F                1             11
## G               18             10
## H               12             18

Each to their own

There are multiple ways that we could approach the modelling of this dataset. The first option is to treat each experiment independently, as the data have been provided. We will call this the separate analysis.

At first glance, there is a mixed-bag of results. Some schools show reasonably large effects (\(\theta_j\geq 18\)), some show small effects (\(0\leq \theta_j\leq 12\)), and some show small negative effects.

However, each estimate also has a large standard error. This makes it difficult to distinguish between the different results. The \(95\%\) posterior credibility intervals for these results all significantly overlap.

Methods of analysis

print(dat)

##   Estimated_Effect Standard_Error
## A               28             15
## B                8             10
## C               -3             16
## D                7             11
## E               -1              9
## F                1             11
## G               18             10
## H               12             18

All together now

The large overlap between the individual credible intervals might suggest that all of the experiments are trying to measure the same underlying quantity.

So we might prefer to assume that \(\theta_j=\theta_0\;\forall j\in\{1,\dots,J\}\).

That is, that all the values of \(\theta_j\) are the same. Given this hypothesis, we can estimate the value of each \(\theta_j\) using the pooled average \(\bar{y}\).

Said differently: assuming that all experiments have the same effect (and produce random independent estimates) then we can treat the \(8\) experiments as a i.i.d. observations from the underlying truth, with known variances.

All together now

We can estimate this quantity simply: \[ \bar{y} = \frac{\sum_{j=1}^J w_j\bar{y}_j}{\sum_{j=1}^J w_j} \] where \(w_j=1/\sigma^2_j\).

ybar<-with(dat,{ 
  weight<-1/Standard_Error^2
  return(sum(weight*Estimated_Effect)/sum(weight))
})
print(ybar)

## [1] 7.685617

The variance of this estimate is the inverse of the sum of the weights: \[ \textrm{var}(\bar{y}) = \frac{1}{\sum_{j=1}^J w_j} \]

var_ybar<-with(dat,{ 
  weight<-1/Standard_Error^2
  return(1/sum(weight))
})
print(var_ybar)

## [1] 16.58053

All together now

So we have an estimate of \(\theta_j\sim N(7.69,16.58)\).

Does this seem reasonable?

Take the experiment at school A as a test case:

In the independent analysis: \(\hat{\theta}_1=28\) and \(\hat{\sigma}_1=15\).
In the pooled analysis: \(\hat{\theta}_1=7.69\) and \(\hat{\sigma}_1=4.07\).

The first estimate tells us that the probability of the true \(\theta_j\) being greater than \(28\) is \(\frac{1}{2}\).

Conversely, the latter estimate tells us that probability of the true \(\theta_1\) is less than \(7\) is also \(\frac{1}{2}\).

A Hierarchical Model

We can display our two previous models as directed acyclic graphs:

image label

These show how the variable we observe (\(\bar{y}_i\)) is related to the parameter of interest \(\theta\).

In the first instance (i.e the separate estimates), we assumed that each school observed a totally independent \(\theta_i\). In the second case (i.e. the pooled estimate), however, we assumed that \(\theta\) was a constant.

A Hierarchical Model

Let’s now instead assume that the values of \(\theta_j\) are drawn from a normal distribution. The properties of the normal distribution we will determine with two hyper-parameters \((\mu, \tau)\).

image label

Mathematically, we are defining the joint probability of all our \(\theta_i\) as the product of the probabilities of observing the data, given that each \(\theta_i\) is drawn from a parent population \(N(\mu,\tau)\). \[\begin{align} f(\theta_1,\dots,\theta_J|\mu,\tau) &= \prod_{j=1}^{J} N(\theta_j|\mu,\tau^2) \\ &=\int \prod_{j=1}^{J}\left[N\left(\theta_j|\mu,\tau^2\right)\right]f(\mu,\tau)\textrm{d}(\mu,\tau) \end{align}\]

A Hierarchical Model

This is a hierarchical model, which can interpret the \(\theta_j\)’s as being randomly drawn from some shared parent distribution. Why is this useful?

We initially had the problem of determining whether or not to choose the independent or pooled estimate. However in our hierarchical model:

As \(\tau\rightarrow 0\): the \(\theta_j\) values are drawn from a narrower and narrower range around \(\mu\). In the limit, \(\theta_j=\mu\;\forall j\in\{1,\dots,J\}\), and so we have the pooled estimate.
As \(\tau\rightarrow \infty\): the \(\theta_j\) values become independent of each other. That is, if we know \(\theta_1\) with absolute certainty, this gives us no information about \(\theta_2\). This is therefore the independent/separate estimate.

For finite, non-zero values of \(\tau\), our result will therefore be some mixture of the pooled and separate analyses.

Important implications

The smaller \(\tau\), the more related are the individual values of \(\theta_j\).

This means that they contribute more to the estimates of \(\theta_j\) for the other experiments: the experiments “borrow strength” from one-another.

image label

SFB-1491 Graduate School: Astrostatistics

Section 3: Signifcance & Modelling

Section 4: Introduction

The reason we’re here

A Significant Conundrum

A simple demonstration:

A simple demonstration:

Aside: the merits of contradiction

Significance

Significance

Significance

The p-value

The p-value

Using the p-value

Using the p-value

Using the p-value

How much published research is wrong?

How much published research is wrong?

Pratical Statistical Inference

The Problem is Choice

However:

Bad Statistics: (Non-exhaustive) Examples of what not to do

Variable Selection

Variable Selection

Variable Selection

Variable Selection

Variable Selection

Variable Selection

What is the problem with this?

How can we combat this?

Real World Example: An Empathetic Fish

Real World Example: An Empathetic Fish

Real World Example: An Empathetic Fish

Data Modification

Data Modification

Data Modification

Real World Example: Climate Denialism

Real World Example: Climate Denialism

Additional Observations

Simulating the effect:

Simulating the effect:

Simulating the effect:

Confirmation

Confirmation bias

Confirmation bias

Real World Example: the penta-quark

What have we learned

Bayesian Hypothesis Testing

Bayesian Hypothesis Testing

Jeffery’s Hypothesis Tests

Jeffery’s Hypothesis Tests

Bayesian Model Comparison

Simple Null vs Simple Alternative

Simple Null vs Simple Alternative

Bayesian Modelling

Bayesian Modelling

Bayesian Modelling

What did we just do?

Start small

The data

The data

Methods of analysis

Each to their own

Methods of analysis

All together now

All together now

All together now

A Hierarchical Model

A Hierarchical Model

A Hierarchical Model

Important implications

Bad Statistics:
(Non-exhaustive) Examples of what not to do