Introduction to Statistics
Guillaume Fürst, 2012
1. Basic Statistics
This first part introduces some basics about variables, graphs, and descriptive statistics. Notions of sampling and statistical inference are covered here as well.
1.1. About variables
Variables are the basis of any statistical analysis. They can represent a vast array of concepts; for instance:
 they can represent attributes of people, such as income or preference for novelty
 they can be product or market related, such as quality, prices, month of the year
 they can also represent some experimental settings, for instance the time allotted to make a decision
1.1.1. Variable types
Variables come in different types (e.g., quantitative or qualitative). These types really matter: knowing the type of variable you are dealing with lets you choose the correct statistical analysis to run.
Qualitative and quantitative variables
We usually distinguish between:
 qualitative categorical: mere categories that can't be ordered (or can hardly be ordered), such as color, places, vehicles, gender.
 qualitative ordinal: variables for which a "natural" order exists, such as size (small, medium, big) or army rank (soldier, colonel, general, etc.).
 quantitative discrete: possible values are integers, most often only a few.
 quantitative continuous: an infinite (or very large) number of possible numerical values.
Examples of variables
Dependent and independent variables
 dependent variable (DV): under the influence of some other variable (also called criterion)
 independent variable (IV): the one that influences the DV (also called predictor)
Symbol used to represent this relation: IV > DV
(Correlation implies no direction: V1 <> V2)
1.1.2. Variable modifications
Once in the database, a variable's type can't be changed. However, various manipulations can be done; for instance, variables can be recoded or transformed.
Recoding
Recoding does not change the fundamental nature of the variable. However, it allows more flexibility. For instance, two categories can be treated as numerical values, and the difference between them quantified. (See "dummy variables in linear regression", section 2.4.1.)
Example: recoding sex
Standardization (linear transformation)
Standardization is not normalization.
Standardization just changes the variable's scaling.
The proportions between values are not affected; the distribution is not changed in the slightest.
Z scores
z_{i} = (x_{i} – μ_{x}) / σ_{x}
where
x_{i} are the raw scores
μ_{x} is the mean
σ_{x} is the standard deviation
(See subsection 1.2.1. about univariate graph for an example.)
(See subsection 1.3.1. about univariate descriptive statistics.)
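As a quick sketch, standardization can be computed directly from the definitions above (the scores below are made up; any numbers behave the same way):

```python
import numpy as np

# Hypothetical raw scores
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = x.mean()          # mean of the raw scores (μ_x)
sigma = x.std()        # standard deviation (σ_x)
z = (x - mu) / sigma   # z-scores: mean 0, standard deviation 1

# The shape of the distribution is unchanged; only the scale differs.
```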
Nonlinear transformation
Nonlinear transformations change the original proportions (distances) between scores.
Such transformations include, for instance, logarithmic, exponential, quadratic or square root transformations.
These transformations are useful to model nonlinear relations between variables within the linear models framework
(i.e., the generalized linear model).
For instance, the relation between learning and performance is more logarithmic than linear,
while the relation between stress and performance is more quadratic in nature.
Example: logarithmic transformation
x_{i} are the raw scores (original distribution)
log( ) is the function transforming the scores
x'_{i} is the log-transformed distribution
[see subsection 1.2.1 about univariate graph for an example]
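A minimal sketch of a log transformation on made-up, strongly skewed scores; note how equal ratios in the raw scores become equal distances after transformation, which is precisely why proportions are not preserved:

```python
import numpy as np

x = np.array([1.0, 10.0, 100.0, 1000.0])  # hypothetical, strongly skewed scores
x_log = np.log10(x)                       # log-transformed distribution x'

# Raw distances: 9, 90, 900 (unequal); transformed distances: 1, 1, 1 (equal)
```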
Operationalisation and variable type
If the fundamental nature of a variable can't be changed once measured,
some important choices can nonetheless be made when designing the operationalisation.
For instance, "adult" vs. "children" is a poor operationalisation of the variable age.
Simply measuring age in years or months is a much more powerful operationalisation.
The same logic holds for many variables (e.g., extraversion, prices, intelligence, buying intentions).
As a general rule, and when the nature of the phenomenon allows it,
continuous variables with many possible values should be preferred.
Continuous variables can always be categorized if necessary (though it's often not recommended),
whereas the reverse is not true.
What are Variables? (statsoft.com)
Variable Types (statistics.laerd.com)
Statistical Distributions (pages.stern.nyu.edu)
Transformations (stattrek.com)
Distribution Fitting (en.wikipedia.org)
1.2. Graphs
Graphs are massively important to inspect and visualize data —
either for representing one variable at a time (univariate graphs, e.g., histogram)
or the relation between two variables (bivariate graphs, e.g., scatterplot).
Graphs allow the detection of unexpected distributions, extreme values (univariate or bivariate),
as well as errors in the data (e.g., impossible values).
Such problems often cannot be detected otherwise (or only with difficulty), in particular in big datasets.
1.2.1. Univariate graphs
Univariate graphs (i.e., one variable at a time) come in basically two kinds. One is the representation of the frequency of discrete observations (qualitative or quantitative variables with few possible values). The other is the representation of the distributions of continuous variables.
Qualitative/categorical variable
Example of Pie chart
Example of Barplot
Continuous variable
Examples of histograms
Examples of boxplots
1.2.2. Bivariate graphs
Bivariate graphs allow us to visualize relations between two variables: either two categorical variables, two continuous variables, or a mix of both.
Categorized plots
Categorized barplot
Categorized boxplot
Several quantitative variables
Scatterplot
Matrix scatterplot
Graphing Distributions (onlinestatbook.com)
Scatterplot (stattrek.com)
How to Compare Data Sets (stattrek.com)
1.3. Descriptive statistics
Descriptive statistics are useful to synthesize information about variables. Just as for graphs, descriptive statistics come in univariate and bivariate kinds. Descriptive statistics should always be used in conjunction with appropriate graphs.
1.3.1. Univariate statistics
Notions of interest here are frequency of observations, central tendency, and dispersion for a given variable.
Frequency tables
Frequency tables are useful to sum up qualitative variables or quantitative variables with a few values.
Example: number of children
Central tendency
Central tendency answers the question "Which values are the most frequent?".
Example: mean, median and mode for 3 variables
Dispersion
Dispersion answers the question "To what extent are values spread around the central tendency?".
Example: min./max., variance and quartiles
Variance's formula: σ^{2}_{x} = Σ (x_{i} – μ_{x})^{2} / N
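The central tendency and dispersion indices above can be sketched on a small made-up sample (n = 9):

```python
import numpy as np

x = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])  # hypothetical scores

# Central tendency
mean = x.mean()
median = np.median(x)
mode = np.bincount(x).argmax()             # most frequent value

# Dispersion
minimum, maximum = x.min(), x.max()
variance = x.var(ddof=1)                   # sample variance, denominator n - 1
q1, q2, q3 = np.percentile(x, [25, 50, 75])
```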
1.3.2. Bivariate statistics
This subsection introduces indices and tables useful to summarize relations between variables, such as cross tabulation and correlation.
Cross tabulation
Cross tabulation allows us to investigate relationships between qualitative variables or quantitative variables with a few values.
Example: sex and social class
Correlation and covariance
Covariance is a nonstandardized correlation; correlation is a standardized covariance. Both represent the strength of the association between two continuous variables. Both estimate the linear relation between the variables. (See also section 2.4.1.)
Covariance
Possible values of covariance range from –∞ to +∞. This makes the interpretation of covariance somewhat difficult. In some cases 34 could be large, while in other cases 4'000'000'000 could be small.
Possible values of correlation range from –1 (perfect negative correlation) to +1 (perfect positive correlation); 0 means no relation at all between the two variables. Correlations between several variables are classically represented in a triangular matrix.
Correlation
Example of correlations
Self-rated importance (imp) and usefulness (use)
of a compact (C) and reflex (R) camera. n=204.
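A sketch of the covariance/correlation link (the camera data are not reproduced in this document, so the paired scores below are hypothetical):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

cov = np.cov(x, y)[0, 1]      # covariance: scale-dependent, hard to interpret
r = np.corrcoef(x, y)[0, 1]   # correlation: always between -1 and +1

# Correlation is indeed a standardized covariance:
r_check = cov / (x.std(ddof=1) * y.std(ddof=1))
```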
Standard Deviation (en.wikipedia.org)
Summarizing Distributions (onlinestatbook.com)
Describing Bivariate Data (onlinestatbook.com)
Visualizing correlation (uvm.edu)
Disattenuating Correlation Coefficients (rasch.or
1.4. Statistical inference
Inference is the backbone of statistics, the very reason for its existence.
Nothing is absolutely certain or uncertain in statistics; you always have to deal with
the probability of something being true or false.
Specifically, we often want to know whether a given sample reflects some true property of a population
(e.g., to what extent a correlation found in a sample can be generalized to the population).
The general aim of statistical theory is to provide formal tools to deal with uncertainty.
If the basic notion of probability is somewhat intuitive, things can get blurry when dealing with specific
statistical notions (e.g., sampling distribution).
Thus the aim of this section is to introduce some general concepts of statistical inference,
which underlie virtually all statistical tests. (Specific tests are considered in parts 2 and 3.)
1.4.1. About sampling
All statistical decisions and estimations are based on samples. The aim of this first subsection is to provide an overview of the reasons for, and implications of, using samples to draw conclusions about the population.
Why sampling?
Studying the entire population is rarely an option. Many populations are huge, or even of unknown or infinite size. Thus research hypotheses have to be tested on samples.
Example of sampling
An important implication of sampling is sampling error: a sample will seldom be identical to the population.
Role of randomness
There are many different types of sampling, but most of them rely on randomness. If the sample is biased, no valid conclusion can be drawn.
Example of a nonrandom sample
If, for whatever reason, the different individuals of the population don't have an equal chance of being included in the sample, the sample will be biased.
Sampling error and sample size
Three examples of different sample sizes and their impact on the precision of estimation.
The underlying idea of these three examples is the following:
the bigger the sample, the better the estimation, and the higher the certainty.
(See the next subsection for more about this.)
In other terms, the probability of "bad luck" (i.e., a special sample very different from the population)
gets smaller and smaller as sample size increases.
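This idea can be checked by simulation (hypothetical population with 50% "black dots"): the spread of the sample estimates, i.e., the sampling error, shrinks as sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.5                      # population proportion of "black dots"

spread = {}
for n in (10, 100, 1000):
    # 10,000 simulated samples of size n, each giving one estimate of p_true
    estimates = rng.binomial(n, p_true, size=10_000) / n
    spread[n] = estimates.std()   # typical distance of an estimate from 0.5

# spread[10] > spread[100] > spread[1000]: bigger samples, better estimates
```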
1.4.2. Statistical decisions
We've just seen that there are "good" and "bad" samples
— i.e., some seemingly more representative of the population than others.
Since this is inevitable, one must have a solution, a criterion to decide whether or not
a sample can be considered representative beyond sampling error.
The statistical tools that help us make such decisions are
hypothesis testing, the p-value and the α level.
Null and alternative hypothesis (H_{0} and H_{1})
The basic notion underlying virtually all statistical tests is the null hypothesis,
where "null" conveys the idea of equifrequency, no difference between groups, no effect of treatment, no correlation, etc.
The alternative hypothesis represents the other possible outcome —
there is a difference between groups, there is an effect of the treatment, there is a correlation.
Note that this is a very "black and white" world here:
 either there is no difference (H_{0} is true);
 or there is a difference (H_{0} is false).
Correct decisions and error types
The null hypothesis is the one that is formally tested. This is so because it's simpler and more straightforward than testing the alternative hypothesis. Furthermore, on a more fundamental, epistemic level, a hypothesis cannot be proven to be true.
An analogy
                           H_{0} is true          H_{1} is true
                           (truly not guilty)     (truly guilty)
Accept null hypothesis     Right decision         Wrong decision
(acquittal)                                       (Type II error)
Reject null hypothesis     Wrong decision         Right decision
(conviction)               (Type I error)
But how do we formally take this decision?
P-value
The p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed,
assuming that the null hypothesis is true.
/!\ The p-value is not the probability of the null hypothesis being true.
Let's go back to our previous example (subsection 1.4.1.) to figure out what it means:
 in this case, the "sample statistic" would be the proportion of black dots in a given sample;
 the situation where "the null hypothesis is true" would be the distribution of many — infinitely many — samples drawn from a population with 50% black dots and 50% white dots, i.e., all three histograms of the example.
In this context, drawing a sample with 50% black dots is very likely (i.e., large p-value), while drawing a sample with 90% black dots is very unlikely (i.e., small p-value) — but not impossible!
Likelihood of samples
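Sticking with the black-dots example, the p-value can be approximated by simulating the sampling distribution under H_{0} (the sample size and seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20                                   # hypothetical sample size

# Sampling distribution of the proportion under H0 (population: 50% black dots)
props = rng.binomial(n, 0.5, size=100_000) / n

def p_value(observed):
    """Two-tailed p-value: how often chance alone gives a proportion
    at least as far from 0.5 as the observed one."""
    return np.mean(np.abs(props - 0.5) >= abs(observed - 0.5))

# p_value(0.50) is 1.0 (nothing is more typical than 50% black dots),
# while p_value(0.90) is tiny: such a sample is very unlikely under H0.
```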
At this point we face the following question: Where do we draw the line? When is "unlikely" closer to "almost impossible" than to "quite likely"?
Null hypothesis rejection, α level and significance
The basic rule of statistical decision goes as follows: when the p-value is small, we
reject the null hypothesis.
In other terms, we conclude that H_{0} is unlikely to be true.
Then, symmetrically, we conclude that H_{1} is likely to be true.
Ultimately, we assert that there is a difference between groups, an effect of the treatment, a nonzero correlation.
This is statistical significance.
But what is a small enough p-value to allow the rejection of the null hypothesis?
This is where arbitrary choices come into the picture.
By convention, one rejects the null hypothesis when the p-value is smaller than the significance level α (Greek alpha),
which is often 0.05 or 0.01 (5% or 1%).
Of course these are just arbitrary cutoff values.
In a more subtle perspective, we can say that the smaller the p-value is,
the more confident about our conclusion we can be.
This is why you should always report the exact p-value and not merely say "this is significant because p<.05".
One- and two-tailed tests
One last detail about testing the null hypothesis: depending on whether the alternative hypothesis is oriented, the test is one-tailed or two-tailed.
 unoriented hypothesis: H_{1} ≠ 0 → two-tailed test
 oriented hypothesis: H_{1} > 0 or H_{1} < 0 → one-tailed test
Illustration of one- and two-tailed tests
/!\ The decision about the hypothesis's orientation must be based on theory, not on data.
1.4.3. Statistical estimations
In statistics, significance is only one side of the story — the coarse side: there is a significant effect, or there is no significant effect. This is necessary information, but it is hardly enough. Other very important pieces of information are effect size and confidence interval.
Effect size
If we've rejected the null hypothesis, and thus concluded that some significant effect does exist, the notion of effect size informs us about the magnitude of this effect.
Example of 3 effect sizes
Indeed, for a given p-value, say .03, the effect size can be very different. This is because p-values are influenced by both effect size and sample size.
Precision of estimation and confidence interval
Every statistical test gives a point estimate (e.g., a mean difference, a correlation).
We've seen that such an estimate can be tested to decide whether or not it is significant.
There's one more thing we can consider: the confidence interval.
Confidence intervals are useful to give information about the reliability of the point estimate.
If the interval is large, the estimation is uncertain, not very reliable.
If the interval is small, the estimation is more certain and reliable.
Effect size and confidence intervals
The precision of estimations is a direct function of sample size (see above, end of subsection 1.4.1.). As shown below, the point estimate can be high or low, independently of the confidence interval's size.
Sampling (en.wikipedia.org)
Hypothesis Testing (en.wikipedia.org)
Statistical Significance (statsoft.com)
Effect size (en.wikipedia.org)
2. Bivariate Analyses
This section is about statistical tests such as the chi-squared test, the t-test, one-way ANOVA and simple linear regression. These tests focus on the relation between two variables. They provide information about whether the variables are significantly related, whether the relation is strong, and so on.
2.1. Chi-squared
There are two kinds of chi-squared test: the χ^{2} test of homogeneity (actually for one variable only) and the χ^{2} test of independence, which tests the relation between two variables. In both cases, variables should be qualitative (categorical).
2.1.1. Χ^{2} test of homogeneity
This test is useful to test the equifrequency of the values of a categorical variable
(uniform distribution).
For instance, the variable liked the movie, with possible values "yes" or "no".
Assumptions
 Simple random sample
 Observations are independent
 Expected frequencies should be greater than 5
(See the "more" links at the end of this section for what to do if assumptions are not met.)
Null hypothesis
Formally
There is equifrequency; the distribution is uniform.
Example
You ask people whether they liked a movie. Null hypothesis: you get an equal number of "yes" and "no". Alternative hypothesis: more people liked it OR more people did not.
Test statistic
Formula: χ^{2} = Σ (O_{i} – E_{i})^{2} / E_{i}
where
O_{i} = an observed frequency;
E_{i} = an expected frequency, asserted by H_{0};
N = the number of cells in the table.
Example
You get 114 "yes" and 90 "no". These are the observed frequencies.
The expected frequency, asserted by the null hypothesis, is (114+90)/2 = 102.
Thus, χ^{2} = ((114–102)^{2}+(90–102)^{2})/102 = 2.82
Degrees of freedom
df = N – 1, where
N is the variable's number of possible categories/values
Example (df and conclusion)
Here our variable has only two possible values.
Thus, df = 2–1 = 1.
For this χ^{2}(df=1) = 2.82, the p-value is .093.
Conclusion: we cannot reject H_{0}; there is no evidence of a difference.
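The whole example can be reproduced with scipy (equal expected frequencies are the default):

```python
from scipy.stats import chisquare

observed = [114, 90]           # "yes" / "no" counts from the example
chi2, p = chisquare(observed)  # H0: equifrequency (102 expected in each cell)

# chi2 is about 2.82 and p about .093: H0 cannot be rejected at alpha = .05
```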
Effect size
Cohen's w
w = √(χ^{2} / n), where n = sample size.
w is most often between 0 and 0.90.
Example
In the previous example, we couldn't reject the null hypothesis.
This means that, until further evidence (and/or a bigger sample),
we have to assume that the effect size is 0.
Hence calculating the effect size doesn't make any sense.
Let's take another example: say, whether people own a compact camera.
Imagine the data are "yes"=61 and "no"=29. In this case, χ^{2}(df=1) would be 11.38,
with an associated p-value of .00074.
Cohen's w would then be the square root of 11.38/90, that is, 0.35.
A fair-sized effect.
Yet another example. Say the data are now "yes"=1'005'000 and "no"=1'000'000.
χ^{2}(df=1) would be 12.47, p-value .00041.
Smaller than in the previous example, that is, more significant.
What about the effect size? The square root of 12.47/2'005'000 is only about 0.0025...
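This large-sample example can be checked numerically; it makes the p-value/effect-size dissociation very concrete:

```python
import math
from scipy.stats import chisquare

observed = [1_005_000, 1_000_000]    # the huge hypothetical sample from the text
chi2, p = chisquare(observed)
w = math.sqrt(chi2 / sum(observed))  # Cohen's w = sqrt(chi2 / n)

# Highly significant (p < .001), yet the effect size is negligible
```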
2.1.2. Χ^{2} test of independence
This test is useful to test the relation between two categorical variables.
Assumptions
 Simple random sample
 Observations are independent
 Expected frequencies should be greater than 5
(Same as for the χ^{2} test of homogeneity. See the "more" links at the end of this section for what to do if assumptions are not met.)
Null Hypothesis
Formally
The row variable is independent of the column variable; row percentages are equal.
Example
Test statistic
Formula: χ^{2} = Σ (O_{i,j} – E_{i,j})^{2} / E_{i,j}, with E_{i,j} = (R_{i} × C_{j}) / N
where
E_{i,j} = the expected frequency in a cell;
R_{i} = total of row i; C_{j} = total of column j; N = total sample size.
Example
Degrees of freedom: df = (R – 1) × (C – 1)
R = total number of rows; C = total number of columns.
Example (df and conclusion)
χ^{2}(df=1)=25.01, the associated p-value is < .000001. This is strongly significant: there is a relation between aspirin and heart attacks.
Effect size
Cramér's phi
φ = √(χ^{2} / (n(q – 1))), where
n = sample size;
q = R or C, whichever is less.
Example
Here, Cramér's phi is 0.0336, a very small effect size. Although the test statistic is highly significant (because of the very large sample size), the effect size is negligible.
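The text does not reproduce the raw counts, but the quoted χ^{2} of 25.01 and phi of 0.0336 match the classic Physicians' Health Study aspirin data, used here as a plausible reconstruction:

```python
import math
from scipy.stats import chi2_contingency

table = [[104, 10_933],   # aspirin:  heart attack / no heart attack
         [189, 10_845]]   # placebo:  heart attack / no heart attack

chi2, p, df, expected = chi2_contingency(table, correction=False)

n = sum(map(sum, table))
q = 2                                  # min(number of rows, number of columns)
phi = math.sqrt(chi2 / (n * (q - 1)))  # Cramér's phi

# chi2 ≈ 25.01, p < .000001, phi ≈ 0.034: highly significant, tiny effect
```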
Pearson's chisquared test (en.wikipedia.org)
p-value calculator for a Chi-Square test (danielsoper.com)
ChiSquare distribution (en.wikipedia.org)
Yates's correction for continuity (en.wikipedia.org)
McNemar's test (en.wikipedia.org)
Loglinear analysis (en.wikipedia.org)
2.2. T-test
The t-test focuses on the means of continuous variables: the difference between an observed and an expected mean, as well as the mean difference between two variables. Here we focus on the latter.
2.2.1. T-test for dependent samples
We test the difference between two means:
μ_{X1} – μ_{X2} = μ_{D}
where X_{1} and X_{2} are dependent or related (e.g., same person measured twice, married couples, twins, etc.).
Assumptions
 Normality of X_{1} and X_{2}, or at least of the difference scores
 X_{1}, X_{2} pairs are independent
 (The test is more powerful if ρ_{X1,X2} ≠ 0)
Assumption check
What if normality is not respected?
 Transform the data (see section 1.1.2);
 Use a nonparametric analysis (see "more", end of this section).
Hypotheses and test
Null hypothesis
Test statistic: t = D̄ / (s_{D} / √n)
Degrees of freedom: df = n – 1 (where n is the number of pairs)
Examples
Confidence interval (CI) and effect size
CI
Example for HE0HE1
Cohen's d
Example for HE0HE1
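A sketch of the whole procedure on hypothetical paired data (e.g., the same eight people measured twice):

```python
import numpy as np
from scipy.stats import ttest_rel

x1 = np.array([5.0, 7.0, 6.0, 8.0, 4.0, 6.0, 7.0, 5.0])  # first measurement
x2 = np.array([6.0, 8.0, 8.0, 9.0, 5.0, 6.0, 9.0, 7.0])  # second measurement

t, p = ttest_rel(x1, x2)   # paired t-test, df = n - 1 = 7

# Cohen's d for paired data: mean difference over its standard deviation
d = x1 - x2
cohens_d = d.mean() / d.std(ddof=1)
```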
2.2.2. T-test for independent samples
This also tests the difference between two means, but for independent samples (i.e., two groups of different/unrelated people).
Assumptions
 all X_{1i} and X_{2i} are independent
Assumption check
Levene's test of homogeneity of variance tests the null hypothesis of equality of variances.
(This test calculates, for each score in each group, the distance to the mean of the group.
Then it compares the mean distances of the two groups.)
If the p-value is not significant, variances can be considered equal.
(See below to know what to do if variances cannot be considered equal
or if other problems with assumptions occur.)
Hypotheses and test
Null hypothesis
Test statistic
Where
Degrees of freedom
Example
What if assumptions are not respected?
If the equality of variances is not respected, you can use the alternative test statistic t' with df' degrees of freedom. (Same principle and usage as for t, but with a different estimation.)
The t' statistic
Example
If other assumptions are not respected, transform the data (see section 1.1.2.) or use a nonparametric analysis (see "more", end of this section).
Confidence interval (CI) and effect size
CI
(Same principle as for the t-test for dependent samples.)
Example
Cohen's d
(Same principle as for the t-test for dependent samples.)
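A sketch on hypothetical data for two unrelated groups, including the Levene check and the Welch fallback t':

```python
import numpy as np
from scipy.stats import ttest_ind, levene

g1 = np.array([4.0, 5.0, 6.0, 5.0, 7.0, 6.0, 5.0, 4.0])  # group 1
g2 = np.array([6.0, 7.0, 8.0, 7.0, 9.0, 8.0, 6.0, 7.0])  # group 2

# Assumption check: Levene's test (H0: equal variances)
w_stat, p_levene = levene(g1, g2)

# Not significant -> variances considered equal -> standard pooled t-test
t, p = ttest_ind(g1, g2)

# Otherwise, Welch's t' with corrected df' would be used instead:
t_prime, p_prime = ttest_ind(g1, g2, equal_var=False)
```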
p-value calculator for a Student t-test (danielsoper.com)
Student's tdistribution (en.wikipedia.org)
Levene Test (en.wikipedia.org)
Wilcoxon signedrank test (en.wikipedia.org)
2.3. One-way ANOVA
Basically, ANOVA can be seen — among other things — as a generalization of the t-test: comparing multiple groups (different categories of a single factor) on a single continuous dependent variable.
2.3.1. Between-subjects design
If there are only two groups (i.e., the factor has two values),
this is strictly equivalent to the t-test for independent groups (between-subjects designs).
If there are more than two groups, this is a generalization: same basic principles, with a few estimation and interpretation differences.
Assumptions
 Normality of residuals in each group;
 Independence of observation/residuals;
 Homogeneity of variance of residuals in each group.
Null and alternative hypotheses
Formally
Example
Test statistic
Formula: F = MS_{Cond} / MS_{Resid}
Where
(a is the number of groups, N_{tot} is the number of observations)
Where
Where
i are subjects, j are conditions.
Degrees of freedom
(a is the number of groups, N_{tot} is the number of observations)
Example
Assumptions check (analysis of residuals)
Multiple comparisons, effect size and CI
Effect size associated with the F-test
Here partial eta-squared is 7.876/(7.876+34.3) = 0.187. This can be interpreted as 18.7% explained variance.
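A sketch of a between-subjects one-way ANOVA on made-up data for three groups, with partial eta-squared computed from the sums of squares:

```python
import numpy as np
from scipy.stats import f_oneway

g1 = [3.0, 4.0, 5.0, 4.0]   # hypothetical scores, condition 1
g2 = [5.0, 6.0, 7.0, 6.0]   # condition 2
g3 = [7.0, 8.0, 9.0, 8.0]   # condition 3

F, p = f_oneway(g1, g2, g3)

# Partial eta-squared = SS_cond / (SS_cond + SS_resid)
groups = [np.array(g) for g in (g1, g2, g3)]
grand = np.concatenate(groups).mean()
ss_cond = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_resid = sum(((g - g.mean()) ** 2).sum() for g in groups)
eta_sq = ss_cond / (ss_cond + ss_resid)
```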
Multiple comparisons
The F statistic and partial eta-squared are not what ANOVA is all about. We are often interested in performing specific group comparisons or in testing specific patterns of differences. That's what multiple comparisons are for.

There are different kinds of multiple comparisons:
 planned group comparisons (contrasts);
 post-hoc group comparisons;
 tests of polynomial contrasts.
(For practical reasons, numbers 1 and 2 are discussed in this section, while number 3 is discussed in the next subsection about within designs. However, any of these comparisons can be done in both designs.)
Both planned and post-hoc group comparisons (numbers 1 and 2 above) consist in comparing groups.
The basic principle stays the same: one group (or several) is compared to another (or several others);
either way, the null hypothesis is always that there is no difference.
The main difference between planned comparisons and
post-hoc comparisons concerns the p-values.
If you run a small set of educated comparisons, you can generally use raw, standard p-values
(see also "more", end of this section, about the notion of orthogonal comparisons).
Whereas if you run many blind comparisons, you have to correct the p-values for an additional "chance factor".
Several solutions exist; some of them are introduced in the following examples.
Example of planned group comparisons
Example of post-hoc group comparisons
2.3.2. Within-subjects design
If there are only two groups (i.e., the factor has two values), this is strictly equivalent to the t-test for dependent groups (within-subjects designs). If there are more than two groups, this is a generalization: same basic principles, with a few estimation and interpretation differences.
Assumptions
 Normality of residuals in each group;
 Independence of residuals;
 Homogeneity of variance and covariance in each group.
Null and alternative hypotheses
Formally
(Same as for between designs.)
Example
Test statistic
Test statistic
This is the same basic idea as for between-subjects designs. However, one main advantage of repeated-measures ANOVA is that you are able to partition out variability due to individual differences. In a between-subjects design there is an element of variance due to individual differences that is combined with the condition and residual terms:
SS_{Total} = SS_{Cond} + SS_{Resid}
In a repeated-measures design it is possible to account for these subject differences and partition them out from the condition and residual terms. In such a case, the variability can be broken down into between-conditions variability (or within-subjects effects) and within-conditions variability. The within-conditions variability can be further partitioned into between-subjects variability (individual differences) and residuals (excluding the individual differences).
SS_{Total} = SS_{Subjects} + SS_{Cond} + SS_{Resid}
As the total variance is better "explained" (i.e., decomposed into more specific parts than in between designs), the residual term gets smaller. This contributes to higher F values, thanks to a smaller denominator (the residual term) in the formula.
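The partition can be sketched on a small made-up data matrix (4 subjects × 3 conditions); with these artificial, perfectly regular scores, the residual term even drops to zero once individual differences are removed:

```python
import numpy as np

# Rows = subjects, columns = conditions (hypothetical scores)
data = np.array([[3.0, 5.0, 7.0],
                 [4.0, 6.0, 8.0],
                 [5.0, 7.0, 9.0],
                 [4.0, 6.0, 8.0]])
n, a = data.shape
grand = data.mean()

ss_total = ((data - grand) ** 2).sum()
ss_subj = a * ((data.mean(axis=1) - grand) ** 2).sum()  # individual differences
ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()  # condition effect
ss_resid = ss_total - ss_subj - ss_cond

# ss_total == ss_subj + ss_cond + ss_resid, by construction
```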
Mean squares and degrees of freedom
Where
a is the number of conditions;
n is the number of observations.
Example
Assumptions check and sphericity correction factor
Normality of difference scores
Greenhouse-Geisser (GG) correction factor
 If GG epsilon is about 1, use the standard F-test.
 If GG epsilon < .75, use the GG-corrected test.
 If GG epsilon > .75, use the HF-corrected test. (See below.)
Huynh-Feldt (HF) correction factor
Multiple comparisons, effect size and CI
Effect size associated with the F-test
Here partial eta-squared is 7.78/(7.78+76.8) = 0.092. This can be interpreted as 9.2% explained variance.
Multiple comparisons
As for between-subjects designs, the F statistic and partial eta-squared are not the whole story.
Specific group comparisons can be done, and with them tests of specific patterns of differences.
We've mentioned planned group comparisons (contrasts) and post-hoc group comparisons
in the previous subsection; we'll look here at tests of polynomial contrasts.
(However, remember that any of these comparisons can be done in both designs.)
Polynomial contrasts allow us to test specific patterns of differences between conditions. For instance:
 linear trend: a linear decrease or increase across conditions;
 quadratic trend: a U-shaped or inverted-U-shaped pattern across conditions;
 cubic trend: a wave-shaped (e.g., "~") pattern across conditions.
Example of polynomial contrasts
p-value calculator for an F-Test (danielsoper.com)
What is Orthogonal contrast? (talkstats.com)
Tukey's HSD (en.wikipedia.org)
Example of multiple comparison issues (neuroskeptic.blogspot.fr)
2.4. Simple Linear Regression
Regression is classically used to investigate the linear relation between two continuous variables.
However, regression can also be used to test a nonlinear relation, using a nonlinear transformation of the data
(see "more", end of section 1.1.).
Last, regression can also be used to test the mean difference between two groups (equivalent to the t-test for independent groups).
2.4.1. Basic principles
Assumptions
 Normality of residuals;
 Independence of observation/residuals;
 Homogeneity of variance;
 No extreme values.
These assumptions hold for all linear models, including ANOVA and the t-test.
Model
Estimated with the least squares method; see here for an intuitive illustration: http://hadm.sph.sc.edu/courses/J716/demos/leastsquares/leastsquaresdemo.html
Null and alternative hypotheses
Tests and estimations
Intercept
 Estimated by b_{0} in the model. (Also called "beta zero hat".)
 Statistical significance tested using the t-distribution.
 The intercept parameter is in itself a raw effect size.
 The confidence interval can be calculated using the three following elements: (1) the estimate of b, (2) the 97.5% quantile of the t-distribution, (3) the standard error of b.
Slope
 Estimated by b_{1} in the model. (Also called "beta one hat".)
 Statistical significance tested using the t-distribution.
 The slope parameter is in itself a raw effect size. A standardized version exists; it's called bz, b* or (rather confusingly) beta.
 The confidence interval can be calculated using the three following elements: (1) the estimate of b, (2) the 97.5% quantile of the t-distribution, (3) the standard error of b (same as for the intercept; see calculator in "more", end of this section). /!\ This method does not work for the standardized b.
R-squared
 Represents the variance of Y explained by X.
 Significance tested using the F-test. (In the case of simple linear regression, the F-test and the t-test for the slope are equivalent. This is no longer the case in multiple regression.)
 R-squared is by nature a standardized parameter.
 A confidence interval can also be computed (see "more", end of the section).
2.4.2. Examples
Two continuous variables
Graphs
Model
Assumptions check
Interpretation
 Intercept:
point estimate is 4.03; people with 0 on Energy should have 4.03 on Wellbeing.
This effect is strongly significant.
(However, this is a massive extrapolation from the actual data.)
 Slope:
point estimate is 0.38; that is, for every increase of one unit of Energy, Wellbeing increases by 0.38.
This effect is strongly significant (p=.003).
The 95% CI for this parameter would be [0.14; 0.62].
The standardized slope (here strictly equivalent to the correlation) is 0.31.
 R-squared: Energy explains 9.7% of the variance of Wellbeing. This is significant (see F-test). However, in the case of simple regression this does not bring additional information.
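A sketch of a simple regression with scipy on made-up scores (the Energy/Wellbeing data themselves are not reproduced here):

```python
import numpy as np
from scipy.stats import linregress

x = np.array([2.0, 3.0, 5.0, 6.0, 8.0, 9.0])   # hypothetical predictor (X)
y = np.array([4.5, 5.0, 5.5, 6.5, 7.0, 8.5])   # hypothetical criterion (Y)

res = linregress(x, y)
b0 = res.intercept           # predicted y for x = 0 (often an extrapolation)
b1 = res.slope               # change in y for a one-unit increase in x
r_squared = res.rvalue ** 2  # share of the variance of y explained by x
p_slope = res.pvalue         # tests H0: slope = 0
```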
Using a dummy variable as predictor
Graphs
See section 2.2.2. (boxplot).
Actually, this whole test is equivalent to the t-test!
Model
Assumptions check
Interpretation
 Intercept: point estimate is 36.56; people with 0 on Reflex camera possession (that is, people who do not own such a camera) rate the Importance at 36.56.
 Slope: point estimate is 21.6. People with 1 on Reflex camera possession (people who own such a camera) rate the Importance at 36.56+21.6=58.2.
 R-squared: variance explained is 16%.
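The regression/t-test equivalence can be sketched with a hypothetical dummy-coded dataset (0 = does not own a reflex camera, 1 = owns one):

```python
import numpy as np
from scipy.stats import linregress, ttest_ind

group = np.array([0, 0, 0, 0, 1, 1, 1, 1])    # dummy-coded predictor
rating = np.array([30.0, 35.0, 40.0, 41.0, 55.0, 60.0, 52.0, 57.0])

res = linregress(group, rating)
# Intercept = mean of group 0; intercept + slope = mean of group 1

t, p = ttest_ind(rating[group == 1], rating[group == 0])
# The t-test p-value equals the regression p-value for the slope
```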
Comparison with structural model of ANOVA
ANOVA models (and t-tests) are actually special cases of the linear regression model. The equations below (structural models of ANOVA) are indeed very similar to the equation of a regression model.
Between designs
Where
Within designs
Where
Hypothesis test for regression slope (stattrek.com)
Regression slope: confidence interval (stattrek.com)
Regression coefficient confidence interval calculator (danielsoper.com)
Confidence Interval for R square (danielsoper.com)
Cook's distance (en.wikipedia.org)
3. Multivariate Analyses
This section is about statistical tests such as factorial ANOVA and multiple linear regression. These tests allow the investigation of the impact of several IVs (predictors, Xs) on one DV (criterion, Y). With these possibilities come the notions of statistical interaction and mediation.
3.1. Interaction & mediation
For an introduction to mediation and interaction, see this document: http://www.unige.ch/fapse/mad/static/fuerst/continuousvariableinteraction.pdf (pp. 14).
Moderators and mediators (psych.unimelb.edu.au)
Interpreting interaction effects (jeremydawson.co.uk)
Mediation and indirect effect (en.wikipedia.org)
Example of mediation (victoria.ac.nz)
Indirect effect CI calculator (danielsoper.com)
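As a sketch of the indirect effect and its Sobel test (one common way to test mediation), assuming a simple X > M > Y setup; all data and path values below are synthetic and hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Synthetic mediation setup: X > M > Y (all names and values hypothetical).
n = 300
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)             # a-path: X predicts the mediator M
y = 0.4 * m + 0.1 * x + rng.normal(size=n)   # b-path plus a small direct effect

def ols(X, y):
    """OLS coefficients and their standard errors (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

# a-path: X -> M ; b-path: M -> Y controlling for X
ba, sa = ols(np.column_stack([np.ones(n), x]), m)
bb, sb = ols(np.column_stack([np.ones(n), x, m]), y)
a, se_a = ba[1], sa[1]
b, se_b = bb[2], sb[2]

# Sobel test of the indirect effect a*b
sobel_se = np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
z = (a * b) / sobel_se
print(f"indirect effect = {a*b:.3f}, Sobel z = {z:.2f}, p = {2*stats.norm.sf(abs(z)):.4f}")
```

Bootstrap confidence intervals are generally preferred over the Sobel test in practice, but the Sobel formula is the easiest to sketch by hand.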
3.2. Factorial ANOVA
This is a generalization of simpler ANOVA designs. Here we will specifically look at the 2x2 between design. However, further generalizations are possible (e.g., 2x2 within designs, 2x3 between designs, 2x2x2 designs, mixed designs).
3.2.1. Basic principles
In 2x2 between designs, three types of effect can be tested: (1) main effect of factor 1; (2) main effect of factor 2; (3) interaction effect between factor 1 and factor 2.
Assumptions
Always the same.
Model
Y_ijk = mu + alpha_j + beta_k + (alpha*beta)_jk + epsilon_ijk
Where alpha_j is the effect of condition j of factor 1, beta_k the effect of condition k of factor 2, (alpha*beta)_jk their interaction, and epsilon_ijk the error term.
Null and alternative hypotheses
You should be able to guess what the alternative hypotheses are! (Always the same story.)
Mean squares and degrees of freedom
df_factor1 = a - 1; df_factor2 = b - 1; df_interaction = (a - 1)(b - 1); df_error = ab(n - 1)
Where
a is the number of conditions of factor 1;
b is the number of conditions of factor 2;
n is the sample size (per cell).
Tests and estimations
In the (quite special) case of this 2x2 design:
- main effect 1: significance via the F-test associated with factor 1; effect size given by the corresponding partial eta-squared.
- main effect 2: significance via the F-test associated with factor 2; effect size given by the corresponding partial eta-squared.
- interaction effect: significance via the F-test associated with the interaction; effect size given by the corresponding partial eta-squared.
/!\ In more complicated designs (i.e., factors with more than two conditions), the F-test does not represent the difference between two groups (see section 2.3.).
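These tests can be computed by hand from the sums of squares. A minimal sketch for a balanced 2x2 between design, with synthetic data (all names and effect sizes made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Balanced 2x2 between design with n observations per cell (synthetic data).
n = 20
cells = {(a, b): rng.normal(10 + 2*a + 3*b + 2*a*b, 2.0, size=n)
         for a in (0, 1) for b in (0, 1)}

grand = np.mean(np.concatenate(list(cells.values())))
mean_a = {a: np.mean([cells[(a, b)] for b in (0, 1)]) for a in (0, 1)}
mean_b = {b: np.mean([cells[(a, b)] for a in (0, 1)]) for b in (0, 1)}

# Sums of squares for the two main effects, the interaction, and the error
ss_a = 2 * n * sum((mean_a[a] - grand) ** 2 for a in (0, 1))
ss_b = 2 * n * sum((mean_b[b] - grand) ** 2 for b in (0, 1))
ss_ab = n * sum((cells[(a, b)].mean() - mean_a[a] - mean_b[b] + grand) ** 2
                for a in (0, 1) for b in (0, 1))
ss_err = sum(((cells[(a, b)] - cells[(a, b)].mean()) ** 2).sum()
             for a in (0, 1) for b in (0, 1))

df_err = 4 * (n - 1)             # ab(n - 1), with a = b = 2 conditions
ms_err = ss_err / df_err
results = {}
for name, ss in [("factor 1", ss_a), ("factor 2", ss_b), ("interaction", ss_ab)]:
    F = ss / ms_err              # each effect has 1 df in a 2x2 design
    p = stats.f.sf(F, 1, df_err)
    eta_p = ss / (ss + ss_err)   # partial eta-squared
    results[name] = (F, p, eta_p)
    print(f"{name}: F(1,{df_err}) = {F:.2f}, p = {p:.4g}, partial eta^2 = {eta_p:.3f}")
```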
3.2.2. Example
Graphs
Model
Assumptions check
Interpretation
- There is an effect of the Selection training (SLC_FAV). People in the favorable group for Selection have higher rated creativity. This effect is highly significant (F(1,154) = 47.04, p < .000001). Effect size (partial eta-squared) is 0.23.
- There is an effect of the Generation training (GNR_FAV). People in the favorable group for Generation have higher rated creativity. This effect is highly significant (F(1,154) = 79.48, p < .000001). Effect size (partial eta-squared) is 0.34.
- There is an interaction effect (SLC_FAV*GNR_FAV): there is a multiplicative effect of Generation and Selection. This effect is highly significant (F(1,154) = 38.84, p < .000001). Effect size (partial eta-squared) is 0.20.
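Since partial eta-squared can be recovered from a reported F-test as F*df1 / (F*df1 + df2), the effect sizes above can be checked directly from the F statistics:

```python
# Recover partial eta-squared from a reported F-test:
# partial eta^2 = (F * df1) / (F * df1 + df2)
def partial_eta_squared(F, df1, df2):
    return (F * df1) / (F * df1 + df2)

# F values reported in the example above, each with df = (1, 154)
print(round(partial_eta_squared(47.04, 1, 154), 2))  # 0.23 (Selection)
print(round(partial_eta_squared(79.48, 1, 154), 2))  # 0.34 (Generation)
print(round(partial_eta_squared(38.84, 1, 154), 2))  # 0.2  (interaction)
```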
Multifactor betweensubjects designs (onlinestatbook.com)
2x2 factorial interaction plots and their interpretation (frank.itlab.us)
3.3. Multiple Regression
Multiple regression allows the prediction of one continuous dependent variable (criterion, Y) from multiple independent variables (predictors, Xs).
3.3.1. Basic principles
Assumptions
Always the same.
Additionally, one should also check that the intercorrelation (redundancy) between predictors is not massive: no independent variable should be strongly predicted by the others.
Model
Null and alternative hypotheses
Tests and estimations
Mostly similar to simple linear regression:
- Intercept: same estimation, same test, same CI. The interpretation differs a bit: the intercept is now the predicted score for a person with a score of 0 on all X variables.
- Slopes: same estimation, same test, same CI. The interpretation differs a bit: the slope for a given X variable is its effect over and above the other predictors. /!\ Standardized slopes are no longer equal to correlations here.
- R-squared: same estimation, same test, same CI. The R-squared now represents the variance explained by all predictors together.
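A minimal numpy-only sketch of a multiple regression fit, including the redundancy check via each predictor's R-squared on the others (the variance inflation factor, VIF); data and variable names are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: two correlated predictors and one criterion (hypothetical).
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.9, size=n)
y = 1.0 + 0.6 * x1 + 0.3 * x2 + rng.normal(size=n)

def ols_r2(X, y):
    """R-squared of an OLS fit (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

X = np.column_stack([np.ones(n), x1, x2])
print(f"R-squared (both predictors): {ols_r2(X, y):.3f}")

# Multicollinearity check: how well is each predictor explained by the others?
# VIF = 1 / (1 - R^2_j); values above ~10 signal problematic redundancy.
for j, name in [(1, "x1"), (2, "x2")]:
    others = np.delete(X, j, axis=1)
    r2_j = ols_r2(others, X[:, j])
    print(f"{name}: R^2 on other predictors = {r2_j:.3f}, VIF = {1 / (1 - r2_j):.2f}")
```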
3.3.2. Example
Graphs
Model
Assumptions check
Interpretation
You should be able to do it yourself! (See the first example of section 2.4.2. and the end of section 3.3.1.)
Goldfeld-Quandt test (en.wikipedia.org)
Regression (simple and multiple) (onlinestatbook.com)