Lecture 1: Basic Statistical Tools





## PowerPoint Slideshow about 'Lecture 1: Basic Statistical Tools' - branxton


## Presentation Transcript

### Lecture 1: Basic Statistical Tools


### Discrete and Continuous Random Variables

A random variable (RV) is an outcome (realization): not a set value, but rather a value drawn from some probability distribution.

A discrete RV x takes on one of the values X1, X2, …, Xk.

Its probability distribution is given by Pi = Pr(x = Xi), where Pi > 0 and Σi Pi = 1.

A continuous RV x can take on any possible value in some interval (or set of intervals). Its probability distribution is defined by the probability density function, p(x).


### Joint and Conditional Probabilities

The probability distribution for a pair (x, y) of random variables is specified by the joint probability density function, p(x, y).

The marginal density of x is p(x) = ∫ p(x, y) dy, and the conditional density of y given x is p(y | x) = p(x, y) / p(x).

These relationships among p(x), p(x, y), and p(y | x) can be summarized as p(x, y) = p(y | x) p(x).

x and y are said to be independent if p(x, y) = p(x) p(y). Note that p(y | x) = p(y) if x and y are independent.
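As a concrete sketch of these definitions, here is a small discrete joint distribution (the numbers are illustrative, not from the slides); the marginals, a conditional, and the independence check follow directly:

```python
# A small discrete joint distribution p(x, y) over x, y in {0, 1}
p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

# Marginal densities: p(x) = sum over y of p(x, y), and likewise for p(y)
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

# Conditional density: p(y | x=1) = p(1, y) / p(x=1)
p_y_given_x1 = {y: p_xy[(1, y)] / p_x[1] for y in (0, 1)}

# Independence check: does p(x, y) = p(x) p(y) hold for every pair?
independent = all(abs(p_xy[(x, y)] - p_x[x] * p_y[y]) < 1e-12
                  for x in (0, 1) for y in (0, 1))
print(p_x, p_y_given_x1, independent)
# p(x=0) is 0.4 but p(x=0)p(y=0) = 0.12 != 0.1 = p(0,0), so not independent
```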

### Bayes’ Theorem

Suppose an unobservable RV takes on values b1, …, bn. Suppose that we observe the outcome A of an RV correlated with b. What can we say about b given A?

Bayes’ theorem:

Pr(bj | A) = Pr(bj) Pr(A | bj) / Pr(A), where Pr(A) = Σi Pr(bi) Pr(A | bi)

A typical application in genetics is that A is some phenotype and b indexes some underlying (but unknown) genotype.

Example: suppose the genotype frequencies are Pr(QQ) = 0.5, Pr(Qq) = 0.3, and Pr(qq) = 0.2, with Pr(height > 70 | QQ) = 0.3, Pr(height > 70 | Qq) = 0.6, and Pr(height > 70 | qq) = 0.9. Then

Pr(height > 70) = 0.3·0.5 + 0.6·0.3 + 0.9·0.2 = 0.51

Pr(QQ | height > 70) = Pr(QQ) · Pr(height > 70 | QQ) / Pr(height > 70) = 0.5·0.3 / 0.51 = 0.294
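The slide’s arithmetic can be reproduced directly; this short Python sketch computes the posterior for all three genotypes:

```python
# Bayes' theorem for the genotype example: posterior Pr(genotype | height > 70)
freqs = {"QQ": 0.5, "Qq": 0.3, "qq": 0.2}    # prior Pr(genotype)
p_tall = {"QQ": 0.3, "Qq": 0.6, "qq": 0.9}   # Pr(height > 70 | genotype)

# Total probability of the observed phenotype (the denominator)
pr_A = sum(freqs[g] * p_tall[g] for g in freqs)

# Posterior for each genotype
posterior = {g: freqs[g] * p_tall[g] / pr_A for g in freqs}
print(round(pr_A, 2), round(posterior["QQ"], 3))  # 0.51 0.294
```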

### Expectations of Random Variables

The expected value, E[f(x)], of some function f of the random variable x is just the average value of that function:

E[f(x)] = Σi f(Xi) Pi (x discrete)

E[f(x)] = ∫ f(x) p(x) dx (x continuous)

E[x] = the (arithmetic) mean, μ, of a random variable x

E[(x − μ)²] = σ², the variance of x

More generally, the rth moment about the mean is given by E[(x − μ)^r]:

r = 2: variance
r = 3: skew
r = 4: (scaled) kurtosis

Useful properties of expectations: expectation is linear, so for a constant a, E[a] = a, E[a·x] = a·E[x], and E[x + y] = E[x] + E[y].
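A minimal numerical sketch of these definitions for a discrete RV (the distribution itself is illustrative):

```python
# Moments of a discrete RV by direct expectation: E[f(x)] = sum_i f(X_i) P_i
xs = [1, 2, 3, 4]
ps = [0.1, 0.2, 0.3, 0.4]

mean = sum(x * p for x, p in zip(xs, ps))                # E[x], the mean mu
var = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))   # 2nd moment about the mean
skew = sum((x - mean) ** 3 * p for x, p in zip(xs, ps))  # 3rd moment about the mean
print(mean, var, skew)  # mathematically mu = 3, variance = 1, 3rd moment = -0.6
```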

### The Normal (or Gaussian) Distribution

A normal RV with mean μ and variance σ² has density:

p(x) = (2πσ²)^(−1/2) exp(−(x − μ)² / (2σ²))

The mean μ is the peak of the distribution. The variance is a measure of spread: the smaller σ², the narrower the distribution about the mean.

### The Truncated Normal

Consider only values of T or above in a normal distribution, and let Pr(x > T) be the probability of exceeding the truncation point T. The density function of the truncated distribution is

p(x | x > T) = p(x) / Pr(x > T), for x ≥ T

The mean of the truncated distribution is

E[x | x > T] = μ + σ² · pT / Pr(x > T)

Here pT is the height of the normal density at the truncation point T. The variance of the truncated distribution is correspondingly smaller than σ².
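A quick numerical check of the truncated-normal mean, as a pure-Python sketch for the standard normal (my choice of μ = 0, σ = 1, T = 1); the brute-force value is just a Riemann sum over the upper tail:

```python
import math

def phi(x, mu=0.0, sigma=1.0):
    """Normal density p(x)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def upper_tail(T, mu=0.0, sigma=1.0):
    """Pr(x > T) for a normal, via the complementary error function."""
    return 0.5 * math.erfc((T - mu) / (sigma * math.sqrt(2)))

mu, sigma, T = 0.0, 1.0, 1.0

# Slide formula: E[x | x > T] = mu + sigma^2 * p_T / Pr(x > T),
# where p_T is the density height at the truncation point
mean_formula = mu + sigma ** 2 * phi(T, mu, sigma) / upper_tail(T, mu, sigma)

# Brute-force check: integrate x * p(x) over (T, T + 8 sigma), then renormalize
dx = 1e-4
mean_numeric = sum((T + (i + 0.5) * dx) * phi(T + (i + 0.5) * dx, mu, sigma)
                   for i in range(int(8 * sigma / dx))) * dx / upper_tail(T, mu, sigma)
print(mean_formula, mean_numeric)  # both approximately 1.525
```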

### Covariances

- Cov(x, y) = E[(x − μx)(y − μy)]
- Cov(x, y) = E[x·y] − E[x]·E[y]

Cov(x, y) > 0: positive (linear) association between x & y
Cov(x, y) < 0: negative (linear) association between x & y
Cov(x, y) = 0: no linear association between x & y

Cov(x, y) = 0 DOES NOT imply no association (for example, y = x² with x symmetric about zero has zero covariance but a perfect nonlinear association).

### Correlation

Cov = 10 tells us nothing about the strength of an association, because the covariance depends on the scales of x and y. What is needed is a scale-free measure of association. This is provided by the correlation, r(x, y):

r(x, y) = Cov(x, y) / √(Var(x) · Var(y))

r = 1 implies a perfect (positive) linear association

r = −1 implies a perfect (negative) linear association
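These definitions are easy to check numerically; a small Python sketch with made-up data, dividing by n as in the population definitions:

```python
import math

# Covariance and correlation from the definitions, using illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 7.0]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
cov = sum(a * b for a, b in zip(x, y)) / n - mx * my   # E[xy] - E[x]E[y]
var_x = sum((a - mx) ** 2 for a in x) / n
var_y = sum((b - my) ** 2 for b in y) / n
r = cov / math.sqrt(var_x * var_y)                     # scale-free correlation
print(round(cov, 3), round(r, 3))  # 2.0 0.87
```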

### Useful Properties of Variances and Covariances

- Symmetry: Cov(x, y) = Cov(y, x)
- The covariance of a variable with itself is the variance: Cov(x, x) = Var(x)
- If a is a constant, then Cov(a·x, y) = a·Cov(x, y)
- Var(a·x) = a²·Var(x), since Var(a·x) = Cov(a·x, a·x) = a²·Cov(x, x) = a²·Var(x)
- Cov(x + y, z) = Cov(x, z) + Cov(y, z)

More generally,

Var(x + y) = Var(x) + Var(y) + 2·Cov(x, y), and Var(Σi xi) = Σi Var(xi) + 2·Σi<j Cov(xi, xj)

Hence, the variance of a sum equals the sum of the variances ONLY when the elements are uncorrelated.
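The sum rule can be verified numerically; a sketch with illustrative data and population-style (divide-by-n) moments:

```python
# Check Var(x + y) = Var(x) + Var(y) + 2 Cov(x, y) on illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 7.0]
n = len(x)

def var(v):
    m = sum(v) / n
    return sum((vi - m) ** 2 for vi in v) / n

def cov(u, v):
    mu_, mv_ = sum(u) / n, sum(v) / n
    return sum((a - mu_) * (b - mv_) for a, b in zip(u, v)) / n

s = [a + b for a, b in zip(x, y)]
lhs = var(s)
rhs = var(x) + var(y) + 2 * cov(x, y)
print(lhs, rhs)  # both equal 8.64 (= 2 + 2.64 + 2*2.0) up to rounding
```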

### Regressions

Consider the best (linear) predictor of y given we know x:

ŷ = a + b(y|x) · x

The slope of this linear regression is a function of the covariance:

b(y|x) = Cov(x, y) / Var(x)

The fraction of the variation in y accounted for by knowing x, i.e. Var(ŷ)/Var(y), is r².

Relationship between the correlation and the regression slope:

r(x, y) = b(y|x) · √(Var(x) / Var(y))

If Var(x) = Var(y), then b(y|x) = b(x|y) = r(x, y). In this case, the fraction of variation accounted for by the regression is b².

### Properties of Least-squares Regressions

The slope and intercept obtained by least-squares minimize the sum of squared residuals:

Σi (yi − ŷi)² = Σi (yi − a − b·xi)²

- The regression line passes through the means of both x and y
- The average value of the residual is zero
- The LS solution maximizes the amount of variation in y that can be explained by a linear regression on x
- The fraction of variance in y accounted for by the regression is r²
- The residual errors around the least-squares regression are uncorrelated with the predictor variable x
- Residual variances may be homoscedastic (constant across x) or heteroscedastic (changing with x)
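A short sketch (illustrative data, divide-by-n moments) that computes the least-squares line from Cov and Var and checks the residual properties listed above:

```python
# Least-squares regression of y on x: slope = Cov(x,y)/Var(x),
# intercept chosen so the line passes through the means
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 7.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
var_x = sum((a - mx) ** 2 for a in x) / n
slope = cov_xy / var_x
intercept = my - slope * mx

resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
mean_resid = sum(resid) / n                                       # ~ 0
cov_resid_x = sum(e * (xi - mx) for e, xi in zip(resid, x)) / n   # ~ 0
print(round(slope, 3), round(intercept, 3))  # 1.0 1.4
```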

### Maximum Likelihood

p(x1, …, xn | θ) = density of the observed data (x1, …, xn) given the (unknown) distribution parameter(s) θ

Fisher (yup, the same one) suggested the method of maximum likelihood: given the data (x1, …, xn), find the value(s) of θ that maximize p(x1, …, xn | θ).

We usually express p(x1, …, xn | θ) as a likelihood function ℓ(θ | x1, …, xn) to remind us that it is dependent on the observed data.

The Maximum Likelihood Estimator (MLE) of θ is the value (or values) that maximizes the likelihood function ℓ given the observed data x1, …, xn.

The curvature of the likelihood surface ℓ(θ | x) in the neighborhood of the MLE informs us as to the precision of the estimator: a narrow peak = high precision, a broad peak = lower precision.

This is formalized by looking at the log-likelihood surface, L = ln[ℓ(θ | x)]. Since ln is a monotonic function, the value of θ that maximizes ℓ also maximizes L.

The second derivative of L measures its curvature, which is negative at a maximum. The variance of the MLE is given by the negative inverse of this curvature, evaluated at the MLE of θ:

Var(MLE) = −1 / (∂²L/∂θ²)

The larger the curvature, the smaller the variance.
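As a sketch of these ideas, take a Bernoulli success probability as θ (my choice of example, not the slide’s): maximize the log-likelihood on a grid, then estimate Var(MLE) from the curvature via finite differences:

```python
import math

# MLE of a Bernoulli probability p from n trials with k successes
n, k = 100, 37

def log_lik(p):
    """Log-likelihood L(p) = k ln p + (n - k) ln(1 - p)."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Maximize L on a fine grid; the maximizer is the MLE, k/n
p_hat = max((i / 1000 for i in range(1, 1000)), key=log_lik)

# Curvature (2nd derivative) of L at the MLE, by central finite differences
h = 1e-4
curvature = (log_lik(p_hat + h) - 2 * log_lik(p_hat) + log_lik(p_hat - h)) / h ** 2

var_mle = -1 / curvature   # matches the analytic value p_hat*(1 - p_hat)/n
print(p_hat, round(var_mle, 5))  # 0.37 0.00233
```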

### Likelihood Ratio Tests

Hypothesis testing in the ML framework occurs through likelihood-ratio (LR) tests. The test is based on the ratio of the maximum value of the likelihood function under the null hypothesis (typically r parameters assigned fixed values) to the maximum value under the alternative:

λ = [max of ℓ(θ | x) under the null] / [max of ℓ(θ | x) under the alternative]

LR = −2 ln λ

For large sample sizes, LR (generally) approaches a chi-square distribution with r df (r = number of parameters assigned fixed values under the null).
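An illustrative sketch (a Bernoulli example of my choosing): test H0: p = 0.5, which fixes r = 1 parameter, so LR is compared against a chi-square with 1 df:

```python
import math

# LR test of H0: p = 0.5 for n Bernoulli trials with k successes
n, k = 100, 37

def log_lik(p):
    return k * math.log(p) + (n - k) * math.log(1 - p)

p_hat = k / n                                # MLE under the alternative
LR = -2 * (log_lik(0.5) - log_lik(p_hat))    # -2 ln(lambda)
print(round(LR, 2))  # 6.84, above the chi-square (1 df) 5% cutoff of 3.84
```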

### Bayesian Statistics

An extension of likelihood is Bayesian statistics. Instead of simply producing a point estimate (e.g., the MLE), the goal is to estimate the entire distribution for the unknown parameter θ given the data x:

p(θ | x) = C · ℓ(x | θ) · p(θ)

Here p(θ | x) is the posterior distribution of θ given x, ℓ(x | θ) is the likelihood function for θ, p(θ) is the prior distribution for θ, and C is the appropriate constant so that the posterior integrates to one.

### Why Bayesian?

• Exact for any sample size

• Marginal posteriors

• Efficient use of any prior information

• MCMC (such as Gibbs sampling) methods
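A minimal sketch of a posterior computation, using a conjugate beta-binomial model (my choice of example; with a Beta prior the normalizing constant C is available in closed form, so no MCMC is needed):

```python
# Beta prior p(theta) = Beta(a, b); data: k successes in n Bernoulli trials.
# Conjugacy makes the posterior another Beta: p(theta | x) = Beta(a + k, b + n - k),
# so the entire posterior distribution is known, not just a point estimate.
a, b = 1.0, 1.0        # Beta(1,1) = uniform prior
n, k = 10, 7

post_a, post_b = a + k, b + n - k
post_mean = post_a / (post_a + post_b)   # posterior mean of theta
mle = k / n                              # point estimate, for comparison
print(post_mean, mle)  # roughly 0.667 vs 0.7
```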