
Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01

Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01. Professor William Greene Stern School of Business IOMS Department Department of Economics. Part 1 –Probability and Distribution Theory. 1 – Probability. Sample Space.





Presentation Transcript


  1. Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01 Professor William Greene Stern School of Business IOMS Department Department of Economics

  2. Part 1 – Probability and Distribution Theory

  3. 1 – Probability

  4. Sample Space • Random outcomes: The result of a process • Sequence of events • Number of events • Measurement of a length of time, space, etc. • Outcomes, experiments and sample spaces

  5. Consumer Choice: 4 possible ways a randomly chosen traveler might travel between Sydney and Melbourne. Ω = {Air, Train, Bus, Car}

  6. Market Behavior: Fair Isaacs credit card service to major vendors. Ω = {Reject, Accept}

  7. Measurement of Lifetimes • A box of light bulbs states “Average life is 1500 hours” • Outcome = length of time until failure (lifetime) of a randomly chosen light bulb. Ω = {lifetime | lifetime > 0}

  8. Events • Events are defined as • Subsets of sample space, such as the empty set ∅ • Intersection of related events • Complements such as “A” and “not A” • Disjoint sets such as (Train, Bus), (Air, Car) • Any subset including Ω is a disjoint union of subsets: Ω = (Air, Train) ∪ (Bus, Car)

  9. Probability is a Measure • The collection of events (subsets of the sample space Ω) is a σ-field: • Contains at least one nonempty subset (event) • Is closed under complementation • Is closed under countable union • Probability is a measure defined on all subsets of Ω • Axioms of Probability • P(Ω) = 1 • A ⊂ Ω ⇒ P(A) ≥ 0 • If A ∩ B = ∅, P(A ∪ B) = P(A) + P(B)

  10. Implications of the Axioms • P(~A) = 1 – P(A) as A ∪ ~A = Ω • P(∅) = 0 as ∅ = ~Ω and P(Ω) = 1 • A ⊂ B ⇒ P(A) ≤ P(B) as B = A ∪ (~A ∩ B) • P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
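
The implications on slide 10 can be checked numerically on the travel-mode sample space from slide 5. This is only a sketch: the probabilities assigned to each mode and the events A and B are hypothetical values chosen for illustration, not figures from the course.

```python
from fractions import Fraction

# Hypothetical probabilities for the four travel modes (illustration only).
prob = {
    "Air": Fraction(1, 2),
    "Train": Fraction(1, 5),
    "Bus": Fraction(1, 10),
    "Car": Fraction(1, 5),
}
omega = set(prob)  # the sample space


def P(event):
    """Probability of an event (a subset of omega): sum of its outcomes."""
    return sum((prob[w] for w in event), Fraction(0))


A = {"Air", "Train"}   # hypothetical event
B = {"Train", "Bus"}   # hypothetical event

print(P(omega))                      # total probability
print(P(omega - A), 1 - P(A))        # complement rule
print(P(A | B), P(A) + P(B) - P(A & B))  # addition rule
```

Exact fractions are used so that the identities hold with equality rather than up to rounding error.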

  11. Probability • Assigning probability: ‘Size’ of an event relative to size of sample space. • Counting rules for equally likely discrete outcomes • Using combinations and permutations to count elements • Example: Discrete uniform, poker hands • Example hypergeometric: the super committee (House 242R, 193D; Senate 49R, 51D&I) • Measurement for continuous outcomes

  12. Applications: Games of Chance; Poker • In a 5 card hand from a deck of 52, there are (52*51*50*49*48)/(5*4*3*2*1) different possible hands. (Order doesn’t matter). 2,598,960 possible hands. • How many of these hands have 4 aces? 48 = the 4 aces plus any of the remaining 48 cards.
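
The counts on slide 12 can be reproduced with the standard-library binomial coefficient. A minimal sketch; the variable names are illustrative.

```python
from math import comb

# Number of unordered 5-card hands from a 52-card deck.
total_hands = comb(52, 5)

# Hands containing all 4 aces: choose the 4 aces, then any 1 of the other 48 cards.
four_aces = comb(4, 4) * comb(48, 1)

# With equally likely hands, probability = count / total.
p_four_aces = four_aces / total_hands

print(total_hands, four_aces, p_four_aces)
```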

  13. Some Poker Hands • Full House – 3 of one kind, 2 of another. (Also called a “boat.”) • Royal Flush – Top 5 cards in a suit • Flush – 5 cards in a suit, not sequential • Straight Flush – 5 sequential cards in the same suit • Straight – 5 cards in a numerical row, not the same suit • 4 of a kind – plus any other card

  14. 5 Card Poker Hands

  15. The Dead Man’s Hand • The dead man’s hand is 5 cards: 2 aces, 2 8’s and some other 5th card. (Wild Bill Hickok was holding this hand when he was shot in the back and killed in 1876.) The number of hands with two aces and two 8’s is C(4,2) × C(4,2) × 44 = 6 × 6 × 44 = 1,584. • The rest of the story claims that Hickok held all black cards (the bullets). The probability for this hand falls to only 44/2,598,960. (The four cards in the picture and one of the remaining 44.) • Some claims have been made about the 5th card, but no one is sure – there is no record. http://en.wikipedia.org/wiki/Dead_man's_hand
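
The dead man's hand counts can be checked the same way. This sketch assumes the 5th card is any of the 44 cards that are neither aces nor 8's, which is the reading consistent with the totals 1,584 and 44 quoted on the slide.

```python
from math import comb

total_hands = comb(52, 5)

# Two of the 4 aces, two of the 4 eights, and a 5th card from the
# 44 cards that are neither an ace nor an eight (assumption stated above).
aces_and_eights = comb(4, 2) * comb(4, 2) * 44

# All-black version: exactly one way to pick the 2 black aces and
# one way to pick the 2 black eights, then any of the 44 other cards.
black_aces_and_eights = 1 * 1 * 44

print(aces_and_eights, aces_and_eights / total_hands)
print(black_aces_and_eights, black_aces_and_eights / total_hands)
```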

  16. Budget Supercommittee

  17. Conditional Probability • P(A|B) = P(A,B)/P(B) = Size of A relative to B, a subset of Ω • Basic result P(A,B) = P(A|B) P(B) (follows from the definition) • Bayes theorem • Applications – mammography, drug testing, lie detector test, PSA test.

  18. Using Conditional Probabilities: Bayes Theorem

  19. Drug Testing • Data • P(Test correctly indicates disease)=.98 (Sensitivity) • P(Test correctly indicates absence)=.95 (Specificity) • P(Disease) = .005 (Fairly rare) • Notation • + = test indicates disease, – = indicates no disease • D = presence of disease, N = absence of disease • Data: • P(D) = .005 (Incidence of the disease) • P(+|D) = .98 (Correct detection of the disease) • P(–|N) = .95 (Correct failure to detect the disease) • What are P(D|+) and P(N|–)? Note, P(D|+) = the probability that a patient actually has the disease when the test says they do.

  20. More Information • Deduce: Since P(+|D) = .98, we know P(–|D) = .02 because P(–|D) + P(+|D) = 1. [P(–|D) is P(False negative).] • Deduce: Since P(–|N) = .95, we know P(+|N) = .05 because P(–|N) + P(+|N) = 1. [P(+|N) is P(False positive).] • Deduce: Since P(D) = .005, P(N) = .995 because P(D) + P(N) = 1.

  21. Now, Use Bayes Theorem
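
The calculation on this slide did not survive in the transcript. A sketch of how Bayes theorem applies to the drug-testing data from slides 19 and 20 (sensitivity .98, specificity .95, incidence .005) follows; the variable names are illustrative.

```python
# Data from the drug-testing slides.
p_d = 0.005            # P(D): incidence of the disease
sensitivity = 0.98     # P(+|D)
specificity = 0.95     # P(-|N)

# Deduced quantities.
p_n = 1 - p_d                      # P(N)
p_pos_given_n = 1 - specificity    # P(+|N), false positive
p_neg_given_d = 1 - sensitivity    # P(-|D), false negative

# Total probability of each test result.
p_pos = sensitivity * p_d + p_pos_given_n * p_n
p_neg = specificity * p_n + p_neg_given_d * p_d

# Bayes theorem.
p_d_given_pos = sensitivity * p_d / p_pos
p_n_given_neg = specificity * p_n / p_neg

print(f"P(D|+) = {p_d_given_pos:.4f}")
print(f"P(N|-) = {p_n_given_neg:.5f}")
```

Under these inputs P(D|+) is only about 9%: because the disease is rare, most positive results come from the much larger disease-free group.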

  22. Independent events • Definition: P(A|B) = P(A) • Multiplication rule P(A,B) = P(A)P(B) • Application: Infectious disease transmission

  23. 2 – Random Variables

  24. Random Variable • Definition: Maps elements of the sample space to a single variable: • Assigns a number to each ω ∈ Ω • Discrete: Payoff to poker hands • Continuous: Lightbulb lifetimes • Mixed: Ticket sales with capacity constraints. (Censoring)

  25. Market Behavior: Fair Isaacs credit card service to major vendors • Ω = {Reject, Accept} • X = 0 if reject, 1 if accept

  26. Caribbean Stud Poker (table with columns: Sample Space, Probability, Variable)

  27. Features of Random Variables • Probability Distribution • Mass function: Prob(X = x) = f(x) • Density function: f(x), x = ... • Cumulative probabilities; CDF • F(x) = Prob(X ≤ x) • Quantiles: x such that F(x) = Q • Median: x = median, Q = 0.5.

  28. Discrete Random Variables • Elemental building block • Bernoulli: Credit card applications • Discrete uniform: Die toss • Counting Rules • Binomial: Family composition • Hypergeometric: House/Senate Supercommittee • Models • Poisson: Diabetes incidence, Accidents, etc.

  29. Market Behavior: Fair Isaacs credit card service to major vendors. X = 0 if reject, 1 if accept. Prob(X = x) = (1 – p)^(1 – x) p^x, x = 0, 1

  30. Binomial Sum of n Bernoulli trials
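
The formula on this slide is missing from the transcript. The standard binomial mass function for the number of successes in n independent Bernoulli(p) trials is Prob(X = k) = C(n,k) p^k (1 – p)^(n – k). A sketch follows; the values n = 4 and p = 0.5 are hypothetical and were chosen only to make the output easy to check by hand.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Prob(X = k) for X = sum of n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: n = 4 trials with success probability 0.5.
n, p = 4, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob_k in enumerate(pmf):
    print(k, prob_k)
```

With n = 1 the function reduces to the Bernoulli mass function (1 – p)^(1 – x) p^x.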

  31. Examples

  32. Poisson • Approximation to binomial • General model for a type of process

  33. Poisson Approximation to Binomial

  34. Diabetes Incidence per 1000 http://www.cdc.gov/diabetes/statistics/incidence/fig2.htm

  35. Poisson Distribution of Disease Cases in 1000 Draws with λ = 7
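
To see how close the Poisson approximation is in this setting, the sketch below compares Binomial(n = 1000, p = 0.007) with a Poisson distribution with mean 7. The value p = 0.007 is an assumption inferred from 1000 draws with a mean of 7; it is not stated on the slide.

```python
from math import comb, exp, factorial

n = 1000
p = 0.007          # assumed: mean 7 cases per 1000 draws
lam = n * p        # Poisson mean


def binomial_pmf(k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k):
    return exp(-lam) * lam**k / factorial(k)


# Largest pointwise gap between the two mass functions over k = 0..20.
max_gap = max(abs(binomial_pmf(k) - poisson_pmf(k)) for k in range(21))

for k in range(0, 15):
    print(k, round(binomial_pmf(k), 5), round(poisson_pmf(k), 5))
print("max gap:", max_gap)
```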

  36. Poisson Process: Doctor visits in the survey year by people in a sample of 27,326. λ = .8. The Poisson probability model is a description of this process, not an approximation.

  37. Continuous RV • Density function, f(x) • Probability measure P(event) obtained using the density. • Application: Lightbulb lifetimes?

  38. Probability Density Function; PDF

  39. CDF and Quantiles • pth quantile; 0 < p < 1 • Quantile = x_p such that F(x_p) = p. • x_p = F^(-1)(p). • For p = .5, x_p = median

  40. Model for Light Bulb Lifetimes This is the exponential model for lifetimes. The model is f(time) = (1/μ) e^(-time/μ)

  41. Model for Light Bulb Lifetimes The area under the entire curve is 1.0.

  42. Continuous Distribution The probability associated with an interval such as 1000 < LIFETIME < 2000 equals the area under the curve from the lower limit to the upper. A partial area will be between 0.0 and 1.0, and will produce a probability.

  43. Probability of a Single Value Is Zero The probability associated with a single point, such as LIFETIME=2000, equals 0.0.

  44. Probabilities via the CDF

  45. Probability for a Range of Values Based on CDF Prob(Life < 2000) (.7364) Minus Prob(Life < 1000) (.4866) Equals Prob(1000 < Life < 2000) (.2498)
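
The three numbers on slide 45 are consistent with the exponential lifetime model with mean μ = 1500 hours, the average life quoted on the light bulb box. A sketch, assuming that value of μ:

```python
from math import exp

mu = 1500.0  # assumed mean lifetime in hours (from the light bulb example)


def lifetime_cdf(t):
    """F(t) = Prob(Life <= t) under the exponential model with mean mu."""
    return 1.0 - exp(-t / mu) if t > 0 else 0.0


p_below_2000 = lifetime_cdf(2000)
p_below_1000 = lifetime_cdf(1000)
p_between = p_below_2000 - p_below_1000

print(round(p_below_2000, 4), round(p_below_1000, 4), round(p_between, 4))
```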

  46. Common Continuous RVs • Continuous random variables are all models; they do not occur in nature. The model builder’s toolkit: • Continuous uniform • Exponential • Normal • Lognormal • Gamma • Beta • Defined for specific types of outcomes

  47. Continuous Uniform • f(x) = 1/(b – a), a < x < b • F(x) = (x – a)/(b – a), a < x < b.

  48. Exponential • f(x) = λ exp(-λx), x > 0, 0 otherwise • F(x) = 1 – exp(-λx), x > 0 • Median: F(M) = .5 ⇒ 1 – exp(-λM) = .5 ⇒ exp(-λM) = .5 ⇒ –λM = ln .5 ⇒ M = –(ln .5)/λ = (ln 2)/λ
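
The median formula M = (ln 2)/λ can be checked numerically. The sketch below uses λ = 1/1500, which corresponds to the 1500-hour mean of the light bulb example; that choice is an assumption made for illustration.

```python
from math import exp, log

lam = 1 / 1500.0   # assumed rate: mean lifetime of 1500 hours


def exponential_cdf(x):
    """F(x) = 1 - exp(-lam * x) for x > 0, and 0 otherwise."""
    return 1.0 - exp(-lam * x) if x > 0 else 0.0


def exponential_quantile(q):
    """Inverse CDF: solve 1 - exp(-lam * x) = q for x, with 0 < q < 1."""
    return -log(1.0 - q) / lam


median = log(2) / lam

print(median, exponential_cdf(median), exponential_quantile(0.5))
```

The median (about 1040 hours) is below the mean of 1500 hours, as expected for a right-skewed distribution.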

  49. Gamma Density Uses the Gamma Function
