Advanced Statistical Methods: Beyond Linear Regression




### Advanced Statistical Methods: Beyond Linear Regression

John R. Stevens

Utah State University

Notes 1. Case Study Data Sets

Mathematics Educators Workshop

28 March 2009


http://www.stat.usu.edu/~jrstevens/pcmi

Why this workshop?
• Me …
• Outreach mission of USU
• Too much fun
• You …


Outline
• Notes 1: Case Study Data sets
• 1. Challenger Explosion
• 2. Beetle Fumigation
• 3. T-cell Cancer
• Notes 2: Statistical Methods I
• Logistic Regression – incl. Separation of Points
• EM Algorithm
• Notes 3: Statistical Methods II
• Tests for Differential Expression
• Multiple hypothesis testing
• Visualization
• Machine Learning
• Notes 4: Computer Implementation
• (Notes 5): Bonus Material


Case Study 1: Challenger
• The January 28, 1986 explosion prompted the Presidential Commission on the Space Shuttle Challenger Accident
• The Commission's 1986 report attributed the explosion to a burn-through of an O-ring seal at a field joint in one of the solid-fuel rocket boosters
• After each of the previous 24 launches, the solid rocket boosters were inspected, and the presence or absence of damage to the field joint was noted

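| Obs | Flight | Temp (°F) | Damage |
|----:|:-------|----------:|:-------|
| 1 | STS1 | 66 | NO |
| 2 | STS9 | 70 | NO |
| 3 | STS51B | 75 | NO |
| 4 | STS2 | 70 | YES |
| 5 | STS41B | 57 | YES |
| 6 | STS51G | 70 | NO |
| 7 | STS3 | 69 | NO |
| 8 | STS41C | 63 | YES |
| 9 | STS51F | 81 | NO |
| 10 | STS4 | 80 | (missing) |
| 11 | STS41D | 70 | YES |
| 12 | STS51I | 76 | NO |
| 13 | STS5 | 68 | NO |
| 14 | STS41G | 78 | NO |
| 15 | STS51J | 79 | NO |
| 16 | STS6 | 67 | NO |
| 17 | STS51A | 67 | NO |
| 18 | STS61A | 75 | YES |
| 19 | STS7 | 72 | NO |
| 20 | STS51C | 53 | YES |
| 21 | STS61B | 76 | NO |
| 22 | STS8 | 73 | NO |
| 23 | STS51D | 67 | NO |
| 24 | STS61C | 58 | YES |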

Challenger Data

Motivating question:

What was so different on the 25th launch?
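One quick way to look at the motivating question is to split the launches by temperature. The split uses only the 23 launches whose damage outcome was recorded. The Python sketch below transcribes the Challenger table and counts damaged launches on either side of a temperature cutoff. STS4, whose damage entry is blank, is stored as `None` and skipped. The 65°F cutoff and the helper name `damage_counts` are illustrative assumptions, not part of the original analysis.

```python
# Challenger field-joint data, transcribed from the Challenger table:
# (flight, launch temperature in degrees F, damage observed?).
# STS4's damage entry is blank in the table, so it is stored as None.
LAUNCHES = [
    ("STS1", 66, "NO"), ("STS9", 70, "NO"), ("STS51B", 75, "NO"),
    ("STS2", 70, "YES"), ("STS41B", 57, "YES"), ("STS51G", 70, "NO"),
    ("STS3", 69, "NO"), ("STS41C", 63, "YES"), ("STS51F", 81, "NO"),
    ("STS4", 80, None), ("STS41D", 70, "YES"), ("STS51I", 76, "NO"),
    ("STS5", 68, "NO"), ("STS41G", 78, "NO"), ("STS51J", 79, "NO"),
    ("STS6", 67, "NO"), ("STS51A", 67, "NO"), ("STS61A", 75, "YES"),
    ("STS7", 72, "NO"), ("STS51C", 53, "YES"), ("STS61B", 76, "NO"),
    ("STS8", 73, "NO"), ("STS51D", 67, "NO"), ("STS61C", 58, "YES"),
]


def damage_counts(launches, cutoff, below):
    """Return (damaged, total) among launches with a known damage status whose
    temperature is below the cutoff (below=True) or at/above it (below=False)."""
    group = [damage for _, temp, damage in launches
             if damage is not None and (temp < cutoff) == below]
    return sum(d == "YES" for d in group), len(group)


if __name__ == "__main__":
    cutoff = 65  # illustrative cutoff, not taken from the original analysis
    for below, label in ((True, "below"), (False, "at/above")):
        damaged, total = damage_counts(LAUNCHES, cutoff, below)
        print(f"{label} {cutoff}F: {damaged}/{total} launches with damage")
```

With a 65°F cutoff, this reports damage on 4 of 4 colder launches versus 3 of 19 warmer ones. A single hard cutoff is crude, however. Modeling the probability of damage as a smooth function of temperature is the job of logistic regression, listed in the outline under Notes 2.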

Case Study 2: Beetle Fumigation – Rhyzopertha dominica

(Image courtesy Clemson University – USDA Cooperative Extension Slide Series, www.insectimages.org)

Motivation
• Beetle: lesser grain borer
• A primary pest of stored grain
• A year-round problem in moderate climates
• Australian grain industry:
• $6–8 billion
• Zero tolerance for insect-infested grain
• Phosphine fumigant for control
• Some beetles have developed resistance levels more than 235 times greater than normal

(UQ News Online, 18 Oct. 1999)

Experimental Background
• Two DNA markers linked to resistance
• rp6.79: two genotypes: –, +
• rp5.11: three genotypes: B, H, A
• Motivating question: What contributes to the degree of resistance?
• Mixture of six beetle genotypes exposed to various concentrations of fumigant (48 hours)

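The slide's count of six genotypes matches crossing the two markers' genotypes (2 × 3 = 6). Assuming that reading, the short Python sketch below enumerates the combined genotypes. The `marker1/marker2` label format and the helper name `joint_genotypes` are hypothetical.

```python
from itertools import product

# Marker genotypes as listed on the slide ("-" stands for the slide's "–").
RP6_79 = ["-", "+"]        # rp6.79: two genotypes
RP5_11 = ["B", "H", "A"]   # rp5.11: three genotypes


def joint_genotypes(marker1, marker2):
    """Every combined genotype: one label per (marker1, marker2) pair."""
    return [f"{g1}/{g2}" for g1, g2 in product(marker1, marker2)]


genotypes = joint_genotypes(RP6_79, RP5_11)
print(len(genotypes), genotypes)
# 6 ['-/B', '-/H', '-/A', '+/B', '+/H', '+/A']
```

Under this assumption, each of the six groups would then be scored for survival at each fumigant concentration.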
Practical Considerations in Choosing Dosage
• Clearly a high dosage would kill all beetles, regardless of genotype
• Time is more important than concentration
• Expense → more time with a lower dose
• Technical limitations → maintaining concentration in silos
• Safety → risk of spontaneous combustion at high concentrations

Case Study 3: T-cell Cancer
• Acute lymphoblastic leukemia (ALL)
• leukemia – cancer of white blood cells
• ALL – excess of lymphoblasts (immature cells that become white blood cells)
• Two types of interest here:
• T-cell – manage cell-mediated immune response (activation of cells, release of cytokines)
• B-cell – manage humoral immune response (secretion of antibodies)
• Researchers used gene expression (microarray) technology

General assumption of microarray technology
• Use mRNA transcript abundance level as a measure of the level of “expression” for the corresponding gene
• That is, mRNA abundance is assumed to be proportional to the degree of gene expression

How to measure mRNA abundance?
• Several different approaches with similar themes:
• Affymetrix GeneChip
• NimbleGen array
• Two-color cDNA array
• and more
• Representation of genes on slide
• Small portion of gene
• Larger sequence of gene

Affymetrix Probes (oligonucleotide arrays)

[Figure: Affymetrix probes, 25 bp each]

(Images courtesy Affymetrix, www.affymetrix.com)

Affymetrix Technology – GeneChip
• Each spot on the array represents a single probe sequence (with millions of copies)
• Perfect match (PM): complementary to the target sequence
• Mismatch (MM): identical to the PM probe except for the middle (13th) base
• Each gene is represented by a unique set of probe pairs (usually 12–20 probe pairs per probe set)
• These probes are fixed to the array

(Image courtesy Affymetrix, www.affymetrix.com)
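Each gene is measured by a set of perfect-match (PM) and mismatch (MM) probe pairs, so the pair intensities must be combined into one number per gene. As a rough illustration only, the Python sketch below averages the PM - MM differences across one probe set. This loosely resembles the "average difference" summary of early Affymetrix software. The function name, the four-pair probe set, and the intensities are all hypothetical. The preprocessing behind data such as the ALL expression values may differ.

```python
from statistics import mean


def average_difference(pm, mm):
    """Naive probe-set summary: the mean of PM - MM over the probe pairs.

    pm and mm hold the perfect-match and mismatch intensities of each pair,
    in the same order (index i is one probe pair).
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal-length, non-empty PM and MM lists")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for a 4-pair probe set (real sets use more pairs).
pm = [1200.0, 950.0, 1430.0, 1100.0]
mm = [300.0, 410.0, 380.0, 250.0]
print(average_difference(pm, mm))  # 835.0
```

Subtracting MM is meant to remove nonspecific background. A plain mean is sensitive to a single aberrant probe, which is one reason practical summaries use more robust combinations.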

Affymetrix Technology – Expression

A tissue sample is prepared so that its mRNA carries fluorescent tags; the sample is then applied to the array and left to hybridize with the probes

(Images courtesy Affymetrix, www.affymetrix.com)

Affymetrix GeneChip

(Image courtesy Affymetrix, www.affymetrix.com)

Cartoon Representations
• Animation 1: GeneChip structure (1 min)
• Animation 2: Measuring gene expression (2.5 min)

Data: Spot Intensities

[Figure panels: Full Array Image; Close-up of Array Image]

(Images courtesy Affymetrix, www.affymetrix.com)

Basic goal of microarray technology
• “Observe” gene expression in different conditions, e.g., healthy vs. diseased
• Decide which genes’ expression levels change significantly between conditions
• Target those genes, e.g., to halt disease
• Study those genes to better understand differences at the genetic level

ALL Data

• “Preprocessed” gene expression data
• 12625 genes (probe sets on the hgu95av2 Affymetrix GeneChip)
• 128 samples (arrays)
• a matrix of “expression values”: 12625 rows (genes) × 128 columns (samples)
• phenotypic data on all 128 patients, including:
• 95 B-cell cancer
• 33 T-cell cancer
• Motivating question: Which genes are changing expression values systematically between B-cell and T-cell groups?
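The motivating question can be made concrete row by row. For every gene in the expression matrix, compare the B-cell columns with the T-cell columns using some two-sample statistic. The Python sketch below uses Welch's t statistic as one illustrative choice; the workshop's own tests for differential expression are the subject of Notes 3. The helper names and the 2-gene × 6-sample toy matrix are made up for illustration and are not the ALL data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic comparing group x with group y.

    Assumes each group has at least 2 values and not both variances are 0.
    statistics.variance uses the n - 1 (sample) denominator.
    """
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def t_per_gene(expr, labels, group_a="B", group_b="T"):
    """One statistic per gene (row); labels[j] is the group of sample (column) j."""
    cols_a = [j for j, lab in enumerate(labels) if lab == group_a]
    cols_b = [j for j, lab in enumerate(labels) if lab == group_b]
    return [welch_t([row[j] for j in cols_a], [row[j] for j in cols_b])
            for row in expr]


# Hypothetical toy data: 2 genes (rows) x 6 samples (columns), NOT the ALL data.
labels = ["B", "B", "B", "T", "T", "T"]
expr = [
    [5.0, 5.2, 4.8, 9.0, 9.4, 8.6],  # gene 1: higher in the T-cell samples
    [7.0, 7.1, 6.9, 7.0, 6.9, 7.1],  # gene 2: no shift between groups
]

if __name__ == "__main__":
    for i, t in enumerate(t_per_gene(expr, labels), start=1):
        print(f"gene {i}: t = {t:.2f}")
```

On the toy matrix, the first gene gets a large negative t (it is higher in the T-cell columns) and the second gets t = 0. With 12625 genes tested at once, many large |t| values would arise by chance alone. This is why multiple hypothesis testing appears alongside tests for differential expression in the outline.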
Next …
• Analysis for these case studies
• Build on known statistical methods
• Notice huge potential for additional methods