Hunting for Genes with Longitudinal Phenotype Data Using Stata

Stata Conference Boston 2010

July 15, 2010

John Charles “Chuck” Huber Jr, PhD

Assistant Professor of Biostatistics

Department of Epidemiology and Biostatistics

School of Rural Public Health

Texas A&M Health Science Center

jchuber@tamu.edu

Co-Authors
  • Michael Hallman, PhD (Principal Investigator)
  • Ron Harrist, PhD
  • Victoria Friedel, MA
  • Melissa Richard, MS
  • Huandong Sun

All at University of Texas School of Public Health

Motivation – Project Heartbeat!

Reference: Fulton, JE, Dai, S, Grunbaum, JA, Boerwinkle, E, Labarthe, R (1999) Apolipoprotein E affects serial changes in total and low-density lipoprotein cholesterol in adolescent girls: Project Heartbeat!. Metabolism 48(3): 285-290

Motivation
  • Human genetics studies in the 1990s tended to focus on family data – Project Heartbeat! was a population-based study (no relatives)
  • Genetic studies of unrelated individuals became popular in the 2000s
  • Genetic markers called Single Nucleotide Polymorphisms (SNPs) became cheap to ascertain on a very large scale
What is a SNP?

Hartl & Jones (1998) pg 9, Figure 1.5

What is a SNP?

Watson et al. (2004) pg 23, Figure 2.5

What is a SNP?
  • A SNP is a single nucleotide polymorphism (the individual nucleotides are called alleles)

Person 1 – Chromosome 1:  ataagtcgatactgatgcatagctagctgactgacgcgat

Person 1 – Chromosome 2:  ataagtccatactgatgcatagctagctgactgaagcgat

Person 2 – Chromosome 1:  ataagtccatactgatgcatagctagctgactgacgcgat

Person 2 – Chromosome 2:  ataagtcgatactgatgcatagctagctgactgaagcgat

SNP1: position 8 (g or c)

SNP2: position 35 (c or a)

Motivation

Stored Genotype Data

Blood samples and DNA available for 131 African-American and 505 non-Hispanic white children between 8 and 17 years of age.

Motivation

Stored Phenotype Data

Longitudinal measurements of:
  • Body Mass Index
  • Total Cholesterol
  • HDL & LDL Cholesterol
  • Systolic and Diastolic BP
  • Much, much more…

Motivation

Let’s go gene hunting!!!

Challenges
  • Longitudinal Data – PLINK or HelixTree?
  • Specialized genetic data analysis
  • Need to run a very large number of graphs and models
  • Multiple comparisons and replication
  • Scaling up to 100,000 SNP Chips
Longitudinal Phenotype Data

No PLINK….

No HelixTree….

No dice?

Longitudinal Phenotype Data
  • Stata is well equipped for longitudinal data
    • xtreg
    • xtgee
    • gllamm
    • xtmixed
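
As a rough illustration of the commands listed above, the sketch below fits the same growth model three ways on simulated data. Everything in it (the variable names id, age_c and bmi, the sample size, the effect sizes) is an assumption made for the example and has nothing to do with the Project Heartbeat! data. gllamm is left out because it is a user-written command that has to be installed separately.

```stata
* Simulated panel: 50 children, 6 visits each (all values are made up)
clear
set seed 12345
set obs 50
gen id = _n
gen u0 = rnormal()                     // child-specific intercept
gen u1 = rnormal(0, 0.3)               // child-specific slope
expand 6
bysort id: gen age_c = _n - 3.5 + 0.2*runiform()   // age centred near zero
gen bmi = 20 + u0 + (0.5 + u1)*age_c + rnormal()

xtset id                               // declare the panel identifier

* Random-intercept GLS
xtreg bmi age_c, re

* Population-averaged model with exchangeable working correlation
xtgee bmi age_c, family(gaussian) link(identity) corr(exchangeable)

* Mixed model with random intercept and random slope for age
xtmixed bmi age_c || id: age_c, covariance(unstructured)
```

Only xtmixed estimates a child-specific slope here, which is why it is the command used for the longitudinal SNP models later in the talk.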
Challenges
  • Longitudinal Data – PLINK or HelixTree?
  • Specialized genetic data analysis
  • Need to run a very large number of graphs and models
  • Multiple comparisons and replication
  • Scaling up to 100,000 SNP Chips
Genetic Data Analysis
  • Genotype Frequencies
  • Allele Frequencies
  • Hardy-Weinberg Equilibrium
  • Haplotype Reconstruction
  • Linkage Disequilibrium
  • TagSNPs
Stata for Genetic Data Analysis

2007 UK Stata Users Group meeting:

http://www.stata.com/meeting/13uk/

A brief introduction to genetic epidemiology using Stata

Neil Shephard, University of Sheffield

An overview of using Stata to perform candidate gene association analysis will be presented. Areas covered will include data manipulation, Hardy–Weinberg equilibrium, calculating and plotting linkage disequilibrium, estimating haplotypes, and interfacing with external programs.

User Written Genetics Commands

Programs written by David Clayton

  • ginsheet - Read genotype data from text files.
  • gloci - Make a list of loci.
  • greshape - Reshape a file containing genotypes to a file of alleles.
  • gtab - Tabulate allele frequencies within genotypes and generate indicators (performs Hardy-Weinberg Equilibrium testing).
  • gtype - Create a single genotype variable from two allele variables.
  • htype - Create a haplotype variable from allele variables.
  • mltdt - Multiple locus TDT for haplotype tagging SNPs (htSNPs).
  • origin - Analysis of parental origin effect in TDT trios.
  • pseudocc - Create a pseudo-case-control study from case-parent trios.
  • pscc - Experimental version of pseudocc in which there may be several groups of linked loci.
  • pwld - Pairwise linkage disequilibrium measures.
  • rclogit - Conditional logistic regression with robust standard errors.
  • snp2hap - Infer haplotypes of 2-locus SNP markers.
  • tdt - Classical TDT test.
  • trios - Tabulate genotypes of parent-offspring trios.
User Written Genetics Commands

Programs written by Adrian Mander

  • gipf - Graphical representation of log-linear models.
  • hapipf - Haplotype frequency estimation using an EM algorithm and log-linear modelling.
  • pedread - Reads a pedigree data file (in pre-Makeped LINKAGE format), similar to ginsheet
  • pedsumm - Summarises a pre-Makeped LINKAGE file that is currently in Stata's memory.
  • pedraw - Draws one pedigree in the graphics window
  • plotmatrix - Produces LD heatmaps displaying graphically the strength of LD between markers.
  • profhap - Calculates profile likelihood confidence intervals for results from hapipf
  • swblock - A step-wise hapipf routine to identify the parsimonious model to describe the Haplotype block pattern.
  • qhapipf - Analysis of quantitative traits using regression and log-linear modelling when phase is unknown.
  • hapblock - Attempts to find the edges of areas of high LD within a set of loci
User Written Genetics Commands

Programs written by Mario Cleves

  • gencc - Genetic case-control tests
  • genhw - Hardy-Weinberg Equilibrium tests
  • qtlsnp - A program for testing associations between SNPs and a quantitative trait.

Programs written by Catherine Saunders

  • co_power - Power calculations for Case-only study designs.
  • gei_matching -
  • geipower - Power calculations for Gene-Environment interactions.
  • ggipower - Power calculations for Gene-Gene interactions.
  • tdt_geipower - Power calculations for Gene-Environment interactions via TDT analysis.
  • tdt_ggipower - Power calculations for Gene-Gene interactions via TDT analysis.

Programs written by Neil Shephard

  • genass - Performs a number of statistical tests on your genotypic data and collates the results into a Stata formatted data set for browsing.
User Written Genetics Commands

Programs written by Roger Newson

  • multproc – multiple comparison procedures and False Discovery Rates
  • Far too many others to list…

Programs written by Chuck Huber

  • Accepted by the Stata Journal
    • phaseout – export genotype data to PHASE
    • phasein – import haplotype data from PHASE
    • haploviewout – export haplotype data in HaploView format for estimating and visualizing LD and other tasks
  • Forthcoming (when I have time to clean them up)
    • snpsumm – summarize allele/genotype frequencies and H-W equilibrium for large numbers of SNPs
    • manhattanplot – creates “Manhattan” plots from the results of a genome-wide association study (GWAS)
User Written Genetics Commands

Command: haplologit

Y. V. Marchenko, R. J. Carroll, D. Y. Lin, C. I. Amos, and R. G. Gutierrez (2008) Semiparametric analysis of case–control genetic data in the presence of environmental factors. The Stata Journal 8 (3): 305-333

User Written Genetics Commands

So Stata is well equipped for genetic data analysis!

Challenges
  • Longitudinal Data – PLINK or HelixTree?
  • Specialized genetic data analysis
  • Need to run a very large number of graphs and models
  • Multiple comparisons and replication
  • Scaling up to 100,000 SNP Chips
Looping Over Graphs and Models

Very simplistic structure of a model:

(12 Phenotypes) x (1753 SNPs) x (5 Candidate Models Each)

= 105,180 Models!

Looping Over Graphs and Models
  • Looping Over Lists
    • Code:

* LOOPING THROUGH A SINGLE LIST OF WORDS
local SnpList "rs2239560 rs7524046 rs35610691"
foreach snp of local SnpList {
    disp "Currently processing SNP `snp'"
}

    • Output:

Currently processing SNP rs2239560

Currently processing SNP rs7524046

Currently processing SNP rs35610691

Looping Over Graphs and Models
  • Code:

* LOOPING THROUGH TWO LISTS OF WORDS
local SnpList "rs2239560 rs7524046 rs35610691"
local PhenotypeList "bmi sbp tc"
foreach Phenotype of local PhenotypeList {
    foreach snp of local SnpList {
        disp "The outcome variable is `Phenotype' and the SNP is `snp'."
    }
}

  • Output:

The outcome variable is bmi and the SNP is rs2239560.

The outcome variable is bmi and the SNP is rs7524046.

The outcome variable is bmi and the SNP is rs35610691.

The outcome variable is sbp and the SNP is rs2239560.

The outcome variable is sbp and the SNP is rs7524046.

The outcome variable is sbp and the SNP is rs35610691.

The outcome variable is tc and the SNP is rs2239560.

The outcome variable is tc and the SNP is rs7524046.

The outcome variable is tc and the SNP is rs35610691.

Looping Over Graphs and Models
  • Lowess Curves for each Phenotype/SNP Combination:

* LOOPING THROUGH TWO LISTS OF WORDS
local SnpList "rs2239560 rs7524046 rs35610691"
local PhenotypeList "bmi sbp tc"
foreach Phenotype of local PhenotypeList {
    foreach snp of local SnpList {
        * One lowess curve per genotype, overlaid on the same axes
        twoway (lowess mean_`Phenotype' mean_age if `snp'=="AA", sort lcolor(red))   ///
               (lowess mean_`Phenotype' mean_age if `snp'=="AG", sort lcolor(green)) ///
               (lowess mean_`Phenotype' mean_age if `snp'=="GG", sort lcolor(blue))
        graph export Graph_`Phenotype'_`snp'.ps, as(ps) logo(off) replace
    }
}

Note: Postscript files can be easily combined in Adobe Acrobat Professional

Looping Over Graphs and Models
  • If we run many models, we need to be able to save the results to an output file.
  • Commands for writing to data files
    • postfile: creates an output data file and describes its structure
    • post: writes data to the output data file
    • postclose: closes the output data file
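
The three commands above fit together as follows. This is a minimal sketch, meant to be run as a do-file. The file name DemoOutput.dta and the variables item and square are placeholders chosen for the example.

```stata
* Minimal sketch: one row of results per loop iteration
tempname Results
postfile `Results' str16 item double square using "DemoOutput.dta", replace

foreach x in 1 2 3 {
    * each post call appends one observation to the output file
    post `Results' ("item`x'") (`x'^2)
}

postclose `Results'

* The posted results are now an ordinary Stata dataset
use "DemoOutput.dta", clear
list
```

Using a tempname for the post handle avoids clashes if an earlier run left a handle open under a fixed name.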
Looping Over Graphs and Models
  • Longitudinal model for each Phenotype/SNP Combination:

postfile Output str16 phenotype str16 snp chi2 using OutputFile.dta, replace
local SnpList "rs2239560 rs7524046 rs35610691"
local PhenotypeList "bmi sbp tc"
foreach Phenotype of local PhenotypeList {
    foreach snp of local SnpList {
        * Note: i.`snp' requires a numeric genotype variable (e.g. one created with encode)
        xtmixed `Phenotype' age i.`snp' c.age#i.`snp' || id: age, cov(unstruct)
        * e(chi2) is the Wald chi-squared for the model as a whole
        post Output ("`Phenotype'") ("`snp'") (e(chi2))
    }
}
postclose Output
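
A variation worth considering: e(chi2) tests all fixed effects at once, age included, so it is not specific to the SNP. The sketch below instead posts a joint Wald test of the SNP main effect and the SNP-by-age interaction, using testparm. It runs on simulated data with a string genotype variable, which is encoded to numeric because factor-variable notation does not accept strings. The data, the file name SnpTests.dta and the random-intercept-only model are assumptions made to keep the example small, not the model used in the study.

```stata
* Simulated data: 120 subjects, 5 visits each (all values invented)
clear
set seed 2010
set obs 120
gen id = _n
gen g = floor(3*runiform())                     // 0, 1 or 2 copies, at random
gen str2 SNP1 = cond(g==0, "AA", cond(g==1, "AG", "GG"))
gen u0 = rnormal()                              // subject-level intercept
expand 5
bysort id: gen age = 9 + _n + runiform()
gen bmi = 18 + 0.5*age + 0.8*g + u0 + rnormal()

tempname Results
postfile `Results' str16 phenotype str16 snp double chi2 double df double p ///
    using "SnpTests.dta", replace

foreach snp in SNP1 {
    encode `snp', generate(`snp'_n)             // string genotype -> numeric factor
    xtmixed bmi age i.`snp'_n c.age#i.`snp'_n || id:
    testparm i.`snp'_n c.age#i.`snp'_n          // joint test of all SNP terms
    post `Results' ("bmi") ("`snp'") (r(chi2)) (r(df)) (r(p))
    drop `snp'_n
}
postclose `Results'

use "SnpTests.dta", clear
list
```

Storing the p-value directly also gives the multiple-comparison step a ready-made input column.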

Challenges
  • Longitudinal Data – PLINK or HelixTree?
  • Specialized genetic data analysis
  • Need to run a very large number of graphs and models
  • Multiple comparisons and replication
  • Scaling up to 100,000 SNP Chips
Multiple Comparisons
  • In our study, we will be computing hundreds of thousands of p-values. How do we control for multiple comparisons?
    • False Discovery Rates
    • Replication in a second dataset
Multiple Comparisons
  • False Discovery Rate (FDR) methods are a family of adjustments for multiple comparisons, commonly used in large-scale genetics studies where the number of p-values regularly exceeds 500,000.
  • They calculate a threshold p-value for determining overall statistical significance, much like a Bonferroni correction.
False Discovery Rates

Reference: Benjamini, Y. & Hochberg Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57: 289-300

Copied from Benjamini & Hochberg (1995) page 291

Replication Data
  • Bogalusa Heart Study
    • Similar longitudinal study
    • Included children in the 8-17 age range
    • 478 African-American participants
    • 1081 non-Hispanic White participants
    • Same phenotypes
    • Same genotypes (more or less)
Multiple Comparisons

Strategy

  • Identify the SNPs in the Project Heartbeat! sample that meet the overall threshold for statistical significance using False Discovery Rates.
  • Run the significant SNPs with the Bogalusa data to check for replication of the results.
Challenges
  • Longitudinal Data – PLINK or HelixTree?
  • Specialized genetic data analysis
  • Need to run a very large number of graphs and models
  • Multiple comparisons and replication
  • Scaling up to 100,000 SNP Chips
Scaling up to 100,000 SNPs
  • Possible Strategies:
    • Read data from text files in “chunks” using the “infix” command.
    • Bribe Bill Gould with vast quantities of beer.
    • Other suggestions?
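
One possible reading of the first strategy: if the genotype file is fixed-width text, infix can pull in only some columns (SNPs) and some rows at a time, so no single pass has to hold every SNP in memory. The sketch below writes a tiny made-up file and reads back one SNP column for the first two rows. The file name, column positions and genotypes are all invented for illustration.

```stata
* Write a small fixed-width demo file (contents are invented)
tempname fh
file open `fh' using "demo_genotypes.txt", write text replace
file write `fh' "001 AG GG" _n
file write `fh' "002 GG GG" _n
file write `fh' "003 AA AG" _n
file write `fh' "004 AG AG" _n
file close `fh'

* Read one "chunk": the id plus the first SNP column, rows 1-2 only
infix id 1-3 str SNP1 5-6 using "demo_genotypes.txt" in 1/2, clear
list
```

In a real run, the column ranges and the in range would be stepped through in a loop, with the models for each chunk run and posted before the next chunk is read.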
Actual Analysis

Disclaimer:

  • Since this study is a work in progress, I have changed the gene and SNP names to protect the innocent.
Actual Analysis
  • Example Data

. list id age bmi SNP1 SNP2 SNP3 if id==1, sepby(id)

     +--------------------------------------------+
     | id        age     bmi   SNP1   SNP2   SNP3 |
     |--------------------------------------------|
  1. |  1   14.33812   26.07     AG     GG     GG |
  2. |  1    14.6694   27.06     AG     GG     GG |
  3. |  1   15.00616   28.33     AG     GG     GG |
  4. |  1   15.40041   28.78     AG     GG     GG |
  5. |  1   15.66324   29.76     AG     GG     GG |
  6. |  1   15.97536   29.29     AG     GG     GG |
  7. |  1   16.33128   28.28     AG     GG     GG |
  8. |  1   16.65435   29.85     AG     GG     GG |
  9. |  1   17.01848   28.52     AG     GG     GG |
 10. |  1   17.30595   27.96     AG     GG     GG |
 11. |  1   17.63997   28.28     AG     GG     GG |
     +--------------------------------------------+

Actual Analysis
  • Variable “Characteristics”

* EXAMPLE OF HOW TO ADD CHARACTERISTICS TO A VARIABLE AND EXTRACT THEM TO A LOCAL MACRO

char SNP1[chromosome] 7

char SNP1[gene] Gene1

char SNP1[position] 142702852

local TempChromosome : char SNP1[chromosome]

local TempGene : char SNP1[gene]

local TempPosition : char SNP1[position]

. disp "SNP1 is on Chromosome `TempChromosome', in `TempGene' at position `TempPosition'"

SNP1 is on Chromosome 7, in Gene1 at position 142702852
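
Characteristics become more useful inside a loop, because the annotation travels with each variable. The sketch below is a toy example of that pattern. The two variables and the chromosome and gene values attached to SNP2 are invented.

```stata
* Two toy genotype variables; SNP2's chromosome and gene are invented
clear
set obs 1
gen str2 SNP1 = "AG"
gen str2 SNP2 = "GG"
char SNP1[chromosome] 7
char SNP1[gene] Gene1
char SNP2[chromosome] 12
char SNP2[gene] Gene2

* Look up each variable's characteristics while looping over the SNPs
foreach snp of varlist SNP* {
    local chr  : char `snp'[chromosome]
    local gene : char `snp'[gene]
    display "`snp': chromosome `chr', `gene'"
}
```

The same lookup could feed the chromosome and position columns of a results file written with post.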

Actual Analysis

Lowess curve of BMI over age

Actual Analysis

Data checking with the “snpsumm” command:

. snpsumm SNP*, listgeno

Genotype Information
=================================================================
gen1, gen2 and gen3 are the genotypes
gencou~1, gencou~2, gencou~3 are the counts of each genotype
genfreq1, genfreq2 and genfreq3 are the genotype frequencies
=================================================================

     +----------------------------------------------------------------------------------------------------------+
     | Marker   gen1   gencou~1   genfreq1   gen2   gencou~2   genfreq2   gen3   gencou~3   genfreq3   gentotal |
     |----------------------------------------------------------------------------------------------------------|
  1. |   SNP1     AA         23     0.0375     AG        177     0.2887     GG        413     0.6737        613 |
  2. |   SNP2     AA         37     0.0605     AG        200     0.3268     GG        375     0.6127        612 |
  3. |   SNP3     AG          1          .     GG        612          .                  .          .          . |
  4. |   SNP4     AA         35     0.0571     AG        201     0.3279     GG        377     0.6150        613 |
  5. |   SNP5     AA        203     0.3524     AG        259     0.4497     GG        114     0.1979        576 |
     |----------------------------------------------------------------------------------------------------------|
  6. |   SNP6     AA         55     0.0899     AG        251     0.4101     GG        306     0.5000        612 |
  7. |   SNP7     AG          1          .     GG        612          .                  .          .          . |
  8. |   SNP8     AA          8     0.0131     AG        124     0.2023     GG        481     0.7847        613 |
  9. |   SNP9     AA         41     0.0669     AG        204     0.3328     GG        368     0.6003        613 |
 10. |  SNP10     AA         51     0.0833     AC        247     0.4036     CC        314     0.5131        612 |
     |----------------------------------------------------------------------------------------------------------|
 11. |  SNP11     AA         30     0.0489     AG        208     0.3393     GG        375     0.6117        613 |
     +----------------------------------------------------------------------------------------------------------+

Actual Analysis

Data checking with the “snpsumm” command:

. snpsumm SNP*, listallele

Allele Information
=================================================================
a1 and a2 are the alleles
acount1 and acount2 are the counts of each allele
afreq1 and afreq2 are the frequencies of each allele
maf is the Minor Allele Frequency
=================================================================

     +--------------------------------------------------------------------------+
     | Marker   a1   acount1   afreq1   a2   acount2   afreq2   atotal      maf |
     |--------------------------------------------------------------------------|
  1. |   SNP1    A       223   0.1819    G      1003   0.8181     1226   0.1819 |
  2. |   SNP2    A       274   0.2239    G       950   0.7761     1224   0.2239 |
  3. |   SNP3    G       614        .    G         .        .        .        . |
  4. |   SNP4    A       271   0.2210    G       955   0.7790     1226   0.2210 |
  5. |   SNP5    A       665   0.5773    G       487   0.4227     1152   0.4227 |
     |--------------------------------------------------------------------------|
  6. |   SNP6    A       361   0.2949    G       863   0.7051     1224   0.2949 |
  7. |   SNP7    G       614        .    G         .        .        .        . |
  8. |   SNP8    A       140   0.1142    G      1086   0.8858     1226   0.1142 |
  9. |   SNP9    A       286   0.2333    G       940   0.7667     1226   0.2333 |
 10. |  SNP10    A       349   0.2851    C       875   0.7149     1224   0.2851 |
     |--------------------------------------------------------------------------|
 11. |  SNP11    A       268   0.2186    G       958   0.7814     1226   0.2186 |
     +--------------------------------------------------------------------------+

Actual Analysis

Data checking with the “snpsumm” command:

. snpsumm SNP*, listhw

Hardy-Weinberg Equilibrium Information
=================================================================
maf is the Minor Allele Frequency
hw_c2 is the Pearson Chi-squared
hw_c2p is the Pearson Chi-Squared p-value
hw_lr is the Likelihood Ratio Chi-squared
hw_lrp is the Likelihood Ratio Chi-Squared p-value
hw_ex is the Exact p-value
=================================================================

     +------------------------------------------------------------+
     | Marker      maf   hw_c2   hw_c2p   hw_lr   hw_lrp    hw_ex |
     |------------------------------------------------------------|
  1. |   SNP1   0.1819    0.54   0.4605    0.53   0.4662   0.4965 |
  2. |   SNP2   0.2239    2.17   0.1407    2.10   0.1470   0.1618 |
  3. |   SNP3        .       .        .       .        .        . |
  4. |   SNP4   0.2210    1.40   0.2363    1.37   0.2425   0.2410 |
  5. |   SNP5   0.4227    3.57   0.0589    3.56   0.0591   0.0605 |
     |------------------------------------------------------------|
  6. |   SNP6   0.2949    0.12   0.7316    0.12   0.7321   0.7705 |
  7. |   SNP7        .       .        .       .        .        . |
  8. |   SNP8   0.1142    0.00   0.9979    0.00   0.9979   1.0000 |
  9. |   SNP9   0.2333    2.98   0.0844    2.88   0.0896   0.0901 |
 10. |  SNP10   0.2851    0.06   0.8050    0.06   0.8053   0.8427 |
     |------------------------------------------------------------|
 11. |  SNP11   0.2186    0.03   0.8670    0.03   0.8673   0.9058 |
     +------------------------------------------------------------+
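
The SNP1 row can be checked by hand from the genotype counts that snpsumm reported for it (23 AA, 177 AG, 413 GG). The sketch below is the textbook Pearson goodness-of-fit test against Hardy-Weinberg proportions, with 1 degree of freedom. Whether snpsumm computes hw_c2 in exactly this way is an assumption, but the numbers agree with the row shown: maf 0.1819, chi-squared about 0.54, p-value about 0.46.

```stata
* Genotype counts for SNP1 as listed in the snpsumm output
scalar nAA = 23
scalar nAG = 177
scalar nGG = 413
scalar n   = nAA + nAG + nGG

* Allele frequencies: each AA carries two A alleles, each AG carries one
scalar pA = (2*nAA + nAG) / (2*n)
scalar pG = 1 - pA

* Expected genotype counts under Hardy-Weinberg equilibrium
scalar eAA = n * pA^2
scalar eAG = n * 2 * pA * pG
scalar eGG = n * pG^2

* Pearson chi-squared and its p-value (1 degree of freedom)
scalar hw_c2  = (nAA-eAA)^2/eAA + (nAG-eAG)^2/eAG + (nGG-eGG)^2/eGG
scalar hw_c2p = chi2tail(1, hw_c2)

display "maf = " %6.4f min(pA, pG) "   chi2 = " %5.2f hw_c2 "   p = " %6.4f hw_c2p
```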

Actual Analysis
  • Reconstructing haplotypes using PHASE without leaving Stata!

local PositionList "142702852 142736196 142747932 etc......."
phaseout SNP*, idvar(id) filename("Gene1.inp") position(`PositionList')
shell PHASE -S1234 Gene1.inp Gene1.out 100 1 100
clear
phasein Gene1.out, markers("MarkerList.txt") positions("PositionList.txt")

What is a Haplotype?
  • A haplotype is the combination of one or more alleles found on the same chromosome
    • Person 1 has a “gc” haplotype and a “ca” haplotype
    • Person 2 has a “cc” haplotype and a “ga” haplotype

Person 1 – Chromosome 1:  ataagtcgatactgatgcatagctagctgactgacgcgat

Person 1 – Chromosome 2:  ataagtccatactgatgcatagctagctgactgaagcgat

Person 2 – Chromosome 1:  ataagtccatactgatgcatagctagctgactgacgcgat

Person 2 – Chromosome 2:  ataagtcgatactgatgcatagctagctgactgaagcgat

SNP1: position 8 (g or c)

SNP2: position 35 (c or a)

Actual Analysis

The resulting haplotypes are back in Stata:

. list id haplotype SNP1 SNP2 SNP3 in 1/10, sepby(id)

     +---------------------------------------+
     | id     haplotype   SNP1   SNP2   SNP3 |
     |---------------------------------------|
  1. |  1   AGGGAAGGGCG      A      G      G |
  2. |  1   GGGGAGGGGAG      G      G      G |
     |---------------------------------------|
  3. |  2   GGGGAGGGGCG      G      G      G |
  4. |  2   GGGGAGGGGCA      G      G      G |
     |---------------------------------------|
  5. |  3   GGGGAGGGGAG      G      G      G |
  6. |  3   GGGAGAGGGCA      G      G      G |
     |---------------------------------------|
  7. |  4   AGGGAGGGGAG      A      G      G |
  8. |  4   GGGAGGGAGCG      G      G      G |
     |---------------------------------------|
  9. |  5   GGGGAGGGGAG      G      G      G |
 10. |  5   GAGGGGGGACG      G      A      G |
     +---------------------------------------+

Actual Analysis

haploviewout SNP*, idvariable(id) filename("Gene1") poslabel

Actual Analysis

. multproc, pval(pvalue) meth(simes) rank(FDR_rank) critical(FDR_critical) reject(FDR_reject)

. list Chromosome Position pvalue FDR_rank FDR_critical FDR_reject in 1/22, sepby(FDR_reject)

     +------------------------------------------------------------------+
     | Chromo~e   Position     pvalue   FDR_rank   FDR_cri~l   FDR_re~t |
     |------------------------------------------------------------------|
  1. |        3   1.60e+08   .0000372          1   .00003266        Yes |
  2. |        8   4.59e+07   .0000465          2   .00006532        Yes |
  3. |       12   9.73e+07   .0000529          3   .00009798        Yes |
  4. |        7   7.08e+07   .0000661          4   .00013063        Yes |
  5. |        3   2.02e+08   .0000701          5   .00016329        Yes |
  6. |       11   3.02e+07   .0001106          6   .00019595        Yes |
  7. |        2   2.15e+07   .0001391          7   .00022861        Yes |
  8. |        5   9.80e+07   .0001418          8   .00026127        Yes |
  9. |        4    9229619   .0002013          9   .00029393        Yes |
 10. |        2   2.02e+08   .0002179         10   .00032658        Yes |
 11. |        5   1.39e+08   .0002698         11   .00035924        Yes |
 12. |       18   5.69e+07   .0003339         12    .0003919        Yes |
 13. |       12   8.07e+07   .0003429         13   .00042456        Yes |
 14. |       16   5.66e+07   .0004299         14   .00045722        Yes |
 15. |        3    9249973   .0004815         15   .00048988        Yes |
 16. |        9   1.43e+08   .0005735         16   .00052253        Yes |
 17. |       19   4.66e+07   .0005778         17   .00055519        Yes |
 18. |        8   2.29e+08   .0006019         18   .00058785        Yes |
 19. |       13   4.65e+07   .0006124         19   .00062051        Yes |
     |------------------------------------------------------------------|
 20. |        1   4.39e+07   .0007301         20   .00065317         No |
 21. |        5   1.52e+08    .000731         21   .00068583         No |
 22. |        8   4.88e+07   .0007519         22   .00071848         No |
     +------------------------------------------------------------------+

This continues for all 1753 SNPs
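
The listing shows why this is called a step-up rule: rows 1 and 16 to 18 have p-values above their own critical value yet are still rejected, because row 19 falls below its threshold and every SNP ranked above it is rejected along with it. The sketch below applies that rule by hand to the 22 listed p-values. It assumes alpha = 0.05 and 1,531 tests, which is inferred from the critical values in the listing (each equals rank × 0.05 / 1531) and is not stated in the talk, and it assumes none of the unlisted p-values meets its threshold. It is a hand calculation, not multproc's own code.

```stata
* The 22 smallest p-values from the listing above
clear
input double pvalue
.0000372
.0000465
.0000529
.0000661
.0000701
.0001106
.0001391
.0001418
.0002013
.0002179
.0002698
.0003339
.0003429
.0004299
.0004815
.0005735
.0005778
.0006019
.0006124
.0007301
.000731
.0007519
end

local alpha = 0.05      // assumed overall FDR level
local m     = 1531      // assumed number of tests (inferred, see lead-in)

sort pvalue
gen rank = _n
gen double critical = rank * `alpha' / `m'

* Step-up: find the largest rank whose p-value is at or below its critical value
gen byte below = pvalue <= critical
egen maxrank = max(rank * below)

* Reject every hypothesis ranked at or above that point in the sorted list
gen byte reject = rank <= maxrank
list rank pvalue critical below reject, sepby(reject)
count if reject
```

Under these assumptions the count is 19, matching the Yes/No split in the listing.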

Actual Analysis

manhattanplot pvalue Chromosome Position, critical(`FDR_cutoff')

This is VERY heavily based on code by Stephen Turner and Will Bush of Vanderbilt University

http://gettinggeneticsdone.blogspot.com/2010/01/genome-wide-manhattan-plots-in-stata.html

Summary

Stata is a very useful platform for doing longitudinal genome-wide association studies!

Acknowledgements
  • Grant 1-R01DK073618-02 from the National Institute of Diabetes and Digestive and Kidney Diseases
  • Michael Hallman, PhD
    • Assistant Professor of Epidemiology, UTSPH-Houston
  • Ron Harrist, PhD
    • Associate Professor of Biostatistics, UTSPH-Austin
  • Eric Boerwinkle, PhD
    • Professor and Director of the Division of Epidemiology
    • Kozmetsky Family Chair in Human Genetics, UTSPH-Houston
  • Darwin Labarthe, MD, PhD, MPH
    • Director of the Division for Heart Disease and Stroke Prevention, CDC-Atlanta
References
  • Barrett, J., Fry, B., Maller, J., & Daly, M. (2005). Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21, 263-265.
  • Hartl, D.L., Jones, E.W. (1998) Genetics: Principles and Analysis, 4th Ed. Jones & Bartlett Publishers
  • Stephens, M., & Donnelly, P. (2003). A Comparison of Bayesian Methods for Haplotype Reconstruction from Population Genotype Data. American Journal of Human Genetics, 73, 1162–1169.
  • Stephens, M., Smith, N. J., & Donnelly, P. (2001). A New Statistical Method for Haplotype Reconstruction from Population Data. American Journal of Human Genetics, 68, 978–989.
  • Watson, J.D., Baker, T.A., Bell, S.P., Gann, A., Levine, M., Losick, R. (2004) Molecular Biology of the Gene, 5th Ed. Benjamin Cummings