The R genetics package: Tools for statistical genetics

Gregory R. Warnes

Associate Director

NonClinical Statistics

Pfizer Global R&D

Groton CT

Outline
  • Project Goals
    • Simplify Population Genetic Analysis
  • Design Details
    • Extend R ‘Factor’ objects
  • Functions Included
    • Genetic data: Importing & Creation, Manipulation, Information, Annotation, Transformation, Export
    • Statistical Functions: Hardy-Weinberg (Dis-)Equilibrium, Linkage Disequilibrium, Haplotype Imputation, Sample-size tools
  • Simple Examples
    • Creating Genotype Objects
  • Example Session
  • Future Development:
    • Emulate BioConductor Project
    • Large scale SNP analysis
    • Formal Object Class
    • Multi-team collaboration

CT ASA Mini Conference: 2005-03-05

Problem
  • At each genetic position within a gene, diploid cells have two alleles.
  • This suggests storing each allele as a separate variable.
  • However, most laboratory methods cannot distinguish between A/B and B/A, yielding three observed genotypes at each position: (A/A), (A/B or B/A), (B/B). Consequently, the observed alleles are confounded.

⇒ This suggests the use of a single genotype variable.

  • This duality is not directly handled by standard statistical packages.

⇒ As a consequence, the need to handle both views creates complexity when manipulating genotype data or including it in statistical analysis.
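
The two views can be illustrated in a few lines of base R. This is only a sketch: the names a1, a2, and geno are made up for illustration, and it is not how the genetics package stores data. Two allele vectors keep an arbitrary order, while sorting each pair collapses A/C and C/A into one observed genotype.

```r
# Illustrative only: base R, not the genetics package's internal representation.
# View 1: two allele variables per individual (the order within a pair is arbitrary).
a1 <- c("A", "A", "C", "C")
a2 <- c("A", "C", "C", "A")

# View 2: one genotype variable. Sorting each pair makes C/A and A/C identical,
# reflecting that the laboratory method cannot tell them apart.
geno <- paste(pmin(a1, a2), pmax(a1, a2), sep = "/")

# Tabulate the observed (unordered) genotypes.
geno_counts <- table(geno)
print(geno_counts)
```

Moving between the two views by hand requires this kind of string handling for every marker, which is the complexity the slide refers to.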

Initial Project Goals

Simplify Statistical Analysis using Genetic Data by providing:

  • A genotype object class that appropriately captures the single variable / separate allele duality
  • Methods to import and manipulate genotype objects without string manipulation
  • Simple tools for including different ‘views’ of genotype variables in standard statistical models
    • Dominant (at least one copy of X)
    • Recessive (both alleles are X)
    • Additive (number of copies of X)
    • Heterozygote Effect (differing alleles)
    • Independent (separate effect for each allele combination: A/A, A/B=B/A, B/B)
  • Functions for computing and visualizing common genetic summaries and statistical tests
    • Allele Frequencies
    • Hardy-Weinberg Equilibrium
    • Linkage Disequilibrium
  • Other statistical methods
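
As a rough illustration of what these ‘views’ mean, the sketch below derives each coding by hand in base R for allele X = "A". The vectors a1 and a2 are hypothetical data, and the code is not taken from the package.

```r
# Illustrative only: base R codings of the genotype "views" for allele X = "A".
# a1/a2 are hypothetical allele vectors; NA marks a missing genotype.
a1 <- c("A", "A", "C", "C", NA, "A")
a2 <- c("A", "C", "C", "A", NA, "A")
x  <- "A"

additive     <- (a1 == x) + (a2 == x)   # number of copies of X: 0, 1 or 2
dominant     <- additive >= 1           # at least one copy of X
recessive    <- additive == 2           # both alleles are X
heterozygote <- a1 != a2                # the two alleles differ

# Independent: one factor level per unordered allele combination.
independent <- factor(ifelse(is.na(a1), NA,
                             paste(pmin(a1, a2), pmax(a1, a2), sep = "/")))
```

Each of these vectors could then be used as a predictor in a standard model formula. The choice of coding decides which genetic effect the model is able to detect.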

Design Details
  • Design:
    • Genotypes are stored in ‘Factor’ objects, with factor levels formatted as ‘A/C’.
    • A translation table is constructed to quickly extract individual allele information.
  • Consequences
    • Can be stored in standard data frames
    • Can be efficiently manipulated (space & time)
    • Permits both biallelic (C/T) and multi-allelic genetic markers (SSLPs)
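
The factor-plus-translation-table idea can be mimicked in base R. This is a sketch under assumptions: the object names g and allele_map are invented here, and the package's actual internals may differ.

```r
# Illustrative only: mimics the design idea; the package's real internals may differ.
g <- factor(c("A/A", "A/C", "C/C", "A/C", NA, "A/A", "A/C", "A/C"))

# Translation table: one row per factor level, one column per allele.
# It is built once from the few levels, not from every observation.
allele_map <- do.call(rbind, strsplit(levels(g), "/", fixed = TRUE))
rownames(allele_map) <- levels(g)

# Extract alleles by indexing the table with the factor's integer codes;
# a missing genotype (NA code) yields a row of NA alleles.
alleles <- allele_map[as.integer(g), , drop = FALSE]
first_allele <- unname(alleles[, 1])
```

Because string splitting happens only once per level, allele lookup for many individuals reduces to integer indexing, which is the space and time efficiency the slide mentions.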

Genotype Manipulation
  • Importing & Creation
    • genotype(), as.genotype(), makeGenotypes(), …
    • haplotype(), as.haplotype(), makeHaplotypes(), …
  • Manipulation
    • [] (subsetting), []<- (subset assignment), == (equality)
  • Information
    • summary() (allele and genotype counts and frequencies), allele.names(), allele() (extract individual alleles), nallele() (number of distinct allele values)
  • Annotation
    • locus(), gene(), marker(), …
  • Transformation
    • carrier(), homozygote(), heterozygote(), allele.count()
  • Export
    • write.marker.file(), write.pedigree.file(), write.pop.file()

Installation

Windows GUI:

Command Line:

> install.packages("genetics", dependencies=TRUE)

Statistical Functions
  • Hardy-Weinberg (Dis-)Equilibrium: D, D′, r, r², χ²
    • diseq(), diseq.ci() (Confidence Intervals!)
    • HWE.test(), HWE.chisq(), HWE.exact()
  • Linkage Disequilibrium: D, D′, r, r²
    • LD(), LDplot(), LDtable()
  • Haplotype Imputation:
    • hap(), hapambig(), hapmcmc(), hapenum(), hapshuffle()
  • Sample-size tools
    • gregorius() (probability of observing a marker of a given frequency with a specified sample size)
    • power.casectrl()
  • Utilities
    • Bootstrap.ci
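
To make the Hardy-Weinberg quantities concrete, here is a hand-rolled base-R calculation of the disequilibrium coefficient D and the Pearson chi-square statistic for a biallelic marker. It is a sketch, not the code behind HWE.chisq() or diseq(); the variable names are illustrative, and the genotype counts (2 A/A, 4 A/C, 1 C/C) are those of the small g5 example used elsewhere in this deck.

```r
# Illustrative only: textbook formulas, not the package's implementation.
counts <- c(AA = 2, AC = 4, CC = 1)          # observed genotype counts
n <- sum(counts)                             # individuals with a genotype

p <- (2 * counts[["AA"]] + counts[["AC"]]) / (2 * n)   # frequency of allele A
q <- 1 - p                                             # frequency of allele C

# Expected genotype counts under Hardy-Weinberg proportions.
expected <- n * c(AA = p^2, AC = 2 * p * q, CC = q^2)

# Pearson chi-square statistic (1 degree of freedom for a biallelic marker).
x2 <- sum((counts - expected)^2 / expected)
p_value <- pchisq(x2, df = 1, lower.tail = FALSE)

# Disequilibrium coefficient: observed A/A frequency minus its HWE expectation.
D <- counts[["AA"]] / n - p^2
```

With so few individuals the chi-square approximation is unreliable, which is one reason an exact test such as HWE.exact() is offered alongside it.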

Simple Examples: Creating Genotype Objects

A single vector with a character separator:

> g1 <- genotype( c('A/A','A/C','C/C','C/A',
+                   NA,'A/A','A/C','A/C') )

> g3 <- genotype( c('A A','A C','C C','C A',
+                   '','A A','A C','A C'),
+                 sep=' ', remove.spaces=FALSE )

Simple Examples: Creating Genotype Objects

A single vector with a positional separator:

> g2 <- genotype( c('AA','AC','CC','CA','',
+                   'AA','AC','AC'), sep=1 )

Two separate vectors:

> g4 <- genotype(
+     c('A','A','C','C','','A','A','A'),
+     c('A','C','C','A','','A','C','C')
+ )

Simple Examples: Creating Genotype Objects

A data frame or matrix with two columns:

> gm <- cbind(
+     c('A','A','C','C','','A','A','A'),
+     c('A','C','C','A','','A','C','C') )
> gm
     [,1] [,2]
[1,] "A"  "A"
[2,] "A"  "C"
[3,] "C"  "C"
[4,] "C"  "A"
[5,] ""   ""
[6,] "A"  "A"
[7,] "A"  "C"
[8,] "A"  "C"
> g5 <- genotype( gm )
> g5
[1] "A/A" "A/C" "C/C" "A/C" NA    "A/A" "A/C" "A/C"
Alleles: A C

Simple Examples: Creating Genotype Objects

Convert 1-column genotype variables read from a file:

> gm1 <- makeGenotypes(
+     read.csv("gm1.csv"))
> gm1
  Age Sex   G1  G2
1  31   M  A/A G/T
2  27   F  A/C G/G
3  35   M  C/C G/T
4  19   M  A/C G/T
5  55   M <NA> G/G
6  34   F  A/A G/G
7  45   F  A/C T/T
8  32   M  A/C G/T
> gm1$G1
[1] "A/A" "A/C" "C/C" "A/C" NA    "A/A" "A/C" "A/C"
Alleles: A C

____ gm1.csv ____
Age,Sex,G1,G2
31,M,A/A,G/T
27,F,A/C,G/G
35,M,C/C,G/T
19,M,A/C,G/T
55,M,,G/G
34,F,A/A,G/G
45,F,A/C,T/T
32,M,A/C,G/T

Simple Examples: Creating Genotype Objects

Convert 2-column genotype variables read from a file:

> gm2 <- makeGenotypes(
+     read.csv("gm2.csv"),
+     convert=list(3:4,5:6))
> gm2
  Age Sex G1.1/G1.2 G2.1/G2.2
1  31   M       A/A       G/T
2  27   F       A/C       G/G
3  35   M       C/C       G/T
4  19   M       A/C       G/T
5  55   M      <NA>       G/G
6  34   F       A/A       G/G
7  45   F       A/C       T/T
8  32   M       A/C       G/T

____ gm2.csv ____
Age,Sex,G1.1,G1.2,G2.1,G2.2
31,M,A,A,G,T
27,F,A,C,G,G
35,M,C,C,T,G
19,M,C,A,G,T
55,M,,,G,G
34,F,A,A,G,G
45,F,A,C,T,T
32,M,A,C,T,G

Simple Examples: Displaying Genotype Information

“Raw”

> g5
[1] "A/A" "A/C" "C/C"
[4] "A/C" NA    "A/A"
[7] "A/C" "A/C"
Alleles: A C

“Summary”

> summary(g5)

Allele Frequency:
   Count Proportion
A      8       0.57
C      6       0.43
NA     2         NA

Genotype Frequency:
    Count Proportion
A/A     2       0.29
A/C     4       0.57
C/C     1       0.14
NA      1         NA

Simple Examples: Extracting allele information

Genotypes (Independent factor levels):

> g5
[1] "A/A" "A/C" "C/C" "A/C"
[5] NA    "A/A" "A/C" "A/C"
Alleles: A C

Allele Counts (Additive Effect):

> allele.count(g5, "A")
[1]  2  1  0  1 NA  2  1  1
attr(,"allele")
[1] "A"

Allele presence (Dominant Effect):

> carrier(g5,'A')
[1]  TRUE  TRUE FALSE  TRUE
[5]    NA  TRUE  TRUE  TRUE

Allele Homozygote (Recessive Effect):

> homozygote(g5,'A')
[1]  TRUE FALSE FALSE FALSE
[5]    NA  TRUE FALSE FALSE

Heterozygote (Heterozygote Advantage Effect):

> heterozygote(g5,'A')
[1] FALSE  TRUE FALSE  TRUE
[5]    NA FALSE  TRUE  TRUE

Simple Examples: Extracting allele information

First allele:

> allele(g5, 1)
[1] "A" "A" "C" "A" NA  "A"
[7] "A" "A"
attr(,"which")
[1] 1
attr(,"allele.names")
[1] "A" "C"

Both alleles:

> allele(g5)
     [,1] [,2]
[1,] "A"  "A"
[2,] "A"  "C"
[3,] "C"  "C"
[4,] "A"  "C"
[5,] NA   NA
[6,] "A"  "A"
[7,] "A"  "C"
[8,] "A"  "C"
attr(,"which")
[1] 1 2
attr(,"allele.names")
[1] "A" "C"

Example Session

Future Development

R GeneticsNG

  • Mission:

GeneticsNG is a collaborative project to develop a core set of data structures and analytic tools for the management, visualization, and analysis of genetic data. This core will provide sufficient ease of use, stability, features, documentation, and community support to inspire users and developers to use, contribute to, and extend the system.

  • Goals:
    • Scalable to Whole-Genome genetic analysis (>1e5 SNPs)
    • Read/Write common genetics data storage formats
    • Port existing open-source genetics codes
      • Current R genetics packages (genetics, haplo.score, gap, …)
      • Other open-source packages…
    • Provide good documentation, including tutorials and training
    • Engage the entire R genetics user/developer community

Future Development

R GeneticsNG

  • Current Team
    • Pfizer: Gregory Warnes, Nitin Jain
    • Channing Laboratory (Harvard): Ross Lazarus
    • BMS: Scott D Chasalow, Giovanni Montana
    • Insightful: Michael O'Connell
    • Univ. Chicago: Junsheng Cheng
    • Join us!
  • Project Page:
    • http://r-genetics.sf.net/

References
  • R Project:
    • http://www.r-project.org
  • R genetics package:
    • http://cran.r-project.org/contrib/main/Descriptions/genetics.html
  • R-News article:
    • Warnes GR. "The Genetics Package," R News, Volume 3, Issue 1, June 2003.
  • R GeneticsNG project:
    • http://r-genetics.sf.net/
  • Me:
    • http://www.warnes.net
    • Gregory.R.Warnes@Pfizer.com
