Notes from the GAW14 “Genetic Analysis Workshop 14” September 7-10, 2004 Noordwijkerhout, NL

Kelly Burkett, September 20th, 2004

Background (1)
• The focus of statistical geneticists and genetic epidemiologists is gene mapping (finding genes or polymorphisms which predispose to inherited diseases) and issues related to this aim
  • Ex: population substructure, data missing due to technology, effects of genotyping error, etc.
• Some analyses use techniques applicable only to genetic data (ex. linkage analyses); others don’t (ex. association analyses through case-control samples)
• Biological processes and population genetics are often exploited in genetic analyses. On the other hand, if traditional methods are used, these mechanisms must be accounted for.
• I primarily work on association analyses of population-based samples

Background (2)
Linkage analyses
• Rely on family data
• Study the transmission of blocks of DNA within families (nuclear families and extended pedigrees)
• Across multiple families, if particular regions of the genome are more likely to be present in affecteds than in unaffecteds, these regions are “linked” to disease
• Must account for the non-independence of members of the same family
Association analyses
• Rely on either family data (trios) or population-based data
• Look for particular changes in the DNA which are more likely in affecteds than in unaffecteds. If found, these polymorphisms are said to be associated with the disease
• Although standard techniques can be used, must account for “population stratification”, “genetic heterogeneity”, and the non-independence of genetic information on the same chromosome
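As an illustration of the "standard techniques" mentioned for association analyses, here is a minimal sketch of an allelic chi-square test on a 2x2 table of allele counts in cases and controls. The function name and the counts are hypothetical, and the sketch deliberately ignores population stratification, heterogeneity and correlation between markers, which the slide warns must be handled in a real analysis.

```python
import math

def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases/controls, columns are alleles A/B.
    Returns (chi-square statistic, p-value).
    """
    n = case_a + case_b + control_a + control_b
    row_cases = case_a + case_b
    row_controls = control_a + control_b
    col_a = case_a + control_a
    col_b = case_b + control_b
    if 0 in (row_cases, row_controls, col_a, col_b):
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = (n * (case_a * control_b - case_b * control_a) ** 2
            / (row_cases * row_controls * col_a * col_b))
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: allele A is more common on case chromosomes.
chi2, p_value = allelic_chi_square(case_a=60, case_b=40, control_a=40, control_b=60)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

The test counts alleles rather than people, so it assumes the two alleles carried by one person are independent; a genotype-based test would be the usual alternative when that is in doubt.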

Background (3)
• DNA markers
  • Changes distributed throughout the genome that are easy to genotype and are not necessarily disease-predisposing themselves
  • Microsatellites: repeats of particular DNA sequences, ex. (CA)15. The number of repeats is the change/allele that is measured/genotyped
  • SNPs: “single nucleotide polymorphisms”. Changes in the DNA sequence which involve the substitution of one DNA base for another. We genotype which base/allele a person has.
• DNA markers act essentially as genomic rulers for genome scans. “Which marker is the disease polymorphism most likely to be closest to…”
• SNPs in genes or in regions that regulate gene expression can actually be related to disease.

Genetic Analysis Workshops
“The Genetic Analysis Workshops (GAWs) are a collaborative effort among genetic epidemiologists to evaluate and compare statistical genetic methods. For each GAW, topics are chosen that are relevant to current analytical problems in genetic epidemiology, and sets of real or computer-simulated data are distributed to investigators worldwide. Results of analyses are discussed and compared at meetings held in even-numbered years.”
• The Genetic Analysis Workshops were initially motivated by the development and publication of several new algorithms for statistical genetic analysis, and reports that using different methods of analysis often produced conflicting results.
• The Workshops provide an opportunity for participants
  • to test novel methods on the same well-characterized data sets,
  • to compare results and interpretations, and
  • to discuss current problems in genetic analysis
http://www.gaworkshop.org/

GAW format
• More than a year before GAW, suggestions for topics and data sets are requested from those on the GAW mailing list
• Data sets are assembled. Six or seven months before each GAW, a memo is sent to individuals on the GAW mailing list announcing the availability of the GAW data, with a short description of the data sets and a form for requesting data (March 29th)
• Request data and analyse (we started at the end of April, after receiving the data)
• Submit written contributions approximately 6-8 weeks before the Workshop (July 29th). Only those who contribute can attend the workshop. The GAW Advisory Committee reviews contributions
• Attend GAW. Contributions are divided into topic groups. Groups meet at GAW to put together presentations summarizing all contributions on the topic. The presentations for each group are made at the workshop.
• The proceedings of each GAW are published. Proceedings from GAW14 will be published in part by Genetic Epidemiology, and in part by BioMed Central. GAW13 publications can be found at BMC Genetics (Analysis of Longitudinal Family Data for Complex Diseases and Related Risk Factors)

GAW14 Data
• Real Data:
  • From the Collaborative Study on the Genetics of Alcoholism (COGA)
  • 143 pedigrees with 1614 members in total
  • Provided with family relationships, discrete and quantitative phenotypes, some covariates
  • Genetic Data: a microsatellite genome screen (~400 markers), two SNP genome screens (11,555 and 4,763 markers)
  • Also provided with a 10% replicate sample for those interested in QC
• Simulated Data:
  • Simulated a behavioural disorder with multiple phenotype definitions
  • 100 replicates of 100 families each are provided; 4 “populations” also simulated
  • Control samples were generated for each of the replicates in each population
  • Genetic Data: 416-marker microsatellite scan, 917-marker SNP scan
  • Could “purchase” more SNPs in particular regions, to a maximum of 20 purchases
  • Can request answers for power/type I error studies

This year’s suggested topics
• SNP markers versus microsatellite markers
  • SNPs can take on only one of two forms. Microsatellite markers can take on many. Therefore, it is thought that the information content of SNPs might not be enough to perform genome scans
  • A sub-question: how many SNP markers are equivalent to one microsatellite marker?
• How does outcome/phenotype definition affect the results of gene mapping studies?
• BUT: any analysis that involves the data that all participants are given is acceptable!
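One common way to make "information content" concrete is expected heterozygosity, 1 - Σ p_i², the chance that two randomly drawn alleles at a marker differ. The sketch below uses made-up allele frequencies and is only an illustration; it is not necessarily the measure used by workshop contributors.

```python
def expected_heterozygosity(allele_freqs):
    """Return 1 - sum(p_i^2): the chance that two random alleles differ."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

# Hypothetical markers: a SNP with two equally common alleles, and a
# microsatellite with ten equally common repeat lengths.
snp = expected_heterozygosity([0.5, 0.5])
microsatellite = expected_heterozygosity([0.1] * 10)
print(f"SNP: {snp:.2f}, microsatellite: {microsatellite:.2f}")
```

A biallelic SNP can never exceed 0.5 on this measure, while a marker with k equally frequent alleles reaches 1 - 1/k, which is why several SNPs may be needed to match one microsatellite.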

Summary of topics:
• Linkage mapping methods (real and simulated)
• Quantitative trait mapping
• Heterogeneity
• Parent-of-origin, “imprinting”, etc.
• Multivariate analyses
• Analyses of alcoholism, smoking and related traits
• Data mining
• Genotyping errors/pedigree errors and missing data
• Haplotypes and tagSNPs
• Detection and implications of LD
• Association mapping
• Case-control analyses
• Integrating SNPs and microsatellites
• SNPs vs microsatellites in linkage analyses (real and simulated)
• Fine mapping

“A comparison of three methods for selecting tagging single nucleotide polymorphisms”
Matt Pratola, Kelly Burkett, Mercedeh Ghadessi, Brad McNeney, Jinko Graham and Denise Daley
• Background:
  • Chromosomes are inherited as blocks from each parent
  • Variants at markers on the same chromosome are not independent. Because of recombination, though, the farther apart two markers are, the more independent their values
  • Markers within a gene or region will therefore carry redundant information
  • A “tagSNP” is a marker which summarizes the information from multiple markers
  • To save money, we only want to genotype tagSNPs
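To make the idea of redundancy concrete, the sketch below measures pairwise correlation (r²) between biallelic markers on phased haplotypes and then picks tagSNPs greedily, so that every marker is either a tag or has r² above a threshold with some tag. This is a generic illustration under stated assumptions: the function names, the 0.8 threshold and the toy haplotypes are hypothetical, and it is not one of the three methods compared in the paper.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation between markers i and j (alleles coded 0/1)."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # a monomorphic marker carries no information
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    return d * d / denom

def greedy_tag_snps(haplotypes, threshold=0.8):
    """Repeatedly pick the untagged marker that covers the most untagged markers."""
    untagged = set(range(len(haplotypes[0])))
    tags = []

    def covered_by(marker):
        return {k for k in untagged
                if k == marker or r_squared(haplotypes, marker, k) >= threshold}

    while untagged:
        # sorted() makes ties resolve to the lowest marker index.
        best = max(sorted(untagged), key=lambda m: len(covered_by(m)))
        tags.append(best)
        untagged -= covered_by(best)
    return tags

# Toy data: markers 0, 1 and 2 are perfectly correlated; marker 3 is independent.
haplotypes = [
    (0, 0, 1, 0),
    (0, 0, 1, 1),
    (1, 1, 0, 0),
    (1, 1, 0, 1),
]
tags = greedy_tag_snps(haplotypes)
print(tags)
```

In the toy data only two of the four markers need to be genotyped, since marker 0 stands in for markers 1 and 2.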

“A comparison…” (2)
• The association studies that I work with each study ~40 genes
• Genotyping all variants in a gene would be cost-prohibitive. However, not genotyping all of them will result in a loss of power
• Interested in the performance of different algorithms for choosing tagSNPs with respect to the power to detect a true disease association.
• Used one population of the simulated data.
  • Created a case/control study and used their definition of “affected” for Kofendred syndrome
  • For each replicate, we used a sub-sample to choose tagSNPs, then used only the information from the tagSNPs for the association study. Measured the proportion of replicates with a Bonferroni-corrected p-value below 0.05.
  • Didn’t have time to complete all 100 replicates. Will do so for the final publication
• Conclusions: the simulated data wasn’t realistic enough. Quite disappointing!
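The power measure described above can be sketched as follows: a replicate counts as a detection if any tagSNP has a p-value below 0.05 after Bonferroni correction for the number of tagSNPs tested, and power is the fraction of replicates with a detection. The function names and p-values below are hypothetical; the real analysis details are not given in these notes.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """True if any test passes the Bonferroni-adjusted threshold alpha / m."""
    m = len(p_values)
    return any(p < alpha / m for p in p_values)

def estimated_power(replicate_p_values, alpha=0.05):
    """Fraction of replicates in which at least one tagSNP is significant."""
    hits = sum(bonferroni_significant(p_values, alpha)
               for p_values in replicate_p_values)
    return hits / len(replicate_p_values)

# Made-up p-values: four replicates, three tagSNPs tested in each.
replicates = [
    [0.001, 0.20, 0.50],  # detected (0.001 < 0.05 / 3)
    [0.030, 0.40, 0.90],  # not detected (0.03 > 0.05 / 3)
    [0.600, 0.01, 0.30],  # detected
    [0.200, 0.70, 0.05],  # not detected
]
power = estimated_power(replicates)
print(f"estimated power: {power:.2f}")
```

Note that a tagSNP set with fewer markers pays a smaller Bonferroni penalty, so the number of tags chosen affects the comparison as well as how much information the tags capture.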

Notes on my experience
• A lot of work!! Basically three months to come up with a topic, complete an analysis (and deal with data issues), write a 5-page paper and submit it
• Apparently the data will be made available online at GAWweb.
• Many benefits
  • The topic we studied is directly related to what I have been working on, but all the data simulation was done by someone else
  • The workshop itself was extremely useful:
    • To meet those in the field (put faces to names!)
    • Results from the workshop are highly referenced. The workshop summarised the various contributions, which I can reference at work.
    • To hear that some of the issues I have started to struggle with are also a struggle for those with far more experience than me.
  • I get a publication out of it!

Descriptive Stats
• Number of attendees: ~230
• From Canada? ~23
• Number of papers submitted: ~184
• Number of pages of notes that I took: 11
• Time it will take me to follow up on these notes: ????
