Development of Innovative Genetic Analysis Tools for Variant Detection and Evaluation

Tests utilizing read data- Andrew, Yu • Under development • Use the number of reads and the proportion of variant reads at a site directly, rather than called genotypes • Case-control burden and collapsing tests • Rare variant transmission distortion tests • May provide an effective solution to the differential coverage problem • Wiki: ATAV • ATAV: yes with new system • Regulated: ATAV
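A minimal sketch of the per-site read summaries such tests would take as input. This is not the method under development; the SiteReads class and the pooled_alt_fraction helper are hypothetical names used for illustration only. It shows how the variant read proportion can be computed directly from read counts so that samples with different coverage are pooled on read evidence rather than on called genotypes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteReads:
    """Read counts for one sample at one site (hypothetical container)."""
    ref_reads: int
    alt_reads: int

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def alt_fraction(self) -> Optional[float]:
        # Undefined when the site has no coverage in this sample.
        return self.alt_reads / self.depth if self.depth else None


def pooled_alt_fraction(samples: List[SiteReads]) -> Optional[float]:
    """Depth-weighted proportion of variant reads across a group."""
    total = sum(s.depth for s in samples)
    if total == 0:
        return None
    return sum(s.alt_reads for s in samples) / total


# Example: two groups sequenced at different depths.
cases = [SiteReads(18, 2), SiteReads(45, 5)]
controls = [SiteReads(9, 1), SiteReads(10, 0)]
print(pooled_alt_fraction(cases), pooled_alt_fraction(controls))
```

Pooling reads weights each sample by its depth; whether the actual tests weight samples this way is not stated in these notes.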

Three-group analysis (gene-level)- Andrew, Janice • Analyzes the effect of variants across three groups (as opposed to just case-control) • Accounts for the fact that only some of the possible patterns of allele frequencies across these groups are biologically plausible • Can take covariates • Single-locus analysis is up and running (in R) • Gene-level analysis under development • Wiki: yes if widely usable and not in ATAV • ATAV: yes • Regulated: yes/ATAV

De-novo-Poisson-Tester- Andrew, Yu, Yujun • Tests whether there is an enrichment of de novo mutations • Weighted version under development • Incorporates functional data (e.g., PolyPhen-2) directly into the test statistic • Wiki: maybe • Distribution probably not needed unless it will be run many times on smaller datasets • ATAV: no • Regulated: maybe • If only being run as a final analysis on large datasets, precise methods can be determined each time
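A minimal sketch of an unweighted Poisson enrichment test for de novo mutations, to make the idea concrete. It is not the De-novo-Poisson-Tester code. Two assumptions are made here: a per-chromosome, per-generation mutation rate for the gene is available, and the expected count is 2 x trios x rate (two transmitted chromosomes per child). All function names are hypothetical.

```python
import math


def poisson_sf(observed: int, expected: float) -> float:
    """Upper tail P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Sum P(X = 0) ... P(X = observed - 1), then take the complement.
    term = math.exp(-expected)
    cdf = 0.0
    for k in range(observed):
        cdf += term
        term *= expected / (k + 1)
    return max(0.0, 1.0 - cdf)


def expected_dnm(per_chrom_rate: float, n_trios: int) -> float:
    """Expected de novo count in a gene (assumed: 2 chromosomes per child)."""
    return 2 * n_trios * per_chrom_rate


def dnm_enrichment_p(observed: int, per_chrom_rate: float, n_trios: int) -> float:
    """One-sided p-value for seeing at least `observed` de novo mutations."""
    return poisson_sf(observed, expected_dnm(per_chrom_rate, n_trios))


# Example: 3 de novo mutations in one gene across 1000 trios,
# with an assumed gene mutation rate of 1e-5 per chromosome.
print(dnm_enrichment_p(3, 1e-5, 1000))
```

The weighted version mentioned above would change the test statistic itself; how functional scores enter it is not specified in these notes, so it is not sketched here.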

eQTL analyses- Andrew, Chuck • Under development • Score test for tissue-specific expression quantitative trait loci • What projects will use this? • Wiki: if widely usable • ATAV: no • Regulated: probably

Fisher’s Exact Test Permutation Tool- Quanli • Hundreds of thousands of times faster than R • Should be kept in mind when tools need permutation • Wiki: no • ATAV: no • Regulated: no
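For reference, a minimal pure-Python sketch of what permuting a Fisher's exact test involves. This is not Quanli's tool and makes no claim about its speed or implementation; the function names and the carrier/non-carrier coding are assumptions for illustration. Case/control labels are shuffled, the 2x2 table is rebuilt, and the empirical p-value is the share of permutations at least as extreme as the observed table.

```python
import math
import random


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def permutation_p(is_case, is_carrier, n_perm=1000, seed=0):
    """Return (observed Fisher p, empirical p from label permutation)."""
    rng = random.Random(seed)

    def table_p(labels):
        a = sum(1 for l, v in zip(labels, is_carrier) if l and v)
        b = sum(1 for l, v in zip(labels, is_carrier) if l and not v)
        c = sum(1 for l, v in zip(labels, is_carrier) if not l and v)
        d = len(labels) - a - b - c
        return fisher_exact_two_sided(a, b, c, d)

    observed = table_p(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if table_p(labels) <= observed * (1 + 1e-9):
            hits += 1
    # Add-one correction keeps the empirical p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)


obs_p, emp_p = permutation_p([1, 1, 1, 1, 0, 0, 0, 0],
                             [1, 1, 1, 0, 1, 0, 0, 0])
print(obs_p, emp_p)
```

Every permutation recomputes the exact test from scratch, which is why a dedicated fast implementation matters when a tool needs many permutations.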

SV-Simu-Viewer- Yujun • Under development • Simulates and then creates pictures of SVs • Wiki: probably not • Is it used for multiple projects? • ATAV: no • Regulated: no

Somatic-Mutation-Rater- Andrew, Yujun • Under development • Calculates and compares somatic mutation rates • Wiki: yes if widely usable • ATAV: no (maybe if somaticannoDB) • Regulated: yes if widely used

Novel-Seq-Finder- Yujun • Under development • A pipeline for acquiring novel sequences that are not in the reference • Wiki: no • ATAV: no • Regulated: no

Regulatory RVIS- Ayal, Quanli, Slave, Andrew • Under development • CHGV-based non-coding measures of evolutionary constraint • Wiki: yes • ATAV: yes • Regulated: yes

Artifact flagging- Slave • Still under development; a new analysis was reportedly done around Christmas (to be confirmed) • Putative artifacts and sites of preferential alignment/alternative error are flagged for suggested exclusion of variants • Through comparison with EVS so far • Warnings provided for genes that have excess artifacts • Now extending to flagging sites of repeated high-confidence de novo mutation calls across multiple trios • Wiki: ATAV • ATAV: already uses • Regulated: ATAV • Careful consideration and announcement needed when changing the artifact file

DNM Filter- Yongzhuang, Xiaolin • To be tested on the malformations data to see how it performs; may be better suited to genomes • Uses machine learning to filter candidate DNM calls from GATK or other trio-aware callers • Does it require trio-aware calls, or can it use multi-sample calls? • Training data built from 264 Epi4K trios and used to predict true DNMs from the candidate DNM call set • Selects high-confidence DNM calls by generating a score for each candidate DNM • Can specify ranking and thresholding methods for scoring • Designed to expedite obtaining high-confidence DNM calls from WGS trio analysis • Has it been tested against the current de novo pipeline? • Wiki: yes if it outperforms the current method • ATAV: no • Regulated: yes if it outperforms the current method
