Jayne Duncan FRCPath Course 2010

Look at the Emerging Technologies and consider their Application in the Diagnostic Environment: Bioinformatic Tools

Keywords • Bioinformatics • Variants of unknown clinical significance • Non-synonymous Missense variants • Splicing variants • Guidelines

Bioinformatic Tools • Bioinformatic tools use computers and statistical techniques to analyse biological data. • They are used in the diagnostic lab for the interpretation of: • non-synonymous missense mutations of unknown clinical significance • splice variants that lie outside the canonical splice acceptor and donor sites • Their use in the laboratory has increased over the last few years as a result of: - increased scale and sensitivity of genetic analysis - increased use of Sanger sequencing to screen candidate genes to diagnose single gene disorders. • With the advent of next generation sequencing technologies and the ability to sequence the entire human genome, their use in the diagnostic lab will increase further.

Establishing Guidelines • It is essential to have a set of agreed guidelines: • to assist in determining the clinical significance of variants identified in routine screening • to educate referring clinicians so that they may inform their patients and families appropriately. • Guidelines ratified by the CMGS and the Dutch Society of Clinical Genetic Laboratory Specialists were drawn up in 2008. • They are applicable to “the interpretation and reporting of sequence variants of uncertain pathogenicity in genes known to cause inherited Mendelian disease in which molecular genetic testing has a proven clinical validity and utility”.

Proposed Guidelines for Hereditary Breast Cancer • Prior to the CMGS guidelines, Vink et al 2005 proposed guidelines for the interpretation of variants of unknown clinical significance (UVs) in hereditary breast cancer. • All variants for which pathogenicity is not demonstrated or excluded in peer-reviewed published literature, in a mutation database, or on the basis of the laboratory's own findings are classified as UVs. • Patients are informed by the genetic counsellor of the possibility of finding a UV prior to mutation screening. • The diagnostic lab reports a detected UV to the requesting counsellor, who in turn reports it to the patient. • The uncertainties surrounding the pathogenicity of the UV are discussed, as is the possibility of classifying the UV after further research. An explanation that this may involve the co-operation of the patients and their relatives should also be given. • Presymptomatic testing of family members is not offered. Surveillance is offered on the basis of the family history. If this fits a hereditary breast cancer syndrome, surveillance is offered as in families with a BRCA1/2 mutation. • Patients can request prophylactic surgery, but the decision to perform it should be based on the family history and not influenced by the detection of the UV.

Proposed Guidelines continued • Understanding the clinical significance of UVs requires a multidisciplinary approach involving: • Protein function studies • Evolutionary gene sequence conservation • Linkage analysis • Population genetics studies • Other important studies for clarification of individual variants include co-segregation analysis, RNA analysis and loss of heterozygosity (LOH) analysis in tumour tissue. • Incorporating these data into a public resource will allow increased consistency in the reporting of UVs and clarification of cancer risk, leading to patients receiving more balanced information about their cancer risk.

Interpretation of non-synonymous missense variants • Non-synonymous variants are single base pair substitutions in the DNA that alter the amino acid in the resulting protein. • CMGS guidelines recommend the use of specific tools for the interpretation of such variants. These include: • Polyphen • SIFT • Align-GVGD • Other available sites include: PMut, SNP3D and Panther (output is in the form of a probability, no need for alignments)

Polyphen • Polyphen (Polymorphism phenotyping) is a freely available web based tool. • It considers evolutionary conservation (through multiple sequence alignment), physicochemical differences and the proximity of the substitution to predicted functional domains and/or structural features. • It specifically uses annotated UniProt entries to predict whether the amino acid substitution occurs within an important structural or functional site, for example active or binding sites and residues involved in disulphide bond formation. • Its predictive value (accuracy in correctly calling pathogenic mutations) is reliant on the protein of interest having a known annotated crystal structure, or on the presence of a similar modelled protein in the UniProt database. • Its scores can be classified as probably damaging (≥2.00), possibly damaging (1.50-1.99), potentially damaging (1.25-1.49), borderline (1.00-1.24) or benign (0.00-0.99). • Recently the Polyphen-2 algorithm has replaced Polyphen. According to Adzhubei et al 2010, Polyphen-2 differs from the original Polyphen in the set of predictive features, the alignment pipeline and the method of classification. As with the original Polyphen, however, the user is unable to input their own alignment into the software.
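
The score bands quoted on this slide can be written as a simple lookup. The following is a minimal sketch, not part of the Polyphen software: the function name is hypothetical, and it assumes the bands are contiguous (so a score such as 1.995, which falls between the quoted ranges, is placed in the lower band).

```python
def classify_polyphen_score(score: float) -> str:
    """Map an original-Polyphen score onto the categories quoted on the slide.

    Hypothetical helper for illustration; the thresholds are the ones listed
    in the slide text, treated here as contiguous lower bounds (an assumption).
    """
    if score < 0:
        raise ValueError("score must be non-negative")
    # Lower bound of each band, checked from most to least severe.
    bands = [
        (2.00, "probably damaging"),
        (1.50, "possibly damaging"),
        (1.25, "potentially damaging"),
        (1.00, "borderline"),
        (0.00, "benign"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: every non-negative score matches a band")


if __name__ == "__main__":
    for example_score in (2.3, 1.6, 1.3, 1.1, 0.4):
        print(example_score, classify_polyphen_score(example_score))
```

These bands apply to the original Polyphen score only; the slide does not give equivalent thresholds for Polyphen-2.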

Case studies supporting the use of Polyphen • Lee et al 2007 sequenced the BRCA1 and BRCA2 genes in 1469 population based female breast cancer patients diagnosed between 20 and 49 years of age. • 147 UVs were detected and classified as high risk or low risk using 5 methods. • The methods included the Polyphen algorithm, sequence conservation, Grantham matrix scores and a combination of Grantham matrix score and sequence conservation. • The study also examined whether women with high risk UVs have characteristics similar to those with known deleterious mutations (e.g. early age at diagnosis, family history and oestrogen/progesterone receptor negative tumours). • All 5 classification methods yielded similar results. However, Polyphen was better at isolating BRCA1 UV carriers likely to have a family history of breast or ovarian cancer, and may help classify BRCA1 variants.

Overview of Polyphen

Polyphen: A Worked Example

SIFT • Sorting Intolerant From Tolerant (SIFT) is also a freely available web based tool. • It uses sequence homology of related proteins to predict whether an amino acid substitution is likely to be deleterious to protein function, based on the degree of conservation of the amino acid through evolution. • Orthologous or paralogous sequences can be used in the evolutionary sequence alignment. • Using orthologous sequences increases the predictive value of SIFT, as the encoded proteins will have the same function. • SIFT can choose homologous sequences automatically, or the user can submit selected pre-aligned sequences to the programme. • An amino acid that is not present at the substitution site in the multiple alignment can still be predicted to be tolerated if there is an amino acid in the alignment that has a similar charge or hydrophobicity. This may not reflect the true in vivo situation.

Case studies supporting the use of SIFT • Flanagan et al 2010 tested the predictive value of SIFT and Polyphen on 141 missense variants (131 known pathogenic, of which 66 gain of function and 67 loss of function, and 8 known neutral polymorphisms) identified in the ABCC8, GCK and KCNJ11 genes. • SIFT and Polyphen both predicted the pathogenicity of 69.5% of missense variants. When they were used individually this rose to 84%, demonstrating a lack of concordance between the programmes. • When results were combined only 56% of variants were called correctly. • Both programmes were better at predicting loss of function mutations than gain of function mutations. • The reasons for this are unknown; however, it is possible that the substitution of one amino acid for another with a large change in physicochemical properties will cause a loss of function as a result of protein misfolding. Such large amino acid changes are likely to increase the confidence with which SIFT and Polyphen make their predictions. • Gain of function mutations may have a more subtle effect on protein structure, giving the programmes less on which to base a confident prediction. • Gain of function mutations are also predicted to be less common, so the data sets on which predictions are based are likely to be limited. • Taken together, these two limitations mean that gain of function mutations produce a less pronounced change in the parameters that SIFT and Polyphen use to predict pathogenicity than loss of function mutations do, resulting in many being classified as benign.

Overview of SIFT

SIFT: A Worked Example

Align-GVGD • Align-GVGD combines Grantham Variation (GV), which measures how much evolutionary variation there is at a given position in the alignment, and Grantham Deviation (GD), which measures the difference between the amino acids seen through evolution and the variant, to give a Grantham score. • Only the most extreme values are classified as most and least likely to interfere with protein function. • Align-GVGD is highly dependent on the alignment used.
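
The GV and GD idea can be illustrated with a short sketch. Everything below is an assumption for illustration rather than the Align-GVGD implementation: the property values (composition, polarity, volume) and weights are quoted from memory of Grantham's 1974 table for four residues only and should be checked against the published values, and the function names are hypothetical. GV is taken as the spread of the residues observed in an alignment column, and GD as how far the variant falls outside that spread.

```python
import math

# ASSUMPTION: Grantham (1974) composition, polarity and volume for four
# residues, quoted from memory; verify against the paper before real use.
PROPERTIES = {
    "L": (0.00, 4.9, 111.0),
    "I": (0.00, 5.2, 111.0),
    "V": (0.00, 5.9, 84.0),
    "R": (0.65, 10.5, 124.0),
}
# ASSUMPTION: Grantham's property weights and scaling constant.
WEIGHTS = (1.833, 0.1018, 0.000399)
SCALE = 50.723


def _weighted_distance(a, b):
    """Weighted Euclidean distance between two property triples."""
    return SCALE * math.sqrt(
        sum(w * (x - y) ** 2 for w, x, y in zip(WEIGHTS, a, b))
    )


def gv_gd(observed: str, variant: str):
    """Return (GV, GD) for one alignment column and one variant residue.

    observed: residues seen at this position across the alignment, e.g. "LIV".
    variant:  the substituted residue being assessed.
    """
    props = [PROPERTIES[aa] for aa in observed]
    lows = tuple(min(p[i] for p in props) for i in range(3))
    highs = tuple(max(p[i] for p in props) for i in range(3))
    # GV: size of the property range spanned by the observed residues.
    gv = _weighted_distance(lows, highs)
    # GD: distance from the variant to the nearest point of that range
    # (zero when the variant lies inside the observed range).
    target = PROPERTIES[variant]
    nearest = tuple(min(max(x, lo), hi) for x, lo, hi in zip(target, lows, highs))
    gd = _weighted_distance(target, nearest)
    return gv, gd


if __name__ == "__main__":
    print(gv_gd("LIV", "V"))  # variant already seen in the column
    print(gv_gd("LIV", "R"))  # variant outside the observed range
```

Because both values are computed from the alignment column, adding or removing sequences changes the result, which is why the output depends so heavily on the alignment used.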

Case studies supporting the use of Align-GVGD • Mathe et al 2006 carried out a three step analysis of 1514 missense substitutions in the DNA Binding Domain of TP53, the most frequently mutated gene in human cancers. • Using multiple sequence alignment for each substitution they calculated the GV and the GD. • They then used Align-GVGD to predict the transactivation of each missense substitution. • They compared the predictions against experimentally measured transactivation activity (yeast assays) and predictions made by SIFT to evaluate accuracy. • Predictions showed a high degree of accuracy for mutants showing a loss of transactivation (~88%) with lower prediction accuracy for mutants with a transactivation similar to wild type. • Align-GVGD results were comparable to SIFT and indicate that Align-GVGD can be used as a UV prediction tool.

Align-GVGD: A Worked Example

Interpretation of Non-synonymous Variants • No bioinformatic tool is 100% accurate at predicting pathogenicity. • Results should be interpreted with caution and backed up with further functional studies. • Analysis of missense variants should be performed using at least two different programmes, as conflicting results can be generated. • This must be taken into account when deciding the likelihood of pathogenicity.
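
One way to keep conflicting results visible is to record each programme's call and report disagreement explicitly instead of picking a winner. This is a minimal sketch; the function name and the "damaging"/"tolerated" labels are assumptions for illustration, not output formats of any of the tools.

```python
def summarise_predictions(predictions: dict) -> str:
    """Summarise calls from two or more programmes for one missense variant.

    predictions maps a programme name to "damaging" or "tolerated"
    (hypothetical labels chosen for this sketch).
    """
    if len(predictions) < 2:
        raise ValueError("use at least two different programmes")
    calls = set(predictions.values())
    unknown = calls - {"damaging", "tolerated"}
    if unknown:
        raise ValueError(f"unrecognised call(s): {sorted(unknown)}")
    if calls == {"damaging"}:
        return "concordant: predicted damaging"
    if calls == {"tolerated"}:
        return "concordant: predicted tolerated"
    # Disagreement is reported as such; it is not resolved by the code.
    return "conflicting: interpret with caution"


if __name__ == "__main__":
    print(summarise_predictions({"Polyphen": "damaging", "SIFT": "tolerated"}))
```

Even a concordant result is only supporting evidence and still needs to be backed up by further studies.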

Splice Site Prediction Tools • There are several splice prediction tools commonly used by diagnostic laboratories. • None of these has been fully validated for use in a diagnostic setting, so they must be used with caution. • The user is able to adjust the settings on these sites, but no information is available on how best to alter them. • According to the CMGS best practice guidelines, users should use the default settings unless otherwise stated. • Laboratories should be aware that any sequence change, and not just those adjacent to intron/exon boundaries, may actually be a splice site mutation. • Silent and missense mutations should also be analysed for an effect on splicing, especially when AG or GT dinucleotide sequences are formed.
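
A quick screen for the last point is to check whether a substitution creates an AG or GT dinucleotide that was not there before. The sketch below is illustrative only: the function name and the 0-based coordinate convention are assumptions, and a newly formed AG or GT is a prompt for further splice analysis, not a prediction in itself.

```python
def new_splice_dinucleotides(wild_type: str, position: int, alt: str) -> list:
    """List AG/GT dinucleotides created by a single-base substitution.

    wild_type: sequence on the strand being read (A, C, G, T only).
    position:  0-based index of the substituted base (assumed convention).
    alt:       the new base at that position.
    """
    wild_type = wild_type.upper()
    alt = alt.upper()
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("alt must be a single base: A, C, G or T")
    if not 0 <= position < len(wild_type):
        raise ValueError("position is outside the sequence")
    mutated = wild_type[:position] + alt + wild_type[position + 1:]
    created = []
    # Only the two dinucleotides overlapping the changed base can differ.
    for start in (position - 1, position):
        if start < 0 or start + 2 > len(mutated):
            continue
        before = wild_type[start:start + 2]
        after = mutated[start:start + 2]
        if after in ("AG", "GT") and after != before:
            created.append(after)
    return created


if __name__ == "__main__":
    # A>G in CCAATCC gives CCAGTCC, creating both an AG and a GT.
    print(new_splice_dinucleotides("CCAATCC", 3, "G"))
```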

Analysis of Splice Site Tools • NGRL have compiled a splice site tools analysis report to assess the performance of a number of tools in predicting the pathogenicity of splicing related variants. • The report also assessed the scope of the splice site prediction tools to ensure they are used in the most appropriate way. • The analysis allows scientists to use splice site prediction tools in the prediction of pathogenicity with more confidence.

Tools Chosen for Analysis • Included: GeneSplicer, Human Splice Finder, MaxEntScan, NetGene2, NNSplice and SSFL. • In each algorithm the splice signal given by the wild type sequence is compared to the splice site signal given by the mutated sequence, supplied by the user. • All methods were chosen because they had been recommended for use by the UV Guidelines. (MaxEntScan was chosen because it is included in the HSF and Alamut splicing interfaces.)

Methods • Pathogenic and non-pathogenic splice site related variants were retrieved from the literature for analysis. The majority of pathogenic variants were located between 1 and 10 nucleotides from the splice junction. More than 40 were found within exons, and pathogenic variants were also found >100bp from the splice junction. • Only 15 non-pathogenic mutations were found, and they mainly occurred at positions further away from the splice site junction. The low number reflects the non-reporting of negative results and increases the error on the specificity scores. • Default settings and recommended lengths of sequence were used. • Sensitivity (true positives), specificity (true negatives) and accuracy (true positives + true negatives) were measured, together with the standard error of each statistic. • A change in splice site signal of ≥10% was predicted to cause a pathogenic effect. • The UV Guidelines recommend the use of three algorithms to give a consensus prediction. To assign a variant as pathogenic, two algorithms had to agree. • To test the range of predictions made by the algorithms at each intronic position near the splice site junction, thirteen acceptor and donor splice site junctions from BRCA1 and BRCA2 were analysed. • The wild type base at each position from +1 to +10 and from -1 to -10 was artificially mutated in silico to each of the remaining three nucleotides, and the proportional change in splice signal was recorded for each algorithm.
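
The scoring rules in this method can be sketched as follows. This is an illustrative reconstruction, not the NGRL analysis code: all function names are hypothetical, the ≥10% rule is assumed to apply to a change in either direction relative to the wild type signal, and sensitivity, specificity and accuracy use the standard definitions.

```python
def predicts_disruption(wild_type_score: float, mutant_score: float,
                        threshold: float = 0.10) -> bool:
    """True if the splice signal changes by at least `threshold` of wild type.

    ASSUMPTION: the change is measured in either direction.
    """
    if wild_type_score == 0:
        raise ValueError("wild type score must be non-zero")
    change = abs(mutant_score - wild_type_score) / abs(wild_type_score)
    return change >= threshold


def consensus_pathogenic(calls: list, needed: int = 2) -> bool:
    """Call a variant pathogenic when at least `needed` algorithms agree."""
    return sum(bool(c) for c in calls) >= needed


def performance(predicted: list, truth: list) -> dict:
    """Sensitivity, specificity and accuracy from paired boolean calls."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("predicted and truth must be non-empty and equal length")
    tp = sum(p and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    fp = sum(p and (not t) for p, t in zip(predicted, truth))
    fn = sum((not p) and t for p, t in zip(predicted, truth))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one pathogenic and one non-pathogenic variant")
    return {
        "sensitivity": tp / (tp + fn),   # pathogenic variants correctly called
        "specificity": tn / (tn + fp),   # non-pathogenic variants correctly called
        "accuracy": (tp + tn) / len(truth),
    }


if __name__ == "__main__":
    # Three algorithms score one variant (illustrative numbers only).
    calls = [
        predicts_disruption(10.0, 8.5),
        predicts_disruption(0.90, 0.88),
        predicts_disruption(85.0, 70.0),
    ]
    print(consensus_pathogenic(calls))
```

With only 15 non-pathogenic variants in the data set, the specificity denominator is small, which is why the slide notes a larger error on that statistic.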

Results • Sensitivity, specificity and accuracy scores showed that NNSplice, MaxEntScan, GeneSplicer and SSFL performed the best, with between 80 and 92% accuracy and sensitivity. • Removing variants at the +1, +2, -1 and -2 positions reduced the performance of the algorithms. This was expected, as these variants always disrupt splice site signalling. • MaxEntScan and NNSplice still achieved an accuracy of >80%, so these algorithms perform reasonably well even with variants where it is more difficult to predict the splicing effect. • The tools were most useful for the prediction of pathogenic and non-pathogenic variants when applied to positions between +3 and +7 and from -3 to at least -10. At positions further from the splice site no disruption was seen. • The scope of these tools can be defined as the prediction of the disruption of splice sites within these regions. The effect on splice sites of variants further away than this cannot be predicted by any of the algorithms. • The tools can, however, predict new splice sites at other positions. This could occur if the variant caused the sequence surrounding the new splice site to become a closer match to the statistical models used by the tools.

Results- Splice signal Strength

Results continued • Combining results from three algorithms, as described in the UV guidelines, did not improve the prediction rate for splice site junction variants. • However, as the Alamut interface performs all four analyses (SSFL, MaxEntScan, NNSplice and GeneSplicer) simultaneously, it is easy to compare the predictions without a formal consensus method.

Results-Accuracy

Predicting Splicing Motifs • Methods such as ESE finder are available to predict splicing enhancer or silencer motifs and branch point motifs. • These methods have not been assessed for use in the diagnostic laboratory. • The mechanisms by which these motifs regulate splicing are less clearly understood, so these predictions should only be used with caution. • As with the tools used for predicting the pathogenicity of UVs, algorithms alone are not sufficient evidence for a clinical decision. • Results should always be backed up with further evidence.

References • Adzhubei et al (2010) A method and server for predicting damaging missense mutations. Nature Methods 7 (4) 248-249 • Bell et al (2008) CMGS Practice Guidelines for the Interpretation and Reporting of Unclassified Variants in Clinical Molecular Genetics. CMGS website. • Flanagan et al (2010) Using SIFT and Polyphen to predict loss of function and gain of function mutations. Genetic Testing and Molecular Biomarkers 14 (4) 533-537 • Kumar et al (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4 (8) 1073-82 • Lee et al (2007) Evaluation of unclassified variants in the breast cancer susceptibility genes BRCA1 and BRCA2 using five methods: results from a population-based study of young breast cancer patients. Breast Cancer Research 10 (1) 1-12 • Mathe et al (2006) Computational approaches for predicting the biological effect of p53 missense mutations: a comparison of three sequence analysis based methods. Nucleic Acids Res 34 (5) 1317-1325 • NGRL Splice Site Tools: A Comparative Analysis Report. Beth Hellen 2009 • NGRL Best practice guidelines for UVs • Vink et al (2005) Unclassified variants in disease-causing genes: non-uniformity of genetic testing and counselling, a proposal for guidelines. Eur J Hum Genet 13 525-527
