
Locus Reference Genomic (LRG) Sequences

Raymond Dalgleish, Department of Genetics, University of Leicester


Presentation Transcript


  1. Locus Reference Genomic (LRG) Sequences • Raymond Dalgleish • Department of Genetics • University of Leicester

  2. Background • Descriptions of sequence variants should use HGVS nomenclature • Variants should be described with respect to a reference DNA sequence specified by an accession number and a version, e.g. NM_000088.3:c.2362G>T • Mostly works well, but three key issues frequently cause problems for LSDB curators and for diagnostic laboratories
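As an illustration of the accession-plus-version format above, here is a minimal Python sketch that splits a description such as NM_000088.3:c.2362G>T into its parts. It handles simple substitutions only; the names `SimpleVariant` and `parse_substitution` are made up for this example and are not part of any HGVS tool.

```python
import re
from typing import NamedTuple


class SimpleVariant(NamedTuple):
    accession: str   # e.g. "NM_000088"
    version: int     # e.g. 3
    coord_type: str  # "c" (coding DNA) or "g" (genomic)
    position: int
    ref: str
    alt: str


# Covers only single-base substitutions on a versioned accession.
# Real HGVS nomenclature allows many more variant types.
_PATTERN = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+)\.(?P<version>\d+):"
    r"(?P<coord_type>[cg])\.(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(description: str) -> SimpleVariant:
    """Split a simple substitution description into its components."""
    match = _PATTERN.match(description)
    if match is None:
        raise ValueError(f"not a versioned simple substitution: {description!r}")
    return SimpleVariant(
        match["accession"],
        int(match["version"]),
        match["coord_type"],
        int(match["position"]),
        match["ref"],
        match["alt"],
    )


print(parse_substitution("NM_000088.3:c.2362G>T"))
```

A description without a version (for example NM_000088:c.2362G>T) is rejected by design, which mirrors the requirement that the version always be stated.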

  3. Issue 1: Version not specified • The autosomal dominant RP10 form of retinitis pigmentosa is caused by variants in the IMPDH1 gene • Variants for this gene are described with respect to NM_000883.1, but the version is rarely mentioned in the literature • The current version (NM_000883.3) records a shorter mRNA & protein which could lead to confusion and delay
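The version problem can be made concrete with a small check. The sketch below (function names are hypothetical) treats two references as comparable only when both state a version, so an unversioned NM_000883 is never silently equated with NM_000883.3.

```python
from typing import Optional, Tuple


def split_accession(accession: str) -> Tuple[str, Optional[int]]:
    """Return (base accession, version), with version None if it is missing."""
    base, dot, version = accession.partition(".")
    return base, (int(version) if dot else None)


def same_reference(a: str, b: str) -> bool:
    """True only if both accessions carry a version and match exactly."""
    base_a, ver_a = split_accession(a)
    base_b, ver_b = split_accession(b)
    if ver_a is None or ver_b is None:
        # Without a version we cannot know which sequence was meant.
        return False
    return base_a == base_b and ver_a == ver_b


print(same_reference("NM_000883.1", "NM_000883.3"))  # different versions
print(same_reference("NM_000883", "NM_000883.3"))    # version missing
```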

  4. Issue 2: Alternative splicing • ~93% of genes have alternatively spliced transcripts and may yield several proteins • The CDKN2A locus encodes the tumour suppressor proteins p16INK4a and p14ARF • The mRNAs for the two proteins share exon 2 but read it in different reading frames, because they have different upstream exons • The two mRNAs have separate RefSeq records

  5. CDKN2A alternative splicing

  6. Issue 3: Legacy numbering (1) • The “sickle cell” variant of β-globin is due to the substitution of glutamic acid by valine at amino acid 6 • This position was determined by amino acid sequencing of the mature protein, before the genetic code had been deciphered • The HGVS protein-level description is p.Glu7Val, because HGVS numbering counts from the start codon and so includes the initiator methionine
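The β-globin case amounts to a fixed offset between the two numbering schemes. A minimal sketch, assuming an offset of 1 (the initiator methionine, which is absent from the mature protein); other proteins, for example those with a cleaved signal peptide, would need a different offset. The function names are hypothetical.

```python
def legacy_to_hgvs_position(legacy_position: int, offset: int = 1) -> int:
    """Convert a mature-protein (legacy) residue number to HGVS numbering.

    The default offset of 1 is an assumption that fits beta-globin, where
    only the initiator methionine is missing from the mature protein.
    """
    if legacy_position < 1:
        raise ValueError("residue positions start at 1")
    return legacy_position + offset


def protein_substitution(ref_aa: str, legacy_position: int, alt_aa: str,
                         offset: int = 1) -> str:
    """Build an HGVS-style protein description from a legacy position."""
    position = legacy_to_hgvs_position(legacy_position, offset)
    return f"p.{ref_aa}{position}{alt_aa}"


print(protein_substitution("Glu", 6, "Val"))
```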

  7. Issue 3: Legacy numbering (2) • New exons are often discovered in genes long after their initial characterisation • This interferes with simple sequential numbering of exons from 5′ to 3′ • Legacy numbering is well-established: • COL1A1: 33/34 • CFTR: 6a, 6b, 14a, 14b, 17a, 17b • OPRM: O, X, Y • CDKN2A: 1B, 1A
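One way to keep legacy labels usable is to store them in 5′ to 3′ order and derive the sequential number from the position in that list. The sketch below uses a short illustrative fragment in the style of the CFTR labels; it assumes the exons before 6a are labelled 1 to 5, and the function name is hypothetical.

```python
from typing import Dict, List


def sequential_numbers(legacy_labels: List[str]) -> Dict[str, int]:
    """Map each legacy exon label to its 1-based position from 5' to 3'."""
    if len(set(legacy_labels)) != len(legacy_labels):
        raise ValueError("legacy exon labels must be unique")
    return {label: index for index, label in enumerate(legacy_labels, start=1)}


# Illustrative fragment only, not a complete exon list for any gene.
fragment = ["1", "2", "3", "4", "5", "6a", "6b", "7"]
mapping = sequential_numbers(fragment)
print(mapping["6a"], mapping["6b"], mapping["7"])
```

From exon 6b onwards the sequential number runs ahead of the legacy label, which is exactly the mismatch that confuses readers of older literature.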

  8. So what is the solution? • An ideal reference sequence would: • be stable over periods as long as 25 years • be free of version (revision) confusion • comprise an “idealised” genomic DNA sequence haplotype providing a practical working framework • contain comprehensive information about all transcripts and proteins encoded by the gene (including alternative numbering schemes) • be mapped to the current genome assembly

  9. Key issues • A joint project between EBI and NCBI • LRGs will be a working representation of a gene with a permanent ID: i.e. no versions • Based on an existing RefSeqGene record • 5 kb upstream and 2 kb downstream • There can be more than one LRG for a given region of the genome • LRGs will have both fixed and updatable feature annotations
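The flanking rule (5 kb upstream, 2 kb downstream) can be sketched as a coordinate calculation. Everything beyond the two flank sizes is an assumption here: 1-based inclusive coordinates, a "+"/"-" strand flag, clamping at the start of the chromosome, and the hypothetical function name `lrg_span`. The coordinates in the example are invented.

```python
from typing import Tuple

UPSTREAM_FLANK = 5_000    # bases added 5' of the gene (from the slide)
DOWNSTREAM_FLANK = 2_000  # bases added 3' of the gene (from the slide)


def lrg_span(gene_start: int, gene_end: int, strand: str) -> Tuple[int, int]:
    """Return the (start, end) of a gene plus its flanks on the chromosome.

    Coordinates are assumed to be 1-based and inclusive. On the minus
    strand, "upstream" lies at higher chromosomal coordinates.
    """
    if gene_start < 1 or gene_end < gene_start:
        raise ValueError("invalid gene coordinates")
    if strand == "+":
        start = gene_start - UPSTREAM_FLANK
        end = gene_end + DOWNSTREAM_FLANK
    elif strand == "-":
        start = gene_start - DOWNSTREAM_FLANK
        end = gene_end + UPSTREAM_FLANK
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 1), end  # do not run off the start of the chromosome


print(lrg_span(10_000, 20_000, "+"))
print(lrg_span(10_000, 20_000, "-"))
```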

  10. Primary fixed annotations • Genomic DNA sequence • Transcripts essential to the reporting of sequence variants • The conceptual translated protein(s) • Non-coding transcripts

  11. Primary updatable annotations • Mapping to current genome assemblies • Chromosomal location • Any alternative IDs • Cross references to other reference sequences • “Legacy” exon and amino acid numbering systems • Links to LSDBs • Information about overlapping genes
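The split between fixed and updatable annotations can be modelled as an immutable part and a mutable part. This is only a sketch of the idea in Python; the class and field names are invented for illustration and do not follow the LRG XML schema, and the sequence is a placeholder.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FixedAnnotation:
    """Never changes once the LRG is finalised."""
    lrg_id: str
    genomic_sequence: str
    transcripts: Tuple[str, ...]
    proteins: Tuple[str, ...]


@dataclass
class UpdatableAnnotation:
    """May be revised, e.g. when a new genome assembly is released."""
    assembly_mappings: Dict[str, str] = field(default_factory=dict)
    legacy_exon_labels: Dict[str, str] = field(default_factory=dict)
    lsdb_links: List[str] = field(default_factory=list)


@dataclass
class LrgRecord:
    fixed: FixedAnnotation
    updatable: UpdatableAnnotation = field(default_factory=UpdatableAnnotation)


# Placeholder values, not real LRG content.
record = LrgRecord(FixedAnnotation("LRG_13", "ACGT", ("t1", "t2"), ("p1", "p2")))
record.updatable.lsdb_links.append("https://example.org/lsdb")  # allowed

try:
    record.fixed.genomic_sequence = "TTTT"  # type: ignore[misc]
except FrozenInstanceError:
    print("fixed annotations cannot be modified")
```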

  12. Variant reporting with LRGs • The calcitonin gene (CALCA) encodes the peptide hormones calcitonin and calcitonin gene related peptide (CGRP) by alternative splicing • A SNP (rs5241) in the first base of exon 4 affects the calcitonin transcript (t2) & resulting precursor protein (p2) • The variant can be reported at the gene, mRNA and protein level with reference just to LRG_13 (CALCA)
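Reporting at all three levels from one LRG comes down to adding a transcript (t) or protein (p) suffix to the same identifier. A minimal sketch with a hypothetical helper, `lrg_description`; the change strings in the example are placeholders and are not the real coordinates of rs5241.

```python
from typing import Optional


def lrg_description(lrg_number: int, level: str, change: str,
                    isoform: Optional[int] = None) -> str:
    """Build a variant description against an LRG.

    level: "g" (genomic), "c" (coding DNA, needs a transcript number)
           or "p" (protein, needs a protein number).
    """
    reference = f"LRG_{lrg_number}"
    if level == "g":
        if isoform is not None:
            raise ValueError("genomic descriptions take no isoform number")
    elif level in ("c", "p"):
        if isoform is None:
            raise ValueError(f"level {level!r} needs a transcript/protein number")
        reference += ("t" if level == "c" else "p") + str(isoform)
    else:
        raise ValueError("level must be 'g', 'c' or 'p'")
    return f"{reference}:{level}.{change}"


# Placeholder changes for illustration only.
print(lrg_description(13, "g", "5000A>G"))
print(lrg_description(13, "c", "100A>G", isoform=2))
print(lrg_description(13, "p", "(Xaa34Yaa)", isoform=2))
```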

  13. Progress • LRGs can be viewed at the LRG web site: http://www.lrg-sequence.org • The first 12 LRGs have been finalised: • COL1A1, COL1A2, COL3A1, CRTAP, ATP1A2, CACNA1A, SCN1A, PPIB, FKBP10, CALCA, UBE3A, LEPRE1 • 105 others await final approval • Many others are in production

  14. Other tools to view LRGs • Ensembl, NCBI Genome Workbench, NCBI Sequence Viewer will soon provide support for LRGs • NGRL Universal Browser displays LRGs with links through to LSDBs and dbSNP • Mutalyzer will be updated to parse LRGs to support their use in LOVD • Alamut will probably be the first commercial software supporting LRGs

  15. How do I learn more? • LRG web site: http://www.lrg-sequence.org • LRG specification document: http://www.lrg-sequence.org/docs/LRG.pdf • The LRG XML schema is available for download • E-mail addresses: • Request help: help@lrg-sequence.org • Provide feedback: feedback@lrg-sequence.org • Request a new LRG: request@lrg-sequence.org

  16. Acknowledgements

  17. Coordination and funding • LRGs were devised by the GEN2PHEN project: http://www.gen2phen.org • The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 — the GEN2PHEN project
