
Slow and Steady: The Sea Urchin Genome Project

Slow and Steady: The Sea Urchin Genome Project. David A. Schwarz. Mentor: Dr. Andrew Cameron. Site: California Institute of Technology. Objective: curate the non-annotated, predicted genes of the sea urchin genome; learn to annotate genes and register as many as possible to spbase.org.


Presentation Transcript


  1. Slow and Steady: The Sea Urchin Genome Project • David A. Schwarz • Mentor: Dr. Andrew Cameron • Site: California Institute of Technology

  2. Objective • Curate the non-annotated, predicted genes of the sea urchin genome. • Learn to annotate genes and register as many as possible to spbase.org.

  3. Importance • The purple sea urchin: the only non-chordate deuterostome with a sequenced genome. • It could help us understand the evolution of biological processes such as odor perception and immunity. • Developments made in the project could benefit future genome projects.

  4. Strongylocentrotus purpuratus • Phylum: Echinodermata • Radially symmetrical shell, 3 – 10 cm. • Spines can reach 3 cm long. • Moves slowly, feeding mostly on algae. • Reproduces by external fertilization.

  5. Phylogeny

  6. Data Flow

  7. Genome Sequencing • WGS = Whole Genome Shotgun Sequencing; genome assembly named Spur_v0.5 • CAPSS = Cloned-Array Pooled Shotgun Sequencing Strategy; genome assembly named Spur_v2.1

  8. Data Flow

  9. Sequencing • WGS: Extract DNA, digest, sequence the fragments, assemble the genome. • CAPSS: Combines WGS with BACs. Uses BACs as a framework for genome assembly.

  10. CAPSS

  11. Data Flow

  12. GLEAN

  13. Discrepancy • Spur_v0.5: 28,944 predicted (~10,044 annotated, 18,944 non-annotated) • Spur_v2.1: 23,300 estimated; gene number reduced when duplicates overlap • ~5,700 gene difference possibly due to: 4–5% species polymorphism (E. Davidson, et al.); assembly error; prediction error

  14. Methods • Python filtering and Python searching • BioPython module: BLAST hits, FASTA sequences • Grep-like functions: GLEAN models by protein type, FASTA sequences in the GLEAN protein database
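The grep-like search over FASTA sequences mentioned on this slide could look roughly like the sketch below. It is not the project's script: it uses only the standard library so it runs without dependencies (Biopython's `SeqIO.parse(handle, "fasta")` would be the usual alternative), and the header layout and the short sequences in `example` are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def grep_fasta(lines: Iterable[str], pattern: str) -> Dict[str, str]:
    """Return the records whose header matches a regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    return {h: s for h, s in read_fasta(lines) if regex.search(h)}


# Hypothetical records; real GLEAN headers and sequences will differ.
example = """>GLEAN3_00014 ubiquitin-conjugating enzyme
MSTPARRR
LMRDFKR
>GLEAN3_00032 Nfrl
MKKLL
""".splitlines()

print(grep_fasta(example, r"ubiquitin"))
```

Passing an open file object instead of `example` works the same way, since the parser only iterates over lines.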

  15. Example List (GLEAN model, top BLAST hit, score, e-value)
      GLEAN3_00003 ref|NP_104627.1| hypothetical protein [Mesorhizobium loti] >gi|1... 38 0.48
      GLEAN3_00004 ref|NP_788284.1| CG33087-PC [Drosophila melanogaster] >gi|232403... 40 0.19
      GLEAN3_00005 ref|NP_509604.1| abnormal NUClease NUC-1, deoxyribonuclease DLAD... 69 4e-11
      GLEAN3_00008 ref|XP_293875.3| similar to RIKEN cDNA B130016O10 gene [Homo sap... 240 5e-62
      GLEAN3_00010 gb|AAH36744.1| FLJ11712 protein [Homo sapiens] 86 6e-16
      GLEAN3_00011 gb|AAH36744.1| FLJ11712 protein [Homo sapiens] 143 3e-32
      GLEAN3_00014 ref|NP_062642.1| ubiquitin-conjugating enzyme E2A, RAD6 homolog;... 229 2e-59
      GLEAN3_00018 failed
      GLEAN3_00019 failed
      GLEAN3_00020 failed
      GLEAN3_00021 ref|NP_196259.2| chaperone protein - related [Arabidopsis thalia... 110 4e-23
      GLEAN3_00023 failed
      GLEAN3_00024 sp|O42587|PRSA_XENLA 26S protease regulatory subunit 6A (TAT-bin... 130 1e-29
      GLEAN3_00027 gb|AAD19348.1| reverse transcriptase-like protein [Takifugu rubr... 172 2e-41
      GLEAN3_00028 gb|AAH53792.1| MGC64389 protein [Xenopus laevis] 164 3e-39
      GLEAN3_00029 failed
      GLEAN3_00030 ref|XP_060945.2| similar to Olfactory receptor 10T2 [Homo sapien... 54 5e-06
      GLEAN3_00032 dbj|BAA22375.1| Nfrl [Xenopus laevis] 339 7e-92
      GLEAN3_00033 ref|XP_354640.1| RIKEN cDNA D430035D22 gene [Mus musculus] 186 1e-45
      GLEAN3_00034 dbj|BAC04242.1| unnamed protein product [Homo sapiens] 207 5e-52
      GLEAN3_00037 dbj|BAC02921.1| zVeph-A [Danio rerio] 112 4e-23
      GLEAN3_00038 ref|NP_004198.1| solute carrier family 16, member 3; monocarboxy... 44 0.008
      GLEAN3_00039 failed
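Entries in the list above can be turned into records with a few lines of Python. The sketch below assumes each entry is a GLEAN model name followed either by the word "failed" or by a hit description whose last two whitespace-separated fields are the score and the e-value; the `Hit` type and `parse_hit_line` are names made up for this illustration.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    model: str
    description: Optional[str]  # None when the search failed
    score: Optional[float]
    evalue: Optional[float]


def parse_hit_line(line: str) -> Hit:
    """Parse one 'MODEL description score evalue' or 'MODEL failed' entry."""
    model, _, rest = line.strip().partition(" ")
    rest = rest.strip()
    if rest == "failed":
        return Hit(model, None, None, None)
    # Assumption: score and e-value are always the final two fields.
    description, score, evalue = rest.rsplit(None, 2)
    return Hit(model, description, float(score), float(evalue))


entries = [
    "GLEAN3_00010 gb|AAH36744.1| FLJ11712 protein [Homo sapiens] 86 6e-16",
    "GLEAN3_00018 failed",
]
for entry in entries:
    print(parse_hit_line(entry))
```

Records with `description is None` correspond to the "failed" entries, which makes it easy to separate models with no usable hit from the rest.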

  16. Data Curation • Condition: Different name, same genome coordinates • Genes removed: 139
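One way to detect models that carry different names but the same genome coordinates is to group them by coordinate tuple. This is only a sketch: the slides do not show how coordinates were stored, so the `(scaffold, start, end)` layout and the model names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed layout: (scaffold name, start, end).
Coords = Tuple[str, int, int]


def same_coordinate_groups(models: Dict[str, Coords]) -> List[List[str]]:
    """Return groups of model names that share identical coordinates."""
    by_coords = defaultdict(list)
    for name, coords in models.items():
        by_coords[coords].append(name)
    # Only groups with more than one name are redundant.
    return [sorted(names) for names in by_coords.values() if len(names) > 1]


# Hypothetical models for illustration.
models = {
    "MODEL_A": ("Scaffold1", 100, 900),
    "MODEL_B": ("Scaffold1", 100, 900),
    "MODEL_C": ("Scaffold2", 50, 400),
}
print(same_coordinate_groups(models))  # [['MODEL_A', 'MODEL_B']]
```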

  17. Data Curation • Condition: Evidence for gene expression • Genes removed: 1,603

  18. Data Curation • Condition: No hits • Genes removed: 3,145

  19. Data Curation • Condition: Exactly the same BLAST hit • Genes removed: 4,545
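Models with exactly the same BLAST hit can be found by grouping on the accession at the start of the hit description, as with GLEAN3_00010 and GLEAN3_00011 in the example list, which both hit gb|AAH36744.1|. The sketch below assumes descriptions begin with a `db|accession|` prefix; the function names are hypothetical, and the slides do not say which model of a group was kept.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Matches prefixes such as "gb|AAH36744.1|" or "sp|O42587|".
ACCESSION = re.compile(r"^(?:ref|gb|dbj|sp|emb)\|([^|\s]+)\|")


def hit_accession(description: str) -> Optional[str]:
    """Extract the accession from a hit description, if present."""
    match = ACCESSION.match(description)
    return match.group(1) if match else None


def models_sharing_hit(hits: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each accession hit by more than one model to those models."""
    groups = defaultdict(list)
    for model, description in hits.items():
        accession = hit_accession(description)
        if accession is not None:
            groups[accession].append(model)
    return {acc: ms for acc, ms in groups.items() if len(ms) > 1}


hits = {
    "GLEAN3_00010": "gb|AAH36744.1| FLJ11712 protein [Homo sapiens]",
    "GLEAN3_00011": "gb|AAH36744.1| FLJ11712 protein [Homo sapiens]",
    "GLEAN3_00028": "gb|AAH53792.1| MGC64389 protein [Xenopus laevis]",
}
print(models_sharing_hit(hits))
```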

  20. Data Curation • Condition: Successful reciprocal BLAST match • Genes removed: 3,952

  21. Reciprocal Blast: Good Reciprocal Blast • [Diagram: GLEAN_A in the sea urchin protein database (GLEAN) and NCBI Protein B in the NCBI Nr database, with (score) and (e-value); proteins labeled A, B, X, Y]

  22. Reciprocal Blast: Bad Reciprocal Blast • [Diagram: GLEAN_A in the sea urchin protein database (GLEAN) and NCBI Protein B in the NCBI Nr database, with (score) and (e-value); proteins labeled A, B, X, Y]
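The diagrams do not spell out the exact criterion, so the sketch below uses the common reciprocal-best-hit definition as an assumption: a GLEAN model A and an NCBI protein B match reciprocally when B is A's best hit in the Nr database and A is B's best hit in the GLEAN database. The dictionaries and identifiers are hypothetical.

```python
from typing import Dict


def is_reciprocal(
    glean_model: str,
    forward_best: Dict[str, str],  # GLEAN model -> best NCBI hit
    reverse_best: Dict[str, str],  # NCBI protein -> best GLEAN hit
) -> bool:
    """True when the model's best hit points back to the same model."""
    partner = forward_best.get(glean_model)
    if partner is None:
        return False  # no forward hit at all
    return reverse_best.get(partner) == glean_model


# Hypothetical best hits: A <-> B is reciprocal, X -> Y -> A is not.
forward_best = {"GLEAN_A": "NCBI_B", "GLEAN_X": "NCBI_Y"}
reverse_best = {"NCBI_B": "GLEAN_A", "NCBI_Y": "GLEAN_A"}

print(is_reciprocal("GLEAN_A", forward_best, reverse_best))  # True
print(is_reciprocal("GLEAN_X", forward_best, reverse_best))  # False
```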

  23. Data Curation • Condition: Names such as "hypothetical", "predicted", "unnamed" • Genes removed: 3,041
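A name-based condition like this one can be expressed as a keyword match on the hit description. The sketch below uses only the three keywords listed on the slide; whether the project matched other words as well is not stated, and `has_uninformative_name` is a hypothetical helper.

```python
import re

# Keywords taken from the slide; the full list used in the project is unknown.
UNINFORMATIVE = re.compile(r"\b(hypothetical|predicted|unnamed)\b", re.IGNORECASE)


def has_uninformative_name(description: str) -> bool:
    """True when the hit description contains one of the keywords."""
    return bool(UNINFORMATIVE.search(description))


descriptions = [
    "ref|NP_104627.1| hypothetical protein [Mesorhizobium loti]",
    "dbj|BAC04242.1| unnamed protein product [Homo sapiens]",
    "dbj|BAA22375.1| Nfrl [Xenopus laevis]",
]
for text in descriptions:
    print(has_uninformative_name(text), text)
```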

  24. Annotation Process

  25. Contributions to Annotation • AnnotationAssist.py • Automates searching for families in the GLEAN database • Autofetches sequences for Clustal X • Stores everything in a unique directory based on GLEAN model name and family
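The source of AnnotationAssist.py is not shown, so the following is only a guess at how a per-model, per-family directory might be created. The `family/model` nesting order, the function name, and the family label are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def make_workdir(base: Path, model: str, family: str) -> Path:
    """Create (if needed) and return a directory for one model and family."""
    # Assumed layout: <base>/<family>/<model>
    workdir = base / family / model
    workdir.mkdir(parents=True, exist_ok=True)
    return workdir


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = make_workdir(Path(tmp), "GLEAN3_00014", "ubiquitin_conjugating")
    print(path.relative_to(tmp))
```

Because `mkdir` is called with `exist_ok=True`, rerunning the helper for the same model and family reuses the existing directory instead of failing.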

  26. References • Polymorphism: R. J. Britten, A. Cetta, E. H. Davidson, Cell 15, 1175 (1978). • CAPSS: W. W. Cai, R. Chen, R. A. Gibbs, A. Bradley, Genome Res. 11, 1619 (2001).

  27. Acknowledgments • Dr. Andrew Cameron • David Felt • Lauren Lee and Nowelle Ibarra • SoCalBSI Staff and Coordinator • SoCalBSI Participants • Funding: NIH, NSF, DOE, Beckman Institute
