
Genome Assembly Stewardship (Ames)

Genome Assembly Stewardship (Ames). Objective 1: Support stewardship of maize genome sequences and forthcoming diverse maize sequences.



Presentation Transcript


  1. Genome Assembly Stewardship (Ames) Objective 1: Support stewardship of maize genome sequences and forthcoming diverse maize sequences. Goal 1.a: Enlist the community of maize researchers in the genome assembly and annotation process so that they can contribute to and use improved reference genome sequences in real time. Goal 1.b: Deliver sequence-based representations of maize diversity, both with respect to the B73 reference genome and in the absence of homologous reference sequence. Anticipated Products: A tool suite that enables improvement of the reference genome assembly, documentation of diversity alongside the reference genome assembly, and direct contribution of structural and functional genome annotation by researchers.

  2. Genome Assembly Stewardship (Ames)

  3. Genome Assembly Stewardship The Genome Reference Consortium (GRC) data model can represent multiple tiling paths. Some regions have multiple paths with equal evidence.

  4. Genome Assembly Stewardship Alternative Alleles: The maize bz1 locus has multiple alleles. Right schematic taken from Figure 1 of Wang and Dooner, PNAS 2006;103(47):17644–17649.

  5. Genome Assembly Stewardship The bz1 locus has multiple alleles and multiple potential assembly paths. Figure labels: bz1, stc1. Diagram adapted from The Plant Cell, February 2005, vol. 17, no. 2, 343–360 and Proc. Natl. Acad. Sci. USA 99, 9573–9578.

  6. Genome Assembly Stewardship GRC Tiling Paths Views

  7. Genome Assembly Stewardship Types of issues defined by the GRC: Fix patches correct assembly problems and are incorporated into the next major assembly update. Novel patches, or variations, add variant sequences and are retained as alternate loci scaffolds. Gap issues provide evidence for filling gaps in the sequence. A clone problem is an apparent error in the placement of a single clone. A path problem is an apparent error in the placement or tiling of multiple clones. Missing sequence improves the reference sequence or fills a gap. Localization problems indicate clones that appear to map to a different chromosome than the one reported in the assembly.
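
The issue categories on this slide can be pictured as a small record type. The following is a minimal sketch only: the category names come from the slide, but the class names, field names, and coordinates are hypothetical and do not reflect the GRC's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class IssueType(Enum):
    """Issue categories as listed on the slide."""
    FIX_PATCH = "fix patch"
    NOVEL_PATCH = "novel patch"
    GAP = "gap"
    CLONE_PROBLEM = "clone problem"
    PATH_PROBLEM = "path problem"
    MISSING_SEQUENCE = "missing sequence"
    LOCALIZATION_PROBLEM = "localization problem"


@dataclass
class AssemblyIssue:
    # All field names are hypothetical, chosen for illustration only.
    issue_type: IssueType
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    note: str = ""

    def retained_as_alternate_locus(self) -> bool:
        # Per the slide, novel patches are kept as alternate loci scaffolds.
        return self.issue_type is IssueType.NOVEL_PATCH


# Placeholder coordinates; not a real curated issue.
example = AssemblyIssue(IssueType.NOVEL_PATCH, "chr_demo", 100, 200,
                        note="alternative allele")
print(example.retained_as_alternate_locus())
```

Keeping the category as an enumeration rather than free text would let a tracker group reports by type without string matching.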

  8. Genome Assembly Stewardship • BAC Reassembly and Submission • Why this needs to be done • BACs were reassembled during the assembly process, so the records now in GenBank don’t match the assembly. • We would like to use the GRC tools and data models to improve and annotate the assembly, but this requires that the BAC sequences in GenBank match the assembly. • Therefore, we are updating the GenBank BAC records to match the current assembly.

  9. Genome Assembly Stewardship • BAC Reassembly and Submission • Participants

  10. Genome Assembly Stewardship • BAC Reassembly and Submission • Considerations • We are taking the most conservative options when there are questions about removal of sequence from BAC records. • BACs were sequenced and submitted by Washington University. MaizeGDB will modify BAC records and prepare the GenBank submission files, then Washington University will submit updates.

  11. Genome Assembly Stewardship BAC Reassembly and Submission Process (figure; panel labels: GenBank, RefGen_v3)

  12. Genome Assembly Stewardship • BAC Reassembly and Submission • Process • Start with the current GenBank records for all BACs used in the assembly (16,082 BACs). • Rearrange BAC sequence according to the assembly file from Rod Wing’s lab at the Arizona Genomics Institute. • Remove contaminants, but keep ‘mitochondrial contaminants’ that are known to be nuclear DNA. • Compare the outcome to the AGP file for RefGen_v3. • Note: No overlapping sequence is removed from BACs, as the overlap is needed by GRC tools. • …continued…

  13. Genome Assembly Stewardship • BAC Reassembly and Submission • Process • …continued: • Rewrite the BAC sequence according to the changes from the steps above. • Check the rearranged sequence by aligning it against the v3 assembly with MUMmer. • Work with Karen Clark at GenBank to prepare submission files. • Give the final files to Washington University for actual submission.
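
The "compare outcome to AGP file" step can be sketched as a coordinate check. This is an illustration under stated assumptions, not the script the project used: it assumes a tab-delimited AGP file with the standard nine columns, and the object name, accessions, and lengths in `DEMO_AGP` are invented placeholders.

```python
from typing import Dict, Iterable, List, NamedTuple


class Component(NamedTuple):
    object_id: str     # assembled object, e.g. a chromosome
    component_id: str  # accession of the BAC record
    comp_beg: int      # 1-based start within the BAC sequence
    comp_end: int      # 1-based end within the BAC sequence
    orientation: str


def parse_agp(lines: Iterable[str]) -> List[Component]:
    """Collect component lines from AGP text, skipping comments and gaps."""
    components = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[4] in ("N", "U"):  # gap lines carry no component
            continue
        components.append(
            Component(cols[0], cols[5], int(cols[6]), int(cols[7]), cols[8])
        )
    return components


def find_mismatches(components: List[Component],
                    bac_lengths: Dict[str, int]) -> List[str]:
    """Report BACs that are absent or too short for the span the AGP uses."""
    problems = []
    for c in components:
        length = bac_lengths.get(c.component_id)
        if length is None:
            problems.append(f"{c.component_id}: no BAC record")
        elif c.comp_end > length:
            problems.append(
                f"{c.component_id}: AGP uses bases up to {c.comp_end}, "
                f"record has {length}"
            )
    return problems


# Invented demo data: two BAC components separated by one gap line.
DEMO_AGP = [
    "##agp-version 2.0",
    "chr_demo\t1\t1000\t1\tF\tBAC_A.1\t1\t1000\t+",
    "chr_demo\t1001\t1100\t2\tN\t100\tscaffold\tyes\tpaired-ends",
    "chr_demo\t1101\t2100\t3\tF\tBAC_B.1\t1\t1000\t-",
]
print(find_mismatches(parse_agp(DEMO_AGP), {"BAC_A.1": 1000, "BAC_B.1": 900}))
```

A length check like this only catches records that cannot supply the span the assembly expects; confirming that the bases themselves agree is what the MUMmer alignment step is for.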

  14. Genome Assembly Stewardship • BAC Reassembly and Submission • End Products • A full set of submission files for GenBank, one for each BAC that requires updating. • A table describing the changes to each BAC that will be made available at MaizeGDB. • Participation in the Genome Reference Consortium, and the ability to use their tools for future genome assembly annotation.

  15. Genome Assembly Stewardship • Issue Collection • Data supporting local issues in the genome assembly (assembly errors, alternative alleles, gene model corrections) will be found and reported by MaizeGDB curators. • Data will also come from researchers in the community. We hope to make reporting issues obvious and easy.

  16. Genome Assembly Stewardship • Issue Collection • We will use the Jira request/issue tracker to collect and address assembly and gene model issues. • We plan to indicate assembly regions with issues on the genome browser. • Gene model issues will be reported on the gene model record pages.
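
As an illustration of how a submitted report might be shaped for Jira, the sketch below builds a request body in the `fields` layout used by Jira's REST API for creating issues. The project key, issue type name, and report text are hypothetical placeholders, and nothing here describes the actual MaizeGDB form or Jira configuration.

```python
import json
from typing import Any, Dict


def build_issue_payload(project_key: str, summary: str,
                        description: str,
                        issue_type: str = "Bug") -> Dict[str, Any]:
    """Build a create-issue request body; it is not sent anywhere."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": summary,
            "description": description,
            "issuetype": {"name": issue_type},
        }
    }


# "DEMO" and the text below are placeholders, not real tracker content.
payload = build_issue_payload(
    "DEMO",
    "Possible assembly issue (placeholder)",
    "Region and supporting evidence would be described here.",
)
print(json.dumps(payload, indent=2))
```

Building the body separately from the HTTP call keeps the report structure easy to validate before anything reaches the tracker.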

  17. Genome Assembly Stewardship Issue Collection A form is available on the redesigned MaizeGDB website that will connect to Jira.

  18. Genome Assembly Stewardship • Status • Working out the last few bugs in the submission files with Karen Clark at NCBI. • Once the BAC records have been updated in GenBank, the tiling path files (TPF) will be submitted to the GRC. • A Jira project for issues and the issue reporting form are operational on the redesigned MaizeGDB site, and MaizeGDB curators have started entering issues.

  19. Genome Assembly Stewardship • Next Steps (once the NCBI Assembly Data Model is populated) • Release the assembly/annotation status page at MaizeGDB. • Release the issue reporting form and begin accepting issues from researchers (assembly and structural annotation). • Collect structural annotations for gene models from PlantGDB. • Feed issues to the Ware group (funded by NSF for RefGen_v4) and others assembling and annotating the genome. • Work with researchers to document diversity using the NCBI data model, ranging from specific examples in the literature to, potentially, a whole genome sequence (e.g., Oh43).

  20. Genome Assembly Stewardship (Diversity Curation; Columbia) Objective 4: Identify and curate key datasets that will serve to benchmark genomic discovery tools for key agronomic traits, especially response to biotic and abiotic environmental stressors. Sub-objective 4.1: Bring into MaizeGDB the phenotypic data generated by critically important research endeavors, including the Maize Diversity Project. Goal 4.1: Provide facile access to phenotypic diversity associated with genotype. Anticipated Products: Data files that can be used as input to common statistical software, as well as association files for the Plant Ontology to enable associations of diversity across multiple species.

  21. New Map Data • 56,000 SNPs from the Illumina MaizeSNP50 chip • 28,000 genetically mapped using 2 intermated panels: IBM (B73 x Mo17) and LHRF (F2 x F252) • 16 regions of major discord with the B73 assembly (Ganal et al. 2011, PLoS One) • Expect soon: (1) diversity maps from the related European project “Cornfed” (Dent and Flint inbred lines); (2) NAM from Buckler et al. • Figure label: Vp1 (viviparous1)
