
Extracting homoeologous genomic sequences – the challenge of the wheat genome

Ivan Popov, ABI, 2011. Why a challenge? Wheat is hexaploid: it carries six chromosome sets, three times as many as a typical diploid organism. Wheat DNA isn’t just one genome copied three times: it contains three different genomes!



  1. Extracting homoeologous genomic sequences – the challenge of the wheat genome. Ivan Popov, ABI, 2011

  2. Why a challenge? • Wheat is hexaploid: it carries six chromosome sets, three times as many as a typical diploid organism • Wheat DNA isn’t just one genome copied three times: it contains three different genomes! • The A, B and D genomes can each carry a different variant of the same gene • So what do we actually get from sequencing?

  3. NGS (next-generation sequencing) gives us a consensus • Reads from all three genomes are assembled together, so the small variations between them are lost • The output is a single consensus sequence… • …unless we separate the reads before assembly, or use the assembly as a guide while we read out the three separate sequences
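To see why a consensus hides the minority genome, here is a minimal sketch of a majority-vote consensus. The reads are hypothetical toy data and `majority_consensus` is an illustrative helper, not what any real assembler does internally.

```python
from collections import Counter


def majority_consensus(reads):
    """Return the most common base in each column of equal-length, pre-aligned reads."""
    consensus = []
    for column in zip(*reads):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical toy reads: three from one genome, one from another.
reads = ["TGTGGA", "TGTGGA", "TGTGGA", "TGCGAA"]

# The lone "TGCGAA" read is outvoted at every column where it differs.
print(majority_consensus(reads))
```

The variant read leaves no trace in the result, which is exactly the information we want to recover.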

  4. What we need: • A wheat sequencing database with BLAST functionality • An assembly program (CAP3, Cortex) • A gene of interest • A programming language and someone to use it (probably us again)

  5. Resources explained: Database • Contains sequencing “reads” from wheat genomic research: character strings of varying length (why?) • Usually available online • May offer built-in BLAST and assembly software • Example: www.cerealsdb.uk.net

  6. Resources explained: Software • BLAST (Basic Local Alignment Search Tool) • Sample alignment (“-” marks a gap):
       ATGCTGGGACCTAT-GAT
       ATGCTC-GACCAATCGAT
     • Matches reads to the gene even when their lengths differ • Returns the best-matching reads, which probably belong to our gene • Assembler software overlaps the reads and produces the longest possible sequences: contigs
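The sample alignment on this slide can be read column by column. The sketch below only counts matches, mismatches and gaps in an alignment that already exists; it is not BLAST, which also has to find the alignment in the first place. The function name `alignment_summary` is made up for illustration.

```python
def alignment_summary(top, bottom):
    """Count matching, mismatching and gapped columns of a pairwise alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1          # one sequence has no base in this column
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


# The two rows of the sample alignment from the slide.
top = "ATGCTGGGACCTAT-GAT"
bottom = "ATGCTC-GACCAATCGAT"

print(alignment_summary(top, bottom))  # (matches, mismatches, gaps)
```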

  7. Workflow • BLAST the database with our gene • Assemble the reads • Look at the result… and see the errors:

  8. An assembly example
                                   .    :    .    :    .    :    .    :    .    :    .    :
      lcl|GKU3SMK03END4X-  TGTGGCCACGCGGCTCACCTGCTCCACTGCGGAGGATGAGACCACCGGGTTCATCACCGG
      lcl|GHILVEO01C7S9I-  TGTGGCCACGCGGCTCACCTGCTCCACTGCGGA
      lcl|GJVZXJB02IXON4+  TGTGGCCACGCGGCTCACCTGCTCCACTGCGGAGGATGAGACCACCGGGTTCATCACCGG
      UserQuery-           TGCGGCTACGCGGCTCACCTGCTCCACTGCGGAGGACGAGACCACCGGGTTCATCACCGG
      lcl|GKJD2EX01B6242+  TGTGGCCACGCGGCTCACCTGCTCCACTGCGGAGGATGAGACCACCGGGTTCATCACCGG
      lcl|GBFXLNY02GE3WG+  TGTGGCCACGCGGCTCACCTGCTCCACTGCGGAGGATGAGACCACCGGGTTCATCACCGG
      lcl|GKJD2EX01EJ18K+  TGTGGCCACGCGGCTCACCTGCTCCACTGCGGAGGATGAGACCACCGGGTTCATCACCGG
      lcl|GINEZBA01ALFTD-  TGCGGCCACGCGGCTCACCTGCTCCACTGCAGAGGATGAGACCACCGGGTTCATCACTGG
      lcl|GHILVEO02HKGDP-                                               CGGGTTCATCACCGG
      lcl|GJVZZDM02G6UTI-  TGCGGCCACGCGGCTCACCTGCTCCACTGCAGAGGATGAGACCACCGGGTTCATCACTGG
                           ____________________________________________________________
      consensus            TGTGGCCACGCGGCTCACCTGCTCCACTGCGGAGGATGAGACCACCGGGTTCATCACCGG
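The "errors" in this assembly are columns where the reads disagree. A rough sketch of how to find them is below, using six of the reads from the slide. The names `MAJOR`, `MINOR` and `variable_columns` are illustrative, the start offsets of the two short reads are assumed from where they match the consensus, and the UserQuery row is left out because it is the search gene rather than a read.

```python
from collections import Counter

# The two full-length read sequences that appear in the assembly above.
MAJOR = "TGTGGCCACGCGGCTCACCTGCTCCACTGCGGAGGATGAGACCACCGGGTTCATCACCGG"
MINOR = "TGCGGCCACGCGGCTCACCTGCTCCACTGCAGAGGATGAGACCACCGGGTTCATCACTGG"

# read id -> (assumed 0-based start column, sequence)
reads = {
    "GKU3SMK03END4X": (0, MAJOR),
    "GHILVEO01C7S9I": (0, MAJOR[:33]),    # short read at the left edge
    "GJVZXJB02IXON4": (0, MAJOR),
    "GINEZBA01ALFTD": (0, MINOR),
    "GHILVEO02HKGDP": (45, MAJOR[45:]),   # short read at the right edge
    "GJVZZDM02G6UTI": (0, MINOR),
}


def variable_columns(reads):
    """Map each 1-based column with more than one observed base to its base counts."""
    columns = {}
    for offset, seq in reads.values():
        for i, base in enumerate(seq):
            columns.setdefault(offset + i, Counter())[base] += 1
    return {
        pos + 1: dict(counts)
        for pos, counts in sorted(columns.items())
        if len(counts) > 1
    }


for column, counts in variable_columns(reads).items():
    print(column, counts)
```

With real data a minimum count per base would be sensible, so that a single sequencing error is not mistaken for a genome variant.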

  9. How do we unravel the genomes? • We select the variable points of interest • We separate the reads by the bases they carry at these points • Then we stack the reads in the same order while keeping the different sequences apart
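Separating the reads by their variable points can be sketched as grouping on a "signature": the bases a read carries at the chosen positions. This assumes every read covers every chosen position; the reads, the positions and the name `split_by_variants` are all hypothetical.

```python
from collections import defaultdict


def split_by_variants(reads, positions):
    """Group read names by the bases they carry at the given 0-based positions."""
    groups = defaultdict(list)
    for name, seq in reads.items():
        signature = "".join(seq[p] for p in positions)
        groups[signature].append(name)
    return dict(groups)


# Hypothetical aligned reads that differ at positions 2 and 4.
reads = {
    "r1": "TGTGGA",
    "r2": "TGCGAA",
    "r3": "TGTGGA",
    "r4": "TGCGAA",
    "r5": "TGTGAA",
}

for signature, names in split_by_variants(reads, [2, 4]).items():
    print(signature, names)
```

Here three signatures come out, which is what we would hope for with three genomes.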

  10. A stacking example (only variable points are shown!) Reads: TGACA TGACAA Genome variants: AAGGGG TGACAA AAGGGG ACATC ACATC AAGGGGCT GGGGCT AGCCGT AGCCGT

  11. Workflow part II • Define the variable positions • Separate the variants by co-occurrence • Hope we get a meaningful result
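When reads cover only some of the variable positions, separating by co-occurrence means chaining reads that agree wherever they overlap. Below is a greedy sketch under that assumption. Each read is reduced to a hypothetical `{position: base}` dictionary, `phase_reads` is an illustrative name, and the result can depend on the order of the reads.

```python
def phase_reads(read_alleles):
    """Greedily merge reads into variants when they agree at shared variable positions."""
    haplotypes = []
    for alleles in read_alleles:
        for hap in haplotypes:
            shared = alleles.keys() & hap.keys()
            # Join a variant only if we overlap it and never contradict it.
            if shared and all(alleles[p] == hap[p] for p in shared):
                hap.update(alleles)
                break
        else:
            # No compatible variant found: this read starts a new one.
            haplotypes.append(dict(alleles))
    return haplotypes


# Hypothetical reads, each listing the base seen at the variable columns it covers.
read_alleles = [
    {3: "T", 31: "G"},
    {31: "G", 58: "C"},
    {3: "C", 31: "A", 58: "T"},
    {58: "T"},
]

for variant in phase_reads(read_alleles):
    print(sorted(variant.items()))
```

The first two reads never share column 3 and column 58 directly, but both agree at column 31, so they end up in the same variant.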

  12. Problems • Sometimes more than 3 variants emerge! • Is it because there are different alleles of the gene? • There are variable points that do not share any reads • Deeper sequencing needed (more reads at each point) • Use of other contigs from the assembly?
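The "variable points that do not share any reads" problem can be checked before any phasing is attempted. The sketch below groups variable positions into blocks that are linked by at least one chain of reads; more than one block means the variants cannot be connected across the gap. The data and the name `linked_blocks` are hypothetical.

```python
def linked_blocks(read_positions):
    """Group variable positions into blocks connected by shared reads."""
    blocks = []
    for positions in read_positions:
        positions = set(positions)
        overlapping = [block for block in blocks if block & positions]
        for block in overlapping:
            positions |= block      # absorb every block this read touches
            blocks.remove(block)
        blocks.append(positions)
    return sorted(sorted(block) for block in blocks)


# Hypothetical reads, each given as the set of variable columns it covers.
read_positions = [{3, 31}, {31, 58}, {120, 141}, {141}]

# No read spans both column 58 and column 120, so two separate blocks remain.
print(linked_blocks(read_positions))
```

Deeper sequencing helps precisely because it raises the chance that some read bridges two blocks.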

  13. What do we gain? • Specific primers • Isolation/amplification of a specific genome (A, B or D) • Connecting phenotypic traits to gene variants • Combining specific gene variants to get new traits • Better wheat = more food!

  14. Questions? And thank you for your attention!
