
Genome Browsers and Data Display

Genome Browsers and Data Display. Andy Conley, 3/26/2012. Who is this crazy looking guy? James Kent. Know that name. He is one of the greatest, perhaps the greatest, bioinformatics programmers ever. He was deeply involved in the assembly of the public human genome project.


Presentation Transcript


  1. Genome Browsers and Data Display. Andy Conley, 3/26/2012

  2. Who is this crazy looking guy? James Kent. Know that name. He is one of the greatest, perhaps the greatest, bioinformatics programmers ever. He was deeply involved in the assembly of the public human genome project. If you were in the fall class, you compiled the James Kent source tree. Almost all of it is his. He speaks nothing but the truth.

  3. He knows what a genome browser should be: “Genome browsers facilitate genomic analysis by presenting alignment, experimental and annotation data in the context of genomic DNA sequences.” (Melissa S Cline & James W Kent, 2009.) Genome browsers aggregate data.

  4. The UCSC Genome Browser. Clicking on any of these takes you to a page full of details (region shown: CDKN2A).

  5. Tracks don’t have to be genes. They are any kind of genomic information: genes, transposable element insertions, transcription factor binding sites, sites prone to recombination, conservation of genomic sequences. Extremely important in modern times are tracks displaying ChIP-seq or RNA-seq data.

  6. What’s good about the UCSC GB? Arguably the most advanced genome browser, it is much more than a tool for looking at genomes. It integrates a huge amount of data for each gene it displays. UCSC also has a graphical front end for downloading from its huge backend database.

  7. This UCSC browser does so much more. It hosts the ENCODE project, one of the largest, probably the largest, assemblies of functional genomic data. It lets you jump between orthologous regions in different genomes (example: CDKN2A). It’s a massive, massive database backend of over 6500 tables.

  8. So why aren’t there dozens of UCSC implementations? It’s really, really, really hard to install. It’s impossible to understand unless you’ve tried to do it. The UCSC genome browser works so well for the genomes that it has because it is so very, very specialized for those genomes. Each track in the UCSC browser has been lovingly crafted.

  9. There are many genomes out now. A ridiculous number of genomes. They’re going to be coming out even faster in the next year or two, then faster after that. Things like the new PacBio providing longer reads should make assembling eukaryotic genomes easier.

  10. How do you handle so many genomes? You can’t load them/annotate them by hand – it all has to be automated. The UCSC guys do it for the human genome because it’s the human genome. They’re all different from each other. You have to have some easily deployable storage/display method for your data.

  11. Browser choices. There are a number of choices out there for a genome browser, but there are really just 2 big ones: UCSC, and GMOD with GBrowse. We already discussed why you don’t use the UCSC browser for projects.

  12. Generic Model Organism Database. Generic – it can handle any organism. Model Organism – not really, whatever genome. Database – not really a database, but there is a database in it. GMOD just sounds good. gmod.org

  13. So what is GMOD then? A simple, easily deployable method for storing, viewing and editing genomic data. GMOD has many, many parts. Some of the big ones: Apollo – eww; Chado – a mechanism for storing genomic data; GBrowse – a genome browser.

  14. GBrowse. Probably (definitely) the most commonly used of the GMOD components. It is a simple but extensible platform for displaying genomic data. It is maintained mostly by this man: Scott Cain.

  15. GBrowse installations. Many projects use GBrowse as their genome viewer.

  16. WormBase. WormBase is to the C. elegans genome what the UCSC browser is to the human and mouse genomes. It is huge.

  17. FlyBase. FlyBase hosts many Drosophila genomes, though not with the depth of WormBase. WormBase is really at the top of non-UCSC browsers in its depth of information. This makes sense, given that nematodes are so heavily studied and very easy to work with.

  18. NBase. The result of the first couple of years of the class. Currently maintained by Lee Katz at the CDC.

  19. More from NBase

  20. You can use colors for information. This shows genes that we thought were horizontally transferred. Darker genes were flagged as horizontally transferred by more programs.
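
The "darker means more support" idea can be sketched in a few lines of Python. This is only an illustration of the mapping, not GBrowse code or the NBase configuration; the function name `shade_for_support` and the grey scale are assumptions made up for the example.

```python
def shade_for_support(n_supporting, n_programs):
    """Return a grey hex color that gets darker as more programs agree.

    n_supporting: how many prediction programs flagged the gene (hypothetical input)
    n_programs:   how many programs were run in total
    """
    if n_programs <= 0:
        raise ValueError("n_programs must be positive")
    # Clamp so bad input cannot push the color outside the grey scale.
    clamped = min(max(n_supporting, 0), n_programs)
    fraction = clamped / n_programs
    level = round(255 * (1 - fraction))  # 255 is white, 0 is black
    return f"#{level:02x}{level:02x}{level:02x}"


# A gene flagged by 3 of 4 programs is drawn darker than one flagged by 1 of 4.
weak = shade_for_support(1, 4)
strong = shade_for_support(3, 4)
```

A browser track would then use the returned value as the fill color of each gene glyph.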

  21. You can also have specialized tracks. We had a track of virulence factors in the first year. Clicking on any of them took you to details for the gene, a link to VFDB, etc.

  22. This goes beyond colors. You can alter how tracks are shown in other ways: add and remove tracks, change the link that appears over a feature in the genome.

  23. What do all of these have in common? One big, important thing: “Genome browsers facilitate genomic analysis by presenting alignment, experimental and annotation data in the context of genomic DNA sequences.” (Melissa S Cline & James W Kent, 2009.) Genome browsers, in short, aggregate data.

  24. You can do even more customization. My rotifer transcriptome browser: it doesn’t have to be a genome. Not super exciting from this view. Just the predicted coding region of an assembled contig (mRNA).

  25. All of this is in the conf

  26. Synteny in GBrowse. Synteny is the relative ordering of things in a genome. Just a few years ago this was not available in GBrowse; it is now. This could easily work for comparing different bacterial species.

  27. GBrowse_syn on TAIR

  28. It’s more interesting in WormBase

  29. Are genome browsers useful?

  30. We are bioinformaticists. We deal with huge volumes of data. The fall class will recall my hatred of GUIs. We want high throughput. Genome browsers give you none of this. None.

  31. I wasn’t always a computer nerd. I spent quite a bit of time in undergrad doing bench work for Dr. Nils Kroger across the street. I worked with these little guys: fascinating creatures. I cared about three genes: Sil1, Sil2, Sil3. The day the genome browser came out changed the game.

  32. How useful is it for us? Still pretty useful. My main uses: • Make sure my data are correct. Are my intersections between genes and transposable element insertions correct? • Download hosted data. • Make nice pictures. • Like a biologist, get information about specific genes.
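
The gene versus transposable element check in the first bullet comes down to an interval overlap test, which the browser then lets you confirm by eye. A minimal sketch in Python, assuming closed 1-based coordinates and made-up feature names and positions (none of these come from the slides):

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two closed, 1-based intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def intersect(genes, insertions):
    """Return (gene, insertion) name pairs that overlap on the same sequence.

    Each feature is a tuple: (name, chromosome, start, end).
    This is the simple all-against-all version; it is fine for spot checks
    but too slow for whole genomes.
    """
    hits = []
    for g_name, g_chrom, g_start, g_end in genes:
        for t_name, t_chrom, t_start, t_end in insertions:
            if g_chrom == t_chrom and overlaps(g_start, g_end, t_start, t_end):
                hits.append((g_name, t_name))
    return hits


# Hypothetical features, invented for illustration only.
genes = [("geneA", "chr1", 100, 500), ("geneB", "chr1", 900, 1200), ("geneC", "chr2", 100, 500)]
insertions = [("TE1", "chr1", 450, 700), ("TE2", "chr1", 1201, 1300), ("TE3", "chr2", 50, 100)]
pairs = intersect(genes, insertions)
```

Here TE2 starts one base after geneB ends, so it is not reported. That kind of off-by-one case is exactly what is easy to verify visually in a browser.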

  33. In answer to the question: how useful is it really? It really depends on who you ask. It’s really for biologists: they find the browser, search for their favorite gene and get some details about it. Once again, data aggregation.

  34. The rotifer browser. They were super excited about it. They use it all the time. It is like magic to them. If you were to show an iPhone to somebody from 1975, it would be pretty much the same thing. Almost.

  35. Conclusion of GBrowse. Will it ever be the greatest genome browser? No. That will always be the UCSC browser. Will it remain the easiest to install for some time? Probably. Will you get the best return on time spent? Yep. Synteny is horribly conserved in Haemophilus, so avoid GBrowse_syn for this class, but do keep it in mind.

  36. Just to make sure you’ve got it. Genome browsers: allow navigation of the genome; show genomic features, whatever they are; show annotations; show comparisons.

  37. Database backends. GBrowse, and all of GMOD, use GFF (Generic Feature Format) files. Most of it is pretty simple: chromosome (contig), start, stop, strand, ID. The last column is what’s important. It lets you put whatever information about the feature you want in there. It’s a very flexible format.
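
To make the "last column" point concrete, here is a minimal Python sketch of reading one GFF line. It assumes the GFF3 flavor, which has nine tab-separated columns with `key=value` pairs separated by semicolons in the ninth. The example record and the function name `parse_gff3_line` are made up for illustration.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict; attributes become a nested dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        attributes[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    record["attributes"] = attributes
    return record


# Hypothetical record, invented for illustration only.
example = ("contig1\texample\tgene\t1200\t2300\t.\t+\t.\t"
           "ID=gene0001;Name=hypothetical_gene;Note=made-up%20record")
feature = parse_gff3_line(example)
```

Everything a track needs beyond position (names, notes, links to other databases) rides in that attributes dictionary, which is why the format is so flexible.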

  38. Questions? Thanks for listening
