
The Gene Wiki, from a BioRDF-naïve perspective

The Gene Wiki, from a BioRDF-naïve perspective. W3C / HCLSIG BioRDF Subgroup, November 17, 2008. Entrez Gene. Patterns of gene annotation. How do we efficiently annotate the function of the ~25,000 genes in the mammalian genome? Goal: “Genome-wide functional genomics”. P(k) ~ k^-a.


Presentation Transcript


  1. The Gene Wiki, from a BioRDF-naïve perspective. W3C / HCLSIG BioRDF Subgroup, November 17, 2008

  2. Entrez Gene: patterns of gene annotation • How do we efficiently annotate the function of the ~25,000 genes in the mammalian genome? • Goal: “Genome-wide functional genomics” • [Figure: distribution of linked references per gene, P(k) ~ k^-a] • 44% of genes in Entrez Gene have zero linked references; over 75% have five or fewer.
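The statistics on this slide can be computed from a gene-to-reference table. These are the share of genes with zero linked references, the share with at most five, and the exponent a of the power law. The sketch below is a hedged illustration, not the presenters' analysis. It makes several assumptions:

- The input is the full list of gene IDs plus (GeneID, PubMed ID) link pairs, similar in spirit to NCBI's gene2pubmed table.
- The function names and toy data are hypothetical.
- The exponent uses the standard approximate discrete maximum-likelihood estimator, a = 1 + n / Σ ln(k_i / (k_min − 1/2)), fitted only to genes with k ≥ k_min.

```python
from math import log

def reference_counts(gene_ids, gene_pmid_pairs):
    """Count distinct linked references per gene; unreferenced genes get 0."""
    counts = {gene: 0 for gene in gene_ids}
    for gene, _pmid in set(gene_pmid_pairs):  # set() drops duplicate links
        if gene in counts:
            counts[gene] += 1
    return counts

def share_at_most(counts, k):
    """Fraction of genes with at most k linked references."""
    return sum(1 for c in counts.values() if c <= k) / len(counts)

def powerlaw_exponent(counts, k_min=1):
    """Approximate discrete MLE of a in P(k) ~ k^-a, using genes with k >= k_min (k_min >= 1)."""
    ks = [c for c in counts.values() if c >= k_min]
    if not ks:
        raise ValueError("no genes with k >= k_min")
    return 1 + len(ks) / sum(log(k / (k_min - 0.5)) for k in ks)

# Hypothetical toy data, not Entrez Gene figures.
genes = ["G1", "G2", "G3", "G4"]
links = [("G1", 101), ("G1", 102), ("G1", 101), ("G2", 103)]
counts = reference_counts(genes, links)
print(share_at_most(counts, 0))  # → 0.5 (share with zero references)
print(share_at_most(counts, 5))  # → 1.0 (share with five or fewer)
print(round(powerlaw_exponent(counts), 3))  # → 1.962
```

Genes with zero references are excluded from the exponent fit, since a power law is undefined at k = 0. Those genes are reported separately through share_at_most(counts, 0).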

  3. The Long Tail of Knowledge [Figure axes: Content vs. Users] • Traditional media revolves around the Short Head: a small number of publishers putting out lots of content (newspapers, TV/Hollywood, Consumer Reports, the Olympics, Encyclopedia Britannica) • “Web 2.0” media revolves around community-generated content: a huge population of individuals, each generating a (relatively) small amount of content (blogs, YouTube, Amazon reviews, American Idol, Wikipedia) • “Community intelligence”

  4. The Long Tail of encyclopedias • Wiki: “… a website that allows the visitors themselves to easily add, remove, and otherwise edit and change available content, typically without the need for registration.” • Wikipedia: “the free encyclopedia that anyone can edit.” • An expert-led investigation carried out by Nature … revealed numerous errors in both encyclopaedias, but among 42 entries tested, the difference in accuracy was not particularly great: the average science entry in Wikipedia contained around four inaccuracies; Britannica, about three. • [Figure: encyclopedia size comparison; source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008]

  5. Advantages of a Gene Wiki (1): Existing gene portals are great for structured content, but a wiki is suited to summarizing unstructured content. [Figure: Entrez Gene vs. Wikipedia] Unstructured content allows for free text, images, diagrams, photos, etc.

  6. Advantages of a Gene Wiki (2): Wiki articles enable two-way communication of information, encouraging contributions and edits from the community. [Figure: snapshots dated Dec 18, 2002; Jan 3, 2004; Dec 11, 2004; May 6, 2006] Wikipedia is rarely the last place you look, but it is often a good first place for an overview.

  7. Gene “stubs” • The active MCB community at Wikipedia (WP) had already developed ~650 gene articles • Can we accelerate this process through stub creation? • In total, we created 7500 new articles and edited the 650 previously existing ones.

  8. Why Wikipedia? • A critical mass of articles to which and from which we could link gene pages • A critical mass of editors experienced in wiki-related issues (fighting vandalism, copyediting, governance) • An active group of molecular biologists at the MCB “WikiProject” (http://en.wikipedia.org/wiki/WP:MCB) • Alternatives considered: a home-built wiki; Citizendium (citizendium.org)

  9. Gene Wiki usage [Figure panels: (650) pre-existing articles, (7500) new stubs] • There are currently ~9000 gene pages or stubs on Wikipedia • 50% of all edits to gene pages are to newly created pages… • Gene Wiki pages rank highly on Google, ensuring a critical mass of users and editors…

  10. Positive feedback loop [Figure: cycle linking Gene Wiki page utility, number of readers, and number of editors]

  11. 25k gene-specific review articles? • Reelin: 33 editors, 221 edits since July 2002 • Heparin: 175 editors, 320 edits since June 2003 • AMPK: 44 editors, 84 edits since March 2004 • RNAi: 232 editors, 708 edits since October 2002 • Hyperlinks to related concepts

  12. Gene Wiki activity: steady (and growing?) edit rate over time

  13. Gene Wiki article growth (visualization: http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/gene-wiki-top-2500-20081114)

  14. Welcome to the semantic web… “The main concern with plaintext-on-Wikipedia is that it's not an effective way to truly exploit the long tail, since you're going to end up with this massive plaintext disaster that will require human collating (redundant work - just get it right the first time).” - public-semweb-lifesci mailing list

  15. Primary emphases • Providing useful content: scientists will not find or contribute to a wiki unless it is already useful • Instant feedback: wikis allow changes to take effect immediately, without approval or an intermediary (e.g., corrections/additions to NCBI/Ensembl?) • Emphasis on contributors, not data miners: emphasize getting data in rather than getting it out, since complex protocols encourage nonparticipation (e.g., MIAME) • Critical mass: what will differentiate the Gene Wiki from the many other wiki efforts that are stagnant?

  16. Secondary emphases • Reliability and accuracy: do open and uncurated data models produce trustworthy content? • Synergy with existing resources: how can the Gene Wiki make the growth of traditional annotation more efficient? • Enabling semantic queries/structure: how can we structure unstructured content for data mining? (Semantic MediaWiki? NLP?)

  17. Idealized information flow [Figure: 1) Create Gene Wiki stubs; 2) Unstructured content from the community; 3) Semantic encoding of free text (how?). Other labels: “Long tail” scientific contributions; Direct semantic annotation by scientists; Wikipedia; Authoritative annotation databases (NCBI, Ensembl, …); Semantic structure]
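The slides raise Semantic MediaWiki as one option and propose "direct semantic annotation by scientists." The sketch below is one hedged illustration of what that could look like; it is not a description of how the Gene Wiki works. Semantic MediaWiki lets editors tag a value inline as [[Property::value]], or [[Property::value|display text]]. Such tags can be read back as (page, property, value) triples. The regex-based extractor is a simplified sketch. The page text and the property names ("Located on chromosome", "Encodes protein") are hypothetical.

```python
import re

# Simplified pattern for Semantic MediaWiki-style inline annotations:
# [[Property::value]] or [[Property::value|display text]].
# Ordinary links ([[Page]]) and category links ([[Category:X]]) do not match.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

def extract_triples(page_title, wikitext):
    """Return (subject, property, value) triples found in wikitext, in order."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]

# Hypothetical annotated sentence from a gene page.
text = ("RELN, on [[Located on chromosome::7|chromosome 7]], encodes "
        "[[Encodes protein::Reelin]]; see also [[Reelin]].")
for triple in extract_triples("RELN", text):
    print(triple)
# ('RELN', 'Located on chromosome', '7')
# ('RELN', 'Encodes protein', 'Reelin')
```

The free text stays readable for human editors, while the tagged values become structured data. A production parser would need to handle much more of the Semantic MediaWiki syntax than this single pattern does.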

  18. Figure to scale? [Figure labels: Wikipedia; NCBI, Ensembl, …; “Long tail” scientific contributions; Semantic structure]

  19. Summary • Goal: create a resource that complements existing tools rather than competing with them. • The primary emphasis will always be on maximizing community participation. • How do we structure the unstructured contributions?

  20. Acknowledgements: Serge Batalov, Jason Boyer, Jennifer Floyd, Yue Hu, Jon Huss, Jeff Janes, Camilo Orozco, Steve Su, Julia Turner, Chunlei Wu, David Delano, James Goodale, Phil McClurg, Richard Trager, Faramarz Valafar (SDSU), Tim Vickers (Washington Univ), Michael Cooke, Pete Schultz. Funding: NIGMS, NIH; Novartis Research Foundation
