
Web Taxonomies: Discovering the Structure of Information

Tim Weninger, Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL. Information wants to be free. The World Wide Web is decentralized and messy (but it wants to be structured).


Presentation Transcript


  1. Web Taxonomies: Discovering the Structure of Information • Tim Weninger • Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL

  2. Information wants to be free • The World Wide Web is decentralized and messy. • (but it wants to be structured) • Taxonomies are used to describe the hierarchical structure of data • Almost always hand-crafted • Data is made (forced) to fit the taxonomy • Information wants to be free!

  3. Information wants structure • Just like political science… in data science… • There is no such thing as digital anarchy • Government will always rise • Data democracy • Let the data decide its own form of government

  4. Let’s discover a taxonomy of a Web site

  5. Web Graph → Web Tree – is a really hard problem • How do we traverse the graph? • BFS • DFS • MST • With Replacement • Without Replacement • All links • Some links

  6. Web Graph → Web Tree? – BFS
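
A minimal sketch of the BFS option named on this slide: a breadth-first traversal keeps, for each page, only the first link that reaches it, which turns the link graph into a tree. The toy `links` dictionary and the function name `bfs_tree` are hypothetical and are not taken from the talk.

```python
from collections import deque


def bfs_tree(links, root):
    """Return {page: parent} for the BFS spanning tree rooted at `root`.

    `links` maps each page to the list of pages it links to.
    Pages unreachable from `root` are left out of the result.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # Keep only the first link that discovers a page;
            # every later link to it is dropped from the tree.
            if target not in parent:
                parent[target] = page
                queue.append(target)
    return parent


# Hypothetical toy site, for illustration only.
links = {
    "/": ["/about", "/research"],
    "/about": ["/", "/people"],
    "/research": ["/people", "/papers"],
    "/people": ["/"],
    "/papers": ["/research"],
}
tree = bfs_tree(links, "/")
```

In this toy graph "/people" is linked from both "/about" and "/research"; BFS attaches it to whichever parent is dequeued first, which illustrates why the choice of traversal affects the resulting tree.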

  7. Web Graph → Web Tree • Lists of links • WWW2011 work • Link paths? • Most probable user navigation • PageRank • We’re working on all of those – PageRank seems to work
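
The slide does not say how PageRank scores are turned into a tree, so the sketch below only computes the scores by plain power iteration. The damping factor, iteration count, dangling-page handling, and the toy `links` graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Pages with no out-links spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the teleportation share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank uniformly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy site, for illustration only.
links = {
    "/": ["/about", "/research"],
    "/about": ["/", "/people"],
    "/research": ["/people", "/papers"],
    "/people": ["/"],
    "/papers": ["/research"],
}
ranks = pagerank(links)
ordered = sorted(ranks, key=ranks.get, reverse=True)
```

One possible use, again an assumption, is to prefer higher-ranked pages as parents when several pages link to the same child.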

  8. Some explorations – BM25 ranks text
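
For reference, a small sketch of Okapi BM25 scoring, the text-ranking function named on this slide. The whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the sample documents are assumptions; the talk does not give its configuration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # IDF variant with +1 inside the log so it is never negative.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            tf = term_freq[term]
            # Length normalization penalizes longer-than-average documents.
            norm = tf + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical page texts, for illustration only.
docs = [
    "faculty research in data mining",
    "campus parking and maps",
    "data mining course schedule",
]
scores = bm25_scores("data mining", docs)
```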

  9. Propagate information backwards – re-rank

  10. Map taxonomies • Assumption • Two taxonomies from Web sites of similar organizational missions will be similar • Let’s do integration

  11. Some early results

  12. Brand new result: breakthrough this morning • Cue scary graphs

  13. Questions? Challenges?
