
WinaCS Project Web Entity Extraction and Mapping Discovering and Propagating Context

WinaCS Project Web Entity Extraction and Mapping Discovering and Propagating Context. Tim Weninger. Department of Computer Science University of Illinois Urbana-Champaign, Urbana, IL. Past, Present, Future. Past – Entity search and retrieval is one of the dreams of the Web – TBL


Presentation Transcript


  1. WinaCS Project
     Web Entity Extraction and Mapping
     Discovering and Propagating Context
     Tim Weninger
     Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL

  2. Past, Present, Future
     Past – Entity search and retrieval is one of the dreams of the Web – TBL
     Present – Ranking and Retrieval bi-directional approach
       1) Information Networks
       2) Web mining and Information Extraction
          a) List Finding
          b) Entity-page Discovery
          c) Entity-page Mapping
     Future – InfoBase Project: Information extraction via Schema Discovery

  3. Finding lists on the Web is Hard! (KDD Explorations Dec. 2010)
      1. Google Sets
      2. WebTables
      3. Mining Data Records (MDR)
      4. World Wide Tables (WWT)
      5. Tag Path Clustering
      6. RoadRunner
      7. SEAL
      8. Visual List Extraction
      9. Vision-based Page Segmentation (VIPS)
     10. Visualized Element Nodes Table extraction (VENTex)

  4. Why is finding lists important?
     • Charu Aggarwal • Deepayan Chakrabarti • Ed Chang • Kevin Chang • Olivier Chapelle • Chris Clifton • Jiawei Han • …
     • Jiawei Han • ChengXiang Zhai • Kevin Chang • Dan Roth • Marianne Winslett
     • Jiawei Han • ChengXiang Zhai • Kevin Chang • Dan Roth • Marianne Winslett • Sarita Adve • Tarek Adelzaher • Vikram Adve • Gul Agha • …
     Correction, Inference, Disambiguation, Recommendation, etc.

  5. Our list finding algorithm (Accepted: WWW 2011)

  6. List Finding for Entity Page Discovery

  7. Growing Parallel Paths (Accepted: WWW 2011) Result:

  8. Mapping Pages to Records (CIKM’10)

  9. Mapping Pages to Records (CIKM’10)
     Example
     Ap1 = {People, Faculty, Dan Roth, Personal Site}
     Ap2 = {Research, Data Mining, Dan Roth, Personal Site}
     Bag of Anchors: {Research: 1, People: 1, Faculty: 1, Data Mining: 1, Dan Roth: 2, Personal Site: 2}
     Sorted Bag of Anchors: Au;v1 = {Dan Roth: 2/2 = 1, Research: 1/2 = 0.5, Data Mining: 1/2 = 0.5, Personal Site: 2/5 = 0.4, People: 1/3 = 0.33, Faculty: 1/3 = 0.33}
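     The bag-of-anchors example on this slide can be reproduced with a short Python sketch. It is only an illustration, not the authors' implementation: the function names are hypothetical, and because the slide does not say where each denominator comes from, the sketch takes them as a given input copied from the slide.

```python
from collections import Counter

# Anchor-text paths from the slide's example.
ap1 = ["People", "Faculty", "Dan Roth", "Personal Site"]
ap2 = ["Research", "Data Mining", "Dan Roth", "Personal Site"]


def bag_of_anchors(paths):
    """Count how often each anchor text occurs across all paths."""
    bag = Counter()
    for path in paths:
        bag.update(path)
    return bag


# ASSUMPTION: the slide does not define these denominators;
# the values are copied verbatim from its worked example.
slide_denominators = {
    "Dan Roth": 2,
    "Research": 2,
    "Data Mining": 2,
    "Personal Site": 5,
    "People": 3,
    "Faculty": 3,
}


def sorted_bag(bag, denominators):
    """Divide each count by its denominator and sort by descending score."""
    scored = {anchor: bag[anchor] / denominators[anchor] for anchor in bag}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


bag = bag_of_anchors([ap1, ap2])
ranked = sorted_bag(bag, slide_denominators)
for anchor, score in ranked:
    print(f"{anchor}: {score:.2f}")
```

     With these inputs, "Dan Roth" ends up at the top of the ranking, matching the slide: it is the only anchor whose count equals its denominator.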

  10. CSMap
      Locations of the top 25 computer science departments, automatically generated by extracting and ranking 5-digit numbers from entity Web pages.
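      A minimal sketch of the extraction step described on this slide, assuming that "ranking" means ordering candidate 5-digit numbers by how often they occur on a page. The slide does not give the actual ranking criterion, so the frequency ranking, the function name, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Exactly five digits, not embedded in a longer run of digits.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")


def rank_five_digit_numbers(page_text):
    """Return (number, count) pairs, most frequent first.

    ASSUMPTION: frequency is used as the ranking signal; the slide
    does not state how candidates are actually ranked.
    """
    return Counter(FIVE_DIGITS.findall(page_text)).most_common()


# Hypothetical page text, for illustration only.
sample_page = (
    "Urbana, IL 61801. Tel 2175550100. "
    "Visitors: Urbana, IL 61801. Founded 1964; room 20100."
)

ranked_numbers = rank_five_digit_numbers(sample_page)
print(ranked_numbers)
```

      The lookarounds keep the pattern from matching inside longer digit strings such as phone numbers, which would otherwise produce spurious candidates.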

  11. Next Steps: The hard part!
      Infer categories/schemas from a set of Web pages
      Example: Name, Address, ZipCode, Publications, Collaborators, Organizations
      How can we infer this schema? Wikipedia?
      How can we populate it?
      What do these entities have in common?

  12. Idea! Propagating schemas

  13. Next Steps: The hardest part!
      [Figure labels: Inferred, Given]
      This can be modeled as a heterogeneous information network.
      Thus, ranking and clustering are possible, as are semantic search, keyword search, and typal search.
      Cube operations are also possible.

  14. WinaCS – An information network based Web search engine

  15. Questions? Challenges?
