
Geographic Web Information Retrieval

Alexander Markowetz, University of Marburg; Thomas Brinkhoff, FH Oldenburg; Bernhard Seeger, University of Marburg

Presentation Transcript


  1. Geographic Web Information Retrieval • Alexander Markowetz, University of Marburg • Thomas Brinkhoff, FH Oldenburg • Bernhard Seeger, University of Marburg

  2. Current Situation In Web-IR • Everybody is online • But never seen

  3. Current Situation In Web-IR • Queries are too short • Result sets are too large • Good results get buried • You can effectively block your competitors • Needed: smaller result sets • Ways to drill into the iceberg

  4. Solutions • Personalized Search • Dynamic/Interactive Search

  5. Geographic Web-IR • Location is the most personal property • "All business is local" • People already use the web geographically • "Yoga Brooklyn" • "Linux usergroup Frankfurt" • And get poor results • We are going to make that a lot better

  6. How-Not-To • Semantic Web • "If just everybody included geographic markup in their web pages" • Two problems • Chicken-and-egg • Malicious webmasters • Meta tags, anyone? • Bottom line • The Semantic Web is for "B2B" situations only.

  7. How-To • Modify traditional IR techniques to extract geographic markers • Multigranular approach • Extending basic Web-IR • Map pages to geographic positions • Footprint • Aggregate and Cluster them • Build Applications • Geographic Search • Geographic Web-Mining

  8. Geocoding • Footprint • Geographic Position of a Webpage • Set of points and polygons, associated with some amplitude
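
The slide defines a footprint only informally. A minimal Python sketch of one possible representation, assuming coordinates are (lat, lon) pairs and that the "amplitude" is a plain non-negative weight. The class and method names are hypothetical, and the point-in-polygon test treats latitude/longitude as planar coordinates, which is only reasonable for small regions:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Footprint:
    """Hypothetical page footprint: weighted points and polygons in (lat, lon)."""

    points: list[tuple[tuple[float, float], float]] = field(default_factory=list)
    polygons: list[tuple[list[tuple[float, float]], float]] = field(default_factory=list)

    def add_point(self, point: tuple[float, float], amplitude: float) -> None:
        self.points.append((point, amplitude))

    def add_polygon(self, vertices: list[tuple[float, float]], amplitude: float) -> None:
        self.polygons.append((vertices, amplitude))

    def total_amplitude(self) -> float:
        return sum(a for _, a in self.points) + sum(a for _, a in self.polygons)

    def polygon_amplitude_at(self, point: tuple[float, float]) -> float:
        """Sum the amplitudes of all polygons containing `point` (planar ray casting)."""
        return sum(a for poly, a in self.polygons if _contains(poly, point))


def _contains(poly: list[tuple[float, float]], point: tuple[float, float]) -> bool:
    # Even-odd rule: count edge crossings of a ray cast along the first coordinate.
    x, y = point
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Keeping points and polygons in separate lists mirrors the slide's "set of points and polygons"; a production system would more likely use a spatial library and proper geodesic geometry.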

  9. Preliminaries • Basic IR assumptions can easily be extended to "geographic IR" • Radius-1 hypothesis • Radius-2 hypothesis (co-citation) • Intra-site hypothesis • Intra-subdomain • Intra-directory
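
One way to read the Radius-1 and Radius-2 hypotheses geographically: a page without its own markers inherits location evidence from pages it links to or is linked from (Radius-1) and, more weakly, from pages cited alongside it by a common parent (Radius-2, co-citation). The sketch assumes the link graph is a dict and that a hypothetical `known` map assigns region labels to already geocoded pages; the weights `w1` and `w2` are illustrative, not values from the talk.

```python
from collections import Counter, defaultdict


def infer_regions(page, out_links, known, w1=1.0, w2=0.5):
    """Estimate a region distribution for `page` from its link neighborhood.

    out_links: dict mapping each page to the pages it links to.
    known: hypothetical dict mapping already geocoded pages to a region label.
    """
    in_links = defaultdict(set)
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links[dst].add(src)

    votes = Counter()
    # Radius-1: pages that `page` links to or that link to `page`.
    radius1 = set(out_links.get(page, ())) | in_links[page]
    for q in radius1:
        if q in known:
            votes[known[q]] += w1

    # Radius-2 (co-citation): other pages linked from the same parent page.
    cocited = set()
    for parent in in_links[page]:
        cocited.update(out_links.get(parent, ()))
    cocited -= radius1 | {page}
    for q in cocited:
        if q in known:
            votes[known[q]] += w2

    total = sum(votes.values())
    return {region: v / total for region, v in votes.items()} if total else {}
```

Pages already counted at Radius-1 are excluded from the co-citation set so that no neighbor votes twice.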

  10. Multigranularity • Information extraction on different levels • Domain • Subdomain • Directory • File • Need to aggregate • [Figure: hierarchy Dom → SDom → Dir → File]
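
The aggregation step can be sketched as follows, assuming evidence has already been extracted per file as (URL, region) pairs. Each URL is split into the four levels from the slide and evidence is pooled per level; a page's estimate then blends the distributions of its enclosing directory, subdomain and domain. The level weights are hypothetical, and taking the last two host labels as the domain is a simplification that breaks for suffixes such as `co.uk`.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Hypothetical weights: finer levels are trusted more than coarser ones.
LEVEL_WEIGHTS = {"file": 1.0, "directory": 0.5, "subdomain": 0.25, "domain": 0.125}


def levels(url):
    """Split a URL into the four granularity levels from the slide."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    return {
        "domain": ".".join(host.split(".")[-2:]),  # naive: last two labels
        "subdomain": host,
        "directory": host + path.rsplit("/", 1)[0] + "/",
        "file": host + path,
    }


def aggregate(evidence):
    """evidence: iterable of (url, region) pairs extracted at file level."""
    counts = {level: defaultdict(Counter) for level in LEVEL_WEIGHTS}
    for url, region in evidence:
        for level, key in levels(url).items():
            counts[level][key][region] += 1
    return counts


def estimate(url, counts):
    """Blend the normalized region counts of every level that contains `url`."""
    score = Counter()
    for level, key in levels(url).items():
        regional = counts[level].get(key, Counter())
        total = sum(regional.values())
        for region, n in regional.items():
            score[region] += LEVEL_WEIGHTS[level] * n / total
    return score
```

A new page with no markers of its own thus still receives an estimate from its directory and site, which is the point of aggregating across levels.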

  11. Sources • On all levels: names of places, zip codes, area codes • On site level: Whois, business directories • Links: density over a given area, Radius-1 and Radius-2 • [Figure: hierarchy Dom → SDom → Dir → File] • Kevin S. McCurley, "Geospatial Mapping and Navigation of the Web", 10th WWW, 2001 • J. Ding, L. Gravano, and N. Shivakumar, "Computing Geographical Scopes of Web Resources", VLDB 2000
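
A minimal sketch of page-level extraction for three of these sources (place names, zip codes, area codes), assuming German-style formatting: five-digit postal codes followed by a city name, and area codes starting with 0 after a "Tel." prefix. The tiny gazetteer and the regular expressions are illustrative assumptions; a real extractor would need a full place-name list and disambiguation.

```python
import re

# Illustrative gazetteer; a real system would load a full place-name list.
GAZETTEER = {"Marburg", "Oldenburg", "Frankfurt", "Brooklyn"}

WORD_RE = re.compile(r"[A-ZÄÖÜ][a-zäöüß]+")
# German-style postal code: five digits followed by a capitalized city name.
ZIP_RE = re.compile(r"\b(\d{5})\s+([A-ZÄÖÜ][a-zäöüß]+)")
# Area code after a "Tel." / "Tel:" prefix, optionally in parentheses.
TEL_RE = re.compile(r"Tel\.?:?\s*\(?(0\d{2,4})\)?[\s/-]")


def extract_markers(text):
    """Return the geographic markers found in `text`, in order of appearance."""
    return {
        "places": [w for w in WORD_RE.findall(text) if w in GAZETTEER],
        "zip_codes": [code for code, _city in ZIP_RE.findall(text)],
        "area_codes": TEL_RE.findall(text),
    }
```

For the made-up text "Yoga-Studio, Bahnhofstr. 5, 35037 Marburg. Tel. 06421/12345" this finds the place "Marburg", the zip code "35037" and the area code "06421". Whois and business-directory lookups work at site level and are not covered here.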

  12. Geographic Search • A simple interface • Not so exciting, but... • [Figure: search form with fields Key Words, City, Street, State, Area code and a SEARCH button]
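
Behind the simple form, a geographic search can be sketched as keyword matching plus a distance filter. This assumes the form's city, street and area-code fields have already been geocoded to a latitude/longitude and that each page has one representative position (a simplification of a full footprint). The `Page` type and `geo_search` function are hypothetical; distance is the great-circle (haversine) distance.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Page:
    url: str
    text: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_search(pages, keywords, lat, lon, radius_km):
    """Pages containing all keywords within radius_km, nearest first."""
    terms = [k.lower() for k in keywords]
    hits = []
    for page in pages:
        text = page.text.lower()
        if all(t in text for t in terms):
            d = haversine_km(lat, lon, page.lat, page.lon)
            if d <= radius_km:
                hits.append((d, page.url))
    hits.sort()
    return [url for _, url in hits]
```

Sorting purely by distance is the simplest choice; ranking that also accounts for importance is the subject of the Locality slide.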

  13. Dynamic Geographic-IR • Replacing the "next" button • [Figure: result navigation with Closer, Continue, Wider and Next controls and radius steps of ½ mile, 1, 2, 5, 10, 25 and 100 miles]
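
The Closer and Wider controls can be modelled as moving along the ladder of radii shown on the slide, with Continue paging within the current radius. A hypothetical sketch of that navigation state, assuming hits are already ranked and annotated with their distance in miles:

```python
# The radius steps shown on the slide, in miles.
RADII_MILES = [0.5, 1, 2, 5, 10, 25, 100]


class RadiusNavigator:
    """Hypothetical result navigation: Closer/Wider change the radius, Continue pages."""

    def __init__(self, start_index=3, page_size=10):
        self.index = start_index
        self.offset = 0
        self.page_size = page_size

    @property
    def radius(self):
        return RADII_MILES[self.index]

    def closer(self):
        # Shrink the radius one step (never below the smallest) and restart paging.
        self.index = max(0, self.index - 1)
        self.offset = 0

    def wider(self):
        # Grow the radius one step (never above the largest) and restart paging.
        self.index = min(len(RADII_MILES) - 1, self.index + 1)
        self.offset = 0

    def continue_(self):
        # "continue" is a Python keyword, hence the trailing underscore.
        self.offset += self.page_size

    def window(self, ranked_hits):
        """ranked_hits: (distance_miles, url) pairs in rank order."""
        inside = [url for dist, url in ranked_hits if dist <= self.radius]
        return inside[self.offset:self.offset + self.page_size]
```

Resetting the offset on every radius change keeps each Closer/Wider click starting from the best result in the new area.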

  14. Locality • Final ranking is a (linear) combination of importance and geographic distance • Chances are: • Amazon will still rank first, no matter where you are • Amazon is a "global bully" • Idea: • Eliminate global bullies by computing importance differently • Give less weight to links that span a longer distance
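
A sketch of both ideas, under stated assumptions: a PageRank variant in which each link u→v passes on only a fraction exp(−d/s) of its share, with the remainder redistributed uniformly like the teleport term, so that long-distance links to a "global bully" count less; and a final score that linearly combines normalized importance with a proximity term. The exponential decay, the scale parameters and the function names are hypothetical choices, not the authors' formula.

```python
import math


def distance_weighted_pagerank(links, dist_km, damping=0.85, scale_km=100.0, iters=50):
    """PageRank variant in which long-distance links pass on less importance.

    links: dict mapping each site to the list of sites it links to.
    dist_km: function (u, v) -> distance in km between two sites.
    A link u -> v keeps a fraction exp(-d / scale_km) of its share (hypothetical
    decay); the rest is redistributed uniformly, like the teleport term.
    """
    nodes = sorted(set(links) | {v for vs in links.values() for v in vs})
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        new = dict.fromkeys(nodes, 0.0)
        leaked = 1.0 - damping  # teleport mass (ranks always sum to 1)
        for u in nodes:
            out = links.get(u, [])
            if not out:  # dangling site: all of its mass is redistributed
                leaked += damping * rank[u]
                continue
            share = damping * rank[u] / len(out)
            for v in out:
                w = math.exp(-dist_km(u, v) / scale_km)
                new[v] += share * w
                leaked += share * (1.0 - w)
        for u in nodes:
            new[u] += leaked / n
        rank = new
    return rank


def final_score(importance, distance_km, alpha=0.5, scale_km=50.0):
    """Linear combination of importance (assumed normalized to [0, 1]) and proximity."""
    proximity = math.exp(-distance_km / scale_km)
    return alpha * importance + (1.0 - alpha) * proximity
```

With `scale_km=float("inf")` every weight is 1 and the function reduces to ordinary PageRank with uniform handling of dangling sites; smaller scales make importance increasingly local.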

  15. Evaluation • Evaluating Web-IR is hard • Evaluating geo-search is even harder • Mistakes are hard to find

  16. Impact of geo-IR • Next-generation search engine • Location-based service • For cellphones under UMTS • Move traffic from A&E • Local companies will get more traffic • Increased profits from AdWords • The smallest businesses will advertise online • Locally focused • The "leaflet industry" will shrink

  17. Geographic Web-Mining • The web reflects human society • Distorted • Delayed/ahead • A lot of interesting social questions can be answered by looking at a large web crawl • This saves time and money compared to door-to-door surveys • This is widely used • But: • Most of these questions are geographic in nature

  18. Example Queries • Where in Germany are vintage sneakers a trend? → Draw a map of Germany with all sites about vintage sneakers. • Is there a fashion authority that is accepted in all regions of Germany? → Find all fashion sites that get a minimum of 1000 equally distributed links. • Do Britney and Madonna have the same audience? → Map the areas in Germany where there are significantly more sites for B. than for M. • Precise semantics?
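
The slide's own question, "Precise semantics?", is the hard part. Under one hypothetical semantics for the third query, "significantly more sites for B. than for M." in a region could mean a one-sided test that rejects an even split: with n = a + b sites, z = (a − b)/√n, and a region is flagged when z exceeds 1.645 (about the 5% one-sided level). The sketch assumes sites are already mapped to regions; the normal approximation is poor for very small counts, and the example data is made up.

```python
import math
from collections import Counter


def regions_favoring(sites, topic_a, topic_b, z_threshold=1.645):
    """Regions where topic_a has significantly more sites than topic_b.

    sites: iterable of (region, topic) pairs, one per site (assumed already geocoded).
    Null hypothesis: each of the n = a + b sites is equally likely to be about
    either topic, so a ~ Binomial(n, 0.5); z = (a - b) / sqrt(n) is its
    normal-approximation z-score.
    """
    counts = Counter(sites)
    result = []
    for region in sorted({r for r, _ in counts}):
        a, b = counts[(region, topic_a)], counts[(region, topic_b)]
        n = a + b
        if n == 0:
            continue
        z = (a - b) / math.sqrt(n)
        if z > z_threshold:  # one-sided test, roughly the 5% level for 1.645
            result.append(region)
    return result
```

For made-up counts of 30 vs. 10 sites in one region (z ≈ 3.2) and 12 vs. 10 in another (z ≈ 0.4), only the first region would be drawn on the map.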

  19. Current Work • Older Prototype • Metasearch on top of lycos.de • Screen-scrape & re-order • Whois only • Did very well

  20. Current Work • Current prototype for geographic search • Limited to Germany (.de domains) • 50,000,000 pages • Expected online by late summer • In cooperation with • Yen-Yu Chen • Xiaohui Long • Torsten Suel • Polytechnic University, Brooklyn

  21. Reinventing Web-IR • Nearly no (academic) work in geo-IR • Almost every aspect of Web-IR needs to be looked at again • Interfaces • Query processing • Index distribution • Link analysis • User profile analysis • Spam detection • Even: • Other aspects of personalized search • Changes in the web

  22. Thank you Any questions?
