
Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Research Paper Presentation – CS572 Summer 2011. Extracting Metadata for Spatially-Aware Information Retrieval on the Internet. Paper by Paul Clough (University of Sheffield Western Bank). Presented by Donghee Sung. Short Overview.


Presentation Transcript


  1. Research Paper Presentation – CS572 Summer 2011. Extracting Metadata for Spatially-Aware Information Retrieval on the Internet. Paper by Paul Clough (University of Sheffield, Western Bank). Presented by Donghee Sung.

  2. Short Overview • SPIRIT: spatial awareness for information systems, e.g. transport timetables, routing systems for motorists, map-based web sites, location-based services • Key part: extraction and use of geospatial information

  3. Short Overview • Criteria: speed, reliability, flexibility, multilingualism • Geo-Parsing: - Identifying geographic references - Gazetteer lookup with context rules to filter out common-usage words and personal names • Geo-Coding: - Assigning spatial coordinates - Based on information from a geographic resource
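
The two stages named on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SPIRIT implementation: the `GAZETTEER` dictionary, its approximate coordinates, and the function names `geo_parse` and `geo_code` are all hypothetical.

```python
# Toy gazetteer (assumption): place name -> approximate (latitude, longitude).
GAZETTEER = {
    "sheffield": (53.38, -1.47),
    "london": (51.51, -0.13),
    "tampere": (61.50, 23.76),
}

PUNCTUATION = ".,;:!?()\"'"


def geo_parse(text):
    """Geo-parsing: return the tokens that match a gazetteer entry."""
    tokens = [token.strip(PUNCTUATION) for token in text.split()]
    return [token for token in tokens if token.lower() in GAZETTEER]


def geo_code(place):
    """Geo-coding: assign spatial coordinates to a recognized place name."""
    return GAZETTEER[place.lower()]


if __name__ == "__main__":
    for place in geo_parse("The train from Sheffield to London was late."):
        print(place, geo_code(place))
```

A plain lookup like this would also match common-usage words and personal names, which is why the slide pairs the gazetteer with context rules.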

  4. What’s the SPIRIT? < http://www.geo-spirit.org/ >

  5. What’s the SPIRIT? • SPIRIT: SPatially-Aware Information Retrieval on the InterneT • A search engine to find documents and datasets on the web relating to places or regions

  6. What’s the SPIRIT? • Existing web search facilities are poor at finding information related to a particular location: - Vicinity: find other places within a radius - www.somewherenear.com - Yellow pages services: find a specific place or post code - Buyukkokten: associated the admin’s IP with a telephone area code - Stanford Research Institute: proposed a ‘.geo’ domain with cells defined by latitude and longitude
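
The "places within a radius" style of search mentioned on this slide can be illustrated with a great-circle distance filter. The sketch below is an assumption about how such a lookup might work, not a description of any of the listed services. The `PLACES` table, its approximate coordinates, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical place table with approximate (latitude, longitude) in degrees.
PLACES = {
    "Sheffield": (53.38, -1.47),
    "Rotherham": (53.43, -1.36),
    "London": (51.51, -0.13),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def places_within(center, radius_km):
    """Return the other known places within radius_km of the named center."""
    origin = PLACES[center]
    return sorted(
        name
        for name, coords in PLACES.items()
        if name != center and haversine_km(origin, coords) <= radius_km
    )


if __name__ == "__main__":
    print(places_within("Sheffield", 20))
```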

  7. What’s the SPIRIT? • Resources relating to a place: - may not be found - may not be places nearby - may have another name • Major shortcoming: cannot recognize alternative names: - modern/historical variants - informal names - contained place names
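
One way to address the alternative-name shortcoming is to map every known variant to a canonical entry and expand a query with all of its variants. The sketch below only illustrates that idea. The `VARIANTS` table and the function names are hypothetical and are not taken from the paper.

```python
# Hypothetical variant table: canonical name -> historical or alternative names.
VARIANTS = {
    "Istanbul": ["Constantinople", "Byzantium"],
    "Mumbai": ["Bombay"],
    "Beijing": ["Peking"],
}

# Reverse index from any known name (lower-cased) to its canonical name.
CANONICAL = {name.lower(): name for name in VARIANTS}
for name, aliases in VARIANTS.items():
    for alias in aliases:
        CANONICAL[alias.lower()] = name


def canonical_name(place):
    """Return the canonical name for a place, or None if it is unknown."""
    return CANONICAL.get(place.lower())


def expand_query(place):
    """Expand a place name into the canonical name plus all known variants."""
    name = canonical_name(place)
    if name is None:
        return [place]  # unknown place: leave the query unchanged
    return [name] + VARIANTS[name]


if __name__ == "__main__":
    print(expand_query("Bombay"))
```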

  8. What’s the SPIRIT? • SPIRIT Project: - Query expansion / relevance ranking procedures - Machine learning techniques: extraction of geographical context, generating metadata - Multi-modal user interface: textual input, interactive map feedback - Spatial indices for web collections

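  9. Data Sources • Sources of spatial data: TGN (Getty Thesaurus of Geographic Names), OS (Ordnance Survey), SABE (Seamless Administrative Boundaries of Europe) • A large web collection of SPIRIT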

  10. Data Sources

  11. Data Sources

  12. Data Sources

  13. Geo-Parsing Techniques • Tokenization issues, stop-words, Named-Entity Recognition (NER), gazetteers

  14. Geo-Parsing Techniques

  15. Geo-Parsing Techniques • Named-Entity Recognition (NER): processing a text and assigning the Named Entities (NE) it contains to particular categories: People, Organization, Location, etc.

  16. Geo-Parsing Techniques • Tokenization Procedure: 1) Tokenize on whitespace: @words = split(/\s+/, $sentence); (Perl regular expression) "Isn't it ashame." -> Isn't / it / ashame. 2) Stemming / case conversion: isn't / it / asham 3) Removing stop-words
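
The three steps on this slide can be sketched in Python, using `re.split` as the counterpart of the Perl `split(/\s+/, ...)` call. This is a rough illustration only. The stop-word list is an assumption, and `crude_stem` is a simple suffix stripper written for this example, not the stemmer used in the paper.

```python
import re

STOP_WORDS = {"a", "an", "the", "it", "is"}  # assumed list, for illustration
SUFFIXES = ("ing", "ed", "er", "es", "s", "e")  # checked in this order


def crude_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(sentence):
    """Step 1: split on runs of whitespace."""
    return re.split(r"\s+", sentence.strip())


def normalize(tokens):
    """Step 2: lower-case, drop trailing punctuation, then stem."""
    return [crude_stem(token.lower().rstrip(".,;:!?")) for token in tokens]


def remove_stop_words(tokens):
    """Step 3: drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    tokens = tokenize("Isn't it ashame.")
    print(tokens)
    print(normalize(tokens))
    print(remove_stop_words(normalize(tokens)))
```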

  17. Geo-Parsing Techniques • Default settings for indexing and retrieval: - Case sensitivity: Off - Stop-word removal: Off - Stemming: Off • Stop-word removal / stemming -> reduce the size of index files • But they can be useful: - Stop-words: ‘in’, ‘inside’, or ‘of’ - Stemming: “London” from “London” and “Londoner”
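
The trade-off on this slide can be made concrete with a small indexing sketch. It assumes a hypothetical `index_terms` function with one switch per setting, a made-up stop-word list, and a stemmer that only strips "er". None of these come from the paper.

```python
STOP_WORDS = {"in", "inside", "of", "the", "a"}  # assumed list, for illustration


def stem(term):
    """Very crude stemmer (assumption): strip a trailing 'er' from longer terms."""
    return term[:-2] if term.endswith("er") and len(term) > 4 else term


def index_terms(text, case_sensitive=False, remove_stop_words=False, stemming=False):
    """Return the set of distinct index terms for a text under the given settings."""
    terms = set()
    for raw in text.split():
        term = raw.strip(".,;:!?")
        if not case_sensitive:
            term = term.lower()
        if remove_stop_words and term.lower() in STOP_WORDS:
            continue
        if stemming:
            term = stem(term)
        if term:
            terms.add(term)
    return terms


if __name__ == "__main__":
    text = "A Londoner inside London in the centre of London."
    print(sorted(index_terms(text)))
    print(sorted(index_terms(text, remove_stop_words=True, stemming=True)))
```

With both options switched on the index shrinks and "Londoner" is conflated with "London", but spatial prepositions such as "inside" are lost.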

  18. Geo-Parsing Techniques • Filtering candidate locations using context rules to remove stop-words, references to people and organizations, and links to emails/URLs
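
A minimal sketch of such context-rule filtering is given below. Every word list and rule in it is an assumption chosen for illustration (the gazetteer, the common-usage words, the person titles, the organization markers). The paper's actual rules are not reproduced here.

```python
import re

GAZETTEER = {"sheffield", "london", "reading", "bath", "washington"}  # toy gazetteer
COMMON_WORDS = {"reading", "bath"}  # place names that are also everyday words
PERSON_TITLES = {"mr", "mrs", "ms", "dr", "prof"}
ORG_MARKERS = {"ltd", "inc", "plc"}
PUNCTUATION = ".,;:!?()\"'"


def is_address(token):
    """Rule: treat e-mail addresses and URLs as non-locations."""
    lowered = token.lower()
    return "@" in lowered or lowered.startswith(("http://", "https://", "www."))


def neighbour(tokens, index):
    """Return the cleaned, lower-cased token at index, or '' if out of range."""
    if 0 <= index < len(tokens):
        return tokens[index].strip(PUNCTUATION).lower()
    return ""


def find_locations(text):
    """Gazetteer lookup followed by context rules that discard false hits."""
    tokens = text.split()
    hits = []
    for i, raw in enumerate(tokens):
        if is_address(raw):
            continue
        for word in re.findall(r"[A-Za-z]+", raw):
            if word.lower() not in GAZETTEER:
                continue
            if word.lower() in COMMON_WORDS and not word[0].isupper():
                continue  # common usage, e.g. "reading" a book
            if neighbour(tokens, i - 1) in PERSON_TITLES:
                continue  # personal name, e.g. "Dr Washington"
            if neighbour(tokens, i + 1) in ORG_MARKERS:
                continue  # organization, e.g. "London Ltd"
            hits.append(word)
    return hits


if __name__ == "__main__":
    sample = ("Dr Washington visited Sheffield and London Ltd; "
              "mail info@london.org about reading in Bath.")
    print(find_locations(sample))
```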

  19. Conclusion • The geo-parsing method could be improved by enhancing the gazetteer matching and filtering • False hits could be reduced by generating a better list of stop-words and using further context rules • The need to create rules by hand would be alleviated by generating further context rules from features using machine learning


  21. Thank You!
