
Mapping of Geographical Entity with Meeting Location from Text for Mobile

Mapping of Geographical Entity with Meeting Location from Text for Mobile. Kyoungryol Kim. 2011. 9. 30. Table of Contents. Introduction Background and Related Work The Proposed System Experimentation Conclusion. 1. Introduction. 1) Motivation 2) Problem Definition 3) Contribution.

Presentation Transcript


  1. Mapping of Geographical Entity with Meeting Location from Text for Mobile Kyoungryol Kim 2011. 9. 30

  2. Table of Contents • Introduction • Background and Related Work • The Proposed System • Experimentation • Conclusion

  3. 1. Introduction 1) Motivation 2) Problem Definition 3) Contribution

  4. Motivation: IE Techniques on Smartphones (May 21, 2011) • Platforms: MS Windows Phone, RIM BlackBerry, Google Android, Apple iPhone • Recognition features: address recognition, phone number recognition, time (text) recognition with adding an event at the recognized time, location (text) recognition • People have started to pay attention to the ‘location extraction’ technique. [Screenshot captured from Apple iPhone]

  5. Motivation: Characteristics of Mobile Devices • Memory issue • Android: 16 MB heap size limit per app • iPhone: no per-app memory limit, but only 512 MB of RAM in total (iPhone 4) • Speed issue • Mobile users are usually frustrated by delays. • IE system • A general information extraction system usually consists of many NLP modules that together consume at least 1 GB of memory. • Client-server model • The client communicates with a server, and all processing is done on the server side. • Requires an internet connection (3G or Wi-Fi). • If many clients send requests at once, the server becomes overloaded, causing delays or even crashing. • An IE method specialized for mobile devices is needed.

  6. Goal of this Research • Map meeting-location text to a geographical location and update the online calendar on the mobile device. • Flow: meeting announcement → extract meeting location and extract time → update calendar • Example meeting announcement: "The team meeting to evaluate the first half of Univcast will be held. Date: July 19 (Sat), 2 PM. Location: Myeong-dong Dandelion Territory. Directions to Dandelion Territory: from exit 8 of Myeong-dong station, walk along the downtown street; it is on the first floor of the YMCA building."

  7. Problem Definition • Extract the meeting location from a meeting announcement email • Disambiguate the extracted meeting location • Example: 회의는 오후 5시 학생회관 101호에서 열립니다. (The meeting will be held at 5 PM in Room 101, Student Union.)

  8. 2. Background and Related Work 1) Information Extraction 2) Geocoding 3) Linked Open Data 4) Local-Grammar Graph

  9. Information Extraction • Information Extraction • The objective is to construct a structured database from free or semi-structured text (J. H. Kim 2004) • Related work • CMU Seminar Announcement Corpus • 485 semi-structured seminar announcements • Types: stime (start time), etime (end time), location, speaker • Focuses only on extracting these 4 types of information, not on geocoding. [Figure: examples of seminar announcements]

  10. Geocoding • Geocoding • The process of finding the geographic coordinates (often expressed as latitude and longitude) associated with other geographic data, such as street addresses or zip codes (Geocoding, Wikipedia) • Related work • Geocoding from addresses (Manov 2003; Jones 2003; Peng 2006; Pouliquen 2006; Volz 2007; Overell 2007; Goldberg 2007; Kauppinen 2008) • The main issue in this research is address disambiguation (Pouliquen et al. 2006) • Multi-referent ambiguity • two different geographic locations share the same name, • e.g. "Cambridge": is it Cambridge, UK or Cambridge, Massachusetts? • Name variant ambiguity • the same location has different names • Geoname/non-geoname ambiguity • a location name could also stand for some other word, such as a person's name or a common noun, • e.g. Metro, the city in Indonesia, vs. metro, the subway system • Limitation: these works focus only on geocoding addresses, not on all location entities • e.g. "Room 101, Student Union, Hanyang University"
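The ambiguity types on slide 10 can be made concrete with a toy gazetteer lookup. What follows is a minimal Python sketch and is not part of the presented system. The gazetteer, the alias table and the `geocode` function are hypothetical, and the coordinates are approximate. A name variant is resolved through the alias table. Multi-referent ambiguity is narrowed only when a context word matches a candidate's region; otherwise every candidate is returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    region: str
    lat: float
    lon: float


# Hypothetical toy gazetteer; coordinates are approximate.
GAZETTEER = {
    "Cambridge": [
        Place("Cambridge", "UK", 52.2053, 0.1218),
        Place("Cambridge", "Massachusetts", 42.3736, -71.1097),
    ],
    "Beijing": [Place("Beijing", "China", 39.9042, 116.4074)],
}

# Hypothetical alias table for name-variant ambiguity (one place, several names).
ALIASES = {"Peking": "Beijing"}


def geocode(name, context_words=()):
    """Return candidate places for `name`.

    Aliases are normalized first. If any context word equals a candidate's
    region (case-insensitive), only those candidates are kept; otherwise all
    candidates are returned, i.e. the name stays ambiguous.
    """
    canonical = ALIASES.get(name, name)
    candidates = GAZETTEER.get(canonical, [])
    hints = {w.lower() for w in context_words}
    matched = [p for p in candidates if p.region.lower() in hints]
    return matched or candidates


print(len(geocode("Cambridge")))                          # 2: still ambiguous
print(geocode("Cambridge", ["Massachusetts"])[0].region)  # Massachusetts
print(geocode("Peking")[0].name)                          # Beijing
```

This sketch does not address geoname/non-geoname ambiguity (e.g. "Metro"). Handling it would first require a separate decision on whether the token is a place name at all.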

  11. Linked Open Data • Linked Open Data • URL: http://linkeddata.org • The project aims to identify datasets that are available under open licenses, re-publish them in RDF on the Web, and interlink them with each other • Geographic datasets are growing rapidly • Because only a few Korean geographical datasets are included in LOD, this research treats a set of open geographical data as Linked Data. [Figure: LOD cloud diagrams, March 2009, September 2010, September 2011]

  12. Local-Grammar Graph • Local-Grammar Graph • A language description model that performs automatic analysis and generation of natural-language text, as well as information extraction, using local linguistic information in the form of finite-state automata (J. Nam 2006) • Helps to increase • efficiency and accuracy, by lexicalizing the knowledge that forms the grammar • readability, by representing the grammar as directed acyclic graphs • Various omissions and permutations can be described that cannot be captured by rules or specific features. • Example: an LGG for 176 kinds of French wine expressions: un vin rouge de Bordeaux, un vin de Bordeaux rouge, un rouge de Bordeaux, un Bordeaux rouge, un Bordeaux, un rouge, ..., du vin d'Alsace blanc, du vin blanc d'Alsace, du blanc d'Alsace, de l'Alsace, de l'Alsace blanc, du blanc
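The directed-acyclic-graph idea on slide 12 can be sketched as a small deterministic finite-state automaton. This Python sketch is an illustration only. It hand-encodes just the six "un ..." variants listed on the slide, not the full 176-expression graph, and the state names are invented. Shared states let the permuted word orders ("rouge de Bordeaux" vs. "Bordeaux rouge") reuse the same suffixes.

```python
# Hypothetical hand-built DAG for the six "un ..." wine expressions on the slide.
# Each state maps an input token to the next state; there are no cycles.
TRANSITIONS = {
    "start": {"un": "det"},
    "det": {"vin": "vin", "rouge": "color", "Bordeaux": "region"},
    "vin": {"rouge": "vin_color", "de": "vin_de"},
    "vin_color": {"de": "color_de"},
    "color": {"de": "color_de"},
    "color_de": {"Bordeaux": "end"},
    "vin_de": {"Bordeaux": "vin_de_region"},
    "vin_de_region": {"rouge": "end"},
    "region": {"rouge": "end"},
}
ACCEPTING = {"color", "region", "end"}  # "un rouge" and "un Bordeaux" are complete


def accepts(phrase):
    """Return True if the whitespace-tokenized phrase follows a path to an accepting state."""
    state = "start"
    for token in phrase.split():
        state = TRANSITIONS.get(state, {}).get(token)
        if state is None:
            return False
    return state in ACCEPTING


def paths(state="start", prefix=()):
    """Enumerate every phrase the graph accepts (depth-first; finite because the graph is acyclic)."""
    if state in ACCEPTING:
        yield " ".join(prefix)
    for token, nxt in TRANSITIONS.get(state, {}).items():
        yield from paths(nxt, prefix + (token,))


for phrase in paths():
    print(phrase)
```

Because `color_de` is reached from both "un vin rouge" and "un rouge", the common suffix "de Bordeaux" is written only once. The LGG relies on this kind of sharing to stay readable as variants multiply.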

  13. 3. The Proposed System 1) Preliminaries 2) Overall Architecture 3) Extraction Module 4) Disambiguation Module

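  14. Overall Architecture • Example input email: 제목: 팀장회의 공지 2008년도의 마지막 팀장회의가 11월 22일 토요일 오후 2시 종로 토즈에서 열립니다. 재계약 그리고 명함 배부가 이뤄질 예정이니 팀장님, 그리고 차기팀장님들 모두 와주시기 바랍니다. 오시는 길: 종로 종각역4번 출구에서 내려서 100m 정도 걸어오시면 오른쪽에 있습니다. (Subject: Team Leader Meeting Notice. The last team leader meeting of 2008 will be held on Saturday, November 22, at 2 PM at TOZ in Jongno. Contract renewals and the distribution of business cards are planned, so all team leaders and incoming team leaders are asked to attend. Directions: leave Jonggak Station (Jongno) by Exit 4 and walk about 100 m; it is on your right.) • [Figure components: INPUT; Mobile Device: Extraction Module (Finite-State Transducers); Server: Disambiguation Module with Linked Data and Personal Geo Data, exchanging a query and the disambiguated result; Template Generator; OUTPUT]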

  15. Extraction Module (1/2) • Construct a Local-Grammar Graph (LGG) • Find local patterns around the meeting location inductively. • Scope of local patterns: the sentence containing the meeting location, plus the previous and next sentences. • Describe local patterns with 110 information types under 7 categories: Location, Time, Title, Actor, Label, Connecting words, Etc. • e.g. ‘장소 :’ ("Location:") is the ‘locLbl’ information type under the ‘Label’ category. • Convert the LGG into a Finite-State Transducer (FST) • Extract the meeting location with the FST • Example input: 2. 학술대회 일정: 2003년 5월 17일 (토요일) 10:30 ~ 16:30 (Conference schedule: Saturday, May 17, 2003, 10:30 ~ 16:30) 3. 학술대회 장소: 성공회대학교 피츠버그관 (Conference venue: Pittsburgh Hall, Sungkonghoe University) 4. 학술대회 순서 (Conference program)
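The label- and context-based extraction on slide 15 can be roughly approximated with regular expressions. This is a simplified Python sketch, not the author's LGG-derived FST. The label lexicon, both patterns and `extract_locations` are hypothetical. The patterns contain no backreferences, so each describes a regular language that could in principle be compiled to a finite-state automaton, and the named group `loc` stands in for the transducer's output. The first pattern uses the Label category (‘장소:’ as locLbl). The second uses a preceding Time expression together with the connecting phrase "에서 열립니다" ("will be held at").

```python
import re

# Hypothetical lexicon standing in for the 'locLbl' (location label) information type.
LOC_LABELS = ["장소", "위치", "Location", "Place"]

# Label pattern: a location label, a colon, then the location text up to end of line.
LABEL_RE = re.compile(
    r"(?:^|\s)(?:%s)\s*:\s*(?P<loc>[^\n]+)" % "|".join(map(re.escape, LOC_LABELS))
)

# Context pattern: a Time expression ("오전/오후 N시", i.e. AM/PM N o'clock),
# then the location, then the connecting phrase "에서 열립니다" ("will be held at").
HELD_RE = re.compile(
    r"(?:오전|오후)\s*\d{1,2}시(?:\s*\d{1,2}분)?\s+(?P<loc>.+?)에서\s*열립니다"
)


def extract_locations(text):
    """Return meeting-location strings found by either pattern, label matches first."""
    found = [m.group("loc").strip() for m in LABEL_RE.finditer(text)]
    found += [m.group("loc").strip() for m in HELD_RE.finditer(text)]
    return found


notice = "3. 학술대회 장소: 성공회대학교 피츠버그관\n4. 학술대회 순서"
print(extract_locations(notice))                                      # ['성공회대학교 피츠버그관']
print(extract_locations("회의는 오후 5시 학생회관 101호에서 열립니다."))  # ['학생회관 101호']
```

Hand-written regexes quickly become unreadable once the 110 information types and their omissions and permutations are added. The slides give exactly this reason for describing the patterns as a graph first.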

  16. Extraction Module (2/2) • Category of LGG for Meeting Location

  17. Disambiguation Module (1/2) • Problem • Multi-referent ambiguity (Pouliquen et al. 2006) • two different geographic locations share the same name • e.g. "Cambridge": is it Cambridge, UK or Cambridge, Massachusetts? • Disambiguation by Linked Data • Personal Geo Data • Personalized OpenStreetMap • The user can map a ‘meeting location’ to a geographical location and save it • (to be applied, in consultation with Claus at Leipzig Univ.) • Open Geo Data • Naver Local Search API • Yahoo! POI Search API • Seoul Bus-stop DB • Disambiguation by a ranking algorithm • (the idea will be borrowed from meta-search research) • The first-ranked geographical location is chosen as the disambiguated result
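The ranking step on slide 17 is not specified beyond "borrowed from meta-search research". As one hypothetical instance, this Python sketch uses Borda-count rank fusion, a classic meta-search technique. Each open source returns a ranked candidate list, a candidate at 0-based rank r in a list of length n earns n - r points, and the highest-scoring location wins. A saved Personal Geo Data entry is returned directly when present. The function names, the dictionary layouts, and the 3-decimal rounding used to merge near-identical coordinates are all assumptions.

```python
from collections import defaultdict


def borda_fuse(ranked_lists, precision=3):
    """Borda-count fusion of ranked (lat, lon) lists.

    A candidate at 0-based rank r in a list of length n earns n - r points.
    Coordinates are rounded to `precision` decimals (roughly 100 m at 3) so
    that the same place reported by different sources is merged. Returns the
    merged candidates sorted by descending score (ties broken by coordinates).
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        n = len(ranked)
        for r, (lat, lon) in enumerate(ranked):
            key = (round(lat, precision), round(lon, precision))
            scores[key] += n - r
    return sorted(scores, key=lambda k: (-scores[k], k))


def disambiguate(query, user, personal, open_sources):
    """Prefer the user's saved mapping; otherwise take the top fused candidate.

    `personal` maps (user, query) -> (lat, lon); each element of `open_sources`
    maps query -> ranked list of (lat, lon). Both layouts are hypothetical.
    """
    saved = personal.get((user, query))
    if saved is not None:
        return saved
    fused = borda_fuse([source.get(query, []) for source in open_sources])
    return fused[0] if fused else None


# Hypothetical ranked results from three open sources for the same query.
source_a = {"동측식당": [(36.369051, 127.363757), (36.347001, 127.396285)]}
source_b = {"동측식당": [(36.347001, 127.396285), (36.369051, 127.363757)]}
source_c = {"동측식당": [(36.369051, 127.363757)]}
print(disambiguate("동측식당", "user@email.com", {}, [source_a, source_b, source_c]))
# (36.369, 127.364): 2 + 1 + 1 = 4 points vs. 1 + 2 = 3
```

Rounding is a crude merge: two reports a few metres apart can straddle a rounding boundary and be counted separately. A distance threshold would be more robust, at the cost of pairwise comparisons.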

  18. Disambiguation Module (2/2) • Example query: 동측식당 (lit. "east-side cafeteria"), email: user@email.com • Linked Data • Personal Geo Data: user@email.com, 동측식당 <36.369051,127.363757> • Open Geo Data • Seoul Bus-stop, Naver Local API: 동측식당 <37.19051,123.363757>, 동측식당 <36.347001,127.396285>, 동측식당 <36.998166,126.894287>, 동측식당 <37.55111,126.93219>, ... • Yahoo! POI API: 동측식당 <36.369051,127.363757> • Disambiguation over these candidates (the Personal Geo Data entry and the Yahoo! POI result share the same coordinates)

  19. 4. Experimentation 1) Experiment Data 2) Extraction Module 3) Disambiguation Module

  20. Experiment Data • Meeting announcement corpus • 1,101 meeting announcements • Collected from the web using the keyword ‘notice’ • Annotation • 10 term types and 13 relation types • 3 human annotators using the COAT annotation toolkit

  21. Extraction Module • Exp1. Extraction speed/memory comparison • Baseline system: ML-based system • Dataset: • the already collected corpus (training/test sets) • Exp2. Extraction performance comparison • Baseline system: ML-based system • Evaluation: precision/recall/F-measure • Dataset: • the already collected corpus (training/test sets) • a corpus currently being collected (experiments still to be conducted)
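The two measurements planned on slide 21 can be sketched with the Python standard library. This is an illustrative harness, not the study's evaluation code. `prf` scores extracted (document id, location string) pairs against gold annotations by exact match. `measure` times one call with `time.perf_counter` and records the peak traced allocation with `tracemalloc`. Exact-match scoring and the pair layout are assumptions. Also, `tracemalloc` only sees Python-level allocations, so it is merely a stand-in for device-level memory measurement.

```python
import time
import tracemalloc


def prf(predicted, gold, beta=1.0):
    """Precision, recall and F-measure over sets of (doc_id, location) pairs.

    A prediction counts as correct only if the exact pair is in `gold`.
    """
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    b2 = beta * beta
    f = (1 + b2) * p * r / (b2 * p + r) if (p + r) else 0.0
    return p, r, f


def measure(fn, *args):
    """Run fn(*args) once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


# Hypothetical toy data: one partial match ("피츠버그관") counts as wrong under exact match.
pred = {(1, "학생회관 101호"), (2, "종로 토즈"), (3, "피츠버그관")}
gold = {(1, "학생회관 101호"), (2, "종로 토즈"), (3, "성공회대학교 피츠버그관"), (4, "YMCA")}
print(prf(pred, gold))  # precision 2/3, recall 1/2, F1 4/7
```

With `beta=1.0` the F-measure is the usual harmonic mean of precision and recall. The same `measure` wrapper could time both the FST extractor and the ML baseline on identical inputs.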

  22. Disambiguation Module • Exp1. Accuracy by distance • 6 distance ranges: • 0 ≤ x < 100 m, 100 m ≤ x < 1 km, 1 km ≤ x < 2 km, 2 km ≤ x < 3 km, 3 km ≤ x < 5 km, and 5 km ≤ x • Exp2. Accuracy improvement with Personal Geo Data • Evaluation: • the performance is hard to quantify • instead, show scenarios of how it can be applied to improve accuracy • Exp3. Performance comparison of ranking algorithms • Exp4. Disambiguation speed/memory comparison • processing and communication speed/memory • on the server vs. on the mobile device (experiments still to be conducted)
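The distance-based accuracy in Exp1 of slide 22 requires two things: a distance between two coordinates, and a binning step. This Python sketch rests on stated assumptions. It uses the haversine great-circle formula with a mean Earth radius of 6,371 km, which is a spherical approximation. Each range is treated as half-open, [lower, upper), so a distance of exactly 100 m falls in the second range. The names `haversine_m`, `distance_bin` and `error_histogram` are hypothetical.

```python
import math
from bisect import bisect_right

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)
EDGES_M = [100, 1_000, 2_000, 3_000, 5_000]  # upper bounds of the first five ranges
LABELS = ["<100m", "100m-1km", "1-2km", "2-3km", "3-5km", ">=5km"]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_bin(d):
    """Map a distance in metres to one of the six half-open ranges."""
    return LABELS[bisect_right(EDGES_M, d)]


def error_histogram(pairs):
    """Count (predicted, gold) coordinate pairs per distance range."""
    counts = {label: 0 for label in LABELS}
    for (plat, plon), (glat, glon) in pairs:
        counts[distance_bin(haversine_m(plat, plon, glat, glon))] += 1
    return counts


pairs = [
    ((36.369051, 127.363757), (36.369051, 127.363757)),  # exact hit
    ((36.347001, 127.396285), (36.369051, 127.363757)),  # about 3.8 km off
]
print(error_histogram(pairs))  # one pair in '<100m', one in '3-5km'
```

Dividing each count by the number of pairs gives the share of results per range. A cumulative version ("within 1 km") is a running sum over the same dictionary.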
