Presentation Transcript

  1. Extraction and Visualization of Geographical Names in Text. ZHANG Xueying (zhangsnowy@163.com), Key Laboratory of Virtual Geographical Environment, Ministry of Education, Nanjing Normal University. Nov. 18, 2009

  2. Content: 1. Background 2. Extraction of geographical names 3. Applications

  3. 1.1 Disciplines concerned with geographic space (diagram). Disciplines: Information and Library Sciences, Computer Science, Natural Language Processing, GIS, Computational linguistics, Medicine, Geography, Political and social sciences, Human Computer Interaction, Geophysics, Cognitive Psychology, Biology (botany/zoology/ecology), Archeology. Other diagram labels: Resolution of geographical names; Generation of geographical names; ……; spatial model of the earth

  4. 1.2 What is a geographical name? • Location designator • Geographical named entity: a named entity with nouns or location expressions • Place name: the name by which a geographical place is known • Toponym: a named point of reference in both the physical and cultural landscape on the Earth's surface • Geographical name: essentially a label that distinguishes one part of the Earth's surface from another • Location

  5. 1.3 Main tasks • Recognition: identify geospatial names in a text span and then classify them into predefined geographical feature categories. • Resolution: look up candidate referents and use algorithms to pick the correct referent for each recognized geographical name.

  6. 1.4 Basic processing architecture • Applications Geographical Information System • Formalization Natural language processing and Machine learning Extraction Geospatial Information • Dataset Representation Natural language text

  7. 1.5 Statistical Models-ME: Maximum Entropy (1996, natural language processing) √ No assumption of a normal distribution √ No limits on context characteristics √ Learning cost of its parameters √ Considers single situations
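
For reference, the conditional maximum entropy model is usually written in the form below. The notation (feature functions f_i, weights λ_i, normalizer Z) is the common textbook one and is an assumption here, since the slide does not give a formula.

```latex
p_\lambda(y \mid x) = \frac{1}{Z_\lambda(x)} \exp\Big( \sum_{i} \lambda_i f_i(x, y) \Big),
\qquad
Z_\lambda(x) = \sum_{y'} \exp\Big( \sum_{i} \lambda_i f_i(x, y') \Big)
```

Each f_i can be any function of the context x and the label y, which is why the model places no limit on the context characteristics that can be used.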

  8. 1.5 Statistical Models-HMM: Hidden Markov Model • Markov property • Markov chain model: for observable state sequences (the state is known from the data) • Hidden Markov Model: for non-observable states

  9. 1.5 Statistical Models-HMM: HMM in Computational Linguistics • Part-of-speech tagging • Speech recognition • Handwriting recognition • Machine translation
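
As a rough illustration of how an HMM is used for tagging, the sketch below decodes the most probable hidden label sequence with the Viterbi algorithm. The two labels ("O", "LOC"), the three-word vocabulary, and all probabilities are made-up toy values, not taken from the presentation.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Markov property: only the previous state matters for the transition
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Follow the back-pointers from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy parameters (all probabilities must be non-zero here)
STATES = ["O", "LOC"]
START_P = {"O": 0.8, "LOC": 0.2}
TRANS_P = {"O": {"O": 0.7, "LOC": 0.3}, "LOC": {"O": 0.6, "LOC": 0.4}}
EMIT_P = {
    "O": {"in": 0.6, "Nanjing": 0.05, "today": 0.35},
    "LOC": {"in": 0.05, "Nanjing": 0.9, "today": 0.05},
}

print(viterbi(["in", "Nanjing", "today"], STATES, START_P, TRANS_P, EMIT_P))
```

With these toy values the decoder labels "Nanjing" as LOC and the other two words as O.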

  10. 1.6 Statistical Models-CRF: Conditional Random Field • Much like a Markov random field • An HMM: a CRF with very specific feature functions • A CRF: a generalization of an HMM
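
The relation between the two models is easier to see from the usual linear-chain CRF definition, given here in standard textbook notation (an assumption, since the slide shows no formula):

```latex
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```

Restricting the f_k to indicator functions of label transitions (y_{t-1}, y_t) and of label/observation pairs (y_t, x_t), with the weights set to the log transition and log emission probabilities, gives the HMM as a special case.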

  11. Content: 1. Background 2. Extraction of geographical names 3. Applications

  12. 2.1 Diagram of CRF-based recognition (diagram labels): linguistic characteristics; label granularity; feature template; CRF training; CRF test; dataset; simple geographical names; CCRF training; CCRF test; combined geographical names

  13. 2.2 Linguistic characteristics • language, history and culture • special characters • Combined named units • spatial relations

  14. 2.3 Label granularity • Granularity: 1-gram, 2-gram, …, word, phrase, sentence, paragraph, discourse • 1-gram: sparse data • Word segmentation
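
To make the 1-gram (single character) granularity concrete, the sketch below turns span annotations into one label per character. The B-/I-/O labelling scheme and the function name char_labels are assumptions for illustration; the slides do not say which scheme was used.

```python
def char_labels(text, spans):
    """Assign one label per character from (start, end, tag) spans (end exclusive)."""
    labels = ["O"] * len(text)          # "O" marks characters outside any name
    for start, end, tag in spans:
        labels[start] = f"B-{tag}"      # first character of the name
        for i in range(start + 1, end):
            labels[i] = f"I-{tag}"      # remaining characters of the name
    return labels

text = "位于黑龙江省哈尔滨市"
# Hypothetical annotation: 黑龙江省 and 哈尔滨市 as simple geographical names (SGN)
spans = [(2, 6, "SGN"), (6, 10, "SGN")]
for ch, label in zip(text, char_labels(text, spans)):
    print(ch, label)
```

Labelling at the word level instead would first require word segmentation, which is the trade-off the slide points at.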

  15. 2.4 CCRF (cascaded CRF)

  16. 2.5 Feature template • Context: observable window size n, which affects training time and test performance
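
A context window of size n can be sketched as follows: for the character at position i, the features are the characters from i-n to i+n. The feature names ("c[-1]", "c[0]", ...) and the "<PAD>" marker are hypothetical; the actual template used in the work is not shown in the transcript.

```python
def window_features(chars, i, n=2):
    """Collect the characters in a window of n positions on each side of position i."""
    feats = {}
    for offset in range(-n, n + 1):
        j = i + offset
        # Positions outside the sentence get a padding marker
        feats[f"c[{offset}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    return feats

chars = list("哈尔滨市")
print(window_features(chars, 0, n=1))
```

Each position yields 2n + 1 features, so a larger window gives the model more context but also more features to train on.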

  17. 2.5 Feature template

  18. 2.6 An example • Input: 位于黑龙江省哈尔滨市的哈尔滨市儿童公园为孩子们准备了特殊的贺岁礼物。 (Harbin Children Park in the Harbin city of Heilongjiang Province prepared special new year gifts for children.) • Simple geographical names (SGN): Harbin Children Park/SGN in the Harbin city/SGN of Heilongjiang Province/SGN prepared special new year gifts for children. • Combined geographical names (CGN): Harbin Children Park/SGN in the Harbin city of Heilongjiang Province/CGN prepared special new year gifts for children.

  19. 2.7 Experimental performance

  20. 2.8 Resolution approach (diagram labels): matching; gazetteer; candidate referents; cognitive salience model; reference disambiguation; intended referents

  21. 2.9 Cognitive salience model • High degree of spatial correlation in geographic references that are in textual proximity.
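
The spatial-correlation idea can be illustrated with a simple nearest-neighbour heuristic: an ambiguous name is resolved to the gazetteer candidate that lies closest to the unambiguous names mentioned in the same text. This is only a stand-in sketch, not the cognitive salience model itself, which the slides do not specify. The gazetteer entries, their approximate coordinates, and the function names are assumptions for illustration.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Hypothetical toy gazetteer: name -> list of (referent label, (lat, lon))
GAZETTEER = {
    "Paris": [("Paris, France", (48.86, 2.35)), ("Paris, Texas", (33.66, -95.56))],
    "Dallas": [("Dallas, Texas", (32.78, -96.80))],
}

def resolve(names, gazetteer):
    """Pick one referent per name, preferring candidates near unambiguous names."""
    # Names with exactly one candidate act as spatial anchors
    anchors = [gazetteer[n][0][1] for n in names if len(gazetteer.get(n, [])) == 1]
    resolved = {}
    for name in names:
        candidates = gazetteer.get(name, [])
        if not candidates:
            continue                      # not in the gazetteer: leave unresolved
        if len(candidates) == 1 or not anchors:
            resolved[name] = candidates[0][0]
        else:
            # Choose the candidate with the smallest total distance to the anchors
            best = min(candidates,
                       key=lambda c: sum(haversine_km(c[1], a) for a in anchors))
            resolved[name] = best[0]
    return resolved

print(resolve(["Paris", "Dallas"], GAZETTEER))
```

When no unambiguous name is available, this sketch simply falls back to the first gazetteer candidate; a real system would need a better default, for example one based on prominence.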

  22. 2.10 Problems • Ancient geographical names • Spatio-temporal changes • Limits of statistical models • Limits of gazetteers • ……

  23. Content: 1. Background 2. Extraction of geographical names 3. Applications

  24. GeoChunk: an annotation system

  25. TextMAP: an integrated system for text and map

  26. CGeoCoder: an address geocoding system

  27. SRAnnotation