On building a high performance gazetteer database Amittai Axelrod MetaCarta Inc

Presentation Transcript


  1. On building a high performance gazetteer database • Amittai Axelrod • MetaCarta Inc

  2. Thanks to Keith Baker, Kenneth Baker, Michael Bukatin, and András Kornai

  3. Plan of the talk • Database background • Relating geographic names and features • Handling ambiguities and inconsistencies in geographic names • Classification and storage system for geographic features

  4. Databases • No DB (faking it with flat files): clumsy • Record-oriented: still runs the world • Relational: making headway • Object-oriented: still very academic • For the MetaCarta GazDB, the relational approach made the most sense because of: • Overlapping records (McKinley/Denali: one peak, two names) • The need for frequent updates to parts of records
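
As a minimal sketch of the overlapping-records argument, assuming a hypothetical two-table layout (the `feature` and `name` tables and the `names_for` helper are invented for illustration, not taken from GazDB), Python's built-in `sqlite3` module can show one peak stored once and reached through both of its names:

```python
import sqlite3

# Hypothetical schema for illustration only: features and names live in
# separate tables, so one feature can carry any number of names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        lat        REAL NOT NULL,  -- decimal degrees
        lon        REAL NOT NULL
    );
    CREATE TABLE name (
        name_id    INTEGER PRIMARY KEY,
        feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
        name       TEXT NOT NULL
    );
""")

# The peak is stored once (approximate coordinates)...
conn.execute("INSERT INTO feature (feature_id, lat, lon) VALUES (1, 63.07, -151.01)")
# ...and both of its names point at that single row.
conn.executemany(
    "INSERT INTO name (feature_id, name) VALUES (?, ?)",
    [(1, "Mount McKinley"), (1, "Denali")],
)

def names_for(conn: sqlite3.Connection, feature_id: int) -> list[str]:
    """Return every name attached to a feature, alphabetically."""
    rows = conn.execute(
        "SELECT name FROM name WHERE feature_id = ? ORDER BY name", (feature_id,)
    )
    return [row[0] for row in rows]

# A coordinate correction is a single-row update, seen through both names.
conn.execute("UPDATE feature SET lat = ? WHERE feature_id = 1", (63.069,))

print(names_for(conn, 1))  # ['Denali', 'Mount McKinley']
```

Because the coordinates live only in the feature row, a correction made there is visible through every name; a record-oriented layout with one record per name would have to repeat the same edit in each duplicate.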

  5. Gazetteer production process

  6. Conversion scripts • Enforce a uniform structure on the data • Normalize across sources (e.g. lat/lon to decimal degrees, spelling, …) • Configuration required only once per source • Load data into the GazDB • Combination of Perl and SQL
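
The decimal-degree normalization step might look like the following sketch. It is written in Python rather than the Perl the talk mentions, and it assumes a single hypothetical input format (`42°21'30"N`); a real conversion script would presumably need one such pattern per source format.

```python
import re

# Hypothetical source format: degrees, minutes, seconds plus a hemisphere
# letter, e.g. 42°21'30"N. Each real source would need its own pattern.
DMS_PATTERN = re.compile(
    r"""^\s*(\d+)\s*°\s*(\d+)\s*'\s*(\d+(?:\.\d+)?)\s*"\s*([NSEW])\s*$"""
)

def dms_to_decimal(text: str) -> float:
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = DMS_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South latitudes and west longitudes are negative in decimal degrees.
    return -value if hemisphere in ("S", "W") else value

print(round(dms_to_decimal("42°21'30\"N"), 4))  # 42.3583
print(round(dms_to_decimal("71°03'36\"W"), 4))  # -71.06
```

Rejecting unrecognized strings with an exception, rather than guessing, keeps malformed coordinates from silently entering the database.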

  7. Relating features and names
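
The relation between names and features also runs the other way: one name can denote several features. A minimal sketch, assuming a hypothetical schema in which each feature records its enclosing administrative unit, lists the candidates for an ambiguous name with a join and narrows them with extra context. Disambiguating by administrative parent is an illustrative choice, not necessarily how GazDB resolves names; `candidates` and `resolve` are hypothetical helpers.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: the same name string may be linked to several features.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feature (
        feature_id   INTEGER PRIMARY KEY,
        admin_parent TEXT NOT NULL   -- enclosing administrative unit
    );
    CREATE TABLE name (
        feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
        name       TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO feature VALUES (?, ?)",
                 [(1, "Illinois"), (2, "Massachusetts"), (3, "Missouri")])
conn.executemany("INSERT INTO name VALUES (?, ?)",
                 [(1, "Springfield"), (2, "Springfield"), (3, "Springfield")])

def candidates(name: str) -> list[tuple[int, str]]:
    """All features carrying this name, with their administrative parent."""
    return conn.execute(
        """SELECT f.feature_id, f.admin_parent
           FROM name AS n JOIN feature AS f ON f.feature_id = n.feature_id
           WHERE n.name = ?
           ORDER BY f.feature_id""",
        (name,),
    ).fetchall()

def resolve(name: str, admin_parent: str) -> Optional[int]:
    """Return the unique feature matching name and parent, or None if not unique."""
    matches = [fid for fid, parent in candidates(name) if parent == admin_parent]
    return matches[0] if len(matches) == 1 else None

print(candidates("Springfield"))
print(resolve("Springfield", "Missouri"))  # 3
```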

  8. Other tables used in GazDB • Population • Elevation • Language • Feature type • Source/versioning info • Temporal extent • Hierarchical information • Confidence • Comments • Change logs (full auditing)
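
Full auditing via change logs could be implemented with database triggers, as in this sketch. The `feature_population` and `feature_population_log` tables and the trigger name are hypothetical, and the values are placeholders; the talk does not say how GazDB records its change logs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feature_population (
        feature_id INTEGER PRIMARY KEY,
        population INTEGER NOT NULL,
        source     TEXT NOT NULL      -- which input source supplied the value
    );
    -- Hypothetical change log: one row per update, with old and new values.
    CREATE TABLE feature_population_log (
        log_id         INTEGER PRIMARY KEY,
        feature_id     INTEGER NOT NULL,
        old_population INTEGER,
        new_population INTEGER,
        changed_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Fires after every UPDATE of the population column.
    CREATE TRIGGER feature_population_audit
    AFTER UPDATE OF population ON feature_population
    BEGIN
        INSERT INTO feature_population_log (feature_id, old_population, new_population)
        VALUES (OLD.feature_id, OLD.population, NEW.population);
    END;
""")

# Placeholder values, not real census data.
conn.execute("INSERT INTO feature_population VALUES (1, 1000, 'source_a')")
conn.execute("UPDATE feature_population SET population = 1200 WHERE feature_id = 1")

log = conn.execute(
    "SELECT feature_id, old_population, new_population FROM feature_population_log"
).fetchall()
print(log)  # [(1, 1000, 1200)]
```

The same pattern extends to `AFTER INSERT` and `AFTER DELETE` triggers, so every change leaves a log row without the loading scripts having to write one explicitly.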

  9. Geographic names • Internationalization: full Unicode (UTF-8) support; detailed language information maintained (SIL) • Name resolution: canonical form (16-bit), display form (8-bit), search form (6-bit) • Authoritativeness • Explicitness
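
The slide names three forms of each name but not how they are derived. The sketch below is one plausible reading under explicit assumptions: the canonical form is simply the full Unicode string, the 8-bit display form keeps Latin-1 characters and strips diacritics from anything outside that range, and the 6-bit search form uses a hypothetical 37-symbol alphabet (A-Z, 0-9, space) that fits within 64 codes. `display_form`, `search_form`, and the alphabet are illustrative, not GazDB's actual rules.

```python
import unicodedata

# Assumed 6-bit search alphabet: 26 letters, 10 digits and space (37 symbols,
# within the 64 that 6 bits can encode). Not GazDB's actual alphabet.
SEARCH_ALPHABET = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ")

def strip_marks(text: str) -> str:
    """Decompose (NFKD) and drop combining marks, e.g. 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def display_form(canonical: str) -> str:
    """8-bit display form: keep Latin-1 characters, simplify the rest."""
    out = []
    for ch in unicodedata.normalize("NFC", canonical):
        if ord(ch) < 256:
            out.append(ch)  # representable in Latin-1 as-is
        else:
            # Strip diacritics; anything still outside Latin-1 becomes '?'.
            simplified = strip_marks(ch)
            out.append(simplified.encode("latin-1", errors="replace").decode("latin-1"))
    return "".join(out)

def search_form(canonical: str) -> str:
    """6-bit search form: uppercase, no diacritics, other symbols become spaces."""
    base = strip_marks(canonical).upper()
    mapped = "".join(ch if ch in SEARCH_ALPHABET else " " for ch in base)
    return " ".join(mapped.split())  # collapse repeated spaces

for name in ["São Paulo", "Győr", "Saint-Étienne"]:
    print(name, "|", display_form(name), "|", search_form(name))
```

Keeping the lossy forms alongside the canonical one, rather than replacing it, means the original spelling is never lost while searches can still match variant spellings.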

  10. Updating a name in the GazDB
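
The diagram for this slide did not survive. As a hedged illustration only, assuming a hypothetical `authoritative` flag (suggested by the Authoritativeness bullet in slide 9, but not its actual definition), a rename can run as one transaction that demotes the current preferred name to a variant and inserts the new one, so the old name stays attached and searchable. The `rename` helper is invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE name (
        name_id       INTEGER PRIMARY KEY,
        feature_id    INTEGER NOT NULL,
        name          TEXT NOT NULL,
        authoritative INTEGER NOT NULL DEFAULT 0  -- hypothetical: 1 = preferred name
    )
""")
conn.execute(
    "INSERT INTO name (feature_id, name, authoritative) VALUES (1, 'Mount McKinley', 1)"
)
conn.commit()

def rename(conn: sqlite3.Connection, feature_id: int, new_name: str) -> None:
    """Make new_name the preferred name; keep earlier names as variants."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "UPDATE name SET authoritative = 0 WHERE feature_id = ?", (feature_id,)
        )
        conn.execute(
            "INSERT INTO name (feature_id, name, authoritative) VALUES (?, ?, 1)",
            (feature_id, new_name),
        )

rename(conn, 1, "Denali")
rows = conn.execute(
    "SELECT name, authoritative FROM name WHERE feature_id = 1 ORDER BY name_id"
).fetchall()
print(rows)  # [('Mount McKinley', 0), ('Denali', 1)]
```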

  11. Geographic features • Spatial representations: point, line, area, … • Functional classes: building, field, campus, city, … • Administrative types: nation, province, county, international organization, …
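
The three classification axes can be modeled as independent attributes, as in this sketch. The enum members list only the examples on the slide (the real lists continue, as the ellipses indicate), and making `functional_class` and `admin_type` optional is an assumption: here a county has an administrative type but no functional class, while a campus has the reverse. `Feature` and `describe` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Geometry(Enum):
    POINT = "point"
    LINE = "line"
    AREA = "area"

class FunctionalClass(Enum):
    BUILDING = "building"
    FIELD = "field"
    CAMPUS = "campus"
    CITY = "city"

class AdminType(Enum):
    NATION = "nation"
    PROVINCE = "province"
    COUNTY = "county"
    INTERNATIONAL_ORG = "international org"

@dataclass(frozen=True)
class Feature:
    feature_id: int
    geometry: Geometry
    # Both classifications are optional (assumption): not every feature has both.
    functional_class: Optional[FunctionalClass] = None
    admin_type: Optional[AdminType] = None

def describe(f: Feature) -> str:
    """Join whichever classification values are present, e.g. 'area/campus'."""
    parts = [f.geometry.value]
    if f.functional_class is not None:
        parts.append(f.functional_class.value)
    if f.admin_type is not None:
        parts.append(f.admin_type.value)
    return "/".join(parts)

campus = Feature(1, Geometry.AREA, functional_class=FunctionalClass.CAMPUS)
county = Feature(2, Geometry.AREA, admin_type=AdminType.COUNTY)
print(describe(campus))  # area/campus
print(describe(county))  # area/county
```

Keeping the axes separate avoids a combinatorial list of composite types such as "area-campus" or "point-building".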

  12. Export scripts • Read the GazDB • Select which fields to include in the custom output • Create .gbdm (MetaCarta format) binaries • Combination of Perl and SQL • Not yet generalized to other binary output formats
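
The .gbdm layout is not described in the talk, so this sketch uses a hypothetical fixed-header record format built with Python's `struct` module to show the general shape of a binary export: each selected field is packed at a known offset, so a reader can scan records without parsing text. `export_records`, `read_records`, and the record layout are all invented for illustration.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple

# Hypothetical fixed header per record (NOT the .gbdm layout): little-endian
# uint32 id, float64 lat, float64 lon, uint16 byte length of the UTF-8 name.
HEADER = struct.Struct("<IddH")

Record = Tuple[int, float, float, str]

def export_records(records: Iterable[Record], out: BinaryIO) -> int:
    """Write the selected fields of each record; return the record count."""
    count = 0
    for feature_id, lat, lon, name in records:
        encoded = name.encode("utf-8")
        out.write(HEADER.pack(feature_id, lat, lon, len(encoded)))
        out.write(encoded)
        count += 1
    return count

def read_records(data: bytes) -> Iterator[Record]:
    """Decode records written by export_records (useful for round-trip checks)."""
    offset = 0
    while offset < len(data):
        feature_id, lat, lon, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        name = data[offset:offset + length].decode("utf-8")
        offset += length
        yield feature_id, lat, lon, name

# In practice the records would come from a SELECT listing only the chosen columns.
records = [(1, 63.07, -151.01, "Denali"), (2, 47.37, 8.54, "Zürich")]
buf = io.BytesIO()
n = export_records(records, buf)
print(n, len(buf.getvalue()))  # 2 57
```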

  13. Conclusions • Accepts multiple sources (configured only once per source) • Fast loading of large datasets (1M entries per hour on a Linux desktop) • Simple update procedure • Outputs large custom binary gazetteers for different purposes at very high speed (1M entries per minute)
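
One common way to speed up bulk loading into a relational database is to avoid committing after every row. A minimal sketch, assuming SQLite and a hypothetical `name` table (the talk does not say which database engine GazDB used), batches inserts with `executemany` and commits once; `bulk_load` is an invented helper, and the sketch makes no claim about reaching the throughput reported on the slide.

```python
import sqlite3
from typing import Iterable, Tuple

INSERT_SQL = "INSERT INTO name (feature_id, name) VALUES (?, ?)"

def bulk_load(conn: sqlite3.Connection,
              rows: Iterable[Tuple[int, str]],
              batch_size: int = 10_000) -> int:
    """Stream rows into the name table in batches, all inside one transaction."""
    batch: list[Tuple[int, str]] = []
    total = 0
    with conn:  # commit once at the end, or roll back everything on error
        for row in rows:
            batch.append(row)
            if len(batch) >= batch_size:
                conn.executemany(INSERT_SQL, batch)
                total += len(batch)
                batch.clear()
        if batch:  # flush the final partial batch
            conn.executemany(INSERT_SQL, batch)
            total += len(batch)
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name (feature_id INTEGER NOT NULL, name TEXT NOT NULL)")
# Synthetic input standing in for the output of a conversion script.
loaded = bulk_load(conn, ((i, f"place {i}") for i in range(25_000)))
print(loaded)  # 25000
```

Batching keeps memory bounded when the input is streamed from a large file, while the single transaction means a failed load leaves no partial data behind.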
