Semantic Web Research at University of Texas at Dallas

Semantic Web Research at University of Texas at Dallas (Schema Matching + Storage & Retrieval of RDF Graph)

Faculty: Latifur Khan, Bhavani Thuraisingham



Semantic Matching in the GIS Domain

Jeffrey Partyka (Ph.D. Student)

Faculty: Latifur Khan, Bhavani Thuraisingham

Funded by: [sponsor logos not preserved]


Schema Matching

  • Schema matching measures the semantic similarity between two tables by mapping the properties of their instances to one another:

EBD similarity



  • Jeffrey Partyka, Neda Alipanah, Nilesh Singhania, Latifur Khan, Bhavani Thuraisingham, “Content Based Ontology Matching for GIS Datasets“, ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008), Page: 407-410, Irvine, California, USA, November 2008.


Representing types using N-grams*

  • Use commonly occurring N-grams in compared columns to determine similarity (N = 2)

CA: N-gram types from A.StrName = {LO, OC, CU, ST, …}

CB: N-gram types from B.Street = {TR, RA, R4, 5/, …}

*Jeffrey Partyka, Neda Alipanah, Latifur Khan, Bhavani Thuraisingham & Shashi Shekhar, “Content Based Ontology Matching for GIS Datasets“, ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008), Page: 407-410, Irvine, California, USA, November 2008.
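As a minimal sketch of how the N-gram types of a column might be collected (N = 2), the snippet below slides a window over each cell value and counts the resulting N-grams. The function names and the choice to keep spaces and case unchanged are assumptions for illustration, not details given in the slides.

```python
from collections import Counter


def ngrams(value: str, n: int = 2) -> list[str]:
    """Return all overlapping character n-grams of a single cell value."""
    return [value[i:i + n] for i in range(len(value) - n + 1)]


def column_ngram_counts(column: list[str], n: int = 2) -> Counter:
    """Count how often each n-gram type occurs across all values of a column."""
    counts = Counter()
    for value in column:
        counts.update(ngrams(value, n))
    return counts
```

For example, `ngrams("LOCUST")` yields the 2-grams LO, OC, CU, US and ST, in line with the types listed for A.StrName.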


How do we measure N-gram similarity between columns?

  • Entropy-Based Distribution (EBD)

  • EBD is a measurement of type similarity between 2 columns:

  • EBD takes values in the range [0, 1]. A greater EBD corresponds to more similar type distributions between the compared columns.

EBD = H(C|T) / H(C), where C = C1 ∪ C2


Entropy and Conditional Entropy

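Entropy: a measure of the uncertainty associated with a random variable X:

H(X) = −Σ p(x) log p(x), summed over all values x of X

Conditional entropy: the entropy that remains in a random variable Y once the value of a second random variable X is known:

H(Y|X) = −Σ p(x, y) log p(y|x), summed over all pairs (x, y)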


Visualizing Entropy and Conditional Entropy

H(C) = −Σ p(x) log p(x) for all x ∈ C1 ∪ C2

H(C|T) = H(C,T) − H(T) for all x ∈ C1 ∪ C2 and t ∈ T
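The ratio EBD = H(C|T) / H(C) can be sketched in a few lines, using the chain rule H(C|T) = H(C,T) − H(T). This is an illustrative sketch, not the authors' implementation: it assumes each instance contributes exactly one type label (for example, one N-gram occurrence), and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts: Counter) -> float:
    """Shannon entropy (base 2) of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def ebd(types_c1: list[str], types_c2: list[str]) -> float:
    """EBD = H(C|T) / H(C) over the instances of C1 and C2 taken together.

    Each list holds one type label per instance of the corresponding column.
    """
    joint = Counter()  # counts of (column, type) pairs
    for t in types_c1:
        joint[("C1", t)] += 1
    for t in types_c2:
        joint[("C2", t)] += 1

    column_counts, type_counts = Counter(), Counter()
    for (c, t), k in joint.items():
        column_counts[c] += k
        type_counts[t] += k

    h_c = entropy(column_counts)
    if h_c == 0:  # at most one non-empty column: the ratio is undefined
        return 0.0
    # Chain rule: H(C|T) = H(C,T) - H(T); clamp tiny negative rounding errors.
    h_c_given_t = max(0.0, entropy(joint) - entropy(type_counts))
    return h_c_given_t / h_c
```

When both columns have the same type distribution, knowing the type says nothing about the source column and the ratio is 1; when the columns share no types, the type determines the column and the ratio is 0.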


Faults of this Method

  • Semantically similar columns are not guaranteed to have a high similarity score. In the example below, both columns appear to hold city names, yet the 2-grams shown do not overlap:

A ∈ O1

B ∈ O2

2-grams extracted from A: {Da, al, la, as, Ho, ou, us, …}

2-grams extracted from B: {Sh, ha, an, ng, gh, ha, ai, Be, ei, ij, …}


Introducing Google Distance

* Jeffrey Partyka, Neda Alipanah, Latifur Khan, Bhavani M. Thuraisingham, Shashi Shekhar, “Ontology Alignment Using Multiple Contexts”, International Semantic Web Conference (ISWC) (Posters & Demos), Karlsruhe, Germany, October, 2008.
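The transcript does not preserve the formula from this slide. As a sketch, the Normalized Google Distance (NGD) as commonly defined by Cilibrasi and Vitányi is NGD(x, y) = (max(log f(x), log f(y)) − log f(x, y)) / (log N − min(log f(x), log f(y))), where f counts the pages containing the term or terms and N is the number of pages indexed. The function below assumes the hit counts are already available; how they are obtained is outside this sketch, and the parameter names are hypothetical.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from page counts.

    fx, fy: pages containing each term alone
    fxy:    pages containing both terms
    n:      total number of indexed pages (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")  # the terms never occur, or never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

A value near 0 means the two terms almost always appear together; larger values mean they rarely co-occur.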


K-medoid + NGD instance similarity

Step 1: Extract distinct keywords from the compared columns C1 and C2 (C1 ∈ O1, C2 ∈ O2)

Keywords extracted from columns = {Johnson, Rd., School, 15th, …}

Step 2: Group distinct keywords together into semantic clusters

Example clusters: {"Rd.", "Dr.", "St.", "Pwy", …} and {"Johnson", "School", "Dr.", …}

[Figure legend: Column 1, Column 2]

Step 3: Calculate similarity over C1 ∪ C2

Similarity = H(C|T) / H(C)
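Steps 2 and 3 can be sketched as follows, assuming a precomputed symmetric distance matrix between keywords (for instance, filled with NGD values). The slides do not say which K-medoid variant or initialisation is used, so the farthest-first initialisation and all function names here are assumptions made for illustration.

```python
import math
from collections import Counter


def k_medoids(dist: list[list[float]], k: int, max_iter: int = 100) -> list[int]:
    """Cluster items from a symmetric distance matrix; return a cluster index per item."""
    n = len(dist)
    # Farthest-first initialisation (an assumption): start from item 0, then
    # repeatedly add the item farthest from the medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = []
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:
                new_medoids.append(medoids[j])  # keep the old medoid for an empty cluster
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][o] for o in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def entropy(counts: Counter) -> float:
    """Shannon entropy (base 2) of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cluster_similarity(columns: list[str], labels: list[int]) -> float:
    """H(C|T) / H(C), with C the source column and T the cluster of each keyword."""
    h_c = entropy(Counter(columns))
    if h_c == 0:
        return 0.0
    h_c_given_t = entropy(Counter(zip(columns, labels))) - entropy(Counter(labels))
    return max(0.0, h_c_given_t) / h_c
```

Here `columns[i]` names the column that keyword `i` came from and `labels[i]` is its cluster, so the similarity is high when every cluster mixes keywords from both columns.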


Problems with K-medoid + NGD*

Two different geographic entities in the same location (e.g., Dallas, TX and Dallas County) can have a very low computed NGD value and thus be mistaken for similar:

similarity = .797

*Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, “Semantic Schema Matching Without Shared Instances,” to appear in Third IEEE International Conference on Semantic Computing, Berkeley, CA, USA - September 14-16, 2009.


Using geographic type information*

We use a gazetteer to determine the geographic type of an instance:

[Figure: instances of O1 and O2 linked to Geotypes]

*Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, “Geographically-Typed Semantic Schema Matching,” submitted to ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2009), Seattle, Washington, USA, November 2009.


Disambiguating Geographic Types For A Given Instance

We can use metadata and other information to reduce the number of type possibilities for a given instance:

[Figure: the instance "Dallas" with candidate types City and County; disambiguated type: City]


Geographic Types + NGD

It is now possible to make corrections for the geographic co-occurrence mistakes of NGD:

similarity = .398


Disambiguation Using Lat/Long Values

  • Each input consists of a name and coordinates (Lat/Long values).

  • Our knowledge base consists of records for a number of different geospatial features such as streets, lakes, schools, etc. for the entire US.

  • Each entry in the knowledge base contains coordinates and other spatial information, such as the length and area of the landmark.


Disambiguation Using Lat/Long Values (contd.)

Geo-Database


Disambiguation Using Lat/Long Values (contd.)

  • We first look for entries with a similar name in the knowledge base.

  • Next, for each feature type in the knowledge base, we choose the entry which is located closest to the input.

  • In case of two features having close proximity to the input, we disambiguate the feature type on the basis of geospatial properties like area and perimeter.
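The first two steps above can be sketched as follows. This is an illustration under several assumptions: the `Feature` record, the substring test standing in for name similarity, and the use of great-circle (haversine) distance are all hypothetical choices, and the tie-break on area and perimeter is left out because the slides do not state the rule.

```python
import math
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    feature_type: str  # e.g. "street", "lake", "school"
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/long points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def closest_per_type(name: str, lat: float, lon: float, knowledge_base: list[Feature]) -> dict:
    """For each feature type, keep the similarly named entry closest to the input."""
    best = {}
    for f in knowledge_base:
        if name.lower() not in f.name.lower():  # crude stand-in for name similarity
            continue
        d = haversine_km(lat, lon, f.lat, f.lon)
        if f.feature_type not in best or d < best[f.feature_type][0]:
            best[f.feature_type] = (d, f)
    return best


def guess_type(name: str, lat: float, lon: float, knowledge_base: list[Feature]):
    """Return the feature type of the closest candidate, or None if no name matches."""
    best = closest_per_type(name, lat, lon, knowledge_base)
    if not best:
        return None
    return min(best, key=lambda t: best[t][0])
```

Keeping one candidate per feature type, rather than only the single nearest entry, leaves room for a later tie-break when two types lie at nearly the same distance.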


Attribute Weighting

  • Default weighting scheme is to treat all 1-1 matches between properties/attributes with equal importance:

50%

50%
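The default scheme amounts to an unweighted average of the per-attribute match scores. A minimal sketch, assuming the scores are held in a dictionary keyed by attribute pair (the function and parameter names are hypothetical):

```python
from typing import Optional


def overall_similarity(match_scores: dict[str, float],
                       weights: Optional[dict[str, float]] = None) -> float:
    """Combine per-attribute match scores into one table-level score.

    With no weights given, every 1-1 match counts equally (50%/50% for two matches).
    """
    if not match_scores:
        raise ValueError("at least one attribute match is required")
    if weights is None:
        weights = {key: 1.0 / len(match_scores) for key in match_scores}
    total_weight = sum(weights[key] for key in match_scores)
    return sum(weights[key] * score for key, score in match_scores.items()) / total_weight
```

Passing an explicit `weights` dictionary is where a non-default weighting scheme would plug in.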


Results of Geographic Matching Over 2 Separate Road Network Data Sources

