MatchIT 1.1: Data Integration with Semantic Mapping Technologies



Presentation Transcript


  1. MatchIT 1.1: Data Integration with Semantic Mapping Technologies
     Michael Schidlowsky, Sr. Software Architect

  2. Data Integration
     • Motivated by:
       • Organizational Changes
         • Mergers and Acquisitions
         • Internal reorganizations (e.g., DHS)
       • Data Mining
       • Standards Conformance
       • Migration Efforts
         • Legacy Systems
       • Decoupling data sources from application code

  3. Data Integration
     • Challenges for the integration specialist include:
       • Domain-specific terms
       • Unfamiliarity with source schemas
       • Large size of the schema set
       • Semantics often not captured
       • Captured semantics:
         • Stored in ad-hoc formats
         • Cannot be reused to facilitate future data integration efforts

  4. Data Integration: Example
     • Background: Acme Inc. merges with CompuGlobalHyperMeganet.
     • Technical Challenge: Need a “virtual database” of all sales for all stores in real time.
       • Which fields represent customers? CUSTOMERID, CUST_ID, SSN
       • Which fields represent ‘Price’? Sale_Amt, Total_Sale
       • What if your database has 10,000 columns?

  5. Data Integration: Example
     • Background: HR needs to use employee information for a new company portal.
     • Technical Challenge: Data must be in XML and conform to a standard HR schema.
       • Which fields are related to Address? RESIDENCE, PREV_RESIDENCE
       • What if your database has 10,000 columns?

  6. Ideal Matching Solution
     • Finds lexical relationships
     • Captures semantic information
     • Finds semantic relationships
     • Provides programmatic access to results (API)
     • Fast
     • Scalable
     • Human involvement

  7–10. MatchIT Philosophy
     • Best Matching tool already exists! What is meant by “ID”?
       • “PLEASE PRESENT ID”
       • NY, NJ, ID
       • SUPEREGO, EGO, ID
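The “ID” slides make the point that a person resolves an ambiguous symbol from its context. A toy version of that idea is a simplified Lesk-style overlap count: each candidate sense carries a few signature words, and the sense that shares the most words with the surrounding symbols wins. The sense inventory, the signature words and the `disambiguate` helper below are made-up assumptions for illustration; the slides do not say how, or whether, MatchIT disambiguates symbols this way.

```python
# Hypothetical sense inventory for the symbol "ID": each sense is paired with
# signature words that tend to appear near it. Illustrative only, not MatchIT data.
SENSES = {
    "identifier/identification": {"please", "present", "customer", "cust", "number", "key"},
    "Idaho (US state)": {"ny", "nj", "ca", "state", "address", "zip"},
    "id (psychoanalysis)": {"ego", "superego", "freud"},
}


def disambiguate(context_words):
    """Pick the sense of "ID" whose signature overlaps most with the context.

    Ties go to the sense listed first in SENSES.
    """
    context = {word.lower() for word in context_words}
    best_sense, best_overlap = None, -1
    for sense, signature in SENSES.items():
        overlap = len(signature & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate(["PLEASE", "PRESENT"]))  # identifier/identification
print(disambiguate(["NY", "NJ"]))           # Idaho (US state)
print(disambiguate(["SUPEREGO", "EGO"]))    # id (psychoanalysis)
```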

  11. MatchIT 1.1
     • MatchIT is a semantic and lexical matching tool.
     • Session Outline:
       • Import and process schemas
       • Perform lexical matching
       • Create and manage a semantic vocabulary
       • Perform semantic matching
       • Demonstrate 3rd-party integration with a data integration tool (MetaMatrix)

  12. Import & Process Schemas
     • Revelytix models are RDF/OWL:
       • Flexible model architecture
       • Extensible
       • Interoperable
     • Current importers:
       • JDBC
       • XML Schema
       • MetaMatrix XMI models
     Importer Demo
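As a rough illustration of what a schema importer has to extract, the sketch below pulls named element declarations, with their enclosing element, out of an XML Schema using Python's standard `xml.etree.ElementTree`. It is not the MatchIT importer, which produces RDF/OWL models in a form the slides do not show; the function name `import_xsd_elements` and the sample schema (which reuses the HR example's RESIDENCE and PREV_RESIDENCE fields) are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as "{namespace-uri}localname".
XS = "{http://www.w3.org/2001/XMLSchema}"


def import_xsd_elements(xsd_text):
    """Return (parent, name) pairs for named xs:element declarations.

    parent is the nearest enclosing named xs:element, or None at top level.
    Pairs are listed in document order.
    """
    root = ET.fromstring(xsd_text)
    found = []

    def walk(node, parent):
        for child in node:
            name = child.get("name")
            if child.tag == XS + "element" and name:
                found.append((parent, name))
                walk(child, name)
            else:
                walk(child, parent)  # e.g. xs:complexType, xs:sequence

    walk(root, None)
    return found


SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="RESIDENCE" type="xs:string"/>
        <xs:element name="PREV_RESIDENCE" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

print(import_xsd_elements(SAMPLE_XSD))
# [(None, 'Employee'), ('Employee', 'RESIDENCE'), ('Employee', 'PREV_RESIDENCE')]
```

Element declarations that use `ref=` instead of `name=` are skipped here; a fuller importer would resolve them.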

  13. Lexical Matching
     • Uses lexical distance measures to determine lexical similarity
     • Fastest matching technique
     • Requires no work other than importing schemas
     • Often yields interesting results
     Lexical Matching Demo
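The slides do not say which lexical distance measure MatchIT uses. A common choice is Levenshtein (edit) distance normalized by the length of the longer string, sketched below with hypothetical helper names; it ranks the Acme example's candidate customer fields against CUSTOMERID.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions (cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def lexical_similarity(a, b):
    """Case-insensitive edit distance scaled to [0, 1]; 1.0 means identical."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def rank_lexical(source, targets):
    """Candidate matches for source, best first."""
    return sorted(((lexical_similarity(source, t), t) for t in targets), reverse=True)


for score, target in rank_lexical("CUSTOMERID", ["SSN", "CUST_ID"]):
    print(f"{target}: {score:.2f}")
# CUST_ID: 0.60
# SSN: 0.10
```

SSN scores near zero even though it may well identify a customer: lexical distance alone cannot see that relationship, which is the gap semantic matching is meant to fill.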

  14. Create Vocabulary from Schemas
     • A vocabulary is:
       • A set of symbols
       • The occurrences of those symbols in your schemas
       • A binding of each symbol to one or more semantic concepts
     • Created by MatchIT from schemas using tokenization algorithms
     • Reusable
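The three parts of a vocabulary listed above map naturally onto a small data structure. The sketch below is one possible shape, assuming symbols such as CUST and CUSTOMER (as a tokenizer might extract them from CUST_ID and CUSTOMERID); the class names, the schema names `acme_sales` and `cghm_sales`, and the concept name `Customer` are all hypothetical, not MatchIT's internal model.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    schema: str    # name of the imported schema
    element: str   # column or element in which the symbol was found


@dataclass
class VocabularyEntry:
    symbol: str
    occurrences: list = field(default_factory=list)  # Occurrence records
    concepts: set = field(default_factory=set)       # bound semantic concepts


class Vocabulary:
    """A set of symbols, where they occur, and the concepts they are bound to."""

    def __init__(self):
        self.entries = {}

    def add_occurrence(self, symbol, schema, element):
        if symbol not in self.entries:
            self.entries[symbol] = VocabularyEntry(symbol)
        self.entries[symbol].occurrences.append(Occurrence(schema, element))

    def bind(self, symbol, concept):
        self.entries[symbol].concepts.add(concept)

    def symbols_for(self, concept):
        """All symbols bound to a concept, e.g. to propose cross-schema matches."""
        return sorted(s for s, e in self.entries.items() if concept in e.concepts)


vocab = Vocabulary()
vocab.add_occurrence("CUST", "acme_sales", "CUST_ID")
vocab.add_occurrence("CUSTOMER", "cghm_sales", "CUSTOMERID")
vocab.bind("CUST", "Customer")
vocab.bind("CUSTOMER", "Customer")
print(vocab.symbols_for("Customer"))  # ['CUST', 'CUSTOMER']
```

Because the bindings live with the symbols rather than with any one schema, the same vocabulary can be reused when the next schema containing CUST or CUSTOMER is imported.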

  15. Tokenization Algorithms
     • Different schemas require different tokenization techniques.
     • Tokenization algorithms determine how symbols are extracted from schemas, based on:
       • Capitalization
       • Delimiters
       • English language
     Vocabulary Demo
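The three techniques can be chained: split on delimiters, then on capitalization changes, then segment any remaining all-caps run into English words. The sketch below assumes that ordering and a tiny hypothetical word list `WORDS` (a real tool would use a full dictionary); the function names are illustrative and the slides do not describe MatchIT's actual algorithms.

```python
import re

# Hypothetical English word list; a real tool would use a full dictionary.
WORDS = {"customer", "id", "sale", "amt", "total", "prev", "residence",
         "gender", "sex", "code"}


def split_delimiters(name, delimiters="_-. "):
    """Delimiter technique: "CUST_ID" -> ["CUST", "ID"]."""
    return [t for t in re.split("[" + re.escape(delimiters) + "]+", name) if t]


def split_capitalization(token):
    """Capitalization technique: "GenderCode" -> ["Gender", "Code"],
    "XMLSchema" -> ["XML", "Schema"]."""
    return re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", token)


def split_english(token, words=WORDS):
    """English-language technique: segment a run such as "CUSTOMERID" into the
    fewest dictionary words; return [token] unchanged if that is impossible."""
    low = token.lower()
    best = [None] * (len(low) + 1)   # best[i]: segmentation of token[:i]
    best[0] = []
    for i in range(1, len(low) + 1):
        for j in range(i):
            if best[j] is not None and low[j:i] in words:
                candidate = best[j] + [token[j:i]]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[-1] if best[-1] is not None else [token]


def tokenize(name):
    """Delimiters first, then capitalization, then English segmentation
    of all-caps pieces (mixed-case pieces are already split by case)."""
    tokens = []
    for part in split_delimiters(name):
        for piece in split_capitalization(part):
            tokens.extend(split_english(piece) if piece.isupper() else [piece])
    return tokens


for name in ["CUST_ID", "CUSTOMERID", "GenderCode", "PREV_RESIDENCE"]:
    print(name, "->", tokenize(name))
# CUST_ID -> ['CUST', 'ID']
# CUSTOMERID -> ['CUSTOMER', 'ID']
# GenderCode -> ['Gender', 'Code']
# PREV_RESIDENCE -> ['PREV', 'RESIDENCE']
```

The ordering matters: capitalization alone cannot split an all-caps name such as CUSTOMERID, which is why the dictionary step runs last.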

  16. Matching Techniques
     • MatchIT currently uses two types of matching techniques:
       • Lexical Matching: attempts to determine the similarity of two symbols from the lexical distance between them.
       • Semantic Matching: attempts to determine the similarity of two symbols from the ontological distance between their concepts within a semantic knowledge base.

  17. Parts Supplier Schema (as seen by a person)

  18. Parts Supplier Schema (as seen by a computer)

  19. Semantic Matching • How semantically similar are two concepts?

  20. Semantic Matching
     • Uses knowledge-base distance measures to determine semantic similarity
     • Presents ranked candidate matches
     • Based on semantics captured in vocabularies
     • The only way to effectively find relationships between lexically dissimilar symbols:
       • GenderCode / SexCode
       • Provider / Supplier
       • Amount / Quantity
     Semantic Matching Demo
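One simple knowledge-base distance measure is the length of the shortest path between two concepts in an ontology graph, turned into a score with 1 / (1 + d). The slides do not say which measure MatchIT uses, so treat this as a swapped-in example. The toy ontology `EDGES`, the symbol-to-concept `BINDINGS` and all function names below are hypothetical; they only show how lexically dissimilar pairs such as Provider and Supplier can still rank high.

```python
from collections import deque

# Toy knowledge base (hypothetical): undirected edges between concepts.
EDGES = [
    ("Sex", "Gender"),
    ("Gender", "Attribute"),
    ("Provider", "Party"), ("Supplier", "Party"),
    ("Amount", "Measure"), ("Quantity", "Measure"),
    ("Attribute", "Thing"), ("Party", "Thing"), ("Measure", "Thing"),
]

# Hypothetical vocabulary bindings from schema symbols to concepts.
BINDINGS = {
    "GenderCode": "Gender", "SexCode": "Sex",
    "Provider": "Provider", "Supplier": "Supplier",
    "Amount": "Amount", "Quantity": "Quantity",
}


def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def concept_distance(graph, start, goal):
    """Breadth-first search: edges on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def semantic_similarity(graph, sym_a, sym_b, bindings=BINDINGS):
    """1 / (1 + d) for concept distance d; 0.0 when the concepts are unconnected."""
    d = concept_distance(graph, bindings[sym_a], bindings[sym_b])
    return 0.0 if d is None else 1.0 / (1 + d)


def rank_semantic(graph, source, targets):
    """Ranked candidate matches, best first."""
    return sorted(((semantic_similarity(graph, source, t), t) for t in targets),
                  reverse=True)


kb = build_graph(EDGES)
for score, target in rank_semantic(kb, "Provider", ["SexCode", "Supplier", "Quantity"]):
    print(f"{target}: {score:.2f}")
# Supplier: 0.33
# Quantity: 0.20
# SexCode: 0.17
```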

  21. 3rd-Party Integration
     • MatchIT integration:
       • MatchIT Java API
       • Stand-alone application
       • Embeddable application (as Eclipse plug-ins)
       • Hides unapproved matches
     • Useful for various 3rd-party applications:
       • Data Integration
       • Data Discovery
       • Ontology Mediation
       • Search
       • Metadata Management
       • Data Cleansing
     MetaMatrix Demo
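The slide says MatchIT's integration layer hides unapproved matches, which pairs with the “Human Involvement” goal on slide 6. One way a consuming application might model that review step is sketched below. The `Match` record, the `Status` values, the `review` and `approved_only` helpers and the scores are hypothetical and illustrative; this is not the MatchIT Java API.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    CANDIDATE = "candidate"   # proposed by a matcher, not yet reviewed
    APPROVED = "approved"     # confirmed by a person
    REJECTED = "rejected"     # dismissed by a person


@dataclass(frozen=True)
class Match:
    source: str
    target: str
    score: float
    status: Status = Status.CANDIDATE


def review(match, accept):
    """Return a copy of the match with the reviewer's decision recorded."""
    return replace(match, status=Status.APPROVED if accept else Status.REJECTED)


def approved_only(matches):
    """What a downstream tool would see: unapproved matches are hidden."""
    return [m for m in matches if m.status is Status.APPROVED]


candidates = [
    Match("CUSTOMERID", "CUST_ID", 0.6),
    Match("Provider", "Supplier", 0.33),
    Match("CUSTOMERID", "SSN", 0.1),
]
reviewed = [review(candidates[0], True), review(candidates[1], True), candidates[2]]
print([(m.source, m.target) for m in approved_only(reviewed)])
# [('CUSTOMERID', 'CUST_ID'), ('Provider', 'Supplier')]
```

Keeping the match records immutable and recording the decision as a new copy leaves the matcher's original candidate list untouched, so a reviewer can revisit it later.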

  22. Questions?
     • MatchIT 30-day trial available at http://www.revelytix.com
     • Michael Schidlowsky, michaels@revelytix.com
