Data Discovery

Data Discovery: understanding data relationships. Philip Howard, Research Director – Bloor Research. Agenda: What are data relationships and why are they important? Different approaches to discovering data relationships. Features you might look for in a data discovery tool.

jfosdick

Presentation Transcript


  1. Data Discovery: Understanding data relationships • Philip Howard, Research Director – Bloor Research

  2. Agenda • What are data relationships and why are they important? • Different approaches to discovering data relationships • Features you might look for in a data discovery tool

  3. What is a data relationship? • A relationship between database tables, either within or across databases • A relationship within or across non-relational data sources • A relationship between a relational and non-relational source • Note that relationships may be complex and/or involve more than 2 elements

  4. Why are data relationships important? 1. Data migration

  5. Why are data relationships important? 2. Data archival

  6. Why are data relationships important? 3. Master data management

  7. Why are data relationships important? 4. Data governance

  8. Why are data relationships important? 5. Data modelling

  9. Why are data relationships important? 6. Business intelligence

  10. Why are data relationships important? 7, 8, 9, …: Data integration, legacy migration, data warehousing, …

  11. Why are data relationships difficult? • No definition exists across multiple sources • Within a source many relationships are not explicit • Ownership of relationships is diverse • Many relationships are defined within application software and not in the data source

  12. Data relationships in place • Different issues arise when you consider relationships within systems versus across systems

  13. Data relationships within systems • Typical functions: identification of primary-foreign key pairs; dependency analysis; redundant columns • Usually provided through data profiling, which also provides error statistics
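To make the profiling functions on this slide concrete, here is a minimal sketch of two of them: proposing primary-foreign key pairs by testing whether one column's values are contained in a unique column of another table, and checking a simple one-column dependency. The table layout, the function names and the sample data are assumptions made for illustration; they are not taken from the presentation or from any particular tool, and containment alone can suggest false candidates on small data sets.

```python
from itertools import permutations


def is_unique(values):
    """A column can act as a key only if it has no nulls and no duplicates."""
    return None not in values and len(set(values)) == len(values)


def candidate_foreign_keys(tables):
    """Return (child_table, child_col, parent_table, parent_col) candidates.

    tables maps table name -> {column name -> list of values}.
    A pair is a candidate when the parent column is unique and every
    non-null child value also appears in the parent column.
    """
    candidates = []
    for (child, child_cols), (parent, parent_cols) in permutations(tables.items(), 2):
        for p_name, p_values in parent_cols.items():
            if not is_unique(p_values):
                continue
            parent_set = set(p_values)
            for c_name, c_values in child_cols.items():
                non_null = {v for v in c_values if v is not None}
                if non_null and non_null <= parent_set:
                    candidates.append((child, c_name, parent, p_name))
    return candidates


def holds_dependency(rows, determinant, dependent):
    """Check that each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[determinant], row[dependent]) != row[dependent]:
            return False
    return True


# Hypothetical sample data.
tables = {
    "customers": {"cust_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]},
    "orders": {"order_id": [10, 11, 12, 13], "cust_id": [1, 1, 3, None]},
}
fk_candidates = candidate_foreign_keys(tables)

addresses = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "94105", "city": "San Francisco"},
]
zip_determines_city = holds_dependency(addresses, "zip", "city")
```

On this sample the only candidate is `orders.cust_id` referencing `customers.cust_id`. A column that is fully determined by another column is also a natural place to start looking for redundancy.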

  14. Data relationships across systems • Requirement for relationship discovery • No requirement for error statistics • Requirement for rule violations where this represents a violation of a cross-source relationship
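As an illustration of the last point, a cross-source rule can be as simple as "every customer reference in one system must exist in the other". The sketch below lists the records that break such a rule. The source names, field names and records are hypothetical, chosen only to show the idea.

```python
def cross_source_violations(child_rows, child_key, parent_rows, parent_key):
    """Return the child rows whose key value has no match in the other source."""
    parent_keys = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row[child_key] not in parent_keys]


# Hypothetical records from two separate systems.
crm_customers = [{"custID": "C1"}, {"custID": "C2"}]
billing_invoices = [
    {"invoice": 100, "cust#": "C1"},
    {"invoice": 101, "cust#": "C9"},  # no such customer in the CRM source
]

violations = cross_source_violations(
    billing_invoices, "cust#", crm_customers, "custID"
)
```

Here the report is about the relationship itself (invoice 101 points at a customer the other source does not know), not about error statistics for an individual column.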

  15. Specific requirements • For MDM – overlap & precedence analysis, transformation & business rules and exceptions, outlier analysis, matching keys • For data migration & archival – business entities
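Overlap analysis, the first MDM requirement listed, can be sketched as a comparison of the key values held by two sources. The function name, the report fields and the sample keys below are assumptions for illustration; precedence analysis and matching-key discovery are not covered by this sketch.

```python
def overlap_report(source_a, source_b):
    """Summarise how far the key values of two sources overlap."""
    a, b = set(source_a), set(source_b)
    shared = a & b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Share of each source's keys that the other source also holds.
        "coverage_of_a": len(shared) / len(a) if a else 0.0,
        "coverage_of_b": len(shared) / len(b) if b else 0.0,
    }


# Hypothetical customer keys from two systems.
report = overlap_report(["C1", "C2", "C3", "C4"], ["C3", "C4", "C5"])
```

In this example half of the first source's keys also appear in the second, which is the kind of figure one might use when deciding which source to treat as the master for an entity.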

  16. General functions • Automation of MDM and profiling functions • Visualisation of relationships • Semantics: the semantic type of the data (e.g. zip code); context-free discovery (e.g. recognising that cust# is equivalent to custID) • Data classification: recognising the relationship between a pre-defined, business-user-maintained domain of values and the actual content of a field, in order to identify the content of the field as well as unexpected values • Business glossary
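The semantic and classification functions above can be sketched with simple heuristics: a pattern test for a semantic type, a name normalisation that treats cust# and custID as the same thing, and a comparison of field content against a maintained domain of values. The US-style zip pattern, the 0.9 threshold, the suffix list and the country domain are all assumptions for illustration, not features of any specific product.

```python
import re

# Assumed pattern: US zip codes, five digits with an optional four-digit suffix.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def looks_like_zip(values, threshold=0.9):
    """Guess that a field holds zip codes if most non-empty values match."""
    values = [v for v in values if v]
    if not values:
        return False
    hits = sum(1 for v in values if ZIP_PATTERN.fullmatch(v))
    return hits / len(values) >= threshold


def normalise_name(name):
    """Reduce a column name to a comparable stem, e.g. cust# and custID -> cust."""
    name = name.lower().replace("#", "id")
    name = re.sub(r"[^a-z0-9]", "", name)
    return re.sub(r"(id|no|num|number)$", "", name)


def classify_field(values, domain):
    """Compare field content with a maintained domain of allowed values."""
    matched = sum(1 for v in values if v in domain)
    unexpected = sorted({v for v in values if v not in domain})
    return {
        "match_rate": matched / len(values) if values else 0.0,
        "unexpected": unexpected,
    }


# Hypothetical field content and domain.
is_zip = looks_like_zip(["90210", "10001-1234", "30301"])
same_column = normalise_name("cust#") == normalise_name("custID")
country_domain = {"GB", "US", "FR"}
result = classify_field(["GB", "US", "UK", "FR"], country_domain)
```

A high match rate suggests the field holds country codes, while the `unexpected` list surfaces values such as "UK" that a business user may want to add to the domain or correct at source.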

  17. Tools Landscape

  18. Conclusion • Understanding data relationships across data sources is important in many data management disciplines • There are relatively few tools that are good at discovering such relationships – moreover, data discovery is a broad discipline and no one tool is good at all aspects of relationship discovery.
