The Information Integration Wizard (Iwiz) Project
Report on work in progress
Joachim Hammer
Presented by Muhammed Al-Muhammed
Introduction
People use the Internet to find information of interest.
* That is easy if all the information is available in the same source.
- But this is not the case nowadays!
The information of interest could be located in multiple
sources. So what can we do?
- If all sources used the same tools and data model to
create and manage their data, finding
information of interest would no longer be a problem!
But what if these sources use different tools, hardware, and software platforms to manage their data (heterogeneity at its peak)?
What problems can arise? Some could be:
- Schematic problems.
- Semantic problems.
So what can we do? The obvious
solution is to use a tool that can overcome
the heterogeneity and decentralization
of information sources. This is the reason why data integration is important.
So what benefits do users get out of data integration
tools? The greatest benefit is that the user does not have to worry about:
1. what sources are available;
2. where they are located;
3. how the data is represented in each source;
4. how each data source is queried.
Goals of the project
Help users get information from heterogeneous
sources. How do they achieve this goal?
Build an integration system using a hybrid data
“warehousing / mediator” approach.
The warehouse stores frequently accessed data.
The mediator supports on-demand queries if the data is
not available in the warehouse.
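The hybrid approach above can be sketched as a lookup that tries the warehouse first and falls back to the mediator. This is only an illustration, not the Iwiz implementation: the class `HybridIntegrator`, the function `demo_mediator`, and the policy of storing every mediator result in the warehouse are assumptions made for the example (the slides only say that the warehouse holds frequently accessed data).

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


class HybridIntegrator:
    """Answer a query from the warehouse if possible, else ask the mediator."""

    def __init__(self, mediator: Callable[[str], List[Record]]) -> None:
        # Warehouse: locally stored results, keyed by query text (assumption).
        self.warehouse: Dict[str, List[Record]] = {}
        # Mediator: hypothetical callable that queries the sources on demand.
        self.mediator = mediator
        self.mediator_calls = 0

    def query(self, q: str) -> List[Record]:
        if q in self.warehouse:
            # Data already materialized: no source access needed.
            return self.warehouse[q]
        # Not in the warehouse: go to the sources through the mediator.
        self.mediator_calls += 1
        result = self.mediator(q)
        # Simplification: store every result, not only frequent ones.
        self.warehouse[q] = result
        return result


def demo_mediator(q: str) -> List[Record]:
    # Stand-in for real source access; returns a fixed-shape record.
    return [{"query": q, "origin": "remote source"}]


integrator = HybridIntegrator(demo_mediator)
first = integrator.query("books on XML")   # served by the mediator
second = integrator.query("books on XML")  # served by the warehouse
```

A real system would need a policy for deciding which data is accessed often enough to be warehoused and for refreshing it when the sources change.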
What issues must be investigated in order to achieve these goals?
1. Common data model and representation, i.e. what data model can be used to represent the information in the integrated system? They chose XML for their system because it has some nice features, such as a clear separation of data and schema.
2. Defining a global schema to provide a
representation of relevant data tailored to the
user’s needs.
3. Semantic heterogeneities (a huge problem).
What hurdles are caused by heterogeneity?
- understanding the meaning of the source data
- relating it to the global schema
- translating values from the source to the target context
- merging related data
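The last three hurdles can be illustrated with a small sketch. Everything here is hypothetical: the two source layouts (`SOURCE_A` with a price in dollars, `SOURCE_B` with a price in cents), the global fields `title` and `price`, and the helper names are invented for the example and do not come from the Iwiz project.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping: source field -> (global field, value translator).
Mapping = Dict[str, Tuple[str, Callable[[str], object]]]

SOURCE_A: Mapping = {
    "title": ("title", str.strip),
    "price_usd": ("price", float),                  # already in dollars
}
SOURCE_B: Mapping = {
    "name": ("title", str.strip),                   # different field name
    "price_cents": ("price", lambda v: int(v) / 100),  # cents -> dollars
}


def to_global(record: Dict[str, str], mapping: Mapping) -> Dict[str, object]:
    """Relate source fields to the global schema and translate their values."""
    return {
        mapping[field][0]: mapping[field][1](value)
        for field, value in record.items()
        if field in mapping
    }


def merge_by_key(records: List[Dict[str, object]], key: str) -> List[Dict[str, object]]:
    """Merge records that describe the same entity (same value for `key`)."""
    merged: Dict[object, Dict[str, object]] = {}
    for rec in records:
        merged.setdefault(rec[key], {}).update(rec)
    return list(merged.values())


a = to_global({"title": " XML Basics ", "price_usd": "10.00"}, SOURCE_A)
b = to_global({"name": "XML Basics", "price_cents": "1000"}, SOURCE_B)
combined = merge_by_key([a, b], key="title")
```

The first hurdle, understanding what the source data means, is what produces the mapping tables in the first place, and it is the part that is hardest to automate.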
Heterogeneity is faced at 3 levels:
- System level: hardware, operating system.
- Data management level: differences in the data models and access commands.
- Semantic level: differences in the way related or similar data is represented in different sources.
How can the three levels of heterogeneity be overcome?
The first two are overcome by translators and adapters.
The third one is the serious one!
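For the first two levels, a translator can be as simple as a wrapper that converts a source's native format into the common XML representation. The sketch below assumes a CSV source; the function name `csv_to_xml` and the tag names `source` and `record` are illustrative choices, not part of Iwiz.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(text: str, root_tag: str = "source", row_tag: str = "record") -> ET.Element:
    """Translate CSV text into an XML tree with one element per row."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(text)):
        rec = ET.SubElement(root, row_tag)
        for field, value in row.items():
            # Assumes the CSV header names are valid XML tag names.
            ET.SubElement(rec, field).text = value
    return root


tree = csv_to_xml("title,year\nXML Basics,1999\n")
xml_text = ET.tostring(tree, encoding="unicode")
```

Such a wrapper removes differences in format and access method only. The resulting XML still uses the source's own field names and value conventions, which is why the semantic level remains the serious one.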
The following diagram gives some idea of the kinds of heterogeneity.
* To overcome heterogeneity, mapping involves:
1- Schema restructuring: eliminate syntactic
and semantic inconsistencies between the
source schema and the global schema.
2- Schema merging: removal of duplicates,
removal of inconsistent data, ...
4- Knowledge representation: a common metadata knowledge base to reason about the meaning of, and relationships among, concepts.
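Schema restructuring and schema merging can be sketched on small XML trees. This is a simplified illustration under stated assumptions: the tag mapping `TAG_MAP`, the root tag `global`, and the rule that two records are duplicates when their serialized XML is identical are all invented for the example. Real restructuring also has to handle nesting changes and semantic conflicts, which are not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical mapping from source tag names to global schema tag names.
TAG_MAP = {"book": "publication", "name": "title", "yr": "year"}


def restructure(source: ET.Element, tag_map: Dict[str, str]) -> ET.Element:
    """Return a copy of the tree with tags renamed to the global schema."""
    target = ET.Element(tag_map.get(source.tag, source.tag), source.attrib)
    target.text = source.text
    for child in source:
        target.append(restructure(child, tag_map))
    return target


def merge(*roots: ET.Element) -> ET.Element:
    """Combine records from several restructured trees, dropping exact duplicates."""
    merged = ET.Element("global")
    seen = set()
    for root in roots:
        for rec in root:
            key = ET.tostring(rec, encoding="unicode")
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged


src_a = ET.fromstring("<lib><book><name>XML Basics</name><yr>1999</yr></book></lib>")
src_b = ET.fromstring(
    "<catalog><publication><title>XML Basics</title><year>1999</year></publication></catalog>"
)
integrated = merge(restructure(src_a, TAG_MAP), restructure(src_b, TAG_MAP))
```

Here the two sources describe the same book under different tag names. After restructuring, both records are identical, so the merge step keeps only one.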
“To deal with the issues, they proposed a system called Iwiz.”
- Iwiz architecture
Transform from the XML source schema to the XML target schema
Restructuring and Merging
The goals of this are:
1- Generate rules for converting data from its native source to the global schema.
2- Populate the global target schema with data.
How is the data restructured and merged?