TMSync
Topic map-to-topic map updates
Lars Marius Garshol, CTO, Ontopia <larsga@ontopia.net>
TMRA 2006, 2006-10-11
Agenda
• Background
  • the problem
  • why TMSync is the solution
• TMSync in detail
  • what it is
  • how it works
• Applications
  • what you can do with TMSync
• Conclusion
Background
• The problem
• Solving it with TMSync
The problem
• Topic Maps hold out a promise as a great technology for data integration
  • because of merging, global identifiers, etc.
• However, dynamic sources are poorly supported at the moment
  • that is, converting once is easy, but staying in sync is hard
• A solution that only supports static integration is near-worthless
  • in practice, integrated data is nearly always going to need updating from the source
  • building a one-time conversion is easy
  • building data integration with update support is hard
  • so, suddenly data integration with Topic Maps isn’t so easy, after all
Merging is not the solution
• Merging in Topic Maps is often thought of in terms of <mergeMap>
  • this is only useful if you are working from XTM files
  • <mergeMap> only has an effect when the XTM file is loaded
  • after that, the only way to use the <mergeMap> is to reload from scratch
  • reloading from scratch loses all changes...
• Real applications are based on databases
  • here <mergeMap> has no effect
What TMSync is
• A simple way to update part of one topic map with part of another
  • define which part of the target topic map you want,
  • define which part of the source topic map it is the master for, and
  • the algorithm does the rest
If the source is not a topic map
• Simply do a normal one-time conversion
  • let TMSync do the update for you
• In other words, TMSync reduces the update problem to a conversion problem
[Diagram labels: source.xml, convert.xslt, TMSync]
TMSync in depth
• What it is
• How it works
TMSync in mathematical terms
• A function that given
  • a target topic map,
  • a source topic map,
  • a topic selector for the target map (a function),
  • a characteristic selector for the target map (a function),
  • a topic selector for the source map (a function),
  • a characteristic selector for the source map (a function),
• produces an updated target map
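The signature described above can be written out as a sketch. The symbols are assumptions introduced here for illustration and are not from the slides: M is the set of topic maps, T the set of topics, C the set of characteristics, and S_T and S_C the types of the topic and characteristic selectors.

```latex
\[
\mathrm{sync} :
  \underbrace{M}_{\text{target}} \times
  \underbrace{M}_{\text{source}} \times
  \underbrace{S_T \times S_C}_{\text{target selectors}} \times
  \underbrace{S_T \times S_C}_{\text{source selectors}}
  \longrightarrow M
\]
\[
S_T = M \to \mathcal{P}(T), \qquad S_C = M \to \mathcal{P}(C)
\]
```

Read this way, each selector maps a topic map to the subset of its topics (or characteristics) that takes part in the update.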
Mathematical specification
• Currently based on the Q model[1]
  • mainly because this was the only model in existence when I started working
• Will translate to the TMRM
  • since this is better-known, and now has a TMDM mapping
[1] Q: A Model for Topic Maps, http://www.ontopia.net/topicmaps/materials/quads.html
The selection process
[Figure: diagram of topics with name and occurrence characteristics]
The update process
[Figure: diagram with labels name, NAME, occurrence, bar]
How to configure the algorithm
• How to specify the topics
  • use a query
  • this gives great flexibility, while keeping the algorithm simple
  • it also means that we can efficiently find the set of topics to work on
• How to specify the characteristics
  • use a query, again, or
  • use a set of types, or
  • ...
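As a rough illustration of the two configuration styles, the sketch below models a topic selector as a function standing in for a query, and a characteristic selector as a set of types. Everything here is hypothetical: the tuple-based data model, the names `instances_of` and `by_types`, and the sample data are assumptions for illustration, not the TMSync API.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical data model: a characteristic is a (kind, type, value) tuple.
Characteristic = Tuple[str, str, str]
TopicMap = Dict[str, Set[Characteristic]]  # topic id -> characteristics

TopicQuery = Callable[[TopicMap], Set[str]]
CharFilter = Callable[[Characteristic], bool]


def instances_of(type_id: str) -> TopicQuery:
    """Topic selector: a stand-in for a query such as 'all instances of X'."""
    def run(tm: TopicMap) -> Set[str]:
        return {tid for tid, chars in tm.items()
                if ("association", "instance-of", type_id) in chars}
    return run


def by_types(allowed: Set[str]) -> CharFilter:
    """Characteristic selector given as a plain set of types."""
    return lambda c: c[1] in allowed


tm: TopicMap = {
    "bergen": {("association", "instance-of", "city"),
               ("name", "name", "Bergen"),
               ("occurrence", "homepage", "http://example.org/bergen")},
    "larsga": {("association", "instance-of", "person"),
               ("name", "name", "Lars Marius Garshol")},
}

cities = instances_of("city")(tm)
names_only = by_types({"name"})
bergen_names = {c for c in tm["bergen"] if names_only(c)}
```

Keeping the selectors as plain functions means the update step itself never needs to know how the fragment was chosen.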
What the algorithm does
• For each topic in the sync’ed fragment
  • remove all sync’ed characteristics not in the source
    • except associations to non-sync’ed topics
  • add all characteristics in the source that are not in the target
  • leave the rest alone
• Remove and add topics in the same way
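The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual TMSync implementation: topic maps are modelled as dictionaries from topic id to a set of (kind, type, value) tuples, the selectors have already been evaluated into id sets and predicates, and a topic counts as "sync'ed" if either side selected it. The function name `sync` and all parameter names are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical data model: (kind, type, value); for associations the value
# is the id of the topic at the other end.
Characteristic = Tuple[str, str, str]
TopicMap = Dict[str, Set[Characteristic]]


def sync(
    target: TopicMap,
    source: TopicMap,
    target_ids: Set[str],
    source_ids: Set[str],
    target_keep: Callable[[Characteristic], bool],
    source_keep: Callable[[Characteristic], bool],
) -> TopicMap:
    """Return an updated copy of target; the inputs are not mutated."""
    result = {tid: set(chars) for tid, chars in target.items()}
    synced = target_ids | source_ids  # assumption: either side counts

    # Remove sync'ed topics that are no longer in the source fragment.
    for tid in target_ids - source_ids:
        result.pop(tid, None)

    for tid in source_ids:
        wanted = {c for c in source.get(tid, set()) if source_keep(c)}
        if tid not in result:
            result[tid] = set(wanted)  # new topic: add it
            continue
        if tid not in target_ids:
            continue  # present in target but outside the fragment
        for c in list(result[tid]):
            kind, _type, value = c
            to_outside = kind == "association" and value not in synced
            if target_keep(c) and c not in wanted and not to_outside:
                result[tid].discard(c)
        result[tid] |= wanted  # add what the source has
    return result


target = {
    "bergen": {("name", "name", "Bergen kommune"),
               ("occurrence", "homepage", "http://old.example/"),
               ("occurrence", "local-note", "edited locally"),
               ("association", "located-in", "norway")},
    "oslo": {("name", "name", "Oslo")},
    "norway": {("name", "name", "Norway")},
}
source = {
    "bergen": {("name", "name", "City of Bergen"),
               ("occurrence", "homepage", "http://new.example/")},
    "stavanger": {("name", "name", "Stavanger")},
}

updated = sync(
    target, source,
    target_ids={"bergen", "oslo"},
    source_ids={"bergen", "stavanger"},
    target_keep=lambda c: c[1] != "local-note",
    source_keep=lambda c: True,
)
```

In this example the stale name and homepage on "bergen" are replaced, the local note and the association to the non-sync'ed topic "norway" survive, "oslo" is removed, and "stavanger" is added.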
Applications
• City of Bergen
• US Publisher
The City of Bergen
[Figure: diagram with labels Norge.no, Service, Unit, Person, LivsIT, City of Bergen]
City of Bergen configuration
• On the source side
  • query to get all instances of “category” and “keyword”
  • accept all characteristics
• On the target side
  • query to get all instances of “category” and “keyword”
    • except those with mark-as-local associations
  • accept all characteristics except local search name and mark-as-local
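A configuration along these lines might look as follows in Python. This is only a sketch: the tuple data model, the identifiers such as `local-search-name` and `instance-of`, and the sample topics are assumptions made for illustration; the real configuration would presumably use topic map queries.

```python
from typing import Dict, Set, Tuple

# Hypothetical data model: a characteristic is a (kind, type, value) tuple.
Characteristic = Tuple[str, str, str]
TopicMap = Dict[str, Set[Characteristic]]

SYNCED_TYPES = {"category", "keyword"}
LOCAL_ONLY = {"local-search-name", "mark-as-local"}


def is_synced_type(chars: Set[Characteristic]) -> bool:
    return any(t == "instance-of" and v in SYNCED_TYPES for _k, t, v in chars)


def is_marked_local(chars: Set[Characteristic]) -> bool:
    return any(t == "mark-as-local" for _k, t, _v in chars)


def source_topics(tm: TopicMap) -> Set[str]:
    """Source side: all instances of category and keyword."""
    return {tid for tid, chars in tm.items() if is_synced_type(chars)}


def target_topics(tm: TopicMap) -> Set[str]:
    """Target side: the same, minus topics marked as local."""
    return {tid for tid, chars in tm.items()
            if is_synced_type(chars) and not is_marked_local(chars)}


def source_accepts(c: Characteristic) -> bool:
    return True  # accept all characteristics


def target_accepts(c: Characteristic) -> bool:
    return c[1] not in LOCAL_ONLY  # local-only types are never touched


example: TopicMap = {
    "culture": {("association", "instance-of", "category"),
                ("name", "name", "Culture")},
    "sports": {("association", "instance-of", "category"),
               ("association", "mark-as-local", "local-marker")},
    "parking": {("association", "instance-of", "keyword"),
                ("name", "local-search-name", "park")},
    "mayor": {("association", "instance-of", "person")},
}

in_source = source_topics(example)
in_target = target_topics(example)
```

The asymmetry is the point of the design: the source side is the master for everything it selects, while the target side carves out whatever has been customised locally.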
Nameless US publisher
• Use an automated process to classify documents
  • documents get reclassified now and then
  • output of process is an XTM document
• If documents did not get reclassified, import would be enough
  • as it is, they use TMSync
[Diagram labels: classified.xtm, TMSync]
Conclusion
• Related work
• Further work
Related work
• RDFSync
  • algorithm to synchronize two RDF graphs efficiently
  • no business case focus
• TM-Views
  • one possible way to define fragments for update
• TMRAP
  • uses TMSync for the update-topic request
Further work
• Reformulate algorithm to TMRM instead of Q
  • this will be done in the paper submitted to the proceedings
• Improve algorithm to handle delta sets
  • that is, to only need information about what has changed in the source since the last sync
  • this should not be very difficult
  • may do this for the final paper
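The slides do not say how delta sets would work, so the following is only one possible shape for the idea, not the planned design. It assumes the source can report which characteristics were added and removed since the last sync. The functions `diff` and `apply_delta` and the tuple data model are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical data model: a characteristic is a (kind, type, value) tuple.
Characteristic = Tuple[str, str, str]
TopicMap = Dict[str, Set[Characteristic]]


def diff(old: TopicMap, new: TopicMap) -> Tuple[TopicMap, TopicMap]:
    """Compute (added, removed) characteristics between two source snapshots."""
    added: TopicMap = {}
    removed: TopicMap = {}
    for tid in old.keys() | new.keys():
        before = old.get(tid, set())
        after = new.get(tid, set())
        if after - before:
            added[tid] = after - before
        if before - after:
            removed[tid] = before - after
    return added, removed


def apply_delta(target: TopicMap, added: TopicMap, removed: TopicMap) -> TopicMap:
    """Apply a delta to a copy of target; untouched characteristics stay."""
    result = {tid: set(chars) for tid, chars in target.items()}
    for tid, chars in removed.items():
        if tid in result:
            result[tid] -= chars
    for tid, chars in added.items():
        result.setdefault(tid, set()).update(chars)
    return result


old_source = {"bergen": {("name", "name", "Bergen"),
                         ("occurrence", "homepage", "http://old.example/")}}
new_source = {"bergen": {("name", "name", "Bergen"),
                         ("occurrence", "homepage", "http://new.example/")}}
target = {"bergen": {("name", "name", "Bergen"),
                     ("occurrence", "homepage", "http://old.example/"),
                     ("occurrence", "local-note", "kept")}}

added, removed = diff(old_source, new_source)
patched = apply_delta(target, added, removed)
```

Because only the changed characteristics are touched, local additions in the target are left alone without needing a characteristic selector in this simplified setting.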