Learn about Sangam, a modeling framework for transforming data between different formats while ensuring flexibility and reusability. Explore the cross algebra operators, execution strategies, and graph-based model.
Sangam: A Transformation Modeling Framework Kajal T. Claypool (U Mass Lowell) and Elke A. Rundensteiner (WPI)
The Era of Electronic Information • Age of electronic information • Data exists in many different formats • Different data models • Different schemas • Users need to • Publish data in many formats • Integrate and transform data • Query and expect results in common format • Underlying problem • Need to express mapping of data from one format to another • Need to perform transformation of data based on expressed mappings.
Schema Translation: State of the Art • Naïve approach [Zhang01, Shanmugasundram99] • Write specific programs to translate data from one format to another • Examples: • Algorithms that translate XML documents into relational data [Zhang01, Shanmugasundram99] • LaTeX2HTML: converts LaTeX documents into HTML documents
Schema Translation: State of the Art • Matching approach [milo98] • Automatically discover the semantic correspondences between two schemas • Generate translations based on the discovered matches • Modeling approach [bernstein00, atzeni96] • Transform the local schema into a common data model • Use a translation language to express mappings between schemas in the middle layer
The Sangam Framework • Goals: Flexible, extensible, and re-usable • Allow users to: • Explicitly model translations between schemas • Compose translations from an existing library of modeled translation patterns • Choose from a library of translation operators • Generate a translation model based on the schema match process • For all modeled translations: transform the data based on the translation
Overview of Sangam Framework [Figure: architecture diagram of the transformation framework. Labels: Schema S1, Schema S2, Data D1, Data D2, Matcher, Matches, Pattern Interface, Transformation Patterns (displayed to user), ToolSet, User feedback, Transformation Model, Evaluator, Transformed Schema, Transformed Data. Legend: system input, user input, system-generated output.]
Outline • Sangam graphs • Cross algebra operators • Composition techniques • Cross algebra graphs • Execution strategies • Architecture • Conclusions
Sangam Graphs • Sangam • Common data model: • Sangam graph model • Translation language • Algebra-based [Figure labels: RDB, XML, Import, Export, Sangam graph, Cross Algebra for translation]
Requirements for a Common Data Model • Graph-based • Common denominator for most data models • Expressiveness • Represent schemas from different data models • Fundamental constraints • Represent constraints such as quantifier, order and key constraints • Existing data models not completely suitable • Relational and OO models cannot represent order in a clean manner • XML (older spec) cannot represent key constraints
Sangam Graph Model • Satisfies requirements • Graph-based • Based on SIGs [Miller93] • Can model schemas from different data models • Can represent quantifier, order, key and foreign key constraints • Graph • Nodes represent entities • E.g., relation, attribute, element • Edges represent relationships between them • E.g., containment relationship between relation and attribute
Example: Sangam Graph
<!ELEMENT item (location, mailbox, name)>
<!ATTLIST item id ID #REQUIRED featured CDATA #IMPLIED>
<!ELEMENT location (#PCDATA)>
<!ELEMENT mailbox (mail*)>
<!ATTLIST mailbox id CDATA>
<!ELEMENT mail (from, to, date)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT name (firstName, lastName)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
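As a rough illustration of how the item and mailbox declarations above could be held as a graph of nodes and containment edges, here is a minimal Python sketch. The class names (Node, Edge, SangamGraph), the field names, and the way order and quantifier constraints are encoded as edge attributes are all assumptions made for this example; the slides do not specify a concrete representation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # name of an element, attribute, or relation


@dataclass(frozen=True)
class Edge:
    parent: Node
    child: Node
    order: int = 0            # assumed encoding of the order constraint
    min_occurs: int = 1       # assumed encoding of the quantifier constraint
    max_occurs: float = 1     # float("inf") stands for "unbounded"


@dataclass
class SangamGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, edge):
        # Adding an edge also registers both of its endpoints.
        self.nodes.update((edge.parent, edge.child))
        self.edges.add(edge)

    def children(self, node):
        # Children of a node, returned in their declared order.
        outgoing = [e for e in self.edges if e.parent == node]
        return [e.child for e in sorted(outgoing, key=lambda e: e.order)]


# <!ELEMENT item (location, mailbox, name)>
graph = SangamGraph()
item = Node("item")
for position, name in enumerate(["location", "mailbox", "name"]):
    graph.add_edge(Edge(item, Node(name), order=position))

# <!ELEMENT mailbox (mail*)>  -> zero or more mail children
graph.add_edge(Edge(Node("mailbox"), Node("mail"),
                    min_occurs=0, max_occurs=float("inf")))
```

Keeping the constraints on the edges rather than on the nodes is only one possible design; it lets the same node appear under different parents with different quantifiers.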
Cross Algebra • Requirements for a transformation language • Node and edge manipulations • Minimal granularity • E.g., a relation has a name and attributes • Allow composition • Unique contribution: algebra-based translation language • Translates from one Sangam graph to another • Four operators: • Represent a core set of linear graph transformations [GBook00] • Can be composed to formulate more complex operations such as a join operation (not our focus)
Cross Algebra Operators • cross, connect, smooth and subdivide
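The slide names the four operators without defining them. The sketch below assumes they follow the graph-theoretic notions their names suggest: cross carries a node over to the output graph, connect carries an edge over, smooth replaces a two-edge path by a single edge, and subdivide replaces an edge by a two-edge path through a new node. The function signatures and the dictionary-of-sets graph representation are hypothetical and chosen only for illustration.

```python
def new_graph():
    # A graph is a dict holding a set of node labels and a set of (parent, child) edges.
    return {"nodes": set(), "edges": set()}


def require(condition, message):
    if not condition:
        raise ValueError(message)


def cross(g, out, node):
    """Carry one node of the input graph g over to the output graph."""
    require(node in g["nodes"], "unknown node")
    out["nodes"].add(node)
    return out


def connect(g, out, edge):
    """Carry one edge over; both endpoints must already be in the output."""
    require(edge in g["edges"], "unknown edge")
    require(set(edge) <= out["nodes"], "endpoints not yet mapped")
    out["edges"].add(edge)
    return out


def smooth(g, out, first, second):
    """Replace the input path a -> b -> c by the single output edge a -> c."""
    (a, b), (b2, c) = first, second
    require(b == b2, "edges do not form a path")
    require(first in g["edges"] and second in g["edges"], "unknown edge")
    require({a, c} <= out["nodes"], "endpoints not yet mapped")
    out["edges"].add((a, c))
    return out


def subdivide(g, out, edge, middle):
    """Replace the input edge a -> c by the output path a -> middle -> c."""
    a, c = edge
    require(edge in g["edges"], "unknown edge")
    require({a, c} <= out["nodes"], "endpoints not yet mapped")
    out["nodes"].add(middle)
    out["edges"].update({(a, middle), (middle, c)})
    return out


# Input: item -> name -> firstName. Output: item -> firstName (name is smoothed away).
g = new_graph()
g["nodes"] |= {"item", "name", "firstName"}
g["edges"] |= {("item", "name"), ("name", "firstName")}

out = new_graph()
cross(g, out, "item")
cross(g, out, "firstName")
smooth(g, out, ("item", "name"), ("name", "firstName"))
```

In this reading, connect, smooth and subdivide all need their endpoint nodes to have been produced by cross first, which is one way to picture why operators have to be composed.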
Composition of Operators • Context Dependency • Output = union of output of all operators • Derivation • Output = output of root operator
Evaluating a Cross Algebra Graph
Function EvaluateCAT (input: Operator op, Sangam Graph G; output: Sangam Graph G')
{
  if (!op.hasChildren()) {
    G' ← op.evaluate(G, G')
    op.markDone()
    out G'                      // cached local output
    return G'
  }
  while (op.hasChildren()) {
    Operator opC ← op.getNextChild()
    if (e:<op, opC> == derivation) {
      // the parent is evaluated on the child's output
      G_local ← EvaluateCAT(opC, G, G')
      G' ← op.evaluate(G_local, G')
      op.markDone()
      out G'                    // cached local output
      return G'
    }
    elseif (e:<op, opC> == contextdependency) {
      // the parent is evaluated on the original input; outputs are unioned
      G_local ← EvaluateCAT(opC, G, G')
      G'_local ← op.evaluate(G, G')
      G' ← G_local ∪ G'_local
      op.markDone()
      out G'_local              // local cached output
      return G'
    }
  }
}
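The recursion above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: graphs are plain Python sets, each operator is a hypothetical function from an input set to an output set (the accumulated output graph G' is not passed in), and all children are visited before the parent is evaluated. With several derivation children, the last one supplies the parent's input, which is an arbitrary choice made for this sketch.

```python
DERIVATION = "derivation"
CONTEXT = "context dependency"


class Operator:
    def __init__(self, name, apply, children=()):
        self.name = name
        self.apply = apply              # function: input graph -> output graph
        self.children = list(children)  # list of (edge kind, child Operator)
        self.done = False
        self.cached = None              # local output of this operator


def evaluate_cat(op, g):
    own_input = g      # what this operator is applied to
    inherited = set()  # outputs kept from context-dependency children
    for kind, child in op.children:
        child_out = evaluate_cat(child, g)
        if kind == DERIVATION:
            own_input = child_out   # parent consumes the child's output
        elif kind == CONTEXT:
            inherited |= child_out  # child's output is unioned into the result
        else:
            raise ValueError(f"unknown edge kind: {kind}")
    op.cached = op.apply(own_input)
    op.done = True
    return inherited | op.cached


# Toy input graph: two nodes and one edge, all stored in one set.
g = {"item", "name", ("item", "name")}

cross_item = Operator("cross(item)", lambda graph: graph & {"item"})
cross_name = Operator("cross(name)", lambda graph: graph & {"name"})
connect = Operator(
    "connect(item, name)",
    lambda graph: graph & {("item", "name")},
    children=[(CONTEXT, cross_item), (CONTEXT, cross_name)],
)
result = evaluate_cat(connect, g)

# Derivation: a hypothetical operator applied to the output of cross(item).
upper = Operator(
    "upper",
    lambda graph: {str(x).upper() for x in graph},
    children=[(DERIVATION, cross_item)],
)
derived = evaluate_cat(upper, g)
```

Here result contains both nodes and the edge, because the outputs of the two context-dependency children are unioned with the output of connect, while derived contains only the output of the root operator.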
Conclusions • Sangam: Flexible, extensible, and re-usable transformation modeling framework • Key contributions: • Concept of Sangam • Cross Algebra • An algebra for modeling linear transformations • Composition techniques to deal with finer granularity • Evaluation techniques • Future work • Modeling the data model layer • Optimization of evaluation strategies • Non-linear transformations