
Thomas Triebsees University of the German Federal Armed Forces Munich

This presentation addresses automated document migration that preserves the semantics of queries embedded in documents. It presents a framework for specifying and tracing semantic properties, and an automata-based method for evaluating and constructing embedded queries automatically. Results and conclusions are also discussed.


Presentation Transcript


  1. Towards Automatic Document Migration: Semantic Preservation of Embedded Queries. Thomas Triebsees, University of the German Federal Armed Forces Munich, Department of Computer Science, Thomas.Triebsees@unibw.de. Winnipeg, 31st August 2007. Thomas Triebsees, Department of Computer Science

  2. Agenda • Research Context and Motivation • Our Approach • Property Specification and Tracing • Automated Query Evaluation and Construction • Results • Conclusions

  3. Research Context and Motivation

  4. Research Context. Task: semantic preservation • high degree of process reliability necessary (trustworthiness) • number of documents requires automation • document representations (formats) change • still: most QA is done by hand

  5. Example Property – Link Consistency. WWW harvest; aim: improve portability. Source document (137.193.60.82): <html> <head> <title>Calculation</title> </head> <body> <a href="137.193.60.82/calc05/calc.pdf/"> documents </a> </body> </html> Stored document (137.193.60.99): <html> <head> <title>Calculation</title> </head> <body> <a href="./calc05/calc.pdf/"> documents </a> </body> </html> [Figure: directory trees of the website Calculation on both servers; nodes: source, style.css, calc05, start.html, calc.pdf]

  6. Example Property – Link Consistency. WWW harvest. Source document (137.193.60.82): <html> <head> <title>Calculation</title> </head> <body> <a href="137.193.60.82/calc05/calc.pdf/"> documents </a> </body> </html> Stored document (137.193.60.99): <html> <head> <title>Calculation</title> </head> <body> <a href="./resources/calc05/calc.pdf/"> documents </a> </body> </html> [Figure: source tree (Website Calculation; nodes: source, style.css, calc05, start.html, calc.pdf) and stored tree (Calculation; nodes: index.html, html, resources, style.css, calc05, calc.pdf)]

  7. Semantic Queries. Queries are embedded in documents. Formalize semantic preservation: evaluation, construction? Stored document (137.193.60.99): <html> <head> <title>Calculation</title> </head> <body> <a href="./resources/calc05/calc.pdf/"> documents </a> </body> </html> [Figure: stored tree Calculation; nodes: html, resources, index.html, style.css, calc05, calc.pdf] Examples: • URLs query server/directory structure • style sheets (CSS) query XML/HTML documents • XPath expressions query XML documents • …
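To illustrate the idea that a URL acts as a query over a directory structure, here is a minimal Python sketch that evaluates a relative link against a nested dictionary. The tree layout, the dictionary encoding, and the function name `evaluate` are assumptions made for illustration; the slides do not specify them.

```python
from typing import List, Optional

# Assumed layout of the stored website: a dict is a directory, None is a file.
TREE = {
    "Calculation": {
        "index.html": None,
        "resources": {
            "style.css": None,
            "calc05": {"calc.pdf": None},
        },
    }
}

def evaluate(tree: dict, base_dir: List[str], href: str) -> Optional[List[str]]:
    """Resolve a relative href against base_dir.

    Returns the path of the object the link points to, or None if the
    link does not denote an existing object (a broken link).
    """
    path = list(base_dir)
    for segment in href.strip("/").split("/"):
        if segment in ("", "."):
            continue                      # stay in the current directory
        if segment == "..":
            if not path:
                return None               # would leave the tree
            path.pop()
        else:
            path.append(segment)
    node = tree
    for name in path:                     # walk the tree along the path
        if not isinstance(node, dict) or name not in node:
            return None
        node = node[name]
    return path

target = evaluate(TREE, ["Calculation"], "./resources/calc05/calc.pdf/")
broken = evaluate(TREE, ["Calculation"], "./calc05/calc.pdf/")
```

In this sketch `target` is the path to calc.pdf, while `broken` is `None` because the old link no longer matches the restructured tree.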

  8. Our Approach – Semantic Evaluation and Construction of Embedded Queries

  9. Our Approach. The framework traces relevant object histories and verifies preservation requirements w.r.t. source and target objects. (1) Property specifications: What are the relevant properties? What are the different representation forms? (2) Preservation requirements: What is to be preserved? (3) Implement transformation: notify the system on transformation steps. (4) Automated verification. [Figure: framework with tracing, property specifications, and preservation requirements; property matching against source documents and target documents; notification from the migration process]

  10. (1) Property Specification. Concept + Interface • define role names for property • assign roles in different implementations. Concept LinksTo with roles link_source, link_anchor, link_target; contexts LinkAbs and LinkRel. LinkAbs (137.193.60.82): <html> <head> <title>Calculation</title> </head> <body> <a href="137.193.60.82/calc05/calc.pdf/"> documents </a> </body> </html> LinkRel (137.193.60.99): <html> <head> <title>Calculation</title> </head> <body> <a href="./resources/calc05/calc.pdf/"> documents </a> </body> </html> [Figure: source tree (Website Calculation; nodes: source, style.css, calc05, start.html, calc.pdf) and stored tree (Calculation; nodes: html, resources, index.html, style.css, calc05, calc.pdf)]

  11. (2) Expressing Preservation Requirements. Requirement: When transforming a website, translate all absolute links to relative links while preserving link consistency. Expressed semi-formally using concepts and contexts: When transforming a link source, a link anchor, and a link target to a new representation, preserve the concept LinksTo for these objects in the context LinkRel. Expressed formally: presK( {s → link_source, a → link_anchor, t → link_target}, LinksTo(s, a, t), {LinkAbs,LinkRel}, {LinkRel})
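One possible reading of this requirement can be sketched in Python: if LinksTo holds for a (source, anchor, target) triple in one of the source contexts, it must hold for the migrated triple in one of the target contexts. This is an interpretation, not the framework's definition of presK. The predicates `link_abs` and `link_rel`, the function `preserved`, the `trace` mapping, and the file names in the example are all hypothetical.

```python
import posixpath
from typing import Callable, Dict, Iterable, Tuple

Triple = Tuple[str, str, str]             # (link_source, link_anchor, link_target)
Context = Callable[[str, str, str], bool]

HOST = "137.193.60.82"                    # source server address from the slides

def link_abs(source: str, anchor: str, target: str) -> bool:
    """Assumed LinkAbs context: the anchor is the host followed by the target path."""
    return anchor.rstrip("/") == HOST + "/" + target

def link_rel(source: str, anchor: str, target: str) -> bool:
    """Assumed LinkRel context: the anchor, resolved relative to the
    directory of the source document, yields the target path."""
    if not anchor.startswith("."):
        return False
    base = posixpath.dirname(source)
    resolved = posixpath.normpath(posixpath.join(base, anchor.rstrip("/")))
    return resolved == target

def preserved(triple: Triple,
              trace: Dict[str, str],
              source_contexts: Iterable[Context],
              target_contexts: Iterable[Context]) -> bool:
    """Check the requirement for one triple, using the traced object histories."""
    if not any(ctx(*triple) for ctx in source_contexts):
        return True                       # LinksTo did not hold; nothing to preserve
    migrated = tuple(trace[obj] for obj in triple)
    return any(ctx(*migrated) for ctx in target_contexts)

# Hypothetical object histories recorded during migration (old object -> new object).
before = ("start.html", "137.193.60.82/calc05/calc.pdf/", "calc05/calc.pdf")
trace = {
    "start.html": "index.html",
    "137.193.60.82/calc05/calc.pdf/": "./resources/calc05/calc.pdf/",
    "calc05/calc.pdf": "resources/calc05/calc.pdf",
}
ok = preserved(before, trace, [link_abs, link_rel], [link_rel])
```

Passing `[link_abs, link_rel]` as source contexts and `[link_rel]` as the only target context mirrors the last two arguments of the formal expression.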

  12. (3) Tracing Semantic Properties - Preservation. presK( {s → link_source, a → link_anchor, t → link_target}, LinksTo(s, a, t), {LinkAbs,LinkRel}, {LinkRel}). Concept LinksTo with roles link_source, link_anchor, link_target; contexts LinkAbs and LinkRel. LinkAbs (137.193.60.82): <html> <head> <title>Calculation</title> </head> <body> <a href="137.193.60.82/calc05/calc.pdf/"> documents </a> </body> </html> LinkRel (137.193.60.99): <html> <head> <title>Calculation</title> </head> <body> <a href="./resources/calc05/calc.pdf/"> documents </a> </body> </html> [Figure: source tree (Website Calculation; nodes: source, style.css, calc05, start.html, calc.pdf) and stored tree (Calculation; nodes: html, resources, index.html, style.css, calc05, calc.pdf)]

  13. Preservation of Embedded Queries. Integrating embedded queries. Targets: semantic preservation of link consistency • links can be evaluated semantically • only valid URLs are accepted as links • links can be constructed automatically • only valid URLs are constructed • constructions allow for formal proofs w.r.t. preservation requirement. Steps: (1) Formalize queried structure for link evaluation and construction (2) Formalize syntactically valid URLs (3) Combine both. Can be generalized to other applications. Tools: • Automata Theory (Finite State Automata, FSA) • Graph Theory

  14. Specification of Queried Structure (1) Formalize queried structure • vertices (objects) yield query semantics • labels carry URL substrings • generate finite state automaton
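The three bullets above can be sketched in Python as follows: the vertices of a directory tree become automaton states, and each edge is labeled with the URL substring (here, a path segment) that selects the child. The tree layout, the state naming scheme, and the names `build_structure_automaton` and `run` are assumptions for illustration, not the construction used in the presentation.

```python
from typing import Dict, List, Optional, Tuple

Transitions = Dict[Tuple[str, str], str]   # (state, label) -> next state

def build_structure_automaton(tree: dict, root: str) -> Transitions:
    """Turn a directory tree into a transition table.

    States are object paths (the vertices); each transition is labeled
    with the name of the child it leads to (a URL substring).
    """
    delta: Transitions = {}
    stack = [(root, tree)]
    while stack:
        state, children = stack.pop()
        for name, subtree in children.items():
            child = state + "/" + name
            delta[(state, name)] = child
            if isinstance(subtree, dict):  # directories have children of their own
                stack.append((child, subtree))
    return delta

def run(delta: Transitions, start: str, labels: List[str]) -> Optional[str]:
    """Follow the labels from start; the reached state is the query result."""
    state = start
    for label in labels:
        if (state, label) not in delta:
            return None                    # no such object: the query fails
        state = delta[(state, label)]
    return state

# Assumed layout of the stored website (dict = directory, None = file).
tree = {"index.html": None,
        "resources": {"style.css": None, "calc05": {"calc.pdf": None}}}
delta = build_structure_automaton(tree, "Calculation")
hit = run(delta, "Calculation", ["resources", "calc05", "calc.pdf"])
miss = run(delta, "Calculation", ["calc05", "calc.pdf"])
```

The state reached at the end of a run is the object the query denotes, which is one way to read "vertices yield query semantics".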

  15. Specification of Queried Structure

  16. Specification of Syntactically Valid URLs (2) Formalize syntactically valid URLs • reduce URI-reference grammar • construct query automaton [Figure: grammar for URI-references]

  17. Specification of Syntactically Valid URLs [Figure: construction of the query automaton]

  18. Combine both – Full link automaton (3) Combine both • basically: let both automata run in parallel • match non-terminal transitions of the URL automaton with appropriate transitions of the structure automaton
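A minimal Python sketch of this combination, under strong simplifying assumptions: the URL automaton below is a toy stand-in for the reduced URI-reference grammar and accepts only relative references of the form "./segment/segment/…", with "." and "/" as terminals and SEGMENT as the single non-terminal. The tables `SYNTAX` and `STRUCTURE` and the functions `evaluate_link` and `construct_link` are hypothetical names, not taken from the presentation.

```python
import re
from typing import Dict, Optional, Tuple

# Toy URL automaton: terminals "." and "/", non-terminal SEGMENT.
SYNTAX: Dict[Tuple[str, str], str] = {
    ("START", "."): "DOT",
    ("DOT", "/"): "SLASH",
    ("SLASH", "SEGMENT"): "SEG",
    ("SEG", "/"): "SLASH",
}
ACCEPTING = {"SEG", "SLASH"}

# Structure automaton for an assumed stored layout: (object, label) -> object.
STRUCTURE: Dict[Tuple[str, str], str] = {
    ("Calculation", "resources"): "Calculation/resources",
    ("Calculation/resources", "calc05"): "Calculation/resources/calc05",
    ("Calculation/resources/calc05", "calc.pdf"):
        "Calculation/resources/calc05/calc.pdf",
}

def evaluate_link(structure, base: str, href: str) -> Optional[str]:
    """Run both automata in parallel; return the linked object or None."""
    syn, obj = "START", base
    for token in re.findall(r"/|[^/]+", href):
        if (syn, token) in SYNTAX:
            syn = SYNTAX[(syn, token)]            # terminal: only the URL automaton moves
        elif (syn, "SEGMENT") in SYNTAX and (obj, token) in structure:
            syn = SYNTAX[(syn, "SEGMENT")]        # non-terminal matched with a
            obj = structure[(obj, token)]         # structure transition
        else:
            return None                           # invalid syntax or missing object
    return obj if syn in ACCEPTING else None

def construct_link(structure, base: str, target: str) -> Optional[str]:
    """Breadth-first search for a label path from base to target."""
    frontier, seen = [(base, [])], {base}
    while frontier:
        obj, labels = frontier.pop(0)
        if obj == target:
            return "./" + "/".join(labels)
        for (src, label), dst in structure.items():
            if src == obj and dst not in seen:
                seen.add(dst)
                frontier.append((dst, labels + [label]))
    return None

pdf = "Calculation/resources/calc05/calc.pdf"
link = construct_link(STRUCTURE, "Calculation", pdf)
```

Because `construct_link` only emits strings that follow existing structure transitions in the "./segment/…" shape, every link it returns is accepted by `evaluate_link` and evaluates back to the requested target. This round trip is the kind of property that a provable-correctness argument would establish for the real automata.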

  19. Integration and Benefit. Concept LinksTo with roles link_source, link_anchor, link_target; contexts LinkAbs and LinkRel. LinkRel (137.193.60.99): <html> <head> <title>Calculation</title> </head> <body> <a href="./resources/calc05/calc.pdf/"> documents </a> </body> </html> LinkAbs (137.193.60.82): <html> <head> <title>Calculation</title> </head> <body> <a href="137.193.60.82/calc05/calc.pdf/"> documents </a> </body> </html> Evaluation and construction: working, provably correct. [Figure: source tree (Website Calculation; nodes: source, style.css, calc05, start.html, calc.pdf) and stored tree (Calculation; nodes: html, resources, index.html, style.css, calc05, calc.pdf), annotated with evaluation and construction]

  20. Results

  21.

  22. Conclusions and Outlook

  23. Automated evaluation and construction of embedded queries • Based on formal, automata-theoretic constructions -> provable correctness • Integration into framework for semantic preservation • Future work: computing structures on demand; regular expressions as queries; extensions such as CSS or XPath predicates

  24. Subject to your questions…? Thomas Triebsees, Universität der Bundeswehr München, Department of Computer Science, www.unibw.de/Thomas.Triebsees, Thomas.Triebsees@unibw.de
