
Syntax-directed Transformations of XML Streams




  1. Syntax-directed Transformations of XML Streams. Stefanie Scherzinger, joint work with Alfons Kemper

  2. XML Stream Processing
     <bib>
       <book>
         <year>1999</year>
         <title>Data on the Web</title>
         <author>Serge Abiteboul</author>
         <author>Peter Buneman</author>
         <author>Dan Suciu</author>
       </book>
       ...
     1. Very long XML documents.
     2. Applications need to be completely main-memory based.
     3. Schema information is available:
     <!ELEMENT bib (book)*>
     <!ELEMENT book (year,title,author,author*)>
     <!ELEMENT year (#PCDATA)>
     <!ELEMENT title (#PCDATA)>
     <!ELEMENT author (#PCDATA)>

  3. XML Query Languages
     XPath:
       //book[year=2003]/title
     XSLT:
       <?xml version="1.0" encoding="ISO-8859-1"?>
       <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
         <xsl:template match="/">
           <books>
             <xsl:for-each select="bib/book">
               <book>
                 <xsl:copy-of select="title"/>
                 <xsl:copy-of select="author"/>
               </book>
             </xsl:for-each>
           </books>
         </xsl:template>
       </xsl:stylesheet>
     XQuery:
       <books> {
         for $x in input()//book
         where $x/year=2003
         return <book> {$x/title} <authors> {$x/author} </authors> </book>
       } </books>
     Schema knowledge is necessary to specify the query!

  4. TransformX Attribute Grammars
     • (Suitable) extended regular tree grammar, e.g. a DTD
     • Add attribution functions (Java code)
     • Parser generator produces Java code that
        • validates the input
        • evaluates the attribution functions
     • Compile and execute

  5. Extended Regular Tree Grammars
     Grammar G = (Nt, T, P, bib)
     Nonterminals Nt = {bib, pub, year, title, author}
     Terminals T = {bib, book, article, year, title, author, PCDATA}
     Productions P:
       bib ::= bib(pub*)
       pub ::= book(year.title.author.author*)
       pub ::= article(year.title.author.author*)
       year ::= year(PCDATA)
       title ::= title(PCDATA)
       author ::= author(PCDATA)
     [Figure: a tree in L(G)]
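A minimal Java sketch of what membership in L(G) means for the grammar above; it is an illustration, not the code TransformX generates. Each node is assigned a nonterminal bottom-up, and the sequence of its children's nonterminals is matched against the content model r of the production n ::= t(r). The single-character codes, the Node record and the use of java.util.regex for content models are assumptions made for the sketch. It also assumes, as holds for this particular grammar, that every terminal label occurs in exactly one production, so the label alone selects the production.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** A document tree node; character content is a leaf labelled "#PCDATA". */
record Node(String label, List<Node> children) {
    static Node el(String label, Node... kids) { return new Node(label, List.of(kids)); }
    static Node text() { return new Node("#PCDATA", List.of()); }
}

final class ErtgValidator {
    /** Production n ::= t(r): nonterminal code n and content model r over child codes. */
    private record Production(char nonterminal, Pattern content) {}

    // Hypothetical codes: B=bib, P=pub, Y=year, T=title, A=author, S=PCDATA.
    // "." (concatenation) in the grammar becomes plain juxtaposition in the regex.
    private static final Map<String, Production> BY_LABEL = Map.of(
            "bib",     new Production('B', Pattern.compile("P*")),
            "book",    new Production('P', Pattern.compile("YTAA*")),
            "article", new Production('P', Pattern.compile("YTAA*")),
            "year",    new Production('Y', Pattern.compile("S")),
            "title",   new Production('T', Pattern.compile("S")),
            "author",  new Production('A', Pattern.compile("S")));

    /** Returns the nonterminal code derived for this subtree, or 0 if none applies. */
    static char derive(Node n) {
        if (n.label().equals("#PCDATA")) return 'S';
        Production p = BY_LABEL.get(n.label());
        if (p == null) return 0;
        StringBuilder childCodes = new StringBuilder();
        for (Node c : n.children()) {
            char code = derive(c);
            if (code == 0) return 0;
            childCodes.append(code);
        }
        return p.content().matcher(childCodes).matches() ? p.nonterminal() : 0;
    }

    /** A tree is in L(G) iff its root derives the start symbol bib. */
    static boolean inLanguage(Node root) { return derive(root) == 'B'; }
}
```

Because each content model is a plain regular expression over child nonterminals, validating one node only needs the codes of its children, never their subtrees.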

  6. Example: Task
     Input:
       <bib>
         <book>
           <year>1999</year>
           <title>Data on the Web</title>
           <author>Serge Abiteboul</author>
           <author>Peter Buneman</author>
           <author>Dan Suciu</author>
         </book>
         ...
     Output:
       <books>
         <book>
           <id>1</id>
           <title>Data on the Web</title>
           <year>1999</year>
           <author>Serge Abiteboul</author>
           <author>Peter Buneman</author>
           <author>Dan Suciu</author>
         </book>
         ...
     • Re-label the root to "books"
     • Retrieve all books, but not articles
     • For each book, output
        • a numerical identifier
        • title, year, and authors
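For comparison, here is a hand-written sketch of the same task using the standard StAX API (javax.xml.stream), i.e. without a grammar or a parser generator. It is an illustration under assumptions, not TransformX output: the class and method names are hypothetical, and the code trusts the input to conform to the DTD instead of validating it. Since the output lists the title before the year while the input has the year first, the year of the current book must be buffered until its title arrives.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical hand-written version of the example task (no validation). */
final class BooksTask {
    static String transform(String xml) throws XMLStreamException {
        XMLStreamReader in = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        StringWriter result = new StringWriter();
        XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(result);
        int id = 0;
        String year = null; // buffered: year precedes title in the input but follows it in the output
        while (in.hasNext()) {
            int event = in.next();
            if (event == XMLStreamConstants.END_ELEMENT) {
                String name = in.getLocalName();
                if (name.equals("book") || name.equals("bib")) out.writeEndElement();
            } else if (event == XMLStreamConstants.START_ELEMENT) {
                switch (in.getLocalName()) {
                    case "bib" -> out.writeStartElement("books");          // re-label the root
                    case "book" -> {
                        out.writeStartElement("book");
                        leaf(out, "id", Integer.toString(++id));          // numerical identifier
                    }
                    case "article" -> skipSubtree(in);                    // books only
                    case "year" -> year = in.getElementText();
                    case "title" -> {
                        leaf(out, "title", in.getElementText());
                        leaf(out, "year", year);
                    }
                    case "author" -> leaf(out, "author", in.getElementText());
                    default -> { }                                        // ignore anything else
                }
            }
        }
        out.flush();
        out.close();
        in.close();
        return result.toString();
    }

    /** Advances the reader to the END_ELEMENT matching the current START_ELEMENT. */
    private static void skipSubtree(XMLStreamReader in) throws XMLStreamException {
        for (int depth = 1; depth > 0; ) {
            int event = in.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    private static void leaf(XMLStreamWriter out, String name, String text) throws XMLStreamException {
        out.writeStartElement(name);
        out.writeCharacters(text);
        out.writeEndElement();
    }
}
```

The sketch collects its output in a string so it is easy to test; a streaming deployment would write to an output stream instead. Everything else it keeps in memory is the current book's year.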

  7. Example: TransformX Attribute Grammar

  8. Example: TransformX Attribute Grammar
     [Figure: the grammar, annotated with its definition section, rules section, attribution functions, and class-member section]

  9. The grammar provides
     • context information
     • potential for optimization
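One way such context information could be exploited, sketched in Java with StAX; this is a hypothetical example, not the optimization described in the talk. The DTD fixes the content model of book as year.title.author.author*, so a one-pass evaluator for //book[year=2003]/title already knows whether the predicate holds when a title arrives, and no candidate title ever has to be buffered. The year comparison uses trimmed strings, a simplification of XPath's numeric comparison.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class YearFilter {
    /** One-pass evaluation of //book[year=wantedYear]/title, relying on year preceding title. */
    static List<String> titles(String xml, String wantedYear) throws XMLStreamException {
        XMLStreamReader in = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        boolean inBook = false; // currently inside a book (articles are not selected)
        boolean match = false;  // predicate [year=wantedYear], decided before the title is read
        while (in.hasNext()) {
            int event = in.next();
            if (event == XMLStreamConstants.END_ELEMENT && in.getLocalName().equals("book")) {
                inBook = false;
            } else if (event == XMLStreamConstants.START_ELEMENT) {
                switch (in.getLocalName()) {
                    case "book" -> { inBook = true; match = false; }
                    case "year" -> { if (inBook) match = in.getElementText().trim().equals(wantedYear); }
                    case "title" -> { if (inBook && match) result.add(in.getElementText()); }
                    default -> { }
                }
            }
        }
        in.close();
        return result;
    }
}
```

Without the schema guarantee, a title could precede its year, and every title would have to be held back until the end of its book.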

  10. Extended Regular Tree Grammars
     Grammar G = (Nt, T, P, bib)
     Nonterminals Nt = {bib, pub, year, title, author}
     Terminals T = {bib, book, article, year, title, author, PCDATA}
     Productions P:
       bib ::= bib(pub*)
       pub ::= book(year.title.author.author*)
       pub ::= article(year.title.author.author*)
       year ::= year(PCDATA)
       title ::= title(PCDATA)
       author ::= author(PCDATA)
     [Figure: a tree in L(G)]
     Abbreviation: (pub*) = (book | article)*

  11. TDLL(1) Grammars
     An ERTG in which, for every right-hand side t(α), the regular expression α is one-unambiguous:
       a*.a ✗    a.a* ✓    a.b* | a.c* ✗    a.(b* | c*) ✓
     ⇒ deterministic parsing with one token of lookahead: the parse tree can be constructed unambiguously with a lookahead of one token
     ⇒ DTDs are a dialect of TDLL(1) grammars (Lee, Mani, Murata, 2000)
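The slide's examples can be checked mechanically. Brüggemann-Klein and Wood characterize one-unambiguous regular expressions as those whose Glushkov automaton is deterministic: no two distinct positions carrying the same symbol may occur together in the first set, or together in the follow set of any position. Below is a minimal Java sketch of that test, assuming single-character symbols and a hand-built syntax tree; the Re type is an illustrative assumption, and there is no parser for the textual notation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Regular expressions over single-character symbols: a, r.s, r | s, r*. */
interface Re {
    record Sym(char c) implements Re {}
    record Cat(Re l, Re r) implements Re {}
    record Alt(Re l, Re r) implements Re {}
    record Star(Re r) implements Re {}
}

final class OneUnambiguity {
    private record Info(boolean nullable, Set<Integer> first, Set<Integer> last) {}

    private final List<Character> symbolAt = new ArrayList<>();  // position -> symbol
    private final List<Set<Integer>> follow = new ArrayList<>();   // position -> follow set

    /** Numbers symbol occurrences left to right and computes the Glushkov sets. */
    private Info analyze(Re e) {
        if (e instanceof Re.Sym s) {
            int p = symbolAt.size();
            symbolAt.add(s.c());
            follow.add(new HashSet<>());
            return new Info(false, Set.of(p), Set.of(p));
        }
        if (e instanceof Re.Alt a) {
            Info l = analyze(a.l()), r = analyze(a.r());
            return new Info(l.nullable() || r.nullable(), union(l.first(), r.first()), union(l.last(), r.last()));
        }
        if (e instanceof Re.Cat c) {
            Info l = analyze(c.l()), r = analyze(c.r());
            for (int p : l.last()) follow.get(p).addAll(r.first());
            return new Info(l.nullable() && r.nullable(),
                    l.nullable() ? union(l.first(), r.first()) : l.first(),
                    r.nullable() ? union(l.last(), r.last()) : r.last());
        }
        Info body = analyze(((Re.Star) e).r());
        for (int p : body.last()) follow.get(p).addAll(body.first());  // loop back
        return new Info(true, body.first(), body.last());
    }

    private static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> u = new HashSet<>(a);
        u.addAll(b);
        return u;
    }

    /** True iff no two distinct positions in the set carry the same symbol. */
    private boolean distinctSymbols(Set<Integer> positions) {
        Set<Character> seen = new HashSet<>();
        for (int p : positions) if (!seen.add(symbolAt.get(p))) return false;
        return true;
    }

    /** One-unambiguous iff the Glushkov automaton of e is deterministic. */
    static boolean isOneUnambiguous(Re e) {
        OneUnambiguity g = new OneUnambiguity();
        Info root = g.analyze(e);
        if (!g.distinctSymbols(root.first())) return false;
        for (Set<Integer> f : g.follow) if (!g.distinctSymbols(f)) return false;
        return true;
    }
}
```

Note that a*.a and a.a* denote the same language; one-unambiguity is a property of the expression, which is why the content model has to be written in the second form.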

  12. Strong One-Unambiguity: strongly one-unambiguous (Koch, Scherzinger, 2003)

  13. Syntax in the Abstract
     Attributed TDLL(1) grammar, i.e., each production
     • is of one of the four forms:
         n ::= t(α)
         n ::= {f$[} t(α)
         n ::= t(α) {f$]}
         n ::= {f$[} t(α) {f$]}
     • if α is an attributed regular expression, then the regular expression obtained from α by removing the attribution functions must be strongly one-unambiguous
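To make the production forms concrete, here is a hand-written recursive-descent parser for the bib grammar in Java. It is a sketch under assumptions, not code produced by the TransformX generator. The input is a pre-tokenised list ("<bib>", "</bib>", "#text" for PCDATA, and so on), and the hypothetical before/after callbacks stand in for attribution functions at {f$[} and {f$]} positions, read here as functions run just before and just after the element. A single token of lookahead decides both whether pub* continues and which pub production (book or article) applies.

```java
import java.util.List;
import java.util.function.Consumer;

final class BibParser {
    private final List<String> tokens;
    private final Consumer<String> before; // stands in for functions at {f$[} positions
    private final Consumer<String> after;  // stands in for functions at {f$]} positions
    private int pos = 0;

    BibParser(List<String> tokens, Consumer<String> before, Consumer<String> after) {
        this.tokens = tokens;
        this.before = before;
        this.after = after;
    }

    /** The single token of lookahead. */
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private void expect(String token) {
        if (!peek().equals(token)) {
            throw new IllegalStateException("expected " + token + " but found " + peek());
        }
        pos++;
    }

    /** n ::= {f$[} t(content) {f$]} */
    private void element(String t, Runnable content) {
        before.accept(t);
        expect("<" + t + ">");
        content.run();
        expect("</" + t + ">");
        after.accept(t);
    }

    /** year, title, author ::= t(PCDATA) */
    private void leaf(String t) { element(t, () -> expect("#text")); }

    /** bib ::= bib(pub*); the lookahead decides whether another pub follows. */
    void parseBib() {
        element("bib", () -> {
            while (peek().equals("<book>") || peek().equals("<article>")) pub();
        });
        if (pos != tokens.size()) throw new IllegalStateException("trailing input after </bib>");
    }

    /** pub ::= book(...) | article(...); the lookahead selects the production. */
    private void pub() {
        String t = peek().equals("<book>") ? "book" : "article";
        element(t, () -> {
            leaf("year");
            leaf("title");
            leaf("author");
            while (peek().equals("<author>")) leaf("author"); // author*
        });
    }
}
```

The parser never backtracks: every decision is made from the next token alone, which is what the one-unambiguity requirement on the content models guarantees.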

  14. Example

  15. Parse Tree

  16. Attributed Parse Tree

  17. Attributed Parse Tree [Figure: parse tree over the nodes bib, book, year, title, author]

  18. Attributed Parse Tree [Figure: the same parse tree, continued]

  19. L-attributed Grammars [Figure: the same parse tree, continued]

  20.–24. [Figures: the same parse tree, continued]

  25. In Practice

  26. In Practice

  27. Class Members: accessible from within attribution functions

  28. TransformX Attributes: transfer information between attribution functions

  29. The TransformX Parser Generator
     Translation to Java source code:
     • The validator module
        • validates the input
        • outputs attribution functions as encountered in the attributed extended parse tree
        • is generated in O(|G|³)
     • The evaluator module
        • evaluates the attribution functions
        • stores attributes on a stack
        • is generated in O(1)
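The evaluator's use of a stack can be sketched as follows, assuming L-attributed dependencies so that an element's attributes are complete once its end tag has been read. This Java sketch is illustrative only: the Attrs class, the author-count attribute and the start/end event methods are hypothetical, not the generated module's interface. A running book counter plays the role of a class member (slide 27), and each finished author passes information up to its parent's attributes (slide 28).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class StackEvaluator {
    /** Hypothetical attributes of one open element. */
    private static final class Attrs {
        final String name;
        int authors; // synthesized attribute: number of author children seen so far
        Attrs(String name) { this.name = name; }
    }

    private final Deque<Attrs> stack = new ArrayDeque<>();
    private int nextId = 1;                         // class member shared by all functions
    final List<String> output = new ArrayList<>();

    /** Start tag: push a fresh attribute frame for the element. */
    void start(String element) { stack.push(new Attrs(element)); }

    /** End tag: the frame is complete, so pop it and hand results to the parent. */
    void end(String element) {
        Attrs done = stack.pop();
        if (!done.name.equals(element)) throw new IllegalStateException("unexpected </" + element + ">");
        if (element.equals("author") && !stack.isEmpty()) stack.peek().authors++;
        if (element.equals("book")) output.add("book " + nextId++ + ": " + done.authors + " author(s)");
    }
}
```

The stack never holds more frames than the document's nesting depth, so the memory for attributes does not grow with the length of the stream.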

  30. Experiments
     Prototype: C++ implementation, generates Java code
     Experiments:
     • Validate the input
     • Output the input
     • Evaluate the example
     Data: books and articles, datasets of 31–122 MB
     Memory consumption: 12 MB

  31. Conclusion & Summary
     • TransformX attribute grammars
        • specify many queries conveniently, often more conveniently than SAX
        • the grammar may reveal potential for optimization
     • TransformX parser generator: little runtime overhead (validation + attributes)
     • Prototype implementation

  32. Selected Related Work
     XML and Attribute Grammars
     • M. Benedikt, C.Y. Chan, W. Fan, J. Freire, and R. Rastogi. "Capturing both Types and Constraints in Data Integration". SIGMOD'03.
     • M. Benedikt, C.Y. Chan, W. Fan, R. Rastogi, S. Zhen, and A. Zhou. "DTD-Directed Publishing with Attribute Translation Grammars". VLDB'02.
     • C. Koch and S. Scherzinger. "Attribute Grammars for Scalable Query Processing on XML Streams". DBPL'03.
     • F. Neven and J. Van den Bussche. "Expressiveness of Structured Document Query Languages Based on Attribute Grammars". JACM, Jan. 2002.
     • S. Nishimura and K. Nakano. "XML Stream Transformer Generation Through Program Composition and Dependency Analysis". Science of Computer Programming, 2005.
     One-unambiguous Regular Languages
     • Brüggemann-Klein and D. Wood. "One-Unambiguous Regular Languages". Information and Computation, 1998.
     Strong One-unambiguity
     • C. Koch and S. Scherzinger. "Attribute Grammars for Scalable Query Processing on XML Streams". DBPL'03.
     TDLL(1) Grammars
     • D. Lee, M. Mani, and M. Murata. "Reasoning about XML Schema Languages using Formal Language Theory". Technical Report RJ 10197 Log 95071, IBM Research, Nov. 2000.
     Lex&Yacc
     • J. R. Levine, T. Mason, and D. Brown. "lex&yacc". O'Reilly, 1992.

  33. Thank you
