
SAX Parsing

Presented by

Clifford Lemoine

CSC 436

Compiler Design



SAX Parsing Introduction

  • Review of XML

  • What is SAX parsing?

  • Simple Example program

  • Compiler Design Issues

    • Demonstrated by a more complex example

  • Wrap-up

  • References



Quick XML Review

  • XML – Wave of the future

    • Method of representing data

    • Differs from HTML by storing and representing data instead of displaying or formatting data

    • Tags similar to HTML tags, only they are user-defined

    • Follows a small set of basic rules

    • Stored as a plain text file (UTF-8 by default, not just ASCII), so portability is very easy



Quick XML Review

  • Syntax

    • An XML document normally begins with an XML declaration (optional in XML 1.0, but recommended)

      • <?xml version="1.0" ?>

    • An XML document may or may not have a DTD (Document Type Definition) or Schema

      • <!DOCTYPE catalog>



Quick XML Review

  • Syntax cont.

    • Every element has a start and end tag, with optional attributes

      • <catalog version="1.0"> … </catalog>

    • If an element does not contain any data (or elements) nested within, the closing tag can be merged with the start tag like so:

      • <catalog version="1.0"/>



Quick XML Review

  • Syntax cont.

    • Elements must be properly nested

    • The outermost element is called the root element

    • An XML document that follows the basic syntax rules is called well-formed

    • An XML document that is well-formed and conforms to a DTD or Schema is called valid

    • Once again, XML documents do not always require a DTD or Schema, but they must be well-formed
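The difference between well-formed and valid can be checked with a SAX parser itself. The following is a minimal sketch, assuming the JDK's built-in JAXP parser; the class name WellFormedCheck and the method isWellFormed are hypothetical and do not come from the presentation. It only tests well-formedness, because the parser factory is non-validating by default.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the class and method names are illustrative only.
class WellFormedCheck {

    // Returns true if the text is well-formed XML, false if the parser rejects it.
    static boolean isWellFormed(String xml) {
        try {
            // The factory is non-validating by default, so no DTD or Schema is consulted.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler.fatalError() rethrows well-formedness errors.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A properly nested document such as <catalog><book/></catalog> should pass, while overlapping tags such as <catalog><book></catalog></book> should be rejected.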



Quick XML Review

  • Sample XML files

    • Catalog.xml

    • authorSimple.xml

    • authorSimpleError.xml



What is SAX Parsing?

  • Simple API for XML = SAX

  • SAX is an event-based parsing method

    • We are all familiar with event-driven software, whether we know it or not

    • Pop-up windows, pull-down menus, etc.

    • If a certain “event” (or action) happens, do something

  • A SAX parser reads an XML document, firing (or calling) callback methods as it encounters events (e.g. the start and end of the document, start and end tags, and character data); attributes are delivered with the start-tag event



What is SAX Parsing?

  • Benefits of SAX parsing

    • Unlike DOM (Document Object Model), SAX does not store information in an internal tree structure

    • Because of this, SAX is able to parse huge documents (think gigabytes) without having to allocate large amounts of system resources

    • Really great if the amount of data you’re looking to store is relatively small (no waste of memory on tree)

    • If processing is built as a pipeline, you don’t have to wait for the data to be converted to an object; you can go to the next process once it clears the preceding callback method
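The memory benefit is easiest to see when only a small result is kept. The sketch below counts the occurrences of one element name and stores nothing else, so memory use does not grow with document size. ElementCounter and its count method are hypothetical names, and the example assumes the JDK's default (non-namespace-aware) JAXP parser, where the qualified name carries the element name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps one integer, no tree.
class ElementCounter extends DefaultHandler {
    private final String target;
    private int count;

    ElementCounter(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        // Called once per start tag, in document order.
        if (qname.equals(target)) {
            count++;
        }
    }

    // Parses the text in a single pass and returns the number of matching elements.
    static int count(String xml, String elementName) {
        ElementCounter handler = new ElementCounter(elementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return handler.count;
    }
}
```

For a real multi-gigabyte file, the same handler could be passed to the parse overload that takes a java.io.File or an InputStream instead of a string.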



What is SAX Parsing?

  • Downside

    • Most limitations are the programmer’s problem, not the API’s

    • SAX does not allow random access to the file; it proceeds in a single pass, firing events as it goes

    • Makes it hard to implement cross-referencing in XML (ID and IDREF) as well as complex searching routines
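Cross-referencing is still possible in a single pass, but the bookkeeping falls to the programmer. A minimal sketch, assuming hypothetical attribute names id and ref (real documents would declare ID and IDREF attributes in a DTD): the handler remembers every identifier and every reference, and resolves them only at endDocument(), because a reference may appear before the element it points to.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; "id" and "ref" are assumed attribute names.
class IdRefChecker extends DefaultHandler {
    private final Set<String> ids = new HashSet<>();
    private final List<String> refs = new ArrayList<>();
    private final List<String> dangling = new ArrayList<>();

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        String id = atts.getValue("id");    // null when the attribute is absent
        if (id != null) {
            ids.add(id);
        }
        String ref = atts.getValue("ref");
        if (ref != null) {
            refs.add(ref);                  // cannot be resolved yet: target may come later
        }
    }

    @Override
    public void endDocument() {
        // Only now has every id been seen, so references can be checked.
        for (String ref : refs) {
            if (!ids.contains(ref)) {
                dangling.add(ref);
            }
        }
    }

    // Returns the references that point at no declared id, in document order.
    static List<String> danglingRefs(String xml) {
        IdRefChecker handler = new IdRefChecker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return handler.dangling;
    }
}
```

The cost of this approach is that the sets of identifiers and references must be held in memory, which is exactly the kind of state a DOM tree would otherwise provide.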



What is SAX Parsing?

  • Callback Methods

    • The SAX API has a default handler class built in so you don’t have to re-implement the interfaces every time (org.xml.sax.helpers.DefaultHandler)

    • The five most common methods to override are:

      • startElement(String uri, String lname, String qname, Attributes atts)

      • endElement(String uri, String lname, String qname)

      • characters(char text[], int start, int length)

      • startDocument()

      • endDocument()
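The five callbacks can be seen together in a small echo handler. This is a sketch in the spirit of the Sax.java demonstration, not the presenter's actual code; the class name EchoHandler and the output format are assumptions. It extends DefaultHandler, overrides the five methods, and records an indented trace of the events it receives.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical echo handler; not the Sax.java from the presentation.
class EchoHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int depth;

    private void indent() {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    @Override
    public void startDocument() {
        out.append("startDocument\n");
    }

    @Override
    public void endDocument() {
        out.append("endDocument\n");
    }

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        indent();
        out.append("start ").append(qname);
        // Attributes arrive with the start tag, not as separate events.
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        out.append('\n');
        depth++;
    }

    @Override
    public void endElement(String uri, String lname, String qname) {
        depth--;
        indent();
        out.append("end ").append(qname).append('\n');
    }

    @Override
    public void characters(char[] text, int start, int length) {
        String s = new String(text, start, length).trim();
        if (!s.isEmpty()) {               // skip whitespace-only chunks
            indent();
            out.append("text ").append(s).append('\n');
        }
    }

    // Parses the text and returns the recorded event trace.
    static String echo(String xml) {
        EchoHandler handler = new EchoHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return handler.out.toString();
    }
}
```

Calling System.out.print(EchoHandler.echo(xml)) prints the trace to standard output.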



Simple Example Program

  • Sax.java

    • Instantiates a SAX parser and creates a default handler for the parser

    • Reads in an XML document and echoes the structure to standard output

    • Two sample XML documents:

      • authorSimple.xml

      • authorSimpleError.xml

  • Demonstration here



Compiler Design Issues

  • What is actually happening when a SAX parser parses an XML document?

  • What type of internal data structures does it use?

  • How do the callback methods fit in?

  • Can it solve problems of world peace, hunger, and death? (Or at least can it help me pass Compiler Design?)

  • Demonstrated with SaxCatalogUnmarshaller example



Compiler Design Issues

  • Heart of the Beast

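    • Underneath it all, SAX-based processing relies on a stack; the SAX API itself does not expose one, so the handler keeps a stack of the objects it is building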

    • Whenever an element is started, a new data object is pushed onto the stack

    • Later, when the element is closed, the topmost object on the stack is finished and can be popped

    • Unless it is the root element, the popped element will have been a child element of the object that now occupies the top of the stack (board)



Compiler Design Issues

  • Heart of the Beast cont.

    • This process corresponds to the shift-reduce cycle of bottom-up parsers

    • It is crucial that XML elements be well-formed and properly nested for this to work
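The push and pop cycle can be sketched with a handler that tracks only element names. This is an illustration under assumptions, not the SaxCatalogUnmarshaller code; PathTracker and its methods are hypothetical names. A start tag pushes a name (loosely, a shift) and an end tag pops it (loosely, a reduce), so the stack always spells out the path from the root to the current element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: the stack holds the names of the currently open elements.
class PathTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();
    private final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        open.addLast(qname);                          // push
        paths.add("/" + String.join("/", open));      // root-first path to this element
    }

    @Override
    public void endElement(String uri, String lname, String qname) {
        // Proper nesting guarantees the closing tag matches the top of the stack.
        open.removeLast();                            // pop
    }

    // Returns the path of every element, in document order.
    static List<String> paths(String xml) {
        PathTracker handler = new PathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return handler.paths;
    }
}
```

If the document were not properly nested, the parser would report a fatal error before the handler's stack could get out of step.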



Compiler Design Issues

  • startElement()

    • Four parameters:

      • String uri = the namespace URI (Uniform Resource Identifier)

      • String lname = the local name of the element

      • String qname = the qualified name of the element

      • Attributes atts = list of attributes for this element

    • If the current element is a complex element, an object of the appropriate type is created and pushed on to the stack

    • If the element is simple, a StringBuffer is pushed on to the stack, ready to accept character data



Compiler Design Issues

  • endElement()

    • Three parameters:

      • String uri = the namespace URI (Uniform Resource Identifier)

      • String lname = the local name of the element

      • String qname = the qualified name of the element

    • The topmost element on the stack is popped, converted to the proper type, and inserted into its parent, which now occupies the top of the stack (unless this is the root element – special handling required)
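The startElement() and endElement() behaviour described above can be sketched as a tiny unmarshaller. This is not the SaxCatalogUnmarshaller from the demonstration; the class name, the element names catalog and book, and the use of a List of Maps as the result type are all assumptions made for illustration. It uses StringBuilder where the slides mention StringBuffer, and it expects simple elements to appear only inside book.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical unmarshaller: catalog -> List, book -> Map, anything else -> text.
class CatalogUnmarshaller extends DefaultHandler {
    private final Deque<Object> stack = new ArrayDeque<>();
    private List<Map<String, String>> catalog;   // set when the root element closes

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        switch (qname) {
            case "catalog":   // complex element: push a container object
                stack.push(new ArrayList<Map<String, String>>());
                break;
            case "book":      // complex element: push a record object
                stack.push(new LinkedHashMap<String, String>());
                break;
            default:          // simple element: push a buffer for character data
                stack.push(new StringBuilder());
        }
    }

    @Override
    public void characters(char[] text, int start, int length) {
        // Text between complex elements (usually whitespace) is ignored.
        if (stack.peek() instanceof StringBuilder) {
            ((StringBuilder) stack.peek()).append(text, start, length);
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public void endElement(String uri, String lname, String qname) {
        Object finished = stack.pop();
        if (stack.isEmpty()) {                    // root element: special handling
            catalog = (List<Map<String, String>>) finished;
            return;
        }
        Object parent = stack.peek();             // parent now on top of the stack
        if (finished instanceof StringBuilder) {
            ((Map<String, String>) parent).put(qname, finished.toString().trim());
        } else {
            ((List<Map<String, String>>) parent).add((Map<String, String>) finished);
        }
    }

    // Parses the text and returns one map of field names to values per book.
    static List<Map<String, String>> unmarshal(String xml) {
        CatalogUnmarshaller handler = new CatalogUnmarshaller();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return handler.catalog;
    }
}
```

A production unmarshaller would use typed classes instead of maps and would validate that each popped object fits its parent; the casts here rely on the assumed document shape.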



Compiler Design Issues

  • characters()

    • Three parameters:

      • char text[] = character array holding the current chunk of character data (a parser buffer, not necessarily the entire XML document)

      • int start = starting index of the current data in text[]

      • int length = number of characters to read from text[], beginning at start

    • When the parser encounters raw text, it passes a char array containing the actual data, the starting position, and the length of data to be read from the array



Compiler Design Issues

  • characters() cont.

    • The implementation of the callback method inserts the data into the StringBuffer located on the top of the stack

    • Can lead to confusion because of:

      • No guarantee that a single stretch of characters results in one call to characters()

      • It stores all characters, including whitespace, encountered by the parser
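Both sources of confusion can be handled by appending every chunk to a buffer and trimming only when the element ends. The sketch below calls the callbacks by hand to simulate a parser that splits one stretch of text into two characters() calls; TextCollector and simulateSplitText are hypothetical names, and a single buffer stands in for the buffer on top of the stack.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates text per element instead of trusting one call.
class TextCollector extends DefaultHandler {
    private StringBuilder current;                 // buffer for the open simple element
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String lname, String qname, Attributes atts) {
        current = new StringBuilder();
    }

    @Override
    public void characters(char[] text, int start, int length) {
        if (current != null) {
            current.append(text, start, length);   // append, never overwrite
        }
    }

    @Override
    public void endElement(String uri, String lname, String qname) {
        if (current != null) {
            values.add(current.toString().trim()); // drop surrounding whitespace once
            current = null;
        }
    }

    // Drives the callbacks directly, as a parser might when its buffer fills mid-text.
    static List<String> simulateSplitText() {
        TextCollector handler = new TextCollector();
        char[] buf = "  Clifford Lemoine \n".toCharArray();
        handler.startElement("", "", "name", new AttributesImpl());
        handler.characters(buf, 0, 10);                // first chunk: "  Clifford"
        handler.characters(buf, 10, buf.length - 10);  // second chunk: " Lemoine \n"
        handler.endElement("", "", "name");
        return handler.values;
    }
}
```

A handler that kept only the latest chunk would end up with a fragment of the name, and one that never trimmed would keep the leading spaces and trailing newline.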



Wrap-up

  • SAX is an event-based parser, using callback methods to handle events found by the parser

  • Applications are written by extending the DefaultHandler class and overriding the event handler methods

  • A SAX handler typically uses a stack to track open elements and build objects

  • And No, SAX will not save the world…



References

Gittleman, Art. Advanced Java: Internet Applications (Second Edition). Scott Jones Publishers. El Granada, California. 2002. pp. 504-511.

Janert, Phillip K. “Simple XML Parsing with SAX and DOM.” http://www.onjava.com/pub/a/onjava/2002/06/26/xml.html Published June 26, 2002. Accessed February 10, 2003.

Wati, Anjini. “E-Catalog for a Small to Medium Enterprise.” http://ispg.csu.edu.au/subjects/itc594/reports/Tr-005.doc Accessed February 10, 2003.

