Introduction to XML Kostas Kontogiannis Evan Mamas
Outline • Introduce XML, HTML and SGML • Compare and Contrast • XML vs. HTML • XML vs. SGML • XML • Components, Applications, Industry • Thoughts on XML
What is XML? • eXtensible Markup Language • Proper subset of SGML for web use • Meta-language • Allows you to create your own markup languages • Compromise between HTML and SGML
What is HTML? • HyperText Markup Language • Language to describe information for transmission over the web • Uses tags to mark up the information • Tags are just a formatting tool • Example • <H1> Hello, World </H1> • Rendered as a top-level heading: Hello, World
Why isn’t HTML enough? • Good enough for presenting text on the web • Not accepted as an authoring or archival form • Extensibility • HTML standard changes continually • Uses tags for formatting • Structures • Has no defined or definable structural rules
What is SGML? • Standard Generalized Markup Language • International Standard (ISO 8879) for over 10 years • Language for specifying markup languages • Describes only the formal properties and inter-relations of the components of a document • Document, Entities, Elements, Attributes
Uses of SGML • Formally structured documents • Technical Manuals • Exchange documents • Product documentation • Data encoding • Interchange specification • Provides long-term storage of information that is independent of suppliers and of changes in hardware and software
SGML Example • Memo: <to>All staff <from>Martin Bryan <date>5th November <subject>Cats and Dogs <text>Please remember to keep all cats and dogs indoors tonight. • DTD (Document Type Definition): <!DOCTYPE memo [ <!ELEMENT memo O O ((to & from & date & subject?), text) > <!ELEMENT text - O (para+) > <!ELEMENT para O O (#PCDATA) > <!ELEMENT (to, from, date, subject) - O (#PCDATA) > ]>
Why isn’t SGML enough? • Specification is very long • Contains many options not needed for Web applications • Time consuming and high cost • Expensive tools • Too much for small applications • Bad reputation
XML vs. HTML • New tags and attributes definitions allowed • Document structures can be nested to any level of complexity • Structural validation is possible by describing the grammar
XML vs. SGML • XML is the minimum required subset of SGML for web use • Easier to implement and to create tools for • A new attempt at structured markup languages with a new “face”
XML Components • XML Style Language (XSL) • Cascading Style Sheets, level 2 (CSS2) • XML Document Object Model (DOM) • XML Linking Language (XLL) • XML Pointer Language (XPL) • XML Name Spaces • Synchronized Multimedia Integration Language (SMIL) • Resource Description Framework (RDF) • Mathematical Markup Language (MathML)
XML Components (cont.) • XML Style Language (XSL) • Defines a way to present the documents • Separates formatting from content • Has two steps: • Generate a result tree (associate patterns with templates) • Use XML Namespace (formatting vocabulary) to generate formatted output. • Similar to DSSSL for SGML
XML Components (cont.) • Cascading Style Sheets, level 2 (CSS2) • Defines a way to present documents • Similar to XSL (not as powerful) • Supported by most browsers <HTML> <TITLE>Bach's home page</TITLE> <STYLE type="text/css"> H1 { color: blue } </STYLE> <BODY> <H1>Bach's home page</H1> <P>Johann Sebastian Bach was a prolific composer. </BODY> </HTML>
XML Components (cont.) • XML Document Object Model (DOM) • In-memory model for representing parsed XML documents • Designed to provide common structures in XML browsers • Intended to enable interoperable XML processing across browsers • Implemented by Internet Explorer and Netscape
XML Components (cont.) • XML Linking Language (XLL) • Links by reference rather than exact location • Provides hyperlinking elements • Simple links, like HTML links • Extended links • Multi-directional links • Links with multiple destinations • Placing content inline from a linked document • Requires use of XML Pointer Language
XML Components (cont.) • XML Name Spaces • Vocabulary of all elements and attribute types • Namespace prefix (mapped to Uniform Resource Identifier) • Local Part • Allows use of names defined in other documents • Modularity and reuse of markup • Mechanisms to establish name scope
XML Components (cont.) • Synchronized Multimedia Integration Language (SMIL) • Language for describing interactive synchronized multimedia distributed on the Web • Several components (images, video, audio) can be linked together to create a presentation on the web • Resource Description Framework (RDF) • Abstract mechanism for defining simple relationships among web resources • Mathematical Markup Language (MathML) • Language to describe mathematical expressions
XML DTD • Defines the hierarchy of all user-defined elements (tags) in the XML document • Declares the attributes and behaviour of each XML element • Each XML document calls a specific DTD file to validate its elements
XML DTD • <?xml version="1.0" encoding="UTF-8"?> • <!-- DTD for a simple program beginning of element declarations--> • <!--the root tag of Language--> • <!ELEMENT Language (FileTag*,Declaration*,Function_Call*)> • <!ELEMENT FileTag (IncludeTag*,SourceTag*)> • <!ELEMENT IncludeTag (#PCDATA)*> • <!ELEMENT SourceTag (#PCDATA)*> • <!ELEMENT Declaration (Type_Name|Identifier)*> • <!ELEMENT Type_Name (#PCDATA)*> • <!ELEMENT Identifier (#PCDATA)*> • <!ELEMENT Function_Call (Return_Type*,Function_Name*,Argument*)> • <!ELEMENT Return_Type (Return_Var*)> • <!ELEMENT Return_Var (#PCDATA)> • <!ELEMENT Function_Name (#PCDATA)> • <!ELEMENT Argument (parameterName*)> • <!ELEMENT parameterName (#PCDATA)> • <!--We may want to have external calls or graphics in our document. Currently there is none, but we still have to declare them--> • <!ELEMENT External_Call EMPTY> • <!ELEMENT Graphics EMPTY> • <!--end of element declarations--> • Callout on the <!ELEMENT Language ...> line: defines which other tags may appear within the <Language> tag • Callout on the <!ELEMENT IncludeTag ...> line: defines the data type of the content within the <IncludeTag> tag
XML Document (page 1 of 2) • <?xml version="1.0"?> • <?xml:stylesheet type="text/xsl" href="studentXSL1.xsl" ?> • <!DOCTYPE Language SYSTEM "Student.dtd"> • <Language> • <FileTag> • <IncludeTag>include stdio.h:</IncludeTag> • </FileTag> • <FileTag> • <IncludeTag>include math.h</IncludeTag> • </FileTag> • <FileTag> • <SourceTag>code statement3:</SourceTag> • </FileTag> • <FileTag> • <SourceTag>code statement2:</SourceTag> • </FileTag> • <Declaration> • <Type_Name>char*</Type_Name> • <Identifier>UW</Identifier> • </Declaration> • Callout on the <?xml:stylesheet ...?> line: calls an XSL style sheet • Callout on the <!DOCTYPE ...> line: calls a DTD document
XML Document (page 2 of 2) • <Declaration> • <Type_Name>int</Type_Name> • <Identifier>numOfstudents</Identifier> • </Declaration> • <Declaration> • <Type_Name>char*</Type_Name> • <Identifier>facultyName</Identifier> • </Declaration> • <Function_Call> • <Return_Type> • <Return_Var>student_profile</Return_Var> • </Return_Type> • <Function_Name>elec_eng</Function_Name> • <Argument> • <parameterName>name</parameterName> • </Argument> • </Function_Call> • </Language>
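The content models in the DTD are, in effect, regular expressions over child element names. As a rough sketch of what structural validation involves (not a real validating parser), three declarations from the DTD can be hand-translated into regexes and checked against a trimmed copy of the document. The names `CONTENT_MODELS`, `SAMPLE`, and `check` are illustrative assumptions, and the processing instruction and DOCTYPE line are left out of the sample on purpose. Python's standard library does not validate against DTDs, so real use would call for a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from the DTD slide (an assumption of this sketch): every child
# name is followed by one space, so "(FileTag*,Declaration*,Function_Call*)"
# becomes the first regex below.
CONTENT_MODELS = {
    "Language": r"(FileTag )*(Declaration )*(Function_Call )*",
    "FileTag": r"(IncludeTag )*(SourceTag )*",
    "Declaration": r"((Type_Name|Identifier) )*",
}

# Trimmed copy of the sample document.
SAMPLE = """<Language>
  <FileTag><IncludeTag>include stdio.h:</IncludeTag></FileTag>
  <Declaration><Type_Name>int</Type_Name><Identifier>numOfstudents</Identifier></Declaration>
</Language>"""


def check(element):
    """Return the tags of elements whose children violate their content model."""
    errors = []
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            errors.append(element.tag)
    for child in element:
        errors.extend(check(child))
    return errors


valid_errors = check(ET.fromstring(SAMPLE))

# A Declaration placed before a FileTag breaks the order the DTD requires.
BAD = "<Language><Declaration/><FileTag/></Language>"
invalid_errors = check(ET.fromstring(BAD))
```

Here `valid_errors` is empty, while `invalid_errors` names `Language` because its children appear out of order.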
XML Namespaces • Latest milestone for W3C's XML technology (14-January-1999) • W3C’s definition of XML namespaces: • “XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references.” • Why use it? • Maintain tag meaningfulness and uniqueness • How does it solve the problem? • Adds context to XML tags by using a prefix and a URI
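A small sketch of the uniqueness problem and how a prefix plus URI resolves it. Two vocabularies both define a `title` element, and the parser keeps them apart by expanding each prefix to its URI. The document and both `example.org` URIs are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies both define <title>; the URIs are made up.
DOC = """<catalog xmlns:bk="http://example.org/book"
                  xmlns:ppl="http://example.org/person">
  <bk:title>Introduction to XML</bk:title>
  <ppl:title>Professor</ppl:title>
</catalog>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix to its URI, giving names of the form {uri}local.
qualified = [child.tag for child in root]

# Look elements up by prefix; the prefixes here only need to map to the same URIs.
NS = {"bk": "http://example.org/book", "ppl": "http://example.org/person"}
book_title = root.findtext("bk:title", namespaces=NS)
person_title = root.findtext("ppl:title", namespaces=NS)
```

The prefix itself carries no meaning. Only the URI it maps to identifies the vocabulary, which is why `qualified` holds two distinct names even though both local parts are `title`.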
XSL Document (Page 1 of 3) • <?xml version="1.0"?> • <DIV xmlns:xsl="http://www.w3.org/TR/WD-xsl"> • <html:html xmlns:html="http://www.w3.org/TR/REC-html40"> • <i>This page consists of XML, XSL, Namespace, HTML, and Java Applet</i> • <html:head><html:title><H1>Sample C Code (hidden XML tag)</H1></html:title></html:head> • <xsl:for-each select="Language"> • <TD STYLE="padding-left:1em"> • <DIV><xsl:value-of select="/"/></DIV> • <html:font color="red">The above command prints out all contents within tags without any formatting, ordering, linebreaks, etc.</html:font> • </TD> • </xsl:for-each> • <xsl:for-each order-by="+ IncludeTag" select="Language/FileTag"> • <TD STYLE="padding-left:1em"> • <html:BR></html:BR> • <DIV><html:BR><xsl:value-of select="IncludeTag"/></html:BR></DIV> • </TD> • </xsl:for-each> • <html:font color="red">End of IncludeTag, ascending sort on Include Tag Content</html:font> • Callout on xmlns:xsl: namespace for XSL • Callout on xmlns:html: namespace for HTML
XSL Document (Page 2 of 3) • <xsl:for-each order-by="+ SourceTag" select="Language/FileTag"> • <TD STYLE="padding-left:1em"> • <html:BR></html:BR> • <DIV><xsl:value-of page-break-after="SourceTag" select="SourceTag"/></DIV> • </TD> • </xsl:for-each> • <html:font color="red">End of SourceTag, ascending sort on SourceTag Content</html:font> • <html:BR></html:BR> • <xsl:for-each order-by="+ Type_Name" select="Language/Declaration"> • <TD STYLE="padding-left:1em"> • <html:BR></html:BR> • <DIV><html:BR><xsl:value-of select="Type_Name"/></html:BR></DIV> • <DIV><html:BR><xsl:value-of select="Identifier"/></html:BR></DIV> • </TD> • </xsl:for-each> • <html:font color="red">End of Declaration, ascending sort on Type_Name</html:font> • <DIV></DIV>
XSL Document (Page 3 of 3) • <xsl:for-each select="Language/Function_Call"> • <TD STYLE="padding-left:1em"> • <html:BR><DIV><xsl:value-of select="Return_Type"/></DIV></html:BR> • <html:font color="red">End of Return_Type</html:font> • <html:BR><DIV><xsl:value-of select="Function_Name"/></DIV></html:BR> • <html:font color="red">End of Function_Name</html:font> • <html:BR></html:BR> • <html:BR><DIV><xsl:value-of select="Argument"/></DIV></html:BR> • <html:font color="red">End of Argument</html:font> • <html:BR></html:BR> • </TD> • </xsl:for-each> • <html:BR></html:BR> • <html:APPLET code="AgentAction.class" width="400" height="200"></html:APPLET> • <html:BR></html:BR> • </html:html> • </DIV>
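The recurring pattern in this stylesheet is: select a set of nodes, optionally sort them, and emit the value of a child inside some HTML. A rough Python equivalent of the `for-each` / `order-by` / `value-of` combination is sketched here. It is not an XSL processor. `render_sorted` and the trimmed `SAMPLE` are illustrative names, and unlike the stylesheet the sketch skips elements that lack the requested child instead of emitting an empty `<DIV>`.

```python
import xml.etree.ElementTree as ET
from html import escape

# Trimmed copy of the FileTag entries from the sample document.
SAMPLE = """<Language>
  <FileTag><IncludeTag>include stdio.h:</IncludeTag></FileTag>
  <FileTag><IncludeTag>include math.h</IncludeTag></FileTag>
  <FileTag><SourceTag>code statement3:</SourceTag></FileTag>
  <FileTag><SourceTag>code statement2:</SourceTag></FileTag>
</Language>"""


def render_sorted(root, child_name):
    """Mimic <xsl:for-each order-by="+ child_name" select="Language/FileTag">."""
    file_tags = root.findall("FileTag")  # root is the <Language> element
    # Ascending sort on the text of the chosen child; a missing child sorts as "".
    file_tags.sort(key=lambda e: e.findtext(child_name, default=""))
    # Equivalent of <DIV><xsl:value-of select="child_name"/></DIV>,
    # skipping FileTag elements that have no such child (a simplification).
    return [
        "<DIV>" + escape(e.findtext(child_name)) + "</DIV>"
        for e in file_tags
        if e.findtext(child_name)
    ]


root = ET.fromstring(SAMPLE)
include_html = render_sorted(root, "IncludeTag")
source_html = render_sorted(root, "SourceTag")
```

With this input, `include_html` lists `include math.h` before `include stdio.h:`, and `source_html` lists `code statement2:` before `code statement3:`, matching the ascending sorts described in the stylesheet's red annotations.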
Applications that require XML • Information exchange between heterogeneous databases • Health care example • Distributed processing • Semiconductor industry example • Multiple views of the same data • “Intelligent” information agents
Using XML • XML for Storage • Compact syntax • Generalized and standardized • Product independent • XML for Searching • Use of content-specific markup enables robust searching • Search engines need to be XML aware • Can use current SGML search engines
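To make the searching point concrete: because the markup says what each piece of text is, a query can ask for "identifiers declared as `char*`" rather than grepping for a string. A minimal sketch over a trimmed copy of the sample declarations, where `identifiers_of_type` is an illustrative helper name:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <Declaration> entries from the sample document.
SAMPLE = """<Language>
  <Declaration><Type_Name>char*</Type_Name><Identifier>UW</Identifier></Declaration>
  <Declaration><Type_Name>int</Type_Name><Identifier>numOfstudents</Identifier></Declaration>
  <Declaration><Type_Name>char*</Type_Name><Identifier>facultyName</Identifier></Declaration>
</Language>"""


def identifiers_of_type(root, type_name):
    """Return the identifiers whose declaration carries the given Type_Name."""
    return [
        decl.findtext("Identifier")
        for decl in root.iter("Declaration")
        if decl.findtext("Type_Name") == type_name
    ]


root = ET.fromstring(SAMPLE)
char_pointers = identifiers_of_type(root, "char*")  # ["UW", "facultyName"]
```

A plain-text search for `char*` would also match comments or string literals. The structured query only matches text inside a `Type_Name` element.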
What is DOM? • A programming API for XML • Defines the logical structure of documents • Defines how documents are accessed and manipulated
What is DOM? • As an object model, DOM identifies • Interface and Objects used for the doc. • Behaviours and Attributes • Relationships and Collaborations of Interfaces and Objects
What is DOM? • 2 Major Components for DOM Level 1 • DOM Core = Basic functionalities for XML • DOM HTML = Objects and Methods specific to HTML • Level 2 • DOM CSS, DOM Event, DOM Filters and Iterators, DOM Range
Advantages of using DOM • Easy to create, navigate, add, modify documents • DOM abstraction avoids implementation dependencies • DOM applications may use additional language bindings
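A short sketch of creating, extending, and modifying a document through a DOM API, using Python's `xml.dom.minidom` binding as one example of a language binding. The element names come from the sample DTD. The `scope` attribute is invented for illustration.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
doc = impl.createDocument(None, "Language", None)  # no namespace, no doctype
root = doc.documentElement

# Create: build a <Declaration> subtree and attach it to the root.
decl = doc.createElement("Declaration")
type_name = doc.createElement("Type_Name")
type_name.appendChild(doc.createTextNode("int"))
decl.appendChild(type_name)
root.appendChild(decl)

# Modify: change the text node under <Type_Name>.
type_name.firstChild.data = "long"

# Add: set an attribute (the name "scope" is made up for this sketch).
decl.setAttribute("scope", "global")

serialized = root.toxml()
```

The same sequence of calls (`createElement`, `appendChild`, `setAttribute`) exists under the same names in other DOM bindings, which is the sense in which the abstraction avoids implementation dependencies.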
A Typical DOM Structure <condition_statement> <if_statement> <if_tag> if </if_tag> <expression_tag> (b == c) </expression_tag> <statement_tag> {a += c} </statement_tag> </if_statement> </condition_statement>
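A Typical DOM Structure (2) • [Figure: tree view of the markup on the previous slide. <condition_statement> is the root, <if_statement> is its only child, and <if_tag>, <expression_tag>, and <statement_tag> are the leaves, holding "if", "(b == c)", and "{a += c}" respectively]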
A Typical DOM Structure (3) • DOM abstraction is a Tree or Forest Structure • Users have full flexibility to specify the structure • Structural Isomorphism
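The tree structure can be made visible by parsing the conditional-statement markup and walking the resulting nodes. This sketch uses Python's `xml.dom.minidom`. The `outline` helper is an illustrative name, and whitespace between tags is omitted from the sample so that no whitespace-only text nodes appear.

```python
from xml.dom.minidom import parseString

# The conditional-statement markup, written without inter-tag whitespace.
SAMPLE = (
    "<condition_statement><if_statement>"
    "<if_tag> if </if_tag>"
    "<expression_tag> (b == c) </expression_tag>"
    "<statement_tag> {a += c} </statement_tag>"
    "</if_statement></condition_statement>"
)


def outline(node, depth=0):
    """Return one indented line per element; leaves also show their text."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        children = [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]
        label = node.tagName
        if not children:
            text = "".join(
                c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE
            )
            label += ": " + text.strip()
        lines.append("  " * depth + label)
        for child in children:
            lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(parseString(SAMPLE).documentElement)
```

Joining `tree_lines` with newlines gives an indented outline with `condition_statement` at the top, `if_statement` beneath it, and the three leaf tags beneath that.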
Some Key Objects • Node • Tree node of the document • root node, parents and children • Element (is a Node object) • Elements of a document • Represents contents between the start tag and end tag • Attributes: defined by DTD
Some Key Objects (2) • Document • root node of a document • NodeIterator • iterates over a set of nodes specified by a filter • AttributeList • collection of Attribute objects, indexed by attribute name
Some Key Objects (3) • Attribute • attribute of an Element Object • DocumentContext • repository for metadata about a document • DOM • provides instance-independent document operations
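Several of these objects can be inspected directly in a DOM binding. The sketch below uses Python's `xml.dom.minidom`, whose class names differ slightly from the slides: the attribute collection is a `NamedNodeMap` and an attribute is an `Attr`. Treating those as counterparts of AttributeList and Attribute is an assumption of this sketch. The memo markup and its `priority` attribute are invented for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical input; the "priority" attribute is made up for this sketch.
doc = parseString('<memo priority="high"><to>All staff</to></memo>')

root = doc.documentElement      # Element: content between <memo> and </memo>
attrs = root.attributes         # collection of attributes, indexed by name
priority = attrs["priority"]    # a single attribute node
first_child = root.firstChild   # the <to> Element, a child Node of <memo>

# Every object is a Node; nodeType tells the kinds apart.
kinds = {
    "document": doc.nodeType == Node.DOCUMENT_NODE,
    "element": root.nodeType == Node.ELEMENT_NODE,
    "attribute": priority.nodeType == Node.ATTRIBUTE_NODE,
    "text": first_child.firstChild.nodeType == Node.TEXT_NODE,
}
```

Parent and child relationships are available on every node: here `root.parentNode` is the document itself and `first_child.parentNode` is `root`.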
Memory Management for DOM • DOM APIs operate across a variety of memory management models: • Language platforms that do not expose memory management to the user • Languages (e.g. Java) that provide constructors and garbage collection • Languages (e.g. C/C++) that require explicit memory allocation
Resources/Quirks • IE 5 and Navigator 5.0 implement different features: • IE 5.0: XML/XSL • Navigator: XML/CSS • Navigator to support RDF • XML Resources: • http://www.swen.uwaterloo.ca/~group1
Using XML (cont.) • XML for Presentation • Convert to HTML at server • Use Java applications to render in browser • Slow • Use XSL or CSS to render in browser • Fast
XML in the industry • Explosive growth of XML tools and specifications • Tools: JADE, MSXML, JUMBO, ... • Specifications: CDF, CFML, EDI • Browsers: IE, Netscape
Thoughts on XML • Seems like a transition stage between HTML and SGML • Will we eventually end up using SGML? • XML follows basic principles of software engineering (SE) • Higher abstraction layer • Reuse • Modularity