eXtensible Markup Language (XML)
Objectives • Introduce XML Including: • XML Documents Basics • XML Schema • XML Stylesheets & Transformations (XSLT) • Explore the XML Support in .NET
Contents • Have a Look Back: The pre-XML world • “The XML Architecture” • XML & XML Document Basics • XML Schemata • Stylesheets & Transformations • .NET Framework Support for XML • System.Xml and sub-namespaces
Looking Back • Tightly coupled systems and communication • Proprietary, closed protocols and methods • Data sharing between 3rd party solutions unwieldy • Non-extensible solutions
XML! • XML technologies introduced: • XML 1.0 - Document Basics • XML Schemata • XSLT: Style sheets and Transformations • .NET & XML: • The System.Xml Namespace
XML 1.0 - Document Basics • What is XML? • XML Tags and Tag Sets • Components of an XML Document • Document Instance • XML Document by Example • The XML Parser
What is XML? 1/2 • Stands for “Extensible Markup Language” • Language specification for describing data • Syntax rules • Syntax & Grammar for creating Document Type Definitions • Widely used and open standard • Defined by the World Wide Web Consortium (W3C) • http://www.w3.org/TR/2000/REC-xml-20001006
What is XML? 2/2 • Designed for describing and interchanging data • Data is logically structured • Human readable, writeable and understandable text file! • Easy to Parse; Easy to Read; and Easy to Write! • Metadata: • Data that describes data; data with semantics • Looks like HTML…but it isn’t! • Uses tags to delimit data and create structure • Does not specify how to display the data
XML Tag-Sets • Begin with <someTag> and end with </someTag> • Can have an empty element: <someTag /> • Exceptions are: • XML document declaration: <?xml ... ?> • Comments: <!-- some comment --> • The document type declaration • <!DOCTYPE rootName [ ... ]> • Definition of document elements in an Internal DTD: • <!ELEMENT>, <!ATTLIST>, etc • Promote logical structuring of documents and data • User definable • Create hierarchically nested structure
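The tag rules above can be tried out with a few lines of Python. This is only a sketch using the standard library's `xml.etree.ElementTree`; the sample document and its element names (`order`, `item`, `shipped`) are made up for illustration and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one element with text and one empty element.
SAMPLE = """<?xml version="1.0"?>
<!-- a comment is not part of the element tree -->
<order>
  <item>Widget</item>
  <shipped />
</order>"""

root = ET.fromstring(SAMPLE)

# Each child corresponds to one start-tag/end-tag pair or one empty-element tag.
child_tags = [child.tag for child in root]
item_text = root.find("item").text
shipped_text = root.find("shipped").text  # an empty element carries no text

print(child_tags, item_text, shipped_text)
```

The declaration and the comment are consumed by the parser; only the nested elements show up in the resulting tree.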
Components of an XML Document 1/3 • XML Declaration • Document Type Declaration • Document Instance
Components of an XML Document 2/3 • XML Declaration (often loosely called the XML processing instruction) • <?xml version="1.0" encoding="UTF-8"?> • version information • encoding type: UTF-8, UTF-16, ISO-10646-UCS-2, etc • standalone declaration; indicates if there are external file references • Namespace declaration(s), Processing Instructions (for applications), etc
Components of an XML Document 3/3 • Document Type Declaration. Two types: • An Internal declaration:
    <!DOCTYPE CustomerOrder [ <!-- internal DTD goes here! --> ]>
• An External reference:
    <!DOCTYPE CustomerOrder SYSTEM "http://www.myco.com/CustOrder.dtd">
• Document Instance • This is the XML document instance • Read as: the “XML-ized” data
Document Instance: The Markup • Document Root Element • Every document has exactly one • If a document type declaration exists, the name it declares must match the root element's name • Elements • Can contain other elements • Can have attributes assigned to them • May or may not have content • Attributes • Properties that are assigned to elements • Provide additional element information
XML By Example: A Document
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE CustomerOrder
      SYSTEM "http://www.myco.com/dtd/order.dtd">
    <CustomerOrder>
      <Customer>
        <Person>
          <FName> Olaf </FName>
          <LName> Smith </LName>
        </Person>
        <Address AddrType="shipping">
          91 Park So, New York, NY 10018 </Address>
        <Address AddrType="billing">
          Hauptstrasse 55, D-81671 Munich </Address>
      </Customer>
      <Orders>
        <OrderNo> 10 </OrderNo>
        <ProductNo> 100 </ProductNo>
        <ProductNo> 200 </ProductNo>
      </Orders>
      <!-- More <Customer>s ... -->
    </CustomerOrder>
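To see how an application might read the CustomerOrder document, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the document text is held in a string; the DOCTYPE line is left out because this parser does not fetch or apply an external DTD, and the surrounding whitespace in the values is trimmed for brevity.

```python
import xml.etree.ElementTree as ET

# The CustomerOrder instance from the slide, without the DOCTYPE line.
ORDER_XML = """<CustomerOrder>
  <Customer>
    <Person>
      <FName>Olaf</FName>
      <LName>Smith</LName>
    </Person>
    <Address AddrType="shipping">91 Park So, New York, NY 10018</Address>
    <Address AddrType="billing">Hauptstrasse 55, D-81671 Munich</Address>
  </Customer>
  <Orders>
    <OrderNo>10</OrderNo>
    <ProductNo>100</ProductNo>
    <ProductNo>200</ProductNo>
  </Orders>
</CustomerOrder>"""

root = ET.fromstring(ORDER_XML)

# Element content is read through findtext()/.text, attributes through .get().
first = root.findtext("Customer/Person/FName")
last = root.findtext("Customer/Person/LName")
full_name = f"{first} {last}"
addresses = {a.get("AddrType"): a.text for a in root.iter("Address")}
product_numbers = [int(p.text) for p in root.find("Orders").findall("ProductNo")]

print(full_name, addresses["billing"], product_numbers)
```

Elements carry the data itself, while the `AddrType` attribute carries additional information about an element, which is the split described on the markup slide.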
XML Data + DTD
    <!-- DTD -->
    <!ELEMENT a (b+, c?) >
    <!ELEMENT b (#PCDATA) >
    <!ELEMENT c (#PCDATA) >
    <!-- XML Data: Not Valid! -->
    <a> <b> Some </b> <c> 100 </c> <c> 101 </c> </a>
    <!-- XML Data: Valid -->
    <a> <b> Some </b> <b> Thing </b> </a>
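A content model such as `(b+, c?)` behaves much like a regular expression over the names of the child elements. The Python sketch below makes that concrete for this one model only; the hand-written regular expression and the helper name `children_match` are illustrative assumptions, not a general DTD validator (Python's standard library does not include one).

```python
import re
import xml.etree.ElementTree as ET

# (b+, c?) rewritten by hand as a regex over space-terminated child names.
CONTENT_MODEL_A = re.compile(r"(b )+(c )?")

def children_match(xml_text: str) -> bool:
    """Return True if the root's children satisfy the (b+, c?) model."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL_A.fullmatch(child_names) is not None

invalid_doc = "<a><b>Some</b><c>100</c><c>101</c></a>"  # two <c> elements
valid_doc = "<a><b>Some</b><b>Thing</b></a>"            # <b> repeated, no <c>

print(children_match(invalid_doc), children_match(valid_doc))
```

The first document fails because `c?` allows at most one `<c>`; the second passes because `b+` allows `<b>` to repeat and `<c>` is optional.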
What’s a DTD? • Document Type Definition (DTD) • Defines the syntax, grammar & semantics • Defines the document structure • What Elements, Attributes, Entities, etc are permitted? • How are the document elements related & structured? • Referenced by or defined in XML documents, but it’s not XML! • Enables validation of XML documents using an XML Parser • Can be referenced by more than one XML document • DTDs may reference other DTDs
DTD By Diagram • [Diagram: element tree with CustomerOrder at the root; its children are Customer and repeated Orders; Customer contains Person (FName, LName) and repeated Address; each Orders contains OrderNo and repeated ProductNo]
DTD By Example • http://www.myco.com/dtd/order.dtd • An external DTD file holds the declarations directly; the <!DOCTYPE CustomerOrder [ ... ]> wrapper is only used for an internal DTD inside the document
    <?xml version="1.0" encoding="UTF-8"?>
    <!ELEMENT CustomerOrder (Customer, Orders*) >
    <!ELEMENT Customer (Person, Address+) >
    <!ELEMENT Person (FName, LName) >
    <!ELEMENT FName (#PCDATA) >
    <!ELEMENT LName (#PCDATA) >
    <!ELEMENT Address (#PCDATA) >
    <!ATTLIST Address
              AddrType ( billing | shipping | home ) "shipping" >
    <!ELEMENT Orders (OrderNo, ProductNo+) >
    <!ELEMENT OrderNo (#PCDATA) >
    <!ELEMENT ProductNo (#PCDATA) >
XML Parser in Action! • [Diagram: the XML source document and its XML Schema or DTD are fed to the XML parser, which delivers the validated XML document to the browser or application]
The XML Parser: What is it? • Used to Process an XML Document • Reads, parses & interprets the DTD and XML document • Performs substitutions, validation or additional processing • Knows the XML language rules and can determine: • Is the document Well-Formed? • Is it Valid? • Creates a Document Object Model (DOM) of the instance • Provides programmatic access to the DOM – or instance
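The first question a parser answers, "is the document well-formed?", can be sketched in Python as follows. This assumes the standard library's `xml.etree.ElementTree`, which checks well-formedness only; it does not validate against a DTD or schema, so the second question ("is it valid?") needs a validating parser. The function name `is_well_formed` is our own.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b>text</b></a>"))   # properly nested
print(is_well_formed("<a><b>text</a></b>"))   # overlapping tags
print(is_well_formed("<a>"))                  # missing end tag
```

A well-formed document obeys the XML syntax rules; a valid document is well-formed and additionally conforms to its DTD or schema.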
What is the DOM? • DOM stands for Document Object Model • Programming interface for HTML & XML documents • An in-memory representation of a document • Defines the document structure through an object model • Tree-view of a document • Nodes, elements and attributes, text elements, etc • W3C defined the DOM Level 1 and Level 2 Core • http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/ • http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/
Generating The DOM • [Diagram: the parser reads the XML document (<?xml version="1.0"?> ...) and builds the DOM tree: a root element with child elements, each holding text nodes]
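As a small illustration of the DOM as an in-memory tree, the sketch below uses Python's `xml.dom.minidom`, a lightweight DOM implementation in the standard library. The input string is a trimmed-down Orders element and the helper `describe` is a hypothetical name introduced here.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<Orders><OrderNo>10</OrderNo><ProductNo>100</ProductNo></Orders>"
)
root = doc.documentElement  # the root element node of the DOM tree

def describe(node, depth=0):
    """Return one indented line per element or text node, depth first."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + f"Element: {node.tagName}")
    elif node.nodeType == node.TEXT_NODE:
        lines.append("  " * depth + f"Text: {node.data}")
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines

tree_lines = describe(root)
print("\n".join(tree_lines))
```

Note that the text inside an element is a child node of its own, which is why the tree has one more level than the markup suggests.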
Where Do You Find XML Parsers? • Transparently built into XML enabled products • Internet Explorer, SQL Server 2000, etc • All over the Internet! • Microsoft XML Parser • http://msdn.microsoft.com/xml/general/xmlparser.asp • IBM/Apache Xerces • http://xml.apache.org • http://alphaworks.ibm.com
XML Schema • What’s a Schema? • Schema vs. DTD’s • Datatypes & Structure
XML Documents + XML Schema
    <!-- Some XML Schema -->
    <element name="a">
      <complexType>
        <sequence>
          <element name="b" type="string" minOccurs="1" maxOccurs="unbounded"/>
          <element name="c" type="integer" minOccurs="0" maxOccurs="1"/>
        </sequence>
      </complexType>
    </element>
    <!-- XML Data: Not Valid! (c may occur at most once) -->
    <a> <b> Some </b> <c> 100 </c> <c> 101 </c> </a>
    <!-- XML Data: Valid -->
    <a> <b> Some </b> <b> Thing </b> </a>
What’s a Schema? • Webster’s Collegiate Dictionary defines it as: • A diagrammatic presentation; a structured framework • The XML world defines it as: • A structured framework for your XML Documents! • A definition language - with its own syntax & grammar • A means to structure data and enhance it with semantics! • Best of all: It’s an alternative to the DTD! • Composed of two parts: • Structure: http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/ • Datatypes: http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/
Schema vs. DTDs • Both are XML document definition languages • Schemata are written using XML • Unlike DTDs, XML Schemata are extensible – like XML! • More verbose than DTDs but easier to read & write
Datatypes & Structure • Defining datatypes • The simple or primitive datatypes • Based on (or derived) from the Schema datatypes • Complex types • Facets • Declaring data types • <schema> by example
XML Schema Datatypes • Two kinds of datatypes: Built-in and User-defined • Built-in • Primitive Datatypes • string, double, decimal, duration, etc • Derived Datatypes: • normalizedString, integer, long, byte, etc • Derived from the primitive types • Example: integer is derived from decimal • User-defined • Derived from built-in or other user-defined datatypes
The Simple Type: <simpleType> • The Simplest Type Declaration: • <simpleType name="FirstName"> <restriction base="string"/> </simpleType> • Based on a primitive or the derived built-in datatypes • Cannot contain sub-elements or attributes • Can declare constraining properties (“facets”) • minLength, maxLength, length, etc • May be used as base type of a complexType
The Complex Type: <complexType> • Used to define a new complex type • May be based on simple or existing complexTypes • May declare elements or element references: • <element name="..." type="..." /> • <element ref="..."/> • May declare attributes or reference attribute groups • <attribute name="..." type="..."/> • <attributeGroup ref="..." />
Defining a complexType By Example
    <complexType name="Customer">
      <sequence>
        <element name="Person" type="Name" />
        <element name="Address" type="Address" />
      </sequence>
    </complexType>
    <complexType name="Address">
      <sequence>
        <element name="Street" type="string" />
        <element name="City" type="string" />
        <element name="State" type="State_Region" />
        <element name="PostalCode" type="string" />
        <element name="Country" type="string" />
      </sequence>
      <!-- AddrType attribute not shown -->
    </complexType>
More Complex Types • Derivation • simpleContent & complexContent • Extension & Restriction (we’ll see some of this) • Substitution Groups • Abstract Elements and Types
The Many Facets of a Datatype! • http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/ • A way to constrain datatypes • Constrain the “value space” of a datatype • Specify optional properties • Examples of Constraining Facets: • totalDigits, minLength, enumeration, ...
    <simpleType name="FirstName">
      <restriction base="string">
        <minLength value="0" />
        <maxLength value="25" />
      </restriction>
    </simpleType>
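What a constraining facet does to the value space can be mimicked in a few lines of Python. This is a sketch only: the function `violated_facets` and its parameter names are hypothetical, it covers just three facets for string values, and a real schema processor performs these checks itself during validation.

```python
def violated_facets(value, min_length=None, max_length=None, enumeration=None):
    """Return the names of the facets that `value` violates (empty if none)."""
    violations = []
    if min_length is not None and len(value) < min_length:
        violations.append("minLength")
    if max_length is not None and len(value) > max_length:
        violations.append("maxLength")
    if enumeration is not None and value not in enumeration:
        violations.append("enumeration")
    return violations

# The FirstName restriction from the slide: between 0 and 25 characters.
FIRST_NAME_FACETS = {"min_length": 0, "max_length": 25}

ok = violated_facets("Olaf", **FIRST_NAME_FACETS)
too_long = violated_facets("x" * 26, **FIRST_NAME_FACETS)
print(ok, too_long)
```

Each facet removes values from the base type's value space, so the restricted type always accepts a subset of what `string` accepts.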
Declaring <element> Elements • Elements are declared using the <element> tag • Based on either a simple or complex type:
    <element name="FirstName" type="string" />
    <element name="Address" type="AddressType" />
• May contain simple or other complex types:
    <element name="Orders">
      <complexType>
        <sequence>
          <element name="OrderNo" type="string" />
          <element name="ProductNo" type="string" />
        </sequence>
      </complexType>
    </element>
• May reference an existing element:
    <element ref="FirstName" />
Declaring Attributes • Declared using <attribute> tag • Name/value pairs • Can only be assigned to <complexType> types • May be grouped into an attribute group – more later! • Based on a <simpleType>, by reference or explicitly
    <attribute name="age" type="integer" />
    <!-- OR -->
    <attribute name="age">
      <simpleType>
        <restriction base="integer">
          <totalDigits value="3" />
        </restriction>
      </simpleType>
    </attribute>
Declaring Attribute Groups 1/2 • Way to group related attributes together • Promotes logical organization • Encourages reuse – defined once, referenced many times • Facilitates maintenance • Improves Schema readability • Must be unique within an XML Schema • Referenced from complexType definitions
Declaring Attribute Groups 2/2
    <!-- Define the unique group: -->
    <attributeGroup name="CreditCardInfo">
      <attribute name="CardNumber" type="integer" use="required" />
      <attribute name="ExpirationDate" type="date" use="required" />
      <attribute name="CardHolder" type="FullName" use="required" />
    </attributeGroup>
    <!-- Then you can reference it from a complexType: -->
    <complexType name="CreditInformation">
      <attributeGroup ref="CreditCardInfo" />
    </complexType>
Schema Namespaces • Equivalent to XML namespaces • http://www.w3.org/TR/1999/REC-xml-names-19990114/ • Used to qualify schema elements • <schema> must itself be qualified with the schema namespace:
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
• Namespace may have a namespace prefix for the schema • Prefix qualifies elements belonging to the targetNamespace:
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:CO="http://www.MyCompany.com/Schema">
<schema> targetNamespace Attribute • Declares the namespace of the current schema • This must be a unique Uniform Resource Identifier (URI) • Helps the parser differentiate type definitions • Used during schema validation • Differentiates differing schema vocabularies in the schema • targetNamespace = "some_URI..." • The attribute itself takes no prefix; its value should match a namespace declared with xmlns:namespace_prefix, so the prefix can qualify the schema's own definitions • Example: • targetNamespace = "http://www.myCo.com/CO" xmlns:CO = "http://www.myCo.com/CO"
XML <schema> By Example
    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema targetNamespace="http://www.myCo.com/CO"
                xmlns:CO="http://www.myCo.com/CO"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                elementFormDefault="qualified"
                attributeFormDefault="qualified">
      <!-- Declare the "root element" of our schema -->
      <xsd:element name="CustomerOrder" type="CO:CustomerOrder"/>
      <!-- Further Definitions & declarations not shown -->
    </xsd:schema>
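How a parser resolves the prefixes in such a schema can be observed with Python's `xml.etree.ElementTree`. This is a sketch under the assumption that the schema text is available as a string; it only inspects the schema as an XML document and does not perform schema validation, which the standard library does not offer.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the schema from the slide.
SCHEMA_TEXT = """<xsd:schema targetNamespace="http://www.myCo.com/CO"
    xmlns:CO="http://www.myCo.com/CO"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified">
  <xsd:element name="CustomerOrder" type="CO:CustomerOrder"/>
</xsd:schema>"""

XSD_NS = "http://www.w3.org/2001/XMLSchema"
schema = ET.fromstring(SCHEMA_TEXT)

# The parser replaces each prefix by its namespace URI: {uri}localname.
root_tag = schema.tag
target_namespace = schema.get("targetNamespace")
declared_elements = [
    e.get("name") for e in schema.findall("xsd:element", {"xsd": XSD_NS})
]

print(root_tag, target_namespace, declared_elements)
```

The prefix `xsd` is only a local shorthand; what identifies the vocabulary is the namespace URI it is bound to.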
Follow the Yellow Brick XPath • Specification found at: • http://www.w3.org/TR/1999/REC-xpath-19991116 • Language used to address parts of an XML document • Permits selection of nodes in an XML document • Uses a path notation like that of URLs • Absolute paths: /CustomerOrder/Orders • Relative paths: Orders
Roadmap To Selection • Location Syntax • axis::node_test[ predicate ] • Location Paths • Axis: Defines the direction in which to navigate from the context node • parent, child, ancestor, attribute, etc; a leading / starts at the document root • Node test: Selects one or more nodes • By tag name, node selector or wildcard (*) • node( ), text( ), comment( ), etc • Predicates: Optional function or expression enclosed in “[...]” • position( ), count( ), etc • Example: /descendant::*[attribute::AddrType="billing"]
Taking XPath Shortcuts • Abbreviated Syntax exists • The following are equivalent: • Orders[position()=1]/ProductNo[position()=3] • Orders[1]/ProductNo[3] • .. instead of parent::node() • . instead of self::node() • // instead of /descendant-or-self::node()/
Operators • To select an attribute use @ • CustomerOrder/Customer/Address/@AddrType • To reference a variable use $ • CustomerOrder/Orders/ProductNo[. = $ProductNo] • Can compare values with relational operators • &lt; (for "<"), &gt; (for ">"), &lt;= (for "<="), etc, when the expression is written inside XML • Must adhere to XML 1.0 quoting rules • Can use logical operators • and • or
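Some of the abbreviated location paths can be tried with Python's `xml.etree.ElementTree`. Note the assumptions: ElementTree implements only a subset of XPath (child steps, `//`, `..`, attribute predicates and positional predicates), so axes written out in full, variables, and the `and`/`or` operators are not available here, and paths are evaluated relative to the root element rather than the document node.

```python
import xml.etree.ElementTree as ET

# A shortened CustomerOrder document used as the context for the paths.
root = ET.fromstring("""<CustomerOrder>
  <Customer>
    <Address AddrType="shipping">New York</Address>
    <Address AddrType="billing">Munich</Address>
  </Customer>
  <Orders>
    <OrderNo>10</OrderNo>
    <ProductNo>100</ProductNo>
    <ProductNo>200</ProductNo>
  </Orders>
</CustomerOrder>""")

# Attribute predicate: the Address whose AddrType is "billing".
billing = root.find("Customer/Address[@AddrType='billing']").text
# Positional predicates: second ProductNo of the first Orders element.
second_product = root.find("Orders[1]/ProductNo[2]").text
# // shortcut: every ProductNo anywhere below the context element.
all_products = [p.text for p in root.findall(".//ProductNo")]
# .. shortcut: the parent of OrderNo.
parent_of_order_no = root.find(".//OrderNo/..").tag

print(billing, second_product, all_products, parent_of_order_no)
```

A full XPath 1.0 engine, such as the one behind an XSLT processor, accepts the same abbreviated paths plus the complete axis and function set.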
XSLT: Stylesheets & Transformations • What is XSLT? • The Basic Structure • Some Template Rules • More Advanced Structure • More Advanced Template Rules (or Features ;) • Transforming It All
What is XSLT? • Widely used and open standard defined by the W3C • A sub-specification of XSL • http://www.w3.org/TR/1999/REC-xslt-19991116 • Designed to be used independently of XSL • Designed primarily for the transformation needed in XSL • W3C defines XSLT: • “a language for transforming XML documents” • XSLT is more than a language – it’s an XML programming language • Can have rules, evaluate conditions, etc • Offers the ability to transform one XML document into another • Transform an XDR Schema to an XSD Schema! • Transform an XML document into an HTML document
The XSLT Process – Overview • [Diagram: the XSLT processor takes the XML source document (conforming to the source schema) and the XSLT style sheet, and produces the XML target document (conforming to the target schema)]
Transformation Process Overview • Pass source document to an XSLT processor • Processor contains a loaded XSLT style-sheet • Processor then: • Loads the specified Stylesheet templates... • Traverses the source document, node by node... • Where a node matches a template... • Applies the template to the node • Outputs the (new) XML or HTML result document
Process of “Transmutation” • The XSLT processor applies the XSLT stylesheet to the source document and emits the result document
    <!-- Source document -->
    <Orders>
      <OrderNo> 10 </OrderNo>
      <ProductNo> 100 </ProductNo>
      <ProductNo> 200 </ProductNo>
    </Orders>
    <Orders>
      <OrderNo> 20 </OrderNo>
      <ProductNo> 501 </ProductNo>
    </Orders>
    <!-- Result document -->
    <HTML>
      <BODY>
        <TABLE border="3">
          <TR> <TD> 10 </TD> <TD> 100 </TD> </TR>
          <TR> <TD> 10 </TD> <TD> 200 </TD> </TR>
          <TR> <TD> 20 </TD> <TD> 501 </TD> </TR>
        </TABLE>
      </BODY>
    </HTML>
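The traverse-match-apply cycle behind this transformation can be imitated in plain Python. To be clear about the assumptions: this is not XSLT and uses no XSLT processor; the `TEMPLATES` table, `orders_template` and `transform` are hypothetical names that stand in for template rules and the processor loop, and the two Orders elements are wrapped in a CustomerOrder root so the input is a single well-formed document.

```python
import xml.etree.ElementTree as ET

SOURCE = """<CustomerOrder>
  <Orders><OrderNo>10</OrderNo><ProductNo>100</ProductNo><ProductNo>200</ProductNo></Orders>
  <Orders><OrderNo>20</OrderNo><ProductNo>501</ProductNo></Orders>
</CustomerOrder>"""

def orders_template(orders):
    """Stand-in for a template matching <Orders>: one <TR> per product."""
    order_no = orders.findtext("OrderNo")
    rows = []
    for product in orders.findall("ProductNo"):
        row = ET.Element("TR")
        ET.SubElement(row, "TD").text = order_no
        ET.SubElement(row, "TD").text = product.text
        rows.append(row)
    return rows

# Maps an element name to the function that handles matching nodes.
TEMPLATES = {"Orders": orders_template}

def transform(source_xml):
    """Walk the source node by node and apply the matching template."""
    source = ET.fromstring(source_xml)
    html = ET.Element("HTML")
    table = ET.SubElement(ET.SubElement(html, "BODY"), "TABLE", border="3")
    for node in source.iter():
        template = TEMPLATES.get(node.tag)
        if template is not None:
            table.extend(template(node))
    return ET.tostring(html, encoding="unicode")

result = transform(SOURCE)
print(result)
```

In real XSLT the matching is declared with `match` patterns inside the stylesheet and the processor drives the traversal; the dictionary lookup above plays that role only for illustration.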