This guide introduces XML basics including documents, schema, stylesheets, and transformations in .NET. Explore XML support, its architecture, and standards. Understand XML syntax and metadata, component structures, and data interchange with examples. Learn about Document Type Definitions (DTD) and their role in defining XML document structure and validation. Discover XML processing instructions and namespaces, promoting logical data structuring and readability.
Objectives • Introduce XML Including: • XML Document Basics • XML Schema • XML Stylesheets & Transformations (XSL/XSLT) • Explore the XML Support in .NET
Contents • Have a Look Back: The pre-XML world • “The XML Architecture” • XML & XML Document Basics • XML Schemata • Stylesheets & Transformations • .NET Framework Support for XML • System.Xml and sub-namespaces
Looking Back • Tightly coupled systems and communication • Proprietary, closed protocols and methods • Data sharing between third-party solutions was unwieldy • Non-extensible solutions
XML! • XML technologies introduced: • XML 1.0 - Document Basics • XML Schemata • XSLT: Style sheets and Transformations • .NET & XML: • The System.Xml Namespace
XML 1.0 - Document Basics • What is XML? • XML Tags and Tag Sets • Components of an XML Document • Document Instance • XML Document by Example • The XML Parser
What is XML? 1/2 • Stands for “Extensible Markup Language” • Language specification for describing data • Syntax rules • Syntax & Grammar for creating Document Type Definitions • Widely used and open standard • Defined by the World Wide Web Consortium (W3C) • http://www.w3.org/TR/2000/REC-xml-20001006
What is XML? 2/2 • Designed for describing and interchanging data • Data is logically structured • Human readable, writeable and understandable text file! • Easy to Parse; Easy to Read; and Easy to Write! • Metadata: • Data that describes data; data with semantics • Looks like HTML…but it isn’t! • Uses tags to delimit data and create structure • Does not specify how to display the data
XML Tag-Sets • Begin with <someTag> and end with </someTag> • Can have an empty element: <someTag /> • Exceptions are: • XML document declaration: <?xml ... ?> • Comments: <!-- some comment --> • The document type declaration • <!DOCTYPE rootName [ ... ]> • Definition of document elements in an internal DTD: • <!ELEMENT ...>, <!ATTLIST ...>, etc • Promote logical structuring of documents and data • User definable • Create hierarchically nested structure
Components of an XML Document 1/3 • XML Declaration • Document Type Declaration • Document Instance
Components of an XML Document 2/3 • XML Declaration (written like a processing instruction, but not one) • <?xml version="1.0" encoding="UTF-8" ?> • version information • encoding type: UTF-8, UTF-16, ISO-10646-UCS-2, etc • standalone declaration (yes/no): indicates whether external markup declarations affect the document's content • The prolog may also hold comments and processing instructions for applications; namespace declarations are xmlns attributes placed on elements
Components of an XML Document 3/3 • Document Type Declaration. Two types: • An internal declaration: <!DOCTYPE CustomerOrder [ <!-- internal DTD goes here! --> ]> • An external reference: <!DOCTYPE CustomerOrder SYSTEM "http://www.myco.com/CustOrder.dtd"> • Document Instance • This is the XML document instance • Read as: the “XML-ized” data
Document Instance: The Markup • Document Root Element • Exactly one per document; it encloses all other elements • If a document type declaration exists, the root element must have the name it declares • Elements • Can contain other elements • Can have attributes assigned to them • May or may not have a value • Attributes • Properties that are assigned to elements • Provide additional element information
XML By Example: A Document
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE CustomerOrder
  SYSTEM "http://www.myco.com/dtd/order.dtd">
<CustomerOrder>
  <Customer>
    <Person>
      <FName> Olaf </FName>
      <LName> Smith </LName>
    </Person>
    <Address AddrType="shipping">
      91 Park So, New York, NY 10018 </Address>
    <Address AddrType="billing">
      Hauptstrasse 55, D-81671 Munich </Address>
  </Customer>
  <Orders>
    <OrderNo> 10 </OrderNo>
    <ProductNo> 100 </ProductNo>
    <ProductNo> 200 </ProductNo>
  </Orders>
  <!-- More <Customer>s ... -->
</CustomerOrder>
XML Data + DTD
DTD:
<!ELEMENT a (b+, c?) >
<!ELEMENT b (#PCDATA) >
<!ELEMENT c (#PCDATA) >
<!-- XML Data --> <a> <b> Some </b> <c> 100 </c> <c> 101 </c> </a> : Not Valid! (c? allows at most one <c>)
<!-- XML Data --> <a> <b> Some </b> <b> Thing </b> </a> : Valid (b+ allows one or more <b>; <c> is optional)
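A DTD content model such as (b+, c?) is a regular expression over the names of an element's children, which is why the first document fails and the second passes. The sketch below makes that concrete in Python (a stand-in for a validating parser, not the .NET API); content_model_to_regex and children_match are hypothetical helpers that only handle flat comma sequences of names with an optional ?, * or + suffix.

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model: str) -> re.Pattern:
    """Turn a flat DTD sequence like "(b+, c?)" into a regex over child names.

    Hypothetical helper: choices (|), nesting and #PCDATA are not handled.
    """
    parts = []
    for item in model.strip("() ").split(","):
        item = item.strip()
        if item[-1] in "?*+":
            name, suffix = item[:-1], item[-1]
        else:
            name, suffix = item, ""
        # Wrap each name as "<name>" so adjacent names cannot run together.
        parts.append(f"(?:<{re.escape(name)}>){suffix}")
    return re.compile("".join(parts))

def children_match(xml_text: str, model: str) -> bool:
    """True if the root element's children follow the content model."""
    root = ET.fromstring(xml_text)
    sequence = "".join(f"<{child.tag}>" for child in root)
    return content_model_to_regex(model).fullmatch(sequence) is not None

model_a = "(b+, c?)"
doc1 = "<a><b> Some </b><c> 100 </c><c> 101 </c></a>"
doc2 = "<a><b> Some </b><b> Thing </b></a>"
print(children_match(doc1, model_a))  # False: <c> occurs twice
print(children_match(doc2, model_a))  # True
```

A real validating parser compiles every element's content model this way (typically into a finite automaton) and also checks attributes and #PCDATA content.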
What’s a DTD? • Document Type Definition (DTD) • Defines the syntax, grammar & semantics • Defines the document structure • What elements, attributes, entities, etc are permitted? • How are the document elements related & structured? • Referenced by or defined in XML documents, but it’s not itself written in XML syntax! • Enables validation of XML documents using a validating XML parser • Can be referenced by more than one XML document • DTDs may reference other DTDs
DTD By Diagram [Figure: element tree of the order DTD: CustomerOrder contains Customer (Person with FName and LName; one or more Address) and zero or more Orders (each with OrderNo and one or more ProductNo)]
DTD By Example • http://www.myco.com/dtd/order.dtd • Shown here as an internal subset; the external .dtd file holds only the declarations, without the <!DOCTYPE CustomerOrder [ ... ]> wrapper
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE CustomerOrder [
  <!ELEMENT CustomerOrder (Customer, Orders*) >
  <!ELEMENT Customer (Person, Address+) >
  <!ELEMENT Person (FName, LName) >
  <!ELEMENT FName (#PCDATA) >
  <!ELEMENT LName (#PCDATA) >
  <!ELEMENT Address (#PCDATA) >
  <!ATTLIST Address AddrType ( billing | shipping | home ) "shipping" >
  <!ELEMENT Orders (OrderNo, ProductNo+) >
  <!ELEMENT OrderNo (#PCDATA) >
  <!ELEMENT ProductNo (#PCDATA) >
]>
XML Parser in Action! [Figure: an XML source document and its XML Schema or DTD are fed to the XML parser, which passes a validated XML document to a browser or application]
The XML Parser: What is it? • Used to process an XML document • Reads, parses & interprets the DTD and XML document • Performs substitutions, validation or additional processing • Knows the XML language rules and can determine: • Is the document well-formed (does it follow XML's syntax rules)? • Is it valid (does it also conform to its DTD or schema)? • Creates a Document Object Model (DOM) of the instance • Provides programmatic access to the DOM – or instance
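The well-formedness check is the part every XML parser performs. As a minimal sketch, Python's standard-library parser (used here in place of the deck's .NET classes) rejects malformed input with an exception; note that it checks well-formedness only and does not validate against a DTD or schema. is_well_formed is a hypothetical helper name.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    Python's built-in (expat-based) parsers check well-formedness only;
    they do not validate against a DTD or an XML Schema.
    """
    try:
        xml.dom.minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False

print(is_well_formed("<Orders><OrderNo>10</OrderNo></Orders>"))   # True
print(is_well_formed("<Orders><OrderNo>10</Orders></OrderNo>"))   # False: tags overlap
print(is_well_formed('<Address AddrType=“billing”>x</Address>'))  # False: curly quotes
```

The last case is a common slip when XML is pasted from a word processor: attribute values must be delimited by straight ' or " quotes.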
What is the DOM? • DOM stands for Document Object Model • Programming interface for HTML & XML documents • An in-memory representation of a document • Defines the document structure through an object model • Tree-view of a document • Nodes, elements and attributes, text elements, etc • W3C defined the DOM Level 1 and Level 2 Core • http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/ • http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/
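To show the "tree-view of a document" idea, the sketch below builds a DOM with Python's xml.dom.minidom (a W3C-DOM-style API, standing in for the .NET DOM classes) and walks it, collecting element and text nodes with their depth. The walk function and the trimmed order document are illustrative assumptions.

```python
import xml.dom.minidom
from xml.dom import Node

ORDER = """<CustomerOrder>
  <Orders><OrderNo>10</OrderNo><ProductNo>100</ProductNo></Orders>
</CustomerOrder>"""

def walk(node, depth=0, out=None):
    """Collect an indented outline of element and non-blank text nodes."""
    if out is None:
        out = []
    if node.nodeType == Node.ELEMENT_NODE:
        out.append("  " * depth + f"Element: {node.tagName}")
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        out.append("  " * depth + f"Text: {node.data.strip()}")
    # Whitespace-only text nodes are skipped above but still part of the DOM.
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out

doc = xml.dom.minidom.parseString(ORDER)
for line in walk(doc.documentElement):
    print(line)
```

Note that the indentation between tags also becomes text nodes in the DOM; the sketch simply hides them.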
Generating The DOM [Figure: an XML document (<?xml version="1.0"?> ...) passes through the parser, producing a DOM tree of a root element, child elements, and text nodes]
Where Do You Find XML Parsers? • Transparently built into XML enabled products • Internet Explorer, SQL Server 2000, etc • All over the Internet! • Microsoft XML Parser • http://msdn.microsoft.com/xml/general/xmlparser.asp • IBM/Apache Xerces • http://xml.apache.org • http://alphaworks.ibm.com
XML Schema • What’s a Schema? • Schema vs. DTDs • Datatypes & Structure
XML Documents + XML Schema
<!-- Some XML Schema -->
<element name="a">
  <complexType>
    <sequence>
      <element name="b" type="string" minOccurs="1" maxOccurs="unbounded"/>
      <element name="c" type="integer" minOccurs="0" maxOccurs="1"/>
    </sequence>
  </complexType>
</element>
<!-- XML Data --> <a> <b> Some </b> <c> 100 </c> <c> 101 </c> </a> : Not Valid! (c may occur at most once)
<!-- XML Data --> <a> <b> Some </b> <b> Thing </b> </a> : Valid (b may repeat; c is optional)
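The schema expresses the same rule as the DTD's (b+, c?) through occurrence bounds: b from 1 to unbounded, c from 0 to 1. The Python sketch below checks a flat sequence of distinct child names against such bounds; matches_sequence and the SEQUENCE_A table are hypothetical, and this is not a full schema validator (no nesting, types or choices).

```python
import xml.etree.ElementTree as ET

# Each entry: (child name, minOccurs, maxOccurs); None means "unbounded".
SEQUENCE_A = [("b", 1, None), ("c", 0, 1)]

def matches_sequence(xml_text, sequence):
    """Check the root's children against a sequence of occurrence bounds."""
    children = [child.tag for child in ET.fromstring(xml_text)]
    pos = 0
    for name, min_occurs, max_occurs in sequence:
        # Count consecutive occurrences of this element name.
        count = 0
        while pos < len(children) and children[pos] == name:
            count += 1
            pos += 1
        if count < min_occurs or (max_occurs is not None and count > max_occurs):
            return False
    # Any leftover children were not allowed by the sequence.
    return pos == len(children)

print(matches_sequence("<a><b> Some </b><c> 100 </c><c> 101 </c></a>", SEQUENCE_A))  # False
print(matches_sequence("<a><b> Some </b><b> Thing </b></a>", SEQUENCE_A))            # True
```

Counting greedily is safe here only because the names in the sequence are distinct; overlapping particles need the automaton approach a real validator uses.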
What’s a Schema? • Webster’s Collegiate Dictionary defines it as: • A diagrammatic presentation; a structured framework • The XML world defines it as: • A structured framework for your XML Documents! • A definition language - with its own syntax & grammar • A means to structure data and enhance it with semantics! • Best of all: It’s an alternative to the DTD! • Composed of two parts: • Structures: http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/ • Datatypes: http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/
Schema vs. DTDs • Both are XML document definition languages • Schemata are written in XML syntax; DTDs are not • Unlike DTDs, XML Schemata are extensible – like XML! • More verbose than DTDs but easier to read & write
Datatypes & Structure • Defining datatypes • The simple or primitive datatypes • Based on (or derived from) the Schema datatypes • Complex types • Facets • Declaring data types • <schema> by example
XML Schema Datatypes • Two kinds of datatypes: Built-in and User-defined • Built-in • Primitive Datatypes • string, double, decimal, duration, date, etc • Derived Datatypes • normalizedString, integer, byte, etc • Derived from the primitive types • Example: integer is derived from decimal • User-defined • Derived from built-in or other user-defined datatypes
The Simple Type: <simpleType> • The Simplest Type Declaration: • <simpleType name="FirstName"> <restriction base="string"/> </simpleType> • Based on a primitive or a derived built-in datatype • Cannot contain sub-elements or attributes • Can declare constraining properties (“facets”) • minLength, maxLength, length, etc • May be used as the base type of a complexType
The Complex Type: <complexType> • Used to define a new complex type • May be based on simple or existing complexTypes • May declare elements or element references: • <element name="..." type="..." /> • <element ref="..."/> • May declare attributes or reference attribute groups • <attribute name="..." type="..."/> • <attributeGroup ref="..." />
Defining a complexType By Example
<complexType name="Customer">
  <sequence>
    <element name="Person" type="Name" />
    <element name="Address" type="Address" />
  </sequence>
</complexType>
<complexType name="Address">
  <sequence>
    <element name="Street" type="string" />
    <element name="City" type="string" />
    <element name="State" type="State_Region" />
    <element name="PostalCode" type="string" />
    <element name="Country" type="string" />
  </sequence>
  <!-- AddrType attribute not shown -->
</complexType>
More Complex Types • Derivation • simpleContent complexContent • Extension & Restriction (we’ll see some of this) • Substitution Groups • Abstract Elements and Types
The Many Facets of a Datatype! • http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/ • A way to constrain datatypes • Constrain the “value space” of a datatype • Specify optional properties • Examples of Constraining Facets: • totalDigits, minLength, enumeration, ...
<simpleType name="FirstName">
  <restriction base="string">
    <minLength value="0" />
    <maxLength value="25" />
  </restriction>
</simpleType>
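Facets are just constraints attached to a restriction, so a processor can read them and test candidate values. The Python sketch below (not the .NET schema API) reads the length facets of the FirstName type and checks strings against them; read_length_facets and satisfies are hypothetical helpers, the surrounding <schema> wrapper is an assumption, and only length, minLength and maxLength are handled.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <simpleType name="FirstName">
    <restriction base="string">
      <minLength value="0"/>
      <maxLength value="25"/>
    </restriction>
  </simpleType>
</schema>"""

def read_length_facets(schema_text, type_name):
    """Return the length-related facets of a named simpleType (sketch)."""
    root = ET.fromstring(schema_text)
    for simple_type in root.iter(XS + "simpleType"):
        if simple_type.get("name") == type_name:
            facets = {}
            for facet in simple_type.find(XS + "restriction"):
                local = facet.tag[len(XS):]  # strip the namespace part
                if local in ("minLength", "maxLength", "length"):
                    facets[local] = int(facet.get("value"))
            return facets
    raise KeyError(type_name)

def satisfies(value, facets):
    """Check a string's length (in characters) against the facets."""
    n = len(value)
    return (facets.get("minLength", 0) <= n
            and n <= facets.get("maxLength", n)
            and facets.get("length", n) == n)

facets = read_length_facets(SCHEMA, "FirstName")
print(facets)                       # {'minLength': 0, 'maxLength': 25}
print(satisfies("Olaf", facets))    # True
print(satisfies("x" * 26, facets))  # False
```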
Declaring <element> Elements
<element name="FirstName" type="string" />
• Elements are declared using the <element> tag • Based on either a simple or complex type • May contain simple or other complex types • May reference an existing element
<element name="Address" type="AddressType" />
<element name="Orders">
  <complexType>
    <sequence>
      <element name="OrderNo" type="string" />
      <element name="ProductNo" type="string" />
    </sequence>
  </complexType>
</element>
<element ref="FirstName" />
Declaring Attributes • Declared using the <attribute> tag • Name/value pairs • Can only be assigned to <complexType> types • May be grouped into an attribute group – more later! • Based on a <simpleType>, by reference or explicitly
<attribute name="age" type="integer" />
<!-- OR -->
<attribute name="age">
  <simpleType>
    <restriction base="integer">
      <totalDigits value="3"/>
    </restriction>
  </simpleType>
</attribute>
Declaring Attribute Groups 1/2 • Way to group related attributes together • Promotes logical organization • Encourages reuse – defined once, referenced many times • Facilitates maintenance • Improves Schema readability • The group name must be unique within an XML Schema • Referenced from complexType definitions
Declaring Attribute Groups 2/2
<!-- Define the unique group: -->
<attributeGroup name="CreditCardInfo">
  <attribute name="CardNumber" type="integer" use="required" />
  <attribute name="ExpirationDate" type="date" use="required" />
  <attribute name="CardHolder" type="FullName" use="required" />
</attributeGroup>
<!-- Then you can reference it from a complexType: -->
<complexType name="CreditInformation">
  <attributeGroup ref="CreditCardInfo" />
</complexType>
Schema Namespaces • Based on XML namespaces • http://www.w3.org/TR/1999/REC-xml-names-19990114/ • Used to qualify schema elements • <schema> must itself be qualified with the XML Schema namespace • A namespace prefix may be declared for the schema's own vocabulary • The prefix qualifies references to components in the targetNamespace
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:CO="http://www.MyCompany.com/Schema">
<schema> targetNamespace Attribute • <schema> targetNamespace attribute • Declares the namespace of the current schema • Should be a globally unique Uniform Resource Identifier (URI) • Helps the parser differentiate type definitions • Used during schema validation • Keeps this schema's vocabulary distinct from other vocabularies used in the schema • targetNamespace="some_URI..." • To refer to the schema's own components with a prefix, bind that prefix to the same URI: xmlns:prefix="some_URI..." • Example: • targetNamespace="http://www.myCo.com/CO" xmlns:CO="http://www.myCo.com/CO"
XML <schema> By Example
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema targetNamespace="http://www.myCo.com/CO"
            xmlns:CO="http://www.myCo.com/CO"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified"
            attributeFormDefault="qualified">
  <!-- Declare the “root element” of our schema -->
  <xsd:element name="CustomerOrder" type="CO:CustomerOrder"/>
  <!-- Further definitions & declarations not shown -->
</xsd:schema>
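The prefix binding is what lets type="CO:CustomerOrder" point into the target namespace. The Python sketch below (standing in for a schema processor) records prefix-to-URI bindings while parsing and expands that QName into {namespace}local form; resolve_qname is a hypothetical helper, and the flat prefix map deliberately ignores nested namespace scopes, which a real processor must track.

```python
import io
import xml.etree.ElementTree as ET

SCHEMA = """<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema targetNamespace="http://www.myCo.com/CO"
            xmlns:CO="http://www.myCo.com/CO"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
  <xsd:element name="CustomerOrder" type="CO:CustomerOrder"/>
</xsd:schema>"""

XSD_ELEMENT = "{http://www.w3.org/2001/XMLSchema}element"

def resolve_qname(qname, ns_map):
    """Expand "prefix:local" into "{uri}local" using the prefix map."""
    prefix, _, local = qname.rpartition(":")
    return "{%s}%s" % (ns_map[prefix], local)

ns_map = {}    # flat prefix -> URI map (no scoping, for brevity)
declared = []  # (element name, resolved type) pairs
source = io.BytesIO(SCHEMA.encode("utf-8"))
for event, item in ET.iterparse(source, events=("start-ns", "start")):
    if event == "start-ns":
        prefix, uri = item
        ns_map[prefix] = uri
    elif item.tag == XSD_ELEMENT:
        declared.append((item.get("name"), resolve_qname(item.get("type"), ns_map)))

print(declared)  # [('CustomerOrder', '{http://www.myCo.com/CO}CustomerOrder')]
```

Because the CO prefix and targetNamespace carry the same URI, the type reference resolves to a component of this very schema.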
Follow the Yellow Brick XPath • Specification found at: • http://www.w3.org/TR/1999/REC-xpath-19991116 • Language used to address parts of an XML document • Permits selection of nodes in an XML document • Uses a path notation similar to URLs and file-system paths • Absolute paths: /CustomerOrder/Orders • Relative paths: Orders
Roadmap To Selection • Location Syntax • axis::node_test[ predicate ] • Location Paths • Axis: Defines the direction to navigate from the context node • parent, child, ancestor, attribute, etc; a leading / starts from the document root • Node test: Selects one or more nodes • By tag name, node type test or wildcard (*) • node( ), text( ), comment( ), etc • Predicates: Optional expressions enclosed in “[...]” • position( ), count( ), etc • Example: /child::CustomerOrder/child::Customer/child::Address[attribute::AddrType='billing']
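Python's xml.etree.ElementTree implements a limited subset of XPath (used here in place of a full XPath 1.0 engine), enough to try the billing-address selection. Paths are evaluated relative to the element they are called on, so starting from the CustomerOrder root element, "Customer/Address[@AddrType='billing']" plays the role of the absolute path above. The trimmed order document is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

ORDER = """<CustomerOrder>
  <Customer>
    <Person><FName>Olaf</FName><LName>Smith</LName></Person>
    <Address AddrType="shipping">91 Park So, New York, NY 10018</Address>
    <Address AddrType="billing">Hauptstrasse 55, D-81671 Munich</Address>
  </Customer>
  <Orders><OrderNo>10</OrderNo><ProductNo>100</ProductNo><ProductNo>200</ProductNo></Orders>
</CustomerOrder>"""

root = ET.fromstring(ORDER)  # context node: the CustomerOrder element

# XPath: /CustomerOrder/Customer/Address[@AddrType='billing']
billing = root.findall("Customer/Address[@AddrType='billing']")
print([a.text for a in billing])  # ['Hauptstrasse 55, D-81671 Munich']
```

ElementTree supports only the abbreviated forms (no explicit child:: or attribute:: axes), so the unabbreviated path on the slide needs a full XPath implementation.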
Taking XPath Shortcuts • Abbreviated syntax exists • The following are equivalent: OrderNo[position()=1]/ProductNo[position()=3] and OrderNo[1]/ProductNo[3] • .. instead of parent::node() • . instead of self::node() • // instead of /descendant-or-self::node()/
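The abbreviated forms can be tried with ElementTree's XPath subset (again standing in for a full XPath engine). Against the order document, where ProductNo elements sit inside Orders, Orders[1]/ProductNo[2] picks the second product of the first order; numeric predicates count among same-named siblings starting at 1, as in XPath.

```python
import xml.etree.ElementTree as ET

ORDER = """<CustomerOrder>
  <Orders><OrderNo>10</OrderNo><ProductNo>100</ProductNo><ProductNo>200</ProductNo></Orders>
  <Orders><OrderNo>20</OrderNo><ProductNo>501</ProductNo></Orders>
</CustomerOrder>"""
root = ET.fromstring(ORDER)

# [n] is shorthand for [position()=n]
print([p.text for p in root.findall("Orders[1]/ProductNo[2]")])  # ['200']

# ".//" searches all descendants of the context node
print(len(root.findall(".//ProductNo")))  # 3

# ".." selects the parent
print(root.find("Orders/OrderNo/..").tag)  # Orders
```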
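Operators • To select an attribute value use @ (short for attribute::) • CustomerOrder/Customer/Address/@AddrType • Used in a predicate, Address[@AddrType] instead filters for Address elements that have the attribute • $ refers to a variable, not an element; to get an element's value use text() or string() • string(CustomerOrder/Orders/ProductNo[1]) • Can compare values: =, !=, <, >, <=, >= • Inside XML (e.g. an XSLT attribute) write them per XML 1.0 escaping rules: &lt; (for “<”), &gt; (for “>”), &lt;= (for “<=”), etc • Can use logical operators • and • or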
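ElementTree's XPath subset cannot end a path in an attribute step or compare numbers, so the sketch below shows Python equivalents of the three expressions: reading @AddrType values, taking an element's string value, and a numeric > comparison. The XPath in each comment is what the Python line imitates; it is not evaluated by ElementTree.

```python
import xml.etree.ElementTree as ET

ORDER = """<CustomerOrder>
  <Customer>
    <Address AddrType="shipping">91 Park So, New York, NY 10018</Address>
    <Address AddrType="billing">Hauptstrasse 55, D-81671 Munich</Address>
  </Customer>
  <Orders><OrderNo>10</OrderNo><ProductNo>100</ProductNo><ProductNo>200</ProductNo></Orders>
</CustomerOrder>"""
root = ET.fromstring(ORDER)

# CustomerOrder/Customer/Address/@AddrType
addr_types = [a.get("AddrType") for a in root.findall("Customer/Address")]

# string(CustomerOrder/Orders/ProductNo[1])
first_product = root.find("Orders/ProductNo[1]").text

# CustomerOrder/Orders/ProductNo[. > 150]; XPath converts the text to a number
large = [p.text for p in root.findall("Orders/ProductNo") if float(p.text) > 150]

print(addr_types)     # ['shipping', 'billing']
print(first_product)  # 100
print(large)          # ['200']
```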
XSLT: Stylesheets & Transformations • What is XSLT? • The Basic Structure • Some Template Rules • More Advanced Structure • More Advanced Template Rules (or Features ;) • Transforming It All
What is XSLT? • Widely used and open standard defined by the W3C • A sub-specification of XSL • http://www.w3.org/TR/1999/REC-xslt-19991116 • Designed to be used independently of XSL • Designed primarily for the transformations needed in XSL • W3C defines XSLT as: • “a language for transforming XML documents into other XML documents” • XSLT is more than a language – it’s an XML programming language • Can have rules, evaluate conditions, etc • Offers the ability to transform one XML document into another • Transform an XDR Schema to an XSD Schema! • Transform an XML document into an HTML document
The XSLT Process – Overview [Figure: an XML source document (conforming to a source schema) and an XSLT style sheet feed the XSLT processor, which produces an XML target document (conforming to a target schema)]
Transformation Process Overview • Pass source document to an XSLT processor • Processor contains a loaded XSLT style-sheet • Processor then: • Loads the specified Stylesheet templates... • Traverses the source document, node by node... • Where a node matches a template... • Applies the template to the node • Outputs the (new) XML or HTML result document
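The template-matching loop described above can be imitated in a few lines. This Python sketch is not an XSLT engine: the "templates" are ordinary functions registered by element name (the template decorator and apply_templates are hypothetical names), and a built-in fallback rule, as in XSLT, outputs an element's text and processes its children when no template matches. Tail text after child elements is ignored for brevity.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}  # element name -> template function

def template(match):
    """Register a function as the template for elements named `match`."""
    def register(fn):
        TEMPLATES[match] = fn
        return fn
    return register

def apply_templates(node):
    """Apply the matching template, else the built-in rule."""
    rule = TEMPLATES.get(node.tag)
    if rule is not None:
        return rule(node)
    # Built-in rule: output the element's own text, then process its children.
    out = (node.text or "").strip()
    for child in node:
        out += apply_templates(child)
    return out

@template("OrderNo")
def order_no(node):
    return f"[order {node.text.strip()}]"

@template("ProductNo")
def product_no(node):
    return f"[product {node.text.strip()}]"

src = ET.fromstring("<Orders><OrderNo> 10 </OrderNo><ProductNo> 100 </ProductNo></Orders>")
print(apply_templates(src))  # [order 10][product 100]
```

Real XSLT matches templates with XPath patterns and resolves conflicts by priority; matching on the element name alone keeps the control flow visible.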
Process of “Transmutation”
Source XML:
<Orders>
  <OrderNo> 10 </OrderNo>
  <ProductNo> 100 </ProductNo>
  <ProductNo> 200 </ProductNo>
</Orders>
<Orders>
  <OrderNo> 20 </OrderNo>
  <ProductNo> 501 </ProductNo>
</Orders>
Processed by: XSLT Processor + XSLT Stylesheet
HTML result:
<HTML>
  <BODY>
    <TABLE border="3">
      <TR> <TD> 10 </TD> <TD> 100 </TD> </TR>
      <TR> <TD> 10 </TD> <TD> 200 </TD> </TR>
      <TR> <TD> 20 </TD> <TD> 501 </TD> </TR>
    </TABLE>
  </BODY>
</HTML>
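The result of this transformation can be reproduced without an XSLT stylesheet, which helps to see what the stylesheet's rules must do: one table row per ProductNo, repeating its order's OrderNo. The Python sketch below builds that HTML with ElementTree; orders_to_html is a hypothetical helper, and the CustomerOrder wrapper is assumed so the two Orders elements form one well-formed document.

```python
import xml.etree.ElementTree as ET

SOURCE = """<CustomerOrder>
  <Orders><OrderNo> 10 </OrderNo><ProductNo> 100 </ProductNo><ProductNo> 200 </ProductNo></Orders>
  <Orders><OrderNo> 20 </OrderNo><ProductNo> 501 </ProductNo></Orders>
</CustomerOrder>"""

def orders_to_html(xml_text):
    """Build an HTML table with one row per (OrderNo, ProductNo) pair."""
    root = ET.fromstring(xml_text)
    html = ET.Element("HTML")
    body = ET.SubElement(html, "BODY")
    table = ET.SubElement(body, "TABLE", border="3")
    for orders in root.iter("Orders"):
        order_no = orders.findtext("OrderNo").strip()
        for product in orders.findall("ProductNo"):
            row = ET.SubElement(table, "TR")
            ET.SubElement(row, "TD").text = order_no
            ET.SubElement(row, "TD").text = product.text.strip()
    return ET.tostring(html, encoding="unicode")

print(orders_to_html(SOURCE))
```

In XSLT the outer loop would typically be an xsl:for-each (or template) over Orders and the inner one over ProductNo, with the OrderNo value read from the enclosing Orders element.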