
Introduction to XML

Introduction to XML. Cheng-Chia Chen, September 2007. Contents: What is XML? Where does XML come from? What is its status? Why do we need XML? XML vs. other formats. Core XML specifications and APIs. What can we do with XML? XML sites.



  1. Introduction to XML Cheng-Chia Chen September 2007

  2. Contents • What is XML? • Where does XML come from? What is its status? • Why do we need XML? • XML vs. other formats • Core XML specifications and APIs • What can we do with XML? • XML sites • A partial list of XML applications and industry initiatives • A sketch of XML documents

  3. What is XML? • The eXtensible Markup Language. • A data-structure definition language: it lets you define the structure and format of your own data. • A data format (syntax) for representing, storing, and transmitting data whose structure is defined in XML. • A text-based markup language that lets you define your own HTML-like markup languages. • Recommended by the World Wide Web Consortium (W3C) in Feb. 1998. • Intended as a new message format for the Internet, compensating for the inadequacies of HTML. • A subset of SGML. • Now very popular; it has become the dominant format for information interchange over the Internet.

  4. The idea of XML • Existing student information • S9010 張得功 Computer Science, 3rd year, chang10@cs.nccu.edu.tw • S9021 王德財 Applied Mathematics, 2nd year, null • …

  5. HTML’s concerns • How to present the data: <TABLE BORDER=1 bgcolor="yellow"> <TR><TH>Student ID</TH> <TH>Name</TH> <TH>Department</TH> <TH>Year</TH> <TH>Email</TH> </TR> <TR><TD> S9010</TD><TD>張得功</TD> <TD>Computer Science</TD> <TD>3rd year</TD> <TD> chang10@cs.nccu.edu.tw </TD></TR> <TR> <TD> S9021 </TD> <TD>王德財</TD> <TD>Applied Mathematics</TD> <TD>2nd year</TD> </TR> </TABLE>

  6. XML’s concerns • XML uses markup tags as well, but they describe the content rather than the presentation of that content. • The same example coded in XML (the element names 學號, 姓名, 科系, 年級, 電郵 mean student ID, name, department, year, and email): <students> <student><學號> S9010 </學號> <姓名>張得功</姓名> <科系>Computer Science</科系> <年級>3rd year</年級> <電郵> chang10@cs.nccu.edu.tw </電郵> </student> <student><學號> S9021 </學號> <姓名>王德財</姓名> <科系>Applied Mathematics</科系> <年級>2nd year</年級><電郵/> </student> … </students> Notes: 1. Only content is encoded in the XML text. 2. Every piece of data is annotated by tags indicating its role or function in the message.
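Note 2 is what makes such a document easy for a program to consume. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the element names come from the slide, while the `load_students` helper and the dictionary layout are assumptions made only for this illustration.

```python
import xml.etree.ElementTree as ET

# A two-student document shaped like the one on the slide.
STUDENTS_XML = """
<students>
  <student>
    <學號>S9010</學號><姓名>張得功</姓名><科系>Computer Science</科系>
    <年級>3rd year</年級><電郵>chang10@cs.nccu.edu.tw</電郵>
  </student>
  <student>
    <學號>S9021</學號><姓名>王德財</姓名><科系>Applied Mathematics</科系>
    <年級>2nd year</年級><電郵/>
  </student>
</students>
"""


def load_students(xml_text):
    """Hypothetical helper: return one dict per <student>, keyed by tag name."""
    root = ET.fromstring(xml_text)
    records = []
    for student in root.findall("student"):
        # An empty element such as <電郵/> has no text; represent it as None.
        record = {
            child.tag: (child.text.strip() if child.text else None)
            for child in student
        }
        records.append(record)
    return records


students = load_students(STUDENTS_XML)
for s in students:
    print(s["學號"], s["姓名"], s["電郵"])
```

Because each value carries its own tag, the program looks fields up by name instead of relying on column position, which is what the HTML table version would force it to do.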

  7. Where does XML come from? • A simplified subset of the Standard Generalized Markup Language (SGML), standardized in 1986 and based on the Generalized Markup Language invented at IBM in 1969. • Simplified for more general use on the Web and as a data interchange format: • without losing extensibility • easier for anyone to write valid XML • easier to write a parser • easier for the parser to quickly verify that documents are well-formed and/or valid. • Version 1.0 recommended by the W3C in Feb. 1998. • Version 1.1 recommended in Feb. 2004.

  8. What is the status of XML? • A pervasive data format on the Internet as well as in other IT fields. • Embraced by all of the leaders in the computer industry: • IBM, Microsoft, Sun, Oracle, HP, … • Many vertical industries are embracing XML for its ability to expedite the availability of their domain-specific information for internal and external use. • There are many W3C-proposed extensions to XML. • Most are themselves written in XML syntax, which minimizes the differences in syntax that must be learned. • See • XML at W3C or • The XML Cover Pages • for the most up-to-date information.

  9. Why do we need XML ? or What can XML bring us?

  10. XML unifies the syntax of information • Layers of information (data): • bit • byte • character: BCD, EBCDIC, ASCII, BIG5, ISO-8859 ==> Unicode • syntax (form): XML • semantics (ontology): Semantic Web • application • Semantic Web: • an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. • --- Tim Berners-Lee et al.

  11. New requirements in the Internet age • Easy retrieval of information over the net • realized by current Web/Internet technology • good browsers • web servers • HTTP, DNS, search engines • HTML, URI, hypertext, MIME • Easy, cheap interoperation of existing software over the Internet • also the old goal of distributed systems/computing • RPC, RMI, CORBA, ... • a prerequisite for e-commerce • issues: • data transmission ==> solved by the existing Internet infrastructure • data representation?

  12. Why do we need a unifying format for data? • Case: n = 10 word processors, each of which must be able to process documents generated by any of the others. • 1st approach: • write a converter A-->B for every ordered pair A and B. • number of converters = n x (n-1) = 90 (bad!) • 2nd approach: • invent a common format (C). • write a pair of converters (A-->C, C-->A) for each word processor. • To process in B a document generated by A, simply chain two converters: • A --(A-->C)--> C --(C-->B)--> B • number of converters = 2 x n = 20 (much better!) • prerequisite: a common format is needed. • This is the role XML plays.
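The arithmetic on this slide can be checked with a short sketch. The function names `pairwise_converters` and `hub_converters` are made up for this illustration; the formulas are the ones stated above.

```python
def pairwise_converters(n):
    """Converters needed when every format converts directly to every other."""
    return n * (n - 1)


def hub_converters(n):
    """Converters needed when every format converts to and from one common format."""
    return 2 * n


for n in (2, 3, 4, 10, 50):
    print(f"n={n:2d}  pairwise={pairwise_converters(n):4d}  hub={hub_converters(n):3d}")
```

The pairwise count grows quadratically and the common-format count linearly. The two are equal at n = 3, so the common format starts to pay off from four formats onward.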

  13. Example: XML in EDA (Electronic Design Automation)

  14. Additional benefits of XML (as a common format) • Software for processing XML can be obtained for free (or cheaply) • no need to reinvent the wheel • developers can focus on value-added software built on this underlying software • Tightly coupled distributed systems can be decoupled into loosely coupled ones • less monopolization of software by vendors • more combinations for buyers to choose from • more opportunities for small companies to contribute software • less investment for buyers

  15. Application type of current World-Wide Web • Three-tier WWW architecture : • Major information flows (for human information retrieval): (human) browser --(http)--> webServer --> databases -->wrap result into html or other MIME formats --(http) ---> browser --> human • major interactions and interchanged data formats: • application type: information retrieval • Man ---(html/MIME)--- machine(browser+web server) • web server ------------ backend system (databases)

  16. [Figure: three-tier WWW architecture. Client browsers (IE, Firefox) send queries (POST, GET) over HTTP across the Internet to web servers (Apache, IIS), which draw on file systems and databases (query result tables) and return HTML/text, GIF/JPEG, and video/audio.]

  17. The other major WWW applications: business applications

  18. Additional interactions for WWW business applications • New application type: web service • Additional interactions: • backend business system <---> web server <--> web server <---> backend business system • Problem: too many data formats exist among the systems, and web servers that understand all of them are hard to implement. • Solution: define one universal data format, or a small set of them, in XML and require all systems to transmit data in such formats. • But aren't the existing HTML + MIME formats enough? • No! HTML, while amenable to humans via browsers, is not easy for machines to understand or to retrieve data from.

  19. Advantages of XML over HTML • XML lets you define your own tags. • XML tags describe the content rather than the presentation of that content: • easier content search (no annoying presentation data) • easier page development (content separated from view) • easy for devices to render the content according to their environment (single model, multiple views) • Notes for the next figure: • searches can be applied to XML data more easily, and the result can be rendered differently depending on the destination device. • the XML processor can exist on the server, the client, or both.

  20. Work done by the XML processor in response to a client request: • collect data from the related data sources • merge the sources into unified content • render the data according to the client’s environment.

  21. Comparison of XML and Other formats • HTML • discussed • Text-based non-markup formats • .c .cpp .java .ini … • Binary formats • .dll .exe .o .swf • .class .png .jpeg …

  22. Advantages of XML over text formats • Examples: • JavaML vs. Java; CppML vs. C++ • XMI vs. Rational's proprietary format • web.xml, plugin.xml vs. ***.ini (for configuration) • build.xml vs. makefile • XQuery in XML format vs. plain text format • RelaxNG in XML format vs. plain text format • Advantages: • structure is explicitly represented in the XML format. • (free and) standard tools and APIs exist for quick parsing of the XML format => front-end processing avoided or reduced • Disadvantage: too verbose • for storage and transmission: • can be overcome by compression • for human authoring (not a problem for machine generation): • requires a smarter editor • for human reading and comprehension: • a real problem!
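The claim that verbosity "can be overcome by compression" can be tried out with a small sketch. It assumes Python's standard `zlib` module and a synthetic, highly repetitive document invented for this example; real documents will compress by different amounts, so no particular ratio is claimed.

```python
import zlib

# Synthetic document: 1000 records that repeat the same tag names.
RECORD = "<student><id>S{0:04d}</id><name>student {0}</name><year>3</year></student>"
xml_text = "<students>" + "".join(RECORD.format(i) for i in range(1000)) + "</students>"

raw = xml_text.encode("utf-8")
packed = zlib.compress(raw)

print("raw bytes:       ", len(raw))
print("compressed bytes:", len(packed))

# Compression is lossless: the original text is recovered exactly.
restored = zlib.decompress(packed)
```

Repeated tag names are exactly the kind of redundancy a general-purpose compressor removes, which is why the storage and transmission cost is considered the least serious of the three disadvantages.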

  23. Advantages of XML over binary formats • Examples: • classML vs. the .class file format • swfml vs. swf (Flash file format) • XER vs. BER for ASN.1 • Advantages: • readable and editable • (free and) open software and APIs available • Disadvantage: • takes longer to parse and transmit • The trend: • one data model, multiple representation formats, plus converters among the formats

  24. Core specifications for XML • XML 1.0 • XML Namespaces • XML Path Language (XPath) • Extensible Stylesheet Language (XSL) • XSL Transformations (XSLT) • XSL Formatting Objects (XSL-FO) • XML Linking Language (XLink) • XML Pointer Language (XPointer) • XML Schema (; RelaxNG) • XHTML • XML Signature / Canonicalization • XML protocols • XForms • XQuery (XML Query Language, for querying XML documents)

  25. Core specifications for XML • XML • a specification defining which texts are well-formed XML documents. • Document Type Definition (DTD): a mechanism for defining the format and content of valid XML documents. • XML Namespaces • Defines a mechanism to avoid collisions of element and/or attribute names in documents that use multiple sets of DTDs. • XLink • Defines a mechanism for linking to web resources from an XML document. • XPointer • Defines a mechanism for linking to locations inside an XML document. • XPath • Defines a mechanism for referring to parts of an XML document.
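Two of these mechanisms, namespaces and XPath, can be sketched with Python's standard `xml.etree.ElementTree`, which implements only a limited subset of XPath. The document and the namespace URIs (`http://example.org/...`) are invented for this example.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define <title>; namespaces keep them apart.
DOC = """
<order xmlns:book="http://example.org/book"
       xmlns:person="http://example.org/person">
  <book:title>Introduction to XML</book:title>
  <person:title>Dr.</person:title>
  <item id="a1"><price>30</price></item>
  <item id="b2"><price>45</price></item>
</order>
"""

# Prefix-to-URI map used when resolving prefixed names in path expressions.
NS = {
    "book": "http://example.org/book",
    "person": "http://example.org/person",
}

root = ET.fromstring(DOC)

# Namespaces: the same local name resolves to two different elements.
book_title = root.find("book:title", NS).text
person_title = root.find("person:title", NS).text

# XPath (subset): select the price of the item whose id attribute is "b2".
price_b2 = root.find("./item[@id='b2']/price").text

print(book_title, "|", person_title, "|", price_b2)
```

A full XPath implementation supports many more axes and functions than the expressions used here.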

  26. XSL (Extensible Stylesheet Language) • A language for expressing stylesheets. • Consists of two parts: • XSLT: a language (in XML format) for describing how to transform an XML document into another document in XML or non-XML format. • XSL-FO: an XML vocabulary for specifying formatting semantics. • An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary.

  27. XML Schema • A planned replacement for DTDs. • Used to define the structure and format of messages encoded in XML. • A competing alternative: RelaxNG. • Consists of three documents: • Part 0: Primer • an easy-to-understand introduction • Part 1: Structures • specifies the XML Schema definition language, which offers facilities for describing the structure and constraining the contents of XML documents • Part 2: Datatypes • defines dozens of frequently used built-in datatypes

  28. APIs for XML documents • DOM (Levels 1, 2 & 3): • Document Object Model • tree-based XML API • language independent • SAX (versions 1 & 2): • Simple API for XML • event-based XML API • JDOM, dom4j, XOM (XML APIs for Java) • DOM-like APIs for Java • tree-based • simpler than DOM • easier to use than DOM • suitable for Java only
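The difference between the tree-based and event-based styles can be sketched with the DOM and SAX bindings in Python's standard library (`xml.dom.minidom` and `xml.sax`). The sample document and the `StudentHandler` class are invented for this illustration; the slide's Java-only libraries are not shown.

```python
import xml.dom.minidom
import xml.sax

DOC = "<students><student>A</student><student>B</student><student>C</student></students>"

# Tree-based (DOM): the whole document is loaded into memory, then navigated.
dom = xml.dom.minidom.parseString(DOC)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("student")]


# Event-based (SAX): the parser calls back as it streams through the text.
class StudentHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._in_student = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "student":
            self._in_student = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_student:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "student":
            self.names.append("".join(self._buffer))
            self._in_student = False


handler = StudentHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_names = handler.names

print(dom_names, sax_names)
```

The DOM version is shorter and allows random access, while the SAX version never holds the full tree, which matters for documents too large to fit in memory.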

  29. How can XML be used? XML was designed to store, carry, and exchange data. It was not designed to display data. As a syntax format: • XML is used to exchange data • With XML, data can be exchanged between incompatible systems. • XML and B2B: with XML, financial information can be exchanged over the Internet. • XML can be used to share data • With XML, plain text files can be used to share data. • XML can be used to store data • With XML, plain text files can be used to store data and objects. As a meta-language (for defining data structures): • XML can be used to create new languages • XML is the mother of WML, SVG, SMIL, GXL, XHTML, CML, ...
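Storing and exchanging data as XML amounts to a round trip: one system serializes its objects to text, and another parses that text back. A minimal sketch follows, assuming Python's standard `xml.etree.ElementTree`; the order vocabulary (`orders`, `order`, `product`, `quantity`) and both helper functions are invented for this example and do not correspond to any standard format.

```python
import xml.etree.ElementTree as ET


def to_xml(orders):
    """Hypothetical sender side: serialize a list of order dicts to XML text."""
    root = ET.Element("orders")
    for order in orders:
        node = ET.SubElement(root, "order", id=order["id"])
        ET.SubElement(node, "product").text = order["product"]
        ET.SubElement(node, "quantity").text = str(order["quantity"])
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Hypothetical receiver side: rebuild the order dicts from XML text."""
    root = ET.fromstring(text)
    return [
        {
            "id": node.get("id"),
            "product": node.findtext("product"),
            "quantity": int(node.findtext("quantity")),
        }
        for node in root.findall("order")
    ]


orders = [
    {"id": "o1", "product": "keyboard", "quantity": 2},
    {"id": "o2", "product": "monitor", "quantity": 1},
]

wire_text = to_xml(orders)      # plain text that can be stored or transmitted
received = from_xml(wire_text)  # what the other system reconstructs
print(wire_text)
```

The two sides share nothing except the agreed element names, which is the sense in which XML lets otherwise incompatible systems exchange data.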

  30. XML can make your Data more Useful • With XML, your data is available to more users. • For sensible developers • All sensible developers should have all their future applications exchange data in XML.

  31. What can we do with XML? • XML processing tools: • XML parsers; XML editors; converters between XML and existing formats • XML2HTML; DTD2DCD; DCDeditor • Various domain-specific XML rendering tools • graphical XML --> graphics • DTD managers, schema tools, SOAP processors, web service tools/IDEs/systems • XML-enabled services and applications: • make your application software capable of serving requests from the Internet (without special prerequisites) and of requesting other online Internet services.

  32. What can we do with XML? • XML document design and application development • Design standard XML formats for various domains • orders, transactions, billing, and products in the business domain • mathematical and chemical formulas in science • graph/graphics markup languages; others? • academic artifacts: OO design (XMI), graphs (GXL), Petri nets, Java objects (XML encoding), ASTs, ... • requires cooperation between XML experts and domain experts. • XMLize legacy system data and databases • domains: general enterprises (personnel, inventory, customers, products, product manuals, official documents); hospitals, schools, and government agencies (household registration, land administration, taxation, ...): medical records, drugs, courses, household registers, land registers, tax records • Approaches: • change the old format to a new XML format and, optionally, provide a view in the old format. • let the two formats coexist. • preserve the old format and provide a new XML view.

  33. XML information • Java • Sun’s Java site: (http://java.sun.com/) • The Java Tutorial (http://java.sun.com/docs/books/tutorial/) is a nice book to begin with. • Information sources for XML: • W3C site: http://www.w3.org/ • SGML/XML home page: http://xml.coverpages.org/ • XML.com: http://www.xml.com/ • XML pages of leading computer companies • Microsoft: http://www.microsoft.com/xml/ • IBM: http://www.ibm.com/developer/xml/ • Sun: http://java.sun.com/xml • …

  34. XML applications • XML as an alternative representation format • SVG (Scalable Vector Graphics): for vector graphics • MathML: for mathematical expressions • SMIL (Synchronized Multimedia Integration Language) • Resource Description Framework (RDF): an XML language for describing web resources and their relationships • CML (Chemical Markup Language): for chemical molecules • JCML: an XML format for Java bytecode (object code) • JavaML: for Java programs • CppML: an XML format for C++ • Ant: a replacement for make for Java • OOML: an OO programming language in XML • UIML: User Interface Markup Language • WAP WML (Wireless Markup Language)

  35. A partial list of XML applications and industry initiatives • W3C Specifications Documentation • Text Encoding Initiative (TEI) • XCES: Corpus Encoding Standard for XML • Encoding and Markup for Texts of the Ancient Near East • Electronic Text Corpus of Sumerian Literature (ETCSL) • Perseus Project • Channel Definition Format, CDF (Based on XML) • RDF Rich Site Summary (RSS) • Open Content Syndication (OCS) • Web Modeling Language (WebML) • Portable Site Information (PSI) • XHTML and 'XML-Based' HTML Modules • W3C Document Object Model (DOM), Level 1 Specification • Web Collections using XML • Meta Content Framework Using XML (MCF) • XML-Data • Namespaces in XML • Resource Description Framework (RDF) • Ontology Interchange Language (OIL) • The Australia New Zealand Land Information Council (ANZLIC) - Metadata • Alexandria Digital Library Project • ATLA Serials Project (ATLAS)

  36. XML in law • BiblioML - XML for UNIMARC Bibliographic Records • Medlane XMLMARC Experiment - MARC to XML • e-Government Interoperability Framework (e-GIF) • US Federal CIO Council XML Working Group • XML Metadata Interchange Format (XMI) - Object Management Group (OMG) • OMG Common Warehouse Metadata Interchange (CWMI) Specification • Object Management Group XML/Value RFP • MDC Open Information Model (OIM) • Dublin Core Metadata Initiative (DCMI) • Open Archives Metadata Set (OAMS) • Publishing Requirements for Industry Standard Metadata (PRISM) • Platform for Internet Content Selection (PICS) • XML and Petri Nets • Outline Processor Markup Language (OPML) • ParlML: A Common Vocabulary for Parliamentary Language • Legal XML Working Group • COSCA/NACM JTC XML Court Filing Project • New Mexico District Court XML Interface (XCI)

  37. XML and multimedia • Synchronized Multimedia Integration Language (SMIL) • Multimodal Presentation Markup Language (MPML) • Moving Picture Experts Group: MPEG-7 Standard • DIG35: Metadata Standard for Digital Images • W3C Scalable Vector Graphics (SVG) • Precision Graphics Markup Language (PGML) • Vector Markup Language (VML) • Image Markup Language (IML) • VRML (Virtual Reality Modeling Language) and X3D • Extensible Graph Markup and Modeling Language (XGMML) • Structured Graph Format (SGF) • Graph Exchange Language (GXL) • Petri Net Markup Language (PNML)

  38. XML in chemistry and biochemistry • Georgia State University Electronic Court Filing Project • Web Standards Project (WSP) • Open Software Description Format (OSD) • XLF (Extensible Log Format) Initiative • ALURe (Aggregation and Logging of User Requests) XML Specification • Apache XML Project • WAP Wireless Markup Language Specification • The SyncML Initiative • Materials Property Data Markup Language (MatML) • Measurement Units Markup Language • XML-Based 'eStandard' for the Chemical Industry • Chemical Markup Language • Molecular Dynamics [Markup] Language (MoDL) • StarDOM - Transforming Scientific Data into XML • Bioinformatic Sequence Markup Language (BSML) • BIOpolymer Markup Language (BIOML) • CellML • Gene Expression Markup Language (GEML) • Genome Annotation Markup Elements (GAME)

  39. XML and Finance • Microarray Markup Language (MAML) • XML for Multiple Sequence Alignments (MSAML) • Systems Biology Markup Language (SBML) • OMG Gene Expression RFP • Taxonomic Markup Language • XDELTA: XML Format for Taxonomic Information • Virtual Hyperglossary (VHG) • Weather Observation Definition Format (OMF) • Open Philanthropy Exchange (OPX) • Open Financial Exchange (OFX/OFE) • Interactive Financial Exchange (IFX) • FinXML - 'The Digital Language for Capital Markets' • Investment Research Markup Language (IRML) • Extensible Financial Reporting Markup Language (XFRML) • Extensible Business Reporting Language (XBRL) • XMLPay Specification • Trading Partner Agreement Markup Language (tpaML) • Internet Open Trading Protocol (IOTP) • Financial Products Markup Language (FpML)

  40. XML messaging ( or XML Protocols) • XML Mail Transport Protocol (XMTP) for XML SMTP and MIME Representation • HTML Threading - Use of HTML in Email • XML Messaging (IETF) • Jabber XML Protocol • XML Messaging Specification (XMSG) • M Project: Java XML-Based Messaging System • HTTP Distribution and Replication Protocol (DRP) • Information and Content Exchange (ICE)

  41. FAML DTD for Financial Research Documents • Mortgage Bankers Association of America MISMO Standard • Digital Property Rights Language (DPRL) • Extensible Rights Markup Language (XrML) • Open Digital Rights Language (ODRL) • Research Information Exchange Markup Language (RIXML) • Data Link for Intermediaries Markup Language (daliML) • XML-MP: XML Mortgage Partners Framework • EcoKnowMICS ML • Electronic Book Exchange (EBX) Working Group • FIXML - A Markup Language for the FIX Application Message Layer • Bank Internet Payment System (BIPS) • smartX ['SmartCard'] Markup Language (SML)

  42. Secure XML • XML and Encryption • XML Digital Signature (Signed XML - IETF/W3C) • XML Key Management Specification (XKMS) • Security Services Markup Language (S2ML) • AuthXML Standard for Web Security • Digital Signatures for Internet Open Trading Protocol (IOTP) • XML Encoding of SPKI Certificates • Digital Receipt Infrastructure Initiative • Digest Values for DOM (DOMHASH) • Signed Document Markup Language (SDML)

  43. Real Estate Transaction Markup Language (RETML) • OpenMLS and RELML (Real Estate Listing Markup Language) • Data Consortium (Real Estate Standards) • Comprehensive Real Estate Transaction Markup Language (CRTML) • ACORD - XML for the Insurance Industry • iLingo XML Schemas for Insurance • Customer Profile Exchange (CPEX) Working Group • Customer Support Consortium • XML for the Automotive Industry - SAE J2008 • Spacecraft Markup Language (SML) • XML.ORG - The XML Industry Portal • X-ACT - XML Active Content Technologies Council • Electronic Business XML Initiative (ebXML) • BASDA eBIS-XML • Portal Markup Language (PML) • EDGARspace Portal • DII Common Operating Environment (COE) XML Registry • StarOffice XML File Format • Open eBook Initiative • ONIX International XML DTD • NISO Digital Talking Books (DTB)

  44. OpenMath Standard • OMDoc: A Standard for Mathematical Documents • Mathematical Markup Language • Re-Useable Data Language (RDL) • OpenTag Markup • Metadata - PICS • MIX - Mediation of Information Using XML • CDIF XML-Based Transfer Format • Covad xLink API (XML-Based DSL Provisioning) • WebBroker: Distributed Object Communication on the Web • Web Interface Definition Language (WIDL) • Global Engineering Networking Initiative (GEN) • XML/EDI - Electronic Data Interchange • XML/EDI Repository Working Group

  45. Global Uniform Interoperable Data Exchange (GUIDE) • BizCodes Initiative • Universal Data Element Framework (UDEF) • European XML/EDI Workshop • EEMA EDI/EC Work Group - XML/EDI • ANSI ASC X12/XML and DISA • OpenTravel Alliance (OTA) • Hospitality Industry Technology Integration Standards (HITIS) Project • Open Catalog Protocol (OCP) • eCatalog XML (eCX) • vCard Electronic Business Card • Customer Identity / Name and Address Markup Language (CIML, NAML) • AND Global Address XML Definition • Historical Event Markup and Linking • iCalendar XML DTD

  46. EC FrameWorks • CommerceNet Industry Initiative • eCo Interoperability Framework Specification • BizTalk Framework • eCo Framework Project and Working Group • Commerce XML (cXML) • SMBXML: An Open Standard for Small to Medium Sized Businesses • RosettaNet

  47. XML Encoded Form Values • Capability Card: An Attribute Certificate in XML • Telecommunications Interchange Markup (TIM, TCIF/IPI) • aecXML Working Group - Architecture, Engineering and Construction • Building Construction Extensible Markup Language (bcXML) • MasterBuilder Construction Management and Accounting • Green Building XML (gbXML) • Product Data Markup Language (PDML) • Product Definition Exchange (PDX) • Electronic Component Information Exchange (ECIX) and Pinnacles Component Information Standard (PCIS) • ECIX QuickData Specifications • ECIX Component Information Dictionary Standard (CIDS) • ECIX Timing Diagram Markup Language (TDML) • XML and Electronic Design Automation (EDA) • Encoded Archival Description (EAD) • UML eXchange Format (UXF) • XML Data Binding Specification • Translation Memory eXchange (TMX) • P3P Specification: Platform for Privacy Preferences • Extensible Name Service (XNS) • Dialogue Moves Markup Language (DMML)

  48. Scripting News in XML • InterX.org Initiative • Document Encoding and Structuring Specification for Electronic Recipe Transfer (DESSERT) • NuDoc Technology • Coins: Tightly Coupled JavaBeans and XML Elements • DMTF Common Information Model (CIM) • Universal Plug and Play Forum • XML Transition Network Definition (XTND) • Process Interchange Format XML (PIF-XML) • (XML) Topic Maps • DARPA Agent Mark Up Language (DAML) • Rule Markup Language (RuleML) • Relational-Functional Markup Language (RFML) • Ontology and Conceptual Knowledge Markup Languages • Information Flow Framework Language (IFF) • Simple HTML Ontology Extensions (SHOE) • XOL - XML-Based Ontology Exchange Language • Description Logics Markup Language (DLML) • Case Based Markup Language (CBML) • Artificial Intelligence Markup Language (AIML) • Physics Markup Language (PhysicsML)

  49. Procedural Markup Language (PML) • QAML - The Q&A Markup Language • LACITO Projet Archivage de données linguistiques sonores et textuelles [Linguistic Data Archiving Project] • Geography Markup Language (GML) • LandXML • Navigation Markup Language (NVML) • Extensible Data Format (XDF) • Gemini Observatory Project • NASA Goddard Astronomical Data Center (ADC) 'Scientific Dataset' XML • Extensible Scientific Interchange Language (XSIL) • Object Oriented Data Technology (OODT) and XML • Astronomical Markup Language • Astronomical Instrument Markup Language (AIML) • GedML: [GEDCOM] Genealogical Data in XML • adXML.org: XML for Advertising • Newspaper Association of America (NAA) - Standard for Classified Advertising Data • News Industry Text Format (NITF) • XMLNews: XMLNews-Story and XMLNews-Meta • NewsML and IPTC2000 • News Markup Language (NML) • Notes Flat File Format (NFF)
