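XML: A New Milestone in Document Integration Technology
Ching-Long Yeh
Department of Computer Science and Engineering, Tatung University
Email: chingyeh@cse.ttu.edu.tw
URL: http://www.cse.ttu.edu.tw/~chingyeh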
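Contents
• XML basic concepts, background, and applications
• Defining the structure of XML documents with a DTD
• XML Schema
• How people interact with documents
• XML-related standards: XSL, DOM, XML Query, XLink, SOAP
• XML applications in EB (electronic business)
• UDDI: integrating web services
• References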
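Document Structure, Content, and Format
• Document structure is the way a document's components are put together
  • e.g., the chapters, sections, and paragraphs of an article
  • the fields of a purchase requisition
  • the introduction, assembly procedure, parts list, troubleshooting guide, and index of a bicycle assembly manual
• Document content is the actual data inside the document
  • e.g., the text and figures of the bicycle assembly manual
• Document format is the visual presentation of the document's components to the reader
  • e.g., bold, italics, indentation, paragraph spacing, tables
• Document structure and document format are easily confused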
Document Structure, Content, and Format
Example: the same product advisory in two renderings
    PRODUCT ADVISORY
    Number: 146  Type: Parts  Date: 8/15/95
    Subject: Revised Replacement Parts ...
    Model 501 User Replaceable Parts
    The parts list identified in the AnyCorp Model 501 ...
    New Parts List
    1. 345-234 (Filter, cooling fan)
    2. 148-745 (Fuse, power: 1.5amp)
    3 ...

    Product Advisory
    Number: 146  Type: Parts  Date: 8/15/95  Revised:
    Subject: Revised Replacement ...
    Model 501 User-Replaceable Parts
    The parts list identified in the ...
    New Parts List
    1. 345-234 (Filter, cooling fan)
    2. 148-745 (Fuse, power: 1.5amp)
    3. ...
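Document Structure, Content, and Format
• MS Word focuses on document formatting
  • It uses formatting features to present document structure visually to the reader
  • People can read the result, but a computer cannot easily determine the document's content
• HTML inserts formatting into the document as markup and adds Internet capabilities
  • A browser renders it visually and makes it easy to navigate the web
  • The rendered result is similar to a Word document
• XML is a markup language. The markup in an XML document represents the document's structural information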
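HTML Compared with XML
• HTML
  • An application of SGML on the Internet
  • A technology for presenting data
  • A fixed, non-extensible set of tags
• XML
  • A simplified subset of SGML plus Internet capabilities
  • Represents document content and structure
  • Tags can be defined as needed (a meta-language)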
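Why Use XML
• XML can express the rich structural information inside a document, which benefits applications on the network
• HTML is limited to a fixed set of tags and cannot express arbitrary structures
• SGML can express arbitrary structures, but it is too complex and too costly to produce to be practical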
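A Brief History of XML
• At a 1996 meeting in Seattle, SGML experts explored how to combine SGML with the web
• Jon Bosak of Sun Microsystems led the discussion in two directions
  • The shortcomings of HTML as an information format
  • The deficiencies of SGML as a standard for web applications
• "SGML on the web" activity, July 1996
  • Work began on adapting SGML to suit the web
• 10 February 1998: XML 1.0 is born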
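Goals of XML
• Straightforward to use over the Internet
• Support for a wide variety of applications
• Compatible with SGML
• Easy to write programs that process XML documents
• Optional features kept to a minimum
• Human-legible
• The XML design should be prepared quickly
• The XML design should be clear and formal
• XML documents should be easy to create
• Terseness is not an important consideration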
XML Applications
• Electronic commerce
• Electronic data interchange (EDI)
• Fine-grain content publishing
• Internet search engines
• Distributed application design
• etc.
The XML Catalog
http://www.xml.org/xmlorg_registry/index.shtml
• The XML Catalog lists organizations known to be producing industry-specific or cross-industry XML specifications.
• Since XML activity is growing quickly, the list is likely to be incomplete. We would appreciate your sending us any updates or additions.
Complete list of the XML Catalog
• Accounting • Advertising • Architecture and Construction • Automotive • Aviation and Aerospace • Bibliographies • Catalogs • Communication • Computer Graphics • Content Syndication • CRM - Customer Relationship Management • Data Mining • Defense Aerospace • Directory Services • Distributed Management • Economics • Education • Electronic Commerce • EDI - Electronic Data Interchange • Energy • Enterprise Information Portals • ERP - Enterprise Resource Planning • Financial and Capital Markets • Food • Forms • Geography • Healthcare • Human Resources • Industrial Automation • Insurance • Legal • Middleware • Music • News • Publishing • Real Estate • Retail • Science • Software • SCM - Supply Chain Management • Translation and NLP • Travel • User Interface • Voice • Weather • Web Applications • Workflow
Defining XML Documents
• The structure (type) of an XML document can be specified by a document type definition (DTD) or by a schema
• A DTD specifies the rules of the document structure through element, attribute, and entity declarations
Defining XML Documents
    <!DOCTYPE label [
    <!ELEMENT label (name,street,city,state,country,code)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT street (#PCDATA)>
    <!ELEMENT city (#PCDATA)>
    <!ELEMENT state (#PCDATA)>
    <!ELEMENT country (#PCDATA)>
    <!ELEMENT code (#PCDATA)>
    ]>
    <label>
    <name>Rock N. Robyn</name>
    <street>Jay Bird Street</street>
    <city>Baltimore</city>
    <state>MD</state>
    <country>USA</country>
    <code>43214</code>
    </label>
Well-Formed and Valid Documents
• XML has two different notions of "correct."
• Valid documents
  • Declaring conformance to a DTD in a document type declaration
  • "Using the right words in the right place"
  • Type-valid
• Well-formed documents
  • Markup is intelligible.
  • "Getting the pronunciation right"
  • Non-type-valid
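The well-formedness half of this distinction can be checked with a non-validating parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the two sample strings are illustrative assumptions. Note that this parser does not check validity against a DTD, so it only answers the "pronunciation" question.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Illustrative inputs: XML names are case-sensitive, so <State>...</state>
# is a start tag and an end tag that do not match.
good = "<label><name>Rock N. Robyn</name><state>MD</state></label>"
bad = "<label><name>Rock N. Robyn</name><State>MD</state></label>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document can pass this check and still be invalid, for example by containing two `<state>` elements where its DTD allows only one.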
DTD Syntax
• Seven major headings:
  • document type declarations
  • element types
  • attributes
  • entities
  • notations
  • conditional sections
  • processing instructions
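Document Type Declaration
• The document type declaration defines the rules for the document's logical structure, along with related entity declarations
• It governs both the logical and the physical (entity) structure of the document
• The rules can be contained in the document type declaration itself, stored in an external file, or both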
Document Type Declaration
Rules contained in the declaration:
    <!DOCTYPE label [
    <!ELEMENT label (name,street,city,state,country,code)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT street (#PCDATA)>
    <!ELEMENT city (#PCDATA)>
    <!ELEMENT state (#PCDATA)>
    <!ELEMENT country (#PCDATA)>
    <!ELEMENT code (#PCDATA)>
    ]>
    <label>
    <name>Rock N. Robyn</name>
    <street>Jay Bird Street</street>
    <city>Baltimore</city>
    <state>MD</state>
    <country>USA</country>
    <code>43214</code>
    </label>
Rules stored in an external file:
    <?xml version="1.0"?>
    <!DOCTYPE LABEL SYSTEM "http://www.sgmlsource.com/dtds/label.dtd">
    <LABEL> . . . </LABEL>
Document Type Declaration
Rules stored in an external file:
    <?xml version="1.0"?>
    <!DOCTYPE LABEL SYSTEM "http://www.sgmlsource.com/dtds/label.dtd">
    <LABEL> . . . </LABEL>
Rules in an external file plus declarations in the document itself:
    <!DOCTYPE GARAGESALE SYSTEM "garage.dtd" [
    <!ENTITY LOGO "logo.gif">
    ]>
    <GARAGESALE> . . . </GARAGESALE>
Element Type Declaration
• Elements make up the logical structure of an XML document
Element Type Declaration
    [45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
    [46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
Element-content Models
    [47] children ::= (choice | seq) ('?' | '*' | '+')?
    [48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
    [49] choice ::= '(' S? cp ( S? '|' S? cp )* S? ')'
    [50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'
Element Type Declaration
Element content:
    <!ELEMENT spec (front, body, back?)>
    <!ELEMENT div1 (head, (p | list | note)*, div2*)>
    <!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>
Mixed content:
    <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
    <!ELEMENT b (#PCDATA)>
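An element-content model behaves like a regular expression over the names of an element's children: `,` is sequence, `|` is choice, and `?`, `*`, `+` are the usual occurrence indicators. As a rough sketch of that correspondence, the `spec` and `div1` models above are translated by hand into Python regular expressions below. The encoding (each child name followed by one space) and the helper name `matches` are assumptions made for illustration; a real validating parser does not necessarily work this way.

```python
import re

# Hand-translated content models; each child name is followed by a space
# so that names cannot run into each other (an illustrative encoding).
SPEC_MODEL = re.compile(r"front body (back )?")                 # (front, body, back?)
DIV1_MODEL = re.compile(r"head ((p|list|note) )*(div2 )*")      # (head, (p|list|note)*, div2*)

def matches(model, children):
    """Check a list of child element names against a compiled content model."""
    encoded = "".join(name + " " for name in children)
    return model.fullmatch(encoded) is not None

print(matches(SPEC_MODEL, ["front", "body"]))            # back is optional
print(matches(SPEC_MODEL, ["body", "front"]))            # wrong order
print(matches(DIV1_MODEL, ["head", "p", "note", "div2"]))
print(matches(DIV1_MODEL, ["head", "div2", "p"]))        # p may not follow div2
```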
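Attributes
• Attributes give an element additional information (meta-data), such as a security level, revision status, or identifier
    <!ATTLIST sample
        id     ID            #IMPLIED
        n      CDATA         #REQUIRED
        status (draft|final) "final">
Each declaration gives, in order: the attribute name, the attribute value type, and the default value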
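To see a declared default in action, the sketch below places the `sample` attribute list in an internal DTD subset and parses a document that omits `status`. It assumes Python's standard `xml.etree.ElementTree`, whose underlying expat parser is non-validating but still reads the internal subset and supplies attribute defaults. The `<!ELEMENT sample ...>` line and the document content are made up for the example.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset with the attribute list from the slide; the element
# declaration and the instance below are illustrative assumptions.
DOC = """<!DOCTYPE sample [
<!ELEMENT sample (#PCDATA)>
<!ATTLIST sample
    id     ID            #IMPLIED
    n      CDATA         #REQUIRED
    status (draft|final) "final">
]>
<sample n="1">text</sample>"""

root = ET.fromstring(DOC)
print(root.get("n"))       # given explicitly in the document
print(root.get("status"))  # not in the document; supplied from the ATTLIST default
print(root.get("id"))      # #IMPLIED and absent, so there is no value
```

Because the parser is non-validating, it would not complain if the `#REQUIRED` attribute `n` were missing; enforcing that takes a validating parser.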
Entities
• There are two kinds of entities
  • general entities: used within the document content (the top-level element and everything inside it) and in attribute values.
  • parameter entities: used within the internal and external DTD subsets.
Entities: General Entities
Declaration:
    <!ENTITY xml "Extensible Markup Language">
Use in the document:
    <para>The &xml; is derived from ISO 8879, an International Standard<index label="&xml;"/></para>
Result after expansion:
    <para>The Extensible Markup Language is derived from ISO 8879, an International Standard<index label="Extensible Markup Language"/></para>
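The expansion can be observed directly with a parser. This is a small sketch using Python's standard `xml.etree.ElementTree`, which expands general entities declared in the internal DTD subset; the `xml` entity and the `para`/`index` elements follow the slide, while wrapping them in a `<!DOCTYPE para [...]>` declaration is an assumption made to get a complete document.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal subset and referenced both in
# element content and in an attribute value.
DOC = """<!DOCTYPE para [
<!ENTITY xml "Extensible Markup Language">
]>
<para>The &xml; is derived from ISO 8879, an International Standard<index label="&xml;"/></para>"""

para = ET.fromstring(DOC)
print(para.text)                       # entity replaced inside the text
print(para.find("index").get("label")) # entity replaced inside the attribute
```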
Entities: Parameter Entity
Declaration and use:
    <!ENTITY % inline "#PCDATA|emphasis|link">
    <!ELEMENT para (%inline;)*>
Result after expansion:
    <!ELEMENT para (#PCDATA|emphasis|link)*>
Notations
• Notations are used to include non-XML content (graphics, sounds, video, source-code listings) in XML documents.
• While the XML parser knows nothing about the specific notations, it can pass them on to the processing software to let it know what kinds of data to handle.
    <!NOTATION TeX PUBLIC "+//ISBN 0-201-13448-9::Knuth//NOTATION The TeXbook//EN">
Conditional Sections
• In the external DTD subset and in external parameter entities, XML allows conditional sections that the parser can include or ignore, depending on the keyword at the start.
    <![IGNORE[
    <!ELEMENT para (#PCDATA)>
    ]]>
• Overriding a parameter entity
  In the document:
    <!DOCTYPE book SYSTEM "book.dtd" [
    <!ENTITY % include-para "INCLUDE">
    ]>
  In the external DTD subset:
    <!ENTITY % include-para "IGNORE">
    <![%include-para;[
    <!ELEMENT para (#PCDATA)>
    ]]>
Processing Instructions
• The XML parser passes PIs on to your application, but it is up to you to do something useful with them.
    <?IS10744:arch name="abc"?>
Example: A DTD for B2B EC
• RosettaNet PIP 3A2 Price and Availability Query, Version 1.2
• Available at http://www.rosettanet.org
XML Schema
• The new XML Schema system aims at providing a rich grammatical structure for XML documents that overcomes the limitations of the DTD.
Limitations of DTD
• XML inherited DTDs from SGML.
• DTDs can be used to define content models and, to a limited extent, the datatypes of attributes, but they have a number of obvious limitations:
  • different (non-XML) syntax
  • no support for namespaces
  • extremely limited datatyping
  • a complex and fragile extension mechanism based on little more than string substitution (no explicit relationship)
What is a Schema?
• A schema is a model for describing the structure of information.
• In the context of XML, a schema describes a model for a whole class of documents.
• A schema might also be viewed as an agreement on a common vocabulary for a particular application that involves exchanging documents.
What is a Schema?
• In schemas, models are described in terms of constraints.
• There are two kinds of constraints:
  • content model constraints describe the order and sequence of elements
  • datatype constraints describe valid units of data
What is a Schema?
• For example, a schema might describe a valid <address> with the content model constraint that
  • it consists of a <name> element, followed by
  • one or more <street> elements, followed by
  • exactly one <city>, <state>, and <zip> element.
• The content of a <zip> might have a further datatype constraint that it consist of either a sequence of exactly five digits or a sequence of five digits, followed by a hyphen, followed by a sequence of exactly four digits. No other text is a valid ZIP code.
An invalid <address> (two <state> elements, and a <zip> that is not a ZIP code):
    <address>
    <name>Namron H. Slaw</name>
    <street>256 Eight Bit Lane</street>
    <city>East Yahoo</city>
    <state>MA</state>
    <state>CT</state>
    <zip>blue</zip>
    </address>
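The two kinds of constraints can be sketched in a few lines of Python. This is not XML Schema itself, only an illustration of what a schema processor would enforce for the <address> example: the content model is written as a regular expression over child names, and the ZIP datatype as a regular expression over text. The function name `check_address`, the message strings, and the valid sample are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Content model: name, one or more street, then exactly one city, state, zip.
# Each child name is followed by a space (an illustrative encoding).
CONTENT_MODEL = re.compile(r"name (street )+city state zip ")
# Datatype: five digits, optionally followed by a hyphen and four digits.
ZIP_TYPE = re.compile(r"[0-9]{5}(-[0-9]{4})?")

def check_address(xml_text):
    """Return a list of constraint violations (empty if the address is valid)."""
    address = ET.fromstring(xml_text)
    problems = []
    child_names = "".join(child.tag + " " for child in address)
    if not CONTENT_MODEL.fullmatch(child_names):
        problems.append("content model violated")
    for zip_el in address.findall("zip"):
        if not ZIP_TYPE.fullmatch(zip_el.text or ""):
            problems.append("zip datatype violated")
    return problems

INVALID = ("<address><name>Namron H. Slaw</name>"
           "<street>256 Eight Bit Lane</street><city>East Yahoo</city>"
           "<state>MA</state><state>CT</state><zip>blue</zip></address>")
# Hypothetical corrected version: one state and a ZIP+4 code.
VALID = ("<address><name>Namron H. Slaw</name>"
         "<street>256 Eight Bit Lane</street><city>East Yahoo</city>"
         "<state>MA</state><zip>12345-6789</zip></address>")

print(check_address(INVALID))
print(check_address(VALID))
```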
Features of Schema
• Richer datatypes
  • booleans, numbers, dates and times, URIs, integers, decimal numbers, real numbers, intervals of time, etc.
• User-defined types
• Attribute grouping
• Refinable archetypes
• Namespace support
Validity
• Reasons to validate documents:
  • EC: confirm that what you received is exactly what you expect.
  • B2B: validate data before inserting it into your database.
  • XML documents used for control purposes
• Content model validity tests whether the order and nesting of tags are correct.
• Datatype validity tests whether specific units of information are of the correct type and fall within the specified legal values.
Illustrations of XML Schema
An XML document fragment
    <InvoiceNo>123456789</InvoiceNo>
    <ProductID>J123456</ProductID>
DTD fragment describing the above elements
    <!ELEMENT InvoiceNo (#PCDATA)>
    <!ELEMENT ProductID (#PCDATA)>
XML Schema fragment describing the above elements
    <element name='InvoiceNo' type='positive-integer'/>
    <element name='ProductID' type='ProductCode'/>
    <simpleType name='ProductCode' base='string'>
      <pattern value='[A-Z]{1}\d{6}'/>
    </simpleType>
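The difference between the two fragments is that `(#PCDATA)` accepts any text, while the schema types reject text of the wrong shape. The sketch below mimics the two datatype checks with Python regular expressions, assuming the `ProductCode` pattern is meant to be one uppercase letter followed by six digits and that a positive integer is a digit string greater than zero. The function names are hypothetical, and this only illustrates the constraint, not a schema processor.

```python
import re

# Assumed reading of the ProductCode pattern: one uppercase letter, six digits.
PRODUCT_CODE = re.compile(r"[A-Z][0-9]{6}")
# Assumed reading of positive-integer: optional '+', digits, value above zero.
POSITIVE_INTEGER = re.compile(r"\+?0*[1-9][0-9]*")

def is_product_code(text):
    """True if `text` has the shape required of a ProductID."""
    return PRODUCT_CODE.fullmatch(text) is not None

def is_positive_integer(text):
    """True if `text` has the shape required of an InvoiceNo."""
    return POSITIVE_INTEGER.fullmatch(text) is not None

print(is_product_code("J123456"))        # the value from the document fragment
print(is_product_code("123456"))         # accepted by (#PCDATA), rejected here
print(is_positive_integer("123456789"))
print(is_positive_integer("-5"))
```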
In Brief
• Schemas greatly improve on DTDs.
• Certain kinds of applications can be made more interoperable by XML Schema.
• DTDs are well understood, and they do offer a good way to describe the structure of a document for interchange.
• It will take some time before XML Schemas are as well understood.
Types of Interaction with Documents
• Most documents stored in XML form are created for the purpose of conveying information or keeping track of information.
• Types of interactions people have with documents:
  • creation and modification
  • management, storage, and archiving
  • utilization
Types of Interaction with Documents
[Figure: three groups of interactions (document creation and modification; document management and storage; document utilization). Activities shown: creation, update, review/validation, import, conversion/transformation, document classification, document assembly, document storage, document archival, printing, searching and viewing, exchange, online searching/viewing/exchange/export, extraction and analysis, building alternate documents, useful database information.]
XSL: Transformation and Styling
• XSL is a language for expressing stylesheets. (www.w3.org)
• It consists of two parts:
  • XSL Transformations (XSLT): a language for transforming XML documents
  • An XML vocabulary for specifying formatting semantics (XSL Formatting Objects)
[Figure: an XSL processor takes an XML document and an XSL document (template rules) and produces the output.]
DOM: Document Object Model
• DOM is an API for HTML and XML
• It defines the logical structure of documents and the way a document is accessed and manipulated.
• With the DOM, programmers can build documents, navigate their structure, and add, modify, or delete elements and content.
• Language-independent
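As a minimal sketch of building and modifying a document through the DOM, the example below uses `xml.dom.minidom` from the Python standard library, one of many DOM bindings. The element names come from the label example; the `status` attribute is a hypothetical addition used only to show a modification.

```python
from xml.dom import minidom

# Build: create a document with a <label> root and two child elements.
doc = minidom.Document()
label = doc.createElement("label")
doc.appendChild(label)
for tag, text in [("name", "Rock N. Robyn"), ("city", "Baltimore")]:
    child = doc.createElement(tag)
    child.appendChild(doc.createTextNode(text))
    label.appendChild(child)

# Modify: add an attribute (hypothetical, for illustration only).
label.setAttribute("status", "final")

# Delete: remove the <name> element found by navigating the tree.
name = label.getElementsByTagName("name")[0]
label.removeChild(name)

print(label.toxml())
```

The same calls (`createElement`, `appendChild`, `setAttribute`, `removeChild`) exist under the same names in other DOM bindings, which is the point of the API being language-independent.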
DOM: Document Object Model
• DOM parsers
  • Xerces: xml.apache.org (C++, Java, Perl)
  • IBM: www.alphaworks.ibm.com (Java, C++)
  • SUN: java.sun.com (Java)
  • MS: msdn.microsoft.com (IE, Data Channel XML Parser)
DOM: Document Object Model
• Abstract Tree Produced by XML Parser
DOM: Document Object Model
• DOM operations:
  • getDocType
  • getName
  • getElementsByTagName
  • item
  • getFirstChild
  • getNodeValue
  • getAttribute
  • getChildNodes
  • getLength
  • getTagName
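The operations listed above are named as in the Java binding. As a rough illustration, the sketch below performs the corresponding steps with Python's standard `xml.dom.minidom`, where most getters appear as properties (`firstChild` for getFirstChild, `nodeValue` for getNodeValue, and so on). The sample document and the `lang` attribute are assumptions made for the example.

```python
from xml.dom import minidom

# Illustrative document; the lang attribute is a hypothetical addition.
DOC = """<!DOCTYPE label [
<!ELEMENT label (name,city)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT city (#PCDATA)>
]>
<label lang="en"><name>Rock N. Robyn</name><city>Baltimore</city></label>"""

doc = minidom.parseString(DOC)
root = doc.documentElement

doctype_name = doc.doctype.name             # getDocType, then getName
names = doc.getElementsByTagName("name")    # getElementsByTagName
first = names.item(0)                       # item
text = first.firstChild.nodeValue           # getFirstChild, then getNodeValue
lang = root.getAttribute("lang")            # getAttribute
count = root.childNodes.length              # getChildNodes, then getLength
tag = first.tagName                         # getTagName

print(doctype_name, tag, text, lang, count)
```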