
What is XML?

And Why Do I Care?


In the age of Google, why have fielded data?

  • More efficient for both data entry and for systems to search, retrieve and ingest

  • Parsed, discretely fielded data can be recombined mechanically for a variety of outputs and uses, including XML


A popular YouTube video to illustrate the power of XML: “The Machine is Using Us” http://youtube.com/watch?v=NLlGopyXT_g

By Michael Wesch, an Assistant Professor of Cultural Anthropology at Kansas State University, this clip illustrates how the same data content can be supplied to many Web 2.0 sites. The same principles apply to supplying data to various software interfaces and tools in an automated fashion. Stop and watch it now; it will get you in the XML mood!


So...? This changes the landscape of digital tools for users and support staff

It is no longer a matter of “one-size-fits-all” tools, but a new scenario of multiple tools to fit the users and the use. Supporting multiple tools is less of a burden because the data can be generated once and then automatically transformed by XSLT stylesheets for each tool, interface, or digital collection.


What is XML?

  • Extensible Markup Language (XML) is a universal language for sharing data between applications. XML is most appropriate where the volume of data is generally small (the data is transmitted as text) and where controlling the structure of the data is important.

  • TRANSLATION: It shuffles data between applications, and users can grab it and send it to a new application too


What XML does

  • Tags information

  • Facilitates transfer of that information between applications and also out to the Web (Web 2.0)

  • Allows information to be structured by schemas, which organize information and can represent standards (like MARC or VRA Core 4 or Dublin Core)


How does XML work?

  • It “tags” data—identifies what that data is (what meaning it holds).

    MARC tags by using numeric designators:

    for instance, a “245” field is always a title, and a “700” (or “7xx”) field is a personal name (creator)



XML tags

  • XML tags with natural language, so it is easy to see what the information (the data value) is within the “chicken lips”

    < >


XML example (in VRA Core 4)

<!-- AGENT -->
<set>
  <display>Jasper Francis Cropsey (American painter, 1823-1900)</display>
  <index>
    <agent>
      <name type="personal" vocab="ULAN" refid="500012491">Cropsey, Jasper Francis</name>
      <dates type="life">
        <earliestDate>1823</earliestDate>
        <latestDate>1900</latestDate>
      </dates>
      <culture>American</culture>
      <role vocab="AAT" refid="300025136">painter</role>
    </agent>
  </index>
</set>
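To see why tagged data is easy for software to pick apart, here is a minimal Python sketch that reads the agent record above using only the standard library. The function name `read_agent` and the dictionary keys are illustrative choices, not part of VRA Core 4.

```python
import xml.etree.ElementTree as ET

# The agent record from the slide above, held in a string for illustration.
AGENT_XML = """
<set>
  <display>Jasper Francis Cropsey (American painter, 1823-1900)</display>
  <index>
    <agent>
      <name type="personal" vocab="ULAN" refid="500012491">Cropsey, Jasper Francis</name>
      <dates type="life">
        <earliestDate>1823</earliestDate>
        <latestDate>1900</latestDate>
      </dates>
      <culture>American</culture>
      <role vocab="AAT" refid="300025136">painter</role>
    </agent>
  </index>
</set>
"""


def read_agent(xml_text):
    """Return the tagged values of the first <agent> as a plain dict."""
    root = ET.fromstring(xml_text)
    agent = root.find("./index/agent")
    name = agent.find("name")
    return {
        "display": root.findtext("display"),
        "name": name.text,
        "name_vocab": name.get("vocab"),   # attribute on <name>
        "name_refid": name.get("refid"),   # attribute on <name>
        "earliest": agent.findtext("dates/earliestDate"),
        "latest": agent.findtext("dates/latestDate"),
        "culture": agent.findtext("culture"),
        "role": agent.findtext("role"),
    }


record = read_agent(AGENT_XML)
for field, value in record.items():
    print(f"{field}: {value}")
```

Because every value sits inside a named tag, the program never has to guess which piece of text is the name and which is the culture.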


Schema: Where the data standard and XML meet

Once a data standard like VRA Core 4.0 is devised, with all the elements and qualifiers laid out, the standard can be expressed in one XML document called the schema. The schema is a road map: it guides the writing of a specific XSLT stylesheet that tells a database (or another type of application) how to export data into (Core 4) XML. A schema is a set of rules to which an XML document must conform to be “valid”.


VRA Core 4.0 XML schema (a small sample)

<!-- Agent -->
<xsd:complexType name="agentType">
  <xsd:annotation>
    <xsd:documentation>VRA Agent element. Subelements are used for different types of data (names, roles, dates, etc.). At least one subelement must be provided.</xsd:documentation>
  </xsd:annotation>
  <xsd:sequence minOccurs="1" maxOccurs="unbounded">
    <xsd:element name="attribution" type="basicString" minOccurs="0" />
    <xsd:element name="culture" type="basicString" minOccurs="0" />
    <xsd:element name="dates" type="agentDateType" minOccurs="0" />
    <xsd:element name="name" type="agentNameType" minOccurs="0" />
    <xsd:element name="role" type="basicString" minOccurs="0" />
  </xsd:sequence>
  <xsd:attributeGroup ref="vraAttributes" />
</xsd:complexType>
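The idea of "conforming to a set of rules" can be sketched in a few lines of Python. This is a hand-written stand-in, not real XSD validation: it hard-codes the five subelement names from the agentType sample above and the "at least one subelement" note from its documentation. In practice a schema-aware XML tool would read the .xsd file directly. The names `ALLOWED_AGENT_CHILDREN` and `check_agent` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Subelement names copied from the agentType sample above.
ALLOWED_AGENT_CHILDREN = {"attribution", "culture", "dates", "name", "role"}


def check_agent(agent):
    """Return a list of problems found in one <agent> element (empty if none)."""
    problems = []
    children = list(agent)
    if not children:
        # Follows the schema's documentation note, not a full XSD rule check.
        problems.append("agent has no subelements; at least one is required")
    for child in children:
        if child.tag not in ALLOWED_AGENT_CHILDREN:
            problems.append(f"unexpected subelement <{child.tag}>")
    return problems


good = ET.fromstring(
    "<agent><name>Cropsey, Jasper Francis</name><culture>American</culture></agent>"
)
bad = ET.fromstring("<agent><nationality>American</nationality></agent>")
empty = ET.fromstring("<agent/>")

print(check_agent(good))
print(check_agent(bad))
print(check_agent(empty))
```

The first record passes with no problems; the second is rejected because `nationality` is not one of the subelements the sample allows, and the third because it is empty.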


XML example (compare this output to the schema outline for the agent data element on the previous slide)

<!-- AGENT -->
<set>
  <display>Jasper Francis Cropsey (American painter, 1823-1900)</display>
  <index>
    <agent>
      <name type="personal" vocab="ULAN" refid="500012491">Cropsey, Jasper Francis</name>
      <dates type="life">
        <earliestDate>1823</earliestDate>
        <latestDate>1900</latestDate>
      </dates>
      <culture>American</culture>
      <role vocab="AAT" refid="300025136">painter</role>
    </agent>
  </index>
</set>


What is XSLT?

  • You can export XML data from FileMaker or Access (and many other programs) for use in an assortment of applications simply by applying the appropriate Extensible Stylesheet Language Transformations (XSLT) stylesheet. XSLT is itself XML-based. You can use a stylesheet to take an XML document and turn it into plain text, PDF documents, or web pages, or to import fielded data into other applications.
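The kind of work a stylesheet does can be imitated in Python. The sketch below is not XSLT; it is a plain-Python stand-in that takes the Cropsey agent record shown earlier and turns it into a small web page fragment, which is one of the outputs a stylesheet might produce. The function name `agent_to_html` and the chosen HTML layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Source XML: the agent record used in the earlier slides.
SOURCE_XML = """
<set>
  <display>Jasper Francis Cropsey (American painter, 1823-1900)</display>
  <index>
    <agent>
      <name type="personal" vocab="ULAN" refid="500012491">Cropsey, Jasper Francis</name>
      <dates type="life">
        <earliestDate>1823</earliestDate>
        <latestDate>1900</latestDate>
      </dates>
      <culture>American</culture>
      <role vocab="AAT" refid="300025136">painter</role>
    </agent>
  </index>
</set>
"""


def agent_to_html(xml_text):
    """Turn one agent record into an HTML heading plus a definition list."""
    root = ET.fromstring(xml_text)
    agent = root.find("./index/agent")
    earliest = agent.findtext("dates/earliestDate")
    latest = agent.findtext("dates/latestDate")
    rows = [
        ("Name", agent.findtext("name")),
        ("Dates", f"{earliest}-{latest}"),
        ("Culture", agent.findtext("culture")),
        ("Role", agent.findtext("role")),
    ]
    lines = [f"<h2>{escape(root.findtext('display'))}</h2>", "<dl>"]
    for label, value in rows:
        lines.append(f"  <dt>{escape(label)}</dt><dd>{escape(value)}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)


page = agent_to_html(SOURCE_XML)
print(page)
```

The same source record could feed a second function that writes plain text or a CSV row instead; the data is entered once and only the output rules change.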


XSLT sample: how the XML is actually exported from a database (in this case FMP)

<!-- Agent -->

<set>

<display>

<xsl:value-of select="fm:AgentDisplay" />

</display>

<index>

<xsl:for-each select="fm:AgentSortName/fm:DATA">

<xsl:variable name="i">

<xsl:value-of select="position()" /> </xsl:variable>

<agent>


File Extensions for the 3 parts of XML

So when you see these file extensions, you will know what you are looking at:

The XML document is .xml

The XML schema is .xsd

The XSLT stylesheet is .xsl


Ummm, yeah, OK

Will you do coding/tagging for schemas? (No, you will use schemas provided/published for standards—MARC (MODS), VRA 4.0, CDWA lite, etc.)

Will you do coding/tagging for XSLT? (Maybe, if you take a class and are interested. More likely you will get tech support or support from user groups)

Will you be able to look at an XML document and basically understand it and edit it? (Yes, this is similar to learning HTML and HTML editors)


So how does this fit into my cataloging?

VRA Core 4 and CCO were both formed with an eye to output and expression in XML

They can be used in “flat” systems, but there is a clear benefit to using relational databases, and XML is also good at capturing/transmitting relational structure


Relational Databases

  • Relate information stored in multiple tables

  • Ideally, there is no redundancy of data entry—each value that might be reused in data entry is only entered once and stored in one table that is related for use everywhere else in the database (made available anywhere needed in the data entry workflow)

  • Numeric keys are normally used in this process
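A minimal sketch of these three points, using Python's built-in sqlite3 module rather than FileMaker or Access. The table names, column names, key values, and the two work titles are all hypothetical; only the agent's name and culture come from the earlier slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE agent (
        agent_id  INTEGER PRIMARY KEY,   -- the numeric key
        sort_name TEXT,
        culture   TEXT
    );
    CREATE TABLE work (
        work_id  INTEGER PRIMARY KEY,
        title    TEXT,
        agent_id INTEGER REFERENCES agent(agent_id)  -- link to the agent table
    );
    """
)

# The agent is entered once...
conn.execute("INSERT INTO agent VALUES (1, 'Cropsey, Jasper Francis', 'American')")

# ...and each work stores only the numeric key that points to it.
conn.executemany(
    "INSERT INTO work VALUES (?, ?, ?)",
    [(10, "Hypothetical work A", 1), (11, "Hypothetical work B", 1)],
)

rows = conn.execute(
    """
    SELECT work.title, agent.sort_name, agent.culture
    FROM work JOIN agent ON agent.agent_id = work.agent_id
    ORDER BY work.work_id
    """
).fetchall()

agent_count = conn.execute("SELECT COUNT(*) FROM agent").fetchone()[0]

for title, sort_name, culture in rows:
    print(f"{title}: {sort_name} ({culture})")
print(f"agent rows stored: {agent_count}")
```

Both works display the same agent information, yet the agent table holds a single row, so a correction made there shows up everywhere the key is used.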


Excel sample (“flat file” output)

Notice that each row represents an image file and conflates the work and image records (repeats the information about the work for each image).

Each repeating value (like Artist) must have a column reserved for possible use.


A pithy answer to “why relational?” (for cataloging)

Message from Jan Eklund to VRA-L, Feb 20, 2008, subject: Re: CONTENTdm and metadata (search list archive for full message)

Complexity: “complexity cannot be captured efficiently in a flat data model because basically you have to leave space in every record to accommodate the most complex object you will ever encounter. This adds up to a lot of wasted space, and wasted space means more money…”

Consistency: “all the descriptive data about the work is entered once, and every image that shows this work inherits the same information”


Image and Work records (example from VCat)


Repeating values are supported for each element

“indexed” value (in this case the sort name)

Numeric key

A note field is possible for every Core 4 element

“display” value done to CCO recommended formatting. Note that the Agent Nationality is supplied automatically here by the

Link (numeric key) to the Agent Authority


Authority record

All the information about the agent is supplied from this file on the basis of the numeric key

Numeric key



The same information expressed in Core 4 XML—this is automatically output from the database

<agentSet>
  <display>ACT Architecture (French architectural firm, ca. 1982-present); Gaetana Aulenti (Italian interior designer, born 1927); Victor Alexandre Frédéric Laloux (French architect, 1850-1937)</display>
  <notes>ACT Architecture (Renaud Bardon, Pierre Colboc and Jean-Paul Philippon)</notes>
  <agent>
    <name vocab="ULAN" refid="500023967" type="personal">Laloux, Victor Alexandre Frédéric</name>
    <dates type="life">
      <earliestDate>1850</earliestDate>
      <latestDate>1937</latestDate>
    </dates>
    <culture>French</culture>
  </agent>
  <agent>
    <name vocab="LCNAF" refid="nr 95039966" type="corporate">ACT Architecture</name>
    <dates type="activity">
      <earliestDate>1982</earliestDate>
      <latestDate>2082</latestDate>
    </dates>
    <culture>French</culture>
  </agent>
  <agent>
    <name vocab="ULAN" refid="500031019" type="personal">Aulenti, Gaetana</name>
    <dates type="life">
      <earliestDate>1927</earliestDate>
      <latestDate>9999</latestDate>
    </dates>
    <culture>Italian</culture>
  </agent>
</agentSet>
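How a database might produce this output from numeric keys can be sketched in Python. This is not the FileMaker export itself: the authority table is a plain dictionary, the key values 1 and 2 are hypothetical, and the function name `build_agent_set` is an illustrative choice. The names, vocabularies, refids, and dates are taken from the XML above.

```python
import xml.etree.ElementTree as ET

# Hypothetical agent authority file, indexed by numeric key.
AGENT_AUTHORITY = {
    1: {
        "name": "Laloux, Victor Alexandre Frédéric",
        "vocab": "ULAN", "refid": "500023967", "type": "personal",
        "dates_type": "life", "earliest": "1850", "latest": "1937",
        "culture": "French",
    },
    2: {
        "name": "Aulenti, Gaetana",
        "vocab": "ULAN", "refid": "500031019", "type": "personal",
        "dates_type": "life", "earliest": "1927", "latest": "9999",
        "culture": "Italian",
    },
}

# The work record stores only the keys, not the agent details.
WORK_AGENT_KEYS = [1, 2]


def build_agent_set(keys, authority):
    """Look up each key in the authority and emit one <agent> per key."""
    agent_set = ET.Element("agentSet")
    for key in keys:
        rec = authority[key]
        agent = ET.SubElement(agent_set, "agent")
        name = ET.SubElement(
            agent, "name",
            {"vocab": rec["vocab"], "refid": rec["refid"], "type": rec["type"]},
        )
        name.text = rec["name"]
        dates = ET.SubElement(agent, "dates", {"type": rec["dates_type"]})
        ET.SubElement(dates, "earliestDate").text = rec["earliest"]
        ET.SubElement(dates, "latestDate").text = rec["latest"]
        ET.SubElement(agent, "culture").text = rec["culture"]
    return agent_set


agent_set = build_agent_set(WORK_AGENT_KEYS, AGENT_AUTHORITY)
print(ET.tostring(agent_set, encoding="unicode"))
```

The cataloger only ever picks a key; everything inside each agent element is filled in from the authority record.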


The Element Set of Core 4


Format and Global Attributes


Reciprocity in Relationships

It is easy to show relationships between works in a relational database and via XML. In this case the XSLT stylesheet (in conjunction with programming within the database) can be written to supply the reciprocity (the other related work) based on the numeric key.


Stylesheets can do a lot!

They literally do “transformations”: they can change the XML into other formats, they can recombine parsed information, and they can even take that more efficient and consistent relational data, “flatten” it, and output it as CSV (Excel) for import into delivery systems or other uses that are not yet XML-compatible!
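The "flattening" step can be sketched in Python as a stand-in for what a stylesheet would do. This is not XSLT, and the column names are assumptions chosen for illustration; the two agents are taken from the Core 4 XML shown earlier.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A shortened agentSet using two agents from the earlier slide.
AGENT_SET_XML = """
<agentSet>
  <agent>
    <name vocab="ULAN" refid="500023967" type="personal">Laloux, Victor Alexandre Frédéric</name>
    <dates type="life">
      <earliestDate>1850</earliestDate>
      <latestDate>1937</latestDate>
    </dates>
    <culture>French</culture>
  </agent>
  <agent>
    <name vocab="ULAN" refid="500031019" type="personal">Aulenti, Gaetana</name>
    <dates type="life">
      <earliestDate>1927</earliestDate>
      <latestDate>9999</latestDate>
    </dates>
    <culture>Italian</culture>
  </agent>
</agentSet>
"""


def flatten_agents(xml_text):
    """Write one CSV row per <agent>, with a header row first."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "vocab", "refid", "earliest", "latest", "culture"])
    for agent in root.iter("agent"):
        name = agent.find("name")
        writer.writerow([
            name.text,
            name.get("vocab"),
            name.get("refid"),
            agent.findtext("dates/earliestDate"),
            agent.findtext("dates/latestDate"),
            agent.findtext("culture"),
        ])
    return out.getvalue()


csv_text = flatten_agents(AGENT_SET_XML)
print(csv_text)
```

The csv module quotes the names automatically because they contain commas, so the file opens cleanly in a spreadsheet.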


Other Data Standards (field structures) and XML

  • MARC; MODS

  • CDWA

  • Dublin Core

  • VRA Core 4.0

  • EAD

  • METS


MARC (Machine Readable Cataloging)

  • Emerged from a Library of Congress-led initiative that began in the 1960s for bibliographic materials

  • Uses numeric tags to designate the fields (“245” means title, “700” fields are makers/creators, etc.)

  • This enabled computer protocols to share data worldwide

  • “The future of the MARC formats is a matter of some debate in the worldwide library science community. On the one hand, the formats are quite complex and are based on outdated technology. On the other, there is no alternative bibliographic format with an equivalent degree of granularity. The huge user base, billions of records in tens of thousands of individual libraries, also creates inertia” (Wikipedia entry)


MODS (Metadata Object Description Schema)

  • A schema that allows the traditional numerically tagged MARC to be turned into XML

  • Can carry data from existing MARC plus allows creation of new XML-based records—a way to integrate and move forward?

    http://www.loc.gov/standards/mods/


CDWA (Categories for the Description of Works of Art)

  • Developed by the Getty specifically to describe art, architecture and cultural artifacts

  • A very granular standard—the fields are very narrowly defined and there are many specific fields (as opposed to a few fields that use “qualifiers”) Example: Creation - Commissioner - Commissioner Role

  • See the CDWA lite xml schema:

    http://www.getty.edu/research/conducting_research/standards/cdwa/cdwalite.html


Dublin (Ohio) Core

  • Developed by OCLC (headquartered in Dublin OH) (serving 53,500 libraries in 96 countries)

  • Created to describe “born digital” items in particular

  • Simple “bins” of data that can be further “qualified” (difference in Simple DC and Qualified DC)

  • A qualifier is an element refinement—example Date. Creation


The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 elements:

  • Title

  • Creator

  • Subject

  • Description

  • Publisher

  • Contributor

  • Date

  • Type

  • Format

  • Identifier

  • Source

  • Language

  • Relation

  • Coverage

  • Rights
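As a sketch of how these 15 "bins" look once expressed in XML, the Python below builds a small Simple Dublin Core record. The `dc` namespace URI is the standard one for DCMES 1.1; the wrapper element name `record`, the function name `build_dc_record`, and the sample title are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace for the 15 Simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 element names listed above, in lowercase as used in XML.
DCMES = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def build_dc_record(values):
    """Build a record containing only recognised Simple DC elements."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for element, value in values.items():
        if element not in DCMES:
            raise ValueError(f"{element!r} is not a Simple Dublin Core element")
        ET.SubElement(record, f"{{{DC_NS}}}{element}").text = value
    return record


record = build_dc_record({
    "title": "Hypothetical landscape painting",
    "creator": "Cropsey, Jasper Francis",
    "type": "Image",
})
print(ET.tostring(record, encoding="unicode"))
```

Every element is optional and repeatable in Simple DC, which is why the sketch accepts any subset of the 15 names but rejects anything outside the list.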


VRA Core 4.0

  • Published in April 2007:

    http://www.vraweb.org/datastandards/VRA_Core4_Welcome.html

    • A data standard guiding data structure

    • Formed with an eye to expressing content in XML—with both index and display values

    • Formed like library records with a “bib” (work) record and an item (image) record

    • Formed, as is Dublin Core, with a 1:1 relationship: one record describes one object


EAD (Encoded Archival Description)

Started in 1993 at Berkeley; now maintained by the Library of Congress with the SAA (Society of American Archivists). Began using SGML, now uses XML.

So, tagged and machine-readable, but not necessarily 1:1 records—simple way to make groups/boxes of material retrievable


Sample EAD Finding Aid

  • http://webtext.library.yale.edu/art/art.VRC1.htm

  • 152 boxes; 64 linear feet of mounted photographs of American painting now in storage

  • Simply used the outline of the original filing/drawers and tagged them—this translates now to boxes of material with barcodes


METS (Metadata Encoding and Transmission Standard)

http://www.loc.gov/standards/mets/

Think of it as an XML “wrapper”—it can describe a group of objects, a collection of different objects, can “wrap” around a set of XML items that are different formats and therefore may be a way to integrate and present these


METS Profiles

UCSD Simple Object Profile

  • Abstract: The UCSD Libraries uses the UCSD Simple Object profile for composing METS instances for digital objects consisting of a single digital content file and associated descriptive, administrative, and structural metadata. The single digital content file may be of any format type, e.g., audio, image, text, or video, and it may be represented in the METS instance with content equivalent file versions. For example, a digital image may be represented in the METS instance by a TIFF file, a JPEG file, and a GIF file, with each containing the same content image.


What do [book] librarians have that VR professionals don’t?

Tools and networked utilities for COPY CATALOGING:

MARC (Machine Readable Cataloging) for field structure (data standard)

AACR2 (Anglo-American Cataloging Rules) for data formatting (data content)

XML and Z39.50 (and other protocols) for transmitting data

OCLC as a shared records repository (sustainable business model)


How do we get to shared VR image cataloging?

  • Have to develop the same general mechanisms as the library world

    • VRA Core 4.0 = MARC

    • CCO = AACR2

    • XML will be one transmission vehicle/protocol

    • OAI (Open Archives Initiative) may become a harvesting and retrieval mechanism for record sharing


OAI (Open Archives Initiative): XML Based

http://www.openarchives.org/

Started by 2 computer scientists at Cornell to quickly share information via mechanical “harvesting”—databases are opened to allow harvesting and results are then put in a central repository for searching. It is a “low-barrier” interoperability framework using Dublin Core (in XML) as its minimum standard, but one can also use other standards (expressed in XML) on top of that.

Google is using OAI to harvest data from the National Library of Australia. (See also U Michigan’s OAIster project).
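Harvesting is done by sending simple web requests to a repository, using the OAI Protocol for Metadata Harvesting (OAI-PMH). The Python sketch below only builds such a request URL; it does not contact any server. The `ListRecords` verb and the `oai_dc` metadata prefix (Dublin Core) are part of the protocol, while the repository address and the function name `list_records_url` are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository address, used only for illustration.
url = list_records_url("https://repository.example.org/oai")
print(url)
```

A harvester would fetch this URL, receive the records as XML, and load them into the central repository for searching.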


See: XML matters!

Susan Jane Williams

Independent Cataloging and Consulting

williams.susanjane@gmail.com

