
XML for Scientific Computing

Several case studies for XML data in scientific computing



Overview

  • We will present case studies of the following systems

    • XSIL: Extensible Scientific Interchange Language

    • XDMF: Extensible Data Model and Format

    • Discipline Specific XML: ChemicalML

    • Gateway Application Descriptors (plus Castor)

  • XML by itself is just markup, like HTML without a browser. Each of the above uses a related set of software to manipulate the XML data.

  • We present several examples of XML to give you an overview.

  • We conclude with some remarks about standards for science applications.



Overview of Case Studies

  • XSIL and XDMF are examples of representing (meta)data for scientific computing.

    • Concentrate on data structures, data I/O.

    • Meaning of data not described.

  • ChemicalML marks up domain specific data.

    • Meaningfully describes data content.

  • Gateway application data describes science codes themselves.

  • All possess a data object model.

    • Object oriented data descriptions guide the markup tag definitions.



XSIL

XML tags for generic scientific data markup, with related Java software.



XSIL

  • Developed in support of several projects led by CACR.

    • Example: LIGO, Digital Sky

    • Roy Williams, Caltech.

  • See http://www.cacr.caltech.edu/SDA/xsil/ for more information and free software.

  • XSIL was developed for the astronomy and gravitational-wave communities.

  • But it provides general-purpose tags.

  • It also comes with software for building Java applications that manipulate and display XSIL documents.



XSIL Tags

  • XSIL defines a small number of tags

    • XSIL: base container for the object model.

    • Comment

    • Param: an arbitrary name/value pair

    • Time: describes time, plus format

    • Table: data in columns and rows

    • Array: table data with specific size

    • URL:

    • Streams: for handling data

  • We’ll now go over some of these in detail.



The XSIL Tag I

  • XSIL documents map to a document object model with associated handling code.

  • The root tag for XSIL is <XSIL>:

    <XSIL Name="Example" Type="Examples.MyExample">

    </XSIL>

  • Type points to the Java code that should process this file.

    • Here, that is the class MyExample (MyExample.java) in the package Examples.



The XSIL Tag II

  • XSIL tags can be nested if different parts of the XSIL document need to be handled by different codes.

    <XSIL Name="Example" Type="Examples.MyExample">
      <XSIL Name="Subsection" Type="Examples.Subsection">
      </XSIL>
    </XSIL>

  • XSIL tags thus are the base container in a generic object hierarchy.

    • MyExample object “has a” Subsection object



More On Object Containers

  • Consider an Electromagnetics example:

    • A target is represented as a grid for finite difference integration of Maxwell’s equations.

    • The base input file contains one or more materials.

    • Each material has specific EM properties.

  • Translated to XSIL, the input could look like this:

    <XSIL Name="EMRoot" Type="CEA.Root">
      <!-- Some general parameters -->
      <XSIL Name="EMMaterial" Type="CEA.Material">
        <!-- Some info describing the material. -->
      </XSIL>
    </XSIL>



Parameters

  • Each XSIL tag can contain one or more parameters.

  • Params are arbitrary name/value pairs.

  • Params optionally have units.

    <XSIL …>
      <Param Name="Color">Red</Param>
      <Param Name="Weight" Unit="kg">3.14</Param>
    </XSIL>
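The XSIL Java classes read Params for you. For comparison, here is a minimal sketch that pulls out each Param's Name, optional Unit, and value using only the JDK's standard DOM parser (javax.xml.parsers), not the XSIL library. The class ParamReader and its Param record are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class ParamReader {
    // Hypothetical holder for one <Param>: its Name, optional Unit, and text value.
    record Param(String name, String unit, String value) {}

    // Collects every <Param> in document order, including Params inside nested <XSIL> tags.
    static List<Param> readParams(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse XSIL document", ex);
        }
        List<Param> params = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("Param");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            // getAttribute returns "" for a missing attribute, so test for Unit explicitly.
            String unit = e.hasAttribute("Unit") ? e.getAttribute("Unit") : null;
            params.add(new Param(e.getAttribute("Name"), unit, e.getTextContent().strip()));
        }
        return params;
    }

    public static void main(String[] args) {
        String xml = "<XSIL Name=\"Example\">"
                + "<Param Name=\"Color\">Red</Param>"
                + "<Param Name=\"Weight\" Unit=\"kg\">3.14</Param>"
                + "</XSIL>";
        for (Param p : readParams(xml)) {
            System.out.println(p.name() + " = " + p.value() + (p.unit() == null ? "" : " " + p.unit()));
        }
    }
}
```

Values are kept as text, as in the XSIL example code. A caller converts them to numbers only when it knows the intended type.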



Tables

  • Params associate one value per name

  • Tables support multiple values

    • A Table row can have any number of values.

  • Each table contains column definitions followed by an arbitrary number of entries.

  • Tables get data from streams (discussed later).



Example Table

<XSIL …>
  <Table>
    <Column Name="Color" Type="string"/>
    <Column Name="Weight" Type="float" Unit="kg"/>
    <Column Name="Length" Type="float" Unit="meter"/>
    <Stream Type="Local" Delimiter=",">
      "Red",100.2,0.2
      "Green",21.7,1.2
    </Stream>
  </Table>
</XSIL>
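A table's Column definitions tell a reader how to convert each delimited field in a Local stream. The sketch below shows one way to do that in plain Java. It rests on assumptions not taken from the XSIL software: one row per line, string values wrapped in double quotes, and no delimiter characters inside values. The class name LocalStreamParser is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class LocalStreamParser {
    // Parses the text of a <Stream Type="Local"> into one Object[] per row.
    // Assumptions: one row per line, quoted strings, no delimiters inside values.
    static List<Object[]> parse(String streamText, String delimiter, String[] columnTypes) {
        List<Object[]> rows = new ArrayList<>();
        for (String rawLine : streamText.split("\\R")) {
            String line = rawLine.strip();
            if (line.isEmpty()) {
                continue; // skip blank lines around the data
            }
            String[] fields = line.split(Pattern.quote(delimiter), -1);
            if (fields.length != columnTypes.length) {
                throw new IllegalArgumentException(
                        "Expected " + columnTypes.length + " values but found " + fields.length + ": " + line);
            }
            Object[] row = new Object[fields.length];
            for (int c = 0; c < fields.length; c++) {
                row[c] = convert(fields[c].strip(), columnTypes[c]);
            }
            rows.add(row);
        }
        return rows;
    }

    // Converts one field according to its <Column Type="...">.
    private static Object convert(String field, String type) {
        switch (type) {
            case "string": {
                boolean quoted = field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"");
                return quoted ? field.substring(1, field.length() - 1) : field;
            }
            case "int":
                return Integer.parseInt(field);
            case "float":
                return Float.parseFloat(field);
            case "double":
                return Double.parseDouble(field);
            default:
                throw new IllegalArgumentException("Unsupported column type: " + type);
        }
    }

    public static void main(String[] args) {
        String stream = "\n\"Red\",100.2,0.2\n\"Green\",21.7,1.2\n";
        for (Object[] row : parse(stream, ",", new String[] {"string", "float", "float"})) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```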



XSIL Arrays

  • XSIL arrays are similar to Fortran and C arrays.

  • For mixed type data, use Tables.

  • If all data has the same type (e.g., all integers or all floats), use Arrays.

    <Array Type="int">
      <Dim Name="x-dim">2</Dim>
      <Dim Name="y-dim">2</Dim>
      <Stream Type="Local" Delimiter=",">
        137,42
        8,13
      </Stream>
    </Array>
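Because an Array declares its size with Dim tags, a reader can turn the flat stream of values into a native Java array. This is a minimal sketch under one stated assumption: the first Dim gives the rows, and values are listed row by row, so the last Dim varies fastest (as in C). The slide does not specify XSIL's ordering. The class name ArrayReshape is hypothetical.

```java
import java.util.Arrays;

final class ArrayReshape {
    // Converts the values of a 2-D <Array Type="int"> stream into int[rows][cols].
    // Assumption: row-major order (last Dim varies fastest); not confirmed by the XSIL docs here.
    static int[][] toMatrix(String streamText, String delimiter, int rows, int cols) {
        // Treat the delimiter and line breaks alike as separators.
        String[] tokens = streamText.replace(delimiter, " ").strip().split("\\s+");
        if (tokens.length != rows * cols) {
            throw new IllegalArgumentException(
                    "Expected " + rows * cols + " values but found " + tokens.length);
        }
        int[][] m = new int[rows][cols];
        for (int k = 0; k < tokens.length; k++) {
            m[k / cols][k % cols] = Integer.parseInt(tokens[k]); // row-major placement
        }
        return m;
    }

    public static void main(String[] args) {
        int[][] m = toMatrix("\n137,42\n8,13\n", ",", 2, 2);
        System.out.println(Arrays.deepToString(m));
    }
}
```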



XSIL Streams

  • XSIL Streams can be used to load data

  • Data sources can be

    • In the file itself (as shown in previous examples).

    • From files on disk

    • From URLs (http://, ftp://, and file:// supported)

  • Loading data from disk

    <Stream Type="Remote" Encoding="Littleendian">
      /home/user1/data/datafile.dat
    </Stream>

  • Loading data from URLs

    <Stream Type="Remote">
      http://my.server.edu/XSILdata/datafile.dat
    </Stream>
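The Encoding attribute matters because the same bytes decode to different numbers depending on byte order. Here is a minimal sketch in plain JDK code (java.nio.ByteBuffer), not the XSIL library, that decodes 32-bit floats in a chosen byte order. In real use the bytes would come from the file or URL named in the stream. The class name BinaryStreamDecoder is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BinaryStreamDecoder {
    // Decodes consecutive 32-bit IEEE floats from raw bytes using the given byte order.
    static float[] decodeFloats(byte[] raw, ByteOrder order) {
        if (raw.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("Length is not a multiple of 4 bytes: " + raw.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(order);
        float[] out = new float[raw.length / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat(); // relative get uses the buffer's byte order
        }
        return out;
    }

    public static void main(String[] args) {
        // Build 8 bytes holding 1.5f and -2.0f in little-endian order.
        byte[] raw = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
                .putFloat(1.5f).putFloat(-2.0f).array();
        float[] right = decodeFloats(raw, ByteOrder.LITTLE_ENDIAN);
        float[] wrong = decodeFloats(raw, ByteOrder.BIG_ENDIAN);
        System.out.println(right[0] + " " + right[1]); // the intended values
        System.out.println(wrong[0] + " " + wrong[1]); // meaningless: bytes read in the wrong order
    }
}
```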



Ex: Use XSIL to describe input data

<XSIL Name="InputData" Type="Examples.InDataHandler">
  <XSIL Name="Target 1" Type="Examples.Target">
    <Param Name="Target">Scud</Param>
    <Param Name="dx">0.1</Param>
    <Array>
      <Dim Name="X-Dimension">100</Dim>
      <Dim Name="Y-Dimension">100</Dim>
      <Stream Type="Remote">
        /home/mpierce/data/mydata.dat
      </Stream>
    </Array>
  </XSIL>
  <XSIL Name="Target 2" Type="Examples.Target">
    <!-- Another target -->
  </XSIL>
</XSIL>



Table and Array Types

  • Table and Array data can have these types (size in bits):

    • boolean (1)

    • byte (8)

    • short (16)

    • int (32)

    • long (64)

    • float (32)

    • double (64)

    • floatComplex (64)

    • doubleComplex (128)

    • string (arbitrary length)
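The fixed sizes above let a program work out how much storage an array needs before reading it. Note that the complex types are exactly twice their real counterparts, consistent with one real and one imaginary part. A minimal sketch, assuming values are counted in bits and rounded up to whole bytes; real binary encodings may pack booleans differently. The class and method names are hypothetical.

```java
import java.util.Map;

final class XsilTypeSizes {
    // Bit sizes from the list above; "string" has no fixed size and is omitted.
    static final Map<String, Integer> BITS = Map.of(
            "boolean", 1, "byte", 8, "short", 16, "int", 32, "long", 64,
            "float", 32, "double", 64, "floatComplex", 64, "doubleComplex", 128);

    // Storage for n values of the given type, rounded up to whole bytes.
    static long bytesFor(String type, long n) {
        Integer bits = BITS.get(type);
        if (bits == null) {
            throw new IllegalArgumentException("No fixed size for type: " + type);
        }
        if (n < 0) {
            throw new IllegalArgumentException("Negative count: " + n);
        }
        return (bits * n + 7) / 8;
    }

    public static void main(String[] args) {
        // A 100 x 100 doubleComplex array: 10,000 values of 16 bytes each.
        System.out.println(bytesFor("doubleComplex", 100L * 100));
    }
}
```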



Using XSIL

  • The previous example just marks up data.

  • XSIL also comes with Java bindings that

    • Read the file and parse it.

    • Extract parameter values, units, etc.

    • Read in and manipulate tables, arrays

  • Central ideas:

    • Each XSIL tag corresponds to a Java class

    • XSIL’s Type attribute points to your custom driver code, which uses the XSIL classes.



XSIL Coding Example

  • Consider the following small XSIL example:

    <XSIL Type="Examples.MyExample">
      <Param Name="x0">12.0</Param>
      <Param Name="dx">0.1</Param>
    </XSIL>



XSIL Java Code Example

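package extensions.Examples;

// XSIL Java bindings, as named on the slide; Param also needs an import from the same XSIL software
import org.escience.XSIL;

public class MyExample {
    String x0, dx;
    XSIL root;

    public MyExample(String xsilFileName) {
        // XSIL parses the file and builds the object tree
        root = new XSIL(xsilFileName);
    }

    public void construct() {
        // Walk the root's children and pick out the Params we need
        for (int i = 0; i < root.getChildCount(); i++) {
            XSIL x = root.getChild(i);
            if (x instanceof Param) {
                Param p = (Param) x;
                if (p.getName().equals("x0")) x0 = p.getText();
                if (p.getName().equals("dx")) dx = p.getText();
            }
        }
    }
}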



Code Notes

  • All classes (Param, Table, etc.) extend the XSIL class.

  • Pass the path of the XSIL file to the root XSIL object through its constructor.

    • XSIL handles all parsing

  • XSIL class defines getChildCount(), getChild() methods.

  • Param class defines getName() and getText() methods.



XSIL Summary

  • Defines a small set of general purpose tags for scientific data.

  • Data itself is not directly marked up.

    • Read in through streams

  • XSIL software maps Java classes to XSIL tags.

    • Convenient for working with XSIL docs.

    • DOM classes are much more cumbersome to use.



XDMF

A data model geared toward finite element codes, with associated software in C++, Java, and TCL



ICE XDMF

  • ICE (Interdisciplinary Computing Environment) is a comprehensive project at ARL MSRC that attempts to provide a common software platform for DoD scientific codes.

    • Jerry Clarke, lead developer

  • XDMF (Extensible Data Model and Format) provides a common data format for several different codes

    • Primary focus: finite element codes for fluid dynamics and structural mechanics.

    • XDMF and related software provide the backbone for loosely coupling applications and visualization.



XDMF Design

  • XDMF divides data into “light” and “heavy” types.

  • Light data, or metadata, is formatted in XML and will be described in more depth.

  • Heavy data is stored in HDF5 and is not covered here.



XDMF Basic Concepts

  • XDMF basic tags are <DataStructure> and <DataTransform>

  • <DataStructure> defines the actual data.

  • <DataTransform> defines the area of interest (AOI) in the data.

    • AOI defined by coordinates, a function, or a hyperslab.

  • <DataTransform> contains one or more <DataStructure> elements.

    • The transform defines how the data structure will be filtered.



Simple Data Structure

  • The example below describes 655 XYZ points (a 655 x 3 array of floats) stored in the indicated HDF5 file.

    <DataStructure Name="Some XYZ Data"
                   Type="Float"
                   Dimensions="655 3">
      MyData.h5:/MyXYZdata
    </DataStructure>

  • Simple character data can also be included directly in the XML document.
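The text content MyData.h5:/MyXYZdata names the HDF5 file and the dataset inside it. Dimensions lists the array shape. Below is a minimal sketch of pulling those apart in plain Java. It assumes the first ':' separates the file name from the dataset path, which holds for this example but is not a rule stated on the slide. The class and record names are hypothetical.

```java
import java.util.Arrays;

final class HeavyDataRef {
    // Hypothetical value class: which HDF5 file, and which dataset inside it.
    record Ref(String file, String dataset) {}

    // Splits "MyData.h5:/MyXYZdata" at the first ':' (assumed separator).
    static Ref parseRef(String text) {
        String s = text.strip();
        int colon = s.indexOf(':');
        if (colon <= 0 || colon == s.length() - 1) {
            throw new IllegalArgumentException("Expected file:/dataset but got: " + s);
        }
        return new Ref(s.substring(0, colon), s.substring(colon + 1));
    }

    // Parses a Dimensions attribute such as "655 3" into {655, 3}.
    static long[] parseDims(String dims) {
        return Arrays.stream(dims.strip().split("\\s+")).mapToLong(Long::parseLong).toArray();
    }

    public static void main(String[] args) {
        Ref r = parseRef("\n      MyData.h5:/MyXYZdata\n");
        long[] d = parseDims("655 3");
        long count = Arrays.stream(d).reduce(1L, (a, b) -> a * b);
        System.out.println(r.file() + " " + r.dataset() + " " + count + " values");
    }
}
```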


Data Structure for Mesh Connections and Pressures

<DataStructure
    Name="Connections"
    Type="Int"
    Precision="8"
    Dimensions="100 8">
  MyData.h5:/MyConns
</DataStructure>

<DataStructure
    Name="Pressure"
    Type="Float"
    Precision="8"
    Dimensions="100">
  MyData.h5:/MyPressure
</DataStructure>



Data Structure Attribute Summary

<DataStructure
    Name="Any name"                       A name meaningful to the owner
    Rank="NumberOfDimensions"             Redundant information
    Dimensions="Kdim Jdim Idim"           The slowest-varying dimension is listed first
    Type="Char | Float | Int | Compound"  Default is Float
    Precision="BytesPerElement"           Default is 4
    Format="XML | HDF"                    Default is XML
>
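The defaults in this summary determine how a reader interprets an under-specified DataStructure. Because Precision is bytes per element, the raw size is the product of the dimensions times Precision. A minimal sketch of that bookkeeping, with a hypothetical class name and a null argument meaning "attribute omitted":

```java
final class DataStructureInfo {
    // Resolved attribute values; null constructor arguments fall back to the listed defaults.
    final String type;
    final int precision;
    final String format;
    final long[] dims;

    DataStructureInfo(String type, Integer precision, String format, long[] dims) {
        this.type = (type == null) ? "Float" : type;           // default Type
        this.precision = (precision == null) ? 4 : precision;  // default Precision (bytes)
        this.format = (format == null) ? "XML" : format;       // default Format
        this.dims = dims.clone();
    }

    // Rank is redundant: it is just the number of entries in Dimensions.
    int rank() {
        return dims.length;
    }

    // Bytes occupied by the raw values: product of the dimensions times Precision.
    long byteSize() {
        long n = 1;
        for (long d : dims) {
            n = Math.multiplyExact(n, d);
        }
        return Math.multiplyExact(n, (long) precision);
    }

    public static void main(String[] args) {
        // Connections: Type="Int" Precision="8" Dimensions="100 8", Format omitted.
        DataStructureInfo conns = new DataStructureInfo("Int", 8, null, new long[] {100, 8});
        // Only Dimensions="655 3" given, so Type, Precision, and Format take their defaults.
        DataStructureInfo xyz = new DataStructureInfo(null, null, null, new long[] {655, 3});
        System.out.println(conns.rank() + " " + conns.byteSize());
        System.out.println(xyz.type + " " + xyz.precision + " " + xyz.format + " " + xyz.byteSize());
    }
}
```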



XDMF Array Types

  • XDMF array entries can have these types:

    • Integer

    • Float

    • Char

  • All are 4 bytes by default and can be increased to 8 bytes.



DataTransform

  • DataTransform defines a way for the raw data to be filtered

    • Gives a certain Area of Interest in data set.

  • Possible transforms:

    • Coordinate: Select a particular area

    • Function: Define simple algorithm for selecting area

    • Hyperslab: Define start, stride, and count for each dimension of an array.



Hyperslab Transform Example

  • The following markup instructs the processing code to apply a hyperslab transform to a 4-D array.

  • The first data structure defines the hyperslab, one row per parameter:

    • 0 0 0 0 are the start indices for each dimension

    • 2 2 2 1 are the strides for each dimension

    • 25 50 75 3 are the counts (number of elements selected) for each dimension

  • The second data structure gives the raw data, a 100x200x300x3 array in the noted HDF5 file.

  • The transform produces a 25x50x75x3 array that takes every other index along each of the first three dimensions (and all 3 entries of the last) from the original region [0,0,0,0]-[48,98,148,2].


Hyperslab Transform Example

<DataTransform
    Dimensions="25 50 75 3"
    Type="HyperSlab">
  <DataStructure
      Dimensions="3 4"
      Format="XML">
    0 0 0 0
    2 2 2 1
    25 50 75 3
  </DataStructure>
  <DataStructure
      Name="Points"
      Dimensions="100 200 300 3"
      Format="HDF">
    MyData.h5:/XYZ
  </DataStructure>
</DataTransform>
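The hyperslab arithmetic can be checked directly. This sketch reads the three rows of the 3 x 4 DataStructure as start, stride, and count, the parameters listed for Hyperslab on the DataTransform slide. The result shape is the count row. The last index touched in each dimension is start + (count - 1) * stride, which must fall inside the raw data's Dimensions. Plain Java, with hypothetical names; it does not use the XDMF API.

```java
import java.util.Arrays;

final class Hyperslab {
    // Indices selected along one dimension: start, start + stride, ... (count values in total).
    static long[] indices(long start, long stride, int count) {
        long[] idx = new long[count];
        for (int i = 0; i < count; i++) {
            idx[i] = start + i * stride;
        }
        return idx;
    }

    // Last source index touched in each dimension, checked against the raw data's shape.
    static long[] lastIndex(long[] start, long[] stride, long[] count, long[] shape) {
        long[] last = new long[start.length];
        for (int d = 0; d < start.length; d++) {
            if (count[d] < 1 || stride[d] < 1 || start[d] < 0) {
                throw new IllegalArgumentException("Bad hyperslab parameters in dimension " + d);
            }
            last[d] = start[d] + (count[d] - 1) * stride[d];
            if (last[d] >= shape[d]) {
                throw new IllegalArgumentException("Hyperslab exceeds the data in dimension " + d);
            }
        }
        return last;
    }

    public static void main(String[] args) {
        long[] start = {0, 0, 0, 0};
        long[] stride = {2, 2, 2, 1};
        long[] count = {25, 50, 75, 3};
        long[] shape = {100, 200, 300, 3}; // Dimensions of the "Points" data
        System.out.println(Arrays.toString(lastIndex(start, stride, count, shape))); // [48, 98, 148, 2]
        System.out.println(Arrays.toString(indices(0, 2, 5)));                       // [0, 2, 4, 6, 8]
    }
}
```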



Data Organization

  • DataStructures and DataTransform constitute XDMF’s data representation.

  • XDMF Domain tags are used as arbitrary containers.

  • Domains contain grids, grids contain topologies, geometries and attributes, as well as data structures.

  • Attributes include scalars, vectors, tensors


An XDMF Example

<Domain Name="Example #1">
  <Grid Name="My Hex Grid with Pressure">
    <Topology Type="Hexahedron"
              Dimensions="100"
              Order="7 6 5 4 3 2 1 0">
      <DataStructure
          Name="Connections"
          Type="Int"
          Precision="8"
          Dimensions="100 8">
        MyData.h5:/MyConns
      </DataStructure>
    </Topology>
    <Geometry Type="XYZ">
      <DataStructure Name="XYZ Data"
                     Type="Float"
                     Dimensions="655 3">
        MyData.h5:/MyXYZdata
      </DataStructure>
    </Geometry>
    <Attribute Type="Scalar" Center="Cell">
      <DataStructure Name="Pressure"
                     Type="Float"
                     Precision="8"
                     Dimensions="100">
        MyData.h5:/MyPressure
      </DataStructure>
    </Attribute>
  </Grid>
</Domain>



Review of Example

  • Recall XDMF is primarily for structured and unstructured finite element grids.

    • Input data includes grid connectivity info, grid geometry, and pressure values

  • The Domain contains a Grid

  • The Grid is defined by Topology, Geometry, and Attributes.

  • Topology, Attributes, and Geometry contain data sources and structure info.
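The pieces of the example must agree with one another. A hexahedron has 8 nodes, so Connections is cells x 8. The XYZ geometry is points x 3. A cell-centered scalar needs one value per cell. A minimal consistency-check sketch in plain Java follows. The class name is hypothetical, and it checks only array shapes, not whether each connection index is below the number of points.

```java
final class GridCheck {
    static final int NODES_PER_HEXAHEDRON = 8;

    // Checks the shapes in the example: nCells hexahedra, an XYZ geometry,
    // and a cell-centered scalar attribute. Throws if any shape disagrees.
    static void check(long nCells, long[] connDims, long[] xyzDims, long[] attrDims) {
        if (connDims.length != 2 || connDims[0] != nCells || connDims[1] != NODES_PER_HEXAHEDRON) {
            throw new IllegalArgumentException("Connections must be nCells x 8");
        }
        if (xyzDims.length != 2 || xyzDims[1] != 3) {
            throw new IllegalArgumentException("XYZ geometry must be nPoints x 3");
        }
        if (attrDims.length != 1 || attrDims[0] != nCells) {
            throw new IllegalArgumentException("A cell-centered scalar needs one value per cell");
        }
    }

    public static void main(String[] args) {
        // Topology Dimensions="100"; Connections "100 8"; XYZ "655 3"; Pressure "100".
        check(100, new long[] {100, 8}, new long[] {655, 3}, new long[] {100});
        System.out.println("shapes are consistent");
    }
}
```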



XDMF API

  • Like XSIL, XDMF treats the XML markup as a set of instructions to be processed by actual programs.

  • XDMF defines an API for document-processing engines.

    • Core is in C++

    • ICE also provides Java and TCL APIs through wrappers around core.

  • See http://www.arl.hpc.mil/ice/Examples/CodeIntegration/DemoIceRt.cxx for code example.



XDMF Summary

  • Provides a few general purpose tags

  • Again, data is not directly marked up.

    • Stored in HDF5

  • XDMF handled programmatically with APIs in C++, Java, Tcl.

  • More information:

    • http://www.arl.hpc.mil/ice/


Comparison of XSIL and XDMF

  • XSIL

    • Larger tag set

    • Java API

    • Can read data that is in the document, on disk, or at a URL

    • Questionable performance and memory efficiency for very large data sets

    • Free and open source

  • XDMF

    • Uses HDF5 for large data sets

    • C++, Java, TCL APIs

    • Defines both data structures and transform instructions

    • Supports arrays, but not mixed data types (such as XSIL Tables)

    • Integrated with ICE



Chemical Markup Language

A domain specific XML markup language.



CML Introduction

  • XSIL and XDMF use XML to describe code input files and give simple processing instructions.

  • Tags describe data structure, not content.

  • We now examine a domain specific example, the Chemical Markup Language.

  • Other domain markup languages:

    • Mathematics Markup Language (MathML)

    • Geography Markup Language (GML)



XML for Chemistry

  • Goal: provide a common chemical data format that is an open, universal standard.

    • Data representation is platform independent

    • Support structured searches of data banks.

    • Provide a common format for software (particularly visualization).

    • Support multidisciplinary data formats (biology, math) through XML namespaces.

    • Provide a data object hierarchy suitable for object oriented programming.



CML Structure

  • Chemistry lends itself to an object container structure

    • Atoms have protons, neutrons, electrons

    • Molecules have atoms

    • Complex molecules and compounds are composed of molecules, molecular pieces (benzene rings, for example)

  • CML defines these as data objects with property fields


A Simple Example: Glycine

<molecule convention="MDLMol" id="glycine" title="GLYCINE">
  <date day="22" month="11" year="1995">
  </date>
  <atomArray>
    <atom id="a1">
      <string builtin="elementType">C</string>
      <float builtin="x2">0.6424</float>
      <float builtin="y2">0.4781</float>
    </atom>
    ….
  </atomArray>
  <bondArray>
    <bond id="b1">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a2</string>
      <string builtin="order">1</string>
    </bond>
    ….
  </bondArray>
</molecule>
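Because CML tags mirror chemical objects, the markup maps naturally onto small data classes. The sketch below reads atoms and bonds into Java records using the JDK's DOM parser. It looks up properties by their builtin attribute, as in the glycine markup. The sample input is that markup with its "…." elisions left out, so only atom a1 and bond b1 remain. The record and method names are hypothetical, not part of any CML toolkit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class CmlReader {
    // Hypothetical data objects mirroring the <atom> and <bond> markup.
    record Atom(String id, String elementType, double x2, double y2) {}
    record Bond(String id, String atomRef1, String atomRef2, int order) {}

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse CML", ex);
        }
    }

    // Text of every descendant <tag builtin="name"> under parent, in document order.
    static List<String> builtins(Element parent, String tag, String name) {
        List<String> values = new ArrayList<>();
        NodeList nodes = parent.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (name.equals(e.getAttribute("builtin"))) {
                values.add(e.getTextContent().strip());
            }
        }
        return values;
    }

    static List<Atom> atoms(Element molecule) {
        List<Atom> atoms = new ArrayList<>();
        NodeList nodes = molecule.getElementsByTagName("atom");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element a = (Element) nodes.item(i);
            atoms.add(new Atom(a.getAttribute("id"),
                    builtins(a, "string", "elementType").get(0),
                    Double.parseDouble(builtins(a, "float", "x2").get(0)),
                    Double.parseDouble(builtins(a, "float", "y2").get(0))));
        }
        return atoms;
    }

    static List<Bond> bonds(Element molecule) {
        List<Bond> bonds = new ArrayList<>();
        NodeList nodes = molecule.getElementsByTagName("bond");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element b = (Element) nodes.item(i);
            List<String> refs = builtins(b, "string", "atomRef");
            bonds.add(new Bond(b.getAttribute("id"), refs.get(0), refs.get(1),
                    Integer.parseInt(builtins(b, "string", "order").get(0))));
        }
        return bonds;
    }

    public static void main(String[] args) {
        String xml = "<molecule convention=\"MDLMol\" id=\"glycine\" title=\"GLYCINE\">"
                + "<atomArray><atom id=\"a1\">"
                + "<string builtin=\"elementType\">C</string>"
                + "<float builtin=\"x2\">0.6424</float><float builtin=\"y2\">0.4781</float>"
                + "</atom></atomArray>"
                + "<bondArray><bond id=\"b1\">"
                + "<string builtin=\"atomRef\">a1</string><string builtin=\"atomRef\">a2</string>"
                + "<string builtin=\"order\">1</string>"
                + "</bond></bondArray></molecule>";
        Element molecule = parse(xml);
        System.out.println(atoms(molecule));
        System.out.println(bonds(molecule));
    }
}
```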



CML Example Software



Previous Slide

  • Browser tool, Jumbo-3.0

    • User can display dozens of CML’d molecules.

    • Molecules can be rotated in the display.

    • Display is rendered in SVG (Adobe plugin).

    • The molecule displayed is cholesterol. The database also contains glycine, but it is not as interesting to look at.



Gateway Application Descriptors

Describing scientific applications themselves with XML and mapping to Java with Castor.



Gateway Application Descriptors

  • Gateway is a computational web portal for securely submitting and monitoring jobs, transferring files, and archiving information.

  • Gateway describes scientific applications and host computers with XML metadata.

  • This is used to provide general purpose tools that can be used to build portals for specific applications.



Application Descriptors

  • Gateway describes scientific applications and host machines in XML.

  • This metadata is used to generate the HTML forms that collect the information needed to create batch queuing scripts and submit jobs.

  • The general object container scheme is

    • Portals contain applications

    • Applications contain hosts

    • Each also has a set of descriptive parameters.


Example: ANSYS on Grids

<Application>
  <ApplicationName>ANSYS</ApplicationName>
  <Version>5.0</Version>
  <Parameter Name="IOStyle">
    <Value>StandardIO</Value>
  </Parameter>
  <Parameter Name="NumberOfInFiles">
    <Value>1</Value>
  </Parameter>
  <Host>
    <HostName>grids.ucs.indiana.edu</HostName>
    <HostIP>156.56.103.5</HostIP>
    <RemoteCopy>rcp</RemoteCopy>
    <RemoteExec>rsh</RemoteExec>
    <WorkDir>/tmp</WorkDir>
    <QueueType>CSH</QueueType>
    <QsubPath>/usr/bin/csh</QsubPath>
    <ExecPath>echo</ExecPath>
  </Host>
</Application>



Java Data Object Bindings

  • As with other examples, the descriptor does not do anything by itself.

  • Must provide language bindings to make it useful in programs.

  • We used Castor (http://castor.exolab.org) to generate classes for us.



Castor for Data Object Creation

  • Tags map directly to Java objects; for example, the Application tag maps to an Application object.

  • Each object has the necessary getter and setter methods for manipulating its data.

  • After generating the classes from the XML schema (a one-time step), the program loads (unmarshals) an XML file to create specific data object instances.

  • When program is done, modified data objects can be marshalled back into XML file format.

  • We still have to write the Java code for specific uses, utility classes….
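To make the unmarshal/marshal cycle concrete, here is a hand-written stand-in for the kind of bean a binding tool produces: private fields with getters and setters, plus conversion to and from XML. This is not Castor output and does not use Castor's API. It is a minimal sketch built on the JDK's DOM and javax.xml.transform classes, covering just two fields of the Host descriptor. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hand-written stand-in for a generated data class (not Castor output).
final class Host {
    private String hostName;
    private String workDir;

    public String getHostName() { return hostName; }
    public void setHostName(String hostName) { this.hostName = hostName; }
    public String getWorkDir() { return workDir; }
    public void setWorkDir(String workDir) { this.workDir = workDir; }

    private static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // "Unmarshal": XML text -> Host object.
    static Host unmarshal(String xml) {
        Element root;
        try {
            root = builder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse Host descriptor", ex);
        }
        Host h = new Host();
        h.setHostName(childText(root, "HostName"));
        h.setWorkDir(childText(root, "WorkDir"));
        return h;
    }

    // "Marshal": Host object -> XML text.
    String marshal() {
        Document doc = builder().newDocument();
        Element root = doc.createElement("Host");
        doc.appendChild(root);
        addChild(doc, root, "HostName", hostName);
        addChild(doc, root, "WorkDir", workDir);
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException(ex);
        }
    }

    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().strip();
    }

    private static void addChild(Document doc, Element parent, String tag, String value) {
        if (value != null) {
            Element e = doc.createElement(tag);
            e.setTextContent(value);
            parent.appendChild(e);
        }
    }

    public static void main(String[] args) {
        Host h = unmarshal("<Host><HostName>grids.ucs.indiana.edu</HostName><WorkDir>/tmp</WorkDir></Host>");
        h.setWorkDir("/scratch"); // hypothetical modification made by the program
        System.out.println(h.marshal());
    }
}
```

A generated binding saves writing this per-field code by hand for every tag in the schema, which is the main reason for using a tool like Castor.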



Other markup languages and some comparison

Various shortcomings of programming and markup languages



XML Schema

  • XML Schema defines many built-in types

    • base64Binary, hexBinary, boolean, byte, decimal, double, float, int, long, short, string

    • And many more

  • Does not define standards for

    • Arrays

    • Complex (real+imaginary) numbers



SOAP

  • Often described as an XML remote procedure call (RPC) protocol.

    • RPC is only one part of SOAP

  • Also defines encoding rules for data exchange.

  • SOAP inherits all XML Schema Built-in Types (see previous slide).

  • Defines additional compound types

    • Struct: arbitrary collection of types (say, strings and floats) similar to XSIL table entry.

    • Array: can contain primitive and compound types

      • An array can be built out of arrays.



HDF5 and XML

  • Types include

    • Integers

      • 2-64 bit, signed or unsigned, big or little endian

    • Floats (32, 64 bit, BE or LE)

    • Strings

    • Arrays

  • Arbitrary compound types

  • See http://hdf.ncsa.uiuc.edu/HDF5/XML/



Compatibility and Missing Features

  • No standard XML definitions for arrays and “compound types” like XSIL tables.

    • We have several defs: SOAP, XSIL, XDMF, XML-HDF5

  • Lack of built-in support for complex (real + imaginary) types

    • XML, XML-HDF5, XDMF can easily define complex types, but not in a standard way.

    • Java does not have a built-in complex type, either
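Since Java has no built-in complex type, each project ends up defining its own. Here is a minimal sketch of such a type, with one possible (non-standard) storage layout: interleaved real and imaginary doubles, 128 bits per value, the same size listed for XSIL's doubleComplex. The names are hypothetical, and other codes may choose a different layout, which is exactly the interoperability problem described above.

```java
import java.util.Arrays;

final class ComplexDemo {
    // Minimal immutable complex number; Java has no built-in equivalent.
    record Complex(double re, double im) {
        Complex plus(Complex o) {
            return new Complex(re + o.re, im + o.im);
        }

        Complex times(Complex o) {
            // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
            return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
        }
    }

    // One possible layout: interleaved (re, im) pairs, two 64-bit doubles per value.
    static double[] interleave(Complex[] zs) {
        double[] out = new double[2 * zs.length];
        for (int k = 0; k < zs.length; k++) {
            out[2 * k] = zs[k].re();
            out[2 * k + 1] = zs[k].im();
        }
        return out;
    }

    static Complex[] deinterleave(double[] data) {
        if (data.length % 2 != 0) {
            throw new IllegalArgumentException("Need an even number of doubles");
        }
        Complex[] out = new Complex[data.length / 2];
        for (int k = 0; k < out.length; k++) {
            out[k] = new Complex(data[2 * k], data[2 * k + 1]);
        }
        return out;
    }

    public static void main(String[] args) {
        Complex i = new Complex(0, 1);
        System.out.println(i.times(i)); // i * i = -1 + 0i
        double[] packed = interleave(new Complex[] {new Complex(1, 2), new Complex(3, 4)});
        System.out.println(Arrays.toString(packed));
        System.out.println(Arrays.toString(deinterleave(packed)));
    }
}
```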



More Missing Features

  • Varying support for integers and floats of different sizes.

    • C/C++ do not guarantee consistent sizes for types such as int and long across platforms.

  • Binary data must specify Big Endian/Little Endian encoding for cross platform compatibility.

    • XML-HDF5, XSIL, XDMF all do this

    • XML does not

  • XSIL does not distinguish signed and unsigned integers
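Why signedness needs to be declared: the same bits mean different numbers depending on whether they are read as signed or unsigned. Java itself has only signed integer primitives. It does, however, provide standard helpers (Integer.toUnsignedLong, Byte.toUnsignedInt) for the unsigned reading. The wrapper class below is a hypothetical illustration.

```java
final class SignednessDemo {
    // The same 32 bits read as unsigned, widened to long so the value fits.
    static long asUnsigned32(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    // The same 8 bits read as unsigned, widened to int.
    static int asUnsigned8(byte b) {
        return Byte.toUnsignedInt(b);
    }

    public static void main(String[] args) {
        int bits = 0xFFFFFFFF;                            // all 32 bits set
        System.out.println(bits);                         // -1 when read as signed
        System.out.println(asUnsigned32(bits));           // 4294967295 when read as unsigned
        byte b = (byte) 0xC8;
        System.out.println(b + " vs " + asUnsigned8(b));  // -56 vs 200
    }
}
```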

