Paul Bye and Mike Accola

Toolkit Enhancements InfoSphere Streams Version 3.0


Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

  • CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

  • ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

    The information on the new product is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information on the new product is for informational purposes only and may not be incorporated into any contract. The information on the new product is not a commitment, promise, or legal obligation to deliver any material, code or functionality. The development, release, and timing of any features or functionality described for our products remains at our sole discretion.



Agenda

  • Toolkit Repackaging for Streams Version 3.0

  • Database Toolkit XML Support

  • Database Toolkit Optional Control Port

  • Database Toolkit Netezza Native Loader


Toolkit Repackaging for Streams Version 3.0

  • The Streams Financial Toolkit and Mining Toolkit are now included in the base InfoSphere Streams installation along with the rest of the toolkits

    • Shipped and installed separately in prior releases of Streams

  • No changes are necessary to SPL source files using these toolkits from previous releases

  • Application Makefiles and command-line sc invocations that built applications with the Financial or Mining Toolkit in a previous release must be updated to point to the new toolkit locations

    OLD:

    sc -a -z -t /mining/toolkit/install/location

    NEW:

    sc -a -z -t $(STREAMS_INSTALL)/toolkits/com.ibm.streams.mining
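
As a rough illustration of the adjustment shown above, the following Python sketch rewrites the -t argument of an sc command line. The function name retarget_toolkit is hypothetical, and the sketch assumes each -t option carries a single path; a colon-separated toolkit list or a Makefile variable would need different handling.

```python
import shlex


def retarget_toolkit(command: str, old_path: str, new_path: str) -> str:
    """Replace the value following any -t option when it equals old_path."""
    tokens = shlex.split(command)
    for i in range(len(tokens) - 1):
        if tokens[i] == "-t" and tokens[i + 1] == old_path:
            tokens[i + 1] = new_path
    # Tokens are re-joined with plain spaces; arguments that contain
    # whitespace would need re-quoting, which this sketch does not do.
    return " ".join(tokens)


old_cmd = "sc -a -z -t /mining/toolkit/install/location"
new_cmd = retarget_toolkit(
    old_cmd,
    "/mining/toolkit/install/location",
    "$(STREAMS_INSTALL)/toolkits/com.ibm.streams.mining",
)
print(new_cmd)
```

Commands that do not reference the old location come back unchanged, so the helper can be applied to every build line in a script.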


Database Toolkit – XML Support

  • With the addition of the native ‘xml’ SPL type, the Database Toolkit has been enhanced to exchange XML data between the database and the Streams application

  • Tuple attributes of type ‘xml’ can be written to, or read from, database column types of CHAR, VARCHAR, NCHAR, or NVARCHAR

    • Native column types are now specified in the connections.xml file with the new <native_schema> and <column> elements which replace the deprecated <external_schema> and <attribute> elements

    • Example:

    OLD:

    <external_schema>
      <attribute name="id" type="int32"/>
      <attribute name="name" type="rstring" length="15"/>
    </external_schema>

    NEW:

    <native_schema>
      <column name="id" type="INTEGER"/>
      <column name="name" type="VARCHAR" length="15"/>
    </native_schema>
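
The OLD-to-NEW conversion above is mechanical, so it can be sketched in a few lines of Python. This is only an illustration: the helper name to_native_schema is hypothetical, and the type table contains just the two pairs shown on the slide (int32 to INTEGER, rstring to VARCHAR). Any other mapping would have to be added for a real schema.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from SPL types to native column types. Only these two
# pairs appear in the slide; extend the table for your own schema.
SPL_TO_NATIVE = {"int32": "INTEGER", "rstring": "VARCHAR"}


def to_native_schema(external: ET.Element) -> ET.Element:
    """Build a <native_schema> element from a deprecated <external_schema>."""
    native = ET.Element("native_schema")
    for attr in external.findall("attribute"):
        column = ET.SubElement(native, "column")
        column.set("name", attr.get("name"))
        # Raises KeyError for SPL types missing from the table above.
        column.set("type", SPL_TO_NATIVE[attr.get("type")])
        if attr.get("length") is not None:
            column.set("length", attr.get("length"))
    return native


old = ET.fromstring(
    '<external_schema>'
    '<attribute name="id" type="int32"/>'
    '<attribute name="name" type="rstring" length="15"/>'
    '</external_schema>'
)
new = to_native_schema(old)
print(ET.tostring(new, encoding="unicode"))
```

Failing loudly on an unmapped type is deliberate here: a silent default could write the wrong column type into connections.xml.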


Example - ODBCAppend

Diagram: source → ODBCAppend → DB table. Input tuple schema: int32 id, xml xmldata. DB table columns: INTEGER (id), VARCHAR (xmldata).

SPL Application:

    ////////////////////////////////////////////////
    // Read from file
    ////////////////////////////////////////////////
    stream<int32 id, xml xmldata> tabledata = FileSource() {
      param
        file : "tabledata.csv";
        format : csv;
        initDelay : 5.0;
    }

    ////////////////////////////////////////////////
    // Write data to the database
    ////////////////////////////////////////////////
    () as DBSink = ODBCAppend(tabledata) {
      param
        connection : "DBXML";
        access : "TableWithXML";
        connectionDocument : "./etc/connections.xml";
    }

connections.xml:

    <connection_specifications>
      <connection_specification name="DBXML">
        <ODBC database="mydb" user="user" password="password" />
      </connection_specification>
    </connection_specifications>

    <access_specifications>
      <access_specification name="TableWithXML">
        <table tablename="XMLTABLE" />
        <uses_connection connection="DBXML" />
        <native_schema>
          <column name="id" type="INTEGER" />
          <column name="xmldata" type="VARCHAR" length="15" />
        </native_schema>
      </access_specification>
    </access_specifications>


Example - ODBCSource

Diagram: DB table → ODBCSource → output tuples. DB table columns: INTEGER (id), VARCHAR (xmldata). Output tuple schema: int32 id, xml xmldata.

SPL Application:

    ////////////////////////////////////////////////
    // Read data from the database
    ////////////////////////////////////////////////
    stream<int32 id, xml xmldata> dbdata = ODBCSource() {
      param
        connection : "DBXML";
        access : "TableWithXML";
        connectionDocument : "./etc/connections.xml";
    }

connections.xml:

    <connection_specifications>
      <connection_specification name="DBXML">
        <ODBC database="mydb" user="user" password="password" />
      </connection_specification>
    </connection_specifications>

    <access_specifications>
      <access_specification name="TableWithXML">
        <query query="SELECT * FROM XMLTABLE" replays="1" isolation_level="READ_COMMITTED" />
        <uses_connection connection="DBXML" />
        <native_schema>
          <column name="id" type="INTEGER" />
          <column name="xmldata" type="VARCHAR" length="15" />
        </native_schema>
      </access_specification>
    </access_specifications>


DB2 pureXML Support

  • With a DB2 database, the Database Toolkit operators can read from and write to DB2 pureXML table columns

    • DB2 can perform additional XML validation if it is configured to do so

    • DB2’s XQuery support can be used to run queries from a Streams application. Example:

      <query query="SELECT XMLQUERY('for $d in $doc/cusInfo return&lt;out&gt;{$d/name}&lt;/out&gt;' passing info as \&quot;doc\&quot;) from PERSONTEST as c where XMLEXISTS ('$i/cusInfo[company=\&quot;IBM\&quot;]' passing c.info as \&quot;i\&quot;)" replays="1" isolation_level="READ_COMMITTED" />

  • DB2 pureXML fields are specified in the connections.xml <column> element as type=“XML”
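
Because the query is stored in an XML attribute, characters such as <, > and " inside the XQuery text have to be written as entities, as in the example above. The Python sketch below shows one way to produce that escaping with the standard library. The helper name as_query_attribute is hypothetical, the query is a shortened version of the one on the slide, and the sketch does not reproduce the backslash that precedes &quot; in the slide, since the source does not explain it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def as_query_attribute(sql: str) -> str:
    """Return a <query> element with sql escaped for a double-quoted attribute."""
    escaped = escape(sql, {'"': "&quot;"})
    return (
        f'<query query="{escaped}" replays="1" '
        'isolation_level="READ_COMMITTED" />'
    )


# Shortened, illustrative version of the query from the slide.
sql = (
    "SELECT XMLQUERY('for $d in $doc/cusInfo return <out>{$d/name}</out>' "
    'passing info as "doc") from PERSONTEST'
)
element = as_query_attribute(sql)
print(element)

# Round trip: an XML parser recovers the original query text.
assert ET.fromstring(element).get("query") == sql
```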


Example – ODBCSource with DB2 pureXML

Diagram: DB table → ODBCSource → output tuples. DB table columns: INTEGER (id), XML (xmldata). Output tuple schema: int32 id, xml xmldata.

SPL Application:

    ////////////////////////////////////////////////
    // Read data from the database
    ////////////////////////////////////////////////
    stream<int32 id, xml xmldata> dbdata = ODBCSource() {
      param
        connection : "DBXML";
        access : "TableWithXML";
        connectionDocument : "./etc/connections.xml";
    }

connections.xml:

    <connection_specifications>
      <connection_specification name="DBXML">
        <ODBC database="mydb" user="user" password="password" />
      </connection_specification>
    </connection_specifications>

    <access_specifications>
      <access_specification name="TableWithXML">
        <query query="SELECT * FROM XMLTABLE" replays="1" isolation_level="READ_COMMITTED" />
        <uses_connection connection="DBXML" />
        <native_schema>
          <column name="id" type="INTEGER" />
          <column name="xmldata" type="XML" length="200" />
        </native_schema>
      </access_specification>
    </access_specifications>


Database Toolkit – Optional Control Port

  • All Database Toolkit operators that use an ODBC connection now accept an optional “control” input port, which can be used to change operator configuration at runtime

    • Initially, the control port only supports changing the connection password

    • Additional support (e.g. userid, delay time, etc.) may be added in the future

  • The control port schema is a tuple with exactly two rstring attributes

    • First attribute contains the pre-defined name of the configuration option being set (e.g. “connection.password”)

    • Second attribute contains the value corresponding to the configuration option being set (e.g. “mynewpassword”)

  • The control port is port 0 if the operator has no required input port, or port 1 if it does
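
The two rules above (which port index the control port takes, and the two-attribute name/value tuple) can be sketched in Python as follows. The function names are hypothetical. The CSV line is only a guess at what a file such as password.csv might hold, since the slides do not show its contents or the quoting that FileSource expects.

```python
import csv
import io


def control_port_index(has_required_input_port: bool) -> int:
    """Port 1 when the operator has a required input port, otherwise port 0."""
    return 1 if has_required_input_port else 0


def control_record(option: str, value: str) -> str:
    """Format one control tuple (name, value) as a single CSV line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow([option, value])
    return buffer.getvalue()


# ODBCSource has no required input port; ODBCAppend has one.
print(control_port_index(False))
print(control_port_index(True))
print(control_record("connection.password", "mynewpassword"), end="")
```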


Example – ODBCSource with control port

Diagram: source → ODBCSource (control port) → output tuples. Control port schema: rstring name, rstring value.

SPL Application:

    ////////////////////////////////////////////////
    // Read new password from file
    ////////////////////////////////////////////////
    stream<rstring name, rstring value> configdata = FileSource() {
      param
        file : "password.csv";
        format : csv;
        initDelay : 5.0;
      output
        configdata : name = "connection.password";
    }

    ////////////////////////////////////////////////
    // Read data from the database
    ////////////////////////////////////////////////
    stream<DBOutputSchema> dbdata = ODBCSource(configdata) {
      param
        connection : "DBXML";
        access : "TableWithXML";
        connectionDocument : "./etc/connections.xml";
    }


Database Toolkit Netezza Native Loader

  • Enhancement to the Database Toolkit in Streams Version 3.0

  • Uses Netezza’s External Table interface, which allows high-speed data inserts (faster than ODBC)

  • Based on versions of the operators published on developerWorks

    • Note: the interface has changed in the new versions of the operators


New Operators

  • NetezzaPrepareLoad

    • Takes an input stream (tuple)

    • Generates a delimited string that NetezzaLoad can consume. The user defines the string format in a connections.xml file (as is done for ODBCAppend)

  • NetezzaLoad

    • Takes an input stream with one rstring attribute containing the delimited string from NetezzaPrepareLoad

    • Loads records into specified Netezza table
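
To make the idea of a "delimited string" concrete, here is a conceptual Python sketch of turning one tuple into a delimited record while escaping characters that would otherwise be mistaken for delimiters. It is not the NetezzaPrepareLoad implementation. The function name and its behavior are assumptions, with parameter names loosely modeled on the delimiter, escapeCharList and escape character settings that appear in the usage examples.

```python
def prepare_record(values, delimiter=",", escape_char_list=(",",), escape_char="\\"):
    """Join values with delimiter, prefixing listed characters with escape_char."""
    fields = []
    for value in values:
        text = str(value)
        # Assumption: the escape character itself is escaped first, so the
        # escapes added in the next step are not doubled afterwards.
        text = text.replace(escape_char, escape_char + escape_char)
        for ch in escape_char_list:
            text = text.replace(ch, escape_char + ch)
        fields.append(text)
    return delimiter.join(fields)


# A value containing the delimiter keeps it as data rather than as a split point.
record = prepare_record([42, "Smith, John", "NY"])
print(record)
```

The resulting single string corresponds to the one rstring attribute that NetezzaLoad expects on its input stream.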


Basic Usage

Diagram: source → NetezzaPrepareLoad → NetezzaLoad.

    ////////////////////////////////////////////////
    // Prepare the string to load
    ////////////////////////////////////////////////
    stream<rstring buf> preparedData = NetezzaPrepareLoad(dataSource) {
      param
        access : "access1";
        escapeCharList : [","];
        delimiter : ",";
    }

    ////////////////////////////////////////////////
    // Load the record into Netezza
    ////////////////////////////////////////////////
    () as myLoad = NetezzaLoad(preparedData) {
      param
        connection : "conn1";
        access : "access1";
        delimiter : ",";
        escapeChar : "\\";
    }


Additional Use Cases – Example A

Diagram: source → NetezzaPrepareLoad → ThreadedSplit → two NetezzaLoad operators.

    ////////////////////////////////////////////////
    // Prepare the string to load
    ////////////////////////////////////////////////
    stream<rstring buf> preparedData = NetezzaPrepareLoad(dataSource) {
      param
        access : "access1";
        escapeCharList : [","];
        delimiter : ",";
    }

    ////////////////////////////////////////////////
    // Split down two paths
    ////////////////////////////////////////////////
    (stream<rstring buf> preparedData1;
     stream<rstring buf> preparedData2) = ThreadedSplit(preparedData) {
      param
        bufferSize : 1000u;
    }

    ////////////////////////////////////////////////
    // Two load operators
    ////////////////////////////////////////////////
    () as myLoad1 = NetezzaLoad(preparedData1) {
      param
        connection : "conn1";
        access : "access1";
        delimiter : ",";
        escapeChar : "\\";
    }

    () as myLoad2 = NetezzaLoad(preparedData2) {
      param
        connection : "conn1";
        access : "access1";
        delimiter : ",";
        escapeChar : "\\";
    }


Additional Use Cases – Example B

Diagram: source → ThreadedSplit → two NetezzaPrepareLoad operators → NetezzaLoad.

    ////////////////////////////////////////////////
    // Split down two paths
    ////////////////////////////////////////////////
    (stream<MySchema> dataSource1;
     stream<MySchema> dataSource2) = ThreadedSplit(dataSource) {
      param
        bufferSize : 1000u;
    }

    ////////////////////////////////////////////////
    // Two prepare operators
    ////////////////////////////////////////////////
    stream<rstring buf> preparedData1 = NetezzaPrepareLoad(dataSource1) {
      param
        access : "access1";
        escapeCharList : [","];
        delimiter : ",";
    }

    stream<rstring buf> preparedData2 = NetezzaPrepareLoad(dataSource2) {
      param
        access : "access1";
        escapeCharList : [","];
        delimiter : ",";
    }

    ////////////////////////////////////////////////
    // Load the records into Netezza
    ////////////////////////////////////////////////
    () as myLoad = NetezzaLoad(preparedData1, preparedData2) {
      param
        connection : "conn1";
        access : "access1";
        delimiter : ",";
        escapeChar : "\\";
    }


Other Notes

  • Both Netezza operators have optional error output ports

  • NetezzaLoad has an optional input port for updating password information

