
Central Data Exchange Environmental Information Exchange Network

Central Data Exchange Environmental Information Exchange Network. Exchange Network Enhancements By David Fladung April 19, 2006. Agenda. CDX Overview Open Source Utilization Data Transformation (Mapper) Business Process Execution Language (BPEL) Rich User Interface (RUI) client


Presentation Transcript


  1. Central Data Exchange: Environmental Information Exchange Network • Exchange Network Enhancements • By David Fladung • April 19, 2006

  2. Agenda • CDX Overview • Open Source Utilization • Data Transformation (Mapper) • Business Process Execution Language (BPEL) • Rich User Interface (RUI) client • Geographic Data Interaction

  3. CDX Overview

  4. CDX Overview

  5. Open Source Utilization • CDX utilizes about 50 open source products/frameworks • JBoss (Wind River Node application server) • PostgreSQL (Wind River Node database) • Struts (Model View Controller [MVC]) • Hibernate (Object Relational Mapping [ORM]) • Axis (WS engine and libraries) • Maven (build and release management) • AspectJ (quality of service) • StAX (streaming parsing of large XML) • Velocity (templating/mapping) • Quartz (job scheduling) • ActiveBPEL (business process management)

  6. Open Source Utilization • [Figure legend] Yellow – current open source implementation • Grey – potential for open source implementation • White – not applicable

  7. Open Source Utilization • Advantages • Low Total Cost of Ownership (TCO) • Rich user community • Adequate documentation • Proven performance • Promotes rapid development • Easy to integrate • Disadvantages • A product may stop being supported • Advanced support may come at a cost

  8. Data Transformation • Convert from one data format to another • XML • Flat file (e.g., delimited) • Database • Handle large file sizes • Use a streaming approach rather than in-memory processing • Provide a robust and reusable interface • Standard configuration files • Standard APIs • Reusable across multiple tiers
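
The streaming requirement on this slide can be illustrated with StAX, which the deck lists among the open source products in use. The following is a minimal sketch, not the Mapper's actual code: the class name `StreamingCount` and the element names are assumptions made for illustration. It counts matching elements while reading the input one event at a time, so memory use does not grow with file size.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of CDX.
final class StreamingCount {

    /** Counts start tags with the given local name without building a document tree. */
    static int countElements(InputStream in, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            int count = 0;
            // Pull one parsing event at a time; only the current event is held in memory.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<sites><site id=\"1\"/><site id=\"2\"/><other/></sites>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "site"));
    }
}
```

A DOM parser would load the whole document before any record could be processed; the pull loop above handles each element as it arrives, which is the property that matters for very large submissions.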

  9. Data Transformation • TRI OUT – flat file to XML • NC Node – database to XML for Beaches and NEI data • Puerto Rico Node – flat file to XML for AQS data • Wind River Node – database to XML for AQS • Geo Toolkit for Region 5 – XML to XML for Geo data • EnviroFlash – flat file to unstructured email (text) • TRIME (XML to database) • Water Sentinel (database to XML, XML to database) • GLNPO (database to Excel, database to XML)

  10. Data Transformation • [Figure legend] Yellow – current use of mapper implementation • White – not applicable

  11. Data Transformation • Architecture • Mapping engine • Runs the transformation process • Built on the Velocity open source project • Configuration files • Mapping instructions • Location of the data sources and data targets • Conditional logic, custom methods • Custom Java methods: provide custom transformations such as data formatting • Pluggable readers • Pluggable writers
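
The reader/engine/writer split described on this slide might be sketched as follows. All names here (`MiniMapper`, `RecordReader`, `RecordWriter`) are hypothetical and are not the Mapper's real interfaces; in the actual tool the mapping step is a Velocity template rather than a Java function. The sketch only shows why decoupling readers from writers lets any source be paired with any target.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a pluggable reader/writer design; not the CDX Mapper API.
final class MiniMapper {

    /** A reader yields one record at a time, whatever the underlying source is. */
    interface RecordReader extends Iterable<Map<String, String>> {}

    /** A writer receives rendered output, whatever the underlying target is. */
    interface RecordWriter {
        void write(String text);
    }

    /** The engine only knows the two contracts, so sources and targets can be swapped freely. */
    static void run(RecordReader reader,
                    Function<Map<String, String>, String> mapping,
                    RecordWriter writer) {
        for (Map<String, String> row : reader) {
            writer.write(mapping.apply(row));
        }
    }

    public static void main(String[] args) {
        // An in-memory list stands in for a database or flat-file reader.
        List<Map<String, String>> rows = List.of(Map.of("STATE_CODE", "72"));
        run(rows::iterator,
            row -> "<StateCode>" + row.get("STATE_CODE") + "</StateCode>",
            System.out::println);
    }
}
```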

  12. Data Transformation • Mapping steps • Logical mapping • The process of analyzing the data source and the data target and creating the document that specifies the relations between the source and target fields. • If the data source is a relational database, this process includes developing the query to extract the data from the database. • Physical mapping: the process of creating the configuration files that implement the logical mapping specifications. • Custom methods (if needed)

  13. Data Transformation • Database to XML (Puerto Rico Node)

      ## Database Query
      #set ($sqlQuery = "select distinct TRANSACTION_TYPE, ACTION_CODE, STATE_CODE, COUNTY_CODE, SITE_ID from ${tableName}RA where ACTION_CODE = 'D' and TRANSACTION_TYPE = 'RA'")
      ## Set Reader properties
      #set ($tmp = $MapperEngine.setMapReaderProperty('SQL_COMMAND', $sqlQuery ) )
      #set ($tmp = $MapperEngine.setMapReaderProperty('ENCODING', 'XML_ENCODING') )
      ## Loop for each record in result set
      #foreach($row in $MapperEngine.getIterator())
        ## Write XML
        <aqs:ActionRawDataDelete>
          <aqs:SiteIdentifierDetails>
            ## Use value from record as a variable
            <aqs:StateCode>$!row.STATE_CODE</aqs:StateCode>
            <aqs:CountyCode>$PRFunctions.getNumberDigitStr($!row.COUNTY_CODE , 3)</aqs:CountyCode>
            <aqs:SiteNumber>$PRFunctions.getNumberDigitStr($!row.SITE_ID , 4)</aqs:SiteNumber>
          </aqs:SiteIdentifierDetails>
          ## Call subsequent execution
          #set( $config = $MapperEngine.createMapperConfiguration() )
          #set ($tmp = $!config.ContextConfig.put( 'SITE_ID', $!row.SITE_ID ))
          #set ($tmp = $!config.ContextConfig.put( 'tableName', $tableName ))
          #set ($tmp = $!config.ContextConfig.put( 'subs', 'PRMonitorDeleteRAMap' ))
          $MapperEngine.subExecute('MapperServices/PR/PRDBReadConfig.vm', 'MapperServices/PR/PRMonitorDeleteRAMap.vm', $config)
        </aqs:ActionRawDataDelete>
      #end
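
The template on this slide calls a custom Java method, `$PRFunctions.getNumberDigitStr(value, n)`, whose source is not shown. Judging from its use for a 3-digit county code and a 4-digit site number, it plausibly left-pads a value with zeros to a fixed width. The sketch below implements that assumed behavior; the class name `PaddingSketch` and the handling of null and over-long input are guesses, not the real `PRFunctions` code.

```java
// Hypothetical stand-in for the custom method used in the template; behavior is assumed.
final class PaddingSketch {

    /** Left-pads the value with zeros to the requested number of digits. */
    static String getNumberDigitStr(String value, int digits) {
        if (value == null) {
            return "";                       // assumption: missing values render as empty
        }
        String trimmed = value.trim();
        if (trimmed.length() >= digits) {
            return trimmed;                  // assumption: over-long values pass through unchanged
        }
        StringBuilder sb = new StringBuilder(digits);
        for (int i = trimmed.length(); i < digits; i++) {
            sb.append('0');
        }
        return sb.append(trimmed).toString();
    }

    public static void main(String[] args) {
        System.out.println(getNumberDigitStr("7", 3));
        System.out.println(getNumberDigitStr("21", 4));
    }
}
```

Velocity can call any public method on an object placed in its context, which is how a helper like this would become available to the template under a name such as `$PRFunctions`.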

  14. Data Transformation • Flat file to unstructured text through custom Java (EnviroFlash)

      ## Column names for delimited text file
      $MapperEngine.setMapReaderProperty('COL_NAMES_LIST',['CITY','COUNTY','STATE','UV_INDEX','UV_ALERT'])
      ## Delimiter
      $MapperEngine.setMapReaderProperty('DELIMITER','\|')
      ## Loop for all records in text file
      #foreach($row in $MapperEngine.getIterator())
        #if($templateCallback.isCitySubscribedTo($row.STATE, $row.CITY, $row.COUNTY))
          ## Use values from record as variable
          #set( $config = $MapperEngine.createMapperConfiguration() )
          #set ($tmp = $!config.ContextConfig.put( 'CITY', $row.CITY ) )
          #set ($tmp = $!config.ContextConfig.put( 'COUNTY', $row.COUNTY ) )
          #set ($tmp = $!config.ContextConfig.put( 'STATE', $row.STATE ) )
          #set ($tmp = $!config.ContextConfig.put( 'UV_INDEX', $row.UV_INDEX ) )
          #set ($tmp = $!config.ContextConfig.put( 'UV_ALERT', $row.UV_ALERT ) )
          #set ($tmp = $!config.ContextConfig.put( 'subscriberURL', $subscriberURL ) )
          #set ($tmp = $!config.ContextConfig.put( 'environmentName', $environmentName ) )
          #set ($tmp = $MapperEngine.subExecute('gov/epa/cdx/enviroflash/uv/templates/writeUVMailConfig.vm', 'gov/epa/cdx/enviroflash/uv/templates/writeUVMailMap.vm', $config) )
          #set ($outMail = $!MapperEngine.getObjectCacheMap().get('OUT_MAIL') )
          #set ($tmp = $templateCallback.sendEmail($outMail, $row.STATE, $row.CITY, $row.COUNTY, $row.UV_ALERT) )
        #end
      #end
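
The reader properties set in this template (`COL_NAMES_LIST` and `DELIMITER`) suggest a reader that turns each delimited line into a record keyed by column name. The sketch below shows one way that could work; `DelimitedSketch` and `parseLine` are hypothetical names, and treating the delimiter as a regular expression is an assumption, inferred only from the escaped `'\|'` on the slide. The sample values are invented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a delimited-file reader; not the Mapper's actual reader.
final class DelimitedSketch {

    /** Splits one line on the delimiter and keys each field by its column name. */
    static Map<String, String> parseLine(String line, List<String> columns, String delimiterRegex) {
        // A limit of -1 keeps trailing empty fields instead of silently dropping them.
        String[] parts = line.split(delimiterRegex, -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            // Columns missing from a short line are filled with an empty string.
            row.put(columns.get(i), i < parts.length ? parts[i].trim() : "");
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("CITY", "COUNTY", "STATE", "UV_INDEX", "UV_ALERT");
        // "\\|" is the Java string form of the regex \| (a literal pipe character).
        Map<String, String> row = parseLine("Springfield|Example County|XX|9|Y", columns, "\\|");
        System.out.println(row.get("CITY") + " UV index " + row.get("UV_INDEX"));
    }
}
```

Parsing one line at a time keeps the reader compatible with the streaming approach: the caller can iterate over a file of any size and hand each record to the template as `$row`.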

  15. Data Transformation • Advantages • Concentrates mapping logic within the configuration file and custom methods. • Handles several data source types. • Decouples readers and writers. • Streams data to handle large files with low memory overhead (tested against 680 MB). • Supports custom Java methods. • Does not require a license fee. • Requires minimal coding. • Superior performance compared to commercial tools (XAware, BEA Liquid Data): 30 times faster on large data sets.

  16. BPEL • BPEL is a standard for orchestrating Web Services. • XML based description of a business process • Contains references to supporting WSDL files • Portable between BPEL engines • BPEL allows for a formal specification of business processes. • BPEL meshes well with Service Oriented Architectures (SOA). • BPEL provides several useful constructs • Transaction context management • Synchronous and asynchronous web service invocation and response • Conditional branching • Parallel flow activities • Fault handling and exception invocation
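
Three of the constructs listed on this slide (parallel flow, conditional branching, and fault handling) can be illustrated with a plain-Java analogy. This is not BPEL and not how a BPEL engine executes a process; it is a sketch using `CompletableFuture` to show the control flow that a BPEL process definition would declare in XML. The class name `FlowSketch`, the two service suppliers, and the result strings are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Plain-Java analogy for an orchestrated flow; names and result strings are hypothetical.
final class FlowSketch {

    static String process(Supplier<Boolean> validate, Supplier<String> fetchMetadata) {
        // Parallel flow: both service calls start without waiting for each other.
        CompletableFuture<Boolean> valid = CompletableFuture.supplyAsync(validate);
        CompletableFuture<String> meta = CompletableFuture.supplyAsync(fetchMetadata);

        return valid
                // Conditional branching once both results are available.
                .thenCombine(meta, (ok, m) -> ok ? "ACCEPTED:" + m : "REJECTED:" + m)
                // Fault handling: a failure in either call is mapped to a fault result.
                .exceptionally(t -> "FAULT")
                .join();
    }

    public static void main(String[] args) {
        System.out.println(process(() -> true, () -> "doc-1"));
        System.out.println(process(() -> { throw new IllegalStateException("service down"); },
                                   () -> "doc-1"));
    }
}
```

In BPEL the same structure would be declared rather than coded, which is what makes the process description portable between engines and visible to tooling.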

  17. BPEL

  18. BPEL • BPEL within CDX • Motivations • Can it simplify the design of existing dataflows? • Can it reduce the cost of dataflow development? • Can it speed up the process of integrating CDX Web and Node applications? • Can it provide better visibility into existing flows? • Goals • Identify a target platform. • Demonstrate feasibility of deployment/integration. • Demonstrate ability to reuse existing CDX services. • Determine if BPEL allows for quick development of dataflow components.

  19. BPEL • Prototype specifics • Exposed generic CDX services (Java) as Web Services • XML validation • Retrieval of transaction/document metadata • Created a CDX Services project to host the web services • Model existing National Emissions Inventory (NEI) dataflow. • Enhance CDX infrastructure to support use of BPEL orchestration. • Configure a production-like environment to host the services. • Deploy ActiveBPEL engine (deployed within Tomcat) • Set up persistence of processes (Oracle DBMS)

  20. BPEL

  21. BPEL

  22. BPEL

  23. BPEL

  24. BPEL • Findings • The BPEL prototype demonstrates feasibility in the EPA environment. • Cost savings appear achievable for future flows as the CDX service suite grows, but the size of the savings is not yet clear. • The learning curve is not insignificant. • Tools have not yet reached full maturity.

  25. RUI Client • Guidelines • Provide more features/capabilities than a web application can deliver. • Provide flexible configuration for interaction with multiple Nodes. • Support all existing Exchange Network Web Services and dataflows. • Provide pluggable transformation/visualization for multiple dataflows (Mapper, XML binding). • Use NAAS for authentication/authorization.

  26. RUI Client

  27. RUI Client

  28. RUI Client

  29. RUI Client

  30. RUI Client • Current capabilities • Supports submit, download, and transaction history search • Supports configurable data transformation • Supports NAAS authentication/authorization • Future capabilities • Support query and data visualization • Add ability to sign/encrypt documents (CROMERR)

  31. Geographic Data Interaction • Some dataflows have geographic data (e.g., FRS) • Provide the capability to visualize data • Provide the capability to update the data • APIs exist for addressing geographic data • Google Maps • ESRI product suite • CDX approach • Integrate Google Maps API into CDX web applications • Provide an end-to-end solution for querying and updating data
