Using Advanced Data Mining and Integration in Environmental Risk Management

Ladislav Hluchy

Ondrej Habala, Martin Šeleng, Peter Krammer, Viet Tran

Institute of Informatics

Slovak Academy of Sciences

Contents

  • EU FP7 project ADMIRE – overview
  • Architecture of DMI solution in ADMIRE
  • New DMI process language – DISPEL
  • Pilot application scenarios – ORAVA, RADAR
    • goals, architecture, experimental results
  • Tools in ADMIRE

SAMI 2011, Smolenice, Slovakia, January 2011

ADMIRE - Advanced Data Mining and Integration Research for Europe

  • 7th Framework Program
  • ICT, Call 1.2.A
  • Started in February 2008, running for 36 months
  • €4.3 million in costs, and €3 million in EC funding

Collaborators
  • University of Edinburgh, UK (Coordinator)
    • NeSC - National e-Science Centre
    • EPCC - Edinburgh Parallel Computing Centre
  • Fujitsu Labs of Europe, UK
  • University of Vienna, Austria
    • Institute of Scientific Computing
  • Universidad Politécnica de Madrid, Spain
    • Facultad de Informatica
  • Slovak Academy of Sciences, Slovakia
    • Institute of Informatics
  • ComArch S.A., Poland

ADMIRE Goals

Accelerate access to and increase the benefits from data exploitation;

Deliver consistent and easy to use technology for extracting information and knowledge;

Cope with complexity, distribution, change and heterogeneity of services, data, and processes, through abstract view of data mining and integration; and

Provide power to users and developers of data mining and integration processes.

ADMIRE Structure
  • WP1: High-Level Model and Language Research
    • Incremental development of models and languages with a goal of describing Data Mining and Integration (DMI) processes abstractly
  • WP2: Architecture Research
    • Incremental development of a flexible, scalable and open DMI architecture
  • WP3: Platform Support & Delivery
    • Deliver robust service platforms, support users and encapsulate knowledge in a book
  • WP4: Service Infrastructure Development and Enhancement
    • Develop technology and services to enhance the DMI service infrastructure based on Fujitsu’s USMT

ADMIRE Structure
  • WP5: Data Mining and Integration Tools Development
    • Develop and integrate tools that make the technology easier to use and reduce the frequency of failures
  • WP6: Integrated Applications
    • Demonstration of validation and performance of architecture, language, platform and tools as an integrated environment for Data Mining and Integration
  • WP7: Project Management
    • Management and coordination of the project

ADMIRE Architecture: Separation of Concerns

ADMIRE Architecture

DISPEL – Data Intensive Systems Process-Engineering Language
  • Data-intensive distributed systems
  • Connection point of complex application requests and complex enactment systems
    • Benefit: method development, engineering and evolution of supported practices can take place independently in each world
  • Describes enactment requests for streaming-data workflow processes
  • “Process-engineering time” – transform and optimize process in preparation for enactment period

DISPEL: Simple Example

String sql1 = "SELECT * FROM some_table";
String sql2 = "SELECT * FROM table2";
String resource = "128.18.128.255";
SQLQuery query = new SQLQuery;
// Creating streams of literals
|- sql1, sql2 -| => query.expression;
|- resource -| => query.resource;
Tee tee = new Tee;
// Creating connections
query.result => tee.connectInput;
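
The streaming idea behind the DISPEL example can be imitated in Python with generators. This is only a rough analogy, not DISPEL or the ADMIRE API: the helper names literal_stream and sql_query are hypothetical, an in-memory SQLite database stands in for the remote resource, and the table contents are made-up sample values.

```python
import itertools
import sqlite3


def literal_stream(*values):
    """Emit each literal in turn, like the DISPEL stream literal |- a, b -|."""
    yield from values


def sql_query(expressions, connection):
    """For each SQL expression arriving on the input stream, emit its result rows."""
    for expression in expressions:
        yield connection.execute(expression).fetchall()


# Stand-in for the remote data resource (assumption): in-memory SQLite.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE some_table (id INTEGER)")
connection.execute("CREATE TABLE table2 (id INTEGER)")
connection.executemany("INSERT INTO some_table VALUES (?)", [(1,), (2,)])
connection.execute("INSERT INTO table2 VALUES (10)")

# Counterpart of: |- sql1, sql2 -| => query.expression
expressions = literal_stream(
    "SELECT * FROM some_table ORDER BY id",
    "SELECT * FROM table2 ORDER BY id",
)
results = sql_query(expressions, connection)

# Counterpart of: query.result => tee (duplicate the stream for two consumers)
first, second = itertools.tee(results, 2)
first_rows = list(first)
second_rows = list(second)
print(first_rows)
```

Nothing is executed until a consumer pulls from the stream, which loosely mirrors how a DISPEL request only describes the connections and leaves enactment to the gateway.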

DISPEL – real use

ADMIRE’s High-Level Architecture

ADMIRE Gateways

USMT

Security
  • The framework is built on top of a formal Grid infrastructure; available security mechanisms include:
    • Transport-level security: SSL, HTTPS (currently available)
    • Message-level security: Web Services Security (SOAP Message Security)
    • X.509 certificate authentication
    • Multiple stakeholder authorization
    • Explicit Trust Delegation (ETD)

Pilot Applications
  • ADMIRE has two pilot applications
    • CRM
    • FloodApp
  • FloodApp
    • Orava
    • Radar
    • SVP

ACRM Application
  • Large-scale, distributed Churn scenario
    • 4 database parts, distributed among ADMIRE partners
    • Graphical UI for business analysts
    • Using ADMIRE workbench, DISPEL and framework to create predictions of customer churn

  • Mining over distributed data

Flood Application: Data sets used in hydrological scenarios

Scenario deployment in testbed

Two scenarios (ORAVA, RADAR) completely deployed in testbed

Data for the other scenarios are partially deployed

5 nodes (1 real + 4 virtual nodes)

Databases (MySQL + PostgreSQL), GRIB files in file storage

USMT (Unified System Management Technology - Jetty container), OGSA-DAI (Apache Tomcat)

Orava scenario
  • Legend
    • Green area – Orava (part of north Slovakia)
    • Blue – Orava reservoir and local rivers
    • Red dots – hydrological measurement stations
  • Notes
    • We are interested only in the hydrological stations below the Orava reservoir
    • Our tests use hydrological station 5830 (Tvrdosin)

ORAVA – data mining concept
  • Targets – water level and temperature at a station below the reservoir
  • Predictors – rainfall amount (reservoir and station), air temperature (reservoir and station), reservoir discharge, reservoir temperature

Figure labels: targets of data mining; given in a schedule; predicted by a meteo model

ORAVA – data integration
  • Integration of data from
    • GRIB files
    • Reservoirs
  • Inputs
    • Time period of experiment
    • Reservoir ID
    • List of hydro stations
    • Geo coordinates
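
The integration step amounts to aligning values extracted from GRIB files with reservoir records over the experiment's time period. A minimal sketch, assuming both sources have already been reduced to hourly records keyed by timestamp; the function name integrate, the field names and the sample values are all hypothetical, and GRIB decoding and the station and coordinate lookup are left out.

```python
from datetime import datetime


def integrate(grib_records, reservoir_records, start, end):
    """Join two record lists on their timestamp, keeping only [start, end]."""
    reservoir_by_time = {record["time"]: record for record in reservoir_records}
    rows = []
    for grib in sorted(grib_records, key=lambda record: record["time"]):
        if not (start <= grib["time"] <= end):
            continue  # outside the time period of the experiment
        reservoir = reservoir_by_time.get(grib["time"])
        if reservoir is None:
            continue  # no reservoir measurement for this hour
        rows.append({
            "time": grib["time"],
            "rainfall": grib["rainfall"],
            "discharge": reservoir["discharge"],
        })
    return rows


def hour(h):
    """Build a timestamp on an arbitrary sample day."""
    return datetime(2008, 1, 1, h)


# Made-up sample values, for illustration only.
grib = [
    {"time": hour(0), "rainfall": 0.0},
    {"time": hour(1), "rainfall": 1.2},
    {"time": hour(2), "rainfall": 0.4},
]
reservoir = [
    {"time": hour(1), "discharge": 15.0},
    {"time": hour(2), "discharge": 16.5},
    {"time": hour(3), "discharge": 17.0},
]
integrated = integrate(grib, reservoir, start=hour(0), end=hour(2))
```

Only hours present in both sources survive the join; hours that lack one side would instead have to be kept with gaps if a later missing-value filter is meant to fill them.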

ORAVA – data sets

ORAVA – integrated and preprocessed data

Figure: integrated raw data (plotted over time) passed through the ReplaceMissingValues, LinearTrend, ZeroEpsilon and Kelvin2Celsius filters, giving the integrated preprocessed data (plotted over time).
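
Three of the named filters can be sketched in a few lines of Python. The slides do not define the filters, so the behaviour here is assumed: ReplaceMissingValues substitutes the mean (as Weka's filter of the same name does), ZeroEpsilon zeroes readings smaller than a chosen epsilon, and Kelvin2Celsius subtracts 273.15. LinearTrend is omitted because its definition is not given. The function names and sample values are illustrative only.

```python
def replace_missing_values(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def zero_epsilon(values, epsilon=0.05):
    """Set readings whose magnitude is below epsilon to exactly zero."""
    return [0.0 if abs(v) < epsilon else v for v in values]


def kelvin_to_celsius(values):
    """Convert temperatures from kelvin to degrees Celsius."""
    return [v - 273.15 for v in values]


# Made-up sample series: rainfall with one gap and one near-zero reading.
rainfall_raw = [0.0, 0.01, None, 2.0]
rainfall = zero_epsilon(replace_missing_values(rainfall_raw))

temperature_kelvin = [273.15, 283.15]
temperature_celsius = kelvin_to_celsius(temperature_kelvin)
```

The order matters: filling gaps before zeroing small values means the substituted mean is itself subject to the epsilon check.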

ORAVA – data mining
  • Input - Integrated data
  • Data Mining Phases:
    • Data understanding
      • Data visualization
      • Data quality exploration
    • Data preparation
      • Missing values substitution (ReplaceMissingValues filter)
      • Noise reduction (ZeroEpsilon filter)
      • Switching from one scale to another (Kelvin2Celsius filter)
      • Data modification (LinearTrend filter)
    • Model training
      • Training on historical data (8760 records)
      • Linear Regression model
      • Neural networks - multilayer perceptron without hidden layers
    • Model Evaluation
      • Testing of the trained model
      • N-fold cross validation
      • Using training sets
  • Output - Prediction model
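
The training and evaluation phases can be illustrated with a single-predictor linear regression checked by N-fold cross-validation. This is a minimal sketch on synthetic data, not the model or tooling used in the project; fit_line and cross_validate are hypothetical helper names, and folds are assigned by record index rather than at random.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def cross_validate(xs, ys, folds):
    """Return the mean absolute error over all held-out records."""
    errors = []
    for k in range(folds):
        # Record i belongs to fold i % folds; train on all other folds.
        train_x = [x for i, x in enumerate(xs) if i % folds != k]
        train_y = [y for i, y in enumerate(ys) if i % folds != k]
        a, b = fit_line(train_x, train_y)
        errors += [
            abs(a * x + b - y)
            for i, (x, y) in enumerate(zip(xs, ys))
            if i % folds == k
        ]
    return sum(errors) / len(errors)


# Synthetic, noise-free data: y = 2x + 1.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
a, b = fit_line(xs, ys)
mae = cross_validate(xs, ys, folds=5)
```

Evaluating on the training set alone tends to flatter the model; the cross-validated error is the more honest of the two figures.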

Orava – data mining results: prediction of temperature

Linear Regression model equation:

Orava – temperature prediction model comparison

Orava – prediction of water level
  • Neural network model – multilayer perceptron
  • Input parameters (6)
    • Rainfall ([S+1]), Water-Level ([X])
    • Outflows ([D], [D+1] – [D], ln([D]), sqrt([D]))
  • Output
    • Difference of water level ([X+1] – [X])
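
The six inputs and the difference target can be assembled from aligned hourly series as follows. This sketch assumes that the bracketed indices denote consecutive time steps (so [D+1] is the outflow one step after [D]); the function name build_samples and the sample values are hypothetical.

```python
import math


def build_samples(rainfall, water_level, outflow):
    """Turn three aligned series into (inputs, target) training pairs."""
    samples = []
    for t in range(len(water_level) - 1):
        inputs = [
            rainfall[t + 1],              # Rainfall [S+1]
            water_level[t],               # Water level [X]
            outflow[t],                   # Outflow [D]
            outflow[t + 1] - outflow[t],  # [D+1] - [D]
            math.log(outflow[t]),         # ln([D]), requires outflow > 0
            math.sqrt(outflow[t]),        # sqrt([D])
        ]
        target = water_level[t + 1] - water_level[t]  # [X+1] - [X]
        samples.append((inputs, target))
    return samples


# Made-up sample values, for illustration only.
samples = build_samples(
    rainfall=[0.0, 1.0, 0.5],
    water_level=[100.0, 102.0, 101.0],
    outflow=[4.0, 9.0, 16.0],
)
```

Predicting the change in water level rather than the level itself keeps the target centred near zero, which is presumably why the relative errors on the results slide are quoted "from difference".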

Orava – water level prediction

Data count: 8735 records

Activation function of the feed-forward neural network: sigmoid

Correlation coefficient: 0.9816

Mean absolute error: 0.4105

Root mean squared error: 0.9673

Relative absolute error: 30.5869 % (from difference)

Root relative squared error: 19.2384 % (from difference)
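
The five figures above follow the usual textbook definitions, sketched here in Python. The relative errors compare the model against a baseline that always predicts the mean of the actual values; whether the original evaluation used exactly this baseline is an assumption. The function name and the sample values are illustrative.

```python
import math


def evaluation_metrics(actual, predicted):
    """Compute correlation, MAE, RMSE, RAE and RRSE for paired values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean of the actual values.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": cov / math.sqrt(base_sq * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sq_err / base_sq),
    }


# Made-up sample values, for illustration only.
metrics = evaluation_metrics(
    actual=[1.0, 2.0, 3.0, 4.0],
    predicted=[1.5, 2.0, 2.5, 4.0],
)
```

A relative error below 100 % means the model beats the mean-only baseline, so a high correlation coefficient and a relative absolute error of about 30 % are not contradictory.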

RADAR

Targets of data mining

  • Very short-term rainfall prediction from weather radar data
    • Movement of areas with higher air moisture content, and thus also higher precipitation potential
  • Mining of matrices of data

Meteorological data
  • Network of synoptic stations in Slovakia
    • 27 stations in Slovakia
    • Data used from the years 2007 and 2008
    • Rainfall, humidity, atmospheric pressure and temperature values for each hour

RADAR isotonic model
  • Current model for rainfall prediction
    • Isotonic regression model structure
    • Training on historical data
    • Correlation coefficient 0.4593
    • Mean absolute error 0.1105
    • Root mean squared error 0.5490
    • Total number of instances 89700
    • Validation: 10-fold cross-validation
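
Isotonic regression fits a non-decreasing step function to the data. A common way to compute it is the pool-adjacent-violators algorithm, sketched below; this shows the technique in general, not the implementation used in the project. The input is assumed to be the target values already sorted by the predictor, and the sample values are made up.

```python
def isotonic_fit(ys):
    """Pool adjacent violators: least-squares non-decreasing fit to ys."""
    blocks = []  # each block is [sum of values, number of values]
    for y in ys:
        blocks.append([y, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Made-up sample values, ordered by increasing predictor.
fitted = isotonic_fit([1.0, 3.0, 2.0, 4.0, 3.0, 5.0])
print(fitted)
```

Each pair of out-of-order neighbours is replaced by its average, so the fitted sequence never decreases while staying as close as possible to the data in the least-squares sense.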

Table of isotonic model

Hydrometeorological performance

Probability of detection with thresholds of 0.3 and 0.6 mm rainfall per hour:

  • POD(0.3) = 63.87 %
  • POD(0.6) = 56.22 %

Miss rate with thresholds of 0.3 and 0.6 mm rainfall per hour:

  • MR(0.3) = 1.85 %
  • MR(0.6) = 1.58 %
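
Probability of detection is conventionally hits / (hits + misses), where an event is an hour whose observed rainfall reaches the threshold. The slides do not define their miss rate; since the reported values are not 1 − POD, the sketch below assumes misses divided by all samples. That normalization, the use of >= for the threshold test, the function name and the sample values are all assumptions.

```python
def detection_scores(observed, predicted, threshold):
    """Return (probability of detection, miss rate) for one rainfall threshold."""
    pairs = list(zip(observed, predicted))
    hits = sum(1 for o, p in pairs if o >= threshold and p >= threshold)
    misses = sum(1 for o, p in pairs if o >= threshold and p < threshold)
    pod = hits / (hits + misses)     # undefined if no event was observed
    miss_rate = misses / len(pairs)  # assumption: normalized by all samples
    return pod, miss_rate


# Made-up hourly rainfall values in mm, for illustration only.
observed = [0.0, 0.5, 0.7, 0.1, 0.9, 0.0, 0.0, 0.4]
predicted = [0.1, 0.4, 0.2, 0.0, 1.0, 0.0, 0.3, 0.5]
pod, miss_rate = detection_scores(observed, predicted, threshold=0.3)
```

With rain falling in only a small share of hours, a miss rate normalized by all samples stays small even when a sizeable fraction of rain events goes undetected.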

RADAR model
  • Other tested models
    • Neural networks, SMOreg, linear regression, ...
    • Achieved correlation coefficients between 0.35 and 0.42
    • Validation: 10-fold cross-validation

Problems in model creation:

    • The process is significantly stochastic
    • Some input variables depend in turn on the output
    • The meteorological process is very sensitive
    • The reflection matrix represents the quantity of water in the atmosphere, not the exact rainfall rate in a specified area, as opposed to data from synoptic stations

ADMIRE Tools

Registry client GUI

Process designer

SKSA (Semantic Knowledge Sharing Assistant)

Gateway Process Manager

DMI Model Visualizer

Registry client GUI

Read-only access to ADMIRE Registry

list PEs and view their properties

search, sort PEs

Write access to Registry is done via DISPEL documents

Process Designer

Manage your DMI project (files, directories – project structure)

Select elements from the Registry

View the canonical (DISPEL) representation of your DMI process in real time

View the properties of your chosen elements

Edit your DMI process graphically

Semantic Knowledge Sharing Assistant

Context the user works in: several reservoirs, one settlement

Knowledge that may be useful in this context, previously entered by other users

Provides access to existing users' knowledge, sorting and selecting it automatically according to the user's current working context

Gateway Process Manager

Keep track of running processes

stop/pause/cancel the process

view the process’ source DISPEL

access process’ results (if available) in several ways – raw or visualized

DMI Model Visualizer

Visualization of data mining models

Read Weka classifier object

produce PMML (Predictive Model Markup Language) description of the model

Show the PMML as a graphical tree
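
Because PMML is XML, showing it as a tree comes down to walking the element hierarchy. The sketch below does this in text form with Python's standard library. The PMML fragment is a simplified, hand-written regression model with made-up field names and coefficients (it omits the Header element, so it is not schema-valid), and tree_lines is a hypothetical helper, not part of the ADMIRE visualizer.

```python
import xml.etree.ElementTree as ET

# Simplified PMML 4.0 fragment with made-up names and coefficients.
PMML_DOCUMENT = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <DataDictionary>
    <DataField name="rainfall" optype="continuous" dataType="double"/>
    <DataField name="water_level" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="rainfall"/>
      <MiningField name="water_level" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="rainfall" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def tree_lines(element, depth=0):
    """Return one indented line per XML element, depth-first."""
    tag = element.tag.split("}")[-1]  # strip the XML namespace prefix
    name = element.get("name")
    lines = ["  " * depth + tag + (f" ({name})" if name else "")]
    for child in element:
        lines.extend(tree_lines(child, depth + 1))
    return lines


lines = tree_lines(ET.fromstring(PMML_DOCUMENT))
print("\n".join(lines))
```

A graphical viewer would render the same hierarchy as nodes and edges; the recursive walk is the part that carries over.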

ADMIRE Project

Thank you for your attention.
