
Using Advanced Data Mining and Integration in Environmental Risk Management

Ladislav Hluchy

Ondrej Habala, Martin Šeleng, Peter Krammer, Viet Tran

Institute of Informatics

Slovak Academy of Sciences


Contents

  • EU FP7 project ADMIRE – overview

  • Architecture of DMI solution in ADMIRE

  • New DMI process language – DISPEL

  • Pilot application scenarios – ORAVA, RADAR

    • goals, architecture, experimental results

  • Tools in ADMIRE

SAMI 2011, Smolenice, Slovakia, January 2011


ADMIRE - Advanced Data Mining and Integration Research for Europe

  • 7th Framework Program

  • ICT, Call 1.2.A

  • Commenced in February 2008, with a duration of 36 months

  • €4.3 million in costs, and €3 million in EC funding

Collaborators

  • University of Edinburgh, UK (Coordinator)

    • NeSC - National e-Science Centre

    • EPCC - Edinburgh Parallel Computing Centre

  • Fujitsu Labs of Europe, UK

  • University of Vienna, Austria

    • Institute of Scientific Computing

  • Universidad Politécnica de Madrid, Spain

    • Facultad de Informática

  • Slovak Academy of Sciences, Slovakia

    • Institute of Informatics

  • ComArch S.A., Poland

ADMIRE Goals

Accelerate access to and increase the benefits from data exploitation;

Deliver consistent and easy-to-use technology for extracting information and knowledge;

Cope with complexity, distribution, change and heterogeneity of services, data, and processes, through an abstract view of data mining and integration; and

Provide power to users and developers of data mining and integration processes.

ADMIRE Structure

  • WP1: High-Level Model and Language Research

    • Incremental development of models and languages with a goal of describing Data Mining and Integration (DMI) processes abstractly

  • WP2: Architecture Research

    • Incremental development of a flexible, scalable and open DMI architecture

  • WP3: Platform Support & Delivery

    • Deliver robust service platforms, support users and encapsulate knowledge in a book

  • WP4: Service Infrastructure Development and Enhancement

    • Develop technology and services to enhance the DMI service infrastructure based on Fujitsu’s USMT

ADMIRE Structure (continued)

  • WP5: Data Mining and Integration Tools Development

    • Develop and integrate tools that make the technology easier to use and reduce the frequency of failures

  • WP6: Integrated Applications

    • Demonstration of validation and performance of architecture, language, platform and tools as an integrated environment for Data Mining and Integration

  • WP7: Project Management

    • Management and coordination of the project

ADMIRE Architecture: Separation of Concerns

ADMIRE Architecture

DISPEL – Data Intensive Systems Process-Engineering Language

  • Data-intensive distributed systems

  • Connection point of complex application requests and complex enactment systems

    • Benefit: method development, engineering and evolution of supported practices can take place independently in each world

  • Describes enactment requests for streaming-data workflow processes

  • “Process-engineering time” – transform and optimize process in preparation for enactment period

DISPEL: Simple Example

Creating streams of literals (|- ... -|) and creating connections (=>):

String sql1 = "SELECT * FROM some_table";
String sql2 = "SELECT * FROM table2";
String resource = "128.18.128.255";
SQLQuery query = new SQLQuery;
|- sql1, sql2 -| => query.expression;
|- resource -| => query.resource;
Tee tee = new Tee;
query.result => tee.connectInput;
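For readers unfamiliar with streaming dataflow, the DISPEL fragment above can be loosely mimicked with Python generators. This is only an analogy, not DISPEL semantics: `literal_stream`, `sql_query` and `fake_run_query` are hypothetical names invented for this sketch, and no database is contacted.

```python
from itertools import tee


def literal_stream(*values):
    """Yield each literal once, loosely like a stream literal |- a, b -|."""
    yield from values


def sql_query(expressions, resources, run_query):
    """Hypothetical stand-in for a query element with two input streams.

    Reads one resource, then emits one result per incoming expression.
    """
    resource = next(resources)
    for expression in expressions:
        yield run_query(resource, expression)


def fake_run_query(resource, expression):
    # Placeholder only: a real system would contact the data resource here.
    return f"rows of [{expression}] from {resource}"


expressions = literal_stream("SELECT * FROM some_table", "SELECT * FROM table2")
resources = literal_stream("128.18.128.255")
results = sql_query(expressions, resources, fake_run_query)

# Duplicate the result stream so two consumers can read it independently.
branch_a, branch_b = tee(results, 2)
out_a = list(branch_a)
out_b = list(branch_b)
```

Nothing is computed until a consumer pulls from a branch, which is roughly the idea of connecting elements first and streaming data through them afterwards.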

DISPEL – real use

ADMIRE’s High-Level Architecture

ADMIRE Gateways

USMT

Security

  • Framework built on top of a formal Grid infrastructure; available security mechanisms include:

    • Transport-level security: SSL, HTTPS (currently available)

    • Message-level security: Web Services Security: SOAP Message Security

    • X.509 certificate authentication

    • Multiple stakeholder authorization

    • Explicit Trust Delegation (ETD)

Pilot Applications

  • ADMIRE has two pilot applications

    • CRM

    • FloodApp

  • FloodApp

    • Orava

    • Radar

    • SVP

ACRM Application

  • Large-scale, distributed churn scenario

    • 4 database parts, distributed among ADMIRE partners

    • Graphical UI for business analysts

    • Using ADMIRE workbench, DISPEL and framework to create predictions of customer churn

  • Mining over distributed data

Flood Application: Data sets used in hydrological scenarios

Scenarios deployment in testbed

Two scenarios (ORAVA, RADAR) completely deployed in testbed

Data for the other scenarios are partially deployed

5 nodes (1 real + 4 virtual nodes)

Databases (MySQL + PostgreSQL), GRIB files in file storage

USMT (Unified System Management Technology - Jetty container), OGSA-DAI (Apache Tomcat)

Orava scenario

  • Legend

    • Green area – Orava (part of north Slovakia)

    • Blue – Orava reservoir and local rivers

    • Red dots – hydrological measurement stations

  • Notes

    • We are interested only in hydrological stations below the Orava reservoir

    • In our tests we will use the hydrological station 5830 (Tvrdosin)

ORAVA – data mining concept

  • Targets – water level and temperature at a station below the reservoir

  • Predictors – rainfall amount (reservoir and station), air temperature (reservoir and station), reservoir discharge, reservoir temperature

  • Figure annotations: “Targets of data mining”, “Given in a schedule”, “Predicted by a meteo model”

ORAVA – data integration

  • Integration of data from

    • GRIB files

    • Reservoirs

  • Inputs

    • Time period of experiment

    • Reservoir ID

    • List of hydro stations

    • Geo coordinates
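As a rough illustration of the integration step, the sketch below joins two record lists on their timestamps and restricts them to the experiment's time period. It is a simplified assumption rather than the ADMIRE implementation: the field names (`time`, `rainfall`, `discharge`) are hypothetical, and selection by reservoir ID, station list and geo coordinates is left out.

```python
from datetime import datetime


def integrate(grib_rows, reservoir_rows, start, end):
    """Join two record lists on timestamp, keeping times in [start, end)."""
    by_time = {row["time"]: row for row in reservoir_rows}
    merged = []
    for grib in grib_rows:
        t = grib["time"]
        if start <= t < end and t in by_time:
            row = {"time": t}
            # Copy every non-time field from both sources into one record.
            row.update({k: v for k, v in grib.items() if k != "time"})
            row.update({k: v for k, v in by_time[t].items() if k != "time"})
            merged.append(row)
    return sorted(merged, key=lambda r: r["time"])


# Hypothetical sample data: values decoded from GRIB files and reservoir records.
grib_rows = [
    {"time": datetime(2008, 1, 1, 0), "rainfall": 0.0},
    {"time": datetime(2008, 1, 1, 1), "rainfall": 1.2},
    {"time": datetime(2008, 1, 1, 2), "rainfall": 0.4},
]
reservoir_rows = [
    {"time": datetime(2008, 1, 1, 1), "discharge": 15.0},
    {"time": datetime(2008, 1, 1, 2), "discharge": 16.5},
]
integrated = integrate(
    grib_rows, reservoir_rows, datetime(2008, 1, 1, 0), datetime(2008, 1, 1, 2)
)
```

Only timestamps present in both sources and inside the requested period survive the join, so gaps in either source show up as missing rows rather than partial records.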

ORAVA – data sets

ORAVA – integrated and preprocessed data

Figure: integrated raw data and integrated preprocessed data, each plotted over time, with the ReplaceMissingValues, LinearTrend, ZeroEpsilon and Kelvin2Celsius filters applied between them.
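The slides name the preprocessing filters but do not define them, so the following is only a guess at what three of them might do. It assumes that ReplaceMissingValues substitutes the mean of the observed values, that ZeroEpsilon snaps near-zero readings to zero, and that Kelvin2Celsius subtracts 273.15. LinearTrend is omitted because its behaviour cannot be inferred from the slides.

```python
import math


def _is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def replace_missing_values(values):
    """ASSUMPTION: fill missing entries with the mean of the observed ones."""
    observed = [v for v in values if not _is_missing(v)]
    if not observed:
        return list(values)  # nothing to estimate the mean from
    mean = sum(observed) / len(observed)
    return [mean if _is_missing(v) else v for v in values]


def zero_epsilon(values, eps=1e-3):
    """ASSUMPTION: treat readings with |v| < eps as noise and set them to 0."""
    return [0.0 if abs(v) < eps else v for v in values]


def kelvin_to_celsius(values):
    """Convert temperatures from kelvin to degrees Celsius."""
    return [v - 273.15 for v in values]


raw_temperature_k = [283.15, None, 285.15]
filled_k = replace_missing_values(raw_temperature_k)
temperature_c = kelvin_to_celsius(filled_k)

raw_rainfall = [0.0004, 0.2, -0.0002]
clean_rainfall = zero_epsilon(raw_rainfall)
```

The `eps` default is an arbitrary illustrative value, not one taken from the presentation.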

ORAVA – data mining

  • Input - Integrated data

  • Data Mining Phases:

    • Data understanding

      • Data visualization

      • Data quality exploration

    • Data preparation

      • Missing values substitution (ReplaceMissingValues filter)

      • Noise reduction (ZeroEpsilon filter)

      • Switching from one scale to another (Kelvin2Celsius filter)

      • Data modifying (LinearTrend filter)

    • Model training

      • Training on historical data (8760 records)

      • Linear Regression model

      • Neural networks - multilayer perceptron without hidden layers

    • Model Evaluation

      • Testing of the trained model

      • N-fold cross validation

      • Using training sets

  • Output - Prediction model
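The training and evaluation phases listed above can be sketched in miniature as follows. This is a simplified assumption, not the toolchain used in the project: it fits a one-variable least-squares line and scores it with N-fold cross-validation, using interleaved folds and mean absolute error as the score. The function names are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_mae(xs, ys, k):
    """Mean absolute error over k folds; fold f holds indices f, f+k, f+2k, ..."""
    n = len(xs)
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        # Score only on the held-out records of this fold.
        errors.extend(abs(slope * xs[i] + intercept - ys[i]) for i in sorted(test_idx))
    return sum(errors) / len(errors)


# Synthetic, perfectly linear data, so the cross-validated error is near zero.
xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
mae = k_fold_mae(xs, ys, k=5)
```

For hourly time series, contiguous folds may give a more honest estimate than interleaved ones, since neighbouring hours are strongly correlated.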

Orava – data mining results: prediction of temperature

    Linear Regression model equation:

Orava – temperature prediction model comparison

Orava – prediction of water level

    • Neural network model – multilayer perceptron

    • Input parameters (6)

      • Rainfall ([S+1]), Water-Level ([X])

      • Outflows ([D], [D+1] – [D], ln([D]), sqrt([D]))

    • Output

      • Difference of water level ([X+1] – [X])
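The six inputs and the output listed above could be assembled from hourly series roughly as follows. The sketch assumes that S is rainfall, X is water level and D is reservoir discharge, and that "+1" means the next time step; none of this is spelled out on the slide, and `build_examples` is a hypothetical helper name.

```python
import math


def build_examples(rainfall, level, discharge):
    """Build (features, target) pairs from equally long hourly series.

    Discharge values must be positive because of the logarithm.
    """
    examples = []
    for t in range(len(level) - 1):
        d = discharge[t]
        features = [
            rainfall[t + 1],       # [S+1]: rainfall at the next step (assumed)
            level[t],              # [X]: current water level
            d,                     # [D]: current outflow
            discharge[t + 1] - d,  # [D+1] - [D]: change in outflow
            math.log(d),           # ln([D])
            math.sqrt(d),          # sqrt([D])
        ]
        target = level[t + 1] - level[t]  # [X+1] - [X]
        examples.append((features, target))
    return examples


# Hypothetical values for three consecutive hours.
rainfall = [0.0, 1.0, 2.0]
level = [10.0, 12.0, 11.0]
discharge = [4.0, 9.0, 16.0]
examples = build_examples(rainfall, level, discharge)
```

Predicting the difference rather than the level itself means the predicted level is recovered by adding the model output to the current level.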

Orava – water level prediction

Data count: 8735 records

Activation function of the feed-forward neural network: sigmoid

Correlation coefficient: 0.9816

Mean absolute error: 0.4105

Root mean squared error: 0.9673

Relative absolute error: 30.5869 % (from difference)

Root relative squared error: 19.2384 % (from difference)
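The statistics above have conventional definitions, which the sketch below assumes; the slides do not state which tool produced them. The relative errors here are measured against a baseline that always predicts the mean of the actual values, and the note "(from difference)" suggests they were computed on the predicted water-level difference rather than on the level itself.

```python
import math


def regression_metrics(actual, predicted):
    """Return common regression error measures as a dictionary."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    dev_a_sq = sum((a - mean_a) ** 2 for a in actual)
    dev_p_sq = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    return {
        "correlation": cov / math.sqrt(dev_a_sq * dev_p_sq),
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        # Relative errors in percent, against the predict-the-mean baseline.
        "rae_percent": 100.0 * sum(abs_err) / sum(abs(a - mean_a) for a in actual),
        "rrse_percent": 100.0 * math.sqrt(sum(sq_err) / dev_a_sq),
    }


# Small made-up example, unrelated to the Orava data.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

A relative error below 100 % means the model beats the constant-mean baseline on that measure.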

RADAR

    Targets of data mining

    • Very short-term rainfall prediction from weather radar data

      • Movement of areas with higher air moisture content, and thus also higher precipitation potential

    • Mining of matrices of data

Meteorological data

  • Network of synoptic stations in Slovakia

    • 27 stations in Slovakia

    • Used data from the years 2007 and 2008

    • Rainfall, humidity, atmospheric pressure and temperature values for each hour

RADAR isotonic model

  • Current model for rainfall prediction

    • Isotonic regression model structure

    • Training on historical data

    • Correlation coefficient 0.4593

    • Mean absolute error 0.1105

    • Root mean squared error 0.5490

    • Total number of instances 89700

    • Validation: 10-fold cross-validation
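Isotonic regression fits a non-decreasing step function to the data. The sketch below uses the standard pool-adjacent-violators approach with a single predictor; it illustrates the technique in general, and the slides do not say which implementation or input attribute the RADAR model used.

```python
def isotonic_fit(xs, ys):
    """Non-decreasing least-squares fit to ys, ordered by xs.

    Pool-adjacent-violators: whenever a block's mean drops below the mean
    of the block before it, merge the two blocks into one.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    blocks = []  # each block is [sum of y, number of points]
    for i in order:
        blocks.append([ys[i], 1])
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted_sorted = []
    for total, count in blocks:
        fitted_sorted.extend([total / count] * count)
    # Put the fitted values back into the original input order.
    fitted = [0.0] * len(xs)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted


# Made-up example: the dips at positions 3 and 5 get averaged away.
fitted = isotonic_fit([1, 2, 3, 4, 5], [1, 3, 2, 4, 3.5])
```

The result is piecewise constant, which suits relationships that are monotonic but not linear.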

Table of isotonic model

Hydrometeorological performance

Probability of detection (POD) with thresholds of 0.3 and 0.6 mm rainfall per hour:

  • POD(0.3) = 63.87 %

  • POD(0.6) = 56.22 %

Miss rate (MR) with thresholds of 0.3 and 0.6 mm rainfall per hour:

  • MR(0.3) = 1.85 %

  • MR(0.6) = 1.58 %
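Probability of detection is conventionally the share of observed events that the forecast also flagged, hits / (hits + misses). The sketch below assumes that definition and counts an event whenever a value reaches the threshold; the slides give neither the exact event rule nor the definition of miss rate they used, so only POD is shown.

```python
def probability_of_detection(observed, predicted, threshold):
    """Return hits / (hits + misses), or None if no event was observed.

    ASSUMPTION: an event occurs when the hourly rainfall is >= threshold.
    """
    hits = 0
    misses = 0
    for obs, pred in zip(observed, predicted):
        if obs >= threshold:
            if pred >= threshold:
                hits += 1
            else:
                misses += 1
    if hits + misses == 0:
        return None
    return hits / (hits + misses)


# Made-up hourly rainfall in mm, unrelated to the RADAR data.
observed = [0.0, 0.5, 0.7, 0.2, 1.0]
predicted = [0.1, 0.4, 0.2, 0.6, 0.9]
pod_03 = probability_of_detection(observed, predicted, 0.3)
pod_06 = probability_of_detection(observed, predicted, 0.6)
```

False alarms (forecast events that were not observed) do not affect POD, so it is usually read alongside a second score.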

RADAR model

  • Other tested models

    • Neural networks, SMOreg, linear regression, ...

    • Reached correlation coefficients between 0.35 and 0.42

    • Validation: 10-fold cross-validation

  • Problems in model creation

    • The process is significantly stochastic

    • Some input variables are backward-dependent on the output

    • The meteorological process is very sensitive

    • The reflection matrix represents the quantity of water in the atmosphere, not the exact rainfall rate in a specified area, as opposed to data from synoptic stations

ADMIRE Tools

    Registry client GUI

    Process designer

    SKSA

    Gateway Process Manager

    DMI Model Visualizer

Registry client GUI

  • Read-only access to the ADMIRE Registry

    • List processing elements (PEs) and view their properties

    • Search and sort PEs

  • Write access to the Registry is done via DISPEL documents

Process Designer

    Manage your DMI project (files, directories – project structure)

    Select elements from the Registry

    View the canonical (DISPEL) representation of your DMI process in real time

    View the properties of your chosen elements

    Edit your DMI process graphically

Semantic Knowledge Sharing Assistant

Context the user works in (in the example shown: several reservoirs, one settlement)

Knowledge that may be useful in this context, previously entered by other users

Provides access to existing users’ knowledge, sorting and selecting it automatically according to the user’s current working context

Gateway Process Manager

    Keep track of running processes

    stop/pause/cancel the process

    view the process’ source DISPEL

    access process’ results (if available) in several ways – raw or visualized

DMI Model Visualizer

    Visualization of data mining models

    Read Weka classifier object

Produce a PMML (Predictive Model Markup Language) description of the model

    Show the PMML as a graphical tree

ADMIRE Project

Thank you for your attention.
