Peter Brezany, Ivan Janciak, Alexander Wöhrer, A Min Tjoa (University of Vienna)

GridMiner: A Framework for Knowledge Discovery on the Grid – from a Vision to Design and Implementation. Peter Brezany, Ivan Janciak, Alexander Wöhrer, A Min Tjoa. University of Vienna, Institute for Software Science. Email: janciak@par.univie.ac.at


Presentation Transcript


  1. GridMiner: A Framework for Knowledge Discovery on the Grid – from a Vision to Design and Implementation. Peter Brezany, Ivan Janciak, Alexander Wöhrer, A Min Tjoa. University of Vienna, Institute for Software Science. Email: janciak@par.univie.ac.at

  2. GridMiner Overview
     • Start: Jan. 2003
     • Host: University of Vienna, Vienna University of Technology
     • Target: provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
     • Test application area: medical
       • Traumatic brain injury treatment
       • Predicting the outcome of seriously ill patients
     • Analytical part focuses on data mining and On-Line Analytical Processing (OLAP)

  3. Project members

  4. Outline
     • Motivation / Requirements
     • GridMiner Services
     • Architecture
     • Dynamic Service Control Engine
     • OLAP
     • Knowledge base
     • Data Integration
     • Graphical user interface
     • Implementation
     • Summary

  5. The process to cover
     • Data is distributed over the participating hospitals
     • Access from different platforms (handheld, PC, …) for data generation, querying, and analysis
     • The process needs to access various data sources

  6. GridMiner
     • Motivation
       • Integrate knowledge discovery and knowledge management as an autonomic system
       • Manage and control the whole lifecycle of knowledge
       • Give strong support to other intelligent entities in their needs for knowledge
     • Basic Requirements
       • Ability to access and analyze a huge amount of information, typically heterogeneous and geographically distributed
       • Intelligent behavior: ability to maintain, discover, extend, present, and communicate knowledge
       • High-performance (real-time or soft real-time) query processing
       • High security guarantees

  7. GridMiner Services
     • Dynamic Workflow Control Service
     • Data mining services
       • Sequences (SPADE)
       • Clustering (SimpleKMeans)
       • Decision rules (SPRINT)
       • OLAP (sequential/parallel version)
       • Association rules on OLAP
     • Grid Data Mediator Service

  8. GridMiner Architecture (diagram; labels recovered from the slide)
     • User environment
     • Graphical User Interface
     • Knowledge Base
     • Service configuration
     • DSCE Client Web
     • Dynamic service control engine (DSCE)
     • Grid Data Access and Integration
     • Data mining services

  9. Dynamic Service Control Engine
     • Processes a workflow described in DSCL
     • Based on the Open Grid Services Architecture (OGSA)
     • Supports both interactive and batch processing
     • User-independent processing of the workflow
     • Provision of all intermediate results from the involved services
     • Full user control during workflow execution
     • Supports the OGSA Notification Model
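The slide does not show DSCL or the DSCE interfaces, so the following is only a minimal Python sketch of the general idea behind three of the bullets: steps run in order, every intermediate result is kept, and listeners are notified after each step. The names `run_workflow`, `Activity`, and `Listener` are hypothetical and are not part of GridMiner or OGSA.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# Hypothetical stand-ins: an activity is a named function that maps the
# previous result to the next one. This is NOT DSCL.
Activity = Tuple[str, Callable[[Any], Any]]
Listener = Callable[[str, Any], None]


def run_workflow(activities: Sequence[Activity], initial: Any,
                 listeners: Sequence[Listener] = ()) -> Dict[str, Any]:
    """Run activities in order, keep every intermediate result, and
    call each listener after every completed step."""
    results: Dict[str, Any] = {}
    current = initial
    for name, step in activities:
        current = step(current)
        results[name] = current        # intermediate result stays available
        for notify in listeners:
            notify(name, current)      # notification-style callback
    return results


# Toy usage with invented data: select adult patients, then count them.
log = []
workflow = [
    ("select", lambda rows: [r for r in rows if r["age"] >= 18]),
    ("count", lambda rows: len(rows)),
]
patients = [{"age": 17}, {"age": 42}, {"age": 65}]
out = run_workflow(workflow, patients, [lambda name, _res: log.append(name)])
print(out["count"], log)
```

A real engine would invoke remote Grid services and allow the user to pause or redirect execution; the sketch only shows why keeping per-step results makes interactive inspection possible.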

  10. Dynamic Service Control Engine (cont.)

  11. Knowledge Base (layered diagram; labels recovered from the slide)
      • Rules: SWRL
      • Facts: OWL
      • Web Ontology Language: OWL + OWL-S
      • Ontologies: Datasource Ont., Activity Ontology, Datamining Ont., Domain Ontology
      • Metadata: XML, XML Schema (XSL) (WebRowSet, PMML, …)

  12. OLAP
      • Multidimensional data analysis by sequential and distributed/parallel OLAP engines
      • Cube construction and querying
      • Representation of query results in the OLAP Modeling Markup Language
      • Integration with data mining engines (association rules on OLAP)
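To illustrate what "cube construction and querying" means in general, here is a small sequential sketch in Python. It is not the GridMiner OLAP engine: the functions `build_cube` and `query`, the `ALL` marker, and the toy facts are all assumptions made for illustration. The cube stores one aggregate per combination of dimension values, with `*` standing for a dimension that has been aggregated away.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

ALL = "*"  # marker for a dimension that has been aggregated away


def build_cube(facts: List[dict], dims: Sequence[str],
               measure: str) -> Dict[Tuple[str, ...], float]:
    """Sum `measure` for every group-by over every subset of `dims`."""
    cube: Dict[Tuple[str, ...], float] = defaultdict(float)
    for fact in facts:
        # Each subset of kept dimensions defines one group-by (cuboid).
        for k in range(len(dims) + 1):
            for kept in combinations(dims, k):
                key = tuple(fact[d] if d in kept else ALL for d in dims)
                cube[key] += fact[measure]
    return dict(cube)


def query(cube: Dict[Tuple[str, ...], float], dims: Sequence[str],
          **fixed: str) -> float:
    """Look up the aggregate for the fixed dimensions; others are ALL."""
    key = tuple(fixed.get(d, ALL) for d in dims)
    return cube.get(key, 0.0)


# Invented toy facts: patient counts per hospital and outcome.
DIMS = ("hospital", "outcome")
FACTS = [
    {"hospital": "A", "outcome": "good", "patients": 3},
    {"hospital": "A", "outcome": "poor", "patients": 1},
    {"hospital": "B", "outcome": "good", "patients": 2},
]
CUBE = build_cube(FACTS, DIMS, "patients")
print(query(CUBE, DIMS, hospital="A"))   # total for hospital A
print(query(CUBE, DIMS, outcome="good")) # total for good outcomes
```

Precomputing all group-bys trades memory for constant-time lookups; a parallel engine would partition the facts and merge partial sums, which works because addition is associative.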

  13. Grid Data Mediation Service Principles
      • Tight federation:
        • Global (relational) schema
      • Virtual integration:
        • Leave the data where it is
        • Always up-to-date data
      • No proprietary solution
        • Inherit well-solved aspects from OGSA-DAI
      • Not bound to a special architecture
      • Supported data sources:
        • RDBMS (via JDBC), XML DB (Xindice), CSV files
      • Operators: "union all" and "inner join"
      • Operators are XQuery based (using SAXON)
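The slide states that the mediator's operators are implemented in XQuery on SAXON; that code is not shown. As a language-neutral illustration of what the two operators compute, here is a Python sketch over rows represented as dictionaries. The function names and the row representation are assumptions, not the service's API.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def union_all(*sources: Iterable[Row]) -> Iterator[Row]:
    """Concatenate rows from all sources, keeping duplicates."""
    for source in sources:
        yield from source


def inner_join(left: Iterable[Row], right: Iterable[Row],
               key: str) -> Iterator[Row]:
    """Hash join: emit one merged row per pair with equal `key` values.
    On a column name clash, the value from `left` wins."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**match, **row}


# Invented toy rows from two hypothetical sources.
names = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
wards = [{"id": 2, "ward": "ICU"}]
print(list(union_all(names, names)))
print(list(inner_join(names, wards, "id")))
```

Both operators are written as generators, so rows can be streamed from the sources rather than materialized, which fits the "leave the data where it is" principle.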

  14. Data Integration Scenario
      • Heterogeneities:
        • Name in A is "First Last" (as the target format)
        • Name in C has to be combined
      • Distribution:
        • 3 data sources

  15. Data Integration Scenario (cont.)
      • Query: SELECT p_name FROM patient WHERE id=10
      • Standard to optimized (diagram)
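The diagrams for this scenario are not in the transcript, so the sketch below rests on assumptions: it reads "standard to optimized" as pushing the `id` selection down to the sources before integration, and it invents the source contents, the column names, and the format of the third source B. Only two details come from the slides: source A already holds names as "First Last", and source C stores name parts that must be combined.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

# Toy stand-ins for the three sources; all contents are invented.
SOURCE_A: List[Row] = [{"id": 10, "name": "Jane Doe"},
                       {"id": 11, "name": "John Roe"}]
SOURCE_B: List[Row] = [{"id": 20, "name": "Max Mustermann"}]
SOURCE_C: List[Row] = [{"id": 30, "first": "Erika", "last": "Musterfrau"}]

# Per-source mappings into the global schema patient(id, p_name).
MAPPINGS: List[Tuple[List[Row], Callable[[Row], Row]]] = [
    (SOURCE_A, lambda r: {"id": r["id"], "p_name": r["name"]}),
    (SOURCE_B, lambda r: {"id": r["id"], "p_name": r["name"]}),
    # Source C: the name has to be combined into "First Last".
    (SOURCE_C, lambda r: {"id": r["id"],
                          "p_name": f'{r["first"]} {r["last"]}'}),
]


def standard_plan(pid: int) -> Tuple[List[str], int]:
    """Ship every row to the mediator, integrate, then filter."""
    shipped = 0
    integrated: List[Row] = []
    for rows, to_global in MAPPINGS:
        shipped += len(rows)
        integrated.extend(to_global(r) for r in rows)
    return [r["p_name"] for r in integrated if r["id"] == pid], shipped


def optimized_plan(pid: int) -> Tuple[List[str], int]:
    """Push the id filter down to each source, then integrate."""
    shipped = 0
    names: List[str] = []
    for rows, to_global in MAPPINGS:
        selected = [r for r in rows if r["id"] == pid]  # runs at the source
        shipped += len(selected)
        names.extend(to_global(r)["p_name"] for r in selected)
    return names, shipped


# SELECT p_name FROM patient WHERE id=10, under both plans.
print(standard_plan(10))
print(optimized_plan(10))
```

Both plans return the same names because every mapping passes `id` through unchanged; they differ only in how many rows leave the sources, which is the cost a mediator would try to reduce.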

  16. Implementation/Technology
      • Globus 3.2
      • OGSA-DAI
      • GUI: workflow construction and results visualization (JGraph, Java Web Start, JavaServer Pages)
      • Service configuration (JavaServer Pages, PHP, ...)
      • Knowledge base (XML, OWL)

  17. Data Mining Scenario (diagram; labels recovered from the slide)
      • Database (100k rows)
      • Select 10k rows
      • Select 20k rows
      • Decision Rules (SPRINT)
      • Decision Rules (C4.5), shown twice

  18. Graphical User Interface

  19. Summary
      • Integrated data mining infrastructure
      • Covers the whole process
      • Service-Oriented Architecture
      • Implemented prototype
      • Project ongoing
        • New data mining tasks (algorithms)
        • Knowledge management
      • More information: http://www.gridminer.org

  20. Thank you. Questions?
