
Neural, Bayesian, and Evolutionary Systems for High-Performance Computational Knowledge Management: Progress Report

Wednesday, August 4, 1999
William H. Hsu, Ph.D.
Automated Learning Group, National Center for Supercomputing Applications
http://www.ncsa.uiuc.edu/People/bhsu




Presentation Transcript


  1. Neural, Bayesian, and Evolutionary Systems for High-Performance Computational Knowledge Management: Progress Report
     Wednesday, August 4, 1999
     William H. Hsu, Ph.D.
     Automated Learning Group, National Center for Supercomputing Applications
     http://www.ncsa.uiuc.edu/People/bhsu

  2. Overview: T&E Data Modeling
     • Short-Term Objectives: Building a Data Model
       • Data integrity
       • Rudimentary entity-relational data model (cf. Oracle 8)
       • Definition of prognostic monitoring problem
     • Longer-Term Objectives: Scalable Data Mining from Time Series
       • Multimodal sensor integration
       • Relevance determination
       • Building causal (explanatory) models
     • Example: Super ADOCS Data Format (SDF) (data model sketched below)
       • 1719-channel asynchronous data bus (General Dynamics)
       • Data types: time (7), ballistics/firing control (~350), fuel (~10), hydraulics (~10), wiring harness/other electrical (~310), spatial/GPS (~60), diagnostics/feedback/command (~750), profilometer (~50), unused (~135)
       • Engineering units: counts, elapsed time, rates, percent efficiency, etc.
       • 33 caution/warning channels; internal diagnostics
     • Analytical Applications: Learning, Inference (Decision Support)
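The channel counts and categories above suggest a simple catalog abstraction for the data model. The sketch below is not the actual SDF schema; the class, field, and enum names are hypothetical, and it only illustrates how bus channels might be tagged with a functional category and engineering units so they can be grouped before modeling.

```java
// Hypothetical catalog of SDF-style data-bus channels (not the real SDF schema).
import java.util.ArrayList;
import java.util.List;

public class SdfChannelCatalog {

    enum Category { TIME, BALLISTICS, FUEL, HYDRAULICS, ELECTRICAL,
                    SPATIAL, DIAGNOSTICS, PROFILOMETER, UNUSED }

    static class Channel {
        final int id;                 // bus channel index (0..1718)
        final String name;            // mnemonic from the data dictionary
        final Category category;      // functional grouping (see slide 2)
        final String units;           // engineering units: counts, rates, % efficiency, ...
        final boolean cautionWarning; // one of the 33 caution/warning channels?

        Channel(int id, String name, Category category, String units, boolean cautionWarning) {
            this.id = id; this.name = name; this.category = category;
            this.units = units; this.cautionWarning = cautionWarning;
        }
    }

    private final List<Channel> channels = new ArrayList<>();

    void add(Channel c) { channels.add(c); }

    /** Select all channels in one functional category, e.g. all diagnostics channels. */
    List<Channel> byCategory(Category cat) {
        List<Channel> out = new ArrayList<>();
        for (Channel c : channels) if (c.category == cat) out.add(c);
        return out;
    }

    public static void main(String[] args) {
        SdfChannelCatalog catalog = new SdfChannelCatalog();
        // Illustrative entries only; ids, names, and units are made up.
        catalog.add(new Channel(0, "ELAPSED_TIME", Category.TIME, "seconds", false));
        catalog.add(new Channel(412, "TURRET_AZ_RATE", Category.BALLISTICS, "deg/s", false));
        catalog.add(new Channel(1501, "HYD_PRESS_WARN", Category.DIAGNOSTICS, "flag", true));
        System.out.println("Diagnostics channels: " + catalog.byCategory(Category.DIAGNOSTICS).size());
    }
}
```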

  3. Data Mining: Objectives for Testing and Evaluation
     • Objectives
       • Scalability: handling disparity in temporal, spatial granularity
       • Data integrity: verification (formal model) or validation (testing)
       • Multimodality: ability to integrate knowledge/data sources
       • Efficiency: consume only the necessary bandwidth for model
         • Acquisition (data warehousing)
         • Maintenance (incrementality)
         • Analysis (interactive, configurable data mining system)
         • Visualization (transparent user interface)
     • Applicable Technologies
       • Selective downsampling: adapting grain size of data model (sketched below)
       • Data model validation
         • Simple relational database (RDB) model
         • Ontology: knowledge base definition, units, abstract data types
       • Multimodal sensor integration: mixture models for data fusion
       • Data preparation: selection, synthesis, partitioning of data channels
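As a rough illustration of selective downsampling (adapting grain size per channel), the sketch below block-averages a channel's samples by a per-channel factor. The method name, the averaging scheme, and the example values are assumptions for illustration, not the project's actual implementation.

```java
// Minimal sketch of selective downsampling: reduce a channel's sampling rate by
// block-averaging, choosing the block size (grain) per channel.
public class Downsampler {

    /** Downsample by averaging consecutive blocks of `factor` samples. */
    static double[] blockAverage(double[] samples, int factor) {
        if (factor < 1) throw new IllegalArgumentException("factor must be >= 1");
        int n = (samples.length + factor - 1) / factor;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            int start = i * factor;
            int end = Math.min(start + factor, samples.length);
            double sum = 0.0;
            for (int j = start; j < end; j++) sum += samples[j];
            out[i] = sum / (end - start);
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative values only. A slowly varying channel can tolerate a coarse
        // grain (factor 4); a fast caution/warning channel would keep factor 1.
        double[] slowChannel = {101, 103, 99, 98, 250, 247, 101, 100};
        double[] coarse = blockAverage(slowChannel, 4);
        for (double v : coarse) System.out.println(v);
    }
}
```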

  4. Data Models and Ontologies (Super ADOCS Data Format)
     [Diagram of the SDF data model/ontology; entity labels include Diagnostic, Hazard, and Ballistics.]

  5. Data Mining: Data Fusion System for Testing and Evaluation
     [Diagram: a multiattribute data set and learning specification feed attribute selection and partitioning; a metric-based partition evaluator chooses a learning architecture and learning method for each subproblem definition; subproblem outputs are combined by data fusion into the overall prediction.]
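A minimal sketch of the fusion pipeline on this slide: attributes are partitioned into subproblems, a model is fit on each attribute subset, and the subproblem outputs are fused into one overall prediction. The "mean-of-subset" learner and the unweighted-average fusion rule below are placeholders for the architectures and methods the partition evaluator would actually select.

```java
// Sketch of partition -> per-subproblem model -> fusion, with placeholder models.
import java.util.List;

public class SubproblemFusion {

    /** A stand-in "model" trained on one attribute subset; here it just averages its inputs. */
    static double predictFromSubset(double[] example, List<Integer> subset) {
        double sum = 0.0;
        for (int idx : subset) sum += example[idx];
        return sum / subset.size();
    }

    /** Fuse subproblem predictions with a simple unweighted average. */
    static double fuse(double[] subproblemOutputs) {
        double sum = 0.0;
        for (double y : subproblemOutputs) sum += y;
        return sum / subproblemOutputs.length;
    }

    public static void main(String[] args) {
        double[] example = {0.2, 0.9, 0.4, 0.7, 0.1};      // one multiattribute record (illustrative)
        List<List<Integer>> partition = List.of(             // attribute subsets = subproblems
                List.of(0, 1), List.of(2, 3), List.of(4));
        double[] outputs = new double[partition.size()];
        for (int k = 0; k < partition.size(); k++)
            outputs[k] = predictFromSubset(example, partition.get(k));
        System.out.println("Fused prediction: " + fuse(outputs));
    }
}
```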

  6. Data Mining: Integrated Modeling and Testing (IMT) Information Systems
     • Application Testbed
       • Aberdeen Test Center: M1 Abrams main battle tank (SEP data, SDF)
       • Reliability testing
     • T&E Information Systems: Common Characteristics
       • Large-Scale Data Model
         • Input (M1A2 SEP): 1.8–459 MB; minutes to hours
         • Output: 33 caution/warning channels; internal diagnostics
       • Data Integrity Requirements
         • Specification of test objective and metrics (in progress)
         • Generated by end user (e.g., author of test report, instrumentation report)
       • Multimodality
         • Selection of relevant data channels (given prediction objective)
         • Data fusion problem: data channels from different categories
       • Data Reduction Requirements
         • Excess bandwidth: non-uniform downsampling (frequency reduction)
         • Irrelevant data channels (e.g., targeting with respect to excess RPMs)

  7. Relevance Determination Problems in Testing and Evaluation
     • Problems
       • Machine learning for decision support and monitoring
       • Extraction of temporal features
       • Model selection
       • Sensor and data fusion
     • Solutions
       • Clustering and decomposition of learning tasks
       • Selection, synthesis, and partitioning of data channels
     • Approach
       • Simple relational data model
       • Relevance determination (importance ranking) for data channels (sketched below)
       • Multimodal data fusion
       • Hierarchy of time series models
       • Quantitative (metric-based) model selection
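One way to read "relevance determination (importance ranking) for data channels" is as a scoring-and-sorting step over candidate channels. The sketch below ranks channels by absolute Pearson correlation with a target channel (for example, a caution/warning channel); the correlation metric is an illustrative choice, not the metric the project committed to.

```java
// Sketch of importance ranking: score each channel against a target, then sort.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RelevanceRanking {

    /** Absolute Pearson correlation between a channel and the target series. */
    static double absCorrelation(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        if (sxx == 0 || syy == 0) return 0.0;
        return Math.abs(sxy / Math.sqrt(sxx * syy));
    }

    /** Return channel indices ordered from most to least relevant to the target. */
    static List<Integer> rank(double[][] channels, double[] target) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < channels.length; i++) order.add(i);
        order.sort(Comparator.comparingDouble(
                (Integer i) -> absCorrelation(channels[i], target)).reversed());
        return order;
    }

    public static void main(String[] args) {
        double[][] channels = { {1, 2, 3, 4}, {4, 3, 2, 1}, {0, 1, 0, 1} }; // illustrative data
        double[] target = {1, 2, 3, 5};
        System.out.println("Channels by relevance: " + rank(channels, target));
    }
}
```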

  8. Deployment of KDD and Visualization Components
     • Database Access
       • SDF (Super ADOCS Data File) import
       • Flat file export
       • Internal data model: interaction with learning modules
     • Deployment
       • Java stand-alone application
       • Interactive management of modules, data flow
     • Presentation: Web-Based Interface
       • Simple, URL-based invocation system
         • Common Gateway Interface (CGI) and Perl
         • Alternative implementation: servlets (http://www.javasoft.com) (see the sketch below)
       • Configurable using forms
     • Messaging Systems (Deployment → Presentation)
       • Between configurators and deployment layer
       • Between data management modules and visualization components
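The slide names servlets as an alternative to the CGI/Perl invocation system. A minimal sketch, assuming the standard javax.servlet API: a module name and channel list arrive as URL/form parameters and are echoed back as HTML. The class name and parameter names ("module", "channels") are hypothetical, not the project's interface.

```java
// Minimal servlet sketch of a URL/form-configurable module invocation.
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class KddConfigServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String module = req.getParameter("module");     // e.g., "downsample" or "select"
        String channels = req.getParameter("channels");  // e.g., comma-separated channel ids

        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<html><body>");
        out.println("<p>Requested module: " + module + "</p>");
        out.println("<p>Channels: " + channels + "</p>");
        // A real deployment would hand these parameters to the data management
        // modules and stream results back to the visualization components.
        out.println("</body></html>");
    }
}
```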

  9. Rapid KDD Development Environment
     NCSA Infrastructure for High-Performance Computation in Data Mining [1]

  10. NCSA Infrastructure for High-Performance Computation in Data Mining [2]

  11. Cluster (Network of Workstations) Model for Master/Slave Genetic Wrapper
      [Diagram: NCSA ALG 8-node Beowulf cluster; a master node connected to Slaves 1–8 over 100BASE-T Ethernet.]
      • Master (dispatch pattern sketched below)
        • Jenesis: Java-based simple genetic algorithm (sGA) running in master virtual machine (VM)
        • Load balancing task manager
        • Message passing communication (TCP/IP, MPI, PVM)
      • Slaves (Linux PCs)
        • Migratable processes
        • Replicated data set
        • MLC++ (machine learning library written in C++)
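A rough sketch of the master/slave wrapper pattern described here: the master holds a simple-GA population of attribute-subset bitmasks and dispatches fitness evaluations to workers. Thread-pool workers below stand in for the slave nodes and message passing (TCP/IP, MPI, PVM), and the fitness function is a placeholder for a cross-validated learner such as one built on MLC++.

```java
// Master/slave sketch: dispatch one fitness evaluation per individual to a worker pool.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MasterSlaveGaSketch {

    static final int NUM_ATTRIBUTES = 16;
    static final int POP_SIZE = 8;   // one individual per slave, matching the 8-node cluster

    /** Placeholder fitness: rewards small subsets. A real slave would train and
     *  cross-validate a model restricted to the selected attributes. */
    static double fitness(boolean[] subset) {
        int selected = 0;
        for (boolean b : subset) if (b) selected++;
        return 1.0 / (1 + selected);
    }

    public static void main(String[] args) throws Exception {
        Random rng = new Random(42);
        List<boolean[]> population = new ArrayList<>();
        for (int i = 0; i < POP_SIZE; i++) {
            boolean[] ind = new boolean[NUM_ATTRIBUTES];
            for (int j = 0; j < NUM_ATTRIBUTES; j++) ind[j] = rng.nextBoolean();
            population.add(ind);
        }

        // "Master" dispatches evaluation tasks; thread-pool "slaves" run them in parallel.
        ExecutorService slaves = Executors.newFixedThreadPool(POP_SIZE);
        List<Future<Double>> results = new ArrayList<>();
        for (boolean[] ind : population) {
            Callable<Double> task = () -> fitness(ind);
            results.add(slaves.submit(task));
        }
        for (int i = 0; i < POP_SIZE; i++)
            System.out.println("Individual " + i + " fitness = " + results.get(i).get());
        slaves.shutdown();
        // Selection, crossover, and mutation on the master would follow here.
    }
}
```

Fitness evaluation dominates the cost of a wrapper-style GA, which is why it is the natural unit of work to farm out to slaves while the lightweight genetic operators stay on the master.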

  12. Progress to Date (Functionality Demonstrated)
      • SDF Viewer/Editor
        • Platform-independent user interface
        • Selection, grouping of attributes
        • Implementation: Common Gateway Interface/Perl
        • Work in progress: downsampling; Java servlet version
        • Demonstration: data format; data dictionary; integrity checking
      • Data Model/Ontology
        • Built from SDF data dictionary
        • General process
        • Future work: JDBC/SQL, MS Access → Oracle 8 (see the sketch below)
      • Attribute Subset Selection System
        • Workstation cluster
        • Genetic algorithm (Java/C++)
      • D2K: Rapid Application Development System for HP Data Mining
        • Visual programming system
        • Supports code reuse and interfaces with existing codes (e.g., MLC++)
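For the planned JDBC/SQL step, a hedged sketch of loading data-dictionary entries into a relational table. The connection URL (an in-memory H2 database), driver, and table/column names below are placeholders, not the project's Access/Oracle configuration.

```java
// Sketch only: push SDF data-dictionary entries into a relational table via JDBC.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class DataDictionaryLoader {

    public static void main(String[] args) throws SQLException {
        // Placeholder URL; in practice this would point at the Access or Oracle 8 instance.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:sdf")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE channel (id INT PRIMARY KEY, "
                        + "name VARCHAR(64), category VARCHAR(32), units VARCHAR(32))");
            }
            String insert = "INSERT INTO channel (id, name, category, units) VALUES (?, ?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(insert)) {
                // Illustrative entry; real rows would come from the SDF data dictionary.
                ps.setInt(1, 412);
                ps.setString(2, "TURRET_AZ_RATE");
                ps.setString(3, "BALLISTICS");
                ps.setString(4, "deg/s");
                ps.executeUpdate();
            }
        }
    }
}
```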

  13. Current and Future Development
      • Distributivity
        • Cluster: Network of Workstations (NOW)
        • Supports (task/functionally) parallel, distributed applications
        • Example: simple genetic algorithm for attribute subset selection
      • Rapid Application Development Environment
        • Simple and intuitive user interface
        • NCSA Data to Knowledge (D2K): visual programming system (Java)
      • Incorporation of Domain Expert Knowledge
        • Knowledge engineering and interactive elicitation
        • Probabilistic knowledge: relevance, inter-attribute relations
        • Correlation and causality (priors in reliability testing)
      • Refinement of IMT Plans
        • Test plan refinement
        • Instrumentation plan refinement
      • Future Work: Using Refined Data Model to Improve Data Bus Specification

  14. Time Series Analysis: Development Timeline
      • Development Vision
        • Prognostic capability
        • Monitoring
        • Data integrity checking, general data model development tools/expertise
        • Decision support (especially relevance determination)
      • Development Schedule: Technology Transfer
        • CY4 (Fiscal 2000)
          • Time series visualization tool (integrated with model identification tool)
          • User interface for analytical tool, user training
        • CY5 (Fiscal 2001) technology transfer: D2K, user training
      • New personnel
        • NCSA Automated Learning Group
        • Aberdeen Test Center
      • CY4 Action Items for IMT
        • Model deployment: using time series modeling tools and techniques
        • Elicitation: subject matter expertise (for training systems)

  15. Summary: Model Development Process
      [Diagram: model development flow linking heterogeneous data (multiple sources), decomposition methods (reduction and subdivision of inputs), relevant inputs for single and multiple objectives, single-task and task-specific multistrategy learning (supervised and unsupervised), definition of new data mining problems, and decision support systems.]
      • Model Identification
        • Queries: test/instrumentation reports
        • Specification of data model
        • Grouping of data channels by type
      • Prediction Objective Identification
        • Specification of test objective: failure modes, hazard levels
        • Identification of metrics
      • Reduction
        • Refinement of data model
        • Selection of relevant data channels (given prediction objective)
      • Synthesis
        • New data channels that improve prediction quality
      • Integration
        • Multiple time series data sources
