
Mining Large Data at SDSC

Natasha Balac, Ph.D.


Presentation Transcript


  1. Mining Large Data at SDSC Natasha Balac, Ph.D.

  2. A Deluge of Data
     • Today, data comes from everywhere
       • Scientific instruments
       • Experiments
       • Sensors and sensor nets
       • New devices
     • And is used by everyone
       • Scientists
       • Consumers
       • Educators
       • General public
     • IT environments must support unprecedented diversity, globalization, integration, scale, and use
     • Turning the deluge of data into usable information requires an unprecedented level of integration, globalization, scale, and access
     • Other labels on the slide: Geosciences; Data Management and Mining; Modeling and Simulation; Life Sciences; Preservation and Archiving; Astronomy

  3. Why DATA MINING?
     • Necessity is the mother of invention
     • Huge amounts of data
     • Electronic records of our decisions
     • Choices in the supermarket
     • Financial records
     • Our comings and goings
     • We swipe our way through the world – every swipe is a record in a database
     • Data rich, but information poor
     • Lying hidden in all this data is information!

  4. What is DATA MINING?
     • Extracting or “mining” knowledge from large amounts of data
     • Data-driven discovery and modeling of hidden patterns (that we never knew existed) in large volumes of data
     • Extraction of implicit, previously unknown and unexpected, potentially extremely useful information from data
     • Fundamental idea: learn rules/patterns/relationships automatically from the data
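To make the "learn rules automatically from the data" idea concrete, here is a minimal Python sketch in the spirit of a one-rule (1R-style) learner: it tries each attribute, builds a "value predicts majority class" rule, and keeps the attribute with the fewest errors. The function name `learn_one_rule` and the toy `purchases` records are hypothetical illustrations, not taken from the slides.

```python
from collections import Counter, defaultdict


def learn_one_rule(rows, target):
    """Return (attribute, rule, errors) for the best single-attribute rule.

    A rule maps each value of one attribute to the class that occurs
    most often with that value in the training rows.
    """
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attr in attributes:
        # Count how often each class appears for each value of attr.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each attribute value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Training errors made by this rule.
        errors = sum(1 for row in rows if rule[row[attr]] != row[target])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical toy records (illustrative only).
purchases = [
    {"day": "weekday", "coupon": "yes", "bought": "yes"},
    {"day": "weekend", "coupon": "yes", "bought": "yes"},
    {"day": "weekday", "coupon": "no", "bought": "no"},
    {"day": "weekend", "coupon": "no", "bought": "no"},
    {"day": "weekend", "coupon": "no", "bought": "yes"},
]

attr, rule, errors = learn_one_rule(purchases, "bought")
print(attr, rule, errors)
```

On this toy data the learner selects `coupon` as the most predictive attribute. Nobody wrote the rule by hand; it was derived from the records, which is the point the slide makes.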

  5. Terminology
     • Gold Mining vs. Sand Mining
     • Knowledge mining from databases
     • Knowledge extraction
     • Data/pattern analysis
     • Knowledge Discovery in Databases (KDD)
     • Predictive Modeling
     • Machine Learning
     • Business Intelligence

  6. CRISP-DM (Cross-Industry Standard Process for Data Mining) • CRISP-DM Process Model

  7. Data Mining Driven Engineering Product Design
     • Incorporate parallel computing and data mining capabilities into engineering and optimizing product design models
     • Complex challenges in new product design
       • Accurate acquisition and interpretation of raw customer data
       • Integrating newly found knowledge into the engineering design process
       • Developing analytical techniques that help reduce the computational time required to generate product portfolios
     • Mining paid-search online customer preference data

  8. A Java-based Data Driven Product Design (DDPD) platform • A platform has been developed that integrates the supercomputing resources at SDSC with complex engineering design simulation platforms such as Matlab, in an effort to streamline the product design and development process

  9. Tools in the GUI
     • Data mining algorithms: Weka, Parallel Weka, Parallel C4.5, and Parallel K-means
     • The Data Driven Product Design platform uses Matlab’s powerful computation engine directly from the GUI. Optimization choices available from the user interface include Matlab, Tomlab, Excel Solver, Star-P, Parallel Matlab, Parallel CPLEX, etc.
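K-means is one of the algorithms the slide lists. The following is a plain, serial Python sketch of the standard algorithm, offered only to show what the tool computes; it is not the Weka or SDSC implementation, and the function name `kmeans` and the sample `points` are hypothetical. The assignment step (finding the nearest centroid for each point) is independent per point, which is the part that parallel variants typically distribute across workers.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, iterations=20, seed=0):
    """Serial k-means: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # assignments no longer change the centroids
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical sample: two well-separated groups of 2-D points.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.3),
]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initialization.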

  10. Visual Representation of Data Mining results linking with serial optimization models

  11. Thank You
