Advanced Database Applications Database Indexing and Data Mining CS591-G1 -- Fall 2001

Advanced Database Applications Database Indexing and Data Mining CS591-G1 -- Fall 2001. George Kollios Boston University. Prof. George Kollios Office: MCS 288 Office Hours: Monday 2:00pm-3:30pm Thursday 11:00am-12:30pm Mailing List: cs591g1 .

Presentation Transcript


  1. Advanced Database Applications: Database Indexing and Data Mining. CS591-G1 -- Fall 2001. George Kollios, Boston University

  2. Prof. George Kollios • Office: MCS 288 • Office Hours: Monday 2:00pm-3:30pm, Thursday 11:00am-12:30pm • Mailing List: cs591g1

  3. History of Database Technology • 1960s: • Data collection, database creation, IMS and network DBMS • 1970s: • Relational data model, relational DBMS implementation • 1980s: • RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.) • 1990s—2000s: • Data mining and data warehousing, multimedia databases, and Web databases

  4. Structure of a RDBMS • A typical RDBMS has a layered architecture. • Layers, from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB • Modern database systems extend these layers. • A DBMS is an OS for data! • This is one of several possible architectures; each system has its own variations.

  5. Index Methods for RDBMS • Hashing Methods: • Linear Hashing, extendible hashing • B-tree family: • B+-trees and variations • Both of them are one-dimensional
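A minimal sketch of what "one-dimensional" means for the two index families on this slide: both are organized around a single search key. A Python `dict` stands in for a hash index and a sorted key list stands in for the ordered leaf level of a B+-tree; the data and the function names `equality_lookup` and `range_lookup` are hypothetical and chosen only for illustration.

```python
import bisect

# Hypothetical one-dimensional data: (key, record id) pairs.
records = [(17, "r1"), (42, "r2"), (8, "r3"), (23, "r4"), (42, "r5")]

# Hash-style index: maps a single key to its record ids.
# Suited to equality lookups; it keeps no key order.
hash_index = {}
for key, rid in records:
    hash_index.setdefault(key, []).append(rid)

# Ordered index: a stand-in for the sorted leaf level of a B+-tree.
# Keys are kept in order, so a range query is one contiguous scan.
ordered = sorted(records)
ordered_keys = [key for key, _ in ordered]


def equality_lookup(key):
    """Record ids whose key equals `key`."""
    return hash_index.get(key, [])


def range_lookup(low, high):
    """Record ids with low <= key <= high, in key order."""
    start = bisect.bisect_left(ordered_keys, low)
    end = bisect.bisect_right(ordered_keys, high)
    return [rid for _, rid in ordered[start:end]]
```

Both structures order or hash on one attribute only, which is why points, regions, and other multi-attribute data need different index methods.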

  6. Overview of the course • Spatial Database Systems • GIS, CAD/CAM : EOSDIS project NASA • Manages points, lines and regions • Temporal Database Systems • Billing, medical records • Spatio-temporal Databases • Moving objects, changing regions, etc

  7. Overview of the course • Multimedia and medical databases • A multimedia system can store and retrieve objects/documents with text, voice, images, video clips, etc • Time series databases • Stock market, ECG, trajectories, etc

  8. Multimedia databases • Applications: • Digital libraries, entertainment, office automation • Medical imaging: digitized X-rays and MRI images (2 and 3-dimensional) • Query by content: (or QBE) • Efficient • ‘Complete’ (no false dismissals)

  9. What is Data Mining? • Data mining (knowledge discovery in databases): • The efficient discovery of previously unknown, valid, potentially useful and understandable information or patterns from data in large databases • Alternative names: • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, etc.

  10. DM Applications • Database analysis and decision support • Market analysis: target marketing, market basket analysis, market segmentation • Fraud detection and management • Biology and medicine • Text mining (news group, email, documents) and Web analysis.

  11. Data Mining: Confluence of Multiple Disciplines • Database Technology • Statistics • Machine Learning • Visualization • Information Science • Other Disciplines

  12. Overview of terms • Data: a set of facts (items) D, stored in a database • Pattern: an expression E in a language L, that describes a subset of facts • Attribute: a field in an item i in D • Interestingness: a function I_{D,L} that maps an expression to a measure space M

  13. The Data Mining Task • For a given dataset D, language of facts L, interestingness function I_{D,L} and threshold c, efficiently find the expressions E in L such that I_{D,L}(E) > c.
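The task definition above can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: facts are modeled as sets of items, the language of expressions is itemsets of bounded size, and the interestingness function is the fraction of facts containing the itemset. The enumeration is brute force, so it shows the definition of the task rather than an efficient way to solve it.

```python
from itertools import combinations

# Hypothetical dataset D: each fact is a set of items.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def interestingness(expression, dataset):
    """Assumed measure: fraction of facts containing every item of the expression."""
    matches = sum(1 for fact in dataset if expression <= fact)
    return matches / len(dataset)


def mine(dataset, c, max_size=2):
    """Return every expression E (itemset of up to max_size items) whose score exceeds c."""
    items = sorted(set().union(*dataset))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            score = interestingness(frozenset(combo), dataset)
            if score > c:  # strict threshold, as in the task definition
                found[combo] = score
    return found
```

With the threshold c = 0.5, only the single items "bread" and "milk" (each in 3 of 4 facts) qualify; lowering c to 0.4 also admits "butter" and the pairs that occur in half of the facts.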

  14. How Data Mining is used • Identify the problem • Use data mining techniques to transform the data into information • Act on the information • Measure the results

  15. DM Functionalities • Concept description: • Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions • Association (correlation and causality): • Multi-dimensional vs. single-dimensional association • age(X, “20..29”) ^ income(X, “20..29K”) → buys(X, “PC”) [support = 2%, confidence = 60%] • contains(T, “computer”) → contains(T, “software”) [1%, 75%]
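The support and confidence figures attached to each rule can be computed as below. This is a minimal sketch using the standard definitions (support: fraction of transactions containing all items of the rule; confidence: that support divided by the support of the antecedent). The four transactions are made-up data and do not reproduce the percentages on the slide.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return both / base if base > 0 else 0.0


# Hypothetical transactions T for the rule
# contains(T, "computer") -> contains(T, "software").
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]

rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
```

Here the rule holds in 2 of 4 transactions (support 0.5), and in 2 of the 3 transactions that contain "computer" (confidence 2/3).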

  16. DM Functionalities • Cluster analysis • Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns • Clustering based on the principle: maximizing the intra-class similarity and minimizing the inter-class similarity
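The clustering principle can be checked numerically for a given grouping. The sketch below assumes Euclidean distance as the (inverse) similarity measure and uses two hypothetical clusters of 2-D points; it only evaluates a grouping and is not a clustering algorithm.

```python
from itertools import combinations
from math import dist

# Hypothetical grouping of 2-D points into two clusters.
clusters = {
    "a": [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    "b": [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)],
}


def mean_intra_distance(clusters):
    """Average distance between pairs of points in the same cluster."""
    pairs = [
        dist(p, q)
        for points in clusters.values()
        for p, q in combinations(points, 2)
    ]
    return sum(pairs) / len(pairs)


def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [
        dist(p, q)
        for (_, first), (_, second) in combinations(clusters.items(), 2)
        for p in first
        for q in second
    ]
    return sum(pairs) / len(pairs)
```

A grouping that follows the principle has a small mean intra-cluster distance (high similarity within a class) and a large mean inter-cluster distance (low similarity across classes), as is the case for the two clusters above.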

  17. DM Functionalities • Classification and Prediction • Finding models (functions) that describe and distinguish classes or concepts for future prediction • E.g., classify countries based on climate, or classify cars based on gas mileage • Presentation: decision-tree, classification rule, neural network • Prediction: Predict some unknown or missing numerical values
