## Data Mining


Database Systems

Timothy Vu

**Mining**

Mining is the extraction of valuable minerals or other geological materials from the earth, such as bauxite, coal, diamonds, iron, precious metals, lead, limestone, nickel, phosphate, rock salt, tin, uranium, petroleum, natural gas, and even water. What is extracted is usually valuable, rare, or useful.

**What is Data Mining**

Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns. To do this, data mining uses computational techniques from statistics, machine learning, and pattern recognition.

- Machine learning: a method for creating computer programs by analyzing data sets.
- Pattern recognition: classifying data (patterns) based on either a priori knowledge or statistical information extracted from the patterns.

**Why Data Mining**

Data mining helps individuals and companies extract useful information from large amounts of data so that they can make better decisions. It can help to:

- Reduce risks
- Find problems and issues
- Save money
- Make high-confidence predictions
- Simplify information

**Discussion Topics**

1. Classification
2. Regression
3. Association
4. Clustering

**Classifiers**

- Decision-tree classifiers: each leaf node has an associated class, and each internal node has a predicate. A record is classified by starting at the root and, at each internal node, following the branch chosen by that node's predicate until a leaf is reached.
- Bayesian classifiers: estimate the distribution of attribute values for each class from the training data, then predict the class with the maximum probability for a new instance.
- Neural-net classifiers: use the training data to train artificial neural networks.

**Regression**

Regression deals with predicting a value rather than a class.

- Linear regression: predict a value from a formula in the input attributes (for example, a line or a polynomial) whose coefficients are chosen to best fit the training data. Finding those coefficients is called curve fitting.

**Associations**

Association finds relationships between two or more items, expressed as rules of the form antecedent ⇒ consequent.

- Support: the fraction of the population that satisfies both the antecedent and the consequent of the rule.
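Support, and the confidence measure defined next on this slide, both reduce to counting transactions. The sketch below assumes hypothetical shopping baskets, each modeled as a Python set of item names; the functions `support` and `confidence` and the toy data are illustrative only, not taken from any library.

```python
# Minimal sketch: support and confidence of a rule "antecedent => consequent"
# over hypothetical baskets (each basket is a set of item names).
Basket = set[str]


def support(baskets: list[Basket], antecedent: Basket, consequent: Basket) -> float:
    """Fraction of all baskets that contain every antecedent and consequent item."""
    if not baskets:
        return 0.0
    items = antecedent | consequent
    return sum(1 for b in baskets if items <= b) / len(baskets)


def confidence(baskets: list[Basket], antecedent: Basket, consequent: Basket) -> float:
    """Among baskets containing the antecedent, the fraction that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    return sum(1 for b in with_antecedent if consequent <= b) / len(with_antecedent)


# Hypothetical purchase data, for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "screwdrivers"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

print(support(baskets, {"milk"}, {"screwdrivers"}))  # 0.2
print(support(baskets, {"milk"}, {"bread"}))         # 0.6
print(confidence(baskets, {"milk"}, {"bread"}))      # 0.75
```

Note that support divides by all baskets, while confidence divides only by the baskets that contain the antecedent.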
  For example, the rule milk ⇒ screwdrivers has low support, because few purchases include both milk and screwdrivers.
- Confidence: how often the consequent is true when the antecedent is true. For example, the confidence of milk ⇒ bread is the fraction of purchases containing milk that also contain bread.

**Clustering**

Clustering is the grouping of similar objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure.

**Applications of Data Mining**

1. Predictions
   - Stock market
   - Earthquakes
   - NBA games
2. Association
   - Store inventory
   - Fashion trends
3. Descriptive patterns
   - Disease analysis
   - Image recognition
   - Fraud detection

**References**

- A. Silberschatz, H. F. Korth, and S. Sudarshan. *Database System Concepts*, 5th ed. McGraw-Hill, 2006.
- Marschall Runge, Magnus Ohman, and Frank Netter. *Netter's Cardiology* (Netter Clinical Science). W. B. Saunders Company, 2004.
- "Data mining." Wikipedia. 4/1/2006. <http://en.wikipedia.org/wiki/Data_Mining>.
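The decision-tree traversal described on the Classifiers slide can be sketched as follows. This is a minimal sketch that assumes a hand-built tree: the attributes `degree` and `income`, the thresholds, and the class names are hypothetical, and learning the tree from training data (choosing which predicate goes at each node) is not shown.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union

Record = dict[str, Any]


@dataclass
class Leaf:
    label: str  # the class associated with this leaf


@dataclass
class Internal:
    predicate: Callable[[Record], bool]  # test evaluated on the record being classified
    if_true: "Tree"   # subtree followed when the predicate holds
    if_false: "Tree"  # subtree followed otherwise


Tree = Union[Leaf, Internal]


def classify(tree: Tree, record: Record) -> str:
    """Walk from the root to a leaf, letting each internal node's predicate pick the branch."""
    node = tree
    while isinstance(node, Internal):
        node = node.if_true if node.predicate(record) else node.if_false
    return node.label


# Hypothetical hand-built tree; attributes, thresholds and classes are illustrative only.
credit_tree = Internal(
    predicate=lambda r: r["degree"] == "masters",
    if_true=Internal(lambda r: r["income"] > 50_000, Leaf("excellent"), Leaf("good")),
    if_false=Internal(lambda r: r["income"] > 75_000, Leaf("good"), Leaf("average")),
)

print(classify(credit_tree, {"degree": "masters", "income": 60_000}))    # excellent
print(classify(credit_tree, {"degree": "bachelors", "income": 40_000}))  # average
```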
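The Bayesian classifier on the Classifiers slide is sketched here as a naive Bayes classifier, one common simplification that assumes attributes are independent within each class, so per-class value counts are enough to estimate the distributions. The weather-style attributes and labels are hypothetical, and the add-one (Laplace) smoothing is an assumption added so that a value never seen with a class does not force its probability to zero.

```python
import math
from collections import Counter, defaultdict
from typing import Optional

Record = dict[str, str]


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing (illustrative sketch)."""

    def fit(self, records: list[Record], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        # counts[(cls, attr)][value]: how often attr == value among records of class cls
        self.counts: dict[tuple[str, str], Counter] = defaultdict(Counter)
        # distinct values observed for each attribute (used in the smoothing denominator)
        self.values: dict[str, set[str]] = defaultdict(set)
        for record, cls in zip(records, labels):
            for attr, value in record.items():
                self.counts[(cls, attr)][value] += 1
                self.values[attr].add(value)
        return self

    def predict(self, record: Record) -> Optional[str]:
        """Return the class with the maximum (unnormalized) posterior probability."""
        total = sum(self.class_counts.values())
        best_cls, best_score = None, -math.inf
        for cls, n_cls in self.class_counts.items():
            # Log space avoids underflow: log P(cls) + sum of log P(attr = value | cls).
            score = math.log(n_cls / total)
            for attr, value in record.items():
                seen = self.counts.get((cls, attr), Counter())[value]
                n_values = len(self.values.get(attr, set()))
                score += math.log((seen + 1) / (n_cls + n_values))
            if score > best_score:
                best_cls, best_score = cls, score
        return best_cls


# Hypothetical training data, for illustration only.
records = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "overcast", "windy": "no"},
]
labels = ["play", "play", "stay", "stay", "play"]

model = NaiveBayes().fit(records, labels)
print(model.predict({"outlook": "rainy", "windy": "yes"}))  # stay
print(model.predict({"outlook": "sunny", "windy": "no"}))   # play
```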
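For the Regression slide, the simplest case of curve fitting is a straight line y = a0 + a1·x. This sketch assumes a single input attribute and uses the closed-form least-squares solution (least squares being one standard meaning of "best fit"); the function name `fit_line` is hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares fit of y = a0 + a1 * x; returns the coefficients (a0, a1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    # The fitted line always passes through the point of means.
    a0 = mean_y - a1 * mean_x
    return a0, a1


a0, a1 = fit_line([0, 1, 2, 3], [2, 5, 8, 11])
print(a0, a1)       # 2.0 3.0
print(a0 + a1 * 4)  # predicted value at x = 4: 14.0
```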
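The Clustering slide does not name an algorithm; k-means is one widely used technique that partitions points by proximity under a distance measure, here Euclidean distance. This minimal sketch assumes 2-D points and naively seeds the centroids with the first k points, which keeps the result deterministic but is not how practical implementations usually initialize; the function name `kmeans` and the sample points are illustrative.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, max_iter: int = 100) -> list[int]:
    """Return a cluster index in 0..k-1 for each point (plain k-means sketch)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Naive seeding (an assumption of this sketch): the first k points are the initial centroids.
    centroids = list(points[:k])
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


# Hypothetical points forming two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
print(kmeans(points, k=2))  # [0, 0, 0, 1, 1, 1]
```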