Presentation Title: DATA MINING

Department of Computer Science, Sir Syed University of Engineering & Technology, Karachi-Pakistan. Presentation Title: DATA MINING. Submitted By: Osama Ghulam Mohammad (2010-CS-20), Noureen Chagani (2010-CS-11)

Presentation Transcript


  1. Department of Computer Science, Sir Syed University of Engineering & Technology, Karachi-Pakistan. Presentation Title: DATA MINING. Submitted By: Osama Ghulam Mohammad (2010-CS-20), Noureen Chagani (2010-CS-11), Naveed Usman (2010-CS-23)

  2. TABLE OF CONTENTS • What is data mining? • Data mining consists of five major elements • Why Mine Data? • Commercial Viewpoint • Scientific Viewpoint • Some of the techniques used for data mining

  3. What is data mining? • Data mining is the process of automatically searching large volumes of data for patterns. The term is often used as a synonym for Knowledge Discovery in Databases (KDD), although strictly it names the analysis step within the KDD process. • It is the process of extracting knowledge from large, often extremely large, datasets. • The aim is useful knowledge that can improve processes.

  4. Data mining consists of five major elements: • Extract, transform, and load transaction data into the data warehouse system. • Store and manage the data in a multidimensional database system. • Provide data access to business analysts and information technology professionals. • Analyze the data with application software. • Present the data in a useful format, such as a graph or table.
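The five elements read as a pipeline. Below is a minimal sketch of that pipeline, assuming an in-memory SQLite table stands in for the data warehouse (SQLite is relational, not a true multidimensional database) and a small hypothetical CSV export stands in for transaction data. The function names `extract`, `transform`, `load`, `analyze`, and `present` are illustrative, not taken from the slides.

```python
import csv
import io
import sqlite3

# Hypothetical raw transaction export (illustrative data, not from the slides).
RAW_CSV = """date,store,amount
2024-01-05,North, 120.50
2024-01-05,South,80
2024-01-06,North,  bad
2024-01-06,South,45.25
"""


def extract(text):
    """Extract: read raw CSV rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean fields and drop rows whose amount cannot be parsed."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        clean.append((row["date"], row["store"].strip(), amount))
    return clean


def load(conn, rows):
    """Load: store the cleaned rows in a table (stand-in for the warehouse)."""
    conn.execute("CREATE TABLE sales (date TEXT, store TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def analyze(conn):
    """Analyze: aggregate total sales per store."""
    return conn.execute(
        "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    ).fetchall()


def present(summary):
    """Present: format the result as a plain-text table."""
    lines = [f"{'store':<8}{'total':>10}"]
    lines += [f"{store:<8}{total:>10.2f}" for store, total in summary]
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
summary = analyze(conn)
print(present(summary))
conn.close()
```

Keeping each stage as a separate function mirrors the separation of concerns in the list above: the cleaning rules in `transform` can change without touching storage or reporting.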

  5. Why Mine Data? Commercial Viewpoint • Lots of data is being collected and warehoused: • Web data, e-commerce • Purchases at department/grocery stores • Bank/credit card transactions • Computers have become cheaper and more powerful • Competitive pressure is strong: • Provide better, customized services for an edge (e.g., in Customer Relationship Management)

  6. Why Mine Data? Scientific Viewpoint • Data is collected and stored at enormous speeds (GB/hour): • remote sensors on a satellite • telescopes scanning the skies • microarrays generating gene expression data • scientific simulations generating terabytes of data • Traditional techniques are infeasible for raw data • Data mining may help scientists: • in classifying and segmenting data

  7. Some of the techniques used for data mining are: • Artificial neural networks: non-linear predictive models that learn through training and resemble biological neural networks in structure. Through this learning process, neural networks are useful for pattern recognition and data classification.
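The learning process can be illustrated with the simplest possible network, a single perceptron. This is a minimal sketch only: unlike the non-linear multi-layer models described above, a single perceptron is a linear classifier. The training data (the logical AND function), the learning rate, and the epoch limit are assumptions chosen for illustration.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted input reaches 0."""
    return 1 if z >= 0 else 0


def predict(w, b, x):
    """Weighted sum of the inputs plus bias, passed through the threshold."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge weights toward each misclassified target."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            err = target - predict(w, b, x)
            if err:
                errors += 1
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        if errors == 0:  # every sample classified correctly: training has converged
            break
    return w, b


# Training data: the logical AND of two binary inputs (illustrative).
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # → [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron learning rule is guaranteed to converge on it; a function such as XOR would need hidden nodes.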

  8. Neural Network • Neural networks map a set of input nodes to a set of output nodes • The number of inputs/outputs is variable • The network itself is composed of an arbitrary number of nodes with an arbitrary topology
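To make the input-to-output mapping concrete, here is a minimal forward pass through a small feed-forward network with two input nodes, two hidden nodes, and one output node. The weights are set by hand (an assumption; they are not learned) so that the network computes XOR, a mapping that no single linear unit can represent. The number of inputs and outputs is determined only by the shapes in the hypothetical `XOR_NET` list.

```python
def step(z):
    """Threshold activation: output 1 only for positive net input."""
    return 1 if z > 0 else 0


def forward(inputs, layers):
    """Propagate input-node values through each layer to the output nodes."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            step(sum(w * a for w, a in zip(row, activations)) + bias)
            for row, bias in zip(weights, biases)
        ]
    return activations


# Hypothetical hand-set weights: hidden node 1 acts as OR, hidden node 2 as AND,
# and the output node fires for "OR but not AND", which is XOR.
XOR_NET = [
    ([[1, 1], [1, 1]], [-0.5, -1.5]),  # input layer (2 nodes) -> hidden layer (2 nodes)
    ([[1, -1]], [-0.5]),               # hidden layer (2 nodes) -> output layer (1 node)
]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, forward(x, XOR_NET))
```

Adding a row to a weight matrix adds a node to that layer, and adding a `(weights, biases)` pair adds a layer, which reflects the slide's point that size and topology are not fixed.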

  9. Decision tree • Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.

  10. Decision tree (data)
      height  hair   eyes   class
      short   blond  blue   A
      tall    blond  brown  B
      tall    red    blue   A
      short   dark   blue   B
      tall    dark   blue   B
      tall    blond  blue   A
      tall    dark   brown  B
      short   blond  brown  B

  11. Decision tree built from the data above:
      hair = dark  -> B
      hair = blond -> eyes = blue  -> A
                      eyes = brown -> B
      hair = red   -> A
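The slides do not say how the tree was built. One common approach is ID3, which greedily splits on the attribute with the highest information gain (reduction in class entropy). The sketch below assumes ID3 and applies it to the eight records of slide 10. On this data it picks `hair` at the root (gain ≈ 0.45, versus ≈ 0.35 for `eyes` and ≈ 0.003 for `height`) and then `eyes` under `blond`, reproducing the tree on slide 11.

```python
from collections import Counter
from math import log2

# The eight records from slide 10.
COLUMNS = ("height", "hair", "eyes", "class")
ROWS = [
    ("short", "blond", "blue", "A"),
    ("tall", "blond", "brown", "B"),
    ("tall", "red", "blue", "A"),
    ("short", "dark", "blue", "B"),
    ("tall", "dark", "blue", "B"),
    ("tall", "blond", "blue", "A"),
    ("tall", "dark", "brown", "B"),
    ("short", "blond", "brown", "B"),
]
DATA = [dict(zip(COLUMNS, r)) for r in ROWS]


def entropy(rows):
    """Shannon entropy (bits) of the class labels in rows."""
    counts = Counter(r["class"] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def info_gain(rows, attr):
    """Entropy before the split minus the weighted entropy after splitting on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder


def id3(rows, attrs):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    classes = {r["class"] for r in rows}
    if len(classes) == 1:
        return classes.pop()  # pure node: every record has the same class
    if not attrs:
        return Counter(r["class"] for r in rows).most_common(1)[0][0]  # majority vote
    best = max(attrs, key=lambda a: info_gain(rows, a))
    remaining = [a for a in attrs if a != best]
    return {best: {value: id3([r for r in rows if r[best] == value], remaining)
                   for value in sorted({r[best] for r in rows})}}


tree = id3(DATA, ["height", "hair", "eyes"])
print(tree)
```

Each path from the root to a leaf is one classification rule, for example "if hair = blond and eyes = brown then class B".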

  12. The nearest-neighbor method: A classification technique that classifies each record based on the records most similar to it in a historical database.
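Here is a minimal k-nearest-neighbor sketch that reuses the eight records from slide 10 as the "historical database". The choice of k = 3 and of Hamming distance (the number of attributes on which two records differ) as the similarity measure are assumptions; numeric data would more typically use Euclidean distance.

```python
from collections import Counter

# The eight records from slide 10: (height, hair, eyes) -> class.
HISTORY = [
    (("short", "blond", "blue"), "A"),
    (("tall", "blond", "brown"), "B"),
    (("tall", "red", "blue"), "A"),
    (("short", "dark", "blue"), "B"),
    (("tall", "dark", "blue"), "B"),
    (("tall", "blond", "blue"), "A"),
    (("tall", "dark", "brown"), "B"),
    (("short", "blond", "brown"), "B"),
]


def hamming(a, b):
    """Number of attributes on which two records disagree (smaller = more similar)."""
    return sum(x != y for x, y in zip(a, b))


def knn_classify(record, history, k=3):
    """Label a record by majority vote among its k most similar historical records."""
    nearest = sorted(history, key=lambda item: hamming(record, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify(("short", "red", "blue"), HISTORY))  # → A
```

For the query `("short", "red", "blue")`, exactly three records differ from it in a single attribute; two of them are class A and one is class B, so the vote returns A.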

  13. An important technique for Data Mining is: CLUSTERING

  14. Clustering (Definition) • Clustering can be considered the most important unsupervised learning technique; like every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. • Clustering is “the process of organizing objects into groups whose members are similar in some way”. • A cluster is therefore a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.

  15. Clustering The greater the similarity (or homogeneity) within a group, and the greater the difference between groups, the “better” or more distinct the clustering.
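The slide describes a good clustering informally. One standard way to put a number on it, not mentioned in the slides, is the silhouette coefficient. For each point, a is its mean distance to the other members of its cluster and b is its mean distance to the members of the nearest other cluster; its score is (b − a) / max(a, b), so values near 1 indicate tight, well-separated clusters. A minimal sketch with illustrative 2-D points:

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette score over all points; needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for a single-point cluster
            continue
        a = sum(own) / len(own)  # cohesion: mean distance within the point's cluster
        b = min(  # separation: mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Illustrative data: two tight groups of three points, far apart.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
GOOD = [0, 0, 0, 1, 1, 1]   # each group is its own cluster
MIXED = [0, 1, 0, 1, 0, 1]  # groups split across clusters
print(round(silhouette(POINTS, GOOD), 2), round(silhouette(POINTS, MIXED), 2))
```

The grouping that keeps each tight group together scores close to 1, while the mixed grouping scores near zero, matching the slide's intuition that high within-group similarity and large between-group difference make a "better" clustering.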

  16. Why clustering? A few good reasons ... • Simplifications • Pattern detection

  17. The K-Means Clustering Method Basic K-means Algorithm for finding K clusters: 1. Select K points as the initial centroids. 2. Assign all points to the closest centroid. 3. Recompute the centroid of each cluster. 4. Repeat steps 2 and 3 until the centroids don’t change.
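The four steps map directly onto a short loop. Here is a minimal sketch in plain Python, assuming Euclidean distance and, for step 1, simply taking the first K points as the initial centroids. The slide does not say how the initial points are selected; random selection is also common.

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Basic K-means following the four steps on slide 17."""
    # Step 1: select K points as the initial centroids (here: the first K points).
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Illustrative data: two well-separated groups of three points.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(POINTS, k=2)
print(centroids)
print(clusters)
```

Even though both initial centroids start in the same group here, the loop separates the two groups within a few iterations; `max_iter` is only a safety limit.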

  18. Figure 10a shows the case when the cluster centers coincide with the circle centers. This is a global minimum. Figure 10b shows a local minimum.
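Because K-means only ever moves downhill from its starting centroids, different initial centroids can stop at different minima of the sum of squared errors (SSE). The figures are not reproduced here, but the effect can be shown on a hypothetical 1-D data set made of three natural pairs. Starting from [0, 10, 20], the algorithm finds the three pairs (SSE 1.5). Starting from [5, 20, 21], it stops with one centroid covering two pairs (SSE 101), a configuration that no further K-means step can change.

```python
def lloyd_1d(points, centroids, max_iter=100):
    """K-means on 1-D data from the given initial centroids; returns (centroids, SSE)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assign each value to its closest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
            clusters[nearest].append(x)
        # Move each centroid to the mean of its cluster (empty clusters stay put).
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    # Sum of squared distances from each value to its cluster's centroid.
    sse = sum((x - centroids[i]) ** 2 for i, c in enumerate(clusters) for x in c)
    return centroids, sse


# Hypothetical data: three natural pairs of values.
POINTS = [0, 1, 10, 11, 20, 21]
good_centroids, good_sse = lloyd_1d(POINTS, [0, 10, 20])
bad_centroids, bad_sse = lloyd_1d(POINTS, [5, 20, 21])
print(good_centroids, good_sse)  # → [0.5, 10.5, 20.5] 1.5
print(bad_centroids, bad_sse)    # → [5.5, 20.0, 21.0] 101.0
```

A common practical remedy is to run K-means from several different initializations and keep the result with the lowest SSE.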

  19. Cluster Example

  20. “The key in business is to know something that nobody else knows.” (Aristotle Onassis) • “To understand is to perceive patterns.” (Sir Isaiah Berlin)

  21. Thank You 
