Data Mining

Data Mining. By: Tung, Sze Ming (Leo), CS 157B. Definition: a class of database applications that analyze data in a database using tools that look for trends or anomalies. IBM researchers were early contributors to data mining.

Presentation Transcript


  1. Data Mining • By: Tung, Sze Ming (Leo) • CS 157B

  2. Definition • A class of database applications that analyze data in a database using tools that look for trends or anomalies. • IBM researchers were early contributors to data mining, notably through work on association rule mining in the early 1990s.

  3. Purpose • To look for hidden patterns or previously unknown relationships in a body of data that can be used to predict future behavior. • Example: Data mining software can help retail companies find customers with common interests.

  4. Background Information • Many of the techniques used by today's data mining tools have been around for many years, having originated in the artificial intelligence research of the 1980s and early 1990s. • Data mining tools are only now being applied to large-scale database systems.

  5. The Need for Data Mining • The amount of raw data stored in corporate data warehouses is growing rapidly. • The data that might be relevant to a specific problem is too large and too complex to analyze by hand. • Data mining promises to bridge this analytical gap by giving knowledge workers the tools to navigate this complex analytical space.

  6. The Need for Data Mining, cont. • The need for information has resulted in the proliferation of data warehouses that integrate information from multiple sources to support decision making. • These warehouses often include data from external sources, such as customer demographics and household information.

  7. Approaches to Data Mining • Association • Sequence-based analysis • Clustering • Classification

  8. Association • Association is the classic market-basket analysis, which treats the purchase of a number of items (for example, the contents of a shopping basket) as a single transaction and looks for items that tend to be purchased together. • This information can be used to adjust inventories, modify floor or shelf layouts, or introduce targeted promotional activities to increase overall sales or move specific products. • Example: 80 percent of all transactions in which beer was purchased also included potato chips.
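A figure like the 80 percent in the example above is usually called the confidence of the rule "beer implies potato chips". The following is a minimal sketch of how it could be computed, assuming each transaction is stored as a set of item names. The baskets and the function names `support` and `confidence` are made up for illustration and do not come from the slides.

```python
def support(transactions, items):
    """Share of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing `antecedent`, the share that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)


# Hypothetical shopping baskets (illustrative data only).
baskets = [
    {"beer", "potato chips"},
    {"beer", "potato chips", "salsa"},
    {"beer", "potato chips", "bread"},
    {"beer", "potato chips", "milk"},
    {"beer", "diapers"},
    {"milk", "bread"},
]

# 4 of the 5 beer baskets also contain potato chips.
print(confidence(baskets, "beer", "potato chips"))  # 0.8
print(support(baskets, {"beer", "potato chips"}))
```

Support is shown alongside confidence because a rule that holds in 80 percent of cases is of little use if the items rarely appear at all.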

  9. Sequence-based analysis • Traditional market-basket analysis deals with a collection of items as part of a point-in-time transaction. • Sequence-based analysis instead links purchases over time to identify a typical set of purchases that might predict the subsequent purchase of a specific item.
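One way to picture this is to measure how often customers who have bought a given set of items go on to buy a specific item later. The sketch below assumes each customer's history is a list of items in purchase order. The function name `followed_by_rate` and the sample histories are hypothetical.

```python
def followed_by_rate(histories, earlier_items, later_item):
    """Among customers who bought all of `earlier_items`, the share who
    bought `later_item` at some point after completing that set."""
    earlier_items = set(earlier_items)
    qualified = 0
    followed = 0
    for history in histories:
        seen = set()
        for position, item in enumerate(history):
            seen.add(item)
            if earlier_items <= seen:
                # The set is complete; only purchases after this point count.
                qualified += 1
                if later_item in history[position + 1:]:
                    followed += 1
                break
    return followed / qualified if qualified else 0.0


# Hypothetical purchase histories, each in time order (illustrative data only).
histories = [
    ["tent", "sleeping bag", "camp stove"],
    ["sleeping bag", "tent", "camp stove"],
    ["tent", "sleeping bag", "lantern"],
    ["camp stove", "tent", "sleeping bag"],  # stove came first, so it does not count
    ["tent", "boots"],                       # never completes the set
]

print(followed_by_rate(histories, {"tent", "sleeping bag"}, "camp stove"))  # 0.5
```

Unlike the point-in-time basket view, the order of purchases matters here: the fourth customer owns all three items but is not counted as a follow-on purchase.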

  10. Clustering • Clustering approaches address segmentation problems. • These approaches assign records with a large number of attributes to a relatively small set of groups or "segments." • Example: Buying habits of multiple population segments might be compared to determine which segments to target for a new sales campaign.
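The slides do not name a particular clustering algorithm. As one common choice, the sketch below uses plain k-means on two made-up attributes per customer (store visits per month and average spend). The starting centroids are fixed so the run is repeatable, and all names and data are assumptions for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from p to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend). Illustrative data only.
customers = [(2, 15.0), (3, 20.0), (2, 18.0), (12, 90.0), (14, 110.0), (13, 95.0)]

centroids, segments = kmeans(customers, centroids=[customers[0], customers[3]])
for centroid, segment in zip(centroids, segments):
    print(centroid, segment)
```

With real data the attributes would normally be rescaled first, since an attribute with a large numeric range (spend, here) otherwise dominates the distance calculation.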

  11. Classification • The most commonly applied data mining technique. • The algorithm uses preclassified examples to determine the set of parameters required for proper discrimination. • Example: A classifier that can identify risky loans could be used to aid in the decision of whether to grant a loan to an individual.
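To make "preclassified examples determine the parameters" concrete, here is a deliberately tiny sketch: a single-threshold classifier (sometimes called a decision stump) whose one parameter, a cutoff, is learned from labeled loans. The feature (debt-to-income ratio), the data, and the function names are assumptions for illustration; practical classifiers use many attributes and richer models.

```python
def fit_threshold(examples):
    """Learn one parameter, a cutoff, from (feature_value, is_risky) pairs.
    Returns the candidate cutoff that misclassifies the fewest examples.
    Assumes at least two distinct feature values are present."""
    values = sorted({value for value, _ in examples})
    # Candidate cutoffs are the midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cutoff):
        return sum((value > cutoff) != is_risky for value, is_risky in examples)

    return min(candidates, key=errors)


def predict_risky(cutoff, value):
    """Classify a new applicant as risky when the feature exceeds the cutoff."""
    return value > cutoff


# Hypothetical preclassified loans: (debt-to-income ratio, turned out risky?).
labeled_loans = [
    (0.10, False), (0.20, False), (0.25, False),
    (0.45, True), (0.50, True), (0.60, True),
]

cutoff = fit_threshold(labeled_loans)
print(cutoff)
print(predict_risky(cutoff, 0.55), predict_risky(cutoff, 0.15))
```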

  12. Issues of Data Mining • Present-day tools are strong but require significant expertise to implement effectively. • Susceptibility to "dirty" or irrelevant data. • Inability to "explain" results in human terms.

  13. Issues • Susceptibility to "dirty" or irrelevant data. • Today's data mining tools simply treat everything they are given as factual and draw conclusions accordingly. • Users must take the necessary precautions to ensure that the data being analyzed is "clean."
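What counts as "clean" depends on the data set, but a typical precaution is to filter out incomplete, implausible, and duplicated records before analysis. The sketch below assumes records are dictionaries with hypothetical `customer_id`, `age`, and `income` fields. The validity rules are illustrative, not a standard.

```python
def clean_records(records):
    """Drop rows that are incomplete, implausible, or exact repeats."""
    cleaned = []
    seen = set()
    for record in records:
        age = record.get("age")
        income = record.get("income")
        if age is None or income is None:
            continue  # incomplete record
        if not (0 <= age <= 120) or income < 0:
            continue  # implausible value, probably a data-entry error
        key = (record.get("customer_id"), age, income)
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical raw records (illustrative data only).
raw = [
    {"customer_id": 1, "age": 34, "income": 52000},
    {"customer_id": 1, "age": 34, "income": 52000},   # duplicate
    {"customer_id": 2, "age": 230, "income": 41000},  # impossible age
    {"customer_id": 3, "age": 45, "income": None},    # missing income
    {"customer_id": 4, "age": 29, "income": 38000},
]

print([r["customer_id"] for r in clean_records(raw)])  # [1, 4]
```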

  14. Issues, cont. • Inability to "explain" results in human terms. • Many of the tools employed in data mining analysis use complex mathematical algorithms that are not easily mapped into human terms. • What good does the information do if you don't understand it?

  15. The End
