Data Mining: Potentials and Challenges

Explore the potentials and challenges of data mining in deployed applications and commercial products, with a focus on vertical applications and horizontal tools. Discover new opportunities and challenges in non-conventional domains, structured and unstructured data, and security/privacy concerns.

tperkins


Presentation Transcript


  1. Data Mining: Potentials and Challenges. Rakesh Agrawal & Jeff Ullman

  2. Observations • Transfer of data mining research into deployed applications and commercial products • Greater success in vertical applications • Horizontal tools, for example: • SAS Enterprise Miner: sophisticated statisticians segment • DB2 Intelligent Miner: database applications requiring mining • Emergence of data mining applications in non-conventional domains • Combination of structured and unstructured data • New challenges due to security/privacy concerns • DARPA initiative to fund data mining research

  3. Identifying Social Links Using Association Rules Input: Crawl of about 1 million pages
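
As a rough illustration of slide 3, the sketch below treats each crawled page as a "transaction" made up of the person names mentioned on it, and derives pairwise association rules from co-occurrence counts. The slide does not describe the actual pipeline, so the page representation, the function name `pair_rules`, and the thresholds are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

def pair_rules(pages, min_support=2, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence): 'lhs on a page implies rhs on that page'.

    pages is a list of name lists, one per page (hypothetical input format).
    """
    item_counts = Counter()
    pair_counts = Counter()
    for names in pages:
        unique = sorted(set(names))           # ignore repeated mentions on one page
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:             # drop pairs that rarely co-occur
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Toy stand-in for a crawl; the names are invented.
pages = [
    ["alice", "bob"],
    ["alice", "bob", "carol"],
    ["alice", "dave"],
    ["bob", "alice"],
    ["carol", "dave"],
]
rules = pair_rules(pages)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support}  confidence={confidence:.2f}")
```

On a crawl of about a million pages, the pair counting would need candidate pruning (as in Apriori-style algorithms) rather than enumerating every pair; this sketch only shows the support and confidence arithmetic.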

  4. Website Profiling using Classification Input: Example pages for each category during training

  5. Discovering Trends Using Sequential Patterns & Shape Queries Input: i) patent database ii) shape of interest

  6. Discovering Micro-communities Frequently co-cited pages are related. Pages with large bibliographic overlap are related.
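
The two relatedness signals on slide 6 can be sketched on a tiny link graph. This is a minimal example under stated assumptions: the `links` dictionary format, the function names, and the use of Jaccard similarity to measure "bibliographic overlap" are choices made here, not something the slide specifies.

```python
from collections import defaultdict
from itertools import combinations

def cocitation_counts(links):
    """links maps each page to the set of pages it cites.

    Returns {(p, q): number of pages that cite both p and q}.
    """
    counts = defaultdict(int)
    for cited in links.values():
        for p, q in combinations(sorted(cited), 2):
            counts[(p, q)] += 1
    return dict(counts)

def bibliographic_overlap(links, a, b):
    """Jaccard overlap between the reference sets of pages a and b (assumed measure)."""
    refs_a, refs_b = links.get(a, set()), links.get(b, set())
    union = refs_a | refs_b
    return len(refs_a & refs_b) / len(union) if union else 0.0

# Hypothetical link graph: p1..p3 are citing pages, x..z are cited pages.
links = {
    "p1": {"x", "y", "z"},
    "p2": {"x", "y"},
    "p3": {"y", "z"},
}
cocited = cocitation_counts(links)
overlap = bibliographic_overlap(links, "p1", "p2")
print(cocited)
print(round(overlap, 3))
```

Pairs with a high co-citation count, or page pairs with a high overlap score, would be the candidates for grouping into a micro-community.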

  7. New Challenges • Privacy-preserving data mining • Data mining over compartmentalized databases

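  8. Inducing Classifiers over Privacy Preserved Numeric Data [Figure: original records (age | salary | …) such as 30 | 25K | … and 50 | 40K | … each pass through a Randomizer, producing randomized records such as 65 | 50K | … and 35 | 60K | …. The age and salary distributions are then reconstructed and passed to a decision tree algorithm, which outputs the model. Labels: Alice’s age, Alice’s salary, John’s age. Example: 30 becomes 65 (30+35).]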
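
The randomize-then-reconstruct flow on slide 8 (an age of 30 becoming 65 after adding 35) can be sketched as follows. This is an illustration, not the authors' algorithm: it assumes Gaussian noise with a known standard deviation, stands in a few interval midpoints for the possible ages, and uses an iterative Bayesian (EM-style) update to estimate how much of the population falls near each midpoint. All names and parameter values are hypothetical.

```python
import math
import random

def randomize(values, sigma, rng):
    """Perturb each value with independent Gaussian noise (assumed noise model)."""
    return [v + rng.gauss(0.0, sigma) for v in values]

def reconstruct(randomized, sigma, midpoints, iterations=30):
    """Estimate the share of original values near each midpoint.

    Uses only the randomized values and the known noise distribution.
    """
    est = [1.0 / len(midpoints)] * len(midpoints)   # start from a uniform guess
    for _ in range(iterations):
        new = [0.0] * len(midpoints)
        for w in randomized:
            # Posterior weight of each midpoint given the observed noisy value w.
            weights = [math.exp(-((w - m) ** 2) / (2 * sigma ** 2)) * p
                       for m, p in zip(midpoints, est)]
            total = sum(weights)
            if total > 0.0:
                for i, wt in enumerate(weights):
                    new[i] += wt / total
        norm = sum(new)
        est = [x / norm for x in new]
    return est

rng = random.Random(0)
# Invented population: two thirds aged 20-39, one third aged 60-69.
ages = [rng.randint(20, 39) for _ in range(200)] + [rng.randint(60, 69) for _ in range(100)]
noisy = randomize(ages, 15.0, rng)
midpoints = [25, 35, 45, 55, 65]
est = reconstruct(noisy, 15.0, midpoints)
for m, p in zip(midpoints, est):
    print(f"around age {m}: estimated share {p:.2f}")
```

The point of the design is that individual records are never recovered; only the aggregate distribution is estimated, and that estimate is what a decision tree learner would consume.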

  9. Other recent work • Cryptographic approach to privacy-preserving data mining • Lindell & Pinkas, Crypto 2000 • Privacy-preserving discovery of association rules • Vaidya & Clifton, KDD 2002 • Evfimievski et al., KDD 2002 • Rizvi & Haritsa, VLDB 2002

  10. Computation over Compartmentalized Databases

  11. Some Hard Problems • Past may be a poor predictor of future • Abrupt changes • Wrong training examples • Actionable patterns (principled use of domain knowledge?) • Over-fitting vs. not missing the rare nuggets • Richer patterns • Simultaneous mining over multiple data types • When to use which algorithm? • Automatic, data-dependent selection of algorithm parameters

  12. Discussion • Should data mining be viewed as “rich” querying and “deeply” integrated with database systems? • Most current work makes little use of database functionality • Should analytics be an integral concern of database systems? • Issues in data mining over heterogeneous data repositories (relationship to the heterogeneous systems discussion)

  13. Summary • Data mining has shown promise but needs much more research. “We stand on the brink of great new answers, but even more, of great new questions” -- Matt Ridley
