
Some Interesting Problems








  1. Some Interesting Problems
     Rakesh Agrawal, IBM Almaden Research Center

  2. Foundations
     • What is data mining?
        • A collection of techniques?
        • A set of composable operations (à la Relational Algebra)?
     • Hints:
        • Inductive Databases (Mannila)
        • Relational Calculus + Statistical Quantifiers (Imielinski)

  3. Privacy Implications
     • Can we build accurate data models while preserving the privacy of individual records?
     • Hints:
        • Randomization (Agrawal & Srikant): replace x by x + y, where y is drawn from a known distribution
        • Anonymization (crypto literature)
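The randomization hint can be made concrete with a short Python sketch. It releases only w = x + y for each record, with y drawn from a public Gaussian, and then estimates the distribution of the original x values from the released ones using a discretized iterative Bayesian update in the spirit of the reconstruction procedure Agrawal & Srikant describe. The Gaussian noise, the bin grid, the iteration count, the toy data, and the function names `randomize`, `normal_pdf`, and `reconstruct` are all illustrative assumptions; this is a sketch of the idea, not their implementation.

```python
import math
import random
import statistics


def randomize(values, noise_sd, seed=0):
    """Perturb each value x as w = x + y, with y ~ Normal(0, noise_sd).

    The noise distribution is public (assumed Gaussian here); only the
    perturbed values w are released.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_sd) for x in values]


def normal_pdf(z, sd):
    """Density of Normal(0, sd) at z, i.e. the known noise density f_Y."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def reconstruct(perturbed, bin_centers, noise_sd, iterations=50):
    """Estimate the distribution of the original x over a grid of bins.

    Iterative Bayesian update: each perturbed value w spreads one unit of
    mass over the bins in proportion to f_Y(w - a) * f_X(a); the normalized
    totals become the next estimate of f_X.
    """
    k = len(bin_centers)
    fx = [1.0 / k] * k  # start from a uniform guess
    for _ in range(iterations):
        mass = [0.0] * k
        for w in perturbed:
            weights = [normal_pdf(w - a, noise_sd) * p
                       for a, p in zip(bin_centers, fx)]
            total = sum(weights)
            if total == 0.0:
                continue  # w is too far from every bin to be informative
            for j in range(k):
                mass[j] += weights[j] / total
        norm = sum(mass)
        if norm == 0.0:
            break  # nothing informative; keep the previous estimate
        fx = [m / norm for m in mass]
    return fx


if __name__ == "__main__":
    # Hypothetical bimodal attribute: half the records at 20, half at 70.
    original = [20.0] * 500 + [70.0] * 500
    released = randomize(original, noise_sd=10.0)
    print("mean original:", statistics.mean(original),
          "mean released:", round(statistics.mean(released), 2))
    bins = [10.0 * i for i in range(10)]  # bin centers 0, 10, ..., 90
    estimate = reconstruct(released, bins, noise_sd=10.0)
    for a, p in zip(bins, estimate):
        print(f"{a:5.0f}  {p:.3f}")
```

Run as a script, the estimate should put most of its mass near the bins at 20 and 70, even though every released value carries noise with a standard deviation of 10. Wider noise hides each record better but needs more records for an equally sharp reconstruction.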

  4. Web Mining: Beyond Click Streams
     • Mining knowledge bases from the web
        • Completeness
        • Accuracy
        • Malicious spam
     • Hints:
        • Brin's book experiment
        • etc. etc.

  5. Web Mining: Beyond hrefs
     • What other social behaviors exist on the web, and how can we make use of them?
     • Hints:
        • Viral marketing paper in this conference
        • etc. etc.

  6. Actionable Patterns
     • Principled use of domain knowledge for
        • discarding uninteresting patterns
        • improving performance
     • Hints:
        • Papers in recent KDD conferences

  7. Simultaneous mining over multiple data types
     • Not just
        • relational tables
        • time series
        • textual documents
     • But patterns across all of them

  8. Some more problems
     • Online, incremental algorithms over data streams
        • When to retire the past data
     • Long sequential patterns
     • Discovering richer patterns (trees and DAGs)
     • Automatic, data-dependent selection of algorithm parameters
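One simple answer to "when to retire the past data" in an online, incremental setting is a sliding window: keep only the most recent W stream elements and update counts in constant time as each element arrives and the oldest one leaves. The Python sketch below assumes that policy; the class name `SlidingWindowCounter`, the window size, and the toy stream are hypothetical, and the slide itself does not prescribe any particular retirement rule.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Item frequencies over only the most recent `window` stream elements.

    Each arrival costs O(1): the new item is counted, and once the window
    is full the oldest item is retired and uncounted.
    """

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.items = deque()     # elements currently inside the window
        self.counts = Counter()  # frequency of each item inside the window

    def add(self, item):
        self.items.append(item)
        self.counts[item] += 1
        if len(self.items) > self.window:
            old = self.items.popleft()  # retire the oldest element
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]    # drop items no longer in the window

    def frequent(self, min_count):
        """Items occurring at least `min_count` times in the current window."""
        return sorted(item for item, c in self.counts.items() if c >= min_count)


if __name__ == "__main__":
    counter = SlidingWindowCounter(window=3)
    for x in ["a", "b", "a", "c", "c"]:
        counter.add(x)
    print(counter.frequent(min_count=2))  # ['c']
```

A hard window forgets abruptly; exponentially decaying counts are a common alternative when older data should fade gradually rather than vanish.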

  9. What not to work on?
     • The field is too young!
        • Let every flower bloom!!!
     • Too early to say we don't need new algorithms
        • Impressive results of the PVSM algorithm
     • Emphasize evaluation and benchmarks
        • Interesting research issues

  10. Applications most likely to benefit from data mining
     • Web applications (I think)
     • Bioinformatics (I hope!)

  11. Inhibitors
     • Insufficient skill base (education)
     • Usability

  12. "The true delight is in the finding out, rather than in the knowing." (Isaac Asimov)
