Data Mining: Potentials and Challenges

Rakesh Agrawal & Jeff Ullman

Observations
  • Transfer of data mining research into deployed applications and commercial products
    • Greater success in vertical applications
    • Horizontal tools, for example:
      • SAS Enterprise Miner: Sophisticated Statisticians segment
      • DB2 Intelligent Miner: database applications requiring mining
  • Emergence of the application of data mining in non-conventional domains
    • Combination of structured and unstructured data
  • New challenges due to security/privacy concerns
  • DARPA initiative to fund data mining research
Identifying Social Links Using Association Rules

Input: Crawl of about 1 million pages
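The slide gives only the input, so the following is a minimal sketch rather than the authors' method. It assumes each crawled page has already been reduced to the set of person names it mentions, and it derives pairwise association rules ("pages mentioning a also tend to mention b") from support and confidence. The names, the toy pages, and both thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(pages, min_support=2, min_confidence=0.6):
    """Return sorted rules (lhs, rhs, support, confidence) over name pairs.

    pages: iterable of sets of names, one set per crawled page (assumed input).
    support: number of pages mentioning both names.
    confidence: support divided by the number of pages mentioning lhs.
    """
    name_count = Counter()
    pair_count = Counter()
    for names in pages:
        unique = sorted(set(names))
        name_count.update(unique)
        pair_count.update(combinations(unique, 2))

    rules = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / name_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical toy crawl: five pages and the names found on each.
pages = [
    {"Ann", "Bob"},
    {"Ann", "Bob", "Cid"},
    {"Ann", "Cid"},
    {"Bob"},
    {"Ann", "Bob"},
]
rules = pair_rules(pages)
```

A rule with high confidence in both directions (here Ann and Bob) would be a candidate social link. A crawl of a million pages would need a scalable frequent-itemset algorithm rather than this in-memory counting.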

Website Profiling using Classification

Input: Example pages for each category during training
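The slide does not say which classifier was used. As one possible illustration, the sketch below trains a multinomial naive Bayes model with add-one smoothing on example pages per category and then assigns a category to a new page. The categories, training texts, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (category, page_text). Returns (word_counts, doc_counts, vocab)."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for category, text in examples:
        words = text.lower().split()
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the category with the highest log-posterior for the page text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in sorted(doc_counts):
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)  # class prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[category][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best, best_score = category, score
    return best

# Hypothetical training pages for two categories.
training = [
    ("sports", "match score team goal league"),
    ("sports", "team coach season score"),
    ("finance", "stock market price shares"),
    ("finance", "bank interest price market"),
]
model = train(training)
```

Calling `classify(model, "team score tonight")` picks the sports category for this toy data; a whole website could then be profiled by aggregating the predicted categories of its pages.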

Discovering Trends Using Sequential Patterns & Shape Queries

Input: i) patent database ii) shape of interest
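To make the idea of a shape query concrete, here is a small sketch under stated assumptions: the sequential-pattern step is assumed to have already produced, for each phrase, a series of per-period occurrence counts, and a "shape" is expressed as a sequence of up, down, or flat moves. The phrase names, counts, and function names are hypothetical and not taken from the presentation.

```python
def to_moves(series, tolerance=0):
    """Convert a numeric series into a list of 'up' / 'down' / 'flat' moves."""
    moves = []
    for prev, cur in zip(series, series[1:]):
        if cur - prev > tolerance:
            moves.append("up")
        elif prev - cur > tolerance:
            moves.append("down")
        else:
            moves.append("flat")
    return moves

def find_shape(series, shape, tolerance=0):
    """Return the start indices at which the move sequence `shape` occurs."""
    moves = to_moves(series, tolerance)
    n = len(shape)
    return [i for i in range(len(moves) - n + 1) if moves[i:i + n] == list(shape)]

# Hypothetical per-period counts of two phrases in a patent collection.
counts = {
    "phrase A": [2, 5, 9, 14, 11],
    "phrase B": [8, 7, 7, 6, 3],
}

# Shape of interest: three consecutive increases.
rising = [p for p, s in counts.items() if find_shape(s, ["up", "up", "up"])]
```

Here only "phrase A" matches the rising shape. The `tolerance` parameter lets small fluctuations count as flat.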

Discovering Micro-communities

Frequently co-cited pages are related. Pages with large bibliographic overlap are related.
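The two relatedness signals named on the slide can be computed directly from a link graph. The sketch below is illustrative only: it assumes the graph is available as a mapping from each page to the set of pages it cites, and the page identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_citation(links):
    """For each pair of cited pages, count how many pages cite both."""
    counts = Counter()
    for cited in links.values():
        counts.update(combinations(sorted(cited), 2))
    return counts

def bibliographic_overlap(links):
    """For each pair of citing pages, count the pages they both cite."""
    counts = Counter()
    for a, b in combinations(sorted(links), 2):
        shared = len(links[a] & links[b])
        if shared:
            counts[(a, b)] = shared
    return counts

# Hypothetical link graph: page -> set of pages it links to.
links = {
    "p1": {"x", "y", "z"},
    "p2": {"x", "y"},
    "p3": {"y", "z"},
    "p4": {"w"},
}
cocited = co_citation(links)
overlap = bibliographic_overlap(links)
```

Pairs with high counts in either table are candidates for membership in the same micro-community; a real pipeline would still need a thresholding or clustering step on top of these counts.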

New Challenges
  • Privacy-preserving data mining
  • Data mining over compartmentalized databases
Inducing Classifiers over Privacy Preserved Numeric Data

[Figure residue. Records shown: 30 | 25K | …, 50 | 40K | …, 65 | 50K | …, 35 | 60K | …. Panels: Age Distribution, Salary Distribution, Decision Tree. Labels: Alice’s age, Alice’s salary, John’s age.]

Example of perturbation: 30 becomes 65 (30+35)
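The "30 becomes 65" example suggests that each value is disguised by adding random noise before it leaves its owner. The sketch below illustrates why aggregate statistics can still be recovered from such data. It is a simplification: it estimates only the mean and variance of the original values, not the full distribution shown in the figure, and the uniform noise range, sample size, and function names are assumptions made for illustration.

```python
import random
import statistics

NOISE_HALF_WIDTH = 50  # assumed: noise is uniform on [-50, 50]

def perturb(values, rng):
    """Add independent uniform noise to every value (e.g. 30 + 35 -> 65)."""
    return [v + rng.uniform(-NOISE_HALF_WIDTH, NOISE_HALF_WIDTH) for v in values]

def estimate_mean_and_variance(perturbed):
    """Estimate the mean and variance of the original values.

    The noise has zero mean, so the mean carries over unchanged; the noise
    is independent of the data, so its known variance can be subtracted.
    """
    noise_variance = (2 * NOISE_HALF_WIDTH) ** 2 / 12  # variance of uniform noise
    mean = statistics.fmean(perturbed)
    variance = statistics.pvariance(perturbed) - noise_variance
    return mean, variance

rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_ages = [rng.randint(20, 60) for _ in range(20000)]  # hypothetical ages
noisy_ages = perturb(true_ages, rng)
est_mean, est_var = estimate_mean_and_variance(noisy_ages)
```

Individual perturbed ages can be far from the true ones, yet `est_mean` and `est_var` land close to the statistics of `true_ages` when the sample is large. A classifier would then be built from reconstructed distributions rather than from the true records.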

Other recent work
  • Cryptographic approach to privacy-preserving data mining
    • Lindell & Pinkas, Crypto 2000
  • Privacy-preserving discovery of association rules
    • Vaidya & Clifton, KDD 2002
    • Evfimievski et al., KDD 2002
    • Rizvi & Haritsa, VLDB 2002
Some Hard Problems
  • Past may be a poor predictor of future
    • Abrupt changes
    • Wrong training examples
  • Actionable patterns (principled use of domain knowledge?)
  • Over-fitting vs. not missing the rare nuggets
  • Richer patterns
  • Simultaneous mining over multiple data types
  • When to use which algorithm?
  • Automatic, data-dependent selection of algorithm parameters
  • Should data mining be viewed as “rich” querying and “deeply” integrated with database systems?
    • Most current work makes little use of database functionality
  • Should analytics be an integral concern of database systems?
  • Issues in data mining over heterogeneous data repositories (Relationship to the heterogeneous systems discussion)
  • Data mining has shown promise but needs much more research

We stand on the brink of great new answers, but even more, of great new questions -- Matt Ridley