Mainlining Data Mining:

This panel talk explores the current state of data mining technology, its challenges and opportunities, and the research issues that will shape its transition to the mainstream. Topics include data scrubbing, visualization, understanding, and new opportunities such as text mining, time series analysis, and domain-specific applications.


Presentation Transcript


  1. Mainlining Data Mining. Jim Gray, Microsoft. Panel talk at ICDE2000, San Diego, 2 Mar 2000.

  2. Is data mining still a niche technology?
     • 97,363 items on Northern Light re “data mining”
     • 9,075,288 items re “data base” or “database”
     • Is 100,000 items a niche? (OR: 14K, XML: 250K)
     • Today, data mining tools are for experts (statisticians): decision trees, clusters, K-means, neural nets…
     • High tech and high touch, aka consulting and license fees. And the vendors like it that way.
     • The claim is that you MUST understand the technology to use it.

  3. But... the petabytes are coming!
     • We will be (or already are) drowning in data, email, and the web.
     • Abstraction & categorization are key technologies
     • But:
       • They have to work.
       • They have to be trivial to learn.
     • Successful ubiquitous data mining (clustering/classifiers…):
       • Mail filters/classifiers
       • Resume readers
       • Shopping recommendations, community finders
       • Web search engines

  4. Key technical/research issues for transition to the mainstream? Process problems:
     • Getting data into the tool is hell
     • Scrubbing data is hell
     • Then comes the easy part: mining
     • Then comes the really hard part: visualization and understanding
     • Most of us:
       • Can’t understand neural nets (that’s bad).
       • Can’t understand statistics (that’s a fact).

  5. Key technical/research issues for transition to the mainstream? Opportunities: it’s not just numbers
     • Text mining
     • Time series
     • Domain specific
     • Web logs
     • Protein patterns
     • Spatial (e.g. geology, astronomy)
     • Image

  6. New opportunities for KDM?
     [Figure: data cube over make (FORD, CHEVY), year (1990 to 1993), and color (RED, WHITE, BLUE), with aggregates By Year, By Make, By Color, By Make & Year, By Color & Year, By Make & Color, and Sum]
     • Make data capture/scrub/import trivial
     • Provide intuitive manipulation interfaces
     • Provide simpler analysis concepts: support/confidence concept, precision/recall, ranking, pivot & rollup & cube
     • Provide an interactive visual data explorer.
     • Case in point: I have yet to see a nice data cube visualizer.
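
The cube on this slide aggregates one measure over every combination of make, year, and color. A minimal Python sketch of that idea follows. The rows, the unit counts, and the names `ROWS`, `DIMS`, and `cube` are all invented for illustration and are not from the talk; a real system would typically compute this in the database rather than in application code.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (make, year, color, units). All values are invented.
ROWS = [
    ("FORD", 1990, "RED", 5),
    ("FORD", 1990, "WHITE", 3),
    ("FORD", 1991, "BLUE", 4),
    ("CHEVY", 1990, "RED", 2),
    ("CHEVY", 1991, "WHITE", 6),
    ("CHEVY", 1992, "BLUE", 1),
]
DIMS = ("make", "year", "color")


def cube(rows, dims=DIMS):
    """Sum the last column of each row for every subset of the dimensions.

    Returns {grouping: {key: total}}, where grouping is a tuple of dimension
    names and key is the matching tuple of dimension values. The empty
    grouping () holds the grand total under the empty key ().
    """
    result = {}
    for size in range(len(dims) + 1):
        for idx in combinations(range(len(dims)), size):
            grouping = tuple(dims[i] for i in idx)
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in idx)
                totals[key] += row[-1]
            result[grouping] = dict(totals)
    return result


c = cube(ROWS)
print(c[()])                  # the "Sum" cell
print(c[("make",)])           # "By Make"
print(c[("make", "year")])    # "By Make & Year"
```

With three dimensions the sketch produces eight groupings, which correspond to the seven labelled aggregates in the figure plus the full-detail grouping by make, year, and color.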

  7. Research challenges that will impact data mining?
     • Simpler analysis concepts
     • Visualization tools to navigate data
     • Better algorithms = Better answers
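
Two of the simpler analysis concepts named in the talk, support/confidence and precision/recall, each reduce to a pair of ratios. The sketch below uses the standard textbook definitions; the basket data, the document ids, and the function names are hypothetical and only serve as an illustration.

```python
# Hypothetical shopping baskets; the items are invented for illustration.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Of the baskets containing lhs, the fraction that also contain rhs.

    Assumes lhs appears in at least one basket.
    """
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(support({"bread", "milk"}, BASKETS))       # rule support
print(confidence({"bread"}, {"milk"}, BASKETS))  # rule confidence
print(precision_recall({1, 2, 3, 4}, {2, 4, 5}))
```

In this toy data the rule "bread implies milk" holds in two of the three baskets that contain bread, which is the kind of statement a non-statistician can read directly.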
