
Practical Lessons of Data Mining at Yahoo!

Practical Lessons of Data Mining at Yahoo! Presenter: Jun-Yi Wu. Authors: Ye Chen, Dmitry Pavlov, Pavel Berkhin, Aparna Seetharaman, Albert Meltzer. 國立雲林科技大學 National Yunlin University of Science and Technology. 2009 CIKM. Outline. Motivation Objective Experience


Presentation Transcript


  1. Practical Lessons of Data Mining at Yahoo! Presenter: Jun-Yi Wu Authors: Ye Chen, Dmitry Pavlov, Pavel Berkhin, Aparna Seetharaman, Albert Meltzer 國立雲林科技大學 National Yunlin University of Science and Technology 2009 CIKM

  2. Outline • Motivation • Objective • Experience • Conclusion • Comments

  3. Motivation (figure labels: Information, Raw Data) • The usage of data in many commercial applications has been growing at an unprecedented pace in the last decade. • While successful data mining efforts have led to major business advances, there were also numerous, less publicized efforts that failed for one reason or another.

  4. Objective • To discuss practical lessons based on years of our data mining experiences at Yahoo! • To offer insights into how to drive the data mining effort to success in a business environment. • To reflect on four success factors: methodology, data, infrastructure, and people.

  5. Success Factors • Methodology • Data • A Data-driven Perspective • Data Preprocessing • Data Size and Sampling • Data Distribution • Data Understanding • Modeling Goals and Evaluation

  6. Success Factors • Infrastructure • An infrastructure for Web-scale Data • Gridification • The Scalability Dilemma • People • Engaging the Wider Community

  7. Success Factors • Methodology • Many companies fail to take full advantage of their data because they do not apply data mining techniques to study, manage and learn from their data.

  8. Success Factors • Data: A Data-driven Perspective • Companies habitually rely on their "gut feelings" instead of relying on the data to drive decision-making. • That being said, one should not underestimate the importance of domain knowledge. • We argue that domain knowledge should guide empirical investigation, especially at the exploratory stage.

  9. Success Factors • Data: Data Preprocessing • The data mining process starts with data preprocessing, or so-called ETL (extract, transform and load), during which raw user data logs go through a series of perturbations and get loaded into a data warehouse (DW). • ETL may introduce biases in downstream data. • The timestamp may not be consistently normalized. • Data consistency is a big challenge. • Data integration is a big architectural challenge.
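As an illustration of the timestamp point on the slide above, here is a minimal sketch of normalizing log timestamps to UTC during an ETL transform step. The function name `to_utc`, the ISO 8601 input format, and the rule that timestamps without an offset are already in UTC are all assumptions made for this example; they are not taken from the presentation or from Yahoo!'s pipeline.

```python
from datetime import datetime, timezone


def to_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp string and return an aware UTC datetime.

    Assumption (hypothetical rule for this sketch): a timestamp with no
    UTC offset is treated as already being in UTC. A real pipeline would
    need to know the time zone of each log source instead.
    """
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # No offset in the log line: label it as UTC without shifting it.
        dt = dt.replace(tzinfo=timezone.utc)
    # Convert any offset-aware value to UTC so all records are comparable.
    return dt.astimezone(timezone.utc)


# Two hypothetical log timestamps that describe the same instant
# but were written by sources using different conventions.
pacific = to_utc("2009-11-02T08:15:00-08:00")
plain = to_utc("2009-11-02T16:15:00")
same_instant = pacific == plain
```

Mixing the two raw strings without such a step would place the same event eight hours apart, which is one way an ETL stage could bias downstream aggregates such as hourly counts.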

  10. Success Factors • Data: Data Distribution

  11. Success Factors • Data • A Data-driven Perspective • Data Preprocessing • Data Size and Sampling • Data Distribution • Data Understanding • Modeling Goals and Evaluation

  12. Conclusion

  13. Comments • Advantage • Drawback • … • Application • Information Search and Retrieval
