In this data mining hackathon solution, we use the Naive Bayes algorithm to predict the probability that a user clicks a SKU under a given context. Incorporating time information and moving from unigrams to bigrams improves prediction accuracy, and techniques such as query correction, lemmatization, and word splitting refine the model further. The key emphasis is on data preprocessing and feature engineering.
RapStar’s Solution to the Data Mining Hackathon on the Best Buy Mobile Site (Kingsfield, Dragon)
Beat Benchmark • Naive Bayes • We want to know the probability that a user clicks a SKU under a given context. • We use the query as the context first. • So we have (the standard Naive Bayes posterior): P(sku | query) ∝ P(sku) · ∏ P(word | sku) • Select the 5 items with the highest predicted probability as the prediction.
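The ranking step above can be sketched as follows. This is a minimal illustration, assuming the training data is a list of (query, clicked_sku) pairs; the function names and the Laplace smoothing constant are illustrative, not taken from the original solution.

```python
from collections import Counter, defaultdict

def train(pairs):
    """pairs: iterable of (query, clicked_sku)."""
    prior = Counter()                    # click counts per SKU -> P(sku)
    word_counts = defaultdict(Counter)   # word counts per SKU -> P(word | sku)
    vocab = set()
    for query, sku in pairs:
        prior[sku] += 1
        for w in query.lower().split():
            word_counts[sku][w] += 1
            vocab.add(w)
    return prior, word_counts, vocab

def predict(query, prior, word_counts, vocab, k=5, alpha=1.0):
    words = query.lower().split()
    total_clicks = sum(prior.values())
    scores = {}
    for sku, count in prior.items():
        total_words = sum(word_counts[sku].values())
        score = count / total_clicks     # prior P(sku)
        for w in words:
            # Laplace-smoothed likelihood P(word | sku)
            score *= (word_counts[sku][w] + alpha) / (total_words + alpha * len(vocab))
        scores[sku] = score
    # Select the k SKUs with the highest posterior as the prediction
    return [s for s, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]
```

With per-SKU word counts in hand, scoring a query is one pass over the candidate SKUs.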
Use Time information • Time is a good feature in data mining. • Divide the data into 12 time periods based on the click_time field. • Use the frequency within the time period that click_time falls into as the “prior”, instead of the global frequency. • Smooth the data.
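The time-bucketed prior with smoothing could look like the sketch below. It assumes click_time is an "HH:MM:SS"-style timestamp and uses 12 two-hour buckets; the bucket scheme and the smoothing toward the global frequency are illustrative assumptions, since the slides do not specify them.

```python
from collections import Counter
from datetime import datetime

def time_bucket(click_time):
    # 12 periods of 2 hours each, derived from the click_time field
    hour = datetime.strptime(click_time, "%H:%M:%S").hour
    return hour // 2

def build_prior(clicks, alpha=1.0):
    """clicks: iterable of (click_time, sku) pairs."""
    global_counts = Counter()
    period_counts = [Counter() for _ in range(12)]
    for click_time, sku in clicks:
        global_counts[sku] += 1
        period_counts[time_bucket(click_time)][sku] += 1

    total = sum(global_counts.values())

    def prior(sku, click_time):
        p = period_counts[time_bucket(click_time)]
        # Smooth the sparse per-period frequency with the global frequency
        num = p[sku] + alpha * global_counts[sku] / total
        den = sum(p.values()) + alpha
        return num / den

    return prior
```

Smoothing matters here because splitting the data 12 ways makes each per-period count much sparser than the global one.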
Unigram to Bigram • Likelihood of Naive Bayes: P(query | sku) = ∏ P(word | sku) • Here each factor is over a word. • Use bigrams instead of unigrams (single words). • Take the query “xbox call of duty” • Rerank (sort the words): “call duty of xbox” • Bigrams: [“call duty”, “call of”, “call xbox”, … “of xbox”] • Once we have bigram training data, the rest is the same as the unigram model. • Blend the unigram and bigram predictions.
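The bigram construction above can be sketched as follows: sort the query's words so raw word order does not matter, then take every word pair. The linear blending weight is an illustrative assumption, not the authors' tuned value.

```python
from itertools import combinations

def query_bigrams(query):
    # Rerank: "xbox call of duty" -> "call duty of xbox"
    words = sorted(query.lower().split())
    # All word pairs of the reranked query
    return [" ".join(pair) for pair in combinations(words, 2)]

def blend(unigram_score, bigram_score, w=0.5):
    # Simple linear blend of the unigram and bigram model scores;
    # the weight w is a placeholder, not the solution's actual value.
    return w * unigram_score + (1 - w) * bigram_score
```

The bigram features then feed the same Naive Bayes machinery as the unigram words.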
Data Processing • The most important part: query correction • Lemmatization • Split words and numbers • Query correction (a lightweight version) • Things that could further improve results: • Handling “x box”, “x men” • A better algorithm for query correction • Ranking predictions the user has already clicked lower.
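Two of the preprocessing steps above, splitting words from numbers and a lightweight query correction, might look like this sketch. The vocabulary and the closest-match strategy via difflib are illustrative stand-ins for the solution's own dictionary and correction algorithm.

```python
import re
import difflib

def split_words_numbers(query):
    # Insert a space at letter/digit boundaries: "xbox360" -> "xbox 360"
    return re.sub(r"(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])", " ", query.lower())

def correct(word, vocab, cutoff=0.8):
    # Lightweight query correction: snap each word to its closest
    # vocabulary entry, if one is similar enough
    match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
    return match[0] if match else word

def preprocess(query, vocab):
    query = split_words_numbers(query)
    return " ".join(correct(w, vocab) for w in query.split())
```

In practice the vocabulary would be built from the product catalog, so corrections snap queries toward terms the model has actually seen.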
Conclusion • Data preprocessing and feature engineering are the most important things.