In this data mining hackathon solution, we use the Naive Bayes algorithm to predict the probability that a user clicks a SKU under a given context. Incorporating time information and moving from unigrams to bigrams improves prediction accuracy, and techniques such as query correction, lemmatization, and word splitting refine the model further. The key emphasis is on data preprocessing and feature engineering.
RapStar’s Solution to the Data Mining Hackathon on the Best Buy Mobile Site (Kingsfield, Dragon)
Beat Benchmark • Naive Bayes • We want to know the probability that a user clicks a SKU under a given context. • We use the query as the context first. • So we have (the standard Naive Bayes posterior): P(sku | query) ∝ P(sku) · ∏ P(word | sku) • Select the 5 items with the highest predicted probability as the prediction.
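The ranking step above can be sketched as follows. This is a minimal illustration, assuming the training data is a list of (query, clicked_sku) pairs; the function names and the Laplace smoothing constant are illustrative, not taken from the original solution.

```python
from collections import Counter, defaultdict

def train(pairs):
    """pairs: iterable of (query, clicked_sku)."""
    prior = Counter()                    # click counts per SKU -> P(sku)
    word_counts = defaultdict(Counter)   # word counts per SKU -> P(word | sku)
    vocab = set()
    for query, sku in pairs:
        prior[sku] += 1
        for w in query.lower().split():
            word_counts[sku][w] += 1
            vocab.add(w)
    return prior, word_counts, vocab

def predict(query, prior, word_counts, vocab, k=5, alpha=1.0):
    words = query.lower().split()
    total_clicks = sum(prior.values())
    scores = {}
    for sku, count in prior.items():
        total_words = sum(word_counts[sku].values())
        score = count / total_clicks     # prior P(sku)
        for w in words:
            # Laplace-smoothed likelihood P(word | sku)
            score *= (word_counts[sku][w] + alpha) / (total_words + alpha * len(vocab))
        scores[sku] = score
    # Select the k SKUs with the highest posterior as the prediction
    return [s for s, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]
```

With per-SKU word counts in hand, scoring a query is one pass over the candidate SKUs.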
Use Time information • Time is a good feature in data mining. • Divide the data into 12 time periods based on the click_time field. • Use the frequency within the time period that click_time falls into as the “prior”, instead of the global frequency. • Smooth the data.
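The time-bucketed prior with smoothing could look like the sketch below. It assumes click_time is an "HH:MM:SS"-style timestamp and uses 12 two-hour buckets; the bucket scheme and the smoothing toward the global frequency are illustrative assumptions, since the slides do not specify them.

```python
from collections import Counter
from datetime import datetime

def time_bucket(click_time):
    # 12 periods of 2 hours each, derived from the click_time field
    hour = datetime.strptime(click_time, "%H:%M:%S").hour
    return hour // 2

def build_prior(clicks, alpha=1.0):
    """clicks: iterable of (click_time, sku) pairs."""
    global_counts = Counter()
    period_counts = [Counter() for _ in range(12)]
    for click_time, sku in clicks:
        global_counts[sku] += 1
        period_counts[time_bucket(click_time)][sku] += 1

    total = sum(global_counts.values())

    def prior(sku, click_time):
        p = period_counts[time_bucket(click_time)]
        # Smooth the sparse per-period frequency with the global frequency
        num = p[sku] + alpha * global_counts[sku] / total
        den = sum(p.values()) + alpha
        return num / den

    return prior
```

Smoothing matters here because splitting the data 12 ways makes each per-period count much sparser than the global one.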
Unigram to Bigram • Likelihood of Naive Bayes: P(query | sku) = ∏ P(word | sku) • Here each factor is over a word. • Use bigrams instead of unigrams (single words). • Take the query “xbox call of duty” • Rerank (sort the words): “call duty of xbox” • Bigrams: [“call duty”, “call of”, “call xbox”, … “of xbox”] • Once we have bigram training data, the rest is the same as the unigram model. • Blend the unigram and bigram predictions.
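The bigram construction above can be sketched as follows: sort the query's words so raw word order does not matter, then take every word pair. The linear blending weight is an illustrative assumption, not the authors' tuned value.

```python
from itertools import combinations

def query_bigrams(query):
    # Rerank: "xbox call of duty" -> "call duty of xbox"
    words = sorted(query.lower().split())
    # All word pairs of the reranked query
    return [" ".join(pair) for pair in combinations(words, 2)]

def blend(unigram_score, bigram_score, w=0.5):
    # Simple linear blend of the unigram and bigram model scores;
    # the weight w is a placeholder, not the solution's actual value.
    return w * unigram_score + (1 - w) * bigram_score
```

The bigram features then feed the same Naive Bayes machinery as the unigram words.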
Data Processing • The most important part: query correction • Lemmatization • Split words and numbers • Query correction (a lightweight version) • Things that could further improve results: • Handling “x box”, “x men” • A better algorithm for query correction • Ranking predictions the user has already clicked lower.
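Two of the preprocessing steps above, splitting words from numbers and a lightweight query correction, might look like this sketch. The vocabulary and the closest-match strategy via difflib are illustrative stand-ins for the solution's own dictionary and correction algorithm.

```python
import re
import difflib

def split_words_numbers(query):
    # Insert a space at letter/digit boundaries: "xbox360" -> "xbox 360"
    return re.sub(r"(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])", " ", query.lower())

def correct(word, vocab, cutoff=0.8):
    # Lightweight query correction: snap each word to its closest
    # vocabulary entry, if one is similar enough
    match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
    return match[0] if match else word

def preprocess(query, vocab):
    query = split_words_numbers(query)
    return " ".join(correct(w, vocab) for w in query.split())
```

In practice the vocabulary would be built from the product catalog, so corrections snap queries toward terms the model has actually seen.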
Conclusion • Data preprocessing and feature engineering are the most important things.