
KDD Cup 2007 Task I Algorithm & Analysis

Neural Network Final Project. KDD Cup 2007 Task I Algorithm & Analysis. Students: M9615039 胡正穎 [emil0928@gmail.com], M9615902 張馨文 [shinwen65@gmail.com]. Group number: 10. Advisor: Dr. Hahn-Ming Lee.


Presentation Transcript


  1. Neural Network Final Project: KDD Cup 2007 Task I Algorithm & Analysis. Students: M9615039 胡正穎 [emil0928@gmail.com], M9615902 張馨文 [shinwen65@gmail.com]. Group number: 10. Advisor: Dr. Hahn-Ming Lee

  2. Outline • Introduction • Data Set • Method and System • Training and Test Set • Result • Analysis

  3. Introduction • Our Task Description • The task is to predict which users rated which movies in 2006. • From the information on the KDD Cup 2007 website, we obtained the training data and the answer data format. We try to find the relations among the training data set files, with the aim of correctly predicting the 2006 ratings.

  4. Data Set • Our data set structure

  5. Method and System • Method I • Expand movie_id, customer_id, and the rating the customer gave into independent elements of a matrix. Each year has one characteristic matrix; the matrices from the individual years are then used for training.
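
One possible reading of Method I is that every rating record becomes one row of a per-year matrix whose three columns are movie_id, customer_id, and rating. The sketch below assumes the records have already been parsed into (year, movie_id, customer_id, rating) tuples; the function name `build_triplet_matrices` and the sample records are hypothetical, not taken from the project.

```python
from collections import defaultdict

# Hypothetical parsed record: (year, movie_id, customer_id, rating).
Record = tuple[int, int, int, int]


def build_triplet_matrices(records: list[Record]) -> dict[int, list[list[int]]]:
    """Group records by year; each row is [movie_id, customer_id, rating]."""
    matrices = defaultdict(list)
    for year, movie_id, customer_id, rating in records:
        # The three values become independent columns of that year's matrix.
        matrices[year].append([movie_id, customer_id, rating])
    return dict(matrices)


if __name__ == "__main__":
    sample = [
        (2004, 1, 30878, 4),
        (2004, 8, 30878, 3),
        (2005, 1, 2647871, 5),
    ]
    for year, rows in sorted(build_triplet_matrices(sample).items()):
        print(year, rows)
```

In this layout each yearly matrix has one row per rating and exactly three columns, so its size grows with the number of ratings rather than with the ID ranges.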

  6. Method and System (cont.) • Method II • Based on the training data set of each year (2002-2005), we organize the data into three matrices whose rows are movie_id and whose columns are customer_id. Each year has one characteristic matrix; the matrices from the individual years are then used alternately for training.
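
A dense movie × customer array is far too large to allocate, so a sketch of Method II can store each year's matrix sparsely, keeping only the observed ratings and treating a missing entry as 0. The nested-dict layout and the names `build_rating_matrices`, `entry`, and `alternate_years` are assumptions for illustration; `alternate_years` shows one hypothetical way to feed the yearly matrices to training in turn.

```python
from collections import defaultdict

# Hypothetical parsed record: (year, movie_id, customer_id, rating).
Record = tuple[int, int, int, int]
SparseMatrix = dict[int, dict[int, int]]  # movie_id -> {customer_id: rating}


def build_rating_matrices(records: list[Record]) -> dict[int, SparseMatrix]:
    """One sparse movie x customer matrix per year: m[year][movie_id][customer_id]."""
    nested = defaultdict(lambda: defaultdict(dict))
    for year, movie_id, customer_id, rating in records:
        nested[year][movie_id][customer_id] = rating
    # Convert to plain dicts so later lookups cannot create empty entries.
    return {year: {movie: dict(row) for movie, row in rows.items()}
            for year, rows in nested.items()}


def entry(matrix: SparseMatrix, movie_id: int, customer_id: int) -> int:
    """Dense-style lookup; 0 means the customer did not rate the movie that year."""
    return matrix.get(movie_id, {}).get(customer_id, 0)


def alternate_years(matrices: dict[int, SparseMatrix], passes: int):
    """Yield (year, matrix) pairs, cycling through the years in order on each pass."""
    years = sorted(matrices)
    for _ in range(passes):
        for year in years:
            yield year, matrices[year]
```

A training loop would iterate `for year, matrix in alternate_years(matrices, passes)` and update the model on one year's matrix at a time.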

  7. Training and Test Set • Because of the large numbers of movies and customers, the resulting matrix is very large (17,770 × 2,649,429). • We therefore limit the problem domain to the data that appear in the answer file.
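
A minimal sketch of that restriction, assuming the answer file has already been parsed into a list of (movie_id, customer_id) pairs: keep only records whose movie and customer both occur there, and map the surviving IDs to consecutive indices so the matrix shrinks from 17,770 × 2,649,429 to (movies in the answer file) × (customers in the answer file). Requiring both IDs to match, and the names `compact_index` and `restrict_to_answer_file`, are assumptions.

```python
# Hypothetical inputs: records are (year, movie_id, customer_id, rating) tuples;
# answer_pairs is a list (iterated twice) of (movie_id, customer_id) tuples.


def compact_index(ids):
    """Map each distinct ID to a consecutive index 0..k-1, in ascending ID order."""
    return {id_: i for i, id_ in enumerate(sorted(set(ids)))}


def restrict_to_answer_file(records, answer_pairs):
    """Keep records whose movie AND customer occur in the answer file, re-indexed."""
    movie_idx = compact_index(m for m, _ in answer_pairs)
    customer_idx = compact_index(c for _, c in answer_pairs)
    kept = [
        (year, movie_idx[m], customer_idx[c], rating)
        for year, m, c, rating in records
        if m in movie_idx and c in customer_idx
    ]
    return kept, movie_idx, customer_idx


if __name__ == "__main__":
    answers = [(1, 30878), (8, 2647871)]
    records = [(2004, 1, 30878, 4), (2004, 5, 30878, 3), (2005, 8, 2647871, 5)]
    kept, movies, customers = restrict_to_answer_file(records, answers)
    print("matrix shape:", (len(movies), len(customers)))
    print(kept)
```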

  8. Training and Test Set (cont.) • Our matrix is too large for the MATLAB program to handle.
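
The arithmetic behind that limit is easy to check. A dense 17,770 × 2,649,429 matrix has 47,080,353,330 cells; at 8 bytes per entry (MATLAB's default `double` class) that is roughly 376.6 GB. The helper name `dense_bytes` is hypothetical.

```python
def dense_bytes(rows: int, cols: int, bytes_per_entry: int = 8) -> int:
    """Memory for a full dense matrix; 8 bytes per entry matches a double."""
    return rows * cols * bytes_per_entry


if __name__ == "__main__":
    n = dense_bytes(17770, 2649429)
    print(f"{17770 * 2649429:,} entries, {n / 1e9:.1f} GB, {n / 2**30:.1f} GiB")
```

Storing only the observed ratings, for example with MATLAB's `sparse` function, makes memory grow with the number of ratings rather than with rows × columns.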

  9. Result • Our results

  10. Result (cont.)

  11. Analysis • How can we choose which information in the training data set is effective? • We gathered statistics on ratings 1-5, counting the number of users and movies at each average rating: almost 6,000 movies received, and almost 200,000 customers gave, an average rating of 3.5. (This examines the relation between rating and movie ID, and between rating and customer ID.) • On the other hand, most users do not watch the same movie again, so customer ID has little effect on the prediction and can be discarded.
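
The statistic above can be reproduced roughly as follows: compute each movie's and each customer's average rating, then count how many IDs share each average. Rounding averages to the nearest 0.1 is an assumption, since the slide does not say how averages were binned; records are assumed to be (year, movie_id, customer_id, rating) tuples, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def average_ratings(records, key_index):
    """Average rating per ID; key_index 1 selects movie_id, 2 selects customer_id."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for record in records:
        key = record[key_index]
        totals[key] += record[3]  # record[3] is the rating (1-5)
        counts[key] += 1
    return {k: totals[k] / counts[k] for k in totals}


def histogram_of_averages(averages):
    """Count how many IDs have each average rating, rounded to the nearest 0.1."""
    return Counter(round(10 * avg) / 10 for avg in averages.values())


if __name__ == "__main__":
    recs = [(2004, 1, 10, 3), (2004, 1, 11, 4), (2005, 2, 10, 4), (2005, 2, 11, 4)]
    print("movies:", histogram_of_averages(average_ratings(recs, 1)))
    print("customers:", histogram_of_averages(average_ratings(recs, 2)))
```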

  12. Analysis (cont.) • Ideas for raising the accuracy • We try: • (1) adding weights to the training set • (2) increasing the learning rate • (3) more training runs • (4) adjusting the number of network layers
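
Ideas (1)-(3) can be illustrated with a single sigmoid unit trained by stochastic gradient descent, where each training example carries a weight that scales its gradient. This is a minimal sketch, not the network used in the project: the features, the label meaning (1 = the pair was rated), and the names `train_weighted_unit` and `predict` are hypothetical. Idea (4) would replace the single unit with a multi-layer network, omitted here for brevity.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Probability output of the unit for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_weighted_unit(samples, labels, sample_weights,
                        learning_rate=0.1, epochs=100, seed=0):
    """Stochastic gradient descent on a sample-weighted cross-entropy loss.

    samples: feature vectors; labels: 0/1 targets (hypothetically, 1 = pair rated);
    sample_weights: per-example weights (idea 1). learning_rate and epochs
    correspond to ideas 2 and 3.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y, s = samples[i], labels[i], sample_weights[i]
            # d/dz of s * CE(y, sigmoid(z)) is s * (p - y).
            g = s * (predict(w, b, x) - y)
            w = [wj - learning_rate * g * xj for wj, xj in zip(w, x)]
            b -= learning_rate * g
    return w, b


if __name__ == "__main__":
    # Same input seen twice with conflicting labels; the heavier label should win.
    w, b = train_weighted_unit([[1.0], [1.0]], [1, 0], [1.0, 3.0],
                               learning_rate=0.1, epochs=500)
    print(round(predict(w, b, [1.0]), 2))
```

Raising a sample's weight has the same effect on the gradient as repeating that sample, which is one simple way to emphasize the more informative parts of the training set.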

  13. Thank you! The End
