Neural Network homework-2

Neural Network homework-2. Title: Using a dataset from KDD Cup 2007 (or KDD Cup 2008), derive the best learning algorithm and complete the final analysis for Task 1 or Task 2. Group: Group 1. Members: Tang Chia Ping (M9615010), HSIEH HSIN JU (M9605103). Submission date: December 28, 96.


Presentation Transcript


  1. Neural Network homework-2 Title: Using a dataset from KDD Cup 2007 (or KDD Cup 2008), derive the best learning algorithm and complete the final analysis for Task 1 or Task 2. Group: Group 1. Members: Tang Chia Ping (M9615010), HSIEH HSIN JU (M9605103). Submission date: December 28, 96

  2. 1. Introduction The first task in KDD Cup 2007 is to predict which users rated which movies in 2006, given the Netflix Prize training data set, which contains more than 100 million ratings from over 480 thousand users on nearly 18 thousand movie titles, collected between 1998 and 2005. In our work, we cast the task as a link prediction problem and address it with a simple classification approach.
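One way to read "link prediction as classification" is to treat every (user, movie) pair as an example labelled 1 if the user rated the movie and 0 otherwise. The sketch below illustrates that framing only; the function name `build_link_examples`, the toy IDs, and the choice to draw negatives by random sampling are assumptions, not details taken from the slides.

```python
import random

def build_link_examples(observed_pairs, users, movies, n_negative, seed=0):
    """Turn observed (user, movie) links into labelled classification examples."""
    observed = set(observed_pairs)
    # Every observed rating event is a positive example (label 1).
    positives = [(u, m, 1) for (u, m) in sorted(observed)]
    # Unobserved pairs are candidate negatives (label 0). Enumerating them all
    # is only feasible for toy data, not for the full Netflix-scale matrix.
    candidates = [
        (u, m)
        for u in sorted(set(users))
        for m in sorted(set(movies))
        if (u, m) not in observed
    ]
    rng = random.Random(seed)
    negatives = rng.sample(candidates, min(n_negative, len(candidates)))
    return positives + [(u, m, 0) for (u, m) in negatives]

# Hypothetical toy data: three users, two movies, three observed ratings.
observed = [(1, 10), (1, 11), (2, 10)]
examples = build_link_examples(observed, users=[1, 2, 3], movies=[10, 11], n_negative=2)
for user, movie, label in examples:
    print(user, movie, label)
```

Each labelled triple can then be expanded into a feature vector and passed to any binary classifier.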

  3. 1-1 The Movies Description • This chart plots the number of movies by year. The figure shows that the number of films increased year by year, reached its highest point in 2004, and then dropped suddenly in 2005 to roughly 500 films. From this we can see that the year is relevant and is likely one of the factors that affects the ratings.

  4. 1-2 Training Dataset File Description •
     MovieID1,CustomerID11,Date11,Date11,YearOfRelease11
     MovieID1,CustomerID12,Date12,Date12,YearOfRelease12
     ...
     MovieID2,CustomerID21,Date21,Date21,YearOfRelease21
     MovieID2,CustomerID22,Date22,Date22,YearOfRelease22
     ...

  5. • MovieIDs range from 1 to 17770 sequentially. • CustomerIDs range from 1 to 2649429, with gaps. There are 480189 users. • Dates have the format YYYY. • Year Of Release ranges from 1890 to 2005 and may correspond to the release of the corresponding DVD, not necessarily its theatrical release.
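The field ranges above can be checked while reading the file. The following is a minimal sketch, assuming each record is one comma-separated line with the five fields in the order shown on the previous slide; the names `Record` and `parse_record` and the sample line are hypothetical, and the two date fields are kept as separate strings because the slides list `Date` twice without explaining the difference.

```python
from typing import NamedTuple

class Record(NamedTuple):
    movie_id: int
    customer_id: int
    date_a: str          # first Date field, format YYYY
    date_b: str          # second Date field, format YYYY
    year_of_release: int

def parse_record(line):
    """Parse one comma-separated record and validate the documented ranges."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    movie_id, customer_id = int(fields[0]), int(fields[1])
    date_a, date_b = fields[2], fields[3]
    year_of_release = int(fields[4])
    if not 1 <= movie_id <= 17770:
        raise ValueError(f"MovieID out of range: {movie_id}")
    if not 1 <= customer_id <= 2649429:
        raise ValueError(f"CustomerID out of range: {customer_id}")
    for date in (date_a, date_b):
        if len(date) != 4 or not date.isdigit():
            raise ValueError(f"Date is not in YYYY format: {date!r}")
    if not 1890 <= year_of_release <= 2005:
        raise ValueError(f"Year Of Release out of range: {year_of_release}")
    return Record(movie_id, customer_id, date_a, date_b, year_of_release)

# Hypothetical sample line, not taken from the real data set.
print(parse_record("1,42,2005,2005,2003"))
```

Rejecting malformed lines early keeps bad values out of the training matrix, which matters when the file is split into several parts and processed separately.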

  6. 1-3 Feature Selection The selected features are as follows: • User ID: a unique identifier for a user. • Movie Name: title of the movie. • User Movie Rating: a number between 1 and 5 (1 is lowest). • Average Rating by User: average rating over all movies rated by the user. • Average Popular Movie Rating by User: average rating over all popular movies rated by the user. • User Ratings: number of ratings by the user.
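The per-user features listed above can be computed in one pass over the ratings. This is a sketch under stated assumptions: ratings are given as `(user_id, movie_id, rating)` tuples, and a movie counts as "popular" when it has at least `popular_min_count` ratings. The slides do not define "popular", so that threshold, the function name `user_features`, and the toy data are all hypothetical.

```python
from collections import defaultdict

def user_features(ratings, popular_min_count=2):
    """Compute per-user rating features from (user_id, movie_id, rating) tuples."""
    # Count ratings per movie to decide which movies are "popular" (assumed rule).
    movie_counts = defaultdict(int)
    for _, movie, _ in ratings:
        movie_counts[movie] += 1
    popular = {m for m, count in movie_counts.items() if count >= popular_min_count}

    all_ratings = defaultdict(list)
    popular_ratings = defaultdict(list)
    for user, movie, rating in ratings:
        all_ratings[user].append(rating)
        if movie in popular:
            popular_ratings[user].append(rating)

    features = {}
    for user, values in all_ratings.items():
        pop = popular_ratings.get(user, [])
        features[user] = {
            "num_ratings": len(values),
            "avg_rating": sum(values) / len(values),
            # None when the user rated no popular movie.
            "avg_popular_rating": sum(pop) / len(pop) if pop else None,
        }
    return features

# Hypothetical toy ratings: movie 10 is the only one rated at least twice.
toy_ratings = [(1, 10, 5), (1, 11, 3), (2, 10, 4), (3, 12, 2)]
features = user_features(toy_ratings)
print(features[1])
```

Users who rated no popular movie get `None` rather than 0, so a missing value is not mistaken for a low rating.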

  7. 2. Analysis of the Results 2-1 Random Sampling Fig2. Random distribution

  8. 2-2 Training Parameters of the Network Fig3. Learning rate = 0.05

  9. 2-3 Network: Fig4. Network: (1) there are four neurons in the first layer; (2) there are three neurons in the second layer

  10. 2-4 Weight to layer Fig5. Weight to layer 1 Fig6. Weight to layer 2

  11. 2-5 Bias to layer Fig7. Bias to layer 1 Fig8. Bias to layer 2

  12. 2-6 Training with TRAINGDM Fig9. The Performance is 0.397427
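TRAINGDM is MATLAB's gradient descent with momentum training function. The sketch below shows the same idea in plain Python for a network with four neurons in the first layer and three in the second, using the learning rate of 0.05 from the slides. Everything else is an assumption made for illustration: the tanh hidden layer, the linear output layer, the mean squared error measure, the momentum constant `mc=0.9`, the `(1 - mc)` scaling of the gradient step, the epoch count, and the synthetic toy data. It is not the authors' MATLAB configuration and will not reproduce the reported performance value.

```python
import math
import random

def forward(x, w1, w2):
    """tanh hidden layer, then linear output; the last column of each matrix is the bias."""
    xb = x + [1.0]
    h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w1]
    hb = h + [1.0]
    y = [sum(w * v for w, v in zip(row, hb)) for row in w2]
    return xb, hb, y

def mse(data, w1, w2):
    """Mean squared error over all samples and outputs."""
    errs = [(yi - ti) ** 2 for x, t in data for yi, ti in zip(forward(x, w1, w2)[2], t)]
    return sum(errs) / len(errs)

def train_gdm(data, n_hidden=4, n_out=3, lr=0.05, mc=0.9, epochs=300, seed=0):
    """Batch gradient descent with momentum on a two-layer network."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    v1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]   # momentum terms, layer 1
    v2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]  # momentum terms, layer 2
    n_terms = len(data) * n_out
    history = [mse(data, w1, w2)]
    for _ in range(epochs):
        g1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
        for x, t in data:
            xb, hb, y = forward(x, w1, w2)
            dy = [2.0 * (yi - ti) / n_terms for yi, ti in zip(y, t)]  # dMSE/dy
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    g2[k][j] += dy[k] * hb[j]
            for j in range(n_hidden):
                # Backpropagate through the linear output and the tanh derivative.
                dh = sum(dy[k] * w2[k][j] for k in range(n_out)) * (1.0 - hb[j] ** 2)
                for i in range(n_in + 1):
                    g1[j][i] += dh * xb[i]
        # Momentum update: blend the previous step with the new gradient step.
        for w, v, g in ((w1, v1, g1), (w2, v2, g2)):
            for r in range(len(w)):
                for c in range(len(w[r])):
                    v[r][c] = mc * v[r][c] - lr * (1.0 - mc) * g[r][c]
                    w[r][c] += v[r][c]
        history.append(mse(data, w1, w2))
    return w1, w2, history

# Synthetic toy data: 4 inputs in [0, 1] and 3 targets derived from them.
data_rng = random.Random(1)
toy = []
for _ in range(20):
    x = [data_rng.random() for _ in range(4)]
    toy.append((x, [x[0], 0.5 * x[1], x[2] - x[3]]))

w1, w2, history = train_gdm(toy)
print(f"MSE before training: {history[0]:.4f}, after: {history[-1]:.4f}")
```

The momentum term carries part of the previous update into the current one, which smooths the descent compared with plain gradient descent at the same learning rate.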

  13. 3. Discussion • The most important property of a neural network is that its parameters can be adjusted so that the network exhibits the expected or desired behaviour. • Neural network operation is divided into two parts: training, which determines the weights and biases of the network, and simulation, which uses the trained network to predict output values or to verify the accuracy of the network. The simplest and most widely used approach is supervised learning.

  14. • KDD Cup 2007 has two main tasks: Task 1, Who Rated What, and Task 2, How Many Ratings. Both tasks were difficult for us because the data set is so large, with up to 17,700 items to process. We had no choice but to split it into several parts to run in MATLAB, and we added many factors that we believed would affect the result. • We used MATLAB to process the data, running the programs on MovieIDs, CustomerIDs, Dates, and Year Of Release, with a learning rate of 0.05, four neurons in the first layer, and three neurons in the second layer. The resulting performance is 0.397427. • The implementation gave us considerable trouble, so we consulted many references for help; the procedure is detailed in the steps described.

  15. 4. References • Saharon Rosset, Claudia Perlich and Yan Liu, "KDD Cup 2007 Task 2 Winner's Report" • George S. Davidson, Brian N. Wylie and Kevin W. Boyack, "Cluster Stability and the Use of Noise in Interpretation of Clustering" • Eamonn Keogh and Christian Shelton, "Workshop and Challenge on Time Series Classification" • Yan Liu and Zhenzhen Kou, "Predicting Who Rated What in Large-scale Datasets" • Miklos Kurucz, Istvan Nagy, Andras A. Benczur, Adrienn Szabo, Tamas Kiss and Balazs Torma, "Who Rated What: a combination of SVD, correlation and frequent sequence mining" • James Malaugh, Sachin Gangaputra and Nikhil Rastogi (Inductis), "KDD Cup 2007 – How often will that movie be rated?"

  16. Thank you for listening to our report.
