
KDD CUP 2007

KDD CUP 2007. Neural Network HW2, Group 14. Yu Szu-Hsien (M9609208), Ciou Yun-Rong (M9608305). How? (method & system). 1. Make into a matrix. By analyzing the film types that the customers have rated, we can predict the customers' ratings of other films of the same type.

Presentation Transcript


  1. KDD CUP 2007: Neural Network HW2, Group 14. Yu Szu-Hsien (M9609208), Ciou Yun-Rong (M9608305)

  2. How? (method & system) Group 14 HW 2

  3. 1. Make into a matrix
     • By analyzing the film types that the customers have rated, we can predict the customers' ratings of other films of the same type.
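The idea on this slide can be sketched as follows. This is a minimal illustration, not the group's actual code: the toy ratings, the `film_type` mapping, and the function names `build_matrix` and `predict_by_type` are all assumptions introduced here. It stores the customer × film matrix sparsely and predicts an unseen film's rating as the customer's mean rating over films of the same type.

```python
from collections import defaultdict

# Hypothetical toy data: (customer, film, rating) triples and a film -> type map.
ratings = [
    ("u1", "Alien", 5), ("u1", "Aliens", 4), ("u1", "Amelie", 2),
    ("u2", "Alien", 3), ("u2", "Amelie", 5),
]
film_type = {"Alien": "sci-fi", "Aliens": "sci-fi",
             "Blade Runner": "sci-fi", "Amelie": "romance"}

def build_matrix(triples):
    """Return a sparse customer x film matrix as nested dicts."""
    matrix = defaultdict(dict)
    for customer, film, rating in triples:
        matrix[customer][film] = rating
    return matrix

def predict_by_type(matrix, customer, film):
    """Predict a rating as the customer's mean rating over films of the same type."""
    target = film_type[film]
    same_type = [r for f, r in matrix[customer].items() if film_type[f] == target]
    if not same_type:
        return None  # the customer has rated nothing of this type
    return sum(same_type) / len(same_type)

matrix = build_matrix(ratings)
print(predict_by_type(matrix, "u1", "Blade Runner"))  # 4.5, the mean of 5 and 4
```

A nested dict avoids allocating the full matrix, which matters because most customer/film cells are empty.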

  4. 2. The characteristics of the problem
     • The problem is based on data from an enormous database.
     • Each customer's rating series reflects that customer's personality, preferences, and the time intervals between ratings.
     • For every movie, we can compile statistics on how many customers rated it at different times and treat them as a time series.
     • For every customer, we can compile statistics on what the customer rated and treat them as a time series.
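The two time-series views described above could be compiled roughly like this. The record layout `(customer, movie, rating, year)` and the helper `counts_by_year` are assumptions made for illustration; the slides do not show the real schema.

```python
from collections import Counter, defaultdict

# Hypothetical records: (customer, movie, rating, year)
records = [
    ("u1", "m1", 4, 2004), ("u2", "m1", 5, 2004), ("u3", "m1", 3, 2005),
    ("u1", "m2", 2, 2005), ("u1", "m3", 5, 2005),
]

def counts_by_year(records, key_index):
    """Count ratings per year, keyed by customer (key_index=0) or movie (key_index=1)."""
    series = defaultdict(Counter)
    for rec in records:
        series[rec[key_index]][rec[3]] += 1
    # Sort each series chronologically so it reads as a time series.
    return {key: sorted(counter.items()) for key, counter in series.items()}

movie_series = counts_by_year(records, key_index=1)
customer_series = counts_by_year(records, key_index=0)
print(movie_series["m1"])     # [(2004, 2), (2005, 1)]
print(customer_series["u1"])  # [(2004, 1), (2005, 2)]
```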

  5. Methods → How to find similar films and similar users?
     • Similarity measures
     • Poisson regression
     • Clustering analysis
     • Association rules
     • Random forests
     • Collaborative filtering (group filtering or social filtering)
     • Singular value decomposition (SVD)
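As one concrete instance of the similarity-measure and collaborative-filtering items above, here is a small user-based sketch. Cosine similarity is chosen only for illustration (the slide does not say which measure was used), and the toy `ratings` table and the functions `cosine` and `predict` are hypothetical.

```python
import math

# Hypothetical toy ratings: user -> {movie: rating}
ratings = {
    "u1": {"m1": 5, "m2": 4, "m3": 1},
    "u2": {"m1": 4, "m2": 5},
    "u3": {"m1": 1, "m3": 5},
}

def cosine(a, b):
    """Cosine similarity over the movies both users rated; 0.0 if there is no overlap."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        weight = cosine(ratings[user], theirs)
        num += weight * theirs[movie]
        den += weight
    return num / den if den else None

print(predict("u2", "m3"))
```

Note that cosine similarity computed over a single shared movie is always 1.0, so a practical system would usually require a minimum overlap or shrink such weights.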

  6. System
     • <Weka>: multilayer perceptron (MLP)
       Data mining software in Java
     • <MATLAB>: backpropagation
       The language of technical computing
     • <MS SQL 2005>: clustering
       A comprehensive, integrated data management and analysis software

  7. Result (training & test set)

  8. Difficulties encountered
     • "Out of memory!!": the dataset is too large.
     • The dataset does not have enough eigenvalues.
     • Which eigenvalues are valuable and really needed?
     • Which algorithm should be used?

  9. Training & Test set
     • Downsize the dataset: group the records by their eigenvalues (using SQL), then sample from the groups for training
     • Make the sampled dataset into a matrix
     • Train in the tools: Weka, MATLAB
     • Evaluate the accuracy by RMSE
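The downsizing and evaluation steps above can be sketched in a few lines. This assumes that the "eigenvalues" used for grouping act as a per-record grouping attribute (the `group` field below is hypothetical), and it does the grouping in Python rather than in SQL as the slide describes. RMSE is computed as the square root of the mean squared difference between predicted and actual ratings.

```python
import math
import random
from collections import defaultdict

def stratified_sample(rows, key, per_group, seed=0):
    """Group rows by key(row) and draw up to per_group rows from each group."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Hypothetical records with a grouping attribute taking three values.
rows = [{"movie": i, "group": i % 3, "rating": 1 + i % 5} for i in range(30)]
train = stratified_sample(rows, key=lambda r: r["group"], per_group=4)
print(len(train))                  # 12: three groups, four rows each
print(rmse([3, 4, 5], [3, 2, 5]))  # sqrt(4/3), about 1.155
```

Sampling per group rather than from the whole table keeps small groups represented in the reduced training set.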

  10. The Sketch

  11. SQL Server

  12. MATLAB (1/2)

  13. MATLAB (2/2) (# Training Data = 10040, Test Data = 42)

  14. Weka (# Training Data = 118, Test Data = 13)

  15. Analysis (why)

  16. Analysis
     • <Weka>
       • We regard the data as a matrix of movies and users
       • Defect: the matrix is enormous. Solution: classify the movies or users first
       • Lowest error rate: multilayer perceptron
       • Number of neurons and number of training iterations
     • <MATLAB>
       • Not enough eigenvalues (only one eigenvalue, about movie classification)
       • We will find more eigenvalues describing the dependence between movies and customers (using SVD)
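The planned SVD step could look roughly like the following. This is a sketch under several assumptions: the rating matrix `R` is a small, dense, nonzero toy matrix with no missing entries, and only the largest singular value and its vectors are computed, using power iteration instead of a full SVD routine. The function name `top_singular_triple` is hypothetical.

```python
import math

def top_singular_triple(A, iters=200):
    """Power iteration on A^T A: return (sigma, u, v) for the largest singular value."""
    rows, cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(cols)] * cols  # unit-length starting vector
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = A v
        w = [sum(A[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v

# Hypothetical dense customer x movie rating matrix.
R = [[5.0, 4.0, 1.0],
     [4.0, 5.0, 1.0],
     [1.0, 1.0, 5.0]]
sigma, u, v = top_singular_triple(R)

# Rank-1 approximation: each customer and movie is described by one latent factor.
rank1 = [[sigma * u[i] * v[j] for j in range(3)] for i in range(3)]
print(round(sigma, 4))
```

Here `u` gives one latent factor per customer and `v` one per movie; further factors would come from repeating the procedure on the residual `R - rank1`.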

  17. Thank You!
