
Automatic Collection “Recruiter”


Presentation Transcript


  1. Automatic Collection “Recruiter” Shuang Song

  2. Project Goal • Given a collection, automatically suggest other items to add to it • Design a process to achieve this task • Apply different filtering algorithms • Evaluate the results

  3. The Process [diagram: Collection → (1) Query Terms → External Source → Query Results → (3) Filter → New Items; (2) Training Sets are drawn from the Collection] • Tokenization and frequency counting • New items extraction • New items filtering and ranking
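As a sketch of the first step (tokenization and frequency counting), assuming a simple regex tokenizer with no stemming, consistent with the pre-processing noted on the next slide, query terms could be extracted from the collection like this (the sample documents are made up for illustration):

```python
import re
from collections import Counter

def top_query_terms(documents, k=10):
    """Tokenize a collection and return the k most frequent terms
    with their frequencies, as (term, count) pairs."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts.most_common(k)

collection = ["collaborative filtering rates items",
              "filtering algorithms rank query results"]
print(top_query_terms(collection, k=3))
```

The (term, frequency) pairs match the format of the query terms listed for the second experiment, e.g. (information 284).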

  4. Filtering Algorithms • Latent Semantic Analysis (LSA) • Pre-processing, no stemming • SVD over term-by-document matrix • Pseudo-document representation of new items • Gzip compression algorithm
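A minimal sketch of the LSA step: truncated SVD over a term-by-document matrix, then folding a new item into the reduced space as a pseudo-document via the standard projection q&#770; = qᵀ U_k S_k⁻¹. The matrix values and rank k below are illustrative, not from the experiments:

```python
import numpy as np

# Term-by-document matrix A (rows: terms, columns: collection documents);
# the counts here are made up for illustration.
A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 2., 1.],
              [1., 0., 2.]])

# Truncated SVD: A ≈ U_k S_k V_kᵀ
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, S_k = U[:, :k], np.diag(s[:k])

# Fold a new item's term vector q into the LSA space as a pseudo-document.
q = np.array([1., 0., 1., 1.])
q_hat = q @ U_k @ np.linalg.inv(S_k)
print(q_hat.shape)  # (2,)
```

The pseudo-document vector can then be compared against the collection in the same k-dimensional feature space.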

  5. Relevance Measure – LSA [diagram: collection signature vector V and pseudo-document vector V* compared in the LSA feature space]
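Assuming the relevance measure R_LSA is the cosine similarity between the collection signature vector V and a new item's pseudo-document vector V* in the LSA feature space (the slide shows only the diagram, so this is an inference), a sketch with hypothetical 2-dimensional vectors:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors in the LSA feature space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

signature = np.array([0.8, 0.3])   # collection signature vector V (illustrative)
pseudo = np.array([0.7, 0.4])      # pseudo-document vector V* (illustrative)
r_lsa = cosine(signature, pseudo)
print(r_lsa)
```

A higher cosine means the new item points in nearly the same direction as the collection signature, i.e. it is more relevant.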

  6. Relevance Measure - gzip
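The gzip relevance formula itself is not preserved in the transcript. One plausible compression-based measure (an assumption, not necessarily the formula on the slide) scores an item by how much better it compresses when appended to the collection text than on its own, since gzip exploits repeated substrings:

```python
import gzip

def gzip_relevance(collection_text, item_text):
    """Compression-based relevance score: close to 1 when appending the
    item to the collection adds almost no compressed bytes (high overlap),
    close to 0 when the item shares nothing with the collection.
    NOTE: this variant is a guess at the slide's formula."""
    c_item = len(gzip.compress(item_text.encode()))
    c_coll = len(gzip.compress(collection_text.encode()))
    c_both = len(gzip.compress((collection_text + item_text).encode()))
    return 1.0 - (c_both - c_coll) / c_item

coll = "collaborative filtering recommends items by user ratings " * 20
related = "collaborative filtering recommends items to users"
unrelated = "quantum chromodynamics lattice gauge simulation"
print(gzip_relevance(coll, related) > gzip_relevance(coll, unrelated))
```

This only detects literal repetition, which is consistent with the comparison on slide 18.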

  7. First Experiment – Math Forum Collection • 19 courseware items in the collection • 10 items in the experiment set • The first 5 from the Math Forum • The other 5 from other collections on www.smete.org

  8. First Experiment Result

  9. Second Experiment – Collaborative Filtering Collection • 12 papers in the collection • 11 items in the experiment set • The first 10 from CiteSeer • Query terms submitted (with frequencies): information (284), algorithm (250), ratings (217), filtering (159), system (197), query (149), reputation (114), reviewer (109), collaborative (106), recommendations (98) • The last item is the paper we read in class: “An Algorithm for Automated Rating of Reviewers”

  10. Second Experiment Result

  11. Second Experiment – User Study • 6 people in my research lab participated in this study • 3 of them with IR background • 3 of them without IR background • They were asked to rate the 11 items in the experiment set according to their degree of relevance to the given collection

  12. Second Experiment Result – Human Rating

  13. Second Experiment Result – Another View

  14. Second Experiment Result – Comparison without SVD and without weightings

  15. Second Experiment – Correlation with human rating
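Correlation with the human ratings can be computed as a Pearson coefficient between the algorithm's relevance scores and the mean human rating per item. The actual experiment numbers are not in the transcript, so the scores for the 11 items below are entirely hypothetical:

```python
import numpy as np

# Hypothetical scores for the 11 experiment items: algorithm relevance
# vs. mean human rating (both made up for illustration).
algo  = np.array([0.9, 0.7, 0.8, 0.6, 0.5, 0.4, 0.3, 0.6, 0.2, 0.1, 0.8])
human = np.array([5, 4, 5, 3, 3, 2, 2, 4, 1, 1, 4], dtype=float)

r = np.corrcoef(algo, human)[0, 1]  # Pearson correlation coefficient
print(round(r, 2))
```

A rank correlation (Spearman) would be an alternative if only the ordering of the human ratings is trusted.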

  16. Second Experiment – Precision and recall (cutoff: R_LSA > 0.5 & R_gzip > 0.2)

  17. Second Experiment – Precision and recall (cutoff: R_LSA > 0.4 & R_gzip > 0.17)
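The precision/recall evaluation at a score cutoff, as used in the two slides above, can be sketched as follows; the item scores and relevance labels below are hypothetical, and only a single score per item is thresholded (the experiment combines an R_LSA and an R_gzip cutoff):

```python
def precision_recall(scores, relevant, cutoff):
    """Precision and recall when items scoring above `cutoff` are accepted.

    scores: dict mapping item id -> relevance score
    relevant: set of item ids that are truly relevant
    """
    accepted = {item for item, s in scores.items() if s > cutoff}
    tp = len(accepted & relevant)          # true positives
    precision = tp / len(accepted) if accepted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical scores; items 1-3 are the truly relevant ones.
scores = {1: 0.72, 2: 0.55, 3: 0.45, 4: 0.62, 5: 0.10}
print(precision_recall(scores, relevant={1, 2, 3}, cutoff=0.5))  # (2/3, 2/3)
```

Raising the cutoff generally trades recall for precision, which is why the two slides report results at two different cutoff pairs.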

  18. Comparison of Two Filtering Algorithms • Gzip works well only when the input documents are abstracts, while LSA works for both abstracts and full text • LSA captures word association patterns and statistical importance; gzip scans for repetition only • LSA is more computationally demanding, while gzip is simple • Effectiveness

  19. To-Do List and Future Work • Obtain accurate and trustworthy evaluation from experts (collection owners?) • Extract full text and abstracts from CiteSeer automatically
