
Predicting Download Directories for Web Resources

Predicting Download Directories for Web Resources. George Valkanas, Dimitrios Gunopulos. Dept. of Informatics & Telecommunications, University of Athens, Greece. 4th International Conference on Web Intelligence, Mining and Semantics, June 3, 2014.


Presentation Transcript


  1. Predicting Download Directories for Web Resources • George Valkanas, Dimitrios Gunopulos • Dept. of Informatics & Telecommunications, University of Athens, Greece • 4th International Conference on Web Intelligence, Mining and Semantics, June 3, 2014

  2. Online User Activities

  3. Facilitating Downloads • Save Link In Folder

  4. Facilitating Downloads • Save Link In Folder • Problems: • Predefined directories • Blunt approach / no learning • UI clutter • Tedious user management

  5. A principled solution

  6. A principled solution • Associate navigation through the directory hierarchy with a cost function • One possible cost function: Hierarchical Navigation Cost (HNC), i.e., the number of clicks • Example: HNC(imgs/, docs/) = 2
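The HNC example on this slide can be reproduced with a small sketch. It assumes that one click moves either up to the parent directory or down into a child, and that imgs/ and docs/ are siblings under a common parent (written here as the hypothetical `home/`). The function name `hnc` is illustrative and does not come from the plugin.

```python
from pathlib import PurePosixPath


def hnc(suggested: str, target: str) -> int:
    """Clicks needed to walk from `suggested` to `target` in a directory tree.

    Assumption: one click per step up to the parent or down into a child.
    """
    s_parts = PurePosixPath(suggested).parts
    t_parts = PurePosixPath(target).parts

    # Length of the shared path prefix (the deepest common ancestor).
    common = 0
    for s, t in zip(s_parts, t_parts):
        if s != t:
            break
        common += 1

    # Steps up from the suggestion plus steps down to the target.
    return (len(s_parts) - common) + (len(t_parts) - common)


print(hnc("home/imgs/", "home/docs/"))  # 2: one click up, one click down
print(hnc("home/docs/", "home/docs/"))  # 0: the suggestion is the target
```

Under this reading, a perfect suggestion costs 0 and the cost grows with how far apart the two directories sit in the tree.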

  7. Problem Definition • Given • The hierarchical structure • A target directory T, where the resource will be saved • Goal • Suggest a directory S that minimizes the cost function cf(S, T)

  8. Problem Definition • Given • The hierarchical structure • A target directory T, where the resource will be saved • Goal • Suggest a directory S that minimizes the cost function cf(S, T) • But if I know T, why not suggest T directly? (0 cost)

  9. Problem Definition • Given • The hierarchical structure • A target directory T, where the resource will be saved • Goal • Suggest a directory S that minimizes the cost function cf(S, T) • But if I know T, why not suggest T directly? (0 cost) • In this setting, we don't know T until it's too late!

  10. Casting to a classification framework • Directories are potential class values • T is the true target class • S is the output of a classification process • Web resource properties → classification features • Recommend S that best matches T • Use directories from past saves as candidate classes
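A minimal sketch of this classification view, using a 1-NN rule over past saves. The transcript does not list the actual features or distance functions, so the two features here (`domain`, `extension`) and the mismatch-count distance are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Save:
    domain: str      # hypothetical feature: site the resource came from
    extension: str   # hypothetical feature: file type
    directory: str   # class label: where the user saved the resource


def distance(a: Save, b: Save) -> int:
    """Count mismatching features (a simple overlap distance)."""
    return int(a.domain != b.domain) + int(a.extension != b.extension)


def predict_1nn(history: list, domain: str, extension: str) -> Optional[str]:
    """Return the directory of the most similar past save, or None without history."""
    if not history:
        return None
    query = Save(domain, extension, directory="")
    # Iterate newest first; min() keeps the first minimum, so ties go to the latest save.
    nearest = min(reversed(history), key=lambda past: distance(query, past))
    return nearest.directory


history = [
    Save("arxiv.org", "pdf", "docs/papers"),
    Save("flickr.com", "jpg", "imgs"),
    Save("arxiv.org", "pdf", "docs/reading"),
]
print(predict_1nn(history, "arxiv.org", "pdf"))   # docs/reading
print(predict_1nn(history, "flickr.com", "png"))  # imgs
```

The candidate classes are exactly the directories seen in `history`, matching the slide's point that past saves supply the candidate set.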

  11. Features & Distances

  12. Experimental Setup • Implement classifier as a FF plugin • DiDoCtor approach • Javascript • 1-NN classifier • 6 participants • 4-month minimum use period • Baseline • Last-by-domain (LBD), current browser approach • Simulated, based on submitted result • Metrics • Click Distance: HNC, Breadcrumbs • Classification Accuracy
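The simulated last-by-domain (LBD) baseline can be sketched as a replay of a save log. The log format, the function names, and the choice to count a first-time domain as a miss are assumptions; the transcript only states that LBD was simulated from the submitted results.

```python
def simulate_lbd(saves):
    """Replay (domain, directory) saves in order and return the LBD suggestion for each."""
    last_dir_by_domain = {}
    suggestions = []
    for domain, directory in saves:
        # Suggest before the true target is revealed; None for a domain not seen yet.
        suggestions.append(last_dir_by_domain.get(domain))
        last_dir_by_domain[domain] = directory
    return suggestions


def accuracy(suggestions, saves):
    """Fraction of saves whose suggestion equals the directory actually chosen."""
    hits = sum(s == directory for s, (_domain, directory) in zip(suggestions, saves))
    return hits / len(saves) if saves else 0.0


# Hypothetical log of (domain, chosen directory) pairs in chronological order.
log = [
    ("a.org", "docs"),
    ("a.org", "docs"),
    ("b.com", "imgs"),
    ("a.org", "docs/papers"),
    ("b.com", "imgs"),
]
suggested = simulate_lbd(log)
print(suggested)                 # [None, 'docs', None, 'docs', 'imgs']
print(accuracy(suggested, log))  # 0.4
```

The same replay loop could feed each (suggestion, target) pair to a click-distance function to obtain the HNC metric instead of accuracy.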

  13. Preliminary Result Analysis

  14. Preliminary Result Analysis • Take Home Messages • Users have different saving pattern behavior(s)

  15. Preliminary Result Analysis • Take Home Messages • Users have different saving pattern behavior(s) • Users have high variability in their accesses to each directory

  16. Click Distance - HNC • Take Home Message • Significant reduction in the number of clicks to reach the target directory!

  17. Click Distance - HNC • Take Home Message • Significant reduction in the number of clicks to reach the target directory! • The click distance gain is even higher when considering a breadcrumbs UI!

  18. Running Accuracy • Take Home Message • DiDoCtor is much more accurate in predicting the download directory
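The transcript does not define running accuracy. One plausible reading, assumed in this sketch, is the cumulative fraction of correct predictions after each save.

```python
from itertools import accumulate


def running_accuracy(predicted, actual):
    """Cumulative accuracy after each save (assumed definition, not from the slides)."""
    hits = [int(p == a) for p, a in zip(predicted, actual)]
    # accumulate() yields the running hit count; divide by the number of saves so far.
    return [total / n for n, total in enumerate(accumulate(hits), start=1)]


predicted = ["docs", "imgs", "docs", "imgs"]
actual = ["docs", "docs", "docs", "imgs"]
print(running_accuracy(predicted, actual))  # 1.0, 0.5, about 0.667, 0.75
```

Plotting this list per user would show how quickly a predictor recovers after a run of misses.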

  19. Basic Model Extensions • Feature reweighting • RELIEF_F

  20. Basic Model Extensions • Feature reweighting • RELIEF_F • Suggesting k directories
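Both extensions can be sketched together: a feature-weighted distance and a ranked list of k distinct directories. The weights below are set by hand for illustration; the sketch does not implement RELIEF_F, which the slides name as the reweighting method. The feature names and data layout are likewise assumptions.

```python
def weighted_distance(a: dict, b: dict, weights: dict) -> float:
    """Sum the weights of the features on which the two saves disagree."""
    return sum(w for feature, w in weights.items() if a.get(feature) != b.get(feature))


def suggest_k(history, query, weights, k):
    """Return up to k distinct directories, ordered from nearest past save to farthest."""
    ranked = sorted(history, key=lambda past: weighted_distance(query, past[0], weights))
    suggestions = []
    for _features, directory in ranked:
        if len(suggestions) >= k:
            break
        if directory not in suggestions:
            suggestions.append(directory)
    return suggestions


# Hypothetical history of (features, chosen directory) pairs.
history = [
    ({"domain": "arxiv.org", "extension": "pdf"}, "docs/papers"),
    ({"domain": "flickr.com", "extension": "jpg"}, "imgs"),
    ({"domain": "arxiv.org", "extension": "png"}, "imgs/figures"),
]
weights = {"domain": 1.0, "extension": 2.0}  # hand-picked, not learned
query = {"domain": "arxiv.org", "extension": "pdf"}
print(suggest_k(history, query, weights, k=2))  # ['docs/papers', 'imgs/figures']
```

With k = 1 this reduces to a weighted 1-NN prediction; larger k trades a longer suggestion list for a better chance that the target directory appears in it.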

  21. Alternative classifiers • Take Home Messages • Classifiers can help! • DiDoCtor generally performs the best • Accuracy is affected by user behavior!

  22. Conclusions & Future work • Approach for facilitating downloads • Optimization problem & classification framework • Experimentation with real users • Basic model extensions • Further exploit the temporal dimension • More informative features (e.g., entities) • Automatic generation of directories

  23. Thank you! • Questions?  • Acknowledgements • To the evaluators of our plugin • Heraclitus II fellowship, THALIS-GeoComp, THALIS-DISFER, Aristeia-MMD, EU project INSIGHT
