
Identifying Multi-ID Users in Open Forums

Hung-Ching Chen, Mark Goldberg, Malik Magdon-Ismail


Presentation Transcript


  1. Identifying Multi-ID Users in Open Forums. Hung-Ching Chen, Mark Goldberg, Malik Magdon-Ismail

  2. Who is Using Multiple IDs? Consider two chatrooms, “letters” and “numbers”.
     letters:
       dogg: hey anna, Z rocks!
       anna: hey dogg whats new
       mack: what’s that dogg?
       dogg: not much
       dogg: Z is the best
     numbers:
       mack: i love 5
       catt: i don’t
       joop: 27 rules!
       catt: i agree joop :)

  3. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna mack dogg/catt joop

  4. Multi-ID Users letters numbers dogg: hey anna, Z rocks! mack: i love 5 anna mack dogg/catt joop

  5. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: i love 5 anna mack dogg/catt joop

  6. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5 anna mack dogg/catt joop

  7. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5 catt: i don’t anna mack dogg/catt joop

  8. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t anna mack dogg/catt joop

  9. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t joop: 27 rules! anna mack dogg/catt joop

  10. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! anna mack dogg/catt joop

  11. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! catt: i agree joop :) anna mack dogg/catt joop

  12. Overview • A model of a public forum • Two efficient statistics-based algorithms for identification of multi-ID users • Simulation results

  13. Overview • A model of a public forum • Two efficient statistics-based algorithms for identification of multi-ID users • Simulation results

  14. A Model of a Public Forum • Every actor has one response queue • Common average and variance of response delay • The server has a queue to process messages with very short delay
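A minimal simulation sketch of this model, under loudly stated assumptions: each actor always has something to reply to, response delays are drawn from one shared normal distribution (`mean_delay`, `sd_delay`, and the `min_delay` floor are hypothetical values, not from the slides), and the server is modeled simply as sorting posts by time. The point it illustrates is that an actor with one response queue produces posts one at a time, no matter how many IDs the actor uses.

```python
import random

def simulate_actor(ids, n_posts, rng, mean_delay=10.0, sd_delay=2.0, min_delay=1.0):
    """Return (time, id) posts for one actor with a single response queue.

    Posts are produced sequentially, so two posts by the same actor are
    always separated by at least min_delay, whichever ID is used.
    """
    t = 0.0
    posts = []
    for _ in range(n_posts):
        # Common delay distribution for all actors (assumed normal here).
        t += max(min_delay, rng.gauss(mean_delay, sd_delay))
        posts.append((t, rng.choice(ids)))
    return posts

def simulate_forum(actors, n_posts=50, seed=0):
    """Merge all actors' posts into one time-ordered server log."""
    rng = random.Random(seed)
    log = []
    for ids in actors:
        log.extend(simulate_actor(ids, n_posts, rng))
    return sorted(log)

# One actor controls both dogg and catt; the others use a single ID each.
actors = [["anna"], ["mack"], ["dogg", "catt"], ["joop"]]
forum_log = simulate_forum(actors)
```

In this sketch the dogg and catt posts can never be closer together than `min_delay`, while posts from different actors can land arbitrarily close in time.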

  15. Model Friendship Graph anna mack dogg joop catt

  16. Model Alias Graph anna mack dogg joop catt

  17. Multi-ID Users • One response queue for each actor dogg catt anna catt anna mack dogg/catt joop

  18. Multi-ID Users letters numbers dogg: hey anna, Z rocks! dogg dogg catt anna catt anna mack dogg/catt joop

  19. Multi-ID Users letters numbers dogg: hey anna, Z rocks! mack: i love 5 mack dogg dogg catt catt anna mack dogg/catt joop

  20. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: i love 5 anna dogg dogg mack catt anna mack dogg/catt joop

  21. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5 mack anna dogg mack catt anna mack dogg/catt joop

  22. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5 catt: i don’t mack anna catt mack catt anna mack dogg/catt joop

  23. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t dogg mack catt anna catt anna mack dogg/catt joop

  24. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t joop: 27 rules! joop dogg catt mack catt anna mack dogg/catt joop

  25. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! dogg joop dogg catt mack anna mack dogg/catt joop

  26. Multi-ID Users letters numbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! catt: i agree joop :) catt dogg dogg catt joop anna mack dogg/catt joop

  27. Overview • A model of a public forum • Two efficient statistics-based algorithms for identification of multi-ID users • Simulation results

  28. First Algorithm
      • Collect information {⟨time_i, ID_i⟩}
      • Compute the minimum time delay minD for every pair of IDs
      • Cluster the minD values into two groups using k-means
      • Pairs in the cluster with the larger center are suspected to be the same actor (red edges)
      • Pairs in the cluster with the smaller center are suspected to be different actors (blue edges)
      • Connected components of the red edges are ID groups, each representing one actor
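The steps of the first algorithm can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the function names, the plain 1-D two-means loop, the union-find for connected components, and the timestamps in `toy_log` are all assumptions made for the example.

```python
from itertools import combinations

def min_delays(log):
    """log: list of (time, id). Return {(id_a, id_b): minD} with id_a < id_b."""
    times = {}
    for t, uid in log:
        times.setdefault(uid, []).append(t)
    return {
        (a, b): min(abs(ta - tb) for ta in times[a] for tb in times[b])
        for a, b in combinations(sorted(times), 2)
    }

def two_means_1d(values, iters=100):
    """Plain k-means with k=2 on scalars; returns (smaller, larger) center."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi

def id_groups(log):
    """Group IDs into suspected actors using red edges (larger-center pairs)."""
    delays = min_delays(log)
    lo, hi = two_means_1d(list(delays.values()))
    red = [pair for pair, d in delays.items() if abs(d - hi) < abs(d - lo)]
    # Connected components over red edges via a small union-find.
    parent = {uid: uid for pair in delays for uid in pair}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in red:
        parent[find(a)] = find(b)
    groups = {}
    for uid in parent:
        groups.setdefault(find(uid), set()).add(uid)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical timestamps: dogg and catt never post close together.
toy_log = [(0, "dogg"), (40, "dogg"), (80, "dogg"), (20, "catt"), (60, "catt"),
           (1, "anna"), (21, "anna"), (50, "anna"),
           (2, "mack"), (41, "mack"), (61, "mack"),
           (3, "joop"), (22, "joop"), (79, "joop")]
print(id_groups(toy_log))  # [['anna'], ['catt', 'dogg'], ['joop'], ['mack']]
```

On this clean toy log only the dogg/catt pair has a large minD, so it is the only red edge. Noisier logs can produce extra red edges that merge unrelated IDs into one component.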

  29. 1. Collect information time dogg dogg dogg mack mack anna catt catt joop

  30. 2. minD(dogg, mack) dogg dogg dogg mack mack

  31. 2. minD(dogg, catt) dogg dogg dogg catt catt

  32. 3. Cluster minD values: minD(dogg, mack) falls in the cluster with the smaller center (different actors); minD(dogg, catt) falls in the cluster with the larger center (same actor)

  33. 3. Resulting Alias Graph (annotated “Contradictions”) anna mack dogg joop catt

  34. 4. One Red Component: anna, mack, dogg, joop, catt (9 false positives, 10% accuracy)
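The figures on this slide are consistent with scoring every unordered pair of IDs. Five IDs give 10 pairs, and a single red component labels all 10 as "same actor" while only dogg/catt truly is, leaving 9 false positives and 1 correct pair out of 10. The sketch below reproduces that arithmetic; reading "accuracy" as the fraction of correctly labeled pairs is an assumption, and `pair_accuracy` is a hypothetical helper.

```python
from itertools import combinations

def pair_accuracy(predicted_groups, true_groups):
    """Return (false_positives, accuracy) over all unordered ID pairs."""
    def same_pairs(groups):
        return {frozenset(p) for g in groups for p in combinations(sorted(g), 2)}
    ids = sorted(uid for g in true_groups for uid in g)
    all_pairs = [frozenset(p) for p in combinations(ids, 2)]
    pred, true = same_pairs(predicted_groups), same_pairs(true_groups)
    # False positive: predicted same actor, actually different actors.
    false_pos = sum(1 for p in all_pairs if p in pred and p not in true)
    correct = sum(1 for p in all_pairs if (p in pred) == (p in true))
    return false_pos, correct / len(all_pairs)

truth = [["anna"], ["mack"], ["dogg", "catt"], ["joop"]]
one_component = [["anna", "mack", "dogg", "joop", "catt"]]
print(pair_accuracy(one_component, truth))  # (9, 0.1)
```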

  35. Refinement Algorithm
      • Find groups connected with red edges
      • Color IDs into smaller ID groups using blue edges
      Follow original algorithm
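One way to read the coloring step: inside a red component, treat each blue edge as a constraint that its two IDs must receive different colors, then take each color class as a smaller ID group. The slides do not say which coloring method is used; the sketch below swaps in a simple greedy coloring (highest blue degree first), and both `refine` and the blue edge set are hypothetical.

```python
from itertools import combinations

def refine(component, blue_edges):
    """Split one red component into smaller ID groups by greedy coloring."""
    neighbors = {uid: set() for uid in component}
    for a, b in blue_edges:
        if a in neighbors and b in neighbors:
            neighbors[a].add(b)
            neighbors[b].add(a)
    color = {}
    # Visit IDs with the most blue edges first, ties broken alphabetically.
    for uid in sorted(component, key=lambda u: (-len(neighbors[u]), u)):
        used = {color[n] for n in neighbors[uid] if n in color}
        # Smallest color not used by an already-colored blue neighbor.
        color[uid] = next(c for c in range(len(component)) if c not in used)
    groups = {}
    for uid, c in color.items():
        groups.setdefault(c, []).append(uid)
    return sorted(sorted(g) for g in groups.values())

component = ["anna", "mack", "dogg", "joop", "catt"]
# Idealized blue edges: every pair except dogg/catt is suspected different.
blue = [p for p in combinations(component, 2) if set(p) != {"dogg", "catt"}]
print(refine(component, blue))  # [['anna'], ['catt', 'dogg'], ['joop'], ['mack']]
```

Greedy coloring does not guarantee the fewest colors, and with noisy blue edges it can still leave some wrong pairs grouped together.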

  36. 4. One Red Component anna mack dogg joop catt

  37. 5. Color Blue Graph anna mack dogg joop catt

  38. 5. Color Blue Graph: anna, mack, dogg, joop, catt (1 false positive, 90% accuracy)

  39. Overview • A model of a public forum • Two efficient statistics-based algorithms for identification of multi-ID users • Simulation results

  40. Simulation results

  41. Future Work
      • Real-life testing
      • More sophisticated model
      • Different types of users
      • Short and overlapping online appearances
      • Length of posts
      • Algorithm improvements
      • Statistical evaluation of the new models
      • Lexical and semantic analysis
