
Detecting Gambling Sites From Post Behaviors

Detecting Gambling Sites From Post Behaviors. Shensi Tong, Hanlong Zhang, Beijun Shen, Hao Zhong (Shanghai Jiao Tong University); Yongjian Wang, Bo Jin (The Third Research Institute of Ministry of Public Security). Presented by Shensi Tong, 2016.5.16.


Presentation Transcript


  1. Detecting Gambling Sites From Post Behaviors Shensi Tong, Hanlong Zhang, Beijun Shen, Hao Zhong Shanghai Jiao Tong University Yongjian Wang, Bo Jin The Third Research Institute of Ministry of Public Security Presented By Shensi Tong 2016.5.16

  2. Outline • Introduction • Approach • Evaluation • Optimization • Conclusions

  3. Introduction • Detecting gambling sites is important • Internet gambling is even more addictive than traditional gambling, which is harmful • Most countries explicitly prohibit Internet gambling or place it under strict supervision • Detecting gambling sites is challenging • To the best of our knowledge, no previous work has been proposed to detect gambling sites automatically • There is no consensus on which feature is best for detecting gambling sites

  4. Introduction • Major Contributions • The first approach that mines behavior models for gambling sites and detects previously unknown gambling sites with the mined models • A tool and two evaluations on a 1 TB dataset. The results show that our tool detects gambling sites effectively. The POST behavior of a website is the best feature to determine whether it is a gambling site or not • An additional evaluation on applying graph analysis to improve our approach. The results are valuable for further optimizing our approach

  5. Outline • Introduction • Approach • Evaluation • Optimization • Conclusions

  6. Approach

  7. Approach • Preprocessing HTTP POSTs • Typically, a POST request message consists of the following parts • Request line • “POST /a/.../script?K1=V1&...&Kn=Vn HTTP/1.1” • Cookie in request header • “JSESSIONID=064185D5B6; NETEASE SSN=shanghai” • Request body • “subject=Test&message=test&formhash=bbb14e19&usesig=1&posttime=138672” • Hashpost = MD5(Script & Keys(RequestLine) & Keys(RequestBody))
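The Hashpost definition on slide 7 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `post_hash` is hypothetical, "&" on the slide is read as string concatenation with an "&" separator, and keys are kept in the order they appear (the slide does not say whether they are sorted).

```python
import hashlib
from urllib.parse import urlsplit, parse_qsl


def post_hash(request_line: str, request_body: str) -> str:
    """Hash the script path plus parameter keys of one POST; values are ignored."""
    # A request line looks like: "POST /path/script?K1=V1&K2=V2 HTTP/1.1"
    _method, target, _version = request_line.split(" ", 2)
    parts = urlsplit(target)
    script = parts.path
    line_keys = [k for k, _ in parse_qsl(parts.query, keep_blank_values=True)]
    body_keys = [k for k, _ in parse_qsl(request_body, keep_blank_values=True)]
    # Assumption: "&" on the slide means joining the pieces with "&".
    signature = "&".join([script, *line_keys, *body_keys])
    return hashlib.md5(signature.encode("utf-8")).hexdigest()


h = post_hash(
    "POST /forum/post.php?fid=1&tid=2 HTTP/1.1",
    "subject=Test&message=test&formhash=bbb14e19&usesig=1&posttime=138672",
)
```

Because only the script and the parameter keys enter the hash, two POSTs to the same script with different values map to the same Hashpost, which is what lets sites built from the same template look alike.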

  8. Approach • Clustering Sites • Filtering • In this paper, we set α1 to 5 • Compute the Jaccard coefficient between two websites • We put two websites into the same cluster if and only if their similarity value is higher than a predefined threshold β1
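A minimal sketch of the clustering step on slide 8, assuming each site is represented by its set of POST hashes. Several details are assumptions, not taken from the slides: the filter is taken to mean "keep sites with at least α1 distinct POST hashes" (the slide does not show what α1 bounds), the value 0.5 for β1 is made up, and pairwise links are merged transitively with a union-find structure.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sites(site_hashes: dict, alpha1: int = 5, beta1: float = 0.5) -> list:
    """Group sites whose POST-hash sets have Jaccard similarity above beta1."""
    # Assumed filter: keep sites with at least alpha1 distinct POST hashes.
    kept = {s: h for s, h in site_hashes.items() if len(h) >= alpha1}
    parent = {s: s for s in kept}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for s, t in combinations(kept, 2):
        if jaccard(kept[s], kept[t]) > beta1:
            parent[find(s)] = find(t)  # merge the two clusters

    clusters = {}
    for s in kept:
        clusters.setdefault(find(s), set()).add(s)
    return list(clusters.values())
```

The all-pairs loop is quadratic in the number of sites, so it is only meant to show the logic; a corpus of hundreds of thousands of sites would need some form of candidate pruning first.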

  9. Approach • Mining Behavior Models • Pick out gambling site clusters manually • Mine a behavior model for each cluster • POST TF-IDF • Sort in descending order and select the top α3 as the model
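One way to read the model-mining step on slide 9 is sketched below. The exact TF-IDF variant is not given on the slide, so this assumes a common form: term frequency is a hash's share of all POSTs in the cluster, and inverse document frequency is log(N / df) over the N sites in the corpus. The function name `mine_model` and the default α3 are hypothetical.

```python
import math
from collections import Counter


def mine_model(cluster_posts: dict, all_site_hashes: dict, alpha3: int = 3) -> list:
    """Return the alpha3 POST hashes with the highest TF-IDF score in one cluster.

    cluster_posts: site -> list of POST hashes observed for sites in the cluster.
    all_site_hashes: site -> set of POST hashes, for every site in the corpus.
    """
    tf = Counter(h for posts in cluster_posts.values() for h in posts)
    total = sum(tf.values())
    n_sites = len(all_site_hashes)
    # Document frequency: number of sites in the corpus that ever sent the hash.
    df = Counter(h for hashes in all_site_hashes.values() for h in hashes)
    scores = {
        h: (count / total) * math.log(n_sites / df[h])
        for h, count in tf.items()
        if df[h] > 0
    }
    # Sort by descending score; ties are broken by hash value for determinism.
    ranked = sorted(scores, key=lambda h: (-scores[h], h))
    return ranked[:alpha3]
```

A hash that every site sends (for example a generic login POST) gets an IDF of zero and drops to the bottom of the ranking, so the model keeps the POSTs that are characteristic of the cluster.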

  10. Approach • Detecting Previously Unknown Gambling Sites • Calculate the similarity value between unknown sites and the mined models • If the value is higher than threshold β2, we label the site as a gambling site • If some sites do not follow any mined model, we re-run our approach to train a new model
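The detection step on slide 10 could look roughly like this. The slides do not define the similarity between a site and a model, so the sketch assumes a simple containment measure (the fraction of the model's POST hashes that the site also sends); the names `model_similarity` and `classify` and the value 0.6 for β2 are hypothetical.

```python
def model_similarity(site_hashes: set, model: list) -> float:
    """Fraction of the model's POST hashes that the site also uses (assumed measure)."""
    if not model:
        return 0.0
    return sum(1 for h in model if h in site_hashes) / len(model)


def classify(site_hashes: set, models: dict, beta2: float = 0.6):
    """Return the name of the best-matching model scoring above beta2, or None."""
    best_name, best_score = None, beta2
    for name, model in models.items():
        score = model_similarity(site_hashes, model)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Sites for which `classify` returns `None` match no mined model; per the slide, those are the candidates for re-running the approach to train a new model.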

  11. Outline • Introduction • Approach • Evaluation • Optimization • Conclusions

  12. Evaluations • Datasets • 4,000,000,000 HTTP POSTs • 750,000 sites • 1 TB • Error Measures
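The transcript does not preserve which error measures slide 12 listed. Since the conclusion reports precision and recall, the sketch below assumes those two and computes them from a set of predicted gambling sites and a set of manually labeled ones; the function name is hypothetical.

```python
def precision_recall(predicted: set, actual: set) -> tuple:
    """Precision and recall of predicted gambling sites against labeled ones."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall


p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
```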

  13. Evaluations

  14. Evaluations

  15. Conclusion • Features • URL • Consists of lexical and host information • HTML • Extracted from HTML tags that appear in the HTML code of Web pages • Semantic • Captures textual information that is visible on Web pages

  16. Outline • Introduction • Approach • Evaluation • Optimization • Conclusions

  17. Optimization • Graph Analysis Features • Degree • Number of its neighbors • Similarity • Similarity between two websites • HashCount • Unique HashPOST for a website • Utmcsr • Source website from which this website was entered • Utmctr • Keywords entered in the search engine • Utmv • Used to identify a site for traffic statistics

  18. Optimization • Observation 1 • Like attracts like

  19. Optimization • Observation 2 • Concentration

  20. Optimization • Observation 3 • Anomaly

  21. Optimization • Optimization Results • Matching values in cookies • If certain keywords appear in utmctr, the site is likely to be a gambling site • Filtering outliers from sites • Determine whether a website belongs to a cluster according to its HashCount • Filtering large POST sites • Filtering outliers from clusters
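The cookie-matching idea on slide 21 can be illustrated as below. This assumes the utmcsr and utmctr fields arrive as "|"-separated key=value pairs inside a cookie value, as in the legacy Google Analytics `__utmz` cookie; the keyword list and the function names are hypothetical, since the slides do not say which keywords were matched.

```python
import re
from urllib.parse import unquote

# Hypothetical keyword list; the slides do not say which keywords were used.
GAMBLING_KEYWORDS = ("casino", "betting", "poker")


def utm_fields(cookie_header: str) -> dict:
    """Extract utmcsr/utmctr/... pairs embedded in a Cookie header value."""
    # \b keeps the cookie name "__utmz" itself from being parsed as a field.
    return dict(re.findall(r"\b(utm[a-z]+)=([^|;]*)", cookie_header))


def keyword_hit(cookie_header: str, keywords=GAMBLING_KEYWORDS) -> bool:
    """True if the search term recorded in utmctr contains a listed keyword."""
    term = unquote(utm_fields(cookie_header).get("utmctr", "")).lower()
    return any(k in term for k in keywords)


cookie = (
    "__utmz=1.1386720000.1.1.utmcsr=google|utmccn=(organic)"
    "|utmcmd=organic|utmctr=online%20casino; JSESSIONID=064185D5B6"
)
hit = keyword_hit(cookie)
```

A keyword hit is only a hint that the visitor arrived by searching for gambling terms, so it is better treated as one signal alongside the POST-based model than as a verdict by itself.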

  22. Outline • Introduction • Approach • Evaluation • Optimization • Conclusions

  23. Conclusion • We propose a novel approach that detects gambling sites based on POST behavior • We evaluate our approach on a large corpus, and our results show that it achieves both high precision and high recall • We apply graph analysis to improve performance and recall

  24. Q&A

  25. Thank you
