
SPAM DETECTION IN P2P SYSTEMS



  1. SPAM DETECTION IN P2P SYSTEMS • Team Matrix: Abhishek Ghag, Darshan Kapadia, Pratik Singh

  2. OVERVIEW • P2P Basics • Spam • The Spam Detection Problem • Approaches to the Spam Detection Problem • Proposal • References

  3. P2P Basics • Used to connect nodes or machines via large ad hoc connections. • No fixed client or server roles. • All nodes (peers) are equal. • Each peer functions as both client and server. • Classification of P2P networks: • Centralized P2P network – Napster. • Decentralized P2P network – KaZaA. • Structured P2P network – CAN. • Unstructured P2P network – Gnutella. • Hybrid P2P network – JXTA.
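To make the idea of a peer acting as both client and server concrete, here is a minimal Python sketch. It assumes Gnutella-style flooding with a hop limit (`ttl`) as the search mechanism, which is only one of the network types listed above; the `Peer` class, its methods and the term-matching rule are hypothetical illustrations, not part of any real P2P client.

```python
import re

def _tokens(text):
    """Lower-case word tokens of a query or descriptor."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class Peer:
    """A node that plays both the server role and the client role."""

    def __init__(self, name):
        self.name = name
        self.shared = {}      # key -> descriptor of files this peer shares
        self.neighbors = []   # peers this node forwards queries to

    def match(self, query):
        """Server role: return (peer, key, descriptor) for shared files
        whose descriptor contains at least one query term."""
        q = _tokens(query)
        return [(self.name, k, d) for k, d in self.shared.items()
                if q & _tokens(d)]

    def search(self, query, ttl=2):
        """Client role: flood the query up to `ttl` hops (simplified,
        Gnutella-style) and collect the matches."""
        results, visited, frontier = [], {self.name}, list(self.neighbors)
        for _ in range(ttl):
            next_frontier = []
            for peer in frontier:
                if peer.name in visited:
                    continue
                visited.add(peer.name)
                results.extend(peer.match(query))
                next_frontier.extend(peer.neighbors)
            frontier = next_frontier
        return results

    def download(self, result):
        """After a download the client also shares the file (becomes a server)."""
        _peer, key, descriptor = result
        self.shared[key] = descriptor
```

For example, if peer a links to b and b links to c, `a.search("girl", ttl=2)` reaches c's files, and after `a.download(...)` peer a answers the same query itself.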

  4. Advantages of P2P: • All peers provide resources such as bandwidth, computing power, storage space and CPU cycles. • Replication of data over multiple peers eliminates a single point of failure. • Applications of P2P: • File sharing. • Internet telephony, e.g., Skype. • Media streaming.

  5. [Figure] Source: http://www.acm.org/crossroads/xrds9-4/gfx/GamestateFidelity1.jpg

  6. Spam • Spam is any file that is deliberately misrepresented. • A well-known problem in P2P file-sharing systems. • Used to manipulate established retrieval and ranking techniques. • Thrives on the anonymous, decentralized and dynamic nature of P2P networks.

  7. Spam • Figure taken from the ACM research paper "Malware Prevalence in the KaZaA File-Sharing Network"

  8. Viruses in P2P • Figure taken from the ACM research paper "Malware Prevalence in the KaZaA File-Sharing Network"

  9. Why is Spam Harmful? • Degrades the user's search experience. • Assists the propagation of viruses in the network. • More than 200 viruses use P2P as a propagation vector. • Increases the traffic load on the network.

  10. Spam • Spam is hard to detect automatically because: • The information returned in response to a user query is insufficient and biased. • P2P networks are anonymous, decentralized and dynamic in nature. • The naïve spam detection technique is to download the file and check it manually.

  11. Approaches to the Spam Detection Problem • There are two main approaches to the spam detection problem. • Detection after downloading the file • The user compares the file with known databases of genuine files. • The user filters out the file so that other users do not get the spammed copy. • Detection before downloading the file • Rigid trust • Web of trust • Reputation systems • Blocking IP addresses
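Detection after downloading can be sketched as a digest lookup. This is a minimal sketch assuming the "known database of genuine files" is a local set of SHA-1 digests; `KNOWN_GENUINE`, `is_known_genuine` and `filter_after_download` are hypothetical names, and the slides do not specify how such a database is built or shared.

```python
import hashlib

# Hypothetical database of digests of known-genuine files (assumption:
# the slides do not say how such a database is built or distributed).
KNOWN_GENUINE = {
    hashlib.sha1(b"genuine song bytes").hexdigest(),
}

def is_known_genuine(content, known=KNOWN_GENUINE):
    """Return True if the downloaded file's SHA-1 digest is in the database."""
    return hashlib.sha1(content).hexdigest() in known

def filter_after_download(content, shared_folder, name):
    """Share a downloaded file only if it checks out, so other peers
    do not receive a spammed copy from us."""
    if is_known_genuine(content):
        shared_folder[name] = content
        return True
    return False
```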

  12. Object Reputation • Users vote for a file either positively or negatively. • Based on the evaluation of the votes and the voting protocol, the file is regarded as genuine or spam. • Disadvantages: • Consumes time and labor. • Wastes bandwidth and computing resources. • Risk of opening malware. • Hence there is a need for an effective automatic spam detection technique.
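A minimal sketch of object reputation, assuming one vote per user per file key and a simple threshold rule: a key is called spam once more than half of at least three votes are negative. The class name, the threshold and the minimum vote count are illustrative assumptions; real reputation systems define their own voting protocol.

```python
from collections import defaultdict

class ObjectReputation:
    """Vote tally per file key (hypothetical protocol: one vote per user
    per key, simple majority-style threshold)."""

    def __init__(self, spam_threshold=0.5, min_votes=3):
        self.spam_threshold = spam_threshold  # assumed cut-off, not from the slides
        self.min_votes = min_votes            # fewer votes -> "unknown"
        self.votes = defaultdict(dict)        # key -> {user_id: +1 or -1}

    def vote(self, key, user_id, positive):
        # A later vote by the same user overwrites the earlier one.
        self.votes[key][user_id] = 1 if positive else -1

    def verdict(self, key):
        ballots = list(self.votes[key].values())
        if len(ballots) < self.min_votes:
            return "unknown"
        negative = sum(1 for b in ballots if b < 0) / len(ballots)
        return "spam" if negative > self.spam_threshold else "genuine"
```

The sketch also shows the disadvantage listed above: a verdict exists only after enough users have downloaded and judged the file.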

  13. Goal • Automatic detection of spam files.

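  14. Query Processing • The client issues a query. • Each server compares the query with the descriptors of the files it shares. • Servers return matching results, each consisting of a file identifier (key) and a descriptor. • The client groups the individual results by key. • The client ranks the groups. • After downloading a file, the client becomes a server for it.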
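The client-side grouping and ranking steps can be sketched as follows. This assumes each result is a (peer, key, descriptor) triple and that groups are ranked by size, i.e. by the number of replicas returned for a key (the Type 4 discussion of manipulating group size suggests size matters for ranking); `Result`, `group_by_key` and `rank_by_group_size` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple

class Result(NamedTuple):
    peer: str        # server that returned the result
    key: str         # file identifier (e.g., a content hash)
    descriptor: str  # file name / metadata the server attached

def group_by_key(results):
    """Group individual query results by file key."""
    groups = defaultdict(list)
    for r in results:
        groups[r.key].append(r)
    return groups

def rank_by_group_size(results):
    """Rank keys by how many replicas were returned (largest group first).
    Ties are broken by key so the order is deterministic."""
    groups = group_by_key(results)
    return sorted(groups, key=lambda k: (-len(groups[k]), k))
```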

  15. Spamming • Relevant steps: 1, 3 and 5. • Object reputation at step 1. • Feature-based spam detection at steps 3 and 5.

  16. Feature-Based Spam Detection • Characterize spam. • Characterize spammers. • Then implement techniques that use this characterization to rank the query results.

  17. Classification of Spam • Type 1: • Files whose replicas have semantically different descriptors. • The spammer might name a file after a currently popular song or give multiple names to the same file. • E.g., different song titles for the same key 26NZUBS655CC66COLKMWHUVJGUXRPVUF: “12 days after christmas.mp3” “i want you thalia.mp3” “come on be my girl.mp3” …
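One way to detect Type 1 spam automatically is to measure how much the descriptors of a key's replicas overlap. The sketch below is a swapped-in heuristic, not the method from the cited papers: it tokenizes each descriptor, computes the mean pairwise Jaccard similarity between distinct descriptors, and flags the key when that mean falls below an assumed threshold of 0.2.

```python
import re
from itertools import combinations

def terms(descriptor):
    """Lower-case word tokens of a descriptor, without the file extension."""
    name = descriptor.rsplit(".", 1)[0]
    return frozenset(re.findall(r"[a-z0-9]+", name.lower()))

def jaccard(a, b):
    """Share of terms two descriptors have in common (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def is_type1_spam(descriptors, threshold=0.2):
    """Flag a key whose replica descriptors are mutually dissimilar.
    `threshold` is an assumed cut-off on the mean pairwise similarity."""
    distinct = {terms(d) for d in descriptors}
    if len(distinct) < 2:
        return False
    pairs = list(combinations(distinct, 2))
    mean_sim = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    return mean_sim < threshold
```

On the three titles in the example the mean similarity is 0, so the key is flagged; spelling or case variants of a single title would score high and pass.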

  18. Classification of Spam • Type 2: • Files with long descriptors. • Here a spammer attaches a single long descriptor, listing many popular terms, to the file. • E.g., a single replica descriptor for key 1200473A4BB17724194C5B9C271F3DC4: “Aerosmith, Van Halen, Quiet Riot, Kiss, Poison, Acdc, Accept, Def Leappard, Boney M, Megadeth, Metallica, Offspring, Beastie Boys, Run Dmc, Buckcherry, Salty Dog Remix.mp3”
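Type 2 spam can be approximated with a length check. This minimal sketch counts word tokens in each descriptor (ignoring the file extension) and flags a key if any descriptor exceeds `max_terms`; the cut-off of 15 terms is an assumption for illustration, not a value from the slides.

```python
import re

def term_count(descriptor):
    """Number of word tokens in a descriptor, excluding the extension."""
    name = descriptor.rsplit(".", 1)[0]
    return len(re.findall(r"[a-z0-9]+", name.lower()))

def is_type2_spam(descriptors, max_terms=15):
    """Flag a key if any replica descriptor has more than `max_terms`
    terms (`max_terms` is an assumed cut-off)."""
    return any(term_count(d) > max_terms for d in descriptors)
```

The example descriptor above has 24 terms under this tokenization and is flagged.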

  19. Classification of Spam • Type 3: • Files whose descriptors contain none of the query terms. • A server wishing to spread a file may return it regardless of whether it matches the query. • E.g., “Can you afford 0.09 www.BuyLegalMP3.com.mp3”
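Type 3 spam can be checked per result by testing whether the descriptor shares any term with the query. A minimal sketch, assuming simple lower-case word tokenization; `tokens` and `is_type3_spam` are hypothetical names.

```python
import re

def tokens(text):
    """Lower-case word tokens of a query or descriptor."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def is_type3_spam(query, descriptor):
    """Flag a result whose descriptor (minus extension) shares no term
    with the query."""
    name = descriptor.rsplit(".", 1)[0]
    return not (tokens(query) & tokens(name))
```

It flags the BuyLegalMP3 descriptor for a query such as "come on be my girl", but a query that happens to share a common word with it (e.g. "you") would let it through, so this check alone is coarse.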

  20. Classification of Spam • Type 4: • Files that are highly replicated on a single peer. • Normal users do not create multiple replicas of the same file on a single server; spammers do so to manipulate the group size. This also hampers query routing techniques designed to locate hard-to-find data. • E.g., 177 replicas of the file DY2QXX3MYW75SRCWSSUG6GY3FS7N7YC shared on a single peer.
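Type 4 spam can be spotted by counting replicas per (peer, key) pair, and its effect on ranking can be blunted by counting each peer only once per group. A minimal sketch, assuming results are (peer, key, descriptor) triples and that more than one replica per peer is suspicious; both function names and the cut-off are hypothetical.

```python
from collections import Counter, defaultdict

def type4_suspects(results, max_replicas_per_peer=1):
    """Return (peer, key) pairs where one peer returned more replicas of
    the same key than `max_replicas_per_peer` (an assumed cut-off)."""
    counts = Counter((peer, key) for peer, key, _descriptor in results)
    return {pair for pair, n in counts.items() if n > max_replicas_per_peer}

def group_size_by_distinct_peers(results):
    """Group size per key counting each peer once, so replicas stacked on
    a single peer cannot inflate the group."""
    peers = defaultdict(set)
    for peer, key, _descriptor in results:
        peers[key].add(peer)
    return {key: len(p) for key, p in peers.items()}
```

With 177 replicas on one peer, the key's group size drops from 177 to 1.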

  21. Proposal • We plan to implement the feature-based spam detection technique, which characterizes spam based on various features. • It includes a probing technique that gathers more descriptive information about result files and statistics about peers, together with ranking functions that make use of them. • Our implementation requires little new functionality in existing P2P file-sharing systems, so it can easily be combined with other existing techniques.
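The idea of feeding spam features into ranking can be sketched as a scoring function. This is an illustrative assumption, not the authors' ranking function: each key's score is its number of distinct peers, halved for each Type 2, 3 or 4 feature it shows (Type 1 is omitted for brevity), and `rank_with_features`, `max_terms` and `penalty` are hypothetical.

```python
import re
from collections import defaultdict

def _terms(text):
    # Distinct lower-case word tokens; a trailing file extension is dropped.
    return set(re.findall(r"[a-z0-9]+", text.rsplit(".", 1)[0].lower()))

def rank_with_features(query, results, max_terms=15, penalty=0.5):
    """Rank keys by distinct-peer group size, multiplied by `penalty` once
    per spam feature the key shows. results: (peer, key, descriptor)."""
    q = _terms(query)
    peers, descs, replicas = defaultdict(set), defaultdict(list), defaultdict(int)
    for peer, key, desc in results:
        peers[key].add(peer)
        descs[key].append(desc)
        replicas[(peer, key)] += 1
    scores = {}
    for key in peers:
        flags = 0
        flags += any(not (q & _terms(d)) for d in descs[key])         # Type 3
        flags += any(len(_terms(d)) > max_terms for d in descs[key])  # Type 2
        flags += any(replicas[(p, key)] > 1 for p in peers[key])      # Type 4
        scores[key] = len(peers[key]) * (penalty ** flags)
    return sorted(scores, key=lambda k: (-scores[k], k))
```

With this scoring, a spam key returned by three peers but matching no query terms scores 1.5 and ranks below a genuine key returned by two peers, whereas plain group-size ranking would put it first.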

  22. References • Papers • Author: Dongmei Jia. Title: Cost Effective Spam Detection Techniques in P2P File Sharing Systems. Conference: Proceedings of the 2008 ACM Workshop on Large-Scale Distributed Systems for Information Retrieval. Date: October 2008. Publisher: ACM. URL: http://portal.acm.org.ezproxy.rit.edu/results.cfm?coll=portal&dl=ACM&CFID=14901064&CFTOKEN=96029385

  23. References • Author: Dongmei Jia, Wai Gen Yee, Ophir Frieder. Title: Spam Characterization and Detection in Peer-to-Peer File-Sharing Systems. Conference: Proceedings of the 17th ACM Conference on Information and Knowledge Management. Date: October 2008. Publisher: ACM. URL: http://portal.acm.org.ezproxy.rit.edu/citation.cfm?id=1458082.1458128&coll=portal&dl=ACM&CFID=14901064&CFTOKEN=96029385

  24. References • Author: Jian Liang, Rakesh Kumar, Yongjian Xi, Keith W. Ross. Title: Pollution in P2P File Sharing Systems. Conference: INFOCOM 2005, 24th Annual Joint Conference of the IEEE Computer and Communications Societies, Proceedings IEEE. Date: March 2005. Publisher: IEEE. URL: http://ieeexplore.ieee.org.ezproxy.rit.edu/stamp/stamp.jsp?arnumber=1498344&isnumber=32100

  25. Sources • http://en.wikipedia.org/wiki/Peer-to-peer

  26. Questions?
