
SNFS: The design and implementation of a Social Network File System

Ch. Kaidos, A. Pasiopoulos, N. Ntarmos, P. Triantafillou, University of Patras. Shameless plug: if interested, please check out eXO: Decentralized Autonomous Scalable Social Networking.





Presentation Transcript


  1. SNFS: The design and implementation of a Social Network File System • Ch. Kaidos, A. Pasiopoulos, N. Ntarmos, P. Triantafillou • University of Patras

  2. Shameless plug • If interested, please check out • eXO: Decentralized Autonomous Scalable Social Networking, 5th Conference on Innovative Data Systems Research (CIDR 2011), 2011.

  3. Social Networks • Our take: • Search for people (friends, experts, …) and content (books, photos, videos, blogs, websites, …) • Form entities (collections): friend lists, content libraries • Search for entities, using previously formed collections… • SNFS currently provides the foundation for all of these.

  4. Tagging • Profiles: sets of tags describing entities. • “Search for”: based on profiles. • Ranked retrieval (top-k). [Figure: an entity described by Tag 1 to Tag 5]
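A minimal sketch of profile-based ranked retrieval, for illustration only: each entity's profile is a bag of tags (the entity names and tag counts are made up), and a query returns the k entities whose profiles carry the query tags most often. The plain count-sum score is an assumption; the design slides describe tf-idf weights instead.

```python
import heapq
from collections import Counter

# Hypothetical profiles: how many times users attached each tag to an entity.
profiles = {
    "photo-17": Counter({"beach": 3, "sunset": 2}),
    "video-42": Counter({"sunset": 1, "timelapse": 4}),
    "book-8": Counter({"beach": 1, "novel": 5}),
}


def search(query_tags, profiles, k):
    """Return the top-k (entity, score) pairs for a list of query tags.

    Assumed score: total count of the query tags in the entity's profile.
    Entities with no matching tag are left out.
    """
    scores = {
        entity: sum(tags[t] for t in query_tags)  # Counter yields 0 for missing tags
        for entity, tags in profiles.items()
    }
    matching = [(e, s) for e, s in scores.items() if s > 0]
    return heapq.nlargest(k, matching, key=lambda item: item[1])
```

For example, `search(["sunset"], profiles, 5)` ranks `photo-17` (score 2) above `video-42` (score 1) and omits `book-8`.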

  5. Current State • 5,000,000,000 photos; 3,000 photos/min (as of September 2010) • 2,000,000,000 videos served each day (May 2010) • 600,000,000 monthly active users (January 2011) • 15,000,000 books (October 2010); 130,000,000 by the end of the decade

  6. Current State • Need to access published content: • 22,750,000,000 queries in search engines • 4,000,000,000 queries in YouTube • 351,000,000 queries in Facebook • 416,000,000 queries in MySpace • (U.S. market figures, December 2009)

  7. Current State • How do I provide interesting objects to my users? • How do I find the stuff I want?

  8. Proposal • A content-aware file system for Social Network Systems • Useful to users... • ...and to service providers too!

  9. Previous Work on File Indexing • 1991 – Semantic File Systems by Gifford • 1996 – BeFS by Giampaolo and Meurillon, part of BeOS; BeOS never had commercial success... • 1998 – Indexing Service on Windows NT, not needed at the time; a remnant of the Object File System from the never-released Cairo project • Typically: • no ranked retrieval • no user input (tags) • no user relationships

  10. Desktop Search • 2004 – Windows Desktop Search, widely popular • 2005 on – Mac OS X's Spotlight, Google Desktop, Beagle, Strigi, Tracker... • Typically: • no ranked retrieval (?) • no user relationships • no exploitation of relationships for searching

  11. Problems • Power tools for power users... • But for average users? • Boolean operators??? • SQL-like queries???

  12. Previous Work on Ranked Retrieval • 1968 – SMART system by Salton: introduced weights in retrieval, instead of classical Boolean retrieval • 1975 – Vector space model and cosine similarity by Salton • 1988 – Other similarity functions tested and evaluated by Salton and Buckley • 2003 – Fagin proposes and compares several efficient algorithms for top-k retrieval

  13. Design

  14. Design – SNFS • Tags are extracted from each object and stemmed, and their frequencies are counted • Each object is associated with a unique ID in a tree • Weights are calculated for each tag and document • A tf-idf weighting scheme was chosen
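The indexing steps above can be sketched in Python as follows. This is a hedged illustration, not the SNFS code: `crude_stem` is a hypothetical stand-in for a real stemmer such as Porter's, object IDs are simply dictionary keys, and the tf-idf variant (tf × log(N/df)) is an assumption, since the slide only says that a tf-idf scheme was chosen.

```python
import math
import re
from collections import Counter


def crude_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer (not Porter's algorithm):
    strips a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_tags(text: str) -> Counter:
    """Split an object's tag text into lowercase words, stem them, count them."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(crude_stem(w) for w in words)


def tfidf_weights(objects: dict[int, str]) -> dict[int, dict[str, float]]:
    """Weight every (object, tag) pair with the assumed w = tf * log(N / df)."""
    tag_counts = {oid: extract_tags(text) for oid, text in objects.items()}
    n = len(tag_counts)
    df = Counter()  # number of objects containing each tag
    for counts in tag_counts.values():
        df.update(counts.keys())
    return {
        oid: {tag: tf * math.log(n / df[tag]) for tag, tf in counts.items()}
        for oid, counts in tag_counts.items()
    }
```

With `objects = {1: "beach sunset photos", 2: "sunset over the city", 3: "city photos photos"}`, the tag `photo` occurs twice in object 3 and in two of the three objects, so its weight there is 2 · log(3/2).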

  15. Design – SNFS • Term weights and object IDs are stored in an inverted index • Each posting list of the index is a B+tree stored in secondary memory • The position of each B+tree's root within the index is stored in a red-black tree
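A small in-memory model of this layout, assuming only Python's standard `bisect` module: a dict stands in for the red-black tree that locates each term's posting list, and a list kept in decreasing weight order stands in for the on-disk B+tree. The extra per-term dict for lookups by object ID is also an assumption, added because TA (slide 18) needs random access. The class and method names are hypothetical.

```python
from bisect import insort


class InvertedIndex:
    """In-memory sketch of an inverted index with weight-ordered posting lists.

    Assumptions: a dict replaces the red-black tree (term -> posting list),
    a sorted Python list replaces the on-disk B+tree, and each
    (term, object id) pair is added only once.
    """

    def __init__(self) -> None:
        self._sorted: dict[str, list[tuple[float, int]]] = {}
        self._by_id: dict[str, dict[int, float]] = {}

    def add(self, term: str, object_id: int, weight: float) -> None:
        # Negated weights make ascending list order equal decreasing weight.
        insort(self._sorted.setdefault(term, []), (-weight, object_id))
        self._by_id.setdefault(term, {})[object_id] = weight

    def postings(self, term: str) -> list[tuple[int, float]]:
        """Sorted access: (object id, weight) pairs, highest weight first."""
        return [(oid, -neg_w) for neg_w, oid in self._sorted.get(term, [])]

    def weight(self, term: str, object_id: int) -> float:
        """Random access: weight of object_id in term's list (0.0 if absent)."""
        return self._by_id.get(term, {}).get(object_id, 0.0)
```

After `add("sunset", 1, 0.4)` and `add("sunset", 2, 0.9)`, `postings("sunset")` yields object 2 before object 1, which is the order threshold algorithms consume.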

  16. Design – Search and retrieval • The query is split into terms, which are stemmed • The score of each document is calculated using a threshold algorithm and a tf-idf function
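A sketch of the query side, assuming a crude stand-in stemmer and a sum of per-term weights as the aggregate score (the summed thresholds on the following slides suggest a sum). The exhaustive scan is only a baseline: it reads every posting list to the end, which is exactly the work threshold algorithms try to avoid. All function names are hypothetical.

```python
import heapq
import re


def crude_stem(word: str) -> str:
    """Hypothetical stemmer stand-in (not Porter's): strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def parse_query(query: str) -> list[str]:
    """Split the query into terms and stem each one, dropping duplicates."""
    terms = [crude_stem(w) for w in re.findall(r"[a-z0-9]+", query.lower())]
    return list(dict.fromkeys(terms))


def exhaustive_top_k(query, postings, k):
    """Baseline top-k: read every posting list in full and sum term weights.

    postings maps a stemmed term to (object id, weight) pairs.
    """
    scores = {}
    for term in parse_query(query):
        for oid, w in postings.get(term, []):
            scores[oid] = scores.get(oid, 0.0) + w
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

Under a sum aggregate, NRA and TA aim for the same top-k as this scan while stopping early in each posting list.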

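  17. Threshold Algorithms • Input: posting lists sorted on weight (decreasing) • NRA (No Random Access) algorithm • [Figure: three posting lists read in parallel, one entry per list at each depth. t1: d1 (s1), d4 (s4), d2 (s7); t2: d2 (s2), d5 (s5), d3 (s8); t3: d3 (s3), d2 (s6), d4 (s9). Scores accumulated after depth 3: d1 = s1, d2 = s2+s6+s7, d3 = s3+s8, d4 = s4+s9, d5 = s5. Threshold at depths 1, 2, 3: s1+s2+s3, s4+s5+s6, s7+s8+s9.] • The algorithm halts when no object outside the top-k, including objects not yet seen, can still improve its score enough to overtake the k-th object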
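A minimal Python sketch of NRA under a sum aggregate, following the general description of NRA from Fagin's work on top-k aggregation rather than the SNFS source. It reads one entry per posting list per round, keeps a lower bound (scores seen so far) and an upper bound (lower bound plus the last score read from each list where the object has not yet appeared) for every seen object, and stops once the k-th lower bound is at least the threshold and at least every outside object's upper bound. The example lists follow the slide's shape (three lists over d1 to d5) with made-up integer scores and one extra entry per list; the function name is hypothetical.

```python
def nra_top_k(lists, k):
    """Top-k under a sum of scores using sorted access only (NRA).

    lists: posting lists of (object id, score) pairs, each sorted by
    decreasing score. Returns (top, sorted_accesses), where top holds
    (object id, lower-bound score) pairs. NRA guarantees which objects are
    in the top-k; a returned score may still be only a lower bound.
    """
    if k < 1:
        return [], 0
    seen = {}                    # object id -> {list index: score read there}
    last = [0.0] * len(lists)    # last score read from each list
    depth = accesses = 0
    while True:
        progressed = False
        for i, lst in enumerate(lists):
            if depth < len(lst):
                oid, score = lst[depth]
                seen.setdefault(oid, {})[i] = score
                last[i] = score
                accesses += 1
                progressed = True
            else:
                last[i] = 0.0    # exhausted list: no unseen scores remain in it
        depth += 1

        # Lower bound: scores seen so far. Upper bound: add, for every list
        # where the object has not appeared yet, the last score read there.
        lower = {o: sum(s.values()) for o, s in seen.items()}
        upper = {
            o: lower[o] + sum(last[i] for i in range(len(lists)) if i not in s)
            for o, s in seen.items()
        }
        ranked = sorted(lower, key=lambda o: (-lower[o], o))
        top = ranked[:k]
        if not progressed:       # every list fully read: bounds are exact
            return [(o, lower[o]) for o in top], accesses
        if len(top) == k:
            threshold = sum(last)  # best possible score of a never-seen object
            rival = max((upper[o] for o in ranked[k:]), default=0.0)
            if lower[top[-1]] >= max(threshold, rival):
                return [(o, lower[o]) for o in top], accesses


# Same shape as the slide: three lists over objects d1..d5 (made-up scores).
lists = [
    [("d1", 10), ("d4", 6), ("d2", 3), ("d5", 1)],  # t1
    [("d2", 8), ("d5", 5), ("d3", 2), ("d4", 1)],   # t2
    [("d3", 7), ("d2", 6), ("d4", 1), ("d1", 1)],   # t3
]
```

With these lists, `nra_top_k(lists, 1)` stops at depth 3, after 9 of the 12 sorted accesses, and returns `[("d2", 17)]`.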

  18. Threshold Algorithms • Input: posting lists sorted on weight (decreasing) • TA (Threshold Algorithm, with random accesses) • [Figure: the same posting lists t1, t2, t3 read to depth 3; the missing scores of each newly seen object are fetched by random access (e.g. d5: s5+s10). Threshold at depths 1, 2, 3: s1+s2+s3, s4+s5+s6, s7+s8+s9.] • The algorithm halts when the threshold drops to or below the score of the last (k-th) object in the top-k
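A matching sketch of Fagin's TA, under the same assumptions (sum aggregate, made-up scores, hypothetical function name). Every newly seen object is completed at once by random access to the other lists, simulated here with one dict per list, so scores are exact and the algorithm can stop as soon as the k-th score reaches the threshold. Counting sorted and random accesses separately makes the trade-off discussed on slide 19 visible.

```python
def ta_top_k(lists, k):
    """Top-k under a sum of scores with Fagin's Threshold Algorithm (TA).

    lists: posting lists of (object id, score) pairs, each sorted by
    decreasing score. Returns (top, sorted_accesses, random_accesses),
    where top holds (object id, exact score) pairs.
    """
    lookup = [dict(lst) for lst in lists]  # simulated random access by id
    scores = {}
    sorted_acc = random_acc = depth = 0
    while True:
        last = []
        for i, lst in enumerate(lists):
            if depth >= len(lst):
                last.append(0.0)           # exhausted list contributes nothing
                continue
            oid, score = lst[depth]
            sorted_acc += 1
            last.append(score)
            if oid not in scores:          # complete the new object's score
                total = score
                for j, table in enumerate(lookup):
                    if j != i:
                        random_acc += 1
                        total += table.get(oid, 0.0)
                scores[oid] = total
        depth += 1
        top = sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]
        threshold = sum(last)              # best score an unseen object could have
        exhausted = all(depth >= len(lst) for lst in lists)
        if exhausted or (len(top) == k and top[-1][1] >= threshold):
            return top, sorted_acc, random_acc


# Same shape as the slide: three lists over objects d1..d5 (made-up scores).
lists = [
    [("d1", 10), ("d4", 6), ("d2", 3), ("d5", 1)],  # t1
    [("d2", 8), ("d5", 5), ("d3", 2), ("d4", 1)],   # t2
    [("d3", 7), ("d2", 6), ("d4", 1), ("d1", 1)],   # t3
]
```

On these lists, `ta_top_k(lists, 1)` stops after 6 sorted accesses but performs 10 random accesses, which on a disk-resident index are the expensive kind.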

  19. Qualitative Comparison • [Table: NRA vs. TA on disk accesses, system calls, and state keeping and computation] • We expect TA to perform many more slow disk accesses • Can the cost of NRA's extensive state keeping and computation exceed the cost of TA's disk accesses? • We implement both, with the index on a hard disk and on a RAM disk, to find out...

  20. Implementation with FUSE

  21. Testing - 4 real-world test sets - Files containing tags from online objects - The index normally resides in secondary memory - A RAM disk is used to evaluate the effect of disk accesses

  22. Results Demanded vs. Time [Plot: disk-based index, TA vs. NRA]

  23. Results Demanded vs. Time [Plot: RAM-based index, TA vs. NRA]

  24. Query Terms vs. Time [Plot: disk-based index, TA vs. NRA]

  25. Query Terms vs. Time [Plot: RAM-based index, TA vs. NRA]

  26. Beagle vs. NRA [Plots: query terms vs. time; results demanded vs. time]

  27. Conclusions SNFS: - Indexing, storage, and ranked retrieval of entities in a social network (SN). - A study of the efficiency of algorithms and implementations, using real-world data and various implementations. - Competitive performance (e.g., against Beagle). - Many directions for further expansion.

  28. Future Work - Expansion to distributed systems and clouds - Distributed file systems (HDFS) - Distributed data structures - Tagging, indexing, and searching for entity collections: straightforward, as our ‘object’ implementation/abstraction already captures this. • Establishing entities that consist of relationships between entities, using advanced tagging, and searching for these…
