
INDRA - A Distributed In-memory Cache for Online Social Networks

Long Kai, Anjali Sridhar, Sreeram Kannan, Siva Theja Maguluri


Presentation Transcript


  1. INDRA - A Distributed In-memory Cache for Online Social Networks. Long Kai, Anjali Sridhar, Sreeram Kannan, Siva Theja Maguluri

  2. Motivation – “big multi-get”
     • Memcached: in-memory distributed hash table service used at Facebook
     • 400k connections to any Memcached server
     • An estimated 5 GB of memory is required to maintain the TCP connections
     • Replace TCP with UDP
     • High communication overhead

  3. Related Work - SPAR
     • In the consistent storage (SPAR):
     • Scalability: on average 7 copies are stored
     • System flexibility: confined to multi-get applications of small data items
     • Algorithmic flexibility: inefficient dynamic adaptation to new usage patterns
     • Reliability: complicated failure recovery mechanism
     • Load-balancing
     Reference: J. M. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, and P. Rodriguez. The little engine(s) that could: Scaling online social networks. In ACM SIGCOMM Computer Communication Review, volume 40, pages 375–386. ACM, 2010.

  4. Indra
     Guarantees: The most recent copy is present in the primary. Secondary copies are eventually consistent.
     Design principles: Reliance on the consistent storage for failure recovery. Idempotent operations only.
     [Figure: client, Indra server (memory cache), consistent storage]
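The guarantees on this slide can be illustrated with a small Python sketch. It is not the authors' implementation: the class `IndraSketch` and its methods are hypothetical names, the consistent storage is modeled as a plain dictionary, and secondary updates are assumed to be applied lazily through a `propagate` step. Writes are plain "set" operations, so re-applying one is harmless (idempotent).

```python
class IndraSketch:
    """Toy model of primary/secondary caching over a consistent store (assumed design)."""

    def __init__(self, servers):
        self.storage = {}                       # stand-in for the consistent storage
        self.caches = {s: {} for s in servers}  # per-server in-memory cache
        self.primary = {}                       # key -> primary server
        self.secondaries = {}                   # key -> set of secondary servers
        self.pending = []                       # lazy (server, key, value) updates

    def place(self, key, primary, secondaries=()):
        self.primary[key] = primary
        self.secondaries[key] = set(secondaries)

    def write(self, key, value):
        # The storage and the primary are updated immediately, so the
        # most recent copy is always present in the primary.
        self.storage[key] = value
        self.caches[self.primary[key]][key] = value
        # Secondaries are updated later: eventual consistency.
        for server in self.secondaries[key]:
            self.pending.append((server, key, value))

    def propagate(self):
        # Applying a "set" twice gives the same state (idempotent).
        for server, key, value in self.pending:
            self.caches[server][key] = value
        self.pending.clear()

    def read_primary(self, key):
        cache = self.caches[self.primary[key]]
        if key not in cache:
            # Failure recovery relies on the consistent storage.
            cache[key] = self.storage[key]
        return cache[key]

    def read_from(self, server, key):
        # A secondary may return a stale value or nothing at all.
        return self.caches[server].get(key)

    def crash(self, server):
        self.caches[server] = {}


cache = IndraSketch(["s1", "s2"])
cache.place("alice", primary="s1", secondaries=["s2"])
cache.write("alice", "v1")
print(cache.read_primary("alice"))     # most recent copy from the primary
print(cache.read_from("s2", "alice"))  # secondary not yet updated
cache.propagate()
print(cache.read_from("s2", "alice"))  # secondary has caught up
```

In this sketch a crashed primary simply reloads a key from storage on the next read, which is one way to read "reliance on the consistent storage for failure recovery".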

  5. Advantages
     • Modularity: Data reliability is decoupled from the partition and replication algorithm module.
     • Flexibility: Use of caching and eviction at individual servers to dynamically adapt to new usage patterns.

  6. Algorithm
     • Indra objectives
     • To place friends’ data together
     • To replicate popular data items
     • Based on access log
     • Weighted graph; weights denote joint access frequency
     [Figure: example graph on nodes a to f illustrating placement and replication]
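As a rough illustration of the weighted graph described on this slide, the Python sketch below builds edge weights from an access log. The log format is an assumption (each entry is the list of items fetched by one multi-get request), and the function name `build_access_graph` is hypothetical. An edge weight counts how many requests accessed both items, and a per-item count serves as a simple popularity measure.

```python
from collections import Counter
from itertools import combinations


def build_access_graph(access_log):
    """Return (edge_weight, popularity) computed from a list of requests.

    edge_weight[(u, v)] with u < v is the number of requests that accessed
    both u and v; popularity[u] is the number of requests that accessed u.
    """
    edge_weight = Counter()
    popularity = Counter()
    for request in access_log:
        items = sorted(set(request))  # ignore duplicates within one request
        popularity.update(items)
        for u, v in combinations(items, 2):
            edge_weight[(u, v)] += 1
    return edge_weight, popularity


# Made-up log for illustration only.
log = [["a", "c", "d"], ["c", "d"], ["a", "b"], ["c", "d", "e"]]
weights, popularity = build_access_graph(log)
print(weights[("c", "d")])  # c and d are fetched together in three requests
print(popularity["c"])      # c appears in three requests
```

Heavy edges suggest items to place on the same server, and high popularity counts suggest candidates for replication.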

  7. Mathematical Model
     Maximize, among all placement plans and replication plans:
     $$\text{Collocation Gain} - \text{Replication Cost} - \text{Server Load Cost}$$
     • Collocation gain: collocate user with friends
     • Replication cost: minimize replicas
     • Server load cost: balance the load among servers
     The problem incorporates several NP-hard problems!

  8. Problem Decomposition
     Key idea: separate the problem into two simpler problems!
     • Partitioning problem: among all placement plans, maximize $\text{Collocation Gain} - \text{Server Load Cost}$
     • Replication problem: among all replication plans, maximize $\text{Collocation Gain} - \text{Replication Cost}$

  9. Online Algorithm
     • Current state of system
     • User arrives
     • Compare the two possible placements
     • Assigned to server 1
     • Replicated at server 2
     • User arrives
     • Placement trades off collocation gain and load balancing cost
     [Figure: example graph on nodes a to f with arriving users g and h]
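The online step on this slide can be sketched in Python as a greedy rule. The slides do not give the exact cost functions, so this sketch assumes a linear load cost (`load_penalty` times the number of users on a server) and a fixed per-replica cost (`replication_cost`). The function name `place_user` and both parameters are hypothetical.

```python
def place_user(user, weights, assignment, loads,
               load_penalty=1.0, replication_cost=2.0):
    """Greedily place an arriving user and pick replica servers.

    weights:    dict friend -> joint access frequency with `user`
    assignment: dict user -> primary server (updated in place)
    loads:      dict server -> number of primary users (updated in place)
    """
    # Collocation gain per server: total weight to friends already there.
    gain = {server: 0.0 for server in loads}
    for friend, w in weights.items():
        server = assignment.get(friend)
        if server is not None:
            gain[server] += w

    # Placement trades off collocation gain against the (assumed linear)
    # load cost; ties go to the alphabetically first server.
    primary = max(sorted(loads),
                  key=lambda s: gain[s] - load_penalty * loads[s])
    assignment[user] = primary
    loads[primary] += 1

    # Replicate wherever the gain exceeds the (assumed fixed) cost.
    replicas = [s for s in sorted(loads)
                if s != primary and gain[s] > replication_cost]
    return primary, replicas


# Made-up state: a and b live on s1, c and d on s2.
assignment = {"a": "s1", "b": "s1", "c": "s2", "d": "s2"}
loads = {"s1": 2, "s2": 2}
print(place_user("g", {"a": 3, "b": 2, "c": 4}, assignment, loads))
print(place_user("h", {"d": 1}, assignment, loads))
```

With these made-up weights, user g is assigned to s1 and replicated at s2, mirroring the "assigned to server 1, replicated at server 2" example. User h then goes to s2 because s1 is already more heavily loaded.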

  10. Evaluation
     • Data set
       • Random walk on the Facebook data set from the Max Planck Institute for Software Systems
       • 6373 vertices and 183,734 edges; average 28.83 neighbors
     • Metrics
       • Number of connections
       • Number of TCP packets
     • Experimental setup
       • 10 / 15 servers
       • Consistent storage interface
       • Offline algorithm
       • 1000 read requests
       • Tcpdump and Wireshark for bandwidth analysis

  11. Bandwidth: Indra vs. Random
     Replication factor: 10 servers: 1.2; 15 servers: 1.8

  12. Trade-off between Replication and Connections

  13. Contributions
     • Proposed in-memory distributed cache
       • Takes advantage of data access relationships
       • Retrieves small data items in online social networks
       • Uses a dynamic partition and replication algorithm
     • Results show
       • Factor of 4 decrease in the number of TCP packets
       • Can trade off replication for number of connections
     Thanks!
