Manageability, Availability and Performance in Porcupine: A Highly Scalable, Cluster-based Mail Service

Yasushi Saito, Brian N. Bershad and Henry M. Levy, University of Washington

What is Porcupine?
• A highly scalable mail server
• “Cluster-based Internet mail server using SMTP”
• Why do we need another mail service?
  • Conventional systems do not exploit the heterogeneity of the nodes.
  • Conventional systems are not efficient.
  • Conventional systems use legacy software.
Kalyan Boggavarapu, Lehigh University, CSE 498

Disadvantages of Conventional Mail Servers
• Manageability:
  • Earlier systems must be configured manually.
  • The system has to be retuned for each newly added node or system in the distributed file system.
  • So a lot of work is involved when a node fails or a new node is added.
• Availability:
  • This depends on how well the system tolerates the loss of a node.
  • Conventional systems are less fault tolerant.
  • When a node fails, the users on that node temporarily cannot access their mail.
• Performance:
  • Performance does not grow in proportion to the number of nodes.
  • No dynamic load balancing.

Goals
• Manageability
• Availability
• Performance
  • Billions of messages per day

System Overview

How Porcupine Achieves Its Goals

Key Data Structures
• Mailbox fragment
• Mail map
• User profile database
• User profile soft state (set of users)
• User map
• Cluster membership list
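The slide only names the data structures, so the following is a minimal sketch of how a user map could route a user name to the node that manages that user's soft state. The bucket count, the CRC32 hash, the round-robin assignment, and the names `Node`, `build_user_map` and `manager_for` are assumptions made for illustration; they are not taken from the slides.

```python
import zlib
from dataclasses import dataclass, field

NUM_BUCKETS = 256  # assumed bucket count for this sketch


@dataclass
class Node:
    """Hypothetical per-node state, named after the slide's terms."""
    name: str
    # Mailbox fragments stored on this node: user -> list of messages.
    mailbox_fragments: dict = field(default_factory=dict)
    # Mail map entries this node manages: user -> set of node names.
    mail_map: dict = field(default_factory=dict)


def build_user_map(membership):
    """Assign every hash bucket to a live node, round-robin over the sorted membership list."""
    nodes = sorted(membership)
    return [nodes[i % len(nodes)] for i in range(NUM_BUCKETS)]


def manager_for(user, user_map):
    """Return the node responsible for this user's profile soft state and mail map."""
    bucket = zlib.crc32(user.encode("utf-8")) % len(user_map)
    return user_map[bucket]
```

Because every node can compute the same table from the same cluster membership list, any node can find a user's manager without consulting a central directory.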

Data Structure Managers

A cluster of 2

Receiving a Message

Load Balancing
• Equal distribution of data among the nodes
• Identify the hot spots and divide the load accordingly
• Test bed
  • Systems: 30
  • Ethernet: 100 Mbps
  • OS: Linux 2.2.7
  • Mean message size: 4.7 KB; max 1 MB
  • Number of users: 5M
  • Authentication: none
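The two load-balancing bullets can be illustrated with a small sketch that picks a storage node for an incoming message. It is only one plausible reading: the load metric, the `spread` cap on how many nodes may hold one user's mail, and the function name `choose_storage_node` are assumptions of this example, not details given in the slides.

```python
def choose_storage_node(user_fragments, loads, spread=2):
    """Pick the node that should store a new message for a user.

    user_fragments: nodes that already hold mail for this user.
    loads: node name -> current load (for example, pending disk operations).
    spread: assumed soft cap on how many nodes may hold one user's mail.
    """
    # Only consider fragment holders that are still in the load table (alive).
    live = sorted(n for n in user_fragments if n in loads)
    # Once the user's mail is spread widely enough, stay on those nodes;
    # otherwise any node in the cluster is a candidate.
    candidates = live if len(live) >= spread else sorted(loads)
    # The least-loaded candidate wins; ties resolve alphabetically.
    return min(candidates, key=lambda n: loads[n])
```

A larger `spread` lets the cluster steer traffic away from hot spots more freely, at the cost of scattering a user's mail over more nodes.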

Manageability

Porcupine reconfigures automatically
• Without: fall in number of messages = 100 (approx.)
• With: fall in number of messages = 50 (approx.)

Availability

Mail map consistency
• C fails before the update
  • No problem: the message is replicated.
• C deletes all of Bob's messages (Bob is managed by A), but the update fails
  • No problem: A will delete the dangling pointers.
• A fails before the update
  • A new manager will take the update later.
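The second case above (a mail map entry that points at a node with no mail left) can be sketched as a lazy cleanup during retrieval. This is a minimal illustration under assumed data shapes; the function `fetch_mail` and both dictionaries are hypothetical, not Porcupine's actual interfaces.

```python
def fetch_mail(user, mail_map, fragments):
    """Collect a user's mail and drop mail map entries that point at empty nodes.

    mail_map: user -> set of node names believed to hold a fragment.
    fragments: node name -> {user -> list of messages} actually on disk.
    """
    messages = []
    # sorted() copies the set, so discarding entries inside the loop is safe.
    for node in sorted(mail_map.get(user, set())):
        stored = fragments.get(node, {}).get(user)
        if stored:
            messages.extend(stored)
        else:
            # Dangling pointer: this node no longer has mail for the user.
            mail_map[user].discard(node)
    return messages
```

For example, if the manager believes Bob's mail is on A and C but C has already deleted its fragment, one retrieval returns A's messages and shrinks the entry to `{"A"}`.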

States of Replication
• Hard state
  • Passwords and user logins are written permanently.
  • Data that should not be lost.
• Soft state
  • User-to-node mappings.
  • This can be reconstructed after a loss.

Hard State Replication
• Aim: consistency
• Type: per-message, per-user
• Effect: efficient during normal operation

Effect of Replication

Soft-state Reconstruction
[Figure: timeline of soft-state reconstruction. Step 1: membership protocol and user map recomputation. Step 2: distributed disk scan. Mail map entries shown in the example: bob: {A,C}; joe: {C}; suzy: {A,B}; ann: {B}.]
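The distributed disk scan step can be sketched as follows: each node reports the users it holds fragments for, and each user's manager collects those reports into a fresh mail map. The function name, the data shapes, and the manager assignment used in the example are hypothetical; only the resulting entries (bob, joe, suzy, ann) come from the figure.

```python
def rebuild_mail_maps(disks, manager_of):
    """Rebuild mail maps from what each node finds on its own disk.

    disks: node name -> set of users with a mailbox fragment on that node.
    manager_of: function mapping a user to the node that manages that user.
    Returns: manager node -> {user -> set of nodes holding a fragment}.
    """
    mail_maps = {node: {} for node in disks}
    for node, users in disks.items():
        for user in users:
            manager = manager_of(user)
            # The scanning node tells the user's manager "I hold mail for this user".
            mail_maps.setdefault(manager, {}).setdefault(user, set()).add(node)
    return mail_maps


# Disk contents consistent with the figure's mail map entries.
disks = {"A": {"bob", "suzy"}, "B": {"suzy", "ann"}, "C": {"bob", "joe"}}
# Hypothetical manager assignment after the user map is recomputed.
assignment = {"bob": "B", "suzy": "B", "joe": "C", "ann": "A"}
rebuilt = rebuild_mail_maps(disks, assignment.get)
```

Since the mail map is soft state, nothing has to be logged or replicated to make this work; the fragments on disk are the ground truth and the scan simply re-derives the pointers.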

Advantages of Porcupine
• Best use of resources
• Self-configuration
• Dynamic load balancing
• Result:
  • Geographically distributed cluster servers
  • Highly scalable
  • Fault tolerant
• Future work
  • Better membership protocol
  • Applying Porcupine to other applications such as Usenet

Sources
• The porcupine figure in all slides is from http://www.bluebison.net/yosemite/porcupine.htm
• The diagrams in slides 17 and 19 are from the slides at http://www.hpl.hp.com/personal/Yasushi_Saito/pubs.html#publications
