Distributed Storage Allocations and a Hypergraph Conjecture of Erdős

Yi-Hsuan Kao (1), Alexandros G. Dimakis (2), Derek Leong (3, 4), and Tracey Ho (3)
• (1) University of Southern California, Los Angeles, California, USA
• (2) The University of Texas at Austin, Austin, Texas, USA
• (3) California Institute of Technology, Pasadena, California, USA
• (4) Institute for Infocomm Research, Singapore
• ISIT 2013
• 2013-07-09

Distributed Storage Allocations: An Example
Suppose you have a distributed storage system comprising 5 storage devices (“nodes”)…
[Figure: nodes 1 to 5]

Distributed Storage Allocations: An Example
Each node independently fails with probability 1/3, and survives with probability 2/3…
[Figure: five nodes, two failed and three surviving; this pattern occurs with probability (1/3)^2 (2/3)^3 ≈ 0.0329218]

Distributed Storage Allocations: An Example
Each node independently fails with probability 1/3, and survives with probability 2/3…
[Figure: five nodes, all failed; this pattern occurs with probability (1/3)^5 ≈ 0.00411523]
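The two probabilities quoted on these slides can be checked with exact rational arithmetic. The following is a minimal sketch, assuming independent failures with probability 1/3 as stated; the helper name `pattern_probability` is hypothetical and not from the talk.

```python
from fractions import Fraction


def pattern_probability(n_failed, n_survived, p_fail=Fraction(1, 3)):
    """Probability of one specific failure pattern when nodes fail independently."""
    return p_fail ** n_failed * (1 - p_fail) ** n_survived


# Two particular nodes fail and the other three survive.
two_fail = pattern_probability(2, 3)
# All five nodes fail.
all_fail = pattern_probability(5, 0)

print(two_fail, float(two_fail))
print(all_fail, float(all_fail))
```

Using `Fraction` keeps the results exact (8/243 and 1/243) before they are rounded for display.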

Distributed Storage Allocations: An Example
You are given a single data object of (normalized) unit size, and a total storage budget of 7/3…
[Figure: nodes 1 to 5]

Distributed Storage Allocations: An Example
You can use any coding scheme to store any amount of coded data in each node, as long as the total amount of storage used is at most the given budget 7/3…
[Figure: nodes 1 to 5]

Distributed Storage Allocations: An Example
[Figure: coded data of varying sizes stored across nodes 1 to 5]

Distributed Storage Allocations: An Example
[Figure: coded data of varying sizes stored across nodes 1 to 5, under a failure pattern that occurs with probability (1/3)^2 (2/3)^3 ≈ 0.0329218; can the data object be recovered?]

Distributed Storage Allocations: An Example
For maximum reliability, we need to find (1) an optimal allocation of the given budget over the nodes, and (2) an optimal coding scheme, that jointly maximize the probability of successful recovery.

Distributed Storage Allocations: An Example
Using an appropriate code, successful recovery can occur whenever the data collector accesses at least a unit amount of data (= size of the original data object).
[Figure: network with source s, storage nodes 1 to 5, and data collectors t1 and t2]
A. G. Dimakis et al., “Network coding for distributed storage systems,” IEEE Trans. Inf. Theory, Sep. 2010.
A. Jiang, “Network coding for joint storage and transmission with minimum cost,” in Proc. ISIT, Jul. 2006.

Distributed Storage Allocations: An Example
[Figure: nodes 1 to 5]

Distributed Storage Allocations: An Example
n = 5 nodes, access probability p = 2/3, budget T = 7/3

Allocation   x1     x2     x3     x4     x5     Recovery probability
A            7/15   7/15   7/15   7/15   7/15   0.79012
B            7/6    7/6    0      0      0      0.88889
C            2/3    2/3    1/3    1/3    1/3    0.90535
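The recovery probabilities for allocations A, B, and C can be reproduced by brute force. The sketch below assumes the model described on the earlier slides (each node is accessed independently with probability p, and recovery succeeds when the accessed nodes together hold at least a unit amount of data); the function name `recovery_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def recovery_probability(allocation, p):
    """Sum the probabilities of all access patterns that hold >= 1 unit of data."""
    n = len(allocation)
    total = Fraction(0)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            # The accessed nodes are exactly those in `subset`.
            if sum(allocation[i] for i in subset) >= 1:
                total += p ** k * (1 - p) ** (n - k)
    return total


p = Fraction(2, 3)
A = [Fraction(7, 15)] * 5
B = [Fraction(7, 6), Fraction(7, 6), Fraction(0), Fraction(0), Fraction(0)]
C = [Fraction(2, 3), Fraction(2, 3), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]

for name, alloc in [("A", A), ("B", B), ("C", C)]:
    assert sum(alloc) == Fraction(7, 3)  # each allocation uses the full budget
    prob = recovery_probability(alloc, p)
    print(name, prob, round(float(prob), 5))
```

The enumeration visits all 2^n access patterns, so it is only practical for small n such as this 5-node example.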

Distributed Storage Allocations
• Two access models for the data collector:
  • Independently probabilistic access to each node
  • Access to a random fixed-size subset of nodes
• Goal: Allocate a given budget T for maximum reliability, i.e., maximize the probability of successful recovery
• Combinatorial problem
[Figure: data (unit size) is stored as coded data (size T) over n storage nodes and accessed by a data collector]
D. Leong, A. G. Dimakis, and T. Ho, “Distributed storage allocations,” IEEE Trans. Inf. Theory, Jul. 2012.

Fixed-Size Subset Problem: A Conjecture
• Our Conjecture: If budget T is an integer, then the optimal allocation is either maximal spreading (1/r, ..., 1/r, 0, ..., 0) or minimal spreading (1, ..., 1, 0, ..., 0)
• Our conjecture turns out to be equivalent to the Fractional Erdős Conjecture (e.g., see N. Alon, P. Frankl, H. Huang, V. Rödl, A. Ruciński, and B. Sudakov, “Large matchings in uniform hypergraphs and the conjectures of Erdős and Samuels,” Journal of Combinatorial Theory, Series A, vol. 119, pp. 1200–1215, 2012.)
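To see how the two candidate allocations compare, one can count successful r-subsets directly. This is a minimal sketch under two assumptions not stated on the slide: maximal spreading places 1/r on exactly rT nodes (which requires rT ≤ n), and minimal spreading places 1 on exactly T nodes, so that both use the whole budget. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def count_successful_subsets(allocation, r):
    """Number of r-subsets of nodes whose stored amounts sum to at least 1."""
    return sum(1 for s in combinations(allocation, r) if sum(s) >= 1)


def maximal_spreading(n, r, T):
    """Assumed form: 1/r on each of r*T nodes, 0 elsewhere."""
    m = r * T
    if m > n:
        raise ValueError("this sketch assumes r*T <= n")
    return [Fraction(1, r)] * m + [Fraction(0)] * (n - m)


def minimal_spreading(n, T):
    """Assumed form: 1 on each of T nodes, 0 elsewhere."""
    return [Fraction(1)] * T + [Fraction(0)] * (n - T)


for n, r, T in [(6, 3, 2), (9, 3, 2)]:
    spread = count_successful_subsets(maximal_spreading(n, r, T), r)
    concentrated = count_successful_subsets(minimal_spreading(n, T), r)
    print(f"n={n} r={r} T={T}: maximal={spread} minimal={concentrated}")
```

Under these assumptions, an r-subset succeeds for maximal spreading only when all r nodes are nonempty, and for minimal spreading whenever it contains at least one nonempty node, so the counts are C(rT, r) and C(n, r) − C(n − T, r). Which allocation wins depends on n: in this sketch maximal spreading is ahead for n = 6 and minimal spreading for n = 9.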

Hypergraphs and the Minimum Vertex Cover Problem
• The Erdős conjecture is a graph-theoretic claim involving hypergraphs and their matching numbers
• It is more intuitive to explain in terms of the dual problem, the Minimum Vertex Cover Problem:
  • Given a hypergraph H = (V, E)
  • Assign 0 or 1 to each vertex in V such that each hyperedge in E has a sum of at least 1 (each hyperedge is “covered” by one or more vertices)
  • Goal is to minimize the sum over all vertices
• Correspondence with the storage allocation problem:
  • Vertices = nodes (assigned value is x_i)
  • Hyperedges = successful r-subsets of nodes
• In the fractional version of the problem, we are allowed to assign a fractional value between 0 and 1 to each vertex
• By strong duality, the fractional minimum vertex cover problem and the fractional maximum matching problem have the same optimal value
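The correspondence between allocations and vertex covers can be illustrated on a small instance. The sketch below is an assumption-laden toy, not the authors' construction: it builds the hypergraph of successful r-subsets from a hypothetical allocation with values in [0, 1] and checks which vertex assignments cover every hyperedge.

```python
from fractions import Fraction
from itertools import combinations


def successful_subsets(x, r):
    """Hyperedges: r-subsets of node indices whose amounts sum to at least 1."""
    return [s for s in combinations(range(len(x)), r)
            if sum(x[i] for i in s) >= 1]


def is_fractional_cover(x, hyperedges):
    """True if all values lie in [0, 1] and every hyperedge has total weight >= 1."""
    in_range = all(0 <= v <= 1 for v in x)
    return in_range and all(sum(x[i] for i in e) >= 1 for e in hyperedges)


half = Fraction(1, 2)
x_frac = [half, half, half, half, 0, 0]   # fractional assignment, weight 2
edges = successful_subsets(x_frac, 2)      # all pairs among the first four nodes

x_int_small = [1, 1, 0, 0, 0, 0]           # 0/1 assignment of weight 2
x_int = [1, 1, 1, 0, 0, 0]                 # 0/1 assignment of weight 3

print(len(edges), sum(x_frac), is_fractional_cover(x_frac, edges))
print(sum(x_int_small), is_fractional_cover(x_int_small, edges))
print(sum(x_int), is_fractional_cover(x_int, edges))
```

In this toy instance the fractional assignment covers all six hyperedges with total weight 2, while the 0/1 assignment of weight 2 shown here misses the hyperedge {2, 3} and the 0/1 assignment of weight 3 covers everything.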

Integral Erdős Conjecture (1965)
• Erdős conjectured that, given a constraint on the matching number, the maximum number of hyperedges in an r-uniform hypergraph can be one of only two possible values:
  • maximal spreading (1/r, ..., 1/r, 0, ..., 0)
  • minimal spreading (1, ..., 1, 0, ..., 0)

Fractional Erdős Conjecture
• maximal spreading (1/r, ..., 1/r, 0, ..., 0)
• minimal spreading (1, ..., 1, 0, ..., 0)
[Figure: comparison of maximal spreading and minimal spreading]

Further Results Involving the Conjectures
• If the Integral Erdős Conjecture is true, then we obtain strong bounds on the probability of successful recovery for the fixed-size subset problem
• Also, we found new conditions under which the Integral Erdős Conjecture implies the Fractional Erdős Conjecture

Upper Bounds for the Fixed-Size Subset Problem
• We found stronger bounds for the fixed-size subset problem that are independent of the conjectures
• These bounds can also be used to improve our earlier bounds for the independent probabilistic access variation of the storage allocation problem
• Theorem: For any feasible allocation (x_1, ..., x_n), i.e., such that x_1 + ... + x_n ≤ T and x_i ≥ 0 for all i, the number of successful r-subsets S has the following upper bound:
  [Equation not preserved in this transcript]
• Proof uses permutation counting arguments similar to Katona’s proof of the Erdős-Ko-Rado theorem

Thank You!
