Distributed Storage Allocation Problems

Distributed Storage Allocation Problems
Derek Leong, Alexandros G. Dimakis, Tracey Ho
California Institute of Technology
NetCod 2009, 2009-06-16

Motivation

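Motivation
[Figure: five storage nodes with allocations still to be chosen (? ? ? ? ?), labeled with the values 0.1 and 2; recovery condition Σ ≥ 1?]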

Motivation
Allocation A: 1, 1, 0, 0, 0
Allocation B: 2/5, 2/5, 2/5, 2/5, 2/5
Allocation C: 1/2, 1/2, 1/2, 1/2, 0

Motivation
Allocation A: 1, 1, 0, 0, 0
Success probability
= 0.9^0 × 0.1^5 × 0 (successful 0-subsets)
+ 0.9^1 × 0.1^4 × 2 (successful 1-subsets)
+ 0.9^2 × 0.1^3 × 7 (successful 2-subsets)
+ 0.9^3 × 0.1^2 × 9 (successful 3-subsets)
+ 0.9^4 × 0.1^1 × 5 (successful 4-subsets)
+ 0.9^5 × 0.1^0 × 1 (successful 5-subsets)
= 0.99

Motivation
Allocation B: 2/5, 2/5, 2/5, 2/5, 2/5
Success probability
= 0.9^0 × 0.1^5 × 0 (successful 0-subsets)
+ 0.9^1 × 0.1^4 × 0 (successful 1-subsets)
+ 0.9^2 × 0.1^3 × 0 (successful 2-subsets)
+ 0.9^3 × 0.1^2 × 10 (successful 3-subsets)
+ 0.9^4 × 0.1^1 × 5 (successful 4-subsets)
+ 0.9^5 × 0.1^0 × 1 (successful 5-subsets)
= 0.99144

Motivation
Allocation C: 1/2, 1/2, 1/2, 1/2, 0
Success probability
= 0.9^0 × 0.1^5 × 0 (successful 0-subsets)
+ 0.9^1 × 0.1^4 × 0 (successful 1-subsets)
+ 0.9^2 × 0.1^3 × 6 (successful 2-subsets)
+ 0.9^3 × 0.1^2 × 10 (successful 3-subsets)
+ 0.9^4 × 0.1^1 × 5 (successful 4-subsets)
+ 0.9^5 × 0.1^0 × 1 (successful 5-subsets)
= 0.9963

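Motivation
Allocation A: 1, 1, 0, 0, 0 (success probability 0.99)
Allocation B: 2/5, 2/5, 2/5, 2/5, 2/5 (success probability 0.99144)
Allocation C: 1/2, 1/2, 1/2, 1/2, 0 (success probability 0.9963)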
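The three success probabilities above can be reproduced by brute force. The following is a minimal sketch, assuming each node is accessed independently with probability 0.9 (the value used in the sums above); the function name `success_probability` is hypothetical and not from the slides.

```python
from fractions import Fraction
from itertools import combinations

def success_probability(allocation, p):
    """Sum the probability of every accessed subset holding at least 1 unit of data."""
    n = len(allocation)
    total = Fraction(0)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if sum(allocation[i] for i in subset) >= 1:
                # k nodes accessed (prob p each), n - k nodes missed
                total += p**k * (1 - p)**(n - k)
    return total

p = Fraction(9, 10)  # assumed access probability, matching 0.9 in the sums above
allocations = {
    "A": [1, 1, 0, 0, 0],
    "B": [Fraction(2, 5)] * 5,
    "C": [Fraction(1, 2)] * 4 + [0],
}
results = {name: success_probability(x, p) for name, x in allocations.items()}
for name, prob in results.items():
    print(name, float(prob))
```

Exact fractions are used so that borderline sums such as 1/2 + 1/2 compare correctly against 1.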

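Motivation
[Figure: the five-node setup (values 0.1 and 2, allocations ? ? ? ? ?, condition Σ ≥ 1?) with its two parts labeled "allocation model" and "access model"]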

Problem Description
How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?
• Storage Allocation
• Access by the Data Collector
• Objective

Problem Description
How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?
• Storage Allocation
  • Source s has a data object of unit size
  • It can use n storage nodes to store x1, x2, …, xn amount of data
  • But it faces an aggregate storage budget T, i.e. x1 + x2 + … + xn ≤ T
• Access by the Data Collector
• Objective

Problem Description
How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?
• Storage Allocation
• Access by the Data Collector
  • Data collector t attempts to recover the data object by accessing a subset r of storage nodes
  • It succeeds when the total amount of data accessed is at least the size of the data object, i.e. Σ_{i∈r} xi ≥ 1
• Objective

Problem Description
How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?
• Storage Allocation
• Access by the Data Collector
• Objective
  • We seek the optimal allocation that maximizes the probability of successful recovery

Problem Description
How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?
• Difficulty
  • Problem is nonconvex
  • Large space of possible symmetric and nonsymmetric allocations (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise)

[1] Deterministic Allocation with Probabilistic Access
Data collector accesses each storage node independently with constant probability p
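Under this access model, the success probability of an allocation can be written out as a sum over all subsets of nodes that the data collector might end up accessing. The indicator notation and the symbol P_success are assumptions introduced here for illustration; the sum itself follows directly from independent access with probability p.

```latex
P_{\text{success}}(x_1,\dots,x_n)
  = \sum_{\mathbf{r} \subseteq \{1,\dots,n\}}
    p^{|\mathbf{r}|} (1-p)^{\,n-|\mathbf{r}|}\;
    \mathbf{1}\!\left[\, \sum_{i \in \mathbf{r}} x_i \ge 1 \right]
```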

[1] Deterministic Allocation with Probabilistic Access
• Symmetric allocations can be suboptimal
  • Given n = 5 storage nodes, budget T = 12/5, and p = 0.9, a nonsymmetric allocation performs better than the optimal symmetric allocation†
• Finding the optimal symmetric allocation is also nontrivial
† Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley
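The suboptimality claim for n = 5, T = 12/5, p = 0.9 can be checked numerically. The sketch below makes two assumptions: the best symmetric allocation spends the full budget (T/m on each of m nodes), and the nonsymmetric candidate (3/5, 3/5, 2/5, 2/5, 2/5) is a hand-picked illustration, not necessarily the allocation shown on the original slide. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def success_probability(allocation, p):
    """Probability that independently accessed nodes hold at least 1 unit in total."""
    n = len(allocation)
    total = Fraction(0)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if sum(allocation[i] for i in subset) >= 1:
                total += p**k * (1 - p)**(n - k)
    return total

def best_symmetric(n, T, p):
    """Try each symmetric allocation that puts T/m on m nodes; return (m, probability)."""
    best = None
    for m in range(1, n + 1):
        allocation = [T / m] * m + [Fraction(0)] * (n - m)
        prob = success_probability(allocation, p)
        if best is None or prob > best[1]:
            best = (m, prob)
    return best

n, T, p = 5, Fraction(12, 5), Fraction(9, 10)
m_best, p_symmetric = best_symmetric(n, T, p)

# Hypothetical nonsymmetric candidate with the same total budget 12/5
nonsymmetric = [Fraction(3, 5)] * 2 + [Fraction(2, 5)] * 3
p_nonsymmetric = success_probability(nonsymmetric, p)

print(m_best, float(p_symmetric), float(p_nonsymmetric))
```

In this sketch the best symmetric allocation uses 4 nodes, and the nonsymmetric candidate achieves a strictly higher success probability with the same budget.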

[2] Deterministic Allocation with Fixed Access
Data collector accesses an r-subset of storage nodes, selected uniformly at random from the collection of all possible r-subsets, where r < n is a constant

[2] Deterministic Allocation with Fixed Access
• Equivalently, we can seek the allocation that minimizes the budget T among all allocations that achieve a given probability of successful recovery

[2] Deterministic Allocation with Fixed Access
• Example: (n, r) = (6, 2)
• Question: For any budget T, is there always a symmetric allocation that produces the maximum success probability?
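For the fixed access model, the success probability is simply the fraction of r-subsets whose stored data adds up to at least 1. A minimal sketch for (n, r) = (6, 2) follows; the three allocations and the function name `fixed_access_success` are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def fixed_access_success(allocation, r):
    """Fraction of r-subsets of nodes that together hold at least 1 unit of data."""
    n = len(allocation)
    good = sum(
        1
        for subset in combinations(range(n), r)
        if sum(allocation[i] for i in subset) >= 1
    )
    return Fraction(good, comb(n, r))

half = Fraction(1, 2)
concentrated = [1, 1, 0, 0, 0, 0]   # budget T = 2 on two nodes
spread_four = [half] * 4 + [0, 0]   # budget T = 2 on four nodes
spread_all = [half] * 6             # budget T = 3 on all six nodes

p_concentrated = fixed_access_success(concentrated, 2)
p_spread_four = fixed_access_success(spread_four, 2)
p_spread_all = fixed_access_success(spread_all, 2)
print(p_concentrated, p_spread_four, p_spread_all)
```

With budget 2, concentrating on two nodes beats spreading over four in this example, while budget 3 spread over all six nodes guarantees recovery.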

[2] Deterministic Allocation with Fixed Access
• Question: What is the optimal symmetric allocation?
• For most choices of (n, r, T), the optimal allocation either concentrates the budget over a minimal number of nodes, or spreads it out maximally
• An example of an exception is (n, r, T) = (15, 3, 4.6), for which the optimal number of nodes to use, 9, is neither of the extremes
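The exception (n, r, T) = (15, 3, 4.6) can be verified with a short search over the number m of nonempty nodes. This sketch assumes a symmetric allocation stores T/m on each of m nodes, so recovery needs at least ceil(m/T) nonempty nodes among the r accessed; the count of such subsets is hypergeometric. The function name is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb

def symmetric_success(n, r, T, m):
    """Success probability when m of n nodes each store T/m and an r-subset is accessed."""
    need = ceil(Fraction(m) / T)  # nonempty nodes that must be among those accessed
    hits = sum(
        comb(m, j) * comb(n - m, r - j)  # j nonempty and r - j empty nodes accessed
        for j in range(need, min(m, r) + 1)
    )
    return Fraction(hits, comb(n, r))

n, r, T = 15, 3, Fraction(23, 5)  # T = 4.6
probs = {m: symmetric_success(n, r, T, m) for m in range(1, n + 1)}
best_m = max(probs, key=probs.get)
print(best_m, probs[best_m])
```

The candidates at the extremes, m = 4 (each node holds the whole object) and m = 13 (maximal spreading), both score lower than m = 9 in this computation.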

[2] Deterministic Allocation with Fixed Access
• For probability-1 recovery, the problem reduces to a simple LP
• Result 1: If we require all possible r-subsets to allow successful recovery, then we need a minimum budget of T = n/r, which corresponds to the allocation x1 = x2 = … = xn = 1/r, i.e. it is optimal to spread the budget maximally
• We can also bound the success probability above which this allocation is optimal
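One way to see why probability-1 recovery forces a budget of at least n/r is an averaging argument over the LP constraints. The derivation below is a sketch added for illustration and is not claimed to be the proof used by the authors.

```latex
\begin{aligned}
&\text{minimize } T = \sum_{i=1}^{n} x_i
  \quad \text{subject to} \quad
  \sum_{i \in \mathbf{r}} x_i \ge 1 \ \text{ for every } r\text{-subset } \mathbf{r},
  \qquad x_i \ge 0. \\[4pt]
&\text{Summing all } \tbinom{n}{r} \text{ constraints, each } x_i \text{ appears in }
  \tbinom{n-1}{r-1} \text{ of them:} \\
&\qquad \binom{n-1}{r-1} \sum_{i=1}^{n} x_i \;\ge\; \binom{n}{r}
  \quad\Longrightarrow\quad
  T \;\ge\; \frac{\binom{n}{r}}{\binom{n-1}{r-1}} \;=\; \frac{n}{r}. \\[4pt]
&\text{The allocation } x_i = \tfrac{1}{r} \text{ for all } i
  \text{ is feasible and attains } T = \tfrac{n}{r}.
\end{aligned}
```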

[3] Symmetric Probabilistic Allocation with Fixed Access
Each storage node is used independently with constant probability s/n to store the same amount of data 1/ℓ, and the total storage used must be at most the budget T in expectation

[3] Symmetric Probabilistic Allocation with Fixed Access
• Probability of successful recovery can be written as Pr[Bin(r, s/n) ≥ ℓ], where "Bin(n, p)" denotes the binomial random variable with n trials and success probability p
• Reparameterizing in terms of budget T gives the success probability P(r, T, ℓ)
(each nonempty node stores 1/ℓ amount of data)

[3] Symmetric Probabilistic Allocation with Fixed Access
• Result 2: For any r ≥ 2, and at any budget T large enough to support a success probability P(r, T, ℓ) > 0.9 for some ℓ, the choice ℓ = r is optimal, i.e. it is best to spread the budget maximally
(each nonempty node stores 1/ℓ amount of data)

[3] Symmetric Probabilistic Allocation with Fixed Access
• As we increase the budget T, we observe a sharp change in the optimal allocation
  • For small budgets, and therefore low success probabilities, it is optimal to store the data object in its entirety (ℓ = 1) and hope the data collector accesses at least one of the nonempty nodes
  • For large budgets, and therefore high success probabilities, it is optimal to store only 1/r amount of data in each node used (ℓ = r) and hope the data collector accesses r of them
[Figure: r = 5]
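The switch between ℓ = 1 and ℓ = r can be illustrated numerically for r = 5. This sketch assumes that each accessed node is nonempty with probability min(ℓ·T/n, 1), so that expected storage equals the budget T, and that recovery needs at least ℓ nonempty nodes among the r accessed. This closed form is a reconstruction from the model description, not a formula taken from the slides, and the two budget values are arbitrary illustrative choices.

```python
from math import comb

def binomial_tail(trials, prob, at_least):
    """Pr[Bin(trials, prob) >= at_least]."""
    return sum(
        comb(trials, j) * prob**j * (1 - prob) ** (trials - j)
        for j in range(at_least, trials + 1)
    )

def success(r, budget_per_node, ell):
    """Assumed success probability, with budget_per_node standing for T/n."""
    q = min(ell * budget_per_node, 1.0)  # assumed chance that a given node is nonempty
    return binomial_tail(r, q, ell)

def best_ell(r, budget_per_node):
    """Value of ell in 1..r that maximizes the assumed success probability."""
    return max(range(1, r + 1), key=lambda ell: success(r, budget_per_node, ell))

r = 5
low_budget_choice = best_ell(r, 0.05)   # small budget: T/n = 0.05
high_budget_choice = best_ell(r, 0.19)  # large budget: T/n = 0.19
print(low_budget_choice, high_budget_choice)
```

Under these assumptions the small budget favors storing the whole object in each node used, and the large budget favors storing 1/r per node.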

[3] Symmetric Probabilistic Allocation with Fixed Access
• We conjecture that for any r and T, the optimal choice of ℓ that maximizes the success probability P(r, T, ℓ) is either ℓ = 1 or ℓ = r
[Figure: r = 5; annotations: "increasing budget per node", "each nonempty node stores 1/ℓ amount of data", "store less", "store more"]

Summary & Future Work
[1] Deterministic Allocation with Probabilistic Access
• Suboptimality of symmetric allocations
[2] Deterministic Allocation with Fixed Access
• Optimal allocation for high-probability recovery
• Extreme point solutions not necessarily optimal for symmetric allocations
• Is there always a symmetric optimal allocation?
[3] Symmetric Probabilistic Allocation with Fixed Access
• Optimal allocation in high-probability regime
• Is there a phase transition in optimal allocation with increasing budget?
