Trading off space for passes in graph streaming problems

Camil Demetrescu, Irene Finocchi, Andrea Ribichini

University of Rome “La Sapienza”

Dagstuhl Seminar 05361


Processing massive data streams

Large body of work in recent years

Practically motivated, raises interesting theoretical questions

Areas:

Databases, Sensors, Networking, Hardware, Programming lang.

Core problems:

Algorithms, Complexity, Statistics, Probability, Approximation theory


Classical streaming

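[Figure: the input stream is read sequentially in successive passes (1st pass, 2nd pass, ...); the working memory M is shown at successive positions along the stream.]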

n = size of input stream (# of items)

p = number of passes

s = size of working memory M (space in bits)


Classical streaming

Seminal work by Munro and Paterson (1980): pass-efficient selection and sorting

Several problems were shown to be solvable with polylog(n) space and passes in the 1990s (e.g., approximating frequency moments)

Classical streaming is very restrictive: for many fundamental problems (e.g., on graphs) it is provably impossible to achieve polylog(n) space and passes
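To make the parameters n, p and s concrete, here is a minimal sketch of a naive multi-pass selection routine. It is not the Munro and Paterson algorithm; it only illustrates how a read-only stream can be scanned several times with a small working memory. The function name, the `read_stream` callback and the assumption that all values are distinct are illustrative choices, not part of the talk.

```python
import heapq

def kth_smallest_multipass(read_stream, k, s):
    """Return (k-th smallest value, number of passes) using at most s stored items.

    read_stream() must return a fresh iterator over the input (one pass).
    Assumption: all values in the stream are distinct.
    """
    threshold = None   # largest value already accounted for in earlier passes
    found = 0          # how many of the smallest values are already accounted for
    passes = 0
    while True:
        passes += 1
        heap = []      # max-heap (by negation) of the s smallest values above threshold
        for x in read_stream():
            if threshold is not None and x <= threshold:
                continue
            if len(heap) < s:
                heapq.heappush(heap, -x)
            elif x < -heap[0]:
                heapq.heapreplace(heap, -x)
        batch = sorted(-y for y in heap)
        if found + len(batch) >= k:
            return batch[k - found - 1], passes
        if not batch:
            raise ValueError("k exceeds the number of items in the stream")
        found += len(batch)
        threshold = batch[-1]

data = [7, 3, 9, 1, 5, 8, 2, 6, 4, 10]
value, passes = kth_smallest_multipass(lambda: iter(data), k=7, s=3)
```

With s stored items this naive scheme needs about k/s passes, so halving the memory doubles the number of passes.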


Graph streaming problems

Recent interest in graph problems in “semi-streaming” models, where:

space = O( N · polylog(N) )

passes = O( polylog(N) )

[Feigenbaum et al., ICALP 2004]
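As an illustration of what fits in the semi-streaming budget, the sketch below decides connectivity in a single pass with a union-find structure over the N vertices, which takes O(N log N) bits. This is a standard textbook technique shown under the assumption that vertices are numbered 0..N-1; the function name is hypothetical and the code is not taken from the cited paper.

```python
def connected_one_pass(n, edge_stream):
    """One pass over the edges, O(n) words of memory (union-find parent array)."""
    parent = list(range(n))
    components = n

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edge_stream:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components == 1
```

The edges themselves are never stored, only one word per vertex, which is why the memory depends on N and not on the length of the stream.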

For many basic graph problems (e.g., connectivity, shortest paths):

passes = Ω(N / space)

(N = number of vertices)

O(N · polylog(N)) space is the “sweet spot” for graph streaming problems [Muthukrishnan, 2001]


Graph algorithms in classical streaming

Approximate triangle counting [Bar-Yossef et al., SODA 2002]

Matching, bipartiteness, connectivity, MST, t-spanners, … [Feigenbaum et al., ICALP 2004, SODA 2005]

All of them make one or very few passes, but require Ω(N) space


Trading off space for passes

Natural question:

Can we reduce space if we do more passes?

[Munro and Paterson ‘80, Henzinger et al. ‘99]

Example:

Processing a 50 GB graph on a PC with 1 GB of RAM (4 billion vertices, 6 billion edges):

s = Θ(N/p) algorithm: ~16 passes (a few hours)

s = Θ(N) algorithm: out of memory (16 GB of RAM would be required)
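The figures in this example can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes 4 bytes of state per vertex and a sequential read rate of 100 MB/s; both values are assumptions chosen to match the slide's numbers, and the function name is hypothetical.

```python
GB = 10**9
MB = 10**6

def pass_plan(num_vertices, bytes_per_vertex, ram_bytes, graph_bytes, read_bytes_per_s):
    """Return (bytes of per-vertex state, passes needed, total scan time in hours)."""
    state_bytes = num_vertices * bytes_per_vertex      # memory an s = Θ(N) algorithm needs
    passes = -(-state_bytes // ram_bytes)              # ceiling division: s = Θ(N/p)
    hours = passes * graph_bytes / read_bytes_per_s / 3600
    return state_bytes, passes, hours

state, passes, hours = pass_plan(
    num_vertices=4 * 10**9,
    bytes_per_vertex=4,          # assumption
    ram_bytes=1 * GB,
    graph_bytes=50 * GB,
    read_bytes_per_s=100 * MB,   # assumption
)
```

Under these assumptions the state is 16 GB, so 16 passes are needed with 1 GB of RAM, and scanning 50 GB sixteen times takes a little over two hours.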


Some facts on modern commodity I/O

Sequential access rates are comparable to (or even faster than) random access rates in main memory:

  • A RAID disk controller can deliver a 100 MB/s access rate

  • On a 1+ GHz Pentium PC, random access to 2 GB of main memory in 32-byte chunks yields an 80 MB/s effective access rate

Sequential access uses caches optimally (this makes algorithms cache-oblivious)

[Ruhl ’03, Rajagopalan ’02]


Some facts on modern commodity I/O

External memory storage is cheap (less than a dollar per gigabyte) and readily available

The above facts imply that both reading and writing sequentially can improve performance

  • Classical read-only streaming perhaps overly pessimistic?

  • Why not exploit temporary storage?


The StreamSort model [Aggarwal et al. ’04]

[Figure: in the 1st pass, the working memory M scans the input stream and writes an intermediate stream; a sorting primitive can be used to reorder the stream; in the 2nd pass, M scans the intermediate stream and writes the output stream.]


How much power does sorting yield?

Good news:

Undirected connectivity can be solved in polylog(N) space and passes in StreamSort [Aggarwal et al., FOCS 2004]

Open problem:

No clue on how to get polylog(N) bounds for shortest paths (even BFS) in StreamSort


Dish of the day

We address:

  • Connectivity

  • Single-source shortest paths

We show that StreamSort can yield interesting results even without using sorting at all (call this more restrictive model W-Stream: it allows intermediate streams, but no sorting)

In this model, we show effective space/passes tradeoffs for natural graph streaming problems


Graph connectivity

UCON: G = (V, E) is an undirected graph with N vertices, given as a stream of edges in arbitrary order. Find out if G is connected.

We now show the following:

Upper bound: UCON in W-Stream with p = O(N · log N / s)

Lower bound: UCON in W-Stream requires p = Ω(N / s)


Graph connectivity: algorithm

Generic pass: two phases (red phase, blue phase)

[Figure: a pass reads the input stream (graph G, vertices 1 to 12), keeps a forest F in memory, and writes the output stream (a smaller graph G’).]


Graph connectivity: analysis

How many passes?

All vertices of F that are not component representatives disappear from the output graph

  • Invariant: F is induced by a set of edges

  • Hence each tree in F contains at least two vertices

At each pass we lose at least |V(F)| / 2 = Θ(s / log N) vertices

⇒ p = O(N · log N / s)
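The pass structure analysed above can be sketched as follows. This is one plausible reading of the slides, not the authors' exact algorithm: a pass first fills the memory with a forest F on at most `cap` vertices, then freezes F and rewrites every remaining edge with its endpoints replaced by their representatives in F. Streams are simulated with Python lists, and `cap` stands in for the Θ(s / log N) vertices that fit in memory; all names are hypothetical.

```python
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def one_pass(stream, cap):
    """Return (output stream, number of vertices merged away in this pass)."""
    parent, out, merged, frozen = {}, [], 0, False
    for u, v in stream:
        if u == v:
            continue                      # self-loops carry no information
        if not frozen:
            new = (u not in parent) + (v not in parent)
            if len(parent) + new <= cap:  # edge still fits: grow the forest F
                parent.setdefault(u, u)
                parent.setdefault(v, v)
                ru, rv = find(parent, u), find(parent, v)
                if ru != rv:
                    parent[ru] = rv
                    merged += 1
                continue
            frozen = True                 # memory is full: F stays fixed from now on
        ru = find(parent, u) if u in parent else u
        rv = find(parent, v) if v in parent else v
        if ru != rv:
            out.append((ru, rv))          # relabelled edge goes to the next stream
    return out, merged

def is_connected(n, edges, cap):
    """Return (connected?, passes). Requires cap >= 2 so that every pass makes progress."""
    assert cap >= 2
    stream, live, passes = list(edges), n, 0
    while stream:
        stream, merged = one_pass(stream, cap)
        live -= merged
        passes += 1
    return live == 1, passes
```

On a path with 6 vertices, `cap=2` needs 5 passes while `cap=6` needs a single pass, which mirrors the p = O(N · log N / s) tradeoff in miniature.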


Single-source shortest paths

SSSP: G = (V, E, w) is a weighted directed graph with N vertices, given as a stream of edges in arbitrary order. Find the distances from a given source t to all other vertices.

Lower bound 1: BFS in W-Stream requires p = Ω(N / s)

Space-efficient algorithms for SSSP always require multiple passes

Lower bound 2: finding vertices up to constant distance d: p ≤ d ⇒ s = Ω(N^(1+1/(2d))) [Feigenbaum et al., SODA 2005]


Single-source shortest paths

Hard even using sorting as a primitive

Previous results on distances in streaming: approximate (spanners) in undirected graphs only

No sublinear-space streaming algorithm for SSSP previously known.

We make a first step, showing that we can solve SSSP in W-Stream in sublinear space and passes simultaneously in directed graphs with small integer edge weights


Single-source shortest paths: bound

Thm: For any space restriction s, there is a randomized one-sided error algorithm for directed SSSP in W-Stream with edge weights in {1, 2, …, C} s.t.:

p = O(C · N · log^(3/2) N / √s)

Compare: lower bound p = Ω(N / s); upper bound for C = 1, hiding polylogarithmic factors, p = Õ(N / √s)

For C = O(s^(1/2−ε)) and polynomially sublinear space, we also get sublinear p

In this talk we focus on C = 1 (BFS)
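The claim about sublinear p can be checked by substituting into the upper bound of the theorem. The derivation below is a sketch that assumes "polynomially sublinear space" means s = N^α for some constant 0 < α < 1, and that ε > 0 is a constant; these readings are assumptions, not statements from the talk.

```latex
\begin{align*}
p &= O\!\left(\frac{C \cdot N \cdot \log^{3/2} N}{\sqrt{s}}\right)
   = O\!\left(\frac{s^{1/2-\varepsilon} \cdot N \cdot \log^{3/2} N}{s^{1/2}}\right)
   && \text{since } C = O\!\left(s^{1/2-\varepsilon}\right) \\
  &= O\!\left(N \cdot s^{-\varepsilon} \cdot \log^{3/2} N\right)
   = O\!\left(N^{1-\alpha\varepsilon} \cdot \log^{3/2} N\right)
   && \text{since } s = N^{\alpha} \\
  &= o(N)
   && \text{since } \alpha\varepsilon > 0 .
\end{align*}
```

So both s and p are sublinear in N at the same time, which is the point of the tradeoff.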


Single-source shortest paths: approach

Overall approach: first build many short paths “in parallel”, then stitch them together to form long paths.

For a given space restriction, this helps us reduce the number of passes needed to find long paths


Single-source shortest paths: step 1/5

Pick a set K of (s / log N)^(1/2) random vertices, including the source t

[Figure: running example, a chain with vertices labeled t, 1, 6, 10, 5, 8, 3, 7, 2, 4, 9.]


Single-source shortest paths: step 2/5

Find distances up to (N log N) / |K| from each vertex in K (short distances)

The more memory we have, the larger |K|, and thus the smaller the number of passes

[Figure: the chain example annotated with short distances from the vertices of K.]


Single-source shortest paths: step 3/5

Build a graph G’ = (K, E’), where: (x, y) ∈ E’ ⇔ dist(x, y) ≤ (N log N) / |K| in G

[Figure: the chain example and the graph G’ built on it.]


Single-source shortest paths: step 4/5

Find in G’ the distances from t to all other vertices of K

[Figure: the chain example and G’, with the distances 0, 3, 6, 8 from t marked on the vertices of K.]


Single-source shortest paths: step 5/5

For each v, let: dist(t, v) = min over c ∈ K of { dist(t, c) + dist(c, v) } (final distances)

[Figure: the chain example with the final distance from t marked on every vertex.]
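The five steps can be put together in a small in-memory sketch for the unweighted case (C = 1). It only illustrates the "short paths, then stitch" logic and ignores the streaming and space constraints entirely. The function names, the use of the natural logarithm, and Dijkstra on G’ in step 4 are assumptions made for illustration; vertices are assumed to be integers.

```python
import heapq
import math
import random
from collections import deque

def sample_K(vertices, t, s, n, rng=random):
    """Step 1: about sqrt(s / log n) random vertices, always including t (n >= 2)."""
    size = max(1, int(math.sqrt(s / math.log(n))))
    others = [v for v in vertices if v != t]
    return {t} | set(rng.sample(others, min(size - 1, len(others))))

def bounded_bfs(adj, src, limit):
    """Distances from src, exploring only up to `limit` hops."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def stitched_bfs(n, edges, t, K, limit=None):
    K = set(K) | {t}
    if limit is None:
        limit = max(1, math.ceil(n * math.log(n) / len(K)))
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    short = {c: bounded_bfs(adj, c, limit) for c in K}            # step 2
    gprime = {c: [(y, d) for y, d in short[c].items() if y in K and y != c]
              for c in K}                                         # step 3
    far, heap = {t: 0}, [(0, t)]                                  # step 4: Dijkstra on G'
    while heap:
        d, x = heapq.heappop(heap)
        if d > far.get(x, math.inf):
            continue
        for y, w in gprime[x]:
            if d + w < far.get(y, math.inf):
                far[y] = d + w
                heapq.heappush(heap, (d + w, y))
    final = {}                                                    # step 5: stitch
    for c, dc in far.items():
        for v, d in short[c].items():
            final[v] = min(final.get(v, math.inf), dc + d)
    return final

chain = [(i, i + 1) for i in range(10)]       # directed chain 0 -> 1 -> ... -> 10
dist = stitched_bfs(11, chain, t=0, K={3, 6, 8})
```

With K fixed by hand as above the result is exact on the chain; with a random K from `sample_K` the result is only correct with high probability, as the sampling theorem on the next slide states.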


Results are correct with high prob.

Sampling thm. Let K be a set of vertices chosen uniformly at random. Then the probability that a simple path with more than (c · N · log N) / |K| vertices intersects K is at least 1 − 1/N^c, for any c > 0

[Greene & Knuth, ’80]
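The bound can be checked numerically for concrete values. The sketch below computes the exact probability that a uniformly random set K misses a fixed set of m path vertices, assuming K is drawn without replacement and that log is the natural logarithm; both are assumptions, as are the function names.

```python
from math import ceil, comb, log

def miss_probability(n, k, m):
    """Probability that a uniformly random k-subset of n vertices avoids m fixed vertices."""
    return comb(n - m, k) / comb(n, k)

def path_length_threshold(n, k, c=1.0):
    """Smallest integer m with m >= c * n * ln(n) / k, capped at n."""
    return min(n, ceil(c * n * log(n) / k))

n, k, c = 1000, 100, 1.0
m = path_length_threshold(n, k, c)
p_miss = miss_probability(n, k, m)
```

For these values the path has 70 vertices and the miss probability stays below 1/N^c = 0.001, in line with the theorem.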


Conclusions and further work

We have shown effective space/passes tradeoffs for problems that seem hard in classical streaming (graph connectivity & shortest paths)

Can we do the same in the classical read-only streaming model?

Can we prove stronger lower bounds in classical streaming?

Can we close the gap between upper and lower bound for BFS in W-Stream?

Space/passes tradeoffs for other problems?


Thank you
