1 / 115

# Introduction to Fractals and Fractal Dimension - PowerPoint PPT Presentation

Introduction to Fractals and Fractal Dimension. Christos Faloutsos. Intro to Fractals - Outline. Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples Discussion - putting fractals to work! Conclusions – practitioner’s guide

Related searches for Introduction to Fractals and Fractal Dimension

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Introduction to Fractals and Fractal Dimension' - zamora

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Introduction to Fractalsand Fractal Dimension

Christos Faloutsos

• Motivation – 3 problems / case studies

• Definition of fractals and power laws

• Solutions to posed problems

• More examples

• Discussion - putting fractals to work!

• Conclusions – practitioner’s guide

• Appendix: gory details - boxcounting plots

• Road end-points of Montgomery county:

• Q : distribution?

• not uniform

• not Gaussian

• ???

• no rules/pattern at all!?

Galaxies (Sloan Digital Sky Survey w/ B. Nichol)

- ‘spiral’ and ‘elliptical’ galaxies

(stores and households ...)

- patterns?

- attraction/repulsion?

- how many ‘spi’ within r from an ‘ell’?

Problem #3: traffic

• disk trace (from HP - J. Wilkes); Web traffic

#bytes

- how many explosions to expect?

- queue length distr.?

time

• Fractals / self-similarities / power laws

• Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot, (Hausdorff, Lyapunov, Ken Wilson, …)

• Motivation – 3 problems / case studies

• Definition of fractals and power laws

• Solutions to posed problems

• More examples and tools

• Discussion - putting fractals to work!

• Conclusions – practitioner’s guide

• Appendix: gory details - boxcounting plots

• Simple 2-D figure has finite perimeter and finite area

• area ≈ perimeter2

• Do all 2-D figures have finite perimeter, finite area, and area ≈ perimeter2?

= self-similar point set, e.g., Sierpinski triangle:

zero area;

infinite length!

...

• Paradox: Infinite perimeter ; Zero area!

• ‘dimensionality’: between 1 and 2

• actually: Log(3)/Log(2) = 1.58…

• “self similar”

• pieces look like whole independent of scale

• “dimension”

dim = log(# self-similar pieces) / log(scale)

• dim(line) = log(2)/log(2) = 1

• dim(square) = log(4)/log(2) = 2

• dim(cube) = log(8)/log(2) = 3

A: 1 (= log(2)/log(2))

Intrinsic (‘fractal’) dimension

x

y

5

1

4

2

3

3

2

4

Intrinsic (‘fractal’) dimension

A: nn ( <= r ) ~ r^1

(‘power law’: y=x^a)

Q: fd of a plane?

A: nn ( <= r ) ~ r^2

Fd = slope of (log(nn) vs log(r) )

Intrinsic (‘fractal’) dimension

Notice

avg nn(<=r) is exactly

tot#pairs(<=r) / (2*N)

Intrinsic (‘fractal’) dimension

including ‘mirror’ pairs

ONLY for a perfectly self-similar point set:

zero area;

infinite length!

...

=log(n)/log(f) = log(3)/log(2) = 1.58

within <=r )

1.58

log( r )

Sierpinsky triangle

== ‘correlation integral’

• Euclidean objects have integer fractal dimensions

• point: 0

• lines and smooth curves: 1

• smooth surfaces: 2

• fractal dimension -> roughness of the periphery

a point set may have several fd, depending on scale

Important properties

• Motivation – 3 problems / case studies

• Definition of fractals and power laws

• Solutions to posed problems

• More examples and tools

• Discussion - putting fractals to work!

• Conclusions – practitioner’s guide

• Appendix: gory details - boxcounting plots

• any rules?

Solution #1

A: self-similarity ->

• <=> fractals

• <=> scale-free

• <=> power-laws (y=x^a, F=C*r^(-2))

• avg#neighbors(<= r ) = r^D

log(#pairs(within <= r))

log( r )

Solution #1

A: self-similarity

• avg#neighbors(<= r ) ~ r^(1.51)

log(#pairs(within <= r))

log( r )

• Montgomery County of MD (road end-points)

• Long Beach county of CA (road end-points)

Galaxies ( ‘BOPS’ plot - [sigmod2000])

log(#pairs)

log(r)

log(#pairs within <=r )

- 1.8 slope

- plateau!

- repulsion!

ell-ell

spi-spi

spi-ell

log(r)

log(#pairs within <=r )

- 1.8 slope

- plateau!

- repulsion!

ell-ell

spi-spi

spi-ell

log(r)

r2

r2

r1

spatial d.m.

Heuristic on choosing # of clusters

log(#pairs within <=r )

- 1.8 slope

- plateau!

- repulsion!

ell-ell

spi-spi

spi-ell

log(r)

log(#pairs within <=r )

• - 1.8 slope

• - plateau!

• repulsion!!

ell-ell

spi-spi

-duplicates

spi-ell

log(r)

time

Solution #3: traffic

• disk traces: self-similar:

80%

Solution #3: traffic

• disk traces (80-20 ‘law’ = ‘multifractal’)

#bytes

time

Clarification:

• fractal: a set of points that is self-similar

• multifractal: a probability density function that is self-similar

Many other time-sequences are bursty/clustered: (such as?)

Tape# N

time

Tape accesses

# tapes needed, to retrieve n records?

(# days down, due to failures / hurricanes / communication noise...)

Tape# N

time

Tape accesses

50-50 = Poisson

# tapes retrieved

real

# qual. records

• How, for the (correlation) fractal dimension?

• A: Box-counting plot:

log(sum(pi ^2))

pi

r

log( r )

• pi : the percentage (or count) of points in the i-th cell

• r: the side of the grid

• compute sum(pi^2) for another grid side, r’

log(sum(pi ^2))

pi’

r’

log( r )

• etc; if the resulting plot has a linear part, its slope is the correlation fractal dimension D2

log(sum(pi ^2))

log( r )

• Box counting plot: Log( N ( r ) ) vs Log ( r)

• r: grid side

• N (r ): count of non-empty cells

• (Hausdorff) fractal dimension D0:

• Hausdorff fd:

r

log(#non-empty cells)

D0

log(r)

• q=0: Hausdorff fractal dimension

• q=2: Correlation fractal dimension (identical to the exponent of the number of neighbors vs radius)

• q=1: Information fractal dimension

• in general, the Dq’s take similar, but not identical, values.

• except for perfectly self-similar point-sets, where Dq=Dq’ for any q, q’

• Montgomery County of MD (road end-points)

• Long Beach county of CA (road end-points)

• many fractal dimensions, with nearby values

• can be computed quickly

(O(N) or O(N log(N))

• (code: on the web)

Problem definition: ‘Feature selection’

• given N points, with E dimensions

• keep the k most ‘informative’ dimensions

[Traina+,SBBD’00]

not informative

Problem definition: ‘Feature selection’

• given N points, with E dimensions

• keep the k most ‘informative’ dimensions

Re-phrased: spot and drop attributes with strong (non-)linear correlations

Q: how do we do that?

A: Hint: correlated attributes do not affect the intrinsic/fractal dimension, e.g., if

y = f(x,z,w)

we can drop y

(hence: ‘partial fd’ (PFD) of a set of attributes = the fd of the dataset, when projected on those attributes)

global FD=1

PFD=1

PFD~0

global FD=1

PFD=1

PFD=1

global FD=1

PFD~1

PFD~1

Q: Algorithm?

Dim. reduction - w/ fractals

A: e.g., greedy - forward selection:

keep the attribute with highest partial fd

add the one that causes the highest increase in pfd

etc., until we are within epsilon from the full f.d.

Dim. reduction - w/ fractals

drop the attribute with least impact on the p.f.d.

repeat

until we are epsilon below the full f.d.

Dim. reduction - w/ fractals

Dim. reduction - w/ fractals

A: we should keep at least as many as the f.d. (and probably, a few more)

Dim. reduction - w/ fractals

(daily exchange rates for USD, HKD, BP, FRF, DEM, JPY - i.e., 6-d vectors, one per day - base currency: CAD)

Dim. reduction - w/ fractals

FRF

e.g.:

USD

log(#pairs(<=r))

correlation integral

1.98

log(r)

E.g., on the ‘currency’ dataset

16-d vectors,

one for each

of ~1K faces

4.25

PFD~1

PFD~1

Dim. reduction - w/ fractals

• Conclusion:

• can do non-linear dim. reduction

• Zipf’s law

• Korcak’s law / “fat fractals”

• Q:vocabulary word frequency in a document - any pattern?

freq.

aaron

zoo

log(freq)

“a”

• Bible - rank vs frequency (log-log)

“the”

log(rank)

log(freq)

• Bible - rank vs frequency (log-log)

• similarly, in manyother languages; for customers and sales volume; city populations etc etc

log(rank)

log(freq)

• Zipf distr:

• freq = 1/ rank

• generalized Zipf:

• freq = 1 / (rank)^a

log(rank)

log(#medals)

rank

Scandinavian lakes

Any pattern?

log(count( >= area))

Scandinavian lakes area vs complementary cumulative count (log-log axes)

log(area)

Japan islands

log(count( >= area))

Japan islands;

area vs cumulative count (log-log axes)

log(area)

How to generate such regions?

Q: How to generate such regions?

A: recursively, from a single region

• tool#1: (for points) ‘correlation integral’: (#pairs within <= r) vs (distance r)

• tool#2: (for categorical values) rank-frequency plot (a’la Zipf)

• tool#3: (for numerical values) CCDF: Complementary cumulative distr. function (#of elements with value >= a )

• Real data often disobey textbook assumptions: not Gaussian, not Poisson, not uniform, not independent)

• self-similar datasets: appear often

• powerful tools: correlation integral, rank-frequency plot, ...

• intrinsic/fractal dimension helps:

• find patterns

• dimensionality reduction

• Q: What about L1, Linf?

• A: Same slope, different intercept

log(#neighbors)

log(d)

log(#pairs)

1.51

2.8

log(hops)

log( r )

Practitioner’s guide:

• tool#1: #pairs vs distance, for a set of objects, with a distance function (slope = intrinsic dimensionality)

internet

MGcounty

-0.82

log(rank)

Practitioner’s guide:

• tool#2: rank-frequency plot (for categorical attributes)

Bible

internet domains

log(freq)

log(rank)

log(area)

Practitioner’s guide:

• tool#3: CCDF, for (skewed) numerical attributes, eg. areas of islands/lakes, UNIX jobs...)

scandinavian lakes

• Software for fractal dimension

• http://www.cs.cmu.edu/~christos

• christos@cs.cmu.edu

• Strongly recommended intro book:

• Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991

• Classic book on fractals:

• B. Mandelbrot Fractal Geometry of Nature, W.H. Freeman, 1977

• Belussi, A. and C. Faloutsos (Sept. 1995). Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. Proc. of VLDB, Zurich, Switzerland.

• Faloutsos, C. and V. Gaede (Sept. 1996). Analysis of the z-ordering Method Using the Hausdorff Fractal Dimension. VLDB, Bombay, India.

• Proietti, G. and C. Faloutsos (March 23-26, 1999). I/O complexity for range queries on region data stored using an R-tree. International Conference on Data Engineering (ICDE), Sydney, Australia.

• Motivation – 3 problems / case studies

• Definition of fractals and power laws

• Solutions to posed problems

• More tools and examples

• Discussion - putting fractals to work!

• Conclusions – practitioner’s guide

• Appendix: gory details - boxcounting plots

• Q: in NN queries, what is the effect of the shape of the query region? [Belussi+95]

L2

Linf

r

L1

• Q: in NN queries, what is the effect of the shape of the query region?

• that is, for L2, and self-similar data:

log(#pairs-within(<=d))

D2

L2

r

log(d)

• Q: What about L1, Linf?

log(#pairs-within(<=d))

D2

L2

r

log(d)

• Q: What about L1, Linf?

• A: Same slope, different intercept

log(#pairs-within(<=d))

D2

L2

r

log(d)

L2

r

NN queries

• Q: what about the intercept? Ie., what can we say about N2 and Ninf

Ninf neighbors

N2 neighbors

Linf

r

volume: V2

volume: Vinf

• Consider sphere with volume Vinf and r’ radius

Ninf neighbors

N2 neighbors

r’

Linf

L2

r

r

volume: V2

volume: Vinf

• Consider sphere with volume Vinf and r’ radius

• (r/r’)^E = V2 / Vinf

• (r/r’)^D2 = N2 / N2’

• N2’ = Ninf (since shape does not matter)

• and finally:

( N2 / Ninf ) ^ 1/D2 = (V2 / Vinf) ^ 1/E

Conclusions: for self-similar datasets

• Avg # neighbors: grows like (distance)^D2 , regardless of query shape (circle, diamond, square, e.t.c. )

Internet topology

• Internet routers: how many neighbors within h hops?

log(#pairs)

Reachability function: number of neighbors within r hops, vs r (log-log).

Mbone routers, 1995

log(hops)

More power laws on the Internet

log(degree)

log(rank)

degree vs rank, for Internet domains (log-log) [sigcomm99]

More power laws - internet

• pdf of degrees: (slope: 2.2 )

Log(count)

Log(degree)

log( i-th eigenvalue)

0.47

log(i)

Scree plot for Internet domains (log-log) [sigcomm99]

2.63 = fd

octree levels

More apps: Brain scans

• Oct-trees; brain-scans

[Burdett et al, SPIE ‘93]:

• benign tumors: fd ~ 2.37

• malignant: fd ~ 2.56

2 years

More fractals:

• cardiovascular system: 3 (!)

• stock prices (LYCOS) - random walks: 1.5

• Coastlines: 1.2-1.58 (Norway!)

• duration of UNIX jobs [Harchol-Balter]

• Energy of earthquakes (Gutenberg-Richter law) [simscience.org]

log(freq)

amplitude

day

magnitude

• publication counts (Lotka’s law)

• Distribution of UNIX file sizes

• Income distribution (Pareto’s law)

• web hit counts [Huberman]

• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]

• length of file transfers [Bestavros+]

• Click-stream data (w/ A. Montgomery (CMU-GSIA) + MediaMetrix)

• Software for fractal dimension

• http://www.cs.cmu.edu/~christos

• christos@cs.cmu.edu

• Strongly recommended intro book:

• Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991

• Classic book on fractals:

• B. Mandelbrot Fractal Geometry of Nature, W.H. Freeman, 1977

• [ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp 1-15, Feb. 1994.

• [pods94] Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, PODS, Minneapolis, MN, May 24-26, 1994, pp. 4-13

• [vldb95] Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p. 299-310, 1995

• [vldb96] Christos Faloutsos, Yossi Matias and Avi Silberschatz, Modeling Skewed Distributions Using Multifractals and the `80-20 Law’ Conf. on Very Large Data Bases (VLDB), Bombay, India, Sept. 1996.

• [vldb96] Christos Faloutsos and Volker Gaede Analysis of the Z-Ordering Method Using the Hausdorff Fractal Dimension VLD, Bombay, India, Sept. 1996

• [sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999

• [icde99] Guido Proietti and Christos Faloutsos, I/O complexity for range queries on region data stored using an R-tree International Conference on Data Engineering (ICDE), Sydney, Australia, March 23-26, 1999

• [sigmod2000] Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina and Caetano Traina Jr., Spatial Join Selectivity Using Power Laws, SIGMOD 2000