- 501 Views
- Uploaded on
- Presentation posted in: General

Introduction to Fractals and Fractal Dimension

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Introduction to Fractalsand Fractal Dimension

Christos Faloutsos

- Motivation – 3 problems / case studies
- Definition of fractals and power laws
- Solutions to posed problems
- More examples
- Discussion - putting fractals to work!
- Conclusions – practitioner’s guide
- Appendix: gory details - boxcounting plots

Copyright: C. Faloutsos (2001)

- Road end-points of Montgomery county:
- Q : distribution?
- not uniform
- not Gaussian
- ???
- no rules/pattern at all!?

Copyright: C. Faloutsos (2001)

Galaxies (Sloan Digital Sky Survey w/ B. Nichol)

- ‘spiral’ and ‘elliptical’ galaxies

(stores and households ...)

- patterns?

- attraction/repulsion?

- how many ‘spi’ within r from an ‘ell’?

Copyright: C. Faloutsos (2001)

Poisson

- disk trace (from HP - J. Wilkes); Web traffic

#bytes

- how many explosions to expect?

- queue length distr.?

time

Copyright: C. Faloutsos (2001)

- Fractals / self-similarities / power laws
- Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot, (Hausdorff, Lyapunov, Ken Wilson, …)

Copyright: C. Faloutsos (2001)

- Motivation – 3 problems / case studies
- Definition of fractals and power laws
- Solutions to posed problems
- More examples and tools
- Discussion - putting fractals to work!
- Conclusions – practitioner’s guide
- Appendix: gory details - boxcounting plots

Copyright: C. Faloutsos (2001)

- Simple 2-D figure has finite perimeter and finite area
- area ≈ perimeter2

- Do all 2-D figures have finite perimeter, finite area, and area ≈ perimeter2?

Copyright: C. Faloutsos (2001)

= self-similar point set, e.g., Sierpinski triangle:

zero area;

infinite length!

...

Copyright: C. Faloutsos (2001)

- Paradox: Infinite perimeter ; Zero area!
- ‘dimensionality’: between 1 and 2
- actually: Log(3)/Log(2) = 1.58…

Copyright: C. Faloutsos (2001)

- “self similar”
- pieces look like whole independent of scale

- “dimension”
dim = log(# self-similar pieces) / log(scale)

- dim(line) = log(2)/log(2) = 1
- dim(square) = log(4)/log(2) = 2
- dim(cube) = log(8)/log(2) = 3

Copyright: C. Faloutsos (2001)

Q: fractal dimension of a line?

A: 1 (= log(2)/log(2))

Copyright: C. Faloutsos (2001)

Q: dfn for a given set of points?

x

y

5

1

4

2

3

3

2

4

Copyright: C. Faloutsos (2001)

Q: fractal dimension of a line?

A: nn ( <= r ) ~ r^1

(‘power law’: y=x^a)

Q: fd of a plane?

A: nn ( <= r ) ~ r^2

Fd = slope of (log(nn) vs log(r) )

Copyright: C. Faloutsos (2001)

Algorithm, to estimate it?

Notice

avg nn(<=r) is exactly

tot#pairs(<=r) / (2*N)

including ‘mirror’ pairs

Copyright: C. Faloutsos (2001)

ONLY for a perfectly self-similar point set:

zero area;

infinite length!

...

=log(n)/log(f) = log(3)/log(2) = 1.58

Copyright: C. Faloutsos (2001)

log(#pairs

within <=r )

1.58

log( r )

== ‘correlation integral’

Copyright: C. Faloutsos (2001)

- Euclidean objects have integer fractal dimensions
- point: 0
- lines and smooth curves: 1
- smooth surfaces: 2

- fractal dimension -> roughness of the periphery

Copyright: C. Faloutsos (2001)

fd = embedding dimension -> uniform pointset

a point set may have several fd, depending on scale

Copyright: C. Faloutsos (2001)

- Motivation – 3 problems / case studies
- Definition of fractals and power laws
- Solutions to posed problems
- More examples and tools
- Discussion - putting fractals to work!
- Conclusions – practitioner’s guide
- Appendix: gory details - boxcounting plots

Copyright: C. Faloutsos (2001)

- Cross-roads of Montgomery county:
- any rules?

Copyright: C. Faloutsos (2001)

1.51

A: self-similarity ->

- <=> fractals
- <=> scale-free
- <=> power-laws (y=x^a, F=C*r^(-2))
- avg#neighbors(<= r ) = r^D

log(#pairs(within <= r))

log( r )

Copyright: C. Faloutsos (2001)

1.51

A: self-similarity

- avg#neighbors(<= r ) ~ r^(1.51)

log(#pairs(within <= r))

log( r )

Copyright: C. Faloutsos (2001)

- Montgomery County of MD (road end-points)

Copyright: C. Faloutsos (2001)

- Long Beach county of CA (road end-points)

Copyright: C. Faloutsos (2001)

Galaxies ( ‘BOPS’ plot - [sigmod2000])

log(#pairs)

log(r)

Copyright: C. Faloutsos (2001)

log(#pairs within <=r )

- 1.8 slope

- plateau!

- repulsion!

ell-ell

spi-spi

spi-ell

log(r)

Copyright: C. Faloutsos (2001)

log(#pairs within <=r )

- 1.8 slope

- plateau!

- repulsion!

ell-ell

spi-spi

spi-ell

log(r)

Copyright: C. Faloutsos (2001)

r1

r2

r2

r1

Heuristic on choosing # of clusters

Copyright: C. Faloutsos (2001)

log(#pairs within <=r )

- 1.8 slope

- plateau!

- repulsion!

ell-ell

spi-spi

spi-ell

log(r)

Copyright: C. Faloutsos (2001)

log(#pairs within <=r )

- - 1.8 slope
- - plateau!
- repulsion!!

ell-ell

spi-spi

-duplicates

spi-ell

log(r)

Copyright: C. Faloutsos (2001)

#bytes

time

- disk traces: self-similar:

Copyright: C. Faloutsos (2001)

20%

80%

- disk traces (80-20 ‘law’ = ‘multifractal’)

#bytes

time

Copyright: C. Faloutsos (2001)

Clarification:

- fractal: a set of points that is self-similar
- multifractal: a probability density function that is self-similar
Many other time-sequences are bursty/clustered: (such as?)

Copyright: C. Faloutsos (2001)

Tape#1

Tape# N

time

# tapes needed, to retrieve n records?

(# days down, due to failures / hurricanes / communication noise...)

Copyright: C. Faloutsos (2001)

Tape#1

Tape# N

time

50-50 = Poisson

# tapes retrieved

real

# qual. records

Copyright: C. Faloutsos (2001)

- How, for the (correlation) fractal dimension?
- A: Box-counting plot:

log(sum(pi ^2))

pi

r

log( r )

Copyright: C. Faloutsos (2001)

- pi : the percentage (or count) of points in the i-th cell
- r: the side of the grid

Copyright: C. Faloutsos (2001)

- compute sum(pi^2) for another grid side, r’

log(sum(pi ^2))

pi’

r’

log( r )

Copyright: C. Faloutsos (2001)

- etc; if the resulting plot has a linear part, its slope is the correlation fractal dimension D2

log(sum(pi ^2))

log( r )

Copyright: C. Faloutsos (2001)

- Box counting plot: Log( N ( r ) ) vs Log ( r)
- r: grid side
- N (r ): count of non-empty cells
- (Hausdorff) fractal dimension D0:

Copyright: C. Faloutsos (2001)

- Hausdorff fd:

r

log(#non-empty cells)

D0

log(r)

Copyright: C. Faloutsos (2001)

- q=0: Hausdorff fractal dimension
- q=2: Correlation fractal dimension (identical to the exponent of the number of neighbors vs radius)
- q=1: Information fractal dimension

Copyright: C. Faloutsos (2001)

- in general, the Dq’s take similar, but not identical, values.
- except for perfectly self-similar point-sets, where Dq=Dq’ for any q, q’

Copyright: C. Faloutsos (2001)

- Montgomery County of MD (road end-points)

Copyright: C. Faloutsos (2001)

- Long Beach county of CA (road end-points)

Copyright: C. Faloutsos (2001)

- many fractal dimensions, with nearby values
- can be computed quickly
(O(N) or O(N log(N))

- (code: on the web)

Copyright: C. Faloutsos (2001)

Problem definition: ‘Feature selection’

- given N points, with E dimensions
- keep the k most ‘informative’ dimensions
[Traina+,SBBD’00]

Copyright: C. Faloutsos (2001)

not informative

Copyright: C. Faloutsos (2001)

Problem definition: ‘Feature selection’

- given N points, with E dimensions
- keep the k most ‘informative’ dimensions
Re-phrased: spot and drop attributes with strong (non-)linear correlations

Q: how do we do that?

Copyright: C. Faloutsos (2001)

A: Hint: correlated attributes do not affect the intrinsic/fractal dimension, e.g., if

y = f(x,z,w)

we can drop y

(hence: ‘partial fd’ (PFD) of a set of attributes = the fd of the dataset, when projected on those attributes)

Copyright: C. Faloutsos (2001)

global FD=1

PFD=1

PFD~0

Copyright: C. Faloutsos (2001)

global FD=1

PFD=1

PFD=1

Copyright: C. Faloutsos (2001)

global FD=1

PFD~1

PFD~1

Copyright: C. Faloutsos (2001)

(problem: given N points in E-d, choose k best dimensions)

Q: Algorithm?

Copyright: C. Faloutsos (2001)

Q: Algorithm?

A: e.g., greedy - forward selection:

keep the attribute with highest partial fd

add the one that causes the highest increase in pfd

etc., until we are within epsilon from the full f.d.

Copyright: C. Faloutsos (2001)

(backward elimination: ~ reverse)

drop the attribute with least impact on the p.f.d.

repeat

until we are epsilon below the full f.d.

Copyright: C. Faloutsos (2001)

Q: what is the smallest # of attributes we should keep?

Copyright: C. Faloutsos (2001)

Q: what is the smallest # of attributes we should keep?

A: we should keep at least as many as the f.d. (and probably, a few more)

Copyright: C. Faloutsos (2001)

Results: E.g., on the ‘currency’ dataset

(daily exchange rates for USD, HKD, BP, FRF, DEM, JPY - i.e., 6-d vectors, one per day - base currency: CAD)

FRF

e.g.:

USD

Copyright: C. Faloutsos (2001)

log(#pairs(<=r))

correlation integral

1.98

log(r)

Copyright: C. Faloutsos (2001)

if unif + indep.

Copyright: C. Faloutsos (2001)

16-d vectors,

one for each

of ~1K faces

4.25

Copyright: C. Faloutsos (2001)

E.g., on the eigenface dataset

Copyright: C. Faloutsos (2001)

global FD=1

PFD~1

PFD~1

- Conclusion:
- can do non-linear dim. reduction

Copyright: C. Faloutsos (2001)

- Zipf’s law
- Korcak’s law / “fat fractals”

Copyright: C. Faloutsos (2001)

- Q:vocabulary word frequency in a document - any pattern?

freq.

aaron

zoo

Copyright: C. Faloutsos (2001)

log(freq)

“a”

- Bible - rank vs frequency (log-log)

“the”

log(rank)

Copyright: C. Faloutsos (2001)

log(freq)

- Bible - rank vs frequency (log-log)
- similarly, in manyother languages; for customers and sales volume; city populations etc etc

log(rank)

Copyright: C. Faloutsos (2001)

log(freq)

- Zipf distr:
- freq = 1/ rank

- generalized Zipf:
- freq = 1 / (rank)^a

log(rank)

Copyright: C. Faloutsos (2001)

log(#medals)

rank

Copyright: C. Faloutsos (2001)

Scandinavian lakes

Any pattern?

Copyright: C. Faloutsos (2001)

log(count( >= area))

Scandinavian lakes area vs complementary cumulative count (log-log axes)

log(area)

Copyright: C. Faloutsos (2001)

Japan islands

Copyright: C. Faloutsos (2001)

log(count( >= area))

Japan islands;

area vs cumulative count (log-log axes)

log(area)

Copyright: C. Faloutsos (2001)

Copyright: C. Faloutsos (2001)

How to generate such regions?

Copyright: C. Faloutsos (2001)

Q: How to generate such regions?

A: recursively, from a single region

Copyright: C. Faloutsos (2001)

- tool#1: (for points) ‘correlation integral’: (#pairs within <= r) vs (distance r)
- tool#2: (for categorical values) rank-frequency plot (a’la Zipf)
- tool#3: (for numerical values) CCDF: Complementary cumulative distr. function (#of elements with value >= a )

Copyright: C. Faloutsos (2001)

- Real data often disobey textbook assumptions: not Gaussian, not Poisson, not uniform, not independent)
- self-similar datasets: appear often
- powerful tools: correlation integral, rank-frequency plot, ...
- intrinsic/fractal dimension helps:
- find patterns
- dimensionality reduction

Copyright: C. Faloutsos (2001)

- Q: What about L1, Linf?
- A: Same slope, different intercept

log(#neighbors)

log(d)

Copyright: C. Faloutsos (2001)

log(#pairs(within <= r))

log(#pairs)

1.51

2.8

log(hops)

log( r )

- tool#1: #pairs vs distance, for a set of objects, with a distance function (slope = intrinsic dimensionality)

internet

MGcounty

Copyright: C. Faloutsos (2001)

log(degree)

-0.82

log(rank)

- tool#2: rank-frequency plot (for categorical attributes)

Bible

internet domains

log(freq)

log(rank)

Copyright: C. Faloutsos (2001)

log(count( >= area))

log(area)

- tool#3: CCDF, for (skewed) numerical attributes, eg. areas of islands/lakes, UNIX jobs...)

scandinavian lakes

Copyright: C. Faloutsos (2001)

- Software for fractal dimension
- http://www.cs.cmu.edu/~christos
- [email protected]

Copyright: C. Faloutsos (2001)

- Strongly recommended intro book:
- Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991

- Classic book on fractals:
- B. Mandelbrot Fractal Geometry of Nature, W.H. Freeman, 1977

Copyright: C. Faloutsos (2001)

- Belussi, A. and C. Faloutsos (Sept. 1995). Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. Proc. of VLDB, Zurich, Switzerland.
- Faloutsos, C. and V. Gaede (Sept. 1996). Analysis of the z-ordering Method Using the Hausdorff Fractal Dimension. VLDB, Bombay, India.
- Proietti, G. and C. Faloutsos (March 23-26, 1999). I/O complexity for range queries on region data stored using an R-tree. International Conference on Data Engineering (ICDE), Sydney, Australia.

Copyright: C. Faloutsos (2001)

- Motivation – 3 problems / case studies
- Definition of fractals and power laws
- Solutions to posed problems
- More tools and examples
- Discussion - putting fractals to work!
- Conclusions – practitioner’s guide
- Appendix: gory details - boxcounting plots

Copyright: C. Faloutsos (2001)

- Q: in NN queries, what is the effect of the shape of the query region? [Belussi+95]

L2

Linf

r

L1

Copyright: C. Faloutsos (2001)

- Q: in NN queries, what is the effect of the shape of the query region?
- that is, for L2, and self-similar data:

log(#pairs-within(<=d))

D2

L2

r

log(d)

Copyright: C. Faloutsos (2001)

- Q: What about L1, Linf?

log(#pairs-within(<=d))

D2

L2

r

log(d)

Copyright: C. Faloutsos (2001)

- Q: What about L1, Linf?
- A: Same slope, different intercept

log(#pairs-within(<=d))

D2

L2

r

log(d)

Copyright: C. Faloutsos (2001)

L2

r

- Q: what about the intercept? Ie., what can we say about N2 and Ninf

Ninf neighbors

N2 neighbors

Linf

r

volume: V2

volume: Vinf

Copyright: C. Faloutsos (2001)

- Consider sphere with volume Vinf and r’ radius

Ninf neighbors

N2 neighbors

r’

Linf

L2

r

r

volume: V2

volume: Vinf

Copyright: C. Faloutsos (2001)

- Consider sphere with volume Vinf and r’ radius
- (r/r’)^E = V2 / Vinf
- (r/r’)^D2 = N2 / N2’
- N2’ = Ninf (since shape does not matter)
- and finally:

Copyright: C. Faloutsos (2001)

( N2 / Ninf ) ^ 1/D2 = (V2 / Vinf) ^ 1/E

Copyright: C. Faloutsos (2001)

Conclusions: for self-similar datasets

- Avg # neighbors: grows like (distance)^D2 , regardless of query shape (circle, diamond, square, e.t.c. )

Copyright: C. Faloutsos (2001)

2.8

- Internet routers: how many neighbors within h hops?

log(#pairs)

Reachability function: number of neighbors within r hops, vs r (log-log).

Mbone routers, 1995

log(hops)

Copyright: C. Faloutsos (2001)

-0.82

log(degree)

log(rank)

degree vs rank, for Internet domains (log-log) [sigcomm99]

Copyright: C. Faloutsos (2001)

-2.2

- pdf of degrees: (slope: 2.2 )

Log(count)

Log(degree)

Copyright: C. Faloutsos (2001)

log( i-th eigenvalue)

0.47

log(i)

Scree plot for Internet domains (log-log) [sigcomm99]

Copyright: C. Faloutsos (2001)

Log(#octants)

2.63 = fd

octree levels

- Oct-trees; brain-scans

Copyright: C. Faloutsos (2001)

[Burdett et al, SPIE ‘93]:

- benign tumors: fd ~ 2.37
- malignant: fd ~ 2.56

Copyright: C. Faloutsos (2001)

1 year

2 years

- cardiovascular system: 3 (!)
- stock prices (LYCOS) - random walks: 1.5
- Coastlines: 1.2-1.58 (Norway!)

Copyright: C. Faloutsos (2001)

Copyright: C. Faloutsos (2001)

- duration of UNIX jobs [Harchol-Balter]
- Energy of earthquakes (Gutenberg-Richter law) [simscience.org]

log(freq)

amplitude

day

magnitude

Copyright: C. Faloutsos (2001)

- publication counts (Lotka’s law)
- Distribution of UNIX file sizes
- Income distribution (Pareto’s law)
- web hit counts [Huberman]

Copyright: C. Faloutsos (2001)

- In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]
- length of file transfers [Bestavros+]
- Click-stream data (w/ A. Montgomery (CMU-GSIA) + MediaMetrix)

Copyright: C. Faloutsos (2001)

- Software for fractal dimension
- http://www.cs.cmu.edu/~christos
- [email protected]

Copyright: C. Faloutsos (2001)

- Strongly recommended intro book:
- Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991

- Classic book on fractals:
- B. Mandelbrot Fractal Geometry of Nature, W.H. Freeman, 1977

Copyright: C. Faloutsos (2001)

- [ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp 1-15, Feb. 1994.
- [pods94] Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, PODS, Minneapolis, MN, May 24-26, 1994, pp. 4-13

Copyright: C. Faloutsos (2001)

- [vldb95] Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p. 299-310, 1995
- [vldb96] Christos Faloutsos, Yossi Matias and Avi Silberschatz, Modeling Skewed Distributions Using Multifractals and the `80-20 Law’ Conf. on Very Large Data Bases (VLDB), Bombay, India, Sept. 1996.

Copyright: C. Faloutsos (2001)

- [vldb96] Christos Faloutsos and Volker Gaede Analysis of the Z-Ordering Method Using the Hausdorff Fractal Dimension VLD, Bombay, India, Sept. 1996
- [sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999

Copyright: C. Faloutsos (2001)

- [icde99] Guido Proietti and Christos Faloutsos, I/O complexity for range queries on region data stored using an R-tree International Conference on Data Engineering (ICDE), Sydney, Australia, March 23-26, 1999
- [sigmod2000] Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina and Caetano Traina Jr., Spatial Join Selectivity Using Power Laws, SIGMOD 2000

Copyright: C. Faloutsos (2001)

- [PODS94] Faloutsos, C. and I. Kamel (May 24-26, 1994). Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension. Proc. ACM SIGACT-SIGMOD-SIGART PODS, Minneapolis, MN.
- [Traina+, SBBD’00] Traina, C., A. Traina, et al. (2000). Fast feature selection using the fractal dimension. XV Brazilian Symposium on Databases (SBBD), Paraiba, Brazil.

Copyright: C. Faloutsos (2001)