# Distance Functions on Hierarchies



Eftychia Baikousi

### Outline

• Definition of metric & similarity

• Various Distance Functions

• Minkowski

• Set based

• Edit distance

• Basic concept of OLAP

• Lattice

• Distance in same level of hierarchy

• Distance in different level of hierarchy

### Definition of metric

• A distance function on a given set M is a function d: M × M → ℝ that satisfies the following conditions for all x, y, z ∈ M:

• d(x,y)≥0 and

• d(x,y)=0 iff x=y

• Distance is positive between two different points and is zero precisely from a point to itself

• It is symmetric: d(x,y)=d(y,x)

• The distance between x and y is the same in either direction

• It satisfies the triangle inequality: d(x,z) ≤ d(x,y) + d(y,z)

• Going directly from x to z is never longer than going through an intermediate point y

• A function that satisfies all of these conditions is a metric
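To make the conditions concrete, here is a small brute-force checker. It is a sketch that assumes M is small and finite enough to enumerate; the name `is_metric` and the `tol` parameter are illustrative. On a sample it can only refute the conditions, not prove them for an infinite set.

```python
from itertools import product
from typing import Callable, Iterable


def is_metric(points: Iterable, d: Callable[[object, object], float], tol: float = 1e-12) -> bool:
    """Brute-force check of the metric conditions on a finite set of points."""
    pts = list(points)
    for x, y in product(pts, repeat=2):
        dxy = d(x, y)
        if dxy < 0:                          # non-negativity
            return False
        if (abs(dxy) <= tol) != (x == y):    # zero exactly from a point to itself
            return False
        if abs(dxy - d(y, x)) > tol:         # symmetry
            return False
    for x, y, z in product(pts, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True


print(is_metric([0, 1, 3, 7], lambda a, b: abs(a - b)))  # True
print(is_metric([0, 1, 3], lambda a, b: (a - b) ** 2))   # False: d(0,3)=9 > 1 + 4
```

The squared difference passes the first three conditions but fails the triangle inequality, which is why it is not a metric.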

### Definition of similarity metric

• Let s(x,y) be the similarity between two points x and y, then the following properties hold:

• s(x,y) = 1 only if x = y (0 ≤ s ≤ 1)

• s(x,y) = s(y,x) for all x and y (symmetry)

• The triangle inequality does not necessarily hold


### Edit Distance: Levenshtein distance

• The edit distance between two strings $x = x_1 \ldots x_n$ and $y = y_1 \ldots y_m$ is defined as the minimum number of atomic edit operations needed to transform x into y:

• Insert: $ins(x,i,c) = x_1 x_2 \ldots x_i\, c\, x_{i+1} \ldots x_n$

• Delete: $del(x,i) = x_1 x_2 \ldots x_{i-1} x_{i+1} \ldots x_n$

• Replace: $rep(x,i,c) = x_1 x_2 \ldots x_{i-1}\, c\, x_{i+1} \ldots x_n$

• Every edit operation o is assigned the cost c(o) = 1
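The minimum over all edit sequences is usually computed by dynamic programming. The sketch below is a standard Wagner-Fischer style table kept to two rows, assuming unit cost c(o) = 1 for every operation as on the slide; the function name is illustrative.

```python
def levenshtein(x: str, y: str) -> int:
    """Minimum number of unit-cost insert, delete and replace operations turning x into y."""
    # prev[j] holds the edit distance between the current prefix of x and y[:j]
    prev = list(range(len(y) + 1))            # x = "": j insertions
    for i in range(1, len(x) + 1):
        curr = [i] + [0] * len(y)             # y = "": i deletions
        for j in range(1, len(y) + 1):
            replace_cost = 0 if x[i - 1] == y[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,                  # delete x_i
                curr[j - 1] + 1,              # insert y_j
                prev[j - 1] + replace_cost,   # replace x_i by y_j (free if equal)
            )
        prev = curr
    return prev[len(y)]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("MARTHA", "MARHTA"))   # 2
```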

### Edit distances

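• Needleman-Wunsch distance or Sellers algorithm

• Insert

• a character: $ins(x,i,c) = x_1 x_2 \ldots x_i\, c\, x_{i+1} \ldots x_n$, with cost(o) = 1

• a gap: $ins_g(x,i,g) = x_1 x_2 \ldots x_i\, g\, x_{i+1} \ldots x_n$, with cost(o) = g

• Delete

• a character: $del(x,i) = x_1 x_2 \ldots x_{i-1} x_{i+1} \ldots x_n$, with cost(o) = 1

• a gap: $del_g(x,i) = x_1 x_2 \ldots x_{i-1} x_{i+1} \ldots x_n$, with cost(o) = g

• Replace

• a character: $rep(x,i,c) = x_1 x_2 \ldots x_{i-1}\, c\, x_{i+1} \ldots x_n$, with cost(o) = 1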
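A minimal dynamic-programming sketch of a gap-weighted edit distance. It simplifies the slide's scheme by assuming that every insertion or deletion is charged the gap cost g and every replacement of a differing character costs 1; the names `needleman_wunsch`, `gap_cost` and `replace_cost` are hypothetical. With g = 1 it coincides with the Levenshtein distance.

```python
def needleman_wunsch(x: str, y: str, gap_cost: float, replace_cost: float = 1.0) -> float:
    """Edit distance where insertions/deletions cost gap_cost and replacements cost replace_cost."""
    n, m = len(x), len(y)
    # d[i][j] = cheapest way to turn x[:i] into y[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap_cost        # delete all of x[:i]
    for j in range(1, m + 1):
        d[0][j] = j * gap_cost        # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if x[i - 1] == y[j - 1] else replace_cost
            d[i][j] = min(
                d[i - 1][j - 1] + sub,    # match or replace
                d[i - 1][j] + gap_cost,   # delete x_i
                d[i][j - 1] + gap_cost,   # insert y_j
            )
    return d[n][m]


print(needleman_wunsch("kitten", "sitting", gap_cost=1.0))  # 3.0
print(needleman_wunsch("kitten", "sitting", gap_cost=2.0))  # 4.0
```

Raising g makes the length difference more expensive while replacements keep cost 1, which is the effect the gap operations on the slide are meant to capture.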

### Edit distances

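• Jaro distance

• Let s and t be two strings, and let

• s' = the characters in s that are common with t

• t' = the characters in t that are common with s

• $T_{s,t}$ = the number of transpositions of characters in s' relative to t', i.e. half the number of positions at which s' and t' differ

• $Jaro(s,t) = \frac{1}{3}\left(\frac{|s'|}{|s|} + \frac{|t'|}{|t|} + \frac{|s'| - T_{s,t}}{|s'|}\right)$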

### Edit distances

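• Jaro distance example

• Let s = MARTHA and t = MARHTA

• |s'| = 6

• |t'| = 6

• $T_{s,t}$ = 2/2 = 1, since the mismatched characters are T/H and H/T

• $Jaro(s,t) = \frac{1}{3}\left(\frac{6}{6} + \frac{6}{6} + \frac{6-1}{6}\right) \approx 0.9444$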
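A sketch of the Jaro score following the definitions above. Two assumptions are worth flagging: a character counts as common only if the same character occurs in the other string within a window of max(|s|, |t|)/2 − 1 positions (a widely used convention that the slide does not state), and each character of t may be matched at most once.

```python
def jaro(s: str, t: str) -> float:
    """Jaro score: 1/3 * (|s'|/|s| + |t'|/|t| + (|s'| - T_{s,t}) / |s'|)."""
    if not s and not t:
        return 1.0
    if not s or not t:
        return 0.0
    # Assumed matching window: equal characters at most this many positions apart are "common"
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_used = [False] * len(s)
    t_used = [False] * len(t)
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_used[j] and t[j] == ch:
                s_used[i] = t_used[j] = True
                break
    s_prime = [c for c, used in zip(s, s_used) if used]  # s'
    t_prime = [c for c, used in zip(t, t_used) if used]  # t'
    common = len(s_prime)                                 # |s'| = |t'| here
    if common == 0:
        return 0.0
    # T_{s,t}: half the number of positions where s' and t' disagree
    transpositions = sum(a != b for a, b in zip(s_prime, t_prime)) / 2
    return (common / len(s) + common / len(t) + (common - transpositions) / common) / 3


print(round(jaro("MARTHA", "MARHTA"), 4))  # 0.9444
print(round(jaro("DWAYNE", "DUANE"), 4))   # 0.8222
```

For MARTHA and MARHTA all six characters are common, the two swapped positions give one transposition, and the score is 17/18.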

### Edit distances

• Jaro Winkler

• JWS(s,t) = Jaro(s,t) + (prefixLength * PREFIXSCALE * (1.0 - Jaro(s,t)))

• Where:

• prefixLength: the length of the common prefix at the start of s and t

• PREFIXSCALE: a constant scaling factor which gives more favourable ratings to strings that match from the beginning for a set prefix length

### Edit distances

• Jaro Winkler Example

• Let s = MARTHA, t = MARHTA and PREFIXSCALE = 0.1

• Jaro(s,t) ≈ 0.9444

• prefixLength = 3 (the common prefix is MAR)

• JWS(s,t) = Jaro(s,t) + (prefixLength * PREFIXSCALE * (1.0 - Jaro(s,t)))

= 0.9444 + (3 * 0.1 * (1 - 0.9444)) ≈ 0.9611
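The Winkler adjustment needs only the Jaro score and the common prefix, so this self-contained sketch takes the Jaro score as an input (for MARTHA and MARHTA it is 17/18 ≈ 0.9444). Capping prefixLength at 4 follows Winkler's usual convention and is an assumption here, as is the function name.

```python
def jaro_winkler(jaro_score: float, s: str, t: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """JWS(s,t) = Jaro(s,t) + prefixLength * PREFIXSCALE * (1.0 - Jaro(s,t))."""
    prefix_length = 0
    for a, b in zip(s, t):
        if a != b or prefix_length == max_prefix:
            break
        prefix_length += 1
    return jaro_score + prefix_length * prefix_scale * (1.0 - jaro_score)


print(round(jaro_winkler(17 / 18, "MARTHA", "MARHTA"), 4))  # 0.9611
```

The bonus only ever moves the score part of the way towards 1, so the result stays in [0, 1] as long as prefix_scale * max_prefix ≤ 1.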


### Basic Concepts of OLAP

• OLAP concerns the analysis of measurable quantities (measures)

• sales, inventory, profit, ...

• Dimensions: parameters that determine the context of the measures

• date, product, location, salesperson, …

• Cubes: combinations of dimensions that determine certain measures

• A cube defines a multidimensional space of dimensions, with the measures being points of this space

Figure: example cube with dimensions REGION (W, S, N), PRODUCT (Juice, Cola, Soap) and MONTH (Jan shown); sample cell values 10 and 13.

### Basic Concepts of OLAP

• The data are regarded as stored in a multi-dimensional array, which is also called a cube or a hypercube.

• A cube is a collection of data cells. Each cell is uniquely identified by the corresponding values of the cube's dimensions.

• The contents of a cell are called measures and represent values measured in the real world.
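A minimal sketch of the cell model above, assuming a sparse cube stored as a Python dictionary keyed by one value per dimension. The dimension values come from the example cube; the sales figures are hypothetical.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[str, str, str]  # (PRODUCT, REGION, MONTH)

# Hypothetical sales figures: one measure per data cell
sales: Dict[Cell, int] = {
    ("Juice", "N", "Jan"): 5,
    ("Cola", "N", "Jan"): 7,
    ("Cola", "S", "Jan"): 4,
}


def measure(cube: Dict[Cell, int], product: str, region: str, month: str) -> Optional[int]:
    """Each cell is uniquely identified by its dimension values; empty cells return None."""
    return cube.get((product, region, month))


print(measure(sales, "Cola", "N", "Jan"))  # 7
print(measure(sales, "Soap", "W", "Jan"))  # None
```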

### Hierarchies of Levels for OLAP

• A dimension models all the ways in which the data can be aggregated with respect to a specific parameter of their context.

• Date, Product, Location, Salesperson, …

• Each dimension has an associated hierarchy of aggregation levels (hierarchy of levels). This means that the dimension can be viewed at several levels of granularity.

• Date: day, week, month, year, …

### Hierarchies of Levels

• Hierarchies of levels: each dimension is organized into different levels of granularity (coarseness)

• The user can navigate from one level to another, creating a new cube each time

Coarseness: the opposite of detail

-- the correct Greek term is αδρομέρεια...

Figure: cube of Sales volume over the dimensions Region, Product and Month.
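Moving to a coarser level produces a new cube by aggregating the measures of all values that share a parent. The sketch below rolls a hypothetical daily sales series up to the Month level, assuming SUM as the aggregate function and ISO-formatted date strings; `month_of` and `roll_up` are illustrative names.

```python
from collections import defaultdict

# Hypothetical Day-level sales; days are ISO strings "YYYY-MM-DD"
daily_sales = {"2024-01-30": 3, "2024-01-31": 4, "2024-02-01": 6}


def month_of(day: str) -> str:
    """Parent of a Day value in the Month level: "2024-01-30" -> "2024-01"."""
    return day[:7]


def roll_up(cells: dict, parent) -> dict:
    """Build the coarser cube: SUM the measure over all children of each parent value."""
    coarser = defaultdict(int)
    for value, m in cells.items():
        coarser[parent(value)] += m
    return dict(coarser)


print(roll_up(daily_sales, month_of))  # {'2024-01': 7, '2024-02': 6}
```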

### Cubes & Dimension Hierarchies for OLAP

Dimensions: Product, Region, Date

Dimension hierarchies:

• Product < Category < Industry

• Store < City < Region < Country

• Day < Month < Quarter < Year, and Day < Week


### Lattice

• A lattice is a partially ordered set (poset) in which every pair of elements has a unique supremum (least upper bound) and a unique infimum (greatest lower bound)

• The hierarchy of levels is formally defined as a lattice (L,<)

• such that L= (L1, ..., Ln, ALL) is a finite set of levels and

• < is a partial order defined among the levels of L

• such that $L_1 \le L_i < ALL$ for every $1 \le i \le n$.

• the upper bound is always the level ALL,

• so that we can group all values into the single value ‘all’.

• The lower bound of the lattice is the most detailed level of the dimension.
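A sketch of a level lattice, assuming a hypothetical Date hierarchy in which Day rolls up both to Week and to Month, Month < Quarter < Year, and Week and Year roll up directly to ALL. The order is derived from the direct-parent edges, and the supremum of two levels is the common upper bound that lies below every other common upper bound.

```python
# Hypothetical Date hierarchy: each level lists the levels directly above it
PARENTS = {
    "Day": ["Week", "Month"],
    "Week": ["ALL"],
    "Month": ["Quarter"],
    "Quarter": ["Year"],
    "Year": ["ALL"],
    "ALL": [],
}


def upper_set(level: str) -> set:
    """All levels l with level <= l (the level itself and everything above it)."""
    seen, stack = {level}, [level]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def leq(a: str, b: str) -> bool:
    return b in upper_set(a)


def supremum(a: str, b: str) -> str:
    """Least upper bound of two levels; raises if it is not unique."""
    common = upper_set(a) & upper_set(b)
    least = [u for u in common if all(leq(u, v) for v in common)]
    if len(least) != 1:
        raise ValueError("no unique supremum: the order is not a lattice")
    return least[0]


print(supremum("Week", "Month"))   # ALL
print(supremum("Day", "Quarter"))  # Quarter
```

Every level lies between Day (the most detailed level) and ALL, matching the bounds stated on the slide.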


### Distances in the same level of Hierarchy

• Let D be a dimension,

• with hierarchy levels $L_1 \le L_i < ALL$, and

• two specific values x and y such that $x, y \in L_i$

Figure: hierarchy levels L1, L2 and All.

### Distances in the same level of Hierarchy

• Explicit

• Minkowski

• Set Based

• Highway

• With respect to the detailed level

• Attribute Based

### Distances in the same level of Hierarchy

• Explicit assignment

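• $n^2$ distances for the n values of $dom(L_i)$

• Minkowski family

• reduces to the Manhattan distance |x − y|, since a single value is a one-dimensional point and $(|x - y|^p)^{1/p} = |x - y|$ for every p

• Set based family

• reduces to {0, 1}, where dist(x, y) = 0 if x = y and 1 otherwise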
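The three same-level options can be checked directly. In this sketch the Minkowski distance of two one-element vectors equals |x − y| for any p, and the Jaccard distance (used here as a representative set-based measure, an assumption) of the singletons {x} and {y} is either 0 or 1. The explicit table uses the REGION values from the cube example with hypothetical distances, giving n² = 9 entries.

```python
from itertools import product


def minkowski(u, v, p: float) -> float:
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)


def jaccard_distance(a: set, b: set) -> float:
    return 1 - len(a & b) / len(a | b)


# Explicit assignment: one entry per ordered pair of dom(L_i); the distances are hypothetical
REGIONS = ["N", "S", "W"]
HYPOTHETICAL = {frozenset(("N", "S")): 2.0, frozenset(("N", "W")): 1.0, frozenset(("S", "W")): 1.0}
explicit = {(x, y): 0.0 if x == y else HYPOTHETICAL[frozenset((x, y))]
            for x, y in product(REGIONS, repeat=2)}

print(len(explicit))                                                    # 9
print(minkowski([3], [7], p=1), minkowski([3], [7], p=2))               # 4.0 4.0
print(jaccard_distance({"N"}, {"N"}), jaccard_distance({"N"}, {"S"}))  # 0.0 1.0
```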

### Distances in the same level of Hierarchy

• Highway distance

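• Let the values of level $L_i$ form k clusters, each with a representative; $r_x$ denotes the representative of the cluster that contains x

• $dist(x, y) = dist(x, r_x) + dist(r_x, r_y) + dist(y, r_y)$

• Specify

• $k^2$ distances $dist(r_x, r_y)$ between representatives, and

• one distance $dist(x, r_x)$ from each value x to its representative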
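A sketch of the highway distance on a hypothetical level with values a1, a2 (cluster A) and b1 (cluster B), so only k² = 4 representative-to-representative distances and one distance per value to its representative are specified. Returning 0 when x = y is an addition of this sketch, since the formula alone would give $2\,dist(x, r_x)$ in that case.

```python
def highway_distance(x, y, rep, to_rep, between_reps) -> float:
    """dist(x, y) = dist(x, r_x) + dist(r_x, r_y) + dist(y, r_y)."""
    if x == y:
        return 0.0  # added so that d(x, x) = 0 holds (assumption of this sketch)
    rx, ry = rep[x], rep[y]
    return to_rep[x] + between_reps[(rx, ry)] + to_rep[y]


# Hypothetical clusters and distances
rep = {"a1": "A", "a2": "A", "b1": "B"}
to_rep = {"a1": 1.0, "a2": 2.0, "b1": 0.5}
between_reps = {("A", "A"): 0.0, ("A", "B"): 10.0, ("B", "A"): 10.0, ("B", "B"): 0.0}

print(highway_distance("a1", "b1", rep, to_rep, between_reps))  # 11.5
print(highway_distance("a1", "a2", rep, to_rep, between_reps))  # 3.0
```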

### Distances in the same level of Hierarchy

• With respect to the detailed level

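• f is a function that picks one of the descendants of a value at the most detailed level

• Attribute based

• every level L has a set of attributes:

• every value $v \in dom(L)$ is described by an attribute vector $[v_1 \ldots v_n]$

• Distance can be defined with respect to the attributes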
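Both ideas in one short sketch, under stated assumptions: the detailed-level distance compares two values through their descendants chosen by a hypothetical f that picks the first descendant (here Quarter values map to Months, which play the role of the detailed level), and the attribute-based distance uses the Euclidean distance between attribute vectors, which is only one of several possible choices.

```python
import math

# Hypothetical Quarter -> Month descendants; Month plays the role of the detailed level L_1
DESCENDANTS = {"Q1": ["Jan", "Feb", "Mar"], "Q2": ["Apr", "May", "Jun"]}
MONTH_POS = {m: i for i, m in enumerate(["Jan", "Feb", "Mar", "Apr", "May", "Jun"])}


def f(value: str) -> str:
    """One possible choice of f: pick the first descendant at the detailed level."""
    return DESCENDANTS[value][0]


def detailed_level_distance(x: str, y: str) -> int:
    # Compare x and y through their chosen detailed-level descendants
    return abs(MONTH_POS[f(x)] - MONTH_POS[f(y)])


def attribute_distance(u, v) -> float:
    # Euclidean distance between attribute vectors [v1 ... vn] (an assumed choice)
    return math.dist(u, v)


print(detailed_level_distance("Q1", "Q2"))         # 3
print(attribute_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```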


### Distances in different levels of Hierarchy

• Explicit

• dist1 + dist2

• dist3 + dist4

• With respect to the detailed level

• With respect to their least common ancestor

• Highway

• Attribute Based

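Figure: values $x \in L_x$ and $y \in L_y$, the ancestor $x_y$ of x in $L_y$ and a descendant $y_x$ of y in $L_x$, connected by the segments dist1, dist2, dist3 and dist4.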

### Distances in different levels of Hierarchy

• Let D be a dimension,

• with hierarchy levels $L_1 \le L_i < ALL$,

• two specific values x and y such that $x \in L_x$ and $y \in L_y$, with

• $L_x < L_y$

• $x_y$: the ancestor of x in level $L_y$

• $y_x$: a descendant of y in level $L_x$


### Distances in different levels of Hierarchy

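• Explicit assignment

• define $dist_{L_x,L_y}(x, y)$ for all $x \in L_x$, $y \in L_y$

• dist1 + dist2: $dist(x, y) = dist(x, x_y) + dist(x_y, y)$

• where $dist(x_y, y)$ (dist2) is a distance of two values from the same level of hierarchy ($L_y$)

• Special case: if y is an ancestor of x, then $x_y = y$ and dist2 = 0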


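### Distances in different levels of Hierarchy

• dist3 + dist4: $dist(x, y) = dist(y, y_x) + dist(y_x, x)$

• where $dist(y_x, x)$ (dist4) is a distance of two values from the same level of hierarchy ($L_x$)

• Special case: if y is an ancestor of x, then x itself can be chosen as $y_x$, so dist4 = 0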
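A sketch of the two constructions on a hypothetical Month < Quarter hierarchy. The vertical steps dist1 and dist3 are assumed to cost one hop, the same-level distances dist2 and dist4 are taken as position differences within the level, and $y_x$ is assumed to be the descendant of y closest to x; none of these choices comes from the slides.

```python
# Hypothetical two-level example: months (level L_x) roll up to quarters (level L_y)
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
QUARTERS = ["Q1", "Q2"]
PARENT = {m: ("Q1" if i < 3 else "Q2") for i, m in enumerate(MONTHS)}
HOP_COST = 1  # assumed cost of one vertical step (dist1 and dist3)


def same_level(a: str, b: str, order: list) -> int:
    """Assumed same-level distance: difference of positions within an ordered level."""
    return abs(order.index(a) - order.index(b))


def via_ancestor(x: str, y: str) -> int:
    """dist1 + dist2: climb from x to its ancestor x_y in L_y, then move within L_y to y."""
    x_y = PARENT[x]
    return HOP_COST + same_level(x_y, y, QUARTERS)


def via_descendant(x: str, y: str) -> int:
    """dist3 + dist4: descend from y to a descendant y_x in L_x, then move within L_x to x."""
    children = [m for m in MONTHS if PARENT[m] == y]
    y_x = min(children, key=lambda m: same_level(m, x, MONTHS))  # closest descendant
    return HOP_COST + same_level(y_x, x, MONTHS)


print(via_ancestor("Feb", "Q1"), via_descendant("Feb", "Q1"))  # 1 1
print(via_ancestor("Feb", "Q2"), via_descendant("Feb", "Q2"))  # 2 3
```

When Q1 is an ancestor of Feb, both dist2 and dist4 vanish, matching the special cases above; the two constructions need not agree in general.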

### Distances in different levels of Hierarchy

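• With respect to the detailed level

• Let $x_1$ and $y_1$ be descendants of x and y at the most detailed level $L_1$, and $dist(x, y) = dist(x_1, y_1)$

• where $dist(x_1, y_1)$ is a distance of two values from the same level of hierarchy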

### Distances in different levels of Hierarchy

• With respect to their least common ancestor

• Let $L_z$ be the level of the hierarchy where x and y have their first common ancestor

• count the number of “hops” needed to reach the first common ancestor

• normalize according to the height of the level
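A sketch of the common-ancestor distance on a hypothetical Month < Quarter < Year < ALL hierarchy. It counts the hops from x and from y up to their first common ancestor and, as one possible normalization (an assumption, since the slide does not give the formula), divides by twice the height of the hierarchy (3 hops from Month to ALL) so that the result lies in [0, 1].

```python
# Hypothetical values of a Date hierarchy: Month -> Quarter -> Year -> ALL
PARENT = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2",
          "Q1": "Y1", "Q2": "Y1", "Y1": "ALL"}
HEIGHT = 3  # hops from the Month level up to ALL (used only for normalization)


def path_to_all(value: str) -> list:
    """The value followed by all of its ancestors, ending at ALL."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def common_ancestor_distance(x: str, y: str) -> float:
    px, py = path_to_all(x), path_to_all(y)
    on_py = set(py)
    z = next(a for a in px if a in on_py)   # first common ancestor, at level L_z
    hops = px.index(z) + py.index(z)        # hops from x and from y up to z
    return hops / (2 * HEIGHT)


print(common_ancestor_distance("Jan", "Feb"))  # common ancestor Q1: 2 hops
print(common_ancestor_distance("Jan", "Q1"))   # x and y on different levels: 1 hop
```

Because the hops are counted separately on each side, the same function handles values from the same level and from different levels.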

### Distances in different levels of Hierarchy

• Highway distance

• Let every level $L_i$ be clustered into $k_i$ clusters, each with its own representative $r_{k_i}$

• Attribute Based

• every level L has a set of attributes:

• every value $v \in dom(L)$ is described by an attribute vector $[v_1 \ldots v_n]$

• Distance can be defined with respect to the attributes

### Types of Levels

• Nominal (=, ≠)

• values hold the distinctness property

• values can be explicitly distinguished

• Ordinal (<, >)

• values hold the distinctness property & the order property

• values abide by an order

• Interval (+, −)

• values hold the distinctness, order & the addition property

• a unit of measurement exists

• the difference between two values is meaningful
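A sketch of how the type of a level restricts the choice of distance, under the assumption that nominal levels use the 0/1 distance, ordinal levels compare positions in a given order (normalized to [0, 1]), and interval levels use |x − y|. The enum and function names are hypothetical.

```python
from enum import Enum


class LevelType(Enum):
    NOMINAL = "nominal"    # only = and != are meaningful
    ORDINAL = "ordinal"    # values also have an order
    INTERVAL = "interval"  # differences between values are meaningful


def level_distance(kind: LevelType, x, y, order=None) -> float:
    if kind is LevelType.NOMINAL:
        return 0.0 if x == y else 1.0
    if kind is LevelType.ORDINAL:
        if not order:
            raise ValueError("ordinal levels need the ordered list of values")
        span = max(len(order) - 1, 1)
        return abs(order.index(x) - order.index(y)) / span
    if kind is LevelType.INTERVAL:
        return float(abs(x - y))
    raise ValueError(f"unknown level type: {kind}")


print(level_distance(LevelType.NOMINAL, "Juice", "Cola"))                        # 1.0
print(level_distance(LevelType.ORDINAL, "low", "high", ["low", "medium", "high"]))  # 1.0
print(level_distance(LevelType.INTERVAL, 10, 13))                                 # 3.0
```

Each type only uses the properties the slide attributes to it: distinctness for nominal levels, order for ordinal levels, and differences for interval levels.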