Distance functions on hierarchies

Eftychia Baikousi


Outline

  • Definition of metric & similarity
  • Various Distance Functions
    • Minkowski
    • Set based
    • Edit distance
  • Basic concept of OLAP
    • Lattice
    • Distance in same level of hierarchy
    • Distance in different level of hierarchy


Definition of metric

  • A distance function on a given set M is a function d: M × M → ℝ that satisfies the following conditions:
    • d(x,y) ≥ 0, and d(x,y) = 0 iff x = y
      • Distance is positive between two different points and is zero precisely from a point to itself
    • It is symmetric: d(x,y) = d(y,x)
      • The distance between x and y is the same in either direction
    • It satisfies the triangle inequality: d(x,z) ≤ d(x,y) + d(y,z)
      • The distance between two points is the shortest distance along any path
  • A distance function that satisfies all of these conditions is a metric
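On a finite set of points the conditions can be checked by brute force. The following is a minimal sketch, assuming the points support `==` and the candidate distance is an ordinary Python function; the helper name `is_metric` and the two sample distances are illustrative and not part of the presentation.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Check the metric conditions for d on a finite collection of points."""
    pts = list(points)
    for x, y in product(pts, repeat=2):
        if d(x, y) < -tol:                       # non-negativity
            return False
        if (abs(d(x, y)) <= tol) != (x == y):    # d(x, y) = 0 iff x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:         # symmetry
            return False
    for x, y, z in product(pts, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:    # triangle inequality
            return False
    return True

def absolute_difference(x, y):
    return abs(x - y)

def squared_difference(x, y):
    # Symmetric and zero only on the diagonal, but it breaks the triangle inequality.
    return (x - y) ** 2

print(is_metric([0, 1, 2, 5], absolute_difference))
print(is_metric([0, 1, 2, 5], squared_difference))
```

The squared difference fails because d(0,2) = 4 exceeds d(0,1) + d(1,2) = 2, which illustrates that the triangle inequality is the condition most easily lost.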


Definition of similarity metric

  • Let s(x,y) be the similarity between two points x and y; then the following properties hold:
    • s(x,y) = 1 only if x = y (0 ≤ s ≤ 1)
    • s(x,y) = s(y,x) ∀ x and y (symmetry)
    • The triangle inequality does not hold in general



Minkowski Family
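
The Minkowski family is usually written as d_p(x, y) = (Σ |x_i - y_i|^p)^(1/p), with p = 1 giving the Manhattan distance and p = 2 the Euclidean distance. The sketch below assumes this standard definition; the formula and the function name `minkowski` are not taken from the slides.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        # For p < 1 the triangle inequality fails, so the result is not a metric.
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

print(minkowski((0, 0), (3, 4), p=1))  # Manhattan
print(minkowski((0, 0), (3, 4), p=2))  # Euclidean
```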


Set Based
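
The slides do not say which set-based measures are meant. As one plausible member of the family, the sketch below uses the Jaccard distance, 1 - |A ∩ B| / |A ∪ B|; this choice and the function name are assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention assumed here: two empty sets are identical
    return 1.0 - len(a & b) / len(a | b)

print(jaccard_distance({"Juice", "Cola", "Soap"}, {"Cola", "Soap", "Milk"}))
```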


Edit Distance: Levenshtein distance

  • The edit distance between two strings x = x_1 … x_n and y = y_1 … y_m is defined as the minimum number of atomic edit operations needed to transform x into y
    • Insert: ins(x, i, c) = x_1 x_2 … x_i c x_{i+1} … x_n
    • Delete: del(x, i) = x_1 x_2 … x_{i-1} x_{i+1} … x_n
    • Replace: rep(x, i, c) = x_1 x_2 … x_{i-1} c x_{i+1} … x_n
  • Every edit operation o is assigned the cost c(o) = 1
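
With unit costs, the minimum number of operations can be computed by the usual dynamic program over prefixes. This is a minimal sketch of that technique; the function name `levenshtein` is illustrative.

```python
def levenshtein(x, y):
    """Minimum number of unit-cost inserts, deletes and replaces turning x into y."""
    # prev[j] holds the distance between the current prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # replace cx by cy, or keep a match
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))
```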


Edit distances

  • Needleman-Wunsch distance or Sellers Algorithm
    • Insert
      • a character: ins(x, i, c) = x_1 x_2 … x_i c x_{i+1} … x_n, with cost(o) = 1
      • a gap: ins_g(x, i, g) = x_1 x_2 … x_i g x_{i+1} … x_n, with cost(o) = g
    • Delete
      • a character: del(x, i) = x_1 x_2 … x_{i-1} x_{i+1} … x_n, with cost(o) = 1
      • a gap: del_g(x, i) = x_1 x_2 … x_{i-1} x_{i+1} … x_n, with cost(o) = g
    • Replace
      • a character: rep(x, i, c) = x_1 x_2 … x_{i-1} c x_{i+1} … x_n, with cost(o) = 1
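
One common reading of this variant is an edit distance in which a replacement costs 1 and every inserted or deleted position (a gap) costs g. The sketch below assumes that reading; the function name and the `sub_cost` parameter are illustrative, and with g = 1 it coincides with the Levenshtein distance.

```python
def gap_edit_distance(x, y, g=2.0, sub_cost=1.0):
    """Edit distance with gap cost g for inserts/deletes and sub_cost for replaces."""
    prev = [j * g for j in range(len(y) + 1)]
    for i, cx in enumerate(x, start=1):
        curr = [i * g]
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + g,                                   # gap in y (delete cx)
                curr[j - 1] + g,                               # gap in x (insert cy)
                prev[j - 1] + (0.0 if cx == cy else sub_cost), # match or replace
            ))
        prev = curr
    return prev[-1]

print(gap_edit_distance("abc", "ab", g=2.0))
```

Raising g above the replacement cost makes the alignment prefer replacements over gaps, which is the main design lever of this family.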


Edit distances

  • Jaro distance
  • Let s and t be two strings, and let
    • s’ = characters in s that are common with t
    • t’ = characters in t that are common with s
    • T_{s,t} = number of transpositions of characters in s’ relative to t’


Edit distances

  • Jaro distance Example
  • Let s = MARTHA and t = MARHTA
    • |s’| = 6
    • |t’| = 6
    • T_{s,t} = 2/2 = 1, since the mismatched characters are T/H and H/T
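
The quantities in this example feed the usual Jaro formula, Jaro(s,t) = (1/3)(|s’|/|s| + |t’|/|t| + (|s’| - T)/|s’|). The sketch below assumes that standard formula and the standard matching window of max(|s|,|t|)/2 - 1 positions, neither of which is spelled out on the slides; the function name is illustrative.

```python
def jaro(s, t):
    """Jaro similarity in [0, 1], using the standard matching window."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, c in enumerate(s):
        # A character of s may match an unused equal character of t within the window.
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_matched[j] and t[j] == c:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_common = [c for c, used in zip(s, s_matched) if used]
    t_common = [c for c, used in zip(t, t_matched) if used]
    # Half the number of positions where the common characters are out of order.
    transpositions = sum(a != b for a, b in zip(s_common, t_common)) / 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3

print(round(jaro("MARTHA", "MARHTA"), 4))
```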


Edit distances

  • Jaro Winkler
  • JWS(s,t) = Jaro(s,t) + (prefixLength * PREFIXSCALE * (1.0 - Jaro(s,t)))
  • Where:
    • prefixLength: the length of the common prefix at the start of the two strings
    • PREFIXSCALE: a constant scaling factor that gives more favourable ratings to strings that match from the beginning, up to a set prefix length


Edit distances

  • Jaro Winkler Example
  • Let s = MARTHA and t = MARHTA and PREFIXSCALE = 0.1
    • Jaro(s,t) = (1/3)(|s’|/|s| + |t’|/|t| + (|s’| - T_{s,t})/|s’|) = (1/3)(6/6 + 6/6 + 5/6) = 0.9444
    • prefixLength = 3
  • JWS(s,t) = Jaro(s,t) + (prefixLength * PREFIXSCALE * (1.0 - Jaro(s,t)))
    = 0.9444 + (3 * 0.1 * (1 - 0.9444)) = 0.9611
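
The Jaro-Winkler adjustment itself is a one-line formula once the Jaro score is known. The sketch below takes the Jaro score as an input and only computes the prefix bonus; the cap of 4 on the prefix length is Winkler's usual choice and is an assumption here, as are the function names.

```python
def common_prefix_length(s, t, max_prefix=4):
    """Length of the shared prefix of s and t, capped at max_prefix."""
    n = 0
    for a, b in zip(s, t):
        if a != b or n == max_prefix:
            break
        n += 1
    return n

def jaro_winkler(jaro_score, s, t, prefix_scale=0.1, max_prefix=4):
    """JWS = Jaro + prefixLength * PREFIXSCALE * (1 - Jaro)."""
    prefix_length = common_prefix_length(s, t, max_prefix)
    return jaro_score + prefix_length * prefix_scale * (1.0 - jaro_score)

print(common_prefix_length("MARTHA", "MARHTA"))
```

Because the bonus is proportional to 1 - Jaro, the adjusted score never exceeds 1 as long as prefix_length * prefix_scale stays at or below 1.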



Basic OLAP Concepts

  • OLAP concerns the analysis of measurable quantities (measures)
    • sales, inventory, profit, ...
  • Dimensions: parameters that define the context of the measures
    • date, product, location, salesperson, …
  • Cubes: combinations of dimensions that determine some measures
    • The cube defines a multidimensional space of dimensions, with the measures being points in that space


Cubes for OLAP

[Figure: data cube with axes REGION (W, S, N), PRODUCT (Juice, Cola, Soap) and MONTH (Jan); sample cell values 10 and 13]



Basic OLAP Concepts

  • The data are considered to be stored in a multi-dimensional array, which is also called a cube or hypercube.
  • The cube is a group of data cells. Each cell is uniquely identified by the corresponding values of the cube's dimensions.
  • The contents of a cell are called measures and represent the measured values of the real world.


Hierarchies of Levels for OLAP

  • A dimension models all the ways in which the data can be aggregated with respect to a particular parameter of their content.
    • Date, Product, Location, Salesperson, …
  • Each dimension has an associated hierarchy of levels of data aggregation. This means that the dimension can be viewed at several levels of granularity.
    • Date: day, week, month, year, …


Hierarchies of Levels

  • Hierarchies of levels: each dimension is organized into different levels of granularity
  • The user can navigate from one level to another, creating new cubes each time
    Note on terminology: the Greek term for granularity, αδρομέρεια (coarseness), is the opposite of λεπτομέρεια (detail)


Cubes & Dimension Hierarchies for OLAP

[Figure: cube of Sales volume with axes Region, Product and Month]

  • Dimensions: Product, Region, Date
  • Dimension hierarchies (levels listed from coarsest to finest):
    • Product: Industry, Category, Product
    • Region: Country, Region, City, Store
    • Date: Year, Quarter, Month, Week, Day



Lattice

  • A lattice is a partially ordered set (poset) in which every pair of elements has a unique supremum and a unique infimum
  • The hierarchy of levels is formally defined as a lattice (L, <)
    • such that L = (L1, ..., Ln, ALL) is a finite set of levels and
    • < is a partial order defined among the levels of L
    • such that L1 < Li < ALL ∀ 1 ≤ i ≤ n
  • The upper bound is always the level ALL,
    • so that we can group all values into the single value ‘all’
  • The lower bound of the lattice is the most detailed level of the dimension
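
A small lattice of levels can be stored as a map from each level to its immediately coarser levels, from which the partial order and the supremum follow by reachability. The sketch below uses a hypothetical Date hierarchy (Day below Week and Month, Month below Year, everything below ALL); the level names, the `COARSER` map and the function names are assumptions made for illustration.

```python
# Hypothetical Date hierarchy: each level maps to its immediately coarser levels.
COARSER = {
    "Day": {"Week", "Month"},
    "Week": {"ALL"},
    "Month": {"Year"},
    "Year": {"ALL"},
    "ALL": set(),
}

def precedes(a, b, coarser=COARSER):
    """Return True if level a is at least as detailed as level b."""
    return a == b or any(precedes(c, b, coarser) for c in coarser[a])

def supremum(a, b, coarser=COARSER):
    """Return the least level that is coarser than or equal to both a and b."""
    upper = [l for l in coarser
             if precedes(a, l, coarser) and precedes(b, l, coarser)]
    least = [u for u in upper if all(precedes(u, v, coarser) for v in upper)]
    return least[0] if len(least) == 1 else None

print(supremum("Week", "Month"))
print(supremum("Day", "Month"))
```

Week and Month are incomparable, so their supremum is ALL, which shows why the level ALL is needed as the upper bound.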



Distances in the same level of Hierarchy

  • Let D be a dimension,
  • L1 < Li < ALL its hierarchy of levels, and
  • x and y two specific values s.t. x, y ∈ Li

[Figure: hierarchy of levels L1, L2, ALL, from the most detailed to the coarsest]


Distances in the same level of Hierarchy

  • Explicit
  • Minkowski
  • Set Based
  • Highway
  • With respect to the detailed level
  • Attribute Based


Distances in the same level of Hierarchy

  • Explicit assignment
    • n² distances for the n values of dom(Li)
  • Minkowski family
    • reduces to the Manhattan distance: |x - y|
  • Set based family
    • reduces to {0, 1}, where dist(x, y) = 0 if x = y and 1 otherwise

Distances in the same level of Hierarchy

  • Highway distance
    • Let the values of level Li form a set of k clusters, where each cluster has a representative rk
    • dist(x, y) = dist(x, rx) + dist(rx, ry) + dist(y, ry), where rx and ry are the representatives of the clusters of x and y
    • Specify
      • k² distances: dist(rx, ry) and
      • k distances: dist(x, rx)
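
The highway distance routes every pair of values through the representatives of their clusters. A minimal sketch follows, assuming each cluster is identified by its representative; the cities, the numbers and the names `CLUSTER_OF`, `TO_REP` and `REP_DIST` are invented for the example, and returning 0 for x = y is an added convention so that a value is at distance zero from itself.

```python
# Hypothetical data: each value maps to the representative of its cluster.
CLUSTER_OF = {"Athens": "Athens", "Patras": "Athens",
              "Thessaloniki": "Thessaloniki", "Larissa": "Thessaloniki"}
# Distance from each value to its own representative (made-up numbers).
TO_REP = {"Athens": 0, "Patras": 210, "Thessaloniki": 0, "Larissa": 150}
# Distances between representatives, including the zero diagonal.
REP_DIST = {("Athens", "Athens"): 0, ("Athens", "Thessaloniki"): 500,
            ("Thessaloniki", "Athens"): 500, ("Thessaloniki", "Thessaloniki"): 0}

def highway_distance(x, y, cluster_of=CLUSTER_OF, to_rep=TO_REP, rep_dist=REP_DIST):
    """dist(x, y) = dist(x, rx) + dist(rx, ry) + dist(y, ry)."""
    if x == y:
        return 0  # assumption: identical values are at distance zero
    rx, ry = cluster_of[x], cluster_of[y]
    return to_rep[x] + rep_dist[(rx, ry)] + to_rep[y]

print(highway_distance("Patras", "Larissa"))
```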


Distances in the same level of Hierarchy

  • With respect to the detailed level
    • f is a function that picks one of the descendants of a value at the detailed level
  • Attribute based
    • Every level L has a set of attributes
    • Every value v ∈ dom(L) is described by its attribute values [v1 … vn]
    • Distance can be defined with respect to the attributes



Distances in different levels of Hierarchy

  • Explicit
  • dist1 + dist2
  • dist3 + dist4
  • With respect to the detailed level
  • With respect to their least common ancestor
  • Highway
  • Attribute Based


Distances in different levels of Hierarchy

  • Let D be a dimension,
  • L1 < Li < ALL its hierarchy of levels, and
  • x and y two specific values s.t. x ∈ Lx, y ∈ Ly
  • Lx < Ly
  • x_y: the ancestor of x in level Ly
  • y_x: a descendant of y in level Lx

[Figure: x and y_x lie in level Lx, y and x_y lie in level Ly; dist1 joins x to x_y, dist2 joins x_y to y, dist3 joins y to y_x, dist4 joins y_x to x]


Distances in different levels of Hierarchy

  • Explicit assignment
    • define dist_{Lx,Ly}(x, y) ∀ x ∈ Lx, y ∈ Ly
  • dist1 + dist2
    • dist1 = dist(x, x_y) and dist2 = dist(x_y, y), where x_y is the ancestor of x in level Ly
    • dist2 is a distance between two values from the same level of hierarchy
    • special case: if y is an ancestor of x, then dist2 = 0
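
The dist1 + dist2 option first lifts x to its ancestor in the level of y and then measures a same-level distance there. The sketch below assumes a cost of 1 for the step from a value to its ancestor and a 0/1 distance inside the coarser level; the city-to-country map and all names are hypothetical.

```python
# Hypothetical ancestor map from level Lx (cities) to level Ly (countries).
ANCESTOR_IN_LY = {"Athens": "Greece", "Patras": "Greece", "Rome": "Italy"}

def dist_to_ancestor(x, x_y):
    """dist1: cost of moving from x up to its ancestor x_y (assumed constant)."""
    return 1

def dist_in_ly(a, b):
    """dist2: a same-level distance in Ly (here the 0/1 distance)."""
    return 0 if a == b else 1

def cross_level_distance(x, y, ancestor=ANCESTOR_IN_LY):
    """dist(x, y) = dist1(x, x_y) + dist2(x_y, y) for x in Lx and y in Ly."""
    x_y = ancestor[x]
    return dist_to_ancestor(x, x_y) + dist_in_ly(x_y, y)

print(cross_level_distance("Athens", "Greece"))  # y is the ancestor of x, so dist2 = 0
print(cross_level_distance("Athens", "Italy"))
```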


Distances in different levels of Hierarchy

  • dist3 + dist4
    • dist3 = dist(y, y_x) and dist4 = dist(y_x, x), where y_x is a descendant of y in level Lx
    • dist4 is a distance between two values from the same level of hierarchy
    • special case: if y is an ancestor of x, then dist4 = 0


Distances in different levels of Hierarchy

  • With respect to the detailed level
    • Let x1 and y1 be the descendants picked for x and y at the detailed level
    • dist(x1, y1) is then a distance between two values from the same level of hierarchy


Distances in different levels of Hierarchy

  • With respect to their common ancestor
    • Let Lz be the level of hierarchy where x and y have their first common ancestor
    • The distance is the number of “hops” needed to reach the first common ancestor,
    • normalized according to the height of the level
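
Counting hops to the first common ancestor only needs a parent map. The sketch below counts the hops from each value and, as one possible normalization, divides by the total number of hops from both values to the top level; that normalization, the sample hierarchy and the function names are assumptions, since the exact formula is not given here.

```python
# Hypothetical hierarchy: every value except the top one has a single parent.
PARENT = {"Athens": "Greece", "Patras": "Greece", "Rome": "Italy",
          "Greece": "ALL", "Italy": "ALL"}

def path_to_root(v, parent=PARENT):
    """Return the list [v, parent(v), ..., top value]."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path

def lca_hops(x, y, parent=PARENT):
    """Return (hops from x, hops from y, first common ancestor)."""
    hops_from_y = {node: h for h, node in enumerate(path_to_root(y, parent))}
    for hops_x, node in enumerate(path_to_root(x, parent)):
        if node in hops_from_y:
            return hops_x, hops_from_y[node], node
    raise ValueError("x and y have no common ancestor")

def lca_distance(x, y, parent=PARENT):
    """Hops to the first common ancestor, scaled into [0, 1] (assumed scaling)."""
    hops_x, hops_y, _ = lca_hops(x, y, parent)
    total = (len(path_to_root(x, parent)) - 1) + (len(path_to_root(y, parent)) - 1)
    return 0.0 if total == 0 else (hops_x + hops_y) / total

print(lca_hops("Athens", "Patras"))
print(lca_distance("Athens", "Italy"))
```

The same code handles values from different levels, because the two paths to the top are allowed to have different lengths.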


Distances in different levels of Hierarchy

  • Highway distance
    • Let every level Li be clustered into ki clusters, where every cluster has its own representative rki
  • Attribute Based
    • Every level L has a set of attributes
    • Every value v ∈ dom(L) is described by its attribute values [v1 … vn]
    • Distance can be defined with respect to the attributes


Types of Levels

  • Nominal (=, ≠)
    • values hold the distinctness property
    • values can be explicitly distinguished
  • Ordinal (<, >)
    • values hold the distinctness property & the order property
    • values abide by an order
  • Interval (+, -)
    • values hold the distinctness, order & addition properties
    • a unit of measurement exists
    • the difference between two values is meaningful

