Distance Functions on Hierarchies - PowerPoint PPT Presentation


Distance Functions on Hierarchies. Eftychia Baikousi. Outline. Definition of metric & similarity Various Distance Functions Minkowski Set based Edit distance Basic concept of OLAP Lattice Distance in same level of hierarchy Distance in different level of hierarchy.


Distance Functions on Hierarchies

Eftychia Baikousi


Outline

  • Definition of metric & similarity

  • Various Distance Functions

    • Minkowski

    • Set based

    • Edit distance

  • Basic concept of OLAP

    • Lattice

    • Distance in same level of hierarchy

    • Distance in different level of hierarchy


Definition of metric

  • A distance function d: M × M → ℝ on a given set M is a metric if it satisfies the following conditions:

    • d(x,y) ≥ 0, and

    • d(x,y) = 0 iff x = y

      • Distance is positive between two different points and is zero precisely from a point to itself

    • It is symmetric: d(x,y) = d(y,x)

      • The distance between x and y is the same in either direction

    • It satisfies the triangle inequality: d(x,z) ≤ d(x,y) + d(y,z)

      • The distance between two points is the shortest distance along any path
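As a rough illustration of the conditions above, the following sketch checks them by brute force on a finite sample of points. The helper name `is_metric` and the tolerance parameter are assumptions made for this example; passing the check on a sample does not prove that a function is a metric on the whole set.

```python
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Brute-force check of the metric conditions on a finite sample."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                      # non-negativity
            return False
        if (d(x, y) == 0) != (x == y):       # zero exactly from a point to itself
            return False
        if abs(d(x, y) - d(y, x)) > tol:     # symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True


sample = [0, 1, 2, 5]
absolute_difference = lambda a, b: abs(a - b)
squared_difference = lambda a, b: (a - b) ** 2
```

On this sample the absolute difference passes, while the squared difference fails the triangle inequality (from 0 to 2 it gives 4, but going through 1 gives 1 + 1).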


Definition of similarity metric

  • Let s(x,y) be the similarity between two points x and y; then the following properties hold:

    • s(x,y) = 1 only if x = y (0 ≤ s ≤ 1)

    • s(x,y) = s(y,x) ∀ x and y (symmetry)

    • The triangle inequality is not required to hold


Minkowski Family
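The formula of this slide is not part of the transcript. As a reminder, the standard Minkowski distance of order p between two vectors is d_p(x, y) = (Σ_i |x_i − y_i|^p)^(1/p), with p = 1 giving the Manhattan distance, p = 2 the Euclidean distance, and p → ∞ the Chebyshev (maximum) distance. A minimal sketch, assuming plain Python sequences as vectors (the function name is chosen for this example):

```python
import math


def minkowski(x, y, p=2.0):
    """Standard Minkowski distance of order p between equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Limit case p -> infinity: the largest coordinate difference.
        return max(diffs, default=0.0)
    if p < 1:
        raise ValueError("p must be >= 1 for the triangle inequality to hold")
    return sum(d ** p for d in diffs) ** (1.0 / p)
```

For a single coordinate every order p gives |x − y|.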


Set Based
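The content of this slide is not part of the transcript, so which set-based functions it listed is unknown. One widely used member of the family is the Jaccard distance, 1 − |A ∩ B| / |A ∪ B|; the sketch below uses it purely as an assumed example.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty sets are identical
    return 1.0 - len(a & b) / len(a | b)
```

On single-element sets the result is 0 when the elements are equal and 1 otherwise.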


Edit Distance: Levenshtein distance

  • The edit distance between two strings

    x = x1…xn, y = y1…ym

    is defined as the minimum number of atomic edit operations needed to transform x into y

    • Insert: ins(x,i,c) = x1x2…xi c xi+1…xn

    • Delete: del(x,i) = x1x2…xi-1 xi+1…xn

    • Replace: rep(x,i,c) = x1x2…xi-1 c xi+1…xn

  • Every edit operation o is assigned the cost c(o) = 1
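The definition above can be computed with the usual dynamic-programming recurrence. The following is a minimal sketch that keeps only two rows of the table; the function name is chosen for this example.

```python
def levenshtein(x: str, y: str) -> int:
    """Minimum number of unit-cost inserts, deletes and replaces turning x into y."""
    prev = list(range(len(y) + 1))          # distance from "" to each prefix of y
    for i, cx in enumerate(x, 1):
        curr = [i]                          # distance from x[:i] to ""
        for j, cy in enumerate(y, 1):
            curr.append(min(
                prev[j] + 1,                # delete cx
                curr[j - 1] + 1,            # insert cy
                prev[j - 1] + (cx != cy),   # replace (free if the characters match)
            ))
        prev = curr
    return prev[-1]
```

For example, `levenshtein("MARTHA", "MARHTA")` is 2: replace T by H and H by T.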


Edit distances

  • Needleman-Wunsch distance or Sellers algorithm

    • Insert

      • a character: ins(x,i,c) = x1x2…xi c xi+1…xn

        • with cost(o) = 1

      • a gap: ins_g(x,i,g) = x1x2…xi g xi+1…xn

        • with cost(o) = g

    • Delete

      • a character: del(x,i) = x1x2…xi-1 xi+1…xn

        • with cost(o) = 1

      • a gap: del_g(x,i) = x1x2…xi-1 xi+1…xn

        • with cost(o) = g

    • Replace

      • a character: rep(x,i,c) = x1x2…xi-1 c xi+1…xn

        • with cost(o) = 1
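A simplified sketch of a gap-cost edit distance in this spirit is given below. It is an assumption-laden reduction of the scheme above: every insert or delete is charged the gap cost g, a replace costs 1, and the separate unit-cost character insert/delete is not modelled. The function name and default costs are chosen for this example.

```python
def needleman_wunsch_distance(x: str, y: str, gap_cost: float = 2.0,
                              replace_cost: float = 1.0) -> float:
    """Edit distance where inserts/deletes cost gap_cost and replaces cost replace_cost."""
    prev = [j * gap_cost for j in range(len(y) + 1)]
    for i, cx in enumerate(x, 1):
        curr = [i * gap_cost]
        for j, cy in enumerate(y, 1):
            substitution = 0.0 if cx == cy else replace_cost
            curr.append(min(
                prev[j - 1] + substitution,  # match or replace
                prev[j] + gap_cost,          # gap in y (delete from x)
                curr[j - 1] + gap_cost,      # gap in x (insert into x)
            ))
        prev = curr
    return prev[-1]
```

With g = 1 this coincides with the Levenshtein distance; larger values of g make replacements cheaper than opening gaps.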


Edit distances

  • Jaro distance

  • Let s and t be two strings, and

    • s’ = the characters in s that are common with t

    • t’ = the characters in t that are common with s

    • Ts,t = the number of transpositions of characters in s’ relative to t’


Edit distances

  • Jaro distance example

  • Let s = MARTHA and t = MARHTA

    • |s’| = 6

    • |t’| = 6

    • Ts,t = 2/2 = 1, since the mismatched characters are T/H and H/T
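The Jaro formula itself is not part of the transcript. The sketch below assumes the standard definition, Jaro(s,t) = 1/3 · (|s’|/|s| + |t’|/|t| + (|s’| − Ts,t)/|s’|), where two characters count as common only if they are equal and at most max(|s|,|t|)/2 − 1 positions apart. The matching window and the function name are part of that assumption, not taken from the slides.

```python
def jaro(s: str, t: str) -> float:
    """Standard Jaro similarity in [0, 1]; 1 means identical strings."""
    if not s or not t:
        return 1.0 if s == t else 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    for i, ch in enumerate(s):
        # Look for an unused equal character of t inside the matching window.
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                break
    s_common = [c for c, hit in zip(s, s_hit) if hit]   # s'
    t_common = [c for c, hit in zip(t, t_hit) if hit]   # t'
    m = len(s_common)
    if m == 0:
        return 0.0
    # Half the number of positions where s' and t' disagree.
    transpositions = sum(a != b for a, b in zip(s_common, t_common)) / 2
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3
```

Under this definition `jaro("MARTHA", "MARHTA")` evaluates to (1 + 1 + 5/6) / 3 ≈ 0.944, with |s’| = |t’| = 6 and one transposition as in the example above.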


Edit distances

  • Jaro Winkler

  • JWS(s,t) = Jaro(s,t) + (prefixLength * PREFIXSCALE * (1.0 - Jaro(s,t)))

  • Where:

    • prefixLength : the length of common prefix at the start of the string

    • PREFIXSCALE: a constant scaling factor which gives more favourable ratings to strings that match from the beginning for a set prefix length


Edit distances

  • Jaro Winkler Example

  • Let s =MARTHA and t =MARHTA and PREFIXSCALE = 0.1

    • Jaro(s,t)=0.8055

    • prefixLength=3

  • JWS(s,t) = Jaro(s,t) + (prefixLength * PREFIXSCALE * (1.0 - Jaro(s,t)))

    = 0.8055 + (3 * 0.1 * (1 - 0.8055)) = 0.86385
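The arithmetic of this example can be reproduced with a short sketch that takes the Jaro score as an input. The function names are chosen for this example, and the cap of 4 on the prefix length follows Winkler's usual convention, which is an assumption here since the slides only speak of "a set prefix length".

```python
def common_prefix_length(s: str, t: str, max_prefix: int = 4) -> int:
    """Length of the shared prefix of s and t, capped at max_prefix."""
    n = 0
    for a, b in zip(s, t):
        if a != b or n == max_prefix:
            break
        n += 1
    return n


def jaro_winkler(jaro_score: float, s: str, t: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Boost a given Jaro score according to the common prefix of s and t."""
    prefix = common_prefix_length(s, t, max_prefix)
    return jaro_score + prefix * prefix_scale * (1.0 - jaro_score)
```

`jaro_winkler(0.8055, "MARTHA", "MARHTA")` gives 0.86385, matching the value above. Note that the standard Jaro definition yields about 0.9444 for this pair rather than 0.8055, in which case the same formula gives about 0.9611; the slides may rely on a different Jaro variant.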


Basic OLAP Concepts

  • OLAP concerns the analysis of measurable quantities (measures)

    • sales, inventory, profit, ...

  • Dimensions: parameters that define the context of the measures

    • date, product, location, salesperson, …

  • Cubes: combinations of dimensions that determine some measures

    • The cube defines a multidimensional space of dimensions, with the measures being points of that space


OLAP Cubes

[Figure: a data cube with axes REGION (W, S, N), PRODUCT (Juice, Cola, Soap) and MONTH (Jan); sample cell values 10 and 13.]


Basic OLAP Concepts

  • The data are regarded as stored in a multi-dimensional array, which is also called a cube or hypercube.

  • A cube is a group of data cells. Each cell is uniquely identified by the corresponding values of the cube's dimensions.

  • The contents of a cell are called measures and represent the measured values of the real world.


Level Hierarchies for OLAP

  • A dimension models all the ways in which the data can be aggregated with respect to a specific parameter of their context.

    • Date, Product, Location, Salesperson, …

  • Each dimension has an associated hierarchy of levels of data aggregation. This means that the dimension can be viewed at several levels of coarseness.

    • Date: day, week, month, year, …


Level Hierarchies

  • Level hierarchies: each dimension is organized into different levels of coarseness

  • The user can navigate from one level to another, creating new cubes each time

    Coarseness (αδρομέρεια in the original Greek): the opposite of detail


OLAP Cubes & Dimension Hierarchies

Dimensions: Product, Region, Date

Dimension hierarchies:

[Figure: a Sales volume cube with axes Region, Product and Month, next to the level names of the dimension hierarchies: Industry, Category, Product; Country, Region, City, Store; Year, Quarter, Month, Week, Day.]


Lattice

  • A lattice is a partially ordered set (poset) in which every pair of elements has a unique supremum and a unique infimum

  • The hierarchy of levels is formally defined as a lattice (L,<)

    • such that L = {L1, ..., Ln, ALL} is a finite set of levels and

    • < is a partial order defined among the levels of L

    • such that L1 < Li < ALL ∀ 1 < i ≤ n.

  • The upper bound is always the level ALL,

    • so that we can group all values into the single value ‘all’.

  • The lower bound of the lattice is the most detailed level of the dimension.
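To make the lattice of levels concrete, here is a small sketch over a hypothetical Date dimension (Day below both Week and Month, Month below Quarter below Year, everything below ALL). The level names, the `PARENTS` map and the helper names are all assumptions for illustration.

```python
# Hypothetical Date dimension: each level maps to the levels directly above it.
PARENTS = {
    "Day": {"Week", "Month"},
    "Week": {"ALL"},
    "Month": {"Quarter"},
    "Quarter": {"Year"},
    "Year": {"ALL"},
    "ALL": set(),
}


def up_set(level):
    """The level itself plus every level above it."""
    seen, stack = {level}, [level]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def leq(a, b):
    """True if level a equals b or lies below it in the partial order."""
    return b in up_set(a)


def supremum(a, b):
    """Least level that lies above (or equals) both a and b."""
    common = up_set(a) & up_set(b)
    least = [c for c in common if all(leq(c, other) for other in common)]
    return least[0]  # exists and is unique because the levels form a lattice
```

Here Week and Month are incomparable, so their supremum is ALL, while the supremum of Month and Year is Year.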


Distances in the same level of Hierarchy

  • Let D be a dimension,

  • L1 < Li < ALL its levels of hierarchy, and

  • x and y two specific values such that x, y ∈ Li

[Figure: the levels of the hierarchy, from bottom to top: L1, L2, ALL.]


Distances in the same level of Hierarchy

  • Explicit

  • Minkowski

  • Set Based

  • Highway

  • With respect to the detailed level

  • Attribute Based


Distances in the same level of Hierarchy

  • Explicit assignment

    • n² distances for the n values of dom(Li)

  • Minkowski family

    • reduces to the Manhattan distance: |x-y|

  • Set based family

    • reduces to {0, 1}: 0 if x = y and 1 otherwise


Distances in the same level of Hierarchy

  • Highway distance

    • Let the values of level Li form a set of k clusters, where each cluster has a representative r

    • dist(x, y) = dist(x, rx) + dist(rx, ry) + dist(y, ry), where rx and ry are the representatives of the clusters of x and y

    • Specify

      • k² distances: dist(rx, ry) and

      • k distances: dist(x, rx)
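The highway distance can be sketched with two small lookup tables. The cities, cluster assignment and numbers below are hypothetical values invented for illustration, and the shortcut returning 0 for x = y is an added assumption (the three-term formula alone would give 2 · dist(x, rx)).

```python
# Hypothetical data: each value's cluster representative and its distance to it.
REPRESENTATIVE = {"Athens": "Athens", "Patras": "Athens",
                  "Rome": "Rome", "Milan": "Rome"}
TO_REPRESENTATIVE = {"Athens": 0.0, "Patras": 210.0, "Rome": 0.0, "Milan": 570.0}
# Distances between representatives, stored once per unordered pair.
BETWEEN_REPRESENTATIVES = {frozenset(("Athens", "Rome")): 1050.0}


def representative_distance(rx, ry):
    """dist(rx, ry); zero when both values share a representative."""
    return 0.0 if rx == ry else BETWEEN_REPRESENTATIVES[frozenset((rx, ry))]


def highway_distance(x, y):
    """dist(x, rx) + dist(rx, ry) + dist(y, ry), with 0 for identical values."""
    if x == y:
        return 0.0  # assumption: keep d(x, x) = 0
    rx, ry = REPRESENTATIVE[x], REPRESENTATIVE[y]
    return (TO_REPRESENTATIVE[x]
            + representative_distance(rx, ry)
            + TO_REPRESENTATIVE[y])
```

Only the distances between representatives and from each value to its own representative are stored, instead of one distance per pair of values.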


Distances in the same level of Hierarchy

  • With respect to the detailed level

    • f is a function that picks one of the descendants

  • Attribute based

    • ∀ level L ∃ attributes:

    • ∀ v ∈ dom(L): v has attribute values [v1 … vn]

    • Distance can be defined with respect to the attributes


Distances in different levels of Hierarchy

  • Explicit

  • dist1 + dist2

  • dist3 + dist4

  • With respect to the detailed level

  • With respect to their least common ancestor

  • Highway

  • Attribute Based


Distances in different levels of Hierarchy

  • Let D be a dimension,

  • L1 < Li < ALL its levels of hierarchy,

  • x and y two specific values such that x ∈ Lx and y ∈ Ly,

  • Lx < Ly,

  • x_y the ancestor of x in level Ly, and

  • y_x a descendant of y in level Lx

[Figure: x and y_x sit at level Lx, y and x_y at level Ly; dist1 joins x to x_y, dist2 joins x_y to y, dist3 joins y to y_x, and dist4 joins y_x to x.]



Distances in different levels of Hierarchy

  • Explicit assignment

    • define dist_Lx,Ly(x, y) ∀ x ∈ Lx, y ∈ Ly

  • dist1 + dist2

    • where dist2 = dist(x_y, y), with x_y the ancestor of x in level Ly, is a distance of two values from the same level of hierarchy

    • special case: if y is an ancestor of x, then dist2 = 0
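A minimal sketch of the dist1 + dist2 combination is shown below. How dist1 (the distance from x to its ancestor) is measured is not stated in the transcript, so it is passed in as a function; the city/country data, the unit cost per climb and the 0/1 same-level distance are hypothetical choices for this example.

```python
def cross_level_distance(x, y, ancestor_of, vertical_dist, same_level_dist):
    """dist1 + dist2: climb from x to its ancestor at y's level, then compare there."""
    x_y = ancestor_of(x)                 # ancestor of x in the level of y
    dist1 = vertical_dist(x, x_y)        # x to its ancestor
    dist2 = same_level_dist(x_y, y)      # two values of the same level
    return dist1 + dist2


# Hypothetical example: cities (level Lx) versus countries (level Ly).
CITY_TO_COUNTRY = {"Athens": "Greece", "Rome": "Italy"}
one_hop = lambda value, ancestor: 1.0            # assumed cost of one climb
discrete = lambda a, b: 0.0 if a == b else 1.0   # assumed same-level distance
```

With these choices, Athens versus Greece gives 1.0 (dist2 = 0 because Greece is the ancestor of Athens) and Athens versus Italy gives 2.0.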



Distances in different levels of Hierarchy

  • dist3 + dist4

    • where dist4 = dist(x, y_x), with y_x a descendant of y in level Lx, is a distance of two values from the same level of hierarchy

    • special case: if y is an ancestor of x, then dist4 = 0


Distances in different levels of Hierarchy

  • With respect to the detailed level

    • Let x1 and y1 be descendants of x and y, respectively, at the detailed level

    • where dist(x1, y1) is a distance of two values from the same level of hierarchy


Distances in different levels of Hierarchy

  • With respect to their common ancestor

    • Let Lz be the level of hierarchy where x and y have their first common ancestor

    • count the number of “hops” needed to reach the first common ancestor

    • normalize according to the height of the level
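The exact formula for this distance is not in the transcript. One plausible reading, sketched below, counts the hops from x and from y up to their first common ancestor and divides by twice the height of the hierarchy so that the result lies in [0, 1]. The `PARENT` map, the place names and this particular normalization are assumptions.

```python
# Hypothetical hierarchy: each value maps to its parent one level up.
PARENT = {
    "Athens": "Greece", "Ioannina": "Greece", "Rome": "Italy",
    "Greece": "Europe", "Italy": "Europe",
    "Europe": "ALL",
}


def path_to_root(value):
    """The value followed by all of its ancestors, up to ALL."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def hop_distance(x, y):
    """Hops from x plus hops from y up to their first common ancestor."""
    hops_from_x = {v: i for i, v in enumerate(path_to_root(x))}
    for hops_from_y, v in enumerate(path_to_root(y)):
        if v in hops_from_x:
            return hops_from_x[v] + hops_from_y
    raise ValueError("values do not share an ancestor")


def normalized_hop_distance(x, y, height):
    """Hop distance scaled by the longest possible path (2 * height)."""
    return hop_distance(x, y) / (2 * height)
```

The same function works for values of different levels: Athens versus Greece needs one hop, since Greece is itself the first common ancestor.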


Distances in different levels of Hierarchy

  • Highway distance

    • Let every level Li be clustered into ki clusters, where every cluster has its own representative r

  • Attribute Based

    • ∀ level L ∃ attributes:

    • ∀ v ∈ dom(L): v has attribute values [v1 … vn]

    • Distance can be defined with respect to the attributes


Types of Levels

  • Nominal = ≠

    • values hold the distinctness property

    • values can be explicitly distinguished

  • Ordinal < >

    • values hold the distinctness property & the order property

    • values abide by an order

  • Interval + -

    • values hold the distinctness, order & the addition property

    • a unit of measurement exists

    • there is meaning of the difference between two values

