Distance Functions on Hierarchies

Eftychia Baikousi

Outline
  • Definition of metric & similarity
  • Various Distance Functions
    • Minkowski
    • Set based
    • Edit distance
  • Basic concept of OLAP
    • Lattice
    • Distance in same level of hierarchy
    • Distance in different level of hierarchy
Definition of metric
  • A distance function on a given set M is a function d: M × M → ℝ that satisfies the following conditions:
    • d(x,y) ≥ 0 and
    • d(x,y) = 0 iff x = y
      • Distance is positive between two different points and is zero precisely from a point to itself
    • It is symmetric: d(x,y) = d(y,x)
      • The distance between x and y is the same in either direction
    • It satisfies the triangle inequality: d(x,z) ≤ d(x,y) + d(y,z)
      • The direct distance between two points is never longer than a path through a third point
  • A function that satisfies all of these conditions is a metric
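
The conditions above can be checked mechanically on a small finite set. The following is a minimal sketch, not part of the original slides; the helper name `is_metric` and the sample points are assumptions made for illustration.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Check the metric conditions for d on a finite collection of points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                        # non-negativity
            return False
        if (d(x, y) == 0) != (x == y):         # zero exactly from a point to itself
            return False
        if abs(d(x, y) - d(y, x)) > tol:       # symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True

print(is_metric([0, 1, 3, 7], lambda a, b: abs(a - b)))    # True
print(is_metric([0, 1, 3, 7], lambda a, b: (a - b) ** 2))  # False
```

The squared difference fails because d(0,3) = 9 exceeds d(0,1) + d(1,3) = 5, so it violates the triangle inequality.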
Definition of similarity metric
  • Let s(x,y) be the similarity between two points x and y. Then the following properties hold:
    • s(x,y) = 1 only if x = y (0 ≤ s ≤ 1)
    • s(x,y) = s(y,x) for all x and y (symmetry)
    • The triangle inequality does not necessarily hold
Edit Distance - Levenshtein distance
  • The edit distance between two strings

x = x_1 … x_n, y = y_1 … y_m

is defined as the minimum number of atomic edit operations needed to transform x into y

    • Insert: ins(x,i,c) = x_1 x_2 … x_i c x_{i+1} … x_n
    • Delete: del(x,i) = x_1 x_2 … x_{i-1} x_{i+1} … x_n
    • Replace: rep(x,i,c) = x_1 x_2 … x_{i-1} c x_{i+1} … x_n
  • Every edit operation o is assigned the cost c(o) = 1
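
A minimal sketch of how this distance is commonly computed with dynamic programming, assuming unit cost for all three operations as stated above. The function name `levenshtein` is chosen for illustration.

```python
def levenshtein(x: str, y: str) -> int:
    """Minimum number of unit-cost insert/delete/replace operations turning x into y."""
    prev = list(range(len(y) + 1))            # distances from "" to each prefix of y
    for i, cx in enumerate(x, start=1):
        curr = [i]                            # distance from x[:i] to ""
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,                  # delete cx
                curr[j - 1] + 1,              # insert cy
                prev[j - 1] + (cx != cy),     # replace cx by cy (free if equal)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the length of y rather than with the product of both lengths.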
Edit distances
  • Needleman-Wunsch distance or Sellers algorithm
    • Insert
      • a character: ins(x,i,c) = x_1 x_2 … x_i c x_{i+1} … x_n
        • with cost(o) = 1
      • a gap: ins_g(x,i,g) = x_1 x_2 … x_i g x_{i+1} … x_n
        • with cost(o) = g
    • Delete
      • a character: del(x,i) = x_1 x_2 … x_{i-1} x_{i+1} … x_n
        • with cost(o) = 1
      • a gap: del_g(x,i) = x_1 x_2 … x_{i-1} x_{i+1} … x_n
        • with cost(o) = g
    • Replace
      • a character: rep(x,i,c) = x_1 x_2 … x_{i-1} c x_{i+1} … x_n
        • with cost(o) = 1
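
The sketch below illustrates the idea of a separate gap cost. It is a simplification and an assumption of this example, not the slide's exact definition: every insertion or deletion is charged the gap cost g, and every replacement costs 1. The name `needleman_wunsch_distance` and the default g = 2 are illustrative.

```python
def needleman_wunsch_distance(x: str, y: str, gap_cost: float = 2.0) -> float:
    """Edit distance where replacements cost 1 and each gap (insert/delete) costs gap_cost."""
    prev = [j * gap_cost for j in range(len(y) + 1)]   # aligning "" with prefixes of y
    for i, cx in enumerate(x, start=1):
        curr = [i * gap_cost]                          # aligning x[:i] with ""
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + gap_cost,                      # gap in y (delete cx)
                curr[j - 1] + gap_cost,                  # gap in x (insert cy)
                prev[j - 1] + (0.0 if cx == cy else 1.0) # match or replace
            ))
        prev = curr
    return prev[-1]

print(needleman_wunsch_distance("abc", "abd"))                # 1.0
print(needleman_wunsch_distance("abc", "ab"))                 # 2.0
print(needleman_wunsch_distance("kitten", "sitting", 1.0))    # 3.0
```

With gap_cost set to 1 the function coincides with the unit-cost edit distance.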
Edit distances
  • Jaro distance
  • Let s and t be two strings and
    • s' = the characters in s that are common with t
    • t' = the characters in t that are common with s
    • T_{s,t} = the number of transpositions of characters in s' relative to t'
  • Jaro(s,t) = 1/3 · ( |s'|/|s| + |t'|/|t| + (|s'| − T_{s,t}) / |s'| )
Edit distances
  • Jaro distance example
  • Let s = MARTHA and t = MARHTA
    • |s'| = 6
    • |t'| = 6
    • T_{s,t} = 2/2 = 1, since the mismatched characters are T/H and H/T
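
The example can be reproduced with a short sketch, assuming the standard Jaro formula 1/3 · (|s'|/|s| + |t'|/|t| + (|s'| − T)/|s'|) and the usual matching window of max(|s|,|t|)/2 − 1 positions. The window rule is not stated on the slide and is an assumption here.

```python
def jaro(s: str, t: str) -> float:
    """Jaro similarity of s and t (1.0 means identical, 0.0 means nothing in common)."""
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)   # assumed standard match window
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    for i, c in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == c:
                s_hit[i] = t_hit[j] = True          # c is common to both strings
                break
    s_common = [c for c, hit in zip(s, s_hit) if hit]   # s'
    t_common = [c for c, hit in zip(t, t_hit) if hit]   # t'
    m = len(s_common)
    if m == 0:
        return 0.0
    # Half the number of positions where s' and t' disagree.
    transpositions = sum(a != b for a, b in zip(s_common, t_common)) / 2
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3

print(round(jaro("MARTHA", "MARHTA"), 4))  # 0.9444
```

For MARTHA and MARHTA all six characters are common and one transposition is counted, which gives (1 + 1 + 5/6) / 3.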
Edit distances
  • Jaro-Winkler
  • JWS(s,t) = Jaro(s,t) + (prefixLength * PREFIXSCALE * (1.0 - Jaro(s,t)))
  • Where:
    • prefixLength: the length of the common prefix at the start of the two strings
    • PREFIXSCALE: a constant scaling factor that gives more favourable ratings to strings that match from the beginning, up to a set prefix length
Edit distances
  • Jaro-Winkler example
  • Let s = MARTHA, t = MARHTA and PREFIXSCALE = 0.1
    • Jaro(s,t) = 1/3 · (6/6 + 6/6 + (6 − 1)/6) ≈ 0.9444
    • prefixLength = 3
  • JWS(s,t) = Jaro(s,t) + (prefixLength * PREFIXSCALE * (1.0 - Jaro(s,t)))

= 0.9444 + (3 * 0.1 * (1 - 0.9444)) ≈ 0.9611
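
A minimal sketch of the Jaro-Winkler adjustment. The Jaro score is passed in as an argument so that the block stays self-contained; the cap of 4 on the prefix length (`max_prefix`) is an assumption standing in for the "set prefix length" mentioned above.

```python
def jaro_winkler(jaro_score: float, s: str, t: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Boost a Jaro score according to the length of the common prefix of s and t."""
    prefix_length = 0
    for a, b in zip(s, t):
        if a != b or prefix_length == max_prefix:
            break
        prefix_length += 1
    return jaro_score + prefix_length * prefix_scale * (1.0 - jaro_score)

# 17/18 is the standard Jaro score of this pair; the common prefix is "MAR".
print(round(jaro_winkler(17 / 18, "MARTHA", "MARHTA"), 4))  # 0.9611
```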

Basic OLAP Concepts
  • OLAP concerns the analysis of measurable quantities (measures)
    • sales, inventory, profit, ...
  • Dimensions: parameters that determine the context of the measures
    • date, product, location, salesperson, …
  • Cubes: combinations of dimensions that determine some measures
    • The cube defines a multidimensional space of dimensions, with the measures being points of this space
Cubes for OLAP

[Figure: a data cube with dimensions REGION (W, S, N), PRODUCT (Juice, Cola, Soap) and MONTH (Jan), showing sample cell values 10 and 13]
Basic OLAP Concepts
  • The data are considered to be stored in a multi-dimensional array, which is also called a cube or hypercube.
  • The cube is a group of data cells. Each cell is uniquely identified by the corresponding values of the dimensions of the cube.
  • The contents of a cell are called measures and represent the measured values of the real world.
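
As a rough illustration of cells addressed by dimension values, the sketch below stores a cube as a dictionary. The cell contents, the dimension order and the helper names `cell` and `slice_total` are all hypothetical.

```python
# Hypothetical cells: (region, product, month) -> sales measure.
DIMENSIONS = ("region", "product", "month")
cube = {
    ("N", "Juice", "Jan"): 10,
    ("N", "Cola", "Jan"): 13,
    ("S", "Juice", "Jan"): 7,
    ("W", "Soap", "Feb"): 4,
}

def cell(cube, **coords):
    """Return the measure of the cell identified by one value per dimension."""
    return cube.get(tuple(coords[d] for d in DIMENSIONS))

def slice_total(cube, **fixed):
    """Sum the measure over all cells that match the fixed dimension values."""
    return sum(
        measure for key, measure in cube.items()
        if all(key[DIMENSIONS.index(d)] == v for d, v in fixed.items())
    )

print(cell(cube, region="N", product="Cola", month="Jan"))  # 13
print(slice_total(cube, month="Jan"))                       # 30
print(slice_total(cube, product="Juice"))                   # 17
```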
Hierarchies of levels for OLAP
  • A dimension models all the ways in which the data can be aggregated with respect to a specific parameter of their context.
    • Date, Product, Location, Salesperson, …
  • Each dimension has an associated hierarchy of levels of aggregation of the data. This means that the dimension can be viewed at several levels of granularity.
    • Date: day, week, month, year, …
Level Hierarchies
  • Level hierarchies: each dimension is organized into different levels of granularity
  • The user can navigate from one level to another, creating new cubes each time

Coarseness (Greek: αδρομέρεια): the opposite of detail

-- the correct term is coarseness (αδρομέρεια)...
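
Navigating to a coarser level can be sketched as a roll-up that produces a new cube. The data, the ISO date strings and the function names below are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical day-level cube: (day, product) -> sales.
daily_sales = {
    ("2024-01-03", "Juice"): 4,
    ("2024-01-17", "Juice"): 6,
    ("2024-02-02", "Juice"): 5,
    ("2024-01-03", "Cola"): 13,
}

def day_to_month(day: str) -> str:
    """Map a value of level Day to its ancestor in level Month (ISO date strings)."""
    return day[:7]

def roll_up(cube, ancestor):
    """Aggregate the first dimension to a coarser level, producing a new cube."""
    result = defaultdict(int)
    for (value, *rest), measure in cube.items():
        result[(ancestor(value), *rest)] += measure
    return dict(result)

monthly_sales = roll_up(daily_sales, day_to_month)
print(monthly_sales[("2024-01", "Juice")])  # 10
```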

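Cubes & dimension hierarchies for OLAP

[Figure: a Sales volume cube with axes Region, Product and Month]

Dimensions: Product, Region, Date

Dimension hierarchies (from the most general to the most detailed level):
  • Product: Industry, Category, Product
  • Region: Country, Region, City, Store
  • Date: Year, Quarter, Month / Week, Day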

Lattice
  • A lattice is a partially ordered set (poset) in which every pair of elements has a unique supremum and a unique infimum
  • The hierarchy of levels is formally defined as a lattice (L, <)
    • such that L = (L1, ..., Ln, ALL) is a finite set of levels and
    • < is a partial order defined among the levels of L
    • such that L1 < Li < ALL for every 1 < i ≤ n
  • The upper bound is always the level ALL,
    • so that we can group all values into the single value 'all'.
  • The lower bound of the lattice is the most detailed level of the dimension.
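
A small sketch of a level hierarchy as a partial order, with the supremum of two levels computed from it. The Date hierarchy used here (Day below Week and Month, Month below Year) is a hypothetical example, as are the function names.

```python
# Hypothetical Date hierarchy: each level maps to the levels directly above it.
PARENTS = {
    "Day": ["Week", "Month"],
    "Week": ["ALL"],
    "Month": ["Year"],
    "Year": ["ALL"],
    "ALL": [],
}

def up_set(level):
    """All levels greater than or equal to `level` in the partial order."""
    seen, stack = set(), [level]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen

def precedes(a, b):
    """True if a <= b, i.e. b can be reached from a by moving upwards."""
    return b in up_set(a)

def supremum(a, b):
    """Least upper bound: the common upper bound that precedes all other common upper bounds."""
    common = up_set(a) & up_set(b)
    least = [c for c in common if all(precedes(c, other) for other in common)]
    return least[0] if len(least) == 1 else None

print(supremum("Week", "Month"))  # ALL
print(supremum("Day", "Month"))   # Month
```

Day is the lower bound (most detailed level) and ALL the upper bound of this example hierarchy.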
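Distances in the same level of Hierarchy
  • Let D be a dimension,
  • L1 < L2 < … < ALL its hierarchy of levels, and
  • x and y two specific values such that x, y ∈ Li

[Figure: the hierarchy of levels L1, L2, All, from the most detailed to the most general]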

Distances in the same level of Hierarchy
  • Explicit
  • Minkowski
  • Set Based
  • Highway
  • With respect to the detailed level
  • Attribute Based
Distances in the same level of Hierarchy
  • Explicit assignment
    • n² distances for the n values of dom(Li)
  • Minkowski family
    • reduces to the Manhattan distance: |x − y|
  • Set based family
    • reduces to {0, 1}, where dist(x, y) = 0 if x = y and 1 otherwise
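
The three families can be sketched for single values as follows. The table entries and value names are hypothetical, and the 0/1 rule for the set based family assumes that each value is treated as a singleton set.

```python
# Explicit assignment: one stored distance per ordered pair (n^2 entries for n values).
# The values "a", "b" and the distance 4.0 are hypothetical.
EXPLICIT = {
    ("a", "a"): 0.0, ("a", "b"): 4.0,
    ("b", "a"): 4.0, ("b", "b"): 0.0,
}

def explicit_distance(x, y, table=EXPLICIT):
    """Look the distance up in a user-supplied table."""
    return table[(x, y)]

def minkowski_distance(x: float, y: float, p: float = 1.0) -> float:
    """For single numeric values every Minkowski distance collapses to |x - y|."""
    return (abs(x - y) ** p) ** (1.0 / p)

def set_based_distance(x, y) -> int:
    """With each value seen as a singleton set, only two outcomes remain: 0 or 1."""
    return 0 if x == y else 1

print(explicit_distance("a", "b"))        # 4.0
print(minkowski_distance(3, 7))           # 4.0
print(minkowski_distance(3, 7, p=2))      # 4.0
print(set_based_distance("Jan", "Feb"))   # 1
```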
Distances in the same level of Hierarchy
  • Highway distance
    • Let the values of level Li form a set of k clusters, where each cluster has a representative r_k
    • dist(x, y) = dist(x, r_x) + dist(r_x, r_y) + dist(y, r_y)
    • Specify
      • k² distances: dist(r_x, r_y) and
      • k distances: dist(x, r_x)
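
A minimal sketch of the highway distance with two clusters. All values, cluster assignments and distances below are hypothetical.

```python
# Hypothetical level with four values in two clusters, A and B.
REPRESENTATIVE = {"a1": "a1", "a2": "a1", "b1": "b1", "b2": "b1"}   # value -> its r
TO_REPRESENTATIVE = {"a1": 0.0, "a2": 1.0, "b1": 0.0, "b2": 2.0}    # dist(x, r_x)
BETWEEN_REPRESENTATIVES = {                                          # dist(r_x, r_y)
    ("a1", "a1"): 0.0, ("a1", "b1"): 10.0,
    ("b1", "a1"): 10.0, ("b1", "b1"): 0.0,
}

def highway_distance(x, y):
    """dist(x, r_x) + dist(r_x, r_y) + dist(y, r_y)."""
    r_x, r_y = REPRESENTATIVE[x], REPRESENTATIVE[y]
    return (TO_REPRESENTATIVE[x]
            + BETWEEN_REPRESENTATIVES[(r_x, r_y)]
            + TO_REPRESENTATIVE[y])

print(highway_distance("a2", "b2"))  # 13.0
print(highway_distance("a2", "a2"))  # 2.0
```

Applied literally, the formula returns 2.0 for the pair (a2, a2), so an implementation that must satisfy d(x, x) = 0 would need to treat x = y as a special case.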
Distances in the same level of Hierarchy
  • With respect to the detailed level
    • f is a function that picks one of the descendants
  • Attribute based
    • For every level L there exists a set of attributes
    • Every value v ∈ dom(L) is described by its attribute values [v1 … vn]
    • Distance can be defined with respect to the attributes
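
Both ideas can be sketched briefly. The slide does not give the formulas, so the choices here are assumptions: the detailed-level distance compares one descendant of each value picked by f (here `pick`), and the attribute-based distance is a Manhattan distance over the attribute values. All data are hypothetical.

```python
# Hypothetical level Month with descendants in the detailed level Day (day of year).
DESCENDANTS = {"Jan": [1, 15, 31], "Mar": [60, 75, 90]}

def detailed_level_distance(x, y, pick=min):
    """Compare x and y through one descendant each, chosen by `pick` (the f of the slide)."""
    return abs(pick(DESCENDANTS[x]) - pick(DESCENDANTS[y]))

# Hypothetical attributes of level Product: (price, weight).
ATTRIBUTES = {"Juice": (2.0, 1.0), "Cola": (1.5, 0.5)}

def attribute_distance(x, y):
    """Manhattan distance between the attribute values of x and y (an assumed choice)."""
    return sum(abs(a - b) for a, b in zip(ATTRIBUTES[x], ATTRIBUTES[y]))

print(detailed_level_distance("Jan", "Mar"))   # 59
print(attribute_distance("Juice", "Cola"))     # 1.0
```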
Distances in different levels of Hierarchy
  • Explicit
  • dist1 + dist2
  • dist3 + dist4
  • With respect to the detailed level
  • With respect to their least common ancestor
  • Highway
  • Attribute Based
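Distances in different levels of Hierarchy

[Figure: x in level Lx and its ancestor x_y in level Ly; y in level Ly and its descendant y_x in level Lx; the connecting paths are labelled dist1, dist2, dist3 and dist4]

  • Let D be a dimension,
  • L1 < L2 < … < ALL its hierarchy of levels, and
  • x and y two specific values such that x ∈ Lx, y ∈ Ly
  • Lx < Ly
  • x_y: the ancestor of x in level Ly
  • y_x: a descendant of y in level Lx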
Distances in different levels of Hierarchy
  • Explicit assignment
    • define dist_{Lx,Ly}(x, y) for all x ∈ Lx, y ∈ Ly
  • dist1 + dist2
    • where dist2 is a distance between two values from the same level of hierarchy
    • special case: if y is an ancestor of x, then dist2 = 0
Distances in different levels of Hierarchy
  • dist3 + dist4
    • where dist4 is a distance between two values from the same level of hierarchy
    • special case: if y is an ancestor of x, then dist4 = 0
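
The two routes can be sketched for a day value x and a month value y. Several choices here are assumptions of this example, not taken from the slides: the cost of moving between a value and its ancestor or descendant (dist1, dist3) is a fixed `LEVEL_STEP_COST`, the same-level distances are plain differences, and the descendant of y is picked with `min`.

```python
# Hypothetical hierarchy: level Day (Lx, day of year) below level Month (Ly).
MONTH_OF_DAY = {d: "Jan" for d in range(1, 32)}
MONTH_OF_DAY.update({d: "Feb" for d in range(32, 60)})
MONTH_OF_DAY.update({d: "Mar" for d in range(60, 91)})
MONTH_INDEX = {"Jan": 1, "Feb": 2, "Mar": 3}
LEVEL_STEP_COST = 1.0   # assumed value for dist1 and dist3

def via_ancestor(x_day, y_month):
    """dist1 + dist2: lift x to its ancestor x_y in Ly, then compare within Ly."""
    x_y = MONTH_OF_DAY[x_day]
    dist2 = abs(MONTH_INDEX[x_y] - MONTH_INDEX[y_month])   # 0 if y is the ancestor of x
    return LEVEL_STEP_COST + dist2

def via_descendant(x_day, y_month, pick=min):
    """dist3 + dist4: lower y to a descendant y_x in Lx, then compare within Lx."""
    if MONTH_OF_DAY[x_day] == y_month:
        y_x = x_day    # y is an ancestor of x, so x can serve as y_x and dist4 = 0
    else:
        y_x = pick(d for d, m in MONTH_OF_DAY.items() if m == y_month)
    dist4 = abs(x_day - y_x)
    return LEVEL_STEP_COST + dist4

print(via_ancestor(20, "Mar"))     # 3.0
print(via_ancestor(20, "Jan"))     # 1.0
print(via_descendant(20, "Mar"))   # 41.0
print(via_descendant(20, "Jan"))   # 1.0
```

The two routes measure in different units (months versus days), so in practice the same-level distances would probably need to be normalized before the results are compared.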
Distances in different levels of Hierarchy
  • With respect to the detailed level
    • Let x1 and y1 be descendants of x and y, respectively, in the detailed level
    • where dist(x1, y1) is a distance between two values from the same level of hierarchy
Distances in different levels of Hierarchy
  • With respect to their common ancestor
    • Let Lz be the level of hierarchy where x and y have their first common ancestor
    • number of "hops" needed to reach the first common ancestor
    • normalizing according to the height of the level
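
A sketch of the hop count to the first common ancestor. The hierarchy is hypothetical, and the normalization is only one possible reading of the slide: the hop count is divided by the number of hops both values would need to reach the root.

```python
# Hypothetical value hierarchy: child -> parent, rooted at "all".
PARENT = {
    "Athens": "Greece", "Ioannina": "Greece", "Rome": "Italy",
    "Greece": "Europe", "Italy": "Europe", "Europe": "all",
}

def path_to_root(value):
    """The value followed by all of its ancestors, up to the root."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def hop_distance(x, y, normalize=True):
    """Hops from x and from y up to their first common ancestor."""
    path_x, path_y = path_to_root(x), path_to_root(y)
    hops_from_y = {v: hops for hops, v in enumerate(path_y)}
    for hops_x, v in enumerate(path_x):
        if v in hops_from_y:                  # first common ancestor found
            hops = hops_x + hops_from_y[v]
            break
    else:
        raise ValueError("values do not share an ancestor")
    if not normalize:
        return hops
    max_hops = (len(path_x) - 1) + (len(path_y) - 1)   # assumed normalization
    return hops / max_hops if max_hops else 0.0

print(hop_distance("Athens", "Ioannina", normalize=False))  # 2
print(hop_distance("Athens", "Italy", normalize=False))     # 3
print(hop_distance("Athens", "Italy"))                      # 0.6
```

The second call compares values from different levels (a city and a country), whose first common ancestor is Europe.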
Distances in different levels of Hierarchy
  • Highway distance
    • Let every level Li be clustered into ki clusters, where every cluster has its own representative r_ki
  • Attribute Based
    • For every level L there exists a set of attributes
    • Every value v ∈ dom(L) is described by its attribute values [v1 … vn]
    • Distance can be defined with respect to the attributes
Types of Levels
  • Nominal = ≠
    • values hold the distinctness property
    • values can be explicitly distinguished
  • Ordinal < >
    • values hold the distinctness property & the order property
    • values abide by an order
  • Interval + -
    • values hold the distinctness, order & the addition property
    • a unit of measurement exists
    • there is meaning of the difference between two values