Tutorial 1

General Introduction to SDA

Yin-Jing Tien (田銀錦)

Institute of Statistical Science

Academia Sinica

June 13, 2014


Symbolic Data Analysis (SDA)

(Diday 1987)

Texts:

Billard, L. and Diday, E. (2006): Symbolic Data Analysis: Conceptual Statistics and Data Mining. Wiley.

Diday, E. and Noirhomme-Fraiture, M. (2008): Symbolic Data Analysis and the SODAS Software. John Wiley & Sons Ltd., Chichester, England.


Symbolic Data

(Diday 1987)

  • Classical data: individuals, each described by single values

  • Single player

  • age = 25, eye color = blue

  • Symbolic data: symbolic units (concepts, i.e., groups of individuals)

  • Team

  • interval: age range = [20, 36]

  • multiple values: eye color = {blue, brown, black}
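The move from a classical row to a symbolic unit can be sketched by summarizing a few individuals into one concept. A minimal Python sketch, assuming a hypothetical list of player records (the names `players` and `to_symbolic` are illustrative only): age becomes an interval (its minimum and maximum) and eye color becomes the set of observed values.

```python
# Hypothetical classical data: one record per player (individual).
players = [
    {"team": "A", "age": 20, "eye_color": "blue"},
    {"team": "A", "age": 36, "eye_color": "brown"},
    {"team": "A", "age": 25, "eye_color": "black"},
    {"team": "A", "age": 31, "eye_color": "blue"},
]


def to_symbolic(records):
    """Summarize the individuals of one concept into a symbolic description."""
    ages = [r["age"] for r in records]
    return {
        # Interval-valued: the range covered by the individual ages.
        "age_range": (min(ages), max(ages)),
        # Multi-valued: the set of distinct eye colors observed.
        "eye_color": {r["eye_color"] for r in records},
    }


team_a = to_symbolic([r for r in players if r["team"] == "A"])
print(team_a["age_range"])          # (20, 36)
print(sorted(team_a["eye_color"]))  # ['black', 'blue', 'brown']
```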


Symbolic data analysis: when to use it

  • When we are interested in higher-level units (concepts: groups/classes).

  • When the initial data are already symbolic data tables.

  • When the data are big (summarizing individuals into concepts reduces the volume of data to analyze).


Symbolic data types


Symbolic data types (quantitative)

A multi-valued symbolic random variable Y takes one or more values, e.g.,

{12, 23, 20}

An interval-valued symbolic random variable Y takes values in an interval, e.g.,

[17, 25]

Modal multi-valued: each value is followed by its weight (relative frequency), e.g.,

{0.5, 3/8; 1.5, 4/8; 2, 1/8}

Modal interval-valued (histogram): each subinterval is followed by its weight, e.g.,

{[12, 40), 1/7; [40, 60), 2/7; [60, 80], 4/7}
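The weights in the two modal examples are relative frequencies, so each set sums to 1. The sketch below checks this and computes a mean for each observation. For the histogram it assumes values are spread uniformly within each subinterval, so each subinterval contributes its midpoint; that convention is an assumption of this sketch, and the function names are hypothetical.

```python
from fractions import Fraction as F

# Modal multi-valued observation {0.5, 3/8; 1.5, 4/8; 2, 1/8} as (value, weight) pairs.
modal_values = [(F(1, 2), F(3, 8)), (F(3, 2), F(4, 8)), (F(2), F(1, 8))]

# Modal interval-valued (histogram) observation {[12, 40), 1/7; [40, 60), 2/7; [60, 80], 4/7}.
histogram = [((12, 40), F(1, 7)), ((40, 60), F(2, 7)), ((60, 80), F(4, 7))]


def check_weights(pairs):
    # Weights are relative frequencies, so they must sum to 1.
    assert sum(w for _, w in pairs) == 1


def modal_mean(pairs):
    # Weighted mean of the listed values.
    return sum(v * w for v, w in pairs)


def histogram_mean(bins):
    # Assumes a uniform spread within each subinterval: use each midpoint.
    return sum(F(a + b, 2) * w for (a, b), w in bins)


check_weights(modal_values)
check_weights(histogram)
print(modal_mean(modal_values))  # 19/16
print(histogram_mean(histogram))  # 58
```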


Symbolic data types (qualitative)

A multi-valued symbolic random variable Y takes one or more categories

E.g., bird colors, Y = color

Modal multi-valued: each category is followed by its weight (relative frequency)

{single, 3/8; married, 5/8}
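A modal multi-valued observation such as {single, 3/8; married, 5/8} can arise as the relative frequencies of categories among the individuals of a concept. A minimal sketch, with a hypothetical list of eight individual marital statuses chosen to reproduce those weights (`modal_categorical` is an illustrative name):

```python
from collections import Counter
from fractions import Fraction

# Hypothetical individual-level marital statuses for the members of one concept.
statuses = ["single", "married", "married", "single",
            "married", "married", "single", "married"]


def modal_categorical(values):
    """Return (category, relative frequency) pairs in first-seen order."""
    counts = Counter(values)
    n = len(values)
    return [(cat, Fraction(cnt, n)) for cat, cnt in counts.items()]


print(modal_categorical(statuses))
# [('single', Fraction(3, 8)), ('married', Fraction(5, 8))]
```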


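Basic Descriptive Statistics: Interval-Valued Data

Let $Z_i = (I_{1i}, I_{2i}, \ldots, I_{ki})^T$ be the interval data for the $i$th variable observed on $k$ concepts, where $I_{ci} = [a_{ci}, b_{ci}]$, $c = 1, 2, \ldots, k$.

The sample mean of $Z_i$ is

$\bar{Z}_i = \frac{1}{2k}\sum_{c=1}^{k}(a_{ci} + b_{ci})$

The sample variance of $Z_i$ is

$S_i^2 = \frac{1}{3k}\sum_{c=1}^{k}(a_{ci}^2 + a_{ci}b_{ci} + b_{ci}^2) - \frac{1}{4k^2}\left[\sum_{c=1}^{k}(a_{ci} + b_{ci})\right]^2$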
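A minimal Python sketch of the interval sample mean $\bar{Z}_i = \frac{1}{2k}\sum_{c}(a_{ci} + b_{ci})$ and sample variance $S_i^2 = \frac{1}{3k}\sum_{c}(a_{ci}^2 + a_{ci}b_{ci} + b_{ci}^2) - \bar{Z}_i^2$. It assumes integer endpoints so that `fractions.Fraction` gives exact results; the function names and the three example intervals are hypothetical.

```python
from fractions import Fraction


def interval_mean(intervals):
    """Sample mean of interval data [(a_c, b_c), ...] over k concepts."""
    k = len(intervals)
    return Fraction(sum(a + b for a, b in intervals), 2 * k)


def interval_variance(intervals):
    """(1/3k) sum(a^2 + ab + b^2) - (1/4k^2) [sum(a + b)]^2, integer endpoints."""
    k = len(intervals)
    first = Fraction(sum(a * a + a * b + b * b for a, b in intervals), 3 * k)
    second = Fraction(sum(a + b for a, b in intervals) ** 2, 4 * k * k)
    return first - second


# Hypothetical interval data for one variable observed on k = 3 concepts.
z = [(1, 3), (2, 6), (4, 8)]
print(interval_mean(z))      # 4
print(interval_variance(z))  # 11/3
```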


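Basic Descriptive Statistics: Interval-Valued Data

The sample variance $S_i^2$ of $Z_i$, with sample mean $\bar{Z}_i$, can be rewritten as

Total Variation = Within Variation + Between Variation

$S_i^2 = \frac{1}{k}\sum_{c=1}^{k}\frac{(b_{ci} - a_{ci})^2}{12} + \frac{1}{k}\sum_{c=1}^{k}\left(\frac{a_{ci} + b_{ci}}{2} - \bar{Z}_i\right)^2$

Within Variation: $\frac{1}{k}\sum_{c=1}^{k}\frac{(b_{ci} - a_{ci})^2}{12}$, the average variance of a uniform distribution on each interval

Between Variation: $\frac{1}{k}\sum_{c=1}^{k}\left(\frac{a_{ci} + b_{ci}}{2} - \bar{Z}_i\right)^2$, the variance of the interval midpoints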
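The decomposition can be checked numerically on a small example. This sketch (hypothetical data and function name) computes the within part as the average of $(b_c - a_c)^2/12$, the between part as the variance of the midpoints, and compares their sum with the direct variance formula.

```python
from fractions import Fraction


def decompose(intervals):
    """Split the interval sample variance into within and between parts."""
    k = len(intervals)
    mids = [Fraction(a + b, 2) for a, b in intervals]
    mean = sum(mids) / k
    # Within: average variance of a uniform distribution on each interval.
    within = sum(Fraction((b - a) ** 2, 12) for a, b in intervals) / k
    # Between: variance of the interval midpoints around the overall mean.
    between = sum((m - mean) ** 2 for m in mids) / k
    # Total via the direct formula (1/3k) sum(a^2 + ab + b^2) - mean^2.
    total = (Fraction(sum(a * a + a * b + b * b for a, b in intervals), 3 * k)
             - mean ** 2)
    return within, between, total


within, between, total = decompose([(1, 3), (2, 6), (4, 8)])
print(within, between, total)  # 1 8/3 11/3
```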


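Similarity between Variables (interval-valued data) (Billard and Diday, 2006)

The empirical covariance function between $Z_i$ and $Z_j$ is

The empirical correlation coefficient between $Z_i$ and $Z_j$ is

$r(Z_i, Z_j) = \dfrac{\mathrm{Cov}(Z_i, Z_j)}{S_i S_j}$

where $S_i$ and $S_j$ are the sample standard deviations of $Z_i$ and $Z_j$ (the square roots of their sample variances).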
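The covariance formula on this slide did not survive extraction, so the sketch below rests on an assumption: it uses one covariance form for interval data that treats both endpoints of each interval symmetrically and reduces exactly to the sample variance when the two variables coincide. The slide may use a different form (for example, one based only on interval midpoints). The correlation then follows as Cov / (S_i S_j). Function names and data are hypothetical.

```python
import math
from fractions import Fraction


def interval_mean(intervals):
    return Fraction(sum(a + b for a, b in intervals), 2 * len(intervals))


def interval_cov(x, y):
    """Assumed covariance form; equals the sample variance when x == y."""
    k = len(x)
    mx, my = interval_mean(x), interval_mean(y)
    total = Fraction(0)
    for (a1, b1), (a2, b2) in zip(x, y):
        u1, v1 = a1 - mx, b1 - mx  # endpoint deviations for x
        u2, v2 = a2 - my, b2 - my  # endpoint deviations for y
        total += 2 * u1 * u2 + u1 * v2 + v1 * u2 + 2 * v1 * v2
    return total / (6 * k)


def interval_corr(x, y):
    """Correlation: covariance divided by the product of standard deviations."""
    return float(interval_cov(x, y)) / math.sqrt(
        float(interval_cov(x, x)) * float(interval_cov(y, y)))


# Hypothetical interval data for two variables on k = 3 concepts.
z1 = [(1, 3), (2, 6), (4, 8)]
z2 = [(10, 12), (11, 15), (20, 30)]
print(interval_cov(z1, z1))  # 11/3, the sample variance of z1
print(round(interval_corr(z1, z2), 3))
```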


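Distance between concepts

Definition 7.6: The Cartesian join A ⊕ B between two sets A and B is their componentwise union,

$A \oplus B = (A_1 \oplus B_1, \ldots, A_p \oplus B_p)$, where $A_j \oplus B_j = A_j \cup B_j$ for multi-valued components and, for interval-valued components $A_j = [a_{1j}, b_{1j}]$ and $B_j = [a_{2j}, b_{2j}]$, $A_j \oplus B_j = [\min(a_{1j}, a_{2j}), \max(b_{1j}, b_{2j})]$

Definition 7.7: The Cartesian meet A ⊗ B between two sets A and B is their componentwise intersection,

$A \otimes B = (A_1 \otimes B_1, \ldots, A_p \otimes B_p)$, where $A_j \otimes B_j = A_j \cap B_j$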
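A minimal sketch of the Cartesian join and meet for descriptions mixing interval and set components. For intervals it takes the join as the smallest interval covering both and the meet as their overlap (returned as `None` when empty); that representation, the function names, and the example descriptions are assumptions of this sketch.

```python
def component_join(x, y):
    """Join of one component: set union, or the smallest interval covering both."""
    if isinstance(x, frozenset):
        return x | y
    return (min(x[0], y[0]), max(x[1], y[1]))


def component_meet(x, y):
    """Meet of one component: set intersection, or interval overlap (None if empty)."""
    if isinstance(x, frozenset):
        return x & y
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else None


def cartesian_join(A, B):
    return tuple(component_join(x, y) for x, y in zip(A, B))


def cartesian_meet(A, B):
    return tuple(component_meet(x, y) for x, y in zip(A, B))


# Hypothetical descriptions: (age interval, set of eye colors).
A = ((20, 36), frozenset({"blue", "brown"}))
B = ((30, 40), frozenset({"brown", "black"}))
print(cartesian_join(A, B))
print(cartesian_meet(A, B))
```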


Distance between concepts


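Distance between concepts (Multi-valued)

The Gowda-Diday dissimilarity measure (Gowda and Diday, 1991):

$D(\omega_1, \omega_2) = \sum_{j=1}^{p} D_j(\omega_1, \omega_2)$, with $D_j = D_{1j} + D_{2j}$

$D_{1j}(\omega_1, \omega_2) = \dfrac{|m_{1j} - m_{2j}|}{m_j^*}$ (relative sizes)

$D_{2j}(\omega_1, \omega_2) = \dfrac{m_{1j} + m_{2j} - 2m_j^{**}}{m_j^*}$ (relative content)

where $m_{1j}$ and $m_{2j}$ are the numbers of values in $Y_j(\omega_1)$ and $Y_j(\omega_2)$, $m_j^*$ is the number of values in their Cartesian join $Y_j(\omega_1) \oplus Y_j(\omega_2)$, and $m_j^{**}$ is the number of values in their Cartesian meet $Y_j(\omega_1) \otimes Y_j(\omega_2)$.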


Distance between concepts (Multi-valued)

Example: Color and Habitat of Birds (Table 7.2)

Y1 = Color, Y2 = Habitat, p = 2

For Y1: D11(ω1, ω2) = |2 - 1|/2 = 1/2

D21(ω1, ω2) = (2 + 1 - 2×1)/2 = 1/2

For Y2: D12(ω1, ω2) = |2 - 1|/2 = 1/2

D22(ω1, ω2) = (2 + 1 - 2×1)/2 = 1/2

The Gowda-Diday dissimilarity is

D(ω1, ω2) = (1/2 + 1/2) + (1/2 + 1/2) = 2

Normalized to adjust for scale, dividing the contributions of Y1 and Y2 by 3 and 2, respectively:

D(ω1, ω2) = (1/2 + 1/2)/3 + (1/2 + 1/2)/2 = 5/6
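The example can be reproduced in a few lines. The bird descriptions below are hypothetical stand-ins for Table 7.2 (which is not reproduced here), chosen only so that each variable has the same set sizes as in the example: two values for ω1 and one for ω2, with ω2's value contained in ω1's set. The optional `domain_sizes` argument (a name introduced for this sketch) holds the scale divisors 3 and 2.

```python
from fractions import Fraction


def gowda_diday_multi(A, B, domain_sizes=None):
    """Gowda-Diday dissimilarity between two lists of value sets."""
    total = Fraction(0)
    for j, (x, y) in enumerate(zip(A, B)):
        m1, m2 = len(x), len(y)
        m_join = len(x | y)   # size of the Cartesian join (union)
        m_meet = len(x & y)   # size of the Cartesian meet (intersection)
        d1 = Fraction(abs(m1 - m2), m_join)              # relative sizes
        d2 = Fraction(m1 + m2 - 2 * m_meet, m_join)      # relative content
        dj = d1 + d2
        if domain_sizes is not None:
            dj /= domain_sizes[j]  # normalize for the scale of Y_j
        total += dj
    return total


# Hypothetical descriptions (color set, habitat set) with the example's set sizes.
w1 = [{"red", "black"}, {"forest", "coast"}]
w2 = [{"red"}, {"forest"}]
print(gowda_diday_multi(w1, w2))          # 2
print(gowda_diday_multi(w1, w2, [3, 2]))  # 5/6
```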


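Distance between concepts (Multi-valued)

The Ichino-Yaguchi dissimilarity measure (Ichino and Yaguchi, 1994), with 0 ≤ γ ≤ 0.5:

$\phi_j(\omega_1, \omega_2) = |Y_j(\omega_1) \oplus Y_j(\omega_2)| - |Y_j(\omega_1) \otimes Y_j(\omega_2)| + \gamma\left(2|Y_j(\omega_1) \otimes Y_j(\omega_2)| - |Y_j(\omega_1)| - |Y_j(\omega_2)|\right)$

where |·| is the number of values in a set.

For Y1: ϕ1(ω1, ω2) = 2 - 1 + γ(2×1 - 2 - 1) = 1 - γ

For Y2: ϕ2(ω1, ω2) = 2 - 1 + γ(2×1 - 2 - 1) = 1 - γ

Taking γ = 0.5 gives ϕ1 = ϕ2 = 0.5.

Unweighted Minkowski distance:

Dq(ω1, ω2) = (0.5^q + 0.5^q)^(1/q)

Weighted Minkowski distance (weights 1/3 for Y1 and 1/2 for Y2):

Dq(ω1, ω2) = ((0.5/3)^q + (0.5/2)^q)^(1/q)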
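The multi-valued Ichino-Yaguchi example can be checked with a short sketch. The category sets are hypothetical, chosen only to match the example's set sizes (two values for ω1, one for ω2, contained in ω1's set); the weighted call divides ϕ1 by 3 and ϕ2 by 2 as in the slide's weighted Minkowski distance. The form (Σ_j [w_j ϕ_j]^q)^(1/q) is assumed for how the weights enter.

```python
def ichino_yaguchi_multi(x, y, gamma=0.5):
    """Ichino-Yaguchi phi_j for two sets of categories, with 0 <= gamma <= 0.5."""
    if not 0 <= gamma <= 0.5:
        raise ValueError("gamma must lie in [0, 0.5]")
    join, meet = len(x | y), len(x & y)
    return join - meet + gamma * (2 * meet - len(x) - len(y))


def minkowski(phis, q, weights=None):
    """Generalized Minkowski distance of order q >= 1 over per-variable phis."""
    if weights is None:
        weights = [1.0] * len(phis)
    return sum((w * p) ** q for w, p in zip(weights, phis)) ** (1.0 / q)


# Hypothetical descriptions with the same set sizes as the bird example.
w1 = [{"red", "black"}, {"forest", "coast"}]
w2 = [{"red"}, {"forest"}]
phis = [ichino_yaguchi_multi(x, y) for x, y in zip(w1, w2)]
print(phis)                 # [0.5, 0.5]
print(minkowski(phis, q=1))  # 1.0
print(minkowski(phis, q=1, weights=[1 / 3, 1 / 2]))
```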


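Distance between concepts (Interval-valued)

Let $Z_i = (I_{1i}, I_{2i}, \ldots, I_{ki})^T$ be the interval data for the $i$th variable with $k$ concepts, where $I_{ci} = [a_{ci}, b_{ci}]$, $c = 1, 2, \ldots, k$.

The Gowda-Diday dissimilarity measure (Gowda and Diday, 1991) for the variable $Y_j$ is

$D_j(\omega_1, \omega_2) = D_{1j}(\omega_1, \omega_2) + D_{2j}(\omega_1, \omega_2) + D_{3j}(\omega_1, \omega_2)$, and $D(\omega_1, \omega_2) = \sum_{j=1}^{p} D_j(\omega_1, \omega_2)$, where

$D_{1j} = |(b_{1j} - a_{1j}) - (b_{2j} - a_{2j})| / L_j$ (relative length)

$D_{2j} = \left[(b_{1j} - a_{1j}) + (b_{2j} - a_{2j}) - 2\ell_j\right] / L_j$ (relative content)

$D_{3j} = |a_{1j} - a_{2j}| / |\mathcal{Y}_j|$ (relative position)

with $L_j = \max(b_{1j}, b_{2j}) - \min(a_{1j}, a_{2j})$ the length of the entire distance spanned by ω1 and ω2; $\ell_j = \min(b_{1j}, b_{2j}) - \max(a_{1j}, a_{2j})$ the length of the intersection if the intervals overlap, and $\ell_j = 0$ otherwise; and $|\mathcal{Y}_j|$ the total length covered by the observed values of $Y_j$.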
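A sketch of the interval-valued Gowda-Diday measure with a hypothetical single-variable example. The caller supplies `spans`, where `spans[j]` is the total length covered by the observed values of Y_j. The guard for a zero joint length (two identical single points) is a convention added for this sketch.

```python
def gowda_diday_interval(x, y, spans):
    """Gowda-Diday dissimilarity between two lists of intervals [(a, b), ...]."""
    total = 0.0
    for (a1, b1), (a2, b2), span in zip(x, y, spans):
        joint = max(b1, b2) - min(a1, a2)              # length spanned by both
        overlap = max(0.0, min(b1, b2) - max(a1, a2))  # 0 if no overlap
        if joint > 0:
            d1 = abs((b1 - a1) - (b2 - a2)) / joint             # relative length
            d2 = ((b1 - a1) + (b2 - a2) - 2 * overlap) / joint  # relative content
        else:
            d1 = d2 = 0.0  # both intervals are the same single point
        d3 = abs(a1 - a2) / span                                # relative position
        total += d1 + d2 + d3
    return total


# Hypothetical example: one variable whose observed values span a length of 10.
print(gowda_diday_interval([(1, 3)], [(2, 6)], spans=[10]))
```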


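Distance between concepts (Interval-valued)

The Ichino-Yaguchi dissimilarity measure (Ichino and Yaguchi, 1994):

$\phi_j(\omega_1, \omega_2) = |A_j \oplus B_j| - |A_j \otimes B_j| + \gamma\left(2|A_j \otimes B_j| - |A_j| - |B_j|\right), \quad 0 \le \gamma \le 0.5$

where $A_j = [a_{1j}, b_{1j}]$ and $B_j = [a_{2j}, b_{2j}]$ are the intervals of ω1 and ω2 on $Y_j$, |·| is interval length,

$A_j \oplus B_j = [\min(a_{1j}, a_{2j}), \max(b_{1j}, b_{2j})]$

$A_j \otimes B_j = [\max(a_{1j}, a_{2j}), \min(b_{1j}, b_{2j})]$ (empty, with length 0, if there is no intersection)

The generalized Minkowski distance of order q ≥ 1 between two interval-valued observations ξ(ω1) and ξ(ω2) is

$d_q(\omega_1, \omega_2) = \left(\sum_{j=1}^{p} \left[w_j\, \phi_j(\omega_1, \omega_2)\right]^q\right)^{1/q}$

where $\phi_j(\omega_1, \omega_2)$ is the Ichino-Yaguchi distance and $w_j$ is a weight associated with variable $Y_j$.

When q = 1: city block distance

When q = 2: Euclidean distance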


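Distance between concepts (Interval-valued)

The Hausdorff distance (Chavent and Lechevallier, 2002) on variable $Y_j$:

$\phi_j(\omega_1, \omega_2) = \max\left(|a_{1j} - a_{2j}|, |b_{1j} - b_{2j}|\right)$

$d(\omega_1, \omega_2) = \sum_{j=1}^{p} \phi_j(\omega_1, \omega_2)$

The Euclidean Hausdorff distance:

$d(\omega_1, \omega_2) = \left(\sum_{j=1}^{p} \phi_j(\omega_1, \omega_2)^2\right)^{1/2}$

where $\phi_j(\omega_1, \omega_2)$ is the Hausdorff distance on $Y_j$.

The normalized Euclidean Hausdorff distance:

$d(\omega_1, \omega_2) = \left(\sum_{j=1}^{p} \left[\phi_j(\omega_1, \omega_2) / H_j\right]^2\right)^{1/2}$, where $H_j^2 = \frac{1}{2k^2}\sum_{u=1}^{k}\sum_{v=1}^{k}\phi_j(\omega_u, \omega_v)^2$

The span-normalized Euclidean Hausdorff distance:

$d(\omega_1, \omega_2) = \left(\sum_{j=1}^{p} \left[\phi_j(\omega_1, \omega_2) / |\mathcal{Y}_j|\right]^2\right)^{1/2}$, where the span $|\mathcal{Y}_j| = \max_u b_{uj} - \min_u a_{uj}$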
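The four Hausdorff-based distances can be sketched together. The data are hypothetical (three concepts, two variables). The normalization is taken as $H_j^2 = \frac{1}{2k^2}\sum_u\sum_v \phi_j(\omega_u, \omega_v)^2$ over all ordered pairs of concepts, and each span as the largest upper bound minus the smallest lower bound in the data; treat both as assumptions of this sketch.

```python
import math


def hausdorff_phi(x, y):
    """Hausdorff distance between intervals x = (a1, b1) and y = (a2, b2)."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def hausdorff_distances(data, u, v):
    """Four Hausdorff-based distances between concepts u and v.

    data[c][j] is the interval of concept c on variable Y_j.
    """
    k, p = len(data), len(data[0])
    phis = [hausdorff_phi(data[u][j], data[v][j]) for j in range(p)]
    # Normalization factors H_j over all ordered pairs of concepts.
    H = [math.sqrt(sum(hausdorff_phi(data[s][j], data[t][j]) ** 2
                       for s in range(k) for t in range(k)) / (2 * k * k))
         for j in range(p)]
    # Spans |Y_j|: largest upper bound minus smallest lower bound.
    spans = [max(row[j][1] for row in data) - min(row[j][0] for row in data)
             for j in range(p)]
    return {
        "hausdorff": sum(phis),
        "euclidean": math.sqrt(sum(f ** 2 for f in phis)),
        "normalized": math.sqrt(sum((f / h) ** 2 for f, h in zip(phis, H))),
        "span_normalized": math.sqrt(sum((f / s) ** 2 for f, s in zip(phis, spans))),
    }


# Hypothetical data: k = 3 concepts, p = 2 interval-valued variables.
data = [
    [(1, 3), (10, 12)],
    [(2, 6), (11, 15)],
    [(4, 8), (20, 30)],
]
print(hausdorff_distances(data, 0, 1))
```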


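Distance between concepts (Interval-valued)

Example: take only the first 3 observations of the veterinary data.

Gowda-Diday dissimilarity:

D(ω1, ω2) = D1(ω1, ω2) + D2(ω1, ω2), the sum of the contributions of Y1 and Y2

For Y1, the relative-position term is |120 - 158|/65, where 65 is the span of Y1.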


Distance between concepts (Interval-valued)

The Ichino-Yaguchi dissimilarity:

ϕj(ω1, ω2) = |Aj ⊕ Bj| - |Aj ⊗ Bj| + γ(2|Aj ⊗ Bj| - |Aj| - |Bj|), where Aj ⊗ Bj is empty if there is no intersection

ϕ1(ω1, ω2) = |180 - 120| - … = 58 + γ(-58)

ϕ2(ω1, ω2) = |355 - 222.2| - … = 100.8 + γ(-100.8)

The generalized Minkowski distance dq(ω1, ω2) combines ϕ1 and ϕ2:

When q = 1: city block distance

When q = 2: Euclidean distance


Distance between concepts (Interval-valued)

The Hausdorff distance:

ϕ1(ω1, ω2) = 38, ϕ2(ω1, ω2) = 99.8

d(ω1, ω2) = 38 + 99.8 = 137.8

The Euclidean Hausdorff distance:

d(ω1, ω2) = (38^2 + 99.8^2)^(1/2) ≈ 106.79

The normalized Euclidean Hausdorff distance:

… 288.78 …

The span-normalized Euclidean Hausdorff distance, with spans

|𝒴1| = 185 - 120 = 65

|𝒴2| = 355 - 117.2 = 237.8

d(ω1, ω2) = ((38/65)^2 + (99.8/237.8)^2)^(1/2) ≈ 0.72


Distance between concepts (groups) of interval-valued data


Comparison of between-concept distance measures


Interval-valued symbolic data analysis

  • Books (Bock and Diday (2000); Billard and Diday (2003, 2006); Diday and Noirhomme-Fraiture (2008))

  • PCA (Chouakria, Cazes, and Diday (2000); Palumbo and Lauro (2003); Gioia and Lauro (2006); Hamada, Minami, and Mizuta (2008))

  • Clustering analysis (Brito (2002); Souza and de Carvalho (2004); Chavent et al. (2006); Bock (2008))

  • Discriminant analysis (Lauro, Verde, and Palumbo (2000); Duarte Silva and Brito (2006))

  • MDS (Groenen et al. (2006); Minami and Mizuta (2008))

  • Regression (Billard and Diday (2000); de Carvalho et al. (2004))


Visualization Tools for Symbolic Data (Analysis)


Symbolic Data Analysis Software

• SODAS (2003)

Free; developed by two European consortia

• SYR (2008)

A more professional package from the SYROKKO company

www.syrokko.com

