Tutorial 1 - PowerPoint PPT Presentation



Tutorial 1. General Introduction to SDA. Yin-Jing Tien ( 田銀錦 ) Institute of Statistical Science Academia Sinica [email protected] June 13, 2014. Symbolic data Analysis (SDA) ( Diday 1987). Text: Billard and Diday (2006):

Presentation Transcript
slide1

Tutorial 1

General Introduction to SDA

Yin-Jing Tien (田銀錦)

Institute of Statistical Science

Academia Sinica

[email protected]

June 13, 2014

slide2

Symbolic Data Analysis (SDA)

(Diday 1987)

Text:

Billard and Diday (2006):

Symbolic Data Analysis: Conceptual Statistics and Data Mining. Wiley.

Diday, E., Noirhomme-Fraiture, M. (2008):

Symbolic Data Analysis and the SODAS Software. John Wiley & Sons Ltd., Chichester, England.

slide3

Symbolic data

(Diday 1987)

  • Classical data: each individual takes a single value per variable
  • Example: a single player
  • age = 25, eye color = blue
  • Symbolic data: symbolic units (concepts, i.e. groups of individuals)
  • Example: a team
  • interval: age range = [20, 36]
  • multiple values: eye color = {blue, brown, black}
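
The step from classical to symbolic data can be sketched in Python as an aggregation of individual records into one description per group. This is only an illustration: the `Player` class, the `to_symbolic` function, and the three players are hypothetical, chosen so that the result matches the team description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    """One classical record: a single value per variable."""
    team: str
    age: int
    eye_color: str


def to_symbolic(players):
    """Aggregate classical records into one symbolic description per team."""
    grouped = {}
    for p in players:
        ages, colors = grouped.setdefault(p.team, ([], set()))
        ages.append(p.age)
        colors.add(p.eye_color)
    return {
        # age becomes an interval, eye color becomes a set of values
        team: {"age": (min(ages), max(ages)), "eye_color": frozenset(colors)}
        for team, (ages, colors) in grouped.items()
    }


# Hypothetical players, chosen to reproduce the team description above.
players = [
    Player("team_a", 20, "blue"),
    Player("team_a", 25, "brown"),
    Player("team_a", 36, "black"),
]
team_a = to_symbolic(players)["team_a"]
print(team_a["age"], sorted(team_a["eye_color"]))
```

Taking the minimum and maximum is the simplest way to form an interval; other aggregations (quantiles, histograms) are equally possible.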
slide4

Symbolic data analysis: when?

  • When we are interested in higher-level units (concepts: groups/classes).
  • When the initial data are already symbolic data tables.
  • When the data are big.
slide6

Symbolic data types (quantitative)

Multi-valued: the symbolic random variable Y takes one or more values, e.g.

{12, 23, 20}

Interval-valued: Y takes values in an interval, e.g.

[17, 25]

Modal multi-valued: each value carries a weight (value, weight pairs), e.g.

{0.5, 3/8; 1.5, 4/8; 2, 1/8}

Modal interval-valued (histogram): each subinterval carries a weight, e.g.

{[12, 40), 1/7; [40, 60), 2/7; [60, 80], 4/7}
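
One possible way to hold these four types in Python is sketched below. The container choices and the `weights_are_valid` helper are assumptions made for illustration; the requirement that the weights sum to one is also an assumption, though both modal examples above satisfy it.

```python
from fractions import Fraction

# Multi-valued: a plain set of values.
multi_valued = {12, 23, 20}

# Interval-valued: a (lower, upper) pair.
interval_valued = (17, 25)

# Modal multi-valued: value -> weight.
modal_multi = {0.5: Fraction(3, 8), 1.5: Fraction(4, 8), 2: Fraction(1, 8)}

# Modal interval-valued (histogram): (lower, upper) -> weight.
# The open/closed status of each bin end is not encoded here.
histogram = {
    (12, 40): Fraction(1, 7),
    (40, 60): Fraction(2, 7),
    (60, 80): Fraction(4, 7),
}


def weights_are_valid(modal):
    """Check that all weights are non-negative and sum to exactly one."""
    weights = list(modal.values())
    return all(w >= 0 for w in weights) and sum(weights) == 1


print(weights_are_valid(modal_multi), weights_are_valid(histogram))
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.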

slide7

Symbolic data types (qualitative)

Multi-valued: the symbolic random variable Y takes one or more values

E.g., bird colors, Y = color

Modal multi-valued: each value carries a weight, e.g.

{single, 3/8; married, 5/8}

slide8

Basic Descriptive Statistics: Interval Value

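Let $Z_i = (I_{1i}, I_{2i}, \ldots, I_{ki})^T$ be the interval data for the $i$th variable with $k$ concepts, where $I_{ci} = [a_{ci}, b_{ci}]$, $c = 1, 2, \ldots, k$.

Sample mean of $Z_i$:

$\bar{Z}_i = \frac{1}{2k} \sum_{c=1}^{k} (a_{ci} + b_{ci})$

Sample variance of $Z_i$:

$S_i^2 = \frac{1}{3k} \sum_{c=1}^{k} (a_{ci}^2 + a_{ci} b_{ci} + b_{ci}^2) - \frac{1}{4k^2} \left[ \sum_{c=1}^{k} (a_{ci} + b_{ci}) \right]^2$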
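
A minimal Python sketch of these two statistics follows. It assumes the usual form for interval data in which values are treated as uniformly spread within each interval (the form found in Billard and Diday (2006)); the function names and the two example intervals are hypothetical.

```python
from fractions import Fraction


def interval_mean(intervals):
    """Sample mean: (1 / 2k) * sum of (a_c + b_c)."""
    k = len(intervals)
    return Fraction(sum(a + b for a, b in intervals)) / (2 * k)


def interval_variance(intervals):
    """Sample variance:
    (1 / 3k) * sum(a^2 + a*b + b^2) - (1 / 4k^2) * (sum(a + b))^2
    """
    k = len(intervals)
    sum_squares = sum(a * a + a * b + b * b for a, b in intervals)
    sum_ends = sum(a + b for a, b in intervals)
    return Fraction(sum_squares) / (3 * k) - Fraction(sum_ends ** 2) / (4 * k * k)


# Hypothetical data: two concepts observed on one interval-valued variable.
data = [(1, 3), (2, 6)]
mean = interval_mean(data)
variance = interval_variance(data)
print(mean, variance)
```

Unlike a variance computed from interval midpoints alone, this variance also counts the spread inside each interval.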

slide9

Basic Descriptive Statistics: Interval Value

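Rewrite as

$S_i^2 = \frac{1}{3k} \sum_{c=1}^{k} \left[ (a_{ci} - \bar{Z}_i)^2 + (a_{ci} - \bar{Z}_i)(b_{ci} - \bar{Z}_i) + (b_{ci} - \bar{Z}_i)^2 \right]$

where $\bar{Z}_i$ is the sample mean of $Z_i$.

Total Variation = Within Variation + Between Variation

Within Variation: $\frac{1}{k} \sum_{c=1}^{k} \frac{(b_{ci} - a_{ci})^2}{12}$

Between Variation: $\frac{1}{k} \sum_{c=1}^{k} \left( \frac{a_{ci} + b_{ci}}{2} - \bar{Z}_i \right)^2$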
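
The decomposition can be checked numerically. The sketch below assumes the within term is the variance of a uniform distribution on each interval, (b - a)^2 / 12, and the between term is the variance of the interval midpoints; the function name and data are hypothetical.

```python
from fractions import Fraction


def decompose_variation(intervals):
    """Return (total, within, between) variation for interval data."""
    k = len(intervals)
    midpoints = [Fraction(a + b) / 2 for a, b in intervals]
    grand_mean = sum(midpoints) / k

    # Total: (1 / 3k) * sum[(a - m)^2 + (a - m)(b - m) + (b - m)^2]
    total = sum(
        (a - grand_mean) ** 2
        + (a - grand_mean) * (b - grand_mean)
        + (b - grand_mean) ** 2
        for a, b in intervals
    ) / (3 * k)

    # Within: average internal variance of each interval.
    within = sum(Fraction(b - a) ** 2 / 12 for a, b in intervals) / k

    # Between: variance of the midpoints around the grand mean.
    between = sum((m - grand_mean) ** 2 for m in midpoints) / k
    return total, within, between


total, within, between = decompose_variation([(1, 3), (2, 6)])
print(total, within, between)
```

For the two hypothetical intervals the three terms come out as exact fractions, so the identity total = within + between can be tested with equality rather than a tolerance.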

slide10

Similarity between Variables (interval-valued data) (Billard and Diday (2006))

The empirical covariance function between $Z_i$ and $Z_j$ is

The empirical correlation coefficient between $Z_i$ and $Z_j$ is

Where

slide11

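Distance between concepts

Definition 7.6: The Cartesian join A⊕B between two sets A and B is their componentwise union,

$A \oplus B = (A_1 \oplus B_1, \ldots, A_p \oplus B_p)$

For multi-valued components $A_j \oplus B_j = A_j \cup B_j$; for interval-valued components it is the interval from the smaller lower bound to the larger upper bound.

Definition 7.7: The Cartesian meet A⊗B between two sets A and B is their componentwise intersection,

$A \otimes B = (A_1 \otimes B_1, \ldots, A_p \otimes B_p)$, where $A_j \otimes B_j = A_j \cap B_j$.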
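
The join and meet can be sketched in Python as follows, assuming a multi-valued component is stored as a set and an interval-valued component as a `(lower, upper)` tuple. These representations, the function names, and the example values are illustrative assumptions.

```python
def join(a, b):
    """Join of one component: set union, or the interval spanning both."""
    if isinstance(a, (set, frozenset)):
        return a | b
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    """Meet of one component: intersection; None for an empty interval."""
    if isinstance(a, (set, frozenset)):
        return a & b
    lower, upper = max(a[0], b[0]), min(a[1], b[1])
    return (lower, upper) if lower <= upper else None


def cartesian_join(A, B):
    """Apply the join component by component."""
    return tuple(join(a, b) for a, b in zip(A, B))


def cartesian_meet(A, B):
    """Apply the meet component by component."""
    return tuple(meet(a, b) for a, b in zip(A, B))


# Hypothetical descriptions: one multi-valued and one interval-valued variable.
A = ({"red", "black"}, (120, 180))
B = ({"red"}, (158, 160))
print(cartesian_join(A, B))
print(cartesian_meet(A, B))
```

Note that the interval join is the smallest interval containing both inputs, which is larger than the set union when the intervals are disjoint.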

slide13

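Distance between concepts (multi-valued)

The Gowda-Diday dissimilarity measure (Gowda and Diday, 1991)

$D(\omega_1, \omega_2) = \sum_{j=1}^{p} [D_{1j}(\omega_1, \omega_2) + D_{2j}(\omega_1, \omega_2)]$

$D_{1j}(\omega_1, \omega_2) = |k_j^1 - k_j^2| / k_j$ (relative sizes)

$D_{2j}(\omega_1, \omega_2) = (k_j^1 + k_j^2 - 2 k_j^*) / k_j$ (relative content)

where $k_j^1 = |Y_j(\omega_1)|$ and $k_j^2 = |Y_j(\omega_2)|$ are the numbers of values taken by each concept, $k_j = |Y_j(\omega_1) \oplus Y_j(\omega_2)|$ is the size of their join, and $k_j^* = |Y_j(\omega_1) \otimes Y_j(\omega_2)|$ is the size of their meet.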

slide14

Distance between concepts (multi-valued)

Example: Color and Habitat of Birds (Table 7.2)

Y1 = Color, Y2 = Habitat, p = 2

The Gowda-Diday dissimilarity

For Y1: D11(ω1, ω2) = |2 - 1|/2 = 1/2

D21(ω1, ω2) = (2 + 1 - 2*1)/2 = 1/2

For Y2: D12(ω1, ω2) = |2 - 1|/2 = 1/2

D22(ω1, ω2) = (2 + 1 - 2*1)/2 = 1/2

D(ω1, ω2) = (1/2 + 1/2) + (1/2 + 1/2) = 2

Normalized (adjusted for scale), dividing by 3 for Y1 and by 2 for Y2:

D(ω1, ω2) = (1/2 + 1/2)/3 + (1/2 + 1/2)/2 = 5/6
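
The arithmetic of this example can be reproduced with a short Python sketch. The set contents below are hypothetical (they are not the entries of Table 7.2); they are chosen only so that each variable has 2 and 1 values with 1 value in common, as in the example. The function name and the `scale` parameter are likewise assumptions.

```python
from fractions import Fraction


def gowda_diday_multi(obs1, obs2, scale=None):
    """Gowda-Diday dissimilarity for multi-valued observations.

    obs1, obs2: sequences of sets, one set per variable.
    scale: optional per-variable divisors used for normalization.
    """
    scale = scale or [1] * len(obs1)
    total = Fraction(0)
    for y1, y2, s in zip(obs1, obs2, scale):
        k_join = len(y1 | y2)   # number of values in the join
        k_meet = len(y1 & y2)   # number of values in the meet
        relative_size = Fraction(abs(len(y1) - len(y2)), k_join)
        relative_content = Fraction(len(y1) + len(y2) - 2 * k_meet, k_join)
        total += (relative_size + relative_content) / s
    return total


# Hypothetical values with the same set sizes as the example.
omega1 = [{"red", "black"}, {"urban", "rural"}]
omega2 = [{"red"}, {"rural"}]

unnormalized = gowda_diday_multi(omega1, omega2)
normalized = gowda_diday_multi(omega1, omega2, scale=[3, 2])
print(unnormalized, normalized)
```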

slide15

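Distance between concepts (multi-valued)

The Ichino-Yaguchi dissimilarity measure (Ichino and Yaguchi, 1994)

$\phi_j(\omega_1, \omega_2) = |Y_j(\omega_1) \oplus Y_j(\omega_2)| - |Y_j(\omega_1) \otimes Y_j(\omega_2)| + \gamma (2 |Y_j(\omega_1) \otimes Y_j(\omega_2)| - |Y_j(\omega_1)| - |Y_j(\omega_2)|)$

where $0 \le \gamma \le 0.5$ is a prespecified constant.

For Y1: $\phi_1(\omega_1, \omega_2) = 2 - 1 + \gamma (2 \cdot 1 - 2 - 1) = 1 - \gamma$

For Y2: $\phi_2(\omega_1, \omega_2) = 2 - 1 + \gamma (2 \cdot 1 - 2 - 1) = 1 - \gamma$

Taking $\gamma = 0.5$ gives $\phi_1 = \phi_2 = 0.5$.

Unweighted Minkowski distance

$D_q(\omega_1, \omega_2) = (0.5^q + 0.5^q)^{1/q}$

Weighted Minkowski distance (dividing by 3 for Y1 and by 2 for Y2)

$D_q(\omega_1, \omega_2) = ((0.5/3)^q + (0.5/2)^q)^{1/q}$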
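
A sketch of the same calculation in Python is given below. The sets are hypothetical stand-ins with the sizes used in the example (2 and 1 values, 1 in common), and the function names and the `scale` parameter are assumptions; the weighting follows the expression above, where each term is divided by its scale before being raised to the power q.

```python
def ichino_yaguchi_multi(y1, y2, gamma=0.5):
    """Ichino-Yaguchi dissimilarity for one multi-valued variable."""
    join_size = len(y1 | y2)
    meet_size = len(y1 & y2)
    return join_size - meet_size + gamma * (2 * meet_size - len(y1) - len(y2))


def minkowski(phis, q, scale=None):
    """Minkowski distance of order q over per-variable dissimilarities.

    Each phi is divided by its scale (default 1) before exponentiation.
    """
    scale = scale or [1] * len(phis)
    return sum((phi / s) ** q for phi, s in zip(phis, scale)) ** (1 / q)


# Hypothetical values with the same set sizes as the example.
omega1 = [{"red", "black"}, {"urban", "rural"}]
omega2 = [{"red"}, {"rural"}]

phis = [ichino_yaguchi_multi(a, b, gamma=0.5) for a, b in zip(omega1, omega2)]
city_block = minkowski(phis, q=1)
euclidean = minkowski(phis, q=2)
weighted_city_block = minkowski(phis, q=1, scale=[3, 2])
print(phis, city_block, euclidean, weighted_city_block)
```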

slide16

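Distance between concepts (interval-valued)

Let $Z_i = (I_{1i}, I_{2i}, \ldots, I_{ki})^T$ be the interval data for the $i$th variable with $k$ concepts, where $I_{ci} = [a_{ci}, b_{ci}]$, $c = 1, 2, \ldots, k$.

The Gowda-Diday dissimilarity measure (Gowda and Diday, 1991)

$D(\omega_1, \omega_2) = \sum_{j=1}^{p} D_j(\omega_1, \omega_2)$

where, for the variable $Y_j$ with $Y_j(\omega_u) = [a_{uj}, b_{uj}]$, $u = 1, 2$,

$D_j(\omega_1, \omega_2) = D_{1j} + D_{2j} + D_{3j}$

$D_{1j} = |(b_{1j} - a_{1j}) - (b_{2j} - a_{2j})| / k_j$ (relative length)

$D_{2j} = (b_{1j} - a_{1j} + b_{2j} - a_{2j} - 2 I_j) / k_j$ (relative content)

$D_{3j} = |a_{1j} - a_{2j}| / |\mathcal{Y}_j|$ (relative position)

$k_j = |\max(b_{1j}, b_{2j}) - \min(a_{1j}, a_{2j})|$ is the length of the entire distance spanned by $\omega_1$ and $\omega_2$.

$I_j = |\min(b_{1j}, b_{2j}) - \max(a_{1j}, a_{2j})|$, the length of the intersection, if the intervals overlap; $I_j = 0$ otherwise.

$|\mathcal{Y}_j|$ is the total length covered by the observed values of $Y_j$.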

slide17

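Distance between concepts (interval-valued)

The Ichino-Yaguchi dissimilarity measure (Ichino and Yaguchi, 1994)

$\phi_j(\omega_1, \omega_2) = |Y_j(\omega_1) \oplus Y_j(\omega_2)| - |Y_j(\omega_1) \otimes Y_j(\omega_2)| + \gamma (2 |Y_j(\omega_1) \otimes Y_j(\omega_2)| - |Y_j(\omega_1)| - |Y_j(\omega_2)|)$

where, with $Y_j(\omega_u) = [a_{uj}, b_{uj}]$ and $|\cdot|$ the length of an interval,

$Y_j(\omega_1) \oplus Y_j(\omega_2) = [\min(a_{1j}, a_{2j}), \max(b_{1j}, b_{2j})]$

$Y_j(\omega_1) \otimes Y_j(\omega_2) = [\max(a_{1j}, a_{2j}), \min(b_{1j}, b_{2j})]$ (empty if there is no intersection)

The generalized Minkowski distance of order q ≥ 1 between two interval-valued observations ξ(ω1) and ξ(ω2) is

$d_q(\omega_1, \omega_2) = \left( \sum_{j=1}^{p} w_j [\phi_j(\omega_1, \omega_2)]^q \right)^{1/q}$

where $\phi_j(\omega_1, \omega_2)$ is the Ichino-Yaguchi distance and $w_j$ is a weight function associated with variable $Y_j$.

When q = 1: city block distance

When q = 2: Euclidean distance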

slide18

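Distance between concepts (interval-valued)

The Hausdorff distance (Chavent and Lechevallier, 2002)

$\phi_j(\omega_1, \omega_2) = \max(|a_{1j} - a_{2j}|, |b_{1j} - b_{2j}|)$

$d(\omega_1, \omega_2) = \sum_{j=1}^{p} \phi_j(\omega_1, \omega_2)$

The Euclidean Hausdorff distance

$d(\omega_1, \omega_2) = \left( \sum_{j=1}^{p} [\phi_j(\omega_1, \omega_2)]^2 \right)^{1/2}$

where $\phi_j(\omega_1, \omega_2)$ is the Hausdorff distance.

The normalized Euclidean Hausdorff distance

$d(\omega_1, \omega_2) = \left( \sum_{j=1}^{p} [\phi_j(\omega_1, \omega_2) / H_j]^2 \right)^{1/2}$

where $H_j$ is a normalizing factor for variable $Y_j$.

The span-normalized Euclidean Hausdorff distance

$d(\omega_1, \omega_2) = \left( \sum_{j=1}^{p} [\phi_j(\omega_1, \omega_2) / |\mathcal{Y}_j|]^2 \right)^{1/2}$

where the span $|\mathcal{Y}_j| = \max_u b_{uj} - \min_u a_{uj}$ is the length covered by the observed values of $Y_j$.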

slide19

Distance between concepts (interval-valued)

Example: take only the first 3 observations of the veterinary data

Gowda-Diday dissimilarity

D(ω1, ω2) = [D11 + D21 + D31] (Y1) + [D12 + D22 + D32] (Y2)

where, for Y1, the relative position term is D31 = |120 - 158|/65
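
The full Gowda-Diday calculation for intervals can be sketched in Python. The interval endpoints used for ω1 and ω2 are assumptions inferred from the arithmetic that appears in these examples (for instance |120 - 158|/65), not values quoted from the veterinary data table, and the function name is hypothetical.

```python
def gowda_diday_interval(x, y, span):
    """Gowda-Diday dissimilarity for one interval-valued variable.

    x, y: (lower, upper) intervals of the two concepts.
    span: total length covered by the observed values of the variable.
    Assumes the two intervals do not both collapse to the same point.
    """
    (a1, b1), (a2, b2) = x, y
    k = max(b1, b2) - min(a1, a2)            # length spanned by both intervals
    overlap = min(b1, b2) - max(a1, a2)
    intersection = max(overlap, 0.0)         # 0 when the intervals are disjoint
    relative_length = abs((b1 - a1) - (b2 - a2)) / k
    relative_content = ((b1 - a1) + (b2 - a2) - 2 * intersection) / k
    relative_position = abs(a1 - a2) / span
    return relative_length + relative_content + relative_position


# Assumed endpoints (Y1, Y2) for the two concepts; spans as used in the example.
omega1 = [(120, 180), (222.2, 354)]
omega2 = [(158, 160), (322, 355)]
spans = [65, 237.8]

per_variable = [
    gowda_diday_interval(x, y, s) for x, y, s in zip(omega1, omega2, spans)
]
total = sum(per_variable)
print(per_variable, total)
```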

slide20

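Distance between concepts (interval-valued)

The Ichino-Yaguchi dissimilarity

$\phi_j(\omega_1, \omega_2) = |Y_j(\omega_1) \oplus Y_j(\omega_2)| - |Y_j(\omega_1) \otimes Y_j(\omega_2)| + \gamma (2 |Y_j(\omega_1) \otimes Y_j(\omega_2)| - |Y_j(\omega_1)| - |Y_j(\omega_2)|)$

$Y_j(\omega_1) \oplus Y_j(\omega_2) = [\min(a_{1j}, a_{2j}), \max(b_{1j}, b_{2j})]$

$Y_j(\omega_1) \otimes Y_j(\omega_2) = [\max(a_{1j}, a_{2j}), \min(b_{1j}, b_{2j})]$ (empty if there is no intersection)

$\phi_1(\omega_1, \omega_2) = |180 - 120| - |160 - 158| + \gamma (2 |160 - 158| - |180 - 120| - |160 - 158|) = 58 + (-58) \gamma$

$\phi_2(\omega_1, \omega_2) = |355 - 222.2| - |354 - 322| + \gamma (2 |354 - 322| - |354 - 222.2| - |355 - 322|) = 100.8 + (-100.8) \gamma$

The generalized Minkowski distance

$d_q(\omega_1, \omega_2) = \left( \sum_{j=1}^{p} w_j [\phi_j(\omega_1, \omega_2)]^q \right)^{1/q}$

When q = 1: city block distance

When q = 2: Euclidean distance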
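
A Python sketch of this example follows. The interval endpoints are assumptions inferred from the numbers in the example (58 and 100.8 at γ = 0), the function names are hypothetical, and the weights default to 1 because no weights are given here.

```python
def ichino_yaguchi_interval(x, y, gamma=0.5):
    """Ichino-Yaguchi dissimilarity for one interval-valued variable."""
    join_length = max(x[1], y[1]) - min(x[0], y[0])
    lower, upper = max(x[0], y[0]), min(x[1], y[1])
    meet_length = upper - lower if lower <= upper else 0.0  # empty meet -> 0
    length_x, length_y = x[1] - x[0], y[1] - y[0]
    return join_length - meet_length + gamma * (
        2 * meet_length - length_x - length_y
    )


def generalized_minkowski(phis, q, weights=None):
    """(sum_j w_j * phi_j^q)^(1/q); weights default to 1 for every variable."""
    weights = weights or [1.0] * len(phis)
    return sum(w * phi ** q for w, phi in zip(weights, phis)) ** (1 / q)


# Assumed endpoints (Y1, Y2) for the two concepts.
omega1 = [(120, 180), (222.2, 354)]
omega2 = [(158, 160), (322, 355)]

phis_gamma0 = [ichino_yaguchi_interval(x, y, 0.0) for x, y in zip(omega1, omega2)]
phis = [ichino_yaguchi_interval(x, y, 0.5) for x, y in zip(omega1, omega2)]
city_block = generalized_minkowski(phis, q=1)
euclidean = generalized_minkowski(phis, q=2)
print(phis_gamma0, phis, city_block, euclidean)
```

With γ = 0 the sketch returns the constant terms 58 and 100.8 of the example; with γ = 0.5 each is halved.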

slide21

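Distance between concepts (interval-valued)

The Hausdorff distance

$\phi_j(\omega_1, \omega_2) = \max(|a_{1j} - a_{2j}|, |b_{1j} - b_{2j}|)$

$\phi_1(\omega_1, \omega_2) = \max(|120 - 158|, |180 - 160|) = 38$

$\phi_2(\omega_1, \omega_2) = \max(|222.2 - 322|, |354 - 355|) = 99.8$

$d(\omega_1, \omega_2) = 38 + 99.8 = 137.8$

The Euclidean Hausdorff distance

$d(\omega_1, \omega_2) = (38^2 + 99.8^2)^{1/2}$

The normalized Euclidean Hausdorff distance

$d(\omega_1, \omega_2) = ([38 / H_1]^2 + [99.8 / H_2]^2)^{1/2}$ (normalizing term: 288.78)

The span-normalized Euclidean Hausdorff distance

$|\mathcal{Y}_1| = 185 - 120 = 65$

$|\mathcal{Y}_2| = 355 - 117.2 = 237.8$

$d(\omega_1, \omega_2) = ([38 / 65]^2 + [99.8 / 237.8]^2)^{1/2}$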
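
The Hausdorff variants can be sketched in Python as below. The endpoints of ω1 and ω2 are assumptions inferred from the values 38 and 99.8 in this example, the spans 65 and 237.8 are taken from it directly, and the function names are hypothetical. The normalized variant is left out because its normalizing factor is not fully specified here.

```python
import math


def hausdorff(x, y):
    """Hausdorff distance between two intervals given as (lower, upper)."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def euclidean_hausdorff(obs1, obs2, scale=None):
    """Euclidean combination of per-variable Hausdorff distances.

    scale: optional per-variable divisors, e.g. the span of each variable.
    """
    phis = [hausdorff(x, y) for x, y in zip(obs1, obs2)]
    scale = scale or [1.0] * len(phis)
    return math.sqrt(sum((phi / s) ** 2 for phi, s in zip(phis, scale)))


# Assumed endpoints (Y1, Y2) for the two concepts.
omega1 = [(120, 180), (222.2, 354)]
omega2 = [(158, 160), (322, 355)]
spans = [65, 237.8]

phis = [hausdorff(x, y) for x, y in zip(omega1, omega2)]
hausdorff_sum = sum(phis)
euclid = euclidean_hausdorff(omega1, omega2)
span_normalized = euclidean_hausdorff(omega1, omega2, scale=spans)
print(phis, hausdorff_sum, euclid, span_normalized)
```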

slide26

Interval-valued symbolic data analysis

  • Books (Bock and Diday (2000); Billard and Diday (2003, 2006); Diday and Noirhomme-Fraiture (2008))
  • PCA (Chouakria, Cazes, and Diday (2000); Palumbo and Lauro (2003); Gioia and Lauro (2006); Hamada, Minami, and Mizuta (2008))
  • Clustering analysis (Brito (2002); Souza and de Carvalho (2004); Chavent et al. (2006); Bock (2008))
  • Discriminant analysis (Lauro, Verde, and Palumbo (2000); Duarte Silva and Brito (2006))
  • MDS (Groenen et al. (2006); Minami and Mizuta (2008))
  • Regression (Billard and Diday (2000); de Carvalho et al. (2004))
slide28

Symbolic Data Analysis Software

• SODAS (2003)

Free; developed by two European consortia

• SYR (2008)

More professional; from the SYROKKO company

www.syrokko.com
