# CoDaPack: A tool for Compositional Data Analysis




## CoDaPack: A tool for Compositional Data Analysis

M. Comas-Cufí & S. Thió-Henestrosa

(marc.comas@udg.edu)

Dept. Computer Sciences and Applied Mathematics

University of Girona (UdG)

Catalonia-Spain

### What’s coda?

We are interested in relative values; the total sum is not informative.

• Vector x = [x1, x2, …, xD]
• Adds to a constant: 100, 1, 10^6, 10^9, …
• Units: percentage, parts per one, ppm, ppb, …
• Has positive elements
• Carries only relative information
• Examples
  • Production (pieces): [Ok, NonOk, Rework] = [87, 1, 12]
  • Household budget (€): [Food, Serv., Other] = [1150, 623, 351]
  • Daily activities (h): [Work, Sleep, Other] = [7.5, 7.5, 9]
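Since only the relative information matters, each of the examples above can be rescaled to a common constant without losing anything. A minimal sketch of this rescaling (often called the closure operation) is shown here; the function name `closure` and its `total` parameter are illustrative choices, not part of CoDaPack.

```python
def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they add up to `total`."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts of a composition must be positive")
    s = sum(parts)
    return [total * p / s for p in parts]


# The three examples from the slide, closed to parts per one.
production = closure([87, 1, 12])       # pieces
budget = closure([1150, 623, 351])      # euros
activities = closure([7.5, 7.5, 9])     # hours

# Changing the unit (euros -> cents) leaves the closed composition unchanged.
budget_in_cents = closure([115000, 62300, 35100])

for name, comp in [("production", production),
                   ("budget", budget),
                   ("activities", activities)]:
    print(name, [round(c, 4) for c in comp])
```

The budget expressed in euros and in cents closes to the same vector, which is what "the total sum is not informative" means in practice.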

### Sample space of coda: simplex

• Compositional data live in the simplex (S), which is represented by a ternary diagram (D=3), a quaternary diagram (D=4), …

• D=3: x = [0.45, 0.35, 0.2] in S^3 (ternary diagram)
• D=4: x = [0.2, 0.25, 0.2, 0.35] in S^4 (quaternary diagram)
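For reference, the D-part simplex is usually written as below. The symbol κ for the closure constant (1, 100, 10^6, …) is an assumption of this sketch and does not appear on the slide.

```latex
\mathcal{S}^{D} = \Bigl\{ \mathbf{x} = [x_1, x_2, \dots, x_D] \;:\;
    x_i > 0,\ i = 1, \dots, D;\ \ \sum_{i=1}^{D} x_i = \kappa \Bigr\}
```

Both example points satisfy this definition with κ = 1: all their parts are positive and each vector sums to 1.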

### Euclidean distance appropriate?

[Figure: ternary diagrams for A and B with vertices STOP PROD., HALF PROD. and NON-STOP PROD., and the parts of A and B in 2009 and 2010]

• A2009 = [0.2, 0.1, 0.7], A2010 = [0.1, 0.2, 0.7]
• B2009 = [0.4, 0.3, 0.3], B2010 = [0.3, 0.4, 0.3]
• A2010 - A2009 = B2010 - B2009 = [-0.1, 0.1, 0]
• de(A) = de(B) = 0.14: the Euclidean distance measures the absolute difference
• Relative change in A from 2009 to 2010: -50%, +100%, 0%
• Relative change in B from 2009 to 2010: -25%, +33.3%, 0%
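The arithmetic behind this slide can be checked with a short sketch. It uses the A and B vectors given above; the helper names `euclidean_distance` and `relative_change` are illustrative only.

```python
import math

# Compositions from the slide, parts in the order the slide lists them.
a2009, a2010 = [0.2, 0.1, 0.7], [0.1, 0.2, 0.7]
b2009, b2010 = [0.4, 0.3, 0.3], [0.3, 0.4, 0.3]


def euclidean_distance(x, y):
    """Ordinary Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def relative_change(before, after):
    """Per-part change, as a percentage of the starting value."""
    return [100.0 * (a - b) / b for b, a in zip(before, after)]


de_a = euclidean_distance(a2009, a2010)
de_b = euclidean_distance(b2009, b2010)
change_a = relative_change(a2009, a2010)
change_b = relative_change(b2009, b2010)

print("de(A), de(B):", round(de_a, 2), round(de_b, 2))
print("relative change in A (%):", [round(c, 1) for c in change_a])
print("relative change in B (%):", [round(c, 1) for c in change_b])
```

The two Euclidean distances coincide even though the per-part percentage changes in A are much larger than those in B.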

### Euclidean distance appropriate?

Our interest lies in relative values:

• A2010/A2009 = [1/2, 2, 1]
• B2010/B2009 = [3/4, 4/3, 1]

Euclidean distance: de(A) = de(B) = 0.14

Aitchison distance: da(A) = 0.6276, da(B) = 0.3970

[Figure: ternary diagram with vertices STOP PROD., HALF PROD. and NON-STOP PROD., showing A2009, A2010, B2009 and B2010]
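A minimal sketch of a ratio-based distance follows, assuming the common definition of the Aitchison distance as the Euclidean distance between centred log-ratio (clr) vectors computed with natural logarithms. The slide does not state which convention (logarithm base, scaling) produced the figures it quotes, so the numbers printed by this sketch are not expected to reproduce them digit for digit. What it illustrates is that A and B, equidistant in the Euclidean sense, are no longer equidistant once ratios are compared. The function names are illustrative.

```python
import math

# Compositions from the slide.
a2009, a2010 = [0.2, 0.1, 0.7], [0.1, 0.2, 0.7]
b2009, b2010 = [0.4, 0.3, 0.3], [0.3, 0.4, 0.3]


def clr(x):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr vectors of x and y (assumed convention)."""
    return math.sqrt(sum((cx - cy) ** 2 for cx, cy in zip(clr(x), clr(y))))


da_a = aitchison_distance(a2009, a2010)
da_b = aitchison_distance(b2009, b2010)

print("da(A) > da(B):", da_a > da_b)

# The distance depends only on ratios, so rescaling a composition
# (here from parts per one to percentages) does not change it.
da_a_percent = aitchison_distance([20, 10, 70], a2010)
```

Under this convention the change in A is measured as larger than the change in B, in line with the ratios [1/2, 2, 1] versus [3/4, 4/3, 1].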

### Log-ratio methodology

• Aitchison geometry on compositional data is equivalent to classical Euclidean geometry on log-ratio values.

Simplex (restricted space) → Real space (unrestricted)

[x1, …, xD] → log(xi/xj), i, j = 1, …, D, j ≠ i
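One common way to make this equivalence explicit uses the centred log-ratio (clr) transformation. The slide does not name a specific transformation, so the choice of clr and of natural logarithms here is an assumption.

```latex
\operatorname{clr}(\mathbf{x}) =
  \left[ \ln\frac{x_1}{g(\mathbf{x})}, \dots, \ln\frac{x_D}{g(\mathbf{x})} \right],
\qquad
g(\mathbf{x}) = \Bigl( \prod_{i=1}^{D} x_i \Bigr)^{1/D}

d_a(\mathbf{x}, \mathbf{y})
  = \bigl\lVert \operatorname{clr}(\mathbf{x}) - \operatorname{clr}(\mathbf{y}) \bigr\rVert
  = \sqrt{ \frac{1}{2D} \sum_{i=1}^{D} \sum_{j=1}^{D}
      \left( \ln\frac{x_i}{x_j} - \ln\frac{y_i}{y_j} \right)^{2} }
```

The right-hand form involves only the pairwise log-ratios log(xi/xj), so the distance is unchanged when either composition is multiplied by a positive constant.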

### Software

CoDaPack: software developed by the Department of Computer Science and Applied Mathematics at the Universitat de Girona. Easy and intuitive.

http://ima.udg.edu/codapack (marc.comas@udg.edu)

compositions (R package): analysis of compositional and positive data using different approaches.

http://cran.r-project.org/ (raimon.tolosana@upc.edu)

robCompositions (R package): robust estimation for compositional data.

http://cran.r-project.org/ (templ@tuwien.ac.at)

### References

• Aitchison, J., 1986. The Statistical Analysis of Compositional Data. Chapman & Hall, London. Reprinted in 2003 with additional material by Blackburn Press.

• Proceedings of CoDaWork, 2003, 2005, 2008 and 2011: available at http://dugi-doc.udg.edu/handle/10256/150.

• CoDaWeb: Compositional Data Analysis Web Site: http://www.compositionaldata.com/