Similarity/Dissimilarity

Similarity/Dissimilarity: various types of variables. Data Mining: Concepts and Techniques (Jiawei Han, Micheline Kamber). Data structures: the data matrix (object-by-variable structure) takes the form of a relational table of n objects × p variables.

Presentation Transcript


  1. Similarity/Dissimilarity: Various Types of Variables. Data Mining: Concepts and Techniques (Jiawei Han, Micheline Kamber)

  2. Data structures • Data matrix (object-by-variable structure) • This structure takes the form of a relational table of n objects × p variables. • Dissimilarity matrix (object-by-object structure) • Stores a collection of proximities for all pairs of the n objects (an n-by-n table) • This structure is used to compute clusters of objects.
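To make the two structures concrete, here is a minimal Python sketch that turns an n × p data matrix into the lower-triangular n × n dissimilarity matrix. The function name `dissimilarity_matrix` and the choice of Euclidean distance (`math.dist`) as the default measure are assumptions for illustration; any per-pair dissimilarity function can be passed as `dist`.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dissimilarity_matrix(
    data: Sequence[Vector],
    dist: Callable[[Vector, Vector], float] = math.dist,
) -> List[List[float]]:
    """Turn an n x p data matrix (one row per object) into an n x n
    dissimilarity matrix.

    Only the lower triangle (j < i) is computed: the diagonal is 0 because
    d(i, i) = 0, and the upper triangle is left at 0 because d(i, j) = d(j, i)
    makes it redundant.
    """
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            d[i][j] = dist(data[i], data[j])
    return d


# Three objects described by p = 2 interval-scaled variables (made-up values).
data = [[1, 2], [4, 6], [1, 2]]
for row in dissimilarity_matrix(data):
    print(row)
```

Objects 1 and 3 have identical values, so d(3, 1) = 0, while both are 5 units from object 2.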

  3. Types of data in a variable: • Interval-scaled variables • Binary variables • Categorical variables • Ratio-scaled variables

  4. Interval-scaled variables • Interval-scaled variables: continuous measurements on a roughly linear scale • Examples: height, weight, latitude and longitude coordinates (e.g., when clustering houses), weather temperature

  5. Interval-scaled variables • Compute the mean absolute deviation, s_f • Compute the standardized measurement (z-score)
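The formulas on this slide were images and are missing from the transcript. A minimal sketch, assuming the usual definitions for a variable f over n objects: mean m_f = (x_1f + ... + x_nf) / n, mean absolute deviation s_f = (|x_1f - m_f| + ... + |x_nf - m_f|) / n, and standardized value z_if = (x_if - m_f) / s_f. The height values are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_absolute_deviation(values: Sequence[float]) -> Tuple[float, float]:
    """Return (m_f, s_f): the mean of variable f and its mean absolute deviation."""
    n = len(values)
    m_f = sum(values) / n
    s_f = sum(abs(x - m_f) for x in values) / n
    return m_f, s_f


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize each x_if as z_if = (x_if - m_f) / s_f."""
    m_f, s_f = mean_absolute_deviation(values)
    if s_f == 0:
        raise ValueError("all values are equal; s_f = 0, so z-scores are undefined")
    return [(x - m_f) / s_f for x in values]


heights_cm = [150, 160, 170, 180]  # hypothetical heights for four objects
print(mean_absolute_deviation(heights_cm))
print(z_scores(heights_cm))
```

Here m_f = 165 and s_f = 10, so the z-scores are -1.5, -0.5, 0.5 and 1.5. Because the deviations are not squared, a single outlier inflates s_f less than it would inflate the standard deviation.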

  6. Euclidean Distance • Manhattan Distance
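The two distance formulas were images on the slide. A minimal sketch, assuming the standard definitions over p variables: Euclidean d(i, j) = sqrt((x_i1 - x_j1)^2 + ... + (x_ip - x_jp)^2) and Manhattan d(i, j) = |x_i1 - x_j1| + ... + |x_ip - x_jp|. The two example objects are illustrative.

```python
import math
from typing import Sequence


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    if len(x) != len(y):
        raise ValueError("both objects must have the same p variables")


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Square root of the sum of squared differences over the p variables."""
    _check(x, y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute differences over the p variables (city-block distance)."""
    _check(x, y)
    return sum(abs(a - b) for a, b in zip(x, y))


x1, x2 = (1, 2), (3, 5)  # illustrative objects with p = 2
print(euclidean(x1, x2), manhattan(x1, x2))
```

For these points the Euclidean distance is sqrt(13) ≈ 3.61 and the Manhattan distance is 2 + 3 = 5.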

  7. Example • Computing dissimilarity without standardization

  8. Binary variables • A variable with two values, 0 and 1, where 0 means the property is absent and 1 means it is present • How to compute dissimilarity (distance)

  9. Similarity:
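The formulas for slides 8 and 9 were images. A minimal sketch, assuming the usual 2 × 2 contingency-table notation for objects i and j: q counts variables equal to 1 for both, r those equal to 1 for i and 0 for j, s the reverse, and t those equal to 0 for both. Symmetric binary dissimilarity is (r + s) / (q + r + s + t); asymmetric binary dissimilarity ignores the negative matches t, giving (r + s) / (q + r + s); and the Jaccard similarity q / (q + r + s) equals 1 minus the asymmetric dissimilarity. Function names and the example vectors are hypothetical.

```python
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count q (1,1), r (1,0), s (0,1) and t (0,0) over two 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of binary variables")
    q = r = s = t = 0
    for a, b in zip(x, y):
        if (a, b) == (1, 1):
            q += 1
        elif (a, b) == (1, 0):
            r += 1
        elif (a, b) == (0, 1):
            s += 1
        elif (a, b) == (0, 0):
            t += 1
        else:
            raise ValueError("binary variables must be coded as 0 or 1")
    return q, r, s, t


def symmetric_dissimilarity(x, y) -> float:
    q, r, s, t = contingency(x, y)
    return (r + s) / (q + r + s + t)


def asymmetric_dissimilarity(x, y) -> float:
    q, r, s, _ = contingency(x, y)
    if q + r + s == 0:
        return 0.0  # both objects all-zero: treated as identical (assumed convention)
    return (r + s) / (q + r + s)


def jaccard_similarity(x, y) -> float:
    q, r, s, _ = contingency(x, y)
    if q + r + s == 0:
        return 1.0  # same assumed convention as above
    return q / (q + r + s)


x = [1, 0, 1, 0, 0, 0]
y = [1, 0, 1, 0, 1, 0]
print(symmetric_dissimilarity(x, y), asymmetric_dissimilarity(x, y), jaccard_similarity(x, y))
```

With q = 2, r = 0, s = 1 and t = 3, the symmetric measure is 1/6, the asymmetric measure is 1/3, and the Jaccard similarity is 2/3.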

  10. Binary variable example • A table of patient records contains the attributes name, gender, fever, cough, test-1, test-2, test-3, and test-4 • Name: object identifier • Gender: symmetric attribute
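The transcript lists the attributes but not the record values, so the three patients below and their Y/N/P values are illustrative assumptions. The sketch also assumes that name is dropped as an identifier, gender (symmetric) is set aside, and the remaining six attributes are asymmetric binary, with Y (yes) and P (positive) coded as 1 and N as 0.

```python
# Y (yes) and P (positive) are coded as 1 (present); N is coded as 0 (absent).
POSITIVE = {"Y", "P"}
# Assumption: every attribute except name and gender is asymmetric binary.
ASYMMETRIC_ATTRS = ["fever", "cough", "test-1", "test-2", "test-3", "test-4"]

# Hypothetical patient records (values are not taken from the slide).
RECORDS = {
    "Jack": {"gender": "M", "fever": "Y", "cough": "N", "test-1": "P",
             "test-2": "N", "test-3": "N", "test-4": "N"},
    "Mary": {"gender": "F", "fever": "Y", "cough": "N", "test-1": "P",
             "test-2": "N", "test-3": "P", "test-4": "N"},
    "Jim": {"gender": "M", "fever": "Y", "cough": "Y", "test-1": "N",
            "test-2": "N", "test-3": "N", "test-4": "N"},
}


def encode(record):
    """Map the asymmetric attributes of one record to a 0/1 vector."""
    return [1 if record[attr] in POSITIVE else 0 for attr in ASYMMETRIC_ATTRS]


def asymmetric_dissimilarity(x, y):
    """(r + s) / (q + r + s): negative matches (t) are ignored."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r_plus_s = sum(1 for a, b in zip(x, y) if a != b)
    if q + r_plus_s == 0:
        return 0.0  # both objects all-zero: treated as identical (assumed convention)
    return r_plus_s / (q + r_plus_s)


def patient_distance(name_i, name_j):
    return asymmetric_dissimilarity(encode(RECORDS[name_i]), encode(RECORDS[name_j]))


for a, b in [("Jack", "Mary"), ("Jack", "Jim"), ("Jim", "Mary")]:
    print(f"d({a}, {b}) = {patient_distance(a, b):.2f}")
```

With these assumed records the script prints d(Jack, Mary) = 0.33, d(Jack, Jim) = 0.67 and d(Jim, Mary) = 0.75, so Jack and Mary would be the most similar pair.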

  11. Categorical variables • A categorical variable is a generalization of a binary variable: it can take on more than two states. • Example: map color (5 states): red, yellow, green, pink, and blue • Dissimilarity measure: d(i, j) = (p - m) / p • where p is the number of variables and m is the number of variables whose values match
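The simple-matching measure d(i, j) = (p - m) / p can be sketched in a few lines of Python. The function name is hypothetical, and the two objects are assumed to be described by three color-valued variables (for example, the colors of three map regions).

```python
from typing import Hashable, Sequence


def categorical_dissimilarity(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Simple matching: d(i, j) = (p - m) / p."""
    if len(x) != len(y) or not x:
        raise ValueError("objects need the same, non-zero number of variables")
    p = len(x)                                   # number of categorical variables
    m = sum(1 for a, b in zip(x, y) if a == b)   # number of matching states
    return (p - m) / p


obj_i = ("red", "green", "blue")  # hypothetical colors of three map regions
obj_j = ("red", "pink", "blue")
print(categorical_dissimilarity(obj_i, obj_j))
```

The objects match on 2 of the 3 variables, so d(i, j) = (3 - 2) / 3 ≈ 0.33.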

  12. Dissimilarity example

  13. So for the categorical variable test-1, p = 1, and d(i, j) = 0 if the values are equal and 1 if they differ. This yields the following matrix:
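The transcript does not list the test-1 values, so the four states below are assumptions chosen for illustration. With p = 1, the simple-matching measure reduces to 0 for equal states and 1 otherwise. The function name `matching_matrix` is hypothetical.

```python
def matching_matrix(values):
    """d(i, j) = 0 if objects i and j share the same state, 1 otherwise (p = 1)."""
    n = len(values)
    return [[0 if values[i] == values[j] else 1 for j in range(n)] for i in range(n)]


test1 = ["code-A", "code-B", "code-C", "code-A"]  # assumed values for objects 1-4
d1 = matching_matrix(test1)
for i, row in enumerate(d1):
    print(row[: i + 1])  # lower triangle, as in a dissimilarity matrix
```

Under these assumed values only objects 1 and 4 share a state, so d(4, 1) = 0 and every other off-diagonal entry is 1.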

  14. Ordinal variables • Ordinal variable: similar to a categorical variable, except that its states have a meaningful order; useful when a quantity cannot be measured objectively. • Example: professional rank: assistant, associate, and full professor.

  15. If variable f has value x_if for object i, and f has M_f ordered states ranked 1, ..., M_f, then x_if can be replaced by its rank r_if ∈ {1, ..., M_f} • Map each rank onto [0, 1] by normalizing with z_if = (r_if - 1) / (M_f - 1) • Then compute the dissimilarity using a distance formula for interval-scaled variables
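The rank normalization z_if = (r_if - 1) / (M_f - 1) can be sketched as follows, using the professional ranks from the previous slide as the ordered states. The function name `normalize_ordinal` is hypothetical, and `ordered_states` is assumed to be listed from lowest to highest.

```python
from typing import Hashable, Sequence


def normalize_ordinal(value: Hashable, ordered_states: Sequence[Hashable]) -> float:
    """Map a state to [0, 1] via z_if = (r_if - 1) / (M_f - 1)."""
    m_f = len(ordered_states)
    if m_f < 2:
        raise ValueError("an ordinal variable needs at least two states")
    r_if = list(ordered_states).index(value) + 1  # rank in {1, ..., M_f}
    return (r_if - 1) / (m_f - 1)


ranks = ["assistant", "associate", "full"]  # ordered lowest to highest
print([normalize_ordinal(v, ranks) for v in ranks])  # [0.0, 0.5, 1.0]
```

After this step the normalized values can be compared with any interval-scaled distance, such as Euclidean distance.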

  16. Ordinal variable example (test-2) • Test-2 has 3 states: fair, good, and excellent, so M_f = 3 • Replace each object's value with its rank (1, 2, or 3). • Normalize, giving rank-1 = 0, rank-2 = 0.5, rank-3 = 1 • Then, using the Euclidean distance formula, we obtain:
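The resulting matrix was an image. This sketch assumes test-2 values of excellent, fair, good, and excellent for objects 1-4, since the transcript does not list them. With a single variable, Euclidean distance reduces to the absolute difference of the normalized ranks. The names `STATES`, `test2`, and `ordinal_matrix` are hypothetical.

```python
STATES = ["fair", "good", "excellent"]              # M_f = 3, lowest to highest
test2 = ["excellent", "fair", "good", "excellent"]  # assumed values for objects 1-4


def ordinal_matrix(values, states):
    """Normalize ranks to [0, 1], then take pairwise 1-D Euclidean distances."""
    m_f = len(states)
    # index(v) is r_if - 1, so this is z_if = (r_if - 1) / (M_f - 1)
    z = [states.index(v) / (m_f - 1) for v in values]
    n = len(values)
    # With a single variable, Euclidean distance is just |z_i - z_j|.
    return [[abs(z[i] - z[j]) for j in range(n)] for i in range(n)]


d2 = ordinal_matrix(test2, STATES)
for i, row in enumerate(d2):
    print(row[: i + 1])  # lower triangle
```

Under these assumptions the normalized values are 1, 0, 0.5, and 1, so d(2, 1) = 1, d(3, 1) = 0.5, d(3, 2) = 0.5, d(4, 1) = 0, d(4, 2) = 1, and d(4, 3) = 0.5.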

  17. Ratio-scaled variables • Typically used for positive measurements on a nonlinear scale, such as an exponential scale, approximately following A e^(Bt) or A e^(-Bt), where A and B are positive constants and t typically represents time • Examples: growth of a bacteria population, or decay of a radioactive element • Apply a logarithmic transformation, y_if = log(x_if), and treat y_if as an interval-scaled value
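A minimal sketch of the logarithmic transformation. It assumes a base-10 logarithm and raw test-3 values of 445, 22, 164, and 1210. The raw values do not appear in the transcript; they are assumptions chosen to be consistent with the logged values listed for test-3 in the example.

```python
import math
from typing import List, Sequence


def log_transform(values: Sequence[float]) -> List[float]:
    """y_if = log10(x_if); ratio-scaled values must be strictly positive."""
    if any(x <= 0 for x in values):
        raise ValueError("ratio-scaled values must be positive")
    return [math.log10(x) for x in values]


raw_test3 = [445, 22, 164, 1210]  # assumed raw values for objects 1-4
print([round(y, 2) for y in log_transform(raw_test3)])  # [2.65, 1.34, 2.21, 3.08]
```

After the transform, the y_if values are treated like any interval-scaled variable.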

  18. Ratio-scaled variable example (test-3) • Applying the logarithm to each value in test-3 gives 2.65, 1.34, 2.21, and 3.08 for objects 1-4 • The distance formula then gives the pairwise distances, which are normalized by dividing by 1.74 (the range 3.08 - 1.34)
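Starting from the four log values on this slide, the sketch below computes the pairwise 1-D distances and divides them by the range 1.74 so that every entry lies in [0, 1]. The function name `normalized_interval_matrix` is hypothetical.

```python
from typing import List, Sequence

y = [2.65, 1.34, 2.21, 3.08]  # log values for objects 1-4


def normalized_interval_matrix(values: Sequence[float]) -> List[List[float]]:
    """|y_i - y_j| divided by the range max(y) - min(y)."""
    rng = max(values) - min(values)
    if rng == 0:
        raise ValueError("all values are equal; the range is zero")
    n = len(values)
    return [[abs(values[i] - values[j]) / rng for j in range(n)] for i in range(n)]


d3 = normalized_interval_matrix(y)
for i in range(len(y)):
    print([round(d3[i][j], 2) for j in range(i + 1)])  # lower triangle
```

Rounded to two decimals, the normalized distances are d(2, 1) = 0.75, d(3, 1) = 0.25, d(3, 2) = 0.50, d(4, 1) = 0.25, d(4, 2) = 1.00, and d(4, 3) = 0.50.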

  19. Variables of mixed types • In real-world data, variables of mixed types are common, including interval-scaled, symmetric binary, categorical, ordinal, and ratio-scaled variables. • The dissimilarity can then be computed with the following formula: d(i, j) = Σ_f δ_ij^(f) d_ij^(f) / Σ_f δ_ij^(f) • where δ_ij^(f) = 0 if x_if or x_jf is missing, and δ_ij^(f) = 1 if both values are present
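A minimal sketch of the mixed-type formula. It assumes each per-variable dissimilarity d_ij^(f) has already been scaled to [0, 1]. A missing value is represented as `None`, which plays the role of δ_ij^(f) = 0. The function name is hypothetical.

```python
from typing import Optional, Sequence


def mixed_dissimilarity(per_variable: Sequence[Optional[float]]) -> float:
    """d(i, j) = sum_f delta_f * d_f / sum_f delta_f.

    per_variable[f] is d_ij^(f) in [0, 1], or None when x_if or x_jf is
    missing (delta_ij^(f) = 0, so the variable is left out of both sums).
    """
    present = [d for d in per_variable if d is not None]  # delta_ij^(f) = 1
    if not present:
        raise ValueError("no variable has values for both objects")
    return sum(present) / len(present)


print(mixed_dissimilarity([1.0, None, 0.25]))  # 0.625: only two variables count
```

Dropping a missing variable from both the numerator and the denominator keeps the result in [0, 1], instead of implicitly treating the missing value as a match.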

  20. Mixed-variable computation example • Given the distances obtained earlier for each variable (test-1, test-2, and test-3): • Applying that formula gives:

  21. Final dissimilarity result (test-1, test-2, and test-3)
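The final matrix was an image. The sketch below reproduces the whole computation end to end under stated assumptions. The test-1 values (code-A, code-B, code-C, code-A) and test-2 values (excellent, fair, good, excellent) are assumed, because the transcript does not list them. The test-3 log values are the ones given for test-3. All variables are present for every pair, so every δ is 1 and the formula reduces to the mean of the three per-variable distances.

```python
# Assumed values for objects 1-4; only the test-3 logs come from the slides.
TEST1 = ["code-A", "code-B", "code-C", "code-A"]      # categorical (assumed)
TEST2 = ["excellent", "fair", "good", "excellent"]    # ordinal (assumed)
TEST2_STATES = ["fair", "good", "excellent"]          # lowest to highest
TEST3_LOG = [2.65, 1.34, 2.21, 3.08]                  # ratio-scaled, after log


def categorical_d(values, i, j):
    return 0.0 if values[i] == values[j] else 1.0


def ordinal_d(values, states, i, j):
    m_f = len(states)
    z = [states.index(v) / (m_f - 1) for v in values]  # (r - 1) / (M_f - 1)
    return abs(z[i] - z[j])


def interval_d(values, i, j):
    rng = max(values) - min(values)  # 3.08 - 1.34 = 1.74 here
    return abs(values[i] - values[j]) / rng


def mixed_matrix():
    n = len(TEST1)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            per_var = [
                categorical_d(TEST1, i, j),
                ordinal_d(TEST2, TEST2_STATES, i, j),
                interval_d(TEST3_LOG, i, j),
            ]
            # Every value is present, so every delta is 1: a plain mean.
            d[i][j] = sum(per_var) / len(per_var)
    return d


for i, row in enumerate(mixed_matrix()):
    print([round(v, 2) for v in row[: i + 1]])  # lower triangle
```

Under these assumptions the combined distances are d(2, 1) = 0.92, d(3, 1) = 0.58, d(3, 2) = 0.67, d(4, 1) = 0.08, d(4, 2) = 1.00, and d(4, 3) = 0.67. Objects 1 and 4 come out as the most similar pair.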
