ROUGH SET THEORY AND DATA MINING

Dr. Sarjon Defit

INTRODUCTION

Rough set theory is a mathematical technique developed by Pawlak in the early 1980s. It is used to handle uncertainty, imprecision and vagueness in Artificial Intelligence (AI) applications, and it is an efficient technique for the Knowledge Discovery in Databases (KDD) process and for data mining. Rough set theory has been applied in many fields, such as medicine, pharmacology, business, banking, engineering design, image processing and decision analysis.

DATA REPRESENTATION IN ROUGH SETS

Rough sets offer two forms of data representation: Information Systems (IS) and Decision Systems (DS).

Definition (Information System): An information system is a pair IS = (U, A), where U = {e1, e2, …, em} is a set of examples and A = {a1, a2, …, an} is a set of condition attributes.

In other words, an information system consists of a set of examples {e1, e2, …, em} and a set of condition attributes {a1, a2, …, an}. A simple information system is given in Table 1.

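| Example | Studies | Education | … | Works |
|---|---|---|---|---|
| E1 | Poor | SMU | … | Poor |
| E2 | Poor | SMU | … | Good |
| E3 | Moderate | SMU | … | Poor |
| E4 | Moderate | Diploma | … | Poor |
| E5 | Poor | SMU | … | Poor |
| E6 | Poor | SMU | … | Poor |
| E7 | Moderate | Diploma | … | Poor |
| E8 | Good | MSc | … | Good |
| E9 | Good | MSc | … | Good |
| E10 | Good | MSc | … | Good |
| … | … | … | … | … |
| E99 | Poor | SMU | … | Good |
| E100 | Moderate | Diploma | … | Poor |

Table 1: Information system objects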

Table 1 shows a simple information system. In an information system, each row represents an object and each column represents an attribute. The system in Table 1 consists of m objects, E1, E2, …, Em, and n attributes, such as Studies, Education, …, Works. In many applications the outcome of the classification is known; it is represented by a decision attribute C whose values are the decision classes C1, C2, …, Cp. The information system then becomes IS = (U, {A, C}), which is called a decision system (DS). A simple decision system is shown in Table 2.

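| Example | Studies | Education | … | Works | Income (D) |
|---|---|---|---|---|---|
| E1 | Poor | SMU | … | Poor | None |
| E2 | Poor | SMU | … | Good | Low |
| E3 | Moderate | SMU | … | Poor | Low |
| E4 | Moderate | Diploma | … | Poor | Low |
| E5 | Poor | SMU | … | Poor | None |
| E6 | Poor | SMU | … | Poor | None |
| E7 | Moderate | Diploma | … | Poor | Low |
| E8 | Good | MSc | … | Good | Medium |
| E9 | Good | MSc | … | Good | Medium |
| E10 | Good | MSc | … | Good | High |
| … | … | … | … | … | … |
| E99 | Poor | SMU | … | Good | Low |
| E100 | Moderate | Diploma | … | Poor | Low |

Table 2: Decision system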

Table 2 shows a simple decision system. It consists of m objects, E1, E2, …, Em, and n attributes: Studies, Education, …, Works and Income (D). In this table the n-1 attributes Studies, Education, …, Works are condition attributes, while Income is the decision attribute.
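As a minimal sketch of the two representations, a decision system can be held in Python as a mapping from each object to its attribute values, with the condition attributes A and the decision attribute listed separately; ignoring the decision column leaves the plain information system. The data are only the first ten rows of Table 2, restricted to the three visible condition attributes, since the elided "…" columns and rows cannot be reconstructed. The names `CONDITIONS`, `DECISION`, `U`, `condition_values` and `decision_classes` are hypothetical.

```python
# A decision system DS = (U, {A, C}) as plain Python data (illustrative sketch).
CONDITIONS = ("Studies", "Education", "Works")  # condition attributes A (visible columns only)
DECISION = "Income"                              # decision attribute

# First ten objects of Table 2; the elided "…" columns and rows are omitted.
RAW = [
    ("E1", "Poor", "SMU", "Poor", "None"),
    ("E2", "Poor", "SMU", "Good", "Low"),
    ("E3", "Moderate", "SMU", "Poor", "Low"),
    ("E4", "Moderate", "Diploma", "Poor", "Low"),
    ("E5", "Poor", "SMU", "Poor", "None"),
    ("E6", "Poor", "SMU", "Poor", "None"),
    ("E7", "Moderate", "Diploma", "Poor", "Low"),
    ("E8", "Good", "MSc", "Good", "Medium"),
    ("E9", "Good", "MSc", "Good", "Medium"),
    ("E10", "Good", "MSc", "Good", "High"),
]

# U maps each object name to a dict {attribute: value}.
U = {name: dict(zip(CONDITIONS + (DECISION,), values)) for name, *values in RAW}


def condition_values(obj):
    """Values of the condition attributes A for one object (one row of the IS)."""
    return tuple(U[obj][a] for a in CONDITIONS)


def decision_classes():
    """The set of values taken by the decision attribute (the classes C1, ..., Cp)."""
    return {row[DECISION] for row in U.values()}


print(condition_values("E2"))      # ('Poor', 'SMU', 'Good')
print(sorted(decision_classes()))  # ['High', 'Low', 'Medium', 'None']
```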

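Discerning Objects

The ways of discerning objects, namely the indiscernibility relation, equivalence classes and the discernibility matrix, are important concepts in rough set theory.

Indiscernibility Relation

Definition (Indiscernibility): Given a decision system DS = (U, {A, C}) and a subset of attributes B ⊆ A, two objects x and y are indiscernible with respect to B if they have the same value for every attribute in B, that is, a(x) = a(y) for all a ∈ B. The indiscernibility relation IND(B) is the set of all such pairs of objects; it is an equivalence relation on U.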
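The indiscernibility relation can be sketched in Python by grouping objects on their value vector for a chosen attribute subset B; each group is one block of the partition U/IND(B). This is an illustration only, using a hypothetical excerpt of Table 2 (its first ten rows and three visible condition attributes), with helper names (`indiscernible`, `partition`) that are not from the source.

```python
from collections import defaultdict

# Hypothetical excerpt of Table 2: first ten objects, visible columns only.
ATTRS = ("Studies", "Education", "Works", "Income")
RAW = [
    ("E1", "Poor", "SMU", "Poor", "None"),
    ("E2", "Poor", "SMU", "Good", "Low"),
    ("E3", "Moderate", "SMU", "Poor", "Low"),
    ("E4", "Moderate", "Diploma", "Poor", "Low"),
    ("E5", "Poor", "SMU", "Poor", "None"),
    ("E6", "Poor", "SMU", "Poor", "None"),
    ("E7", "Moderate", "Diploma", "Poor", "Low"),
    ("E8", "Good", "MSc", "Good", "Medium"),
    ("E9", "Good", "MSc", "Good", "Medium"),
    ("E10", "Good", "MSc", "Good", "High"),
]
U = {name: dict(zip(ATTRS, values)) for name, *values in RAW}


def indiscernible(x, y, B):
    """True if objects x and y have the same value for every attribute in B."""
    return all(U[x][a] == U[y][a] for a in B)


def partition(B):
    """U/IND(B): group objects by their tuple of values on the attributes in B."""
    blocks = defaultdict(set)
    for obj, row in U.items():
        blocks[tuple(row[a] for a in B)].add(obj)
    return list(blocks.values())


A = ("Studies", "Education", "Works")   # condition attributes only
print(indiscernible("E8", "E10", A))    # True
print(partition(A))
print(partition(("Studies",)))
```

Objects E8 and E10 land in the same block even though their Income values differ, because indiscernibility is judged on the attributes in B only.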

Equivalence Class

An equivalence class is a group of objects that have identical values for the attributes in A of the system (U, A), i.e. one block of the indiscernibility relation. From the decision system in Table 2 we can obtain the equivalence classes EC1 to EC5 shown in Table 3.

| Class | Studies (A) | Education (B) | Works (C) | Income | Num_obj |
|---|---|---|---|---|---|
| EC1 | Poor | SMU | Poor | None | 50 |
| EC2 | Poor | SMU | Good | Low | 5 |
| EC3 | Moderate | SMU | Poor | Low | 30 |
| EC4 | Moderate | Diploma | Poor | Low | 10 |
| EC5,1 | Good | MSc | Good | Medium | 4 |
| EC5,2 | Good | MSc | Good | High | 1 |

Table 3: Equivalence classes
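Assuming the same ten-row excerpt of Table 2, the rows of Table 3 can be produced by counting objects per combination of condition values and decision value, with the count playing the role of the Num_obj column. Keying on the decision as well is what splits EC5 into EC5,1 and EC5,2. Because only ten objects are used here, the counts are smaller than the 100-object counts in Table 3; `num_obj` is a hypothetical name.

```python
from collections import Counter

# Hypothetical excerpt of Table 2: (object, Studies, Education, Works, Income).
RAW = [
    ("E1", "Poor", "SMU", "Poor", "None"),
    ("E2", "Poor", "SMU", "Good", "Low"),
    ("E3", "Moderate", "SMU", "Poor", "Low"),
    ("E4", "Moderate", "Diploma", "Poor", "Low"),
    ("E5", "Poor", "SMU", "Poor", "None"),
    ("E6", "Poor", "SMU", "Poor", "None"),
    ("E7", "Moderate", "Diploma", "Poor", "Low"),
    ("E8", "Good", "MSc", "Good", "Medium"),
    ("E9", "Good", "MSc", "Good", "Medium"),
    ("E10", "Good", "MSc", "Good", "High"),
]

# Count objects per (condition values + decision) tuple, like the rows of Table 3.
num_obj = Counter(tuple(values) for _name, *values in RAW)

for row, count in num_obj.items():
    print(*row, count)
```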

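Class EC5 is indeterminate: its objects have identical condition values but lead to two different decisions (Medium and High). This situation can be handled with data-cleaning techniques. The rightmost column gives the number of objects in the decision system that belong to each class. The example in Table 3 can be simplified into a numerical representation (Studies: Poor = 1, Moderate = 2, Good = 3; Education: SMU = 2, Diploma = 3, MSc = 5; Works: Good = 1, Poor = 3; Income: None = 1, Low = 2, Medium = 3, High = 4). Table 4 shows the numerical representation of the equivalence classes of Table 3.

| Class | Studies (A) | Education (B) | Works (C) | Income | Num_obj |
|---|---|---|---|---|---|
| EC1 | 1 | 2 | 3 | 1 | 50 |
| EC2 | 1 | 2 | 1 | 2 | 5 |
| EC3 | 2 | 2 | 3 | 2 | 30 |
| EC4 | 2 | 3 | 3 | 2 | 10 |
| EC5,1 | 3 | 5 | 1 | 3 | 4 |
| EC5,2 | 3 | 5 | 1 | 4 | 1 |

Table 4: Numerical representation of the equivalence classes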
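The step from Table 3 to Table 4 is a per-attribute value coding. The sketch below assumes the codes that can be read off the two tables (for example Education: SMU = 2, Diploma = 3, MSc = 5) and also flags condition-value vectors that map to more than one decision, as EC5 does. `CODES`, `encode` and `inconsistent_classes` are illustrative names.

```python
# Value codes read off Tables 3 and 4.
CODES = {
    "Studies": {"Poor": 1, "Moderate": 2, "Good": 3},
    "Education": {"SMU": 2, "Diploma": 3, "MSc": 5},
    "Works": {"Poor": 3, "Good": 1},
    "Income": {"None": 1, "Low": 2, "Medium": 3, "High": 4},
}
ATTRS = ("Studies", "Education", "Works", "Income")

# Rows of Table 3: class name, attribute values, Num_obj.
TABLE3 = [
    ("EC1", "Poor", "SMU", "Poor", "None", 50),
    ("EC2", "Poor", "SMU", "Good", "Low", 5),
    ("EC3", "Moderate", "SMU", "Poor", "Low", 30),
    ("EC4", "Moderate", "Diploma", "Poor", "Low", 10),
    ("EC5,1", "Good", "MSc", "Good", "Medium", 4),
    ("EC5,2", "Good", "MSc", "Good", "High", 1),
]


def encode(row):
    """Replace each symbolic value by its numeric code (one row of Table 4)."""
    name, *values, count = row
    return (name, *(CODES[a][v] for a, v in zip(ATTRS, values)), count)


TABLE4 = [encode(r) for r in TABLE3]


def inconsistent_classes(table):
    """Condition-code vectors that are associated with more than one decision."""
    decisions = {}
    for _name, a, b, c, d, _count in table:
        decisions.setdefault((a, b, c), set()).add(d)
    return {cond: ds for cond, ds in decisions.items() if len(ds) > 1}


for row in TABLE4:
    print(*row)
print(inconsistent_classes(TABLE4))  # {(3, 5, 1): {3, 4}}
```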

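Discernibility Matrix

Definition (Discernibility Matrix): Given an information system IS = (U, A) and B ⊆ A, the discernibility matrix of IS is M_B, where each entry M_B(i, j) consists of the set of attributes in B whose values differ between objects x_i and x_j:

M_B(i, j) = {a ∈ B : a(x_i) ≠ a(x_j)}

Table 5 shows the discernibility matrix of Table 4, with a = Studies, b = Education and c = Works; x marks the diagonal.

| | EC1 | EC2 | EC3 | EC4 | EC5 |
|---|---|---|---|---|---|
| EC1 | x | c | a | ab | abc |
| EC2 | c | x | ac | abc | ab |
| EC3 | a | ac | x | b | abc |
| EC4 | ab | abc | b | x | abc |
| EC5 | abc | ab | abc | abc | x |

Table 5: Discernibility matrix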
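Because every entry depends only on a pair of classes, the discernibility matrix of Table 4 can be sketched with a pairwise comparison over the condition attributes a, b and c. This is a minimal illustration (the function name `discernibility_matrix` is hypothetical) that uses only the condition codes of EC1 to EC5.

```python
from itertools import combinations

# Condition codes of EC1-EC5 from Table 4 (a = Studies, b = Education, c = Works).
CLASSES = {
    "EC1": {"a": 1, "b": 2, "c": 3},
    "EC2": {"a": 1, "b": 2, "c": 1},
    "EC3": {"a": 2, "b": 2, "c": 3},
    "EC4": {"a": 2, "b": 3, "c": 3},
    "EC5": {"a": 3, "b": 5, "c": 1},
}


def discernibility_matrix(classes, B=("a", "b", "c")):
    """M_B(i, j) = set of attributes in B whose values differ between classes i and j."""
    M = {}
    for i, j in combinations(classes, 2):
        diff = frozenset(x for x in B if classes[i][x] != classes[j][x])
        M[i, j] = M[j, i] = diff  # the matrix is symmetric
    return M


M = discernibility_matrix(CLASSES)
names = list(CLASSES)
for i in names:
    cells = ["x" if i == j else "".join(sorted(M[i, j])) for j in names]
    print(i, *cells)
```

The printed rows reproduce the entries of Table 5.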

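Discernibility Matrix Modulo D

Given a decision system DS = (U, A ∪ {d}) and a subset of attributes B ⊆ A, the discernibility matrix modulo D of DS, M_B^d, is defined as follows: M_B^d(i, j) is the set of attributes in B whose values differ between objects x_i and x_j, provided that the two objects also differ in the decision attribute; otherwise the entry is empty (marked X in the table below):

M_B^d(i, j) = M_B(i, j) if d(x_i) ≠ d(x_j), and M_B^d(i, j) = ∅ otherwise.

| | EC1 | EC2 | EC3 | EC4 | EC5 |
|---|---|---|---|---|---|
| EC1 | X | c | a | ab | abc |
| EC2 | c | X | X | X | ab |
| EC3 | a | X | X | X | abc |
| EC4 | ab | X | X | X | abc |
| EC5 | abc | ab | abc | abc | X |

Discernibility matrix modulo D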
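A sketch of the modulo-D variant needs only one extra test: an entry is kept when the two classes have different decisions and left empty otherwise. One assumption is made here that the text does not spell out: the indeterminate class EC5 is given the decision set {3, 4} (Medium and High), so that it differs from every other class. The names `CLASSES` and `discernibility_matrix_mod_d` are hypothetical.

```python
from itertools import combinations

# Table 4 codes plus decision sets; EC5 keeps both decisions of EC5,1 and EC5,2 (assumption).
CLASSES = {
    "EC1": ({"a": 1, "b": 2, "c": 3}, frozenset({1})),
    "EC2": ({"a": 1, "b": 2, "c": 1}, frozenset({2})),
    "EC3": ({"a": 2, "b": 2, "c": 3}, frozenset({2})),
    "EC4": ({"a": 2, "b": 3, "c": 3}, frozenset({2})),
    "EC5": ({"a": 3, "b": 5, "c": 1}, frozenset({3, 4})),
}


def discernibility_matrix_mod_d(classes, B=("a", "b", "c")):
    """M_B^d(i, j) = M_B(i, j) if the decisions of i and j differ, else the empty set."""
    M = {}
    for i, j in combinations(classes, 2):
        (ci, di), (cj, dj) = classes[i], classes[j]
        if di == dj:
            diff = frozenset()  # same decision: nothing needs to be discerned
        else:
            diff = frozenset(x for x in B if ci[x] != cj[x])
        M[i, j] = M[j, i] = diff
    return M


M = discernibility_matrix_mod_d(CLASSES)
names = list(CLASSES)
for i in names:
    # "X" marks the diagonal and the empty entries, as in the table.
    cells = ["X" if i == j or not M[i, j] else "".join(sorted(M[i, j])) for j in names]
    print(i, *cells)
```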

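Reduct

A reduct is a minimal selection of attributes (interesting attributes) from the set of condition attributes, obtained from the prime implicants of a Boolean function. The set of all prime implicants determines the set of reducts. The discernibility matrix modulo D can be written as one Boolean function in conjunctive normal form (CNF) per class, as shown in Table 6: each non-empty entry in a class's row becomes a disjunction of its attributes, and these disjunctions are joined by conjunction.

| Class | CNF of Boolean function | Prime implicants | Reducts |
|---|---|---|---|
| E1 | c ∧ a ∧ (a ∨ b) ∧ (a ∨ b ∨ c) | a ∧ c | {a, c} |
| E2 | c ∧ (a ∨ b) | (a ∧ c) ∨ (b ∧ c) | {a, c}, {b, c} |
| E3 | a ∧ (a ∨ b ∨ c) | a | {a} |
| E4 | (a ∨ b) ∧ (a ∨ b ∨ c) | a ∨ b | {a}, {b} |
| E5 | (a ∨ b ∨ c) ∧ (a ∨ b) | a ∨ b | {a}, {b} |

Table 6: Boolean functions, prime implicants and reducts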
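Finding the prime implicants of these CNF formulas can be sketched as a search for minimal hitting sets: because the formulas contain no negated attributes, a conjunction of attributes is a prime implicant exactly when its attribute set intersects every clause and no proper subset does. The brute-force enumeration below is a swapped-in illustration rather than the author's method; it is exponential in the number of attributes but adequate for the three attributes a, b and c. The clauses are copied from the rows of the discernibility matrix modulo D, and `prime_implicants` is a hypothetical name.

```python
from itertools import combinations

# One clause (a disjunction of attributes) per non-empty entry in each class's row
# of the discernibility matrix modulo D.
CNF = {
    "E1": [{"c"}, {"a"}, {"a", "b"}, {"a", "b", "c"}],
    "E2": [{"c"}, {"a", "b"}],
    "E3": [{"a"}, {"a", "b", "c"}],
    "E4": [{"a", "b"}, {"a", "b", "c"}],
    "E5": [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"a", "b", "c"}],
}


def prime_implicants(clauses):
    """Prime implicants of a monotone CNF: minimal attribute sets that hit every clause."""
    attrs = sorted(set().union(*clauses))
    found = []
    # Try smaller sets first, so any superset of an earlier hit is not minimal.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            hits_all = all(s & clause for clause in clauses)
            is_minimal = not any(p <= s for p in found)
            if hits_all and is_minimal:
                found.append(s)
    return found


for cls, clauses in CNF.items():
    print(cls, prime_implicants(clauses))
```

Each returned set is one reduct for that class; for the five classes the result agrees with the Reducts column of Table 6.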

Generating Rules

The main process of discovering knowledge in databases is the extraction of rules from the decision system. The rough set method generates decision rules from the decision table based on the computation of the reduct set. Figure 1 shows the rule-generation process using reducts and equivalence classes.

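Equivalence classes:

| Class | a | b | c | Dec |
|---|---|---|---|---|
| E1 | 1 | 2 | 3 | 1 |
| E2 | 1 | 2 | 1 | 2 |
| E3 | 2 | 2 | 3 | 2 |
| E4 | 2 | 3 | 3 | 2 |
| E5,1 | 3 | 5 | 1 | 3 |
| E5,2 | 3 | 5 | 1 | 4 |

Reducts: [E1, {a, c}], [E2, {a, c}, {b, c}], [E3, {a}], [E4, {a}, {b}], [E5, {a}, {b}]

Rules:
(a = 1) ∧ (c = 3) → (d = 1)
(a = 1) ∧ (c = 1) → (d = 2); (b = 2) ∧ (c = 1) → (d = 2)
(a = 2) → (d = 2)
(b = 3) → (d = 2)
(a = 3) → (d = 3); (a = 3) → (d = 4)
(b = 5) → (d = 3); (b = 5) → (d = 4)

Figure 1: Rule generation from reducts and equivalence classes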
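The rule-generation step of Figure 1 can be sketched as follows: for each class and each of its reducts, the values of the reduct attributes become the rule condition and the class decision becomes the conclusion, with duplicate rules dropped. This is a minimal sketch under the assumption that the class codes and reducts are exactly those of Table 4 and Table 6; `generate_rules` is a hypothetical helper.

```python
# Encoded classes (condition codes, decisions) and reducts, copied from Tables 4 and 6.
CLASSES = {
    "E1": ({"a": 1, "b": 2, "c": 3}, [1]),
    "E2": ({"a": 1, "b": 2, "c": 1}, [2]),
    "E3": ({"a": 2, "b": 2, "c": 3}, [2]),
    "E4": ({"a": 2, "b": 3, "c": 3}, [2]),
    "E5": ({"a": 3, "b": 5, "c": 1}, [3, 4]),  # E5,1 and E5,2
}
REDUCTS = {
    "E1": [("a", "c")],
    "E2": [("a", "c"), ("b", "c")],
    "E3": [("a",)],
    "E4": [("a",), ("b",)],
    "E5": [("a",), ("b",)],
}


def generate_rules(classes, reducts):
    """One rule per (class, reduct, decision): reduct attribute values -> decision."""
    rules = []
    for cls, (values, decisions) in classes.items():
        for reduct in reducts[cls]:
            condition = tuple((attr, values[attr]) for attr in reduct)
            for d in decisions:
                rule = (condition, d)
                if rule not in rules:  # E3 and E4 both yield a = 2 -> d = 2
                    rules.append(rule)
    return rules


for condition, d in generate_rules(CLASSES, REDUCTS):
    lhs = " AND ".join(f"{attr} = {v}" for attr, v in condition)
    print(f"IF {lhs} THEN d = {d}")
```

It prints the nine rules of Figure 1 in the same order, including the two conflicting pairs produced by the indeterminate class E5.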
