Data Mining: Data Mining Research
Romi Satria Wahono
romi@romisatriawahono.net
http://romisatriawahono.net
+6281586220090
Romi Satria Wahono
• SD Sompok Semarang (1987)
• SMPN 8 Semarang (1990)
• SMA Taruna Nusantara, Magelang (1993)
• Bachelor's, master's, and doctoral (on-leave) studies, Department of Computer Sciences, Saitama University, Japan (1994-2004)
• Research Interests: Software Engineering and Intelligent Systems
• Founder of IlmuKomputer.Com
• Researcher at LIPI (2004-2007)
• Founder and CEO of PT Brainmatics Cipta Informatika
Course Outline
• Introduction to Data Mining
• The Data Mining Process
• Evaluation and Validation in Data Mining
• Data Mining Methods and Algorithms
• Data Mining Research
Data Mining Research
• Standard Process for Data Mining Research
• Journal Publications on Data Mining
• Research on Classification
• Research on Clustering
• Research on Prediction
• Research on Association Rules
Data Mining Standard Process (CRISP-DM)
• A cross-industry standard was clearly required that is industry-neutral, tool-neutral, and application-neutral
• The Cross-Industry Standard Process for Data Mining (CRISP-DM) was developed in 1996 (Chapman, 2000)
• CRISP-DM provides a nonproprietary and freely available standard process for fitting data mining into the general problem-solving strategy of a business or research unit
1. Business Understanding Phase
• Enunciate the project objectives and requirements clearly in terms of the business or research unit as a whole
• Translate these goals and restrictions into the formulation of a data mining problem definition
• Prepare a preliminary strategy for achieving these objectives
2. Data Understanding Phase
• Collect the data
• Use exploratory data analysis to familiarize yourself with the data and discover initial insights
• Evaluate the quality of the data
• If desired, select interesting subsets that may contain actionable patterns
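As an illustration of the exploratory step above, the following is a minimal sketch that profiles a numeric column and counts missing values. The records, the column names (`age`, `income`, `approved`), and the `profile` helper are hypothetical and are not taken from the course material.

```python
from statistics import mean, median

# Hypothetical credit-application records; None marks a missing value.
records = [
    {"age": 25, "income": 3200, "approved": "yes"},
    {"age": 41, "income": None, "approved": "no"},
    {"age": 37, "income": 5400, "approved": "yes"},
    {"age": None, "income": 2100, "approved": "no"},
    {"age": 52, "income": 7800, "approved": "yes"},
]

def profile(rows, column):
    """Summarize one numeric column: row count, missing values, basic statistics."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(rows),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

age_profile = profile(records, "age")
income_profile = profile(records, "income")
print(age_profile)
print(income_profile)
```

A summary like this gives a first view of data quality (how many values are missing) and of the value ranges before any preparation is attempted.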
3. Data Preparation Phase
• Prepare from the initial raw data the final data set that is to be used for all subsequent phases. This phase is very labor intensive
• Select the cases and variables you want to analyze and that are appropriate for your analysis
• Perform transformations on certain variables, if needed
• Clean the raw data so that it is ready for the modeling tools
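The three preparation tasks above (selecting variables, cleaning, transforming) can be sketched as follows. This is only one possible approach under stated assumptions: the data and helper names are hypothetical, cleaning is done by dropping incomplete rows, and the transformation shown is min-max scaling.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 25, "income": 3200, "approved": "yes"},
    {"id": 2, "age": 41, "income": None, "approved": "no"},
    {"id": 3, "age": 37, "income": 5400, "approved": "yes"},
    {"id": 4, "age": 52, "income": 7800, "approved": "no"},
]

def select(rows, columns):
    """Keep only the variables chosen for analysis."""
    return [{c: r[c] for c in columns} for r in rows]

def drop_missing(rows):
    """Clean the data by removing rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def min_max_scale(rows, column):
    """Transform one numeric column to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for a constant column
    return [{**r, column: (r[column] - lo) / span} for r in rows]

prepared = drop_missing(select(raw, ["age", "income", "approved"]))
for col in ("age", "income"):
    prepared = min_max_scale(prepared, col)

for row in prepared:
    print(row)
```

Dropping rows is the simplest cleaning strategy; imputing missing values is a common alternative when data is scarce.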
4. Modeling Phase
• Select and apply appropriate modeling techniques
• Calibrate model settings to optimize results
• Remember that often, several different techniques may be used for the same data mining problem
• If necessary, loop back to the data preparation phase to bring the form of the data into line with the specific requirements of a particular data mining technique
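To make "calibrate model settings" concrete, here is a minimal sketch that tries several values of k for a k-nearest-neighbor classifier and keeps the one with the best accuracy on a hold-out set. The choice of k-NN, the data points, and the candidate values of k are assumptions made for illustration; the slides do not prescribe a particular technique here.

```python
from collections import Counter
from math import dist

# Hypothetical, already-scaled data as (features, label) pairs.
# The point (0.35, 0.45) is a deliberately noisy "yes" inside the "no" region.
train = [((0.1, 0.2), "no"), ((0.2, 0.1), "no"), ((0.3, 0.3), "no"),
         ((0.35, 0.45), "yes"),
         ((0.7, 0.8), "yes"), ((0.8, 0.7), "yes"), ((0.9, 0.9), "yes")]
valid = [((0.15, 0.25), "no"), ((0.85, 0.75), "yes"), ((0.4, 0.4), "no")]

def knn_predict(train_rows, point, k):
    """Predict the majority label among the k training points nearest to point."""
    nearest = sorted(train_rows, key=lambda row: dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train_rows, test_rows, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in test_rows)
    return hits / len(test_rows)

# Calibrate the model setting k on the hold-out set.
scores = {k: accuracy(train, valid, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With k = 1 the noisy training point misleads the classifier, while larger k values smooth it out, which is why the setting is worth calibrating rather than fixing in advance.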
5. Evaluation Phase
• Evaluate the one or more models delivered in the modeling phase for quality and effectiveness before deploying them for use in the field
• Determine whether the model in fact achieves the objectives set for it in the first phase
• Establish whether some important facet of the business or research problem has not been accounted for sufficiently
• Come to a decision regarding use of the data mining results
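Checking a model against the objectives of the first phase can be sketched as below: compute a confusion matrix and standard metrics, then compare them with a target. The labels and the minimum recall of 0.8 are hypothetical stand-ins for an objective that would come from the business understanding phase.

```python
# Hypothetical actual and predicted labels from a credit-approval model.
actual    = ["yes", "yes", "no", "no", "yes", "no",  "yes", "no"]
predicted = ["yes", "no",  "no", "no", "yes", "yes", "yes", "no"]

def confusion(actual_labels, predicted_labels, positive="yes"):
    """Count true/false positives and negatives for the given positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual_labels, predicted_labels):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p == positive:
            fp += 1
        elif a == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion(actual, predicted)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Assumed objective from phase 1: recall of at least 0.8.
MIN_RECALL = 0.8
meets_objective = recall >= MIN_RECALL
print(accuracy, precision, recall, meets_objective)
```

In this made-up case the model falls short of the assumed target, so the decision would be to return to an earlier phase rather than deploy.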
6. Deployment Phase
• Make use of the models created: model creation does not signify the completion of a project
• Example of a simple deployment: Generate a report
• Example of a more complex deployment: Implement a parallel data mining process in another department
• For businesses, the customer often carries out the deployment based on your model
Exercise
• Study and understand Case Studies 1-5 in Chapter 1 of Larose (2005)
• Study and understand how CRISP-DM is applied in Firmansyah's thesis (2011) on using the C4.5 algorithm to determine creditworthiness
Transactions and Journals
• Review Paper (survey and state-of-the-art):
  • ACM Computing Surveys (CSUR)
• Research Paper (technical):
  • ACM Transactions on Knowledge Discovery from Data (TKDD)
  • ACM Transactions on Information Systems (TOIS)
  • IEEE Transactions on Knowledge and Data Engineering
  • Springer Data Mining and Knowledge Discovery
  • International Journal of Business Intelligence and Data Mining (IJBIDM)
Cognitive Assignment III
• Read one scientific paper, published in a journal between 2010 and 2012, that relates to the data mining methods we have studied
• Summarize it as slides with the following structure:
  • Research Background
  • Problem Statements
  • Research Questions
  • Research Objective
  • Existing Methods
  • Proposed Method
  • Results
  • Conclusion
• Present it to the class at the next session
References
• Ian H. Witten, Eibe Frank, Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Elsevier, 2011
• Daniel T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005
• Florin Gorunescu, Data Mining: Concepts, Models and Techniques, Springer, 2011
• Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Elsevier, 2006
• Oded Maimon and Lior Rokach, Data Mining and Knowledge Discovery Handbook, Second Edition, Springer, 2010
• Warren Liao and Evangelos Triantaphyllou (eds.), Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications, World Scientific, 2007