CSE591 (575) Data Mining

CSE591 (575) Data Mining 1/21/2003 - 5/6/2003 Computer Science & Engineering ASU

Introduction Introduction to this Course Introduction to Data Mining

Introduction to the Course • First, about you - why take this course? • Your background and strength • AI, DBMS, Statistics, Biology, … • Your interests and requests • What is this course about? • Problem solving • Handling data • transform data to workable data • Mining data • turn data to knowledge • validation and presentation of knowledge

This course • What can you expect from this course? • Knowledge and experience about DM • Problem solving and solution presentation • How is this course conducted? • Presentations • Individual projects • Course Format • Individual Projects 40% • Exams and/or quizzes 40% • Class participation 20% • off-campus students?

Projects - Start NOW! • How to start? • Projects should be sufficiently challenging but reasonable, suitable for one semester • How to choose your individual project • Real-world problems • Problems that might make differences • Two types of projects • Available projects • Self-proposed projects (Approval’s needed)

Some project ideas • Dealing with high dimensional data • Data of supervised, unsupervised learning • Image mining • Feature extraction, clustering of images • Active sampling • Various data structures (kd-trees, R-trees, multidimensional scaling) • Meta data (RDF, namespace) for mining • Ensemble learning • Sequence mining (HMM learning) • Bioinformatics and applications (feature selection) • Intelligent driving data analysis • Data integration, data reduction (random projection)

How is a project evaluated? • It depends on • What do you want to achieve • Its impact • Your effort • The sooner you start, the better • The beginning is not easy

Course Web Site • http://www.public.asu.edu/~huanliu/cse591.html • My office and office hours • GWC 342 • T 10:30 - 11:30am and Th 4:00-5:00pm • My email: hliu@asu.edu • Slides and relevant information will be made available at the course web site

Any questions and suggestions? • Your feedback is most welcome! • I need it to adapt the course to your needs. • Please feel free to provide yours anytime. • Share your questions and concerns with the class – very likely others may have the same. • No pain no gain – no magic for data mining. • The more you put in, the more you get • Your grades are proportional to your efforts.

Introduction to Data Mining Definitions Motivations of DM Interdisciplinary Links of DM

What is DM? • Or more precisely KDD (knowledge discovery from databases)? • Many definitions • A process, not plug-and-play: raw data → transformed data → preprocessed data → data mining → post-processing → knowledge • One definition is • A non-trivial process of identifying valid, novel, useful and ultimately understandable patterns in data
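The process view of KDD on this slide can be sketched as a chain of small functions, one per stage. This is only an illustrative sketch: the toy shopping-basket data, the function names, and the choice of pair counting as the "mining" step are assumptions, not part of the course material.

```python
from collections import Counter

def transform(raw_rows):
    """Raw data -> transformed data: split comma-separated strings into item lists."""
    return [[item.strip().lower() for item in row.split(",")] for row in raw_rows]

def preprocess(rows):
    """Transformed data -> preprocessed data: drop empty and duplicate items, drop empty rows."""
    cleaned = []
    for row in rows:
        items = sorted({item for item in row if item})
        if items:
            cleaned.append(items)
    return cleaned

def mine(transactions):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for items in transactions:
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                counts[(items[i], items[j])] += 1
    return counts

def post_process(pair_counts, n_transactions, min_support=0.5):
    """Post-processing: keep only pairs whose support reaches the threshold."""
    return {
        pair: count / n_transactions
        for pair, count in pair_counts.items()
        if count / n_transactions >= min_support
    }

def kdd(raw_rows, min_support=0.5):
    """Run the whole chain: raw data in, filtered patterns ("knowledge") out."""
    transactions = preprocess(transform(raw_rows))
    return post_process(mine(transactions), len(transactions), min_support)

# Hypothetical raw input with inconsistent case, a blank item, and an empty row.
raw = ["bread, milk", "Bread, butter, milk", "milk, , eggs", "bread, milk, eggs", ""]
patterns = kdd(raw)
print(patterns)
```

Most of the code handles cleaning and filtering rather than the mining step itself, which is the point of calling KDD a process rather than plug-and-play.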

Need for Data Mining • Data accumulate and double every 9 months • There is a big gap from stored data to knowledge; and the transition won’t occur automatically. • Manual data analysis is not new but a bottleneck • Fast developing Computer Science and Engineering generates new demands • Seeking knowledge from massive data • Any personal experience?

When is DM useful • Data rich • Two invited talks so far have convincingly demonstrated it • Large data (dimensionality and size) • Image data (size) • Gene data (dimensionality) • Little knowledge about data (exploratory data analysis) • What if we have some knowledge?

DM perspectives • Prediction, description, explanation, optimization, and exploration • Completion of knowledge (patterns vs. models) • Understandability and representation of knowledge • Some applications • Business intelligence (CRM) • Security (Info, Comp Systems, Networks, Data, Privacy) • Scientific discovery (bioinformatics)

Challenges • Increasing data dimensionality and data size • Various data forms • New data types • Streaming data, multimedia data • Efficient search and data access • Intelligent update and integration

Interdisciplinary Links of DM • Statistics • Databases • AI • Machine Learning • Visualization • High Performance Computing • supercomputers, distributed/parallel/cluster computing

Statistics • Discovery of structures or patterns in data sets • hypothesis testing, parameter estimation • Optimal strategies for collecting data • efficient search of large databases • Static data • constantly evolving data • Models play a central role • algorithms are of a major concern • patterns are sought

Relational Databases • A relational database can contain several tables • Tables and schemas • The goal in data organization is to maintain data and quickly locate the requested data • Queries and index structures • Query execution and optimization • Query optimization is to find the best possible evaluation method for a given query • Providing fast, reliable access to data for data mining
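As a rough illustration of how an index structure changes the evaluation method chosen for a query, the sketch below uses Python's built-in sqlite3 module. The table, column, and index names are made up for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# In-memory database with one small, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customers (city, spend) VALUES (?, ?)",
    [("Tempe", 120.0), ("Phoenix", 80.5), ("Tempe", 42.0), ("Mesa", 310.0)],
)

query = "SELECT id, spend FROM customers WHERE city = ?"

def plan(sql, params):
    """Return the optimizer's chosen plan as text (last column of EXPLAIN QUERY PLAN)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# Without an index, the only option is to scan the whole table.
plan_before = plan(query, ("Tempe",))

# With an index on the filtered column, the optimizer can search the index instead.
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
plan_after = plan(query, ("Tempe",))

result = conn.execute(query + " ORDER BY id", ("Tempe",)).fetchall()
print(plan_before)
print(plan_after)
print(result)
```

The query text is identical in both runs; only the available access paths change, and the optimizer picks the cheaper one.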

AI • Intelligent agents • Perception-Action-Goal-Environment • Search • uniform cost and informed search algorithms • Knowledge representation • FOL, production rules, frames with semantic networks • Knowledge acquisition • Knowledge maintenance and application
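Uniform cost search, mentioned on this slide, expands the cheapest frontier node first. A minimal sketch follows, assuming a graph stored as a dictionary of neighbor-to-cost mappings; the graph and its edge costs are invented for illustration.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Return (cost, path) of a cheapest path from start to goal, or None if unreachable."""
    frontier = [(0, start, [start])]  # priority queue ordered by path cost
    best_cost = {start: 0}            # cheapest known cost to reach each node
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was already found
        for neighbor, step_cost in graph.get(node, {}).items():
            new_cost = cost + step_cost
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor, path + [neighbor]))
    return None

# Hypothetical weighted graph: the direct edge A -> C is costlier than going via B.
graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
result = uniform_cost_search(graph, "A", "D")
print(result)
```

An informed search such as A* would use the same loop but order the frontier by path cost plus a heuristic estimate of the remaining cost.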

Machine Learning • Focusing on complex representations, data-intensive problems, and search-based methods • Flexibility with prior knowledge and collected data • Generalization from data and empirical validation • statistical soundness and computational efficiency • constrained by finite computing & data resources • Challenges from KDD • scaling up, cost info, auto data preprocessing
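One common way to check "generalization from data and empirical validation" is to hold out part of the data and measure accuracy only on that unseen part. The sketch below assumes a one-feature nearest-neighbor classifier and synthetic, well-separated data; both are illustrative choices, not methods prescribed by the course.

```python
import random

def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x (1-NN on a single feature)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def holdout_accuracy(data, test_fraction=0.3, seed=0):
    """Shuffle, split into train/test, and report accuracy on the held-out part."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)

# Synthetic data: "low" values lie in [0.0, 1.9], "high" values in [5.0, 6.9].
data = [(x / 10, "low") for x in range(20)] + [(5 + x / 10, "high") for x in range(20)]
accuracy = holdout_accuracy(data)
print(accuracy)
```

Because the two classes are far apart, held-out accuracy is perfect here; on real data the held-out estimate is what reveals overfitting.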

Visualization • Producing a visual display with insights into the structure of the data with interactive means • zoom in/out, rotating, displaying detailed info • Various branches of visualization methods • show summary properties and explore relationships between variables • investigate large databases and convey lots of information • analyze data with geographic/spatial location • A pre- and post-processing tool for KDD

Bibliography • W. Klosgen & J.M. Zytkow, edited, 2001, Handbook of Data Mining and Knowledge Discovery.
