A Kit For Knowledge Discovery

    Presentation Transcript
    1. A Kit For Knowledge Discovery

    2. Data, data everywhere, yet ...
       • I can’t find the data I need
         • data is scattered over the network
         • many versions, subtle differences
       • I can’t get the data I need
         • need an expert to get the data
       • I can’t understand the data I found
         • available data poorly documented
       • I can’t use the data I found
         • results are unexpected
         • data needs to be transformed from one form to another

    3. • There is a sequence of steps (with eventual feedback loops) that should be followed to discover knowledge (e.g., patterns) in data.
       • Achieving a standardized process model

    4. What is KDD? Knowledge Discovery in Data is the significant method of evaluating (1) legitimate, (2) innovative, (3) probably useful, and accurate, understandable patterns in data.

    5. Knowledge Discovery Process [diagram]. Labels shown: Raw Data, Integration, Data Warehouse, Selection & Cleaning, Target Data, Transformation, Transformed Data, Data Mining, Patterns and Rules, Interpretation & Evaluation, Understanding, Knowledge.

    6. Outcomes of Data Mining
       • Clustering: based on attributes
       • Association: correlation of events
       • Sequencing: events ~ later predictions
       • Forecasting: the future
       • Classification: recognizing patterns

    7. Data Mining
       • Look for hidden patterns and trends in data that are not immediately apparent from summarizing the data

    8. Data Mining: Data + Interestingness criteria = Hidden patterns

    9. Data Mining: Data + Interestingness criteria = Hidden patterns (annotation: type of patterns)

    10. Data Mining: Data + Interestingness criteria = Hidden patterns (annotations: type of data, type of interestingness criteria)

    11. What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.

    12. What is Data Warehousing? A process of transforming data into information and making it available to users in a timely enough manner to make a difference. [Diagram: Data, Information]

    13. Data Mining Process
        1. Problem Definition
        2. Data Integration & Cleaning
        3. Model Framing & Evaluation
        4. Knowledge Discovery

    14. Data Mining Tasks: Basic Operations in DM
        • Descriptive:
          • Clustering / Similarity Matching
          • Association rules
          • Deviation detection
        • Predictive:
          • Regression
          • Classification
          • Collaborative Filtering

    15. Why Machine Learning? Growing flood of online data; budding industry; progress in algorithms and theory.
        • Data mining: using historical data to improve decisions
          • medical records ⇒ medical knowledge
          • log data to model users
        • Software applications we can’t program by hand
          • autonomous driving
          • speech recognition
        • Self-customizing programs
          • newsreader that learns user interests

    16. Data Mining / Machine Learning
        • Supervised: presence of a target attribute. Discover patterns in the data.
        • Unsupervised: data have no target attribute. Explore the data to find patterns.

    17. Applications Of Data Mining

    18. Applications of Data Mining
        • Fraud/Non-Compliance Anomaly detection
          • Isolate the factors that lead to fraud, waste and abuse
          • Target auditing and investigative efforts more effectively
        • Credit/Risk Scoring
        • Intrusion detection
        • Recruiting/Attracting customers
        • Maximizing profitability (cross selling, identifying profitable customers)
        • Service Delivery and Customer Retention
          • Build profiles of customers likely to use which services

    19. Tools For Data Mining: LinkOut, NCBI Sequin, RapidMiner, LIBSVM, ADaM, etc.

    20. Why Weka? Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

    21. About WEKA
        • Waikato Environment for Knowledge Analysis (WEKA)
        • Developed by the Department of Computer Science, University of Waikato, New Zealand
        • Machine learning/data mining software coded in Java
        • Used for research, education, and applications
        • Exclusively for KDD
        • Various versions are available, such as Version 2.3 (1998), Version 3.0 (1999), Version 3.4 (2003) and Version 3.6 (2008)

    22. Weka GUI Chooser

    23. A Vital Part In Weka: Explorer

    24. Weka! Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is well suited for developing new machine learning schemes.

    25. Weka’s Structural Layout
        • Explorer: an environment for exploring data with WEKA
        • Experimenter: performing experiments and conducting statistical tests between learning schemes
        • Knowledge Flow: supports the same functions as the Explorer but with drag-and-drop
        • Simple CLI: provides a simple command-line interface that allows direct execution of WEKA commands

    26. Algorithms

    27. WEKA File: Attribute-Relation File Format (ARFF)
        • WEKA stores data in flat files (ARFF format).
        • It is easy to transform an Excel file to ARFF format.
        • An ARFF file consists of a list of instances.
        • An ARFF file can be created using Notepad or Word (saved as plain text).
        • The name of the dataset is given with @relation
        • Attribute information is given with @attribute
        • The data follow @data
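As a rough illustration of the three ARFF sections named above, the following Python sketch renders a small table as ARFF text. The helper `to_arff`, the relation name `weather_toy` and the toy rows are hypothetical and invented for this example; they are not taken from the presentation. Missing values are written as `?`, which is ARFF's marker for an unknown value.

```python
def to_arff(relation, attributes, rows):
    """Render a dataset as ARFF text.

    attributes: list of (name, type) pairs, where type is either the string
    "numeric" or a list of nominal values.
    rows: list of value lists; None is written as "?" (missing value).
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            # Nominal attributes list their allowed values in braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical toy dataset, invented for illustration only.
ATTRIBUTES = [
    ("outlook", ["sunny", "overcast", "rainy"]),
    ("temperature", "numeric"),
    ("play", ["yes", "no"]),
]
ROWS = [
    ["sunny", 85, "no"],
    ["overcast", 83, "yes"],
    ["rainy", None, "yes"],
]

if __name__ == "__main__":
    print(to_arff("weather_toy", ATTRIBUTES, ROWS), end="")
```

Saving the printed text to a file with the `.arff` extension should give a file that can be opened from the Explorer's Preprocess panel. The sketch does not quote names or values that contain spaces, which a real converter would need to handle.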

    28. Sample ARFF

    29. Intrinsic Operations
        1. Preprocess
        2. Classify
        3. Cluster
        4. Associate
        5. Select Attributes

    30. Pre-Processing

    31. Preprocessing
        • Changing data formats as needed.
        • Varies with the dataset being mined.
        • Some of the preprocessing steps:
          • Adding/removing attributes
          • Attribute value substitution
          • Discretization (MDL, Kononenko, etc.)
          • Time series filters (delta, shift)
          • Sampling, randomization
          • Missing value management
          • Normalization and other numeric transformations
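Three of the preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Weka's filter implementation: missing values are filled with the attribute mean, normalization is min-max scaling to [0, 1], and discretization uses simple equal-width bins rather than the MDL or Kononenko methods named on the slide. All function names and the sample values are hypothetical.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Map each value to a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


if __name__ == "__main__":
    raw = [10.0, None, 20.0, 30.0]  # hypothetical numeric attribute
    filled = fill_missing_with_mean(raw)
    print(filled)
    print(min_max_normalize(filled))
    print(equal_width_bins(filled, 2))
```

The order matters in this sketch: missing values are filled first, because `min` and `max` cannot compare `None` with numbers.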

    32. Algorithms

    33. Pre-Processing: Opening Files, Current Relation, Operations
        • Browse for the data file in the local file system.
        • Relations
        • Instances
        • Schema
        • Attributes
        • Filters

    34. Weka – Formulating Files

    35. Dataset: .txt Format

    36. Weka ~ Datasets

    37. Missing Values

    38. GenericObjectEditor
        • A property editor for objects that have been defined as editable in the GenericObjectEditor configuration file, which lists the possible values that can be selected and then configured themselves.
        • The configuration file is called "GenericObjectEditor.props" and may live in either the location given by "user.home" or the current directory (the latter takes precedence); a default properties file is read from the Weka distribution.

    39. Weka ~ GenericObjectEditor
        • This editor allows you to configure a filter.
        • The same kind of dialog box is used to configure other objects, such as classifiers and clusterers.

    40. Sample - Cluster Attributes for Cluster

    41. Weka’s Viewer

    42. PCA Analysis

    43. Pre-Processing Retrievals Before After

    44. Retrieving Significant Attributes

    45. Select Attribute !

    46. Algorithms

    47. Feature Selection
        • Some columns are noisy or redundant. This noise makes it more difficult to discover meaningful patterns in the data.
        • To discover quality patterns, most data mining algorithms require a much larger training data set when the data are high-dimensional.
        • Feature selection, also known as variable selection, feature reduction, attribute selection or variable subset selection, is the technique of selecting a subset of relevant features for building robust learning models.

    48. Attribute Selection
        • Attribute selection involves searching through all possible combinations of attributes in the data to find which subset of attributes works best for prediction.
        • To do this, two objects must be set up:
          • The evaluator determines what method is used to assign a worth to each subset of attributes.
          • The search method determines what style of search is performed.
        • The Attribute Selection Mode box has two options:
          1. Use full training set.
          2. Cross-validation.
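The split between an evaluator and a search method described above can be sketched as two small Python functions. This is a toy illustration under stated assumptions, not Weka's implementation: the evaluator scores a subset by the training-set accuracy of a majority-vote lookup table, and the search method simply tries every non-empty subset. The names `evaluate_subset` and `exhaustive_search` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def evaluate_subset(rows, labels, subset):
    """Evaluator: training-set accuracy of a majority-vote lookup table
    built on the attribute indices in `subset` (a toy worth measure)."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        votes[key][label] += 1
    # For each distinct key, the majority label is counted as correct.
    correct = sum(counter.most_common(1)[0][1] for counter in votes.values())
    return correct / len(rows)


def exhaustive_search(rows, labels, n_attributes):
    """Search method: try every non-empty subset, smallest first, and keep
    the first subset that reaches the best score."""
    best_subset, best_score = (), -1.0
    for size in range(1, n_attributes + 1):
        for subset in combinations(range(n_attributes), size):
            score = evaluate_subset(rows, labels, subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical data: the label depends only on the attribute at index 1.
ROWS = [("a", "x", "p"), ("a", "y", "q"), ("b", "x", "q"), ("b", "y", "p")]
LABELS = ["yes", "no", "yes", "no"]

if __name__ == "__main__":
    print(exhaustive_search(ROWS, LABELS, 3))
```

Scoring on the full training set, as done here, corresponds to the first mode on the slide and tends to flatter larger subsets; cross-validation, the second mode, scores each subset on held-out data instead. Exhaustive search is only feasible for a handful of attributes, since the number of subsets doubles with each attribute added.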

    49. Attribute Selection
        • Very flexible: arbitrary combination of search and evaluation methods
        • Both filtering and wrapping methods
        • Search methods
          • best-first
          • genetic
          • ranking ...
        • Evaluation measures
          • Relief
          • information gain
          • gain ratio ...
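To make one of these combinations concrete, the sketch below pairs information gain (as the evaluation measure) with ranking (as the search method) in plain Python. It assumes nominal attributes and uses hypothetical attribute names and data; it is an illustration of the textbook formula, not Weka's code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    the instances by the values of one nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(rows, labels, names):
    """Ranking search: score every attribute alone, best first."""
    gains = []
    for i, name in enumerate(names):
        column = [row[i] for row in rows]
        gains.append((name, information_gain(column, labels)))
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: only the attribute named "second" determines the label.
NAMES = ["first", "second", "third"]
ROWS = [("a", "x", "p"), ("a", "y", "q"), ("b", "x", "q"), ("b", "y", "p")]
LABELS = ["yes", "no", "yes", "no"]

if __name__ == "__main__":
    for name, gain in rank_attributes(ROWS, LABELS, NAMES):
        print(f"{name}: {gain:.3f}")
```

Ranking scores each attribute in isolation, so it is a filtering method: it is cheap, but it cannot see that two attributes are redundant or only useful together, which is where subset searches such as best-first come in.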

    50. Applying Algorithm