
By: Ashmi Banerjee (125186) Suman Datta (1251132) CSE- 3rd year.


Presentation Transcript


  1. By: Ashmi Banerjee (125186) Suman Datta (1251132) CSE- 3rd year. DECISION TREE

  2. INTRODUCTION TO DECISION TREES Decision tree learning is one of the most widely used and practical methods for inductive inference. It is a method for approximating discrete-valued functions that is robust to noisy data and capable of learning disjunctive expressions. These learning methods are among the most popular of inductive inference algorithms and have been successfully applied to a broad range of tasks from learning to diagnose medical cases to learning to assess credit risk of loan applicants.

  3. DECISION TREE REPRESENTATION A decision tree is a classification model whose structure consists of a number of nodes and arcs. In general, a node is labelled by an attribute name, and an arc by a valid value of the attribute associated with the node from which the arc originates. The top-most node is called the root of the tree, and the bottom nodes are called the leaves. Each leaf is labelled by a class (value of the class attribute). When used for classification, a decision tree is traversed in a top-down manner, following the arcs with attribute values satisfying the instance that is to be classified. The traversal of the tree leads to a leaf node and the instance is assigned the class label of the leaf.
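The top-down traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the nested-dict encoding and the attribute names (taken from the PlayTennis example used later in the presentation) are assumptions.

```python
# Sketch: a decision tree as nested dicts. Each internal node is
# {attribute_name: {arc_value: subtree_or_leaf}}; a leaf is a class label.
# Tree shape assumed for illustration (PlayTennis-style attributes).
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(node, instance):
    """Traverse top-down: at each node, follow the arc whose value matches
    the instance's value for that node's attribute, until a leaf is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))                 # node is labelled by an attribute
        node = node[attribute][instance[attribute]]  # arc is labelled by its value
    return node                                      # leaf: the class label

print(classify(tree, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}))  # No
```

The instance is assigned the class label of the leaf the traversal ends in, exactly as the slide describes.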

  4. TYPES OF ATTRIBUTES • Binary Attributes • Nominal Attributes • Ordinal Attributes • Continuous Attributes

  5. BINARY ATTRIBUTES The test condition for a binary attribute generates two potential outcomes.

  6. NOMINAL ATTRIBUTES A nominal attribute can have many values. It can be split into multiple subsets, depending on the number of distinct values the attribute takes.

  7. ORDINAL ATTRIBUTES Ordinal attributes can also produce binary or multi-way splits. Their values can be grouped as long as the grouping does not violate the order property of the attribute values.

  8. An Illustrative Example To illustrate a decision tree, consider the learning task represented by the training examples of the following table. Here the target attribute PlayTennis, which can have values yes or no for different Saturday mornings, is to be predicted based on other attributes of the morning in question.
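The table itself is not reproduced in this transcript. Assuming it is the standard 14-example PlayTennis training set from Mitchell's *Machine Learning* (its class counts match the [9+, 5-] figures quoted on slide 14), the data can be written out as:

```python
# Assumed: Mitchell's standard PlayTennis training examples
# (Outlook, Temperature, Humidity, Wind) -> PlayTennis
training_examples = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]
positives = sum(1 for *_, label in training_examples if label == "Yes")
negatives = len(training_examples) - positives
print(positives, negatives)  # 9 5
```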

  9. An Illustrative Example contd.

  10. A model Decision Tree based on the Data

  11. Will it predict correctly for all data? But is our model a good one?

  12. MEASURES FOR SELECTING THE BEST FIT

  13. The smaller the degree of impurity in the leaf nodes, the more skewed the class distribution and the better the classification. The impurity can be measured as:
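The formulas on this slide are not in the transcript. Assuming the three standard impurity measures for a node t with class proportions p(i|t) (the first of which, entropy, is the measure used on the next slide), they are:

```latex
\begin{aligned}
\text{Entropy}(t) &= -\sum_{i} p(i \mid t)\,\log_2 p(i \mid t) \\
\text{Gini}(t) &= 1 - \sum_{i} p(i \mid t)^2 \\
\text{Classification error}(t) &= 1 - \max_{i}\, p(i \mid t)
\end{aligned}
```

All three are zero for a pure node (all examples in one class) and maximal when the classes are evenly mixed.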

  14. Humidity provides greater information gain than Wind, relative to the target classification. Here, E stands for entropy and S for the original collection of examples. Given an initial collection S of 9 positive and 5 negative examples, [9+, 5-], sorting these by Humidity produces the collections [3+, 4-] (Humidity = High) and [6+, 1-] (Humidity = Normal). The information gained by this partitioning is .151, compared to a gain of only .048 for the attribute Wind.

  15. An Exercise

  16. Thank you !
