Abstract

Data mining is the process of finding needed information in large amounts of data retrieved from different databases. Classification is a data mining technique that assigns items in a collection to target categories or classes; the goal of classification is to accurately predict the target class for each case in the data. The decision tree (DT) algorithm is a powerful and popular classification tool. It produces rules that can be understood by humans and used in knowledge systems such as databases. A DT is represented by a hierarchical tree structure and is used in many different disciplines, including diagnosis, cognitive science, artificial intelligence, game theory, engineering, and data mining. This paper is a study of decision tree learning algorithms.

Keywords: Hierarchical tree, Decision Tree algorithm, Classification

I. INTRODUCTION

Data mining is used to extract previously unknown patterns from large databases. It uses two types of approaches: supervised learning and unsupervised learning. Classification is based on the supervised learning approach. Classification methods aim to identify the class to which an object belongs from some of its descriptive attributes. Classification is useful in a wide range of human activities, and predominantly in automated decision making.

II. DECISION TREE

Decision trees are a simple but powerful form of multiple variable analysis. A DT is a flow-chart-like tree structure, which makes it especially attractive in a data mining environment. A DT can handle high-dimensional data and can break a complex decision-making process down into a collection of simpler decisions, providing a solution that is easy to interpret.

The reasons are:

1. Intuitive representation: the resulting model is easy for humans to assimilate.

2. A DT does not require any parameter setting from the user and is well suited to exploratory knowledge discovery.

3. A DT can be constructed relatively quickly, with accuracy comparable to that of other classification methods.

The rules corresponding to the tree are easily derived by traversing each path from the root node to a leaf. Many different leaves of the tree may refer to the same class label, but each leaf corresponds to a different rule. The parts of the tree are denoted as follows:

• Internal node: a test on an attribute.

• Branch node: an outcome of the test.

• Leaf node: a class distribution.

• Arc/edge: a split on one attribute.

• Path: a conjunction of tests leading to the final decision.

[Figure omitted: a decision tree whose root node tests body temperature (warm or cold); the cold branch leads to a non-mammals leaf, while the warm branch leads to an internal node testing "gives birth", with yes leading to mammals and no to non-mammals.]

Fig 1: Decision Tree
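To make this structure concrete, the following is a minimal sketch in Python; the class and attribute names are our own illustration, not from any particular library, and the example tree is our reading of Fig 1.

```python
# A minimal sketch of the node structure described above; names are
# illustrative assumptions, not from the paper or a specific library.

class LeafNode:
    """Leaf node: holds a class label (or class distribution)."""
    def __init__(self, label):
        self.label = label

class InternalNode:
    """Internal node: a test on one attribute; each arc maps an
    outcome of the test to a child node."""
    def __init__(self, attribute, branches):
        self.attribute = attribute   # attribute tested at this node
        self.branches = branches     # dict: test outcome -> child node

# The tree of Fig 1 as we read it: the root tests body temperature,
# then warm-blooded animals are tested on whether they give birth.
tree = InternalNode("body_temperature", {
    "cold": LeafNode("non-mammal"),
    "warm": InternalNode("gives_birth", {
        "yes": LeafNode("mammal"),
        "no":  LeafNode("non-mammal"),
    }),
})
```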

DT features:

• They produce results that communicate very well in symbolic and visual terms, and they have the ability to incorporate multiple predictors in a simple step-by-step fashion.

• They incorporate various levels of measurement, including qualitative (e.g., good/bad) and quantitative measures.

• They adapt to various twists and turns in data, such as unbalanced effects, nested effects, offsetting effects, interactions, and nonlinearities, which frequently defeat other one-way and multi-way statistical and numeric approaches.

• They are nonparametric and highly robust, and they produce similar results regardless of the level of measurement of the fields used to construct the tree's branches.

Decision tree parameters:

1. Minimal size for split: The size of a node is the number of examples in its subset. Only those nodes whose size is greater than or equal to the minimal size for split parameter are split. The range is -1 to +∞ and the type must be an integer. The default is 4.

2. Minimal leaf size: The size of a leaf is the number of examples in its subset. The tree is generated in such a way that every leaf has at least the minimal leaf size number of examples. The range is -1 to +∞ and the type must be an integer. The default is 2.

3. Minimal gain: The gain of a node is calculated before splitting it. The node is split only if its gain is greater than the minimal gain. A higher value of minimal gain results in fewer splits and thus a smaller tree; a value that is too high will prevent splitting entirely, and a tree with a single node is generated. The range is 0.0 to +∞ and the type must be real. The default is 0.1.

4. Maximal depth of tree: The depth of a tree varies depending on the size and characteristics of the example set. This parameter restricts the depth of the decision tree. If its value is set to -1, the maximal depth parameter puts no bound on the depth of the tree; in this case the tree is built until other stopping criteria are met. If its value is set to 1, a tree with a single node is generated. The default is 20.

5. Decision criterion: Selects the criterion on which attributes will be selected for splitting. It can have one of the following values:

• Information gain: The entropies of all the attributes are calculated, and the attribute whose split yields the least weighted entropy (i.e., the highest information gain) is selected. This method has a bias towards selecting attributes with a large number of values.

• Gain ratio: A variant of information gain that adjusts the information gain of each attribute to allow for the breadth and uniformity of the attribute's values.

• Gini index: A measure of inequality between the distributions of label characteristics. Splitting on the chosen attribute results in a reduction in the average Gini index of the resulting subsets.

• Accuracy: The attribute that maximizes the accuracy of the whole tree is selected for splitting.
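To illustrate these criteria, here is a small self-contained sketch in Python (our own example, not from the paper) that computes entropy, information gain, gain ratio, and the Gini index for one candidate split:

```python
import math

def entropy(labels):
    """Entropy of a list of class labels: -sum(p * log2(p))."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def gini(labels):
    """Gini index of a list of class labels: 1 - sum(p^2)."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

def information_gain(parent, subsets):
    """Parent entropy minus the size-weighted entropy of the subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

def gain_ratio(parent, subsets):
    """Information gain divided by split information, which penalizes
    attributes that fragment the data into many small subsets."""
    n = len(parent)
    split_info = -sum(len(s) / n * math.log2(len(s) / n)
                      for s in subsets if s)
    g = information_gain(parent, subsets)
    return g / split_info if split_info > 0 else 0.0

# Toy split: 10 labels partitioned by a binary attribute.
parent = ["yes"] * 6 + ["no"] * 4
subsets = [["yes"] * 5 + ["no"] * 1, ["yes"] * 1 + ["no"] * 3]
print(information_gain(parent, subsets))   # gain of this split
print(gain_ratio(parent, subsets))
print(gini(parent), [gini(s) for s in subsets])
```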

The construction of the decision tree involves three main phases:

Construction phase: The initial decision tree is constructed in this phase, based on the entire training data set. It requires recursively partitioning the training set into two or more sub-partitions using a splitting criterion, until a stopping criterion is met.

Pruning phase: The tree constructed in the previous phase may not result in the best possible set of rules due to over-fitting. The pruning phase removes some of the lower branches and nodes to improve the tree's performance.

Processing the pruned tree: The pruned tree is processed to improve understandability.
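As a hedged illustration of how these parameters and phases surface in practice, the following sketch uses scikit-learn's DecisionTreeClassifier; the mapping from the parameters above to scikit-learn's arguments is our assumption, not something the paper specifies, and cost-complexity pruning (ccp_alpha) stands in for the pruning phase:

```python
# A sketch with scikit-learn; the parameter mapping is our assumption.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    criterion="entropy",        # decision criterion (information gain)
    min_samples_split=4,        # roughly: minimal size for split
    min_samples_leaf=2,         # roughly: minimal leaf size
    min_impurity_decrease=0.1,  # roughly: minimal gain
    max_depth=20,               # roughly: maximal depth of tree
    ccp_alpha=0.01,             # cost-complexity pruning (pruning phase)
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```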

III. DECISION TREE ALGORITHM

A decision tree (DT) model is a computational model consisting of three parts:

1) A decision tree, as defined above.

2) An algorithm to create the tree.

3) An algorithm that applies the tree to data and solves the problem under consideration.

Algorithm:

Input: T // decision tree
       D // input database
Output: M // model prediction

DTProc algorithm: the simplest algorithm to illustrate the prediction technique using a DT.

for each t ∈ D do
    n = root node of T;
    while n is not a leaf node do
        obtain the answer to the question on n applied to t;
        identify the arc from n which contains the correct answer;
        n = node at end of this arc;
    make a prediction for t based on the label of n;
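A runnable Python rendering of this walk, reusing the illustrative LeafNode/InternalNode classes sketched after Fig 1 (again our own sketch, not the paper's code):

```python
def predict(tree, t):
    """Walk the tree for one tuple t (a dict of attribute -> value),
    following the matching arc at each test until a leaf is reached."""
    n = tree                              # n = root node of T
    while isinstance(n, InternalNode):    # while n is not a leaf node
        answer = t[n.attribute]           # answer to the question on n
        n = n.branches[answer]            # arc containing the answer
    return n.label                        # prediction from the leaf

# One tuple classified with the Fig 1 tree sketched earlier.
t = {"body_temperature": "warm", "gives_birth": "yes"}
print(predict(tree, t))  # -> "mammal"
```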

Types of decision tree algorithms:

• Hunt's algorithm

• CART

• SPRINT

• J48

• C4.5

• Random Forest

IV. ADVANTAGES AND DISADVANTAGES OF DECISION TREE

1. Advantage: Decision trees are self-explanatory and easy to read; even someone who is not familiar with data mining can interpret a decision tree if it is of small size.
   Disadvantage: The large majority of inducers require the target value of the data instances to be discrete-valued only.

2. Advantage: Both categorical and numerical (continuous) attributes can be handled.
   Disadvantage: Decision trees cannot represent identical subtrees once, so subtrees are replicated, since every path is mutually exclusive.

3. Advantage: The data may contain missing attribute values and decision trees will still handle it.
   Disadvantage: Decision trees are greedy; they sometimes become overly sensitive and choose a split due to noise or an irrelevant attribute, which produces a wrong split and harms accuracy.

4. Advantage: The data may contain errors and decision trees will still handle it; they are robust to noise.
   Disadvantage: Lower nodes may not contain enough instances to make a statistically significant decision about their class representation; this is known as the data fragmentation problem.

5. Advantage: Models that require large training datasets are relatively easy to construct, computationally inexpensive (in comparison with other machine learning algorithms), and fast.
   Disadvantage: Handling missing data requires a lot of computation, which is a drawback in terms of computational time.

Table 1: Advantages and Disadvantages

V. PERFORMANCE EVALUATION OF DECISION TREE

CONFUSION MATRIX

A confusion matrix is a table that is often used to describe the performance of a decision tree model (or "classifier") on a set of test data for which the true values are known. The main-diagonal elements TP (true positive) and TN (true negative) count correctly classified instances, while the off-diagonal elements FP (false positive) and FN (false negative) count incorrectly classified instances.

Actual vs Predicted   | Predicted: Survived  | Predicted: Not Survived
Actual: Survived      | True Positive (TP)   | False Negative (FN)
Actual: Not Survived  | False Positive (FP)  | True Negative (TN)

Table 2: Actual vs Predicted Confusion Matrix

Total number of instances = correctly classified instances + incorrectly classified instances

Correctly classified instances = TP + TN

Incorrectly classified instances = FP + FN
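As a small sketch of how these counts are tallied from paired actual/predicted labels (the class labels and function name are our own illustration):

```python
def confusion_counts(actual, predicted, positive="survived"):
    """Tally TP, TN, FP, FN for a binary problem; 'positive' names
    the class treated as the positive label."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive: tp += 1   # predicted positive, truly positive
            else:             fp += 1   # predicted positive, truly negative
        else:
            if a == positive: fn += 1   # predicted negative, truly positive
            else:             tn += 1   # predicted negative, truly negative
    return tp, tn, fp, fn

actual    = ["survived", "survived", "not", "not", "survived"]
predicted = ["survived", "not", "not", "survived", "survived"]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp + tn, fp + fn)  # correctly vs incorrectly classified instances
```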

PRECISION

Precision is the ratio of modules correctly classified as fault-prone to the total number of modules classified as fault-prone; it is the proportion of units correctly predicted as faulty.

Precision = TP / (TP + FP)

RECALL

Recall is the ratio of modules correctly classified as fault-prone to the total number of faulty modules.

Recall = TP / (TP + FN)

COST MATRIX

A cost matrix is similar to a confusion matrix, the minor difference being that the cost of misclassification is evaluated through the misclassification error rate.

Misclassification error rate = 1 - Accuracy

F-MEASURE

The F-measure (FM) is a combination of recall and precision; it is defined as the harmonic mean of precision and recall.

F-Measure = 2 * Recall * Precision / (Recall + Precision)

ACCURACY

Accuracy is defined as the ratio of correctly classified instances to the total number of instances.

Accuracy = (TP + TN) / (TP + FP + TN + FN)

CALCULATING TPR, TNR, FPR AND FNR

For a good classifier, TPR and TNR should both be close to 100%, and the same holds for the precision and accuracy parameters. Conversely, FPR and FNR should both be as close to 0% as possible. The true positive rate, true negative rate, false positive rate, and false negative rate are calculated as shown below (note that FPR and FNR are the complements of TNR and TPR, respectively):

TPR = TP / (TP + FN)

TNR = TN / (FP + TN)

FPR = FP / (FP + TN)

FNR = FN / (TP + FN)
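Putting these formulas together, a short sketch (our own illustration, with arbitrary example counts) computes all of the above measures from the four confusion-matrix counts:

```python
def metrics(tp, tn, fp, fn):
    """All evaluation measures from Section V, computed from the
    four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # recall equals TPR
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall (TPR)": recall,
        "accuracy": accuracy,
        "misclassification error rate": 1 - accuracy,
        "F-measure": 2 * recall * precision / (recall + precision),
        "TNR": tn / (fp + tn),
        "FPR": fp / (fp + tn),            # complement of TNR
        "FNR": fn / (tp + fn),            # complement of TPR
    }

# Example counts; any positive counts avoid division by zero here.
print(metrics(tp=3, tn=1, fp=1, fn=1))
```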

VI. CONCLUSION

This paper presented the decision tree, a highly efficient tool of machine learning and data mining that is capable of producing accurate and easy-to-understand models, and described some of its fundamental characteristics. As discussed in this paper, performance evaluation depends strongly on the decision criterion used, such as information gain, Gini index, or gain ratio. A DT can handle large training datasets by dividing them into smaller subsets. Finally, it is concluded that a DT is a readable tree structure: even someone who is not familiar with classification techniques can easily understand its results. In future work, we will apply a DT to a dataset and evaluate its performance.