Data mining is the process of discovering useful information in large amounts of data retrieved from different databases. Classification is one of the data mining techniques: a function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. The decision tree (DT) algorithm is a powerful and popular classification technique. The algorithm produces rules that can be understood by humans and used in knowledge systems such as databases. A DT is represented by a hierarchical tree structure. It is used in many different disciplines, including diagnosis, cognitive science, artificial intelligence, game theory, engineering and data mining. This paper is a study of decision tree learning.
Keywords: Hierarchical tree, Decision Tree algorithm, Classification
Data mining is used to extract previously unknown patterns from large databases. Data mining uses two types of approach: supervised learning and unsupervised learning. Classification is based on the supervised learning approach. Classification methods aim to identify the classes to which objects belong from some descriptive attributes. Classification is useful in a wide range of human activities, and predominantly in automated decision making.
Decision trees are a simple but powerful form of multiple-variable analysis. A DT is a flow-chart-like tree structure, and it is especially attractive in a data mining environment. A DT can handle high-dimensional data and can break a complex decision-making process down into a collection of simpler decisions, which often makes the overall problem easier to solve. The reasons are:
- The intuitive representation: the resulting model is easy for humans to assimilate.
- Tree construction does not require any parameter setting from the user, so it is well suited to exploratory analysis.
- Trees can be constructed relatively fast, with competitive accuracy.
A DT easily yields the rules corresponding to the tree, obtained by traversing the path from the root node to each leaf. Note that many different leaves of the tree may refer to the same class label, but each leaf corresponds to a different rule. The node types are:
Internal node: a test on an attribute.
Branch: an outcome of the test.
Leaf node: a class label (no further split).
Each path from the root to a leaf is a conjunction of tests, and the tree as a whole forms a disjunction of these conjunctions that makes the final decision.
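The node roles above can be sketched as a minimal Python structure. The attribute names and class labels here are hypothetical, chosen only to mirror the Warm/Cold branches of Fig 1:

```python
# Minimal sketch of the node roles described above (hypothetical example).
class Node:
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute      # internal node: a test on an attribute
        self.branches = branches or {}  # branches: outcome of the test -> child node
        self.label = label              # leaf node: a class label (final decision)

# A one-level tree testing a hypothetical "temperature" attribute.
root = Node(attribute="temperature",
            branches={"Warm": Node(label="Play"),
                      "Cold": Node(label="Stay in")})

print(root.branches["Warm"].label)  # Play
```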
Fig 1: Decision Tree
The features of DTs are:
- They produce results that communicate very well in symbolic and visual terms, and they can incorporate multiple predictors in a simple step-by-step fashion.
- They accommodate various levels of measurement, including qualitative (e.g., good-bad) and quantitative attributes.
- They adapt to various twists and turns in data, such as unbalanced effects, nested effects, offsetting effects, interactions and nonlinearities, that frequently defeat other one-way and multi-way statistical and numeric approaches.
- They are highly robust and produce similar results regardless of the level of measurement of the fields that are used to construct decision tree branches.
Decision Tree Parameters:
Minimal size for split: The size of a node is the number of examples in its subset. Only those nodes are split whose size is greater than or equal to the minimal size for split parameter. The range is 1 to +∞ and the type is integer. The default value is 4.
Minimal leaf size: The size of a leaf is the number of examples in its subset. The tree is generated in such a way that every leaf has at least the minimal leaf size number of examples. The range is 1 to +∞ and the type is integer. The default value is 2.
Minimal gain: The gain of a node is calculated before splitting it. The node is split only if its gain is greater than the minimal gain. A higher value of minimal gain results in fewer splits and thus a smaller tree. A value that is too high will completely prevent splitting, and a tree with a single node is generated. The range is 0.0 to +∞ and the type is real. The default value is 0.1.
Maximal depth of tree: The depth of a tree varies depending upon the size and characteristics of the example set. This parameter restricts the depth of the decision tree. If its value is set to -1, the maximal depth parameter puts no bound on the depth of the tree; in this case the tree is built until other stopping criteria are met. If its value is set to 1, a tree with a single node is generated. The default value is 20.
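As an illustration only, the parameters above map roughly onto scikit-learn's DecisionTreeClassifier arguments (the names differ, and scikit-learn's `min_impurity_decrease` is only an approximate counterpart of minimal gain); the values here mirror the defaults quoted in the text:

```python
# Sketch, assuming scikit-learn: rough counterparts of the parameters above.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(
    min_samples_split=4,        # minimal size for split
    min_samples_leaf=2,         # minimal leaf size
    min_impurity_decrease=0.1,  # approx. minimal gain
    max_depth=20,               # maximal depth of tree
)
clf.fit(X, y)
print(clf.get_depth(), clf.score(X, y))
```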
Criterion: Selects the criterion on which attributes will be selected for splitting. It can have one of the following values:
Information gain: The entropies of all the attributes are calculated, and the attribute with the least entropy (greatest gain) is selected for the split. This method has a bias towards selecting attributes with a large number of values.
Gain ratio: A variant of information gain that adjusts the information gain for each attribute to allow for the breadth and uniformity of the attribute values.
Gini index: A measure of inequality between the distributions of label characteristics. Splitting on a chosen attribute results in a reduction in the average gini index of the resulting subsets.
Accuracy: An attribute is selected for splitting which maximizes the accuracy of the whole tree.
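The entropy, gini index and information gain measures named above can be sketched in plain Python (standard definitions; the example labels are illustrative):

```python
# Sketch of the impurity measures used as splitting criteria.
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the class proportions in S."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index = 1 - sum(p_i^2) over the class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Entropy of the parent minus the weighted entropy of the split subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

labels = ["yes", "yes", "no", "no"]
print(entropy(labels))   # 1.0 (two equally likely classes)
print(gini(labels))      # 0.5
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0 (pure split)
```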
The construction of a decision tree involves the following three main phases:
Construction phase: The initial decision tree is constructed in this phase, based on the entire training data set. It requires recursively partitioning the training set into two or more sub-partitions using a splitting criterion, until a stopping criterion is met.
Pruning phase: The tree constructed in the previous phase may not result in the best possible set of rules, due to over-fitting. The pruning phase removes some of the lower branches and nodes to improve its performance.
Processing the pruned tree: The pruned tree is further processed to improve its understandability.
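The build-then-prune sequence can be sketched with scikit-learn (an assumption, not the paper's tool): a tree is first grown on the training split, then cost-complexity pruning via the `ccp_alpha` argument removes lower branches to reduce over-fitting:

```python
# Sketch, assuming scikit-learn: construction phase, then pruning phase.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_tr, X_te, y_tr, y_te = train_test_split(*load_iris(return_X_y=True),
                                          random_state=0)

# Construction phase: grow the tree fully on the training set.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Pruning phase: cost-complexity pruning trims the lower branches.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X_tr, y_tr)

print(full.get_n_leaves(), pruned.get_n_leaves())  # pruned tree has no more leaves
```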
III. DECISION TREE ALGORITHM
A decision tree (DT) model is a computational model consisting of three parts:
1) A decision tree.
2) An algorithm to create the tree.
3) An algorithm that applies the tree to data and solves the problem under consideration.
Input: T // decision tree, D // input data set
Output: O // model prediction
DTProc algorithm: // simple algorithm to illustrate the prediction technique using a DT
for each t ∈ D do
    n = root node of T;
    while n is not a leaf node do
        obtain the answer to the question on n applied to t;
        identify the arc from n which contains the correct answer;
        n = node at the end of this arc;
    make the prediction for t based on the label of n;
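The traversal above can be sketched in Python. Here an internal node is a tuple (attribute, {outcome: subtree}) and a leaf is a bare class label; the attribute names are hypothetical:

```python
# Sketch of the DT prediction loop: follow arcs until a leaf is reached.
def predict(node, example):
    while isinstance(node, tuple):       # while n is not a leaf node
        attribute, branches = node
        answer = example[attribute]      # answer to the question on n applied to t
        node = branches[answer]          # identify the arc with the correct answer
    return node                          # prediction = label of the leaf

# A small hypothetical tree.
tree = ("outlook", {"sunny": ("windy", {True: "no", False: "yes"}),
                    "rain": "no"})

print(predict(tree, {"outlook": "sunny", "windy": False}))  # yes
```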
Types of decision tree algorithms include ID3, C4.5 and CART.
IV. ADVANTAGES AND DISADVANTAGES OF DECISION TREES
Advantages:
- Self-explanatory and easy in terms of readability: even someone who is not familiar with data mining can interpret a decision tree if it is of small size.
- Both categorical and numerical (continuous) attributes can be handled.
- Data may contain missing attribute values and decision trees will still handle it.
- Data might contain errors and decision trees will still handle them; they are robust to noise.
- They can handle large training data sets, and they are computationally inexpensive (in comparison with other machine learning algorithms) and fast.
Disadvantages:
- The majority of the inducers require the target value of the data instances to be discrete-valued only.
- A decision tree cannot represent identical subtrees once; it replicates them, since every path is mutually exclusive.
- Since trees are greedy, they sometimes become overly sensitive and tend to choose a split due to noise or an irrelevant attribute, which provides a wrong split and degrades the accuracy.
- The data at lower nodes may not be sufficient to make a statistically significant decision for the class representation of the nodes, which is also known as the data fragmentation problem.
- Handling missing data requires a lot of computation, which is a drawback in terms of cost.
Table 1: Advantages and Disadvantages
V. EVALUATION PERFORMANCE OF DECISION TREE
A confusion matrix is a table that is often used to describe the performance of a decision tree model (or "classifier") on a set of test data for which the true values are known. The main diagonal elements TP (true positive) and TN (true negative) count correctly classified instances, while FP (false positive) and FN (false negative) count incorrectly classified instances.

Table 2: Actual vs Predicted Confusion Matrix

                 Predicted: Yes   Predicted: No
Actual: Yes      TP               FN
Actual: No       FP               TN

Total number of instances = correctly classified instances + incorrectly classified instances
Correctly classified instances = TP + TN
Incorrectly classified instances = FP + FN
Precision is the ratio of modules correctly classified as fault-prone to the total number of modules classified as fault-prone. It is the proportion of units correctly predicted as faulty.
Recall is the ratio of modules correctly classified as fault-prone to the total number of actually faulty modules.
A cost matrix is similar to a confusion matrix, with the minor difference that it finds the cost of accuracy through the misclassification error rate.
Error rate = 1 - Accuracy
The F-measure (FM) is a combination of recall and precision; it is defined as the harmonic mean of precision and recall.
Accuracy is defined as the ratio of correctly classified instances to the total number of instances.
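The metrics defined above can be computed directly from the four confusion-matrix counts (the counts below are illustrative, not from the paper):

```python
# Sketch: precision, recall, accuracy, F-measure and error rate
# from the confusion-matrix counts TP, TN, FP, FN.
def metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)                 # correct among predicted fault-prone
    recall = tp / (tp + fn)                    # correct among actually faulty
    accuracy = (tp + tn) / (tp + tn + fp + fn) # correct among all instances
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    error_rate = 1 - accuracy
    return precision, recall, accuracy, f_measure, error_rate

print(metrics(tp=40, tn=45, fp=5, fn=10))
```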
CALCULATING TPR, TNR, FPR AND FNR
For good classifiers, TPR and TNR should both be near 100%; similarly for the precision and accuracy parameters. On the contrary, FPR and FNR should both be as close to 0% as possible. The true positive rate, true negative rate, false positive rate and false negative rate are calculated as shown below:
TPR = TP / (TP + FN)
TNR = TN / (TN + FP)
FPR = FP / (FP + TN)
FNR = FN / (FN + TP)
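The four rates can be computed from the same confusion-matrix counts (standard definitions; the counts are illustrative):

```python
# Sketch: TPR, TNR, FPR and FNR from the confusion-matrix counts.
def rates(tp, tn, fp, fn):
    tpr = tp / (tp + fn)   # true positive rate (sensitivity)
    tnr = tn / (tn + fp)   # true negative rate (specificity)
    fpr = fp / (fp + tn)   # false positive rate = 1 - TNR
    fnr = fn / (fn + tp)   # false negative rate = 1 - TPR
    return tpr, tnr, fpr, fnr

print(rates(tp=40, tn=45, fp=5, fn=10))  # (0.8, 0.9, 0.1, 0.2)
```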
VI. CONCLUSION
This paper presented the decision tree, a highly efficient tool of machine learning and data mining, capable of producing accurate and easy-to-understand models. The paper described some fundamental characteristics of decision trees. Performance depends strongly on the decision criterion, such as information gain, gini index or gain ratio. A DT can handle large training data sets and divide them into smaller subsets. Finally, it is concluded that a DT is a readable tree structure: even someone not previously familiar with classification techniques can easily understand its results. In future work, we will implement a DT on a data set and evaluate its performance.