How to Implement a Decision Tree in R?

Description

To implement a decision tree for the readingSkills dataset using R.

What is a Decision Tree?

  • Used for both classification and regression (prediction of a numeric value).
  • It is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
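As a rough illustration of how the "test on an attribute" at a node can be scored, the sketch below computes Gini impurity, the split criterion requested later in this tutorial through split = "gini". The helper name gini_impurity and the toy labels are assumptions made for illustration only; the actual computation inside rpart is more involved.

```r
# Hypothetical helper (not part of any package): Gini impurity of a label vector.
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)  # class proportions in the node
  1 - sum(p^2)                         # 0 means the node is pure
}

# Toy labels, invented for illustration only.
parent <- c("yes", "yes", "yes", "no", "no", "no")
left   <- c("yes", "yes", "yes")
right  <- c("no", "no", "no")

# Weighted impurity of the two child nodes after a candidate split.
children <- (length(left) * gini_impurity(left) +
             length(right) * gini_impurity(right)) / length(parent)

gini_impurity(parent)             # impurity before the split: 0.5
gini_impurity(parent) - children  # impurity reduction from the split: 0.5
```

A split that separates the classes perfectly, as in this toy case, removes all of the parent node's impurity; real splits usually remove only part of it.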

Data set :

    • readingSkills – built-in dataset shipped with the R package party

R Packages :

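        • party – Provides the inbuilt dataset readingSkills
        • caret – Classification And REgression Training; provides createDataPartition(), trainControl(), train() and confusionMatrix()
        • rpart – Fits the tree when train() is called with method = "rpart"
        • rpart.plot – For plotting rpart trees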

R Function:

        • str(data frame) – To display the internal structure of a data frame
        • summary(data frame) – Displays the minimum, maximum, mean, median, 1st quartile and 3rd quartile of each numeric or integer column in the data frame, and the count of each level for factor columns.
        • createDataPartition(y, p= ) – To partition the data set into train and test sets while preserving the class proportions of y

y – dependent variable

p – Proportion of data used for training (for example, 0.7 for 70%)

        • dim(matrix or array or data frame) – To get the dimension of the Matrix, Array, Data frame
        • trainControl(method=,number=,repeats=) – To define the resampling scheme that train() uses to select the optimal tuning parameter

method – resampling method

number – Either the number of folds or the number of resampling iterations

repeats – For repeated k-fold cross-validation only: the number of complete sets of folds to compute.

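        • train(formula, data=, method=, trControl=, tuneLength= ) – To fit predictive models over a range of tuning parameter values and keep the best one

data – data frame

method – model type (here "rpart")

trControl – resampling settings created by trainControl()

tuneLength – number of tuning parameter values to try

        • prp(x=) – To plot decision trees

x – Decision tree model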

        • predict(object, newdata=) – To predict an outcome value on the basis of one or multiple predictor variables

object – fitted model

newdata – New data

        • confusionMatrix(data=, reference=) – To compute the confusion matrix and related accuracy statistics

data – predicted classes

reference – true classes
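The partitioning and resampling helpers in the list above can be tried on a small dataset before running the full tutorial. The sketch below assumes caret is installed and uses the base R dataset iris purely as a stand-in (it is not the tutorial's data); the seed value is arbitrary. createMultiFolds() is a caret helper that generates repeated k-fold index sets, shown here only to make the meaning of number and repeats concrete.

```r
library(caret)

set.seed(42)  # arbitrary seed so the random split is repeatable

# iris has 50 rows per species; p = 0.7 samples 70% within each class,
# so the training part keeps the original class balance.
idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
table(iris[idx, "Species"])    # class counts in the training part
table(iris[-idx, "Species"])   # class counts in the held-out part

# number = 10 with repeats = 3 corresponds to 10 folds repeated 3 times,
# i.e. 30 resamples in total.
folds <- createMultiFolds(iris$Species, k = 10, times = 3)
length(folds)                  # one training index set per resample
```

Because the split is stratified on the outcome, each class contributes the same share of rows to the training and test parts, which matters when classes are imbalanced.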

#Import data

#install.packages("party")
library("party")

my_input<-readingSkills
View(my_input)

str(my_input)
summary(my_input)

#Splitting into train and test dataset

library("caret")

#Fix the random seed so the split is reproducible (123 is an arbitrary choice)
set.seed(123)
training<-createDataPartition(my_input$nativeSpeaker,p=0.7,list = FALSE)

train<-my_input[training,]
test<-my_input[-training,]

dim(train)
dim(test)

#Data Preparation

#Checking Missing Values

sum(is.na(my_input))

#Decision tree model

#install.packages("rpart.plot")
library("rpart.plot")

tr<-trainControl(method = "repeatedcv",number = 10, repeats = 3)
dec_tree<-train(nativeSpeaker~.,data=train,method="rpart",
                parms=list(split="gini"),
                trControl=tr,
                tuneLength=10)

dec_tree

#Plot decision tree

prp(dec_tree$finalModel,box.palette="Reds",tweak=1.2)

#Prediction

pred_dec<-predict(dec_tree,test)

#Confusion Matrix

confusionMatrix(pred_dec,test$nativeSpeaker)
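To make the output of the confusion matrix step easier to read, the sketch below builds a confusion matrix by hand with base R. The two label vectors are invented for illustration and are not predictions from the model above; the object names actual, predicted and cm are likewise assumptions.

```r
# Toy vectors, invented for illustration; they are not model output.
actual    <- factor(c("no", "no", "yes", "yes", "yes", "no"),
                    levels = c("no", "yes"))
predicted <- factor(c("no", "yes", "yes", "yes", "no", "no"),
                    levels = c("no", "yes"))

# Rows are predictions and columns are true labels,
# the same orientation that caret's confusionMatrix() prints.
cm <- table(Prediction = predicted, Reference = actual)
cm

# Accuracy is the share of cases that fall on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```

The off-diagonal cells show which class is mistaken for which; confusionMatrix() reports the same table together with further statistics such as sensitivity and specificity.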
