To implement a decision tree for the readingSkills dataset using R.
y – dependent variable
p – Proportion of the data used for training (for example, 0.7 for 70%)
method – resampling method
number – Either the number of folds or the number of resampling iterations
repeats – For repeated k-fold cross-validation only: the number of complete sets of folds to compute.
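data – Data frame containing the training data
trControl – List of values that define how train() resamples and tunes the model (the object returned by trainControl())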
prp(x=) – To plot decision trees
x – Decision tree model
model – Fitted model
data – New data to predict on
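
To make the number and repeats arguments concrete, here is a minimal base-R sketch of repeated k-fold assignment. It is only an illustration, not caret's internal implementation: the helper name make_repeated_folds is hypothetical, and n = 140 is an assumed training-set size chosen for the example.

```r
# Hypothetical helper (not part of caret): assign n rows to k folds,
# and repeat the whole assignment r times with a fresh shuffle each time.
make_repeated_folds <- function(n, k = 10, r = 3) {
  lapply(seq_len(r), function(i) {
    # Balanced fold labels 1..k, shuffled, so fold sizes differ by at most one
    sample(rep(seq_len(k), length.out = n))
  })
}

set.seed(123)  # arbitrary seed, for reproducibility only
folds <- make_repeated_folds(n = 140, k = 10, r = 3)

length(folds)       # one fold assignment per repeat
table(folds[[1]])   # fold sizes within the first repeat
length(folds) * 10  # resamples evaluated per candidate tuning value
```

With number = 10 and repeats = 3, each candidate tuning value is therefore evaluated on 30 held-out folds, and the results are averaged.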
#Import data
#install.packages("party")
library("party")
my_input<-readingSkills
View(my_input)
str(my_input)
summary(my_input)
#Splitting into train and test dataset
library("caret")
set.seed(123) #arbitrary seed so that the split is reproducible
training<-createDataPartition(my_input$nativeSpeaker,p=0.7,list = FALSE)
train<-my_input[training,]
test<-my_input[-training,]
dim(train)
dim(test)
#Data Preparation
#Checking Missing Values
sum(is.na(my_input))
#Decision tree model
#install.packages("rpart.plot")
library("rpart.plot")
tr<-trainControl(method = "repeatedcv",number = 10, repeats = 3)
dec_tree<-train(nativeSpeaker~.,data=train,method="rpart",
                parms=list(split="gini"),
                trControl=tr,
                tuneLength=10)
dec_tree
#Plot decision tree
prp(dec_tree$finalModel,box.palette="Reds",tweak=1.2)
#Prediction
pred_dec<-predict(dec_tree,test)
#Confusion Matrix
confusionMatrix(pred_dec,test$nativeSpeaker)
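
The accuracy reported by confusionMatrix() can be checked by hand from a cross-tabulation of predicted and actual classes. The sketch below uses small made-up label vectors (not output from the model above) purely to show the arithmetic; the variable names actual, predicted and cm are assumptions for this example.

```r
# Toy labels, made up for illustration only (not model output)
actual    <- factor(c("no", "no", "yes", "yes", "yes", "no"),
                    levels = c("no", "yes"))
predicted <- factor(c("no", "yes", "yes", "yes", "no", "no"),
                    levels = c("no", "yes"))

# Cross-tabulate predictions against the reference classes
cm <- table(Prediction = predicted, Reference = actual)

# Accuracy = correctly classified cases (the diagonal) / all cases
accuracy <- sum(diag(cm)) / sum(cm)

cm
accuracy
```

The same calculation applies to the real results by using pred_dec and test$nativeSpeaker in place of the toy vectors.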