How to Detect Breast Cancer with a Decision Tree Algorithm in Python?

Breast Cancer Detection using Decision Tree

Condition for Detecting Breast Cancer Using Decision Tree Algorithm

  • Description: Breast cancer is one of the most common cancers affecting women worldwide, and early detection greatly improves the chances of successful treatment. In this project, we use a Decision Tree algorithm to build a machine learning model that classifies breast tumors as malignant or benign. The dataset is the well-known "Breast Cancer Wisconsin (Diagnostic) Dataset", whose features are computed from digitized images of fine needle aspirate (FNA) samples of breast masses and describe the cell nuclei: radius, texture, smoothness, compactness, concavity, and others.
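Before modelling, it can help to look at what the dataset actually contains. The following is a minimal sketch, assuming the copy of the dataset bundled with scikit-learn (`load_breast_cancer`); the variable name `class_counts` is illustrative. It prints the table shape, the class names in label order, and the number of samples per class, which makes the label encoding explicit before any metric is interpreted.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load scikit-learn's bundled copy of the dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="target")

# Class names are listed in label order: index 0 first, then index 1
target_names = [str(name) for name in data.target_names]

print("Shape (samples, features):", X.shape)
print("Class names in label order:", target_names)

# Map the integer labels to their names and count samples per class
class_counts = y.map(dict(enumerate(target_names))).value_counts()
print(class_counts)
```

Checking the label order here matters later: axis labels on a confusion matrix and the "positive" class in precision and recall both depend on which integer stands for which diagnosis.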
Why Should We Choose Decision Tree Algorithm?
  • Interpretability: The decision-making process of the model is easy to follow and understand.
  • Non-Linear Relationships: Decision trees can model non-linear relationships between features.
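  • Handles Numerical and Categorical Data: Decision tree algorithms can, in principle, split on both numerical and categorical features. Note that scikit-learn's DecisionTreeClassifier expects numeric input, so categorical features must be encoded first; all features in this dataset are already numeric.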
  • No Need for Feature Scaling: Unlike algorithms such as SVM or KNN, Decision Trees do not require normalization or scaling of features, because each split compares a single feature against a threshold.
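The interpretability point can be made concrete by printing a small tree as plain if/else rules. This is a minimal sketch using scikit-learn's `export_text`; the `max_depth=2` limit and the name `shallow_tree` are illustrative assumptions chosen only to keep the output short, not settings taken from the sample code on this page.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()

# A deliberately shallow tree (illustrative choice) so the rules fit on screen
shallow_tree = DecisionTreeClassifier(max_depth=2, random_state=42)
shallow_tree.fit(data.data, data.target)

# Render the fitted tree as nested threshold rules on the raw, unscaled features
rules = export_text(shallow_tree, feature_names=list(data.feature_names))
print(rules)
```

Each printed line is a threshold test on one feature, and each leaf reports the predicted class label, so the path behind any single prediction can be read off directly.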
Step-by-Step Process
  • Data Collection: Load the Breast Cancer dataset, for example with load_breast_cancer from sklearn.datasets.
  • Data Preprocessing: Handle missing or irrelevant data, if necessary (the scikit-learn copy of this dataset has no missing values), and split the dataset into training and testing sets.
  • Model Training: Train the Decision Tree model on the training data.
  • Model Evaluation: Evaluate the model using accuracy, precision, recall, and F1-score. Visualize the Decision Tree to understand the decision-making process.
  • Model Tuning: Fine-tune the model by adjusting hyperparameters such as max_depth and min_samples_split.
  • Results Interpretation: Present results and draw conclusions based on the model's performance.
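The tuning step could be sketched with a cross-validated grid search. This is one possible approach, not necessarily the one the author used: the parameter grid, the 5-fold setting, and the name `search` are all illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate values are illustrative; adjust them to the problem at hand
param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_split": [2, 5, 10],
}

# 5-fold cross-validation on the training set only; the test set stays untouched
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.3f}")
print(f"Held-out test accuracy: {search.score(X_test, y_test):.3f}")
```

Searching on the training folds and scoring once on the held-out split keeps the final accuracy estimate independent of the tuning process.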
Sample Source Code
  • import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, plot_tree
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

    # Load the dataset (in scikit-learn's copy, target 0 = malignant and 1 = benign)
    data = load_breast_cancer()
    X = pd.DataFrame(data.data, columns=data.feature_names)
    y = pd.Series(data.target, name='target')

    # Split data into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train the model
    dt_classifier = DecisionTreeClassifier(random_state=42)
    dt_classifier.fit(X_train, y_train)

    # Make predictions
    y_pred = dt_classifier.predict(X_test)

    # Evaluate the model
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy*100:.2f}%")

    # Confusion Matrix (rows and columns follow label order: 0 = malignant, 1 = benign)
    conf_matrix = confusion_matrix(y_test, y_pred)
    class_labels = ['Malignant', 'Benign']
    sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=class_labels, yticklabels=class_labels)
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title('Confusion Matrix')
    plt.show()

    # Classification Report
    print(classification_report(y_test, y_pred))

    # Visualize the Decision Tree
    plt.figure(figsize=(20, 10))
    # Pass plain lists: some scikit-learn releases reject NumPy arrays for these arguments
    plot_tree(dt_classifier, filled=True, feature_names=list(data.feature_names), class_names=list(data.target_names), rounded=True)
    plt.title('Decision Tree Visualization')
    plt.show()
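By default, scikit-learn's binary metrics treat label 1 as the positive class, and in this dataset label 1 is benign. For a screening task, the recall on malignant cases is usually the figure of interest, so it is worth computing it explicitly. The following is a minimal sketch that repeats the same split and model settings as the sample code; the constant `MALIGNANT` and the variable names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# In scikit-learn's copy of the dataset, label 0 is malignant
MALIGNANT = 0

# Recall: share of truly malignant cases that the model flags as malignant
malignant_recall = recall_score(y_test, y_pred, pos_label=MALIGNANT)
# Precision: share of malignant predictions that are actually malignant
malignant_precision = precision_score(y_test, y_pred, pos_label=MALIGNANT)

print(f"Malignant recall: {malignant_recall:.3f}")
print(f"Malignant precision: {malignant_precision:.3f}")
```

A missed malignant case (false negative) is typically costlier than a false alarm, so malignant recall is a reasonable metric to watch alongside overall accuracy.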
Screenshots
  • Breast Cancer Decision Tree Screenshot