How to Detect Breast Cancer with the KNN Algorithm in Python?

Breast Cancer Detection using KNN

Condition for Detecting Breast Cancer Using K-Nearest Neighbors (KNN) Algorithm in Python

  • Description: Breast cancer is one of the most common cancers worldwide, and early detection is crucial for effective treatment and improved survival rates. In this project, we use the K-Nearest Neighbors (KNN) algorithm to classify breast tumors as malignant or benign based on features extracted from breast tissue biopsies. KNN is a simple yet effective machine learning algorithm: it assigns each new data point the majority class among its k nearest neighbors in the training data.
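To make the majority-vote idea concrete, here is a minimal from-scratch sketch. The function name knn_predict, the toy data, and the choice of Euclidean distance are assumptions made for illustration; the project code itself relies on scikit-learn's KNeighborsClassifier.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those k neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two features, two well-separated classes
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

label_near_first = knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3)
label_near_second = knn_predict(X_toy, y_toy, np.array([3.0, 3.0]), k=3)
print(label_near_first, label_near_second)
```

Each query takes the label of the cluster it sits in, because all three of its nearest neighbors belong to that cluster.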
Why Should We Choose KNN for Breast Cancer Detection?
  • Simplicity: KNN is easy to understand and implement, making it a good choice for medical data classification tasks.
  • Non-Parametric: KNN makes no assumptions about the underlying data distribution, making it a good fit for problems with non-linear decision boundaries.
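  • Efficient for Small Datasets: KNN stores the training set and compares each new sample against it at prediction time, so it works well on datasets of manageable size with relatively few features. Prediction cost grows as the dataset gets larger.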
  • Good for Pattern Recognition: KNN is effective at recognizing patterns in data, which is crucial for detecting and classifying cancer cells based on patterns in features like cell radius, texture, smoothness, etc.
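The features mentioned above can be inspected directly. The sketch below assumes scikit-learn's bundled copy of the dataset (load_breast_cancer); the variable names class_counts and feature_ranges are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target
feature_names = list(data.feature_names)

# Number of samples and features
print(X.shape)

# How many samples fall into each class
class_counts = {str(name): int(count)
                for name, count in zip(data.target_names, np.bincount(y))}
print(class_counts)

# Value ranges of a few features; note how different the scales are
feature_ranges = {}
for name in ["mean radius", "mean texture", "mean smoothness", "mean area"]:
    column = X[:, feature_names.index(name)]
    feature_ranges[name] = (float(column.min()), float(column.max()))
    print(f"{name}: min={column.min():.3f}, max={column.max():.3f}")
```

Because KNN compares samples by distance, the large gap between the scale of a feature such as mean area and one such as mean smoothness is worth noticing before training.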
Step-by-Step Process
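  • Data Collection: Use the Breast Cancer Wisconsin (Diagnostic) dataset, which is available from public repositories such as the UCI Machine Learning Repository and ships with the scikit-learn library (load_breast_cancer).
  • Data Preprocessing: Clean the data by handling missing values (if any). Standardize or otherwise scale the features: KNN is distance-based, so features with large numeric ranges would otherwise dominate the distance calculation.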
  • Train-Test Split: Split the dataset into training and testing sets.
  • Model Training: Train a KNN classifier on the training data.
  • Model Evaluation: Evaluate the model's performance using various metrics like accuracy, precision, recall, F1-score, and confusion matrix. Visualize the performance using appropriate plots such as the confusion matrix and ROC curve.
  • Result Interpretation: Based on the output of the model, classify the tumors as malignant or benign. In scikit-learn's copy of the dataset, label 0 means malignant and label 1 means benign.
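The steps above can be sketched as a single scikit-learn pipeline. Two things here are assumptions that go beyond the process described: the value of k is chosen by 5-fold cross-validation over a small hand-picked grid, and the split is stratified by class. Treat it as one possible arrangement, not the definitive setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Stratified split (an assumption) keeps class proportions similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scaling lives inside the pipeline, so it is refit on each training fold
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Candidate k values (hand-picked for illustration)
param_grid = {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9, 11]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
test_accuracy = search.score(X_test, y_test)
print(f"Best k: {best_k}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

Putting the scaler inside the pipeline means the held-out fold never influences the scaling statistics during cross-validation.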
Sample Source Code
  • import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_curve, auc

    # 1. Load dataset
    data = load_breast_cancer()
    X = data.data
    y = data.target

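    # 2. Split the data, then scale the features
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    # Fit the scaler on the training set only so no test-set statistics leak into training
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)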

    # 3. Train the KNN classifier
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)

    # 4. Model evaluation
    y_pred = knn.predict(X_test)

    # Accuracy score
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy * 100:.2f}%")

    # Confusion Matrix
    conf_matrix = confusion_matrix(y_test, y_pred)
    sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=data.target_names, yticklabels=data.target_names)
    plt.title('Confusion Matrix')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()

    # Classification Report
    print("Classification Report:")
    print(classification_report(y_test, y_pred, target_names=data.target_names))

    # ROC Curve (class 1 is "benign" in this dataset, so benign is the positive class here)
    fpr, tpr, thresholds = roc_curve(y_test, knn.predict_proba(X_test)[:, 1])
    roc_auc = auc(fpr, tpr)

    plt.figure()
    plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic')
    plt.legend(loc='lower right')
    plt.show()
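In a screening setting, the number of interest is often how many malignant tumors the model catches. The sketch below is an illustrative addition, not part of the sample code: it repeats the same training steps and then treats malignant (label 0 in scikit-learn's copy of the dataset) as the positive class when computing sensitivity and specificity. The variable names are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Look up the label codes from the dataset instead of hard-coding them
malignant = list(data.target_names).index("malignant")
benign = 1 - malignant

# With labels=[benign, malignant], ravel() yields tn, fp, fn, tp
# where "positive" means malignant
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[benign, malignant]).ravel()

sensitivity = tp / (tp + fn)  # share of malignant tumors correctly flagged
specificity = tn / (tn + fp)  # share of benign tumors correctly cleared
print(f"Sensitivity (malignant recall): {sensitivity:.3f}")
print(f"Specificity: {specificity:.3f}")

# Cross-check against scikit-learn's own recall with malignant as the positive label
recall_malignant = recall_score(y_test, y_pred, pos_label=malignant)
```

A missed malignant case (false negative) is usually the costlier error, so sensitivity is worth reporting alongside overall accuracy.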