Implementation of Support Vector Machine (SVM) using sklearn in Python

Overview of Support Vector Machine (SVM) Implementation Using Python

  • Description:
    Support Vector Machine (SVM) is a supervised learning algorithm used primarily for classification, although it can also be applied to regression. The core idea is to find the optimal hyperplane that best separates data points of different classes in a feature space; maximizing the margin between the closest points of each class (the support vectors) helps the model generalize to unseen data.
  • SVM handles both linear and non-linear classification problems through different kernel functions such as linear, polynomial, and radial basis function (RBF), as illustrated in the short sketch below.
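  • Kernel choice in practice:
    As a quick, illustrative sketch (assuming a synthetic make_moons dataset, which is not part of the original example), the code below compares a linear kernel with an RBF kernel on data that is not linearly separable:

    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Synthetic, non-linearly separable data (illustrative only)
    X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # A linear kernel can only draw a straight separating line...
    linear_acc = SVC(kernel='linear').fit(X_tr, y_tr).score(X_te, y_te)

    # ...while an RBF kernel can bend the boundary around the two interleaved moons
    rbf_acc = SVC(kernel='rbf', gamma='scale').fit(X_tr, y_tr).score(X_te, y_te)

    print(f"Linear kernel accuracy: {linear_acc:.2f}")
    print(f"RBF kernel accuracy:    {rbf_acc:.2f}")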
Why Should We Choose SVM?
  • Effective in High Dimensional Spaces:
    SVM performs well in high-dimensional spaces, making it ideal for text classification, image recognition, and other applications with complex features.
  • Memory Efficient:
    Only the support vectors are used in the decision function, so the trained model stays compact even when the training set is large (see the short sketch after this list).
  • Versatility:
    It can be used for both classification and regression tasks.
  • Robust to Overfitting:
    With proper regularization and kernel choice, SVM can avoid overfitting, especially in high-dimensional spaces.
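  • Support vectors in practice:
    To make the memory-efficiency point concrete, this illustrative sketch (using a synthetic make_classification dataset, an assumption not taken from the original example) fits an SVC and checks how many training points are actually retained as support vectors:

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    # Synthetic binary classification data (illustrative only)
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # C is the regularization parameter; gamma='scale' sets gamma from the data
    model = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)

    # The fitted model keeps only the support vectors, not the whole training set
    print(f"Training points:          {X.shape[0]}")
    print(f"Support vectors retained: {model.support_vectors_.shape[0]}")
    print(f"Per class (n_support_):   {model.n_support_}")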
Step by Step Process
  • Step 1: Import Libraries
    Load necessary Python libraries for data manipulation, modeling, and evaluation.
  • Step 2: Load Data
    Choose a suitable classification dataset; the sample code below uses the Breast Cancer dataset bundled with sklearn.
  • Step 3: Preprocess Data
    Handle missing values, scale the features, and split the data into training and testing sets (a compact pipeline variant of Steps 3-5 is sketched after this list).
  • Step 4: Train the SVM Model
    Choose an appropriate kernel (linear or non-linear) and train the model using SVC or SVR from sklearn.
  • Step 5: Evaluate the Model
    Use performance metrics like accuracy, confusion matrix, and classification report to evaluate the model.
  • Step 6: Visualization
    Plot decision boundaries and performance metrics.
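  • Compact pipeline variant:
    As noted in Step 3, the preprocessing, training, and evaluation steps (Steps 3-5) can also be condensed into a single scikit-learn Pipeline so that scaling is fitted on the training split only. This is an illustrative alternative to the fuller walkthrough under Sample Source Code, reusing the same Breast Cancer dataset:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.metrics import classification_report

    # Step 2: load the same Breast Cancer dataset used in the sample code below
    X, y = load_breast_cancer(return_X_y=True)

    # Step 3: split; the pipeline applies scaling inside fit(), avoiding test-set leakage
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Steps 3-4: chain standardization and an RBF-kernel SVM into one estimator
    clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', gamma='scale', C=1.0))
    clf.fit(X_train, y_train)

    # Step 5: evaluate on the held-out test set
    print(classification_report(y_test, clf.predict(X_test)))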
Sample Source Code
  • # Import necessary libraries
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.metrics import classification_report, confusion_matrix

    # Load the Breast Cancer dataset
    data = datasets.load_breast_cancer()
    X = data.data
    y = data.target

    # Check the shape of the dataset
    print(f"Features shape: {X.shape}")
    print(f"Target shape: {y.shape}")

    # Split the dataset into training and testing sets (80% train, 20% test)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Standardize the data (important for SVM)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # Instantiate the SVM classifier with RBF kernel
    svm = SVC(kernel='rbf', gamma='scale', C=1)

    # Fit the model to the training data
    svm.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = svm.predict(X_test)

    # Evaluate the model performance
    print("Classification Report:")
    print(classification_report(y_test, y_pred))

    print("Confusion Matrix:")
    print(confusion_matrix(y_test, y_pred))

    # Visualizing the decision boundary: refit the classifier on only the first two
    # standardized features so the boundary can be drawn in 2D (for plotting only)
    X_train_2d = X_train[:, :2]
    X_test_2d = X_test[:, :2]
    svm.fit(X_train_2d, y_train)

    # Create a mesh grid for plotting the decision boundary
    xx, yy = np.meshgrid(np.linspace(X_train_2d[:, 0].min(), X_train_2d[:, 0].max(), 100),
                         np.linspace(X_train_2d[:, 1].min(), X_train_2d[:, 1].max(), 100))

    # Predict for each point in the mesh grid
    Z = svm.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot the decision boundary
    plt.contourf(xx, yy, Z, alpha=0.75, cmap='coolwarm')
    plt.scatter(X_train_2d[:, 0], X_train_2d[:, 1], c=y_train, marker='o', edgecolors='k', cmap='coolwarm', label='Train')
    plt.scatter(X_test_2d[:, 0], X_test_2d[:, 1], c=y_test, marker='^', edgecolors='k', cmap='coolwarm', label='Test')
    plt.title('SVM Decision Boundary (2D projection)')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.legend()
    plt.show()
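Regression Example Using SVR
  • The description above notes that SVM also supports regression through SVR. The minimal sketch below uses the Diabetes dataset bundled with sklearn (an illustrative assumption, since the original example covers classification only):

    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR
    from sklearn.metrics import mean_squared_error, r2_score

    # Bundled regression dataset (illustrative choice, not from the original example)
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Feature scaling matters for SVR just as it does for SVC;
    # C and epsilon below are illustrative values, not tuned settings
    reg = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=10.0, epsilon=0.5))
    reg.fit(X_train, y_train)

    y_pred = reg.predict(X_test)
    print(f"MSE: {mean_squared_error(y_test, y_pred):.2f}")
    print(f"R^2: {r2_score(y_test, y_pred):.2f}")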