Implementation of Support Vector Machine (SVM) using sklearn in Python
Support Vector Machine (SVM) Implementation Using Python
Description:
Support Vector Machine (SVM) is a supervised learning algorithm used primarily for classification,
although it can also be applied to regression. The core idea behind SVM is to find the optimal
hyperplane that separates data points of different classes in the feature space. Among all
separating hyperplanes, SVM picks the one that maximizes the margin, i.e. the distance to the
closest points of each class (the support vectors), which helps the model generalize to unseen data.
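As a minimal sketch of the margin idea, the example below fits a linear SVM to a tiny, made-up 2-D dataset. With a linear kernel, scikit-learn's `SVC` exposes the hyperplane w·x + b = 0 through `coef_` (w) and `intercept_` (b), and the margin width is 2/||w||. The very large `C` is an assumption chosen here to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable 2-D toy dataset (invented for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # class 0
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin linear SVM
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]          # normal vector of the separating hyperplane w.x + b = 0
b = clf.intercept_[0]     # offset b
margin_width = 2.0 / np.linalg.norm(w)  # distance between the margin lines w.x + b = -1 and +1

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("support vectors:\n", clf.support_vectors_)
# Support vectors lie on the margin, so their decision values are close to -1 or +1
print("decision values of SVs:", clf.decision_function(clf.support_vectors_))
```

Here the closest points are (4, 4) and the edge between (2, 1.5) and (1.5, 2), so the margin width should come out near 4.5/sqrt(2), about 3.18.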
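SVM can handle both linear and non-linear classification problems through its choice of kernel
function, such as linear, polynomial, or radial basis function (RBF). Non-linear kernels implicitly
map the data into a higher-dimensional space, where a separating hyperplane may exist even when
none exists in the original features.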
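One way to see the effect of the kernel is to compare kernels on data that is not linearly separable. This sketch uses scikit-learn's synthetic `make_moons` data (an illustrative choice, not part of the original tutorial) and 5-fold cross-validation; the exact scores depend on the noise level and random seed, but the RBF kernel would typically be expected to beat the linear one here.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-moons: a classic non-linearly separable toy problem
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

scores = {}
for kernel in ["linear", "poly", "rbf"]:
    # Scale features first, then fit an SVC with the given kernel
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    # Mean accuracy over 5 cross-validation folds
    scores[kernel] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{kernel:>6}: {scores[kernel]:.3f}")
```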
Why Should We Choose SVM?
Effective in High-Dimensional Spaces:
SVM performs well in high-dimensional spaces, including cases where the number of features exceeds the number of samples, which makes it well suited to text classification, image recognition, and other applications with many features.
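A common high-dimensional case is text, where every distinct word becomes a feature. The sketch below uses `TfidfVectorizer` with `LinearSVC` on a tiny invented corpus (the documents and labels are hypothetical); even six short sentences produce more features than samples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# A tiny made-up corpus: 1 = sports, 0 = cooking
docs = [
    "the team won the football match",
    "a late goal decided the league game",
    "the striker scored twice in the final",
    "simmer the sauce and add fresh basil",
    "bake the bread for forty minutes",
    "whisk the eggs with sugar and butter",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each document into a sparse vector with one dimension per word
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
print("samples x features:", X.shape)  # more features than samples

# A linear SVM is a common baseline for high-dimensional, sparse text features
clf = LinearSVC()
clf.fit(X, labels)

new_docs = ["the goal keeper saved the match", "add butter to the sauce"]
preds = clf.predict(vectorizer.transform(new_docs))
print(preds)
```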
Memory Efficient:
The decision function of a trained SVM depends only on the support vectors, a subset of the training points, so the fitted model is memory efficient.
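The support vectors of a fitted model can be inspected directly. This sketch fits an RBF `SVC` to the Breast Cancer data and compares the number of stored support vectors (`support_vectors_`, `n_support_`) with the number of training samples; the fraction depends on `C`, the kernel, and the data. For brevity it scales the whole dataset at once, which is fine for inspection but not for evaluation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Only these training points are kept and used by the decision function
n_sv = clf.support_vectors_.shape[0]
print(f"training samples: {X.shape[0]}")
print(f"support vectors:  {n_sv} ({n_sv / X.shape[0]:.1%})")
print("support vectors per class:", clf.n_support_)
```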
Versatility:
It can be used for both classification (SVC) and regression (SVR) tasks, and different kernel functions can be plugged in to suit the data.
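For regression, scikit-learn provides `SVR`, which fits a function that keeps most points within an epsilon-wide tube. A minimal sketch on synthetic noisy sine data follows; the data, `C=10.0`, and `epsilon=0.1` are illustrative assumptions rather than recommended settings.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data: y = sin(x) plus a little Gaussian noise
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# epsilon sets the half-width of the tube inside which errors are not penalized
reg = SVR(kernel="rbf", C=10.0, epsilon=0.1)
reg.fit(X, y)

r2 = reg.score(X, y)  # coefficient of determination (R^2) on the training data
print(f"R^2 on training data: {r2:.3f}")
```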
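Resistance to Overfitting:
With a well-chosen regularization parameter (C) and kernel, SVM can avoid overfitting. These choices matter most in high-dimensional spaces, especially when the number of features is much larger than the number of samples.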
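In practice, `C` and the RBF `gamma` are usually tuned by cross-validation. This sketch shows one assumed setup: a `Pipeline` that scales inside each fold, searched with `GridSearchCV` over a small, arbitrary grid on the Breast Cancer data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling lives inside the pipeline, so each CV fold is scaled on its own training part
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Smaller C = stronger regularization (softer margin);
# smaller gamma = smoother RBF decision boundary
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.001, 0.01, 0.1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
print(f"test accuracy:    {search.score(X_test, y_test):.3f}")
```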
Step-by-Step Process
Step 1: Import Libraries
Import the Python libraries needed for data manipulation, modeling, and evaluation.
Step 2: Load Data
Choose a suitable dataset for classification. This example uses the Breast Cancer dataset bundled with scikit-learn instead of the more common Iris or Wine datasets.
Step 3: Preprocess Data
Handle missing values, scale the features, and split the data into training and testing sets.
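The Breast Cancer dataset has no missing values, so the sketch below blanks out about 5% of its entries at random purely to demonstrate imputation. It chains `SimpleImputer`, `StandardScaler`, and `SVC` in one pipeline, which is one reasonable way (not the only one) to make sure imputation and scaling statistics come from the training split only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Illustrative assumption: randomly blank out ~5% of entries to simulate missing data
rng = np.random.default_rng(42)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Impute with the column median, then standardize; both steps are fitted on the
# training split only, which avoids leaking test-set statistics
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    SVC(kernel="rbf"),
)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```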
Step 4: Train the SVM Model
Choose an appropriate kernel (linear or non-linear) and train the model using SVC (for classification) or SVR (for regression) from scikit-learn.
Step 5: Evaluate the Model
Use performance metrics like accuracy, confusion matrix, and classification report to evaluate the model.
Step 6: Visualization
Plot decision boundaries and performance metrics.
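Decision boundaries can only be drawn directly in two dimensions, so this sketch keeps just the first two Breast Cancer features (`mean radius` and `mean texture`), an illustrative choice. It shades the predicted class regions and draws the boundary (solid) and margins (dashed) from `decision_function`, saving the figure to a hypothetical file name, `svm_decision_boundary.png`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Keep only two features so the boundary can be drawn in 2-D
data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data[:, :2])  # mean radius, mean texture
y = data.target

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Evaluate the model on a grid covering the data
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200),
)
grid = np.c_[xx.ravel(), yy.ravel()]
Z = clf.decision_function(grid).reshape(xx.shape)  # signed distance-like scores
regions = clf.predict(grid).reshape(xx.shape)       # predicted class per grid point

fig, ax = plt.subplots(figsize=(6, 5))
ax.contourf(xx, yy, regions, alpha=0.3, cmap="coolwarm")  # predicted class regions
# Decision boundary (level 0) and margins (levels -1 and +1)
ax.contour(xx, yy, Z, levels=[-1, 0, 1], linestyles=["--", "-", "--"], colors="k")
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", s=10, edgecolors="k")
ax.set_xlabel(f"{data.feature_names[0]} (standardized)")
ax.set_ylabel(f"{data.feature_names[1]} (standardized)")
ax.set_title("SVM (RBF) decision boundary on two features")
fig.savefig("svm_decision_boundary.png", dpi=120)
plt.close(fig)
```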
Sample Source Code
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
# Load the Breast Cancer dataset
data = datasets.load_breast_cancer()
X = data.data
y = data.target
# Check the shape of the dataset
print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the data (important for SVM)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Instantiate the SVM classifier with RBF kernel
svm = SVC(kernel='rbf', gamma='scale', C=1)
# Fit the model to the training data
svm.fit(X_train, y_train)
# Make predictions on the test set
y_pred = svm.predict(X_test)
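# Evaluate the model performance: accuracy, confusion matrix, and per-class metrics
print(f"Test accuracy: {svm.score(X_test, y_test):.3f}")
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))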