How to Implement K-Nearest Neighbors (KNN) Algorithm Using Scikit-learn in Python
Description:
K-Nearest Neighbors (KNN) is a simple and powerful algorithm used for classification and regression tasks. It works by finding the 'K' closest data points to a given input point and assigning a label based on the majority vote of those neighbors (for classification) or the average of their values (for regression). KNN is a non-parametric, lazy learning algorithm: it builds no explicit model during training and defers all computation until prediction time.
In this documentation, we will implement a K-Nearest Neighbors Classifier using the scikit-learn library to classify data points.
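As a minimal sketch of both modes, using made-up toy arrays purely for illustration:
# Minimal sketch: KNN classification vs. regression on toy 1-D data
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])               # class labels
y_reg = np.array([1.1, 1.9, 3.2, 9.8, 11.2, 12.1])   # continuous targets

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)

print(clf.predict([[2.5]]))   # majority vote of the 3 nearest labels -> [0]
print(reg.predict([[2.5]]))   # mean of the 3 nearest targets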
Step-by-Step Process
Step 1: Data Collection
Collect or choose a dataset that contains labeled data points for classification.
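For example, scikit-learn ships several small labeled datasets; a quick sketch with the Iris dataset (the same one used in the sample code below):
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target   # 150 samples, 4 features, 3 class labels
print(X.shape, y.shape)         # (150, 4) (150,)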
Step 2: Data Preprocessing
Clean the data, handle missing values, and normalize the features (since KNN is sensitive to the scale of the data).
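A sketch of standardization with StandardScaler; note that the scaler is fitted on the training data only, so no test-set statistics leak into preprocessing (the arrays here are toy values):
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_test = np.array([[2.5, 350.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean and std from training data
X_test_scaled = scaler.transform(X_test)         # reuse the same statistics on test data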
Step 3: Model Training
KNN does not have a traditional training phase; it memorizes the entire dataset.
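Calling fit is therefore cheap; a sketch showing that the "training" step only stores the data (or builds a neighbor-search index):
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)   # no weights are learned; the data is kept for lookup at prediction time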
Step 4: Distance Calculation
Calculate the distance between the test point and all training points.
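A NumPy sketch of the Euclidean distances from one test point to every training point; scikit-learn computes these internally (and offers faster tree-based search), and the points here are toy values:
import numpy as np

X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # toy training points
x_test = np.array([2.0, 3.0])                              # one test point

distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))  # Euclidean distance to each row
print(distances)   # [1.414... 1.414... 4.242...]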
Step 5: Finding Neighbors
Sort the distances and select the 'K' nearest points.
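Continuing that sketch, np.argsort yields the indices of the K smallest distances:
import numpy as np

distances = np.array([1.41, 1.42, 4.24, 0.5, 2.0])   # toy distances from the previous step
k = 3
nearest_idx = np.argsort(distances)[:k]   # indices of the K closest training points
print(nearest_idx)   # [3 0 1]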
Step 6: Classification/Regression
For classification, use majority voting to predict the class label of the test point. For regression, calculate the average of the K-nearest values.
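A sketch of both decision rules, using collections.Counter for the vote (the neighbor labels and values below are made up):
import numpy as np
from collections import Counter

neighbor_labels = [0, 0, 1]                   # class labels of the K nearest points
neighbor_values = np.array([2.5, 3.0, 2.8])   # their target values, for the regression case

predicted_class = Counter(neighbor_labels).most_common(1)[0][0]   # majority vote -> 0
predicted_value = neighbor_values.mean()                          # average of neighbors
print(predicted_class, predicted_value)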
Step 7: Model Evaluation
Evaluate the model using various classification metrics such as accuracy, confusion matrix, precision, recall, and F1-score.
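A sketch of these metrics on hypothetical true and predicted labels:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

y_true = [0, 1, 1, 2, 2, 2]   # toy ground-truth labels
y_pred = [0, 1, 2, 2, 2, 1]   # toy predictions

print(accuracy_score(y_true, y_pred))         # fraction of correct predictions
print(confusion_matrix(y_true, y_pred))       # rows: true class, columns: predicted class
print(classification_report(y_true, y_pred))  # per-class precision, recall, F1-score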
Sample Code
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.datasets import load_iris
# 1. Load dataset
data = load_iris()
X = data.data
y = data.target
# 2. Data Preprocessing
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Standardize the features (important for KNN)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# 3. Initialize and Train the KNN Model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# 4. Make Predictions on the Test Set
y_pred = knn.predict(X_test)
# 5. Evaluate the Model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
# 6. Plot Decision Boundaries (using the first two features)
X_train_2d = X_train[:, :2] # Use only the first two features for visualization
X_test_2d = X_test[:, :2]
# Fit the model again using only the 2D data
knn_2d = KNeighborsClassifier(n_neighbors=5)
knn_2d.fit(X_train_2d, y_train)
# Build a grid over the 2D feature space and predict the class at every grid point
x_min, x_max = X_train_2d[:, 0].min() - 1, X_train_2d[:, 0].max() + 1
y_min, y_max = X_train_2d[:, 1].min() - 1, X_train_2d[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))
Z = knn_2d.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
# Plot the decision regions with training points (circles) and test points (triangles)
plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X_train_2d[:, 0], X_train_2d[:, 1], c=y_train, edgecolor='k')
plt.scatter(X_test_2d[:, 0], X_test_2d[:, 1], c=y_test, marker='^', edgecolor='k')
plt.xlabel(data.feature_names[0])
plt.ylabel(data.feature_names[1])
plt.title('KNN Decision Boundaries (first two features)')
plt.show()