Post

🧭 K-Nearest Neighbors (KNN): The Complete Beginner's Guide

🧭 K-Nearest Neighbors (KNN): The Complete Beginner's Guide

K-Nearest Neighbors (KNN) is perhaps the most intuitive algorithm in the machine learning repertoire. Its logic is simple: “Tell me who your neighbors are, and I’ll tell you who you are.” It is a non-parametric, lazy learning algorithm used for both classification and regression.

1. Core Functions

While primarily known for classification, KNN is a versatile tool:

  • Classification: Assigns a label based on a plurality vote of the $k$ nearest neighbors.
  • Regression: Predicts a continuous value by averaging the values of the $k$ neighbors.
  • Imputation: Fills missing data by using the mean or mode of similar complete records.
  • Anomaly Detection: Identifies outliers as points lacking a dense neighborhood.

2. Model Architecture

Non-Parametric & Lazy

Unlike algorithms that learn weights (like Linear Regression), KNN is Non-Parametric: the number of “parameters” grows with the data. It is also a Lazy Learner, meaning there is no training phase. It simply “memorizes” the dataset and performs all logic during the prediction phase.

Key Hyperparameters

  • $k$ (Number of Neighbors): Controls the bias-variance tradeoff.
    • Low $k$: High variance (overfitting), sensitive to noise.
    • High $k$: High bias (underfitting), smoother decision boundaries.
  • Distance Metric: The mathematical definition of “closeness.”

3. Distance Metrics

The choice of distance determines how the algorithm perceives similarity.

MetricMathematical FormulaBest Use Case
Euclidean$d(x, y) = \sqrt{\sum (x_i - y_i)^2}$Standard straight-line distance.
Manhattan$d(x, y) = \sum |x_i - y_i|$High-dimensional or grid-based data.
Minkowski$(\sum |x_i - y_i|^p)^{1/p}$Generalized formula ($p=1$ is Manhattan, $p=2$ is Euclidean).
Hamming$\text{Count of differing symbols}$Categorical or binary data.
Cosine$\frac{A \cdot B}{|A| |B|}$Text similarity (orientation over magnitude).

4. The Prediction Mechanism

To predict a test sample, KNN follows a strict sequence:

  1. Distance Calculation: Computes the distance between the query point and every training point.
  2. Neighbor Selection: Sorts distances and isolates the $k$ points with the smallest values.
  3. Output Aggregation:
    • Classification: Uses Majority Voting (Plurality) to assign a class.
    • Regression: Uses Average Aggregation (Mean or Median).

Pro Tip: In binary classification, always use an odd $k$ to prevent mathematical ties!


5. Optimization & Accuracy

To get the most out of KNN, you must handle the data with care:

Feature Scaling

Because KNN is distance-based, features with large ranges will dominate the model. Normalization (0-1) or Standardization (Z-score) is mandatory.

The Curse of Dimensionality

As features increase, the “volume” of the space grows exponentially, making all points seem equidistant. Accuracy crashes.

  • Solution: Use PCA or Feature Selection to reduce noise.

Weighted Voting

Instead of equal votes, use Inverse Distance Weighting. This gives points that are physically closer to the query more influence than points further away.


6. Computational Complexity

KNN is “cheap” to train but “expensive” to use.

PhaseBrute ForceOptimized (Trees)
Training$O(1)$$O(d \cdot n \log n)$
Prediction$O(n \cdot d)$$O(d \cdot \log n)$
Space$O(n \cdot d)$$O(n \cdot d)$

where $n$ is samples and $d$ is dimensions.

To improve efficiency, use spatial partitioning:

  • KD-Trees: Great for low-dimensional, dense data.
  • Ball Trees: Better for high-dimensional or sparse data.

7. Visualizing the Boundaries

Decision Boundary Map

Shows the colored regions where the algorithm shifts its vote. As $k$ increases, these boundaries transition from jagged and complex to smooth and generalized.

Voronoi Diagram

A specific geometric map for $k=1$. It divides the entire space into “cells” around each training point. Every location inside a cell is assigned the class of the center point.


8. Summary of Sensitivities

Warning: KNN is highly sensitive to:

  1. Outliers: A single bad point can flip a local vote.
  2. Imbalance: Majority classes naturally “outvote” minority classes.
  3. Missing Data: You cannot calculate distance with null values.

Final Rule of Thumb

Start with $k = \sqrt{n}$ (where $n$ is your sample size) and refine using Cross-Validation.

This post is licensed under CC BY 4.0 by the author.