Problem

The same algorithm implemented in two languages can differ widely in real-world cost. K-means is a small, well-defined kernel that exercises memory layout, vectorization, and parallelism, which makes it a clean target for a head-to-head comparison.

Approach

  • Python side: NumPy + scikit-learn baseline, plus a hand-rolled NumPy implementation that controls the assignment and update loops directly.
  • Rust side: from-scratch implementation using ndarray and Rayon for parallel assignment.
  • Both run on the same synthetic and real datasets across a range of n, k, and dimensionality.
  • Metrics: cold-start time, per-iteration cost, and steady-state throughput.
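
As a rough illustration of the hand-rolled NumPy variant listed above, a minimal sketch might look like the following. It is not the project's actual code. The function names (assign, update, kmeans), the random initialization, the rule for empty clusters, and the use of time.perf_counter for per-iteration timing are all assumptions made for this example.

```python
import time

import numpy as np


def assign(X, centroids):
    """Return the index of the nearest centroid for each row of X."""
    # Squared Euclidean distance expanded as ||x||^2 - 2 x.c + ||c||^2,
    # so the heavy lifting is a single matrix product of shape (n, k).
    x_sq = np.einsum("ij,ij->i", X, X)[:, None]
    c_sq = np.einsum("ij,ij->i", centroids, centroids)[None, :]
    d2 = x_sq - 2.0 * (X @ centroids.T) + c_sq
    return np.argmin(d2, axis=1)


def update(X, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    k, d = centroids.shape
    sums = np.zeros((k, d), dtype=np.float64)
    np.add.at(sums, labels, X)  # unbuffered scatter-add of rows into their cluster
    counts = np.bincount(labels, minlength=k)
    new = centroids.astype(np.float64, copy=True)
    nonempty = counts > 0
    # Assumption: an empty cluster keeps its previous centroid.
    new[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new


def kmeans(X, k, n_iter=20, seed=0):
    """Run a fixed number of iterations and record wall time per iteration."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize from k distinct data points chosen at random.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(np.float64)
    iter_seconds = []
    for _ in range(n_iter):
        t0 = time.perf_counter()
        labels = assign(X, centroids)
        centroids = update(X, labels, centroids)
        iter_seconds.append(time.perf_counter() - t0)
    labels = assign(X, centroids)  # labels consistent with the final centroids
    return centroids, labels, iter_seconds


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic data: three Gaussian blobs in 2-D (illustrative only).
    X = np.concatenate(
        [rng.normal(loc=c, scale=0.5, size=(500, 2)) for c in (-5.0, 0.0, 5.0)]
    )
    centroids, labels, iter_seconds = kmeans(X, k=3, n_iter=10)
    print("mean seconds per iteration:", sum(iter_seconds) / len(iter_seconds))
```

Keeping assign and update as separate functions makes it possible to time each phase on its own. The first entry of iter_seconds can be compared with the later ones as a crude proxy for cold-start versus steady-state cost.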