Problem
The same algorithm in two languages can have wildly different real-world cost. K-means is a small, well-defined kernel that exercises memory layout, vectorization, and parallelism — a clean target for a head-to-head.
Approach
- Python side: NumPy + scikit-learn baseline, plus a hand-rolled NumPy implementation to control assignment + update loops directly.
- Rust side: from-scratch implementation using
ndarrayand Rayon for parallel assignment. - Both run on the same synthetic and real datasets across a range of n, k, and dimensionality.
- Measured cold-start, per-iteration cost, and steady-state throughput.
Links
- Repo: github.com/nilesh-patil/pythonvsrust-kmeans
- Related blog post: Distributed K-Means Clustering in Python