Python vs Rust: k-means at scale

Problem

The same algorithm in two languages can have wildly different real-world cost. K-means is a small, well-defined kernel that exercises memory layout, vectorization, and parallelism — a clean target for a head-to-head.

Approach

Python side: NumPy + scikit-learn baseline, plus a hand-rolled NumPy implementation to control assignment + update loops directly.
Rust side: from-scratch implementation using ndarray and Rayon for parallel assignment.
Both run on the same synthetic and real datasets across a range of n, k, and dimensionality.
Measured cold-start, per-iteration cost, and steady-state throughput.

Python vs Rust: k-means at scale

Nilesh Patil

Problem

Approach

Links

Share on