Despite its limited scope, the PR description manages to contain several errors.
According to benchmarks in issue #31130:
- With broadcast: np.column_stack → 36.47 µs, np.vstack().T → 27.67 µs (24% faster)
- Without broadcast: np.column_stack → 20.63 µs, np.vstack().T → 13.18 µs (36% faster)
The PR fails to calculate the speed-up correctly. The actual speed-ups are +32% and +57%; the quoted figures are the reductions in run time (-24% and -36%). Those figures are also just regurgitated from the original issue.
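The difference between the two percentages can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the timings quoted above; the helper names `time_reduction` and `speed_up` are made up for illustration.

```python
# Timings in microseconds, copied from the figures quoted above:
# (np.column_stack, np.vstack().T)
timings = {
    "with broadcast": (36.47, 27.67),
    "without broadcast": (20.63, 13.18),
}


def time_reduction(old, new):
    """Fraction of the original run time that was saved."""
    return (old - new) / old


def speed_up(old, new):
    """Fractional gain in speed, i.e. how much faster the new code runs."""
    return old / new - 1


results = {}
for label, (old, new) in timings.items():
    results[label] = (time_reduction(old, new), speed_up(old, new))
    reduction, gain = results[label]
    print(f"{label}: run time reduced by {reduction:.0%}, speed-up of {gain:.0%}")
```

A 24% shorter run time corresponds to a 32% speed-up, and a 36% shorter run time to a 57% speed-up, so "24% faster" and "36% faster" understate the improvement.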
The improvement comes from np.vstack().T doing contiguous memory copies and returning a view, whereas np.column_stack has to interleave elements in memory.
This explanation is likewise regurgitated from the original issue.
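The memory-layout part of that explanation is easy to inspect directly. The following sketch assumes two small 1-D inputs (the arrays `a` and `b` are illustrative, not taken from the PR) and only looks at the layout of the results, not at timing.

```python
import numpy as np

# Two illustrative 1-D inputs of equal length.
a = np.arange(3.0)
b = np.arange(3.0, 6.0)

stacked = np.column_stack([a, b])   # shape (3, 2), freshly allocated
transposed = np.vstack([a, b]).T    # shape (3, 2), transposed view

# For 1-D inputs the two results hold the same values...
same_values = np.array_equal(stacked, transposed)

# ...but they are laid out differently in memory.
print("column_stack C-contiguous:", stacked.flags.c_contiguous)
print("vstack().T   C-contiguous:", transposed.flags.c_contiguous)
print("vstack().T   F-contiguous:", transposed.flags.f_contiguous)
print("vstack().T   is a view:   ", transposed.base is not None)
```

The values match, but `np.vstack().T` hands back a column-major view, so downstream code that relies on a C-contiguous result could behave or perform differently.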
Changes
- Modified 3 files
- Replaced 3 occurrences of np.column_stack with np.vstack().T
- All changes are in production code (not tests)
- Only verified safe cases are modified
- No functional changes - this is a pure performance optimization
The PR actually changes 4 files, not 3.
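As for the "verified safe cases" claim, the replacement is only equivalent for 1-D inputs of equal length, which is why each call site needs checking. A small sketch with hypothetical 2-D inputs (`x` and `y` are not from the PR) shows how the two calls diverge:

```python
import numpy as np

# Hypothetical 2-D inputs; the PR's actual call sites are not shown here.
x = np.zeros((3, 2))
y = np.ones((3, 2))

# column_stack joins 2-D arrays side by side along axis 1.
column_shape = np.column_stack([x, y]).shape

# vstack joins them along axis 0, and .T then swaps the axes.
vstack_t_shape = np.vstack([x, y]).T.shape

print("np.column_stack:", column_shape)
print("np.vstack().T:  ", vstack_t_shape)
```

With 2-D inputs the shapes no longer agree, so the substitution is not a drop-in replacement in general.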
