8 minutes to display 1000 rows? Sounds like a bug somewhere. How many bytes do you have per row, roughly?
jacobgorm
Because Python allows you to prototype and iterate quickly, whereas in Rust you have to fight the compiler every step of the way to convince it to do what you want. People have been trying to build DL frameworks in languages such as Swift and C++ (dlib, Flashlight), but none have taken off.
Python can be a PITA due to things like the lack of true multi-threading, but for most tasks it is quick and easy to experiment in, and the amount of code you have to write is not too far off from the corresponding mathematical notation, so for now I think it will keep its position as the most popular language for AI/ML.
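To illustrate that closeness (my own hedged sketch with NumPy, not something from the thread): an expression like y = softmax(Wx + b) translates nearly one-to-one into code.

```python
import numpy as np

def softmax(z):
    # softmax(z)_i = exp(z_i) / sum_j exp(z_j), shifted by max(z) for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

# y = softmax(W x + b), written almost exactly like the math
W = np.random.randn(10, 784)
x = np.random.randn(784)
b = np.zeros(10)
y = softmax(W @ x + b)
```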
Before we could use Python, most researchers were using MATLAB, which was really holding back progress due to its closed-source nature.
I think you're right about the lazy eval. Can you somehow materialize or dump/re-import the 1000-row view to use for experimentation?
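Something along these lines might work, assuming a Polars-style lazy API (the library and file names here are placeholders, since the thread doesn't say which tool is being used):

```python
import polars as pl

# Lazily scan the (hypothetical) big dataset without loading it all.
lazy = pl.scan_parquet("big_dataset.parquet")

# Materialize just the first 1000 rows once...
sample = lazy.limit(1000).collect()

# ...and dump them to a small file you can re-import cheaply while experimenting.
sample.write_parquet("sample_1000.parquet")
df = pl.read_parquet("sample_1000.parquet")
```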
FWIW, sampling 1000 rows at random is equivalent to randomly permuting the entire dataset and reading out the first 1000 rows. Not sure if that would be feasible or helpful in your case, but a merge sort makes the permutation an O(n log n) operation, so in theory it should not be too horrible.
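A minimal sketch of that idea in plain Python (illustrative only; it assumes the rows can be held in memory for the sort):

```python
import random

def sample_rows(rows, k=1000, seed=0):
    # Pair every row with a random key; sorting by that key is a random
    # permutation of the dataset, O(n log n) as with merge sort.
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]
    keyed.sort(key=lambda kr: kr[0])
    # Reading out the first k rows of the permutation gives a uniform sample.
    return [row for _, row in keyed[:k]]

# Example: sample 1000 "rows" out of a million integers.
sample = sample_rows(range(1_000_000), k=1000)
```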