I know the typical answer is "no, because all the libs are in Python"... but I'm kind of baffled why more porting isn't going on, especially to Go, given that Go, like Python, is stupid easy to learn and yet much faster to run. Truly not trying to start a flame war or anything. I'm just a bigger fan of Go than Python, and I was thinking that coming into 2024, especially with all the huge money in AI now, we'd see a LOT more movement toward Go's much faster runtime, while keeping code that's largely as easy, if not easier, to write and maintain. Not sure about Rust: it may run a little faster than Go, but the language is much harder to learn and use. Still, it's been growing in popularity, so I was curious whether it's a potential option.
There are some Go libs out there, but the few I've found seem to be three, four, or more years old. I was hoping there would be things like PyTorch and the like converted to Go.
I was even curious, with the power of GPT-4 or DeepSeek Coder or similar, how hard would it be to run conversions from Python libraries to Go? Is anyone working on that, or is it pretty much impossible?
You should understand that Python was a leader in data manipulation, statistics, scientific workloads, and the unix pipeline/glue space (having largely supplanted Perl, Awk, and R) before becoming a leader in AI. AI was just a natural extension, because Python had all the right stuff for manipulating data and running numbers, and manipulating data is really the bigger part of AI, aside from developing the NNA (neural network architecture) itself (but that is a specialised job for a handful of people, and it isn't constantly reworked the way training data is). Python is not really slower for this kind of work, because the accelerated underlying libraries do the heavy lifting for the NNAs, and the data manipulation part is usually I/O bound anyway. In short, Python is the right tool for the job, rather than the wrong one, if you understand the actual problems that AI researchers face.
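To make that concrete, here's a minimal sketch of why interpreter speed rarely matters for this work (assuming PyTorch as the accelerated library; NumPy behaves the same way): one Python-level call dispatches into optimized C++/CUDA kernels, so essentially none of the runtime is spent in Python itself.

```python
import torch

# Two large matrices; creating them is a one-time Python call.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# Move to the GPU if one is available; the multiply then runs as a CUDA kernel.
if torch.cuda.is_available():
    a, b = a.cuda(), b.cuda()

# One Python-level call, billions of multiply-accumulates in native code.
c = a @ b
print(c.shape)  # torch.Size([4096, 4096])
```

Porting the Python glue to Go wouldn't speed this up, because the glue was never the bottleneck.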
Inference is just running the models to do useful work, rather than training them. Rust can be used for that too. I do plan to use Rust for this as well, but not by abandoning Python: it's for a different use case, where I want to be able to build executables that just work. Since Python is interpreted, it's harder to ship a binary that will work on any system. That matters for AI-based end-user, mass-market applications far more than for training or inference itself. Rust can deploy almost anywhere, from servers to Android to the client side of web browsers. That said, I'm concerned about the libraries Rust might have available for AI and the other stuff my app will need, even though candle looks great so far.
Data prep is more like cleaning the input/training data before training on it.
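As a hedged sketch of what that cleaning step can look like in practice (the specific rules here, whitespace collapsing, a length floor, and exact-duplicate removal, are illustrative assumptions, not a fixed recipe):

```python
import re

def clean(records):
    """Yield cleaned, deduplicated training texts."""
    seen = set()
    for text in records:
        text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
        if len(text) < 20:   # drop tiny fragments (threshold is arbitrary)
            continue
        if text in seen:     # drop exact duplicates
            continue
        seen.add(text)
        yield text

raw = [
    "Hello   world, this is a training example.",
    "hi",                                        # too short: dropped
    "Hello world, this is a training example.",  # duplicate after cleanup: dropped
]
print(list(clean(raw)))  # only the one cleaned example survives
```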
The vector part that you're starting to get a sense of is not a data prep thing; it's much closer to how transformers work. They transform vectors in a hyperspace. So you throw all of the words into the space, and the AI learns the right vectors to represent the knowledge about how all those words relate.
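A toy illustration of that idea (the vocabulary and vector dimensions are made-up assumptions; this just shows the mechanics of words living as learnable vectors):

```python
import torch
import torch.nn as nn

vocab = {"cat": 0, "dog": 1, "car": 2}  # toy vocabulary
emb = nn.Embedding(len(vocab), 8)       # each word gets an 8-dim learned vector

ids = torch.tensor([vocab["cat"], vocab["dog"]])
vectors = emb(ids)                      # shape (2, 8): one vector per word
print(vectors.shape)

# Training nudges these vectors so related words end up close together;
# at initialization the similarity below is just random noise.
sim = torch.cosine_similarity(vectors[0], vectors[1], dim=0)
print(float(sim))
```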
A vector database is different: my understanding is that you basically load data, break it into chunks, project each chunk into a hyperspace (maybe the SAME shape of hyperspace by necessity, not sure), and store each (vector, chunk) pair in the database. At query time, the relevant chunks get pulled into your LLM's context. It's like giving the AI an index card for reference, with the AI as the librarian: you ask it a question, and it might already know the answer, or it can go to its card index and dig out the information.
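Here's a toy version of that retrieval loop, under loud assumptions: embed_chunk() is a hypothetical stand-in for a real embedding model (so the similarities it produces are meaningless noise), and the "database" is just a brute-force list. Real systems use a trained embedder plus an approximate nearest-neighbor index, but the shape of the idea is the same.

```python
import hashlib
import numpy as np

def embed_chunk(text: str) -> np.ndarray:
    # HYPOTHETICAL embedding: a deterministic random vector derived from a hash.
    # A real system would call an embedding model here, so that semantically
    # similar texts land near each other; this stand-in does NOT do that.
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).standard_normal(16)
    return v / np.linalg.norm(v)

# Index: break the data into chunks and store (vector, chunk) pairs.
chunks = [
    "Paris is the capital of France.",
    "Mitochondria are the powerhouse of the cell.",
    "Rust compiles to native binaries.",
]
index = [(embed_chunk(c), c) for c in chunks]

# Query: embed the question, pull the nearest "index card", and hand that
# chunk to the LLM as context alongside the question.
query = embed_chunk("What is the capital of France?")
best = max(index, key=lambda pair: float(pair[0] @ query))
print("retrieved:", best[1])
```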