this post was submitted on 28 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


I know the typical answer is "no, because all the libs are in Python".. but I am kind of baffled why more porting isn't going on, especially to Go, given that Go, like Python, is stupid easy to learn and yet much faster to run. Truly not trying to start a flame war or anything. I am just a bigger fan of Go than Python, and was thinking that coming into 2024, especially with all the huge money in AI now, we'd see a LOT more movement toward the much faster runtime of Go, which is largely as easy, if not easier, to write/maintain code in. Not sure about Rust.. it may run a little faster than Go, but the language is much more difficult to learn/use. It has been growing in popularity though, so I was curious if that is a potential option.

There are some Go libs I've found, but the few I have found seem to be 3, 4 or more years old. I was hoping there would be things like PyTorch and the like converted to Go.

I was even curious, with the power of GPT-4 or DeepSeek Coder or similar, how hard it would be to run conversions from Python libraries to Go. Is anyone working on that, or is it pretty much impossible to do?

[–] mmgaggles@alien.top 1 points 11 months ago

It always blew my mind that people were using boto3 to train from S3, or some sort of hack solution like s3fs [1]. The AWS plugin for cpp was a good answer to this, but it's in stasis while its job moves to torch.data [2].

boto3 and s3fs can’t hold a candle to a good go / rust / cpp based s3 client.

[1] https://pytorch.org/data/beta/dp_tutorial.html#accessing-aws-s3-with-fsspec-datapipes [2] https://github.com/aws/amazon-s3-plugin-for-pytorch

[–] innocent2powerful@alien.top 1 points 11 months ago (1 children)

The demand truly exists. The highest-voted answer doesn't recognize the machine cost of serving a full AI product. The algorithm-related libraries are fast, but langchain is so slow. What I know is that many companies are replacing Python with Go now.

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago

Well that is interesting. What libraries are they using in Go, do you know? Or are they building their own from scratch? I would imagine there would be some movement to translate Python to Go for some situations. There are a couple of examples within this thread that show some good use of Go with OpenAI (and a local llama as well).

I am thinking that I could stand up a model locally that exposes the OpenAI API, and then write some code in Go that calls that API.. and then it could easily swap over to the ChatGPT APIs, or to our own larger model if we decide to run one in the cloud.

[–] filchr@alien.top 1 points 11 months ago

If you dislike Python but still want a high-level feel with performant code, you can try Julia. ML libraries in Julia are written well.. in Julia.

[–] AppointmentPatient98@alien.top 1 points 11 months ago (1 children)

Most AI training code is just a big for loop with each line calling highly performant c/cpp libraries underneath. There is no value that go or rust can add here.

[–] dobkeratops@alien.top 1 points 11 months ago

> Most AI training code is just a big for loop with each line calling highly performant c/cpp libraries underneath. There is no value that go or rust can add here.

(Most libs used in Python are not in Python; only their stubs are in Python.)

Some people want to train on procedural generators (e.g. a game engine), which would be in C++. Being able to have the whole codebase in one language would smooth this out. (In my case I have a Rust 3D engine codebase that I'd like to use to drive AI.)

ggml is a great idea IMO.

[–] Disastrous_Elk_6375@alien.top 1 points 11 months ago (1 children)

There's some movement in the rust space, the main advantage being that you can compile to wasm and serve models in any browser. There are several efforts in this direction. This can also be linked to edge-computing, with more services starting to use wasm/wasi etc. There's a world where you have your entire codebase in rust, and you get to deliver models either to browsers or wasm "VMs" in an edge provider.

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago

That sounds pretty slick.

[–] ttkciar@alien.top 1 points 11 months ago (1 children)

I'm mostly using Bash and Perl.

[–] thewayupisdown@alien.top 1 points 11 months ago (1 children)

Could you elaborate on the Perl part, if possible? I don't mind learning as much Python as necessary as I go along, but I'd much rather be doing all that is convenient in Perl.

[–] m98789@alien.top 1 points 11 months ago (1 children)
[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago (1 children)

But the actual training code.. isn't there a crap ton of code that trains the model so that the model can respond with NLP and other capabilities? There has to be code behind all that somewhere. The ability for the "logic" of the AI to do what it does.. that code is Python as well, yah? I would assume that in Go or Rust or C it would execute much faster, and the AI could be much faster (and use less memory, no Python runtime, etc.)? Or is there already some backend C/C++ code that does ALL that AI logic/guts, and Python, even for training models, is still just glue that calls into the C/C++ layer?

[–] m98789@alien.top 1 points 11 months ago (4 children)

Correct, even for training the models, all the Python code you see is really just a friendly interface over highly optimized C/cuda code.

There are no “loops” or matrix multiplication being done in Python. All the heavy lifting is done in lower level highly optimized code.
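To make that concrete, a typical PyTorch training step looks something like this (a minimal sketch, not any particular project's code); every Python line just dispatches into optimized C++/CUDA kernels:

```python
import torch

# Toy model and fake data; the Python here only *describes* the work.
model = torch.nn.Linear(512, 512)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(64, 512)
y = torch.randn(64, 512)

for step in range(100):                               # the "big for loop"
    loss = torch.nn.functional.mse_loss(model(x), y)  # forward: C++/CUDA kernels
    opt.zero_grad()
    loss.backward()                                   # backward: C++ autograd engine
    opt.step()                                        # parameter update: C++ again
```

Swap the toy model for a transformer and the random tensors for token batches and the picture is the same; interpreter overhead is a rounding error next to kernel time.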

[–] the_quark@alien.top 1 points 11 months ago (1 children)

The runtime of your code basically doesn't matter. You hand it off to a GPU for all the hard calculations, and even in a fast environment, your code is going to spend 99% of its execution time waiting to get stuff back from the GPU.

I'm sure Python's I/O polling sleeps are just as efficient as Go's.
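You can actually see the handoff: PyTorch kernel launches are asynchronous, so each Python line returns almost immediately and the real time is spent waiting for the GPU to finish. A quick sketch (needs a CUDA-capable machine):

```python
import time
import torch

if torch.cuda.is_available():
    a = torch.randn(4096, 4096, device="cuda")
    t0 = time.perf_counter()
    for _ in range(100):
        _ = a @ a                    # just queues a matmul kernel and returns
    queued = time.perf_counter() - t0
    torch.cuda.synchronize()         # block until the GPU actually finishes
    total = time.perf_counter() - t0
    print(f"Python dispatch: {queued:.4f}s, with GPU work: {total:.4f}s")
```

The dispatch time is the only part a faster host language could shave off.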

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago (1 children)

Interesting. I was thinking more of the code that is used to train models. It seems the ability to run a model is pretty well covered with the likes of llamacpp and such, so I'm not sure it makes much sense in that area. But I assume the team at OpenAI spent a lot of time writing code that is used for the training aspect? It can't just be a couple lines of code that read in some vector data, train the model, then write it out. There must be a ton of logic/etc. for that as well?

But then again, maybe that doesn't need to be fast either. I don't know.

[–] the_quark@alien.top 1 points 11 months ago (2 children)

The model training is also GPU-intensive. If you read about model training costs, they talk about things like "millions of GPU-hours."

As I understand the process (note, I am a software developer, but do not professionally work on AI), you're feeding lots of example text into the GPU during the training process. The hard work is curating and creating those examples, but that's largely either human-intensive or you're using an LLM to help you with it, which is...GPU-intensive.

Optimizing the non-GPU code just isn't much of a win. I know all the cool kids hate Python because it's dynamically typed and not compiled, but it's just not much of a bottleneck in this space.

[–] n4pst3r3r@alien.top 1 points 11 months ago (1 children)

For Rust there is candle. There are example binaries in the crate that you can just build to run e.g. a Llama model. TheBloke also links to Candle in the documentation of models it can run.

Another rust lib is burn, which has promising support for different backends, but can't run too many models yet.

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago

Burn looks interesting. So I think I am just lacking the understanding of how this all works. I assumed there is code in AI that handles the models.. some way to use the models as "data", while the NLP, the AI "logic" brain, etc. would be done in code. I assumed that is largely the Python code. I assumed models were more or less data that a runtime AI engine uses to find answers to the questions asked, and thus that the model runners handled the NLP work and turned incoming queries into some model-specific format that lets the algorithms of the model do what they do.. e.g. return responses and answers as if a human replied. I assumed ALL of that was tons and tons of code done in Python, and thus was thinking: if that is the runtime "brain" the AI uses, wouldn't it run even faster in Go or Rust or something?

I am sadly not sure if I am explaining this right. I just assumed there were likely millions of lines of code behind the AI "brain" and that the model was basically gobs of data in some sort of.. for lack of a better word, compressed database format. So when "training" occurs.. I am not entirely clear what is going on, other than that it takes a ton of compute and results in a single .gguf or similar file, the model, which can then be loaded by the likes of ollama etc. and queried against by users in plain English. The code behind the training, the code behind running a model.. that is what I am foggy on. Is there code IN the model.. in binary format or something, along with ALL the data it draws from?

I originally thought AI would use the internet in real time.. but that would clearly take a LOT longer, with the AI searching the web for answers and then formulating some sort of intelligent response rather than just pasting together snippets it finds.

[–] Water-cage@alien.top 1 points 11 months ago

I use raw php /s

[–] perlthoughts@alien.top 1 points 11 months ago

yep, and the future is optimizers, custom compiled CUDA kernels, and more ASIC chips (eventually). Python is just the glue that is commonly used. It's good glue though, but there are other glues...
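Case in point: as of PyTorch 2.x the glue can even JIT-compile the math into fused kernels for you, e.g.:

```python
import torch

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

f_compiled = torch.compile(f)  # traces f and emits fused kernels (Triton on GPU)
x = torch.randn(1_000_000)
assert torch.allclose(f(x), f_compiled(x))
```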

[–] KyxeMusic@alien.top 1 points 11 months ago (1 children)

Learning it cause I believe it will be valuable in the future.

Sure python is just the glue, and we won't actually see much difference in terms of speed. But executing models as simple binaries without dependencies is more valuable than people think for scalability.

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago

That's an interesting response. I responded elsewhere that I am just missing how all this comes together. I assumed the runtime (e.g. the Python glue in this case) handles the NLP query you type in and turns it into some meaningful structure that is then applied to the model to find responses. I am unclear if the model is the binary code/logic/brain of the AI.. or is it just a ton of data in a compressed format that the model runner uses to find stuff? I assume the former, since Python is apparently glue code.

But also.. the training stuff.. I am unclear if that is gobs and gobs of Python code.. if so, converted to Go or Rust.. wouldn't it train MUCH faster, given the big runtime speedup of something like Go or Rust? Or does that ALSO not matter, since most of the training is done on GPUs/ASICs and again Python is just glue code using those GPU/ASIC libraries (which are likely in C/C++)? E.g. TensorFlow and the CUDA libraries from Nvidia, I assume, are used for training.

But the code that actually does training.. that is what I am really trying to understand. Code somehow results in an AI that can "think" (though not AGI or sentient.. but seems like it can think) like a human.. and respond with often much better details and data than any human. ALL of the data it is trained on.. is basically rows and rows of structures that are like key/values (or something like that) I assume, and somehow that results in a single file (gguf or whatever format it is) that a very little bit of python code can then execute..

I am just baffled how all the different parts work.. and the code behind them. I always assumed Python was used in the early days due to "ease to learn", and that nobody gave a shit about the slow runtime speed because back then it was just starting; but now that it's big, it's too late to "rewrite" in Go or Rust or what not. But it sounds like a lot of the training stuff uses native Nvidia/ASIC/etc. binary libraries (likely done in C/C++), and the majority of the Python code doesn't need the speed of Go/Rust/C. It is, again, glue code that just uses the underlying C/C++ libraries provided by the hardware that is used for training?

[–] Exotic-Estimate8355@alien.top 1 points 11 months ago (1 children)

I disagree that Go is as simple as Python. Don't look only at the syntax but also at the stdlib, built-ins, etc. Simply writing a Fibonacci function with memoization is much more complicated in Go and requires you to think about more low-level, unnecessary stuff than in Python. In Python you have it in literally four lines of code.
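For reference, the Python version being alluded to is something like:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

The cache plumbing a Go version would have to spell out by hand (a map, lookups, inserts) is all hidden behind the decorator.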

[–] _Lee_B_@alien.top 1 points 11 months ago (2 children)

Go is neither as simple as Python, nor as powerful. In fact, I don't know of any modern general-purpose language that's more limited. It's faster, produces native code, and is type-safe to an extent, but that's about it. In almost every way, it's a bad excuse for a modern language.

[–] coehorn@alien.top 1 points 11 months ago (1 children)

Go doesn't have a GIL, and this reason alone is why I left Python years ago ;)
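A quick way to see the difference it makes (pure-Python CPU-bound work; two threads buy you nothing under the GIL):

```python
import threading
import time

def spin(n: int) -> None:
    while n > 0:        # pure-Python bytecode; the GIL serializes it across threads
        n -= 1

N = 20_000_000

t0 = time.perf_counter()
spin(N); spin(N)
print(f"sequential: {time.perf_counter() - t0:.2f}s")

t0 = time.perf_counter()
threads = [threading.Thread(target=spin, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"two threads: {time.perf_counter() - t0:.2f}s (about the same)")
```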

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago

What is GIL?

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago

I don't know why you say "nor as powerful". In which way? For a Go developer that knows Go, performance/memory-wise it's much more powerful. Maintaining code.. it's on par: a Go dev and a Python dev are very likely going to be "equal" in their ability to read/maintain code in their language of choice. Go lacks some things, sure, and is more verbose in some areas, sure, but I don't see how that makes it a bad excuse for a modern language. On the contrary, it is very fast/easy to learn, produces small, fast, (largely) memory-efficient binaries that can be built on all platforms for all platforms, and has probably some of the best threading capabilities of any language.

I would argue that for the use case, perhaps Python IS better suited for AI.. and I am just not at all knowledgeable in how that may be. So I'll give you that: if the runtime and training bits of AI do NOT need the much greater performance of a language like Go or Rust, then so be it. But if it's the usual "there are just good libraries in Python that have been written over many years and would have to be rewritten in Go/Rust/etc." excuse.. then that doesn't tell me that Python is better for AI, just that converting the existing Python libs that are so good would require work that nobody wants to do.

[–] A0sanitycomp@alien.top 1 points 11 months ago (1 children)

This is quite simple for me… I only know Python and very small amounts of JavaScript/HTML/CSS. More important than efficiency gains is just me getting the job done, which really is an efficiency gain in itself.

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago (1 children)

OK.. so that's fair, but I would counter with: if Go/Rust were going to increase the runtime performance of training/using the AI models by a factor of 2, 3 or more, and the time to learn Go is a couple weeks and Rust a couple years (kidding.. sort of), and your job for years to come is doing this sort of work, with the end result that, say, Go trains much faster or does data prep much faster.. wouldn't the benefits of learning Go or even Rust be worth it for the time savings in training/running, as well as the memory efficiency, fewer resources needed, etc.?

Not saying you should, cause I don't even know if Go/Rust/Zig would result in much faster training/etc. I would assume that if that were the case, then companies like OpenAI would have been using these languages already, since they have the money and time to do so.

[–] kilust@alien.top 1 points 11 months ago (1 children)

Actually using Rust with the Candle library for inference and production code for LLMs. Still using Python for training, as the majority of the ecosystem is in Python. We chose Rust for the intrinsic benefits of the language, and also because we build desktop apps with Tauri. We also use Rust for data preparation and other machine learning stuff besides LLMs. If you're just starting out with Rust, I would recommend gaining more experience before using it as your main language for ML.

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago

So are you (your company) building models fine-tuned for the specific needs of your company (products)? That is what I am trying to learn.. but knowing Go and not being a big fan of Python (don't know it well), I was hoping I could utilize my knowledge of Go + the runtime speed/memory/etc. to train my own custom models. I am NOT sure how all that works though. I feel like it's just some loop that reads in the prepared data and puts it in a new format, and that's it. lol. I don't quite understand what "training" does. How it works. Or the code behind training. Is it some iterative process.. like it keeps repeating the same thing so the AI "learns".. so like.. you ask it "What is 2+2" and it says 8, 7, 9, 13, 6, 5, 4, 2, 3, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 4.. ?? So eventually, on some iteration, it gets the right answer consistently.. and at that point you say "trained", next question?

[–] scooby374@alien.top 1 points 11 months ago (1 children)
[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago

I saw that from another response. Very cool. Burn also looks pretty good.

[–] okoyl3@alien.top 1 points 11 months ago

I'm using C++ TensorRT-based code to do inference in Rust. It feels sad, but at least I got it working.

[–] AfterAte@alien.top 1 points 11 months ago

Do a test with llama-cpp.exe directly and with oobabooga (which uses llama-cpp-python) and see if there's a consistent difference. I'm guessing even glue can be a bottleneck.
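A rough harness for the Python-binding side (assumes llama-cpp-python is installed; the model path is a placeholder):

```python
import time
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf", n_ctx=2048, verbose=False)  # placeholder path

t0 = time.perf_counter()
out = llm("Q: Why is the sky blue? A:", max_tokens=128)
dt = time.perf_counter() - t0

n = out["usage"]["completion_tokens"]
print(f"{n} tokens in {dt:.1f}s ({n / dt:.1f} tok/s)")
```

Compare the tok/s against what the llama.cpp binary itself reports for the same model, prompt, and settings.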

[–] Traditional_Can_2394@alien.top 1 points 11 months ago (1 children)

For RAG in Go, check out https://github.com/tmc/langchaingo. Still a work in progress, but definitely usable.

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago

I recently saw something about RAG. Man.. so many damn acronyms.. brain is exploding already with too much. :D

[–] _Lee_B_@alien.top 1 points 11 months ago (2 children)

Different trade-offs. Go is not python, and Rust is not Python, nor Go.

If you want raw CPU performance or very solid, reliable, production code that's maintainable and known-good, AND/OR you want code that is native, systems-level, and can be deployed on many devices and operating systems or even without an operating system, then some of the rust-based libraries might be the way to go.

If you're purely obsessed with CPU performance, assembly is the way to go, but using assembly optimally for machine learning on a modern CPU is a whole heap of study and work in its own right.

Arguably, but very importantly, the months you spend obsessing over such high-performance code could be wasted: it may well be obsolete by the time you're done coding it.

If you want easy, rapid development where you can focus on what the code DOES at a high level, with very cool meta-programming rather than being down in the weeds of how to move bytes around or who owns what piece of memory, python makes a lot more sense.

Honestly, I don't see much practical reason to go with a language like Go, though. It's a halfway house that is neither one nor the other.

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago (1 children)

I hear you.. that's what I am trying to understand. I guess going back to when AI dev started, maybe Go wasn't around much (too early), and Rust as well. But I question why the AI blokes would choose a dynamic, slow-runtime language for things like training AI, which seems to be the massive bulk of the CPU/GPU work, over much faster native binary languages like Go/Rust, or even C. But you said something, which maybe is what I am missing. Others have said this too: Python is more or less "glue code" that uses underlying C (native) binary libraries. If that is the case, then I get it. I assumed the super crazy long training times and expensive GPUs needed were due in part to Python being a much slower runtime.. and that using Go/Rust/C would reduce the long training times by quite a bit. But I am guessing from all the responses that the Python code just pushes the bulk of the work onto the GPU using native binary libs.. and thus the code done in Python does not have to be super fast at runtime. Thus, you pick up the "creative" side of Python and benefit from using that in ways that might be harder in Go or Rust.

But some have replied that they are using Rust for inference, data prep, etc.. I'll have to learn what inference is.. not sure what that part is, nor do I fully understand what data prep entails. Is it just turning gobs of all sorts of data in various formats into a specific structure (I gather from some reading, a vector database) whose layout the training part understands? So you're basically gathering data (scraping the web, reading CSV files, GitHub, etc.) and putting it into a very specific sort of key/value (or similar) structure that the training bit then uses to train with?

[–] HarambeTenSei@alien.top 1 points 11 months ago

Because the ML people already learned python, are comfortable and are not interested in putting in the effort to learn a new language for basically no benefit

[–] __SlimeQ__@alien.top 1 points 11 months ago

I use C#. Initially I'd gone all out trying to wrap Llama.cpp myself, but I was getting outdated in a matter of weeks and it was going to take a ton of effort to keep up.

So instead I run a local ooba server and use the API. I get to do all my business logic in nice, structured C#, while all the Python stuff stays in ooba and I don't have to dig into it really at all.

[–] mantafloppy@alien.top 1 points 11 months ago
[–] swfsql@alien.top 1 points 11 months ago (1 children)

I'm studying AI in Rust with dfdx.
The cool side of it is that you can have a model where every tensor (shape dimensions and lengths) is checked at compile time, and that includes layer parameters, allowing you to basically avoid runtime shape errors.

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago

I've no clue what that means.. I'll take your word for it. :) Just started on this AI journey, and so far I'm trying to learn how to run a model locally and figure out how to maximize what little hardware I have, but I'm interested in the whole shebang.. how you gather data, what sort of data, what it looks like, how you format it (is that inference?) to then be ingested during the training step. What code is used for training.. what does training do, how does training result in a model, and what does running the model do.. is the model a binary (code) that you just pass input to and get output from? So much to learn.

[–] seanpuppy@alien.top 1 points 11 months ago

I saw an interesting article somewhere that showed you can be a lot more memory-efficient doing inference with Rust, since you don't have several GBs of Python dependencies.

[–] squareOfTwo@alien.top 1 points 11 months ago (2 children)

Python is a degenerate language without static typing etc. which will die out at some point, just like Perl or Cobol. Don't listen to shortcut answers like "Python is only glue!!!!".

Not all ML workload is best to be written in Python.

Use your tools wisely.

[–] maccam912@alien.top 1 points 11 months ago (1 children)

I am using go for a story generation thing. It writes a story and does image prompts one at a time, but for generating images and narrations I can run those all in parallel super easily with go.

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago (1 children)

Is there a library you're using? Or are you just using Go with a simple query, calling an API like OpenAI's?

[–] hanzabu@alien.top 1 points 11 months ago (2 children)

The slow Python you think you're using is in fact optimized C code behind the scenes. The real next evolution is to decouple performance from GPUs and return to a more sane GPU/CPU and RAM combo, and to create serious new alternatives to CUDA.
