overview for sdmat

Fitting 70B models in a 4gb GPU, The whole model, no quants or distil or anything! in c/localllama@poweruser.forum

[–] sdmat@alien.top 1 points 11 months ago

This technique is actually really useful for batch processing.

E.g. if you run 100 generations and reuse each layer while it is loaded, the whole batch finishes much faster than running the generations one after another.
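A rough back-of-the-envelope sketch of why this helps, assuming a simple cost model where each layer has a fixed load time and a fixed per-generation compute time. The function names and all the numbers are made up for illustration, not measurements from the linked project:

```python
def serial_time(n_gen, n_layers, load_s, compute_s):
    """Seconds per token step when every generation reloads every layer itself."""
    return n_gen * n_layers * (load_s + compute_s)


def batched_time(n_gen, n_layers, load_s, compute_s):
    """Seconds per token step when each layer is loaded once and reused for all generations."""
    return n_layers * (load_s + n_gen * compute_s)


if __name__ == "__main__":
    # Hypothetical figures: 80 layers, 0.5 s to load a layer, 0.01 s of compute
    # per layer per generation, 100 generations.
    args = (100, 80, 0.5, 0.01)
    s = serial_time(*args)
    b = batched_time(*args)
    print(f"serial: {s:.0f} s, batched: {b:.0f} s, speedup: {s / b:.1f}x")
```

The load cost is paid once per layer instead of once per layer per generation, so the gain grows with how much load time dominates compute time.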

[D] how to explain why RL is difficult to someone who knows nothing about it? in c/machinelearning@academy.garden

[–] sdmat@alien.top 1 points 11 months ago

Just try a huge number of ways to explain it. See how you go and iterate on the best approaches. Picking up some high-level concepts about education along the way might help too.

Anyone spend a bunch of $$ on a computer for LLM and regret it? in c/localllama@poweruser.forum

[–] sdmat@alien.top 1 points 1 year ago

What did you go with?

Is there a technical reason that distributed LLMs don't exist? in c/localllama@poweruser.forum

[–] sdmat@alien.top 1 points 1 year ago

No, the primary concern is that network latency kills the serial performance of LLMs.

You can have a distributed LLM that gets decent total throughput across many slow generations. You can't have a distributed LLM whose throughput for a single generation is competitive with running in a single cluster.
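A toy model of that tradeoff, assuming the model is split into pipeline stages with a fixed compute time per stage and a fixed network hop between stages. The function names and every number here are hypothetical, just to show the shape of the problem:

```python
def single_generation_tps(n_stages, compute_s, hop_s):
    """Tokens/s for one generation: each token must cross every stage and hop in order."""
    return 1.0 / (n_stages * (compute_s + hop_s))


def aggregate_tps(n_generations, n_stages, compute_s, hop_s):
    """Total tokens/s with many generations in flight.

    Stages work on different generations at once, so latency is hidden
    until a stage saturates at one token per compute_s.
    """
    latency_bound = n_generations * single_generation_tps(n_stages, compute_s, hop_s)
    stage_bound = 1.0 / compute_s
    return min(latency_bound, stage_bound)


if __name__ == "__main__":
    # Hypothetical: 8 stages, 10 ms compute per stage,
    # 50 ms hops over the internet vs 0.1 ms hops inside a cluster.
    print("internet, 1 generation:  ", single_generation_tps(8, 0.01, 0.05))
    print("cluster, 1 generation:   ", single_generation_tps(8, 0.01, 0.0001))
    print("internet, 64 generations:", aggregate_tps(64, 8, 0.01, 0.05))
```

In this model the internet setup is several times slower than the cluster for one generation, but with enough generations in flight the total throughput is limited by stage compute rather than by latency.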

For roleplay purposes, Goliath-120b is absolutely thrilling me in c/localllama@poweruser.forum

[–] sdmat@alien.top 1 points 1 year ago

You can get preemptible A100s for $1/hr, so it's not exactly breaking the bank if you're willing to take the risk.