Delicious-View-8688

joined 1 year ago
[–] Delicious-View-8688@alien.top 1 points 11 months ago

Yes. There is an implementation that loads each layer as required, thereby reducing the VRAM requirements. Just Google it: Llama 70B with 4 GB of VRAM.
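
For anyone wondering what "loads each layer as required" means in practice, here is a minimal sketch of the idea, not the actual library's API: each decoder layer's weights sit on disk, get moved to the GPU, run on the hidden states, and are freed before the next layer is loaded. The file layout, the `build_layer` helper, and `weights_dir` are assumptions made up for illustration.

```python
# Sketch of on-demand (layer-streaming) inference to keep VRAM low.
# Assumptions: each decoder layer's weights are saved as layer_{i}.pt,
# and build_layer() constructs an empty layer module matching those weights.
import torch

NUM_LAYERS = 80   # Llama-2 70B has 80 decoder layers
DEVICE = "cuda"

def run_layered(hidden_states, build_layer, weights_dir):
    for i in range(NUM_LAYERS):
        layer = build_layer()                              # empty module on CPU
        state = torch.load(f"{weights_dir}/layer_{i}.pt")  # pull weights from disk
        layer.load_state_dict(state)
        layer = layer.to(DEVICE)                           # only this layer is ever in VRAM
        with torch.no_grad():
            hidden_states = layer(hidden_states)
        del layer, state                                   # free before touching the next layer
        torch.cuda.empty_cache()
    return hidden_states
```

The trade-off is speed: the whole model is re-read from disk on every forward pass, so it is far slower than keeping the weights resident, but it lets a 70B model run on a small GPU.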

[–] Delicious-View-8688@alien.top 1 points 11 months ago

CPU speed-ups... So... Macs are back in the game for local LLMs?

[–] Delicious-View-8688@alien.top 1 points 11 months ago

I think it never really became a thing. People/teams/orgs treated them as SWEs working on ML applications, and that is more or less what the role became. Most seem to just throw XGBoost and random models from Hugging Face at everything and see what sticks.