It's a new foundational model, so some teething pains are to be expected. Yi is heavily based on (directly copied, for the most part) llama2, but there are just enough differences in the training parameters that default llama2 settings don't get good results. KCPP has already addressed the rope scaling, and I'm sure it's only a matter of time before the other issues are hashed out.
70b models will be extremely slow on pure CPU, but you're welcome to try. There's no point in looking on "torrent sites" for LLMs - literally everything is hosted on huggingface.
Yes, your GPU is too old to be useful for offloading, but you could still use it for prompt processing acceleration at least.
With your hardware, you want to use koboldcpp. It uses models in GGML/GGUF format. You should have no issue loading models up to 120b with that much RAM, but large models will be painfully slow (like 10+ minutes per response) running on CPU only. Recommend sticking to 13b models unless you're incredibly patient.
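For the curious, here's a minimal sketch of driving koboldcpp from Python once it's running, assuming the default port (5001) and the KoboldAI-compatible /api/v1/generate endpoint. The model filename and launch flags in the comments are illustrative placeholders, not a definitive invocation - check --help for your build:

```python
import requests

# Assumes koboldcpp is already running with a GGUF model loaded, e.g. something like:
#   python koboldcpp.py --model some-13b-model.Q4_K_M.gguf --threads 8 --contextsize 4096
# (illustrative invocation -- flags and filename are placeholders)

payload = {
    "prompt": "### Instruction:\nExplain what GGUF is in one sentence.\n\n### Response:\n",
    "max_length": 200,       # number of tokens to generate
    "temperature": 0.7,
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=600)
print(resp.json()["results"][0]["text"])
```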
All yi models are extremely picky when it comes to things like prompt format, end string, and rope parameters. You'll get gibberish from any of them unless you get everything set up just right, at which point they perform very well.
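If it helps anyone, here's a rough sketch of what "set up just right" looks like for the chat finetunes, assuming they follow a ChatML-style template with <|im_end|> as the end string (base Yi models have no chat template at all, so always check the model card):

```python
def build_yi_chat_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt; <|im_end|> doubles as the stop/end string."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Whatever frontend you use, make sure <|im_end|> is set as a stop sequence,
# otherwise the model rambles right past the end of its turn.
STOP_SEQUENCES = ["<|im_end|>"]

print(build_yi_chat_prompt("You are a helpful assistant.", "Why is the sky blue?"))
```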
It's adorable that you think any 13b model is anywhere close to a 70b llama2 model.
Anywhere from 1 to several hundred GB. Quantized (compressed), the most popular models are 8-40GB each. LoRAs are a lot smaller, but full models take up a lot of space.
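The back-of-the-envelope math behind those numbers, assuming roughly 4.8 bits per weight for a typical Q4_K_M quant (exact bits-per-weight varies by quant type, and this ignores a bit of file overhead):

```python
def approx_model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough model file size: parameters * bits-per-weight / 8, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for b in (13, 34, 70):
    fp16 = approx_model_size_gb(b, 16.0)   # unquantized half precision
    q4 = approx_model_size_gb(b, 4.8)      # ~Q4_K_M quant (assumed bpw)
    print(f"{b}B: ~{fp16:.0f} GB at fp16, ~{q4:.0f} GB at ~4.8 bpw")
```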
No idea why you would need ~1800GB vram.
Homeboy's waifu is gonna be THICC.
Extremely effective and definitely the quietest option, but requires a lot of space: https://www.printables.com/model/484282-nvidia-tesla-p40-120mm-blower-fan-adapter-straight
The ONLY pascal card worth bothering with is the P40. It's not fast, but it's the cheapest way to get a whole bunch of usable vram. Nothing else from that generation is worth the effort.
And Brockman just quit. Hell of a shakeup over there.
Is this the beginning of the end of CUDA dominance?
Not unless intel/AMD/MS/whoever ramps up their software API to the level of efficiency and just-works-edness that cuda provides.
I don't like nvidia/cuda any more than the next guy, but it's far and away the best thing going right now. If you have an nvidia card, you can get the best possible AI performance from it with basically zero effort on either windows or linux.
Meanwhile with AMD, you're either stuck with unbearably slow openCL or an arduous slog to get rocm working (unless you're using specific cards on specific linux distros). Intel is limited to openCL at best.
Until some other manufacturer provides something that can legitimately compete with cuda, cuda ain't going anywhere.
Kinda buried the lede here. This is far and away the biggest feature of this model. Here's hoping it's actually decent as well!