this post was submitted on 30 Jul 2024
1204 points (98.0% liked)

linuxmemes


Hint: :q!


Community rules

1. Follow the site-wide rules

2. Be civil
  • Understand the difference between a joke and an insult.
  • Do not harass or attack members of the community for any reason.
  • Leave remarks of "peasantry" to the PCMR community. If you dislike an OS/service/application, attack the thing you dislike, not the individuals who use it. Some people may not have a choice.
  • Bigotry will not be tolerated.
  • These rules are somewhat loosened when the subject is a public figure. Still, do not attack their person or incite harassment.

3. Post Linux-related content
  • Including Unix and BSD.
  • Non-Linux content is acceptable as long as it makes a reference to Linux. For example, the poorly made mockery of sudo in Windows.
  • No porn. Even if you watch it on a Linux machine.

4. No recent reposts
  • Everybody uses Arch btw, can't quit Vim, and wants to interject for a moment. You can stop now.

    Please report posts and comments that break these rules!


    Important: never execute code or follow advice that you don't understand or can't verify, especially here. The word of the day is credibility. This is a meme community -- even the most helpful comments might just be shitposts that can damage your system. Be aware, be smart, don't fork-bomb your computer.

    founded 1 year ago
    submitted 3 months ago* (last edited 3 months ago) by Smokeydope@lemmy.world to c/linuxmemes@lemmy.world
     

    List of icons/services suggested:

    • Calibre
    • Jitsi
    • Kiwix
    • Monero (Node)
    • Nextcloud
    • Pihole
    • Ollama (Should at least be able to run tiny-llama 1.1B)
    • Open Media Vault
    • Syncthing
    • VLC Media Player Media Server
    [–] Finadil@lemmy.world 43 points 3 months ago (2 children)

    Ollama on a ten-year-old laptop? Lol, maybe at 1 T/s for an 8B.

    [–] Smokeydope@lemmy.world 30 points 3 months ago* (last edited 3 months ago)

    TinyLlama 1.1B would probably run reasonably fast. Dumb as a rock, for sure, but hey, it's a start! My 2015 ThinkPad T460 with a dual-core i7-6600U at 2.6 GHz was able to do Llama 3.1 8B at 1.2-1.7 T/s, which is definitely slow, about a word per second, but still just fast enough to have fun with real-time conversation.

    [–] abbadon420@lemm.ee 4 points 3 months ago (2 children)

    Then what are the minimal specs to run Ollama (Llama 3 8B, or preferably 27B) at a decent speed?

    I have an old PC that now runs my Plex and arr suite. I was thinking of upgrading it a bit and running Ollama on it as well. It doesn't have a GPU, so what else does it need? I don't have a big budget, so no new Nvidia card for me.

    [–] Smokeydope@lemmy.world 4 points 3 months ago* (last edited 3 months ago) (2 children)

    "Decent speed" depends on your subjective opinion and what you want it to do. I think it's fair to say that generating text at around your slowest tolerable reading speed is the bare minimum for real-time conversational use. If you want a task done and don't mind stepping away to get a coffee, it can be much slower.

    I was pleasantly surprised to get anything at all working on an old laptop. When I think of AI, my mind goes to supercomputers, thousand-dollar rigs, and data centers, not mobile computers like my ThinkPad. But sure enough, the technology is there, and your old POS can pick up a powerful new tool if you have realistic expectations about matching model size to your specs.

    TinyLlama will work on a smartphone, but it's dumb. Llama 3.1 8B is very good and will work on modest hardware, but you may have to be patient with it, especially if your laptop wasn't top of the line when it was made 10 years ago. Then there are all the models in between.

    The dual-core i7-6600U 2.6 GHz CPU in my laptop running the 8B was just barely a passing grade for real-time conversation: at 1.2-1.7 T/s it could produce a short word, or half of a complex one, per second. When it needed to process something or recalculate the context, it took a hot minute or two.

    That got kind of annoying if you were getting into what it was saying. Bumping the PC up to a 6-core AMD Ryzen 5 2600 was a night-and-day difference: it spits out a sentence faster than my average reading speed at 5-6 T/s. I'm still working on getting the 4GB RX 580 GPU used for offloading, so those numbers are just with the CPU bump. RAM speed also matters; DDR5 will beat DDR4.
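Rough arithmetic on why those speeds feel the way they do (back-of-envelope only, assuming roughly 1.3 tokens per English word for Llama-style tokenizers):

```python
# Rough speed-to-reading-speed comparison; numbers are illustrative, not benchmarks.
# Llama-style tokenizers average roughly 1.3 tokens per English word.
TOKENS_PER_WORD = 1.3

for label, tps in [("i7-6600U laptop", 1.5), ("Ryzen 5 2600", 5.5)]:
    words_per_sec = tps / TOKENS_PER_WORD
    print(f"{label}: {tps} T/s ~= {words_per_sec:.1f} words/s ({words_per_sec * 60:.0f} wpm)")

# Typical reading speed is ~200-250 wpm, so ~5-6 T/s keeps pace with a reader,
# while ~1.5 T/s (~70 wpm) is the word-per-second crawl described above.
```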

    Here's a tip: most software has the model's default context size set at 512, 2048, or 4096. Part of what makes Llama 3.1 so special is that it was trained with 128k context, so bump that up to 131072 in the settings so it isn't recalculating context every few minutes.
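If the frontend is Ollama, the context window is the num_ctx option. A minimal sketch of setting it through Ollama's local HTTP API (model tag and prompt are just placeholders):

```python
# Minimal sketch: set Ollama's context window (num_ctx) via its local HTTP API.
# Assumes Ollama is running on the default port; the model tag is a placeholder.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",          # whatever tag you actually pulled
        "prompt": "Give me a one-line summary of the Unix philosophy.",
        "stream": False,
        "options": {"num_ctx": 131072},  # context window; defaults are much smaller
    },
    timeout=600,
)
print(resp.json()["response"])
```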

    [–] brucethemoose@lemmy.world 2 points 3 months ago (1 children)

    Here's a tip: most software has the model's default context size set at 512, 2048, or 4096. Part of what makes Llama 3.1 so special is that it was trained with 128k context, so bump that up to 131072 in the settings so it isn't recalculating context every few minutes…

    Some caveats: this massively increases memory usage (unless you quantize the cache with FA), and it also massively slows down CPU generation once the context gets long.
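Rough numbers on that memory cost (a back-of-envelope sketch, assuming Llama 3.1 8B's published shape of 32 layers, 8 KV heads, head dim 128, and an fp16 cache):

```python
# Back-of-envelope KV-cache size for Llama 3.1 8B at the full 128K context.
# Shape numbers are the published ones for this model; treat results as approximate.
layers, kv_heads, head_dim = 32, 8, 128
bytes_per_value = 2            # fp16 cache; a quantized (q8/q4) cache shrinks this
context = 131072

per_token = 2 * layers * kv_heads * head_dim * bytes_per_value   # K and V
total_gib = per_token * context / 2**30
print(f"{per_token / 1024:.0f} KiB per token -> {total_gib:.0f} GiB at {context} tokens")
# ~128 KiB/token -> ~16 GiB just for the cache, on top of the model weights.
```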

    TBH you just need to not keep a long chat history unless you need it.

    [–] Smokeydope@lemmy.world 1 points 3 months ago* (last edited 3 months ago) (2 children)

    Thank you, that's useful to know. In your opinion, what context size is the sweet spot for Llama 3.1 8B and similar models?

    [–] brucethemoose@lemmy.world 1 points 3 months ago (1 children)

    4-core i7, 16GB RAM and no GPU yet

    Honestly as small as you can manage.

    Again, you will get much better speeds out of "extreme" MoE models like deepseek chat lite: https://huggingface.co/YorkieOH10/DeepSeek-V2-Lite-Chat-Q4_K_M-GGUF/tree/main

    Another thing I'd recommend is running kobold.cpp instead of Ollama if you want to get into the nitty-gritty of LLMs. It's more customizable and (ultimately) faster on more hardware.

    [–] Smokeydope@lemmy.world 1 points 3 months ago* (last edited 3 months ago) (1 children)

    That's good info for low-spec laptops. Thanks for the software recommendation; I need to do some more research on the model you suggested. I think you confused me with the other guy, though. I'm currently working with a six-core Ryzen 2600 CPU and an RX 580 GPU. edit: no worries, we're good, it was still great info for the ThinkPad users!

    [–] brucethemoose@lemmy.world 1 points 3 months ago

    8GB or 4GB?

    Yeah, you should get kobold.cpp's ROCm fork working if you can manage it; otherwise use their Vulkan build.

    Llama 8B at shorter context is probably good for your machine, as it can fit on the 8GB GPU at shorter context, or at least be partially offloaded if it's a 4GB one.
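Partial offloading just means some transformer layers sit in VRAM while the rest stay on the CPU; kobold.cpp exposes this as a GPU-layers setting. As an illustration of the same idea, a minimal sketch using the llama-cpp-python bindings (file name and layer count are placeholder values to tune for a 4GB card):

```python
# Illustration of partial GPU offload, the same idea behind kobold.cpp's GPU-layers knob.
# Requires llama-cpp-python built with a GPU backend (Vulkan/ROCm/CUDA); the path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3.1-8b-instruct-Q4_K_M.gguf",  # placeholder filename
    n_ctx=8192,        # modest context keeps the KV cache small
    n_gpu_layers=20,   # layers pushed to VRAM; lower this if a 4GB card runs out
)

out = llm("Q: What does partial offloading mean?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```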

    I wouldn't recommend DeepSeek for your machine. It's a better fit for older CPUs: it's not as smart as Llama 8B, and it's bigger than Llama 8B, but it just runs super fast because it's an MoE.
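Roughly why an MoE helps on CPU: token generation is mostly memory-bandwidth-bound, and an MoE only reads its active experts each token (DeepSeek-V2-Lite activates about 2.4B of its ~16B parameters). A back-of-envelope sketch with illustrative numbers:

```python
# Why an MoE can outrun a smaller dense model on CPU: generation is roughly
# memory-bandwidth-bound, and an MoE only reads its active experts per token.
# All numbers are rough illustrations, not benchmarks.
bandwidth_gb_s = 40          # ballpark for dual-channel DDR4
bytes_per_param = 0.56       # ~Q4_K quantization, roughly 4.5 bits per weight

def ceiling_tps(active_params_billion):
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

print(f"dense 8B:                      ~{ceiling_tps(8.0):.0f} T/s theoretical ceiling")
print(f"MoE, 2.4B active (~16B total): ~{ceiling_tps(2.4):.0f} T/s theoretical ceiling")
# Real speeds land well below both ceilings, but the ratio holds: the MoE reads
# ~3x less per token, so it generates ~3x faster from the same RAM.
```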

    [–] brucethemoose@lemmy.world 1 points 3 months ago* (last edited 3 months ago)

    Oh I got you mixed up with the other commenter, apologies.

    I'm not sure when Llama 8B starts to degrade at long context, but I wanna say it's well before 128K, which is where other "long context" models start to look much more attractive, depending on the task. Right now I am testing Amazon's Mistral finetune, and it seems to be much better than Nemo or Llama 3.1 out there.

    [–] abbadon420@lemm.ee 1 points 3 months ago

    I have a 4-core i7, 16GB RAM and no GPU yet. I haven't tried anything yet, because I need to wipe Windows and install Mint first, but it sounds promising. Thanks for the details.

    [–] brucethemoose@lemmy.world 2 points 3 months ago* (last edited 3 months ago)

    Can you afford an Arc A770 or an old RTX 3060?

    Used P100s are another good option. Even an RTX 2060 would help a ton.

    27B is just really chunky on CPU, unfortunately. There's no way around it. But you may have better luck with MoE models like deepseek chat or Mixtral.