[–] mabeledo@lemmy.world 5 points 13 hours ago (1 children)

For which you still need massive amounts of memory and compute to run reliably. That, and the fact that chatbots and agents nowadays rely on all sorts of proprietary customizations, outside of the realm of LLMs, to perform certain tasks.

The gap will take decades to close, if it ever does.

[–] Eyekaytee@aussie.zone 0 points 12 hours ago* (last edited 11 hours ago) (1 children)

> For which you still need massive amounts of memory and compute to run reliably

2026's average gaming PC is 'massive amounts of memory and compute', apparently

> The gap will take decades to close, if it ever does.

lol there are plenty of open-source models in the top 100, with multiple SOTA models released in the last few months alone

There are also smaller LLMs being made, like https://eurollm.io/, which excel in their own ways.

> That, and the fact that chatbots and agents nowadays rely on all sorts of proprietary customizations

Funny, that just came up: https://discourse.ubuntu.com/t/the-future-of-ai-in-ubuntu/81130?=0

> Previously, to benefit from the full power of LLMs, you had to skew to higher parameter models. Recent developments in models like Gemma 4 and Qwen-3.6-35B-A3B demonstrate advanced capabilities such as tool-calling which enable LLMs to search the web, interact with external APIs and file systems, troubleshoot live systems and fundamentally reason about topics that lie outside of their initial training data.
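For context, tool-calling is just a structured request/response convention, and it works the same against any OpenAI-compatible server, local or cloud. A minimal sketch (the endpoint, model name, and `search_web` tool are made-up placeholders):

```python
from openai import OpenAI

# Any OpenAI-compatible server works here; the port is an assumption.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",  # hypothetical tool; you supply the implementation
        "description": "Search the web and return the top results as text.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="local-model",  # whatever model the local server has loaded
    messages=[{"role": "user", "content": "What changed in the latest Ubuntu LTS?"}],
    tools=tools,
)

# If the model decides to call the tool, it returns structured arguments
# instead of prose; the caller runs the tool and feeds the result back.
calls = resp.choices[0].message.tool_calls
if calls:
    print(calls[0].function.name, calls[0].function.arguments)
```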

> The gap will take decades to close, if it ever does.

😁

[–] mabeledo@lemmy.world 3 points 11 hours ago (1 children)

> 2026's average gaming PC is 'massive amounts of memory and compute', apparently

Any model that can run on 16 GB or less is not going to be anywhere close, in real-world tasks, to any cloud-based model. It just cannot be. There are people out there running Qwen on a Mac Studio with 96 GB, and it still falls short of cloud-based models in both performance and speed.

> lol there are plenty of open-source models in the top 100, with multiple SOTA models released in the last few months alone

The top 100 of what, exactly? Many blended benchmark results are notoriously biased, and LLMs "cheat" on benchmarks at every single opportunity, so outside of real-world tasks and speed it is still hard to tell which models are actually better than others.

But regardless, the main point of the gap is resources. Even if the average gaming computer were really enough to run meaningful models, the vast majority of the world wouldn't have access to one, even more so in this day and age, when in most parts of the world a whole monthly salary won't buy a single RAM stick.

[–] Eyekaytee@aussie.zone 0 points 10 hours ago (1 children)

> But regardless, the main point of the gap is resources

What makes you think we won't have the resources in the future?

> Any model that can run on 16 GB or less is not going to be anywhere close, in real-world tasks, to any cloud-based model. It just cannot be.

Well, you can compare Gemma 4 running in LM Studio on an average gaming PC to ChatGPT 3.5 and tell me yourself. Or is your benchmark based purely on open-source models today versus cloud models today, at this very moment?
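That comparison is even scriptable. A rough sketch against a local OpenAI-compatible server such as the one LM Studio exposes (the port, model name, and prompt are assumptions; point the same function at a cloud endpoint to time both sides):

```python
import time
from openai import OpenAI

PROMPT = "Explain the difference between a process and a thread."

def ask(client: OpenAI, model: str):
    """Send PROMPT to one endpoint and return (answer, seconds taken)."""
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    return resp.choices[0].message.content, time.perf_counter() - start

# LM Studio's local server speaks the OpenAI API; no real key is needed.
local = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
answer, secs = ask(local, "local-model")  # name of whatever model is loaded
print(f"local: {secs:.1f}s\n{answer[:200]}")
```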

For reference, Gemma 4 is 26 billion parameters; GPT-3 was thought to be over 175 billion, and of course it had no optimisations like MoE: it activated every parameter for every single token, so it was rather slow as well.
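The memory arithmetic behind that is simple: weight storage is roughly parameter count times bytes per parameter. A quick sketch using the counts claimed above (KV cache and runtime overhead ignored):

```python
# Back-of-the-envelope weight memory: params * bytes per parameter.
def weights_gb(params_billion: float, bits: int) -> float:
    """Approximate weight storage in GB for a model of the given size."""
    return params_billion * bits / 8  # 1e9 params and 1e9 bytes cancel out

for name, params in [("GPT-3 (dense, ~175B)", 175), ("Gemma 4 (26B, per above)", 26)]:
    for bits in (16, 4):
        print(f"{name} @ {bits}-bit: ~{weights_gb(params, bits):.0f} GB")
# 175B at 16-bit is ~350 GB of weights alone; a 26B model quantized
# to 4-bit is ~13 GB, i.e. within reach of a single gaming GPU.
```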

We also know there is no slowdown in the push for optimisations; DeepSeek's initial release was the first big driver of the idea that you don't have to scale up using hardware alone:

https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
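Whatever the specifics of that paper, the basic reason quantization shrinks models is easy to demonstrate. A generic illustration of plain symmetric int8 weight quantization (not the method in the link):

```python
import numpy as np

w = np.random.randn(4096, 4096).astype(np.float32)  # stand-in weight matrix
scale = np.abs(w).max() / 127.0                     # map the max magnitude to 127
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale                # dequantized copy for compute

print(f"fp32: {w.nbytes / 2**20:.0f} MiB -> int8: {q.nbytes / 2**20:.0f} MiB")
print(f"mean abs rounding error: {np.abs(w - w_hat).mean():.5f}")
```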

They're also pushing native Chinese chips from Huawei, trying to diversify away from Nvidia holding the crown.

The problem I've got is that you all have a god of the gaps: the conversation I was having 3 years ago was different from the one 2 years ago, which was different again from the one 1 year ago. I was told AI could never do songs well enough, then suddenly people were worried they couldn't tell the difference; then they said it could never do movies, and now apparently it's not only good enough, it's hilarious:

https://www.youtube.com/watch?v=fgHn7PI55J4

The open-source LLMs we have today are incredible, and in the last few months alone we've had releases from Qwen, GLM, Nemotron/Nvidia, Mistral, Google and heaps of others. It feels like you're just looking for a reason to be dour and pessimistic, but that's just me.

Anyway, I'm off to sleep, have a good one :)

[–] mabeledo@lemmy.world 1 points 9 hours ago

> The problem I've got is that you all have a god of the gaps: the conversation I was having 3 years ago was different from the one 2 years ago, which was different again from the one 1 year ago

And I guess the problem I have with you is that you seem to think you can get results on 16 GB that are competitive with models running on a Blackwell 6000 with 96 GB, while ignoring the fact that the vast majority of people in the world are running GPUs with 4 to 8 GB of VRAM, if they have access to GPUs at all.
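To put numbers on that gap: weights take roughly parameters times bytes per parameter, so a VRAM budget bounds the model size it can even hold. A rough sketch (ignoring KV cache and overhead):

```python
def max_params_billion(vram_gb: float, bits: int) -> float:
    """Largest parameter count (in billions) whose weights fit in vram_gb."""
    return vram_gb * 8 / bits

for vram in (4, 8, 16, 96):
    print(f"{vram:2d} GB VRAM @ 4-bit: up to ~{max_params_billion(vram, 4):.0f}B params")
# 4-8 GB cards top out around 8-16B parameters even fully quantized;
# a 96 GB card holds ~190B. That is the hardware gap, in numbers.
```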

That's the gap. Most people don't have the kind of money you think they do, and even those who do will never achieve the same results as with cloud models, because if there's a state-of-the-art optimization that makes models 10 times smaller, cloud models will use that same advantage to become 10 times bigger. It's pretty simple.