Programming

27334 readers

257 users here now

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Follow the programming.dev instance rules
Keep content related to programming in some way
If you're posting long videos try to add in some form of tldr for those who don't want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev

founded 3 years ago

MODERATORS

snowe@programming.dev

Ategon@programming.dev

UlrikHD@programming.dev

bugsmith@programming.dev

Spyro@programming.dev

Local LLM agents (lemmy.world)

submitted 3 weeks ago by Kkk2237pl@lemmy.world to c/programming@programming.dev

33 comments fedilink hide all child comments

Has anyone tried in organization to use self hosted llm models for agentic programming?

Im curious if it makes any sense. My organization spends fortune on tokens from us companies. I want to recommend something…

you are viewing a single comment's thread
view the rest of the comments

[–] eager_eagle@lemmy.world 9 points 3 weeks ago* (last edited 3 weeks ago) (2 children)

Qwen 3.6 and gemma4 models are the only ones usable for agentic prog sessions that I and my employer run locally. It's less stable and slower than third-party services, even on much better hardware (as it's with my employer). The best way is to go with a provider hosting deepseek flash/pro if your privacy policy allows though. It's going to be hard to beat their price.

[–] onlinepersona@programming.dev 2 points 3 weeks ago (1 children)

I thought those didn't support tool calling. Has that changed?

[–] eager_eagle@lemmy.world 4 points 3 weeks ago

they do

[–] adhdsergio@lemmy.world 1 points 3 weeks ago (1 children)

How many concurrent users and what hardware if i may ask?

[–] eager_eagle@lemmy.world 3 points 3 weeks ago* (last edited 3 weeks ago)

it's an h100, I think, no idea about how many users

in my personal setup i use quantized versions on a 3080, which is not great, so I still lean a lot on APIs