Local LLM agents (lemmy.world)

submitted 3 weeks ago by Kkk2237pl@lemmy.world to c/programming@programming.dev

Has anyone in an organization tried using self-hosted LLMs for agentic programming?

I'm curious whether it makes any sense. My organization spends a fortune on tokens from US companies. I want to recommend something…

MagicShel@lemmy.zip 1 points 3 weeks ago

I run this setup with 36GB (32+4). Local LLMs can be really effective BUT you are constrained by context size in a way you aren't on cloud services.
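To make the context constraint concrete, here is a rough sketch of how KV-cache memory grows with context length. The model shape (48 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical picked only for illustration, not any particular model, and real runtimes add overhead or use cache quantization that this ignores.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size.

    Each token stores one key vector and one value vector (hence the
    factor of 2) per layer, each of size kv_heads * head_dim.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128

for tokens in (30_000, 300_000):
    gib = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.1f} GiB of KV cache")
```

Under these assumptions the cache costs 196,608 bytes per token, so it scales linearly: a 10x longer context needs 10x the memory, on top of the model weights themselves.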

Cline supports running a local model through LM Studio, but in my experience, when I feed it any significant task, it just can't read and hold the context needed to build components for enterprise-scale applications.

I use Claude to write a lot of one-off utility scripts. With a maximum window of 1M tokens, I can hit 30+% context just writing Python scripts. That context goes to API contracts, development standards, existing reusable modules, and sometimes the code/documentation of the services I'm going to be calling.

My MacBook can't handle 300k token contexts. 30k seems doable. I should see how it handles my utility script folder...
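A quick way to check whether a script folder would fit in a ~30k window is a rough token estimate. This is only a sketch: the 4-characters-per-token ratio is a crude heuristic rather than a real tokenizer, and the folder name in the usage note is hypothetical.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # crude heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token count based on character length."""
    return len(text) // CHARS_PER_TOKEN


def estimate_folder_tokens(folder, pattern="*.py") -> int:
    """Sum the rough token estimates of all matching files under folder."""
    total = 0
    for path in Path(folder).rglob(pattern):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def fits(tokens: int, window: int = 30_000) -> bool:
    """True if the estimated tokens fit within the context window."""
    return tokens <= window
```

For example, `fits(estimate_folder_tokens("utility_scripts"))` gives a first guess before loading anything into a local model. Leave headroom for the prompt and the model's output.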

Anyway, that's still no Claude, but if you need a cheaper model and you can afford for developers to spend time on it before ultimately deciding they need to pay for Claude or Codex or Gemini, then running a local model on a beefy MacBook is 100% an option.

Stepping up from there to building a shared, locally hosted LLM service is probably the worst of all worlds. It takes a beefy CapEx, is prone to saturation by all the users, and will most likely still leave you punting the hardest jobs to cloud AI. It can certainly be done and done well, but the best example I know of runs on $250-500k worth of hardware (to service a pretty big number of users, to be fair).
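To weigh that CapEx against token spend, a back-of-the-envelope break-even can help. This is a sketch under stated assumptions: the monthly cloud spend and running cost below are made-up placeholders, only the $250-500k hardware range comes from the comment above, and it ignores the hardest jobs that would still go to cloud AI.

```python
def breakeven_months(capex, monthly_cloud_spend, monthly_running_cost=0.0):
    """Months until hardware cost is recovered by avoided cloud spend.

    Returns None if running the hardware costs as much as (or more than)
    the cloud spend it replaces, since it would never break even.
    """
    savings = monthly_cloud_spend - monthly_running_cost
    if savings <= 0:
        return None
    return capex / savings


# Placeholder figures: plug in your organization's real numbers.
for capex in (250_000, 500_000):
    months = breakeven_months(capex, monthly_cloud_spend=20_000,
                              monthly_running_cost=5_000)
    print(f"${capex:,} of hardware breaks even after {months:.1f} months")
```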