this post was submitted on 13 Nov 2023
1 points (100.0% liked)

LocalLLaMA

1 readers
1 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 10 months ago
MODERATORS
 

Using and losing lots of money on gpt-4 ATM, it works great but for the amount of code I'm generating I'd rather have a self hosted model. What should I look into?

top 23 comments
sorted by: hot top controversial new old
[–] ababana97653@alien.top 1 points 10 months ago

For coding specifically, have you looked at Amazons CodeWhisper - I think it’s free for personal use

[–] xadiant@alien.top 1 points 10 months ago
[–] --dany--@alien.top 1 points 10 months ago (1 children)

Phind-CodeLlama 34B is the best model for general programming, and some techy work as well. But it's a bad joker, it only does serious work. Try quantized models if you don't have access to A100 80GB or multiple GPUs. 4 bit quantization can fit in a 24GB card.

[–] berzerkerCrush@alien.top 1 points 10 months ago

I tried the V7, which is supposedly better than GPT4. it couldn't do the things I asked it to do, unlike GPT 4 (through Bing Chat). DeepSeek also did a couple of things, but its solutions where sometimes not ideal. It's underwhelming.

The web search engine is interesting through.

[–] woodmastr@alien.top 1 points 10 months ago (2 children)

You might want to check EvalPlus Leaderboard

#Model Size pass@1

1🥇GPT-4 (May 2023)🗒️N/A⚡76.8

2🥈DeepSeek-Coder-instruct🗒️33B⚡72.6

3🥉DeepSeek-Coder-instruct🗒️6.7B⚡70.1

[–] Zemanyak@alien.top 1 points 10 months ago

Great board. I wish they had Phind 7 or whatever they use on their live website.

[–] KOTNcrow@alien.top 1 points 10 months ago

That board is in serious need of an update, check the Yi-34b model, very impressive. Dolphin 2.2 Yi 34b is a variant I cant wait to try.

[–] DifferentPhrase@alien.top 1 points 10 months ago (2 children)

As far as self hosted models go, deepseek-coder-33B-instruct is the best model I have found for coding. Anecdotally it seems more coherent and gives better results than Phind-CodeLlama-34B-v2.

[–] yonomono@alien.top 1 points 10 months ago (1 children)

Think this would be good-enough/suitable to use with AutoGPT/BabyAGI type situations? This is my main use-case, for bulk inspiration if not productivity. The API's can get expensive if left on full-automatic overnight.

[–] nishant299@alien.top 1 points 10 months ago

I wanna do something similar, please let me know what conclusion you reach to

[–] SlateHardjaw@alien.top 1 points 10 months ago (1 children)

What environment do you use to interact with self-hosted code models when coding? I've been using and enjoying Cursor for the way it's integrated into the IDE, but I've been exploring options for going self-hosted just to feel freer from whatever record I'm putting on someone else's server.

[–] DifferentPhrase@alien.top 1 points 10 months ago

My code editor of choice (Helix) doesn’t support integrations or plugins so I haven’t tried Cursor or Copilot. I’m building my own UI right now that focuses on first-class support for models served by llama.cpp.

[–] leepenkman@alien.top 1 points 10 months ago (1 children)

i would grab a server like vllm or text-generator.io (open source too)
Then get a model like others have suggested like deepseek or something to put in the server (both those servers are OpenAI compatible so makes switching easy)

[–] Charuru@alien.top 1 points 10 months ago

I've not heard of text-generator.io, is it as performant as vllm on multibatch or is it a wrapper around it?

[–] yonomono@alien.top 1 points 10 months ago

Think this would be good-enough/suitable to use with AutoGPT/BabyAGI type situations? This is my main use-case, for bulk inspiration if not productivity. The API's can get expensive if left on full-automatic overnight.

[–] amsat@alien.top 1 points 10 months ago

If you allow models to work together on the code base and allow them to criticize each other and suggest improvements to the code, the result will be better, this is if you need the best possible code, but it turns out to be expensive. So the best thing is the work of a team of models and not just one.

[–] spidLL@alien.top 1 points 10 months ago

How hosting any model, which at the moment would be inferior to GPT-4, would cost less than 20 dollars per month?

Anyway, GitHub Copilot cost 10 and has plugins for any IDE, in VsCode has also a chat. I don’t remember which model is based on, but works pretty well. You might try that.

[–] Illustrious-Lake2603@alien.top 1 points 10 months ago

Deepseek Coder 6.7b is able to write the game Snake. Not many are able to do this!!

[–] Super_Pole_Jitsu@alien.top 1 points 10 months ago

The phind.com model seems decent

[–] -Tesla@alien.top 1 points 10 months ago

I don't have an answer for you, but I am curious, how much code do you have it generate on an average work/programming day?

[–] xbaha@alien.top 1 points 10 months ago

i use Phindv, gpt3.5 free together and forward code between them to optimize and fix issues, works smooth for me.

[–] Hoang_Nghia_31@alien.top 1 points 10 months ago

I think copilot if another option it have chat extension now.

[–] Enough_Cheesecake_81@alien.top 1 points 10 months ago

I am using Meta’s AI: Code Llama via the API of deepinfra.