this post was submitted on 25 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.


I've tried a few of these models but it was some months ago. Have y'all seen any that can hold a conversation yet?

top 16 comments
[–] Lie-Reasonable@alien.top 1 points 10 months ago (1 children)
[–] ttkciar@alien.top 1 points 9 months ago (1 children)

Yep, I came here to suggest this. Marx-3B-v3 seems to be an improvement over the original, too.

Also, I have only used it a little, but NousResearch-Nous-Capybara-3B-v1.9 has been very good for me so far.

[–] bot-333@alien.top 1 points 9 months ago

I suggest you try IS-LM 3B.

[–] paryska99@alien.top 1 points 10 months ago (1 children)

There is a new Rocket 3B that might be worth a try. It scores suspiciously high on benchmarks, so I suspect dataset contamination, but I've seen people report good experiences with it.

[–] CardAnarchist@alien.top 1 points 9 months ago (1 children)

I was very impressed by Rocket 3B in the brief time I tested it. Unfortunately, I've only ever used Mistral-based 7Bs, so I can't compare it to older 7Bs, but it wouldn't surprise me if the benchmark results are accurate and it really is as good as the old 7Bs.

I'm glad I tried it, as now I know to keep an eye on 3B progress. It might not be too long before 3Bs are performing at the level of current Mistral 7Bs!

One weird thing, though: it crashed when I attempted to load in all layers, even though I had the VRAM space, and when loading 32/35 layers it gave me the same inference speed as loading 32/35 layers of a 7B.

[–] paryska99@alien.top 1 points 9 months ago (1 children)

Thanks for the input.

What inference engine did you use? It's possibly a bug, as these things tend to happen with new models.
I, for one, can't wait for lookahead decoding in llama.cpp and others; combine that with some smaller models and we'll have blazing-fast speeds on pennies' worth of hardware, I reckon.

[–] CardAnarchist@alien.top 1 points 9 months ago

I use koboldcpp.

You are probably right about it being a bug. At first I couldn't get the model to work at all (it crashed koboldcpp while loading), but that was just because I had a week-old version of koboldcpp; I needed to download the version that had come out like four days earlier (at that time), ha! Then it loaded up fine, but with the quirk I already mentioned (see the sketch below for the offload setup). I guess it will get fixed in short order.

Yeah, the future of local LLMs lies in the smaller models for sure!
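For anyone trying to reproduce the layer-offload setup above, here's a minimal sketch using llama-cpp-python rather than koboldcpp's launcher, assuming a GGUF quant; the filename and layer count are placeholders, not specific recommendations.

```python
# Minimal sketch: partial GPU offload with llama-cpp-python.
# The model path and layer count are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="rocket-3b.Q4_K_M.gguf",  # hypothetical GGUF quant filename
    n_gpu_layers=32,  # offload 32 layers to VRAM; -1 would offload all of them
    n_ctx=2048,       # context window
)

out = llm("Hello! Can you hold a conversation?", max_tokens=128)
print(out["choices"][0]["text"])
```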

[–] DirectionOdd9824@alien.top 1 points 9 months ago

Did anyone try Phi-1.5?

[–] __SlimeQ__@alien.top 1 points 9 months ago

I did this 1.1B TinyLlama quant last week. It's not smart by any means, but it can hold a conversation somewhat. And, crucially, it's 600 MB.
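That size checks out on a back-of-the-envelope estimate: a 4-bit quant averages a bit over 4 bits per weight once quantization scales are counted. The 4.5 bits/weight figure below is an assumption for a typical Q4 GGUF, not a measured number.

```python
# Rough size estimate for a 4-bit quant of a 1.1B-parameter model.
# 4.5 bits/weight is an assumed average for a typical Q4 GGUF
# (quantization scales and a few fp16 tensors push it above a flat 4 bits).
params = 1.1e9
bits_per_weight = 4.5
print(f"~{params * bits_per_weight / 8 / 1e6:.0f} MB")  # ~619 MB, close to the 600 MB above
```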

[–] bot-333@alien.top 1 points 9 months ago (1 children)

Not sure if self-promotion is allowed here. I found my own IS-LM 3B to be the most coherent, verbose, and factually correct 3B I've tried. IMO it's better than Rocket 3B, but it scores worse on benchmarks; I suspect contamination in Rocket 3B.

[–] m98789@alien.top 1 points 9 months ago
[–] faldore@alien.top 1 points 9 months ago

Did you see Samantha-phi-1.5?

https://huggingface.co/ehartford/samantha-phi

Note this is pre-ChatML

I think I'll re-train this with ChatML
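For context, ChatML wraps each turn in <|im_start|>/<|im_end|> markers. Here's a minimal sketch of building such a prompt; the system message is purely illustrative.

```python
# Build a ChatML-formatted prompt; the role contents here are illustrative only.
def chatml(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

prompt = chatml([
    {"role": "system", "content": "You are Samantha, a helpful assistant."},
    {"role": "user", "content": "How are you today?"},
])
print(prompt)
```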

[–] Nonetendo65@alien.top 1 points 9 months ago

I've found Orca-Mini to be quite helpful for simple generation tasks under 200 tokens. Given it's only 2.0 GB, it's quite powerful and easy to deploy on consumer hardware. Orca is the famous instruction dataset behind fine-tunes like Mistral-7B-OpenOrca :)
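A minimal sketch of that kind of short generation with Hugging Face transformers; the model id is an assumption about which Orca-Mini checkpoint is meant, and max_new_tokens mirrors the under-200-token use case.

```python
# Short generation with an Orca-Mini checkpoint; the model id is an assumption
# and may need adjusting to the exact checkpoint you have in mind.
from transformers import pipeline

pipe = pipeline("text-generation", model="pankajmathur/orca_mini_3b")
result = pipe(
    "Explain in one paragraph why small local models are useful.",
    max_new_tokens=200,  # matches the "under 200 tokens" use case above
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```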