this post was submitted on 30 Oct 2023

Hey everyone, happy to say I’m officially announcing Obsidian V0.5 as part of my work at Nous Research and building upon my work creating the Capybara V1.9 dataset.

This model is blazing fast and is likely the first multi-modal model efficient enough to fit within the RAM constraints of even a non-Pro iPhone, and at practical speeds as well!
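
To put the RAM claim in rough numbers, here is a back-of-the-envelope sketch. It assumes a nominal 3 billion parameters and counts only the weights; the context cache, the vision encoder, and runtime overhead are not included, and real quantized files store slightly more than the nominal bits per weight. The function name is just for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: a nominal 3e9 parameters; the real count may differ somewhat.
for bits in (16, 8, 6, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(3e9, bits):.2f} GB")
```

Under these assumptions the weights come to about 6 GB at 16-bit and about 2.25 GB at a nominal 6-bit quantization, which is the figure to compare against whatever memory a phone leaves free for a single app.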

In its current state, this model is largely a multi-modal version of Nous-Capybara-3B, which I also only recently released. I designed the dataset with novel synthesis methods (paper currently in progress). It’s made to be robust in conversation and even includes multi-turn data synthesized as a continuation of single-turn examples from datasets like Airoboros, Know_logic, EverythingLM and more.

It’s built using LLaVA 1.5 techniques, but instead of a 7B Llama as a base, we chose to use the new StableLM 3B model trained for 4 trillion tokens. (We plan to train upon Mistral likely as well.)

Any questions or feedback are much appreciated!

Download here: https://huggingface.co/NousResearch/Obsidian-3B-V0.5

Or download a quantized version here, courtesy of Nisten: https://huggingface.co/nisten/obsidian-3b-multimodal-q6-gguf

top 12 comments
[–] emsiem22@alien.top 1 points 2 years ago

We plan to train upon Mistral likely as well

Hear, hear!

[–] toothpastespiders@alien.top 1 points 2 years ago (2 children)

I was extraordinarily skeptical of the utility of 3B models until...about 1 day ago, when I gave orca mini a fair shot, in particular by training it on one specialized task. That wound up producing results that honestly floored me.

All of which is to say that I'm VERY excited to see this. I really think the 3B models can be something of a perfect Swiss Army knife: compact and always available. Multi-modal capabilities are just a perfect fit for that exact type of methodology. Can't wait to give this a shot!

[–] InTheTransition@alien.top 1 points 2 years ago (1 children)

What was the task? Just curious about what I can use mini models for

[–] toothpastespiders@alien.top 1 points 2 years ago

Creating Alpaca-formatted JSON data from big blocks of text that often have a lot of garbage in them. The untrained orca 3B model wasn't able to stick to the format even when I provided it as an example in the instructions. But it did great with it after training on a small dataset of about 100 examples or so.

It's still a bit early to call it a total success since I've only run it through a handful of tests on similar blocks of text. But just the fact that it's grabbing facts from the text and correctly formulating prompts around them is really impressive to me. 13B trained on the same dataset is, unsurprisingly, still quite a bit better. But 3B is still doing far, far better than I would have thought possible. It'd be really cool to get a little scraping pipeline going with next to no resource use.
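
For anyone wanting to check whether a small model is "sticking to the format" as described above, a minimal sketch of a validator might look like the following. It assumes the usual Alpaca record shape with `instruction`, `input`, and `output` string fields; the function name and the exact strictness rules are illustrative assumptions, not the commenter's actual pipeline.

```python
import json

# Assumption: the standard Alpaca keys; adjust if your dataset differs.
REQUIRED_KEYS = {"instruction", "input", "output"}


def parse_alpaca_records(raw: str) -> list[dict]:
    """Parse model output as Alpaca-style records; raise ValueError if malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    # Accept either a single record or a list of records.
    if isinstance(data, dict):
        data = [data]
    if not isinstance(data, list):
        raise ValueError("expected a JSON object or a list of objects")

    for i, rec in enumerate(data):
        if not isinstance(rec, dict):
            raise ValueError(f"record {i} is not an object")
        missing = REQUIRED_KEYS - set(rec)
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
        if not all(isinstance(rec[key], str) for key in REQUIRED_KEYS):
            raise ValueError(f"record {i} has a non-string field")
        # "input" may legitimately be empty; the other two should not be.
        if not rec["instruction"].strip() or not rec["output"].strip():
            raise ValueError(f"record {i} has an empty instruction or output")
    return data
```

Running every generation through a check like this gives a simple pass rate, which makes it easier to compare an untrained model against a fine-tuned one on the same blocks of text.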

[–] dogesator@alien.top 1 points 2 years ago (2 children)

I can almost guarantee you that Capybara 3B and Obsidian 3B will perform significantly better than orca mini. The base model that I’m using for training 3B is the much newer StableLM 3B model trained for 4 trillion tokens, while orca mini's base model is OpenLLaMA 3B, which was only trained on around 1-2 trillion tokens and performs significantly worse.

[–] metalman123@alien.top 1 points 2 years ago (1 children)

When do you expect to have benchmarks?

[–] dogesator@alien.top 1 points 2 years ago

So far I have only benchmarked HellaSwag and ARC Challenge, but it’s significantly beating both WizardLM-13B and GPT4-X-Vicuna-13B on both benchmarks! These are not the latest SOTA models of course, but it’s amazing to see how this 3B model is surpassing the best 13B models of just 6 months ago.

I’ll see if we can have it benchmarked officially on the HF leaderboard this week so people can see how it compares with the latest models.

[–] toothpastespiders@alien.top 1 points 2 years ago

Dang, given that I was already impressed with a model trained on half the tokens I suspect I will be impressed!

[–] altoidsjedi@alien.top 1 points 2 years ago

Please do train on Mistral! Very much looking forward to seeing how that works. I’m LOVING the Mistral models.

[–] Icaruswept@alien.top 1 points 2 years ago

Seriously impressive work, well done!

[–] Beb_Nan0vor@alien.top 1 points 2 years ago

That's impressive! This is the type of thing I like to see.

[–] daaain@alien.top 1 points 2 years ago

Even the quantised version seems to be working pretty well with the stablelm-support branch, but the output just keeps going: either the template or the model is missing the end token, or the llama.cpp branch isn't quite ready.

Does anyone else have the same problem and know what to do?

This is how I interpreted the template from the model card:

https://preview.redd.it/eqxvq2lefjxb1.png?width=1240&format=png&auto=webp&s=b5c1f1550b50fcf16ab1177894b890b852cf65a9
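
One generic workaround for output that "just keeps going" is to cut the generated text at a stop string on the client side. The sketch below is only that: the helper name is made up, and the stop strings in the example are placeholders, not the actual end token for this model (check the model card for the real one). Many inference front ends also expose a stop-sequence option that does the same thing during generation.

```python
def truncate_at_stop(text: str, stop_strings: list[str]) -> str:
    """Return text cut just before the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        if not stop:
            continue  # an empty stop string would match at position 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Placeholder stop strings for illustration only.
raw = "The image shows a cat.###\nUSER: and then it rambles on"
print(truncate_at_stop(raw, ["###", "USER:"]))
```

This does not fix a missing end token in the model or template; it only hides the symptom, so generation still costs time up to whatever token limit is set.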