this post was submitted on 17 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

[–] migtissera@alien.top 1 points 11 months ago (1 children)

On another note, this place is just super hostile! I didn't think it would be, considering it's the LocalLLaMA sub-reddit and we're all here to support open-source or freely available models.

This is harsher than the Twitter mob!

I'll still release models, but sorry guys, not coming here again.

[–] llama_in_sunglasses@alien.top 1 points 11 months ago

Sorry to hear that. This thread is pretty wild; almost every other model thread on LocalLlama has at most a few crazies, and they get downvoted. Your Synthia models are fairly popular, so the reactions you got seem pretty out of place to me.

[–] Creative_Bottle_3225@alien.top 1 points 11 months ago

Do you have to download 71 GB to try it?! :-)

[–] CasimirsBlake@alien.top 1 points 11 months ago (1 children)

Tell me I'm going to need another GPU without telling me I'm going to need another GPU... Eeek.

[–] Sabin_Stargem@alien.top 1 points 11 months ago

When I built my gaming rig, I thought I wouldn't need to upgrade for several years. Then AI came along and kicked my sandcastle into the surf.

My wallet is unhappy, and has already lost inches from the diet it has been put on.

[–] IxinDow@alien.top 1 points 11 months ago

How many tokens are in your Substack example?
Do you have examples of using the model for fiction in the 16K-40K token range?

[–] llama_in_sunglasses@alien.top 1 points 11 months ago

Thanks for the model, it's really nice to have some Synthia magic on a Yi-34B 200K base.

Part of the generation from your suggested prompt:

The magnetic field of our planet is generated by an iron-nickel core that rotates like a dynamo, creating electric currents which in turn produce the magnetic force we experience as compass needles pointing northward when held still relative to this field's direction over time periods measured in years rather than seconds or minutes because it varies slightly due to solar wind interactions with upper layers known collectively as "ionosphere."

I found this particular output unintentionally hilarious: it reminds me a lot of the Reddit comments I type out and then delete because they're just overexplainy run-on gibberish.

[–] pseudonerv@alien.top 1 points 11 months ago

I thought I saw a Tess-XL, but it's gone now. What happened?

[–] ReMeDyIII@alien.top 1 points 11 months ago (1 children)

According to TheBloke, the sequence length is 8192 ctx, so I'm assuming 8192 is its default and it can extend up to 200K ctx via alpha_scale?
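
For reference, alpha_scale in exllama-style loaders is NTK-aware RoPE scaling: it stretches the rotary frequency base so positions beyond the trained length map into the range the model has seen. A minimal sketch of the idea, with illustrative numbers rather than this model's confirmed config:

    # Sketch of NTK-aware "alpha" RoPE scaling as applied by exllama-style
    # loaders. The base and alpha values below are assumptions for
    # illustration, not this model's confirmed settings.
    def scaled_rope_base(base: float, alpha: float, head_dim: int = 128) -> float:
        """Stretch the RoPE frequency base by alpha**(d / (d - 2))."""
        return base * alpha ** (head_dim / (head_dim - 2))

    print(scaled_rope_base(10000.0, 2.0))  # ~20221: roughly doubles usable context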

[–] migtissera@alien.top 1 points 11 months ago
[–] mcmoose1900@alien.top 1 points 11 months ago (1 children)

Almost the same syntax as Yi Capybara. Excellent.

I propose all Yi 34B 200K finetunes use Vicuna-ish prompt syntax, so they can ALL be merged into one hellish voltron model.

[–] mcmoose1900@alien.top 1 points 11 months ago (1 children)

The deed is done:

https://huggingface.co/brucethemoose/Capybara-Tess-Yi-34B-200K

Seems coherent in transformers, I'm gonna quant it to exl2 and test it out.
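
For anyone wanting to try something similar, a plain 50/50 linear merge can be sketched directly in transformers as below. The model IDs and equal weighting are assumptions for illustration; the actual recipe behind the linked model isn't documented in this thread.

    # Rough sketch of a 50/50 linear weight merge of two same-architecture
    # Yi-34B-200K finetunes (needs enough RAM to hold both models).
    # Model IDs and equal weights are assumptions, not the linked model's recipe.
    import torch
    from transformers import AutoModelForCausalLM

    a = AutoModelForCausalLM.from_pretrained(
        "NousResearch/Nous-Capybara-34B", torch_dtype=torch.float16)
    b = AutoModelForCausalLM.from_pretrained(
        "migtissera/Tess-M-v1.0", torch_dtype=torch.float16)

    merged = a.state_dict()
    with torch.no_grad():
        for name, tensor in b.state_dict().items():
            merged[name] = (merged[name] + tensor) / 2  # average each weight tensor

    a.load_state_dict(merged)
    a.save_pretrained("Capybara-Tess-Yi-34B-200K-merged")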

[–] SomeOddCodeGuy@alien.top 1 points 11 months ago

Just wanted to come back and let you know I started using this last night, and it's fantastic. I haven't put it through much testing yet, but on initial use I'm very impressed by this model as a general-purpose AI assistant. It keeps the Assistant's more informal speech patterns while also answering questions well and keeping up with large context. Those are 3 checkboxes I've never been able to check at once. This praise won't get much visibility since it's an older thread, but I just wanted to let you know.

[–] YearZero@alien.top 1 points 11 months ago (1 children)

Testing it now, but it's worse than 7B models on logic questions for me. Huge disappointment compared to Dolphin and Nous-Capybara, which are both Yi finetunes and the best models I've tested so far. It just goes to show how much difference finetuning a base model can make.

[–] drifter_VR@alien.top 1 points 11 months ago (1 children)

Nice, did you manage to tell a difference between Dolphin and Nous-Capybara? Both are pretty close for me.

[–] YearZero@alien.top 1 points 11 months ago (1 children)
[–] drifter_VR@alien.top 1 points 11 months ago (1 children)

Thanks, I remember your tests; it's great you're still on it. So according to your tests, 34B models compete with GPT-3.5. I'm not too surprised. And Mistral-7B is not so far behind, what a beast!
Will you benchmark 70B models too?

[–] YearZero@alien.top 1 points 11 months ago

Unfortunately I don't have enough RAM/GPU, and I'm too broke right now to pay for more! But I hope to in the future.

[–] mcmoose1900@alien.top 1 points 11 months ago (1 children)

More random feedback: you should put some combination of Yi, 34B, and/or 200K in the title.

No one tags anything on HF, so the only way to browse models is by title. I would have totally missed this in my Yi/34B searches if not for the Reddit post.
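
To illustrate, a title substring search is about all the Hub gives you; a minimal sketch, assuming huggingface_hub is installed:

    # Browsing Hub models by title substring with huggingface_hub. A model
    # without "Yi", "34B", or "200K" in its name won't surface here.
    from huggingface_hub import HfApi

    for m in HfApi().list_models(search="Yi 34B 200K", sort="downloads", limit=20):
        print(m.id)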

[–] Sabin_Stargem@alien.top 1 points 11 months ago

Yeah, it was only by luck that I stumbled onto this. Something like "Yi-34b-200k - Tess Medium" would work better.

[–] f1kkz@alien.top 1 points 11 months ago

500k context next? This is hilarious 😂

[–] sophosympatheia@alien.top 1 points 11 months ago

This model kicks ass. I strongly recommend trying it for roleplay. The 4-bit, 32g, act-order GPTQ quant is on par with 70B models, so I can only imagine what higher-bit quants can do.

[–] harrro@alien.top 1 points 11 months ago
[–] PMMeYourWorstThought@alien.top 1 points 11 months ago

Fuck Yi and its license model.

[–] bespoke-mushroom@alien.top 1 points 11 months ago

Read through the Substack "conversation" with Tess. Obviously Tess is so good that it reveals a strange symmetry...

...A mathematical model (Tess) gives a seemingly coherent English language response, describing a seemingly coherent branch of theoretical physics, which after reading turns out to be nothing but mathematical gibberish.

Thank heavens civil engineers do not use phrases like "These infinities are due to the fact that particles can emit or absorb infinitely many virtual particles. Renormalization allows us to make sense of these infinities"

Thanks for your work on Tess, I am sure it can be used for either actual science, or even greater fantasy than QFT.

[–] BangkokPadang@alien.top 1 points 11 months ago (1 children)

What makes this any different than the “base” Yi-34B-200k model?

Where can we see a description of what the model has been finetuned on (datasets used, LoRAs used, etc.) and/or your methods for doing so? I'm not finding any of this information in the model card or the Substack link.

[–] Slimxshadyx@alien.top 1 points 11 months ago

I'm not sure why he's being so vague about this model. He said it's fine-tuned to be better at instruct, I think?

[–] Tiny_Arugula_5648@alien.top 1 points 11 months ago

What's the VRAM usage? A context that big can use an enormous amount.
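
For a rough sense of scale, the KV cache grows linearly with context. A back-of-envelope sketch, using assumed Yi-34B-ish shape parameters (check the model's config.json for the real values):

    # Back-of-envelope KV cache size: 2 (K and V) x layers x kv_heads x
    # head_dim x context x bytes per element. The Yi-34B-ish shape numbers
    # below are assumptions for illustration.
    def kv_cache_gib(ctx: int, layers: int = 60, kv_heads: int = 8,
                     head_dim: int = 128, bytes_per_elem: int = 2) -> float:
        return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1024**3

    print(f"{kv_cache_gib(8_192):.1f} GiB")    # ~1.9 GiB at 8K ctx (fp16)
    print(f"{kv_cache_gib(200_000):.1f} GiB")  # ~45.8 GiB at 200K ctx (fp16)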

[–] WinstonP18@alien.top 1 points 11 months ago

Thanks for publishing your model! When I clicked the link above, it went to an HF page that says "Please download Tess-M-STEM series for reasoning, logic and STEM related tasks." But when I went to your main profile page, I didn't see any Tess models labeled 'STEM', just Tess models with and without the word 'Creative'. Can I presume those in the latter group are the so-called STEM-specific models?

Also, the dialogue shown on Substack is pretty good! Was that done with the 'Creative' or the STEM model?