xadiant

joined 10 months ago
[–] xadiant@alien.top 1 points 9 months ago

As I understand it, LLMs basically reproduce the average pattern of a billion books, so when you add GPT-4 and 3.5 outputs into the mix, you're averaging an average, and things get boring very fast. As for model suggestions, Yi-34B-based ones look fine for literary purposes.

I think being very specific and editing (co-writing with the model) could help. Some LoRA training on specific books could also help mimic a certain style.
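A minimal sketch of what that LoRA setup might look like with Hugging Face transformers + peft; the base model, rank, and target modules are placeholder assumptions, not a tested recipe:

```python
# Hypothetical sketch: attach a LoRA adapter to a causal LM before
# fine-tuning it on a corpus of books in the target style.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "01-ai/Yi-34B"  # placeholder; any causal LM you can fit in memory

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

lora_cfg = LoraConfig(
    r=16,                                 # adapter rank; small values keep training cheap
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # common choice for Llama-style blocks
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights get trained
# ...then run your usual Trainer/SFTTrainer loop on the style corpus.
```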

High temperature and repetition penalty could help too.
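Purely as an illustration of those two knobs (values picked out of thin air, reusing the model and tokenizer from the sketch above):

```python
# Illustrative only: higher temperature plus a mild repetition penalty
# when generating prose.
inputs = tokenizer("Once upon a midnight dreary,", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.2,          # >1.0 flattens the distribution -> more varied wording
    repetition_penalty=1.15,  # discourages looping on the same phrases
    max_new_tokens=200,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```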

[–] xadiant@alien.top 1 points 9 months ago

High repetition penalty? One model I merged suddenly started speaking Spanish in one summarisation task lol

[–] xadiant@alien.top 1 points 9 months ago

Really cool, I'll check the video out. Since we've found an actually qualified person, let me ask a few layman questions; hope you have time to answer them!

First off, sampling methods. Most of them look simple, but we still don't really know how to tune them. Do you think novel sampling methods or specific combinations could improve output quality by a lot?

For instance, beam search: does it give a linear improvement in quality as you increase the number of beams, or not?

Do you think the ideal values for temperature, top_k and top_p are context-dependent, model-dependent, or both?
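To make the question concrete, here is roughly how those knobs get combined in a Hugging Face generate() call. The presets are made up on the spot, which is exactly the problem I'm describing (a tiny stand-in model is used so the comparison runs anywhere):

```python
# Made-up presets to illustrate the knobs in question; none of these
# values are claimed to be ideal.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # tiny stand-in model so the comparison is cheap to run
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

presets = {
    "greedy":  dict(do_sample=False),
    "beam_4":  dict(do_sample=False, num_beams=4),  # beam search
    "beam_8":  dict(do_sample=False, num_beams=8),  # is 2x beams ~ 2x quality? unclear
    "nucleus": dict(do_sample=True, temperature=0.8, top_p=0.9),
    "top_k":   dict(do_sample=True, temperature=1.0, top_k=40),
}

prompt = tokenizer("The experiment began when", return_tensors="pt")
for label, kwargs in presets.items():
    out = model.generate(**prompt, max_new_tokens=60,
                         pad_token_id=tokenizer.eos_token_id, **kwargs)
    print(f"--- {label} ---")
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```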


I want to see if some presets and custom modifications work well in benchmarks, but running HellaSwag or MMLU looks too complicated for me, and it takes 10+ hours to upload 20 GB of data.

I assume there isn't a convenient webui for chumps to run benchmarks with (apart from ooba's perplexity test, which I assume isn't the same thing?). Any advice?
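For context, this is roughly what running the standard harness (EleutherAI's lm-evaluation-harness) seems to involve; a sketch assuming a recent (>= 0.4) version installed via `pip install lm-eval`, and I may well be holding it wrong:

```python
# Rough sketch of the lm-evaluation-harness Python entry point;
# exact argument names may differ between versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face backend
    model_args="pretrained=/path/to/local/model", # local path, so nothing needs re-uploading
    tasks=["hellaswag"],                          # or ["mmlu"], etc.
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```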

[–] xadiant@alien.top 1 points 9 months ago

Exactly what I was thinking. I just fail miserably each time I merge the layers.

[–] xadiant@alien.top 1 points 9 months ago (4 children)

Any tips/attempts on frankensteining two Yi-34B models together to make a ~51B model?
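In case it helps the discussion, the layer-stack ("passthrough") style of frankenmerge people use for this looks roughly like the config below. The models, layer ranges, and overlap are placeholders I haven't validated, and the resulting ~51B stack may well need further fine-tuning to be coherent.

```python
# Hypothetical mergekit "passthrough" (layer-stacking) config, written from
# Python. Yi-34B has 60 layers; stacking two overlapping 45-layer slices
# gives ~90 layers, i.e. roughly 34B * 90/60 ~= 51B parameters.
from pathlib import Path

config = """\
slices:
  - sources:
      - model: 01-ai/Yi-34B        # placeholder donor model A
        layer_range: [0, 45]
  - sources:
      - model: 01-ai/Yi-34B-Chat   # placeholder donor model B
        layer_range: [15, 60]
merge_method: passthrough
dtype: float16
"""

Path("frankenmerge.yml").write_text(config)
# Then, assuming mergekit is installed:
#   mergekit-yaml frankenmerge.yml ./yi-51b-frankenstein
```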

[–] xadiant@alien.top 1 points 10 months ago (1 children)

That's acceptable. Did you do a full training run or a fine-tune, though? And how much data?

[–] xadiant@alien.top 1 points 10 months ago (4 children)

Goddammit, I just fine-tuned Tortoise on a custom voice. Can't wait for webuis for StyleTTS. Hope it's easy to fine-tune.

[–] xadiant@alien.top 1 points 10 months ago (1 children)

Sure, it's just going to generate 5 tokens per week

[–] xadiant@alien.top 1 points 10 months ago (5 children)

No fucking way. GPT-3 has 175B params. There is no way, shape, or form in which they could have discovered the "secret sauce" to make an ultra-smart 20B model. The TruthfulQA paper suggests that bigger models are more likely to score worse, and ChatGPT's TQA score is impressively bad. I think the papers responsible for the impressive open-source models are at most 12-20 months old. The Turbo version is probably quantized, that's all.
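Back-of-the-envelope weight-memory numbers for why "quantized 175B" already explains most of the cost difference (pure arithmetic, no insider info):

```python
# Rough weight-only memory footprint at different precisions.
# Ignores KV cache, activations, and serving overhead.
def weight_gb(params_billions: float, bits: int) -> float:
    return params_billions * 1e9 * bits / 8 / 1e9

for params in (175, 20):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit ~= {weight_gb(params, bits):.0f} GB of weights")
# 175B: 350 GB (fp16), 175 GB (int8), ~88 GB (int4).
#  20B:  40 GB (fp16),  20 GB (int8), ~10 GB (int4).
```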