BalorNG

joined 1 year ago
[–] BalorNG@alien.top 1 points 11 months ago

I say:

  1. It takes a performance hit, but it remains to be seen whether going with a much larger model can compensate for that.
  2. The model needs to be trained from scratch; apparently, you cannot fine-tune an existing model for this...
[–] BalorNG@alien.top 1 points 11 months ago

I mean, you can jailbreak/browbeat ChatGPT/Claude into going against their guardrails relatively easily, so I smash "X" for doubt that Grok is going to be any different. If it is, now THAT is going to be huge, though maybe not in a way we'd like, I guess...

[–] BalorNG@alien.top 1 points 11 months ago (1 children)

That explains why Goliath worked and yours, not so much, I guess...

[–] BalorNG@alien.top 1 points 11 months ago (1 children)

"Prompt Template: Alpeca" Wut?

Looks like a scam, to be fair. I bet if you apply, you'll get "Just send us $100 for access!"

[–] BalorNG@alien.top 1 points 11 months ago

Did you do post-merge retraining? Without at least some, the results are going to be poor...

[–] BalorNG@alien.top 1 points 11 months ago (3 children)

Did you do post-merge training, and how much?

[–] BalorNG@alien.top 1 points 11 months ago

10 s/tok and a couple of kilowatts of power... OK, if it were as smart as Einstein and as unerring as an oracle it might make sense, but you can use it for free on Petals at 3 tok/s, and it is most certainly not...

[–] BalorNG@alien.top 1 points 11 months ago

Technically, you can somewhat automate the testing process by creating a script that makes the model answer a series of questions that are relevant to YOU and are unique (so they cannot be gamed by training on benchmarks), and then evaluate the answers yourself.

Make sure you experiment with different sampling methods and run several tests, due to the inherent randomness of the output.
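
Something like this would do it (a rough sketch, untested; the endpoint URL and payload format assume a local OpenAI-compatible server, e.g. a llama.cpp server, so adjust both for your own setup):

```python
# Minimal personal-benchmark sketch: loop your own questions through the
# model several times with different sampling settings, then dump all the
# answers to a file so YOU can grade them yourself.
import json
import requests

API_URL = "http://localhost:8000/v1/completions"  # hypothetical local server

QUESTIONS = [
    # Put your own, un-gameable prompts here.
    "Summarize this story in two sentences: ...",
    "Write a SQL query that ...",
]

# Try several sampling configurations, since output quality varies with them.
SAMPLER_CONFIGS = [
    {"temperature": 0.7, "top_p": 0.9},
    {"temperature": 1.0, "top_p": 0.95},
]

RUNS_PER_CONFIG = 3  # repeat runs to average over sampling randomness

results = []
for question in QUESTIONS:
    for config in SAMPLER_CONFIGS:
        for run in range(RUNS_PER_CONFIG):
            response = requests.post(
                API_URL,
                json={"prompt": question, "max_tokens": 512, **config},
                timeout=300,
            )
            text = response.json()["choices"][0]["text"]
            results.append(
                {"question": question, "config": config,
                 "run": run, "answer": text}
            )

# Review the answers by hand; the whole point is that only you grade them.
with open("personal_benchmark.json", "w") as f:
    json.dump(results, f, indent=2)
```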

[–] BalorNG@alien.top 1 points 11 months ago (4 children)

Please, dear Tzeentch, have someone leak GPT-4 in the general confusion, I MUST know if it is really 10 7B models in a trench coat :)

[–] BalorNG@alien.top 1 points 11 months ago

My name is Mensch. Uber Mensch.

[–] BalorNG@alien.top 1 points 1 year ago (1 children)

He MUST become the CEO of Uber, too! :))))

 

https://arxiv.org/abs/2310.17680

Ok, technically a tiny language model for now:

Imagine a developer who can only change their last line of code, how often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier tokens generated. We introduce CodeFusion, a pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python, and Microsoft Excel conditional formatting (CF) rules. Experiments show that CodeFusion (75M parameters) performs on par with state-of-the-art auto-regressive systems (350M-175B parameters) in top-1 accuracy and outperforms them in top-3 and top-5 accuracy due to its better balance in diversity versus quality.

And it is only for code. And it seems to be much slower. But it looks extremely interesting as a "proof of concept".

I think that instead of a lot of "denoising" steps to generate text from gibberish, a dual-model system that takes a typical autoregressive output and then runs a few "denoising" steps over it to look for errors and inconsistencies might be the best of both worlds, compared to typical ways of improving output quality, like progressive refinement, that require rewriting the entire text token-by-token several times...
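
To make the control flow concrete, here is a toy sketch of that dual-model idea with both models stubbed out (ar_draft and denoise_pass are placeholders I made up, not anything from the CodeFusion paper):

```python
# Toy sketch of the dual-model pipeline: one cheap autoregressive pass
# produces a full draft, then a few whole-sequence "denoising" passes edit
# the draft in place instead of regenerating it token-by-token.
def ar_draft(prompt: str) -> str:
    """Stand-in for a normal autoregressive generation pass."""
    # Deliberately returns a draft with a planted bug for the demo.
    return "def add(a, b):\n    return a - b"

def denoise_pass(prompt: str, text: str) -> str:
    """Stand-in for one whole-sequence refinement/denoising step.

    A real denoiser would re-score the entire draft conditioned on the
    prompt and edit inconsistent spans; here we just patch the known bug
    to show the control flow.
    """
    return text.replace("a - b", "a + b")

def generate(prompt: str, denoise_steps: int = 3) -> str:
    text = ar_draft(prompt)              # single cheap AR pass for the draft
    for _ in range(denoise_steps):       # a few global cleanup passes
        new_text = denoise_pass(prompt, text)
        if new_text == text:             # stop early once the draft is stable
            break
        text = new_text
    return text

print(generate("Write a function that adds two numbers."))
```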
