vatsadev

joined 1 year ago
[–] vatsadev@alien.top 1 points 11 months ago (3 children)


IMPORTANT: this isn't a new pretrained model, it's another Mistral fine-tune with DPO, but on SlimOrca instead of UltraChat.

I would use OpenHermes instead; it's been tried much more widely and has proven solid.

[–] vatsadev@alien.top 1 points 11 months ago

Sad, and we thought HF was the harbor of unaligned models, but maybe I'm missing the whole story. Hopefully they don't kill models for saying "Taiwan good" or something.

[–] vatsadev@alien.top 1 points 11 months ago

Open source -> Mistral Instruct worked great for me; Zephyr alpha was crazy aligned, while beta was better.

Closed source -> Inflection's Pi is smooth! Praying for API access.

[–] vatsadev@alien.top 1 points 11 months ago

There's Fuyu-8B, but it has no commercial license.

It can really cover the "GPT-4 reads websites" use case and things like that, and it's helpful with complex charts too. Other than that, LLaVA is your best hope.

[–] vatsadev@alien.top 1 points 11 months ago

Detecting gender - This is basically MNIST but for gender. You could start from an MNIST-style setup: if your data is all the same size, scale up the model and train; you can experiment quickly and get ~50% accuracy (a minimal sketch of that kind of classifier is below).

Detecting elements - Slightly more complicated: for recognizing body features you would need a segmentation model, so I would look into that.
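
A minimal sketch of the classifier half of this (my illustration, not anything from the thread; the 64x64 face-crop inputs, layer sizes, and two-class head are all assumptions):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Tiny MNIST-style CNN scaled up to RGB inputs for a two-class task."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64x64 -> 32x32
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SmallCNN()
dummy = torch.randn(8, 3, 64, 64)  # batch of resized crops (placeholder data)
logits = model(dummy)              # shape: (8, 2)
```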

[–] vatsadev@alien.top 1 points 11 months ago

"I want to chat with a PDF, I don't care for my LLM to speak French, be able to write Python or know that Benjamin Franklin wrote a paper on flatuence (all things RWKV v5 World 1.5B knows)."

This is prime RAG territory: bring snippets in and make the model use them. That said, the more knowledge the model has, the better it gets for your use case too, since it knows more on its own. A minimal sketch of the snippet-stuffing idea is below.

Also, nice choice using RWKV v5; how's it working for you?
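
Purely illustrative sketch of the snippet-stuffing idea mentioned above: the keyword-overlap retriever is a stand-in for a real embedding search, and the LLM call at the end is hypothetical.

```python
def retrieve(question: str, snippets: list[str], k: int = 3) -> list[str]:
    """Return the k snippets with the most word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    return sorted(snippets, key=lambda s: -len(q_words & set(s.lower().split())))[:k]

def build_prompt(question: str, snippets: list[str]) -> str:
    """Stuff the retrieved snippets into the prompt so the model answers from them."""
    context = "\n\n".join(retrieve(question, snippets))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# prompt = build_prompt("What does section 3 say about fees?", pdf_snippets)
# completion = your_llm(prompt)  # e.g. an RWKV v5 World 1.5B runner (hypothetical)
```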

[–] vatsadev@alien.top 1 points 11 months ago

There are GGUFs; check TheBloke or greensky.
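
If you want to try one locally, here's a hedged sketch using llama-cpp-python; the file name is a placeholder for whichever quant you actually download from TheBloke's (or greensky's) repo.

```python
from llama_cpp import Llama

# Load a downloaded GGUF quant (path/quant level are placeholders)
llm = Llama(model_path="./model.Q4_K_M.gguf", n_ctx=4096)

out = llm("Q: What is DPO?\nA:", max_tokens=128)
print(out["choices"][0]["text"])
```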

[–] vatsadev@alien.top 1 points 1 year ago

This is a Google API error, with absolutely nothing to do with ML.

You would probably have better luck on the PaLM or LangChain GitHub.

[–] vatsadev@alien.top 1 points 1 year ago

Well, the model is trained on RefinedWeb, which is about 3.5T tokens, so a little below Chinchilla-optimal for 180B (quick back-of-the-envelope check after the list). Also, the models in the Falcon series feel more and more undertrained as they scale:

  • The 1B model was good, and is still good several newer generations later
  • The 7B was capable pre-Llama 2
  • The 40B and 180B were never as good
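
The back-of-the-envelope check, using the usual ~20-tokens-per-parameter Chinchilla rule of thumb (the ratio is an approximation, not something from the comment):

```python
params = 180e9            # Falcon-180B
trained_tokens = 3.5e12   # RefinedWeb size, per the comment above
optimal_tokens = 20 * params  # Chinchilla rule of thumb: ~20 tokens per parameter

print(f"optimal ≈ {optimal_tokens / 1e12:.1f}T tokens, trained ≈ {trained_tokens / 1e12:.1f}T tokens")
# -> 3.5T is just under the ~3.6T Chinchilla-optimal budget for 180B
```
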
[–] vatsadev@alien.top 1 points 1 year ago

RWKV 1.5B; it's SOTA for its size, outperforms TinyLlama, and uses no extra VRAM to fit its whole context length in the browser (rough math below).
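
Rough illustration of why that is (my numbers, not official specs): a transformer's KV cache grows with sequence length, while a recurrent model like RWKV carries a fixed-size state. The layer count, model width, and head size below are assumptions for a ~1.5B model.

```python
layers, d_model, head_dim, ctx_len, bytes_fp16 = 24, 2048, 64, 4096, 2

kv_cache = 2 * layers * ctx_len * d_model * bytes_fp16  # keys + values, grows with ctx_len
rwkv_state = layers * d_model * head_dim * bytes_fp16   # fixed-size recurrent state (rough)

print(f"KV cache @ {ctx_len} tokens: ~{kv_cache / 2**20:.0f} MiB")
print(f"RWKV-style state:           ~{rwkv_state / 2**20:.1f} MiB (independent of context length)")
```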

[–] vatsadev@alien.top 1 points 1 year ago

Well, the 5 million was just an example of the OP stuff out there.
