this post was submitted on 21 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.

[–] PwanaZana@alien.top 1 points 10 months ago

Obvious question (and I'm assuming the answer is "we didn't try it yet"): How does this model fare in terms of performance/output?

[–] TheCrazyAcademic@alien.top 1 points 10 months ago

It'd be interesting to see how an MoE framework of multiple Orca 2s, each trained on a different subset of data and routing your prompt to different Orca 2 experts, would fare. I feel like that could come extraordinarily close to GPT-4 on performance metrics, but it would take decent computing power to test the hypothesis. If each Orca 2 expert is 10 billion parameters and you wanted to run a 100-billion-parameter sparse Orca 2 MoE, that's going to require at least 500 GB of VRAM.
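
A toy sketch of what that routing could look like (purely illustrative: the expert names, the embed() stand-in, and the random gate weights below are placeholders, not anything released with Orca 2):

```python
import numpy as np

# Hypothetical top-k gating over several Orca-2-style experts.
EXPERTS = ["orca2-code", "orca2-math", "orca2-reasoning", "orca2-general"]

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a real sentence-embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=dim)

# One gating vector per expert (random here; learned in a real MoE router).
GATE = np.random.default_rng(0).normal(size=(len(EXPERTS), 8))

def route(prompt: str, top_k: int = 1) -> list[str]:
    """Score the prompt against each expert's gate and return the top-k experts."""
    scores = GATE @ embed(prompt)            # one logit per expert
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                     # softmax over experts
    best = np.argsort(probs)[::-1][:top_k]
    return [EXPERTS[i] for i in best]

print(route("Prove that the square root of 2 is irrational."))
```

Only the selected experts' weights need to be active for a given prompt, which is where the sparse-MoE memory math above comes from.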

[–] Amgadoz@alien.top 1 points 10 months ago (2 children)

Important: research-only, non-commercial license.

[–] CosmosisQ@alien.top 1 points 10 months ago

IANAL, but theoretically, it's not possible to copyright model weights (at least in the US). While the licensing of large language models hasn't been specifically tested in court, people have tried and failed with other machine learning models. The alleged copyright holder may refuse to do business with you in the future, but you're unlikely to face legal repercussions.

[–] Slimxshadyx@alien.top 1 points 10 months ago

Wow! Exciting! Are these uncensored models, or does the training data include refusals? Does anyone know? What was Orca 1?

[–] yahma@alien.top 1 points 10 months ago (1 children)

Do we get the dataset this time?

[–] professorlust@alien.top 1 points 10 months ago

Given the legal challenges to the use of training data, you’re probably never going to see a public release of the training data behind a major corporation's LLM.

There will be leaks from time to time, but no corporation will expose itself to litigation just to help the open-source community.

[–] visarga@alien.top 1 points 10 months ago

Tried the models: the 13B is very slow, while the 7B is speedy but a little quirky. It made a plan for how to solve the task but didn't actually proceed to solve it. It doesn't have good conversational flair.

[–] eggandbacon_0056@alien.top 1 points 10 months ago
[–] LinuxSpinach@alien.top 1 points 10 months ago

Progressive Learning: We start with LLaMA-2-7B or LLaMA-2-13B checkpoint and finetune it on the train split of FLAN-v2 dataset for one epoch. Note that FLAN-v2 dataset contains both zero-shot and few-shot problems. We then train on 5 million ChatGPT data from Orca 1 for 3 epochs. Then we train on the combination of 1 million GPT-4 data from Orca 1 and Orca 2’s 817K data for 4 epochs.
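
Read as a schedule, that excerpt boils down to three stages. A minimal sketch, assuming a placeholder finetune() standing in for whatever training stack you use (only the stage order and epoch counts come from the quoted text; the dataset labels are made up):

```python
# Hypothetical sketch of the progressive-learning schedule quoted above.
def finetune(checkpoint: str, dataset: str, epochs: int) -> str:
    """Placeholder: fine-tune `checkpoint` on `dataset`, return the new checkpoint."""
    print(f"finetune {checkpoint} on {dataset} for {epochs} epoch(s)")
    return f"{checkpoint}+{dataset}"

STAGES = [
    ("FLAN-v2-train-split", 1),        # stage 1: zero-shot and few-shot FLAN-v2
    ("orca1-chatgpt-5M", 3),           # stage 2: 5M ChatGPT data from Orca 1
    ("orca1-gpt4-1M+orca2-817K", 4),   # stage 3: 1M GPT-4 data + Orca 2's 817K
]

checkpoint = "LLaMA-2-7B"              # or "LLaMA-2-13B"
for dataset, epochs in STAGES:
    checkpoint = finetune(checkpoint, dataset, epochs)
```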

[–] hwpoison@alien.top 1 points 10 months ago
[–] littlexxxxx@alien.top 1 points 10 months ago

The paper does not explain the really interesting question for me, which is the reasoning strategy and its related system instruction for each sub-task, and how they selected the strategy for each clustered sub-task: manually, or through some prompts leveraging the OpenAI API.

If they did that main part by hand, then this paper is not insightful or useful at all.

[–] xplode145@alien.top 1 points 9 months ago

Can someone give me an ELI5 version of how I can train Orca 2 with my local data files/folders? Pretty please.
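
Not an authoritative recipe, but a rough sketch of one common approach with Hugging Face Transformers (the model id, paths, and hyperparameters below are assumptions; real runs on a 7B model usually need LoRA/quantization and a capable GPU):

```python
# Rough sketch: fine-tune a causal LM on a folder of local .txt files.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "microsoft/Orca-2-7b"          # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token   # Llama-style tokenizers often lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Treat every .txt file under ./my_data as training text.
data = load_dataset("text", data_files={"train": "my_data/*.txt"})["train"]
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=1024),
                remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="orca2-finetuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```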