this post was submitted on 21 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.

[–] PwanaZana@alien.top 1 points 10 months ago

Obvious question (and I'm assuming the answer is "we didn't try it yet"): How does this model fare in terms of performance/output?

[–] TheCrazyAcademic@alien.top 1 points 10 months ago

It'd be interesting to see how an MoE framework of multiple Orca 2s, each trained on a different subset of data and routing your prompt to different Orca 2 experts, would fare. I feel like that could come extraordinarily close to GPT-4 on performance metrics, but it would take decent computing power to test the hypothesis. If each Orca 2 expert is 10 billion parameters and you wanted to run a 100-billion-parameter sparse Orca 2 MoE, that's going to require at least 500 GB of VRAM.
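
A toy sketch of what that routing could look like (purely illustrative: the expert names, the embed() stand-in, and the random gate weights below are placeholders, not anything released with Orca 2):

```python
import numpy as np

# Hypothetical top-k gating over several Orca-2-style experts.
EXPERTS = ["orca2-code", "orca2-math", "orca2-reasoning", "orca2-general"]

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a real sentence-embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=dim)

# One gating vector per expert (random here; learned in a real MoE router).
GATE = np.random.default_rng(0).normal(size=(len(EXPERTS), 8))

def route(prompt: str, top_k: int = 1) -> list[str]:
    """Score the prompt against each expert's gate and return the top-k experts."""
    scores = GATE @ embed(prompt)            # one logit per expert
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                     # softmax over experts
    best = np.argsort(probs)[::-1][:top_k]
    return [EXPERTS[i] for i in best]

print(route("Prove that the square root of 2 is irrational."))
```

Only the selected experts' weights need to be active for a given prompt, which is where the sparse-MoE memory math above comes from.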

[–] Amgadoz@alien.top 1 points 10 months ago (2 children)

Important: research-only, non-commercial license.

[–] CosmosisQ@alien.top 1 points 10 months ago

IANAL, but theoretically, it's not possible to copyright model weights (at least in the US). While the licensing of large language models hasn't been specifically tested in court, people have tried and failed with other machine learning models. The alleged copyright holder may refuse to do business with you in the future, but you're unlikely to face legal repercussions.

[–] Slimxshadyx@alien.top 1 points 10 months ago

Wow! Exciting! Are these uncensored models, or does the training data include refusals? Does anyone know? What was Orca 1?

[–] yahma@alien.top 1 points 10 months ago (1 children)

Do we get the dataset this time?

[–] professorlust@alien.top 1 points 10 months ago

Given the legal challenges to the use of training data, you’re probably never going to see a public release of the training data behind a major corporation's LLM.

There will be leaks from time to time, but no corporation will expose itself to litigation just to help the open-source community.

[–] visarga@alien.top 1 points 10 months ago

Tried the models: the 13B is very slow, while the 7B is speedy but a little quirky. It made a plan for how to solve the task but didn't actually proceed to solve it. It doesn't have good conversational flair.

[–] eggandbacon_0056@alien.top 1 points 10 months ago
[–] LinuxSpinach@alien.top 1 points 10 months ago

Progressive Learning: We start with LLaMA-2-7B or LLaMA-2-13B checkpoint and finetune it on the train split of FLAN-v2 dataset for one epoch. Note that FLAN-v2 dataset contains both zero-shot and few-shot problems. We then train on 5 million ChatGPT data from Orca 1 for 3 epochs. Then we train on the combination of 1 million GPT-4 data from Orca 1 and Orca 2’s 817K data for 4 epochs.
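
Read as a schedule, that excerpt boils down to three stages. A minimal sketch, assuming a placeholder finetune() standing in for whatever training stack you use (only the stage order and epoch counts come from the quoted text; the dataset labels are made up):

```python
# Hypothetical sketch of the progressive-learning schedule quoted above.
def finetune(checkpoint: str, dataset: str, epochs: int) -> str:
    """Placeholder: fine-tune `checkpoint` on `dataset`, return the new checkpoint."""
    print(f"finetune {checkpoint} on {dataset} for {epochs} epoch(s)")
    return f"{checkpoint}+{dataset}"

STAGES = [
    ("FLAN-v2-train-split", 1),        # stage 1: zero-shot and few-shot FLAN-v2
    ("orca1-chatgpt-5M", 3),           # stage 2: 5M ChatGPT data from Orca 1
    ("orca1-gpt4-1M+orca2-817K", 4),   # stage 3: 1M GPT-4 data + Orca 2's 817K
]

checkpoint = "LLaMA-2-7B"              # or "LLaMA-2-13B"
for dataset, epochs in STAGES:
    checkpoint = finetune(checkpoint, dataset, epochs)
```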

[–] hwpoison@alien.top 1 points 10 months ago
[–] littlexxxxx@alien.top 1 points 10 months ago

The paper does not explain the really interesting question for me, which is the reasoning strategy and its related system instruction for each sub-task, and how they selected the strategy for each clustered sub-task: manually, or through some prompts leveraging the OpenAI API.

If they did that main part by hand, then this paper is not insightful or useful at all.

[–] xplode145@alien.top 1 points 9 months ago

Can someone give me an ELI5 version of how I can train Orca 2 with my local data files/folders? Pretty please.
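
Not an authoritative recipe, but a rough sketch of one common approach with Hugging Face Transformers (the model id, paths, and hyperparameters below are assumptions; real runs on a 7B model usually need LoRA/quantization and a capable GPU):

```python
# Rough sketch: fine-tune a causal LM on a folder of local .txt files.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "microsoft/Orca-2-7b"          # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token   # Llama-style tokenizers often lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Treat every .txt file under ./my_data as training text.
data = load_dataset("text", data_files={"train": "my_data/*.txt"})["train"]
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=1024),
                remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="orca2-finetuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```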