Monkey_1505

joined 1 year ago
[–] Monkey_1505@alien.top 1 points 11 months ago

I think that's where the real performance will be. Not sure about VRAM, but it would probably make sense to start with Mistral 11B or Llama-2 20B splices as a proof of concept.

[–] Monkey_1505@alien.top 1 points 11 months ago (3 children)

IMO don't bother with Frankenstein models unless you plan to seriously train them on a broad dataset. They just tend to get confused, fail to follow instructions, etc. You'd probably need to run an Orca-style dataset over them, and then some RP data on top.

[–] Monkey_1505@alien.top 1 points 11 months ago

I dislike Frankenstein models. The 20B, the 120B, they are all the same: major confusion, can't follow logic or instructions properly. Great prose, but pretty useless for that reason.

Someone would have to put some major training into one of them before it'd be any good.

[–] Monkey_1505@alien.top 1 points 11 months ago

Why do people brew their own beer, or grow their own weed?

It's because they want to be more connected to the process, in control of it, and cut out the middleman. Also, local models probably won't destroy civilization.

[–] Monkey_1505@alien.top 1 points 11 months ago

For instruct specifically, certain models do better at certain things: OpenChat, OpenHermes, and Capybara seem to be the best, and depending on the type of instruction, one of those will beat the others. But they will all underperform next to a good merge/finetune of a 13B model.

Repetition seems to fall away somewhat at very long context sizes. Because of the sliding window, it can handle those context lengths, and if you use something like llama.cpp the context can be reused, so you don't have to reprocess the whole prompt each time.
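For what it's worth, here's a minimal sketch of that kind of context reuse with llama-cpp-python (assuming its prompt-cache API; the model file name and chat format are just placeholders):

```python
# Reuse already-evaluated prompt tokens across calls instead of reprocessing them.
from llama_cpp import Llama, LlamaCache

llm = Llama(model_path="mistral-7b-instruct.Q5_K_M.gguf", n_ctx=8192)
llm.set_cache(LlamaCache())  # keep evaluated prompt tokens around between calls

chat = "USER: Tell me a short story about a lighthouse.\nASSISTANT:"
first = llm(chat, max_tokens=256)

# Appending to the same prompt means the shared prefix is found in the cache,
# so only the newly added tokens need to be evaluated on the second call.
chat += first["choices"][0]["text"] + "\nUSER: Continue the story.\nASSISTANT:"
second = llm(chat, max_tokens=256)
```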

7B is generally better for creative writing; however, as I said, there are specific types of instructions they will handle well.


According to these leaks, they seem to be developing something similar to Apple's M-series chipsets: on-package LPDDR5X RAM shared across the iGPU and CPU (I think?).

It looks like a maximum of 32 GB though, and probably not as fast as the high-end Macs. But it's still promising that other, notably cheaper manufacturers are trying to copy this design.

https://www.techpowerup.com/315941/intel-lunar-lake-mx-soc-with-on-package-lpddr5x-memory-detailed

[–] Monkey_1505@alien.top 1 points 11 months ago

I wouldn't rule it out. If one company grabs the biggest slices of the pie, the others might decide the best offense is cooperation. Nothing to depend on, of course.

[–] Monkey_1505@alien.top 1 points 11 months ago (2 children)

Here I am just hoping any of it becomes open source.

IDC about more wannabe corporate models.

[–] Monkey_1505@alien.top 1 points 11 months ago

My guess is some kind of exploit, feature, or flaw with potential PR impact that he knew about and didn't tell them. Something akin to knowing Bing would go Sydney.

[–] Monkey_1505@alien.top 1 points 11 months ago

Knowledge is a strange goal for any model when we have the internet, IMO. Just connect your model to a web search.
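Something like this, say (a rough sketch assuming the duckduckgo_search package and llama-cpp-python; the model file and prompt format are just placeholders):

```python
# Feed a few web search snippets to a local model as context for its answer.
from duckduckgo_search import DDGS
from llama_cpp import Llama

question = "What is on-package LPDDR5X memory?"

# Pull a handful of search results to ground the answer.
with DDGS() as ddgs:
    hits = list(ddgs.text(question, max_results=5))
snippets = "\n".join(f"- {h['title']}: {h['body']}" for h in hits)

llm = Llama(model_path="openhermes-2.5-mistral-7b.Q5_K_M.gguf", n_ctx=4096)
prompt = (
    "Answer the question using the search results below.\n\n"
    f"Search results:\n{snippets}\n\n"
    f"Question: {question}\nAnswer:"
)
print(llm(prompt, max_tokens=300)["choices"][0]["text"])
```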

[–] Monkey_1505@alien.top 1 points 11 months ago

Having used it a lot, I can say for sure that without much prompting it readily produces junk web text, URLs, etc., so it is not a fully filtered or fully synthetic dataset.

My guess would be that it's just a slightly better-filtered dataset than Llama-2's, with slightly more training on it.

My intuition, based on this, is that per parameter count, EVERYTHING open source could be optimized considerably further.

[–] Monkey_1505@alien.top 1 points 11 months ago

ST (SillyTavern). By far the most customizable.

[–] Monkey_1505@alien.top 1 points 11 months ago (7 children)

Mostly SillyTavern.
