LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

Maybe anecdotal but I have very high hopes for Yi 34b finetunes. (alien.top)

submitted 11 months ago by Herr_Drosselmeyer@alien.top to c/localllama@poweruser.forum

I tried one yesterday for the first time and I came away very impressed. It's not great (yet) for RP but the basis seems very robust. For a laugh, I tried the usual reasoning tests and damn, it passed them all first try:

AI

How can I help you today?

You

There is a room with three murderers inside. Another person enters the room and shoots one of the three original occupants dead. Nobody else enters or leaves the room. How many murderers are there in the room?

AI

Four. Three original occupants who were already murderers and the fourth being the new person who entered the room and killed one of them.

You

I buy ten apples at the market. I give two of them to my friend Bob and two to my mother. Then, I go back to the market and buy three more apples. On my way home, I eat one apple. At home, I give my brother three bananas. How many apples do I have left. Explain your reasoning step by step.

AI

Let's break it down step-by-step:

You start with 10 apples.

You give away 2 apples to Bob (leaving you with 8 apples) and then give away another 2 apples to your mother (leaving you with 6 apples).

Next, you purchase 3 additional apples from the market, bringing your total up to 9 apples again.

While on your way home, you consume 1 apple, leaving you with 8 apples remaining.

Finally, when you reach home, you give your brother 3 bananas - but this doesn't affect the number of apples you have, since they are different fruits. So, you still have 8 apples left.

You

Sally has five brothers. Each of her brothers has one sister. How many sisters does Sally have. Explain your reasoning step by step.

AI

Here's how we can reason through this problem:

Sally has five brothers.

It states that each of her brothers has one sister.

Since all these siblings belong to the same family, it means that Sally is the only sister mentioned for those five brothers.

Therefore, Sally has no other sisters besides herself.

So, Sally has 0 sisters

Like, seriously? Huge context, really fast (on my 3090 anyway) and this level of performance. I can't wait to see what people can do with this.

[–] bullerwins@alien.top 1 points 11 months ago (12 children)

Does the 200K mean that it has up to 200k context size? Is the context limited by the model, or can you just set it to whatever you like as long as you have enough VRAM? Also, if a GGUF model takes, for example, 20GB of VRAM, is that with the “default” context size? Can it be less if you decrease the context, or more if you increase it?

[–] Herr_Drosselmeyer@alien.top 1 points 11 months ago (11 children)

The base Yi can handle 200k. The version I used can do 48k (though I only tested 16k so far). Larger context size requires more VRAM.

The size that TheBloke gives for GGUF files is the minimum size at 0 context. As context increases, VRAM use increases.

[–] ParanoidMarvin42@alien.top 1 points 11 months ago (3 children)

Do you know how to estimate how much memory the context will need?

[–] Herr_Drosselmeyer@alien.top 1 points 11 months ago (2 children)

With this particular model, I can crank it up to 32k if I enable "Use 8-bit cache to save VRAM", and that's as high as it can go in Oobabooga WebUI.

[–] waxbolt@alien.top 1 points 11 months ago (1 children)

32k seems to be hard-coded in oobabooga, at least as the maximum for truncation length. It would take a patch to fix it.

[–] Herr_Drosselmeyer@alien.top 1 points 11 months ago

I know but it's slowing down quite a bit at 32k already so I don't think it's worth pushing it further. But hey, even at just 16k it's four times what we usually get, so I'm not complaining.

[–] DedyLLlka_GROM@alien.top 1 points 11 months ago

You can change it yourself, although for now the edit has to be reapplied after every update. Just put something like 200000 in these 2 places:

https://github.com/oobabooga/text-generation-webui/blob/454fcf39a95691f5e375c48fbc6fe6aa96f0c738/modules/shared.py#L46

https://github.com/oobabooga/text-generation-webui/blob/454fcf39a95691f5e375c48fbc6fe6aa96f0c738/modules/ui_model_menu.py#L100

[–] andrewlapp@alien.top 1 points 11 months ago

34B Model Memory Requirements (inference)

Sequence Length vs Bit Precision
SL / BP |     4      |     6      |     8      |     16    
-----------------------------------------------------------
    512 |     15.9GB |     23.8GB |     31.8GB |     63.6GB
   1024 |     16.0GB |     23.9GB |     31.9GB |     63.8GB
   2048 |     16.1GB |     24.1GB |     32.2GB |     64.3GB
   4096 |     16.3GB |     24.5GB |     32.7GB |     65.3GB
   8192 |     16.8GB |     25.2GB |     33.7GB |     67.3GB
  16384 |     17.8GB |     26.7GB |     35.7GB |     71.3GB
  32768 |     19.8GB |     29.7GB |     39.7GB |     79.3GB
  65536 |     23.8GB |     35.7GB |     47.7GB |     95.3GB
 131072 |     31.8GB |     47.7GB |     63.7GB |    127.3GB
 262144 |     47.8GB |     71.7GB |     95.7GB |    191.3GB
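
The numbers in the table grow roughly linearly with sequence length, so a quick way to estimate a length that is not listed is to fit a straight line through two rows. This is only a sketch: the two data points are copied from the 4-bit column above, while `fit_line` and `estimate_gb` are hypothetical helper names, and the linear model is an assumption that happens to match the rounded table values.

```python
# Two rows read off the 4-bit column of the table above: (sequence length, GB).
ROW_A = (131072, 31.8)
ROW_B = (262144, 47.8)


def fit_line(row_a, row_b):
    """Return (base_gb, gb_per_token) for the straight line through two rows."""
    (len_a, gb_a), (len_b, gb_b) = row_a, row_b
    gb_per_token = (gb_b - gb_a) / (len_b - len_a)
    base_gb = gb_a - gb_per_token * len_a  # extrapolated cost at 0 context
    return base_gb, gb_per_token


def estimate_gb(seq_len, base_gb, gb_per_token):
    """Estimate total memory in GB for a given sequence length."""
    return base_gb + gb_per_token * seq_len


base_gb, gb_per_token = fit_line(ROW_A, ROW_B)

# Compare the estimate with rows that were NOT used for the fit.
for seq_len in (8192, 32768, 65536):
    print(seq_len, round(estimate_gb(seq_len, base_gb, gb_per_token), 1))
```

The extrapolated base is the cost at 0 context (about what the weights alone need), and the slope is the extra memory per token of context.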

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago

Here's the formula for the KV cache size in bytes:

batch_size * seqlen * (d_model/n_heads) * n_layers * 2 (K and V) * 2 (bytes per Float16) * n_kv_heads

https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices
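
For anyone who wants to plug numbers in, the formula above can be sketched in a few lines of Python. The function name `kv_cache_bytes` is made up for illustration, and the Yi-34B shape values below are an assumption from memory of the published model config, so check them against the model's `config.json` before relying on the output. The `bytes_per_element=1` case is likewise only an assumed approximation of what an 8-bit cache option would do.

```python
def kv_cache_bytes(seq_len, d_model, n_heads, n_layers, n_kv_heads,
                   batch_size=1, bytes_per_element=2):
    """KV cache size in bytes, following the formula quoted above.

    bytes_per_element=2 corresponds to Float16; 1 is an assumed 8-bit cache.
    """
    head_dim = d_model // n_heads
    # The factor 2 accounts for storing both K and V.
    return (batch_size * seq_len * head_dim * n_layers
            * 2 * bytes_per_element * n_kv_heads)


# ASSUMPTION: Yi-34B shape values, not taken from this thread.
YI_34B = dict(d_model=7168, n_heads=56, n_layers=60, n_kv_heads=8)

for seq_len in (4096, 16384, 32768):
    fp16_gib = kv_cache_bytes(seq_len, **YI_34B) / 2**30
    int8_gib = kv_cache_bytes(seq_len, bytes_per_element=1, **YI_34B) / 2**30
    print(f"{seq_len:>6} tokens: {fp16_gib:.2f} GiB fp16, {int8_gib:.2f} GiB 8-bit")
```

This only covers the cache. The model weights and any runtime overhead come on top of it.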
