FullOf_Bad_Ideas

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago (1 children)
  1. Grant of Copyright License. Subject to the terms and conditions of this License, DeepSeek hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.

I really really enjoy seeing perpetual irrevocable licenses.

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago

Remember to try the 8-bit cache if you haven't yet, it should get you to 5.5k tokens of context length.

You can get around 10-20k context length with 4bpw yi-34b 200k quants on a single 24GB card.
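For a rough sanity check, here's a back-of-the-envelope sketch of that budget. The Yi-34B config values (60 layers, 8 KV heads, head dim 128, ~34.4B params) are from the published config; the flat 4.0 bits/weight and the 2 GB of runtime overhead are just my guesses.

```python
# Back-of-the-envelope VRAM budget for a 4bpw Yi-34B 200K quant on a 24 GB card.
# Yi-34B config (from the published config): 60 layers, 8 KV heads (GQA), head_dim 128,
# ~34.4B params. The overhead and bits/weight figures are rough guesses.
card_bytes = 24e9
n_params   = 34.4e9
bits_per_w = 4.0
overhead   = 2e9                     # CUDA context, activations, fragmentation (guess)
n_layers, n_kv_heads, head_dim = 60, 8, 128

weights_bytes = n_params * bits_per_w / 8
free_bytes    = card_bytes - weights_bytes - overhead

kv_per_token_fp16 = 2 * n_layers * n_kv_heads * head_dim * 2   # K and V, 2 bytes each
print(f"fp16 KV cache : ~{free_bytes / kv_per_token_fp16:,.0f} tokens of context")
print(f"8-bit KV cache: ~{free_bytes / (kv_per_token_fp16 / 2):,.0f} tokens of context")
```

That lands in the same ballpark as the 10-20k figure; real numbers will be a bit lower once the quant's actual overhead is counted.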

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago (3 children)

I've only seen merging of same-upstream-pretrained-model-at-same-size.

Not anymore.

Here's a merge of llama 2 13B and llama 1 33B https://huggingface.co/chargoddard/llama2-22b

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago

Have you checked out DeepSeek Coder Instruct 33B already? I don't know about its knowledge of PyTorch, but it's pretty much the best local coding model you can run, so it's your best shot.

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago (1 children)

Instead, it uses an Amazon platform known as Bedrock, which connects several A.I. systems together, including Amazon’s own Titan as well as ones developed by Anthropic and META.

It's a llama! :D I wonder how they can comply with the Llama license, I think they have more than 700M customers.

Good to see more competitors at least. Enterprise office people are totally in MS hands, so that's not an area where open-source end-to-end solutions have much chance of competing; the only way to get them there is if a big corp like Amazon adopts them in its infrastructure for a product like this.

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago

Are you talking about base Yi-34B or a fine-tuned one? The base model will be hard to use but will score pretty high. Benchmarks are generally written with completion in mind, so they work really well on base models; instruct tuning may make a model much easier to work with, but not necessarily score higher on benchmarks.
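As a rough illustration of "written with completion in mind", here's a minimal sketch of scoring a multiple-choice question against a base model by comparing log-likelihoods of the answer letters, with no chat template involved. The model name and the question are placeholders, and this is a simplification of what real evaluation harnesses do.

```python
# Completion-style benchmark scoring: plain-text prompt, pick whichever answer letter
# the base model assigns the highest log-likelihood. Model and question are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; swap for a base model like Yi-34B if you have the VRAM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = (
    "Question: Which planet is closest to the Sun?\n"
    "A. Venus\nB. Mercury\nC. Mars\nD. Jupiter\n"
    "Answer:"
)

@torch.no_grad()
def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of log-probs the model assigns to `continuation` given `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + continuation, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)      # predictions for tokens 1..L-1
    targets = full_ids[:, 1:]
    per_token = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    n_cont = full_ids.shape[1] - prompt_ids.shape[1]           # continuation token count
    return per_token[0, -n_cont:].sum().item()

scores = {c: continuation_logprob(prompt, f" {c}") for c in "ABCD"}
print(max(scores, key=scores.get), scores)
```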

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago (1 children)

I can't corroborate those results for Pascal cards. They have very limited FP16 performance, usually 1:64 of FP32. Switching from a GTX 1080 to an RTX 3090 Ti got me around 10-20x gains in QLoRA training, keeping the exact same batch size and context length and changing only the calculations from fp16 to bf16.
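For reference, a minimal sketch (not a tuned recipe) of how that dtype choice might look with Hugging Face TrainingArguments, falling back to fp16 where bf16 isn't supported; the batch size and accumulation values are placeholders.

```python
# Pick the mixed-precision dtype for (Q)LoRA fine-tuning based on the GPU:
# Ampere (RTX 30xx) and newer support bf16, Pascal is stuck with slow fp16/fp32.
import torch
from transformers import TrainingArguments

use_cuda = torch.cuda.is_available()
use_bf16 = use_cuda and torch.cuda.is_bf16_supported()   # RTX 3090 Ti path

args = TrainingArguments(
    output_dir="qlora-out",
    per_device_train_batch_size=1,    # kept identical across cards for a fair comparison
    gradient_accumulation_steps=16,
    bf16=use_bf16,
    fp16=use_cuda and not use_bf16,   # GTX 1080 path: works, but Pascal fp16 is ~1:64 of fp32
    logging_steps=10,
)
print("mixed-precision dtype:", "bf16" if use_bf16 else "fp16" if use_cuda else "none (cpu)")
```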

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago

Yeah, it will be the sum of the tokens that the next token is generated on. I don't know how often the KV cache is updated.
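A toy, shapes-only sketch of what that looks like during decoding (single head, no real model, just the cache bookkeeping):

```python
# Toy decode loop: the KV cache holds one key/value entry per token seen so far
# (per layer and per KV head in a real model; a single head here), and each new
# token is generated by attending over that whole cache.
import torch

head_dim, n_prompt, n_new = 64, 5, 3
k_cache = torch.zeros(n_prompt, head_dim)   # filled once while processing the prompt
v_cache = torch.zeros(n_prompt, head_dim)

for step in range(n_new):
    print(f"step {step}: next token is generated over {k_cache.shape[0]} cached tokens")
    k_new = torch.zeros(1, head_dim)        # K/V of the freshly generated token
    v_new = torch.zeros(1, head_dim)
    k_cache = torch.cat([k_cache, k_new])   # the cache grows by exactly one entry
    v_cache = torch.cat([v_cache, v_new])
```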

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago

Yeah, if that's the case, I can see GPT-4 requiring about 220-250B of loaded parameters to do token decoding.

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago

Here's the formula:

batch_size * seqlen * n_layers * n_kv_heads * (d_model / n_heads) * 2 (K and V) * 2 (bytes per float16)

https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices
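As a worked example, here's the same formula as a small function, plugged with Llama-2-7B's published config (4096 hidden dim, 32 heads, no GQA, 32 layers):

```python
# KV cache size from the formula above; config values are Llama-2-7B's published ones.
def kv_cache_bytes(batch_size, seqlen, d_model, n_heads, n_kv_heads, n_layers, bytes_per_el=2):
    # 2 for K and V, bytes_per_el = 2 for float16
    return batch_size * seqlen * (d_model // n_heads) * n_kv_heads * n_layers * 2 * bytes_per_el

size = kv_cache_bytes(batch_size=1, seqlen=4096, d_model=4096,
                      n_heads=32, n_kv_heads=32, n_layers=32)
print(size / 2**30, "GiB")   # exactly 2.0 GiB for a full 4k context at fp16
```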

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago (2 children)

The formula to calculate the KV cache, i.e. the space used by the context:

batch_size * seqlen * n_layers * n_kv_heads * (d_model / n_heads) * 2 (K and V) * 2 (bytes per float16)

https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices

This blog post is really good, I recommend reading it.

Usually bigger models have more layers, heads and dimensions, but I am not sure whether heads or dimensions grow faster. It's something you can look up though.
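To make that concrete, here's the per-token cache from the same formula for two published configs; note that Llama-2-70B's grouped-query attention (8 KV heads) actually shrinks its cache per token despite the larger dimensions.

```python
# Per-token KV cache for two published configs (fp16, K and V).
configs = {
    #              d_model  n_heads  n_kv_heads  n_layers
    "Llama-2-7B":  (4096,   32,      32,         32),
    "Llama-2-70B": (8192,   64,       8,         80),
}
for name, (d_model, n_heads, n_kv_heads, n_layers) in configs.items():
    per_token = (d_model // n_heads) * n_kv_heads * n_layers * 2 * 2   # K+V, 2 bytes each
    print(f"{name}: {per_token / 1024:.0f} KiB per token of context")
```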

[–] FullOf_Bad_Ideas@alien.top 1 points 11 months ago (3 children)

Jondurbin made something like this with qlora.

The explanation that GPT-4 is a MoE model doesn't make sense to me. The GPT-4 API is 30x more expensive than gpt-3.5-turbo. GPT-3.5-turbo is 175B parameters, right? So if they had 8 experts of 220B each, with only a fraction of them active per token, it wouldn't need to cost 30x more for API use, more like 20-50% more. There was also some speculation that 3.5-turbo is 22B. In that case it also doesn't make sense to me that GPT-4 would be 30x as expensive.
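Spelling out the arithmetic behind that objection (the 175B and 220B figures are the speculated sizes from above; top-1 vs top-2 routing is my assumption):

```python
# Under MoE, per-token compute scales with the *active* parameters, not the total.
# All sizes below are speculation repeated from the comment, not known values.
dense_3_5_turbo = 175e9        # speculated gpt-3.5-turbo size
expert_size     = 220e9        # speculated size of each GPT-4 expert
price_ratio     = 30           # observed GPT-4 vs gpt-3.5-turbo API price gap

for k in (1, 2):               # experts active per token (routing assumption)
    active = k * expert_size
    print(f"top-{k}: ~{active / dense_3_5_turbo:.1f}x the active params of a 175B dense model "
          f"(vs the {price_ratio}x price gap)")
```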
