this post was submitted on 28 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.


https://huggingface.co/deepnight-research

I'm not affiliated with this group at all; I was just randomly looking for any new big merges and found these.

100B model: https://huggingface.co/deepnight-research/saily_100B

220B model: https://huggingface.co/deepnight-research/Saily_220B

600B model: https://huggingface.co/deepnight-research/ai1

They make some big claims about the capabilities of their models, but the two best ones are unavailable for download. Maybe we can help convince them to release them publicly?

top 44 comments
[–] planetofthemapes15@alien.top 1 points 9 months ago

This is fun. I should publish a 1T model called "AGI-QSTAR-1T" and say it's as good as GPT-5, but no, you may not see it.

"Oh and BTW if you want to hire me, I'm willing to accept $1M/yr jobs."

[–] yiyecek@alien.top 1 points 9 months ago

Huggingface should add a dislike button

[–] opi098514@alien.top 1 points 9 months ago (2 children)

It's the best out there… but no, you can't try it, because it's too dangerous.

[–] SomeOddCodeGuy@alien.top 1 points 9 months ago (1 children)

Right. This part right here is very suspicious to me, and I'm taking their claims with a grain of salt.

No! The model is not going to be available publically. APOLOGIES. The model like this can be misused very easily. The model is only going to be provided to already selected organisations.

[–] bot-333@alien.top 1 points 9 months ago (1 children)

I think they changed it to say it's still an experiment and they're finishing evaluations to better understand the model.

[–] Illustrious_Sand6784@alien.top 1 points 9 months ago (1 children)

No they haven't; the 220B model has always had that message above, while the 600B model has a message similar to the one you stated.

[–] bot-333@alien.top 1 points 9 months ago

I guess they might open-source the 600B one? The models have different names, so maybe different training approaches.

[–] VertexMachine@alien.top 1 points 9 months ago (1 children)

I doubt there is any model, really... Follow the trail and you'll end up at a company founded by a single person from India (who is also the founder of another company with a single app for collaborative drawing)... a company that, at the least, doesn't have any employees on LinkedIn...

And the founder looks like a relatively young person who most likely wouldn't even be able to gather the funding for enough GPU compute to make a model better than GPT-4 (or have the know-how). I think it's just a front for him to get some hype or funding.

[–] opi098514@alien.top 1 points 9 months ago (2 children)

Uuummmm no. It's for sure real. And the best one out there. No questions asked. It's better than GPT-4, and OpenAI has been trying to hack this new company to get the 600B model because they're scared it will end OpenAI for good.

Obligatory /s

[–] aurumvexillum@alien.top 1 points 9 months ago (1 children)

You forgot to mention that your uncle is the CEO of OpenAI! 😉

[–] opi098514@alien.top 1 points 9 months ago

Well that’s because he’s not. Sam is actually my dad.

[–] LetsGoBrandon4256@alien.top 1 points 9 months ago (2 children)

https://in.linkedin.com/company/deepnight

View 1 employee

Work experience: Google Startup Alumni

lmao

[–] ananthasharma@alien.top 1 points 9 months ago

A cursory look at the website makes me think these guys don’t know what they are doing

[–] opi098514@alien.top 1 points 9 months ago

Everything on that page is hype for something that doesn’t exist.

[–] a_beautiful_rhind@alien.top 1 points 9 months ago

Somebody pilfer this thing and quant it. We can run the 100B for sure. At least at Q3.

[–] You_Wen_AzzHu@alien.top 1 points 9 months ago (3 children)

We need some 4090s modified in China with 500GB of VRAM, if possible.

[–] mpasila@alien.top 1 points 9 months ago (2 children)

The devs mentioned that the 600B model takes about 1.3TB of space alone...

[–] MannowLawn@alien.top 1 points 9 months ago (1 children)

Give it 5 years with the Mac Studio. Next year 256GB; it'll go up real quick.

[–] BangkokPadang@alien.top 1 points 9 months ago

Honestly, a 4bit quantized version of the 220B model should run on a 192GB M2 Studio, assuming these models could even work with a current transformer/loader.
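
For a rough sense of whether these sizes hold up, here is a back-of-the-envelope sketch in Python. The parameter counts and bit-widths are just the ones floated in this thread, and these are weights-only figures; real loaders add KV cache, activations, and per-tensor overhead on top:

```python
# Back-of-the-envelope weight-memory estimates for the model sizes
# discussed in this thread. Weights only: real loaders need extra
# memory for KV cache, activations, and per-tensor overhead.

def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the weights of a dense model."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params, bits in [(100, 3), (220, 4), (600, 16)]:
    print(f"{params}B @ {bits}-bit: ~{weight_memory_gb(params, bits):.0f} GB")

# 100B @ 3-bit:  ~38 GB   -> plausible on a single 48GB card
# 220B @ 4-bit:  ~110 GB  -> fits (weights only) in a 192GB M2 Studio
# 600B @ 16-bit: ~1200 GB -> consistent with the ~1.3TB figure above
```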

[–] 9wR8xO@alien.top 1 points 9 months ago

Make it 0.01bpw quantized and it will fit in a good ol' 3090.

[–] LocoMod@alien.top 1 points 9 months ago

We need some hero to develop an app that downloads more GPU memory, like those apps back in the '90s. /s

[–] iCantHack@alien.top 1 points 9 months ago (1 children)

I wonder if there's enough real demand for even 48GB 4090s to incentivize somebody to do it. I bet the hardware/electronics part of it is trivial, though.

[–] BangkokPadang@alien.top 1 points 9 months ago

If people started doing this with any regularity, nVidia would intentionally bork the drivers.

[–] OVAWARE@alien.top 1 points 9 months ago (1 children)

It's private, so there is absolutely zero way to confirm its quality.

[–] ninjasaid13@alien.top 1 points 9 months ago

Not just private but closed access.

[–] FaustBargain@alien.top 1 points 9 months ago (1 children)

How much RAM do you think the 600B would take? I have 512GB, and I can fit another 512GB in my box before I run out of slots. I think with 1TB I should be able to run it unquantized, because Falcon 180B used slightly less than half my RAM.

[–] theyreplayingyou@alien.top 1 points 9 months ago (1 children)

Can you please share a bit more about your setup and experiences?

I've been looking to use some of my idle enterprise gear for LLMs, but everyone tells me not to bother. I've got a few dual-Xeon boxes with quad-channel DDR4 in 256 & 384GB capacities, NVMe or RAID10 SSDs, 10GbE, etc., and I guess (having not yet experienced it) I have a hard time imagining the equivalent of 120GHz, 0.5–1TB of RAM, and 7GB/s disk reads "not being fast enough." I don't need instant responses from a sex chatbot; rather, I would like to run a model that can help my wife (in the medical field) with work queries, help my school-age kid with math and grammar questions, etc.

Thank you much!

[–] FaustBargain@alien.top 1 points 9 months ago

If you have the RAM, don't worry about disk at all; if you have to drop to any kind of disk, even a Gen 5 SSD, your speeds will tank. Memory bandwidth matters much more than compute for LLMs, but it all depends on your needs. There are probably cheaper ways to go about this if you just need something occasionally, maybe RunPod or something, but if you need a lot of inference, running locally could save you money. Renting a big machine with A100s will always be faster, though. So, will a 7B model do what you need, or do you need the accuracy and comprehension of a 70B or one of the new 120B merges? Also, Llama 3 is supposed to be out in Jan/Feb, and if it's significantly better, everything changes again.
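
To put rough numbers on that bandwidth point: for single-stream generation, every new token has to stream the full set of weights through memory once, so memory bandwidth sets a hard ceiling on tokens per second. A minimal sketch, with ballpark bandwidth figures assumed rather than measured:

```python
# Rough upper bound on single-stream inference speed: generating one
# token reads every weight once, so tokens/sec <= bandwidth / model size.
# The bandwidth numbers below are ballpark assumptions, not measurements.

def max_tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 40  # e.g. a 70B model quantized to ~4.5 bits/weight

for name, bw in [
    ("quad-channel DDR4 Xeon (~85 GB/s per socket)", 85),
    ("M2 Ultra (~800 GB/s)", 800),
    ("RTX 3090 (~936 GB/s)", 936),
]:
    print(f"{name}: <= {max_tokens_per_sec(MODEL_GB, bw):.1f} tok/s")
```

That ceiling is why a big-RAM Xeon box can load a huge model and still crawl at a couple of tokens per second, while the same quantized file flies on a GPU or an M-series Mac.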

[–] wind_dude@alien.top 1 points 9 months ago

So it sounds like for the 600B they just fine-tuned Llama 2 again with the same stuff Llama 2 was trained on, just more of it...

RefinedWeb

Open-source code from GitHub

Common Crawl

"We fine-tuned the model on a huge dataset (generated manually and with automation) for logical understanding and reasoning. We also trained the model for function calling capabilities."

[–] FaustBargain@alien.top 1 points 9 months ago

Wait, the 100B one says it's based on Llama-2-chat? Did they take the Llama 2 foundation model, up the parameter count, and just continue training?

[–] FaustBargain@alien.top 1 points 9 months ago

"organisations"...

[–] BalorNG@alien.top 1 points 9 months ago (1 children)

"Prompt Template: Alpeca" Wut?

Looks like a scam, to be fair. I bet if you apply, you'll get "Just send us $100 for access!"

[–] LetsGoBrandon4256@alien.top 1 points 9 months ago

Those Microsoft tech support scam calls are reaching a new level.

[–] noeda@alien.top 1 points 9 months ago (2 children)

Some quotes I found on the pages:


"No! The model is not going to be available publically. APOLOGIES. The model like this can be misused very easily. The model is only going to be provided to already selected organisations."

"[SOMETHING SPECIAL]: AIN'T DISCLOSING!🧟"

"Hallucinations: Reduced Hallucinations 8x compared to ChatGPT 🥳"


My guess: it's just another merge like Goliath. At best it's marginally better than a good 70B.

I can also "successfully build 220B model" easily with mergekit. Would it be good? Probably not.

The lab should write on their model card why I should not think it's just bullshit. Not exactly the first mystery lab making big claims.
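
For context on how cheaply a headline parameter count can be manufactured, here is a minimal sketch of a Goliath-style passthrough "frankenmerge" using mergekit; the donor model names are placeholders, not anything this lab has disclosed:

```python
# A minimal sketch of a Goliath-style "frankenmerge" with mergekit's
# passthrough method: overlapping layer slices from two donor models are
# stacked into one bigger checkpoint, with no training involved at all.
# The donor model names below are placeholders.
import pathlib
import subprocess

config = """\
slices:
  - sources:
      - model: some-org/llama2-70b-finetune-a   # placeholder donor
        layer_range: [0, 50]
  - sources:
      - model: some-org/llama2-70b-finetune-b   # placeholder donor
        layer_range: [30, 80]
merge_method: passthrough
dtype: float16
"""
pathlib.Path("merge.yml").write_text(config)

# mergekit's CLI reads the YAML and writes the merged model to ./merged-model
subprocess.run(["mergekit-yaml", "merge.yml", "./merged-model"], check=True)
```

Stacking slices this way turns two 70B donors into a much larger checkpoint for the cost of some disk I/O, which is exactly why a big parameter count on its own proves nothing.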

[–] VertexMachine@alien.top 1 points 9 months ago

I doubt there's any model there.

[–] PookaMacPhellimen@alien.top 1 points 9 months ago

Wonder if GPT-4 is just a series of merges

[–] swagonflyyyy@alien.top 1 points 9 months ago

Inb4 TheBloke quantizes it to about 100B size.

[–] BayesMind@alien.top 1 points 9 months ago

We need a different flair for New Models vs New Merge/Finetune

[–] UnignorableAnomaly@alien.top 1 points 9 months ago

Deepnight were the guys that uploaded Upstage's Instruct v2, claimed it was their own, then deleted it with an oopsie whoopsie.
I am skeptical.

[–] Exotic-Estimate8355@alien.top 1 points 9 months ago

Np I’ll quantize to 0.001 bpw

[–] Ok_Library5522@alien.top 1 points 9 months ago

I don't understand: are all these models based on Llama? How much better is the 100B than Goliath 120B? There are a lot of questions. As far as we know, Goliath was made by an AI enthusiast. Did this team make all three models?

[–] sahil1572@alien.top 1 points 9 months ago

It's a scam!

[–] Few_Acanthisitta_858@alien.top 1 points 9 months ago

They have lifted the gate from the 100B model... Seems like it's pending evaluation on the Open LLM Leaderboard. They're also saying they'll lift the gate from the 220B model before Christmas.

Let's see if all this is just a publicity stunt or if they really did it.