this post was submitted on 11 Feb 2025

Technology

cross-posted from: https://lemmy.sdf.org/post/29335261

cross-posted from: https://lemmy.sdf.org/post/29335160

Here is the original report.

The research firm SemiAnalysis has conducted an extensive analysis of what's actually behind DeepSeek in terms of training costs, refuting the narrative that R1 has become so efficient that the compute resources from NVIDIA and others are unnecessary. Before we dive into the actual hardware used by DeepSeek, let's take a look at what the industry initially perceived. It was claimed that DeepSeek spent only "$5 million" on its R1 model, which is on par with OpenAI's o1, and this triggered a retail panic that was reflected in the US stock market. Now that the dust has settled, let's take a look at the actual figures.

...

top 9 comments
[–] pancake@lemmygrad.ml 15 points 1 week ago* (last edited 1 week ago)

400 times higher

That is the cost of the entire semiconductor stock the company owns, plus 4 years of ownership costs. I highly doubt they ran the training step for 4 years, using all hardware resources available to them, and somehow also destroyed all of the GPUs in the process.

[–] RedWizard@hexbear.net 13 points 1 week ago

Sounds like cope.

[–] BrikoX@lemmy.zip 12 points 1 week ago* (last edited 1 week ago) (1 children)

This title doesn't even represent the analysis, literally a false claim.

Also, please read the report. They make a lot of claims and speculations which are possible, but they don't prove even a single one in their analysis.

[–] Hotznplotzn@lemmy.sdf.org 2 points 1 week ago (1 children)

I respectfully disagree. The analysis provides much more input than DeepSeek's press release claiming its USD 5m budget (and on some other points, e.g. its claim of being Open Source while it isn't).

[–] BrikoX@lemmy.zip 2 points 1 week ago* (last edited 1 week ago) (1 children)

It provides a bunch of claims which it fails to prove (they don't even bother to prove them to be honest).

It's like me saying "Based on my own analysis @Hotznplotzn@lemmy.sdf.org is likely a paid actor". Without any evidence it's a meaningless claim that nobody will take seriously.

~~And it is open source by the OSI definition. The only thing they don't provide is the raw training data, which the OSI definition doesn't require to qualify.~~

[–] Hotznplotzn@lemmy.sdf.org 2 points 1 week ago (1 children)

The definition says it must include data information ("the complete description of all data used for training, including (if used) of unshareable data, disclosing the provenance of the data, its scope and characteristics, how the data was obtained and selected, the labeling procedures, and data processing and filtering methodologies"), as well as code and parameters. Read your link.

The guys at Hugging Face are working on a more open model based on DeepSeek, as they also claim it is not fully Open Source.

Thank you for stating that the claim "@Hotznplotzn@lemmy.sdf.org is likely a paid actor" is baseless. It indeed is, although your hint is not too friendly.

[–] BrikoX@lemmy.zip 1 points 1 week ago* (last edited 1 week ago)

You are right, it indeed doesn't qualify under the OSI definition. I wasn't aware they didn't share the code for training the model. My bad for assuming they did, based on the public GitHub repo.

Even then, it's still the most open commercial model out there that rivals anything US Big Tech managed to come up with using their unlimited budget. There is no diminishing that. Lack of training code only affects other companies with enough resources to build it. It's a huge win for consumers and a huge embarrassment for the US companies.

P.S. There isn't such a thing as "not fully open source". It either is or it's not.

[–] m532@lemmygrad.ml 9 points 1 week ago

I never heard of this SemiAnalysis, I bet they are paid by ClosedAI (and NoVideo obviously) to make shit up

ClosedAI = sore LOSERS

[–] Gradually_Adjusting@lemmy.world -5 points 1 week ago* (last edited 1 week ago)

I fucking knew it.

Receipts: https://lemmy.world/comment/14778286

I did say it would recover in "a week". It's been two and it hasn't fully, but you wouldn't be complaining if you'd bought calls then, either.