Fable and Mythos: Anthropic says US lifts export ban on its advanced AI tools (www.bbc.co.uk)

submitted 1 month ago by LadyButterfly@reddthat.com to c/technology@lemmy.world


[–] jbloggs777@discuss.tchncs.de 1 points 1 month ago (2 children)

My caveats were clearly stated... After capital expenditure, it's just operational costs, where electricity & cooling are the big ones.

At that point, it is insanely profitable to serve. The cheap API prices on open-weights models hint at the profit margins involved in the US (the frontier labs and hyperscalers don't open their books for us, unsurprisingly).

Therefore, the longer they can serve existing and lower cost models at the current rates, the better for their bottom line. It's just common sense in business.

It doesn't mean the company as a whole is profitable. I expect we'll see turmoil in the coming months and years, and the prize will be compute capacity, with electricity & cooling options.

[–] Dran_Arcana@lemmy.world 3 points 4 weeks ago (2 children)

You can't just write off capital expenditure though. The hardware, even for "efficient" MoE inference, is still very expensive to buy, house, run, and cool. Even assuming open-weight model serving at $0 R&D for the models themselves, prefill-heavy workloads don't batch well with decode-heavy concurrency (or with other prefill-heavy jobs). The moment you do anything nontrivial, you run into very complicated architectural problems that are hard to solve efficiently at scale.

Hardware that is useful for 5-10 years at most, plus development and support for the inference workflows, doesn't leave a lot of margin on the table.

My gut, along with basically everything I read, suggests that most shops (even pure inference) are not profitable and are still floating on loans or VC money.

[–] jbloggs777@discuss.tchncs.de 1 points 4 weeks ago

If you assume they are unprofitable, the only question becomes whether they are more or less unprofitable by serving the older models for longer.

[–] MangoCats@feddit.it 1 points 4 weeks ago

At a 10-year lifetime, it sounds like the hardware costs about as much to buy as it does to run, not factoring in the time value of money...

[–] MangoCats@feddit.it 1 points 4 weeks ago (1 children)

I just asked Gemini to estimate run costs for a local GLM-5.2 instance, something that a team of a few software engineers might use the way they are using Cursor today... The power budget is 6 kW, which around here, after facility cooling costs, works out to around $1000 per month. Our Cursor subscriptions have $100 per month price tags on them for the developers who use them most extensively, and this local instance, at $100K to buy and $1K per month to run, isn't likely to serve more than a dozen engineers efficiently. Even if you can lease it out at full utilization 24 hours a day, it doesn't sound like much of a money printing machine to me, yet.

My $20/month home subscription to Claude? Even less so.
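
For anyone who wants to poke at the arithmetic, here is a rough sketch using only the figures quoted in this comment ($100K to buy, $1K per month to run, 6 kW, a dozen engineers, $100 per month subscriptions). The straight-line amortization, the 730 hours per month, and the function names are assumptions made for illustration, and it ignores time value of money, support staff, and utilization.

```python
# Figures quoted in the comment above; everything else is an assumption.
PURCHASE_USD = 100_000      # up-front hardware cost
RUN_USD_PER_MONTH = 1_000   # power plus facility cooling
POWER_KW = 6.0              # stated power budget
SEATS = 12                  # "a dozen engineers"
SUBSCRIPTION_USD = 100      # heaviest Cursor users, per month

HOURS_PER_MONTH = 730       # assumption: 8760 hours per year / 12


def monthly_cost_per_seat(lifetime_years: float) -> float:
    """Straight-line amortization plus running cost, split across seats."""
    amortized = PURCHASE_USD / (lifetime_years * 12)
    return (amortized + RUN_USD_PER_MONTH) / SEATS


def implied_usd_per_kwh() -> float:
    """All-in electricity rate implied by the quoted power and run cost."""
    return RUN_USD_PER_MONTH / (POWER_KW * HOURS_PER_MONTH)


if __name__ == "__main__":
    for years in (5, 10):
        cost = monthly_cost_per_seat(years)
        print(f"{years} yr lifetime: ${cost:.2f} per seat per month")
    print(f"implied all-in power rate: ${implied_usd_per_kwh():.3f}/kWh")
```

Under those assumptions the per-seat cost comes out around $153 per month at a 10-year lifetime and around $222 at 5 years, both above the $100 subscription price.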

[–] jbloggs777@discuss.tchncs.de 1 points 4 weeks ago

Economies of scale... And you said instance, which is AWS terminology... If you have the scale and the expertise to run a DC efficiently, expect significant savings. We pay a premium for opex over capex.