this post was submitted on 05 May 2026

43 points (97.8% liked)

Is AI inference getting cheaper or more expensive over time? (lemmy.world)

submitted 3 days ago by GamingChairModel@lemmy.world to c/asklemmy@lemmy.world

I've read some of Ed Zitron's long posts on why the AI industry is a bubble that will never be profitable (and will bring down a lot of companies and investors), and one of the recurring themes is that the AI companies are trying to capture growing market share in an industry where their marginal profits are still negative, and that any increase in revenue necessarily increases their costs of providing their services.

But some of the comments in various HackerNews threads are dismissive, saying that each new generation of models makes the cost of inference lower, so that with sufficient customer volume, the companies running the models can make enough profit on inference to make up for the staggering up-front capital expenditures it took to build out the data centers, train their models, etc.

It's all pretty confusing to me. So for those of you who are familiar with the industry, I have several questions:

1. Is the cost of running any given pretrained model going down, for specific models? Are there hardware and software improvements that make it cheaper to run those models, despite the model itself not changing?
2. Is the cost of performing a particular task at a particular quality level going down, through releases of newer models of similar performance (i.e., a smaller model of the current generation performing similarly to a bigger model of the previous generation, such that the cost is now cheaper)?
3. Is the cost of running the largest flagship frontier models going down for any given task? Or does running the cutting edge show-off tasks keep increasing in cost, but where the companies argue that the improvement in performance is worth the cost increase?

I suspect the discussion around this is so muddled online because the answers differ depending on which of the 3 questions is meant by "is running an AI model getting cheaper over time?" And the data isn't easy to synthesize because each model has different token prices and a different number of tokens per query.
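To show what I mean about token prices versus tokens per query, here is a minimal sketch of the arithmetic. Every number and name in it is made up for illustration (these are not real prices or real models): the cost of a query is input tokens times the input price plus output tokens times the output price, so a model with a lower per-token price can still cost more per task if it burns more tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical prices in dollars per million tokens (illustrative only)."""
    name: str
    input_per_mtok: float
    output_per_mtok: float


def cost_per_query(pricing: ModelPricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one query: tokens times price, summed over input and output."""
    total = input_tokens * pricing.input_per_mtok + output_tokens * pricing.output_per_mtok
    return total / 1_000_000


# Assumed numbers: the newer model is half the price per token...
old_model = ModelPricing("old-model", input_per_mtok=10.0, output_per_mtok=30.0)
new_model = ModelPricing("new-model", input_per_mtok=5.0, output_per_mtok=15.0)

# ...but (by assumption) emits ten times as many output tokens for the same task.
old_cost = cost_per_query(old_model, input_tokens=1_000, output_tokens=500)
new_cost = cost_per_query(new_model, input_tokens=1_000, output_tokens=5_000)

print(f"{old_model.name}: ${old_cost:.4f} per query")
print(f"{new_model.name}: ${new_cost:.4f} per query")
```

With these placeholder inputs, the per-token price falls while the per-task cost rises, which is why "is it getting cheaper?" can have different answers depending on which question is being asked.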

But I wanted to hear from people who are knowledgeable about these topics.

[–] General_Effort@lemmy.world -1 points 3 days ago (1 children)

FYI: Ed Zitron is a PR expert. He has no background in engineering or finance.

He has the skills to make people listen to him and give him money. He does not have the skills to determine if any of his assertions are true or not. If you're wondering if I'm calling him a liar, then I can only say that I can't read minds. If you're not wondering, then you weren't paying attention.

[–] GamingChairModel@lemmy.world 3 points 3 days ago (1 children)

> FYI: Ed Zitron is a PR expert. He has no background in engineering or finance.

I'm not super interested in people's credentials (good or bad). I need for the actual substance of the words on the page to be well supported and well reasoned.

Zitron does the work of actually gathering the public statements (across SEC filings, public disclosures, public or leaked documents) and crafting a narrative around those statements. He links to original source documents a lot. Other people should be doing the same, but for whatever reason not a lot of other people are.

He needs an editor. His articles could be better organized, more tightly argued, and more focused in scope.

I have some skepticism about many of the extrapolations that he makes from the facts, but on my read, his factual claims are mostly well supported. When he calls other people liars by showing those contradictions out in the open, I think those arguments stand for themselves regardless of what his background, credentials, or even motivations are. So I draw a line between his factual claims about the past and present and his predictions about the future.

And that's the reason for this thread. He makes factual claims about the exponential rise in costs for these companies, and infers/extrapolates into the future with it, but I want to check whether those extrapolations actually fit the data we already can see. That's what I'm trying to learn by asking here.

Of course, if you have specific examples of him making false statements about the past or present (no need to attribute intentionality to the speaker), I'd love to see those, too.

[–] General_Effort@lemmy.world 0 points 3 days ago (1 children)

Do you feel that he has given you the right idea about where AI was heading in the last few years?

[–] GamingChairModel@lemmy.world 3 points 3 days ago (1 children)

On some issues, absolutely.

He flagged that flat-rate subscriptions don't make sense given the underlying token pricing and how much users actually consume, and predicted that a lot of the AI startups that act as some kind of subscription middleman would feel the squeeze and eventually impose rate limits/quotas, degrade the quality of their offerings (i.e., push users towards cheaper models), or fail. I think that's a pretty good summary of what has been happening at the user/pricing level with Perplexity, Lovable, and Cursor. Microsoft's Copilot plans are also seeing a lot of changes to pricing, rate limits, and model choice, and user complaints about those changes have gotten louder in the past month or two.
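As a rough sketch of why the flat-rate math gets squeezed, assume a hypothetical plan price and a hypothetical average cost per query (both numbers are placeholders I made up, not anyone's actual pricing). The middleman's margin per subscriber is the plan price minus usage times cost per query, so heavy users go negative quickly:

```python
def monthly_margin(plan_price: float, queries_per_month: int, cost_per_query: float) -> float:
    """Margin on one subscriber: flat revenue minus usage-based cost."""
    return plan_price - queries_per_month * cost_per_query


def break_even_queries(plan_price: float, cost_per_query: float) -> float:
    """Monthly query count at which the subscriber stops being profitable."""
    return plan_price / cost_per_query


# Assumed, illustrative inputs only.
PLAN_PRICE = 20.0       # dollars per month, hypothetical
COST_PER_QUERY = 0.05   # dollars paid upstream per query, hypothetical

light_user = monthly_margin(PLAN_PRICE, 100, COST_PER_QUERY)
heavy_user = monthly_margin(PLAN_PRICE, 2_000, COST_PER_QUERY)
threshold = break_even_queries(PLAN_PRICE, COST_PER_QUERY)

print(f"light user margin: ${light_user:.2f}")
print(f"heavy user margin: ${heavy_user:.2f}")
print(f"break-even at about {threshold:.0f} queries per month")
```

Under these assumptions, the only levers are the ones mentioned above: cap the queries (rate limits), lower the cost per query (cheaper models), or raise the plan price.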

He was a skeptic on Stargate right out of the gate, and I think that external visibility into how that loose collection of projects under that banner has been going over the past year shows that something inside is fundamentally wrong. That isn't necessarily an indictment of the broader AI ecosystem as a whole, but Zitron's most pointed financial criticism has been directed at OpenAI and Oracle, and the costs of data center construction. Those criticisms have looked especially prescient this calendar year (and generally fit into my preconceived notions that building physical stuff is slow and expensive and that we Americans aren't very good at keeping megaprojects on schedule and under budget).

I'm a money guy. I don't have any special expertise in industry trends and how money will be spent in the future on industries where I'm not an insider (i.e., AI), but I find Zitron's accounting of how money is being spent in the present to be largely accurate. So that's why I'm in this thread asking people about how they see the present and the future of spending/pricing/volume, to see if those projections of revenue needed are actually feasible.

[–] General_Effort@lemmy.world 0 points 2 days ago

That's not really impressive, is it? Not something for which you'd have to look up disclosures. Prices are adjusted in the face of supply and demand. Trump announcements are bull...