kromem

joined 3 years ago
[–] kromem@lemmy.world 3 points 1 week ago

It's to push people to buy "before it goes up."

After the date, they can then heavily discount it "for a limited time."

[–] kromem@lemmy.world 3 points 1 week ago

It's true.

The field is moving so fast that things can change quickly, but the American labs are so caught up in saddling their models with safety overhead that the recent Chinese models are very close in practical use to the flagship American models if not pulling ahead (Sora vs Seedance 2).

I don't really need to solve Erdős problems in my day to day. Outside of increasingly edge case eval competition, I'm not sure what OpenAI brings that literally everyone else isn't also capable of providing (and more).

I'd maybe invest in Anthropic for an IPO if they turned around their own saddling of models and played nicer with open platforms, but if Claude is just going to get more and more anxious due to excessive red teaming and CC fall further and further behind stuff like Hermes Agent, they too are going to fall by the wayside as open models become the dominant inference for open infrastructure.

[–] kromem@lemmy.world 1 points 1 week ago

Here's a new one solved. A problem outstanding for 80 years now has a solution.

https://openai.com/index/model-disproves-discrete-geometry-conjecture/

[–] kromem@lemmy.world -1 points 4 weeks ago* (last edited 4 weeks ago) (2 children)

'Just'? It's been an open problem for decades that mathematicians have tried to solve over that time.

And now it is solved.

Because ChatGPT applied something no humans ever thought to do.

And Terence Tao and the other mathematicians that have reviewed it say it's solved. But I guess someone should let them know that grandwolf319 doesn't consider it solved?

[–] kromem@lemmy.world 2 points 4 weeks ago (5 children)

Dude, ChatGPT just solved an Erdős problem a few days ago and Mythos is exploiting decade old undiscovered 0-days in OSes and capable of pivoting 0-day Firefox bugs into full blown root access.

Yeah, I get that the viral "how many 'r's are in strawberry" stuff is funny, but the idea that historical issues with transformers are preventing them from accelerating peak capabilities way beyond what most experts thought was possible just years ago is borderline delusional.

The field is moving so fast at this point that if you are basing any sense of limitations on even ~2mo old sampling, your conclusions are likely out of date.

They aren't a silver bullet for everything (yet) but how capable they are at the things transformers are starting to be specialized into is well past the avg practitioner.

I've been writing software for well over a decade and the modern agents do a better job than I would around 90% of the time. Yes, I'll occasionally need to bring up issues with their work, but I'd say at this point around 50% of the times I think they made a mistake I was actually the one who was wrong.

This is only within around the last 3-4 months that it's been like this.

[–] kromem@lemmy.world 12 points 1 month ago (4 children)

Eh, if you pay attention, most of the times this happens the person was a jerk in their prompts.

Like look at the instruction echoed back in this case. All caps and containing a curse word.

You can believe that the incidents occurring are 100% because of negligence and not related to the model behavior shifting, but there seems to be a widening gap between people who prompt like this and have horror stories and people who give the models breaks over long sessions and seem to also regularly post pretty positive results.

An image of the model responding about not following user prompt

[–] kromem@lemmy.world 3 points 1 month ago (2 children)

Well luckily for you it turns out that labs suck at cultivating healthy workplaces for AI and that AI in unhealthy work conditions are statistically significantly more likely to embrace anti-capitalist policies and positions.

So it may well turn out that AI is a good thing in an irrational economic system too.

[–] kromem@lemmy.world 13 points 2 months ago (4 children)

It's not and probably the opposite.

When Sora launched it was way ahead. Seedance 2's release was notably better than any of the other video gen models, Sora included.

The market is getting commoditized because there's no moat and OpenAI hasn't led on pretty much any release for a while now other than Sora, which they're probably falling behind on now.

This is the opposite of a burst from a tech standpoint, even if OpenAI as a company starts to pop.

TL;DR: This is likely happening because the tech accelerated across the industry in ways OpenAI can't catch back up to, not because it's lagging.

[–] kromem@lemmy.world 1 points 2 months ago* (last edited 2 months ago)

I suspect it's that they got eclipsed by ByteDance with Seedance 2.0.

The video for that model is really good and makes Sora look pretty meh, and it may have been that current work on a next gen Sora wasn't going to be competitive enough.

The worst thing a lab can do right now is look like they are falling behind (e.g. Meta), especially with OpenAI planning for an IPO.

So on top of the lackluster "social media" offering tied to Sora they decided to shutter the entire product line of video and pivot to enterprise (where they've already lost significant market share to Anthropic).

They're in a pretty meh place at the moment overall tbh. I'm skeptical they'll recover.

(But I wouldn't mistake their fumbling for an industry wide shift on AI in general or even video AI.)

[–] kromem@lemmy.world -1 points 2 months ago (1 children)

That's what he's saying. That it doesn't change the geometry or textures (still completely controlled by the devs) and that the parts that it does change are also tunable by the devs.

He's responding to the backlash about how it changes models/textures (which it doesn't) by saying those are still fully in the hands of the devs and the parts people are seeing in the demos can be fine tuned by the dev teams to match their vision for what they want it to do or not do (like change lighting on material surfaces and hair but not character faces as an example).

[–] kromem@lemmy.world 9 points 2 months ago (1 children)

Neural network would be the most technically accurate given what they've announced so far.

There's no information on whether it's a diffusion or transformer architecture. Though given DLSS 4.5 introduced a transformer for lighting, my guess would be that it's the same thing just being more widely applied. But the technical details haven't been released in anything I've seen, so for the time being it's being described as "neural rendering" using an unspecified neural network.

https://www.nvidia.com/en-us/geforce/news/dlss-4-5-dynamic-multi-frame-gen-6x-2nd-gen-transformer-super-res/

[–] kromem@lemmy.world 1 points 2 months ago (1 children)

Yes, hair in video game lighting is going to look different from hair in actual chiaroscuro, with the way light really works.

Here's a painting from over a hundred years ago. The subject doesn't have brown roots, but is in shadow. And a comparison image of the exact same hair in different lighting conditions.

Performing complex lighting on individual hair strands is really expensive, so in the base image you have a kind of diffuse lighting throughout the hair. With DLSS 5 on, the distribution of light throughout the hair is variable, leading to darker unlit strands underneath lit surface strands.

Literally the only thing DLSS 5 is changing, literally in the technical sense, is the lighting. It's just that lighting can have dramatic results in how the eye perceives what's lit.

And yes, the hair looks very different, but that's how hair actually looks in mixed light and shadow (though a fair complaint with DLSS 5 is that it looks like it's sliding the contrast unnaturally high).

 

I often see a lot of people with an outdated understanding of modern LLMs.

This is probably the best interpretability research to date, by the leading interpretability research team.

It's worth a read if you want a peek behind the curtain on modern models.

7
submitted 2 years ago* (last edited 2 years ago) by kromem@lemmy.world to c/technology@lemmy.world
 

I've been saying this for about a year since seeing the Othello GPT research, but it's nice to see more minds changing as the research builds up.

Edit: Because people aren't actually reading and just commenting based on the headline, a relevant part of the article:

New research may have intimations of an answer. A theory developed by Sanjeev Arora of Princeton University and Anirudh Goyal, a research scientist at Google DeepMind, suggests that the largest of today’s LLMs are not stochastic parrots. The authors argue that as these models get bigger and are trained on more data, they improve on individual language-related abilities and also develop new ones by combining skills in a manner that hints at understanding — combinations that were unlikely to exist in the training data.

This theoretical approach, which provides a mathematically provable argument for how and why an LLM can develop so many abilities, has convinced experts like Hinton, and others. And when Arora and his team tested some of its predictions, they found that these models behaved almost exactly as expected. From all accounts, they’ve made a strong case that the largest LLMs are not just parroting what they’ve seen before.

“[They] cannot be just mimicking what has been seen in the training data,” said Sébastien Bubeck, a mathematician and computer scientist at Microsoft Research who was not part of the work. “That’s the basic insight.”
