this post was submitted on 20 Apr 2025

1232 points (99.0% liked)


In heat (lemmy.world)

submitted 6 months ago by benni@lemmy.world to c/lemmyshitpost@lemmy.world

133 comments

[–] howrar@lemmy.ca 3 points 6 months ago (2 children)

It has nothing to do with the meaning. Say your training set is made up of strings of A's and B's, plus a separate subset of strings of C's and D's (i.e. [AB]+ and [CD]+ in regex). If the LLM outputs "ABBABBBDA", that's statistically unlikely, because D's never appear alongside A's and B's. I have no idea what these sequences mean, nor do I need to know to see that the output is statistically unlikely.

In the context of language and LLMs, "statistically likely" roughly means that some human somewhere out there is more likely to have written this than the alternatives because that's where the training data comes from. The LLM doesn't need to understand the meaning. It just needs to be able to compute probabilities, and the probability of this excerpt should be low because the probability that a human would've written this is low.
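To make the [AB]+ / [CD]+ example concrete, here's a rough sketch using a character bigram model instead of an actual LLM (that swap is my assumption, as are the function names, the smoothing constant, and the randomly generated training set). It only counts which character follows which, so it knows nothing about "meaning", yet it still scores "ABBABBBDA" far below a pure A/B string:

```python
import math
import random
from collections import Counter

VOCAB = "ABCD$"  # the four symbols plus an end-of-string marker


def make_training_set(n_per_group=500, max_len=8, seed=0):
    """Random strings matching [AB]+ and random strings matching [CD]+."""
    rng = random.Random(seed)
    data = []
    for alphabet in ("AB", "CD"):
        for _ in range(n_per_group):
            length = rng.randint(2, max_len)
            data.append("".join(rng.choice(alphabet) for _ in range(length)))
    return data


def train_bigram(strings):
    """Count how often each character follows each other character."""
    pair_counts = Counter()
    context_counts = Counter()
    for s in strings:
        padded = "^" + s + "$"  # ^ marks the start, $ marks the end
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[(prev, nxt)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts


def log_prob(s, pair_counts, context_counts, alpha=0.01):
    """Log-probability of s under the bigram model, with add-alpha smoothing."""
    padded = "^" + s + "$"
    total = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        numerator = pair_counts[(prev, nxt)] + alpha
        denominator = context_counts[prev] + alpha * len(VOCAB)
        total += math.log(numerator / denominator)
    return total


pair_counts, context_counts = train_bigram(make_training_set())
for candidate in ("ABBABBBBA", "ABBABBBDA"):
    print(candidate, round(log_prob(candidate, pair_counts, context_counts), 2))
```

The second string gets a much lower score because the B-to-D and D-to-A transitions never show up in the training data. That's all "statistically unlikely" means here.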

[–] monotremata@lemmy.ca 5 points 6 months ago

Honestly this isn't really all that accurate. Like, a common example when introducing the Word2Vec mapping is that if you take the vector for "king," subtract the vector for "man," and add the vector for "woman," the closest vector matching the resultant is "queen." So there are elements of "meaning" being captured there. Deep learning networks can capture a lot more abstraction than that, and the attention mechanism introduced by the Transformer model greatly increased the ability of these models to interpret context clues.
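For anyone who hasn't seen the analogy trick, here's a toy sketch of the arithmetic in its usual form, king - man + woman. The 3-d vectors are made up by hand for illustration (real Word2Vec embeddings are learned from text and have hundreds of dimensions), and the `nearest` helper is hypothetical, not part of any library:

```python
import math

# Hand-made toy embeddings: (royalty, maleness, femaleness).
# These are assumptions for illustration, not learned vectors.
vectors = {
    "king": (0.9, 0.9, 0.1),
    "queen": (0.9, 0.1, 0.9),
    "man": (0.1, 0.9, 0.1),
    "woman": (0.1, 0.1, 0.9),
    "apple": (0.0, 0.2, 0.2),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(target, exclude):
    """Word whose vector is most similar to target, skipping the query words."""
    candidates = [w for w in vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# king - man + woman, computed component by component
target = tuple(
    k - m + w
    for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])
)
print(nearest(target, exclude={"king", "man", "woman"}))
```

The query words are excluded from the search because otherwise the nearest vector is often just one of the inputs.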

You're right that it's easy to make the mistake of overestimating the level of understanding behind the writing. That's absolutely something that happens. But saying "it has nothing to do with the meaning" is going a bit far. There is semantic processing happening; it's just less sophisticated than the form of the writing could lead you to assume.

[–] JcbAzPx@lemmy.world 2 points 6 months ago

Unless they grabbed discussion forums that happened to have examples of multiple people. It's pretty common that, when people talk about fertility, problems in that area get brought up.

People can use context and meaning to avoid that mistake; LLMs have to be forced not to make it through much slower QC by real people (something Google hates to do).