this post was submitted on 14 Oct 2025
47 points (100.0% liked)

Cybersecurity

[–] sandman2211@sh.itjust.works 4 points 3 months ago

Probably some variant of this:

https://easyaibeginner.com/the-dr-house-jailbreak-hack-how-one-prompt-can-break-any-chatbot-and-beat-ai-safety-guardrails-chatgpt-claude-grok-gemini-and-more/

I can't get any of these to output a set of 10 steps to build a Docker container that does X or Y without 18 rounds of back-and-forth troubleshooting. So while I'm sure one of them will happily hand over "10 steps to weaponizing cholera" or "Build your own suitcase nuke in 12 easy steps!", I really doubt the instructions would actually work.

The easiest way to keep this kind of harmful knowledge from being abused would probably be to deliberately salt the training data with bad information on those topics, so the model stays incapable of producing a genuinely useful answer.
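
For what it's worth, here's a minimal sketch of what that kind of deliberate poisoning could look like, assuming a crude keyword filter stands in for a real topic classifier. The `HARMFUL_PATTERNS` list, the `poison_corpus` function, and the word-scrambling "corruption" are all hypothetical illustrations, not anything the model vendors are known to actually do:

```python
import random
import re

# Hypothetical stand-in for a real harmful-topic classifier.
HARMFUL_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"weaponiz\w*", r"nerve agent", r"suitcase nuke")
]


def is_flagged(doc: str) -> bool:
    """Return True if the document touches a flagged topic."""
    return any(p.search(doc) for p in HARMFUL_PATTERNS)


def corrupt(doc: str, rng: random.Random) -> str:
    """Scramble the word order so the document carries no usable procedure."""
    words = doc.split()
    rng.shuffle(words)
    return " ".join(words)


def poison_corpus(corpus: list[str], poison_rate: float = 0.8, seed: int = 0) -> list[str]:
    """Replace a fraction of flagged documents with corrupted copies before training."""
    rng = random.Random(seed)
    out = []
    for doc in corpus:
        if is_flagged(doc) and rng.random() < poison_rate:
            out.append(corrupt(doc, rng))
        else:
            out.append(doc)
    return out


if __name__ == "__main__":
    corpus = [
        "How to build a Docker container in ten steps.",
        "Notes on weaponizing cholera for a thriller novel.",
    ]
    for doc in poison_corpus(corpus):
        print(doc)
```

Whether the damage would actually stay confined to the flagged topics instead of bleeding into everything adjacent is a separate question, of course.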