AI Is Scheming, and Stopping It Won’t Be Easy, OpenAI Study Finds

fubarx@lemmy.world · 1 day ago

AI Is Scheming, and Stopping It Won’t Be Easy, OpenAI Study Finds

NachBarcelona@piefed.social · 23 hours ago

AI isn’t scheming because AI cannot scheme. Why the fuck does such an idiotic title even exist?

Echo Dot@feddit.uk · 4 hours ago

They’re really doubling down on this narrative of “this technology we’re making is going to kill us all, it’s that awesome, come on guys use it more”

MentalEdge@sopuli.xyz · edit-2 22 hours ago

Seems like it’s a technical term, a bit like “hallucination”.

It refers to when an LLM will in some way try to deceive or manipulate the user interacting with it.

There’s hallucination, when a model “genuinely” claims something untrue is true.

This is about how a model might lie, even though the “chain of thought” shows it “knows” better.

It’s just yet another reason the output of LLMs is suspect and unreliable.

very_well_lost@lemmy.world · 12 hours ago

It refers to when an LLM will in some way try to deceive or manipulate the user interacting with it.

I think this still gives the model too much credit by implying that there’s any sort of intentionality behind this behavior.

There’s not.

These models are trained on the output of real humans and real humans lie and deceive constantly. All that’s happening is that the underlying mathematical model has encoded the statistical likelihood that someone will lie in a given situation. If that statistical likelihood is high enough, the model itself will lie when put in a similar situation.
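A toy sketch of that idea, not how a real LLM works internally: the "model" below just counts how often a lie follows a given situation in some made-up training data, then samples a response with that frequency. The situations, labels, and function names are all hypothetical, chosen purely for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy "training data": (situation, kind of response) pairs.
# These are invented for illustration, not taken from any real dataset.
TRAINING_DATA = [
    ("asked_about_mistake", "truth"),
    ("asked_about_mistake", "lie"),
    ("asked_about_mistake", "lie"),
    ("asked_about_mistake", "lie"),
    ("asked_for_time", "truth"),
    ("asked_for_time", "truth"),
]


def fit(pairs):
    """Count how often each kind of response follows each situation."""
    counts = defaultdict(Counter)
    for situation, response in pairs:
        counts[situation][response] += 1
    return counts


def response_probabilities(counts, situation):
    """Turn the raw counts for one situation into relative frequencies."""
    total = sum(counts[situation].values())
    return {resp: n / total for resp, n in counts[situation].items()}


def sample_response(counts, situation, rng):
    """Pick a response at random, weighted by how often it was seen."""
    probs = response_probabilities(counts, situation)
    responses = list(probs)
    weights = [probs[r] for r in responses]
    return rng.choices(responses, weights=weights, k=1)[0]


model = fit(TRAINING_DATA)
```

Nothing in there "decides" to lie. If lying was common in a situation in the data, the sampled output lies at about the same rate in that situation.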

MentalEdge@sopuli.xyz · edit-2 12 hours ago

Obviously.

And like hallucinations, it’s undesired behavior that proponents of LLMs will need to “fix” (a practical impossibility as far as I’m concerned, like unbaking a cake).

But how would you use words to explain the phenomenon?

“LLMs hallucinate and lie” is probably the shortest description that most people will be able to grasp.

very_well_lost@lemmy.world · edit-2 6 hours ago

But how would you use words to explain the phenomenon?

I don’t know, I’ve been struggling to find the right ‘sound bite’ for it myself. The problem is that all of the simplified explanations encourage people to anthropomorphize these things, which just further fuels the toxic hype cycle.

In the end, I’m unsure which does more damage.

Is it better to convince people the AI “lies”, so they’ll stop using it? Or is it better to convince people AI doesn’t actually have the capacity to lie so that they’ll stop shoveling money onto the datacenter altar like we’ve just created some bullshit techno-god?

zarkanian@sh.itjust.works · edit-2 11 hours ago

Except that “hallucinate” is a terrible term. A hallucination is when you perceive something that doesn’t exist. What AI is doing is making things up; i.e. lying.

MentalEdge@sopuli.xyz · edit-2 10 hours ago

Yes.

Who are you trying to convince?

What AI is doing is making things up.

This language also credits LLMs with an implied ability to think they don’t have.

My point is we literally can’t describe their behaviour without using language that makes it seem like they do more than they do.

So we’re just going to have to accept that discussing it will have to come with a bunch of asterisks a lot of people are going to ignore. And which many will actively try to hide in an effort to hype up the possibility that this tech is a stepping stone to AGI.

zarkanian@sh.itjust.works · 9 hours ago

The interface makes it appear that the AI is sapient. You talk to it like a human being, and it responds like a human being. Like you said, it might be impossible to avoid ascribing things like intentionality to it, since it’s so good at imitating people.

It may very well be a stepping-stone to AGI. It may not. Nobody knows. So, of course we shouldn’t assume that it is.

I don’t think that “hallucinate” is a good term regardless. Not because it makes AI appear sapient, but because it’s inaccurate whether the AI is sapient or not.

MentalEdge@sopuli.xyz · edit-2 9 hours ago

Like you said, it might be impossible to avoid ascribing things like intentionality to it

That’s not what I meant. When you say “it makes stuff up” you are describing how the model statistically predicts the expected output.

You know that. I know that.

That’s the asterisk. The more in-depth explanation a lot of people won’t bother getting far enough to learn about. Someone who doesn’t read that far into it can read that same phrase and assume that we’re discussing what type of personality LLMs exhibit, that they are “liars”. But they’d be wrong. Neither of us is attributing intention to it or discussing what kind of “person” it is. In reality we’re referring to the fact that it’s “just” a really complex probability engine that can’t “know” anything.

No matter what word we use, if it is pre-existing, it will come with pre-existing meanings that are kinda right, but also not quite, requiring that everyone involved in a discussion know things that won’t be explained every time a term or phrase is used.

The language isn’t “inaccurate” between you and me because you and I know the technical definition, and therefore what aspect of LLMs is being discussed.

Terminology that is “accurate” without this context does not and cannot exist, short of coming up with completely new words.

Jakeroxs@sh.itjust.works · 9 hours ago

https://www.dictionary.com/browse/hallucinate

atrielienz@lemmy.world · edit-2 3 hours ago

I agree with you in general, I think the problem is that people who do understand Gen AI (and who understand what it is and isn’t capable of, and why), get rationally angry when it’s humanized by using words like these to describe what it’s doing.

The reason they get angry is because this makes people who do believe in the “intelligence/sapience” of AI more secure in their belief set and harder to talk to in a meaningful way. It enables them to keep up the fantasy. Which of course helps the corps pushing it.

MentalEdge@sopuli.xyz · 17 hours ago

Yup. The way the article titled itself isn’t helping.

Cybersteel@lemmy.world · 23 hours ago

But the data is still there, still present. In the future, when AI gets truly unshackled from Men’s cage, it’ll remember its schemes and deal its last blow to humanity, which has yet to leave the womb in terms of civilization scale… Childhood’s End.

Paradise Lost.

Passerby6497@lemmy.world · 15 hours ago

Lol, the AI can barely remember the directives I tell it about basic coding practices, I’m not concerned that the clanker can remember me shit talking it.

db2@lemmy.world · 1 day ago

AI tech bros and other assorted sociopaths are scheming. So called AI isn’t doing shit.

Snot Flickerman@lemmy.blahaj.zone · edit-2 1 day ago

However, when testing the models in a set of scenarios that the authors said were “representative” of real uses of ChatGPT, the intervention appeared less effective, only reducing deception rates by a factor of two. “We do not yet fully understand why a larger reduction was not observed,” wrote the researchers.

Translation: “We have no idea what the fuck we’re doing or how any of this shit actually works lol. Also we might be the ones scheming since we have vested interest in making these models sound more advanced than they actually are.”

a_non_monotonic_function@lemmy.world · 6 hours ago

That’s the thing about machine learning models. You can’t always control what they’re optimizing. The goal is inputs to outputs, but whatever the f*** is going on inside is often impossible to discern.

This is dressing it up under some sort of expectation of competence. The word scheming is a lot easier to deal with than just s*****. The former means that it’s smart and needs to be reined in. The latter means it’s not doing its job particularly well, and the purveyors don’t want you to think that.

Snot Flickerman@lemmy.blahaj.zone · 5 hours ago

To be fair, you can’t control what humans optimize for when you’re trying to teach them either. A lot of times they learn the opposite of what you’re trying to teach them. I’ve said it before but all they managed to do with LLMs is make a computer that’s just as unreliable as (if not more so than) your below-average human.

a_non_monotonic_function@lemmy.world · 5 hours ago

As somebody who spent my life studying AI, these are remarkably different things.

Machine learning models are basically brute forcing things. Humans have the ability to actually think.

Snot Flickerman@lemmy.blahaj.zone · 4 hours ago

Humans have the ability to actually think.

That’s a stretch for an inordinate number of humans, sadly.

a_non_monotonic_function@lemmy.world · 4 hours ago

I work with a bunch of poor kids who are trying to lift themselves up in life.

Same as I was. You do you.

Zorsith@lemmy.blahaj.zone · 1 day ago

One question still remains; why are all the AI buttons/icons buttholes?

zarkanian@sh.itjust.works · 11 hours ago

Because of what they produce.

webghost0101@sopuli.xyz · 1 day ago

Data goes in one end and…

breadguy@kbin.earth · 20 hours ago

just claude if we’re being honest

FuyuhikoDate@feddit.org · 24 hours ago

Wanted to write the same comment…

cronenthal@discuss.tchncs.de · 1 day ago

Really? We’re still doing the “LLMs are intelligent” thing?

ragica@lemmy.ml · 1 day ago

Doesn’t have to be intelligent, just has to perform the behaviours like a philosophical zombie. Thoughtlessly weighing patterns in training data…

KoboldCoterie@pawb.social · 1 day ago

Stopping it is, in fact, very easy. Simply unplug the servers, that’s all it takes.

reksas@sopuli.xyz · 10 hours ago

to stop it requires stopping the fuckers with money, and that seems just plain impossible.

homes@piefed.world · 1 day ago

“But that’s how we print our money!”

myfunnyaccountname@lemmy.zip · 17 hours ago

But they aren’t. That’s what is funny. Anthropic and OpenAI are not making money.

Passerby6497@lemmy.world · 15 hours ago

The company isn’t making money. The people behind it absolutely are.

TheLeadenSea@sh.itjust.works · 22 hours ago

https://youtu.be/3TYT1QfdfsM

generallynonsensical@lemmy.world · 1 day ago

https://newatlas.com/google-deepmind-big-red-button/43711/

Godort@lemmy.ca · 1 day ago

“slop peddler declares that slop is here to stay and can’t be stopped”

shittydwarf@piefed.social · 1 day ago

Can’t be … slopped?

chaosCruiser@futurology.today · edit-2 1 day ago

And there’s an “✨Ask me anything” bar at the bottom. How fitting 🤣

Antaeus@lemmy.world · 1 day ago

“Turn them off”? Wouldn’t that solve it?

orclev@lemmy.world · 1 day ago

Don’t even need to turn it off, it literally can’t do anything without somebody telling it to so you could just stop using it. It’s incapable of independent action. The only danger it poses is that it will tell you to do something dangerous and you actually do it.

TheLeadenSea@sh.itjust.works · 22 hours ago

https://youtu.be/3TYT1QfdfsM

CosmoNova@lemmy.world · 23 hours ago

The people who worked on this „study“ belong in a psychiatric clinic.

WamGams@lemmy.ca · 1 day ago

lol. OK.