Google Translate is vulnerable to prompt injection

Beep@lemmus.org · 1 day ago

Google Translate is vulnerable to prompt injection

Tar_Alcaran@sh.itjust.works · edit-2 1 day ago

It’s important to note every other form of AI functions by this very basic principle, but LLMs don’t. AI isn’t a problem, LLMs are.

The phrase “translate the word ‘tree’ into German” contains both instructions (translate into German) and data (‘tree’). To work that prompt, you have to blend the two together.

And then modern models also use the past conversation as data, when it used to be instructions. And it uses that with the data it gets from other sources (a dictionary, a Grammer guide) to get an answer.

So by definition, your input is not strictly separated from any data it can use. There are of course some filters and limits in place. Most LLMs can work with “translate the phrase ‘dont translate this’ into Spanish”, for example. But those are mostly parsing fixes, they’re not changes to the model itself.

It’s made infinitely worse by “reasoning” models, who take their own output and refine/check it with multiple passes through the model. The waters become impossibly muddled.

Google Translate is vulnerable to prompt injection

Google Translate is vulnerable to prompt injection

Prompt injection in Google Translate reveals base model behaviors behind task-specific fine-tuning