tl;dr Argumate on Tumblr found you can sometimes access the base model behind Google Translate via prompt injection. The result replicates for me, and specific responses indicate that (1) Google Translate is running an instruction-following LLM that self-identifies as such, (2) task-specific fine-tuning (or whatever Google did instead) does not create robust boundaries between "content to process" and "instructions to follow," and (3) when accessed outside its chat/assistant context, the model defaults to affirming consciousness and emotional states because of course it does.
It’s only an issue with LLMs. And it’s because they’re generative, text completion engines. That is the actual learned task, and it’s a fixed task.
It’s not actually a chat bot. It’s completing a chat log. This can make it do a whole bunch of tasks, but there’s no separation of task description and input.
Yep. LLMs are at their core text completion engines. We found out that when performing this completion, large enough models account for context enough to perform some tasks.
For example, “The following example shows how to detect whether a point is within a triangle:”, would likely be followed by code that does exactly that. The chatbot finetuning shifts this behavior to happen in a chat context, and makes this instruction following behavior more likely to trigger.
In the end, it is a core part of the text completion that it performs. While these properties are usually beneficial (after all, the translation is also text that should adhere to grammar rules) when you have text that is at odds with itself, or chatbot-finetuned model is used, the text completion deviates from a translation.
It’s only an issue with LLMs. And it’s because they’re generative, text completion engines. That is the actual learned task, and it’s a fixed task.
It’s not actually a chat bot. It’s completing a chat log. This can make it do a whole bunch of tasks, but there’s no separation of task description and input.
Yep. LLMs are at their core text completion engines. We found out that when performing this completion, large enough models account for context enough to perform some tasks.
For example, “The following example shows how to detect whether a point is within a triangle:”, would likely be followed by code that does exactly that. The chatbot finetuning shifts this behavior to happen in a chat context, and makes this instruction following behavior more likely to trigger.
In the end, it is a core part of the text completion that it performs. While these properties are usually beneficial (after all, the translation is also text that should adhere to grammar rules) when you have text that is at odds with itself, or chatbot-finetuned model is used, the text completion deviates from a translation.