They’re pretty bad outside of English-Chinese actually.
Voice-to-voice is all relatively new, and it sucks if it’s not all integrated (eg feeding a voice model plain text so it loses the original tone, emotion, cadence and such).
And… honestly, the only models I can think of that’d be good at this are Chinese. Or Japanese finetunes of Chinese models. Amazon certainly has some stupid policy where they aren’t allowed to use them (even with zero security risk since they’re open weights).
isn’t interpreting language the ONE thing LLMs are supposed to be good at? jesus
They’re pretty bad outside of English-Chinese actually.
Voice-to-voice is all relatively new, and it sucks if it’s not all integrated (eg feeding a voice model plain text so it loses the original tone, emotion, cadence and such).
And… honestly, the only models I can think of that’d be good at this are Chinese. Or Japanese finetunes of Chinese models. Amazon certainly has some stupid policy where they aren’t allowed to use them (even with zero security risk since they’re open weights).