• brucethemoose@lemmy.world
    English
    17 hours ago

    They’re pretty bad outside of English-Chinese, actually.

    Voice-to-voice is all relatively new, and it sucks if it’s not all integrated (e.g., feeding a voice model plain text, so it loses the original tone, emotion, cadence and such).

    And… honestly, the only models I can think of that’d be good at this are Chinese, or Japanese finetunes of Chinese models. Amazon certainly has some stupid policy where they aren’t allowed to use them (even though there’s zero security risk, since they’re open weights).