  • mashreghi 5 hours ago
    Hi HN,

    We built Subformer (https://subformer.com), a web app that dubs videos into other languages while keeping speaker identity intact.

    Most “AI dubbing” pipelines are just ASR → translation → TTS, an approach that breaks down as soon as a video has multiple speakers. We instead run:

    - VAD + speaker diarization
    - Audio demixing
    - Global speaker clustering (sketched below)
    - Per-segment ASR + translation
    - Per-speaker TTS (voice cloning or synthetic)
    - Timeline-aligned remuxing back into the video
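
    To make the clustering step concrete, here is a minimal sketch of global speaker clustering in Python. It assumes per-segment speaker embeddings (e.g. from a diarization model's embedding head) are already extracted; the cosine-distance threshold and the toy embeddings are illustrative stand-ins, not Subformer's actual values:

        import numpy as np
        from sklearn.cluster import AgglomerativeClustering

        def cluster_speakers(embeddings: np.ndarray, threshold: float = 0.7) -> np.ndarray:
            """Assign a global speaker id to each segment embedding.

            Clustering over the whole video (rather than per-chunk) keeps
            ids stable, so the speaker at minute 40 maps to the same voice
            as at minute 3.
            """
            clusterer = AgglomerativeClustering(
                n_clusters=None,               # let the distance threshold decide
                distance_threshold=threshold,  # cosine-distance cutoff between voices
                metric="cosine",
                linkage="average",
            )
            return clusterer.fit_predict(embeddings)

        # Toy check: six 4-dim "embeddings" drawn around two voice centroids.
        rng = np.random.default_rng(0)
        a = rng.normal([1, 0, 0, 0], 0.05, size=(3, 4))
        b = rng.normal([0, 1, 0, 0], 0.05, size=(3, 4))
        print(cluster_speakers(np.vstack([a, b])))  # e.g. [0 0 0 1 1 1]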

    The tricky parts were diarization drift on long videos, timing mismatches after translation (translated lines rarely match the source segment's duration), and keeping costs sane when doing multilingual TTS at scale.
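
    For the timing problem, one common approach is to time-stretch each synthesized clip so it fits its original slot. A minimal sketch, assuming librosa is available; the clamping range is an illustrative guess at what still sounds natural, and fit_to_slot is a hypothetical helper, not Subformer's API:

        import librosa
        import numpy as np

        def fit_to_slot(tts_audio: np.ndarray, sr: int, slot_seconds: float,
                        max_stretch: float = 1.25) -> np.ndarray:
            """Time-stretch synthesized speech to fill its original time slot."""
            current = len(tts_audio) / sr
            rate = current / slot_seconds  # >1 speeds up, <1 slows down
            # Clamp so voices don't sound chipmunked or drawled.
            rate = float(np.clip(rate, 1 / max_stretch, max_stretch))
            stretched = librosa.effects.time_stretch(tts_audio, rate=rate)
            target_len = int(round(slot_seconds * sr))
            # Pad with silence, or trim, if clamping left a residual mismatch.
            if len(stretched) < target_len:
                stretched = np.pad(stretched, (0, target_len - len(stretched)))
            return stretched[:target_len]

    Clamping plus pad/trim trades exact sync for naturalness; past that range it's usually better to shorten the translation itself than to stretch harder.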

    It’s still early, but it already works well for things like interviews, TV clips, and YouTube videos with multiple speakers.

    Would love feedback from people who work on audio, speech, or localization.

    https://subformer.com