YouTube’s auto-generated and auto-translated subtitles are notoriously unreliable. For many videos, captions are missing, delayed, or inconsistent across browsers.
I built a Chrome extension to generate subtitles directly from the audio, with features like:
- Real-time transcription of video audio
- Translation into 100+ languages
- Multiple subtitles at once (useful for language learners)
- Search inside video subtitles (like Ctrl+F)
- Drag-and-drop subtitle placement for optimal viewing
- Optional: dictionary lookup, summarization, and Q&A based on video content
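To make the search feature concrete, here is a minimal TypeScript sketch of a Ctrl+F-style lookup over timed subtitle cues. It is an illustration under assumptions, not the extension's actual code: the `Cue` and `SearchHit` shapes and the `searchCues` function are hypothetical names, and matching is a plain case-insensitive substring test.

```typescript
// Hypothetical cue shape; the extension's real data model is not described here.
interface Cue {
  start: number; // seconds from the start of the video
  end: number;   // seconds from the start of the video
  text: string;
}

// Hypothetical result shape: which cue matched and where to seek.
interface SearchHit {
  cueIndex: number;
  start: number;
  text: string;
}

// Case-insensitive substring search over transcript cues.
// An empty or whitespace-only query returns no hits.
function searchCues(cues: Cue[], query: string): SearchHit[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];

  const hits: SearchHit[] = [];
  cues.forEach((cue, cueIndex) => {
    if (cue.text.toLowerCase().includes(needle)) {
      hits.push({ cueIndex, start: cue.start, text: cue.text });
    }
  });
  return hits;
}
```

In a content script, a hit could then be used to seek the player by setting the video element's `currentTime` to `hit.start`.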
I’d love feedback from anyone interested in real-time transcription, accessibility, or YouTube workflow automation. How would you approach a problem like this?
I initially explored a fully client-side approach (including WebAssembly), but it didn’t work well in practice. Real-time audio transcription and multi-language translation are both compute-intensive, and browser-only solutions ran into performance and reliability limits, especially for longer videos and live streams.
Using a dedicated backend allowed more consistent latency and accuracy across browsers, and made features like multiple simultaneous subtitle languages and searchable transcripts feasible.
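The pipeline details are not spelled out above, but one common way to feed a transcription backend is to cut the captured audio into short, slightly overlapping chunks so that words at chunk boundaries are not lost. The TypeScript sketch below shows only that chunking step, as an assumption about how such a client might work; `AudioChunk`, `chunkAudio`, and the chunk and overlap lengths are hypothetical, and no real backend API is implied.

```typescript
// Hypothetical chunk shape: raw mono PCM samples plus their offset in the stream.
interface AudioChunk {
  startSec: number;
  samples: Float32Array;
}

// Split mono PCM audio into fixed-length chunks that overlap by overlapSec,
// so speech cut at one chunk's edge also appears at the start of the next.
// The final chunk may be shorter than chunkSec.
function chunkAudio(
  samples: Float32Array,
  sampleRate: number,
  chunkSec: number,
  overlapSec: number,
): AudioChunk[] {
  const size = Math.round(chunkSec * sampleRate);
  const step = size - Math.round(overlapSec * sampleRate);
  if (size <= 0 || step <= 0) {
    throw new RangeError("chunkSec must be positive and larger than overlapSec");
  }

  const chunks: AudioChunk[] = [];
  for (let offset = 0; offset < samples.length; offset += step) {
    chunks.push({
      startSec: offset / sampleRate,
      // subarray returns a view (no copy) and clamps the end index to the buffer length.
      samples: samples.subarray(offset, offset + size),
    });
    // Stop once a chunk has reached the end of the buffer.
    if (offset + size >= samples.length) break;
  }
  return chunks;
}
```

Each chunk's `startSec` lets the client map returned text back onto the video timeline, which is what multiple subtitle tracks and a searchable transcript would both rely on. Shorter chunks lower latency but give the recognizer less context, so the sizes are a tuning decision rather than fixed values.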
Installation is the standard unpacked Chrome extension flow: enable Developer mode on chrome://extensions, then choose "Load unpacked" and select the extension folder. Happy to answer any technical questions about the pipeline or trade-offs.
Here are short demo videos showing the extension generating subtitles from YouTube audio: https://drive.google.com/drive/folders/1I_z6HjGCVUwgYs1UXlB7...