AI Transcriber
Transcribe audio & video to text + subtitles — on-device, no upload.
Runs entirely in your browser — nothing you enter is uploaded, logged, or tracked.
A speech-to-text transcriber that runs entirely on your device. Drop in an audio file, a podcast, a voice memo, a lecture or a video, and an on-device Whisper neural network writes out the words with timestamps, with no upload and no server-side processing. Click any line to jump the player to that moment, fix the odd word in the editable transcript, and export a plain-text transcript or ready-to-use .srt / .vtt subtitles. The model downloads once (~63 MB), is cached, and then works offline; your recordings never leave your device, and there is no watermark or length paywall.
Export
SubRip subtitles with timestamps — drop into any video editor or player.
Built on your device — nothing uploaded, no watermark.
Frequently asked questions
Is my audio or video uploaded anywhere?
No. The speech-recognition AI and all processing run inside your browser on your own device: your audio is never sent to a server, and there are no uploads or analytics. After the one-time model download you can disconnect from the internet and it still works. For private recordings (interviews, meetings, voice memos) that matters: cloud transcribers send your audio to their servers; this one never does.
How does speech recognition run in my browser with nothing uploaded?
It uses Whisper, OpenAI’s open-source speech-recognition model, compiled to run on your device via WebAssembly. The first time you transcribe, the model is downloaded (~63 MB) and cached; from then on every transcription happens locally and offline. Your browser decodes the audio and the AI reads it; the file is never sent anywhere.
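As a rough illustration of the "browser decodes the audio" step: Whisper models take 16 kHz mono audio as input, so decoded audio generally has to be downmixed and resampled before recognition. The sketch below shows one simple way to do that. The function names are hypothetical, the per-channel arrays are assumed to come from something like `AudioBuffer.getChannelData()`, and linear interpolation is only a stand-in for whatever resampler this page actually uses.

```typescript
// Sample rate that Whisper models expect for input audio.
const WHISPER_SAMPLE_RATE = 16000;

// Average all channels into one mono track.
// Assumes one equal-length Float32Array per channel.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (let c = 0; c < channels.length; c++) {
    const channel = channels[c];
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Resample by linear interpolation (hypothetical helper; a production
// resampler would low-pass filter first to avoid aliasing).
function resampleLinear(
  samples: Float32Array,
  fromRate: number,
  toRate: number
): Float32Array {
  if (fromRate === toRate) return new Float32Array(samples); // plain copy
  const outLength = Math.floor((samples.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, samples.length - 1);
    const frac = pos - left;
    out[i] = samples[left] * (1 - frac) + samples[right] * frac;
  }
  return out;
}
```

A caller would typically run `resampleLinear(downmixToMono(channels), buffer.sampleRate, WHISPER_SAMPLE_RATE)` and hand the result to the model.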
Can I make subtitles (.srt / .vtt)?
Yes. The transcript is timestamped per phrase, so you can export standard SubRip (.srt) or WebVTT (.vtt) subtitle files ready to drop into a video editor or upload alongside a video. You can also export a plain-text transcript or copy it.
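For readers curious what the two subtitle formats look like, here is a minimal sketch of turning timestamped phrases into SubRip and WebVTT text. The `Segment` shape and the function names are assumptions made for illustration, not this page's actual internals. The main difference between the formats is that SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period.

```typescript
// Hypothetical shape of one timestamped phrase.
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Left-pad a non-negative integer with zeros.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS<sep>mmm. SRT uses ",", WebVTT uses ".".
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + msSeparator + pad(ms, 3);
}

// SubRip: numbered cues separated by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      formatTimestamp(seg.start, ",") + " --> " + formatTimestamp(seg.end, ",") + "\n" +
      seg.text.trim() + "\n")
    .join("\n");
}

// WebVTT: "WEBVTT" header, a blank line, then cues separated by blank lines.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(seg =>
    formatTimestamp(seg.start, ".") + " --> " + formatTimestamp(seg.end, ".") + "\n" +
    seg.text.trim() + "\n");
  return "WEBVTT\n\n" + cues.join("\n");
}
```

Because both exporters read the same segment list, any edit made to a transcript line before export would carry through to either format.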
Does it work on video files too?
Yes — drop in a video and it reads the audio track. You get a player you can scrub, with each transcript line clickable to jump to that moment. (Very large videos use more memory; if a format’s audio can’t be decoded in your browser, extract the audio first.)
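The click-to-jump behaviour can be sketched in a few lines. This is an illustrative guess at the mechanism, not the page's actual code: `TimedLine`, `seekToLine` and `activeLineIndex` are hypothetical names, and `Seekable` is a minimal stand-in for the browser's `<audio>` / `<video>` element, whose `currentTime` property sets the playback position in seconds.

```typescript
// Hypothetical shape of one transcript line.
interface TimedLine {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Minimal stand-in for HTMLMediaElement; a real <video> element fits this.
interface Seekable {
  currentTime: number;
}

// Clicking a line moves the player to where that line begins.
function seekToLine(player: Seekable, line: TimedLine): void {
  player.currentTime = line.start;
}

// Going the other way (highlighting the line being spoken): binary search
// for the last line whose start is at or before `time`.
// Assumes `lines` is sorted by start time. Returns -1 before the first line.
function activeLineIndex(lines: TimedLine[], time: number): number {
  let lo = 0;
  let hi = lines.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (lines[mid].start <= time) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

In a page, `activeLineIndex` would typically be called from the player's `timeupdate` event to keep the highlighted line in step with playback.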
How accurate is it, and what are the limits?
This is the compact English model (Whisper tiny.en) chosen so it runs fast and privately on your device. It’s good on clear English speech, but heavy accents, background noise, music, crosstalk and very long files are harder, and it’s English-only. You can edit the transcript before exporting. It trades some of a giant cloud model’s peak accuracy for total privacy and offline use.
Is there a length limit or watermark?
No watermark and no hard length cap. It runs on your CPU in the browser, though, so a long recording takes a while (a progress bar shows how far along it is) and uses memory. Shorter clips are quick; hour-long files take longer than they would on a cloud GPU, but they stay completely private.
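To give a feel for why long files take a while: Whisper works on audio in 30-second windows, so a recording is processed window by window, and a progress bar can simply report how many windows are done. The sketch below assumes 16 kHz audio and non-overlapping windows; `planChunks` and `progressFraction` are hypothetical names, and the page's real pipeline may split audio differently (for example, with overlapping windows).

```typescript
// One window of audio, as sample offsets (end is exclusive).
interface Chunk {
  start: number;
  end: number;
}

// Split a recording into consecutive windows; the last one may be shorter.
function planChunks(
  totalSamples: number,
  sampleRate: number = 16000,
  windowSeconds: number = 30
): Chunk[] {
  const windowSamples = sampleRate * windowSeconds;
  const chunks: Chunk[] = [];
  for (let start = 0; start < totalSamples; start += windowSamples) {
    chunks.push({ start, end: Math.min(start + windowSamples, totalSamples) });
  }
  return chunks;
}

// Fraction complete, suitable for driving a progress bar (0 to 1).
function progressFraction(chunksDone: number, chunksTotal: number): number {
  return chunksTotal === 0 ? 1 : chunksDone / chunksTotal;
}
```

Under these assumptions a one-hour recording is 120 windows, each of which has to be run through the model on the CPU in turn.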