Updated April 16, 2026

Voice-to-Text for Linux in 2026

Linux has been the forgotten platform for dictation for two decades. Here's where things actually stand in 2026 — and how to get modern, on-device, Whisper-powered voice typing working on any Linux desktop.

If you've tried to set up dictation on Linux in the past, you know the story. The desktop ecosystem has always trailed macOS and Windows on accessibility tooling, and voice input has been the most glaring gap. The options have historically been: wrestle with half-abandoned open-source projects, run a cloud service that sends every word to someone else's server, or just give up and type.

In 2026, that's finally changing. Whisper-class models are small enough to run on consumer laptops, whisper.cpp has matured into a production-ready inference engine, and GPU acceleration is available on most modern Linux machines via Vulkan or CUDA. The pieces all existed; they just hadn't been packaged into a real desktop app. Now they have.

The State of Dictation on Linux

Historically, Linux users looking for voice input have had three rough options, none of them great:

DIY open-source stacks like nerd-dictation, wtype scripts around whisper.cpp, or Vosk-based tools. These can work well for a tinkerer, but they're CLI-first, require manual setup per desktop environment, and don't give you a packaged app with a UI.
Cloud APIs like the OpenAI Whisper API or Google Speech-to-Text, wrapped in custom scripts. Accurate, but your audio leaves your machine and you pay per minute.
Desktop-built-in accessibility tools. Useful for some workflows but typically limited in accuracy, languages, and customization.

What's been missing is a packaged, installable desktop app that runs Whisper locally with GPU acceleration, has a proper UI, and Just Works on a modern Linux install. That's the gap Wspr is built to fill.

Why Linux Has Been Underserved

Three structural reasons:

Market size. Commercial dictation vendors (Dragon, Wispr Flow, Superwhisper, MacWhisper) build for Mac, Windows, and in some cases mobile — because that's where the paying users are. Linux is a rounding error on their revenue spreadsheet, so it stays unsupported.

Desktop fragmentation. GNOME, KDE Plasma, Hyprland, Sway, XFCE, Cinnamon — same kernel, six different ways to paste text into the focused window. Any serious Linux dictation tool has to handle X11 and Wayland, ibus and fcitx, xdotool and wtype. It's non-trivial and most vendors don't bother.
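
To make the fragmentation concrete, here is a minimal sketch of the branching a paste step needs. It assumes the transcript is already on the clipboard and that the session type is known (for example from the XDG_SESSION_TYPE environment variable). The function name paste_keystroke_command is hypothetical, and this illustrates the general problem rather than how any particular tool solves it.

```python
def paste_keystroke_command(session_type):
    """Return an argv list that sends Ctrl+V in the given session type.

    Hypothetical helper for illustration only. Assumes the transcript is
    already on the clipboard and the relevant tool is installed.
    """
    if session_type == "x11":
        # xdotool synthesizes key events through the X server.
        return ["xdotool", "key", "--clearmodifiers", "ctrl+v"]
    if session_type == "wayland":
        # wtype relies on the virtual-keyboard protocol, which
        # wlroots-based compositors such as Sway implement; other
        # compositors may not accept it.
        return ["wtype", "-M", "ctrl", "v", "-m", "ctrl"]
    raise ValueError(f"no paste strategy for session type {session_type!r}")
```

A caller would pass something like os.environ.get("XDG_SESSION_TYPE") and hand the result to subprocess.run. Even this toy version needs a separate branch per display server, before input methods or compositor quirks enter the picture.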

Audio stack churn. ALSA, PulseAudio, PipeWire — capturing microphone input on Linux has been a moving target for years. PipeWire finally stabilized that in 2023-2024, which is part of why this kind of app is feasible now.

What Wspr Does on Linux

Wspr is a native Linux app. It runs the same whisper.cpp engine as on macOS and Windows, with GPU acceleration where available and CPU fallback when it isn't.

The user flow on Linux is the same as everywhere else:

Press your configured global hotkey.
Speak.
Release the hotkey. Wspr transcribes locally and pastes into the focused window.
Optionally trigger an AI rewrite — "make this a bullet list", "translate to French", "fix grammar" — routed to OpenAI, Anthropic, or Gemini using your own API key.
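
The press, speak, release flow above can be sketched as a small push-to-talk state machine. This is an illustration under assumptions, not Wspr's actual code: the class name PushToTalk and its methods are hypothetical, and transcribe and paste are injected stand-ins for the real speech engine and the paste step.

```python
class PushToTalk:
    """Buffer audio while the hotkey is held, then transcribe and paste."""

    def __init__(self, transcribe, paste):
        self.transcribe = transcribe  # callable: bytes -> str (stand-in)
        self.paste = paste            # callable: str -> None (stand-in)
        self.recording = False
        self._chunks = []

    def on_hotkey_press(self):
        # Start a fresh recording each time the hotkey goes down.
        self._chunks = []
        self.recording = True

    def on_audio(self, chunk):
        # Audio that arrives while the hotkey is up is discarded.
        if self.recording:
            self._chunks.append(chunk)

    def on_hotkey_release(self):
        # A release without a matching press does nothing.
        if not self.recording:
            return
        self.recording = False
        text = self.transcribe(b"".join(self._chunks))
        if text:
            self.paste(text)
```

Keeping the engine and the paste step as injected callables means the hotkey logic can be exercised without a microphone or a display server.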

It also does file transcription: drop an audio or video file in, get a transcript out. Good for meeting recordings, voice memos, and podcast clips.
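
For comparison, the do-it-yourself version of file transcription with stock tools looks roughly like the sketch below. It is not how Wspr implements the feature. It assumes ffmpeg and a recent whisper.cpp build (whose command-line binary is named whisper-cli; older builds call it main) are on your PATH, and the function name and all paths are hypothetical.

```python
def transcription_commands(media_path, wav_path, model_path):
    """Build the two commands for a manual whisper.cpp transcription.

    Hypothetical helper; every path is supplied by the caller.
    """
    # whisper.cpp expects 16 kHz, mono, 16-bit PCM WAV input.
    to_wav = [
        "ffmpeg", "-y", "-i", media_path,
        "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le",
        wav_path,
    ]
    # -otxt asks whisper-cli to write a plain-text transcript file.
    transcribe = ["whisper-cli", "-m", model_path, "-f", wav_path, "-otxt"]
    return [to_wav, transcribe]
```

Each returned list can be passed to subprocess.run(cmd, check=True) in order. A packaged app hides this conversion step behind a drag-and-drop target.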

Distro and Desktop Support

Wspr ships as a Linux build for x86_64 systems. It's been validated on mainstream distros including Ubuntu and its derivatives — other recent glibc-based distros should work too, though your mileage may vary depending on your desktop environment's input-injection behavior.

Session support, plainly:

X11 sessions: fully supported and tested daily.
Wayland sessions running XWayland (the Ubuntu/GNOME/KDE default): supported via XWayland. Wspr forces GDK_BACKEND=x11 and pastes via xdotool against the XWayland server.
Pure-Wayland setups (Sway, Hyprland with no XWayland): not supported yet, because paste depends on xdotool today.

Native Wayland support (libei / wlr-virtual-keyboard) is on the roadmap.
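
If you want to check which of these cases your own session falls into, a rough heuristic is to look at the standard display environment variables. The sketch below assumes DISPLAY is set whenever an X or XWayland server is reachable and that WAYLAND_DISPLAY or XDG_SESSION_TYPE identifies a Wayland session. The function name classify_session and the category labels are hypothetical, and this is not necessarily the check Wspr itself performs.

```python
def classify_session(env):
    """Classify a desktop session from its environment variables.

    Heuristic sketch: env is a mapping such as os.environ.
    """
    has_wayland = bool(env.get("WAYLAND_DISPLAY")) or (
        env.get("XDG_SESSION_TYPE") == "wayland"
    )
    has_x = bool(env.get("DISPLAY"))
    if has_wayland and has_x:
        return "wayland+xwayland"
    if has_wayland:
        return "wayland-only"
    if has_x:
        return "x11"
    return "unknown"


# Support status per category, as described in the text above.
SUPPORT = {
    "x11": "fully supported",
    "wayland+xwayland": "supported via XWayland",
    "wayland-only": "not supported yet",
    "unknown": "no graphical session detected",
}
```

Running SUPPORT[classify_session(os.environ)] in a terminal inside your desktop session gives a quick answer before you download anything.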

Audio capture goes through cpal, which speaks whatever audio system your distro exposes — PipeWire on modern setups, PulseAudio on older ones. If you run into a compatibility issue on a specific distro or compositor, open a GitHub issue and we'll look at it.

Privacy: This Is the Part Linux Users Care About

A big reason people use Linux in the first place is that they don't want a vendor-controlled pipe between them and their computer. A dictation app that streams every word to a cloud service undoes that in one install.

Wspr never sends audio anywhere. The Whisper model lives on your disk, runs on your GPU (or on your CPU when no GPU is available), and produces text that's pasted directly into your focused application. No telemetry, no account, no cloud dependency for core transcription. AI rewriting is the only step that optionally calls out to a provider, and it sends only the text you explicitly rewrite, using your own API key, routed directly from your machine to the provider you chose.

Performance

Whisper "small" and "medium" models run comfortably in real time on a mid-range Linux laptop with integrated graphics. On a discrete GPU, you can run "large-v3" and get near-instant transcription of multi-minute recordings. Wspr downloads models on first run; you pick the one that fits your hardware.
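
As a rough guide to picking a model, the sketch below chooses the largest model whose approximate memory footprint fits a given budget. The figures are ballpark numbers in line with those published in the whisper.cpp README and should be treated as assumptions; actual usage varies with quantization, backend, and build. The function name pick_model is hypothetical and this is not Wspr's own selection logic.

```python
# Approximate runtime memory per model in MB (assumed ballpark values).
APPROX_MEMORY_MB = [
    ("tiny", 273),
    ("base", 388),
    ("small", 852),
    ("medium", 2100),
    ("large-v3", 3900),
]


def pick_model(budget_mb):
    """Return the largest model that fits in budget_mb, or None."""
    best = None
    for name, needed_mb in APPROX_MEMORY_MB:  # ordered smallest to largest
        if needed_mb <= budget_mb:
            best = name
    return best
```

With these numbers, a machine that can spare about 1 GB lands on "small", while a discrete GPU with several gigabytes free has room for "large-v3".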

Pricing

$14.99. Once. That's the whole pricing page.

No subscription, no account, no feature tiers that gate normal functionality. The free download includes 50 live transcriptions so you can test that it works on your exact setup before paying. Pro is a one-time unlock that removes that cap permanently.

Try It

Linux has needed a real dictation app for a long time. Wspr is that app. It's private, it runs Whisper locally with GPU acceleration where available, and it costs $14.99 once. Grab the free build and see if it fits your workflow.

