March 22, 2026

How to Dictate on Mac Without Sending Your Voice to the Cloud

Your voice is biometric data. Most dictation apps upload it to servers you don't control. Here's how on-device transcription works, what makes it possible, and which apps actually keep your audio local.

The Problem: Your Voice Goes Everywhere

When you use a cloud-based dictation app, here's what happens: your computer records your voice, compresses the audio, sends it over the internet to a remote server, that server processes it with a speech recognition model, and sends text back to your computer. Your raw audio — or a processed version of it — exists on someone else's infrastructure, at least temporarily.

This is how most dictation has worked for years. Google, Amazon, and most third-party dictation apps use this approach. Even Apple's built-in dictation can send audio to Apple's servers depending on your settings and which features you enable.

The privacy implications are real. Your voice is uniquely identifiable biometric data. Dictation captures your unfiltered thoughts — drafts of emails, private messages, journal entries, medical notes, legal communications. If that audio lives on a server, it's subject to that company's data retention policies, potential breaches, and legal requests.

What Changed: GPU Acceleration and Whisper

Two developments made private, on-device dictation practical.

GPU acceleration. Modern GPUs — whether from Apple, NVIDIA, or AMD — can run machine learning models efficiently at near-real-time speed with minimal power consumption. Before this, running high-quality speech models locally meant maxing out your CPU and draining your battery.

OpenAI's Whisper. In 2022, OpenAI released Whisper, an open-source speech recognition model trained on 680,000 hours of multilingual audio. Whisper matches or exceeds commercial cloud speech APIs in accuracy across 99 languages. Because it's open-source, developers can run it locally instead of sending audio to OpenAI's servers.

The combination of powerful on-device hardware and a world-class open model created a new category: dictation apps that never need the internet.

How whisper.cpp Works

whisper.cpp is a high-performance C/C++ implementation of Whisper that runs with GPU acceleration on macOS, Windows, and Linux. It leverages your device's GPU to deliver fast transcription with low power consumption — no cloud connection needed.

When you speak into an app that uses whisper.cpp:

Your device's microphone captures audio
The audio is processed into a spectrogram (a visual representation of sound frequencies)
The Whisper model, running on your GPU, converts the spectrogram into text
The text appears in your app

Every step happens on your device. The audio never touches a network interface. There's no server, no API call, no internet connection required. You could transcribe on a plane with Wi-Fi turned off.

AI Rewriting with OpenAI, Anthropic, or Gemini

Raw transcription output isn't always clean. You might say "um" or repeat yourself. You might want to rephrase something more professionally. This is where AI rewriting comes in.

Most apps that offer AI rewriting route your text through their own servers as a middleman. With Wspr, you connect directly to OpenAI, Anthropic, or Google Gemini using your own API key. Your text goes straight from your device to the provider you choose — no intermediary, no vendor lock-in.

Here's how the pipeline works:

Step 1: Voice captured locally by microphone
Step 2: Speech transcribed locally by whisper.cpp on your GPU
Step 3: Text optionally rewritten by your chosen AI provider using your own API key
Step 4: Final text pasted into your app

Your audio never leaves your device. If you enable AI rewriting, only the transcribed text is sent — directly to the provider you trust, using your own credentials. And rewriting is entirely optional; you can skip it and keep everything 100% local.

What Stays Local vs. What Hits the Cloud

Not all "private" dictation apps are equally private. Here's a breakdown of what data goes where across the major options.

Data Type	Wspr	Wispr Flow	Superwhisper	Apple Dictation
Audio recording	Local only	Sent to cloud	Local only	Sometimes cloud
Speech-to-text processing	Local (whisper.cpp)	Cloud servers	Local (Whisper)	Local or cloud
AI text rewriting	Cloud (your own API key, your choice of provider)	Cloud (proprietary)	Cloud (OpenAI)	N/A
API keys	Stored locally on your device	Managed by vendor	App storage	N/A
Analytics/telemetry	None	Yes	Minimal	Apple telemetry
Account required	No	Yes	No	Apple ID

The key distinction: with Wspr, your audio never leaves your device. If you use AI rewriting, only the transcribed text is sent — directly from your device to the provider's API using your own key. No middleman. No vendor-controlled servers processing your voice. And rewriting is entirely optional.

Why This Matters More Than You Think

Consider what you dictate. Emails to colleagues. Messages to your partner. Journal entries. Medical notes. Legal memos. Creative writing. Business strategy. Job applications.

Cloud dictation services aggregate this data across millions of users. Even with good intentions, that creates a target. Data breaches happen. Subpoenas happen. Policy changes happen. Companies get acquired.

On-device transcription eliminates the entire category of risk. If your audio never leaves your device, it can't be intercepted, breached, subpoenaed from a third party, or used to train someone else's model. The data simply doesn't exist anywhere but your computer.

How to Set Up Private Dictation on Your Computer

Getting started with fully private voice-to-text takes about two minutes:

Download Wspr and open it. No account required. Available on macOS, Windows, and Linux.
Open Settings and download a Whisper model. Start with Small (~460 MB) for the best balance of speed and accuracy.
Set your preferred global hotkey (Ctrl+Shift+Space by default).
For AI rewriting, add your own API key for OpenAI, Anthropic, or Gemini. Or skip this step to keep everything 100% local.
Press your hotkey, speak, and your text appears — transcribed without a single byte of audio leaving your device.

The free tier gives you 50 transcriptions to verify that the quality meets your needs. If it does, Wspr Pro unlocks unlimited everything for a one-time $14.99 purchase via Polar.sh.

Download Wspr Free Buy Pro — $14.99

← More articles