Changelog
All notable changes to asr2clip are documented here.
0.4.0 (Unreleased)
Added
- Local ASR server: optional offline transcription powered by sherpa-onnx, with OpenAI-compatible API (`asr2clip-serve` / `asr2clip --serve`)
- Model registry: YAML-based model management (`models.yaml`) supporting multiple model types (sense_voice, whisper, paraformer, transducer) with lazy loading and auto-download
- Multi-model routing: the `model` API parameter selects which engine to use; models are loaded on first request
- Per-request parameters: `language`, `prompt`, and `temperature` are passed through to the engine where the model supports them; unsupported parameters are silently accepted
- Language-specific recognizer caching: LRU cache of recognizer instances for per-request language hints (configurable cache size, default 3)
- SSE streaming: `stream=true` parameter returns Server-Sent Events (`transcript.text.delta`, `transcript.text.done`, `[DONE]`)
- `--download-model` option to pre-download the default model
- `--host` / `--port` / `--config` options for local ASR server configuration
- CI pipeline with ruff, ty, and complexipy checks
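The lazy loading and language-specific recognizer caching entries above can be illustrated with a small LRU cache. This is a minimal sketch, not asr2clip's actual code: the `RecognizerCache` class, its `factory` callback, and the `(model, language)` key shape are hypothetical names chosen for illustration. Only the default size of 3 comes from the entry above.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class RecognizerCache(Generic[K, V]):
    """Hypothetical LRU cache of recognizer instances, e.g. keyed by (model, language)."""

    def __init__(self, factory: Callable[[K], V], max_size: int = 3) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._factory = factory
        self._max_size = max_size
        self._items: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> V:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = self._factory(key)  # lazy: built only on first request
        self._items[key] = value
        if len(self._items) > self._max_size:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value

    def keys(self) -> list:
        """Cached keys, ordered from least to most recently used."""
        return list(self._items)
```

With `max_size=3`, requesting a fourth distinct key evicts whichever entry was used least recently, so a frequently requested language keeps its recognizer loaded.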
Changed
- Updated vendored `httpclient` module from 0.3.1 to 0.4.1 (zerodep)
- Zero external dependencies: replaced PyYAML with a vendored YAML parser and httpx/requests with a vendored HTTP client; core install now pulls only numpy, sounddevice, pydub, and copykitten
- Clipboard library: replaced pyperclip with copykitten (Rust-based, no external tools like xclip/wl-clipboard needed)
- Wayland clipboard: on Wayland sessions, prefers `wl-copy` for proper clipboard manager integration (e.g. KDE Klipper); falls back to copykitten on X11 or when wl-copy is unavailable
- Minimum Python version raised from 3.8 to 3.10
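The Wayland clipboard fallback described above amounts to a small decision rule. The sketch below is an assumption about how such a rule could look, not the project's implementation: the function name `pick_clipboard_backend` is hypothetical, and detecting Wayland through the `XDG_SESSION_TYPE` and `WAYLAND_DISPLAY` environment variables is a common convention rather than something this changelog confirms.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def pick_clipboard_backend(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return "wl-copy" on Wayland when the tool is installed, else "copykitten"."""
    env = os.environ if env is None else env
    # Assumed detection: either variable indicates a Wayland session.
    on_wayland = (
        env.get("XDG_SESSION_TYPE", "").lower() == "wayland"
        or bool(env.get("WAYLAND_DISPLAY"))
    )
    if on_wayland and which("wl-copy") is not None:
        return "wl-copy"
    # X11 sessions, or Wayland without wl-copy on PATH.
    return "copykitten"
```

Passing `env` and `which` as parameters keeps the rule testable without touching the real session or `PATH`.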
Fixed
- `-i` flag now correctly triggers file transcription instead of entering continuous recording mode
0.3.8
Added
- Voice Activity Detection (VAD) with `--vad` flag for automatic transcription on silence
- Multi-feature VAD: combines RMS energy, zero-crossing rate, and speech-band frequency ratio for robust detection
- Adaptive threshold: real-time adjustment to ambient noise (enabled by default with `--vad`)
- Ambient noise calibration: `--calibrate` measures environment noise and suggests threshold
- Continuous recording mode: `--vad` and `--interval` for long sessions (meetings, lectures)
- Async transcription: transcription runs in background with ordered output
- Automatic retry on timeout — configurable retry count and delay for API calls
- Structured logging with ANSI color support and colored status indicators
- Double Ctrl+C to force exit in continuous mode (single Ctrl+C transcribes remaining audio first)
- Auto-calibrate ambient noise level on startup
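To make the multi-feature VAD and adaptive threshold entries above concrete, here is a rough sketch of the three features and one way to combine them. Everything specific in it is an assumption for illustration: the function names, the 300 to 3400 Hz speech band, the threshold values, and the moving-average noise update are not taken from asr2clip. It uses a naive DFT from the standard library to stay self-contained; a real implementation would more likely use numpy's FFT.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def speech_band_ratio(frame: Sequence[float], sample_rate: float,
                      low_hz: float = 300.0, high_hz: float = 3400.0) -> float:
    """Share of spectral energy inside [low_hz, high_hz] (assumed band), via a naive DFT."""
    n = len(frame)
    total = band = 0.0
    for k in range(1, n // 2):  # skip the DC bin
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        power = re * re + im * im
        total += power
        if low_hz <= k * sample_rate / n <= high_hz:
            band += power
    return band / total if total > 0 else 0.0


def is_speech(frame: Sequence[float], sample_rate: float, noise_rms: float,
              margin: float = 3.0, max_zcr: float = 0.35, min_band: float = 0.5) -> bool:
    """All three features must agree; the thresholds are illustrative only."""
    return (
        rms(frame) > margin * noise_rms
        and zero_crossing_rate(frame) < max_zcr
        and speech_band_ratio(frame, sample_rate) > min_band
    )


def update_noise_floor(noise_rms: float, frame_rms: float, alpha: float = 0.05) -> float:
    """Exponential moving average, intended to run on non-speech frames only."""
    return (1 - alpha) * noise_rms + alpha * frame_rms
```

Requiring agreement between energy, zero-crossing rate, and band ratio is what rejects loud but non-speech input such as a high-pitched tone, which passes the energy check alone.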
Changed
- Modularized codebase into separate modules (audio, config, output, transcribe, vad, utils)
- Simplified CLI and improved config management
Fixed
- Handle multi-dimensional audio arrays in WAV writing
- Skip silence check when VAD confirms speech
0.3.7
Added
- `--version` / `-v` option to display program version
- `--edit` / `-e` option to open config file in default editor
- `--test` command for comprehensive configuration testing (clipboard, audio, API)
- Audio device selection with `--list_devices` and `--device`
Changed
- Migrated from `setup.py` to `pyproject.toml` with dynamic versioning
- Simplified dependencies by removing unused packages
0.3.6
Added
- `-o FILE` / `--output` option to append transcripts to a file with timestamps
- Transcript output to stdout
0.3.5
Added
- `--generate_config` and `--print_config` for configuration template management
- `org_id` support for OpenAI Organization ID
- Verbose logging with `-q` / `--quiet` toggle
0.3.0
Added
- Input file transcription (`-i FILE`) with ffmpeg-based format conversion
- Support for MP3, WAV, FLAC, OGG, and other ffmpeg-supported formats
Changed
- Renamed project from `asr_to_clipboard` to `asr2clip`
0.2.0
Added
- Continuous recording mode
- Recording duration option
- Configurable audio settings
0.1.0
Added
- Initial release
- Real-time speech recording and transcription
- Clipboard integration
- YAML configuration file support
- OpenAI Whisper API support