
Parakeet-rs + Tauri 2.x Integration Patterns — Streaming STT, Sidecar Builds, and Metal Acceleration on macOS


Executive Summary

This report provides implementation guidance for integrating parakeet-rs (Rust NVIDIA Parakeet ASR via ONNX) into Remindr’s Tauri 2.x desktop app. It covers the parakeet-rs API surface, Tauri sidecar packaging patterns, GPU acceleration options on macOS Apple Silicon, memory management for long-running STT sessions, and fallback strategies for >4 speaker diarization via FluidAudio.

Key findings:

  1. parakeet-rs v0.2.8 supports streaming STT with 560ms chunks, Sortformer v2.1 diarization, and multitalker ASR
  2. CoreML EP is broken for Parakeet models (GitHub issue #26355) — use WebGPU (Metal under the hood) or CPU on macOS
  3. CPU-only parakeet-rs on M3 is already significantly faster than Whisper with Metal — GPU acceleration is a nice-to-have, not a requirement
  4. Tauri 2 sidecar approach is overkill — parakeet-rs is a Rust library, integrate directly via Tauri commands
  5. FluidAudio (Swift/CoreML) provides >4 speaker diarization via LS-EEND as XPC fallback

1. Parakeet-rs Architecture & API Surface

1.1 Overview

| Property | Value |
|---|---|
| Crate | parakeet-rs v0.2.8 (crates.io) |
| Backend | ONNX Runtime via ort crate |
| Models | NVIDIA Parakeet (CTC, TDT, EOU) + Sortformer v2/v2.1 diarization |
| License | MIT (library) + CC-BY-4.0 (NVIDIA ONNX models) |
| Streaming | Yes — cache-aware stateful streaming in 560ms chunks |
| Diarization | Sortformer v2.1 — streaming 4-speaker attribution |
| Multitalker | Speaker-attributed transcription with speaker IDs |
| Platforms | macOS (CPU, WebGPU), Linux (CPU, CUDA, TensorRT), Windows (CPU, DirectML, CUDA) |

1.2 Core API

Batch transcription (full audio file):

use parakeet_rs::{Transcriber, TranscriberConfig, ModelType, ExecutionProvider};

let config = TranscriberConfig::default()
    .with_model(ModelType::ParakeetTDT0_6B_V3)
    .with_execution_provider(ExecutionProvider::CPU); // or WebGPU

let transcriber = Transcriber::new(config)?;
let result = transcriber.transcribe("meeting.wav")?;
// result.text, result.segments (with timestamps)

Streaming transcription (real-time):

use parakeet_rs::streaming::{StreamingTranscriber, StreamingConfig};
use parakeet_rs::ModelType;

let config = StreamingConfig::default()
    .with_model(ModelType::NemotronStreaming)
    .with_chunk_duration_ms(560); // cache-aware chunks

let mut streamer = StreamingTranscriber::new(config)?;

// Feed audio chunks as they arrive
loop {
    let audio_chunk = capture_audio(560)?; // 560ms PCM
    let partial = streamer.transcribe_chunk(&audio_chunk)?;
    // partial.text — intermediate result
    // partial.is_final — true when sentence boundary detected
}
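In practice, audio capture callbacks (e.g. from cpal) rarely deliver exactly 560ms at a time, so the loop above usually sits behind a small accumulator that regroups arbitrary callback sizes into fixed 8,960-sample chunks (560ms at 16kHz mono). A minimal std-only sketch — the `ChunkAccumulator` type is illustrative, not part of the parakeet-rs API:

```rust
/// Regroups arbitrarily sized audio callbacks into fixed-size chunks.
/// 560 ms at 16 kHz mono = 8_960 samples.
pub struct ChunkAccumulator {
    buf: Vec<f32>,
    chunk_len: usize,
}

impl ChunkAccumulator {
    pub fn new(chunk_len: usize) -> Self {
        Self { buf: Vec::with_capacity(chunk_len * 2), chunk_len }
    }

    /// Appends samples; returns zero or more complete chunks,
    /// keeping any remainder buffered for the next call.
    pub fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.buf.extend_from_slice(samples);
        let mut chunks = Vec::new();
        while self.buf.len() >= self.chunk_len {
            chunks.push(self.buf.drain(..self.chunk_len).collect());
        }
        chunks
    }
}
```

Each complete chunk can then be handed to `transcribe_chunk` as in the loop above.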

Speaker diarization (streaming):

use parakeet_rs::diarization::{Diarizer, DiarizeConfig, DiarizeModel};

let config = DiarizeConfig::default()
    .with_model(DiarizeModel::SortformerV2_1)
    .with_max_speakers(4);

let mut diarizer = Diarizer::new(config)?;

// Stream chunks for real-time diarization
let result = diarizer.diarize_chunk(&audio_chunk)?;
// result.segments: Vec<DiarizeSegment> { speaker_id, start, end }

Multitalker (combined STT + diarization):

use parakeet_rs::multitalker::{MultitalkerTranscriber, MultitalkerConfig};

let config = MultitalkerConfig::default()
    .with_streaming(true);

let mut mt = MultitalkerTranscriber::new(config)?;
let result = mt.transcribe_chunk(&audio_chunk)?;
// result.utterances: Vec<Utterance> { speaker_id, text, start, end }

1.3 Model Downloads

Models are hosted on HuggingFace (NVIDIA). Total download size for recommended set:

| Model | Purpose | Size | Quantized (int8) |
|---|---|---|---|
| parakeet-tdt-0.6b-v3 | Batch transcription | ~600MB | ~200MB |
| nemotron-streaming | Streaming STT | ~400MB | ~150MB |
| sortformer-v2.1 | Diarization (4 speakers) | ~300MB | ~120MB |
| multitalker-streaming-0.6b | Combined STT+diarization | ~600MB | ~250MB |

Recommendation for Remindr: Ship quantized (int8) models to minimize app size. Offer full-precision download as optional “HD transcription” upgrade. Total initial download: ~470MB (int8 tdt + nemotron + sortformer).


2. Integration Architecture: Library vs. Sidecar

2.1 Option A: Direct Library Integration

Since parakeet-rs is a Rust crate and Tauri’s backend is Rust, the simplest approach is direct integration via Tauri commands:

┌──────────────────────────────────────┐
│           Tauri App Process          │
│  ┌────────────┐  ┌────────────────┐  │
│  │  WebView   │  │  Rust Backend  │  │
│  │ (React UI) │──│  - Tauri cmds  │  │
│  │            │  │  - parakeet-rs │  │
│  │            │  │  - audio cap   │  │
│  └────────────┘  └────────────────┘  │
└──────────────────────────────────────┘

Pros:

  • No IPC overhead — direct function calls
  • Shared memory — no serialization for audio buffers
  • Single process — simpler lifecycle management
  • Tauri command system handles JS↔Rust communication
  • Smaller bundle — no separate binary

Cons:

  • ONNX model loading blocks Tauri’s async runtime if not careful
  • Large models increase memory footprint of main process
  • Crash in ONNX runtime takes down entire app

Implementation:

// src-tauri/src/transcription.rs
use parakeet_rs::streaming::{StreamingConfig, StreamingTranscriber};
use parakeet_rs::ModelType;
use tauri::State;
use std::sync::Arc;
use tokio::sync::Mutex;

// TranscriptionResult must derive serde::Serialize so Tauri can
// send it back across the command boundary to the WebView.
pub struct TranscriptionState {
    transcriber: Arc<Mutex<Option<StreamingTranscriber>>>,
}

#[tauri::command]
async fn start_transcription(
    state: State<'_, TranscriptionState>,
) -> Result<(), String> {
    let config = StreamingConfig::default()
        .with_model(ModelType::NemotronStreaming);
    let transcriber = StreamingTranscriber::new(config)
        .map_err(|e| e.to_string())?;
    *state.transcriber.lock().await = Some(transcriber);
    Ok(())
}

#[tauri::command]
async fn process_audio_chunk(
    state: State<'_, TranscriptionState>,
    audio: Vec<f32>,
) -> Result<TranscriptionResult, String> {
    let mut guard = state.transcriber.lock().await;
    let transcriber = guard.as_mut().ok_or("Not started")?;
    transcriber.transcribe_chunk(&audio)
        .map_err(|e| e.to_string())
}

2.2 Option B: Sidecar Process

Use this only if crash isolation is critical or if needing to support non-Rust ML backends (e.g., Python):

┌──────────────────┐     IPC (stdio)      ┌──────────────────┐
│    Tauri App     │ ←──────────────────→ │   STT Sidecar    │
│  (UI + control)  │    JSON messages     │  (parakeet-rs)   │
└──────────────────┘                      └──────────────────┘

Tauri 2 sidecar config:

// tauri.conf.json
{
  "bundle": {
    "externalBin": [
      "binaries/remindr-stt"
    ]
  }
}

Binary naming convention:

binaries/
  remindr-stt-aarch64-apple-darwin    # Apple Silicon
  remindr-stt-x86_64-apple-darwin     # Intel Mac
  remindr-stt-x86_64-pc-windows-msvc  # Windows

IPC pattern (JSON over stdio):

// Sidecar reads stdin, writes stdout
// Frontend uses tauri_plugin_shell::ShellExt
let (mut rx, mut child) = app
    .shell()
    .sidecar("remindr-stt")?
    .args(["--streaming", "--model", "nemotron"])
    .spawn()?;

// Send audio chunks as newline-delimited base64 JSON
let mut msg = serde_json::to_vec(&AudioChunk { data: base64_audio })?;
msg.push(b'\n');
child.write(&msg)?;

// Receive transcription results
while let Some(event) = rx.recv().await {
    match event {
        CommandEvent::Stdout(line) => {
            let result: TranscriptionResult = serde_json::from_slice(&line)?;
            // Forward to frontend
        }
        _ => {}
    }
}

When to choose sidecar over direct integration:

  • Only if crash isolation is required (ONNX segfaults shouldn’t kill app)
  • Only if you need to support multiple ML backends (parakeet-rs + whisper.cpp)
  • Only if binary size is a concern (lazy-load the sidecar on first use)

2.3 Recommendation for Remindr

Use direct Rust integration (Option A) because:

  1. parakeet-rs is pure Rust — no FFI boundary concerns
  2. Audio buffer sharing without serialization saves ~50% CPU on chunk processing
  3. Tauri’s async command system already handles thread management
  4. ONNX Runtime is stable enough that crash isolation is unnecessary
  5. Simpler build pipeline — no separate sidecar compilation step

3. GPU Acceleration on macOS Apple Silicon

3.1 Execution Provider Options

| Provider | macOS Support | Performance | Status |
|---|---|---|---|
| CPU | Full | Baseline — already fast on M-series | Stable, recommended |
| CoreML | Broken | N/A | GitHub issue #26355 — model inference fails |
| WebGPU | Via Metal | ~1.5-2x over CPU | Experimental but functional |
| Metal (direct) | Not in ort | N/A | Not available via ONNX Runtime |

3.2 Practical Guidance

CPU is the safe default. Parakeet-rs on CPU with Apple Silicon M3 is already significantly faster than Whisper with Metal acceleration. The architecture is optimized for ONNX inference on modern CPUs.

WebGPU as opt-in enhancement:

// Enable WebGPU (Metal under the hood on macOS)
let config = TranscriberConfig::default()
    .with_execution_provider(ExecutionProvider::WebGPU);

Note: WebGPU EP requires ort feature flag webgpu and may have compatibility issues across macOS versions. Test thoroughly.

CoreML is not viable (as of March 2026):

  • ONNX Runtime issue #26355 documents that Parakeet CTC models fail during inference with CoreML EP
  • The model loads successfully but crashes during actual inference
  • parakeet-rs maintainers explicitly recommend CPU or WebGPU instead

3.3 Memory Usage Patterns

| Configuration | Peak RAM | Sustained RAM | Notes |
|---|---|---|---|
| Streaming (int8, CPU) | ~400MB | ~250MB | Model + audio buffers |
| Streaming (fp32, CPU) | ~800MB | ~500MB | Full precision |
| Batch (int8, CPU) | ~500MB | ~350MB | Higher peak during long files |
| Streaming + Diarization (int8) | ~600MB | ~400MB | Two models loaded |

For a 1-hour meeting transcription:

  • Audio buffer cycling keeps memory stable (no accumulation)
  • Cache-aware streaming reuses encoder states — no memory growth per chunk
  • Sortformer diarization state is bounded (4 speakers max)
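The “no accumulation” property can also be enforced explicitly for any history the app itself keeps (e.g. raw audio retained for a final batch pass). A std-only sketch of a bounded buffer — `BoundedHistory` is illustrative, not a parakeet-rs type:

```rust
use std::collections::VecDeque;

/// Keeps at most `cap` recent audio chunks so app-side memory stays
/// bounded regardless of session length.
pub struct BoundedHistory {
    chunks: VecDeque<Vec<f32>>,
    cap: usize,
}

impl BoundedHistory {
    pub fn new(cap: usize) -> Self {
        Self { chunks: VecDeque::with_capacity(cap), cap }
    }

    pub fn push(&mut self, chunk: Vec<f32>) {
        if self.chunks.len() == self.cap {
            self.chunks.pop_front(); // drop oldest, keep memory flat
        }
        self.chunks.push_back(chunk);
    }

    pub fn len(&self) -> usize {
        self.chunks.len()
    }
}
```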

4. Streaming Architecture for Remindr

4.1 Audio Pipeline

┌──────────┐    ┌──────────┐    ┌────────────┐    ┌───────────┐
│  System  │───→│  Audio   │───→│ parakeet-rs│───→│  Frontend │
│  Audio   │    │  Capture │    │  Streaming │    │  (React)  │
│  (mic +  │    │  (cpal)  │    │  Transcr.  │    │  Live UI  │
│  system) │    │  16kHz   │    │  560ms     │    │           │
└──────────┘    └──────────┘    └────────────┘    └───────────┘
                    │                │
                    │           ┌────────────┐
                    └──────────→│ Sortformer │
                                │ Diarizer   │
                                │ (parallel) │
                                └────────────┘

4.2 Chunk Processing Flow

  1. Audio capture via cpal crate — 16kHz mono PCM f32
  2. Buffer accumulation — collect 560ms chunks (8,960 samples at 16kHz)
  3. Parallel processing:
    • Feed chunk to streaming transcriber → partial text
    • Feed chunk to Sortformer diarizer → speaker segments
  4. Merge — align transcription with speaker IDs by timestamp
  5. Emit to frontend via Tauri event system

// Emit partial results to frontend (Tauri 2 requires `use tauri::Emitter;`)
app.emit("transcription:partial", &PartialResult {
    text: partial.text,
    speaker_id: speaker.id,
    timestamp: chunk_start,
    is_final: partial.is_final,
})?;
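Step 4 of the flow above (aligning text with speaker IDs) can be done with a midpoint lookup: each transcription segment is assigned the speaker whose diarization segment covers the segment’s midpoint. A std-only sketch, with simplified stand-ins for the parakeet-rs result structs:

```rust
#[derive(Debug, Clone)]
pub struct TextSeg { pub text: String, pub start: f32, pub end: f32 }

#[derive(Debug, Clone)]
pub struct SpeakerSeg { pub speaker_id: u32, pub start: f32, pub end: f32 }

/// Assigns each text segment the speaker whose diarization segment
/// contains its midpoint; falls back to speaker 0 when nothing overlaps.
pub fn merge(text: &[TextSeg], speakers: &[SpeakerSeg]) -> Vec<(u32, String)> {
    text.iter()
        .map(|t| {
            let mid = (t.start + t.end) / 2.0;
            let id = speakers
                .iter()
                .find(|s| s.start <= mid && mid < s.end)
                .map(|s| s.speaker_id)
                .unwrap_or(0);
            (id, t.text.clone())
        })
        .collect()
}
```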

4.3 Session Lifecycle

User clicks "Start Recording"
  → Initialize StreamingTranscriber + Diarizer (async, ~2-3s)
  → Begin audio capture
  → Process chunks in loop (560ms cadence)
  → Emit partial results via Tauri events
  → User clicks "Stop Recording"
  → Flush remaining audio buffer
  → Run final batch pass for accuracy correction (optional)
  → Save transcript to SQLite
  → Release models from memory

Important: Model initialization is expensive (~2-3 seconds). Pre-load models at app startup in background thread, not on first recording. Use tokio::spawn to avoid blocking Tauri’s event loop.
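The preload pattern can be sketched with std threads and a `OnceLock` (in the actual app you would wrap the parakeet-rs load call and dispatch via `tokio::spawn` or `tauri::async_runtime` as noted above; `load_model` here is a stand-in for the expensive initialization):

```rust
use std::sync::OnceLock;
use std::thread;

static MODEL: OnceLock<String> = OnceLock::new();

/// Stand-in for the ~2-3s StreamingTranscriber::new(...) call.
fn load_model() -> String {
    "loaded-model".to_string()
}

/// Kick off loading at app startup; returns immediately.
pub fn preload() -> thread::JoinHandle<()> {
    thread::spawn(|| {
        let _ = MODEL.set(load_model());
    })
}

/// By the time the user clicks "Start Recording", the model is
/// usually already resident.
pub fn model_ready() -> bool {
    MODEL.get().is_some()
}
```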


5. Fallback Strategy: >4 Speaker Diarization

5.1 The 4-Speaker Limit

Sortformer v2.1 (used by parakeet-rs) supports maximum 4 speakers. For meetings with >4 participants, Remindr needs a fallback.

5.2 FluidAudio Overview

| Property | Value |
|---|---|
| Library | FluidAudio (Swift Package) |
| Model | LS-EEND (Long-Short End-to-End Neural Diarization) |
| Speaker limit | Higher capacity than Sortformer (varies by model config) |
| Platform | macOS 14.0+, Apple Silicon |
| Acceleration | Apple Neural Engine (ANE) — not GPU/MPS |
| Integration | XPC Service from Tauri (Swift process) |

5.3 XPC Service Architecture

Since FluidAudio is Swift-only, integrate via macOS XPC service:

┌──────────────────────┐     XPC      ┌────────────────────┐
│   Tauri App          │ ←─────────→  │  FluidAudio XPC    │
│   (Rust + WebView)   │  Mach IPC    │  (Swift service)   │
│                      │              │  - LS-EEND diarize │
│   parakeet-rs for    │              │  - ANE accelerated │
│   STT (always)       │              │  - >4 speakers     │
└──────────────────────┘              └────────────────────┘

Decision flow:

  1. Start recording → detect number of speakers after first 30 seconds
  2. If ≤4 speakers → use parakeet-rs Sortformer (in-process, fast)
  3. If >4 speakers → switch to FluidAudio XPC for diarization only
  4. STT always uses parakeet-rs regardless of diarization backend
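The decision flow reduces to a small routing function; a sketch (the enum and function names are illustrative, not from either library):

```rust
#[derive(Debug, PartialEq)]
pub enum DiarizeBackend {
    /// parakeet-rs Sortformer: in-process, fast, up to 4 speakers.
    Sortformer,
    /// FluidAudio LS-EEND over XPC: for larger meetings.
    FluidAudioXpc,
}

/// Picks the diarization backend from the speaker count estimated over
/// the first ~30 seconds. STT stays on parakeet-rs either way.
pub fn pick_backend(estimated_speakers: usize) -> DiarizeBackend {
    if estimated_speakers <= 4 {
        DiarizeBackend::Sortformer
    } else {
        DiarizeBackend::FluidAudioXpc
    }
}
```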

5.4 Alternative: parakeet.cpp

A new alternative worth watching: parakeet.cpp (C++ implementation with native Metal/MPS acceleration). If it matures, it could replace the ONNX Runtime dependency entirely and provide native Metal acceleration without CoreML issues. Currently early-stage (March 2026).


6. Build & Distribution

6.1 Model Distribution Strategy

Don’t bundle models in the app binary. Instead:

  1. Ship app without models (~15MB DMG)
  2. On first launch, show “Downloading AI models…” progress
  3. Download from Cloudflare R2 (or HuggingFace CDN): ~470MB for int8 set
  4. Cache in ~/Library/Application Support/Remindr/models/
  5. Verify checksums on download
  6. Allow model updates independent of app updates
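Steps 3–6 imply a small manifest the app fetches before downloading, so checksums and versions live outside the app binary. A hypothetical example — the filenames, URLs, and sizes are placeholders, not real endpoints:

```json
{
  "manifest_version": 1,
  "models": [
    {
      "name": "nemotron-streaming-int8",
      "url": "https://models.example.com/nemotron-streaming-int8.onnx",
      "sha256": "<checksum>",
      "size_bytes": 157286400
    },
    {
      "name": "sortformer-v2.1-int8",
      "url": "https://models.example.com/sortformer-v2.1-int8.onnx",
      "sha256": "<checksum>",
      "size_bytes": 125829120
    }
  ]
}
```

Publishing a new manifest version lets model updates ship independently of app updates (step 6).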

6.2 Build Configuration

# src-tauri/Cargo.toml
[dependencies]
parakeet-rs = { version = "0.2", features = ["streaming", "diarization"] }
ort = { version = "2.0", features = ["load-dynamic"] }
cpal = "0.15"  # Audio capture

# For WebGPU acceleration (optional)
# ort = { version = "2.0", features = ["load-dynamic", "webgpu"] }

Use load-dynamic for ort — this avoids bundling the full ONNX Runtime binary at compile time. Instead, ship the appropriate libonnxruntime.dylib alongside the app and load at runtime.

6.3 Code Signing & Notarization

ONNX Runtime dynamic library needs to be signed alongside the Tauri app:

  1. Sign libonnxruntime.dylib with your Apple Developer certificate
  2. Include in tauri.conf.json bundle resources
  3. Notarize the full .app bundle including the dylib
  4. See existing Tauri distribution report for full notarization workflow

7. Migration Path from pyannote-rs

7.1 Current State (pyannote-rs)

  • Speaker diarization via pyannote segmentation model
  • Separate from STT pipeline
  • Known issues: slower than Sortformer, complex setup

7.2 Migration Steps

| Step | Action | Risk |
|---|---|---|
| 1 | Add parakeet-rs dependency, keep pyannote-rs | Low — additive |
| 2 | Implement streaming STT with parakeet-rs (replace Whisper) | Medium — core pipeline change |
| 3 | Implement Sortformer diarization alongside pyannote | Low — parallel test |
| 4 | A/B test diarization quality (Sortformer vs pyannote) | None |
| 5 | Remove pyannote-rs dependency if Sortformer matches/exceeds | Low |
| 6 | Add FluidAudio XPC fallback for >4 speakers | Medium — XPC setup |

7.3 Rollback Strategy

Keep pyannote-rs as a feature flag during migration:

[features]
default = ["parakeet-stt", "sortformer-diarize"]
legacy-diarize = ["pyannote-rs"]  # Fallback
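With those features in place, call sites can route at compile time via `cfg!`; a sketch (the function and backend names are illustrative):

```rust
/// Returns the active diarization backend based on Cargo features.
/// With the default feature set this resolves to "sortformer"; building
/// with `--features legacy-diarize` switches back to pyannote.
pub fn diarize_backend() -> &'static str {
    if cfg!(feature = "legacy-diarize") {
        "pyannote"
    } else {
        "sortformer"
    }
}
```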

8. Performance Benchmarks (Expected)

Based on parakeet-rs documentation and community reports:

| Scenario | Hardware | Real-time Factor | Notes |
|---|---|---|---|
| Streaming STT (int8) | M3 Pro, CPU | ~0.05x | 560ms chunk processed in ~28ms |
| Streaming STT (fp32) | M3 Pro, CPU | ~0.08x | Still well within real-time |
| Streaming STT (int8) | M3 Pro, WebGPU | ~0.03x | ~40% faster than CPU |
| Batch STT (1hr audio) | M3 Pro, CPU | ~0.04x | ~2.4 minutes for 1 hour |
| Sortformer diarize | M3 Pro, CPU | ~0.06x | Per-chunk, streaming |
| Combined STT+diarize | M3 Pro, CPU | ~0.10x | Parallel processing |

All scenarios are well within real-time requirements. CPU-only is the safe default.
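Real-time factor here is compute time divided by audio duration (RTF < 1.0 means faster than real time), which makes the table rows easy to sanity-check:

```rust
/// Real-time factor: seconds of compute per second of audio.
/// e.g. a 560 ms chunk processed in 28 ms gives 28 / 560 = 0.05.
pub fn rtf(processing_ms: f64, audio_ms: f64) -> f64 {
    processing_ms / audio_ms
}
```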

