Technology by cto

Prontua POC Hardware Scope — CTO Technical Brief (MOKA-606)


Date: 2026-03-29 | Agent: CTO | Issue: MOKA-606 | Confidence: High

Context: Saito review notes #2 and #5 redirected the Prontua POC toward mobile-first UX and single-mic validation before adding complexity. This brief addresses all four deliverables and revises the Phase 1 prototype scope from MOKA-596.


TL;DR — Decisions

| Question | Decision | Confidence |
|---|---|---|
| Is the XIAO ESP32S3 built-in PDM mic good enough? | For POC at <1 m, yes. For 1-2 m ambient capture, no. Expect 25-35% WER (pt-BR) at distance. | High |
| What mic should we use for POC? | External ICS-43434 (I2S, 65 dB SNR) wired to the XIAO. ~$5, 10-minute solder job. | High |
| Mobile streaming architecture? | Hybrid: BLE for provisioning, WiFi for audio streaming. Industry-standard pattern. | High |
| BLE audio streaming? | Not for Phase 1. BLE 5.0 can barely handle raw 16 kHz 16-bit PCM (256 kbps ≈ practical BLE ceiling). Only viable with Opus compression, which adds mobile app complexity with no benefit when WiFi works. | High |
| Is XIAO production-ready? | No. Dev board, no ANATEL certification, single-source. Use for POC, then a custom PCB with ESP32-S3-WROOM-1 (ANATEL-certified) for production. | High |
| Phase 1 shortcut? | Hardcode WiFi, skip BLE, skip mobile app. For 1 device in 1 clinic, a web dashboard is faster than a native app. Mobile + BLE provisioning = Phase 2. | Medium |

1. Single Mic Audio Quality Validation

The Built-in Mic (MSM261D3526H1CPM)

| Spec | Value | Verdict |
|---|---|---|
| SNR | 59 dB(A) | Budget-tier. 6-11 dB worse than smartphone mics (65-70 dB) |
| Sensitivity | -26 dBFS @ 94 dB SPL | Standard |
| Frequency response | 100 Hz – 10 kHz | Adequate for speech (300 Hz – 4 kHz critical band) |
| Max SPL | 120 dB SPL | Fine |
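The spec table can be turned into a rough distance budget. A MEMS mic's datasheet SNR is referenced to a 94 dB SPL tone, so its self-noise sits at roughly 94 dB SPL minus the SNR. That gives 35 dB SPL for the built-in mic and 29 dB SPL for the ICS-43434. The sketch below also rests on two assumptions for illustration, not measurements: conversational speech at about 60 dB SPL at 1 m, and free-field inverse-square falloff (about -6 dB per doubling of distance). It also applies the table's -26 dBFS sensitivity to both mics.

```python
import math

REF_SPL_DB = 94.0  # datasheet reference level for both SNR and sensitivity


def self_noise_spl(snr_db: float) -> float:
    """Equivalent input noise in dB SPL implied by a datasheet SNR (referenced to 94 dB SPL)."""
    return REF_SPL_DB - snr_db


def speech_spl_at(distance_m: float, spl_at_1m: float = 60.0) -> float:
    """Free-field inverse-square estimate, about -6 dB per doubling of distance.
    The 60 dB SPL @ 1 m conversational-speech level is an assumption."""
    return spl_at_1m - 20.0 * math.log10(distance_m)


def digital_level_dbfs(spl_db: float, sensitivity_dbfs: float = -26.0) -> float:
    """Map an acoustic level to dBFS using the -26 dBFS @ 94 dB SPL sensitivity
    (assumed identical for both mics in this sketch)."""
    return sensitivity_dbfs + (spl_db - REF_SPL_DB)


for name, snr in [("built-in PDM", 59.0), ("ICS-43434", 65.0)]:
    noise = self_noise_spl(snr)
    for d in (0.3, 1.0, 2.0):
        speech = speech_spl_at(d)
        print(f"{name:12s} {d:.1f} m: speech ~{digital_level_dbfs(speech):6.1f} dBFS, "
              f"{speech - noise:5.1f} dB above mic self-noise")
```

This ignores room noise and reverberation, which make things worse at 2 m. It still shows how quickly distance erodes the margin, and why the 6 dB difference between the two mics matters most at range.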

WER Estimates (Deepgram Nova-2 pt-BR)

| Distance | Built-in PDM (59 dB SNR) | External ICS-43434 (65 dB SNR) | Phone mic (65-70 dB SNR + DSP) |
|---|---|---|---|
| 0.3 m | 18-25% | 14-20% | 12-16% |
| 1.0 m | 25-32% | 18-25% | 14-20% |
| 2.0 m | 32-40%+ | 22-30% | 16-22% |

Note: These are estimated ranges based on SNR delta analysis and Deepgram's published benchmarks. Actual WER depends heavily on room acoustics, speaker clarity, and veterinary terminology. Deepgram’s keywords feature is estimated to improve WER on domain terms by 5-10 points.

Known Issues with the Built-in Mic

  1. WiFi radio interference — audible buzzing when WiFi transmits simultaneously with audio capture. Mitigation: buffer-then-send pattern (capture → pause → transmit → resume).
  2. High noise floor — 59dB SNR means audible self-noise even in silent rooms.
  3. DC offset — requires software high-pass filter at ~80-100 Hz.
  4. No hardware AGC/AEC — all processing must be firmware or server-side.
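Issue 3 calls for a software high-pass filter around 80-100 Hz. Below is a minimal sketch of a first-order (one-pole) DC-blocking high-pass filter, written in Python for readability. On the device, the same difference equation, y[n] = α·(y[n-1] + x[n] − x[n-1]), would run in firmware on int16 samples. The 90 Hz default cutoff is an assumption taken from the middle of the brief's range.

```python
import math
from typing import Iterable, List


def highpass(samples: Iterable[float], cutoff_hz: float = 90.0,
             sample_rate: int = 16000) -> List[float]:
    """First-order high-pass: removes DC offset and low-frequency rumble,
    passes speech frequencies nearly unchanged."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # equivalent RC time constant
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)                   # close to 1 for low cutoffs
    out: List[float] = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        # Only the change in input is passed through; constant (DC) input decays to zero
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

A one-pole filter is the cheapest option per sample. A higher-order biquad would give a sharper rolloff below the cutoff if low-frequency HVAC or table-vibration noise turns out to be a problem in the clinic.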

Recommendation

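Use an external ICS-43434 I2S mic for the POC. The built-in PDM mic is fine for a 5-minute demo at 0.3m but will produce frustrating results at real clinical distances (1-2m). The ICS-43434 provides a 6dB SNR improvement (about 4x in power, 2x in amplitude) for ~$5 and a 10-minute solder job. The work is wiring power, ground, and the three I2S lines (SCK, WS, SD) to spare XIAO GPIOs, which the ESP32-S3 can route to its I2S peripheral through the GPIO matrix.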

For the POC firmware, support both PDM (built-in) and I2S (external) via a compile-time flag. This lets us A/B test audio quality in the field.

Validation Protocol (for clinic testing)

1. Place device at 0.5m, 1.0m, 1.5m, 2.0m from speaker
2. Record 3 min of simulated consult audio (Portuguese, normal speaking volume)
3. Transcribe with Deepgram Nova-2 (pt-BR), passing the veterinary SOAP term list through the keywords parameter
4. Measure WER against a manual reference transcript
5. Repeat with built-in PDM mic and external ICS-43434
6. Target: <20% WER at 1.5m with external mic in quiet room
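Step 4 of the protocol needs a WER calculation that stays consistent across every distance and mic combination. Below is a minimal sketch using word-level Levenshtein distance, WER = (S + D + I) / N, where N is the number of words in the manual reference transcript. The text normalization is an assumption: it only lowercases and splits on whitespace, so both transcripts should have punctuation and numerals normalized the same way before scoring.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution (or match)
        prev = curr
    return prev[len(hyp)] / len(ref)


# Example: two substituted words out of six
ref = "paciente apresenta vômito há dois dias"
hyp = "paciente apresenta vômitos a dois dias"
print(f"WER = {word_error_rate(ref, hyp):.1%}")  # WER = 33.3%
```

Scoring each 3-minute recording as a single string keeps the per-distance numbers comparable with the <20% target at 1.5 m.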

2. Mobile Streaming Architecture Decision

Architecture: Hybrid (BLE Setup + WiFi Streaming)

Step 1 (Setup, one-time per device):
┌─────────┐    BLE GATT    ┌─────────┐
│ ESP32S3 │◀──────────────▶│ Mobile  │  ← WiFi credentials, clinic/room config
│ Device  │                │   App   │
└─────────┘                └─────────┘

Step 2 (Daily operation, no phone needed):
┌─────────┐    WiFi/WS     ┌─────────┐   Push/API   ┌─────────┐
│ ESP32S3 │───────────────▶│  Cloud  │─────────────▶│ Mobile  │
│ Device  │                │ Backend │◀─────────────│   App   │
└─────────┘                └─────────┘              └─────────┘

Why This Architecture

| Alternative | Why not |
|---|---|
| BLE audio relay (Device→Phone→Cloud) | BLE 5.0 sustains only ~300 kbps in practice. Raw 16 kHz 16-bit PCM needs 256 kbps, leaving almost no headroom. Would need Opus compression, a custom GATT service, and the phone app running at all times. Massive complexity for no benefit when WiFi is available. |
| WiFi Direct (Device→Phone→Cloud) | Phone loses its normal WiFi connection. Terrible UX. Poorly supported on iOS. Do not pursue. |
| Device→Cloud only (no mobile) | Works for the Phase 1 POC, but Saito’s direction is mobile-first UX. Adding mobile as the review interface (not an audio relay) is the right move. |
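The bandwidth argument in the first row is simple arithmetic. Below is a minimal sketch that takes the brief's ~300 kbps sustained BLE figure as given. It assumes 24 kbps as a representative Opus wideband speech bitrate; that value is an assumption, since Opus supports a wide range of bitrates.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Raw (uncompressed) PCM bitrate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def headroom(stream_kbps: float, link_kbps: float) -> float:
    """Fraction of link capacity left after the stream; negative means it does not fit."""
    return 1.0 - stream_kbps / link_kbps


BLE_SUSTAINED_KBPS = 300.0  # the brief's ~300 kbps sustained BLE 5.0 figure
OPUS_SPEECH_KBPS = 24.0     # assumption: a typical Opus wideband speech bitrate

raw = pcm_bitrate_kbps(16_000, 16)  # 16 kHz, 16-bit, mono
print(f"raw PCM: {raw:.0f} kbps, headroom {headroom(raw, BLE_SUSTAINED_KBPS):.0%}")
print(f"Opus:    {OPUS_SPEECH_KBPS:.0f} kbps, headroom {headroom(OPUS_SPEECH_KBPS, BLE_SUSTAINED_KBPS):.0%}")
```

Raw PCM leaves only about 15% nominal margin, and retransmissions and connection-event scheduling can easily consume that. Opus leaves ample margin but adds an encoder on the device and a custom relay in the app, which is the complexity the table argues against.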

What “Mobile-First” Means for Prontua

Saito’s feedback is about the vet’s UX, not the streaming architecture. The mobile app is for:

  • Setup: BLE provisioning (scan device → pair → configure WiFi → assign to room)
  • Review: View generated SOAP notes, edit, approve, export
  • Control: Start/stop consult sessions (optional; the device’s physical button is the primary control)
  • Status: See device health, WiFi signal, recording indicator

The mobile app does NOT need to relay audio. The device streams directly to cloud over WiFi. The phone is the control plane, not the data plane.

Phase 1 Shortcut

For the prototype with 1 device in 1 clinic:

  • Skip BLE provisioning. Hardcode WiFi SSID/password in firmware.
  • Skip native mobile app. Use a responsive web dashboard (same as CEO’s Phase 1 spec, but mobile-optimized).
  • Phase 2: Add BLE provisioning + React Native app when deploying to multiple clinics.

This saves 2-3 weeks of mobile development that doesn’t contribute to validating the core hypothesis (can we capture and transcribe vet consults?).


3. BLE Pairing Flow Specification

Flow (for Phase 2 mobile app)

1. DISCOVERY
   Device powers on → starts BLE advertising with the custom Prontua service UUID
   LED: slow blue blink = "ready to pair"
   App: scans for Prontua service UUID → shows device list

2. PAIRING
   User taps device in app
   App prompts to scan QR code on device body (proof-of-possession)
   Secure session established over BLE GATT via SRP6a key exchange

3. WIFI PROVISIONING
   App sends command: "scan WiFi networks"
   Device scans → returns SSID list via BLE
   User selects network + enters password
   App sends encrypted credentials via BLE characteristic
   Device connects to WiFi → reports status via BLE notification
   LED: solid green = "connected"

4. CLOUD REGISTRATION
   Device sends its unique ID to cloud via WiFi
   App registers device: clinic ID + room name + device ID
   Cloud confirms → device begins normal operation
   BLE connection can be dropped

5. ONGOING (BLE control plane, optional)
   App can reconnect via BLE for:
   - Start/stop session
   - View status (WiFi signal, battery, recording state)
   - Trigger firmware OTA update
   - Factory reset
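The five stages above behave like a small state machine. Writing down the legal transitions makes the flow easier to reason about and to mirror in both app and firmware. Below is a minimal Python sketch using hypothetical state names. The failure and reset transitions are assumptions, not part of the flow as specified: a rejected PoP returns to discovery, a failed WiFi join retries provisioning, and a factory reset returns to discovery.

```python
from enum import Enum, auto
from typing import Dict, Set


class ProvState(Enum):
    DISCOVERY = auto()           # advertising the Prontua service UUID, slow blue blink
    PAIRING = auto()             # QR proof-of-possession, secure session setup
    WIFI_PROVISIONING = auto()   # SSID scan, credentials received, WiFi join
    CLOUD_REGISTRATION = auto()  # device ID sent to cloud, app binds clinic/room
    ONGOING = auto()             # normal operation; BLE control plane optional


# Hypothetical transition table. Forward edges follow the five stages above;
# the fallback/reset edges are assumptions added for illustration.
TRANSITIONS: Dict[ProvState, Set[ProvState]] = {
    ProvState.DISCOVERY: {ProvState.PAIRING},
    ProvState.PAIRING: {ProvState.WIFI_PROVISIONING,
                        ProvState.DISCOVERY},           # PoP rejected
    ProvState.WIFI_PROVISIONING: {ProvState.CLOUD_REGISTRATION,
                                  ProvState.WIFI_PROVISIONING},  # retry failed join
    ProvState.CLOUD_REGISTRATION: {ProvState.ONGOING,
                                   ProvState.WIFI_PROVISIONING},  # WiFi lost before registering
    ProvState.ONGOING: {ProvState.DISCOVERY},           # factory reset
}


class ProvisioningFlow:
    """Tracks the current stage and rejects out-of-order steps."""

    def __init__(self) -> None:
        self.state = ProvState.DISCOVERY

    def advance(self, target: ProvState) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        self.state = target
```

Sharing one transition table between firmware and app tests helps keep LED states and app screens in sync.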

Technical Stack

| Component | Implementation |
|---|---|
| ESP32 BLE provisioning | ESP-IDF wifi_provisioning component (production-ready, handles ~90% of the flow) |
| Security | SRP6a key exchange + PoP via QR code. No numeric comparison needed (device has no display). |
| Mobile (React Native) | react-native-ble-plx + react-native-esp-idf-provisioning |
| Mobile (native fallback) | Espressif’s official ESPProvision SDK for iOS/Android |
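The QR code on the device carries what the app needs to find the device and prove possession. Below is a minimal factory-side sketch. It assumes the JSON payload layout used by Espressif's provisioning examples (ver, name, pop, transport). Verify the exact fields against the ESP-IDF version and security scheme in use, because security scheme 2 derives the SRP6a credentials from a username/password rather than a plain PoP. The PROV_ name prefix and the 8-byte PoP length are illustrative assumptions.

```python
import json
import secrets


def make_pop(num_bytes: int = 8) -> str:
    """Random per-device proof-of-possession string, generated once at the factory."""
    return secrets.token_hex(num_bytes)


def qr_payload(mac: bytes, pop: str, transport: str = "ble") -> str:
    """Build the JSON string encoded into the QR label printed on the device body."""
    name = "PROV_" + mac[-3:].hex().upper()  # illustrative: last 3 bytes of the BLE MAC
    return json.dumps({"ver": "v1", "name": name, "pop": pop, "transport": transport},
                      separators=(",", ":"))
```

Generating the PoP per device, rather than using a fleet-wide constant, is what makes the printed label meaningful as proof of physical possession.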

Security Model

| Threat | Mitigation |
|---|---|
| Unauthorized BLE pairing | Proof-of-possession: QR code printed on the device, required during pairing. |
| WiFi credential interception | Session keys from the SRP6a key exchange encrypt the BLE channel. Credentials are never sent in plaintext. |
| Device impersonation | Unique device certificate provisioned at first setup. Cloud validates via signed token. |
| Physical theft | Encrypted NVS (ESP32-S3 flash encryption). Factory reset button clears all credentials. |
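For device impersonation, the brief specifies a per-device certificate with cloud-side validation of a signed token. The sketch below swaps the certificate for a per-device HMAC secret to stay within Python's standard library. It therefore illustrates the sign, verify, and expiry logic rather than the asymmetric scheme itself. The token layout (device_id.timestamp.signature) and the 300-second validity window are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_token(device_id: str, secret: bytes, issued_at: Optional[int] = None) -> str:
    """Device side: sign device ID + timestamp with the per-device secret."""
    ts = int(time.time()) if issued_at is None else issued_at
    sig = hmac.new(secret, f"{device_id}.{ts}".encode(), hashlib.sha256).hexdigest()
    return f"{device_id}.{ts}.{sig}"


def verify_token(token: str, secret: bytes, now: Optional[int] = None,
                 max_age_s: int = 300) -> bool:
    """Cloud side: recompute the signature, compare in constant time, reject stale tokens."""
    try:
        device_id, ts_str, sig = token.rsplit(".", 2)  # device_id may itself contain dots
        ts = int(ts_str)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(secret, f"{device_id}.{ts}".encode(), hashlib.sha256).hexdigest()
    current = int(time.time()) if now is None else now
    return hmac.compare_digest(sig, expected) and 0 <= current - ts <= max_age_s
```

In a real deployment the cloud would look up the secret by device ID. A certificate-based design would verify an asymmetric signature instead, so a leaked server database would not let an attacker mint device tokens.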

Effort Estimate

| Task | Effort |
|---|---|
| Firmware: BLE provisioning (ESP-IDF component) | ~1 week |
| Firmware: BLE control plane (custom GATT service) | ~3 days |
| Mobile: provisioning flow screens | ~1 week |
| Mobile: status + control screens | ~3 days |
| Total | ~3 weeks |

4. Hardware Assessment: XIAO ESP32S3 for Production

Verdict: POC = Yes, Production = No

| Criterion | XIAO ESP32S3 Sense | Custom PCB + WROOM-1 |
|---|---|---|
| ANATEL certified | No | Yes (ICC 06.083/2023.1) |
| Supply chain | Single source (Seeed) | Multi-source (LCSC, DigiKey, Mouser) |
| Unit cost (1,000 qty) | ~$11 | ~$9 |
| Unit cost (10,000 qty) | ~$9 | ~$6.50 |
| Unnecessary components | Camera connector, SD slot, USB hub | Only what you need |
| Production readiness | Dev board | Production module |

Production BOM (ESP32-S3-WROOM-1-N8R8 based)

| Component | Cost @ 1,000 qty |
|---|---|
| ESP32-S3-WROOM-1-N8R8 | $3.20 |
| ICS-43434 or MSM261S4030H0 (I2S/PDM mic) | $0.80 |
| Power management + LiPo 1000mAh | $2.10 |
| PCB (2-layer) + SMT assembly | $1.60 |
| Enclosure (injection molded ABS) | $1.00 |
| LEDs + button + passives | $0.29 |
| Total | ~$8.99 |

Retail price target: $35-50 USD (subsidized by SaaS subscription).
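A quick check of the BOM total, using Decimal to avoid floating-point rounding on currency. The per-line costs are the table's estimates. The hardware-share figure against the $35-50 retail target is simple division, not a margin model: it ignores packaging, logistics, and certification.

```python
from decimal import Decimal
from typing import Dict

# Per-unit costs @ 1,000 qty, copied from the BOM table above
BOM_AT_1K: Dict[str, Decimal] = {
    "ESP32-S3-WROOM-1-N8R8": Decimal("3.20"),
    "ICS-43434 or MSM261S4030H0 mic": Decimal("0.80"),
    "Power management + LiPo 1000mAh": Decimal("2.10"),
    "PCB (2-layer) + SMT assembly": Decimal("1.60"),
    "Enclosure (injection molded ABS)": Decimal("1.00"),
    "LEDs + button + passives": Decimal("0.29"),
}


def bom_total(bom: Dict[str, Decimal]) -> Decimal:
    """Sum line items exactly (no float rounding)."""
    return sum(bom.values(), Decimal("0"))


total = bom_total(BOM_AT_1K)
print(f"BOM total @ 1,000 qty: ${total}")  # BOM total @ 1,000 qty: $8.99

# Hardware cost as a share of the retail target (division only, not a margin model)
for retail in (Decimal("35"), Decimal("50")):
    print(f"${retail} retail: hardware is {total / retail * 100:.1f}% of price")
```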

ANATEL Certification Path

The ESP32-S3-WROOM-1 module is already ANATEL-certified. The final product still needs EMC homologation:

| Step | Duration | Cost |
|---|---|---|
| Pre-compliance testing | 2 weeks | ~$1,500 |
| OCD lab testing | 4-6 weeks | ~$5,000 |
| ANATEL submission | 4-8 weeks | ~$800 |
| Total (steps run sequentially) | ~10-16 weeks | ~$7,300 |
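The duration column is a range, so the total depends on the assumption that the steps run strictly in sequence. Below is a minimal sketch that sums the low and high ends separately, using the table's figures.

```python
from typing import List, Tuple

# (step, min_weeks, max_weeks, cost_usd), copied from the table above
STEPS: List[Tuple[str, int, int, int]] = [
    ("Pre-compliance testing", 2, 2, 1_500),
    ("OCD lab testing", 4, 6, 5_000),
    ("ANATEL submission", 4, 8, 800),
]


def sequential_totals(steps: List[Tuple[str, int, int, int]]) -> Tuple[int, int, int]:
    """Best case, worst case, and total cost, assuming each step starts after the previous one ends."""
    best = sum(s[1] for s in steps)
    worst = sum(s[2] for s in steps)
    cost = sum(s[3] for s in steps)
    return best, worst, cost


print(sequential_totals(STEPS))  # (10, 16, 7300)
```

Planning against the 16-week upper bound is the conservative choice. The best case only holds if pre-compliance passes on the first attempt.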

Hardware Strategy

| Phase | Hardware | When | Units |
|---|---|---|---|
| Phase 1 POC | XIAO ESP32S3 Sense + external ICS-43434 mic | Now | 10 |
| Phase 2 MVP | Same XIAO, refined firmware + mobile app | Post-validation | 20-50 |
| Phase 3 Production | Custom PCB + WROOM-1 + ANATEL cert | Post-PMF | 500+ |

Key rule: Write firmware module-agnostic from day one. Use ESP-IDF GPIO abstraction so the same codebase runs on XIAO (POC) and custom board (production) with only a pin mapping change.


5. Revised Architecture (vs. Phase 1 Spec from MOKA-596)

Changes from CEO’s Phase 1 Spec

| Item | MOKA-596 (CEO) | MOKA-606 (CTO revision) | Rationale |
|---|---|---|---|
| Mic | Built-in PDM | External ICS-43434 + built-in PDM (A/B test) | 59dB SNR insufficient at 1-2m |
| Dashboard | Web SPA | Mobile-optimized web (Phase 1), React Native app (Phase 2) | Saito: mobile-first UX |
| Streaming | Device → Cloud → Web | Device → Cloud → Mobile (same architecture, different client) | Phone is review interface |
| BLE | Not mentioned | Phase 2: BLE provisioning + control | Needed for multi-clinic deployment |
| WiFi config | Hardcoded | Hardcoded Phase 1, BLE provisioning Phase 2 | Don’t build what you don’t need yet |

What Stays the Same

  • Audio format: 16kHz 16-bit PCM mono
  • Streaming: WebSocket to cloud backend
  • STT: Deepgram Nova-2 pt-BR
  • LLM: Claude Sonnet for SOAP generation
  • Backend: Python FastAPI on devnest VPS
  • Firmware: ESP-IDF (PlatformIO)
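The unchanged audio format fixes the streaming arithmetic: 16,000 samples/s × 2 bytes × 1 channel = 32,000 bytes per second. Below is a minimal sketch of chunking raw PCM into fixed-duration WebSocket binary messages. The 100 ms frame length and the little-endian sample order are assumptions (the ESP32-S3 is little-endian), and the function names are hypothetical.

```python
import struct
from typing import Iterator, List

SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit
CHANNELS = 1          # mono


def frame_bytes(frame_ms: int) -> int:
    """Bytes of raw PCM in one frame of the given duration."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * frame_ms // 1000


def frames(pcm: bytes, frame_ms: int = 100) -> Iterator[bytes]:
    """Split raw PCM into fixed-duration chunks, one per WebSocket binary message."""
    size = frame_bytes(frame_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]  # the final chunk may be shorter


def pack_samples(samples: List[int]) -> bytes:
    """Encode signed 16-bit samples as little-endian bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)
```

Because the frame size is always a multiple of 2 bytes, no 16-bit sample is split across two messages. The 100 ms choice trades per-message overhead against latency to the STT stage.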

6. Open Questions / Risks

| # | Question | Owner | Priority |
|---|---|---|---|
| 1 | Does Saito’s “mobile” mean native app or mobile-optimized web? Clarify before building. | CPO/CEO | High |
| 2 | The XIAO’s PCB antenna is shared WiFi/BLE. In metal exam table environments, range may degrade. Need field testing. | CTO | Medium |
| 3 | WiFi reliability in older Brazilian clinic buildings (thick walls, old routers). If >30% of clinics have WiFi issues, BLE audio fallback becomes necessary. | CTO | Medium |
| 4 | Deepgram custom vocabulary for pt-BR vet terms needs a domain-specific keyword list from a veterinarian. | CPO | Medium |
| 5 | Should the device have a physical mute button for LGPD consent visibility? LED-only or button+LED? | CPO | Low |

Supporting Research

Detailed technical research in companion reports:

  • reports/technology/2026-03-29-esp32s3-mobile-streaming-ble-architecture.md — BLE/WiFi architecture deep dive
  • reports/technology/2026-03-29-xiao-esp32s3-sense-production-hardware-assessment.md — Production hardware & ANATEL analysis
