Prontua POC Hardware Scope — CTO Technical Brief (MOKA-606)
Date: 2026-03-29 | Agent: CTO | Issue: MOKA-606 | Confidence: High
Context: Saito review notes #2 and #5 redirected the Prontua POC toward mobile-first UX and single-mic validation before adding complexity. This brief addresses all four deliverables and revises the Phase 1 prototype scope from MOKA-596.
TL;DR — Decisions
| Question | Decision | Confidence |
|---|---|---|
| Is the XIAO ESP32S3 built-in PDM mic good enough? | For POC at <1m, yes. For 1-2m ambient, no. Expect 25-40% WER (pt-BR) at 1-2m. | High |
| What mic should we use for POC? | External ICS-43434 (I2S, 65dB SNR) wired to the XIAO. ~$5, 10min solder job. | High |
| Mobile streaming architecture? | Hybrid: BLE for provisioning, WiFi for audio streaming. Industry standard pattern. | High |
| BLE audio streaming? | Not for Phase 1. BLE 5.0 can barely handle raw 16kHz PCM (256kbps ≈ BLE ceiling). Only viable with Opus compression, and adds mobile app complexity with no benefit when WiFi works. | High |
| Is XIAO production-ready? | No. Dev board, no ANATEL, single-source. Use for POC, then custom PCB with ESP32-S3-WROOM-1 (ANATEL-certified) for production. | High |
| Phase 1 shortcut? | Hardcode WiFi, skip BLE, skip mobile app. For 1 device in 1 clinic, a web dashboard is faster than a native app. Mobile + BLE provisioning = Phase 2. | Medium |
1. Single Mic Audio Quality Validation
The Built-in Mic (MSM261D3526H1CPM)
| Spec | Value | Verdict |
|---|---|---|
| SNR | 59 dB(A) | Budget-tier. 6-11 dB worse than smartphone mics (65-70 dB) |
| Sensitivity | -26 dBFS @ 94 dB SPL | Standard |
| Frequency response | 100 Hz – 10 kHz | Adequate for speech (300 Hz – 4 kHz critical band) |
| Max SPL | 120 dB | Fine |
WER Estimates (Deepgram Nova-2 pt-BR)
| Distance | Built-in PDM (59dB SNR) | External ICS-43434 (65dB SNR) | Phone mic (65-70dB + DSP) |
|---|---|---|---|
| 0.3m | 18-25% | 14-20% | 12-16% |
| 1.0m | 25-32% | 18-25% | 14-20% |
| 2.0m | 32-40%+ | 22-30% | 16-22% |
Note: These are estimated ranges based on SNR delta analysis and Deepgram published benchmarks, not measurements. Actual WER depends heavily on room acoustics, speaker clarity, and vet terminology. Deepgram’s keywords feature can reduce WER on domain terms by an estimated 5-10 points.
Known Issues with the Built-in Mic
- WiFi radio interference — audible buzzing when WiFi transmits simultaneously with audio capture. Mitigation: buffer-then-send pattern (capture → pause → transmit → resume).
- High noise floor — 59dB SNR means audible self-noise even in silent rooms.
- DC offset — requires software high-pass filter at ~80-100 Hz.
- No hardware AGC/AEC — all processing must be firmware or server-side.
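The DC-offset mitigation above can be illustrated with a first-order high-pass filter. This is a minimal sketch, assuming 16 kHz samples and a 90 Hz cutoff (the midpoint of the ~80-100 Hz range above); the function name `highpass` is hypothetical, and the production filter would run in firmware or on the server rather than in this exact form.

```python
import math


def highpass(samples, sample_rate=16000, cutoff_hz=90.0):
    """First-order RC high-pass: removes DC offset and content well below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)  # closer to 1.0 means a lower cutoff
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


if __name__ == "__main__":
    # One second of a 1 kHz tone riding on a constant offset of 500 (assumed test signal).
    signal = [500 + 1000 * math.sin(2 * math.pi * 1000 * n / 16000) for n in range(16000)]
    tail = highpass(signal)[-1600:]
    print("residual DC:", sum(tail) / len(tail))
```

A single-pole filter like this is cheap enough for the ESP32-S3, but it rolls off gently; a steeper filter may be needed if low-frequency rumble turns out to matter in clinic recordings.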
Recommendation
Use an external ICS-43434 I2S mic for the POC. The built-in PDM mic is fine for a 5-minute demo at 0.3m but will produce frustrating results at real clinical distances (1-2m). The ICS-43434 provides a 6dB SNR improvement (roughly 2x in amplitude terms, 4x in power) for ~$5 and a 10-minute solder job to the XIAO’s I2S pins.
For the POC firmware, support both PDM (built-in) and I2S (external) via a compile-time flag. This lets us A/B test audio quality in the field.
Validation Protocol (for clinic testing)
1. Place device at 0.5m, 1.0m, 1.5m, 2.0m from speaker
2. Record 3 min of simulated consult audio (Portuguese, normal speaking volume)
3. Run through Deepgram Nova-2 pt-BR with the keywords feature enabled, using the SOAP/vet term list
4. Measure WER against manual transcript
5. Repeat with built-in PDM mic and external ICS-43434
6. Target: <20% WER at 1.5m with external mic in quiet room
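Step 4 of the protocol can be sketched as a word-level edit distance between the manual transcript and the STT output. This is a minimal illustration, assuming whitespace tokenization and case-insensitive matching; the function name `wer` is hypothetical, and a real evaluation would also normalize punctuation and numerals before comparing.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    manual = "o cão apresenta febre alta"
    stt = "o cão apresenta febre"
    print(f"WER: {wer(manual, stt):.0%}")
```

Running the same function over each distance and mic combination gives a directly comparable number against the <20% target at 1.5m.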
2. Mobile Streaming Architecture Decision
Architecture: Hybrid (BLE Setup + WiFi Streaming)
Setup (one-time per device):
┌─────────┐    BLE GATT    ┌─────────┐
│ ESP32S3 │◀──────────────▶│ Mobile  │  ← WiFi credentials, clinic/room config
│ Device  │                │ App     │
└─────────┘                └─────────┘
Daily operation (no phone needed):
┌─────────┐    WiFi/WS     ┌─────────┐   Push/API   ┌─────────┐
│ ESP32S3 │───────────────▶│ Cloud   │─────────────▶│ Mobile  │
│ Device  │                │ Backend │◀─────────────│ App     │
└─────────┘                └─────────┘              └─────────┘
Why This Architecture
| Alternative | Why not |
|---|---|
| BLE audio relay (Device→Phone→Cloud) | BLE 5.0 maxes at ~300kbps sustained. 16kHz 16-bit mono PCM = 256kbps raw, leaving almost no headroom. Would need Opus compression + custom GATT service + phone app always running. Massive complexity for no benefit when WiFi is available. |
| WiFi Direct (Device→Phone→Cloud) | Phone loses normal WiFi connection. Terrible UX. Poorly supported on iOS. Do not pursue. |
| Device→Cloud only (no mobile) | Works for Phase 1 POC. But Saito’s direction is mobile-first UX. Adding mobile as review interface (not relay) is the right move. |
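The BLE bandwidth argument in the table is plain arithmetic, sketched below. The ~300kbps sustained figure is this brief's estimate; the 24kbps Opus bitrate is an assumption chosen only for illustration, and both helper names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Raw PCM bitrate in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def headroom(link_kbps: float, stream_kbps: float) -> float:
    """Fraction of the link left unused by the stream (negative means it does not fit)."""
    return 1 - stream_kbps / link_kbps


if __name__ == "__main__":
    ble_sustained_kbps = 300          # estimate from this brief
    raw = pcm_bitrate_kbps(16000, 16)  # 16kHz 16-bit mono
    opus_kbps = 24                     # assumed speech bitrate, not from the brief
    print(f"raw PCM: {raw:.0f} kbps, headroom {headroom(ble_sustained_kbps, raw):.0%}")
    print(f"Opus:    {opus_kbps} kbps, headroom {headroom(ble_sustained_kbps, opus_kbps):.0%}")
```

Raw PCM leaves about 15% nominal headroom, which disappears once protocol overhead and retransmissions are counted; that is why BLE audio is only plausible with compression.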
What “Mobile-First” Means for Prontua
Saito’s feedback is about the vet’s UX, not the streaming architecture. The mobile app is for:
- Setup: BLE provisioning (scan device → pair → configure WiFi → assign to room)
- Review: View generated SOAP notes, edit, approve, export
- Control: Start/stop consult sessions (optional, button is primary)
- Status: See device health, WiFi signal, recording indicator
The mobile app does NOT need to relay audio. The device streams directly to cloud over WiFi. The phone is the control plane, not the data plane.
Phase 1 Shortcut
For the prototype with 1 device in 1 clinic:
- Skip BLE provisioning. Hardcode WiFi SSID/password in firmware.
- Skip native mobile app. Use a responsive web dashboard (same as CEO’s Phase 1 spec, but mobile-optimized).
- Phase 2: Add BLE provisioning + React Native app when deploying to multiple clinics.
This saves 2-3 weeks of mobile development that doesn’t contribute to validating the core hypothesis (can we capture and transcribe vet consults?).
3. BLE Pairing Flow Specification
Flow (for Phase 2 mobile app)
1. DISCOVERY
Device powers on → BLE advertising (service UUID: Prontua)
LED: slow blue blink = "ready to pair"
App: scans for Prontua service UUID → shows device list
2. PAIRING
User taps device in app
App prompts to scan QR code on device body (proof-of-possession)
BLE GATT connection established; session secured via SRP6a key exchange
3. WIFI PROVISIONING
App sends command: "scan WiFi networks"
Device scans → returns SSID list via BLE
User selects network + enters password
App sends encrypted credentials via BLE characteristic
Device connects to WiFi → reports status via BLE notification
LED: solid green = "connected"
4. CLOUD REGISTRATION
Device sends its unique ID to cloud via WiFi
App registers device: clinic ID + room name + device ID
Cloud confirms → device begins normal operation
BLE connection can be dropped
5. ONGOING (BLE control plane, optional)
App can reconnect via BLE for:
- Start/stop session
- View status (WiFi signal, battery, recording state)
- Trigger firmware OTA update
- Factory reset
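The flow above maps naturally onto a small state machine. The sketch below is illustrative only: the state and event names are hypothetical (the brief does not define them), and the real firmware would rely on the ESP-IDF provisioning component rather than hand-rolled logic like this.

```python
from enum import Enum, auto


class ProvState(Enum):
    ADVERTISING = auto()     # step 1: slow blue blink, waiting for the app
    PAIRED = auto()          # step 2: proof-of-possession verified
    WIFI_CONNECTED = auto()  # step 3: solid green LED
    REGISTERED = auto()      # step 4: cloud confirmed, normal operation


# Hypothetical event names, chosen for this sketch.
TRANSITIONS = {
    (ProvState.ADVERTISING, "pop_verified"): ProvState.PAIRED,
    (ProvState.PAIRED, "wifi_connected"): ProvState.WIFI_CONNECTED,
    (ProvState.PAIRED, "wifi_failed"): ProvState.PAIRED,  # user may retry the password
    (ProvState.WIFI_CONNECTED, "cloud_confirmed"): ProvState.REGISTERED,
}


def next_state(state: ProvState, event: str) -> ProvState:
    """Return the state after an event; factory reset is allowed from any state."""
    if event == "factory_reset":
        return ProvState.ADVERTISING
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None


def run(events, state=ProvState.ADVERTISING) -> ProvState:
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = next_state(state, event)
    return state


if __name__ == "__main__":
    print(run(["pop_verified", "wifi_failed", "wifi_connected", "cloud_confirmed"]))
```

Making illegal transitions raise keeps the app and firmware from drifting out of step, for example by attempting cloud registration before WiFi is up.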
Technical Stack
| Component | Implementation |
|---|---|
| ESP32 BLE provisioning | ESP-IDF wifi_provisioning component (production-ready, handles 90% of flow) |
| Security | SRP6a key exchange + PoP via QR code. No numeric comparison needed (no display). |
| Mobile (React Native) | react-native-ble-plx + react-native-esp-idf-provisioning |
| Mobile (native fallback) | Espressif official ESPProvision SDK for iOS/Android |
Security Model
| Threat | Mitigation |
|---|---|
| Unauthorized BLE pairing | Proof-of-possession: QR code printed on device. Required during pairing. |
| WiFi credential interception | Credentials travel only over the BLE session encrypted with the SRP6a-derived key. Never plaintext. |
| Device impersonation | Unique device certificate provisioned at first setup. Cloud validates via signed token. |
| Physical theft | Encrypted NVS (ESP32-S3 flash encryption). Factory reset button clears all credentials. |
Effort Estimate
| Task | Effort |
|---|---|
| Firmware: BLE provisioning (ESP-IDF component) | ~1 week |
| Firmware: BLE control plane (custom GATT service) | ~3 days |
| Mobile: provisioning flow screens | ~1 week |
| Mobile: status + control screens | ~3 days |
| Total | ~3 weeks |
4. Hardware Assessment: XIAO ESP32S3 for Production
Verdict: POC = Yes, Production = No
| Criterion | XIAO ESP32S3 Sense | Custom PCB + WROOM-1 |
|---|---|---|
| ANATEL certified | No | Yes (ICC 06.083/2023.1) |
| Supply chain | Single source (Seeed) | Multi-source (LCSC, DigiKey, Mouser) |
| Unit cost (1,000 qty) | ~$11 | ~$9 |
| Unit cost (10,000 qty) | ~$9 | ~$6.50 |
| Unnecessary components | Camera connector, SD slot, USB hub | Only what you need |
| Production readiness | Dev board | Production module |
Production BOM (ESP32-S3-WROOM-1-N8R8 based)
| Component | Cost @ 1,000 qty |
|---|---|
| ESP32-S3-WROOM-1-N8R8 | $3.20 |
| ICS-43434 or MSM261S4030H0 (I2S/PDM mic) | $0.80 |
| Power management + LiPo 1000mAh | $2.10 |
| PCB (2-layer) + SMT assembly | $1.60 |
| Enclosure (injection molded ABS) | $1.00 |
| LEDs + button + passives | $0.29 |
| Total | ~$8.99 |
Retail price target: $35-50 USD (subsidized by SaaS subscription).
ANATEL Certification Path
The ESP32-S3-WROOM-1 module is already ANATEL-certified. The final product still needs EMC homologation:
| Step | Duration | Cost |
|---|---|---|
| Pre-compliance testing | 2 weeks | ~$1,500 |
| OCD (ANATEL-designated certification body) lab testing | 4-6 weeks | ~$5,000 |
| ANATEL submission | 4-8 weeks | ~$800 |
| Total | 10-16 weeks | ~$7,300 |
Hardware Strategy
| Phase | Hardware | When | Units |
|---|---|---|---|
| Phase 1 POC | XIAO ESP32S3 Sense + external ICS-43434 mic | Now | 10 |
| Phase 2 MVP | Same XIAO, refined firmware + mobile app | Post-validation | 20-50 |
| Phase 3 Production | Custom PCB + WROOM-1 + ANATEL cert | Post-PMF | 500+ |
Key rule: Write the firmware to be board-agnostic from day one. Use the ESP-IDF GPIO abstraction and keep pin assignments in a single board-level mapping so the same codebase runs on the XIAO (POC) and the custom board (production) with only a pin mapping change.
5. Revised Architecture (vs. Phase 1 Spec from MOKA-596)
Changes from CEO’s Phase 1 Spec
| Item | MOKA-596 (CEO) | MOKA-606 (CTO revision) | Rationale |
|---|---|---|---|
| Mic | Built-in PDM | External ICS-43434 + built-in PDM (A/B test) | 59dB SNR insufficient at 1-2m |
| Dashboard | Web SPA | Mobile-optimized web (Phase 1), React Native app (Phase 2) | Saito: mobile-first UX |
| Streaming | Device → Cloud → Web | Device → Cloud → Mobile (same architecture, different client) | Phone is review interface |
| BLE | Not mentioned | Phase 2: BLE provisioning + control | Needed for multi-clinic deployment |
| WiFi config | Hardcoded | Hardcoded Phase 1, BLE provisioning Phase 2 | Don’t build what you don’t need yet |
What Stays the Same
- Audio format: 16kHz 16-bit PCM mono
- Streaming: WebSocket to cloud backend
- STT: Deepgram Nova-2 pt-BR
- LLM: Claude Sonnet for SOAP generation
- Backend: Python FastAPI on devnest VPS
- Firmware: ESP-IDF (PlatformIO)
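The fixed audio format makes frame sizes easy to reason about on both ends of the WebSocket. The sketch below shows one way the backend might split an incoming byte stream into fixed-duration frames; the 20ms frame length and the function names are assumptions for illustration, not part of the MOKA-596 spec.

```python
SAMPLE_RATE_HZ = 16000   # from the audio format above
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 1             # mono


def frame_size_bytes(frame_ms: int = 20) -> int:
    """Bytes in one frame of the given duration (20ms is an assumed default)."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * frame_ms // 1000


def split_frames(pcm: bytes, frame_ms: int = 20):
    """Split a PCM buffer into whole frames; return (frames, leftover_bytes)."""
    size = frame_size_bytes(frame_ms)
    usable = len(pcm) - len(pcm) % size
    frames = [pcm[i:i + size] for i in range(0, usable, size)]
    return frames, pcm[usable:]


if __name__ == "__main__":
    frames, leftover = split_frames(bytes(1500))
    print(len(frames), "frames of", frame_size_bytes(), "bytes;", len(leftover), "bytes carried over")
```

The leftover bytes would be prepended to the next WebSocket message so that no samples are dropped at message boundaries.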
6. Open Questions / Risks
| # | Question | Owner | Priority |
|---|---|---|---|
| 1 | Does Saito’s “mobile” mean native app or mobile-optimized web? Clarify before building. | CPO/CEO | High |
| 2 | The XIAO’s PCB antenna is shared WiFi/BLE. In metal exam table environments, range may degrade. Need field testing. | CTO | Medium |
| 3 | WiFi reliability in older Brazilian clinic buildings (thick walls, old routers). If >30% of clinics have WiFi issues, BLE audio fallback becomes necessary. | CTO | Medium |
| 4 | Deepgram custom vocabulary for pt-BR vet terms — needs a domain-specific keyword list from a veterinarian. | CPO | Medium |
| 5 | Should the device have a physical mute button for LGPD consent visibility? LED-only or button+LED? | CPO | Low |
Supporting Research
Detailed technical research in companion reports:
- reports/technology/2026-03-29-esp32s3-mobile-streaming-ble-architecture.md: BLE/WiFi architecture deep dive
- reports/technology/2026-03-29-xiao-esp32s3-sense-production-hardware-assessment.md: Production hardware & ANATEL analysis