ESP32S3 Mobile Streaming Architectures — BLE Audio, WiFi, and Hybrid Patterns


Date: 2026-03-29
Context: Prontua ambient audio capture device for veterinary clinics
Hardware: Seeed Studio XIAO ESP32S3 Sense (WiFi + BLE 5.0)
Audio target: 16 kHz 16-bit mono PCM (~256 kbps raw)


1. BLE 5.0 Audio Streaming Feasibility on ESP32S3

1.1 Practical BLE 5.0 Throughput

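BLE 5.0 on the ESP32-S3 (using ESP-IDF’s NimBLE or Bluedroid stack) typically shows the following throughput. The figures are approximate and depend heavily on the phone’s BLE controller and the negotiated connection parameters: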

| Parameter | BLE 4.2 | BLE 5.0 (ESP32-S3) | Notes |
|---|---|---|---|
| Theoretical max PHY rate | 1 Mbps | 2 Mbps (2M PHY) | Raw radio rate |
| Practical GATT throughput | 100–200 kbps | 200–400 kbps | After L2CAP overhead and MTU negotiation |
| Realistic sustained (notifications) | 80–150 kbps | 150–300 kbps | With MTU = 512, connection interval 7.5 ms |
| With DLE (Data Length Extension) | 150–200 kbps | 250–350 kbps | 251-byte PDUs |

Key finding: 256 kbps raw PCM is at the edge of BLE 5.0 capability; it falls near the top of the 150–300 kbps sustained range above. Streaming it is technically possible but leaves almost no headroom for retransmissions, protocol overhead, or interference, so dropouts should be expected in practice.

With Opus compression (24 kbps at 16 kHz mono), BLE 5.0 has ample capacity: the stream uses roughly 8–16% of the 150–300 kbps sustained range, leaving large headroom for reliability.
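Both claims can be checked by computing the stream bitrates and dividing by the link rate. The sketch below assumes the 150–300 kbps "realistic sustained" row of the throughput table as the usable link rate and 20 ms Opus frames at a constant 24 kbps; both are planning figures rather than measurements, and the function names are illustrative.

```python
"""Back-of-the-envelope BLE bandwidth budget for the Prontua audio stream.

Assumption: the usable link rate is the 150-300 kbps "realistic sustained"
range from the throughput table; real links vary by phone and RF conditions.
"""

SUSTAINED_LINK_BPS = (150_000, 300_000)  # (pessimistic, optimistic)


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw PCM bitrate = samples per second x bits per sample x channels."""
    return sample_rate_hz * bits_per_sample * channels


def opus_frame_bytes(bitrate_bps: int, frame_ms: int) -> float:
    """Average Opus packet size at a constant target bitrate."""
    return bitrate_bps * frame_ms / 1000 / 8


def link_utilization(payload_bps: float) -> tuple[float, float]:
    """Fraction of the link used: (optimistic link, pessimistic link)."""
    low, high = SUSTAINED_LINK_BPS
    return payload_bps / high, payload_bps / low


if __name__ == "__main__":
    pcm = pcm_bitrate_bps(16_000, 16)  # 256,000 bps
    opus = 24_000                      # target Opus bitrate from the text
    for name, bps in (("raw PCM", pcm), ("Opus", opus)):
        best, worst = link_utilization(bps)
        print(f"{name:8s} {bps / 1000:5.0f} kbps -> {best:.0%} to {worst:.0%} of the link")
    print(f"Opus 20 ms packet ~ {opus_frame_bytes(opus, 20):.0f} bytes")
```

Raw PCM comes out at 85% of the optimistic link and 171% of the pessimistic one, so it only fits when conditions are good; Opus stays at 8–16%. The 60-byte packet size is the same figure used for the Audio Data characteristic in the BLE specification at the end of this report.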

1.2 BLE Audio Profiles and Codec Availability

| Option | Status on ESP32-S3 | Feasibility |
|---|---|---|
| LE Audio (BAP/ASCS + LC3) | Not available. Espressif has not released LE Audio profile support in ESP-IDF for the ESP32-S3 as of March 2026, and LC3 requires Bluetooth SIG licensing. LE Audio also depends on isochronous channels (introduced in Bluetooth 5.2), which the ESP32-S3’s BLE 5.0 controller does not provide, so a host stack alone would not close the gap. | Not viable for MVP |
| Classic Bluetooth A2DP/HFP | Not available: the ESP32-S3 radio is BLE-only, with no Bluetooth Classic (BR/EDR); A2DP and HFP exist only on the original ESP32. Even there, A2DP is designed for music playback, not mic capture, and mic audio would need HFP, whose baseline codec is 8 kHz CVSD, far too low quality. | Not suitable |
| Custom GATT service | Fully supported. Define a custom GATT service with audio characteristic(s) and send Opus-encoded frames as BLE notifications. This is the standard approach for BLE audio on ESP32. | Recommended approach |

Recommended path: Custom GATT service streaming Opus-encoded audio frames via BLE notifications.

The custom GATT service approach:

  • Define a service UUID for audio streaming
  • One characteristic for audio data (notify; ~200-byte notifications, each batching a few ~60-byte Opus packets)
  • One characteristic for control (start/stop recording, status)
  • One characteristic for metadata (session ID, timestamps, battery level)
  • MTU negotiated to 512 bytes for efficiency
  • Connection interval of 7.5–15ms for low latency
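The notification sizing above can be made concrete with a packing sketch. The wire layout here (a 2-byte sequence number, a 1-byte frame count, and a 1-byte length before each Opus packet) is a hypothetical format, not something the source or Espressif defines. The one fixed fact it relies on is that a notification value can carry at most ATT_MTU − 3 bytes, because the ATT header takes a 1-byte opcode and a 2-byte attribute handle.

```python
"""Sketch of a hypothetical value format for the Audio Data characteristic.

Layout (an assumption, little-endian):
  uint16 sequence | uint8 frame count | (uint8 length, Opus packet) * count
"""
import struct

ATT_NOTIFY_OVERHEAD = 3          # 1-byte ATT opcode + 2-byte attribute handle
HEADER = struct.Struct("<HB")    # uint16 sequence number, uint8 frame count


def max_payload(att_mtu: int) -> int:
    """Largest characteristic value that fits in one notification."""
    return att_mtu - ATT_NOTIFY_OVERHEAD


def pack_notification(seq: int, frames: list[bytes], att_mtu: int = 512) -> bytes:
    """Batch several Opus packets into one notification value (device side)."""
    if not 0 < len(frames) <= 255:
        raise ValueError("frame count must fit in one byte")
    body = bytearray(HEADER.pack(seq & 0xFFFF, len(frames)))
    for frame in frames:
        if not 0 < len(frame) <= 255:
            raise ValueError("each Opus packet must be 1-255 bytes")
        body.append(len(frame))
        body += frame
    if len(body) > max_payload(att_mtu):
        raise ValueError("value exceeds ATT_MTU - 3")
    return bytes(body)


def unpack_notification(value: bytes) -> tuple[int, list[bytes]]:
    """Inverse of pack_notification, as the phone app would run it."""
    seq, count = HEADER.unpack_from(value, 0)
    offset, frames = HEADER.size, []
    for _ in range(count):
        length = value[offset]
        frames.append(value[offset + 1 : offset + 1 + length])
        offset += 1 + length
    return seq, frames
```

Three 60-byte packets pack into 3 + 3 × 61 = 186 bytes, in line with the ~200-byte notification size, and one such notification every 60 ms is a light load at a 7.5–15 ms connection interval. The sequence number lets the phone detect a missing notification instead of silently closing the gap in the audio.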

1.3 Latency Characteristics

| Latency component | BLE custom GATT | WiFi WebSocket |
|---|---|---|
| Encode (Opus 20 ms frames) | 20 ms | 20 ms |
| Transport to phone | 10–30 ms | 5–15 ms |
| Phone processing | 5–10 ms | 5–10 ms |
| Phone → cloud upload | 50–200 ms (LTE/WiFi) | N/A (direct) |
| Total device-to-cloud | 85–260 ms | 30–45 ms |

BLE adds roughly 50–200 ms of latency for the phone relay hop. For batch STT processing (Prontua’s model), this is irrelevant: notes are generated after the consult ends, not in real time.
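The totals in the latency table are simple sums of the per-component ranges. The short sketch below makes that explicit and shows that the phone-to-cloud upload accounts for most of the difference; the values are copied from the table and are estimates, not measurements.

```python
"""Sum the per-component latency ranges (ms) from the latency table."""

BLE_PATH = {
    "Opus encode (20 ms frame)": (20, 20),
    "BLE transport to phone": (10, 30),
    "phone processing": (5, 10),
    "phone -> cloud upload": (50, 200),
}

WIFI_PATH = {
    "Opus encode (20 ms frame)": (20, 20),
    "WiFi transport": (5, 15),
    "processing": (5, 10),
}


def total_range(path: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Best case = sum of minima; worst case = sum of maxima."""
    return sum(lo for lo, _ in path.values()), sum(hi for _, hi in path.values())


if __name__ == "__main__":
    for name, path in (("BLE relay", BLE_PATH), ("WiFi to cloud", WIFI_PATH)):
        lo, hi = total_range(path)
        print(f"{name}: {lo}-{hi} ms")
```

This prints 85–260 ms for the BLE relay and 30–45 ms for WiFi. Summing minima and maxima gives a bounding range, not a typical value.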

1.4 Power Implications

| Mode | ESP32-S3 current draw | Battery impact (2000 mAh LiPo) |
|---|---|---|
| WiFi active streaming | 180–240 mA | ~6–8 hours |
| BLE active streaming | 30–60 mA | ~30–50 hours |
| BLE + WiFi coexistence | 200–280 mA | ~5–7 hours |
| WiFi periodic upload (buffered) | Avg 50–80 mA (duty cycled) | ~20–30 hours |

BLE streaming is 3–5x more power efficient than WiFi continuous streaming. However, for a mains-powered exam room device (Prontua’s design), power consumption is not a primary constraint. Power becomes critical only if the device is battery-operated (e.g., wearable or portable variant).
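The battery column can be approximated as usable capacity divided by average current. Dividing the full 2000 mAh gives longer runtimes than the table lists, so the sketch applies a usable-capacity factor; 0.75 is an assumption (standing in for regulator losses and the LiPo cutoff voltage) that lands close to most of the table’s figures, and it should be replaced with bench measurements. The duty-cycle numbers in the example are hypothetical.

```python
"""Rough battery-life estimates for the power table.

Assumption: USABLE_FRACTION stands in for regulator losses and the LiPo
cutoff voltage; 0.75 is a placeholder, not a measured value.
"""

CAPACITY_MAH = 2000
USABLE_FRACTION = 0.75


def runtime_hours(avg_current_ma: float,
                  capacity_mah: float = CAPACITY_MAH,
                  usable_fraction: float = USABLE_FRACTION) -> float:
    """Hours of operation at a constant average current."""
    return capacity_mah * usable_fraction / avg_current_ma


def duty_cycled_current(active_ma: float, idle_ma: float, active_fraction: float) -> float:
    """Average current when the radio bursts for active_fraction of the time."""
    return active_ma * active_fraction + idle_ma * (1 - active_fraction)


if __name__ == "__main__":
    for mode, (lo_ma, hi_ma) in {
        "WiFi active streaming": (180, 240),
        "BLE active streaming": (30, 60),
        "BLE + WiFi coexistence": (200, 280),
    }.items():
        print(f"{mode}: {runtime_hours(hi_ma):.1f}-{runtime_hours(lo_ma):.1f} h")
    # Hypothetical buffered upload: 220 mA bursts 15% of the time, 40 mA otherwise.
    avg = duty_cycled_current(220, 40, 0.15)
    print(f"Buffered upload: {avg:.0f} mA avg -> {runtime_hours(avg):.1f} h")
```

With these inputs, WiFi streaming gives roughly 6–8 h, matching the table, and the hypothetical buffered-upload profile averages 67 mA for about 22 h, inside the table’s 20–30 h range.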


2. Architecture Comparison: Device → Mobile → Cloud

Option A: Device → WiFi → Cloud → Mobile (Current Architecture)

┌─────────┐    WiFi/WS    ┌─────────┐     HTTPS     ┌─────────┐
│ ESP32S3 │──────────────▶│  Cloud  │──────────────▶│ Mobile  │
│ Device  │               │ Backend │◀──────────────│   App   │
└─────────┘               └─────────┘    REST/WS    └─────────┘
| Dimension | Rating | Notes |
|---|---|---|
| Latency | Good (30–45 ms to cloud) | Direct path, no relay |
| Complexity | Low | Device needs only WiFi + a WebSocket client |
| Reliability | Medium | Depends on clinic WiFi quality |
| Battery | N/A (mains) | High power draw, but the device is plugged in |
| Range | Unlimited (WiFi) | Works anywhere with WiFi coverage |
| UX | Simple | No phone involvement in streaming |
| Offline resilience | Poor | No streaming without WiFi |
| Setup | Medium | Device needs WiFi credentials |

Verdict: Best for Prontua’s current use case. Simplest architecture, fewest moving parts.

Option B: Device → BLE → Mobile App → Cloud

┌─────────┐   BLE GATT    ┌─────────┐   LTE/WiFi    ┌─────────┐
│ ESP32S3 │──────────────▶│ Mobile  │──────────────▶│  Cloud  │
│ Device  │               │   App   │               │ Backend │
└─────────┘               └─────────┘               └─────────┘
| Dimension | Rating | Notes |
|---|---|---|
| Latency | Good enough (85–260 ms) | Extra hop through the phone, but fine for batch processing |
| Complexity | High | Needs a mobile app with a BLE stack, background audio relay, and cloud upload |
| Reliability | Medium | The BLE link can drop; the phone must stay in range and the app must keep running, in the foreground or as a permitted background task |
| Battery (device) | Excellent | 3–5x less power than WiFi |
| Battery (phone) | Moderate drain | Continuous BLE plus uploading taxes the vet’s phone |
| Range | 10–30 m (BLE 5.0) | Sufficient for an exam room, not for multi-room |
| UX | Complex | Requires the app running, phone proximity, and BLE pairing |
| Offline resilience | Good | The phone can buffer and upload when connectivity returns |
| Setup | Easy | BLE pairing is simpler for users than WiFi provisioning |

Verdict: Makes sense only if (a) WiFi is unreliable/unavailable, or (b) the device must be battery-powered portable. Adds significant mobile app complexity.

Option C: Device → WiFi Direct → Mobile → Cloud

┌─────────┐  WiFi Direct  ┌─────────┐   LTE/WiFi    ┌─────────┐
│ ESP32S3 │──────────────▶│ Mobile  │──────────────▶│  Cloud  │
│ Device  │               │   App   │               │ Backend │
└─────────┘               └─────────┘               └─────────┘
| Dimension | Rating | Notes |
|---|---|---|
| Latency | Good (10–30 ms to phone) | WiFi speeds without infrastructure |
| Complexity | Very high | WiFi Direct on ESP32 is poorly documented; the phone loses its normal WiFi while connected |
| Reliability | Low | WiFi Direct is flaky; the phone OS may disconnect it in the background |
| Battery (device) | Poor | WiFi-level power consumption |
| Range | 30–50 m | Better than BLE |
| UX | Terrible | The phone disconnects from normal WiFi; the user must manage the connection explicitly |
| Setup | Hard | Poor OS support, especially on iOS |

Verdict: Do not pursue. WiFi Direct is poorly supported on modern phones, disrupts the phone’s normal WiFi, and adds massive complexity for no clear benefit.

Option D (Hybrid): BLE for Setup + WiFi for Streaming

Phase 1 (Setup):
┌─────────┐   BLE GATT    ┌─────────┐
│ ESP32S3 │◀─────────────▶│ Mobile  │  ← WiFi credentials + config
│ Device  │               │   App   │
└─────────┘               └─────────┘

Phase 2 (Operation):
┌─────────┐    WiFi/WS    ┌─────────┐     HTTPS     ┌─────────┐
│ ESP32S3 │──────────────▶│  Cloud  │──────────────▶│ Mobile  │
│ Device  │               │ Backend │◀──────────────│   App   │
└─────────┘               └─────────┘    REST/WS    └─────────┘
| Dimension | Rating | Notes |
|---|---|---|
| Latency | Good (same as Option A) | WiFi carries all audio; BLE is used only for setup |
| Complexity | Medium | BLE provisioning is a well-documented pattern; WiFi streaming is the current plan |
| Reliability | High | WiFi for steady state; BLE only during setup |
| UX | Excellent | Industry-standard IoT setup flow; no ongoing phone dependency |
| Setup | Easy | Scan → pair → enter WiFi → done, like setting up a smart home device |

Verdict: RECOMMENDED architecture. This is the industry-standard pattern for consumer IoT devices.

Summary Matrix

| Criterion | A: WiFi → Cloud | B: BLE Relay | C: WiFi Direct | D: Hybrid (BLE setup + WiFi stream) |
|---|---|---|---|---|
| MVP complexity | Low | High | Very High | Medium |
| UX quality | Good | Medium | Poor | Excellent |
| Audio reliability | High | Medium | Low | High |
| Setup experience | Hard (manual) | Easy (BLE) | Hard | Easy (BLE) |
| Scalability | High | Low | Low | High |
| Phone dependency | None | Full | Full | None (after setup) |

3. BLE Pairing Flow for Consumer IoT Devices

3.1 Industry-Standard BLE Setup Flow

The common pattern used by Nest, Ring, ESP RainMaker, Tuya, and medical IoT devices:

1. DEVICE DISCOVERY
   ├── Device powers on → enters BLE advertising mode
   ├── Broadcasts device name + service UUID
   ├── LED blinks pattern indicating "ready to pair"
   └── App scans for devices with matching service UUID

2. BLE CONNECTION
   ├── User selects device from scan results
   ├── BLE GATT connection established
   ├── Optional: Secure pairing (LESC with numeric comparison or passkey)
   └── Device sends capabilities + firmware version

3. WIFI PROVISIONING (via BLE)
   ├── App requests WiFi scan from device (via BLE characteristic)
   ├── Device scans for WiFi networks → returns SSID list
   ├── User selects network + enters password
   ├── App sends credentials via encrypted BLE characteristic
   ├── Device attempts WiFi connection
   ├── Device reports connection status via BLE notification
   └── On success: device sends its cloud endpoint / device ID

4. CLOUD REGISTRATION
   ├── App registers device with cloud backend (device ID + clinic ID + room assignment)
   ├── Cloud confirms registration
   ├── Device begins normal WiFi operation
   └── BLE connection can be dropped (or kept for local control)

5. ONGOING (optional BLE control plane)
   ├── BLE for: start/stop recording, view status, firmware update trigger
   └── WiFi for: audio streaming, cloud sync, telemetry
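On the device, steps 1–4 map naturally onto a small state machine. The sketch below uses hypothetical state and event names; in firmware, ESP-IDF’s wifi_provisioning manager reports comparable milestones through its own events (for example WIFI_PROV_CRED_RECV, WIFI_PROV_CRED_FAIL and WIFI_PROV_CRED_SUCCESS), while the cloud-registration step would be Prontua’s own code.

```python
"""Device-side model of the BLE setup flow as an explicit state machine.

State and event names are hypothetical, chosen to mirror steps 1-4 above.
"""
from enum import Enum, auto


class SetupState(Enum):
    ADVERTISING = auto()      # 1. discovery: LED shows "ready to pair"
    BLE_CONNECTED = auto()    # 2. GATT link up, capabilities sent
    WIFI_CONNECTING = auto()  # 3. credentials received, joining network
    WIFI_CONNECTED = auto()   # 3. success reported over BLE
    REGISTERED = auto()       # 4. cloud confirmed; normal operation


TRANSITIONS = {
    (SetupState.ADVERTISING, "ble_connect"): SetupState.BLE_CONNECTED,
    (SetupState.BLE_CONNECTED, "ble_disconnect"): SetupState.ADVERTISING,
    (SetupState.BLE_CONNECTED, "credentials_received"): SetupState.WIFI_CONNECTING,
    (SetupState.WIFI_CONNECTING, "wifi_ok"): SetupState.WIFI_CONNECTED,
    # Wrong password or weak signal: keep the BLE session so the app can retry.
    (SetupState.WIFI_CONNECTING, "wifi_fail"): SetupState.BLE_CONNECTED,
    (SetupState.WIFI_CONNECTED, "cloud_registered"): SetupState.REGISTERED,
}


def step(state: SetupState, event: str) -> SetupState:
    """Apply one event; (state, event) pairs not in the table are rejected."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in {state.name}") from None


def run(events: list[str]) -> SetupState:
    """Replay a sequence of events from power-on."""
    state = SetupState.ADVERTISING
    for event in events:
        state = step(state, event)
    return state
```

Sending a failed WiFi attempt back to BLE_CONNECTED, rather than restarting advertising, keeps the app’s session open so a mistyped password can be corrected without re-scanning. Rejecting unknown transitions makes unexpected event orderings visible during testing.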

3.2 ESP32-S3 Provisioning Libraries

Espressif provides the ESP-IDF WiFi Provisioning framework, which implements this flow:

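| Component | Library | Notes |
|---|---|---|
| BLE provisioning | wifi_provisioning (ESP-IDF) | Built-in BLE-based WiFi provisioning. Handles WiFi scan, credential exchange, and connection verification. |
| Security | protocomm security scheme 1 (X25519 key exchange + AES-CTR, authenticated by a proof-of-possession string) or scheme 2 (SRP6a + AES-GCM, authenticated by a username and password) | Encrypted channel over BLE. The PoP or password can be a short printed PIN. |
| Mobile SDKs | Espressif ESP Provisioning libraries (ESPProvision, iOS + Android) | Open-source mobile libraries for the provisioning flow. Can be customized. |
| Custom GATT | NimBLE or Bluedroid | For adding Prontua-specific services (control, status) alongside provisioning |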

ESP-IDF’s wifi_provisioning component is production-ready and handles most of the setup flow (BLE transport, WiFi scan, encrypted credential exchange, connection verification) out of the box. The remaining work is Prontua-specific: clinic ID assignment, room selection, and cloud registration.

| Device | Setup flow | Key UX decisions |
|---|---|---|
| Google Nest / Home | BLE scan → ultrasonic pairing → WiFi provisioning via BLE → cloud registration | Uses an ultrasonic tone as proof of proximity. Excellent UX. |
| Amazon Echo | WiFi AP mode (device creates a hotspot) → app connects → WiFi provisioning → cloud registration | Older pattern; BLE is now preferred. AP mode confuses users (the phone switches WiFi). |
| ESP RainMaker devices | BLE scan → SRP6a pairing → WiFi provisioning → cloud registration | Open source. Reference implementation for ESP32. |
| Withings medical devices | BLE scan → pairing → data sync via BLE (no WiFi) | Medical devices often stay BLE-only for simplicity and certification. |
| Owlet baby monitor | BLE scan → pair → WiFi provisioning → cloud streaming | Very similar use case to Prontua (ambient monitoring device). |
| Tuya smart home | BLE scan → quick pair → WiFi provisioning → Tuya Cloud | Massive scale (millions of devices). EZ Mode + BLE combo. |

Common pattern: BLE for setup, WiFi for operation. Devices that stream over BLE long-term are mostly wearables where WiFi is unavailable (fitness trackers, hearing aids).

3.4 Security Considerations for a Clinical Setting

For a veterinary clinic, the threat model is lighter than in human medicine (no HIPAA), but LGPD still applies, because consult recordings capture the voices and personal data of pet owners and staff:

| Threat | Mitigation | Priority |
|---|---|---|
| Unauthorized BLE pairing | Proof-of-possession (PoP): the device carries a QR code or printed PIN that the app must present during pairing, so a random BLE scanner cannot pair. | High |
| WiFi credential interception | ESP-IDF provisioning with security scheme 2 (SRP6a, Secure Remote Password) or scheme 1 encrypts credentials over BLE, so they are never sent in plaintext. Scheme 0 (no security) must not be used. | High |
| BLE eavesdropping | LE Secure Connections (LESC, available since Bluetooth 4.2) with AES-CCM encryption. The ESP32-S3 supports this natively. | Medium |
| Device impersonation | Each device gets a unique certificate at manufacturing (or first provisioning). The cloud validates device identity via mTLS or signed tokens. | Medium |
| Physical device theft | Credentials stored in the ESP32-S3’s encrypted NVS (with flash encryption). A factory-reset button clears credentials. | Low |
| Man-in-the-middle during setup | Numeric comparison during BLE pairing (the user confirms a 6-digit code on both the app and the device LED/display). For a device without a display, use PoP (QR code on the device body). | Medium |

Recommended security level for Prontua MVP:

  • Proof-of-Possession (QR code printed on device) for BLE pairing
  • SRP6a (ESP-IDF provisioning security scheme 2) for WiFi credential exchange, with the QR-printed secret serving as the SRP6a password
  • TLS 1.3 for all WiFi communication
  • Encrypted NVS for stored credentials
  • No numeric comparison needed (device has no display) — PoP is sufficient
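Each device needs its own random PoP secret and a matching QR label. The JSON fields below (ver, name, pop, transport) follow the QR format printed by ESP-IDF’s wifi_prov_mgr example and read by Espressif’s provisioning apps, as far as we know; verify it against the ESP-IDF version in use. The PROV_ name prefix, the alphabet, and the 8-character length are assumptions, and this would run in a manufacturing or labelling tool, not on the device.

```python
"""Generate a per-device proof-of-possession (PoP) secret and QR payload.

Runs at manufacturing time. Field names follow the ESP-IDF provisioning
example QR format (to be verified); PROV_ prefix and alphabet are assumptions.
"""
import json
import secrets

# 32 symbols with no 0/O or 1/I, so a printed PIN is easy to read aloud.
POP_ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"


def new_pop(length: int = 8) -> str:
    """Random PoP from a cryptographic RNG; flash it into the device's encrypted NVS."""
    return "".join(secrets.choice(POP_ALPHABET) for _ in range(length))


def qr_payload(device_suffix: str, pop: str) -> str:
    """Compact JSON string to print as a QR code on the device body."""
    payload = {"ver": "v1", "name": f"PROV_{device_suffix}", "pop": pop, "transport": "ble"}
    return json.dumps(payload, separators=(",", ":"))


if __name__ == "__main__":
    pop = new_pop()
    print(qr_payload("A1B2C3", pop))
```

Eight characters from a 32-symbol alphabet give 40 bits of randomness, which is likely adequate when an attacker must also be within BLE range during setup; a longer secret costs little beyond QR density.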

3.5 Mobile Framework Comparison for BLE

| Framework | BLE support | Provisioning libraries | Maturity | Recommendation |
|---|---|---|---|---|
| React Native + react-native-ble-plx | Good. Stable library supporting GATT operations, scanning, and notifications. | No official ESP provisioning SDK; implement the protocol manually or use react-native-esp-idf-provisioning (community). | High for BLE basics; medium for ESP provisioning | Good choice if the team knows RN |
| React Native + react-native-esp-idf-provisioning | Wraps Espressif’s native provisioning SDKs | Direct support for the ESP-IDF WiFi provisioning flow (BLE + SoftAP). Handles SRP6a and proof-of-possession. | Medium (community-maintained but actively used) | Best if using ESP-IDF provisioning |
| Flutter + flutter_blue_plus | Good. Similar capabilities to RN BLE. | esp_provisioning Flutter package exists (community). | High for BLE; medium for ESP | Good alternative |
| Native iOS (CoreBluetooth) | Excellent. Full BLE 5.0 support; best performance and reliability. | Espressif provides the official ESPProvision iOS SDK. | Very high | Best if iOS-first |
| Native Android (Android BLE API) | Good but notoriously complex, with many device-specific quirks. | Espressif provides the official ESPProvision Android SDK. | High (but painful) | Best if Android-first |
| Kotlin Multiplatform + Kable | Good. Kable is a solid cross-platform BLE library. | No ESP provisioning wrapper; manual implementation needed. | Medium | Emerging option |

Recommendation for Prontua:

If the vet mobile app is React Native (likely given Moklabs stack): use react-native-ble-plx for custom BLE operations + react-native-esp-idf-provisioning for the WiFi setup flow. This gives you the best combination of ESP-IDF compatibility and cross-platform support.

If going native: Espressif’s official SDKs (ESPProvision for iOS and Android) are production-proven and well-documented.


4. ESP32-S3 BLE + WiFi Coexistence

4.1 Can ESP32-S3 Run BLE and WiFi Simultaneously?

Yes. The ESP32-S3 has a single radio that time-shares between WiFi and BLE using a coexistence arbitration mechanism. This is well-supported in ESP-IDF.

| Aspect | Details |
|---|---|
| Hardware | Single 2.4 GHz radio with a hardware coexistence arbiter |
| ESP-IDF support | CONFIG_SW_COEXIST_ENABLE=y in menuconfig; enabled by default in recent ESP-IDF versions. |
| Performance impact | WiFi throughput drops by ~10–20% when BLE is active. BLE may see occasional extra latency (10–50 ms). |
| Stability | Production-proven: concurrent WiFi + BLE is widely deployed in ESP32-based consumer devices. |
| Known issues | Under aggressive WiFi streaming (high, continuous throughput), BLE scanning/advertising may become less responsive. Not a problem if BLE is used only for control (low duty cycle). |

4.2 Practical Coexistence Patterns

Pattern 1: BLE for Provisioning, WiFi for Operation (Sequential)

Boot → BLE advertising (WiFi off)
     → User provisions via BLE
     → WiFi connects
     → BLE stops advertising (or enters low-power standby)
     → WiFi handles all streaming
     → BLE reactivated only for: reconfiguration, status queries, firmware update trigger

This is the simplest and most reliable pattern. No simultaneous operation needed.
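Pattern 1 reduces to a boot-time decision. In the sketch below, the retry threshold and the policy of returning to BLE provisioning after repeated WiFi failures are assumptions, included so that a clinic that changes its WiFi password can reconfigure the device without a factory reset.

```python
"""Boot-time mode selection for Pattern 1 (sequential: BLE, then WiFi).

Hypothetical policy: return to BLE provisioning when no credentials are
stored, or when stored credentials fail MAX_WIFI_FAILURES boots in a row.
"""
from enum import Enum

MAX_WIFI_FAILURES = 5  # arbitrary placeholder; tune during clinic testing


class BootMode(Enum):
    BLE_PROVISIONING = "ble_provisioning"  # advertise over BLE, WiFi off
    WIFI_OPERATION = "wifi_operation"      # stream over WiFi, BLE off or standby


def choose_boot_mode(has_credentials: bool, consecutive_wifi_failures: int) -> BootMode:
    """Decide what the device does right after power-on."""
    if not has_credentials:
        return BootMode.BLE_PROVISIONING
    if consecutive_wifi_failures >= MAX_WIFI_FAILURES:
        # Credentials are probably stale (e.g. the clinic changed its password).
        return BootMode.BLE_PROVISIONING
    return BootMode.WIFI_OPERATION
```

For the threshold to mean anything, the failure counter has to survive reboots (for example in NVS) and be reset after every successful connection.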

Pattern 2: BLE Control Plane + WiFi Data Plane (Simultaneous)

Boot → BLE advertising + WiFi connected (coexistence)
     → WiFi streams audio continuously
     → BLE handles: start/stop commands, status, battery, device info
     → BLE operates at low duty cycle (~1 packet/sec for status)

This works well because BLE control traffic is minimal (~100 bytes/sec) and does not meaningfully contend with WiFi streaming. The coexistence arbiter handles scheduling.

Pattern 3: BLE Fallback When WiFi Drops (Failover)

Normal: WiFi streaming active, BLE standby
WiFi drops → Device detects disconnect
          → Activates BLE streaming (Opus-compressed audio to phone)
          → Phone relays to cloud via LTE
WiFi reconnects → Switch back to WiFi streaming
               → BLE returns to standby

This is the most complex pattern but provides the best reliability. Useful if WiFi is unreliable.
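The switching logic in Pattern 3 needs hysteresis; otherwise a flaky access point makes the device bounce between transports. The sketch below is one way to do it. The 3 s failover and 10 s recovery thresholds, the class name, and the string transport labels are all hypothetical.

```python
"""Transport selection for Pattern 3 (BLE fallback) with hysteresis.

Hypothetical thresholds: fail over only after WiFi has been down for
FAILOVER_AFTER_S, and switch back only after it has been up for
RECOVER_AFTER_S, so brief outages or reconnects do not cause flapping.
"""
from dataclasses import dataclass
from typing import Optional

FAILOVER_AFTER_S = 3.0
RECOVER_AFTER_S = 10.0


@dataclass
class TransportSelector:
    active: str = "wifi"                      # "wifi" or "ble"
    wifi_down_since: Optional[float] = None   # time the current outage began
    wifi_up_since: Optional[float] = None     # time the current uptime began

    def update(self, now: float, wifi_up: bool) -> str:
        """Feed the current time (s) and WiFi link state; return the transport to use."""
        if wifi_up:
            self.wifi_down_since = None
            if self.wifi_up_since is None:
                self.wifi_up_since = now
            if self.active == "ble" and now - self.wifi_up_since >= RECOVER_AFTER_S:
                self.active = "wifi"
        else:
            self.wifi_up_since = None
            if self.wifi_down_since is None:
                self.wifi_down_since = now
            if self.active == "wifi" and now - self.wifi_down_since >= FAILOVER_AFTER_S:
                self.active = "ble"
        return self.active
```

Because the return to WiFi waits for 10 s of stable link, a brief reconnect does not interrupt the BLE path. Keeping one sequence counter across both transports would let the backend reorder and de-duplicate frames around a switch.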

4.3 Recommendation for Prontua

Phase 1 (Prototype): WiFi only. Hardcode WiFi credentials; no BLE is needed yet. Pattern 1’s BLE provisioning step is deferred until devices are deployed across multiple clinics or rooms.

Phase 2 (MVP): Pattern 2 — BLE for provisioning + control plane, WiFi for audio streaming. This gives you:

  • Easy device setup via mobile app (BLE provisioning)
  • Reliable audio streaming (WiFi)
  • Remote control without cloud dependency (BLE: start/stop, status check)
  • No phone dependency during normal operation

Phase 3 (If WiFi proves unreliable): Pattern 3 — Add BLE audio fallback. Only build this if clinic WiFi testing in Phase 1 reveals problems in >30% of sites.


5. Consolidated Recommendation

Architecture Decision

Use Option D: Hybrid BLE Setup + WiFi Streaming.

This is the industry standard for IoT devices, is well-supported by Espressif’s toolchain, and provides the best user experience without adding runtime complexity.

Implementation Priority

| Priority | Component | Phase | Effort |
|---|---|---|---|
| 1 | WiFi audio streaming (current plan) | Phase 1 | Already planned |
| 2 | BLE provisioning flow (WiFi setup via app) | Phase 2 | ~1 week firmware + ~1 week mobile |
| 3 | BLE control plane (start/stop, status) | Phase 2 | ~3 days firmware + ~2 days mobile |
| 4 | BLE audio fallback (if WiFi unreliable) | Phase 3 (conditional) | ~2 weeks firmware + ~1 week mobile |

What NOT to Build

  • LE Audio / LC3: Not available on ESP32-S3. Do not wait for it.
  • WiFi Direct: Terrible UX, poor OS support. Do not pursue.
  • BLE as primary audio transport: Unnecessary complexity when WiFi is available and device is mains-powered.
  • Classic Bluetooth A2DP/HFP: Wrong use case, poor quality for speech capture.

Key Technical Specifications for BLE Implementation

BLE Service: Prontua Device Service
  UUID: (custom 128-bit)

  Characteristics:
  ├── Audio Data (notify, MTU=512)
  │   └── Opus frames, 20ms, ~60 bytes/frame
  ├── Control (read/write)
  │   └── Commands: START_SESSION, STOP_SESSION, STATUS
  ├── Device Info (read)
  │   └── Firmware version, battery %, WiFi status, device ID
  ├── WiFi Config (write, encrypted)
  │   └── SSID + password (via ESP-IDF provisioning protocol)
  └── Session Metadata (notify)
      └── Session ID, timestamp, duration, audio stats

Connection Parameters:
  ├── MTU: 512 bytes (negotiate on connect)
  ├── Connection interval: 15ms (control) / 7.5ms (audio fallback)
  ├── PHY: 2M (BLE 5.0) when available
  └── Security: LESC + PoP (QR code)
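The Control characteristic above lists only command names. One compact wire encoding is a 1-byte opcode followed by an optional payload; the opcode values and the 16-byte session UUID payload below are hypothetical choices, not part of any existing Prontua specification.

```python
"""Encode/decode writes to the Control characteristic.

Opcodes and payload layout are hypothetical; the spec above only names
START_SESSION, STOP_SESSION and STATUS.
"""
import uuid
from enum import IntEnum
from typing import Optional


class Command(IntEnum):
    START_SESSION = 0x01  # payload: 16-byte session UUID
    STOP_SESSION = 0x02   # no payload
    STATUS = 0x03         # no payload; reply arrives via Session Metadata


def encode(command: Command, session_id: Optional[uuid.UUID] = None) -> bytes:
    """Build the value the phone writes to the Control characteristic."""
    if command is Command.START_SESSION:
        if session_id is None:
            raise ValueError("START_SESSION requires a session ID")
        return bytes([command]) + session_id.bytes
    if session_id is not None:
        raise ValueError(f"{command.name} takes no payload")
    return bytes([command])


def decode(value: bytes) -> tuple[Command, Optional[uuid.UUID]]:
    """Device-side parser; rejects unknown opcodes and wrong lengths."""
    if not value:
        raise ValueError("empty write")
    command = Command(value[0])  # raises ValueError for an unknown opcode
    if command is Command.START_SESSION:
        if len(value) != 17:
            raise ValueError("START_SESSION expects a 16-byte UUID")
        return command, uuid.UUID(bytes=value[1:])
    if len(value) != 1:
        raise ValueError(f"{command.name} takes no payload")
    return command, None
```

Rejecting writes with the wrong length keeps a buggy or hostile client from triggering partially parsed commands; in firmware the same checks would sit in the GATT write handler.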

Mobile App BLE Integration (React Native)

Dependencies:
  ├── react-native-ble-plx (BLE operations)
  ├── react-native-esp-idf-provisioning (WiFi setup)
  └── react-native-qrcode-scanner (PoP from device QR)

Screens:
  ├── Device Setup
  │   ├── Scan for Prontua devices (BLE scan)
  │   ├── Scan QR code on device (proof-of-possession)
  │   ├── Select WiFi network (from device scan results)
  │   ├── Enter WiFi password
  │   ├── Confirm setup (device connects to WiFi)
  │   └── Assign to clinic room
  ├── Device Status
  │   ├── Connection status (WiFi signal, BLE proximity)
  │   ├── Current session status (recording/idle)
  │   └── Battery level (if battery-powered variant)
  └── Session Control
      ├── Start/stop consult (via BLE or cloud API)
      └── View recent sessions

6. Open Questions for CTO

  1. Is the XIAO ESP32S3 Sense BLE antenna adequate? The XIAO has a PCB antenna shared between WiFi and BLE. In a metal exam table environment, range may be reduced. Test BLE range in actual clinic before committing to BLE features.

  2. React Native or native for mobile app? If the app is primarily a dashboard (view notes, edit, approve), React Native is fine. If BLE reliability is critical (many background operations), native may be more robust for BLE specifically.

  3. Should BLE control replace the physical button? The current Phase 1 spec uses the boot button for start/stop. BLE control from the phone could replace this, but a physical fallback is important (phone not available, app crashed).

  4. Phase 1 shortcut: skip BLE entirely? For the prototype with 1 device in 1 clinic, hardcoded WiFi credentials and no mobile app is the fastest path. BLE provisioning adds value only when deploying to multiple clinics/rooms.


Research complete. This analysis supports the existing WiFi-first architecture while laying out a clear path for BLE provisioning and control in Phase 2.
