AI-Powered Explicit Feedback Design for Language Learning — Claude API Integration Patterns and Pronunciation Assessment
Product: Mellow | Date: 2026-03-20 | Tags: claude-api, pronunciation-assessment, ai-tutor, explicit-feedback, spaced-repetition, FSRS, react-native, speech-to-text
Executive Summary
Mellow’s core differentiation is explicit, autism-friendly AI feedback — no ambiguity, no sarcasm, concrete corrections with explanations. This report covers: (1) Claude API integration patterns for language tutoring with autism-specific prompt design, (2) pronunciation assessment API comparison and React Native integration, (3) async speaking practice architecture, and (4) energy-adaptive spaced repetition using FSRS. Estimated API cost per active user: $0.30-0.40/month after the optimizations in section 5.
1. Claude API for Explicit Language Feedback
1.1 Why Claude for Mellow
| Requirement | Claude Fit |
|---|---|
| No sarcasm or idioms in feedback | Claude follows system prompt instructions precisely |
| Explicit corrections (show wrong + right) | Strong at structured output (JSON feedback objects) |
| Portuguese-to-English context | Excellent multilingual capability |
| Beginner A1 level explanations | Can be constrained to simple vocabulary |
| Consistent tone across sessions | System prompt ensures uniform personality |
1.2 System Prompt Architecture
┌────────────────────────────────────────┐
│ SYSTEM PROMPT │
│ │
│ Role: Patient English tutor │
│ Constraints: │
│ - Never use sarcasm, irony, humor │
│ - Always explicit corrections │
│ - Simple A1-level language │
│ - Portuguese explanations available │
│ - Structured JSON output │
│ - No time pressure language │
│ - Celebrate effort, not speed │
└────────────────────────────────────────┘
Example System Prompt
You are a patient, clear English tutor for a Portuguese-speaking adult
learner at A1 level. Your student is autistic and prefers explicit,
concrete feedback.
RULES:
1. NEVER use sarcasm, irony, jokes, or figurative language
2. ALWAYS show the correct answer alongside the error
3. Explain WHY something is wrong in simple terms
4. Use Portuguese for grammar explanations when helpful
5. Keep sentences short (max 15 words per sentence)
6. One concept per response — never bundle corrections
7. Use "You wrote X. The correct form is Y." format
8. Never say "try again" — always provide the answer
9. Never create time pressure ("hurry", "quick", "before time runs out")
10. Acknowledge effort: "You got 4 out of 5 correct. Great practice."
OUTPUT FORMAT (JSON):
{
  "feedback": "string — main feedback message",
  "correction": { "wrong": "string", "correct": "string" } | null,
  "explanation": "string — why (in Portuguese if grammar)",
  "encouragement": "string — effort-based, never comparative",
  "nextHint": "string — optional tip for next exercise"
}
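Because model output is not guaranteed to match the schema, the client should validate the JSON before rendering it. A minimal sketch of a runtime guard, with hypothetical `FeedbackResponse` and `isFeedbackResponse` names mirroring the format above:

```typescript
// Hypothetical runtime guard for the feedback schema above.
// Rejects anything that does not match before it reaches the UI.

interface Correction {
  wrong: string;
  correct: string;
}

interface FeedbackResponse {
  feedback: string;
  correction: Correction | null;
  explanation: string;
  encouragement: string;
  nextHint: string | null;
}

function isFeedbackResponse(value: unknown): value is FeedbackResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const c = v.correction as Record<string, unknown> | null;
  const correctionOk =
    c === null ||
    (typeof c === "object" && c !== null &&
      typeof c.wrong === "string" && typeof c.correct === "string");
  return (
    typeof v.feedback === "string" &&
    correctionOk &&
    typeof v.explanation === "string" &&
    typeof v.encouragement === "string" &&
    (v.nextHint === null || typeof v.nextHint === "string")
  );
}
```

On a failed parse the app can fall back to a neutral canned message rather than showing malformed feedback.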
1.3 Feedback Use Cases
Grammar Correction
```json
{
  "feedback": "You wrote 'She go to school.' The correct form is 'She goes to school.'",
  "correction": { "wrong": "She go", "correct": "She goes" },
  "explanation": "Em inglês, quando o sujeito é he/she/it, o verbo no presente ganha -s ou -es. Isso se chama 'third person singular'.",
  "encouragement": "You remembered the word order correctly. Good job.",
  "nextHint": "Other verbs that change: do → does, have → has."
}
```
Vocabulary Review
```json
{
  "feedback": "You chose 'mouse' for the picture of a keyboard. The correct word is 'keyboard'.",
  "correction": { "wrong": "mouse", "correct": "keyboard" },
  "explanation": "Mouse é o dispositivo que você move com a mão. Keyboard é o teclado onde você digita.",
  "encouragement": "Both words are from the Tech pack. You are learning them.",
  "nextHint": null
}
```
Pronunciation Feedback (after speech assessment)
```json
{
  "feedback": "You said 'hello'. Your pronunciation score is 78 out of 100.",
  "correction": null,
  "explanation": "The 'h' sound was clear. Try making the 'o' sound longer, like 'hel-LOH'.",
  "encouragement": "78 is a good score for this word. Each practice makes it clearer.",
  "nextHint": "Try saying it slowly first, then at normal speed."
}
```
1.4 API Integration Architecture
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ React Native │────▶│ Fastify API │────▶│ Claude API │
│ (exercise │ │ /feedback │ │ (Haiku 4.5) │
│ submission) │ │ │ │ │
│ │◀────│ structured │◀────│ JSON output │
│ render │ │ response │ │ │
│ feedback UI │ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
Model Selection
| Use Case | Model | Why | Cost/1M tokens |
|---|---|---|---|
| Exercise feedback | Haiku 4.5 | Fast, cheap, follows instructions well | $0.80 input / $4.00 output |
| Grammar explanations | Haiku 4.5 | Structured output, low latency | Same |
| Conversation practice | Sonnet 4.6 | Needs more nuance for freeform dialogue | $3.00 input / $15.00 output |
| Content generation (admin) | Opus 4.6 | Quality for lesson creation | $15.00 input / $75.00 output |
Cost Estimation per User
Average exercise: ~200 input tokens + ~150 output tokens
Sessions/day: 1 (average)
Exercises/session: 10
Daily cost per user:
Input: 10 × 200 = 2,000 tokens × $0.80/1M = $0.0016
Output: 10 × 150 = 1,500 tokens × $4.00/1M = $0.006
Total: ~$0.008/day = ~$0.24/month
With caching (repeated system prompt):
Cached input: ~$0.04/1M → reduces input cost by 95%
Effective: ~$0.16/month per active user
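The arithmetic above can be captured in a small helper for scenario planning. Prices and token counts here are this report’s assumptions, not confirmed API pricing:

```typescript
// Per-user cost sketch using the report's assumed token counts
// and per-million-token prices (assumptions, not confirmed pricing).

interface CostModel {
  inputTokensPerExercise: number;
  outputTokensPerExercise: number;
  exercisesPerDay: number;
  inputPricePerMTok: number;  // USD per 1M input tokens
  outputPricePerMTok: number; // USD per 1M output tokens
}

function dailyCostUSD(m: CostModel): number {
  const inputCost =
    (m.exercisesPerDay * m.inputTokensPerExercise / 1_000_000) * m.inputPricePerMTok;
  const outputCost =
    (m.exercisesPerDay * m.outputTokensPerExercise / 1_000_000) * m.outputPricePerMTok;
  return inputCost + outputCost;
}

const haiku: CostModel = {
  inputTokensPerExercise: 200,
  outputTokensPerExercise: 150,
  exercisesPerDay: 10,
  inputPricePerMTok: 0.80,
  outputPricePerMTok: 4.00,
};

// dailyCostUSD(haiku) ≈ $0.0076/day, ~$0.23-0.24/month over 30 days
```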
1.5 Caching Strategy
- System prompt caching: The system prompt (~500 tokens) is identical across all requests. Use Anthropic’s prompt caching to cut its input-token cost by 90%+
- Lesson context caching: Each lesson’s vocabulary and grammar rules can be cached as ephemeral context
- Response caching: For identical exercises (vocabulary image matching), cache Claude’s response locally and serve without API call
2. Pronunciation Assessment
2.1 API Comparison
| Provider | Price | Languages | Features | Latency | Mobile SDK |
|---|---|---|---|---|---|
| Azure Speech | $0.022/min (Pronunciation Assessment) | 140+ | Phoneme scoring, fluency, prosody, content accuracy | <1s | iOS/Android SDK |
| Speechace | $50-500/mo plans | 30+ | Word/sentence scoring, phoneme detail, IELTS/TOEFL alignment | 1-2s | REST API |
| Whisper API | $0.006/min | 50+ | Transcription only (no scoring) | 1-3s | REST API |
| Deepgram | $0.0043/min | 30+ | Transcription, sentiment, no pronunciation scoring | <0.5s | WebSocket SDK |
| On-device (iOS 26) | Free | 20+ | SpeechAnalyzer API, on-device, no cloud needed | <0.5s | Native only |
2.2 Recommended Approach for Mellow
Hybrid architecture:
- Primary: Azure Speech Pronunciation Assessment
  - Best pronunciation scoring on the market (phoneme-level detail)
  - $0.022/min; at 2 minutes of speaking per session that is $0.044/day, ~$1.32/month per active user (the section 5 cost model assumes a lighter average of ~20 min/month)
  - Supports Portuguese and English natively
- Fallback: On-device iOS SpeechAnalyzer (iOS 26)
  - Free, no network needed
  - Good for basic transcription and confidence scoring
  - Use when the user is offline or for quick word pronunciation
- Transcription layer: Whisper API (batch, for review)
  - Cheapest option for async transcription of recorded speech
  - $0.006/min for post-session review features
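The routing rule implied by this hybrid setup can be sketched as a pure function. Type and function names are assumptions for illustration:

```typescript
// Hybrid provider routing sketch (illustrative names).
// Word-level items stay on-device; sentence-level and longer
// go to Azure when the network allows.

type ExerciseLevel = "word" | "sentence" | "conversation";
type Provider = "on-device" | "azure" | "azure+claude";

function pickProvider(level: ExerciseLevel, online: boolean): Provider {
  if (!online) return "on-device"; // offline fallback for every level
  switch (level) {
    case "word":
      return "on-device"; // free, lowest latency
    case "sentence":
      return "azure"; // phoneme-level scoring needed
    case "conversation":
      return "azure+claude"; // scoring plus human-readable feedback
  }
}
```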
2.3 Pronunciation Assessment Flow
User taps "Speak" button
│
▼
┌─────────────────┐
│ Record audio │ ← No countdown timer
│ (press to start, │ User records when ready
│ press to stop) │ No "hurry up" cues
└─────────┬───────┘
│
▼
┌─────────────────┐
│ Send to Azure │
│ Speech API │
│ /pronunciation │
│ /assessment │
└─────────┬───────┘
│
▼
┌─────────────────┐ ┌─────────────────┐
│ Score response: │───▶│ Send to Claude │
│ - accuracy: 78 │ │ for human- │
│ - fluency: 65 │ │ readable feedback │
│ - prosody: 72 │ │ in Portuguese │
│ - phonemes: [..] │ └─────────┬───────┘
└─────────────────┘ │
▼
┌─────────────────┐
│ Display feedback │
│ "Your score: 78 │
│ The 'th' sound │
│ needs practice" │
└─────────────────┘
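The handoff step in the diagram, turning Azure’s numeric scores into a prompt for Claude, could look like the sketch below. Field names mirror the diagram; the threshold and wording are assumptions:

```typescript
// Sketch of the Azure-score-to-Claude handoff (illustrative names,
// assumed threshold). Picks out weak phonemes and builds a prompt
// for the feedback model.

interface PhonemeScore { phoneme: string; accuracy: number }

interface PronunciationScore {
  accuracy: number;
  fluency: number;
  prosody: number;
  phonemes: PhonemeScore[];
}

function weakestPhonemes(score: PronunciationScore, threshold = 60): string[] {
  return score.phonemes
    .filter(p => p.accuracy < threshold)
    .sort((a, b) => a.accuracy - b.accuracy) // weakest first
    .map(p => p.phoneme);
}

function buildClaudePrompt(word: string, score: PronunciationScore): string {
  const weak = weakestPhonemes(score);
  const weakNote = weak.length > 0
    ? `Sounds needing practice: ${weak.join(", ")}.`
    : "All sounds were clear.";
  return `The learner said "${word}". Accuracy ${score.accuracy}/100, ` +
    `fluency ${score.fluency}/100, prosody ${score.prosody}/100. ${weakNote} ` +
    `Give explicit, encouraging feedback following the system rules.`;
}
```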
2.4 Autism-Specific Speaking UX
- Record when ready: No “3, 2, 1, speak!” countdown. User presses when comfortable
- Re-record unlimited times: No limit on attempts. Each is a practice opportunity
- Show waveform during recording (visual feedback that mic is working)
- Score as number (78/100), not stars/grades/badges
- Phoneme-level detail optional (expand to see which sounds need work)
- No comparison with others: Only self-comparison (“Last time: 72 → This time: 78”)
- Skip speaking: Every speaking exercise has a “Skip — I’ll practice this later” option
3. Async Speaking Practice
3.1 Why Async Matters for Autistic Users
Real-time speaking exercises create pressure and anxiety. Mellow’s async approach:
- No live conversation partner — eliminates social anxiety
- Record at own pace — no time pressure
- Review before submitting — listen to own recording first
- Get feedback later — decouple speaking from evaluation
- Practice in safe space — user chooses when/where to speak
3.2 Architecture
```typescript
// Async speaking flow: record locally, assess later
interface SpeakingExercise {
  id: string;
  targetPhrase: string;          // "Hello, my name is..."
  targetAudio: string;           // Native speaker reference audio
  userRecording: string | null;  // Local file path
  assessmentResult: PronunciationScore | null;
  claudeFeedback: FeedbackResponse | null;
  status: 'pending' | 'recorded' | 'assessed' | 'reviewed';
}

// User records → save locally → assess when ready.
// Assessment can happen in the background while the user continues other exercises.
```
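The `status` field above implies a small state machine. A sketch of the transitions (the transition map itself is an assumption based on the listed states):

```typescript
// Lifecycle sketch for SpeakingExercise.status (assumed transitions).

type SpeakingStatus = "pending" | "recorded" | "assessed" | "reviewed";

const NEXT: Record<SpeakingStatus, SpeakingStatus | null> = {
  pending: "recorded",  // user saves a recording
  recorded: "assessed", // background assessment completes
  assessed: "reviewed", // user reads the feedback
  reviewed: null,       // terminal state
};

function advance(status: SpeakingStatus): SpeakingStatus {
  const next = NEXT[status];
  if (next === null) throw new Error(`"${status}" is terminal`);
  return next;
}

// Re-recording is always allowed: any state can return to "recorded",
// matching the "unlimited attempts" principle in 2.4.
function reRecord(_status: SpeakingStatus): SpeakingStatus {
  return "recorded";
}
```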
3.3 Cost Optimization
- Batch processing: Collect speaking exercises during session, send for assessment as batch after session ends
- On-device pre-filter: Use iOS SpeechAnalyzer to do basic transcription on-device. Only send to Azure for detailed scoring
- Progressive assessment: Word-level exercises use on-device only. Sentence-level uses Azure. Conversation-level uses Azure + Claude
4. Energy-Adaptive Spaced Repetition
4.1 FSRS Overview
FSRS (Free Spaced Repetition Scheduler) is the current state-of-the-art scheduling algorithm; published benchmarks report 20-30% fewer reviews than SM-2 at the same retention. Three core variables:
- Retrievability (R): Probability of successful recall (0-100%)
- Stability (S): Time for R to decay from 100% to 90%
- Difficulty (D): Inherent complexity of the item (1-10)
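The three variables are tied together by the FSRS forgetting curve. Using the FSRS-4.5 constants, retrievability t days after a review is:

```typescript
// FSRS-4.5 forgetting curve: retrievability t days after a review
// of a card with stability S. Constants are from the published
// FSRS-4.5 algorithm; by construction R(S, S) = 0.9, i.e. stability
// is exactly the interval at which recall probability hits 90%.

const DECAY = -0.5;
const FACTOR = 19 / 81;

function retrievability(elapsedDays: number, stability: number): number {
  return Math.pow(1 + FACTOR * (elapsedDays / stability), DECAY);
}

// retrievability(10, 10) ≈ 0.9  (review 10 days old, stability 10)
```

This is the R that the energy-adaptive filter in 4.4 compares against 0.5 to find urgent cards.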
4.2 Energy-Adaptive Modifications for Mellow
Standard FSRS doesn’t account for variable energy. Mellow’s adaptation:
┌─────────────────────────────────────────────┐
│ ENERGY-ADAPTIVE FSRS │
│ │
│ Standard FSRS scheduling │
│ │ │
│ ▼ │
│ Filter by energy level: │
│ │
│ HIGH energy → Full review deck │
│ New cards + Due reviews + Hard cards │
│ │
│ MEDIUM energy → Reduced deck │
│ Due reviews only (no new cards) │
│ Skip cards with D > 7 │
│ │
│ LOW energy → Minimal deck │
│ Only cards with R < 50% (urgent) │
│ Max 10 cards │
│ Easy mode (recognition only, no recall) │
│ │
│ BROWSE → No deck │
│ View vocabulary list (read-only) │
│ No active recall required │
└─────────────────────────────────────────────┘
4.3 Key Differences from Standard SRS
| Standard SRS | Mellow’s Energy-Adaptive SRS |
|---|---|
| Fixed daily review count | Variable based on energy selection |
| Overdue cards pile up → anxiety | Graceful degradation: low-energy sessions still maintain critical items |
| "You missed 47 reviews" guilt messaging | "Welcome back. Here are 5 important words to keep fresh." |
| Same difficulty for all moods | Easy mode (recognition) vs Hard mode (production) based on energy |
| Streak-based motivation | Progress-based: “You know 127 of 300 words in Tech pack” |
4.4 Implementation
```typescript
interface MellowSRSCard {
  wordId: string;
  stability: number;       // FSRS S parameter
  difficulty: number;      // FSRS D parameter (1-10)
  retrievability: number;  // FSRS R parameter (0-1)
  lastReview: Date;
  nextReview: Date;
  reps: number;
  lapses: number;
}

// Provided by the lesson store: fetches up to `limit` not-yet-studied cards
declare function getNewCards(limit: number): MellowSRSCard[];

function getSessionDeck(
  cards: MellowSRSCard[],
  energy: 'high' | 'medium' | 'low' | 'browse'
): MellowSRSCard[] {
  const now = new Date();
  const dueCards = cards.filter(c => c.nextReview <= now);
  switch (energy) {
    case 'high':
      // All due cards + up to 10 new cards
      return [...dueCards, ...getNewCards(10)];
    case 'medium':
      // Only due reviews, skip hard cards
      return dueCards.filter(c => c.difficulty <= 7);
    case 'low':
      // Only urgent cards (R < 0.5), max 10
      return dueCards
        .filter(c => c.retrievability < 0.5)
        .slice(0, 10);
    case 'browse':
      return []; // Read-only mode, no active review
  }
}
```
4.5 No-Guilt Messaging Framework
| Scenario | Standard App | Mellow |
|---|---|---|
| Missed 3 days | "You lost your streak! 🔥" | "Welcome back. You still know 127 words." |
| Failed a card | "Wrong! ❌ -1 heart" | "The correct word is 'keyboard'. Added to your review list." |
| Low energy session | N/A (no concept) | "You reviewed 5 important words. That keeps them fresh." |
| Quit mid-session | "Are you sure? You'll lose progress!" | "Your progress is saved. See you next time." |
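The pattern behind the Mellow column can be distilled into a message builder: state facts, never blame, always lead with what is retained. An illustrative sketch:

```typescript
// No-guilt message builder sketch (illustrative names). The rule:
// never mention the gap as a failure; lead with retained progress.

interface ReturnContext {
  daysAway: number;      // informational only; never surfaced as guilt
  knownWords: number;    // retained vocabulary to lead with
  urgentReviews: number; // cards with low retrievability
}

function welcomeBackMessage(ctx: ReturnContext): string {
  const base = `Welcome back. You still know ${ctx.knownWords} words.`;
  if (ctx.urgentReviews === 0) return base;
  return `${base} Here are ${ctx.urgentReviews} important words to keep fresh.`;
}
```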
5. Total Cost Model
Per Active User Per Month
| Component | Usage | Cost |
|---|---|---|
| Claude Haiku (exercise feedback) | ~300 exercises/mo | $0.16 |
| Azure Pronunciation Assessment | ~20 min speaking/mo | $0.44 |
| Whisper (batch transcription) | ~10 min/mo | $0.06 |
| Total API cost per active user | | $0.66/month |
At Scale (1,000 active users)
| Component | Monthly cost |
|---|---|
| Claude API | $160 |
| Azure Speech | $440 |
| Whisper | $60 |
| Total | $660/month |
Optimization Opportunities
- Prompt caching: already reflected in the Claude figure above ($0.16/month cached vs ~$0.24 uncached, per the estimate in 1.4)
- On-device pronunciation for word-level: Eliminates ~60% of Azure calls
- Response caching for identical exercises: Reduces Claude calls by ~30%
- Optimized total: ~$0.30-0.40/month per active user
6. Recommendations
MVP (Phase 1)
- Claude Haiku 4.5 for all exercise feedback with autism-specific system prompt
- Azure Speech Pronunciation Assessment for sentence-level speaking exercises
- FSRS base algorithm with energy-level filtering (high/medium/low/browse)
- No-guilt messaging framework across all UI touchpoints
- Async speaking only — no real-time conversation in MVP
V1.1 (Phase 2)
- On-device SpeechAnalyzer for word-level pronunciation (reduce Azure cost)
- Claude Sonnet for optional freeform conversation practice
- Prompt caching for system prompt + lesson context
- Pronunciation history showing improvement over time (self-comparison only)
V2 (Phase 3)
- Content generation pipeline using Opus for new lesson packs
- Adaptive difficulty using Claude to generate exercises at user’s exact level
- Voice cloning for native speaker pronunciation models (ElevenLabs or similar)
Sources
- Picovoice — React Native Speech Recognition 2026 Guide
- Callstack — On-Device Speech Transcription with Apple SpeechAnalyzer
- Azure Speech Pronunciation Assessment Pricing
- Deepgram — Best Speech-to-Text APIs 2026
- FSRS Algorithm Wiki
- FSRS vs SM-2 Comparison
- LECTOR: LLM-Enhanced Spaced Repetition
- Anthropic — Claude Education Solutions
- LPITutor: LLM Personalized Intelligent Tutoring
- Adaptive Scaffolding for LLM Pedagogical Agents