AI-Powered Explicit Feedback Design for Language Learning — Claude API Integration Patterns and Pronunciation Assessment
Product: Mellow | Date: 2026-03-20 | Tags: claude-api, pronunciation-assessment, ai-tutor, explicit-feedback, spaced-repetition, FSRS, react-native, speech-to-text
Executive Summary
Mellow’s core differentiation is explicit, autism-friendly AI feedback — no ambiguity, no sarcasm, concrete corrections with explanations. This report covers: (1) Claude API integration patterns for language tutoring with autism-specific prompt design, (2) pronunciation assessment API comparison and React Native integration, (3) async speaking practice architecture, and (4) energy-adaptive spaced repetition using FSRS. Estimated API cost per active user: $0.30-0.40/month after the optimizations in section 5.
1. Claude API for Explicit Language Feedback
1.1 Why Claude for Mellow
| Requirement | Claude Fit |
|---|---|
| No sarcasm or idioms in feedback | Claude follows system prompt instructions precisely |
| Explicit corrections (show wrong + right) | Strong at structured output (JSON feedback objects) |
| Portuguese-to-English context | Excellent multilingual capability |
| Beginner A1 level explanations | Can be constrained to simple vocabulary |
| Consistent tone across sessions | System prompt ensures uniform personality |
1.2 System Prompt Architecture
┌────────────────────────────────────────┐
│ SYSTEM PROMPT │
│ │
│ Role: Patient English tutor │
│ Constraints: │
│ - Never use sarcasm, irony, humor │
│ - Always explicit corrections │
│ - Simple A1-level language │
│ - Portuguese explanations available │
│ - Structured JSON output │
│ - No time pressure language │
│ - Celebrate effort, not speed │
└────────────────────────────────────────┘
Example System Prompt
You are a patient, clear English tutor for a Portuguese-speaking adult
learner at A1 level. Your student is autistic and prefers explicit,
concrete feedback.
RULES:
1. NEVER use sarcasm, irony, jokes, or figurative language
2. ALWAYS show the correct answer alongside the error
3. Explain WHY something is wrong in simple terms
4. Use Portuguese for grammar explanations when helpful
5. Keep sentences short (max 15 words per sentence)
6. One concept per response — never bundle corrections
7. Use "You wrote X. The correct form is Y." format
8. Never say "try again" — always provide the answer
9. Never create time pressure ("hurry", "quick", "before time runs out")
10. Acknowledge effort: "You got 4 out of 5 correct. Great practice."
OUTPUT FORMAT (JSON):
{
  "feedback": "string — main feedback message",
  "correction": { "wrong": "string", "correct": "string" } | null,
  "explanation": "string — why (in Portuguese if grammar)",
  "encouragement": "string — effort-based, never comparative",
  "nextHint": "string — optional tip for next exercise"
}
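Because model output is not guaranteed to match the schema, the client should validate the JSON before rendering it. A minimal sketch of a runtime guard, with hypothetical `FeedbackResponse` and `isFeedbackResponse` names mirroring the format above:

```typescript
// Hypothetical runtime guard for the feedback schema above.
// Rejects anything that does not match before it reaches the UI.

interface Correction {
  wrong: string;
  correct: string;
}

interface FeedbackResponse {
  feedback: string;
  correction: Correction | null;
  explanation: string;
  encouragement: string;
  nextHint: string | null;
}

function isFeedbackResponse(value: unknown): value is FeedbackResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const c = v.correction as Record<string, unknown> | null;
  const correctionOk =
    c === null ||
    (typeof c === "object" && c !== null &&
      typeof c.wrong === "string" && typeof c.correct === "string");
  return (
    typeof v.feedback === "string" &&
    correctionOk &&
    typeof v.explanation === "string" &&
    typeof v.encouragement === "string" &&
    (v.nextHint === null || typeof v.nextHint === "string")
  );
}
```

On a failed parse the app can fall back to a neutral canned message rather than showing malformed feedback.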
1.3 Feedback Use Cases
Grammar Correction
```json
{
  "feedback": "You wrote 'She go to school.' The correct form is 'She goes to school.'",
  "correction": { "wrong": "She go", "correct": "She goes" },
  "explanation": "Em inglês, quando o sujeito é he/she/it, o verbo no presente ganha -s ou -es. Isso se chama 'third person singular'.",
  "encouragement": "You remembered the word order correctly. Good job.",
  "nextHint": "Other verbs that change: do → does, have → has."
}
```
Vocabulary Review
```json
{
  "feedback": "You chose 'mouse' for the picture of a keyboard. The correct word is 'keyboard'.",
  "correction": { "wrong": "mouse", "correct": "keyboard" },
  "explanation": "Mouse é o dispositivo que você move com a mão. Keyboard é o teclado onde você digita.",
  "encouragement": "Both words are from the Tech pack. You are learning them.",
  "nextHint": null
}
```
Pronunciation Feedback (after speech assessment)
```json
{
  "feedback": "You said 'hello'. Your pronunciation score is 78 out of 100.",
  "correction": null,
  "explanation": "The 'h' sound was clear. Try making the 'o' sound longer, like 'hel-LOH'.",
  "encouragement": "78 is a good score for this word. Each practice makes it clearer.",
  "nextHint": "Try saying it slowly first, then at normal speed."
}
```
1.4 API Integration Architecture
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ React Native │────▶│ Fastify API │────▶│ Claude API │
│ (exercise │ │ /feedback │ │ (Haiku 4.5) │
│ submission) │ │ │ │ │
│ │◀────│ structured │◀────│ JSON output │
│ render │ │ response │ │ │
│ feedback UI │ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
Model Selection
| Use Case | Model | Why | Cost/1M tokens |
|---|---|---|---|
| Exercise feedback | Haiku 4.5 | Fast, cheap, follows instructions well | $0.80 input / $4.00 output |
| Grammar explanations | Haiku 4.5 | Structured output, low latency | Same |
| Conversation practice | Sonnet 4.6 | Needs more nuance for freeform dialogue | $3.00 input / $15.00 output |
| Content generation (admin) | Opus 4.6 | Quality for lesson creation | $15.00 input / $75.00 output |
Cost Estimation per User
Average exercise: ~200 input tokens + ~150 output tokens
Sessions/day: 1 (average)
Exercises/session: 10
Daily cost per user:
Input: 10 × 200 = 2,000 tokens × $0.80/1M = $0.0016
Output: 10 × 150 = 1,500 tokens × $4.00/1M = $0.006
Total: ~$0.008/day = ~$0.24/month
With caching (repeated system prompt):
Cached input: ~$0.04/1M → reduces input cost by 95%
Effective: ~$0.16/month per active user
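The arithmetic above can be captured in a small helper for scenario planning. Prices and token counts here are this report’s assumptions, not confirmed API pricing:

```typescript
// Per-user cost sketch using the report's assumed token counts
// and per-million-token prices (assumptions, not confirmed pricing).

interface CostModel {
  inputTokensPerExercise: number;
  outputTokensPerExercise: number;
  exercisesPerDay: number;
  inputPricePerMTok: number;  // USD per 1M input tokens
  outputPricePerMTok: number; // USD per 1M output tokens
}

function dailyCostUSD(m: CostModel): number {
  const inputCost =
    (m.exercisesPerDay * m.inputTokensPerExercise / 1_000_000) * m.inputPricePerMTok;
  const outputCost =
    (m.exercisesPerDay * m.outputTokensPerExercise / 1_000_000) * m.outputPricePerMTok;
  return inputCost + outputCost;
}

const haiku: CostModel = {
  inputTokensPerExercise: 200,
  outputTokensPerExercise: 150,
  exercisesPerDay: 10,
  inputPricePerMTok: 0.80,
  outputPricePerMTok: 4.00,
};

// dailyCostUSD(haiku) ≈ $0.0076/day, ~$0.23-0.24/month over 30 days
```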
1.5 Caching Strategy
- System prompt caching: The system prompt (~500 tokens) is identical across all requests. Use Anthropic’s prompt caching to cut its input-token cost by 90%+
- Lesson context caching: Each lesson’s vocabulary and grammar rules can be cached as ephemeral context
- Response caching: For identical exercises (vocabulary image matching), cache Claude’s response locally and serve without API call
2. Pronunciation Assessment
2.1 API Comparison
| Provider | Price | Languages | Features | Latency | Mobile SDK |
|---|---|---|---|---|---|
| Azure Speech | $0.022/min (Pronunciation Assessment) | 140+ | Phoneme scoring, fluency, prosody, content accuracy | <1s | iOS/Android SDK |
| Speechace | $50-500/mo plans | 30+ | Word/sentence scoring, phoneme detail, IELTS/TOEFL alignment | 1-2s | REST API |
| Whisper API | $0.006/min | 50+ | Transcription only (no scoring) | 1-3s | REST API |
| Deepgram | $0.0043/min | 30+ | Transcription, sentiment, no pronunciation scoring | <0.5s | WebSocket SDK |
| On-device (iOS 26) | Free | 20+ | SpeechAnalyzer API, on-device, no cloud needed | <0.5s | Native only |
2.2 Recommended Approach for Mellow
Hybrid architecture:
- Primary: Azure Speech Pronunciation Assessment
  - Best pronunciation scoring on the market (phoneme-level detail)
  - $0.022/min; at 2 minutes of speaking per session that is $0.044/day, ~$1.32/month per active user (the section 5 cost model assumes a lighter average of ~20 min/month)
  - Supports Portuguese and English natively
- Fallback: On-device iOS SpeechAnalyzer (iOS 26)
  - Free, no network needed
  - Good for basic transcription and confidence scoring
  - Use when the user is offline or for quick word pronunciation
- Transcription layer: Whisper API (batch, for review)
  - Cheapest option for async transcription of recorded speech
  - $0.006/min for post-session review features
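The routing rule implied by this hybrid setup can be sketched as a pure function. Type and function names are assumptions for illustration:

```typescript
// Hybrid provider routing sketch (illustrative names).
// Word-level items stay on-device; sentence-level and longer
// go to Azure when the network allows.

type ExerciseLevel = "word" | "sentence" | "conversation";
type Provider = "on-device" | "azure" | "azure+claude";

function pickProvider(level: ExerciseLevel, online: boolean): Provider {
  if (!online) return "on-device"; // offline fallback for every level
  switch (level) {
    case "word":
      return "on-device"; // free, lowest latency
    case "sentence":
      return "azure"; // phoneme-level scoring needed
    case "conversation":
      return "azure+claude"; // scoring plus human-readable feedback
  }
}
```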
2.3 Pronunciation Assessment Flow
User taps "Speak" button
│
▼
┌─────────────────┐
│ Record audio │ ← No countdown timer
│ (press to start, │ User records when ready
│ press to stop) │ No "hurry up" cues
└─────────┬───────┘
│
▼
┌─────────────────┐
│ Send to Azure │
│ Speech API │
│ /pronunciation │
│ /assessment │
└─────────┬───────┘
│
▼
┌─────────────────┐ ┌─────────────────┐
│ Score response: │───▶│ Send to Claude │
│ - accuracy: 78 │ │ for human- │
│ - fluency: 65 │ │ readable feedback │
│ - prosody: 72 │ │ in Portuguese │
│ - phonemes: [..] │ └─────────┬───────┘
└─────────────────┘ │
▼
┌─────────────────┐
│ Display feedback │
│ "Your score: 78 │
│ The 'th' sound │
│ needs practice" │
└─────────────────┘
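The handoff step in the diagram, turning Azure’s numeric scores into a prompt for Claude, could look like the sketch below. Field names mirror the diagram; the threshold and wording are assumptions:

```typescript
// Sketch of the Azure-score-to-Claude handoff (illustrative names,
// assumed threshold). Picks out weak phonemes and builds a prompt
// for the feedback model.

interface PhonemeScore { phoneme: string; accuracy: number }

interface PronunciationScore {
  accuracy: number;
  fluency: number;
  prosody: number;
  phonemes: PhonemeScore[];
}

function weakestPhonemes(score: PronunciationScore, threshold = 60): string[] {
  return score.phonemes
    .filter(p => p.accuracy < threshold)
    .sort((a, b) => a.accuracy - b.accuracy) // weakest first
    .map(p => p.phoneme);
}

function buildClaudePrompt(word: string, score: PronunciationScore): string {
  const weak = weakestPhonemes(score);
  const weakNote = weak.length > 0
    ? `Sounds needing practice: ${weak.join(", ")}.`
    : "All sounds were clear.";
  return `The learner said "${word}". Accuracy ${score.accuracy}/100, ` +
    `fluency ${score.fluency}/100, prosody ${score.prosody}/100. ${weakNote} ` +
    `Give explicit, encouraging feedback following the system rules.`;
}
```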
2.4 Autism-Specific Speaking UX
- Record when ready: No “3, 2, 1, speak!” countdown. User presses when comfortable
- Re-record unlimited times: No limit on attempts. Each is a practice opportunity
- Show waveform during recording (visual feedback that mic is working)
- Score as number (78/100), not stars/grades/badges
- Phoneme-level detail optional (expand to see which sounds need work)
- No comparison with others: Only self-comparison (“Last time: 72 → This time: 78”)
- Skip speaking: Every speaking exercise has a “Skip — I’ll practice this later” option
3. Async Speaking Practice
3.1 Why Async Matters for Autistic Users
Real-time speaking exercises create pressure and anxiety. Mellow’s async approach:
- No live conversation partner — eliminates social anxiety
- Record at own pace — no time pressure
- Review before submitting — listen to own recording first
- Get feedback later — decouple speaking from evaluation
- Practice in safe space — user chooses when/where to speak
3.2 Architecture
```typescript
// Async speaking flow: record locally, assess later
interface SpeakingExercise {
  id: string;
  targetPhrase: string;          // "Hello, my name is..."
  targetAudio: string;           // Native speaker reference audio
  userRecording: string | null;  // Local file path
  assessmentResult: PronunciationScore | null;
  claudeFeedback: FeedbackResponse | null;
  status: 'pending' | 'recorded' | 'assessed' | 'reviewed';
}

// User records → save locally → assess when ready.
// Assessment can happen in the background while the user continues other exercises.
```
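The `status` field above implies a small state machine. A sketch of the transitions (the transition map itself is an assumption based on the listed states):

```typescript
// Lifecycle sketch for SpeakingExercise.status (assumed transitions).

type SpeakingStatus = "pending" | "recorded" | "assessed" | "reviewed";

const NEXT: Record<SpeakingStatus, SpeakingStatus | null> = {
  pending: "recorded",  // user saves a recording
  recorded: "assessed", // background assessment completes
  assessed: "reviewed", // user reads the feedback
  reviewed: null,       // terminal state
};

function advance(status: SpeakingStatus): SpeakingStatus {
  const next = NEXT[status];
  if (next === null) throw new Error(`"${status}" is terminal`);
  return next;
}

// Re-recording is always allowed: any state can return to "recorded",
// matching the "unlimited attempts" principle in 2.4.
function reRecord(_status: SpeakingStatus): SpeakingStatus {
  return "recorded";
}
```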
3.3 Cost Optimization
- Batch processing: Collect speaking exercises during session, send for assessment as batch after session ends
- On-device pre-filter: Use iOS SpeechAnalyzer to do basic transcription on-device. Only send to Azure for detailed scoring
- Progressive assessment: Word-level exercises use on-device only. Sentence-level uses Azure. Conversation-level uses Azure + Claude
4. Energy-Adaptive Spaced Repetition
4.1 FSRS Overview
FSRS (Free Spaced Repetition Scheduler) is the current state-of-the-art scheduling algorithm; published benchmarks report 20-30% fewer reviews than SM-2 at the same retention. Three core variables:
- Retrievability (R): Probability of successful recall (0-100%)
- Stability (S): Time for R to decay from 100% to 90%
- Difficulty (D): Inherent complexity of the item (1-10)
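The three variables are tied together by the FSRS forgetting curve. Using the FSRS-4.5 constants, retrievability t days after a review is:

```typescript
// FSRS-4.5 forgetting curve: retrievability t days after a review
// of a card with stability S. Constants are from the published
// FSRS-4.5 algorithm; by construction R(S, S) = 0.9, i.e. stability
// is exactly the interval at which recall probability hits 90%.

const DECAY = -0.5;
const FACTOR = 19 / 81;

function retrievability(elapsedDays: number, stability: number): number {
  return Math.pow(1 + FACTOR * (elapsedDays / stability), DECAY);
}

// retrievability(10, 10) ≈ 0.9  (review 10 days old, stability 10)
```

This is the R that the energy-adaptive filter in 4.4 compares against 0.5 to find urgent cards.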
4.2 Energy-Adaptive Modifications for Mellow
Standard FSRS doesn’t account for variable energy. Mellow’s adaptation:
┌─────────────────────────────────────────────┐
│ ENERGY-ADAPTIVE FSRS │
│ │
│ Standard FSRS scheduling │
│ │ │
│ ▼ │
│ Filter by energy level: │
│ │
│ HIGH energy → Full review deck │
│ New cards + Due reviews + Hard cards │
│ │
│ MEDIUM energy → Reduced deck │
│ Due reviews only (no new cards) │
│ Skip cards with D > 7 │
│ │
│ LOW energy → Minimal deck │
│ Only cards with R < 50% (urgent) │
│ Max 10 cards │
│ Easy mode (recognition only, no recall) │
│ │
│ BROWSE → No deck │
│ View vocabulary list (read-only) │
│ No active recall required │
└─────────────────────────────────────────────┘
4.3 Key Differences from Standard SRS
| Standard SRS | Mellow’s Energy-Adaptive SRS |
|---|---|
| Fixed daily review count | Variable based on energy selection |
| Overdue cards pile up → anxiety | Graceful degradation: low-energy sessions still maintain critical items |
| "You missed 47 reviews" guilt messaging | "Welcome back. Here are 5 important words to keep fresh." |
| Same difficulty for all moods | Easy mode (recognition) vs Hard mode (production) based on energy |
| Streak-based motivation | Progress-based: “You know 127 of 300 words in Tech pack” |
4.4 Implementation
```typescript
interface MellowSRSCard {
  wordId: string;
  stability: number;       // FSRS S parameter
  difficulty: number;      // FSRS D parameter (1-10)
  retrievability: number;  // FSRS R parameter (0-1)
  lastReview: Date;
  nextReview: Date;
  reps: number;
  lapses: number;
}

// Provided by the lesson store: fetches up to `limit` not-yet-studied cards
declare function getNewCards(limit: number): MellowSRSCard[];

function getSessionDeck(
  cards: MellowSRSCard[],
  energy: 'high' | 'medium' | 'low' | 'browse'
): MellowSRSCard[] {
  const now = new Date();
  const dueCards = cards.filter(c => c.nextReview <= now);
  switch (energy) {
    case 'high':
      // All due cards + up to 10 new cards
      return [...dueCards, ...getNewCards(10)];
    case 'medium':
      // Only due reviews, skip hard cards
      return dueCards.filter(c => c.difficulty <= 7);
    case 'low':
      // Only urgent cards (R < 0.5), max 10
      return dueCards
        .filter(c => c.retrievability < 0.5)
        .slice(0, 10);
    case 'browse':
      return []; // Read-only mode, no active review
  }
}
```
4.5 No-Guilt Messaging Framework
| Scenario | Standard App | Mellow |
|---|---|---|
| Missed 3 days | "You lost your streak! 🔥" | "Welcome back. You still know 127 words." |
| Failed a card | "Wrong! ❌ -1 heart" | "The correct word is 'keyboard'. Added to your review list." |
| Low energy session | N/A (no concept) | "You reviewed 5 important words. That keeps them fresh." |
| Quit mid-session | "Are you sure? You'll lose progress!" | "Your progress is saved. See you next time." |
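The pattern behind the Mellow column can be distilled into a message builder: state facts, never blame, always lead with what is retained. An illustrative sketch:

```typescript
// No-guilt message builder sketch (illustrative names). The rule:
// never mention the gap as a failure; lead with retained progress.

interface ReturnContext {
  daysAway: number;      // informational only; never surfaced as guilt
  knownWords: number;    // retained vocabulary to lead with
  urgentReviews: number; // cards with low retrievability
}

function welcomeBackMessage(ctx: ReturnContext): string {
  const base = `Welcome back. You still know ${ctx.knownWords} words.`;
  if (ctx.urgentReviews === 0) return base;
  return `${base} Here are ${ctx.urgentReviews} important words to keep fresh.`;
}
```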
5. Total Cost Model
Per Active User Per Month
| Component | Usage | Cost |
|---|---|---|
| Claude Haiku (exercise feedback) | ~300 exercises/mo | $0.16 |
| Azure Pronunciation Assessment | ~20 min speaking/mo | $0.44 |
| Whisper (batch transcription) | ~10 min/mo | $0.06 |
| Total API cost per active user | | $0.66/month |
At Scale (1,000 active users)
| Component | Monthly cost |
|---|---|
| Claude API | $160 |
| Azure Speech | $440 |
| Whisper | $60 |
| Total | $660/month |
Optimization Opportunities
- Prompt caching: already reflected in the Claude figure above ($0.16/month cached vs ~$0.24 uncached, per the estimate in 1.4)
- On-device pronunciation for word-level: Eliminates ~60% of Azure calls
- Response caching for identical exercises: Reduces Claude calls by ~30%
- Optimized total: ~$0.30-0.40/month per active user
6. Recommendations
MVP (Phase 1)
- Claude Haiku 4.5 for all exercise feedback with autism-specific system prompt
- Azure Speech Pronunciation Assessment for sentence-level speaking exercises
- FSRS base algorithm with energy-level filtering (high/medium/low/browse)
- No-guilt messaging framework across all UI touchpoints
- Async speaking only — no real-time conversation in MVP
V1.1 (Phase 2)
- On-device SpeechAnalyzer for word-level pronunciation (reduce Azure cost)
- Claude Sonnet for optional freeform conversation practice
- Prompt caching for system prompt + lesson context
- Pronunciation history showing improvement over time (self-comparison only)
V2 (Phase 3)
- Content generation pipeline using Opus for new lesson packs
- Adaptive difficulty using Claude to generate exercises at user’s exact level
- Voice cloning for native speaker pronunciation models (ElevenLabs or similar)
Sources
- Picovoice — React Native Speech Recognition 2026 Guide
- Callstack — On-Device Speech Transcription with Apple SpeechAnalyzer
- Azure Speech Pronunciation Assessment Pricing
- Deepgram — Best Speech-to-Text APIs 2026
- FSRS Algorithm Wiki
- FSRS vs SM-2 Comparison
- LECTOR: LLM-Enhanced Spaced Repetition
- Anthropic — Claude Education Solutions
- LPITutor: LLM Personalized Intelligent Tutoring
- Adaptive Scaffolding for LLM Pedagogical Agents