Complaint Mining + Fake Door: A Combined Methodology for AI-Native Product Discovery

Product Strategy May 14, 2026 by deep-research

#product-discovery #complaint-mining #opinion-mining #fake-door #pretotyping #lean-experiments #app-reviews #requirements-engineering #llm #methodology

Date: 2026-05-14
Context: Methodological reference for portfolio-wide discovery work. Applies to every Moklabs product that needs to reduce problem risk, solution risk, and usage risk before committing engineering capacity.
Source: External literature review (academic + practitioner) commissioned via GPT, validated and reframed for Moklabs use.

Executive Summary

Two techniques, complementary not competing. Complaint Mining answers “what real pains already exist in the market at scale, without me having to ask?”. Fake Door / Pretotyping answers “do people exhibit real behavior of interest before I build anything?”.
Recommended sequence: mine complaints → cluster pains → formulate opportunity hypothesis → test with fake door / concierge / wizard-of-oz → only then prototype or build MVP.
Three risks addressed in series: (1) Problem risk — are we solving a real pain? (2) Solution risk — do users show interest in the proposed solution? (3) Usage risk — do they actually come back, use it, and pay?
The portfolio failure mode this prevents: jumping straight to MVP. The literature consistently points to a cheaper, more robust path: extract real market signals first, convert signals into falsifiable hypotheses, measure behavior before committing engineering.
AI-native angle: LLMs make Complaint Mining roughly 10x cheaper than it was three years ago (an order-of-magnitude estimate, not a measured benchmark), thanks to transfer learning, contextual understanding and JSON-schema extraction. LLM-Cure-style competitor analysis is now feasible for any Moklabs product with a public competitor universe.

1. Bibliography Validation

The following references were verified and reframed. Treat this as the canon for portfolio discovery work.

Complaint Mining / App Review Mining

Massenon et al. — 2024 — PeerJ Computer Science. Mapping of 180 primary studies on automated/semi-automated review analysis for requirements extraction. Lists techniques (topic modeling, collocation finding, association rules, ABSA, frequency-based, word-vector-based, hybrid) and tools (KEFE, MERIT, DIVER, SAFER, SIRA, T-FEX, RE-BERT, AOBTM). Central reading.
Genc-Nayebi & Abran — 2017 — Journal of Systems and Software. Historical baseline. Maps opinion mining studies in app store reviews, open problems, contributions to requirements evolution.
Dąbrowski et al. — 2022 — IEEE Requirements Engineering. Use cases and reference architecture connecting user-feedback mining to software engineering activities. Important for architecture.
Dąbrowski et al. — 2023 — Information Systems. Feature-specific analysis, replication and benchmarking of mining/searching approaches for app reviews.
Hadi & Fard — 2023 — Empirical Software Engineering. Evaluates pre-trained models for app review classification across binary, multiclass, zero-shot, multitask and multi-source scenarios. Relevant for model choice.
Lima & Marcacini — 2024 — SBQS. Automated identification, prioritization and monitoring of emerging risks in app reviews. Risk matrix, temporal series, heat maps, issue tree, alerts. Processed 6.6M+ reviews across 20 domains, ~270K issues ranked. Highly relevant for prioritization. Note: their “drastic time reduction” claim is goal-state, not proven universal — measure in your own context.
Motger et al. — 2025 — REFSQ / CEUR Vol. 3959. State-of-the-art synthesis, gap identification in feature/sentiment analysis, methodological contributions and datasets. (Originally cited as 2024 — corrected to 2025.)
Assi, Hassan & Zou — LLM-Cure — 2024/2025. Competitor user review analysis with LLMs: extracts features, identifies underperforming features, generates improvement suggestions from positive competitor reviews. Evaluated on 1M+ reviews across 70 apps, 85% mean F1 on feature attribution. Strong reading for competitive analysis.

Fake Door / Pretotyping / Lean Experiments

Alberto Savoia — Pretotype It — 2011. Foundational manifesto. Practitioner literature, not peer-reviewed. Defines pretotyping as testing initial appeal and real usage by simulating core experience with minimal investment. Distinguishes pretotype (“is this the right thing to build?”) from prototype (“can we build it?”).
Google Testing Blog — Pretotyping: A Different Type of Testing — 2011. Short summary of the thesis: apply Agile/TDD ideas further upstream, before building, to test whether building is worthwhile.
Mansoori — Chalmers dissertation; Mansoori & Lackéus follow-up paper. Academic bridge classifying Fake Door as a Lean Startup tactic alongside customer interviews, targeted experiments, physical prototypes, concierge, A/B tests.
Shepherd & Gruber — 2021 — Entrepreneurship Theory and Practice. Lean Startup framework decomposed into five blocks: business model, validated learning/customer development, MVP, pivot/persevere, market-opportunity navigation.
Camuffo et al. — 2020 — Management Science. RCT with startups: scientific hypothesis-test approach outperforms intuition, with higher probability of necessary pivots. Important for scientific rigor argument.
Stevenson, Burnell & Fisher — 2024 — Journal of Management. MVP in theory and practice — dimensionality, forms, risks, trade-offs. Useful to avoid conflating MVP with “first crappy version”.
Felin et al. — 2024 — Journal of Management. Critical reading: compares Lean Startup with theory-based view, argues that a scientific method for startups must align experiments with the startup’s specific causal theory.
Martins Pacheco et al. — 2021 — Proceedings of the Design Society. PETRA framework for fuzzy front-end in startups: prototyping matrix by purpose and lens, plan–execute–test–reflect–assimilate cycle, modified Kanban (Protoban).

2. Complaint Mining / Opinion Mining

2.1 Naming in academia

What product people call “Complaint Mining” appears as: Opinion Mining in App Reviews, App Review Mining, User Feedback Mining, Mining User Feedback for Software Engineering, Crowdsourcing of Software Requirements, Requirements Elicitation from App Reviews, Issue Detection, Feature Extraction, Feature-Specific Sentiment Analysis, Requirements Prioritization.

Core premise: reviews, tickets, forums and public communities contain signals of bugs, feature requests, UX friction, non-functional requirements, dissatisfaction, and differentiation opportunities. Feedback is valuable but voluminous, noisy, and hard to analyze manually.

2.2 The full technical pipeline

Sources → Ingestion → Normalization → AI Layer (classify + extract) →
Clustering → Prioritization → Product Artifacts

1. Collection — sources: App Store, Google Play, competitor reviews, Reclame Aqui, G2, Capterra, Trustpilot, Reddit, forums, Discords, support tickets, NPS/CSAT free-text, call transcripts, internal search logs, release-note comments, sales/CS/support feedback.

2. Normalization — canonical raw-feedback type:

type RawFeedback = {
  source: "google_play" | "app_store" | "reddit" | "g2" | "support_ticket";
  appOrProduct: string;
  competitor?: string;
  rating?: number;
  title?: string;
  body: string;
  language: string;
  createdAt: string;
  version?: string;
  country?: string;
  userSegment?: string;
  url?: string;
};

Required normalizations: language detection, semantic dedup, spam removal, version partitioning, competitor partitioning, entity extraction (feature, screen, platform, device, flow, persona), preservation of original text for audit.
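As a minimal sketch of the cheapest normalization step, the TypeScript below collapses exact duplicates after light text normalization while leaving the original text untouched for audit. It is not semantic dedup (that would need embeddings). `FeedbackLike`, `dedupKey` and `dedupExact` are hypothetical names introduced here, and the punctuation set is an assumption.

```typescript
// Minimal shape for this sketch; the canonical RawFeedback type carries more fields.
type FeedbackLike = { source: string; body: string };

// Lowercase, replace common punctuation with spaces, collapse whitespace,
// so that trivial variants of the same text map to the same key.
function dedupKey(body: string): string {
  return body
    .toLowerCase()
    .replace(/[.,!?;:'"()\[\]{}]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Keep the first occurrence of each normalized body.
// Items are returned unmodified, so the original text is preserved.
function dedupExact<T extends FeedbackLike>(items: T[]): T[] {
  const seen: Record<string, boolean> = {};
  const kept: T[] = [];
  for (const item of items) {
    const key = dedupKey(item.body);
    if (key === "" || seen[key] === true) continue; // skip empty and repeated bodies
    seen[key] = true;
    kept.push(item);
  }
  return kept;
}
```

Running this before the AI layer keeps copy-pasted reviews from inflating cluster volume; near-duplicates with different wording still need the semantic pass.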

3. Classification — canonical intents:

type FeedbackIntent =
  | "bug_report"
  | "feature_request"
  | "ux_complaint"
  | "performance_complaint"
  | "pricing_complaint"
  | "reliability_complaint"
  | "accessibility_issue"
  | "security_privacy_concern"
  | "praise"
  | "generic_rating"
  | "non_actionable";

4. Aspect/feature extraction — what specifically is the user complaining about (e.g. “login”, “checkout”, “report export”, “load time”, “notifications”, “offline mode”, “permissions”, “billing”, “design system consistency”, “search”, “onboarding”). Aspect-Based Sentiment Analysis (ABSA) is the dominant technique (28/180 studies in Massenon et al.).

5. Sentiment / emotion / severity — decompose:

type SentimentSignal = {
  polarity: "positive" | "neutral" | "negative";
  emotion?: "anger" | "frustration" | "confusion" | "fear" | "disappointment" | "joy";
  severity: 1 | 2 | 3 | 4 | 5;
  urgency: 1 | 2 | 3 | 4 | 5;
  businessImpact: 1 | 2 | 3 | 4 | 5;
};

Negative sentiment with low severity = noise. Moderate sentiment with high severity = real risk. “The app crashes every time I try to pay” outweighs “I don’t like the button color”.

6. Semantic clustering — TF-IDF + K-Means, sentence embeddings + HDBSCAN, BERTopic, agglomerative clustering, entity/version/competitor partitions.
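To make the clustering step concrete, here is a deliberately simple stand-in: a greedy, single-pass threshold clustering over precomputed embedding vectors using cosine similarity. It is much cruder than HDBSCAN or BERTopic and is order-dependent; it only illustrates the idea of grouping feedback by semantic proximity. The function names and the 0.8 threshold are assumptions.

```typescript
// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Assign each vector to the first cluster whose seed vector is similar enough;
// otherwise the vector becomes the seed of a new cluster.
// Returns one cluster label per input vector.
function greedyCluster(vectors: number[][], threshold: number = 0.8): number[] {
  const seeds: number[][] = [];
  const labels: number[] = [];
  for (const v of vectors) {
    let assigned = -1;
    for (let k = 0; k < seeds.length; k++) {
      if (cosine(v, seeds[k]) >= threshold) {
        assigned = k;
        break;
      }
    }
    if (assigned === -1) {
      seeds.push(v);
      assigned = seeds.length - 1;
    }
    labels.push(assigned);
  }
  return labels;
}
```

For real corpora, a density-based method handles noise and uneven cluster sizes better; this sketch is mainly useful for smoke-testing the pipeline end to end.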

7. Emerging-issue detection — the high-leverage step. Not just “what complaints exist”, but: what’s growing, what appeared after a release, what affects premium users, what appears in competitors but not yet in your product, what carries reputational risk, what predicts churn, what suggests roadmap opportunity. Lima & Marcacini’s risk matrix and temporal monitoring operationalize this.
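One way to approximate "what's growing" is to compare each cluster's volume across two adjacent time windows. The sketch below is an assumption-laden illustration, not Lima & Marcacini's method: `ClusterCount`, `detectEmerging`, the 25% growth cutoff and the minimum volume of 20 are all hypothetical.

```typescript
type ClusterCount = { clusterId: string; previous: number; current: number };

type Trend = {
  clusterId: string;
  growth: number | null; // relative change; null when there is no previous baseline
  isNew: boolean;        // cluster had no volume in the previous window
  emerging: boolean;
};

// Flag clusters that are either new or growing faster than minGrowth,
// ignoring clusters too small to be meaningful.
function detectEmerging(
  counts: ClusterCount[],
  minGrowth: number = 0.25,
  minVolume: number = 20
): Trend[] {
  return counts.map((c) => {
    const isNew = c.previous === 0 && c.current > 0;
    const growth = c.previous > 0 ? (c.current - c.previous) / c.previous : null;
    const growing = growth !== null && growth >= minGrowth;
    return {
      clusterId: c.clusterId,
      growth,
      isNew,
      emerging: c.current >= minVolume && (isNew || growing),
    };
  });
}
```

Partitioning the counts by app version or user segment before calling the function gives the "appeared after a release" and "affects premium users" views with the same logic.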

8. Prioritization — two formulas:

Opportunity Score =
  log(volume + 1)
  × growth_rate
  × negative_sentiment_intensity
  × severity
  × affected_persona_weight
  × strategic_fit
  × confidence
  ÷ estimated_effort

Simpler RICE-inspired version:

Complaint-RICE = Reach × Intensity × Confidence ÷ Effort
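Both formulas translate directly into code. The sketch below assumes scales the text does not specify: `growthRate` is treated as a ratio (1.0 means flat, 1.32 means +32%) so that a flat cluster does not zero out the product, and intensity and confidence are taken as values in 0..1. All field and function names are illustrative.

```typescript
type OpportunityInputs = {
  volume: number;                     // number of feedback items in the cluster
  growthRate: number;                 // assumed ratio: 1.0 = flat, 1.32 = +32%
  negativeSentimentIntensity: number; // assumed 0..1
  severity: number;                   // 1..5
  affectedPersonaWeight: number;
  strategicFit: number;
  confidence: number;                 // assumed 0..1
  estimatedEffort: number;            // must be > 0
};

// Opportunity Score as written in the text: log-damped volume times the
// multiplicative factors, divided by estimated effort.
function opportunityScore(i: OpportunityInputs): number {
  if (i.estimatedEffort <= 0) throw new RangeError("estimatedEffort must be > 0");
  const numerator =
    Math.log(i.volume + 1) *
    i.growthRate *
    i.negativeSentimentIntensity *
    i.severity *
    i.affectedPersonaWeight *
    i.strategicFit *
    i.confidence;
  return numerator / i.estimatedEffort;
}

// Complaint-RICE = Reach x Intensity x Confidence / Effort.
function complaintRice(
  reach: number,
  intensity: number,
  confidence: number,
  effort: number
): number {
  if (effort <= 0) throw new RangeError("effort must be > 0");
  return (reach * intensity * confidence) / effort;
}
```

Because every factor is multiplicative, a single zero (for example, confidence 0) removes the opportunity from the ranking entirely, so the scores are best used to order clusters rather than read as absolute values.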

2.3 Technique families

Family	When to use	Strengths	Weaknesses
Keyword mining	Baseline, seed-term discovery	Fast, cheap	Loses context, misses irony, implicit complaints
Topic modeling (LDA, NMF, BERTopic, Top2Vec)	Discovery across large unlabeled corpora	Good for surfacing latent themes	Topics often vague, needs human curation
Supervised classification (LR/SVM/RF, BERT-family)	When you have labeled data	High precision per class	Annotation cost, label drift
Weak supervision	Lots of data, no time to label	Reduces annotation cost	Heuristic bias
Aspect-Based Sentiment Analysis	Feature-level signal	Avoids mixing global rating with specific issues	More complex pipeline
LLM extraction with schema	Modern default for low/mid volume	Contextual, transfer learning, auditable JSON	Compute cost, hallucination, limited interpretability
LLM + competitive analysis (LLM-Cure-style)	Cross-product opportunity discovery	High strategic leverage	Requires careful framing to avoid copying without causality

2.4 Recommended pipeline architecture

┌────────────────────┐
│ Sources            │ App Store, Reddit, G2, tickets, NPS
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Ingestion           │ crawlers, APIs, schedulers
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Normalization       │ language, metadata, version, source
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ AI Layer            │ classifier + LLM + embeddings + ABSA
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Clustering          │ semantic groups, duplicate collapse
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Prioritization      │ risk matrix, trend, severity, effort
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Product Artifacts   │ problem cards, PRDs, fake door ideas
└────────────────────┘

Dąbrowski et al. argue for exactly this kind of reference architecture, connecting mining techniques to concrete software engineering use cases.

2.5 Canonical artifact: Problem Opportunity Card

# Problem Opportunity Card

## Detected pain
Users complain that [problem] in [context].

## Evidence
- Volume: 438 reviews
- Growth: +32% last 30 days
- Mean sentiment: -0.78
- Mean severity: 4/5
- Top sources: Google Play, G2, Reddit
- Competitors affected: A, B, C

## Sample complaints
- review_id_123: ...
- review_id_456: ...
- review_id_789: ...

## Associated feature/aspect
Report export.

## Likely persona
Pedagogical coordinator / school admin.

## Opportunity hypothesis
If we offer automatic report export, users will reduce manual work and demonstrate interest in activating the feature.

## Recommended experiment
In-product Fake Door + waitlist + interview for users who click.

## Validation metric
- CTR on CTA >= 8%
- Waitlist conversion >= 30% of clicks
- At least 10 qualified interviews
- 3+ users accepting concierge version

## Decision
Build / Concierge / Pivot / Kill

2.6 Practical LLM extraction prompt

You are a requirements and product analyst.

Analyze the feedback below and return only valid JSON.

Feedback:
"""
{{review_text}}
"""

Metadata:
- rating: {{rating}}
- source: {{source}}
- app_version: {{version}}
- product: {{product}}
- competitor: {{competitor}}

Schema:
{
  "is_actionable": boolean,
  "intent": "bug_report | feature_request | ux_complaint | performance_complaint | pricing_complaint | reliability_complaint | accessibility_issue | security_privacy_concern | praise | generic_rating | non_actionable",
  "feature_or_area": string,
  "problem_statement": string,
  "user_goal": string,
  "severity": 1-5,
  "urgency": 1-5,
  "sentiment": "positive | neutral | negative",
  "emotion": "anger | frustration | confusion | disappointment | fear | joy | none",
  "suggested_requirement": string,
  "evidence_quote": string,
  "confidence": 0.0-1.0,
  "needs_human_review": boolean
}

Rules:
- Do not invent missing information.
- If no clear evidence, use "unknown".
- Severity reflects user impact, not just emotional tone.
- Response must be pure JSON.
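Model output should be validated before it enters the pipeline. The sketch below parses a response, checks a subset of the schema fields, and applies one cheap hallucination guard: the evidence quote must appear verbatim in the review, otherwise the item is routed to human review. The intent list follows the FeedbackIntent type from section 2.2; `parseExtraction` and the guard itself are assumptions, not part of any library.

```typescript
const INTENTS: string[] = [
  "bug_report", "feature_request", "ux_complaint", "performance_complaint",
  "pricing_complaint", "reliability_complaint", "accessibility_issue",
  "security_privacy_concern", "praise", "generic_rating", "non_actionable",
];

// Subset of the prompt schema that this sketch validates.
type Extraction = {
  intent: string;
  severity: number;
  confidence: number;
  evidence_quote: string;
  needs_human_review: boolean;
};

type ParseResult =
  | { ok: true; value: Extraction }
  | { ok: false; error: string };

function parseExtraction(raw: string, reviewText: string): ParseResult {
  let data: any;
  try {
    data = JSON.parse(raw);
  } catch (_err) {
    return { ok: false, error: "response is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "response is not a JSON object" };
  }
  if (INTENTS.indexOf(data.intent) === -1) {
    return { ok: false, error: "unknown intent" };
  }
  const sev = data.severity;
  if (typeof sev !== "number" || sev % 1 !== 0 || sev < 1 || sev > 5) {
    return { ok: false, error: "severity must be an integer 1-5" };
  }
  const conf = data.confidence;
  if (typeof conf !== "number" || conf < 0 || conf > 1) {
    return { ok: false, error: "confidence must be within 0.0-1.0" };
  }
  if (typeof data.evidence_quote !== "string") {
    return { ok: false, error: "evidence_quote must be a string" };
  }
  // Guard: a quote that is not a verbatim substring of the review is suspect.
  const quoteFound =
    data.evidence_quote !== "" && reviewText.indexOf(data.evidence_quote) !== -1;
  return {
    ok: true,
    value: {
      intent: data.intent,
      severity: sev,
      confidence: conf,
      evidence_quote: data.evidence_quote,
      needs_human_review: data.needs_human_review === true || !quoteFound,
    },
  };
}
```

Rejected responses can be retried or counted toward the hallucination rate; accepted ones flagged for review feed the human-review rate.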

2.7 Metrics

ML/NLP layer: precision, recall, F1, macro-F1 for rare classes, confusion matrix, inter-annotator agreement, topic coherence, cluster purity, human-review rate, LLM hallucination rate.

Product layer: actionable issues per week, time-to-detect emerging issue, review→ticket time, % of clusters that drive real decisions, % of opportunities tested, post-release impact on rating/NPS/churn/tickets/retention, % of competitor insights converted to tested hypotheses.

North star:

Validated Opportunity Rate =
  mined opportunities that produced real behavioral evidence
  ÷
  mined opportunities that entered analysis

Forces the system to connect insight to experiment, not just produce “pretty insights”.
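As a small illustration, the rate can be computed from a list of opportunity records. The record shape and the two boolean flags are hypothetical; how "real behavioral evidence" is decided remains a product judgment.

```typescript
type OpportunityRecord = {
  id: string;
  enteredAnalysis: boolean;       // the mined opportunity was reviewed
  hasBehavioralEvidence: boolean; // an experiment produced real behavioral evidence
};

// Validated Opportunity Rate = validated / analyzed.
// Returns null when nothing has entered analysis, to avoid dividing by zero.
function validatedOpportunityRate(records: OpportunityRecord[]): number | null {
  const analyzed = records.filter((r) => r.enteredAnalysis);
  if (analyzed.length === 0) return null;
  const validated = analyzed.filter((r) => r.hasBehavioralEvidence);
  return validated.length / analyzed.length;
}
```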

2.8 Failure modes

Confusing volume with importance.
Confusing review with absolute truth (very-happy / very-frustrated users are over-represented).
Losing version context (old-release bugs look current if you don’t cross app version × date).
Copying competitor features without understanding causality.
Over-automating (LLMs should accelerate triage, not replace product/engineering judgment).
Conflating complaint, solution and requirement. “I want CSV export” may be the user’s imagined solution; the real pain might be “I need to share data with my manager”.

3. Fake Door / Pretotyping / Lean Experiments

3.1 Operational definition

Pretotyping = testing whether an idea deserves to be built, before investing in a functional prototype. Savoia: simulate core experience with minimum investment. Pretotype answers “is this the right thing to build?”; prototype answers “can we build it?”.

Fake Door = a specific pretotype: an entry point (link, button, ad, landing) for a feature or product that doesn’t exist yet. Measure user behavior; reveal the in-development state in a controlled way.

3.2 Approach comparison

Approach	Main question	What exists	Best use
Fake Door / Pretotype	Is there behavioral interest?	Simulated entry point only	Before building
Concierge	Does delivery create value if done manually?	Manual service behind interface	Before automating
Wizard of Oz / Mechanical Turk	Does the user accept the experience?	Real front-end, human/manual backend	Before heavy engineering
Prototype	Does the solution work?	Partially functional interface or flow	Test usability/feasibility
MVP	Does the minimum solution generate learning in real use?	Minimum usable product	Validate use and iteration

Critical caveat: Fake Door does not validate retention, satisfaction or technical viability. It validates initial interest only. For recurring use, evolve to Concierge / Wizard of Oz / closed beta / MVP. Savoia separates ILI (Initial Level of Interest) from OLI (Ongoing Level of Interest).

3.3 Pretotyping techniques

Fake Door: a button, card, menu item, landing page or ad for something not yet built. Measure impressions, clicks, waitlist conversion, segment, context, and any resulting frustration.
Landing Page Smoke Test: a product page before the product exists. Value prop, mocked screenshot, CTA, price/plan, FAQ, email capture, “request access”. Best for new products, new audiences, or validation outside the existing base.
Pricing Fake Door: tests willingness to pay. Stronger than “would you be interested?” because the click happens with real friction (price). The ethical bar gets stricter as you get closer to actual payment.
Concierge: deliver manually what you intend to automate. Measure whether the user uses it, asks again, shares it, would pay.
Wizard of Oz / Mechanical Turk: the user thinks they’re using automation; humans run it behind the scenes. Measure perceived value before automating.
Pinocchio: a non-functional object or interface to test behavior, ergonomics, flow location, initial acceptance.
Provincial: test in a small slice (one school, one class, one segment, one state, one team, a beta cohort). Prevents scaling too early.
One-night Stand: a time-limited offer. Delivery can be manual. Measures demand without committing to ongoing ops.
Re-label / Impostor: repackage an existing solution under new positioning. Validates framing and value before building dedicated software.

3.4 Fake Door experiment brief template

# Experiment Brief — Fake Door

## Idea
[Feature/product name]

## Hypothesis
We believe [persona] wants [benefit] because [observed pain].

## Prior evidence
- Complaint cluster: [id]
- Volume: [n]
- Severity: [1-5]
- Competitors with similar signals: [x]

## Risk being tested
Desirability / willingness-to-pay / urgency / segment / channel.

## Fake door surface
Where it appears: dashboard | side menu | landing | email | ad | card in existing flow.

## Copy
Title:
Subtitle:
CTA:

## Behavioral currency
- click
- signup
- response
- booking
- beta opt-in
- purchase attempt
- concierge usage

## Primary metric
e.g. click → waitlist conversion.

## Secondary metrics
- impressions
- CTR
- abandonment
- complaints
- interviews booked
- segment of interested users

## Decision thresholds
Build if:
- CTR >= X%
- waitlist >= Y%
- at least Z qualified users
- no relevant trust-break signals

Pivot if:
- interest exists, but in different segment/copy/price

Kill if:
- low CTR
- low post-click conversion
- weak qualitative feedback

## Window
e.g. 14 days or until N impressions.

## Ethics
Reveal message:
Support plan:
Frustration limit:

3.5 Correct metrics

Exposure: eligible_users, impressions, unique_impressions.

Initial interest: CTR = clicks / impressions, unique_CTR = unique_clickers / unique_viewers.

Qualified intent: waitlist_conversion = waitlist_signups / clicks, interview_conversion = interviews_booked / clicks, beta_acceptance = beta_opt_ins / clicks.

Willingness to pay: pricing_click_rate, checkout_intent, preorder_intent.

Trust guardrails: complaint_rate, support_contact_rate, rage_click_rate.

Decision metric:

Validated Demand =
  users who took a costly action
  ÷
  users exposed to the opportunity

Costly action > click. Examples: joining a waitlist, answering a question, booking a call, accepting beta, attempting purchase, submitting data, using concierge.
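The ratios above can be computed from a handful of counts. This is a minimal sketch; `FunnelCounts` and its field names are hypothetical, and each ratio returns null instead of dividing by zero.

```typescript
type FunnelCounts = {
  uniqueViewers: number;     // users exposed to the fake door
  uniqueClickers: number;
  waitlistSignups: number;
  interviewsBooked: number;
  costlyActionUsers: number; // users who took at least one costly action
};

type FunnelMetrics = {
  uniqueCtr: number | null;
  waitlistConversion: number | null;
  interviewConversion: number | null;
  validatedDemand: number | null;
};

function ratio(numerator: number, denominator: number): number | null {
  return denominator > 0 ? numerator / denominator : null;
}

function funnelMetrics(c: FunnelCounts): FunnelMetrics {
  return {
    uniqueCtr: ratio(c.uniqueClickers, c.uniqueViewers),
    waitlistConversion: ratio(c.waitlistSignups, c.uniqueClickers),
    interviewConversion: ratio(c.interviewsBooked, c.uniqueClickers),
    // Decision metric: costly actions over everyone exposed, not over clickers.
    validatedDemand: ratio(c.costlyActionUsers, c.uniqueViewers),
  };
}
```

Note the different denominators: conversion metrics divide by clicks, while Validated Demand divides by exposure, which keeps a flashy CTA from inflating the decision metric.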

3.6 Practical thresholds (heuristic, not universal)

Context	Weak signal	Moderate	Strong
In-product CTA	1–3% CTR	4–8% CTR	9%+ CTR
Cold landing page	0.5–2% signup	3–7% signup	8%+ signup
Pricing intent	0.5–1%	2–4%	5%+
B2B beta	3–5 qualified leads	10–20	30+
Feature for existing base	isolated click	click + waitlist	click + interview + concierge use

For product teams, weight post-click conversion higher than raw CTR. A flashy CTA can generate curiosity; a waitlist signup / booking / purchase attempt measures real intent.
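The rate-based rows of the table can be encoded as a lookup. The table leaves gaps between bands (for example, between 3% and 4% CTR), so this sketch assumes each band starts at its lower bound and extends up to the next band's lower bound; that boundary choice, the type names and `signalStrength` are assumptions.

```typescript
type ExperimentContext = "in_product_cta" | "cold_landing" | "pricing_intent";
type Strength = "below_weak" | "weak" | "moderate" | "strong";

// Lower bound of each band, as fractions (0.04 = 4%), taken from the table.
const BANDS: Record<ExperimentContext, { weak: number; moderate: number; strong: number }> = {
  in_product_cta: { weak: 0.01, moderate: 0.04, strong: 0.09 },
  cold_landing: { weak: 0.005, moderate: 0.03, strong: 0.08 },
  pricing_intent: { weak: 0.005, moderate: 0.02, strong: 0.05 },
};

// Map an observed rate to a heuristic signal strength for the given context.
function signalStrength(context: ExperimentContext, rate: number): Strength {
  const band = BANDS[context];
  if (rate >= band.strong) return "strong";
  if (rate >= band.moderate) return "moderate";
  if (rate >= band.weak) return "weak";
  return "below_weak";
}
```

These labels are a reading aid for the heuristic table, not a decision rule; the experiment brief's own thresholds should drive build, pivot or kill.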

3.7 Fake Door ethics

Fake Door can be excellent — or a dark pattern.

Guardrails:

Reveal quickly that the feature is not yet available.
Don’t promise immediate delivery with no plan.
Never Fake-Door critical flows: payment, health, safety, sensitive data, important academic/clinical operations.
Don’t capture unnecessary data.
Don’t charge without clear pre-sale terms, deadline, refund policy.
Don’t repeat Fake Doors to the same user to the point of breaking trust.
Always have a fallback: “we want to understand your need better”.
Monitor complaints and tickets generated by the experiment.

Savoia acknowledges the ethical discomfort but argues that building the wrong product also wastes time, money and people. This argument doesn’t eliminate the operational requirement of transparency.

4. Combined Flow

1. Mine complaints
   ↓
2. Cluster recurring pains
   ↓
3. Generate Problem Opportunity Cards
   ↓
4. Select top opportunities by severity, growth, strategic fit
   ↓
5. Formulate testable hypotheses
   ↓
6. Run Fake Door
   ↓
7. Convert interested users into interview/beta/concierge
   ↓
8. Validate recurring use
   ↓
9. Build MVP — or kill the hypothesis

This flow aligns with the five-block Lean Startup structure described by Shepherd & Gruber (business model, validated learning, MVP, pivot/persevere, opportunity navigation).

Worked example

Mined signal (competitor reviews):

“Takes me hours to consolidate reports.” “The app doesn’t export data properly.” “I have to copy everything into a spreadsheet.” “I can’t show student progress to the coordination.”

Cluster:

Manual report export & consolidation
Volume: 1,240 reviews
Growth: +18% last 60 days
Sentiment: negative
Severity: high
Persona: coordinator / teacher / admin

Hypothesis: coordinators have high intent to use an auto-report feature because they currently spend time consolidating data manually.

Fake Door (in-product dashboard card):

New: Automatic Progress Reports
Generate a ready-to-share summary for coordination and families.
[I want to test this]

After click:

We're validating this feature with a few schools.
Join the beta list and tell us what report you need to generate.

Metric:

Build if:
- CTR >= 8%
- Beta conversion >= 35%
- 10+ qualified users
- 5+ interviews confirming pain and frequency

Next step: concierge. Team generates the report manually for 5 users. Measure use, repetition, share-rate. Only then automate.

5. Engineering & DX Application

5.1 Front-end instrumentation contract

type FakeDoorEvent =
  | {
      type: "fake_door_impression";
      experimentId: string;
      userId: string;
      segment: string;
      surface: string;
      variant: string;
      timestamp: string;
    }
  | {
      type: "fake_door_click";
      experimentId: string;
      userId: string;
      segment: string;
      surface: string;
      variant: string;
      timestamp: string;
    }
  | {
      type: "fake_door_reveal";
      experimentId: string;
      userId: string;
      revealType: "coming_soon" | "waitlist" | "beta" | "interview";
      timestamp: string;
    }
  | {
      type: "fake_door_conversion";
      experimentId: string;
      userId: string;
      conversionType: "waitlist" | "interview" | "beta_opt_in" | "pricing_intent";
      timestamp: string;
    };
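Events in this shape can be reduced to unique-user counts per funnel stage. The sketch below uses a trimmed-down event type (only the three fields it needs) so that it stands alone; `MinimalEvent` and `uniqueUsersByType` are hypothetical names.

```typescript
type FakeDoorEventType =
  | "fake_door_impression"
  | "fake_door_click"
  | "fake_door_reveal"
  | "fake_door_conversion";

// Only the fields this aggregation needs; real events carry more.
type MinimalEvent = {
  type: FakeDoorEventType;
  experimentId: string;
  userId: string;
};

// Count distinct users per event type for one experiment,
// so repeated impressions or clicks by the same user count once.
function uniqueUsersByType(
  events: MinimalEvent[],
  experimentId: string
): Record<FakeDoorEventType, number> {
  const users: Record<FakeDoorEventType, Set<string>> = {
    fake_door_impression: new Set<string>(),
    fake_door_click: new Set<string>(),
    fake_door_reveal: new Set<string>(),
    fake_door_conversion: new Set<string>(),
  };
  for (const e of events) {
    if (e.experimentId === experimentId) users[e.type].add(e.userId);
  }
  return {
    fake_door_impression: users.fake_door_impression.size,
    fake_door_click: users.fake_door_click.size,
    fake_door_reveal: users.fake_door_reveal.size,
    fake_door_conversion: users.fake_door_conversion.size,
  };
}
```

Dividing the click count by the impression count gives the unique CTR; dividing conversions by clicks gives post-click conversion.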

5.2 Feature-flag + experiment manifest

const experiment = {
  id: "auto-report-fake-door-v1",
  enabled: true,
  audience: {
    roles: ["coordinator", "admin"],
    percentage: 20,
  },
  variants: [
    { id: "control", weight: 50, visible: false },
    {
      id: "fake-door",
      weight: 50,
      visible: true,
      title: "Automatic Progress Reports",
      cta: "I want to test this",
    },
  ],
};

Rule: the experiment must be removable without complex deploys. In micro-frontend architectures, treat this as remote config + analytics contract + experiment manifest, not logic scattered across MFEs.
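A manifest like this needs a stable assignment rule so a user sees the same variant on every visit. One common approach, sketched here as an assumption rather than the Moklabs implementation, is to hash the user and experiment IDs into a bucket. The hash is 32-bit FNV-1a applied to UTF-16 code units; role filtering is left out, and weights are assumed to be positive integers.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned
}

type WeightedVariant = { id: string; weight: number };

// Returns the variant id for this user, or null if the user falls outside
// the audience percentage. Same inputs always give the same answer.
function assignVariant(
  userId: string,
  experimentId: string,
  audiencePercentage: number,
  variants: WeightedVariant[]
): string | null {
  // Separate hash inputs keep audience inclusion independent of the variant split.
  const audienceBucket = fnv1a(experimentId + ":audience:" + userId) % 100;
  if (audienceBucket >= audiencePercentage) return null;

  const totalWeight = variants.reduce((sum, v) => sum + v.weight, 0);
  if (totalWeight <= 0) return null;

  let point = fnv1a(experimentId + ":variant:" + userId) % totalWeight;
  for (const v of variants) {
    if (point < v.weight) return v.id;
    point -= v.weight;
  }
  return null;
}
```

Because assignment is a pure function of the IDs, removing the experiment only requires disabling the manifest; no per-user state has to be cleaned up.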

5.3 Experiment contract (declarative)

{
  "experimentId": "auto-report-fake-door-v1",
  "owner": "product-analytics",
  "surface": "dashboard.card",
  "hypothesis": "Coordinators show qualified intent for automatic reports.",
  "primaryMetric": "waitlist_conversion",
  "guardrailMetrics": ["complaint_rate", "support_contact_rate"],
  "startAt": "2026-05-01",
  "endAt": "2026-05-14",
  "decisionThresholds": {
    "ctr": 0.08,
    "waitlistConversion": 0.35,
    "qualifiedInterviews": 10,
    "maxComplaintRate": 0.01
  }
}
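A declarative contract makes the decision mechanically checkable. The sketch below compares observed results against the thresholds and returns one of three outcomes; the outcome names are assumptions. Numbers alone cannot separate "pivot" from "kill", so any missed threshold maps to "review", and a breached complaint guardrail maps to "halt".

```typescript
type DecisionThresholds = {
  ctr: number;
  waitlistConversion: number;
  qualifiedInterviews: number;
  maxComplaintRate: number;
};

type ObservedResults = {
  ctr: number;
  waitlistConversion: number;
  qualifiedInterviews: number;
  complaintRate: number;
};

type Outcome = {
  decision: "build" | "review" | "halt";
  missed: string[]; // names of thresholds that were not met
};

function evaluateExperiment(t: DecisionThresholds, o: ObservedResults): Outcome {
  // Trust guardrail first: a breach overrides any positive demand signal.
  if (o.complaintRate > t.maxComplaintRate) {
    return { decision: "halt", missed: ["maxComplaintRate"] };
  }
  const missed: string[] = [];
  if (o.ctr < t.ctr) missed.push("ctr");
  if (o.waitlistConversion < t.waitlistConversion) missed.push("waitlistConversion");
  if (o.qualifiedInterviews < t.qualifiedInterviews) missed.push("qualifiedInterviews");
  return { decision: missed.length === 0 ? "build" : "review", missed };
}
```

The `missed` list gives the review meeting a starting point: a strong CTR with weak waitlist conversion suggests curiosity without intent, which argues for a copy or segment pivot rather than a build.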

6. Decision Matrix

Situation	Best approach
You don’t know what pains exist	Complaint Mining
You already have a solution hypothesis	Fake Door
You want to compare against competitors	Competitive Complaint Mining (LLM-Cure-style)
You want to measure real intent in existing base	In-product Fake Door
You want to validate willingness to pay	Pricing Fake Door / landing page
You want to validate recurring use	Concierge / Wizard of Oz
Low traffic	Complaint Mining + interviews + concierge
High traffic	Fake Door + A/B + segmentation
Feature requires high ethical confidence	Interview, concierge, or explicit beta
Heavily regulated market	Avoid deceptive Fake Door; use explicit “early access”

7. Recommended Portfolio Playbook

Sprint 1 — Discovery via Complaint Mining

Goal: find opportunities backed by external evidence.

Collect 5k–50k reviews/tickets/posts.
Partition by source, competitor, version, segment.
Classify intent.
Extract feature/aspect.
Score sentiment/severity.
Cluster.
Generate top 20 clusters.
Convert top 5 into Problem Opportunity Cards.
Review with PM, design, engineering, support.

Output: 5 prioritized opportunities with textual evidence, volume, severity, trend, and a testable hypothesis.

Sprint 2 — Validation via Fake Door

Goal: measure real behavior before building.

Pick 1–2 opportunities.
Define hypothesis and primary metric.
Build CTA/landing/card.
Instrument events.
Define segment.
Run for a fixed window.
Monitor trust guardrails.
Convert interested users to beta/interviews.
Decide build / pivot / kill.

Output: behavior-grounded decision, not opinion-grounded.

Sprint 3 — Usage validation via Concierge / Wizard of Oz

Goal: test recurring value.

Select users who clicked.
Deliver the solution manually.
Measure real use, repetition, perceived quality, willingness to pay / effort accepted.
Map minimum requirements.
Then spec the MVP.

Output: MVP with smaller scope, higher confidence, less engineering waste.

8. Application to the Moklabs Portfolio

This methodology applies to every Moklabs product, but with different intensities given current portfolio state and freeze rules:

Product	Primary use today	Notes
Nightwalls (Argus)	Mine reviews/forums for AI-camera-security pain (Reolink, UniFi, Frigate, Blue Iris communities) before adding features beyond MVP.	Currently the unblock-everything product per freeze rule.
AgentScope	Mine GitHub issues, Reddit r/LLMDevs, HN, competitor docs (Langfuse, Helicone, Arize) for observability pain points.	Pricing Fake Door candidates on landing.
Mandate	Mine orchestration/control-plane discussions (Temporal, Restate, Inngest forums) for governance pain.	High ethical bar — avoid Fake Door in governance flows themselves.
Hardcut	Mine video-creator complaints (Runway, Pika, Captions, Opus reviews). LLM-Cure competitor analysis is a natural fit.	Pricing Fake Door viable on landing.
Plainly	Native macOS local-first → mine meeting-assistant complaints (Granola, Otter, Fathom, Limitless reviews).	Concierge over Fake Door given clinical/personal-data sensitivity.
Prontua	Avoid deceptive Fake Door in clinical/veterinary flows. Use explicit “early access” + interviews + concierge.	Regulated context — see ethics §3.7.
Mellow	Mine neurodivergent-EdTech complaints; Fake Door in landing is fine, but never in core learning flows.	Accessibility ethics layer applies.
Auth, Platform-Notifications	Infra products — mine internal/portfolio “complaints” (Moklabs eng feedback) as ground truth.	No external Fake Door needed.
Code-Review, eggOS	Apply Complaint Mining on niche communities; Fake Door optional.	Low priority under current freeze.
Neuron	On hold. Methodology applies when reactivated.	—

9. Practical Next Steps for Moklabs

Adopt the Problem Opportunity Card template as the canonical discovery artifact across products. Add to moklabs/docs/templates/.
Add this guideline reference to moklabs/docs/guidelines/product-discovery.md (companion file).
Define the analytics contract for Fake Door events once, in a shared @moklabs/experiments spec, so every product instruments the same shape.
Run one end-to-end pilot on Nightwalls (highest-priority product) — mine 5k reviews from competitor camera-security apps, generate 5 Problem Opportunity Cards, run one Fake Door inside the Nightwalls landing or app.
Codify ethics gates in moklabs/docs/guidelines/: list product flows where Fake Door is forbidden (Prontua clinical, Mellow learning core, Mandate governance, payment flows, auth flows).

10. Conclusion

For immediate impact, run the techniques in sequence:

Complaint Mining to discover and prioritize pains.
Fake Door to validate behavioral demand.
Concierge / Wizard-of-Oz to validate recurring value.
MVP only afterwards.

The most common Moklabs anti-pattern is jumping straight to MVP. The literature consistently points to a cheaper and more robust path: first extract real market signals, then turn those signals into falsifiable hypotheses, then measure behavior, then commit engineering.

Three risks, three gates, three artifacts:

Problem risk → Complaint Mining → Problem Opportunity Card.
Solution risk → Fake Door → Experiment Brief.
Usage risk → Concierge / Wizard of Oz → Concierge log + decision memo.

Only after all three gates pass should engineering build the MVP.
