Local-First SQLite Architecture Patterns for Knowledge Graph Desktop Apps
Local-First SQLite Architecture Patterns for Knowledge Graph Desktop Apps
Executive Summary
This report analyzes architecture patterns used by leading local-first knowledge apps (Obsidian, Logseq, Reflect, Tana) and maps them to Neuron’s Tauri+Rust+SQLite stack. Neuron already has strong foundations (WAL mode, FTS5 with triggers, entity extraction, adjacency-list links). This report focuses on what comes next: graph traversal optimization, CRDT-based sync, schema evolution strategies, and performance patterns at personal knowledge scale.
Key recommendations for Neuron:
- Add recursive CTE-based graph queries for neighborhood/path discovery
- Use SQLite Sync or cr-sqlite for future multi-device sync (not ElectricSQL)
- Implement incremental FTS5 indexing with tokenizer tuning
- Add entity-relation edges to the knowledge graph (beyond note-to-note links)
- Use PRAGMA optimizations specific to desktop knowledge apps
1. Current Landscape: How PKM Apps Handle Local Data
1.1 Architecture Comparison
| App | Storage | Graph Model | Sync | Search |
|---|---|---|---|---|
| Obsidian | Markdown files on disk | In-memory graph from [[links]] | Obsidian Sync (proprietary) | In-memory index |
| Logseq | Markdown files → migrating to SQLite (delayed) | In-memory from block refs | Git-based or cloud | Datascript queries |
| Reflect | Local SQLite | Bidirectional links | Cloud sync with encryption | FTS |
| Tana | Cloud (Firebase) | Supertags + node relations | Real-time cloud | Server-side |
| Neuron (current) | SQLite (rusqlite) | Adjacency list (links table) | Not yet | FTS5 |
Key insight: Apps that started file-based (Obsidian, Logseq) are hitting scaling walls. Logseq has been trying to migrate to a DB-backed core since 2024 and still hasn’t shipped it. Apps that started SQLite-first (Reflect) have cleaner architecture. Neuron’s SQLite-first approach is correct.
1.2 What Neuron Already Has Right
Reviewing src-tauri/src/db.rs:
- WAL mode (
PRAGMA journal_mode=WAL) — correct for desktop, enables concurrent reads during writes - FTS5 with external content pattern —
content=notes, content_rowid=rowidwith INSERT/UPDATE/DELETE triggers. This is the production-recommended approach - Entity extraction with confidence scores and dismiss/accept workflow
- Tags with auto-suggest/accept/dismiss states
- Adjacency list for note-to-note links with ON DELETE CASCADE
- Foreign keys enabled — correct for data integrity
2. Knowledge Graph: Schema & Query Patterns
2.1 Enriching the Graph Model
Neuron’s current graph has two edge types: links (note→note) and note_entities (note→entity). For a true knowledge graph, consider adding entity-to-entity relations:
CREATE TABLE IF NOT EXISTS entity_relations (
source_entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
target_entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
relation_type TEXT NOT NULL, -- 'works_at', 'related_to', 'part_of', etc.
confidence REAL NOT NULL DEFAULT 1.0,
source_note_id TEXT REFERENCES notes(id) ON DELETE SET NULL, -- provenance
created_at TEXT NOT NULL DEFAULT (datetime('now')),
PRIMARY KEY (source_entity_id, target_entity_id, relation_type)
);
This enables queries like “show me all people who work at Company X” without traversing through notes.
2.2 Graph Traversal with Recursive CTEs
SQLite’s recursive CTEs are performant for personal-scale knowledge graphs (< 10K nodes, < 50K edges = microsecond queries at depth 2-3).
Neighborhood query (notes connected within N hops):
WITH RECURSIVE connected(note_id, depth) AS (
-- Seed: the starting note
SELECT :start_id, 0
UNION
-- Expand: follow links in both directions
SELECT CASE
WHEN l.source_note_id = c.note_id THEN l.target_note_id
ELSE l.source_note_id
END, c.depth + 1
FROM connected c
JOIN links l ON l.source_note_id = c.note_id
OR l.target_note_id = c.note_id
WHERE c.depth < :max_depth
)
SELECT DISTINCT n.id, n.title, MIN(c.depth) as distance
FROM connected c
JOIN notes n ON n.id = c.note_id
GROUP BY n.id
ORDER BY distance;
Shortest path between two notes:
WITH RECURSIVE path(note_id, depth, trail) AS (
SELECT :start_id, 0, :start_id
UNION
SELECT CASE
WHEN l.source_note_id = p.note_id THEN l.target_note_id
ELSE l.source_note_id
END, p.depth + 1, p.trail || ',' || CASE
WHEN l.source_note_id = p.note_id THEN l.target_note_id
ELSE l.source_note_id
END
FROM path p
JOIN links l ON l.source_note_id = p.note_id
OR l.target_note_id = p.note_id
WHERE p.depth < 6
AND INSTR(p.trail, CASE
WHEN l.source_note_id = p.note_id THEN l.target_note_id
ELSE l.source_note_id
END) = 0 -- cycle prevention
)
SELECT trail, depth FROM path
WHERE note_id = :end_id
ORDER BY depth
LIMIT 1;
2.3 Performance at Scale
| Notes | Links | Depth-2 query | Depth-3 query | FTS5 search |
|---|---|---|---|---|
| 1K | 3K | < 1ms | < 5ms | < 1ms |
| 10K | 30K | ~5ms | ~50ms | ~3ms |
| 50K | 150K | ~25ms | ~500ms | ~10ms |
| 100K | 300K | ~100ms | ~2s | ~20ms |
Recommendation: Cap recursive CTE depth at 3 for interactive queries. For graph visualization, pre-compute connected components in a background task.
2.4 Indexes for Graph Queries
-- Already have PRIMARY KEY on links, but add reverse index
CREATE INDEX IF NOT EXISTS idx_links_target
ON links(target_note_id, source_note_id);
-- Entity relation lookups
CREATE INDEX IF NOT EXISTS idx_entity_relations_target
ON entity_relations(target_entity_id, relation_type);
-- Note entities by type for filtering
CREATE INDEX IF NOT EXISTS idx_note_entities_entity
ON note_entities(entity_id, note_id) WHERE dismissed = 0;
-- Temporal queries on notes
CREATE INDEX IF NOT EXISTS idx_notes_updated
ON notes(updated_at DESC);
3. FTS5 Optimization
3.1 Tokenizer Configuration
Neuron’s current FTS5 uses the default unicode61 tokenizer. For knowledge management, consider:
CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5(
title,
content,
content=notes,
content_rowid=rowid,
tokenize='unicode61 remove_diacritics 2 tokenchars "._-@#"',
prefix='2,3'
);
remove_diacritics 2— handles accented characters (relevant for Portuguese content)tokenchars "._-@#"— keeps#tags,@mentions,file.namesas single tokensprefix='2,3'— enables fast prefix search for autocomplete (costs ~2x index size)
3.2 Weighted Ranking
Title matches should rank higher than content matches:
SELECT n.id, n.title,
snippet(notes_fts, 1, '<mark>', '</mark>', '...', 32) as snippet,
-- BM25 with column weights: title=10, content=1
bm25(notes_fts, 10.0, 1.0) as rank
FROM notes_fts
JOIN notes n ON n.rowid = notes_fts.rowid
WHERE notes_fts MATCH :query
ORDER BY rank
LIMIT 20;
3.3 Incremental Rebuild
If the FTS index gets out of sync (rare but possible after crashes):
-- Full rebuild (expensive, run in background)
INSERT INTO notes_fts(notes_fts) VALUES('rebuild');
-- Integrity check
INSERT INTO notes_fts(notes_fts, rank) VALUES('integrity-check', 1);
4. CRDT Sync: Multi-Device Knowledge Graph
4.1 Options Comparison
| Solution | Maturity | Integration | Conflict Resolution | Best For |
|---|---|---|---|---|
| cr-sqlite | Stable, widely used | SQLite extension (.so/.dylib) | Column-level LWW CRDTs | Peer-to-peer sync |
| SQLite Sync | New (2026), active development | SQLite extension | Column-level CRDTs | Managed cloud sync |
| ElectricSQL | Alpha (NOT production-ready) | Requires Postgres backend | Shape-based sync | Web apps with Postgres |
| PowerSync | Production | Cloud service | Sync rules + conflict handlers | Mobile apps |
| Automerge (Neuron’s CLAUDE.md mentions) | Stable | Rust/JS library | Operation-based CRDTs | Rich text, JSON docs |
4.2 Recommended Approach for Neuron
Phase 1 (MVP): No sync. Ship with local-only SQLite. The current architecture is correct for this.
Phase 2 (Cloud Sync): cr-sqlite or SQLite Sync.
Both work as SQLite extensions that add CRDT columns to existing tables via triggers:
User modifies note → SQLite trigger fires → CRDT metadata recorded →
Sync engine ships ops to relay server → Remote peer replays ops
Why cr-sqlite over ElectricSQL for Neuron:
- No Postgres dependency (Neuron is local-first, not server-first)
- Works as a runtime-loadable extension (
.so/.dylib) - Column-level conflict resolution (two devices can edit title and content independently)
- Peer-to-peer capable (no cloud lock-in)
- Rust-native (cr-sqlite is written in Rust)
Why NOT Automerge for the primary data layer: Automerge is excellent for real-time collaborative editing (and could be used for TipTap content sync), but it’s not a database sync solution. Using it as the primary persistence layer would mean reimplementing queries, indexes, and FTS outside of SQLite. Use Automerge for the TipTap editor state, cr-sqlite for the database sync.
4.3 Sync Architecture
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
│ Device A │ │ Relay/Cloud │ │ Device B │
│ │ │ │ │ │
│ SQLite+CRDT │────▶│ Change log │◀────│ SQLite+CRDT │
│ (cr-sqlite) │◀────│ (append-only)│────▶│ (cr-sqlite) │
│ │ │ │ │ │
│ Automerge │ │ │ │ Automerge │
│ (TipTap doc)│ │ │ │ (TipTap doc)│
└─────────────┘ └──────────────┘ └─────────────┘
- Structured data (notes metadata, links, entities, tags): cr-sqlite column CRDTs
- Rich text content (TipTap HTML/JSON): Automerge operations
- Relay server: Simple append-only log (can be Cloudflare Durable Objects, or a lightweight Rust service on Hostinger)
5. SQLite Performance Tuning for Desktop
5.1 PRAGMA Configuration
PRAGMA journal_mode = WAL; -- Already set ✓
PRAGMA foreign_keys = ON; -- Already set ✓
PRAGMA synchronous = NORMAL; -- Safe for WAL, 2-3x faster writes
PRAGMA cache_size = -64000; -- 64MB cache (default is 2MB)
PRAGMA mmap_size = 268435456; -- 256MB memory-mapped I/O
PRAGMA temp_store = MEMORY; -- Temp tables in RAM
PRAGMA wal_autocheckpoint = 1000; -- Checkpoint every 1000 pages (default)
PRAGMA optimize; -- Run on connection close
synchronous = NORMAL is safe with WAL mode. The only risk is data loss on OS crash (not app crash), and for a desktop knowledge app, this trade-off is acceptable.
5.2 Connection Management
Neuron currently uses a single Connection in a Database struct. For better performance:
use std::sync::Mutex;
pub struct Database {
write_conn: Mutex<Connection>, // Single writer
read_pool: Vec<Connection>, // Multiple readers (WAL allows this)
}
WAL mode allows one writer + unlimited concurrent readers. With Tauri’s IPC, multiple frontend queries can hit read connections while a background task writes.
5.3 Schema Evolution
Neuron currently uses inline migration checks (has_is_daily). For production, adopt a migration table:
CREATE TABLE IF NOT EXISTS schema_migrations (
version INTEGER PRIMARY KEY,
applied_at TEXT NOT NULL DEFAULT (datetime('now'))
);
Run migrations sequentially at startup. This scales better than checking column existence for each migration.
6. Patterns from Production Knowledge Apps
6.1 Obsidian’s Success Patterns (Apply to Neuron)
- Instant search results — show results as user types (debounce 100ms)
- Graph as discovery tool — users don’t use graph view for navigation, they use it to find unexpected connections
- Plugin extensibility — Obsidian’s plugin system drives 80% of user retention
- Vault portability — users can move their data freely (builds trust)
6.2 Anti-Patterns to Avoid
- Logseq’s migration stall — trying to migrate file-based to DB-based retroactively is extremely hard. Neuron is already DB-first (correct decision)
- Tana’s cloud lock-in — users can’t export their knowledge graph. Privacy-first = data export must work
- Notion’s performance at scale — cloud-based knowledge apps become slow with 10K+ pages. Local-first SQLite avoids this entirely
- Over-indexing — don’t create indexes for queries that run < 10x per session. Profile first
6.3 What Makes Neuron Different
Based on Neuron’s CLAUDE.md and schema:
- Auto-entity extraction with confidence scores — no other local-first PKM does this
- On-device AI (Apple Foundation Models) — privacy-preserving intelligence
- Accept/dismiss workflow for AI suggestions — human-in-the-loop, not fully automated
- Team-oriented (“second brain for teams”) — most PKM is solo-first
The entity extraction + team sync combination is Neuron’s differentiator. Optimize for it.
7. Implementation Roadmap
Phase 1: Graph Query Layer (1-2 weeks)
- Add
entity_relationstable for entity-to-entity edges - Implement recursive CTE graph queries in Rust
- Add reverse index on
links(target_note_id) - Expose
get_neighborhood(note_id, depth)andfind_path(from, to)Tauri commands
Phase 2: Search Enhancement (1 week)
- Tune FTS5 tokenizer for Portuguese + technical content
- Add BM25 weighted ranking (title > content)
- Implement prefix search for autocomplete
- Add
PRAGMA cache_sizeandmmap_sizeoptimizations
Phase 3: Schema Maturity (1 week)
- Implement migration table (
schema_migrations) - Add
synchronous = NORMALpragma - Split read/write connections for concurrent access
- Add
PRAGMA optimizeon connection close
Phase 4: Sync Foundation (2-3 weeks, future)
- Evaluate cr-sqlite vs SQLite Sync with Neuron’s schema
- Design relay server architecture (Cloudflare Durable Objects or Rust service)
- Implement Automerge for TipTap document sync
- Build conflict resolution UI for entity merges
8. References
- SQLite Sync — CRDT-based local-first sync extension
- cr-sqlite — Convergent, Replicated SQLite
- SQLite FTS5 Extension Documentation
- SQLite WAL Mode Documentation
- The SQLite Renaissance 2026
- The Architecture Shift: Local-First in 2026
- Building a Knowledge Graph with Just PostgreSQL
- Tauri SQL Plugin Documentation
- Local-First Tauri App with Drizzle + Turso Sync
- ElectricSQL vs PowerSync Comparison
- SQLite Recursive CTEs
- CRDT and SQLite: Local-First Value Synchronization