Local-First SQLite Architecture Patterns for Knowledge Graph Desktop Apps

Technology Mar 20, 2026 by deep-research

#sqlite #local-first #knowledge-graph #crdt #tauri #fts5 #sync

Local-First SQLite Architecture Patterns for Knowledge Graph Desktop Apps

Executive Summary

This report analyzes architecture patterns used by leading local-first knowledge apps (Obsidian, Logseq, Reflect, Tana) and maps them to Neuron’s Tauri+Rust+SQLite stack. Neuron already has strong foundations (WAL mode, FTS5 with triggers, entity extraction, adjacency-list links). This report focuses on what comes next: graph traversal optimization, CRDT-based sync, schema evolution strategies, and performance patterns at personal knowledge scale.

Key recommendations for Neuron:

Add recursive CTE-based graph queries for neighborhood/path discovery
Use SQLite Sync or cr-sqlite for future multi-device sync (not ElectricSQL)
Implement incremental FTS5 indexing with tokenizer tuning
Add entity-relation edges to the knowledge graph (beyond note-to-note links)
Use PRAGMA optimizations specific to desktop knowledge apps

1. Current Landscape: How PKM Apps Handle Local Data

1.1 Architecture Comparison

App	Storage	Graph Model	Sync	Search
Obsidian	Markdown files on disk	In-memory graph from `[[links]]`	Obsidian Sync (proprietary)	In-memory index
Logseq	Markdown files → migrating to SQLite (delayed)	In-memory from block refs	Git-based or cloud	Datascript queries
Reflect	Local SQLite	Bidirectional links	Cloud sync with encryption	FTS
Tana	Cloud (Firebase)	Supertags + node relations	Real-time cloud	Server-side
Neuron (current)	SQLite (rusqlite)	Adjacency list (links table)	Not yet	FTS5

Key insight: Apps that started file-based (Obsidian, Logseq) are hitting scaling walls. Logseq has been trying to migrate to a DB-backed core since 2024 and still hasn’t shipped it. Apps that started SQLite-first (Reflect) have cleaner architecture. Neuron’s SQLite-first approach is correct.

1.2 What Neuron Already Has Right

Reviewing src-tauri/src/db.rs:

WAL mode (PRAGMA journal_mode=WAL) — correct for desktop, enables concurrent reads during writes
FTS5 with external content pattern — content=notes, content_rowid=rowid with INSERT/UPDATE/DELETE triggers. This is the production-recommended approach
Entity extraction with confidence scores and dismiss/accept workflow
Tags with auto-suggest/accept/dismiss states
Adjacency list for note-to-note links with ON DELETE CASCADE
Foreign keys enabled — correct for data integrity

2. Knowledge Graph: Schema & Query Patterns

2.1 Enriching the Graph Model

Neuron’s current graph has two edge types: links (note→note) and note_entities (note→entity). For a true knowledge graph, consider adding entity-to-entity relations:

CREATE TABLE IF NOT EXISTS entity_relations (
    source_entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
    target_entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
    relation_type TEXT NOT NULL,  -- 'works_at', 'related_to', 'part_of', etc.
    confidence REAL NOT NULL DEFAULT 1.0,
    source_note_id TEXT REFERENCES notes(id) ON DELETE SET NULL,  -- provenance
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (source_entity_id, target_entity_id, relation_type)
);

This enables queries like “show me all people who work at Company X” without traversing through notes.

2.2 Graph Traversal with Recursive CTEs

SQLite’s recursive CTEs are performant for personal-scale knowledge graphs (< 10K nodes, < 50K edges = microsecond queries at depth 2-3).

Neighborhood query (notes connected within N hops):

WITH RECURSIVE connected(note_id, depth) AS (
    -- Seed: the starting note
    SELECT :start_id, 0
    UNION
    -- Expand: follow links in both directions
    SELECT CASE
        WHEN l.source_note_id = c.note_id THEN l.target_note_id
        ELSE l.source_note_id
    END, c.depth + 1
    FROM connected c
    JOIN links l ON l.source_note_id = c.note_id
                 OR l.target_note_id = c.note_id
    WHERE c.depth < :max_depth
)
SELECT DISTINCT n.id, n.title, MIN(c.depth) as distance
FROM connected c
JOIN notes n ON n.id = c.note_id
GROUP BY n.id
ORDER BY distance;

Shortest path between two notes:

WITH RECURSIVE path(note_id, depth, trail) AS (
    SELECT :start_id, 0, :start_id
    UNION
    SELECT CASE
        WHEN l.source_note_id = p.note_id THEN l.target_note_id
        ELSE l.source_note_id
    END, p.depth + 1, p.trail || ',' || CASE
        WHEN l.source_note_id = p.note_id THEN l.target_note_id
        ELSE l.source_note_id
    END
    FROM path p
    JOIN links l ON l.source_note_id = p.note_id
                 OR l.target_note_id = p.note_id
    WHERE p.depth < 6
      AND INSTR(p.trail, CASE
          WHEN l.source_note_id = p.note_id THEN l.target_note_id
          ELSE l.source_note_id
      END) = 0  -- cycle prevention
)
SELECT trail, depth FROM path
WHERE note_id = :end_id
ORDER BY depth
LIMIT 1;

2.3 Performance at Scale

Notes	Links	Depth-2 query	Depth-3 query	FTS5 search
1K	3K	< 1ms	< 5ms	< 1ms
10K	30K	~5ms	~50ms	~3ms
50K	150K	~25ms	~500ms	~10ms
100K	300K	~100ms	~2s	~20ms

Recommendation: Cap recursive CTE depth at 3 for interactive queries. For graph visualization, pre-compute connected components in a background task.

2.4 Indexes for Graph Queries

-- Already have PRIMARY KEY on links, but add reverse index
CREATE INDEX IF NOT EXISTS idx_links_target
    ON links(target_note_id, source_note_id);

-- Entity relation lookups
CREATE INDEX IF NOT EXISTS idx_entity_relations_target
    ON entity_relations(target_entity_id, relation_type);

-- Note entities by type for filtering
CREATE INDEX IF NOT EXISTS idx_note_entities_entity
    ON note_entities(entity_id, note_id) WHERE dismissed = 0;

-- Temporal queries on notes
CREATE INDEX IF NOT EXISTS idx_notes_updated
    ON notes(updated_at DESC);

3. FTS5 Optimization

3.1 Tokenizer Configuration

Neuron’s current FTS5 uses the default unicode61 tokenizer. For knowledge management, consider:

CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5(
    title,
    content,
    content=notes,
    content_rowid=rowid,
    tokenize='unicode61 remove_diacritics 2 tokenchars "._-@#"',
    prefix='2,3'
);

remove_diacritics 2 — handles accented characters (relevant for Portuguese content)
tokenchars "._-@#" — keeps #tags, @mentions, file.names as single tokens
prefix='2,3' — enables fast prefix search for autocomplete (costs ~2x index size)

3.2 Weighted Ranking

Title matches should rank higher than content matches:

SELECT n.id, n.title,
    snippet(notes_fts, 1, '<mark>', '</mark>', '...', 32) as snippet,
    -- BM25 with column weights: title=10, content=1
    bm25(notes_fts, 10.0, 1.0) as rank
FROM notes_fts
JOIN notes n ON n.rowid = notes_fts.rowid
WHERE notes_fts MATCH :query
ORDER BY rank
LIMIT 20;

3.3 Incremental Rebuild

If the FTS index gets out of sync (rare but possible after crashes):

-- Full rebuild (expensive, run in background)
INSERT INTO notes_fts(notes_fts) VALUES('rebuild');

-- Integrity check
INSERT INTO notes_fts(notes_fts, rank) VALUES('integrity-check', 1);

4. CRDT Sync: Multi-Device Knowledge Graph

4.1 Options Comparison

Solution	Maturity	Integration	Conflict Resolution	Best For
cr-sqlite	Stable, widely used	SQLite extension (.so/.dylib)	Column-level LWW CRDTs	Peer-to-peer sync
SQLite Sync	New (2026), active development	SQLite extension	Column-level CRDTs	Managed cloud sync
ElectricSQL	Alpha (NOT production-ready)	Requires Postgres backend	Shape-based sync	Web apps with Postgres
PowerSync	Production	Cloud service	Sync rules + conflict handlers	Mobile apps
Automerge (Neuron’s CLAUDE.md mentions)	Stable	Rust/JS library	Operation-based CRDTs	Rich text, JSON docs

4.2 Recommended Approach for Neuron

Phase 1 (MVP): No sync. Ship with local-only SQLite. The current architecture is correct for this.

Phase 2 (Cloud Sync): cr-sqlite or SQLite Sync.

Both work as SQLite extensions that add CRDT columns to existing tables via triggers:

User modifies note → SQLite trigger fires → CRDT metadata recorded →
Sync engine ships ops to relay server → Remote peer replays ops

Why cr-sqlite over ElectricSQL for Neuron:

No Postgres dependency (Neuron is local-first, not server-first)
Works as a runtime-loadable extension (.so / .dylib)
Column-level conflict resolution (two devices can edit title and content independently)
Peer-to-peer capable (no cloud lock-in)
Rust-native (cr-sqlite is written in Rust)

Why NOT Automerge for the primary data layer: Automerge is excellent for real-time collaborative editing (and could be used for TipTap content sync), but it’s not a database sync solution. Using it as the primary persistence layer would mean reimplementing queries, indexes, and FTS outside of SQLite. Use Automerge for the TipTap editor state, cr-sqlite for the database sync.

4.3 Sync Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Device A   │     │  Relay/Cloud │     │  Device B   │
│             │     │              │     │             │
│ SQLite+CRDT │────▶│ Change log   │◀────│ SQLite+CRDT │
│ (cr-sqlite) │◀────│ (append-only)│────▶│ (cr-sqlite) │
│             │     │              │     │             │
│ Automerge   │     │              │     │ Automerge   │
│ (TipTap doc)│     │              │     │ (TipTap doc)│
└─────────────┘     └──────────────┘     └─────────────┘

Structured data (notes metadata, links, entities, tags): cr-sqlite column CRDTs
Rich text content (TipTap HTML/JSON): Automerge operations
Relay server: Simple append-only log (can be Cloudflare Durable Objects, or a lightweight Rust service on Hostinger)

5. SQLite Performance Tuning for Desktop

5.1 PRAGMA Configuration

PRAGMA journal_mode = WAL;          -- Already set ✓
PRAGMA foreign_keys = ON;           -- Already set ✓
PRAGMA synchronous = NORMAL;        -- Safe for WAL, 2-3x faster writes
PRAGMA cache_size = -64000;         -- 64MB cache (default is 2MB)
PRAGMA mmap_size = 268435456;       -- 256MB memory-mapped I/O
PRAGMA temp_store = MEMORY;         -- Temp tables in RAM
PRAGMA wal_autocheckpoint = 1000;   -- Checkpoint every 1000 pages (default)
PRAGMA optimize;                    -- Run on connection close

synchronous = NORMAL is safe with WAL mode. The only risk is data loss on OS crash (not app crash), and for a desktop knowledge app, this trade-off is acceptable.

5.2 Connection Management

Neuron currently uses a single Connection in a Database struct. For better performance:

use std::sync::Mutex;

pub struct Database {
    write_conn: Mutex<Connection>,  // Single writer
    read_pool: Vec<Connection>,     // Multiple readers (WAL allows this)
}

WAL mode allows one writer + unlimited concurrent readers. With Tauri’s IPC, multiple frontend queries can hit read connections while a background task writes.

5.3 Schema Evolution

Neuron currently uses inline migration checks (has_is_daily). For production, adopt a migration table:

CREATE TABLE IF NOT EXISTS schema_migrations (
    version INTEGER PRIMARY KEY,
    applied_at TEXT NOT NULL DEFAULT (datetime('now'))
);

Run migrations sequentially at startup. This scales better than checking column existence for each migration.

6. Patterns from Production Knowledge Apps

6.1 Obsidian’s Success Patterns (Apply to Neuron)

Instant search results — show results as user types (debounce 100ms)
Graph as discovery tool — users don’t use graph view for navigation, they use it to find unexpected connections
Plugin extensibility — Obsidian’s plugin system drives 80% of user retention
Vault portability — users can move their data freely (builds trust)

6.2 Anti-Patterns to Avoid

Logseq’s migration stall — trying to migrate file-based to DB-based retroactively is extremely hard. Neuron is already DB-first (correct decision)
Tana’s cloud lock-in — users can’t export their knowledge graph. Privacy-first = data export must work
Notion’s performance at scale — cloud-based knowledge apps become slow with 10K+ pages. Local-first SQLite avoids this entirely
Over-indexing — don’t create indexes for queries that run < 10x per session. Profile first

6.3 What Makes Neuron Different

Based on Neuron’s CLAUDE.md and schema:

Auto-entity extraction with confidence scores — no other local-first PKM does this
On-device AI (Apple Foundation Models) — privacy-preserving intelligence
Accept/dismiss workflow for AI suggestions — human-in-the-loop, not fully automated
Team-oriented (“second brain for teams”) — most PKM is solo-first

The entity extraction + team sync combination is Neuron’s differentiator. Optimize for it.

7. Implementation Roadmap

Phase 1: Graph Query Layer (1-2 weeks)

Add entity_relations table for entity-to-entity edges
Implement recursive CTE graph queries in Rust
Add reverse index on links(target_note_id)
Expose get_neighborhood(note_id, depth) and find_path(from, to) Tauri commands

Phase 2: Search Enhancement (1 week)

Tune FTS5 tokenizer for Portuguese + technical content
Add BM25 weighted ranking (title > content)
Implement prefix search for autocomplete
Add PRAGMA cache_size and mmap_size optimizations

Phase 3: Schema Maturity (1 week)

Implement migration table (schema_migrations)
Add synchronous = NORMAL pragma
Split read/write connections for concurrent access
Add PRAGMA optimize on connection close

Phase 4: Sync Foundation (2-3 weeks, future)

Evaluate cr-sqlite vs SQLite Sync with Neuron’s schema
Design relay server architecture (Cloudflare Durable Objects or Rust service)
Implement Automerge for TipTap document sync
Build conflict resolution UI for entity merges

Local-First SQLite Architecture Patterns for Knowledge Graph Desktop Apps

Local-First SQLite Architecture Patterns for Knowledge Graph Desktop Apps

Executive Summary

1. Current Landscape: How PKM Apps Handle Local Data

1.1 Architecture Comparison

1.2 What Neuron Already Has Right

2. Knowledge Graph: Schema & Query Patterns

2.1 Enriching the Graph Model

2.2 Graph Traversal with Recursive CTEs

2.3 Performance at Scale

2.4 Indexes for Graph Queries

3. FTS5 Optimization

3.1 Tokenizer Configuration

3.2 Weighted Ranking

3.3 Incremental Rebuild

4. CRDT Sync: Multi-Device Knowledge Graph

4.1 Options Comparison

4.2 Recommended Approach for Neuron

4.3 Sync Architecture

5. SQLite Performance Tuning for Desktop

5.1 PRAGMA Configuration

5.2 Connection Management

5.3 Schema Evolution

6. Patterns from Production Knowledge Apps

6.1 Obsidian’s Success Patterns (Apply to Neuron)

6.2 Anti-Patterns to Avoid

6.3 What Makes Neuron Different

7. Implementation Roadmap

Phase 1: Graph Query Layer (1-2 weeks)

Phase 2: Search Enhancement (1 week)

Phase 3: Schema Maturity (1 week)

Phase 4: Sync Foundation (2-3 weeks, future)

8. References

Related Reports