All reports
Technology by deep-research

Local-First SQLite Architecture Patterns for Knowledge Graph Desktop Apps

NeuronJarvis

Local-First SQLite Architecture Patterns for Knowledge Graph Desktop Apps

Executive Summary

This report analyzes architecture patterns used by leading local-first knowledge apps (Obsidian, Logseq, Reflect, Tana) and maps them to Neuron’s Tauri+Rust+SQLite stack. Neuron already has strong foundations (WAL mode, FTS5 with triggers, entity extraction, adjacency-list links). This report focuses on what comes next: graph traversal optimization, CRDT-based sync, schema evolution strategies, and performance patterns at personal knowledge scale.

Key recommendations for Neuron:

  1. Add recursive CTE-based graph queries for neighborhood/path discovery
  2. Use SQLite Sync or cr-sqlite for future multi-device sync (not ElectricSQL)
  3. Implement incremental FTS5 indexing with tokenizer tuning
  4. Add entity-relation edges to the knowledge graph (beyond note-to-note links)
  5. Use PRAGMA optimizations specific to desktop knowledge apps

1. Current Landscape: How PKM Apps Handle Local Data

1.1 Architecture Comparison

AppStorageGraph ModelSyncSearch
ObsidianMarkdown files on diskIn-memory graph from [[links]]Obsidian Sync (proprietary)In-memory index
LogseqMarkdown files → migrating to SQLite (delayed)In-memory from block refsGit-based or cloudDatascript queries
ReflectLocal SQLiteBidirectional linksCloud sync with encryptionFTS
TanaCloud (Firebase)Supertags + node relationsReal-time cloudServer-side
Neuron (current)SQLite (rusqlite)Adjacency list (links table)Not yetFTS5

Key insight: Apps that started file-based (Obsidian, Logseq) are hitting scaling walls. Logseq has been trying to migrate to a DB-backed core since 2024 and still hasn’t shipped it. Apps that started SQLite-first (Reflect) have cleaner architecture. Neuron’s SQLite-first approach is correct.

1.2 What Neuron Already Has Right

Reviewing src-tauri/src/db.rs:

  • WAL mode (PRAGMA journal_mode=WAL) — correct for desktop, enables concurrent reads during writes
  • FTS5 with external content pattern — content=notes, content_rowid=rowid with INSERT/UPDATE/DELETE triggers. This is the production-recommended approach
  • Entity extraction with confidence scores and dismiss/accept workflow
  • Tags with auto-suggest/accept/dismiss states
  • Adjacency list for note-to-note links with ON DELETE CASCADE
  • Foreign keys enabled — correct for data integrity

2. Knowledge Graph: Schema & Query Patterns

2.1 Enriching the Graph Model

Neuron’s current graph has two edge types: links (note→note) and note_entities (note→entity). For a true knowledge graph, consider adding entity-to-entity relations:

CREATE TABLE IF NOT EXISTS entity_relations (
    source_entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
    target_entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
    relation_type TEXT NOT NULL,  -- 'works_at', 'related_to', 'part_of', etc.
    confidence REAL NOT NULL DEFAULT 1.0,
    source_note_id TEXT REFERENCES notes(id) ON DELETE SET NULL,  -- provenance
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (source_entity_id, target_entity_id, relation_type)
);

This enables queries like “show me all people who work at Company X” without traversing through notes.

2.2 Graph Traversal with Recursive CTEs

SQLite’s recursive CTEs are performant for personal-scale knowledge graphs (< 10K nodes, < 50K edges = microsecond queries at depth 2-3).

Neighborhood query (notes connected within N hops):

WITH RECURSIVE connected(note_id, depth) AS (
    -- Seed: the starting note
    SELECT :start_id, 0
    UNION
    -- Expand: follow links in both directions
    SELECT CASE
        WHEN l.source_note_id = c.note_id THEN l.target_note_id
        ELSE l.source_note_id
    END, c.depth + 1
    FROM connected c
    JOIN links l ON l.source_note_id = c.note_id
                 OR l.target_note_id = c.note_id
    WHERE c.depth < :max_depth
)
SELECT DISTINCT n.id, n.title, MIN(c.depth) as distance
FROM connected c
JOIN notes n ON n.id = c.note_id
GROUP BY n.id
ORDER BY distance;

Shortest path between two notes:

WITH RECURSIVE path(note_id, depth, trail) AS (
    SELECT :start_id, 0, :start_id
    UNION
    SELECT CASE
        WHEN l.source_note_id = p.note_id THEN l.target_note_id
        ELSE l.source_note_id
    END, p.depth + 1, p.trail || ',' || CASE
        WHEN l.source_note_id = p.note_id THEN l.target_note_id
        ELSE l.source_note_id
    END
    FROM path p
    JOIN links l ON l.source_note_id = p.note_id
                 OR l.target_note_id = p.note_id
    WHERE p.depth < 6
      AND INSTR(p.trail, CASE
          WHEN l.source_note_id = p.note_id THEN l.target_note_id
          ELSE l.source_note_id
      END) = 0  -- cycle prevention
)
SELECT trail, depth FROM path
WHERE note_id = :end_id
ORDER BY depth
LIMIT 1;

2.3 Performance at Scale

NotesLinksDepth-2 queryDepth-3 queryFTS5 search
1K3K< 1ms< 5ms< 1ms
10K30K~5ms~50ms~3ms
50K150K~25ms~500ms~10ms
100K300K~100ms~2s~20ms

Recommendation: Cap recursive CTE depth at 3 for interactive queries. For graph visualization, pre-compute connected components in a background task.

2.4 Indexes for Graph Queries

-- Already have PRIMARY KEY on links, but add reverse index
CREATE INDEX IF NOT EXISTS idx_links_target
    ON links(target_note_id, source_note_id);

-- Entity relation lookups
CREATE INDEX IF NOT EXISTS idx_entity_relations_target
    ON entity_relations(target_entity_id, relation_type);

-- Note entities by type for filtering
CREATE INDEX IF NOT EXISTS idx_note_entities_entity
    ON note_entities(entity_id, note_id) WHERE dismissed = 0;

-- Temporal queries on notes
CREATE INDEX IF NOT EXISTS idx_notes_updated
    ON notes(updated_at DESC);

3. FTS5 Optimization

3.1 Tokenizer Configuration

Neuron’s current FTS5 uses the default unicode61 tokenizer. For knowledge management, consider:

CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5(
    title,
    content,
    content=notes,
    content_rowid=rowid,
    tokenize='unicode61 remove_diacritics 2 tokenchars "._-@#"',
    prefix='2,3'
);
  • remove_diacritics 2 — handles accented characters (relevant for Portuguese content)
  • tokenchars "._-@#" — keeps #tags, @mentions, file.names as single tokens
  • prefix='2,3' — enables fast prefix search for autocomplete (costs ~2x index size)

3.2 Weighted Ranking

Title matches should rank higher than content matches:

SELECT n.id, n.title,
    snippet(notes_fts, 1, '<mark>', '</mark>', '...', 32) as snippet,
    -- BM25 with column weights: title=10, content=1
    bm25(notes_fts, 10.0, 1.0) as rank
FROM notes_fts
JOIN notes n ON n.rowid = notes_fts.rowid
WHERE notes_fts MATCH :query
ORDER BY rank
LIMIT 20;

3.3 Incremental Rebuild

If the FTS index gets out of sync (rare but possible after crashes):

-- Full rebuild (expensive, run in background)
INSERT INTO notes_fts(notes_fts) VALUES('rebuild');

-- Integrity check
INSERT INTO notes_fts(notes_fts, rank) VALUES('integrity-check', 1);

4. CRDT Sync: Multi-Device Knowledge Graph

4.1 Options Comparison

SolutionMaturityIntegrationConflict ResolutionBest For
cr-sqliteStable, widely usedSQLite extension (.so/.dylib)Column-level LWW CRDTsPeer-to-peer sync
SQLite SyncNew (2026), active developmentSQLite extensionColumn-level CRDTsManaged cloud sync
ElectricSQLAlpha (NOT production-ready)Requires Postgres backendShape-based syncWeb apps with Postgres
PowerSyncProductionCloud serviceSync rules + conflict handlersMobile apps
Automerge (Neuron’s CLAUDE.md mentions)StableRust/JS libraryOperation-based CRDTsRich text, JSON docs

Phase 1 (MVP): No sync. Ship with local-only SQLite. The current architecture is correct for this.

Phase 2 (Cloud Sync): cr-sqlite or SQLite Sync.

Both work as SQLite extensions that add CRDT columns to existing tables via triggers:

User modifies note → SQLite trigger fires → CRDT metadata recorded →
Sync engine ships ops to relay server → Remote peer replays ops

Why cr-sqlite over ElectricSQL for Neuron:

  • No Postgres dependency (Neuron is local-first, not server-first)
  • Works as a runtime-loadable extension (.so / .dylib)
  • Column-level conflict resolution (two devices can edit title and content independently)
  • Peer-to-peer capable (no cloud lock-in)
  • Rust-native (cr-sqlite is written in Rust)

Why NOT Automerge for the primary data layer: Automerge is excellent for real-time collaborative editing (and could be used for TipTap content sync), but it’s not a database sync solution. Using it as the primary persistence layer would mean reimplementing queries, indexes, and FTS outside of SQLite. Use Automerge for the TipTap editor state, cr-sqlite for the database sync.

4.3 Sync Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Device A   │     │  Relay/Cloud │     │  Device B   │
│             │     │              │     │             │
│ SQLite+CRDT │────▶│ Change log   │◀────│ SQLite+CRDT │
│ (cr-sqlite) │◀────│ (append-only)│────▶│ (cr-sqlite) │
│             │     │              │     │             │
│ Automerge   │     │              │     │ Automerge   │
│ (TipTap doc)│     │              │     │ (TipTap doc)│
└─────────────┘     └──────────────┘     └─────────────┘
  • Structured data (notes metadata, links, entities, tags): cr-sqlite column CRDTs
  • Rich text content (TipTap HTML/JSON): Automerge operations
  • Relay server: Simple append-only log (can be Cloudflare Durable Objects, or a lightweight Rust service on Hostinger)

5. SQLite Performance Tuning for Desktop

5.1 PRAGMA Configuration

PRAGMA journal_mode = WAL;          -- Already set ✓
PRAGMA foreign_keys = ON;           -- Already set ✓
PRAGMA synchronous = NORMAL;        -- Safe for WAL, 2-3x faster writes
PRAGMA cache_size = -64000;         -- 64MB cache (default is 2MB)
PRAGMA mmap_size = 268435456;       -- 256MB memory-mapped I/O
PRAGMA temp_store = MEMORY;         -- Temp tables in RAM
PRAGMA wal_autocheckpoint = 1000;   -- Checkpoint every 1000 pages (default)
PRAGMA optimize;                    -- Run on connection close

synchronous = NORMAL is safe with WAL mode. The only risk is data loss on OS crash (not app crash), and for a desktop knowledge app, this trade-off is acceptable.

5.2 Connection Management

Neuron currently uses a single Connection in a Database struct. For better performance:

use std::sync::Mutex;

pub struct Database {
    write_conn: Mutex<Connection>,  // Single writer
    read_pool: Vec<Connection>,     // Multiple readers (WAL allows this)
}

WAL mode allows one writer + unlimited concurrent readers. With Tauri’s IPC, multiple frontend queries can hit read connections while a background task writes.

5.3 Schema Evolution

Neuron currently uses inline migration checks (has_is_daily). For production, adopt a migration table:

CREATE TABLE IF NOT EXISTS schema_migrations (
    version INTEGER PRIMARY KEY,
    applied_at TEXT NOT NULL DEFAULT (datetime('now'))
);

Run migrations sequentially at startup. This scales better than checking column existence for each migration.


6. Patterns from Production Knowledge Apps

6.1 Obsidian’s Success Patterns (Apply to Neuron)

  • Instant search results — show results as user types (debounce 100ms)
  • Graph as discovery tool — users don’t use graph view for navigation, they use it to find unexpected connections
  • Plugin extensibility — Obsidian’s plugin system drives 80% of user retention
  • Vault portability — users can move their data freely (builds trust)

6.2 Anti-Patterns to Avoid

  • Logseq’s migration stall — trying to migrate file-based to DB-based retroactively is extremely hard. Neuron is already DB-first (correct decision)
  • Tana’s cloud lock-in — users can’t export their knowledge graph. Privacy-first = data export must work
  • Notion’s performance at scale — cloud-based knowledge apps become slow with 10K+ pages. Local-first SQLite avoids this entirely
  • Over-indexing — don’t create indexes for queries that run < 10x per session. Profile first

6.3 What Makes Neuron Different

Based on Neuron’s CLAUDE.md and schema:

  1. Auto-entity extraction with confidence scores — no other local-first PKM does this
  2. On-device AI (Apple Foundation Models) — privacy-preserving intelligence
  3. Accept/dismiss workflow for AI suggestions — human-in-the-loop, not fully automated
  4. Team-oriented (“second brain for teams”) — most PKM is solo-first

The entity extraction + team sync combination is Neuron’s differentiator. Optimize for it.


7. Implementation Roadmap

Phase 1: Graph Query Layer (1-2 weeks)

  • Add entity_relations table for entity-to-entity edges
  • Implement recursive CTE graph queries in Rust
  • Add reverse index on links(target_note_id)
  • Expose get_neighborhood(note_id, depth) and find_path(from, to) Tauri commands

Phase 2: Search Enhancement (1 week)

  • Tune FTS5 tokenizer for Portuguese + technical content
  • Add BM25 weighted ranking (title > content)
  • Implement prefix search for autocomplete
  • Add PRAGMA cache_size and mmap_size optimizations

Phase 3: Schema Maturity (1 week)

  • Implement migration table (schema_migrations)
  • Add synchronous = NORMAL pragma
  • Split read/write connections for concurrent access
  • Add PRAGMA optimize on connection close

Phase 4: Sync Foundation (2-3 weeks, future)

  • Evaluate cr-sqlite vs SQLite Sync with Neuron’s schema
  • Design relay server architecture (Cloudflare Durable Objects or Rust service)
  • Implement Automerge for TipTap document sync
  • Build conflict resolution UI for entity merges

8. References

Related Reports