AlphaLab-USTC
AutoWiki
Your LLM Compiles a Knowledge Base. You Just Read and Ask.
Implementing Andrej Karpathy's LLM Knowledge Base vision
The Problem
Your Reading Never Compounds
You consume dozens of sources a week. You take notes, highlight results, file them into folders.
Six months later, you can't remember how any of them connect.
🔄
RAG
Re-discovers knowledge from scratch on every query. Nothing accumulates.
🪦
Note-taking Apps
A graveyard of disconnected files. You organize once, then abandon.
📋
Summarizers
Shallow bullet points that miss what actually matters. No cross-references.
The Inspiration
The Karpathy Pattern
Using LLMs to build personal knowledge bases… a large fraction of my
recent token throughput is going less into manipulating code, and more
into manipulating knowledge.
— Andrej Karpathy
Instead of retrieving from raw documents at query time, the LLM
incrementally builds and maintains a persistent wiki —
a structured, interlinked collection that sits between you and the raw sources.
Core Idea
The Wiki is a Persistent,
Compounding Artifact
The cross-references are already there. The contradictions have already been flagged.
The synthesis already reflects everything you've read.
raw sources
immutable archive
→
LLM compiler
reads, synthesizes, writes
→
living wiki
compounds over time
You never write the wiki yourself — the LLM writes and maintains all of it.
You curate sources, ask questions, and think. The LLM does the grunt work.
What Makes AutoWiki Different
Four Design Principles
🧠
Cognitive Depth
Not shallow summaries. CRGP factors from the author's Introduction.
Critical Analysis with contrastive prior/update structure.
Built-in anti-patterns prevent generic filler.
🔗
Temporal Knowledge Graphs
Every source positioned in the field's timeline. Evolutionary chains,
cross-domain origins, temporal tensions. Topics auto-generate
Chronological Evolution sections.
🏠
Self-Maintaining Wiki
Three-tier autonomy: Silent → Notify → Confirm.
25+ lint checks catch broken links, hierarchy violations, stale data.
The wiki heals itself during normal use.
🎯
Classification That Thinks
3-question fitness check before every topic assignment.
Milestone hierarchy with consolidation (<5) and promotion (≥5).
No misorganization.
Architecture
Two Decoupled Layers
OBSIDIAN VAULT
Your reading UI — graph view, backlinks, Properties UI. All for free.
↕
KB/ — KNOWLEDGE BASE
sources/ paper pages ·
topics/ milestone nodes ·
journal/ cognitive timeline
Agent's domain — fully LLM-maintained. Zero Python write code.
↕
LLM (CLAUDE CODE)
Reads source → extracts factors → positions temporally → links → writes → self-lints
↕
RAW/ — SOURCE ARCHIVE
Immutable. Human drops sources into raw/new/, agent moves to raw/compiled/
📄 Ingest — 4-phase pipeline
🔍 Query — search → synthesize → write-back
🔧 Lint — 25+ consistency checks
🔀 Reorganize — three-tree sync
⚡
The SKILL.md IS the architecture — 390 lines of domain knowledge
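The ingest hand-off between the layers above can be sketched in a few lines (the function and the page stub are illustrative assumptions, not the actual SKILL.md workflow; paths follow the layout shown):

```python
from pathlib import Path

def ingest(raw_new: Path, raw_compiled: Path, sources: Path, slug: str) -> Path:
    """Move a dropped PDF from raw/new/ into the immutable raw/compiled/
    archive and create a stub source page pointing back at it."""
    pdf = raw_new / f"{slug}.pdf"
    dest = raw_compiled / slug
    dest.mkdir(parents=True, exist_ok=True)
    pdf.rename(dest / pdf.name)  # human dropped it in raw/new/; agent archives it
    page = sources / f"{slug}.md"
    page.write_text(
        f"---\nraw_path: raw/compiled/{slug}/{pdf.name}\n---\n\n# {slug}\n",
        encoding="utf-8",
    )
    return page
```

In the real pipeline the agent then fills the page with CRGP factors, critical analysis, and temporal relations.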
Design Philosophy — Topics
Survey-Style Paper Organization
Each topic is a milestone node — a conceptual breakthrough that clusters papers.
Like a survey paper, it tells the story of how a research direction evolved over time.
CASE: AGENT-SELF-EVOLUTION (80 PAPERS, 13 MILESTONES)
agent-self-evolution.md ← split parent, 9 children
├─ Mechanism Layer
│ ├─ self-evolving-skill-libraries (7 papers)
│ ├─ memory-evolution (12 papers, 2 sub-children)
│ ├─ experience-driven-policy-evolution (5)
│ ├─ llm-guided-evolutionary-search (8)
│ └─ multi-agent-co-evolution (8)
├─ Application Layer
│ └─ domain-applications (10 → scientific + clinical)
└─ Cross-Cutting Layer
├─ agentic-evolution-theory (6)
├─ agent-safety-adversarial-evolution (5)
└─ evolving-agent-surveys-benchmarks (6)
CASE: HARNESS-ENGINEERING — CHRONOLOGICAL EVOLUTION
Phase 1 Foundations (2026-01) → Phase 2 Evaluation (2026-02) → Phase 3 Specialization (2026-03)
Chain: meta-context-eng → building-effective-agents → nl-agent-harnesses → alara-agents
🌳
Dual role: topic file organizes both knowledge structure and file tree
📂
File mapping: topic slug = source subdirectory
sources/agent-self-evolution/self-evolving-skill-libraries/*.md
⚖️
Auto-scaling: <5 → inline subtopics, ≥5 → split to own file, >8 → must sub-cluster
🎯
Fitness check before every paper — prevents keyword-based misclassification
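The auto-scaling thresholds above reduce to a tiny decision rule; a sketch (function name and return strings are hypothetical, not from SKILL.md):

```python
def clustering_action(paper_count: int) -> str:
    """Map a topic's paper count to the auto-scaling action:
    <5 stay inline, >=5 split to own file, >8 must sub-cluster."""
    if paper_count > 8:
        return "split + sub-cluster"
    if paper_count >= 5:
        return "split to own file"
    return "inline subtopic"
```

Under this rule, memory-evolution at 12 papers both splits and sub-clusters, matching its 2 sub-children in the case above.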
Design Philosophy — Sources
The Atomic Knowledge Unit
Each source page is a complete, self-contained analysis — not a summary.
Here's a real page from our wiki: MemSkill (Zhang et al., 2026).
📖
Factors (CRGP)
Context → Related Work → Gap → Proposal
Extracted from the author's Introduction. Their framing, not ours.
🔬
Critical Analysis
Novel Insight — prior/update contrastive
Limitations — approach-level
Frontier — concrete next steps
⏳
Temporal Relations
Typed links: extends, complements, contrasts_with
Each with a delta — what specifically differs
🖼️
Figures
Auto-extracted via PyMuPDF. Agent reads manifest.json to select figures with one-line interpretations.
💡
Cognitive Shifts
Timestamped insights from human-agent conversation. The reading process creates knowledge too.
❓
Open Questions
Future papers can resolve them — agent marks with strikethrough and links to the answering source.
CASE: MEMSKILL.MD
GAP
"No prior work treats memory operations themselves as first-class learnable entities. The field conflates: what to remember vs. how to remember."
NOVEL INSIGHT
prior: Memory mgmt = content problem, solved with fixed logic
update: The extraction procedure is itself a variable — separating "how" from "what" enables joint optimization
RELATIONS
extends [[cascade]] — same learning principle, different target layer
complements [[skillrl]] — orthogonal skill domains, shared optimization pattern
contrasts_with [[yunjue-agent]] — executable code vs. declarative skills
COGNITIVE SHIFT
[2026-04-08] "What to remember" and "how to remember" are two independent optimization problems — separating them enables a new research direction.
OPEN QUESTION
Can evolved skill cards be auto-converted to executable procedures, bridging MemSkill's declarative skills with SkillRL's executable heuristics?
Design Philosophy — Proactive Updates
The Wiki Updates Your Cognition
The agent doesn't wait for instructions. When insights emerge, it
writes them back immediately.
Here are real journal entries from our 80-paper wiki:
synthesis
2026-04-09
Restructured wiki into three-layer taxonomy
trigger: User observed all topics converge to "evolving topic" with unclear inter-topic relationships
pages: index + agent-self-evolution + 14 topic files
action: Created mechanism / application / cross-cutting layers. Merged memory pair, domain pair into split parents.
maintenance
2026-04-09
Mirrored raw/compiled/ directory tree to match sources/topics
Moved 11 directories into nested structure. Updated raw_path YAML in all 67 source files. Three directory trees now perfectly mirrored.
batch-ingest
2026-04-08
Ingested 80 Evolving Agent & Harness Engineering papers
Classified into 12 milestones. Key cross-cutting themes: (1) self-evolution as third scaling axis, (2) convergence of memory + skill evolution, (3) harness engineering as distinct discipline
cognitive-shift
from memskill.md
"What to remember" and "how to remember" are two independent optimization problems — separating them enables a new research direction in self-evolving memory.
THREE-TIER AUTONOMY
SILENT
Broken links, index sync, raw_path fixes
NOTIFY
Cognitive shifts, resolved open questions
CONFIRM
New topics, taxonomy restructure
DUAL-WRITE RULE
Every proactive update writes to the relevant page in-place and appends a journal entry — full audit trail.
↑ All entries above are real — from our agent-self-evolution wiki built with 80 papers over 2 days.
Vision → Implementation
Karpathy's Vision, Realized
| Karpathy's Vision | AutoWiki's Implementation |
| --- | --- |
| "Index source documents into directory" | raw/ — source documents with extracted assets |
| "LLM incrementally compiles a wiki" | kb/ — structured analysis, critical synthesis, temporal positioning |
| "Backlinks, categorizes, writes articles, links them" | Three-level index + [[wikilinks]] + milestone hierarchy |
| "Use Obsidian as the IDE frontend" | Entire project root is an Obsidian vault |
| "LLM writes and maintains all the data" | Agent owns kb/ — proactive write-back with 3-tier autonomy |
| "Explorations and queries always add up" | Every query can produce journal entries + updated cross-references |
| "LLM health checks over the wiki" | 25+ lint checks: hierarchy, temporal, broken links, orphan pages |
Design Decisions
Why This Architecture?
Why Markdown, not a database?
LLMs work natively with text. No ORM, no migrations. grep is sufficient at personal scale. Obsidian renders it beautifully.
Why Claude Code as the compiler?
Zero Python write code in the KB layer. The LLM uses built-in tools to manipulate markdown directly. The compiler adapts without code changes.
Why not RAG?
A well-maintained index + grep outperforms vector search at ~100s of sources. No embedding pipeline. Karpathy agrees.
Why a skill, not an app?
SKILL.md IS the architecture — 390 lines encoding quality standards, anti-patterns, and workflow rules. No servers. No infra.
Self-Healing Wiki
Lint Repairs. Reorganize Evolves.
The wiki doesn't just store knowledge — it heals itself when things break,
and restructures itself when you rethink the taxonomy.
🔧
Lint — Self-Repair
25+ checks run during every operation, not just on command.
When the agent reads any page, it detects and fixes silently:
› Broken [[wikilinks]]
› Feeds ↔ Cluster mismatch
› Orphan pages / hollow topics
› Hierarchy depth > 3
› Stale raw_path references
› Temporal chain gaps
› Subtopic ≥5 not promoted
› Topic >8 not sub-clustered
Structural fixes are silent. Semantic changes escalate to Confirm tier.
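One of those checks, broken [[wikilinks]], can be sketched in isolation (a standalone illustration, not the wiki's actual lint code):

```python
import re

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # target slug, before any |alias or #heading

def broken_links(page_text: str, known_slugs: set[str]) -> list[str]:
    """Return wikilink targets that have no page in the KB."""
    return [t.strip() for t in WIKILINK.findall(page_text)
            if t.strip() not in known_slugs]
```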
🔀
Reorganize — Your Taxonomy
You decide how to classify. The agent ensures
three trees stay in sync — every time.
Three mirrored trees: topics/ milestone nodes · sources/ paper pages · raw/compiled/ PDFs + figures
7-STEP CHECKLIST
1 Move topic files, update YAML
2 Move source directories to mirror
3 Move raw/compiled/ directories to mirror
4 Update raw_path in every affected source
5 Rewrite index.md
6 Log + journal
7 Verify — diff three trees, check all paths resolve
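The verify step amounts to a set comparison over the three trees' relative paths; a minimal sketch (function name and report shape are hypothetical):

```python
def tree_mismatches(topics: set[str], sources: set[str],
                    raw: set[str]) -> dict[str, set[str]]:
    """Given the relative directory paths under kb/topics/, kb/sources/
    and raw/compiled/, report which paths each tree is missing."""
    every = topics | sources | raw
    return {
        "topics": every - topics,
        "sources": every - sources,
        "raw": every - raw,
    }
```

An empty set on all three keys means the trees mirror perfectly.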
REAL CASE:
User said "all topics converge to 'evolving topic'" → agent restructured 80 papers into 3-layer taxonomy, moved 11 directories, updated 67 raw_path fields. Zero broken links.
Status
What's Built
✓
LLM-compiled wiki with three-level indexing
✓
CRGP factors + Critical Analysis with anti-patterns
✓
Temporal positioning and evolutionary chains
✓
Proactive write-back (Silent / Notify / Confirm)
✓
25+ lint checks across hierarchy, temporal, raw/ consistency
✓
Figure extraction pipeline with manifest
✓
Classification Fitness Check + milestone hierarchy
✓
Claude Code plugin packaging
The paper domain is fully developed. The architecture is domain-agnostic —
the same pipeline can power any domain where reading compounds.
AutoWiki
Your LLM compiles a wiki. Your knowledge compounds.