AlphaLab-USTC
AutoWiki
Your LLM Compiles a Knowledge Base. You Just Read and Ask.
Implementing Andrej Karpathy's LLM Knowledge Base vision
The Problem
Your Reading Never Compounds
You consume dozens of sources a week. You take notes, highlight results, file them into folders.
Six months later, you can't remember how any of them connect.
🔄
RAG
Re-discovers knowledge from scratch on every query. Nothing accumulates.
🪦
Note-taking Apps
A graveyard of disconnected files. You organize once, then abandon.
📋
Summarizers
Shallow bullet points that miss what actually matters. No cross-references.
The Inspiration
The Karpathy Pattern
Using LLMs to build personal knowledge bases… a large fraction of my
recent token throughput is going less into manipulating code, and more
into manipulating knowledge.
— Andrej Karpathy
Instead of retrieving from raw documents at query time, the LLM
incrementally builds and maintains a persistent wiki —
a structured, interlinked collection that sits between you and the raw sources.
Core Idea
The Wiki is a Persistent,
Compounding Artifact
The cross-references are already there. The contradictions have already been flagged.
The synthesis already reflects everything you've read.
raw sources
immutable archive
→
LLM compiler
reads, synthesizes, writes
→
living wiki
compounds over time
You never write the wiki yourself — the LLM writes and maintains all of it.
You curate sources, ask questions, and think. The LLM does the grunt work.
What Makes AutoWiki Different
Four Design Principles
🧠
Cognitive Depth
Not shallow summaries. CRGP factors from the author's Introduction.
Critical Analysis with contrastive prior/update structure.
Built-in anti-patterns prevent generic filler.
🔗
Temporal Knowledge Graphs
Every source positioned in the field's timeline. Evolutionary chains,
cross-domain origins, temporal tensions. Topics auto-generate
Chronological Evolution sections.
🏠
Self-Maintaining Wiki
Three-tier autonomy: Silent → Notify → Confirm.
25+ lint checks catch broken links, hierarchy violations, stale data.
The wiki heals itself during normal use.
🎯
Classification That Thinks
3-question fitness check before every topic assignment.
Milestone hierarchy with consolidation (<5) and promotion (≥5).
No misorganization.
Architecture
Two Decoupled Layers
OBSIDIAN VAULT
Your reading UI — graph view, backlinks, Properties UI. All for free.
↕
KB/ — KNOWLEDGE BASE
sources/ paper pages ·
topics/ milestone nodes ·
journal/ cognitive timeline
Agent's domain — fully LLM-maintained. Zero Python write code.
↕
LLM (CLAUDE CODE)
Reads source → extracts factors → positions temporally → links → writes → self-lints
↕
RAW/ — SOURCE ARCHIVE
Immutable. Human drops sources into raw/new/, agent moves to raw/compiled/
📄 Ingest — 4-phase pipeline
🔍 Query — search → synthesize → write-back
🔧 Lint — 25+ consistency checks
🔀 Reorganize — three-tree sync
⚡
The SKILL.md IS the architecture — 390 lines of domain knowledge
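The ingest hand-off between the layers above can be sketched in a few lines (the function and the page stub are illustrative assumptions, not the actual SKILL.md workflow; paths follow the layout shown):

```python
from pathlib import Path

def ingest(raw_new: Path, raw_compiled: Path, sources: Path, slug: str) -> Path:
    """Move a dropped PDF from raw/new/ into the immutable raw/compiled/
    archive and create a stub source page pointing back at it."""
    pdf = raw_new / f"{slug}.pdf"
    dest = raw_compiled / slug
    dest.mkdir(parents=True, exist_ok=True)
    pdf.rename(dest / pdf.name)  # human dropped it in raw/new/; agent archives it
    page = sources / f"{slug}.md"
    page.write_text(
        f"---\nraw_path: raw/compiled/{slug}/{pdf.name}\n---\n\n# {slug}\n",
        encoding="utf-8",
    )
    return page
```

In the real pipeline the agent then fills the page with CRGP factors, critical analysis, and temporal relations.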
Design Philosophy — Topics
Survey-Style Paper Organization
Each topic is a milestone node — a conceptual breakthrough that clusters papers.
Like a survey paper, it tells the story of how a research direction evolved over time.
CASE: AGENT-SELF-EVOLUTION (80 PAPERS, 13 MILESTONES)
agent-self-evolution.md ← split parent, 9 children
├─ Mechanism Layer
│ ├─ self-evolving-skill-libraries (7 papers)
│ ├─ memory-evolution (12 papers, 2 sub-children)
│ ├─ experience-driven-policy-evolution (5)
│ ├─ llm-guided-evolutionary-search (8)
│ └─ multi-agent-co-evolution (8)
├─ Application Layer
│ └─ domain-applications (10 → scientific + clinical)
└─ Cross-Cutting Layer
├─ agentic-evolution-theory (6)
├─ agent-safety-adversarial-evolution (5)
└─ evolving-agent-surveys-benchmarks (6)
CASE: HARNESS-ENGINEERING — CHRONOLOGICAL EVOLUTION
Phase 1 Foundations (2026-01) → Phase 2 Evaluation (2026-02) → Phase 3 Specialization (2026-03)
Chain: meta-context-eng → building-effective-agents → nl-agent-harnesses → alara-agents
🌳
Dual role: topic file organizes both knowledge structure and file tree
📂
File mapping: topic slug = source subdirectory
sources/agent-self-evolution/self-evolving-skill-libraries/*.md
⚖️
Auto-scaling: <5 → inline subtopics, ≥5 → split to own file, >8 → must sub-cluster
🎯
Fitness check before every paper — prevents keyword-based misclassification
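The auto-scaling thresholds above reduce to a tiny decision rule; a sketch (function name and return strings are hypothetical, not from SKILL.md):

```python
def clustering_action(paper_count: int) -> str:
    """Map a topic's paper count to the auto-scaling action:
    <5 stay inline, >=5 split to own file, >8 must sub-cluster."""
    if paper_count > 8:
        return "split + sub-cluster"
    if paper_count >= 5:
        return "split to own file"
    return "inline subtopic"
```

Under this rule, memory-evolution at 12 papers both splits and sub-clusters, matching its 2 sub-children in the case above.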
Design Philosophy — Sources
The Atomic Knowledge Unit
Each source page is a complete, self-contained analysis — not a summary.
Here's a real page from our wiki: MemSkill (Zhang et al., 2026).
📖
Factors (CRGP)
Context → Related Work → Gap → Proposal
Extracted from the author's Introduction. Their framing, not ours.
🔬
Critical Analysis
Novel Insight — prior/update contrastive
Limitations — approach-level
Frontier — concrete next steps
⏳
Temporal Relations
Typed links: extends, complements, contrasts_with
Each with a delta — what specifically differs
🖼️
Figures
Auto-extracted via PyMuPDF. Agent reads manifest.json to select figures with one-line interpretations.
💡
Cognitive Shifts
Timestamped insights from human-agent conversation. The reading process creates knowledge too.
❓
Open Questions
Future papers can resolve them — agent marks with strikethrough and links to the answering source.
CASE: MEMSKILL.MD
GAP
"No prior work treats memory operations themselves as first-class learnable entities. The field conflates: what to remember vs. how to remember."
NOVEL INSIGHT
prior: Memory mgmt = content problem, solved with fixed logic
update: The extraction procedure is itself a variable — separating "how" from "what" enables joint optimization
RELATIONS
extends [[cascade]] — same learning principle, different target layer
complements [[skillrl]] — orthogonal skill domains, shared optimization pattern
contrasts_with [[yunjue-agent]] — executable code vs. declarative skills
COGNITIVE SHIFT
[2026-04-08] "What to remember" and "how to remember" are two independent optimization problems — separating them enables a new research direction.
OPEN QUESTION
Can evolved skill cards be auto-converted to executable procedures, bridging MemSkill's declarative skills with SkillRL's executable heuristics?
Design Philosophy — Proactive Updates
The Wiki Updates Your Cognition
The agent doesn't wait for instructions. When insights emerge, it
writes them back immediately.
Here are real journal entries from our 80-paper wiki:
synthesis
2026-04-09
Restructured wiki into three-layer taxonomy
trigger: User observed all topics converge to "evolving topic" with unclear inter-topic relationships
pages: index + agent-self-evolution + 14 topic files
action: Created mechanism / application / cross-cutting layers. Merged memory pair, domain pair into split parents.
maintenance
2026-04-09
Mirrored raw/compiled/ directory tree to match sources/topics
Moved 11 directories into nested structure. Updated raw_path YAML in all 67 source files. Three directory trees now perfectly mirrored.
batch-ingest
2026-04-08
Ingested 80 Evolving Agent & Harness Engineering papers
Classified into 12 milestones. Key cross-cutting themes: (1) self-evolution as third scaling axis, (2) convergence of memory + skill evolution, (3) harness engineering as distinct discipline
cognitive-shift
from memskill.md
"What to remember" and "how to remember" are two independent optimization problems — separating them enables a new research direction in self-evolving memory.
THREE-TIER AUTONOMY
SILENT
Broken links, index sync, raw_path fixes
NOTIFY
Cognitive shifts, resolved open questions
CONFIRM
New topics, taxonomy restructure
DUAL-WRITE RULE
Every proactive update writes to the relevant page in-place and appends a journal entry — full audit trail.
↑ All entries above are real — from our agent-self-evolution wiki built with 80 papers over 2 days.
Vision → Implementation
Karpathy's Vision, Realized
| Karpathy's Vision | AutoWiki's Implementation |
| --- | --- |
| "Index source documents into directory" | raw/ — source documents with extracted assets |
| "LLM incrementally compiles a wiki" | kb/ — structured analysis, critical synthesis, temporal positioning |
| "Backlinks, categorizes, writes articles, links them" | Three-level index + [[wikilinks]] + milestone hierarchy |
| "Use Obsidian as the IDE frontend" | Entire project root is an Obsidian vault |
| "LLM writes and maintains all the data" | Agent owns kb/ — proactive write-back with 3-tier autonomy |
| "Explorations and queries always add up" | Every query can produce journal entries + updated cross-references |
| "LLM health checks over the wiki" | 25+ lint checks: hierarchy, temporal, broken links, orphan pages |
Design Decisions
Why This Architecture?
Why Markdown, not a database?
LLMs work natively with text. No ORM, no migrations. grep is sufficient at personal scale. Obsidian renders it beautifully.
Why Claude Code as the compiler?
Zero Python write code in the KB layer. The LLM uses built-in tools to manipulate markdown directly. The compiler adapts without code changes.
Why not RAG?
A well-maintained index + grep outperforms vector search at ~100s of sources. No embedding pipeline. Karpathy agrees.
Why a skill, not an app?
SKILL.md IS the architecture — 390 lines encoding quality standards, anti-patterns, and workflow rules. No servers. No infra.
Self-Healing Wiki
Lint Repairs. Reorganize Evolves.
The wiki doesn't just store knowledge — it heals itself when things break,
and restructures itself when you rethink the taxonomy.
🔧
Lint — Self-Repair
25+ checks run during every operation, not just on command.
When the agent reads any page, it detects and fixes silently:
› Broken [[wikilinks]]
› Feeds ↔ Cluster mismatch
› Orphan pages / hollow topics
› Hierarchy depth > 3
› Stale raw_path references
› Temporal chain gaps
› Subtopic ≥5 not promoted
› Topic >8 not sub-clustered
Structural fixes are silent. Semantic changes escalate to Confirm tier.
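One of those checks, broken [[wikilinks]], can be sketched in isolation (a standalone illustration, not the wiki's actual lint code):

```python
import re

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # target slug, before any |alias or #heading

def broken_links(page_text: str, known_slugs: set[str]) -> list[str]:
    """Return wikilink targets that have no page in the KB."""
    return [t.strip() for t in WIKILINK.findall(page_text)
            if t.strip() not in known_slugs]
```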
🔀
Reorganize — Your Taxonomy
You decide how to classify. The agent ensures
three trees stay in sync — every time.
Three mirrored trees: topics/ milestone nodes · sources/ paper pages · raw/compiled/ PDFs + figures
7-STEP CHECKLIST
1 Move topic files, update YAML
2 Move source directories to mirror
3 Move raw/compiled/ directories to mirror
4 Update raw_path in every affected source
5 Rewrite index.md
6 Log + journal
7 Verify — diff three trees, check all paths resolve
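The verify step amounts to a set comparison over the three trees' relative paths; a minimal sketch (function name and report shape are hypothetical):

```python
def tree_mismatches(topics: set[str], sources: set[str],
                    raw: set[str]) -> dict[str, set[str]]:
    """Given the relative directory paths under kb/topics/, kb/sources/
    and raw/compiled/, report which paths each tree is missing."""
    every = topics | sources | raw
    return {
        "topics": every - topics,
        "sources": every - sources,
        "raw": every - raw,
    }
```

An empty set on all three keys means the trees mirror perfectly.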
REAL CASE:
User said "all topics converge to 'evolving topic'" → agent restructured 80 papers into 3-layer taxonomy, moved 11 directories, updated 67 raw_path fields. Zero broken links.
Status
What's Built
✓
LLM-compiled wiki with three-level indexing
✓
CRGP factors + Critical Analysis with anti-patterns
✓
Temporal positioning and evolutionary chains
✓
Proactive write-back (Silent / Notify / Confirm)
✓
25+ lint checks across hierarchy, temporal, raw/ consistency
✓
Figure extraction pipeline with manifest
✓
Classification Fitness Check + milestone hierarchy
✓
Claude Code plugin packaging
The paper domain is fully developed. The architecture is domain-agnostic —
the same pipeline can power any domain where reading compounds.
AutoWiki
Your LLM compiles a wiki. Your knowledge compounds.