AutoWiki Topic Output — Real Example

Agent Self-Evolution

80 papers compiled into a milestone node with 9 children across 3 layers. This is what the LLM writes and maintains — you never touch it.

The paradigm of LLM-based agents that autonomously improve their capabilities post-deployment — treating evolution-time compute as a third scaling axis alongside training-time and inference-time compute.

Three converging trends crystallized agent self-evolution as a distinct research direction. The milestone node organizes its children as follows:

| Child | Layer | Definition | Papers |
|---|---|---|---|
| self-evolving-skill-libraries | Mechanism | Autonomous construction and accumulation of reusable executable skills | 7 |
| memory-evolution | Mechanism | Evolution of agent memory from architecture to content | 12 |
| experience-driven-policy-evolution | Mechanism | Policy evolution through training-time or test-time learning from trajectories | 5 |
| llm-guided-evolutionary-search | Mechanism | LLM agents as variation operators in evolutionary program search | 8 |
| multi-agent-co-evolution | Mechanism | Co-evolution of agent populations and coordination strategies | 8 |
| domain-applications | Application | Domain-specific instantiation of agent self-evolution in science and medicine | 10 |
| agentic-evolution-theory | Cross-cutting | Theoretical foundations defining agent self-evolution as a scaling axis | 6 |
| agent-safety-adversarial-evolution | Cross-cutting | Safety challenges and adversarial dynamics for self-evolving agents | 5 |
| evolving-agent-surveys-benchmarks | Cross-cutting | Surveys, taxonomies, and evaluation infrastructure | 6 |

Children organize into three orthogonal layers:

Mechanism Layer

Five children address distinct evolution targets: skill repertoire (self-evolving-skill-libraries), memory system (memory-evolution), decision policy (experience-driven-policy-evolution), programs/algorithms (llm-guided-evolutionary-search), agent populations (multi-agent-co-evolution).

Application Layer

domain-applications aggregates scientific and clinical instantiations that validate mechanism-level principles under real-world constraints — physics-grounded evaluation in science, safety-constrained evolution in medicine.

Cross-Cutting Layer

agentic-evolution-theory provides the conceptual vocabulary (evolution-time compute, clone-and-replace, epistemic routing). agent-safety-adversarial-evolution constrains how mechanisms can operate. evolving-agent-surveys-benchmarks provides evaluation infrastructure.
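The node-and-children structure described above can be sketched as a minimal data model. This is an illustrative sketch only: the class and field names (`Child`, `MilestoneNode`, `by_layer`) are hypothetical and not AutoWiki's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical data model for a milestone node with layered children.
# Names are illustrative, not AutoWiki's real schema.

@dataclass
class Child:
    slug: str          # e.g. "memory-evolution"
    layer: str         # "mechanism" | "application" | "cross-cutting"
    definition: str
    papers: int        # papers compiled under this child

@dataclass
class MilestoneNode:
    topic: str
    children: list[Child] = field(default_factory=list)

    def by_layer(self, layer: str) -> list[Child]:
        """Children belonging to one of the three orthogonal layers."""
        return [c for c in self.children if c.layer == layer]

node = MilestoneNode("agent-self-evolution", [
    Child("self-evolving-skill-libraries", "mechanism",
          "Autonomous construction and accumulation of reusable executable skills", 7),
    Child("domain-applications", "application",
          "Domain-specific instantiation in science and medicine", 10),
    Child("agentic-evolution-theory", "cross-cutting",
          "Theoretical foundations defining self-evolution as a scaling axis", 6),
])

print([c.slug for c in node.by_layer("mechanism")])
```

Because the three layers are orthogonal, a query like `by_layer` partitions the nine children cleanly; a child never appears in two layers.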

A shared "information gap as training signal" pattern recurs across the mechanism children: skill libraries exploit the gap between skill-augmented and skill-free performance, memory evolution exploits memory-rich vs. memory-poor contexts, policy evolution exploits successful vs. failed trajectories, and evolutionary search exploits parent vs. offspring fitness.
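A minimal sketch of this shared pattern, under the assumption that each mechanism can be cast as comparing an "augmented" agent against a "baseline" one (the function names and the toy scorer below are hypothetical, not from any of the surveyed papers):

```python
from typing import Any, Callable

# Generic "information gap as training signal" sketch. `evaluate` and the
# resources are hypothetical stand-ins: the augmented/baseline pair could be
# skill-augmented vs. skill-free, memory-rich vs. memory-poor, successful vs.
# failed trajectories, or parent vs. offspring programs.

def information_gap(
    evaluate: Callable[[Any], float],  # task performance given a resource
    augmented: Any,                    # e.g. agent with a skill library
    baseline: Any,                     # e.g. the same agent without it
) -> float:
    """Performance delta used as the training signal for evolution."""
    return evaluate(augmented) - evaluate(baseline)

# Toy usage: performance is just the size of a (toy) skill set.
score = lambda skills: float(len(skills))
signal = information_gap(score, augmented={"search", "plan", "code"}, baseline=set())
print(signal)  # gap of 3.0 in this toy example
```

A positive gap indicates the evolved resource is pulling its weight; a zero or negative gap is the signal to prune or revise it.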

Mechanism children materialize in applications: skill repertoire → scientific tool synthesis (venusfactory2, skillfoundry); memory system → clinical case accumulation (theraagent, skingpt-x); programs/algorithms → surrogate discovery (aero-blueprint, lensagent).

? Is there a universal convergence point where all mechanism-level evolutions produce equivalent capability growth, or are some evolution targets fundamentally more productive?
? How should evolution-time compute be budgeted relative to training-time and inference-time compute for a given deployment scenario?
? Does the three-axis scaling framework (training, inference, evolution) have diminishing returns analogous to Chinchilla scaling laws?
Tension with harness-engineering — the harness mediates between agent and environment; as agents self-evolve, can harnesses co-evolve, or does the harness become the fixed point that constrains evolution?
Tension with three-layer taxonomy — mechanism boundaries blur as the field matures (skill + memory co-evolution in memskill), applications generate novel mechanisms, and cross-cutting concerns may split further.