AutoWiki Topic Output — Real Example

Agent Self-Evolution

80 papers compiled into a milestone node with 9 children across 3 layers. This is what the LLM writes and maintains — you never touch it.

The paradigm of LLM-based agents that autonomously improve their capabilities post-deployment — treating evolution-time compute as a third scaling axis alongside training-time and inference-time compute.

Three converging trends crystallized agent self-evolution as a distinct research direction. The milestone node organizes its children as follows:

| Child | Layer | Definition | Papers |
|---|---|---|---|
| self-evolving-skill-libraries | Mechanism | Autonomous construction and accumulation of reusable executable skills | 7 |
| memory-evolution | Mechanism | Evolution of agent memory from architecture to content | 12 |
| experience-driven-policy-evolution | Mechanism | Policy evolution through training-time or test-time learning from trajectories | 5 |
| llm-guided-evolutionary-search | Mechanism | LLM agents as variation operators in evolutionary program search | 8 |
| multi-agent-co-evolution | Mechanism | Co-evolution of agent populations and coordination strategies | 8 |
| domain-applications | Application | Domain-specific instantiation of agent self-evolution in science and medicine | 10 |
| agentic-evolution-theory | Cross-cutting | Theoretical foundations defining agent self-evolution as a scaling axis | 6 |
| agent-safety-adversarial-evolution | Cross-cutting | Safety challenges and adversarial dynamics for self-evolving agents | 5 |
| evolving-agent-surveys-benchmarks | Cross-cutting | Surveys, taxonomies, and evaluation infrastructure | 6 |

Children organize into three orthogonal layers:

Mechanism Layer

Five children address distinct evolution targets: skill repertoire (self-evolving-skill-libraries), memory system (memory-evolution), decision policy (experience-driven-policy-evolution), programs/algorithms (llm-guided-evolutionary-search), agent populations (multi-agent-co-evolution).

Application Layer

domain-applications aggregates scientific and clinical instantiations that validate mechanism-level principles under real-world constraints — physics-grounded evaluation in science, safety-constrained evolution in medicine.

Cross-Cutting Layer

agentic-evolution-theory provides the conceptual vocabulary (evolution-time compute, clone-and-replace, epistemic routing). agent-safety-adversarial-evolution constrains how mechanisms can operate. evolving-agent-surveys-benchmarks provides evaluation infrastructure.
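The node-and-children structure described above can be sketched as a minimal data model. This is an illustrative sketch only: the class and field names (`Child`, `MilestoneNode`, `by_layer`) are hypothetical and not AutoWiki's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical data model for a milestone node with layered children.
# Names are illustrative, not AutoWiki's real schema.

@dataclass
class Child:
    slug: str          # e.g. "memory-evolution"
    layer: str         # "mechanism" | "application" | "cross-cutting"
    definition: str
    papers: int        # papers compiled under this child

@dataclass
class MilestoneNode:
    topic: str
    children: list[Child] = field(default_factory=list)

    def by_layer(self, layer: str) -> list[Child]:
        """Children belonging to one of the three orthogonal layers."""
        return [c for c in self.children if c.layer == layer]

node = MilestoneNode("agent-self-evolution", [
    Child("self-evolving-skill-libraries", "mechanism",
          "Autonomous construction and accumulation of reusable executable skills", 7),
    Child("domain-applications", "application",
          "Domain-specific instantiation in science and medicine", 10),
    Child("agentic-evolution-theory", "cross-cutting",
          "Theoretical foundations defining self-evolution as a scaling axis", 6),
])

print([c.slug for c in node.by_layer("mechanism")])
```

Because the three layers are orthogonal, a query like `by_layer` partitions the nine children cleanly; a child never appears in two layers.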

A shared "information gap as training signal" pattern recurs across the mechanism children: skill libraries exploit the gap between skill-augmented and skill-free performance, memory evolution exploits memory-rich vs. memory-poor contexts, policy evolution exploits successful vs. failed trajectories, and evolutionary search exploits parent vs. offspring fitness.
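A minimal sketch of this shared pattern, under the assumption that each mechanism can be cast as comparing an "augmented" agent against a "baseline" one (the function names and the toy scorer below are hypothetical, not from any of the surveyed papers):

```python
from typing import Any, Callable

# Generic "information gap as training signal" sketch. `evaluate` and the
# resources are hypothetical stand-ins: the augmented/baseline pair could be
# skill-augmented vs. skill-free, memory-rich vs. memory-poor, successful vs.
# failed trajectories, or parent vs. offspring programs.

def information_gap(
    evaluate: Callable[[Any], float],  # task performance given a resource
    augmented: Any,                    # e.g. agent with a skill library
    baseline: Any,                     # e.g. the same agent without it
) -> float:
    """Performance delta used as the training signal for evolution."""
    return evaluate(augmented) - evaluate(baseline)

# Toy usage: performance is just the size of a (toy) skill set.
score = lambda skills: float(len(skills))
signal = information_gap(score, augmented={"search", "plan", "code"}, baseline=set())
print(signal)  # gap of 3.0 in this toy example
```

A positive gap indicates the evolved resource is pulling its weight; a zero or negative gap is the signal to prune or revise it.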

Mechanism children materialize in applications: skill repertoire → scientific tool synthesis (venusfactory2, skillfoundry); memory system → clinical case accumulation (theraagent, skingpt-x); programs/algorithms → surrogate discovery (aero-blueprint, lensagent).

? Is there a universal convergence point where all mechanism-level evolutions produce equivalent capability growth, or are some evolution targets fundamentally more productive?
? How should evolution-time compute be budgeted relative to training-time and inference-time compute for a given deployment scenario?
? Does the three-axis scaling framework (training, inference, evolution) have diminishing returns analogous to Chinchilla scaling laws?
Tension with harness-engineering — the harness mediates between agent and environment; as agents self-evolve, can harnesses co-evolve, or does the harness become the fixed point that constrains evolution?
Tension with three-layer taxonomy — mechanism boundaries blur as the field matures (skill + memory co-evolution in memskill), applications generate novel mechanisms, and cross-cutting concerns may split further.