AutoWiki Source Output — Real Example
type: source
id: memskill
milestone: [[memory-evolution]] [[self-evolving-skill-libraries]]
tags: [memory-evolution, self-evolving-skill-libraries, year/2026-02]

MemSkill

Zhang et al., 2026

arXiv 2026-02

MemSkill reframes agent memory extraction from fixed hand-designed operations into a learnable, evolvable skill bank — where a controller selects skills, an LLM executor produces skill-conditioned memories, and a designer evolves the skill set from hard cases.

Context

As LLM agents engage in longer, open-ended interactions, they must handle growing histories. Most memory systems rely on static, hand-designed operations — fixed primitives that bake in strong human assumptions about what to store and how to revise memory.

Gap

No prior work treats memory operations themselves as first-class learnable entities. The field conflates two orthogonal problems: what to remember (content) and how to remember it (skill).

Proposal

A skill bank of structured memory operations. Controller (RL) selects Top-K skills → LLM executor applies them in one pass → Designer evolves skills from hard cases. Closed-loop optimization over both selection policy and skill set.
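The select → apply → evolve loop above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all names (`Skill`, `SkillBank`, the keyword-overlap scorer standing in for the RL controller, the string-building stand-ins for the LLM executor and designer) are assumptions for exposition.

```python
# Hypothetical sketch of MemSkill's controller/executor/designer loop.
# The RL controller and LLM calls are replaced with trivial placeholders.
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str  # declarative "how to remember" specification


@dataclass
class SkillBank:
    skills: list[Skill] = field(default_factory=list)


def controller_select(bank: SkillBank, span: str, k: int) -> list[Skill]:
    """Controller: score each skill for this span and keep the Top-K.
    Scoring here is a placeholder (description-word overlap with the span);
    the paper trains this policy with RL."""
    scored = sorted(
        bank.skills,
        key=lambda s: sum(w in span for w in s.description.split()),
        reverse=True,
    )
    return scored[:k]


def executor_apply(skills: list[Skill], span: str) -> str:
    """Executor: one pass producing a skill-conditioned memory record.
    Placeholder just tags the span with the selected skill names."""
    return f"memory({', '.join(s.name for s in skills)}): {span[:40]}"


def designer_evolve(bank: SkillBank, hard_cases: list[str]) -> None:
    """Designer: grow the skill bank from hard cases (placeholder: one
    new skill per failure, described by the failure itself)."""
    for case in hard_cases:
        bank.skills.append(Skill(f"skill_{len(bank.skills)}", case))
```

The point of the sketch is the closed loop: the controller's selection and the designer's additions both act on the same shared bank, so both the policy and the skill set can improve over time.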

Fig 1 Architectural shift from handcrafted turn-level operations to skill-conditioned span-level generation — prior methods interleave fixed operations with LLM calls per turn; MemSkill selects skills from a shared bank and applies them in one pass.

Fig 2 Full system: interaction trace → span segmentation → controller skill selection (Top-K) → LLM executor memory construction → skill designer evolution from hard cases.

Fig 3 LoCoMo-trained skill bank transferred to HotpotQA at 3 context lengths (50/100/200 docs): gains grow at longer contexts, validating that evolved skills generalize across domains.

Fig 4 Concrete examples of self-evolved memory skill cards (Purpose, When-to-use, How-to-apply, Constraints) from two domains.
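The four-field card structure named in Fig 4 can be rendered as a simple schema. Only the field names come from the figure caption; the example card's content is invented here for illustration.

```python
# Illustrative skill-card schema matching the four fields in Fig 4.
# Field names follow the caption; the example values are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillCard:
    purpose: str       # what the skill is for
    when_to_use: str   # applicability conditions
    how_to_apply: str  # procedure the LLM executor follows
    constraints: str   # guardrails on the produced memory


card = SkillCard(
    purpose="Track evolving user preferences",
    when_to_use="Span mentions likes/dislikes that revise earlier statements",
    how_to_apply="Rewrite the stored preference, noting the supersession",
    constraints="Never delete the prior value; keep a timestamped history",
)
```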

Memory operations ("how to remember") are a distinct and separately evolvable layer from memory content ("what is stored") — treating them as learnable skills unlocks a new optimization dimension.

Prior: The field treated memory management as a content problem and solved it with fixed procedural logic.
Update: The extraction procedure is itself a variable — separating "how" from "what" enables joint optimization of both.

Span-level (multi-turn) skill-conditioned memory construction outperforms turn-level interleaved operations — the granularity of memory extraction has significant downstream impact.

Prior: Turn-level processing was assumed necessary for capturing fine-grained memory updates.
Update: Batching multiple turns into a span allows skills to reason about temporal relationships across turns, producing more coherent memories.
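The granularity contrast can be made concrete with a toy sketch: one extraction call per turn versus one call per span of consecutive turns. The span size of 3 is an arbitrary illustrative choice, not a value from the paper.

```python
# Toy contrast of turn-level vs. span-level memory extraction.
# "mem[...]" stands in for an LLM extraction call over its argument.
def turn_level(turns: list[str]) -> list[str]:
    # One call per turn: each memory sees no cross-turn context.
    return [f"mem[{t}]" for t in turns]


def span_level(turns: list[str], span: int = 3) -> list[str]:
    # Batch consecutive turns so one call can relate events across them
    # (e.g. resolve references or order events temporally).
    return [
        f"mem[{' | '.join(turns[i:i + span])}]"
        for i in range(0, len(turns), span)
    ]
```

The span-level variant makes fewer, larger calls, and each call sees enough context to reason about temporal relationships — the property L42's update attributes the gain to.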

The skill designer evolves skills based on hard cases but has no mechanism to detect skill redundancy or contradiction — the skill bank can bloat over time.

Root cause: Skills are evaluated by utility on individual hard cases, not by coherence with the full skill bank.
Implication: Self-evolving skill banks risk growing unwieldy — a maintenance mechanism (analogous to SkillFoundry's prune/merge) is necessary.
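One possible shape for the missing maintenance mechanism is a similarity-based prune pass over skill descriptions. This is purely a sketch of the idea, not anything from MemSkill or SkillFoundry; the Jaccard measure and the 0.8 threshold are assumptions.

```python
# Hypothetical prune pass: drop skills whose descriptions overlap
# heavily with an already-kept skill. Threshold and similarity
# measure are illustrative assumptions.
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity between two descriptions."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def prune_merge(descriptions: list[str], threshold: float = 0.8) -> list[str]:
    kept: list[str] = []
    for d in descriptions:
        # Keep a skill only if it is sufficiently novel vs. all kept ones.
        if all(jaccard(d, k) < threshold for k in kept):
            kept.append(d)
    return kept
```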

The RL-based controller learns skill selection for specific interaction distributions; transferred to new domains, the selection policy may be suboptimal even if skills generalize.

Root cause: Skill selection policies encode task-specific priors — they are not distribution-free.
Implication: Generalization results may reflect skill content generalization rather than controller generalization.

Joint optimization of memory skill content and skill selection policy in a single training objective.

Prerequisite: A differentiable representation of skill content allowing gradients to flow from downstream performance back to the designer.
Closest attempt: Memory-R1 optimizes memory via RL but with fixed primitives — adding skill content as a learnable variable is the natural extension.

Skill bank sharing across agent memory systems — allowing independently trained MemSkill instances to share evolved skills via a compatibility registry.

Prerequisite: Formalized skill representation with typed applicability conditions.
Closest attempt: EvoSkills demonstrated cross-model skill transfer for task execution; the analogous transfer for memory skills is unexplored.

MemSkill (2026-02) applies the self-evolving skill paradigm established by CASCADE (2025-12) to the memory domain, arriving simultaneously with SkillRL and S1-NexusAgent.

extends cascade — CASCADE applies skill acquisition to external tool mastery; MemSkill applies the same paradigm internally to memory operations — same learning principle, different target layer
complements skillrl — SkillRL evolves task execution skills via RL co-evolution; MemSkill evolves memory extraction skills via RL controller + LLM designer — orthogonal skill domains, shared optimization pattern
contrasts yunjue-agent — Yunjue evolves executable Python tool primitives; MemSkill evolves abstract memory operation specifications — executable code vs. declarative skill descriptions
contrasts evoskills — EvoSkills generates multi-file executable skill bundles via co-evolutionary verification; MemSkill generates structured descriptions via RL + LLM designer — different skill types and evolution mechanisms

The controller + executor + designer trinity (select, apply, evolve from failures) is a clean three-component architecture for any self-evolving system.

Span-level vs. turn-level processing is a generalizable design choice — the optimal granularity is task-dependent and should be learned, not fixed.

The hard-case-triggered evolution mechanism is an efficient alternative to continuous evolution — focuses improvement on highest-impact failures.
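A minimal form of hard-case triggering is a failure-rate threshold over a sliding window: run the expensive designer only when recent failures exceed it. The window size and 10% threshold below are illustrative assumptions, not values from the paper.

```python
# Sketch of hard-case-triggered evolution: fire the designer only when
# the failure rate over a recent window crosses a threshold.
from collections import deque


class EvolutionTrigger:
    def __init__(self, window: int = 100, threshold: float = 0.1):
        self.results = deque(maxlen=window)  # recent pass/fail outcomes
        self.threshold = threshold

    def record(self, failed: bool) -> bool:
        """Record one case; return True when the designer should run."""
        self.results.append(failed)
        rate = sum(self.results) / len(self.results)
        return rate > self.threshold
```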

2026-04-08 · from ingest conversation

"What to remember" and "how to remember" are two independent optimization problems that have been conflated in prior memory systems — separating them enables a new research direction in self-evolving memory.

? How large can the skill bank grow before Top-K selection becomes a bottleneck — is there an optimal skill bank size beyond which performance plateaus?
? Can the evolved skill cards be automatically converted to executable memory procedures, bridging MemSkill's declarative skills with SkillRL's executable heuristics?
? How does MemSkill interact with the base LLM's in-context learning — do better base models require fewer evolved skills?
? What is the minimum hard-case rate that justifies triggering designer evolution vs. the computational cost of LLM-based skill refinement?