MemSkill reframes agent memory extraction: it replaces fixed, hand-designed operations with a learnable, evolvable skill bank — a controller selects skills, an LLM executor produces skill-conditioned memories, and a designer evolves the skill set from hard cases.
As LLM agents engage in longer, open-ended interactions, they must handle growing histories. Most memory systems rely on static, hand-designed operations — fixed primitives that bake in strong human assumptions about what to store and how to revise memory.
No prior work treats memory operations themselves as first-class learnable entities. The field conflates two orthogonal problems: what to remember (content) and how to remember it (skill).
A skill bank of structured memory operations. Controller (RL) selects Top-K skills → LLM executor applies them in one pass → Designer evolves skills from hard cases. Closed-loop optimization over both selection policy and skill set.
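The closed loop above can be sketched in a few lines. This is an illustrative skeleton, not the paper's implementation: function names, the scoring interface, and the stub executor/designer callables are all assumptions.

```python
# Hypothetical sketch of the MemSkill loop: controller selects Top-K skills,
# executor builds a memory in one pass, designer evolves the bank from a hard case.
from typing import Callable

def select_top_k(scores: dict[str, float], k: int) -> list[str]:
    """Controller step: pick the K highest-scoring skills (scores assumed given)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def build_memory(span: str, skills: list[str],
                 executor: Callable[[str, list[str]], str]) -> str:
    """Executor step: one LLM call conditioned on the selected skills."""
    return executor(span, skills)

def evolve_bank(bank: dict[str, str], hard_case: str,
                designer: Callable[[str], tuple[str, str]]) -> None:
    """Designer step: derive a new or revised skill card from a hard case."""
    name, card = designer(hard_case)
    bank[name] = card
```

In a real system `executor` and `designer` would be LLM calls and `scores` would come from the RL controller; here they are left as injectable callables.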
Fig 1 Architectural shift from handcrafted turn-level operations to skill-conditioned span-level generation — prior methods interleave fixed operations with LLM calls per turn; MemSkill selects skills from a shared bank and applies them in one pass.
Fig 2 Full system: interaction trace → span segmentation → controller skill selection (Top-K) → LLM executor memory construction → skill designer evolution from hard cases.
Fig 3 LoCoMo-trained skill bank transferred to HotpotQA at 3 context lengths (50/100/200 docs): gains grow at longer contexts, validating that evolved skills generalize across domains.
Fig 4 Concrete examples of self-evolved memory skill cards (Purpose, When-to-use, How-to-apply, Constraints) from two domains.
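The four card fields in Fig 4 map naturally onto a small data structure. Field names come from the figure; the class and its prompt rendering are illustrative assumptions.

```python
# Sketch of a skill card with the four fields shown in Fig 4.
from dataclasses import dataclass

@dataclass
class SkillCard:
    purpose: str        # what the skill extracts or preserves
    when_to_use: str    # applicability condition, read by the controller
    how_to_apply: str   # instructions injected into the executor prompt
    constraints: str    # guardrails, e.g. what not to store

    def to_prompt(self) -> str:
        """Render the card as a prompt fragment for the LLM executor (assumed format)."""
        return (f"Purpose: {self.purpose}\n"
                f"When to use: {self.when_to_use}\n"
                f"How to apply: {self.how_to_apply}\n"
                f"Constraints: {self.constraints}")
```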
Memory operations ("how to remember") are a distinct and separately evolvable layer from memory content ("what is stored") — treating them as learnable skills unlocks a new optimization dimension.
Span-level (multi-turn) skill-conditioned memory construction outperforms turn-level interleaved operations — the granularity of memory extraction has significant downstream impact.
The skill designer evolves skills based on hard cases but has no mechanism to detect skill redundancy or contradiction — the skill bank can bloat over time.
Root cause: Skills are evaluated by utility on individual hard cases, not by coherence with the full skill bank.
Implication: Self-evolving skill banks risk growing unwieldy — a maintenance mechanism (analogous to SkillFoundry's prune/merge) is necessary.
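One minimal form of the maintenance mechanism argued for above: merge near-duplicate skill cards. Everything here is an assumption — the token-overlap similarity and the threshold are illustrative stand-ins for whatever a SkillFoundry-style prune/merge pass would actually use.

```python
# Hypothetical prune pass: keep the first of any pair of skill cards
# whose token-overlap (Jaccard) similarity exceeds a threshold.

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two card texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def prune_bank(bank: dict[str, str], threshold: float = 0.8) -> dict[str, str]:
    """Drop skills that heavily overlap an already-kept skill."""
    kept: dict[str, str] = {}
    for name, card in bank.items():
        if all(jaccard(card, c) < threshold for c in kept.values()):
            kept[name] = card
    return kept
```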
The RL-based controller learns skill selection for specific interaction distributions; transferred to new domains, the selection policy may be suboptimal even if skills generalize.
Root cause: Skill selection policies encode task-specific priors — they are not distribution-free.
Implication: Generalization results may reflect skill content generalization rather than controller generalization.
Joint optimization of memory skill content and skill selection policy in a single training objective.
Prerequisite: A differentiable representation of skill content allowing gradients to flow from downstream performance back to the designer.
Closest attempt: Memory-R1 optimizes memory via RL but with fixed primitives — adding skill content as a learnable variable is the natural extension.
Skill bank sharing across agent memory systems — allowing independently trained MemSkill instances to share evolved skills via a compatibility registry.
Prerequisite: Formalized skill representation with typed applicability conditions.
Closest attempt: EvoSkills demonstrated cross-model skill transfer for task execution; the analogous transfer for memory skills is unexplored.
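A compatibility registry with typed applicability conditions could be as simple as a subset check. This is a sketch of the idea only — the note names the concept but specifies no interface, so all names and the type vocabulary are hypothetical.

```python
# Hypothetical compatibility registry: a skill transfers between MemSkill
# instances iff every input type it requires is supported by the importer.

def importable(skill_types: frozenset[str], supported: frozenset[str]) -> bool:
    """Typed applicability check: required types must be a subset of supported."""
    return skill_types <= supported

def filter_transferable(registry: dict[str, frozenset[str]],
                        supported: frozenset[str]) -> list[str]:
    """Skills from a foreign instance's registry that this instance can adopt."""
    return [name for name, types in registry.items()
            if importable(types, supported)]
```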
MemSkill (2026-02) applies the self-evolving skill paradigm established by CASCADE (2025-12) to the memory domain, arriving simultaneously with SkillRL and S1-NexusAgent.
The controller + executor + designer trinity (select, apply, evolve from failures) is a clean three-component architecture for any self-evolving system.
Span-level vs. turn-level processing is a generalizable design choice — the optimal granularity is task-dependent and should be learned, not fixed.
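The turn-level vs. span-level contrast reduces to how the history is chunked before extraction. A minimal sketch, assuming a fixed window size (the point of the note is precisely that this parameter should be learned, not fixed):

```python
# Illustrative contrast: one extraction call per turn vs. per multi-turn span.

def turn_level(turns: list[str]) -> list[list[str]]:
    """Turn-level processing: a separate extraction call for each turn."""
    return [[t] for t in turns]

def span_level(turns: list[str], window: int) -> list[list[str]]:
    """Span-level processing: one call per contiguous span of `window` turns."""
    return [turns[i:i + window] for i in range(0, len(turns), window)]
```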
The hard-case-triggered evolution mechanism is an efficient alternative to continuous evolution — focuses improvement on highest-impact failures.
"What to remember" and "how to remember" are two independent optimization problems that have been conflated in prior memory systems — separating them enables a new research direction in self-evolving memory.