When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

Leheng Sheng1,2 Yongtao Zhang1 Wenchang Ma1 Yaorui Shi3 Ting Huang1 Xiang Wang3 An Zhang3 Ke Shen1 Tat-Seng Chua2
1Bytedance Seed
2National University of Singapore
3University of Science and Technology of China

Introduction

Long-context reasoning remains challenging for LLMs: performance degrades as context grows, and inputs beyond the context window are difficult to handle effectively. MemAgent introduced an RNN-like chunk-by-chunk memory workflow, but two practical limitations remain: memory can explode due to indiscriminate updates, and the workflow lacks an early-exit mechanism.

We propose GRU-Mem, a gated recurrent memory framework with two text-controlled gates: an update gate (UG) that decides whether the memory should be updated at each step, and an exit gate (EG) that decides whether the recurrent loop should stop once sufficient evidence has been collected. The model is trained end-to-end with explicit rewards for update and exit behaviors.

Across diverse long-context tasks, GRU-Mem improves both effectiveness and efficiency over vanilla MemAgent, and achieves up to a 400% inference speedup in selected settings.

GRU-Mem teaser
MemAgent limitations and GRU-Mem motivation.

Method

Gated Recurrent Memory Workflow

At step $t$, the memory agent outputs $\mathcal{U}_t, \hat{\mathcal{M}}_t, \mathcal{E}_t = \phi_\theta(\mathcal{Q}, \mathcal{C}_t, \mathcal{M}_{t-1})$, where $\mathcal{Q}$ is the query, $\mathcal{C}_t$ is the $t$-th context chunk, $\mathcal{M}_{t-1}$ is the previous memory, and $\hat{\mathcal{M}}_t$ is the candidate memory; $\mathcal{U}_t$ controls the memory update and $\mathcal{E}_t$ controls early exit.

If $\mathcal{U}_t=\texttt{True}$, the memory is updated with the candidate memory $\hat{\mathcal{M}}_t$; otherwise the previous memory is retained. If $\mathcal{E}_t=\texttt{True}$, the workflow terminates and the answer is produced from the terminal memory.
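
To make the control flow concrete, here is a minimal Python sketch of the recurrence. The helper names agent_step (one call to $\phi_\theta$ that returns the parsed gates and candidate memory) and answer_from_memory (the final answering call) are hypothetical; this illustrates the workflow, not the released implementation.

def gru_mem_inference(query, chunks, agent_step, answer_from_memory, init_memory=""):
    # Recurrent chunk-by-chunk loop with text-controlled update and exit gates.
    memory = init_memory
    for chunk in chunks:
        # One agent call returns (update gate, candidate memory, exit gate).
        update_gate, candidate_memory, exit_gate = agent_step(query, chunk, memory)

        # Update gate: overwrite memory only when the agent decides an update is needed.
        if update_gate:
            memory = candidate_memory

        # Exit gate: stop reading further chunks once evidence is sufficient.
        if exit_gate:
            break

    # Answer from the terminal memory.
    return answer_from_memory(query, memory)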

GRU-Mem workflow
GRU-Mem memory update and early-exit workflow.

End-to-End RL Optimization

GRU-Mem introduces dedicated rewards for gate behaviors: $r^{\text{update}}$ for correct update decisions and $r^{\text{exit}}$ for proper exit timing, together with outcome and format rewards.
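
The page does not specify how these reward terms are combined; the sketch below assumes a simple weighted sum with placeholder weights, purely to show where each term enters the per-step reward.

def step_reward(r_outcome, r_format, r_update, r_exit,
                w_outcome=1.0, w_format=1.0, w_update=1.0, w_exit=1.0):
    # Placeholder unit weights: the actual combination is an assumption here.
    return (w_outcome * r_outcome + w_format * r_format
            + w_update * r_update + w_exit * r_exit)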

The final advantage combines trajectory-level and turn-level terms:

$$ \hat{A}_{g,t,i}=\alpha\hat{A}^{\text{traj}}_{g,t,i}+(1-\alpha)\hat{A}^{\text{turn}}_{g,t,i}. $$
Advantage calculation
Trajectory-level and turn-level advantage calculation.
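
A minimal sketch of the advantage mixing above, assuming the trajectory-level and turn-level advantages have already been estimated; $\alpha=0.9$ mirrors the coefficient discussed in the ablation (RQ3).

import numpy as np

def mixed_advantage(adv_traj, adv_turn, alpha=0.9):
    # Convex combination of trajectory-level and turn-level advantages.
    adv_traj = np.asarray(adv_traj, dtype=float)
    adv_turn = np.asarray(adv_turn, dtype=float)
    return alpha * adv_traj + (1.0 - alpha) * adv_turn

# Example: alpha = 0.9 keeps most of the trajectory-level signal.
print(mixed_advantage([0.5, -0.2], [0.1, 0.3]))  # -> [ 0.46 -0.15]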

Experiments

RQ1: Performance and Efficiency

On Qwen2.5-3B/7B backbones across 10 long-context tasks (HQA, SQuAD, SK/MK/MQ/MV), GRU-Mem generally outperforms vanilla MemAgent with clear inference-time reductions. With the exit gate enabled, the speedup can reach up to 400% in selected scenarios.

Performance and efficiency on MV
Performance-efficiency tradeoff across context lengths on MV.

RQ2: Gating Mechanism Analysis

The update gate slows memory growth and mitigates memory explosion; the exit gate enables meaningful early stopping under evidence-unbalanced settings.

Memory dynamics
Memory size dynamics under long-context inference.
Early/exact/late exit ratio
Early, exact, and late exit ratio under top-20% evidence setting.

RQ3: Ablation Study

The ablation confirms that RL training and reward balancing are critical. A mild advantage-mixing coefficient (e.g., $\alpha=0.9$) yields a better balance between evidence-present and evidence-free update decisions.

Ablation results
Effectiveness of RL training.

Limitations

The current evaluation focuses on QA-oriented long-context reasoning; broader task types such as summarization remain underexplored. In addition, introducing extra reward terms for gating can reduce training stability and may require a smaller off-policy degree and longer training to converge.

Conclusion

GRU-Mem extends recurrent memory reasoning with two controllable gates: update-when-needed and stop-when-sufficient. This design improves long-horizon stability by reducing memory explosion risk, while improving runtime efficiency through early termination. Across diverse long-context benchmarks, GRU-Mem consistently surpasses vanilla MemAgent and can deliver large inference speedups.

Citation

If you find this work useful, please cite:

@article{sheng2026grumem,
  title={When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning},
  author={Sheng, Leheng and Zhang, Yongtao and Ma, Wenchang and Shi, Yaorui and Huang, Ting and Wang, Xiang and Zhang, An and Shen, Ke and Chua, Tat-Seng},
  journal={arXiv preprint},
  year={2026}
}