A new ML paradigm for continual learning

The last decade has seen incredible progress in machine learning (ML), primarily driven by powerful neural network architectures and the algorithms used to train them. However, despite the success of large language models (LLMs), a few fundamental challenges persist, especially around continual learning: the ability of a model to actively acquire new knowledge and skills over time without forgetting old ones.

When it comes to continual learning and self-improvement, the human brain is the gold standard. It adapts through neuroplasticity, the remarkable capacity to change its structure in response to new experiences, memories, and learning. Without this ability, a person is limited to immediate context, as happens in anterograde amnesia. We see a similar limitation in current LLMs: their knowledge is confined to either the immediate context of their input window or the static information that they learn during pre-training.

The simple approach, continually updating a model’s parameters with new data, often leads to “catastrophic forgetting” (CF), where learning new tasks sacrifices proficiency on old tasks. Researchers traditionally combat CF through architectural tweaks or better optimization rules. However, for too long, we have treated the model’s architecture (the network structure) and the optimization algorithm (the training rule) as two separate things, which prevents us from achieving a truly unified, efficient learning system.
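To make catastrophic forgetting concrete, here is a minimal toy sketch in plain Python. It is not taken from the paper: the one-weight linear model, the two deliberately conflicting tasks (`task_a`, `task_b`), and the helper names `mse` and `sgd` are all assumptions chosen for illustration. Forgetting in real neural networks is subtler, but the mechanism is the same: updating shared parameters on new data overwrites what they encoded before.

```python
# Toy illustration (hypothetical setup, not the paper's experiments):
# one shared weight w is trained on task A, then on a conflicting task B.

def mse(w, data):
    """Mean squared error of the model y_hat = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def sgd(w, data, lr=0.05, epochs=200):
    """Plain per-example stochastic gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

xs = [0.5, 1.0, 1.5, 2.0]
task_a = [(x, 2.0 * x) for x in xs]   # task A: y = 2x
task_b = [(x, -3.0 * x) for x in xs]  # task B: y = -3x (conflicts with A)

w = sgd(0.0, task_a)
loss_a_before = mse(w, task_a)  # near zero: task A has been learned

w = sgd(w, task_b)              # keep updating the same weight on task B
loss_a_after = mse(w, task_a)   # large: task A has been overwritten
loss_b_after = mse(w, task_b)   # near zero: task B has been learned

print(f"task A loss before B: {loss_a_before:.6f}")
print(f"task A loss after B:  {loss_a_after:.6f}")
print(f"task B loss after B:  {loss_b_after:.6f}")
```

Because both tasks compete for the same weight, nothing in the update rule protects what was learned on task A once training moves on to task B.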

In our paper, “Nested Learning: The Illusion of Deep Learning Architectures”, published at NeurIPS 2025, we introduce Nested Learning, which bridges this gap. Nested Learning treats a single ML model not as one continuous process, but as a system of interconnected, multi-level learning problems that are optimized simultaneously. We argue that the model’s architecture and the rules used to train it (i.e., the optimization algorithm) are fundamentally the same concept; they are just different “levels” of optimization, each with its own internal flow of information (“context flow”) and update rate. By recognizing this inherent structure, Nested Learning provides a new, previously invisible dimension for designing more capable AI, allowing us to build learning components with greater computational depth, which ultimately helps solve issues like catastrophic forgetting.
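The idea of levels that update at different rates can be pictured with a small toy sketch. This is not the Nested Learning formulation or the Hope architecture. It only shows two parameters that share one prediction but are updated on different schedules. The function `train_two_levels`, its parameters (`fast_lr`, `slow_lr`, `slow_every`), and the additive model `(slow + fast) * x` are all hypothetical choices made for illustration.

```python
# Toy sketch (assumed setup, not the paper's method): two "levels" share a
# prediction but update at different frequencies.

def train_two_levels(data, steps=400, fast_lr=0.05, slow_lr=0.05, slow_every=8):
    fast, slow = 0.0, 0.0
    slow_grad_sum = 0.0          # gradients accumulated for the slow level
    fast_updates = 0
    slow_updates = 0
    for t in range(1, steps + 1):
        x, y = data[(t - 1) % len(data)]
        err = (slow + fast) * x - y
        grad = 2 * err * x       # same gradient for both, since they are summed

        # Fast level: updated on every example.
        fast -= fast_lr * grad
        fast_updates += 1

        # Slow level: updated once every `slow_every` steps, using the
        # average of the gradients seen since its last update.
        slow_grad_sum += grad
        if t % slow_every == 0:
            slow -= slow_lr * slow_grad_sum / slow_every
            slow_grad_sum = 0.0
            slow_updates += 1
    return fast, slow, fast_updates, slow_updates

data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]  # target: y = 2x
fast, slow, fast_updates, slow_updates = train_two_levels(data)

print(f"fast={fast:.4f} slow={slow:.4f} sum={fast + slow:.4f}")
print(f"fast updates: {fast_updates}, slow updates: {slow_updates}")
```

In this toy, the fast parameter reacts to every example while the slow one integrates information over a longer window, so each sees a different stream of information at a different rate.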

We test and validate Nested Learning through a proof-of-concept, self-modifying architecture that we call “Hope”, which achieves superior performance in language modeling and demonstrates better long-context memory management than existing state-of-the-art models.
