    Artificial Intelligence

    Helping AI have long-term memory

By Admin | December 12, 2025


The Transformer architecture revolutionized sequence modeling with attention, a mechanism by which models look back at earlier inputs and prioritize the most relevant ones. However, attention's computational cost grows quadratically with sequence length, which limits the ability to scale Transformer-based models to extremely long contexts, such as those required for full-document understanding or genomic analysis.
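To make that cost concrete, here is a minimal NumPy sketch of scaled dot-product attention (an illustration only, not the papers' code; the `attention` function and the toy dimensions are ours). The score matrix holds one entry per pair of positions, so compute and memory grow quadratically with sequence length.

```python
import numpy as np

def attention(Q, K, V):
    # Q, K, V: (n, d) arrays for a length-n sequence with head dimension d.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n, n): one score per pair of positions
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # every output attends to every input

n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(Q, K, V)  # doubling n quadruples the size of `scores`
```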

The research community has explored various solutions, such as efficient linear recurrent neural networks (RNNs) and state space models (SSMs) like Mamba-2. These models offer fast, linear scaling by compressing context into a fixed-size state. However, this fixed-size compression cannot adequately capture the rich information in very long sequences.
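For contrast, here is a toy linear recurrence in the spirit of an SSM (a simplified sketch, not Mamba-2's actual parameterization; the matrices A, B, C below are arbitrary stand-ins). Each token performs one constant-cost state update, so the scan is linear in sequence length, but everything the model remembers must fit in the fixed-size state h.

```python
import numpy as np

def linear_scan(xs, A, B, C):
    # xs: a length-n sequence of scalar inputs; h is the fixed-size state.
    h = np.zeros(A.shape[0])        # same size no matter how long xs gets
    ys = []
    for x in xs:                    # one constant-cost step per token: O(n) total
        h = A @ h + B * x           # the whole history is compressed into h
        ys.append(C @ h)            # outputs read only from that compressed state
    return np.array(ys)

A = 0.9 * np.eye(4)                 # toy stable dynamics (slowly decaying memory)
B = np.ones(4)
C = np.ones(4) / 4
ys = linear_scan(np.sin(np.linspace(0, 6, 100)), A, B, C)
```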

In two new papers, Titans and MIRAS, we introduce an architecture and a theoretical blueprint that combine the speed of RNNs with the accuracy of Transformers. Titans is the specific architecture (the tool), and MIRAS is the theoretical framework (the blueprint) for generalizing these approaches. Together, they advance the concept of test-time memorization: the ability of an AI model to maintain long-term memory by incorporating more powerful “surprise” metrics (i.e., measures of how unexpected a piece of information is) while the model is running, without dedicated offline retraining.
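The papers' neural memory includes momentum and forgetting terms not shown here; what follows is only a minimal sketch of the core idea. A linear associative memory maps keys to values, and the gradient of its prediction error acts as the surprise signal: a token the memory already explains barely changes it, while an unexpected one triggers a large update, all at inference time. The `TestTimeMemory` class is our hypothetical toy, not an interface from Titans.

```python
import numpy as np

class TestTimeMemory:
    """Toy associative memory updated while the model runs.
    Per-token loss: 0.5 * ||M @ k - v||^2."""

    def __init__(self, dim, lr=0.1):
        self.M = np.zeros((dim, dim))  # fixed-size, but *trainable* at test time
        self.lr = lr

    def update(self, k, v):
        err = self.M @ k - v                  # prediction error for this token
        surprise = float(err @ err)           # large when the input is unexpected
        self.M -= self.lr * np.outer(err, k)  # one online gradient step on the loss
        return surprise

    def recall(self, k):
        return self.M @ k

mem = TestTimeMemory(dim=8)
rng = np.random.default_rng(1)
k, v = rng.standard_normal(8), rng.standard_normal(8)
print(mem.update(k, v))  # high surprise the first time this fact appears
print(mem.update(k, v))  # much lower: the association has entered the weights
```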

The MIRAS framework, as demonstrated by Titans, marks a meaningful shift toward real-time adaptation. Instead of compressing information into a static state, the architecture actively learns, updating its own parameters as data streams in. This mechanism enables the model to incorporate new, specific details into its core knowledge instantly.
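To illustrate that streaming behavior, here is a hedged sketch of an online-adaptation loop (again a toy linear memory with made-up data, not the papers' training rule): each incoming token applies one gradient step to the memory's parameters, so new details are written into the weights as they arrive rather than held only in a context window.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, lr = 8, 0.05
M = np.zeros((dim, dim))            # the model's adaptable parameters

# Hypothetical stream of (key, value) pairs standing in for incoming tokens.
stream = [(rng.standard_normal(dim), rng.standard_normal(dim)) for _ in range(5)]

for _ in range(20):                 # repeated exposure, purely at inference time
    for k, v in stream:
        err = M @ k - v             # how wrong the current parameters are
        M -= lr * np.outer(err, k)  # write the correction into the parameters

k0, v0 = stream[0]
print(np.linalg.norm(M @ k0 - v0))  # small: the fact now lives in the weights
```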



    Source link
