Optimizing cloud economics with linear elastic caching

Testing linear elastic caching

To ensure our theory holds up in the real world, we conducted extensive experiments using two primary sources:

Production workloads: We integrated the system into Spanner.
Public traces: We tested against a variety of publicly available cache traces from industry benchmarks to ensure the results weren’t specific to Google’s infrastructure.

Production workloads

We developed a practical algorithm that, on each page request, assigns a time-to-live (TTL) to the cached page based on the page’s access patterns and costs. Because Spanner handles billions of requests per second, the TTL prediction model has to be extremely lightweight. We opted for a shallow decision tree that can be translated into a few lines of C++ code. The resulting code is also easy to interpret and provides valuable insight into the workload’s characteristics. To predict the optimal TTL for each page, the model considers features such as the size of the data, the cost of a cache miss (when data isn’t in the cache and the system must retrieve it from a slower system, such as disk), and the type of database operation being performed.
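To make the idea of a shallow decision tree concrete, here is a minimal sketch in Python (chosen for brevity; the production model is described as C++). The feature names, thresholds, operation labels, and TTL values are all hypothetical and chosen only for illustration; they are not the values used in Spanner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRequest:
    """Hypothetical per-request features; field names are illustrative only."""
    size_bytes: int    # size of the page being cached
    miss_cost: float   # cost of refetching the page after a miss (arbitrary units)
    operation: str     # type of database operation, e.g. "point_read" or "scan"


def predict_ttl_seconds(req: PageRequest) -> float:
    """A depth-2 decision tree written as nested comparisons.

    Every threshold and returned TTL below is a made-up placeholder.
    """
    large = req.size_bytes > 64 * 1024
    if req.operation == "scan":
        # Assumption: pages touched by scans are rarely re-read soon.
        return 1.0 if large else 5.0
    if req.miss_cost > 10.0:
        # Expensive to refetch: hold longer, but less so for large pages.
        return 120.0 if large else 600.0
    return 30.0
```

Because the tree is just a handful of branches, evaluating it adds almost no work per request, and each branch can be read directly as a statement about the workload.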

We integrated the elastic caching policy into Spanner’s production servers over several months. Compared to a standard fixed-size cache, the results were substantial:

Memory usage: Reduced by 15.5%.
Cache misses: Increased by only 5.5%.
Total cost of ownership (TCO): Reduced by approximately 5%.

Crucially, because the algorithm is “cost-aware,” the small increase in cache misses was concentrated on data that is cheap to fetch from storage, meaning the impact on actual I/O costs was a negligible 0.5%.

Public traces

We also evaluated our elastic caching approach using several publicly available cache traces. As the fixed-cache-size baseline policy, we used an optimized implementation of the Greedy-Dual-Size-Frequency (GDSF) eviction algorithm, a generalization of the well-known LRU policy that allows for pages of different sizes.
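For readers unfamiliar with GDSF, the following is a minimal, unoptimized sketch of the textbook policy, not the implementation used in the experiments. Each cached page gets a priority of L + frequency × cost / size, the page with the lowest priority is evicted, and L is raised to the priority of the evicted page. The class and method names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


class GDSFCache:
    """Minimal Greedy-Dual-Size-Frequency cache with a fixed byte capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.inflation = 0.0  # the "L" value: priority of the last evicted page
        # key -> (size, cost, frequency, sequence number of its live heap record)
        self.entries: Dict[str, Tuple[int, float, int, int]] = {}
        self.heap: List[Tuple[float, int, str]] = []  # (priority, seq, key)
        self.seq = 0

    def _push(self, key: str, size: int, cost: float, freq: int) -> None:
        self.seq += 1
        priority = self.inflation + freq * cost / size
        self.entries[key] = (size, cost, freq, self.seq)
        heapq.heappush(self.heap, (priority, self.seq, key))

    def _evict_one(self) -> None:
        while self.heap:
            priority, seq, key = heapq.heappop(self.heap)
            entry = self.entries.get(key)
            if entry is None or entry[3] != seq:
                continue  # stale heap record left behind by a later update
            self.inflation = priority
            self.used -= entry[0]
            del self.entries[key]
            return

    def access(self, key: str, size: int, cost: float) -> bool:
        """Record a request and return True on a hit, False on a miss."""
        if key in self.entries:
            old_size, old_cost, freq, _ = self.entries[key]
            self._push(key, old_size, old_cost, freq + 1)
            return True
        if size > self.capacity:
            return False  # too large to cache at all
        while self.used + size > self.capacity:
            self._evict_one()
        self._push(key, size, cost, 1)
        self.used += size
        return False
```

The heap uses lazy deletion: a hit pushes a fresh record and leaves the old one behind, which is skipped at eviction time because its sequence number no longer matches.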

We considered four variants of elastic caching, depending on which ski rental algorithm we used (breakeven or randomized) and whether or not we used a machine-learned model. Since the public traces don’t include application-level features for training, we didn’t implement decision trees for prediction. Instead, we developed a simple learning strategy that splits each trace in half and uses the first half for training. For each individual page in the training trace, we computed the TTL that minimizes that page’s cost over the training trace.
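The per-page training step can be sketched as follows. This assumes a simple linear cost model that the text does not spell out here: holding a page costs `rent_rate * size` per second, and each miss costs `miss_cost`. Under that assumption, the best fixed TTL for a page is always either zero or one of its observed gaps between consecutive requests, so it suffices to try those candidates. The function names and the trace record layout are hypothetical, and the sketch ignores the rent paid after a page's final request.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical trace record: (timestamp_seconds, page_id, size, miss_cost)
Record = Tuple[float, str, float, float]


def best_ttl_for_gaps(gaps: List[float], size: float, miss_cost: float,
                      rent_rate: float) -> float:
    """Return the TTL minimizing total cost over one page's re-access gaps."""

    def total_cost(ttl: float) -> float:
        cost = 0.0
        for gap in gaps:
            if gap <= ttl:
                # Page was still cached at the next request: pay rent only.
                cost += rent_rate * size * gap
            else:
                # Page expired first: pay rent until expiry, then a miss.
                cost += rent_rate * size * ttl + miss_cost
        return cost

    candidates = [0.0] + sorted(set(gaps))
    return min(candidates, key=total_cost)  # ties resolve to the smallest TTL


def train_ttls(trace: Iterable[Record], rent_rate: float) -> Dict[str, float]:
    """Compute the best TTL for every page seen in the training trace."""
    last_seen: Dict[str, float] = {}
    gaps: Dict[str, List[float]] = defaultdict(list)
    meta: Dict[str, Tuple[float, float]] = {}
    for ts, page, size, miss_cost in trace:
        if page in last_seen:
            gaps[page].append(ts - last_seen[page])
        last_seen[page] = ts
        meta[page] = (size, miss_cost)
    return {
        page: best_ttl_for_gaps(gaps[page], size, miss_cost, rent_rate)
        for page, (size, miss_cost) in meta.items()
    }
```

A page requested only once in the training half has no gaps, so it receives a TTL of zero under this sketch.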

Because a cache’s behavior depends on its initial contents, a common practice, known as “warming up,” is to use some prefix of the cache trace to populate the cache without measuring performance on it. We warmed up all caches with one day’s worth of requests from the second half of the trace and used the rest for testing and measurements. During the test trace, if we encountered a page that was seen during training, we set its TTL to the best precomputed TTL for that page. Otherwise, we set the TTL using either the breakeven or the randomized policy.
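The evaluation protocol and the test-time TTL choice can be sketched as below. Several details are assumptions rather than facts from the experiments: the trace is split in half by request count, the breakeven TTL is the holding time at which rent equals one miss (`miss_cost / (rent_rate * size)`), and the randomized policy draws from the classical randomized ski rental distribution on [0, breakeven]. All function names and the record layout are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple


def split_trace(trace: Sequence[tuple], warmup_seconds: float = 86400.0
                ) -> Tuple[List[tuple], List[tuple], List[tuple]]:
    """Split into (train, warmup, test); each record starts with a timestamp.

    Assumes the trace is sorted by time and is halved by request count.
    """
    half = len(trace) // 2
    train, rest = list(trace[:half]), list(trace[half:])
    if not rest:
        return train, [], []
    cutoff = rest[0][0] + warmup_seconds  # one day into the second half
    warmup = [r for r in rest if r[0] < cutoff]
    test = [r for r in rest if r[0] >= cutoff]
    return train, warmup, test


def breakeven_ttl(size: float, miss_cost: float, rent_rate: float) -> float:
    """Hold the page until the rent paid equals the cost of one miss."""
    return miss_cost / (rent_rate * size)


def randomized_ttl(size: float, miss_cost: float, rent_rate: float,
                   rng: random.Random) -> float:
    """Sample a TTL with CDF (e^(t/B) - 1) / (e - 1) on [0, B], B = breakeven."""
    b = breakeven_ttl(size, miss_cost, rent_rate)
    u = rng.random()
    return b * math.log(1.0 + u * (math.e - 1.0))  # inverse-CDF sampling


def choose_ttl(page: str, size: float, miss_cost: float, rent_rate: float,
               learned: Dict[str, float],
               rng: Optional[random.Random] = None) -> float:
    """Use the precomputed TTL if the page was seen in training, else fall back."""
    if page in learned:
        return learned[page]
    if rng is None:
        return breakeven_ttl(size, miss_cost, rent_rate)
    return randomized_ttl(size, miss_cost, rent_rate, rng)
```

Passing an `rng` selects the randomized fallback and omitting it selects the breakeven fallback, which together with an empty or populated `learned` table covers the four variants described earlier.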
