
AI Revolutionizes Server Memory: Goodbye RAM in Every Machine

Artificial intelligence is redefining data center infrastructure, driving a radical shift in how memory is managed. Forget the notion that RAM must reside within each individual server.

By Redacción Tricuatro | 15 May, 2026 | 2 min read

Attention, tech enthusiasts! Artificial intelligence is sparking a revolution in data center infrastructure and turning a fundamental server rule on its head: memory no longer has to live inside every machine. The key lies in moving memory, much as we already do with storage, into centralized external systems.

The idea is simple yet transformative: instead of relying solely on the RAM built into each server, a significant portion of that memory would move to large external systems. These memory clusters, nicknamed “memory godboxes,” would allocate capacity according to demand at any given moment, reducing each machine's dependence on its own local memory.

This paradigm shift is made possible by advances in technologies like Compute Express Link (CXL). While CXL has been in development for years, the growing memory demand driven by AI has given it a crucial boost. CXL is a cache-coherent interconnect that links processors, memory, accelerators, and other peripherals, and it runs on top of the PCIe physical layer.

CXL has evolved in stages. Initially, it focused on expanding a single server's memory through modules connected to PCIe slots. CXL 2.0 introduced the concept of “pooling,” which allows memory to be grouped into a common pool and allocated to different machines. CXL 3.0 marks a turning point by enabling broader topologies and, most importantly, memory shared between machines, although keeping shared data coherent across hosts adds its own technical complexity.
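To make the pooling idea concrete, here is a minimal sketch of the bookkeeping involved: one shared pool of capacity, carved up among hosts on demand and returned when no longer needed. It is a toy model only. The MemoryPool class, its methods, and the host names are hypothetical and do not correspond to any CXL API or vendor product.

```python
class MemoryPool:
    """Toy model of a shared memory pool (hypothetical, not a CXL API)."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.allocations = {}  # host name -> GB currently assigned

    @property
    def free_gb(self):
        # Capacity not yet assigned to any host.
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, host, size_gb):
        """Assign size_gb to host; return False if the pool cannot cover it."""
        if size_gb <= 0:
            raise ValueError("size_gb must be positive")
        if size_gb > self.free_gb:
            return False
        self.allocations[host] = self.allocations.get(host, 0) + size_gb
        return True

    def release(self, host):
        """Return all of a host's memory to the pool; report how much was freed."""
        return self.allocations.pop(host, 0)


pool = MemoryPool(capacity_gb=1024)
first = pool.allocate("server-a", 600)   # fits in the empty pool
second = pool.allocate("server-b", 600)  # refused: only 424 GB remain
freed = pool.release("server-a")         # server-a hands its 600 GB back
third = pool.allocate("server-b", 600)   # now the same request succeeds
```

The point of the sketch is that capacity follows demand: the same 600 GB serves one machine and then another, instead of sitting idle inside a single server.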

Why is this so critical for AI? According to The Next Platform, AI is limited not only by compute but also by memory. High Bandwidth Memory (HBM), crucial for GPUs, is ultra-fast but limited in capacity and expensive. Training a model means processing massive datasets, while the challenge in inference is responding efficiently to millions of user requests.

This is where “conversation memory” comes into play. Each response generated by a language model is built sequentially, one token at a time. To avoid recalculating everything at each step, systems use a “KV cache” that serves as working memory. The Next Platform explains that this cache, which holds the key and value vectors of previous tokens, grows with every token and every concurrent conversation, and can end up occupying more space than the model itself, especially in high-concurrency services.
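The size of a KV cache can be estimated with simple arithmetic. The sketch below assumes a standard decoder-only transformer that stores one key vector and one value vector per token in every layer, at 2 bytes per value (16-bit precision). The function name kv_cache_bytes and the model shape used in the example are hypothetical, chosen only for illustration and not taken from the article's sources.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   context_tokens, concurrent_requests, bytes_per_value=2):
    """Estimate KV cache size in bytes for a decoder-only transformer."""
    # Each token stores one key vector and one value vector per layer,
    # hence the leading factor of 2.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * concurrent_requests


# Hypothetical model shape, used only to show the order of magnitude.
per_token = kv_cache_bytes(32, 32, 128, context_tokens=1, concurrent_requests=1)
total = kv_cache_bytes(32, 32, 128, context_tokens=4096, concurrent_requests=64)

print(f"{per_token / 2**20:.1f} MiB per token")  # 0.5 MiB per token
print(f"{total / 2**30:.1f} GiB in total")       # 128.0 GiB in total
```

Under these assumptions, 64 simultaneous conversations of 4,096 tokens each need 128 GiB for the cache alone. A model of roughly 7 billion parameters at 16-bit precision takes up about 14 GB, so the cache can indeed dwarf the model that produces it.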

AI is not held back solely by a lack of computation, but also by a lack of memory.

This vision is no longer mere theory. Companies like Panmnesia, Liqid, and UnifabriX are developing systems to externalize memory and make it accessible to multiple machines. Some employ CXL switches, while others rely on large pools of DDR5. Enfabrica, for its part, offers its Emfasys system, designed for inference and capable of managing up to 18 TB of DDR5 per memory server, or 144 TB in a full rack. Much of the industry is now focused on optimizing where memory sits in order to improve AI performance.

This architectural shift promises greater efficiency and scalability, and it could also democratize access to high-capacity memory. For companies developing and deploying AI models, sharing memory dynamically could mean more flexibility and lower costs. The era of centralized memory for AI has begun.
