
They abandoned HBM!

2025-11-12


The surging wave of AI has ushered in an unprecedented "super boom" for the storage market, long known for its cyclical swings. Driven by both the training and inference of large AI models, demand for computing power has skyrocketed, making HBM (High Bandwidth Memory) a key component of AI servers. By stacking multiple layers of DRAM and integrating tightly with the GPU, HBM provides a faster data channel for AI computing and has become the most sought-after "golden storage" of the AI era.

The surge in HBM demand has also fueled a boom across the entire storage industry chain. Samsung Electronics, SK Hynix, and Micron Technology, the world's three largest storage giants, have all reported explosive earnings growth: Samsung's third-quarter net profit rose 21% year-on-year, SK Hynix posted its highest quarterly profit in history, and Micron's net profit tripled year-on-year. SK Hynix also stated that its HBM capacity through 2025 has been fully booked by customers.

Meanwhile, traditional DRAM and NAND chips are also unexpectedly in high demand.

Due to storage manufacturers' concentrated expansion of HBM production, conventional memory capacity is tightening, leading to a rebalancing of market supply and demand. Data center giants such as Amazon, Google, and Meta are making large-scale purchases of traditional DRAM to expand their AI inference and cloud service capabilities. In fact, in the AI inference stage, ordinary memory still plays an irreplaceable role—resulting in a "tight supply across the board" situation in the entire storage market.

The explosive popularity of LPDDR5

The first to take off was LPDDR (low-power DDR), the memory found in almost every smartphone.

Recently, Qualcomm released its new AI200 and AI250 data center accelerators, expected to be available in 2026 and 2027, respectively. These two new accelerators are said to compete with rack-class solutions from AMD and Nvidia, offering higher efficiency and lower operating costs when running large-scale generative AI workloads. This release also reaffirms Qualcomm's plan to release updated products annually.

Both the Qualcomm AI200 and AI250 accelerators are based on the Qualcomm Hexagon Neural Processing Unit (NPU), tuned here for data center AI workloads. The company has been progressively improving Hexagon in recent years, and the latest versions pair scalar, vector, and tensor accelerators (in a 12+8+1 configuration) with support for data formats such as INT2, INT4, INT8, INT16, FP8, and FP16, along with micro-block inference for reduced memory traffic, 64-bit memory addressing, virtualization, and Gen AI model encryption for added security. For Qualcomm, scaling Hexagon to data center workloads is a natural choice, although it remains to be seen what performance targets the company will set for the AI200 and AI250.
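Those reduced-precision formats are central to keeping memory traffic down during inference. As a rough illustration of the general idea (not Qualcomm's Hexagon pipeline), the sketch below quantizes a weight matrix from FP32 to INT8 with a single per-tensor scale, cutting its footprint by 4x; the layer size and the NumPy-based scheme are illustrative assumptions.

```python
# Illustrative only: a minimal symmetric INT8 quantization sketch in NumPy,
# showing why low-precision formats (INT8/FP8/INT4) cut memory traffic for
# inference. This is a generic textbook scheme, not Qualcomm's implementation.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map FP32/FP16 weights to INT8 with a single per-tensor scale."""
    scale = np.abs(weights).max() / 127.0                 # largest value maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)       # one hypothetical layer
q, scale = quantize_int8(w)

print(f"FP32 size: {w.nbytes / 2**20:.1f} MiB")           # ~64 MiB
print(f"INT8 size: {q.nbytes / 2**20:.1f} MiB")           # ~16 MiB, 4x less traffic
print(f"max abs error: {np.abs(w - dequantize_int8(q, scale)).max():.4f}")
```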

Qualcomm's AI200 rack-mount solution will be the company's first data center-class inference system powered by the AI200 accelerator. Equipped with 768 GB of LPDDR memory (a considerable amount for an inference accelerator), it will use PCIe interconnects for scaling up and Ethernet for scaling out. The system will feature direct liquid cooling and draw up to 160 kW per rack – unprecedented power consumption for an inference solution. It will also support confidential computing for enterprise deployments and is slated for availability in 2026.

Arriving a year later, the AI250 keeps the same overall design but adds a near-memory computing architecture, increasing effective memory bandwidth by more than 10 times. It will also support disaggregated inference, allowing compute and memory resources to be dynamically shared across cards. Qualcomm positions it as a more efficient, higher-bandwidth solution optimized for large Transformer models while retaining the same cooling, security, and scalability features as the AI200.
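Disaggregated inference can be made concrete with a small scheduling sketch. One common form of the idea in the industry splits the compute-bound prefill phase and the memory-bound decode phase onto separate pools of accelerators so each pool can be sized independently; the sketch below illustrates only that generic pattern, and the pool sizes and card names are hypothetical rather than a description of Qualcomm's mechanism.

```python
# A generic illustration of disaggregated inference scheduling: the compute-bound
# prefill phase and the memory-bandwidth-bound decode phase are served by separate
# pools of cards. Pool sizes and card names are hypothetical; this is not
# Qualcomm's implementation.
from collections import deque

prefill_pool = deque(f"compute-card-{i}" for i in range(2))   # few, compute-heavy
decode_pool  = deque(f"memory-card-{i}"  for i in range(6))   # many, capacity-heavy

def serve(prompt: str) -> str:
    # Phase 1: prefill builds the KV cache and is compute-bound.
    card = prefill_pool.popleft()
    kv_cache = f"kv({prompt})"            # placeholder for the real prefill result
    prefill_pool.append(card)

    # Phase 2: decode streams tokens and is bound by memory capacity/bandwidth.
    card = decode_pool.popleft()
    answer = f"decoded<{kv_cache}>"       # placeholder for autoregressive decode
    decode_pool.append(card)
    return answer

print(serve("What is LPDDR?"))
```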

However, many are not focused on the familiar story of yet another chipmaker attempting to challenge Nvidia, but rather on Qualcomm's radically different technological path in this AI arms race—equipping each accelerator card with up to 768GB of LPDDR memory, approximately 10 times the HBM capacity of Nvidia's H100.

Instead of adopting the industry-leading and expensive HBM, Qualcomm has directly applied its low-power LPDDR technology, perfected in the smartphone field, to data centers. This seemingly "downgraded" choice reveals another possibility for AI storage.

Interestingly, Qualcomm is not alone. Around the same time, other giants have also showcased similar technological approaches.

At the 2025 GTC conference, GPU giant Nvidia showcased its next-generation Vera Rubin superchip. Slated for mass production by the end of 2026, it is the first Nvidia product to place LPDDR memory, packaged in SOCAMM2 modules, around its 88-core Vera CPU. While the two Rubin GPUs still feature eight HBM4 memory stacks, the inclusion of LPDDR is itself a significant signal: even the most ardent HBM proponents are beginning to make room for LPDDR in their system architectures.

Notably, NVIDIA also launched the new Rubin CPX AI chip, a "disaggregated" architecture product specifically optimized for inference, further confirming its strategic shift in the inference arena.

The 2025 OCP Global Summit

Meanwhile, at the 2025 OCP Global Summit, Intel unveiled its "Crescent Island" data center GPU, designed specifically for AI inference workloads and equipped with 160GB of LPDDR5X memory. Intel CTO Sachin Katti put it bluntly: "AI is shifting from static training to real-time, ubiquitous inference—driven by agent-based AI. Scaling these complex workloads requires heterogeneous systems that match the right silicon to the right task." This GPU, based on the Xe3P microarchitecture and optimized for air-cooled enterprise servers, is expected to begin customer sampling in the second half of 2026. Intel explicitly emphasized its focus on "power and cost optimization," as well as "large-capacity memory and bandwidth optimized for inference workflows."

A divergence of technical routes

The simultaneous shift of three major chip giants toward LPDDR is no coincidence; it reflects an industry-wide adjustment. Some industry forecasts suggest that by 2030, inference workloads will outnumber training workloads by a factor of 100.

Industry insiders have begun to refer to the current bottleneck in AI as the "martini straw problem": the computing engine is the glass, while the data flows through the straw. No matter how powerful a chip is, its performance is limited by the speed of data inflow and outflow. Modern AI inference workloads are increasingly constrained by memory rather than computation—as model sizes and context windows expand, the challenge lies not in chip computing speed, but in how to quickly deliver data to the processor.
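A back-of-the-envelope roofline check makes the "straw" concrete: a workload is limited by memory rather than compute whenever its arithmetic intensity (floating-point operations per byte moved) falls below the chip's ratio of peak compute to peak bandwidth. The figures in the sketch below are illustrative placeholders, not any vendor's specifications.

```python
# A back-of-the-envelope roofline check for the "martini straw" problem:
# a workload is memory-bound when its arithmetic intensity (FLOPs per byte
# moved) is below the chip's compute-to-bandwidth ratio. All numbers below
# are illustrative placeholders, not vendor specifications.

def attainable_tflops(peak_tflops: float, bandwidth_tbps: float,
                      flops_per_byte: float) -> float:
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak_tflops, bandwidth_tbps * flops_per_byte)

peak_tflops    = 400.0   # hypothetical accelerator peak compute, TFLOP/s
bandwidth_tbps = 2.0     # hypothetical memory bandwidth, TB/s

# Autoregressive decode of a large model streams every weight once per token
# but does little math per byte, so its intensity is low (order of 1 FLOP/byte).
for intensity in (1, 10, 100, 1000):
    perf = attainable_tflops(peak_tflops, bandwidth_tbps, intensity)
    bound = "memory-bound" if perf < peak_tflops else "compute-bound"
    print(f"intensity {intensity:>4} FLOP/byte -> {perf:>6.1f} TFLOP/s ({bound})")
```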

This memory bottleneck is precisely where the value of LPDDR solutions lies. According to research cited by Qualcomm, LPDDR memory offers 13 times the cost-effectiveness of HBM, allowing large language model inference workloads to run directly in memory without frequent data shuffling. The practical effects are faster response times, lower latency, and lower power consumption. Qualcomm claims its Cloud AI 100 Ultra architecture uses one-twentieth to one-thirty-fifth the power of comparable NVIDIA configurations under certain inference workloads.

Of course, LPDDR solutions are not without trade-offs. Compared with HBM, LPDDR offers lower memory bandwidth, higher latency due to narrower interfaces, and unproven reliability in 24/7 high-temperature server environments. The key difference, however, lies in the application scenarios.

In training scenarios, extreme memory bandwidth is required to handle backpropagation of massive amounts of data, making HBM irreplaceable. In inference scenarios, however, model parameters are fixed, and the focus is on large-capacity storage and efficient retrieval; LPDDR's capacity and cost advantages far outweigh its bandwidth disadvantages.
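A rough estimate shows why decode-heavy inference is bandwidth-bound yet tolerant of slower memory: generating one token for a single stream reads roughly the entire set of model weights once, so throughput is capped at bandwidth divided by model size. The bandwidth tiers and model size in the sketch below are assumptions for illustration, not measurements of any specific product.

```python
# A rough ceiling on single-stream decode speed: one generated token reads
# roughly the whole set of model weights once, so tokens/s <= bandwidth / size.
# The bandwidth figures and model size are illustrative assumptions.

def max_tokens_per_second(bandwidth_gbps: float, model_gb: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound model."""
    return bandwidth_gbps / model_gb

model_gb = 140.0                      # e.g. a ~70B-parameter model in FP16

for label, bw in [("HBM-class", 3000.0), ("LPDDR-class", 500.0)]:
    tps = max_tokens_per_second(bw, model_gb)
    print(f"{label:12s} {bw:6.0f} GB/s -> <= {tps:5.1f} tokens/s per stream")

# The LPDDR system is slower per stream, but if it holds several times more
# parameters per card it can serve bigger models (or more of them) at lower cost.
```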

Notably, Qualcomm's AI250 goes a step further, introducing an innovative memory architecture based on near-memory computing that is claimed to deliver more than 10 times the effective memory bandwidth at lower power, enabling disaggregated AI inference to use hardware efficiently. Both solutions employ direct liquid cooling, with rack-level power consumption of only 160 kilowatts—a highly attractive figure given that data center energy consumption is doubling roughly every three years.

When data centers start grabbing mobile phone memory

The shift in AI storage technology is also brewing a supply chain crisis that could impact the global consumer electronics market.

First, it's clear that the amount of LPDDR memory required for a single AI inference rack is staggering. Taking Qualcomm's AI200 as an example, a single rack might hold dozens of accelerator cards at 768GB each, for tens of terabytes of memory in total, the equivalent of thousands of flagship smartphones; spread across thousands of racks, that adds up to the memory of millions of handsets.
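The arithmetic behind those rack-level figures is simple multiplication, as the quick calculation below shows; the card count per rack, the number of racks, and the per-phone memory figure are assumptions for illustration, while 768 GB per card comes from the discussion above.

```python
# Rack-level LPDDR demand as simple multiplication. CARDS_PER_RACK,
# RACKS_DEPLOYED, and GB_PER_PHONE are assumed for illustration; 768 GB per
# card is the figure quoted for the AI200 above.
CARDS_PER_RACK = 48        # assumed; real configurations may differ
GB_PER_CARD    = 768
GB_PER_PHONE   = 12        # a typical flagship-phone allotment (assumed)
RACKS_DEPLOYED = 1_000     # a hypothetical large-scale deployment

rack_gb = CARDS_PER_RACK * GB_PER_CARD
print(f"per rack: ~{rack_gb / 1024:.0f} TB of LPDDR "
      f"(~{rack_gb / GB_PER_PHONE:,.0f} phones' worth)")
print(f"{RACKS_DEPLOYED:,} racks: ~{rack_gb * RACKS_DEPLOYED / GB_PER_PHONE / 1e6:.1f} "
      f"million phones' worth of memory")
```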

And this is just one product from one company. When Qualcomm, Intel, Nvidia, and other potential entrants (such as AMD and Broadcom) mass-produce LPDDR solutions in 2026-2027, the demand for LPDDR will grow exponentially.

Currently, LPDDR production capacity is not unlimited and is mainly controlled by three suppliers: Samsung, SK Hynix, and Micron. Data center customers are characterized by huge purchasing volumes, high profit margins, and stable, long-term orders. In contrast, while the smartphone market is massive, the amount of memory used per device is small, price-sensitive, and subject to significant seasonal fluctuations.

From a supplier's perspective, the priorities are obvious. Data center orders could crowd out consumer electronics allocations, much as cryptocurrency mining triggered the GPU shortage of 2017-2018 and the chip crunch of 2020-2021 forced automakers to halt production. Mobile phone makers would then face rising LPDDR procurement costs and longer delivery cycles, which could ultimately push mid-to-high-end phones to compromise on memory configurations or raise prices significantly.

For mobile phone manufacturers, this could mean a difficult choice in 2026-2027: accept higher memory costs, cut the memory configurations of flagship models, or find alternative solutions.

The arrival of LPDDR6

One such alternative may be the more expensive LPDDR6.

Recently, JEDEC (the Solid State Technology Association), the global semiconductor standards body, officially released its latest standard document, JESD209-6, marking the debut of the next-generation low-power memory, LPDDR6. This is not only a major evolution of the LPDDR series but also the first official specification to carry the DDR6 name. Five years after the release of the DDR5 standard, and with the rapid development of AI computing, mobile devices, and edge intelligence, the industry urgently needs a memory architecture that combines high bandwidth, low power consumption, and high reliability. LPDDR6 arrives at just the right time.

JEDEC stated that LPDDR6 has achieved systemic upgrades in performance, energy efficiency, security, and stability. Its core architecture has evolved from the traditional dual-channel (DDR4's single 64-bit channel was split into two independent 32-bit sub-channels in the DDR5 era) to four 24-bit sub-channels, achieving higher parallelism and lower access latency. In addition, LPDDR6 has undergone in-depth optimization in power management. It not only further reduces the operating voltage, but also introduces new mechanisms such as DVFSL (Dynamic Voltage and Frequency Scaling), which can dynamically adjust power consumption according to the operating load to extend battery life.

In terms of performance, LPDDR6 boasts data rates ranging from 10,667 to 14,400 MT/s, with an effective bandwidth of approximately 28.5 to 38.4 GB/s. This speed surpasses the current overclocking record of DDR5-12054, providing ample bandwidth and responsiveness for AI smartphones, thin and light laptops, and automotive intelligent systems.
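The relationship between those data rates and bandwidth figures is a single multiplication: bandwidth equals data rate times interface width. The sketch below applies it to the standard's quoted rates for an assumed 24-bit sub-channel and an assumed 48-bit pairing; the "effective" figures quoted above additionally reflect channel configuration and protocol overhead, which this raw-peak formula ignores.

```python
# Peak transfer rate and bus width relate to bandwidth by one formula:
#   bandwidth (GB/s) = data rate (MT/s) * bus width (bits) / 8 / 1000
# The interface widths below are assumptions for illustration; real "effective"
# bandwidth is lower than this raw peak once channel configuration and protocol
# overhead are taken into account.

def peak_gbps(data_rate_mts: float, bus_width_bits: int) -> float:
    """Raw peak bandwidth for one interface of the given width."""
    return data_rate_mts * bus_width_bits / 8 / 1000

for rate in (10_667, 14_400):                 # LPDDR6 data rates quoted above
    for width in (24, 48):                    # one sub-channel vs. a 48-bit pair
        print(f"{rate:,} MT/s x {width}-bit -> {peak_gbps(rate, width):5.1f} GB/s")
# 14,400 MT/s over an assumed 48-bit interface works out to roughly 86 GB/s,
# in line with the Synopsys IP figure cited later in the article.
```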

As the core of global semiconductor standard setting, JEDEC members cover the entire industry chain from chip design to manufacturing and testing. Following the release of the LPDDR6 standard, companies such as Cadence, Synopsys, Advantest, Keysight, MediaTek, Qualcomm, Samsung, Micron, and SK Hynix have already expressed their support. This means that the next-generation LPDDR6 is expected to be widely adopted by the industry in a short period of time. Although JEDEC has not yet released the final DDR6 specification for desktop platforms, the official statement indicates that the relevant standard will be released within the year.

From a timeline perspective, DDR5 entered the mass production market approximately one year after its release in 2020, and LPDDR6 is expected to follow a similar path. Especially with major manufacturers planning to gradually phase out DDR4 production starting in 2025, the arrival of LPDDR6 marks a crucial juncture in the transition from old to new standards.

It's worth noting that Synopsys has already completed silicon bring-up verification for its LPDDR6 IP based on TSMC's N2P process node. Silicon bring-up verification is a critical stage in chip design, signifying that the core design has reached a level of technological maturity suitable for mass production. This IP comprises two main parts: a controller responsible for JEDEC protocol parsing and low-power management, and a physical layer interface (PHY) built upon N2P's metal stacking and I/O library to achieve higher signal integrity and density.

Thanks to N2P's leading performance, power consumption, and area (PPA), Synopsys' LPDDR6 IP achieves a bandwidth of up to 86 GB/s, along with higher energy efficiency and a more compact physical size, providing strong support for AI terminals and high-efficiency computing platforms. The theoretical peak speed of the JEDEC standard can even reach 115 GB/s, meaning that compared to LPDDR5, the new generation standard has achieved a leap forward in both speed and power consumption.

With LPDDR6 expected to officially enter mass production next year, it may replace LPDDR5 and become the standard for smartphones in the future, although its price may also rise accordingly.

LPDDR5, too expensive to afford?

This shift from HBM to LPDDR essentially marks a move in the AI industry from a reckless technology race to a more cost-effective commercial deployment.

Nvidia's CUDA software stack remains unmatched in AI training, exhibiting strong developer lock-in. However, the situation is entirely different in inference: models are already trained and only require efficient execution; developer lock-in is far less pronounced, and developers are extremely price-sensitive.

This opens the door for companies like Qualcomm and Intel to compete in entirely new ways. Rather than building ever-larger GPUs to challenge Nvidia head-on, they are focusing on the reality that most AI models do not need to be retrained every day; they only need to run efficiently, and to run anywhere.

Qualcomm's advantage lies in combining its mobile DNA with data center-level scalability. Intel is also emphasizing its end-to-end capabilities from AI PCs to data centers and industrial edge computing, as well as its collaboration with communities like the Open Compute Project (OCP).

The future AI hardware market may exhibit a clear stratification. HBM will remain irreplaceable in the training market, with Nvidia/AMD continuing to dominate. However, LPDDR is poised to emerge as a dark horse in the inference market, becoming the choice for next-generation AI chips.

But the rise of LPDDR may come at a cost borne by billions of smartphone users worldwide. As data centers begin to seize LPDDR supplies traditionally reserved for consumer electronics, we may witness an ironic scenario: supercomputers training AI will be equipped with cutting-edge HBM, inference clusters running AI services will use "phone memory," while actual smartphone users may face memory shortages, price increases, or downgraded configurations in 2026-2027.

This is the paradox of technological progress: the efficiency revolution in AI inference may be coming at the expense of consumer interests. While chip giants celebrate optimizing the TCO of data centers, the smartphones in the hands of ordinary users are becoming the most vulnerable link in this industrial transformation.

Source: Semiconductor Industry Observation


