
The first HBM4 GPU went into full production

2026-01-28


Nvidia said on Monday that its next-generation Rubin AI chip is in "full production" and will be available in the second half of 2026. The company also released more details about its highly anticipated successor to the Blackwell series.

"We must advance computing technology every year; we can't fall behind even for a year," Nvidia CEO Jensen Huang said in his keynote address at CES 2026.

Huang's remarks come amid growing concerns about an "AI bubble," as questions arise about the sustainability of large-scale AI infrastructure development.

The chip giant typically unveils its latest AI chip advancements at its GTC developer conference in San Jose, California, which will take place March 16-19 this year.

At the March 2025 GTC conference, Nvidia previewed its Vera CPU and Rubin GPU, stating that the Vera-Rubin chipset will offer significantly better AI training and inference performance than its predecessor, Grace-Blackwell. Inference refers to using trained AI models to generate content or perform tasks.

At Monday's presentation, Huang revealed more details about the Rubin series. The Rubin GPU boasts five times the inference computing performance and 3.5 times the training computing performance of Blackwell. Compared to Blackwell, the new generation of chips also reduces training and inference costs, with inference token costs reduced by up to 10 times.

The Rubin architecture contains 336 billion transistors and delivers 50 petaflops of performance when processing NVFP4 data. In comparison, Nvidia's previous-generation GPU architecture, Blackwell, achieved a maximum of 10 petaflops. Rubin's training performance reaches 35 petaflops, a 3.5x improvement.
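As a quick sanity check, the quoted speedups follow directly from the peak-throughput ratios. A back-of-the-envelope sketch using the article's petaflop figures:

```python
# Back-of-the-envelope check of the quoted Rubin-vs-Blackwell speedups,
# using the peak NVFP4 throughput figures from the article (petaflops).
BLACKWELL_PFLOPS = 10.0          # peak NVFP4 throughput, Blackwell
RUBIN_INFERENCE_PFLOPS = 50.0    # peak NVFP4 inference throughput, Rubin
RUBIN_TRAINING_PFLOPS = 35.0     # peak NVFP4 training throughput, Rubin

inference_speedup = RUBIN_INFERENCE_PFLOPS / BLACKWELL_PFLOPS  # 5.0x
training_speedup = RUBIN_TRAINING_PFLOPS / BLACKWELL_PFLOPS    # 3.5x

print(f"inference: {inference_speedup:.1f}x, training: {training_speedup:.1f}x")
```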

A portion of the chip's computing power comes from a module called the Transformer Engine, which debuted with Blackwell. According to Nvidia, Rubin's Transformer Engine is based on a newer design and adds a performance feature called hardware-accelerated adaptive compression. Compression reduces the number of bits needed to represent data, shrinking the volume of data an AI model must move and process and thereby speeding up computation.
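Nvidia has not published the details of its adaptive compression scheme, but the underlying arithmetic is simple: fewer bits per value means less data to move. A toy sketch (not Nvidia's actual method) comparing 16-bit and 4-bit representations:

```python
# Toy illustration (not Nvidia's actual scheme): shrinking values from
# 16 bits to 4 bits cuts the bytes a model must move by 4x.
def tensor_bytes(num_values: int, bits_per_value: int) -> int:
    """Storage needed for num_values packed at bits_per_value each."""
    return num_values * bits_per_value // 8

values = 1_000_000_000                 # one billion activations
fp16_bytes = tensor_bytes(values, 16)  # 2 GB
fp4_bytes = tensor_bytes(values, 4)    # 0.5 GB

print(f"FP16: {fp16_bytes / 1e9:.1f} GB, 4-bit: {fp4_bytes / 1e9:.1f} GB "
      f"({fp16_bytes // fp4_bytes}x less data)")
```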

Nvidia CEO Jensen Huang stated, "Rubin's arrival is perfectly timed, as the demand for AI training and inference computing is exploding. With our pace of launching a new generation of AI supercomputers every year, and the deep collaborative design of six new chips, Rubin marks a significant step forward for us towards the next frontier of AI."

According to Nvidia, Rubin will also be the first GPU to integrate HBM4 memory chips, boasting data transfer speeds of up to 22 TB per second, a significant improvement over Blackwell.

The company stated that the Rubin series of chips is already in "full production" and will ramp output through the second half of this year.

Microsoft Azure and Nvidia-backed cloud service provider CoreWeave will be among the first companies to offer Rubin-powered cloud computing services in the second half of 2026.

At a media briefing on Sunday, Nvidia Senior Director Dion Harris stated that the early launch of the Rubin products was due to the chips "reaching some very key milestones in demonstrating real-world readiness," adding that the company is working to prepare the ecosystem for adoption of the Vera-Rubin architecture.

"Given our current readiness and the market's enthusiasm for Vera-Rubin, we saw this as an excellent opportunity to launch this product at CES," Harris said.

However, the earlier-than-expected release of the first-generation Rubin chips failed to impress the market, with Nvidia shares falling 0.13% in after-hours trading on Monday, after closing at $188.12.

Jensen Huang, wearing a shiny black leather jacket—a modified version of his signature style—delivered a keynote address to a packed audience of 3,000 at the BleauLive Theater in Las Vegas. The atmosphere was electric—the CEO was greeted with cheers, applause, and photos taken by the audience upon his arrival—a testament to the company's meteoric rise and its current status as a leading indicator of the AI era.

The CEO previously stated that even without China or other Asian markets, the company expects its state-of-the-art Blackwell AI chip and Rubin's "early capacity ramp-up" to generate $500 billion in revenue by 2026.

Meanwhile, Jensen Huang believes the future of AI will primarily reside in the physical world. On Monday, the day before CES 2026 officially opened, Nvidia announced partnerships with several manufacturers, robotics companies, and leading automakers, including BYD, LG Electronics, and Boston Dynamics.

Huang stated, "The ChatGPT moment in robotics has arrived. Breakthroughs in physical AI—models capable of understanding the real world, reasoning, and planning actions—are unlocking entirely new applications." He was referring to ChatGPT, the chatbot that ignited the generative AI revolution.

NVIDIA releases Vera Rubin NVL72 AI supercomputer

At CES 2026, artificial intelligence will be ubiquitous, and NVIDIA GPUs will be at the heart of this ever-expanding AI landscape. Today, in his CES keynote address, NVIDIA CEO Jensen Huang shared the company's plans to continue leading the AI revolution, as the technology's applications extend far beyond chatbots, encompassing robots, self-driving cars, and the wider physical world.

First, Huang officially unveiled NVIDIA's next-generation AI data center rack architecture, Vera Rubin. Rubin is the culmination of NVIDIA's so-called "ultimate collaborative design," comprising six chips: a Vera CPU, a Rubin GPU, an NVLink 6 switch, a ConnectX-9 SuperNIC, a BlueField-4 data processing unit, and a Spectrum-6 Ethernet switch. Together these components form the Vera Rubin NVL72 rack.

Rubin GPU

The demand for AI computing is endless, and each Rubin GPU promises even greater computing power in this generation: up to 50 PFLOPS of inference performance for NVFP4 data types, five times that of the Blackwell GB200; and up to 35 PFLOPS of training performance for NVFP4, 3.5 times that of Blackwell. To meet such massive computing resource demands, each Rubin GPU is equipped with eight HBM4 memory stacks, providing 288GB of capacity and 22 TB/s of bandwidth.
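Those two figures together bound how quickly a Rubin GPU can sweep its own memory, which matters for inference workloads that are bandwidth-limited rather than compute-limited. A small calculation from the article's numbers:

```python
# How long one full pass over HBM takes at the quoted per-GPU figures
# (article numbers: 288 GB of HBM4 capacity, 22 TB/s of bandwidth).
HBM4_CAPACITY_GB = 288
HBM4_BANDWIDTH_TBPS = 22

# Convert TB/s to GB/s (1 TB = 1000 GB) and divide capacity by rate.
seconds_per_full_read = HBM4_CAPACITY_GB / (HBM4_BANDWIDTH_TBPS * 1000)
print(f"one full sweep of HBM: {seconds_per_full_read * 1000:.1f} ms")
```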

The computing power of each GPU is just one component of the AI data center. As leading large language models shift from dense architectures, which activate every parameter to generate each output token, to mixture-of-experts (MoE) architectures, which activate only a subset of parameters per token, these models have become relatively more efficient to scale. However, communication between the experts within a model demands significant inter-GPU bandwidth.
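The efficiency argument for MoE can be made concrete: per token, only the shared layers plus the few experts the router selects actually run. A sketch with illustrative numbers (not drawn from any specific model):

```python
# Why MoE models scale cheaply per token: only the routed experts run.
# All parameter counts below are illustrative, not from any real model.
def active_params(shared: float, per_expert: float,
                  experts_per_token: int) -> float:
    """Parameters actually touched per token in an MoE model."""
    return shared + per_expert * experts_per_token

NUM_EXPERTS = 64
total_params = 10e9 + 5e9 * NUM_EXPERTS     # 330B parameters in total
active = active_params(10e9, 5e9, 2)        # router picks 2 of 64 experts

print(f"{active / total_params:.1%} of parameters active per token")
```

Even though total parameters grow with the expert count, per-token compute stays nearly flat; the cost shifts to routing tokens between experts, which is exactly the inter-GPU traffic NVLink 6 targets.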

Vera Rubin introduces NVLink 6 for scale-up networking, increasing the switch-fabric bandwidth per GPU to 3.6 TB/s (bidirectional). Each NVLink 6 switch offers 28 TB/s of bandwidth, and each Vera Rubin NVL72 rack houses nine such switches, for a total scale-up bandwidth of 260 TB/s.
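The rack-level figure can be cross-checked from the per-GPU number: 72 GPUs at 3.6 TB/s each lands just under the quoted 260 TB/s total.

```python
# Cross-checking the rack-level NVLink figure from the per-GPU number:
# 72 Rubin GPUs, each with 3.6 TB/s of bidirectional NVLink 6 bandwidth.
GPUS_PER_RACK = 72
NVLINK6_TBPS_PER_GPU = 3.6

rack_bandwidth = GPUS_PER_RACK * NVLINK6_TBPS_PER_GPU
print(f"aggregate scale-up bandwidth: {rack_bandwidth:.1f} TB/s (~260 TB/s)")
```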

The Nvidia Vera CPU features 88 custom Olympus Arm cores and Nvidia's so-called "spatial multithreading" technology, enabling up to 176 threads to run simultaneously. The NVLink C2C interconnect, connecting the Vera CPU to the Rubin GPU, doubles the bandwidth to 1.8 TB/s. Each Vera CPU can address up to 1.5 TB of SOCAMM LPDDR5X memory, providing a memory bandwidth of up to 1.2 TB/s.
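The 176-thread figure follows directly from the core count, assuming two hardware threads per Olympus core (implied by the article's numbers, not separately confirmed):

```python
# The Vera CPU's thread count, assuming two hardware threads per core
# via Nvidia's "spatial multithreading" (ratio implied by the article).
OLYMPUS_CORES = 88
THREADS_PER_CORE = 2

threads = OLYMPUS_CORES * THREADS_PER_CORE
print(f"{threads} simultaneous threads")
```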

To scale the Vera Rubin NVL72 out to eight racks per DGX SuperPod, Nvidia introduced two Spectrum-X Ethernet switches built on the Spectrum-6 chip, both with integrated optical modules. Each Spectrum-6 chip provides 102.4 Tb/s of bandwidth.

More products will be released simultaneously

NVIDIA officially launched its new CPU "Vera" and GPU "Rubin" for AI data centers. While related plans had been announced previously, CEO Jensen Huang formally unveiled these products during his keynote address in Las Vegas on January 5th.

In addition, the company also released high-speed networking products, such as the NVLink 6 switch (allowing rack expansion using Vera and Rubin), ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch (allowing rack expansion within data centers).

Rubin is the successor to the current generation of GPUs, "Blackwell" (NVIDIA B300/B200/B100), employing a completely new GPU architecture and HBM4 memory. According to NVFP4 calculations, Blackwell achieves 10 PFLOPS of AI inference and training performance, while Rubin achieves 50 PFLOPS of inference performance (a 5x speedup) and 35 PFLOPS of training performance (a 3.5x speedup).

NVIDIA officially announced Vera, an Arm-based CPU featuring 88 custom-designed Olympus cores; and Rubin, a GPU designed for AI data centers, which will succeed the current Blackwell (B300/B200/B100) products.

Named after the renowned American astronomer Vera Rubin, the Rubin GPU utilizes the Rubin architecture, offering more efficient AI computing compared to the Blackwell architecture. It also features new HBM4 memory technology, sixth-generation NVLink, confidential computing capabilities, and a RAS engine, enhancing platform-level performance and security.

These improvements cut the cost per token to as little as one-tenth for inference and one-quarter for training when using NVIDIA's advanced inference models and mixture-of-experts (MoE) models to implement agentic AI.

Compared to the previous generation Blackwell (likely the B200 found in the GB200), Rubin's NVFP4 inference performance is improved to 50 PFLOPS, a 5x performance increase; training performance is improved to 35 PFLOPS, a 3.5x performance increase (Blackwell's performance in both metrics is 10 PFLOPS). HBM4's memory bandwidth is 22 TB/s, 2.8 times that of Blackwell; and the NVLink bandwidth per GPU is 3.6 TB/s, a two-fold performance increase.

On the other hand, Vera is an Arm CPU featuring 88 custom-designed Olympus cores from NVIDIA. It supports NVIDIA's proprietary Spatial Multi-threading (SMT) technology, enabling it to function as a 176-thread CPU. It can be equipped with 1.5TB of LPDDR5X memory (three times the capacity of the previous generation Grace), based on the SOCAMM data center memory module standard, with a memory bandwidth of 1.2TB/s.

Like the Blackwell series, each Vera Rubin module will contain one Vera CPU and two Rubin GPUs. Additionally, the Vera Rubin NVL72 will be launched, a scalable solution integrating 36 Vera Rubin modules into a single rack. The Vera Rubin NVL72 is equipped with NVLink 6 switches supporting the sixth-generation NVLink protocol, and a single rack accommodates 36 Vera CPUs and 72 Rubin GPUs.
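The rack tally follows from the module composition described above, one CPU and two GPUs per module:

```python
# Tallying the Vera Rubin NVL72 rack from its building blocks:
# each module pairs one Vera CPU with two Rubin GPUs.
MODULES_PER_RACK = 36
CPUS_PER_MODULE = 1
GPUS_PER_MODULE = 2

cpus = MODULES_PER_RACK * CPUS_PER_MODULE   # 36 Vera CPUs
gpus = MODULES_PER_RACK * GPUS_PER_MODULE   # 72 Rubin GPUs
print(f"per rack: {cpus} Vera CPUs, {gpus} Rubin GPUs")
```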

In addition, NVIDIA plans to launch the "HGX Rubin NVL8," an OEM-oriented design integrating eight Rubin modules into a single server, and the "DGX Rubin NVL8," a server variant built around x86 host processors. Customers can thus pair Rubin with either NVIDIA's Arm CPUs or x86 CPUs.

Simultaneously, NVIDIA also released new high-speed networking products for scale-out, including the ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch. These products can be used with the aforementioned Vera Rubin NVL72 and HGX Rubin NVL8 for scale-out functionality.

The company also released the "DGX SuperPOD with DGX Vera Rubin NVL72," an expanded supercomputer consisting of eight Vera Rubin NVL72 racks, serving as a reference design for AI supercomputers. By leveraging software such as CUDA, a single supercomputer can utilize 256 Vera CPUs and 512 Rubin GPUs.

According to the company, Vera and Rubin are planned for release in the second half of 2026 and will be available through the four major cloud service providers (AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure) and original equipment manufacturers (OEMs) such as Dell Technologies, HPE, Lenovo, and Supermicro. The company explained that AI model development companies such as OpenAI, Anthropic, and Meta have already announced their adoption plans.

Nvidia "absent" from CES for the first time in five years

The entire industry is facing a component shortage, and Nvidia announced on its X platform that its CES 2026 keynote would "not release any new GPUs," pouring cold water on the last hopes of PC builders. This breaks Nvidia's five-year tradition of releasing new desktop and mobile GPUs at CES; this time, there will be no new consumer hardware.

Much of this year's presentation will likely focus on the latest advancements in artificial intelligence. Nvidia has showcased its latest chip products annually at CES since 2021. The RTX 50 series graphics cards made their debut on the iconic CES stage in Las Vegas, and there were persistent rumors that an RTX 50 Super series would follow at CES 2026.

While never officially confirmed, a DRAM shortage may have caused that release plan to be shelved. There was precedent: Nvidia released the RTX 40 Super series at CES 2024, just one year after the first Ada Lovelace cards. Furthermore, the company's latest Blackwell GPUs use GDDR7 memory, which is harder to produce. The situation has deteriorated to the point that there are rumors Nvidia will restart production of the RTX 3060, since that card uses GDDR6 memory and is manufactured on Samsung's older 8nm process.

Memory supply is the key issue: without DRAM, Nvidia cannot ship new GPUs. Globally, only three companies, Micron, SK Hynix, and Samsung, can produce cutting-edge DRAM, and all of them would rather sell to AI customers at higher margins. The race toward artificial general intelligence (AGI) has prompted companies like OpenAI to set computing goals far exceeding the capacity of existing supply chains.


Some might wonder why governments aren't stepping in to help consumers. Isn't regulating the market their responsibility? Unfortunately, geopolitical factors complicate matters further, as cutting-edge AI represents another arms race, and Washington wants to maintain its lead over China.

Ultimately, there won't be a savior. As with the memory crisis of 2014 and the various GPU shortages of the past decade, a turnaround can only be expected once the AI hype stalls. For now, Nvidia graphics card prices haven't risen, so this may be the last chance to buy before scalper pricing returns. Still, some in the community, like Sapphire's PR manager, remain hopeful that the storm will eventually pass.

Source: Compiled from Nikkei, etc.


