
NPUs will face a reshuffle

2024-11-27


Source: Content compiled from semiwiki

When the potential of edge AI first captured our imaginations, semiconductor designers realized that performance (and low power) required accelerators, and many decided to build their own. The requirements weren't too complex, commercial alternatives were limited, and who wanted to add another royalty fee to further reduce profit margins? We saw NPUs pop up everywhere, with in-house, startup, and commercial IP portfolios expanding. We're still in this mode, but there are already signs that this free-for-all must end, especially for edge AI.

Accelerating Software Complexity

The flood of innovation around neural network architectures, AI models, and foundation models shows no sign of slowing. Architectures have moved from CNNs to DNNs to RNNs and, so far, to transformers. Models have spanned vision, audio/speech, radar, and lidar through to large language models, and foundation models have given us ChatGPT, Llama, and Gemini. The only certainty is that whatever you consider state-of-the-art today will need to be upgraded next year.

The complexity of the operator/instruction sets required to support these models has also exploded. A simple convolutional model might once have needed fewer than 10 operators; the ONNX standard now defines 186, and NPUs allow this core set to be extended further. Today's models combine matrix/tensor, vector, and scalar operations with math operations (activation, softmax, etc.). Supporting this range requires a software compiler to map standard (simplified) network models onto the underlying hardware, plus an instruction set simulator to verify behavior and check performance on the target platform.
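
To make this concrete, here is a minimal sketch (in Python, assuming the onnx package and a hypothetical local file "model.onnx") of how one might survey the operator types a given model actually requires:

```python
# Minimal sketch: count the operator types a model needs.
# Assumes the onnx package is installed and "model.onnx" is a local file
# (a hypothetical path used purely for illustration).
from collections import Counter
import onnx

model = onnx.load("model.onnx")
op_counts = Counter(node.op_type for node in model.graph.node)

# A small CNN may use only a handful of op types (Conv, Relu, MaxPool, Gemm...),
# while a transformer pulls in MatMul, Softmax, LayerNormalization, and more.
for op, n in op_counts.most_common():
    print(f"{op}: {n}")
print(f"distinct operator types: {len(op_counts)}")
```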

NPU providers must now make model zoos of pre-validated, pre-optimized models (CV, audio, etc.) generally available on their platforms to ease the adoption and cost-of-ownership concerns of buyers facing this complexity.

Accelerating Hardware Complexity

Training platforms have largely settled architecturally; the main question today is whose GPU or TPU you want to use. Not so for inference platforms. Initially these were viewed as scaled-down versions of training platforms, essentially converting floating-point parameters to more tightly quantized fixed-point word lengths. That view has changed dramatically. Most hardware innovation today is happening on the inference side, especially for edge applications, where competitive performance and power consumption are under intense pressure.
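
As a rough illustration of that float-to-fixed conversion, here is a minimal sketch of symmetric per-tensor int8 quantization in NumPy; real inference toolchains add calibration data, per-channel scales, and saturation handling:

```python
# Minimal sketch of symmetric per-tensor int8 quantization, purely illustrative.
import numpy as np

def quantize_int8(weights: np.ndarray):
    scale = np.abs(weights).max() / 127.0       # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```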

When optimizing a trained network for edge deployment, a pruning step zeroes out parameters that have little impact on accuracy. Keep in mind that some models today have billions of parameters, so zeroing many of them could in theory improve performance (and reduce power consumption) significantly, because the calculations involving those parameters can be skipped.
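
A minimal sketch of magnitude pruning (assuming NumPy) shows the core idea; production flows typically prune per layer and fine-tune afterwards to recover accuracy:

```python
# Minimal sketch of magnitude pruning: zero out the smallest-magnitude weights.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    threshold = np.quantile(np.abs(weights), sparsity)  # e.g. 0.9 -> zero ~90% of weights
    mask = np.abs(weights) >= threshold
    return weights * mask

w = np.random.randn(1024, 1024).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.9)
print("fraction zero:", np.mean(w_pruned == 0))
```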

This "sparsity" enhancement works if the hardware runs one calculation at a time, but modern hardware takes advantage of massive parallelism in systolic array accelerators to increase speed. However, this accelerator cannot skip computations spread across the array. There are software and hardware workarounds to regain the benefits of pruning, but these are still under development and are unlikely to be resolved anytime soon.

Convolutional networks were the beginning of modern AI for many of us, and they remain an important component for feature extraction in many AI models, even in vision transformers (ViT). These networks can also run on systolic arrays, but less efficiently than the regular matrix multiplications common in LLMs. Finding ways to further speed up convolution remains a very active research topic.
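
One common way to run convolution on the same hardware is to lower it to matrix multiplication (im2col). A minimal NumPy sketch, stride 1, no padding, single input channel, purely for illustration:

```python
# Minimal sketch of im2col lowering: a 2D convolution re-expressed as one
# matrix multiplication so it can run on a systolic array.
import numpy as np

def conv2d_as_matmul(image: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    k = kernels.shape[-1]                      # kernels: (num_filters, k, k)
    h, w = image.shape
    oh, ow = h - k + 1, w - k + 1
    # gather every kxk patch into one row -> (oh*ow, k*k)
    patches = np.stack([image[i:i + k, j:j + k].ravel()
                        for i in range(oh) for j in range(ow)])
    weights = kernels.reshape(kernels.shape[0], -1)    # (num_filters, k*k)
    return (patches @ weights.T).reshape(oh, ow, -1)   # (oh, ow, num_filters)

out = conv2d_as_matmul(np.random.randn(8, 8), np.random.randn(4, 3, 3))
print(out.shape)   # (6, 6, 4)
```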

On top of these acceleration challenges there are vector calculations such as activations and softmax, which either require math functions that standard systolic arrays do not support, or can run on such arrays only inefficiently because most of the array sits idle during single-row or single-column operations.
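
Softmax, for example, involves an exponential and a reduction over a single vector, neither of which maps naturally onto a multiply-accumulate array. A minimal NumPy sketch:

```python
# Minimal sketch of the kind of vector math that sits poorly on a systolic
# array: a numerically stable softmax over one row of attention scores.
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    shifted = x - x.max()          # subtract the max for numerical stability
    e = np.exp(shifted)            # exp is the "math" op the array lacks
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))
```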

A common approach to this set of challenges is to combine a tensor engine (systolic array), a vector engine (DSP), and a scalar engine (CPU), possibly in multiple clusters. The systolic array handles the operations it does best, vector operations are offloaded to the DSP, and everything else (including custom and math operations) is passed to the CPU, as in the dispatch sketch below.
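
A toy sketch of that dispatch idea; the engine names and operator lists here are illustrative assumptions, not any vendor's actual partitioning rules:

```python
# Toy sketch: route each graph op to the engine class that handles it best.
TENSOR_OPS = {"Conv", "MatMul", "Gemm"}                 # systolic array
VECTOR_OPS = {"Relu", "Softmax", "LayerNormalization"}  # DSP / vector engine

def assign_engine(op_type: str) -> str:
    if op_type in TENSOR_OPS:
        return "tensor-engine"
    if op_type in VECTOR_OPS:
        return "vector-engine"
    return "cpu"                                        # custom / fallback ops

graph = ["Conv", "Relu", "MatMul", "Softmax", "TopK"]
for op in graph:
    print(op, "->", assign_engine(op))
```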

This makes sense, but the solution requires at least three compute engines. Product cost rises with chip area and possible licensing fees, power consumption rises, and the programming and support model becomes more complex: software must be managed, debugged, and updated across all of these engines. You can understand why software developers would prefer to see all this complexity hidden behind a single NPU engine and a single programming model.

Supply Chains/Ecosystems Are Becoming Increasingly Complex

Intermediate manufacturers in the supply chain must build, or at least adapt, models optimized for end-system applications, for example accounting for different lens options in cameras. They don't have the time or margin to adapt to a wide variety of platforms, and their business realities will inevitably limit which NPUs they are prepared to support.

Slightly further out, but not far, software ecosystems are eager to form around high-volume edge markets; one example is software and models for audio personalization in earbuds and hearing aids. These value-added software companies will also gravitate toward the few platforms they are prepared to support.

Survival of the fittest may play out faster here than it did during the early proliferation of CPU platforms. We will still want some competition between options, but either way, the current Cambrian explosion of edge NPUs must soon end.


