
Nvidia Competitors: AI Chipmakers Fighting the Silicon War

August 30, 2024

7 minutes read

Nvidia joined the $3 trillion market valuation club in June this year, outranking the likes of Apple and Microsoft. This astronomical growth has been possible due to its dominance in the GPU and AI hardware space. However, Nvidia is not the only company making chips for today’s growing AI workloads. Many companies, such as Intel, Google, and Amazon, are working on custom silicon for training AI models and running inference. So, let’s look at the most promising Nvidia competitors in the AI hardware space.

AMD

When it comes to high-performing AI accelerators, AMD is up there competing against Nvidia, both in terms of training and inference. While analysts suggest that Nvidia has a market share of 70% to 90% in the AI hardware space, AMD has started putting its house in order.

AMD introduced its Instinct MI300X accelerator for AI workloads and HPC (High-Performance Computing) in December 2023. AMD claims that the MI300X delivers 1.6x better performance than the Nvidia H100 in inference and roughly equal performance in training.

Not only that, it offers up to 192GB of HBM3 (High-Bandwidth Memory), much more than the Nvidia H100’s 80GB. The MI300X also delivers a memory bandwidth of up to 5.3 TB/s, again higher than the H100’s 3.4 TB/s.
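To put those memory figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the capacity and bandwidth numbers quoted above. The 70-billion-parameter model and the 2 bytes per parameter (16-bit weights) are illustrative assumptions, not figures from AMD or Nvidia, and the estimate ignores activations, KV cache, and framework overhead, so real deployments need more headroom.

```python
import math

# Vendor-quoted peak figures cited in this article.
MI300X = {"memory_gb": 192, "bandwidth_tbps": 5.3}
H100 = {"memory_gb": 80, "bandwidth_tbps": 3.4}


def weights_gb(params_billion, bytes_per_param=2):
    """Approximate size of the model weights in GB.

    Assumes 16-bit weights (2 bytes per parameter) by default.
    """
    return params_billion * bytes_per_param


def min_accelerators(params_billion, memory_gb, bytes_per_param=2):
    """Minimum number of accelerators needed just to hold the weights."""
    return math.ceil(weights_gb(params_billion, bytes_per_param) / memory_gb)


capacity_ratio = MI300X["memory_gb"] / H100["memory_gb"]
bandwidth_ratio = MI300X["bandwidth_tbps"] / H100["bandwidth_tbps"]

# Hypothetical 70-billion-parameter model, chosen only for illustration.
model_params_billion = 70

print(f"Memory capacity ratio (MI300X / H100): {capacity_ratio:.2f}x")
print(f"Memory bandwidth ratio (MI300X / H100): {bandwidth_ratio:.2f}x")
print(f"Weights at 16-bit precision: {weights_gb(model_params_billion)} GB")
print(f"MI300X needed for weights: {min_accelerators(model_params_billion, MI300X['memory_gb'])}")
print(f"H100 needed for weights: {min_accelerators(model_params_billion, H100['memory_gb'])}")
```

Under these assumptions, the weights alone come to 140 GB, which fits on a single MI300X but has to be split across two H100s. That is one reason larger memory capacity matters for inference.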


So AMD is really putting up a fight against Nvidia’s reign. However, AMD still has a long way to go before it establishes itself as a major rival to Nvidia, and the reason lies in software. Nvidia’s moat is CUDA, the computing platform that allows developers to interact directly with Nvidia GPUs for accelerated parallel processing.

The CUDA platform has a large number of libraries, SDKs, toolkits, compilers, and debugging tools, and it’s supported by popular deep learning frameworks such as PyTorch and TensorFlow. On top of that, CUDA has been around for nearly two decades, and developers are more familiar with Nvidia GPUs and their workings, especially in the field of machine learning. Nvidia has created a large community around CUDA with better documentation and training resources.

That said, AMD is investing heavily in the ROCm (Radeon Open Compute) software platform and it supports PyTorch, TensorFlow, and other open frameworks. The company has also decided to open-source some portion of the ROCm software stack. However, developers have criticized ROCm for offering a fragmented experience and a lack of comprehensive documentation. Remember George Hotz calling out AMD for its unstable driver?

The AMD tinybox is on hold until we can build and run the relevant firmware on our GPUs.

The driver is still very unstable, and when it crashes or hangs we have no way of debugging it. We have no way of dumping the state of a GPU. Apparently it isn’t just the MES causing these…

the tiny corp (@__tinygrad__), March 19, 2024

So the bottom line is that AMD must unify its software platform and bring ML researchers and developers into its fold with better ROCm documentation and support. Giants like Microsoft, Meta, OpenAI, and Databricks are already deploying MI300X accelerators with ROCm, which is a good sign.

Intel

Many analysts are writing Intel off in the AI chip space, but Intel has been one of the leaders in inference with its CPU-based Xeon servers. The company recently launched its Gaudi 3 AI accelerator, an ASIC (Application-Specific Integrated Circuit) that is not based on a traditional CPU or GPU design. It handles both training and inference for generative AI workloads.

Intel claims the Gaudi 3 AI accelerator is 1.5x faster at training and inference than the Nvidia H100. Its Tensor Processor Cores (TPCs) and Matrix Multiplication Engines (MMEs) are specialized for the matrix operations that deep learning workloads require.

As for software, Intel is going the open-source route with OpenVINO and its own software stack. The Gaudi software suite integrates frameworks, tools, drivers, and libraries, and supports open frameworks like PyTorch and TensorFlow. Regarding Nvidia’s CUDA, Intel chief Pat Gelsinger recently said:

You know, the entire industry is motivated to eliminate the CUDA market. We think of the CUDA moat as shallow and small.

In case you are not aware, Intel, along with Google, Arm, Qualcomm, Samsung, and other companies, has formed a group called the Unified Acceleration Foundation (UXL). The group aims to create an open-source alternative to Nvidia’s proprietary CUDA software platform. The goal is a silicon-agnostic platform for training and running models on any chip, which would prevent developers from getting locked into CUDA.

Now, what shape the future will take is something only time will tell. But Intel’s effort to dethrone CUDA has started.


Google

If there is an AI giant that is not reliant on Nvidia, it’s Google. Yes, you read that right. Google has been developing its in-house TPU (Tensor Processing Unit), an ASIC design, since 2015. Google says its powerful TPU v5p trains large AI models 2.8x faster than its previous-generation TPU v4, and it is highly efficient at inference. The sixth-gen Trillium TPU is even more powerful. Google uses its TPUs for training, fine-tuning, and inference.
