Surging AI-related infrastructure demands mean the largest AI data centers could cost as much as $200 billion to build by 2030, according to a recent study, with most owned privately rather than by governments.
Research by Epoch AI, which traced the compute behind AI by analyzing 500 systems, shows that AI supercomputers — Epoch’s term for AI data centers or GPU clusters — are doubling in performance every nine months, driven by both larger chip counts and faster chips.
“Two key factors drove this growth: a yearly 1.6x increase in chip quantity and a yearly 1.6x improvement in performance per chip,” the researchers said in a blog post.
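As a sanity check, the two yearly factors the researchers cite compound to roughly the 2.5x annual performance growth reported elsewhere in the study, which in turn implies a doubling time of about nine months. A minimal back-of-the-envelope sketch (the variable names are my own, not Epoch's):

```python
import math

chip_count_growth = 1.6   # yearly increase in chip quantity (per Epoch)
per_chip_growth = 1.6     # yearly improvement in performance per chip (per Epoch)

# Combined yearly performance growth: more chips times faster chips
total_growth = chip_count_growth * per_chip_growth   # ~2.56x per year

# Convert an annual growth factor into a doubling time in months
doubling_months = 12 / math.log2(total_growth)

print(f"combined yearly growth: {total_growth:.2f}x")
print(f"doubling time: {doubling_months:.1f} months")
```

The result lands just under nine months, consistent with the headline figure.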
“While systems with more than 10,000 chips were rare in 2019, several companies deployed AI supercomputers more than ten times that size in 2024, such as xAI’s Colossus with 200,000 AI chips.”
While computational performance is growing 2.5 times per year, costs and power demands are rising almost as quickly: Epoch found hardware costs increasing by 1.9 times each year and power requirements doubling annually.
For example, xAI’s Colossus cost $7bn to build and uses 300MW of power. On the upside, these systems are becoming more efficient, with computational performance per watt increasing by 1.34 times annually.
“If the observed trends continue, the leading AI supercomputer in June 2030 will need 2 million AI chips, cost $200bn, and require 9 GW of power,” researchers said. They noted that projects like Stargate, the $500bn US infrastructure project, suggest the necessary funding should be available, though a lack of power may continue to be a problem.
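Epoch's 2030 figures can be loosely reproduced by compounding the reported growth rates from Colossus's current scale. The five-year horizon below is an assumption for illustration (roughly mid-2025 to June 2030), not Epoch's own methodology:

```python
# Rough extrapolation from xAI's Colossus (200,000 chips, $7bn, 300 MW),
# applying Epoch's reported yearly growth rates: 1.6x chip count,
# 1.9x hardware cost, 2x power. The 5-year horizon is an assumption.
years = 5
chips = 200_000 * 1.6 ** years   # ~2.1 million chips
cost_bn = 7 * 1.9 ** years       # ~$173bn
power_gw = 0.3 * 2 ** years      # ~9.6 GW

print(f"chips: {chips / 1e6:.1f}M, cost: ${cost_bn:.0f}bn, power: {power_gw:.1f} GW")
```

The outputs land near the projected 2 million chips, $200bn, and 9 GW, which suggests the headline projection is broadly a straight-line compounding of the observed trends.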
“To overcome power constraints, companies may increasingly use decentralized training approaches, which would allow them to distribute a training run across AI supercomputers in several locations,” the researchers added.
The research highlights two key concerns for AI developers. First, the cost of infrastructure: critics, and some investors, suggest too much is being spent. Earlier this year, Microsoft said it may take 15 years to see a return on its AI investment, and Anthropic’s CEO has said it may one day cost $100bn to build an AI model.
Second, such systems come with high energy demands that existing power infrastructure may not be able to keep up with. On both sides of the Atlantic, concerns have been raised over the ability of grid networks to keep pace with demand as AI adoption continues to accelerate.
Last year, the CEO of the UK’s National Grid suggested an overhaul of the country’s grid network could be required to accommodate data center infrastructure growth in years to come.
Shift to private systems
Another change the study noted is ownership: 80% of such machines are now owned by companies rather than governments, up from 40% in 2019.
According to Epoch’s research, the shift in ownership is down to a 2.7-times annual growth rate for AI supercomputers at private companies versus 1.9 times in the public sector.
“In addition to faster performance growth, companies also rapidly increased the total number of AI supercomputers they deployed to serve a rapidly expanding user base,” the blog post noted.
The US accounts for 75% of the total computing power in Epoch’s list, with China in second place at 15%.
“Meanwhile, traditional supercomputing powers like the UK, Germany, and Japan now play marginal roles in AI supercomputers,” the researchers noted.
“This shift reflects the dominance of large, US-based companies in AI development and computing. However, AI supercomputer location does not necessarily determine who uses the computational resources, given that many systems in our database are available remotely, such as via cloud services.”