AI and computing technologies have spread into every sector, influencing everything from business to our daily routines. However, unlike the tangible presence of cars, homes or industries, these technologies exist in a seemingly intangible realm, often perceived as residing ‘in the ether’ or ‘in the clouds’. This intangible nature hides their substantial impact on our world, especially in terms of energy consumption. In this post we explore the hidden costs of AI and more generally of a computationally-based society and why it is imperative to invest in new energy-efficient computational paradigms.
The outline will be:
AI Energy Consumption: training, infrastructures, inference;
How to compute the carbon footprint, and the big picture;
🚀 Job & Research opportunities, talks, and events in AI.
Let’s start!
AI Energy Consumption
In 2019, researchers at the Allen Institute for AI highlighted, in a paper entitled Green AI, how the computational requirements of deep learning models were doubling every few months, with an estimated 300,000-fold increase from 2012 to 2018. This translates into a surprisingly large carbon footprint.
Now that we are in the era of Large Language Models, the picture is even worse. Training an LLM from scratch is estimated to use just under 1300 MWh of electricity, equivalent to keeping about 22 million 60 W light bulbs lit for one hour. Once the model is trained, additional energy is consumed whenever the model is deployed and queried by users during inference.
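As a quick sanity check, here is that light-bulb equivalence in a couple of lines of Python (a rough back-of-the-envelope sketch based on the ~1300 MWh estimate above):

```python
# Light-bulb equivalence for the ~1300 MWh training estimate above
training_energy_wh = 1300e6   # 1300 MWh expressed in watt-hours
bulb_power_w = 60             # a standard 60 W bulb consumes 60 Wh per hour

bulbs_lit_for_one_hour = training_energy_wh / bulb_power_w
print(f"{bulbs_lit_for_one_hour:.1e} bulbs lit for one hour")  # ~2.2e7
```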
Let's delve a little deeper into these calculations to better understand today's energy impact of one of the technologies that will most transform our future. After some background, we will walk through a step-by-step calculation for Llama 3 8B.
Training phase: computational power
To estimate the computing power for training, we need to consider the type of hardware used to train LLMs: typically a cluster of many high-performance GPUs (Graphics Processing Units). GPUs are optimised for large-scale parallel computations, such as the vector-matrix and matrix-matrix operations typical of deep learning algorithms. For example, NVIDIA A100 GPUs, used in many modern AI training setups, have a maximum Thermal Design Power (TDP) of about 400 W. This means that one hour of training a deep learning model (such as an LLM) on a single A100 GPU consumes up to 400 Wh of energy.
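To make the arithmetic concrete, here is a minimal sketch of this estimate; the cluster size and training time below are purely hypothetical placeholders, only the ~400 W TDP comes from the discussion above:

```python
# Energy of a hypothetical training run on A100 GPUs (TDP ~400 W each)
gpu_tdp_w = 400          # maximum power draw of one A100, in watts
num_gpus = 1000          # hypothetical cluster size
training_hours = 500     # hypothetical wall-clock training time

gpu_hours = num_gpus * training_hours
energy_wh = gpu_tdp_w * gpu_hours   # watt-hours consumed by the GPUs alone
print(f"{gpu_hours:,} GPU-hours -> {energy_wh / 1e6:.0f} MWh")  # 200 MWh
```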
Infrastructure power consumption
The infrastructure that supports LLM training, including data centre facilities, storage, cooling and lighting systems, also plays a key role in determining energy consumption. A fraction of the energy is used to maintain this supporting infrastructure, and this overhead is quantified by the Power Usage Effectiveness (PUE): the ratio between the total energy entering the facility and the energy actually used for computation. Modern, efficient data centres reach a PUE of about 1.20, i.e. a 20% overhead on top of the training consumption, while inefficient data centres can have a PUE of 2.0 or higher, meaning that half of the energy consumed goes to non-computational overheads.
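In other words, total facility energy is simply the IT (compute) energy multiplied by the PUE. A minimal sketch, reusing the hypothetical 200 MWh figure from the previous snippet:

```python
def facility_energy(it_energy_mwh, pue):
    """Total data-centre energy = IT (compute) energy multiplied by the PUE."""
    return it_energy_mwh * pue

print(facility_energy(200, 1.20))  # efficient data centre: 240 MWh
print(facility_energy(200, 2.00))  # inefficient data centre: 400 MWh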
Inference: how much energy is required to use the model?
When the trained model is deployed and queried by users, it is said to be in the inference phase. A single inference requires far less energy than a training run. However, inference can exceed the corresponding training cost when the model serves trillions of predictions and generated responses per day for billions of users worldwide. Meta noted that testing, training, and inference of their AI models break down to approximately a 10:20:70 split of their total AI energy budget.
To give an example, a single query/prompt to an LLM (such as GPT-4 or Claude) could consume about 0.002 kWh, roughly 50 times more than a normal Google search (about 0.00004 kWh, i.e. roughly 144 joules). Since a Google search is equivalent to lighting a 60 W light bulb for a couple of seconds, an LLM query corresponds to lighting the same bulb for about 2 minutes. For another reference point, charging an average smartphone requires about 0.022 kWh of energy. This highlights the considerable environmental impact of frequent LLM use: if 100 million users query the model five times a day, that amounts to about 1.0 GWh per day. As the number of LLM users grows, the total compute spent on inference is expected to exceed the compute spent on training the deployed model.
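Putting these inference figures together (all per-query energies are the rough estimates quoted above, not measured values):

```python
# Rough comparison of per-query energy, using the estimates quoted above
llm_query_kwh = 0.002        # one prompt to a GPT-4/Claude-class model
google_search_kwh = 0.00004  # one Google search
bulb_power_w = 60

print(f"LLM / search ratio: {llm_query_kwh / google_search_kwh:.0f}x")                # ~50x
print(f"Bulb time per LLM query: {llm_query_kwh * 1000 / bulb_power_w * 3600:.0f} s")  # ~120 s

# Daily energy if 100 million users send 5 prompts each
daily_kwh = 100e6 * 5 * llm_query_kwh
print(f"Daily inference energy: {daily_kwh / 1e6:.1f} GWh")  # ~1.0 GWh
```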
A step by step example: Llama 3 8B
For clarity, let's work through an example using Llama 3 8B:
Meta claimed to have used about 1.3 million (1.3e6) GPU hours to train the model on 15T (15e12) tokens, which corresponds to about 11.25T words assuming a 3/4 word/token ratio. It was not explicitly stated, but H100 GPUs appear to have been used, which are somewhat more efficient than the A100; here we assume a TDP of 350 W. The claimed throughput is 400 TFLOPS (floating-point operations completed per second) at 16-bit precision (FP16). Thus, the total number of floating-point operations (FLOP) was about 1.8e24 (1.3e6 hours * 3600 s/hour * 400e12 FLOP/s). The energy consumed by the GPUs for training alone was 455 MWh (350 W * 1.3e6 hours). Applying the PUE ratio above, the actual energy consumption for training Llama 3 8B comes to about 546 MWh. By comparison, the larger Llama 3 70B required about 10 million GPU hours, totalling roughly 4200 MWh.
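The same back-of-the-envelope calculation as a short script; all inputs are the figures reported above, and the outputs should reproduce the quoted totals:

```python
# Reproducing the Llama 3 8B back-of-the-envelope numbers from the text
gpu_hours = 1.3e6          # GPU-hours reported by Meta
throughput_flops = 400e12  # 400 TFLOP/s per GPU (FP16), as claimed
tdp_w = 350                # assumed H100 power draw used in the text
pue = 1.20                 # data-centre overhead factor

total_flop = gpu_hours * 3600 * throughput_flops      # ~1.8e24 FLOP
it_energy_mwh = tdp_w * gpu_hours / 1e6               # ~455 MWh
total_energy_mwh = it_energy_mwh * pue                # ~546 MWh

print(f"FLOP: {total_flop:.2e}")
print(f"IT energy: {it_energy_mwh:.0f} MWh, with PUE: {total_energy_mwh:.0f} MWh")
```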
How to compute carbon footprint
The carbon footprint of large AI models depends on the carbon intensity of the electricity grid in the country where the hardware runs. A commonly used world average is 0.5 kg of CO2 per kWh. Given the numbers above (a short script reproducing these figures follows the list):
Training the relatively small Llama 3 8B releases about 28e4 kg of CO2, while Llama 3 70B releases about 210e4 kg of CO2;
Querying a GPT-4-like LLM 1000 times can produce about 1 kg of CO2, while 100 million users prompting it 10 times each per day can generate about 365e6 kg of CO2 per year.
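A compact version of these carbon estimates, under the same assumptions (0.5 kg CO2 per kWh, 0.002 kWh per query, and the training energies computed earlier):

```python
carbon_intensity = 0.5    # kg CO2 per kWh (rough world average used above)
query_kwh = 0.002         # energy per LLM query (estimate from the text)

# Training footprints (energy figures from the Llama 3 example above, in kWh)
print(546_000 * carbon_intensity)      # Llama 3 8B:  ~2.7e5 kg CO2
print(4_200_000 * carbon_intensity)    # Llama 3 70B: ~2.1e6 kg CO2

# Inference footprint: 100 million users, 10 queries each per day, for a year
yearly_kwh = 100e6 * 10 * query_kwh * 365
print(yearly_kwh * carbon_intensity)   # ~3.65e8 kg CO2 per year
```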
The big picture
What is the impact of AI on overall energy consumption? Short answer: today it is negligible, but it could grow very quickly. Let's look at some numbers (a short sanity check in code follows the list):
The total energy of the Sun striking the Earth per hour is 174000 TWh (626e18 Joules). About 30% of the energy is reflected back into space, so 122000 TWh are absorbed by the Earth (atmosphere, oceans, land) per hour.
On average, it takes about 0.28-0.56 kWh of solar energy to produce 1 kg of plant biomass. With the energy needed to train one LLM (a few tens of microseconds of the sunlight striking the Earth), the sun can grow about 3.25 million kg of biomass, equivalent to 3250 one-tonne trees. The difference is that those 3250 one-tonne trees can absorb about 6 million kg of CO2 (1 kg of dry biomass absorbs about 1.88 kg of CO2), whereas an LLM queried 10 times a day by 100 million users releases about 365e6 kg of CO2 per year during inference (as computed above);
Humanity's total energy consumption in 2023 was about 180000 TWh. Compared with the solar energy absorbed by the Earth (see above), this means that humanity could run for a whole year on the energy the Sun delivers to the planet in roughly an hour and a half;
Of all the energy produced from primary resources (coal, oil, natural gas, etc.), the part consumed as electricity is about 27000 TWh per year. In 2023, data centres consumed about 500 TWh of electricity, almost 2% of global electricity consumption. This is the energy required to run the Internet, send e-mails, upload and store data, operate AI, and so on. For comparison, the city of New York consumes about 51 TWh per year;
According to some estimates, the AI sector alone could consume up to 134 TWh per year by 2027 and 1000 TWh per year by 2035.
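Here is the short sanity check mentioned above, using the rounded figures quoted in this list:

```python
# Quick sanity checks on the "big picture" numbers quoted above
solar_absorbed_twh_per_hour = 122_000   # solar energy absorbed by the Earth each hour
humanity_2023_twh = 180_000             # humanity's total energy consumption in 2023
electricity_twh = 27_000                # global annual electricity consumption
datacentre_twh = 500                    # data-centre electricity use in 2023

hours_of_sunlight = humanity_2023_twh / solar_absorbed_twh_per_hour
print(f"Sunlight needed to cover 2023 consumption: {hours_of_sunlight:.1f} h")         # ~1.5 h
print(f"Data centres' share of electricity: {datacentre_twh / electricity_twh:.1%}")   # ~1.9%
```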
What’s next?
Not only AI but the whole digital world is expanding exponentially. If humanity does not want to end up, within a few decades, turning all available energy into computation, a central task is to invest in new computing paradigms that are more energy-efficient and closer to Landauer's limit. These include Quantum Computing, Neuromorphic Computing, and many other fascinating approaches inspired by physics and biology. After all, let us always remember that our brains, and many other biological mechanisms, do incredible things with ridiculously little energy. This tells us that it is possible, and it gives us the motivation to explore further. In future episodes of Beyond Entropy, I will look in detail at the energetic cutting edge of computation. Stay tuned!
Note: While writing this post, I noticed a lot of confusion in the literature and in articles when it comes to energy consumption versus power consumption. This is mainly due to the ambiguity between the watt (W) and the watt-hour (Wh). The watt is a unit of power, i.e. energy per unit time (1 W = 1 joule / 1 second). The watt-hour, on the other hand, is a unit of energy (not power!) and is not part of the International System of Units, known internationally by the abbreviation SI (which adds to the confusion). For example, an average light bulb draws 60 W of power, which means that after one hour it has consumed 60 Wh of energy, i.e. 60 W * 3600 s = 216 kJ (1 hour = 3600 seconds).
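A tiny helper makes the conversion explicit (nothing more than the identity 1 Wh = 3600 J):

```python
def wh_to_joules(wh):
    """1 Wh = 1 W sustained for 3600 s = 3600 J."""
    return wh * 3600

print(wh_to_joules(60))  # a 60 W bulb left on for one hour: 216,000 J = 216 kJ
```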
Interesting resources
Energy counter from The World Counts;
Power Hungry Processing: Watts Driving the Cost of AI Deployment?
Opportunities, talks, and events
I share some opportunities from my network that you might find interesting:
🚀 Job opportunities:
Pasqal, a European Quantum Computing company, is looking for a mid-senior Quantum Developer;
Rombo AI, an Italian AI startup, has an open position for a Machine Learning Engineer;
🔬 Research opportunities:
Open call for a PhD position on Photonic Neural Networks and their applications at University of Trento (Lorenzo Pavesi’s Research Group);
PhD position @ Fast Computing, an Italian startup in the HPC sector;
📚 Learning opportunities:
Several fellowships are available at the renewed Master in High-Performance Computing (MHPC) held by SISSA and ICTP in Trieste;
10 fellowships are available for the newly launched Master in Quantum Science & Technology at the University of Bari.
You can find me on LinkedIn or Twitter, where I share Science & Tech news. If you want to know more about me, my projects and my courses (technical and non-technical) then visit my website!