Deploying the future

The two facets of AI – training and inference – place different demands on data center architecture.

Training AI models, particularly large language models (LLMs), requires substantial GPU clusters and immense computing power.

This means data centers involved in AI training need to be strategically located in areas with abundant energy and cooling resources, ideally from sustainable sources.

On the other hand, AI inference, which involves processing user input using patterns learned during training, requires minimal latency to ensure a smooth user experience. As a result, compute resources for inference are typically positioned in urban areas, closer to end users, to serve proximity-driven applications.
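
As a rough illustration of why proximity matters: fiber propagation delay alone sets a floor on round-trip time. The sketch below assumes a signal speed of roughly 200,000 km/s in optical fiber and ignores routing, queuing, and processing delays; the path lengths are arbitrary examples.

    # Rough lower bound on round-trip time from fiber propagation alone.
    # Assumes ~200,000 km/s signal speed in fiber (refractive index ~1.5);
    # real paths add routing, queuing, and processing delays on top.
    FIBER_KM_PER_MS = 200.0  # kilometers of fiber traversed per millisecond

    def min_round_trip_ms(fiber_km: float) -> float:
        """Floor on RTT over a fiber path of the given one-way length."""
        return 2 * fiber_km / FIBER_KM_PER_MS

    for km in (50, 500, 5000):  # metro edge, regional, intercontinental
        print(f"{km:>5} km path -> at least {min_round_trip_ms(km):.1f} ms RTT")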

Ensuring AI can be effective

Training and inference workloads impact networks differently. Training is centralized in regions with low-cost energy, while inference pushes greater traffic demands to the edge of the network. For AI to be effective, network planning must account for both.

Increasingly, enterprises require more, smaller data centers at the edge – as close to the end user as possible – to reduce latency. Here are just a few examples of why:

Manufacturing

Microsecond decision-making in a manufacturing environment can help avoid machine downtime and lost revenue.

Smart cities

Modern cities leverage AI and 4K cameras to monitor traffic patterns and detect suspicious activity and crime.
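
A quick back-of-the-envelope calculation shows why all this video cannot simply be hauled to a distant core. The camera count and per-stream bitrate below are illustrative assumptions, not figures from any real deployment.

    # Aggregate bandwidth of a hypothetical city camera network.
    # Camera count and per-stream bitrate are illustrative assumptions.
    cameras = 10_000
    mbps_per_stream = 15  # assumed bitrate for one compressed 4K stream
    total_gbps = cameras * mbps_per_stream / 1_000
    print(f"{cameras:,} cameras -> ~{total_gbps:,.0f} Gb/s of video to analyze")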

Automated cars

This workload will not run in a data center but on a chip within the vehicle itself. However, the vehicle will store sensor data and send it back to the data center for training, enabling the development of better models, which are then pushed back to the car. These updates might happen as often as once a day.
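
A minimal sketch of this loop – inference on the vehicle, training in the data center – is shown below. The class and method names are purely hypothetical, not a real automotive or fleet API.

    # Illustrative sketch only: all names here are hypothetical.
    class DataCenter:
        def __init__(self):
            self.training_pool = []   # sensor data collected from the fleet
            self.current_model = "v0"

        def ingest(self, frames):
            # Received sensor data joins the central training pool;
            # incrementing the version stands in for retraining.
            self.training_pool.extend(frames)
            self.current_model = f"v{len(self.training_pool)}"

        def latest_model(self):
            return self.current_model

    class Vehicle:
        def __init__(self):
            self.sensor_log = []       # sensor data buffered on the vehicle
            self.model_version = "v0"  # model running on the in-vehicle chip

        def drive(self, sensor_frame):
            # Inference happens locally, on the chip in the car.
            self.sensor_log.append(sensor_frame)

        def daily_sync(self, data_center):
            # Roughly once a day: upload buffered sensor data for training
            # and pull down whatever improved model the data center has.
            data_center.ingest(self.sensor_log)
            self.sensor_log.clear()
            self.model_version = data_center.latest_model()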

Shaping tomorrow’s networks

AI training is driven less by latency and more by the need for large compute capacity to process massive data sets, along with higher bandwidth to handle the increased traffic. These factors put networks under strain and push operators toward transport with lower power per bit.

In addition, rising costs and environmental impact are pushing data center operators toward locations far from the edge, where they can use renewable energy and buy cheaper land. This, however, presents the challenge of transporting data traffic across larger distances reliably and securely – putting even more pressure on bandwidth requirements.
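
A back-of-the-envelope calculation illustrates the scale of the problem. The dataset size and line rates below are arbitrary examples, not figures from any specific network.

    # Time to move a training dataset over a single long-haul link.
    # Dataset size and line rates are arbitrary illustrative values.
    dataset_tb = 1_000  # 1 PB of training data, in terabytes
    for line_rate_gbps in (100, 400, 800):
        seconds = dataset_tb * 8_000 / line_rate_gbps  # TB -> gigabits
        print(f"{line_rate_gbps} Gb/s link: {seconds / 3_600:.1f} hours")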

New types of traffic with distinct requirements


Training traffic

Ingress to data center

  • Huge volume
  • Spiky
  • Usually not time sensitive

Egress from data center

  • Small to medium size
  • Distributed to thousands of sites

Inter data center

  • Deterministic
  • Maximum throughput
  • Minimal latency

Inference traffic

Chat/text

  • Low bandwidth
  • Not latency sensitive

Image

  • Low/medium bandwidth
  • Not latency sensitive

Voice

  • Low bandwidth
  • Moderately latency sensitive

Video

  • High bandwidth
  • Highly latency sensitive

Multimodal

  • Highest bandwidth requirements
  • Extremely latency sensitive
