GPU Compatibility: What to Consider – TechMikeNY




From rendering complex graphics to training neural networks, the specialized parallel processing capabilities of a Graphics Processing Unit (GPU) can substantially accelerate modern data center workloads. But of course, just finding a GPU that physically fits into your server isn't enough. The specific model and type of GPU also has to be compatible with the server’s connectors, power budget, and supported software in order to function properly. So in this blog post, we’re getting into the nitty-gritty of GPU compatibility considerations.


Hardware Compatibility Considerations

(Not sure what a Graphics Processing Unit is? Give this a read.)

The most fundamental compatibility consideration is whether the GPU will physically fit and electrically integrate with the server. When it comes to hardware compatibility, these are key questions you'll want to answer:

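Physical card height/length: Will the GPU fit within the chassis?
Server GPUs come in full-height and low-profile form factors, and low-profile cards are what make installation possible in 1U servers. Length and width matter too: a dual-width card takes up the neighboring slot, and a long card needs clearance along the riser.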

Power draw and connectors: Do you have enough power/the right supplies?
GPUs rated above 75W require auxiliary power cables, because 75W is the most a PCIe x16 slot can deliver on its own. For supplementary GPU power, Dell and HP servers rely on riser cards to supply the physical 6-pin or 8-pin cable connections. And when adding those kinds of GPUs, we recommend 1100W power supply units (PSUs) to provide sufficient wattage headroom. Confirm the connector type and that adequate power overhead is available, especially for the more power-intensive high-end cards.
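To make the power math concrete, here’s a minimal Python sketch. The 75W threshold is the one mentioned above; the GPU ratings, the 500W base-system estimate, and the helper names are hypothetical placeholders, so plug in figures from your vendor’s spec sheets or power calculator. Keep in mind that with redundant PSUs, a single supply has to be able to carry the whole load, so budget against one PSU’s rating.

```python
from dataclasses import dataclass

# Maximum power a PCIe x16 slot supplies by itself; anything above this
# has to come over an auxiliary 6-pin or 8-pin cable.
SLOT_POWER_W = 75


@dataclass
class Gpu:
    name: str
    board_power_w: int  # rated maximum board power from the spec sheet


def aux_power_needed_w(gpu: Gpu) -> int:
    """Watts that must be delivered over auxiliary cables instead of the slot."""
    return max(0, gpu.board_power_w - SLOT_POWER_W)


def psu_headroom_w(psu_w: int, base_system_w: int, gpus: list[Gpu]) -> int:
    """PSU capacity left over with every GPU at full rated power.

    base_system_w is your estimate for CPUs, memory, drives, and fans
    (an assumption you supply, not a measured value).
    """
    return psu_w - base_system_w - sum(g.board_power_w for g in gpus)


if __name__ == "__main__":
    card = Gpu("hypothetical 250W card", 250)
    print(aux_power_needed_w(card))                 # 175
    print(psu_headroom_w(1100, 500, [card, card]))  # 100
```

A negative headroom figure means the configuration is over budget before you’ve accounted for any safety margin.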

Connector type: Do you have the right connectors? (The answer is probably yes.)
PCIe is backward compatible: the card and slot negotiate the highest generation they both support. This means a GPU with a PCIe Gen 4 connector can still physically fit and work in a server’s Gen 3 slot, though it will run at the slower Gen 3 speeds.
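Here’s a rough sketch of what that negotiation means in bandwidth terms. It assumes the standard per-lane transfer rates and line encodings for each generation and ignores packet overhead, so real-world throughput will be somewhat lower. The function names are illustrative.

```python
# Per-lane signaling rate (GT/s) and line-encoding efficiency for each PCIe
# generation: Gen 1/2 use 8b/10b encoding, Gen 3 and later use 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def lane_gb_per_s(gen: int) -> float:
    """Usable GB/s per lane, per direction (ignores packet/protocol overhead)."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt_s * efficiency / 8  # each transfer moves 1 bit per lane


def negotiated_link(card_gen: int, slot_gen: int, lanes: int = 16) -> tuple[int, float]:
    """The link runs at the highest generation both ends support."""
    gen = min(card_gen, slot_gen)
    return gen, lane_gb_per_s(gen) * lanes


if __name__ == "__main__":
    gen, bw = negotiated_link(card_gen=4, slot_gen=3)
    print(f"Gen {gen} x16: {bw:.2f} GB/s per direction")
```

By this estimate, a Gen 4 x16 card in a Gen 3 x16 slot gets roughly 15.75 GB/s per direction, half of the roughly 31.5 GB/s it would get in a Gen 4 slot.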

NVIDIA also has its own proprietary form factor for data center GPUs, SXM (SXM4 on the A100, for example). Instead of plugging into a PCIe slot, an SXM module mounts directly onto a compatible server board, which provides more bandwidth. However, it’s mostly used for top-of-the-line GPUs in servers built specifically around it, so an SXM module won’t go into a standard PCIe slot (and PCIe connectors aren’t going anywhere).

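Motherboard support: How many PCIe lanes are you working with?
A GPU can be connected through an x4, x8, or x16 slot (with a riser or adapter where needed). But only a slot wired for all 16 lanes lets an x16 GPU work at full capacity (read: maximum host-to-GPU bandwidth). Keep in mind that some physically x16 slots are only wired for x8, so check the electrical lane count in the server’s documentation.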
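On a Linux server, you can check what a GPU actually negotiated with `lspci -vv`, which reports each device’s link capability (`LnkCap`) and current link status (`LnkSta`). Below is a minimal Python sketch that parses those two lines and flags a card running on fewer lanes than it supports. The helper names are hypothetical, and `lspci` typically needs root to show the link details.

```python
import re
import subprocess

# Matches "Speed 8GT/s, Width x16" as well as newer output such as
# "Speed 2.5GT/s (downgraded), Width x16 (ok)".
LINK_RE = re.compile(r"Speed ([\d.]+)GT/s.*?Width x(\d+)")


def parse_link(lspci_text: str) -> dict:
    """Extract (speed in GT/s, lane width) for LnkCap and LnkSta from `lspci -vv` text."""
    link = {}
    for line in lspci_text.splitlines():
        line = line.strip()
        for key, prefix in (("capability", "LnkCap:"), ("status", "LnkSta:")):
            if line.startswith(prefix):
                match = LINK_RE.search(line)
                if match:
                    link[key] = (float(match.group(1)), int(match.group(2)))
    return link


def read_link(pci_address: str) -> dict:
    """Run lspci for one device, e.g. read_link("3b:00.0") (hypothetical address)."""
    out = subprocess.run(
        ["lspci", "-vv", "-s", pci_address],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_link(out)


def width_downgraded(link: dict) -> bool:
    """True if the card is running on fewer lanes than it supports."""
    return link["status"][1] < link["capability"][1]
```

Find your GPU’s address with plain `lspci`, then pass the result of `read_link()` to `width_downgraded()`. Compare widths rather than speeds: many GPUs lower their link speed when idle to save power, so a slower `LnkSta` speed at idle isn’t necessarily a problem.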

Let's take the Dell R740xd and the Dell R640 for example. The R740xd can fit full-height, dual-slot width GPUs with up to 300W power draw in PCIe Gen 3 x16 slots (its Intel Xeon Scalable processors top out at Gen 3). But the R640 is limited to a single-slot, low-profile GPU that draws less than 75W of power. So matching the server model, chipset generation, PCIe slot bandwidth, and available power to the GPU’s requirements is key.
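Here’s one way to turn that checklist into a quick sanity check, sketched in Python. The slot profiles below are hypothetical and only loosely mirror the two servers described above; the real limits for any specific chassis and riser belong in the vendor’s manual.

```python
from dataclasses import dataclass


@dataclass
class SlotSpec:
    """What one GPU-capable slot (plus its riser) can accommodate."""
    full_height: bool     # False means low-profile cards only
    max_width_slots: int  # 1 = single-width, 2 = dual-width
    max_power_w: int      # slot power plus any auxiliary cable the riser provides


@dataclass
class GpuSpec:
    full_height: bool
    width_slots: int
    board_power_w: int


def fit_problems(gpu: GpuSpec, slot: SlotSpec) -> list[str]:
    """Return the reasons the GPU won't work in the slot (empty list if none)."""
    problems = []
    if gpu.full_height and not slot.full_height:
        problems.append("full-height card in a low-profile-only slot")
    if gpu.width_slots > slot.max_width_slots:
        problems.append(f"{gpu.width_slots}-slot card, but only {slot.max_width_slots} slot(s) of space")
    if gpu.board_power_w > slot.max_power_w:
        problems.append(f"card draws {gpu.board_power_w}W, slot supplies {slot.max_power_w}W")
    return problems


# Hypothetical profiles, loosely modeled on the 2U and 1U examples above.
LARGE_2U = SlotSpec(full_height=True, max_width_slots=2, max_power_w=300)
SMALL_1U = SlotSpec(full_height=False, max_width_slots=1, max_power_w=75)

if __name__ == "__main__":
    big_card = GpuSpec(full_height=True, width_slots=2, board_power_w=250)
    print(fit_problems(big_card, LARGE_2U))  # no problems
    print(fit_problems(big_card, SMALL_1U))  # lists all three mismatches
```

A low-profile card passes the height check in a full-height slot, on the assumption that you swap in a full-height bracket.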


Pro Tip: when ordering a server with a GPU, especially a power-intensive GPU, check to see whether your order comes with an "enablement kit" with auxiliary power cables / other components that don't come standard with the chassis. (Shameless self-promo: any TechMikeNY server with a GPU will automatically come with the cables included.)



Thermals: Heat dissipation is critical


Like processors, GPUs run hot under load, and when they get too hot they automatically lower their clock speeds. So making sure there's enough cooling for the GPU is essential; of all the things that could throttle performance, you definitely don’t want it to be poor airflow.

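Many data center GPUs are passively cooled (they have no fans of their own) and rely entirely on the server’s fans to push air through their heatsinks. So confirm that the GPU’s airflow requirements match the server’s fans, shrouding, etc. This might require special server configurations, such as airflow baffles that route air through the card correctly.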

With multiple GPUs, pay special attention to heat concentration in contiguous slots, as more spacing may be required. Most GPU vendors offer solid guidelines on spacing between cards. 
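To confirm the cooling holds up, watch temperatures while the cards are under sustained load. This minimal sketch assumes NVIDIA GPUs with `nvidia-smi` installed; it queries each GPU’s temperature and power draw and flags any that cross a threshold. The 80°C default is an arbitrary placeholder, not a vendor limit, so check your card’s specifications.

```python
import subprocess

FIELDS = "index,name,temperature.gpu,power.draw"


def _number(value: str):
    """Convert a field to float; nvidia-smi prints [N/A] for unsupported fields."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse `--format=csv,noheader,nounits` output for the FIELDS query."""
    rows = []
    for line in text.strip().splitlines():
        index, rest = line.split(",", 1)
        name, temp, power = rest.rsplit(",", 2)  # split from the right in case a name has commas
        rows.append({
            "index": int(index),
            "name": name.strip(),
            "temp_c": _number(temp.strip()),
            "power_w": _number(power.strip()),
        })
    return rows


def read_gpus() -> list[dict]:
    """Query every GPU once via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


def hot_gpus(rows: list[dict], limit_c: float = 80.0) -> list[dict]:
    """GPUs at or above the (placeholder) temperature threshold."""
    return [r for r in rows if r["temp_c"] is not None and r["temp_c"] >= limit_c]
```

Calling `read_gpus()` every few seconds during a benchmark and passing the result to `hot_gpus()` makes it easy to spot a card in a crowded slot that runs consistently hotter than its neighbors.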

Driver packages and OS support


The GPU driver package connects the physical card to the OS, enabling accelerated workloads by exposing the card’s parallel processing capabilities. To work together properly, though, the driver version needs to be one the vendor has validated and certified for your GPU model. Mixing and matching the wrong driver/hardware combinations can lead to instability, crashes, or limited functionality due to missing libraries. (So make sure you’re downloading the driver version that corresponds to both your specific GPU and the OS version your system is running.)
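Once a driver is installed, you can confirm which version is actually running. This Python sketch assumes an NVIDIA GPU with `nvidia-smi` on the PATH and compares the installed version against a minimum you supply. The real minimum comes from NVIDIA’s release notes for your GPU, OS, and CUDA version; the one in the usage note below is only a placeholder.

```python
import subprocess


def installed_driver_version() -> str:
    """Driver version reported by nvidia-smi (the same for every GPU in the box)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()


def version_tuple(version: str) -> tuple[int, ...]:
    """'535.154.05' -> (535, 154, 5) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed driver is at least the required version."""
    return version_tuple(installed) >= version_tuple(minimum)
```

For example, `meets_minimum(installed_driver_version(), "525.60.13")` returns `True` if the running driver is at least that (placeholder) version. Comparing tuples avoids the string-comparison trap where "99.0" would sort after "100.0".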


Some tools to find compatible drivers:

NVIDIA Driver Compatibility Website (manual):
https://www.nvidia.com/download/index.aspx

NVIDIA GeForce Driver Compatibility Website (manual):
https://www.nvidia.com/en-us/geforce/drivers/

NVIDIA SmartScan Tool (checks your system automatically, requires Java):
https://www.nvidia.com/download/Scansg.aspx?lang=en-us

NVIDIA automatic SmartScan Tool for older systems:
https://www.nvidia.com/download/ScannForce.aspx?lang=en-us

For Windows users, AMD also has an auto-detect tool that will “check your PC for compatible AMD Radeon™ Series Graphics, AMD Ryzen™ Chipsets, and Windows® version” and automatically download compatible drivers:
https://www.amd.com/en/support/kb/faq/gpu-56

Workload optimization


Aside from baseline compatibility, you’ll also want to keep in mind exactly what you plan on using that GPU for. To make the most of your GPU, check that its performance profile matches what your target workloads need. For example, for inference workloads (like serving transformer models for NLP tasks), cards optimized for INT8 precision tend to provide the best throughput relative to total cost of ownership (TCO). Meanwhile, training and graphics workloads need strong floating-point throughput and high memory bandwidth to drive large-scale parallel processing. Matching the card’s capabilities to what you’re doing is crucial. More on that coming soon.
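Precision also drives how much GPU memory a model needs, which is a quick way to rule cards in or out: an INT8 weight takes a quarter of the space of an FP32 one. Here’s a back-of-the-envelope Python sketch that estimates the memory taken by a model’s weights alone at different precisions. The 20% headroom factor for activations and other runtime overhead is a rough assumption, not a rule, and real requirements vary by framework and batch size.

```python
# Bytes needed to store one model parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weights_gib(num_params: int, precision: str) -> float:
    """Memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


def fits_in_vram(num_params: int, precision: str, vram_gib: float,
                 overhead: float = 1.2) -> bool:
    """Rough check; `overhead` is an assumed 20% margin for activations etc."""
    return weights_gib(num_params, precision) * overhead <= vram_gib


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7B-parameter model
    for precision in ("fp32", "fp16", "int8"):
        print(precision, round(weights_gib(params, precision), 1), "GiB")
```

By this estimate, a hypothetical 7-billion-parameter model needs about 26 GiB for its weights in FP32 but only about 6.5 GiB in INT8.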

Final thoughts


Historically, GPUs were designed to be one-size-fits-all. For a while, manufacturers aimed to make deployment easier by standardizing compatibility. But as GPUs become more and more common in data centers, the tasks they’re used for are getting more and more specific. Now that companies are exploring very specific integration types, the market is moving toward boutique solutions.

Thankfully, resources for GPU integration are abundant in the open source community. If you Google a GPU model + task + “Reddit”, you will find several off-duty scientists explaining their drivers, setups, and fixes.

The future is software-defined. That is to say, down the road, servers will become fully software-controlled to allow flexible GPU use, just like spinning up cloud VMs today.

Doing your homework on what GPU cards work with the server specs is a key step in successful GPU integration, regardless of whether it’s for your data center or your homelab testing environment. Checking vendor manuals is a great place to start. But of course, if you want a hand figuring out the best setup, you can always give us a ring.
