Benchmarking the SOMOD Neural PC: A Deep Learning Engineer’s Experience

As part of the launch of SOMOD PCs, the MBUZZ Labs team tasked me with benchmarking the new SOMOD Neural PC to evaluate its AI performance. Given the workstation’s powerful specs, I set out to test its deep learning capabilities using industry-standard benchmarks.

Initial Benchmarking Roadblocks

My initial plan was to run Geekbench AI, but I quickly hit a roadblock—it wasn’t detecting the GPU, making it unsuitable for proper evaluation. Next, I considered MLPerf, one of the most recognized AI benchmarking suites. However, downloading the massive datasets required for MLPerf benchmarks would have consumed significant bandwidth and time, so I decided to look for an alternative.

That’s when I stumbled upon PyTorch’s deep learning benchmark repository on GitHub. It provided a lightweight yet effective way to test AI workloads. I made modifications to the test files and adjusted batch sizes to optimize performance for the RTX 5880 Ada GPU. With these tweaks, I finally had a solid benchmarking setup.
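Before any long benchmark run, it's worth confirming that the framework actually sees the GPU, since that's exactly what ruled out Geekbench AI here. A minimal PyTorch sanity check looks like this (the printed values will of course differ per machine):

```python
import torch

# Confirm PyTorch detects the GPU before investing in long benchmark runs.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GiB")
```

On the workstation under test, this reports the RTX 5880 Ada and its full VRAM pool, which is the green light to proceed.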

Hardware Specifications of the SOMOD Neural PC

Here’s the hardware that powers this AI workstation:

  • CPU: AMD Ryzen Threadripper PRO 5955WX (16 Cores, 32 Threads)
  • Memory: 256GB RAM
  • GPU: NVIDIA RTX 5880 Ada Generation (48GB VRAM)
  • NVIDIA Driver: 550.120
  • CUDA Version: 12.6.77
  • cuDNN: Version bundled with the PyTorch build
  • Motherboard: ASUSTeK Pro WS WRX80E-SAGE SE WIFI
  • Operating System: Ubuntu 22.04.5 LTS
  • PyTorch Version: 2.5.0a0+e000cf0ad9.nv24.10

Deep Learning Benchmarks: Real-World Performance

After setting up the benchmarks, I tested various AI workloads to see how well the SOMOD Neural PC handled them. Here’s what I found:

1. Object Detection – SSD (Single Shot MultiBox Detector) with ResNet50 Backbone on COCO Dataset
  • AMP (Automatic Mixed Precision): 324 images per second
  • FP32: 184 images per second

AMP provided a 1.75x performance boost, demonstrating real-time object detection capabilities.
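For readers who haven’t used it, Automatic Mixed Precision wraps the forward pass in an autocast context so matrix math runs in reduced precision, while a loss scaler protects float16 gradients from underflow. Here is a minimal sketch of one AMP training step; the model and data are toy stand-ins, not the actual benchmark workloads:

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 on GPU; bfloat16 is the safe autocast dtype on CPU.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# The loss scaler guards float16 gradients against underflow;
# it is a transparent no-op when disabled (e.g. on CPU).
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

inputs = torch.randn(32, 128, device=device)
targets = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device, dtype=amp_dtype):
    logits = model(inputs)  # matmuls run in reduced precision
    loss = nn.functional.cross_entropy(logits, targets)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

The same pattern, applied to the benchmark models, is what produces the AMP numbers throughout this post.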

2. NLP – BERT on SQuAD v1.1
  • FP16 (with AMP): 237 sequences per second
  • FP32: 134 sequences per second

AMP delivered roughly a 1.8x speedup over FP32, making mixed precision a clear win for language model training.

3. Neural Machine Translation – GNMT
  • FP16 (with AMP): ~133,000 tokens per second
  • FP32: ~82,000 tokens per second

AMP’s smaller memory footprint allowed larger batch sizes and roughly 1.6x faster training throughput.

4. Recommendation Systems – Neural Collaborative Filtering (NCF) on ML-20M Dataset
  • FP16: ~22.1 million samples per second
  • FP32: ~21.5 million samples per second

Both precisions performed nearly identically. NCF throughput is dominated by embedding lookups and memory bandwidth rather than dense math, so precision optimizations add little for recommendation models like this.

5. Image Classification – ResNet50 on Synthetic ImageNet Data
  • AMP: 1,113 images per second
  • FP32: 624 images per second

At roughly 1.8x the FP32 throughput, AMP is the clear choice for high-speed training.

Benchmark Results: The Numbers Speak for Themselves

| Model | Precision | Key Metric | Value | Notes |
|---|---|---|---|---|
| SSD (ResNet50 backbone) | AMP | Images/second | 324 | Real-time object detection capability. |
| SSD (ResNet50 backbone) | FP32 | Images/second | 184 | Roughly half the AMP throughput. |
| BERT (Base, SQuAD) | FP16 (with AMP) | Training sequences/second | 237 | |
| BERT (Base, SQuAD) | FP32 | Training sequences/second | 134 | About 43% slower than FP16 with AMP. |
| GNMT (translation) | FP16 (with AMP) | Tokens/second (training) | ~133,000 | Mixed precision allows larger batch sizes and faster runs. |
| GNMT (translation) | FP32 | Tokens/second (training) | ~82,000 | |
| NCF (recommendation) | FP16 | Training samples/second | ~22.1 million | |
| NCF (recommendation) | FP32 | Training samples/second | ~21.5 million | Nearly on par with FP16. |
| ResNet50 (ImageNet) | AMP | Images/second | ~1,113 | Nearly double the FP32 throughput. |
| ResNet50 (ImageNet) | FP32 | Images/second | ~624 | |
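The speedups quoted throughout this post can be recomputed directly from the measured throughputs in the table. The pairs below are the (mixed precision, FP32) values reported above:

```python
# Recompute the AMP-vs-FP32 speedups from the measured throughputs above.
results = {
    "SSD (ResNet50 backbone)": (324, 184),    # images/s
    "BERT (Base, SQuAD)": (237, 134),         # sequences/s
    "GNMT": (133_000, 82_000),                # tokens/s
    "NCF": (22_100_000, 21_500_000),          # samples/s
    "ResNet50 (ImageNet)": (1_113, 624),      # images/s
}
speedups = {name: amp / fp32 for name, (amp, fp32) in results.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
```

The compute-bound models land between 1.6x and 1.8x, while NCF gains only about 3%, which is consistent with it being memory-bound rather than compute-bound.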

Final Thoughts: Why the SOMOD Neural PC is a Game Changer

🔹 AMP Optimizations Deliver Significant Speed Gains – Automatic Mixed Precision enhances performance without compromising accuracy.

🔹 High Memory Capacity – The 256GB RAM enables large batch sizes and efficient training.

🔹 Strong AI Performance – The Threadripper PRO CPU + NVIDIA RTX 5880 Ada GPU combo delivers workstation-class deep learning power.

🔹 AI-Ready Software Stack – With CUDA 12.6, cuDNN, and PyTorch optimizations, the system is future-proof for next-gen AI workloads.

Need a High-Performance AI Workstation?

If you’re looking for a powerful deep learning machine that can accelerate your AI workloads, the SOMOD Neural PC is the ideal choice.

➡️ Order yours today at somodsystems.com and experience next-level AI computing! 🚀
