

Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA A100, V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To calculate the NRF, we measure application performance on up to 8 CPU-only servers and then assume linear scaling beyond 8 servers. The NRF varies by application.
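The NRF procedure described above can be illustrated with a short Python sketch. It is a hypothetical reconstruction, not NVIDIA's actual tooling. The function name `node_replacement_factor` and the sample numbers are illustrative, and the linear interpolation between measured CPU-only server counts is an assumption added on top of the linear extrapolation beyond the largest measured count that the text describes. Results where bigger is not better (run times) are converted to rates before comparison.

```python
def node_replacement_factor(cpu_results, gpu_result, bigger_is_better=True):
    """Estimate how many CPU-only servers match one GPU-accelerated server.

    Hypothetical sketch. cpu_results maps a CPU-only server count
    (e.g. 1, 2, 4, 8) to the measured result for that count and must
    contain at least one entry. gpu_result is the GPU server's result on
    the same benchmark. bigger_is_better is True for rates (ns/day,
    Mcells/s) and False for run times (seconds).
    """

    def to_rate(value):
        # Express every result as a rate so that larger always means faster.
        return value if bigger_is_better else 1.0 / value

    target = to_rate(gpu_result)
    prev_n, prev_rate = 0, 0.0  # assumption: zero throughput with zero servers
    for n in sorted(cpu_results):
        rate = to_rate(cpu_results[n])
        if rate >= target:
            # The GPU result falls inside the measured range (assumption):
            # interpolate linearly between neighbouring server counts.
            return prev_n + (target - prev_rate) * (n - prev_n) / (rate - prev_rate)
        prev_n, prev_rate = n, rate
    # Beyond the largest measured count, assume linear scaling at the
    # per-server rate observed at that count.
    return target / (prev_rate / prev_n)


if __name__ == "__main__":
    # Hypothetical CPU-only measurements that scale sublinearly.
    measured = {1: 10.0, 2: 19.0, 4: 36.0, 8: 64.0}
    print(node_replacement_factor(measured, 30.0))   # interpolated inside 1-8 servers
    print(node_replacement_factor(measured, 128.0))  # extrapolated beyond 8 servers
```

With a single measured CPU-only server the sketch reduces to a plain ratio: 147 / 4.05 ≈ 36, which matches the 36x NRF listed for AMBER PME-Cellulose_NPT_4fs on 1x A100 SXM4 80GB. In the hypothetical run above, 8 servers deliver 64 rather than 8 × 10 = 80, so the extrapolated NRF of 16 is higher than the 12.8 obtained by dividing by the single-server result; sublinear CPU-only scaling pushes the extrapolated estimate upward in this way.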

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.8

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | ICON Benchmark: QUBICC 160 km resolution, CUDA Version: 11.8 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.8 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.8

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.8 | GROMACS Benchmark: STMV, CUDA Version: 11.8 | LAMMPS Benchmark: SNAP, CUDA Version: 11.8 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.8 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.8 | MILC Benchmark: Apex Medium, CUDA Version: 11.8

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.8


Detailed A100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.05 | 147 | 294 | 588 | 1,176 | 145 | 289 | 579 | 1,157
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 36x | 73x | 145x | 290x | 36x | 71x | 143x | 286x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.08 | 159 | 318 | 636 | 1,272 | 158 | 315 | 630 | 1,261
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 39x | 78x | 156x | 312x | 39x | 77x | 155x | 309x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 20.37 | 512 | 1,025 | 2,050 | 4,100 | 508 | 1,017 | 2,034 | 4,067
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 25x | 50x | 101x | 201x | 25x | 50x | 100x | 200x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 21.0 | 556 | 1,112 | 2,225 | 4,450 | 551 | 1,102 | 2,203 | 4,407
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 26x | 53x | 106x | 212x | 26x | 52x | 105x | 210x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 82.16 | 1,167 | 2,333 | 4,667 | 9,334 | 1,170 | 2,339 | 4,678 | 9,357
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 14x | 28x | 57x | 114x | 14x | 28x | 57x | 114x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 85.07 | 1,221 | 2,441 | 4,883 | 9,765 | 1,247 | 2,493 | 4,986 | 9,973
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 14x | 29x | 57x | 115x | 15x | 29x | 59x | 117x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.27 | 54 | 107 | 214 | 428 | 53 | 107 | 214 | 428
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 42x | 84x | 169x | 337x | 42x | 84x | 168x | 337x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 9.97 | 132 | 265 | 530 | 1,059 | 135 | 271 | 542 | 1,084
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 13x | 27x | 53x | 106x | 14x | 27x | 54x | 109x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.08

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 36 | 20 | 11 | 7 | 44 | 25 | 13 | 9
Chroma | NRF | szscl21_24_128 | yes | 1x | 32x | 55x | 99x | 163x | 26x | 46x | 84x | 129x

FUN3D

Engineering

Suite of tools, actively developed at NASA, for modeling fluid flow in aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 495 | 52 | 28 | 16 | 11 | 54 | 28 | 16 | 13
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 12x | 22x | 39x | 55x | 11x | 22x | 39x | 49x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 69 | 381 | 450 | 611 | - | 385 | 411 | 535 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 7x | 9x | 12x | - | 7x | 8x | 10x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 20 | 109 | 176 | 258 | 279 | 109 | 128 | 179 | 184
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 8x | 13x | 19x | 21x | 8x | 10x | 13x | 14x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 24 | 44 | 79 | 125 | 24 | 41 | 67 | 83
GROMACS [STMV] | NRF | STMV | yes | 1x | 6x | 11x | 19x | 31x | 5x | 10x | 16x | 20x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 497 | 938 | 1,857 | 3,696 | 507 | 927 | 1,790 | 2,805
GTC | NRF | moi#proc.in | yes | 1x | 14x | 27x | 54x | 108x | 15x | 27x | 52x | 82x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,431 | 317 | 218 | 158 | 134 | 318 | 224 | 165
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 8x | 11x | 15x | 18x | 8x | 11x | 15x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 2,213 | 293 | 197 | 144 | 120 | 291 | 192 | 140
ICON [QUBICC 160 km resolution] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 8x | 11x | 15x | 18x | 8x | 12x | 16x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, and many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 5.04E+08 | 9.50E+08 | 1.73E+09 | 3.14E+09 | 5.10E+08 | 9.40E+08 | 1.61E+09 | -
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 5x | 9x | 17x | 30x | 5x | 9x | 15x | -
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.39E+07 | 2.84E+08 | 5.21E+08 | 8.99E+08 | 1.57E+09 | 2.82E+08 | 5.06E+08 | 8.20E+08 | -
LAMMPS [EAM] | NRF | EAM | yes | 1x | 5x | 10x | 17x | 29x | 5x | 9x | 15x | -
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.48E+05 | 4.44E+06 | 8.31E+06 | 1.52E+07 | 2.44E+07 | 4.45E+06 | 8.33E+06 | 1.43E+07 | 1.83E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 17x | 31x | 58x | 92x | 17x | 32x | 54x | 69x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.13E+05 | 2.18E+06 | 4.34E+06 | 8.65E+06 | 1.66E+07 | 2.08E+06 | 4.20E+06 | 8.29E+06 | 1.59E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 19x | 38x | 76x | 146x | 18x | 37x | 73x | 140x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.75E+07 | 5.19E+08 | 9.66E+08 | 1.71E+09 | 3.02E+09 | 4.98E+08 | 8.85E+08 | 1.42E+09 | -
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 19x | 35x | 62x | 109x | 18x | 32x | 51x | -

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
MILC | Total Time (Sec) | Apex Medium | no | 72,042 | 2,116 | 1,233 | 648 | 373 | 2,173 | 1,145 | 660 | 647
MILC | NRF | Apex Medium | yes | 1x | 37x | 64x | 122x | 212x | 36x | 69x | 120x | 122x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU and AMD CPU: V 3.0a13; Intel CPU: V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.15 | 174 | 347 | 696 | 1,384 | 173 | 344 | 691 | 1,381
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 9x | 18x | 36x | 72x | 9x | 18x | 36x | 72x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.59 | 178 | 360 | 717 | 1,421 | 179 | 357 | 709 | 1,415
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 9x | 18x | 37x | 73x | 9x | 18x | 36x | 72x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.75 | 218 | 436 | 868 | 1,727 | 216 | 429 | 856 | 1,720
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 11x | 21x | 42x | 83x | 10x | 21x | 41x | 83x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.87 | 14 | 27 | 54 | 108 | 13 | 27 | 51 | 107
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 7x | 15x | 29x | 58x | 7x | 14x | 27x | 57x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.81 | 14 | 28 | 56 | 111 | 14 | 28 | 55 | 110
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 8x | 15x | 31x | 61x | 8x | 15x | 30x | 61x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.94 | 16 | 32 | 64 | 128 | 16 | 31 | 62 | 127
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 8x | 17x | 33x | 66x | 8x | 16x | 32x | 65x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.0 CPU; V7.1 GPU

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 718 | 112 | 71 | 48 | 37 | 114 | 69 | 49 | 41
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 7x | 11x | 17x | 22x | 7x | 12x | 16x | 19x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 12,742 | 3,110 | 1,781 | 1,458 | 1,313 | 3,401 | 1,994 | 1,838 | -
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | 9x | 10x | 4x | 6x | 7x | -
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 90,108 | 22,960 | 12,416 | 8,587 | 6,299 | 25,275 | 13,414 | 9,583 | 7,266
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 4x | 7x | 10x | 14x | 4x | 7x | 9x | 12x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 89,617 | 178,568 | 356,408 | 714,385 | 85,304 | 169,986 | 339,992 | 679,229
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 8x | 16x | 31x | 63x | 8x | 15x | 30x | 60x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 12,968 | 25,692 | 51,157 | 102,200 | 12,887 | 25,712 | 51,428 | 102,234
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 3x | 7x | 14x | 27x | 3x | 7x | 14x | 27x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 13,989 | 27,529 | 54,607 | 108,895 | 13,585 | 27,034 | 53,782 | 107,190
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 7x | 14x | 29x | 4x | 7x | 14x | 28x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,847 | 77 | 41 | 22 | 14 | 77 | 40 | 22 | 15
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 27x | 52x | 98x | 148x | 27x | 53x | 98x | 142x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.8

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | ICON Benchmark: QUBICC 160 km resolution, CUDA Version: 11.8 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.8 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.8

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.8 | GROMACS Benchmark: STMV, CUDA Version: 11.8 | LAMMPS Benchmark: SNAP, CUDA Version: 11.8 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.8 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.8 | MILC Benchmark: Apex Medium, CUDA Version: 11.8

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.8


Detailed A30 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.05 | 80 | 161 | 321 | 643
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 20x | 40x | 79x | 159x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.08 | 84 | 168 | 336 | 671
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 21x | 41x | 82x | 165x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 20.37 | 332 | 665 | 1,329 | 2,659
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 16x | 33x | 65x | 131x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 21.0 | 349 | 698 | 1,396 | 2,793
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 17x | 33x | 66x | 133x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 82.16 | 908 | 1,815 | 3,631 | 7,262
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 11x | 22x | 44x | 88x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 85.07 | 910 | 1,821 | 3,641 | 7,282
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 11x | 21x | 43x | 86x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.27 | 29 | 58 | 116 | 231
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 23x | 45x | 91x | 182x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 9.97 | 99 | 199 | 398 | 795
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 10x | 20x | 40x | 80x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.08

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x A30 | 4x A30 | 8x A30
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 35 | 18 | 11
Chroma | NRF | szscl21_24_128 | yes | 1x | 33x | 62x | 103x

FUN3D

Engineering

Suite of tools, actively developed at NASA, for modeling fluid flow in aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 495 | 111 | 56 | 29 | 18
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 5x | 11x | 21x | 35x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 69 | 201 | 292 | 381 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 6x | 7x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 20 | 55 | 92 | 115 | 140
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 3x | 5x | 9x | 10x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 13 | 23 | 39 | 55
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 5x | 10x | 13x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 286 | 535 | 1,055 | 1,781
GTC | NRF | moi#proc.in | yes | 1x | 8x | 16x | 31x | 52x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,431 | 571 | 354 | 233 | 206
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 4x | 7x | 10x | 12x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 2,213 | 502 | 302 | 193 | 164
ICON [QUBICC 160 km resolution] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 4x | 7x | 11x | 13x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, and many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 2.77E+08 | 5.34E+08 | 9.88E+08 | 1.42E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 5x | 10x | 14x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.39E+07 | 1.36E+08 | 2.56E+08 | 4.60E+08 | 6.95E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 3x | 5x | 9x | 13x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.48E+05 | 2.46E+06 | 4.73E+06 | 8.66E+06 | 1.28E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 9x | 18x | 33x | 49x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.13E+05 | 1.11E+06 | 2.22E+06 | 4.42E+06 | 8.62E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 10x | 20x | 39x | 76x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.75E+07 | 2.49E+08 | 4.39E+08 | 7.94E+08 | 1.02E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 9x | 16x | 29x | 37x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
MILC | Total Time (Sec) | Apex Medium | no | 72,042 | 4,920 | 2,131 | 1,145 | 713
MILC | NRF | Apex Medium | yes | 1x | 16x | 37x | 69x | 111x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU and AMD CPU: V 3.0a13; Intel CPU: V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.15 | 91 | 182 | 363 | 728
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 10x | 19x | 38x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.59 | 94 | 187 | 372 | 748
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 5x | 10x | 19x | 38x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.75 | 111 | 221 | 442 | 886
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 5x | 11x | 21x | 43x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.87 | 7 | 14 | 28 | 58
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 8x | 15x | 31x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.81 | 7 | 15 | 30 | 59
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 8x | 16x | 33x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.94 | 8 | 16 | 32 | 65
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 4x | 8x | 17x | 34x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.0 CPU; V7.1 GPU

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 718 | 239 | 109 | 68 | 50
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 7x | 12x | 16x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 12,742 | 3,922 | 2,220 | 1,905 | 1,676
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 3x | 6x | 7x | 8x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 90,108 | 35,026 | 18,221 | 11,868 | 8,110
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 5x | 8x | 11x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 44,070 | 87,787 | 175,692 | 350,683
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 4x | 8x | 16x | 31x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 6,708 | 13,366 | 26,731 | 53,142
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 7x | 14x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,019 | 13,932 | 27,713 | 55,159
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 7x | 15x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,847 | 156 | 80 | 41 | 23
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 14x | 26x | 51x | 92x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.8

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | ICON Benchmark: QUBICC 160 km resolution, CUDA Version: 11.8 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.8

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.8 | GROMACS Benchmark: STMV, CUDA Version: 11.8 | LAMMPS Benchmark: SNAP, CUDA Version: 11.8 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.8 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.8 | MILC Benchmark: Apex Medium, CUDA Version: 11.8


Detailed A40 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.05 | 89 | 178 | 357 | 714
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 22x | 44x | 88x | 176x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.08 | 95 | 190 | 380 | 760
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 23x | 47x | 93x | 186x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 20.37 | 415 | 830 | 1,659 | 3,319
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 20x | 41x | 81x | 163x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 21.0 | 435 | 870 | 1,740 | 3,480
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 21x | 41x | 83x | 166x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 82.16 | 1,030 | 2,061 | 4,121 | 8,242
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 13x | 25x | 50x | 100x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 85.07 | 1,061 | 2,121 | 4,243 | 8,486
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 25x | 50x | 100x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.27 | 31 | 63 | 126 | 252
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 25x | 50x | 99x | 198x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 9.97 | 119 | 238 | 476 | 951
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 12x | 24x | 48x | 95x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.08

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 78 | 41 | 22 | 13
Chroma | NRF | szscl21_24_128 | yes | 1x | 15x | 28x | 52x | 89x

FUN3D

Engineering

Suite of tools, actively developed at NASA, for modeling fluid flow in aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 495 | 231 | 117 | 59 | 32
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 5x | 10x | 19x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 69 | 274 | 386 | 497 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 7x | 10x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 20 | 76 | 114 | 155 | 170
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 4x | 8x | 12x | 13x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 19 | 37 | 60 | -
GROMACS [STMV] | NRF | STMV | yes | 1x | 4x | 9x | 15x | -

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 294 | 544 | 1,068 | 1,800
GTC | NRF | moi#proc.in | yes | 1x | 9x | 16x | 31x | 52x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,431 | 741 | 420 | 262 | 223
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 6x | 9x | 11x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 2,213 | 747 | 415 | 253 | 192
ICON [QUBICC 160 km resolution] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 3x | 5x | 9x | 12x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, and many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.48E+05 | 6.85E+05 | 1.32E+06 | 2.50E+06 | 4.18E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 2x | 3x | 9x | 16x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.13E+05 | 2.44E+05 | 4.88E+05 | 9.76E+05 | 1.94E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 2x | 5x | 9x | 17x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.75E+07 | 5.23E+07 | 1.03E+08 | 2.02E+08 | 3.50E+08
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 2x | 4x | 7x | 13x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
MILC | Total Time (Sec) | Apex Medium | no | 72,042 | 5,868 | 3,019 | 1,721 | 988
MILC | NRF | Apex Medium | yes | 1x | 14x | 26x | 46x | 80x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.15 | 103 | 209 | 418 | 837
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 11x | 22x | 44x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.59 | 109 | 220 | 440 | 883
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 6x | 11x | 22x | 45x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.75 | 145 | 293 | 586 | 1,175
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 7x | 14x | 28x | 57x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.87 | 8 | 15 | 30 | 60
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 8x | 16x | 32x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.81 | 8 | 16 | 32 | 64
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 9x | 18x | 35x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.94 | 10 | 20 | 39 | 79
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 5x | 10x | 20x | 41x
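NAMD results are reported as average ns/day: nanoseconds of simulated time per day of wall-clock time. Converting that rate into wall-clock hours for a fixed simulation length can make the differences easier to read. The sketch below assumes a hypothetical 100 ns run and uses the apoa1_nve_cuda values from the A40 table; `wall_clock_hours` is an illustrative helper, not part of NAMD.

```python
def wall_clock_hours(simulated_ns: float, ns_per_day: float) -> float:
    """Wall-clock hours needed to simulate `simulated_ns` at `ns_per_day`."""
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    # days = simulated_ns / ns_per_day, then convert days to hours
    return simulated_ns / ns_per_day * 24.0


if __name__ == "__main__":
    target_ns = 100.0  # hypothetical run length, not part of the benchmark
    # apoa1_nve_cuda average ns/day from the A40 table
    rates = {"Dual Cascade Lake 6240 (CPU-Only)": 20.75, "1x A40": 145, "8x A40": 1175}
    for config, rate in rates.items():
        print(f"{config}: {wall_clock_hours(target_ns, rate):.1f} h")
```

At these rates the CPU-only server needs about 115.7 hours (roughly 4.8 days) for 100 ns, while the 8x A40 server needs about 2 hours.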

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 12,742 | 3,810 | 2,165 | 1,818 | 1,678
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 3x | 6x | 7x | 8x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 90,108 | 25,429 | 11,993 | 8,385 | 6,005
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 4x | 8x | 11x | 15x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 31,014
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 8,617
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 6,403
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,847 | 203 | 103 | 53 | 29
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 10x | 20x | 40x | 73x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.8

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | ICON Benchmark: QUBICC 160 km resolution, CUDA Version: 11.8 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.8 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.8

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.8 | GROMACS Benchmark: STMV, CUDA Version: 11.8 | LAMMPS Benchmark: SNAP, CUDA Version: 11.8 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.8 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.8 | MILC Benchmark: Apex Medium, CUDA Version: 11.8

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.8


Detailed V100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.05 | 91 | 182 | 363 | 727 | 76 | 152 | 304 | 607
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 22x | 45x | 90x | 179x | 19x | 37x | 75x | 150x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.08 | 96 | 192 | 384 | 768 | 94 | 188 | 377 | 754
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 24x | 47x | 94x | 188x | 23x | 46x | 92x | 185x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 20.37 | 392 | 784 | 1,568 | 3,135 | 340 | 680 | 1,359 | 2,718
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 19x | 38x | 77x | 154x | 17x | 33x | 67x | 133x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 21.0 | 418 | 836 | 1,673 | 3,345 | 358 | 716 | 1,432 | 2,863
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 20x | 40x | 80x | 159x | 17x | 34x | 68x | 136x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 82.16 | 1,004 | 2,007 | 4,014 | 8,029 | 975 | 1,950 | 3,899 | 7,798
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 24x | 49x | 98x | 12x | 24x | 47x | 95x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 85.07 | 1,059 | 2,118 | 4,237 | 8,473 | 1,026 | 2,052 | 4,103 | 8,206
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 25x | 50x | 100x | 12x | 24x | 48x | 96x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.27 | 31 | 62 | 124 | 249 | 22 | 44 | 88 | 176
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 24x | 49x | 98x | 196x | 17x | 35x | 69x | 139x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 9.97 | 119 | 239 | 478 | 955 | 119 | 237 | 475 | 949
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 12x | 24x | 48x | 96x | 12x | 24x | 48x | 95x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.08

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 165 | 31 | 17 | 10 | 142 | 28 | 15 | 13
Chroma | NRF | szscl21_24_128 | yes | 1x | 7x | 37x | 68x | 111x | 8x | 41x | 77x | 85x

FUN3D

Engineering

Suite of tools, actively developed at NASA, that models fluid flow for aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 495 | 99 | 50 | 26 | 15 | 87 | 45 | 23 | 14
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 5x | 12x | 24x | 41x | 6x | 14x | 26x | 43x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
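Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 69 | 258 | 317 | 464 | 253 | 310 | - | 261 | 295 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 6x | 9x | 5x | 6x | - | 5x | 6x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 20 | 69 | 100 | 153 | 60 | 83 | - | 70 | 96 | -
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 3x | 6x | 11x | 3x | 5x | - | 4x | 6x | -
GROMACS [STMV] | ns/day | STMV | yes | 4 | 16 | 30 | 52 | 13 | 25 | 32 | 16 | 29 | 38
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 7x | 13x | 3x | 6x | 7x | 3x | 7x | 9x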

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 273 | 515 | 1,012 | 1,797 | 300 | 556 | 1,087 | 1,904
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 29x | 52x | 9x | 16x | 32x | 55x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,431 | 591 | 353 | 223 | 167 | 819 | 578 | 248
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 4x | 7x | 11x | 15x | 3x | 4x | 10x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 2,213 | 514 | 304 | 192 | 143 | 697 | 438 | 215
ICON [QUBICC 160 km resolution] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 4x | 7x | 12x | 16x | 3x | 5x | 10x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 3.52E+08 | 6.64E+08 | 1.29E+09 | 2.34E+09 | 3.56E+08 | 6.46E+08 | 1.17E+09 | 1.89E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 6x | 12x | 23x | 3x | 6x | 11x | 18x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.39E+07 | 1.24E+08 | 2.66E+08 | 5.37E+08 | 9.61E+08 | 1.27E+08 | 2.65E+08 | 5.07E+08 | 8.19E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 5x | 10x | 18x | 2x | 5x | 9x | 15x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.48E+05 | 2.95E+06 | 5.66E+06 | 1.07E+07 | 1.84E+07 | 3.10E+06 | 5.84E+06 | 1.09E+07 | 1.76E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 11x | 21x | 41x | 70x | 12x | 22x | 41x | 67x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.13E+05 | 1.42E+06 | 2.87E+06 | 5.72E+06 | 1.14E+07 | 1.40E+06 | 2.80E+06 | 5.61E+06 | 1.12E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 13x | 25x | 50x | 100x | 12x | 25x | 49x | 99x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.75E+07 | 2.67E+08 | 5.03E+08 | 9.60E+08 | 1.78E+09 | 2.78E+08 | 5.12E+08 | 9.69E+08 | 1.54E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 10x | 18x | 35x | 64x | 10x | 19x | 35x | 56x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
MILC | Total Time (Sec) | Apex Medium | no | 72,042 | 5,034 | 2,434 | 1,270 | 712 | 3,885 | 2,082 | 1,137 | 1,059
MILC | NRF | Apex Medium | yes | 1x | 16x | 33x | 62x | 111x | 20x | 38x | 70x | 75x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

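Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 8x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.15 | 111 | 223 | 447 | 890 | 67 | 134 | 267 | 533 | 114 | 228 | 457 | 904
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 6x | 12x | 23x | 46x | 4x | 7x | 14x | 28x | 6x | 12x | 24x | 47x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.59 | 116 | 234 | 468 | 935 | 71 | 142 | 283 | 564 | 119 | 237 | 474 | 945
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 6x | 12x | 24x | 48x | 4x | 7x | 14x | 29x | 6x | 12x | 24x | 48x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.75 | 141 | 285 | 570 | 1,146 | 90 | 180 | 360 | 721 | 144 | 288 | 572 | 1,138
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 7x | 14x | 27x | 55x | 4x | 9x | 17x | 35x | 7x | 14x | 28x | 55x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.87 | 9 | 17 | 34 | 68 | 5 | 10 | 21 | 42 | 9 | 18 | 35 | 70
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 5x | 9x | 18x | 36x | 3x | 6x | 11x | 22x | 5x | 10x | 19x | 37x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.81 | 9 | 18 | 36 | 71 | 5 | 11 | 22 | 43 | 9 | 18 | 36 | 73
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 5x | 10x | 20x | 39x | 3x | 6x | 12x | 24x | 5x | 10x | 20x | 40x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.94 | 10 | 20 | 40 | 79 | 6 | 12 | 26 | 51 | 10 | 20 | 40 | 80
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 5x | 10x | 20x | 41x | 3x | 6x | 13x | 26x | 5x | 10x | 21x | 41x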

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 4x V100 SXM2 32GB | 4x V100S PCIe 32GB
NV-WRFg | Seconds / Timestep | Conus_2.5k_JA | no | 6 | 0.62 | 0.68
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 10x | 9x

Quantum Espresso

Materials Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.0 CPU; V7.1 GPU

ACCELERATED FEATURES

  • Linear algebra (matrix multiply)
  • Explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 718 | 271 | 135 | 87 | 60 | 337 | 140 | 92 | 76
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 6x | 9x | 13x | 2x | 6x | 9x | 10x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 12,742 | 3,569 | 2,173 | - | 1,673 | 3,559 | 2,190 | 2,141 | 1,718
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 6x | - | 8x | 4x | 6x | 6x | 7x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 90,108 | - | 17,797 | 12,110 | 8,463 | 34,191 | 18,472 | 12,393 | 8,695
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | - | 5x | 7x | 11x | 3x | 5x | 7x | 10x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 38,168 | 76,056 | 152,093 | 304,139 | 46,001 | 91,749 | 183,356 | 366,874
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 7x | 13x | 27x | 4x | 8x | 16x | 32x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 8,521 | 16,812 | 32,958 | 65,243 | 9,221 | 18,331 | 36,281 | 72,384
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 9x | 17x | 2x | 5x | 10x | 19x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,163 | 14,198 | 28,184 | 56,246 | 8,498 | 16,853 | 33,508 | 66,865
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 7x | 15x | 2x | 4x | 9x | 18x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,847 | 159 | 82 | 43 | 26 | 131 | 68 | 36 | 24
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 13x | 26x | 49x | 81x | 16x | 31x | 59x | 88x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA T4 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.8

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA T4 PCIe | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.8 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.8

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA T4 PCIe | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.8 | GROMACS Benchmark: STMV, CUDA Version: 11.8 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.8 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA T4 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.8 | MILC Benchmark: Apex Medium, CUDA Version: 11.8


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.05 | 52 | 105 | 210
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 13x | 26x | 52x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.08 | 53 | 106 | 213
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 13x | 26x | 52x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 20.37 | 259 | 518 | 1,037
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 13x | 25x | 51x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 21.0 | 260 | 521 | 1,042
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 12x | 25x | 50x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 82.16 | 958 | 1,915 | 3,831
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 23x | 47x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 85.07 | 955 | 1,911 | 3,822
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 11x | 22x | 45x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.27 | 18 | 37 | 73
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 14x | 29x | 58x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 9.97 | 99 | 197 | 394
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 10x | 20x | 40x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.08

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 117 | 40 | 26
Chroma | NRF | szscl21_24_128 | yes | 1x | 10x | 28x | 44x

FUN3D

Engineering

Suite of tools, actively developed at NASA, that models fluid flow for aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 495 | 287 | 144 | 74
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 3x | 8x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 69 | 172 | 243 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 5x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 20 | 44 | 65 | 78
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 3x | 5x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 11 | 21 | 28
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 5x | 6x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 235 | 467 | 886
GTC | NRF | moi#proc.in | yes | 1x | 7x | 14x | 26x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,431 | 971 | 549 | 404
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 4x | 6x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 2,213 | 954 | 531 | 352
ICON [QUBICC 160 km resolution] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 2x | 4x | 6x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MILC | Total Time (Sec) | Apex Medium | no | 72,042 | 7,335 | 3,801 | 2,098
MILC | NRF | Apex Medium | yes | 1x | 11x | 21x | 38x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.15 | 57 | 115 | 231
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 3x | 6x | 12x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.59 | 60 | 120 | 240
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 3x | 6x | 12x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.75 | 77 | 153 | 305
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 7x | 15x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.87 | 4 | 9 | 18
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 5x | 9x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.81 | 5 | 9 | 18
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 3x | 5x | 10x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.94 | 5 | 10 | 21
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 5x | 11x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 8x T4 PCIe
NV-WRFg | Seconds / Timestep | Conus_2.5k_JA | no | 5.51 | 0.90
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 7x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 12,742 | 3,377 | 2,567 | 1,815
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 5x | 7x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 90,108 | 30,651 | 18,100 | 11,239
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 5x | 8x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 22,060 | 44,168 | 88,386
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 2x | 4x | 8x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 5,881 | 11,668 | 23,111
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 3x | 6x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | - | 9,825 | 19,610
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | - | 3x | 5x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,847 | 239 | 122 | 64
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 9x | 17x | 33x