Introducing scrp-data, our second-generation all-flash distributed storage system. This new, custom-built system will provide four times the capacity and double the sustained throughput of our previous distributed file system, all contained within a 2U rackmount server. Once it is online, this will be one of the fastest storage systems installed among all academic HPC systems in Hong Kong.

Performance of the system:

  • Capacity: 22TB
  • Sequential read throughput: 19.2GB/s
  • Maximum IOPS: 500,000

In terms of performance, scrp-data is a low-spec version of commercial all-flash storage arrays such as the DDN SFA200NVX. If budget permits, it is generally a good idea to acquire complete systems from specialized manufacturers such as DDN and NetApp, since it takes considerable expertise and experimentation to squeeze every last bit of performance out of them.

In any case, if you find yourself needing to build such a system, here is what you need to know:

Things that make a big difference

  • Network interface cards (NIC)
  • Storage disks
  • Memory channels
  • CPU speed
  • Virtual memory settings (see the sketch after these lists)

Things that do not make a big difference

  • Memory speed
  • Distributed file system software tuning
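
On the virtual memory point: the settings in question are the Linux vm.* sysctls. Below is a minimal sketch, assuming a Linux storage server, that reads a few of the knobs most often cited for all-flash I/O tuning. The example values are illustrative assumptions, not scrp-data's actual configuration.

    # A minimal sketch, assuming a Linux storage server: inspect the vm.*
    # sysctls that most often affect buffered I/O on all-flash systems.
    # Example values are illustrative assumptions, not scrp-data's tuning.
    from pathlib import Path

    KNOBS = {
        "dirty_ratio": "10",            # % of RAM dirty before writers block
        "dirty_background_ratio": "5",  # % of RAM before background writeback
        "min_free_kbytes": "1048576",   # memory reserve to avoid allocation stalls
    }

    for name, example in KNOBS.items():
        current = (Path("/proc/sys/vm") / name).read_text().strip()
        print(f"vm.{name}: current={current}, example={example}")
        # To apply (root required):
        # (Path("/proc/sys/vm") / name).write_text(example)

In production these would normally be set persistently in /etc/sysctl.conf or with sysctl -w.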

Because NVMe SSDs are so fast, all-flash storage arrays are typically bottlenecked by other components of the system, most often the network interface. As a general rule, storage throughput maxes out at roughly 45% of theoretical network bandwidth. (Note that NIC speeds are quoted in gigabits per second, so divide by 8 to convert to GB/s: for example, 100Gb/s x 4 = 400Gb/s = 50GB/s.) To see this, consider the stated performance of commercial all-flash storage arrays:

  • DDN SFA200NVX
    • Throughput: 24GB/s
    • NIC: 100Gb/s InfiniBand x 4 = 50GB/s
    • Ratio: 24/50 = 48%
  • DDN SFA200NVX2
    • Throughput: 48GB/s
    • NIC: 200Gb/s InfiniBand x 4 = 100GB/s
    • Ratio: 48/100 = 48%
  • DDN AI400X2
    • Throughput: 90GB/s
    • NIC: 200Gb/s InfiniBand x 8 = 200GB/s
    • Ratio: 90/200 = 45%
  • NetApp EF300
    • Throughput: 20GB/s
    • NIC: 100Gb/s InfiniBand x 4 = 50GB/s
    • Ratio: 20/50 = 40%

scrp-data is no different:

  • scrp-data
    • Throughput: 19.2GB/s
    • NIC: 56Gb/s InfiniBand x 6 = 42GB/s
    • Ratio: 19.2/42 = 45.7%
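
As a quick sanity check, here is a minimal Python sketch of the rule of thumb; the 0.45 efficiency factor is the empirical ratio observed above, not a published specification.

    # Estimate deliverable storage throughput from aggregate NIC bandwidth.
    # Link speeds are in Gb/s; divide by 8 to convert bits to bytes.
    def estimated_throughput(link_gbit: float, nic_count: int,
                             efficiency: float = 0.45) -> float:
        """Aggregate NIC bandwidth in GB/s, scaled by the ~45% efficiency."""
        return link_gbit * nic_count / 8 * efficiency

    print(estimated_throughput(56, 6))   # scrp-data: ~18.9 vs. measured 19.2GB/s
    print(estimated_throughput(100, 4))  # DDN SFA200NVX: ~22.5 vs. stated 24GB/s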

Hardware specifications:

  • Network interface cards: Mellanox Connect-IB HCA card x 2, Mellanox ConnectX-3 HCA card x 2
  • Storage disks: Intel P4510 8TB NVMe SSD x 6, Intel P4600 1.6TB NVMe SSD x 2
  • CPU: AMD EPYC 7302
  • Memory: SK Hynix 32GB 3200MHz ECC RDIMM x 8 / Kingston 32GB 2666MHz ECC RDIMM x 4
  • Chassis: Tyan TN70A-B8026
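
One last cross-check on the numbers: the Intel P4510 8TB is rated at roughly 3.2GB/s sequential read, so six of them should top out near 19.2GB/s in aggregate, which is what the system delivers. Assuming that rating, a quick calculation:

    # Hedged cross-check: aggregate rated disk bandwidth vs. the NIC ceiling.
    P4510_READ_GBPS = 3.2     # published sequential-read rating of the 8TB model
    DISK_COUNT = 6
    NIC_GBYTE = 56 * 6 / 8    # six 56Gb/s InfiniBand links = 42GB/s aggregate

    disk_total = P4510_READ_GBPS * DISK_COUNT
    print(disk_total)              # 19.2GB/s, matching the measured throughput
    print(disk_total / NIC_GBYTE)  # ~0.457, right at the ~45% rule of thumb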