[<< Home](/home#5-digital-back-end-design-panel-charge-4)

[<< Section 5.2](./5.2)

## 5.3 XB-Engine Hardware and Capabilities
 
Following the F-engine, the ALPACA digital back end includes a correlator and
beamformer (XB-engine). The XB-engine is implemented on a GPU cluster of 25
high-performance computers (HPCs). GPU-based digital back ends provide
flexibility, as individual hardware components can be easily replaced or
upgraded for improved performance. Compared to an FPGA design, the software that
implements the digital back end is more easily developed and maintained, and
refactoring cycles are faster.
 
The selected HPC is the Tyan Transport HX TN83B8251, a 2U server chassis with a
dual-socket motherboard for 3rd Generation AMD EPYC (Zen 3, Milan) processors.
Each processor supports two dual-width, full-height PCIe 4.0 x16 slots and an
additional half-height PCIe 4.0 x16 slot. Each server is populated with two AMD
EPYC 7313 processors, two NVIDIA A10 Tensor Core GPUs, and 96 GB of DDR4 RAM. To
both receive Ethernet packets from the output of the F-engine over 100GbE and
transmit processed data products to a distributed file system over InfiniBand,
each server has two Mellanox dual-port InfiniBand/EDR QSFP28 network interface
cards (NICs). All 25 (+2 spare) servers, 50 (+4 spare) GPUs, 50 (+4 spare) NICs,
and associated accessories have been purchased. One system is on hand at BYU and
in use for software development; the remainder are expected to be delivered to
BYU by the time of this design review.
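As a rough sanity check on the server I/O described above, the sketch below compares the line rate of one dual-port NIC with the capacity of the PCIe 4.0 x16 slot it sits in, and tallies the cluster totals. The 16 GT/s per-lane rate and 128b/130b encoding are standard PCIe 4.0 figures rather than values taken from this section, 100 Gb/s is the nominal EDR/100GbE port rate, protocol overheads are ignored, and all function and constant names are illustrative.

```python
# Back-of-envelope I/O budget for one XB-engine server (illustrative only).

SERVERS = 25               # production servers, spares excluded
GPUS_PER_SERVER = 2
NICS_PER_SERVER = 2
PORTS_PER_NIC = 2
PORT_RATE_GBPS = 100.0     # nominal EDR InfiniBand / 100GbE port rate

# Assumed PCIe 4.0 link parameters (standard values, not from this section).
PCIE4_GT_PER_LANE = 16.0               # giga-transfers per second per lane
PCIE_LANES = 16
ENCODING_EFFICIENCY = 128.0 / 130.0    # 128b/130b line encoding


def pcie_raw_gbps(gt_per_lane=PCIE4_GT_PER_LANE, lanes=PCIE_LANES):
    """Raw unidirectional link rate in Gb/s, before encoding overhead."""
    return gt_per_lane * lanes


def pcie_usable_gbps(gt_per_lane=PCIE4_GT_PER_LANE, lanes=PCIE_LANES):
    """Unidirectional link rate in Gb/s after 128b/130b encoding."""
    return pcie_raw_gbps(gt_per_lane, lanes) * ENCODING_EFFICIENCY


def nic_line_rate_gbps():
    """Aggregate line rate of one dual-port NIC in Gb/s."""
    return PORTS_PER_NIC * PORT_RATE_GBPS


def cluster_totals():
    """Component counts across the production cluster."""
    nics = SERVERS * NICS_PER_SERVER
    return {
        "gpus": SERVERS * GPUS_PER_SERVER,
        "nics": nics,
        "ports": nics * PORTS_PER_NIC,
    }


if __name__ == "__main__":
    print(f"PCIe 4.0 x16 raw:    {pcie_raw_gbps():.1f} Gb/s")
    print(f"PCIe 4.0 x16 usable: {pcie_usable_gbps():.1f} Gb/s")
    print(f"Dual-port NIC:       {nic_line_rate_gbps():.1f} Gb/s")
    print(f"Cluster totals:      {cluster_totals()}")
```

Under these assumptions, a fully loaded dual-port NIC (200 Gb/s) fits within the roughly 252 Gb/s that a PCIe 4.0 x16 slot can carry in one direction.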
 
The following table presents this configuration in more
detail[^tyan-footnote]:
 
<div align="center">

| Feature | Description |
| :------ | :------ |
| Processors | 2 AMD EPYC 7313 processors (each 16 cores / 32 threads, 3.0 GHz, 128 MB L3 cache, 155 W) |
| Memory | 96 GB DDR4-3200 ECC |
| Hard Drives | 2 TB SATA HDD, 6.0 Gb/s, 7200 rpm |
| PCIe | Gen 4.0[^pcie-footnote], 4 dual-width/full-height x16, 2 half-height x16 |
| GPU | 2 NVIDIA A10 Tensor Core GPUs |
| NIC | 2 Mellanox MCX556A-EDAT ConnectX-5 dual-port InfiniBand EDR |
| LAN Network | Dual-port 1 GbE LAN + 1 GbE dedicated IPMI |
| Power Supply | 2,200 W, 80 PLUS Platinum, 1+1 redundant, 120 VAC/25.2 A, 200-240 VAC/12.6 A |
| Height | 2U |
| Rack Mountable | Yes |
| Drive Bays | 8 hot-swap HDD/SSD, SATA or NVMe |

</div>
 
The following two figures show two platform development servers rack-mounted at
BYU and an inside view of the server chassis, showing the dual-socket CPU
architecture and the riser cards populated with GPUs and NICs.
 
<div align="center">
<img src="../img/dbe/tyan-server-rack-mount.png" width="500">
</div>

<div align="center">
<img src="../img/dbe/inside-view-tyan-server.png" width="500">
</div>
 
The NVIDIA A10 (GA102-890) Tensor Core GPU is a single-width, full-height PCIe
4.0 card and the successor to the T4 series GPUs. The A10 is built on NVIDIA's
Ampere architecture. This model has 72 streaming multiprocessors (SMs), each
containing 128 CUDA cores, four third-generation Tensor Cores, a 256 KB register
file, four texture units, and 128 KB of shared memory. The memory subsystem
consists of twelve 32-bit memory controllers (a 384-bit bus), each paired with
512 KB of L2 cache (6144 KB in total).
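The device-level totals follow directly from the per-SM and per-controller figures in the paragraph above. The short sketch below makes that arithmetic explicit; the `GpuSpec` class and its field names are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Per-SM and per-memory-controller figures for a GPU (illustrative)."""
    sms: int
    cuda_cores_per_sm: int
    tensor_cores_per_sm: int
    mem_controllers: int
    bits_per_controller: int
    l2_kb_per_controller: int

    @property
    def cuda_cores(self) -> int:
        return self.sms * self.cuda_cores_per_sm

    @property
    def tensor_cores(self) -> int:
        return self.sms * self.tensor_cores_per_sm

    @property
    def bus_width_bits(self) -> int:
        return self.mem_controllers * self.bits_per_controller

    @property
    def l2_kb(self) -> int:
        return self.mem_controllers * self.l2_kb_per_controller


# Values as quoted in the text for the A10.
A10 = GpuSpec(
    sms=72,
    cuda_cores_per_sm=128,
    tensor_cores_per_sm=4,
    mem_controllers=12,
    bits_per_controller=32,
    l2_kb_per_controller=512,
)

if __name__ == "__main__":
    print("CUDA cores:  ", A10.cuda_cores)
    print("Tensor cores:", A10.tensor_cores)
    print("Memory bus:  ", A10.bus_width_bits, "bit")
    print("L2 cache:    ", A10.l2_kb, "KB =", A10.l2_kb / 1024, "MB")
```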
 
The Ampere SM architecture improves on previous architectures by supporting
floating-point operations in both datapaths of an SM partition. Previous chips
dedicated one datapath to integer operations and the other to floating-point
operations. This is expected to improve processing and throughput capabilities.
More significantly, the new SM architecture features unified shared memory, data
cache, and texture caching.
 
The Tensor Cores in these processors are specialized execution units designed
specifically for vector/matrix operations that are a core compute function of the
correlator and beamformer.
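To illustrate why matrix hardware suits this workload, consider a generic narrowband formulation (an assumption made here for illustration, not a description of the ALPACA kernels): the correlator accumulates the products `X X^H` of the element voltage samples, and the beamformer applies weights as `W^H X`. Both reduce to complex matrix multiplications. The pure-Python sketch below uses hypothetical function names and shows only the algebra; a GPU implementation would batch these products over frequency channels and time.

```python
def conj_transpose(m):
    """Return the conjugate (Hermitian) transpose of a matrix given as nested lists."""
    rows, cols = len(m), len(m[0])
    return [[complex(m[r][c]).conjugate() for r in range(rows)] for c in range(cols)]


def matmul(a, b):
    """Plain triple-loop complex matrix product a @ b."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def correlate(x):
    """Unnormalized correlation matrix R = X X^H.

    x has shape (n_elements, n_samples); R has shape (n_elements, n_elements).
    """
    return matmul(x, conj_transpose(x))


def beamform(w, x):
    """Beamformed voltages B = W^H X.

    w has shape (n_elements, n_beams); B has shape (n_beams, n_samples).
    """
    return matmul(conj_transpose(w), x)


if __name__ == "__main__":
    # Two elements, two time samples of made-up voltages.
    x = [[1 + 1j, 2 + 0j],
         [0 + 1j, 1 - 1j]]
    # Two beams: the sum and the difference of the two elements.
    w = [[1, 1],
         [1, -1]]
    print("R =", correlate(x))
    print("B =", beamform(w, x))
```

The correlation matrix is Hermitian by construction, so an optimized implementation would typically compute only one triangle of it.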
 
The following table summarizes the A10 GPU specifications given above:
 
<div align="center">

| Feature | Description |
| :------ | :------ |
| Architecture | Ampere |
| SMs | 72 |
| CUDA cores | 9216 |
| Tensor cores | 288 |
| Memory Size | 24 GB GDDR6 |
| Memory Bus | 384-bit |
| Memory Bandwidth | 600 GB/s |
| L1 (shared memory) | 128 KB per SM |
| L2 Cache | 6 MB (512 KB per memory controller) |
| Bus Interface | PCIe 4.0 x16 |
| Form Factor | single-slot, full-height |
| Power Consumption | 150 W (max) |

</div>
 
The figure below shows the A10 GPU installed in a riser card alongside the
Mellanox dual-port EDR NIC.
 
<div align="center">
<img src="../img/dbe/a10-mlnx-riser-card.png" width="500">
</div>
 
[Section 5.4 >>](./5.4)
 
### Footnotes

[^pcie-footnote]: The PCIe 4.0 standard was released in 2017 and improves on
PCIe 3.0 by doubling the unidirectional transfer capacity of a x16 link to
256 Gbps. However, hardware at a competitive price has only recently become
available.

[^tyan-footnote]: See the [Tyan vendor website][tyan-transport-hx] for full
server specifications.
 
[a10-gpu]: https://www.nvidia.com/en-us/data-center/products/a10-gpu/
[a10-ds]: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a10/pdf/datasheet-new/nvidia-a10-datasheet.pdf
[ga102-arch-wp]: https://images.nvidia.com/aem-dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf
[ampere-dev-blog]: https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/
[tyan-transport-hx]: https://www.tyan.com/Barebones_TN83B8251_B8251T83E8HR-2T-N
[amd-epyc-zen3]: https://www.amd.com/en/events/epyc#EPYC-7003-Processor
[epyc-wiki]: https://en.wikipedia.org/wiki/Epyc
[a10-hw-specs]: https://www.techpowerup.com/gpu-specs/a10-pcie.c3793
[mcx556-edat-specs]: https://docs.mellanox.com/display/ConnectX5IB/Specifications#Specifications-MCX556A-EDATSpecifications