5.2 F-Engine RFSoC Hardware and Capabilities
The first stage of the ALPACA back end digital processing system is called an "F-engine." The F-engine includes the digitizers, a first-stage oversampled polyphase filterbank (PFB) channelizer, a packetizer, and 100GbE data transport.
The F-engine is implemented using the Xilinx ZCU216 board with a 3rd generation Zynq UltraScale+ RFSoC. Two of these RFSoC boards have been purchased and are being used at BYU for firmware development. Orders have been placed for the remaining 12 (including 2 spares).
A high-level block diagram of the RFSoC package is shown in the figure below.
The RFSoC integrates programmable logic (FPGA fabric) with the Zynq ARM (A53) processor, high speed serial transceivers, and the RF Data Converter (RFDC), a hardened IP core implementing all RF functionality. The RFDC groups together multi-gigasample per second analog to digital converters (ADCs) and digital to analog converters (DACs) capable of direct RF sampling up to 6 GHz or synthesis up to 9.85 GHz, respectively. These cores also include digital down converters and up converters, respectively, each with a mixer capable of a fixed coarse setting or fine frequency tuning by a numerically controlled oscillator (NCO), along with decimation and interpolation filters. A block diagram of the analog signal path for the ADCs is shown in the following figure.
The ADCs and DACs are grouped into "tiles," loosely similar to the other columnar tile components of a Xilinx FPGA. In this case, however, the ADCs or DACs and their supporting components populate the entire tile. There are two different tile architectures found in RFSoC devices: quad-tile and dual-tile. The number of tiles in a device and their capabilities vary between RFSoC packages and generations. The quad- and dual-tile architectures are depicted in the figure below.
A ZCU216 board is pictured below, and the following table provides a summary of onboard FPGA resources for this particular Gen 3 RFSoC.
|Analog Signal Path|
|#ADCs w/ DDC||16|
|Max ADC rate (Gsps)||2.5|
|ADC Resolution (bits)||14|
|RF Input Bandwidth (GHz)||6.0|
|Decimation Factors||1x, 2x, 3x, 4x, 5x, 6x, 8x, 10x, 12x, 16x, 20x, 24x, 40x|
|LUT RAM (Mb)||13.0|
|100G CMAC w/ RSFEC||2|
ALPACA ZCU216 Configuration
The 49DR RFSoC is capable of sampling inputs from 16 antennas. However, to accommodate the data rates and available board resources for the oversampled PFB, the base ALPACA design will sample 12 inputs per board, using a total of 12 ZCU216 boards in the entire back end. These boards will be rack mounted individually in a custom enclosure (see Section 7.2).
The RFDC will be configured to directly sample the L-band RF signal, without analog mix down, at 2000 Msps, thus targeting the 2nd Nyquist zone. We use the digital down converters and NCO provided in the ADCs to center on the 1300-1720 MHz passband, with a 4x decimation factor. This results in an effective 500 MHz sample rate with the RF band at complex baseband.
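The frequency plan above can be checked with a few lines of arithmetic. The following is a minimal sketch only: the function and variable names are hypothetical, and it assumes ideal real sampling followed by a complex mix to the passband center. It does not show how the NCO is actually programmed through the RFDC driver.

```python
def nyquist_zone(f_hz, fs_hz):
    """Return the 1-based Nyquist zone containing f_hz for real sampling at fs_hz."""
    return int(f_hz // (fs_hz / 2)) + 1


def alias_frequency(f_hz, fs_hz):
    """Return where f_hz appears in the first Nyquist zone (0 to fs/2)."""
    f = f_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f


# Values taken from the text above.
FS = 2000e6                        # ADC sample rate (samples/s)
BAND_LO, BAND_HI = 1300e6, 1720e6  # passband edges (Hz)
DECIMATION = 4

center = (BAND_LO + BAND_HI) / 2           # passband center frequency
zone_lo = nyquist_zone(BAND_LO, FS)        # zone of the lower band edge
zone_hi = nyquist_zone(BAND_HI, FS)        # zone of the upper band edge
alias_center = alias_frequency(center, FS) # where the center lands after sampling
out_rate = FS / DECIMATION                 # complex output sample rate
covered_lo = center - out_rate / 2         # RF span representable at baseband
covered_hi = center + out_rate / 2

print(f"Nyquist zones of band edges: {zone_lo}, {zone_hi}")
print(f"Passband center: {center / 1e6:.0f} MHz, aliased to {alias_center / 1e6:.0f} MHz")
print(f"Output rate: {out_rate / 1e6:.0f} Msps, "
      f"covering {covered_lo / 1e6:.0f}-{covered_hi / 1e6:.0f} MHz")
```

Under these assumptions the whole passband falls inside a single Nyquist zone and inside the span covered by the decimated complex output. Even-numbered Nyquist zones are spectrally inverted after sampling, which the mixer configuration has to account for.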
The ZCU216 provides one 4x25 Gb/s SFP28 network I/O cage whose four lanes can be aggregated to implement a single CAUI-4 100G PHY with RS-FEC (Reed-Solomon forward error correction). The link between the RFSoC and the 100G Arista switch will be 100GBASE-SR4 over an OM4 50/125 multimode LC-to-MPO breakout cable with compatible SFP28 and QSFP28 optical transceivers.
Each ADC of the RFSoC has two built-in calibration procedures: foreground calibration and background calibration. These two modes are provided to compensate for timing and gain offsets due to the interleaved architecture of the RFSoC. The foreground calibration step is executed during the RFDC startup state machine and corrects for DC offsets in the interleaved ADCs. After foreground calibration, the background calibration step operates during ADC run-time as needed to correct for any gain differences and time skew offsets that may be introduced. These calibration processes will be controlled by driver software running on the MPSoC.
The following figure shows the frequency response of the RFSoC built-in anti-alias decimating FIR filter configured at the ALPACA specification of 4x decimation.
Multi-tile and Multi-chip Synchronization
Multi-tile Synchronization (MTS) and Multi-chip Synchronization (MCS) are the processes by which the ADC sampling circuits are synchronized with reference to a common sampling time position. This is necessary to avoid random initial phase relationships between RF data streams at each power cycle start up. Tile synchronization refers to the alignment of samples across the ADCs within the different tiles that make up the architecture on any given single RFSoC chip. Chip synchronization is the alignment of ADCs (tiles) between two or more RFSoC chips (boards). Repeatable and deterministic sample alignment is critical to maintain beamformer weight calibrations for ALPACA. MCS will be the primary mechanism for achieving deterministic sample delay through the ALPACA transport and digital processing (see note 1).
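To illustrate why deterministic alignment matters for beamformer weights, the sketch below estimates the phase error that a sampling-time offset introduces at RF. It is an idealized model only: a pure time offset tau shifts a tone at frequency f by 2*pi*f*tau. The function name is hypothetical, and the 2000 Msps rate and 1300-1720 MHz passband are taken from the configuration described earlier in this section.

```python
def phase_error_deg(f_hz, delay_samples, fs_hz):
    """Phase shift in degrees, wrapped to [0, 360), of a tone at f_hz
    when one input is sampled delay_samples ADC samples late."""
    tau = delay_samples / fs_hz          # time offset in seconds
    return (360.0 * f_hz * tau) % 360.0  # 360 degrees per cycle of f_hz


FS = 2000e6  # ADC sample rate from this section

# Phase error across the passband for a one-sample misalignment.
for f_mhz in (1300, 1510, 1720):
    err = phase_error_deg(f_mhz * 1e6, 1, FS)
    print(f"{f_mhz} MHz: {err:.1f} degrees for a one-sample offset")
```

In this model a single-sample offset that changes between power cycles produces a large, frequency-dependent phase error across the passband, which would invalidate previously calibrated weights.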
An industry standard protocol to achieve deterministic latency and synchronization in applications with high-speed ADCs using serial link I/O is JESD204B. This standard calls for a dedicated clock signal called SYSREF as the global timing reference used to align the devices' internal dividers, clocks, and multi-frame clocks. Xilinx has augmented the standard by implementing "a complementary, simplified scheme for SYSREF" [PG269, Ch.4] (see note 2).
The following figure is the high-level block diagram depicting the MTS hardware design:
There are 4 required clocks to be provided by the PCB to the package pins of the RFSoC: tile sample clocks, Analog SYSREF, PL SYSREF, and PL CLK. The tile sample clocks and Analog SYSREF are input directly into the tiles of the RFDC of the RFSoC. The PL CLK is the fabric clock used to clock samples out of the RFDC output FIFOs, and the PL SYSREF must be a phase-aligned fabric copy of Analog SYSREF (see note 3). The RFDC internally provides the required distribution of these clock signals and the synchronization state machine. For the ZCU216 board these clocking signals are provided by the CLK104 add-on card.
The synchronization state machine is prepared, configured, and started using a software driver that arms and guides the synchronization process (see note 4). Analog SYSREF, which has been distributed across all tiles, is captured with the sample clock, and the sample clock is delay scanned to determine a stable sampling position with Analog SYSREF in the center of the sample clock period. A synchronous reset is then issued using Analog SYSREF to reset the digital part of the tiles (dividers, etc.). The output FIFOs are then aligned by inserting a "marker bit" and comparing the "time of flight" difference between Analog SYSREF and PL SYSREF. This bit is compared at the output of each FIFO, and samples are delayed to match.
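The delay scan step can be pictured with a small software model. This is a hypothetical illustration, not the RFDC state machine, which performs the scan in hardware. It assumes the scan yields one captured SYSREF level per delay tap and that the desired setting is the middle of the longest run of unchanging captures.

```python
def center_of_stable_window(captures):
    """Return the tap index at the middle of the longest run of identical
    capture values (hypothetical model of choosing a stable sampling point)."""
    if not captures:
        raise ValueError("captures must not be empty")
    best_start, best_len = 0, 0
    start = 0
    for i in range(1, len(captures) + 1):
        # A run ends at the end of the list or when the captured level changes.
        if i == len(captures) or captures[i] != captures[start]:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = i
    return best_start + (best_len - 1) // 2


# Example scan: the captured level is stable over taps 2 through 8.
scan = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
tap = center_of_stable_window(scan)
print(f"Selected delay tap: {tap}")
```

Choosing the middle of the stable region keeps the capture point as far as possible from either transition, which is the intent of centering Analog SYSREF in the sample clock period.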
In ALPACA this software driver capability has been implemented as part of the RFSoC-augmented version of tcpborphserver that runs on the MPSoC A53 processor. The following two figures show an example of actual MTS results for eight inputs on the RFSoC (two ADCs per tile). The first plot shows ADC outputs that are not phase aligned, while the second plot shows the ADC outputs following MTS.
MCS is the generalization of MTS, with each board running an independent process dependent on tighter constraints imposed on the SYSREF signaling. In this case:
- PL CLKs between boards must be aligned to better than 1/2 of a sample clock period.
- Analog SYSREFs between boards must be aligned to within 1/4 of a sample period.
- All other conditions from MTS still apply.
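The board-to-board constraints above translate into concrete skew budgets once a sample clock frequency is chosen. The sketch below is illustrative only: the function name is hypothetical, and it assumes the sample clock period corresponds to the 2000 Msps sample rate given earlier, which may not match the clock actually delivered to the tiles.

```python
def mcs_skew_ok(pl_clk_skew_s, sysref_skew_s, sample_clk_hz):
    """Check measured board-to-board skews (seconds) against the MCS limits:
    PL CLK better than 1/2 period, Analog SYSREF within 1/4 period."""
    period = 1.0 / sample_clk_hz
    return {
        "pl_clk": abs(pl_clk_skew_s) < period / 2,
        "analog_sysref": abs(sysref_skew_s) <= period / 4,
    }


SAMPLE_CLK_HZ = 2000e6  # assumed equal to the 2000 Msps sample rate

period_ps = 1e12 / SAMPLE_CLK_HZ
print(f"Sample period: {period_ps:.0f} ps")
print(f"PL CLK budget: {period_ps / 2:.0f} ps, "
      f"Analog SYSREF budget: {period_ps / 4:.0f} ps")

# Hypothetical measured skews between two boards.
result = mcs_skew_ok(100e-12, 200e-12, SAMPLE_CLK_HZ)
print(result)
```

With these assumed numbers the budgets are a few hundred picoseconds or less, which is why the clock distribution between boards needs to be carefully matched.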
The software driver is configured and used in the same way to initiate MTS on each board. It may be necessary for each board to report the measured delay through its output FIFOs. The worst-case delay would then be used as the target latency through the FIFO, which is applied in a subsequent update to each board in the system using the driver API.
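The worst-case selection described above amounts to taking the maximum reported delay and padding every other board up to it. The following is a minimal sketch with hypothetical board names, latency values, and units. It does not call the driver API and only shows the bookkeeping a control script might perform.

```python
def plan_fifo_latency(measured):
    """Given per-board measured FIFO latencies, return the target latency
    (the worst case) and the extra delay each board needs to reach it."""
    if not measured:
        raise ValueError("no latency measurements supplied")
    target = max(measured.values())
    extra = {board: target - latency for board, latency in measured.items()}
    return target, extra


# Hypothetical latencies reported after running MTS on each board.
measured = {"board0": 120, "board1": 128, "board2": 124}
target, extra = plan_fifo_latency(measured)
print(f"Target latency: {target}")
for board, pad in sorted(extra.items()):
    print(f"{board}: add {pad}")
```

Using the worst case as the target means no board is asked for a latency shorter than it can achieve, at the cost of added delay on the faster boards.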
Note 1: As a backup to mitigate the risk of sample misalignment, each transmitter and receiver board in the RFoF link provides for the injection of calibration signals to permit detection and characterization of any phase or gain drift in the fiber or electronics.
Note 2: The Xilinx accuracy specification for MTS operation is +/- one sample clock period. This is because there is no formal specification on the ADC sample clock phases relative to each other. In PCB layout, aligning the sample clocks to each tile can reduce skew, improving the spec to absolute alignment.
Note 3: There are additional clock signal conditioning requirements in order to satisfy the full features of MTS operation.
Note 4: Different MTS functions use separate processes. For example, the NCO update is a triggered event requiring different conditioning of the PL SYSREF and a different driver API configuration.