[<< Home](/home#6-firmware-and-software-design-panel-charge-5)

[<< Section 5.4](/5-dbe/5.4)

## 6.1 Beamformer F-Engine Firmware
An overview of the beamformer digital back end and its operating modes was given
in [Section 5.1](../5-dbe/5.1). The capability to provide different modes with
either coarse or narrow-band spectral data products is realized by a two-stage
channelizer architecture. First-stage digital processing will be done in the
[RFSoC](../5-dbe/5.2). The following figure shows a high-level block diagram of
the signal path through the RFSoC for one antenna element. The input signal is
received over the RFoF link, and the output is routed to the second-stage
processor through the 100 GbE network and switch.

<div align="center">
<img src="../img/dbe/f-engine-blk-diagram.png" width="800">

Figure 1: RFSoC single antenna signal flow diagram
</div>

The first-stage digital processing (F-engine) includes sampling antenna voltages,
frequency channelization, and data "packetizing" for network transport.
The following describes the functionality and implementation of the IP used in
the F-engine after the ADC. The RFSoC's sampling capabilities, configuration, and
operation in the context of ALPACA were addressed in [Section 5.2](../5-dbe/5.2).

### 6.1.1 Oversampled Polyphase Filter Bank
A channelizer is a filter bank used to decompose an input signal into bins by
frequency. In high-performance real-time systems, channelization is typically
done with a polyphase filter bank (PFB) rather than a plain fast Fourier
transform (FFT): the PFB remains computationally efficient while reducing
spectral leakage and the signal attenuation near frequency bin edges (called
scalloping loss).

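To make scalloping loss concrete, the following minimal sketch measures how much a plain, unwindowed DFT attenuates a tone that falls midway between two bins. The transform length of 64 and the helper name `dft_bin_magnitude` are illustrative assumptions and are not ALPACA parameters.

```python
import cmath
import math


def dft_bin_magnitude(tone_bin: float, k: int, n: int) -> float:
    """Magnitude of DFT bin k (normalised by n) for a unit complex tone.

    `tone_bin` is the tone frequency in units of bins and may be fractional.
    """
    acc = 0j
    for t in range(n):
        tone = cmath.exp(2j * math.pi * tone_bin * t / n)
        acc += tone * cmath.exp(-2j * math.pi * k * t / n)
    return abs(acc) / n


N = 64  # illustrative transform length (assumption)
centre = dft_bin_magnitude(10.0, 10, N)  # tone exactly on bin 10
edge = dft_bin_magnitude(10.5, 10, N)    # tone midway between bins 10 and 11

scalloping_db = 20 * math.log10(edge / centre)
print(f"scalloping loss at the bin edge: {scalloping_db:.2f} dB")  # about -3.9 dB
```

A PFB prototype filter flattens the response inside each bin, so a tone away from the bin center loses less power than in this plain-FFT case.
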
Single-stage PFB implementations follow a conventional design approach where the
frequency response of the prototype low-pass filter (LPF) has low sidelobes,
narrow transition bands, and an attenuation of -3 dB at the crossover point
between adjacent channels. This results in a uniform power spread for spectra
across the full bandwidth of the instrument. The PFB which accomplishes this is
called a critically sampled (or maximally decimated) PFB because the channelizer
output sample rate per channel, in samples per second, is equal to the effective
channel spacing in hertz [^harris].

In two-stage channelizer architectures, following this same approach and then
processing the output products with a second-stage "zoom" PFB produces two
significant processing artifacts in the fine channelized spectrum, in regions
corresponding to the crossovers between adjacent coarse channels. These
undesirable artifacts are scalloping between adjacent fine channels and
spectral aliasing between fine channels. An example of this behavior is shown
in the following figure:

<div align="center">
<img src="../img/dbe/second_stage_alias.png" width="600">
</div>

Figure 2: System-degrading processing artifacts are present when a critically
sampled PFB is followed by a second-stage channelizer. Note the aliased
frequency tone (red curve) and the scalloping of the white noise floor, which
should be flat (black curve).

Although the first-stage LPF is designed correctly for a single-stage
channelizer, the scalloping shown is the expected result: the second
channelizer samples the filter's transition-band frequency response at a finer
frequency resolution. The spectral images that arise from signals present at the
coarse channel boundary are a more severe artifact. They occur because the
filter was not designed to attenuate aliases to the same level as a conventional
anti-aliasing filter.

To avoid these spectral corruptions when processing in fine "zoom" spectrometer mode,
the channelizer in the ALPACA F-engine is not the conventional critically
sampled PFB, but an oversampled PFB (OSPFB). Here, the decimation rate of the
first-stage channelizer is decreased and the channel passband shape is designed
to allow for a slight overlap between adjacent channels in their crossover
region. Following the output of the second-stage critically sampled PFB, the
fine channels in the overlapped region are discarded, eliminating the unwanted
processing artifacts. With proper prototype filter design, only a few channels of
overlap are required. The OSPFB does increase the channelizer output
sampling rate (compared to the critically sampled case), and this needs to be
accounted for as part of the allocated I/O budget.

The following figure shows a software simulation result comparing the output of
a second-stage PFB for fine spectrometer mode when the first-stage PFB is either
critically sampled or oversampled. A signal of interest is placed between
adjacent channels within the passband. When the first-stage PFB is critically
sampled, we again see the scalloping and the aliased image of the signal of
interest. The OSPFB successfully removes these unwanted artifacts, producing a
uniform power spectrum.

<div align="center">
<img src="../img/dbe/os_pfb_mat.png" width="600">

Figure 3: Improved second-stage spectrum with an OSPFB first stage.
</div>

The architecture for the implementation of an OSPFB can be derived by starting
with that of a critically sampled PFB. As shown in the following figure, a PFB
channelizer producing $`M`$ frequency bin outputs can be considered an
$`M`$-port device where samples are delivered to the $`M`$ branches of a
polyphase LPF with filter outputs subsequently processed by an $`M`$-point FFT.

<div align="center">
<img src="../img/dbe/cspfb-blk-diagram.png" width="600">

Figure 4: Critically sampled PFB block diagram.
</div>

In the critically sampled case, $`M`$ samples are delivered to the core per
computation of the $`M`$ branch filter outputs and $`M`$-point FFT. The OSPFB
changes the decimation rate to some $`D`$ less than the critical rate $`M`$
($`D < M`$), increasing the sampling rate at each output port by the ratio
$`M/D`$. In practice this is done by shifting $`D`$ samples into the core per
computation of branch filter and FFT outputs.

Shifting by $`D`$ samples instead of $`M`$ introduces a frequency-dependent
phase offset that is not accounted for by the $`M`$-point FFT kernel. This phase
offset is compensated by adding a barrel sample rotator, which re-aligns the
$`M`$-path filter outputs with their respective transform inputs. The following
figure shows the modified block diagram for the OSPFB implementation with the
addition of the phase compensation buffer.

<div align="center">
<img src="../img/dbe/ospfb-concept-blk-diagram.png" width="600">

Figure 5: Oversampled PFB block diagram.
</div>

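The role of the sample rotator can be checked numerically by comparing the oversampled polyphase structure against a direct mix, filter, and decimate channelizer. The sketch below is a toy pure-Python model and not the ALPACA firmware: the sizes ($`M = 8`$, $`D = 6`$, which gives the same 4/3 oversampling ratio), the 4 taps per branch, the Hann-windowed sinc prototype, the mixing sign convention, and all function names are assumptions chosen for illustration.

```python
import cmath
import math
import random

M, D, P = 8, 6, 4   # channels, decimation (D < M), taps per branch (toy values)
L = M * P           # prototype filter length


def prototype_filter():
    """Hann-windowed sinc low-pass prototype (illustrative design only)."""
    h = []
    for i in range(L):
        t = (i - (L - 1) / 2) / M
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * i / (L - 1))
        h.append(sinc * window / M)
    return h


def sample(x, t):
    """Input sample at time t, treating samples outside the record as zero."""
    return x[t] if 0 <= t < len(x) else 0j


def ospfb_polyphase(x, h, n):
    """All M channel outputs at output index n, using the polyphase structure."""
    # Branch r filters with taps h[r], h[r + M], h[r + 2M], ...
    v = [sum(h[r + p * M] * sample(x, n * D - r - p * M) for p in range(P))
         for r in range(M)]
    # Circular shift: compensates for advancing the input by D instead of M.
    s = (n * D) % M
    u = [v[(r + s) % M] for r in range(M)]
    # Unnormalised inverse DFT across the M branches.
    return [sum(u[r] * cmath.exp(2j * math.pi * k * r / M) for r in range(M))
            for k in range(M)]


def ospfb_direct(x, h, n):
    """Reference: mix channel k to baseband, low-pass filter, keep every D-th sample."""
    out = []
    for k in range(M):
        acc = 0j
        for i in range(L):
            t = n * D - i
            acc += h[i] * sample(x, t) * cmath.exp(-2j * math.pi * k * t / M)
        out.append(acc)
    return out


rng = random.Random(0)
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
h = prototype_filter()

max_err = max(abs(a - b)
              for n in range(5, 30)
              for a, b in zip(ospfb_polyphase(x, h, n), ospfb_direct(x, h, n)))
shift_states = [(n * D) % M for n in range(4)]

print("max |polyphase - direct| =", max_err)
print("circular shift states    =", shift_states)  # [0, 6, 4, 2]
```

In this model the shift cycles through $`M/\gcd(M, D)`$ states. Under the same assumptions, a 2048-channel OSPFB with a 4/3 oversampling ratio ($`D = 1536`$) would cycle through the four shifts 0, 1536, 1024, and 512.
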
The ALPACA F-engine OSPFB is a custom-developed IP core that accounts for the
trade-offs between the number of parallel antenna signals and the available FPGA
resources, resulting in a flexible and efficient implementation.
Design and implementation of this custom ALPACA hardware OSPFB IP for a single
antenna input on the RFSoC have been completed.
The following figure shows a complete post-synthesis hardware simulation (bit and cycle
accurate) of the first-stage ALPACA-specified OSPFB (2048 channels, oversample
ratio 4/3, 8 polyphase taps) followed by a second-stage software 32-point
critically sampled PFB. The core is functional and working as expected.

<div align="center">
<img src="../img/dbe/ospfb-hw-sim-output.png" width="600">

Figure 6: Fine spectrum plot of the ALPACA OSPFB output with a single tone input. Note the lack of scalloping or aliasing.
</div>

### 6.1.2 Packetizer
The document linked below specifies the detailed Ethernet jumbo packet format for
data transfer from the RFSoC F-engine digitizer and frequency channelizer to
the GPU XB-engine digital beamformer. The data transfer is handled by a 60-port
100 GbE switch, which performs a large "corner turn" operation to
reorder data from being sequenced by antenna index to being sequenced by
frequency channel index. Each F-engine RFSoC handles 12 PAF antennas across all
frequency channels. After the corner turn, these jumbo packets are re-routed so
that each GPU processes 25 (out of 1300) frequency channels for all 138
(+6 spares) antenna signal streams.

Another important aspect of the packetizer format design shown in the linked
document below is the way frequency channels from each F-engine (each with a
unique FID index number as shown in the table) are distributed across the 50 GPU
XB-engines (each with a unique XID index). The processing load for some
XB-engine processing modes, such as HI observations using a "zoom" fine
resolution spectrometer, is so high that the digital back end cannot process the
full 305.1 MHz bandwidth. The observer in these modes usually has no need for
the full bandwidth, so we do reduced-width subband processing. However, if
channels were assigned to GPUs (XIDs) sequentially, filling up one XID with
channels before moving on to the next, the system would fail in modes with
increased computational demand even with reduced bandwidth. The packet format
handles this by "dealing out like playing cards" one channel per XID until all
50 have one, then starting over for the next 50 channels, and so on. When the
processing bandwidth is reduced because processing demands will not support
full-bandwidth operation, the load is then still evenly distributed across all
XIDs rather than concentrated on a few.

[Ethernet Packet Specifications](../uploads/7666d16ef1f7fb6c19a746e2dbf23508/Packet_Format_2.0.pdf)

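The effect of dealing channels out one per XID can be illustrated with a small sketch. The zero-based channel indexing, the modulo mapping, and the example subband of 200 contiguous channels are assumptions for illustration only; the linked packet specification defines the actual assignment.

```python
N_XID = 50          # GPU XB-engines (from the text)
CHANS_PER_XID = 25  # channels per GPU at full bandwidth (from the text)


def xid_round_robin(chan: int) -> int:
    """Deal channels out like playing cards: one per XID, then start over."""
    return chan % N_XID


def xid_sequential(chan: int) -> int:
    """Fill one XID with consecutive channels before moving to the next."""
    return chan // CHANS_PER_XID


def load_per_xid(channels, assign):
    """Number of channels each XID must process for the given channel set."""
    load = [0] * N_XID
    for chan in channels:
        load[assign(chan)] += 1
    return load


# Hypothetical reduced-bandwidth subband: 200 contiguous channels.
subband = range(300, 500)

rr = load_per_xid(subband, xid_round_robin)
seq = load_per_xid(subband, xid_sequential)

print("round robin: active XIDs =", sum(1 for n in rr if n), "max load =", max(rr))
print("sequential:  active XIDs =", sum(1 for n in seq if n), "max load =", max(seq))
```

With these assumed numbers, the round-robin mapping gives every one of the 50 XIDs 4 channels, while the sequential mapping leaves 8 XIDs carrying their full 25 channels and the other 42 idle.
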
### 6.1.3 UDP Framer and 100 GbE
The UDP framer was developed by the Electronic Systems Design Group of
Rutherford Appleton Laboratory. This core converts AXI4-Stream data frames
from the F-engine packetizer into IEEE 802.3 Ethernet and IPv4 packets. The core
is very flexible, with a receive path, an AXI4-Lite memory-mapped control
interface, and optional PING and other IPv4 protocol functions. ALPACA will only
use the UDP core to transmit packets and its ARP capabilities for
destination IP address lookup. The outputs of the UDP core are then sent to our
custom wrapper IP for the integrated 100G CMAC PHY of the RFSoC. This core
implements CAUI-4 100G using RS-FEC (Reed-Solomon forward error correction) for
use on a 100GBASE-SR4 link.

The output data rate of each of the 12 RFSoCs will be 81.8 Gbps. After
distribution to the 25 HPCs (50 GPUs), the rate drops to 39.3 Gbps per HPC,
carried over two 100 Gigabit NICs per HPC.

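The two quoted rates are consistent with each other, as the short check below shows. It uses only the figures from the text; the even split of each HPC's traffic across its two GPUs is an assumption added for illustration.

```python
N_RFSOC = 12                # F-engine RFSoC boards (from the text)
RATE_PER_RFSOC_GBPS = 81.8  # output rate per RFSoC (from the text)
N_HPC = 25                  # HPC nodes, two GPUs each (from the text)

aggregate_gbps = N_RFSOC * RATE_PER_RFSOC_GBPS
per_hpc_gbps = aggregate_gbps / N_HPC
per_gpu_gbps = per_hpc_gbps / 2  # assumes an even split across the two GPUs

print(f"aggregate into the switch: {aggregate_gbps:.1f} Gbps")  # 981.6
print(f"per HPC:                   {per_hpc_gbps:.1f} Gbps")    # 39.3
print(f"per GPU (assumed split):   {per_gpu_gbps:.1f} Gbps")    # 19.6
```
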
[Section 6.2 >>](./6.2)

### Footnotes
[^harris]: F. J. Harris, Multirate Signal Processing for Communication Systems.
Upper Saddle River, NJ, USA: Prentice Hall PTR, 2004.