- 03 Jun, 2022 1 commit
-
-
Mitch Burnett authored
Previously, the user would set the rtbf context to a weight array and then call update weights; that path was removed. Also, initialization of the beamformer used to leave the device weight memory in an indeterminate state. The init now initializes the weights to all ones [CMPLXF(1.0, 0)], and the user then makes a call to `update_weights` to change them. This allows the beamformer to at least not be completely broken initially. The update mechanics now have `update_weights` accepting an array of float values to load to the device: the idea is that the user creates some weights, updates them manually, and is still responsible for freeing that memory. Also simplified and improved the logic in `update_weights` to combine the conjugate transpose into one loop instead of spreading it across multiple temporary arrays. Two versions are left here; a follow-up commit will remove the second, because I think I personally favor the first implementation as it is more descriptive in its memory access.
-
- 02 Jun, 2022 2 commits
-
-
Mitch Burnett authored
this allows for the GPU to have DMA access to host memory directly
-
Mitch Burnett authored
-
- 01 Jun, 2022 1 commit
-
-
Mitch Burnett authored
Use the `compiletime_info` struct in the library. Fix static allocations that were larger than default stack limits for large block testing. Clean up `cublas_beamformer.h` definitions.
-
- 31 May, 2022 1 commit
-
-
Mitch Burnett authored
Compile parameter names have changed to be more descriptive; still working on some of the size parameters. Working on a struct containing the compiled info. Started working on registering host memory, but stopped to get the parameters renamed and to have the compiled info available to compute sizes. This has the start of detecting whether pinned memory regions overlap, which seems to happen for small beamform sizes.
-
- 29 May, 2022 3 commits
-
-
Mitch Burnett authored
mwr probably had it there because he might have thought it was needed to make the data `float*`. But really, the memory can simply be reinterpreted.
-
Mitch Burnett authored
-
Mitch Burnett authored
-
- 28 May, 2022 6 commits
-
-
Mitch Burnett authored
Only the functionality to update beamformer weights is left in `update_weights`. The ability to update from a file is still needed, but it should be provided as a way to populate memory followed by a call to `update_weights`. Currently, a call to init the beamformer with `weights_h` set but without a call to `update_weights` will lead to bogus results that would be hard to identify. Need a way to safely know whether weights have been applied (or not).
-
Mitch Burnett authored
setup of the pointer and data flow is part of internal context initialization
-
Mitch Burnett authored
Remove the intermediate variables and use the setup from the internal context to control data flow. There might be a way to use function pointers to select a post-processing task; there is a CUDA example in section 6 of the CUDA samples called "function pointers".
-
Mitch Burnett authored
Removed non-descriptive variable names, which were just temporary movement variables, in favor of descriptive ones.
-
Mitch Burnett authored
Not necessary; in fact, they will probably be removed. Just being consistent for now.
-
Mitch Burnett authored
Different control flow for operation of the rtbf; not sure if it is the best way at the moment, because in reality `BEAM_RAW_OP` doesn't need to be checked, the output can just be dumped. Still more to do. Already know the iq kernel needs to be removed, but this is just a small incremental change.
-
- 27 May, 2022 5 commits
-
-
Mitch Burnett authored
-
Mitch Burnett authored
Not sure why that implementation was still in the beamformer. That version assumes data as received at the network (grouped by f-engine packetization), whereas `cublasGemmBatched` batches over frequency bins and uses that as the slowest-moving dimension. It is still useful code elsewhere in the system, so it was copied out into a temporary file to be moved somewhere more permanent (not yet decided), but I could not figure out why it had been kept around here. The thought came to me that it could be used for looking at beamformed spectra, but the data still needs to be beamformed first, which requires grouping by batch in this implementation (just a thought; would need to think about it). Transposing the data in that specific way may not be necessary: as long as the data is contiguous, the pointer-to-pointer interface just needs to point to the start of each batch (frequency bin).
-
Mitch Burnett authored
Looking over the transpose code, since it is still probably faster to transpose on the GPU than the CPU, and we will continue to do that. But because the input assumes network-ordered data, xgpu needs to have a transpose in it. The real goal would be a GPU pipeline implementation that gets data on the device and just passes it between kernels. Also, these transposes could probably be sped up with a LUT.
-
Mitch Burnett authored
-
Mitch Burnett authored
`gemmBatched` uses a pointer-to-pointer interface where the input arrays to `gemmBatched` are arrays of the memory addresses of the start of each batch. The `d_arr_*` arrays are these interface arrays. They were being malloc'd with a size equal to the data sizes, but these arrays only contain the start address of each batch, so their size only needs to be the batchCount.
-
- 26 May, 2022 2 commits
-
-
Mitch Burnett authored
internal context to have device pointers; move gpu block dimension magic numbers to rtbf context
-
Mitch Burnett authored
-
- 23 May, 2022 2 commits
-
-
Mitch Burnett authored
-
Mitch Burnett authored
-
- 11 May, 2022 2 commits
-
-
Mitch Burnett authored
-
Mitch Burnett authored
-
- 09 May, 2022 5 commits
-
-
Mitch Burnett authored
-
Mitch Burnett authored
-
Mitch Burnett authored
-
Mitch Burnett authored
-
Mitch Burnett authored
-
- 08 May, 2022 3 commits
-
-
Mitch Burnett authored
Header downsizes for a smaller example, and the beamformer adjusts magic numbers for setting up dimensions to support that. The init method now sets up the weights instead of them only being loaded from a file; this makes `cublas_main` unusable, but it is for setting up to verify output and to be able to start making adjustments. `testbeam` was meant to be `multibeam`. The new testbench sets up data using the time dimension as the scanning angle to plot beam patterns, so each time element is an angle. Beamformer weights are ULA, and the chosen angles are split across the number of beams. The same data is in every channel.
-
Mitch Burnett authored
Couldn't compile for CUDA 11 using `complex`; needed to move to `cuComplex`. Change code formatting. Started to adjust the header definition for alpaca, but ended up more with TODOs and notes trying to understand what to do in moving from flag to onr to alpaca and being more flexible/standalone.
-
Mitch Burnett authored
The flag beamformer library has always been part of a larger project; this strips it down into a separate repo and removes much of the other baggage. Additionally, a lot of changes had been made to adjust flag to onr, but we now need the beamformer to look more like flag again, and there was no tag or real commit to revert back to.
-