You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, I was wondering what method you use for FFT size not equal to a power of two (chirp z-transform or factorization). I picked up even lengths with prime factors and ran a test.
The results confused me:
Run on (12 X 2600 MHz CPU s)
CPU Caches:
L1 Data 32 KiB
L1 Instruction 32 KiB
L2 Unified 256 KiB (x6)
L3 Unified 12288 KiB
Load Average: 2.70, 3.20, 3.63
--------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------
BM_FFT_KISS/1018 1387 us 1386 us 444
BM_FFT_KISS/2246 6771 us 6771 us 103
BM_FFT_KISS/11200 337 us 337 us 2090
BM_FFT_KISS/11202 62048 us 61918 us 12
BM_FFT_KISS/11182 171242 us 171165 us 4
BM_FFT_KISS/11146 169774 us 169752 us 4
BM_FFT_KISS/16384 201 us 201 us 3476
The difference is 500-800 times between the worst and the best option. Most users don't think about implementation details and can get serious performance issues.
The fftw3 implementation does not have this problem:
BM_FFTW3_FLOAT/1018 11.7 us 11.7 us 59338
BM_FFTW3_FLOAT/2246 43.4 us 43.4 us 16293
BM_FFTW3_FLOAT/11200 24.2 us 24.2 us 29191
BM_FFTW3_FLOAT/11202 230 us 230 us 2918
BM_FFTW3_FLOAT/11182 257 us 257 us 2734
BM_FFTW3_FLOAT/11146 260 us 260 us 2727
BM_FFTW3_FLOAT/16384 36.7 us 36.7 us 19199
As I understand it, you use factorization and calculate the FFT for each factor, and then combine the results. It's probably worth adding a separate stage for large prime factors (e.g. czt ) to smooth out this problem.
In my implementation, I use the czt algorithm. This gives the worst result on good multipliers, but it has a fixed difficulty.
Here is an example of my benchmark:
BM_FFT_DSPLIB/1018 46.0 us 46.0 us 15287
BM_FFT_DSPLIB/2246 230 us 230 us 3031
BM_FFT_DSPLIB/11200 1100 us 1100 us 634
BM_FFT_DSPLIB/11202 1138 us 1138 us 636
BM_FFT_DSPLIB/11182 1132 us 1132 us 628
BM_FFT_DSPLIB/11146 1107 us 1107 us 618
BM_FFT_DSPLIB/16384 222 us 222 us 3236
PS: All bencmarks for complex type
The text was updated successfully, but these errors were encountered:
Hi, I was wondering what method you use for FFT size not equal to a power of two (chirp z-transform or factorization). I picked up even lengths with prime factors and ran a test.
The results confused me:
Factorization result:
The difference is 500-800 times between the worst and the best option. Most users don't think about implementation details and can get serious performance issues.
The fftw3 implementation does not have this problem:
As I understand it, you use factorization and calculate the FFT for each factor, and then combine the results. It's probably worth adding a separate stage for large prime factors (e.g. czt ) to smooth out this problem.
In my implementation, I use the czt algorithm. This gives the worst result on good multipliers, but it has a fixed difficulty.
Here is an example of my benchmark:
PS: All bencmarks for complex type
The text was updated successfully, but these errors were encountered: