-
Notifications
You must be signed in to change notification settings - Fork 48
/
NEWS
428 lines (392 loc) · 34.4 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
=========== PANDA 2024.03 ===========
Twelfth public release of PandA framework.
New features introduced:
- Internal memory architecture refactioring
- Enhanced scheduling for concurrent memory operations
- Enhanced support for single port Block RAMs
- Support for #pragma HLS interface, unroll, inline, dataflow, cache
- Support for C/C++ co-simulation of ac_channel<>/hls::stream<>
- Support for DSP fracturing
- Support for dataflow architecture generation and co-simulation
- Enhanced AXI4-Master interface support
=========== PANDA 2023.10 ===========
Eleventh public release of PandA framework.
New features introduced:
- Added support to Clang/LLVM 16
- The synthesis of ac_types of any size is now supported by Clang/LLVM 16
- C/C++ co-simulation support added for Mentor Questasim/Modelsim, Xilinx XSim, and Verilator
- Memory mapped top interface support
- C++17 is now required to build PandA Bambu
- Dependencies from Boost libraries have been dropped, only headers are needed (CentOS based system now supported)
- MachSuite benchmarks updated
- Fixed issues with Bambu LibM implementation
- Enhanced interface generation support
- Changed default HLS division implementation to Newton-Raphson
=========== PANDA 2023.01 ===========
Tenth public release of PandA framework.
New features introduced:
- Support for LLVM 13
- Support for Xilinx Alveo and Xilinx Kintex FPGA boards
- Support for ac_channels and hls::stream interfaces
- Various bugfixes and stability enhancement
- Support for customizable caches added to AXI interfaces
=========== PANDA 0.9.8 ===========
Ninth public release of PandA framework.
New features introduced:
- Added an LLVM based tree height reduction step.
- Updated support to NanoXplore NXmap: NXmap3 support, NG-ULTRA nx2h540tsc device family support
- Added support for AXI Master interface.
- Improved regression test, benchmarking, and integration flow through GitHub Actions workflows
=========== PANDA 0.9.7 ===========
Sun 20 March 2022, PandA release 0.9.7
Eighth public release of PandA framework.
New features introduced:
- Added support to CLANG/LLVM compiler versions 8, 9,10, 11 and 12.
- Added support to Xilinx Vitis HLS LLVM 2020.2. https://github.com/Xilinx/HLS
- Added support to Svelto methodology. Reference paper: M. Minutoli, V. Castellana, N. Saporetti, S. Devecchi, M. Lattuada, P. Fezzardi, A. Tumeo, and F. Ferrandi, “Svelto: High-Level Synthesis of Multi-Threaded Accelerators for Graph Analytics,” IEEE Transactions on Computers, iss. 01, pp. 1-14, 2021.
- Added support to Tensor flow optimization. Reference paper: M. Siracusa and F. Ferrandi, “Tensor Optimization for High-Level Synthesis Design Flows,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Best Paper Candidate of CODES+ISSS 2020, vol. 39, iss. 11, pp. 4217-4228, 2020.
- Example of integration between Soda-Opt (https://gitlab.pnnl.gov/sodalite/soda-opt) and Bambu: Reference paper: S. Curzel, N. Bohm Agostini, S. Song, I. Dagli, A. Limaye, C. Tan, M. Minutoli, V. G. Castellana, V. Amatya, J. Manzano, A. Das, F. Ferrandi, A. Tumeo, "Automated Generation of Integrated Digital and Spiking Neuromorphic Machine Learning Accelerators", 2021 40th International Conference on Computer-Aided Design (ICCAD).
- Paper describing the PandA-bambu framework: F. Ferrandi, V.G. Castellana, S. Curzel, P. Fezzardi, M, Fiorito, M. Lattuada, M. Minutoli, C. Pilato, A. Tumeo, "Invited: Bambu: an Open-Source Research Framework for the High-Level Synthesis of Complex Applications," 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 1327-1330.
- Improved support to discrepancy analysis. Reference paper: P. Fezzardi and F. Ferrandi, “Automated Bug Detection for High-Level Synthesis of Multi-Threaded Irregular Applications,” ACM Trans. Parallel Comput., vol. 7, iss. 4, 2020.
- Added initial support to ASIC flow based on Yosys+OpenROAD projects (see https://theopenroadproject.org/). Nangate45 and ASAP7 PDKs supported.
- Simplified the simulation/synthesis backends integration. In case the used simulator or synthesizer is in the system path, the configuration can be as simple as ../configure --prefix=/opt/panda
- Added support to multi-thread simulation when Verilator version 4 is used (it is disabled by default).
- Improved bambu memory consumption.
- Added support to AppImage bambu binary distribution (https://appimage.org/).
- Added support to xc7z045-2ffg900-VVD device.
- Added support to ECP5 Lattice semiconductor devices (e.g., LFE5UM85F8BG756C, LFE5U85F8BG756C).
- Improved and better integrated value range analysis.
- Improved support to interface synthesis by allowing references in called functions.
- Added a first support to axis interface.
- Simulation and synthesis tools are now detected at runtime. Configure options have been removed, now vendor specific install directories may be specified using Bambu option --<vendor>-root=<path> (e.g. --xilinx-root=/opt/Xilinx).
- PandA-bambu reference paper has been published at DAC: F. Ferrandi, V. G. Castellana, S. Curzel, P. Fezzardi, M. Fiorito, M. Lattuada, M. Minutoli, C. Pilato, and A. Tumeo, “Invited: Bambu: an Open-Source Research Framework for the High-Level Synthesis of Complex Applications,” in 2021 58th ACM/IEEE Design Automation Conference (DAC), 2021, pp. 1327-1330.
- Added a Google Colab notebook with many examples to play with Bambu. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ferrandi/PandA-bambu/blob/main/documentation/tutorial_date_2022/bambu.ipynb)
Main contributions to this release come from Fabrizio Ferrandi, Michele Fiorito, Serena Curzel, Luca Pozzoni, Marco Siracusa, Riccardo Binetti.
=========== PANDA 0.9.6 ===========
Sat 11 January 2020, PandA release 0.9.6
Seventh public release of PandA framework.
New features introduced:
- Added support to BRAVE FPGAs (https://www.esa.int/Our_Activities/Space_Engineering_Technology/Shaping_the_Future/High_Density_European_Rad-Hard_SRAM-Based_FPGA-_First_Validated_Prototypes_BRAVE). Both NG-Medium and NG-Large are supported.
- added support to TASTE model-based design flow (https://taste.tools/). A new option has been added to activate the customized HLS flow: --experimental-setup=BAMBU-TASTE.
- Added support for Hardware Discrepancy Analysis. Reference paper: Pietro Fezzardi, Marco Lattuada, Fabrizio Ferrandi, Using Efficient Path Profiling to Optimize Memory Consumption of On-Chip Debugging for High-Level Synthesis. ACM Trans. Embedded Comput. Syst. 16(5): 149:1-149:19 (2017).
- Added support for OpenMP for. Reference papers: M. Minutoli, V. G. Castellana, A. Tumeo, M. Lattuada, and F. Ferrandi, “Efficient Synthesis of Graph Methods: A Dynamically Scheduled Architecture,” in Proceedings of the 35th International Conference on Computer-Aided Design, New York, NY, USA, 2016, p. 128:1–128:8. V. G. Castellana, M. Minutoli, A. Morari, A. Tumeo, M. Lattuada, and F. Ferrandi, “High level synthesis of RDF queries for graph analytics,” in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015, pp. 323-330.
- Customized Dynamic HLS flow updated. Reference paper: M. Lattuada and F. Ferrandi, "A Design Flow Engine for the Support of Customized Dynamic High Level Synthesis Flows", ACM Trans. Reconfigurable Technol. Syst. 12, 4, Article 19 (October 2019), 26 pages.
- Added initial support to C++/fortran high-level synthesis. Now, Ubuntu distributions require the installation of g++-multilib package.
- Added support to CLANG/LLVM compiler version 4, 5, 6 and 7. On multiple files, LTO is exploited.
- Added support to high-level synthesis of spec in LLVM format (i.e., file with .ll extension).
- Added to CLANG/LLVM based analysis an interprocedural value range analysis based on the following paper: Fernando Magno Quintao Pereira, Raphael Ernani Rodrigues, and Victor Hugo Sperle Campos. "A fast and low-overhead technique to secure programs against integer overflows", In Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (CGO '13). Washington, DC, USA, 1-11.
The implementation started from code available at this link:https://code.google.com/archive/p/range-analysis/.
The original code went through a deep revision and it has been changed to be compatible with a recent version of LLVM. Some extensions have been also added:
- Added anti range support.
- Redesigned many Range operations to take into account wrapping and to improve the reductions performed.
- Integrated the LLVM lazy value range analysis.
- Added support to range value propagation of load from constant arrays.
- Added support to range value propagation of load and store from generic arrays.
- Added to CLANG/LLVM based analysis an Andersen based pointer analysis . The specific version used is described in:
"The Ant and the Grasshopper: Fast and Accurate Pointer Analysis for
Millions of Lines of Code", by Ben Hardekopf & Calvin Lin, in PLDI 2007.
The BDD library used is the Buddy BDD package By Jørn Lind-Nielsen. Users working on a Debian/Ubuntu distribution can install the libbdd-dev package.
- Added support to GCC compiler version 8.
- Added support to mingw-w64. A Windows-7 64bit binary distribution is now available. This minimal msys2 binary distribution includes bambu, GCC8 with plugin and multilib enabled and a customized version of clang7.
- Added support to Mac OSX exploiting Mac Ports project (https://www.macports.org/)
- Improved the vagrant based virtual machine generation scripts. Such scripts now create an Ubuntu 16.04 32bit VirtualBox image and an Ubuntu 18.04 64bit VirtualBox image.
- Added Vagrant scripts for Ubuntu precise, trusty,xenial and bionic, Fedora 29, CentOs7, MacOSX MacPorts.
- Improved timing constraints and timing reports.
- Added --registered-inputs=top option.
- Improved the support for discrepancy when Verilator is used as simulator and a better check of basic floating-point operations.
- Added an example using C++14 constexpr declaration and a simple GCD example written in C++.
- The building system is now based on a single configure.ac.
- Regressions are now exploiting a Jenkins based infrastructure.
- Some performance, style, and c++ improvements have been done following the suggestions coming from cppcheck, clang static analyzer, and Codacy.
- A single-precision floating-point faithfully rounded powf function has been added. It follows the method published in:
Florent De Dinechin, Pedro Echeverria, Marisa Lopez-Vallejo, Bogdan Pasca. Floating-Point Exponentiation Units for Reconfigurable Computing. ACM Transactions on Reconfigurable Technology and Systems (TRETS), ACM, 2013, 6 (1), pp.4:1–4:15.
The powf function currently does not supports subnormals.
- Ported some parts of the code to C++11 standard by applying clang-tidy modernize-deprecated-headers, modernize-pass-by-value, modernize-use-auto,
modernize-use-bool-literals, modernize-use-equals-default, modernize-use-equals-delete, modernize-loop-convert and modernize-use-override.
- Added a SRT4 implementation. Added --hls-fpdiv to select which floating-point division will be used. Current options: SRT4 for Sweeney, Robertson, Tocher floating-point division with radix 4 and G for Goldschmidt floating point division.
- Added support to ac_types and ac_math library from Mentor Graphics. Concerning the original library, the bambu/PandA library does support both LLVM/Clang and GCC compiler and it requires c++14 standard for the compilation. Initial support to ap_* objects used by VIVADO HLS has been added as well.
- Added support to top design interfaces: none (ap_none), acknowledge (ap_ack), valid (ap_vld), ovalid (ap_ovld), handshake (ap_hs), fifo (ap_fifo) and array (ap_memory). These interfaces are activated by pragmas added to the source code and when the option --generate-interface=INFER is passed to bambu. Examples of pragma use can be found in panda_regressions/hls/bambu_specific_test4.
- Added some examples taken from https://github.com/Xilinx/HLx_Examples
- Added option --clock-name, --reset-name, --start-name and --done-name to specify the top component controlling signal names.
- Renamed top component and removed the suffix _minimal_interface. Now, the top component has the same name of the top function.
- Improved and fixed the synthesis of empty functions.
- Added option --VHDL-library=libraryname to specify the library in which the synthesized function has to be compiled.
- Extended the VHDL support to other bambu library components.
- If it is supported, the compiler used to compile the PandA framework uses the c++17 standard (-std=c++17), otherwise, it uses c++11 standard.
- Integrated Mockturtle library from EPFL (https://github.com/lsils/mockturtle) to simplify LUT-based expressions.
- Integrated Abseil C++ libraries (https://abseil.io/)
- Added examples of parallel_queries from ICCAD15 and ICCAD16 papers.
- Added FPT tutorial material.
- Added PNNL19 tutorial material.
- Added PACT19 tutorial material.
- Improved makefile parallelization.
- Improved bambu determinism: two runs with the same specs produce the same HDL output.
- Fixed COND_EXPR_RESTRUCTURING step
- Fixed omp tests
- Fixed libm tests
- Fixed make check and on cpp examples
- Fixed c++ struct initialization
- Fixed floating point division
- Fixed make dist command under Github repository
Main contributions to this release come from Fabrizio Ferrandi, Marco Lattuada, Pietro Fezzardi, Nicola Saporetti, Serena Curzel, Iris Cusini, Marco Siracusa, Marco Speziali and Davide Toschi.
=========== PANDA 0.9.5 ===========
Mon 14 Aug 2017, PandA release 0.9.5
Sixth public release of PandA framework.
New features introduced:
- Added support to GCC 6 and GCC 7 (GCC 4.9 is still the preferred GCC compiler).
- Added support to bitfields.
- Added support for pointers and memory operations to the Discrepancy Analysis. Reference paper: Pietro Fezzardi and Fabrizio Ferrandi, "Automated bug detection for pointers and memory accesses in High-Level Synthesis compilers", in 2016 26th International Conference on Field Programmable Logic and Applications (FPL), 2016.
- Added new option: --discrepancy-only=comma,separated,list,of,function,names
Restricts the discrepancy analysis only to the functions whose name is in the list passed as argument.
- Added new option: --discrepancy-permissive-ptrs\n"
Do not trigger hard errors on pointer variables.
- added preliminary support to TASTE integration. Reference paper: M. Lattuada, F. Ferrandi, and M. Perrotin, “Computer Assisted Design and Integration of FPGA Accelerators in Aerospace Systems,” in Proceedings of the IEEE Aerospace Conference, 2016, pp. 1-11.
- Added support to OpenMP SIMD. Reference paper: M. Lattuada and F. Ferrandi, “Exploiting Vectorization in High Level Synthesis of Nested Irregular Loops,” Journal of Systems Architecture, vol. 75, pp. 1-14, 2017.
- Default golden reference is now input C code without any modification.
- The options --synthesize and --objectives have been removed. Now the same values passed with --objectives can
be directly passed through the option --evaluation.
- Improved the precision and the effectiveness of the Bit Value analysis and optimizations.
- Improved detection of irreducible loops.
- Improved CSE.
- Added a frontend transformations that merge some operations into FPGA LUTs.
- Now frontend explicitly introduces function calls to softfloat functions.
- Added support to block RAM with latency = 3 (--high-latency=4).
- Added bambu option --fsm-encoding=[auto,one-hot,binary].
- Added a new option: --disable-reg-init-value
Used to remove the INIT value from registers (useful for ASIC designs)
- Improved mapping of multiplications on DSPs.
- Added a GCC plugin to apply the whole program optimization starting from the topfname function instead of main function (currently only GCC 4.9 is supported).
- Added further integer division algorithms:
- non-restoring division with unrolling factor equal to 1 (--hls-div=nr1) which becomes the default division algorithm.
- non-restoring division with unrolling factor equal to 2 (--hls-div=nr2)
- align divisor shift dividend method (--hls-div=as)
- Added a specialization of the integer division working with 64bits dividend and 32bits divisor.
- Single precision floating point faithfully rounded expf and logf functions implemented following the HOTBM method published by
- Jeremie Detrey and Florent de Dinechin, "Parameterized floating-point logarithm and exponential functions for FPGAs", Microprocessors and Microsystems, vol.31,n.8, 2007, pp.537-545.
The code has been exhaustively tested and it supports subnormals.
- Single precision floating point faithfully rounded sin, cos, sincos and tan functions implemented following the HOTBM method published by
- Jeremie Detrey and Florent de Dinechin, "Floating-point Trigonometric Functions for FPGAs" FPL 2007.
The code has been exhaustively tested and it supports subnormals.
- Single precision floating point faithfully rounded sqrt function implemented following the method published by
- Florent de Dinechin, Mioara Joldes, Bogdan Pasca, Guillaume Revy: Multiplicative Square Root Algorithms for FPGAs. FPL 2010: 574-577
The code has been exhaustively tested and it supports subnormals.
- Implemented the port swapping algorithm as described in the following paper:
- Hao Cong, Song Chen and T. Yoshimura, "Port assignment for interconnect reduction in high-level synthesis," Proceedings of Technical Program of 2012 VLSI Design, Automation and Test, Hsinchu, 2012, pp. 1-4.
- Improved support to structs passed by copy.
- Improved ROM identification.
- Added a new option: --rom-duplication
Assume that read-only memories can be duplicated in case timing requires.
- Improved memory initialization.
- Added some transformations that lowered some memcpy and memset call to simple instructions.
- Improved softfloat functions for basic single and double precisions operations: sum, sub, mul and division.
Now addition and subtraction operations correctly manage operand equal to +0 and -0.
- Added three options to control which softfloat and libm libraries are used: --softfloat-subnormal, --libm-std-rounding and --soft-fp.
- Fixed builtin isnanf.
- Added double precision implementation of libm round function.
- Added __builtin_lrint, __builtin_llrint, __builtin_nearbyint to libm library.
- Fixed and improved tgamma and tgammaf function.
- Added support to parallel compilation of bambu libraries.
- Added support to the automatic configuration of newer releases of Quartus for IntelFPGAs.
- Improved verilator detection.
- Improved libicu detection.
- Improved boost filesystem macro.
- Fixed problems due to -m32 under arch linux.
- Fixed compilation problems with glpk and ubuntu 14.04.
- Fixed a problem with long double. They now have the same size of double.
- Added support to Mentor Visualizer.
- Improved components characterization and timing models.
- Extended support to VHDL.
- Now VHDL modelsim simulation uses 2008 standard.
- Extended set of synthesis scripts and synthesis results.
- Improved area reporting for Virtex4 devices.
- Improved characterization of asynchronous RAMs.
- Fixed extraction of slack delay from ISE trce and Lattice reports.
- Fixed yosys backend wr.r.t the newer Vivado releases.
- Added SLICES to the set of data collected by characterization.
- Extended set of regression tests.
Main contributions to this release come from Fabrizio Ferrandi, Marco Lattuada, Pietro Fezzardi, Alessia Chiappa, Roberta Catizzone, Andrea Pezzutti, Alessandro Comodi, Davide Conficconi, Jacopo Di Simone, Paolo Cappello, Ilyas Inajjar, Angelo Gallarello, Stefano Longari from Politecnico di Milano (Italy).
=========== PANDA 0.9.4 ===========
Wed 20 Apr 2016, PandA release 0.9.4
Fifth public release of PandA framework.
New features introduced:
- added support to GCC 5 (GCC 4.9 is still the preferred GCC compiler)
- improved support to complex builtin data types
- added an initial support to "Extended Asm - Assembler Instructions with C Expression Operands". In particular, the asm instruction could be used to inline VERILOG/VHDL code in a C source description (done by extending the multiple assembler dialects feature in asm templates: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html)
- timing for combinational accelerator is now correctly estimated by the backend synthesis scripts (accelerator that takes a single cycle to complete).
- added option --serialize-memory-accesses to remove any memory access parallelization. It is mainly useful for debugging purposes.
- added option --distram-threshold to explicitly control the DISTRIBUTED/ASYNCHRONOUS RAMs inferencing.
- refactored of simulation/evaluation options
- added support to STRATIX V and STRATIX IV
- added support to Virtex4
- added support to C files for cosimulation
- added support to out of context synthesis on Altera boards
- now Lattice ECP3 is fully supported. In particular, the byte enabling feature required by some of the memories instantiated by bambu is implemented by exploiting Lattice PMI (Parameterizable Module Inferencing) library.
- improved and extended the integration of existing IPs written in Verilog/VHDL.
- added an example showing how asm could be inlined in the C source code: simple_asm
- added an example, named file_simulate, showing how open, read, write and close could be used to verify a complex design with datasets coming from a file
- added an example showing how python could be used to verify the correctness of the HLS process: python-bindings
- added MachSuite ("MachSuite: Benchmarks for Accelerator Design and Customized Architectures." - 2014 IEEE International Symposium on Workload Characterization.) to examples
- added benchmarks of "A Survey and Evaluation of FPGA High-Level Synthesis Tools" - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems to examples
- added two examples showing how external IPs could be integrated in the HLS flow: IP_integration and led_example
- improved the VGA example for DE1 and ported to Nexys4 Xilinx prototyping board
- added two examples showing how it is possible to run two arcade games such as pong and breakout on a Nexys 4 prototyping board without using any processor. The two examples smoothly connect a low level controller for the VGA port plus some GPIO controllers with plain C code describing the game behavior.
- added a tutorial describing how to use bambu in designing a simple example: led_example
- refactoring of scripts for technology libraries characterization
- improved regression scripts: now panda regression consists of about 250K tests
- assertions check now have to be explicitly disabled also in release
- done port is now registered whenever it is possible
- added support to the synthesis of function pointers and to the inter-procedural resource sharing of functions (reference paper is: Marco Minutoli, Vito Giovanni Castellana, Antonino Tumeo, Fabrizio Ferrandi: Inter-procedural resource sharing in High Level Synthesis through function proxies. FPL 2015: 1-8)
- added support to speculative SDC code scheduling (controlled through option --speculative-sdc-scheduling | reference paper is: Code Transformations Based on Speculative SDC Scheduling. ICCAD 2015: 71-77)
- added two new experimental setups (BAMBU-BALANCED-MP and BAMBU-BALANCED) oriented to trade-off between area and performances; BAMBU_BALANCED-MP is the new default experimental setup
- added a discrepancy analysis to verify the correctness of the generated code (controlled through option --discrepancy | reference paper is: Pietro Fezzardi, Michele Castellana, Fabrizio Ferrandi: Trace-based automated logical debugging for high-level synthesis generated circuits. ICCD 2015: 251-258)
- added common subexpression elimination step
- reset can now be active high or active low (controlled through option --reset-level)
- added support to file IO libc functions: open, read, write and close.
- added support to assert function.
- added support to libc functions: stpcpy stpncpy strcasecmp strcasestr strcat strchr
strchrnul strcmp strcpy strcspn strdup strlen strncasecmp
strncat strncmp strncpy strndup strnlen strpbrk strrchr
strsep strspn strstr strtok bzero bcopy mempcpy
memchr memrchr rawmemchr index rindex
- improved double precision soft-float library
- added support to single and double precision complex division operations: __divsc3 __divdc3
- added preliminary support to irreducible loops
- changed the PandA hardware description license from GPL to LGPL
Main contributions to this release come from Fabrizio Ferrandi, Marco Lattuada, Pietro Fezzardi, Marco Minutoli from Politecnico di Milano (Italy).
Many thanks to Marco Boschini, Stefano Devecchi, Cumhur Erdin, Michele Renda and Christophe Clienti for their feedbacks and suggestions.
=========== PANDA 0.9.3 ===========
Mon 24 Mar 2015, PandA release 0.9.3
Fourth public release of PandA framework.
PandA now requires GCC 4.6 or greater to be compiled.
New features introduced:
- general improvement of performances of generated circuits
- added full support to GCC 4.9 family which is now the default
- improved retrieving of GCC alias analysis information
- added first version of VHDL backend
- added support to CycloneV
- added support to Artix7
- extended support to Virtex7 boards family
- added option --top-rtldesign-name that controls which is the function to be synthesized by the RTL backed
- it is now possible to write the testbench in C instead of using the xml file
- added a first experimental backend to yosys (http://www.clifford.at/yosys/)
- added examples/crc_yosys which tests yosys backend and C based testbenches
- improved Verilog testbench generation: it is now fully compliant with cycle based simulators (e.g., VERILATOR)
- added option --backend-script-extensions to pass further constraints to the RTL synthesis (e.g., pin assignment)
- added examples/VGA showing how to integrate existing HDL based IPs in a real FPGA design
- added scripts and results for CHStone synthesis of Lattice based designs
- improved support of complex numbers
- single precision soft-float functions redesigned: now --soft-float is the default and --flopoco becomes optional
- single precision floating point division implemented exploiting Goldshmidt algorithm
- improved synthesis of libm functions
- improved libm regression test
- improved architectural timing model
- improved graphviz representation of FSMs: timing information has been added
- added option --post-rescheduling to further improve the resource usage
- parameter registering is now performed and it can be controlled by using option --registered-inputs
- added a full implementation of Bit Value analysis and coupled with Value Range analysis performed by GCC
- added option --experimental-setup to control bambu defaults:
* BAMBU-PERFORMANCE-MP - multi-port performance oriented setup
* BAMBU-PERFORMANCE - single port performance oriented setup
* BAMBU-AREA-MP - multi-port area oriented setup
* BAMBU-AREA - single-port area oriented setup
* BAMBU - no specific optimizations enabled
- improved code speculation
- improved memory localization
- added option --do-not-expose-globals making possible localization of globals, as it is similarly done by some commercial tools
- added support of high latency memories and of distributed memories: zero, one and two delays memories are supported
- added option --aligned-access to drive the memory allocation towards more simple block RAM models: it can be used under some restricted assumptions (e.g., no vectorization and no structs used)
- ported the GCC algorithm which rewrites a division by a constant in adds and shifts
- added option --hls-div that maps integer divisions and modulus on a C based implementation of the Newton-Raphson algorithm
- improved technology libraries management:
* technology libraries and contraints are now managed in a independent way
* multiple technology libraries can be provided to the tool at the same time
- improved and parallelized PandA test regression infrastructure
- added support to Centos7, fedora 21, Ubuntu 14.04 and Ubuntu 14.10 distributions
- complete refactoring of output messages
Problems fixed:
- fixed problem related to Bison 2.7
- fixed reinstallation of PandA in a different folder
- fixed installation problems on systems where boost and gcc are not installed in default locations
- removed some implicit conversions from generated verilog circuits
Main contributions to this release come from Fabrizio Ferrandi, Marco Lattuada, Marco Minutoli, Pietro Fezzardi, Giulio Stramondo, Stefano Bodini from Politecnico di Milano (Italy).
=========== PANDA 0.9.2 ===========
mer 12 feb 2014, 13.22.48, CET PandA release 0.9.2
Third public release of PandA framework.
New features introduced:
- added an initial support to GCC 4.9.0,
- stable support to GCC versions: v4.5, v4.6, v4.7 (default) and v4.8,
- added an experimental support to Verilator simulator,
- new dataflow dependency analysis for LOADs and STOREs; we now use GCC alias analysis to see if a LOAD and STORE pair or a STORE and STORE pair may conflict,
- added a frontend step that optimizes PHI nodes,
- added a frontend step that performs conditionally if conversions,
- added a frontend step that performs simple code motions,
- added a frontend step that converts if chains in a single multi-if control construct,
- added a frontend step that simplifies short circuits based control constructs,
- added a proxy-based approach to the LOADs/STOREs of statically resolved pointers,
- improved EBR inference for Lattice based devices,
- now, memory models are different for Lattice, Altera, Virtex5 and Virtex6-7 based devices,
- updated FloPoCo to a more recent version,
- now, register allocation maps storage values on registers without write enable when possible,
- added support to CentOS/Scientific Linux distributions,
- added support to ArchLinux distribution,
- added support to Ubuntu 13.10 distribution,
- now, testbenches accept a user defined error for float based computations; the error is specified in ULPs units; a Unit in the Last Place is the spacing between floating-point numbers,
- improved architectural timing model,
- added a very simple symbolic estimator of number of cycles taken by a function, it mainly covers function without loops and without unbounded operations,
- general refactoring of automatic HLS testbench generation,
- added support to libm function lceil and lceilf,
- added skip-pipe-parameter option to bambu; it is is used to select a faster pipelined unit (xilinx devices have the default equal to 1 while lattice and altera devices have the default equal to 0),
- improved memory allocation when byte-enabled write memories are not needed,
- added support to variable lenght arrays,
- added support to memalign builtin,
- added EXT_PIPELINED_BRAM memory allocation policy, bambu with this option assumes that a block ram memory is allocated outside the core (LOAD with II=1 and latency=2 and STORE with latency=1),
- added 2D matrix multiplication examples for integers and single precision floating point numbers,
- added some synthesis scripts checking bambu quality of results w.r.t. 72 single precision libm functions (e.g., sqrtf, sinf, expf, tanhf, etc.),
- added spider tool to automatically generate latex tables from bambu synthesis results,
- moved all the dot generated files into directory HLS_output/dot/. Such files (scheduling, FSM, CFG, DFG, etc) are now generated when --print-dot option is passed,
- VIVADO is now the default backend flow for Zynq based devices.
Problems fixed:
- fixed all the Bison related compilation problems,
- fixed some problems with testbench generation of 2D arrays,
- fixed configuration scripts for manually installed Boost libraries; now, we need at least Boost 1.48.0,
- fixed some problems with C pretty-printing of the internal IR,
- fixed some ISE/Vivado synthesis problems when System Verilog based model are used,
- fixed some problems with --soft-float based synthesis,
- fixed RTL-backend configuration scripts looking for tools (e.g., ISE, Vivado, Quartus and Diamond) already installed,
- fixed some problems with real-to-int and int-to-real conversions, added some explicit tests to the panda regressions.
Main contributions to this release come from Fabrizio Ferrandi, Marco Minutoli, Marco Lattuada from Politecnico di Milano (Italy).
Many thanks to Razvan Nane, Berke Durak and Staf Verhaegen for their feedbacks and suggestions.
=========== PANDA 0.9.1 ===========
mar 17 set 2013, 12.33.30, CET PandA release 0.9.1
Second public release of PandA framework.
New features introduced:
- complete support of CHSTone benchmarks synthesis and verification (http://www.ertl.jp/chstone/),
- better support of multi-ported memories,
- local memory exploitation,
- read-only-memory exploitation,
- support of multi-bus for parallel memory accesses,
- support of unaligned memory accesses,
- better support of pipelined resources,
- improved module binding algorithms (e.g., slack-based module binding),
- support of resource constraints through user xml file,
- support of libc primitives: memcpy, memcmp, memset and memmove,
- better support of printf primitive for RTL debugging purposes,
- support of dynamic memory allocation,
- synthesis of libm builtin primitives such as sin, cos, acosh, etc,
- better integration with FloPoCo library (http://flopoco.gforge.inria.fr/),
- soft-float based HW synthesis,
- support of Vivado Xilinx backend,
- support of Diamond Lattice backend,
- support of XSIM Xilinx simulator,
- synthesis and testbench generation of WISHBONE B4 Compliant Accelerators (see http://cdn.opencores.org/downloads/wbspec_b4.pdf for details on the WISHBONE specification),
- synthesis of AXI4LITE Compliant Accelerators (experimental),
- inclusion of GCC regression tests to test HLS synthesis (tested HLS synthesis and RTL simulation),
- inclusion of libm regression tests to test HLS synthesis of libm (tested HLS synthesis and RTL simulation),
- support of multiple versions of GCC compiler: v4.5, v4.6 and v4.7.
- support of GCC vectorizing capability (experimental).
Main contributions to this release come from Fabrizio Ferrandi, Christian Pilato, Marco Minutoli, Marco Lattuada, Vito Giovanni Castellana and Silvia Lovergine from Politecnico di Milano (Italy).
Many thanks to Jorge Lacoste for his feedbacks and suggestions.
=========== PANDA 0.9.0 ===========
Wed Mar 14 06:50:32 CET 2012 PandA release 0.9.0
The first public release of the PandA framework mainly covers the high-level synthesis of C based descriptions.
Main contributions to this release come from Fabrizio Ferrandi, Christian Pilato and Marco Lattuada from Politecnico di Milano (Italy).