diff --git a/404.html b/404.html index 082d5588..43401427 100644 --- a/404.html +++ b/404.html @@ -12,7 +12,7 @@ - + @@ -564,7 +564,7 @@ - Links + Channels & Links @@ -1023,7 +1023,7 @@
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
-floo_pkg
was extended with helper functions to calculate the size of AXI payloads and mapping of AXI to Floo Channels.AxiCfg
describes all the necessary parameters needed for the type definitions of a bidirectional AXI interfaceRouteCfg
describes all the necessary routing information parameters required by the chimneys.ChimneyCfg
describes all other parameters for the data path of the chimney (e.g. Mgr/Sbr port enable, number of outstanding transactions, RoB types & sizes, etc.)floo_test_pkg
now defines default configurations for all the new configuration structs that are used by the testbenches.floo_axi_router
module, which is a wrapper similar to the floo_nw_router
but for single-AXI configurations, and can be used in conjunction with floo_axi_chimney
.data_width
and user_width
fields for protocols
are now also validated to be compatible with each other.*Cfg
's is now rendered by FlooGen, either in the *_noc_pkg
or in the *_noc
module itself.floo_narrow_wide_*
modules and the corresponding testbenches were renamed to floo_nw_*
to be more concise.typedef.svh
.*Cfg
's from the floo_pkg
. In the narrow-wide chimneys, both datapaths now have their own configs (i.e. *CfgN
and *CfgW
), to reduce the verbosity of the module instantiation.*_chan_t
type previously had its own name. This was unified to payload
since *_chan_t
already determines the type of the payload.InFifoDepth
and OutFifoDepth
to be more consistent (previously ChannelFifoDepth
and OutputFifoDepth
).AxiCfg
structs to redefine the link types internally.ReorderBufferSize
parameters was shortened to RoBSize
.typedef.svh
instead of rendering them in pure SystemVerilog.nw
naming scheme.floo_*_noc
resp. floo_*_noc_pkg
which is more consistent since all other modules have the floo_*
prefix.protocols
schema was adapted a bit to be more intuitive.type
field was renamed to protocol
, which currently only accepts AXI4
. A new type
field is now used by FlooGen to know where to attach the protocol in the network interface. Currently, FlooGen only supports the narrow-wide AXI configuration, hence only narrow|wide
is allowed as type
values.direction
field in the protocol
schema is no longer required, since the direction is determined when specifying mgr_port_protocol
and sbr_port_protocol
.name
field must be unique now, since it is used by mgr_port_protocol
and sbr_port_protocol
to reference the exact protocol.network_type
field, to determine the type of network to generate. The options are axi
for single-AXI networks and narrow-wide
for the narrow-wide AXI configurations.Sam
is now sorted correctly and can be indexed with ep_id_e
values.floo_rob
was fixed. Previously, the allocation and the write process used the same counter in bursts for offset calculation, which resulted in wrong offsets.XYRouting
do now use the global id_offset
, which was previously not accounted for (or had to be specified manually).typedef.svh
, the auto-generated floo_*_pkg
packages were removed from the repository. Furthermore, all the (global) imports of those packages in the modules were replaced by parameters.tb_floo_nw_chimney
was removed since it was neither used nor maintained anymore.IdIsPort
routing algorithm was removed since it can only be used for routes over a single router. The same functionality can be achieved with the SourceRouting
algorithm.dma_mesh
testbench was removed in favor of nw_mesh
and axi_mesh
which use generated networks with FlooGen.typedef.svh
file. Further, the --only-pkg
and --pkg-outdir
flags were removed from the FlooGen CLI.floo_pkg
helper functions.floogen
. The route is encoded in the header as a route_t
field, and each router consumes a couple of bits to determine the output ports. In the chimney, a two-stage encoder was added to first determine the destination ID of the request, and then retrieve the pre-computed route to that destination from a table. The floogen
configuration was extended to support the new routing algorithm, and it will also generate the necessary tables for the chimneys.MaxUniqueids
parameter. This will mitigate ordering of transactions from initially different IDs or endpoints at the expense of some complexity in the meta_buffer
which then uses id_queue
to store the meta information required to return responses.floogen
to define the direction of connections
to/from routers with dst_dir
and src_dir
flags. This replaces the previous id_offset
flag for that purpose. Specifying the direction of the connection is useful for mesh topologies with XYRouting
, but also for tile-based implementation, where the order of the ports matters resp. needs to be known.routers
in floogen
can now be configured with degree
to overwrite the number of ports. This is mainly useful for tile-based implementations, where all tiles should have identical routers.floo_route_comp
now supports source-based routing, and can output both destination ID and a route to the destination.route_table_i
to receive the pre-computed routing table that is generated by floogen
.bidirectional
flag for connections
in floogen
is set to true
by default, since uni-directional links are currently not supported.chimneys
, since it is not part of the flit packages anymore.id_i
in the chimneys should not throw an error anymore in elaboration.floo_vc_arbiter
when setting NumVirtChannels
to 1, which caused issues when compiling with Verilator.IdTable
was used as the routing algorithm.floo_synth*
wrapper modules. They are moved to the internal PD repository, since they are not really maintained as part of the FlooNoC repository.EnMgrPort
and EnSbrPort
are swapped in the chimneys to be more consistent. FlooNoC defines subordinate ports as requests that go out of the NoC to AXI subordinates (i.e. memories) that return a response, and manager ports as requests that come into the NoC from AXI managers (i.e. cores).floo_narrow_wide_join
now uses axi_riscv_atomics
to filter out atomic operations. The atop_filter
are still there but are disabled by default.id_t
instead of the deprecated xy_id_t
type as a parameter.floogen
.EJECT
) was not defined.floo_synth_mesh
, floo_synth_mesh_ruche
& floo_synth_router_simple
synthesis wrappers, since they are not used anymore.floo_narrow_wide_join
which joins a narrow and a wide AXI busfloogen
is now relative to the current working directory instead of the installation folder of floogen
.AW
and W
beats over different channels would have allowed them to arrive out of order if multiple managers were sending write requests to the same subordinate, which could result in interleaving of the data. This is now fixed by sending AW
and W
beats over the same wide channel. The AW
and W
beats are coupled together and wormhole routing prevents interleaving of the data.floogen
. Also added documentation for floogen
in the docs
folder.EnMgrPort
and EnSbrPort
to properly parametrize Manager resp. Subordinate-only instances of a chimneyXYRouteOpt
parameter to router to enable/disable routing optimizations when using XYRouting
floo
package is moved to hw/include
.LICENSE
file was updated to reflect that the project uses the Solderpad Hardware License Version 2.1
for all hw
files and the Apache License 2.0
for software related files.header
field is replaced in favor of a routing
field that better represents the information needed for routing.XYRouting
now also supports a routing table similar to the IdTable
routing table. Previously, the destination was determined based on a couple of bits in the address. This, however, did not allow for much flexibility and required a larger address width.NoRoB
version of the reorder buffer, which could lead to overflow of countersaxi_channel_compare
was removed in favor of axi_chan_compare
from the axi
repository.flit_gen.py
including configuration files, since this is now integrated into floogen
(in conjunction with the --only-pkg
flag)*_flit_pkg
to *_pkg
axi_
, all FlooNoC links are now prefixed with floo_
resp_t
structs to rsp_t
narrow_wide_chimney
narrow_wide_router
floo_axi_rand_slave
& floo_dma_test_node
now support addr_width > 32
name
: name of the protocol. This will be used as a reference in the framework and in the generated RTL code to name the protocol module and the protocol signals. If the narrow-wide channels are used, they need to be named narrow
and wide
respectively.type
: Currently only AXI4
is supporteddirection
: the direction of the protocol. It can be either manager
or subordinate
. If an endpoint is both manager and subordinate, two protocols need to be defined.data_width
: the data width of the protocoladdr_width
: the address width of the protocolid_width
: the ID width of the protocol. Endpoints with different ID widths for the manager
and subordinate
protocols are supported.Apart from the configuration file, floogen
supports additional options to customize the generated RTL code. The following options are supported:
--outdir
: the output directory where the generated RTL code will be placed. This is equivalent to the -o
option. If it is not specified, the output is printed to stdout.--only-pkg
: only generate the package. This is useful if you want to test single IPs without generating a whole network.--pkg-outdir
: the output directory where the generated package will be placed. By default, the package in the hw
folder is overwritten, since it is also the one that is used by bender
for compiling the IPs. If you want to keep the original package, you can specify a different output directory here.--no-format
: do not format the generated RTL code. By default, the generated RTL code is formatted with verible format, for which the verible-verilog-format
binary needs to be installed. If this option is set, the generated RTL code is not formatted.--visualize
: visualize the generated network. It will create a plot of the graph of the network. If the --outdir
option is specified, the plot is saved in the output directory. Otherwise, it is shown in a window. This is mainly intended for a quick check of the generated network, not a tool for debugging.Parallel header: Instead of sending the header before the payload, the header is sent in parallel to the payload. This way, the link utilization is not degraded by header flits.
Wires are cheap now
-You might wonder why this was not used in the first place. The reason is that wires were not as cheap as they are today. Modern technologies now have >10 metal layers which can fit >10000 wires/mm. A very good source on this topic, which has also influenced the design of FlooNoC is the NOCS keynote Reflections on 21 Years of NoCS from Bill Dally, one of the pioneers in early NoC research.
-Below, we will discuss the header and the payload in more detail.
In FlooNoC the header consists of the following fields:
@@ -1255,7 +1251,7 @@rsvd
logic[x:0]
logic[RsvdBits-1:0]
Now that we established flits -- the smallest unit of data that is sent -- we can discuss how a flit is sent from one node to another. As we have explained in the flits section, there usually exist multiple types of flits, which differ in the payload they carry. For instance, the payload can be an AXI request, an AXI response, or any other data that needs to be sent from one node to another. For multiple reasons, it makes sense to send these different types of flits over different "channels", which we will discuss in this section.
+Channels are a way to separate different types of flits. For instance, one channel can be used to send AXI requests, another channel can be used to send AXI responses, and a third channel can be used to send other types of data. This separation has multiple advantages:
+Message-level deadlocks: If all flits are sent over a single channel, message-level deadlocks can be introduced. For instance, if node A sends a request to node B, and node B sends a request to node A, both nodes might need to wait for their response before accepting a new request, which can lead to a deadlock. By separating the request and response channels, we can ensure forward progress.
+Latency: Different types of flits might have different priorities. For instance, some messages are very latency-sensitive (e.g. synchronization messages), while others are much more latency-tolerant (e.g. bulk data transfers). By separating the channels, we can ensure that the congestion can be kept low on the latency-sensitive channel, which in turn reduces the latency of these messages.
+Bandwidth: Different types of flits might have different bandwidth requirements. For instance, the data widths of AXI can reach up to 1024 bit, and AXI additionally supports burst transfers. Using wide links is the natural way to increase the bandwidth of the channel. However, smaller flits like AXI write responses are only a fraction of the link width and would waste bandwidth if sent over a wide link.
+There are essentially two different ways to implement multiple channels:
+Virtual channels: Virtual channels are a way to multiplex multiple channels over a single physical channel. Virtual channels have the advantage that the physical channel can be used more efficiently, as it can be shared between multiple virtual channels. Moreover, message-level deadlocks can be prevented with virtual channels, as messages from different channels can be interleaved, resp. they can overtake each other. This is possible because, on the RX side of a virtual channel, every channel has its own buffers. So even if, for instance, the buffer for requests is full, responses can still be received. While virtual channels have their advantages, they also have some disadvantages. For instance, virtual channels require additional logic to multiplex and demultiplex the channels, which increases the complexity of the design. Furthermore, multiplexing onto a single physical channel limits the throughput of the channel.
+Physical channels: Physical channels, on the other hand, are real physical channels in hardware. Effectively, physical channels result in multiple separate networks used to send different types of messages through the network. The main advantage of physical channels is the throughput of the channel, since it is not shared with other channels. Also, routers for physical channels can be streamlined, since they don't require multiplexing of virtual channels. One disadvantage of physical channels is that they require more routing resources, as each physical channel is implemented as a separate network.
+One of the main design principles of FlooNoC is to use multiple physical channels instead of virtual channels. While the main drawback of physical channels is the increased routing resources, modern technologies come to the rescue here. For instance, modern technologies can feature up to 20 metal layers and have routing resources of >10000 wires/mm that can be exploited to implement multiple physical channels. Not all of it is available for routing of course, since some routing resources are used for cell connectivity and power distribution. However, the routing resources tend not to be the bottleneck in the design, especially not global wires on higher metal layers of the chip, which are primarily used for the routing of the physical links.
+Wires are cheap now
+A very good source on this topic, which has also greatly influenced the use of physical channels during the development of FlooNoC is the NOCS keynote Reflections on 21 Years of NoCS from Bill Dally, one of the pioneers in early NoC research.
+In FlooNoC, we use multiple physical channels to separate different types of traffic. The most basic form of FlooNoC is to use two channels req
and rsp
, to send all requests resp. responses. However, traffic in an SoC can be quite diverse, and comes with different requirements. For instance, synchronization messages are usually very small, in the order of a few bytes, but are very latency-sensitive. On the other hand, bulk data transfers can be very large, but are usually more tolerant to latency, since they can be issued as multiple outstanding transactions. In some systems, this is the reason why multiple AXI interfaces are used: a narrow one for configuration and synchronization messages and a wider one for bulk data transfers. In that case, FlooNoC also features a wide
channel to provide high bandwidth for bulk data transfers.
req
, rsp
mappingIf only a single AXI interface is used (e.g. with 32-bit address width and 64-bit data width), the AXI channels are mapped to the FlooNoC channels as follows:
+ - + diff --git a/hw/links/index.html b/hw/links/index.html index cfba7848..1dcf8b7f 100644 --- a/hw/links/index.html +++ b/hw/links/index.html @@ -18,11 +18,11 @@ - + -+ | req |
+rsp |
+primary payload | +
---|---|---|---|
Aw |
++ | - | +addr (32-bit) |
+
Ar |
++ | - | +addr (32-bit) |
+
W |
++ | - | +w_data (64-bit) |
+
R |
+- | ++ | r_data (64-bit) |
+
B |
+- | ++ | b_rsp (2-bit) |
+
The mapping is quite straightforward. Requests from AXI managers are sent over the req
channel, while responses from AXI subordinates are sent over the rsp
channel. Message-level deadlocks are also avoided this way, since requests and responses are sent over different channels.
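The single-AXI mapping described above can be written down as a small lookup table. The following is only an illustrative Python sketch: `AXI_TO_FLOO`, `floo_channel`, `MANAGER_DRIVEN` and `SUBORDINATE_DRIVEN` are hypothetical names chosen for this example and are not part of FlooNoC or FlooGen.

```python
# Hypothetical lookup table mirroring the single-AXI mapping described above.
# The dictionary and function names are illustrative, not FlooNoC or FlooGen APIs.
AXI_TO_FLOO = {
    "Aw": "req",  # write request (address)
    "Ar": "req",  # read request (address)
    "W": "req",   # write data
    "R": "rsp",   # read data
    "B": "rsp",   # write response
}


def floo_channel(axi_channel: str) -> str:
    """Return the FlooNoC channel that carries the given AXI channel."""
    return AXI_TO_FLOO[axi_channel]


# AXI channels driven by the manager and by the subordinate, respectively.
MANAGER_DRIVEN = {"Aw", "Ar", "W"}
SUBORDINATE_DRIVEN = {"R", "B"}

# Requests and responses never share a FlooNoC channel in this mapping,
# which is the property that avoids message-level deadlocks.
request_channels = {floo_channel(ch) for ch in MANAGER_DRIVEN}
response_channels = {floo_channel(ch) for ch in SUBORDINATE_DRIVEN}
assert request_channels.isdisjoint(response_channels)
```

The final assertion simply restates the deadlock argument: no FlooNoC channel carries both manager-driven and subordinate-driven traffic.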
req
, rsp
, wide
mappingIn case two AXI interfaces are used, a narrow (e.g. 64-bit) and a wide one (e.g. 512-bit), the AXI channels are mapped to the FlooNoC channels as follows:
++ | req |
+rsp |
+wide |
+primary payload | +
---|---|---|---|---|
NarrowAw |
++ | - | +- | +addr (32-bit) |
+
NarrowAr |
++ | - | +- | +addr (32-bit) |
+
NarrowW |
++ | - | +- | +w_data (64-bit) |
+
NarrowR |
+- | ++ | - | +r_data (64-bit) |
+
NarrowB |
+- | ++ | - | +b_rsp (2-bit) |
+
WideAw |
+- | +- | ++ | addr (32-bit) |
+
WideAr |
++ | - | +- | +addr (32-bit) |
+
WideW |
+- | +- | ++ | w_data (512-bit) |
+
WideR |
+- | +- | ++ | r_data (512-bit) |
+
WideB |
+- | ++ | - | +b_rsp (2-bit) |
+
In this case, the narrow AXI to req
, rsp
mapping is the same as in the single-AXI case. However, the wide AXI interface mapping is different and requires some explanation. Unsurprisingly, the wide data channels WideR
and WideW
are mapped to the wide
channel to make use of its high bandwidth. The AXI read request WideAr
and the write response WideB
are mapped to the req
and rsp
channel, respectively. Those are smaller messages and would underutilize the wide
channel. The outlier here is the AXI write requests WideAw
, which is mapped to the wide
channel, even though it is a small message. The reason for this is related to the ordering of AXI transactions.
AXI supports out-of-order transactions by specifying transaction IDs (txnID
). Transactions with the same txnID
need to be ordered with respect to each other i.e. they cannot overtake each other. Transactions with different txnID
however are free to do so. The txnID
is specified in the initial requests and the corresponding read and write response also carries the same txnID
. However, the write data is a bit different in this regard. The write data W
does not feature any txnID
and needs to be sent (and eventually arrive at the AXI subordinate) in the same order as the write requests Aw
. This also needs to be guaranteed in systems with multiple AXI managers that send write requests to the same AXI subordinate. If the Aw
and W
are sent over different channels, their order might not be preserved, since those different channels might have different congestion levels. To avoid this, the WideAw
and WideW
are sent over the same channel, which is the wide
channel in this case. Furthermore, it also needs to be guaranteed that WideW
payloads from different AXI requesters are not interleaved in the network, since they cannot be distinguished when arriving at the destination (which would also very likely require large reorder buffers). The non-interleaving needs to be guaranteed by the routers as well, which will be discussed later in the routers section.
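The ordering argument above can be illustrated with a toy model. This is a simplified Python sketch under stated assumptions, not a model of the actual FlooNoC routers: each write is reduced to one AW and one W beat, the managers `"A"` and `"B"` and the arrival orders are made up for the example, and `w_matches_aw` is a hypothetical helper.

```python
def w_matches_aw(arrivals):
    """Return True if W beats arrive in the same manager order as the AW requests."""
    aw_order = [mgr for kind, mgr in arrivals if kind == "AW"]
    w_order = [mgr for kind, mgr in arrivals if kind == "W"]
    return aw_order == w_order


# Case 1: AW and W travel over separate channels. Each channel is in-order on
# its own, but nothing ties the two together. We assume (for illustration) that
# manager A's path on the wide channel is more congested, so B's data wins there.
req_arrivals = [("AW", "A"), ("AW", "B")]
wide_arrivals = [("W", "B"), ("W", "A")]
split_channels = req_arrivals + wide_arrivals

# Case 2: AW and W of one manager are coupled and sent over the same channel,
# so they arrive back to back and cannot be reordered against each other.
shared_channel = [("AW", "A"), ("W", "A"), ("AW", "B"), ("W", "B")]

print(w_matches_aw(split_channels))  # W order differs from AW order
print(w_matches_aw(shared_channel))  # W order follows AW order
```

In the first case the subordinate would pair A's write request with B's data, which is exactly the failure that sending both beats over the same channel avoids.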
FlooNoC uses unions
to represent the different types of flits that are sent over the same physical channel. For instance, the req
channel for a single-AXI configuration is defined as follows:
typedef union packed {
+ floo_aw_flit_t axi_aw;
+ floo_w_flit_t axi_w;
+ floo_ar_flit_t axi_ar;
+ floo_generic_flit_t generic;
+} floo_req_chan_t;
+
A union
essentially allows multiple types of data to be represented in the same number of bits. This is also why rsvd
bits are used in the flits, to ensure that the flits sent over a channel all have the same size. The generic
is not meant to represent a flit with an actual payload, but can be used to decode the type of flit from its header.
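How the padding works out can be sketched in a few lines of Python. The payload sizes below are hypothetical placeholders chosen only for illustration (the real sizes depend on the AXI configuration), and `rsvd_bits` is an illustrative helper, not a function from `floo_pkg` or FlooGen.

```python
def rsvd_bits(payload_bits: dict[str, int]) -> dict[str, int]:
    """Return the number of reserved padding bits each flit needs so that
    all flits sharing a channel end up with the same size."""
    widest = max(payload_bits.values())
    return {name: widest - bits for name, bits in payload_bits.items()}


# Hypothetical payload sizes in bits for the members of a req channel union.
payloads = {"axi_aw": 70, "axi_w": 75, "axi_ar": 70}
padding = rsvd_bits(payloads)

# After padding, every member of the union occupies the same number of bits.
padded_sizes = {name: payloads[name] + padding[name] for name in payloads}
assert len(set(padded_sizes.values())) == 1
```

The widest payload needs no padding, and every other flit is padded up to that width, which is the condition a packed union relies on here.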
SystemVerilog Macros
+Similar to the flits, FlooNoC provides SystemVerilog macros in typedef.svh
to generate the channel types such as FLOO_TYPEDEF_AXI_CHAN_ALL
for a single-AXI configuration and FLOO_TYPEDEF_NW_CHAN_ALL
for a narrow-wide AXI configuration.
floogen
FlooGen
-
+
diff --git a/sitemap.xml b/sitemap.xml
index 5b190bfc..673e1857 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,102 +2,102 @@