Concept - not part of any standard
Some vector instructions treat operands as a vector of one or more element groups, where each element group is a fixed number of elements. For example, complex numbers can be viewed as a two-element group (one real element and one imaginary element). As another example, the SHA-256 cryptographic instructions operate on 128-bit values represented as a 4-element group of 32-bit elements.
This section describes recommendations and terminology for generic instruction set design for vector instructions that operate on element groups.
Note
|
Element groups effectively replace the EDIV concept that was discussed before ratification of the vector spec, but removed from the ratified spec. Element groups are more flexible, supporting a wider range of group sizes, and avoid having to increase supported SEW for larger groups. |
The element group size (EGS) is the number of elements in one group, and must be a power-of-two (POT).
Note
|
Support for non-POT EGS was considered but causes many practical
complications and so has been dropped. Error checking for vl is a
little more difficult. For LMUL>1, non-POT EGSs will result in groups
straddling the individual vector registers in a vector register
group. Non-POT EGS can also cause large increases in the
lowest-common-multiple of element group sizes, which adds constraints
to vl setting in order to avoid splitting an element group across
stripmine iterations in vector-length-agnostic code.
|
The element group size is statically encoded in the instruction, often implicitly as part of the opcode.
Executing a vector instruction with EGS > VLMAX causes an illegal instruction exception to be raised.
Note
|
The vector instructions in the base V vector ISA can be viewed as all having an element group size of 1 for all operands statically encoded in the instruction. |
Note
|
Many operations only make sense with a certain number of elements per group (e.g., complex operations require a element group size of 2 and SHA-256 requires an element group size of 4). |
Note
|
Possible future vector instructions that might operate on variable-sized element groups are not considered here. |
Each source and destination operand to a vector instruction might be
defined as either a single element group or a vector of element
groups. When an operand is a vector of element groups, the vl
setting must correspond to an integer multiple of the element group
size, with other values of vl
reserved.
Note
|
More complex vector instructions could potentially have different non-unit vector lengths for different operands, but these are not considered here. |
Note
|
For example, a SHA-256 instruction would require that vl is a
multiple of 4.
|
When element group instructions are present, an additional constraint
is placed on the setting of vl
based on an AVL value (Section 6.3 of
vector spec 1.0). EGSMAX is the largest EGS supported by the
implementation. When AVL > VLMAX, the value of vl
must be set to
either VLMAX or a positive integer multiple of EGSMAX.
Note
|
As the base vector extension only has element group size of 1, this constraint is backwards-compatible. |
Note
|
This constraint prevents element groups being broken across stripmining iterations in vector-length-agnostic code when a VLMAX-size vector would otherwise be able to accomodate a whole number of element groups. |
Note
|
If EEW is encoded statically in the instruction, or if an
instruction has multiple operands containing vectors of element groups
with different EEW an appropriate SEW must be chosen for vsetvl
instructions.
|
Note
|
Additional constraints may be required for some element group instructions to ensure legal length values for all operands. |
The vtype
SEW can be used to indicate or calculate the effective
element size (EEW) of one or more operands of an element group
instruction. Where the operand is an element group, SEW and EEW refer
to the number of bits in each individual element within a group not
the number of bits in the group as a whole.
Alternatively, the opcode might encode EEW of all operands statically and ignore the value of SEW when the operation only makes sense for a single size on each operand.
Note
|
Many operations are only defined for one EEW, e.g., SHA-256 requires EEW=32. Encoding EEWs statically in the instruction removes a dynamic dependency on the SEW value and the need to check for errors in SEW values. However, ignoring SEW also prevents reuse of the static opcode with a different dynamic SEW, and in many cases, the SEW setting will be needed for regular vector instructions used to process the individual elements in the vector. |
The vtype
LMUL setting can be used to indicate or calculate the
effective length multiplier (EMUL) for one or more operands. Element
group instructions tend to exhibit a much wider range of relationships
between various operand EEW/EMUL values. For example, an instruction
might take a vector of length N of 4-element groups with EEW=8b and
reduce each group to produce a vector length N of 1-element groups
with EEW=32b. In this case, the input and output EMUL values are equal
even though the EEW settings differ by a factor of 4.
Each source and destination operand to a vector instruction may have a different element group size, different EMUL, and/or different EEW.
The element group width (EGW) is the number of bits in the element group as a whole. For example, the proposed SHA-256 instructions operate on an EGW of 128, with EGS=4 and EEW=32. It is possible to use LMUL to concatenate multiple vector registers together to support larger EGW>VLEN.
Note
|
Implementations may choose not to support instructions with EGW>VLEN due to implementation complexity. |
Note
|
If software using large EGW instructions wants to be portable across a range of implementations, some of which may have VLEN<EGW and hence require LMUL>1, then software can only use a subset of the architectural registers. Profiles can set minimum VLEN requirements to inform authors of portable software. |
Note
|
Element group operations by their nature will gather data from across a wider portion of a vector datapath than regular vector instructions. Some element group instructions might allow temporal execution of individual element operations in a larger group, while others will require all EGW bits of a group to be presented to a functional unit at the same time. |
Some element-group instructions might not support masking. Some element-group instructions might support element-level masking. Other instructions might define mask behavior in terms of a mask element group (e.g., update destination element group if any or all mask bits in mask element group are set, and/or update one or more destination mask values in a mask element group on basis of element group predicate).