-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ved feedback #7
Comments
Agreed upon changes so far:
|
Should be: This follows from Smstateen: |
Yes, that's what I meant to say. |
|
|
|
Additional email feedback from Ved, after resolution of the above:
Nits:
|
Resolution of the above:
|
Additional changes agreed upon via email:
Other various text or clarification improvements added. |
Via email:
I have the following feedback on the CTR specification. Later in this email I
have some editorial suggestions as well.
The STE bit and ETEN bit are mutually redundant for U->S/VU->VS trap. The
STE bit is in conjunction with ETEN and is only used for a VU/VS->HS trap. When H
extension is not implemented, the specification would require a redundant ETEN.
Consider optimizing to remove ETEN and using the AND of the TE bits leading to
the privilege level where the trap is delegated as the effective TE instead
of the
and
of the ETEN and *TE.Remove the requirement that if a DEPTH value is supported then all lesser
depth values must be supported. Else this will not be compatible with a
read-only DEPTH encoding in vsctrcontrol.
Add non-normative notes about guarding against leaking a higher privileged
pc
on trap return.Add rationale for reserving some DEPTH encodings.
Is the CLR control required and justified? The following options were discussed
on the email thread.
using CSR writes. This requires 2^(DEPTH+4) * 4 CSR writes in a loop on
context switches.
this may be implemented optimally.
And an implementation note.
When mstateen0.CTR is 0, hstateen0.CTR should appear to be read-only and 0.
Consider a FREEZE-CLR control bit to avoid the read-modify-write hazard on
the FREEZE bit or provide details on how such hazards can be avoided by
SW actions.
Reset behavior of custom fields should be unspecified. Reconsider the
reservation of 2 bits in control for custom extensions. At best they may be
unused and at worst insufficient.
Clarify how WRPTR is modulated on change to DEPTH field - both increasing
and decreasing depth. See below for suggestions.
Clarify intention for recommending clearing of inaccessible entries when
reducing depth. If this is to allow powering down those entries then
that should be also specified that inaccessible entries may hold a legal but
unspecified value when made accessible.
Implicit writes to source/target entry are specified to zero extend from
XLEN to MXLEN. However explicit writes are specified to require a
read-modify-write to preserve the bits beyond XLEN. What is justification
for requiring a read-modify-write and not specifying a zero-extend from
XLEN to MXLEN even on CSR writes.
CTR source and destination registers would want to be specified similar to
xEPC in holding at least one invalid address and allowing invalid addresses
to be transformed to a invalid address that can be held in the register. Further
add clarification about needing to hold all valid PA as well (as specified
for tval).
When only IALIGN=32 is supported the bit 1 of source and destination entry
be read-only zero.
Rationale for metadata being a 256-bit register should be included.
Are cycles in ctrdata required to be obtained from the same clock as mcycle. If
so that should be clarified.
The Freeze section should include the vsctrcontrol description for lcofi/breakpoints
that occur in V=1.
ved
Editorial suggestions:
Original: "A method to record control flow transfer history is valuable for
performance profiling usages, as well as for debug."
Suggestion: "A method for recording control flow transfer history is valuable
not only for performance profiling but also for debugging."
Original: "Control flow transfers refer to jump instructions (including function
calls and returns), taken branch instructions, traps, and trap returns."
Suggestion: This sentence is clear and technically accurate.
Original: "Profiling tools like Linux perf collect control transfer history when
sampling software execution, enabling users, and tools like AutoFDO, to identify
hot paths for optimization."
Suggestion: "Profiling tools, such as Linux perf, collect control transfer
history during software execution sampling. This enables users and tools like
AutoFDO to identify hot paths for optimization."
Original: "CTR defines a circular (FIFO) control transfer history buffer."
Suggestion: "CTR defines a circular (FIFO) buffer for control transfer history."
Original: "The source PC, target PC, and some optional metadata is stored for
each recorded transfer."
Suggestion: "The source PC, target PC, and some optional metadata are stored for
each recorded transfer."
Original: "Logical entry 0 is always the youngest recorded transfer, entry 1 is
the next youngest, etc."
Suggestion: "Logical entry 0 always corresponds to the most recent recorded
transfer, followed by entry 1 as the next most recent, and so on."
Original: "The machine-level extension Smctr encompasses all added CSRs and all
behavior modifications for a hart, over all privilege levels."
Suggestion: "The machine-level extension, Smctr, encompasses all newly added
Control Status Registers (CSRs) and behavior modifications for a hart across all
privilege levels."
Original: "The associated supervisor-level extension Ssctr is essentially the
same as Smctr except for excluding the machine-level CSRs and behaviors that
should not be directly visible to supervisor level."
Suggestion: "The corresponding supervisor-level extension, Ssctr, is essentially
identical to Smctr, except that it excludes machine-level CSRs and behaviors not
intended to be directly accessible at the supervisor level."
Original: "Smctr is dependent on the Smcsrind extension. Ssctr is dependent on
the Sscsrind extension."
Suggestion: "Smctr depends on the Smcsrind extension, while Ssctr depends on the
Sscsrind extension."
Consider using wavedrom to render the register layout to be consistent with
other specifications and for improved readbility. Add figure number captions to
the register layout and table number captions to tables. Consider highlighting
CSR names and fields using backticks -
vsctrcontrol
. Consider using__x__ctrcontrol
insted of *ctrcontrol.Mark "0" fields in the registers as WPRI.
The STE bit and ETEN bit are mutually redundant for U->S/VU->VS trap. The
STE bit is in conjuntion with ETEN is only used for a VU/VS->HS trap. When H
extension is not implemented, the specification would require a redundant ETEN.
Consider optimizing to remove ETEN and using the AND of the TE bits leading to
the privilege level where the trap is delegated as the effective TE.
DEPTH[3:0] implies a 4 bit field. It should be 3 bit field. Can remove the [3:0]
label as width of DEPTH field is provided in the register layout about. Is there
intent to use DEPTH as anything besides depth for future and is why `11x are
reserved? If not why not specify the depth as 2^(DEPTH+4)?
Original: "The depth of the CTR buffer dictates the number of entries to which the
hardware will record transfers."
Suggestion: "The depth of the CTR buffer dictates the number of entries to which
the hardware records transfers."
Original: "For a depth of N, the hardware will record transfers to entries 0..N-1."
Suggestion: "For a depth of N, the hardware records transfers to entries 0 through N-1."
Original: "All entry registers are read-only 0 when the selected entry is in N..255."
Suggestion: "All entry registers read as '0' and are read-only when the selected
entry is in the range N to 255."
Original: "The maximum DEPTH value supported is implementation-specific, but all
lesser values must also be supported."
Suggestion: "The maximum DEPTH value supported varies by implementation. If a
certain DEPTH value is supported then all lesser values must also be supported."
However, this may end up contradicting the requirement that the DEPTH is
read-only reflection of the sctrcontrol.DEPTH when V=1. Suggest removing this
requirement that all lesser values must be supported.
Original: "If software writes a DEPTH value above the maximum supported, DEPTH
must read back the maximum supported value."
Is it required to read back the maximum supported value. It would be better to
allow implementation to report any legal value following WARL.
Original: "An implementation may opt to hardcode some or all of the bits in this
field, based on the depth options supported."
Suggestion: "An implementation may choose to hardcode some or all bits in this
field, depending on the supported depth options."
However, this is implied by WARL.
Custom field should not require a specified reset behavior.
Original: "All fields are optional save for M, CLR, BPFRZ, and DEPTH."
Suggestion: "All fields are optional except for M, CLR, BPFRZ, and DEPTH."
Original: "All unimplemented fields are read-only 0, while other defined fields
are writable, though DEPTH may have some read-only 1 bits."
Suggestion: "All unimplemented fields are read-only with a default value of 0.
Other defined fields are writable, although DEPTH may contain some bits that are
read-only and set to 1."
Original: "S must be writable if S-mode is implemented, U must be writable if
U-mode is implemented, and LCOFIFRZ must be writable if the Smcofpmf/Sscofpmf
extension is implemented."
Suggestion: "The S field must be writable if S-mode is implemented. Similarly, U
must be writable if U-mode is implemented, and LCOFIFRZ must be writable if the
Smcofpmf/Sscofpmf extension is implemented."
Original: "Software may opt to use a depth less than the maximum supported in
order to reduce the latency of saving and restoring CTR state, or to emulate the
maximum depth supported by other implementations, e.g., in cases of VM-migration."
Suggestion: "Software may choose to use a depth less than the maximum supported to
reduce the latency of saving and restoring CTR state, or to emulate the maximum
depth supported by other implementations, for example, during VM migration."
Original: "When reducing CTR depth, by writing mctrcontrol.DEPTH to a smaller value,
software should set mctrcontrol.CLR. This ensures that no transfer state is retained
in the now-inaccessible entries above the new depth value."
What is the intent of this guideline to clear the inaccessible entries. Is it to
allow implementations freedom to power down the inaccessible entries? What harm
arises from not clearing the inaccessible entries?
"vsctrcontrol.DEPTH is a read-only copy of sctrcontrol.DEPTH"
See comment about requiring all lesser values to be also supported.
Original: "The sctrstatus register provides access to CTR status information,
and is updated by the hardware when CTR is active (in an enabled privilege mode
and not frozen)."
Suggestion: "The sctrstatus register grants access to CTR status information and
is updated by the hardware whenever CTR is active, which is when it's in an
enabled privilege mode and not frozen."
WRPTR - specify behavior of legal values that WRPTR can assume when DEPTH value
is changed. Suggest that bits 7:(7 - DEPTH+4) assume a value of 0 and DEPTH+3:0
assume an unspecified but legal value when DEPTH field is written. The statement
"wraps on increment" should be expanded to state "wraps to assume a value of
2^(DEPTH+4) - 1.
"Logical entry 0, accessed via mireg* when miselect=0x200, is always the physical
entry preceding the WRPTR entry ((WRPTR-1) % depth)." should this be CTR entry
(WRPTR-1) % depth? Further since lower case depth is used to imply
2^(DEPTH+4) - it may be good to be explicit about that.
Original: "Because the sctrstatus register is updated by hardware, writes should
be performed with caution."
Suggestion: "Since the sctrstatus register is updated by hardware, writes to it
should be performed cautiously."
Original: "If a multi-instruction read-modify-write to sctrstatus is performed
while CTR is active, such that a qualified transfer, or trap that causes CTR
freeze, completes between the read and the write, a hardware update could be
lost."
Suggestion: "If a multi-instruction read-modify-write operation is performed on
sctrstatus while CTR is active, and a qualified transfer or a trap that causes
CTR to freeze occurs between the read and write stages, a hardware update might
be lost."
Should there be a FREEZE-CLEAR control bit?
Original: "A future extension could add support for RV32, by adding 3 new CSRs
(mctrcontrolh, sctrcontrolh, and vsctrcontrolh) to provide this access."
Suggestion: "A future extension could add support for RV32 by adding 3 new CSRs
(mctrcontrolh, sctrcontrolh, and vsctrcontrolh) to provide this access."
Original: "The miselect index range 0x200..0x2FF is reserved for CTR entries
0..255. When miselect holds a value in this range, mireg provides access to
ctrsource, mireg2 provides access to ctrtarget, and mireg[3456] provide access
ctrdata."
Suggestion: Consider using 0 through 255. For mireg[3456] - mireg[3/4/5/6].
Original: "ctrsource is an MXLEN-bit WARL register that must be able to hold all
valid virtual addresses. It need not be able to hold an invalid address. When
XLEN < MXLEN, software access via *ireg will access only the lower XLEN bits of
ctrsource, and implict writes (by recorded transfers) will be zero-extended."
Suggestion: implict -> implicit (also in ctrtarget description)
Does "software access" imply only read and not writes? So while implicit writes
do a zero extend but explicit CSR writes will be required to preserve the
MXLEN-1:XLEN bits? Why not zero extend for CSR writes to avoid needing a
read-modify-write in hardware.
It may be good to treat this similar to *epc i.e. must hold all valid addresses
but need not hold all invalid addresses.
"It need not be capable of holding all possible invalid addresses. Prior to writing
ctrsource, implementations may convert an invalid address into some other invalid
address that ctrsource is capable of holding."
With that this note can be dropped "A transfer from an invalid address (which
could only occur on an exception) may report a valid address in ctrsource.PC."
There is one case where a jump to an invalid address would record the invalid
address on the trap caused by the instruction access-fault/(guest)page-fault trap.
Add a informational note that when address translation is not in effect, virtual
and physical address are equal and so the set of addresses that can be held in
ctrsource must include the set of physical addresses that can be used as a valid
pc.
Use lower case pc with emphasis.
Add that on implementations that support only IALIGN=32, the bit 1 of ctrsource
is also read only 0.
Similar comments about ctrtarget.
Since CTR is not supported for XLEN 32, the indirect alias CSRs table can be
updated to remove the XLEN=32 column.
Add a non-normative note for rationale for metadata being a 256 bit register.
Original: "If the H extension is implemented and mstateen0.CTR=1, the hstateen0.CTR
bit controls access to supervisor CTR state (sctrcontrol, sctrstatus, and sireg* when
siselect is in 0x200..0x2FF) when V=1."
Should add a note that when mstateen0.CTR is 0, hstateen0.CTR is appears to be
read-only and 0.
Original: "When mstateen0.CTR=1 and hstateen0.CTR=1, VS-mode accesses to supervisor
CTR state behave as described in CSRs and Entry Registers above. When mstateen0.CTR=1
and hstateen0.CTR=0, VS- mode accesses to supervisor CTR state raise a virtual
instruction exception.
With that, the qualification - "and hstateen0.CTR" is not required.
Original: "When hstateen0.CTR=0, qualified control transfers executed while V=1 will
continue to implicitly update Entry Registers and sctrstatus."
Suggestion: "...sctrstatus (really vsctrstatus)"
Original: "Such qualified transfers update the Entry Registers at logical entry
0, such that older entries are pushed down the stack (the record previously in
entry 0 is pushed to entry 1, the record previously in entry 1 is pushed to
entry 2, etc)."
Suggested: "Such qualified transfers update the Entry Registers at logical entry 0.
As a result, older entries are pushed down the stack: the record previously in entry 0
moves to entry 1, the record in entry 1 moves to entry 2, and so on."
Section 5.1 - it may be helpful if the second paragraph was split into two
paragraphs. One to discuss traps and second to discuss trap returns.
"The ctrdata register may optionally include a count of CPU cycles elapsed since the
prior CTR record."
Is this intended to be identical to cycles recorded in mcycles. If so then
that should be spelt out.
"An implementation must clear CCV for the next recorded transfer upon a write to
[ms]ctrcontrol"
On all writes?
Freeze section should include the vsctrcontrol descrition for lcofi/breakpoints
that occur in V=1.
Custom extensions - could they have used a custom indirect CSR? Is there a major
value in reserving 2 bits for custom extensions in *ctrcontrol? It seems best to
make all bits in *control, sctrstatus and entries as WPRI. Custom extensions can
add a bank of registers using the select values designated for custom
extensions.
The text was updated successfully, but these errors were encountered: