Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a bit flag to heartbeat with synchronization flag #8

Closed
bruno-f-cruz opened this issue Oct 15, 2023 · 10 comments · Fixed by #45
Closed

Add a bit flag to heartbeat with synchronization flag #8

bruno-f-cruz opened this issue Oct 15, 2023 · 10 comments · Fixed by #45
Assignees
Labels
proposal Request for a new feature under review An issue that is undergoing pr review

Comments

@bruno-f-cruz
Copy link
Member

It would be very helpful to have an additional flag indicating whether a device is currently synchronized (i.e. receiving a synchronization message on every second) is a source. This is super relevant for online QC but also when using the same clock device across multiple setups, since there is no easy way to communicate with the same clock synchronizer from two different computers for instance.

This change should probably be made to the core and result in a new release. If we add it as an additional array position, it should even be backward compatible.

@bruno-f-cruz bruno-f-cruz added the proposal Request for a new feature label Oct 15, 2023
@glopesdev
Copy link
Collaborator

glopesdev commented Dec 14, 2023

Feedback from SRM meeting:

  • Doesn't make sense to add it to every Harp message
  • @bruno-f-cruz and @filcarv explored the possibility of including this as data in the heartbeat message, so we don't overload all messages to the host with this bit.
  • Current options are:
    1. Create a new "status" register (might allow more bits in the future)
    2. Reserve new data in the heartbeat register

@glopesdev
Copy link
Collaborator

Unresolved questions

Status register

Pros:

  • Allow for more bits, heterogeneous structure (bitmask or flags).
  • Does not need to get tied to the U32 payload of the heartbeat register.

Cons / Questions:

  • When does the status register get sent? If once every second, will having a second register imply unnecessary delays (e.g. 1ms) compared to the heartbeat? Is this a problem (maybe not)?
  • Alternatively, can we have the "status" register send an event only on change?
  • Is trying to pack all statuses in a single structure really an advantage? There might be a need for many status registers which are tied to different hardware aspects.

Augmented heartbeat register

Pros:

  • Synchronization flag evaluation is aligned to heartbeat anyway.
  • It would be useful to be able to debug timing information further using both synchronization flags and heartbeat (e.g. check if synchronization bit is set and also compute difference between clocks).
  • Even if you miss the initial register status you can keep checking periodically that everything is synchronized.

Cons / Questions:

  • Will this break backwards compatibility? Depends on our stance on increasing array length.
  • Since payload type is U32 for seconds, we need to add an entire 32-bit word just for one bit.

@glopesdev
Copy link
Collaborator

Special status bits only for specific registers

An alternative proposal would be to reserve space in the general Harp message header to have "per-register" status / metadata. In this proposal, the status bits would then be set only for the heartbeat message, but without the need to extend the size of the register

@bruno-f-cruz
Copy link
Member Author

Special status bits only for specific registers

An alternative proposal would be to reserve space in the general Harp message header to have "per-register" status / metadata. In this proposal, the status bits would then be set only for the heartbeat message, but without the need to extend the size of the register

I don't think this is warranted since the synchronization protocol is not continuous (i.e synchronization is only ensured once every second). I think a periodic event is more inline with this.

@glopesdev
Copy link
Collaborator

I don't think this is warranted since the synchronization protocol is not continuous (i.e synchronization is only ensured once every second). I think a periodic event is more inline with this.

My suggestion was not to add this for every message, only for heartbeat events, hence register-specific status bits, I will explain better in the SRM, but I agree probably not worth it anyway.

@glopesdev
Copy link
Collaborator

Feedback from SRM meeting:

  • Draft proposal for new status register which will replace the current heartbeat event (i.e. the new status register will be sent instead of register timestamp seconds)
  • This will be a breaking behavior change since older workflows will stop receiving event 8 as heartbeat

@bruno-f-cruz
Copy link
Member Author

bruno-f-cruz commented Jan 28, 2024

Ok from all the past discussions I believe these are the critical points to move forward:

  • What register will this bit flag be added to?

    • Are we adding it to an already existing register with bit available (e.g. OpControl or CLK_CONFIG)? (bit name SynchStatus)?? I suggest Bit2 of OP_REG or Bit 5 of CLK_CONFIG
    • Are we creating a new register for this sole purpose?
      • Name: Heartbeat?
      • The register would report the status of the device periodically (1Hz).
      • Size?
    • Bit/register name? (SynchStatus)?
    • I suggest if Clk_Config.CLK_GEN is True, then the value of this new bit should be set to 1 since the device would, by definition, by synchronized by itself. This would make things on the client side much easier.
  • What is the expected behavior?

    • The current heartbeat event (using the Seconds core register`) will be disabled and replaced by a new 1hz periodic event. This event will contain a flag that will state whether the device is currently being externally (True) synchronized or generating its own clock reference (False).
    • Should we establish a upper bound for the jitter of the timestamp of this event? e.g. some devices might take a few cycles to report the event message after the synchronization protocol is successfully completed. As a result the timestamp might not have a precise sub second porpotion equal to 0. Do we need to define this jitter or will trusting the bit-flag be sufficient?
  • This bit should always be Read-Only!

Related to #32 as this register could also report the error state of the device

@bruno-f-cruz
Copy link
Member Author

Feedback from the last meeting 22/02/2024:

Intro

We want to leverage the heartbeat event to relay information about the current status of the device. For now, this could be circumscribed to the synchronization state of the device, but having the space to introduce extra flags later on is critical. This could be used to relay other "status" messages, such as error states, communication layer quality (e.g. for wireless devices), etc...

Possible implementations:

We ended up with three possible distinct implementations with respective pros and cons that I will briefly list.

Use the current OP_REG register

The most immediate solution would be to use the current OP_REG and co-opt the Bit2 of the register to implement this behavior. This change would be fully backward compatible when it comes to the interface, but people would need to change the event they use to synchronize boards. Additionally, the event would also report the current state of the device (Bit0-1) which could also be used to solve #32 if we use the State=2 value to indicate a generic error state.

This option has two main drawbacks. First, it uses the last available bit in the register, so future flags to indicate the state of the device would be impossible to add to the current configuration. Second, it "muddies" the waters by having a register that is used to Write (e.g. device state, alive status, led status) and Read-only (the new event).
The main advantage is keeping one likely relevant piece of information regarding the state of the device, in this case, the OPMODE, in the same register (and thus adjacent memory) that is responsible to report the status of the device. It is also backward compatible as mentioned before.

Extend the current OP_REG register

An extension of the previous implementation. The register would suffer a breaking change by going from a 8bit to a 16bit register, allowing more flags to be encoded in the future. It would still suffer from the other issue.
Importantly, this would be a breaking change, and its implementation should wait for a larger batch of breaking changes thus delaying its deployment.
(Side note, would it be possible to turn the register into a uint8[2] instead of uint16? This could potentially allow for backward compatibility)

New Register

  • Name: 'Heartbeat' or 'Status'
  • Size: Likely a U16 so we have ample space for future flags
  • Access: Read-only
  • The device would report the state of the bit periodically by default. The event dispatching can be turned off using the previous op_reg alive flag.
  • The current heartbeat event (using the Seconds core register) will be disabled and replaced by a new 1Hz periodic event. This event will contain a flag that will report whether the device is currently being externally (True) synchronized or generating its own clock reference (False).
  • I suggest if Clk_Config.CLK_GEN is True, then the value of this new bit should be set to 1 since the device would, by definition, by synchronized by itself. This would make things on the client side much easier.

This suggestion would be backward compatible as it adds a new register. It also splits the write vs read access issue from the previous two solutions. The main drawback is the separation of the OPMODE from this register and the consumption of another <32 core register.

@glopesdev
Copy link
Collaborator

glopesdev commented Feb 27, 2024

Some more thoughts from the discussion:

  • A valid argument raised in the meeting is that the value of R_OPERATION_CTRL can already change by itself from Active to Standby mode due to the nature of the device state machine, so reporting any internal changes to this existing register as an event might be warranted.

Note

The above notwithstanding, this is currently the only spontaneous change in the value of the register. All other bits seem intended as configuration states and will never be changed by the device. If a new status or heartbeat register is introduced, we can specifically introduce a new bit to represent whether the device is in active or standby mode (see below).

  • If we are defining a new register, it would be worth to survey at this point all possible extra bits which might be useful to report from the device. Some possible examples for discussion:
    • Error status (0=normal operation; 1=exceptional termination)
    • Synchronized (0=CLK_GEN is false and synch pulses are not being received; 1=CLK_GEN is true or synch pulses are being periodically received)
    • Active (0=device is in standby mode; 1=device is in active mode, currently either normal or speed mode)

Note

On further consideration, I feel we shouldn't entirely duplicate the state of OP_MODE in the new register, and would argue the only meaningful bit to report is whether the device is active or on standby.

  • It was argued that leftover bit 2 might be used as a backwards compatible toggle between the old heartbeat behavior of reporting R_TIMESTAMP_SECOND and the putative new register. However, this might actually be more error prone in the mid-term if devices are updated without changing their behavior, since it raises the risk of mixed formats in setups with multiple devices; also see below.

  • The leftover bit 2 is better positioned to extend OP_MODE to 8 possible values, which may be useful in future devices, e.g. wireless devices could maybe benefit from a special mode different than speed mode.

Warning

The R_OPERATION_CTRL spec is currently wrong since it lists this register as 16-bit whereas historically it has so far been an 8-bit register. This appears to be a typo when porting the original DOCX / PDF document to markdown, since the original document was correct. After discussion ends we should make sure to review all registers in this document.

@bruno-f-cruz
Copy link
Member Author

bruno-f-cruz commented Feb 29, 2024

Notes from 29022024

We will go with the new additional register. I propose the following specs:

Name: Heartbeat
Size: U16
Access: Read, Event
BitFlags:

  • 0x01: IsActive
  • 0x02: IsErrorState
  • 0x04: IsSynchronized
  • The other 13 bits will remain reserved for future use.

I will draft a PR with a complete proposal

@bruno-f-cruz bruno-f-cruz added the under review An issue that is undergoing pr review label Feb 29, 2024
@bruno-f-cruz bruno-f-cruz self-assigned this Feb 29, 2024
@bruno-f-cruz bruno-f-cruz linked a pull request Mar 24, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal Request for a new feature under review An issue that is undergoing pr review
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants