Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"New" TTDevice class #40

Closed
wants to merge 62 commits into from
Closed

"New" TTDevice class #40

wants to merge 62 commits into from

Conversation

vtangTT
Copy link
Contributor

@vtangTT vtangTT commented Aug 27, 2024

This PR aims to separate/clean up tt_SiliconDevice by separating all functions that interact directly with the PCIe into the TTDevice class.

End goal:

  • something similar to host_api.h from metal, where the essential functions left in tt_SiliconDevice (write_to_device, read_from_device, tlb setup, etc.) will be exposed to TT-metal
  • those functions will use TTDevice class to interact with the PCIe
  • any helper functions will be moved to a separate util file

Next steps:

  • get rid of tt_SiliconDevice class and rename tt_silicon_driver.cpp and tt_device.h into something more appropriate (open to suggestions)
  • split out helper functions to a diff file
  • rename pcie_device.cpp .hpp into tt_device.cpp .hpp
  • implement the necessary metal changes

UMD post commit: https://github.com/tenstorrent/tt-metal/actions/runs/10636550474

@vtangTT vtangTT changed the title init changes for tt_device New TTDevice class Aug 30, 2024
@vtangTT vtangTT changed the title New TTDevice class "New" TTDevice class Aug 30, 2024
Copy link
Contributor

@joelsmithTT joelsmithTT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A step in the right direction, IMO.

I've commented on various parts of the diff, not all directly related to this PR's goals. These don't need fixing right now - just areas that could be opportunity for future improvement.

device/pci_device.cpp Show resolved Hide resolved
device/pci_device.cpp Show resolved Hide resolved
device/pci_device.cpp Show resolved Hide resolved
device/pci_device.cpp Show resolved Hide resolved
uint32_t *dest = reinterpret_cast<uint32_t*>(data);

while (word_len-- != 0) {
uint32_t temp = *src++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a workaround for something. I am not convinced it is needed.
A similar comment from me in the past resulted in similar code being removed from the write path.

device/pci_device.hpp Show resolved Hide resolved
device/pci_device.cpp Show resolved Hide resolved
Comment on lines +285 to +289
static const char sys_pattern[] = "/sys/bus/pci/devices/%04x:%02x:%02x.%u/%s";
char buf[sizeof(sys_pattern) + 10];

// revision pattern = "/sys/bus/pci/devices/%04x:%02x:%02x.%u/revision"
std::snprintf(buf, sizeof(buf), sys_pattern, pcie_domain, pcie_bus, pcie_device, pcie_function, "revision");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the code that does this in multiple places is still around, but constructing the path so the caller can append "revision" or "current_link_width" or whatever could be a useful utility function.

device/pci_device.cpp Show resolved Hide resolved
@@ -3627,7 +2929,7 @@ void tt_SiliconDevice::write_mmio_device_register(const void* mem_ptr, tt_cxy_pa
// Copy value from main buffer to aligned buffer
std::memcpy(aligned_buf.local_storage, mem_ptr, size);
}
write_regs(dev, mapped_address, aligned_buf.block_size / sizeof(uint32_t), aligned_buf.local_storage);
pci_device->write_regs(mapped_address, aligned_buf.block_size / sizeof(uint32_t), aligned_buf.local_storage);
Copy link
Contributor

@joelsmithTT joelsmithTT Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again - not really related to your change, just pointing it out for future consideration --

It seems that because this write_regs interface accepts a block of data (as opposed to a single u32), it was natural to hook it up to UMD's device write interface (which accepts a block of data and conflates register vs memory access) instead of giving UMD a sane interface for register writes.

I regard this area of code as broken because if the buffer does get fixed up on account of (size % 4 != 0), it looks like hardware receives (aligned_buf.block_size - size) bytes of garbage, which is almost certainly not what the caller wants.

I am not sure that the buffer here should be getting fixed up at all.

@vtangTT vtangTT force-pushed the metal-main branch 2 times, most recently from d2e1711 to 4e74102 Compare September 10, 2024 20:42
@broskoTT
Copy link
Contributor

A step in the right direction, IMO.

I've commented on various parts of the diff, not all directly related to this PR's goals. These don't need fixing right now - just areas that could be opportunity for future improvement.

I really like to push towards small PRs. That said, this PR currently is small in terms it doesn't have many functional changes, only a lot of lines change due to file move/rename.

I'd rather if this PR goes in like this, and we can create an issue to go back to all your helpful comments that you left on this PR and fix that stuff further. That way we can chunk out the work in more pieces and paralelize and prioritize better (for instance this PR as a whole has a higher priority than some minor additional refactorings), but also have more focused code changes which makes changes less error prone. @joelsmithTT does that sound fine to you?

@pjanevskiTT pjanevskiTT mentioned this pull request Sep 12, 2024
3 tasks
@vtangTT vtangTT force-pushed the metal-main branch 6 times, most recently from 83c55b8 to 378bba9 Compare September 13, 2024 17:28
@broskoTT
Copy link
Contributor

Please make the PR target "main".
Rebase now after you've completed your other PR.
Joel is fine with my suggestion above, so rather than solving all of his comments, you can create an issue for us to get back to it.

@vtangTT vtangTT changed the base branch from metal-main to main September 17, 2024 13:29
@vtangTT
Copy link
Contributor Author

vtangTT commented Sep 17, 2024

Closing this PR cuz rebasing is not worth the time.

Please see new PR #64

@vtangTT vtangTT closed this Sep 17, 2024
@vtangTT vtangTT deleted the vtangTT/device branch October 17, 2024 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants