-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding the Dromajo Platform #721
base: main
Are you sure you want to change the base?
Adding the Dromajo Platform #721
Conversation
it still does the BP program loading currently, right? |
If you are asking about loading the BP ELF into DRAM, that is not implemented yet. I am directly passing in the program as a argument to Dromajo. But @drichmond said we could use the bsg_manycore_loader.cpp library file to load the binary. |
Got it! |
This would require some modifications. That loader is for a rv32 binary and has some other vanilla core specific shiz in it. |
Well, that is atleast a base to start off with rather than writing one from scratch :) |
f2cc40d
to
87e4965
Compare
Yeah I would imagine the BP loader is much simpler — our linker script has like 4 sections |
It should be so trivial, even a parrot could do it. |
19f4ffb
to
79654a4
Compare
0fe92b4
to
cf7e823
Compare
I think I resolved most of the issues I was facing barring the ones in discussion offline. Can we get started on the review process @dpetrisko @drichmond @mrutt92 ? |
Yes @mrutt92 @drichmond , @sripathi-muralitharan will be leaving soon so it’s important we get this buttoned up by let’s say end of 1st week of july |
|
||
err |= hb_bp_read_from_mc_bridge(&fence_resp_pkt, HB_MC_FIFO_RX_RSP); | ||
is_vacant = fence_resp_pkt.response.data; | ||
} while ((err == HB_MC_SUCCESS) && !((credits_used == 0) && is_vacant)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So BP counts down to 0, not up to the credit limit (as tiles do in the manycore), is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move error checking out of the while loop condition and make them discrete. It makes the termination condition hard to think about, and makes debugging later easier. (This would solve the |= comment above, too)
Then, I would make it: (credits_used != 0) || !is_vacant
This makes it clearer that the while loop will not terminate if either condition holds. Having the ! on the outside means I have to think about it and apply boolean algebra.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, BP counts down to 0 because the DPI module in the testbench returns the number of credits currently used (https://github.com/bespoke-silicon-group/bsg_manycore/blob/d7ab95c470bc3ade5dce2a18046339ea211f6a8c/testbenches/dpi/bsg_nonsynth_dpi_manycore.hpp#L116)
err = hb_bp_get_credits_used(&credits_used); | ||
// In a real system, this function call makes no sense since we will be sending packets to the | ||
// host on the network and are trying to check for credits to be zero and complete the fence. | ||
// It is fine here because of the system setup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more explicit about why it's fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment. It is fine in this setup because host packets do not go over the network. While checking for TX vacancy I don't think you want to send a packet over the network to do that.
// Intercept packets that are for the host and generate appropriate responses | ||
// TODO: Currently, these packets don't go over the network. In the real system, they will and the simulation infrastructure | ||
// must emulate that as best as possible. | ||
if ((dromajo_to_mc_packet.request.x_dst == HOST_X_COORD) && (dromajo_to_mc_packet.request.y_dst == HOST_Y_COORD)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future I would factor this into two separate. methods, one for sending to the manycore and one for sending to the host. The if statement is fine, but there's a lot of code in a single method for transmitting packets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the packet transmission to the manycore in the same function and moved the host packet handling to another function. Can break it down further if needed.
pkt = reinterpret_cast<__m128i *>(&dromajo_to_mc_packet); | ||
|
||
// Allows the DPI interface to track response FIFO capacity | ||
bool expect_response = | ||
(dromajo_to_mc_packet.request.op_v2 != HB_MC_PACKET_OP_REMOTE_STORE) && | ||
(dromajo_to_mc_packet.request.op_v2 != HB_MC_PACKET_OP_REMOTE_SW) && | ||
(dromajo_to_mc_packet.request.op_v2 != HB_MC_PACKET_OP_CACHE_OP); | ||
|
||
// Attempt packet transmission | ||
// Since we trigger a call to the transmit FIFOs only when all the 32-bit Dromajo | ||
// FIFOs are full (i.e. a single 128-bit packet is ready), we need to wait until | ||
// the DPI FIFOs are ready to receive before advancing to the next operation. | ||
// This can prevent filling up of the FIFOs. However, not doing this can help in | ||
// identifying situations that might create backpressure in actual hardware and | ||
// provision for it. | ||
do { | ||
advance_time(); | ||
err = dpi->tx_req(*pkt, expect_response); | ||
} while (err != BSG_NONSYNTH_DPI_SUCCESS && | ||
(err == BSG_NONSYNTH_DPI_NO_CREDITS || | ||
err == BSG_NONSYNTH_DPI_NO_CAPACITY || | ||
err == BSG_NONSYNTH_DPI_NOT_WINDOW || | ||
err == BSG_NONSYNTH_DPI_BUSY || | ||
err == BSG_NONSYNTH_DPI_NOT_READY)); | ||
|
||
if (err == BSG_NONSYNTH_DPI_SUCCESS) { | ||
for (int i = 0;i < 4; i++) | ||
host_to_mc_req_fifo->fifo[i].pop(); | ||
} | ||
else { | ||
bsg_pr_err("%s: Packet transmission failed\n", __func__); | ||
return HB_MC_FAIL; | ||
} | ||
} | ||
is_empty = mc_is_fifo_empty(type); | ||
} while (!is_empty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above, I would make this transmit_device_packet. Then it's clear it's host OR device.
But it's much easier to understand now.
* Adding missing copyright headers * Bug fixes in passing arguments from x86 land to RISC-V land * Compiling manycore binary first in preparation to run SPMD/CUDA-lite programs on Dromajo/BP
* Simulator updates for argument retrieval * Simulator updated for correct updates during fence (hopefully) * Adding dromajo_transmit_device_packet() * Updating address map, adding hb_bp_get_?x_fifo_entries * Adding check for entries in Dromajo TX FIFO during fence * Putting in newlines where required
* Using a new linker script * CRT modifications for new linker script * Changing DRAM base for Dromajo
* Argument parsing hacky fix to ignore the first argument; ARGP is just ughhhh * Copyright headers in some files * Refactoring placement of some make targets * Updates to comments
8f379b8
to
9c6a5ba
Compare
/* | ||
* Reads manycore specific symbols from the program binary | ||
* @param[out] program: Pointer to a memory location containing the address of the manycore binary | ||
* @param[out] program_size: Pointer to the manycore binary size | ||
* @returns HB_MC_SUCCESS | ||
*/ | ||
static int read_symbols(unsigned char **program, size_t *program_size) { | ||
unsigned char *address; | ||
uint64_t size; | ||
|
||
// Read the address from the symbol table in the binary | ||
__asm__ __volatile__ ("la %[manycore_start_addr], manycore" | ||
: [manycore_start_addr] "=r" (address)); | ||
|
||
// The program size is already stored at a predefined location in the binary | ||
__asm__ __volatile__ ("la t0, manycore_size\n\t" | ||
"ld %[manycore_size], 0(t0)" | ||
: [manycore_size] "=r" (size)); | ||
|
||
*program_size = size; | ||
*program = address; | ||
|
||
return HB_MC_SUCCESS; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function contains the C code to read the symbols from the program binary. Let me know how you guys want to go ahead (i.e. leave it and make it BP-specific somehow or throw it or keep this PR open until the FS stuff is figured out)
# This assembly file includes the manycore binary if there is one | ||
|
||
.section .rodata.manycore | ||
|
||
.globl manycore | ||
.align 4 | ||
.type manycore, @object | ||
manycore: | ||
.incbin BSG_MANYCORE_KERNELS | ||
|
||
.globl manycore_size | ||
.type manycore_size, @object | ||
.align 4 | ||
manycore_size: | ||
.int manycore_size - manycore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is used to generate another assembly file where the BSG_MANYCORE_KERNELS flow variable is replaced with its value here during compilation. One hitch with running SPMD and CUDA tests is that you have to run the *.log
target directly (i.e. you can't run main.exec
and then run main.exe.log
). This has to do with the visibility of the BSG_MANYCORE_KERNELS variable in the flow.
This PR adds a platform for Dromajo - a 64-bit RISC-V emulator that acts as a host for the manycore. It replaces the x86 host and performs all the host operations. The x86 host only runs VCS, controls time, manages the DPI and forwards packets between Dromajo and the manycore.