Balar mmio with Vanadis #2428

Open: wants to merge 62 commits into base: devel


@William-An (Contributor) commented Dec 12, 2024

Balar mmio with Vanadis

  • Add support for using Balar as a memory-mapped (MMIO) device that Vanadis can access
  • Create a custom CUDA runtime library to run CUDA programs on Vanadis
  • Add more CUDA runtime APIs to support the Rodinia 2.0 benchmarks
  • Add more unit test cases

	* Add a new CUDA API id "GPU_PARAM_CONFIG" to support
	  querying kernel function argument size and alignment
	  information from GPGPU-Sim.

	* Add param "cuda_executable" to BalarMMIO so that it
	  knows the CUDA binary path when running LLVM-compiled
	  CUDA code (Vanadis has no view of the host file structure).

	* Add all the CUDA API implementations needed to link
	  the test program inside tests/vanadisLLVMRISCV.

	* Minor formatting changes.

@sst-autotester (Contributor)

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

Member

@gvoskuilen @feldergast Do we want to keep the prerequisites in this readme or remove them in favor of the list that we test against? We have already discussed which tests we want in the nightlies versus the weeklies.


- Tested on commit `0f358dda178f96db3b0da88b2b965492c4be187d`
- Use `./configure --prefix=$SST_CORE_HOME --disable-mpi --disable-mem-pools` for sst-core config

Member

@William-An Did you test at all with mem pools enabled?

Contributor Author

I have just tested with both MPI and mem pools enabled, and Balar works with both options.

balar->cuda_ret.is_cuda_call_done = false;

// Create a DMA request to read the cuda call packet from cache to balar
DMAEngine::DMAEngineControlRegisters dma_registers;

Member

@William-An Did we discuss putting this in memH or vanadis?
@gvoskuilen

Contributor Author

I don't think we have settled this yet.

 gridDim,
 blockDim,
 packet->configure_call.sharedMem,
-packet->configure_call.stream
+(cudaStream_t) packet->configure_call.stream

Member

Do CUDA streams work in this framework?

Contributor Author
@William-An, Jan 8, 2025

I don't think Rodinia 2.0 uses CUDA streams, but I can create a test example for this using https://github.com/NVIDIA/cuda-samples/tree/master.

GPU_MALLOC_HOST_RET,
};

// Future: Make this into a class with additional serialization methods?

Member

@gvoskuilen @feldergast Is this going to be necessary for checkpointing/debug?

# Constants shared across components
network_bw = "25GB/s"
clock = "2GHz"
balar_mmio_testcpu_addr = 4096

Member

@William-An How configurable are the mmio addresses?

Contributor Author

For testcpu, the MMIO address can be moved around. When used with Vanadis, the address needs to be 0x80100000, as that is the address specified in Vanadis's hashmap for file descriptors.

clock = "2GHz"
balar_mmio_testcpu_addr = 4096
balar_mmio_vanadis_addr = 0x80100000
balar_mmio_size = 1024

Member

What about the mmio sizes?

Contributor Author

Balar does not need a large MMIO size, as CUDA call packet data are passed via pointers. When used with Vanadis, however, the region is mmapped at page granularity, so the address range from 0x80100000 to 0x80100FFF will be mapped as /dev/balar.

uint64_t size;
uint64_t offset;
uint8_t value[200];

Member

@William-An If this is related to the array from above, we should find a way to ensure that this is propagated everywhere that relies on it.

@@ -43,7 +48,8 @@ int main( int argc, char* argv[] ) {

// Preparing the data

Member

Why only five updates? And why is n = 10k?

Contributor Author

I set it to 5 because Vanadis is a bit slow at handling the print syscall, and 10k is a number I picked that is large enough to be a meaningful run.


/**
* @file cuda_runtime_api.h
* @author Weili An ([email protected])

Member

@William-An You should probably remove your email address from these unless you want users bugging you directly. ^-^

@William-An force-pushed the balar-mmio-vanadis-llvm branch from bb3a75c to ef89fe0 on January 12, 2025 20:12

3 participants