Bugfix: fix type of number for offset and size #229

Open: wants to merge 3 commits into main

dlio_benchmark/configs/workload/megatron_deepspeed.yaml (6 changes: 3 additions, 3 deletions)
@@ -12,12 +12,12 @@ dataset:
   data_folder: dataset/megatron-deepspeed/
   format: mmap_indexed_binary
   num_files_train: 1
-  num_samples_per_file: 277203535
-  record_length: 2048
+  num_samples_per_file: 270706
+  record_length: 2097152

 reader:
   data_loader: pytorch
-  batch_size: 1024
+  batch_size: 1
   read_threads: 1
   file_shuffle: seed
   sample_shuffle: seed
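The new dataset and reader values can be sanity-checked with some arithmetic. The Python sketch below copies the numbers from this hunk; the dictionary layout and the helper names `file_bytes` and `batch_bytes` are illustrative only and not part of DLIO. It computes the size of the single training file and the bytes requested per batch for the old and new configurations.

```python
# Values copied from the megatron_deepspeed.yaml hunk (before and after).
# The dict layout and helper names are illustrative, not DLIO code.
old = {"num_samples_per_file": 277203535, "record_length": 2048, "batch_size": 1024}
new = {"num_samples_per_file": 270706, "record_length": 2097152, "batch_size": 1}


def file_bytes(cfg: dict) -> int:
    """Bytes in the single training file: samples per file * bytes per sample."""
    return cfg["num_samples_per_file"] * cfg["record_length"]


def batch_bytes(cfg: dict) -> int:
    """Bytes requested per batch: batch_size * bytes per sample."""
    return cfg["batch_size"] * cfg["record_length"]


for name, cfg in (("old", old), ("new", new)):
    gib = file_bytes(cfg) / 2**30  # convert bytes to GiB for readability
    print(name, file_bytes(cfg), round(gib, 1), batch_bytes(cfg))
```

Both configurations describe a file of about 528.7 GiB and request 2 MiB (2097152 bytes) per batch; the new `num_samples_per_file` is the old value integer-divided by 1024, matching the 1024-fold larger `record_length`.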

dlio_benchmark/data_generator/indexed_binary_generator.py (2 changes: 1 addition, 1 deletion)
@@ -69,7 +69,7 @@ def generate(self):
             out_path_spec_off_idx = self.index_file_path_off(out_path_spec)
             out_path_spec_sz_idx = self.index_file_path_size(out_path_spec)
             fh = MPI.File.Open(comm, out_path_spec, amode)
-            samples_per_loop = int(MB / sample_size)
+            samples_per_loop = int(MB * 16 / sample_size)

             for sample_index in range(self.my_rank*samples_per_rank, samples_per_rank*(self.my_rank+1), samples_per_loop):
                 #logging.info(f"{utcnow()} rank {self.my_rank} writing {sample_index} * {samples_per_loop} for {samples_per_rank} samples")
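The generator change is easiest to see with the new `record_length`. This Python sketch assumes `MB` is 1024 * 1024 bytes and that `sample_size` equals the workload's `record_length` of 2097152 bytes; neither definition appears in this diff, and `step_is_valid` is a hypothetical helper. Under those assumptions the old formula truncates 0.5 to 0, which `range()` rejects as a step, while the new formula yields 8 samples per 16 MiB write.

```python
# Assumptions (not shown in this diff): MB is 1024 * 1024 bytes and
# sample_size equals the workload's record_length after this PR.
MB = 1024 * 1024
sample_size = 2097152

old_samples_per_loop = int(MB / sample_size)       # 1 MiB / 2 MiB = 0.5, truncated to 0
new_samples_per_loop = int(MB * 16 / sample_size)  # 16 MiB / 2 MiB = 8


def step_is_valid(step: int) -> bool:
    """Hypothetical helper: True if range() accepts `step` as its stride."""
    try:
        range(0, 10, step)
    except ValueError:
        # range() raises ValueError for a zero step, as the old formula produces here.
        return False
    return True


print(old_samples_per_loop, step_is_valid(old_samples_per_loop))
print(new_samples_per_loop, step_is_valid(new_samples_per_loop))
```

Under the same assumptions, a `record_length` above 16 MiB would still make `samples_per_loop` zero, so a lower bound such as `max(1, ...)` may be worth considering.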

dlio_benchmark/reader/indexed_binary_mmap_reader.py (6 changes: 3 additions, 3 deletions)
@@ -57,10 +57,10 @@ def load_index_file(self, global_sample_idx, filename, sample_index):
             self.file_map_ibr[filename] = []
             bin_buffer_mmap = np.memmap(offset_file, mode='r', order='C')
             bin_buffer = memoryview(bin_buffer_mmap)
-            self.file_map_ibr[filename].append(np.frombuffer(bin_buffer, dtype=np.uint8))
+            self.file_map_ibr[filename].append(np.frombuffer(bin_buffer, dtype=np.uint64))
             bin_buffer_mmap = np.memmap(sz_file, mode='r', order='C')
             bin_buffer = memoryview(bin_buffer_mmap)
-            self.file_map_ibr[filename].append(np.frombuffer(bin_buffer, dtype=np.uint8))
+            self.file_map_ibr[filename].append(np.frombuffer(bin_buffer, dtype=np.uint64))
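The dtype matters because `np.frombuffer` reinterprets the raw bytes of the mapped index file. The sketch below assumes, as the switch to `np.uint64` implies, that each offset and size is stored as one native-endian 8-byte unsigned integer; the file name `example.off.idx` and the offset values are hypothetical. Reading the same bytes as `np.uint8` yields eight one-byte values per entry instead of one usable offset.

```python
import os
import tempfile

import numpy as np

# Hypothetical index contents: byte offsets of three 2 MiB samples.
# Assumption: each entry is stored as one 8-byte unsigned integer.
offsets = np.array([0, 2097152, 4194304], dtype=np.uint64)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.off.idx")  # hypothetical file name
    offsets.tofile(path)  # raw native-endian bytes, no header

    # Same access pattern as load_index_file: memmap -> memoryview -> frombuffer.
    mm = np.memmap(path, mode='r', order='C')  # np.memmap defaults to dtype uint8
    buf = memoryview(mm)
    as_uint8 = np.frombuffer(buf, dtype=np.uint8)    # behaviour before the fix
    as_uint64 = np.frombuffer(buf, dtype=np.uint64)  # behaviour after the fix

    n_uint8, n_uint64 = len(as_uint8), len(as_uint64)
    values_uint64 = as_uint64.tolist()

    # Drop every view of the mapping before the directory is removed.
    del as_uint8, as_uint64, buf, mm

print(n_uint8, n_uint64)  # 24 3
print(values_uint64)      # [0, 2097152, 4194304]
```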

     @dlp.log
     def load_index(self):
@@ -113,4 +113,4 @@ def is_index_based(self):
         return True

     def is_iterator_based(self):
-        return True
+        return True