Should I preprocess the SAM dataset? #106

Open
mao-code opened this issue Oct 1, 2024 · 1 comment

mao-code commented Oct 1, 2024

I am trying to reproduce the results.
For the SAM dataset download, I found this link: SAM

The dataset config file in this repo shows the dataset divided into 4 subsets. However, the original SAM dataset is not organized that way. Instead, it is just .jpg and .json files in the root folder.

Therefore, I am wondering whether I should write a script to split the dataset into 4 subsets.
Here is my current code for doing that:

import os
import shutil

source_dir = 'SAM'
folders = ['0000', '0001', '0002', '0003']

for folder in folders:
    os.makedirs(os.path.join(source_dir, folder), exist_ok=True)

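# Total number of image/annotation pairs to split
total_pairs = 11187

# Split as evenly as possible; the first `extra_pairs` folders get one more pair
pairs_per_folder = total_pairs // 4
extra_pairs = total_pairs % 4  # This will be 3 in this case

folder_counts = [pairs_per_folder] * 4

for i in range(extra_pairs):
    folder_counts[i] += 1

# Running totals: the last pair index that belongs to each folder
cumulative_counts = [sum(folder_counts[:i+1]) for i in range(len(folder_counts))]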

for i in range(1, total_pairs + 1):
    # Determine which folder the current pair belongs to
    if i <= cumulative_counts[0]:
        folder = folders[0]
    elif i <= cumulative_counts[1]:
        folder = folders[1]
    elif i <= cumulative_counts[2]:
        folder = folders[2]
    else:
        folder = folders[3]

    # Move both the .jpg and .json files
    for ext in ['jpg', 'json']:
        filename = f'sa_{i}.{ext}'
        src = os.path.join(source_dir, filename)
        dst = os.path.join(source_dir, folder, filename)
        if os.path.exists(src):
            shutil.move(src, dst)
        else:
            print(f"File not found: {src}")
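
The script above assumes the files are named sa_1 through sa_11187 with no gaps. A more defensive variant, as a rough sketch only, would discover the pairs that are actually on disk and split those evenly. The names `split_into_subsets` and `num_subsets` are hypothetical, the `0000` to `0003` folder layout is taken from the script above, and whether the repo's config expects an even split is an assumption that still needs confirming.

```python
import shutil
from pathlib import Path


def split_into_subsets(source_dir, num_subsets=4):
    """Move matching .jpg/.json pairs in source_dir into numbered subfolders.

    Returns a dict mapping subfolder name to the number of pairs moved there.
    """
    root = Path(source_dir)
    # Keep only images that have a matching annotation file next to them.
    stems = sorted(
        (p.stem for p in root.glob("*.jpg") if (root / f"{p.stem}.json").exists()),
        key=lambda s: (len(s), s),  # shorter names first, so sa_2 sorts before sa_10
    )
    # Even split; the first `extra` subfolders receive one additional pair.
    base, extra = divmod(len(stems), num_subsets)
    counts = {}
    start = 0
    for k in range(num_subsets):
        size = base + (1 if k < extra else 0)
        folder = root / f"{k:04d}"
        folder.mkdir(exist_ok=True)
        for stem in stems[start:start + size]:
            for ext in (".jpg", ".json"):
                shutil.move(str(root / f"{stem}{ext}"), str(folder / f"{stem}{ext}"))
        counts[folder.name] = size
        start += size
    return counts
```

Calling `split_into_subsets('SAM')` would then move whatever pairs exist, and an image without a matching .json file is left in the root folder instead of being reported as missing.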

mao-code commented Oct 1, 2024

If that is the case, I will make a pull request. Thank you for your reply!
