mini-imageNet #515

Open
HeYiyang2 opened this issue Nov 20, 2023 · 1 comment

Comments

@HeYiyang2

Due to the large size of the ImageNet dataset, I am using the MiniImageNet dataset. I modified the YAML file accordingly.
datasets:
  _target_: flava.definitions.TrainingDatasetsInfo
  selected:
    - image
    - vl
    - text
  image:
    _target_: flava.definitions.TrainingSingleDatasetInfo
    train:
      - _target_: flava.definitions.HFDatasetInfo
        key: mini_train
        subset: default
        data_dir: >-
          /home/liumaofu/hyy/multimodal/examples/flava/mini/ok/train/
    val:
      - _target_: flava.definitions.HFDatasetInfo
        key: mini_val
        subset: default
        data_dir: >-
          /home/liumaofu/hyy/multimodal/examples/flava/mini/ok/val/
I also modified the examples/flava/data/utils.py file:
def build_datasets_from_info(dataset_infos: List[HFDatasetInfo], split: str = "train"):
    dataset_list = []
    for dataset_info in dataset_infos:
        print(f"Loading dataset from {dataset_info.data_dir}")

        current_dataset = load_from_disk(dataset_info.data_dir)

        if dataset_info.remove_columns is not None:
            current_dataset = current_dataset.remove_columns(dataset_info.remove_columns)
        if dataset_info.rename_columns is not None:
            for rename in dataset_info.rename_columns:
                current_dataset = current_dataset.rename_column(rename[0], rename[1])

        dataset_list.append(current_dataset)

    return concatenate_datasets(dataset_list)

However, when I run python -m flava.train config=flava/configs/pretraining/debug.yaml, the following error is reported: Directory /home/liumaofu/hyy/multimodal/examples/flava/mini/ok/train/ is neither a dataset directory nor a dataset dict directory.
The structure of my miniimagenet dataset is as follows:
miniImagenet
|-- train
| |-- class1
| | |-- image1.jpg
| | |-- image2.jpg
| | |-- ...
| |-- class2
| | |-- image1.jpg
| | |-- image2.jpg
| | |-- ...
| |-- ...
|-- val
| |-- class1
| | |-- image1.jpg
| | |-- image2.jpg
| | |-- ...
| |-- class2
| | |-- image1.jpg
| | |-- image2.jpg
| | |-- ...
| |-- ...
|-- test
| |-- class1
| | |-- image1.jpg
| | |-- image2.jpg
| | |-- ...
| |-- class2
| | |-- image1.jpg
| | |-- image2.jpg
| | |-- ...
| |-- ...
I have confirmed that the storage paths are correct. Why is this error reported, and what should I do?

@ebsmothers
Contributor

Hi @HeYiyang2, apologies for the delayed response. How did you download the local dataset? I think load_from_disk should only be used when the directory was created by a call to save_to_disk. See e.g. this comment
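
To make the point about load_from_disk concrete, here is a minimal sketch, not something taken from this repository. It assumes that a directory written by save_to_disk contains the marker files dataset_info.json and state.json (or dataset_dict.json for a DatasetDict), and that a class-per-subfolder image tree like the one in the question can be read with the Hugging Face "imagefolder" loader. The helper names classify_hf_directory and convert_image_folder are hypothetical, and the conversion step needs the datasets and Pillow packages installed.

```python
from pathlib import Path


def classify_hf_directory(path):
    """Return 'dataset', 'dataset_dict', or 'unknown' for a local directory.

    Checks for the marker files that save_to_disk is assumed to write:
    dataset_info.json plus state.json for a Dataset, or dataset_dict.json
    for a DatasetDict. A plain folder of images has neither.
    """
    root = Path(path)
    if (root / "dataset_info.json").is_file() and (root / "state.json").is_file():
        return "dataset"
    if (root / "dataset_dict.json").is_file():
        return "dataset_dict"
    return "unknown"


def convert_image_folder(src_dir, dst_dir):
    """Read a class-per-subfolder image tree and save it so load_from_disk can read it."""
    # Imported lazily so the check above works without the datasets package.
    from datasets import load_dataset

    # The "imagefolder" loader derives a label from each class subdirectory name.
    dataset = load_dataset("imagefolder", data_dir=src_dir, split="train")
    dataset.save_to_disk(dst_dir)
    return dataset
```

Running classify_hf_directory on the mini/ok/train/ path from the question should return 'unknown' if that folder only holds class subdirectories with .jpg files, which would match the reported error. One option is then to call convert_image_folder once per split with an output directory of your choice and point data_dir in the YAML at that output. Whether the resulting column names match what the FLAVA image pipeline expects is not confirmed here and may need rename_columns.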
