mini-imageNet #515

HeYiyang2 · 2023-11-20T06:33:43Z

Because the full ImageNet dataset is so large, I am using the MiniImageNet dataset instead. I modified the YAML file accordingly:
datasets:
  _target_: flava.definitions.TrainingDatasetsInfo
  selected:
    - image
    - vl
    - text
  image:
    _target_: flava.definitions.TrainingSingleDatasetInfo
    train:
      - _target_: flava.definitions.HFDatasetInfo
        key: mini_train
        subset: default
        data_dir: >-
          /home/liumaofu/hyy/multimodal/examples/flava/mini/ok/train/
    val:
      - _target_: flava.definitions.HFDatasetInfo
        key: mini_val
        subset: default
        data_dir: >-
          /home/liumaofu/hyy/multimodal/examples/flava/mini/ok/val/
At the same time, I modified the examples/flava/data/utils.py file:
from typing import List

from datasets import concatenate_datasets, load_from_disk
from flava.definitions import HFDatasetInfo


def build_datasets_from_info(dataset_infos: List[HFDatasetInfo], split: str = "train"):
    dataset_list = []
    for dataset_info in dataset_infos:
        print(f"Loading dataset from {dataset_info.data_dir}")

        # Read the dataset from the local directory given in the YAML config.
        current_dataset = load_from_disk(dataset_info.data_dir)

        if dataset_info.remove_columns is not None:
            current_dataset = current_dataset.remove_columns(dataset_info.remove_columns)
        if dataset_info.rename_columns is not None:
            for rename in dataset_info.rename_columns:
                current_dataset = current_dataset.rename_column(rename[0], rename[1])

        dataset_list.append(current_dataset)

    return concatenate_datasets(dataset_list)

ebsmothers · 2023-12-01T18:08:56Z

Hi @HeYiyang2, apologies for the delayed response. How did you download the local dataset? I think load_from_disk should only be used when the directory was created by a call to save_to_disk. See e.g. this comment
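
To make that distinction concrete, here is a minimal sketch of one way to check whether a directory looks like save_to_disk output before choosing a loader. The helper names (looks_like_saved_dataset, load_local_dataset) are hypothetical and not part of the FLAVA example code. The fallback to the "imagefolder" loader assumes the directory holds raw image files in class sub-folders, which may not match how the MiniImageNet copy in this issue was downloaded.

```python
import os

# Marker files that `save_to_disk` writes for a Dataset and a DatasetDict.
_DATASET_MARKERS = ("state.json", "dataset_info.json")
_DATASET_DICT_MARKER = "dataset_dict.json"


def looks_like_saved_dataset(path: str) -> bool:
    """Return True if `path` contains the marker files of a `save_to_disk` directory."""
    if os.path.isfile(os.path.join(path, _DATASET_DICT_MARKER)):
        return True
    return all(os.path.isfile(os.path.join(path, name)) for name in _DATASET_MARKERS)


def load_local_dataset(path: str):
    """Hypothetical helper: pick a loader based on what the directory contains."""
    # Imported lazily so the directory check above works without `datasets` installed.
    from datasets import load_dataset, load_from_disk

    if looks_like_saved_dataset(path):
        return load_from_disk(path)
    # Assumption: the directory holds raw images arranged in class sub-folders.
    return load_dataset("imagefolder", data_dir=path, split="train")
```

If the check returns False for the train/ and val/ directories in the YAML above, that would suggest they were not produced by save_to_disk, and either a different loader or a one-time conversion with save_to_disk may be needed.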
