Problem reading video data during fine-tuning #30

Open
wangyin717 opened this issue Jul 8, 2024 · 1 comment

Comments

@wangyin717

During stage-2 fine-tuning, the training run prints messages about problematic videos:

Error loading data at index 23971: Video not found: /dev/shm/vlm/MiniGPT4Qwen/cache/dataset/videochatgpt/activitynet_videos/v_IrTqW6Qn8mI.mp4
Error loading data at index 71068: Video not found: /dev/shm/vlm/MiniGPT4Qwen/cache/dataset/videochatgpt/activitynet_videos/v_aV5DMcsNMmk.mp4
Error loading data at index 51648: Video not found: /dev/shm/vlm/MiniGPT4Qwen/cache/dataset/videochatgpt/activitynet_videos/v_MlbM7Mew0Ys.mp4
Error loading data at index 80768: 'video'
Error loading data at index 29235: Video not found: /dev/shm/vlm/MiniGPT4Qwen/cache/dataset/videochatgpt/activitynet_videos/v_AA1wvSZ4Mno.mp4
Error loading data at index 81059: 'video'
Error loading data at index 99616: 'video'
Error loading data at index 80812: 'video'
Error loading data at index 91812: 'video'
Error loading data at index 20455: Video not found: /dev/shm/vlm/MiniGPT4Qwen/cache/dataset/videochatgpt/activitynet_videos/v_J_SD_hhGET8.mp4
......

Analysis shows that for some videos no frames are extracted: at this point `ret` returns False (some videos return False while the rest return True normally, and the videos whose paths return False do exist in the dataset):

which leads to this output:

raise AssertionError(f"Video not found: {video_path}")

This indicates that no frames were extracted from the video, yet when I download the failing videos they play back fine. I can't tell where the problem is.
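For reference, this is roughly the kind of check being described (a minimal sketch, assuming an OpenCV-style `VideoCapture` reader where `read()` returns a `(ret, frame)` tuple; the actual frame-sampling code in `video_instructions.py` may differ):

```python
import cv2

def sample_frames(video_path, num_frames=8):
    # Hypothetical sketch: cap.read() returns ret=False for some videos
    # even though the file exists on disk, so no frames get collected.
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // num_frames, 1)
    frames = []
    for idx in range(0, max(total, 1), step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ret, frame = cap.read()
        if not ret:  # the branch that trips on the "bad" videos
            break
        frames.append(frame)
    cap.release()
    if len(frames) == 0:
        # this is what produces the "Video not found" message in the log above
        raise AssertionError(f"Video not found: {video_path}")
    return frames
```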

Attached config file sft.yaml:

model:
  arch: minigpt4qwen
  model_type: qwen7b_chat
  load_finetuned: True
  load_pretrained: True

  # pretrained: "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/InstructBLIP/blip2_pretrained_flant5xxl.pth"
  pretrained: "ckpt/blip2/blip2_pretrained_flant5xxl.pth"
  finetuned: "/dev/shm/vlm/MiniGPT4Qwen/lavis/output/pp_7b_video/pretrain/global_step295/model.pth"

  # vit encoder
  vit_model: "eva_clip_g"
  image_size: 224
  drop_path_rate: 0
  use_grad_checkpoint: True
  vit_precision: "fp16"  # if you want to unfreeze the ViT for training, change this to fp32; otherwise AMP mixed-precision training breaks (errors at the scaler, since there is no fp16 AdamW implementation)
  freeze_vit: True
  unfreeze_pos_embed: False

  # Q-Former
  num_query_token: 32
  qformer_text_input: False
  freeze_qformer: True
  freeze_queries: True

  # projection
  freeze_proj: False

  # path to LLM checkpoint (Qwen-7B-Chat)
  llm_model: "/dev/shm/vlm/MiniGPT4Qwen/cache/ckpt/Qwen-7B-Chat"

  # unfreeze LLM for better chat
  freeze_llm: False

  # lora config
  get_lora: False
  lora_alpha: 32
  lora_r: 8
  lora_dropout: 0.05

  # text length when training
  max_txt_len: 1536 # 512

  # enable autocast of vit
  enable_autocast: False

datasets:
  llava_instruct_156k: # name of the dataset builder
    vis_processor:
        train:
          name: "blip2_image_train"
          image_size: 224
    text_processor:
        train:
          name: "base_instruction"
          max_words: 200

  videochatgpt_100k: # name of the dataset builder
    vis_processor:
        train:
          name: "blip2_image_train"
          image_size: 224
    text_processor:
        train:
          name: "base_instruction"
          max_words: 200

run:
  output_dir: "lavis/output/pp_7b_video/sft_video/"

  task: deepspeed_image_text_pretrain

  num_workers: 4

  seed: 42

  world_size: 1
  dist_url: "env://"
  distributed: True

  max_epoch: 1
  log_freq: 10

  lr_sched: "linear_warmup_cosine_lr_step-wise"
  warmup_lr: 0
  init_lr: 2e-5
  min_lr: 0
  warmup_ratio: 0.1

  deepspeed_config:
    # global batch = 128 = n_ranks * grad_acc_steps * micro_batch_size = (4//2) * 64 * 1
    # 8 x 3090
    # pp=8 dp=1 nproc=pp*dp=8 
    gradient_accumulation_steps: 128 # 128 // dp(=1) // bs_per_gpu(=1) = 128
    train_micro_batch_size_per_gpu: 1

    gradient_clipping: 1.
    steps_per_print: 10
    wall_clock_breakdown: false
    dump_state: False

    fp16:
        enabled: false
        loss_scale: 0
        loss_scale_window: 1000
        initial_scale_power: 16
        hysteresis: 2
        min_loss_scale: 1

    bf16:
        enabled: true

    optimizer:
        type: "AdamW"
        params:
            lr: 2e-5
            betas: [0.9,0.99]
            eps: 1e-7
            weight_decay: 0.

    zero_optimization:
        stage: 0
        # offload_optimizer:
        #   device: "cpu"
        #   pin_memory: true
        allgather_partitions: true
        allgather_bucket_size: 2e8
        overlap_comm: true
        reduce_scatter: true
        reduce_bucket_size: 2e8
        contiguous_gradients: true
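
For what it's worth, the effective global batch implied by the DeepSpeed block above works out as follows (a quick sanity check, assuming data-parallel degree dp = 1 as stated in the comments):

```python
# effective batch = dp * gradient_accumulation_steps * train_micro_batch_size_per_gpu
dp = 1                   # data-parallel degree (pp=8, dp=1 per the comments)
grad_acc_steps = 128     # gradient_accumulation_steps
micro_batch_per_gpu = 1  # train_micro_batch_size_per_gpu
print(dp * grad_acc_steps * micro_batch_per_gpu)  # 128
```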

@Coobiw
Owner

Coobiw commented Jul 9, 2024

moviepy's VideoFileClip sometimes fails to read a video, so I simply skip those; only about 1k videos (out of 100k total) are lost, which is acceptable.

If you want to avoid this as much as possible, you can modify the function at https://github.com/Coobiw/MPP-LLaVA/blob/master/lavis/datasets/datasets/video_instructions.py#L28, using https://github.com/Coobiw/MPP-LLaVA/blob/master/webui_demo.py#L23 as a reference.
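
For context, one common and more tolerant alternative to moviepy's `VideoFileClip` is decord; a minimal sketch is below (this is an assumption about the kind of change meant here, not necessarily what webui_demo.py#L23 actually does):

```python
import numpy as np
from decord import VideoReader, cpu

def load_video_decord(video_path, num_frames=8):
    # Uniformly sample num_frames frames across the whole clip.
    vr = VideoReader(video_path, ctx=cpu(0))
    indices = np.linspace(0, len(vr) - 1, num_frames).astype(int)
    frames = vr.get_batch(indices).asnumpy()  # (num_frames, H, W, 3), uint8
    return frames
```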
