Skip to content

Latest commit

 

History

History
98 lines (81 loc) · 5.17 KB

训练命令.md

File metadata and controls

98 lines (81 loc) · 5.17 KB

1. 单 GPU 训练

1.1 原始命令格式

python opencood/tools/train.py -y ${CONFIG_FILE} [--model_dir ${CHECKPOINT_FOLDER}]

1.2 使用到的命令

# stage1
CUDA_VISIBLE_DEVICES=1 python opencood/tools/train.py -y None --model_dir opencood/logs/HEAL_m1_based/stage1

python opencood/tools/train.py -y None --model_dir opencood/logs/HEAL_m3_based/stage1

CUDA_VISIBLE_DEVICES=1 python opencood/tools/train.py -y None --model_dir opencood/logs/HEAL_m2_based/stage1


# stage2
CUDA_VISIBLE_DEVICES=1 python opencood/tools/train.py -y None --model_dir opencood/logs/HEAL_m1_based/stage2/m2_alignto_m1
python opencood/tools/train.py -y None --model_dir opencood/logs/HEAL_m1_based/stage2/m3_alignto_m1
python opencood/tools/train.py -y None --model_dir opencood/logs/HEAL_m1_based/stage2/m4_alignto_m1

CUDA_VISIBLE_DEVICES=1 python opencood/tools/inference.py --model_dir opencood/logs/HEAL_m1_based/stage1/m1_base  --fusion_method intermediate
CUDA_VISIBLE_DEVICES=1 python opencood/tools/inference.py --model_dir opencood/logs/HEAL_m1_based/stage2/m2_alignto_m1 --fusion_method intermediate
CUDA_VISIBLE_DEVICES=1 python opencood/tools/inference.py --model_dir opencood/logs/HEAL_m1_based/stage2/m3_alignto_m1 --fusion_method intermediate
CUDA_VISIBLE_DEVICES=1 python opencood/tools/inference.py --model_dir opencood/logs/HEAL_m1_based/stage2/m4_alignto_m1 --fusion_method intermediate

2. 多 GPU 训练

2.1 原始命令格式

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch  --nproc_per_node=2 --use_env opencood/tools/train_ddp.py -y ${CONFIG_FILE} [--model_dir ${CHECKPOINT_FOLDER}]

命令解释 (这部分由 GPT 完成)

  • CUDA_VISIBLE_DEVICES=0,1: 这个环境变量指定了可见的 GPU 设备。这里指定使用 0 号和 1 号GPU。
  • python -m torch.distributed.launch: 这个命令使用 torch.distributed.launch 模块来启动分布式训练任务
  • --nproc_per_node=2: 这个参数指定每个节点上的进程数。这里设置为2,意味着将在两个GPU上分别启动一个进程
  • --use_env: 这个参数告诉启动脚本使用环境变量来初始化分布式配置。
  • opencood/tools/train_ddp.py: 这是你要运行的Python脚本,用于训练模型。该脚本应该已经被设置为支持分布式训练。
  • -y ${CONFIG_FILE}: 这个参数传递了配置文件的路径,${CONFIG_FILE}是一个环境变量,代表具体的配置文件路径。-y可能是你的脚本所定义的参数,表示使用该配置文件来进行训练。
  • [--model_dir ${CHECKPOINT_FOLDER}]: 这个可选参数表示模型的检查点目录,${CHECKPOINT_FOLDER}是一个环境变量,指向模型检查点保存的文件夹路径。--model_dir是传递给训练脚本的参数,用于指定保存或加载模型检查点的目录。

2.2 使用到的命令

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch  --nproc_per_node=2 --use_env opencood/tools/train_ddp.py -y None --model_dir opencood/logs/HEAL_m1_based/stage1/m1_base

3. TensorBoard

3.1 用到的命令

tensorboard --logdir=opencood/logs/HEAL_m1_based/stage2/m2_alignto_m1 --port=8080 --host=0.0.0.0
tensorboard --logdir=opencood/logs/HEAL_m1_based/stage2/m3_alignto_m1 --port=8080 --host=0.0.0.0
tensorboard --logdir=opencood/logs/HEAL_m1_based/stage2/m4_alignto_m1 --port=8080 --host=0.0.0.0

4. 训练结果

使用到的命令

CUDA_VISIBLE_DEVICES=1 python opencood/tools/inference.py --model_dir opencood/logs/HEAL_m2_based/stage1 --fusion_method intermediate
The Average Precision at IOU 0.3 is 0.62, The Average Precision at IOU 0.5 is 0.53, The Average Precision at IOU 0.7 is 0.33

stage1 $L^{(x)}_{P}$

The Average Precision at IOU 0.3 is 0.95, The Average Precision at IOU 0.5 is 0.94, The Average Precision at IOU 0.7 is 0.90
# m3_base
The Average Precision at IOU 0.3 is 0.88, The Average Precision at IOU 0.5 is 0.87, The Average Precision at IOU 0.7 is 0.80

stage2

m2 $C^{(x)}_{E}$

The Average Precision at IOU 0.3 is 0.44, The Average Precision at IOU 0.5 is 0.34, The Average Precision at IOU 0.7 is 0.19

m3 $L^{(x)}_{S}$

The Average Precision at IOU 0.3 is 0.72, The Average Precision at IOU 0.5 is 0.63, The Average Precision at IOU 0.7 is 0.28

m4 $C^{(x)}_{R}$

The Average Precision at IOU 0.3 is 0.39, The Average Precision at IOU 0.5 is 0.27, The Average Precision at IOU 0.7 is 0.11

融合训练

The Average Precision at IOU 0.3 is 0.77, The Average Precision at IOU 0.5 is 0.76, The Average Precision at IOU 0.7 is 0.65 
{'m3': 32}

The Average Precision at IOU 0.3 is 0.77, The Average Precision at IOU 0.5 is 0.76, The Average Precision at IOU 0.7 is 0.65
{'m3': 32}

The Average Precision at IOU 0.3 is 0.77, The Average Precision at IOU 0.5 is 0.76, The Average Precision at IOU 0.7 is 0.65
{'m3': 32}

The Average Precision at IOU 0.3 is 0.24, The Average Precision at IOU 0.5 is 0.24, The Average Precision at IOU 0.7 is 0.20
{'m3': 32}

The Average Precision at IOU 0.3 is 0.24, The Average Precision at IOU 0.5 is 0.24, The Average Precision at IOU 0.7 is 0.20