This repository contains the dataset published with the ECCV 2022 paper "BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis".
- A new dataset for audio-conditioned dance motion synthesis
- Focuses on breakdancing sequences
- Contains high-quality annotations for complex body poses and dance movements
Property | Value |
---|---|
Frames | 334,538 |
Manually annotated frames | 26,676 (8%) |
Duration | 3h 32m |
Dancers | 64 |
Videos | 81 |
Sequences | 465 |
Segments | 1,352 |
Avg. segments per sequence | 2.91 |
Avg. sequence duration | 27.48s |
Avg. segment duration | 9.45s |
You can read our paper on arXiv.
You can watch our supplementary video to have a look at BRACE and our paper.
You can download our keypoints here.
Note that keypoints are split into segments, i.e. the shorter clips that compose a sequence (please refer to our paper for more details). We provide PyTorch code to load these segment keypoints as sequences (see below).
Keypoints are JSON files organised in folders as follows:
├── year
│ ├── video_id
│ │ ├── video_id_start-end_movement.json
Where `video_id_start-end_movement` denotes the corresponding information about the segment.
For example, `3rIk56dcBTM_1234-1330_powermove.json` indicates:
- video_id: `3rIk56dcBTM`
- start: `1234`
- end: `1330`
- movement: `powermove`
Start/end are in frames. Movement can be one of `(toprock, footwork, powermove)`.
Each JSON file contains a dictionary whose keys are frame IDs in the format `video_id/img-xxxxxx.png`, where `xxxxxx` is the 6-digit (zero-padded if necessary) frame number. Each frame ID indexes a dictionary with two values: `box` and `keypoints`. `box` is a 5-element list containing the dancer's bounding box in the format `(x, y, w, h, score)`. `keypoints` is a 17-element list containing the human joint coordinates, each in the format `(x, y, score)`.
The order of the keypoints follows COCO's format.
All coordinates are in pixels (within a 1920x1080 frame) and all `score` values are 1 (we kept them for compatibility with other libraries).
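As a minimal sketch of how a segment file could be read, the snippet below parses the segment file name and turns the per-frame dictionary into plain Python lists. The helpers `parse_segment_name` and `load_segment` are illustrative names, not part of this repository. The parsing assumes the video ID is everything before the last two underscores.

```python
import json
from pathlib import Path


def parse_segment_name(path):
    """Split `video_id_start-end_movement.json` into its four parts."""
    # Split from the right, since YouTube IDs may themselves contain '_' or '-'
    video_id, frame_range, movement = Path(path).stem.rsplit("_", 2)
    start, end = (int(v) for v in frame_range.split("-"))
    return video_id, start, end, movement


def load_segment(path):
    """Return (frame_ids, boxes, keypoints) for one segment JSON file."""
    with open(path) as f:
        data = json.load(f)
    # Frame numbers are zero-padded, so sorting the IDs orders them in time
    frame_ids = sorted(data)
    boxes = [data[k]["box"] for k in frame_ids]            # each: [x, y, w, h, score]
    keypoints = [data[k]["keypoints"] for k in frame_ids]  # each: 17 x [x, y, score]
    return frame_ids, boxes, keypoints
```

For the example file name above, `parse_segment_name` would return `("3rIk56dcBTM", 1234, 1330, "powermove")`.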
You can find the keypoints for the 26,676 manually annotated frames here.
These are provided as numpy arrays (`npz` files) organised in a structure similar to that of the interpolated keypoints:
├── year
│ ├── video_id
│ │ ├── img-xxxxxx.npz
Where `img-xxxxxx` is the frame ID, as seen above. Keypoints can be loaded as follows:
import numpy as np
keypoints = np.load('path_to_npz_file')['coco_joints2d'][:, :2]
`keypoints` will then be a numpy array of shape (17, 2) containing the annotated joint coordinates.
These are also in pixels and follow the COCO format, just like the segment keypoints.
Note that the stored arrays actually have shape (17, 3), but the last column (`[:, 2]`) is not meaningful.
Load the files with the snippet above so that this column is discarded.
You can download pre-extracted audio features here. The audio files for these features were obtained by trimming the videos' full audio to the sequences' start and end times.
We extracted features using Dance Revolution's code. Specifically, we extracted the following:
- mel frequency cepstral coefficients (MFCC)
- MFCC delta
- constant-Q chromagram
- onset envelope
- onset beat
- tempogram
Sampling rate for these was set to 15360. Please refer to Dance Revolution for more details. Files are organised as follows:
├── year
│ ├── video_id
│ │ ├── video_id.sequence_idx.npz
Where `video_id.sequence_idx` corresponds to the sequence `uid` (see annotations below).
Features are saved as numpy files, which you can load as follows:
import numpy as np
features = np.load('path_to_feature_file.npz')
`features.files` lists the 6 audio features described above:
features.files
['mfcc', 'mfcc_delta', 'chroma_cqt', 'onset_env', 'onset_beat', 'tempogram']
Each of these is a numpy array, which you can access as you would query a dictionary, e.g. `features['mfcc']`.
Each array is 2D with shape `(feature_dim, temporal_dim)`.
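If a single feature matrix per sequence is more convenient, the six arrays can be concatenated along the feature axis. The sketch below is one way to do this. `stack_features` is a hypothetical helper, and it assumes all selected features share the same temporal dimension, which it checks instead of taking for granted.

```python
import numpy as np

FEATURE_NAMES = ["mfcc", "mfcc_delta", "chroma_cqt",
                 "onset_env", "onset_beat", "tempogram"]


def stack_features(path, names=FEATURE_NAMES):
    """Concatenate the selected features into one (total_feature_dim, temporal_dim) array."""
    with np.load(path) as features:
        arrays = [features[name] for name in names]
    # Every array is (feature_dim, temporal_dim); stacking needs equal temporal_dim
    lengths = {a.shape[1] for a in arrays}
    if len(lengths) != 1:
        raise ValueError(f"temporal dimensions differ: {sorted(lengths)}")
    return np.concatenate(arrays, axis=0)
```

Passing a shorter `names` list selects a subset, e.g. `stack_features(path, ["mfcc", "chroma_cqt"])`.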
We used `youtube-dl` to download the videos from YouTube (links are provided in `video_info.csv`) using:
format: bestvideo[ext=mp4],bestaudio[ext=m4a]
To extract frames we simply used `ffmpeg` without re-encoding the videos:
ffmpeg -i ./path_to_videos/${video_id}.mp4 ./path_to_frames/${video_id}/img-%06d.png
Where `video_id` is the YouTube video ID.
You will find our annotations under the folder `annotations` in this repo.
Here we provide the details of each file.
This file annotates segments, i.e. the shorter dance units a sequence is composed of (more details in our paper). The file contains the following columns:
- `video_id`: YouTube video ID
- `seq_idx`: index of the sequence a segment belongs to
- `start_frame`: start of the segment, in frames
- `end_frame`: end of the segment, in frames
- `dance_type`: type of the dance segment, one of `(toprock, powermove, footwork)`
- `dance_type_id`: numerical ID of the dance type: `{0: toprock, 1: powermove, 2: footwork}`
- `dancer`: name of the dancer
- `dancer_id`: numerical ID of the dancer
- `year`: year of the competition/video
- `uid`: unique ID of the segment
This file annotates sequences, i.e. series of segments. Each sequence corresponds to one of the sequences a dancer performs in the corresponding video. The file contains the following columns:
- `video_id`: YouTube video ID
- `seq_idx`: index of the sequence, i.e. the order in which it appears in the video
- `start_frame`: start of the sequence, in frames
- `end_frame`: end of the sequence, in frames
- `dancer`: name of the dancer
- `dancer_id`: numerical ID of the dancer
- `year`: year of the competition/video
- `uid`: unique ID of the sequence
Note that there may be gaps (i.e. missing frames) between the indicated start and end times, because these times correspond to the start of the first segment and the end of the last segment. While most segments are contiguous, in a few cases we could not label some parts of the videos due to aerial or very distant views.
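One way to locate such gaps is to compare the end of each segment with the start of the next one within the same sequence. The sketch below does this for a list of `(start_frame, end_frame)` pairs. `find_gaps` and its `max_step` parameter are hypothetical, and the threshold for what counts as contiguous is an assumption to adjust to the actual annotations.

```python
def find_gaps(segments, max_step=1):
    """Return (prev_end, next_start) pairs where consecutive segments are not contiguous.

    `segments` is an iterable of (start_frame, end_frame) tuples for one sequence.
    `max_step` is the largest `next_start - prev_end` still treated as contiguous
    (an assumption, not something defined by the dataset).
    """
    ordered = sorted(segments)
    gaps = []
    # Walk over neighbouring segments in temporal order
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end > max_step:
            gaps.append((prev_end, next_start))
    return gaps


# Made-up frame ranges, for illustration only
print(find_gaps([(100, 200), (201, 300), (350, 400)]))  # [(300, 350)]
```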
These files identify the training/testing splits we used for our experiments. They contain the following columns to uniquely identify a training or testing sequence:
- `video_id`: YouTube video ID
- `seq_idx`: index of the sequence, relative to the video
- `uid`: unique ID of the sequence
This JSON file contains the sequences' audio beat information, extracted with Essentia.
Information is organised as a dictionary where keys are sequence `uid`s (see above).
Each key indexes a dictionary containing:
- `bpm`: the beats per minute of the audio track
- `beats_confidence`: the confidence of Essentia's beat extractor algorithm
- `beats_sec`: list of beat positions, in seconds
- `beats_frame`: list of beat positions, in frames
Like audio features, beats info is relative to audio sequences (i.e. the video audio trimmed using the annotated sequence start/end times).
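As a quick sanity check on an entry, the tempo implied by the beat positions can be compared with the stored `bpm`. The sketch below estimates the tempo from the median interval between consecutive beats. `tempo_from_beats` is a hypothetical helper, and the `entry` dictionary holds made-up values that only mimic the structure described above (only the keys needed here are shown).

```python
from statistics import median


def tempo_from_beats(beats_sec):
    """Estimate tempo (beats per minute) from the median inter-beat interval."""
    intervals = [b - a for a, b in zip(beats_sec, beats_sec[1:])]
    if not intervals:
        raise ValueError("need at least two beats")
    return 60.0 / median(intervals)


# Made-up values, for illustration only
entry = {"bpm": 120.0, "beats_sec": [0.5, 1.0, 1.5, 2.0]}
estimated = tempo_from_beats(entry["beats_sec"])
print(estimated, abs(estimated - entry["bpm"]))  # 120.0 0.0
```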
This JSON file contains the shot boundaries we detected with Scene Detect. This file is a dictionary where keys are YouTube video IDs and values are lists of frame indices where shot changes were detected.
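The shot boundaries can be used, for instance, to split a sequence or segment into chunks that do not cross a camera cut. The sketch below shows one way to do so. `split_at_shots` is a hypothetical helper, and it assumes each listed frame index is the first frame of a new shot, which should be verified against the data.

```python
def split_at_shots(start_frame, end_frame, shot_changes):
    """Split the inclusive range [start_frame, end_frame] at shot changes.

    Assumes each entry of `shot_changes` is the first frame of a new shot.
    """
    # Keep only the cuts that fall strictly inside the range
    cuts = sorted(c for c in shot_changes if start_frame < c <= end_frame)
    starts = [start_frame] + cuts
    ends = [c - 1 for c in cuts] + [end_frame]
    return list(zip(starts, ends))


# Made-up frame indices, for illustration only
print(split_at_shots(100, 200, [50, 130, 180, 250]))
# [(100, 129), (130, 179), (180, 200)]
```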
We prepared a Python script that loads BRACE as a PyTorch dataset.
This is the file `utils/dataset_pytorch.py`.
You can use this file as follows (see also the `__main__` section there):
import pandas as pd
from pathlib import Path

# assumes this runs from the `utils` folder, next to dataset_pytorch.py
from dataset_pytorch import BraceDataset

# adjust the paths if you don't run this script from the `utils` folder
sequences_path_ = Path('../dataset')  # path where you downloaded and unzipped the keypoints
df_ = pd.read_csv(Path('../annotations/sequences.csv'))

# keep only the sequences whose uid is listed in the training split
train_df = pd.read_csv('../annotations/sequences_train.csv')
train_df = df_[df_.uid.isin(train_df.uid)]
brace_train = BraceDataset(sequences_path_, train_df)
skeletons_train, metadata_train = brace_train[0]

# same for the testing split
test_df = pd.read_csv('../annotations/sequences_test.csv')
test_df = df_[df_.uid.isin(test_df.uid)]
brace_test = BraceDataset(sequences_path_, test_df)
skeletons_test, metadata_test = brace_test[0]
If you need help with BRACE, just create a new issue in this repository.
Please cite our paper if you use BRACE:
@article{moltisanti22brace,
author = {Moltisanti, Davide and Wu, Jinyi and Dai, Bo and Loy, Chen Change},
title = {{BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis}},
journal = {European Conference on Computer Vision (ECCV)},
year = {2022}
}
- Davide Moltisanti*. University of Edinburgh (work done while at Nanyang Technological University)
- Jinyi Wu*. S-Lab, Nanyang Technological University
- Bo Dai. Shanghai AI Laboratory (work done while at Nanyang Technological University)
- Chen Change Loy. S-Lab, Nanyang Technological University
*Equal contribution:
- Davide worked on the dataset acquisition pipeline and compilation, paper writing and synthesis baselines.
- Jinyi proposed to create BRACE from Red Bull BC One videos, was in charge of manual annotations, and worked on synthesis and pose estimation baselines. He also helped review the paper.
BRACE is released under the S-Lab License 1.0:
Copyright 2022 S-Lab
Redistribution and use for non-commercial purpose in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. The disclaimer referenced above is:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- In the event that redistribution and/or use for commercial purpose in source or binary forms, with or without modification is required, please contact the contributor(s) of the work.