Main Task

Describing the First Guest: naming at least four characteristics of the first guest, i.e., colour of clothes, colour of hair, gender, and age, earns bonus points.

Potential Solutions

  1. Semantic segmentation + multi-task classification;
  2. Crop the segmented regions and identify their colours.

Potential Models

  1. Segmentation models - e.g., UNet...
  2. CV classification models - e.g., ResNet, VGG, MobileNet...
  3. Merged models - e.g., UNet for segmentation with ResNet as the shared backbone and classification head (see the sketch below)...
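
A minimal PyTorch sketch of the merged-model idea (option 3): a ResNet-18 encoder shared between a small upsampling decoder for segmentation and a pooled classification head. The backbone choice, channel sizes, and output counts are illustrative assumptions, not a fixed design.

```python
import torch
import torch.nn as nn
import torchvision


class SegClsNet(nn.Module):
    """Sketch of a merged model: shared ResNet-18 encoder, a simple
    upsampling decoder for segmentation, and a classification head."""

    def __init__(self, n_masks=5, n_attrs=5):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # Encoder: drop the global pooling and FC layers -> (B, 512, H/32, W/32).
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        # Decoder: a single upsampling path standing in for a full UNet decoder.
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(64, n_masks, 1), nn.Sigmoid(),
        )
        # Classification head: per-category presence scores in [0, 1].
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(512, n_attrs), nn.Sigmoid(),
        )

    def forward(self, x):
        feats = self.encoder(x)
        return self.decoder(feats), self.classifier(feats)


masks, attrs = SegClsNet()(torch.randn(2, 3, 256, 256))
print(masks.shape, attrs.shape)  # (2, 5, 256, 256) and (2, 5)
```

Sharing the encoder lets the classification branch reuse the segmentation features, which is the main appeal of the merged-model option.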

Useful Datasets

  1. CelebA - http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
  2. LIP (Look into Person) - https://www.sysu-hcp.net/lip/overview.php
  3. ATR (Human Parsing Dataset) - https://github.com/lemondan/HumanParsing-Dataset

Preprocessing

CelebA

  1. Merge masks:

    Original categories: 'cloth', 'ear_r', 'hair', 'l_brow', 'l_eye', 'l_lip', 'mouth', 'neck', 'nose', 'r_brow', 'r_ear', 'r_eye', 'skin', 'u_lip', 'hat', 'l_ear', 'neck_l', 'eye_g'.

    To be merged: 'ear': ['l_ear', 'r_ear'], 'brow': ['l_brow', 'r_brow'], 'eye': ['l_eye', 'r_eye'], 'mouth': ['l_lip', 'u_lip', 'mouth']

    The masks are merged by taking a logical OR over the whole mask (a sketch follows this list).

  2. Data Augmentation:

    Random flip, random crop, random zoom in/out - applied to both the image and the masks;

    Random noise, (Gaussian) blur, brightness adjustment - applied to the image only;

  3. Resize
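
A sketch of the CelebA mask-merging step, assuming each part mask is loaded as a separate binary array keyed by the original category name; the dictionary layout and function name are assumptions, while the merge itself is the logical OR described above.

```python
import numpy as np

# Merged category -> original CelebAMask-HQ part names (from the list above).
MERGE_MAP = {
    "ear":   ["l_ear", "r_ear"],
    "brow":  ["l_brow", "r_brow"],
    "eye":   ["l_eye", "r_eye"],
    "mouth": ["l_lip", "u_lip", "mouth"],
}


def merge_masks(masks: dict) -> dict:
    """Merge per-part binary masks with a logical OR; categories not
    covered by MERGE_MAP are kept unchanged."""
    consumed = {p for parts in MERGE_MAP.values() for p in parts}
    merged = {}
    for name, parts in MERGE_MAP.items():
        present = [masks[p] for p in parts if p in masks]
        if present:
            merged[name] = np.logical_or.reduce(present).astype(np.uint8)
    for name, mask in masks.items():
        if name not in consumed:
            merged[name] = mask
    return merged
```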

LIP

  1. Merge masks:

    Original categories: 'hat', 'hair', 'glove', 'sunglasses', 'upperclothes', 'dress', 'coat', 'socks', 'pants', 'jumpsuits', 'scarf', 'skirt', 'face', 'left-arm', 'right-arm', 'left-leg', 'right-leg', 'left-shoe', 'right-shoe'.

    To be merged: 'cloth': ['upperclothes', 'dress', 'coat', 'jumpsuits']

  2. Data Augmentation and Resize - same as CelebA

Unify the Categories Across Datasets

  1. CelebA:

    Keep only: cloth (as upper cloth), hair, skin (as face), hat, eye_g (as glasses)

  2. LIP:

    Treat sunglasses as glasses (a mapping sketch follows this list).
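
One possible way to express the unification step as plain lookup tables; the unified names and the helper function are assumptions for illustration, following the keep/rename rules above.

```python
# Dataset category -> unified category (illustrative names).
CELEBA_TO_UNIFIED = {
    "cloth": "upper_cloth",
    "hair":  "hair",
    "skin":  "face",
    "hat":   "hat",
    "eye_g": "glasses",
}

LIP_TO_UNIFIED = {
    "cloth":      "upper_cloth",  # merged from upperclothes/dress/coat/jumpsuits
    "hair":       "hair",
    "face":       "face",
    "hat":        "hat",
    "sunglasses": "glasses",
}


def unify(masks: dict, mapping: dict) -> dict:
    """Keep only the mapped categories and rename them to the unified names."""
    return {mapping[k]: v for k, v in masks.items() if k in mapping}
```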

Training

  1. Image size: (256, 256) for training.

  2. Loss: segmentation loss (BCELoss in mean mode) + 0.5 * classification loss (BCELoss, summed over all channels, then averaged over the whole batch) + colour-regression loss (MAE loss, counted only when the object actually exists, averaged over channels and the whole batch); a sketch of this combined loss follows this list.

  3. If training a classification and regression model (for Pipeline B), the presence label is decided by whether the labelled mask is not pure black, and the colour label is taken as the median value of the image region cropped by the mask.
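
A sketch of the combined training loss under assumed tensor shapes: predicted masks (B, C, H, W) as probabilities, class scores (B, C), colours (B, C, 3), and a {0, 1} existence tensor of shape (B, C). The helper name and shapes are assumptions; the weighting and reductions follow the description above.

```python
import torch
import torch.nn as nn

bce_mean = nn.BCELoss(reduction="mean")


def combined_loss(pred_masks, true_masks, pred_cls, true_cls,
                  pred_colour, true_colour, exists):
    # Segmentation: BCE averaged over all pixels and channels.
    seg_loss = bce_mean(pred_masks, true_masks)

    # Classification: BCE summed over channels, then averaged over the batch.
    cls_loss = nn.functional.binary_cross_entropy(
        pred_cls, true_cls, reduction="none").sum(dim=1).mean()

    # Colour regression: MAE, counted only where the object exists,
    # averaged over channels and the existing entries of the batch.
    mae = (pred_colour - true_colour).abs().mean(dim=-1)  # (B, C)
    colour_loss = (mae * exists).sum() / exists.sum().clamp(min=1.0)

    return seg_loss + 0.5 * cls_loss + colour_loss
```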

Detection Pipeline A

  1. Semantic segmentation: predict masks for the person's cloth, hair, hat, face, and glasses from the input image;

  2. Compute the average colour of the regions of the input image selected by the segmentation masks;

  3. Whether an object exists is decided by the size and relative position of the contours extracted from the predicted masks (see the sketch below).
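
A sketch of steps 2 and 3 with NumPy and OpenCV (4.x); the helper names and the area threshold are illustrative assumptions, and a full check would also use the relative position of the contours.

```python
import cv2
import numpy as np


def region_colour(image, mask):
    """Average colour of the image region selected by a binary mask
    (image is H x W x 3, mask is H x W); returns None for an empty mask."""
    if mask.sum() == 0:
        return None
    return image[mask.astype(bool)].mean(axis=0)


def object_exists(mask, min_area=200):
    """Decide existence from the contours of a predicted binary mask."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return any(cv2.contourArea(c) >= min_area for c in contours)
```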

Detection Pipeline B

  1. Semantic segmentation with model A - same as step 1 of Pipeline A;

  2. Feed the predicted masks and the input image into model B, which predicts whether each object (e.g., glasses, hat...) is present and regresses its colour directly (see the sketch below).
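
A sketch of the Pipeline B inference flow, assuming model A returns only the masks and model B takes the image with the masks stacked as extra channels; both model interfaces and the detection threshold are assumptions.

```python
import torch


@torch.no_grad()
def pipeline_b(image, model_a, model_b, threshold=0.5):
    masks = model_a(image)                # (B, C_mask, H, W) mask probabilities
    x = torch.cat([image, masks], dim=1)  # stack masks as extra input channels
    presence, colours = model_b(x)        # (B, C_obj) scores, (B, C_obj, 3) colours
    detected = presence > threshold       # boolean per-object detection
    return detected, colours
```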