Contributions A people dataset of 8035 images. 2 1 Three layer attribute classification framework using poselets.
H3D (Humans in 3D) Pascal VOC 2010 (trn+val) 9 different attributes in total. At least 2 attributes per image Agreement of 4 out of 5 TRN: 2003 VAL: 2011 TEST: 4022 People Dataset 1 8035 images
What is a Poselet ? Poselets capture part of the pose from a given viewpoint [Bourdev & Malik, ICCV09]
Poselets Examples may differ visually but have common semantics [Bourdev & Malik, ICCV09]
Poselets But how are we going to create training examples of poselets?
Finding correspondences at training time Given part of a human pose How do we find a similar pose configuration in the training set?
Finding correspondences at training time Left Shoulder Left Hip We use keypoints to annotate the joints, eyes, nose, etc. of people
Finding correspondences at training time Residual Error
Training poselet classifiers Residual Error: 0.15 0.20 0.10 0.35 0.15 0.85 Given a seed patch Find the closest patch for every other person Sort them by residual error Threshold them
Training poselet classifiers Given a seed patch Find the closest patch for every other person Sort them by residual error Threshold them Use them as positive training examples to train a linear SVM with HOG features
Which poselets should we train? • Choose thousands of random windows, generate poselet candidates, train linear SVMs • Select a small set of poselets that are: • Individually effective • Complementary
Features • HOGs at two levels (~2K-4K features) • 16 x 16 • 32 x 32 • Color Histograms in H,S,B (30 features) • 10 bins for H, S and B • Skin classifier output (3 features) • GMM with 5 components • Fraction of skin pixels • hands-skin, legs-skin, neck skin