Multi-task Cascaded Convolutional Networks for Joint Face Detection and Alignment

Face detection and alignment are essential to applications such as face recognition and facial expression recognition. This presentation covers a cascaded framework of three multi-task deep convolutional networks that detects faces and their landmarks in images. The first stage proposes candidate facial windows and their bounding box regression vectors, the second stage rejects false candidates and refines the remaining windows, and the third stage identifies face regions with more supervision and outputs five facial landmark positions. Training combines a cross-entropy loss for face classification with Euclidean losses for bounding box regression and landmark localization, each applied to a specific subset of the training data. The experiments indicate that the method performs joint face detection and alignment quickly and accurately.

gabana

Presentation Transcript


  1. Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks

  2. Introduction • Face detection and alignment are essential to many applications, such as face recognition, facial expression recognition, and age identification. • The goal is to automatically output the position of each face in a given image.

  3. • A cascaded framework built from three stages of multi-task deep convolutional networks. Fig. 1: flowchart

  4. • Given an image, we first resize it to different scales to build an image pyramid.
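The pyramid construction on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide does not give a window size, a minimum face size, or a scaling ratio, so `net_input=12`, `min_face_size=20`, and `factor=0.709` are assumed defaults, and `pyramid_scales` is a hypothetical helper name.

```python
def pyramid_scales(height, width, min_face_size=20, net_input=12, factor=0.709):
    """Return the scale factors at which to resize an image for the pyramid.

    Assumptions (not stated on the slide):
      - net_input: side length of the window seen by the first-stage network
      - min_face_size: smallest face, in pixels, we want to detect
      - factor: ratio between consecutive pyramid levels
    """
    # First level: a face of min_face_size pixels maps onto the network window.
    scale = net_input / min_face_size
    min_side = min(height, width) * scale

    scales = []
    # Keep shrinking until the image no longer fits a single network window.
    while min_side >= net_input:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


if __name__ == "__main__":
    for s in pyramid_scales(480, 640):
        print(f"scale={s:.3f} -> {int(480 * s)}x{int(640 * s)}")
```

Each returned scale corresponds to one resized copy of the image; the first-stage network is then run on every copy so that faces of different sizes fall within its fixed window.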

  5. Stage 1: Proposal Network (P-Net) • Obtain the candidate facial windows and their bounding box regression vectors.

  6. Stage 1: Proposal Network (P-Net) • Face classification

  7. Stage 1: Proposal Network (P-Net) • Bounding box regression

  8. Stage 1: Proposal Network (P-Net) • Facial landmark localization

  9. P-Net detection result • Candidates are calibrated using the estimated bounding box regression vectors. • Non-maximum suppression (NMS) is then applied to merge highly overlapping candidates.
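The two post-processing steps named on this slide, calibration and NMS, might look like the sketch below. It assumes boxes are `(x1, y1, x2, y2)` tuples, that the regression vector holds four offsets expressed as fractions of the box width and height, and that NMS uses an IoU threshold of 0.5. None of these conventions appear on the slide; `calibrate`, `iou`, and `nms` are illustrative names.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def calibrate(box, reg):
    """Shift each box edge by a regression offset scaled by the box size.

    Assumed convention: reg = (dx1, dy1, dx2, dy2) as fractions of width/height.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = reg
    return (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)


def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A candidate survives only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

`nms` returns indices into the original list, so the surviving boxes, scores, and any other per-candidate data can be selected with the same indices.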

  10. Stage 2: Refinement Network (R-Net)

  11. R-Net detection result • All candidates are fed to another CNN (R-Net), which further rejects a large number of false candidates, performs calibration with bounding box regression, and conducts NMS.

  12. Stage 3: Output Network (O-Net) • This stage is similar to the second stage, but here we aim to identify face regions with more supervision. • The network will output five facial landmarks’ positions.

  13. O-Net detection result • The network will output five facial landmarks’ positions.

  14. Flowchart

  15. Training • Face classification: cross-entropy loss

  16. Training • Bounding box regression: Euclidean loss

  17. Training • Facial landmark localization: Euclidean loss
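The three training losses listed on slides 15 to 17 can be sketched per sample as below. The slides name the losses but do not give formulas, so this assumes the usual binary cross-entropy for the face/non-face probability and the squared Euclidean distance for both regression targets; the function names are illustrative.

```python
import math


def cross_entropy_loss(p_face, is_face):
    """Binary cross-entropy for one sample.

    p_face: predicted probability that the window contains a face
    is_face: ground-truth label, 1 for face and 0 for non-face
    """
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p_face, eps), 1.0 - eps)
    return -(is_face * math.log(p) + (1 - is_face) * math.log(1.0 - p))


def euclidean_loss(pred, target):
    """Squared Euclidean distance between a predicted and a target vector.

    Used here for both the bounding box regression vector and the
    landmark coordinates (five landmarks give ten coordinates).
    """
    return sum((p - t) ** 2 for p, t in zip(pred, target))


if __name__ == "__main__":
    print(cross_entropy_loss(0.9, 1))
    print(euclidean_loss([0.1, 0.0, -0.1, 0.05], [0.0, 0.0, 0.0, 0.0]))
```

In a multi-task setup the per-sample losses are typically combined as a weighted sum; the weights are not given on the slides, so they are left out here.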

  18. Training Data

  19. Experiments • Training Data 1) Negatives: regions whose Intersection-over-Union (IoU) ratio with any ground-truth face is less than 0.4 2) Positives: regions with IoU above 0.65 with a ground-truth face 3) Part faces: regions with IoU between 0.4 and 0.65 with a ground-truth face 4) Landmark faces: faces labeled with the positions of 5 landmarks

  20. Experiments • Training Data 1) Negatives and positives are used for the face classification task 2) Positives and part faces are used for bounding box regression 3) Landmark faces are used for facial landmark localization
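The labeling rule and the sample-to-task assignment from slides 19 and 20 can be sketched as follows. The IoU thresholds (0.4 and 0.65) are the ones stated on the slides; the box format `(x1, y1, x2, y2)`, the handling of values exactly on a threshold, and the names `label_crop` and `TASKS_BY_LABEL` are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_crop(crop, gt_faces, neg_thr=0.4, pos_thr=0.65):
    """Label a cropped region by its best IoU with any ground-truth face."""
    best = max((iou(crop, g) for g in gt_faces), default=0.0)
    if best < neg_thr:
        return "negative"
    if best > pos_thr:
        return "positive"
    return "part"  # IoU between the two thresholds (boundaries included)


# Which training tasks each sample type contributes to (slide 20).
TASKS_BY_LABEL = {
    "negative": {"classification"},
    "positive": {"classification", "bbox_regression"},
    "part": {"bbox_regression"},
    "landmark": {"landmark_localization"},
}


if __name__ == "__main__":
    faces = [(0, 0, 10, 10)]
    for crop in [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]:
        label = label_crop(crop, faces)
        print(crop, label, sorted(TASKS_BY_LABEL[label]))
```

Landmark faces come from separately annotated data rather than from the IoU rule, which is why `label_crop` never returns `"landmark"`.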

  21. Experiments

  22. Oscar

  23. Result

  24. Conclusion • The framework adopts a cascaded structure with three stages of carefully designed deep convolutional networks that predict face and landmark locations in a coarse-to-fine manner. • Thanks to the cascaded structure, the proposed method can achieve very fast joint face detection and alignment.

  25. Reference [1] Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499-1503.
