Homography Based Multiple Camera Detection and Tracking of People in a Dense Crowd



Homography Based Multiple Camera Detection and Tracking of People in a Dense Crowd

Presented by: Idan Aharoni

Ran Eshel and Yael Moses



Motivation

  • Usually for surveillance, but not exclusively.

  • Many cameras create an enormous amount of data, making manual tracking impossible.

  • Many real-life scenes are crowded.

Single Camera Tracking

  • Many papers address this problem; some were presented in this course:

    • Floor fields for tracking in high density crowds

    • Unsupervised Bayesian detection of independent motion in crowds

    • Particle Filters.

    • etc.

Single Camera Tracking Problems

  • Body parts are not isolated (a problem for human-shape trackers)

  • Interactions between targets

  • Targets occluding each other


Algorithm Overview

  • Combine data from a set of cameras overlooking the same scene.

  • Based on that data, detect human head tops.

  • Track the detected head tops, using assumptions about the expected trajectory.


Scene Example


What Is a Homography?

  • A homography is a coordinate transformation from one image to another – represented by a 3×3 matrix.

  • Possible in 2 cases only:

    • Camera rotation (rotation about the camera center).

    • Points lying on the same plane.
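As a small illustration (not part of the original slides), here is how a 3×3 homography matrix maps a 2D point using homogeneous coordinates. The matrix below is a hypothetical example (a scaling plus a translation); any invertible 3×3 matrix defines a homography.

```python
import numpy as np

# Hypothetical example homography: scale by 2, then translate by (5, 3).
H = np.array([[2.0, 0.0, 5.0],
              [0.0, 2.0, 3.0],
              [0.0, 0.0, 1.0]])

def apply_homography(H, point):
    """Map a 2D point through H using homogeneous coordinates."""
    x, y = point
    p = H @ np.array([x, y, 1.0])  # lift to homogeneous coordinates
    return p[:2] / p[2]            # divide by the third coordinate

print(apply_homography(H, (1.0, 1.0)))  # -> [7. 5.]
```

The division by the third coordinate is what makes perspective effects possible; for affine matrices (bottom row [0, 0, 1]) it is always a division by 1.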

More Homographies

  • Translation: [[1, 0, t_x], [0, 1, t_y], [0, 0, 1]]

  • Rotation: [[cos θ, −sin θ, 0], [sin θ, cos θ, 0], [0, 0, 1]]

  • Affine: [[a, b, t_x], [c, d, t_y], [0, 0, 1]]

More Homographies

  • Only 4 point correspondences are needed to compute it (it is defined up to a scale factor).

  • Projection: the general 3×3 case, with all 8 degrees of freedom.

  • It describes what happens to the perceived positions of observed objects when the point of view of the observer changes.
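The 4-point computation mentioned above can be sketched with the standard Direct Linear Transform (DLT): each correspondence contributes two rows to a homogeneous linear system, whose null-space vector (found via SVD) is the homography, up to scale. The correspondences below are hypothetical, chosen so the answer is a pure translation.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate H from 4 correspondences (x, y) -> (u, v) via DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the free scale factor

# Hypothetical correspondences: unit square translated by (2, 3).
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 3), (3, 3), (3, 4), (2, 4)]
H = homography_from_points(src, dst)
```

With 4 exact correspondences the system has an exact null space; with more (noisy) points the same SVD gives a least-squares estimate.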


Not a Homography!

  • Barrel correction: lens distortion is non-linear, so it cannot be expressed as a homography.


Homography Points Detection

  • For each camera, we want to find 4 corresponding points on each height plane (the scene is sliced into planes parallel to the floor).


Homography Points Detection


Height Calculation

  • Cross ratio of 4 collinear pixels: CR(A, B; C, D) = (AC · BD) / (BC · AD), a projective invariant used to compute the height.
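A minimal sketch of the cross ratio, the projective invariant behind the height computation, using the common convention CR(A, B; C, D) = (AC · BD) / (BC · AD) with signed coordinates along the line. The points and the projective map below are hypothetical, chosen only to demonstrate the invariance.

```python
def cross_ratio(a, b, c, d):
    """Cross ratio of four collinear points given as 1-D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

pts = [0.0, 1.0, 2.0, 4.0]
# A hypothetical projective map of the line: t -> (2t + 1) / (t + 3).
proj = [(2 * t + 1) / (t + 3) for t in pts]

print(cross_ratio(*pts))   # -> 1.5
print(cross_ratio(*proj))  # -> 1.5 (unchanged by the projective map)
```

Because the cross ratio survives projection, measuring it in the image lets us recover metric height information in the scene.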


Floor Plane Projection

  • We can define a homography from the image to itself that transforms one height plane to another height plane.

  • Again, all we need are 4 points on each height plane.


Head Top Detection

  • Head top – the highest 2D patch of a person.

  • The detection is based on co-temporal frames – frames taken at the same time by different cameras.


Head Top Detection

[Figure: co-temporal frames from camera A and camera B, B projected onto A's height plane, and A projected onto B.]


Background Subtraction

  • First stage of the algorithm.

  • All the next stages are performed on foreground pixels only.

  • Subtract each frame from an offline background sample.
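The subtraction step above can be sketched in a few lines: compare each frame against an offline background sample and keep only the pixels that differ enough. The threshold value of 25 gray levels is a hypothetical choice, not taken from the paper.

```python
import numpy as np

def foreground_mask(frame, background, thresh=25):
    """Boolean mask of pixels that differ from the background sample."""
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > thresh

background = np.zeros((4, 4), dtype=np.uint8)  # toy gray-level background
frame = background.copy()
frame[1:3, 1:3] = 200                          # a bright "person" patch
mask = foreground_mask(frame, background)
print(mask.sum())  # -> 4 foreground pixels
```

All later stages of the algorithm then operate only on pixels where this mask is true.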

What Is a Hyper-Pixel?

  • A hyper-pixel is an N×1 vector (N denotes the number of cameras).

  • q – reference image pixel.

  • q_i – the homography-related pixel in each of the other images.

  • H_i – the homography transformation of image i onto the reference image (its inverse maps the reference image onto image i).

  • I – intensity level.


Hyper-Pixel Usage

  • A hyper-pixel is calculated for each foreground pixel of the reference image.

  • Using the hyper-pixel's intensity variance, we can estimate the correlation between the pixels from the different images.
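A sketch of the hyper-pixel variance idea: assuming all camera images have already been warped onto the reference height plane, the hyper-pixel at (r, c) is simply the stack of intensities warped[:, r, c], and its variance measures how well the cameras agree there. The toy data below is hypothetical.

```python
import numpy as np

def variance_map(warped, fg_mask):
    """warped: (N, H, W) stack of gray-level images already aligned
    to the reference plane; fg_mask: boolean foreground mask."""
    var = warped.astype(float).var(axis=0)  # per-pixel variance over cameras
    var[~fg_mask] = np.inf                  # ignore background pixels
    return var

# Toy data: 3 cameras agree at one pixel (a real point on the plane)
# and disagree at another (a parallax mismatch).
warped = np.zeros((3, 2, 2))
warped[:, 0, 0] = [100, 102, 98]   # consistent -> low variance
warped[:, 0, 1] = [100, 10, 200]   # inconsistent -> high variance
fg = np.ones((2, 2), dtype=bool)

v = variance_map(warped, fg)
print(v[0, 0] < v[0, 1])  # -> True
```

Low variance suggests the pixel really lies on the current height plane; high variance suggests the cameras are seeing different surfaces there.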

Hyper-Pixel Variance

[Figure: hyper-pixel variance examples – three low-variance cases and one high-variance case.]

2D Patches

  • Now we have a map of variances, one per pixel.

  • We need to obtain candidates for real projected head pixels.

  • Use a variance threshold and head-size clustering (K-means).


K-Means Clustering

  • Partition N observations into K clusters

  • Each observation belongs to the cluster with the nearest mean.

Repeat until convergence…

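The loop described above can be sketched directly: assign each observation to the nearest mean, recompute the means, and repeat until convergence. The two well-separated clusters below are hypothetical toy data.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means: points is an (n, d) array, k the cluster count."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest center for each point.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: mean of each cluster.
        new = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):  # converged
            break
        centers = new
    return centers, labels

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centers, labels = kmeans(pts, 2)
print(labels[0] != labels[2])  # the two groups land in different clusters
```

In the paper's setting the "observations" are candidate head pixels, and k reflects how many head-sized blobs the foreground supports.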


Back to Floor Projection…

  • A person can be detected on more than one height plane.

  • All heights are projected to the floor, and only the highest patch is taken…

  • A Head!



[Figure: reference foreground, projected foregrounds, variance map, single-height detection, and all-heights detection.]




Tracking

  • So far we have a map of potential heads and their heights.

  • Tracking should remove false positives and false negatives.

  • For that we define a few prior-based measurements.


Tracking – First Stage

  • In this stage, we aim to remove false negatives.

  • For that we have two head maps: one with a high threshold and one with a low threshold (both projected to the floor).

  • A high threshold yields fewer false positives, but more false negatives.


Tracking – First Stage

  • High-threshold map:

  • If a track has a hole, we try to fill it from the low-threshold map.
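A sketch of this hole-filling step: when a track from the high-threshold map misses a frame, look for a low-threshold detection near the position interpolated from the neighbouring frames. The matching radius of 1.0 is a hypothetical placeholder, not a value from the paper.

```python
import numpy as np

def fill_hole(prev_pos, next_pos, low_detections, radius=1.0):
    """Try to fill a one-frame gap using the low-threshold head map."""
    # Predict the missing position as the midpoint of its neighbours.
    predicted = (np.asarray(prev_pos) + np.asarray(next_pos)) / 2.0
    for det in low_detections:
        if np.linalg.norm(np.asarray(det) - predicted) <= radius:
            return tuple(det)  # fill the hole with this detection
    return None                # no match found

# Frame t is missing between detections at t-1 and t+1.
low_map_t = [(10.4, 5.1), (30.0, 2.0)]  # low-threshold detections at t
print(fill_hole((10.0, 5.0), (11.0, 5.0), low_map_t))  # -> (10.4, 5.1)
```

When this lookup fails too, the track is terminated, as the next slide describes.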


Tracking – First Stage

  • If no match can be found in either the high- or the low-threshold map, we stop tracking this track.


Tracking – Second Stage

  • Now we have a list of fragmented tracks.

  • It is very easy for a human to figure out which one goes where…


Tracking – Second Stage

  • In this stage we aim to connect fragmented tracks, using priors on how people move.

  • For that we define a score, calculated from 6 parameters, for each pair of time-overlapping tracks.


Second Stage - Scores

1) The difference in direction

2) Direction change


Second Stage - Scores

3) Amount of overlap between the tracks.

4) Minimal distance along the tracks.

5) Average distance along the tracks.


Second Stage - Scores

6) Height change – not very likely within a tracking time frame…


Tracking - Scores

  • Score calculation: each measurement is normalized by the maximum expected value of its score.
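The slide's score formula did not survive extraction, so the following is only a hedged sketch of the normalization idea it describes: each measurement is divided by its maximum expected value so every term lies in [0, 1] before the terms are combined. The measurement names and maxima below are hypothetical placeholders, not the paper's actual parameters.

```python
def normalized_score(measurements, max_expected):
    """Average of measurements, each clipped at its maximum expected value."""
    total = 0.0
    for name, value in measurements.items():
        m = max_expected[name]
        total += min(value / m, 1.0)   # normalize and clip to [0, 1]
    return total / len(measurements)

# Hypothetical measurements for one pair of track fragments.
measurements = {"direction_diff": 0.2, "min_distance": 1.5, "overlap": 3.0}
max_expected = {"direction_diff": 1.0, "min_distance": 3.0, "overlap": 6.0}
print(normalized_score(measurements, max_expected))  # -> 0.4
```

Track pairs are then linked starting from the best (here, lowest) combined score.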

Tracking – Final Stage

  • We now have a full-length set of trajectories.

  • In this stage, tracks that are suspected as false positives are removed.

Tracking – Final Stage

  • For each trajectory, we compute a consistency score between every 2 consecutive frames.

  • The consistency score is a weighted average of:

    • Unnatural speed changes.

    • Unnatural direction changes.

    • Changes in height.

    • Too-short track length.
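One term of the consistency check above, sketched in isolation: walk a trajectory frame by frame and flag unnatural speed changes. The paper's weights and thresholds are not given on the slide, so the limit used here (max_jump) is a hypothetical placeholder; the direction, height, and length terms would be added the same way.

```python
import numpy as np

def speed_consistent(track, max_jump=2.0):
    """True if per-frame speed never changes by more than max_jump."""
    track = np.asarray(track, dtype=float)
    steps = np.linalg.norm(np.diff(track, axis=0), axis=1)  # per-frame speed
    return bool((np.abs(np.diff(steps)) <= max_jump).all())

smooth = [(0, 0), (1, 0), (2, 0), (3, 0)]  # constant speed
jumpy = [(0, 0), (1, 0), (8, 0), (9, 0)]   # sudden 7-unit jump
print(speed_consistent(smooth), speed_consistent(jumpy))  # -> True False
```

Tracks whose weighted consistency score falls below a threshold are removed as suspected false positives.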

Results – Scene

  • Cameras:

    • 3–9 gray-level cameras.

    • 15 fps, 640×512.

    • 30° apart from each other.

    • 45° below the horizon.

  • Scene:

    • 3×6 meters



Results – Evaluation Criteria

  • True Positive (TP): 75%–100% of the trajectory is tracked (possibly with IDCs).

  • Perfect True Positive (PTP): 100% of the trajectory is tracked (no IDC).

  • Detection Rate (DR): percentage of frames tracked, compared to the full trajectory.

  • ID Change (IDC): the track switches from one person to another.

  • False Negative (FN): less than 75% of the trajectory is tracked.

  • False Positive (FP): a track with no real trajectory.


Results – Summary


Varying the number of cameras

  • It seems that we need at least 8–9 cameras…


