
Agenda. Presentation of ImageLab. Digital Library content-based retrieval. Computer Vision for robotic automation. Multimedia: video annotation. Medical Imaging. Video analysis for indoor/outdoor surveillance. Off-line video analysis for telemetry and forensics.


Presentation of ImageLab



  • Computer Vision for robotic automation

  • Multimedia: video annotation

  • Video analysis for indoor/outdoor surveillance

  • Off-line video analysis for telemetry and forensics

People and vehicle surveillance


Lab of Computer Vision, Pattern Recognition and Multimedia

Dipartimento di Ingegneria dell’Informazione

Università di Modena e Reggio Emilia, Italy

ImageLab: recent projects in surveillance




  • THIS: Transport Hubs Intelligent Surveillance, EU JLS/CHIPS Project, 2009-2010

  • VIDI-Video: STREP VI FP EU (ViSOR: Video Surveillance Online Repository), 2007-2009

  • BE SAFE: NATO Science for Peace project, 2007-2009

  • Detection of infiltrated objects for security, Australian Council, 2006-2008

Italian & Regional

  • Behave_Lib: Regione Emilia Romagna, Tecnopolo Softech, 2010-2013

  • LAICA: Regione Emilia Romagna, 2005-2007

  • FREE_SURF: MIUR PRIN Project, 2006-2008

With Companies

  • Building site surveillance: with Bridge-129 Italia, 2009-2010

  • Stopped Vehicles: with Digitek Srl, 2007-2008

  • SmokeWave: with Bridge-129 Italia, 2007-2010

  • Sakbot for Traffic Analysis: with Traficon, 2004-2006

  • Mobile surveillance: with Sistemi Integrati, 2007

  • Home automation for the disabled (Domotica per disabili): posture detection, FCRM, 2004-2005

Key aspects

  • Based on the SAKBOT system

    • Background estimation and updating

    • Shadow removal

  • Appearance based tracking

    • We aim to recover a pixel-based foreground mask, even during an occlusion

    • Recovery of parts missed by the background subtraction

    • Managing split and merge situations

  • Occlusion detection and classification

    • Classify the differences as real shape changes or occlusions
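As a rough illustration of background estimation, updating and shadow removal, here is a minimal Python sketch. It is not the SAKBOT implementation: the temporal median model, the luminance-ratio shadow test, and all names and thresholds (`estimate_background`, `classify_pixel`, `segment`, `diff_threshold`, `shadow_low`, `shadow_high`) are assumptions chosen for illustration, and frames are plain grayscale lists of lists.

```python
from collections import deque
from statistics import median

def estimate_background(frames):
    """Per-pixel temporal median over equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]

def classify_pixel(value, background, diff_threshold=30,
                   shadow_low=0.5, shadow_high=0.95):
    """Label one pixel as 'background', 'shadow' or 'foreground'.

    Assumed rule: a pixel that differs from the background but is only a
    moderately darkened version of it is treated as shadow.
    """
    if abs(value - background) <= diff_threshold:
        return "background"
    ratio = value / background if background > 0 else float("inf")
    if shadow_low <= ratio <= shadow_high:
        return "shadow"
    return "foreground"

def segment(frame, background):
    """Apply classify_pixel to every pixel of a frame."""
    return [[classify_pixel(v, b) for v, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, background)]

# Updating: keep a sliding buffer of recent frames and recompute the median.
recent = deque(maxlen=3)
for f in ([[100, 100, 100]], [[102, 98, 100]], [[101, 99, 250]]):
    recent.append(f)
background = estimate_background(list(recent))
print(segment([[100, 60, 200]], background))
# → [['background', 'shadow', 'foreground']]
```

Only the pixels labelled 'foreground' would be passed on to tracking; shadow pixels are discarded together with the background.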

Example 1 (from ViSOR)

Example 2 (from PETS 2002)

Example 3

Other experimental results

Imagelab videos (available on ViSOR)

PETS series

Results on the PETS2006 dataset

Working in real time at 10 fps!

Distributed surveillance with non-overlapping fields of view

Exploit the knowledge about the scene

  • To avoid all-to-all matches, the tracking system can exploit the knowledge about the scene

    • Preferential paths -> Pathnodes

    • Border line / exit zones

    • Physical constraints & Forbidden zones NVR

    • Temporal constraints
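A minimal sketch of how scene knowledge can prune the all-to-all matching, assuming a hypothetical table of allowed camera-to-camera transitions with minimum and maximum transit times. The table, the event tuples and the function name `plausible_matches` are illustrative assumptions, not part of the presented system.

```python
# Hypothetical scene model: allowed transitions between camera views with
# (min, max) transit times in seconds. All values are made up.
TRANSITIONS = {
    ("cam1", "cam2"): (2.0, 15.0),
    ("cam2", "cam4"): (5.0, 30.0),
}

def plausible_matches(exit_event, entry_events, transitions=TRANSITIONS):
    """Return the entry events that the scene constraints do not rule out.

    Events are (track_id, camera, timestamp) tuples.
    """
    _, exit_cam, exit_time = exit_event
    kept = []
    for entry in entry_events:
        _, entry_cam, entry_time = entry
        window = transitions.get((exit_cam, entry_cam))
        if window is None:
            continue  # physical constraint: no path between the two views
        min_t, max_t = window
        if min_t <= entry_time - exit_time <= max_t:
            kept.append(entry)  # temporal constraint satisfied
    return kept

candidates = [("b", "cam2", 105.0), ("c", "cam2", 130.0), ("d", "cam4", 106.0)]
print(plausible_matches(("a", "cam1", 100.0), candidates))
# → [('b', 'cam2', 105.0)]
```

Appearance matching then only has to be run on the surviving candidates.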

Tracking with pathnodes

A possible path between Camera 1 and Camera 4

Pathnodes lead particle diffusion

Results with PF and pathnodes

Single camera tracking: Recall = 90.27%, Precision = 88.64%

Multicamera tracking: Recall = 84.16%, Precision = 80.00%

“VIP: Vision tool for comparing Images of People”

Lantagne et al., Vision Interface 2003

Each extracted silhouette is segmented into significant regions using the JSEG algorithm

(Y. Deng, B.S. Manjunath: “Unsupervised segmentation of color-texture regions in images and video”)

Colour and texture descriptors are calculated for each region

  • The colour descriptor is a modified version of the descriptor presented in Y. Deng et al.: “Efficient color representation for image retrieval”. Basically an HSV histogram of the dominant colors.

  • The texture descriptor is based on D.K. Park et al.: “Efficient Use of Local Edge Histogram Descriptor”. Essentially this descriptor characterizes the edge density inside a region according to different orientations (0°, 45°, 90° and 135°).

  • The similarity between two regions is the weighted sum of the two descriptor similarities:
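The formula itself is not preserved in the transcript. A form consistent with the sentence above would be the following, where the symbols are assumed names rather than the slide's own notation: S_c and S_t are the colour and texture similarities of regions r_i and r_j, and w_c, w_t are their weights.

```latex
S(r_i, r_j) = w_c \, S_c(r_i, r_j) + w_t \, S_t(r_i, r_j), \qquad w_c, w_t \ge 0
```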

To compare the regions inside two silhouettes, a region matching scheme is used, involving a modified version of the IRM algorithm presented in J.Z. Wang et al., “SIMPLIcity: Semantics-sensitive integrated matching for picture libraries”.

The IRM algorithm is simple and works as follows:

1) The first step is to calculate all of the similarities between all regions.

2) Similarities are sorted in decreasing order, the first one is selected, and the areas of the respective pair of regions are compared. A weight, equal to the smaller percentage area of the two regions, is assigned to the similarity measure.

3) Then, the percentage area of the larger region is updated by removing the percentage area of the smaller region, so that it can be matched again. The smaller region will not be matched anymore with any other region.

4) The process continues in decreasing order for all of the similarities.

In the end the overall similarity between the two region sets is calculated as:
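The four steps can be sketched in Python as follows. This is a minimal illustration, not the modified version used in VIP: the function name `irm_similarity` and its arguments are hypothetical, and the overall similarity is assumed to be the sum of each selected similarity multiplied by its weight, since the slide's formula is not preserved in the transcript.

```python
def irm_similarity(areas_a, areas_b, similarity):
    """Greedy IRM-style matching between two region sets.

    areas_a, areas_b: percentage areas of the regions (each list sums to 1).
    similarity[i][j]: similarity between region i of A and region j of B.
    """
    remaining_a = list(areas_a)
    remaining_b = list(areas_b)
    # Step 1: all pairwise similarities; step 2: sort in decreasing order.
    pairs = sorted(
        ((similarity[i][j], i, j)
         for i in range(len(areas_a)) for j in range(len(areas_b))),
        reverse=True,
    )
    total = 0.0
    for sim, i, j in pairs:
        # The weight is the smaller remaining percentage area of the pair.
        weight = min(remaining_a[i], remaining_b[j])
        if weight <= 0.0:
            continue  # one of the two regions is already fully matched
        total += weight * sim
        # Step 3: remove the matched area from both regions. The smaller one
        # drops to zero and is never matched again; the larger one keeps
        # its leftover area for later matches.
        remaining_a[i] -= weight
        remaining_b[j] -= weight
    return total

score = irm_similarity([0.6, 0.4], [0.5, 0.5], [[0.9, 0.2], [0.1, 0.8]])
print(round(score, 2))
# → 0.79
```

In the example the pairs (0,0), (1,1) and (0,1) are matched with weights 0.5, 0.4 and 0.1, giving 0.45 + 0.32 + 0.02.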

The ViSOR video repository

Aims of ViSOR

  • Gather and make freely available a repository of surveillance videos

  • Store metadata annotations, both manually provided as ground-truth and automatically generated by video surveillance tools and systems

  • Run online performance evaluation and comparison

  • Create an open forum to exchange, compare and discuss problems and results on video surveillance

Different types of annotation

  • Structural Annotation: video size, authors, keywords, …

  • Base Annotation: ground-truth, with concepts that refer to the whole video. Annotation tool: online!

  • GT Annotation: ground-truth, with a frame-level annotation; concepts can refer to the whole video, to a frame interval or to a single frame. Annotation tool: ViPER-GT (offline)

  • Automatic Annotation: output of automatic systems shared by ViSOR users.

Video corpus set: the 14 categories

Outdoor multicamera


Surveillance of the entrance door of a building

  • About 10h!

Videos for smoke detection with GT

Videos for shadow detection

  • Already used by many researchers working on shadow detection

  • Some videos with GT

A. Prati, I. Mikic, M.M. Trivedi, R. Cucchiara, "Detecting Moving Shadows: Algorithms and Evaluation", in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 918-923, July 2003

Some statistics

We need videos and annotations!

Simultaneous HMM action segmentation and recognition

Action recognition

Probabilistic action classification

  • Classical approach:

    • Given a set of training videos containing an atomic action each (manually labelled)

    • Given a new video with a single action

  • … find the most likely action

Dataset: "Actions as Space-Time Shapes" (ICCV '05). M. Blank, L. Gorelick, E. Shechtman, M. Irani, R. Basri

Classical HMM framework

  • Definition of a feature set

  • For each frame t, computation of the feature set O_t (observations)

  • Given a set of training observations O = {O_1 … O_T} for each action, training of an HMM λ_k for each action k

  • Given a new set of observations O = {O_1 … O_T}

  • Find the model λ_k which maximises P(λ_k | O)
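The classification step can be sketched as below, under simplifying assumptions that are not in the slides: observations are discrete symbols rather than a continuous feature vector, all actions are equally likely a priori (so maximising the posterior reduces to maximising the likelihood of the observations under each model), and the models are given rather than trained. The names `log_likelihood` and `classify` are hypothetical; the likelihood is computed with the scaled forward algorithm described in the Rabiner tutorial.

```python
import math

def log_likelihood(observations, start, trans, emit):
    """Scaled forward algorithm: log P(O | model) for a discrete HMM.

    start[i]: initial probability of state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][o]: probability of emitting symbol o in state i
    """
    n = len(start)
    alpha = [start[i] * emit[i][observations[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(observations)):
        if t > 0:
            # Recursion: propagate through the transitions, then emit.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n))
                * emit[j][observations[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob

def classify(observations, models):
    """Return the action whose HMM gives the highest log-likelihood."""
    return max(models,
               key=lambda name: log_likelihood(observations, *models[name]))

# Two toy left-to-right models over the symbols {0, 1}; values are made up.
trans = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "walk": ([1.0, 0.0], trans, [[0.9, 0.1], [0.2, 0.8]]),
    "run": ([1.0, 0.0], trans, [[0.1, 0.9], [0.8, 0.2]]),
}
print(classify([0, 0, 1, 1], models))
# → walk
```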

A sample 17-dim feature set

  • Computed on the extracted blob after the foreground segmentation and people tracking:

From the Rabiner tutorial

Online action recognition

  • Given a video with a sequence of actions

    • Which is the current action? Frame-by-frame action classification (online: action recognition)

    • When does an action finish and the next one start? (offline: action segmentation)

R. Vezzani, M. Piccardi, R. Cucchiara, "An efficient Bayesian framework for on-line action recognition", in press in Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, November 7-11, 2009

Main problem of this approach

  • I do not know when the action starts and when it finishes.

  • Using all the observations, only the first action is recognized

  • A possible solution: “brute force”. For each action, for each starting frame, for each ending frame, compute the model likelihood and select the maximum. UNFEASIBLE

Our approach

  • Subsampling of the starting frames (1 every 10)

  • Adoption of recursive formulas

  • Computation of the emission probability once for each model (Action)

  • Current frame as Ending frame

  • Maximum length of each action

  • The computational complexity is compatible with real-time requirements

Different-length sequences

  • Sequences with different starting frames have different lengths

  • Unfair comparisons using the traditional HMM schema

  • The output of each HMM is normalized using the sequence length and a term related to the mean duration of the considered action

  • This makes it possible to classify the current action and, at the same time, to perform an online action segmentation
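A minimal sketch of the search over starting frames, with the current frame as the ending frame. Several things are assumed here rather than taken from the slides: the scoring function `log_likelihood(action, start, end)` is a caller-supplied placeholder, the normalization is a plain division by the window length (the duration-related term is omitted because its form is not given), and the recursive formulas that make the real computation efficient are not reproduced. The name `best_hypothesis` and its parameters are hypothetical.

```python
def best_hypothesis(current_frame, actions, log_likelihood,
                    step=10, max_length=100):
    """Return (normalized_score, action, start_frame) for the window ending
    at current_frame, or None if no candidate start frame is in range.

    log_likelihood(action, start, end) must return the log-likelihood of the
    observations from start to end (inclusive) under the model of `action`.
    """
    best = None
    earliest = max(0, current_frame - max_length + 1)
    # Only every `step`-th frame is considered as a starting frame.
    first_start = -(-earliest // step) * step  # round up to a multiple of step
    for start in range(first_start, current_frame + 1, step):
        length = current_frame - start + 1
        for action in actions:
            # Dividing by the length makes windows of different length
            # comparable.
            score = log_likelihood(action, start, current_frame) / length
            if best is None or score > best[0]:
                best = (score, action, start)
    return best

# Toy scorer: frames 0-39 are "walk", later frames are "run"; a matching
# frame contributes -1 to the log-likelihood and a mismatching one -3.
def toy_log_likelihood(action, start, end):
    return sum(-1.0 if (("walk" if f < 40 else "run") == action) else -3.0
               for f in range(start, end + 1))

print(best_hypothesis(57, ["walk", "run"], toy_log_likelihood, max_length=30))
# → (-1.0, 'run', 40)
```

In the toy run the candidate starts are frames 30, 40 and 50. The selected start frame doubles as the estimated boundary between the previous action and the current one, which is how classification and segmentation are obtained together. Starts 40 and 50 tie on the length-normalized score here, and the earlier one wins only because it is evaluated first.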