Presentation Transcript


  1. Scene Understanding: perception, multi-sensor fusion, spatio-temporal reasoning and activity recognition. Francois BREMOND, PULSAR project-team, INRIA Sophia Antipolis, FRANCE. Francois.Bremond@sophia.inria.fr http://www-sop.inria.fr/pulsar/ Key words: artificial intelligence, knowledge-based systems, cognitive vision, human behavior representation, scenario recognition

  2. Video Understanding: Performance Evaluation (V. Valentin, R. Ma)
• ETISEO: a French initiative for algorithm validation and knowledge acquisition: http://www-sop.inria.fr/orion/ETISEO/
• Approach: three critical evaluation concepts
• Selection of test video sequences: follow a specified characterization of problems; study one problem at a time, at several levels of difficulty; collect long sequences for significance
• Ground-truth definition: up to the event level; give clear and precise instructions to the annotator (e.g., annotate both the visible and the occluded parts of objects)
• Metric definition: a set of metrics for each video processing task; performance indicators: sensitivity and precision
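
For reference, the two indicators named on the slide reduce to the usual counting formulas over matched detections. A minimal sketch (illustrative, not ETISEO's actual tooling):

```python
# Sensitivity and precision from matched detection counts.
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of ground-truth objects that were detected (a.k.a. recall)."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of detections that correspond to a ground-truth object."""
    return true_positives / (true_positives + false_positives)

# Example: 80 correct detections, 20 missed objects, 10 spurious detections.
print(sensitivity(80, 20))  # 0.8
print(precision(80, 10))    # ~0.889
```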

  3. Evaluation: Current Approach (A.T. Nghiem)
• ETISEO limitations: the selection of video sequences according to difficulty levels is subjective; the generalization of evaluation results is subjective; one video sequence may contain several video processing problems at many difficulty levels
• Approach: treat each video processing problem separately
• Define a measure to compute the difficulty level of input data (e.g., video sequences), as sketched below
• Select video sequences containing only the current problem, at various difficulty levels
• For each algorithm, determine the highest difficulty level at which the algorithm still has acceptable performance
• Approach validation: applied to two problems: detecting weakly contrasted objects and detecting objects mixed with shadows
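
The slides do not give the actual difficulty measure; purely as an illustration, one plausible measure for the "weakly contrasted objects" problem is the intensity contrast between annotated objects and their background (everything here is an assumption, not the published metric):

```python
import numpy as np

def object_contrast(frame: np.ndarray, mask: np.ndarray) -> float:
    """Absolute difference between mean object and mean background intensity.
    frame: 2-D grayscale image; mask: boolean array marking object pixels."""
    return abs(frame[mask].mean() - frame[~mask].mean())

def sequence_difficulty(frames, masks) -> float:
    """Hypothetical difficulty score: lower average contrast over the
    sequence maps to a higher difficulty level in (0, 1]."""
    contrasts = [object_contrast(f, m) for f, m in zip(frames, masks)]
    return 1.0 / (1.0 + float(np.mean(contrasts)))
```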

  4. Evaluation: Conclusion
• A new evaluation approach to generalise evaluation results, implemented for two problems.
• Limitation: it only detects the upper bound of an algorithm's capacity. The difference between this upper bound and real performance may be significant if the test video sequence contains several video processing problems, or if the same set of parameters must be tuned differently to adapt to several concurrent problems.
• Ongoing evaluation campaigns: PETS at ECCV 2008; TRECVid (NIST) with the i-LIDS videos
• Benchmarking databases: http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG363 and http://www.hitech-projects.com/euprojects/cantata/datasets_cantata/dataset.html

  5. Video Understanding: Program Supervision

  6. Supervised Video Understanding: Proposed Approach
Goal: easy creation of reliable supervised video understanding systems.
Approach:
• Use a supervised video understanding platform: a reusable software tool composed of three separate components: program library, control, and knowledge base
• Formalize a priori knowledge of video processing programs
• Make the control of video processing programs explicit
Issues:
• Which video processing programs can be supervised?
• A user-friendly formalism to represent knowledge of programs
• A general control engine to implement different control strategies
• A learning tool to adapt system parameters to the environment

  7. Proposed Approach
[Architecture diagram, components: a Video Processing Expert providing the video processing program library and the video processing program knowledge base; an Application Domain Expert providing the application domain knowledge base; a scene environment knowledge base; a Control component producing a Particular System; Evaluation and Learning modules closing the loop.]

  8. Supervised Video Understanding Platform: Operator Formalism
• Use of an operator formalism [Clément and Thonnat, 93] to represent knowledge of video processing programs, composed of frames and production rules
• Frames carry declarative knowledge. Operators are abstract models of video processing programs: a primitive operator models a particular program; a composite operator models a particular combination of programs
• Production rules carry inferential knowledge: choice and optional criteria, initialization criteria, assessment criteria, adjustment and repair criteria

  9. Program Supervision: Knowledge and Reasoning
Composite operator: functionality, characteristics, input data, parameters, output data, preconditions, postconditions, effects, decomposition into sub-operators (sequential, parallel, alternative), data flow. Rule bases: parameter initialization rules, parameter adjustment rules, choice rules, result evaluation rules, repair rules.
Primitive operator: functionality, characteristics, input data, parameters, output data, preconditions, postconditions, effects, calling syntax. Rule bases: parameter initialization rules, parameter adjustment rules, result evaluation rules, repair rules.
(A structural sketch of such frames follows below.)
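
As an illustration of how such operator frames could be held in software (field names and types are assumptions, not the platform's actual API), a minimal sketch:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class OperatorFrame:
    """Hypothetical frame bundling the declarative slots and the
    production-rule bases listed on the slide."""
    functionality: str                     # e.g. "motion segmentation"
    input_data: List[str]
    parameters: dict
    output_data: List[str]
    preconditions: List[Callable[..., bool]] = field(default_factory=list)
    initialization_rules: List[Callable] = field(default_factory=list)
    assessment_rules: List[Callable] = field(default_factory=list)
    adjustment_rules: List[Callable] = field(default_factory=list)
    repair_rules: List[Callable] = field(default_factory=list)
```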

  10. Video Understanding: Learning Parameters (B. Georis)
• Objective: a learning tool to automatically tune algorithm parameters from experimental data; used here to learn segmentation parameters with respect to illumination conditions
• Method: identify a set of parameters of a task (18 segmentation thresholds) that depend on an environment characteristic (the image intensity histogram)
• Study the variability of the characteristic: histogram clustering -> 5 clusters
• Determine optimal parameters for each cluster: optimization of the 18 segmentation thresholds (see the sketch below)
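
A minimal sketch of the clustering step, assuming standard scikit-learn (the actual tool and the threshold optimizer are not described in the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_histograms(images, n_clusters=5):
    """Each image -> normalized 256-bin intensity histogram -> k-means.
    Returns the fitted model; km.labels_ maps each image to a cluster."""
    hists = np.array([
        np.histogram(img, bins=256, range=(0, 256), density=True)[0]
        for img in images
    ])
    return KMeans(n_clusters=n_clusters, n_init=10).fit(hists)

# At run time, a new frame would be assigned to the nearest cluster and
# the 18 thresholds optimized off-line for that cluster would be applied.
```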

  11. Video Understanding: Learning Parameters. [Figure: camera view.]

  12. Learning Parameters: Clustering the Image Histograms
[Figure: 3-D plot where each X-Z slice is an image histogram (X: pixel intensity 0-255, Z: number of pixels in %, Y: images); the histograms group into 5 clusters, each with its own optimal parameter set ßi_opt1 ... ßi_opt5.]

  13. Video Understanding: Knowledge Discovery (E. Corvee, J.L. Patino Vilchis)
• CARETAKER: an FP6 IST European initiative to provide an efficient tool for the management of large multimedia collections
• Applications to surveillance and safety issues, urban/environment planning, resource optimization, and disabled/elderly person monitoring
• Currently being validated on large underground video recordings (Torino, Roma)
Processing chain: multiple audio/video sensors -> audio/video acquisition and encoding -> generic event recognition -> knowledge discovery; raw data become primitive events and metadata, and simple events are abstracted into complex events

  14. Event detection examples

  15. Data Flow
Object detection produces, per mobile object: id, type, 2D info, 3D info. Event detection produces, per event: id, type (e.g., inside_zone, stays_inside_zone), the involved mobile object, and the involved contextual object. Object/event detection feeds information modelling, which fills three tables: a mobile object table, an event table, and a contextual object table (sketched below).
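
A minimal sketch of rows in the first two tables (field names are illustrative, not the project's actual schema):

```python
from dataclasses import dataclass

@dataclass
class MobileObjectRow:
    object_id: int
    object_type: str        # e.g. "Person", "Vehicle"
    info_2d: tuple          # bounding box in the image
    info_3d: tuple          # position in the 3-D scene

@dataclass
class EventRow:
    event_id: int
    event_type: str         # e.g. "inside_zone", "stays_inside_zone"
    mobile_object_id: int   # reference into the mobile object table
    contextual_object: str  # reference into the contextual object table
```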

  16. Table Contents
• Mobile objects: people characterised by their trajectory, shape, the significant events in which they are involved, ...
• Events: model the normal activities in the metro station (event type, involved objects, time, ...)
• Contextual objects: find interactions between mobile objects and contextual objects (interaction type, time, ...)

  17. Knowledge Discovery: Trajectory Clustering
Objective: cluster trajectories into k groups that match people's activities (see the sketch below).
• Feature set: entry and exit points of an object; direction, speed, duration, ...
• Clustering techniques: agglomerative hierarchical clustering, k-means, self-organizing (Kohonen) maps
• Evaluation of each cluster set against ground truth
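
A minimal sketch of the feature extraction and of one of the three techniques (k-means, via scikit-learn); feature details beyond the slide's list are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def trajectory_features(traj: np.ndarray, fps: float = 25.0) -> np.ndarray:
    """traj: (T, 2) array of positions over time.
    Returns [entry_x, entry_y, exit_x, exit_y, direction, speed, duration]."""
    entry, exit_ = traj[0], traj[-1]
    displacement = exit_ - entry
    duration = len(traj) / fps
    path_len = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))
    speed = path_len / duration if duration > 0 else 0.0
    direction = np.arctan2(displacement[1], displacement[0])
    return np.array([*entry, *exit_, direction, speed, duration])

def cluster_trajectories(trajs, k: int) -> np.ndarray:
    feats = np.array([trajectory_features(t) for t in trajs])
    return KMeans(n_clusters=k, n_init=10).fit_predict(feats)
```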

  18. Results on the Torino subway (45 min), 2052 trajectories

  19. Trajectory: Analysis. [Figure: clusters obtained by k-means, agglomerative clustering, and SOM; some groups show mixed overlap.]

  20. Trajectory: Semantic Characterisation. Agglomerative cluster 21 matches SOM cluster 14 and k-means cluster 12, showing consistency of clusters between algorithms. Semantic meaning: walking towards the vending machines.

  21. Trajectory Analysis: Intraclass and Interclass Variance
• The SOM algorithm has the lowest intraclass variance and the highest interclass separation
• Parameter tuning: which clustering technique to choose? (The two criteria are sketched below.)
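
The slides do not give exact formulas; the sketch below uses the standard definitions: compact clusters (low intraclass variance) and well-separated cluster centers (high interclass separation).

```python
import numpy as np

def intraclass_variance(feats: np.ndarray, labels: np.ndarray) -> float:
    """Mean, over clusters, of the mean squared distance to the cluster center."""
    return float(np.mean([
        np.mean(np.sum((feats[labels == c] - feats[labels == c].mean(0)) ** 2, 1))
        for c in np.unique(labels)
    ]))

def interclass_separation(feats: np.ndarray, labels: np.ndarray) -> float:
    """Mean squared distance of the cluster centers from the global mean."""
    centers = np.array([feats[labels == c].mean(0) for c in np.unique(labels)])
    return float(np.mean(np.sum((centers - feats.mean(0)) ** 2, 1)))
```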

  22. Mobile Objects

  23. Mobile Object Analysis: building statistics on objects. There is an increase of people after 6:45.

  24. Contextual Object Analysis (vending machine 1, vending machine 2): with the increase of people, there is an increase in the use of the vending machines.

  25. Results: Trajectory Clustering

  26. Knowledge Discovery: Achievements
• Semantic knowledge extracted by off-line long-term analysis of on-line interactions between moving objects and contextual objects: 70% of people come from the north entrance; most people spend 10 s in the hall; 64% of people go directly to the gates without stopping at the ticket machine; at rush hours, people are 40% quicker to buy a ticket; ...
• Issues: at which level(s) should clustering techniques be designed: low level (image features), middle level (trajectories, shapes), or high level (primitive events)?
• Learn what: visual concepts, scenario models?
• Uncertainty (noise/outliers/rare events): what are the activities of interest?
• Parameter tuning (e.g., distance, clustering technique) and performance evaluation (criteria, ground truth)

  27. Video Understanding: Learning Scenario Models (A. Toshev), or Frequent Composite Event Discovery in Video Event Time Series

  28. Learning Scenarios: Motivation
Why unsupervised model learning in video understanding? Complex models contain many events, there is a large variety of models, and different models need different parameters, so the learning of models should be automated. (Example: video surveillance in a parking lot.)

  29. Learning Scenarios: Problem Definition
• Input: a set of primitive events from the vision module, e.g. object-inside-zone(Vehicle, Entrance) [5, 16]
• Output: frequent event patterns. A pattern is a set of events, e.g.:
object-inside-zone(Vehicle, Road) [0, 35]
object-inside-zone(Vehicle, Parking_Road) [36, 47]
object-inside-zone(Vehicle, Parking_Places) [62, 374]
object-inside-zone(Person, Road) [314, 344]
• Goals: automatic data-driven modeling of composite events; reoccurring patterns of primitive events correspond to frequent activities; find classes with large size and similar patterns

  30. Learning Scenarios: A PRIORI Method
• Approach: an iterative method from data mining for efficient frequent-pattern discovery in large datasets
• A PRIORI property: sub-patterns of frequent patterns are also frequent (Agrawal & Srikant, 1995)
• At the i-th step, consider only i-patterns whose (i-1)-sub-patterns are frequent; the search space is thus pruned
• A PRIORI property for activities represented as classes: size(Cm-1) ≥ size(Cm), where Cm is a class containing patterns of length m and Cm-1 is a sub-activity of Cm

  31. Learning Scenarios: A PRIORI Method. Candidate generation: merge two i-patterns with (i-1) primitive events in common to form an (i+1)-pattern, as in the sketch below.
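
A minimal sketch of this candidate-generation step, representing a pattern as a set of primitive events (the actual system also keeps time intervals, omitted here):

```python
from itertools import combinations

def generate_candidates(frequent_i_patterns):
    """frequent_i_patterns: set of frozensets, each holding i primitive events.
    Merges pairs sharing (i-1) events into (i+1)-patterns, and keeps a
    candidate only if all its i-sub-patterns are frequent (A PRIORI pruning)."""
    candidates = set()
    for p, q in combinations(frequent_i_patterns, 2):
        if len(p & q) == len(p) - 1:           # (i-1) events in common
            merged = p | q                      # an (i+1)-pattern
            if all(frozenset(sub) in frequent_i_patterns
                   for sub in combinations(merged, len(p))):
                candidates.add(frozenset(merged))
    return candidates
```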

  32. Learning Scenarios: Similarity Measure
Two types of similarity between event patterns: similarities between event attributes, and similarities between pattern structures. A generic similarity measure should use generic properties when possible (for easy usage in different domains), yet incorporate the domain-dependent properties relevant to the concrete application.

  33. Learning Scenarios: Attribute Similarity
Attributes: the corresponding events in two patterns should have similar (or the same) attributes (duration, names, object types, ...).
• Comparison between corresponding events (same type, same color in the figure).
• For numeric attributes, a similarity function G(x, y) is used (see the sketch below).
• attr(pi, pj) = average of all event attribute similarities.
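
The transcript does not preserve the slide's formula for G(x, y). Purely as an assumption, a common choice for a numeric-attribute similarity is a Gaussian of the difference, which is 1 for equal values and decays toward 0 as they diverge:

```python
import math

def numeric_similarity(x: float, y: float, sigma: float = 1.0) -> float:
    """Hypothetical G(x, y): Gaussian similarity between numeric attributes."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def attribute_similarity(attrs_i, attrs_j, sigma: float = 1.0) -> float:
    """attr(pi, pj): average similarity over corresponding events' numeric
    attributes (e.g. durations); symbolic attributes would be compared
    by equality instead."""
    sims = [numeric_similarity(a, b, sigma) for a, b in zip(attrs_i, attrs_j)]
    return sum(sims) / len(sims)
```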

  34. Learning Scenarios: Evaluation
Test data: video surveillance of a parking lot; 4 hours of recordings from 2 days, in 2 test sets; each test set contains approximately 100 primitive events.
Results: in both test sets the following event pattern was recognized (a parking maneuver):
object-inside-zone(Vehicle, Road)
object-inside-zone(Vehicle, Parking_Road)
object-inside-zone(Vehicle, Parking_Places)
object-inside-zone(Person, Parking_Road)


  38. Learning Scenarios: Conclusion & Future Work
Conclusion: application of a data mining approach; handling of uncertainty without losing computational effectiveness; a general framework in which only a similarity measure and a primitive event library must be specified.
Future work: other similarity measures; handling of further aspects of uncertainty; qualification of the learned patterns (is frequent equal to interesting?); different applications, with different event libraries or features.

  39. HealthCare Monitoring (N. Zouba)
GERHOME (CSTB, INRIA, CHU Nice): addressing the ageing population, http://gerhome.cstb.fr/
Approach:
• Multi-sensor analysis based on sensors embedded in the home environment
• Detect in real time any alarming situation
• Identify a person's profile (his/her usual behaviors) from the global trends of life parameters, and then detect any deviation from this profile

  40. Monitoring of Activities of Daily Living for the Elderly
• Goal: increase independence and quality of life: enable the elderly to live longer in their preferred environment, reduce costs for public health systems, and relieve family members and caregivers.
• Approach: detect alarming situations (e.g., falls); detect changes in behavior (missing activities, disorder, interruptions, repetitions, inactivity); compute the degree of frailty of elderly people (a deviation sketch follows below).
Example of normal activity: meal preparation in the kitchen (11h-12h); eating in the dining room (12h-12h30); resting, TV watching in the living room (13h-16h); ...
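
A hypothetical sketch of the profile/deviation idea from these two slides: learn the usual duration of each daily activity, then flag a day whose observed duration deviates by more than a few standard deviations. Thresholds, activity names, and units are all assumptions:

```python
import numpy as np

def learn_profile(history: dict) -> dict:
    """history: activity name -> list of observed daily durations (minutes).
    Returns activity name -> (mean, std)."""
    return {a: (float(np.mean(d)), float(np.std(d))) for a, d in history.items()}

def deviations(profile: dict, today: dict, n_sigma: float = 3.0) -> list:
    """Return the activities whose duration today deviates from the profile."""
    alerts = []
    for activity, duration in today.items():
        mean, std = profile[activity]
        if std > 0 and abs(duration - mean) > n_sigma * std:
            alerts.append(activity)
    return alerts
```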

  41. Gerhome Laboratory
• GERHOME (Gerontology at Home): a homecare laboratory, http://www-sop.inria.fr/orion/personnel/Francois.Bremond/topicsText/gerhomeProject.html
• Experimental site at CSTB (Centre Scientifique et Technique du Bâtiment) in Sophia Antipolis: http://gerhome.cstb.fr
• Partners: INRIA, CSTB, CHU-Nice, Philips-NXP, CG06, ...

  42. Gerhome Laboratory
• Video cameras installed in the kitchen and in the living room to detect and track the person in the apartment
• Contact sensors mounted on many devices to determine the person's interactions with them
• Presence sensors installed in front of the sink and the cooking stove to detect the presence of a person near them
[Figure: position of the sensors in the Gerhome laboratory]

  43. Sensors Installed in the Gerhome Laboratory
[Figures: contact sensor in the window; contact sensor in the cupboard door, in the kitchen; video camera in the living room; pressure sensor underneath the legs of the armchair]

  44. Event Modelling
We have modelled a set of activities using an event recognition language developed in our team. This is the model for the "meal preparation" event:

Composite Event (Prepare_meal_1,  "detected by a video camera combined with contact sensors"
  Physical Objects ((p: Person), (Microwave: Equipment), (Fridge: Equipment), (Kitchen: Zone))
  Components (
    (p_inz: PrimitiveState Inside_zone (p, Kitchen))         "detected by video camera"
    (open_fg: PrimitiveEvent Open_Fridge (Fridge))           "detected by contact sensor"
    (close_fg: PrimitiveEvent Close_Fridge (Fridge))         "detected by contact sensor"
    (open_mw: PrimitiveEvent Open_Microwave (Microwave))     "detected by contact sensor"
    (close_mw: PrimitiveEvent Close_Microwave (Microwave)))  "detected by contact sensor"
  Constraints ((open_fg during p_inz)
               (open_mw before_meet open_fg)
               (open_fg Duration >= 10)
               (open_mw Duration >= 5))
  Action (AText ("Person prepares meal") AType ("NOT URGENT")))
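
As an illustration only (not the team's actual recognition engine), the temporal constraints of Prepare_meal_1 could be checked over detected sub-events like this, with each sub-event carrying a [start, end] interval (time units assumed to match the model's Duration thresholds):

```python
from dataclasses import dataclass

@dataclass
class Interval:
    start: float
    end: float
    @property
    def duration(self) -> float:
        return self.end - self.start

def during(a: Interval, b: Interval) -> bool:
    """Allen-style 'during': a happens entirely within b."""
    return b.start <= a.start and a.end <= b.end

def before_meet(a: Interval, b: Interval) -> bool:
    """a ends before (or exactly when) b starts."""
    return a.end <= b.start

def prepare_meal(p_inz: Interval, open_fg: Interval, open_mw: Interval) -> bool:
    """Evaluate the Constraints block of Prepare_meal_1."""
    return (during(open_fg, p_inz)
            and before_meet(open_mw, open_fg)
            and open_fg.duration >= 10
            and open_mw.duration >= 5)
```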

  45. Multi-sensor Monitoring: Results and Evaluation
• We have studied and tested a range of activities in the Gerhome laboratory, such as using the microwave, using the fridge, and preparing a meal.
• We have validated and visualized the recognized events with a 3D visualization tool.

  46. Recognition of the "Prepare meal" event. The person is recognized with the posture "standing with one arm up", "located in the kitchen" and "using the microwave". [Figure: visualization of a recognized event in the Gerhome laboratory]

  47. Recognition of the "Resting in living-room" event. The person is recognized with the posture "sitting in the armchair" and "located in the living-room". [Figure: visualization of a recognized event in the Gerhome laboratory]

  48. End-users
There are several end-users in homecare:
• Doctors (gerontologists): frailty measurement (depression, ...); alarm detection (falls, gas, dementia, ...)
• Caregivers and nursing homes: cost reduction (no false alarms, reduced employee involvement); employee protection
• Persons with special needs, including young children, disabled and elderly people: feeling safe at home; autonomy (e.g., at night, lighting up the way to the bathroom); improving life (smart mirror; summaries of the user's day, week, or month in terms of walking distance, TV, water consumption)
• Family members and relatives: elderly safety and protection; social connectivity

  49. Social problems and solutions

  50. Conclusion
A global framework for building video understanding systems.
• Hypotheses: mostly fixed cameras; a 3D model of the empty scene; predefined behavior models
• Results:
• Real-time video understanding systems for individuals, groups of people, vehicles, crowds, or animals
• Knowledge structured within the different abstraction levels (i.e., processing worlds)
• Formal description of the empty scene; structures for algorithm parameters; structures for object detection rules, tracking rules, fusion rules, ...
• An operational language for event recognition (more than 60 states and events) and a video event ontology
• Tools for knowledge management; metrics and tools for performance evaluation and learning; parsers and formats for data exchange; ...
