
Natural Language Processing for Action Recognition

JHU Summer School

Evelyne Tzoukermann, Ph.D.

  • Friday, June 11, 2010


What is the role of Natural Language in Action Recognition?

  • Provide temporal information

    • Where in the video is the action happening?

  • Provide semantic information

    • Parse the phrasal constituents to determine action type and human interaction through objects, instruments, and other contextual information

    • E.g.: cut potatoes → semantic representation

      • <instrument> knife

      • <human interaction> hands

      • <location> cutting board


Function of Natural Language in Action Recognition

  • Facilitate action recognition from the video.

  • Ground video processing

  • Extract relevant entities and semantics associated with them

  • Allow fusion of knowledge from text with action primitives

  • Leverage already existing techniques and knowledge


Completed

  • Dataset domains:

    • Cooking

    • Crafts

  • Classification of Actions

  • Categorization of Actions


Cooking Domain

  • DVDs:

    • Cook Like a Chef

    • Martha’s Favorite Family Dinners

    • Joanne Weir’s Cooking Class

  • CMU Kitchen dataset

  • Food Network: 12 consecutive hours of recorded time

  • PBS Kids: Sprout – 5 shows

  • URADL: U. of Rochester Activities of Daily Living

    • 12 activities, 5 individuals, 3 recordings each


Craft Domain

  • PBS Kids: Sprout – over 25 shows


Tuples of Entities

  • Time stamps for temporal information

  • Verbs - capture actions

  • Objects - what is acted upon

  • Instruments - with what tool

  • Location – for recognition

  • Camera position – for scalability
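The tuple described above can be pictured as a small record type. This is only a sketch: the class name `ActionTuple`, the field names, the video name, and the time values are assumptions made for illustration, not part of the presented system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionTuple:
    """Hypothetical record for one annotated action (all names are assumptions)."""
    video: str                         # recording the action belongs to
    start: float                       # begin time stamp, in seconds
    end: float                         # end time stamp, in seconds
    verb: str                          # the action itself
    obj: Optional[str] = None          # what is acted upon
    instrument: Optional[str] = None   # with what tool
    location: Optional[str] = None     # where the action takes place
    camera: Optional[str] = None       # camera position, if known

    def duration(self) -> float:
        """Length of the action segment in seconds."""
        return self.end - self.start


# Made-up values for the "cut potatoes" example.
cut = ActionTuple(
    video="example_video",
    start=12.0,
    end=19.5,
    verb="cut",
    obj="potatoes",
    instrument="knife",
    location="cutting board",
)
print(cut.verb, cut.obj, cut.duration())
```

Optional fields default to `None` because a transcript will often mention the verb and object without naming the instrument, location, or camera position.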


Information Extraction

  • Extract structured information from unstructured documents

    Ex: “Yesterday, New York-based Foo Inc. announced their acquisition of Bar Corp.”

    • Entity identification and recognition

  • Goal of IE: allow computation to be performed on unstructured data.

  • More specific goal: allow logical reasoning to draw inferences based on the logical content of the input data.
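As a rough illustration of turning the Foo Inc. sentence into a structured record, the sketch below uses a deliberately naive regular expression. The pattern, the function name `extract_acquisition`, and the output keys are assumptions chosen for this example. A real IE system would rely on entity recognition rather than a single hand-written pattern.

```python
import re
from typing import Dict, Optional

# Assumption: company names are capitalized words ending in "Inc." or "Corp."
COMPANY = r"[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)* (?:Inc|Corp)\."
PATTERN = re.compile(
    rf"(?P<acquirer>{COMPANY}) announced (?:their|its) acquisition of (?P<target>{COMPANY})"
)


def extract_acquisition(sentence: str) -> Optional[Dict[str, str]]:
    """Return a structured acquisition record, or None if the pattern does not match."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return {
        "relation": "acquisition",
        "acquirer": match.group("acquirer"),
        "target": match.group("target"),
    }


sentence = "Yesterday, New York-based Foo Inc. announced their acquisition of Bar Corp."
print(extract_acquisition(sentence))
```

Once the sentence is reduced to a record like this, it can be stored, queried, and combined with other facts, which is the kind of computation the slide refers to.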


Entity Recognition for Video

  • Can be considered an IE task with a list of entities

  • Find a tuple or an ordered list with a temporal dimension

  • Goal of text-based Information Extraction:

    “Who did what to whom where”

    • Find the different entities that fill these slots

  • Goal of video and text IE

    • Find the temporal and other entities


Angelina’s Ballet Slippers

  • Video

  • Web page


Angelina’s Ballet Slippers

Ingredients

  • 1 red pepper, cut in half with seeds removed

  • 1⁄2 cup quick cook brown rice

  • 1⁄2 cup vegetable stock

  • 1 cup canned mixed vegetables, no added salt

  • 1⁄4 tsp. black pepper

  • 1 tsp. chopped fresh parsley

  • 1 tsp. extra virgin olive oil

  • 1 lemon

  • Decorative cabbage

  • 1⁄4 cup shredded cheddar cheese, divided

Supplies

  • Measuring cups and spoons

  • Cutting board & knife

  • Cooking pot

  • Small cooking pot

  • Mixing spoons

  • Slotted spoon

  • High-sided baking dish

  • Pastry brush

  • Large serving plate


Sprout – Alphabet Book


Baby Picture Frames


Action Recognition and Complexity

Input

  • transcripts and closed captions

  • text transcripts alone

  • list of ingredients and utensils

  • Evaluation can follow these levels


Sprout – Elmo’s Funny Face Pizza


Sprout – Caillou’s Crunchy Carrot Salad


Martha Stewart Episode 2


Martha Stewart – 191 action verbs


Semantic Categorization of Actions


CMU Kitchen Set - Verbs

  • take

  • put

  • open

  • fill

  • crack

  • beat

  • stir

  • pour

  • clean

  • switch_on

  • read

  • spray

  • close

  • walk

  • twist_on

  • twist_off


NLP Tools

  • Part-of-speech tagger or phrase chunker

  • Dependency parser for Verb-Object relations

    • We have tuples of Verb, Object, Instrument, Location

    • Ex: Stir (v) chili (o) with a wooden spoon (instr) in a pot (loc)

  • Collocations for Instrument and Location

    • Co-occurrence from Google

    • Ex: “place a wooden spoon across the pot to keep it from boiling”

  • And more
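The step from dependency relations to a Verb, Object, Instrument, Location tuple can be sketched as below. The sketch assumes the parser output is already available as (head, relation, dependent) triples. The relation labels, the slot mapping, and the function name `edges_to_tuple` are assumptions for illustration, and the edges are written by hand rather than produced by a parser.

```python
from typing import Dict, Iterable, Optional, Tuple

Edge = Tuple[str, str, str]  # (head, relation, dependent)

# Assumed mapping from dependency relation label to tuple slot.
SLOT_FOR_RELATION = {
    "dobj": "object",
    "prep_with": "instrument",
    "prep_in": "location",
}


def edges_to_tuple(verb: str, edges: Iterable[Edge]) -> Dict[str, Optional[str]]:
    """Fill the slots of an action tuple from edges headed by the given verb."""
    result: Dict[str, Optional[str]] = {
        "verb": verb,
        "object": None,
        "instrument": None,
        "location": None,
    }
    for head, relation, dependent in edges:
        slot = SLOT_FOR_RELATION.get(relation)
        # Keep only the first filler found for each slot.
        if head == verb and slot is not None and result[slot] is None:
            result[slot] = dependent
    return result


# Hand-written edges for "Stir chili with a wooden spoon in a pot".
edges = [
    ("stir", "dobj", "chili"),
    ("stir", "prep_with", "spoon"),
    ("spoon", "amod", "wooden"),
    ("stir", "prep_in", "pot"),
]
print(edges_to_tuple("stir", edges))
```

Mapping "with" to Instrument and "in" to Location is only a heuristic ("stir with care" has no instrument), which is one reason collocation statistics are useful as a second source of evidence.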


Ontology

  • Need to capture:

    • Concepts

    • Relationships

    • Properties

    • Timestamps (video_name [beg_time, end_time])

    • Validation
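The time stamp notation `video_name [beg_time, end_time]` could be read into a structured value as sketched below. The concrete text format (a name without spaces, times as plain numbers of seconds) and the names `Segment` and `parse_segment` are assumptions, since the slides do not specify them.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    video_name: str
    beg_time: float  # assumed to be seconds
    end_time: float  # assumed to be seconds


# Assumed textual form: name [beg, end], e.g. "pizza_clip [12.5, 20]"
_SEGMENT_RE = re.compile(
    r"^\s*(?P<name>\S+)\s*\[\s*(?P<beg>\d+(?:\.\d+)?)\s*,\s*(?P<end>\d+(?:\.\d+)?)\s*\]\s*$"
)


def parse_segment(text: str) -> Segment:
    """Parse 'video_name [beg_time, end_time]' and check that the interval is ordered."""
    match = _SEGMENT_RE.match(text)
    if match is None:
        raise ValueError(f"not a segment: {text!r}")
    beg, end = float(match.group("beg")), float(match.group("end"))
    if end < beg:
        raise ValueError(f"end time before begin time: {text!r}")
    return Segment(match.group("name"), beg, end)


print(parse_segment("pizza_clip [12.5, 20]"))
```

Rejecting intervals whose end precedes their beginning is one simple form of the validation the list mentions.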


Ontology for Cooking and Craft

  • Need to capture:

    • Actions

    • Food – including the state and transformation

      or

    • Objects – paper, paper roll, …

    • Instruments: kitchen utensils, scissors, crayons

    • Location

    • Timing

    • (Recipes)


Ontology

  • Use of Protégé http://protege.stanford.edu/

    • ontology editor and knowledge-base framework.

  • Knowtator: Protégé plug-in for annotation

    • can be used for evaluating or

    • training a variety of NLP systems.

  • Write a plug-in that takes the output of a syntactic parser and connects it to visual frames


Protégé Knowledge Base

  • class:

    • represents the concepts of a domain

    • organized in a subsumption hierarchy

  • instance: corresponds to an individual of a class

  • slot: defines properties of a class or instance

  • facet: constrains the values that slots can have
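The four notions above (class, instance, slot, facet) can be mimicked in a few lines to show how they fit together. This is not Protégé's API. The names `Slot`, `FrameClass`, `Instance`, and `set_value`, and the sample values, are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Slot:
    name: str
    allowed_values: Optional[Set[str]] = None  # facet: restricts the legal values


@dataclass
class FrameClass:
    name: str
    parent: Optional["FrameClass"] = None      # link in the subsumption hierarchy
    slots: Dict[str, Slot] = field(default_factory=dict)

    def is_a(self, other: "FrameClass") -> bool:
        """True if this class is `other` or one of its descendants."""
        node: Optional[FrameClass] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def all_slots(self) -> Dict[str, Slot]:
        """Own slots plus slots inherited from ancestor classes."""
        inherited = self.parent.all_slots() if self.parent else {}
        return {**inherited, **self.slots}


@dataclass
class Instance:
    cls: FrameClass
    values: Dict[str, str] = field(default_factory=dict)

    def set_value(self, slot_name: str, value: str) -> None:
        """Assign a slot value, enforcing the slot's facet if it has one."""
        slot = self.cls.all_slots().get(slot_name)
        if slot is None:
            raise KeyError(f"{self.cls.name} has no slot {slot_name!r}")
        if slot.allowed_values is not None and value not in slot.allowed_values:
            raise ValueError(f"{value!r} is not allowed for slot {slot_name!r}")
        self.values[slot_name] = value


action_class = FrameClass(
    "Action",
    slots={"instrument": Slot("instrument", {"knife", "spoon", "scissors"})},
)
cut_class = FrameClass("Cut", parent=action_class)

cut_potatoes = Instance(cut_class)
cut_potatoes.set_value("instrument", "knife")
print(cut_class.is_a(action_class), cut_potatoes.values)
```

Here `Cut` inherits the `instrument` slot from `Action`, and the facet rejects any value outside the allowed set.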


Dependency Parser

Input Sentence: “Next we need to open the can of veggies”

ROOT [next-1]
( SBAR [next-1]
  ( next-1(Next)/IN
    S [need-6] (
      NP [we-3] (
        we-3/PRP
      )
      VP [need-6] (
        need-6/VBP
        S [to-8] (
          VP [to-8] (
            to-8/TO
            VP [open-10] (
              open-10/VB
              NP [can-14] (
                NP [can-14] (
                  the-12/DT
                  can-14/NN
                )
                PP [of-17] (
                  of-17/IN
                  NP [veggy-19] (
                    veggy-19(veggies)/NNS
                  )
                )


Action Concept and Relations with Other Concepts

Action

  • Verb

  • Object

  • Human Interaction

  • Instrument

  • Location

  • Time

Vn,t1,t2


Knowtator: Annotation Plug-in

  • General purpose annotation tool

  • Facilitates creation of training and evaluation corpora for language processing tasks

  • Ease of use

  • Straightforward to incorporate domain knowledge


Knowtator: An Example


Processes

  • Ontology Creation

  • Syntactic Parser

  • Ontology Annotation

  • Corpus enrichment using collocations
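To make the collocation step more concrete, the sketch below counts how often pairs of instrument and location words occur in the same sentence. It substitutes a tiny in-memory corpus for the web-scale counts mentioned earlier in the talk. The function name `cooccurrence_counts`, the vocabulary, and two of the three sentences are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def cooccurrence_counts(sentences: Iterable[str], vocabulary: Set[str]) -> Counter:
    """Count sentence-level co-occurrences of vocabulary words (pairs are sorted)."""
    counts: Counter = Counter()
    for sentence in sentences:
        # Lowercase and strip simple punctuation; no real tokenizer is used here.
        words = {w.strip(".,;:!?\"'").lower() for w in sentence.split()}
        present = sorted(words & vocabulary)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


sentences = [
    "Place a wooden spoon across the pot to keep it from boiling.",
    "Stir the chili with a spoon in the pot.",
    "Cut the potatoes with a knife.",
]
vocabulary = {"spoon", "pot", "knife", "pan"}

counts = cooccurrence_counts(sentences, vocabulary)
print(counts.most_common())
```

Pairs that co-occur often, such as spoon and pot in this toy corpus, are candidates for linking an instrument to a location when the transcript mentions only one of them.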


Related Research

  • Ontology and cooking

  • Parsing “restricted” languages

  • Connecting text with images


Related Research

  • Dina Demner-Fushman, Sameer Antani, Matthew Simpson, George R. Thoma “Annotation and retrieval of clinically relevant images”, 2009

  • Ricardo Ribeiro, Fernando Batista, Joana Paulo Pardal, Nuno J. Mamede, and H. Sofia Pinto “Cooking an Ontology?”, 2008

  • Fernando Batista, Joana Paulo, Nuno Mamede, Paula Vaz, Ricardo Ribeiro “Ontology construction: cooking domain”, 2006

  • Joana Paulo Pardal, “Dynamic Use of Ontologies in Dialogue Systems”, 2009


Related Research

  • Mutsuo Sano, Ichiro Ide, Kenzaburo Miyawaki “Overview of the ACM Multimedia 2009 Workshop on Multimedia for Cooking and Eating Activities (CEA’09)”

  • Keigo Kitamura, Toshihiko Yamasaki, Kiyoharu Aizawa “FoodLog: Capture, Analysis and Retrieval of Personal Food Images via Web”, 2009

    • distinguishes food images from other images

  • Dan Tasse and Noah Smith (CMU) “SOUR CREAM: Toward Semantic Processing of Recipes”, 2008

    • new techniques for semantic parsing by focusing on the domain of cooking recipes

    • first order logic

