Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Virach Sornlertlamvanich

and Thatsanee Charoenporn

virach@gmail.com, thatsanee.charoenporn@nectec.or.th

National Electronics and Computer Technology Center, Thailand

PNC 2012 Annual Conference and Joint Meetings, UC Berkeley, US., December 7-9, 2012

Motivation
  • Cultural Knowledge Creation
  • Image and object labeling
    • Keyword and semantic relation extraction
    • Image as a focal point
  • Cultural Knowledge Services
    • Service platform

3 Steps in Digital Cultural Communication

Step 1: Cultural knowledge curation

  • Reuse
  • Standardization

Step 2: Cultural image annotation

  • Keyword extraction
  • Semantic relation acquisition
  • Image annotation games

Step 3: Cultural knowledge service

  • Cultural knowledge platform for application service development

Cultural Knowledge Curation


Community Co-Creation Cultural Knowledge Base

Institution

Curation and Presentation

Cultural knowledge curation

Standardized Annotated

Cultural Knowledge Base

  • Search
  • Category
  • Statistics
  • Provincial Cultural Knowledge Base Service
  • Search
    • Text
    • Filter
    • Similarity (color, structure, role, mood, image)
  • Presentation
    • Location, category
    • Statistics
    • Relation

Community

  • Citation
  • Museum
  • Museum archive
  • Other departments
  • Community Co-Creation
  • Input
  • GPS data
  • Tag
  • Invitation, registration, approval

Audience

Cultural Knowledge Portal Creation
  • Scope of Collection
    • Way of Life
      • Ethnic
      • Religion and Belief
      • Tradition and Rite
      • Language and Literature
      • Local Wisdom
      • Performing Art and Music
    • Cultural Personnel/Organization
      • Artist
      • Scholar
      • Religious Monument
      • Writer/Author
      • Society/Association
      • Cultural Network
      • Cultural Unit
    • Cultural Site
      • Archaeological Site
      • Historical Park
      • Historical Site
      • Architecture
      • Religious Place
      • Museum
      • Library
      • Archive
      • Monument
      • Theatre
      • Tourism spot
    • Cultural Artifact
      • Archaeological Objects
      • Artwork
      • Visual Art
      • Book/Press
      • Audiovisual Media
      • Utensil
      • Costume


Cultural Databank

http://www.m-culture.in.th/

Cultural Image Annotation

Keyword Extraction
  • Some keywords are readily available in the set of tags, but many are still missing.
  • Our task is to extract the missing keywords from the description and title.

Keyword Extraction
  • Some keywords can be linked to external pages, e.g. Wikipedia.
  • Our task is to find appropriate articles corresponding to those keywords.

Method for KW Extraction
  • Chunking model (Uchimoto et al., 2004) for keyword extraction

Training Data Preparation
  • Generate a keyword list from the tags and titles, keeping only entries between 5 and 30 characters long
  • Segment the descriptions using a state-of-the-art Thai word segmentation algorithm (Kruengkrai et al., 2009)
  • Note that the word segmentation algorithm was trained on the ORCHID corpus and TCL’s lexicon; the content of the ORCHID corpus is quite different from our current data
  • Label the segmented descriptions with the keyword list
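
The keyword-list step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_keyword_list` and the sample tags and titles are assumptions, and the slides do not say how length is measured, so the sketch simply counts Unicode code points.

```python
def build_keyword_list(tags, titles, min_len=5, max_len=30):
    """Collect unique tag/title strings whose length lies in [min_len, max_len].

    Length is counted in Unicode code points (an assumption; the slides
    only say "characters").
    """
    candidates = {s.strip() for s in list(tags) + list(titles)}
    return sorted(k for k in candidates if min_len <= len(k) <= max_len)


# Hypothetical sample input, reusing strings that appear in the slides.
tags = ["วัด", "ผ้าซิ่น", "ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว"]
titles = ["วัดทุ่ง", "ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว"]

keywords = build_keyword_list(tags, titles)
for k in keywords:
    print(len(k), k)
```

The resulting list is what the labeling step would match against the segmented descriptions.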

Training Data
  • Description

ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้วเป็นงานฝีมือพื้นบ้าน ….. (roughly: "The mudmee-patterned pha sin of Ban Pathum Kaeo is a folk handicraft …")

Labeling
  • Apply BIO tagging
    • B: beginning position of a keyword
    • I: intermediate (or end) position of a keyword
    • O: other words
  • If several matches are possible, select the longest one (as in the previous example)
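
A minimal sketch of BIO labeling with the longest-match rule, assuming the description is already segmented into words and that a keyword matches when the concatenation of consecutive words equals it. The function name `bio_label` is hypothetical; the sample words and keywords are taken from the Training Data slide.

```python
def bio_label(words, keywords):
    """Label segmented words with B-K / I-K / O, preferring the longest keyword."""
    keyword_set = set(keywords)
    labels = ["O"] * len(words)
    i = 0
    while i < len(words):
        best_end = None
        # Try every span starting at i and remember the longest one
        # whose concatenation is a known keyword.
        for j in range(i + 1, len(words) + 1):
            if "".join(words[i:j]) in keyword_set:
                best_end = j
        if best_end is None:
            i += 1
            continue
        labels[i] = "B-K"
        for k in range(i + 1, best_end):
            labels[k] = "I-K"
        i = best_end
    return labels


words = ["ผ้าซิ่น", "ลายมัดหมี่บ้านปทุมแก้ว", " ", "เป็น", "งานฝีมือพื้นบ้าน"]
keywords = ["ผ้า", "ผ้าซิ่น", "ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว"]
print(bio_label(words, keywords))
```

Both ผ้าซิ่น and the longer ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว match at the first word; the longer one wins, so the second word receives I-K rather than O.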

Training Data
  • Description

ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้วเป็นงานฝีมือพื้นบ้าน …..

  • Segmented/Tagged/Labeled Description

Word | POS tag | Label
ผ้าซิ่น | N | B-K
ลายมัดหมี่บ้านปทุมแก้ว | N | I-K
<space> | P | O
เป็น | V | O
งานฝีมือพื้นบ้าน | N | O
…… | ….. | …..

  • Keyword List (extracted from tag and title)
  • …..
  • ผ้า
  • …..
  • ผ้าซิ่น
  • ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว
  • …..
  • …..

Chunking Model

Preliminary Experiment Result
  • 3,000 examples for training, 500 examples for testing
  • Trained with the Margin Infused Relaxed Algorithm (MIRA; Crammer et al., 2005)
    • Baseline features (unigram and bigram) +
    • 3-character prefix/suffix of the current word +
    • 3 consecutive POS tags
  • Recall = 0.8256, Precision = 0.9061, F1 = 0.8640
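
The slide lists the feature types but not the exact templates, so the following is only one plausible reading: `token_features`, the `<PAD>` marker, and the feature-string format are assumptions made for illustration. The second function checks that the reported F1 is the harmonic mean of the reported precision and recall.

```python
def token_features(words, pos_tags, i):
    """Feature strings for position i (one hypothetical reading of the slide)."""
    def at(seq, k):
        # Out-of-range positions are padded so every token has the same templates.
        return seq[k] if 0 <= k < len(seq) else "<PAD>"

    w = words[i]
    return [
        f"uni={w}",                          # unigram: current word
        f"bi={at(words, i - 1)}|{w}",        # bigram: previous word + current word
        f"pre3={w[:3]}",                     # 3-character prefix
        f"suf3={w[-3:]}",                    # 3-character suffix
        f"pos3={at(pos_tags, i - 1)}|{pos_tags[i]}|{at(pos_tags, i + 1)}",
    ]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(token_features(["เป็น", "งานฝีมือพื้นบ้าน"], ["V", "N"], 1))
print(round(f1_score(0.9061, 0.8256), 4))
```

The features would be fed to whatever online learner is used (MIRA in the slides); the learner itself is not sketched here.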

Semantic Relation Acquisition
  • Extract common syntactic patterns between two nouns
  • Our task is to acquire triples (e_i, r_ij, e_j), where
    • e_i and e_j are entities (keywords)
    • r_ij is the relationship between them

Extract Common Syntactic Pattern of a Predicate between Two Keywords

Predicate

Anchored keyword

  • Example

Title: วัดทุ่ง Description: วัดทุ่งมีอายุราว500ปีสันนิฐานว่าสร้างขึ้นในสมัยกรุงสุโขทัย

Title: วัดตราชู Description: วัดตราชูสร้างขึ้นในสมัยกรุงศรีอยุธยาตอนต้นราวพ.ศ.2076

Title: หลวงพ่อขาว Description: เป็นพระพุทธรูปเก่าแก่เนื้อหินทรายปางสมาธิขนาดหน้าตักกว้าง๒ศอกประดิษฐานอยู่ในวิหารวัดหลวงวัดสันนิฐานว่าสร้างขึ้นในสมัยอยุธยา

Title: พระพุทธรูปปางมารวิชัย Description: สร้างขึ้นในสมัยรัตนโกสินทร์ตอนต้น

Title: วิหารวัดโยธานิมิต Description: สร้างขึ้นในสมัยพระบาทสมเด็จพระเจ้าตากสินมหาราช

Extract Common Syntactic Pattern of a Predicate between Two Keywords
  • Example

(วัดทุ่ง, สร้างขึ้นในสมัย, กรุงสุโขทัย)

(วัดตราชู, สร้างขึ้นในสมัย, กรุงศรีอยุธยาตอนต้น)

(หลวงพ่อขาว, สร้างขึ้นในสมัย, อยุธยา)

(พระพุทธรูปปางมารวิชัย, สร้างขึ้นในสมัย, รัตนโกสินทร์ตอนต้น)

(วิหารวัดโยธานิมิต, สร้างขึ้นในสมัย, พระบาทสมเด็จพระเจ้าตากสินมหาราช)

  • (e_i, BUILT_IN, e_j)
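
A rough sketch of how such triples could be pulled out once a predicate is known. The predicate สร้างขึ้นในสมัย means roughly "built in the period of". Two assumptions are made here that the slides only imply: the record title serves as e_i (as it does in every example triple), and e_j is the longest known keyword that starts immediately after the predicate. The function name `extract_triple` and the small keyword list are hypothetical.

```python
def extract_triple(title, description, predicate, keywords):
    """Return (e_i, predicate, e_j), or None if no keyword follows the predicate."""
    pos = description.find(predicate)
    if pos < 0:
        return None
    rest = description[pos + len(predicate):]
    # Anchor e_j: the longest known keyword that starts right after the predicate.
    matches = [k for k in keywords if rest.startswith(k)]
    if not matches:
        return None
    return (title, predicate, max(matches, key=len))


BUILT_IN = "สร้างขึ้นในสมัย"
# Hypothetical keyword list; the shorter กรุงศรีอยุธยา is included only to show
# that the longest match is preferred.
keywords = ["กรุงสุโขทัย", "กรุงศรีอยุธยา", "กรุงศรีอยุธยาตอนต้น"]
records = [
    ("วัดทุ่ง", "วัดทุ่งมีอายุราว500ปีสันนิฐานว่าสร้างขึ้นในสมัยกรุงสุโขทัย"),
    ("วัดตราชู", "วัดตราชูสร้างขึ้นในสมัยกรุงศรีอยุธยาตอนต้นราวพ.ศ.2076"),
]

triples = [extract_triple(t, d, BUILT_IN, keywords) for t, d in records]
for triple in triples:
    print(triple)
```

Anchoring on the keyword list is what stops e_j from swallowing trailing text such as the year in the second description.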

Extract Common Syntactic Pattern of a Predicate between Two Keywords

Predicate

Anchored keyword

  • Example

Title: กระโจมไฟบ้านโรงถ่าน Description: สร้างโดยอพท. เมื่อปี พ.ศ.2550 เป็นท่าเทียบเรีอสำหรับเรือท่องเที่ยว

Title: ศาลเจ้าตากสินวัดบ้านค่าย Description: ศาลปูนขนาดกลางสร้างโดยพระครูพิพัฒน์ชยาภรณ์

Title: วัดทุ่งโฮ้งใต้ Description: สร้างขึ้นเมื่อ พ.ศ.2370จากตำนานเล่าว่าสร้างโดยกลุ่มชาวลาวพวน

Title: ศาลพระพรหม Description: ตั้งอยู่บริเวณสวนตุงโคมตำบลเวียงอำเภอเมืองเชียงรายจัดสร้างโดยเทศบาลนครเชียงราย

Title: วงเวียนนิมิตร Description: วงเวียนนิมิตรหรือวงเวียนม้าน้ำก่อสร้างโดยเทศบาลนครภูเก็ตในปีพ.ศ.2548

Extract Common Syntactic Pattern of a Predicate between Two Keywords
  • Example

(กระโจมไฟบ้านโรงถ่าน, สร้างโดย, อพท.)

(ศาลเจ้าตากสินวัดบ้านค่าย, สร้างโดย, พระครูพิพัฒน์ชยาภรณ์)

(วัดทุ่งโฮ้งใต้, สร้างโดย, กลุ่มชาวลาวพวน)

(ศาลพระพรหม, สร้างโดย, เทศบาลนครเชียงราย)

(วงเวียนนิมิตร, สร้างโดย, เทศบาลนครภูเก็ต)

  • (e_i, BUILT_BY, e_j)
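
The slides do not say how a pattern is judged "common". One simple possibility, shown here purely as an assumption, is to count the words that immediately precede each anchored keyword and keep the most frequent sequence. The token lists below are hand-segmented fragments of the example descriptions (a real segmenter may split them differently), and `common_predicates` is a hypothetical helper.

```python
from collections import Counter


def common_predicates(segmented, anchors, n=2):
    """Count the n tokens that immediately precede each anchored keyword."""
    counts = Counter()
    for tokens, anchor in zip(segmented, anchors):
        if anchor not in tokens:
            continue
        idx = tokens.index(anchor)
        if idx >= n:
            counts["".join(tokens[idx - n:idx])] += 1
    return counts


# Hand-segmented fragments (hypothetical segmentation) and their anchored keywords.
segmented = [
    ["สร้าง", "โดย", "อพท."],
    ["ศาลปูน", "ขนาดกลาง", "สร้าง", "โดย", "พระครูพิพัฒน์ชยาภรณ์"],
    ["เล่า", "ว่า", "สร้าง", "โดย", "กลุ่มชาวลาวพวน"],
    ["จัด", "สร้าง", "โดย", "เทศบาลนครเชียงราย"],
    ["ก่อสร้าง", "โดย", "เทศบาลนครภูเก็ต"],
]
anchors = [tokens[-1] for tokens in segmented]

counts = common_predicates(segmented, anchors)
print(counts.most_common())
```

Under this segmentation สร้างโดย ("built by") is counted four times and ก่อสร้างโดย once, which suggests why near-variants may need to be normalized to a single relation such as BUILT_BY.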

ESP Game and Peekaboom, proposed by Luis von Ahn (descriptions by Pete Cashmore, May 25, 2006)
  • ESP Game – In the ESP Game, the two players are shown an image and asked to enter a word that describes it. The players can’t see each other’s guesses. The aim is to enter the same word as your partner in the shortest possible time. But there’s an ulterior motive here: much of the data is recorded, and could be used to power image search engines in the future. What’s cheaper – paying thousands of Mechanical Turkers to label all the images on the web, or tricking people into doing it for free?
  • Peekaboom – Peekaboom takes the ESP Game to the next level. Unlike the ESP Game, it’s asymmetrical. To start, one user is shown an image and the other sees an empty black space. The first user is given a word relating to the image, and the aim is to communicate that word to the other player by revealing portions of the image. So if the word is “eye” and the image is a face, you reveal the eye to your partner. But the real aim here is to build a better image search engine: one that could identify individual items within an image.

ESP Game
  • Two players are shown an image and asked to enter a word that describes it.
  • The players cannot see each other's guesses.
  • The aim is to enter the same word as your partner in the shortest possible time.
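
The matching rule can be sketched as below. This is an illustrative model only: the function name `first_agreement`, the timestamped guess format, and the assignment of the example words to the two players are all assumptions.

```python
def first_agreement(guesses_a, guesses_b):
    """Return (word, time) for the earliest moment both players have entered
    the same word, or None. Each guess list holds (seconds_elapsed, word) pairs."""
    seen_a, seen_b = {}, {}
    events = sorted(
        [(t, "a", w.strip().lower()) for t, w in guesses_a]
        + [(t, "b", w.strip().lower()) for t, w in guesses_b]
    )
    for t, player, word in events:
        mine, theirs = (seen_a, seen_b) if player == "a" else (seen_b, seen_a)
        mine.setdefault(word, t)
        # Agreement happens as soon as a word typed by one player
        # has already been typed by the other.
        if word in theirs:
            return word, t
    return None


# Hypothetical round: which player typed which word is made up for illustration.
player_a = [(3, "Twitter"), (8, "Bird")]
player_b = [(4, "Angry bird"), (6, "Mohawk"), (11, "Bird")]
print(first_agreement(player_a, player_b))  # ('bird', 11)
```

The agreed word becomes a candidate name for the image, and the elapsed time can feed the score.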

[Figure: example round with the guesses Twitter, Bird, Angry bird, and Mohawk. Purpose: to name the image.]

Peekaboom

  • One user is shown a named image and reveals the part of the image that corresponds to the name.
  • The other user enters a word relating to the image.
  • The aim is to enter the word the image is named with in the shortest possible time.

[Figure: example guesses (Squirrel, Flying fish, Bird) for an image named "Bird". Purpose: to label the object in the image.]

Extended Peekaboom

  • One user is shown a named image and reveals the part of the image that corresponds to the name.
  • The other user enters a word relating to the image.
  • The aim is to enter the word the image is named with in the shortest possible time.
  • Any word from the same synset counts as a match.
  • Once a synset is selected, cross-language matches can be determined.
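
A minimal sketch of synset-based matching, assuming a guess counts as correct when it equals the target word or shares a synset with it. The tiny `TOY_SYNSETS` table and its identifiers are invented for illustration and are not AWN data; a real system would query the wordnet instead.

```python
def synset_ids(word, synsets):
    """Return the ids of all synsets that contain the word."""
    w = word.strip().lower()
    return {sid for sid, members in synsets.items() if w in members}


def words_match(guess, target, synsets):
    """True on an exact match, or when both words share at least one synset."""
    if guess.strip().lower() == target.strip().lower():
        return True
    return bool(synset_ids(guess, synsets) & synset_ids(target, synsets))


# Toy stand-in for a multilingual wordnet: each synset groups English and Thai words.
TOY_SYNSETS = {
    "bird.n.01": {"bird", "นก"},
    "squirrel.n.01": {"squirrel", "กระรอก"},
}

print(words_match("นก", "Bird", TOY_SYNSETS))        # True: same synset, different language
print(words_match("Squirrel", "Bird", TOY_SYNSETS))  # False: different synsets
```

Because words from several languages sit in the same synset, a Thai guess can match an English label without any separate translation step.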

[Figure: example guesses (Squirrel, Flying fish, Bird) matched against the name "Bird" through AWN.]

Demo
  • ESP-like game
    • http://m-culture.in.th/game/esp_game
    • Play mode
      • Single player mode: play against history
      • Two-player mode: guess to match each other
  • Extended Peekaboom game
    • http://m-culture.in.th/game/peekaboom
    • Play mode
      • Single player mode: play against history
      • Two-player mode: guess to match each other
    • For Thai, AWN (Asian WordNet) is used to match synonyms, hypernyms, hyponyms, meronyms, and holonyms
    • For other languages, AWN is used to match synonyms only

Preliminary Experiment
  • 18 images were played by 19 people. For each image, players had 60 seconds to guess a proper word.
  • AWN expanded the matching in 67 cases, a 22% increase in the matching ratio.

Cultural Knowledge Service


[Diagram: four databases (Cultural Database, Product Database, Shop Database, Maker Database). Each holds records made of a title, a snippet description, and tags (A, B, C). The diagram also carries the standalone labels A, B, C, and D.]


To find a related Product from Culture information

[Diagram: culture records and product records, each with a title, a snippet description, and tags (A, B, C).]


To find the background Culture information from a Product

[Diagram: product records and a culture record, each with a title, a snippet description, and tags (A, B, C).]


Product and Culture information relation

[Diagram: interleaved culture and product records, each with a title, a snippet description, and tags (A, B, C).]

Summary
  • From this ESP-like game, we successfully named the images or at least obtained a list of candidates for labeling the object in the image to be used in the next extended Peekaboom game.
  • Synonyms, hypernyms, hyponyms, meronyms, and holonyms from AWN help increase the matching ratio.
  • Cross-language image labeling is realized through AWN synonyms.

Future Work
  • Enhance keyword extraction to find more candidate terms for image matching
  • Call for participation in the extended ESP and Peekaboom games for image labeling
