Nonnegative shared subspace learning and its application to social media retrieval

Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval

Presenter: Andy Lim


Paper Topic

  • Folksonomy

  • Social media sharing platforms


The Problem

  • Rise in popularity of social image and video sharing platforms

  • Precision of tag-based media retrieval

  • Tags are

    • Noisy

    • Ambiguous

    • Incomplete

    • Subjective

  • Lack of constraints

    • Free-text tags (e.g. “djfja;sldfkj”)

Example tags: hotdog, chinese, trololol, aidjishi, sandwich, bread


Previous Research (Internal)

  • Improving tag relevance

  • Sigurbjörnsson and van Zwol

    • Developed a method of recommending a set of relevant tags based on tag popularity

  • Li et al.

    • List all images for a given tag and determine tag relevance from visual similarity

  • All are confined to noisy tags within the primary dataset


The Approach

  • Internal vs. External

  • Leverage external auxiliary sources of information to improve the target tagging system, which is presumably much noisier than the auxiliary source

  • Exploit disparate characteristics of target domain using auxiliary source

  • Note: What is the optimal level of joint modeling such that the target domain still benefits from the auxiliary source?


Assumptions

  • There is a common underlying subspace shared by the primary and secondary domains

  • The primary domain is much noisier than the secondary domain


Nonnegative Matrix Factorization

  • X (M x N data matrix): N documents, each represented over M vocabulary words

  • F (M x R nonnegative matrix) represents R basis vectors

  • H (R x N nonnegative matrix) contains coordinates of each document
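The three matrices above fit the standard NMF model X ≈ FH. A minimal sketch of fitting it with multiplicative updates follows, assuming a squared-error objective; the function name `nmf`, the iteration count, and the toy matrix are illustrative and not taken from the slides.

```python
import numpy as np

def nmf(X, R, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative M x N matrix X as F @ H with F (M x R), H (R x N)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    F = rng.random((M, R)) + eps   # R basis vectors, one per column
    H = rng.random((R, N)) + eps   # coordinates of each document
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (F.T @ X) / (F.T @ F @ H + eps)
        F *= (X @ H.T) / (F @ (H @ H.T) + eps)
    return F, H

# Toy word-by-document count matrix: 4 vocabulary words, 5 documents.
X = np.array([[2, 1, 0, 0, 1],
              [1, 2, 0, 0, 1],
              [0, 0, 3, 1, 0],
              [0, 0, 1, 2, 0]], dtype=float)
F, H = nmf(X, R=2)
print(np.round(F @ H, 2))  # rank-2 nonnegative approximation of X
```

With R = 2 the two basis vectors would be expected to separate the two groups of co-occurring words, though the exact factors depend on the random initialization.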


Joint Shared Nonnegative Matrix Factorization (JSNMF)

  • Input:

    • X (target domain), Y (auxiliary domain), R1 and R2 (dimensionalities of the underlying subspaces of X and Y), K (number of shared basis vectors)

  • Output:

    • W (joint shared subspace), U (remaining subspace in target domain), V (remaining subspace in auxiliary domain), H (coordinate matrix for target domain), L (coordinate matrix for auxiliary domain)
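One way to read these inputs and outputs is X ≈ [W | U] H and Y ≈ [W | V] L, where W holds the K shared basis vectors. The sketch below works under that assumption, with an unweighted squared-error objective and multiplicative updates derived from it; it also assumes X and Y are built over the same M-word vocabulary. The slides do not give the update rules, so this is an illustration rather than the paper's exact algorithm.

```python
import numpy as np

def jsnmf(X, Y, R1, R2, K, n_iter=300, eps=1e-9, seed=0):
    """Sketch: X ~ [W|U] @ H and Y ~ [W|V] @ L, with K shared columns in W."""
    rng = np.random.default_rng(seed)
    M = X.shape[0]
    W = rng.random((M, K)) + eps        # shared subspace
    U = rng.random((M, R1 - K)) + eps   # target-only subspace
    V = rng.random((M, R2 - K)) + eps   # auxiliary-only subspace
    H = rng.random((R1, X.shape[1])) + eps
    L = rng.random((R2, Y.shape[1])) + eps
    for _ in range(n_iter):
        # The shared basis W is fitted against both domains at once.
        Xh, Yh = np.hstack([W, U]) @ H, np.hstack([W, V]) @ L
        W *= (X @ H[:K].T + Y @ L[:K].T) / (Xh @ H[:K].T + Yh @ L[:K].T + eps)
        # Domain-specific bases only see their own data.
        Xh = np.hstack([W, U]) @ H
        U *= (X @ H[K:].T) / (Xh @ H[K:].T + eps)
        Yh = np.hstack([W, V]) @ L
        V *= (Y @ L[K:].T) / (Yh @ L[K:].T + eps)
        # Coordinates are updated with the bases held fixed.
        F, G = np.hstack([W, U]), np.hstack([W, V])
        H *= (F.T @ X) / (F.T @ F @ H + eps)
        L *= (G.T @ Y) / (G.T @ G @ L + eps)
    return W, U, V, H, L

rng = np.random.default_rng(1)
X = rng.random((6, 3)) @ rng.random((3, 8))    # toy target-domain matrix
Y = rng.random((6, 3)) @ rng.random((3, 10))   # toy auxiliary-domain matrix
W, U, V, H, L = jsnmf(X, Y, R1=3, R2=3, K=2)
print(W.shape, U.shape, V.shape, H.shape, L.shape)
```

In this sketch, K = 0 leaves the two factorizations independent, while K = R1 = R2 forces every basis vector to serve both domains.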


Retrieval using JSNMF

  • Input: W, U, H, query sentence SQ, number of images (or videos) to be retrieved N and image (or video) dataset

  • Output: Return top N retrieved images (or videos)
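The slides list only the inputs and the output of the retrieval step. A plausible sketch, under the assumption that the query is projected onto the target basis [W | U] and items are ranked by cosine similarity to the columns of H, is shown below. The function `retrieve`, the bag-of-words treatment of the query sentence, and the toy matrices are all hypothetical.

```python
import numpy as np

def retrieve(W, U, H, query_terms, vocab, top_n, n_iter=200, eps=1e-9):
    """Rank target-domain items against a tag query in the learned subspace."""
    F = np.hstack([W, U])                        # target-domain basis, M x R1
    # Bag-of-words vector for the query over the shared vocabulary.
    q = np.array([1.0 if t in query_terms else 0.0 for t in vocab])
    h = np.full(F.shape[1], 1.0 / F.shape[1])    # nonnegative query coordinates
    for _ in range(n_iter):
        # Same multiplicative rule as NMF, with the basis F held fixed.
        h *= (F.T @ q) / (F.T @ F @ h + eps)
    # Cosine similarity between the query and every item's coordinates.
    sims = (h @ H) / (np.linalg.norm(h) * np.linalg.norm(H, axis=0) + eps)
    return np.argsort(-sims)[:top_n]

vocab = ["water", "river", "street", "bus"]
W = np.array([[1.0], [1.0], [0.0], [0.0]])   # toy shared basis: water/river
U = np.array([[0.0], [0.0], [1.0], [1.0]])   # toy target-only basis: street/bus
H = np.array([[0.9, 0.1, 0.5],
              [0.1, 0.9, 0.5]])              # coordinates of 3 items
top = retrieve(W, U, H, {"river"}, vocab, top_n=2)
print(top)
```

Because the ranking happens in the subspace, an item can match the query "river" even if it was only tagged "water", provided both tags load on the same basis vector.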


Experiment

  • Use LabelMe tags (auxiliary) to improve

    • Image retrieval in Flickr

    • Video retrieval in YouTube

  • Why LabelMe?

    • Object image tagging

    • Controlled vocabulary


Flickr Dataset

  • Downloaded 50,000 images from Flickr

  • Average number of distinct tags = 8

  • Removed

    • Rare tags (appearing fewer than 5 times)

    • Images with no tags or non-English tags

  • Obtained 20,000 labeled images

  • Kept 7,000 examples to be used as an internal auxiliary dataset
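The cleanup described above can be sketched as a two-pass filter: count how many items use each tag, then drop rare tags and any item left without tags. The helper `clean_tags` and the toy data are hypothetical, and non-English tag detection is deliberately left out because the slides do not say how it was done.

```python
from collections import Counter

def clean_tags(items, min_count):
    """Drop tags used by fewer than min_count items, then drop untagged items."""
    # Count each tag once per item so repeated tags on one item do not inflate it.
    counts = Counter(tag for tags in items.values() for tag in set(tags))
    cleaned = {}
    for item_id, tags in items.items():
        kept = [t for t in tags if counts[t] >= min_count]
        if kept:  # items with no surviving tags are removed
            cleaned[item_id] = kept
    return cleaned

items = {
    "img1": ["water", "river"],
    "img2": ["water", "djfja"],
    "img3": ["trololol"],
    "img4": [],
}
cleaned = clean_tags(items, min_count=2)
print(cleaned)
```

The same function would cover the YouTube and LabelMe cleanups by changing `min_count`.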


YouTube Dataset

  • Downloaded 18,000 videos’ metadata (tags, URL, category, title, comments, etc.)

  • Average number of distinct tags = 7

  • Removed

    • Rare tags (appearing fewer than 2 times)

    • Videos with no tags or non-English tags

  • Obtained dataset corresponding to 12,000 videos

  • Again, kept 7,000 examples to be used as an internal auxiliary dataset


LabelMe Dataset

  • Added 7,000 images with tags from LabelMe

  • Average number of distinct tags = 32

  • Removed

    • Rare tags (appearing fewer than 2 times)

  • Cleanup does not reduce the size of the dataset


Evaluation Measures

  • Defined query set Q

    • {cloud, man, street, water, road, leg, table, plant, girl, drawer, lamp, bed, cable, bus, pole, laptop, plate, kitchen, river, pool, flower}

  • Manually annotated the two datasets (Flickr and YouTube) with respect to the query set (no benchmark dataset available)

  • An image (or video) is considered relevant to a query term if the concept is clearly visible in it


Results with JSNMF

  • Precision-Scope Curve

  • Fix recall at 0.1

    • Users are usually only interested in first few results

  • 10% improvement
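A precision-scope curve plots precision over the top-s retrieved items as the scope s grows. A minimal sketch of computing it for one query follows; the function names and the toy ranking are illustrative, and the relevance set stands in for the manual annotations.

```python
def precision_at_scope(ranked_ids, relevant_ids, scope):
    """Fraction of the top-`scope` retrieved items that are relevant."""
    top = ranked_ids[:scope]
    if not top:
        return 0.0
    return sum(1 for i in top if i in relevant_ids) / len(top)

def precision_scope_curve(ranked_ids, relevant_ids, scopes):
    """Precision at each scope, ready to plot against `scopes`."""
    return [precision_at_scope(ranked_ids, relevant_ids, s) for s in scopes]

ranked = [3, 7, 1, 9, 4]   # item ids in retrieval order for one query
relevant = {3, 1, 4}       # ids judged relevant for that query
curve = precision_scope_curve(ranked, relevant, scopes=[1, 2, 5])
print(curve)  # [1.0, 0.5, 0.6]
```

Averaging these curves over every term in the query set Q would give one curve per method to compare.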


Results with JSNMF

  • Under-representation

    • Shares very few basis vectors

  • Over-representation

    • Forces many basis vectors to represent both datasets

  • Appropriate level of representation


Flickr Retrieval Results

  • Results are better with LabelMe

  • As recall increases, precision decreases

  • When K=0 (no sharing) or K=40 (full sharing), precision is lower than at K=15


YouTube Retrieval Results

  • Similar to Flickr Results


Extra Notes & Questions

  • Can be extended to multiple datasets (not just 2)

  • Can use generic model to apply to other data mining tasks

