Spatial Business Detection and Recognition from Images

Alexander Darino

Outline
  • Project Overview
Project Overview
  • Previous Work
  • Project Objective
  • Anticipated End Result
  • Project Pipeline
Previous Work: Where Am I?

[Diagram: Image → Where Am I? → Latitude, Longitude]

Project Objective
  • Given:
    • Image
    • Geolocation
  • Yield:
    • Spatial Identification of Businesses in Image
    • Addresses of Businesses in Image
    • Information about Businesses in Image
      • Ex. Reviews, Categories, Phone Number, etc.
Project Pipeline

[Pipeline diagram. Boxes: Image, Text Extraction, Detected Text; Latitude, Longitude, Geocoding / Reverse Geocoding, Nearby Businesses; Business Name Matching, Business Identification, Business Spatial Detection]

Business Searching
  • Business Search Services
    • Google
    • Yelp
    • CityGrid (Supplier for Yellow Pages, Super Pages)
  • REST-based API
  • Results in JSON or XML format
  • Aggregate Results into Facade
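
The aggregation step could be sketched roughly as follows. This is only an illustration of the facade idea, not the presenter's code: the provider functions are hypothetical stand-ins for the REST calls, and merging records by a normalized name is an assumption.

```python
from typing import Callable, Dict, List

# A provider is any callable taking (latitude, longitude) and returning a
# list of business records. Real providers would wrap REST requests; the
# two defined below are hypothetical stand-ins with canned data.
Provider = Callable[[float, float], List[dict]]


def normalize_name(name: str) -> str:
    """Lower-case and collapse whitespace so identical names compare equal."""
    return " ".join(name.lower().split())


class BusinessSearchFacade:
    """Queries every provider and merges the results into one list."""

    def __init__(self, providers: Dict[str, Provider]):
        self.providers = providers

    def nearby(self, lat: float, lon: float) -> List[dict]:
        merged: Dict[str, dict] = {}
        for source, search in self.providers.items():
            for record in search(lat, lon):
                key = normalize_name(record["name"])
                # The first spelling seen is kept; later sources are appended.
                entry = merged.setdefault(key, {"name": record["name"], "sources": []})
                entry["sources"].append(source)
        return list(merged.values())


def provider_a(lat, lon):  # hypothetical
    return [{"name": "Nicholas Coffee Co"}, {"name": "Starbucks Coffee"}]


def provider_b(lat, lon):  # hypothetical
    return [{"name": "nicholas coffee co"}, {"name": "Las Velas"}]


facade = BusinessSearchFacade({"a": provider_a, "b": provider_b})
results = facade.nearby(40.4411, -80.0028)
```

Because each service caps its result set, a business missing from one provider can still appear in the merged list through another.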
{'businesses': [{'address1': '466 Haight St',
                 'address2': '',
                 'address3': '',
                 'avg_rating': 4.0,
                 'categories': [{'category_filter': 'danceclubs',
                                 'name': 'Dance Clubs',
                                 'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=danceclubs'},
                                {'category_filter': 'lounges',
                                 'name': 'Lounges',
                                 'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=lounges'},
                                {'category_filter': 'tradamerican',
                                 'name': 'American (Traditional)',
                                 'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=tradamerican'}],
                 'city': 'San Francisco',
                 'distance': 1.8780401945114136,
                 'id': 'yyqwqfgn1ZmbQYNbl7s5sQ',
                 'is_closed': False,
                 'latitude': 37.772201000000003,
                 'longitude': -122.42992599999999,
                 'mobile_url': 'http://mobile.yelp.com/biz/yyqwqfgn1ZmbQYNbl7s5sQ',
                 'name': 'Nickies',
                 'nearby_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA',
                 'neighborhoods': [{'name': 'Hayes Valley',
                                    'url': 'http://yelp.com/search?find_loc=Hayes+Valley%
Business Searching: Results

40.441127247181797 -80.002821624487595

Denham & Company Salon

Ullrich's Shoe Repairing

Nicholas Coffee Co

Bella Sera On the Square

A & J Ribs

Starbucks Coffee

Jenny Lee Bakery

Galardi's 30 Minute Cleaners

Jimmy John's Gourmet Sandwiches

Charley's Grilled Subs

Fresh Corner

Lagondola Pizzeria & Restaurant

Camera Repair Service Inc

Pittsburgh Cigar Bar

Original Oyster House

MixStirs

1902 Tavern

Costanzo's

Pittsburgh Silver Llc

Graeme St

Galardi's 30 Minute Cleaners

Denham & Co Salon

Bruegger's Bagel Bakery

Nicholas Coffee Co

Market Square

Fat Tommy's Pizzeria

Mixstirs Cafe

Giggles

Rycon Construction Inc

Garbera, Dennis C, Dds - Emmert Dental Assoc

Bella Sera on the Square

Mancini's Bread Co

Las Velas

Ciao Baby

Washington Reprographics Inc

Highmark Life Insurance Co

Fischer, Donald R, Md - Highmark Life Insurance Co

Jimmy John's

Lynx Energy Partners Inc

Emmert Dental Assoc

Business Searching: Evaluation
  • Strengths
    • Aggregated results almost always found Business of interest
  • Weaknesses
    • Each API limits query result set size - this is why we aggregate
    • Only businesses listed
    • Not all businesses listed
  • Limitations
    • Dependent on well-populated, accurate Business Directories
    • Have only tested for 15 Pittsburgh images - unknown result quality for rural areas.
Extracting Identifying Text: OCR
  • Used Two OCR APIs:
    • GNU OCR (Ocrad)
    • GOCR
  • OCR APIs highly sensitive to:
    • Font (only works well with roman font)
    • Perspective
    • Scale
    • Binarization Threshold
    • Dark on Light vs. Light on Dark (inversion)
Extracting Identifying Text: OCR
  • OCR API evaluations
    • Ocrad - could not yield any meaningful data across over 200 scale/threshold/inversion combinations
    • GOCR - produced good results across 10 scales with and without inversion using threshold automatically determined by Otsu's method
  • 98% of Results are garbage!
  • Examples of GOCR output (next slides)
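
For reference, Otsu's method picks the binarization threshold that maximizes the between-class variance of the two pixel groups. The sketch below is a plain-Python illustration on a flat list of 8-bit intensities; the function names are hypothetical, and how the threshold was actually fed to GOCR is not shown in the slides.

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with intensity <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold, invert=False):
    """Map pixels to 0/1; invert=True handles light-on-dark signs."""
    return [1 if (p > threshold) != invert else 0 for p in pixels]


sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
```

Running `binarize` once with `invert=False` and once with `invert=True` corresponds to the "with and without inversion" runs described above.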
Extracting Identifying Text: OCR

n.c.......o.a...u..............oU..D.oa..e......_RuEGGE..KERy..J...w...........L........M.II.....c..

...i.......l..J.t...llt...lSHA.P.It..tllt.........._.l...Jy._.c_...._tt.._....t.._.r.........t.t_t.._.._.l..J.r.r.I.

Extracting Identifying Text: OCR

u..........._nq......eoR.E.l.e...í....e...n...

.n....n.e.R.E...e....o.

_....E.R.E.IKE........I.ltlO..

.......rE..o......E.....I.K.E.o.....

J.n....c...E.R.E.I.E......

.M..E.R.E...E...a

J...Gu.ge..geE.F

.._.....E..gE.D...

fUlI..lll.lll.IIi.l..Xl..

Extracting Identifying Text: OCR

..e_..w.._......D.........uJ.....J.................n......n..........n_..r.l_d..J.ec.m._..n.......J.n.._...tn..ct..._.................D.u.v...e.n....u..

Y.._w.n.n....Jn.......G..o..r..._........J...ml.t..l.tt.l.._w....................._....l....t........j..ilI.i..

Extracting Identifying Text: OCR

__.ncu_.l..._..._J...ne......._n._..v.....ra......d_..._.............i..n..UllREsT.unAN...r.c.....r...Tt.rJll......m...c.....n.......

...Jn.I..c...r.rESTAU.ANT.r.O....c.cc.

Note: Even though "Tambellini" is a roman font, it is too stretched to be picked up by GOCR

OCR Evaluation
  • Strengths
    • Applicable to expected input of orthogonal images
  • Weaknesses
    • Only works well(-ish) for strictly roman font
  • Limitations
    • Will perform poorly for artistic fonts and business signs
  • Conclusion
    • By itself, OCR is not the best approach towards Business identification
      • Reasons: poor recognition, franchises, perspective, etc
Business Name Matching
  • Given: Unreliable fragments of ‘detected text’
  • Yield: Matching Business Names
  • Process:
    • Filter input: trim fragments and discard useless ones (< 2 letters)
    • Fuzzy String Matching
    • Voting Scheme: confidence of business appearing in image
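
A minimal sketch of the filtering and fuzzy-matching steps, assuming the OCR output is split on non-letter characters and compared with Python's standard `difflib`. The slides do not name the fuzzy matcher actually used, so the helper names and the choice of `SequenceMatcher` are assumptions.

```python
import re
from difflib import SequenceMatcher


def clean_fragments(ocr_line):
    """Split raw OCR output into letter runs; drop runs shorter than 2 letters."""
    fragments = re.findall(r"[A-Za-z]+", ocr_line)
    return [f.upper() for f in fragments if len(f) >= 2]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(fragment, name_tokens):
    """Return (score, token) for the name token most similar to the fragment."""
    return max(((similarity(fragment, t), t) for t in name_tokens),
               key=lambda pair: pair[0])


fragments = clean_fragments("...Jn.I..c...r.rESTAU.ANT.r.O....c.cc.")
score, token = best_match("RESTAU", ["TAMBELLINI", "RESTAURANT"])
```

On the GOCR line shown earlier, this keeps fragments such as `RESTAU` and `ANT` and discards the single-letter noise.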
Business Name Matching
  • Developed Confidence Attribution Algorithm
    • Confidence of OCR Token being Name Token
      • Example: Confidence of “ESTUANT” representing “RESTAURANT”
      • Point-based system
    • Confidence of Name appearing in Image
      • Sum of points of matching OCR Text
      • Use logarithmically-normalized points to determine business inclusion threshold
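
The slides do not give the actual point values, so the following is only a hypothetical scheme that follows the same outline: each OCR fragment earns points against a name token, points are summed per business, and a log-normalized score is compared against an inclusion threshold. The `min_ratio` and `threshold` values and the weighting by token length are assumptions.

```python
import math
from difflib import SequenceMatcher


def token_points(fragment, name_token, min_ratio=0.6):
    """Points for one OCR fragment against one name token (assumed scheme)."""
    ratio = SequenceMatcher(None, fragment.upper(), name_token.upper()).ratio()
    # Longer tokens are worth more; weak matches earn nothing.
    return ratio * len(name_token) if ratio >= min_ratio else 0.0


def name_scores(fragments, business_names):
    """Sum, per business, the best points each fragment earns on its tokens."""
    scores = {}
    for name in business_names:
        tokens = name.split()
        scores[name] = sum(max(token_points(f, t) for t in tokens)
                           for f in fragments)
    return scores


def select_businesses(scores, threshold=0.5):
    """Keep names whose log-normalized score reaches the threshold."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return []
    return [name for name, s in scores.items()
            if math.log1p(s) / math.log1p(top) >= threshold]


scores = name_scores(["ESTUANT", "RUEGGE"],
                     ["Tambellini Restaurant", "Bruegger's Bagels", "Las Velas"])
selected = select_businesses(scores)
```

Taking the logarithm compresses the gap between a business matched by many fragments and one matched by a single strong fragment, so a single threshold can serve both.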
Business Name Matching

Note: This originally did not appear because it did not exceed the confidence threshold. It now appears because it contributes to the Business Name Identification

Business Spatial Identification

Aiken George S Co

Category: Food, Grocery

Address: 218 Forbes Ave Pittsburgh, PA 15222

Phone: (412) 391-6358

Rating: 4.5/5 (2 Reviews)

Business Spatial Identification

Bruegger's Bagels

Category: Bagels

Address: Market Sq, Pittsburgh, PA 15222

Phone: (412) 281-2515

Rating: Not Rated

Current Approach

[Pipeline diagram. Boxes: Image, OCR, Detected Text; Latitude, Longitude, Geocoding / Reverse Geocoding, Nearby Businesses; Business Name Matching, Business Identification, Business Spatial Detection]

Weaknesses to Current Approach

Fragmented Word Detection

Weaknesses to Current Approach

Fails with non-orthogonal perspective

Did I already mention lots of garbage?

Weaknesses to Current Approach

Fails with non-roman text

Not scale-invariant

Alternative #1: Image Matching

[Pipeline diagram. Boxes: Image; Latitude, Longitude, Geocoding / Reverse Geocoding, Nearby Businesses; Match to Storefront Image, Business Identification, Business Spatial Detection]

Alternative #1: Evaluation
  • Weaknesses:
    • Low Availability of Storefront Images (< 50% Avg)
      • George Aiken area businesses with photos: 18/35
      • Brueggers area businesses with photos: 22/40
      • Tambellini area businesses with photos: 8/22
    • Available Images too small (100 x 100)
    • Computationally Expensive
  • Conclusion: Not a viable solution
Alternative #2: Template Matching

[Figure: the name “Tambellini” shown eight times]
Alternative #2: Template Matching

[Pipeline diagram. Boxes: Latitude, Longitude, Geocoding / Reverse Geocoding, Nearby Businesses, Render Templates of Business Names in Different Fonts, Template Images; Image, Image Matching (e.g. SIFT, HAAR); Business Identification, Business Spatial Detection]
Alternative #2: Template Matching
  • OCR
    • Not Scale Invariant
    • Unbounded Search
    • Fragmented Recognition
    • Roman-only font
  • Alternative #2
    • Scale Invariant
    • Bounded Search
    • Whole-word recognition
    • All fonts
Alternative #3: Scene Text Recognition
  • State of the Art:
    • STR ≠ OCR
    • Far superior to our ‘naïve’ approaches to STR (i.e. OCR, image matching, SIFT)
  • OCR only works for highly controlled environments
  • STR works for unconditioned environments
    • Scale invariant
    • Color/intensity invariant
    • Lexicon-Assisted
Alternative: Scene Text Recognition
  • No STR implementations readily available
  • Contacted several groups specializing in STR; none were able to provide an implementation for research purposes
  • Had to resort to implementing STR from scratch
STR Implementation
  • STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes”

Multiresolution-based potential characters detection

Character/layout geometry and color properties analysis

Refined Detection

Local affine rectification

Multiresolution-based potential characters detection
  • Laplacian-of-Gaussian Edge Detection
  • Dice image/edges into Patches
    • Combine patches with similar properties into regions
    • Obtain bounding box of region as candidate text
    • Properties include:
      • Mean
      • Variance
      • Intensity
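
The patch-merging idea can be illustrated with a small plain-Python sketch. It assumes a 2D grid of edge magnitudes, square non-overlapping patches, and a simple tolerance test on mean and variance; the parameter names (`size`, `min_mean`, `tol`) and the rule of skipping flat patches are assumptions, not details taken from the slides.

```python
def patch_stats(grid, top, left, size):
    """Mean and variance of one size x size patch."""
    values = [grid[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def candidate_boxes(grid, size=2, min_mean=1.0, tol=50.0):
    """Merge adjacent similar patches; return (top, left, bottom, right) boxes."""
    rows, cols = len(grid) // size, len(grid[0]) // size
    stats = {}
    for pr in range(rows):
        for pc in range(cols):
            mean, var = patch_stats(grid, pr * size, pc * size, size)
            if mean >= min_mean:  # skip flat patches (assumed rule)
                stats[(pr, pc)] = (mean, var)

    parent = {p: p for p in stats}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (pr, pc), (mean, var) in stats.items():
        for nb in ((pr + 1, pc), (pr, pc + 1)):
            if nb in stats:
                nmean, nvar = stats[nb]
                if abs(mean - nmean) <= tol and abs(var - nvar) <= tol ** 2:
                    parent[find(nb)] = find((pr, pc))

    boxes = {}
    for (pr, pc) in stats:
        root = find((pr, pc))
        top, left = pr * size, pc * size
        bottom, right = top + size - 1, left + size - 1
        if root in boxes:
            t, l, b, r = boxes[root]
            boxes[root] = (min(t, top), min(l, left), max(b, bottom), max(r, right))
        else:
            boxes[root] = (top, left, bottom, right)
    return sorted(boxes.values())


grid = [
    [9, 9, 0, 0, 0, 0, 9, 9],
    [9, 9, 0, 0, 0, 0, 9, 9],
    [9, 9, 0, 0, 0, 0, 0, 0],
    [9, 9, 0, 0, 0, 0, 0, 0],
]
boxes = candidate_boxes(grid)
```

Each returned box is a candidate text region in pixel coordinates.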
Problems with Current Approach
  • Too much “bleeding”
  • Unstable edge-data due to unpredictability of location of edge patch relative to edge itself
New Approach
  • Each edge pixel gets an N x N edge patch (eg. 3x3)
  • Edge patches overlap
    • Tighter bounding boxes
    • More region consistency
    • More robust to resolution changes
    • Able to use tighter thresholds
Text Detection Problem #1

How do I know that two regions are close enough together that they might be part of the same character?

  • Center of bounding box?
  • Moment of regions?
  • Nearest Neighbor?
  • Connectedness?

All have severe weaknesses

Text Detection Problem #2

How do I know that two characters are close enough to be considered a part of the same word?

Easier version of the last problem, but still hard!

Color Properties Analysis
  • Implemented Gaussian Mixture Model (GMM) to obtain μ and σ of foreground/background for: R/G/B/H/I
  • Calculated Confidences that component (RGBHI) can be used to recognize characters

Color Properties Analysis
  • Assumed Invariant: High contrast between foreground/background of characters in sign
  • Choose the channel (R/G/B/H/I) that is best suited for use with character recognition
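
One way to picture the channel selection: fit a two-component Gaussian mixture to each channel's pixel values with EM, then score the channel by how far apart the two means are relative to their spreads. The separation formula and all function names below are assumptions; the slides only state that a GMM supplied μ and σ for foreground and background and that a confidence was computed per channel.

```python
import math


def fit_two_gaussians(values, iterations=50):
    """EM fit of a two-component 1-D mixture; returns [(mu, sigma), (mu, sigma)]."""
    lo, hi = min(values), max(values)
    mus = [float(lo), float(hi)]
    sigmas = [max((hi - lo) / 4.0, 1e-3)] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        # The 1/sqrt(2*pi) factor is omitted because it cancels out.
        resp = []
        for v in values:
            dens = [weights[k] / sigmas[k]
                    * math.exp(-0.5 * ((v - mus[k]) / sigmas[k]) ** 2)
                    for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate mean, spread and weight of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            mus[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - mus[k]) ** 2 for r, v in zip(resp, values)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-3)
            weights[k] = nk / len(values)
    return [(mus[0], sigmas[0]), (mus[1], sigmas[1])]


def channel_confidence(values):
    """Assumed score: distance between the means relative to the spreads."""
    (m0, s0), (m1, s1) = fit_two_gaussians(values)
    return abs(m1 - m0) / (s0 + s1)


def best_channel(channels):
    """Name of the channel whose two clusters are best separated."""
    return max(channels, key=lambda name: channel_confidence(channels[name]))


channels = {
    "R": [10, 12, 11, 200, 198, 202] * 5,   # strong contrast
    "G": [100, 101, 99, 102, 98, 100] * 5,  # almost no contrast
}
chosen = best_channel(channels)
```

A high score means the channel splits cleanly into a dark and a light cluster, which is what the high-contrast assumption above relies on.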
Color Analysis: Evaluation
  • Highest confidence observed to be channel best suited for OCR…
  • …Did I just say OCR?

YES!

(I did.)

Optical Character Recognition II

(and this time, it’s personal)

Refined Detection
  • Generate alphabet templates in different fonts
  • Resize templates; Divide into grid
  • Apply several 2D Gabor filters to each grid patch
    • Different orientations, frequencies, variances
    • For each pixel, yields real/imaginary component of transformation
  • Feed data into Linear Discriminant Analysis
    • Reduces features and forms classifier at same time
2D Gabor Filter
  • Product of a Gaussian envelope and a sine wave, applied to the image by convolution
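
As a rough illustration, a 2D Gabor kernel can be built in plain Python as below. The parameterization (odd kernel `size`, `wavelength`, orientation `theta`, isotropic `sigma`) is one common textbook form and an assumption here; the slides do not list the exact parameters used.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, phase=0.0):
    """Return (real, imag) kernels as size x size nested lists (size odd)."""
    half = size // 2
    real, imag = [], []
    for y in range(-half, half + 1):
        real_row, imag_row = [], []
        for x in range(-half, half + 1):
            # Rotate coordinates so the wave travels along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            angle = 2.0 * math.pi * xr / wavelength + phase
            real_row.append(envelope * math.cos(angle))
            imag_row.append(envelope * math.sin(angle))
        real.append(real_row)
        imag.append(imag_row)
    return real, imag


def filter_response(patch, kernel):
    """Sum of element-wise products of a patch and a same-sized kernel."""
    return sum(p * k
               for patch_row, kernel_row in zip(patch, kernel)
               for p, k in zip(patch_row, kernel_row))


real, imag = gabor_kernel(size=5, wavelength=4.0, theta=0.0, sigma=2.0)
flat_patch = [[1.0] * 5 for _ in range(5)]
imag_response = filter_response(flat_patch, imag)
```

Varying `theta`, `wavelength`, and `sigma` gives the bank of orientations, frequencies, and variances mentioned under Refined Detection; the real and imaginary responses per pixel are the features that would then be passed to the classifier.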

Live Demonstration

Training

Classification