Data-driven Generation of Image Descriptions



Presentation Transcript


  1. Data-driven Generation of Image Descriptions. Advisor: Tamara Berg. Vicente Ordonez-Roman. Previously: The State University of New York.

  2. What most computer vision systems aim to say about a picture: "sky, trees, water, building, bridge, river, tree"

  3. What we are able to say about a picture (our goal): "An old bridge over dirty green water." "One of the many stone bridges in town that carry the gravel carriage roads." "A stone bridge over a peaceful river."

  4. Let’s just borrow captions from similar images! Im2Text: Describing Images Using 1 Million Captioned Photographs. Vicente Ordonez, Girish Kulkarni, Tamara L. Berg. Advances in Neural Information Processing Systems (NIPS), 2011.

  5. Harness the Web! The query image is matched against images + captions from the Web using global image features (GIST + color), and the caption(s) of the best matches are transferred, e.g. "The water is clear enough to see fish swimming around in it." Example Web captions: "Smallest house in paris between red (on right) and beige (on left)." "A walk around the lake near our house with Abby." "Bridge to temple in Hoan Kiem lake." "The daintree river by boat." "Hangzhou bridge in West lake." "The water is clear enough to see fish swimming around in it." ...
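The matching step on this slide can be sketched as nearest-neighbor caption transfer. This is a minimal sketch, not the Im2Text implementation: it assumes GIST and color descriptors have already been computed for every image, and the Euclidean distance, the `color_weight` balance between the two descriptors, and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# A database entry: (gist_descriptor, color_descriptor, caption).
# Descriptors are assumed to be precomputed fixed-length vectors.
Entry = Tuple[Sequence[float], Sequence[float], str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_captions(query_gist: Sequence[float],
                      query_color: Sequence[float],
                      database: List[Entry],
                      k: int = 1,
                      color_weight: float = 1.0) -> List[str]:
    """Return the captions of the k database images closest to the query.

    The combined distance (GIST distance + color_weight * color distance)
    is an illustrative choice, not the published weighting.
    """
    def distance(entry: Entry) -> float:
        gist, color, _ = entry
        return euclidean(query_gist, gist) + color_weight * euclidean(query_color, color)

    ranked = sorted(database, key=distance)
    return [caption for _, _, caption in ranked[:k]]


if __name__ == "__main__":
    toy_db = [
        ([0.0, 0.0], [1.0, 0.0], "A stone bridge over a peaceful river."),
        ([1.0, 1.0], [0.0, 1.0], "Smallest house in paris between red (on right) and beige (on left)."),
        ([0.1, 0.0], [0.9, 0.1], "The daintree river by boat."),
    ]
    print(transfer_captions([0.1, 0.0], [1.0, 0.0], toy_db, k=2))
```

Returning the top k rather than a single caption keeps the candidates around for a later reranking step.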

  6. GIST

  7. Use the web to collect images + captions. 90,000,000,000 pictures!! (**) A lot of them with captions (a lot of them not publicly available). 6,000,000,000 photographs! (*) A lot of them with captions (lots of them publicly available). (*) http://blog.flickr.net/en/2011/08/04/6000000000/ (**) http://www.quora.com/How-many-photos-are-uploaded-to-Facebook-each-day

  8-13. Flickr images + captions (these slides repeat the same examples, highlighting one caption at a time): "cat in a sink"; "A 10-kg cat called Hercules.. and got caught in a pet door when trying to sneak into another house to steal dog food. 'Nuff said"; "Dog with a ball in its mouth running around like crazy on the green grass."

  14. Solution: collect hundreds of millions of captions, then filter them. We found that "good captions" contain visual concepts and relation words such as "by", "in", "over", "beside", "on top of"; there is ~1 "good caption" for every 1000 "bad captions". Im2Text: Describing Images Using 1 Million Captioned Photographs. Vicente Ordonez, Girish Kulkarni, Tamara L. Berg. Advances in Neural Information Processing Systems (NIPS), 2011.
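The filtering rule can be sketched as a keyword test. This sketch assumes a caption counts as "good" when it contains at least `min_visual_terms` words from a visual vocabulary and at least one of the relation words listed on the slide; the tiny `VISUAL_TERMS` set and the default threshold are hypothetical stand-ins for the much larger term lists a real filter would need.

```python
import re

# Relation words from the slide; "on top of" is matched as a whole phrase.
RELATIONS = ("by", "in", "over", "beside", "on top of")

# Hypothetical toy vocabulary of visual concepts (a real filter needs far larger lists).
VISUAL_TERMS = {"bridge", "river", "water", "lake", "tree", "sky", "cat",
                "dog", "sink", "grass", "ball", "door", "window", "house"}


def tokenize(caption: str) -> list:
    return re.findall(r"[a-z]+", caption.lower())


def has_relation(tokens: list) -> bool:
    # Pad with spaces so that "in" does not match inside "inside" or "into".
    text = " " + " ".join(tokens) + " "
    return any(" " + rel + " " in text for rel in RELATIONS)


def is_good_caption(caption: str, min_visual_terms: int = 1) -> bool:
    tokens = tokenize(caption)
    visual = sum(1 for t in tokens if t in VISUAL_TERMS)
    return visual >= min_visual_terms and has_relation(tokens)


if __name__ == "__main__":
    for c in ["cat in a sink", "'Nuff said", "A stone bridge over a peaceful river."]:
        print(c, "->", is_good_caption(c))
```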

  15. SBU Captioned Photo Dataset: 1 million captioned photos! Examples: "Little girl and her dog in northern Thailand. They both seemed interested in what we were doing" "Interior design of modern white and brown living room furniture against white wall with a lamp hanging." "The Egyptian cat statue by the floor clock and perpetual motion machine in the pantheon" "Man sits in a rusted car buried in the sand on Waitarere beach" "Our dog Zoe in her bed" "Emma in her hat looking super cute"

  16. Results: (1) while walking by the water (2) plane flying over the sun (3) shot this in a moving car at the nkve highway (4) sunset over crevecoeur lake and the page bridge (5) sunset on 12th sep 2009 as seen from the field polder near my house (6) window over yellow door (7) sunset over capitol hill as seen from the roof of my building (8) an orange sky over the irish sea (9) beautiful golden sunset reflected in the waves of the ocean (10) red sky probably caused by volcanic ash from iceland (11) a view of sunset over river brahmaputa from koliyabhumura bridge (12) red sky in the morning

  17. Results: (1) burnt wooden door in derelict building portugal (2) peterborough cathedral norman door in south wall (3) amazing wooden door with wider light above (4) door in wall (5) girl looking in a classroom window (6) a interesting cross in a window of an ancient city (7) this mirror decorated with fruit painting was left behind by the previous owners (8) unusual exterior wall postbox at st albans post office in st peters street al1 (9) door in oxford uk in black and white (10) 19 plate behind glass in brass mat and preserver (11) this is some of the window decoration external on the house just over the porch 0364 (12) cat in a window

  18. Results: (1) img8783 ginger in the red chair (2) red sky in the morning (3) the cat is in the bag and the bag is in the river quot (4) the light in the kitchen made everythin glow my little girl is growing up (5) my cat in a box that is far too small for her (6) one of the towel animals in the cabin edno ot jivotnite napraveno ot havlieni karpi v kabinata (7) baby in her later years turned from green to red but she never went fully red all over (8) if you take pictures through the hole in the bottom of a flower pot the whole of the eldritch world is revealed (9) glazed ceramic poop form in orange wooden box (10) rock garden in library (11) it s funny to capture the preciousest cat in the house at his most devillicious (12) the pink will get replaced by orange and blue in the fall

  19. Results: (1) starfish from the book toys to knit dashing dachs superwash sock yarn in goldfish backing is orange fabric stuffing is pillow stuffing (2) mural of birds and trees in the crypt of wat ratburana ayutthaya (3) carvings in the rock wall (4) acrylic on paper scarlet macaws communicate in the color red with yellow and blue as visual grammar (5) epsom and table salt crystals growing in concentrated green tea solution (6) the hops dried to a golden green in a matter of a few days almost too pretty to bag up (7) after staring at the gorgeous colors of the leaves claes discovered that there were about 100 birds sleeping in the tree (8) you know youre in wisconsin when the beach has pine needles in the sand (9) i was walking down the sidewalk and i saw this glove craft dropped in the dirt it seemed really unusual (10) made by fusing plastic bags (11) bark pattern from a ponderosa pine tree in grand canyon national park (12) the peasant that found a statue of the black virgin on a rock in a river

  20. What to do next?

  21. Use high-level content to rerank (objects, stuff, people, scenes, captions). Candidate captions: "The bridge over the lake on Suzhou Street." "Iron bridge over the Duck river." "The Daintree river by boat." "Bridge over Cacapon river." ... Transfer caption(s), e.g. "The bridge over the lake on Suzhou Street."
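Reranking can be sketched as a weighted sum of per-channel similarities between the query and each candidate. The slide names the channels but not how they are combined, so the equal default weights, the assumption that each similarity is already normalized to [0, 1], and the `rerank` function itself are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# High-level content channels named on the slide.
CHANNELS = ("objects", "stuff", "people", "scenes", "captions")

# A candidate: (caption, {channel: similarity to the query in [0, 1]}).
Candidate = Tuple[str, Dict[str, float]]


def rerank(candidates: List[Candidate],
           weights: Optional[Dict[str, float]] = None) -> List[str]:
    """Sort candidate captions by a weighted sum of channel similarities.

    Equal weights are a placeholder; missing channels count as 0.
    """
    if weights is None:
        weights = {c: 1.0 for c in CHANNELS}

    def score(item: Candidate) -> float:
        _, sims = item
        return sum(weights.get(c, 0.0) * sims.get(c, 0.0) for c in CHANNELS)

    return [caption for caption, _ in sorted(candidates, key=score, reverse=True)]
```

In practice the weights would be tuned on held-out data rather than fixed by hand.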

  22. Some success… "Amazing colours in the sky at sunset with the orange of the cloud and the blue of the sky behind." "A female mallard duck in the lake at Luukki Espoo" "Tree with red leaves in the field in autumn." "Fresh fruit and vegetables at the market in Port Louis Mauritius." "Under the sky of burning clouds." "The sun was coming through the trees while I was sitting in my chair by the river" "Strange cloud formation literally flowing through the sky like a river in relation to the other clouds out there." "Stained glass window in Eusebius church."

  23. Still far from perfect: incorrect objects. "Kentucky cows in a field." "The cat in the window."

  24. Still far from perfect. Incorrect context: "The sky is blue over the Gherkin." "Tree beside the river." Completely wrong: "The boat ended up a kilometre from the water in the middle of the airstrip." "Water over the road."

  25. How to evaluate? • "Ground truth": "The car is parked next to the train station besides a building." • Candidates: "There is car parked in front of an office building", "This is the building that hosted the ceremony", "A vehicle stopped next to my house". This is similar to evaluation in machine translation: each candidate is scored by how well it overlaps with the human-written reference.

  26. BLEU score evaluation against Human Captions
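For reference, a minimal sentence-level BLEU score (clipped n-gram precision, geometric mean over n = 1..max_n, brevity penalty) can be computed as below. This is the standard BLEU formula without smoothing and with plain whitespace tokenization; the talk does not say which BLEU variant or n-gram order was used, so `max_n` is left as a parameter.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, references: List[str], max_n: int = 4) -> float:
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) times a brevity penalty. No smoothing."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_sum += math.log(clipped / total)
    # Brevity penalty against the reference length closest to the candidate's.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum / max_n)
```

Because the score is unsmoothed, a short caption that shares no bigram with the reference (such as "There is car parked in front of an office building" against the ground truth above) scores 0.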

  27. Human Visual Verification: Please choose the image that better corresponds to the given caption: "View overlooking Kuala Lumpur from my office building"

  28-29. Human Visual Verification (caption from Flickr vs. random image): Please choose the image that better corresponds to the given caption: "View overlooking Kuala Lumpur from my office building"

  30-31. Human Visual Evaluation (caption produced by our system vs. random image): Please choose the image that better corresponds to the given caption: "The view from the 13th floor of an apartment building in Nakano awesome."
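One way to summarize such forced-choice trials is the fraction of trials in which workers picked the image the caption belongs to, plus an exact one-sided binomial test against the 50% chance rate of a two-image choice. The test is an illustrative choice, not necessarily the analysis used in the talk.

```python
from math import comb
from typing import List, Tuple


def forced_choice_summary(choices: List[bool]) -> Tuple[float, float]:
    """Summarize two-alternative forced-choice trials.

    choices[i] is True when the worker picked the image the caption belongs to.
    Returns (accuracy, one-sided exact binomial p-value vs. 50% chance).
    """
    n = len(choices)
    if n == 0:
        raise ValueError("need at least one trial")
    k = sum(choices)
    # P(X >= k) for X ~ Binomial(n, 0.5): probability of doing this well by guessing.
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k / n, p_value
```

For example, 8 correct picks out of 10 gives an accuracy of 0.8 and a p-value of 56/1024.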

  32. What to do next?

  33. Let’s not borrow captions from other images; let’s just borrow short phrases! Collective Generation of Natural Image Descriptions. Polina Kuznetsova, Vicente Ordonez, Alexander C. Berg, Tamara L. Berg, Yejin Choi. Association for Computational Linguistics (ACL), 2012. Large Scale Retrieval for Image Description Generation. Vicente Ordonez, Xufeng Han, Polina Kuznetsova, Girish Kulkarni, Margaret Mitchell, Kota Yamaguchi, Karl Stratos, Amit Goyal, Jesse Dodge, Alyssa Mensch, Hal Daume III, Alexander C. Berg, Yejin Choi, Tamara L. Berg. Under submission to the IJCV special issue on Big Data.

  34. Retrieving noun phrases from similar object detections

  35. Retrieving verb phrases from similar object detections. Detect: dog. Find matching dog detections by visual similarity. Captions of the matches: "Contented dog just laying on the edge of the road in front of a house.." "Peruvian dog sleeping on city street in the city of Cusco, (Peru)" "this dog was laying in the middle of the road on a back street in jaco" "Closeup of my dog sleeping under my desk."
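Retrieval from similar detections can be sketched as a k-nearest-neighbor search restricted to detections of the same category. The appearance vectors, the cosine similarity, the value of `k`, and the assumption that each database detection already carries the phrase parsed from its caption are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A database detection: (category, appearance_vector, phrase parsed from its caption).
Detection = Tuple[str, Sequence[float], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_phrases(query: Sequence[float], category: str,
                     detections: List[Detection], k: int = 2) -> List[str]:
    """Return the phrases of the k most similar detections of the same category."""
    same = [d for d in detections if d[0] == category]
    same.sort(key=lambda d: cosine(query, d[1]), reverse=True)
    return [phrase for _, _, phrase in same[:k]]
```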

  36. Retrieving prepositional phrases from region + detection matches. Object: car. Find matching region detections using appearance + arrangement. Captions of the matches: "Cordoba - lonely elephant under an orange tree..." "I positioned the chairs around the lemon tree -- it's like a shrine" "Mini Nike soccer ball all alone in the grass" "Comfy chair under a tree."
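Matching on appearance plus arrangement can be sketched by adding a spatial descriptor of how the object sits relative to the stuff region. The descriptor used here (center offset plus the fraction of the object box inside the region), the normalized box format, the dictionary keys, and `arrangement_weight` are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def center(box: Box) -> Tuple[float, float]:
    return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2


def arrangement(obj: Box, region: Box) -> Tuple[float, float, float]:
    """Offset of the object center from the region center, plus the fraction
    of the object box that lies inside the region box."""
    (ox, oy), (rx, ry) = center(obj), center(region)
    iw = max(0.0, min(obj[2], region[2]) - max(obj[0], region[0]))
    ih = max(0.0, min(obj[3], region[3]) - max(obj[1], region[1]))
    area = (obj[2] - obj[0]) * (obj[3] - obj[1])
    return ox - rx, oy - ry, (iw * ih / area if area > 0 else 0.0)


def match_cost(query: Dict, cand: Dict, arrangement_weight: float = 1.0) -> float:
    """Lower is better: appearance distance plus weighted arrangement distance."""
    app = math.dist(query["appearance"], cand["appearance"])
    arr = math.dist(arrangement(query["obj"], query["region"]),
                    arrangement(cand["obj"], cand["region"]))
    return app + arrangement_weight * arr


def best_phrase(query: Dict, candidates: List[Dict]) -> str:
    """Return the prepositional phrase of the lowest-cost candidate."""
    return min(candidates, key=lambda c: match_cost(query, c))["phrase"]
```

Two matches with identical appearance are then separated only by layout, so "chair under a tree" and "ball in the grass" can be told apart.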

  37. Retrieving prepositional phrases from scene matches. Extract a scene descriptor; find matching images by scene similarity. Captions of the matches: "I'm about to blow the building across the street over with my massive lung power." "Pedestrian street in the Old Lyon with stairs to climb up the hill of fourviere" "Only in Paris will you find a bottle of wine on a table outside a bookstore" "View from our B&B in this photo"

  38. Data Processing 1 million images: • Run object detectors • Run region based stuff detectors (e.g. grass, sky, etc) • Run global scene classifiers • Parse captions associated with images and retrieve phrases referring to objects (NPs, VPs), region relationships (PPstuff), and general scene context (PPscene).

  39. Recognition (a.k.a. vision) is hard: detecting one hundred objects

  40. Sometimes you can make it (a little) better: detecting "mentioned" objects. "Look in the mountain for a lion face" "The background is a vintage paint by number painting I have and the fabulous forest dress is by candyjunky!" "Ecuador, amazon basin, near coca, rain forest, passion fruit flower" "Kevin’s mom, so punxrawk in Kev’s black flag hat"
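One simple reading of "detecting mentioned objects" is to keep only detections whose category is named in the image's own caption, which can discard false positives for objects the caption never mentions. The exact-word match with a naive "+s" plural is an assumption; a real system would need synonyms and better morphology.

```python
import re
from typing import List, Tuple


def mentioned_detections(caption: str,
                         detections: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep only (category, confidence) detections whose category word, or a
    naive "+s" plural of it, appears in the caption."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return [(cat, conf) for cat, conf in detections
            if cat in words or cat + "s" in words]
```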

  41. Everything together: Object: "bird"; Action: "looking for food"; Stuff: "in water"; Scene: "in Lincoln City Oregon coast"

  42. Everything together, retrieved phrases: "looking for food" / "in Atlantic City" / "in water" / "bird"; "looking for food" / "on the beach" / "in water" / "bird"; "looking for food" / "in Lincoln City Oregon coast" / "in water" / "bird"

  43. Binary Integer Linear Programming. [Figure: phrase s_ij at position k followed by phrase s_pq at position k+1; score = phrase vision confidence + pairwise phrase cohesion, where pairwise phrase cohesion = n-gram cohesion + head-word co-occurrence.]
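One way to write the composition problem as a binary ILP, using the slide's labels: $x_{ijk} = 1$ when phrase $s_{ij}$ (candidate $j$ of phrase type $i$) is placed at position $k$, and $y_{ijk,pq} = 1$ when $s_{ij}$ at position $k$ is followed by $s_{pq}$ at position $k+1$. $F$ (phrase vision confidence) and $G$ (pairwise cohesion, i.e. n-gram cohesion plus head-word co-occurrence) are placeholders; this is a generic formulation, not necessarily the exact one from the paper. The three constraints on $y$ are the standard linearization of the product $x_{ijk}\,x_{pq(k+1)}$.

```latex
\begin{aligned}
\max_{x,\,y}\quad & \sum_{k}\sum_{i,j} F(s_{ij})\,x_{ijk}
  + \sum_{k}\sum_{i,j}\sum_{p,q} G(s_{ij}, s_{pq})\,y_{ijk,pq} \\
\text{s.t.}\quad & \sum_{i,j} x_{ijk} \le 1 \quad \forall k,
  \qquad \sum_{j,k} x_{ijk} \le 1 \quad \forall i, \\
& y_{ijk,pq} \le x_{ijk}, \qquad y_{ijk,pq} \le x_{pq(k+1)},
  \qquad y_{ijk,pq} \ge x_{ijk} + x_{pq(k+1)} - 1, \\
& x_{ijk},\, y_{ijk,pq} \in \{0, 1\}.
\end{aligned}
```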

  44. Composing descriptions: compose descriptions from phrases with an ILP approach.
     • Linguistic constraints
         - Allow only one phrase of each type
         - Enforce plural/singular agreement between NP and VP
     • Discourse constraints
         - Prevent inclusion of repeated phrasing
     • Phrase cohesion constraints
         - n-gram statistics between phrases
         - Co-occurrence statistics between head words of phrases (last word or main verb) to encourage longer-range cohesion
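For a handful of candidates per phrase type, the same kind of objective and constraints can be checked by exhaustive search, which finds the same optimum for this simplified objective but scales exponentially with the number of types. The sketch below substitutes that brute-force search for the ILP solver. It always uses exactly one phrase of every type; the phrase-type order, the `(text, confidence, number)` tuple format, and the `cohesion` table are hypothetical, and the repeated-phrasing check is reduced to rejecting duplicate phrase texts.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Candidate phrase: (text, vision_confidence, number), number is "sg", "pl", or None.
Phrase = Tuple[str, float, Optional[str]]
ORDER = ("NP", "VP", "PP_stuff", "PP_scene")  # hypothetical phrase-type order


def compose(pools: Dict[str, List[Phrase]],
            cohesion: Dict[Tuple[str, str], float]) -> Tuple[Optional[str], float]:
    """Exhaustively choose one phrase per type, maximizing vision confidence
    plus cohesion between adjacent phrases, subject to simple constraints."""
    best_texts, best_score = None, float("-inf")
    for combo in product(*(pools[t] for t in ORDER)):
        np_number, vp_number = combo[0][2], combo[1][2]
        # Linguistic constraint: plural/singular agreement between NP and VP.
        if np_number and vp_number and np_number != vp_number:
            continue
        texts = [text for text, _, _ in combo]
        # Discourse constraint (simplified): no phrase text may repeat.
        if len(set(texts)) < len(texts):
            continue
        score = sum(conf for _, conf, _ in combo)
        # Pairwise cohesion between adjacent phrases (placeholder lookup table).
        score += sum(cohesion.get(pair, 0.0) for pair in zip(texts, texts[1:]))
        if score > best_score:
            best_texts, best_score = texts, score
    return (" ".join(best_texts) if best_texts else None), best_score
```

With the phrases from the bird example, a cohesion bonus between "in water" and "in Lincoln City Oregon coast" is enough to pick that scene phrase over the higher-confidence "on the beach".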

  45. Good results: "This is a sporty little red convertible made for a great day in Key West FL." "This car was in the 4th parade of the apartment buildings." "Taken in front of my cat sitting in a shoe box." "Cat likes hanging around in my recliner." "This is a brass viking boat moored on beach in Tobago by the ocean."

  46. Bad results (cognitive absurdity, grammatically incorrect, not relevant): "This is a shoulder bag with a blended rainbow effect" "Here you can see a cross by the frog in the sky." "One of the most shirt in the wall of the house."

  47. BLEU score evaluation

  48. Human Forced Choice Evaluation

  49. Visual Turing Test: ours vs. the original human-written caption. In some cases (16%), ILP-generated captions were preferred over human-written ones!

  50. What’s next?
