
UBIQ – Low Bandwidth Visual Communication

UBIQ – Low Bandwidth Visual Communication. Jonathan H. Connell, Exploratory Computer Vision Group, IBM T. J. Watson Research Center (jconnell@us.ibm.com). What is it? UBIQ links a camera phone to any PC. The PC user can see video and snap pictures, which is good for a quick “beam in”.


Presentation Transcript


  1. UBIQ – Low Bandwidth Visual Communication • Jonathan H. Connell • Exploratory Computer Vision Group • IBM T. J. Watson Research Center • jconnell@us.ibm.com

  2. What is it? • Links camera phone to any PC • PC user can see video, snap pictures • Good for a quick “beam in”

  3. UBIQ concept: The expert can be everywhere • Field service dilemma (e.g. repair): • Most problems have simple solutions • Some field-service problems require experts • Experts are expensive, so they should be utilized effectively • Medium-skilled labor can fix many problems • Hybrid solution • Send out medium-skilled person for quick fix in most cases • Call back to main office for more difficult problems • “Beaming in” the expert • Sometimes verbal communication is insufficient • Pictures can be sent, but take a long time to transmit • Person in the field might take picture of wrong aspect • Provide a real-time “viewfinder” mode to allow expert to quickly snap the right picture on a remote mobile phone

  4. Scenario: fixing a copier • Customer calls in problem = streaks on paper • Local maintenance guy shows up promptly • Checks for correct paper & toner level • Calls back to home office for advice • Shows paper markings • Expert asks for view of “fuser roller” • “What’s that?” • Local person uses video mode to get to correct location • Expert snaps image and examines • Fix problem by using alcohol wipe on this component (marked) • Image callouts: “… closer …”, “… there.”, “Open the side …”

  5. Demo – click here to play

  6. Other Scenarios • Inspection at construction site • Concrete slab slipping down hill in Brazil • Fly in a civil engineer (while site idles) • Problem really requires a hydrologist? • Specialized medical consultation • Remote clinic in Botswana • Experts can’t (or won’t) travel there quickly • Check out foot rash without fear of contagion

  7. Value Proposition • Making the top of the skill pyramid virtually ubiquitous • Lower cost of operations • No expense for cars, plane trips, lodging … • Right brain at the right place quickly • Can easily change experts if needed • No delays due to flights, visa approval … • Increases customer satisfaction • Better leverage existing expertise • No time lost on travel (or getting lost) • Bigger expert recruitment pool • No onerous travel or relocation • Social skills less important

  8. Critical Point: Designing around bandwidth • Verizon 3G EV-DO cites (uncompressed data): • Rev A peak: down = 600-1400 kbaud, up = 500-800 kbaud • Non Rev A peak: down = 400-700 kbaud, up = 60-80 kbaud • Local test: uplink 200KB in 25 sec → 8KB / sec = 64 kbaud • Older CDPD / GPRS networks = 9.6-40 kbaud • Remote areas in US (Nebraska) • Developing countries (South Africa) • Coverage map legend: USA: blue = 400 kbaud, green = 50 kbaud; South Africa: blue = 30 kbaud
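
The local uplink test on this slide is a unit conversion, sketched below. It assumes, as the slide's own arithmetic does, that 1 KB = 1000 bytes and that "kbaud" means kilobits per second; the function name is hypothetical.

```python
def kbaud_from_transfer(kilobytes: float, seconds: float) -> float:
    """Link rate in kbaud (treated here as kilobits/s) for a timed transfer."""
    kilobytes_per_second = kilobytes / seconds
    return kilobytes_per_second * 8  # 8 bits per byte


# Local test from the slide: 200 KB uploaded in 25 seconds.
print(kbaud_from_transfer(200, 25))  # 64.0
```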

  9. Video transmission • Uplink bandwidth intrinsically limited • Handset radiated power (batteries, FCC limits) • Distance to base station • Generally assume 10-50 kbaud (like old dial-up) • Motion “video” requires 5-10 fps • H.264 (MPEG-4) lowest = 64 kbaud for 176x144 @ 15fps • WMV for dialup = 38 kbaud for 160x120 @ 15 fps • Need very low-bandwidth codecs • 350 bytes / frame @ 53 kbaud for 15fps • 100-200 bytes / frame @ 10 kbaud for 5-10fps
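
As a rough check on the per-frame figures above, the raw byte budget per frame follows from the link rate and the frame rate. This is only a sketch: it treats kbaud as kilobits per second and ignores packet overhead, so it gives an upper bound, and the slide's own targets sit below it. The function name is hypothetical.

```python
def max_frame_bytes(link_kbaud: float, fps: float) -> float:
    """Upper bound on bytes per frame for a link rate and frame rate."""
    bytes_per_second = link_kbaud * 1000 / 8
    return bytes_per_second / fps


# A 10 kbaud uplink at the 5-10 fps needed for motion "video".
print(max_frame_bytes(10, 5))   # 250.0
print(max_frame_bytes(10, 10))  # 125.0
```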

  10. Key technology • Low-bandwidth viewfinder suited to task • WHY: Allows expert to guide image acquisition more effectively • HOW: Use computer vision techniques to focus on “semantic” aspects • US patent 7,219,364 to IBM “System and Method for Selectable Semantic Codec Pairs for Very Low Data-Rate Video Transmission” Rudolf Bolle & Jonathan Connell (filed Feb. 2001, issued May 2007) Claims: • A system for compressing one or more video streams comprising: one or more image input devices creating the one or more video streams; and a selector process that selects a semantic compression process out of a set of semantic compression processes, the selected semantic compression process compressing the one or more video streams based on a task that required the compression of the one or more video streams and that utilizes content of the one or more video streams.

  11. Codec 1: JPEG stills • Compression settings • Moderate resolution • Low quality (50) • Balance of clarity & speed • Non-linear with resolution • 32 x 24 = 812 bytes (0.8 secs): 4x fewer pixels, 1.5x faster • 64 x 48 = 1242 bytes (1.2 secs) • 128 x 96 = 2765 bytes (2.8 secs): 4x more pixels, 2.2x slower • Network issues

  12. Interaction with network • Ethernet TCP/IP packet structure: • 8 bytes Ethernet framing • 20 byte TCP header • 14 bytes IPv4 MAC header • 46-1500 bytes payload • 4 bytes CRC check code • Effective bandwidth over raw 10 kbaud link: • 100 bytes → 146 bytes = 8.6 fps (32% overhead) • 200 bytes → 246 bytes = 5.1 fps (19% overhead) • 1000 bytes → 1046 bytes = 1.2 fps (4% overhead) • Nagle algorithm in TCP • Tries to combine small packets for better efficiency • Need to disable for acceptable latency (and smoothness) • Delayed ACK in TCP • Multi-packet transmit can be delayed 200ms if no down-linked command
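
The overhead table on this slide can be reproduced with a few lines of arithmetic, and the Nagle algorithm is normally switched off with the standard `TCP_NODELAY` socket option. The sketch below takes the per-packet byte counts exactly as listed on the slide (46 bytes in total) and treats kbaud as kilobits per second. The function names are hypothetical, and this is not the UBIQ implementation.

```python
import socket

# Per-packet overhead as listed on the slide: framing + two headers + CRC.
OVERHEAD_BYTES = 8 + 20 + 14 + 4  # 46 bytes


def frames_per_second(payload_bytes: int, link_kbaud: float = 10) -> float:
    """Frame rate when each frame travels in one packet over the raw link."""
    bytes_per_second = link_kbaud * 1000 / 8
    return bytes_per_second / (payload_bytes + OVERHEAD_BYTES)


def overhead_fraction(payload_bytes: int) -> float:
    """Share of each transmitted packet that is not payload."""
    return OVERHEAD_BYTES / (payload_bytes + OVERHEAD_BYTES)


def make_low_latency_socket() -> socket.socket:
    """TCP socket with the Nagle algorithm disabled via TCP_NODELAY."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


for size in (100, 200, 1000):
    print(size, round(frames_per_second(size), 1),
          f"{overhead_fraction(size):.0%}")
```

The loop prints the same three rows as the slide: small frames achieve a higher frame rate but spend a much larger share of the link on headers.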

  13. Codec 2: Progressive gray • Low spatial and intensity resolution • 16 x 12 pixels • 4 bit gray scale • Image = 96 bytes • 10fps @ 10 kbaud • No Huffman coding • not effective on short messages • Figure: interpolated 8 bits (16 x 12 x 8 bits = 192 bytes) and interpolated 4 bits (16 x 12 x 4 bits = 96 bytes) look nearly identical
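
The 96-byte figure follows from storing two 4-bit pixels per byte. Here is a minimal sketch of such a packing, assuming row-major 8-bit input and simple truncation to the top 4 bits; the slide does not specify the actual bit layout, so the function names and byte order are hypothetical.

```python
WIDTH, HEIGHT = 16, 12


def pack_gray4(pixels):
    """Pack 192 8-bit gray values into 96 bytes, two 4-bit pixels per byte."""
    if len(pixels) != WIDTH * HEIGHT:
        raise ValueError("expected a 16 x 12 image")
    nibbles = [p >> 4 for p in pixels]  # keep the top 4 bits of each pixel
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def unpack_gray4(packed):
    """Expand 4-bit pixels back to the 0..255 range for display."""
    pixels = []
    for byte in packed:
        for nibble in (byte >> 4, byte & 0x0F):
            pixels.append(nibble * 17)  # maps 0..15 onto 0..255
    return pixels


frame = [(i * 7) % 256 for i in range(WIDTH * HEIGHT)]
payload = pack_gray4(frame)
print(len(payload))  # 96
```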

  14. Algorithm • Progressive refinement • Send very low 4 bit resolution base • Send next resolution in 4 pieces • Send best resolution in 16 pieces • Add in low order bits in 16 pieces • Motion sensitivity • If basic scene changes start with new base image • Add resolution from the center outward • Long term stability • Don’t replace a good resolution image with a poorer one • Send new best resolution in 32 pieces in background
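
One way to picture the refinement order is as a transmission schedule: a base image, then 4, 16 and 16 further pieces, each stage sent from the center outward. The sketch below assumes the pieces are tiles on a 2 x 2 or 4 x 4 grid and that the stages use the 16 x 12, 32 x 24 and 64 x 48 resolutions quoted elsewhere in the deck. The tiling and the function names are assumptions, not details given on the slide. Under this tiling every piece covers 16 x 12 pixels, so each 4-bit piece is again 96 bytes.

```python
def center_out_order(cols, rows):
    """Tile coordinates sorted by distance from the center of the grid."""
    cx, cy = (cols - 1) / 2, (rows - 1) / 2
    tiles = [(c, r) for r in range(rows) for c in range(cols)]
    return sorted(tiles, key=lambda t: (t[0] - cx) ** 2 + (t[1] - cy) ** 2)


def refinement_schedule():
    """Yield (stage, tile) pairs in the order they would be transmitted."""
    yield ("base 16x12 @ 4 bits", (0, 0))
    for tile in center_out_order(2, 2):      # next resolution in 4 pieces
        yield ("32x24 @ 4 bits", tile)
    for tile in center_out_order(4, 4):      # best resolution in 16 pieces
        yield ("64x48 @ 4 bits", tile)
    for tile in center_out_order(4, 4):      # low-order bits in 16 pieces
        yield ("64x48 low-order bits", tile)


schedule = list(refinement_schedule())
print(len(schedule))  # 37
```

When the basic scene changes, a sender following this scheme would discard the remaining schedule and start again from a new base image.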

  15. Refinement sequence • Base 16 x 12 pixels • Central quarter 1 • Central quarters 1 & 2 • Central quarters 1 & 2 & 3

  16. Resolution sequence • 16 x 12 @ 4 bits (0.1 secs) • 32 x 24 @ 4 bits (0.5 secs) • 64 x 48 @ 4 bits (2.1 secs) • 64 x 48 @ 8 bits (3.6 secs)

  17. Codec 3: Prominent lines • Convolve with Sobel masks • X mask: [1 0 -1; 2 0 -2; 1 0 -1] • Y mask: [1 2 1; 0 0 0; -1 -2 -1] • Y vs. X = angular direction • RMS value = magnitude • Figure panels: Input, Edge Magnitude, Edge Direction (only 4 matter)
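
The edge step can be sketched in plain Python using the two 3 x 3 masks from the slide. This version applies the masks by correlation (no kernel flip, which only changes the sign of the response), takes the RMS of the two responses as the magnitude, and uses `atan2` for the direction. Border pixels are left at zero and the function names are hypothetical.

```python
import math

SOBEL_X = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def apply_mask(img, mask, x, y):
    """Weighted sum of the 3x3 neighborhood centered on (x, y)."""
    return sum(mask[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))


def sobel(img):
    """Return (magnitude, direction) images for a 2D list of gray values."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = apply_mask(img, SOBEL_X, x, y)
            gy = apply_mask(img, SOBEL_Y, x, y)
            mag[y][x] = math.sqrt((gx * gx + gy * gy) / 2)  # RMS of X and Y
            ang[y][x] = math.atan2(gy, gx)                  # Y vs. X
    return mag, ang


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 10, 10] for _ in range(4)]
magnitude, direction = sobel(step)
```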

  18. Choosing edges • Separate into horizontal and vertical edges • Find connected components • Determine maximal length elements • Keep best N

  19. Approximating edges • Find blob parameters • First order moments (centroid) • Second order moments (inertia) • Bounding box (max & min of x, y) • Get line endpoints • Line passes through centroid • Line is parallel to minimal axis • Clip to bounding box • Better than least squares • Not just minimum y error • Figure: pixel pattern
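
The blob-to-line step can be sketched as follows. It assumes that "minimal axis" means the axis of minimum inertia, found with the standard principal-axis formula theta = 0.5 * atan2(2 * Sxy, Sxx - Syy), and it clips the resulting line to the blob's bounding box. The function name and the tolerance value are hypothetical.

```python
import math


def fit_line(pixels):
    """Fit a line segment (x0, y0, x1, y1) to a blob of (x, y) pixels."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n            # first-order moments
    cy = sum(y for _, y in pixels) / n
    sxx = sum((x - cx) ** 2 for x, _ in pixels)   # second-order moments
    syy = sum((y - cy) ** 2 for _, y in pixels)
    sxy = sum((x - cx) * (y - cy) for x, y in pixels)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # axis of minimum inertia
    dx, dy = math.cos(theta), math.sin(theta)

    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    # Clip the parametric line (cx, cy) + t * (dx, dy) to the bounding box.
    t_lo, t_hi = -math.inf, math.inf
    for c, d, lo, hi in ((cx, dx, min(xs), max(xs)),
                         (cy, dy, min(ys), max(ys))):
        if abs(d) > 1e-12:  # a zero component never leaves the box
            t0, t1 = (lo - c) / d, (hi - c) / d
            t_lo = max(t_lo, min(t0, t1))
            t_hi = min(t_hi, max(t0, t1))
    return (cx + t_lo * dx, cy + t_lo * dy, cx + t_hi * dx, cy + t_hi * dy)


print(fit_line([(x, 5) for x in range(10)]))
```

Because the fit minimizes perpendicular distance to the line rather than vertical error, near-vertical blobs are handled as well as near-horizontal ones.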

  20. Final line version • Keep and code 50 best • (x0, y0, x1, y1) in 240x180 • 200 bytes total → 5fps • Figure: input
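
Since every coordinate in a 240 x 180 frame fits in one unsigned byte, 50 lines of four coordinates each come to exactly 200 bytes. A minimal sketch of such an encoding, assuming one byte per coordinate in the order given on the slide; the actual wire format is not described in the deck and the names are hypothetical.

```python
import struct

MAX_LINES = 50


def encode_lines(lines):
    """Pack up to 50 (x0, y0, x1, y1) segments into 4 bytes each."""
    payload = bytearray()
    for x0, y0, x1, y1 in list(lines)[:MAX_LINES]:
        payload += struct.pack("4B", x0, y0, x1, y1)  # one byte per coordinate
    return bytes(payload)


def decode_lines(payload):
    """Recover the list of (x0, y0, x1, y1) tuples on the client."""
    return [struct.unpack_from("4B", payload, offset)
            for offset in range(0, len(payload), 4)]


segments = [(i, i, 239, 179) for i in range(50)]
message = encode_lines(segments)
print(len(message))  # 200
```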

  21. Client side smoothing • Blend successive frames • previous + now = mixed (grayed) • But only if low motion • now - fattened previous = “extra” edges → moved
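
The smoothing rule on this slide can be sketched on binary edge images (lists of 0/1 rows). The previous frame is fattened with a 3 x 3 dilation, any current edge pixels outside it count as "extra" edges, and the two frames are averaged only when few extras are found. The threshold `max_extra` and all function names are hypothetical, since the slide gives no numbers.

```python
def fatten(img):
    """3x3 dilation of a binary image given as a list of rows."""
    h, w = len(img), len(img[0])
    return [[int(any(img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))))
             for x in range(w)] for y in range(h)]


def extra_edges(now, previous):
    """Count current edge pixels not covered by the fattened previous frame."""
    fat = fatten(previous)
    return sum(1 for y, row in enumerate(now)
               for x, v in enumerate(row) if v and not fat[y][x])


def smooth(now, previous, max_extra=10):
    """Return a 0..255 display image, blended only when motion is low."""
    if extra_edges(now, previous) > max_extra:
        return [[255 * v for v in row] for row in now]   # moved: show new frame
    return [[(255 * a + 255 * b) // 2 for a, b in zip(row_now, row_prev)]
            for row_now, row_prev in zip(now, previous)]  # mixed (grayed)
```

In the blended image, edges present in both frames stay at full intensity, while edges present in only one frame appear gray, which damps frame-to-frame flicker.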

  22. Comparison of codecs • Panels: Input, Progressive, Lines, JPEG • Different rates: 10 fps, 5 fps, 0.8 fps • Color vs. gray • Iconic vs. graphical • Demo – click here to play

  23. UBIQ summary • Enhances visual communication • Multiple viewfinder codecs • Remote acquisition controls • Image mark-up possible • Fundamentals covered under US patent • Single platform implementation • Windows XP (PC client) • Windows Mobile 5.0 (Smartphone server) • Demo possible • http://www.research.ibm.com/people/j/jhc/ubiq/

  24. Future work • Field testing • See which codecs are useful for which tasks • Porting to other phones • Java, Symbian (camera access?) • Development of additional codecs • Area based analog to lines • Hybrid lines + blobs • Spatially varying resolution • Camera tracking partial stills • Quick remote zoom refinement
