  1. SPEED-RANGE DILEMMAS FOR VISION-BASED NAVIGATION IN UNSTRUCTURED TERRAIN
     Pierre Sermanet¹·², Raia Hadsell¹, Jan Ben², Ayse Naz Erkan¹, Beat Flepp², Urs Muller², Yann LeCun¹
     (1) Courant Institute of Mathematical Sciences, New York University
     (2) Net-Scale Technologies, Morganville, NJ 07751, USA

  2. Outline
     • Program and system overview
     • Problem definition
     • Architecture
     • Results

  3. Overview: Program
     • LAGR: Learning Applied to Ground Robots
     • Demonstrate learning algorithms in unstructured outdoor robotics
     • Vision-based only (passive), no expensive equipment
     • Reach a GPS goal the fastest, without any prior knowledge of the location
     • DARPA funded, 10 teams (universities and companies), common platform
     • Comparison to the state-of-the-art CMU “baseline” software and to the other teams
     • Monthly tests by DARPA in various unknown locations
       [Photos of test sites: SwRI, TX; Ft. Belvoir, VA; Ft. Belvoir, VA; Hanover, NH]
     • Unstructured outdoor robotics is highly challenging due to the wide diversity of environments (colors, shapes, and sizes of obstacles, lighting and shadows, etc.)
     • Conventional algorithms are unsuited; adaptability and learning are needed

  4. Overview: Platform
     • Built by CMU/NREC
     • Vision-based only: 2 stereo pairs of cameras (+ GPS for global navigation)
     • 4 Linux machines linked by Gigabit Ethernet:
       • Two “eye” machines (dual-core 2 GHz): image processing
       • One “planner” machine (single-core 2 GHz): planning and control loop
       • One “controller” machine: low-level communication
     • Maximum speed: 1.3 m/s
     • Proprietary CMU/NREC API to sensors and actuators
     • Proprietary CMU/NREC “Baseline”: end-to-end navigation software (D*, etc.), not re-used
     [Robot photo labels: GPS, dual stereo cameras, bumper]

  5. Overview: Philosophy
     • Main goal: demonstrate machine learning algorithms for long-range vision (RSS 2007)
     • Supporting goal: build a solid software platform for long-range vision and navigation:
       • Robust and reliable
       • Resistant to sensor imprecision and failures
     [Diagram: self-supervised learning using a convolutional network — input image and stereo labels (short range); input: context-rich image windows; output: long-range labels]

  6. Overview: System
     • Processing chain (a timing sketch follows below):
       Sensors (cameras) → eye machine: input image → image processing → traversability map → network transmission → planner machine: path planning (using the pose from GPS + IMU + wheels) → path → control → actuators (wheels)
     • Note: latency is not only tied to frequency; it also includes sensor, network, planning, and actuator latency.
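
To make the note above concrete, here is a minimal sketch with purely illustrative stage timings (only the 190 ms sensor/API figure comes from these slides): the update period follows from the processing frequency alone, while the end-to-end latency behind a wheel command is the sum of every stage the data passes through.

```python
# Illustrative sketch of the processing chain's timing.  All stage timings
# except the 190 ms sensor/API delay are placeholders, not LAGR measurements.

stages_ms = {
    "sensor + API delivery": 190,   # images are already this old (see slide 8)
    "image processing":      150,   # placeholder
    "network transmission":   10,   # placeholder
    "path planning":          40,   # placeholder
    "control + actuation":    60,   # placeholder
}

frequency_hz = 10                        # how often new traversability maps arrive
period_ms = 1000 / frequency_hz          # 100 ms between updates
latency_ms = sum(stages_ms.values())     # age of the data behind a wheel command

print(f"period: {period_ms:.0f} ms, end-to-end latency: {latency_ms} ms")
```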

  7. Problem
     • Significant performance drop in local obstacle avoidance with excessive latency and insufficient update frequency
       [Chart: performance test of July 2006, Holmdel Park, NJ]
     • Artificially increasing latency and period almost linearly increases the number of crashes into obstacles
     • Human expert drivers of the UPI Crusher vehicle reported that a feedback latency of 400 ms was the maximum for good remote driving
     • How can good performance be guaranteed given the increasing complexity introduced by sophisticated long-range vision modules?
     • When does processing speed prevail over vision range, and vice versa?

  8. Problem: Delays
     • Latency and frequency determine performance, but latency is actually composed of 3 types of delays:
       (1) Sensor/actuator latency + LAGR API latency: images are already 190 ms old when made available to image processing
       (2) Processing latency
       (3) Robot dynamics latency (inertia + acceleration/deceleration): 1.5 s (worst case) between a wheel command and reaching the desired speed
     • (1) and (3) are relatively high (and fixed) on the LAGR platform, and must be compensated for and taken into account by (2) — a rough delay-budget sketch follows.
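
A rough delay-budget sketch, using the 190 ms and 1.5 s figures from this slide and a hypothetical processing latency, shows why (1) and (3) dominate the distance the robot covers before it can react:

```python
# Minimal sketch of the delay budget above.  The 190 ms sensor/API delay and
# the 1.5 s worst-case dynamics delay come from the slide; the processing
# latency is a hypothetical placeholder.

SENSOR_API_DELAY_S = 0.190   # (1) images are already this old when delivered
PROCESSING_DELAY_S = 0.250   # (2) placeholder vision + planning time
DYNAMICS_DELAY_S   = 1.5     # (3) worst case: wheel command -> desired speed
MAX_SPEED_MPS      = 1.3     # LAGR platform top speed

# Time between the world changing and the robot actually reacting to it.
reaction_delay_s = SENSOR_API_DELAY_S + PROCESSING_DELAY_S + DYNAMICS_DELAY_S

# Distance covered at top speed during that delay: (1) and (3) dominate and
# set a floor on how far ahead the short-range vision must see.
blind_distance_m = reaction_delay_s * MAX_SPEED_MPS

print(f"reaction delay: {reaction_delay_s:.2f} s, "
      f"distance covered meanwhile: {blind_distance_m:.2f} m")
```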

  9. Problem: Solutions to delays
     • To account for sensor and processing latencies (1) and (2):
       (a) Reduce processing time.
       (b) Estimate the delays between path planning and actuation.
       (c) Place traversability maps according to the delays before and after path planning.
     • To account for dynamics latency (3):
       (d) Model or record the robot’s dynamics.
     • All four solutions (a)–(d) are part of the global solution presented in the results section, but here we only describe a successful architecture for (a). A pose-correction sketch for (b)/(c) is shown below.
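
As an illustration of (b) and (c), here is a minimal pose-prediction sketch, assuming straight-line motion at constant speed; the function and parameter names are illustrative, not the LAGR code:

```python
import math

# Estimate how far the robot will have moved by the time a planned command
# actually reaches the wheels, and register the traversability map against
# that predicted pose rather than the pose at which the image was captured.

def predict_pose(x, y, heading_rad, speed_mps, latency_s):
    """Dead-reckon the pose forward by the estimated sensing-to-actuation latency."""
    dist = speed_mps * latency_s
    return (x + dist * math.cos(heading_rad),
            y + dist * math.sin(heading_rad),
            heading_rad)

# Example: at 1.3 m/s with ~0.44 s of sensor + processing latency, the robot
# has already moved ~0.57 m between image capture and actuation, so the map
# is placed relative to this predicted pose before planning.
future_pose = predict_pose(0.0, 0.0, 0.0, 1.3, 0.25 + 0.19)
print(future_pose)
```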

  10. Architecture
      • Idea:
        • Wagner et al.¹ showed that a walking human gazes more frequently close by than far away → need a higher frequency close by than far away
        • Nearby obstacles move toward the robot faster than far obstacles → need a lower latency close by than far away
      • To satisfy those requirements, short- and long-range vision are separated into 2 parallel and independent obstacle-detection (OD) modules (see the sketch below):
        • “Fast-OD”: processing has to be fast; vision is not necessarily long-range.
        • “Far-OD”: vision has to be long-range; processing can be slower.
      • How to make Fast-OD fast? → Simple processing and reduced input resolution. Can we reduce resolution without reducing performance?
      ¹ M. Wagner, J. C. Baird, and W. Barbaresi. The locus of environmental attention. J. of Environmental Psychology, 1:195-206, 1980.
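
A minimal sketch of the parallel idea: two independent loops, each running at its own rate and publishing its latest traversability map for merging on the planner side. Here fast_od() and far_od() are hypothetical placeholders, not the actual LAGR detectors.

```python
import threading
import time

# Two independent OD loops running in parallel at their own rates, each
# publishing its freshest traversability map.  Placeholder detectors only.

latest_maps = {"fast": None, "far": None}
maps_lock = threading.Lock()

def fast_od():
    # stand-in for low-resolution, short-range obstacle detection (0-5 m)
    return "fast traversability map"

def far_od():
    # stand-in for high-resolution, long-range obstacle detection (5-30 m)
    return "far traversability map"

def run_module(name, detector, period_s):
    while True:
        start = time.monotonic()
        with maps_lock:
            latest_maps[name] = detector()   # publish the latest map
        # keep this module's own rate, independently of the other module
        time.sleep(max(0.0, period_s - (time.monotonic() - start)))

threading.Thread(target=run_module, args=("fast", fast_od, 0.10), daemon=True).start()  # ~10 Hz
threading.Thread(target=run_module, args=("far", far_od, 0.37), daemon=True).start()    # ~3 Hz
time.sleep(1.0)  # let both loops run briefly in this toy example
```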

  11. Architecture: Fast-OD // Far-OD
      [Diagram: the two parallel pipelines running on the eye machine]
      • Far-OD: high-res input image (320x240 or 512x384) → advanced image processing → traversability map (5 to >30 m); frequency: 3 Hz; latency: 700 ms
      • Fast-OD: low-res input image (160x120) → simple image processing → traversability map (0 to 5 m); frequency: 10 Hz; latency: 250 ms
      • Both maps are transmitted over the network to the planner machine, merged together with the pose (GPS + IMU + wheels), used for path planning and control, and the resulting commands are sent to the actuators (wheels).

  12. Architecture: Implementation notes
      • CPU cycles: all cycles must be given to Fast-OD when it runs, to guarantee low latency. Possible solutions:
        • Use a real-time OS and give high priority to Fast-OD.
        • With a regular OS, give Fast-OD control over Far-OD: Fast-OD pauses Far-OD, runs, then sleeps for a bit and resumes Far-OD (sketched below).
        • Use a dual-core CPU.
      • Map merging: fast and far maps are merged before planning according to their respective poses.
      • 2-step planning: this architecture makes it easier to separate the planning algorithms suited to short and long range:
        • Fast-OD planning happens in Cartesian space and takes robot dynamics into account (more important at short range).
        • Far-OD planning happens in image space and uses regular path planning.
      [Diagram: dynamics planning in Cartesian space near the robot; long-range planning in image space out to infinity; distance markers at 5 m and 10 m]
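
A toy sketch of the second CPU-cycles solution (Fast-OD pausing and resuming Far-OD on a regular OS), with illustrative names and timings rather than the LAGR implementation:

```python
import threading
import time

# Far-OD only proceeds while a shared event is set; each Fast-OD cycle clears
# the event, does its low-latency work, sleeps briefly so the planner can
# consume the result, then lets Far-OD resume.  All values are illustrative.

far_od_may_run = threading.Event()
far_od_may_run.set()

def far_od_loop():
    while True:
        far_od_may_run.wait()    # paused whenever Fast-OD holds the CPU
        time.sleep(0.05)         # stand-in for one chunk of heavy processing

def fast_od_cycle():
    far_od_may_run.clear()       # pause Far-OD at its next chunk boundary
    time.sleep(0.02)             # stand-in for the fast, low-resolution pass
    time.sleep(0.01)             # short sleep so the planner can consume the map
    far_od_may_run.set()         # resume Far-OD

threading.Thread(target=far_od_loop, daemon=True).start()
for _ in range(3):
    fast_od_cycle()
    time.sleep(0.07)             # remainder of the ~100 ms Fast-OD period
```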

  13. Results: Timing measures
      • Fast-OD sensor latency: 190 ms
      • Fast-OD actuation latency: 250 ms
      • Fast-OD period (frequency): 100 ms (10 Hz)
      • Far-OD period (frequency): 370 ms (2-3 Hz)
      • Far-OD actuation latency: 700 ms

  14. Results
      [Images: vehicle map and global map of the test course]
      • Short- and long-range navigation test:
        • 1st obstacle appears quickly and suddenly to the robot → tests short-range navigation
        • Cul-de-sac → tests “long”-range navigation
      • The parallel architecture is consistently better at short- and long-range navigation than the series architecture or Fast-OD only.
      • Note: here Fast-OD has a 5 m radius and Far-OD a 15 m radius.

  15. Results: More recent results
      • Fast-OD + Far-OD in parallel: short-range navigation consistently successful:
        • 0 collisions over >5 runs; run finished in about 16 s along the shortest path
        • Fast-OD: 10 Hz, 250 ms latency, 3 m range; Far-OD: 3 Hz, 700 ms latency, 30 m range
        • Videos 1 and 2: collision-free bucket maze
      • Fast-OD + Far-OD in series: short-range navigation consistently failing:
        • >2 collisions over >5 runs; run finished in >40 s along a longer path
        • Fast-OD/Far-OD: 3 Hz, 700 ms latency, 3 m/30 m range (the frequency is acceptable but the latency is too high)
        • Videos 3, 4, 5: obstacle collisions due to high latency and long period

  16. Results: More recent results
      • Fast-OD + Far-OD in parallel: short-range navigation consistently successful:
        • 0 collisions over >5 runs
        • Fast-OD: 10 Hz, 250 ms latency, 3 m range; Far-OD: 3 Hz, 700 ms latency, 30 m range
      • Note: long-range planning is off, i.e. Far-OD is processing but ignored; only short-range navigation was tested here.
      • Video 6: natural obstacles; Video 7: tight maze of artificial obstacles

  17. Results: Moving obstacles
      • The system detects and avoids moving obstacles consistently.
      • Video 8: fast-moving obstacle

  18. Results: Beating humans
      • Autonomous short-range navigation is consistently better than inexperienced human drivers and equal to or better than experienced human drivers (driving with only the robot’s images would be even harder for a human).
      • Video 9: experienced human driver

  19. Results: Processing speed vs. vision range dilemma
      • We showed that processing speed prevails over vision range for short-range navigation, whereas vision range prevails over speed for long-range navigation.
      • Only a 3 m vision range was necessary to build collision-free short-range navigation for a 1.3 m/s non-holonomic vehicle (the arithmetic is worked out below):
        • Vehicle’s worst-case stopping delay: 1.0 s
        • System’s worst-case reaction time: 0.25 s latency + 0.1 s period
        • Worst-case reaction and stopping delay: 1.35 s (or 1.75 m at 1.3 m/s)
        • Only 1.0 s of anticipation was necessary in addition to the worst-case reaction and stopping delay.
      • A 15 m vision range with higher latency and lower frequency, running in parallel with the short-range module, consistently improved long-range navigation.
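
A small check of the arithmetic above, assuming distance is simply delay multiplied by the 1.3 m/s top speed:

```python
# Worked check of the vision-range budget: worst-case reaction plus stopping
# delay, converted to distance at top speed, plus 1.0 s of anticipation,
# lands at roughly the 3 m short-range radius.  All figures are from the slide.

SPEED_MPS      = 1.3
STOPPING_S     = 1.0     # vehicle's worst-case stopping delay
LATENCY_S      = 0.25    # Fast-OD actuation latency
PERIOD_S       = 0.10    # Fast-OD period (10 Hz)
ANTICIPATION_S = 1.0     # additional anticipation margin

reaction_and_stop_s = STOPPING_S + LATENCY_S + PERIOD_S        # 1.35 s
reaction_and_stop_m = reaction_and_stop_s * SPEED_MPS          # ~1.75 m
total_range_m = (reaction_and_stop_s + ANTICIPATION_S) * SPEED_MPS

print(f"{reaction_and_stop_s:.2f} s -> {reaction_and_stop_m:.2f} m; "
      f"with anticipation: {total_range_m:.2f} m (≈ 3 m range)")
```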

  20. Summary
      • We showed that both latency and frequency are critical in vision-based systems because of their higher processing times.
      • A simple, very low-resolution OD running in parallel with a high-resolution OD greatly increased the performance of a short- and long-range vision-based autonomous navigation system over the commonly used higher-resolution, sequential approaches: processing speed prevails over range in short-range navigation, and only 1.0 s of anticipation in addition to the dynamics and processing delays was necessary.
      • Additional key concepts such as dynamics modeling must be implemented to build a complete, successful end-to-end system.
      • A robust collision-free navigation platform, able to deal with moving obstacles and beat humans, was successfully built; it leaves enough CPU cycles available for computationally expensive algorithms.

  21. Questions?
