Design Exploration of a Human-machine Interface (HMI) Application

Presentation Transcript


  1. Design Exploration of a Human-machine Interface (HMI) Application • Francis Li • Sam Madden

  2. The Application • Data glove interface • Wired, bulky • SmartDust scenario • A mote on each fingertip • Investigate implementations • Explore design alternatives

  3. Proof-of-Concept Prototype • By SmartDust group • Atmel AVR Microprocessor • RFM TR1000 Radio • 6 accelerometers • Host PC performs processing • Analysis • Power: 45 mW measured • Continuous operation of processor, accelerometers, communication with host

  4. Application Analysis • Processing (on PC) • Do 20 times per second, for each accelerometer • Read in X and Y samples (10 bits each) • Compute rolling average to smooth input data • Convert averages to polar coordinates • Dominates cost: sqrt, acos, atan • Secondary cost: floating point operations • Periodically, calculate gesture via simple template matching (static hand positions)
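
The processing steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the prototype's code: the window size, the zero offset of 512 (the midpoint of a 10-bit reading), the use of `atan2` in place of the `acos`/`atan` calls the slide mentions, and the summed-absolute-error template metric are all assumptions, and every name is hypothetical.

```python
import math
from collections import deque


class AxisSmoother:
    """Rolling average over the last `window` raw samples of one axis."""

    def __init__(self, window=4):  # window size is an assumption
        self.samples = deque(maxlen=window)

    def update(self, raw):
        self.samples.append(raw)
        return sum(self.samples) / len(self.samples)


def to_polar(avg_x, avg_y, zero=512):
    """Convert smoothed X/Y readings to (magnitude, angle in radians).

    `zero` is an assumed zero-g offset for a 10-bit sample.
    """
    dx = avg_x - zero
    dy = avg_y - zero
    return math.hypot(dx, dy), math.atan2(dy, dx)


def match_gesture(angles, templates):
    """Return the template name with the smallest summed absolute angle error.

    Angle wraparound is ignored to keep the sketch short.
    """
    def error(template):
        return sum(abs(a - t) for a, t in zip(angles, template))

    return min(templates, key=lambda name: error(templates[name]))
```

In use, one `AxisSmoother` per axis per accelerometer would be updated 20 times per second, with `match_gesture` called only periodically on the latest angles.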

  5. Application Analysis (cont) • Communication (from Atmel to PC) • 20 samples/sec • 6 accelerometers • 4 bytes/sample → 480 bytes/sec • 115.6 kb/sec RF link • Radio = 12 mA @ 3 V, when transmitting → 1.2 mW for radio alone • Real-world power >> 1.2 mW, due to software and analog overhead (real-world analysis later)
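
The arithmetic behind these figures can be checked with a short script. It assumes the 1.2 mW figure is the transmit power scaled by the fraction of time the link is busy; the slide does not state this, but that reading reproduces the number. Constant names are illustrative.

```python
SAMPLES_PER_SEC = 20
ACCELEROMETERS = 6
BYTES_PER_SAMPLE = 4
LINK_BITS_PER_SEC = 115.6e3   # RF link rate as given on the slide
RADIO_CURRENT_A = 12e-3       # transmit current
SUPPLY_V = 3.0

# Data rate from the sensors to the host.
bytes_per_sec = SAMPLES_PER_SEC * ACCELEROMETERS * BYTES_PER_SAMPLE

# Fraction of each second the radio must spend transmitting (assumed model).
duty_cycle = bytes_per_sec * 8 / LINK_BITS_PER_SEC

# Power while transmitting, and the duty-cycled average, in mW.
tx_power_mw = RADIO_CURRENT_A * SUPPLY_V * 1e3
avg_radio_mw = tx_power_mw * duty_cycle

print(f"{bytes_per_sec} bytes/sec, duty cycle {duty_cycle:.1%}, "
      f"average radio power {avg_radio_mw:.2f} mW")
```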

  6. Optimization Process • Match Application to HW

  7. Optimization Process • Match Application to HW • Match Hardware to Application

  8. Optimization Process • Match Application to HW • Local computation to reduce communication • Match Hardware to Application

  9. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed point • Match Hardware to Application

  10. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed point • Match Hardware to Application • Distributed vs. Centralized

  11. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel

  12. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel • DSP

  13. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel • DSP
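
As an illustration of the floating point → fixed point step, the sketch below shows integer-only arithmetic of the kind a small microcontroller can run without a floating-point library. It is not taken from the presentation: the Q8 format (8 fractional bits) and all function names are assumptions chosen for the example.

```python
import math

FRAC_BITS = 8  # hypothetical Q-format: 8 fractional bits


def to_q(value):
    """Convert a float to fixed point (integer scaled by 2**FRAC_BITS)."""
    return int(round(value * (1 << FRAC_BITS)))


def q_to_float(q):
    """Convert a fixed-point integer back to a float, for inspection only."""
    return q / (1 << FRAC_BITS)


def q_mul(a, b):
    """Multiply two fixed-point values, rescaling the product."""
    return (a * b) >> FRAC_BITS


def magnitude_fixed(dx, dy):
    """Integer-only vector magnitude: floor(sqrt(dx**2 + dy**2))."""
    return math.isqrt(dx * dx + dy * dy)


def rolling_average_fixed(samples):
    """Average of exactly four integer samples using a shift, not a divide."""
    return sum(samples) >> 2
```

For example, `magnitude_fixed(300, 400)` gives `500` using only integer operations, where the floating-point version would call `sqrt`.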

  14. Communication vs. Computation • Estimates of local processing cost on Atmel (via simulation of GCC program) • Loop 6 × 20 / sec: Average: 2223 instr. × 2, CalcPolar: 19017 instr. → 2.83 × 10^6 instructions • Report gesture once per second: FindGestureError: 5444 instr., 10 gestures, 6 accelerometers → 5444 × 60 → 3.26 × 10^5 instr. • Memory operations are 2 cycles/instruction • Total cycles ~ 3.7M → 4 MHz → 13.5 mW • Communication = 8 bits/sec → negligible cost
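
The instruction counts on this slide can be recomputed directly. The script below uses only the slide's per-routine figures; it yields instruction totals that agree with the slide's to within rounding. It does not attempt to reproduce the ~3.7M cycle total, because the slide does not say how many instructions are 2-cycle memory operations.

```python
LOOPS_PER_SEC = 6 * 20            # 6 accelerometers, 20 samples/sec each
AVERAGE_INSTR = 2223              # run twice per loop (X and Y)
CALC_POLAR_INSTR = 19017
FIND_GESTURE_ERROR_INSTR = 5444
GESTURES = 10
ACCELEROMETERS = 6

# Per-sample work: two rolling averages plus one polar conversion.
per_loop = 2 * AVERAGE_INSTR + CALC_POLAR_INSTR
loop_instr_per_sec = per_loop * LOOPS_PER_SEC

# Once-per-second gesture match: one error computation per gesture per sensor.
gesture_instr_per_sec = FIND_GESTURE_ERROR_INSTR * GESTURES * ACCELEROMETERS

total_instr_per_sec = loop_instr_per_sec + gesture_instr_per_sec

print(f"sampling loop:  {loop_instr_per_sec:.3g} instr/sec")
print(f"gesture match:  {gesture_instr_per_sec:.3g} instr/sec")
print(f"total:          {total_instr_per_sec:.3g} instr/sec")
```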

  15. Communication vs. Computation 2 • Cost of communication to Host PC (measured) • 4317 nJ/bit • From Culler, Hill, Szewczyk, Woo, “System Architecture For Networked Sensors.” → 4317 nJ/bit × 480 bytes/sec × 8 = 16.57 mW • Processor still sucks power • Current implementation requires 13.5 mW • Using sleep, only 1.17 mW → 17.74 mW total
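
A quick check of the communication cost and the sleep-mode total, using the slide's own numbers. The total is taken here as communication plus the sleep-mode processor figure, which is how the slide's 17.74 mW appears to be formed.

```python
ENERGY_PER_BIT_NJ = 4317      # measured cost per transmitted bit
BYTES_PER_SEC = 480
PROCESSOR_SLEEP_MW = 1.17     # processor power when using sleep mode

bits_per_sec = BYTES_PER_SEC * 8

# nJ/sec is nW; divide by 1e6 to get mW.
comm_mw = ENERGY_PER_BIT_NJ * bits_per_sec / 1e6
total_mw = comm_mw + PROCESSOR_SLEEP_MW

print(f"communication: {comm_mw:.2f} mW, total with sleep: {total_mw:.2f} mW")
```

The result differs from the slide in the second decimal place only because the slide truncates 16.577 mW to 16.57 mW.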

  16. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel • DSP

  17. Distributed vs. Centralized • Move some processing to each sensor • 6 processors • Each computing average, polar transform • Transmitting 4 × 8 = 32 bits once/second • Using Atmel processor on each mote • Computation • ~0.5M cycles/sec → 2 mA @ 2.7 V → 5.4 mW • Communication • Very small: 4317 nJ × 32 = 0.13 mW • 5.53 mW/mote = 33.2 mW total (Bad Idea!)
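
The per-mote budget above works out as follows. The script uses only the slide's figures; small differences in the last digit come from the slide truncating 0.138 mW to 0.13 mW.

```python
MOTES = 6
CPU_CURRENT_A = 2e-3          # Atmel at ~0.5M cycles/sec
CPU_VOLTS = 2.7
ENERGY_PER_BIT_NJ = 4317
BITS_PER_SEC = 4 * 8          # 4 bytes sent once per second

compute_mw = CPU_CURRENT_A * CPU_VOLTS * 1e3
comm_mw = ENERGY_PER_BIT_NJ * BITS_PER_SEC / 1e6   # nW -> mW
per_mote_mw = compute_mw + comm_mw
total_mw = per_mote_mw * MOTES

print(f"per mote: {per_mote_mw:.2f} mW, six motes: {total_mw:.1f} mW")
```

Computation dominates each mote's budget, so replicating the Atmel six times costs more than the centralized 17.74 mW design even though communication has become nearly free.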

  18. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel • DSP

  19. TI Microcontroller Evaluation • A microcontroller with better specs • MSP430P112: 330 µA/MHz active mode, 1.5 µA standby (6 ns wakeup) • Used IAR Systems compiler, profiler, development environment • Analysis • Centralized, 3.3 V, 4 MHz: 3.8 mW • Distributed, 2.5 V, 1 MHz: 0.48 mW per mote • Six processors → 2.9 mW

  20. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel • DSP

  21. TI DSP Evaluation • TMS320C54x • Used TI Code Composer Studio, compiler, simulator • Power • Active mode, 3.3 V, 10 MHz: 33 mW • IDLE1: 0.36 mW • Analysis • Centralized: 7.8 mW • Distributed: 1.6 mW per mote • Six processors = 9.6 mW total

  22. TI DSP Evaluation Part 2 • TMS320C55x (two parallel MACs) • Same tools, with C55x compiler, simulator • Power: No details available... • Advertised: 0.9 V, 0.05 mW/MHz • Analysis • Centralized: 1170240 cycles (vs. 2290440 on C54x) • 2 MHz: 0.1 mW • Distributed: 195040 cycles (vs. 381740 on C54x) • 1 MHz: 0.05 mW • Six processors: 0.3 mW total
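
The C55x estimates follow from the advertised 0.05 mW/MHz figure and the simulated cycle counts. The sketch below assumes the cycle counts are per second of operation, which the slide does not state explicitly, and checks that the chosen clock rates are fast enough to fit them.

```python
MW_PER_MHZ = 0.05             # advertised figure, not measured
MOTES = 6

C54X_CYCLES = {"centralized": 2290440, "distributed": 381740}
C55X_CYCLES = {"centralized": 1170240, "distributed": 195040}
CLOCK_MHZ = {"centralized": 2, "distributed": 1}

for mode, mhz in CLOCK_MHZ.items():
    power_mw = MW_PER_MHZ * mhz
    speedup = C54X_CYCLES[mode] / C55X_CYCLES[mode]
    fits = C55X_CYCLES[mode] <= mhz * 1_000_000   # assumes cycles per second
    print(f"{mode}: {power_mw:.2f} mW at {mhz} MHz, "
          f"{speedup:.2f}x fewer cycles than C54x, fits clock: {fits}")

distributed_total_mw = MW_PER_MHZ * CLOCK_MHZ["distributed"] * MOTES
print(f"six distributed processors: {distributed_total_mw:.2f} mW")
```

In both configurations the C55x needs roughly half the cycles of the C54x, which is consistent with its two parallel MAC units.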

  23. Other Explorations • Hand optimized code • Possible to massively reduce computation cost • FP/Transcendentals conspicuously painful • Outside scope of our exploration • Radio Hardware • Bluetooth ~ 100 times more efficient • Reconfigurable Computing • Other circuitry (e.g. accelerometers)

  24. Results Summary • Cost, in mW, of various implementations • 17.74 using sleep mode, 28 without • 31/104% improvement with same hardware • 170x improvement with new hardware
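
The power figures quoted on the earlier slides can be collected and ranked as below. This is a rough comparison only: the slides do not say whether the TI figures include radio cost, so the ratios are indicative rather than exact. The dictionary labels are illustrative.

```python
BASELINE_MW = 17.74   # Atmel, centralized, using sleep mode

# Totals as quoted on the individual slides (six-processor totals for
# the distributed variants).
totals_mw = {
    "Atmel centralized (sleep)": 17.74,
    "Atmel distributed": 33.2,
    "MSP430 centralized": 3.8,
    "MSP430 distributed": 2.9,
    "C54x centralized": 7.8,
    "C54x distributed": 9.6,
    "C55x centralized": 0.1,
    "C55x distributed": 0.3,
}


def improvement(name):
    """Ratio of the Atmel sleep-mode baseline to the named configuration."""
    return BASELINE_MW / totals_mw[name]


best = min(totals_mw, key=totals_mw.get)

for name, mw in sorted(totals_mw.items(), key=lambda kv: kv[1]):
    print(f"{name:28s} {mw:6.2f} mW  {improvement(name):6.1f}x")
```

The best case, the centralized C55x, comes out at roughly 177 times lower power than the baseline, in line with the "170x" figure on the slide.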

  25. Conclusions • By finding better mappings from SW → HW → Application, big performance gains are possible. • Effective use of local processor resources can reduce communication overheads, which are significant. • DSPs and other specialized processors can be a big win and don’t require hand-coded assembly or reconfigurable design.
