Rapid End-User Programming and Visualization for the Web
Prof. Jason Hong, Carnegie Mellon University

Presentation Transcript


  1. Prof. Jason Hong, Carnegie Mellon University. Rapid End-User Programming and Visualization for the Web. IDA Session 5, 2007 CS Study Panel, 24 April 2008

  2. Principal Investigator • Jason Hong, Assistant Professor, Human-Computer Interaction Institute, Carnegie Mellon University (PhD: University of California, Berkeley) • Research Areas • End-User Programming: extracting and visualizing data from the web • Usable Privacy and Security: anti-phishing (training, detection), managing privacy and security policies • Mobile Computing: location-based services, context-aware computing • Potential Military Applications • Tools for rapidly integrating data and web services • Better visualizations of large data sets • Effective training for security • Automated algorithms for detecting phishing scams • Better interfaces for managing security • Contact Information: School of Computer Science, Carnegie Mellon University, 2504D Newell-Simon Hall, 5000 Forbes Ave; Tel: (412) 268 1251; Fax: (412) 268 1266; E-mail: jasonh@cs.cmu.edu; Web: http://www.cs.cmu.edu/~jasonh

  3. 30,000 Foot View • High-level problems observed: • Stovepipes - Data and services spread over multiple systems • Agility - Integration takes months or years • Overload - Too much information to easily process • Goal: Make it easy for people to visualize and process data gathered from a variety of sources • Information extraction + visualization + machine learning • No PhD required • Analogies: • Spreadsheets • Visual Basic

  4. Mashups as Key Focus Area • More specifically, provide an end-user programming tool that makes it easy to create mashups • Mashups are applications that combine content and services from multiple web sites • Ex. Craigslist.com + Google Maps = Housingmaps.com
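As a rough illustration of what a mashup like Housingmaps does under the hood, here is a minimal Python sketch that joins listing records with geocoded coordinates to produce map-ready points. The listings and the geocode function are invented stand-ins, not data or code from the actual site.

```python
# A minimal sketch of the "listings + map" mashup idea. A real mashup would
# scrape craigslist and call a mapping API such as Google Maps; here both
# sides are hypothetical stand-ins.

# Hypothetical scraped listings; a real tool would extract these from web pages.
listings = [
    {"title": "2BR apartment", "price": 1200, "address": "5000 Forbes Ave, Pittsburgh, PA"},
    {"title": "Studio near campus", "price": 800, "address": "4400 Fifth Ave, Pittsburgh, PA"},
]

def geocode(address):
    """Hypothetical geocoder; a real mashup would call a mapping web service."""
    fake_coords = {
        "5000 Forbes Ave, Pittsburgh, PA": (40.4443, -79.9436),
        "4400 Fifth Ave, Pittsburgh, PA": (40.4484, -79.9511),
    }
    return fake_coords.get(address)

# Combine the two "sites": attach map coordinates to each listing,
# producing the kind of merged records a Housingmaps-style page plots.
map_points = []
for item in listings:
    coords = geocode(item["address"])
    if coords is not None:
        map_points.append({**item, "lat": coords[0], "lng": coords[1]})

for point in map_points:
    print(f'{point["title"]} (${point["price"]}) -> {point["lat"]}, {point["lng"]}')
```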

  5. Other Example Mashups • Other example mashups • Ex. MySpace child predators • Ex. Locations of friends on MySpace or Facebook • Common themes • Aggregating multiple sources (web pages, databases, etc) • Handling multiple data formats (not designed to be shared) • Processing the data (filtering, summarizing, etc) • Supporting multiple forms of output (graphs, maps, lists)

  6. Creating Mashups is Difficult • Requires lots of skill to create a mashup • Ex. Housingmaps creator has PhD in computer science • Ex. MySpace predator list took months of custom coding • Requires programming expertise in many areas • Web crawling • Text parsing and pattern matching • Web services (WSDL and REST) • Databases • HTML • Can we accelerate this process to a matter of days or hours for non-experts?

  7. End-User Programming • Haggis, an end-user programming tool • Rapidly extract and combine data from multiple sources • Quickly create high-quality interfaces and visualizations • Use programming-by-example techniques to specify what is normal and what is anomalous

  8. 1. Extract data from multiple sources • Improved wizards for extracting data from web pages • Can specify example of desired links, system generalizes

  9. 1. Extract data from multiple sources • Improved wizards for extracting data from web pages • Can specify an example of the desired links, and the system generalizes • Better support for other patterns on the web • Tables, street addresses, etc • Support for real-time data • Weather, traffic, stocks, any web page that is periodically updated • Sensor Andrew, a sensor network being deployed at CMU • Electrical usage, water usage, etc
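The Python sketch below illustrates the specify-one-example-and-generalize idea in the simplest possible way: the "generalization" is just treating every link that shares the example's URL path prefix as a match. This is an assumed stand-in for the wizard's actual generalization logic, and the page fragment is invented.

```python
# A minimal sketch of "give one example link, the system generalizes".
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect all href values from anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def generalize_from_example(page_html, example_url):
    """Return every link whose URL path looks like the user's example."""
    prefix = urlparse(example_url).path.rsplit("/", 1)[0] + "/"
    collector = LinkCollector()
    collector.feed(page_html)
    return [u for u in collector.links if urlparse(u).path.startswith(prefix)]

# Hypothetical page: the user points at one listing link as the example,
# and the extractor returns all structurally similar listing links.
page = """
<a href="/apa/123.html">2BR apartment</a>
<a href="/apa/456.html">Studio near campus</a>
<a href="/about/help.html">help</a>
"""
print(generalize_from_example(page, "/apa/123.html"))
# -> ['/apa/123.html', '/apa/456.html']
```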

  10. 2. Interfaces and Visualizations • Wizards for supporting common UI patterns • Table views, maps, graph views, alerts, etc • Programming-by-example techniques

  11. 2. Interfaces and Visualizations • Output as a web page or desktop widget • Yahoo Widgets, Google Desktop, Windows Sidebar

  12. 2. Interfaces and Visualizations • Output as a web page or desktop widget • Yahoo Widgets, Google Desktop, Windows Sidebar
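A minimal sketch of the "output as a web page" step, assuming the records have already been extracted: it renders them as a plain HTML table, the simplest of the UI patterns listed above. A desktop-widget target (Yahoo Widgets, Google Desktop, Windows Sidebar) would just swap in a different output template.

```python
# Render extracted records as a simple HTML table view and save it as a page.
import html

# Hypothetical extracted records.
records = [
    {"title": "2BR apartment", "price": 1200},
    {"title": "Studio near campus", "price": 800},
]

def table_view(rows):
    """Render a list of dicts as an HTML table (the simplest UI pattern)."""
    headers = list(rows[0].keys())
    head = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(r[h]))}</td>" for h in headers) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"

with open("mashup_output.html", "w") as f:
    f.write(table_view(records))
```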

  13. 3. Normal versus Anomalous • Problem: too much data gets dropped on the floor • Solution: “Teach” the system what patterns to look for • Analyst-in-the-loop: infoviz + machine learning • Long-term goal • Example: • eBay “penny sellers”: could create custom software, but slow • Analyst uses visualization to find some examples of penny sellers and gives hints to the system as to why • System finds more suspects, analyst gives relevance feedback • As new data streams in, system can flag suspects • Can help address high turnover rate at intelligence agencies, loss of organizational memory
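To make the analyst-in-the-loop idea concrete, here is a small Python sketch built around the penny-seller example. The feature, the sample sellers, and the one-feature threshold rule are all invented stand-ins; the point is only the loop of labeling a few examples and then flagging new suspects as data streams in.

```python
# A minimal sketch of analyst-in-the-loop anomaly flagging.

def penny_fraction(seller):
    """Feature: what share of a seller's items are listed for one cent."""
    prices = seller["prices"]
    return sum(1 for p in prices if p == 0.01) / len(prices)

# Hypothetical sellers the analyst spotted in a visualization and labeled.
labeled = [
    ({"name": "seller_a", "prices": [0.01, 0.01, 0.01, 5.00]}, True),   # suspect
    ({"name": "seller_b", "prices": [12.50, 3.99, 8.00]},      False),  # normal
]

# "Learn" the simplest possible rule: a cutoff halfway between the highest
# normal score and the lowest suspect score seen so far.
suspect_scores = [penny_fraction(s) for s, is_suspect in labeled if is_suspect]
normal_scores = [penny_fraction(s) for s, is_suspect in labeled if not is_suspect]
threshold = (min(suspect_scores) + max(normal_scores)) / 2

# New data streams in; the system flags likely suspects for analyst review,
# and the analyst's relevance feedback would feed back into `labeled`.
incoming = [
    {"name": "seller_c", "prices": [0.01, 0.01, 0.01]},
    {"name": "seller_d", "prices": [20.00, 15.00]},
]
for seller in incoming:
    if penny_fraction(seller) >= threshold:
        print(f'flag for review: {seller["name"]}')
```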

  14. Current Progress • First round of interviews completed • Sensor Andrew team (Civil and Electrical Engineers) • Mashup Camp • Programmers around CMU • Initial prototype of “plumbing” in progress • An Integrated Development Environment (IDE) for programmers, to facilitate extraction and visualization of data • Low-level support for extracting data from tables, basic visualizations, etc • Higher-level tools later to be built on top • First round of user tests planned for August

  15. Past Work with Marmite • Wizard for extracting data from arbitrary web pages • Combine operators together in a dataflow (like Unix pipes) • View the data in multiple ways (table, map)

  16. How Marmite Works • Wizard for getting data from web pages • Combine operators together in a dataflow (like Unix pipes) • View the data in multiple ways (table, map)

  17. How Marmite Works • Operators let you know what operations can be done • Input, processing, output

  18. How Marmite Works • Operators are chained together in a dataflow (like Unix pipes)
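A minimal sketch of the dataflow idea, with hypothetical operators standing in for Marmite's built-in ones: each operator consumes and produces a stream of rows, and chaining them works like a Unix pipeline of input, processing, and output steps.

```python
# Operators as small functions chained like Unix pipes over a stream of rows.

def source_listings():
    """Input operator: emit rows (in the tool, scraped from a web page)."""
    yield {"title": "2BR apartment", "price": 1200}
    yield {"title": "Studio near campus", "price": 800}
    yield {"title": "Luxury loft", "price": 2400}

def filter_by_price(rows, max_price):
    """Processing operator: keep only rows under a price cap."""
    for row in rows:
        if row["price"] <= max_price:
            yield row

def table_output(rows):
    """Output operator: show the current data, here as plain text."""
    for row in rows:
        print(f'{row["title"]:25s} ${row["price"]}')

# Chain the operators: source | filter | output.
table_output(filter_by_price(source_listings(), max_price=1500))
```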

  19. How Marmite Works • Current data is shown

  20. How Marmite Works • And multiple views too

  21. How Marmite Works • A wizard UI for helping people get the data they want

  22. Some High-Level Design Issues • Centralized model • Clean data model: well-managed, well-formatted, common representations, well-known databases, etc • Decentralized model • “Anarchic”, multiple data formats in multiple places • Hard to get lots of people to agree on data format and representation • More likely scenario (look at how databases are used today) • Haggis is being designed for this model, assuming that a person may have to clean up the data and resolve formats
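A small sketch of the cleanup step the decentralized model assumes: the same kind of record arrives from different sources in different shapes, and a short normalization pass maps them onto one shared representation. The source names and formats below are invented for illustration.

```python
# Resolve heterogeneous source formats into a common representation.
from datetime import datetime

def normalize(record, source):
    """Map a source-specific record onto a shared schema."""
    if source == "site_a":
        # e.g. {"Title": "...", "Posted": "04/24/2008", "Price": "$1,200"}
        return {
            "title": record["Title"],
            "date": datetime.strptime(record["Posted"], "%m/%d/%Y").date(),
            "price": float(record["Price"].lstrip("$").replace(",", "")),
        }
    if source == "site_b":
        # e.g. {"name": "...", "posted_on": "2008-04-24", "usd": 800}
        return {
            "title": record["name"],
            "date": datetime.strptime(record["posted_on"], "%Y-%m-%d").date(),
            "price": float(record["usd"]),
        }
    raise ValueError(f"unknown source: {source}")

rows = [
    normalize({"Title": "2BR apartment", "Posted": "04/24/2008", "Price": "$1,200"}, "site_a"),
    normalize({"name": "Studio near campus", "posted_on": "2008-04-20", "usd": 800}, "site_b"),
]
print(rows)
```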

  23. Other High-Level Design Issues • Discovery • What data sources are available? • May need some kind of centralized store that describes these (sort of like DNS for the Internet) • Security • Access control: who can access what data sources? • This is a general problem with sensor data • Privacy • What kinds of queries / apps should people be able to do? • Unclear how to restrict those in practice
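As a rough sketch of the discovery idea (a centralized store describing available sources, loosely analogous to DNS) combined with a simple access check, the registry, its entries, and the role names below are all hypothetical.

```python
# A toy registry of data sources with a lookup that also checks access.
REGISTRY = {
    "cmu.sensor_andrew.electrical": {
        "url": "http://example.org/sensors/electrical",
        "format": "xml",
        "access": ["facilities", "researchers"],
    },
    "weather.pittsburgh": {
        "url": "http://example.org/weather/pit",
        "format": "json",
        "access": ["public"],
    },
}

def lookup(name, requester_role):
    """Resolve a source name and apply a simple access-control check."""
    entry = REGISTRY.get(name)
    if entry is None:
        raise KeyError(f"no such data source: {name}")
    if requester_role not in entry["access"] and "public" not in entry["access"]:
        raise PermissionError(f"{requester_role} may not read {name}")
    return entry["url"], entry["format"]

print(lookup("weather.pittsburgh", "analyst"))
```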
