
Making Mashups with Marmite

Jeff Wong, Jason I. Hong, Carnegie Mellon University


Presentation Transcript


  1. Making Mashups with Marmite • Jeff Wong • Jason I. Hong • Carnegie Mellon University

  2. The Big Picture Problem • Lots of content out there on the web • But not always in a form amenable to your needs • Ex. Easy to get a list of hotels in San Jose, not so easy to sort by distance to convention center • Two observations: • In many cases, all of the data and services people need already exist, but not connected together • Unlikely that a web site can predict all possible needs

  3. A Solution: Mashups • Rapidly growing community of users creating “mashups” combining content from multiple web sites • Ex. Housingmaps.com

  4. A Solution: Mashups • Rapidly growing community of users creating “mashups” combining content from multiple web sites • Ex. Housingmaps.com • Ex. MySpace child predators • Ex. Friendster locations • Ex. Most popular videos on YouTube, Yahoo Video, …

  5. A Solution: Mashups • Rapidly growing community of users creating “mashups” combining content from multiple web sites • Ex. Housingmaps.com • Ex. MySpace child predators • Ex. Friendster locations • Ex. Most popular videos on YouTube, Yahoo Video, … • ProgrammableWeb.com statistics • ~1500 mashups created since April 2005 • 356 open web-based APIs available

  6. But Creating Mashups is Hard • Requires lots of skill to create a mashup • Ex. Housingmaps creator has PhD in computer science • Ex. MySpace child predator list took months • Requires programming expertise in many areas • Web crawling • Text parsing • Pattern matching • Databases • HTML

  7. Marmite: End-User Programming for Mashups • Main idea: make it easy to create web mashups • Use a dataflow approach connecting small operators • Inspired by Unix pipes and Apple’s Automator • Example: • Get all events from Upcoming.org • Filter out events that are too old • Put them all onto a map • Runs inside of a standard web browser
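
The example flow on this slide (get events, filter out old ones, put the rest on a map) can be sketched as a chain of small functions. This is only an illustration in Python, not Marmite's own code: the event rows, the function names, and the "map" sink that just returns marker labels are all assumptions made for the sketch.

```python
from datetime import date
from functools import reduce

# Hypothetical event rows; a real source operator would fetch these from a site.
EVENTS = [
    {"name": "Jazz night", "date": date(2007, 4, 20), "address": "100 Main St"},
    {"name": "Book fair", "date": date(2007, 5, 2), "address": "5 Forbes Ave"},
    {"name": "Old concert", "date": date(2007, 1, 5), "address": "9 Elm St"},
]

def source_events():
    """Source operator: emits rows into the flow."""
    return list(EVENTS)

def filter_not_older_than(cutoff):
    """Processor operator factory: drops events dated before the cutoff."""
    def op(rows):
        return [r for r in rows if r["date"] >= cutoff]
    return op

def sink_map_markers(rows):
    """Sink operator: stands in for a map by returning marker labels."""
    return [f'{r["name"]} @ {r["address"]}' for r in rows]

def run_flow(source, *operators):
    """Feed the source's rows through each operator in order, like a pipe."""
    return reduce(lambda rows, op: op(rows), operators, source())

markers = run_flow(
    source_events,
    filter_not_older_than(date(2007, 4, 1)),
    sink_map_markers,
)
print(markers)
```

Each operator takes rows and returns rows, so operators can be reordered or swapped without touching the others, which is the property the Unix-pipe analogy relies on.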

  8. Set of Operators

  9. Data Flow View

  10. Data View

  11. Using Marmite (Envisioned) • Extract content from one or more web pages • names, addresses, dates, phone #, URLs • Process it in a data flow manner • filtering out values or adding metadata • integrating with other data sources (similar to a database join operation) • Direct the output to a variety of sinks • databases, map services, text files, visualizations, web pages, or source code that can be further edited
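
The "integrating with other data sources" step is compared above to a database join. A minimal sketch of that idea follows, using the hotel example from the start of the talk. The hotel names, the `miles_to_center` column, and the `join_rows` helper are hypothetical and exist only for illustration.

```python
def join_rows(left, right, key):
    """Inner join: pair each left row with right rows sharing the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

# Hypothetical rows extracted from two different sites.
hotels = [
    {"name": "Hotel A", "address": "1 First St"},
    {"name": "Hotel B", "address": "2 Second St"},
    {"name": "Hotel C", "address": "3 Third St"},
]
distances = [
    {"address": "1 First St", "miles_to_center": 2.1},
    {"address": "2 Second St", "miles_to_center": 0.4},
]

# Join on address, then sort by distance to the convention center.
nearest_first = sorted(
    join_rows(hotels, distances, "address"),
    key=lambda r: r["miles_to_center"],
)
print([r["name"] for r in nearest_first])
```

Because this is an inner join, Hotel C is dropped: it has no matching distance row.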

  12. Marmite • Motivation and Examples • Features and Design Rationale • User Evaluation

  13. Features and Design Rationale • Conducted a series of quick evaluations to understand design space and potential problems • Automator • Lo-fi prototypes

  14. Automator

  15. Informal Automator Evaluation • Had three novices try three simple web-based tasks • Warm-up task • Traverse a set of web pages • Download a set of images • Some findings: • Some difficulties knowing how to start and what to do next • Little feedback about state of system between operations • Difficult to iterate due to network speed issues

  16. Lo-Fi Prototypes • 6 paper prototypes with 20 participants

  17. Design Solutions • Problem: how to start and what to do next • Solution: Suggest next actions • Weak data typing to find types (addresses, numbers, etc) • Filter operators to only show relevant ones • Suggest operators that might be applicable
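
One way to picture weak data typing driving suggestions: guess a loose type for each column from its values, then show only operators whose input type is present. The regular expressions, the operator names, and the majority threshold below are assumptions for this sketch, not Marmite's actual rules.

```python
import re

# Illustrative patterns only; real address and date detection is much messier.
PATTERNS = {
    "number": re.compile(r"^-?\d+(\.\d+)?$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "address": re.compile(r"^\d+\s+\w+.*\b(St|Ave|Rd|Blvd)\b", re.IGNORECASE),
}

def guess_type(values):
    """Return the first type whose pattern matches most values, else 'text'."""
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v.strip()))
        if values and hits / len(values) > 0.5:
            return name
    return "text"

# Hypothetical operators, each keyed to the input type it needs.
OPERATORS = {
    "Geocode": "address",
    "Plot on map": "address",
    "Filter by date": "date",
    "Sum": "number",
}

def suggest_operators(columns):
    """List operators whose required input type appears among the columns."""
    types = {guess_type(vals) for vals in columns.values()}
    return sorted(op for op, needed in OPERATORS.items() if needed in types)

columns = {
    "where": ["100 Main St", "5 Forbes Ave"],
    "when": ["2007-04-20", "2007-05-02"],
    "title": ["Jazz night", "Book fair"],
}
print(suggest_operators(columns))
```

The typing is "weak" in the sense that a column only needs to look mostly like a type; a few odd values do not stop the suggestion.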

  18. Design Solutions • Problem: little feedback about state of system between operations • Solution: link data flow and data view together • Many systems take program-centric view (ex. Automator) or data-centric view (ex. spreadsheets) • Use hybrid data flow / data view, showing an operation and its effects together • Data view usually “spreadsheet”, other views possible too (for example, maps)

  19. Design Solutions • Problem: difficult to iterate due to network speeds • Solution: cache data, let people “replay” data • Reload, pause, play
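
The caching idea can be sketched as a wrapper that remembers an operator's last output, so "play" reuses it and only "reload" goes back to the network. The class and method names are hypothetical, and `slow_fetch` merely stands in for a network request.

```python
class CachedOperator:
    """Wraps a slow operator and remembers its last output for replay."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = None
        self.calls = 0  # counts how often the slow fetch actually ran

    def reload(self):
        """Run the slow fetch again and refresh the cache."""
        self.calls += 1
        self._cache = self._fetch()
        return self._cache

    def play(self):
        """Replay cached rows; fetch only if nothing has been cached yet."""
        if self._cache is None:
            return self.reload()
        return self._cache

def slow_fetch():
    # Stand-in for a network request.
    return [{"name": "Jazz night"}, {"name": "Book fair"}]

op = CachedOperator(slow_fetch)
first = op.play()   # triggers the one real fetch
second = op.play()  # served from the cache
print(op.calls)
```

With this shape, a user can iterate on downstream operators repeatedly while the upstream fetch runs only once.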

  20. Other Design Findings • Screen real estate issues • Collapsible operators, leaving a readable label

  21. Extracting Generic Content • Can’t have pre-defined extractor operators for every possible web site • Need a more general way of extracting data from pages • Developed a generic wizard UI for selecting links • Content from that set could be extracted via other operators • Uses Solvent (MIT), an XPath-based algorithm for finding patterns in web pages • Finds “groups” of related web content based on how HTML is structured
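
To illustrate what "groups of related content based on how HTML is structured" might mean, the sketch below groups links by the chain of tags that encloses them. This is a deliberately simplified stand-in and not Solvent's algorithm; the sample page, the `LinkGrouper` class, and the crude tag-path notation are assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class LinkGrouper(HTMLParser):
    """Groups link hrefs by the chain of enclosing tags (a crude XPath)."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.groups = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.groups["/" + "/".join(self.stack)].append(href)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag, tolerating unclosed children.
            while self.stack.pop() != tag:
                pass

PAGE = """
<html><body>
<div><a href="/about">About</a></div>
<ul>
<li><a href="/event/1">Jazz night</a></li>
<li><a href="/event/2">Book fair</a></li>
</ul>
</body></html>
"""

grouper = LinkGrouper()
grouper.feed(PAGE)
# The largest group is a reasonable first guess at "the list the user wants".
largest = max(grouper.groups.values(), key=len)
print(largest)
```

Links that share a path, such as the two list items here, are likely to be items of the same kind, which is what a wizard could offer the user as a selectable set.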

  22. Marmite

  23. Operators • Operators have input types • Operator uses this to guess which columns it wants • Operators have output types
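
A small sketch of how an operator's declared input type could be used to guess which column it wants. The `Operator` class, the type names, and the "first matching column" rule are assumptions for illustration; the slides do not specify the actual matching rule.

```python
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    input_type: str
    output_type: str

    def pick_column(self, column_types):
        """Return the first column whose type matches this operator's input type."""
        for column, col_type in column_types.items():
            if col_type == self.input_type:
                return column
        return None

# Hypothetical operators and column types.
geocoder = Operator("Geocode", input_type="address", output_type="latlon")
plotter = Operator("Plot on map", input_type="latlon", output_type="map")
column_types = {"title": "text", "where": "address", "when": "date"}

chosen = geocoder.pick_column(column_types)
print(chosen)
```

Here the plotter finds no usable column until the geocoder has run and contributed a `latlon` column, which shows how output types feed the next operator's input guess.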

  24. Implementation • JavaScript (for underlying code) and Extensible Binding Language (XBL for UI) • Operators currently in JavaScript • Ideally could be scriptable in any programming language • Currently ~15 operators

  25. Marmite • Motivation and Examples • Features and Design Rationale • User Evaluation

  26. Evaluation • Informal user study with 6 people • 2 novices • 2 people with spreadsheet experience (formulas) • 2 people with programming experience • Tasks (in increasing difficulty) • Warmup task showing how to retrieve a set of addresses and how to geocode an address • Search for and filter out events further than a week away • Compile a list of events from two event services and plot them on a map • Recreate the housingmaps site

  27. Results • Three people able to complete all tasks in ~1 hour • First two users confused about suggested actions (automatically popped up, made manual for other 4 users) • Novice made some progress, not able to finish all tasks • Able to re-create housingmaps in ~15 minutes

  28. Marmite

  29. More Results • Biggest barrier was understanding the data flow • Did not understand input and output concept • Applied operators as one-off, did not realize that it was a static representation of flow • Did not understand data flow and data view were linked

  30. Future Directions • Short-term • Better screen-scraping operators • More operators • Better connection with web services (WSDL and REST) • Better help for starting a data flow • Long-term • Intelligence analysis • Better visualizations • Location-based services

  31. Conclusions • Marmite, a tool for creating web-based mashups • Extract content from one or more web pages • Process it in a data flow manner • Direct the output to a variety of sinks • Hybrid data flow / data view • User evaluation shows some promising results • Reference: Jeff Wong and Jason Hong, “Making Mashups with Marmite: Re-purposing Web Content through End-User Programming,” CHI 2007

  32. Marmite

  33. Types of Operators • Sources • Add data into Marmite by querying databases, extracting information from web pages, and so on • Processors • Modify, combine, or delete existing rows. Example operators include geocoding (converting street addresses to latitude and longitude) and filtering. Processor operators might add or remove columns as well • Sinks • Redirect the flow of data out of Marmite. Examples include showing data on a map, or saving it to a file or a web page
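
The three operator types can be sketched over a simple table of rows: a source adds rows, a processor adds columns and may drop rows, and a sink sends the result out. Everything named here is hypothetical; in particular, `FAKE_GEOCODER` is a hard-coded lookup table standing in for a real geocoding service, and its coordinates are made up.

```python
import csv
import io

# Hypothetical lookup table standing in for a real geocoding service.
FAKE_GEOCODER = {
    "100 Main St": (40.44, -79.99),
    "5 Forbes Ave": (40.43, -79.95),
}

def source_addresses():
    """Source: adds rows to the flow."""
    return [
        {"address": "100 Main St"},
        {"address": "5 Forbes Ave"},
        {"address": "Unknown Rd"},
    ]

def geocode(rows):
    """Processor: adds lat/lon columns and drops rows it cannot resolve."""
    out = []
    for row in rows:
        coords = FAKE_GEOCODER.get(row["address"])
        if coords is not None:
            out.append({**row, "lat": coords[0], "lon": coords[1]})
    return out

def sink_csv(rows):
    """Sink: sends rows out of the flow, here as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["address", "lat", "lon"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = sink_csv(geocode(source_addresses()))
print(csv_text)
```

Swapping `sink_csv` for a map or web-page sink would leave the source and processor untouched, which is the point of separating the three roles.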
