
Making Mashups with Marmite

Jeff Wong, Jason I. Hong
Carnegie Mellon University


The Big Picture Problem

  • Lots of content out there on the web

    • But not always in a form amenable to your needs

    • Ex. Easy to get a list of hotels in San Jose, not so easy to sort by distance to convention center

  • Two observations:

    • In many cases, all of the data and services people need already exist, but not connected together

    • Unlikely that a web site can predict all possible needs


A Solution: Mashups

  • Rapidly growing community of users creating “mashups” combining content from multiple web sites

    • Ex. Housingmaps.com

    • Ex. MySpace child predators

    • Ex. Friendster locations

    • Ex. Most popular videos on YouTube, Yahoo Video, …

  • ProgrammableWeb.com statistics

    • ~1500 mashups created since April 2005

    • 356 open web-based APIs available


But Creating Mashups is Hard

  • Requires lots of skill to create a mashup

    • Ex. Housingmaps creator has PhD in computer science

    • Ex. MySpace child predator list took months

  • Requires programming expertise in many areas

    • Web crawling

    • Text parsing

    • Pattern matching

    • Databases

    • HTML


Marmite: End-User Programming for Mashups

  • Main idea: make it easy to create web mashups

  • Use a dataflow approach connecting small operators

    • Inspired by Unix pipes and Apple’s Automator

  • Example:

    • Get all events from Upcoming.org

    • Filter out events that are too old

    • Put them all onto a map

  • Runs inside of a standard web browser
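
The example flow above can be sketched as a chain of small functions, in the spirit of a Unix pipe. This is only an illustration, not Marmite's implementation: `get_events`, `drop_old_events`, `to_map_markers`, and `run_flow` are hypothetical names, and the event rows are made-up sample data standing in for a real query.

```python
from datetime import date

def get_events():
    """Hypothetical source: a real one would query an events site."""
    return [
        {"name": "Jazz night", "date": date(2007, 4, 20), "address": "500 Forbes Ave, Pittsburgh, PA"},
        {"name": "Old meetup", "date": date(2007, 1, 5), "address": "4400 Fifth Ave, Pittsburgh, PA"},
    ]

def drop_old_events(rows, today):
    """Processor: keep only events on or after `today`."""
    return [row for row in rows if row["date"] >= today]

def to_map_markers(rows):
    """Sink stand-in: build marker labels instead of drawing a real map."""
    return [f'{row["name"]} @ {row["address"]}' for row in rows]

def run_flow(source, steps):
    """Feed the source's rows through each step in order."""
    data = source()
    for step in steps:
        data = step(data)
    return data

markers = run_flow(
    get_events,
    [lambda rows: drop_old_events(rows, date(2007, 4, 1)), to_map_markers],
)
print(markers)
```

Each step only sees the rows handed to it by the previous step, which is what lets small operators be recombined freely.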





Using Marmite (Envisioned)

  • Extract content from one or more web pages

    • names, addresses, dates, phone #, URLs

  • Process it in a data flow manner

    • filtering out values or adding metadata

    • integrating with other data sources (similar to a database join operation)

  • Direct the output to a variety of sinks

    • databases, map services, text files, visualizations, web pages, or source code that can be further edited
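
The "similar to a database join" step can be illustrated with a small inner join over rows from two sources. A minimal sketch under stated assumptions: `join_rows` is a hypothetical helper, and the hotel and shuttle rows are invented sample data.

```python
def join_rows(left, right, key):
    """Inner join: pair each left row with every right row sharing `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

# Invented sample rows from two imagined sources.
hotels = [
    {"hotel": "Hotel A", "zip": "95113"},
    {"hotel": "Hotel B", "zip": "95110"},
]
shuttles = [{"zip": "95113", "shuttle": "Route 1"}]

combined = join_rows(hotels, shuttles, "zip")
print(combined)
```

Rows without a partner in the other source are dropped here; an outer join would keep them with empty columns instead.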


Marmite

  • Motivation and Examples

  • Features and Design Rationale

  • User Evaluation


Features and Design Rationale

  • Conducted a series of quick evaluations to understand design space and potential problems

    • Automator

    • Lo-fi prototypes



Informal Automator Evaluation

  • Had three novices try three simple web-based tasks

    • Warm-up task

    • Traverse a set of web pages

    • Download a set of images

  • Some findings:

    • Some difficulties knowing how to start and what to do next

    • Little feedback about state of system between operations

    • Difficult to iterate due to network speed issues


Lo-Fi Prototypes

  • 6 paper prototypes with 20 participants


Design Solutions

  • Problem: how to start and what to do next

  • Solution: Suggest next actions

    • Weak data typing to find types (addresses, numbers, etc.)

    • Filter operators to only show relevant ones

    • Suggest operators that might be applicable
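
One way to picture weak typing and operator suggestion is to guess each column's type from its values and then list the operators that accept one of those types. The patterns, the 50% threshold, and the operator catalog below are assumptions made for this sketch; the slides do not describe Marmite's actual rules.

```python
import re

# Illustrative patterns only; real address detection is much harder.
TYPE_PATTERNS = [
    ("number", re.compile(r"^-?\d+(\.\d+)?$")),
    ("url", re.compile(r"^https?://\S+$")),
    ("address", re.compile(r"^\d+\s+\w+.*\b(St|Ave|Blvd|Rd)\b", re.IGNORECASE)),
]

def guess_type(values):
    """Return the first type matching more than half the values, else 'text'."""
    for name, pattern in TYPE_PATTERNS:
        hits = sum(1 for v in values if pattern.match(v.strip()))
        if values and hits / len(values) > 0.5:
            return name
    return "text"

# Hypothetical catalog: operator name -> input type it accepts.
OPERATORS = {"Geocode": "address", "Filter by value": "number", "Fetch page": "url"}

def suggest_operators(columns):
    """List operators whose input type matches some column's guessed type."""
    types = {guess_type(vals) for vals in columns.values()}
    return sorted(op for op, needs in OPERATORS.items() if needs in types)

table = {
    "where": ["5000 Forbes Ave", "4400 Fifth Ave", "downtown"],
    "price": ["12", "8.50", "20"],
}
print(suggest_operators(table))
```

The typing is "weak" in that one stray value ("downtown") does not stop the column from being treated as addresses.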


Design Solutions

  • Problem: little feedback about state of system between operations

  • Solution: link data flow and data view together

    • Many systems take program-centric view (ex. Automator) or data-centric view (ex. spreadsheets)

    • Use hybrid data flow / data view, showing an operation and its effects together

    • Data view usually “spreadsheet”, other views possible too (for example, maps)
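
The hybrid idea can be sketched by keeping the table produced after every operator, so that each operation is shown together with its effect. `run_with_views` and `show` are hypothetical helpers, and the event rows are sample data; this is not how Marmite itself renders its views.

```python
def run_with_views(rows, operators):
    """Apply (label, function) operators in order, keeping every intermediate table."""
    views = [("input", rows)]
    for label, func in operators:
        rows = func(rows)
        views.append((label, rows))
    return views

def show(views):
    """Print each operator label next to the rows it produced."""
    for label, rows in views:
        print(f"[{label}] {len(rows)} row(s)")
        for row in rows:
            print("   ", row)

events = [{"name": "Concert", "days_away": 3}, {"name": "Fair", "days_away": 12}]
views = run_with_views(
    events,
    [
        ("keep events within a week", lambda rs: [r for r in rs if r["days_away"] <= 7]),
        ("add label", lambda rs: [{**r, "label": r["name"].upper()} for r in rs]),
    ],
)
show(views)
```

Because every intermediate table is retained, a user can see which operator removed a row or added a column.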


Design Solutions

  • Problem: difficult to iterate due to network speeds

  • Solution: cache data, let people “replay” data

    • Reload, pause, play
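
A rough sketch of the caching idea: fetch once, serve later runs from the stored rows, and step through them on demand. `CachedSource` and `slow_fetch` are hypothetical names introduced for illustration; the slides do not say how Marmite stores cached data.

```python
class CachedSource:
    """Wrap a slow fetch function; fetch once, then replay the stored rows."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._rows = None
        self.fetch_count = 0

    def rows(self):
        """Return cached rows, fetching only if nothing is cached yet."""
        if self._rows is None:
            self.reload()
        return list(self._rows)

    def reload(self):
        """Discard the cache and fetch fresh rows."""
        self._rows = list(self._fetch())
        self.fetch_count += 1

    def replay(self):
        """Yield cached rows one at a time, so a caller can pause between rows."""
        yield from self.rows()

def slow_fetch():
    """Stand-in for a network request."""
    return [{"title": "Event A"}, {"title": "Event B"}]

source = CachedSource(slow_fetch)
first = source.rows()
second = source.rows()       # served from the cache, no second fetch
stepper = source.replay()
one = next(stepper)          # "play" one row, then pause
```

Here `reload` corresponds to refetching, while pulling from the generator one row at a time plays the role of pause and play.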


Other Design Findings

  • Screen real estate issues

    • Collapsible operators, leaving a readable label


Extracting Generic Content

  • Can’t have pre-defined extractor operators for every possible web site

    • Need a more general way of extracting data from pages

  • Developed a generic wizard UI for selecting links

    • Content from that set could be extracted via other operators

    • Uses Solvent (MIT), an XPath-based algorithm for finding patterns in web pages

      • Finds “groups” of related web content based on how HTML is structured
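
To give a feel for grouping content by HTML structure, the sketch below groups links that share the same chain of enclosing tags. It is a simplified stand-in written with Python's standard `html.parser`, not Solvent's algorithm and not real XPath evaluation; `LinkGrouper` and the sample page are assumptions for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no end tag

class LinkGrouper(HTMLParser):
    """Group <a> hrefs by the chain of tag names leading to them."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.groups = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.groups["/".join(self.stack)].append(href)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

PAGE = """
<div id="nav"><a href="/home">Home</a></div>
<ul>
  <li><a href="/event/1">Concert</a></li>
  <li><a href="/event/2">Fair</a></li>
  <li><a href="/event/3">Talk</a></li>
</ul>
"""

grouper = LinkGrouper()
grouper.feed(PAGE)
largest = max(grouper.groups.values(), key=len)
print(largest)
```

Links with the same structural path (`ul/li/a` here) end up in one group, which a wizard could then offer to the user as a candidate set.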



Operators

  • Operators have input types

    • Operator uses this to guess which columns it wants

  • Operators have output types
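
A small sketch of how declared input types could drive column guessing: for each input type, the operator takes the first unused column of that type. The `Operator` record, `pick_columns`, and the column names are hypothetical and only illustrate the idea on this slide.

```python
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    input_types: list   # types the operator consumes, in order
    output_types: list  # types of the columns it would append

def pick_columns(operator, column_types):
    """For each input type, pick the first unused column of that type (or None)."""
    chosen, used = [], set()
    for wanted in operator.input_types:
        match = next(
            (col for col, typ in column_types.items() if typ == wanted and col not in used),
            None,
        )
        if match is not None:
            used.add(match)
        chosen.append(match)
    return chosen

geocode = Operator("Geocode", ["address"], ["latitude", "longitude"])
columns = {"name": "text", "where": "address", "price": "number"}
print(pick_columns(geocode, columns))
```

A `None` in the result marks an input the operator could not fill automatically, which is where a user would have to choose a column by hand.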


Implementation

  • JavaScript (for underlying code) and Extensible Binding Language (XBL) for the UI

  • Operators currently in JavaScript

    • Ideally could be scriptable in any programming language

    • Currently ~15 operators


Marmite

  • Motivation and Examples

  • Features and Design Rationale

  • User Evaluation


Evaluation

  • Informal user study with 6 people

    • 2 novices

    • 2 people with spreadsheet experience (formulas)

    • 2 people with programming experience

  • Tasks (in increasing difficulty)

    • Warmup task showing how to retrieve a set of addresses and how to geocode an address

    • Search for and filter out events further than a week away

    • Compile a list of events from two event services and plot them on a map

    • Recreate the housingmaps site


Results

  • Three people able to complete all tasks in ~1 hour

    • First two users confused about suggested actions (automatically popped up, made manual for other 4 users)

    • Novice made some progress, not able to finish all tasks

  • Able to re-create housingmaps in ~15 minutes



More Results

  • Biggest barrier was understanding the data flow

    • Did not understand input and output concept

    • Applied operators as one-off, did not realize that it was a static representation of flow

    • Did not understand data flow and data view were linked


Future Directions

  • Short-term

    • Better screen-scraping operators

    • More operators

    • Better connection with web services (WSDL and REST)

    • Better help for starting a data flow

  • Long-term

    • Intelligence analysis

    • Better visualizations

    • Location-based services


Conclusions

  • Marmite, a tool for creating web-based mashups

    • Extract content from one or more web pages

    • Process it in a data flow manner

    • Direct the output to a variety of sinks

  • Hybrid data flow / data view

  • User evaluation shows some promising results

    Jeff Wong, Jason Hong, Making Mashups with Marmite: Re-purposing Web Content through End-User Programming, CHI 2007



Types of Operators

  • Sources

    • Add data into Marmite by querying databases, extracting information from web pages, and so on.

  • Processors

    • Modify, combine, or delete existing rows. Example operators include geocoding (converting street addresses to latitude and longitude) and filtering. Processor operators might add or remove columns as well.

  • Sinks

    • Redirect the flow of data out of Marmite. Examples include showing data on a map, saving it to a file, or publishing it to a web page.
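
The three operator types can be illustrated with a processor that adds columns and a sink that writes rows out as CSV text. Everything here is an assumption for the sketch: `FAKE_GEOCODER` holds placeholder coordinates rather than real geocoding results, and the function names are hypothetical.

```python
import csv
import io

# Placeholder coordinates for illustration; a real geocoder would call a service.
FAKE_GEOCODER = {"5000 Forbes Ave": (40.44, -79.94)}

def source():
    """Source: add rows into the flow."""
    return [
        {"name": "Talk", "address": "5000 Forbes Ave"},
        {"name": "Fair", "address": "unknown"},
    ]

def geocode(rows):
    """Processor: add lat/lon columns, dropping rows that cannot be geocoded."""
    out = []
    for row in rows:
        coords = FAKE_GEOCODER.get(row["address"])
        if coords:
            out.append({**row, "lat": coords[0], "lon": coords[1]})
    return out

def csv_sink(rows):
    """Sink: send rows out of the flow as CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

text = csv_sink(geocode(source()))
print(text)
```

The processor both removes a row (the address it cannot resolve) and adds two columns, matching the description of processors above.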

