Diamond: A Storage Architecture for Early Discard in Interactive Search

Larry Huston, et al.

FAST ’04

Jan. 26th, 2006

Speaker: Sehwan Lee

Contents
  • Introduction
  • Background and Motivation
  • Diamond Architecture
  • Diamond Application
  • Prototype Implementation
  • Experimental Evaluation
  • Related Work
  • Conclusion
Introduction
  • Goal
    • To enable interactive search of nonindexed data
    • Diamond’s approach: the ‘early discard’ technique
  • Focus
    • Pure brute-force interactive search
Background and Motivation
  • Limitation of Indexing
    • Infeasible manual indexing
    • High-dimensional representation
    • Increasingly sophisticated queries
    • Increasingly complex user needs
Background and Motivation
  • Importance of Early Discard
Background and Motivation
  • Self-Tuning for Hardware Evolution
    • Flexibility of active disk
      • Well-suited for ‘early discard’
      • Two mechanisms of early discard
        • Application generates specialized early discard code
        • Dynamically adapt the evaluation of early discard code
      • Two aspects of early discard
        • Adaptive partitioning of computation between the storage devices and the host computer
        • Dynamic ordering of search terms to minimize the total computation time
Background and Motivation
  • Exploiting the Structure of Search
    • Search tasks
      • Only require read access
      • Typically permit stored objects to be examined in any order
        • Efficient for parallelism
      • Do not require maintaining state between objects
        • Efficient for parallelism
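The three properties above (read-only access, any examination order, no state carried between objects) are what make a search easy to parallelize. A minimal sketch in Python, assuming a hypothetical stateless predicate `matches` and a plain list of integers standing in for stored objects; none of these names come from Diamond itself:

```python
from concurrent.futures import ThreadPoolExecutor


def matches(obj):
    # Hypothetical stateless predicate: the result depends only on the
    # object itself, never on previously examined objects.
    return obj % 7 == 0


def parallel_search(objects, predicate, workers=4):
    # Objects are only read, so workers never need to coordinate writes.
    # The result is a set because examination order carries no meaning.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(predicate, objects))
    return {obj for obj, keep in zip(objects, flags) if keep}


hits = parallel_search(list(range(50)), matches)
```

Because the predicate keeps no state, the same objects could equally be split across several storage devices and the partial result sets merged afterwards.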
Diamond Architecture
  • Diamond Architecture
    • Searchlet
      • Contains all of the domain specific knowledge needed for early discard
      • Is a proxy of the application that can execute within the back end
Diamond Architecture
  • Searchlets
    • Searchlet Structure
      • A set of filters + some configuration state
    • Creating Searchlets
      • A domain application generates searchlets in response to a user’s query in one of several ways
        • Domain experts implement a library of filter functions
        • A domain application generates code on the fly
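The "set of filters + some configuration state" structure can be pictured with a short Python sketch. The `Filter` and `Searchlet` classes, the filter names, and the dictionary-shaped objects are all illustrative assumptions, not Diamond's actual Searchlet or Filter API:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Filter:
    name: str
    # Returns True to keep the object, False to discard it.
    fn: Callable[[Any, Dict[str, Any]], bool]


@dataclass
class Searchlet:
    filters: List[Filter]
    config: Dict[str, Any] = field(default_factory=dict)

    def evaluate(self, obj):
        # Early discard: stop at the first filter that rejects the object,
        # so later (possibly more expensive) filters never run on it.
        for f in self.filters:
            if not f.fn(obj, self.config):
                return False
        return True


# Hypothetical filters drawn from a small "library" of filter functions.
searchlet = Searchlet(
    filters=[
        Filter("min_size", lambda o, cfg: o["size"] >= cfg["min_size"]),
        Filter("has_tag", lambda o, cfg: cfg["tag"] in o["tags"]),
    ],
    config={"min_size": 10, "tag": "red"},
)

objects = [
    {"id": 1, "size": 5, "tags": ["red"]},
    {"id": 2, "size": 20, "tags": ["blue"]},
    {"id": 3, "size": 30, "tags": ["red", "round"]},
]
survivors = [o["id"] for o in objects if searchlet.evaluate(o)]
```

Here only object 3 passes both filters; object 1 is dropped by the first filter without the second ever being evaluated.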
Diamond Architecture
  • Key Interfaces
    • Three APIs to isolate components
      • Searchlet API
        • Used by applications to interact with Diamond
      • Filter API
        • Used by filters to interact with the storage run-time environment
      • Associative DMA
        • Isolates the host and the storage implementations
        • Abstracts the transport mechanism and flow control between the host and the storage run-time system
Diamond Architecture
  • Host and Storage Systems
    • The host system
      • Where the domain application executes
    • The storage system
      • Provides a generic infrastructure for searchlet execution
Diamond Applications
  • Suitable characteristics for Diamond application
    • The user is searching for specific instances of data that match a query rather than aggregate statistics about the set of matching data items
    • The user’s criteria for a successful match are often subjective, potentially ill-defined, and typically influenced by the partial results of the query
    • The mapping between the user’s needs and the matching objects is too complex to be captured by a batch operation
Diamond Applications
  • SnapFind Description
    • Goal
      • To enable users to interactively search through large collections of unlabeled photographs
      • Users quickly specify searchlets that roughly correspond to semantic content
        • Complex image queries are built by combining simple filters that scan images for patches containing particular color distributions, shapes, or visual textures
    • Infeasible indexing
      • Different search filters are chosen at query time
      • High-dimensional content
Diamond Applications
  • SnapFind Usage Experience
    • Example task
      • Retrieve photos from an unlabeled collection based on semantic content
      • Two cases using the same GUI
        • Purely manual search
        • Using SnapFind
Prototype Implementation
  • Dynamic Partitioning of Computation
    • The Diamond storage runtime decides whether to evaluate a searchlet locally or at the host computer
    • Two methods for partitioning computation
      • CPU Splitting
      • Queue Back-Pressure
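The slide names the two methods without detail. As a rough illustration of the back-pressure idea only, the Python sketch below assumes one plausible rule: while the queue toward the host has room, objects are shipped unprocessed for the host to evaluate; once the queue is full, the storage side evaluates the searchlet itself. The function names, the queue capacity, the drain rate, and the even-number predicate are all hypothetical and are not taken from the Diamond prototype:

```python
from collections import deque


def choose_site(queue_length, capacity):
    # Assumed rule: a full queue means the host is the bottleneck,
    # so the storage device should do the filtering itself.
    return "host" if queue_length < capacity else "storage"


def simulate(objects, capacity, drain_every, predicate):
    queue = deque()            # objects waiting for the host
    sites = []                 # where each object was evaluated
    discarded_at_storage = []  # objects dropped before reaching the host
    for tick, obj in enumerate(objects):
        # The (slow) host consumes one queued object every few ticks.
        if tick > 0 and tick % drain_every == 0 and queue:
            queue.popleft()
        site = choose_site(len(queue), capacity)
        sites.append(site)
        if site == "host":
            queue.append(obj)
        elif not predicate(obj):
            discarded_at_storage.append(obj)
    return sites, discarded_at_storage


sites, dropped = simulate(
    objects=list(range(9)),
    capacity=2,
    drain_every=3,
    predicate=lambda o: o % 2 == 0,  # hypothetical filter: keep even objects
)
```

In this toy run the storage side takes over whenever the two-slot queue is full, and objects 5 and 7 are discarded there without ever being sent to the host.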
Prototype Implementation
  • Filter Ordering
    • Average time to process an object through a series of filters F0…Fn
      • C = c(F0) + P(F0)·c(F1) + P(F1|F0)·P(F0)·c(F2) + P(F2|F1,F0)·P(F1|F0)·P(F0)·c(F3) + …
    • Partial Ordering
      • Partial ordering → linear extension
    • Ordering Policies
      • Independent
      • Hill climbing (HC)
      • Best filter first (BFF)
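The average-cost expression above can be evaluated directly once per-filter costs and pass rates are known. The Python sketch below makes a simplifying assumption that the filters are statistically independent, so each conditional probability such as P(F1|F0) reduces to P(F1). The filter names, costs, and pass rates are invented for illustration. The brute-force search and the cost/(1 − pass rate) ranking are generic techniques for this independent case and are not claimed to be the HC or BFF policies listed on the slide:

```python
from itertools import permutations


def expected_cost(order, cost, pass_rate):
    # C = c(F0) + P(F0)*c(F1) + P(F0)*P(F1)*c(F2) + ...
    # `reach` is the probability that an object survives to this filter.
    total, reach = 0.0, 1.0
    for f in order:
        total += reach * cost[f]
        reach *= pass_rate[f]
    return total


# Hypothetical per-object costs (arbitrary time units) and pass rates.
cost = {"color": 1.0, "texture": 4.0, "face": 10.0}
pass_rate = {"color": 0.5, "texture": 0.25, "face": 0.1}

# Exhaustive search over all orderings (fine for a handful of filters).
best_order = min(permutations(cost), key=lambda o: expected_cost(o, cost, pass_rate))

# For independent filters, sorting by cost / (1 - pass rate) gives the same order.
ranked = tuple(sorted(cost, key=lambda f: cost[f] / (1.0 - pass_rate[f])))

best_cost = expected_cost(best_order, cost, pass_rate)
```

With these numbers the cheap, selective color filter runs first and the expected cost is 1 + 0.5·4 + 0.125·10 = 4.25 units per object, compared with 15 units if every filter ran on every object.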
Experimental Evaluation
  • Description of Searchlets
    • Test queries
Experimental Evaluation
  • Description of Searchlets
    • Filters
Experimental Evaluation
  • Disk and Host Processing Power
Experimental Evaluation
  • Disk and Host Processing Power
Experimental Evaluation
  • Impact of Dynamic Partitioning
Experimental Evaluation
  • Impact of Filter Ordering
Experimental Evaluation
  • Using Diamond on Large Datasets
Related Work
  • On interactive data analysis
  • On approximate query processing
Conclusion
  • Diamond is a system that supports interactive data analysis of large, complex data sets
  • To perform brute-force search efficiently, the Diamond architecture uses early discard to push filter processing to the edges of the system
  • The Diamond architecture enables the system to adapt to different hardware configurations by dynamically adjusting where computation is performed