Diamond: A Storage Architecture for Early Discard in Interactive Search

Larry Huston, et al.

FAST ’04

Jan. 26th, 2006

Speaker: Sehwan Lee

Contents

  • Introduction

  • Background and Motivation

  • Diamond Architecture

  • Diamond Application

  • Prototype Implementation

  • Experimental Evaluation

  • Related Work

  • Conclusion

Introduction

  • Goal

    • To enable interactive search of nonindexed data

    • Diamond → ‘Early Discard’ technique

  • Focus

    • Pure brute-force interactive search

Background and Motivation

  • Limitation of Indexing

    • Infeasible manual indexing

    • High-dimensional representation

    • Sophisticated queries

    • Complex user needs

Background and Motivation

  • Importance of Early Discard

Background and Motivation

  • Self-Tuning for Hardware Evolution

    • Flexibility of active disk

      • Well-suited for ‘early discard’

      • Two mechanisms of early discard

        • Application generates specialized early discard code

        • Dynamically adapt the evaluation of early discard code

      • Two aspects of early discard

        • Adaptive partitioning of computation between the storage devices and the host computer

        • Dynamic ordering of search terms to minimize the total computation time

Background and Motivation

  • Exploiting the Structure of Search

    • Search tasks

      • Only require read access

      • Typically permit stored objects to be examined in any order

        • Efficient for parallelism

      • Do not require maintaining state between objects

        • Efficient for parallelism
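Because a search task only reads objects, may visit them in any order, and keeps no state between objects, the work can be divided among independent workers. The following is a minimal Python sketch of that property, not code from the Diamond prototype: the `matches` predicate and the in-memory list standing in for stored objects are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def matches(obj: bytes) -> bool:
    """Hypothetical stateless predicate: the result depends only on the object."""
    return b"needle" in obj

def parallel_search(objects, workers=4):
    """Evaluate the predicate on every object independently, in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No shared state and read-only access, so no locking is needed.
        flags = list(pool.map(matches, objects))
    return [obj for obj, keep in zip(objects, flags) if keep]

# Stand-in for a collection of stored objects (illustrative data).
objects = [b"hay", b"a needle here", b"more hay", b"needle"]
results = parallel_search(objects)
```

Since each object is judged on its own, shuffling the collection or changing the number of workers changes only the order in which matches are found, not the set of matches.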

Diamond Architecture

  • Diamond Architecture

    • Searchlet

      • Contains all of the domain specific knowledge needed for early discard

      • Acts as a proxy for the application that can execute within the back end

Diamond Architecture

  • Searchlets

    • Searchlet Structure

      • A set of filters + some configuration state

    • Creating Searchlets

      • A domain application generates searchlets in response to a user’s query in a number of ways

        • Domain experts implement a library of filter functions

        • A domain application generates code on the fly
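The "set of filters + some configuration state" structure can be pictured with a small Python sketch. This is an illustration under assumptions, not the Diamond Searchlet API: the `Filter` and `Searchlet` classes, the filter names, and the object fields are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Filter:
    """Hypothetical filter: a named test taking (object, config) -> pass/fail."""
    name: str
    test: Callable[[dict, dict], bool]

@dataclass
class Searchlet:
    """Hypothetical searchlet: an ordered set of filters plus configuration state."""
    filters: List[Filter]
    config: Dict[str, float] = field(default_factory=dict)

    def accepts(self, obj: dict) -> bool:
        # all() short-circuits, so the first failing filter discards the
        # object early and the remaining filters are never evaluated.
        return all(f.test(obj, self.config) for f in self.filters)

# A query assembled from a small "library" of filter functions (illustrative).
searchlet = Searchlet(
    filters=[
        Filter("min_size", lambda o, c: o["size"] >= c["min_size"]),
        Filter("mostly_blue", lambda o, c: o["blue_fraction"] >= c["min_blue"]),
    ],
    config={"min_size": 100, "min_blue": 0.5},
)

objects = [
    {"id": 1, "size": 50, "blue_fraction": 0.9},
    {"id": 2, "size": 200, "blue_fraction": 0.2},
    {"id": 3, "size": 300, "blue_fraction": 0.7},
]
kept = [o["id"] for o in objects if searchlet.accepts(o)]
```

Keeping the thresholds in the configuration state rather than inside the filters lets the same filter library serve different user queries.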

Diamond Architecture

  • Key Interfaces

    • Three APIs to isolate components

      • Searchlet API

        • Used by applications to interact with Diamond

      • Filter API

        • Used to interact with the storage run-time environment

      • Associative DMA

        • Isolates the host and the storage implementations

        • Abstracts the transport mechanism and flow control between the host and the storage run-time system

Diamond Architecture

  • Host and Storage Systems

    • The host system

      • Where the domain application executes

    • The storage system

      • Provides a generic infrastructure for searchlet execution

Diamond Applications

  • Characteristics of applications suited to Diamond

    • The user is searching for specific instances of data that match a query rather than aggregate statistics about the set of matching data items

    • The user’s criteria for a successful match are often subjective, potentially ill-defined, and typically influenced by the partial results of the query

    • The mapping between the user’s needs and the matching objects is too complex to be captured by a batch operation

Diamond Applications

  • SnapFind Description

    • Goal

      • To enable users to interactively search through large collections of unlabeled photographs

      • By quickly specifying searchlets that roughly correspond to semantic content

        • Users create complex image queries by combining simple filters that scan images for patches containing particular color distributions, shapes, or visual textures

    • Infeasible indexing

      • Different search filter at query time

      • High-dimensional content

Diamond Applications

  • SnapFind Usage Experience

    • Example task

      • Retrieve photos from an unlabeled collection based on semantic content

      • Two cases using the same GUI

        • Purely manual search

        • Using SnapFind

Prototype Implementation

  • Dynamic Partitioning of Computation

    • The Diamond storage runtime decides whether to evaluate a searchlet locally or at the host computer

    • Two methods for partitioning computation

      • CPU Splitting

      • Queue Back-Pressure
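The slide names the two methods without giving their rules. As one plausible reading of queue back-pressure, the sketch below lets a storage device watch its outgoing queue to the host: while the queue is short the host is keeping up, so objects are shipped unfiltered; once the queue backs up, the device filters locally and ships only matches. The threshold, the drain rate, and every function name here are assumptions made for illustration, not the prototype's policy.

```python
from collections import deque

def choose_site(queue_len: int, high_water: int) -> str:
    """Assumed rule: filter at the storage device once the host queue backs up."""
    return "storage" if queue_len >= high_water else "host"

def simulate(objects, passes, high_water=2, host_drain_every=3):
    """Toy simulation; returns where each object's filters were evaluated."""
    queue = deque()   # objects waiting to be consumed by the host
    log = []
    for tick, obj in enumerate(objects, start=1):
        site = choose_site(len(queue), high_water)
        log.append(site)
        if site == "host":
            queue.append(obj)      # ship unfiltered; the host will run the filters
        elif passes(obj):
            queue.append(obj)      # filtered locally; only matches are shipped
        # Assumed host speed: it consumes one queued object every few ticks.
        if tick % host_drain_every == 0 and queue:
            queue.popleft()
    return log

# Illustrative run: eight objects, a hypothetical filter passing multiples of 4.
log = simulate(range(8), passes=lambda o: o % 4 == 0)
```

In this toy run the slow host quickly fills the queue, after which most objects are evaluated at the storage device; a faster host (a smaller `host_drain_every`) shifts work back toward the host.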

Prototype Implementation

  • Filter Ordering

    • Average time to process an object through a series of filters F0 … Fn

      • C = c(F0) + P(F0)·c(F1) + P(F1|F0)·P(F0)·c(F2) + P(F2|F1,F0)·P(F1|F0)·P(F0)·c(F3) + …

    • Partial Ordering

      • Partial ordering → linear extension

    • Ordering Policies

      • Independent

      • Hill climbing (HC)

      • Best filter first (BFF)
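To make the cost formula concrete, the sketch below evaluates it for the simplified case where filters are independent, so each conditional pass rate reduces to P(Fi). It reads c(F) as the per-object cost of a filter and P(F) as its pass rate; the filter names and numbers are invented for illustration. The ranking rule cost / (1 − pass rate) is the standard optimum for independent filters and is shown only as a check; it is not claimed to be how Diamond's listed policies are implemented.

```python
from itertools import permutations

def expected_cost(order):
    """C = c0 + p0*c1 + p0*p1*c2 + ... for independent filters.

    `order` is a sequence of (cost, pass_rate) pairs in evaluation order.
    """
    total, reach = 0.0, 1.0      # reach = fraction of objects reaching this filter
    for cost, pass_rate in order:
        total += reach * cost
        reach *= pass_rate
    return total

# Hypothetical filters: name -> (cost per object, pass rate). Illustrative values.
filters = {"color": (1.0, 0.5), "texture": (4.0, 0.1), "face": (10.0, 0.2)}

def cost_of(names):
    return expected_cost([filters[n] for n in names])

# Exhaustive search over all orderings (feasible only for a handful of filters).
best = min(permutations(filters), key=cost_of)

# Rank rule for independent filters: ascending cost / (1 - pass_rate).
# Assumes every pass rate is strictly below 1.
ranked = tuple(sorted(filters, key=lambda n: filters[n][0] / (1.0 - filters[n][1])))
```

Cheap, selective filters move to the front because they discard most objects before the expensive filters run. When filters are correlated or constrained by a partial order, the independence shortcut no longer applies and the conditional terms in the formula matter.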

Experimental Evaluation

  • Description of Searchlets

    • Test queries

Experimental Evaluation

  • Description of Searchlets

    • Filters

Experimental Evaluation

  • Disk and Host Processing Power

Experimental Evaluation

  • Disk and Host Processing Power

Experimental Evaluation

  • Impact of Dynamic Partitioning

Experimental Evaluation

  • Impact of Filter Ordering

Experimental Evaluation

  • Using Diamond on Large Datasets

Related Work

  • On interactive data analysis

  • On approximate query processing

Conclusion

  • Diamond is a system that supports interactive data analysis of large, complex data sets

  • To perform brute-force search efficiently, the Diamond architecture uses early discard to push filter processing to the edges of the system

  • The Diamond architecture enables the system to adapt to different hardware configurations by dynamically adjusting where computation is performed