1 / 33

A Suite of Web-Services for Wavelet-based Analysis of Large Atmospheric Datasets

A Suite of Web-Services for Wavelet-based Analysis of Large Atmospheric Datasets. Cyrus Shahabi University of Southern California Dept. of Computer Science Los Angeles, CA 90089-0781 shahabi@usc.edu http://infolab.usc.edu. Outline. Motivating Applications Approach: Wavelets

carlyn
Download Presentation

A Suite of Web-Services for Wavelet-based Analysis of Large Atmospheric Datasets

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Suite of Web-Services for Wavelet-based Analysis of Large Atmospheric Datasets Cyrus Shahabi University of Southern California Dept. of Computer Science Los Angeles, CA 90089-0781 shahabi@usc.edu http://infolab.usc.edu

  2. Outline • Motivating Applications • Approach: Wavelets • ProPolyne: Overview and Features • ProPolyne: Details • Development Status • ProDA Demonstration • ProDA Web-Services • A Sample Client for Progressive Queries • Why Web Services? (Answers to Dan’s Questions!) • Conclusion • Future Work

  3. Motivation: New Multidimensional Data Intensive Applications • Multidimensional data sets: (w/ dimension & measure) • Remote sensory date (from JPL): <latitude, longitude, altitude, time, temperature> • Sensor readings from GPS ground stations (from NASA): <lat, long, t,velocity> • Petroleum sales (from Digital-Government research center): <location, product, year, month,volume> • ACOUSTIC data (from UCLA sensor-network project): <IPAQ-id, volume-id, event#, time,value> • Market data (from NCR): <store-location, product-id, date,price, sale> • Large size, e.g., current (toy!) NASA/JPL data set: • Past 10 years, sampling twice a day, at a lat-long-alt grid of 64 * 128 * 16, recording 8 bytes of temperature & 16 bytes of dimensions • This is 6 MB of data per day; a total of 21 GB for 10 years • Increase: twice an hour sampling, 1024 * 4096 * 128 grid, …

  4. Motivation: Multidimensional Applications • I/O and computationally complex queries • Range-aggregate queries (w/ aggregate function) • Averagetemperature, given an area and time interval • Average velocity of upward movement of the station • Totalpetroleum sales volume of a given product in a given location and year • Number of jackets sold in Seattle in Sep. 2003 • Tougher queries: • Covariance of temperature and altitude (correlation) • Variance of sale of petroleum in 2004 in CA • Quick response-time (interactive): • the results can be approximate and/or progressively become exact

  5. Recap! • Multidimensional data • Large data • Aggregate queries • Approximate answers • Progressive answers • Multi-resolution compression • Wavelets!

  6. Approach: Enabling Data Manipulation, Query & Analysis in the WAVELET Domain • Everybody else’s idea: let’s compress data • Reason: save space? No not really! • Implicit reason: queries deal with smaller data sets and hence faster (not always true!) • More problems: not only query results can never be 100% accurate anymore, but also different queries can have very different error rates given their areas of interest • Why? At the data population time, we don’t know which coefficients are more/less important to our queries! (also observed by [Garofalakis & Gibbons, SIGMOD’02], but they proposed other ways to drop coefficients assuming a uniform workload) • Different than the signal-processing objective to reconstruct the entire signal as good as possible

  7. Approach: Enabling Data Manipulation, Query & Analysis in the WAVELET Domain • Our idea/distinction: storage is cheap and queries are ad-hoc; let’s keep all the wavelet coefficients! (no data compression) • Opportunity: At the query time, however, we have the knowledge of what is important to the pending query • ProPolyne: Progressive Evaluation of Polynomial Range-Aggregate Query

  8. Overview of ProPolyne • Define range-sum query as dot product of query vector and data vector (also observed by [Gilbert et. al, VLDB’2001]) • Offline: Multidimensional wavelet transform of data • At the query time: “lazy” wavelet transform of query vector (very fast) • Dot product of query and data vectors in the transformed domain  exact result • Choose high-energy query coefficients only fast approximate result (90% accuracy by retrieving < 10% of data) • Choose query coefficients in order of energy  progressive result

  9. ProPolyne Unique Features • “Measure” can be any polynomial on any combination of attributes • Can support COUNT, SUM, AVERAGE • Also supports Covariance, Kurtosis, etc. • All using one set of pre-computed aggregates • Independent from how well the data set can be compressed/approximated by wavelets • Because: We show “range-sum queries” can always be approximated well by wavelets (not always HAAR though!) • Potentially low update cost • Can be used for exact, approximate and progressiverange-sum query evaluation

  10. Outline • Motivating Applications • ProPolyne: Overview and Features • ProPolyne: Details • Polynomial Range-Sum Queries as Vector Queries • Evaluation of Vector Queries • Progressive/Approximate Evaluation of Vector Queries • Development Status • Conclusion • Future Work

  11. Age Salary • $50k • $55k • $58k • $100k • $130k • 57 $120k • Example: F = (Age, Salary) • R:(25 < age < 40) & (55k < salary < 150k) I Polynomial Range-Sum Queries • Polynomial range-sum queries: Q(R,f,I) • I is a finite instance of schema F • RSubSetOf Dom(F), is the range • f : Dom(F)  R is a polynomial of degree d

  12. Age Salary • $50k • $55k • $58k • $100k • $130k • 57 $120k I Hence: if where: if Or: Vector Query query data Polynomial Range-Sum Queries as “Vector Queries” • The data frequency distribution of I is the function DI : Dom(F)  Z that maps a point x to the number of times it occurs in I • To emphasize the fact that a query is an operator on the data frequency distribution, we write • Example: D(25,50)=D(28,55)=…=D(57,120)=1 and D(x)=0 otherwise.

  13. DWT of a aka wavelet coefficients of a • Hence, vector queries can be computed in the wavelet-transformed space as: • Transform the query vector at submission: O (Nd) ? • Nop! Not with our “lazy” wavelet transform for range-aggregate queries Wavelet Transformation of Data and Query Vectors

  14. Fast Evaluation of Vector Queries Using Wavelets … • Technical Requirements: • Wavelets should have small support (i.e., the shorter the filter, the better) • Wavelets must satisfy a “moment condition” • Supports any Polynomial Range-Sum up to a degree determined by the choice of wavelets • E.g., Haar can only support degree 0 (e.g., COUNT), while db4 can support up to degree 1 (e.g., SUM), and db6 for degree 2 (e.g., VARIANCE) • Standard DWT: O (N) • Our lazy wavelet transform: O (llog N), where l is the length of the filter

  15. Exact Evaluation of Vector Queries Query: SUM(salary) when (25 < age < 40) & (55k < salary < 150k) # of Nonzero Coordinates: 4380 # of Wavelet Coefficients: 837

  16. Progressive Evaluation of Vector Queries

  17. Outline • Motivating Applications • Approach: Wavelets • ProPolyne: Overview and Features • ProPolyne: Details • Development Status • ProDA Demonstration • ProDA Web-Services • A Sample Client for Progressive Queries • Why Web Services? (Answers to Dan’s Questions!) • Conclusion • Future Work

  18. ProDA Demonstration

  19. 3-Tier System Architecture Client (Query in C#) Standard SOAP (XML / HTTP) Web Services .Net C# ODP.NET DBMS Oracle 9.2.0.1.0

  20. Database Schema Cube Directory Cube … Cube Header … Dimension Values Wavelet Coefficients

  21. ProDA Web Services http://mahour.usc.edu/proda/webservices.asmx • General • For wavelet transformation • Directory • To list the cubes’ metadata • Cube • To select a cube and access its content • Query • To submit a Range-Aggregate query and ask for result: exact, approximate or progressive

  22. General Web Services • double[ ] GetFilterH (int filter) • To return high-pass (differences) wavelet filter coefficients • double[ ] GetFilterL (int filter) • To return low-pass (averages) wavelet filter coefficients • string GetFilterName (int filter) • To return wavelet filter name • float[ ] DWT1 (float[ ] a, int n, int filter) • To wavelet transform an array • float[ ] DWTn (float[ ] a, int[ ] nn, int ndim, int filter) • To wavelet transform a multidimensional array

  23. Directory Web Services • string[ ] AllCubeNames ( ) • To list the names of all available cubes • string[ ] AllCubeTitles ( ) • To list the descriptions of all available cubes • int[ ] AllCubeDims ( ) • To list the dimension numbers of all cubes

  24. Cube Web Services • void SelectDB (string dbn) • To select a cube • void CloseConnection ( ) • To end the connection • int GetFilterIndex ( ) • To return the Wavelet Filter used to transform the cube • int GetDimSize (int idim) • To return the dimension size for dimension #idim • float GetDimValue (int idim, int index) • To return the dimension value for dimension #idim at index position • float[ ] GetDimValues (int idim) • To return the dimension values for dimension #idim • int GetDimMax (int idim) • To return the maximum value for dimension #idim • int GetDimMin (int idim) • To return minimum value for dimension #idim • int DimN ( ) • To return the number of dimensions • string GetDimTitle (int idim) • To return the dimension title for dimension #idim

  25. void Cnt ( ) To submit count query void Sum (int idim) To submit sum query void Avg (int idim) Average query void Var (int idim) Variance query void Std (int idim) Standard deviation void Cov (int idim, int jdim) Covariance query void Cor (int idim, int jdim) Correlation query void PushTerm (float coef, int[ ] pow) To add polynomial query void PushOperator (char op) To add unary or binary operator for polynomials Query Web Services (1)Aggregate Functions

  26. A polynomial example Consider Var(x) Let us re-write Var(x) in post order: We can generate this query by using PushTerm and PushOperator calls: • PushTerm(1,{2}); • PushTerm(1,{1}); • PushOperator(‘^’); • PushOperator(‘-’); • PushTerm(1,{0}); • PushOperator(‘/’);

  27. long GetRangeSize ( ) To return range size long GetRetrieveSize ( ) To return number of required retrieves void Submit ( ) To submit added polynomial query float Advance (int step) To advance query progress bool HasMore ( ) To check query progress long GetCounter ( ) To return number of retrieves float GetPercentage ( ) To return percentage of query progress float GetResult ( ) To return the latest result Query Web Services (2)Cursor Functions • void SetRange (int[] startRange, int[] endRange) • To specify a range (in multidimension)

  28. A Sample Client for Progressive Query (in C#) • Creating an instance • Storing session state Initialization ProDA_WebServices pws=new ProDA_WebServices(); pws.Credentials = System.Net.CredentialCache.DefaultCredentials; pws.CookieContainer = new CookieContainer(); List Datacubes string []dbnames=pws.AllCubeNames(); for(int i=0;i<dbnames.Length;i++) Console.WriteLine(dbnames[i]); • Listing all cube names • Selecting a cube • Asking for some properties pws.SelectDB("…"); int n=pws.GetDimN(); for(int i=0;i<n;i++) Console.WriteLine("Dim"+i+"("+pws.GetDimTitle(i)+")="+pws.GetDimSize(i)); Select a cube and Get some properties • Defining a range • Submitting a query Select Ranges and Submit Query int []rStart={…}; int []rEnd ={…}; pws.SetRange(rStart,rEnd); pws.Var(1); • Asking for result progressively • Closing connection Progressive Results while(pws.HasMore()) Console.Write("Result="+pws.Advance(500)+"["+pws.GetPercentage()+"]\t\r"); pws.CloseConnection();

  29. Why Web Services? • How the technology or arch principals used in the projects could be used by others - even in different domains? • Wavelet-based query processing • Cursor functions for approximate and progressive query evaluation • Have webservices changed the way we publish data, information, or expose processes/applications? • Exposure • Web services make ProDA available to anyone, anytime, anywhere (e.g., Web service URL’s in NSF reports to make our research available) • UIs are redundant • Web services are the user interface (spending time on things we know how to do! GENESIS vs. GENESIS-II) • Tailor-suited functionality • We design the building blocks that scientists put together and customize according to their needs (don’t know what are their queries! And don’t want to limit them.)

  30. Why Web Services? • How has the use of DB and Web Services changed the way we do research? • Web services constitute a data-access layer on top of a DB • With web services, ProDA seamlessly extends basic DBMS functionality • Publishing DB techniques for testing purposes is simplified • Is there technology/software that others could utilize? • ProDA web services

  31. Conclusion • A novel pre-aggregation strategy • Supports conventional aggregates: COUNT, SUM and beyond: multivariate statistics • First pre-aggregation technique that does not require measures be specified a priori • Measures treated as functions of the attributes at the time • Provides a data independent progressive and approximate query answering technique • With provably poly-logarithmic worst-case query and update costs • And storage cost comparable or better than other pre-aggregation methods

  32. Future Work • Almost Complete • Batch queries • Efficient layout on disk • In Progress …. • I/O efficient wavelet transformation and update • Hybrid ordering of data and query coefficients • Error forecasting • Longer Term • Best basis per dimension • ProPolyne on relation algebra operators

  33. THANKS!(visit http://infolab.usc.edu)AcknowledgementsCollaborators: Brian Wilson and Amy Braverman (JPL)Students: R. Schmidt, M. Jahangiri & D. SacharidisSponsors:NASA/JPL: ESIP & REASoN programs NSF: ERC, ITR & CAREER programs Industry: Microsoft and Chevron

More Related