
Use of the SPSSMR Data Model at ATP 12 January 2004


Presentation Transcript


1. Use of the SPSSMR Data Model at ATP 12 January 2004

2. Introduction
• John Lyon
• ATP - IT Solutions Department
• How the SPSS Data Model has changed the development process at ATP
• Vector and example project

3. What counts as the Data Model?
• The Metadata Model
• The Case Data Model
• SPSS MR OLEDB Provider
• SPSS Evaluate Component, SPSS Function Libraries, SQL Aggregation Engine

4. Example project
• Browser-based front end to legacy in-house cross-tabulation system
• Complex continuous research project
• 1,000,000+ records
• Hierarchical data
• Multiple end-user clients with complex access control
• Simple filter definition available in the front end
• Complex filter and derived variable definition at the back end

5. Other considerations
• Minimum interruption of existing production processes
• Client is committed to Dimensions products and the system can take advantage of a server version of the Data Model
• Compatibility with other systems including MRTables
• ATP had already developed an aggregation engine that works on top of the Data Model with most of the functionality required - Vector

6. Vector Architecture
[Architecture diagram: data sources (Dimensions DSC, ODBC, triple-s) feed the Vector aggregation component and the Vector manipulation component, which in turn serve intranet/extranet pages, automated charting, web tables etc., table object models (e.g. Winyaps, mrTables) and OLAP.]

7. Demo
• Example project

8. Can we use the Data Model?
• Does the Data Model’s conceptual view of a data set support all the structures I need?
• Does the Data Model’s API provide functionality that makes it worth using?
• Do all the layers of the actual implementation I am planning to use support the functionality I need in an efficient way? – TEST THEM
• Can we really afford to ignore the compatibility and “future proof-ness” that we gain from using the Data Model?

9. Metadata Model - Structure
• Thinking of the Data Model as a data format analogous to SSS – does the metadata object model definition support all the basic structures I need?
• In our experience, for reporting and at an individual project level, the answer will almost certainly be yes
• Key features are: full set of data types including categorical variables, support for hierarchical data, versioning, languages, text variations, loops and grids
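As a concrete illustration of those structures, here is a minimal sketch of creating a categorical variable through the MDM Document COM object from Python (via pywin32). The method names come from the documented MDM object model, but the enum values, category names and file path are assumptions and may differ between Data Model versions.

```python
# A hedged sketch: building a categorical variable with the MDM Document API.
# Method names follow the MDM object model; enum values are assumptions.
import win32com.client

mdm = win32com.client.Dispatch("MDM.Document")

var = mdm.CreateVariable("gender", "Gender of respondent")
var.DataType = 3                      # mtCategorical (assumed enum value)

for name, label in (("male", "Male"), ("female", "Female")):
    elem = mdm.CreateElement(name, label)
    elem.Type = 0                     # mtCategory (assumed enum value)
    var.Elements.Add(elem)

mdm.Fields.Add(var)
mdm.Save(r"C:\projects\example.mdd")  # hypothetical path
```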

10. What’s missing
• Routing
• No structures for defining user access rights
• Because the structures tend to be project-based, I think it lacks the structures that would be needed to implement a global research metadata repository
• There is no concept of hierarchical categorical structures like product groupings, employee structures, retail outlet networks, etc.

11. Example project
• The legacy system supports the definition of derived variables that are composites of numeric and categorical variables - these could not be resolved into Data Model variable types
• We needed to control access to variables down to an individual user level - the Data Model does not support any structures to define this
• We also needed to control access based on time
• The obvious solution was to take advantage of the open structure of the Data Model and control access through the API
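A toy illustration of what “control access through the API” can look like: a per-user allow-list applied to the document’s Fields collection before anything is shown to the user. The ACL table, user names and variable names below are invented; the real system drives this from the legacy system’s own access-control structures (see slide 14).

```python
# A hypothetical per-user allow-list layered on the open metadata API.
# The Data Model has no access-control structures of its own, so the
# enforcement lives in our code; users and variables here are invented.
ACL = {
    "alice": {"q1", "q2", "spend"},
    "bob": {"q1"},
}

def visible_fields(mdm_document, user):
    """Return only the metadata fields this user may see."""
    allowed = ACL.get(user, set())
    return [f for f in mdm_document.Fields if f.Name.lower() in allowed]
```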

12. Metadata Model API
• By this I mean the COM object that is used to load and manipulate the metadata structures – the MDM Document
• The object exposes a comprehensive set of properties allowing developers to access, modify and create new instances of all the structures supported by the Data Model (a short sketch follows this list)
• It handles the complexities of controlling multiple versions, languages, contexts and merging documents
• It’s open - you can develop your own MDSCs
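A minimal sketch of what driving the MDM Document looks like from script, here in Python via pywin32. The ProgID and members are from the documented object model; the file path is hypothetical.

```python
# A minimal sketch: load an MDM Document and walk its top-level fields.
# Requires pywin32 and a registered Data Model; the path is hypothetical.
import win32com.client

mdm = win32com.client.Dispatch("MDM.Document")
mdm.Open(r"C:\projects\example.mdd")    # all-or-nothing load (see next slide)

for field in mdm.Fields:                # top-level variables
    print(field.Name, "-", field.Label)

mdm.Close()
```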

13. What’s missing
• It can be very slow to load
• The load is all or nothing – there’s no concept of a partial load
• You can only lock the document for read/write operations at a document level
• In short, the MDM Document is exactly what it says – a DOCUMENT, with all the limitations that that implies. What I really want is an Object Database system that gives me full multi-user access to the metadata

14. Example project
• The API structure of the Data Model makes it possible to work around most problems
• We decided to build an MDSC that understands the metadata structures used by the legacy system, including the structure for user-level access control to variables and back data
• Developer’s point of view - each user connects to a different data source and the MDSC deals with returning a partial view of the database for that user (see the sketch after this list)
• Client’s point of view - they can use their existing production processes and access-control procedures to manage the system
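One way to picture “each user connects to a different data source”: the connection string handed to the provider names a per-user metadata catalog and our custom MDSC, which then returns the partial view. The property names below follow the SPSS MR OLE DB provider’s conventions, but the MDSC name, DSC name and paths are invented for illustration.

```python
# Hypothetical per-user connection strings. "LegacyMDSC" is an invented
# name for our custom MDSC; it returns a partial metadata view for the
# user named in the catalog path.
def connection_string_for(user):
    return (
        "Provider=mrOleDB.Provider.2;"
        "Data Source=mrDataFileDsc;"                   # CDSC (illustrative)
        r"Location=C:\projects\example.ddf;"
        rf"Initial Catalog=C:\projects\users\{user}.mdd;"
        "MR Init MDSC=LegacyMDSC"                      # our custom MDSC
    )

print(connection_string_for("alice"))
```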

15. Case Data Model
• CDSCs – which map a native data source to something understood by the Data Model
• The SPSSMR OLEDB Provider, which processes SQL-based requests for the data and returns a series of virtual tables to the client application (a query sketch follows this list)
• To achieve this, the OLEDB Provider uses the Expression Evaluation Component, an SPSSMR Function Library and the SQL Aggregation Engine
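A minimal sketch of one of those SQL-based requests from Python via ADO; VDATA is the provider’s flat virtual table of the case data, while the connection properties and variable names are illustrative.

```python
# A minimal sketch: query case data through the SPSS MR OLE DB Provider
# with ADO. Connection properties and variable names are illustrative.
import win32com.client

conn = win32com.client.Dispatch("ADODB.Connection")
conn.Open(
    "Provider=mrOleDB.Provider.2;"
    "Data Source=mrDataFileDsc;"            # CDSC name (illustrative)
    r"Location=C:\projects\example.ddf;"
    r"Initial Catalog=C:\projects\example.mdd"
)

rs = win32com.client.Dispatch("ADODB.Recordset")
rs.Open("SELECT age, gender FROM vdata WHERE age > 30", conn)
while not rs.EOF:
    print(rs.Fields("age").Value, rs.Fields("gender").Value)
    rs.MoveNext()

rs.Close()
conn.Close()
```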

16. The Evaluation Component
• For me, the expression evaluation component and the function library it uses are the very heart of the Data Model and the best reason for using it
• They parse and evaluate expressions involving the market-research-specific data structures supported by the Case Data Model
• Prior to version 2.8, the most significant feature was the ability to handle categorical variables as sets
• The Data Model defines how these sets behave with the usual range of operators and also provides a useful range of set-based functions to manipulate them
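To make the set behaviour concrete, here it is sketched in plain Python sets rather than the Data Model’s own expression syntax. ContainsAny and ContainsAll are functions from the SPSSMR function library; the category names are invented.

```python
# Categorical answers behave as sets. Plain-Python sketch, not the Data
# Model's expression syntax; category names are invented.
response = {"coke", "fanta"}             # a multiple-response answer
target = {"coke", "pepsi"}

# Roughly what ContainsAny(brand, {coke, pepsi}) tests:
contains_any = bool(response & target)   # intersection is non-empty
# Roughly what ContainsAll(brand, {coke, pepsi}) tests:
contains_all = target <= response        # subset test

print(contains_any, contains_all)        # True False
```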

17. Version 2.8
• With release 2.8, the Data Model also supports a very neat way of defining expressions on hierarchical data by introducing a syntax to “uplev” and “downlev” variables (the idea is sketched after this list)
• The uplev operator is used in conjunction with an aggregate function to control how the uplev is performed
• Another new feature of Data Model 2.8 is the support for hierarchical “SQL” queries which implement a hierarchical syntax and return hierarchical recordsets
• Syntax for running “sub-queries”
• The syntax is a clever compromise between the ideas of standard SQL and the need to support a concise expression syntax that can be evaluated outside the context of an SQL query
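Since the slide doesn’t quote the 2.8 syntax itself, here is the uplev idea sketched in plain Python rather than the Data Model’s notation: a value at a lower level (trip cost) is carried up to the parent level (person) through an aggregate function.

```python
# The "uplev" idea in plain Python: aggregate lower-level rows up to the
# parent level. The person/trip data is invented for illustration.
trips = [  # (person_id, trip_cost) rows at the lower level
    (1, 100.0), (1, 40.0), (2, 75.0),
]

def uplev_sum(rows):
    """Carry a lower-level value up to its parent via a SUM aggregate."""
    totals = {}
    for person, cost in rows:
        totals[person] = totals.get(person, 0.0) + cost
    return totals

print(uplev_sum(trips))   # {1: 140.0, 2: 75.0} - one value per person
```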

18. Example project
• Involves hierarchical data and on-the-fly evaluation of derived variables involving expressions across different levels of the hierarchy
• With support for hierarchical data our plan is to…
• Develop a CDSC to load the base data variables into HDATA tables
• Develop the MDSC to load the base variables as usual but map any derived variables in the legacy system into derived variables in the MDM. These will then be evaluated on-the-fly by the Case Data Model
• Unfortunately, we started development a while ago and we decided to build the logic for evaluating hierarchical expressions into the CDSC

19. Problems with the Case Data
• Performance
• The Case Data Model may use SQL to retrieve the data but it’s not an RDBMS
• You can’t really ignore the underlying structure of the data file – which DSC you use in any situation makes a big difference
• Test everything
• In Vector we cache the data into inverted binary files
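A toy illustration of the inverted binary caching idea: each variable is stored as its own packed binary column file, so an aggregation touches only the columns it needs. Vector’s actual file format is not described here; this is just the concept.

```python
# Toy column-wise binary cache: one packed file per variable, so reads
# touch only the columns an aggregation needs. Not Vector's real format.
import struct

def write_column(path, values):
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}i", *values))

def read_column(path):
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // 4}i", data))

write_column("age.bin", [34, 27, 61])
print(read_column("age.bin"))   # [34, 27, 61]
```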

20. Conclusion
• Thinking about this application in isolation, you could argue that we don’t need to go through the Data Model
• Some of the reasons are defensive: our clients expect it; we want to be able to integrate with other SPSSMR applications; we want to be able to support a wide range of exports; etc.
• The most important reason is that the Data Model APIs include a wealth of functionality for handling research data that is exposed to developers in a flexible way
• In the long term – it saves time!
• ATP will always look to build future projects around the Data Model wherever possible
