
Grid-Enabling Data: Sticking Plaster, Sellotape, & Chewing Gum?



Presentation Transcript


  1. Grid-Enabling Data: Sticking Plaster, Sellotape, & Chewing Gum? Colin C. Venters c.venters@ncess.ac.uk National Centre for e-Social Science University of Manchester

  2. Terms of Reference • Data: numbers, characters, and images which can be processed and transmitted by [humans] and [machines]. • Unstructured. • Semi-structured. • Structured. • Database Management System (DBMS): a suite of programs which manage the storage and retrieval of large structured sets of persistent data. • Database: one or more large structured sets of persistent data and one component of a database management system. • Federated databases: data integration using middleware.

  3. What’s in a Grid? • Computational Grids - high performance computing resources. • Data Grids - access to heterogeneous datasets. • Access Grid - advanced video conferencing-based collaborative environment. • The Grid makes it possible to share heterogeneous, distributed resources over a network.

  4. The Grid Metaphor [Figure: Grid middleware at the centre, connecting mobile access; supercomputers and PC clusters; workstations; DBMS, sensors, and experiments; visualization; and networks.]

  5. Data Integration • Unimpeded use of distributed, heterogeneous, autonomous data resources. • Integrated view of the data resources that allows users to interact with them as if they constituted a single, global, integrated data resource. • Data integration fosters collaboration, one of the fundamental goals of e-research. • Limited DBMS support for Grid integration.
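
The idea of an integrated view can be illustrated with a minimal Python sketch. It is not taken from the presentation: an in-memory SQLite table stands in for a remote DBMS, a CSV string stands in for a second, differently structured source, and the ward figures, column names, and function names are all made up for illustration.

```python
import csv
import io
import sqlite3

# Source A: a relational database (in-memory SQLite stands in for a remote DBMS).
# All figures below are invented for the example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE population (ward TEXT, residents INTEGER)")
db.executemany("INSERT INTO population VALUES (?, ?)",
               [("Ardwick", 12000), ("Hulme", 9000)])

# Source B: a CSV file that names its columns differently.
CSV_TEXT = "area_name,unemployed\nArdwick,900\nHulme,450\n"


def read_population():
    """Return {ward: residents} from the relational source."""
    return dict(db.execute("SELECT ward, residents FROM population"))


def read_unemployment():
    """Return {ward: unemployed} from the CSV source."""
    reader = csv.DictReader(io.StringIO(CSV_TEXT))
    return {row["area_name"]: int(row["unemployed"]) for row in reader}


def integrated_view():
    """Join both sources on ward name and present one global record per ward."""
    population = read_population()
    unemployment = read_unemployment()
    return [
        {"ward": ward,
         "residents": population[ward],
         "unemployed": unemployment[ward],
         "rate": unemployment[ward] / population[ward]}
        for ward in sorted(population.keys() & unemployment.keys())
    ]
```

A caller of `integrated_view()` sees one list of records and does not need to know that two sources with different formats sit behind it. In a Grid setting this mediation would be done by middleware rather than by local functions.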

  6. Grid-Enabling: Grid Middleware • GridFTP • High-performance data transfer protocol. • Storage Resource Broker (SRB) • Uniform interface to a virtual distributed data storage resource. • Open Grid Services Architecture Data Access and Integration (OGSA-DAI) • Grid Data Service (GDS). • Standard interface for database access. • Grid Data Service Factory (GDSF). • Establishes a database service instance. • Database Access and Integration Service Group Registry (DAISGR). • Identifies available database services. • OGSA-DQP • Distributed Query Processing, i.e. search across multiple databases.
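
The division of roles between registry, factory, and data service can be sketched in plain Python. This is a toy analogy under stated assumptions, not the OGSA-DAI or OGSA-DQP API: the class names, method names, resource names, and data are hypothetical, and in-memory SQLite databases stand in for remote database resources.

```python
import sqlite3


class DataService:
    """Plays the role of a Grid Data Service: one uniform query interface."""

    def __init__(self, connection):
        self._connection = connection

    def perform(self, sql, params=()):
        return self._connection.execute(sql, params).fetchall()


class DataServiceFactory:
    """Plays the role of a Grid Data Service Factory: creates service instances."""

    def __init__(self, setup_statements):
        self._setup_statements = setup_statements

    def create_service(self):
        # Each instance gets its own in-memory database (a stand-in for a
        # connection to a real, remote database resource).
        connection = sqlite3.connect(":memory:")
        for statement in self._setup_statements:
            connection.execute(statement)
        return DataService(connection)


class Registry:
    """Plays the role of the DAISGR: maps resource names to factories."""

    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def list_services(self):
        return sorted(self._factories)

    def lookup(self, name):
        return self._factories[name]


# Hypothetical resources with invented contents.
registry = Registry()
registry.register("census1991", DataServiceFactory([
    "CREATE TABLE wards (name TEXT, residents INTEGER)",
    "INSERT INTO wards VALUES ('Ardwick', 12000)",
]))
registry.register("health", DataServiceFactory([
    "CREATE TABLE clinics (ward TEXT, clinics INTEGER)",
    "INSERT INTO clinics VALUES ('Ardwick', 3)",
]))


def query_all(sql_by_resource):
    """Crude stand-in for distributed query processing: run one query per
    resource and collect the results under the resource name."""
    results = {}
    for name, sql in sql_by_resource.items():
        service = registry.lookup(name).create_service()
        results[name] = service.perform(sql)
    return results
```

A client first asks the registry what is available (`registry.list_services()`), obtains a factory, and only then creates a service instance to query. A real distributed query processor would also plan and join across resources, which `query_all` does not attempt.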

  7. ConvertGrid • ESRC pilot demonstrator project (PDP) in the e-Social Science Programme. • Research problem: investigating complex research questions that require the combination of datasets from multiple sources. • Data management: • Access to multiple datasets. • Data fusion: • Multiple geo-referenced datasets with different target geographies, e.g. 1991 Wards, 1991 Postcode Sectors. • Converts data sources with different native geographies to a common Target Geography. • CSV or XML format. • Results returned as a string or as streams (FTP/HTTP/GridFTP).
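
Converting counts from one geography to another is often done with a weighted lookup table. The Python sketch below shows that general technique only; the presentation does not say how ConvertGrid performs the conversion, and the zone names, weights, counts, and function names are hypothetical.

```python
import csv
import io

# Hypothetical lookup table: the share of each source zone (think of a 1991
# ward) that falls inside each target zone (think of a 1991 postcode sector).
# The weights for one source zone sum to 1.0.
LOOKUP = {
    "WardA": {"M1 1": 0.75, "M1 2": 0.25},
    "WardB": {"M1 2": 1.0},
}

# Hypothetical counts reported on the source geography.
SOURCE_COUNTS = {"WardA": 400, "WardB": 100}


def convert(source_counts, lookup):
    """Redistribute counts from source zones to target zones by weight."""
    target_counts = {}
    for source_zone, count in source_counts.items():
        for target_zone, weight in lookup[source_zone].items():
            target_counts[target_zone] = (
                target_counts.get(target_zone, 0.0) + count * weight
            )
    return target_counts


def to_csv(target_counts):
    """Serialise the converted counts as CSV text, one row per target zone."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["target_zone", "count"])
    for zone in sorted(target_counts):
        writer.writerow([zone, target_counts[zone]])
    return buffer.getvalue()
```

Once two datasets have been converted to the same target zones in this way, they can be joined on the zone identifier. The quality of the result depends entirely on how well the weights reflect the real overlap between zones.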

  8. Different Target Geographies

  9. ConvertGrid Architecture

  10. Challenges • Scalability: • Performance and capacity requirements. • Security: • Use of Grid Security Infrastructure (GSI) at the Grid service client level is a non-trivial problem. • Heterogeneity: • Infrastructural. • Syntactic. • Semantic. • Metadata: • Adds context to data, aiding identification, location, and interpretation.

  11. Further Reading • Watson, P. (2003). Databases and the Grid. In: Berman, F., Fox, G., and Hey, A. J. G. (eds.), Grid Computing: Making the Global Infrastructure a Reality, Wiley, pp. 363-384. • Cole, K. et al. (2003). Grid Enabling Quantitative Social Science Datasets: A Scoping Study. ESRC. • Atkinson, M. et al. (2004). Data Access, Integration, and Management. In: Foster, I. and Kesselman, C. (eds.), The Grid 2: Blueprint for a New Computing Infrastructure, Elsevier, pp. 391-429.

  12. Acknowledgements • ConvertGrid Team, University of Manchester • Keith Cole, Jon McLaren, Pascal Ekin, Linda Mason, Stephen Pickles, and Justin Hayes. • Paul Watson, University of Newcastle • Alvaro Fernandes, University of Manchester • Mike Mineter, National e-Science Centre, University of Edinburgh
