Case Study in e-Social Science
Rob Allan (CCLRC Daresbury Laboratory)
Rob Crouchley (University of Lancaster)
Building Collaborative e-Research Environments
JISC Consultation Workshops, 23/2/04 and 5/3/04
Specific Problems for Social Scientists
• They have much less experience and expertise in using the Grid than researchers from other research council areas;
• There is a significant intellectual gap between the social science disciplines and computer science;
• Distributed systems are inherently complex, and the associated middleware products are not easy to use;
• The Open Middleware Infrastructure Institute (OMII) is likely to provide generic (open-source) middleware and associated services.
E-Science middleware is currently not specifically targeted at the social science community.
Social Scientists Need
• Help to develop a more computer-literate collaborative culture;
• Help to develop component-based software, visual composition tools and scripting languages which are easy to use;
• To exploit state-of-the-art software development technologies such as aspect-oriented programming to enhance flexibility.
Middleware could be the catalyst for re-use and sharing in the e-Social Sciences. Some examples and ideas follow.
Some Features of Social Science Research
Research is motivated by a desire to determine causality. This involves identifying the various factors that influence the behaviour or outcome of interest and quantifying their effects, while controlling for the confounding factors that would otherwise produce spurious relationships and misleading results.
Randomised experiments are not feasible: we cannot, for example, randomly allocate individuals to different levels of training in order to evaluate programmes. We rely instead on observational data, i.e. data obtained from surveys and censuses.
This differs from “exact sciences” such as physics and chemistry, where repeatable experiments can be performed.
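The point about confounding can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are invented numbers (not survey data), and the names `records`, `naive_effect` and `adjusted_effect` are hypothetical. Prior education influences both who receives training and the outcome, so a pooled comparison overstates the training effect, while stratifying on the confounder recovers the within-group difference.

```python
from statistics import mean

# Invented illustrative records: (prior_education, trained, outcome).
# Highly educated people are more often trained AND have higher outcomes.
records = (
    [("low", False, 10.0)] * 8 + [("low", True, 11.0)] * 2 +
    [("high", False, 20.0)] * 2 + [("high", True, 21.0)] * 8
)

def naive_effect(rows):
    """Difference in mean outcome, trained minus untrained, ignoring confounders."""
    treated = [y for _, trained, y in rows if trained]
    control = [y for _, trained, y in rows if not trained]
    return mean(treated) - mean(control)

def adjusted_effect(rows):
    """Average of the within-stratum differences, weighted by stratum size."""
    total = len(rows)
    effect = 0.0
    for stratum in sorted({s for s, _, _ in rows}):
        subset = [r for r in rows if r[0] == stratum]
        effect += naive_effect(subset) * len(subset) / total
    return effect

print("naive:", naive_effect(records))
print("adjusted:", adjusted_effect(records))
```

With these invented numbers the pooled difference is 7.0, although the difference within each education group is only 1.0. Real analyses need the joint, non-linear models discussed on the next slide; this simple stratification only shows the direction of the problem.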
Soc. Sci. needs Comprehensive Models
• Sub-models are interdependent: we need joint models for the data complexities and the core processes we want to understand;
• Models are not linear in the parameters, require special procedures and are highly computationally intensive because of the high dimensionality and the interdependent sub-models;
• Simple analyses are usually very misleading about the role of the controls (ethnicity, sex, etc.).
Soc. Sci. research is complex: a large parameter space, with many interpretations and models that need to be tested. This cannot be done in isolation… There is an increasing need to link components and to access large computers and data sets from the desktop.
E-Science Technology can link Components!
[Diagram labels: Data Management A, B, C; Analysis A, B, C; Middleware]
New Tools: The Analysis Cycle
[Diagram labels: Main ESDS Data Sets; TTWA Data, NOMIS; Select Data Set and Appropriate Variables; Merge Files; Add Variables; Contextual Data; Working Data; Results]
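The steps named in the analysis cycle (select variables, merge files, add variables, produce results) can be sketched in a few lines. This is only an illustration under stated assumptions: the rows, the variable names and the helper functions are all hypothetical, and no real ESDS, TTWA or NOMIS data or interface is used.

```python
# Hypothetical survey extract: one row per respondent.
survey = [
    {"person_id": 1, "ttwa": "A", "age": 34, "employed": 1, "interviewer": "x1"},
    {"person_id": 2, "ttwa": "B", "age": 51, "employed": 0, "interviewer": "x2"},
    {"person_id": 3, "ttwa": "B", "age": 27, "employed": 1, "interviewer": "x1"},
    {"person_id": 4, "ttwa": "C", "age": 45, "employed": 1, "interviewer": "x3"},
]

# Hypothetical contextual data: one entry per area code.
area_context = {
    "A": {"area_unemployment": 4.2},
    "B": {"area_unemployment": 7.9},
}

def select_variables(rows, variables):
    """Keep only the variables needed for the analysis."""
    return [{v: row[v] for v in variables} for row in rows]

def merge_context(rows, context, key):
    """Attach contextual variables by area code; unmatched rows are dropped."""
    return [{**row, **context[row[key]]} for row in rows if row[key] in context]

def add_variable(rows, name, fn):
    """Add a derived variable computed from each row."""
    return [{**row, name: fn(row)} for row in rows]

def summarise(rows, group, outcome):
    """Mean of `outcome` within each level of `group`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group], []).append(row[outcome])
    return {g: sum(values) / len(values) for g, values in groups.items()}

selected = select_variables(survey, ["person_id", "ttwa", "age", "employed"])
merged = merge_context(selected, area_context, "ttwa")
working = add_variable(merged, "high_unemployment_area",
                       lambda row: row["area_unemployment"] > 5.0)
results = summarise(working, "high_unemployment_area", "employed")
print(results)
```

Dropping unmatched rows in `merge_context` is a design choice made for brevity; a real analysis would record which cases were lost at the merge step.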
New Tools: Simultaneous Analysis
Example: research in educational attainment
E-Science can enhance Collaboration!
• Particularly important in qualitative research;
• Enable comparison of different markup/interpretation;
• Direct access to datasets for validation;
• Direct input of data from fieldwork involving questionnaires, photography, etc.;
• Delivery/input devices (some mobile) may include: portals, Access Grid, PC tablets, PDA, camera, phone, etc.
New Tools: Collaboration in Video Markup
[Diagram labels: Researcher A; Researcher B; Researcher C; Video Corpus]
VIDGRID: Multiple video streams can be delivered into an AG or portlet environment
Training and Awareness in e-Social Science!
Project ReDReSS: Resource Discovery for Researchers in e-Social Science
“to accelerate the development and awareness of a new kind of computing and data infrastructure for the Social Sciences, and to support the increasingly national and global collaborations emerging in many areas of Social Science”
• To help illustrate appropriate methodologies and software that admit the full complexity of substantive problems;
• To help articulate the middleware needs of social researchers;
• To help nurture and support a community of social researchers;
• To help provide critical mass and improve the efficiency of interactions between the interested researchers, thus reducing the number of lost opportunities for social science.
We will use/contribute to existing technologies
• Resource discovery
• Sharing tools
• Personalised workspaces
• Flexible delivery
E-Science enabling a Virtual Research Environment!
“to make the use of e-Science technologies, methodologies and resources easier and more transparent than simply developing bespoke applications on an infrastructure toolkit (such as Globus GT2 or OGSI/WSRF).”
We need to:
• Bridge the gap between different types of technology (database management, computational methods, data collection, networks, Condor resources, visualization systems, collaborative working, Access Grid, etc.);
• Build on pilot projects and take input from other disciplines;
• Link to core JCSR clusters and resources at other e-Science Centres;
• Provide an environment to enhance the programmability and usability of such a Grid by integrating work from a number of ongoing projects and encouraging community input.
The Grid “Client Problem”
Many clients want to access a few Grid-enabled resources
[Diagram labels: Grid Core; Middleware, e.g. Globus; Consumer clients: PC, TV, video, AG; Workplace: desktop clients; Portable clients: phones, laptop, PDA, data collection]
Some VRE Functions
• Authentication, Authorisation and Accounting – use Shibboleth and Permis in line with JISC proposals;
• Community development of content - Content Management and Editing tools:
  • Access to middleware resources and documentation,
  • Access to training materials and resources,
  • Enable shared development of services/applications,
  • Access to a consultancy/support service;
• Application Management Services - user access via pre-defined tools and applications to the UK e-Science Grid;
• Data Management Services – discovery, authorisation, transfer, replication, upload, validation, curation;
• Access to Broadcasts - on the Access Grid network;
• Management Functions - for experts to maintain the system and guide non-experts, e.g. via expert systems and workflow.
Sanity Check
However, a number of areas that are significant for a production Grid environment have hardly been tackled yet. Issues include:
• Grid information systems, service registration, discovery and definition of facilities;
• Security, in particular role-based authorisation;
• Portable parallel job specifications;
• Meta-scheduling, resource reservation and ‘on demand’ access;
• Dynamic linking and interacting with remote data sources;
• Wide-area computational/experimental steering;
• Workflow composition and optimisation for complex procedures;
• Distributed user and application management;
• Data management and replication services;
• Grid programming environments, PSEs and user interfaces;
• Auditing, advertising and billing in a Grid-based resource market;
• Semantic and autonomic tools;
• Usability issues, ethics, etc…
Human Factors
Customised delivery may be key to long-term uptake:
• Use an environment familiar to the researchers, e.g.:
  • Web portals - training, awareness, search tools (search engines are popular)
  • Libraries - e.g. C for programmers
  • Programming environment – e.g. R for statistical analysis with well-known packages
  • Sound, video for virtual collaboration (TV is a popular medium)
Bottom line: There is a lot we can/need to do, but Social Science is already hard – the scientists need tools that do not make it harder!
UK E-Social Science Programme
There is a growing body of work and projects in this area:
• Pilot projects - ESRC
• ReDReSS: Resource Discovery for Researchers in e-Social Science – JISC
• UK National Grid Service + e-Science Grid - JCSR and DTI Core Programme
• NCeSS: National Centre for e-Social Science - ESRC
• CQeSSS: Centre for Quantitative e-Social Science Support - ESRC (+ future NCeSS nodes)
• …