PaNdata Barcelona Meeting: Welcome, 17-18 September 2009
Agenda • Thursday • Ongoing activities • everything except the proposal • Friday • Proposal
Thursday 11:00 - 13:00 First Session • Data Policy - next steps (Michael Wilson – apologies) • Standards (Mark Koennecke) • ESRF use of Nexus (Armando Solé) 13:00 - 15:00 Lunch 15:00 - 17:00 Second Session • ICAT - feedback from developers meeting and plans (Tom Griffin) • Software catalogue - status and plans (Jean-Francoise Perrin)
Friday 09:30 - 11:00 Session Three • Website and Wiki (Michael Gleaves) • Review Actions • I3 or CSA • Line 1.2.3 “Virtual Research Communities” • Line 3.3 “Coordination Actions...” • Scope for proposal (What’s in scope and what’s out) 11:00 - 11:30 Coffee Break 11:30 - 13:00 Session Four • Activities/Workpackages • Preparing the document • Review actions 13:00 - 14:30 Lunch, then depart.
Policy: Status and next steps • Mgmt • Support • Users • Implementation
Policy Framework discussion 1. Issue current draft (1 Oct) 2. Agree amongst ourselves (15 Oct) (Michael W) 3. Informal pass through management 4. Consultation with in-house scientists 5. Consultation with users? 6. Revise 7. Go to 2
Policy Framework issues • Do variables (e.g. embargo period) have to be standardised to ensure fairness? • Might it be necessary to have different embargo periods for different experiments? • Could users define their own embargo period as part of the application/evaluation? • Put the variables in a table separate from the text • State the principle that we are working towards common numbers
Nexus issues • How do we break down the hurdles into small steps? • Nexus person • What will they do? • Can we get someone/somewhere? • Can applications use Nexus? • Synchrotron person joining NIAC? • Mark to: (Christmas latest, probably 31 Oct) • Produce meta plan of steps to getting a Nexus person • Produce job description for Nexus person • PaNdata data formats “developers” workshop • Presentations on • Needs of synchrotrons (Armando) • Nexus • (Mark, Freddie and Armando to organise?) • Larger meeting? • Open (<30 people), @PSI? Spring 2010
ICAT status and plans • Monthly releases of ICAT • Put info on wiki about Identity Mgmt project (Rudolf)
SW CAT • Catalogue for neutrons fairly well populated • Not much for synchrotrons • Catalogue could be completed • Everyone to provide info on s/w (15 Oct) • And licenses to Mark/Jean-F (15 Oct) • But not much more likely without manpower • Forge exists for PaNdata (forge.ill.eu) • Need to register for pan-data.eu • Do we need it? • Joint license negotiation (e.g. Matlab) • Discussions on alternatives, e.g. Scilab (free clone)
Web/Wiki • Words for public part of wiki • Contact Freddie about domain name • Michael G 15 Oct
Preparing a proposal • Which line • Which type of project • What scope • What activities/workpackages • Preparing the document
Which line: 1.2.3 or 3.3? INFRA-2010-1.2.3: Virtual Research Communities • Enable an increasing number of users and research communities from all science and engineering disciplines to access and use e-Infrastructures • Remove the constraints of distance, access and usability as well as the barriers between disciplines for a more effective scientific collaboration and innovation • Deployment of e-Infrastructures in research communities to enable multidisciplinary collaboration • Deployment of end-to-end e-Infrastructure services and tools for integrating and increasing research capacities • Build user-configured virtual research facilities and test-beds from collection of diverse resources • Integrate and interlink regional e-Infrastructures The deployment and further evolution of e-Infrastructures addressing the research infrastructures of the ESFRI-roadmap is particularly encouraged. • Combination of Collaborative projects and Coordination and Support Actions (CP-CSA) (I3) • EUR 23 Million • Oversubscribed • Not specifically data centric
Which line: 1.2.3 or 3.3? INFRA-2010-3.3: Coordination actions, conferences and studies supporting policy development, including international cooperation, for e-Infrastructures • Enhance coordination between national and pan-European e-Infrastructure initiatives and programmes • Strengthen the innovation potential and impact of e-Infrastructures • Establish a new e-Infrastructures scientific software strategy in Europe in order to reinforce the global position of Europe • Coordinate a European eco-system of scientific data repositories (preservation and sharing) • Specific studies on e-Infrastructure related topics • Dissemination of information on the e-Infrastructure programme and projects International cooperation, including: • Further extension of e-Infrastructures to International Cooperation Partner countries (ICPC) • Joint roadmapping of activities with developed countries • Promotion of the interoperation between similar infrastructures on the global scale EUR 10 M Coordination and Support Actions – CA or SA
CSA: Coordination and Support Actions (CSA) Support Measures • Networking • Coordination or support actions (CSA-CA or CSA-SA) • Management of the consortium Financial model • Reimbursement of indirect costs limited to 7% of the direct costs (less subcontracting and third party contribution) for all participants
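As a worked illustration of the financial model on this slide, the sketch below applies the 7% limit to a direct-cost base reduced by subcontracting and third party contribution. The function name and the euro amounts are hypothetical, and the exact definition of the eligible base should be taken from the official FP7 rules rather than from this example.

```python
from decimal import Decimal

# 7% limit on indirect costs, as stated on the slide.
INDIRECT_RATE = Decimal("0.07")


def max_indirect_costs(direct, subcontracting=Decimal("0"), third_party=Decimal("0")):
    """Return the upper limit on reimbursable indirect costs.

    Assumption: the base is the direct costs minus subcontracting and
    third party contribution, following the wording of the slide.
    """
    base = direct - subcontracting - third_party
    if base < 0:
        raise ValueError("deductions exceed direct costs")
    return INDIRECT_RATE * base


# Hypothetical partner budget in euros.
limit = max_indirect_costs(
    direct=Decimal("100000"),
    subcontracting=Decimal("20000"),
    third_party=Decimal("5000"),
)
print(limit)
```

With these invented figures the base is 75 000 euros, so the limit on indirect costs is 5 250 euros.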
CA or SA Coordination actions are designed to promote and support the networking and co-ordination of research and innovation activities (projects) at national, regional, European or international level over a fixed period (at least 3 entities) Support actions are designed to complement the other FP7 funding schemes. For example, they: • underpin the implementation of the programme • help in preparations for future Community research and technological development policy activities • stimulate, encourage and facilitate the participation of SMEs, civil society organisations, small research teams, newly developed and remote research centres, as well as setting up research clusters across Europe • Cover one-off events or single purpose activities • (at least one entity)
SA or CA Funding Scheme Purpose • SA: Support to research activities and policies • CA: Coordination of research activities and policies “Target” audience Infrastructure operators, End-users (researchers in all fields of science and engineering), Research institutes, Universities, Industry, including SMEs Activities covered by EU contribution • Conferences, seminars, workshops, working groups, studies, fact finding, monitoring, strategy development, awards and competitions, working or expert groups, operational support and dissemination, information and communication activities • Networking, coordination and dissemination activities • Management of the consortium Form of reimbursement • Based on eligible cost unless other forms are foreseen in the work programme Average duration • Between 9 and 30 months • Between 18 and 30 months Enlargement of partnership within the initial budget • NA Specific characteristics • No funding of research, development or demonstration • Normally focused on one specific activity and often one specific event • Possibility of one single participant • In FP6, SA typically had 1-15 participants and total EC contribution of 0.3-3 MEuro • In FP6, CA typically had 13-26 participants and total EC contribution of 0.5-2 MEuro
CSA Evaluation Criteria Scientific and technical quality • Soundness of concept, and quality of objectives • Contribution to the coordination of high quality research (CA only) • Quality and effectiveness of the coordination/support action mechanisms and associated workplan Implementation • Appropriateness of the management structures and procedures • Quality and relevant experience of the individual participants • Quality of the consortium* as a whole (including complementarity, balance) • Appropriateness of the allocation and justification of the resources to be committed (budget, staff, equipment) Impact • Contribution at the European or international level to the expected impacts listed in the workprogramme under the relevant activity • Appropriateness of measures for spreading excellence, exploiting results and disseminating knowledge through engagement with stakeholders and the public at large
Scope/Activities • Policy • Standards • Nexus • Authentication? • Any others? • Data catalogue? • Data virtualisation? • Software catalogue? • Publication catalogue? • Analysis (remote/parallel/ integration)?
Roadmap with big vision • End-to-end integration of data pipeline • “From application to publication” • Goals • Support users doing analysis • Federated analysis services • Standardised software (accessibility, multiuse, licenses) • Open access to software • Audit trail, redo, provenance • Multiuse of data • Exchange of data/preserve • Better quality • Quicker (real-time) feedback • Efficiency • Virtualisation of hw/os to minimise dependencies • Needs progress on • Policy • Formats • Data volume estimates • Large data sets in the future • Long term preservation/access/combination • Control systems – Common user i/f – different underneath for instrument scientist • Proposal systems? – no!
Analysis • Objective • Analysis in place, in real-time – feedback for experimental tuning • Current situation • Exists in some areas - consultation with users on what is needed in other areas • Tools for beamline staff and tools for users (visualisation – instant feedback - for diagnosis of beamline) - Common data format • On-line analysis, e.g. for PX or Tomography • Next step • “Standardised” software service • For ease for users • For efficiency of providers • Integration of simulation and experimentation • Presentation (Stephan)
Data formats • Common data format • Preservation (rerunning old analysis) • Metadata on format version and software version • Format version control • Software version control and preservation • Interoperability • Converters • Adapt applications (what about new software?) • Cost-benefit analysis of adapting software • ? (new software uptake model)
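To make the "metadata on format version and software version" point concrete, here is a minimal sketch of a record that could travel with a data file so that an old analysis can be rerun later. The class, field names and version strings are all hypothetical; nothing here is an agreed PaNdata or NeXus schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical metadata stored alongside a data file."""
    data_format: str
    format_version: str
    software: str
    software_version: str


def needs_converter(record, supported_versions):
    """True if a reader supporting `supported_versions` cannot open the file as-is."""
    return record.format_version not in supported_versions


# Invented example values.
record = ProvenanceRecord("NeXus/HDF5", "1.0", "example-analysis", "0.3.1")

# Round-trip through JSON, as one possible way to store the record.
serialized = json.dumps(asdict(record), sort_keys=True)
restored = ProvenanceRecord(**json.loads(serialized))

print(serialized)
print(needs_converter(restored, supported_versions={"2.0"}))
```

A reader that only supports a newer format version can use such a record to decide whether a converter is required before the analysis is repeated.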
Common Data Policy • Uniform compliance with EU policy • See earlier slide for steps • Wider consultation outside consortium • Other EU initiatives • Other disciplines • Comparison/consultation with the US • Feedback to EU policy
Publications • Link between publication and data • To enable redoing of analysis • Meta cat for pubs • Investigate tech for linking • Ids for data • Grey literature (eg PhD data)
Planning H-J, FrankS, JBic, MJohnson Consortium • Prepare outline for new partners (30 Sept) • Contact more partners (LLB, FRM2, Polish, EMBL?, ESS?, Bilbao? - Heinz-J) 15 Oct • SMEs • Liaison with Matlab, IDL Proposal document • Prepare skeleton • Revise partner profiles 15 Oct • Tele-meeting schedule: weekly teleconfs • Weekly updates to consortium Other • Consult with Brussels (3.3) • CA or SA • SME and other industry partners • eIRG • Events/meetings • Espionage
Other • Next face to face March? • (hearing March?)
HDF5/NeXus V.A. Solé – ESRF Software Group NeXus discussion, Pandata, Sep. 2009
ESRF current situation: SPEC File Format • Advantages • Simplicity (multiple column ASCII) • Widespread • Counters, Motors and MCA in same file • Disadvantages • Not suited to large datasets (images)
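To show what "multiple column ASCII" means in practice, the sketch below reads a heavily simplified SPEC-style file. It only handles `#S` scan headers, `#L` column labels (separated by two or more spaces) and numeric rows; every other `#` or `@` line, including MCA data, is skipped. The sample content and the function name are invented for illustration, and this is not the ESRF reader.

```python
import re

# Invented sample in a simplified SPEC-like layout.
SAMPLE = """\
#F example.dat
#S 1  ascan th 0 1 2 1
#N 3
#L Theta  Monitor  Detector
0.0 1000 12
0.5 1002 48
1.0 998 15

#S 2  ascan th 1 2 1 1
#N 2
#L Theta  Detector
1.0 15
2.0 7
"""


def parse_spec(text):
    """Return a list of scans, each with number, command, labels and rows."""
    scans = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#S "):
            number, command = line[3:].split(None, 1)
            current = {"number": int(number), "command": command,
                       "labels": [], "rows": []}
            scans.append(current)
        elif current is None or not line:
            continue  # file header or blank separator
        elif line.startswith("#L "):
            current["labels"] = re.split(r" {2,}", line[3:].strip())
        elif line[0] in "#@":
            continue  # other header lines and MCA data are ignored here
        else:
            current["rows"].append([float(value) for value in line.split()])
    return scans


scans = parse_spec(SAMPLE)
for scan in scans:
    print(scan["number"], scan["labels"], len(scan["rows"]))
```

The simplicity is visible in how little code is needed; the same row-per-point layout is also why the format does not scale to image data.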
ESRF current situation: ESRF Data Format • Advantages • Suited to large data sets (images) • Disadvantages • Not widespread (basically ESRF) • Incomplete « official » metadata
HDF5 Needs • Efficient format to store different data types • Keep together counters, images, mca, … • Compression support • Widespread support • Efficient and easy access to the data for visualization and analysis
HDF5, why not NeXus? • What we like about NeXus • Well defined classes • A lot of endless discussions avoided • What we do not like about NeXus • A lot of endless discussions pending • No easy way to implement new needs • Can one claim everything is foreseen? • Misuse of NeXus groups: A new need should imply a new group • Slow reactivity
What do we propose? • To foresee a new group for unforeseen uses (we have just called it Measurement) • It would prevent misuse of already defined groups • Common use could lead to definition of new instruments (Ex. MCA) • Something as simple as grouping by data dimension solves several issues • Generic scan (common misuse of NXdata at most synchrotrons) • Users getting lost hunting for information • Analysis programs would know what to do with little or no intervention • A dataset of dimension 200x400x1000: • 8.0E+07 scalars in a 3D volume? • 200x400 spectra of 1000 channels? • 200 images of 400x1000 pixels? • One NXentry would only contain one Measurement group • Similarly structured groups are desirable to store analysis information
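The ambiguity of the 200x400x1000 dataset can be sketched in a few lines: the same shape reads as scalars, spectra or images depending on how many trailing axes belong to one item. The function name and the idea of passing an "item rank" are assumptions made for this illustration; in the proposal, the group a dataset is stored in would carry that information.

```python
from math import prod


def interpret(shape, item_rank):
    """Split `shape` into (number of items, shape of one item).

    item_rank is the dimensionality of a single item:
    0 for scalars, 1 for spectra, 2 for images.
    """
    split = len(shape) - item_rank
    if split < 0:
        raise ValueError("item rank exceeds dataset rank")
    return prod(shape[:split]), tuple(shape[split:])


shape = (200, 400, 1000)
for label, rank in (("scalars", 0), ("spectra", 1), ("images", 2)):
    count, item_shape = interpret(shape, rank)
    print(f"{count} {label} of shape {item_shape}")
```

Without a hint such as the item rank, an analysis program cannot choose between these three readings, which is the case the slide makes for grouping by data dimension.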
What could it look like?
• NXroot: Top level. One per file.
• NXentry: One group per measurement.
• Measurement: One group per measurement.
• Positioners: One group per Measurement. Ex. All motor positions when the command was issued.
• ScalarData: One group per Measurement. Ex. Scanned motors and counters.
• Spectrum: Several datasets per Measurement. Ex. 1 spectrum dataset per MCA device.
• ImageData: Several datasets per Measurement. Ex. 1 image dataset per CCD device.
Advantages • Simple to implement • Answers current scientists' demands (keep measurement data together, compression, …) • Compatible with NeXus if desired (specific NeXus groups can be written at any time with links and the opposite is also true) • Can be seen/used as an intermediate step for not-yet-defined instruments or uses
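One way to picture the proposed hierarchy is as a nested mapping that is flattened into HDF5-style paths. The sketch assumes that Positioners, ScalarData, Spectrum and ImageData all sit inside the Measurement group, which the slide suggests but does not state explicitly. The entry name and all dataset names (`th`, `mca_0`, `ccd_0`, ...) are hypothetical, and plain dictionaries stand in for a real HDF5 file.

```python
# Sub-groups named on the slide; their nesting under Measurement is assumed.
ALLOWED_GROUPS = {"Positioners", "ScalarData", "Spectrum", "ImageData"}

# Hypothetical file content: one entry holding one Measurement group.
LAYOUT = {
    "entry_1": {
        "Measurement": {
            "Positioners": ["th", "tth"],
            "ScalarData": ["th", "monitor", "detector"],
            "Spectrum": ["mca_0"],
            "ImageData": ["ccd_0"],
        }
    }
}


def check_layout(layout):
    """Return a list of problems; an empty list means the layout fits the sketch."""
    problems = []
    for entry, content in layout.items():
        if "Measurement" not in content:
            problems.append(f"{entry}: missing Measurement group")
            continue
        unknown = set(content["Measurement"]) - ALLOWED_GROUPS
        for group in sorted(unknown):
            problems.append(f"{entry}: unexpected group {group}")
    return problems


def dataset_paths(layout):
    """List every dataset as an HDF5-style absolute path."""
    paths = []
    for entry, content in layout.items():
        for group, names in content["Measurement"].items():
            for name in names:
                paths.append(f"/{entry}/Measurement/{group}/{name}")
    return paths


print(check_layout(LAYOUT))
for path in dataset_paths(LAYOUT):
    print(path)
```

Because every dataset lives under a group that states its dimensionality, a browser or analysis tool can decide how to display it from the path alone.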
Current Status • Analysis tools have to be ready for NeXus/HDF5 prior to the format implementation • Scripts to convert from Specfile to HDF5 written • TODO: Add a set of EDF files to a particular scan of an HDF5 file • Python module for HDF5/NeXus file contents browsing written • Python support implemented using h5py and in collaboration with CHESS • Full support for 1D data visualization and analysis incorporated into PyMca • TODO: Apply PyMca 2D, 3D and 4D visualization capabilities to HDF5/NeXus files
PyMca HDF5/NeXus HDF5 Support Collaboration with D. Dale, CHESS SOLEIL NeXus Data courtesy of J.A. Sans and G. Martínez
Data courtesy of J.A. Sans and G. Martínez Data courtesy of A. Díaz PyMca Visualization Data courtesy of P. Cloetens PyMca Object3D Module Up to 4D visualization
We need your experience How do you deal with instruments saving the data in files with proprietary formats? - Do you include the file names somewhere in the relevant instrument field? - Do you convert the format to include it in the final file? How do you deal with data originating from several computers? - Is it the sequencer that reads and writes the data? - Is everything buffered and written by a particular server? - Is concurrent access to the file possible?
ESRF conclusions so far • HDF5 will be supported • Analysis codes must be able to deal natively with HDF5 prior to deployment • NeXus groups we can use will be used • We consider it an error to use NeXus groups for things other than what they were intended for. A standard that is not respected is no longer a standard. • Our analysis codes will support NeXus in its HDF5 version (but feel free to add XML)