Data exchange, data merging and common storage format for NEWS

Presentation Transcript


  1. Data exchange, data merging and common storage format for NEWS. Valeri Tioukov, 13/06/2019

  2. Proposed data flow (26/07/2016). Diagram: European scanning system, Japanese scanning system and Monte Carlo as inputs; two converters; Common Data Format; common analysis tools; compatible results. LNGS 13/06/2019

  3. Current data flow (presented in 2017). Diagram: European scanning system => EU data; Japanese scanning system => JP data; Monte Carlo => MC data.

  4. DMDS - Revision 5: /dm2root/src/libDMRoot (presented in 2017). The first version of the data exchange library is ready and available here: http://emulsion.na.infn.it/svn/DMDS/ The project is called dm2root and contains one library, libDMRoot. Files: • .. • DMRCluster.cpp • DMRCluster.h • DMRGrain.cpp • DMRGrain.h • DMRImage.cpp • DMRImage.h • DMRLog.cpp • DMRLog.h • DMRMicrotrack.cpp • DMRMicrotrack.h • DMRRun.cpp • DMRRun.h • DMRRunHeader.cpp • DMRRunHeader.h • DMRView.cpp • DMRView.h • DMRViewHeader.cpp • DMRViewHeader.h • DMRootLinkDef.h • Makefile • libDMRoot.h • libDMRoot.sln • libDMRoot.vcproj

  5. Next step toward a common(?) data format (4 days ago): a new, additional storage library (an almost exact copy of libDMRoot). • Question: Why did you decide to make a copy of libDMRoot rather than use it directly, as was suggested? • Answer: We have a different scanning system; some parameters and some algorithms are different, so we need our own classes => the data format should be different.

  6. We have 3 different scanning systems in Italy: the polarizing system, the color system and the grayscale system. • Scanning systems differ • Some parameters differ • Some algorithms differ. Do we need different storage libraries with different formats for them? Following this logic, each microscope could have its own format: • libDMR_Color • libDMR_Polar • libDMR_Gs • …. And if we made some modification to the scanning system? • libDMR_Polar_configuration1 • …. Creating a NEW, DIFFERENT storage format because of a small difference in the algorithms or parameters: is it really a good idea?

  7. Transient data model (data processing): the classes and algorithms are different, and that is OK! • Color system => Color classes and Color algorithms • Polar system => Polar classes and Polar algorithms • B/W system => BW classes and BW algorithms • (Japanese system => Japanese classes and Japanese algorithms). BUT we use a common storage format. Persistent data model (data storage): common parameters, plus Color-specific, Polar-specific and Japanese-specific parameters. • Color system: DMRCluster { x,y,z ……. Icol != 0 Ipol = 0 } • Polar system: DMRCluster { x,y,z ……. Icol = 0 Ipol != 0 } • BW system: DMRCluster { x,y,z ……. Icol = 0 Ipol = 0 } • Japanese system: DMRCluster { x,y,z ……. Icol = 0 Ipol = 0 IellipticFit != 0 }. That's all! No duplication of storage classes is needed.
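The idea of one persistent record shared by all systems can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `ClusterRecord`, `setPolar` and `filledBlocks` are hypothetical names, the field types are guesses, and nothing here is the actual `DMRCluster` class from libDMRoot. Only the field names x, y, z, Icol, Ipol and IellipticFit are taken from the slide.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one common storage record (NOT the real DMRCluster).
struct ClusterRecord {
    float x = 0, y = 0, z = 0;  // common parameters, filled by every system
    int Icol = 0;               // color-specific block, 0 = not filled
    int Ipol = 0;               // polarization-specific block, 0 = not filled
    int IellipticFit = 0;       // elliptic-fit block, 0 = not filled
};

// Attach polarization information; 0 is reserved for "block absent".
void setPolar(ClusterRecord& c, int ipol) {
    assert(ipol != 0);
    c.Ipol = ipol;
}

// List which blocks of a record carry data, whatever system produced it.
std::string filledBlocks(const ClusterRecord& c) {
    std::string s = "common";
    if (c.Icol != 0) s += "+color";
    if (c.Ipol != 0) s += "+polar";
    if (c.IellipticFit != 0) s += "+elliptic";
    return s;
}
```

A consumer written against such a record needs no knowledge of the producing microscope: it checks which blocks are filled and ignores the rest.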

  8. Diagram. Data production: Color SS, Polar SS, Elliptical SS, Phase Contrast SS. Preprocessing (Calg, Palg, Ealg, PHalg) => writing to the common format. Common format: images, clusters, grains, microtracks, etc. Reading from the common format => postprocessing, crosscheck, analysis (Calg, Palg, Ealg, PHalg, any other algorithm); direct data check.

  9. Common format: what is this? • Is it a raw format? Not really. • Is it a final format? Not necessarily. • Is it an almost final format? Not necessarily. Most important properties of any common format: • The information is sufficient to perform the complete data analysis • It is documented and clear to everybody • All relevant experimental data are available in this format

  10. Example 1: raw images only. • A legal common format, but very inconvenient: huge files, slow processing. Diagram: SS data production => preprocessing (assume that no algorithms are available here) => writing to the common format. Common format: raw images, clusters, grains, microtracks, etc. Reading from the common format => postprocessing, crosscheck, analysis: clustering, other processing.

  11. Example 2: images related to clusters, and the clusters themselves. Already a good common format. Diagram: SS data production => preprocessing: clustering (a clustering algorithm is available) => writing to the common format. Common format: cluster images, clusters, grains, microtracks, etc. Reading from the common format => postprocessing, crosscheck, analysis: other processing.

  12. Data providers and data consumers in the collaboration. Providers: scanning, preprocessing, writing data into the common format. Consumers: reading data from the common format, postprocessing, analysis. • Nagoya produces Elliptical data • Napoli produces Polarization data • ..etc… Napoli consumes Elliptical data; Nagoya consumes Polarization data; machine learning can consume any data. New algorithms can be developed by both data providers and data consumers. Once a new algorithm is available, tested and working well, you may want to make its results available to the Collaboration and provide them as part of the Common Format.

  13. Example 3: images related to clusters, clusters and grains. Even better common format. Diagram: SS data production => preprocessing: clustering and graining => writing to the common format. Common format: cluster images, clusters, grains, microtracks, etc. Reading from the common format => postprocessing, crosscheck, analysis: other processing.

  14. Example 4: images related to clusters, clusters and grains. Even better common format. Diagram: SS data production => preprocessing: clustering and graining => writing to the common format. Common format: cluster images, clusters, grains, microtracks, etc. Reading from the common format => postprocessing, crosscheck, analysis: barshift polarization analysis (code exported to SVN becomes available to the Collaboration), other processing.

  15. Example 5 (near future): images related to clusters, clusters and grains, microtracks. Rich common format. Diagram: SS data production => preprocessing: clustering and graining, microtracks => writing to the common format. Common format: cluster images, clusters, grains, microtracks, etc. Reading from the common format => postprocessing, crosscheck, analysis: barshift polarization analysis (code exported to SVN becomes available to the Collaboration), other processing.

  16. In libDMRoot we have already prepared the structures for most of the basic objects (common format: cluster images, clusters, grains, microtracks, etc.). • Does the data provider have some information at the preprocessing phase? => Fill it • No such information? => Do not fill it • It is not necessary to wait until a complete and ideal processing chain is established before starting to use a common format • It is not too early to start now; it is quite late, because we need to perform the common analysis immediately • If some structure is not sufficient to accommodate some information, we can extend it • What is missing in DMRCluster to fit the Japanese Elliptical data? • Different algorithms can produce some differences in the results. Two solutions: • keep in the data the information (a flag) about the algorithm applied • export the algorithm itself in a way that lets other people run it on Common Format data
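The first of the two solutions, keeping a flag about the algorithm applied, might look like the sketch below. All names are hypothetical: the enum values, `StoredCluster` and `selectByAlg` are invented for illustration and do not come from libDMRoot, where the actual flag encoding would have to be agreed between data providers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical algorithm identifiers; real values would have to be agreed on.
enum class Alg : int { Unknown = 0, ClusteringV1 = 1, ClusteringV2 = 2 };

struct StoredCluster {
    double x, y, z;
    Alg alg;  // which algorithm produced this cluster
};

// Keep only clusters produced by one algorithm, so that like is compared with like.
std::vector<StoredCluster> selectByAlg(const std::vector<StoredCluster>& in, Alg wanted) {
    assert(wanted != Alg::Unknown);
    std::vector<StoredCluster> out;
    for (const StoredCluster& c : in)
        if (c.alg == wanted) out.push_back(c);
    return out;
}
```

With such a flag stored per object, a consumer can separate results of different algorithms before comparing them, instead of needing a separate storage format for each algorithm.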

  17. Practical steps to do • Define the extensions to libDMRoot that are necessary (if any) to fit the Nagoya data • Extend libDMRoot • Drop libJPData to avoid code duplication and start exporting data in libDMRoot; they are practically identical now, so this is straightforward • libDMRoot, the storage library, is conservative and should be updated only when it is really necessary and in agreement with the other data providers • For processing (not for storage), on the other hand, any new classes and new libraries can be created at both the preprocessing and the postprocessing level • The only constraints are: • preprocessing algorithms must be able to write data into the common format (directly or via a converter) • postprocessing reads data from the common format

  18. What is data merging? • A sample was scanned in Japan => elliptic selection done • The same sample was scanned in Napoli => polarization analysis done • To merge the data we do not need to put them together into the same tree • To merge the data we do not need to put them together into the same file • We need to find a one-to-one correspondence between the clusters (grains) obtained by the two systems • The basic result of the data merging is this table of correspondences, together with both original data files

  19. Example of merged data. File A: grains (clusters) in common format. File B: grains (clusters) in common format. File 3: list of matched couples (viewA, grA <-> viewB, grB).
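The table of matched couples can be illustrated with a small brute-force matcher. This is a minimal sketch, not the dmalign or dmmerge implementation: it assumes both grain lists are already expressed in the same reference frame, uses a greedy nearest-neighbour choice inside a square acceptance window, and all type and function names (`Grain`, `Couple`, `matchGrains`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Grain { int view; int id; double x; double y; };      // positions in a shared frame (assumed)
struct Couple { int viewA; int grA; int viewB; int grB; };   // one row of the couples table

// Greedy matching: each grain of A takes the closest still-unused grain of B
// whose x and y offsets are both inside the acceptance window.
std::vector<Couple> matchGrains(const std::vector<Grain>& a,
                                const std::vector<Grain>& b,
                                double acceptance) {
    assert(acceptance > 0.0);
    std::vector<char> used(b.size(), 0);
    std::vector<Couple> couples;
    for (const Grain& ga : a) {
        std::size_t best = b.size();  // b.size() means "nothing found yet"
        double bestD2 = 0.0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            if (used[j]) continue;
            const double dx = b[j].x - ga.x;
            const double dy = b[j].y - ga.y;
            if (std::fabs(dx) > acceptance || std::fabs(dy) > acceptance) continue;
            const double d2 = dx * dx + dy * dy;
            if (best == b.size() || d2 < bestD2) { best = j; bestD2 = d2; }
        }
        if (best != b.size()) {
            used[best] = 1;  // keeps the correspondence one-to-one
            couples.push_back({ga.view, ga.id, b[best].view, b[best].id});
        }
    }
    return couples;
}
```

The couples list refers to grains only by (view, grain) indices, so both original files stay untouched, which is the point made on the slide. The quadratic loop is fine for a demonstration; for about a million grains per file a spatial index would be needed.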

  20. The same area scanned on two Napoli systems: Sample A (NSSna2) and Sample B (NSSna1). • C60keV_test/polarized_light/dm_tracks.dm.root: 5 mm x 1 mm area scanned, 1827 views, about 900000 grains • C60keV_test/color_camera/dm_tracks.dm.root: 5 mm x 1 mm area scanned, 1703 views, about 1300000 grains

  21. Global alignment. dmalign.Aff: 1.000516 0.012561 -0.013973 0.998641 11.60 22.77 • Result of the global alignment procedure: • 5 mm² vs 5 mm² • 325000 coincidences found with ±1.5 μm acceptance • About 2/3 of them are in the peak core • Matching accuracy (3σ of the peak): • X: ±1.1 μm • Y: ±0.65 μm
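The six numbers printed by dmalign can be read as a 2D affine transformation. The sketch below assumes the order a11 a12 a21 a22 b1 b2, with x' = a11·x + a12·y + b1 and y' = a21·x + a22·y + b2. That ordering is an assumption, not something the slide states, and the types `Affine2D` and `Point` are hypothetical helpers, not dmalign code.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x; double y; };

// Assumed parameter order: a11 a12 a21 a22 b1 b2 (not confirmed by the slide).
struct Affine2D { double a11, a12, a21, a22, b1, b2; };

// Map a point from one sample's frame into the other's.
Point apply(const Affine2D& t, Point p) {
    return { t.a11 * p.x + t.a12 * p.y + t.b1,
             t.a21 * p.x + t.a22 * p.y + t.b2 };
}

double determinant(const Affine2D& t) {
    return t.a11 * t.a22 - t.a12 * t.a21;
}

// Inverse transformation, to map points back into the original frame.
Affine2D inverse(const Affine2D& t) {
    const double d = determinant(t);
    assert(std::fabs(d) > 0.0);  // a singular matrix cannot be inverted
    Affine2D r;
    r.a11 =  t.a22 / d;
    r.a12 = -t.a12 / d;
    r.a21 = -t.a21 / d;
    r.a22 =  t.a11 / d;
    r.b1 = -(r.a11 * t.b1 + r.a12 * t.b2);
    r.b2 = -(r.a21 * t.b1 + r.a22 * t.b2);
    return r;
}
```

Under that reading, the quoted values describe a matrix close to a rotation of about 13 mrad with a scale close to 1, plus an offset of (11.60, 22.77) in the coordinate units of the data.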

  22. Merging procedure: a one-to-one correspondence is established. Command: dmmerge –par=align.rootrc Input: a.dm.root (scanning data), b.dm.root (scanning data), a_b.cp.root (couples, the result of dmalign). Output: a_b.mrg.root, with a “match” tree made of selected branches for selected couples of both scanned samples.
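Conceptually, the merge step joins the two scanning files through the couples table. The following is a minimal in-memory sketch of such a join, not the dmmerge code: the structs and the function `mergeByCouples` are hypothetical, and only the names s1, s2, eX and eY are borrowed from the cut shown in the next slide.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct Grain { int view; int id; double eX; double eY; };
struct Couple { int viewA; int grA; int viewB; int grB; };
struct MatchRow { Grain s1; Grain s2; };  // one entry of the merged "match" table

using Key = std::pair<int, int>;  // (view, grain id)

// Build a lookup from (view, id) to the grain stored in one file.
static std::map<Key, const Grain*> indexGrains(const std::vector<Grain>& v) {
    std::map<Key, const Grain*> idx;
    for (const Grain& g : v) idx[Key(g.view, g.id)] = &g;
    return idx;
}

// For every couple, copy the selected fields of both grains into one row.
// Couples pointing to a grain that is not present are skipped.
std::vector<MatchRow> mergeByCouples(const std::vector<Grain>& a,
                                     const std::vector<Grain>& b,
                                     const std::vector<Couple>& couples) {
    const std::map<Key, const Grain*> ia = indexGrains(a);
    const std::map<Key, const Grain*> ib = indexGrains(b);
    std::vector<MatchRow> rows;
    for (const Couple& c : couples) {
        const auto pa = ia.find(Key(c.viewA, c.grA));
        const auto pb = ib.find(Key(c.viewB, c.grB));
        if (pa == ia.end() || pb == ib.end()) continue;
        rows.push_back(MatchRow{*pa->second, *pb->second});
    }
    assert(rows.size() <= couples.size());
    return rows;
}
```

Each output row carries the same physical grain as seen by both systems, which is what allows quantities measured by one system to be studied against those measured by the other.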

  23. root -l check_mrg.C The signal is selected here: TCut cut("cut","abs(s2.eX-s1.eX)<0.4&&abs(s2.eY-s1.eY)<0.25"); //peak
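The same selection can be written as an ordinary predicate, which may help when reading the cut string. This is a sketch outside ROOT: `Pos`, `MatchRow`, `inPeak` and `peakFraction` are hypothetical names, and only the thresholds 0.4 and 0.25 and the field names s1, s2, eX, eY come from the TCut above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pos { double eX; double eY; };
struct MatchRow { Pos s1; Pos s2; };  // the two matched grains of one couple

// Same condition as the TCut: the residuals in X and Y must both be in the peak.
bool inPeak(const MatchRow& m) {
    return std::fabs(m.s2.eX - m.s1.eX) < 0.4 &&
           std::fabs(m.s2.eY - m.s1.eY) < 0.25;
}

// Fraction of couples that pass the peak selection.
double peakFraction(const std::vector<MatchRow>& rows) {
    assert(!rows.empty());
    std::size_t n = 0;
    for (const MatchRow& m : rows)
        if (inPeak(m)) ++n;
    return static_cast<double>(n) / static_cast<double>(rows.size());
}
```

The window is tighter in Y than in X, in line with the better matching accuracy quoted for Y in the global alignment result.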

  24. Peak couples: effect of polarization on the cluster direction and the barycenter shift (Ag40nmNP). • Without filter: no dependence of the cluster direction on the polarization; no barshifts > 0.04 • With filter: clear dependence of the cluster direction on the polarization; very few barshifts > 0.04. For nanoparticles there is no correlation between the cluster angle and the barshift.

  25. File sharing • We got 100 TB of disk space at CNAF (the computing center of INFN) • The WebDAV protocol is to be set up for accessing this space • Once this is done, all data providers will have write access and all data consumers will have read access to the data • The request for WebDAV access was made several weeks ago • Meanwhile, in Napoli we export data using our group's Apache web server to make it available for download • Is this also possible for the Japanese data? Some Cloud solution?
