1 / 25

Emulsion Database Design Status Report

Emulsion Database Design Status Report. Extends Opera Internal Note # 38: “ Database Architecture for the European Emulsion Scanning System”. Updated DB Schema. Distributed DB Implementation. DB Client Technologies. DB Client Libraries. Conclusions. Cristiano Bozza European Emulsion Group

isaura
Download Presentation

Emulsion Database Design Status Report

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Emulsion Database Design Status Report Extends Opera Internal Note # 38:“Database Architecture for the European Emulsion Scanning System” Updated DB Schema Distributed DB Implementation DB Client Technologies DB Client Libraries Conclusions Cristiano Bozza European Emulsion Group LNGS, May 2003

  2. Updated DB Schema

  3. Updated DB Schema

  4. Updated DB Schema

  5. Updated DB Schema Overall structure unaltered, only fine tuning of data types and table columns To assess performance, tuning must be done in real-life conditions Not only benchmarks, but real use of DB for test exposure data

  6. Updated DB Schema DB size estimate (depends on Physics as well as instrumental needs) Assumptions for minimum size: 25000 events (OPERA half data set + tests, technical runs...) Few data from CS (< 100 tracks per sheet), so they are negligible Intercalibration through 4 track maps per sheet, 100 tracks / map All sheets in a brick are intercalibrated (conservative) 1 candidate is followed on 28 sheets on average (conservative) 1 “interesting points” per event (vtx, decays, e.m. showers, ...) 3 pass scanning (Vtx location, selection, precision measurement) 100 cosmics / background / fake microtracks in each zone

  7. Updated DB Schema DB size estimate (depends on Physics as well as instrumental needs) Assumptions for maximum size: 50000 events (OPERA full data set + tests, technical runs...) Few data from CS (< 100 tracks per sheet), so they are negligible Intercalibration through 4 track maps per sheet, 100 tracks / map All sheets in a brick are intercalibrated (conservative) 10 candidates are followed on 28 sheets on average (conservative) 3 “interesting points” per event (vtx, decays, e.m. showers, ...) 3 pass scanning (Vtx location, selection, precision measurement) 100 cosmics / background / fake microtracks in each zone

  8. Updated DB Schema DB size estimate (depends on Physics as well as instrumental needs) Results: Assuming 100 bytes / track + aligned reconstructions, we have 0.11÷2.3 TB Including safety factors for possible underestimations, the DB size should stay within DBSize < 2.5 TB To estimate the network bandwidth needed, we assume that the full dataset is read 10 times during 5 years’ data taking 0.055÷1.2 Mbit/s ON AVERAGE

  9. Distributed DB Implementation Workstation DB Group Workstation DB Group Core DB Scanning DB Scanning DB

  10. Distributed DB Implementation Core DB Group: Centralized location Full OPERA emulsion data set Multimaster replication (each machine has one copy) Scanning DB: One machine per each scanning site (+ optional backup) Copy of the subset of locally produced scanning data Materialized views (each machine stores only locally produced data) The full dataset is still accessible in a transparent way Minimum network traffic

  11. Distributed DB Implementation Pilot DB farm Core DB Group Simulation: Two Dell servers with high speed (1 Gbit) Ethernet connection Scanning DB Simulation: One Dell server machine with a materialized view of Salerno data Normal 100 Mbit/s LAN connection

  12. DB Client Technologies Several client connection technologies are being explored Goal: highest possible data availability for Oracle data Windows: ODBC, OLE DB, Oracle ODP.NET, ADO, ADO.NET Linux: Oracle OCI C++ libraries, Perl, Tcl/Tk, Python, GNOME-DB, J2EE Mono (.NET) Oracle provider

  13. DB Client Technologies Oracle and Windows: Working with DB is common practice in the Windows community. Oracle provides both the DB server and several client tools / libraries. Oracle 9i Database Server consists of 3 installation disks (2.4 GB). We have made several performance tests to choose the best access method. Since our code will be under .NET, the ADO.NET layer was common to all tests. OCI C++: low level API, much harder than other methods. ODBC: general DB library. OLE DB: 50% faster than ODBC. OK! Oracle Data Provider (ODP.NET): slightly slower (~10%?) than OLE DB.

  14. DB Client Technologies Oracle and Linux: Oracle is fully committed to supporting the Linux operating system. Indeed, Oracle was the first commercial database available on Linux. All key Oracle products including Oracle 9i Database, Application Server, Collaboration Suite, Developer Suite and E-Business Suite. Oracle 9i Database consists of 3 cpio archive files ( ~ 1.4 Gb). We installed it on a Red Hat 7.3 Linux Distribution.

  15. DB Client Technologies Oracle and Linux: OCI C++ Oracle Call Interface (OCI) is the Oracle software allowing access to thedatabase from an external application. In principle, you can use this C-based API to build a Database application from scratch. Oracle and Linux: Perl Perl is probably the most famous open source language. It is an interpreted scripting language easy to learn and extremely quick. It is widely used in Internet and Database applications.

  16. Perl 5 Script Perl DBI DBD:: Oracle Oracle OCI Oracle DB DB Client Technologies Oracle and Linux: Perl Perl Database applications are based on the DBI module, an object oriented architecture module. This module requires a specific database dependent driver (DBD::Oracle) to connect to the database

  17. DB Client Technologies Oracle and Linux: Perl A sample Perl script to access data from Oracle use strict ; use DBI ; # Connection to the DB my $dbh = DBI -> connect(dbi:Oracle:operadb,operausr,operapwd); # Database SQL Query my $sql = qq { SELECT Grains,SlopeX,SlopeY FROM MIPBaseTracks}; # Execute SQL Query my $sth = $dbh ->prepare ($sql); $sth -> execute (); # Loop on the table While (my ($run,$event,$ntracks)= $sth->fetchrow_array){ # Here you can do whatever you want with the data }

  18. DB Client Technologies Oracle and Linux: Tcl/Tk Tcl is an excellent scripting language created in 1987 by John Ousterhout. In 1988 he started to develop a graphic tool called Tk. From that moment on, Tcl/Tk is one of the most favourite language in the Open Source community. Oratcl is the module allowing the connection to an Oracle database. The connection to the database can be tested interactively using the Tcl shell.

  19. DB Client Technologies Oracle and Linux: Tcl/Tk A sample Tcl/Tk script to access data from Oracle tclsh %package require Oratcl %set handle [oralogon operausr/operapwd@operadb] %set cursor [oraopen $handle] %orasql $cursor {select Grains,SlopeX,SlopeY from MIPBaseTracks} %orafetch $cursor 25 0.400 0.003 32 0.231 0.005 …………… %oraclose $cursor %oralogoff $handle %exit

  20. DB Client Technologies Oracle and Linux: Perl/Tk Tk is wrapped inside Perl. So you can combine Perl quickness with Tk graphicscapabilities. Oraexplain is the Perl/Tk module used to connect a program to an Oracle database. Oraexplain scripts are more complex because they involve graphical elements like windows, buttons and frames. You can use them to develop login windows or more complex graphics applications.

  21. DB Client Technologies Oracle and Linux: Python Python is a GUI open source and object-oriented scripting language created by Guido van Rossum. It is used for any kind of application: GUI, XML, e-mail, …. DCOracle is the module used to connect a Python script to an Oracle database. With Python you can define classes and re-use them in other scripts.

  22. DB Client Technologies Oracle and Linux: GNOME-DB The GNOME-DB project aims to provide a free unified data access architecture to the GNOME project. GNOME-DB is useful for any application that accesses persistent data (not only databases), since it now contains a pretty good data management API. We are currently testing the GNOME-DB libraries. Oracle and Windows/Linux: Mono ADO.NET Data Provider for Oracle databases works on Windows and Linux with Oracle 8i and higher versions. We are currently testing the Mono ADO.NET libraries. Tests in progress: Web applications with Apache, JSP, PHP, Java, JDBC

  23. DB Client Libraries We are developing a full set of .NET / Mono classes specific to Opera DB OperaDb Namespaces in black OperaDb::Connection Classes in dark red OperaDb::Transaction OperaDb::Scanning OperaDb::Scanning::Batch OperaDb::Scanning::LinkedZone OperaDb::Scanning::MIPBaseTrack Who does not want to use SQL can program the DB in C++, C#, VB, FORTRAN, and so on, using these classes OperaDb::Scanning::MIPIndexedEmulsionTrack OperaDb::TotalScan OperaDb::TotalScan::Layer OperaDb::TotalScan::Segment OperaDb::TotalScan::Track OperaDb::TotalScan::Volume OperaDb::ComputingInfrastructure OperaDb::ComputingInfrastructure::Machine OperaDb::ComputingInfrastructure::ProgramSettings OperaDb::ComputingInfrastructure::User OperaDb::ComputingInfrastructure::UserPermissions More classes to come... OperaDb::ComputingInfrastructure::Site

  24. Sample #1: How to store a scanning zone into Opera DB C++: idZone = LinkedZone::Save(pMyZone, idBatch, rawDataPath, startTime, endTime, dbConn, dbTrans); C#: idZone = LinkedZone.Save(MyZone, idBatch, rawDataPath, startTime, endTime, dbConn, dbTrans); Sample #2: How to retrieve a volume reconstruction from Opera DB C++: Volume *pVol = new Volume(dbConn, dbTrans, idVolume, true); C#: Volume Vol = new Volume(dbConn, dbTrans, idVolume, true); Sample #3: How to convert a scanning zone from Opera DB to ROOT file C++: SySal:: Root::Opera::LinkedZone::Save(new LinkedZone(dbConn, dbTrans, idZone), filePath); C#: SySal.Root.Opera.LinkedZone.Save(new LinkedZone(dbConn, dbTrans, idZone), filePath); DB Client Libraries Code samples

  25. Conclusions The DB architecture that has been proposed some months ago is working Oracle 9iDS looks a very good choice (easy to implement, maintain, and develop; widely supported) Even people that are not familiar with SQL can easily work with the proposed structure of emulsion DB using interface libraries All interesting OS are supported by Oracle and by our interface libraries Conversion to Root data is trivial (1 line of code) Interface libraries for OperaDB are almost (95%) complete ...everything fine up to now!

More Related