
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript


MSDC: MiniSeed Data Completeness

S. Pintore

Scenario
  • A network of SeisComP remote servers, each archiving data on its own mass storage and so creating a Peripheral Archive (PA)
  • A server creating a Central Archive (CA)
  • Telecommunication network availability < 100%
  • Limited bandwidth
Incomplete data
  • Data in the CA can be incomplete if the network becomes and stays unreachable for a long time
    • Some files are missing from the CA
    • Data gaps appear within the files of the CA
The Mednet SeisComP servers network
  • Data are stored in 24-hour files
  • The segsize parameter is set to 5000 (512-byte blocks)
    • If the link stays down longer than about an hour, the data will contain gaps.
    • If the link stays down longer than 1-2 days, some files will be missing.
  • Telecommunication network availability is generally good
    • Network faults lasting more than 1 day are more frequent than faults lasting between an hour and 1 day.
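
As a rough sanity check on the figures above, the size of one buffer segment follows directly from the segsize value. The sketch below assumes, as the slide states, that segsize counts 512-byte blocks. The function names are illustrative, and the hourly fill rate is only a back-of-the-envelope figure under the assumption that one segment is what bridges an outage; it is not a measured Mednet data rate.

```python
BLOCK_BYTES = 512  # block size stated on the slide


def segment_bytes(segsize: int) -> int:
    """Size in bytes of one segment made of `segsize` 512-byte blocks."""
    return segsize * BLOCK_BYTES


def fill_rate_bytes_per_s(segsize: int, seconds: float) -> float:
    """Average data rate that would fill one segment in `seconds`."""
    return segment_bytes(segsize) / seconds


size = segment_bytes(5000)
rate = fill_rate_bytes_per_s(5000, 3600)  # one hour, as quoted on the slide
print(size, round(rate))  # prints: 2560000 711
```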
Retransmit or Integrate?
  • To ensure data quality, an integrity check is necessary
  • Because of the bandwidth limit, you must choose between:
    • retransmitting the whole file containing a gap
    • integrating the file by transmitting only the data needed to fill the gap
  • These two execution steps aren’t necessarily distinct
Respect the environment
  • The procedure to rebuild the correct data should have a low impact on the systems. It should:
    • run on Linux using low resources
    • offer link security
    • permit control on bandwidth use
    • not need specific firewall rules
MSDC solution
  • MSDC uses the rsync tool, which is already available, optimised for similar problems, and well tested
  • rsync performs the data check by comparing the files in the CA with those in the PA
  • It uses rsync over ssh to:
    • secure the connection
    • avoid using the rsync port (873)
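
A minimal sketch of what an rsync-over-ssh invocation of this kind could look like, written in Python for illustration (MSDC itself is a bash script, and its actual command line is not shown in these slides). The host name, directories, key path, and the choice that the CA pulls from the PA are all assumptions. The rsync options used (`--archive`, `--compress`, `--partial`, `-e`, `--bwlimit`, `--dry-run`, `--itemize-changes`) are standard ones.

```python
import shlex


def build_rsync_command(remote, src_dir, dest_dir, key_file,
                        bwlimit_kbps=None, dry_run=False):
    """Build an rsync argument list that tunnels through ssh.

    All argument values are hypothetical; nothing here is taken
    from the real MSDC script.
    """
    # Using ssh as the transport secures the link and avoids port 873.
    ssh_cmd = f"ssh -i {shlex.quote(key_file)}"
    cmd = ["rsync", "--archive", "--compress", "--partial", "-e", ssh_cmd]
    if bwlimit_kbps is not None:
        # Cap the transfer rate to keep control over bandwidth use.
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    if dry_run:
        # Only report which files differ, without transferring anything.
        cmd += ["--dry-run", "--itemize-changes"]
    cmd += [f"{remote}:{src_dir.rstrip('/')}/", dest_dir]
    return cmd


cmd = build_rsync_command(
    "sysop@pa.example.org",      # hypothetical PA host
    "/data/archive",             # hypothetical PA directory
    "/data/central/",            # hypothetical CA directory
    "/home/sysop/.ssh/msdc_key", # hypothetical dedicated key
    bwlimit_kbps=64,
    dry_run=True,
)
print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it. The dry-run form is useful as a check-only pass before any data is moved.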
What does rsync offer?
  • The features of the rsync algorithm
    • it works on arbitrary data
    • the total data transferred is about the size of a compressed diff file
    • it is fast for large files and large collections of files
    • it doesn’t assume any prior knowledge of the two files, but takes advantage of similarities
    • it is computationally inexpensive
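
Part of what keeps the algorithm cheap is its weak rolling checksum: the checksum of a window shifted by one byte can be derived from the previous one in constant time. The sketch below follows the published description of the rsync algorithm rather than rsync's actual source code. The function names are illustrative.

```python
M = 1 << 16  # modulus used for both checksum halves


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the (a, b) pair for a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Byte at offset i is weighted by its distance from the block end.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide a window of length n forward by one byte in O(1)."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new


data = bytes(range(256)) * 4
n = 64
a, b = weak_checksum(data[:n])
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    # The rolled value must match a full recomputation at every offset.
    assert (a, b) == weak_checksum(data[k:k + n])
print("rolling checksum matches recomputation")
```

In the full algorithm, a match on this weak checksum is confirmed with a stronger hash before a block is treated as identical.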
MSDC main features
  • It can be run from the command line or from a crontab entry
  • It is a bash script
  • It avoids conflicts between concurrent runs by using a simple locking mechanism
  • It logs events and the names of the files corrected or definitively lost
  • Installation is done by the sysop user in that user's home directory
  • MSDC uses an ssh key pair to automate the ssh connection
  • This key pair is dedicated to MSDC; no other connections are possible using it
  • MSDC doesn’t interfere with other keys used to automate ssh connections
  • It doesn’t need a running rsync server
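
The slides do not say how MSDC's locking is implemented. One common way to keep two scheduled runs from overlapping is an atomically created lock file, sketched below in Python for illustration. The lock file name and function names are hypothetical.

```python
import os
import tempfile


def acquire_lock(path: str) -> bool:
    """Create the lock file atomically; return False if it already exists."""
    try:
        # O_EXCL makes creation fail when another run holds the lock.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner for diagnostics
    return True


def release_lock(path: str) -> None:
    """Remove the lock file so the next run can start."""
    os.remove(path)


lock_path = os.path.join(tempfile.mkdtemp(), "msdc.lock")  # hypothetical name
print(acquire_lock(lock_path))  # first run gets the lock
print(acquire_lock(lock_path))  # a concurrent run is refused
release_lock(lock_path)
print(acquire_lock(lock_path))  # the lock is free again
release_lock(lock_path)
```

A real script would also need to handle a stale lock left behind by a crashed run, for example by checking whether the recorded process is still alive.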
The MSDC package
  • The MSDC package msdc.tgz contains the files listed here:
  • msdc/bin/
  • msdc/bin/validate_rsync
  • msdc/bin/rsync
  • msdc/doc/README.msdc (documentation)
  • msdc/doc/COPYING (GPL license)
  • msdc/ssh
  • Option to use a different date
Alternative solutions: after the check
  • The data check could be done using SeedStuff utilities (check_file, extr_file, etc.) or qlib ones (qmerge, etc.).
  • For the incomplete files you can either:
    • retransmit the whole file
  • or:
    • use qmerge to extract the data needed to fill the gaps, then transmit these “patches”, possibly using qmerge again to fill the gaps.
  • Transmission: you should use a secure tool such as scp or sftp
  • You should then automate this procedure
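
Automating the patch approach starts with knowing which time windows are missing from a day file. The sketch below assumes the covered time spans have already been obtained with one of the tools named above. It does not read miniSEED itself and does not call qmerge. The function name and the sample times are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(segments, day_start, day_end, tolerance=timedelta(0)):
    """Return the (start, end) windows of the day not covered by any segment."""
    gaps = []
    cursor = day_start  # end of the data seen so far
    for seg_start, seg_end in sorted(segments):
        if seg_start - cursor > tolerance:
            gaps.append((cursor, seg_start))
        cursor = max(cursor, seg_end)
    if day_end - cursor > tolerance:
        gaps.append((cursor, day_end))
    return gaps


day = datetime(2005, 1, 1)
segments = [
    (day, day + timedelta(hours=10)),
    (day + timedelta(hours=11, minutes=30), day + timedelta(hours=24)),
]
for start, end in find_gaps(segments, day, day + timedelta(days=1)):
    print(start.isoformat(), end.isoformat())
```

Each window returned is a candidate patch to extract on the PA side and send to the CA. A day with no segments at all yields a single window covering the whole day, which corresponds to retransmitting the whole file.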