FAST: Flexible Automated Synchronization Transfer

Rosa Filgueira (University of Edinburgh), Iraklis Klampanos (University of Edinburgh), Yusuke Tanimura (AIST, Tsukuba), Malcolm Atkinson (University of Edinburgh).

Presentation Transcript


  1. FAST: Flexible Automated Synchronization Transfer • Rosa Filgueira, University of Edinburgh • Iraklis Klampanos, University of Edinburgh • Yusuke Tanimura, AIST, Tsukuba • Malcolm Atkinson, University of Edinburgh

  2. Index • Introduction • Problem description • Hypothesis • Rock Physics laboratory experiments • Objective • Proposal • Related developments • Data transfer protocols • Data transport systems • FAST • Selecting the best data transfer protocol • Data transfer experiments • Implementation and evaluation • Future work and Questions

  3. Problem description • Large number of rock physics (RP) laboratories • Run many experiments (experimentalists) • Large number of rock physicists • Develop computational codes (code builders) • Sharing experimental data among this community is still in its early days • No facilities to transfer experimental data automatically in real time together with its associated description (metadata)

  4. Problem description • Several tools provide reliable, high-performance data transfer capabilities • Dropbox or Globus Online • Not optimized for the RP requirements

  5. Hypothesis • The RP community will benefit from a tool that • Transfers data and metadata in near-real time • Provides a repository and DB accessible from a website • For experimentalists • Collection and comparison of experiments from many labs • For code builders • Find test data for running their models

  6. Laboratory experiments features-I • Laboratory rock property measurements • Properties of the rock sample are studied under different conditions • High-pressure vessels apply pore pressures and stresses to a cylindrical rock sample • Until the sample fails, different features (e.g. stress, porosity, temperature) are recorded at several time intervals • In each interval, data is transferred to a local computer (one channel per rock sample)

  7. RP laboratory experiment • [Figure: pressure vessel and rock samples, UCL RP Laboratory]

  8. Complex laboratory experiment: Creep-2 • Initial target: 30 months • Deployment under the sea (Mediterranean) • 8 rock samples with different features • Different time intervals and data sizes

  9. Laboratory experiments features-II • Each experiment can record data differently • Events can be written to a new file or appended to an existing one • Files can be stored in the same directory or not • Intervals for writing data can be short or long • Number of rock samples can be one or several • Duration of an experiment can be short or long • Transferring the data is a data-intensive problem

  10. Objective • To transfer RP experimental data from one location to another • Automated data transfer until the end of the experiment • Transfer experimental data • Near-real time and non-real time • Synchronization • Incremental (file) and directory • Possible interruptions and failures • Record and transfer the metadata
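
As an illustration of the incremental (file) versus directory synchronization distinction, here is a minimal polling sketch in Python. It is not FAST's implementation: the function names `scan` and `diff_snapshots` are hypothetical, and it assumes that comparing file sizes between two scans is enough to tell new files from files that have had events appended.

```python
import os
from typing import Dict, List, Tuple


def scan(directory: str) -> Dict[str, int]:
    """Map each file under `directory` (as a relative path) to its size in bytes."""
    sizes: Dict[str, int] = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            sizes[os.path.relpath(path, directory)] = os.path.getsize(path)
    return sizes


def diff_snapshots(old: Dict[str, int], new: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Compare two scans of the same directory.

    Returns (created, grown): files that are new since the last scan
    (candidates for directory sync) and files whose size increased
    (candidates for incremental file sync).
    """
    created = sorted(p for p in new if p not in old)
    grown = sorted(p for p in new if p in old and new[p] > old[p])
    return created, grown
```

A monitoring loop would call `scan` once per recording interval and hand the two lists returned by `diff_snapshots` to whichever transfer protocol has been selected.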

  11. Proposal • FAST: Flexible Automated Synchronization Transfer • Data and metadata in real time and non-real time • Incremental (file) and directory sync • Selection of the data-transfer protocol • Compatible with all operating systems • Simple to set up and manage • Monitors the transmission, detects errors and recovers from them • Data collected in a repository, metadata in a DB, and a web site for accessing them • Proposal is triggered by our work • EFFORT project • Using data provided by the Creep-2 project
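
The slides do not say how FAST detects and recovers from transmission errors. One common pattern, shown here only as a hedged sketch, is to retry a failed transfer with exponential backoff. The name `transfer_with_retry`, its parameters, and the choice of `OSError` as the failure signal are all assumptions for illustration.

```python
import time
from typing import Callable


def transfer_with_retry(
    transfer: Callable[[], None],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `transfer`, retrying on OSError with exponential backoff.

    Returns the number of attempts that were needed. If the last allowed
    attempt also fails, the OSError is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            transfer()
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.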

  12. Data transfer protocols: TCP • File Transfer Protocol (FTP) • Control and data are unencrypted • Easy to use, lack of security • FTP security extension (FTPS) • Control encrypted (TLS or SSL), but data might not be • Secure Copy (SCP) • SSH for transferring data and authentication (more secure than the previous ones) • File transfer only • Ideal for quick transfer of single files • SSH File Transfer Protocol (SFTP) • Based on SSH-2: best for secure access (packet confirmation) • File transfer, creating and deleting remote directories and files • Directory synchronization • Rsync • Incremental file transfer (delta algorithm) • File and directory synchronization • Can provide encrypted transfer by using SSH • On-the-fly compression option • Ideal for backups
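
To make the "incremental file transfer (delta algorithm)" bullet concrete, the following is a deliberately simplified Python sketch of block-based delta encoding. It is not rsync's actual algorithm: rsync pairs a cheap rolling checksum with a stronger hash, whereas this sketch hashes every window with SHA-256 for brevity. All function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple, Union

Op = Tuple[str, Union[int, bytes]]  # ("copy", block_index) or ("data", literal_bytes)


def block_signatures(old: bytes, block: int) -> Dict[bytes, int]:
    """Hash every full block of the receiver's old copy: digest -> block index."""
    sigs: Dict[bytes, int] = {}
    for i in range(0, len(old) - block + 1, block):
        sigs.setdefault(hashlib.sha256(old[i:i + block]).digest(), i // block)
    return sigs


def make_delta(new: bytes, sigs: Dict[bytes, int], block: int) -> List[Op]:
    """Sender side: describe `new` as copies of known blocks plus literal bytes."""
    ops: List[Op] = []
    literal = bytearray()
    i = 0
    while i < len(new):
        window = new[i:i + block]
        idx = sigs.get(hashlib.sha256(window).digest()) if len(window) == block else None
        if idx is None:
            literal.append(new[i])  # no match at this offset: send the byte itself
            i += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", idx))  # receiver already has this block
            i += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: List[Op], block: int) -> bytes:
    """Receiver side: rebuild the new file from the old copy and the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)
```

Only the `("data", ...)` entries carry file content over the network, which is why this style of transfer suits files that grow by appending events.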

  13. Data transfer protocols: UDP • UDP-based Data Transfer (UDT) • UDP-based protocol for data-intensive applications • UDT can transfer data at a higher speed than TCP-based protocols • UDT Enabled Rsync (UDR) • Uses Rsync for the transport mechanism (delta) • Sends data over the UDT protocol • Ideal for large data over long distance

  14. Data transport systems • GridFTP • High-performance, secure, reliable data transfer over high-bandwidth networks • Many-to-many • Difficult to use • Globus Online • Uses the GridFTP protocol • Automates the management of files: monitoring performance, retrying files, recovering from failures • Does not support file synchronization • Dropbox • Centralized cloud storage, file and directory synchronization • Rsync-delta protocol • Data stored on Amazon S3 (third party) • One-to-one file transfer • BTSync • Decentralized cloud storage, P2P file synchronization (no third party) • Devices communicate over UDP • Many-to-many file transfers • WinSCP • SFTP and FTP client for Windows

  15. Data transport systems • Email from Globus Online Support: "We recently noticed that you are creating many CLI sessions to cli.globusonline.org, each with a single blocking transfer. This is a suboptimal way to use Globus Online and in fact is causing us some resource usage issues."

  16. Data transport systems • Previous tools • Different data-transfer protocols • Some automated data synchronization • None of them • Selects the best protocol depending on requirements • Provides methods for tracking metadata and transferring it • Our work automatically • Selects a protocol among FTPS, SFTP, Rsync, and UDR • Injects a minimum of metadata • GridFTP and P2P discarded: our communications are 1-to-1 • FTPS used instead of FTP: minimum security level • SFTP derives from SCP

  17. Selecting the best protocol • Candidates: FTPS, SFTP, Rsync and UDR

  18. Data transfer experiments- Same local network • Two machines located in Edinburgh • VLAN Network 100MB/s • Synthetic program to generate events • Data size written to files: 50KB, 500KB, 1MB, 10MB, 100MB, 500MB, 1GB and 10GB. • Measures: transfer rate and elapsed time • Repetition: 10 times
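
The two measures named on this slide, transfer rate and elapsed time over repeated runs, could be collected with a small harness like the one below. This is a sketch, not the authors' benchmark code: `transfer_rate_mbps` and `measure` are hypothetical names, and 1 MB is taken to be 10^6 bytes.

```python
import statistics
import time
from typing import Callable, Dict


def transfer_rate_mbps(size_bytes: int, elapsed_s: float) -> float:
    """Transfer rate in megabytes per second (1 MB = 10**6 bytes, an assumption)."""
    return size_bytes / elapsed_s / 1e6


def measure(transfer: Callable[[], None], size_bytes: int, repetitions: int = 10) -> Dict[str, float]:
    """Time `transfer` over several repetitions and report mean elapsed time and rate."""
    elapsed = []
    for _ in range(repetitions):
        start = time.perf_counter()
        transfer()
        elapsed.append(time.perf_counter() - start)
    mean_s = statistics.mean(elapsed)
    return {
        "mean_elapsed_s": mean_s,
        "mean_rate_MBps": transfer_rate_mbps(size_bytes, mean_s),
    }
```

Here `transfer` stands for any callable that moves one generated file with the protocol under test, for example a wrapper around a command-line client.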

  19. Data transfer experiments: Same local network • SFTP is fastest for files < 500MB • Rsync is fastest for files >= 500MB • Measured without compression

  20. Data transfer experiments: Different networks • UDR has been specially designed for • Large data transfer over long distance • UDR vs Rsync by using two machines • Located in different local networks • University of Edinburgh: 1GbE • AIST-Tsukuba: 10GbE • Generated files: 1MB, 500MB, 1GB, 10GB and 30GB

  21. Data transfer experiments: Different networks • UDR is fastest • Measured without compression

  22. Decision tree
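
The decision tree on this slide is a figure and is not reproduced in the transcript. The Python sketch below is therefore not the authors' tree. It encodes only the two results reported on slides 19 and 21 (SFTP below 500MB and Rsync from 500MB upward on the same local network, UDR across different networks). The function name `choose_protocol`, its inputs, and the reading of 500MB as 500 × 10^6 bytes are assumptions, and FTPS is left out because the transcript does not say when it is chosen.

```python
THRESHOLD_BYTES = 500 * 10**6  # "500MB" from slide 19, read as decimal megabytes (assumption)


def choose_protocol(same_local_network: bool, size_bytes: int) -> str:
    """Pick a transfer protocol from the reported measurements.

    Hypothetical two-level tree: network location first, then data size.
    """
    if not same_local_network:
        # Slide 21: UDR was fastest between the two different networks.
        return "UDR"
    if size_bytes < THRESHOLD_BYTES:
        # Slide 19: SFTP was fastest below 500MB on the same local network.
        return "SFTP"
    # Slide 19: Rsync was fastest at 500MB and above.
    return "Rsync"
```

For example, `choose_protocol(True, 10 * 10**6)` selects SFTP, while the same size sent between different networks selects UDR.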

  23. Implementation and evaluation • Front-end: GUI using Java Swing • Back-end: Decision tree • Data and metadata • Data stored in a remote repository (NAS) • Metadata collected in a remote database (MySQL) • Science gateway (web tool) connected with the repository and database • Searching • Visualizing • Analyzing • Downloading
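
The transcript does not give the metadata schema. As a hedged illustration of keeping one metadata record per transferred file, the Python sketch below uses the standard-library `sqlite3` module as a stand-in for the remote MySQL database. The table name, columns, and function names are all hypothetical.

```python
import sqlite3
from typing import List


def open_metadata_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata store; SQLite stands in for MySQL here."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transfers (
               id INTEGER PRIMARY KEY,
               experiment TEXT NOT NULL,
               laboratory TEXT NOT NULL,
               file_name TEXT NOT NULL,
               size_bytes INTEGER NOT NULL,
               protocol TEXT NOT NULL,
               transferred_at TEXT NOT NULL
           )"""
    )
    return conn


def record_transfer(conn: sqlite3.Connection, experiment: str, laboratory: str,
                    file_name: str, size_bytes: int, protocol: str,
                    transferred_at: str) -> None:
    """Insert one metadata row for a file that has reached the repository."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO transfers (experiment, laboratory, file_name, size_bytes,"
            " protocol, transferred_at) VALUES (?, ?, ?, ?, ?, ?)",
            (experiment, laboratory, file_name, size_bytes, protocol, transferred_at),
        )


def files_for_experiment(conn: sqlite3.Connection, experiment: str) -> List[str]:
    """Return the file names recorded for one experiment, oldest first."""
    rows = conn.execute(
        "SELECT file_name FROM transfers WHERE experiment = ? ORDER BY id",
        (experiment,),
    )
    return [row[0] for row in rows]
```

A query such as `files_for_experiment` is the kind of lookup a science gateway's search page could run against the metadata store.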

  24. User interface – New Experiment

  25. Implementation and evaluation • FAST has been evaluated: • By using synthetic programs for generating data • Real time and non-real time • For each type of synchronization • Different data sizes, and different types of network locations • Short- and long-term experiments • Stop and restart • By transferring data from a real rock physics experiment • Laboratories: UCL (London) and Edinburgh • Duration: 45 days • Interval: every minute • Rock samples: 1

  26. Future work • Use FAST in the Creep-2 experiment • Implement FAST policies • Data available in the repository for specific users during a reasonable period • Sharing data from many-to-many locations • Decision tree • Automating generation and maintenance • Keep it up to date by measuring transfers • Use FAST in more rock physics laboratories • Use FAST in other disciplines

  27. Thanks & Questions • email: rosa.filgueira@ed.ac.uk
