
Connecting arbitrary data sources to the grid

Shunde Zhang, Australian Research Collaboration Service (ARCS), eResearch SA, School of Computer Science, University of Adelaide.


Presentation Transcript


  1. Connecting arbitrary data sources to the grid. Shunde Zhang, Australian Research Collaboration Service (ARCS), eResearch SA, School of Computer Science, University of Adelaide

  2. Background • Australian Research Collaboration Service • A successor of APAC • Services • HPC • Data • Collaboration tools: AccessGrid, EVO, Plone, Drupal, Sakai

  3. ARCS Data Fabric

  4. ARCS Data Fabric (cont.) • A national service • Provided to all Australian researchers • Based on iRODS

  5. The Problem • Interoperability with “The Grid” • “The Grid”: Globus, gLite, Condor, etc. • Data sources • GridFTP-compatible: dCache • Non-GridFTP-compatible: iRODS, SRB • Possible solutions • “Manual” copy (or do it in a PBS script) • Copy queue

  6. The Problem (cont.) • Movement of massive data • Both ends use the same software (speak the same protocol) • The ends use different systems (speak different protocols) • Efficiency • Possible solutions • Transfer via an intermediate point

  7. A solution - old-fashioned • AWS Import/Export for Amazon S3 • Ship the hard disks by courier

  8. Our Solution - GridFTP • De facto standard • Compatible with the Grid, and many grid clients • Efficiency • Parallel transfer • Data channel reuse • Large file transfer - in small blocks • Compatible with many file transfer services • Monitoring • Scheduling
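The "parallel transfer" and "large file transfer in small blocks" points on this slide can be illustrated with a small planning routine. This is a minimal sketch under stated assumptions, not Griffin's or Globus's scheduling code: the class `BlockPlanner`, the round-robin policy, and the block size are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BlockPlanner {
    /** One contiguous byte range of the file. */
    static final class Block {
        final long offset;
        final long length;

        Block(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[offset=" + offset + ", length=" + length + "]";
        }
    }

    /**
     * Cuts fileSize bytes into blocks of at most blockSize bytes and deals
     * them round-robin to the given number of parallel streams.
     * (Round-robin is an assumption; a real server may assign blocks differently.)
     */
    static List<List<Block>> plan(long fileSize, long blockSize, int streams) {
        if (fileSize < 0 || blockSize <= 0 || streams <= 0) {
            throw new IllegalArgumentException("file size must be >= 0; block size and streams must be > 0");
        }
        List<List<Block>> perStream = new ArrayList<>();
        for (int i = 0; i < streams; i++) {
            perStream.add(new ArrayList<>());
        }
        int next = 0;
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            // The last block may be shorter than blockSize.
            long length = Math.min(blockSize, fileSize - offset);
            perStream.get(next).add(new Block(offset, length));
            next = (next + 1) % streams;
        }
        return perStream;
    }

    public static void main(String[] args) {
        // A 10-byte "file" cut into 4-byte blocks and spread over 2 streams.
        List<List<Block>> assignment = plan(10, 4, 2);
        for (int i = 0; i < assignment.size(); i++) {
            System.out.println("stream " + i + ": " + assignment.get(i));
        }
    }
}
```

Because every block carries its own offset, the receiver can write blocks in whatever order they arrive, which is what makes sending them over several streams at once workable.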

  9. An overview of GridFTP protocol • Based on FTP with extensions • Third-party transfer • Intermediate point not needed • Security - GSI • Extended block mode • Parallel transfer • Striped transfer • Partial transfer • Reliable and restartable • TCP and UDP
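Extended block mode is what allows parallel and partial transfers: each block on a data channel is prefixed with a header that states how many bytes follow and where they belong in the file. The sketch below packs and unpacks such a header, assuming the layout from the GridFTP v1 specification (an 8-bit descriptor, a 64-bit byte count, and a 64-bit offset, in network byte order). The class name `EblockHeader` is hypothetical, the descriptor flag values are deliberately left out, and the layout should be checked against the specification before being relied on.

```java
import java.nio.ByteBuffer;

class EblockHeader {
    /** 1 descriptor byte + 8 byte-count bytes + 8 offset bytes. */
    static final int HEADER_LENGTH = 17;

    final byte descriptor;  // flag bits; their meanings are defined by the specification
    final long byteCount;   // number of payload bytes that follow this header
    final long offset;      // position of the payload within the file

    EblockHeader(byte descriptor, long byteCount, long offset) {
        this.descriptor = descriptor;
        this.byteCount = byteCount;
        this.offset = offset;
    }

    /** Serializes the header; ByteBuffer is big-endian (network order) by default. */
    byte[] encode() {
        return ByteBuffer.allocate(HEADER_LENGTH)
                .put(descriptor)
                .putLong(byteCount)
                .putLong(offset)
                .array();
    }

    /** Parses the first 17 bytes of raw as a header. */
    static EblockHeader decode(byte[] raw) {
        if (raw.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("need at least " + HEADER_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw);
        byte descriptor = buf.get();
        long byteCount = buf.getLong();
        long offset = buf.getLong();
        return new EblockHeader(descriptor, byteCount, offset);
    }

    public static void main(String[] args) {
        // A 64 KiB block that belongs 1 MiB into the file.
        byte[] wire = new EblockHeader((byte) 0, 65536L, 1048576L).encode();
        EblockHeader parsed = decode(wire);
        System.out.println(wire.length + " header bytes, count=" + parsed.byteCount
                + ", offset=" + parsed.offset);
    }
}
```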

  10. The Architecture • Layers, in the order shown: GridFTP interface, Generic File System Framework, Data Source Plugin, Data Source

  11. Generic File System Framework • FileSystem creates FileSystemConnection • FileSystemConnection creates FileObject • FileObject creates RandomAccessFileObject
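The creation chain on this slide can be sketched with a toy in-memory data source. The `Simple*` interfaces below are cut-down, hypothetical stand-ins for the framework types, not the real Griffin interfaces (which, per the following slides, take a `GSSCredential` and expose many more methods); `MemoryFileSystem` is likewise invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class FrameworkSketch {
    // Hypothetical, simplified stand-ins for the framework interfaces.
    interface SimpleFileObject {
        String getPath();
        boolean exists();
        long length();
    }

    interface SimpleConnection {
        String getUser();
        SimpleFileObject getFileObject(String path);
    }

    interface SimpleFileSystem {
        SimpleConnection connect(String user);
    }

    /** A toy data source: path -> content, held in memory. */
    static class MemoryFileSystem implements SimpleFileSystem {
        private final Map<String, byte[]> files = new HashMap<>();

        void put(String path, byte[] content) {
            files.put(path, content);
        }

        // The file system creates a connection for one user ...
        @Override
        public SimpleConnection connect(String user) {
            return new SimpleConnection() {
                @Override
                public String getUser() {
                    return user;
                }

                // ... and the connection creates file objects on request.
                @Override
                public SimpleFileObject getFileObject(String path) {
                    return new SimpleFileObject() {
                        @Override
                        public String getPath() {
                            return path;
                        }

                        @Override
                        public boolean exists() {
                            return files.containsKey(path);
                        }

                        @Override
                        public long length() {
                            byte[] content = files.get(path);
                            return content == null ? 0 : content.length;
                        }
                    };
                }
            };
        }
    }

    public static void main(String[] args) {
        MemoryFileSystem fs = new MemoryFileSystem();
        fs.put("/data/a.txt", new byte[] {1, 2, 3});
        SimpleConnection conn = fs.connect("alice");
        SimpleFileObject file = conn.getFileObject("/data/a.txt");
        System.out.println(conn.getUser() + " sees " + file.getPath()
                + " (" + file.length() + " bytes)");
    }
}
```

The point of the layering is that the protocol side only ever talks to these interfaces, so a new data source needs only a new implementation of them.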

  12. FileSystem interface
      public String getSeparator();
      public void init() throws IOException;
      public FileSystemConnection createFileSystemConnection(GSSCredential credential) throws FtpConfigException, IOException;
      public void exit();

  13. FileSystemConnection interface
      public FileObject getFileObject(String path);
      public String getHomeDir();
      public String getUser();
      public void close() throws IOException;
      public boolean isConnected();
      public long getFreeSpace(String path);

  14. FileObject interface
      public String getName();
      public String getPath();
      public boolean exists();
      public boolean isFile();
      public boolean isDirectory();
      public int getPermission();
      public String getCanonicalPath() throws IOException;
      public FileObject[] listFiles();
      public long length();
      public long lastModified();
      public RandomAccessFileObject getRandomAccessFileObject(String type) throws IOException;
      public boolean delete();
      public FileObject getParent();
      public boolean mkdir();
      public boolean renameTo(FileObject file);
      public boolean setLastModified(long t);

  15. RandomAccessFileObject interface
      public void seek(long offset) throws IOException;
      public int read() throws IOException;
      public int read(byte[] b) throws IOException;
      public int read(byte[] b, int off, int len) throws IOException;
      public void close() throws IOException;
      public String readLine() throws IOException;
      public void write(int b) throws IOException;
      public void write(byte[] b) throws IOException;
      public void write(byte[] b, int off, int len) throws IOException;
      public long length() throws IOException;
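The `seek`, `read`, and `write` methods are what let blocks from a parallel or partial transfer land at arbitrary offsets. The sketch below is a hypothetical in-memory stand-in, not Griffin code: `MemoryRandomAccess` implements only a subset of the methods on the slide, and it throws unchecked exceptions where the real interface declares `IOException`, to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MemoryRandomAccess {
    private byte[] data = new byte[0];
    private int position = 0;

    /** Moves the read/write position; the buffer grows on the next write if needed. */
    void seek(long offset) {
        if (offset < 0 || offset > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        position = (int) offset;
    }

    /** Writes len bytes from b[off..] at the current position, growing the buffer if needed. */
    void write(byte[] b, int off, int len) {
        int end = position + len;
        if (end > data.length) {
            data = Arrays.copyOf(data, end); // any gap is zero-filled
        }
        System.arraycopy(b, off, data, position, len);
        position = end;
    }

    /** Reads up to len bytes into b[off..]; returns the count, or -1 at end of data. */
    int read(byte[] b, int off, int len) {
        if (position >= data.length) {
            return -1;
        }
        int n = Math.min(len, data.length - position);
        System.arraycopy(data, position, b, off, n);
        position += n;
        return n;
    }

    long length() {
        return data.length;
    }

    public static void main(String[] args) {
        MemoryRandomAccess file = new MemoryRandomAccess();
        // Blocks arrive out of order: the second half is written first.
        file.seek(5);
        file.write("world".getBytes(StandardCharsets.US_ASCII), 0, 5);
        file.seek(0);
        file.write("hello".getBytes(StandardCharsets.US_ASCII), 0, 5);

        byte[] out = new byte[(int) file.length()];
        file.seek(0);
        file.read(out, 0, out.length);
        System.out.println(new String(out, StandardCharsets.US_ASCII)); // prints helloworld
    }
}
```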

  16. The Implementation - Griffin • Clients: GridFTP client, Grid job submission system, Data transfer service • Griffin: GridFTP interface, Generic file system framework • Adaptors: Adaptor for iRODS, Adaptor for local file system, Other adaptors • Data sources: iRODS, Local File System, Other data source

  17. Features • GridFTP protocol version 1 • Java-based • Spring framework • OS-independent • Lightweight, stand-alone, self-contained • No need to install Globus Toolkit • Two plugins included • iRODS plugin • Local file system plugin • Open source (Apache 2 & GPL)

  18. Parallel transfer with Griffin • Client to Griffin: WAN • Griffin to Data Source: LAN/localhost

  19. Authentication • GSI • iRODS plugin • User mapping • Local file system plugin • XML file • Maps GSI authentication (certificate DN) to the internal user management system
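The mapping step on this slide, from a certificate DN to an internal user, amounts to a lookup table. The sketch below is only an illustration: the class `DnUserMap`, the example DN, and the exact-match rule are assumptions, and the XML file format used by the local file system plugin is not reproduced here because the slides do not show it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class DnUserMap {
    private final Map<String, String> dnToUser = new LinkedHashMap<>();

    /** Registers a mapping from a certificate DN to a local account name. */
    void add(String dn, String localUser) {
        dnToUser.put(normalize(dn), localUser);
    }

    /** Returns the local account for a DN, or empty if the DN is not mapped. */
    Optional<String> lookup(String dn) {
        return Optional.ofNullable(dnToUser.get(normalize(dn)));
    }

    // Assumption: DNs are compared exactly, ignoring only surrounding whitespace.
    private static String normalize(String dn) {
        return dn.trim();
    }

    public static void main(String[] args) {
        DnUserMap users = new DnUserMap();
        // Hypothetical DN and account name.
        users.add("/C=AU/O=Example Org/CN=Jane Researcher", "jane");

        System.out.println(users.lookup("/C=AU/O=Example Org/CN=Jane Researcher")
                .orElse("(access denied)"));
        System.out.println(users.lookup("/C=AU/O=Example Org/CN=Unknown Person")
                .orElse("(access denied)"));
    }
}
```

An unmapped DN yields an empty result, so the caller has to decide explicitly what to do with an authenticated but unknown user rather than falling through to a default account.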

  20. Use case • Integration of the Grid and Data Fabric • iRODS plugin for Data Fabric • Third-party transfer to cluster (Globus GridFTP) • Tested with • Globus.org • globus-url-copy (5.0 and 4.x) • Globus GridFTP GUI

  21. Performance Evaluation • Server: Two quad-core Xeon 3.16GHz CPUs, 16GB memory • Client: IBM xSeries 346 with two hyper-threaded Intel Xeon 3.20GHz CPUs, 4GB memory • Network: 1Gbps LAN • WAN: two 10Gbps links • Transfer: 256MB, 512MB, 1GB, 2GB, 4GB, 8GB, 16GB • iCommands • globus-url-copy

  22. Evaluation setup - Griffin vs iCommands • Client: globus-url-copy, iCommands • Griffin with Jargon Adaptor • iRODS • Local File System

  23. Evaluation Result Chart - Griffin vs iCommands

  24. Evaluation setup - Griffin vs Globus GridFTP • Client: globus-url-copy • Servers: Griffin with Local FS Adaptor, Globus GridFTP server • Local File System

  25. Evaluation Result Chart - Griffin vs Globus GridFTP

  26. Related work • Client library • SAGA/jSAGA • Commons-vfs • Data transfer service • Stork • PAFTP • Globus • XIO • DSI

  27. Griffin vs. Globus GridFTP

  28. Conclusion • A generic solution to connect arbitrary data sources to the grid • Data in/out of the grid • Data transfer between different data sources • Java-based implementation • Standalone, lightweight • Pluggable • Does not depend on Globus

  29. Future work • Currently working on a plugin for MongoDB • Java NIO • UDP • Striped transfer

  30. MongoDB plugin • MongoDB • NoSQL database • Stores JSON-style documents • GridFS component • Stores files • Plugin for Griffin • Read/write files via GridFS

  31. Acknowledgements • ARCS funded

  32. Current Status • ARCS production service • Used to transfer data in/out of ARCS Data Fabric • Website • https://projects.arcs.org.au/trac/griffin

  33. Thank you! Questions/Comments?
