
An Efficient and Transparent Transaction Management based on the Data Workflow of HVEM DataGrid

An Efficient and Transparent Transaction Management based on the Data Workflow of HVEM DataGrid. Im Young Jung Seoul National University. Introduction. Transaction Management for a safe data update and insertion on e-Science DataGrid





Presentation Transcript


  1. An Efficient and Transparent Transaction Management based on the Data Workflow of HVEM DataGrid • Im Young Jung, Seoul National University

  2. Introduction • Transaction management for safe data update and insertion on an e-Science DataGrid • Heterogeneous storages chosen according to the characteristics and size of the data • Based on the workflow, a storing precedence of data across heterogeneous storages within a transaction • In this paper • An efficient and transparent transaction management scheme for HVEM DataGrid • The transaction is divided into sub-transactions according to the transaction states, and the sub-transactions are classified • Transaction hierarchy and parallelism provide • efficient and safe upload of large data to HVEM DataGrid • transparency within the transaction, including simultaneous access to heterogeneous storages • Automatic garbage collection

  3. HVEM Grid • High Voltage Electron Microscope (HVEM) • Lets scientists perform 3D structure analysis of new materials at micrometer scale • HVEM Grid • Remote users can perform the same tasks as on-site scientists • Remote control of the HVEM • Storing, retrieving, and searching data through HVEM DataGrid • Processing data through HVEM Computational Grid

  4. HVEM DataGrid • Designed for biological experiments using HVEM • A logical view of a single storage over the DB and the file storage • Small metadata is stored in the DB • Information on materials, material handling methods, HVEM experiments, images, and experimenters • Large files are stored in the file storages • 2D or 3D image files and documents related to HVEM experiments • Internal process for finding files • Users first search the DB to find a file's logical path in the file storage, then retrieve the file from the file storage using that path
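The two-step lookup on this slide can be sketched roughly as follows. This is a minimal illustration, not the HVEM DataGrid implementation: the table name `image_meta`, its columns, the sample row, and the dictionary standing in for the file storage are all assumptions made for the example.

```python
import sqlite3

# Hypothetical stand-in for the file storage: logical path -> file bytes.
FILE_STORAGE = {"/hvem/exp42/img001.tif": b"...image bytes..."}

def build_db():
    """Create an in-memory metadata DB (the schema is an assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE image_meta (material TEXT, experimenter TEXT, logical_path TEXT)"
    )
    db.execute(
        "INSERT INTO image_meta VALUES (?, ?, ?)",
        ("yeast cell", "kim", "/hvem/exp42/img001.tif"),
    )
    db.commit()
    return db

def find_files(db, material):
    """Step 1: search the DB for logical paths. Step 2: fetch those files."""
    rows = db.execute(
        "SELECT logical_path FROM image_meta WHERE material = ?", (material,)
    ).fetchall()
    return {path: FILE_STORAGE[path] for (path,) in rows}
```

The user only calls `find_files`; the split between the DB and the file storage stays hidden, which is the "logical view of a single storage" the slide describes.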

  5. HVEM DataGrid • Unified data management • Storing precedence among data • When the biological information for images is stored, the images themselves must be stored in HVEM Grid at the same time • Relational semantics between the various data stored in distributed heterogeneous storages • Uploading many large files to HVEM DataGrid efficiently and safely • Upload dependency and serialization • Transactions must be ensured for safe parallel uploads

  6. An efficient and transparent transaction management • Requirements for transactions on HVEM DataGrid • Consider the semantics of HVEM DataGrid • A project is composed of several experiments • The data for an experiment should be inserted according to its data workflow • A file and its metadata should be stored to HVEM DataGrid together; otherwise, both should be deleted • Support • long-lived transactions bounded by the time limit of an experiment or project • short-lived transactions that physically store the data to HVEM DataGrid • Optimizing the upload of large files to reduce blocking time must still ensure safe transactions • An asynchronous and parallel upload scheme must preserve upload dependency and ensure safe transactions
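The requirement that a file and its metadata are stored together or not at all can be illustrated with a small compensation-style sketch. Everything here is hypothetical: the two dictionaries stand in for the file storage and the DB, and the `fail_meta` switch exists only to simulate a failed metadata insert.

```python
class StoreError(Exception):
    """Raised when one of the two stores fails."""

def store_together(file_store, meta_store, path, data, metadata, fail_meta=False):
    """Store a file and its metadata; if either step fails, delete both."""
    try:
        file_store[path] = data
        if fail_meta:  # hypothetical switch, used only to simulate a DB failure
            raise StoreError("metadata insert failed")
        meta_store[path] = metadata
    except StoreError:
        # Compensate: remove whatever part was already written.
        file_store.pop(path, None)
        meta_store.pop(path, None)
        return False
    return True
```

A real deployment would write to the two storages in parallel and wait for both acknowledgements; the sequential version only shows the all-or-nothing outcome.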

  7. An efficient and transparent transaction management • Transaction hierarchy • Transaction units act as checkpoints on incomplete data insertion • They confine the extent of a rollback • When the data for an experiment or a project has not been inserted into HVEM DataGrid by its time limit, the experiment or the project should be removed by rolling back its TnE or TnP • TnS((((1)2)5)2) • (1) is the identity of the TnP it belongs to • The next index, 2, is the identity of the TnE, and so on • [Figure: transaction hierarchy, with levels labeled "For Project", "For Experiment", "For a group of TnSs (Parallel Processing)", and "For storing data to physical storage"] • Support for autonomous garbage collection • Inserting and deleting data on HVEM DataGrid is left to users • When users stop inserting experimental data for any reason without deleting the related data, HVEM DataGrid accumulates a large amount of garbage
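The nested index notation can be unpacked mechanically. The sketch below assumes the four indices map, from innermost to outermost, to TnP, TnE, TnW, and TnS; the slide confirms only the first two, so the last two field names are a guess. The helper `same_experiment` is likewise a hypothetical illustration of how the index confines a rollback extent.

```python
import re
from typing import NamedTuple

class TnIndex(NamedTuple):
    project: int     # TnP identity (innermost index)
    experiment: int  # TnE identity
    group: int       # assumed: TnW identity
    store: int       # assumed: TnS identity (outermost index)

def parse_tns(label: str) -> TnIndex:
    """Parse a label such as 'TnS((((1)2)5)2)' into its four indices."""
    numbers = [int(n) for n in re.findall(r"\d+", label)]
    if len(numbers) != 4:
        raise ValueError(f"expected 4 indices, got {len(numbers)}: {label!r}")
    return TnIndex(*numbers)

def same_experiment(a: TnIndex, b: TnIndex) -> bool:
    """True if both TnSs fall inside the same TnE, i.e. the same rollback extent."""
    return (a.project, a.experiment) == (b.project, b.experiment)
```

For example, `parse_tns("TnS((((1)2)5)2)")` yields project 1, experiment 2, group 5, store 2, so a rollback of TnE 2 in project 1 would cover it.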

  8. Transaction management scheme • HVEM DataGrid forks two processes, one to connect to the DB and one to connect to the file storage • When both connections succeed, it accepts the next requests, and so on • The state change of TnS(((())j)i) • jSiS → jSiD (notification from the DB) or jSiF (notification from the file storage) → jSiE (both notifications have arrived): TnS completes • On a light failure (LF) caused by a temporary network or server failure, the transaction is retried a fixed number of times • When the retries fail, a serious failure (SF) is assumed → rollback process
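The state changes and the LF/SF handling on this slide might be modeled as below. This is a sketch under assumptions: the single-letter state names follow the slide's S, D, F, E suffixes, `ConnectionError` stands in for a temporary network or server failure, and the default of three retries is arbitrary because the slide says only "fixed times".

```python
class TnS:
    """Tracks one storage sub-transaction through states S, D, F, E."""

    def __init__(self):
        self.state = "S"  # started; no notification received yet

    def notify(self, source):
        """Record a notification from 'db' or 'file' and advance the state."""
        arrived = {"db": "D", "file": "F"}[source]
        if self.state == "S":
            self.state = arrived
        elif self.state in ("D", "F") and self.state != arrived:
            self.state = "E"  # both notifications arrived: TnS completes
        return self.state

def run_with_retries(operation, max_retries=3):
    """Retry on light failures (LF); treat exhaustion as a serious failure (SF)."""
    for _ in range(1 + max_retries):
        try:
            return ("ok", operation())
        except ConnectionError:  # stand-in for a temporary failure
            continue
    return ("rollback", None)  # SF assumed: the caller starts the rollback process
```

Because the two notifications can arrive in either order, the state object accepts D then F or F then D and reaches E in both cases.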

  9. Evaluation • Analysis • Transparency • Through the transaction hierarchy and fine-grained state management, the transaction manager in HVEM DataGrid provides a transparent transaction that uploads image files to the file storage and stores their metadata in the DB simultaneously • Serializability • Many TnSs are upload-serializable because their state changes are logged through the transaction index • To keep the upload dependency, the transaction manager protects the first user who enters a TnW • If that user withdraws from the TnW, another user can initiate the TnW • Transaction performance • The transaction scheme supports asynchronism and parallelism • Experiment setting • Because the sub-transaction time on the DB is negligible compared with that on the file storage due to data size, only the upload time for image files was considered • The semantics of the data workflow in HVEM DataGrid were taken into account • For asynchronous file transfer, the request intervals for file transfers were chosen randomly within 50 sec • The physical locations of the file storages were assumed to be distributed
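The rule that the first user entering a TnW is protected until that user withdraws can be pictured as a simple ownership gate. The class below is a hypothetical sketch, not the transaction manager's actual mechanism; the name `TnWGate` and its methods are invented for illustration.

```python
import threading

class TnWGate:
    """Lets only the first user who enters a TnW proceed until they withdraw."""

    def __init__(self):
        self._lock = threading.Lock()  # guards the owner field against races
        self._owner = None

    def enter(self, user):
        """Return True if `user` holds (or now acquires) the TnW."""
        with self._lock:
            if self._owner is None:
                self._owner = user
            return self._owner == user

    def withdraw(self, user):
        """Release the TnW if `user` is the current owner."""
        with self._lock:
            if self._owner == user:
                self._owner = None
                return True
            return False
```

With this gate, a second user's `enter` call fails while the first user holds the TnW and succeeds after the first user calls `withdraw`.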

  10. Evaluation • Overhead • Log management cost • The cost of logging TnP, TnE, and TnW; general transaction management requires the log for TnS • The log for TnP, TnE, and TnW is smaller than that for TnS because they function as checkpoints rather than real transaction units • Rollback cost • Cascading rollback of the TnSs in a TnW, due to the upload dependency in parallel processing of TnSs • On an LF, if the retry succeeds, the gain from transaction parallelism can be very large, especially for large files • SFs and LFs are expected to be rare because an e-Science DataGrid is not as popular as multimedia storage

  11. Conclusion • A transaction management scheme for HVEM Grid • Safety • Ensures safe transactions that respect the data workflow in HVEM DataGrid • Efficiency • Improves large-file upload performance through asynchronism and parallelism • Transparency • Data management across heterogeneous storages • Automatic garbage collection • Reduces garbage
