File Management


  1. File Management Chris A. Mattmann OODT Component Working Group

  2. What is File Management? • Managing the locations and ancillary information about files, and collections of files • Ancillary information is metadata • What’s a product? • A collection of some set of files, and/or collections of files • So, you could have collections of other collections • Along with metadata about the product FILE-MGMT

  3. The state of things • The existing CAS system does file management • For past missions and projects, it’s done the job well • CAS implementation • Needs an update, and overall refactoring to allow for modularity and separation of concerns, and general technology and architectural updates • In particular, a couple of new requirements and drivers for projects • Suggested some ways to extend and improve the CAS to satisfy the new requirements and drivers • What are these new requirements and drivers?

  4. New Requirements and Drivers • Persisting archived files using dynamic metadata and flexible, adaptable policies based on product types • rather than the monolithic and inflexible existing method of ProductTypeRepository/ProductName/ProductVersion/ as the filesystem location to store products for all product types. • Clearly separating out the Workflow aspects of the File Manager, from Product ingestion, and flexibly supporting association of Workflows and their subsequent Tasks with any event, not only ingestion.

  5. New Requirements and Drivers • Leverage existing transactional models such as Java's Transaction API to support transactional management rather than building our own API. • If we do use any database communication, then making sure that all DB communication is dealt with using standard, available, existing DB connection pooling APIs such as commons-dbcp, available from Apache.

  6. New Requirements and Drivers • Clearly separating out the administrative portions of policy management from the existing webapp, and distinguishing what pieces of the webapp are user-centric, and what are administrative-centric. • Supporting hierarchical product structures, such as nested directories that contain many sub-directories, and sub-directories of those sub-directories, with files strewn about at all levels • rather than only supporting the existing method of flat product structures, where all files in a product are at the same tree level.
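
The hierarchical case on slide 6 can be illustrated with a recursive directory walk that turns every file under a product root into a reference URI. This is only a sketch: HierarchicalProductScanner and collectReferences are hypothetical names made up for this example and are not part of the CAS code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not a CAS class.
final class HierarchicalProductScanner {

    // Walks the product root recursively and returns a file: URI for every
    // regular file found at any depth, in sorted path order.
    static List<String> collectReferences(Path productRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(productRoot)) {
            return paths
                    .filter(Files::isRegularFile)      // skip the directories themselves
                    .sorted()
                    .map(p -> p.toUri().toString())
                    .collect(Collectors.toList());
        }
    }
}
```

A flat product is simply the special case in which the walk never descends below the root directory.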

  7. New Requirements and Drivers • Support metadata extraction based on product type or mime-type • Support dynamic product types. The file management component should not need to know about every product type a priori
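
One way to read the two requirements on slide 7 is a registry that maps a MIME type to an extractor at run time, so new types can be added without changing the file management code. The sketch below assumes this reading; MetadataExtractor, ExtractorRegistry and the "Filename" fallback key are hypothetical and do not come from the CAS code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: derives key/value metadata for one file.
interface MetadataExtractor {
    Map<String, String> extract(String fileName);
}

// Hypothetical registry that picks an extractor by MIME type.
final class ExtractorRegistry {
    private final Map<String, MetadataExtractor> byMimeType = new HashMap<>();

    // Types can be registered at any time, so none must be known a priori.
    void register(String mimeType, MetadataExtractor extractor) {
        byMimeType.put(mimeType, extractor);
    }

    Map<String, String> extract(String mimeType, String fileName) {
        MetadataExtractor extractor = byMimeType.get(mimeType);
        if (extractor == null) {
            // Unknown type: fall back to recording only the file name.
            Map<String, String> fallback = new HashMap<>();
            fallback.put("Filename", fileName);
            return fallback;
        }
        return extractor.extract(fileName);
    }
}
```

The same lookup could be keyed by product type name instead of MIME type.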

  8. New Requirements and Drivers • You can read/add to the list • Available at: http://oodt.jpl.nasa.gov/wiki/display/oodt/File+Management • Please, speak your mind!

  9. File Management: Architectural implications • Managing files • Data Store: follow the typical repository pattern • Manage information about Products, Product Types, and References to products • Managing metadata • Metadata Store: follow the typical registry pattern • Manage product Metadata • Key/Value pairs • Separate out the data store and metadata store • This allows data and metadata to be managed independently
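
The split described on slide 9 could look roughly like the two interfaces below, each with a trivial in-memory backend. The method names and signatures are assumptions made for this sketch, not the actual DataStore and MetadataStore interfaces of the File Manager.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Repository side: products, their product type, and their references.
// Simplified, assumed signatures.
interface DataStore {
    void addProduct(String productName, String productType, List<String> references);
    String getProductType(String productName);
    List<String> getReferences(String productName);
}

// Registry side: key/value metadata per product. Simplified, assumed signatures.
interface MetadataStore {
    void addMetadata(String productName, Map<String, String> metadata);
    Map<String, String> getMetadata(String productName);
}

final class InMemoryDataStore implements DataStore {
    private final Map<String, String> typeByProduct = new HashMap<>();
    private final Map<String, List<String>> refsByProduct = new HashMap<>();

    public void addProduct(String productName, String productType, List<String> references) {
        typeByProduct.put(productName, productType);
        refsByProduct.put(productName, new ArrayList<>(references));
    }

    public String getProductType(String productName) {
        return typeByProduct.get(productName);
    }

    public List<String> getReferences(String productName) {
        return refsByProduct.getOrDefault(productName, new ArrayList<>());
    }
}

final class InMemoryMetadataStore implements MetadataStore {
    private final Map<String, Map<String, String>> metadataByProduct = new HashMap<>();

    public void addMetadata(String productName, Map<String, String> metadata) {
        metadataByProduct.put(productName, new HashMap<>(metadata));
    }

    public Map<String, String> getMetadata(String productName) {
        return metadataByProduct.getOrDefault(productName, new HashMap<>());
    }
}
```

Because neither interface refers to the other, a database-backed data store could be paired with, say, an index-backed metadata store without either one changing.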

  10. Data Store

  11. Metadata Store

  12. How is this different from the existing CAS? • Separation of concerns • Anything to do with data goes into the data store package • Anything to do with metadata goes into the metadata store package • Modularity • Can have different backend implementations of standard interfaces for data stores and metadata stores • Lucene as a backend for metadata, or if you prefer, traditional DB backend • Can have multiple data stores and metadata stores per CAS • The existing CAS lumped these two capabilities together • Was difficult to reason about how to pull them apart

  13. What else do we need to do File Management? • Need a way to transfer a product from the client to the File Management service • Client gives URIs of files, or collections of files, which identify References belonging to a Product

  14. Data Transfer Architecture

  15. Transferring files • How does the transfer actually occur? • You as a developer define how that happens • Implement the transferProduct(Product p) method • Can have many different types of data transfer • Local • Use native system calls, or cp • Remote • Use whatever protocol you want, XML-RPC, SOAP, WebDAV, etc. • Don’t use CORBA or RMI: they’re sooooo last year!
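
For the local case on slide 15, the core of a transfer is a file copy from the original URI to the data store URI. The sketch below shows that single step using the standard java.nio.file API; LocalTransferSketch is a hypothetical name, and a real transferProduct(Product p) implementation would presumably loop over all of the product's references.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical local transfer step; not the CAS implementation.
final class LocalTransferSketch {

    // Copies one file from its original location to its data store location.
    // Both arguments are assumed to be file: URIs on the local file system.
    static void transfer(URI origUri, URI dataStoreUri) throws IOException {
        Path source = Paths.get(origUri);
        Path target = Paths.get(dataStoreUri);

        // Create the target directory tree first, e.g. <repo>/<genre>/<artist>/.
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A remote transfer would keep the same shape and swap the copy for a protocol client.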

  16. Translating the URIs • Translating the URIs from the client to the File Manager presents an interesting challenge • For example, where should file:///home/chris/myfile.file be transferred to on the File Manager’s system? • Leverage and extend existing CAS method • Existing CAS would have answered the above questions with ProductTypeRepositoryPath/ProductName/VersionId/ • Why should that be the only answer?

  17. Versioners • Have the concept of a Versioner interface • Versioner is called by the File Manager before the product is transferred from the client to the File Manager system • Versioner uses the Product metadata, and the original product references to generate data store URIs that tell the DataTransfer implementation where to physically transfer the files for a particular Product

  18. Versioner Architecture

  19. Versioner Example • Given an mp3 Product, with Metadata: • Mp3Artist: 50cent • Mp3Genre: rap • And with references: • file:///home/chris/mp3s/gangsta-rap.mp3

  20. Versioner Example • Use a MusicVersioner:

      public class MusicVersioner implements Versioner {
          public void createDataStoreReferences(Product p, Metadata m) throws VersioningException {
              // Original (client-side) URI of the product's first reference
              String origUri = ((Reference) p.getReferences().get(0)).getOrigReference();
              // Repository root for the mp3 product type (assumed to end with "/")
              String mp3RepoPath = getRepoPath("Mp3ProductTypeName");
              // Build <repo>/<genre>/<artist>/<file name> from the product metadata
              String dataStoreUri = mp3RepoPath + m.getElementMap().get("Mp3Genre") + "/"
                      + m.getElementMap().get("Mp3Artist") + "/" + getFileName(origUri);
              // Record where the DataTransfer implementation should put the file
              ((Reference) p.getReferences().get(0)).setDataStoreRef(dataStoreUri);
          }
      }

  21. Versioner Example • So • file:///home/chris/mp3s/gangsta-rap.mp3 • …Yields • file:///path/to/mp3/repo/rap/50cent/gangsta-rap.mp3
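
The mapping in the versioner example (slides 19 to 21) can be reproduced with a small self-contained sketch. The Reference, Product and Metadata classes below are minimal stand-ins written for this example, not the actual CAS classes, and the hard-coded MP3_REPO_PATH is an assumption that replaces the getRepoPath lookup used on the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MusicVersionerSketch {

    // Minimal stand-ins for the types named on the slide (assumptions).
    static final class Reference {
        private final String origReference;
        private String dataStoreRef;
        Reference(String origReference) { this.origReference = origReference; }
        String getOrigReference() { return origReference; }
        String getDataStoreRef() { return dataStoreRef; }
        void setDataStoreRef(String dataStoreRef) { this.dataStoreRef = dataStoreRef; }
    }

    static final class Product {
        private final List<Reference> references = new ArrayList<>();
        List<Reference> getReferences() { return references; }
    }

    static final class Metadata {
        private final Map<String, String> elementMap = new HashMap<>();
        Map<String, String> getElementMap() { return elementMap; }
    }

    // Assumed repository root, taken from the example result on slide 21.
    static final String MP3_REPO_PATH = "file:///path/to/mp3/repo/";

    // Returns everything after the last '/' of the URI.
    static String getFileName(String uri) {
        return uri.substring(uri.lastIndexOf('/') + 1);
    }

    // Sets <repo>/<genre>/<artist>/<file name> on every reference of the product.
    static void createDataStoreReferences(Product p, Metadata m) {
        for (Reference ref : p.getReferences()) {
            String dataStoreUri = MP3_REPO_PATH
                    + m.getElementMap().get("Mp3Genre") + "/"
                    + m.getElementMap().get("Mp3Artist") + "/"
                    + getFileName(ref.getOrigReference());
            ref.setDataStoreRef(dataStoreUri);
        }
    }

    public static void main(String[] args) {
        Product product = new Product();
        product.getReferences().add(new Reference("file:///home/chris/mp3s/gangsta-rap.mp3"));
        Metadata metadata = new Metadata();
        metadata.getElementMap().put("Mp3Artist", "50cent");
        metadata.getElementMap().put("Mp3Genre", "rap");

        createDataStoreReferences(product, metadata);
        // prints file:///path/to/mp3/repo/rap/50cent/gangsta-rap.mp3
        System.out.println(product.getReferences().get(0).getDataStoreRef());
    }
}
```

Unlike the slide's version, this sketch loops over all references rather than only the first, which is one plausible way to handle products with more than one file.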

  22. The File Manager • So, how do we put all these different generic interfaces together? • Well, something like the following • A File Manager has… • One or more data stores, to store data to • One or more metadata stores, to store metadata to • A set of Versioners that are associated with Product Types in order to figure out how to generate the reference data store URIs for a particular product • A Data Transferer that moves a Product’s file from the client to the File Manager using the source URIs and the data store URIs • An external interface to it (e.g., XML-RPC, WebDAV, etc.)
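
How the pieces listed on slide 22 might cooperate during ingestion can be sketched as follows. Everything here is an assumption made for illustration: FileManagerSketch, its nested Versioner and DataTransferer interfaces, and the ingest method are hypothetical, the two stores are reduced to in-memory maps, and the external interface is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FileManagerSketch {

    // Decides where one original URI should live in the data store.
    interface Versioner {
        String dataStoreUri(String origUri, Map<String, String> metadata);
    }

    // Physically moves one file from the original URI to the data store URI.
    interface DataTransferer {
        void transfer(String origUri, String dataStoreUri);
    }

    private final Map<String, Versioner> versionersByProductType = new HashMap<>();
    private final DataTransferer transferer;
    // Stand-ins for a data store and a metadata store, kept separate.
    private final Map<String, List<String>> dataStore = new HashMap<>();
    private final Map<String, Map<String, String>> metadataStore = new HashMap<>();

    FileManagerSketch(DataTransferer transferer) {
        this.transferer = transferer;
    }

    void registerVersioner(String productType, Versioner versioner) {
        versionersByProductType.put(productType, versioner);
    }

    void ingest(String productType, String productName,
                List<String> origUris, Map<String, String> metadata) {
        Versioner versioner = versionersByProductType.get(productType);
        if (versioner == null) {
            throw new IllegalArgumentException("No versioner for product type " + productType);
        }
        List<String> dataStoreUris = new ArrayList<>();
        for (String origUri : origUris) {
            // 1. The versioner runs before the transfer and picks the target URI.
            String target = versioner.dataStoreUri(origUri, metadata);
            // 2. The transferer moves the file from source to target.
            transferer.transfer(origUri, target);
            dataStoreUris.add(target);
        }
        // 3. References and metadata are recorded independently.
        dataStore.put(productName, dataStoreUris);
        metadataStore.put(productName, new HashMap<>(metadata));
    }

    List<String> getDataStoreRefs(String productName) {
        return dataStore.getOrDefault(productName, new ArrayList<>());
    }

    Map<String, String> getMetadata(String productName) {
        return metadataStore.getOrDefault(productName, new HashMap<>());
    }
}
```

Keying the versioners by product type is what lets each product type choose its own layout in the repository.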

  23. What’s implemented so far? • The basic components of the architecture • Several default implementations of the interfaces • javax.sql.DataSource based implementations of DataStore and MetadataStore • Uses Apache’s DBCP for connection pooling • Local Data Transfer using Apache’s commons-io component that can handle hierarchical product structures, as well as flat product structures • Several versioners, including one that versions Products using the existing CAS approach of ProductTypeRepositoryPath/ProductName/Version, along with one that versions a product’s references based on production date time • An external interface based on Apache’s XML-RPC

  24. What needs to be done? • A lot! • Check out http://oodt.jpl.nasa.gov/vc/, and log in with your JPL Username and Password. Navigate to “SVN”, and check out the cas-filemgr component. • Modify the code • Look for bugs • Contribute! • I find new bugs every day • Feel free to talk to me about it • Create issues in JIRA (http://oodt.jpl.nasa.gov/jira/) • Bug Fixes, RFIs, new features, you name it! • Be sure to check out the apidocs • You can build these yourself by checking out cas-filemgr from our SVN repository, and then typing: maven site • Or you can visit: http://terra.jpl.nasa.gov/~mattmann/oco/javadoc/cas-filemgr/

  25. Questions?