
Ingest and Loading


cooper-cruz

Presentation Transcript


  1. Ingest and Loading DigiTool Version 3.0

  2. Ingest Agenda • Ingest Overview and Introduction • Ingest activity steps • Transformers • Task Chains • Upload of Files • Ingest Management

  3. DigiTool Modules: Deposit, Approval, Web Services, Dispatcher & Viewers, Single & Bulk, Search & Index

  4. Ingest Module
     • Two main functions:
       • Creation and submission of new ingest activities, bulk and individual
       • Monitoring of ingest status (scheduled, running, success, etc.)
     • Ingest activities can be initiated directly from the Ingest application, from pre-defined templates for manual/automatic ingest started in the Deposit Module, or potentially by FTP feed

  5. Ingest Main Functions

  6. Ingest Architecture
     • One loader, multiple transformers
     • Transformer: takes objects and/or metadata as input and transforms them into the Repository's digital entity representation.
     • An ingest activity is a workflow that combines a specific transformation process with optional background tasks, followed by the generic loader.
     • All loads (including batch) are processed as individual digital entities so that loading errors can be controlled per entity.
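
The "one loader, multiple transformers" idea can be sketched in Python. None of the names below come from DigiTool: `DigitalEntity`, `transform_files`, `load_all`, and `demo_load` are hypothetical and only illustrate how loading each entity individually keeps one failure from aborting a whole batch.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class DigitalEntity:
    """Hypothetical stand-in for the repository's digital entity representation."""
    label: str
    metadata: dict = field(default_factory=dict)


def transform_files(filenames: Iterable[str]) -> List[DigitalEntity]:
    """Hypothetical transformer: one entity per input file, no relationships."""
    return [DigitalEntity(label=name) for name in filenames]


def load_all(
    entities: Iterable[DigitalEntity],
    load_one: Callable[[DigitalEntity], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Generic loader: load entities one at a time and record failures per entity."""
    loaded, failed = [], []
    for entity in entities:
        try:
            load_one(entity)
            loaded.append(entity.label)
        except Exception as exc:  # one bad entity must not stop the batch
            failed.append((entity.label, str(exc)))
    return loaded, failed


def demo_load(entity: DigitalEntity) -> None:
    """Stand-in for a repository write; rejects empty labels for the demo."""
    if not entity.label:
        raise ValueError("empty label")


loaded, failed = load_all(transform_files(["a.tif", "", "b.pdf"]), demo_load)
```

Any other transformer that returns a list of `DigitalEntity` objects could feed the same `load_all` function, which is the point of keeping the loader generic.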

  7. Example: Template-Based Transformer [diagram: Input → Transformer and pre-ingest tasks → Output (multiple Digital Entities) → Load to Repository]

  8. Common Workflows • Ingest Overview and Introduction • Ingest activity steps • Transformers • Task Chains • Upload of Files • Ingest Management

  9. Ingest Activity
     A typical workflow for submitting an ingest activity:
     1. Enter the activity name and the scheduled run time, select the type of transformer, and determine the background tasks to run as part of the ingest activity.
     2. Select and order the background tasks into a task chain.
     3. Select a Digital Entity template and set or verify the background task parameters.
     4. Point to the location of the files or upload files.
     5. Submit.

  10. Common Workflows • Ingest Overview and Introduction • Ingest activity steps • Transformers • Task Chains • Upload of Files • Ingest Management

  11. Step 1 – Ingest Types - Transformers
      1. File stream(s) that will be loaded with no relationships
      2. File stream(s) that will become part of one parent record
      3. File stream(s) utilizing the DigiTool file name convention
      4. MARC XML file and associated file stream(s)
      5. Dublin Core XML file and associated file stream(s)
      6. Comma separated value (.csv) file
      7. METS XML file and associated file stream(s)
      8. Exported DigiTool repository elements for ingest/re-ingest

  12. Step 1 – Ingest Types - Transformers
      1. File stream(s) that will be loaded with no relationships
         • Treats every uploaded file as a separate entity with no relationships. Each resulting record is separate once ingested into the repository.
      2. File stream(s) that will become part of one parent record
         • Used to create relationships among the ingested file(s). An additional "parent" record is added that allows navigation between the loaded file(s). Each file still gets its own record, with the option of viewing the "parent" record, which points to all of the stream(s) loaded.
      3. File stream(s) utilizing the DigiTool file name convention
         • Takes file stream(s) whose filenames follow the DigiTool standard and, based on these filenames, automatically creates a hierarchical METS file for loading into the repository.

  13. Step 1 – Ingest Types - Transformers
      4. MARC XML file and associated file stream(s)
         • Takes a standard MARCXML file as input and loads each metadata record as a separate entity. The MARCXML file may contain links to local or remote file stream(s) through metadata tag placeholders, which associate each file stream with its MARC record.
      5. Dublin Core XML file and associated file stream(s)
         • Takes a standard DCXML file as input and loads each metadata record as a separate entity. The DCXML file may contain links to local or remote file stream(s) through metadata tag placeholders, which associate each file stream with its DC record.
      6. Comma separated value (.csv) file
         • Takes a standard .csv file along with appropriate mapping information and loads each row as a separate record. File stream(s) may also be uploaded as part of this transformer’s workflow.
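
The CSV case (each row becomes a separate record, driven by mapping information) can be illustrated with a short Python sketch. The column names, the Dublin Core style target fields, and the function `csv_to_records` are assumptions made for the example; the actual DigiTool mapping format is not described in these slides.

```python
import csv
import io
from typing import Dict, List


def csv_to_records(csv_text: str, mapping: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn each CSV row into one record, renaming columns via the mapping.

    Columns that do not appear in the mapping are dropped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = {mapping[col]: value for col, value in row.items() if col in mapping}
        records.append(record)
    return records


# Hypothetical input and mapping, for illustration only.
SAMPLE = "Title,Author,File\nOld Map,Smith,map1.tif\nNew Map,Jones,map2.tif\n"
MAPPING = {"Title": "dc:title", "Author": "dc:creator", "File": "stream"}

records = csv_to_records(SAMPLE, MAPPING)
```

Here `records` holds one dictionary per row, which mirrors the "one row, one record" behaviour the slide describes.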

  14. Step 1 – Ingest Types - Transformers
      7. METS XML file and associated file stream(s)
         • Takes a METS XML file as input and decomposes it into single atomic units for proper ingest. The XML file may contain links to local or remote file stream(s). It is stored in the repository with all structural relationships defined, so that the compound object is recomposed upon delivery.
      8. Exported DigiTool repository elements for ingest/re-ingest
         • Takes digital entities that are already in the repository-recognized format and allows their ingest/re-ingest back into the repository.
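
As a rough illustration of the decomposition step for METS input, the following Python sketch walks the file section of a small METS document and lists one unit per file, with its local or remote link. The sample document and the function `list_file_units` are made up for the example; only the METS and XLink namespaces and element names come from the METS standard, and this is not DigiTool's own transformer.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical two-file METS document, reduced to the file section.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                            xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="F1">
        <mets:FLocat LOCTYPE="URL" xlink:href="page1.tif"/>
      </mets:file>
      <mets:file ID="F2">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/page2.tif"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def list_file_units(mets_xml: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (file ID, link) for every file element in the METS file section."""
    root = ET.fromstring(mets_xml)
    units = []
    for file_el in root.findall(".//mets:file", NS):
        locat = file_el.find("mets:FLocat", NS)
        href = locat.get(XLINK_HREF) if locat is not None else None
        units.append((file_el.get("ID"), href))
    return units


units = list_file_units(SAMPLE_METS)
```

Each tuple in `units` corresponds to one file that would become its own unit, while the structural map (omitted here) would carry the relationships needed to recompose the object.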

  15. Step 1 – Ingest Schedule and Assignment
      • Scheduling is a required part of any ingest activity. Options include:
        - As soon as possible
        - Specified time and date
      • With the appropriate privileges, an ingest activity can be assigned to another Staff user of the same Admin Unit. By default, it is assigned to the logged-in staff user.
      • Please note: The “assigned to” staff user for any ingest activity is the only one who can activate that activity.
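
The scheduling and assignment rules above can be modelled in a few lines of Python. The class `IngestActivity` and its fields are hypothetical; the sketch only encodes two points from the slide: a run time is either "as soon as possible" or a specific date and time, and only the assigned user may activate the activity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class IngestActivity:
    """Hypothetical model of an ingest activity's schedule and assignment."""
    name: str
    assigned_to: str
    run_at: Optional[datetime] = None  # None stands for "as soon as possible"
    active: bool = False

    def activate(self, user: str) -> None:
        """Activate the activity; only the assigned staff user is allowed to."""
        if user != self.assigned_to:
            raise PermissionError(f"{user} is not assigned to '{self.name}'")
        self.active = True


activity = IngestActivity(name="maps-batch", assigned_to="alice")
activity.activate("alice")
```

Calling `activate` with any other user name raises `PermissionError`, which mirrors the note that only the "assigned to" user can activate.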

  16. Common Workflows • Ingest Overview and Introduction • Ingest activity steps • Transformers • Task Chains • Upload of Files • Ingest Management

  17. Step 1 – What is a task? A task is an action to be performed on the “transformed” digital entities and/or file stream(s) before ultimately ingesting the entire set of formed entities into the repository.

  18. Step 1 – Task Chain Initiation
      • Template based: server-side templates representing a variety of pre-defined task chain combinations.
      • New task chain: allows a tailor-built task chain to be defined and ordered in Step 2 of the ingest activity.
      • User-defined task chain: a task chain defined and saved by the user in a previous session. Any task chain can be saved as a user-defined task chain.

  19. Available Task Chains • Empty Chain • Technical Metadata Extraction • Add Metadata • Control Section Attribute Assignment • Full Text Extraction • PDF Full Text Extraction • Add History Event • Tiff to JP2000 Converter • Remote Stream Download • Thumbnail Creation

  20. Available Task Chains
      • Empty Chain: no task chain will be applied.
      • Technical Metadata Extraction: for recognized file stream(s), technical metadata will be extracted and mapped into standard technical metadata.
      • Add Metadata: allows a single metadata record to be linked or copied; it is applied to all file stream(s) that are part of the ingest activity.
      • Control Section Attribute Assignment: allows digital entity information to be defined attribute by attribute; it is applied to all digital entities that are part of the ingest activity.
      • Full Text Extraction: for recognized file stream(s), full text will be extracted as a manifestation of the source object.

  21. Available Task Chains
      • PDF Full Text Extraction: for PDF file stream(s), full text will be extracted as a manifestation of the source object.
      • Add History Event: allows additional change history metadata entries to be added to the file stream(s) of an ingest activity.
      • Tiff to JP2000 Converter: takes TIFF image(s) and creates a JPEG2000 manifestation of the source image.
      • Remote Stream Download: defines how URL stream(s) are stored, either copied to local storage or left remote.
      • Thumbnail Creation: for recognized file stream(s), a thumbnail image will be created as a manifestation of the source object.

  22. Step 2 – Task Chain Definition and Order
      • Allows the staff user to pick and order the available tasks for the ingest activity.
      • The order of tasks matters for certain chains, e.g. Thumbnail Creation and Full Text Extraction should run before Technical Metadata Extraction.
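
Why task order can matter is easiest to see with a toy task chain in Python. Everything here is hypothetical: the task functions, the dictionary used as an entity, and in particular the assumed reason for the ordering, namely that a metadata extractor can only describe manifestations that already exist when it runs. The slides state only that the order is relevant.

```python
from typing import Callable, Dict, List

Entity = Dict[str, object]
Task = Callable[[Entity], Entity]


def make_thumbnail(entity: Entity) -> Entity:
    """Toy task: mark that a thumbnail manifestation now exists."""
    return {**entity, "thumbnail": True}


def extract_full_text(entity: Entity) -> Entity:
    """Toy task: mark that a full-text manifestation now exists."""
    return {**entity, "full_text": True}


def extract_technical_metadata(entity: Entity) -> Entity:
    """Toy task: record which manifestations exist at the time it runs."""
    seen = [key for key in ("thumbnail", "full_text") if entity.get(key)]
    return {**entity, "tech_md_covers": seen}


def run_chain(entity: Entity, chain: List[Task]) -> Entity:
    """Apply the tasks of a chain to an entity, strictly in the given order."""
    for task in chain:
        entity = task(entity)
    return entity


good = run_chain(
    {"label": "a.tif"},
    [make_thumbnail, extract_full_text, extract_technical_metadata],
)
bad = run_chain(
    {"label": "a.tif"},
    [extract_technical_metadata, make_thumbnail, extract_full_text],
)
```

In the `good` ordering the extractor sees both manifestations; in the `bad` ordering it runs first and sees none, even though both chains contain the same tasks.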

  23. Step 3 – Template and Task Chain Parameters
      • Choose a Digital Entity template:
        • e.g. marc_simple_entity_with_stream.xml when using the MARC transformer and loading file stream(s) together with the MARC records.
        • NOTE: Digital Entity templates depend on the Transformer chosen in Step 1.
      • Set task parameters, e.g.:
        • thumbnail height and width
        • text language encoding for full text indexing
        • MD insertion
        • etc.

  24. METS transformer - METS to D.E. [diagram: the METS transformer maps the parts of a METS file (METS Header, dmd & amd Sections, Structural Map, Behavior/Struct Link, File Section) to the parts of a Digital Entity (Control Section; MD Section with descriptive/technical/rights/preservation metadata and DL content if necessary; file structure; MD linking; URL editing). A Digital Entity is produced for each file in the File Section.]

  25. Common Workflows • Ingest Overview and Introduction • Ingest activity steps • Transformers • Task Chains • Upload of Files • Ingest Management

  26. Step 4 – Local Files
      • Choose files for upload (ActiveX plugin required):
        • Easy to use
        • Preview of icon/thumbnail during upload
      • Send to server
      • Preview/Manage files

  27. Step 4 – Remote Files (URL)
      • Choose files for upload/linkage:
        • URLs can be entered one by one or as a batch from a text list
        • Download now (store the URL file locally) or link to the remote location
      • Preview/Manage files
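
Batch entry of URLs from a text list, combined with the "download now or link" choice, might look like the following Python sketch. The function `parse_url_list`, the one-URL-per-line input format, and the accepted URL schemes are assumptions for illustration; the slides do not specify the list format.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def parse_url_list(text: str, download_now: bool) -> List[Tuple[str, str]]:
    """Parse one URL per line and tag each with how it should be stored.

    Blank lines are skipped. The mode is "download" when the file should be
    copied locally and "link" when it should stay at the remote location.
    """
    mode = "download" if download_now else "link"
    entries = []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https", "ftp") or not parsed.netloc:
            raise ValueError(f"not a usable URL: {url!r}")
        entries.append((url, mode))
    return entries


entries = parse_url_list(
    "http://example.org/a.tif\n\nhttp://example.org/b.pdf\n",
    download_now=False,
)
```

Validating the whole list before any transfer starts means a malformed line is reported up front rather than partway through a long batch.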

  28. Ingest Activity – File upload

  29. Common Workflows • Ingest Overview and Introduction • Ingest activity steps • Transformers • Task Chains • Upload of Files • Ingest Management

  30. Ingest folders
      • Not scheduled: ingest activities ready for activation that are not scheduled.
      • Scheduled: ingest activities set for ingest at a specified time and date.
      • Running: ingest activities that are actively running.
      • Success: ingest activities that have loaded successfully.
      • Failed: ingest activities that have not loaded successfully.
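
The five folders behave like a status field on each ingest activity. A minimal Python sketch, assuming a hypothetical `IngestStatus` enum and a helper `group_by_folder` (neither is part of DigiTool), shows how activities could be grouped for a folder view:

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class IngestStatus(Enum):
    """Folder names taken from the slide; the enum itself is hypothetical."""
    NOT_SCHEDULED = "Not scheduled"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    SUCCESS = "Success"
    FAILED = "Failed"


def group_by_folder(
    activities: Iterable[Tuple[str, IngestStatus]],
) -> Dict[str, List[str]]:
    """Group activity names under the folder that matches their status."""
    folders: Dict[str, List[str]] = defaultdict(list)
    for name, status in activities:
        folders[status.value].append(name)
    return dict(folders)


folders = group_by_folder(
    [
        ("maps", IngestStatus.RUNNING),
        ("theses", IngestStatus.FAILED),
        ("photos", IngestStatus.RUNNING),
    ]
)
```

Folders with no activities simply do not appear in the result, so a real view would need to add the empty ones back if all five should always be shown.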

  31. Ingest Management
      • Edit, Delete and Activation
      • Monitoring log files:
        • Task list: shows all background tasks performed.
        • Task log: full step-by-step log file for each ingest step.
        • Task summary: overview of the major steps of the ingest process, e.g. Pre-transformer, Transformer, Ingest.

  32. Additional Functions
      • Begin with the upload of files before defining the tasks/definitions for the ingest activity (for mass file upload).
      • Pre-transformer: transforms file stream(s) and/or metadata to the ingest-ready format so that a transformer can be initiated. Currently, METS Zip input from deposit is the only pre-transformer.
      • Saving task chains to a personal user profile for future use.

  33. Thank you! www.exlibrisgroup.com
