Switching to the fast track: Rapid digitization of the world's largest herbarium

TDWG 2011, New Orleans. Simon Chagnoux, Henri Michiels. The French Museum: an old institution, founded in 1635 (at that time the Royal Garden of Medicinal Plants).

Presentation Transcript


  1. Switching to the fast track: Rapid digitization of the world's largest herbarium. TDWG 2011, New Orleans. Simon Chagnoux, Henri Michiels

  2. The French Museum

  3. An old institution • Founded in 1635 (at that time the Royal Garden of Medicinal Plants) • In 1793, the French Revolution turned the garden into the national Museum • Now: 15 locations in France, 2000 people TDWG - Orleans

  4. Renovating the Herbarium: an opportunity to digitize the entire collection

  5. The Paris Herbarium

  6. The Renovation Project (1) • Two main drivers for this project: • The herbarium, designed for 6 million specimens, was packed with 10 million sheets and fitted with old storage • Raising the storage density required reinforcing the floors

  7. The Renovation Project (2) • The only way to do this was to move the entire collection out and bring it back into the renovated space after the works • An opportunity for: • New sorting, from geographic to phylogenetic (APG III) • Reconditioning • Digitizing

  8. Renovation Calendar • 2006 – Start of the project • 2009 – Start of the works • 2010 (June) – Start of digitization • 2011 (Nov) – Opening of the first rearranged spaces to researchers • 2012 – End of the project

  9. Budget • Overall project cost: 24.5 million € • Building renovation: 12 000 000 € • Movers: 900 000 € • Attaching specimens: 3 200 000 € • Reconditioning, digitization and sorting: 6 700 000 € • Supplies: 1 600 000 € • Storage: 100 000 €

  10. The renovation cycle (diagram labels) • Floor-by-floor renovation • Herbarium • Digitization, reconditioning, sorting • Industrial partner • Warehouse

  11. Before ...

  12. ... And after

  13. Why digitize? • Because all the parts have to be manipulated in the course of the project • Digitization gives us: • a virtual copy of specimens • the possibility to share and study specimens without touching them • More than an electronic copy of the collection catalog, we’ll have a collaborative tool for managing scientific knowledge inside, as well as outside, the institution

  14. 2D digitization is cheap • The cost of digitization is marginal compared to the full project: • full specimen processing (moving, sorting, reconditioning, new furniture) • digitization and name processing • Digitization is appealing to funders (figures shown on the slide: $1.5, $0.1)

  15. A new paradigm • For 15 years we entered the full information for a subset of specimens: • 1 million entries in the database (rich information) • One fifth (200 000 images) were photographed • Since summer 2010, we have used a mass approach in which digitization precedes data entry: • 2 million records digitized in one year • limited information in the database (name and geographic area) • The scientific information can be added later without manipulating the specimens themselves

  16. The workflow: digitizing, reconditioning and sorting

  17. An industrial process (1) • We chose a contractor with industrial know-how • A dedicated site had to be set up and equipped by the contractor • Two teams of 20 workers in two shifts, working from 6 am to 9 pm • The process had to align with the schedule of the renovation works, floor by floor

  18. An industrial process (2) • Planned production rate: 17 000 sheets per day over 24 months → ca. 15 seconds / sheet • At this rate, a variation of ± 1 second per specimen has an impact of ± 300 k€ on the project cost
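
The planning figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the full collection of 10 million sheets goes through the process and that the ± 300 k€ sensitivity is linear. The function and constant names are invented for this example.

```python
# Back-of-the-envelope check of the planning figures quoted in the slides.
# Assumption: the whole collection (10 million sheets) is processed.

COLLECTION_SHEETS = 10_000_000        # sheets in the herbarium (from the slides)
SHEETS_PER_DAY = 17_000               # planned production rate (from the slides)
COST_PER_EXTRA_SECOND_EUR = 300_000   # impact of 1 extra second per specimen


def production_days(total_sheets: int, sheets_per_day: int) -> float:
    """Number of production days needed at a constant daily rate."""
    return total_sheets / sheets_per_day


def implied_cost_per_hour(total_sheets: int, cost_per_extra_second: float) -> float:
    """Hourly cost implied by the 'one second = 300 k EUR' sensitivity."""
    extra_hours = total_sheets / 3600.0  # one extra second per sheet, in hours
    return cost_per_extra_second / extra_hours


if __name__ == "__main__":
    days = production_days(COLLECTION_SHEETS, SHEETS_PER_DAY)
    print(f"{days:.0f} production days, about {days / 24:.1f} per month over 24 months")
    rate = implied_cost_per_hour(COLLECTION_SHEETS, COST_PER_EXTRA_SECOND_EUR)
    print(f"implied cost: about {rate:.0f} EUR per hour of processing time")
```

Under these assumptions the 24-month schedule only works with production on most days of the month, which is consistent with the two-shift organisation described on the previous slide.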

  19. The Bussy-St-Georges site

  20. Workflow overview

  21. How to reduce data entry • We take advantage of the physical ordering of specimens • We provide a name list to the contractor (APG III classification) • The contractor enriches the list with the information generated during the process and provides us with a table containing consolidated information (image number, barcode numbers, classification, …)

  22. 1 – Delivery (1) • A carting company transports the specimens to the facility, where they arrive in clearly labeled boxes • Each box receives a tracking barcode

  23. 1 – Delivery (2) • The Museum provides two files: • a “logistics” file: number of boxes, family name and number, genus name and number, geographic area • a “taxonomy” file: list of available taxon names with family, genus, species, authors, ID (taxon number)
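
To make the data exchange more concrete, here is a minimal sketch of how a taxonomy file entry and the consolidated table returned by the contractor could be modelled. The field names, the `Taxon` and `ScanEvent` types, and the sample identifiers are all assumptions made for illustration; the slides do not describe the actual file formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    """One line of the hypothetical 'taxonomy' file."""
    taxon_id: int
    family: str
    genus: str
    species: str
    authors: str


@dataclass(frozen=True)
class ScanEvent:
    """What the digitization chain records for one sheet (hypothetical)."""
    image_number: str
    barcode: str
    taxon_id: int


def consolidate(taxa, scans):
    """Join scan events with the name list to build the consolidated table."""
    by_id = {t.taxon_id: t for t in taxa}
    rows = []
    for scan in scans:
        taxon = by_id.get(scan.taxon_id)
        if taxon is None:
            # A scan that references a name missing from the list is an error.
            raise ValueError(f"unknown taxon id {scan.taxon_id} for {scan.barcode}")
        rows.append({
            "image_number": scan.image_number,
            "barcode": scan.barcode,
            "family": taxon.family,
            "name": f"{taxon.genus} {taxon.species} {taxon.authors}",
        })
    return rows


if __name__ == "__main__":
    taxa = [Taxon(1, "FABACEAE", "Abrus", "aureus", "R. Vig.")]
    scans = [ScanEvent("IMG000001", "P00000001", 1)]  # made-up identifiers
    for row in consolidate(taxa, scans):
        print(row)
```

Keeping the taxon number as the only link between the two tables means the name list can be corrected later without touching the scan records.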

  24. 1 – Delivery (3) • This information is ingested by the contractor’s information system and used throughout the industrial process (labeling, sorting, quality assurance)

  25. 2 – Folder processing • For each folder, the operator: • replaces the jacket (color according to region) • reads the species name and types the first letters on their computer • selects the name in a list • prints a label with barcode and identification information, and sticks it on the folder
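
The "type the first letters, then pick from a list" step can be sketched as a prefix search over a sorted name list. This is only an illustration of the idea, not the contractor's software; the function name and the sample names are assumptions.

```python
import bisect


def names_with_prefix(sorted_names, prefix):
    """Return all names starting with `prefix` from an alphabetically sorted list."""
    # First position where a name >= prefix could be inserted.
    start = bisect.bisect_left(sorted_names, prefix)
    # First position past every name that starts with the prefix.
    end = bisect.bisect_left(sorted_names, prefix + "\uffff")
    return sorted_names[start:end]


if __name__ == "__main__":
    # Illustrative entries only; the real list comes from the taxonomy file.
    names = sorted(["Abrus aureus", "Abrus precatorius", "Acacia senegal"])
    print(names_with_prefix(names, "Abr"))
```

Because the list is sorted once, each lookup is a binary search, so the response stays fast even with a very large name list.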

  26. 3 – Specimen Digitization (1) • A Data Matrix code and a barcode are stuck on each sheet • Data Matrix: for tracking purposes • Barcode: specific to the Muséum and following the international herbarium standard • The specimens are placed three by three on a tray

  27. 3 – Specimen Digitization (2) • The tray is placed on a conveyor belt • The sheet is scanned • The scan is checked (framing and focus) • At the end of the chain, the barcode is read to check if all specimens are back in the folder

  28. The Digitization Bench

  29. 4 – Reconditioning • After scanning, each sheet is inserted in a sulfurized paper liner • The barcode of each specimen is read, allowing the system to check if all specimens are back in the right folder • The folders are stored in a “cut box” before sorting
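
The folder check described here amounts to comparing two sets of barcodes: those registered for the folder before scanning and those read when the sheets are put back. A minimal sketch, assuming barcodes are handled as plain strings (the function name and the sample values are hypothetical):

```python
def check_folder(expected_barcodes, scanned_barcodes):
    """Compare the barcodes expected in a folder with those read at the end."""
    expected = set(expected_barcodes)
    scanned = set(scanned_barcodes)
    return {
        "missing": sorted(expected - scanned),      # sheets that did not come back
        "unexpected": sorted(scanned - expected),   # sheets from another folder
    }


if __name__ == "__main__":
    # Made-up barcodes, for illustration only.
    result = check_folder(
        ["P00000001", "P00000002", "P00000003"],
        ["P00000001", "P00000003", "P00000009"],
    )
    print(result)
```

A folder is complete only when both lists are empty; anything else can be flagged to the operator before the folder moves on to sorting.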

  30. 5 – Sorting 1 (by genus) • This step groups specimens by family and genus • The operator puts the jackets in boxes and places them on shelves according to the family and genus numbers (the shelves are labelled in advance by the contractor)

  31. 6 – Sorting 2 (by species) • The operator takes a box and reads the barcode on each jacket • The system displays the species name and assigns a number, which is printed on a label • The label is stuck on the folder, which is then stored on the shelf with the same number

  32. 7 – Packing, transport and final storage • The folders are put in boxes and sent to the Museum • The contractor stores the folders in the Museum’s herbarium

  33. How to ensure quality in mass digitization? • 60 000 images produced each week • 1% of the production is checked (ca. 600 images) • Samples are distributed among botanical staff • Checks: • Focus • Data quality • Barcode number • Barcode location
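
The sampling scheme (1% of a week's production, shared among botanical staff) can be sketched as follows. The slides do not say how samples are drawn or assigned; uniform random sampling and round-robin distribution are assumptions, as are the function names and identifiers.

```python
import random


def draw_qc_sample(image_ids, fraction=0.01, seed=None):
    """Draw a random sample of images for quality control."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    sample_size = round(len(image_ids) * fraction)
    return rng.sample(image_ids, sample_size)


def distribute(sample, reviewers):
    """Share the sampled images among reviewers in round-robin order."""
    batches = {name: [] for name in reviewers}
    for index, image_id in enumerate(sample):
        batches[reviewers[index % len(reviewers)]].append(image_id)
    return batches


if __name__ == "__main__":
    week = [f"IMG{n:06d}" for n in range(60_000)]  # made-up image identifiers
    sample = draw_qc_sample(week, fraction=0.01, seed=2011)
    batches = distribute(sample, ["reviewer_a", "reviewer_b", "reviewer_c", "reviewer_d"])
    for name, batch in batches.items():
        print(name, len(batch))
```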

  34. Scanning Resolution and Image Format

  35. Production of images • The conveyor belt passes the specimens under a bidirectional scanner, which produces 11 x 17” (A3), 300 dpi, 5000 x 3300 pixel images • TIFF files are saved offline (one production day per 1 TB disk) • JPEGs are made for online use

  36. Scanning resolution and image size • One TIFF image is 50 MB • One JPEG is 5 MB. This compression rate was chosen to keep the same level of detail as the TIFF (only the colour is slightly changed) • This choice is a technical and economic trade-off • For 10 million images: • TIFF represents 500 TB • JPEG represents 50 TB • Data represents <100 GB
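
These storage figures follow directly from the image dimensions. The sketch below reproduces them, assuming uncompressed 24-bit RGB pixels and decimal units (1 TB = 1 000 000 MB); both are assumptions, since the slides give only the rounded sizes.

```python
WIDTH_PX, HEIGHT_PX = 5000, 3300   # image dimensions given in the slides
BYTES_PER_PIXEL = 3                # assumption: 24-bit RGB, uncompressed


def raw_image_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed image in bytes."""
    return width * height * bytes_per_pixel


def total_tb(n_images: int, mb_per_image: float) -> float:
    """Total volume in terabytes for n images of a given size in megabytes."""
    return n_images * mb_per_image / 1_000_000


if __name__ == "__main__":
    size_mb = raw_image_bytes(WIDTH_PX, HEIGHT_PX, BYTES_PER_PIXEL) / 1_000_000
    print(f"one raw image: {size_mb:.1f} MB")
    print(f"10 million TIFFs: {total_tb(10_000_000, 50):.0f} TB")
    print(f"10 million JPEGs: {total_tb(10_000_000, 5):.0f} TB")
    print(f"one day of TIFFs (17 000 sheets): {total_tb(17_000, 50):.2f} TB")
```

The last line shows why one production day fits on a single 1 TB disk at the planned rate of 17 000 sheets per day.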

  37. Why do we keep TIFF? • Partners seek lossless data (Reflora, Mellon) • Standard for physical publishing • Native scan output, which can be used for any future use or transformation

  38. Handling TIFF data • We cannot afford “live” storage of 500 TB • … or even 1 PB with redundancy ($$$) • It would mean a lot of energy consumption and heat dissipation for rarely accessed images • We plan to start using tape storage next year, with HSM software • For the time being, USB disks are stored in the collection warehouse

  39. Exception for the types • The types are not part of this industrial process • They are digitized manually on-premises at 600 dpi (200 MB in compressed TIFF) • This process was initiated by the Mellon Foundation in 2004 • We now have about 100 000 type images

  40. What we’ve achieved and learned … after 12 months of collaboration between scientists and industry (out of an anticipated 24 months)

  41. Achievements • 2.1 million specimens processed between June 2010 and August 2011 • Images and data are of good quality • The new premises comply with today’s standards (space, safety, light, air-conditioning, …)

  42. Fast but ... not fast enough

  43. Reasons for being behind schedule • The logisticians underestimated the sorting work • Only two digitization chains are operational, instead of three (due to lack of staff)

  44. Software and quality assurance • More software is needed for ensuring traceability and detecting failures than for data acquisition • Fast web publication of images allows a broader audience to perform quality control • Continuous control is mandatory

  45. People • Working under constant time pressure for two years is really difficult in an academic context • The contractor must be considered as a service provider and not just the team next door (not obvious in an academic context)

  46. Working with a contractor • Culture clash (diagram labels: ROI, speed, robustness, quality, exhaustiveness, specificity) • Many parameters were not known at the beginning of the project (processes, numbers, ...) • Quality control is a key point to make sure that scientific excellence governs the industrial throughput (to be defined upfront) • Write everything down and always refer to the contract

  47. Digitizing other objects • Digitizing a herbarium is “easy”: • Same dimensions for all objects • Easy manipulation and scanning • The plant itself is not touched, only the paper • Digitizing 3D objects is a lot more complex and generally requires manipulating the specimen itself

  48. Is it over? Digitization is just a very first step…

  49. Virtual herbarium • The amount of information available online will lower the number of physical visits to the Herbarium • … but visitors leave Post-it notes on the sheets → How can this be replaced? • Annotation systems • “Virtual visit” website

  50. Spot the differences …? (specimen label: AFM, FABACEAE, Abrus aureus R. Vig.)
