Bringing the ART of Manuscript cataloging to the computer. World Digital Library Arab Peninsula Regional Group meeting. Magdy Nagi. The Wellcome Arabic Manuscript Cataloging Partnership. Wellcome Trust Arabic Manuscript Digitization Partnership.
 
                
Wellcome Trust Arabic Manuscript Digitization Partnership
• Creating a unique online resource of about 500 Arabic and Islamic manuscripts related to classical medicine, with full-text search of incipits, chapter headings, explicits, etc.
• A partnership between the Wellcome Library, KCL and the BA to create online discovery and dissemination tools that make the manuscripts and their metadata freely available on the web.
Main features of the application
• Manuscript facsimiles immediately available
• Zooming shows images at higher quality
• Associating images with metadata field values
• Entering non-standard characters
• Configurable workflow between BA and Wellcome Trust
• Audit trail of all changes to metadata records
• TEI P5 compliant output
TEI P5
• The TEI P5 standard allows entering extensive metadata about manuscripts.
• “This module defines a special purpose element which can be used to provide detailed descriptive information about handwritten primary sources.”
• The vast range of possibilities makes it powerful yet difficult to use.
• <persName> (personal name) contains a proper noun or proper-noun phrase referring to a person, possibly including any or all of the person's forenames, surnames, honorifics, added names, etc.
Can the data model harness its power without getting out of control?
Data Model
• Provisioning in advance every data field that the catalogers will need is not possible, because the features of the collection are not known until it is cataloged.
• For example, we discovered the need for <msPart> only during cataloging, because some manuscripts are made of parts bound together.
• Creating fields for everything possible puts us back in the same dilemma as TEI P5's excess of possibilities.
The answer lies in a flexible data model, based on TEI P5 so that it stays comprehensible.
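To make the <msPart> case concrete, here is a minimal sketch in Python of a composite manuscript description and a helper that lists its parts. The record is invented for illustration (the repository, shelfmarks, names, and titles are placeholders, not items from the collection), and `summarize_parts` is a hypothetical helper, not part of the cataloging system.

```python
import xml.etree.ElementTree as ET

# TEI P5 elements live in this namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample record: one physical volume made of two bound parts.
SAMPLE = """\
<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <msIdentifier>
    <repository>Example Library</repository>
    <idno>MS Example 1</idno>
  </msIdentifier>
  <msPart>
    <msIdentifier><idno>MS Example 1, part A</idno></msIdentifier>
    <msContents>
      <msItem>
        <author><persName>Example Author One</persName></author>
        <title>First example treatise</title>
      </msItem>
    </msContents>
  </msPart>
  <msPart>
    <msIdentifier><idno>MS Example 1, part B</idno></msIdentifier>
    <msContents>
      <msItem>
        <author><persName>Example Author Two</persName></author>
        <title>Second example treatise</title>
      </msItem>
    </msContents>
  </msPart>
</msDesc>
"""


def summarize_parts(xml_text):
    """Return one dict per <msPart> with its identifier, authors, and titles."""
    root = ET.fromstring(xml_text)
    parts = []
    for part in root.findall("tei:msPart", NS):
        idno = part.findtext("tei:msIdentifier/tei:idno", default="", namespaces=NS)
        authors = [p.text for p in part.findall(".//tei:persName", NS)]
        titles = [t.text for t in part.findall(".//tei:title", NS)]
        parts.append({"idno": idno, "authors": authors, "titles": titles})
    return parts


if __name__ == "__main__":
    for part in summarize_parts(SAMPLE):
        print(part["idno"], part["authors"], part["titles"])
```

Each bound part carries its own contents and names, which is why a flat, fixed list of fields per manuscript would not have been enough.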
Flexible Data Model
• TEI P5 is an XML vocabulary, and XML is a flexible, structured way of storing data.
• The challenge is that years of development against relational databases (RDBs) have produced many ways to easily create data entry applications for RDBs, and almost nothing for XML.
A library called XML Skeleton Annotations (XSA) was created for exactly that purpose, and will soon be publicly available.
XML Skeleton Annotations (XSA)
• Takes a single configuration file as input, describing the data model and the corresponding website structure.
• Generates a user interface (UI) that is bound directly to the loaded XML document.
• Gives users control over the look and feel of the generated UI.
• Access roles, indexing, and authority lists are also included in the configuration file.
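The general idea of a configuration-driven form bound to an XML document can be sketched as follows. This is not XSA's configuration format or API, which the slides do not show; `FIELDS`, `ensure_path`, `read_form`, and `write_form` are hypothetical names, and the sketch ignores namespaces, repeating fields, access roles, and authority lists for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: each form field maps a label to a path
# of child elements in the record. Adding a field means adding a line.
FIELDS = [
    {"label": "Shelfmark", "path": "msIdentifier/idno"},
    {"label": "Title", "path": "msContents/msItem/title"},
    {"label": "Incipit", "path": "msContents/msItem/incipit"},
]


def ensure_path(root, path):
    """Walk a slash-separated child path, creating missing elements."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node


def read_form(root, fields):
    """Collect the current value of every configured field (empty if absent)."""
    return {f["label"]: (root.findtext(f["path"]) or "") for f in fields}


def write_form(root, fields, values):
    """Write submitted form values straight back into the XML record."""
    paths = {f["label"]: f["path"] for f in fields}
    for label, value in values.items():
        ensure_path(root, paths[label]).text = value


if __name__ == "__main__":
    record = ET.fromstring(
        "<msDesc><msIdentifier><idno>MS Example 1</idno></msIdentifier></msDesc>"
    )
    print(read_form(record, FIELDS))
    write_form(record, FIELDS, {"Title": "Example title"})
    print(ET.tostring(record, encoding="unicode"))
```

Because the form reads from and writes to the XML tree itself, there is no separate export step: the edited document is the record.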
The outcome of using XSA
• A user-friendly system for entering metadata that follows a very flexible model.
• Adding a new field to the data model is straightforward: just a few lines in the configuration file, and no coding at all.
• Standards-compliant records: no XML export code is needed, and the library is XML Schema driven.
• Changing the hierarchy of the data is possible with XSLT.
• Highly reusable, and easy to learn.
Other parts of the system
• Manageability? Using XML as a data storage format raises concerns about manageability, but good solutions exist and others are emerging.
• The XML collection is made searchable by submitting parts of each XML record to a Lucene index. Only the index and the document ID should be stored.
• SVN is used to orchestrate and track access to the collection. Concurrent editing of a single XML record, however, still needs a good XML merge tool before it can be safely enabled.
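The indexing approach in the second bullet can be illustrated with a small stand-in. The sketch below uses a plain in-memory inverted index in Python instead of Lucene, and the choice of indexed elements (`INDEXED_PATHS`) is an assumption made for illustration. It keeps only terms and document IDs, so the XML files remain the single copy of the record.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed selection of elements worth searching; not taken from the slides.
INDEXED_PATHS = [".//title", ".//incipit", ".//explicit"]


def tokenize(text):
    """Lowercase and split into word tokens (Unicode-aware, so Arabic works)."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Inverted index mapping each term to the IDs of records containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, xml_text):
        """Index only the selected parts of one XML record."""
        root = ET.fromstring(xml_text)
        for path in INDEXED_PATHS:
            for element in root.findall(path):
                for term in tokenize("".join(element.itertext())):
                    self.postings[term].add(doc_id)

    def search(self, query):
        """Return IDs of records that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    index = TinyIndex()
    index.add("ms-1", "<msDesc><msItem><title>Book of medicine</title></msItem></msDesc>")
    index.add("ms-2", "<msDesc><msItem><title>Book of fevers</title></msItem></msDesc>")
    print(sorted(index.search("book")))
    print(sorted(index.search("book medicine")))
```

A search returns document IDs only; the application then loads the matching XML records from the collection for display.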