Data management (part 2)

Data management (part 2). LingDy, February 14, 2012, TUFS, Tokyo. David Nathan, Endangered Languages Archive, Hans Rausing Endangered Languages Project, SOAS, University of London. Also (for Part 2): creating a catalogue/inventory/index, metadocumentation, data/file versions, transferring data


Presentation Transcript


  1. Data management (part 2) • LingDy, February 14, 2012, TUFS, Tokyo • David Nathan, Endangered Languages Archive, Hans Rausing Endangered Languages Project, SOAS, University of London

  2. Also (for Part 2) • creating a catalogue/inventory/index • metadocumentation • data/file versions • transferring data • sharing data • backup • character encoding

  3. Different types of metadata • there are many types of metadata • different types of materials may have different metadata • e.g. metadata for photos and videos may include technical parameters and lists of the people appearing • e.g. metadata for transcriptions may include date, version, who transcribed, and notes on progress

  4. Your collection catalogue • first, define your collection/corpus/project as some coherent (logical) set of materials • your collection catalogue/inventory/index is a type of metadata • this should list and describe all files in your collection • it usually contains the categories of information that are relevant for many files

  5. Your collection catalogue • you could have one large catalogue that covers every file, or • you could have a catalogue that is subdivided according to types of files, and/or groups of resources • there is no “one size fits all” solution!
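One way to start a catalogue that lists every file is to generate a first draft automatically and then fill in the descriptive columns by hand. The following is a minimal sketch only, assuming the collection lives in a single folder; the function name `build_catalogue` and the column names are illustrative choices, not a recommended standard.

```python
import csv
import os
from pathlib import Path

# Hypothetical column names; adapt them to the categories your project needs.
FIELDS = ["relative_path", "file_type", "size_bytes", "description"]


def build_catalogue(collection_dir, catalogue_path):
    """Write one CSV row per file found under collection_dir and return the rows."""
    root = Path(collection_dir)
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            rows.append({
                "relative_path": path.relative_to(root).as_posix(),
                "file_type": path.suffix.lower().lstrip("."),
                "size_bytes": path.stat().st_size,
                "description": "",  # left empty, to be filled in by hand
            })
    # Sort so that the catalogue order is stable between runs.
    rows.sort(key=lambda row: row["relative_path"])
    with open(catalogue_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The resulting CSV file opens in any spreadsheet program. Splitting the catalogue by file type or resource group, as the slide suggests, could be done by filtering the rows on `file_type` before writing.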

  6. Examples

  7. Making an “active” catalogue • this is not necessary, but may be useful • if you use a spreadsheet, you can embed links to actual files to make using your collection easier
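As a rough illustration of an embedded link, many spreadsheet programs (including Excel and LibreOffice Calc) provide a `HYPERLINK` function. The sketch below builds such a formula as text for one catalogue cell. The helper name `hyperlink_formula` is hypothetical, and the details are assumptions to check in your own software: some locales use a semicolon instead of a comma between arguments, and programs differ in how they resolve relative paths and in whether they evaluate formulas read from a CSV file.

```python
def hyperlink_formula(relative_path, label=None):
    """Return a spreadsheet HYPERLINK formula (as text) pointing at relative_path."""
    if label is None:
        label = relative_path
    # Inside a spreadsheet string literal, a double quote is escaped by doubling it.
    target = relative_path.replace('"', '""')
    text = label.replace('"', '""')
    return f'=HYPERLINK("{target}","{text}")'


print(hyperlink_formula("audio/fugu_20130214.wav", "fugu recording"))
# prints: =HYPERLINK("audio/fugu_20130214.wav","fugu recording")
```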

  8. Metadocumentation • you should keep an up-to-date description of the methods, conventions, and abbreviations you use • so that somebody could fully understand (and use) your data and methods in your absence • example

  9. Data/file versions • whether you need to distinguish or keep versions depends on your purposes • by adding a suffix to the filename, e.g. • fugu1.txt, fugu2.txt, or • fugu_1.txt, fugu_2.txt • which of the above methods is better?

  10. Data/file versions • fugu_14022013.txt, fugu_20130214.txt, 14022013_fugu.txt, 20130214_fugu.txt • which of the above methods would be best? • note: do not rely on system dates!
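One consideration when comparing these naming patterns is how the names sort in a file listing. The sketch below is an illustration only, using made-up dates and a hypothetical helper `dated_name`; it compares year-first (YYYYMMDD) suffixes with day-first (DDMMYYYY) suffixes under ordinary alphabetical sorting.

```python
from datetime import date


def dated_name(stem, day, ext=".txt"):
    """Build a filename with a year-first date suffix, e.g. fugu_20130214.txt."""
    return f"{stem}_{day:%Y%m%d}{ext}"


# Three illustrative version dates, deliberately not in chronological order.
days = [date(2013, 2, 14), date(2012, 12, 1), date(2013, 1, 30)]

year_first = sorted(dated_name("fugu", d) for d in days)
day_first = sorted(f"fugu_{d:%d%m%Y}.txt" for d in days)

print(year_first)
# ['fugu_20121201.txt', 'fugu_20130130.txt', 'fugu_20130214.txt']
print(day_first)
# ['fugu_01122012.txt', 'fugu_14022013.txt', 'fugu_30012013.txt']
```

With the year first, alphabetical order matches chronological order. With the day first, the 14 February file is listed before the 30 January one. Putting the stem before the date also keeps all versions of one file together in a listing.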

  11. Data/file versions • do you need to keep every version? • often, it is fine to keep the “original” plus the current version • if information is regularly updated or corrected, you can keep one filename and put dates in the document itself, or record dates in a catalogue/metadata file • a series of files may have inherent value, e.g. your transcriptions/annotations, as your understanding and analysis changes, so • date and keep files • use different tiers in ELAN?

  12. Transferring data • ensure your computer is not a “walled garden” • you can use • drives/devices (but avoid DVDs!!) • email • upload (where available) • send links • “cloud” e.g. Dropbox • issues include cost, potential viruses, assuring integrity of copies, but generally little problem
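To check the integrity of a copy after a transfer, a common approach is to compare checksums of the original and the copy. The sketch below uses SHA-256 from Python's standard `hashlib` module; the function names `sha256_of` and `copies_match` are illustrative, and this is one possible method rather than the one the presentation had in mind.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that large audio or video files do not fill memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_path, copy_path):
    """Return True if both files have identical content according to SHA-256."""
    return sha256_of(original_path) == sha256_of(copy_path)
```

If the two checksums differ, the copy was changed or damaged somewhere along the way and should be transferred again. Recording each file's checksum in the catalogue would allow the same check to be repeated later.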

  13. Sharing • can we work in a shared, collaborative space? • Dropbox • Google Docs • blogs, Tumblr, etc. can have shared “authors”, and contributors with controlled roles

  14. Character encoding • if your document contains any characters other than those on a US keyboard, use a Unicode (UTF) character encoding such as UTF-8 • how can I tell if the characters in my MS Word document are encoded as UTF-8? • save as plain text and check the encoding options • copy into a plain text editor such as Notepad++
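Once a document has been saved as plain text, a small script can also check the encoding. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the check only tells you whether the bytes form valid UTF-8. A file containing only US-keyboard (ASCII) characters passes too, so it is still worth looking at the characters themselves.

```python
import unicodedata


def is_valid_utf8(path):
    """Return True if the file's bytes can be decoded as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_ascii_characters(text):
    """List each distinct non-ASCII character with its code point and Unicode name."""
    found = sorted({ch for ch in text if ord(ch) > 127})
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in found]
```

For example, `non_ascii_characters("ŋa")` reports the character `ŋ` with code point `U+014B`, which helps confirm that the special characters in a transcription are the ones you intended.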

  15. Character encoding • useful tools • Notepad++ http://notepad-plus-plus.org/ • SIL ViewGlyph http://scripts.sil.org/cms/scripts/page.php?item_id=ViewGlyph_home • BabelMap http://www.babelstone.co.uk/software/babelmap.html • ExSite9 http://www.intersect.org.au/exsite9

  16. Your projects • discuss in groups • what are the problems or weaknesses in our “data management plan” or data management methods?
