
Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Archives for communities of interest, the Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC). Nick Thieberger, Department of Linguistics & Applied Linguistics, The University of Melbourne. PNC Conference, November 2005.


Presentation Transcript


  1. Archives for communities of interest, the Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC). Nick Thieberger, Department of Linguistics & Applied Linguistics, The University of Melbourne. PNC Conference, November 2005

  2. Collaborative digital research resource set up by University of Sydney, University of Melbourne & Australian National University, 2003 (UNE joined 2004). 75% funding from Australian Research Council LIEF Scheme (3 successful applications)

  3. Communities of interest • A group of linguists and musicologists recognised that large collections of recorded material were not being properly archived. The other parts of the community are speakers and their descendants. • Shared needs in the current group, and need for training of new researchers • At least 3000 hours of analog fieldtapes • New technologies have a steep learning curve • Need for specialised assistance • Applied for research funds to establish an archive

  4. Communities of interest • Collaboration across universities and disciplines • Support from computing specialists (data grid, mass data store, programming), government agencies (E-research, Australian Partnership for Sustainable Repositories, GrangeNet) • International links - similar initiatives (OLAC/DELAMAN) • Regional cultural centres and museums (targets for repatriation of digital recordings) • International standards - Metadata (OLAC/OAI) • All of this requires coordination or project management

  5. To preserve and make accessible Australian researchers’ field recordings of endangered languages and musics from the Asia-Pacific, together with other digital material related to cultures of the region (theses, wordlists, texts, etc). Preservation: to adopt world’s best practice standards and formats to maximise sustainability and future usability of the collection. Access: to take advantage of emerging information and communication technologies to maximise access to our collection by both researchers and cultural heritage communities

  6. • Over 2000 of the world’s 6000 languages in the Asia-Pacific region • Number likely to fall to a few hundred by 2100 (UNESCO) • Australian researchers active in region since 1950s - making unique recordings of unrepeatable events • Recordings now themselves endangered (format obsolescence, media deterioration, loss of metadata)

  7. 2500 records in PARADISEC catalogue with data on 390 languages from 50 countries including: American Samoa, Australia, Bangladesh, Botswana, Cambodia, Chile, China, Cook Islands, Fiji, French Polynesia, Greenland, Hong Kong, Iceland, India, Indonesia, Israel, Italy, Japan, Kiribati, Republic Of Korea, Lao People’s Democratic Republic, Madagascar, Malaysia, Malta, Marshall Islands, Mexico, Federated States Of Micronesia, Myanmar, Nauru, Nepal, New Caledonia, New Zealand, Nigeria, Niue, Palau, Papua New Guinea, Philippines, Reunion, Samoa, Singapore, Solomon Islands, South Africa, Taiwan, Province of China, Thailand, Tonga, Uganda, United States of America, Vanuatu, Viet Nam, Wallis And Futuna (data as of September 2005)

  8. Locating data in the collection • Metadata complying to international standards • Open language archives community (OLAC) • Geographic data entered via a map interface for later geographic querying • Open Archives Initiative (OAI)
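The geographic querying mentioned in this slide can be illustrated with a bounding-box overlap test. This is only a minimal sketch under stated assumptions: the `BoundingBox` class, the item identifiers and the coordinates are all invented for illustration, and nothing here describes how the PARADISEC catalogue actually stores or queries geographic data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """A rectangle in decimal degrees (hypothetical representation)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.east < other.west
            or other.east < self.west
            or self.north < other.south
            or other.north < self.south
        )


# Illustrative records only: identifiers and coordinates are invented.
records = {
    "item-A": BoundingBox(west=166.0, south=-20.5, east=170.5, north=-13.0),
    "item-B": BoundingBox(west=155.0, south=-12.0, east=163.0, north=-5.0),
}


def find_items(query: BoundingBox) -> list[str]:
    """Return the identifiers of all records whose box overlaps the query."""
    return sorted(key for key, box in records.items() if box.intersects(query))


query = BoundingBox(west=167.0, south=-18.0, east=169.0, north=-16.0)
print(find_items(query))  # ['item-A']
```

The sketch ignores boxes that cross the 180° meridian, which a real Pacific-wide query would need to handle.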

  9. Metadata Catalogue • SQL/PHP password access • Controlled vocabularies (language name, contributor role, data type, coverage, etc) • Link to repository data stored at the Australian Partnership for Advanced Computing (APAC) in Canberra
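As a rough illustration of what controlled vocabularies buy a catalogue, the sketch below rejects values that are not in a fixed list. The field names and the permitted values are assumptions made up for this example. They are not the catalogue's real vocabularies, and the catalogue itself is described as SQL/PHP rather than Python.

```python
# Hypothetical vocabularies: the real catalogue's terms are not given in the slides.
CONTROLLED_VOCABULARIES = {
    "contributor_role": {"recorder", "speaker", "transcriber"},
    "data_type": {"sound", "text", "image"},
}


def validate(record: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for field, allowed in CONTROLLED_VOCABULARIES.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif value not in allowed:
            errors.append(f"{field}: '{value}' is not in the controlled vocabulary")
    return errors


print(validate({"contributor_role": "recorder", "data_type": "sound"}))
print(validate({"contributor_role": "Recordist", "data_type": "sound"}))
```

Restricting fields to a fixed list is what makes later searching reliable: a query for one role finds every matching record, not only those that happen to share a spelling.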

  10. Typical data • Stephen Wurm’s several hundred tapes, including 120 1970s Solomon Islands tapes and transcripts/fieldnotes • Arthur Capell’s 114 tapes, Pacific and PNG 1950s (and 30 archive boxes of fieldnotes) • Bert Voorhoeve’s 180 tapes - West Papua • Tom Dutton’s 295 PNG tapes

  11. Imaging fieldnotes • To date over 10,000 pages of fieldnotes have been photographed using AUSTEHC's system • Crucial that links between fieldnotes and field recordings be maintained • Aim to allow trusted users to build links between dynamic media and fieldnotes

  12. Wurm collection, Solomon Islands, 1979. Digitised cassette tape with page image of transcript, and Wurm’s language map

  13. Archival data • Linking transcripts to media • Citation of primary media • Searchable time-aligned media corpus

  14. Audiamus • Building a citable corpus of media via linked transcripts • Persistent naming implied by citability • Creation of good archival forms of media and then transcripts associated with them by stand-off markup • Need for a tool that facilitates working with this corpus • Cross platform tool Audiamus created for interacting with field recordings via their transcripts
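The idea of citing media through linked transcripts can be sketched as follows. This is not Audiamus's data model or code. The `Segment` class, the identifier `tape-001` and the citation format are hypothetical, chosen only to show how stand-off segments that point at a persistently named media file make a stretch of recording both searchable and citable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One stretch of transcript, stored apart from the media it describes."""
    media_id: str   # persistent name of the archival media file (hypothetical)
    start: float    # offset into the media, in seconds
    end: float
    text: str


def cite(segment: Segment) -> str:
    """Build a citation that points at an exact span of the primary media."""
    return f"{segment.media_id}, {segment.start:.1f}s-{segment.end:.1f}s"


def segment_at(segments: list[Segment], seconds: float) -> Optional[Segment]:
    """Find the transcript segment covering a given playback position."""
    for segment in segments:
        if segment.start <= seconds < segment.end:
            return segment
    return None


def search(segments: list[Segment], word: str) -> list[Segment]:
    """Return every segment whose text contains the word (case-insensitive)."""
    return [s for s in segments if word.lower() in s.text.lower()]


corpus = [
    Segment("tape-001", 0.0, 4.5, "Opening greeting"),
    Segment("tape-001", 4.5, 9.0, "Story about the canoe"),
]

for hit in search(corpus, "canoe"):
    print(cite(hit), "|", hit.text)
```

Because the transcript only refers to the media by name and time offset, the archival audio file never has to be altered when the transcript is corrected.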

  15. Training, resources and advocacy • Use of new technological approaches requires training, resources and advocacy • Training in use of new tools • Resources such as software, archiving, advice on tools and methods • Advocacy of the benefits of these new approaches and tools and the reasons for engaging with them

  16. Training, resources and advocacy • Great need for training expressed by postgraduate students in particular • Training is critical as tools are constantly emerging (recording techniques and equipment, software tools)

  17. Training, resources and advocacy • We have run training workshops in the use of appropriate linguistic tools for archival output (Toolbox, Transcriber etc) • University campuses in Melbourne, Sydney, Brisbane, University of Hawai’i • In community language centres in Melbourne, Kalgoorlie, Nambucca Heads and Sydney • Batchelor Institute

  18. Training, resources and advocacy • Methods for development of: • Time-aligned transcripts (in XML) • Interlinearised text • Dictionary production • Crucial separation of content and form to allow well-formed archival data
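The separation of content and form named in this slide can be shown with a small sketch: the transcript is held once as structured data and rendered on demand, either as XML for archiving or as plain text for reading. The element and attribute names (`transcript`, `utterance`, `start`, `end`) are assumptions for illustration, not a schema used by the archive or by any particular tool.

```python
import xml.etree.ElementTree as ET

# Content: (start seconds, end seconds, text). The values are invented examples.
utterances = [
    (0.0, 2.4, "first utterance"),
    (2.4, 5.1, "second utterance"),
]


def to_xml(items) -> str:
    """Archival form: explicit structure, no presentation decisions."""
    root = ET.Element("transcript")
    for start, end, text in items:
        node = ET.SubElement(
            root, "utterance", start=f"{start:.2f}", end=f"{end:.2f}"
        )
        node.text = text
    return ET.tostring(root, encoding="unicode")


def to_plain_text(items) -> str:
    """Presentation form: derived from the same content, easy to regenerate."""
    return "\n".join(
        f"[{start:.2f}-{end:.2f}] {text}" for start, end, text in items
    )


print(to_xml(utterances))
print(to_plain_text(utterances))
```

Any number of display formats can be produced from the one structured source, so changing the layout never touches the archived content.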

  19. Training, resources and advocacy • Training in creation of archival sources by fieldworkers • Naming conventions and persistent identification of data • Metadata sets and tools • Data formats • WAV • Text/XML • etc
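A naming convention is easiest to enforce when it can be checked mechanically. The sketch below assumes a made-up pattern of collection, item and part joined by hyphens (for example `ABC1-001-A.wav`). The slides do not spell out the archive's actual convention, so the pattern and the sample names are purely illustrative.

```python
import re
from typing import Optional

# Hypothetical convention: <collection>-<item>-<part>.<extension>
NAME_PATTERN = re.compile(
    r"^(?P<collection>[A-Z]+[0-9]+)"
    r"-(?P<item>[0-9A-Za-z]+)"
    r"-(?P<part>[0-9A-Za-z]+)"
    r"\.(?P<ext>wav|mp3|xml|txt)$"
)


def parse_name(filename: str) -> Optional[dict[str, str]]:
    """Split a conforming filename into its parts, or return None."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


print(parse_name("ABC1-001-A.wav"))
print(parse_name("my recording (final).wav"))
```

A name that parses cleanly can double as a persistent identifier, because every component is predictable and none depends on where the file is stored.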

  20. DELAMAN (Digital Endangered Languages and Musics Archives Network) archives in the global research community: DOBES (Netherlands), LACITO (Paris), ELAR (London), ANLC (Alaska), EMELD (Michigan), AILLA (Texas), PARADISEC, AMPM (Auckland), AIATSIS (Canberra)

  21. We are cited as an exemplar using Digital Mass Storage Systems in the International Association of Sound and Audiovisual Archives (IASA) Guidelines on the Production and Preservation of Digital Audio Objects (IASA-TC04). Aarhus, Denmark: International Association of Sound and Audiovisual Archives (IASA), 2004, p. 51. "The Sub Committee on Technology of the Memory of the World Programme of UNESCO recommends these guidelines as best practice for Audio-Visual Archives."

  22. Current size of collection As at October 7th 2005 - 4294 files in the collection totaling 1.66 TB

      File type   Total size   File count
      .jpg        9.71 MB      46
      .mp3        53.70 GB     2001
      .pdf        5.70 MB      34
      .rtf        1.04 MB      8
      .tif        848.57 MB    171
      .txt        2.15 MB      3
      .wav        1.61 TB      2000
      .xml        1.20 MB      31
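The per-type figures on this slide can be checked against the stated totals. The sketch below copies the numbers from the slide and assumes decimal units (1 TB = 10^12 bytes). The slide does not say which convention was used, but binary units give the same total when rounded to two decimal places.

```python
# Assumption: decimal units. The slide does not state the convention used.
UNIT_IN_BYTES = {"MB": 10**6, "GB": 10**9, "TB": 10**12}

# Figures copied from the slide (as at 7 October 2005).
sizes = {
    ".jpg": (9.71, "MB"),
    ".mp3": (53.70, "GB"),
    ".pdf": (5.70, "MB"),
    ".rtf": (1.04, "MB"),
    ".tif": (848.57, "MB"),
    ".txt": (2.15, "MB"),
    ".wav": (1.61, "TB"),
    ".xml": (1.20, "MB"),
}
counts = {
    ".jpg": 46, ".mp3": 2001, ".pdf": 34, ".rtf": 8,
    ".tif": 171, ".txt": 3, ".wav": 2000, ".xml": 31,
}

total_files = sum(counts.values())
total_bytes = sum(value * UNIT_IN_BYTES[unit] for value, unit in sizes.values())
total_tb = total_bytes / UNIT_IN_BYTES["TB"]

print(total_files)           # 4294
print(f"{total_tb:.2f} TB")  # 1.66 TB
```

Both results agree with the slide: the WAV files account for almost all of the volume, while the near-equal WAV and MP3 counts are consistent with each recording being held in both formats.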

  23. Further information http://paradisec.org.au
