The LANCHART data format and search engine
Data formats in the LANCHART Project
• Recording (digitalization): wav-file
• Transcription: Transcriber (wav-file & trs-file)
• Analytical coding: Praat (wav-file & TextGrid)
• Searching & counting: the LANCHART search engine (MySQL database)
Steps linking the stages: transcription, automatic conversion, automatic import
The Praat TextGrid
[Figure: a TextGrid annotated with the labels "participant", "tier name", "tier" and "tier-group"]
• ortografi (AMF): hej med dig ["hi there"]
• events (AMF): host ["cough"]
• ortografi (XJM): hejsa ["hi"]
• events (XJM)
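What a basic search engine does
[Figure: a search over one participant's tiers]
• ortografi (AMF): sådan noget man kan når det er ens farmors ["that kind of thing you can do when it's your grandmother's"]
• grammatik (AMF): G AS DS SB RH, R AN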
The job for the LANCHART search engine
[Figure: a match and an overlapping match across several tiers]
• ortografi (AMF): kunne du ligge og dø hvor ingen opdagede det ["you could lie dying where nobody noticed"]
• grammatik (AMF): G AS DS SA RJ
• ordstil (AMF): L FAO OB
• Common tier, genre: Ggr
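The overlapping match on this slide can be sketched as a time-overlap test between labelled intervals on two tiers. This is a minimal illustration in Python, not the LANCHART implementation: the `Interval` class, the function names and all time stamps are assumptions made up for the example; only the tier names and codes come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float  # seconds from the start of the recording
    end: float
    label: str


def overlaps(a, b):
    """True if the two intervals share a stretch of time (touching ends do not count)."""
    return a.start < b.end and b.start < a.end


def find_overlapping(tier_a, label_a, tier_b, label_b):
    """Pairs (a, b) where a carries label_a, b carries label_b, and they overlap in time."""
    return [
        (a, b)
        for a in tier_a if a.label == label_a
        for b in tier_b if b.label == label_b and overlaps(a, b)
    ]


# Hypothetical time stamps; the codes are taken from the slide.
grammatik_amf = [Interval(0.0, 0.8, "AS"), Interval(0.8, 1.9, "DS"), Interval(1.9, 2.5, "SA")]
ordstil_amf = [Interval(0.5, 1.2, "FAO"), Interval(1.2, 2.5, "OB")]

hits = find_overlapping(grammatik_amf, "DS", ordstil_amf, "FAO")
```

A search within a single tier only needs the label comparison; the cross-tier case adds the time test because interval boundaries on different tiers need not line up.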
The LANCHART search engine
• A WebService
• JSP / Servlets + front-end JavaScript
  http://dgcssintranet/search.jsp
• A Database Engine, MySQL
Search engine:
• Highly normalized to eliminate redundancy
• Updated every night from Korpus
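As a rough illustration of what "highly normalized" could mean for tier data, the sketch below stores each participant code and tier name once and lets segments refer to them by key. The schema, table names and time stamps are hypothetical, and SQLite (from the Python standard library) stands in for MySQL so the example is self-contained.

```python
import sqlite3

# Hypothetical schema: each participant code and tier name is stored exactly once.
SCHEMA = """
CREATE TABLE participant (id INTEGER PRIMARY KEY, code TEXT UNIQUE NOT NULL);
CREATE TABLE tier_type   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE tier (
    id INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL REFERENCES participant(id),
    tier_type_id   INTEGER NOT NULL REFERENCES tier_type(id)
);
CREATE TABLE segment (
    id INTEGER PRIMARY KEY,
    tier_id INTEGER NOT NULL REFERENCES tier(id),
    start_s REAL NOT NULL,
    end_s   REAL NOT NULL,
    label   TEXT NOT NULL
);
"""


def build_demo_db():
    """In-memory database holding one made-up segment on the tier ortografi (AMF)."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO participant (id, code) VALUES (1, 'AMF')")
    con.execute("INSERT INTO tier_type (id, name) VALUES (1, 'ortografi')")
    con.execute("INSERT INTO tier (id, participant_id, tier_type_id) VALUES (1, 1, 1)")
    con.execute(
        "INSERT INTO segment (tier_id, start_s, end_s, label) VALUES (1, 0.0, 1.2, 'hej med dig')"
    )
    return con


def labels_for(con, participant_code, tier_name):
    """Labels on one participant's tier, in time order."""
    rows = con.execute(
        """
        SELECT s.label
        FROM segment s
        JOIN tier t        ON t.id = s.tier_id
        JOIN participant p ON p.id = t.participant_id
        JOIN tier_type tt  ON tt.id = t.tier_type_id
        WHERE p.code = ? AND tt.name = ?
        ORDER BY s.start_s
        """,
        (participant_code, tier_name),
    )
    return [row[0] for row in rows]
```

The trade-off in a layout like this is that every lookup goes through joins, in exchange for never repeating a participant code or tier name per segment.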
Support for multiple transcription & analysis formats
• Conversions are done using an XML-based "superformat", so that new formats can be added by creating conversion programmes
Support for multiple transcription & analysis formats
• "Superformat" is XML-based, allowing for XSL Transformations for conversion
• Programmed in Java for portability
[Figure: Superformat at the centre, linked to CLAN/.Chat, Praat/.TextGrid, Transcriber/.trs and other formats]
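The idea of converting through one central format can be sketched as follows: each format needs only a reader into the superformat and a writer out of it, so adding a format means adding one pair of functions rather than a converter to every other format. The slides describe a Java implementation using XSL Transformations; this Python sketch uses plain functions instead, and its element names and the two toy formats ("tsv" and "json") are stand-ins invented for the example, not the real Transcriber, Praat or CLAN formats.

```python
import json
import xml.etree.ElementTree as ET


def tsv_to_super(text):
    """Reader: lines of 'tier<TAB>start<TAB>end<TAB>label' -> hypothetical superformat XML."""
    root = ET.Element("transcript")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        tier, start, end, label = line.split("\t")
        ET.SubElement(root, "interval", tier=tier, start=start, end=end).text = label
    return root


def super_to_json(root):
    """Writer: hypothetical superformat XML -> JSON list of intervals."""
    return json.dumps(
        [
            {
                "tier": iv.get("tier"),
                "start": float(iv.get("start")),
                "end": float(iv.get("end")),
                "text": iv.text or "",
            }
            for iv in root.iter("interval")
        ],
        ensure_ascii=False,
    )


# One reader and one writer per format; every conversion passes through the superformat.
READERS = {"tsv": tsv_to_super}
WRITERS = {"json": super_to_json}


def convert(text, source, target):
    """Convert between any registered source and target format via the superformat."""
    return WRITERS[target](READERS[source](text))
```

For example, `convert("ortografi (AMF)\t0.0\t1.2\thej med dig\n", "tsv", "json")` reads the line into the intermediate XML and writes it back out as JSON.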