
Þórunn Blöndal

Þórunn Blöndal. ÍSTAL: The Icelandic Corpus of Spoken Language. Nordtalk – NorFa: Using spoken language corpora, Göteborg, Aug 19-24, 2002. Research on Spoken Icelandic: research on regional differences in pronunciation, language acquisition, and the development of narrative skills.


Presentation Transcript


  1. Þórunn Blöndal • ÍSTAL: The Icelandic Corpus of Spoken Language • Nordtalk – NorFa: Using spoken language corpora • Göteborg, Aug 19-24, 2002

  2. Research on Spoken Icelandic • Research on • regional differences in pronunciation • language acquisition • the development of narrative skills

  3. The ÍSTAL Group • Ásta Svavarsdóttir • The Institute of Lexicography (asta@lexis.hi.is) • Eiríkur Rögnvaldsson • University of Iceland (eirikur@hi.is) • Hrafnhildur Ragnarsdóttir • Iceland University of Education (hragnars@khi.is) • Kristín Bjarnadóttir • The Institute of Lexicography (kristinb@lexis.hi.is) • Sigurður Konráðsson • Iceland University of Education (sigkon@khi.is) • Þóra Björk Hjartardóttir • University of Iceland (thorah@hi.is) • Þórunn Blöndal • Iceland University of Education (thblond@khi.is)

  4. The Goal • From the outset, the ÍSTAL group’s primary objective was to establish a corpus of spoken language for use in two broadly defined fields: • linguistic research on the spoken language; i.e., in syntax, morphology, conversation analysis, etc. • computational linguistics and language technology

  5. [Diagram of conversation types; several boxes are marked "?" in the source] • interviews • shopping • formal meetings • informal conversation • phone conversation • task-oriented dialogue • formal conversation (doctor/patient consultation, etc.) • native / non-native speakers • non-native speakers of Icelandic • children / parents

  6. Sony MZ-B3

  7. The Orthography • Standard orthography is used in ÍSTAL, but deviations from the most common pronunciation are given in brackets: • dálítið (a little) > dáldið > doldið • Loan words embedded in Icelandic are spelled according to Icelandic phonetic rules: • OK > ókei

  8. The Header ... • Heiti upptöku: 04-701-02 – Number • Dagsetning upptöku: 040400 – Date • Stutt lýsing á efni: Spjall á kennarastofu – Short description • Kaflar umritunar: kynlífsvæðing – Topics transcribed • Stuttnefni: kennkynlíf – Abbreviated title • Lengd upptöku: 00:08:58 – Duration • Upptökutæki: Sony digital, mini disc MZ-B3 – Recording device • Þátttakandi: A = Þ1; kk 34 kennari – Participant: male, 34, teacher • Þátttakandi: B = Þ2; kk 41 kennari – Participant: male, 41, teacher • Þátttakandi: C = Þ3; kvk 40 kennari – Participant: female, 40, teacher • Þátttakandi: D = Þ4; kvk 45 kennari – Participant: female, 45, teacher • Heiti umritunar: UM-04-701-02 – Second listening/transcription • Umritari: KE – Transcriber’s initials • Dagsetning umritunar: 0800 – Date of second listening/transcription • Hvað umritað: – Material transcribed

  9. .... • Umritunarkerfi: AUGLUMSTAFS – Standard orthography • Yfirlesari: – Proofreader • Dagsetning yfirlesturs: – Proofreading date • Skráður tími: **?** • Athugasemdir: Í upphafi koma Sv skólastjóri=Sv, og Be=Be inn í samtalið sem er tekið upp í frímínútum á kennarastofu. Þ1, Þ2, Þ3 og Þ4 eru samkennarar. – Comments: At the beginning of the conversation, which is recorded during a break in the staff room, the headmaster (Sv) and Be take part; then they leave. Participants 1, 2, 3, and 4 are colleagues, teachers at the same school. • 2-yfirlestur: HBE 060102 – Second proofreading 060102
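The header fields on slides 8 and 9 follow a simple "field: value" pattern, so they lend themselves to machine reading. The following is a minimal sketch only, assuming each field sits on its own line without the English gloss; the function `parse_header`, the `Participant` class, and the one-field-per-line layout are illustrative assumptions, not part of the ÍSTAL tooling.

```python
import re
from dataclasses import dataclass

# Hypothetical record for one "Þátttakandi" (participant) line.
@dataclass
class Participant:
    label: str        # e.g. "A"
    speaker_id: str   # e.g. "Þ1"
    gender: str       # "male" or "female"
    age: int
    occupation: str   # kept in Icelandic, e.g. "kennari" (teacher)

# kk = karlkyn (male), kvk = kvenkyn (female), as glossed on the slide.
GENDER = {"kk": "male", "kvk": "female"}

# Matches values such as "A = Þ1; kk 34 kennari".
PARTICIPANT_RE = re.compile(r"^(\w+)\s*=\s*(\S+);\s*(kk|kvk)\s+(\d+)\s+(.+)$")

def parse_header(text):
    """Split header lines into a dict of fields and a list of participants."""
    fields = {}
    participants = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so "00:08:58" stays intact.
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key == "Þátttakandi":
            m = PARTICIPANT_RE.match(value)
            if m:
                participants.append(
                    Participant(m[1], m[2], GENDER[m[3]], int(m[4]), m[5])
                )
        else:
            fields[key] = value
    return fields, participants

# Sample lines copied from the header shown on slide 8.
SAMPLE = """\
Heiti upptöku: 04-701-02
Dagsetning upptöku: 040400
Lengd upptöku: 00:08:58
Þátttakandi: A = Þ1; kk 34 kennari
Þátttakandi: B = Þ2; kk 41 kennari
Þátttakandi: C = Þ3; kvk 40 kennari
Þátttakandi: D = Þ4; kvk 45 kennari
"""

fields, participants = parse_header(SAMPLE)
for p in participants:
    print(p.label, p.speaker_id, p.gender, p.age, p.occupation)
```

Keeping participants in a separate list rather than in the field dictionary avoids overwriting the repeated "Þátttakandi" key.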

  10. ÍSTAL as it is now • The data bank contains 31 conversations with 2 to 4 participants. • The participants are 30-60 years old. • The data are collected in various geographical regions of Iceland. • Each transcription is marked with a header showing information on the participants’ age, gender, and relationship to one another; the duration of the conversation; and other relevant information. • The total duration of transcribed material is approximately 20 hours. • Of 31 conversations, 6 take place among males, 5 are among females, and 20 are mixed. • The material is transcribed according to the standard orthography with only slight deviation.

  11. ÍSTAL’s Role in Research • The following have been presented as works in progress: • Comparison between word frequencies in spoken and written Icelandic (Ásta) • Investigation of ‘það’ (‘it’/‘there’) in Icelandic (Eiríkur) • Collaborative completion of turn-constructional units (TCUs) in conversation (Þórunn)

  12. Thank You!
