
3. Using typological databases in historical linguistic research


jayme




Presentation Transcript


  1. 3. Using typological databases in historical linguistic research

  2. Prerequisites for using typological features for inferring phylogenies • An adequate amount of data structured in an adequate way • A proper selection of features based on their stabilities

  3. What an adequate amount of information structured in an adequate way is, is an open question. Let’s look at examples of phylogenies based on • lexical data from ASJP • typological data from Jazyki Mira • typological data from WALS for (almost) the same set of languages

  4. Abkhaz (abk) Azerbaijani (North) (azb) Bashkir (bak) Bengali (ben) Breton (bre) Bulgarian (bul) Burushaski (bsk) Catalan (cat) Chechen (che) Chukchi (ckt) Chuvash (chv) Czech (ces) Danish (dan) Dutch (nld) Finnish (fin) French (fra) Georgian (kat) Hebrew Modern (heb) Hungarian (hun) Icelandic (isl) Italian (ita) Itelmen (itl) Kabardian (kbd) Ket (ket) Khanty (kca) Kirghiz (kir) Komi Zyrian (kpv) Lezgian (lez) Nenets (yrk) Ossetic (Osetin) (oss) Persian (pes) Polish (pol) Portuguese (por) Russian (rus) Selkup (sel) Swedish (swe) Tatar (tat) Ukrainian (ukr) Uzbek (uzn) Yakut (sah)

  5. Ossetic (Osetin) (oss) Persian (pes) Polish (pol) Portuguese (por) Russian (rus) Selkup (sel) Swedish (swe) Tatar (tat) Ukrainian (ukr) Uzbek (uzn) Yakut (sah) Abkhaz (abk) Azerbaijani (North) (azb) Bashkir (bak) Bengali (ben) Breton (bre) Bulgarian (bul) Burushaski (bsk) Catalan (cat) Chechen (che) Chukchi (ckt) Chuvash (chv) Czech (ces) Danish (dan) Dutch (nld) Finnish (fin) French (fra) Georgian (kat) Hebrew Modern (heb) Hungarian (hun) Icelandic (isl) Italian (ita) Itelmen (itl) Kabardian (kbd) Ket (ket) Khanty (kca) Kirghiz (kir) Komi Zyrian (kpv) Lezgian (lez) Nenets (yrk)

  6. Georgian (kat) Chechen (che) Lezgian (lez) Uzbek (uzn) Abkhaz (abk) Azerbaijani (North) (azb) Kabardian (kbd) Ossetic (Osetin) (oss)

  7. ASJP

  8. Jazyki Mira

  9. WALS

  10. • The amount of data for the languages in JM (Jazyki Mira): currently unknown (Oleg is working on it). • The amount of data for this language set in WALS: between 37 and 136 attested features per language (average: 86.5), which is as good as it gets in WALS.
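The coverage figures above are simple per-language counts. A minimal sketch of how such a summary could be computed, assuming a hypothetical table that maps each language code to its attested feature values (the feature IDs `F1`, `F2`, `F3`, the values, and the function name `coverage_summary` are illustrative, not real WALS data or a WALS API):

```python
from statistics import mean

# Hypothetical data: language code -> {feature ID: value}.
# A feature that is not attested for a language is simply absent from its dict.
datapoints = {
    "rus": {"F1": 2, "F2": 1, "F3": 2},
    "fin": {"F1": 2, "F3": 2},
    "ket": {"F1": 3},
}

def coverage_summary(data):
    """Return (min, max, mean) number of attested features per language."""
    counts = [len(features) for features in data.values()]
    return min(counts), max(counts), mean(counts)

print(coverage_summary(datapoints))  # (1, 3, 2)
```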

  11. The relation between the amount of data and the performance in establishing phylogenies from WALS. Correlations of WALS distances with the Ethnologue classification (dotted lines) and the WALS classification (solid lines). In each group of curves, the lowest represents the sample of languages with at least 20 attested features, and successively higher curves represent languages with at least 40, 60, 80, and 100 attested features.
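A WALS distance between two languages can be defined in several ways. The sketch below assumes one simple option, which is not necessarily the measure behind the figure: the proportion of features, among those attested for both languages, on which the two languages differ. Languages with fewer than a chosen number of attested features are dropped first, in the spirit of the 20/40/60/80/100-feature samples. The function names, the `min_features` parameter and the toy data are hypothetical.

```python
from itertools import combinations

def filter_by_coverage(data, min_features):
    """Keep only languages with at least `min_features` attested features."""
    return {lang: feats for lang, feats in data.items() if len(feats) >= min_features}

def typological_distance(a, b):
    """Proportion of jointly attested features on which languages a and b differ.

    Returns None when the two languages share no attested feature.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    differing = sum(1 for f in shared if a[f] != b[f])
    return differing / len(shared)

def distance_matrix(data, min_features):
    """Pairwise distances for all languages that pass the coverage threshold."""
    sample = filter_by_coverage(data, min_features)
    return {
        (x, y): typological_distance(sample[x], sample[y])
        for x, y in combinations(sorted(sample), 2)
    }

# Hypothetical toy data (feature IDs and values are illustrative).
data = {
    "L1": {"F1": 1, "F2": 2, "F3": 1},
    "L2": {"F1": 1, "F2": 1, "F3": 1},
    "L3": {"F1": 2},  # too sparse: removed when min_features=2
}
print(distance_matrix(data, min_features=2))
```

Raising `min_features` shrinks the sample but makes each distance rest on more shared features, which is the trade-off the curves display.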

  12. Figure 5. Results of mixing WALS and ASJP distances: correlations with the WALS classification (solid lines) and the Ethnologue classification (dotted lines) as a function of the percentage of ASJP data in the mixture. In each group of curves, the lowest (on the left side of the graph) represents the sample of languages with 20 attested features, and successively higher curves represent languages with 40, 80, 60, and 100 attested features, respectively.

  13. So, in spite of the problems with WALS-type features, an ASJP-type classification can be improved when combined with WALS features. But if we don’t want to make an arbitrary selection, we need to figure out which features are the most stable. How do we do that? “…we are far from being able to reduce the different stabilities and viabilities of various linguistic elements to precise numbers…” (Nichols 2003: 283)

  14. Here’s what to do: • Invent some metric or find suggestions in the literature • Test its performance on a simulated dataset where the stabilities are preset • Apply it to an empirical dataset • Look at how the results compare to other people’s claims • Explain the results

  15. Three different metrics • Metric A: For each genetic group and each areal group, count what percentage of the languages share the feature value that is best represented within the group, taking into account the number of values the feature has and the number of languages sharing the best-represented value, not just the proportion (Wichmann and Kamholz, forthc.) • Metric B: Same as A, but without taking into account the number of feature values involved (Nichols 1995) • Metric C: Measure the proportion of language pairs within genera that have the same value for a given feature, and weight this by the proportion of unrelated language pairs that share the same value (Wichmann and Holman, under review)
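Metrics B and C are concrete enough to sketch; metric A's exact correction for the number of values and languages is not spelled out here, so it is left out. Two assumptions are made for illustration: metric B averages the per-group proportions over groups, and metric C's weighting takes the form S = (R − U)/(1 − U), where R is the share of related (same-genus) pairs with the same value and U the share of unrelated pairs with the same value. The input format, a list of `(language, group, value)` records for one feature, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def metric_b(records):
    """Nichols-style consistency (sketch): for each group, the share of its
    languages that have the group's best-represented value, averaged over groups."""
    by_group = {}
    for _, group, value in records:
        by_group.setdefault(group, []).append(value)
    shares = [Counter(vals).most_common(1)[0][1] / len(vals) for vals in by_group.values()]
    return sum(shares) / len(shares)

def metric_c(records):
    """Wichmann/Holman-style stability (sketch), assuming S = (R - U) / (1 - U).

    Requires at least one related and one unrelated pair, and U < 1.
    """
    related = [0, 0]    # [pairs with the same value, all pairs] within genera
    unrelated = [0, 0]  # the same counts for pairs from different genera
    for (_, g1, v1), (_, g2, v2) in combinations(records, 2):
        bucket = related if g1 == g2 else unrelated
        bucket[0] += v1 == v2
        bucket[1] += 1
    r = related[0] / related[1]
    u = unrelated[0] / unrelated[1]
    return (r - u) / (1 - u)

# Hypothetical records for one feature: (language, genus, value).
records = [("L1", "g1", 1), ("L2", "g1", 1), ("L3", "g1", 2),
           ("L4", "g2", 2), ("L5", "g2", 2)]
print(round(metric_b(records), 3), round(metric_c(records), 3))  # 0.833 0.25
```

Metric B only looks inside groups, whereas metric C also asks how often unrelated languages agree by chance, so a feature with one overwhelmingly common value scores high on B but not necessarily on C.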

  16. Performance of the Wichmann/Holman Metric C, the Wichmann/Kamholz Metric A, and the Nichols Metric B under different degrees of data coverage. PL: probability that a language is included in the sample; PF: probability that a feature is attested for a language
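The test on simulated data with preset stabilities, thinned by PL and PF, can be approximated in a few lines. A minimal sketch under stated assumptions, not the design used in the comparison: each genus draws a proto-value per feature; a language keeps it with probability s (the preset stability) and otherwise gets a uniformly random value out of k; languages are then kept with probability PL and feature values with probability PF. Metric C is again assumed to be (R − U)/(1 − U). Under these assumptions its expected value works out to s², so the check is on the ordering of the estimates, not on equality with the presets. All names and parameter values are hypothetical.

```python
import random
from itertools import combinations

def simulate(stabilities, n_genera, per_genus, n_values, pl, pf, rng):
    """Return {feature index: [(genus, value), ...]} after thinning by PL and PF."""
    data = {f: [] for f in range(len(stabilities))}
    for g in range(n_genera):
        protos = [rng.randrange(n_values) for _ in stabilities]  # one proto-value per feature
        for _ in range(per_genus):
            if rng.random() >= pl:      # language not included in the sample
                continue
            for f, s in enumerate(stabilities):
                if rng.random() >= pf:  # feature not attested for this language
                    continue
                value = protos[f] if rng.random() < s else rng.randrange(n_values)
                data[f].append((g, value))
    return data

def metric_c(records):
    """Assumed form S = (R - U) / (1 - U) over within-genus vs. cross-genus pairs."""
    same = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for (g1, v1), (g2, v2) in combinations(records, 2):
        related = g1 == g2
        same[related] += v1 == v2
        total[related] += 1
    r, u = same[True] / total[True], same[False] / total[False]
    return (r - u) / (1 - u)

rng = random.Random(42)
preset = [0.9, 0.5, 0.1]
data = simulate(preset, n_genera=100, per_genus=10, n_values=4, pl=0.8, pf=0.8, rng=rng)
estimates = [metric_c(data[f]) for f in range(len(preset))]
for s, est in zip(preset, estimates):
    print(f"preset {s:.1f} -> estimated {est:.2f}")
```

Rerunning with lower `pl` and `pf` shows how much each metric's ranking degrades as coverage drops, which is what the PL/PF panels compare.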

  17. Let’s interpret the stabilities • My show’s over • Your turn

  18. Revisit these lectures: • Google “Soeren Wichmann”, go to “for students”, and look at the slides • There is a special home page for the ASJP project. Check it out.

  19. • Wichmann, Søren and Eric W. Holman. Under review. Assessing temporal stability for linguistic typological features. • Holman, Eric W., Søren Wichmann, Cecil H. Brown, Viveka Velupillai, André Müller, Pamela Brown, and Dik Bakker. In press. Explorations in automated lexicostatistics. Folia Linguistica (scheduled for 2008). • Wichmann, Søren and Arpiar Saunders. 2007. How to use typological databases in historical linguistic research. Diachronica 24.2: 373-404. • Wichmann, Søren and David Kamholz. In press. A stability metric for typological features. Sprachtypologie und Universalienforschung.
