
New approaches to language and prehistory from typology, genetics, and quantitative linguistics

Søren Wichmann, MPI-EVA & Leiden University. Lecture IV: The utility of phylogenetic algorithms and software: some case studies.


Presentation Transcript


  1. New approaches to language and prehistory from typology, genetics, and quantitative linguistics. Søren Wichmann, MPI-EVA & Leiden University

  2. Lecture IV: The utility of phylogenetic algorithms and software: some case studies

  3. Case study A. Can the algorithms help us refine lexicostatistics? Let's compare a phylogeny based on traditional methods with one based on lexicostatistics using modern phylogenetic methods. Language family studied: Mixe-Zoquean.

  4. The location of Mixe-Zoquean languages

  5. Classification criteria: shared (mostly phonological) innovations (from Wichmann 1995):
     1. Not defined
     2. /h/ inserted before final consonant
     3. Vowel length is lost
     4. Word-final vowels lost
     5. Palatalizing effect of front vowels
     6. Apparently some morphological and lexical innovations (not clear)
     7. Mostly implicit: the language is intermediate in several respects and also has its own innovations (see 12)
     8a. Syllable-initial nasals become prenasalised stops
     8b. /t/ and /ts/ merge before /i/
     8c. Final devoicing
     9a. Development of a quantity distinction in consonants
     9b. An analogical extension involving verb classes
     12. Umlaut, syncope, anaptyxis, stress changes
     22. An /h/ is inserted into verb roots whose final consonant is a stop

  6. A 110-word Swadesh-style list

  7. Encoding of cognation as discrete characters
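As an illustration of slide 7, here is a minimal Python sketch of one common way to turn cognate judgements into discrete characters: binary presence/absence coding, with one character per cognate class per meaning and "?" for missing data. The slide does not say which coding scheme was used, and the data, the language names L1 to L4 and the function `binary_characters` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data: meaning -> language -> cognate-class label (None = no data).
COGNATES: Dict[str, Dict[str, Optional[str]]] = {
    "water": {"L1": "a", "L2": "a", "L3": "b", "L4": "b"},
    "fire": {"L1": "a", "L2": "b", "L3": "b", "L4": None},
    "stone": {"L1": "a", "L2": "a", "L3": "a", "L4": "a"},
}


def binary_characters(cognates: Dict[str, Dict[str, Optional[str]]]):
    """Return (character labels, language -> list of '0'/'1'/'?').

    Each (meaning, cognate class) pair becomes one binary character:
    1 if the language's word for that meaning belongs to the class,
    0 if it belongs to another class, '?' if the word is missing."""
    languages = sorted({lang for row in cognates.values() for lang in row})
    labels: List[Tuple[str, str]] = []
    matrix: Dict[str, List[str]] = {lang: [] for lang in languages}
    for meaning in sorted(cognates):
        row = cognates[meaning]
        classes = sorted({c for c in row.values() if c is not None})
        for cls in classes:
            labels.append((meaning, cls))
            for lang in languages:
                value = row.get(lang)
                if value is None:
                    matrix[lang].append("?")
                else:
                    matrix[lang].append("1" if value == cls else "0")
    return labels, matrix


if __name__ == "__main__":
    labels, matrix = binary_characters(COGNATES)
    print(labels)
    for lang, chars in matrix.items():
        print(lang, "".join(chars))
```

A multistate coding (one character per meaning, one state per cognate class) is the obvious alternative; which one suits a given program depends on the character models it supports.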

  8. A distance matrix of cognate percentages
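A matrix of cognate percentages like the one on slide 8 can be sketched as follows. Assumptions, since the slide does not spell them out: each language's list maps meanings to cognate-class labels, a percentage is computed only over meanings attested in both lists, and the distance is 100 minus the percentage. The data and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

# Hypothetical toy lists: language -> meaning -> cognate-class label (None = gap).
LISTS: Dict[str, Dict[str, Optional[str]]] = {
    "L1": {"water": "a", "fire": "a", "stone": "a", "sun": "a"},
    "L2": {"water": "a", "fire": "b", "stone": "a", "sun": "a"},
    "L3": {"water": "b", "fire": "b", "stone": "a", "sun": None},
}


def cognate_percentage(a: Dict[str, Optional[str]], b: Dict[str, Optional[str]]) -> float:
    """Percentage of shared cognates among meanings attested in both lists."""
    both = [m for m in a if a[m] is not None and b.get(m) is not None]
    if not both:
        raise ValueError("no meanings attested in both lists")
    same = sum(1 for m in both if a[m] == b[m])
    return 100.0 * same / len(both)


def distance_matrix(lists: Dict[str, Dict[str, Optional[str]]]) -> Dict[Tuple[str, str], float]:
    """Symmetric matrix of distances = 100 - cognate percentage."""
    langs = sorted(lists)
    dist = {(x, x): 0.0 for x in langs}
    for x, y in combinations(langs, 2):
        d = 100.0 - cognate_percentage(lists[x], lists[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


if __name__ == "__main__":
    d = distance_matrix(LISTS)
    langs = sorted(LISTS)
    for x in langs:
        print(x, " ".join(f"{d[(x, y)]:6.1f}" for y in langs))
```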

  9. Case study B. Sweet dreams and crude reality: evaluating Dunn et al. (2005) on Austronesian and Papuan

  10. What does it take for a match between two trees to be "close"? A crude test of how well two trees match is to compute the Robinson-Foulds distance, or "symmetric difference": a count of the nodes (splits) that are present in one tree but not in the other. First tree A is compared to tree B, then tree B to tree A, and the sum is divided by two (implemented in TreeDist.exe in the PHYLIP package, among others).
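For readers who want to compute the distance themselves, here is a minimal Python sketch of the Robinson-Foulds distance as described on slide 10. Assumptions: trees are written as nested tuples of taxon names (a hypothetical input format; real tools read Newick), they are compared as unrooted trees, and the count of differing splits is divided by two as on the slide. It is not the code of TreeDist.exe, and the function names are hypothetical.

```python
from typing import FrozenSet, Set, Union

# A tree written as nested tuples of taxon names, e.g. (("A", "B"), ("C", ("D", "E"))).
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial splits of the tree read as unrooted.

    Each split is stored as the side that does NOT contain a fixed reference
    taxon, so two trees describe the same split in the same way."""
    taxa = leaves(tree)
    ref = min(taxa)
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = taxa - below if ref in below else below
        # Skip trivial splits (one taxon vs. the rest) and the root (empty side).
        if 1 < len(side) < len(taxa) - 1:
            found.add(side)
        return below

    walk(tree)
    return found


def robinson_foulds(a: Tree, b: Tree) -> float:
    """(splits only in A + splits only in B) / 2, following the halving on the slide."""
    if leaves(a) != leaves(b):
        raise ValueError("trees must have the same taxa")
    sa, sb = splits(a), splits(b)
    return (len(sa - sb) + len(sb - sa)) / 2


if __name__ == "__main__":
    traditional = ((("A", "B"), "C"), ("D", "E"))
    alternative = ((("A", "C"), "B"), ("D", "E"))
    print(robinson_foulds(traditional, alternative))  # 1.0
```

Because splits are stored relative to a fixed reference taxon, re-rooting a tree does not change its distance to another tree: `(("A", "B"), ("C", ("D", "E")))` is at distance 0 from `traditional`.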

  11. The distance between the "traditional" and the "typological" Austronesian trees is 4. Now we may ask: if we generate 10,000 random trees with 16 taxa, how likely are we to draw a random pair from this pool that has 4 or fewer differences? I carried out this test in collaboration with Mihai Albu, who generated the trees, and Thomas Mailund, who ran the trees through his program, which is similar to TreeDist.exe.
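The random-tree test on slide 11 can be approximated with a small Monte Carlo sketch. Assumptions, since the slide does not describe how the trees were generated: topologies are unrooted and binary and are drawn uniformly by random stepwise addition; the taxon names T1..T16 and all function names are hypothetical; the Robinson-Foulds distance is halved as on slide 10. The resulting probabilities depend heavily on the random-tree model, so this should not be read as a reproduction of the figures reported in this lecture.

```python
import random
from typing import FrozenSet, List, Set, Tuple, Union

Node = Union[str, int]  # taxa are strings, internal nodes are integers


def random_unrooted_tree(taxa: List[str], rng: random.Random) -> List[Tuple[Node, Node]]:
    """Random stepwise addition: start from a 3-taxon star and attach each further
    taxon to a uniformly chosen edge. This samples unrooted binary topologies uniformly."""
    edges: List[Tuple[Node, Node]] = [(taxa[0], 0), (taxa[1], 0), (taxa[2], 0)]
    next_id = 1
    for taxon in taxa[3:]:
        u, v = edges.pop(rng.randrange(len(edges)))
        w = next_id
        next_id += 1
        edges.extend([(u, w), (w, v), (w, taxon)])
    return edges


def splits(edges: List[Tuple[Node, Node]], taxa: List[str]) -> Set[FrozenSet[str]]:
    """Non-trivial splits, each stored as the side without the reference taxon taxa[0]."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    leaf_set = frozenset(taxa)
    result: Set[FrozenSet[str]] = set()
    for u, v in edges:
        if u in leaf_set or v in leaf_set:
            continue  # pendant edges give trivial splits
        side, stack, seen = set(), [v], {u, v}
        while stack:  # collect leaves reachable from v without crossing back to u
            node = stack.pop()
            if node in leaf_set:
                side.add(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if taxa[0] in side:
            side = leaf_set - side
        result.add(frozenset(side))
    return result


def rf(a: Set[FrozenSet[str]], b: Set[FrozenSet[str]]) -> float:
    """Robinson-Foulds distance between two split sets, halved as on slide 10."""
    return len(a ^ b) / 2


def p_rf_at_most(threshold: float, n_taxa: int = 16, pool_size: int = 10_000,
                 n_pairs: int = 100_000, seed: int = 1) -> float:
    """Estimate P(RF <= threshold) for random pairs drawn from a pool of random trees."""
    rng = random.Random(seed)
    taxa = [f"T{i}" for i in range(1, n_taxa + 1)]
    pool = [splits(random_unrooted_tree(taxa, rng), taxa) for _ in range(pool_size)]
    hits = sum(rf(*rng.sample(pool, 2)) <= threshold for _ in range(n_pairs))
    return hits / n_pairs


if __name__ == "__main__":
    print(p_rf_at_most(4))
```

With the default pool of 10,000 trees the run takes a few seconds in CPython. For very small probabilities, many more pairs are needed before the estimate differs from zero.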

  12. Results:

  13. The conclusion seems to be in favor of Dunn et al., but... the time depth of the Meso-Melanesian subgroup of Austronesian is very shallow, perhaps 1000 years or so (this is to be checked). The time depth of the Papuan group, if it exists at all, could be 10 times as large. How well does a method work at such a time depth if it only barely works at a shallow one?

  14. On a more optimistic note: if the exact same dataset that Dunn et al. used (supplied online along with their paper) is subjected to a Bayesian analysis, the Robinson-Foulds distance is down to 3! (Thanks to Arpiar Saunders for carrying out the analysis.)

  15. Traditional tree vs. tree produced by Bayesian analysis of typological data

  16. The probability of a Robinson-Foulds distance of 3 is around 0.01

  17. Intermediate conclusion: Given that a reasonably good tree can be obtained from typological data, the method could perhaps work. And it could work even better with an adequate algorithm...

  18. But does it actually work?

  19. A little problem not to be overlooked: hm, low bootstrap values...

  20. How low can you go?

  21. Bootstrapping in SplitsTree (10,000 runs)
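The idea behind the bootstrap values on these slides can be sketched in a few lines: resample the characters with replacement, rebuild the tree, and count how often each group of the original tree reappears. This is not SplitsTree's implementation; the tree-building step here is a swapped-in method (UPGMA on proportions of differing characters), and the toy data and function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, Sequence, Set


def hamming(a: Sequence[str], b: Sequence[str]) -> float:
    """Proportion of characters on which two languages differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_clusters(matrix: Dict[str, Sequence[str]]) -> Set[FrozenSet[str]]:
    """Build a UPGMA tree and return its non-trivial clusters (groups of 2..n-1 taxa)."""
    names = sorted(matrix)
    clusters = {i: frozenset([name]) for i, name in enumerate(names)}
    dist = {(i, j): hamming(matrix[names[i]], matrix[names[j]])
            for i in clusters for j in clusters if i < j}
    found: Set[FrozenSet[str]] = set()
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of current clusters
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) | clusters.pop(j)
        # Average-linkage distance from the merged cluster to every remaining cluster.
        new = {k: (ni * dist[min(i, k), max(i, k)] + nj * dist[min(j, k), max(j, k)]) / (ni + nj)
               for k in clusters}
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        dist.update({(k, next_id): d for k, d in new.items()})
        clusters[next_id] = merged
        next_id += 1
        if len(merged) < len(names):
            found.add(merged)
    return found


def bootstrap_support(matrix: Dict[str, Sequence[str]], replicates: int = 1000,
                      seed: int = 1) -> Dict[FrozenSet[str], float]:
    """Share of replicates (characters resampled with replacement) in which
    each cluster of the original UPGMA tree reappears."""
    rng = random.Random(seed)
    n_chars = len(next(iter(matrix.values())))
    reference = upgma_clusters(matrix)
    counts: Counter = Counter()
    for _ in range(replicates):
        columns = [rng.randrange(n_chars) for _ in range(n_chars)]
        resampled = {name: [row[c] for c in columns] for name, row in matrix.items()}
        counts.update(upgma_clusters(resampled) & reference)
    return {cluster: counts[cluster] / replicates for cluster in reference}


# Hypothetical binary character matrix for five languages.
DATA = {
    "A": "1100110010",
    "B": "1100110011",
    "C": "1111000010",
    "D": "0011001101",
    "E": "0011001100",
}

if __name__ == "__main__":
    for cluster, support in sorted(bootstrap_support(DATA).items(), key=lambda kv: -kv[1]):
        print(f"{support:.3f}", " ".join(sorted(cluster)))
```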

  22. Zooming in on the inner nodes

  23. Bootstrap values of all inner nodes (support value, followed by the taxa, by number, in the corresponding split):
      0.221   1 3 6 7 8 9 13 15
      0.274   1 7 8 9
      0.308   1 3 6 7 8 9 13 14 15
      0.362   1 3 5 6 7 8 9 13 14 15
      0.433   1 3 7 8 9
      0.506   1 3 4 5 6 7 8 9 10 13 14 15
      0.524   1 2 3 4 5 7 8 9 10 11 12 14
      0.596   1 2 3 5 6 7 8 9 11 12 13 14 15
      0.661   1 2 3 4 5 6 8 10 11 12 13 14 15
      0.673   1 7 9
      0.701   1 3 4 5 6 7 8 9 10 12 13 14 15
      0.939   1 2 3 4 5 7 8 9 10 11 12 14 15

  24. What have Dunn et al. accomplished?
     • They are the first to have published phylogenetic trees using typological data as input
     • They have produced a nice dataset, including new data from fieldwork
     BUT
     • The comparison between an Austronesian tree based on the comparative method and one based on typological data is not carried out in a rigorous manner
     • The algorithm used (Maximum Parsimony) is the worst one available
     • The data are organised in binary variables, which is the worst possible way, because the chance factor increases as the number of possible values of a feature decreases
     • They argue that a fit between the proposed phylogeny and geographical patterns shows that the phylogeny is real and not due to diffusion. But diffused items are precisely what is expected to pattern geographically. And actually the fit is poor.
     • They ask a program to produce a tree. It obeys. But it also produces bootstrap values where 11 out of 12 inner nodes are below or way below 90%. This is a tree that doesn't want to be a tree. Yet they accept it at face value.
     CONCLUSION (1)
     • Nothing substantial has been accomplished, neither methodologically nor empirically
     CONCLUSION (2)
     • Don't believe everything you read in Science and, trust me, don't necessarily trust people who work at Max Planck institutes
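The point about binary variables can be made concrete with a back-of-the-envelope calculation. Under the simplifying assumption that every value of a feature is equally frequent and that unrelated languages pick values independently (real typological values are neither), two unrelated languages match on a feature with k values with probability 1/k, and the chance of many matches follows a binomial distribution. The function names and the "15 of 20 features" scenario are hypothetical.

```python
from math import comb


def chance_match_prob(n_values: int) -> float:
    """Probability that two unrelated languages share a feature value by chance,
    ASSUMING all values are equally frequent and chosen independently."""
    return 1.0 / n_values


def prob_at_least(matches: int, n_features: int, p: float) -> float:
    """Binomial upper tail P(X >= matches) for X ~ Binomial(n_features, p)."""
    return sum(comb(n_features, k) * p ** k * (1 - p) ** (n_features - k)
               for k in range(matches, n_features + 1))


if __name__ == "__main__":
    # Hypothetical case: two languages agree on 15 of 20 features.
    for n_values in (2, 3, 5):
        p = chance_match_prob(n_values)
        print(n_values, "values:", f"{prob_at_least(15, 20, p):.2e}")
```

The fewer values a feature has, the larger the chance-match probability, which is the sense in which binary coding makes chance resemblances more likely.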

  25. Case study C. Let's dream on... Towards a subgrouping of proto-New World

  26. Step 1. Make a selection of languages belonging to the West Coast, defined as those whose speakers depend on the Pacific for subsistence or navigate on it. Assumption: there could be a group within the New World family that is mostly confined to the Pacific Coast. The list: Haida, Squamish, Makah, Quileute, Coos (Hanis), Karok, Wappo, Maricopa, Huave, Quechua, Aymara, Epena Pedee, Awa Pit, Mapudungun, Qawasqar

  27. Step 2. Find out whether any of the American founder traits are significantly better represented in this group of languages. Result: two traits: fusion of Agent and Patient markers; inflectional synthesis of the verb: 8-9 categories per word.
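The slide does not say which significance test was used in Step 2. One standard choice for "is this trait better represented in the Pacific group than elsewhere?" is a one-sided Fisher exact test on a 2x2 table, sketched below with only the standard library. The function name and the example counts are hypothetical.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test on the 2x2 table

                    trait present   trait absent
        Pacific           a               b
        non-Pacific       c               d

    Returns P(at least `a` Pacific languages have the trait) under the
    hypergeometric null of no association between trait and group."""
    group = a + b          # number of Pacific languages
    with_trait = a + c     # number of languages with the trait
    n = a + b + c + d
    top = min(group, with_trait)
    numerator = sum(comb(group, x) * comb(n - group, with_trait - x)
                    for x in range(a, top + 1))
    return numerator / comb(n, with_trait)


if __name__ == "__main__":
    # Hypothetical counts: 9 of 15 Pacific languages vs. 20 of 85 others have the trait.
    print(fisher_one_sided(9, 6, 20, 65))
```

If SciPy is available, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` computes the same one-sided p-value.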

  28. Step 3. Extend the set of Pacific languages to Pacific-Style languages by the criterion of sharing one of the two "significantly Pacific" features

  29. Step 4. Reduce the set by removing languages that don't share at least 25% of all WALS features that have a significantly Pacific distribution
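Steps 3 and 4 amount to two filters, sketched below under stated assumptions: "sharing one of the two features" is read as having at least one of them; "sharing" a significantly Pacific feature is read as having the value typical of the Pacific group; and, following "25% of all", the share is taken over all such features, so uncoded features count as not shared. The feature names, values and the function `pacific_style` are hypothetical, not WALS identifiers.

```python
from typing import Dict, List, Tuple


def pacific_style(languages: Dict[str, Dict[str, str]],
                  founder_traits: List[Tuple[str, str]],
                  pacific_values: Dict[str, str],
                  min_share: float = 0.25) -> Dict[str, float]:
    """Return the languages kept by Steps 3 and 4, with their share of Pacific values."""
    kept: Dict[str, float] = {}
    for name, features in languages.items():
        # Step 3: the language must show at least one of the 'significantly Pacific' founder traits.
        if not any(features.get(f) == v for f, v in founder_traits):
            continue
        # Step 4: share of ALL significantly Pacific features on which it has the Pacific value
        # (features not coded for this language count as not shared).
        share = sum(features.get(f) == v for f, v in pacific_values.items()) / len(pacific_values)
        if share >= min_share:
            kept[name] = share
    return kept


# Hypothetical toy data.
LANGS = {
    "X": {"synthesis": "8-9", "f1": "a", "f2": "b", "f3": "a", "f4": "c"},
    "Y": {"synthesis": "4-5", "agent_patient": "separate", "f1": "a"},
    "Z": {"agent_patient": "fused", "f1": "b", "f2": "c"},
}
FOUNDER = [("synthesis", "8-9"), ("agent_patient", "fused")]
PACIFIC = {"f1": "a", "f2": "b", "f3": "a", "f4": "a"}

if __name__ == "__main__":
    print(pacific_style(LANGS, FOUNDER, PACIFIC))
```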

  30. Step 5. Make a classification of Pacific-Style languages, using many WALS features (here 96 features)
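A classification from many sparsely coded typological features usually starts from a pairwise distance that copes with missing data. The sketch below uses the proportion of differing values over the features coded for both languages, with an optional minimum overlap below which no distance is reported. This is an assumed approach, not necessarily the one behind the slide; the `min_overlap` parameter, the feature names and the data are hypothetical. The resulting table can then be handed to any distance-based tree or network method.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

Features = Dict[str, Optional[str]]  # feature id -> value (None or absent = not coded)


def feature_distance(a: Features, b: Features, min_overlap: int = 1) -> Optional[float]:
    """Proportion of differing values over features coded for BOTH languages;
    None if fewer than `min_overlap` features overlap."""
    shared = [f for f in a if a[f] is not None and b.get(f) is not None]
    if len(shared) < min_overlap:
        return None
    return sum(a[f] != b[f] for f in shared) / len(shared)


def distance_table(languages: Dict[str, Features],
                   min_overlap: int = 1) -> Dict[Tuple[str, str], Optional[float]]:
    """Distances for every unordered pair of languages."""
    return {(x, y): feature_distance(languages[x], languages[y], min_overlap)
            for x, y in combinations(sorted(languages), 2)}


# Hypothetical toy data with gaps.
LANGS: Dict[str, Features] = {
    "P": {"F1": "1", "F2": "2", "F3": "1", "F4": None},
    "Q": {"F1": "1", "F2": "3", "F4": "2"},
    "R": {"F2": "2", "F3": "1"},
}

if __name__ == "__main__":
    for pair, d in distance_table(LANGS, min_overlap=2).items():
        print(pair, d)
```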

  31. Step 6. Fiddle a bit further, and interesting patterns emerge (in what follows, Haida is excluded)

  32. Conclusion: A knowledge of ancestral states at the root of the tree can significantly improve subgrouping. Such "founder traits" also lend more credibility to a phylogeny. To be able to argue for new genealogical relations using typological data, we need either (1) strongly supported roots, involving comparison with languages of the rest of the world, or (2) strong internal statistical support such as high bootstrap values. Preferably we should have both. There is light at the end of the tunnel.

  33. Thanks! Keep in touch: wichmann@eva.mpg.de
