1. Romanian Online Dialect Atlas © 2003 Embleton, Uritescu, Wheeler
Romanian Online Dialect Atlas: An exploration into the management of high volumes of complex knowledge in the social sciences and humanities.
Sheila M. Embleton
Dorin Uritescu
Eric S. Wheeler
2. Romanian Online Dialect Atlas
Sheila M. Embleton
Department of Languages, Literatures and Linguistics, York University
Dorin Uritescu
co-editor of source atlas: Noul Atlas lingvistic român. Crisana.
Department of French, Glendon College, York University
Eric S. Wheeler
ITEC program, York University,
Managing partner, Wheeler and Young Inc.
3. Romanian Online Dialect Atlas
Supported (2003-2006) by a grant from:
Social Sciences and Humanities Research Council (Canada)
4. Agenda
The problem of high-volume, complex data in the social sciences and humanities
Predecessor projects: English, Finnish dialect data
Use of Multidimensional Scaling (MDS) to consolidate data
Interactive, media-rich presentation
5. Problem
In the social sciences and humanities, data is often characterized by:
high volume
multiple variables or dimensions
no a priori model
Dialectology provides a good exemplar
6. Dialectology
Explain the variations in linguistic usage across geography
Simple example:
“church” vs. “kirk” (< OE cirice)
More realistic problem:
169 features in 313 locations (SED)
213 features in 400+ locations (Finnish)
7. Dialect atlases
Record the details in maps
Many maps needed to make an atlas
Recovery of individual facts is possible but...
Global understanding of the situation is lost in the volume of details
8. English
Survey of English Dialects (SED)
169 features at 313 locations
Computer Developed Linguistic Atlas of England
Applied MDS to already computerized data
9. English: results
2-D map of dialect locations
No geographic information used
Close correspondence to geography (as expected)
Highlighted further problems of handling and understanding high volumes of data
11. Finnish
12. Kettunen (1940), The Dialect Atlas of Finland
213 maps × 530 locations
Up to 16 features per map
Typically 1-3 features per location
~120,000 data items
Project: data computerization (largely done)
Stage II: application of MDS (not yet done)
13. Map 1 (parts)
15. Ambiguity
16. Resolution
Make an editorial decision:
“X, not Y”
Mark as “AMBIGUOUS”
“X or Y”
Get more input
“X (says expert)”
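The three resolution options can be recorded explicitly in the digitized data, so that a later analysis knows which readings were edited, left ambiguous or supplied by an expert. The sketch below is one possible record layout. The class and field names (MapEntry, Resolution, readings, chosen) are hypothetical and are not the schema used in the Finnish project. "X" and "Y" are placeholders, as on the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Resolution(Enum):
    """The three ways of resolving an ambiguous map symbol listed on the slide."""
    EDITORIAL = "editorial decision"  # recorded as "X, not Y"
    AMBIGUOUS = "marked ambiguous"    # recorded as "X or Y"
    EXPERT = "expert input"           # recorded as "X (says expert)"


@dataclass(frozen=True)
class MapEntry:
    """One (map, location) cell of a digitized atlas; all field names are hypothetical."""
    map_id: int
    location: str
    readings: Tuple[str, ...]     # candidate readings of the symbol on the printed map
    resolution: Resolution
    chosen: Optional[str] = None  # the reading kept, for EDITORIAL and EXPERT entries
    source: str = "expert"        # who supplied an EXPERT reading

    def __post_init__(self) -> None:
        # An ambiguous entry must keep all readings; the other two must pick one.
        if self.resolution is Resolution.AMBIGUOUS:
            if self.chosen is not None:
                raise ValueError("an AMBIGUOUS entry keeps every reading")
        elif self.chosen is None:
            raise ValueError("EDITORIAL and EXPERT entries must record a chosen reading")

    def label(self) -> str:
        """Render the entry in the notation used on the slide."""
        if self.resolution is Resolution.AMBIGUOUS:
            return " or ".join(self.readings)
        if self.resolution is Resolution.EXPERT:
            return f"{self.chosen} (says {self.source})"
        rejected = [r for r in self.readings if r != self.chosen]
        return f"{self.chosen}, not {' or '.join(rejected)}" if rejected else self.chosen


# Usage with the placeholder readings "X" and "Y":
print(MapEntry(1, "loc-17", ("X", "Y"), Resolution.EDITORIAL, chosen="X").label())  # X, not Y
print(MapEntry(1, "loc-18", ("X", "Y"), Resolution.AMBIGUOUS).label())              # X or Y
print(MapEntry(1, "loc-19", ("X", "Y"), Resolution.EXPERT, chosen="X").label())     # X (says expert)
```

Keeping the resolution type next to the value means an "X or Y" cell can later be excluded from, or weighted differently in, a quantitative analysis.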
17. Lesson
When data is transformed from one medium to another, even well-structured data will present unexpected pitfalls:
Design the data transformation carefully
Prototype your system; find the problems early
Plan to work iteratively
18. Romanian Online Dialect Atlas: Crisana
Apply innovative contemporary methods in dialect geography to an online set of Romanian dialect data.
19. Romanian language
Key to understanding the evolution of all Romance languages
Early branch, distinct from the French-Spanish-Italian line
Exemplar of non-hierarchical dialect variation and linguistic continua
Transition areas contain mixtures of features from different dialects, along with features specific to the area
20. RODA: Part 1
Create an online version of The New Romanian Linguistic Atlas: Crisana (Stan & Uritescu 1996)
Available on the Internet and on CD
Default interpretations
Interactive interface to data
custom selection of data for a map
Add audio clips to illustrate data
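Custom selection of data for a map comes down to filtering the atlas records by the features (and, optionally, the locations) a user picks, then grouping what remains by location so that each map point can be drawn and any attached audio played. The sketch below shows this under assumptions: the Record layout, the audio path and the feature names are hypothetical stand-ins, not RODA's data format.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    """One observation; the layout is a hypothetical stand-in for the atlas data."""
    location: str
    feature: str
    variant: str
    audio: str = ""  # hypothetical path to an audio clip; empty if none


def select_for_map(records: Iterable[Record], features: AbstractSet[str],
                   locations: AbstractSet[str] = frozenset()) -> Dict[str, List[Record]]:
    """Keep only the chosen features (and, optionally, locations), grouped by location.

    An empty `locations` set means "every location".
    """
    by_location: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        if r.feature in features and (not locations or r.location in locations):
            by_location[r.location].append(r)
    return dict(by_location)


data = [
    Record("loc-1", "church", "church"),
    Record("loc-2", "church", "kirk", audio="clips/loc-2-church.ogg"),
    Record("loc-2", "feature_2", "x"),
]
church_map = select_for_map(data, {"church"})
for place, recs in sorted(church_map.items()):
    print(place, [r.variant for r in recs], [r.audio for r in recs if r.audio])
```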
21. RODA Prototype 1
22. RODA: Part 2
Allow plug-in applications and other analyses of the data, e.g.:
Apply Multidimensional Scaling to dialect data
Statistical technique
Consolidate large amounts of data
Complement to traditional analyses of small amounts of data
23. Multidimensional Scaling
24. Multidimensional Scaling
Statistical technique (Torgerson 1952)
Used in sociology, psychology, marketing
Reveals the scales along which data varies; gives a data-space
Uses distances [(dis)similarities] among responses of subjects
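The slides do not say which dissimilarity measure is used between dialect locations. As an illustration only, the sketch below compares two locations feature by feature. It assumes a location may record several variants for one feature, as in the Finnish data (typically 1-3 per location). Each feature contributes the Jaccard distance between the two variant sets, and the results are averaged over the features both locations report. The measure, the data layout and the location and feature names are all assumptions.

```python
import math
from typing import Dict, List, Set, Tuple

Responses = Dict[str, Dict[str, Set[str]]]  # location -> feature -> recorded variants


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|: 0 for identical variant sets, 1 for disjoint ones."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def dissimilarity_matrix(responses: Responses) -> Tuple[List[str], List[List[float]]]:
    """Average per-feature distance over the features both locations report."""
    names = sorted(responses)
    matrix = []
    for p in names:
        row = []
        for q in names:
            shared = responses[p].keys() & responses[q].keys()
            if not shared:
                row.append(math.nan)  # no common features: leave undefined
                continue
            total = sum(jaccard_distance(responses[p][f], responses[q][f]) for f in shared)
            row.append(total / len(shared))
        matrix.append(row)
    return names, matrix


# Hypothetical locations A, B, C; "church" has the variants church/kirk.
responses: Responses = {
    "A": {"church": {"church"}, "feature_2": {"x"}},
    "B": {"church": {"kirk"}, "feature_2": {"x"}},
    "C": {"church": {"church", "kirk"}, "feature_2": {"y"}},
}
names, d = dissimilarity_matrix(responses)
for name, row in zip(names, d):
    print(name, row)
```

A location that records both "church" and "kirk" ends up halfway between the two pure locations on that feature, which is one simple way to keep mixed responses instead of forcing a single value.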
25. MDS
Axioms of a metric
d(X,X) = 0
d(X,Y) = d(Y,X)
d(X,Y) > 0 if X ≠ Y
d(X,Y) ≤ d(X,C) + d(C,Y) for all points C
The distance matrix must satisfy these rules
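Before a distance matrix is passed to MDS, it can be checked mechanically against the four axioms. The helper below is a hypothetical name, not part of any MDS package. It reports each violation it finds. The tolerance `tol` is an assumption meant to absorb floating-point noise.

```python
from typing import List, Sequence


def metric_violations(d: Sequence[Sequence[float]], tol: float = 1e-9) -> List[str]:
    """List every place where the square matrix d breaks one of the four axioms."""
    n = len(d)
    problems = []
    for x in range(n):
        if abs(d[x][x]) > tol:                           # d(X,X) = 0
            problems.append(f"d({x},{x}) is not 0")
        for y in range(n):
            if x == y:
                continue
            if abs(d[x][y] - d[y][x]) > tol:             # d(X,Y) = d(Y,X)
                problems.append(f"d({x},{y}) != d({y},{x})")
            if d[x][y] <= tol:                           # d(X,Y) > 0 if X != Y
                problems.append(f"d({x},{y}) is not positive")
            for c in range(n):                           # triangle inequality via C
                if d[x][y] > d[x][c] + d[c][y] + tol:
                    problems.append(f"d({x},{y}) > d({x},{c}) + d({c},{y})")
    return problems


good = [[0, 2, 3], [2, 0, 4], [3, 4, 0]]
bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]  # 0 -> 2 directly costs more than 0 -> 1 -> 2
print(metric_violations(good))  # []
print(metric_violations(bad))
```

For dialect data, the strict-positivity check is the one most likely to fail, because two locations with identical responses on every feature are at distance 0.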
26. MDS
n+1 points generate (at most) an n-dimensional space
MDS can reduce that high-dimensional space to 2 (or 3) dimensions
Result: complex data can be viewed as a “map”
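The reduction to 2 dimensions can be illustrated with classical (Torgerson-style) scaling. The steps are: square the distances, double-centre them into a matrix B, and use the top two eigenvectors of B, scaled by the square roots of their eigenvalues, as map coordinates. This is a minimal sketch, not the software used in the project. To stay within the standard library, it finds eigenvectors by power iteration with deflation instead of calling an eigen-solver. That choice assumes the largest eigenvalues of B are positive, which holds for Euclidean distances and is only approximately true for real dialect distances.

```python
import math
import random
from typing import List, Sequence


def classical_mds(dist: Sequence[Sequence[float]], k: int = 2,
                  iters: int = 500, seed: int = 0) -> List[List[float]]:
    """Classical (Torgerson) scaling of an n x n distance matrix into k dimensions.

    Eigenvectors come from power iteration with deflation, which assumes the k
    largest-magnitude eigenvalues of B are positive (true for Euclidean distances).
    """
    n = len(dist)
    sq = [[x * x for x in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals the column mean: dist is symmetric
    grand_mean = sum(row_mean) / n
    # Double centring, B = -1/2 * J D^2 J, written element by element.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):  # power iteration: v <- Bv / |Bv|
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient = eigenvalue
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflation: subtract this axis so the next pass finds the next one.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Five points on a plane: MDS sees only their pairwise distances.
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0), (1.5, 0.5)]
dist = [[math.dist(p, q) for q in points] for p in points]
xy = classical_mds(dist)
for p, (x, y) in zip(points, xy):
    print(p, "->", round(x, 3), round(y, 3))
```

The recovered map is centred on the origin and may be rotated or mirrored, but its pairwise distances match the input. This is why the MDS map of the English locations can be compared with geography even though no geographic information was used.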
27. MDS
MDS can be used to consolidate data
English: 312 dimensions (from 313 locations) reduced to 2
All 169 features included (and also analyzed in relevant subsets)
The Finnish and Romanian atlases provide large data sets to which the same approach can be applied
28. Interactive, media-rich presentation
Objectives
Make data accessible, useful to a wide research audience
Methods
Interactive selection of data
Constructive presentation of data
Addition of audio and other media
Online is much more than a book!
29. Framework and applications
The online atlas provides a framework for accessing and presenting data
Other applications can work within the framework to transform or process the data, such as:
MDS data consolidation
Tools to analyze dialect variants of phonemes (proposed)
Others
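One way to let other applications "work within the framework" is a small plug-in contract. The framework owns the atlas data, and each plug-in exposes a name and a run method that receives that data. The sketch below shows this under assumptions. AtlasFramework, Plugin, VariantCounter and the data layout are hypothetical names, not RODA's actual interface. An MDS consolidation plug-in would be another class with the same run method.

```python
from typing import Dict, Protocol

AtlasData = Dict[str, Dict[str, str]]  # location -> feature -> variant (hypothetical layout)


class Plugin(Protocol):
    """Anything with a name and a run() method can plug into the framework."""
    name: str

    def run(self, data: AtlasData) -> object: ...


class AtlasFramework:
    """Owns the atlas data and hands it to registered plug-ins on request."""

    def __init__(self, data: AtlasData) -> None:
        self.data = data
        self._plugins: Dict[str, Plugin] = {}

    def register(self, plugin: Plugin) -> None:
        if plugin.name in self._plugins:
            raise ValueError(f"plug-in {plugin.name!r} is already registered")
        self._plugins[plugin.name] = plugin

    def run(self, name: str) -> object:
        return self._plugins[name].run(self.data)


class VariantCounter:
    """Toy plug-in: how many distinct variants each feature has across the atlas."""
    name = "variant-count"

    def run(self, data: AtlasData) -> Dict[str, int]:
        seen: Dict[str, set] = {}
        for features in data.values():
            for feature, variant in features.items():
                seen.setdefault(feature, set()).add(variant)
        return {feature: len(variants) for feature, variants in seen.items()}


atlas = AtlasFramework({
    "loc-1": {"church": "church"},
    "loc-2": {"church": "kirk"},
    "loc-3": {"church": "church"},
})
atlas.register(VariantCounter())
print(atlas.run("variant-count"))  # {'church': 2}
```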
30. Summary
The humanities and social sciences deal with large, complex data sets
Explore methods to access, process and present this kind of data
Solutions include:
MDS type processing
Online, interactive, rich presentation
Example: Romanian Online Dialect Atlas
31. References
Embleton, Sheila M. and Eric S. Wheeler (2000). Computerized Dialect Atlas of Finnish: Dealing with Ambiguity. Journal of Quantitative Linguistics 7(3): 227-231.
Embleton, Sheila M. and Eric S. Wheeler (1997a). Multidimensional Scaling and the SED Data. In Wolfgang Viereck and Heinrich Ramisch (eds.), The Computer Developed Linguistic Atlas of England 2. Tübingen: Max Niemeyer Verlag.
Embleton, Sheila M. and Eric S. Wheeler (1997b). Finnish Dialect Atlas for Quantitative Studies. Journal of Quantitative Linguistics 4(1-3): 99-102.
Schiffman, Susan S., M. Lance Reynolds and Forrest W. Young (1981). Introduction to Multidimensional Scaling: Theory, Methods, and Applications. New York: Academic Press. 411 pp.
Torgerson, W. S. (1952). Multidimensional Scaling: I. Theory and Method. Psychometrika 17: 401-419.
Stan, Ionel and Dorin Uritescu (1996). Noul Atlas lingvistic român. Crisana [The New Romanian Linguistic Atlas: Crisana]. Vol. I. Bucharest: Romanian Academy Press. (Vol. II, 2003. Bucharest: Romanian Academy Press.)
Uritescu, Dorin (1983). “Asupra repartitiei dialectale a graiurilor dacoromâne. Graiul din Oas” [On the Dialect Structure of Daco-Romanian: The Dialect of Oas]. In Materiale si cercetari dialectale [Dialectal Materials and Research] II. Cluj-Napoca: University of Cluj-Napoca, pp. 231-246.
Uritescu, Dorin (1984a). “Subdialectul crisean” [The Crisana Subdialect]. In V. Rusu (ed.), Tratat de dialectologie româneasca [Treatise on Romanian Dialectology]. Craiova: Scrisul românesc, pp. 284-320, 916-930.
Uritescu, Dorin (1984b). “Graiul din Tara Oasului” [The Dialect of Tara Oasului]. In V. Rusu (ed.), Tratat de dialectologie româneasca [Treatise on Romanian Dialectology]. Craiova: Scrisul românesc, pp. 390-399, 964-967.
Wheeler, Eric S. (2002). Zipf's Law and Why It Works Everywhere. Glottometrics 4: 45-48.
Wheeler, Eric S. (2003). Multidimensional Scaling to Visualize Text Separation. Glottometrics 6 (forthcoming).
Wheeler, Eric S. (n.d.). Multidimensional Scaling. Chapter in Reinhard Köhler (ed.), Handbook in Quantitative Linguistics (forthcoming).