LANGUAGE DOCUMENTATION and DESCRIPTION

Bahar Kocaman, Sezer Yurt, Gülden Berber

By documenting languages we engage in amassing data for preservation.
• This will allow future generations to access data for languages even after they are gone.
• Describing languages provides information about them, and has the extended effect that the mere existence of the descriptions may be empowering for endangered languages.

Fieldwork

Fieldwork is the procedure of acquiring linguistic data from language consultants, preferably in environments familiar to them, such as their homes or workplaces.
• Prototypical fieldwork (Hyman, 2001): a linguist spending an extended amount of time with a community in an exotic place, documenting and recording a little-known language of a community with the help of local informants.

Fieldwork ‘describes the activity of a researcher systematically analysing parts of a language, usually other than one’s native language and usually within a community of speakers of that language’ (Sakel & Everett, 2012:5).

When planning for fieldwork, the linguist will have to consider
• what they wish to investigate,
• how they will be accepted into the community,
• how they will work with the consultants,
• how they will go about gathering data,
• how to make that data as reliable and comprehensive as possible, and so on.

Language Documentation

Language documentation is the collection of raw data that may then be used for further analysis.
• The merit of language documentation is that it gives linguists raw data to work with and preserves the cultural heritage of the community.

While language documentation may not slow the rate of language death, it will preserve records of the languages for future generations.
• An example of a large-scale language documentation program is The Rosetta Project, where information for over 1500 languages is currently stored, much of it publicly available.

Language Description

Language description seeks to illustrate the essentials of a language, based on available material. It provides analyses of a variety of areas of the language, such as its phonological, morphological, grammatical and syntactic systems, as well as, ideally, presenting a lexicon of the language.
• Ideally the description is general enough to be comparable with other descriptions, but specific enough to capture the uniqueness of the language (Lehmann, 1999:6).

The combined effect of descriptions and documentation may lead to a higher recognition, even on a political level, of the language. Reference materials, such as educational material, produced in combination with descriptive materials, might lead to a higher awareness, and might even lead to the language becoming recognized enough to be taught in schools.

Sampling

Typological surveys are dependent on data from different languages, often from a large number of languages.
• To include all human languages in an investigation is simply not possible, because we have access to only a limited portion of the languages of the world.

Statements about cross-linguistic patterns, tendencies and universals are always based on a sample of languages.

Types of Samples

1. Probability Sample: In order to check for statistical tendencies and correlations of various features, we use a probability sample.
• In this sample, we must set variables beforehand and map the sample according to the presence or absence of those variables.

For example, we can check patterns for reduplication by choosing a set of variables, such as
• the language doesn’t have reduplication
• the language has partial reduplication only
• the language has full reduplication only
• the language has both partial and full reduplication.
We then proceed to code each language of the sample according to those variables, choosing only one variable per language.
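The coding step for the reduplication example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the language names and their codings are placeholders invented here, not data from any survey, and `tally` is a hypothetical helper name.

```python
from collections import Counter

# The four mutually exclusive values from the reduplication example.
VALUES = ("none", "partial_only", "full_only", "partial_and_full")

# Hypothetical sample: names and codings are placeholders, not real data.
# A dict enforces the rule of exactly one value per language.
sample = {
    "Language A": "none",
    "Language B": "partial_only",
    "Language C": "partial_and_full",
    "Language D": "full_only",
    "Language E": "partial_and_full",
}

def tally(coded_sample):
    """Count how many languages carry each value, rejecting unknown codes."""
    for language, value in coded_sample.items():
        if value not in VALUES:
            raise ValueError(f"{language}: unknown value {value!r}")
    counts = Counter(coded_sample.values())
    return {value: counts.get(value, 0) for value in VALUES}

counts = tally(sample)
# Proportion of the sample that carries each value.
shares = {value: n / len(sample) for value, n in counts.items()}
```

Because every language receives exactly one value, the counts add up to the size of the sample, and the resulting shares can be compared across samples.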

2. Variety Sample: This is ‘mainly used for explorative research: when little is known about the form or construction under investigation it is important that the sample offers a maximum degree of [variety of] the linguistic parameters [i.e. variables] involved’ (Rijkhoff & Bakker 1998:265).

3. Convenience Sample: This is a sample based on what kind of data one has access to.

Types of Bias

1. Bibliographical Bias: Small or remotely located languages, very often isolates or languages of unknown affiliation, tend to be excluded from samples, since a language can only be included if a description of it is available.

For instance, until Derbyshire (1977) published his description of Hixkaryana word order, object-initial languages were not to be found in any surveys on types of word order.

2. Genetic (Genealogical) Bias: Some language families are overrepresented while others are underrepresented in the sample.
• Many features of a language are inherited. If a sample is biased towards one family over others, a feature might look more or less common than it actually is, simply because of how it appears in the dominating family.

For example, tone isn’t a common feature in Indo-European languages, but it is quite common in Niger-Congo languages. If a sample has a higher proportion of Indo-European languages than of other families, the pattern that is likely to emerge is that tone seems less common cross-linguistically than it actually is.
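The effect of genealogical bias on the tone example can be shown with a small calculation. The following Python sketch uses an invented sample of ten languages, coded purely for illustration. The family-balanced average is one possible way of reducing the skew, assumed here for the sake of the example, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample of (family, has_tone) pairs, invented for illustration:
# eight Indo-European languages without tone, two Niger-Congo languages with it.
biased_sample = (
    [("Indo-European", False)] * 8
    + [("Niger-Congo", True)] * 2
)

def raw_share(sample):
    """Share of languages with tone, counting every language equally."""
    return sum(has_tone for _, has_tone in sample) / len(sample)

def family_balanced_share(sample):
    """Share of tone when each family contributes equally to the result."""
    by_family = defaultdict(list)
    for family, has_tone in sample:
        by_family[family].append(has_tone)
    # Average within each family first, then across families.
    per_family = [sum(flags) / len(flags) for flags in by_family.values()]
    return sum(per_family) / len(per_family)

raw = raw_share(biased_sample)
balanced = family_balanced_share(biased_sample)
```

In this toy sample the raw share of tone languages is lower than the family-balanced share, simply because one family supplies most of the languages.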

3. Areal Bias: Languages from the same linguistic area are overrepresented, which may skew the resulting pattern one way or another.
• Linguistic areas are areas where languages have been in sustained contact and have influenced each other so that they have specific features not found in the languages outside the area.

For example, the languages of the Balkan area, which belong to different genera of Indo-European, have postposed articles as opposed to the neighbouring languages outside the linguistic area, and as opposed to other languages of the same genera.

4. Typological Bias: One linguistic type is over- or underrepresented in a sample.
• For example, if we want to check if there is any correlation between adposition and verb-object word order, we need to include languages of all types, such as those with prepositions, those with postpositions…
• If we have an overrepresentation of languages with, for example, prepositions, we are likely to get a skewed pattern.

5. Cultural Bias: We have an over- or underrepresentation of the different cultures of the world in the sample.
• There is “a relation between certain aspects of the grammar of a language on the one hand and beliefs and practices of its speakers on the other hand” (Bakker 2010:108).
• For example, in a study on number marking, Lucy (1992) found that speakers of American English and speakers of Yucatec (Mayan (Mayan): Mexico) treat nouns differently:

The English speakers:
• make a sharp distinction between mass and count nouns,
• have obligatory number marking for count nouns.
The Yucatec speakers:
• treat most nouns as mass nouns,
• have optional number marking but an obligatory numeral classifier system.

When asked to sort pictures of objects,
• the English speakers tended to sort objects by shape,
• the Yucatec speakers tended to sort objects by material composition.

Whether the number marking system derives from the cultural outlook (i.e. how one views and categorizes objects) or the cultural interpretation of objects derives from the linguistic structure is probably impossible to establish.

“if languages are closely related genetically, they are likely to have inherited common linguistic types from their ancestor language, to be spoken in the same area and by people sharing the same culture” (Cristofaro 2005:91).

Finally, statistics may seem far removed from typology, but it is actually essential, since what we are dealing with is sets of data: samples aimed at representing the whole, and conclusions drawn from these sampled data.

Databases

The databases that have recently become popular are beneficial for both compilers and the linguistic community.

Some advantages of these databases are:
• They are beneficial for research, making large amounts of data accessible.
• They allow compilers to be recognized for their painstaking work.
• These databases can be continuously updated.

However, these databases may differ radically from each other both in the selection of languages and in the approach to the entries.

For example:
• There are databases with a vast number of languages but where the data provided for each language is restricted.
• There are databases providing very elaborate information for each language, but the number of languages is smaller.
• There are databases which look only at one specific language domain, while other databases code a host of features and information about the language.

The following are three different kinds of databases:
• World Atlas of Language Structures (WALS).
• Atlas of Pidgin and Creole Language Structures (APiCS).
• Automated Similarity Judgment Program (ASJP).

1. World Atlas of Language Structures (WALS)
• It is a milestone in terms of large-scale databases.

Some positive features of WALS
• It compiles a number of databases into one single unit covering a great part of abstract linguistic systems, including phonology, morphology, syntax, grammar, and lexical features.
• It also provides the first worldwide collected mapping of language systems.
• Another aspect of WALS is that it includes two chapters about sign languages.
• Each linguistic feature is dealt with separately in WALS.
• The Atlas provides metadata for each language, specifically including the location of the language and its genealogical classification.

Two negative features of WALS
• Because authors were responsible for individual features, their chapters may contain a large number of languages, though these may not necessarily overlap with the languages in other chapters.
• WALS completely ignored pidgin, creole and mixed languages.

2. Atlas of Pidgin and Creole Language Structures (APiCS)
• In contrast to WALS, it is the first large-scale typological project for pidgin and creole languages.

Some positive aspects of APiCS
• It basically attracts the attention of experts on pidgin, creole, and mixed languages.
• Because the features are predefined and authors are responsible for specific languages, the cross-compatibility between languages is absolute.
• The kind of information that can be found for one language can also be found for every language in the database.
• The instructions for the authors were to fill out a detailed questionnaire of features for the language of their expertise.
• Each language is also described in a survey chapter containing a summary of the sociohistorical background and a broad structural outline of the language.

Two negative aspects of APiCS
• APiCS includes only pidgin, creole, and mixed languages, that is, selected languages that may or may not be of a specific typological sort. So a complete cross-comparison between APiCS and WALS is not possible.
• The sample of APiCS is biased towards English-lexified contact languages.

3. Automated Similarity Judgment Program (ASJP)
• It aims to provide an objective classification of the world’s languages by means of lexicostatistical analysis.

Lexicostatistics is a technique used to compare the rates of change within a set of words in different languages in order to try to establish how far they are related and, if they are, when they separated from each other.

Some positive sides of ASJP
• It computerizes the comparison between sets of words using a fixed algorithm.
• The task for each contributor is to enter a set of 40 lexical items for as many languages as possible.
• Some macro data is included for each language, such as genealogical affiliation, location and number of speakers.
• Since the dataset is small, it is possible for contributors to submit a large number of languages.
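To make the idea of comparing word lists with a fixed algorithm concrete, here is a minimal Python sketch. It rests on several assumptions: the slides do not say which measure is used, so a normalized edit (Levenshtein) distance is assumed; the lists hold three items instead of 40; and the words are given in ordinary spelling rather than in a phonetic transcription. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word (0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

def list_distance(words_a, words_b):
    """Mean normalized distance over the meanings present in both lists."""
    shared = words_a.keys() & words_b.keys()
    if not shared:
        raise ValueError("the two word lists share no meanings")
    total = sum(normalized_distance(words_a[m], words_b[m]) for m in shared)
    return total / len(shared)

# Toy lists in ordinary spelling; a real list would have 40 transcribed items.
english = {"hand": "hand", "water": "water", "night": "night"}
german = {"hand": "hand", "water": "wasser", "night": "nacht"}
spanish = {"hand": "mano", "water": "agua", "night": "noche"}

english_german = list_distance(english, german)
english_spanish = list_distance(english, spanish)
```

On these toy lists the English and German words come out closer to each other than the English and Spanish words. With only three items in ordinary spelling this shows how the comparison works and says nothing reliable about relatedness.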
