SEL3053: Analyzing Geordie Lecture 15. Hypothesis formulation

Lecture 14 applied three different varieties of hierarchical cluster analysis to the DECTE data matrix M. In this lecture we will

1. Formulate a hypothesis in answer to the research question

2. Say why the approach to sociolinguistic hypothesis formulation proposed in this module is intrinsically superior to traditional methods in Arts and Humanities research generally and in corpus-based linguistics more specifically.

1. Hypothesis formulation

1.1 Phonetic usage analysis

The results from lecture 14 are repeated in figure 1 for convenience of reference.

All three trees agree on the following:

(tlsn01, tlsn02) form a cluster that is strongly distinguished from the cluster of all the other speakers.

(tlsg01, tlsg40, tlsg03), (tlsg05, tlsg09, tlsg08, tlsg10) and (tlsg02, tlsg13, tlsg24) form moderately distinctive subclusters.

They disagree on where to place the following speakers: tlsg08, tlsg10, and tlsg13.

It can, therefore, be concluded that there is broad agreement among all three analyses on the structure of differences in phonetic usage among the twelve speakers, but that there are some differences which need to be investigated.
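The idea of checking what several cluster analyses agree on can be sketched in code. The example below is only an illustration: the item names and the three partitions are made-up toy values, not the DECTE trees, and the function names co_clustered_pairs and consensus_pairs are hypothetical. It assumes each tree has already been cut into a flat partition that maps every item to a cluster label.

```python
from itertools import combinations


def co_clustered_pairs(partition):
    """Return the set of unordered item pairs that share a cluster label."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(partition), 2)
        if partition[a] == partition[b]
    }


def consensus_pairs(partitions):
    """Return the pairs that are clustered together in every partition."""
    pair_sets = [co_clustered_pairs(p) for p in partitions]
    return set.intersection(*pair_sets)


# Toy partitions (made-up data): the cluster labels themselves need not match
# across analyses, only the groupings they induce.
p1 = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2}
p2 = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 1}
p3 = {"a": 7, "b": 7, "c": 8, "d": 8, "e": 9}

agreed = consensus_pairs([p1, p2, p3])

# Items that appear in no agreed pair are the ones the analyses place differently.
disputed = set(p1) - {item for pair in agreed for item in pair}

print(sorted(sorted(pair) for pair in agreed))
print(sorted(disputed))
```

Comparing pairs rather than labels is a deliberate choice here: two analyses can number their clusters differently and still describe the same structure.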

1.2 Correlation with social data

The social data relating to the DECTE / TLS speakers is found in the <profileDesc> section of the header in each interview, as described in an earlier lecture.

The social data for our twelve speakers is shown opposite.

The most obvious correlation is between the place of residence of the speakers and the cluster analyses: tlsn01 and tlsn02 are from Newcastle and all the others are from Gateshead.

In the sample selected, therefore, Newcastle and Gateshead speakers are very strongly distinguished in terms of their phonetic usage.

No social data apart from place of residence survives for the Newcastle speakers, so the search for further correlations between phonetic usage and the social data focuses on the Gateshead speakers.

One of the trees we looked at earlier is shown below, with social information added.

Some observations:

Gender has the most obvious correlation with cluster structure: all the trees agree in clustering the three male speakers (tlsg02, tlsg24, tlsg13) against the remaining seven female speakers.

Age shows no obvious correlation among the males (given the small number of them this is unsurprising), but there is a correlation in all the trees for the females: the older ones (tlsg01, tlsg40, tlsg03) cluster against the younger ones (tlsg10, tlsg08, tlsg05, tlsg09).

Education shows a moderate correlation: most of the speakers have minimal education, but the two female speakers with day-release level education (tlsg05, tlsg09) cluster together in all three trees.
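The observations above amount to cross-tabulating cluster membership against social variables, which can be sketched as follows. This is a minimal illustration, not the module's own procedure. The speaker groupings, genders and age bands are taken from the prose of this lecture. The cluster labels "A", "B" and "C", the helper names cross_tabulate and purity, and the reading of the second subcluster as the four younger female speakers are assumptions made for the example.

```python
from collections import Counter

# Gateshead subclusters as described in the lecture text.
# Assumption: subcluster "B" is the four younger female speakers.
clusters = {
    "A": ["tlsg01", "tlsg40", "tlsg03"],
    "B": ["tlsg05", "tlsg09", "tlsg08", "tlsg10"],
    "C": ["tlsg02", "tlsg13", "tlsg24"],
}

# Gender as stated in the observations (three male, seven female speakers).
gender = {
    "tlsg01": "F", "tlsg40": "F", "tlsg03": "F",
    "tlsg05": "F", "tlsg09": "F", "tlsg08": "F", "tlsg10": "F",
    "tlsg02": "M", "tlsg13": "M", "tlsg24": "M",
}

# Age band is given in the text only for the female speakers.
age_band = {
    "tlsg01": "older", "tlsg40": "older", "tlsg03": "older",
    "tlsg05": "younger", "tlsg09": "younger",
    "tlsg08": "younger", "tlsg10": "younger",
}


def cross_tabulate(clusters, attribute):
    """Count attribute values per cluster, skipping speakers with no value."""
    return {
        name: Counter(attribute[s] for s in members if s in attribute)
        for name, members in clusters.items()
    }


def purity(table):
    """Share of counted speakers who carry the majority value of their cluster."""
    total = sum(sum(counts.values()) for counts in table.values())
    majority = sum(max(counts.values()) for counts in table.values() if counts)
    return majority / total


gender_table = cross_tabulate(clusters, gender)
age_table = cross_tabulate(clusters, age_band)

print(gender_table, purity(gender_table))
print(age_table, purity(age_table))
```

Because the groupings here are copied from the prose, the tables simply restate the observations in tabular form. Run on real cluster output, the same functions would show how clean the correspondence actually is.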

1.3 Hypothesis

We are now in a position to answer the research question:

Is there systematic phonetic variation in the Tyneside speech community as represented by DECTE, and, if so, does that variation correlate systematically with social variables?

The hypothesis is:

There is systematic phonetic variation in the Tyneside speech community as represented by DECTE, and that variation correlates with social variables:

Newcastle speakers differ strongly from Gateshead ones in their phonetic usage.

Among Gateshead speakers, the main correlation is between gender and phonetic usage.

Among female Gateshead speakers the main correlation is with age, though there are moderate correlations with educational level.

2. Why the approach to hypothesis generation proposed in this module is intrinsically superior to traditional methods

In Arts and Humanities disciplines, hypothesis generation and testing have traditionally been based on the familiarity of the individual researcher with the domain of interest. For example:

In historical studies the historian spent many years getting to know the documentation relevant to his period and area of interest.

In literary studies the scholar did the same for literary materials.

In linguistics the linguist did the same for historical and contemporary samples of linguistic usage.

There is a fundamental problem with this traditional approach: it lacks the key scientific attributes of objectivity and replicability.

Objectivity:

It is a commonplace of philosophy in general and of philosophy of science in particular that there can be no truly objective observation of the world, and the present module is fully committed to that position.

There are, however, degrees of objectivity, and science tries to maximize objectivity by using methods which are general, that is, applicable across a range of domains and therefore not amenable to any individual researcher's presuppositions about his or her data.

Replicability: Research results in science must be amenable to peer review, which means that scientists other than those who produced the results must be in a position to repeat the relevant experiments and get the same results. The methodology used in science allows for this.

The methodology used in this module has those attributes: the proposed data creation, representation, transformation, and clustering methods are both objective in the above sense, and allow the analysis presented in the foregoing lectures to be replicated.

Traditional methods, on the other hand, lack them: they are essentially subjective and non-replicable.

The conclusion is that the approach to hypothesis generation presented in this module is the way forward for corpus-based linguistics.
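The replicability point can be illustrated with a small sketch: a clustering procedure that is fully specified and deterministic returns the same result whenever it is run on the same data, by whoever runs it. The code below is a toy single-linkage agglomeration written for this illustration only. The function name single_linkage_merges and the five two-dimensional points are made up, and it is not the implementation used for the DECTE analyses.

```python
def single_linkage_merges(points):
    """Agglomerate points by single linkage and return the merge history.

    Each entry is (cluster_a, cluster_b, distance), where clusters are
    tuples of point indices. Ties are broken by tuple ordering, so the
    procedure is deterministic.
    """
    def euclidean(i, j):
        return sum((x - y) ** 2 for x, y in zip(points[i], points[j])) ** 0.5

    def linkage(a, b):
        # Single linkage: distance between the closest members of a and b.
        return min(euclidean(i, j) for i in a for j in b)

    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        candidates = [
            (linkage(a, b), a, b)
            for idx, a in enumerate(clusters)
            for b in clusters[idx + 1:]
        ]
        distance, a, b = min(candidates)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, round(distance, 6)))
    return merges


# Made-up toy data: five points in the plane.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]

first_run = single_linkage_merges(toy_points)
second_run = single_linkage_merges(toy_points)

# A second, independent run reproduces the first exactly.
print(first_run == second_run)
for merge in first_run:
    print(merge)
```

The design choice that matters is that nothing in the procedure depends on the analyst: the distance measure, the linkage rule and the tie-breaking are all stated explicitly, so another researcher given the same data matrix can repeat the analysis and compare results.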