So, why do languages have allophony? If our brains treat the different allophones of a phoneme as belonging to the same category, why have different allophones at all? In some cases, the properties of the different allophones are an almost automatic result of articulatory processes. In other cases, there is a more indirect connection between articulatory processes and allophony, and in yet other cases, there is no connection at all. Let us look at these different situations in more detail.
Motivations
A very clear example of allophones that are directly related to articulatory processes is that of the different allophones of /k/ that we asked you to think about in the preceding section. When followed by a back vowel (as in cool), /k/ is produced with the back of the tongue touching the velum, but when followed by a front vowel (as in keen), the point of contact is much further to the front, almost at the hard palate. This is an instance of a process called assimilation, where one or more features of a phone change in the direction of a phone occurring directly before or after it: in front vowels, the position of the tongue is further to the front of the oral cavity, in back vowels it is further to the back. When pronouncing words like keen and cool, the tongue already takes the position needed for the vowel when producing the preceding consonant, so the [k] is further to the front before front vowels and further to the back before back vowels.
Another example of assimilation is seen in the following data (again, from American English):
- [kʰæ̃n] can
- [kʰæt] cat
- [dɑɡ] dog
- [dɑ̃n] dawn
- [bid] bead
- [bĩm] beam
- [bãɪ̃nd] bind
- [baɪt] bite
- [lɪt] lit
- [lɪ̃m] limb
Some vowels are nasal (indicated by the tilde (˜) above the vowel character). Recall from Section 3.5 that this means they are produced with the velum lowered to allow airflow through the nasal cavity as well as the oral cavity. If you take a closer look at the distribution, you will notice that all the nasal vowels occur before nasal phones: [n] or [m]. They are nasal because speakers begin to lower the velum in anticipation of the nasal consonant while they are still producing the vowel.
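If you would like to check this distribution mechanically, the following is a minimal Python sketch rather than a general tool. It splits each of the ten transcriptions above into phones, finds every vowel, and compares whether the vowel carries a tilde with whether the next consonant is [n] or [m]. The function names, the small vowel inventory, and the decision to look past the second half of a diphthong are our own assumptions for this data set.

```python
import unicodedata

NASAL_MARK = "\u0303"            # combining tilde used for nasal vowels
VOWELS = set("æɑiaɪ")            # assumption: only the vowels in this data set
NASAL_CONSONANTS = set("nm")

WORDS = ["kʰæ̃n", "kʰæt", "dɑɡ", "dɑ̃n", "bid",
         "bĩm", "bãɪ̃nd", "baɪt", "lɪt", "lɪ̃m"]


def segments(transcription):
    """Split a transcription into (base character, diacritics) pairs."""
    result = []
    # NFD separates precomposed letters such as 'ã' into 'a' + tilde.
    for ch in unicodedata.normalize("NFD", transcription):
        if unicodedata.combining(ch) and result:
            base, marks = result[-1]
            result[-1] = (base, marks + ch)
        else:
            result.append((ch, ""))
    return result


def vowel_contexts(transcription):
    """Yield (vowel is nasal, next consonant is nasal) for every vowel."""
    segs = segments(transcription)
    for i, (base, marks) in enumerate(segs):
        if base not in VOWELS:
            continue
        # Skip over a following vowel so both halves of a diphthong
        # are compared with the consonant after the diphthong.
        following = next((b for b, _ in segs[i + 1:] if b not in VOWELS), None)
        yield (NASAL_MARK in marks, following in NASAL_CONSONANTS)


def pattern_holds(words):
    """True if vowels are nasal exactly when a nasal consonant follows."""
    return all(nasal == before_nasal
               for word in words
               for nasal, before_nasal in vowel_contexts(word))


print(pattern_holds(WORDS))
```

Note that the sketch only describes the data. It cannot tell us why the pattern exists, which is what the articulatory explanation above is for.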
Now, nasalization before a nasal consonant is a natural result of the way these phones are produced in sequence, so it is not surprising that we find nasal allophones of vowels before nasal consonants in many languages. However, “not surprising” does not mean that things have to be this way: speakers are able to control the lowering and raising of their velum during the articulation of a vowel very precisely, so while it is natural to lower it before a nasal, it is not necessary. Speakers can produce non-nasal vowels even before nasals, and nasal vowels even in contexts where no nasal is following, so it is possible for nasal and non-nasal vowels to function as different phonemes rather than allophones. This is the case, for example, in French, where we find minimal pairs like [pɛ] paix ‘peace’ vs. [pɛ̃] pain ‘bread’, [gʀɑ] gras ‘fat’ vs. [gʀɑ̃] grand ‘great’ and [bo] beau ‘beautiful’ vs. [bõ] bon ‘good’.
Another example of a natural, but not necessary process is the difference between aspirated and unaspirated plosives in English. We need more acoustic energy for stressed syllables and less for unstressed syllables, and acoustic energy is determined by the flow of air from the lungs. Put simply, the air pressure is higher when we produce a stressed syllable, and it is highest at the beginning of such a syllable. Thus, it is not surprising that some of this energy is released as a burst of air when we have a voiceless plosive ([p], [t] or [k]) at the beginning of a stressed syllable: we get aspiration. In contrast, when the plosive is preceded by an [s] (the only phone that can occur before voiceless plosives in the onset of English syllables), the articulation of the fricative [s] already releases quite a bit of the excess air pressure, so it is not surprising that there is no burst of air accompanying the release of the plosive [p], [t] or [k]. And if the voiceless plosive occurs in the coda, most of the air pressure has already been used up in articulating onset and nucleus, so, again, it is not surprising that there is no burst of air. (Note that the mechanism described here is only one of the reasons for the distribution of aspirated and non-aspirated plosives in English. The second reason has to do with the time it takes the vocal folds to start vibrating in order to produce the syllable nucleus, but let us keep things simple.)
But again, saying that the distribution of aspirated and non-aspirated plosives “is not surprising” does not mean that things have to be the way they are in English. There are languages that do not have aspirated plosives at all or that do not have aspiration for all voiceless plosives, so obviously it is possible for our brain and articulatory organs to control the airflow in such a way that no burst of air is released.
One such language is Austrian German — if you are a speaker of another variety of German, the Austrian pronunciation of words like Paar ‘couple’, or Teich ‘pond’ will sound like Bar ‘bar’ and Deich ‘dyke’ to you. If you are not a native German speaker, you can hear that property of Austrian German by listening to the 1984 European hit Live is Life by the Austrian band Opus. It contains the line When we all give the power, we all give the best which, due to the Austrian accent of the singer, sounds a bit like When we all give the bower….
Conversely, there are languages that use aspiration to create phonemic contrasts, so it is obviously possible for our brain to intentionally release this burst of air in some places but not others. One such language is Hindi, which we already showed in Section 4.2 using the words [bʰɑːluː] ‘bear’ and [bɑːluː] ‘sand’.
But both of these processes — suppressing aspiration altogether or controlling it to create phonemic contrasts — take additional energy. In this case, the distribution in English is the one that is most natural in terms of articulation, so we are less surprised when we find languages with a distribution like that in English than when we find languages with a distribution like that in Austrian German or Hindi.
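The English distribution of aspiration described above can be written down as a small rule. The Python sketch below is only an illustration under simplifying assumptions: the `Syllable` class and the `realize` function are hypothetical names of our own, and the rule covers just the three environments discussed in this section (start of a stressed syllable, after [s], and in the coda).

```python
from dataclasses import dataclass

VOICELESS_PLOSIVES = {"p", "t", "k"}


@dataclass
class Syllable:
    onset: list            # consonants before the vowel, e.g. ["s", "p"]
    nucleus: str           # the vowel
    coda: list             # consonants after the vowel
    stressed: bool = True


def realize(syllable):
    """Return a phonetic transcription with aspiration marked by 'ʰ'."""
    onset = list(syllable.onset)
    # Aspirate only a voiceless plosive that is the very first phone of a
    # stressed syllable. After [s] the plosive is no longer first, and
    # plosives in the coda are never touched.
    if syllable.stressed and onset and onset[0] in VOICELESS_PLOSIVES:
        onset[0] += "ʰ"
    return "".join(onset) + syllable.nucleus + "".join(syllable.coda)


print(realize(Syllable(["p"], "ɪ", ["n"])))       # pin
print(realize(Syllable(["s", "p"], "ɪ", ["n"])))  # spin
print(realize(Syllable(["n"], "ɪ", ["p"])))       # nip
```

A language without aspiration, like Austrian German as described above, would simply lack this rule. A language like Hindi could not use such a rule at all, because there the aspiration is part of what distinguishes one word from another.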
As mentioned at the beginning of this section, not all cases of allophony have an obvious explanation in terms of the natural process of articulation. For example, some British dialects have the glottal stop [ʔ] as an allophone of [t] between vowels or between a vowel and a syllabic [n̩] or [l̩], as in [ˈbʌ.ʔə] butter, [ˈbʌ.ʔn̩] button or [ˈbɒ.ʔl̩] bottle. These phones are not at all alike in terms of their place of articulation and there is nothing about the articulation of vowels that would make it easier to articulate a [ʔ] as opposed to a [t].
This does not mean that the ultimate motivation for the allophony has nothing to do with articulation. The variant [ʔ] emerged through a process of language change: speakers first started co-articulating a glottal stop whenever they articulated the [t] (a process sometimes called glottal strengthening), and then they stopped articulating the [t] altogether, leaving behind just the glottal stop. As its name suggests, the function of glottal strengthening was (or is, in dialects that still have it) to strengthen the perceptibility of the phone [t], which has a tendency to weaken between vowels. Note that it has become a voiced alveolar flap in many dialects of English, especially in North America, giving us pronunciations like [ˈbʌ.ɾə] butter. But this is a historical process that explains how the allophony came about; it is not a natural process that would predict the distribution of allophones in present-day Englishes, as shown by the fact that most dialects of English do not have this allophony, nor does any other language.
Limits
There are cases of phones that are in complementary distribution — i.e., that cannot occur in the same phonetic environments — in a given language, but that we would not want to classify as allophones. Take [h] and [ŋ] in English: [h] can only occur at the beginning of a syllable, [ŋ] can only occur at the end of a syllable. Thus, they could never be used to create a minimal pair in English, and we could easily write a phonological rule predicting their distribution:
/h/ → [h] / $ _
/h/ → [ŋ] / _ $
According to the criteria of contrastive and complementary distribution, we could treat them as allophones. However, recall that allophones are phones that our brain classifies as identical. In doing so, it ignores small differences that are often the result of articulatory processes. But the difference between [h] and [ŋ] is not small: the two phones differ on every single dimension. One is voiceless, one is voiced; one is an oral fricative, one is a nasal; one is glottal, one is velar. If our brain were to ignore the differences between them, there would be nothing left! This is why most linguists posit an additional criterion for allophone status: phonetic similarity. In order to be clearly classified as allophones of a single phoneme, two phones must be a) in complementary distribution, and b) phonetically similar. If either of these conditions is not met, they are classified as distinct phonemes. The reason for this criterion is clear, but it is also clear that it will occasionally present us with difficult choices: how similar do two phones have to be in order to be classified as allophones? What about [t] and [ʔ], with their completely different places of articulation? Fortunately, such cases are relatively rare, but they demonstrate that linguistic reality is usually more complex than our models of it!
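To make the idea of phonetic similarity a little more concrete, here is a small Python sketch that compares phones on the three dimensions mentioned above: voicing, place and manner of articulation. The feature table is a deliberately simplified assumption of ours, and the sketch does not set a threshold for "similar enough", since that is exactly the difficult choice just mentioned.

```python
# Simplified articulatory descriptions (an assumption for illustration;
# a full feature system would use many more dimensions).
FEATURES = {
    "h": {"voicing": "voiceless", "place": "glottal", "manner": "fricative"},
    "ŋ": {"voicing": "voiced", "place": "velar", "manner": "nasal"},
    "t": {"voicing": "voiceless", "place": "alveolar", "manner": "plosive"},
    "ʔ": {"voicing": "voiceless", "place": "glottal", "manner": "plosive"},
}


def shared_dimensions(a, b):
    """Return the dimensions on which phones a and b have the same value."""
    return sorted(dim for dim in FEATURES[a]
                  if FEATURES[a][dim] == FEATURES[b][dim])


print(shared_dimensions("h", "ŋ"))   # nothing in common
print(shared_dimensions("t", "ʔ"))   # same manner and voicing, different place
```

On this simplified table, [h] and [ŋ] share no dimension at all, while [t] and [ʔ] share two of three, which is one way of seeing why the second pair is the harder case.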
There is also the opposite case of phones that are in contrastive distribution (i.e., that can occur in the same phonetic environments) in a given language, but that we would not want to classify as distinct phonemes, even though they can be phonetically quite different. Take the example of [t] and [ʔ] again: for many speakers, these two phones are in contrastive distribution in that they can both occur between vowels. Such speakers have, for example, both [ˈbʌ.ʔə] and [ˈbʌ.tə] in their vocabulary. The reason we do not want to classify [t] and [ʔ] as distinct phonemes for these speakers is that even though they occur in the same environments, they never form a linguistically significant contrast. Using one or the other never changes the meaning of a word, for example [ˈbʌ.ʔə] and [ˈbʌ.tə] both mean ‘butter’. Such phones are treated as a special type of allophone; they are said to be in free variation: it is impossible to write a phonological rule predicting their distribution. However, free variation does not mean that the phones occur randomly. Typically, they belong to different dialects (varieties spoken by speakers from different regions), sociolects (varieties spoken by speakers of different social strata) or registers (varieties spoken in different situations). Speakers who sometimes say [ˈbʌ.ʔə] and sometimes [ˈbʌ.tə] do so because they change between dialects, sociolects or registers. Thus, the variation is “free” only from the perspective of phonological structure; it is determined by factors outside of language itself.
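The difference between a phonemic contrast and free variation can also be sketched in Python. The function below swaps one phone for another in every word of a tiny lexicon and checks whether the result is an existing word and, if so, whether it means the same thing. The function name `relation`, the two toy lexicons and the three labels are our own assumptions; a real analysis would need far more data and would treat "no shared environment attested" as a hint to investigate further, not as proof of complementary distribution.

```python
def relation(a, b, lexicon):
    """Classify phones a and b using a lexicon of {phones: meaning}."""
    found_same_meaning = False
    for phones, meaning in lexicon.items():
        for i, phone in enumerate(phones):
            if phone != a:
                continue
            # Replace this one occurrence of a with b.
            swapped = phones[:i] + (b,) + phones[i + 1:]
            if swapped in lexicon:
                if lexicon[swapped] != meaning:
                    return "contrastive"        # a minimal pair exists
                found_same_meaning = True       # same word, two forms
    if found_same_meaning:
        return "free variation"
    return "no shared environment attested"


# Toy lexicons built from examples used in this chapter.
ENGLISH = {
    ("b", "ʌ", "t", "ə"): "butter",
    ("b", "ʌ", "ʔ", "ə"): "butter",
    ("h", "æ", "t"): "hat",
    ("s", "ɪ", "ŋ"): "sing",
}
HINDI = {
    ("bʰ", "ɑː", "l", "uː"): "bear",
    ("b", "ɑː", "l", "uː"): "sand",
}

print(relation("t", "ʔ", ENGLISH))
print(relation("bʰ", "b", HINDI))
print(relation("h", "ŋ", ENGLISH))
```

What the sketch cannot capture is the point made at the end of the paragraph above: which of the two forms of butter a speaker actually uses depends on dialect, sociolect and register, and none of these are represented in the lexicon.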
CC-BY-NC-SA 4.0, Written by Anatol Stefanowitsch