In Chapter 3, we saw that users of spoken languages are able to produce a range of different speech sounds using anatomical structures along the passage of air flowing from the lungs. They can narrow their glottis so that the vocal folds vibrate, producing vowels or voiced consonants, or hold it open to produce voiceless consonants. They can shape the airflow by raising and lowering their tongue and moving it forward and backward, producing vowels with different degrees of height and frontness, and they can round or spread their lips to make those vowels rounded or unrounded. They can stop the airflow completely and then release it, producing plosive consonants, or force it through a very narrow opening, producing fricatives; and they can do so in different places, for example, by pressing the lower and upper lip together, by pressing the tip of the tongue against the alveolar ridge, or by pressing the back of the tongue against the velum.
Each one of these and the many other ways of manipulating their glottis, velum, tongue, lips and teeth produces a speech sound with unique acoustic properties that we can measure (for example, using a spectrogram). However, not all of these acoustic properties are relevant to users of a given spoken language. Take the words pear, spare, and bear. All three of them contain a bilabial plosive. In the first two words, this bilabial plosive is voiceless — they would broadly be transcribed as [pɛɹ] and [spɛɹ] for many accents of English. In the third word, the bilabial plosive is voiced — [bɛɹ]. However, on closer inspection the [p]’s in the first two words are not identical: Hold your palm up to your lips and say them one after the other. You should feel a little burst of air against your palm for pear, but not for spare. Figure 4.1.1 shows spectrograms of an American speaker of English saying the three words: the burst of air is clearly visible as noise across the entire spectrum in pear, but not in spare.

Figure 4.1.1. Spectrograms of the words pear, spare and bear pronounced by a speaker of American English.
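Spectrograms like those in Figure 4.1.1 can be produced with standard signal-processing tools. The following Python sketch shows one way to do this using scipy and matplotlib; the filename pear.wav is a placeholder for a recording of your own, and the 5-millisecond analysis window is simply one reasonable choice for making the release burst and the aspiration noise visible.

    # A minimal sketch for plotting a spectrogram of a speech recording.
    # "pear.wav" is a placeholder filename, not a file supplied with this book.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile
    from scipy.signal import spectrogram

    rate, samples = wavfile.read("pear.wav")  # sampling rate in Hz, raw samples
    if samples.ndim > 1:                      # mix a stereo recording down to mono
        samples = samples.mean(axis=1)

    # Short analysis windows (about 5 ms) give the fine time resolution
    # needed to see the release burst and the aspiration noise of [pʰ].
    freqs, times, power = spectrogram(samples, fs=rate, nperseg=int(rate * 0.005))

    plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-12))  # power in dB
    plt.xlabel("Time (s)")
    plt.ylabel("Frequency (Hz)")
    plt.show()

In the resulting plot, the aspiration of pear appears as a patch of broadband noise between the release of the plosive and the onset of the vowel, exactly the pattern described above.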
This burst of air is referred to as aspiration in the study of speech sounds; in the IPA, it is represented by a superscript [ʰ], so a more precise transcription of the word pear would be [pʰɛɹ]. There is no aspiration in the word spare. In fact, the [p] in [spɛɹ] looks almost like the [b] in [bɛɹ]; you have to look very closely to see the difference: in [bɛɹ], there is more acoustic energy across the spectrum during the articulation of the bilabial plosive, because the vocal folds are already vibrating. You can feel the presence and absence of this vibration by lightly holding the tip of your index finger against your thyroid cartilage (the “Adam’s apple”) while saying the words bear and spare.
So, all three bilabial plosives are actually different from each other: voiceless aspirated in pear, voiceless unaspirated in spare, and voiced unaspirated in bear. Nevertheless, we perceive the bilabial plosives in pear and spare to be the same, and the one in bear as different. This is not because the difference between aspirated and unaspirated sounds is smaller than the difference between voiced and voiceless sounds. On the contrary: if you listen to the words very closely, you will find that the unaspirated voiceless [p] sounds more like a [b] than like a [pʰ]!
Instead, the reason is, roughly speaking, that in English, a difference in aspiration is never associated with a difference in meaning: no two words are distinguished just by the fact that one of them contains an aspirated consonant and the other an unaspirated consonant with the same place and manner of articulation. A difference in voicing, in contrast, is regularly associated with a difference in meaning: there are many pairs of words distinguished just by the fact that one of them contains a voiced consonant and the other a voiceless consonant with the same place and manner of articulation. Bear and pear, obviously, but also down and town, veil and fail, gap and cap, phase and face, ridge and rich, and so on.
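Pairs of words like these, which differ in exactly one sound, are conventionally called minimal pairs, and once words are represented as sequences of sound symbols, they are easy to search for automatically. The following Python sketch uses a small made-up word list with simplified transcriptions; it is meant only to illustrate the idea:

    # A minimal sketch: find pairs of words whose transcriptions differ
    # in exactly one segment. The word list is a toy example with
    # simplified transcriptions.
    from itertools import combinations

    words = {
        "pear": ("p", "ɛ", "ɹ"),
        "bear": ("b", "ɛ", "ɹ"),
        "town": ("t", "aʊ", "n"),
        "down": ("d", "aʊ", "n"),
        "fail": ("f", "eɪ", "l"),
        "veil": ("v", "eɪ", "l"),
    }

    def differs_in_one_segment(a, b):
        """True if a and b have equal length and mismatch in exactly one position."""
        return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

    for (w1, t1), (w2, t2) in combinations(words.items(), 2):
        if differs_in_one_segment(t1, t2):
            print(w1, "~", w2)  # prints: pear ~ bear, town ~ down, fail ~ veil

Applied to a Hindi word list, the same check would turn up pairs distinguished only by aspiration, which, as the next paragraph shows, is exactly the kind of evidence that tells us which sound features matter in a given language.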
Learning a language means learning to pay attention to which aspects of its speech sounds are important in the system of that language and which are not. This is not a conscious process; it is something your brain does automatically, at least in first language acquisition: not only does the human brain split the speech signal into discrete segments, it also categorizes them in such a way that all sounds that share the same relevant features fall into the same category, even though they may differ perceptually and acoustically with respect to other features. The study of such categories of speech sounds is called phonology. Which features are relevant and which are not can differ across languages: in Hindi, aspiration is a relevant feature, as there are words distinguished only by whether a particular consonant is aspirated or not. For example, [bɑːluː] (बालू) means ‘sand’, while [bʰɑːluː] (भालू) means ‘bear’.
Categories of speech sounds that share relevant features are called phonemes; we will look at them more closely in Sections 4.2 and 4.4. The different sounds within such a category are called allophones. We can often predict where a particular allophone of a phoneme will occur; this may depend, for example, on which speech sounds come before or after it, something we will discuss in Sections 4.3 and 4.4. It may also depend on where the phoneme occurs in the larger phonological unit referred to as the syllable (we have already discussed syllables in Section 3.7). Languages also differ in how phonemes may be combined; the study of this is called phonotactics, which we will discuss in Section 4.5.
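To make the idea of predictable allophones concrete, here is a deliberately simplified Python sketch of the aspiration pattern discussed above: a voiceless plosive is aspirated at the beginning of a word (as in pear) but not after [s] (as in spare). The real English pattern also depends on stress and syllable structure, topics of later sections, so this is a toy rule rather than a full description:

    # A toy sketch of one allophonic rule of English: a word-initial
    # voiceless plosive is realized as its aspirated allophone; elsewhere
    # (for example, after s) the unaspirated allophone occurs.
    # A deliberate simplification: the full pattern also depends on
    # stress and syllable structure.
    VOICELESS_PLOSIVES = {"p", "t", "k"}

    def realize(phonemes):
        """Map a sequence of phonemes to phones, adding [ʰ] to
        word-initial voiceless plosives."""
        phones = []
        for i, segment in enumerate(phonemes):
            if segment in VOICELESS_PLOSIVES and i == 0:
                phones.append(segment + "ʰ")  # aspirated allophone
            else:
                phones.append(segment)        # unaspirated allophone
        return phones

    print(realize(["p", "ɛ", "ɹ"]))       # ['pʰ', 'ɛ', 'ɹ']      -- pear
    print(realize(["s", "p", "ɛ", "ɹ"]))  # ['s', 'p', 'ɛ', 'ɹ']  -- spare

A rule like this only makes sense for a language like English, where aspiration is predictable; in Hindi, where [p] and [pʰ] can distinguish words, aspiration cannot be derived from context and has to be stored as part of each word.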
Phonology also deals with meaningful aspects of the speech signal that transcend individual segments, for example, stress and intonation. These are discussed in Sections 4.6 and 4.7 respectively.
To sum up, phonetics is the study of physical speech sounds: how we humans produce an acoustic signal using our articulatory organs, what measurable properties that signal has, and how the delicate mechanism in our ears picks it up and transmits it to our brain via the auditory nerve. Phonology, in contrast, is the study of the mental representation of speech sounds after our brains have categorized them according to which of their phonetic properties are potentially relevant in a given language.
CC-BY-NC-SA 4.0, Written by Anatol Stefanowitsch