The vocal tract
Overview of the vocal tract
Spoken language is articulated by manipulating parts of the body inside the vocal tract, such as the lips, tongue, and other parts of the mouth and throat. The vocal tract is often depicted in a midsagittal diagram, a special kind of diagram that represents the inside of the head as if it were split down the middle between the eyes. Midsagittal diagrams are conventionally oriented as in Figure 3.2.1, with the nostrils and lips on the left and the back of the head on the right, so that we are viewing the inside of the human head from its left side. The main regions and individual articulators of the vocal tract labelled in Figure 3.2.1 are defined and described in more detail in the rest of this section and the following sections.

Figure 3.2.1. Midsagittal diagram of the human vocal tract.
Open spaces in the vocal tract
There are three important open regions of the vocal tract, coloured in Figure 3.2.1. The oral cavity (red [greyish if you have protanopia or deuteranopia]) is the main interior of the mouth, extending horizontally from the lips backward. The pharynx (blue [greenish if you have tritanopia]) is behind the oral cavity and tongue, forming the upper part of what we normally think of as the throat. Finally, the nasal cavity (yellow [light pink if you have tritanopia]) is the open interior of the head above the oral cavity and pharynx, extending from the nostrils backward and down to the pharynx. Note that the boundaries between these regions are not precisely defined.
The bottom of the pharynx splits into two tubes: the trachea (also known as the windpipe), which leads down to the lungs, and the esophagus, which leads down to the stomach. The esophagus is not normally relevant for phonetics, but the trachea is important, since the vast majority of spoken language is articulated with air that comes from the lungs and passes through the glottis (the opening between the vocal folds in the larynx).
Producing sound: the vocal folds
At the top of the trachea is the larynx (or voice box), a rigid structure made of cartilages. Inside the larynx are the vocal folds (or vocal cords), two membranes that stretch from front to back. The vocal folds are separated by an empty space, the glottis. These structures regulate airflow through the vocal tract for most consonants and vowels in all spoken languages. When we breathe, the vocal folds are open (as shown in Figure 3.2.2 on the left). When we use spoken language, they have two main configurations: they can stay open, or they can be brought close together, so that the air flowing between them makes them vibrate and produce a sound (as shown in Figure 3.2.2 on the right).

Figure 3.2.2. The vocal folds seen from above.
You can feel this vibration by placing your fingers on the front of your throat where the larynx is, while making the sound of a bee buzzing, like the sound of the consonant at the end of the English word buzz. If instead you make the sound of a snake hissing, like the sound of the consonant at the end of the English word bus, you should feel that there is no vocal fold vibration.
Speech sounds produced with the vocal folds close together and vibrating in the airflow are called voiced, and speech sounds produced with the vocal folds open and not vibrating are called voiceless (or unvoiced). This distinction in vocal fold vibration is generally called voicing. There are other ways in which the airflow and the vocal folds can be manipulated to create additional sound qualities, and these are collectively referred to as phonation; since in this textbook we are only concerned with the distinction between voiced and voiceless speech sounds, we will use the term voicing here.
The basic units of speech: phones
The parts of the vocal tract can be moved in various ways to manipulate the airflow from the lungs as it passes through the larynx, the pharynx and especially the oral or nasal cavity. This allows the human body to produce a wide range of sounds, many (but not all) of which are used in human languages to form words.
Those sounds that are routinely used to form words in a particular language are referred to as phones of that language. (More generally, we can use the term phone to refer to sounds that are used to form words in any language.) For example, the ordinary English words spill, slip, lisp, and lips each contain four phones; in fact, all four words have the same four phones, just in different orders (with very minor variations in how they are pronounced).
As just mentioned, not all sounds we can produce with the vocal tract are used as phones. Whistles, gasps, or snorts, for example, are not. These can be used to express non-linguistic meanings (in many Western cultures, whistling may be used to express admiration or to harass people sexually, gasps may express shock, snorts may express derision, etc.), but they do not occur in ordinary words in any spoken language, and hence they are not considered phones.
Moreover, a sound that is used as a phone in some language is not necessarily used as a phone in every language: spoken languages differ considerably not just in how they use phones but also in which sounds they use as phones at all. For example, English speakers may use clicking sounds to express disapproval (the soft teeth-sucking tsk-tsk sound) or to urge a horse to go faster (the loud popping tchik sound), but these are not phones in English, because they are not used within ordinary words. However, these and similar sounds, referred to as clicks, do occur as phones in other languages: most famously in the Khoisan languages of southern Africa, but also in some Bantu languages (most famously isiZulu) and in Hadza (a language isolate spoken in Tanzania).
When determining the phones of a particular language, we have to be careful about what kinds of words we look at. Languages often have some marginal word-like expressions that can be used while speaking, but which may contain sounds that are not phones in the language. For example, the English expression ugh is often pronounced with a rough gravelly sound at the end. This sound occurs in Scottish English words like loch ‘lake’ or clachan ‘village’, but it is not used in other varieties of English, except in the expression ugh. We would not want to count it as a phone in those varieties based on this one expression (whose status as a word is doubtful anyway), although we would want to count it as a phone of Scottish English (and of other languages that use it regularly, such as German in words like Loch ‘hole’ or Bauch ‘belly’).
One of the most fundamental distinctions between phones is the one between consonants and vowels. Both types of phones are produced by manipulating parts of the vocal tract, but this manipulation takes very different forms. The next three sections address how phones are articulated and how linguists describe and categorize them in meaningful ways: Sections 3.3 and 3.4 deal with consonants, and Section 3.5 deals with vowels.
CC-BY-NC-SA 4.0, Adapted from Catherine Anderson, Bronwyn Bjorkman, Derek Denis, Julianne Doner, Margaret Grant, Nathan Sanders, and Ai Taniguchi, Essentials of Linguistics. 2nd ed.; edited and restructured by Anatol Stefanowitsch.