You probably agree that in the following sentence, the three-word sequence an interesting documentary feels like a unit while the three-word sequence watched an interesting does not:
What is that feeling based on? To a large extent, it is based on meaning: the adjective interesting and the indefinite article an seem to tell us something about the referent of the word documentary, not about the verb watch, so an interesting documentary is a meaningful sequence while watched an interesting is not. But how about the following sentence:
Again, you probably agree that an occasional telenovela is a unit, while watched an occasional is not. However, in this case occasional does not tell us anything about the referent of telenovela. Telenovelas cannot be occasional; what the sentence in (2) means is that Zoe occasionally watches a telenovela. In other words, in this case, occasional does tell us something about the verb watch.
While semantic relations between words are usually an indication of which sequences in a longer sequence are units and which are not, they are not always reliable. We need tests that involve both form and meaning to determine whether a particular sequence is a unit. There are four such tests that are commonly used:
Replacement tests: try to replace a sequence of words by a single word. If this is possible without changing the meaning of the sentence drastically, the sequence is a unit.
Deletion tests: try to remove a sequence of words from the sentence. If this is possible without making the sentence incomplete and without changing the meaning of the sentence drastically, the sequence is a unit.
Fragment tests: try to use a sequence of words without the rest of the sentence, for example, as an answer to a question. If this is possible, the sequence is a unit.
Movement tests: try to rearrange the word order of the sentence or to paraphrase it using a different grammatical structure. Those sequences that remain whole are units.
What we mean by "without changing the meaning of the sentence drastically" is that the resulting sentence must be true in the same situations as the original sentence, but that it may be more general, i.e., it may be true in other situations too (don't worry, you will get an example in a moment).
Units identified by these tests are referred to as constituents, and the tests are referred to as constituent tests. Note that while these tests are useful, they are not completely reliable. There are two potential problems that you should keep in mind.
First, not all tests can be applied in all situations. This means that if a sequence of words passes one (or even better, more than one) of these tests, it is most likely a constituent, but if it does not pass one (or even more) of these tests, that does not mean that it is not a constituent. However, if a sequence fails all tests and is not a meaningful sequence, it is unlikely to be a constituent.
Second, the tests do not always give us a clear result. Even in our first language, there may be cases where we do not have a clear intuition about whether a sequence passes a particular test, or even whether it passes any of these tests. While this may be frustrating, it reflects the fact that human language is a very complex set of capacities and behaviors, as is our ability to interpret it and to find meaning and structure in it. So don't get frustrated when you hit upon sentences that defy clear answers (once you are more skilled in the analysis of linguistic structure, you may even find that such cases are the most interesting ones)!
You should be careful when applying these tests to a language that is not your first language. If you know the language well, and have been exposed to it and used it for a long time in many different forms and situations, you should be able to apply these tests with results that are very similar to those of someone using it as a first language. However, even in this case you should be aware that your assessment can be influenced by your first language. If you do not know a language well, you will not be able to apply the tests successfully. In this case, you have two choices. You can find someone who uses it as a first language and ask them to assess the sentences for you (in linguistics, we refer to such language users as "language informants" or, more recently, "language consultants"). Or you can use other types of evidence, for example, language corpora (recall the discussion in Section 1.4).
The basic idea behind the replacement test is that a sequence of words is a constituent if we can replace it by a single word. The reasoning is the following: we know that words are constituents, so anything that can be replaced by a word must also be a constituent. By the same reasoning, we can try to replace a sequence by a different sequence that we already know to be a constituent. If this is possible, the replaced sequence must again be a constituent.
You probably agree that in (3a), a documentary and in the cafeteria both feel like units, while watched a or documentary in do not. And indeed, we can replace a documentary by it (see 4a), and in the cafeteria by there (see 4b), but we will not find a word that can replace watched a in (4c) or documentary in in (4d):
Sentences (4a) and (4b) also illustrate our comments about the meaning of the original and the resulting sentences. Both sentences are true in the same situations in which the original sentence in (3a) is true: for example, if we can truthfully say Zoe watched a documentary in the cafeteria, it is also true that Zoe watched it in the cafeteria. However, (4a) and (4b) are also true in other situations: for example, we could truthfully say Zoe watched it in the cafeteria if Zoe watched a romantic comedy, a pro wrestling match, a historical drama, a zombie film, etc.
So what about (3b)? Again, it may seem that a documentary and about renewable energy are units, but if we try to replace a documentary by it, we get an ungrammatical sentence (see 5a). If, on the other hand, we replace a documentary about renewable energy by it, we get a grammatical sentence:
Let us represent constituents by enclosing them in boxes. This helps illustrate the differences between (3a), shown in Figure 8.2.1, and (3b), shown in Figure 8.2.2:
Now, in both sentences, we can try to replace sequences within these constituents. Both the cafeteria and renewable energy could also be replaced by it:
What about watched? Does it form a constituent with a documentary and/or in the cafeteria in (3a)? We obviously cannot replace the sequences watched a documentary, watched a documentary in the cafeteria or watched … in the cafeteria with the word it, but we can replace some of them by the expression do the same (or similar expressions like do so):
In (7a), did the same replaces watched a documentary, and in (7b), it replaces watched a documentary in the cafeteria. Both replacements lead to grammatical sentences, so both of these sequences are constituents. In contrast, in (7c), it replaces watched … in the cafeteria, which leads to an ungrammatical sentence, so watched and in the cafeteria do not form a constituent without a documentary (see Figure 8.2.5).
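The boxes can also be recorded as nested brackets. As an illustration only, the Python sketch below encodes one possible bracketing of (3a) that is consistent with the replacement tests so far (a documentary, in the cafeteria, the cafeteria, watched a documentary, and watched a documentary in the cafeteria each form a box) and checks whether a word sequence lines up exactly with a box. The nested-tuple encoding and the names `SENTENCE_3A` and `is_constituent` are our own hypothetical choices. The code merely records the outcome of the tests; it cannot perform them, since the tests rely on speakers' judgments.

```python
# A minimal sketch. The nested-tuple encoding is a hypothetical choice:
# each tuple stands for one box, each string for one word.
SENTENCE_3A = ("Zoe", (("watched", ("a", "documentary")), ("in", ("the", "cafeteria"))))


def leaves(node):
    """Flatten a box (or a single word) into its list of words."""
    if isinstance(node, str):
        return [node]
    return [word for child in node for word in leaves(child)]


def spans(node, start=0):
    """Return the (start, end) positions of every box and word, plus the end position."""
    if isinstance(node, str):
        return {(start, start + 1)}, start + 1
    found, pos = set(), start
    for child in node:
        child_spans, pos = spans(child, pos)
        found |= child_spans
    found.add((start, pos))  # the box itself covers all of its children
    return found, pos


def is_constituent(tree, phrase):
    """True if some occurrence of the phrase lines up exactly with a box or a word."""
    words, target = leaves(tree), phrase.split()
    boxes, _ = spans(tree)
    n = len(target)
    return any(
        words[i:i + n] == target and (i, i + n) in boxes
        for i in range(len(words) - n + 1)
    )


for phrase in ["a documentary", "in the cafeteria", "watched a documentary",
               "watched a documentary in the cafeteria", "watched a", "documentary in"]:
    print(f"{phrase}: {is_constituent(SENTENCE_3A, phrase)}")
# The first four print True; "watched a" and "documentary in" print False.
```

A discontinuous sequence such as watched … in the cafeteria cannot even be written as a single box in this simple encoding, which is one way of picturing why (7c) fails.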
The basic idea behind a deletion test is the following: if we can delete a sequence of words, it must consist of one or more constituents; if we cannot delete it, it only forms part of one or more constituents.
Let's start with the same example as before: the sequences a documentary, in the cafeteria, and watched a or documentary in in (3a). The first two sequences pass the deletion test, but the third does not:
The sentence in (8a) is true if Zoe watches something in the cafeteria (it assumes that the context provides a hint as to what that something is), and (8b) is true if Zoe watches a documentary in the cafeteria or anywhere else. But while (8c) is not ungrammatical, it is not true in the same situation as (3a): just because Zoe watches a documentary in the cafeteria, this does not mean that she watches the cafeteria.
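The claim that a deletable sequence "must consist of one or more constituents" can be made concrete with a little bookkeeping. The Python sketch below is an illustration of ours, not part of the analysis itself. It assumes a hypothetical nested-tuple encoding of the bracketing of (3a) suggested by the replacement tests (a documentary, in the cafeteria, the cafeteria, watched a documentary, and watched a documentary in the cafeteria each form a box) and splits a word sequence into the fewest boxes that cover it exactly. The function names are made up for this example, and the code cannot decide whether a deletion is acceptable; that still requires a speaker's judgment.

```python
# A minimal sketch. Hypothetical encoding of (3a): tuples are boxes, strings are words.
SENTENCE_3A = ("Zoe", (("watched", ("a", "documentary")), ("in", ("the", "cafeteria"))))


def leaves(node):
    """Flatten a box (or a single word) into its list of words."""
    if isinstance(node, str):
        return [node]
    return [word for child in node for word in leaves(child)]


def spans(node, start=0):
    """Return the (start, end) positions of every box and word, plus the end position."""
    if isinstance(node, str):
        return {(start, start + 1)}, start + 1
    found, pos = set(), start
    for child in node:
        child_spans, pos = spans(child, pos)
        found |= child_spans
    found.add((start, pos))
    return found, pos


def fewest_constituents(tree, start, end):
    """Split words[start:end] into the fewest boxes or words that cover it exactly.

    Greedy: at each position, take the longest box that starts there and stays
    inside the span. Because boxes never partially overlap, this split is minimal.
    """
    all_spans, _ = spans(tree)
    words = leaves(tree)
    pieces, pos = [], start
    while pos < end:
        # A single word always qualifies, so max() never sees an empty sequence.
        best = max(e for (s, e) in all_spans if s == pos and e <= end)
        pieces.append(" ".join(words[pos:best]))
        pos = best
    return pieces


def split_phrase(tree, phrase):
    """Split the first occurrence of a phrase into the fewest constituents."""
    words, target = leaves(tree), phrase.split()
    n = len(target)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            return fewest_constituents(tree, i, i + n)
    raise ValueError(f"{phrase!r} does not occur in the sentence")


for phrase in ["a documentary", "a documentary in the cafeteria", "watched a"]:
    print(phrase, "->", split_phrase(SENTENCE_3A, phrase))
# a documentary -> ['a documentary']
# a documentary in the cafeteria -> ['a documentary', 'in the cafeteria']
# watched a -> ['watched', 'a']
```

A result with more than one piece, such as a documentary in the cafeteria, is a reminder that even if deleting the whole sequence were judged acceptable, that alone would not show that it is a single constituent; its parts would still need to be tested separately.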
The deletion test is often helpful when we cannot replace a sequence of words by some other expression simply because there is no suitable expression. For example, we might suspect that about renewable energy is a constituent in (3b), but there is no word or shorter expression with the same meaning. However, we can delete the sequence and get a sentence that is true in the same situation:
If Zoe watched a documentary about renewable energy, then it is also true that she watched a documentary. This gives us an even more complete picture of the units of which (3a) and (3b) consist (see Figure 8.2.7 and Figure 8.2.8, respectively).
When applying the deletion test, you always have to make sure that you are not deleting two separate constituents at once, as this might lead you to believe that they form a single constituent: when you can delete a sequence of words, always test whether you can also delete any parts of that sequence independently of each other. Keeping this in mind, use the deletion test to determine whether the underlined sequences in the following sentences consist of one constituent or of two constituents:
The basic idea behind a fragment test is the following: if we can use a particular sequence of words on its own, for example, as a response in a conversation, it must be a constituent.
These fragment tests confirm the results of the replacement test: they identify the same constituents. In addition, we can use the fragment test to test sequences that feel like constituents but which we cannot delete and for which there is no word that can sensibly replace them:
The basic idea behind movement tests is the following: if a sequence of words forms a constituent, then that sequence should naturally remain whole when the sentence is rearranged, for example, by moving a potential constituent to the front of the sentence, as in (15a, b), or by using an it-cleft, as in (16a, b):
However, you should treat the movement test with caution, as rearranging a sentence, as in (15), or paraphrasing it, as in (16), is very different from the other tests mentioned: the results depend very much on the grammatical properties of the constituents in question and their interaction with the structures you use to rearrange them.
Other languages have other such quirks when it comes to constituent status. For example, in German, constituents like über erneuerbare Energie 'about renewable energy' cannot be separated. However, constituents like erneuerbare Energie 'renewable energy' can:
In other words, the behavior of constituents across different grammatical constructions is fascinating and full of complexities, but it cannot reliably be used to identify constituents in the first place.