Section 7.2 Constituent tests

You probably agree that in the following sentence, the three-word sequence an interesting documentary feels like a unit while the three-word sequence watched an interesting does not:

(1) Zoe watched an interesting documentary.

What is that feeling based on? To a large extent, it is based on meaning: the adjective interesting and the indefinite article an seem to tell us something about the referent of the word documentary, not about the verb watch, so an interesting documentary is a meaningful sequence while watched an interesting is not. But how about the following sentence:

(2) Zoe watched an occasional telenovela (but most of the time, she watched documentaries).

Again, you probably agree that an occasional telenovela is a unit, while watched an occasional is not. However, in this case occasional does not tell us anything about the referent of telenovela. Telenovelas cannot be occasional; what the sentence in (2) means is that Zoe occasionally watched a telenovela. In other words, in this case, occasional does tell us something about the verb watch.

While semantic relations between words are usually an indication of which sequences in a longer sequence are units and which are not, they are not always reliable. We need tests that involve both form and meaning to determine whether a particular sequence is a unit. There are four such tests that are commonly used:

Replacement tests: try to replace a sequence of words by a single word. If this is possible without changing the meaning of the sentence drastically, the sequence is a unit.

Deletion tests: try to remove a sequence of words from the sentence. If this is possible without making the sentence incomplete and without changing the meaning of the sentence drastically, the sequence is a unit.

Fragment tests: try to use a sequence of words without the rest of the sentence, for example, as an answer to a question. If this is possible, the sequence is a unit.

Movement tests: try to rearrange the word order of the sentence or to paraphrase it using a different grammatical structure. Sequences that remain whole are units.

What we mean by “without changing the meaning of the sentence drastically” is that the resulting sentence must be true in the same situations as the original sentence, but that it may be more general, i.e., it may be true in other situations too (don’t worry, you will get an example in a moment).

Units identified by these tests are referred to as constituents, and the tests are referred to as constituent tests. Note that while these tests are useful, they are not completely reliable. There are two potential problems that you should keep in mind.

First, not all tests can be applied in all situations. This means that if a sequence of words passes one (or even better, more than one) of these tests, it is most likely a constituent, but if it does not pass one (or even more) of these tests, that does not mean that it is not a constituent. However, if a sequence fails all tests and is not a meaningful sequence, it is unlikely to be a constituent.

Second, the tests do not always give us a clear result. Even in our first language, there may be cases where we do not have a clear intuition whether a sequence passes a particular test or even whether it passes any of these tests. While this may be frustrating, it reflects the fact that human language is a very complex set of capacities and behaviors, as is our ability to interpret it and to find meaning and structure in it. So don’t get frustrated when you hit upon sentences that defy clear answers (once you are more skilled in the analysis of linguistic structure, you may even find that such cases are the most interesting ones)!

You should be careful when applying these tests to a language that is not your first language. If you know the language well, have been exposed to it and used it for a long time in many different forms and situations, you should be able to apply these tests with results that are very similar to those of someone using it as a first language. However, even in this case you should be aware that your assessment can be influenced by your first language. If you do not know a language well, you will not be able to apply the tests successfully. In this case, you have two choices. Either you can find someone using it as a first language and ask them to assess the sentences for you — in linguistics, we refer to such language users as “language informants” or, more recently, “language consultants”. Or you can use other types of evidence — for example, language corpora (recall the discussion in Section 1.2).

Let us now look at each test in turn.

Subsection Replacement tests

The basic idea behind the replacement test is that a sequence of words is a constituent if we can replace it by a single word. The reasoning is the following: we know that words are constituents, so anything that can be replaced by a word must also be a constituent. By the same reasoning, we can try to replace a sequence by a different sequence which we already know to be a constituent — if this is possible, again, the replaced sequence must also be a constituent.

This may sound more complicated than it is. Consider the following sentences:

(3a) Zoe watched a documentary in the cafeteria.

(3b) Zoe watched a documentary about renewable energy.

You probably agree that in (3a), a documentary and in the cafeteria both feel like units, while watched a and documentary in do not. And indeed, we can replace a documentary by it (see 4a) and in the cafeteria by there (see 4b), but we will not find a word that can replace watched a, as in (4c), or documentary in, as in (4d):

(4a) Zoe watched it in the cafeteria.

(4b) Zoe watched a documentary there.

(4c) Zoe ___ documentary in the cafeteria.

(4d) Zoe watched a ___ the cafeteria.

Sentences (4a) and (4b) also illustrate our comments about the meaning of the original and the resulting sentences. Both sentences are true in the same situations in which the original sentence in (3a) is true: for example, if I can truthfully say Zoe watched a documentary in the cafeteria, it is also true that Zoe watched it in the cafeteria. However, (4a) and (4b) are also true in other situations: for example, I could truthfully say Zoe watched it in the cafeteria if Zoe watched a romantic comedy, a pro wrestling match, a historical drama, a zombie film, etc.
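
If it helps, the idea that the resulting sentence must be true in the same situations as the original, but may be more general, can be pictured as a subset check. The following Python sketch is only an illustration under strong simplifying assumptions: the toy "situations" (pairs of what Zoe watched and where) and all function names are invented here and are not part of any linguistic formalism used in this book.

```python
from itertools import product

# Toy model (an assumption for illustration): a situation is a pair
# (what Zoe watched, where she watched it).
THINGS = ["documentary", "romantic comedy", "wrestling match"]
PLACES = ["cafeteria", "tram"]
SITUATIONS = set(product(THINGS, PLACES))


def true_in(condition):
    """Return the set of toy situations in which a sentence is true."""
    return {s for s in SITUATIONS if condition(*s)}


# (3a) Zoe watched a documentary in the cafeteria.
s3a = true_in(lambda what, where: what == "documentary" and where == "cafeteria")
# (4a) Zoe watched it in the cafeteria. ("it" is treated as any watched thing)
s4a = true_in(lambda what, where: where == "cafeteria")
# (4b) Zoe watched a documentary there. ("there" is treated as any place)
s4b = true_in(lambda what, where: what == "documentary")


def is_acceptable_generalization(original, result):
    """The result must be true wherever the original is true."""
    return original <= result  # subset check on sets of situations


print(is_acceptable_generalization(s3a, s4a))  # True
print(is_acceptable_generalization(s3a, s4b))  # True
print(is_acceptable_generalization(s4a, s3a))  # False: (3a) is more specific
```

The last line shows the asymmetry: (4a) is true in more situations than (3a), so going from (4a) back to (3a) would not count as "more general".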

So what about (3b)? Again, it may seem that a documentary and about renewable energy are units, but if we try to replace a documentary by it, we get an ungrammatical sentence (see 5a). If, on the other hand, we replace a documentary about renewable energy by it, we get a grammatical sentence:

(5a) * Zoe watched it about renewable energy.

(5b) Zoe watched it.

So, it seems that in (3b), a documentary about renewable energy is a constituent, but a documentary is not.
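
The mechanical part of the replacement test, producing the candidate sentence, can be sketched in a few lines of Python. This is a minimal sketch with a hypothetical helper, `replace_span`, that only builds the candidates; deciding whether a candidate is grammatical and sufficiently close in meaning remains a judgment that only a speaker of the language can make.

```python
def replace_span(sentence, span, proform):
    """Replace the word sequence `span` in `sentence` with `proform`.

    Returns the new sentence, or None if `span` does not occur as a
    contiguous sequence of words.
    """
    words = sentence.rstrip(".").split()
    target = span.split()
    n = len(target)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            return " ".join(words[:i] + proform.split() + words[i + n:]) + "."
    return None


s3a = "Zoe watched a documentary in the cafeteria."
s3b = "Zoe watched a documentary about renewable energy."

candidates = [
    (s3a, "a documentary", "it"),                         # cf. (4a)
    (s3a, "in the cafeteria", "there"),                   # cf. (4b)
    (s3b, "a documentary", "it"),                         # cf. (5a)
    (s3b, "a documentary about renewable energy", "it"),  # cf. (5b)
]
for sentence, span, proform in candidates:
    print(replace_span(sentence, span, proform))
```

The third candidate is the one marked with an asterisk in (5a): the sketch produces it just as readily as the others, which is why the judgment step cannot be automated this way.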

Let us represent constituents by enclosing them in boxes. This helps illustrate the differences between (3a), shown in Figure 7.2.1, and (3b), shown in Figure 7.2.2:

Figure 7.2.1. Box diagram of the sentence *Zoe watched a documentary in the cafeteria*

Now, in both sentences, we can try to replace sequences within these constituents — both the cafeteria and renewable energy could also be replaced by it:

(6a) Zoe has never eaten in the cafeteria, but she watched a documentary in it.

(6b) Zoe is interested in renewable energy, so she watched a documentary about it.

In other words, constituents can contain other constituents:
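
One way to picture this nesting without boxes is to write each constituent as a list that may itself contain lists. The sketch below is an illustration only: the nested-list encoding and the helper names are assumptions made here, and only the multi-word constituents identified by the replacement tests so far are bracketed.

```python
# Each inner list stands for one box, i.e. one constituent identified so far.
s3a = ["Zoe", "watched", ["a", "documentary"], ["in", ["the", "cafeteria"]]]
s3b = ["Zoe", "watched", ["a", "documentary", "about", ["renewable", "energy"]]]


def words_of(node):
    """Flatten a (possibly nested) constituent into its words."""
    if isinstance(node, str):
        return [node]
    return [word for child in node for word in words_of(child)]


def constituents(node):
    """List every bracketed sequence inside `node`, outer ones first."""
    found = []
    for child in node:
        if isinstance(child, list):
            found.append(" ".join(words_of(child)))
            found.extend(constituents(child))
    return found


def bracketed(node):
    """Render the structure with square brackets instead of boxes."""
    if isinstance(node, str):
        return node
    return "[" + " ".join(bracketed(child) for child in node) + "]"


print(bracketed(s3a))  # [Zoe watched [a documentary] [in [the cafeteria]]]
print(bracketed(s3b))  # [Zoe watched [a documentary about [renewable energy]]]
print(constituents(s3a))
print(constituents(s3b))
```

In the output, the cafeteria appears inside in the cafeteria, and renewable energy appears inside a documentary about renewable energy.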

What about watched? Does it form a constituent with a documentary and/or in the cafeteria in (3a)? We obviously cannot replace the sequences watched a documentary, watched a documentary in the cafeteria or watched … in the cafeteria with the word it, but we can replace some of them by the expression do the same (or similar expressions like do so):

(7a) Zoe watched a documentary in the cafeteria and Aylin did the same on the tram.

(7b) Zoe watched a documentary in the cafeteria and Aylin did the same.

(7c) * Zoe watched a documentary in the cafeteria and Aylin did the same a game show.

In (7a), did the same replaces watched a documentary, and in (7b), it replaces watched a documentary in the cafeteria. Both replacements lead to grammatical sentences, so both of these sequences are constituents. In contrast, in (7c), it replaces watched … in the cafeteria, and the result is not grammatical, so watched and in the cafeteria do not form a constituent without a documentary (see Figure 7.2.5).

Question 7.2.6.

Use the replacement test to identify constituents in the following sentences:

(i) Aylin and Zoe water the trees with dry roots in the summer.

(ii) Aylin made very spicy tapas for Zoe and her neighbor last week.

(iii) Aylin’s neighbor always forgets the girl with the green hair’s name.

Subsection Deletion tests

The basic idea behind a deletion test is the following: if we can delete a sequence of words, it must consist of one or more constituents; if we cannot delete it, it only forms part of one or more constituents.

Let’s start with the same sentence as before, (3a), and delete a documentary (see 8a), in the cafeteria (see 8b) and a documentary in (see 8c). The first two sequences pass the deletion test, the third does not:

(8a) Zoe watched in the cafeteria.

(8b) Zoe watched a documentary.

(8c) Zoe watched the cafeteria.

The sentence in (8a) is true if Zoe watches something in the cafeteria (it assumes that the context provides a hint as to what that something is); (8b) is true if Zoe watches a documentary in the cafeteria or anywhere else. But while (8c) is not ungrammatical, it is not true in the same situations as (3a): just because Zoe watches a documentary in the cafeteria, this does not mean that she watches the cafeteria.

The deletion test is often helpful where we cannot replace a sequence of words by some other expression simply because there is no suitable expression. For example, we might suspect that about renewable energy is a constituent in (3b), but there is no word or shorter expression with the same meaning. However, we can delete the sequence and get a sentence that is true in the same situation:

(9) Zoe watched a documentary.

If Zoe watched a documentary about renewable energy, then it is also true that she watched a documentary. This gives us an even more complete picture of the units of which (3a) and (3b) consist (see Figure 7.2.7 and Figure 7.2.8, respectively).
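
The deletions discussed above can be reproduced with a short Python sketch. The helper `delete_span` is hypothetical and only produces the shortened sentence; the verdicts listed next to each case are the ones reported in the text, not something the code computes.

```python
def delete_span(sentence, span):
    """Remove the word sequence `span` from `sentence`.

    Returns the shortened sentence, or None if `span` does not occur
    as a contiguous sequence of words.
    """
    words = sentence.rstrip(".").split()
    target = span.split()
    n = len(target)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            return " ".join(words[:i] + words[i + n:]) + "."
    return None


s3a = "Zoe watched a documentary in the cafeteria."
s3b = "Zoe watched a documentary about renewable energy."

# (deleted sequence, source sentence, verdict as reported in the text)
cases = [
    ("a documentary", s3a, "passes, cf. (8a)"),
    ("in the cafeteria", s3a, "passes, cf. (8b)"),
    ("a documentary in", s3a, "fails, the meaning changes, cf. (8c)"),
    ("about renewable energy", s3b, "passes, cf. (9)"),
]
for span, sentence, reported_verdict in cases:
    print(delete_span(sentence, span), "|", reported_verdict)
```

Note that the third case yields a perfectly grammatical sentence, so grammaticality alone is not enough: the result also has to be true in the same situations as the original.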

Question 7.2.9.

When applying the deletion test, you always have to make sure that you are not deleting two separate constituents at once, as this might lead you to believe that they form a single constituent: when you can delete a sequence of words, always test whether you can also delete any parts of that sequence independently of each other. Keeping this in mind, use the deletion test to determine whether the underlined sequences in the following sentences consist of one constituent or of two constituents:

(i) Zoe baked bagels with Aylin.

(ii) Zoe baked bagels with cinnamon.

(iii) Zoe prepared the sourdough on Tuesday last week.

Subsection Fragment tests

The basic idea behind a fragment test is the following: if we can use a particular sequence of words on its own, for example, as a response in a conversation, it must be a constituent.

Consider the utterances in (10), (11) and (12) and the responses:

(10) What did Zoe watch?
(10a) A documentary.
(10b) A documentary about renewable energy.
(10c) * A documentary in the cafeteria.

(11) What did Zoe do?
(11a) Watch a documentary in the cafeteria.
(11b) Watch a documentary.
(11c) * Watch in the cafeteria.

(12) Where did Zoe watch a documentary?
(12a) In the cafeteria.
(12b) * The cafeteria.

These fragment tests confirm the results of the replacement test — they identify the same constituents. In addition, we can use the fragment test to test sequences that feel like constituents but which we cannot delete and for which there is no word that can sensibly replace them:

(13) Zoe looked tired.

(14a) * Zoe looked.
(14b) Zoe looked [ … ]
(14c) Q: How did Zoe look? A: Tired.
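
The rule of thumb stated earlier in this section (a sequence that passes at least one test is most likely a constituent; a sequence that fails all tests and is not meaningful is unlikely to be one) can be written down as a small decision function. The Python sketch below is one possible reading of that rule, not a definitive procedure: the function name, the three verdict labels, and the convention that a missing entry means "test not applied or not applicable" are all assumptions made here. The recorded judgments are the ones reported in this section; the `meaningful` flags are likewise a reading of the text.

```python
TESTS = ("replacement", "deletion", "fragment", "movement")


def verdict(results, meaningful):
    """Combine test outcomes into a cautious verdict.

    `results` maps a test name to True (passed) or False (failed);
    tests that were not applied are simply left out.
    """
    outcomes = [results.get(test) for test in TESTS]
    if any(outcome is True for outcome in outcomes):
        return "most likely a constituent"
    if all(outcome is False for outcome in outcomes) and not meaningful:
        return "unlikely to be a constituent"
    return "undetermined"


# Judgments reported in this section: (test results, meaningful sequence?)
judgments = {
    "a documentary, in (3a)": (
        {"replacement": True, "deletion": True, "fragment": True}, True),
    "tired, in (13)": (
        {"deletion": False, "fragment": True}, True),
    # Only one judgment is recorded here for this sequence, see (4c).
    "watched a, in (3a)": (
        {"replacement": False}, False),
}
for sequence, (results, meaningful) in judgments.items():
    print(sequence, "->", verdict(results, meaningful))
```

The second entry shows why failing one test proves little: tired fails the deletion test in (14a) but passes the fragment test in (14c). The third entry stays "undetermined" in this sketch because a single failed test is not enough to rule a sequence out.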

Question 7.2.10.

Use the fragment test to test whether the underlined sequences in the following sentences are constituents:

(i) Aylin seasoned the patatas bravas with rosemary and garlic.

(ii) Zoe ate too many patatas bravas again.

(iii) Zoe likes Aylin’s patatas bravas very much.

Which of these constituents could you also identify using a) the replacement test and b) the deletion test?

Subsection Movement tests

The basic idea behind movement tests is the following: if a sequence of words forms a constituent, then that sequence should naturally remain whole when the sentence is rearranged, for example, by moving a potential constituent to the front of the sentence, as in (15a, b) or by using an it‑cleft, as in (16a, b):

(15a) In the cafeteria, Zoe watched a documentary.

(15b) A documentary, Zoe watched in the cafeteria.

(16a) It was in the cafeteria that Zoe watched a documentary.

(16b) It was a documentary that Zoe watched in the cafeteria.
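
The two rearrangements used in (15) and (16) are regular enough to sketch in Python. The helper names `front` and `it_cleft` are hypothetical, and the sketch makes simplifying assumptions: it handles only simple past-tense sentences like (3a), it always uses was as the verb in the cleft, and it relies on the sentence beginning with a proper noun such as Zoe, so that no capital letter has to be lowered. As with the other tests, the code only produces candidates for a speaker to judge.

```python
def split_span(sentence, span):
    """Split `sentence` into the words before, inside and after `span`."""
    words = sentence.rstrip(".").split()
    target = span.split()
    n = len(target)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            return words[:i], target, words[i + n:]
    raise ValueError(f"{span!r} is not a contiguous sequence in {sentence!r}")


def front(sentence, span):
    """Move `span` to the front of the sentence, as in (15a, b)."""
    before, target, after = split_span(sentence, span)
    moved = " ".join(target)
    rest = " ".join(before + after)
    return moved[0].upper() + moved[1:] + ", " + rest + "."


def it_cleft(sentence, span):
    """Paraphrase the sentence as an it-cleft, as in (16a, b)."""
    before, target, after = split_span(sentence, span)
    moved = " ".join(target)
    rest = " ".join(before + after)
    return "It was " + moved + " that " + rest + "."


s3a = "Zoe watched a documentary in the cafeteria."
for span in ("in the cafeteria", "a documentary"):
    print(front(s3a, span))
    print(it_cleft(s3a, span))
```

The four printed sentences correspond to (15a), (16a), (15b) and (16b).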

However, you should treat the movement test with caution, as rearranging a sentence, as in (15), or paraphrasing it, as in (16), is very different from the other tests mentioned: the results depend very much on the grammatical properties of the constituents in question and their interaction with the structures you use to rearrange them.

For example, consider the following sentences:

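(17a) Zoe was talking about renewable energy.

(17b) It was renewable energy that Zoe was talking about.

(17c) * It was about renewable energy that Zoe was talking.

In (17b), the it-cleft picks out renewable energy and leaves about behind, while (17c), which keeps about renewable energy whole, is marked as unacceptable.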

Other languages have other such quirks when it comes to constituent status. For example, in German, constituents like über erneuerbare Energie ‘about renewable energy’ cannot be separated. However, constituents like erneuerbare Energie ‘renewable energy’ can:

(18a) Zoe nutzt nur erneuerbare Energie.
lit. ‘Zoe uses only renewable energy’

(18b) Energie nutzt Zoe nur erneuerbare.
lit. ‘Energy uses Zoe only renewable’

This is not the case in English, where Energy, Zoe and Aylin use only renewable is not a possible sentence.

In other words, the behavior of constituents across different grammatical constructions is fascinating and full of complexities, but it cannot reliably be used to identify constituents in the first place.

CC-BY-NC-SA 4.0. Written by Anatol Stefanowitsch
