Studying language scientifically

Section 1.4 Studying language scientifically

We said that linguistics is the scientific study of human language. Let us take a more detailed look at what makes linguistics a science. You might be thinking of a scientist as someone who wears a lab coat and uses scientific instruments to perform measurements. Although there are branches of linguistic research where people do just that, such as phonetics (see Chapter 5) or psycholinguistics, that is not what makes them scientists. What makes them, and all non-lab-coat-wearing linguists, scientists is that we approach language using the scientific way of thinking.

Subsection Thinking about language scientifically

The scientific way of thinking about language involves systematic, empirical study. The word empirical means that we base our ideas about language on data that we gather by observing how people use their language in natural settings, or by giving them linguistic tasks in a laboratory and recording their responses. In this way, linguistics is no different from other sciences — entomologists observe the life cycles and habitats of insects, chemists observe how substances interact, linguists observe how people use language. Just like entomologists and chemists, linguists aim for an accurate description of the phenomenon they’re studying. And like other scientists, linguists strive to make observations that are not value judgments. If an entomologist observes that a certain species of beetle eats leaves, she’s not going to judge that the beetles are eating wrong, and tell them that they’d be more successful in life if only they ate the same thing as ants. The same is true of linguists — we do not go around telling people how they should or shouldn’t use language. Or at least, that is what would be the case in an ideal world. Of course, like all scientists, and like all humans, linguists have biases that often prevent us from reaching this ideal — unlike entomologists or chemists, we are not impartial observers of organisms or substances different from ourselves, but we are part of the very thing we study — users of one or more languages, members of one or more language communities with all the cultural biases that other members of these communities have. Thus, it is more difficult for us to adhere to the scientific ideal, and we must try harder to do so.

What makes things even more difficult is that language communities tend to have their own traditions of thinking and talking about language, and those traditions can be at odds with the scientific approach to language. In many language communities, they are prescriptive, focused on literary aesthetics, or both. These approaches seem natural to members of such communities, in part because they are woven into our education systems, which means that they (and perhaps you) may find the purely descriptive approach that linguists strive for irritating.

We illustrated the difference between a linguistic approach and a prescriptive approach in Section 1.2, discussing the way in which linguists might approach words like irregardless or grammatical phenomena like prepositions standing at the end of a sentence, while the noun they modify occurs somewhere else.

We saw that, as linguists, our goal is to make descriptive, empirical observations of language. In doing so, we have two problems, a practical one and a theoretical one. The practical problem concerns the availability of data. If we are interested in a particular question, such as who uses the word irregardless, it would be very inconvenient if we had to walk around and wait for speakers to use it — it is so rare that we may never encounter it in the course of our entire working life! Linguists get around this problem by collecting vast amounts of natural language data in electronic form that we can then search for words (or other linguistic phenomena) using a computer. We could also perform a simple experiment and present speakers of English with sentences containing the word to see if they notice anything unusual — this is easy enough for a simple question, but it becomes more and more difficult the larger and more complex our questions become. Linguists do rely on experiments quite a bit, but such experiments are time-consuming and expensive.

The theoretical problem concerns the interpretation of data: as linguists, we are interested not just in the specific linguistic behaviors that people display in a specific situation, but in the subconscious linguistic knowledge that guides these behaviors. That knowledge cannot be observed directly. We cannot just cut open language users’ heads in the way our entomologist friend might dissect a beetle, and even if we could, we would not find any linguistic knowledge there. Thus, we have to deduce a model of the subconscious knowledge of language users from their behavior (in the wild or in the laboratory). This is a very difficult task indeed.

Subsection Metalinguistic knowledge as a source of empirical data

One solution that many linguists propose to both of the problems mentioned above is to access our own metalinguistic awareness as fluent speakers or signers of a language. As mentioned above, as linguists we are part of what we study, so why not turn this into an advantage? We already have the subconscious knowledge that we are after, so the argument goes; all we have to do is access that knowledge.

This is an attractive idea, and to some extent we can make it work. Here’s an example of accessing your metalinguistic awareness. Say you want to create a new English word for a creature that you have invented for a game. Would you rather call that creature a blifter (pronounced like blister with an ⟨f⟩ sound instead of an ⟨s⟩ sound) or a lbifter (pronounced like lifter with a ⟨b⟩ sound inserted after the ⟨l⟩ sound)? Neither of those forms exists in English, but you probably have a strong feeling that blifter would be a perfectly ordinary English word for your creature, while lbifter would not. Notice that your feeling that something is wrong with the word lbifter is not based on normative ideas: it’s not that you will sound uneducated, rude or racist if you combine those sounds that way. Instead, your feeling is based on an intuitive knowledge that this sequence of sounds just … can’t happen in English. You’ve made a descriptive observation that lbifter is not a possible word in English. From that observation, we can conclude that lbifter violates some part of the subconscious knowledge of the English language that fluent language users have.

Many linguists use the term mental grammar to describe that subconscious knowledge, which includes knowledge not just of what we colloquially call grammar, but also of sounds, words and bits of words. In these terms, we could say that the word blifter is grammatical and the word lbifter is ungrammatical in English. An ungrammatical word or phrase or sentence is something that just can’t exist in a particular language: the mental grammar of that language does not contain any rules or representations that would allow language users to produce it. Thus, grammaticality isn’t about what actually exists in a language; it’s about what could exist. In this example, neither blifter nor lbifter exists in English, but although they contain the same sounds, blifter could be an English word and lbifter couldn’t.

It’s often useful to compare similar words, phrases or sentences to try to access our metalinguistic awareness. We did so in deciding that the relative pronoun which can be separated from its preposition but the relative pronoun that cannot. Let’s look at another example of observing what’s possible. Here are two similar sentences:

(1a) Sam compared the forged painting with the original.

(1b) Sam compared the forged painting and the original.

As fluent language users of English, we intuitively know that both of these are possible sentences in English — they are both grammatical.

Now let us turn these sentences into questions — something that we, as fluent language users of English, can do without giving it any thought at all:

(2a) Did Sam compare the forged painting with the original?

(2b) Did Sam compare the forged painting and the original?

Observing those two questions, we can see that, again, both (2a) and (2b) are acceptable in English.

Now let’s try a different kind of question:

(3a) What did Sam compare the forged painting with?

(3b) *What did Sam compare the forged painting and?

Comparing these two sentences, we intuitively know that (3a) is possible (grammatical), but (3b) is not. Linguists normally use an asterisk (*) in front of a sentence to indicate that they have determined it to be ungrammatical based on their own metalinguistic awareness. Inventing a word, phrase or sentence and then determining whether it is grammatical based on our own metalinguistic awareness is often referred to as producing grammaticality judgments or acceptability judgments — the process by which language users do so is called introspection.

Some linguists treat such introspectively produced grammaticality judgments as empirical data. They would state that the two similar sentences discussed here are both possible as declarative statements (1a, b) and as yes-no questions (2a, b), but when we try to make a wh-question out of them, the result is acceptable for the first one (3a) but not for the second one (3b). Having made that observation, they would now try to figure out what’s going on in the mental grammar that can account for this observation. Why is (3a) grammatical while (3b) is not?

While this procedure works in very simple cases like the one presented here, we should use it extremely sparingly. It should be used only in cases where our judgments are very clear and consistent, and where other users of the language share them without hesitation. This is rarely the case once we look at anything but the simplest phenomena. And even then, we should remain skeptical — it is very easy to convince yourself and others that something that you want to be true is actually true. Overall, you should not regard grammaticality judgments as empirical data, but as a short-cut that allows us, in very restricted situations, to skip the process of collecting actual empirical data.

Subsection Proper sources of empirical data

Figure 1.4.1. Hoodie

Even in simple cases, you might not want to rely on your own grammaticality judgments, and, of course, you cannot rely on your own judgments if you are dealing with a language, or a language variety, that you do not speak fluently. Instead, you can (or even have to) use a survey — a questionnaire that you distribute on paper or as an online form — to gather grammaticality judgments.

We can also use surveys for other purposes, for example, for learning about regional language variation. We can elicit the words that people use for particular items in different places. From survey data we know that some people call the item in Figure 1.4.1 a sweatshirt, other people call it a hoodie, and people in Saskatchewan call it a bunny hug. If you’re studying regional and social variation you might also gather data using in-person interviews, in which you could ask questions like, “Does the ‘u’ in student sound like the ‘oo’ in too or the ‘u’ in use?”

As mentioned above, linguists also use huge collections of natural language to make language observations. Such a collection is called a corpus. Corpora may contain a specific type of text (for example, fiction, academic texts, newspapers, social media or recorded conversations) or a mixture of different types of text meant to represent the language as a whole. There are many preexisting corpora that are routinely used in linguistic research, but the internet and modern computing have made it very easy to collect your own corpora for specific questions and to annotate and search these corpora using a variety of software tools. There are also software tools for other types of observational linguistic research, for example, for annotating and analyzing audio and video recordings of speakers and signers.
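
To give a rough idea of what searching a corpus with a computer involves, here is a minimal Python sketch. The two-part toy corpus and the function names tokenize and count_word are invented for illustration; real corpora contain millions of words and are usually searched with dedicated corpus tools.

```python
import re
from collections import Counter

# Invented toy "corpus" with a spoken and a written part (illustration only).
toy_corpus = {
    "spoken": "Irregardless of what they said, we went anyway. We went regardless.",
    "written": "The committee proceeded regardless of the objections that were raised.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_word(word: str, corpus: dict[str, str]) -> dict[str, int]:
    """Count how often `word` occurs as a whole token in each part of the corpus."""
    return {
        part: Counter(tokenize(text))[word.lower()]
        for part, text in corpus.items()
    }


hits = count_word("irregardless", toy_corpus)
print(hits)
```

Because the search works on whole tokens, a hit for regardless is not counted as a hit for irregardless, or the other way round.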

Corpora also allow us to study variation across regions, and they are special in that, for languages that have a written record, they allow you to study variation across time retrospectively — you can collect texts published at different points in the past to study earlier stages of a language, or change across different stages. You can also study differences in different types of text. We discussed this in regard to the word irregardless. If you checked the British National Corpus, a collection of 90 million words of written language (books, newspapers, magazines, essays, academic research papers, etc.) and 10 million words of recorded spoken language, you would find that the word occurs exactly twice — once in written language and once in spoken language. In comparison, if you checked the Corpus of Contemporary American English (an ever-growing corpus, which, in the version we used here, contains about 336 million words of written language and 114 million words of spoken language), it occurs 38 times — 23 times in spoken language and 15 times in written language.

Since the corpora themselves, as well as their spoken and written parts, are of different sizes, it is useful to calculate relative frequencies. Other fields often use percentages, which in linguistics would mean “occurrences per 100 words”. But since most words are rare, this is not a useful measure. Instead, linguists use relative frequencies such as “per million words” or even “per ten million words”. If we use the latter, we get the following frequencies:

British English
- spoken: 1
- written: 0.1

American English
- spoken: 2
- written: 0.44

Clearly, then, the word is more frequent in spoken than in written language in both varieties, and it is generally more frequent in American than in British English.
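
The relative frequencies above can be recomputed from the raw counts and the approximate corpus sizes given earlier. The following Python sketch shows the arithmetic; the function name per_ten_million and the layout of the data are our own choices for illustration, not part of any corpus tool.

```python
# Raw counts of "irregardless" and approximate corpus sizes as reported in the text:
# (occurrences, number of words in that part of the corpus)
counts = {
    ("British English", "spoken"): (1, 10_000_000),
    ("British English", "written"): (1, 90_000_000),
    ("American English", "spoken"): (23, 114_000_000),
    ("American English", "written"): (15, 336_000_000),
}


def per_ten_million(occurrences: int, corpus_size: int) -> float:
    """Return the relative frequency as occurrences per ten million words."""
    return occurrences * 10_000_000 / corpus_size


relative = {key: per_ten_million(n, size) for key, (n, size) in counts.items()}

for (variety, mode), freq in relative.items():
    print(f"{variety}, {mode}: {freq:.2f} per ten million words")
```

With these figures, the spoken rate comes out higher than the written rate in both corpora, and the American rates are higher than the British ones. Small differences from the values listed above are due to rounding.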

If we are interested in the mental representation of language, we can also draw on techniques from behavioural psychology and conduct experiments. You might measure language users’ reaction times and reading times for words and sentences, or ask participants to listen to words that are mixed with white noise. Some experiments use eye-tracking to measure people’s eye movements while reading a text, watching a signer, or listening to a speaker. It’s even possible to use neural imaging techniques like electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) to observe brain activity during language processing.

When you’re starting out in linguistics, it’s often really exciting to use the scientific method to think about grammar, as you start to see that grammar is not just a set of arbitrary rules to memorize so you sound “proper”. Even if we’re not peering through a microscope wearing a lab coat, the tools of language science allow us to make systematic observations of how humans use language. And we can interpret those observations to draw conclusions about the human mind.

Subsection

CC-BY-NC-SA 4.0. Adapted from Catherine Anderson, Bronwyn Bjorkman, Derek Denis, Julianne Doner, Margaret Grant, Nathan Sanders, and Ai Taniguchi, Essentials of Linguistics. 2nd ed., with rewriting and additions by Anatol Stefanowitsch.
