{"id":272,"date":"2024-09-18T08:44:52","date_gmt":"2024-09-18T06:44:52","guid":{"rendered":"https:\/\/linguistica.info\/b\/lei\/?page_id=272"},"modified":"2025-06-27T10:55:54","modified_gmt":"2025-06-27T08:55:54","slug":"1-2-studying-language-scientifically","status":"publish","type":"page","link":"https:\/\/linguistica.info\/b\/leiwp\/toc\/1-introduction\/1-2-studying-language-scientifically\/","title":{"rendered":"1.2 Studying language scientifically"},"content":{"rendered":"<p>We said that linguistics is the scientific study of human language and discussed what we mean by \u201chuman language\u201d. Let us now explain what we mean by \u201cscientific study\u201d. When we say that linguistics is a science, that doesn\u2019t mean you need a lab coat and a microscope to do linguistics \u2014 although there are branches of linguistic research that use laboratory instruments, such as phonetics (see Chapter 3) or psycholinguistics. What it means is that we think about language in a scientific way.<\/p>\n<p>The scientific way of thinking about language involves systematic, <strong>empirical study<\/strong>. The word <em>empirical<\/em> means that we base our ideas about language on data that we gather by observing how people use their language in natural settings, or by giving them linguistic tasks in a laboratory and recording their responses. In this respect, linguistics is no different from other sciences: entomologists observe the life cycles and habitats of insects, chemists observe how substances interact, and linguists observe how people use language. Just like entomologists and chemists, linguists aim for an accurate description of the phenomenon they\u2019re studying. And like other scientists, linguists strive to make observations that are not value judgments. 
If an entomologist observes that a certain species of beetle eats leaves, she\u2019s not going to judge that the beetles are eating the wrong food and tell them that they\u2019d be more successful in life if only they ate what ants eat. The same is true of linguists \u2014 we do not go around telling people how they should or shouldn\u2019t use language. Or at least, that is the ideal. Of course, like all scientists, and like all humans, linguists have biases that often prevent us from reaching this ideal. And unlike entomologists or chemists, we are not impartial observers of organisms or substances different from ourselves: we are part of the very thing we study. We are users of one or more languages and members of one or more language communities, with all the cultural biases that other members of these communities have. Thus, it is more difficult for us to adhere to the scientific ideal, and we must try harder to do so.<\/p>\n<p>What makes things even more difficult is that language communities tend to have their own traditions of thinking and talking about language, and those traditions can be at odds with the scientific approach to language. In many language communities, there is a <strong>prescriptive<\/strong> tradition of telling people how they should or should not use language. This approach seems natural to members of these communities, in part because it is woven into their education systems, and they may find the purely <strong>descriptive<\/strong> approach that linguists strive for irritating.<\/p>\n<p>To illustrate the difference between the two approaches, take the way that plurals are formed in English. A first approximation of a description of English plurals could be the following:<\/p>\n<blockquote><p>Adding <em>-s<\/em> (or <em>-es<\/em>) to a noun allows 
it to refer to more than one instance of what the noun refers to \u2014 for example, <em>apple\/apples<\/em>, <em>book\/books<\/em>, <em>dog\/dogs<\/em>, <em>virus\/viruses<\/em> or <em>formula\/formulas<\/em>.<\/p><\/blockquote>\n<p>This is not a complete description yet, but it accounts for a large part of the linguistic behavior of English speakers forming plurals.<\/p>\n<p>A prescriptive statement, in contrast, would look like this:<\/p>\n<blockquote><p>Because the word <em>virus<\/em> is derived from Latin, you should pluralize it as <em>viri<\/em>, not <em>viruses<\/em>.<\/p><\/blockquote>\n<p>First, note that this statement is wrong even in the part where it talks about Latin \u2014 there is no attested plural form of <em>virus<\/em> in the Latin texts we have, but since it is a neuter noun, the expected Latin form would be <em>vira<\/em>. Second, the Latin form is irrelevant anyway: when used by a speaker of English, <em>virus<\/em> is an English word, not a Latin word, and speakers of English use the plural form <em>viruses<\/em>. The only speakers who say <em>viri<\/em> are people who have let themselves be convinced by prescriptivists against their own better judgment.<\/p>\n<p>Of course, there can be situations where different speakers do different things. Take the word <em>formula<\/em> \u2014 it was included in the descriptive statement above as an example of a word whose plural is formed by adding <em>-s<\/em>, and in fact, many speakers of English use this plural. However, others use the plural <em>formulae<\/em>, which is, indeed, the Latin plural. What do we do as linguists in this situation? Do we tell people that <em>formulae<\/em> is correct because that is the Latin form? 
Or do we tell people that <em>formulas<\/em> is correct because that is how plurals are normally formed in English? The answer is: we do neither. It is not our job to tell anyone how to use their language, but to observe and describe how they use it naturally. Thus, we would first state that some speakers use <em>formulas<\/em> and some use <em>formulae<\/em>, and we would then try to determine whether this variation follows a pattern: for example, whether it depends on the region a speaker is from or on the kind of setting they are in.<\/p>\n<p>So when we\u2019re doing linguistics, our goal is to make descriptive, empirical observations of language. In doing so, we face two problems, a practical one and a theoretical one. The practical problem concerns the availability of data. If we are interested in a particular question, such as what the plural of the English word <em>formula<\/em> is, it would be very inconvenient if we had to walk around and wait for speakers to talk about more than one formula \u2014 it is a rare word even in the singular form. Linguists get around this problem by collecting vast amounts of natural language data in electronic form, which we can then search for words (or other linguistic phenomena) using a computer. We could also run a simple experiment and ask a number of speakers directly. This is easy enough for a simple question, but it becomes more and more difficult as our questions become larger and more complex. Linguists do rely on experiments quite a bit, but experiments are time-consuming and expensive.<\/p>\n<p>The theoretical problem concerns the interpretation of data: as linguists, we are interested not just in the specific linguistic behaviors that people display in a specific situation, but in the subconscious linguistic knowledge that guides these behaviors. That knowledge cannot be observed directly. 
We cannot just cut open language users\u2019 heads in the way our entomologist friend might dissect a beetle, and even if we could, we would not find any linguistic knowledge there. Thus, we have to infer a model of the subconscious knowledge of language users from their behavior (in the wild or in the laboratory). This is a very difficult task indeed.<\/p>\n<h2>Metalinguistic knowledge as a source of empirical data<\/h2>\n<p>One solution that many linguists propose to both of the problems mentioned above is to access our own <strong>metalinguistic awareness<\/strong> as fluent speakers or signers of a language. As mentioned above, as linguists we are part of what we study \u2014 why not turn this into an advantage? We already have the subconscious knowledge that we are after, so the argument goes; all we have to do is access it.<\/p>\n<p>This is an attractive idea, and to some extent we can make it work. Here\u2019s an example of accessing your metalinguistic awareness. Say you want to create a new English word for a character in a game. Are you going to call your cute little creature a <em>blifter<\/em> or a <em>lbitfer<\/em>? Neither of those forms exists in English, but they both use sounds that are part of the sound system of English. Yet, you probably have a strong feeling that <em>blifter<\/em> is an okay name for your new creature, while <em>lbitfer<\/em> is a pretty terrible name. Notice that your sense that <em>lbitfer<\/em> is wrong is not based on prescriptive ideas \u2014 it\u2019s not that it sounds rude or you\u2019ll get in trouble for combining those sounds that way. It is based on an intuitive knowledge that it just \u2026 can\u2019t happen. 
You\u2019ve made a descriptive observation that <em>lbitfer<\/em> is not a possible word in English. From that observation, we can conclude that <em>lbitfer<\/em> violates some part of the subconscious knowledge of the English language that fluent language users have.<\/p>\n<p>Many linguists use the term <em>mental grammar<\/em> for this subconscious knowledge, which includes knowledge not just of what we colloquially call grammar, but also of sounds, words and parts of words. In these terms, we could say that the word <em>blifter<\/em> is <strong>grammatical<\/strong> and the word <em>lbitfer<\/em> is <strong>ungrammatical<\/strong> in English. An ungrammatical word, phrase or sentence is something that just can\u2019t exist in a particular language: the mental grammar of that language does not contain any rules or representations that would allow language users to produce it. Thus, grammaticality isn\u2019t about what <em>actually exists in<\/em> a language; it\u2019s about what <em>could<\/em> exist. In this example, neither <em>blifter<\/em> nor <em>lbitfer<\/em> exists in English, but although they contain the same sounds, <em>blifter<\/em> could be an English word and <em>lbitfer<\/em> couldn\u2019t.<\/p>\n<p>It\u2019s often useful to compare similar words, phrases or sentences to try to access our metalinguistic awareness. Let\u2019s look at another example of observing what\u2019s possible, this time from what we would usually call grammar. 
Here are two similar sentences:<\/p>\n<div class=\"example\">\n<div class=\"number\">(1a)<\/div>\n<div class=\"sentence\"><em>Sam compared the forged painting with the original.<\/em><\/div>\n<\/div>\n<div class=\"example\">\n<div class=\"number\">(1b)<\/div>\n<div class=\"sentence\"><em>Sam compared the forged painting and the original.<\/em><\/div>\n<\/div>\n<p>As fluent language users of English, we intuitively know that both of these are possible sentences in English (they are both grammatical).<\/p>\n<p>Now let us turn these sentences into questions \u2014 something that we, as fluent language users of English, can do without giving it any thought at all:<\/p>\n<div class=\"example\">\n<div class=\"number\">(2a)<\/div>\n<div class=\"sentence\"><em>Did Sam compare the forged painting with the original?<\/em><\/div>\n<\/div>\n<div class=\"example\">\n<div class=\"number\">(2b)<\/div>\n<div class=\"sentence\"><em>Did Sam compare the forged painting and the original?<\/em><\/div>\n<\/div>\n<p>Observing those two questions, we can see that, again, both (2a) and (2b) are acceptable in English.<\/p>\n<p>Now let\u2019s try a different kind of question:<\/p>\n<div class=\"example\">\n<div class=\"number\">(3a)<\/div>\n<div class=\"sentence\"><em>What did Sam compare the forged painting with?<\/em><\/div>\n<\/div>\n<div class=\"example\">\n<div class=\"number\">(3b)<\/div>\n<div class=\"sentence\"><em><strong>*<\/strong>What did Sam compare the forged painting and?<\/em><\/div>\n<\/div>\n<p>Comparing these two sentences, we intuitively know that (3a) is possible (grammatical), but (3b) is not. Linguists normally use an asterisk (*) in front of a sentence to indicate that they have determined it to be ungrammatical based on their own metalinguistic awareness. 
Inventing a word, phrase or sentence and then determining whether it is grammatical based on our own metalinguistic awareness is often referred to as producing <strong>grammaticality judgments<\/strong> or <strong>acceptability judgments<\/strong>.<\/p>\n<p>Some linguists treat such grammaticality judgments as empirical data. They would state that the two similar sentences discussed here are both possible as declarative statements (1a, b) and as yes-no questions (2a, b), but when we try to make a wh-question out of them, the result is acceptable for the first one (3a) but not for the second one (3b). Having made that observation, they would then try to figure out what in the mental grammar accounts for it. Why is (3a) grammatical but (3b) isn\u2019t?<\/p>\n<p>While this procedure works in very simple cases like the one presented here, we should use it extremely sparingly. It should be used only in cases where our judgments are very clear and consistent, and where other users of the language share them without hesitation. This is rarely the case once we look at anything but the simplest phenomena. And even then, we should remain skeptical \u2014 it is very easy to convince yourself and others that something that you want to be true is actually true. Overall, you should not regard grammaticality judgments as empirical data, but as a shortcut that allows us, in very restricted situations, to skip the process of collecting actual empirical data.<\/p>\n
<h2>Proper sources of empirical data<\/h2>\n<div id=\"attachment_290\" style=\"width: 241px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-290\" class=\"size-full wp-image-290\" src=\"https:\/\/linguistica.info\/b\/lei\/wp-content\/uploads\/2024\/09\/Ch1_Fig4_hoodie-231x300-1.png\" alt=\"Grey zip-up hoodie on white background.\" width=\"231\" height=\"300\" \/><p id=\"caption-attachment-290\" class=\"wp-caption-text\">Figure 1.4.1. Hoodie.<\/p><\/div>\n<p>Even in simple cases, you might not want to rely on your own grammaticality judgments, and, of course, you cannot rely on your own judgments if you are dealing with a language, or a language variety, that you do not speak fluently. Instead, you can (and in the second case must) use a <strong>survey<\/strong> \u2014 a questionnaire that you distribute on paper or as an online form \u2014 to gather grammaticality judgments.<\/p>\n<p>We can also use surveys for other purposes, for example, for learning about regional language variation. We can <strong>elicit<\/strong> the words that people use for particular items in different places. From survey data we know that some people call the item in Figure 1.4.1 a <em>sweatshirt<\/em>, other people call it a <em>hoodie<\/em>, and people in Saskatchewan call it a <em>bunny hug<\/em>. If you\u2019re studying regional and social variation, you might also gather data using in-person <strong>interviews<\/strong>, in which you could ask questions like, \u201cDoes the \u2018u\u2019 in <em>student<\/em> sound like the \u2018oo\u2019 in <em>too<\/em> or the \u2018u\u2019 in <em>use<\/em>?\u201d<\/p>\n<p>As mentioned above, linguists also use huge collections of natural language data to make observations about language. Such a collection is called a <strong>corpus<\/strong>. 
Corpora may contain a specific type of text (for example, fiction, academic texts, newspapers, social media or recorded conversations) or a mixture of different types of text meant to represent the language as a whole. There are many preexisting corpora that are routinely used in linguistic research, but the internet and modern computing have made it very easy to collect your own corpora for specific questions and to annotate and search these corpora using a variety of software tools. There are also software tools for other types of observational linguistic research, for example, for annotating and analyzing audio and video recordings of speakers and signers.<\/p>\n<p>Corpora also allow us to study variation across regions, and for languages that have a written record, they have the special advantage of allowing you to study variation across time retrospectively: you can collect texts published at different points in the past to study earlier stages of a language, or change across different stages. You can also study differences between types of text. For example, you could test the hypothesis that <em>formulae<\/em> was the original plural form in English and that speakers began to replace it with <em>formulas<\/em> as knowledge of Latin became less widespread. You would find that this is not true. In British English, the two plural forms were roughly equally frequent in the 18th and 19th centuries, <em>formulae<\/em> became much more frequent in the 20th century, and <em>formulas<\/em> took the lead in the 21st century. In American English, <em>formulas<\/em> was always by far the more frequent form and remained so throughout the 20th century \u2014 thus, at least for a while, the two plural forms could be considered a dialectal difference. 
This difference holds across all text types. Within American English, however, <em>formulae<\/em> is used around a third of the time in academic writing and prose fiction, but almost never in newspapers, magazines or spoken language. This suggests that register played a role in American English, with <em>formulae<\/em> existing as an alternative to <em>formulas<\/em> only in formal registers.<\/p>\n<p>If we are interested in the mental representation of language, we can also draw on techniques from behavioral psychology and conduct <strong>experiments<\/strong>. You might measure language users\u2019 reaction times and reading times for words and sentences, or ask participants to listen to words that are mixed with white noise. Some experiments use eye-tracking to measure people\u2019s eye movements while they read a text, watch a signer, or listen to a speaker. It\u2019s even possible to use <strong>neural imaging<\/strong> techniques like electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) to observe brain activity during language processing.<\/p>\n<p>When you\u2019re starting out in linguistics, it\u2019s often really exciting to use the scientific method to think about grammar, as you start to see that grammar is not just a set of arbitrary rules to memorize so you sound \u201cproper\u201d. Even if we\u2019re not peering through a microscope wearing a lab coat, the tools of language science allow us to make systematic observations of how humans use language. 
And we can interpret those observations to draw conclusions about the human mind.<\/p>\n<p class=\"authshp\">CC-BY-NC-SA 4.0, Adapted from Anderson, Catherine, Bronwyn Bjorkman, Derek Denis, Julianne Doner, Margaret Grant, Nathan Sanders, and Ai Taniguchi, <em>Essentials of Linguistics. 2nd ed.<\/em>, with rewriting and extensions by Anatol Stefanowitsch.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We said that linguistics is the scientific study of human language and discussed what we mean by \u201chuman language\u201d. Let us now explain what we mean by \u201cscientific study\u201d. 
When we say that linguistics is a science, that doesn\u2019t mean you need a lab coat and a microscope to do linguistics \u2014 although there are [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":9,"menu_order":2,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-272","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/linguistica.info\/b\/leiwp\/wp-json\/wp\/v2\/pages\/272","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/linguistica.info\/b\/leiwp\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/linguistica.info\/b\/leiwp\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/linguistica.info\/b\/leiwp\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/linguistica.info\/b\/leiwp\/wp-json\/wp\/v2\/comments?post=272"}],"version-history":[{"count":35,"href":"https:\/\/linguistica.info\/b\/leiwp\/wp-json\/wp\/v2\/pages\/272\/revisions"}],"predecessor-version":[{"id":2101,"href":"https:\/\/linguistica.info\/b\/leiwp\/wp-json\/wp\/v2\/pages\/272\/revisions\/2101"}],"up":[{"embeddable":true,"href":"https:\/\/linguistica.info\/b\/leiwp\/wp-json\/wp\/v2\/pages\/9"}],"wp:attachment":[{"href":"https:\/\/linguistica.info\/b\/leiwp\/wp-json\/wp\/v2\/media?parent=272"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}