Creuset of Ideas
Wednesday Linguistics: Data

2006-05-3 @ 9:43

Many of you probably don’t know this, but linguists like to make up data. Maybe not all of them, but some mainstream theories really go for that sort of thing, especially in syntax. I was reminded of this recently by a post by The Language Guy.

To me, coming from a “hard” science (or, as one of my profs put it, “inhumanities”), this is ridiculous. Sure, as speakers of the language, we can make examples up. No problem with that. But it’s when linguists start making up examples of ungrammatical utterances that things go bad. Most just rely on their own (or a very few people’s) intuition to see if what they wrote works or not. They don’t bother checking whether it is actually said or not.

And what is worse, if someone comes up with a counterexample, they’ll label it an error, poetic or strangely dialectal. And discard it. Yep, discard real data in favour of made-up stuff.

I never trusted made-up examples. In fact, when I studied linguistics, we were taught the importance of real data. Our teacher would never have accepted research that wasn’t supported by a corpus (spoken or written). Not that we didn’t have any use for linguistic intuition, quite the contrary: intuition was the root of the investigation into meaning. We were taught to start from our intuition based on what we had observed, posit a hypothesis as to the meaning and effect of a given structure, then test the hypothesis against the corpus data and go back to step one for refinement of the idea.

Instead of just saying this is grammatical and this isn’t, we actually looked for examples of things that would usually be deemed ungrammatical, and tried to figure out why they were actually used, and to see what unexpected utterances tell us about the syntax and semantics. The idea was not to ask why x is grammatical or not, but in which context and with what meaning or sense x can be uttered as grammatical.

A simple example. Any grammarian or linguist will tell you that with how one uses the to-infinitive and with why, the bare infinitive. That may be true, about 99.44% of the time, but an analysis of corpus data will show that it is not always the case, for example: “How tell her?”

This cannot be a ‘performance error’ for it is repeated twice on the same page (and one would hope the proof-reader would have picked it up), and is written by a native speaker with quite a good grasp of the language (George Orwell, Burmese Days). Nor is it a great puzzle: the speaker wanted to convey a specific thought different from what “How to tell her?” would have evoked. In this instance, the goal of the linguist is to find out what that difference is, where it comes from, and hence what that left-out to stands for.
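For the curious, here is a rough sketch of how one might go looking for such cases in a corpus. It is only an illustration under stated assumptions: the function name count_patterns, the short verb list and the sample text are all made up for the sketch (the sample is a placeholder to exercise the code, not data), and a real study would run over an actual corpus with a part-of-speech tagger rather than a hand-written list of verbs.

```python
import re

# Placeholder text, invented only to exercise the function below.
# In practice this would be the full text of a real corpus.
SAMPLE_TEXT = (
    "How tell her? He did not know how to tell her. "
    "Why tell her at all? How tell her, indeed?"
)

# Assumed list of base-form verbs; a real search would use a POS tagger.
VERBS = ["tell", "say", "explain"]


def count_patterns(text, verbs=VERBS):
    """Count how/why followed by a bare infinitive or a to-infinitive."""
    verb_alt = "|".join(re.escape(v) for v in verbs)
    patterns = {
        "how + bare infinitive": rf"\bhow\s+(?:{verb_alt})\b",
        "how + to-infinitive": rf"\bhow\s+to\s+(?:{verb_alt})\b",
        "why + bare infinitive": rf"\bwhy\s+(?:{verb_alt})\b",
        "why + to-infinitive": rf"\bwhy\s+to\s+(?:{verb_alt})\b",
    }
    return {
        label: len(re.findall(pattern, text, flags=re.IGNORECASE))
        for label, pattern in patterns.items()
    }


if __name__ == "__main__":
    for label, count in count_patterns(SAMPLE_TEXT).items():
        print(f"{label}: {count}")
```

The counts alone prove nothing, of course. Their use is to point you to the unexpected hits, which you then have to read in context to work out what the writer meant.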

Now, isn’t it much more fun to try and figure out what Orwell meant by “How tell her?” than to discard it as ungrammatical?

What do you think?