Which of these sentences is correct: "I have drunk the water"? "I have drank the water"? Maybe "I have drunken the water"? This is just one of many difficult questions about English, and sometimes the dictionary just doesn't offer much help. When I have a tough question about English, I use books.google.com to find the answer. Now I'll teach you the technique; it's really pretty easy.
Books.google.com, abbreviated "bgc", lets you search a gigantic database of texts, ranging from the great works of the masters to the latest romance novels. What that means is that it's the perfect place to get a bead on what English writers are writing, as well as what English speakers are saying.
After all, English is what English speakers speak and write. If an English speaker violates what's written in some grammar textbook, that doesn't mean the speaker is wrong. It means the textbook is wrong (or just not as accurate as it could be). Want a "textbook" that's never wrong, never inaccurate, which constantly updates as the language does, and which is a lot more interesting than most textbooks? That "textbook" is the collective writings of English speakers, and books.google.com is how you read it.
CONCRETE EXAMPLE: THE PAST PARTICIPLE OF "DRINK"
Let's look at the example of the past participle of "drink". It's rare enough that, one day when I heard someone use it, I realized I wasn't sure what the correct participle was. Remember, the past participle is different from the past tense, which is undoubtedly "drank". The past participle is the form used to say "I have done something". For example, the past participle of "give" is "given", while the past tense is "gave". As for "drink", it turns out that all three possibilities have been used, in different time periods: "drunk", "drank", and "drunken". "Drunk" is the most common nowadays, but earlier it was "drank". "Drunken" is the least common, but has been written fairly recently too. Let's see how I figured all this out.
I went to books.google.com and ran a search for "I have drunk the", with quotes. Results: 655 hits. Just to make sure they're legitimate, I skimmed through the first page of results, which includes excerpts from the texts with the phrase used in context. Some of the examples on that first page (the first 20 hits) included Plato (translated, of course), Dale Carnegie, and Oscar Wilde. The dates ranged from 1826 to 2004. All that in just the first 20 hits out of 655; this is definitely a valid candidate for the past participle of "drink"!
Next, I ran a search for "I have drank the" (with quotes). Results: 467 hits. Although the first page didn't have any famous authors I recognized, it still included lots of very legitimate uses of English. The dates ranged from 1808 to 1893. Well, maybe this form is a little more antiquated than "drunk"!
Finally, "I have drunken the" (with quotes). Results: 106 hits. No authors I recognized on the first page, but all the examples there still looked good. Dates in the first 20 hits ranged from 1884 to 2003.
Based on just this, it's pretty reasonable to infer that "drunk" is the most popular answer nowadays, and that "drank" was the most popular in the 1800s but has since faded. "Drunken" seems to be making a comeback, but looking over the first 20 examples, a lot of the "comeback" is in compilations of old poets' works, so maybe it's not actually said very often by people in "real life".
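To make the comparison concrete, here is a minimal Python sketch that tabulates the three searches. The hit counts and date ranges are the ones reported above; the function names and the idea of expressing each form as a share of the combined hits are just one illustrative way to line the numbers up, not part of the original research.

```python
# Hit counts and first-page date ranges, copied from the three searches above.
results = {
    "I have drunk the": {"hits": 655, "earliest": 1826, "latest": 2004},
    "I have drank the": {"hits": 467, "earliest": 1808, "latest": 1893},
    "I have drunken the": {"hits": 106, "earliest": 1884, "latest": 2003},
}


def share_of_hits(results):
    """Return each phrase's fraction of the combined hit count."""
    total = sum(r["hits"] for r in results.values())
    return {phrase: r["hits"] / total for phrase, r in results.items()}


def rank_by_hits(results):
    """Return the phrases ordered from most hits to fewest."""
    return sorted(results, key=lambda p: results[p]["hits"], reverse=True)


shares = share_of_hits(results)
for phrase in rank_by_hits(results):
    r = results[phrase]
    print(f'"{phrase}": {r["hits"]} hits ({shares[phrase]:.0%}), '
          f'dates {r["earliest"]} to {r["latest"]}')
```

Raw hit counts only say how often a phrase was printed; the date ranges are what hint at which form is current, which is why both are kept side by side.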
If I wanted to refine this initial guess further, there are several directions I could go. Here are three ways I could get an even better feel for the three possible past participles:
- Go into "advanced book search" and restrict search results to more recent years.
- Vary the exact search term, like searching for "he had drunk the" or "I haven't drank any", and so on. Generally, the more specific the search term, the fewer search results you'll get. However, you don't want to make the search so general that you get false leads.
- Browse further into the search results, beyond just the first twenty.
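Varying the search term by hand gets tedious, so a short Python sketch can generate the combinations for you. The word lists below are only examples built from the phrases already mentioned; swap in whatever subjects and objects suit your question. The quotes are included in each string because the searches in this article are exact-phrase searches.

```python
from itertools import product

# Example building blocks; these particular lists are an assumption.
prefixes = ["I have", "he had", "I haven't"]
participles = ["drunk", "drank", "drunken"]
objects = ["the", "any"]


def build_queries(prefixes, participles, objects):
    """Return every prefix + participle + object combination as a quoted phrase."""
    return [f'"{p} {v} {o}"' for p, v, o in product(prefixes, participles, objects)]


queries = build_queries(prefixes, participles, objects)
for q in queries:
    print(q)
```

Each printed line can be pasted straight into the search box. Keeping the object word ("the", "any") in the phrase is what guards against the false leads mentioned above.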
Let's use this example to create a general technique.
THE GENERAL TECHNIQUE
In creating this general technique for English research using books.google.com, I was heavily influenced by my training as a mathematician. In mathematics, there are some problems where it's impossible or impractical to find an exact answer. Instead, you find an approximate answer using what's called the "predictor-corrector" method.
The "predictor-corrector" method of problem-solving goes like this. First you make a prediction about the answer, as best as you can with your current knowledge, then use the prediction to get better knowledge. Then, repeat. With your increased knowledge, your prediction will probably be slightly better. With a better prediction, you'll be able to increase your knowledge even more. Then, repeat again. And so on, each time getting a better approximation to whatever you're seeking. When do you stop? That's up to you. The more iterations you do, the more precise your final answer becomes. So you stop when you feel like your answer is good enough.
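For readers who haven't met the term, here is a small Python sketch of one standard predictor-corrector scheme from numerical analysis, Heun's method. The choice of Heun's method and of the test equation dy/dt = y are assumptions made purely for illustration; the description above is about the general idea, not this particular scheme.

```python
import math


def heun_step(f, t, y, h):
    """Advance y by one step of size h using Heun's method."""
    slope_now = f(t, y)
    predicted = y + h * slope_now              # predictor: a plain Euler guess
    slope_next = f(t + h, predicted)           # slope at the predicted point
    return y + (h / 2) * (slope_now + slope_next)  # corrector: average the slopes


def solve_heun(f, y0, t0, t1, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with predictor-corrector steps."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = heun_step(f, t, y, h)
        t += h
    return y


def solve_euler(f, y0, t0, t1, steps):
    """Same problem using only the predictor, for comparison."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def growth(t, y):
    """dy/dt = y, whose exact solution at t = 1 with y(0) = 1 is e."""
    return y


print("Euler only:", solve_euler(growth, 1.0, 0.0, 1.0, 10))
print("Heun:      ", solve_heun(growth, 1.0, 0.0, 1.0, 10))
print("Exact:     ", math.e)
```

The corrected answer lands noticeably closer to the exact value than the uncorrected guess does, which is the whole appeal: guess, check the guess against better information, and adjust.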
Having read the above, you can probably already "predict" how the general scheme goes for researching English on books.google.com.
Assume you have some question or problem about English.
STEP 1: With whatever knowledge you have, guess an answer as best you can.
STEP 2: Now, go to books.google.com and run searches based on your guess. See if you can confirm or disprove the guess. If you guessed that a certain word or phrase is obsolete, restrict the search to only recent years, in "advanced book search". If your guess seems to be spot-on, delve deeper into the search results and see whether it still seems that way. If your guess is that a word is only used in certain regions or countries, see whether the search results confirm that or disprove it. And so on.
The overall goal here should be to try to disprove your guess. Your guess is almost certainly wrong, just because language is so wonderfully complex, but you might be guessing pretty close to the truth. If you can find evidence against your guess, that'll allow you to refine your guess and make it better.
STEP 3: If you're confident that the guess is close enough for your purposes, you're now done. Otherwise, go back to Step 1 and repeat the whole process.
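The three steps can be sketched as a loop. The Python below is a toy, not a tool: it reuses the latest first-page dates from the "drink" searches, starts from the guess that all three forms are still in current use, and treats "latest sighting before 1950" as counterevidence. That cutoff year and all the function names are assumptions chosen for the illustration.

```python
# Latest publication year seen among the first-page hits for each form,
# taken from the "drink" searches earlier in the article.
LATEST_YEAR_SEEN = {"drunk": 2004, "drank": 1893, "drunken": 2003}


def find_counterevidence(guess, cutoff_year=1950):
    """Step 2: return the forms in the guess that the evidence speaks against.

    cutoff_year is an arbitrary assumption about what counts as "current".
    """
    return {form for form in guess if LATEST_YEAR_SEEN[form] < cutoff_year}


def revise(guess, counterevidence):
    """Back to Step 1: make a better guess by dropping what was disproved."""
    return guess - counterevidence


def research(initial_guess, max_rounds=10):
    """Repeat guess, test, revise until no counterevidence turns up."""
    guess = set(initial_guess)
    for round_number in range(1, max_rounds + 1):
        counterevidence = find_counterevidence(guess)
        if not counterevidence:        # Step 3: close enough, stop here
            return guess, round_number
        guess = revise(guess, counterevidence)
    return guess, max_rounds


final_guess, rounds = research({"drunk", "drank", "drunken"})
print(sorted(final_guess), "after", rounds, "rounds")
```

In real research the "find counterevidence" step is you reading search results, and a further round of closer reading could still prune the surviving guess (recall that many recent "drunken" hits were reprints of old poetry).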
A GRAMMATICAL EXAMPLE
A friend asked me on some forums about the following phenomenon. Lately, he'd been hearing a strange construction used a lot. He'd heard people saying things like: "It needs cleaned", or "This chore needs done", and so on. The question is, is this legitimate English grammar, or is it illiterate? How does it work?
(I was pretty sure right away what the answer was: it's legitimate English grammar, because it's spoken by native English speakers. Being naturally spoken by native English speakers makes it valid English, by definition. The real problem here was to find convincing evidence to demonstrate its validity to my friend, who wasn't as conscious of the truths of descriptivism vs. prescriptivism. And, for my own sake as much as his, to find out how the construction works.)
My initial guess (which will turn out to be wrong, below) was that the construction in question was always proper; and, more precisely, I guessed that the verb "to need" can be followed by the past participle of any other verb, to indicate that what's needed is for the attached verb to be performed. For example, "This room needs cleaned" means that this room needs to be cleaned. Basically, wherever there is [need] + "to be" + [past participle], my guess was that the "to be" could be omitted. The result, I guessed, would suggest a sense of colloquialism, casualness, informalness, etc.
(By the way, my spellchecker just flagged "informalness" as a misspelling. Books.google.com proves the spellchecker wrong, though, with 56 "informalness" results of which the first 20 look perfectly fine.)
Since my friend provided the example "It needs cleaned", I used that, in quotes, as a search term. If my guess was wrong, the search would provide no results, or results with something visibly wrong with them.
I got eight results. The top result was Dean Koontz, a famous author and certainly a literate English role model. The other seven results were also perfectly legitimate. The only other one worth mentioning was from a book called "This Thing Don't Lead to Heaven" by Harry Crews, from 1970. A glance at the excerpt there confirmed how this device offers a slight colloquial flavor to the reader.
Next, I started varying the search term. Here are the results:
- "it needs cleaned" - 8 hits which look good
- "it needs done" - 68 hits, but many are in another sense, e.g.: "I'll do the job which management has decided it needs done." (Here "it" means "management", not "the job".) Because of the unexpected ambiguity, I had to delve in deeper, but I did find there are indeed plenty of legitimate appearances of the construction in question.
- "it needs washed" - 13 hits, but only 1 good, most are books about grammar
- "it needs fixed" - 19 hits, 5 good
- "it needs changed" - 8 hits, 4 good
- "it needs stopped" - 1 hit, good
- And many similar searches with no good hits
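The tallies above are easier to compare as proportions. Here is a minimal Python sketch that does the arithmetic; the numbers come straight from the list, "it needs done" is left out because no count of good hits was recorded for it, and the function names are just illustrative.

```python
# (search term, total hits, hits judged to be genuine uses), from the list above.
# "it needs done" is omitted: 68 hits, but the number of good ones wasn't counted.
searches = [
    ("it needs cleaned", 8, 8),
    ("it needs washed", 13, 1),
    ("it needs fixed", 19, 5),
    ("it needs changed", 8, 4),
    ("it needs stopped", 1, 1),
]


def good_fraction(total, good):
    """Fraction of a search's hits that were genuine uses of the construction."""
    return good / total


def totals(searches):
    """Return (all hits, all good hits) summed over every search."""
    return (sum(t for _, t, _ in searches), sum(g for _, _, g in searches))


for term, total, good in searches:
    print(f'"{term}": {good} of {total} good ({good_fraction(total, good):.0%})')

all_hits, all_good = totals(searches)
print(f"overall: {all_good} of {all_hits} good")
```

A low proportion doesn't make the good hits any less real, but it does show how much of a raw hit count can be noise, such as grammar books quoting the phrase in order to discuss it.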
The searches that came up empty show that my initial guess was too broad: the construction evidently doesn't turn up in print with just any verb. But in any case, the construction is certainly "valid". My friend wasn't just mishearing, and the people he heard weren't just babbling gibberish. The phenomenon isn't very common (at least in writing), but what occurrence it does have in writing gives it indisputable support as being in use. And English which is in use by English speakers and writers is the only type of English there really is.
TRAPS AND PITFALLS
As the above example illustrated, specifically with the "it needs done" search term, one of the biggest things to be alert for is the dreaded false positive. False positives can appear in the form of an unexpected construction which creates the search term you entered without invoking the phenomenon you're researching.
False positives can also include cases where Google's scanners misread a text. Sometimes people refer to these as "scannos", although that word seems to be technical jargon and doesn't itself have support on bgc. If in doubt, or if you're dealing with a very small number of search results, follow the search link to see the entire page, and then it's usually quite obvious whether or not there's a scanning error. Remember, the search results page attempts to produce excerpts from the results in text format, but the actual text is an image file.
When researching very short words, there's a big problem with noise. Someone once asked me to check whether the word "ent" had entered the language outside the context of Tolkien's Middle-earth. Since that word's so short, there are tens of thousands of false positives where it's used as an acronym or a word in another language.
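If you've copied a batch of result excerpts into a file, a crude filter can throw out some of this noise before you start reading. The Python sketch below is one possible heuristic, not a solution: the pattern is an assumption about what a genuine word use looks like, and the sample excerpts are made up for the demonstration. It rejects the all-caps acronym and fragments left by hyphenated line breaks, but it can't tell English from another language.

```python
import re

# "ent" or "Ent" standing alone: not all caps, and not glued to letters,
# apostrophes, or hyphens on either side.
STANDALONE_WORD = re.compile(r"(?<![\w'-])[Ee]nt(?![\w'-])")


def looks_like_word_use(excerpt):
    """Return True if the excerpt contains "ent" as a standalone, non-acronym word."""
    return bool(STANDALONE_WORD.search(excerpt))


# Invented sample excerpts, for illustration only.
samples = [
    "An ent strode out of the forest.",
    "The ENT clinic is on the second floor.",
    "The results were quite differ-ent this time.",
    "She sent the entire document.",
]

for excerpt in samples:
    verdict = "keep" if looks_like_word_use(excerpt) else "drop"
    print(f"{verdict}: {excerpt}")
```

Anything the filter keeps still needs a human look; it only shrinks the pile.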
Also, sometimes Google restricts preview of books, so you can't see the search term as it's used in the book, even though the book shows up among the results. Probably something to do with copyright disputes. Sometimes, if you really need to see the content of that book, you can find a working preview at Amazon. Some books on Amazon even have a "search inside this book" option. But you can't rely on this, because many don't.
WORD VERIFICATION AT WIKTIONARY
Although they don't use my general method, since it's designed for more general language research, the folks at English Wiktionary use books.google.com when a word in their database is disputed. In case you don't know, Wiktionary is a sister project of Wikipedia: a user-editable dictionary. The criterion they use for verifying a word is very simple. It basically boils down to finding three independent citations of the word, spanning at least a year. Bgc isn't the only source used for citations, but it's the source of an overwhelming majority of them.
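A rule of that shape is easy to express as a check. The Python sketch below is only an approximation of the idea, not Wiktionary's actual procedure: it treats "independent" as "by different authors", which is a crude assumption, and it takes the required time span as a parameter rather than fixing it. The citation records are placeholders.

```python
from typing import NamedTuple


class Citation(NamedTuple):
    author: str
    year: int


def meets_criterion(citations, min_span_years, min_independent=3):
    """Check for enough independent citations spread over enough time.

    Independence is approximated by counting distinct authors (an assumption).
    """
    authors = {c.author for c in citations}
    if len(authors) < min_independent:
        return False
    years = [c.year for c in citations]
    return max(years) - min(years) >= min_span_years


# Placeholder citations, for illustration only.
found = [
    Citation("Author A", 1990),
    Citation("Author B", 1995),
    Citation("Author C", 2001),
]
print(meets_criterion(found, min_span_years=3))
```

The interesting judgment calls (is this a genuine use, is this author really independent of that one) still have to be made by a person reading the excerpts.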
IN OTHER LANGUAGES
Books.google.com does include books in other languages. I've used it before as a quick way of confirming that certain Japanese phrases were valid, or to find example sentences for certain Japanese words. I'm not sure how complete Google's collection of foreign language texts is, though, so tread with a little caution if you use bgc for any hardcore research.
DESCRIPTIVISM AND PRESCRIPTIVISM
The best thing about using bgc for English research is that it's just about a perfect blend of descriptivism and prescriptivism. It's descriptive because it'll show you how English is really written, and by extension how it's really spoken. Whether or not some construction respects the careful rules of your third-grade English teacher, if it's a valid construction, it'll be in bgc.
On the other hand, bgc is just a slight bit prescriptive, because unlike the main Google search engine, in order for writing to get on bgc, it has to be published somewhere by someone. So, whatever you find on bgc, you can rest assured that somewhere, some editor approved it. That, or the author was really devoted and self-published, but even that takes a lot more effort than getting writing on the main Google search engine.
I like doing English research on bgc. It makes me feel like a language rebel, working without the support of dictionaries or grammar books. It makes me feel more "close" to the language :)
3 comments:
Your initial question (which one is correct, I've drunk or I've drank?) both amused and surprised me ... it always amazes me the kind of mistakes native English speakers make. I mean, I for sure make many mistakes as well, and I've been learning English for quite a long time now. Here in Spain we learn the most common irregular English verbs, and most students seem to be pretty acquainted with them ... I think most of us would have trouble with many other aspects of the language, but undoubtedly not with a drank/drunk choice.
I sometimes think this has to do with how English is taught in the UK or in the US ... Another example is the trouble some native speakers seem to have with "their / there". To me the difference is evident, since we learned this from a grammar point of view and I think we don't have trouble with spelling here even if both sound the same ... On the other side, for instance, many Spaniards have trouble with "haber" and "a ver", which sound the same but have completely different meanings, and I'm quite sure that a foreigner learning Spanish would not even think of having trouble with ...
Very interesting comment! I never had trouble with their/there, but I see a lot of people who do. Another big one is "its"/"it's"/"its'". Even I screw that one up sometimes ;)
Hi!
I love your articles and find them informative and most interesting. Thank you for being who you are and sharing. If you have a book coming out, let me know. I would like to purchase anything you write!
Keep up the great works.
Thelma Harcum
thelmaharcum@yahoo.com
I followed you on Twitter and placed your link on my Twitter site as well.
Best regards,
Thelma