People like counting.
People like comparing things.
People like to marvel at the supposed uniqueness of English.
Those three trends are exaggerated even further on the internet.
So it’s no wonder that there are websites like this that tout the number of words in English as some precise number (1,013,913 and growing at 15/day), as compared to woeful 2nd place finisher Mandarin.
The problem is that this is complete and utter nonsense, crap, b.s., ridonkulosity, bushwa, bubbe-meises.*
* – Keep this in mind as we move on.
The problem with trying to quantify the number of words in a language is that there is no precise way of defining the two most important things in that sentence – words and language.
What is a word?
What, exactly, counts as a word? We have a general sense – dog is a word, bnick is not, but the challenge with really figuring out what counts as a word is highlighted by some of the examples in the sentence above beginning with nonsense.
Does nonsense count as a word? Or is it the same as sense?
What about dog and dogs?
Or dog and hot dog?
How many words is flame, flames, inflame, inflammable, flammable?
Or grandfather, great-grandfather, great-great-grandfather and so on?
English, like almost every other language, has morphology, which is a system of building words from meaningful word parts. Loosely, morphology can be broken down into inflection (run -> runs), derivation (run -> runner) and compounding (with varying degrees of coherence, e.g., cab driver, toothpick) with lots of gray area in between.
There is no way of deciding which of these word forms count as a word in a way that is not completely arbitrary. Lest you think this is a minor factor, these would easily change your answer by close to an order of magnitude as you can see from the flame or grandfather examples. Almost every word is subject to morphology and there is no principled way of deciding when the result should be counted as another word or not.
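To make the swing concrete, here is a minimal sketch in Python. The lemma and root labels are hand-assigned assumptions for illustration (not the output of any real morphological analyzer), and reasonable people could label them differently, which is rather the point.

```python
# Toy data: each surface form maps to (lemma, root).
# "lemma" undoes inflection only; "root" also undoes derivation and compounding.
# All labels are hand-assigned, hypothetical choices made for this example.
FORMS = {
    "flame":       ("flame", "flame"),
    "flames":      ("flame", "flame"),
    "inflame":     ("inflame", "flame"),
    "inflammable": ("inflammable", "flame"),
    "flammable":   ("flammable", "flame"),
    "dog":         ("dog", "dog"),
    "dogs":        ("dog", "dog"),
    "hot dog":     ("hot dog", "dog"),
    "run":         ("run", "run"),
    "runs":        ("run", "run"),
    "runner":      ("runner", "run"),
}


def count_words(forms, policy):
    """Count 'words' in the toy data under one of three counting policies."""
    if policy == "surface":   # every distinct form is its own word
        return len(set(forms))
    if policy == "lemma":     # inflected forms collapse into one word
        return len({lemma for lemma, _ in forms.values()})
    if policy == "root":      # derived and compound forms collapse too
        return len({root for _, root in forms.values()})
    raise ValueError(f"unknown policy: {policy}")


for policy in ("surface", "lemma", "root"):
    print(policy, count_words(FORMS, policy))
```

The same eleven forms come out as 11, 8 or 3 "words" depending on the policy, and nothing in the data says which policy is the right one.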
2) Synonyms, homonyms and heteronyms, oh my!
Crap is a verb. Crap is a noun. Crap means a lie and crap means feces. I guess you can count that all as one word, but what about same spelling and a more radically different meaning, e.g., bank (river) and bank ($)? Or how about same spelling, different meaning and different pronunciation, e.g., desert (sand) and desert (leave)? Or if spelling is your guide, what about different spelling of the same meaning, e.g., advisor v. adviser?
Indeed, almost every permutation of same v. different meaning, spelling and pronunciation can be found among (amongst, wink wink) words:
[Figure: Some of the interesting many-to-many relationships between meaning, spelling and pronunciation]
As with morphology, there is no non-arbitrary way of deciding what counts as a separate word here.
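The same arbitrariness can be sketched in a few lines of Python. The entries and the rough pronunciation respellings below are illustrative assumptions, not dictionary data; the only thing that changes between counts is which properties you decide make two entries "the same word".

```python
from typing import NamedTuple


class Entry(NamedTuple):
    spelling: str
    pronunciation: str  # rough respelling, for illustration only
    sense: str


# Hypothetical mini-lexicon built from the examples in the text.
ENTRIES = [
    Entry("crap", "krap", "lie"),
    Entry("crap", "krap", "feces"),
    Entry("bank", "bank", "river edge"),
    Entry("bank", "bank", "financial institution"),
    Entry("desert", "DEZ-ert", "sandy place"),
    Entry("desert", "dih-ZERT", "to leave"),
    Entry("advisor", "ad-VY-zer", "one who advises"),
    Entry("adviser", "ad-VY-zer", "one who advises"),
]


def count_by(entries, *fields):
    """Count entries that differ on at least one of the chosen fields."""
    return len({tuple(getattr(e, f) for f in fields) for e in entries})


print(count_by(ENTRIES, "spelling"))
print(count_by(ENTRIES, "sense"))
print(count_by(ENTRIES, "spelling", "pronunciation"))
print(count_by(ENTRIES, "spelling", "pronunciation", "sense"))
```

Counting by spelling gives 5, by sense 7, by spelling plus pronunciation 6, and by all three 8, all from the same eight entries.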
Moving on to the next word in our little rant, b.s. Are you counting abbreviations and acronyms in your list and if so, how? B.S. is pretty conventionalized, but certainly not as much as laser, though more so than POTUS, though that depends on whether you’re working in politics or not. And that’s not to say what the status is of EKG, an acronym you certainly hear more than the real word itself.
As above, whatever dividing line you select will be completely arbitrary. The number here probably isn’t too high – maybe on the order of tens of thousands – but it serves to highlight another parallel problem, that of new words:
Did you like the word redonkulosity? I just made it up. Or at least, I thought I just made it up, but it does show up in Google with 4,000 hits. That was after thinking I had sort of created the novel word ridiculosity – spell check says it isn’t one – but Merriam-Webster says it is.
The fact is that there is no definitive way of deciding whether a new word should count as, well, a word. New entries in the OED or MW are decided by a person, or group of people, according to some general guidelines relating to frequency of use, place of use and so on. These are not guidelines handed down from on high, as much as we revere these dictionaries, but are, again, arbitrary. They even vary from dictionary to dictionary, resulting in something like a two-fold difference in the size of different dictionaries.
Next up, bushwa, a word I didn’t even know until I read this article, where Geoff Nunberg debunks the charlatans at Global Language Monitor, albeit briefly. That’s because the word has been going out of style since about 1950. That’s a relatively recent decline as compared to other words, like emmet or pismire, both words for ant, which went out of use hundreds of years ago.
So not only do we not have a concrete way of deciding when to add a word, we similarly have no way of deciding when to remove a word from our list, either. Given that languages are in a constant state of flux, this creates a moving target wherein the exit criteria should be linked to the entrance criteria, which are themselves arbitrary. So, again, more arbitrariness.
Finally, bubbe-meises, my favorite in the list, which is a word in the English dictionary. It is clearly a borrowing, in this case from Yiddish roughly meaning Old Wives’ Tale, but with a bit more of a sense of dismissal. Words are borrowed into English not with a single leap, but gradually, at different rates for each word depending on pronunciation, frequency, semantics and so on.
In counting the words of English, you will have to somehow define yet another cut-off point here when figuring out what to count and what not to count.
7) Specialty Words
And last, but not least, indeed perhaps most, in terms of how it would affect your final number, we have the millions upon millions of words associated with different scientific specializations. Not to say that Critical Theory hasn’t come up with its own unique vocabulary, but no one quite compares to Chemists and Entomologists in outdoing everyone else in word creation.
There are 350,000 species of beetles on this planet, each of which can be given its own name. And that’s just beetles. There are up to 1 billion different species of bacteria. If many of the species in the Mammalia class each get their own word, so too with at least some of the Prokaryote kingdom, no?
A similar problem exists with chemicals and all the permutations and combinations that lead to a near-infinite number of possibilities wherein the only real limits are those of chemistry and not language. How, pray tell, would that work in your word count?
Oxygen, certainly yes. What about Dihydrogen monoxide? Or its synonyms, Dihydrogen Oxide, Hydrogen Hydroxide, Hydronium Hydroxide and Hydric acid? Get to know these chemicals, but good luck in figuring out how to count their names.
Clearly there are some tough (and by tough, I mean completely arbitrary) choices to be made in terms of counting words. What about language?
What is a Language?
I speak English. You (probably) speak English. We certainly don’t speak the same precise language in terms of word knowledge. Which one do we use? There are so many different levels at which a language can be defined that it’s impossible to declare a definition of what the limits of any given language are.
First, for a language like English, you have national differences. The language of America will have different words than that spoken in Canada, Australia and the UK, not to mention what people speak in India and Nigeria.
And even within a single country, you have regional dialects that have different lexicons:
[Figure: The different ways of saying roundabout in America]
And on down to the specific person, or idiolect, where we each have our own way of speaking English, with different lists of words in our heads.
If you want to move away from the individual person and try to define the English language that is spoken in the world, it’s not clear what that really means. Is that the sum total of all words across all self-reported English speakers? That’d be a mess.
You may try to go for some principled definition, e.g., the words in all books published in English, but that, too, is problematic for whom it excludes and the pride of place it gives to literacy, the literary and editors.
Thus, as with the definition of word, you’re stuck with an arbitrary definition of what a language is.
Without a clear definition of word and without a clear definition of language you kind of sort of have no practical way of counting anything of anything. And we’re not talking about requiring a level of exactitude that is within some reasonable margin of error. We’re talking potentially orders of magnitude difference depending on how you decide.
So, yes, by all means, count the number of words of English and say it’s 1,019,430, so long as you’re comfortable saying that’s +/- 1,000,000 words.
So why all the pissy vitriol on my part?
For starters, one website in particular (mentioned above) has raised ignorance on this issue to new levels of awfulness in the seeming hopes of generating profit. They say, on the issue of word counting:
Though GLM’s analysis was the subject of, the recent Google/Harvard Study of the Current Number of Words in the English Language is . At the time the New York Times article on the historic threshold several dissenting linguists as claiming that “even Google could not come up with” such a methodology. Unbeknownst to them Google was doing precisely that.
As if saying the word google magically makes everything better. The website must not have even bothered reading the NYT article as it doesn’t try to address one single issue mentioned. Indeed, the group behind this word count of English project is nothing more than a PR company – they seemingly take pride in elevating the amount of in our daily lives. It’s hard enough debunking the myths of well-meaning scientists, let alone people purposefully obfuscating the truth to make a buck.
I do appreciate the efforts of Computer Scientists in their endeavors to quantify anything that may need to be quantified. Indeed, there are certain branches of linguistics where exact answers aren’t obtainable and we must be okay with approximations and probability distributions over possible answers. I get it and understand the concept behind uncertainty and quantifying uncertainty.
At a certain point, though, there needs to be some recognition that the answer you’re providing is not meaningful in the sense that human beings would consider meaningful. So by all means, use these methods if you need an estimate on the amount of memory you’ll need in some program that indexes “all the words of English” but don’t pretend that you have calculated the “number of words in English” in any human sense of those words.
Don’t get me wrong: I’m all for finding the seed of truth in things that are otherwise considered garbage science. It’s actually sort of a little hobby of mine to revisit debunked ideas and mine them for interesting truths.
But this nonsense, crap, bushwa, b.s., bubbe-meises, I simply can’t stand.