Experts agree that current approaches to vocabulary learning are in desperate need of an overhaul. Data science and creativity are the answer.


“Vocabulary is easier to retain when it’s presented logically”

A report by Kevin Feldman and Kate Kinsella expressed alarm at the “scarcity of systematic, intentional vocabulary and language teaching”, noting that the vast majority of vocabulary instruction in schools is random and unplanned.[1]

What’s needed is well-chosen, carefully curated selections of words presented in context – and the latest innovations in machine learning are helping provide this.

Numerous studies have shown that new vocabulary is much easier to retain when it’s presented logically alongside other related words. As Justin Harris et al. put it, “people learn best when information is presented in integrated contexts rather than as a set of isolated facts… a set of words connected in a grocery list is better remembered than the same set of words without context.”[2]

In an academic setting, this means using what Steven A. Stahl calls semantic mapping – sorting words into thematic groups and having children learn them collectively.[3] Doing so makes the learning of new vocabulary significantly faster and much more effective.[4]

“The latest machine learning technology”

The trouble is, organizing the entire English language into themed groups is no minor task. It requires a level of complex linguistic analysis that no lexicographer – let alone teacher – has the time or resources to achieve. It’s hardly surprising, then, that in the face of such a colossal challenge, materials available for vocabulary instruction have been so limited.

Until now, that is. At Mrs Wordsmith, we’re using the latest machine learning technology and corpus linguistics techniques to select, categorize, and curate the 10,000 words that make a significant difference to children’s academic success. Here’s how it works:

1. First, we feed huge quantities of well-formed grammatical text into our machine. These texts are everything from books and newspapers to online articles, as well as radio and television transcriptions that capture the way language is spoken. Variety, quality, and quantity are key.

2. Our machine then puts these words through a series of labelling processes. The words are sorted and assigned a variety of linguistic labels, before being sent back and re-labelled, again and again. The more labels each word has, the more linguistic and contextual information it carries and the better it can be organized.

3. At important stages, our language experts check on the process and make sure the machine is processing words and labels that meet our standards, sort of like inspectors on a production line. This is how our machine learns, and it’s getting smarter every day.

4. With the help of the data contained within each word’s labels, we curate the most useful 10,000 words into intuitive topics such as character or weather. Our machine then converts the words and their labels into vectors – lists of numbers that let each word carry a huge variety of information at once. Words with similar vectors tend to be related, which allows us to find patterns between words and add even more specific labels.

5. The result is a beautifully curated list of 10,000 challenging, useful words that develop children’s comprehension, writing, and analytical skills, and that enhance their achievement across the curriculum. We pass the machine’s output on to our Hollywood artists, who sugarcoat each word with a hilarious illustration that adds context and motivates children to learn. These are the words that make up our 10,000 Word Journey.
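
For the technically curious, here is a minimal sketch of the idea behind steps 2 and 4: words carry labels, labels become vectors, and similar vectors end up in the same topic. It is an illustration only, not the system described above. The words, labels, topic names, and function names are all invented for the example, and cosine similarity is simply one common way to compare vectors.

```python
import math

# Hypothetical labels for a handful of words (invented for illustration).
# Each label is one dimension of the word's vector; 1.0 means "has this label".
WORD_LABELS = {
    "blustery": {"weather": 1.0, "adjective": 1.0, "wind": 1.0},
    "drizzle": {"weather": 1.0, "noun": 1.0, "rain": 1.0},
    "torrential": {"weather": 1.0, "adjective": 1.0, "rain": 1.0},
    "stubborn": {"character": 1.0, "adjective": 1.0, "negative": 1.0},
    "gracious": {"character": 1.0, "adjective": 1.0, "positive": 1.0},
}

# Hypothetical topic "seeds": the label that defines each topic.
TOPIC_SEEDS = {
    "weather": {"weather": 1.0},
    "character": {"character": 1.0},
}


def all_dimensions(*label_maps):
    """Collect every label name used anywhere, in a stable order."""
    names = set()
    for label_map in label_maps:
        for labels in label_map.values():
            names.update(labels)
    return sorted(names)


def to_vector(labels, dimensions):
    """Turn a {label: weight} mapping into a fixed-length list of numbers."""
    return [labels.get(name, 0.0) for name in dimensions]


def cosine(a, b):
    """Cosine similarity: 1.0 for same direction, 0.0 for nothing in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_topic(word_labels, topic_seeds):
    """Assign each word to the topic whose seed vector it is most similar to."""
    dims = all_dimensions(word_labels, topic_seeds)
    seeds = {topic: to_vector(labels, dims) for topic, labels in topic_seeds.items()}
    groups = {topic: [] for topic in topic_seeds}
    for word, labels in word_labels.items():
        vector = to_vector(labels, dims)
        best = max(seeds, key=lambda topic: cosine(vector, seeds[topic]))
        groups[best].append(word)
    return groups


def most_similar(word, word_labels):
    """Return the other word whose vector is closest to the given word's."""
    dims = all_dimensions(word_labels)
    target = to_vector(word_labels[word], dims)
    others = [w for w in word_labels if w != word]
    return max(others, key=lambda w: cosine(target, to_vector(word_labels[w], dims)))


if __name__ == "__main__":
    print(group_by_topic(WORD_LABELS, TOPIC_SEEDS))
    print(most_similar("drizzle", WORD_LABELS))
```

In this toy data, "drizzle" shares two labels with "torrential" and only one with "blustery", so "torrential" comes out as its nearest neighbour. A real system would learn its vectors from large amounts of text rather than from a hand-written table like this one.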


  1. Feldman, K. and Kinsella, K. (2005) Narrowing the Language Gap: The Case for Explicit Vocabulary Instruction. New York: Scholastic.
  2. Harris, J., Michnick Golinkoff, R. and Hirsh-Pasek, K. (2011) Lessons from the Crib for the Classroom: How Children Really Learn Vocabulary. In Susan B. Neuman and David K. Dickinson (Eds.), Handbook of Early Literacy Research (Vol. 3).
  3. Stahl, S.A. (1999) Vocabulary Development. Newton Upper Falls: Brookline Books.
  4. Johnson, D.D., Pittelman, S.D., Toms-Bronowski, S. and Levin, K.M. (1984) An investigation of the effects of prior knowledge and vocabulary acquisition on passage comprehension (Program Report 84-5). Madison, WI: Wisconsin Center for Educational Research, University of Wisconsin.