Word pairs – or collocations – are combinations of words that are statistically likely to appear together in text and in spoken language.

For example, ‘appetite’ is a collocate for ‘voracious’, and ‘plan’ is a collocate for ‘devious’, but the concept also applies to phrases, such as ‘bride and groom’ or ‘fish and chips’.

Collocations are so ingrained that certain phrases sound odd to us when a word isn’t followed by its usual partner, e.g. ‘bride and broom’ or ‘fish and peas’. This is because collocations form part of what is called ‘formulaic language’, which encompasses all the phrases, word pairs and whole sentences that we process as a single unit, as a ‘big word’, instead of as a union of two or more words.

But out of all the words in the English language, how can we know when two words belong together, and which ones constitute a single ‘big word’? Thanks to modern technology, bodies of text – anything from entire novels to 140-character tweets – can be scrutinised and mined for patterns in how English behaves in the wild.
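To give a rough idea of what such mining can look like, here is a minimal sketch in Python that scores adjacent word pairs by pointwise mutual information (PMI), one common measure of how much more often two words appear together than chance would predict. It is an illustration only, not a description of any particular tool: the function name `bigram_pmi`, the `min_count` threshold and the tiny sample text are all assumptions made for the example.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; pairs seen fewer than
    `min_count` times are skipped because their scores are unreliable.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi          # how often the pair occurs
        p_w1 = unigrams[w1] / n_uni    # how often each word occurs alone
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0 means the pair occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# A made-up toy corpus; real analyses use millions of words.
corpus = (
    "the bride and groom ate fish and chips . "
    "the groom had a voracious appetite . "
    "the bride had a voracious appetite for fish and chips ."
).split()

scores = bigram_pmi(corpus)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

On this toy text, ‘voracious appetite’ scores higher than ‘fish and’, because ‘and’ also turns up in other pairings. With a sample this small the numbers mean very little; the point is only to show the kind of counting involved.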

Collocations and formulaic language are key aspects of vocabulary and language learning because studies have shown that we rely on them far more than previously thought. Although experts have not been able to determine the exact percentage of formulaic language in an English conversation between native speakers, there seems to be general agreement that it is astoundingly large.

For example, Oppenheim (2000) found that between 48% and 80% of the data consisted of identical strings, while Erman and Warren (2000) claimed that 52% to 58% of the language they analysed was formulaic. This means that roughly half or less of what we say is actually made up on the spot; the rest consists of pre-made strings of words that we have stored in our long-term memory.

This is why children learning vocabulary must encounter a new word alongside its collocates to gain a full understanding of the various contexts and formulas in which it can be used. Since there is no definitive list or database where this information is compiled, we took matters into our own hands. And the rest is history.


