Letters from the linguists: how we know languages
By Trevor Beers
Languages have laws, much like the laws of physics that say a dropped object will fall. There are laws of English you simply can't break, and linguists are always developing theories to explain how every native speaker knows them.
The sentence ‘Joe picked up the yellow’ doesn’t make any sense. Obviously. This article is about why that’s so obvious.
If you take a step back and think about the process you go through to come up with coherent language, you’ll realize you don’t actually do that much. You pick the words you want to use, but you never think, ‘okay, next I need a preposition’.
It’s hard to figure out what goes on in the background as we produce language. We can’t see words being strung into sentences in the same way we can watch a cell split in two under a microscope. So when it comes to studying language, we’re stuck looking at outputs and working backwards.
Here’s a look at one theory of how language knowledge is stored in our minds and how we come up with fluent communication.
It’s been real, Chomsky ✌️
If you’ve read anything about AI, you’ve probably come across the term ‘neural network’. An artificial neural network is modeled (loosely) after the human brain, which is made up of neurons joined by synapses. Our friends in psychology and linguistics think our knowledge of language may also be stored in a network structure.
In our minds we have tons of entries, little language neurons, stored in a multidimensional space: everything from single words to fixed phrases to templates that we use to make sentences. Each entry has tags that store information like meaning and where it can and can’t be used in a sentence. The more stored information entries have in common, the more closely related they are in the network. Our language generation process is nothing more than combining these entries.
An individual word is the perfect example of a ready-to-use entry. We can plug a word right into a bigger template (if it fits what the bigger template is looking for):
I like _______.
“cats” → I like cats. ✔️
“dogs” → I like dogs. ✔️
“and” → I like and. ❌
Clearly the last one is wrong. This template wants to combine with network entries that are closely related in that they are ‘likeable’—entries like ‘cats’, ‘bananas’, or ‘music’. We know the slot in the last example isn’t properly filled, which is why we would never produce it (let alone think it makes sense).
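To make the idea concrete, here’s a minimal sketch in Python of how a template might check its slot. Everything here is illustrative: the tag names, the tiny dictionary of entries, and the template itself are made up for the example, not how any real system (Phrasee’s included) actually stores language.

```python
# Illustrative sketch: entries carry hypothetical tag sets, and the
# 'I like ___' template only accepts entries tagged as 'likeable'.

ENTRIES = {
    "cats":  {"noun", "likeable"},
    "dogs":  {"noun", "likeable"},
    "music": {"noun", "likeable"},
    "and":   {"conjunction"},  # no 'likeable' tag, so it can't fill the slot
}

def fill_template(word, required_tag="likeable"):
    """Return the sentence if the word's tags satisfy the slot, else None."""
    if required_tag in ENTRIES.get(word, set()):
        return f"I like {word}."
    return None

print(fill_template("cats"))  # I like cats.
print(fill_template("and"))   # None
```

The point is just the mechanism: the template doesn’t list every acceptable word; it states a constraint, and any entry whose stored information matches can drop in.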
Words are pretty fixed in form, but some of our bigger entries aren’t complete and have empty slots—something like ‘X is the new Y’. The creators of Orange is the new Black took advantage of this one.
One of the constraints for the ‘X is the new Y’ phrase seems to be that the two entries filling the empty slots need to be closely related in the network. That’s why ‘red is the new black’ works, but ‘pumpkin spice is the new industrial revolution’ doesn’t.
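One crude way to picture ‘closely related in the network’ is overlap between the information stored on each entry. The sketch below, with made-up feature sets and an arbitrary threshold, lets ‘X is the new Y’ fire only when the two fillers share enough features:

```python
# Illustrative only: 'relatedness' as overlap between hypothetical
# feature sets. Features and threshold are invented for the sketch.

FEATURES = {
    "red":   {"color", "adjective"},
    "black": {"color", "adjective"},
    "pumpkin spice": {"flavor", "noun"},
    "industrial revolution": {"historical era", "noun"},
}

def relatedness(x, y):
    """Jaccard overlap of the two entries' feature sets."""
    fx, fy = FEATURES[x], FEATURES[y]
    return len(fx & fy) / len(fx | fy)

def x_is_the_new_y(x, y, threshold=0.5):
    """Fill the template only if the fillers are related enough."""
    if relatedness(x, y) >= threshold:
        return f"{x} is the new {y}"
    return None

print(x_is_the_new_y("red", "black"))  # red is the new black
print(x_is_the_new_y("pumpkin spice", "industrial revolution"))  # None
```

‘Red’ and ‘black’ share everything, so the template accepts them; ‘pumpkin spice’ and ‘industrial revolution’ share almost nothing, so it refuses, which is roughly the intuition behind why the second phrase sounds off.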
Thanks for reading—now go appreciate how cool it is that you don’t need a dictionary to know ‘I avocado you’ is just wrong.
‘Letters from the linguists’ is a collection of thought-provoking articles about language and new technologies, contributed by Phrasee’s team of AI language technicians.
Phrasee’s language technicians program language generation frameworks to produce marketing copy that’s authentic to a brand’s voice. By combining art with science, and linguistics with artificial intelligence, they build custom language models that optimize marketing performance and drive more revenue for global brands.