Letters from the linguists: the evolution of computational linguistics
by Maria Krzyzak
Human language and computers: how well do they go together? Horror stories of chatbots gone rogue may come to mind, but the way machines interact with human language has come a long way in the last 60-odd years.
Computational linguistics (CL), the field which combines human language and computer science, is relatively new to the game. If you’re scratching your head wondering what this all means, you’re probably not alone. But you might not realize that CL has already seeped into so many parts of our lives.
CL first came about in the 1950s with attempts to automatically translate Russian into English, and what grew out of those first attempts is what we now know as machine translation. How far we’ve come since then! Now you can see the applications of CL in the personalized ads you receive, the AI assistant on your phone, spell check, and then some.
Briefly speaking, CL consists of two components. The engineering component attempts to automate linguistic processes, like in the examples above, to make human-computer interaction easier. The scientific component contributes to our theoretical understanding of how language works. Combining computers and human language gives us new perspectives on how humans acquire and learn languages and how our brains process all of this.
So, how difficult is it to get a computer to process, understand or generate human language? Think back to your English classes, learning all about parts of speech (nouns, verbs, prepositions and so on). Now imagine trying to formulate all the possible combinations of parts of speech to cover all the ways we might say something in English. This was the initial approach to CL. Rules were developed to reflect the grammar of a language. And while rule-based systems may cover a great many instances, how can we formulate a rule to tell whether “Well, what a surprise!” is genuine shock or sarcasm?
Or consider the following: person A asks person B, “Do you have a dog?” and B replies, “I have a cat”. Most people would understand that B doesn’t have a dog. But where is that stated in the answer? How do you compute something that is understood but never said?
Cases of ambiguity like this show that rule-based approaches simply can’t reflect everything that goes on in our brains. Not to mention cases where sentences don’t follow standard grammar rules but still make sense. Fast forward 60 years and a lot has changed in this field. While nouns, pronouns and whatnot still play an important role, rule-based approaches are no longer at the forefront of CL, and the field has become much more effective at reflecting real-life language usage. How, you ask? Brace yourself for two buzzwords… Data and statistics.
Thanks to the internet, there’s a wealth of easily accessible linguistic data, from news reports to language on social media. Combine this data with the increase in computer processing power and you get more sophisticated learning models. CL now uses statistical models based on how likely words and phrases are to appear in particular contexts. Models learn the probability of words occurring together and make predictions about what comes next, an example being the predictive text on your phone. Where rule-based models offer a binary verdict on whether a sentence is grammatical or not, statistical models can tell us how likely a piece of language is to appear in a given context.
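To make the idea of predictive text a little more concrete, here is a minimal sketch in Python of one of the simplest statistical models, a bigram model, which counts how often one word follows another. The tiny corpus and the function names are invented purely for illustration; real predictive text is trained on vastly more data and typically uses far more sophisticated models.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with the word that comes directly after it.
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the raw counts for `word` into probabilities of the next word."""
    following = counts.get(word.lower())
    if not following:
        return {}  # the model has never seen this word
    total = sum(following.values())
    return {w: c / total for w, c in following.items()}


def predict_next(counts, word):
    """Suggest the most probable next word, or None if the word is unknown."""
    probs = next_word_probabilities(counts, word)
    return max(probs, key=probs.get) if probs else None


# A toy corpus, made up for this example.
corpus = [
    "i have a cat",
    "you have a cat",
    "i have a cat at home",
    "do you have a dog",
]

model = train_bigrams(corpus)
print(next_word_probabilities(model, "a"))
print(predict_next(model, "a"))
```

Notice that the model never says “dog” is wrong after “a”. It only says that, on the data it has seen, “cat” is more likely. That is the shift from a binary grammatical-or-not verdict to a likelihood.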
Given how much computational linguistics deals with the intricacies of human language, it might surprise you to know that until recently, CL was predominantly carried out by computer scientists. But many more linguists are now on the scene. And let’s not forget the cognitive psychologists, anthropologists, neuroscientists, mathematicians and more.
With CL being more interdisciplinary, it incorporates a wider variety of knowledge and ideas. What this means is we can approach social and commercial issues in more innovative ways. The melting pot that is computational linguistics has created ingenious solutions and will, no doubt, create many more in the future. That’s a brief intro to CL – now try and spot the 101 (or more) ways it has affected your life.
‘Letters from the linguists’ is a collection of thought-provoking articles about language and new technologies, contributed by Phrasee’s team of AI language technicians.
Phrasee’s language technicians program language generation frameworks to produce marketing copy that’s authentic to a brand’s voice. By combining art with science, and linguistics with artificial intelligence, they build custom language models that optimize marketing performance and drive more revenue for global brands.