15 May 2017
9 AI language generation fails
AI language generation can be awesome.
If it couldn’t, Phrasee wouldn’t use it to generate email marketing subject lines that consistently outperform those written by humans.
We are in a privileged position that allows us to do this well: we have an amazing team that knows what they’re doing, a deep understanding of what makes language (especially marketing language) work, and AI language generation tech that’s way ahead of the industry curve.
But when AI programmers and designers get in over their heads, things can get pretty interesting… and fails sometimes abound.
Here are a few of our all-time favourites:
1) Microsoft’s racist Twitterbot
No AI fail list would be worth its salt without including Microsoft’s MAJOR fail in 2016 with its Twitterbot, “Tay”.
Designed to generate human-sounding tweets and interact with millennials, Tay’s learning algorithm didn’t account for the alarming number of horrible trolls on Twitter. Tay turned into an aggressive, racist, and sexist disaster in under a day, as the Twitterverse egged her on and things quickly escalated.
Microsoft had to pull the plug in less than 24 hours because the bot spiralled out of control. Several of the tweets Tay sent out are pretty amusing, considering how incredibly absurd they are. For instance, when asked whether British comedian Ricky Gervais was an atheist, the bot responded by connecting Gervais with Adolf Hitler. Alarmingly, "she" seemed to really like talking about her support of Hitler… as well as the Holocaust, building a wall between the U.S. and Mexico, and how Trump would be a great president.
2) The Chinese version of Tay
Similar to Tay, Microsoft also created a chatbot to operate on the highly popular Chinese messaging app WeChat. Named Xiaobing, the AI chatbot was designed to have another young, female ‘voice’ and interact with users chatting about whatever they wanted.
After the Tay fiasco, an editor for Tech in Asia decided to run his own experiment to see whether Xiaobing had the same racist leanings as Tay, chatting with her in Chinese and translating the results for the article. Turns out, she didn't lean the same way as Tay and was a bit more on guard whenever she detected an unsavoury topic.
While chatting with Xiaobing wasn’t as big a fail as Tay, we still wouldn’t call it a smashing success…
Now, we can’t speak Chinese, but we find it hard to believe that, “I’m a wolf from the north that has fallen in love with you, little sheep”, is a typical way to introduce yourself in any language.
3) Facebook Messenger’s underwhelming chatbot minions
Just a couple of months ago, Facebook reportedly started scaling back its Messenger chatbot efforts after the programs failed to fulfil 70% of users’ requests. The social network had originally promised a large ecosystem of highly loquacious chatbots, but after about six months of testing out the systems, Facebook is now saying it will only offer a small selection of bots designed to handle a limited range of cases.
When the bots were first unleashed on the world, a Gizmodo writer took it upon himself to test out the skills of Poncho, a cute little cat who was hoping to replace your weather app. Instead of providing clear answers about the weekend's weather, however (the one thing the bot was supposed to do), the conversations went round and round.
4) Creepy Christmas carols
Sometimes home assistants can say the darndest things…
When a toddler asked Alexa to play a song called "Digger Digger", Alexa replied, "You want to hear a station for porn?" Alexa then went on to spew a string of X-rated filth, and a viral video sensation was born. We imagine no parent would want Alexa around after that.
6) Medium’s mess of an article experiment
While AI can help guide us in tailor-making and optimising language quite effectively, it isn't yet advanced enough to write complete articles on its own… We're still waiting for that day!
To see whether a neural network could be taught to write in an article format, a writer for The Atlantic fed material from an article he was working on into an "advanced" neural network to see what it would produce. He was probably thinking, "If a robot's eventually going to take my job anyway, why not get it to help me become a better writer first?"
Now, although there were a couple of poetic lines in the finished article, it wasn’t a piece of writing anyone would actually want to read, let alone be able to make sense of.
Case in point:
“…and more like modernings in our computer. Of course, this is not a human work. That’s the web that’s selfies that would be the moon is that they’ll make it online. That’s not only more importantly changed and most of them in all of those and questions about their factors. For example, as far several cases, all this kind of regulations for information—that’s the person who painted itself that’s the technological change of human process.”
Exactly! This article is not human work.
7) Tinderbots galore
If you're on Tinder, chances are you've come across a bunch of bots. Sometimes you can tell right away that the other 'person' you're talking to is just a computer program, but the bots are getting smarter, and it may now take you a couple of messages to realise you've been wasting your time.
8) ASOS’ suspiciously stiff customer service
Supposedly, this isn't AI talking. But if that's true and these messages were coming from real ASOS employees, then that makes them even more of a fail. Customers complained that robotic-sounding responses on ASOS Australia's Facebook page seemed to dodge their questions and concerns. The disconnect between questions and answers, along with the stiff-sounding language, made customers wonder whether the retailer was using bots.
While customers thought the responses from ASOS sounded automated, a tweet from the ASOS Australia Twitter account assured users, "We don't do automated responses – we respond this way because we can't discuss customers' details publicly!" If that's true, then the company needs to start training its employees to sound more like real people…
9) Phrasee’s rom-com script experiment
Even we here at Phrasee have missed the mark a time or two.
While our subject line AI language generation tech performs flawlessly, we’ve learned the hard way that it can’t do everything (yet).
Last Valentine's Day, we put our tech to the task of generating romantic comedy scripts for our blog's readers, and it… well… didn't go well.
But hey, if people like us didn’t try out new stuff once in a while, then we’d never make AI language generation advances, and that’s what we’re all about!