Rewind to the movies made about life in previous centuries. Countries were ruled by monarchs, and kingdoms communicated by sending over their representatives. Each king had a personal translator who was well versed in foreign languages. The translator was thus the bridge between the king and the messenger, establishing a clear channel of communication. Every time the king had to travel to a foreign land, it was therefore essential for the translator to accompany him.
Star Wars has something similar: C-3PO, a super-advanced humanoid robot capable of speaking six million languages. It is also well versed in alien customs and etiquette, which makes encounters with non-human life forms easier. Though highly reliable, a robot is a tedious thing to carry around whenever translation is required.
Similarly, the movie Koi … Mil Gaya is about an alien spaceship summoned to Earth when a scientist plays a particular syllable in different ways to generate a signal that can be used for communication. The exercise shows a translator at work: the received signal is converted into a form the aliens can perceive, and that is what summons them to Earth. Though the presence of a translator is never explicitly mentioned, this is essentially how the communication system works.
The idea of a universal translator is quite simple: it is a device that translates between any two languages in the shortest possible time. Translation is a two-step process: decoding the source text and re-coding it in the target language. In practice, the work is not that simple. Human languages are complex, and the variation is not confined to the choice of language; each language varies in dialect, structure, grammar and sentence formation, requiring the translator to have in-depth knowledge of both the source and the target language. Hence, existing artificial intelligence has not been able to master translation quite yet.
A translator’s purpose is to present an instant translation of any language. A device common to many science fiction works, the universal translator was first mentioned in the 1945 novella ‘First Contact’ by Murray Leinster. In it, extraterrestrials communicate with human beings by means of electromagnetic waves, developing an artificial code that acts as an intermediary, referred to as an ‘interlingua’. This is essentially a two-step translation process: alien language to artificial code, and artificial code to human language.
The TARDIS from Doctor Who is a machine that can travel through space and time and also offers translation services. It uses a telepathic field that automatically translates speech and text. It is slightly complicated, since the field is directly connected to the Doctor, which implies that the Doctor’s abilities control the TARDIS’s availability. That is also why it does not work when the Doctor goes into a coma.
Star Trek essentially uses the universal translator to communicate with aliens. The technology there is one step ahead – the device automatically translates speech into the listener’s primary language, i.e. an English speaker hears the other person speaking English irrespective of the language actually being spoken. In that universe, the Earth version of the translator was invented sometime before the year 2151 and was still at the experimental stage: a handheld device operated by a skilled linguist, who was still required to break down some complicated translations. The device works by scanning brain-wave frequencies and using them as the basis for translation. In real life, a universal translator of this kind seems unrealistic because of the need for telepathy, but scientists have been working diligently towards similar real-world technologies that deal with known languages.
The universal translator of the sci-fi movies is essentially an encoding and decoding machine: not computer software but an actual intermediary enabling communication. The same idea was attempted for real-life translators. Research explored such an intermediary, one that preserves the basic principles of a language, such as its grammar, while using abstraction to strip away each language’s unique features. The processing occurs at the sentence level, taking the context into account. This approach requires only a bilingual dictionary, but it stalls whenever there is ambiguity, which, in real life, means almost every sentence. In practice, translation works better the more data is available on the languages involved. Though there has been tremendous progress in natural language processing, it is in the subtleties of meaning that machines are still lagging. Irrespective of the amount of data provided, we might never be able to achieve full accuracy in this respect.
Language models are the basis of any machine translation system. A language model is a probability distribution over words or word sequences. It gives the probability of a word sequence being valid – not valid in a grammatical sense, but in the sense of resembling how people actually speak or write. A language model gives us an abstract understanding of natural language, and a good machine translation system can be built using language models of different languages.
Language models are of two kinds:
- Simple probabilistic models: These are built by calculating n-gram probabilities, the conditional probability of a word given the few words that precede it. This relies on the Markov assumption: given the present (the preceding words), the future (the next word) does not depend on the more distant past (the rest of the text). There are obvious drawbacks with this approach. Only the preceding n−1 words affect the next word, yet some texts carry meaning from much earlier in the passage that may influence the next word. As n increases, the number of possible word combinations explodes, and calculating and storing all these probabilities becomes a huge task. N-grams that never occur in the training data create a sparsity problem, lowering the granularity of the probability distribution. (A minimal sketch follows this list.)
- Neural network based models: These models have an embedding layer that maps each word to a dense vector of a chosen size, learned during training, which also captures semantic relationships. These vectors provide the needed granularity. In addition, such a language model is effectively a function, eliminating the need to store all the n-gram counts.
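To make the n-gram idea from the first bullet concrete, here is a minimal sketch of a bigram model in plain Python. The toy corpus and function names are invented for illustration, and a real model would also need smoothing to cope with the sparsity problem mentioned above.

```python
from collections import defaultdict, Counter

# Toy corpus; a real model would be trained on millions of sentences.
corpus = [
    "the king sent a messenger",
    "the king sent a translator",
    "the translator spoke many languages",
]

# Count bigrams and the contexts they appear in.
bigram_counts = defaultdict(Counter)
context_counts = Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[prev][nxt] += 1
        context_counts[prev] += 1

def bigram_prob(prev, nxt):
    """P(nxt | prev) under the Markov assumption: only the previous word matters."""
    if context_counts[prev] == 0:
        return 0.0  # unseen context: the sparsity problem in action
    return bigram_counts[prev][nxt] / context_counts[prev]

def sentence_prob(sentence):
    """Probability of a whole sentence as a product of bigram probabilities."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        prob *= bigram_prob(prev, nxt)
    return prob

print(sentence_prob("the king sent a translator"))  # non-zero: all bigrams were seen
print(sentence_prob("the messenger sent a king"))   # 0.0: unseen bigrams, no smoothing
```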
Although neural networks help overcome the sparsity problem, the context problem remains. The first goal is to bring more context words into the prediction; the second is to create an architecture that lets the model learn which context words are more important than others.
The first model is a dense layer and an output layer stacked on top of the Continuous Bag of Words (CBOW) Word2Vec model. It is trained to guess a word from its context by being shown examples of the n words before and after the target word. The context problem still exists in this setup.
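A minimal sketch of that idea, assuming PyTorch and a toy vocabulary; the class name, layer sizes and context window are illustrative only, not a reference implementation of Word2Vec.

```python
import torch
import torch.nn as nn

class CBOWLanguageModel(nn.Module):
    """Guess the middle word from the words around it (CBOW-style)."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # dense word vectors
        self.dense = nn.Linear(embed_dim, hidden_dim)          # dense layer on top
        self.output = nn.Linear(hidden_dim, vocab_size)        # scores for every word

    def forward(self, context_ids):
        # context_ids: (batch, 2n) indices of the words before/after the target word
        vectors = self.embedding(context_ids)   # (batch, 2n, embed_dim)
        averaged = vectors.mean(dim=1)          # "bag of words": order is ignored
        hidden = torch.relu(self.dense(averaged))
        return self.output(hidden)              # logits over the whole vocabulary

# Usage: predict a word from 2 words on each side (context size 4).
model = CBOWLanguageModel(vocab_size=10_000)
context = torch.randint(0, 10_000, (8, 4))      # batch of 8 toy contexts
logits = model(context)                          # (8, 10_000)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10_000, (8,)))
loss.backward()
```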
An improvement is the use of Recurrent Neural Networks (RNNs), which take all previous words into account before choosing the next word. Because they are sequential, training times for long sequences are huge, owing to the lack of parallelisation. The transformer architecture is a solution. A transformer consists of an encoder and a decoder: the encoder takes in the input sentence, vectorises it and transforms it using attention; the decoder does the opposite, going from vector form back to a sentence. The output sentence can be in a different language, making the transformer an ideal candidate for translation.
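The attention step the transformer relies on fits in a few lines. Below is a minimal sketch of scaled dot-product attention in PyTorch, with tensor shapes chosen purely for illustration.

```python
import math
import torch

def scaled_dot_product_attention(query, key, value):
    """Weigh every value by how relevant its key is to the query."""
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)  # query-key similarity
    weights = torch.softmax(scores, dim=-1)                  # importance of each context word
    return weights @ value                                   # weighted mix of the values

# A sentence of 5 word vectors attending over itself (self-attention).
x = torch.randn(1, 5, 64)                 # (batch, sequence length, embedding size)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                          # torch.Size([1, 5, 64])
```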
Machine translation is a natural language processing application used to translate texts between languages, that is, to train the computer to understand languages the way humans do. To build a good machine translation tool, the system also has to master related knowledge such as grammar, language generation and language understanding. Earlier machine translation relied on phrase-based translation, which translates known phrases using supervised learning; accuracy was lost on longer pieces of text.
Three major approaches emerged in machine translation: rule-based machine translation (RBMT, 1970s-90s), statistical machine translation (SMT, 1990s-2010s) and neural machine translation (NMT, 2014-present). RBMT requires expert knowledge of the source and target languages in order to establish the semantic and syntactic rules of translation. It involves a pipeline of natural language processing steps that has to be performed on both languages, and it needs good dictionaries and manually crafted rules; the more intricate the rules, the harder the system is to maintain. SMT treats translation probabilistically: given a sentence in the source language, it searches for the target sentence most likely to be its translation, scoring each candidate by how fluent it is (a language model of the target) and how well it explains the source (a translation model). Errors are hard to fix, and the approach is less reliable for language pairs with very different word order.
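To make the statistical idea concrete, here is a toy sketch of that scoring in Python. Every probability below is invented for illustration; a real SMT system estimates these values from aligned corpora and searches over vastly larger candidate sets.

```python
# Toy SMT scoring: pick the target sentence t that maximises
# P(t) * P(source | t). All numbers are made up for illustration.

language_model = {                      # P(t): how fluent the target sentence is
    "the house is small": 0.20,
    "small the house is": 0.01,
    "the home is little": 0.10,
}

translation_model = {                   # P(source | t): how well t explains the source
    ("das haus ist klein", "the house is small"): 0.30,
    ("das haus ist klein", "small the house is"): 0.30,
    ("das haus ist klein", "the home is little"): 0.15,
}

def best_translation(source, candidates):
    """Return the candidate with the highest combined fluency and adequacy score."""
    def score(t):
        return language_model.get(t, 0.0) * translation_model.get((source, t), 0.0)
    return max(candidates, key=score)

print(best_translation("das haus ist klein", list(language_model)))
# -> "the house is small": both fluent and a good explanation of the source
```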
NMT systems are built as a single network instead of a pipeline of separate tasks. The architecture is sequence-to-sequence, involving an encoder and a decoder: the encoder RNN produces an encoding of the source sentence, and the decoder RNN generates the target sentence conditioned on that encoding.
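A minimal sketch of that encoder-decoder setup in PyTorch, assuming toy vocabulary sizes; a real NMT system adds attention, beam search and far more capacity.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder RNN summarises the source; decoder RNN generates the target from that summary."""

    def __init__(self, src_vocab, tgt_vocab, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the whole source sentence into a single hidden state.
        _, context = self.encoder(self.src_embed(src_ids))
        # Decode the target sentence conditioned on that encoding.
        decoded, _ = self.decoder(self.tgt_embed(tgt_ids), context)
        return self.output(decoded)        # per-step logits over the target vocabulary

# Usage with toy data: batch of 2 sentences, source length 7, target length 5.
model = Seq2Seq(src_vocab=5_000, tgt_vocab=6_000)
src = torch.randint(0, 5_000, (2, 7))
tgt = torch.randint(0, 6_000, (2, 5))
logits = model(src, tgt)                    # (2, 5, 6000)
```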
In 2016, Google changed the algorithm behind its translation tool, transitioning from phrase-based translation to deep learning. Deep learning in artificial intelligence (AI) seeks processes that require very little human involvement. The days of hand-built interlanguages are long gone, and researchers are developing more multilingual learning methods. Deep learning, which is based on neural networks, tends to interpret text in its own way rather than according to a set of rules laid down by humans; the machine, therefore, builds its own representations. This reduced translation errors by roughly 60%.
Today Google and Microsoft can translate between more than 100 languages, reaching near human-level accuracy for a few of them. In reality, we are still far from an instant translator that works effortlessly. The Google Pixel Buds are a silver lining, but they fall short of instant translation and support only a limited number of languages. Still, each leap in development goes a step further than its predecessors. Neural machine translation has a bright future, and as artificial intelligence evolves and neural networks grow more sophisticated, its capabilities will grow with them. But the dream of a universal translator is still a long shot. No device can hope to entirely replace human multilingualism. The future of translation will be a combination of human and NMT, where machines provide a serviceable level of translation combined with the critical thinking, creativity and nuanced interpretation of humans.