Klaus-Dirk Schmitz:
There are basically two different approaches to machine translation. On the one hand, there are so-called rule-based methods. These first analyze the sentence to be translated and identify individual words, word groups, subordinate clauses and sentences. Once this structure is recognized, they transfer the individual words into the target language and rebuild the structure there. So there are dictionaries for the source language and for the target language, plus a translation dictionary. Google Translate works very differently. Behind the program is a statistical system with a huge amount of data from texts that have already been translated. The system tries to draw conclusions from this data about what sentences look like and how to find translations. This has a great advantage: I can apply this system to any language for which I have a certain amount of data. That is why Google Translate is available in many different languages. With rule-based procedures, you have to set up new dictionaries and define rules for each new language. That is why these systems exist for the major languages, but not for "exotic" ones.
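
The contrast between the two approaches can be illustrated with a deliberately tiny Python sketch. Everything in it is an assumption made for illustration: the three-word English-to-Spanish dictionaries, the single reordering rule, and the hand-made list of aligned word pairs are hypothetical and do not reflect how Google Translate or any commercial rule-based system is actually built.

```python
from collections import Counter, defaultdict

# Hypothetical toy resources for a rule-based English -> Spanish system.
POS = {"the": "DET", "red": "ADJ", "house": "NOUN"}
BILINGUAL = {"the": "la", "red": "roja", "house": "casa"}


def rule_based_translate(sentence):
    # 1. Analysis: tag every source word with its part of speech.
    tagged = [(w, POS.get(w, "UNK")) for w in sentence.lower().split()]
    # 2. Transfer: look every word up in the translation dictionary.
    transferred = [(BILINGUAL.get(w, w), tag) for w, tag in tagged]
    # 3. Rebuild: apply a target-language rule (ADJ NOUN -> NOUN ADJ).
    output, i = [], 0
    while i < len(transferred):
        word, tag = transferred[i]
        if tag == "ADJ" and i + 1 < len(transferred) and transferred[i + 1][1] == "NOUN":
            output += [transferred[i + 1][0], word]
            i += 2
        else:
            output.append(word)
            i += 1
    return " ".join(output)


def train_statistical(aligned_pairs):
    # Count how often each source word was seen with each target word.
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    return counts


def statistical_translate(sentence, counts):
    # Pick the most frequently observed translation; keep unknown words.
    output = []
    for word in sentence.lower().split():
        if word in counts:
            output.append(counts[word].most_common(1)[0][0])
        else:
            output.append(word)
    return " ".join(output)


# Hypothetical word alignments taken from already translated texts.
pairs = [("house", "casa"), ("house", "casa"), ("house", "hogar"),
         ("red", "roja"), ("the", "la")]
counts = train_statistical(pairs)

print(rule_based_translate("the red house"))           # la casa roja
print(statistical_translate("the red house", counts))  # la roja casa
```

The rule-based function needs a hand-written dictionary and word-order rule for every language pair, while the statistical function only needs more aligned data. The toy statistical version gets the word order wrong because it translates word by word; real statistical systems learn from far larger units and far more data.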

Why don't machine translations work so well?
Klaus-Dirk Schmitz: The problem with any translation is ambiguity. Humans do not notice the ambiguities, because they mostly recognize the meaning from the context. In English, for example, it is often not clear whether a lowercase word is a noun or a verb. People usually recognize this immediately; the computer does not. Nor can this problem be solved with more data. In a study, Google Translate was tested in 2010, 2011, 2012 and 2013 with the same text. In 2010 the translation was bad, in 2011 it got better, in 2012 it got even better and in 2013 it got worse again. It takes huge amounts of statistical data to draw conclusions about how a word is translated. But the larger the amount of data, the more ambiguities arise, and the results get worse again.

How are the systems used today?
Klaus-Dirk Schmitz:
Google Translate is often used to get an idea of what is in a text. For example, you receive a patent application in Japanese, and with Google Translate you get an impression of what it might be about. This approach is called informative translation. But there is no machine translation system that translates complex texts such as operating instructions as well as a human. However, machine systems can provide support. For example, it can be more efficient to enter a text in Google Translate and then correct the result than to translate the entire text yourself. Translation memories are another form of machine support. The basis is a database with many sentences that have already been translated from the source language into the target language. When a new text has to be translated, the program checks whether the same or a very similar sentence already exists in the database. The system then offers this translation. And I can decide whether the proposal fits or whether I have to make small changes. Today this is common practice in many areas.
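
The translation memory lookup described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the two stored sentence pairs, the function name `lookup` and the 0.75 similarity threshold are hypothetical, and the similarity score comes from `difflib.SequenceMatcher` in the Python standard library rather than from a dedicated matching algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical memory: source sentences with their stored translations.
MEMORY = {
    "Press the green button to start.":
        "Drücken Sie die grüne Taste, um zu starten.",
    "Switch off the device before cleaning.":
        "Schalten Sie das Gerät vor der Reinigung aus.",
}


def lookup(memory, sentence, threshold=0.75):
    """Return (stored source, stored translation, score) or None."""
    best_score, best_pair = 0.0, None
    for source, target in memory.items():
        # Character-level similarity between 0.0 and 1.0.
        score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_pair = score, (source, target)
    if best_pair is None or best_score < threshold:
        return None  # nothing similar enough; translate from scratch
    return best_pair[0], best_pair[1], best_score


# A very similar sentence yields a proposal the translator can adapt.
match = lookup(MEMORY, "Press the red button to start.")
if match is not None:
    source, proposal, score = match
    print(f"{score:.0%} match with: {source}")
    print(f"Proposed translation: {proposal}")

# An unrelated sentence yields no proposal.
print(lookup(MEMORY, "The weather is nice today."))  # None
```

The threshold decides how similar a stored sentence must be before it is offered. The final decision stays with the translator, who accepts the proposal or makes small changes, here replacing the word for "green" with the word for "red".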

Interview: Christian Sander

April 2015