Itzuli: machine translator based on neural networks
Itzuli is a translator based on neural networks capable of translating bidirectionally between Basque and three other languages (French, English and Spanish).
The solution is designed as a digital service that helps boost the use of a minority language such as Basque. It is also intended to improve the translation service of the Basque Government and other public administrations by delivering results faster. The service is now in continuous use, and the translation services of the different public administrations can apply their expertise to correcting machine output rather than translating whole documents from scratch.
To achieve this, a service was developed that is consumed through REST API calls and is used from a web application, a mobile app and a browser extension. The service is written in Go and uses the Marian Server engine to run the translation model.
Citizens can use the solution without providing any extra data, accessing the service through the website, the mobile app or the extension. Translation services of public administrations must call the REST API directly, supplying the API key issued to them, which uniquely identifies them.
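As a rough illustration of the API-key access described above, the following Go sketch builds (without sending) an authenticated translation request. The host, the "/translate" path, the "X-API-Key" header name and the JSON field names are all assumptions made for the example; the actual Itzuli API contract is not described here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// translateRequest is a hypothetical JSON payload; the real API schema
// is not documented in this text.
type translateRequest struct {
	Text   string `json:"text"`
	Source string `json:"source"`
	Target string `json:"target"`
}

// newTranslateRequest builds, but does not send, a POST request that
// carries the caller's API key. The "/translate" path and the
// "X-API-Key" header name are assumptions.
func newTranslateRequest(baseURL, apiKey, text, source, target string) (*http.Request, error) {
	body, err := json.Marshal(translateRequest{Text: text, Source: source, Target: target})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/translate", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-API-Key", apiKey)
	return req, nil
}

func main() {
	// Placeholder host and key, for illustration only.
	req, err := newTranslateRequest("https://api.example.invalid", "my-api-key", "Kaixo, mundua", "eu", "en")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```

Keeping request construction separate from sending makes the authentication logic easy to test without network access.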
To measure performance and test the solution periodically, JMeter scripts are used to check whether changes improve response times and whether a baseline set of translations is still handled correctly.
In addition, translation quality is measured with a test set that is evaluated using the BLEU (Bilingual Evaluation Understudy) metric.
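To give an idea of what BLEU measures, here is a minimal Go sketch of a simplified, sentence-level variant: clipped n-gram precision with uniform weights, a single reference, whitespace tokenization and no smoothing, multiplied by the brevity penalty. It is an illustration only; the evaluation tooling actually used for Itzuli is not specified here, and production evaluations normally use corpus-level BLEU from an established tool.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// ngramCounts returns how often each n-gram occurs in tokens.
func ngramCounts(tokens []string, n int) map[string]int {
	counts := make(map[string]int)
	for i := 0; i+n <= len(tokens); i++ {
		counts[strings.Join(tokens[i:i+n], " ")]++
	}
	return counts
}

// bleu computes a simplified BLEU score for one candidate against one
// reference, with uniform weights over 1..maxN-grams and no smoothing.
func bleu(candidate, reference string, maxN int) float64 {
	cand := strings.Fields(candidate)
	ref := strings.Fields(reference)
	if len(cand) == 0 {
		return 0
	}
	logSum := 0.0
	for n := 1; n <= maxN; n++ {
		candCounts := ngramCounts(cand, n)
		refCounts := ngramCounts(ref, n)
		matched, total := 0, 0
		for gram, c := range candCounts {
			total += c
			// Clip each n-gram count by its count in the reference.
			if r := refCounts[gram]; r < c {
				matched += r
			} else {
				matched += c
			}
		}
		if matched == 0 {
			return 0 // no overlap at this order (or candidate too short)
		}
		logSum += math.Log(float64(matched) / float64(total))
	}
	// Brevity penalty: penalize candidates shorter than the reference.
	bp := 1.0
	if len(cand) < len(ref) {
		bp = math.Exp(1 - float64(len(ref))/float64(len(cand)))
	}
	return bp * math.Exp(logSum/float64(maxN))
}

func main() {
	ref := "the cat is on the mat"
	fmt.Printf("%.3f\n", bleu("the cat is on the mat", ref, 4))  // 1.000
	fmt.Printf("%.3f\n", bleu("the cat sat on the mat", ref, 2)) // 0.707
}
```

In the second call, 5 of 6 unigrams and 3 of 5 bigrams match the reference, so the score is the geometric mean sqrt(5/6 × 3/5) ≈ 0.707.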
A year passed between the first tests and the deployment to the production environment. Since then the service has been improved with the installation of Istio, performance improvements involving significant changes in programming language, the development of plugins for translation software, etc.
Every year, the models are reviewed and retrained to update and improve the translation results. For this, new public data (such as news) and new translation memories are added.
All this development and improvement has been carried out in collaboration with Vicomtech, a company specialized in AI, since the Basque Government's IT service does not have the in-house knowledge needed to create models of this type. A total of 7 people worked on the project: 1 project manager, 1 developer, 2 AI specialists, 1 DevOps specialist, 1 systems analyst and 1 QA specialist.
Results and Impact assessment
The Itzuli translator has had a great, positive impact on society in the Basque Country, favoring integration and language learning. It is widely used as an everyday tool: more than 250,000 translations are currently carried out daily, across society at large, in education and in public administrations.
According to our statistics, 75% of translations are returned in under 1 second and 99.98% of requests are answered correctly, which makes the service remarkably reliable.
On the professional side, the translation services of the public administration integrate Itzuli into their translation tools, which saves time in delivering translated documents and makes the translators' work easier.
Dependencies and constraints
The implementation of Itzuli presented a series of technological, data and software challenges that had to be solved.
In terms of infrastructure, Red Hat OpenShift is used as the Kubernetes platform to give the solution scalability and self-healing. In addition, graphics cards (GPUs) have been installed in the Basque Government's data center in order to deploy the artificial intelligence models created.
The platform also had to be integrated with NVIDIA software (CUDA drivers), VMware virtual machines with vGPU support, and the development of AI models with Marian Server.
The data challenges were how to collect the data and how to share it. The Department of Culture and Language Policy of the Basque Government and the Basque Institute of Public Administration held a large amount of data, the result of years of manual translation work, that could be used to train translation models. A specific contracting framework was created to share the data, and the data was anonymized so as not to expose sensitive information.
This data was added to the public data (public subtitles, Wikipedia articles, book translations, etc.) used to train the model that served as the basis of the translation model.
The software-related challenges were making the solution respond quickly and controlling usage to avoid DoS attacks. For the latter, a service mesh (Istio) was installed and configured in OpenShift; it controls requests through an API key and limits the requests per second and per minute allowed for each API key.
As an example of use, you can access the website, use the app, or use the web browser extension.