On Thursday, Google announced that Google Translate, its multilingual neural machine translation service, now supports translation into 24 more languages. Ten of the new additions are African languages.
The list includes the Ashanti Twi language, which is spoken by about 11 million people in Ghana; Lingala, spoken by around 45 million people in Central Africa, mostly in the Democratic Republic of Congo; Tigrinya, spoken by about 8 million people in Eritrea and Ethiopia; Sepedi, spoken by around 14 million people in South Africa; and Oromo, spoken by 37 million people in Ethiopia and Kenya.
The African languages of Bambara, Ewe, Krio, Luganda, and Tsonga were also added.
Google software engineer and researcher Isaac Caswell said that, for the first time, the company used a neural artificial intelligence model that learned the languages “from scratch.”
He explained that, to implement the new languages, Google used the millions of examples a system needs in order to “understand” and translate them. The added languages were trained this way using the neural model, a type of machine learning model. The technology then began to “understand” how languages work. The company says it consulted representatives from several communities before releasing the new languages.
“Imagine that you are a polyglot and that, based on your understanding of how languages are, you can interpret something. This is more or less how our neural network operates,” Caswell told the BBC. Google, however, admits that the technology isn’t perfect, as some linguists have noted problems with the languages already available.
“For many supported languages, even the largest languages in Africa that we have supported–say like Yoruba, Igbo, the translation is not great. It will definitely get the idea across but often it will lose much of the subtlety of the language,” said Caswell.
Along with the 10 African languages, the update adds Bhojpuri, which is spoken by as many as 50 million people in northern India, Nepal, and Fiji; Guarani, which is spoken by about 7 million people in Paraguay, as well as indigenous populations in Brazil, Argentina, and Chile; and Quechua, spoken by about 10 million indigenous people in Argentina, Peru, and Bolivia.
With the new additions, Google Translate now offers a total of 133 languages. The tech giant plans to add voice recognition soon.
Here is the full list of languages, including the African languages, recently added by Google Translate:
- Aymara – spoken by nearly 2 million people in Bolivia, Chile, and Peru
- Assamese – spoken by nearly 25 million people in northeast India
- Ashanti Twi – spoken by about 11 million people in Ghana
- Bambara – spoken by around 14 million people in Mali
- Bhojpuri – spoken by around 50 million people in northern India, Nepal, and Fiji
- Dhivehi – spoken by around 300,000 people in the Maldives
- Dogri – spoken by around 3 million people in northern India
- Ewe – spoken by 7 million people in Ghana and Togo
- Guarani – spoken by 7 million people in Paraguay, Bolivia, Argentina, and Brazil
- Ilocano – spoken by around 10 million people in the northern Philippines
- Konkani – spoken by nearly 2 million people in central India
- Krio – spoken by nearly 4 million people in Sierra Leone
- Sorani Kurdish – spoken by around 8 million people (most of them from Iraq)
- Lingala – spoken by nearly 45 million people in the Democratic Republic of Congo, Republic of Congo, Angola, Republic of South Sudan, and Central African Republic
- Luganda – spoken by nearly 20 million people in Uganda and Rwanda
- Maithili – spoken by nearly 34 million people in northern India
- Manipuri – spoken by 2 million people in northeast India
- Mizo – spoken by around 830,000 people in northeast India
- Oromo – spoken by 37 million people in Ethiopia and Kenya
- Quechua – spoken by 10 million people in Peru, Bolivia, Ecuador, and regions close to the countries
- Sanskrit – spoken by 20,000 people in India
- Sepedi – spoken by around 14 million people in South Africa
- Tigrinya – spoken by nearly 8 million people in Eritrea and Ethiopia
- Tsonga – spoken by around 7 million people in Eswatini, Mozambique, South Africa, and Zimbabwe