JRC publishes texts to help development of computer-assisted translation systems
The EU’s Joint Research Centre (JRC) has published a million sentences translated into 22 official EU languages in a bid to help the development of computer-assisted translation technologies
and software. By offering free and open access to this collection of sentences, the EU hopes to foster multilingualism and provide a valuable resource for system developers to create machine translation systems.
As part of its remit, the EU translates all its legal and political documents into all 23 official languages, meaning translators must work with 253 possible language pair combinations across
1.5 million pages a year. This also means there is a collection of translated texts which is of great value as a learning base for system developers.
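The figure of 253 follows from counting unordered pairs: n languages give n(n-1)/2 combinations. A minimal sketch of that arithmetic, in which the function name language_pairs is purely illustrative and not taken from any JRC tool:

```python
from itertools import combinations


def language_pairs(n_languages: int) -> int:
    """Count unordered language pairs among n_languages languages: n(n-1)/2."""
    return n_languages * (n_languages - 1) // 2


# Cross-check the closed form against explicit enumeration of the pairs.
assert language_pairs(23) == len(list(combinations(range(23), 2)))

print(language_pairs(23))  # 253, the figure quoted for the 23 official languages
print(language_pairs(22))  # 231, for the 22 languages in the published collection
```

If each translation direction is counted separately (Latvian to Romanian as distinct from Romanian to Latvian), the totals double.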
‘By this initiative the European Commission intends to boost human language technologies, support multilingualism and make computer-assisted translation easier, cheaper and more accessible,’
said Leonard Orban, EU Commissioner for Multilingualism.
While it is relatively easy to find English/French documents on the web to aid such developments, it is much harder to find Latvian/Romanian ones, for example. ‘Citizens belonging to the
smaller linguistic communities will have an easier access to documents and web pages only available in the most used languages,’ Mr Orban continued.
Because the text is offered in context, it can also help develop and test grammar and spell checkers, online dictionaries and text classification systems.
‘This unique collection of language data contributes to the creation of a new generation of software tools for human language processing and helps foster the competitiveness of the language
industry, which is already one of the fastest growing industries in the European Union,’ said Janez Potočnik, European Commissioner for Science and Research.