High-performance Translation Memory

Juš Lozej

Fullstack Engineer

Node.js

Other

I was tasked with developing a high-performance Translation Memory microservice that integrates with existing CAT (computer-assisted translation) tools. The translation memory was to be offered as a SaaS product alongside other translation-oriented tools.

The main purpose of the service is to store segments of text in different languages and to match new segments against existing ones. To achieve this we used the multi-model database ArangoDB. It let us store translations as documents connected in a graph, which makes it easy to retrieve individual translations, and it also offers a good suite of text similarity measures. With these measures we designed analyzers that match source segments against existing translations in a variety of languages.
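
The matching idea can be illustrated with a minimal in-memory sketch in TypeScript. It is not the ArangoDB configuration used in the service, where the database did this work; the character-trigram Jaccard measure, the 0.5 threshold, and every name below (TmEntry, trigrams, similarity, bestMatches) are assumptions chosen for illustration.

```typescript
// Illustrative in-memory sketch only; the real service delegated matching to ArangoDB.
// All names and the default threshold are hypothetical.

interface TmEntry {
  source: string;                       // source-language segment
  translations: Record<string, string>; // target language code -> translated text
}

// Lowercase the text and collapse runs of whitespace so trivial differences do not matter.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Collect the set of character trigrams of an already normalized string.
function trigrams(normalized: string): Set<string> {
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= normalized.length; i++) {
    grams.add(normalized.slice(i, i + 3));
  }
  return grams;
}

// Jaccard similarity of the two trigram sets: |A ∩ B| / |A ∪ B|, a value in [0, 1].
function similarity(a: string, b: string): number {
  const na = normalize(a);
  const nb = normalize(b);
  const ga = trigrams(na);
  const gb = trigrams(nb);
  if (ga.size === 0 && gb.size === 0) {
    // Both strings are shorter than three characters: fall back to exact comparison.
    return na === nb ? 1 : 0;
  }
  let shared = 0;
  ga.forEach((gram) => {
    if (gb.has(gram)) shared++;
  });
  return shared / (ga.size + gb.size - shared);
}

// Return the entries that have a translation in targetLang and whose source
// segment scores at or above the threshold, best match first.
function bestMatches(
  memory: TmEntry[],
  segment: string,
  targetLang: string,
  threshold: number = 0.5
): { entry: TmEntry; score: number }[] {
  return memory
    .filter((entry) => targetLang in entry.translations)
    .map((entry) => ({ entry, score: similarity(segment, entry.source) }))
    .filter((match) => match.score >= threshold)
    .sort((x, y) => y.score - x.score);
}
```

Calling bestMatches(memory, "Save the file", "de") would rank an identical stored segment first, followed by near matches such as "Save the files". A database-backed version would push the scoring into the query so that only candidates above the threshold leave the database.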

Apart from storing translated segments, the memory service also contained a suite of analytics tools that determine how much of a newly added translation project is already covered by existing translations (or partial translations). This information was used to lower the price of new projects.
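
A rough sketch of that kind of coverage analysis is shown below. It assumes each project segment already carries its word count and the score of its best match in the memory; the three bands (exact, fuzzy, new), the 0.75 fuzzy threshold, and the billing rates are hypothetical placeholders, not the figures the service used.

```typescript
// Illustrative sketch only. The band boundaries and rates are assumptions,
// as are all names (AnalyzedSegment, CoverageReport, analyzeProject).

interface AnalyzedSegment {
  wordCount: number; // number of words in the new project segment
  bestScore: number; // best similarity score found in the memory, in [0, 1]
}

interface CoverageReport {
  exactWords: number;    // words in segments with an exact match
  fuzzyWords: number;    // words in segments with a partial match
  newWords: number;      // words in segments with no usable match
  totalWords: number;
  billableWords: number; // word count after applying the per-band rates
}

interface Rates {
  exact: number; // fraction of the full price charged for exact matches
  fuzzy: number; // fraction charged for partial matches
  fresh: number; // fraction charged for segments without a match
}

function analyzeProject(
  segments: AnalyzedSegment[],
  fuzzyThreshold: number = 0.75,
  rates: Rates = { exact: 0, fuzzy: 0.5, fresh: 1 }
): CoverageReport {
  const report: CoverageReport = {
    exactWords: 0,
    fuzzyWords: 0,
    newWords: 0,
    totalWords: 0,
    billableWords: 0,
  };
  for (const segment of segments) {
    report.totalWords += segment.wordCount;
    if (segment.bestScore >= 1) {
      report.exactWords += segment.wordCount;
    } else if (segment.bestScore >= fuzzyThreshold) {
      report.fuzzyWords += segment.wordCount;
    } else {
      report.newWords += segment.wordCount;
    }
  }
  report.billableWords =
    report.exactWords * rates.exact +
    report.fuzzyWords * rates.fuzzy +
    report.newWords * rates.fresh;
  return report;
}
```

The ratio of billableWords to totalWords then gives a simple basis for a discount: the more of a project the memory already covers, the smaller the billable share.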

High performance was achieved through optimized raw queries, an appropriate indexing setup, and database distribution.
