Retazos conectados

Writings by Diego Bartolomé to accompany life

Posts Tagged ‘tauyou’

Customized Machine Translation: the Path to Success

No doubt, you have used one of the many online machine translation (MT) services to translate a short text or a web page. If you didn’t speak the target language, you probably looked at the result and thought, “Well, that French looks good enough to me!” On the other hand, if you did speak the target language, you probably thought, “MT doesn’t work. This isn’t the quality I want to offer my customers. MT produces awful results. MT is the enemy that will put me out of business.”

However, online MT might not be the best way to solve your pain points (or your customers’). Nowadays, you can choose among several MT providers offering a wide variety of solutions, ranging from rule-based MT (e.g. Systran or Lucy Software) to best-of-breed statistical MT (e.g. tauyou, Asia Online or Safaba), or you can even develop a system yourself using Moses, Joshua, Apertium, or other open-source software. Your decision depends mainly on your available resources and the time to market. Now might already be too late.

After choosing the engine, bear in mind that some languages work better with MT than others. With English as the source, translating into Romance languages can realistically yield productivity increases above 200% with a customized statistical MT engine in a controlled environment. On the other hand, languages such as Polish, Arabic, or Chinese yield the worst results, although still with some minor productivity gains. The general rule is that the closer the languages, the better the results, whether the engine is statistical or rule-based. Measure everything you can: post-editing time, subjective evaluation, and metrics such as the Word Error Rate or the post-editing distance.
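As an illustration of one of these metrics, here is a minimal sketch of the Word Error Rate: the word-level edit distance between the MT output and a reference translation, divided by the number of words in the reference. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example, not part of any particular MT toolkit. Definitions of post-editing distance vary between tools; one simple variant is to apply the same calculation with the post-edited text as the reference.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Tokenization is a plain whitespace split (an assumption of this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                       # reference word missing in hypothesis
                curr[j - 1] + 1,                   # extra word in hypothesis
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A value of 0.0 means the MT output matches the reference word for word; higher values mean more editing was needed. Note that the result can exceed 1.0 when the hypothesis is much longer than the reference.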

The domain you choose is also relevant, since it has an impact on the quality. The domain might be decided by your customer, although it is better to decide based on your overall revenue, your skills, and the results per domain. Since it is easy to try MT and see how well it works, base your decisions on available data. In our experience, technical domains, as well as legal or medical ones, yield significantly better results than marketing or news. The metrics you extract from your projects will tell you where to focus your efforts.

Integrating a new tool in your workflow is never straightforward, because people need to change the way they work. In that sense, the fewer the steps, the better. Thanks to current APIs such as the one defined by TAUS, MT systems can be plugged into any CAT tool fairly easily. However, other workflows may fit your process and resources better. Many of our customers first pre-translate the files and then run them through MT, which translates the remaining segments. This approach provided optimum results before the widespread availability of APIs. Choosing between an API and the pre-processing step depends mainly on the CAT tool, your current process, and the translation speed of your MT provider.
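The pre-translation workflow can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the translation memory is a plain dictionary with exact matches only (real CAT tools also use fuzzy matching), and `machine_translate` is a hypothetical callable standing in for whatever MT provider or API you use.

```python
from typing import Callable, Dict, List, Tuple


def pretranslate_then_mt(
    segments: List[str],
    translation_memory: Dict[str, str],
    machine_translate: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (source, target, origin) for each segment.

    Segments found in the translation memory keep their stored translation
    (origin "TM"); only the remaining segments are sent to MT (origin "MT").
    """
    results = []
    for source in segments:
        if source in translation_memory:
            # Exact TM match: reuse the existing human translation.
            results.append((source, translation_memory[source], "TM"))
        else:
            # No match: fall back to the (hypothetical) MT engine.
            results.append((source, machine_translate(source), "MT"))
    return results
```

Keeping the origin of each segment makes it possible to measure post-editing effort separately for TM matches and MT output.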

After putting MT into production, the key task is to optimize the engines and provide automatic feedback so that they improve over time. The first step, and maybe the most important one, is correcting the errors translators feel are important. After that, you can analyze the metrics and develop rules either at the input or at the output to improve the quality. The most common approach is to feed the engines with the updated translation memories (TMs) periodically. At the beginning of the implementation this is done more often (e.g. weekly); after around 3 months, updates become less important because the quality will have improved, as the metrics will show.

We have not yet mentioned the most important aspect of human-quality MT, which is the post-editing staff. Certainly, the skills required of post-editors differ slightly from those of translators. That is why organizations such as GALA are promoting initiatives such as the GALA Talent Program. Post-editors should be given clear guidelines so that they know what is expected of them. For instance, a client translating e-commerce listings has a clear description of the types of edits post-editors should make and the maximum amount of time that should be invested in any item. Regarding payment, a fair scheme is needed where everybody wins: end client, LSP, and freelancer. Do not simply apply a rate reduction that might lead to rejected jobs and lower morale in your team.

Nowadays, we see many more types of content than we used to (e-mail, multilingual chat, social networking, document filtering and selection, etc.), and more are sure to come in the future. Some of the customers with translation needs in these areas would be quite happy with a customized MT solution that produces “good enough” quality without any human intervention. Your best option is not to try to convince them they need better quality, or to refuse to use MT at all, but rather to embrace the chance to diversify your portfolio with MT-based services that bring new, recurring revenues for your company. The business model is also a key aspect of the implementation of MT in your organization. Think about it in advance!

(This article was published in Dragosfer in 2013)

Written by Diego Bartolome

15/02/2014 at 10:34

Entrepreneurship talk at the UB

Albert Torruella invited me to his Creació d’Empreses (Business Creation) classes at the Universitat de Barcelona (UB). I gladly went, ready to answer his questions and those of his students.

It is always interesting: I learn from it, and it forces me to stop for a moment and rethink everything. You also remember the good and bad moments. For an hour and a half we talked about partners, the beginnings, ideas, the present moment, the future, and many other topics. Honestly, the time flew by.

The best part, though, was these two tweets from two people who were in the class. Thank you all for listening. I hope it was useful to you.

Until next time, strength and luck!

tweets from UB

Written by Diego Bartolome

12/03/2013 at 23:51

End of the year

and these are the things we’ve done during the semester at tauyou <language technology>.

Have a great 2013! It depends on you.

Written by Diego Bartolome

31/12/2012 at 09:49
