
ASSIST: Automated Semantic Assistance for Translators


Translators have access to a wealth of information while translating a text. This includes monolingual dictionaries for examining word senses in the source and target languages, and bilingual dictionaries for examining lexical equivalence. Less readily available is a translation (or parallel) corpus, which provides examples of how translation equivalents are used in the target language. Recent research in translation studies has focused on providing translation equivalents for technical vocabulary in a restricted domain. The problem we aim to address is that of providing contextual examples of translation equivalents for words from the general lexicon. We concentrate on the general lexicon because its words exhibit a variety of meanings and possible translations that are not usually covered by the lists of translation equivalents given in bilingual dictionaries. This also suggests the need for comparable corpora, i.e. collections of texts with similar properties in several languages, since parallel corpora are typically small (less than one million words) and not representative. For a sentence in the source language (English or Russian), the tool will give examples of similar contexts selected from the target language corpus, and will provide an interface for creating and maintaining a user dictionary of contextualised translation equivalents.

The project is supported by two EPSRC grants: EP/C004574 for Lancaster (UCREL) and EP/C005902 for Leeds (Centre for Translation Studies). Project dates: April 2005 - June 2007.

More information in the Computing Department projects database entry.

Principal Investigators: Roger Garside (Computing, Lancaster University), Tony Hartley (Centre for Translation Studies, University of Leeds)

Co-investigators: Paul Rayson (Computing, Lancaster University), Serge Sharoff (Centre for Translation Studies, University of Leeds), Tony McEnery (Linguistics, Lancaster University), Andrew Wilson (Linguistics, Lancaster University)

Researchers: Scott Song-lin Piao (Computing, Lancaster University), Olga Mudraya (Linguistics, Lancaster University), Bogdan Babych (Centre for Translation Studies, University of Leeds)

Availability of project resources

Project events

  1. Third International Workshop on Language Resources for Translation Work, Research & Training. A Satellite Event of LREC 2006. 28th May 2006. Magazzini del Cotone Conference Center, Genoa, Italy.
  2. EACL 2006 Workshop on Multi-word-expressions in a multilingual context, Trento, (Italy), 3rd April 2006.

Project publications

  1. Sharoff, S., Babych, B., Hartley, A. (forthcoming). 'Irrefragable answers' using comparable corpora to retrieve translation equivalents, accepted by Language Resources and Evaluation Journal.
  2. Mudraya, O., Piao, S.L., Rayson, P., Sharoff, S., Babych, B. and Löfberg, L. (forthcoming). Automatic Extraction of Translation Equivalents of Phrasal and Light Verbs in English and Russian. In Granger, S. and Meunier, F. (eds.) Phraseology : an interdisciplinary perspective. Benjamins, Amsterdam.
  3. Rayson, P. and Stevenson, M. (forthcoming) Sense and semantic tagging, chapter 27 in Lüdeling, A. and Kytö, M. Corpus Linguistics. An international handbook (Handbooks of Linguistics and Communication Science Series), Mouton de Gruyter, Berlin.
  4. Babych, B., Hartley, A., Sharoff, S. and Mudraya, O. (2007). Assisting Translators in Indirect Lexical Transfer. In proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), Prague, Czech Republic, June 23-30 2007. (Video of the conference presentation available on the ACL Video Archive)
  5. Sharoff, S. and Munday, J. (2007). Corpora for Translators and Presentation of ASSIST. Workshop at the ITI (Institute of Translation & Interpreting) Conference 2007, London, 21-22 April 2007.
  6. Mudraya, O., Babych, B., Piao, S., Rayson, P., Wilson, A. (2006). Developing a Russian semantic tagger for automatic semantic annotation. In proceedings of Corpus Linguistics 2006, St. Petersburg, Russia, 10-14 October 2006, pp. 282-289 (in Russian), pp. 290-297 (in English). (English PDF version, Russian PDF version, slides)
  7. Piao, S. L., Rayson, P., Mudraya, O., Wilson, A. and Garside, R. (2006) Measuring MWE compositionality using semantic annotation. In proceedings of COLING/ACL workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, July 23, 2006, Sydney, Australia. PDF version (Download data for human ratings)
  8. Sharoff, S., Babych, B., Hartley, A. (2006) Using comparable corpora to solve problems difficult for human translators. In Proceedings of COLING/ACL 2006 Conference, Sydney, July 2006, pp. 739-746. PDF version
  9. Rayson, P. (2006) Falling foul of multiword expressions. Presented at the Workshop on Chinese Multi-word expressions and MT. China Centre for Information Industry Development (CCID), Beijing, P.R. China. June 9th 2006.
  10. Rayson, P. (2006) Moving from key words to key domains. Invited talk at the Chinese Academy of Social Sciences, Beijing, P.R. China. June 8th, 2006.
  11. Rayson, P. (2006) Automated semantic assistance for human translators. Invited talk at the Department of Chinese, Translation and Linguistics, City University of Hong Kong. June 5th, 2006.
  12. Sharoff, S. (2006) Translation as problem solving: uses of comparable corpora. In Proc. of Third International Workshop on Language Resources for Translation Work, Research & Training at LREC2006, Genoa, May, 2006. PDF version
  13. Sharoff, S., Babych, B., Hartley, A. (2006) Using collocations from comparable corpora to find translation equivalents. In Proc. of LREC2006, Genoa, May, 2006, pp. 465-470. PDF version
  14. Sharoff, S., Babych, B., Rayson, P., Mudraya, O. and Piao, S. (2006) ASSIST: Automated Semantic Assistance for Translators. In companion proceedings to the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), Trento, Italy, April 3-7, 2006, pp. 139-142. ISBN 1-932432-60-4. PDF version
  15. Mudraya, O., Piao, S.L., Löfberg, L., Rayson, P., Archer, D. (2005). English-Russian-Finnish cross-language comparison of phrasal verb translation equivalents. In Cosme, C., Gouverneur, C., Meunier, F., & Paquot, M. (eds.), Proceedings of the Phraseology 2005 Conference, Louvain-la-Neuve, Belgium, 13-15 October 2005, pp. 277-281. PDF version
  16. Scott S.L. Piao, Dawn Archer, Olga Mudraya, Paul Rayson, Roger Garside, Tony McEnery, Andrew Wilson (2005) A Large Semantic Lexicon for Corpus Annotation. In proceedings of the Corpus Linguistics 2005 conference, July 14-17, Birmingham, UK. Proceedings from the Corpus Linguistics Conference Series on-line e-journal, Vol. 1, no. 1, ISSN 1747-9398. PDF version
  17. Rayson, P. (2005) Right from the word go: identifying multi-word-expressions for semantic tagging. Invited talk at BAAL Corpus Linguistics SIG / OTA Workshop: Identifying and Researching Multi-Word Units. Thursday 21st April 2005, Oxford University Computing Services. (PDF version, slides)
  18. Sharoff, S. (2004) Harnessing the lawless: using comparable corpora to find translation equivalents. Journal of Applied Linguistics 1(3), 333-350. PDF version
  19. S. Sharoff, P. Rayson, O. Mudraya, A. Wilson and T. McEnery (2004). A tool for assisting translators using automatic semantic annotation. Presented at Corpus Use and Learning to Translate (CULT-BCN) Barcelona, January 22nd-24th 2004.