Metrics for evaluation of word-level machine translation quality estimation

Varvara Logacheva, Michal Lukasik, Lucia Specia

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

The aim of this paper is to investigate suitable evaluation strategies for the task of word-level quality estimation of machine translation. We suggest various metrics to replace F1-score for the "BAD" class, which is currently used as the main metric. We compare the metrics' performance on real system outputs and synthetically generated datasets and suggest a reliable alternative to the F1-BAD score: the multiplication of F1-scores for different classes. Other metrics have lower discriminative power and are biased by unfair labellings.
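The alternative metric described above, the product of per-class F1-scores, is straightforward to compute. Below is a minimal sketch in Python, assuming binary word-level labels (1 = "BAD", 0 = "OK") and scikit-learn's f1_score; the example label sequences are hypothetical, not taken from the paper's data.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical gold and predicted word-level labels: 1 = "BAD", 0 = "OK"
gold = np.array([0, 0, 1, 0, 1, 1, 0, 0])
pred = np.array([0, 1, 1, 0, 0, 1, 0, 0])

# F1-score computed separately for each class ("OK" and "BAD")
f1_ok, f1_bad = f1_score(gold, pred, average=None, labels=[0, 1])

# The suggested alternative metric: the product of per-class F1-scores.
# It is low whenever either class is predicted poorly, which makes it
# harder to game with a degenerate all-"BAD" or all-"OK" labelling.
f1_mult = f1_ok * f1_bad
print(f"F1-OK = {f1_ok:.3f}, F1-BAD = {f1_bad:.3f}, F1-mult = {f1_mult:.3f}")
```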

Original language: English
Title of host publication: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 585-590
Number of pages: 6
ISBN (Electronic): 9781510827592
Publication status: Published - 2016
Externally published: Yes
Event: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Berlin, Germany
Duration: 7 Aug 2016 - 12 Aug 2016

Publication series

Name: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Short Papers

Conference

Conference: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
Country/Territory: Germany
City: Berlin
Period: 7/08/16 - 12/08/16
