torchtext.data.metrics
bleu_score

torchtext.data.metrics.bleu_score(candidate_corpus, references_corpus, max_n=4, weights=[0.25, 0.25, 0.25, 0.25])[source]

Computes the BLEU score between a candidate translation corpus and a reference translation corpus. Based on https://www.aclweb.org/anthology/P02-1040.pdf
Parameters:
- candidate_corpus – an iterable of candidate translations. Each translation is an iterable of tokens.
- references_corpus – an iterable of iterables of reference translations. Each translation is an iterable of tokens.
- max_n – the maximum n-gram order to use. E.g. if max_n=3, unigrams, bigrams and trigrams are used.
- weights – a list of weights, one per n-gram order (uniform by default), so its length should equal max_n.
Examples
>>> from torchtext.data.metrics import bleu_score
>>> candidate_corpus = [['I', 'ate', 'the', 'apple'], ['I', 'did']]
>>> references_corpus = [[['I', 'ate', 'it'], ['I', 'ate', 'apples']], [['I', 'did']]]
>>> bleu_score(candidate_corpus, references_corpus)
0.7598356856515925
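The parameters above follow the standard BLEU recipe from the paper linked above: a clipped (modified) n-gram precision for each order up to max_n, a weighted geometric mean of those precisions, and a brevity penalty based on candidate and reference lengths. A minimal pure-Python sketch of that recipe, for readers who want to see how the pieces fit together (the names corpus_bleu and ngram_counter are illustrative; this is not torchtext's internal code):

```python
import math
from collections import Counter

def ngram_counter(tokens, max_n):
    # Count all n-grams of order 1..max_n in one token list.
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def corpus_bleu(candidates, references, max_n=4, weights=None):
    # Corpus-level BLEU sketch: clipped n-gram precision per order,
    # weighted geometric mean of the precisions, times a brevity penalty.
    weights = weights or [1.0 / max_n] * max_n
    clipped = [0] * max_n   # matched n-gram counts, clipped by reference counts
    total = [0] * max_n     # all candidate n-gram counts
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # use the reference length closest to the candidate's length
        ref_len += min((len(r) for r in refs), key=lambda L: abs(len(cand) - L))
        ref_counts = Counter()
        for ref in refs:
            ref_counts |= ngram_counter(ref, max_n)  # element-wise max over refs
        for ngram, c in ngram_counter(cand, max_n).items():
            total[len(ngram) - 1] += c
            clipped[len(ngram) - 1] += min(c, ref_counts[ngram])
    if min(clipped) == 0:
        # any order with zero matches drives the geometric mean to zero
        return 0.0
    log_score = sum(w * math.log(c / t) for w, c, t in zip(weights, clipped, total))
    bp = math.exp(min(1.0 - ref_len / cand_len, 0.0))  # brevity penalty
    return bp * math.exp(log_score)
```

Returning 0.0 when any order has no matches is one common convention (the geometric mean in the paper is zero in that case); exact edge-case handling may differ between torchtext versions, so treat this as a sketch of the formula rather than a drop-in replacement.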