[Day18] Seq2Seq

Important

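The BLEU score is used when the input X is a sequence of words with order information (a sentence) and the target y is also a series of words (a sentence). It is mainly used to evaluate translation models. It is calculated from three factors.

These three factors are worked through with the example below.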

1-gram(unigrams): The, more, see, the, more, the, merrier, flavor, the, food, has (11)
2-gram(bigrams): The more, more see, see the, the more, more the, the merrier, merrier flavor, flavor the, the food, food has (10)
3-gram(trigrams): The more see, more see the, see the more, the more the, more the merrier, the merrier flavor, merrier flavor the, flavor the food, the food has (9)
4-gram(four-grams): The more see the, more see the more, see the more the, the more the merrier, more the merrier flavor, the merrier flavor the, merrier flavor the food, flavor the food has (8)
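The n-gram lists above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `ngrams` is made up for illustration, and the predicted sentence is reassembled from the unigram list and split on whitespace.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Predicted sentence, reassembled from the unigram list above.
prediction = "The more see the more the merrier flavor the food has".split()

# Number of n-grams for n = 1..4.
counts = {n: len(ngrams(prediction, n)) for n in range(1, 5)}
for n, count in counts.items():
    print(f"{n}-gram: {count}")
```

A sentence of N tokens has N - n + 1 n-grams, which is why the counts drop from 11 to 8 as n goes from 1 to 4.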

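Using the n-grams above, we compute how much the prediction overlaps with the reference sentences.

For example, looking at the 1-grams first, some of the words that match the reference sentences are repeated (the: 3, more: 2, merrier: 1). Counting every repeat as a match would overstate the overlap.

To correct for this, each word is counted at most as many times as its max count in the reference sentences (the: 2, more: 1, merrier: 1). The other n-grams are handled in the same way.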
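The correction for repeated words can be sketched as follows. The reference sentences of the example are not reproduced in the text, so the reference used here is hypothetical, invented only so that its max counts come out as the: 2, more: 1, merrier: 1. The function name `unigram_matches` is also made up, and matching is assumed to be case-sensitive, which is consistent with the count the: 3 (the leading "The" is not counted).

```python
from collections import Counter

def unigram_matches(prediction, references):
    """Return (raw, clipped) match counts of predicted words against the references."""
    pred_counts = Counter(prediction)
    max_ref = Counter()
    for ref in references:
        max_ref |= Counter(ref)  # Counter union keeps the max count per word
    # Raw counts: every occurrence of a word that appears in some reference.
    raw = {w: c for w, c in pred_counts.items() if w in max_ref}
    # Clipped counts: capped at the word's max count in the references.
    clipped = {w: min(c, max_ref[w]) for w, c in raw.items()}
    return raw, clipped

prediction = "The more see the more the merrier flavor the food has".split()
# HYPOTHETICAL reference, not taken from the original example.
references = ["we say the more the merrier always".split()]

raw, clipped = unigram_matches(prediction, references)
print(raw)
print(clipped)
```

Applying the same capping to tuples of words instead of single words gives the corrected counts for 2-grams, 3-grams and 4-grams.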

For the example above, the correction factor for sentence length is

min(1, (length of prediction) / (length of reference)) = min(1, 11/7) = 1
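As a quick check of the arithmetic, the length correction can be written as a one-line function. The name `length_factor` is hypothetical. The formula is the one given above.

```python
def length_factor(prediction_length, reference_length):
    """Length correction as written above: min(1, prediction / reference)."""
    return min(1.0, prediction_length / reference_length)

# The example above: 11 predicted words against a reference of 7 words.
factor = length_factor(11, 7)
print(factor)
```

With this formula only predictions shorter than the reference are scaled down. A prediction that is longer, as in this example, keeps a factor of 1.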

The BLEU score is then calculated by combining the values above.
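The combined formula is not reproduced in the text, so the sketch below assumes the commonly used form: the length correction factor multiplied by the geometric mean of the corrected 1-gram to 4-gram precisions. The function name `bleu` and the precision values passed to it are placeholders chosen only to show the call. They are not the values of the example above.

```python
import math

def bleu(precisions, prediction_length, reference_length):
    """Length factor times the geometric mean of the n-gram precisions (assumed form)."""
    length_factor = min(1.0, prediction_length / reference_length)
    geometric_mean = math.prod(precisions) ** (1 / len(precisions))
    return length_factor * geometric_mean

# Placeholder precisions for 1-grams to 4-grams (NOT from the example).
placeholder_precisions = [0.5, 0.5, 0.5, 0.5]
score = bleu(placeholder_precisions, prediction_length=11, reference_length=7)
print(score)
```

Because the precisions are multiplied together, a single n-gram order with no matches drives the whole score to 0 under this form.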
