Name : Christoffel Daniel Yesaya Tambunan
Student ID : 9169420231
Course : Natural Language Programming

a) Prove that the loss J (Equation 2) is the same as the cross-entropy loss between y and \hat{y} (note that y, \hat{y} are vectors and \hat{y}_o is a scalar):

Answer :
y is a one-hot encoded vector.
• For w \neq o, y_w = 0, so y_w \log(\hat{y}_w) = 0.
• For w = o, y_w = 1, so y_w \log(\hat{y}_w) = \log(\hat{y}_o).
Substituting into the cross-entropy loss,
-\sum_{w \in \text{Vocab}} y_w \log(\hat{y}_w) = -\log(\hat{y}_o) = J(v_c, o, U).
Therefore the loss J is the same as the cross-entropy loss.

b) Compute the partial derivative of J(v_c, o, U) with respect to v_c. Please write your answer in terms of y, \hat{y}, and U. Show your work (the whole procedure) to receive full credit.

Answer :
\frac{\partial J(v_c, o, U)}{\partial v_c}
= -\frac{\partial}{\partial v_c} \log \frac{\exp(u_o^\top v_c)}{\sum_{w \in \text{Vocab}} \exp(u_w^\top v_c)}
= -\frac{\partial}{\partial v_c} \left( \log(\exp(u_o^\top v_c)) - \log \sum_{w \in \text{Vocab}} \exp(u_w^\top v_c) \right)
= -\frac{\partial}{\partial v_c} \left( u_o^\top v_c - \log \sum_{w \in \text{Vocab}} \exp(u_w^\top v_c) \right)
= -u_o + \sum_{w \in \text{Vocab}} \frac{\exp(u_w^\top v_c)}{\sum_{x \in \text{Vocab}} \exp(u_x^\top v_c)}\, u_w
= -u_o + \sum_{w \in \text{Vocab}} P(O = w \mid C = c)\, u_w
= -u_o + \sum_{w \in \text{Vocab}} \hat{y}_w u_w
= U(\hat{y} - y)

c) Compute the partial derivatives of J(v_c, o, U) with respect to each of the 'outside' word vectors, u_w's. There will be two cases: when w = o, the true 'outside' word vector, and w \neq o, for all other words. Please write your answer in terms of y, \hat{y}, and v_c. In this part, you may use specific elements within these terms as well (such as y_1, y_2, ...). Note that u_w is a vector while y_1, y_2, ... are scalars. Show your work (the whole procedure) to receive full credit.
Answer :
\frac{\partial J(v_c, o, U)}{\partial u_w}
= -\frac{\partial}{\partial u_w} \log \frac{\exp(u_o^\top v_c)}{\sum_{x \in \text{Vocab}} \exp(u_x^\top v_c)}
= -\frac{\partial}{\partial u_w} \left( u_o^\top v_c - \log \sum_{x \in \text{Vocab}} \exp(u_x^\top v_c) \right)
= -\frac{\partial}{\partial u_w} (u_o^\top v_c) + \frac{\partial}{\partial u_w} \log \sum_{x \in \text{Vocab}} \exp(u_x^\top v_c)
• When w = o :
= -v_c + P(O = o \mid C = c)\, v_c = \hat{y}_o v_c - v_c = (\hat{y}_o - 1)\, v_c
• When w \neq o :
= 0 + P(O = w \mid C = c)\, v_c = \hat{y}_w v_c

d) Write down the partial derivative of J(v_c, o, U) with respect to U. Please break down your answer in terms of the columns \frac{\partial J(v_c, o, U)}{\partial u_1}, \frac{\partial J(v_c, o, U)}{\partial u_2}, \ldots, \frac{\partial J(v_c, o, U)}{\partial u_{|\text{Vocab}|}}. No derivations are necessary, just an answer in the form of a matrix.

Answer :
The partial derivative of J with respect to U collects the partial derivatives with respect to each outside word vector (the parameters of the outside words) as its columns:
\frac{\partial J(v_c, o, U)}{\partial U} = \left[ \frac{\partial J(v_c, o, U)}{\partial u_1}, \frac{\partial J(v_c, o, U)}{\partial u_2}, \frac{\partial J(v_c, o, U)}{\partial u_3}, \ldots, \frac{\partial J(v_c, o, U)}{\partial u_{|\text{Vocab}|}} \right]

e) Suppose the center word is c = w_t and the context window is [w_{t-m}, \ldots, w_{t-1}, w_t, w_{t+1}, \ldots, w_{t+m}], where m is the context window size. Recall that for the skip-gram version of word2vec, the total loss for the context window is:
J_{\text{skip-gram}}(v_c, w_{t-m}, \ldots, w_{t+m}, U) = \sum_{-m \le j \le m,\, j \neq 0} J(v_c, w_{t+j}, U)
Here, J(v_c, w_{t+j}, U) represents an arbitrary loss term for the center word c = w_t and outside word w_{t+j}. J(v_c, w_{t+j}, U) is equal to Equation 2. Write down three partial derivatives:
I. \frac{\partial J_{\text{skip-gram}}(v_c, w_{t-m}, \ldots, w_{t+m}, U)}{\partial U}
II. \frac{\partial J_{\text{skip-gram}}(v_c, w_{t-m}, \ldots, w_{t+m}, U)}{\partial v_c}
III. \frac{\partial J_{\text{skip-gram}}(v_c, w_{t-m}, \ldots, w_{t+m}, U)}{\partial v_w} when w \neq c
Write your answers in terms of \frac{\partial J(v_c, w_{t+j}, U)}{\partial U} and \frac{\partial J(v_c, w_{t+j}, U)}{\partial v_c}. This is very simple – each solution should be one line.

Answer :
I. \frac{\partial J_{\text{skip-gram}}(v_c, w_{t-m}, \ldots, w_{t+m}, U)}{\partial U} = \sum_{-m \le j \le m,\, j \neq 0} \frac{\partial J(v_c, w_{t+j}, U)}{\partial U}
II. \frac{\partial J_{\text{skip-gram}}(v_c, w_{t-m}, \ldots, w_{t+m}, U)}{\partial v_c} = \sum_{-m \le j \le m,\, j \neq 0} \frac{\partial J(v_c, w_{t+j}, U)}{\partial v_c}
III. \frac{\partial J_{\text{skip-gram}}(v_c, w_{t-m}, \ldots, w_{t+m}, U)}{\partial v_w} = 0 when w \neq c
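As a quick sanity check of the answers in (b) and (c), the analytic gradients U(\hat{y} - y) and (\hat{y}_w - y_w) v_c can be compared against central finite differences on a toy problem. This is only an illustrative numpy sketch: the vocabulary size, embedding dimension, random seed, and the helper name `loss` are all made-up values for the check, not part of the assignment.

```python
import numpy as np

# Toy setup (assumed values): U is d x |Vocab| with columns u_w.
rng = np.random.default_rng(0)
V, d = 6, 4                    # toy vocab size and embedding dim
U = rng.normal(size=(d, V))    # outside word vectors as columns
v_c = rng.normal(size=d)       # center word vector
o = 2                          # index of the true outside word

def loss(U, v_c, o):
    """Naive-softmax loss J = -log softmax(U^T v_c)[o]; returns (J, y_hat)."""
    scores = U.T @ v_c
    scores = scores - scores.max()          # shift for numerical stability
    y_hat = np.exp(scores) / np.exp(scores).sum()
    return -np.log(y_hat[o]), y_hat

J, y_hat = loss(U, v_c, o)
y = np.zeros(V)
y[o] = 1.0

# Analytic gradients from parts (b) and (c):
grad_vc = U @ (y_hat - y)              # dJ/dv_c = U (y_hat - y)
grad_U = np.outer(v_c, y_hat - y)      # column w is (y_hat_w - y_w) v_c

# Central finite-difference check for dJ/dv_c.
eps = 1e-6
num_vc = np.zeros_like(v_c)
for i in range(d):
    e = np.zeros(d)
    e[i] = eps
    num_vc[i] = (loss(U, v_c + e, o)[0] - loss(U, v_c - e, o)[0]) / (2 * eps)

assert np.allclose(num_vc, grad_vc, atol=1e-5)
```

The same check applied entry-wise to U confirms the two cases of part (c) at once, since the w = o and w \neq o formulas are just different columns of grad_U.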