Machine Translation

Parallel Implementation of Word Alignment Model: IBM Model 1
Professor: Dr. Azimi
Fateme Ahmadi-Fakhr, Afshin Arefi, Saba Jamalian
Dept. of Electrical and Computer Engineering, Shiraz University
General-purpose Programming of Massively Parallel Graphics Processors

Machine Translation
• Suppose we are asked to translate a foreign sentence f into an English sentence e:
   f : f1 … fm
   e : e1 … el
• What should we do?
   • For each word in the foreign sentence f, we find its most proper word in English.
   • Based on our knowledge of the English language, we change the order of the generated English words.
   • We might also need to change the words themselves.
• f1 f2 f3 … fm
• e1 e2 e3 … em
• e1 e3 e2 em+1 … el

Example
Foreign sentence: امروز صبح به مدرسه رفتم
Translation model (finding the most proper English word for each foreign word): went school to morning today
Language model (reordering): today morning went to school
Reordering and changing the words: this morning I went to school

Statistical Translation Models
Foreign sentence: امروز صبح به مدرسه رفتم
Translation model (finding the most proper English word for each foreign word): went school to morning today
t(go | رفتم) > t(x | رفتم), where x ranges over all other English words
• The machine must know t(e|f) for all possible e and f to find the maximum.
• The machine should be trained:
   • IBM Models 1-5
   • Calculate t(f|e).

IBM Model 1 (Brown et al. [1993])
Corpus (large body of text) → Model 1 → t(f|e) for all e and f that occur in the corpus

IBM Model 1 (Brown et al. [1993])
Choose an initial value for t(f|e) for all f and e, then repeat the following steps until convergence:

IBM Model 1 (Brown et al. [1993])
The problem is to find t(f|e) for all e and f.
[Figure: the table t(f|e), with one row per English word ei and one column per foreign word fj. Entry [i, j] is how probable it is that fj is the translation of ei.]

IBM Model 1 (Brown et al. [1993])
[Figure: three tables. t(f|e): one row per ei and one column per fj, initialized. c(f|e): same shape as t(f|e), initialized to zero. total(e): one entry per ei, holding the sum of row i of c(f|e), initialized to zero.]

IBM Model 1 (Brown et al. [1993])
• In each sentence pair, for each f in the foreign sentence, we calculate ∑ t(f|e) over all e in the English sentence, called totals.
• Suppose we are given:
   • <f(s), e(s)>: < (f1 f2 f3), (e1 e2 e3 e4) >
   • totals[2] = t(f|e)[1,2] + t(f|e)[2,2] + t(f|e)[3,2] + t(f|e)[4,2]
   • c(f|e)[1,2] += t(f|e)[1,2] / totals[2]
   • total(e)[1] += t(f|e)[1,2] / totals[2]
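The bookkeeping for one foreign word can be sketched in plain C. This is a minimal host-side illustration, not the authors' code: the function name add_fractional_counts, the row-major layout (row = English word, column = foreign word), and the 0-based word ids are assumptions made for the example, whereas the slide uses 1-based indices.

```c
#include <assert.h>

/* Hypothetical helper: adds the fractional counts contributed by one
   foreign word f of a sentence pair.
   t, c    : row-major tables with num_f columns, indexed [e * num_f + f]
   total_e : one accumulator per English word
   e_ids   : ids of the English words of the sentence (0-based)
   Returns totals, the sum of t(f|e) over the English words of the sentence. */
float add_fractional_counts(const float *t, float *c, float *total_e,
                            int num_f, const int *e_ids, int e_len, int f)
{
    assert(e_len > 0 && num_f > 0);

    /* totals[f] = sum over e in e(s) of t(f|e) */
    float totals = 0.0f;
    for (int i = 0; i < e_len; ++i)
        totals += t[e_ids[i] * num_f + f];
    if (totals <= 0.0f)
        return totals;               /* nothing to distribute */

    /* Each English word receives its share of the single occurrence of f. */
    for (int i = 0; i < e_len; ++i) {
        int e = e_ids[i];
        float delta = t[e * num_f + f] / totals;
        c[e * num_f + f] += delta;
        total_e[e] += delta;
    }
    return totals;
}
```

With the slide's sentence pair (four English words, three foreign words) and a uniform t(f|e) = 1/3, calling the helper for f2 (id 1) gives totals = 4/3 and adds 0.25 to c(f|e)[e, f2] and to total(e)[e] for every English word e of the sentence.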

IBM Model 1 (Brown et al. [1993])
After processing all sentence pairs in the corpus, update the value of t(f|e) for all e and f:
   t(f|e)[i,j] = c(f|e)[i,j] / total(e)[i]
Then process the sentence pairs again, recalculating c(f|e) and total(e) with the updated t(f|e).
Continue the process until t(f|e) has converged.

IBM Model 1 (Pseudocode)

initialize t(f|e)                              // initialization
do until converge
    c(f|e) = 0 for all e and f                 // initialize to zero
    total(e) = 0 for all e
    for all sentence pairs do
        total(s,f) = 0 for all f
        for all f in f(s) do                   // calculating totals for each f in f(s)
            for all e in e(s) do
                total(s,f) += t(f|e)
        for all e in e(s) do                   // calculating c(f|e) and total(e)
            for all f in f(s) do
                c(f|e) += t(f|e)/total(s,f)
                total(e) += t(f|e)/total(s,f)
    for all e do                               // updating t(f|e) using c(f|e) and total(e)
        for all f do
            t(f|e) = c(f|e)/total(e)
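As a sequential reference point for the parallel version, the pseudocode can be sketched in plain C. This is a sketch under stated assumptions, not the authors' implementation: the SentencePair struct, the function name ibm1_train, the uniform 1/num_f initialization, the caller-provided scratch buffers, and the fixed iteration count (in place of a convergence test) are all choices made for the illustration.

```c
#include <assert.h>

/* Hypothetical corpus representation: words are integer ids. */
typedef struct {
    const int *e; int e_len;   /* English word ids of the sentence */
    const int *f; int f_len;   /* foreign word ids of the sentence */
} SentencePair;

/* t and c are row-major num_e x num_f tables (row = e, column = f);
   total_e has num_e entries. c and total_e are scratch buffers. */
void ibm1_train(float *t, float *c, float *total_e, int num_e, int num_f,
                const SentencePair *pairs, int num_pairs, int iterations)
{
    assert(num_e > 0 && num_f > 0);

    /* Initialization: uniform t(f|e). */
    for (int k = 0; k < num_e * num_f; ++k)
        t[k] = 1.0f / (float)num_f;

    for (int it = 0; it < iterations; ++it) {
        /* Initialize the counts to zero. */
        for (int k = 0; k < num_e * num_f; ++k) c[k] = 0.0f;
        for (int e = 0; e < num_e; ++e) total_e[e] = 0.0f;

        for (int s = 0; s < num_pairs; ++s) {
            const SentencePair *p = &pairs[s];
            for (int j = 0; j < p->f_len; ++j) {
                int f = p->f[j];

                /* total(s,f): sum of t(f|e) over the English sentence. */
                float total_sf = 0.0f;
                for (int i = 0; i < p->e_len; ++i)
                    total_sf += t[p->e[i] * num_f + f];
                if (total_sf <= 0.0f) continue;

                /* Accumulate c(f|e) and total(e). */
                for (int i = 0; i < p->e_len; ++i) {
                    int e = p->e[i];
                    float delta = t[e * num_f + f] / total_sf;
                    c[e * num_f + f] += delta;
                    total_e[e] += delta;
                }
            }
        }

        /* Update t(f|e) = c(f|e) / total(e). */
        for (int e = 0; e < num_e; ++e) {
            if (total_e[e] <= 0.0f) continue;  /* e never seen: keep its row */
            for (int f = 0; f < num_f; ++f)
                t[e * num_f + f] = c[e * num_f + f] / total_e[e];
        }
    }
}
```

On a two-pair toy corpus such as (the house / das Haus) and (the book / das Buch), a few iterations are enough for t(das|the) to overtake t(Haus|the), because "das" co-occurs with "the" in both pairs.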

Parallelizing IBM Model 1

initialize t(f|e)
do until converge
    c(f|e) = 0 for all e and f
    total(e) = 0 for all e
    for all sentence pairs do                  // the process on each sentence pair is independent of the others
        total(s,f) = 0 for all f
        for all e in e(s) do                   // for each f, e it is independent of the others
            for all f in f(s) do
                total(s,f) += t(f|e)
        for all e in e(s) do                   // for each f, e it is independent of the others
            for all f in f(s) do
                c(f|e) += t(f|e)/total(s,f)
                total(e) += t(f|e)/total(s,f)
    for all e do                               // updating each t(f|e), for all e and f, is independent of the others
        for all f do
            t(f|e) = c(f|e)/total(e)

total(e) is the per-row normalizer of the table; the kernels store it in the array device_total_f.

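Initialize t(f|e)
Each thread initializes one entry of t(f|e) to a specified value:

• Unscaled version:

    __global__ void initialize(float* device_t_f_e){
        int pos = blockIdx.x*blockDim.x + threadIdx.x;  // one thread per table entry
        device_t_f_e[pos] = 1.0f/NUM_F;                 // uniform over the foreign words
    }

• Scaled version:

    __global__ void initialize(float* device_t_f_e){
        int pos = blockIdx.x*blockDim.x + threadIdx.x;  // one thread per table entry
        device_t_f_e[pos] = 100000.0f/NUM_F;            // float literal avoids integer division if NUM_F is an integer
    }

Underflow is possible with the unscaled values; the second version scales each entry by 100000.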

Process of Each Sentence Pair

for all sentence pairs do
    total(s,f) = 0 for all f
    for all e in e(s) do
        for all f in f(s) do
            total(s,f) += t(f|e)
    for all e in e(s) do
        for all f in f(s) do
            c(f|e) += t(f|e)/total(s,f)
            total(e) += t(f|e)/total(s,f)

• Each thread processes one sentence pair.
• Using shared memory.
• No use of reduction. Why?
• Use atomicAdd(), as it is possible that two or more threads add a value to the same entry of c(f|e) or total(e) (the array device_total_f) simultaneously. It is data dependent.
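Why a plain += is not enough when several threads may hit the same entry can be illustrated on the host with C11 atomics. The sketch below is an analogue of an atomic float add built from a compare-and-swap loop. It is not the CUDA atomicAdd() intrinsic, and the names atomic_add_float and atomic_load_float are hypothetical helpers introduced for the example; the float is stored as its 32-bit pattern so that it can be swapped atomically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float and back, without aliasing issues. */
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* Hypothetical helper: atomically performs *cell += value on a float
   stored as its bit pattern. If another thread changes the cell between
   the read and the swap, the swap fails, old is refreshed with the
   current contents, and the addition is retried on the new value. */
void atomic_add_float(_Atomic uint32_t *cell, float value)
{
    assert(sizeof(float) == sizeof(uint32_t));
    uint32_t old = atomic_load(cell);
    uint32_t desired;
    do {
        desired = float_to_bits(bits_to_float(old) + value);
    } while (!atomic_compare_exchange_weak(cell, &old, desired));
}

/* Hypothetical helper: reads the float currently stored in the cell. */
float atomic_load_float(_Atomic uint32_t *cell)
{
    return bits_to_float(atomic_load(cell));
}
```

Without the retry loop, two threads that read the same old value would both write old + delta, and one of the two fractional counts would be lost. How often that happens depends on which words the sentence pairs share, which is the data dependence the slide mentions.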

Updating t(f|e)
Each thread updates one entry of t(f|e) and sets one entry of c(f|e) to zero for the next iteration:

    __global__ void update(float* device_t_f_e, float* device_count_f_e,
                           float* device_total_f, int block_size, int Col)
    {
        int pos = blockIdx.x*block_size + threadIdx.x;  // one thread per table entry
        float total = device_total_f[pos/Col];          // normalizer of this entry's row
        float count = device_count_f_e[pos];
        device_t_f_e[pos] = (100000*count/total);       // scaled by 100000, as in the initialization
        device_count_f_e[pos] = 0;                      // reset the count for the next iteration
    }

Here it is not possible to set total(e) (the array device_total_f) to zero: there is no synchronization between threads of different blocks, so a thread in another block may still need to read the same entry.

Setting total(e) to Zero
Each thread sets one entry of total(e) (the array device_total_f) to zero:

    __global__ void total(float* device_total_f){
        int pos = threadIdx.x + blockDim.x*blockIdx.x;  // one thread per entry
        device_total_f[pos] = 0;
    }

Results

Future Goals
• Convergence condition:
   • We repeat the iterations of calculating c(f|e) and t(f|e) five times.
   • But the stopping condition should be derived from the value of t(f|e).
   • We wish to add it to our code, since it can be parallelized.
• Model 1 is only one of IBM Models 1-5, which are implemented in the GIZA++ package.
   • We wish to parallelize the four other models.
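One possible convergence test derived from t(f|e) is to stop when no entry changed by more than a small threshold between two iterations. The slides do not specify a criterion, so the following is only an assumed example: the function names max_abs_change and has_converged, the threshold epsilon, and the need to keep a copy of the previous table are all hypothetical. The maximum is a reduction over independent entries, which is why such a test lends itself to parallelization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest absolute difference between the previous
   and the current table, both holding n entries. */
float max_abs_change(const float *old_t, const float *new_t, size_t n)
{
    assert(n > 0);
    float max_change = 0.0f;
    for (size_t k = 0; k < n; ++k) {
        float diff = new_t[k] - old_t[k];
        if (diff < 0.0f) diff = -diff;
        if (diff > max_change) max_change = diff;
    }
    return max_change;
}

/* Hypothetical stopping rule: returns 1 when no entry moved by more
   than epsilon, 0 otherwise. */
int has_converged(const float *old_t, const float *new_t, size_t n,
                  float epsilon)
{
    return max_abs_change(old_t, new_t, n) <= epsilon;
}
```

If the table is stored scaled by 100000 as in the kernels, epsilon would have to be scaled by the same factor.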

We want to express our appreciation to:
• Dr. Fazly, for her useful comments and valuable remarks.
• Dr. Azimi, for his kindness and full support.
