
**Parallel Implementation of Word Alignment Model: IBM Model 1**

Professor: Dr. Azimi

Fateme Ahmadi-Fakhr, Afshin Arefi, Saba Jamalian

Dept. of Electrical and Computer Engineering, Shiraz University

General-purpose Programming of Massively Parallel Graphics Processors

## Machine Translation


• Suppose we are asked to translate a foreign sentence f into an English sentence e:
  f: f1 … fm
  e: e1 … el
• What should we do?
  • For each word in the foreign sentence f, we find its most suitable word in English.
  • Based on our knowledge of the English language, we change the order of the generated English words.
  • We might also need to change the words themselves.
• f1 f2 f3 … fm (the foreign sentence)
• e1 e2 e3 … em (word-by-word translation)
• e1 e3 e2 em+1 … el (after reordering and changing words)

**Example**

امروز صبح به مدرسه رفتم

• Translation model (finding the most suitable English word for each word): went school to morning today
• Language model: today morning went to school
• Reordering and changing the words: this morning I went to school

**Statistical Translation Models**

امروز صبح به مدرسه رفتم

Translation model (finding the most suitable English word for each word): went school to morning today

t(go | رفتم) > t(x | رفتم), where x ranges over all other English words.

• The machine must know t(e|f) for all possible e and f to find the maximum.
• The machine should be trained:
  • IBM Models 1-5
  • Calculate t(f|e).

**IBM Model 1 (Brown et al. [1993])**

Corpus (large body of text) → Model 1 → t(f|e) for all e and f that occur in the corpus.

Choose an initial value for t(f|e) for all f and e, then repeat the following steps until convergence:

The problem is to find t(f|e) for all e and f. t(f|e) is a table with one row per English word ei and one column per foreign word fj; each entry tells how probable it is that fj is the translation of ei.

The algorithm works on three tables:

• t(f|e): one row per ei and one column per fj, set to an initial value.
• c(f|e): the same shape as t(f|e).
• total(e): one entry per ei, holding the sum of the corresponding row of c(f|e); initialized to zero.

• In each sentence pair, for each f in the foreign sentence, we calculate ∑ t(f|e) over all e in the English sentence; these sums are called totals.
• Suppose we are given the sentence pair <f(s), e(s)>: <(f1 f2 f3), (e1 e2 e3 e4)>
• totals[2] = t(f|e)[1,2] + t(f|e)[2,2] + t(f|e)[3,2] + t(f|e)[4,2]
• c(f|e)[1,2] += t(f|e)[1,2] / totals[2]
• total(e)[1] += t(f|e)[1,2] / totals[2]

Start processing the sentence pairs, calculating c(f|e) and total(e) using t(f|e). After processing all sentence pairs in the corpus, update the value of t(f|e) for all e and f:

t(f|e)[i,j] = c(f|e)[i,j] / total(e)[i]

Continue the process until the value of t(f|e) has converged.

**IBM Model 1 (Pseudocode)**

    initialize t(f|e)                          // initialization
    do until converge
        c(f|e) = 0 for all e and f             // initialize to zero
        total(e) = 0 for all e
        for all sentence pairs s do
            total(s,f) = 0 for all f           // calculating totals for each f in f(s)
            for all f in f(s) do
                for all e in e(s) do
                    total(s,f) += t(f|e)
            for all e in e(s) do               // calculating c(f|e) and total(e)
                for all f in f(s) do
                    c(f|e) += t(f|e) / total(s,f)
                    total(e) += t(f|e) / total(s,f)
        for all e do                           // updating t(f|e) using c(f|e) and total(e)
            for all f do
                t(f|e) = c(f|e) / total(e)

**Parallelizing IBM Model 1**

    initialize t(f|e)                          // each (f, e) entry is independent of the others
    do until converge
        c(f|e) = 0 for all e and f             // each (f, e) entry is independent of the others
        total(f) = 0 for all f
        for all sentence pairs s do            // each sentence pair is processed independently of the others
            total(s,f) = 0 for all f
            for all e in e(s) do
                for all f in f(s) do
                    total(s,f) += t(f|e)
            for all e in e(s) do
                for all f in f(s) do
                    c(f|e) += t(f|e) / total(s,f)
                    total(f) += t(f|e) / total(s,f)
        for all e do                           // each t(f|e) update is independent of the others
            for all f do
                t(f|e) = c(f|e) / total(f)

**Initialize t(f|e)**

Each thread initializes one entry of t(f|e) to a specified value:

    __global__ void initialize(float* device_t_f_e)
    {
        // One thread per table entry.
        int pos = blockIdx.x * blockDim.x + threadIdx.x;
        device_t_f_e[pos] = (1.0 / NUM_F);
    }

Underflow is possible, so the second version scales the initial value by 100000:

    __global__ void initialize(float* device_t_f_e)
    {
        int pos = blockIdx.x * blockDim.x + threadIdx.x;
        // Float literal avoids integer division if NUM_F is an integer constant.
        device_t_f_e[pos] = (100000.0f / NUM_F);
    }

**Process Of Each Sentence Pair**

    for all sentence pairs s do
        total(s,f) = 0 for all f
        for all e in e(s) do
            for all f in f(s) do
                total(s,f) += t(f|e)
        for all e in e(s) do
            for all f in f(s) do
                c(f|e) += t(f|e) / total(s,f)
                total(f) += t(f|e) / total(s,f)

• Using shared memory.
• No use of reduction. Why?
• Each thread processes one sentence pair.
• Use atomicAdd(), as it is possible that two or more threads add a value to c(f|e) or total(f) simultaneously. This is data dependent.

**Updating t(f|e)**

    __global__ void update(float* device_t_f_e, float* device_count_f_e,
                           float* device_total_f, int block_size, int Col)
    {
        int pos = blockIdx.x * block_size + threadIdx.x;
        float total = device_total_f[pos / Col];   // pos / Col is the row of this entry
        float count = device_count_f_e[pos];
        device_t_f_e[pos] = (100000 * count / total);
        device_count_f_e[pos] = 0;                 // reset c(f|e) for the next iteration
    }

Each thread updates one entry of t(f|e) and sets one entry of c(f|e) to zero for the next iteration. Here it is not possible to set total(f) to zero, as there is no synchronization between threads outside a block.

**Setting total(f) to Zero**

Each thread sets one entry of total(f) to zero:

    __global__ void total(float* device_total_f)
    {
        int pos = threadIdx.x + blockDim.x * blockIdx.x;
        device_total_f[pos] = 0;
    }

**Future Goals**

• Convergence condition:
  • We currently repeat the iterations of calculating c(f|e) and t(f|e) 5 times.
  • The stopping point should instead be derived from the value of t(f|e).
  • We wish to add this check to our code, as it can also be parallelized.
• Model 1 is just one of IBM Models 1-5, which are implemented in the GIZA++ package.
  • We wish to parallelize the other four models.

**We Want to Express Our Appreciation to:**

Dr. Fazly, for her useful comments and valuable remarks.

Dr. Azimi, for his kindness and full support.
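The "Process Of Each Sentence Pair" slide describes its kernel only in pseudocode. Below is a minimal sketch of how such a kernel might look, with one thread per sentence pair and atomicAdd() on the shared c(f|e) and total(f) entries. It is not the presenters' code. The names `process_pairs`, `run_toy`, `f_words`, `e_words` and `off`, the flattened sentence layout, and the row-major table indexing `f * num_e + e` are all assumptions made for illustration. The sketch also keeps total(s,f) in a register rather than shared memory, and normalizes each foreign position separately.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// ASSUMED layout (not from the slides): t and count are NUM_F x NUM_E row-major
// tables, so entry (f, e) sits at f * num_e + e. Sentences are flattened word-id
// arrays; pair s owns f_words[f_off[s] .. f_off[s+1]) and likewise for e.
__global__ void process_pairs(const float* t_f_e, float* count_f_e, float* total_f,
                              const int* f_words, const int* f_off,
                              const int* e_words, const int* e_off,
                              int num_pairs, int num_e)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per sentence pair
    if (s >= num_pairs) return;
    for (int j = f_off[s]; j < f_off[s + 1]; ++j) {
        int f = f_words[j];
        float total_s_f = 0.0f;  // total(s,f) for this foreign position
        for (int i = e_off[s]; i < e_off[s + 1]; ++i)
            total_s_f += t_f_e[f * num_e + e_words[i]];
        for (int i = e_off[s]; i < e_off[s + 1]; ++i) {
            int e = e_words[i];
            float delta = t_f_e[f * num_e + e] / total_s_f;
            // Other threads may update the same entries, so both adds are atomic.
            atomicAdd(&count_f_e[f * num_e + e], delta);
            atomicAdd(&total_f[f], delta);
        }
    }
}

// Hypothetical toy corpus: 2 sentence pairs over 3 foreign and 3 English word ids.
// Writes total(f) (3 floats) and c(f|e) (9 floats); returns 0 on success.
int run_toy(float* total_out, float* count_out)
{
    const int NUM_F = 3, NUM_E = 3, NUM_PAIRS = 2, NUM_WORDS = 4;
    const int h_f[] = {0, 1, 1, 2};   // pair 0: f0 f1, pair 1: f1 f2
    const int h_e[] = {0, 1, 1, 2};   // pair 0: e0 e1, pair 1: e1 e2
    const int h_off[] = {0, 2, 4};    // both sides share the same offsets here
    float *t = 0, *count = 0, *total = 0;
    int *f_words = 0, *e_words = 0, *off = 0;

    cudaError_t err = cudaMallocManaged(&t, NUM_F * NUM_E * sizeof(float));
    if (err == cudaSuccess) err = cudaMallocManaged(&count, NUM_F * NUM_E * sizeof(float));
    if (err == cudaSuccess) err = cudaMallocManaged(&total, NUM_F * sizeof(float));
    if (err == cudaSuccess) err = cudaMallocManaged(&f_words, NUM_WORDS * sizeof(int));
    if (err == cudaSuccess) err = cudaMallocManaged(&e_words, NUM_WORDS * sizeof(int));
    if (err == cudaSuccess) err = cudaMallocManaged(&off, (NUM_PAIRS + 1) * sizeof(int));
    if (err != cudaSuccess) return 1;

    for (int k = 0; k < NUM_F * NUM_E; ++k) { t[k] = 1.0f / NUM_F; count[k] = 0.0f; }
    for (int k = 0; k < NUM_F; ++k) total[k] = 0.0f;
    for (int k = 0; k < NUM_WORDS; ++k) { f_words[k] = h_f[k]; e_words[k] = h_e[k]; }
    for (int k = 0; k <= NUM_PAIRS; ++k) off[k] = h_off[k];

    process_pairs<<<1, 32>>>(t, count, total, f_words, off, e_words, off,
                             NUM_PAIRS, NUM_E);
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    for (int k = 0; k < NUM_F; ++k) total_out[k] = total[k];
    for (int k = 0; k < NUM_F * NUM_E; ++k) count_out[k] = count[k];
    cudaFree(t); cudaFree(count); cudaFree(total);
    cudaFree(f_words); cudaFree(e_words); cudaFree(off);
    return 0;
}

int main()
{
    float total[3], count[9];
    if (run_toy(total, count) != 0) { printf("CUDA error\n"); return 1; }
    for (int f = 0; f < 3; ++f) printf("total(f=%d) = %f\n", f, total[f]);
    return 0;
}
```

In the toy corpus, word f1 appears in both sentence pairs, so two threads add to total(f1) and to c(f1|e1). This is the collision the slide's atomicAdd() remark guards against. Whether such collisions occur in practice depends on the corpus.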
