Understanding the Maltese Alphabet and Challenges in Tourism Industry Recovery
This lecture outlines the problem of automatically recovering the special Maltese characters in text written with ordinary characters. It introduces the Maltese alphabet and illustrates the problem with a passage from KullĦadd on the tourism sector, which cites a Deloitte study reporting a 19.8% loss in hotel earnings attributed to fewer tourists and rising costs. It then considers the algorithmic model for recovery, applying the noisy channel model at the level of words, and sets out a four-step main algorithm.
Presentation Transcript
CHARM Lecture 1: Outline of the Problem
The Problem 1: The Maltese Alphabet
Letters, with the name of each letter in parentheses:
A a (a), B b (be), Ċ ċ (ċe), D d (de), E e (e), F f (ef), Ġ ġ (ġe), G g (ge), Għ għ (ajn), H h (akka),
Ħ ħ (ħe), I i (i), Ie ie (ie), J j (je), K k (ke), L l (elle), M m (emme), N n (enne), O o (o), P p (pe),
Q q (qe), R r (erre), S s (esse), T t (te), U u (u), V v (ve), W w (we), X x (exxe), Ż ż (że), Z z (zej)
We will refer to ordinary characters that could yield Maltese characters as charms.
The Problem 2: from KullĦadd
FIL-KRIZI li ghandna fit-turizmu fil-gzejjer taghna l-aghar li qed jintlaqtu huma l-lukandi tal tliet stilel. L-ahhar studju li sar mid-Deloitte ghall-Assocjazzjoni Maltija tal-Lukandi u Ristoranti jghidilna kif in-nuqqas tal turisti u z-zieda fl-ispejjez ghal dawn il-lukandi fissru li ghamlu telf tal 19.8% fir-rata tal qliegh taghhom u fosthom kien hemm min salva biss anki fl-aqwa tas-sajf permezz tal l-istudenti. L-istess studju juri li 70% tas-sidien tal dawn il-lukandi jibzghu li se jkomplu jbatu min-nuqqas tal turisti u se jkollhom hafna kmamar vojta fix-xhur li gejjin.
The Problem 3
Is there some way in which we can recover the special Maltese characters automatically? If so:
• What is the underlying algorithmic model?
• What knowledge must the programme bring to bear?
• What resources are needed to build the knowledge base?
Noisy Channel Model for Sentence Translation (Brown et al. 1990)
[Diagram from Jurafsky & Martin, labelled: source sentence, target sentence]
Algorithmic Model
• The noisy channel model is domain independent.
• Brown et al. applied it to the domain of translation from a source language to a target language.
• We can use it for the domain of words.
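For reference, the standard noisy channel decision rule can be written out as below. Here s is a candidate source word, t is the observed target word, and S is the set of candidates for t. The last remark is an assumption added for illustration, not something stated on the slides: if the channel is taken to map every candidate in S to t with the same probability, the channel term P(t | s) is constant and drops out, which would leave the argmax over P(s) that appears on the Step 3 slide.

```latex
\begin{align*}
\hat{s} &= \operatorname*{arg\,max}_{s \in S} P(s \mid t) \\
        &= \operatorname*{arg\,max}_{s \in S} \frac{P(t \mid s)\, P(s)}{P(t)} \\
        &= \operatorname*{arg\,max}_{s \in S} P(t \mid s)\, P(s)
\end{align*}
```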
Noisy Channel at Word Level
KullĦadd (source) -> NOISY CHANNEL -> KullHadd (target)
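The word-level channel on this slide can be imitated with a small sketch. It assumes the channel does nothing except replace ċ, ġ, ħ and ż (and their capitals) with the plain characters c, g, h and z; the names TO_CHARM and noisy_channel are hypothetical and chosen only for this illustration.

```python
# Assumed channel: each special Maltese letter collapses to its plain look-alike.
TO_CHARM = str.maketrans({
    "ċ": "c", "ġ": "g", "ħ": "h", "ż": "z",
    "Ċ": "C", "Ġ": "G", "Ħ": "H", "Ż": "Z",
})


def noisy_channel(source_word: str) -> str:
    """Return the target word seen after the assumed channel."""
    return source_word.translate(TO_CHARM)


print(noisy_channel("KullĦadd"))  # KullHadd
```

The recovery problem is the reverse direction: given KullHadd, decide which source word produced it.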
Main Algorithm: Four Steps
• See target word t
• Generate the set S of all possible source words for that word
• Pick the most probable source word s in S
• Output s
Step 1: See Target Word
• Preprocessing: noise, case, punctuation, hyphen
• Tokenisation: words, numbers, other
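One possible reading of Step 1 is sketched below. The slide does not say how case, punctuation or hyphens are to be handled, so the choices here are assumptions: text is folded to lower case, and hyphens and punctuation are emitted as separate "other" tokens so that prefixes such as fir- are split from the word that follows. Noise removal is left out. TOKEN_RE and tokenise are hypothetical names.

```python
import re

# Three token classes, matching the slide: numbers, words, other.
# A number may carry a decimal part and a trailing percent sign (e.g. 19.8%).
TOKEN_RE = re.compile(
    r"(?P<number>\d+(?:\.\d+)?%?)"
    r"|(?P<word>[^\W\d_]+)"   # a run of letters, including ċ, ġ, ħ, ż
    r"|(?P<other>\S)"         # any other single non-space character
)


def tokenise(text: str) -> list[tuple[str, str]]:
    """Lower-case the text (an assumption) and return (class, token) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text.lower())]


for kind, token in tokenise("telf tal 19.8% fir-rata"):
    print(kind, token)
```

Only tokens of class "word" would be passed on to Step 2; numbers and other tokens would be copied to the output unchanged.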
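Step 2
• Generate S
If t contains charms, generate S = { s | len(s) = len(t) and, for all i with 0 < i <= len(t), s[i] = t[i] or s[i] = m(t[i]) }, where m maps a charm to the Maltese character it could yield. Otherwise S = { t }.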
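The set definition in Step 2 can be sketched as a Cartesian product over the per-character choices. The mapping M below plays the role of m and assumes the charms are c, g, h and z with their capitals; that list and the name generate_sources are assumptions made for this example.

```python
from itertools import product

# Assumed charm map m: plain character -> Maltese character it could yield.
M = {"c": "ċ", "g": "ġ", "h": "ħ", "z": "ż",
     "C": "Ċ", "G": "Ġ", "H": "Ħ", "Z": "Ż"}


def generate_sources(t: str) -> set[str]:
    """Return every s with s[i] == t[i] or s[i] == M[t[i]] at each position."""
    # A charm offers two choices at its position; any other character offers one.
    options = [(ch, M[ch]) if ch in M else (ch,) for ch in t]
    return {"".join(choice) for choice in product(*options)}


print(sorted(generate_sources("KullHadd")))  # ['KullHadd', 'KullĦadd']
```

A word with k charms yields 2^k candidates, most of which are not Maltese words, which is why the next step has to rank them.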
Step 3
• Pick the most probable source word s in S: return argmax(P(s)) for s in S
• This is covered in lecture 2
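Pending lecture 2, the argmax itself can be illustrated with a toy unigram model. This is only a sketch: the counts below are invented for the example and are not corpus data, and estimating P(s) from relative frequency is an assumption, not necessarily the method the course goes on to use. TOY_COUNTS and pick_source are hypothetical names.

```python
from collections import Counter

# Invented counts for illustration only; not taken from any real corpus.
TOY_COUNTS = Counter({"għandna": 12, "ġdid": 3, "hemm": 7})


def pick_source(candidates: set[str], counts: Counter) -> str:
    """Return the candidate with the highest relative frequency P(s)."""
    total = sum(counts.values())

    def probability(s: str) -> float:
        # Unseen candidates get probability 0 in this unsmoothed sketch.
        return counts.get(s, 0) / total if total else 0.0

    # Sorting first makes the result deterministic when probabilities tie.
    return max(sorted(candidates), key=probability)


S = {"ghandna", "għandna", "ġhandna", "ġħandna"}
print(pick_source(S, TOY_COUNTS))  # għandna
```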