CMSC 723 / LING 645: Intro to Computational Linguistics
September 29, 2004: Dorr
Toward a Kimmo FAQ
Prof. Bonnie J. Dorr
Dr. Christof Monz
TA: Adam Lee
FAQ for Kimmo

1) What order does PCKimmo go through the FSAs? Is it in the order listed in the .aut file?
NO. They are traversed simultaneously (as described in class last week); I'll emphasize this again this week.

2) In the .aut file, there is an underscore in all of the FSAs. What does this mean?
You should only use the characters that I have specified for German in the lab description. The underscore appears in the English automata only because they were used in a working system, after tokenization had taken place, so it was there to deal with tokens like "because_of" or "so_that". You don't need to do anything like this, so ignore it for your lab.

3) Can you have two letters after the colon (replacement operator)?
NO. As described last week in class, only single characters are allowed.
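To make the answer to question 1 concrete, here is a minimal sketch of what "simultaneously traversed" means: every automaton is advanced on the same lexical:surface pair, and the pair is rejected as soon as any one automaton fails. The dictionary layout, the function names, and the two toy automata are assumptions made for illustration; this is not PCKimmo's file format or its actual implementation. State 0 stands for failure and `=:=` is used as the catch-all pair, following the notation in this FAQ.

```python
# Hypothetical in-memory automata, NOT PCKimmo's .aut format.
# Each automaton: {"final": set of final states,
#                  "delta": {state: {(lexical, surface): next_state}}}
# State 0 means failure; ("=", "=") is the catch-all pair.

WILDCARD = ("=", "=")


def step_all(automata, states, pair):
    """Advance every automaton on the same pair; None if any one fails."""
    next_states = []
    for fsa, state in zip(automata, states):
        row = fsa["delta"][state]
        # Use the explicit transition if present, else the catch-all, else fail.
        nxt = row.get(pair, row.get(WILDCARD, 0))
        if nxt == 0:
            return None
        next_states.append(nxt)
    return next_states


def accepts(automata, pairs):
    """True only if all automata survive every pair and all end in a final state."""
    states = [1] * len(automata)  # every automaton starts in state 1
    for pair in pairs:
        states = step_all(automata, states, pair)
        if states is None:
            return False
    return all(s in fsa["final"] for fsa, s in zip(automata, states))


# Toy automaton A: x:y is allowed only immediately after a:a.
fsa_a = {
    "final": {1, 2},
    "delta": {
        1: {("a", "a"): 2, ("x", "y"): 0, WILDCARD: 1},
        2: {("a", "a"): 2, ("x", "y"): 1, WILDCARD: 1},
    },
}

# Toy automaton B: x:y must be immediately followed by b:b.
fsa_b = {
    "final": {1},
    "delta": {
        1: {("x", "y"): 2, ("b", "b"): 1, WILDCARD: 1},
        2: {("b", "b"): 1, ("x", "y"): 0, WILDCARD: 0},
    },
}

good = [("a", "a"), ("x", "y"), ("b", "b")]
no_left_context = [("x", "y"), ("b", "b")]
no_right_context = [("a", "a"), ("x", "y")]
```

With these toy automata, `accepts([fsa_a, fsa_b], good)` succeeds, while the other two sequences are rejected: one because automaton A fails, the other because automaton B ends in a non-final state. The order of the list `[fsa_a, fsa_b]` has no effect on the result, which is the point of the answer above.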
FAQ for Kimmo (continued)

4) In an FSA, where specificity matters, it is only the feasible pair that matters, not the individual letter, correct? (For example, if I have a pair of e:i, I do not need to mention e or i specifically if I do not want this feasible pair to happen again.)
If you have the feasible pair e:i in one automaton, you do not need to mention that pair explicitly again in some other automaton. Unfortunately, however, once you have the feasible pair e:i, it affects what you put in the other automata. In particular, every other automaton needs a transition that covers that case, even if it is simply the transition =:=. That is, you need to make sure you don't fail in some *other* automaton while you are converting an "e" to an "i" in the e-to-i automaton.
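The coverage requirement in question 4 can be checked mechanically. The sketch below assumes each automaton is summarized by the list of pairs in its column headers, and that a `=:=` column covers any pair not listed explicitly, as described in the answer above. The function name and the sample automata are hypothetical and exist only to illustrate the check.

```python
# Hypothetical helper: report which feasible pairs each automaton cannot handle.
# An automaton is represented here only by its column headers (a list of pairs).

WILDCARD = ("=", "=")


def uncovered_pairs(automata_columns, feasible_pairs):
    """Map automaton name -> feasible pairs with no explicit or =:= column."""
    problems = {}
    for name, columns in automata_columns.items():
        if WILDCARD in columns:
            continue  # the catch-all column covers every remaining pair
        missing = [p for p in feasible_pairs if p not in columns]
        if missing:
            problems[name] = missing
    return problems


# Introducing e:i in one automaton makes it a feasible pair for all of them.
feasible = [("e", "i"), ("e", "e"), ("i", "i")]

automata_columns = {
    "e-to-i": [("e", "i"), WILDCARD],
    "other": [("e", "e"), ("i", "i")],  # no e:i column and no =:= column
}

report = uncovered_pairs(automata_columns, feasible)
```

Here `report` flags the automaton named "other": it would fail whenever the e-to-i automaton consumes e:i. Adding a `=:=` column to it is the simplest fix.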
FAQ for Kimmo (continued)

5) Nouns in German are capitalized, and this is how they are listed, but the root forms are not capitalized? I assumed we should make the root forms capitals. Is this correct?
No, the root forms need not be capital letters. We stated this assumption explicitly in the lab: "For the purpose of this project, use lower case characters only, even for German nouns (which are usually otherwise capitalized)."

6) In PC Kimmo, how do we say which states are end states for the FSAs?
I COMPLETELY FORGOT TO MENTION THIS LAST WEEK! However, it is very clearly stated in the PC Kimmo manual. Use colon (:) to indicate a final state and dot (.) to indicate a non-final state. (Although I didn't mention this, it is evident in the matrix form of the automaton I showed last week.)
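As an illustration of the colon/dot convention in question 6, the sketch below reads one row of a state table in matrix form. It assumes a row consists of the state number with the marker attached, followed by the target states, for example `1: 3 2 1 1` for a final state and `3. 0 0 1 0` for a non-final one. The sample rows and the function name are made up for this example; check the manual for the exact layout your version expects.

```python
def parse_state_row(line):
    """Split a state-table row into (state number, is_final, target states)."""
    label, *targets = line.split()
    marker = label[-1]
    if marker not in ":.":
        raise ValueError(f"state label must end in ':' or '.': {label!r}")
    # ':' marks a final state, '.' marks a non-final state.
    return int(label[:-1]), marker == ":", [int(t) for t in targets]


final_row = parse_state_row("1: 3 2 1 1")
non_final_row = parse_state_row("3. 0 0 1 0")
```

For the two sample rows, state 1 is parsed as final and state 3 as non-final; a row whose label ends in neither marker raises `ValueError`.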