This document explores the concepts of deterministic and non-deterministic recognition in natural language processing (NLP) algorithms, as adapted from Jurafsky and Martin. It covers the representations of automata using transition tables, the process of recognition, and the differentiation between deterministic and non-deterministic finite state automata (DFA and NFA). Key points include failure states, the mechanics of recognition processes, and the implications of non-determinism in computational searches.
CSA305: Natural Language Algorithms Deterministic and Non Deterministic Recognition CSA3050 NLP Algorithms
Acknowledgement • Material presented is adapted from Jurafsky and Martin, Ch. 2
Representation of Automata using Transition Tables
Transition Table Representation in Prolog
  % Columns: state S, then the target state on b, on a and on '!'.
  % An entry of 0 is taken to mean "no transition".
  %  S b a !
  s(0,1,0,0).
  s(1,0,2,0).
  s(2,0,3,0).
  s(3,0,3,4).
  s(4,0,0,0).
  next(OldState,b,NewState) :- s(OldState,NewState,_,_).
  next(OldState,a,NewState) :- s(OldState,_,NewState,_).
  next(OldState,'!',NewState) :- s(OldState,_,_,NewState).
A Better Representation
  s(0,b,1).
  s(1,a,2).
  s(2,a,3).
  s(3,a,3).
  s(3,'!',4).
  next(OldState,Sym,NewState) :- s(OldState,Sym,NewState).
The Process of Recognition 1 • Start in the initial state and at the first symbol of the word. • If there is an arc labelled with that symbol, the machine transitions to the next state, and the symbol is consumed. • The process continues with successive symbols until ....
The Process of Recognition 2 One of these conditions holds: • A. All symbols in the input have been consumed: if the current state is final, succeed; otherwise fail. • B. There is no transition out of the current state for the current symbol: fail.
Deterministic Recognition • A deterministic algorithm is one that has no choice points. • The following algorithm takes as input a tape and an automaton. • It returns accept if the string on the tape is accepted by the automaton, and reject otherwise.
DETERMINISTIC FSA RECOGNITION
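The table-driven recognition loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the course's own listing: the names `SHEEP`, `FINAL_STATES` and `d_recognise` are hypothetical, the table mirrors the sheep-talk facts s(0,b,1) ... s(3,'!',4), and state 4 is assumed to be the only final state.

```python
# Transition table as a dict: (state, symbol) -> next state.
# Mirrors the facts s(0,b,1), s(1,a,2), s(2,a,3), s(3,a,3), s(3,'!',4).
SHEEP = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
FINAL_STATES = {4}  # assumption: state 4 is the only final state


def d_recognise(tape, table=SHEEP, start=0, finals=FINAL_STATES):
    """Return "accept" or "reject"; the loop has no choice points."""
    state = start
    for symbol in tape:
        if (state, symbol) not in table:
            return "reject"  # condition B: no arc for the current symbol
        state = table[(state, symbol)]
    # Condition A: input consumed, so succeed only in a final state.
    return "accept" if state in finals else "reject"


print(d_recognise("baaa!"))  # accept
print(d_recognise("ba!"))    # reject: no '!' arc out of state 2
print(d_recognise("baa"))    # reject: state 3 is not final
```

Because each (state, symbol) pair maps to at most one next state, the loop never has to remember alternatives.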
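Skeleton of Prolog Implementation
  % drec(+Tape, +Machine, +State, -Result)
  % final/1 and tran/4 are left to be defined for the machine in use.
  % The cuts commit to the first applicable clause, so that "no" is
  % not also offered on backtracking after a successful step.
  drec([], _, S, yes) :- final(S), !.
  drec([H|T], M, S, Result) :- tran(M,S,H,N), !, drec(T,M,N,Result).
  drec(_, _, _, no).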
Failure States • We can regard failure as a special state. • That state is reached by adding supplementary arcs that represent invalid input.
Adding a Failure State
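Adding a failure state amounts to making the transition table total. The following Python sketch shows one way to do that; the function names `add_failure_state` and `run`, the state label `"fail"`, and the dict-based table are all assumptions made for illustration.

```python
def add_failure_state(table, states, alphabet, fail="fail"):
    """Return a total transition table in which every missing
    (state, symbol) pair leads to an explicit failure state."""
    total = dict(table)
    for state in list(states) + [fail]:
        for symbol in alphabet:
            # Existing arcs are kept; missing ones go to the failure state.
            total.setdefault((state, symbol), fail)
    return total


def run(tape, table, start=0):
    """Follow the arcs for each symbol and return the state reached."""
    state = start
    for symbol in tape:
        state = table[(state, symbol)]  # never missing: the table is total
    return state


sheep = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4}
total = add_failure_state(sheep, states=range(5), alphabet="ba!")

print(run("baa!", total))  # 4
print(run("bab!", total))  # fail
```

The failure state loops to itself on every symbol, so once it is entered the rest of the input cannot lead back to a final state. Symbols outside the declared alphabet are not handled by this sketch.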
Deterministic versus Non Deterministic Recognition • The behaviour of the automata we have considered so far is fully determined by the current state and the input symbol. • The recognition process is then said to be deterministic. • This is not necessarily the case. Two features introduce choice: • Several arcs with the same label leaving the same state. • ε-transitions: arcs with no label, which can be followed without consuming input. • Automata like this are called non-deterministic.
Non Deterministic FAs
Non Deterministic Recognition • There are three ways of dealing with non-deterministic recognition: • Backtracking: at every choice point, record the state and as yet unexplored choices. • Lookahead: peek ahead n symbols in the input in order to decide which path to take. • Parallel search: look at every path in parallel.
ND-RECOGNISE
  function ND-RECOGNISE(tape, machine) returns accept or reject
    agenda ← { (q0(machine), 0) }
    search_state ← NEXT(agenda)
    loop
      if ACCEPT-STATE?(search_state) = true
        then return accept
        else agenda ← agenda ∪ GENERATE-NEW-STATES(search_state)
      if agenda is empty
        then return reject
        else search_state ← NEXT(agenda)
    end
ACCEPT-STATE?
  function ACCEPT-STATE?(search_state)
    mstate ← first(search_state)
    tape_pos ← second(search_state)
    if tape[tape_pos] = end_input and IS-FINAL?(mstate)
      then return true
      else return false
GENERATE-NEW-STATES
  function GENERATE-NEW-STATES(search_state)
    mstate ← first(search_state)
    tape_pos ← second(search_state)
    return { (x, tape_pos) | x ∈ trantable[mstate, ε] }
         ∪ { (x, tape_pos + 1) | x ∈ trantable[mstate, tape[tape_pos]] }
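The three functions above can be combined into a small runnable Python sketch. Several details are assumptions rather than part of the slides: the empty string stands for an arc with no label (an ε arc), table entries are sets of states, the agenda is a plain list used as a stack, and a `seen` set is added so that loops of ε arcs cannot make the search run forever. The example table is a hypothetical non-deterministic sheep-talk machine in which state 2 has two arcs labelled a.

```python
EPSILON = ""  # assumption: the empty string stands for an ε arc


def generate_new_states(search_state, tape, table):
    """ε moves keep the tape position; symbol moves advance it by one."""
    mstate, pos = search_state
    new = {(x, pos) for x in table.get((mstate, EPSILON), ())}
    if pos < len(tape):
        new |= {(x, pos + 1) for x in table.get((mstate, tape[pos]), ())}
    return new


def nd_recognise(tape, table, start, finals):
    agenda = [(start, 0)]
    seen = set()  # addition to the pseudocode: guards against ε loops
    while agenda:
        search_state = agenda.pop()  # NEXT(agenda)
        if search_state in seen:
            continue
        seen.add(search_state)
        mstate, pos = search_state
        if pos == len(tape) and mstate in finals:  # ACCEPT-STATE?
            return "accept"
        agenda.extend(generate_new_states(search_state, tape, table))
    return "reject"  # agenda exhausted without reaching an accept state


# Hypothetical non-deterministic sheep talk: two "a" arcs leave state 2.
ND_SHEEP = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}

print(nd_recognise("baaa!", ND_SHEEP, 0, {4}))  # accept
print(nd_recognise("ba!", ND_SHEEP, 0, {4}))    # reject
```

A search state pairs a machine state with a tape position, so a wrong choice at state 2 costs only the work of exploring that branch; the untried alternative is still on the agenda.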
Recognition as Search • Recognition can be regarded as a search problem • Initial state, Goal State • Rules • Strategy • Different search behaviours (depth first, breadth first) can be evoked by managing the agenda in different ways. • See Jurafsky & Martin sect 2.2
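To illustrate how agenda management selects the search strategy, the Python sketch below records the order in which search states are taken off the agenda. It is only an illustration under assumptions: the function name `explore` and the example table are hypothetical, the table has no ε arcs, and new states are sorted purely to make the order reproducible.

```python
from collections import deque

# Hypothetical non-deterministic sheep-talk table (no ε arcs).
ND_SHEEP = {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {2, 3}, (3, "!"): {4}}


def explore(tape, table, start, finals, strategy="depth"):
    """Return the search states in the order they leave the agenda,
    stopping at the first accepting one."""
    agenda = deque([(start, 0)])
    order = []
    while agenda:
        # A stack (LIFO) gives depth-first; a queue (FIFO) gives breadth-first.
        if strategy == "depth":
            state, pos = agenda.pop()
        else:
            state, pos = agenda.popleft()
        order.append((state, pos))
        if pos == len(tape) and state in finals:
            break
        if pos < len(tape):
            # sorted() only makes the order of new states reproducible.
            for nxt in sorted(table.get((state, tape[pos]), ())):
                agenda.append((nxt, pos + 1))
    return order


print(explore("baaa!", ND_SHEEP, 0, {4}, "depth"))
print(explore("baaa!", ND_SHEEP, 0, {4}, "breadth"))
```

Only the line that removes an item from the agenda differs between the two strategies; the rules for generating new search states are the same.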
Deterministic and Non Deterministic FSAs • The class of languages recognisable by NDFSAs is identical to that recognisable by DFSAs. • For every NDFSA ND there is an equivalent DFSA D. • The states of D correspond to sets of states in ND. • If N is the number of states in ND, the number of states in D is ≤ 2^N.
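The correspondence between states of D and sets of states of ND can be made concrete with the standard subset construction, sketched here in Python. The slides state the result but not the construction, so this is an illustration under assumptions: the function name `determinise` is hypothetical, the input automaton has no ε arcs, and only the reachable subsets are built (which is why the result is usually far smaller than 2^N).

```python
def determinise(table, start, finals, alphabet):
    """Subset construction for an ε-free NDFSA given as
    {(state, symbol): set_of_states}.
    Returns (dfa_table, dfa_start, dfa_finals)."""
    dfa_start = frozenset([start])
    dfa_table, dfa_finals = {}, set()
    todo, seen = [dfa_start], {dfa_start}
    while todo:
        current = todo.pop()
        if current & finals:
            dfa_finals.add(current)  # final if it contains a final state
        for symbol in alphabet:
            target = frozenset(
                x for q in current for x in table.get((q, symbol), ())
            )
            if not target:
                continue  # no arc: the resulting DFSA is left partial
            dfa_table[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return dfa_table, dfa_start, dfa_finals


# Hypothetical non-deterministic sheep talk with N = 5 states.
ND_SHEEP = {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {2, 3}, (3, "!"): {4}}
dfa_table, dfa_start, dfa_finals = determinise(ND_SHEEP, 0, {4}, "ba!")
dfa_states = {dfa_start} | set(dfa_table.values())

print(len(dfa_states))  # 5, well below the bound 2**5 = 32
```

In this example the only genuinely new state is the set {2, 3}, which records that after two or more a's the original machine could be in either state.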