Data Structures in Language Processing: Classification and Search Operations

Chapter 2: Data Structures for Language Processing

Classification of Data Structures
• A language processor makes frequent use of search operations over its data structures.
• The data structures (DS) used in language processing can be classified on the basis of the following criteria:
  • Nature of a DS: whether it is a linear or a nonlinear DS.
  • Purpose of a DS: whether it is a search DS or an allocation DS.
  • Lifetime of a DS: whether it is used during language processing or during target program execution.

Linear data structure
• A linear DS consists of a linear arrangement of elements in memory.
• A linear DS requires a contiguous area of memory for its elements, as shown in fig. (a).
• Problem: the size of a data structure is often difficult to predict. In such a situation the designer is forced to overestimate the memory requirements of the linear data structure, which leads to wastage of memory.

Nonlinear data structure
• The elements of a nonlinear data structure need not occupy contiguous areas of memory, which avoids the memory allocation problem seen in the context of linear DS.
• Fig. (b) shows the allocation of four nonlinear data structures, E, F, G and H, where F is stored in three different memory areas.
• The nonlinear arrangement of elements leads to lower search efficiency.

Search Data Structure
• A search DS (or search structure) is a set of entries, each entry accommodating the information concerning one entity in the source program.
• Each entry is assumed to contain a key field, which forms the basis for a search operation.
• Search DS are used to construct various tables of information.
• Search DS are used during language processing to maintain attribute information about the different entities in the source program.

Search Data Structure (cont.): Entry formats
• An entry consists of two parts, a fixed part and a variant part. Each part consists of a set of fields.
• The fields of the fixed part exist in every entry of the search structure.
• The value in the tag field of the fixed part determines the information to be stored in the variant part of the entry.
• For example, entries in the symbol table of a compiler have the following fields:
  • Fixed part: the fields symbol and class (class is the tag field).
  • Variant part: depends on the class, e.g. variable, operator, procedure name, function name, etc.
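
The fixed/variant split can be sketched in Python as follows. This is only an illustration: the class name `Entry`, the helper `make_entry`, and the variant field names in `VARIANT_FIELDS` are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field

# Hypothetical variant fields per class (tag value); illustration only.
VARIANT_FIELDS = {
    "variable": {"type", "length"},
    "procedure": {"param_count", "address"},
}

@dataclass
class Entry:
    # Fixed part: present in every entry.
    symbol: str
    klass: str  # the tag field ("class" is a reserved word in Python)
    # Variant part: which fields are allowed depends on the tag.
    variant: dict = field(default_factory=dict)

def make_entry(symbol, klass, **info):
    """Build an entry, rejecting variant fields the tag does not permit."""
    unknown = set(info) - VARIANT_FIELDS[klass]
    if unknown:
        raise ValueError(f"fields {sorted(unknown)} not valid for class {klass!r}")
    return Entry(symbol, klass, dict(info))

x = make_entry("x", "variable", type="int", length=4)
```

Here the tag value `"variable"` decides that `type` and `length` are the fields stored in the variant part of the entry for `x`.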

Search Data Structure (cont.): Algorithm (generic search procedure)
1. Make a prediction concerning the entry of the search DS which symbol s may be occupying. Let this be entry e.
2. Let se be the symbol occupying the e-th entry. Compare s with se. Exit with success if the two match.
3. Repeat steps 1 and 2 until it can be concluded that the symbol does not exist in the search DS.
• Each comparison in step 2 is called a probe (p).
  • Ps: number of probes in a successful search
  • Pu: number of probes in an unsuccessful search

Search Data Structure (cont.): Operations on a search structure
• The following operations are performed on a search structure:
  • Operation add: add the entry of a symbol.
  • Operation search: search for and locate the entry of a symbol.
  • Operation delete: delete the entry of a symbol.
• The entry for a symbol is created only once, but it may be searched for a large number of times during the processing of a program.

Search Data Structure (cont.): Table organization
• A table is a linear data structure. Two points can be made concerning a table as a search structure:
  • Given the location of one entry of the table, it is easy to move to the next or the previous entry during a search.
  • Tables use a fixed-length entry organization, so the address of an entry in the table can be determined from its entry number.
• The three main types of table organization are:
  • Sequential search organization
  • Binary search organization
  • Hash table organization
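
As a small worked example of the fixed-length entry property, the address of an entry can be computed from its entry number alone. The sketch below assumes entries are numbered from 1 and stored back to back; the function name `entry_address` and its parameters are hypothetical.

```python
def entry_address(base, entry_size, e):
    """Address of entry number e (1-based) in a table of fixed-length entries.

    base is the address of entry 1 and entry_size is the length of one entry.
    """
    if e < 1:
        raise ValueError("entry numbers start at 1")
    return base + (e - 1) * entry_size

# With a table starting at address 1000 and 8-byte entries,
# entry 4 starts three entries past the base.
addr = entry_address(1000, 8, 4)
```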

Search Data Structure (cont.): Sequential search organization
• It uses the generic search procedure to search for a symbol in the table.
• The fig. shows a typical state of a table using the sequential search organization.

Search Data Structure (cont.): Sequential search organization (operations)
• Search for a symbol, where f is the number of occupied entries in the table:
  • Ps = f/2 (on average) for a successful search
  • Pu = f for an unsuccessful search
• Add a symbol: the symbol is added to the first free entry in the table. The value of f is updated accordingly.
• Delete a symbol:
  • Physical deletion: the entry is deleted by erasing or by overwriting it. Thus, if the d-th entry is to be deleted, entries d+1 to f can each be shifted "up" by one entry. This requires (f - d) shift operations in the symbol table.
  • Logical deletion: the entry is marked as deleted rather than removed. This can be implemented by introducing a new field that indicates whether an entry is active or deleted.
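
The operations above can be sketched as a small Python class. This is a minimal illustration, not a definitive implementation: the class and method names are hypothetical, a Python list stands in for the table, and each search reports the number of probes it made.

```python
class SequentialTable:
    def __init__(self):
        # Occupied entries 1..f are stored as entries[0..f-1].
        self.entries = []

    def add(self, symbol):
        """Add the symbol to the first free entry (the end of the list)."""
        self.entries.append({"symbol": symbol, "active": True})

    def search(self, symbol):
        """Return (entry number or None, number of probes)."""
        probes = 0
        for e, entry in enumerate(self.entries, start=1):
            probes += 1
            if entry["active"] and entry["symbol"] == symbol:
                return e, probes
        return None, probes

    def delete_logical(self, symbol):
        """Mark the entry as deleted; it keeps occupying its slot."""
        e, _ = self.search(symbol)
        if e is not None:
            self.entries[e - 1]["active"] = False
        return e is not None

    def delete_physical(self, symbol):
        """Remove entry d; entries d+1..f shift up. Returns f - d, or None."""
        d, _ = self.search(symbol)
        if d is None:
            return None
        shifts = len(self.entries) - d
        del self.entries[d - 1]
        return shifts

table = SequentialTable()
for name in ["a", "b", "c", "d"]:
    table.add(name)
```

Note that after a logical deletion an unsuccessful search still probes the deleted entry, since the entry remains in the table.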

Search Data Structure (cont.): Binary search organization
• All entries in the table are assumed to satisfy an ordering relation.
Algorithm (binary search):
1. start := 1; end := f;
2. While start <= end:
   (a) e := (start + end)/2, rounded to an integer. Exit with success if s = se.
   (b) If s < se then end := e - 1; else start := e + 1;
3. Exit with failure.
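
A minimal Python sketch of the binary search algorithm follows. It assumes the table is a sorted Python list whose entry e is `table[e - 1]`, and it rounds the midpoint down (the slide only says to take the rounded value, so the rounding direction is an assumption). The probe counter is added for illustration.

```python
def binary_search(table, s):
    """Return (entry number or None, number of probes) for symbol s."""
    start, end = 1, len(table)  # f = len(table)
    probes = 0
    while start <= end:
        e = (start + end) // 2  # midpoint, rounded down
        probes += 1
        se = table[e - 1]       # symbol occupying entry e
        if s == se:
            return e, probes    # exit with success
        if s < se:
            end = e - 1
        else:
            start = e + 1
    return None, probes         # exit with failure

symbols = ["a", "c", "e", "g", "i"]
found = binary_search(symbols, "a")
missing = binary_search(symbols, "b")
```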

Search Data Structure (cont.): Hash table organization
• The search prediction depends on the value of s: a hashing function h maps s to an entry number e.
• Three possibilities exist for entry e:
  • The entry may be occupied by s,
  • The entry may be occupied by some other symbol, or
  • The entry may be empty.
Algorithm (hash table organization):
1. e := h(s)
2. Exit with success if s = se, and with failure if entry e is unoccupied.
3. Repeat steps 1 and 2 with different hashing functions (multiplication functions, division functions, etc.).
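
The hash table algorithm can be sketched in Python as below. The slides do not specify the hashing functions, so this sketch makes an assumption: the i-th function is a division-method hash of the character codes, offset by i, which is one simple way to obtain "different hashing functions" on each repetition. The names `h`, `add` and `search` are hypothetical, and entries are numbered from 0 here for convenience.

```python
def h(s, i, n):
    """i-th hashing function for symbol s in a table of n entries (assumed form)."""
    return (sum(ord(c) for c in s) + i) % n

def add(table, s):
    """Place s in the first entry, along its hash sequence, that is free."""
    n = len(table)
    for i in range(n):
        e = h(s, i, n)
        if table[e] is None or table[e] == s:
            table[e] = s
            return e
    raise OverflowError("table is full")

def search(table, s):
    """Return (entry number or None, number of probes)."""
    n = len(table)
    for i in range(n):
        e = h(s, i, n)          # step 1: predict an entry
        if table[e] is None:
            return None, i + 1  # entry unoccupied: failure
        if table[e] == s:
            return e, i + 1     # s = se: success
        # occupied by another symbol: repeat with the next hashing function
    return None, n

table = [None] * 7
add(table, "ab")   # hashes to entry 6
add(table, "ba")   # also hashes to entry 6, so the next function places it in entry 0
```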

Allocation Data Structure
• We will discuss two allocation data structures: stacks (linear) and heaps (nonlinear).
Stack: a stack is a linear data structure with the following properties:
• Allocations and deallocations are performed in a last-in-first-out (LIFO) manner.
• Only the last entry is accessible at any time.

Allocation Data Structure
The following fig. illustrates the stack allocation and deallocation process.

Allocation Data Structure: Extended stack
• Sometimes the simple stack model needs to be extended because all entities may not be of the same size. The size of an entity is assumed to be an integral multiple of the size of a stack entry.
• The following figure shows the extended stack model. In addition to SB and TOS, two new pointers exist in the model:
  • A record base pointer (RB) pointing to the first word of the last record in the stack.
  • A reserved pointer in the first word of each record. This pointer is used for housekeeping purposes, as explained below.

Allocation Data Structure: Extended stack
[Figure: extended stack model; (b) allocation, (c) deallocation]

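Allocation Data Structure: Extended stack
• Allocation time actions (to allocate a record of n words):
  1. TOS := TOS + 1;
  2. TOS* := RB;
  3. RB := TOS;
  4. TOS := TOS + n;
• Deallocation time actions:
  1. TOS := RB - 1;
  2. RB := RB*;
Here X* denotes the word that pointer X points to. Step 2 of allocation saves the old value of RB in the reserved pointer of the new record, and step 2 of deallocation restores RB from that reserved pointer.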
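
The allocation and deallocation actions can be traced with a small Python simulation. This is a sketch under stated assumptions: a Python list stands in for memory, addresses are list indices starting at 0, an empty stack is represented by TOS = -1 and RB = None, and the class name `ExtendedStack` is hypothetical. On deallocation, RB is restored from the reserved pointer stored in the first word of the record being released.

```python
class ExtendedStack:
    def __init__(self, size):
        self.mem = [None] * size
        self.sb = 0       # stack base (SB)
        self.tos = -1     # top of stack (TOS); -1 means empty (assumption)
        self.rb = None    # record base (RB); None means no record (assumption)

    def allocate(self, n):
        """Push a record of n words plus one reserved word; return its base."""
        if self.tos + 1 + n >= len(self.mem):
            raise MemoryError("stack overflow")
        self.tos += 1                  # TOS := TOS + 1
        self.mem[self.tos] = self.rb   # reserved pointer := old RB
        self.rb = self.tos             # RB := TOS
        self.tos += n                  # TOS := TOS + n
        return self.rb

    def deallocate(self):
        """Pop the last record."""
        if self.rb is None:
            raise IndexError("stack is empty")
        self.tos = self.rb - 1         # TOS := RB - 1
        self.rb = self.mem[self.rb]    # RB := saved pointer in the released record

stack = ExtendedStack(16)
first = stack.allocate(3)    # occupies words 0..3 (word 0 is the reserved pointer)
second = stack.allocate(2)   # occupies words 4..6
```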

Heap
• A heap is a nonlinear DS which permits allocation and deallocation of entities in a random order.
• An allocation request returns a pointer to the allocated area in the heap.
• A deallocation request must present a pointer to the area to be deallocated.
• Memory management thus consists of:
  • Identifying the free memory areas (or holes).
  • Reusing free memory areas.

Heap (cont.): Identifying the free memory areas
• Two popular techniques used to identify free memory areas are:
  • Reference counts
  • Garbage collection
Reference counts
• In the reference count technique, the system associates a reference count with each memory area to indicate the number of its active users.
• The count is incremented when a new user gains access to the area and decremented when a user finishes using it. The area is known to be free when its reference count drops to zero.
• Advantage: the reference count technique is simple to implement.
• Disadvantage: incremental overheads, i.e. overheads at every allocation and deallocation.
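
A minimal Python sketch of reference counting is shown below. The class and method names are hypothetical, and memory areas are represented simply by labels rather than real addresses.

```python
class RefCountedAreas:
    def __init__(self):
        self.count = {}   # area -> number of active users
        self.free = []    # areas known to be free

    def allocate(self, area):
        """A fresh area starts with one user."""
        self.count[area] = 1

    def acquire(self, area):
        """A new user gains access to the area."""
        self.count[area] += 1

    def release(self, area):
        """A user finishes with the area; free it when the count reaches zero."""
        self.count[area] -= 1
        if self.count[area] == 0:
            del self.count[area]
            self.free.append(area)

areas = RefCountedAreas()
areas.allocate("A")
areas.acquire("A")
areas.release("A")   # one user remains, so "A" is not yet free
```

Every `acquire` and `release` call updates a counter, which is the incremental overhead mentioned above.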

Heap (cont.): Garbage collection
• Garbage collection makes two passes over the memory to identify unused areas.
• In the first pass it traverses all pointers pointing to allocated areas and marks the memory areas which are in use.
• In the second pass it finds all unmarked areas and declares them to be free.
• The garbage collection overheads are not incremental. They are incurred each time the system runs out of free memory to allocate to fresh requests.
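
The two passes can be sketched in Python as follows. This is an illustration under assumptions: the heap is modelled as a dictionary from each allocated area to the areas it points to, and `roots` is the hypothetical set of pointers held by the program from which the traversal starts.

```python
def collect(heap, roots):
    """Return the set of areas that are not in use.

    heap maps each allocated area to the list of areas it points to.
    """
    # Pass 1: follow pointers from the roots and mark every area reached.
    marked = set()
    pending = list(roots)
    while pending:
        area = pending.pop()
        if area not in marked:
            marked.add(area)
            pending.extend(heap[area])
    # Pass 2: every unmarked area is declared free.
    return set(heap) - marked

heap = {"A": ["B"], "B": [], "C": ["D"], "D": ["C"]}
free_areas = collect(heap, roots=["A"])
```

In this example only A and B are reachable from the root, so C and D are declared free.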

Heap (cont.): Reuse of memory
• When a free list is used, two techniques can be used to perform a fresh allocation:
  • First fit technique: select the first free area whose size is >= n words, where n is the number of words to be allocated.
  • Best fit technique: select the smallest free area whose size is >= n words.
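
The difference between the two techniques can be seen in a short Python sketch. It assumes the free list is a list of (address, size) pairs with sizes in words; the function names and the splitting policy in `allocate` (the unused remainder of an area stays on the free list) are assumptions made for the illustration.

```python
def first_fit(free_list, n):
    """Index of the first free area with size >= n, or None."""
    for i, (_, size) in enumerate(free_list):
        if size >= n:
            return i
    return None

def best_fit(free_list, n):
    """Index of the smallest free area with size >= n, or None."""
    best = None
    for i, (_, size) in enumerate(free_list):
        if size >= n and (best is None or size < free_list[best][1]):
            best = i
    return best

def allocate(free_list, n, choose):
    """Allocate n words using the given technique; return the address or None."""
    i = choose(free_list, n)
    if i is None:
        return None
    addr, size = free_list[i]
    if size == n:
        del free_list[i]                        # area used up entirely
    else:
        free_list[i] = (addr + n, size - n)     # keep the remainder free
    return addr

free_list = [(0, 50), (100, 20), (200, 30)]
ff = list(free_list)
bf = list(free_list)
addr_ff = allocate(ff, 20, first_fit)   # takes 20 words from the 50-word area
addr_bf = allocate(bf, 20, best_fit)    # uses the 20-word area exactly
```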
