## PowerPoint Slideshow about 'Sets of Digital Data' - quiana


Digital Data

- In earlier work with BSTs and various balanced trees, we compared keys for order or equality
- Here, we take advantage of structure of key
- Use it as an index, or
- Decompose string key into characters, or
- Treat key as numerical quantity on which we can perform operations

Assumptions

- We will construct and manipulate sets that
- Are drawn from a universe U of size N
- U = {u_0, …, u_{N-1}}
- A relatively simple procedure exists by which we can compute, for an element u ∈ U, the index i such that u = u_i.
- Easy if U is set of integers
- Also easy if U is set of characters with character codes in a contiguous interval
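
As a small illustration of the character case, here is a sketch that assumes the universe is the uppercase letters 'A' to 'Z', whose codes are contiguous. The function name `index_of_letter` is made up for this example.

```python
def index_of_letter(ch):
    """Return i such that ch is the i-th element of the universe 'A'..'Z'."""
    i = ord(ch) - ord("A")  # contiguous codes make this a single subtraction
    if not 0 <= i < 26:
        raise ValueError(f"{ch!r} is not in the assumed universe A-Z")
    return i


print(index_of_letter("A"), index_of_letter("Z"))
```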

Bit Vector

- Used to represent a subset S ⊆ U
- A table of N bits, Bits[0..N-1]
- Bits[i] == 1 if u_i ∈ S
- Bits[i] == 0 if u_i ∉ S
- Example: today’s attendance

| Student number | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Bit (1 = present, 0 = absent) | 1 | 1 | 0 | 1 | 0 | 1 | 1 |

Bit Vectors

- Assume:
- determining element index takes constant time
- accessing position in table takes constant time
- May actually take several ops, and depend somewhat on N (the size of the universe), but not on the size of the set represented
- Then:
- Insert, Delete, Member are constant time ops
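
A minimal sketch of these three operations, assuming the elements have already been mapped to indices 0..N-1. The class name `BitVector` and the choice of a `bytearray` with 8 bits per byte are assumptions of this example, not something the slides prescribe.

```python
class BitVector:
    """Subset of the universe {0, ..., n_universe - 1}, one bit per element."""

    def __init__(self, n_universe):
        self.n = n_universe
        self.bits = bytearray((n_universe + 7) // 8)  # all bits start at 0

    def _locate(self, i):
        """Return (byte position, mask) for element index i."""
        if not 0 <= i < self.n:
            raise IndexError(f"index {i} outside universe of size {self.n}")
        return i >> 3, 1 << (i & 7)

    def insert(self, i):
        byte, mask = self._locate(i)
        self.bits[byte] |= mask

    def delete(self, i):
        byte, mask = self._locate(i)
        self.bits[byte] &= ~mask & 0xFF

    def member(self, i):
        byte, mask = self._locate(i)
        return bool(self.bits[byte] & mask)


# The attendance example: students 0, 1, 3, 5 and 6 are present.
attendance = BitVector(7)
for student in (0, 1, 3, 5, 6):
    attendance.insert(student)
print([int(attendance.member(s)) for s in range(7)])
```

Each operation touches one byte regardless of how many elements the set holds, which is the constant-time behaviour described above.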

Bit Vectors

- A subset of a set of size N always takes N bits to represent, independent of size of subset
- Makes sense if:
- N is not too large
- need to represent sets of size comparable to N

Storage Efficiency

- Bit Vector vs. Binary Trees
- Binary Tree, set of size n
- Requires n(2p + K) bits
- K >= lg N, size of field to represent key value
- p = number of bits in a pointer
- Bit Vector, takes N bits
- If n is comparable to N, then the bit vector is more efficient
- If p = K = 32, then the tree becomes more space efficient once n/N falls below about 1%
- More precisely, break-even is at n(2p + K) = N, which is n/N = 1/96
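
The break-even arithmetic can be checked directly. This sketch uses the slide's formulas; the function names and the sample universe size N = 96,000 are chosen only for illustration.

```python
from fractions import Fraction


def tree_bits(n, p=32, K=32):
    """Bits used by a binary tree of n nodes: two pointers plus a key each."""
    return n * (2 * p + K)


def break_even_ratio(p=32, K=32):
    """The n/N ratio at which n(2p + K) equals the N bits of a bit vector."""
    return Fraction(1, 2 * p + K)


N = 96_000                       # illustrative universe size
n = int(N * break_even_ratio())  # set size at which both cost the same
print(break_even_ratio(), n, tree_bits(n) == N)
```

For sets smaller than that n the tree uses fewer bits; for larger sets the bit vector does.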

When to use Bit Vectors?

- When universe is relatively small
- When sets are large in relation to size of universe

Advantages of Bit Vectors

- O(1) implementation of Insert, Delete, Member
- Union and Intersection easy
- Implement via Boolean OR (union) and AND (intersection) operations
- May actually take less than one op per element, as operations are performed on a full machine word
- If the machine word is 32 bits, then one machine operation handles 32 potential elements of the set
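
A short sketch of union and intersection on whole words. Python integers are arbitrary precision, so a single `|` or `&` here stands in for the word-by-word loop a fixed-width implementation would run; the helper names are assumptions of this example.

```python
def to_bits(elements):
    """Pack element indices into one integer, bit i set for element i."""
    bits = 0
    for i in elements:
        bits |= 1 << i
    return bits


def to_elements(bits):
    """Unpack an integer bit set back into a sorted list of indices."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


a = to_bits([0, 1, 3, 5, 6])
b = to_bits([1, 2, 3])
print(to_elements(a | b))  # union
print(to_elements(a & b))  # intersection
```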

Disadvantages of Bit Vectors

- On some computers access to individual bits can require shifting and masking operations (expensive)
- Result is that Member may be much more expensive than Union
- Initialization takes Θ(N) -- zero all the bits in the vector
- But can use constant time initialization algorithm
- But that makes storage requirement go to 2p + 1 bits per element
- So, in practice, just use machine ops to set to zero, which are efficient
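
The slides do not say which constant-time initialization algorithm they mean. One well-known scheme that matches the 2p + 1 bits per element figure pairs every bit with a back pointer into a stack of positions that have actually been written. The sketch below assumes that scheme; the class name is made up, and since Python cannot hand out uninitialized memory, the garbage contents are faked with random values.

```python
import random


class LazyBitVector:
    """Bit vector whose arrays are never zeroed (garbage simulated here)."""

    def __init__(self, n_universe, seed=0):
        rng = random.Random(seed)
        self.n = n_universe
        # Simulation only: this O(N) fill stands in for uninitialized memory.
        self.bits = [rng.randrange(2) for _ in range(n_universe)]
        self.back = [rng.randrange(n_universe) for _ in range(n_universe)]
        self.stack = [rng.randrange(n_universe) for _ in range(n_universe)]
        self.top = 0  # the only field that really has to be initialized

    def _initialized(self, i):
        """Position i is trustworthy only if the stack vouches for it."""
        j = self.back[i]
        return j < self.top and self.stack[j] == i

    def insert(self, i):
        if not self._initialized(i):
            self.back[i] = self.top
            self.stack[self.top] = i
            self.top += 1
        self.bits[i] = 1

    def delete(self, i):
        if self._initialized(i):
            self.bits[i] = 0

    def member(self, i):
        return self._initialized(i) and self.bits[i] == 1
```

The extra `back` and `stack` entries are the two pointers per element that the storage estimate above accounts for.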

Tries and Digital Search Trees

- If the key can be decomposed into characters, then the characters of the key can be used as indices
- Tries are based on this idea
- “Trie” comes from the middle of “retrieval”; it is a pun on “tree”, but pronounced “try”

Tries

- Assume k possible character values
- A trie is a (k+1)-ary tree
- each node a table of k+1 pointers
- One pointer for each possible character
- One for the end-of-string character

Tries

- The path for a key of m characters has length m, with the end-of-string pointer set at its last node
- Don’t need to store the key itself: it is the path followed
- The info field might be pointed to by the end-of-string element
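
A minimal sketch of such a trie, assuming an alphabet of uppercase 'A' to 'Z' (so k = 26) and using slot k of each node's table as the end-of-string pointer. The names `TrieNode`, `Trie` and `char_index` are invented for this example.

```python
K = 26   # assumed alphabet: uppercase 'A'..'Z'
END = K  # slot K of every node is the end-of-string pointer


def char_index(ch):
    i = ord(ch) - ord("A")
    if not 0 <= i < K:
        raise ValueError(f"character outside A-Z: {ch!r}")
    return i


class TrieNode:
    def __init__(self):
        self.slots = [None] * (K + 1)  # k child pointers plus end-of-string


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, info=True):
        node = self.root
        for ch in key:
            i = char_index(ch)
            if node.slots[i] is None:
                node.slots[i] = TrieNode()
            node = node.slots[i]
        node.slots[END] = info  # the key itself is never stored

    def lookup(self, key):
        """Return the info stored for key, or None if key is absent."""
        node = self.root
        for ch in key:
            node = node.slots[char_index(ch)]
            if node is None:
                return None
        return node.slots[END]


trie = Trie()
trie.insert("MENDEL", info="genetics")
trie.insert("TURING", info="computation")
print(trie.lookup("MENDEL"), trie.lookup("MEND"))
```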

Tries: Analysis

- Let:
- n be the number of keys stored in a trie
- l be the length(in characters) of the longest key
- s be the number of nodes in the trie
- k be the size of the alphabet
- Pro:
- Access time is O(l), independent of k, n and s
- Con:
- Size -- requires (k+1) * s * p bits
- Most pointers are null, so lots of wasted space
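
To get a feel for the wasted space, this sketch counts nodes and applies the (k+1) * s * p formula. It assumes one node per distinct key prefix (including the empty prefix at the root), which follows from a key of m characters having a path of length m; the function names are made up for the example.

```python
def trie_node_count(keys):
    """s: one node per distinct prefix of any key, the root being ''."""
    prefixes = {key[:i] for key in keys for i in range(len(key) + 1)}
    return len(prefixes)


def trie_size_bits(keys, k, p):
    """Total size in bits: every node is a table of k + 1 pointers."""
    return (k + 1) * trie_node_count(keys) * p


keys = ["MENDEL", "MENDELEEV", "TURING"]
s = trie_node_count(keys)
used = (s - 1) + len(keys)  # child pointers plus end-of-string pointers
print(s, trie_size_bits(keys, k=26, p=32), used, (26 + 1) * s)
```

With these three keys only 18 of the 432 pointer slots are non-null.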

Strategies for reducing storage requirements of tries

- Implement a k-ary trie with m nodes as a 2-D, m by k table

[Figure: table with one column per character (A, B, C, D, E, …, M, …, P, …, T, …) and one row per node (0 through 5)]

Table approach

- Number the nodes in the diagram of slide 13 from 1 to m
- The table entry corresponding to jth child of ith node is the index of the child node
- How does that save space? There are just as many nodes and elements as on slide 13
- But each entry is a node index that needs only ceil(lg(m)) bits, which is smaller than a pointer
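
A sketch of the table representation under a few assumptions of its own: the alphabet is 'A' to 'Z', nodes are numbered from 0 with the root as node 0 (rather than 1 to m) so that 0 can double as the "no child" marker, since the root is never anyone's child, and end-of-string is kept in a separate list of flags.

```python
import math

K = 26  # assumed alphabet: uppercase 'A'..'Z'


class TableTrie:
    def __init__(self):
        self.table = [[0] * K]  # row 0 is the root; entry 0 means "no child"
        self.ends = [False]     # ends[i]: does a key end at node i?

    @staticmethod
    def _col(ch):
        c = ord(ch) - ord("A")
        if not 0 <= c < K:
            raise ValueError(f"character outside A-Z: {ch!r}")
        return c

    def insert(self, key):
        node = 0
        for ch in key:
            col = self._col(ch)
            if self.table[node][col] == 0:
                self.table.append([0] * K)
                self.ends.append(False)
                self.table[node][col] = len(self.table) - 1
            node = self.table[node][col]
        self.ends[node] = True

    def member(self, key):
        node = 0
        for ch in key:
            node = self.table[node][self._col(ch)]
            if node == 0:
                return False
        return self.ends[node]

    def bits_per_entry(self):
        """ceil(lg(m)) bits are enough to hold any row number 0..m-1."""
        return max(1, math.ceil(math.log2(len(self.table))))
```

Python lists do not actually pack entries into ceil(lg(m)) bits; `bits_per_entry` only reports what a packed layout would need.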

Patricia Tree: Another strategy for reducing space in a trie

- Patricia tree
- Practical Algorithm to Retrieve Information Coded in Alphanumeric
- Eliminate nodes with only one nonempty child
- Can now skip right from T to the end-of-string character in TURING in our example
- Skip from MA … to E or the end-of-string character in the MENDEL, MENDELEEV chain
- But need to store with each node the index of the character on which it discriminates
- And need to store the key itself at the leaf
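
The idea can be sketched at the character level as follows. This is a simplification, not the original PATRICIA algorithm: the tree is built once from a fixed list of keys rather than grown by insertion, and the `"\0"` end-of-string marker and all names are assumptions of the example.

```python
END = "\0"  # assumed end-of-string marker; must not occur inside keys


class Leaf:
    def __init__(self, key):
        self.key = key  # the key itself is stored at the leaf


class Branch:
    def __init__(self, index, children):
        self.index = index        # character position this node discriminates on
        self.children = children  # maps a character to a subtree


def build(keys):
    if not keys:
        raise ValueError("need at least one key")
    return _build(sorted({k + END for k in keys}), 0)


def _build(terminated, start):
    if len(terminated) == 1:
        return Leaf(terminated[0][:-1])
    index = start
    # Skip positions where all keys agree: these are the one-child nodes.
    while len({t[index] for t in terminated}) == 1:
        index += 1
    groups = {}
    for t in terminated:
        groups.setdefault(t[index], []).append(t)
    return Branch(index, {ch: _build(g, index + 1) for ch, g in groups.items()})


def member(root, key):
    t = key + END
    node = root
    while isinstance(node, Branch):
        if node.index >= len(t):
            return False
        node = node.children.get(t[node.index])
        if node is None:
            return False
    return node.key == key  # skipped characters are verified here


root = build(["MENDEL", "MENDELEEV", "TURING"])
print(root.index, root.children["M"].index, member(root, "TURING"))
```

Because skipped characters are never inspected on the way down, the final comparison against the key stored at the leaf is what rejects a near miss such as TURBOT.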

de la Briandais trees

- Another strategy to save space vs. standard tries
- Use a linked list instead of a table at the node level
- Each pointer labeled with the character it indexes
- longer search time than tries; depends on size of character set
- saves significant amounts of memory
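
A minimal sketch of this linked-list layout. Each node carries its character label, a pointer to the next alternative at the same level, and a pointer to the list for the next level; the `"\0"` end-of-string marker and the names used are assumptions of the example.

```python
END = "\0"  # assumed end-of-string marker


class DLBNode:
    def __init__(self, ch):
        self.ch = ch         # character labelling this link
        self.sibling = None  # next alternative at the same level
        self.child = None    # head of the list for the next level


def _step(parent, ch, add):
    """Scan parent's child list for ch; optionally append it if missing."""
    prev, node = None, parent.child
    while node is not None and node.ch != ch:
        prev, node = node, node.sibling
    if node is None and add:
        node = DLBNode(ch)
        if prev is None:
            parent.child = node
        else:
            prev.sibling = node
    return node


def insert(root, key):
    node = root
    for ch in key + END:
        node = _step(node, ch, add=True)


def member(root, key):
    node = root
    for ch in key + END:
        node = _step(node, ch, add=False)
        if node is None:
            return False
    return True


root = DLBNode(None)  # dummy root holding the top-level list
for key in ("MENDEL", "MENDELEEV", "TURING"):
    insert(root, key)
print(member(root, "MENDEL"), member(root, "MEND"))
```

The scan in `_step` is where the longer search time comes from: in the worst case it walks one list entry per character of the alphabet.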

Another strategy …

- Use tries at the first few levels
- Use ordinary BSTs or de la Briandais at the lower levels
- reasoning:
- speed advantage at the top, but not too much extra memory required
- save space at lower levels

Digital Search Trees

- Treat keys as bit strings
- (strings over the alphabet {0,1})
- Binary tree – search directed left on 0, right on 1
- Each node contains two pointers and also a key whose bit-string prefix matches the path to that node
- Compare for equality before searching left or right
- If frequencies are known, store higher frequency keys nearer root
- Can be grown dynamically
- Expected Search time: O(log n)
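
A sketch of a digital search tree, assuming keys are non-negative integers of a fixed bit width and that bits are consumed from the most significant end. The class names and the default width are choices made for this example, and the frequency-based placement mentioned above is not implemented.

```python
class DSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None   # followed when the current bit is 0
        self.right = None  # followed when the current bit is 1


class DigitalSearchTree:
    def __init__(self, width=8):
        self.width = width  # assumed fixed key length in bits
        self.root = None

    def _bit(self, key, depth):
        return (key >> (self.width - 1 - depth)) & 1

    def _in_range(self, key):
        return 0 <= key < (1 << self.width)

    def insert(self, key):
        if not self._in_range(key):
            raise ValueError(f"key does not fit in {self.width} bits")
        if self.root is None:
            self.root = DSTNode(key)
            return
        node, depth = self.root, 0
        while node.key != key:  # compare for equality before branching
            side = "left" if self._bit(key, depth) == 0 else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, DSTNode(key))
                return
            node, depth = child, depth + 1

    def member(self, key):
        if not self._in_range(key):
            return False
        node, depth = self.root, 0
        while node is not None:
            if node.key == key:
                return True
            node = node.left if self._bit(key, depth) == 0 else node.right
            depth += 1
        return False


tree = DigitalSearchTree(width=4)
for key in (0b1010, 0b0011, 0b1100, 0b1011):
    tree.insert(key)
print(tree.member(0b1011), tree.member(0b0000))
```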
