
Efficient Implementation of a Statistics Counter Architecture

Efficient Implementation of a Statistics Counter Architecture. Authors: Sriram Ramabhadran, George Varghese. Publisher: SIGMETRICS’03. Presenter: Yun-Yan Chang. Date: 2010/12/29. Outline: Introduction, Previous works, Scheme, LR(T), Aggregated bitmap, Implementation, Conclusion.


Presentation Transcript


  1. Efficient Implementation of a Statistics Counter Architecture Authors: Sriram Ramabhadran, George Varghese Publisher: SIGMETRICS’03 Presenter: Yun-Yan Chang Date: 2010/12/29

  2. Outline • Introduction • Previous works • Scheme • LR(T) • Aggregated bitmap • Implementation • Conclusion

  3. Introduction • Removes the bottleneck of [1] by proposing a counter management algorithm (CMA) called LR(T) (Largest Recent with threshold T), which avoids sorting by keeping only a bitmap that tracks the counters that are larger than threshold T.

  4. Previous Work • D. Shah, S. Iyer, B. Prabhakar, and N. McKeown • Maintaining statistics counters in router line cards • Proposes a hybrid architecture in which DRAM is used to store the statistics counters but a small amount of SRAM is used to enable counter updates at line rate. • Proposes a CMA called LCF (Largest Counter First), which picks the counter with the largest value to be updated to DRAM.

  5. Previous Work (cont.) • Architecture • SRAM stores N counters of size m<M bits. • DRAM stores N counters of size M bits. • The SRAM counters hold recent updates and are periodically transferred to the corresponding DRAM counters. Figure 1. Statistics counter architecture

  6. Previous Work (cont.) • Largest Counter First (LCF) • An algorithm that minimizes the size of the SRAM. • Selects the largest counter. • If multiple counters have the same value, picks one arbitrarily. • Updates the value of the corresponding counter in the DRAM and resets the counter in the SRAM to 0. • Bottleneck: • Sorting: finding the largest counter • Difficult to implement at high speed
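The LCF step on the slide above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `lcf_flush` and `simulate_lcf` are hypothetical, counters are modelled as plain lists, each update increments a counter by 1, and one flush to DRAM happens after every b SRAM updates. The full scan inside `max` stands in for the "find the largest counter" step that the slide identifies as the bottleneck.

```python
def lcf_flush(sram, dram):
    """Flush the largest SRAM counter to DRAM, reset it, and return its index."""
    # max() scans all N counters; this search is the step that is hard to
    # do at line rate. Ties are broken by lowest index (an arbitrary choice).
    j = max(range(len(sram)), key=lambda i: sram[i])
    dram[j] += sram[j]   # add the recent updates to the full-width DRAM counter
    sram[j] = 0          # reset the small SRAM counter
    return j


def simulate_lcf(updates, n, b):
    """Apply a sequence of counter indices, flushing once per b updates.

    Assumption: every update increments the named counter by 1.
    Returns the final (sram, dram) contents.
    """
    sram = [0] * n
    dram = [0] * n
    for t, i in enumerate(updates, start=1):
        sram[i] += 1
        if t % b == 0:
            lcf_flush(sram, dram)
    return sram, dram
```

At any point the true value of counter i is `sram[i] + dram[i]`, which is the invariant the hybrid architecture relies on.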

  7. LR(T) Algorithm • Algorithm description • Let j* be the counter with the largest value among the counters incremented in the last cycle of b updates to SRAM. • If the value c_{j*} of counter j* satisfies c_{j*} ≥ T, LR(T) updates counter j* to DRAM. • If c_{j*} < T, LR(T) updates any counter with value at least T to DRAM. • If no such counter exists, LR(T) updates counter j* to DRAM.
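The three cases of the LR(T) rule can be written as a small decision function. The sketch below is illustrative only: `lr_select` is a hypothetical name, `recent` is assumed to hold the indices of the counters incremented in the last cycle of b updates, and the linear scan for "any counter with value at least T" is a stand-in for the bitmap lookup that the presentation introduces later.

```python
def lr_select(sram, recent, threshold):
    """Return the index of the counter that LR(T) would update to DRAM.

    sram      -- list of current SRAM counter values
    recent    -- non-empty list of indices incremented in the last cycle
    threshold -- the threshold T
    """
    # j*: the largest counter among those touched in the last cycle.
    j_star = max(recent, key=lambda i: sram[i])
    if sram[j_star] >= threshold:
        return j_star
    # Otherwise pick any counter with value at least T. A linear scan is
    # used here for clarity; it is an assumption of this sketch, not the
    # hardware mechanism.
    for i, value in enumerate(sram):
        if value >= threshold:
            return i
    # No counter reaches T: fall back to j*.
    return j_star
```

With `threshold=0` the first branch always fires, so the choice depends only on the last b updates, which matches the description of LR(0) as the simple-to-implement variant.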

  8. LR(T) Algorithm • Proof: • Threshold T=0 allows a simple implementation, while T=b is optimal and minimizes the SRAM size requirement. • LR(0) • Only remembers the last b updates to SRAM when determining which counter to update to DRAM. • Let [symbol missing] be the maximum value a counter can reach under LR(0). • Theorem 1: [formula missing] • Implies an SRAM counter of size at least [formula missing]

  9. LR(T) Algorithm • LR(b) • Threshold increases from 0 to b. • b: time between DRAM accesses • Let [symbol missing] be the maximum value a counter can reach under LR(0). • Theorem 2: [formula missing] • Implies any counter is at most (b − 1)(N − 1) • The value of a counter cannot be larger than (b − 1) + log_d(N − 1), where [definition missing]

  10. Aggregated bitmap • To minimize the required storage • Consider a fixed universe U of N elements labelled 1, 2, …, N. • Use a bitmap b_1 b_2 ... b_N to record which elements are contained in a set S. • b_i is set to 1 if element i ∈ S, otherwise it is set to 0. • Implemented functions: • add(i) Adds element i to set S • delete(i) Deletes element i from set S • test(i) Tests whether element i belongs to set S • find() Returns any element i that belongs to set S

  11. Aggregated bitmap Figure 2: Aggregated bitmap for N = 128 elements and W = 16 word size.

  12. Aggregated bitmap • Each group of W bits in the bitmap is aggregated to form a single node. • N: number of bits in the aggregated bitmap • W: the word size (N and W must be powers of 2) • Total nodes: [formula missing] • Total memory: [formula missing] Figure 2: Aggregated bitmap for N=128 elements and W=16 word size.

  13. Aggregated bitmap • Each internal node in the tree contains two fields called lcount and rcount. • lcount is the number of 1s present in its left child • rcount is the number of 1s present in its right child Figure 2: Aggregated bitmap for N=128 elements and W=16 word size.
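The structure described on the last few slides can be approximated in software as follows. This is a minimal, non-pipelined sketch under several assumptions that are not in the slides: elements are numbered from 0 rather than 1, the W-bit leaf words sit under a complete binary tree stored in heap order (node k has children 2k and 2k+1), lcount and rcount are read as counts over the whole left and right subtrees, and add/delete first test the bit so that counts are never adjusted twice. The class name `AggregatedBitmap` and its internals are hypothetical.

```python
class AggregatedBitmap:
    """Set over {0, ..., n-1} with add, delete, test and find."""

    def __init__(self, n, w):
        assert n % w == 0
        self.w = w
        self.leaves = n // w  # number of W-bit leaf words
        assert self.leaves & (self.leaves - 1) == 0, "n // w must be a power of 2"
        self.words = [0] * self.leaves   # leaf level: the raw bitmap
        # Internal nodes use heap indices 1 .. leaves-1 (index 0 is unused).
        self.lcount = [0] * self.leaves  # 1s below the left child
        self.rcount = [0] * self.leaves  # 1s below the right child

    def test(self, i):
        return (self.words[i // self.w] >> (i % self.w)) & 1 == 1

    def _adjust(self, i, delta):
        # Walk top-down from the root towards the leaf word holding bit i,
        # updating the count on the side that is taken at each level.
        leaf = i // self.w
        node, lo, hi = 1, 0, self.leaves  # node covers leaf words [lo, hi)
        while node < self.leaves:
            mid = (lo + hi) // 2
            if leaf < mid:
                self.lcount[node] += delta
                node, hi = 2 * node, mid
            else:
                self.rcount[node] += delta
                node, lo = 2 * node + 1, mid

    def add(self, i):
        if not self.test(i):
            self._adjust(i, +1)
            self.words[i // self.w] |= 1 << (i % self.w)

    def delete(self, i):
        if self.test(i):
            self._adjust(i, -1)
            self.words[i // self.w] &= ~(1 << (i % self.w))

    def find(self):
        """Return some element of the set, or None if the set is empty."""
        node, lo, hi = 1, 0, self.leaves
        while node < self.leaves:
            mid = (lo + hi) // 2
            if self.lcount[node] > 0:
                node, hi = 2 * node, mid
            elif self.rcount[node] > 0:
                node, lo = 2 * node + 1, mid
            else:
                return None
        word = self.words[lo]
        if word == 0:
            return None
        bit = (word & -word).bit_length() - 1  # position of the lowest set bit
        return lo * self.w + bit
```

For example, with `AggregatedBitmap(128, 16)` the tree has 8 leaf words, and `find()` reaches a non-empty word by following non-zero counts from the root instead of scanning all 128 bits.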

  14. Aggregated bitmap • Pipelined implementation • Each operation proceeds top-down, starting at the root and moving from one level to the next. • At each level of the tree, there is potentially a memory read followed by a memory write. • Storing each level of the tree in a different memory bank permits simultaneous access to all levels of the tree.

  15. Implementation • To implement LR(T), it is necessary to keep track of two things: • The largest value among all counters updated in the last cycle of b updates, along with the corresponding counter j*. • All counters above the threshold T. • Memory accesses for counter operations and bitmap operations proceed in parallel.

  16. Implementation • Every cycle of b updates involves b SRAM update operations and one DRAM update operation • SRAM update operation • Two accesses to update the SRAM counter • Two accesses for add • DRAM update operation • Two accesses to read and reset the SRAM counter • Four accesses for delete and find • Two DRAM accesses to update the DRAM counter Figure 3: Timing diagram for SRAM and DRAM updates for two successive cycles of b counter updates.

  17. Conclusion • Reference system: one million 64-bit counters and a line rate of 10 Gbps with 10 counter updates per packet. Table 1: Cost-benefit comparison for different schemes.
