# Computer Organization and Assembly Language (計算機組織與組合語言)


## Computer Organization and Assembly Language

Teacher : cyy

Presenter : B98902071 康秩群

### Outline

• The 0-bits counting problem

• Naïve algorithm

• Table lookup approach

• Counting 1’s and subtracting from 16

• Eliminating algorithm

• Parallel counting algorithm

• Other optimization techniques

### The 0-bits counting problem (數圈圈問題)

• Input: An array of 16-bit integers (the array holds at most 32 integers).

• Output: The total number of 0-bits in the array.

### Naïve algorithm

b = 0;

do {

d = a[--c];

r2 = 1;

do {

if ((d & r2) == 0)   // parentheses required: in C, == binds tighter than &

b++;

r2 <<= 1;

} while ((r2 & 0xFFFF) != 0);   // stop after bit 15, even if r2 is wider than 16 bits

} while (c > 0);

return b;
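The loop above can be sketched as a self-contained C function. This is only an illustration: the function name `count_zero_bits_naive` and the use of `uint16_t` are assumptions, not part of the original slides, which were most likely written for assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the 0-bits in an array of 16-bit integers, testing one bit at a time.
 * The name and signature are illustrative assumptions. */
unsigned count_zero_bits_naive(const uint16_t *a, size_t n)
{
    unsigned zeros = 0;
    for (size_t i = 0; i < n; i++) {
        /* mask visits bit 0 .. bit 15; as a uint16_t it becomes 0 after bit 15 */
        for (uint16_t mask = 1; mask != 0; mask <<= 1) {
            if ((a[i] & mask) == 0)
                zeros++;
        }
    }
    return zeros;
}
```

For the bit pattern 0110110110011110 (0x6D9E) the function finds six 0-bits.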

### Time complexity

The above algorithm runs in

[number of 0-bits] × 5 + [number of 1-bits] × 4 + (miscellaneous overhead)

= O(5n) = O(n), where n is the total number of bits

• 4032 clocks

• Rank #39

### Table lookup approach

0110 1101 1001 1110

group1 group2 group3 group4

0000:4 0’s

0001:3 0’s

0010:3 0’s

0011:2 0’s

………………..

1111:0 0’s

### Constructing the table

int C0[16]={4,3,3,2,

3,2,2,1,

3,2,2,1,

2,1,1,0};

### Querying the table

do {

d = a[--c];

b += C0[d & 0xF];

d >>= 4;

d &= 0xFFF;

b += C0[d & 0xF];

d >>= 4;

b += C0[d & 0xF];

d >>= 4;

b += C0[d & 0xF];

} while (c > 0);
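A minimal C sketch of the lookup, reusing the `C0` table from the slide. The function name `count_zero_bits_table` and the fixed-width types are assumptions made for illustration; because `d` is unsigned here, the extra `d &= 0xFFF` mask from the slide is not needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 0-bits in each 4-bit value 0x0 .. 0xF (same values as on the slide). */
static const uint8_t C0[16] = {4, 3, 3, 2,
                               3, 2, 2, 1,
                               3, 2, 2, 1,
                               2, 1, 1, 0};

/* Count 0-bits with four table lookups per 16-bit integer.
 * The name and signature are illustrative assumptions. */
unsigned count_zero_bits_table(const uint16_t *a, size_t n)
{
    unsigned zeros = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t d = a[i];
        zeros += C0[d & 0xF];          /* bits 0..3   */
        zeros += C0[(d >> 4) & 0xF];   /* bits 4..7   */
        zeros += C0[(d >> 8) & 0xF];   /* bits 8..11  */
        zeros += C0[(d >> 12) & 0xF];  /* bits 12..15 */
    }
    return zeros;
}
```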

### Time complexity

The above algorithm runs in

[number of integers] × 24 + 42 (constructing the table)

= O( (24/16)n ) = O(1.5n) = O(n)

• 1578 clocks

• Rank #18

### Counting 1’s and subtracting from 16

• You can construct a larger table such as C0[64] and split each integer into 6-6-4 bits.

• That takes three lookups instead of four, so the run time is still at least ¾ of the above algorithm (>= 1200 clocks).

• Another viewpoint: count the 1-bits and subtract that count from 16.

• There are many interesting algorithms for counting 1-bits!
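The two ideas in this list can be combined in one sketch: a 64-entry table that counts 1-bits, queried with a 6-6-4 split, with the result subtracted from 16 per integer. Counting 1-bits also sidesteps the two padding zeros that the 4-bit piece would otherwise pick up from a 6-bit zero-count table. The names `C1`, `build_ones_table`, and `count_zero_bits_664` are hypothetical, and the slides do not show how the author would have built such a table.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t C1[64];   /* hypothetical table: number of 1-bits in each 6-bit value */

/* Fill C1 using ones(v) = ones(v >> 1) + (lowest bit of v). */
void build_ones_table(void)
{
    C1[0] = 0;
    for (unsigned v = 1; v < 64; v++)
        C1[v] = (uint8_t)(C1[v >> 1] + (v & 1u));
}

/* Count 0-bits as 16 * n minus the number of 1-bits, three lookups per integer.
 * build_ones_table() must have been called first. */
unsigned count_zero_bits_664(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    for (size_t i = 0; i < n; i++) {
        uint16_t d = a[i];
        zeros -= C1[d & 0x3F];         /* bits 0..5  */
        zeros -= C1[(d >> 6) & 0x3F];  /* bits 6..11 */
        zeros -= C1[d >> 12];          /* bits 12..15; index is at most 15 */
    }
    return zeros;
}
```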

### Eliminating algorithm

while (n){

count++;

n &= n-1;

}

• Two cases: (1) n ends in 1: *************1 (2) n ends in 10...0: *****10...0000

### Case 1

• n = *************1

• n-1 = *************0

-----------------------

• n & (n-1) = *************0

• The lowest 1-bit was eliminated; the other bits are unchanged.

### Case 2

• n = *****10...0000

• n-1 = *****01...1111

-----------------------

• n & (n-1) = *****00...0000

• The lowest 1-bit was eliminated; the other bits are unchanged.

### Eliminate a 1 each round

• Each round removes exactly one 1-bit, so the loop runs once per 1-bit and ends when n reaches zero.

### Implementation

b = c << 4;   // c * 16: start from the total number of bits, then subtract one per 1-bit

do {

d = a[--c];

while (d){

b--;

d &= d - 1;

}

} while (c > 0);

return b;
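The same loop as a self-contained C function, given as a sketch. The name `count_zero_bits_eliminate` and the types are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Count 0-bits by starting from 16 * n and subtracting one for every 1-bit.
 * The inner loop runs once per 1-bit, not once per bit position. */
unsigned count_zero_bits_eliminate(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    for (size_t i = 0; i < n; i++) {
        unsigned d = a[i];
        while (d) {
            zeros--;       /* one 1-bit found */
            d &= d - 1;    /* clear the lowest 1-bit: 0110 -> 0100 -> 0000 */
        }
    }
    return zeros;
}
```

An all-zero input costs almost nothing per integer, while an all-ones input needs 16 rounds per integer, which is why the measured time depends so strongly on the test data.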

### Time complexity

The above algorithm runs in

[number of 1-bits] × 5 + [size of array] × 4 + 4

= O( [5 + (4/16)]n ) = O(5.25n) = O(n) in the worst case, when every bit is 1

### Performance

• 2100 clocks

• Rank #27

• Slower? It depends on the number of 1-bits.

• It was faster than the table approach before the rejudge.

• Evidently the rejudge test data contains more 1-bits.

• But the code is short, which leaves room for other optimizations.

### Parallel counting algorithm

• Like the previous approaches it counts 1-bits, but it handles every 2-bit pair at once

00 (0): 0 ones → 00 - 0 = 00 (0)

01 (1): 1 one → 01 - 0 = 01 (1)

10 (2): 1 one → 10 - 1 = 01 (1)

11 (3): 2 ones → 11 - 1 = 10 (2)

• Count for each pair = [the original two bits] - [the left bit]

• Then add neighbouring counts together iteratively
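The pair rule above can be checked with a one-line C function. The name `pair_counts` is a hypothetical helper; the mask 0x5555 keeps only the shifted-down left bit of every pair, so one subtraction handles all eight pairs of a 16-bit value.

```c
#include <stdint.h>

/* Replace every 2-bit field of x with the number of 1-bits that field held:
 * [the original two bits] - [the left bit], for all eight pairs at once. */
uint16_t pair_counts(uint16_t x)
{
    return (uint16_t)(x - ((x >> 1) & 0x5555u));
}
```

For example, 0x6D9E (pairs 01 10 11 01 10 01 11 10) becomes 0x5959 (pair counts 1 1 2 1 1 1 2 1).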

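### Parallel counting algorithm

b = c << 4;   // c * 16, as in the eliminating implementation

do {

x = a[--c];   // load the next integer (this step is not shown on the slide; it mirrors the earlier loops)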

x = x - ((x >> 1) & 0x5555);

x = (x&0x3333) + ((x>>2) & 0x3333);

x = (x + (x >> 4));

b -= x & 0xF;

b -= (x >>8) & 0xF;

} while (c > 0);
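A self-contained C sketch of the steps above. The names `popcount16_parallel` and `count_zero_bits_parallel` are assumptions made for illustration. After the second step every 4-bit field holds a count of at most 4, so the third step can add neighbouring fields without masking first: each sum is at most 8 and still fits in 4 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 1-bits in a 16-bit value, computed on all fields in parallel. */
unsigned popcount16_parallel(uint16_t v)
{
    unsigned x = v;
    x = x - ((x >> 1) & 0x5555u);              /* 2-bit fields: counts 0..2 */
    x = (x & 0x3333u) + ((x >> 2) & 0x3333u);  /* 4-bit fields: counts 0..4 */
    x = x + (x >> 4);                          /* fields 0 and 2 now hold byte counts 0..8 */
    return (x & 0xFu) + ((x >> 8) & 0xFu);     /* low byte count + high byte count */
}

/* Count 0-bits as 16 * n minus the total number of 1-bits. */
unsigned count_zero_bits_parallel(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    for (size_t i = 0; i < n; i++)
        zeros -= popcount16_parallel(a[i]);
    return zeros;
}
```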

### Time complexity

The above algorithm runs in

[number of integers] × 18 + 9

= O( (18/16)n ) = O(1.125n) = O(n)

• 1224 clocks

• Rank #10

### Processing 3 integers

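• Process the integers three at a time (the same idea as 阿蹦's)

• 4 bits can represent 0 to 15, but a 4-bit group contains at most four 1-bits

• So once the number of 1-bits in every 4-bit group has been computed, the counts from three integers fit in the same 4-bit field (4 * 3 = 12 < 15)

• The remaining steps can then be done once for all three, saving the follow-up computation for two of the integers

• The code is rather long, so it is not included here; contact me if you are interested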

1111 → 0100

1111 → 0100

1111 → 0100

-----------------

sum = 1100 (12 ones)
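Since the original code is not included, the following is only a sketch of how the packing described above could look in C. It is not the author's implementation, and the names `nibble_counts`, `popcount3x16`, and `count_zero_bits_by3` are hypothetical. Note that after adding three integers the 4-bit fields can hold up to 12, so the next fold has to mask before adding (two fields could sum to 24, which needs a full byte).

```c
#include <stddef.h>
#include <stdint.h>

/* First two parallel steps: every 4-bit field ends up holding its own 1-bit count (0..4). */
static unsigned nibble_counts(uint16_t v)
{
    unsigned x = v;
    x = x - ((x >> 1) & 0x5555u);
    x = (x & 0x3333u) + ((x >> 2) & 0x3333u);
    return x;
}

/* Total number of 1-bits in three 16-bit values, sharing the final reduction. */
unsigned popcount3x16(uint16_t p, uint16_t q, uint16_t r)
{
    unsigned s = nibble_counts(p) + nibble_counts(q) + nibble_counts(r); /* fields <= 12 */
    s = (s & 0x0F0Fu) + ((s >> 4) & 0x0F0Fu);                            /* bytes  <= 24 */
    return (s & 0xFFu) + (s >> 8);                                       /* total  <= 48 */
}

/* Count 0-bits three integers at a time; a short tail is padded with zeros. */
unsigned count_zero_bits_by3(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    size_t i = 0;
    for (; i + 3 <= n; i += 3)
        zeros -= popcount3x16(a[i], a[i + 1], a[i + 2]);
    if (i < n)  /* one or two integers left over */
        zeros -= popcount3x16(a[i], (i + 1 < n) ? a[i + 1] : 0, 0);
    return zeros;
}
```

Padding the tail with zeros keeps the number of kernel invocations at ceil(n/3) without a separate single-integer code path.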

### Time complexity

The above algorithm runs in

ceil([number of integers] / 3) × 45 + 10

= O( (15/16)n ) = O(0.9375n) = O(n)

• 1090 clocks

• Rank #7

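### Other optimization techniques

• Loop unrolling

• Given the length of the code, four groups (12 integers) can be unrolled

• The loop must exit at the tail when fewer than three groups remain; remove as many checks that do not affect the result as possible

• As a side effect, this reveals that the two test cases contain 16 and 32 integers respectively

• Read the input and compute directly inside main (for part #3)

• Unrolling three groups (9 integers) is also possible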
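Loop unrolling can be illustrated with a small C sketch. This is an assumption-laden simplification: the slides unroll the three-integer kernel in hand-written code, whereas this sketch unrolls a plain one-integer parallel count four times per iteration. The names `ones16` and `count_zero_bits_unrolled` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Parallel 1-bit count for one 16-bit value. */
static unsigned ones16(uint16_t v)
{
    unsigned x = v;
    x = x - ((x >> 1) & 0x5555u);
    x = (x & 0x3333u) + ((x >> 2) & 0x3333u);
    x = x + (x >> 4);
    return (x & 0xFu) + ((x >> 8) & 0xFu);
}

/* Count 0-bits with the loop body unrolled four times, so the loop
 * condition is evaluated once per four integers instead of once per integer. */
unsigned count_zero_bits_unrolled(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        zeros -= ones16(a[i]);
        zeros -= ones16(a[i + 1]);
        zeros -= ones16(a[i + 2]);
        zeros -= ones16(a[i + 3]);
    }
    for (; i < n; i++)   /* tail: fewer than four integers left */
        zeros -= ones16(a[i]);
    return zeros;
}
```

An optimizing C compiler may unroll such loops on its own; in hand-written assembly the unrolling has to be done explicitly, which is presumably why it is listed here as a separate technique.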

### Final performance

• Part #2:1002 clocks

• Rank #1 (it could run even faster by combining the other students’ techniques)

• Part #3:674 clocks

• Rank #1