Computer Organization and Assembly Language

Teacher : cyy

Presenter : B98902071 康秩群

Outline of these slides
• The 0-bits counting problem
• Naïve algorithm
• Querying table approach
• Counting 1s and subtracting from 16
• Eliminating algorithm
• Parallel counting algorithm
• Other improvement techniques

The 0-bits counting problem
• Input: An array of 16-bit integers (the array holds at most 32 integers).
• Output: The total number of 0 bits in the array.
Naïve algorithm

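b = 0;                      // b: running count of 0 bits
do {
    d = a[--c];             // c: number of integers still to process
    r2 = 1;                 // r2: mask with a single bit set
    do {
        if ((d & r2) == 0)  // parentheses required: == binds tighter than &
            b++;
        r2 <<= 1;
    } while (r2 != 0);      // assumes r2 is 16 bits wide, so the bit shifts out after 16 rounds
} while (c > 0);
return b;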
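As a point of reference, the loop above can be written as a self-contained C function. This is only a sketch: the function name `count_zero_bits_naive`, the `uint16_t` element type and the explicit length parameter are assumptions made for illustration, not part of the original slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the 0 bits in an array of 16-bit integers, testing one bit at a time.
 * Name and signature are hypothetical, chosen for this sketch. */
unsigned count_zero_bits_naive(const uint16_t *a, size_t n)
{
    unsigned zeros = 0;
    for (size_t i = 0; i < n; i++) {
        /* mask is 16 bits wide, so the set bit falls off after 16 shifts */
        for (uint16_t mask = 1; mask != 0; mask <<= 1) {
            if ((a[i] & mask) == 0)
                zeros++;
        }
    }
    return zeros;
}
```

Calling it on an array such as `{0x0000, 0xFFFF, 0x6D9E}` visits all 48 bits, which is why the cost grows with the total number of bits rather than with the number of 1s.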

Time complexity

The above algorithm runs in

[number of 0 bits] * 5 + [number of 1 bits] * 4 + [constant overhead]

= O(5n) = O(n), where n is the total number of bits

Performance
• 4032 clocks
• Rank #39
Querying table approach

0110   1101   1001   1110

group1 group2 group3 group4

0000 : 4 0’s

0001 : 3 0’s

0010 : 3 0’s

0011 : 2 0’s

………………..

1111 : 0 0’s

Constructing table

int C0[16] = {4, 3, 3, 2,
              3, 2, 2, 1,
              3, 2, 2, 1,
              2, 1, 1, 0};   // C0[i] = number of 0 bits in the 4-bit value i

Querying table

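do {
    d = a[--c];          // next 16-bit integer
    b += C0[d & 0xF];    // look up the lowest 4 bits
    d >>= 4;
    d &= 0xFFF;          // keep only the remaining 12 bits
    b += C0[d & 0xF];
    d >>= 4;
    b += C0[d & 0xF];
    d >>= 4;
    b += C0[d & 0xF];    // highest 4 bits
} while (c > 0);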
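The table lookup can be sketched as a standalone C function as follows. The function name `count_zero_bits_table` and its signature are assumptions for illustration; the table values are the ones given above.

```c
#include <stddef.h>
#include <stdint.h>

/* C0[i] = number of 0 bits in the 4-bit value i (same values as the slide) */
static const uint8_t C0[16] = {4, 3, 3, 2,
                               3, 2, 2, 1,
                               3, 2, 2, 1,
                               2, 1, 1, 0};

/* Hypothetical wrapper: four table lookups per 16-bit integer. */
unsigned count_zero_bits_table(const uint16_t *a, size_t n)
{
    unsigned zeros = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t d = a[i];
        zeros += C0[d & 0xF];          /* bits 0..3   */
        zeros += C0[(d >> 4) & 0xF];   /* bits 4..7   */
        zeros += C0[(d >> 8) & 0xF];   /* bits 8..11  */
        zeros += C0[d >> 12];          /* bits 12..15 */
    }
    return zeros;
}
```

Because `d` is an unsigned 16-bit value in this sketch, the shifts bring in only zeros, so no extra masking step is needed between lookups.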

Time complexity

The above algorithm runs in

[number of integers] * 24 + 42 (constructing the table)

= O( (24/16)n ) = O(1.5n) = O(n)

Performance
• 1578 clocks
• Rank #18
Counting 1s and subtracting from 16
• You can construct a larger table such as C0[64] and split each integer into 6-6-4 bit groups.
• That takes 3 lookups instead of 4, so the run time is no less than ¾ of the above algorithm (>= 1200 clocks).
• Another viewpoint: count the 1s, then subtract that count from 16.
• There are many interesting algorithms for counting 1s!
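The two ideas in this list (a 64-entry table with a 6-6-4 split, and counting 1s then subtracting from 16) could be combined roughly as below. This is a sketch under assumptions: the names `ones6`, `build_ones6` and `count_zero_bits_664` are hypothetical, and the table here stores counts of 1s rather than 0s.

```c
#include <stddef.h>
#include <stdint.h>

/* ones6[i] = number of 1 bits in the 6-bit value i (hypothetical table name) */
static uint8_t ones6[64];

/* Fill the table once: the count for i is the count for i/2 plus the low bit. */
void build_ones6(void)
{
    ones6[0] = 0;
    for (unsigned i = 1; i < 64; i++)
        ones6[i] = (uint8_t)(ones6[i >> 1] + (i & 1));
}

/* Start from 16 zeros per integer and subtract every 1 that is found. */
unsigned count_zero_bits_664(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    for (size_t i = 0; i < n; i++) {
        uint16_t d = a[i];
        zeros -= ones6[d & 0x3F];          /* low 6 bits            */
        zeros -= ones6[(d >> 6) & 0x3F];   /* middle 6 bits         */
        zeros -= ones6[d >> 12];           /* top 4 bits, index < 16 */
    }
    return zeros;
}
```

`build_ones6()` has to run once before the first call to `count_zero_bits_664`; on the slides the cost of constructing the table is counted as part of the run time.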
Eliminating algorithm

while (n) {
    count++;
    n &= n - 1;   // clears the lowest 1 bit of n
}

• Two cases: (1) *************1 (2) *****10...0000
Case 1
• n         = *************1
• n-1       = *************0

-----------------------

• n & (n-1) = *************0
• A one was eliminated.
Case 2
• n         = *****10...0000
• n-1       = *****01...1111

-----------------------

• n & (n-1) = *****00...0000
• A one was eliminated.
Eliminate a 1 each round
• When n has been reduced to zero, the loop ends, so the number of rounds equals the number of 1s.
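A single elimination step can be isolated in a small helper to try the two cases on concrete values. The helper name `clear_lowest_one` is an assumption made for this sketch.

```c
#include <stdint.h>

/* Return n with its lowest 1 bit cleared (0 stays 0).
 * Hypothetical helper illustrating one round of n &= n - 1. */
uint16_t clear_lowest_one(uint16_t n)
{
    return (uint16_t)(n & (n - 1));
}
```

For example, starting from binary 10110 the helper yields 10100, then 10000, then 0: three rounds for three 1 bits.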
Implementation

b = c << 4;          // c * 16: start from the total number of bits
do {
    d = a[--c];
    while (d) {
        b--;         // each eliminated 1 bit means one fewer 0 bit
        d &= d - 1;
    }
} while (c > 0);
return b;
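Written as a standalone C function, the same idea might look like the sketch below. The name `count_zero_bits_eliminate` and the explicit length parameter are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Start from 16 * n bits and subtract one for every 1 bit eliminated.
 * Hypothetical wrapper around the n &= n - 1 loop. */
unsigned count_zero_bits_eliminate(const uint16_t *a, size_t n)
{
    unsigned zeros = (unsigned)n << 4;   /* n * 16 */
    for (size_t i = 0; i < n; i++) {
        uint16_t d = a[i];
        while (d) {
            zeros--;
            d &= (uint16_t)(d - 1);      /* clear the lowest 1 bit */
        }
    }
    return zeros;
}
```

The inner loop runs once per 1 bit, so an input of all zeros costs almost nothing while an input of all ones costs 16 rounds per integer.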

Time complexity

The above algorithm runs in

[amount of 1-bits] *5 + [size of array] *4 + 4

= O( [5 + (4/16)]n ) = O(5.25n) = O(n)

Performance
• 2100 clocks
• Rank #27
• Slower? It depends on the number of 1s.
• It was faster than the above algorithm before the rejudge.
• Evidently the number of 1 bits in the test data increased after the rejudge.
• But the code is short, which is handy for doing other things.
Parallel counting algorithm
• Similar to the others

00 (0): 0 ones → 00 – 0 = 00 (0)

01 (1): 1 one → 01 – 0 = 01 (1)

10 (2): 1 one → 10 – 1 = 01 (1)

11 (3): 2 ones → 11 – 1 = 10 (2)

• [the original two bits] – [the left bit] = number of 1s in the pair
• then add the pair counts together iteratively
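Parallel counting algorithm

b = c << 4;                                   // assumed initialization: c * 16, since the loop subtracts 1s from b
do {
    x = a[--c];                               // load the next integer
    x = x - ((x >> 1) & 0x5555);              // 2-bit fields: number of 1s in each pair
    x = (x & 0x3333) + ((x >> 2) & 0x3333);   // 4-bit fields: number of 1s in each nibble
    x = x + (x >> 4);                         // low nibble of each byte: number of 1s in that byte
    b -= x & 0xF;                             // 1s in the low byte
    b -= (x >> 8) & 0xF;                      // 1s in the high byte
} while (c > 0);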
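A self-contained C sketch of the parallel count is given below. The function name `count_zero_bits_parallel` and its signature are assumptions; the three arithmetic steps are the ones from the slide.

```c
#include <stddef.h>
#include <stdint.h>

/* Count 1s with three word-wide steps, then subtract them from 16 * n.
 * Hypothetical wrapper around the slide's bit tricks. */
unsigned count_zero_bits_parallel(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    for (size_t i = 0; i < n; i++) {
        unsigned x = a[i];
        x = x - ((x >> 1) & 0x5555u);               /* 1s per 2-bit pair   */
        x = (x & 0x3333u) + ((x >> 2) & 0x3333u);   /* 1s per 4-bit nibble */
        x = x + (x >> 4);                           /* low nibble of each byte = 1s in that byte */
        zeros -= x & 0xFu;                          /* low byte  */
        zeros -= (x >> 8) & 0xFu;                   /* high byte */
    }
    return zeros;
}
```

For the value 0x6D9E the intermediate results are 0x5959, 0x2323 and 0x2555, giving 5 + 5 = 10 ones and therefore 6 zeros. The amount of work per integer is fixed, independent of how many 1s it contains.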

Time complexity

The above algorithm runs in

[amount of integers] *18 + 9

= O( (18/16)n ) = O(1.125n) = O(n)

Performance
• 1224 clocks
• Rank #10
Processing 3 integers
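• Process the integers in groups of three (same as 阿蹦's approach).
• 4 bits can represent 0 to 15, but a 4-bit group contains at most four 1s.
• So once the number of 1s in every 4-bit group has been computed, the counts of three integers fit into the same fields (4 * 3 = 12 < 15).
• The remaining steps can then be done once for all three, saving the follow-up computation for two of them.
• The code is a bit long, so it is not included here; contact me if you are interested.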

1111 → 0100

1111 → 0100

1111 → 0100

-----------------

1100
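Since the presenter's code is not included, the following is only one possible reading of the idea, not the original implementation. It computes the per-nibble counts of three integers separately, adds them (each 4-bit field then holds at most 12), and performs the final horizontal sum once. All names (`nibble_counts`, `count_ones_3`, `count_zero_bits_by3`) and the handling of leftover integers are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-nibble counts of 1s: each 4-bit field of the result holds 0..4. */
static unsigned nibble_counts(unsigned x)
{
    x = x - ((x >> 1) & 0x5555u);
    return (x & 0x3333u) + ((x >> 2) & 0x3333u);
}

/* Number of 1s in three 16-bit integers, sharing the final summation. */
unsigned count_ones_3(uint16_t p, uint16_t q, uint16_t r)
{
    /* each nibble now holds at most 4 * 3 = 12, which still fits in 4 bits */
    unsigned s = nibble_counts(p) + nibble_counts(q) + nibble_counts(r);
    s = (s & 0x0F0Fu) + ((s >> 4) & 0x0F0Fu);   /* per-byte sums, at most 24 */
    return (s & 0xFFu) + (s >> 8);              /* total, at most 48 */
}

/* Hypothetical driver: full groups of three, then leftovers one at a time. */
unsigned count_zero_bits_by3(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    size_t i = 0;
    for (; i + 3 <= n; i += 3)
        zeros -= count_ones_3(a[i], a[i + 1], a[i + 2]);
    for (; i < n; i++)
        zeros -= count_ones_3(a[i], 0, 0);
    return zeros;
}
```

With three inputs of 0xFFFF, the packed nibble sums are 0xCCCC, matching the 0100 + 0100 + 0100 = 1100 picture above, and the total is 48.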

Time complexity

The above algorithm runs in

ceil (amount of integers/3) *45 + 10

= O( (15/16)n ) = O(0.9375n) = O(n)

Performance
• 1090 clocks
• Rank #7
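Other improvement techniques
• Loop unrolling
• Given the length of this code, four groups (12 integers) can be unrolled.
• When fewer than three groups remain at the tail, the code has to break out; remove every condition check that does not affect the result wherever possible.
• As a side effect, this reveals that the two test cases contain 16 and 32 integers respectively.
• Read the input and compute directly inside main (for part #3).
• Unrolling three groups (9 integers) is also possible.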
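Loop unrolling in general can be sketched in C as below. This is a simplified illustration, not the presenter's arrangement: it unrolls a one-integer step four times instead of unrolling groups of three, and the names `ones16` and `count_zero_bits_unrolled` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 1s in one 16-bit value (parallel counting, as on the earlier slide). */
static unsigned ones16(unsigned x)
{
    x = x - ((x >> 1) & 0x5555u);
    x = (x & 0x3333u) + ((x >> 2) & 0x3333u);
    x = x + (x >> 4);
    return (x & 0xFu) + ((x >> 8) & 0xFu);
}

/* Unrolled by four: one loop test per four integers, plus a short tail loop. */
unsigned count_zero_bits_unrolled(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        zeros -= ones16(a[i]);
        zeros -= ones16(a[i + 1]);
        zeros -= ones16(a[i + 2]);
        zeros -= ones16(a[i + 3]);
    }
    for (; i < n; i++)          /* tail: fewer than four integers left */
        zeros -= ones16(a[i]);
    return zeros;
}
```

The saving comes from executing the loop counter update and branch once per unrolled body rather than once per integer; in hand-written assembly the tail handling is where the remaining condition checks tend to accumulate.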
Final performance
• Part #2: 1002 clocks
• Rank #1 (it could run even faster by combining the others' techniques)
• Part #3: 674 clocks
• Rank #1
Appreciation