Computer Organization and Assembly Language



Teacher: cyy

Presenter: B98902071 康秩群


Outline

  • The 0-bits counting problem

    • Naïve algorithm

    • Table lookup approach

    • Counting 1’s and subtracting from 16

      • Elimination algorithm

      • Parallel counting algorithm

    • Other optimization techniques


The "counting circles" problem (數圈圈問題)

  • Input: An array of 16-bit integers (the array holds at most 32 of them).

  • Output: The number of 0-bits in the array.


Any naïve algorithm?


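Naïve algorithm

b = 0;                        // running count of 0-bits
do {
    d = a[--c];               // next integer, walking the array backwards
    r2 = 1;                   // single-bit mask, held in a 16-bit value
    do {
        if ((d & r2) == 0)    // parentheses needed: == binds tighter than &
            b++;
        r2 <<= 1;
    } while (r2 != 0);        // the mask becomes 0 after 16 shifts
} while (c > 0);
return b;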
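The loop above can be sketched as a self-contained C function. This is only an illustration under stated assumptions: the function name `count_zero_bits_naive` and the use of `uint16_t` are not from the slides, and the array is walked forwards rather than backwards.

```c
#include <stdint.h>

/* Count the 0-bits in an array of 16-bit integers, one bit at a time.
 * The name and signature are hypothetical, chosen for illustration. */
int count_zero_bits_naive(const uint16_t *a, int len) {
    int zeros = 0;
    for (int i = 0; i < len; i++) {
        /* mask is 16 bits wide, so it wraps to 0 after 16 left shifts */
        for (uint16_t mask = 1; mask != 0; mask <<= 1) {
            if ((a[i] & mask) == 0)
                zeros++;
        }
    }
    return zeros;
}
```

For example, the array {0x6D9E, 0xFFFF, 0x0000} contains 6 + 0 + 16 = 22 zero bits.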


Time complexity

The above algorithm runs in

[number of 0-bits] * 5 + [number of 1-bits] * 4 + (miscellaneous overhead)

= O(5n) = O(n), where n is the total number of bits


Performance

  • 4032 clocks

    • Rank #39


Complexity of the "counting circles" problem (數圈圈問題)


Table lookup approach

0110    1101    1001    1110
group1  group2  group3  group4

0000: 4 0’s

0001: 3 0’s

0010: 3 0’s

0011: 2 0’s

...

1111: 0 0’s


Constructing the table

int C0[16] = {4, 3, 3, 2,
              3, 2, 2, 1,
              3, 2, 2, 1,
              2, 1, 1, 0};


Querying the table

do {
    d = a[--c];          // next 16-bit integer
    b += C0[d & 0xF];    // lowest 4 bits
    d >>= 4;
    d &= 0xFFF;          // keep only the 12 remaining bits
    b += C0[d & 0xF];
    d >>= 4;
    b += C0[d & 0xF];
    d >>= 4;
    b += C0[d & 0xF];    // highest 4 bits
} while (c > 0);
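As a rough, self-contained sketch of the same lookup in C: the table values are the ones given on the slide, while the function name `count_zero_bits_table` and the `uint16_t`/`uint8_t` types are assumptions made for this example.

```c
#include <stdint.h>

/* C0[v] = number of 0-bits in the 4-bit value v (values from the slide). */
static const uint8_t C0[16] = {4, 3, 3, 2,
                               3, 2, 2, 1,
                               3, 2, 2, 1,
                               2, 1, 1, 0};

/* Hypothetical wrapper: four table lookups per 16-bit integer. */
int count_zero_bits_table(const uint16_t *a, int len) {
    int zeros = 0;
    for (int i = 0; i < len; i++) {
        uint16_t d = a[i];
        zeros += C0[d & 0xF];         /* bits 0-3   */
        zeros += C0[(d >> 4) & 0xF];  /* bits 4-7   */
        zeros += C0[(d >> 8) & 0xF];  /* bits 8-11  */
        zeros += C0[d >> 12];         /* bits 12-15 */
    }
    return zeros;
}
```

Because `d` is unsigned here, the shifts bring in zeros from the top, so the extra `d &= 0xFFF` mask from the slide is not needed in this version.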


Time complexity

The above algorithm runs in

[number of integers] * 24 + 42 (constructing the table)

= O( (24/16)n ) = O(1.5n) = O(n)


Performance

  • 1578 clocks

    • Rank #18


Counting 1’s and subtracting from 16

  • You could construct a larger table such as C0[64] and split each integer into 6-6-4 bit groups.

  • The run time would still be no less than ¾ of the above algorithm (>= 1200 clocks).

  • How about another viewpoint: count the 1’s, then subtract that count from 16?

  • There are many interesting algorithms!
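One possible reading of the 6-6-4 idea, as a hedged sketch: three lookups per integer instead of four, which is where the ¾ bound comes from. The names `C0_64`, `build_c0_64`, and `count_zero_bits_664` are hypothetical, and subtracting 2 for the top group is this example's own way of handling the group that has only 4 real bits; the slides do not say how that case was meant to be treated.

```c
#include <stdint.h>

/* C0_64[v] = number of 0-bits in the 6-bit value v (hypothetical table). */
static uint8_t C0_64[64];

void build_c0_64(void) {
    for (int v = 0; v < 64; v++) {
        int ones = 0;
        for (int bit = 0; bit < 6; bit++)
            ones += (v >> bit) & 1;
        C0_64[v] = (uint8_t)(6 - ones);
    }
}

/* Call build_c0_64() once before using this function. */
int count_zero_bits_664(const uint16_t *a, int len) {
    int zeros = 0;
    for (int i = 0; i < len; i++) {
        uint16_t d = a[i];
        zeros += C0_64[d & 0x3F];         /* bits 0-5  */
        zeros += C0_64[(d >> 6) & 0x3F];  /* bits 6-11 */
        /* bits 12-15: only 4 real bits, so the 6-bit table
         * reports 2 zeros that do not exist in the input */
        zeros += C0_64[d >> 12] - 2;
    }
    return zeros;
}
```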


Elimination algorithm

while (n) {
    count++;
    n &= n - 1;    // clears the lowest set bit of n
}

  • Two cases: (1) *************1  (2) *****10...0000


Case 1

  • n         = *************1

  • n - 1     = *************0

    -----------------------

  • n & (n-1) = *************0

  • One 1-bit was eliminated.


Case 2

  • n         = *****10...0000

  • n - 1     = *****01...1111

    -----------------------

  • n & (n-1) = *****00...0000

  • One 1-bit was eliminated.


Eliminate one 1 each round

  • When n has been reduced to zero, we are done!
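The two cases can be checked with a small C sketch of the elimination loop. The function name `count_one_bits_eliminate` is an assumption for illustration; the loop body is the one shown on the slides.

```c
#include <stdint.h>

/* Count 1-bits by clearing the lowest set bit until nothing is left.
 * The loop runs once per 1-bit, not once per bit position. */
int count_one_bits_eliminate(uint16_t n) {
    int count = 0;
    while (n) {
        count++;
        n &= (uint16_t)(n - 1);  /* n - 1 flips the lowest 1 and the 0s below it */
    }
    return count;
}
```

For instance, 0x8000 (case 2, a single 1 followed by fifteen 0s) is reduced to zero in one round, while 0xFFFF needs sixteen rounds.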


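Implementation

b = c << 4;            // c * 16: start by assuming every bit is 0
do {
    d = a[--c];        // next integer, walking the array backwards
    while (d) {
        b--;           // each eliminated 1-bit means one fewer 0-bit
        d &= d - 1;    // clear the lowest set bit
    }
} while (c > 0);
return b;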
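A self-contained C sketch of this implementation might look as follows. The function name `count_zero_bits_eliminate` and the forward `for` loop are assumptions of this example; unlike the `do ... while` form on the slide, this version also tolerates an empty array.

```c
#include <stdint.h>

/* Start from 16 zeros per integer and subtract one for every 1-bit found. */
int count_zero_bits_eliminate(const uint16_t *a, int len) {
    int zeros = len * 16;
    for (int i = 0; i < len; i++) {
        uint16_t d = a[i];
        while (d) {
            zeros--;
            d &= (uint16_t)(d - 1);  /* clear the lowest set bit */
        }
    }
    return zeros;
}
```

The inner loop runs once per 1-bit, so the cost depends on the data: an all-zero input is the best case and an all-one input the worst.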


Time complexity

The above algorithm runs in

[number of 1-bits] * 5 + [size of array] * 4 + 4

= O( [5 + (4/16)]n ) = O(5.25n) = O(n)


Performance

  • 2100 clocks

    • Rank #27

  • Slower? It depends on the number of 1’s.

  • It was faster than the approach above before the rejudge.

    • Evidently, the number of 1-bits in the test data increased.

  • But the code is short, which leaves room to do other things.


Parallel counting algorithm

  • Similar to the others

    00 (0): 0 ones → 00 - 0 = 00 (0)

    01 (1): 1 one  → 01 - 0 = 01 (1)

    10 (2): 1 one  → 10 - 1 = 01 (1)

    11 (3): 2 ones → 11 - 1 = 10 (2)

  • [the original two bits] - [the left bit] gives the number of 1’s in those two bits

  • then add the counts together iteratively


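Parallel counting algorithm

// b is assumed to start at c << 4 (c * 16), as in the previous implementation
do {
    x = a[--c];                              // load the next integer
    x = x - ((x >> 1) & 0x5555);             // each 2-bit field: number of 1's in that pair
    x = (x & 0x3333) + ((x >> 2) & 0x3333);  // each 4-bit field: number of 1's in that nibble
    x = (x + (x >> 4));                      // low nibble of each byte: number of 1's in that byte
    b -= x & 0xF;                            // 1's in the low byte
    b -= (x >> 8) & 0xF;                     // 1's in the high byte
} while (c > 0);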
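The steps above can be sketched as runnable C. The helper names `count_one_bits_parallel` and `count_zero_bits_parallel` are hypothetical; the three arithmetic steps and the two final extractions follow the slide.

```c
#include <stdint.h>

/* Number of 1-bits in a 16-bit value, computed on all fields at once. */
int count_one_bits_parallel(uint16_t x) {
    x = x - ((x >> 1) & 0x5555);             /* 2-bit fields now hold 0..2 */
    x = (x & 0x3333) + ((x >> 2) & 0x3333);  /* 4-bit fields now hold 0..4 */
    x = x + (x >> 4);                        /* low nibble of each byte holds 0..8 */
    return (x & 0xF) + ((x >> 8) & 0xF);     /* low byte count + high byte count */
}

/* Zeros = 16 per integer minus the 1-bits found. */
int count_zero_bits_parallel(const uint16_t *a, int len) {
    int zeros = len * 16;
    for (int i = 0; i < len; i++)
        zeros -= count_one_bits_parallel(a[i]);
    return zeros;
}
```

Unlike the elimination loop, the work per integer is constant regardless of how many 1-bits it contains.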


Time complexity

The above algorithm runs in

[number of integers] * 18 + 9

= O( (18/16)n ) = O(1.125n) = O(n)


Performance

  • 1224 clocks

    • Rank #10


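Processing 3 integers

  • Process the integers in groups of three (the same idea as 阿蹦's)

    • 4 bits can represent 0 to 15, but a 4-bit group contains at most four 1’s

    • So once the number of 1’s in each 4-bit group has been computed, the counts from three integers can be added into the same 4-bit fields (4 * 3 = 12 < 15)

    • The remaining steps can then be done once for all three, saving the follow-up computation for two of the integers

      • The code is a bit long, so it is not included here; contact me if you are interested

        1111 → 0100

        1111 → 0100

        1111 → 0100

        -----------------

               1100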
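Since the presenter's code is not included, the following is only a sketch of how the grouping idea could work, not the original implementation. All names (`nibble_counts`, `fold_nibbles`, `count_zero_bits_by3`) are hypothetical, and the way the packed nibble sums are folded at the end is this example's own choice.

```c
#include <stdint.h>

/* First two parallel steps: each 4-bit field ends up holding 0..4. */
static uint16_t nibble_counts(uint16_t x) {
    x = x - ((x >> 1) & 0x5555);
    x = (x & 0x3333) + ((x >> 2) & 0x3333);
    return x;
}

/* Add up the four 4-bit fields of s (each field must be at most 15). */
static int fold_nibbles(uint16_t s) {
    s = (s & 0x0F0F) + ((s >> 4) & 0x0F0F);  /* two byte-wide sums */
    return (s & 0xFF) + (s >> 8);
}

int count_zero_bits_by3(const uint16_t *a, int len) {
    int zeros = len * 16;
    int i = 0;
    for (; i + 3 <= len; i += 3) {
        /* each field is at most 4 + 4 + 4 = 12, so no carry between fields */
        uint16_t s = (uint16_t)(nibble_counts(a[i]) +
                                nibble_counts(a[i + 1]) +
                                nibble_counts(a[i + 2]));
        zeros -= fold_nibbles(s);            /* one fold for three integers */
    }
    for (; i < len; i++)                     /* leftover one or two integers */
        zeros -= fold_nibbles(nibble_counts(a[i]));
    return zeros;
}
```

The saving comes from running the final folding steps once per group of three rather than once per integer.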


Time complexity

The above algorithm runs in

ceil([number of integers] / 3) * 45 + 10

= O( (15/16)n ) = O(0.9375n) = O(n)


Performance

  • 1090 clocks

    • Rank #7


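Other optimization techniques

  • Loop unrolling

    • Given the length of that code, four groups (12 integers) can be unrolled

    • The loop must exit at the tail when fewer than three groups remain; remove as many checks that do not affect the result as possible

      • As a side effect, this reveals that the two test inputs contain 16 and 32 integers respectively

  • Read the input and compute directly in main (for part #3)

    • Three groups (9 integers) can also be unrolled there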
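To illustrate loop unrolling in general, here is a simplified C sketch that unrolls four integers per iteration and handles the tail separately. It is not the presenter's code, which unrolls groups of three integers; the names `ones16` and `count_zero_bits_unrolled` are hypothetical.

```c
#include <stdint.h>

/* Parallel count of the 1-bits in one 16-bit value. */
static int ones16(uint16_t x) {
    x = x - ((x >> 1) & 0x5555);
    x = (x & 0x3333) + ((x >> 2) & 0x3333);
    x = x + (x >> 4);
    return (x & 0xF) + ((x >> 8) & 0xF);
}

int count_zero_bits_unrolled(const uint16_t *a, int len) {
    int zeros = len * 16;
    int i = 0;
    /* main body: one loop test and one index update per four integers */
    for (; i + 4 <= len; i += 4) {
        zeros -= ones16(a[i]);
        zeros -= ones16(a[i + 1]);
        zeros -= ones16(a[i + 2]);
        zeros -= ones16(a[i + 3]);
    }
    /* tail: at most three integers left over */
    for (; i < len; i++)
        zeros -= ones16(a[i]);
    return zeros;
}
```

Unrolling trades code size for fewer branch and counter instructions, which matters when performance is measured in clocks as it is here.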


Final performance

  • Part #2: 1002 clocks

    • Rank #1 (could run even faster by combining techniques from the others)

  • Part #3: 674 clocks

    • Rank #1


Appreciation

  • Thank you for your attention.

  • Thanks to Professor hil for the prototype of these slides.

