Pipeline and Vector Processing (Chapter2 and Appendix A). Dr. Bernard Chen Ph.D. University of Central Arkansas Spring 2010. Parallel processing. A parallel processing system is able to perform concurrent data processing to achieve faster execution time

Pipeline and Vector Processing (Chapter 2 and Appendix A)

Dr. Bernard Chen Ph.D.

University of Central Arkansas

Spring 2010

Parallel processing
  • A parallel processing system is able to perform concurrent data processing to achieve faster execution time.
  • The system may have two or more ALUs and be able to execute two or more instructions at the same time.
  • The goal is to increase throughput: the amount of processing that can be accomplished during a given interval of time.
Parallel processing classification

Single instruction stream, single data stream – SISD

Single instruction stream, multiple data stream – SIMD

Multiple instruction stream, single data stream – MISD

Multiple instruction stream, multiple data stream – MIMD

Single instruction stream, single data stream – SISD
  • A single computer containing a control unit, a processor unit, and a memory unit.
  • Instructions are executed sequentially. Parallel processing may be achieved by means of multiple functional units or by pipeline processing.
Single instruction stream, multiple data stream – SIMD
  • Represents an organization that includes many processing units under the supervision of a common control unit.
  • All processors receive the same instruction from the control unit, but operate on different data.
Multiple instruction stream, single data stream – MISD
  • Of theoretical interest only.
  • Processors receive different instructions, but operate on the same data.
Multiple instruction stream, multiple data stream – MIMD
  • A computer system capable of processing several programs at the same time.
  • Most multiprocessor and multicomputer systems can be classified in this category
Pipelining: Laundry Example

A small laundry has one washer, one dryer, and one operator. It takes 90 minutes to finish one load:

Washer takes 30 minutes

Dryer takes 40 minutes

“Operator folding” takes 20 minutes

[Figure: four loads of laundry, labeled A, B, C, and D]


Sequential Laundry

This operator schedules his loads to be delivered to the laundry every 90 minutes, which is the time required to finish one load. In other words, he will not start a new task until he is done with the previous one.

The process is sequential. Sequential laundry takes 6 hours for 4 loads.

[Figure: timeline from 6 PM to midnight with loads A, B, C, and D in task order, one after another; each load takes 30 + 40 + 20 = 90 min]


Efficiently scheduled laundry: Pipelined Laundry (operator starts work ASAP)

Another operator asks for loads to be delivered to the laundry every 40 minutes.

Pipelined laundry takes 3.5 hours for 4 loads.

[Figure: timeline from 6 PM to midnight with loads A, B, C, and D overlapped in task order; the timeline segments are 30, 40, 40, 40, 40, and 20 minutes]

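The two laundry totals can be checked with a small schedule simulation. This is only a sketch under stated assumptions: the function name `pipeline_finish_times` is made up for illustration, all four loads are assumed to be available from the start, and a load is assumed to be able to wait between machines until the next one is free.

```python
def pipeline_finish_times(stage_minutes, num_loads):
    """Return the finish time (in minutes) of each load.

    Assumption: a load enters a stage as soon as both the load and the
    stage are free, and a load may wait between stages.
    """
    stage_free_at = [0] * len(stage_minutes)  # when each machine is next idle
    finish_times = []
    for _ in range(num_loads):
        ready_at = 0  # time at which this load has left the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready_at, stage_free_at[s])
            ready_at = start + minutes
            stage_free_at[s] = ready_at
        finish_times.append(ready_at)
    return finish_times


STAGES = [30, 40, 20]  # washer, dryer, folding (minutes)
LOADS = 4

sequential_total = LOADS * sum(STAGES)
pipelined_total = pipeline_finish_times(STAGES, LOADS)[-1]

print(sequential_total)  # 360 minutes = 6 hours
print(pipelined_total)   # 210 minutes = 3.5 hours
```

After the first load, one load finishes every 40 minutes, which is the dryer time: the slowest stage sets the rate.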

Pipelining Facts

Multiple tasks operate simultaneously

Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload

Pipeline rate is limited by the slowest pipeline stage

Potential speedup = Number of pipe stages

Unbalanced lengths of pipe stages reduce speedup

Time to “fill” the pipeline and time to “drain” it reduce speedup

[Figure: pipelined laundry timeline from 6 PM to 9 PM with loads A, B, C, and D in task order; the timeline segments are 30, 40, 40, 40, 40, and 20 minutes]

The washer waits for the dryer for 10 minutes

9.2 Pipelining
  • Decomposes a sequential process into segments.
  • Divides the processor into segment processors, each dedicated to a particular segment.
  • Each segment is executed in its dedicated segment processor, which operates concurrently with all other segments.
  • Information flows through these multiple hardware segments.
9.2 Pipelining
  • Instruction execution is divided into k segments or stages
    • Instruction exits pipe stage k-1 and proceeds into pipe stage k
    • All pipe stages take the same amount of time; called one processor cycle
    • Length of the processor cycle is determined by the slowest pipe stage
SPEEDUP
  • Consider a k-segment pipeline operating on n data sets. (In the above example, k = 3 and n = 4.)
  • It takes k clock cycles to fill the pipeline and get the first result from the output of the pipeline.
  • After that, the remaining (n - 1) results come out one per clock cycle.
  • It therefore takes (k + n - 1) clock cycles to complete the task.
  • If we execute the same task sequentially in a single processing unit, it takes (k * n) clock cycles.
  • The speedup gained by using the pipeline is:

S = k * n / (k + n - 1)

For n >> k (such as 1 million data sets on a 3-stage pipeline),

  • S ~ k
  • So for a large number of data sets, the speedup approaches the number of functional units. This is because the multiple functional units can work in parallel except during the filling and cleaning-up cycles.
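The formula can be evaluated directly. A minimal sketch, where the function name `pipeline_speedup` is chosen only for illustration and every segment is assumed to take exactly one clock cycle:

```python
def pipeline_speedup(k, n):
    """Ideal speedup of a k-segment pipeline over n data sets.

    Assumes every segment takes exactly one clock cycle.
    """
    sequential_cycles = k * n      # one unit does all k steps for each data set
    pipelined_cycles = k + n - 1   # k cycles to fill, then one result per cycle
    return sequential_cycles / pipelined_cycles


print(pipeline_speedup(3, 4))          # 12 / 6 = 2.0
print(pipeline_speedup(3, 1_000_000))  # just under 3
```

With k = 3 and n = 4 the speedup is only 2, because filling and draining take a large share of the 6 cycles; with a million data sets that overhead becomes negligible and S approaches k.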
Pipeline Performance

n: number of instructions

k: number of stages in the pipeline

τ: clock cycle time

Tk: total time

n is equivalent to the number of loads in the laundry example.

k is the number of stages (washing, drying, and folding).

The clock cycle is the time of the slowest task.


Example
  • A non-pipelined system takes 100 ns to process a task.
  • The same task can be processed in a five-segment pipeline with a delay of 20 ns per segment.
  • Determine the speedup ratio of the pipeline for 1000 tasks.
Example Answer
  • Speedup Ratio for 1000 tasks:

(100 * 1000) / ((5 + 1000 - 1) * 20) = 100000 / 20080 ≈ 4.98


Example
  • A non-pipelined system takes 100 ns to process a task.
  • The same task can be processed in a six-segment pipeline in which the segment delays are 20 ns, 25 ns, 30 ns, 10 ns, 15 ns, and 30 ns.
  • Determine the speedup ratio of the pipeline for 10, 100, and 1000 tasks. What is the maximum speedup that can be achieved?
Example Answer

The clock cycle is set by the slowest segment, 30 ns.

  • Speedup Ratio for 10 tasks:

(100 * 10) / ((6 + 10 - 1) * 30) = 1000 / 450 ≈ 2.22

  • Speedup Ratio for 100 tasks:

(100 * 100) / ((6 + 100 - 1) * 30) = 10000 / 3150 ≈ 3.17

  • Speedup Ratio for 1000 tasks:

(100 * 1000) / ((6 + 1000 - 1) * 30) = 100000 / 30150 ≈ 3.32

  • Maximum Speedup:

As N grows large, (100 * N) / ((6 + N - 1) * 30) approaches 100 / 30 = 10/3 ≈ 3.33
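Both worked examples follow the same recipe, which can be sketched as a short function. The name `speedup_ratio` and its parameters are illustrative only; the sketch assumes the pipeline clock equals the slowest segment delay, as stated in the slides.

```python
def speedup_ratio(t_nonpipelined_ns, segment_delays_ns, n_tasks):
    """Speedup of a pipeline over a non-pipelined unit for n_tasks tasks.

    Assumes the clock cycle equals the slowest segment delay.
    """
    clock_ns = max(segment_delays_ns)
    k = len(segment_delays_ns)
    sequential_ns = t_nonpipelined_ns * n_tasks
    pipelined_ns = (k + n_tasks - 1) * clock_ns
    return sequential_ns / pipelined_ns


# First example: five segments of 20 ns each, 1000 tasks.
print(round(speedup_ratio(100, [20] * 5, 1000), 2))  # 4.98

# Second example: six unequal segments, slowest is 30 ns.
delays = [20, 25, 30, 10, 15, 30]
for n in (10, 100, 1000):
    print(n, round(speedup_ratio(100, delays, n), 2))  # 2.22, 3.17, 3.32

# Upper bound as the number of tasks grows: 100 / 30.
print(round(100 / max(delays), 2))  # 3.33
```

Although the second pipeline has six segments, its speedup is capped near 3.33 rather than 6, because the unbalanced segments force a 30 ns clock.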

Some definitions
  • Pipelining: an implementation technique in which multiple instructions are overlapped in execution.
  • Pipeline stage: the pipeline divides instruction processing into stages.
    • Each stage completes a part of an instruction and loads a new part in parallel.

Throughput of the instruction pipeline is determined by how often an instruction exits the pipeline. Pipelining does not decrease the time for individual instruction execution. Instead, it increases instruction throughput.

Machine cycle: the time required to move an instruction one step further in the pipeline. The length of the machine cycle is determined by the time required for the slowest pipe stage.

Instruction pipeline versus sequential processing

[Figure: timing of sequential processing compared with an instruction pipeline]

Instruction pipeline (Contd.)

Sequential processing is faster for few instructions.

Separating an instruction into steps
  • 1. Fetch the instruction
  • 2. Decode the instruction
  • 3. Fetch the operands from memory
  • 4. Execute the instruction
  • 5. Store the results in the proper place
5-Stage Pipelining

S1: Fetch Instruction (FI)
S2: Decode Instruction (DI)
S3: Fetch Operand (FO)
S4: Execute Instruction (EI)
S5: Write Operand (WO)

Space-time diagram (entries are instruction numbers):

Time   1  2  3  4  5  6  7  8  9
S1     1  2  3  4  5  6  7  8  9
S2        1  2  3  4  5  6  7  8
S3           1  2  3  4  5  6  7
S4              1  2  3  4  5  6
S5                 1  2  3  4  5

Five Stage Instruction Pipeline

Fetch instruction

Decode instruction

Fetch operands

Execute instruction

Write result
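The way instructions advance through the five stages can be sketched as a small space-time table generator. This is an illustration only: the function name `space_time_diagram` is made up, the stage abbreviations FI, DI, FO, EI, WO follow the slides, and the sketch assumes an ideal pipeline with no stalls.

```python
def space_time_diagram(k, n):
    """Return one row per segment of an ideal k-segment pipeline.

    row[c] is the number of the instruction occupying that segment
    during clock cycle c + 1, or None if the segment is idle.
    Assumes no stalls: every instruction advances one segment per cycle.
    """
    total_cycles = k + n - 1
    rows = []
    for segment in range(k):
        row = []
        for cycle in range(total_cycles):
            instruction = cycle - segment + 1
            row.append(instruction if 1 <= instruction <= n else None)
        rows.append(row)
    return rows


STAGE_NAMES = ["FI", "DI", "FO", "EI", "WO"]
for name, row in zip(STAGE_NAMES, space_time_diagram(5, 5)):
    cells = " ".join(f"{i:2d}" if i is not None else " ." for i in row)
    print(name, cells)
```

For 5 instructions on 5 stages the table spans 5 + 5 - 1 = 9 clock cycles, and the last stage first produces a result in cycle 5.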

Difficulties...

If a complicated memory access occurs in stage 1, stage 2 will be delayed and the rest of the pipe is stalled.

If there is a branch (an if statement or a jump), then some of the instructions that have already entered the pipeline should not be processed.

We need to deal with these difficulties to keep the pipeline moving.