

http://www.cse.unsw.edu.au/~cs2121 Lecturer: Hui Wu Term 2, 2019
















## **Registers**

- Working space (temporary storage) for the processor
- User-visible registers
- User-invisible registers
- Control and status registers
- Number and function vary between processor designs
- One of the major design decisions
- Top level of memory hierarchy

















## **Processor Cycle**

- All modern processors are synchronous machines.
- Their timing is controlled by an external "clock" signal.
  - This is just a square electric pulse that is supplied to the processor (and memory etc.) by an external source.
  - A processor running at 1 GHz receives 10<sup>9</sup> clock pulses per second.
  - One clock period lasts 0.000000001 second (1 ns).
- The processor operations are therefore broken up into cycles.
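The relation between clock frequency and cycle time can be checked with a short calculation. This is a minimal sketch; the function names `clock_period_seconds` and `execution_time_seconds` are made up for illustration and are not part of the lecture material.

```python
def clock_period_seconds(frequency_hz: float) -> float:
    """Duration of one clock cycle: the reciprocal of the clock frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


def execution_time_seconds(cycles: int, frequency_hz: float) -> float:
    """Time taken by a given number of clock cycles at a given frequency."""
    return cycles * clock_period_seconds(frequency_hz)


# A 1 GHz processor: one cycle lasts one nanosecond.
period = clock_period_seconds(1e9)
print(f"{period * 1e9:.1f} ns per cycle")

# Six cycles (one instruction passing through a six-stage pipeline) at 1 GHz.
print(f"{execution_time_seconds(6, 1e9) * 1e9:.1f} ns for 6 cycles")
```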












































## **Data Hazards**

- A data hazard occurs when one instruction needs the result of another instruction, but the result is not available yet.

| Instruction | Operation                |
|-------------|--------------------------|
| MULT R2, R3 | $R1:R0 \leftarrow R2*R3$ |
| ADD R4, R0  | $R4 \leftarrow R4+R0$    |

| Instruction cycle $\rightarrow$ | 1 2 3 4 5 6 7 8 9 10 11 |
|---------------------------------|-------------------------|
| MULT R2, R3                     | FI DI CO FO EI WO       |
| ADD R4, R0                      | FI DI CO                |
| Instruction i+2                 | FI DI CO FO EI WO       |

ADD R4, R0 is stalled by two clock cycles.
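The stall can be reproduced with a small scheduling sketch. It rests on assumptions that the slide does not state: a strictly in-order six-stage pipeline (FI, DI, CO, FO, EI, WO), no forwarding, and an operand that can be fetched (FO) only in the cycle after its producer's WO stage. The helpers `schedule` and `stall_cycles` are hypothetical names used only for this illustration.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def schedule(program):
    """Return, per instruction, the clock cycle in which it occupies each stage.

    program is a list of (text, writes, reads) tuples, where writes and reads
    are sets of register names.
    """
    timeline = []
    write_back = {}  # register -> cycle in which its latest producer is in WO
    for text, writes, reads in program:
        cycles = {}
        for idx, stage in enumerate(STAGES):
            # A stage can start one cycle after this instruction's previous stage.
            earliest = cycles[STAGES[idx - 1]] + 1 if idx > 0 else 1
            if timeline:
                prev = timeline[-1]
                if idx + 1 < len(STAGES):
                    # In-order pipeline: a stage is free once the previous
                    # instruction has moved on to the following stage.
                    earliest = max(earliest, prev[STAGES[idx + 1]])
                else:
                    earliest = max(earliest, prev[stage] + 1)
            if stage == "FO":
                # Assumption: no forwarding, so an operand is readable only
                # in the cycle after its producer's WO stage.
                for reg in reads:
                    if reg in write_back:
                        earliest = max(earliest, write_back[reg] + 1)
            cycles[stage] = earliest
        for reg in writes:
            write_back[reg] = cycles["WO"]
        timeline.append(cycles)
    return timeline


def stall_cycles(cycles):
    """Cycles spent waiting: actual latency minus one cycle per stage."""
    return cycles["WO"] - cycles["FI"] + 1 - len(STAGES)


program = [
    ("MULT R2, R3", {"R1", "R0"}, {"R2", "R3"}),  # R1:R0 <- R2*R3
    ("ADD R4, R0", {"R4"}, {"R4", "R0"}),          # R4 <- R4+R0
]
for (text, _, _), cycles in zip(program, schedule(program)):
    print(f"{text:12} {cycles}  stalled {stall_cycles(cycles)} cycle(s)")
```

Under these assumptions MULT writes R0 in cycle 6, so ADD cannot fetch its operands before cycle 7 and waits two cycles after CO, which agrees with the stall stated above.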



## **Control Hazards (Cont.)**

Case 1: Branch is taken.

At this moment, both the condition (set by SUB) and the target address are known. Since the branch is taken, ADD R2, R1 will be executed next. There is a penalty of 3 clock cycles in this case.

| Instruction cycle → | 1 2 3 4 5 6 7 8 9 10 11 |
|---------------------|-------------------------|
| SUB R10, R9         | FI DI CO FO EI WO       |
| BRGE CS2121         | FI DI CO FO EI WO       |
| ADD R2, R1          | Stall FI DI CO FO EI WO |
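The cost of the penalty can be estimated with the usual pipeline cycle count. This is a sketch under assumptions: an otherwise ideal six-stage pipeline that starts one instruction per cycle, with every stall simply added to the total. The function `total_cycles` and its parameters are hypothetical names for illustration.

```python
def total_cycles(num_instructions, num_stages=6, stalls=()):
    """Cycles needed to run a sequence of instructions through a pipeline.

    The first instruction takes num_stages cycles, each further instruction
    completes one cycle later, and each entry in stalls (for example a
    branch penalty) delays everything after it by that many cycles.
    """
    if num_instructions <= 0:
        return 0
    return num_stages + (num_instructions - 1) + sum(stalls)


# SUB, BRGE, ADD with no hazard at all.
print(total_cycles(3))

# The same three instructions with the 3-cycle penalty of the taken branch.
print(total_cycles(3, stalls=[3]))
```

With one 3-cycle penalty the three instructions need 6 + 2 + 3 = 11 cycles instead of 8.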







## **Loop Buffer**

- Very fast memory
- Maintained by fetch stage of pipeline
- Check buffer before fetching from memory
- Very good for small loops or jumps
- cf. cache
- Used by CRAY-1
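The "check buffer before fetching from memory" step can be sketched as follows. This is an illustrative model only: the class `LoopBuffer`, its first-in first-out replacement, and the dictionary standing in for instruction memory are all assumptions, not a description of the CRAY-1 hardware.

```python
from collections import OrderedDict


class LoopBuffer:
    """Small, fast store of recently fetched instructions (FIFO replacement)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> instruction, oldest first
        self.hits = 0
        self.memory_fetches = 0

    def fetch(self, address, memory):
        # Check the buffer first; go to memory only on a miss.
        if address in self.entries:
            self.hits += 1
            return self.entries[address]
        self.memory_fetches += 1
        instruction = memory[address]
        self.entries[address] = instruction
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry
        return instruction


# Hypothetical instruction memory and a 4-instruction loop run 10 times.
memory = {address: f"instr{address}" for address in range(8)}
buffer = LoopBuffer(capacity=8)
for _ in range(10):
    for address in range(2, 6):
        buffer.fetch(address, memory)

print(buffer.memory_fetches, "memory fetches,", buffer.hits, "buffer hits")
```

Because the whole loop fits in the buffer, only the first iteration goes to memory; the remaining nine iterations are served from the buffer, which is why the technique works well for small loops.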








## **Reading Material**

- Chapters 5 & 6, Computer Organization and Design: The Hardware/Software Interface, by David Patterson and John Hennessy.