Instructions: Time allowed is **50** minutes. Open-book, open-notes exam. No electronics. Please answer all problems in the space provided and limit each answer to that space. There are six problems, each worth 5 marks. No questions are allowed.

## <Good Luck>

P1. As semiconductor technology allows smaller transistors to be manufactured, what is happening to each of the following?

- Transistor switching delay: is getting smaller
- Number of transistors per chip: is increasing
- Processor design: more complex designs with more cores and larger caches
- Capacity per DRAM chip: is increasing
- Power consumption per transistor: is decreasing

P2. For the same chip manufacturing technology:

a) What happens to the yield when the die area is increased?

The yield decreases.

b) What happens to the chip cost when the die area is doubled?

The chip cost increases by about a factor of four.
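The trend in both answers can be checked numerically. The sketch below assumes the common textbook approximations (yield = 1 / (1 + defects per area × die area / 2)², dies per wafer ≈ wafer area / die area); the wafer cost, wafer area, and defect density are made-up illustrative values, not figures from the exam.

```python
def die_yield(defects_per_cm2, die_area_cm2):
    """Textbook yield approximation (assumed model, not taken from the exam)."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2

def cost_per_die(wafer_cost, wafer_area_cm2, defects_per_cm2, die_area_cm2):
    """Cost per good die, ignoring dies lost at the wafer edge."""
    dies_per_wafer = wafer_area_cm2 / die_area_cm2
    good_dies = dies_per_wafer * die_yield(defects_per_cm2, die_area_cm2)
    return wafer_cost / good_dies

# Hypothetical numbers chosen only for illustration.
WAFER_COST = 5000.0      # cost of one wafer
WAFER_AREA = 700.0       # cm^2
DEFECT_DENSITY = 1.0     # defects per cm^2

small = cost_per_die(WAFER_COST, WAFER_AREA, DEFECT_DENSITY, 1.0)
large = cost_per_die(WAFER_COST, WAFER_AREA, DEFECT_DENSITY, 2.0)
cost_ratio = large / small

print(die_yield(DEFECT_DENSITY, 1.0), die_yield(DEFECT_DENSITY, 2.0))
print(cost_ratio)
```

Under this model the cost ratio depends on the defect density: doubling the area halves the number of dies per wafer and also lowers the yield, so the cost grows by more than a factor of two.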

P3. A processor runs at a clock rate of 2 GHz. What is the CPU time needed to execute a program consisting of $4 \times 10^{9}$ instructions? Assume that $40\%$ of the program's instructions are memory instructions that take 2 cycles each and that the other instructions take 1 cycle each.

$$
\text{Time} = \frac{4 \times 10^{9} \times (0.4 \times 2 + 0.6 \times 1)}{2 \times 10^{9}} = 2 \times (0.8 + 0.6) = 2.8 \text{ seconds}
$$
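The same arithmetic can be written as a small helper. This is a minimal sketch; the function name and the `(share, cycles)` representation of the instruction mix are illustrative choices, not part of the exam.

```python
def cpu_time_seconds(instruction_count, clock_hz, mix):
    """CPU time = total cycles / clock rate.

    `mix` is a list of (percentage share, cycles per instruction) pairs,
    e.g. [(40, 2), (60, 1)]; the shares are assumed to sum to 100.
    """
    total_cycles = sum(instruction_count * share // 100 * cycles
                       for share, cycles in mix)
    return total_cycles / clock_hz

# 40% memory instructions at 2 cycles, 60% other instructions at 1 cycle.
print(cpu_time_seconds(4 * 10**9, 2 * 10**9, [(40, 2), (60, 1)]))  # 2.8
```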

P4. Schedule the following code sequence in the table below for a static multiple-issue processor similar to the one described in class, but four-issue wide instead of two. Assume that each issue packet can hold one ALU instruction, one branch instruction, and two memory instructions, as shown in the table. Note that you need to perform register renaming to obtain an efficient schedule.

Loop: lw \$t0, 0(\$s1)
addu \$t0, \$t0, \$s2
sw \$t0, 0(\$s1)
lw \$t0, -4(\$s1)
addu \$t0, \$t0, \$s2
sw \$t0, -4(\$s1)
lw \$t0, -8(\$s1)
addu \$t0, \$t0, \$s2
sw \$t0, -8(\$s1)
lw \$t0, -12(\$s1)
addu \$t0, \$t0, \$s2
sw \$t0, -12(\$s1)
addi \$s1, \$s1, -16
bne \$s1, \$zero, Loop

| Packet | ALU instruction | Branch instruction | Memory instruction | Memory instruction |
| :---: | :--- | :--- | :--- | :--- |
| 1 |  |  | lw t0, 0(s1) | lw t1, -4(s1) |
| 2 | addi s1, s1, -16 |  | lw t2, -8(s1) | lw t3, -12(s1) |
| 3 | addu t0, t0, s2 |  |  |  |
| 4 | addu t1, t1, s2 |  | sw t0, 16(s1) |  |
| 5 | addu t2, t2, s2 |  | sw t1, 12(s1) |  |
| 6 | addu t3, t3, s2 |  | sw t2, 8(s1) |  |
| 7 |  | bne s1, zero, Loop | sw t3, 4(s1) |  |
| 8 |  |  |  |  |
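The renaming and the shifted store offsets can be sanity-checked with a small functional model. The sketch below is not a pipeline simulator: it only assumes that memory is a dictionary keyed by byte address and compares the original loop body against the reordered one (all loads first, `addi` hoisted above the stores, store offsets increased by 16). The function names and the test data are hypothetical.

```python
OFFSETS = (0, -4, -8, -12)

def original_body(mem, s1, s2):
    """One iteration as written: four lw/addu/sw groups, then the pointer update."""
    for off in OFFSETS:
        t0 = mem[s1 + off]       # lw   $t0, off($s1)
        t0 = t0 + s2             # addu $t0, $t0, $s2
        mem[s1 + off] = t0       # sw   $t0, off($s1)
    return s1 - 16               # addi $s1, $s1, -16

def scheduled_body(mem, s1, s2):
    """Reordered iteration with renamed registers t0..t3."""
    t = [mem[s1 + off] for off in OFFSETS]       # four loads use the old $s1
    s1 = s1 - 16                                 # addi hoisted above the stores
    t = [value + s2 for value in t]              # four independent addu
    for value, off in zip(t, (16, 12, 8, 4)):    # stores compensate with +16
        mem[s1 + off] = value
    return s1

def run(body, first_address, s2):
    """Run the loop until $s1 reaches zero; first_address must be a multiple of 16."""
    mem = {addr: addr for addr in range(4, first_address + 4, 4)}
    s1 = first_address
    while s1 != 0:                               # bne $s1, $zero, Loop
        s1 = body(mem, s1, s2)
    return mem

print(run(original_body, 64, 5) == run(scheduled_body, 64, 5))  # True
```

Because each load gets its own destination register, the four load/add/store chains no longer conflict on `$t0`, which is what lets them be interleaved.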

P5. Assume that a branch instruction is executed on a processor that has a branch prediction unit that uses a 2-bit branch history table. Also assume that the BHT is initialized to zeros and the initial prediction is not taken. Knowing that this branch instruction changes direction (between taken and not taken) every time it is executed, find the branch prediction accuracy rate.

The predictor starts in state 00 (deep not taken). Each taken branch moves the predictor to state 01 (shallow not taken), and each not-taken branch moves it back to state 00. The predictor therefore always predicts not taken, which is correct in half the cases, so the accuracy is **50%**.
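This behaviour can be reproduced with a short simulation. The sketch assumes a 2-bit saturating counter (states 0 and 1 predict not taken, states 2 and 3 predict taken), which matches the 00/01 transitions described above; the function name is illustrative.

```python
def two_bit_accuracy(outcomes, state=0):
    """Fraction of correct predictions for a sequence of branch outcomes.

    Assumed scheme: 2-bit saturating counter, incremented on a taken branch
    and decremented on a not-taken branch.
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        correct += (predicted_taken == taken)
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

# Branch that changes direction on every execution: T, NT, T, NT, ...
alternating = [i % 2 == 0 for i in range(1000)]
print(two_bit_accuracy(alternating))  # 0.5
```

Starting the sequence with a not-taken outcome gives the same result, since the counter never leaves states 0 and 1.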

P6. Assume that the following code sequence is executed by a double-issue speculative pipelined processor. This processor uses reservation stations, common data buses, and a reorder buffer. The fetch stage takes one cycle and the issue stage takes one cycle. The integer latency is 1 cycle and the load latency is 2 cycles (1 cycle for address calculation and 1 cycle for data memory access). The processor has one address calculation unit, one memory access unit, one integer ALU, and one branch unit. Using the multi-cycle pipeline diagram below, specify the execution of these instructions in this processor pipeline.

| Instruction | Operands | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| lw | R3, 0(R1) | F | I | A | M | W | C |  |  |  |  |  |  |
| add | R4, R4, R3 | F | I |  |  |  | E | W | C |  |  |  |  |
| addi | R1, R1, \#4 |  | F | I | E | W |  |  | C |  |  |  |  |
| bne | R1, R2, -4 |  | F | I |  |  | E | W |  | C |  |  |  |

