## 0907731 Advanced Computer Architecture (Spring 2013) Midterm Exam

Registration No.: ..... Serial No.: .....

Name: .....


Instructions: Time: 60 min. Open-book, open-notes exam. No electronics. Answer all problems in the space provided and limit your answers to that space. No questions are allowed. Each question is worth 6 points.

**Q1.** The figure below shows the block diagram of the Sun UltraSPARC T1 processor. Briefly describe this processor.



- This is an 8-core, fine-grained multithreaded processor. Each core runs four hardware threads, for 32 threads in total.
- Each core has its own L1 caches, and the 8 cores share a four-bank L2 cache through a crossbar.
- For high memory throughput, each of the four L2 cache banks has its own port to the DDR2 SDRAM. Each bank is responsible for caching one fourth of the address space.
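The answer says each L2 bank caches one fourth of the address space. One common way to split addresses like this is to interleave cache lines across the banks, as in the minimal Python sketch below. The 64-byte line size and the simple modulo mapping are assumptions for illustration. The exact address bits the T1 uses for bank selection are not stated in this answer.

```python
LINE_BYTES = 64  # assumed L2 line size (illustrative)
NUM_BANKS = 4    # four L2 banks, each with its own DDR2 memory port


def l2_bank(address: int) -> int:
    """Return the L2 bank caching `address` under an assumed line-interleaved mapping."""
    line = address // LINE_BYTES  # which cache line the byte belongs to
    return line % NUM_BANKS       # consecutive lines rotate through the banks


# Consecutive lines spread across all four banks, so each bank (and its memory
# port) serves one fourth of the lines in any aligned group of four lines.
for addr in range(0, 8 * LINE_BYTES, LINE_BYTES):
    print(hex(addr), "-> bank", l2_bank(addr))
```

Interleaving on line addresses keeps sequential streams busy on all four memory ports at once. This is the throughput argument made in the answer.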


**Q2.** … It spends 150 seconds in the first phase and 50 seconds in the second phase. You can speed up the execution of only one phase by a factor of five. What would be your …

Choose the first phase, which accounts for the larger fraction of the execution time:

$$f = \frac{150}{150 + 50} = 0.75$$

$$\text{Speedup} = \frac{1}{(1 - 0.75) + 0.75/5} = \frac{1}{0.25 + 0.15} = \frac{1}{0.4} = 2.5$$


**Q3.** Consider a variant of the MIPS instruction set architecture that has two operands instead of three. Convert the following C code to this variant of MIPS instructions. Assume that the compiler allocates x, y, and z to registers R1, R2, and R3, respectively.

```c
if (x == 0)
    x = y + z;
else
    x = x - z;
```

```asm
        bnez R1, Else    # x != 0: take the else branch
        add  R1, R2      # x == 0 here, so R1 = 0 + y = y
        add  R1, R3      # R1 = y + z
        j    Out
Else:
        sub  R1, R3      # R1 = x - z
Out:
```
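For Q2, the Python sketch below applies Amdahl's law to both possible choices and cross-checks each result against the new execution time. The function name `amdahl_speedup` is illustrative. The phase times (150 s and 50 s) and the factor of five come from the question.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of the time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


phases = {"first": 150.0, "second": 50.0}  # seconds spent in each phase
total = sum(phases.values())

for name, seconds in phases.items():
    f = seconds / total
    # Cross-check with execution times: only the chosen phase shrinks by 5x.
    new_time = (total - seconds) + seconds / 5
    print(f"{name} phase: f = {f:.2f}, speedup = {amdahl_speedup(f, 5):.2f}, "
          f"new time = {new_time:.0f} s")
```

It prints `first phase: f = 0.75, speedup = 2.50, new time = 80 s` and `second phase: f = 0.25, speedup = 1.25, new time = 160 s`. This shows why the first phase is the better choice: 200 s / 80 s = 2.5.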
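For Q3, the answer can be checked by running it. The Python sketch below is a tiny interpreter for a hypothetical tuple encoding of the two-operand code. It assumes `op rd, rs` means `rd = rd op rs`, so the destination also serves as the first source. It then compares the result with the C code on both branches. The names `PROGRAM`, `run`, and `reference` are illustrative.

```python
# Hypothetical encoding of the answer: (opcode, operands...); labels are pseudo-ops.
PROGRAM = [
    ("bnez", "R1", "Else"),
    ("add", "R1", "R2"),
    ("add", "R1", "R3"),
    ("j", "Out"),
    ("label", "Else"),
    ("sub", "R1", "R3"),
    ("label", "Out"),
]


def run(program, regs):
    """Interpret two-operand code where `op rd, rs` means rd = rd op rs."""
    regs = dict(regs)  # work on a copy of the register file
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "add":
            regs[args[0]] += regs[args[1]]
        elif op == "sub":
            regs[args[0]] -= regs[args[1]]
        elif op == "bnez" and regs[args[0]] != 0:
            pc = labels[args[1]]
        elif op == "j":
            pc = labels[args[0]]
        # "label" entries and a not-taken bnez simply fall through
    return regs


def reference(x, y, z):
    """The C code from the question."""
    return y + z if x == 0 else x - z


for x, y, z in [(0, 5, 7), (10, 5, 7), (-3, 2, 4)]:
    result = run(PROGRAM, {"R1": x, "R2": y, "R3": z})["R1"]
    assert result == reference(x, y, z), (x, y, z)
```

The `x == 0` path needs only two `add` instructions. Because R1 is known to be zero there, adding y and then z leaves exactly y + z in R1 without a separate move.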

**Q4.** Unroll the following loop twice and schedule it, then find how many cycles are needed to execute one iteration of the unrolled loop. Assume that you have the five-stage pipeline with multi-cycle operations, FP operations take 3 cycles, branch instructions are resolved in the Decode stage, and the processor can write one integer result and one FP result in the Write Back stage every cycle.

[Figure: pipeline diagrams of the original loop (Loop: l.d F4, 0(R2), …, daddi R2, R2, #-8, daddi R1, R1, #-8, bneq R1, R3, Loop, with one stall cycle before the branch) and of the unrolled, scheduled loop (two copies of the body, #-16 pointer updates).]

One iteration of the unrolled loop needs 11 + 4 = 15 cycles.
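Only some of the loop's instructions are recoverable here. The Python sketch below therefore unrolls a hypothetical, simplified body built from the instructions visible in the diagrams: `l.d F4, 0(R2)`, `mul.d F0, F0, F4`, the two `daddi` updates, and `bneq R1, R3, Loop`. It shows the transformation the question asks for. The unrolled version has two copies of the body (the second using offset -8), one pair of `#-16` pointer updates, and one branch per unrolled iteration. The function names and test data are illustrative, and the sketch assumes an even trip count.

```python
def original_loop(mem, r1, r2, r3, f0):
    """Hypothetical simplified body: F0 = F0 * M[R2], then step both pointers by 8."""
    trips = 0
    while True:                 # do-while: the branch is at the bottom
        f0 *= mem[r2]           # l.d F4, 0(R2); mul.d F0, F0, F4
        r2 -= 8                 # daddi R2, R2, #-8
        r1 -= 8                 # daddi R1, R1, #-8
        trips += 1
        if r1 == r3:            # bneq R1, R3, Loop falls through when equal
            return f0, trips


def unrolled_loop(mem, r1, r2, r3, f0):
    """The same work unrolled twice (assumes an even trip count)."""
    trips = 0
    while True:
        f0 *= mem[r2]           # first copy:  offset 0(R2)
        f0 *= mem[r2 - 8]       # second copy: offset -8(R2)
        r2 -= 16                # one daddi R2, R2, #-16 replaces two #-8 updates
        r1 -= 16                # one daddi R1, R1, #-16
        trips += 1
        if r1 == r3:            # one branch per two original iterations
            return f0, trips


# Illustrative data: eight doubles at byte addresses 0, 8, ..., 56 holding 1.0 .. 8.0.
mem = {8 * i: float(i + 1) for i in range(8)}
print(original_loop(mem, r1=64, r2=56, r3=0, f0=1.0))  # (40320.0, 8)
print(unrolled_loop(mem, r1=64, r2=56, r3=0, f0=1.0))  # (40320.0, 4)
```

Both versions compute the same result. The unrolled one executes half as many branches and pointer updates, and this is the overhead that scheduling the unrolled body hides.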

**Q5.** Assume that the following five instructions are executed by a processor that uses Tomasulo's algorithm for all instruction types. The processor has <u>two</u> reservation stations for each functional unit. The integer latency is 1 cycle, the memory latency is 2 cycles, and the floating-point latency is 2 cycles. The processor has one address-calculation unit, one memory-access unit, one integer ALU, one floating-point unit, and one branch unit. Using a pipeline diagram in the space below, find the number of cycles needed to fetch and commit these instructions.

| Instruction      | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9        | 10       | 11 | 12 |
|------------------|---|---|---|---|---|---|---|---|----------|----------|----|----|
| lw R1, 0(R2)     | F | I | A | M | W |   |   |   |          |          |    |    |
| lw F0, 0(R1)     |   | F | I | - | - | A | M | W |          |          |    |    |
| add.d F4, F2, F0 |   |   | F | I | - | - | - | - | A<sub>1</sub> | A<sub>2</sub> | W  |    |
| sw F4, 0(R1)     |   |   |   | F | - | I | A | - | -        | -        | -  | M  |
| lw R1, 8(R2)     |   |   |   |   | F | - | - | - | I        | A        | M  | W  |

The five instructions need 12 cycles.
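The Python sketch below replays the Q5 diagram as a small timing model, to show how the 12 cycles arise. Its rules are read off the answer diagram rather than stated in the question, so treat them as assumptions:

- One instruction is fetched per cycle, and fetch never stalls.
- Issue is in order, and an instruction waits for a free reservation station. A station is released after its occupant's W, or after M for a store.
- A broadcast value is usable the cycle after W.
- A and M take one cycle each, and FP execution takes 2 cycles.
- There is one address unit, one memory unit, one FP unit, and one common data bus.

`Instr`, `schedule`, and the unit names are illustrative.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    text: str
    kind: str              # "load", "store" or "fp"
    dest: Optional[str]    # register broadcast on the CDB (None for a store)
    srcs: Tuple[str, ...]  # load: (base,); store: (base, data); fp: (op1, op2)


PROGRAM = [
    Instr("lw R1, 0(R2)", "load", "R1", ("R2",)),
    Instr("lw F0, 0(R1)", "load", "F0", ("R1",)),
    Instr("add.d F4, F2, F0", "fp", "F4", ("F2", "F0")),
    Instr("sw F4, 0(R1)", "store", None, ("R1", "F4")),
    Instr("lw R1, 8(R2)", "load", "R1", ("R2",)),
]


def schedule(program: List[Instr], rs_per_unit: int = 2,
             fp_latency: int = 2) -> List[Dict[str, int]]:
    """Return the cycle of each stage (F, I, A/EX, M, W) for every instruction."""
    ready: Dict[str, int] = {}  # register -> first cycle its newest value is usable
    busy: Dict[str, Set[int]] = {"addr": set(), "mem": set(), "fp": set(), "cdb": set()}
    stations: Dict[str, List[Tuple[int, int]]] = {"ld/st": [], "fp": []}  # (issue, release)

    def claim(unit: str, earliest: int) -> int:
        """Reserve the first cycle >= earliest in which `unit` is idle."""
        cycle = earliest
        while cycle in busy[unit]:
            cycle += 1
        busy[unit].add(cycle)
        return cycle

    rows: List[Dict[str, int]] = []
    last_issue = 0
    for index, ins in enumerate(program):
        fetch = index + 1                          # one fetch per cycle, no fetch stalls
        group = "fp" if ins.kind == "fp" else "ld/st"
        issue = max(fetch + 1, last_issue + 1)     # issue stays in program order
        while sum(s <= issue <= e for s, e in stations[group]) >= rs_per_unit:
            issue += 1                             # both reservation stations are taken
        last_issue = issue
        row = {"F": fetch, "I": issue}
        # Sources are resolved in program order, so each one sees its most
        # recent earlier producer (the renaming Tomasulo's algorithm provides).
        operands = [ready.get(r, 0) for r in ins.srcs]

        if ins.kind == "fp":
            start = max([issue + 1] + operands)
            while any(c in busy["fp"] for c in range(start, start + fp_latency)):
                start += 1
            busy["fp"].update(range(start, start + fp_latency))
            row["EX"] = start
            row["W"] = claim("cdb", start + fp_latency)
            release = row["W"]
        else:
            row["A"] = claim("addr", max(issue + 1, operands[0]))
            if ins.kind == "load":
                row["M"] = claim("mem", row["A"] + 1)
                row["W"] = claim("cdb", row["M"] + 1)
                release = row["W"]
            else:  # a store also waits for its data before accessing memory
                row["M"] = claim("mem", max(row["A"] + 1, operands[1]))
                release = row["M"]

        stations[group].append((issue, release))
        if ins.dest is not None:
            ready[ins.dest] = row["W"] + 1         # usable the cycle after the broadcast
        rows.append(row)
    return rows


rows = schedule(PROGRAM)
for ins, row in zip(PROGRAM, rows):
    print(f"{ins.text:<18} {row}")
print("total cycles:", max(max(r.values()) for r in rows))
```

Under these rules the model reproduces every row of the diagram and prints `total cycles: 12`. The store issues in cycle 6 and the last load in cycle 9 only because both load/store reservation stations are occupied until an earlier memory instruction finishes.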

Good luck!