## Computer Design (0907432) Solution of Homework 3

Problems 1-7 are 6 marks each; Problems 8-11 are 4 marks each; Problems 12-14 are 6 marks each.

**Problem 1**: (a) What are the three main problems of the first generation of VLIW processors?

- 1. Increase in code size
- 2. Operated in lock-step; no hazard detection HW
- 3. Binary code compatibility

(b) How are these problems solved in EPIC?

- 1. Register-independent instructions are grouped into instruction groups and packed into bundles, so the bundles have no empty fields.
- 2. The processor checks dependencies and can detect hazards.
- 3. Bundles are independent of the hardware implementation.

**Problem 2**: In the ILP limits study described in Chapter 3, what are the six assumptions on an ideal processor?

- 1. Infinite instruction issue per cycle
- 2. Infinite instruction window size
- 3. Infinite register renaming
- 4. Perfect branch prediction
- 5. Perfect caches
- 6. Perfect memory alias analysis

**Problem 3**: Why do the SPECFP applications have more ILP than the SPECINT applications?

Because they often have long loops that act on regular arrays.

**Problem 4**: Is it realistic to assume that in the coming few years engineers will be able to build processors that issue up to 64 instructions per cycle? Why?

No. Each of the 64 instructions would have to be checked for dependences against the other instructions issued in the same cycle and against earlier instructions still in flight. This is a huge number of checks; the logic is very complex and would lengthen the clock cycle.
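To get a feel for the scale, the sketch below counts register-dependence comparisons inside a single issue group. It rests on a simplifying assumption that is not part of the original answer: every instruction is register-register with two sources and one destination, so each instruction compares both of its sources against the destinations of all instructions ahead of it in the group. The function name is illustrative only, and the count ignores checks against instructions already in flight, so it is a lower bound.

```python
def intra_group_comparisons(issue_width: int) -> int:
    """Count source-vs-destination comparisons inside one issue group.

    Assumption: each instruction has 2 source registers and 1 destination,
    and the k-th instruction (0-based) checks both of its sources against
    the destinations of the k instructions ahead of it in the group.
    """
    return sum(2 * k for k in range(issue_width))


if __name__ == "__main__":
    for width in (4, 8, 64):
        print(width, intra_group_comparisons(width))
```

Under this model the count grows as n(n - 1), so widening issue from 4 to 64 multiplies the comparison count by several hundred, not by 16.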

**Problem 5**: Simultaneous Multithreading (SMT)

(a) Briefly describe the main features of an SMT processor.

The SMT processor fetches and issues instructions from multiple threads every cycle.

(b) Why does the SMT processor have the best processor resource utilization when compared with superscalar, fine-grained multithreading, and coarse-grained multithreading?

The SMT processor can execute ready instructions from multiple threads in every cycle. In contrast, the superscalar processor is limited to ready instructions from a single thread, and the other multithreading processors alternate among threads, executing ready instructions from only one thread in any given cycle. So SMT generally has more ready instructions to execute.

(c) Give one advantage and one disadvantage of giving equal priorities to all threads running on an SMT processor?

Advantage: Highest throughput.

Disadvantage: Each thread runs slower.



**Problem 6**: Draw the organization of the Power 5 processor's pipeline?

**Problem 7**: Draw the organization of the Sun UltraSPARC T2 multicore processor?



**Problem 8**: State the four classes in Flynn's taxonomy?

- 1. Single Instruction Single Data (SISD)
- 2. Single Instruction Multiple Data (SIMD)
- 3. Multiple Instruction Single Data (MISD)
- 4. Multiple Instruction Multiple Data (MIMD)

**Problem 9**: Draw typical organizations of centralized memory and distributed memory multiprocessors?



**Centralized Memory** 

**Distributed Memory**

**Problem 10**: What are the two communication models in parallel architectures?

- 1. Passing messages among the processors
- 2. Communication through a shared address space

**Problem 11**: What are the two main challenges in parallel programming?

- 1. Sequential fraction of the program
- 2. Long latency of remote memory
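The first challenge is usually quantified with Amdahl's law. The sketch below is a minimal illustration, not part of the original solution; the function name and the sample fractions are chosen here for demonstration.

```python
def amdahl_speedup(sequential_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup = 1 / (s + (1 - s) / p).

    s is the fraction of execution time that cannot be parallelized,
    p is the number of processors applied to the parallel part.
    """
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    for s in (0.0, 0.01, 0.05):
        speedup = amdahl_speedup(s, 100)
        print(f"s = {s:.2f}: speedup on 100 processors = {speedup:.1f}")
```

Even a 1% sequential fraction roughly halves the speedup achievable on 100 processors, which is why the sequential fraction is listed as a main challenge.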

**Problem 12**: Give a definition for a coherent memory system?

- 1. <u>Preserve program order</u>: A read by processor P to location X that follows a write by P to X, with no writes of X by another processor occurring between the write and the read by P, always returns the value written by P.
- 2. <u>Coherent view of memory</u>: A read by a processor to location X that follows a write by another processor to X returns the written value if the read and write are sufficiently separated in time and no other writes to X occur between the two accesses.
- 3. <u>Write serialization</u>: Two writes to the same location by any two processors are seen in the same order by all processors.

**Problem 13**: Give a definition for write consistency?

- 1. A write does not complete (and allow the next write to occur) until all processors have seen the effect of that write.
- 2. The processor does not change the order of any write with respect to any other memory access; if a processor writes location A followed by location B, any processor that sees the new value of B must also see the new value of A.

**Problem 14**: Draw the state diagram of the write-back snoopy cache coherent protocol described in the class?
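The diagram itself is not reproduced in this text. As a rough stand-in, the sketch below encodes a three-state write-back invalidate protocol (Invalid, Shared, Modified) as a transition table for a single cache block. This is the common textbook MSI protocol and is an assumption here: the protocol drawn in class may use different state names, extra states, or different bus actions. Event names and the `step` helper are hypothetical, and replacement misses to a different block are left out.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


# (current state, event) -> (next state, bus action or None)
# "cpu_*" events come from the local processor; "bus_*" events are
# snooped requests from other caches for the same block.
TRANSITIONS = {
    (State.INVALID, "cpu_read"): (State.SHARED, "place read miss on bus"),
    (State.INVALID, "cpu_write"): (State.MODIFIED, "place write miss on bus"),
    (State.SHARED, "cpu_read"): (State.SHARED, None),
    (State.SHARED, "cpu_write"): (State.MODIFIED, "place invalidate on bus"),
    (State.SHARED, "bus_read_miss"): (State.SHARED, None),
    (State.SHARED, "bus_write_miss"): (State.INVALID, None),
    (State.MODIFIED, "cpu_read"): (State.MODIFIED, None),
    (State.MODIFIED, "cpu_write"): (State.MODIFIED, None),
    (State.MODIFIED, "bus_read_miss"): (State.SHARED, "write back block"),
    (State.MODIFIED, "bus_write_miss"): (State.INVALID, "write back block"),
}


def step(state, event):
    """Return (next_state, bus_action); unlisted pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), (state, None))


if __name__ == "__main__":
    state = State.INVALID
    for event in ("cpu_read", "cpu_write", "bus_read_miss", "bus_write_miss"):
        state, action = step(state, event)
        print(f"{event:15s} -> {state.name:8s} {action or ''}")
```

Each dictionary entry corresponds to one arrow in the state diagram, so the table can be checked edge by edge against the diagram from class.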

