Instruction Set Architectures

Let's look at an example of an ISA. We'll begin by looking at a program written in assembly language, a language of mnemonic instructions. There's usually a 1 to 1 correspondence between assembly language instructions and machine instructions.

ori r1,r0,16 // OR immediate; r1 := r0 OR 16 ori r2,r0,1 // r2 := r0 OR 1 loop: dadd r3,r2,r2 // r3 := r2 + r2 movz r2,r3,r0 // r2 := r3 if r0 == 0 dsubi r1,r1,1 // r1 := r1 - 1 bnez r1,loop // if r1 != 0 then goto loop

This program will be assembled into six 32-bit instructions in the MIPS ISA. What does the program do?

Types of Instruction Sets

Stack
Accumulator
General-purpose register

Load/Store (or register-register). Only load and store instructions access main memory. The rest of the instructions act only on registers.
Register-memory. Any instruction may access main memory.

So-called Reduced Instruction Set Computing (RISC) architectures are load/store architectures. Examples: SPARC, MIPS, Alpha AXP, PowerPC. Complex Instruction Computing (CISC) architectures are usually register-memory architectures. Examples: VAX, x86, MC68000.

Memory Addressing

Addressing memory is one of the most important functions of the ISA. How memory is addressed is specified by the ISA.

Endianness. Memory can be accessed as either Big Endian or Little Endian. That means a multi-byte structure such as an integer or address may be stored with either it's most significant byte first or last.
Alignment. Some ISAs, for implementation reasons, require that memory accesses be aligned. For instance, on the Alpha, an access to a 8-byte quadword must be through an address divisible by 8; otherwise, the offending load or store instruction will generate an exception.
Addressing modes. Addressing mode refers to the way in which a machine instruction accesses memory. The machine instruction contains some data that is used to come up with the effective address that will be used in some transaction with the memory system. Some examples:
- PC-relative addressing. An immediate offset in the instruction is added to the program counter register to yield the effective address.
- Displacement addressing. An immediate offset in the instruction is added to a specified register to yield the effective address.
- Immediate addressing. An immediate value is specified.
- Indirect addressing (in combination with other modes). A first effective address is used to fetch a value from memory. That value is used to form a second effective address.
Operand Types
- Integers. Usually 8-bit (characters), 16-bit (words), 32-bit (doubleword), 64-bit (quadword). The terminology may differ from one ISA to another.
- Single and double precision floating point numbers, usually 32-bit and 64-bit respectively.
- Binary-coded decimal. A single decimal digit occupies one half of a byte. Sometimes called packed decimal because decimal digits are packed together into bytes.
- Strings. Some ISAs support variable-length strings of bytes as a primitive data type in memory.
- Vectors of primitive types. Some (weird) ISAs support fixed or variable length vectors of primitive types. Examples: CRAY vector processors, MMX extensions to x86.
Types of Instructions
- Data transfer instructions. Load data from memory into registers, or store data from registers into memory. Transfer data between different kinds of special-purpose registers.
- Arithmetic and logical instructions. Perform arithmetic (e.g. add, subtract, multiply) and logic (e.g. AND, OR, XOR) as well as a combination of both (less than, greater than, compare).
- Control transfer instructions. Instructions that affect the value of the program counter register. Unconditional jump, procedure call, return, conditional branch, indirect jump, software interrupt (e.g. trap). Also, instructions that do something based on some condition, e.g. predicated instructions.
- Floating point instructions. Traditionally, instructions that deal with floating point values are given separate treatment. Add, multiply, scientific calculator-type functions (e.g. tangent, square root), convert between integer, single, and double precision.
Instruction Encoding

How are instruction types, operands, addressing modes, etc. communicated to the hardware? The ISA specifies a binary encoding of instructions. The assembler encodes programs using this encoding, and the microarchitecture reads and executes the encoded program. The MIPS instruction set is a good example.

Example: The MIPS instruction set

Every instruction in the MIPS instruction set is 32-bit long. Let's number the bits from 0 to 31, with 31 being the most significant bit. There are 32 integer registers. Most of them are general purpose. Register R0 is always set to 0. The first six bits, bits 31-26, specify an opcode giving information about what that instruction is supposed to do. MIPS registers and addresses are 64-bit. MIPS is byte-addressable, requires aligned accesses, and can be switched to either Big Endian or Little Endian.
```
___________________________________________________________________________ |_6-bit opcode_|_5-bit_rs_|_5-bit_rt_|________16-bit_immediate______________|
```
```
___________________________________________________________________________ |_6-bit opcode_|_5-bit_rs_|_5-bit_rt_|_5-bit_rd_|_5-bit_shamt_|_6-bit_funct_|
```
```
___________________________________________________________________________ |_6-bit opcode_|_____________26-bit offset added to PC______________________|
```
Compilers

A compiler is a program that translates programs from a high-level language into machine instructions that can be directly executed by the CPU. The compiler has a large and, these days, increasing impact on the performance of computer systems.

Structure of an Optimizing Compiler
1. Front end. The front end reads in the source code written by the programmer. It performs lexical analysis and parsing to get the code into an intermediate form that can be easily worked with in the rest of the compiler. The result of the front end is an intermediate representation of the program code, e.g. an abstract syntax tree or three-address code.
2. High-level optimizations. This stage performs high-level code-improving transformations on the intermediate representation. The transformations at this stage are mostly machine-independent, i.e., they require little or no knowledge about the ISA. Some examples:
  - Constant propagation, constant folding
  - Redundancy elimination
  - Loop transformations
  - Procedure integration (automatic inlining)
  - Dead and unreachable code elimination
3. Low-level optimizations. At this stage, the code is transformed into a lower-level intermediate representation that has more information about the ISA. Examples of possible optimizations:
  - Strength reduction
  - Machine idioms
  - Register allocation
  - Cache-concious loop transformations
4. Code generation. At this stage, assembly-language code is generated from the intermediate representation. Some optimizations may be performed at this stage:
  - Code placement
  - Low-level feedback-directed optimization, e.g. branch hints
  - Instruction selection (e.g. mov 0 vs. xor)
  - Peephole optimizations
5. Assembly. The assembler transforms the assembly language program into machine instructions ready to be loaded. Certain alignment optimizations may be performed at this time.
Effect of Compiler on the Architecture

The computer architect must be aware of compiler technology. Compilers decide what the mix of instructions executed will be. These days, architecture people and compiler people must cooperate to achieve improved performance.

Example Problems
- Useless instructions. If an architect designs a very cool instruction that is hardly ever used by the compiler, something has gone wrong. The VAX polynomial-evaluate and CALL instructions are examples. An instruction set must be, in a sense, easy to compile for.
- Overly-ambitious or useless optimizations. The compiler writer might think that reducing the instruction count is the most important way to improve performance. However, the compiler writer must be aware that on modern microarchitectures, sometimes programs that execute more instructions can be faster. For example, multiplying by a constant can be done with a single multiply instruction or several shifts and adds. The latter may be faster, depending on microarchitectural details. On the Alpha, for example, inserting NOP (no operation) instructions in just the right places will cause a more even usage of functional units, speeding things up.
- Not exposing enough of the microarchtecture. Modern microarchitectures with caches, speculation, out-of-order execution, etc. are increasingly complex. It is very difficult for a compiler writer to come up with a reasonable performance model without microarchitectural details. If more of these details are exposed, either through documentation or explicit changes in the ISA (e.g. performance counters, hint instructions), the compiler writer will be able to better improve the generated code.
- Exposing too much of the microarchitecture. Encoding microarchitectural details into the ISA means that every subsequent implementation of the ISA must support these details or risk not being backward-compatible. A good example is branch delay slots.
Example Compiler Output
Consider the following C program:
```
#include int a; int main () < int i; a = 0; for (i=0; iprintf ("%d\n", a); exit (0); >
```
With GCC version 2.95.4 on my Pentium 4, it compiles into something like this:
```
.LC0: .string "%d\n" .comm a,4,4 main: movl $0,a # store a 0 to memory location a xorl %edx,%edx # edx := edx xor edx, i.e., edx := 0 .L21: movl %edx,%eax # eax := edx addl a,%eax # eax := eax + a movl %eax,a # a := eax incl %edx # edx := edx + 1 cmpl $99,%edx # compare edx to 99 jle .L21 # if less than or equal then goto .L21 pushl %eax # push eax onto the stack pushl $.LC0 # push the address of "%d\n" onto the stack call printf # call the printf function pushl $0 # push a 0 onto the stack call exit # call the exit function
```
- What is each instruction doing?
- What is the instruction count? That is, how many instructions are executed when this program runs?
- Suppose branch instructions take 2 cycles, memory instructions take 3 cycles, and all other instructions take 1 cycle. How many cycles are consumed by this program?
- What is the IPC (instructions per cycle) for this program?

Instruction Set Architectures

Types of Instruction Sets

Memory Addressing

Operand Types

Types of Instructions

Instruction Encoding

Example: The MIPS instruction set

Compilers

Structure of an Optimizing Compiler

Effect of Compiler on the Architecture

Example Problems

Example Compiler Output