Informatikk notater

Hazards, Forwarding, Branch Prediction

Hazards

Pipeline component        Instruction   Stage
Instruction Memory (IM)   4             FETCH (IF)
Pipeline Register (PR1)   4             FETCH (IF)
Register (READ part)      3             DECODE (ID)
PR2                       3             DECODE (ID)
ALU                       2             EXECUTE (EX)
PR3                       2             EXECUTE (EX)
DM                        1             MEMORY (MEM)
PR4                       1             MEMORY (MEM)
Register (WRITE part)     0             WRITE BACK (WB)

Example 1

Instruction   Assembly            Data Hazard
0             add t0, t1, t2
1             sub t4, t0, t3      RAW
2             and t5, t0, t6      RAW
3             or  t7, t0, t8      RAW
4             xor t9, t0, t10

Data Hazards

  • RAW: (0,1), (0,2), (0,3)
  • WAR: none
  • WAW: none

Data hazard types

  • RAW (Read After Write)
  • WAR (Write After Read)
  • WAW (Write After Write)
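
The classification for Example 1 can be checked mechanically. The following is a minimal Python sketch, not a model of any particular simulator: the function name `find_hazards` and the `window` parameter are hypothetical, and the window of 3 instructions is an assumption inferred from the notes marking (0,3) but not (0,4) as a hazard.

```python
# Each instruction is (destination, source registers), taken from Example 1.
program = [
    ("t0", ("t1", "t2")),   # 0: add t0, t1, t2
    ("t4", ("t0", "t3")),   # 1: sub t4, t0, t3
    ("t5", ("t0", "t6")),   # 2: and t5, t0, t6
    ("t7", ("t0", "t8")),   # 3: or  t7, t0, t8
    ("t9", ("t0", "t10")),  # 4: xor t9, t0, t10
]

def find_hazards(program, window=3):
    """Return {kind: [(earlier, later), ...]} for pairs at most `window` apart.

    `window` is an assumed hazard distance, not something the notes state.
    """
    found = {"RAW": [], "WAR": [], "WAW": []}
    for i, (dst_i, src_i) in enumerate(program):
        last = min(i + window, len(program) - 1)
        for j in range(i + 1, last + 1):
            dst_j, src_j = program[j]
            if dst_i in src_j:   # later instruction reads what the earlier one writes
                found["RAW"].append((i, j))
            if dst_j in src_i:   # later instruction writes what the earlier one reads
                found["WAR"].append((i, j))
            if dst_j == dst_i:   # both write the same register
                found["WAW"].append((i, j))
    return found

hazards = find_hazards(program)
print(hazards)
# {'RAW': [(0, 1), (0, 2), (0, 3)], 'WAR': [], 'WAW': []}
```

In an in-order pipeline like this one, WAR and WAW pairs would not cause wrong results anyway; they are listed only to show how the three types are told apart.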

Example 2

0: li t1, 1
1: li t2, 2
2: addi t1, t1, 1
3: beq t1, t2, crypt
4: and t3, t1, t2
5: j exit
crypt:
    6: xori t3, t1, 0xff # xor 0x2 0xff = 0xfd
exit:
    7: addi t3, t1, 0x00

Here there is a RAW hazard between instructions 0 and 2, because instruction 2 reads t1, which instruction 0 writes. When instruction 2 reads its registers in DECODE, instruction 0 is still in the MEMORY stage and has not yet written t1 back to the register file.

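The solution to this hazard is to insert a NOP between instructions 1 and 2, so that instruction 0 is in WRITE BACK while the addi is in DECODE. This assumes the register file can write a register and read it back within the same cycle. (Instructions that do not depend on each other still execute in parallel in the pipeline.)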

0: li t1, 1
1: li t2, 2
2: nop # addi x0, x0, 0
3: addi t1, t1, 1
...
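
The timing can be checked with a small Python sketch. It assumes an ideal five-stage pipeline that fetches one instruction per cycle with no stalls, and that each `li` is a single instruction; the helper names `stage_at` and `producer_stage_when_consumer_decodes` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_at(issue_index, cycle):
    """Stage occupied in `cycle` by the instruction fetched in cycle `issue_index`.

    Cycles and issue indices both start at 0. Returns None when the
    instruction is not in the pipeline during that cycle.
    """
    position = cycle - issue_index
    return STAGES[position] if 0 <= position < len(STAGES) else None

def producer_stage_when_consumer_decodes(producer, consumer):
    """Where the producing instruction is when the consumer reads its registers."""
    decode_cycle = consumer + STAGES.index("ID")
    return stage_at(producer, decode_cycle)

# Original program: li is instruction 0, addi is instruction 2.
print(producer_stage_when_consumer_decodes(0, 2))  # MEM
# With one nop inserted, addi becomes instruction 3.
print(producer_stage_when_consumer_decodes(0, 3))  # WB
```

Without the nop, the `li` has not reached WB when the `addi` decodes. With the nop, WB and DECODE fall in the same cycle, which is enough only if the register file writes before it reads within that cycle.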

Another problem is the branch at instruction 3. By the time the beq is resolved in EXECUTE, instructions 4 and 5 have already been fetched on the assumption that the branch is not taken, and they are in DECODE and FETCH. If the branch is taken and these instructions were allowed to complete, the program would produce the wrong result.

The solution here is pipeline flushing. The Ripes simulator does this automatically, seemingly by replacing the flushed instructions with NOPs. It appears to do this by clearing the pipeline registers that hold instructions 4 and 5.

Branch Prediction

How early can we know whether a branch has been taken?

In the DECODE stage you could add a small comparator that performs the BEQ check, moving the branch decision from the EXECUTE stage to the DECODE stage, one pipeline stage earlier. A taken branch then costs one flushed instruction instead of two.
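
The cost of a taken branch can be sketched in Python as follows. This assumes the pipeline keeps fetching the next sequential instruction every cycle until the branch resolves; the function name `flushed_instructions` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def flushed_instructions(branch_index, resolve_stage):
    """Indices of wrong-path instructions fetched before a taken branch resolves.

    One younger instruction enters the pipeline for every stage the
    branch has passed through by the time its outcome is known.
    """
    depth = STAGES.index(resolve_stage)
    return [branch_index + k for k in range(1, depth + 1)]

# beq is instruction 3 in Example 2.
print(flushed_instructions(3, "EX"))  # [4, 5]
print(flushed_instructions(3, "ID"))  # [4]
```

Resolving in EXECUTE flushes instructions 4 and 5, while resolving in DECODE flushes only instruction 4.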

A loop in a program has high locality: it frequently reuses only a small part of the program's instructions and only a small part of memory.

Hardware can optimize for a loop with a 1-bit branch predictor, and even for a loop within a loop with a 2-bit branch predictor.
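
The difference shows up in a small Python simulation. The notes do not say how the predictors are built, so this sketch assumes the common textbook design: an n-bit saturating counter per branch, starting at 0 (not taken). The function name `simulate` and the loop sizes (9 taken iterations, 100 outer iterations) are made up for illustration.

```python
def simulate(outcomes, bits):
    """Count mispredictions of an n-bit saturating counter on one branch.

    `outcomes` is a sequence of booleans (True = taken). The counter
    predicts taken when it is in the upper half of its range.
    """
    top = (1 << bits) - 1
    threshold = 1 << (bits - 1)
    counter = 0
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= threshold
        if predicted_taken != taken:
            misses += 1
        # Move toward the observed outcome, saturating at both ends.
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return misses

# Back edge of an inner loop: taken 9 times, then not taken on exit.
# The outer loop runs the inner loop 100 times.
inner_branch = ([True] * 9 + [False]) * 100

print(simulate(inner_branch, bits=1))  # 200
print(simulate(inner_branch, bits=2))  # 102
```

The 1-bit predictor misses twice per inner loop run: once on exit and once on re-entry, because the exit flipped its single bit. The 2-bit counter needs two wrong outcomes in a row to change its prediction, so after warming up it misses only on the exit.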

Forwarding

Too little time for this

Exam

  • Starts with a potpourri section covering the whole syllabus
  • 3 to 4 larger problems focusing on ISA, microarchitecture, and digital design
  • OS paging