Pages

Simple 8-bit Processor Design

ABSTRACT
The purpose of this project is to design, stimulate, and test an 8-bit multicycle microprocessor. Description of the processor will be written using Verilog HDL in register transfer level. Stimulation will be performed using ModelSim to demonstrate the executions of the processor’s 11 instructions.

SPECIFICATION
The processor should have an 8-bit address bus and an 8-bit bi-directional data bus for accessing memory and registers. Processor’s read and write operations should satisfy the timing specifications of external memory chips.
The processor should be able to do 11 instructions as described below(summarized in table 1):
• Load and Store instructions, including load-immediate, load and store, move data between memory and the register file.
• ALU instructions, including move, and, sub and add, perform ALU operations on register values and write the results back into the register file. Although the move is not performing a computation, it is using the transfer operation of the ALU. Note that the other ALU instructions, and, sub, and add, use register R1 as an implicit operand.
• Jump and Branch instructions, including branch, branch-if-negative, branch-if-zero, and jump-and-link, change the instruction execution sequence of a program.

InstructionAssembly FormInstruction Format  Operation performed
MoveMOV  RA, RB00RARBRB <= RA
LoadLD      RA, RB01RARBRB <= M[RA]
StoreST       RA, RB10RARBM[RA] <= RB
AndAND   RB11000RBRB <= RB AND R1*
AddADD   RB11001RBRB <= RB + R1*
SubSUB    RB11010RBRB <= RB – R1*
BranchBR       RB11011RBPC <= RB
Branch if zeroBRZ    RB11100RBPC <= RB  if  Z=1
Branch if negativeBRN    RB11101RBPC <= RB  if  N=1
Jump and linkJAL     RB11110RBPC <= RB,   RB <= PC + 1
Load immediateLDI     RB11111RBRB<=M[next byte], PC<=PC+1

The processor executes instructions in 4 stages:
1. Instruction fetch: Fetch the instruction from the memory(EPROM), latch it into the Instruction Register (IR) and increment the program counter.
2. Instruction decode: decode the instruction to decide which actions to perform. Determine the instruction from the opcode field and determine the operands from the operand fields.
3. Computation, Jump or Branch completion, or Memory access:
o Computation: Perform the ALU operation on the operand or operands specified by the instruction.
o Jump or branch completion: Load the PC with the target address specified by the instruction. Store the link address in the register file if instruction is JAL.
o Memory access: Place the address on the address bus.
4. Memory access completion or Write-back: Perform memory access, or store the ALU result in the destination register specified by the instruction.

DESIGN
I designed the processor using multicycle implementation, and used finite state machine to implement the control units of the processor. A complete data path of our multicycle implementation is shown in figure below:


To make the processor to execute instructions 4 steps as explained above, I use necessary number of registers to hold the data between each steps and clock cycles. I use 4 mux to select the appropriate inputs for the pc, registers, and memory. To simplify the design, I use an addition adder to perform the operation of PC+1, although this operation could be done using the
ALU. With such a simple design of using adder for PC, the implementation of the processor become more simple and flexible, especially in accomplishing instructions like JAL and LDI.
To allow the processor to select appropriate data in different step of execution and clock cycles, I implement the control unit of the processor with a finite state machine(Moore), which has total of 15 states. A detailed state diagram of the control unit is showed in the state diagram below:



STIMULATE AND TEST
To test and stimulate our implementation of the multicycle processor, I use Verilog to write a RTL description for my design. Source codes are in APPENDIX SECTION. I also use Modelsim to perform a functional stimulation. To verify the functionality of the processor, I have pre-loaded the EPROM with the instructions listed in Table 2. I initially set the PC to be 0 and set the data in the registers R0 to R7 to be 0, 1,2,3,4,5,6,7, respectively.
PC / MaddrMachine CodeInstruction/Assembly CodeOperations Performed/ResultsPC after instruction
000110001MOV R6,R1R6 <= 011
101010111LD R2, R7R7 <= M[2]  (M[2]=be)2
210111110ST R7, R6M[be] <= R6   (R6=06)3
311000011AND R3R3 <= 02  (R1=06, 03AND06=02)4
411001100ADD R4R4 <= 0a   (04+06 = 0a)5
511010101SUB R5R5 <= ff    (05-06=ff),  Neg<=16
611111000LDI R0R0 <= 0b  (M[7]=0b)8
700001011“Immediate value =0b”Skip since instruction is LDI
811010010SUB R2R2 <= fc   (02 – 06 = fc), Neg<=19
911101000BRN R0Check Neg=1, PC <= 0b0b
0a11110101JAL R5R5<=0b , PC<=ffff
0b11111000LDI R0R0 <= 0a0d
0c00001010“Immediate value =0a”Skip since instruction is LDI
0d11011000BR R0 PC <= 0a0a

The wave forms shows the execution of the instructions in table 2. The results are expected; all instructions are completed properly as I trace the instructions in the wave forms. In the simulation, the processor execute instructions from PC=0 to PC=9, and then it jump to PC=0a. After instructions in PC=0b and 0c, it jump back to PC=0a to execute the instruction “JAL R5”, which writes hex value of 0b in register R5 and jump to PC=ff. At PC=ff (M[ff]=opcode = 1111111), instruction is “LDI R7” which should load the value in M[next byte of ff] to register R7, but there is no next byte after memory address ff. As shown in figure 5, the result of this instruction is loaded the value in M[0] to register R7. This shows that the LDI as well as all other memory access instructions can always access memory. The waveforms also
verified all the instructions are working properly in our implementation. The results of the testing instructions are listed in table 2 column 4 and in the waveforms(figure 3-5) shown below.









Component Test
TetraMax is used to find all faults and generate test sets with ATPG.
All the stuck-at faults for the combinational components:
2to1 mux
4to1 mux
Decoder for controller
ALU
Detected Faults
114
276
22
489
Possibly detected
0
0
0
0
Undetectable Faults
0
0
0
11
ATPG untestable
0
0
0
0
Not detected
0
0
0
0
Total Faults
114
276
22
500
Test coverage
100%
100%
100%
100%
Number of tests
10
20
10
32
For components 2to1 mux and 4to1 mux, 114 and 275 faults were found, respectively, with no undetectable faults. The 2to1 mux needs 10 tests to detect all faults, and the 4to1 mux needs 20 tests to detect all faults. For the 7 to 4 decoder used in the control unit, 22 faults are found and 10 tests are generated. For the ALU, total of 500 faults were found, but 11 of these faults are redundant.  To detect all detectable faults for the ALU, 32 test patterns are needed.


2 comments:

  1. very nice article.
    Where can I find the Appendix section ?

    ReplyDelete
  2. Hello . I have a similar hw. like this. Can you help me about it please. Thanks by the way really good work.

    ReplyDelete