XC3020-based CPU

The design of a Xilinx FPGA-based processor

Yoav Freund and Rene Dorta, CMPE202 FALL 1990

CPU design - tools and procedures

The REGIS CPU was implemented in hardware using two Xilinx XC3020 LCA chips. These very flexible and powerful Logic Cell Array chips can be programmed, using an external ROM chip, to perform general logic functions. Each chip contains 64 CLB (Configurable Logic Block) units, each capable of performing a simple binary function (3 to 2 or 4 to 1) and containing two flip-flops. By programming the CLBs and their connections, any sequential logic can be implemented, provided there are enough CLBs and the routing between them is feasible.
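To make the CLB description more concrete, here is a toy Python model of a single logic cell. It reads "4 to 1" as a four-input, one-output truth table whose result can be latched into one of the cell's two flip-flops. The class name `ToyCLB` and its methods are hypothetical; the sketch ignores the "3 to 2" mode, the routing, and the internal multiplexers of the real XC3020 CLB, so treat it as an illustration of the lookup-table idea rather than a description of the actual silicon.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCLB:
    """Hypothetical toy logic cell: a 4-input, 1-output lookup table
    (the '4 to 1' case) plus two D flip-flops that start at zero."""
    truth_table: list[int]                                 # 16 entries, one per input combination
    ff: list[int] = field(default_factory=lambda: [0, 0])  # the two flip-flops, cleared at reset

    def __post_init__(self):
        if len(self.truth_table) != 16:
            raise ValueError("a 4-input function needs exactly 16 truth-table entries")

    def comb(self, a: int, b: int, c: int, d: int) -> int:
        # Pack the four input bits into a table index (a is the MSB) and look up the output.
        index = (a << 3) | (b << 2) | (c << 1) | d
        return self.truth_table[index]

    def clock(self, a: int, b: int, c: int, d: int, which: int = 0) -> int:
        # On a clock edge, store the combinational output in flip-flop 'which'.
        self.ff[which] = self.comb(a, b, c, d)
        return self.ff[which]
```

Programming the cell then amounts to choosing the 16 table entries; for example, `[bin(i).count("1") % 2 for i in range(16)]` turns it into a 4-input parity (XOR) gate.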
The final hardware on which REGIS was implemented included two LCA chips, two EPROMs for programming these chips, a RAM that was emulated by a PC-based emulator, and several pull-up resistors, noise-suppression capacitors and the like. The connections between the chips were made on a development breadboard used for general design projects.
The implementation of the REGIS CPU made extensive use of computer-aided design tools. This section reports the exact tools used, along with the tips we gathered while using them that might help others who follow a similar route.
The design is divided into two parts: the control and the data path. The data path was designed at the circuit level using the FUTURENET schematic editor. The general design of the control was done at the schematic level, but most of the actual logic was entered as text (ASCII) files that describe the state machine and the decoder. These design entries can be viewed as the "source code" of the hardware design. From these source files, after many compilation, checking and translation steps, a "bit stream" is generated. This bit stream is stored in the EPROMs and programs the LCA chips after power-up; it is the equivalent of the machine code that is the final result of software development.
The many steps between the source files and the final EPROM chips used in the circuit are summarized in Figure 1. As most of the steps are automatic, they were executed by a MAKE file that is also included in the report.
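Since Figure 1 and the MAKE file are not reproduced in this text, the Python sketch below only encodes the step order that can be inferred from the tool descriptions that follow, and derives an execution order that respects every dependency, which is the same constraint make enforces. The graph itself is an assumption (in particular, where MUSTANG, misII and EQN2XNF join the schematic flow), and no real command lines or flags are implied. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph inferred from the tool descriptions in this report:
# each step maps to the set of steps whose outputs it needs.
STEPS = {
    "DCM/DRC/PINC": {"Futurenet"},       # per-drawing checks on the schematics
    "PIN2XNF": {"DCM/DRC/PINC"},         # main linker for the hierarchical schematics
    "misII": {"MUSTANG"},                # logic minimization after state encoding
    "EQN2XNF": {"misII"},                # equations -> .xnf
    "XNFMERGE": {"PIN2XNF", "EQN2XNF"},  # join schematic and logic-tool netlists
    "XNFMAP": {"XNFMERGE"},
    "MAP2LCA": {"XNFMAP"},
    "APR": {"MAP2LCA"},                  # placement and routing
    "XACT": {"APR"},                     # final DRC and bitstream generation
    "TOPC/PAL": {"XACT"},                # EPROM programming
}


def build_order(graph):
    """Return one execution order in which every step follows its prerequisites."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in build_order(STEPS):
        print(step)
```

The order of independent branches (the schematic branch versus the MUSTANG branch) is not fixed; only the dependencies are.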
In the following I shall give a brief description of each step.
All the schematic design tools run on PCs.

Futurenet:
This is the schematic editor, on which most of the time is spent. It is a good tool, although it takes some time to get used to. Some hints:
It is worthwhile to use command lines (things like `/l;left 5;/l;/l;right 5; down 3`); they produce a very regular schematic that is easier to correct.
The attributes cause most of the errors; the important ones to understand are SIG, PINI, PINO and FILE. PART and LOC also seem useful.
It is worthwhile to learn to use buses.
It is important to name all signals; otherwise many signals in the final design get arbitrary names, which makes the simulator, SUSIE, very hard to use.

DCM,DRC,PINC:
These programs work on the individual drawings of the hierarchical design. The errors they find are usually easy to locate and correct; most of the warnings can be ignored, as they arise from the fact that the programs look at a partial design (one file at a time).

PIN2XNF:
This is the main linker; at this stage most of the incompatibility errors between the parts of the hierarchical design are reported.

XNFMERGE:
This link program merges the `.xnf` files generated by the schematic tools with the `.xnf` files generated by the logic design tools, which run on the Unix workstations and whose output is imported from there.

XNFMAP,MAP2LCA:
These tools translate the functional design into a design built from the CLBs in the LCA chips. At this step, buffering and routing problems are detected. Notes:
The clock should be driven by a special buffer, and its net should be declared as long and critical.
Internal tristate buffers and external tristate buffers are not the same!
Defining internal buses as "long" lines helps the mapper/router choose a good mapping.

APR:
The router takes far more time than all the other steps combined (about 45 minutes per chip), and it may fail for complex designs even if they use only a small part of the chip. We were lucky: APR managed to route our data path although it used 54 of the 64 CLBs.

XACT:
This tool shows what your chip actually looks like internally after it has been routed. It can also be used to program the LCA directly at the CLB level, and it has many aids for doing so; we used it only for a final check (DRC) of our designs and to generate the bitstreams to be loaded into the EPROMs.

TOPC and PAL:
These are programs that run on the digital lab's `dlab2` PC, which programs EPROMs; they are lousy but do the job.

SUSIE:
The simulator is an important design tool in this methodology, in which most of the design is inaccessible because it is buried inside the LCA chips. Given its importance, it is a great disappointment:
Its user interface is bad; the way menus and selections are handled is very inconvenient.
The mouse driver it uses is non-standard, and removing the driver in order to use another program requires turning the machine off!
Signal names accepted by the simulator are limited to 9 characters, a limit that is highly incompatible with the other design tools, since they generate very long names from the hierarchical design (especially in the .xnf files). The only way we found to overcome this was to run an editor (SED) with a command script on the .xnf files to replace the names with shorter ones.
The simulator has a bug: it does not correctly read signals that are tied to ground (constant low).
The simulator can run on only one final LCA chip design (DPBUFFf.LCA) at a time, so full timing simulation of designs involving more than one LCA, such as ours, is not possible.
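The 9-character workaround can be sketched in Python instead of the SED script the report used. The naming scheme below (keep the last hierarchy level, truncate it, add a numeric suffix on collision), the assumption that `/` separates hierarchy levels, and the function names are all hypothetical. The exact layout of the .xnf files is not assumed; the rewrite simply replaces whole-name occurrences in any text.

```python
import re

MAX_LEN = 9  # SUSIE's signal-name limit reported above


def make_short_names(names, max_len=MAX_LEN):
    """Map each signal name to a unique name of at most max_len characters.

    Hypothetical scheme: names that already fit are kept unchanged; longer
    names keep their last hierarchy component (assuming '/' separates
    levels), truncated, with a numeric suffix added on collisions.
    """
    names = list(dict.fromkeys(names))             # de-duplicate, keep order
    mapping = {n: n for n in names if len(n) <= max_len}
    used = set(mapping.values())                   # reserve names that already fit
    for name in names:
        if name in mapping:
            continue
        base = name.rsplit("/", 1)[-1][:max_len] or "S"
        candidate, n = base, 0
        while candidate in used:                   # resolve collisions with a suffix
            n += 1
            suffix = str(n)
            candidate = base[: max_len - len(suffix)] + suffix
        mapping[name] = candidate
        used.add(candidate)
    return mapping


def rename_in_text(text, mapping):
    """Rewrite whole-name occurrences in a single pass, so one rule cannot
    rewrite the output of another (a risk with a chain of sed s///g commands)."""
    changed = {old: new for old, new in mapping.items() if old != new}
    if not changed:
        return text
    # Longest names first, so a name never matches only part of a longer one.
    alternatives = "|".join(map(re.escape, sorted(changed, key=len, reverse=True)))
    pattern = re.compile(r"(?<![\w/])(" + alternatives + r")(?![\w/])")
    return pattern.sub(lambda m: changed[m.group(1)], text)
```

Keeping the name map around also makes it possible to translate SUSIE's short names back to the hierarchical names when reading simulation results.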

The control logic was designed on the Sun workstations (the beans connected to DAIZU).
The state machine was entered in a special format (see clogic.fsm). In this format, each line relates a specified previous state and a partly specified input (possibly with some "don't cares") to an output condition and a next state. This file is first run through MUSTANG, which assigns each state a binary encoding by which it is stored in the state register. Using the `-l` option, as we did, results in an encoding in which, for each state, one of the bits is one and all the others are zero. Because after RESET the LCA chip is in a state where all the flip-flops are zero, we had to edit the .PLA file generated by MUSTANG and assign this all-zero encoding to the reset state, in which the control unit starts its operation.
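The encoding problem can be illustrated with a small Python sketch: one-hot codes of the kind described for the `-l` option, the all-zero code forced onto the reset state, and a lookup over transition rows whose inputs may contain "-" don't-cares. The state names, transition rows and function names are hypothetical and are not taken from clogic.fsm; the first-match lookup also assumes the rows do not overlap.

```python
# Hypothetical example rows: (input pattern, present state, next state, output).
# '-' marks a don't-care input bit. Not taken from clogic.fsm.
EXAMPLE_ROWS = [
    ("1-", "RESET", "FETCH", "0"),
    ("0-", "RESET", "RESET", "0"),
    ("--", "FETCH", "DECODE", "1"),
    ("-1", "DECODE", "EXEC", "0"),
    ("-0", "DECODE", "FETCH", "0"),
    ("--", "EXEC", "FETCH", "1"),
]


def one_hot_codes(states):
    """One bit per state: state i gets a single 1 in position i (left to right)."""
    n = len(states)
    return {s: "".join("1" if j == i else "0" for j in range(n)) for i, s in enumerate(states)}


def fix_reset_code(codes, reset_state):
    """Give the reset state the all-zero code, which is what the flip-flops hold
    after RESET. Codes stay unique because no one-hot code is all zeros."""
    fixed = dict(codes)
    fixed[reset_state] = "0" * len(next(iter(codes.values())))
    return fixed


def matches(pattern, bits):
    """True if an input pattern with '-' don't-cares matches concrete input bits."""
    return len(pattern) == len(bits) and all(p in ("-", b) for p, b in zip(pattern, bits))


def next_state(rows, state, inputs):
    """Return (next state, output) from the first row matching state and inputs."""
    for pattern, present, nxt, out in rows:
        if present == state and matches(pattern, inputs):
            return nxt, out
    raise ValueError(f"no transition from {state} on inputs {inputs}")
```

After the fix, the reset state's former one-hot bit is never set, and the next-state logic has to recognize the reset state as "all bits zero"; deriving that logic is left to the minimization step.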

misII:
This was used to derive and minimize the equations that implement the logic required to execute the state machine's function.

EQN2XNF:
This translates the results of misII into the .xnf format, which can then be transported to the PC and linked into the schematics.

The final hardware

The final hardware was built on a design kit. This is a good kit for fast breadboarding and testing, but it is sometimes unreliable. It also resulted in an unavoidable spaghetti of wires that luckily did not cause us too many errors but is probably partly responsible for the high noise level on our lines.
It seems that the debounced toggle-button circuitry was not able to drive some of the signals we needed (the PGM/DN open-drain signal), so we had to use a switch.
Another tool used in the final stage is the RAM emulator. It is more or less convenient (once you can get access to it!); two notes might be worthwhile:
The emulator simply maps the on-board pin-compatible socket into the PC's address space at `c000:0` and up (LOW-mem socket) or `c200:0` (high-mem socket), so using DEBUG, the DOS debugger (or better yet, one of those full-screen low-level debuggers for the PC, if you have one), you can easily monitor and change the memory. This is a much better solution than using the READMEMH and WRITMEMH programs etc. (you can also load a program from a file using DEBUG's LOAD command).
The RAM emulator (at least the one we used) pays no attention to the signal cs1 and is activated/deactivated only according to cs2.
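Two small Python sketches for the notes above. The first is standard 8086 real-mode arithmetic (physical address = 16 * segment + offset), showing where `c000:0` and `c200:0` land. The second contrasts chip-select behaviour, assuming a 6264-style SRAM with an active-low CS1 and an active-high CS2; that polarity and all function names are assumptions, not details from the report.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode segment:offset -> physical address (16 * segment + offset),
    wrapped to 20 bits as on the 8086."""
    return ((segment << 4) + offset) & 0xFFFFF


LOW_SOCKET = physical_address(0xC000, 0)   # c000:0, LOW-mem socket
HIGH_SOCKET = physical_address(0xC200, 0)  # c200:0, high-mem socket


def ram_selected(cs1_n: int, cs2: int) -> bool:
    """Assumed 6264-style SRAM: selected only when /CS1 is low AND CS2 is high."""
    return cs1_n == 0 and cs2 == 1


def emulator_selected(cs1_n: int, cs2: int) -> bool:
    """Emulator as observed in the report: cs1 is ignored and only cs2 counts
    (active-high cs2 is an assumption)."""
    return cs2 == 1


if __name__ == "__main__":
    print(hex(LOW_SOCKET), hex(HIGH_SOCKET))  # 0xc0000 0xc2000
    for cs1_n in (0, 1):
        for cs2 in (0, 1):
            print(cs1_n, cs2, ram_selected(cs1_n, cs2), emulator_selected(cs1_n, cs2))
```

The two socket windows therefore start 0x2000 bytes apart, and in DEBUG `d c000:0` dumps the start of the low-mem socket. The printed table shows the one case that matters: with cs1 inactive and cs2 active, a real RAM would stay deselected, but the emulator still responds.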

Final results

The hardware ran exactly as planned when the clock was stepped manually or run at rates up to about 1 kHz; at higher rates errors occurred. This is most probably a result of extremely noisy signals caused by the very long lines and the poor connections.

Pak K. Chan
Wed Oct 18 10:35:29 PDT 2000