Several single-purpose and specialized hardware systems have been built to address biosequence analysis problems. Four recent directions are illustrated by BioScan, BISP, B-SYS, and PAM and Splash.
BioScan is a massively parallel VLSI
implementation of the BLAST algorithm [27]. BioScan chips
are fabricated in
1.2
m CMOS, contain 812 PEs (537000 transistors), and run at
32MHz. This chips can accept a character every 17 clocks, or
1.88 million characters per second (1.88 MCPS), though the actual
system is slower. A working system of 16 chips (12992 PEs) has been
built, and an Internet server interface is currently under
development. The system is preloaded with a weight table scaled
according to sequence length before performing the comparison.
Internally, BioScan has 16-bit words and uses an infinity value to
prevent overflow and minimum value (0) to prevent underflow.
BISP takes a different approach, implementing in hardware full dynamic
programming with gaps [6]. Each 400000-transistor,
1
m CMOS chip contains 16 sequence comparison PEs. Each PE has
about 35 16-bit registers, comparators, and multiplexors, in addition
to local random access memory (RAM) for the storage of cost tables.
About 20% of each chip is devoted to controlling the PEs. As with
BioScan, sentinels prevent overflow and underflow. Although it
operates with an 80ns clock (one clock for each dynamic programming
cell update), its limiting factor will be the 3Mbyte/s data transfer
rate to the host.
B-SYS is a general-purpose systolic array
that, although optimized for a variety of sequence comparison
algorithms, can perform several other functions such as text
compression and decompression [17]. With its less
aggressive implementation, each 85000-transistor 2
m CMOS chip
features 47 PEs in a linear array. Each 8-bit ALU shares 16 registers
with each of its two neighbors, enabling shared data and zero-overhead
communication. The chips can execute instructions
at about 4MHz, and the basic sequence comparison operation requires
6-25 instructions (660-160 kCPS), depending on algorithm. Although
sentinels can be programmed, typically modulo sequence comparison, in
which dynamic programming values are compared modulo 256, is used
[21, 16]. As with BISP, B-SYS can implement dynamic
programming with gap penalties. A reimplimentation on the
order of BISP or BioScan could place 256 PEs on each chip, and
be clocked 3-6 times faster.
Splash and PAM are field-programmable gate array systems specifically designed for configuration as special-purpose co-processors [12, 2, 15]. Thus, they provide for the implementation of sequence comparison algorithms in hardware without the need to fabricate new chips for each algorithm. Programming specific algorithms is, unfortunately, time consuming. An edit-distance calculation with fixed costs and no gaps required nearly 3000 lines of code, and placed 30 PEs on each chip (the implemented algorithm required 2N-1 PEs to compare strings of length N over a 4-character alphabet). Implementation for a larger alphabet (the proteins), biologically-significant cost functions (at least a 6 bits), and gap penalties would severely decrease processor density and, as a result, performance. However, the effort in programming these systems is significantly lower than designing a VLSI chip, and they are general-purpose, able to perform a large number of functions.
Table 1: Edit distance calculation (arranged by CUPS/chip).
Table 1 summarizes the performance of several of these
machines. The number of PEs, maximum native sequence length, and the
approximate number of chips are listed for each machine. Performance
metrics include
sequence comparison, a metric first
used with the nMOS P-NAC system [22]; the maximum attained
or attainable number of dynamic programming cell updates per second
(CUPS), generally not
for the
problem; and the CUPS per chip, a rough
measurement of efficiency or performance per dollar. B-SYS* figures are
estimates of a 64-chip version (using the 47-PE chip) with a more
sophisticated interface between host and co-processor (the prototype
was built on an ISA card, and each instruction required 3 writes over
the 8MHz bus). The BISP times are derived from published system
performance estimates. In the
column, BISP and B-SYS*
are assumed to be able to perform several comparisons at once,
breaking each comparison at a chip boundary, as the BioScan system
enables (this handicaps BioScan, as 712 PEs are not used). All others
are based on personal experimentation or extrapolation from published
experimentation. Note that the table is for the simplest dynamic
programming edit-distance calculation. BioScan implements a different
algorithm on a larger alphabet, and BISP can do gap comparison in the same
amount of time (P-NAC cannot, Splash would require a new configuration,
and B-SYS would slow down by a factor of 4).
Jones reports 75MCUPS on a 64K CM-2 (corresponding to two times
faster than the results in the table) by microcoding the inner loop of
the dynamic programming algorithm [19]; Jones has also
presented methods for database pattern searching with limited gap
length on the CM-2 [18]. By making full use of modulo
sequence comparison to reduce data communication, this author estimates
that a factor of 2-4 performance increase is attainable over the CM-2 and MP-1
results of Table 1 for long sequences [16]. Core
has compared dynamic programming and the BLAST algorithm on the CM-1
and Intel iPSC hypercube computer [8].
Figure 1: Hidden Markov model for protein comparison and
alignment [13].