next up previous
Next: Hidden Markov model sequence Up: Introduction Previous: Biosequence analysis methods

Hardware for biosequence analysis

Several single-purpose and specialized hardware systems have been built to address biosequence analysis problems. Four recent directions are illustrated by BioScan, BISP, B-SYS, and PAM and Splash.

BioScan is a massively parallel VLSI implementation of the BLAST algorithm [27]. BioScan chips are fabricated in 1.2 tex2html_wrap_inline454 m CMOS, contain 812 PEs (537000 transistors), and run at 32MHz. This chips can accept a character every 17 clocks, or 1.88 million characters per second (1.88 MCPS), though the actual system is slower. A working system of 16 chips (12992 PEs) has been built, and an Internet server interface is currently under development. The system is preloaded with a weight table scaled according to sequence length before performing the comparison. Internally, BioScan has 16-bit words and uses an infinity value to prevent overflow and minimum value (0) to prevent underflow.

BISP takes a different approach, implementing in hardware full dynamic programming with gaps [6]. Each 400000-transistor, 1 tex2html_wrap_inline454 m CMOS chip contains 16 sequence comparison PEs. Each PE has about 35 16-bit registers, comparators, and multiplexors, in addition to local random access memory (RAM) for the storage of cost tables. About 20% of each chip is devoted to controlling the PEs. As with BioScan, sentinels prevent overflow and underflow. Although it operates with an 80ns clock (one clock for each dynamic programming cell update), its limiting factor will be the 3Mbyte/s data transfer rate to the host.

B-SYS is a general-purpose systolic array that, although optimized for a variety of sequence comparison algorithms, can perform several other functions such as text compression and decompression [17]. With its less aggressive implementation, each 85000-transistor 2 tex2html_wrap_inline454 m CMOS chip features 47 PEs in a linear array. Each 8-bit ALU shares 16 registers with each of its two neighbors, enabling shared data and zero-overhead communication. The chips can execute instructions at about 4MHz, and the basic sequence comparison operation requires 6-25 instructions (660-160 kCPS), depending on algorithm. Although sentinels can be programmed, typically modulo sequence comparison, in which dynamic programming values are compared modulo 256, is used [21, 16]. As with BISP, B-SYS can implement dynamic programming with gap penalties. A reimplimentation on the order of BISP or BioScan could place 256 PEs on each chip, and be clocked 3-6 times faster.

Splash and PAM are field-programmable gate array systems specifically designed for configuration as special-purpose co-processors [12, 2, 15]. Thus, they provide for the implementation of sequence comparison algorithms in hardware without the need to fabricate new chips for each algorithm. Programming specific algorithms is, unfortunately, time consuming. An edit-distance calculation with fixed costs and no gaps required nearly 3000 lines of code, and placed 30 PEs on each chip (the implemented algorithm required 2N-1 PEs to compare strings of length N over a 4-character alphabet). Implementation for a larger alphabet (the proteins), biologically-significant cost functions (at least a 6 bits), and gap penalties would severely decrease processor density and, as a result, performance. However, the effort in programming these systems is significantly lower than designing a VLSI chip, and they are general-purpose, able to perform a large number of functions.

   table148
Table 1: Edit distance calculation (arranged by CUPS/chip).

Table 1 summarizes the performance of several of these machines. The number of PEs, maximum native sequence length, and the approximate number of chips are listed for each machine. Performance metrics include tex2html_wrap_inline464 sequence comparison, a metric first used with the nMOS P-NAC system [22]; the maximum attained or attainable number of dynamic programming cell updates per second (CUPS), generally not for the tex2html_wrap_inline464 problem; and the CUPS per chip, a rough measurement of efficiency or performance per dollar. B-SYS* figures are estimates of a 64-chip version (using the 47-PE chip) with a more sophisticated interface between host and co-processor (the prototype was built on an ISA card, and each instruction required 3 writes over the 8MHz bus). The BISP times are derived from published system performance estimates. In the tex2html_wrap_inline464 column, BISP and B-SYS* are assumed to be able to perform several comparisons at once, breaking each comparison at a chip boundary, as the BioScan system enables (this handicaps BioScan, as 712 PEs are not used). All others are based on personal experimentation or extrapolation from published experimentation. Note that the table is for the simplest dynamic programming edit-distance calculation. BioScan implements a different algorithm on a larger alphabet, and BISP can do gap comparison in the same amount of time (P-NAC cannot, Splash would require a new configuration, and B-SYS would slow down by a factor of 4). Jones reports 75MCUPS on a 64K CM-2 (corresponding to two times faster than the results in the table) by microcoding the inner loop of the dynamic programming algorithm [19]; Jones has also presented methods for database pattern searching with limited gap length on the CM-2 [18]. By making full use of modulo sequence comparison to reduce data communication, this author estimates that a factor of 2-4 performance increase is attainable over the CM-2 and MP-1 results of Table 1 for long sequences [16]. Core has compared dynamic programming and the BLAST algorithm on the CM-1 and Intel iPSC hypercube computer [8].

   figure153
Figure 1: Hidden Markov model for protein comparison and alignment [13].


next up previous
Next: Hidden Markov model sequence Up: Introduction Previous: Biosequence analysis methods

Rey Rivera
Tue Jul 30 14:16:55 PDT 1996