UCSC BME 194 Winter 2004

Resource-efficient Programming

Program Assignment 2 (bitsets)

(Last Update: 11:27 PST 10 January 2004 )
The first "column" of Programming Pearls mentioned bit sets as the obvious solution to a particular sorting problem that needed small space. For this assignment we will implement a bit-set sorting routine and compare its performance with the qsort() routine in the standard C library. To avoid the time and disk-space costs of having large files, we will build a test framework that looks only at the sorting, without the I/O.

The program should be structured in multiple files:

We are using multiple files for two reasons: to get you acquainted with multiple-file compilation and to provide reusable components for later homework exercises.

bitset.c and bitset.h

This module should have the fundamental definition of a bitset. In c++ we would make a class for bitsets (except that there already is one in the standard template library), but in c we will have a struct and a number of functions. Please define the bitset type as follows:
    typedef struct _bitset
    {   unsigned int univ_size;
        long *bits;
    } bitset;
The bits pointer is supposed to point to enough space for univ_size bits (representing 0 to univ_size-1).

Here are some operations that you should define:

void bitset_alloc(bitset *s, int univ)
Sets s->univ_size and allocates the right number of words for bits. The pointer s should be a valid pointer to a bitset (not NULL). Use assert to make sure that the bits were allocated.
void bitset_free(bitset *s)
Free the bits, set bits to NULL, and univ_size to 0. Note, the bitset object itself is not freed, since it may be a statically allocated object or on the stack, not on the heap.
void bitset_union(bitset *s, const bitset* t)
Do an in-place union operation: s gets s union t. Use an assertion to verify that s and t are over the same universe, or allow t to be over a smaller universe, or correctly handle the reallocation and copying needed to increase the size of the universe for s.
void bitset_intersection(bitset *s, const bitset* t)
Do an in-place intersection operation: s gets s intersect t. Correctly handle the cases where s and t have different size universes. Note that there is no need for reallocation of s, since the intersection is a subset of s.
void bitset_clear(bitset *s)
S is set to the empty set, without changing its universe.
void bitset_add(bitset *s, unsigned int x)
Add element x to bitset s.
void bitset_remove(bitset *s, unsigned int x)
Remove element x from bitset s.
int bitset_member(const bitset* s, unsigned int x)
returns 1 if x is in s, 0 otherwise
bitset_forall(x,s,code)
Write a macro that executes the code in a loop in which x is sequentially assigned to every element of the bitset pointed to by s. Use a temporary bitset* variable to hold the value of s, so that s is instantiated only once by the macro.

bitset_sort.c and bitset_sort.h

Write a function
    int bitset_sort(unsigned int* out, const unsigned int* in, 
    	int num_to_sort, int univ)
that puts the first num_to_sort integers from the array "in" into a bitset (with universe size univ), then puts them in ascending order in the array out. Note: the "in" and "out" pointers may be the same, so that
    bitset_sort(A,A,num_to_sort,univ) 
does an in-place sort of array A. The return value from bitset_sort is the number of integers added to the out array. Note: the return value is no larger than num_to_sort, but may be smaller if there are duplicate entries in the "in" array.

random_array.c and random_array.h

Write functions
    random_fill(unsigned int *out, int num_to_fill, int univ)
    random_distinct_fill(unsigned int *out, int num_to_fill, int univ)

The random_fill procedure should put num_to_fill random numbers from the range [0..univ-1] into the out array. The numbers should be independent of each other and from a uniform distribution.

The random_distinct_fill procedure also puts num_to_fill random numbers from the range [0..univ-1] into the out array, but it also ensures that all numbers are distinct. (Note: this requires num_to_fill<=univ.) For simplicity, we'll assume that num_to_fill is less than half of univ. With this assumption it is fairly efficient to generate random numbers uniformly and check to see if they have already been used, by keeping a temporary bitset of all numbers that have been used.

time_sort.c

Write a program that determines the times for several operations. Compile everything with -O4 for best speed. Provide the output in a format that can be used easily by gnuplot to plot the results.

Time the following operations for universe sizes that are powers of 10 (say 10^3 through 10^7):

Time the following operations for a universe of 10^7, with array sizes that are powers of 2 (say 2^10 through 2^21):

The bitset_sort and qsort routines may have times that are data-dependent, so they should be run on the same arrays. The easiest way to ensure this is to create an array with random_distinct_fill, do a copying (not in-place) sort with bitset_sort, and then an in-place sort with qsort.

Use clock() calls (or getrusage() if you feel ready for the challenge) to get execution time at various points in your program, and take differences to allocate the time to the appropriate operations.

For the shorter array sizes you will certainly need to run the sort many times to get enough sample points to get good timing estimates. You should use different values in the array for each call.

Plotting

Plot the results of your timing analysis using gnuplot. Use commands like "set xlabel", "set ylabel", "set title", and the "title" keyword in the "plot" command to label your plots appropriately.

You can put multiple plots on the same graph in gnuplot, but if you want separate graphs in one command file (say bitset-time.gnuplot), it is a good idea to separate the plot commands with

    pause -1 'hit return to continue'
so that the set of graphs may be viewed sequentially by typing
    load 'bitset-time.gnuplot'
to gnuplot.

To turn in

Submit all your .c and .h files, the output data file(s), a README file that describes the experiment you did, and the bitset-time.gnuplot command file that plots your results.

Note: your code, your README file, and your graphical results should be understandable by someone who has not seen this assignment. In fact, everything you turn in for this class should be understandable without reference to the class or the assignment. You will be graded in part on the clarity of your presentation, not just the correctness of the code and the tests.

Bonus points: you can get bonus points by doing the reallocating version of bitset_union, by measuring the extra cost of assertions that check preconditions on the routines, by measuring the cost of macros vs. functions for bitset_add and bitset_member, and so forth. Note that you can remove assertions without changing the code by compiling with -DNDEBUG=1


slug icon to go to School of Engineering home page SoE home     UCSC Bioinformatics Home Page     BME 194 home page    

Questions about page content should be directed to

Kevin Karplus
Computer Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250