UCSC BME 194 Winter 2004
Resource-efficient Programming
Program Assignment 2 (bitsets)
(Last Update:
11:27 PST 10 January 2004
)
The first "column" of Programming Pearls mentioned bit sets as
the obvious solution to a particular sorting problem that needed small space.
For this assignment we will implement a bit-set sorting routine and
compare its performance with the qsort() routine in the standard C library.
To avoid the time and disk-space costs of having large files, we will
build a test framework that looks only at the sorting, without the I/O.
The program should be structured in multiple files:
- bitset.c and bitset.h: fundamental, general-purpose bitset operations.
- bitset_sort.c and bitset_sort.h: the sorting routine.
- random_array.c and random_array.h: a routine to fill an
array with random numbers for testing purposes.
- time_sort.c: the main program to do the timing
We are using multiple files for two reasons: to get you acquainted
with multiple-file compilation and to provide reusable components for
later homework exercises.
bitset.c and bitset.h
This module should have the fundamental definition of a bitset. In
c++ we would make a class for bitsets (except that there already is
one in the standard template library), but in c we will have a struct
and a number of functions. Please define the bitset type as follows:
typedef struct _bitset
{ unsigned int univ_size;
long *bits;
} bitset;
The bits pointer is supposed to point to enough space for
univ_size bits (representing 0 to univ_size-1).
Here are some operations that you should define:
- void bitset_alloc(bitset *s, int univ)
- Sets s->univ_size and allocates the right number of words for bits.
The pointer s should be a valid pointer to a bitset (not NULL).
Use assert to make sure that the bits were allocated.
- void bitset_free(bitset *s)
- Free the bits, set bits to NULL, and univ_size to 0.
Note, the bitset object itself is not freed, since it may be a
statically allocated object or on the stack, not on the heap.
- void bitset_union(bitset *s, const bitset* t)
- Do an in-place union operation: s gets s union t.
Use an assertion to verify that s and t are over the same universe, or
allow t to be over a smaller universe, or correctly handle the
reallocation and copying needed to increase the size of the universe for s.
- void bitset_intersection(bitset *s, const bitset* t)
- Do an in-place intersection operation: s gets s intersect t.
Correctly handle the cases where s and t have different size universes.
Note that there is no need for reallocation of s, since the
intersection is a subset of s.
- void bitset_clear(bitset *s)
- S is set to the empty set, without changing its universe.
- void bitset_add(bitset *s, unsigned int x)
- Add element x to bitset s.
- void bitset_remove(bitset *s, unsigned int x)
- Remove element x from bitset s.
- int bitset_member(const bitset* s, unsigned int x)
- returns 1 if x is in s, 0 otherwise
- bitset_forall(x,s,code)
- Write a macro that executes the code in a loop in which x is
sequentially assigned to every element of the bitset pointed to by s.
Use a temporary bitset* variable to hold the value of s, so that s is
instantiated only once by the macro.
bitset_sort.c and bitset_sort.h
Write a function
int bitset_sort(unsigned int* out, const unsigned int* in,
int num_to_sort, int univ)
that puts the first num_to_sort integers from the array "in" into a
bitset (with universe size univ), then puts them in ascending order in
the array out.
Note: the "in" and "out" pointers may be the same, so that
bitset_sort(A,A,num_to_sort,univ)
does an in-place sort of array A.
The return value from bitset_sort is the number of integers added to
the out array. Note: the return value is no larger than num_to_sort,
but may be smaller if there are duplicate entries in the "in" array.
random_array.c and random_array.h
Write functions
random_fill(unsigned int *out, int num_to_fill, int univ)
random_distinct_fill(unsigned int *out, int num_to_fill, int univ)
The random_fill procedure should put num_to_fill random numbers from
the range [0..univ-1] into the out array. The numbers should be
independent of each other and from a uniform distribution.
The random_distinct_fill procedure also puts num_to_fill random
numbers from the range [0..univ-1] into the out array, but it also
ensures that all numbers are distinct. (Note: this requires num_to_fill<=univ.)
For simplicity, we'll assume that num_to_fill is less than half of
univ. With this assumption it is fairly efficient to generate random
numbers uniformly and check to see if they have already been used, by
keeping a temporary bitset of all numbers that have been used.
time_sort.c
Write a program that determines the times for several operations.
Compile everything with -O4 for best speed.
Provide the output in a format that can be used easily by gnuplot to
plot the results.
Time the following operations for universe sizes
that are powers of 10 (say 10^3 through 10^7):
- bitset_clear
- bitset_add
- bitset_union
- count (implemented as a bitset_forall with a count++ body)
Time the following operations for a universe of 10^7, with array sizes
that are powers of 2 (say 2^10 through 2^21):
- random_fill
- random_distinct_fill
- bitset_sort
- qsort
The bitset_sort and qsort routines may have times that are
data-dependent, so they should be run on the same arrays.
The easiest way to ensure this is to create an array with
random_distinct_fill, do a copying (not in-place) sort with
bitset_sort, and then an in-place sort with qsort.
Use clock() calls (or getrusage() if you feel ready for the challenge)
to get execution time at various points in your program, and take
differences to allocate the time to the appropriate operations.
For the shorter array sizes you will certainly need to run the sort
many times to get enough sample points to get good timing estimates.
You should use different values in the array for each call.
Plotting
Plot the results of your timing analysis using gnuplot.
Use commands like "set xlabel", "set ylabel", "set title", and the
"title" keyword in the "plot" command to label your plots appropriately.
You can put multiple plots on the same graph in gnuplot, but if you
want separate graphs in one command file (say bitset-time.gnuplot), it
is a good idea to separate
the plot commands with
pause -1 'hit return to continue'
so that the set of graphs may be viewed sequentially by typing
load 'bitset-time.gnuplot'
to gnuplot.
To turn in
Submit all your .c and .h files, the output data file(s), a README
file that describes the experiment you did, and the
bitset-time.gnuplot command file that plots your results.
Note: your code, your README file, and your graphical results
should be understandable by someone who has not seen this assignment.
In fact, everything you turn in for this class should be
understandable without reference to the class or the assignment. You
will be graded in part on the clarity of your presentation, not just
the correctness of the code and the tests.
Bonus points: you can get bonus points by doing the reallocating
version of bitset_union, by measuring the extra cost of assertions
that check preconditions on the routines, by measuring the cost of
macros vs. functions for bitset_add and bitset_member, and so forth.
Note that you can remove assertions without changing the code by
compiling with -DNDEBUG=1
SoE home
UCSC Bioinformatics Home Page
BME 194 home page
Questions about page content should be directed to
Kevin Karplus
Computer Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250