Start by creating code for a heap data structure. Once again, we're going to cheat a bit and have only int-valued objects in the heap---not pointers to more complex data objects as one would normally have.
The data structure for the heap should look something like
struct _heap
{ int * data;
int NumHeap; // number of elements in the heap
int AllocHeap; // size of data array (NumHeap<=AllocHeap)
}
In the comments for your code, be sure to keep track of where the heap property is valid. The heap property is that data(parent)>=data(child). Since we are starting our numbering at 0, this means data[i]>=data[2*i+1] and data[i]>=data[2*i+2]. There are other important properties of the heap to maintain, as discussed in class (like using data[0]...data[NumHeap-1]).
Implement the following operations (using procedures or macros as appropriate)
I actually want you to do two implementations of heapify, one of which builds the heap from zero up, the other from the middle down.
The upward building heap has a loop in which NumHeap grows from 1 to num, with the heap property always holding for 0..NumHeap. This can be implemented as a very simple loop:
h->NumHeap=1;
while(h->NumHeapNumHeap]);
The downward building heap has a loop that counts downward, and maintains a loop invariant that has the heap property holding for [i..num]. The loop here is a bit trickier to write, but still fairly simple.
Build scaffolding to test out your heap manipulation routines. Use assertions as appropriate in your code to make sure that you have robust primitive operations.
Once your heaps are working correctly, write a heapsort routine that heapifies an array, then does repeated pop_heaps to fill the array from the end. You should have two versions of heapsort, corresponding to your two heapify implementations.
Time the heapsorts and compare them to your quicksort and insertion sort implementations. Put the results in a README file.
struct tree_node
{ struct tree_node *left, *right;
float weight;
char name; // only the leaves have names
}
Input format: a file of entries, one per line. There may be any number of entries, terminated by the end of the file. Each entry consists of a single character in the first column of the line, followed by white space, followed by the probability of that character, for example
A 0.1 B 0.6 C 0.3
Output format: First report the entropy of the input (- sum P(i) log_2 P(i)) which is a lower bound on the encoding cost for the alphabet, then report the actual average encoding cost (sum P(i) bits(code(i))). Finally, report the Huffman code, one character per line. Each entry consists of a single character (as in the input), followed by a tab, followed by the probability of the character followed by a tab, followed by the Huffman code for the character. For example,
Entropy = 1.2955 bits Encoding cost= 1.4 bits A 0.1 00 B 0.6 1 C 0.3 01
Demonstrate that the program is working by doing good scaffolding to test it, particularly in boundary cases. Time it. Profile it with gprof.
the 0.1 of 0.1 program 0.01 ...You might want to redesign the input and output formats for this task, since some encoding schemes could have arbitrary strings to encode, and having white-spaces in the strings might break the program.
Questions about page content should be directed to
Kevin Karplus