Unspecified Behavior in Imperative Programming Languages

  CS 203, Spring 1998
  Homework No. 1

Part 1: Unspecified behavior in Standard Pascal.

Two platforms were used for Pascal tests.

Example 1: Subrange Types

Pascal supports subrange types, which are constrained simple types. For example, a type used to represent one hockey game of an 82-game season could be defined as:

  type gameno = 1..82;

According to the standard, assigning an integral value outside the defined range to a variable of type gameno is an error. The following test program was used:

  program subrange (output);

  type little = 1..10;
       big = 11..20;

  var a : little ;
      b : big;

  begin
    a := 257;
    b := 300;
    writeln(a, b)
  end.

Neither compiler reported an error, at compile time or at run time, although the standard classifies the assignment as an error. The two executables produced different output. On aargh:
 
  257 300

and on arapaho:
 
  1 44

In the latter case, it appears that the compiler uses the subrange specification to reduce memory consumption by reserving a single byte to store an unsigned integer; thus the value 257 "wraps" to 1 and 300 to 44 (each value modulo 256). The Pascal User Manual and Report (second edition) by Jensen & Wirth says: "To the implementor [subrange types] also suggest an opportunity to conserve memory space and to introduce validity checks upon assignment at run-time." Clearly, the SunSoft implementors chose to do the former, and neither compiler group implemented the latter. This is potentially a serious problem: depending on the platform, an out-of-range value may be silently wrapped to a different value without any notification to the programmer.
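
The wrap seen on arapaho can be reproduced with a short C sketch. It assumes, as the output suggests but as was not confirmed from the compiler's documentation, that the value is stored in a single unsigned byte. The names store_in_byte and checked_assign are hypothetical, chosen for illustration; checked_assign shows the kind of run-time validity check that neither compiler performed.

```c
/* Assumption: the subrange variable occupies one unsigned byte.
   Converting to unsigned char reduces the value modulo UCHAR_MAX + 1
   (256 when a byte has 8 bits), which is well defined in C. */
unsigned int store_in_byte(unsigned int value)
{
    unsigned char cell = (unsigned char) value;
    return cell;
}

/* Hypothetical run-time validity check for a subrange lo..hi.
   Returns 1 and stores the value if it is in range; returns 0 and
   leaves *target unchanged otherwise. */
int checked_assign(int *target, int value, int lo, int hi)
{
    if (value < lo || value > hi)
        return 0;
    *target = value;
    return 1;
}
```

With 8-bit bytes, store_in_byte(257) yields 1 and store_in_byte(300) yields 44, matching the arapaho output, while checked_assign(&a, 257, 1, 10) rejects the value and leaves a unchanged.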

Example 2: Boolean Short-circuit

According to the Pascal Standard, once the value of a Boolean expression, such as the condition of an if statement, has been determined, the remaining operands may or may not be evaluated. For example, given
 
  if (x > 10) and (f(x,y)) then...

     where x = 1 and f(x,y) has a side effect--say
     for example changing the value of x,

f(x,y) may or may not be executed. The standard permits either. The following program was written and executed on the two test platforms:

  program iftest (output);

  var global : integer;

  function addnrtn : integer;
  begin
    global := global + 1;
    addnrtn := global
  end; { addnrtn }

  begin
    global := 1;
    if (global > 2) and (addnrtn > 2) then
      writeln('Well this should never happen!');
    writeln('global is ', global)
  end.

On both platforms, it produced the output:

  global is 2

In other words, the function addnrtn was executed even though the first operand of the if condition was false: boolean "short circuiting" did not occur. The programmer clearly should not rely on the presence or absence of the short-circuit, and the Pascal report warns as much. Nonetheless this is potentially a serious problem if the programmer either ignores the warning and assumes one behavior or the other, or worse, accidentally uses an expression with a side effect. In the latter case the program is technically correct but might produce different results depending on the compiler implementation.
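
The two permitted behaviors can be contrasted in C, where the && operator is guaranteed to short-circuit. The sketch below is not the Pascal test program; the names counter, bump, full_evaluation, and short_circuit are hypothetical. full_evaluation emulates what both Pascal compilers did by evaluating each operand in its own statement, and short_circuit uses && so that the second operand is skipped.

```c
static int counter;

/* Side effect: increments counter and returns the new value. */
static int bump(void)
{
    return ++counter;
}

/* Evaluates both operands before combining them, as the tested
   Pascal compilers did. Returns the final value of counter. */
int full_evaluation(void)
{
    int left, right;

    counter = 1;
    left = counter > 2;     /* false */
    right = bump() > 2;     /* still evaluated: counter becomes 2 */
    (void) (left && right); /* the combined result is not needed here */
    return counter;
}

/* Uses &&, which never evaluates its right operand when the left
   one is false. Returns the final value of counter. */
int short_circuit(void)
{
    counter = 1;
    if (counter > 2 && bump() > 2) {
        /* not reached */
    }
    return counter;
}
```

full_evaluation() returns 2, the value both Pascal executables printed, and short_circuit() returns 1. In Pascal, nesting two if statements is the portable way to obtain the second behavior.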

Part 2: Unspecified behavior in ANSI C.

Two platforms were used for C tests:

Example 1: Order of function argument evaluation

The ANSI C standard explicitly leaves the order of evaluation of function arguments unspecified. When the arguments have side effects, the result depends on whichever order the compiler happens to choose. The following test program was used:

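  #include <stdio.h>

  int a = 0;

  /* Side effect: increments the global a and returns the new value. */
  int inca(void) {
    return ++a;
  }

  int main(void) {
    /* The order in which the four calls are evaluated is unspecified. */
    printf("%d %d %d %d\n", inca(), inca(), inca(), inca());
    return 0;
  }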

On aargh, the Intel machine, the results were:

  4 3 2 1

and on sally, the MIPS-based machine:

  1 2 3 4

As with relying on the presence of boolean short-circuit in Pascal, the programmer is clearly making an error by depending on either behavior. But the risk here is of accidentally using a function argument with side effects and getting different results across platforms, without reasonable hope that the compiler will find the problem.
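
One way to remove the dependence, sketched here under the assumption that the calls are meant to run left to right, is to give each call its own statement before passing the results on. The helper name collect_in_order is hypothetical; a and inca follow the test program.

```c
static int a = 0;

/* Side effect: increments a and returns the new value. */
static int inca(void)
{
    return ++a;
}

/* Each assignment is a separate statement, so the calls happen in
   statement order on every conforming compiler. */
void collect_in_order(int out[4])
{
    out[0] = inca();
    out[1] = inca();
    out[2] = inca();
    out[3] = inca();
}
```

Starting from a equal to 0, the array holds 1, 2, 3, 4 after the call, and passing out[0] through out[3] to printf prints them in that order regardless of how the compiler orders argument evaluation.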

Example 2: Struct Bit-Field Ordering

According to the second edition of Kernighan & Ritchie's The C Programming Language, "almost everything about [struct bit] fields is implementation-dependent". The particular characteristic tested here is that "fields are assigned left to right on some machines and right to left on others." The following program was used:

  #include <stdio.h>

  struct {
    unsigned int f1 : 1;
    unsigned int f2 : 2;
    unsigned int f3 : 3;
    unsigned int f4 : 4;
  } square_peg = { 1, 0, 1, 0 };

  int main(void) {
    unsigned int round_hole = 0xf & *((unsigned int *) &square_peg);
    printf("value of integer-cast struct is 0x%x\n", round_hole);
    return 0;
  }

The program compiles without warnings or errors. On the Intel PentiumPro platform, the output is:

  value of integer-cast struct is 0x9

On the SGI (MIPS) platform, the result is:

  value of integer-cast struct is 0x0

In the former case the bit fields were assigned starting at the least significant end of the unsigned int used to store them. In the latter, they were presumably assigned starting at the most significant end. As with the other unspecified behavior described here, the danger is most likely from erroneous programs that rely on one ordering and behave correctly on their "native" platform, only to break in mysterious ways on another machine or with a new compiler.
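
When a particular layout matters, a common alternative is to pack the fields by hand with shifts and masks, which give the same value on every platform. The sketch below assumes the least-significant-first layout observed on the Intel machine; pack_fields and its parameter names are hypothetical.

```c
/* Packs four fields of widths 1, 2, 3 and 4 bits into one unsigned
   int, starting at bit 0. Each value is masked to its field width
   before being shifted into place. */
unsigned int pack_fields(unsigned int f1, unsigned int f2,
                         unsigned int f3, unsigned int f4)
{
    return  (f1 & 0x1u)
         | ((f2 & 0x3u) << 1)
         | ((f3 & 0x7u) << 3)
         | ((f4 & 0xfu) << 6);
}
```

With the initializer values from the test program, pack_fields(1, 0, 1, 0) & 0xf is 0x9 on any machine, the same value the bit-field version produced only on the Intel platform.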