Subversive C: A guerrilla guide to fooling your tools

 

By the wicked Witch of the West
January 2019

 


These days there is no shortage of commentary from the bloggerati on safer or more secure C. Now Phaedrus brings you something different. Welcome to the Subversive C spot, where The Wicked Witch of the West shows you how both compilers and tools can be fooled by contrived coding, often with surprising consequences.


 

Look at the following program:

 

 

 

 

 

 

/* cimplex-d-0006.c - tests order of evaluation of argument list */
#include <stdio.h> 
void PrintEvalOrder( int p1, int p2, int p3)
{
                  printf("\np1, p2, p3 evaluated in order: p%i, p%i, p%i\n", p1, p2, p3);
                  return;
}
int main(void)
{
                  int i = 0;
                  
                  /* next line attempts to test order of eval. of args to a function call */
                                    
                  PrintEvalOrder((++i), (++i), (++i));       
                  
                  return 0;
}

Compile this under clang-4.0 and you get the warning:

 

cimplex-d-0006.c:15:18: warning: multiple unsequenced modifications to 'i' [-Wunsequenced]
                  PrintEvalOrder((++i), (++i), (++i));
                        ^           ~~

 

and the output when run is:

p1, p2, p3 evaluated in order: p1, p2, p3

revealing that clang chose left-to-right evaluation for the arguments in the argument list.

Fair enough, you might think ... except that modifying the same object more than once between sequence points is, according to the C standard, undefined behaviour. This is an example of where it makes sense for the compiler to carry on and do something sensible rather than bomb out with an error. Yet not all compilers will do this. Compile the program under gcc-5 and you get the compiler diagnostic:

 
cimplex-d-0006.c:15:32: warning: operation on ‘i’ may be undefined [-Wsequence-point]
  PrintEvalOrder((++i), (++i), (++i));
                                ^
cimplex-d-0006.c:15:32: warning: operation on ‘i’ may be undefined [-Wsequence-point]

The messages are still just warnings but this time the word “undefined” alerts us to what the C standard says about the code. The output from the gcc-compiled program is:

 

p1, p2, p3 evaluated in order: p3, p3, p3

 

Here, the output does not contradict the standard because, since the behaviour is undefined, the compiler can do what it pleases with this code. On the other hand, this looks very much like incautious optimisation and one wonders whether this is what the developers of gcc actually intend the behaviour to be.

Now, if we try compiling with tcc, we get the same output as with clang but there is no warning about the undefined behaviour. For a small open-source C compiler, this is perhaps understandable. If we now throw the code at the open-source checker cppcheck, we get the diagnostic message:

 

~.c:15: error: Expression '++i,++i' depends on order of evaluation of side effects
~.c:15: error: Expression '++i,++i,++i' depends on order of evaluation of side effects

 

This is as expected.

 

Here we have tried to exploit multiple side effects to determine the order of evaluation but we have not succeeded. Though the behaviour is strictly speaking undefined, we are probably confident that both clang-4.0 and tcc evaluate the arguments left-to-right. Yet we are still left in the dark about gcc. Not to worry. A small modification to the program suffices to get more helpful output for all three compilers. Consider the program:

 

#include <stdio.h>
static int a[3] = {0, 0, 0};
void PrintEvalOrder( int p1, int p2, int p3)
{
                  printf("\np1, p2, p3 evaluated in order: p%i, p%i, p%i\n", p1, p2, p3);
                  return;
}
int main(void)
{
                  int i = 0;
                  /* next line attempts to test order of eval. of args to a function call */
                                    
                  PrintEvalOrder((a[0]=++i), (a[1]=++i), (a[2]=++i));               
                  
                  return 0;
}

 

Now we get diagnostics from clang, gcc, and cppcheck but not tcc. Output from the clang- and tcc-compiled programs is as before but this time the gcc-compiled program outputs:

 

p1, p2, p3 evaluated in order: p3, p2, p1

 

and we infer that gcc evaluates the argument list right-to-left.

 

As tests for order of evaluation, both programs are imperfect, because they elicit the implementation-specific behaviour with constructs for which the language standard stipulates undefined behaviour. There are, however, contrived means to avoid undefinedness ... and we’ll leave it as a challenge to readers to work out what those means might be. Next time we’ll give the answer to the challenge. For the moment, we’ll outline the kinds of issues that the Subversive C column will look at as time goes on.

 

C is widely used in critical systems. Gradually, as verifiers improve, we are getting closer and closer to the level of code quality that SPARK Ada has achieved for many years now. Whether C will ever catch up with SPARK Ada is a moot point.  Here we’ve shown the variation in compiler behaviour from constructs with highly contrived side effects but the really interesting thing to test is the behaviour of verifiers when analysing such programs.


Although most SPARK Ada is written using the AdaCore tools, the market for C tools is more heterogeneous. Different verifiers use different verification methods and typically need to work with different language subsets. These diverse tools are now firmly in the sights of the Subversive C column and your author fancies her chances of coming up with contrived programs to rattle their cages.

 

Let’s put it in perspective, however. If you are developing critical systems in C, then you should be working with a language subset enforcement tool and a modern C verifier. This is an enormous culture shock to most C programmers when they first encounter it, not least because some of the subsets can be very strict. Sticking to MISRA C is a start but, depending on the verifier, may not be enough.

 

Despite what you may have been told, the principal aim of language subsetting is to render programs tractable for automatic verification tools.

 

In the coming months, the Subversive C articles will be bringing you examples of the kinds of code that different kinds of tools find troublesome. When you see why and how the trouble arises, you’ll understand just why language subsetting is so important.