Frama-C code slicer not loading any C files

I have a 1000-line C file with 10 math algorithms written by a professor. I need to delete 9 of the math functions and all their dependencies from those 1000 lines, so I am having a go with the Frama-C Boron Windows binary installer.
Now it won't load even the simplest example.c file: I select the source file and nothing loads.
The Boron edition is from 2010, so I checked how to compile a later Frama-C; the instructions say that having a space in my Windows 7 user name can cause problems, which is not encouraging.
Is Frama-C my best option for my slicing task?
Here is an example file that won't load:
// File swap.c:
/*@ requires \valid(a) && \valid(b);
  @ ensures A: *a == \old(*b);
  @ ensures B: *b == \old(*a);
  @ assigns *a, *b;
  @*/
void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
    return;
}
Here is the code I wish to take only one function from, the option labelled smooth and swvd: https://sites.google.com/site/kootsoop/Home/cohens_class_code

I looked at the code you linked to, and it does not seem like the best candidate for a Frama-C analysis. For instance, that code is not strictly C99-conforming: it uses some old-style prototypes (including implicit int return types), has functions that are used before they are defined without forward declarations (fft), and is missing a header inclusion (stdlib.h). Those are not big issues, since the changes are relatively simple, and some of them are treated much as gcc -std=c99 treats them: they emit warnings but not errors. However, it's important to notice that they do require a non-zero amount of time, so this won't be a "plug-and-play" solution.
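For concreteness, this is the kind of change involved; the fft signature below is illustrative, not the actual one from the linked code:

/* old-style (K&R) definition, with an implicit int return type */
fft(x, n)
float *x;
int n;
{
    /* ... */
}

/* C99-conforming equivalent, to be declared before its first use */
int fft(float *x, int n)
{
    /* ... */
}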
On a more general note, Frama-C relies on CIL (C Intermediate Language) for C code normalization, so the sliced program will probably not be identical to "the original program minus the sliced statements". If the objective is merely to remove some statements but otherwise keep the code syntactically identical, then Frama-C will not be ideal¹.
Finally, it is worth noting that some Frama-C analyses can help find dead code, and the result is even clearer if the code is already split into functions. For instance, by running the Value analysis on a properly configured program, it is possible to see which statements/functions are never executed. But this does rely on the absence of at least some kinds of undefined behavior.
For example, if uninitialized variables are used in your program (which is forbidden by the C standard, but occasionally happens and goes unnoticed), the Value analysis will stop its propagation, and the code after the faulty read may be marked as dead, since it is semantically dead w.r.t. the standard. It's important to be aware of that, since a naive reading of the result would be misleading.
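A minimal sketch of that situation (do_work is a placeholder function):

void do_work(void);

void f(void)
{
    int x;         /* never initialized */
    if (x > 0)     /* undefined behavior: reads an indeterminate value */
        do_work(); /* Value analysis stops at the read above, so this call
                      and everything after it can be reported as dead code */
}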
Overall, for the code size you mention, I'm not sure Frama-C would be a cost-effective approach, especially if (1) you have never used Frama-C and are having trouble compiling it (Boron is a really old release, not recommended) and (2) you already know your code base, and would therefore be relatively proficient at slicing it manually.
¹That said, I do not know of any C slicer that preserves statements like that; my point is that, while intuitively one might think a C slicer would straightforwardly preserve most of the syntax, C is such a devious language that doing so is very hard, which is why most tools perform some normalization steps beforehand.

Return a struct directly or fill a pointer?

Let's say I have the following function to initialize a data structure:
struct data { int x, y; };  /* definition implied by the initializers below */

void init_data(struct data *d) {
    d->x = 5;
    d->y = 10;
}
However, with the following code:
struct data init_data(void) {
    struct data d = { 5, 10 };
    return d;
}
Wouldn't this be optimized away due to copy elision and be just as performant as the former version?
I tried to do some tests on godbolt to see if the assembly was the same, but when using any optimization flags everything was always entirely optimized away, with nothing left but something like this: movabsq $42949672965, %rax, and I am not sure if the same would happen in real code.
The first version I provided seems to be very common in C libraries, and I do not understand why as they should be both just as fast with RVO, with the latter requiring less code.
The main reason for the first being so common is historical. The second way of initializing structures from literals was not standard (well, it was, but only for static initializers, never for automatic variables), and it was never allowed in assignments (well, I've not checked the status of the recent standards). Even in ancient C, a simple assignment such as:
struct A a, b;
...
a = b; /* this was not allowed a long time ago */
was not accepted at all.
So, in order to be able to compile code on every platform, you had to write it the old way. Normally, modern compilers let you compile legacy code, while the opposite (old compilers accepting new code) is not possible.
And this also applies to returning structures or passing them by value. Apart from normally being a huge waste of resources (it's common to see the whole structure copied onto the stack, or copied back to the proper place once the function returns), old compilers didn't accept these constructs, so to be portable you had to avoid them.
Finally, a comment: don't use your compiler to check whether both constructs generate the same code. It probably does, but you'd come away with the wrong assumption that this is universal, and you'd run into errors. A different implementation can (and is allowed to) translate the code differently and produce different machine code.
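As a footnote to the "recent standards" aside above: since C99, compound literals do make assigning a struct from a literal legal. A minimal sketch:

#include <stdio.h>

struct A { int x, y; };

int main(void)
{
    struct A a;
    a = (struct A){ .x = 5, .y = 10 };  /* C99 compound literal assignment */
    printf("%d %d\n", a.x, a.y);        /* prints: 5 10 */
    return 0;
}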

How to make GCC evaluate functions at compile time?

I am thinking about the following problem: I want to program a microcontroller (let's say an AVR mega type) with a program that uses some sort of look-up table.
The first attempt would be to put the table in a separate file and create it using some other scripting language/program/...; in that case there is quite some effort in creating the necessary C source files.
My thought was now to use the preprocessor and compiler to handle things. I tried to implement this with a table of sine values (just as an example):
#include <avr/io.h>
#include <math.h>

#define S1(i,n) ((uint8_t) sin(M_PI*(i)/n*255))
#define S4(i,n) S1(i,n), S1(i+1,n), S1(i+2,n), S1(i+3,n)

uint8_t lut[] = {S4(0,4)};

void main()
{
    uint8_t val, i;
    for(i=0; i<4; i++)
    {
        val = lut[i];
    }
}
If I compile this code I get warnings about the sin function. Furthermore, in the assembly there is nothing in the .data section. If I just remove the sin in the third line, I do get the data in the assembly. Clearly all the information is available at compile time.
Can you tell me if there is a way to achieve what I intend, namely that the compiler calculates as many values as possible offline? Or is the best way to use an external script/program/... to calculate the table entries and put them into a separate file that will just be #included?
The general problem here is that the sin call makes this initialization de facto illegal according to the rules of the C language: it is not a constant expression per se, and you are initializing an array with static storage duration, which requires one. This also explains why your array is not in the .data section.
C11 (N1570) §6.6/2,3 Constant expressions (emphasis mine):
A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any place that a constant may be.
Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.115)
However, as noted in @ShafikYaghmour's comment, GCC will replace the sin function call with its built-in counterpart (unless the -fno-builtin option is present), and that is likely to be treated as a constant expression. According to 6.57 Other Built-in Functions Provided by GCC:
GCC includes built-in versions of many of the functions in the standard C library. The versions prefixed with __builtin_ are always treated as having the same meaning as the C library function even if you specify the -fno-builtin option.
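If relying on GCC's constant folding feels too fragile, a portable fallback (not part of the original answer, just a sketch) is to fill the table at startup, where the constant-expression rule does not apply:

#include <math.h>
#include <stdint.h>

#define N 256                  /* table size, illustrative */

static uint8_t lut[N];         /* zero-initialized; filled at run time */

/* call once at startup, before the table is used */
static void init_lut(void)
{
    int i;
    for (i = 0; i < N; i++)
        lut[i] = (uint8_t)(sin(M_PI * i / N) * 255.0 + 0.5);
}

Note that on an AVR this costs startup time and links in sin at run time, which may defeat the purpose; the generator approach described below avoids that.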
What you are trying is not part of the C language. In situations like this, I have written code following this pattern:
#if GENERATE_SOURCECODE
int main (void)
{
    ... Code that uses printf to write C code to stdout
}
#else
// Source code generated by the code above
... Here I paste in what the code above generated
// The rest of the program
#endif
Every time you need to change it, you run the code with GENERATE_SOURCECODE defined and paste in the output. This works well if your code is self-contained and the generated output only ever changes when the code generating it changes.
First of all, it should go without saying that you should evaluate (probably by experiment) whether this is worth doing. Your lookup table is going to increase your data size and programmer effort, but may or may not provide the runtime speed increase that you need.
If you still want to do it, I don't think the C preprocessor can do it straightforwardly, because it has no facilities for iteration or recursion.
The most robust way to go about this would be to write a program in C or some other language to print out C source for the table, and then include that file in your program using the preprocessor. If you are using a tool like make, you can create a rule to generate the table file and have your .c file depend on that file.
On the other hand, if you are sure you are never going to change this table, you could write a program to generate it once and just paste it in.
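A minimal sketch of such a generator, written in C and run on the host machine (the file name and table size are illustrative); redirect its output to a header and #include it from the AVR program, which must also include stdint.h for uint8_t:

/* gen_table.c - host-side program that prints the lookup table as C source */
#include <stdio.h>
#include <math.h>

#define N 256   /* number of table entries */

int main(void)
{
    int i;
    printf("static const uint8_t lut[%d] = {\n", N);
    for (i = 0; i < N; i++) {
        /* one half-period of a sine, scaled to the uint8_t range */
        unsigned v = (unsigned)(sin(M_PI * i / N) * 255.0 + 0.5);
        printf("    %u,%s", v, (i % 8 == 7) ? "\n" : " ");
    }
    printf("};\n");
    return 0;
}

Build and run it with something like cc -o gen_table gen_table.c -lm && ./gen_table > sine_table.h, and, if you use make, add a rule so that sine_table.h is regenerated whenever gen_table.c changes.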

Why was mixing declarations and code forbidden up until C99?

I have recently become a teaching assistant for a university course which primarily teaches C. The course standardized on C90, mostly due to widespread compiler support. One of the concepts most confusing to C newbies with previous Java experience is the rule that variable declarations and code may not be intermingled within a block (compound statement).
This limitation was finally lifted with C99, but I wonder: does anybody know why it was there in the first place? Does it simplify variable scope analysis? Does it allow the programmer to specify at which points of program execution the stack should grow for new variables?
I assume the language designers wouldn't have added such a limitation if it had absolutely no purpose at all.
In the very beginning of C, the available memory and CPU resources were really scarce, so it had to compile really fast with minimal memory requirements.
Therefore the C language was designed to require only a very simple compiler which compiles fast. This in turn led to the "single-pass compiler" concept: the compiler reads the source file and translates everything into assembler code as soon as possible, usually while reading the source file. For example: when the compiler reads the definition of a global variable, the appropriate code is emitted immediately.
This trait is visible in C up until today:
C requires "forward declarations" of all and everything. A multi-pass compiler could look forward and deduce the declarations of variables of functions in the same file by itself.
This in turn makes the *.h files necessary.
When compiling a function, the layout of the stack frame must be computed as soon as possible - otherwise the compiler had to do several passes over the function body.
Nowadays no serious C compiler is still "single pass", because many important optimizations cannot be done within one pass. A little more can be found in Wikipedia.
The standards body took quite some time to relax that "single-pass" constraint with regard to the function body. I assume other things were more important.
It was that way because it had always been done that way, it made writing compilers a little easier, and nobody had really thought of doing it any other way. In time people realised that it was more important to favour making life easier for language users rather than compiler writers.
I assume the language designers wouldn't have added such a limitation if it had absolutely no purpose at all.
Don't assume that the language designers set out to restrict the language. Often restrictions like this arise by chance and circumstance.
I guess it was simply easier for a non-optimising compiler to produce efficient code this way:
int a;
int b;
int c;
...
Although three separate variables are declared, the stack pointer can be adjusted once for all of them, without optimising strategies such as reordering, etc.
Compare this to:
int a;
foo();
int b;
bar();
int c;
Adjusting the stack pointer just once here requires a kind of optimisation, though not a very advanced one.
Moreover, as a stylistic issue, the first approach encourages a more disciplined way of coding (no wonder Pascal enforces this too): you can see all the local variables in one place and eventually inspect them together as a whole. This provides a clearer separation between code and data.
Requiring that variable declarations appear at the start of a compound statement did not impair the expressiveness of C89. Anything one could legitimately do with a mid-block declaration could be done just as well by adding an open brace before the declaration and doubling up the closing brace of the enclosing block. While such a requirement may sometimes have cluttered source code with extra opening and closing braces, those braces would not have been mere noise: they would have marked the beginning and end of the variables' scopes.
Consider the following two code examples:
{
    do_something_1();
    {
        int foo;
        foo = something1();
        if (foo) do_something_1(foo);
    }
    {
        int bar;
        bar = something2();
        if (bar) do_something_2(bar);
    }
    {
        int boz;
        boz = something3();
        if (boz) do_something_3(boz);
    }
}
and
{
    do_something_1();
    int foo;
    foo = something1();
    if (foo) do_something_1(foo);
    int bar;
    bar = something2();
    if (bar) do_something_2(bar);
    int boz;
    boz = something3();
    if (boz) do_something_3(boz);
}
From a run-time perspective, most modern compilers probably wouldn't care whether foo is syntactically in scope during the execution of do_something_3(), since they can determine that any value it held before that statement would not be used afterward. On the other hand, encouraging programmers to write declarations in a way that would generate sub-optimal code in the absence of an optimizing compiler is hardly an appealing concept.
Further, while handling the simpler cases of intermixed variable declarations would not be difficult (even a 1970s compiler could have done it, had the authors wanted to allow such constructs), things become more complicated if the block containing intermixed declarations also contains any goto or case labels. The creators of C probably thought that allowing variable declarations to be intermixed with other statements would complicate the standard too much to be worth the benefit.
Back in the days of C's youth, when Dennis Ritchie worked on it, computers (the PDP-11, for example) had very limited memory (e.g. 64K words), and the compiler had to be small, so it could optimize only a very few things, very simply. And at that time (I coded in C on a Sun-4/110 in the 1986-89 era), declaring register variables was really useful for the compiler.
Today's compilers are much more complex. For example, a recent version of GCC (4.6) has more than 5 or 10 million lines of source code (depending upon how you measure it), and does a great many optimizations which did not exist when the first C compilers appeared.
And today's processors are also very different (you cannot assume that today's machines are just like the machines from the 1980s, only thousands of times faster and with thousands of times more RAM and disk). Today the memory hierarchy is very important: waiting on cache misses is what the processor spends most of its time doing. In the 1980s, access to memory was almost as fast (or as slow, by current standards) as the execution of a single machine instruction; this is completely false today: to read from a RAM module, your processor may have to wait several hundred nanoseconds, while for data in the L1 cache it can execute more than one instruction per nanosecond.
So don't think of C as a language very close to the hardware: this was true in the 1980s, but it is false today.
Oh, but you could (in a way) mix declarations and code: declaring new variables was just limited to the start of a block. For example, the following is valid C89 code:
void f()
{
    int a;
    do_something();
    {
        int b = do_something_else();
    }
}

Automatically deleting unused local variables from C source code

I want to delete unused local variables from a C file.
Example:
int fun(int a, int b)
{
    int c, sum = 0;
    sum = a + b;
    return sum;
}
Here the unused variable is 'c'.
I will have a list of all the unused local variables, obtained externally. Given that list, I have to find the corresponding declarations in the source code and delete them.
In the above example, 'c' is the unused variable. I will already know that (I have code for that); what I have to do here is find c and delete it.
EDIT
The point is not to find unused local variables with an external tool. The point is to remove them from the code, given a list of them.
Turn up your compiler warning level, and it should tell you.
Putting your source fragment in "f.c":
% gcc -c -Wall f.c
f.c: In function 'fun':
f.c:1: warning: unused variable 'c'
Tricky - you will have to parse C code for this. How close does the result have to be?
Example of what I mean:
int a, /* foo */
    b, /* << the unused one */
    c; /* bar */
Now, it's obvious to humans that the second comment has to go.
Slight variation:
void test(/* in */ int a, /* unused */ int b, /* out */ int* c);
Again, the second comment has to go, the one before b this time.
In general, you want to parse your input, filter it, and emit everything that is not the declaration of an unused variable. Your parser would have to preserve comments and #include statements, but if you don't #include headers, it may be impossible to recognize declarations (even more so if macros are used to hide the declaration). After all, you need headers to decide whether A * B(); is a function declaration (when A is a type) or a multiplication (when A is a variable).
[edit] Furthermore:
Even if you know that a variable is unused, the proper way to remove it depends a lot on remote context. For instance, assume
int foo(int a, int b, int c) { return a + b; }
Clearly, c is unused. Can you change it to the following?
int foo(int a, int b) { return a + b; }
Perhaps, but not if &foo is stored in an int (*)(int, int, int). And that may happen somewhere else. If (and only if) that happens, you should change it to
int foo(int a, int b, int /*unused*/ ) { return a + b; }
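(An unnamed parameter in a function definition only became legal C with C23, by the way; in older C you would keep the parameter named and simply not use it.) A sketch of the remote-context hazard, with illustrative names:

int foo(int a, int b, int c) { return a + b; }   /* c is unused */

/* Somewhere else in the program, foo's address is stored in a
   three-argument function pointer... */
int (*fp)(int, int, int) = &foo;

int call_it(void)
{
    return fp(1, 2, 3);  /* ...so removing foo's third parameter would
                            change foo's type and break this call */
}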
Why do you want to do this? Assuming you have a decent optimizing compiler (GCC, Visual Studio et al.), the binary output will not be any different whether you remove the int c in your original example or not.
If this is just about code cleanup, any recent IDE will give you quick links to the source code for each warning, just click and delete :)
My answer is more of an elaborate comment on MSalters' very thorough answer.
I would go beyond 'tricky' and say that such a tool is both impossible and inadvisable.
If you are looking simply to remove the references to the variable, then you could write a code parser of your own, but it would need to take into account the function context it is in, such as:
int foo(double a, double b)
{
    b = 10.0;
    return (int) b;
}

int bar(double a, double b)
{
    a = 5.00;
    return (int) a;
}
Any simple parser would have trouble with 'a' being the unused variable in one function and 'b' in the other.
Secondly, if you consider comments, as MSalters has, you'll discover that people do not comment consistently:
double a;
/*a is designed as a dummy variable*/
double b;

/*a is designed as a dummy variable*/
double a;
double b;

double a; /*a is designed as a dummy variable*/
double b;

etc.
So simply removing the unused variables will create orphaned comments, which are arguably more dangerous than not commenting at all.
Ultimately, it is an obscenely difficult task to do elegantly, and you would be mangling code regardless. By automating the process, you would be making the code worse.
Lastly, you should be considering why the variables were in the code in the first place, and if they are deprecated, why they were not deleted when all their references were.
Static code analysis tools can help, in addition to the higher warning level Paul correctly suggested.
As well as being revealed through warnings, these variables will normally be optimised away by the compiler if any optimisations are turned on. Checking whether a variable is never referenced is quite trivial to implement in a compiler.
You will need a good parser that preserves the original character positions of tokens (even in the presence of the preprocessor!). There are some tools for automated refactoring of C/C++, but they are far from mainstream.
I recommend you check out Taras' blog. He is doing some large automated refactorings of the Mozilla codebase, like replacing out-params with return values. His main tool for code rewriting is Pork:
Pork is a C++ parsing and rewriting tool chain. The core of Pork is a C++ parser that provides exact character positions for the start and end of every AST node, as well as the set of macro expansions that contain any location. This information allows C++ to be automatically rewritten in a precise way.
From the blog:
So far pork has been used for "minor" things like renaming classes & functions, rotating outparameters and correcting prbool bugs. Additionally, Pork proved itself in an experiment which involved rewriting almost every function (i.e. generating a 3+MB patch) in Mozilla to use garbage collection instead of reference-counting.
It is for C++, but it may suit your needs.
One of the posters above says "impossible and inadvisable".
Another says "tricky", which is the right answer.
You need 1) a full C (or whatever language of interest) parser, 2) inference procedures that understand the language's identifier references and data flows well enough to determine that a variable is indeed "dead", and 3) the ability to actually modify the source code.
What's hard about all this is the huge energy needed to build 1), 2) and 3); you can't justify it for any individual cleanup task.
What one can do is build such infrastructure specifically with the goal of amortizing it across lots of different program analysis and transformation tasks.
My company offers such a tool: the DMS Software Reengineering Toolkit. See http://www.semdesigns.com/Products/DMS/DMSToolkit.html
DMS has production-quality front ends for many languages, including C, C++, Java and COBOL.
We have in fact built an automated "find useless declarations" tool for Java that does two things: a) it lists them all (thus producing the list!), and b) it makes a copy of the code with the useless declarations removed. You choose which answer you want to keep :-)
To do the same for C would not be difficult. We already have a tool that identifies such dead variables/functions. One case we did not address is the "useless parameter" case, because to remove a useless parameter you have to find all the calls from other modules, verify that setting up the argument doesn't have a side effect, and rip out the useless argument.
We do in fact have full graphs of the entire software system of interest, so this would also be possible. So, it's just tricky, and not even very tricky if you have the right infrastructure.
You can treat this as a text-processing problem. There should be only a small number of regexp patterns matching the ways unused local variables are declared in source code.
Using the list of unused variable names and the line numbers where they occur, you can process the C source line by line. On each line you can iterate over the variable names, and for each variable name you can try the patterns one by one. After a successful match you know the syntax of the declaration, so you know how to delete the unused variable from it.
For example, if the source line is "int a, unused, b;" and the compiler reported "unused" as an unused variable on that line, then the pattern /, unused,/ will match and you can replace that substring with a single ",".
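A minimal sketch of that replacement in C, with the line and the pattern hard-coded for illustration:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[] = "int a, unused, b;";
    const char *pat = ", unused,";   /* matches this declaration style */

    char *hit = strstr(line, pat);
    if (hit != NULL) {
        /* keep the leading ",", then pull the tail of the line forward,
           dropping " unused," (the +1 on the lengths copies the NUL too) */
        memmove(hit + 1, hit + strlen(pat), strlen(hit + strlen(pat)) + 1);
    }
    printf("%s\n", line);            /* prints: int a, b; */
    return 0;
}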
Also: splint.
Splint is a tool for statically checking C programs for security vulnerabilities and coding mistakes. With minimal effort, Splint can be used as a better lint. If additional effort is invested adding annotations to programs, Splint can perform stronger checking than can be done by any standard lint.
