frama-c slicing plugin appears to discard used stack values - c

Problem description
I'm developing a Frama-C plugin that uses the slicing plugin as a library to remove unused bits of automatically generated code. Unfortunately, the slicing plugin drops several stack values that are actually used: their addresses are stored in structures that are handed off to abstract external functions.
Simple example
This is a simpler example that models the same general structure I have.
/* Abstract external function */
void some_function(int* ints[]);
int main() {
int i;
int *p = &i;
int *a[] = { &p };
some_function(a);
return 0;
}
When slicing this example with frama-c-gui -slice-calls some_function experiment_slicing.c (I haven't figured out how to see the slicing output when invoking the command line without the GUI), it drops everything but the declaration int *a[]; and the call to some_function.
Attempted fixes
I tried fixing it by adding ACSL annotations.
However, what I believed to be the sensible specification (see below) did not work:
/*@ requires \valid(ints) && \valid(ints[0]);
*/
void some_function(int* ints[]);
I then tried an assigns clause (see below), which does have the desired behaviour. However, it is not a correct specification, since the function never actually writes through the pointer; it only needs to read it to work correctly. I am worried that moving ahead with such an incorrect specification will lead to weird problems down the line.
/*@ requires \valid(ints) && \valid(ints[0]);
assigns *ints;
*/
void some_function(int* ints[]);

You are on the right track: the assigns clause is what you should use here, as it indicates which parts of the memory state are affected by a call to an undefined function. However, you need to provide a complete assigns clause, including its \from part (which indicates which memory locations are read to compute the new value of the memory location written to).
I have added an int variable to your example, as your function isn't returning a result (void return type). For a function that returns something, you should also have a clause assigns \result \from ...;:
int x;
/*@ assigns x \from indirect:ints[..], *(ints[..]); */
void some_function(int* ints[]);
int main() {
int i;
int*p = &i;
int *a[] = { &p };
some_function(a);
return 0;
}
The assigns clause indicates that some_function might change the value of x, and that the new value is computed from the addresses stored in ints[..] (the indirect label says that their values are not used directly; this is described in more detail in section 8.2 of Eva's manual) and from their contents.
Running frama-c -slice-calls some_function file.c -then-last -print gives the result below. The last arguments print the resulting file on the standard output: -then-last indicates that the following options operate on the last Frama-C project created (here, the one resulting from the slicing), and -print prints the C code of that project. You may also use -ocode output.c to redirect the pretty-printed code into output.c.
/* Generated by Frama-C */
void some_function(int **ints);
void main(void)
{
int i;
int *p = & i;
int *a[1] = {(int *)(& p)};
some_function(a);
return;
}
Note in addition that your example is not well-typed: &p is a pointer to pointer to int, and should thus be stored in an int** array, not an int* array. But I assume that this only stems from reducing your original example and does not matter much for the slicing itself.
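For a function that does return a value, the same idea might look like the sketch below. It is an untested illustration rather than verified Frama-C output: the name read_first and the initial value 42 are made up for this example, the array is made well-typed by storing &i directly, and the \from clause simply mirrors the one above with \result in place of x.

```c
/* Hypothetical external function: reads the pointed-to ints, writes nothing. */
/*@ assigns \result \from indirect:ints[..], *(ints[..]); */
int read_first(int *ints[]);

int main(void)
{
  int i = 42;          /* value read through the pointer */
  int *a[] = { &i };   /* well-typed: an array of int* */
  return read_first(a);
}
```

The intent is that the contract only describes reads, so no write to *ints has to be claimed in order to keep i and a in the slice.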

Related

How to find all memory accesses (global,local) by each function in a given C code?

Given a C code and a variable in the C code (global or a local variable of a function), is there a way to find the functions which uses this variable? This should also show the accesses to the variable by a function if it is also accessed through a pointer.
Tried to extract info using LLVM IR but seems difficult.
int a = 2;
int array1[] = {1,2,3};
int function1(int c, int d) {
return c + d;
}
int function2 (int arg1[], int * p1, int *p2) {
int a;
return arg1[2]+ (*p1) +a + (*p2);
}
int main() {
int e =2, f=3,g;
g = function1(e,f);
int array2[] = {1,2,3,4};
g = function2(array1,&e,array2);
return 0;
}
variables and the functions which uses them
globals:
a - none,
array1 - function2, main
local variables :
function2:a - function2,
main:e - main, function2,
main:f - main,
main:g - main,
main:array2 - main,function2
is there a way to find the functions which uses this variable
Your best shot is to use an IDE; most of them can trace references to global variables.
Alternatively, you can use a static analysis tool like cxref (the one matching https://linux.die.net/man/1/cxref). I used it a long time ago, and it was useful. There is also a documentation tool with the same name, which might work.
As a last resort, if you do not have any other choice, comment out the variable declaration and try building the code. The compiler will raise an error on every reference that no longer resolves. (Minor exception: locally scoped variables that hide global definitions may not raise an error.)
show the accesses to the variable by a function if it is also accessed
through a pointer.
This is extremely hard (impossible for real programs) with static analysis. Usually, this is done at runtime. Some debuggers (e.g. gdb watch) allow you to identify when a variable is being modified (including via pointers). With hardware support it is also possible to set 'read watch' in gdb. See gdb rwatch, and Can I set a breakpoint on 'memory access' in GDB?
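To illustrate why pointer accesses escape a name-based search, here is a small sketch. The names total and accumulate are invented for this example and do not come from the question's code.

```c
#include <stddef.h>

int total = 0;  /* global that a name-based search would look for */

/* Adds n values into *acc. The name "total" never appears in this function,
   yet it modifies total whenever the caller passes &total. */
void accumulate(int *acc, const int *values, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        *acc += values[i];
}
```

A search for the identifier total finds only the call site that passes &total, whereas a gdb watchpoint (watch total) would stop inside accumulate, where the write actually happens.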

Why use "[*]" instead of "[]" in function prototype?

Here is what is written as the rationale for adding the * syntax for declaring array types inside function prototypes, just for clarification before we get into the question:
A function prototype can have parameters that have variable length array types (§6.7.5.2) using a special syntax as in
int minimum(int,int [*][*]);
This is consistent with other C prototypes where the name of the parameter need not be specified.
But I'm pretty confident that we can get the same effect using only ordinary arrays of unspecified size, like this. Here I rewrite the minimum example from the quote above with what I believe is exactly the same functionality (except for using size_t instead of int as the first parameter, which isn't important here):
#include <stdio.h>
int minimum(size_t, int (*)[]);
int (main)()
{
size_t sz;
scanf("%zu", &sz);
int vla[sz];
for(size_t i = 0; i < sz; ++i)
vla[i] = i;
minimum(sizeof(vla) / sizeof(*vla), &vla);
int a[] = { 5, 4, 3, 2, 1, 0 };
minimum(sizeof(a) / sizeof(*a), &a);
}
int minimum(size_t a, int (*b)[a])
{
for(size_t i = 0; i < sizeof(*b) / sizeof(**b); ++i)
printf("%d ", (*b)[i]);
return printf("\n");
}
I'm pretty sure there is a place in the standard stating that two array types are compatible only if their sizes are equal, whether or not they are variable.
My point is also confirmed by the fact that the minimum definition doesn't trigger a "conflicting types" complaint, as it would if some of its parameters had incompatible types (which I don't think is the case, as both of those arrays have a size that is unspecified at compile time; I refer to the second parameter of minimum).
OK besides - can you point me 1 single use-case for [*] that can not be replaced using ordinary unspecified size arrays?
The above code compiles without any warnings using both clang and gcc. It also produces the expected output.
For anyone who doesn't know C (or anyone who thinks they know it): a function parameter of array type is implicitly adjusted to "pointer to its element type". So this:
int minimum(int,int [*][*]);
Gets adjusted to:
int minimum(int,int (*)[*]);
And then I'm arguing that it could be also written as:
int minimum(int,int (*)[]);
Without any consequences and with the same behavior as the 2 forms above. Thus making the [*] form obsolete.
OK besides - can you point me 1 single use-case for [*] that can not
be replaced using ordinary unspecified size arrays?
This would be the case, when you pass three-dimensional VLA array:
int minimum(size_t, int [*][*][*]);
This can be written as:
int minimum(size_t, int (*)[*][*]);
or even using an array of unspecified size:
int minimum(size_t, int (*)[][*]);
But you cannot omit or work around the last index, so it has to stay as [*] in such a declaration.
[] can only be used as the leftmost "dimension specifier" of a multidimensional array, whereas [*] can be used anywhere.
In function parameter declarations, the leftmost (only!) [...] is adjusted to (*) anyway, so one could use (*) in that position at the expense of some clarity.
One can omit the dimension in the next-to-leftmost [...], leaving the empty brackets. This will leave the array element type incomplete. This is not a big deal, as one can complete it close to the point of use (e.g. in the function definition).
The next [...] needs a number or * inside which cannot be omitted. These declarations
int foo (int [*][*][*]);
int foo (int (*)[*][*]);
int foo (int (*)[ ][*]);
are all compatible, but there isn't one compatible with them that doesn't specify the third dimension as either * or a number. If the third dimension is indeed variable, * is the only option.
Thus, [*] is necessary at least for dimensions 3 and up.
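A short sketch of how such a prototype pairs with a definition. The function name sum3d and its parameter order are invented for illustration, and a compiler with VLA support is assumed (C99, or a later standard without __STDC_NO_VLA__ defined).

```c
#include <stddef.h>

/* Prototype: sizes come first, every array dimension is left as [*]. */
long sum3d(size_t, size_t, size_t, int [*][*][*]);

/* Definition: the same parameter, now with the dimensions named. */
long sum3d(size_t n, size_t m, size_t k, int a[n][m][k])
{
    long total = 0;
    for (size_t x = 0; x < n; ++x)
        for (size_t y = 0; y < m; ++y)
            for (size_t z = 0; z < k; ++z)
                total += a[x][y][z];
    return total;
}
```

The prototype cannot name n, m and k because its parameters are unnamed, which is exactly the situation [*] was introduced for.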

getting address of particular instruction in a function [duplicate]

I want to know the length of C function (written by me) at runtime. Any method to get it? It seems sizeof doesn't work here.
There is a way to determine the size of a function. The command is:
nm -S <object_file_name>
This prints the size of each function in the object file. See the nm manual page (man nm) for more information.
You can get this information from the linker if you are using a custom linker script. Add a linker section just for the given function, with linker symbols on either side:
mysec_start = .;
*(.mysection)
mysec_end = .;
Then you can specifically assign the function to that section. The difference between the symbols is the length of the function:
#include <stdio.h>
int i;
__attribute__((noinline, section(".mysection"))) void test_func (void)
{
i++;
}
int main (void)
{
extern unsigned char mysec_start[];
extern unsigned char mysec_end[];
printf ("Func len: %lu\n", (unsigned long) (mysec_end - mysec_start));
test_func ();
return 0;
}
This example is for GCC, but any C toolchain should have a way to specify which section to assign a function to. I would check the results against the assembly listing to verify that it's working the way you want it to.
There is no way in standard C to get the amount of memory occupied by a function.
I have just come up with a solution for the exact same problem, but the code I have written is platform dependent.
The idea is to put known opcodes at the end of the function and search for them from the start, counting the bytes skipped along the way.
Here is the Medium post where I explain it with some code:
https://medium.com/#gurhanpolat/calculate-c-function-size-x64-x86-c1f49921aa1a
Executables (at least ones which have debug info stripped) don't store function lengths in any way. So there's no way for a program to parse this information about itself at runtime. If you have to manipulate functions, you should do something with your objects in the linking phase or by accessing them as files from your executable. For example, you may tell the linker to link symbol tables as an ordinary data section into the executable, give them a name, and parse them when the program runs. But remember, this would be specific to your linker and object format.
Also note that function layout is platform specific, and there are some things that make the term "function length" unclear:
Functions may store the constants they use in code sections directly after the function code and access them using PC-relative addressing (ARM compilers do this).
Functions may have "prologs" and "epilogs" which may be common to several functions and thus lie outside the main body.
Function code may inline other function code.
All of these may or may not count towards the function length.
Also, a function may be completely inlined by the compiler, so that it loses its body.
A fully worked out solution without linker or dirty platform dependent tricks:
#include <stdio.h>
int i;
__attribute__((noinline, section("mysec"))) void test_func (void)
{
i++;
}
int main (void)
{
extern char __start_mysec[];
extern char __stop_mysec[];
printf ("Func len: %lu\n", (unsigned long) (__stop_mysec - __start_mysec));
test_func ();
return 0;
}
That's what you get when you read FazJaxton's answer with jakobbotsch's comment
In e.g. Codewarrior, you can place labels around a function, e.g.
label1:
void someFunc()
{
/* code goes here. */
}
label2:
and then calculate the size like (int)(label2-label1), but this is obviously very compiler dependent. Depending on your system and compiler, you may have to hack linker scripts, etc.
The start of the function is the function pointer, you already know that.
The problem is to find the end, but that can be done this way:
#include <time.h>
int foo(void)
{
int i = 0;
++i + time(0); // time(0) is to prevent optimizer from just doing: return 1;
return i;
}
int main(int argc, char *argv[])
{
return (int)((long)main - (long)foo);
}
It works here because the program has ONLY TWO functions, so if the code is re-ordered (main placed before foo) you will get an irrelevant (negative) result, letting you know that it did not work this way but that it WOULD work if you moved the foo() code into main(): just subtract the main() size you got from the initial negative result.
If the result is positive, then it will be correct if no padding is done (yes, some compilers happily inflate the code, either for alignment or for other, less obvious reasons).
The ending (int)(long) cast is for portability between 32-bit and 64-bit code (function pointers will be longer on a 64-bit platform).
This is fairly portable and should work reasonably well.
There's no facility defined within the C language itself to return the length of a function; there are simply too many variables involved (compiler, target instruction set, object file/executable file format, optimization settings, debug settings, etc.). The very same source code may result in functions of different sizes for different systems.
C simply doesn't provide any sort of reflection capability to support this kind of information (although individual compilers may supply extensions, such as the Codewarrior example cited by sskuce). If you need to know how many bytes your function takes up in memory, then you'll have to examine the generated object or executable file directly.
sizeof func won't work: ISO C does not allow sizeof to be applied to an expression of function type (GCC accepts it as an extension and yields 1), and sizeof &func only gives the size of a pointer to the function, not of the function itself.
Just subtract the address of your function from the address of the next function. But note that it may not work on your system, so use it only if you are 100% sure:
#include <stdint.h>
int function() {
return 0;
}
int function_end() {
return 0;
}
int main(void) {
intptr_t size = (intptr_t) function_end - (intptr_t) function;
}
There is no standard way of doing it in either C or C++. There might naturally exist implementation- or platform-specific ways of doing it, but I am not aware of any.
size_t try_get_func_size_x86(void* pfn, bool check_prev_opcode = true, size_t max_opcodes_runout = 10000)
{
const unsigned char* op = (const unsigned char*)pfn;
for(int i = 0; i < max_opcodes_runout; i++, op++)
{
size_t sz_at = (size_t)(op - (const unsigned char*)pfn) + 1;
switch(*op)
{
case 0xC3: // ret Opcode
case 0xC2: // ret x Opcode
if(!check_prev_opcode)
return sz_at;
switch(*(op-1)) // Checking Previous Opcode
{
case 0x5D: // pop ebp
case 0x5B: // pop ebx
case 0x5E: // pop esi
case 0x5F: // pop edi
case 0xC9: // leave
return sz_at;
}
}
}
return 0;
}
You can find the length of your C function by subtracting the addresses of functions.
Let me provide you an example
#include <stdio.h>
int function1(void)
{
return 0;
}
int function2(void)
{
int a = 0, b = 0; /* just defining some variables to increase the code size */
printf("This function would take more memory than the earlier function, i.e. function1\n");
return a + b;
}
int main(void)
{
/* Converting a function pointer to void * is a common extension, not strict ISO C. */
printf("Printing the address of function1 %p\n", (void *)function1);
printf("Printing the address of function2 %p\n", (void *)function2);
printf("Printing the address of main %p\n", (void *)main);
return 0;
}
Hope you get your answer after compiling and running it: the output lets you see the difference between the addresses of function1 and function2.
Note: function start addresses are commonly aligned (often to 16 bytes on x86), so the difference may include padding.

Direct access to the function stack

I previously asked a question about C functions which take an unspecified number of parameters e.g. void foo() { /* code here */ } and which can be called with an unspecified number of arguments of unspecified type.
When I asked whether it is possible for a function like void foo() { /* code here */ } to get the parameters with which it was called e.g. foo(42, "random") somebody said that:
The only you can do is to use the calling conventions and knowledge of the architecture you are running at and get parameters directly from the stack. source
My question is:
If I have this function
void foo()
{
// get the parameters here
};
And I call it: foo("dummy1", "dummy2") is it possible to get the 2 parameters inside the foo function directly from the stack?
If yes, how? Is it possible to have access to the full stack? For example if I call a function recursively, is it possible to have access to each function state somehow?
If not, what's the point with the functions with unspecified number of parameters? Is this a bug in the C programming language? In which cases would anyone want foo("dummy1", "dummy2") to compile and run fine for a function which header is void foo()?
Lots of 'if's:
You stick to one version of a compiler.
One set of compiler options.
Somehow manage to convince your compiler to never pass arguments in registers.
Convince your compiler not to treat two calls f(5, "foo") and f(&i, 3.14) with different arguments to the same function as error. (This used to be a feature of, for example, the early DeSmet C compilers).
Then the activation record of a function is predictable (ie you look at the generated assembly and assume it will always be the same): the return address will be there somewhere and the saved bp (base pointer, if your architecture has one), and the sequence of the arguments will be the same. So how would you know what actual parameters were passed? You will have to encode them (their size, offset), presumably in the first argument, sort of what printf does.
Recursion makes no difference: each instance has its own activation record (did I say you have to convince your compiler never to optimise tail calls?), but in C, unlike in Pascal, you don't have a link back to the caller's activation record (i.e. its local variables), since there are no nested function declarations. Getting access to the full stack, i.e. all the activation records before the current instance, is pretty tedious, error prone and mostly of interest to writers of malicious code who would like to manipulate the return address.
So that's a lot of hassle and assumptions for essentially nothing.
Yes, you can access passed parameters directly via the stack. But no, you can't use an old-style function definition to create a function with a variable number and type of parameters. The following code shows how to access a parameter via the stack pointer. It is totally platform dependent, so I have no clue whether it is going to work on your machine, but you can get the idea:
#include <stdio.h>
long foo();
int main(void)
{
printf( "%ld",foo(7L)); /* pass a long to match the old-style parameter */
}
long foo(x)
long x;
{
register void* sp asm("rsp");
printf("rsp = %p rsp_ value = %lx\n",sp+8, *((long*)(sp + 8)));
return *((long*)(sp + 8)) + 12;
}
get stack head pointer (rsp register on my machine)
add the offset of passed parameter to rsp => you get pointer to long x on stack
dereference the pointer, add 12 (do whatever you need) and return the value.
The offset is the issue since it depends on compiler, OS, and who knows on what else.
For this example I simply checked it in a debugger, but if it is really important for you, I think you can come up with a "general" solution for your machine.
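If you declare void foo() in C++ (or in C23, where an empty parameter list means no parameters), then you will get a compilation error for foo("dummy1", "dummy2"). In earlier versions of C the call compiles, because the declaration says nothing about the parameters.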
You can declare a function that takes an unspecified number of arguments as follows (for example):
int func(char x,...);
As you can see, at least one argument must be specified. This is so that inside the function, you will be able to access all the arguments that follow the last specified argument.
Suppose you have the following call:
short y = 1000;
int sum = func(1,y,5000,"abc");
Here is how you can implement func and access each of the unspecified arguments:
int func(char x,...)
{
short y = (short)((int*)&x+1)[0]; // y = 1000
int z = (int )((int*)&x+2)[0]; // z = 5000
char* s = (char*)((int*)&x+3)[0]; // s[0...2] = "abc"
return x+y+z+s[0]; // 1+1000+5000+'a' = 6098
}
The problem here, as you can see, is that the type of each argument and the total number of arguments are unknown. So any call to func with an "inappropriate" list of arguments, may (and probably will) result in a runtime exception.
Hence, typically, the first argument is a string (const char*) which indicates the type of each of the following arguments, as well as the total number of arguments. In addition, there are standard macros for extracting the unspecified arguments: va_start, va_arg and va_end.
For example, here is how you can implement a function similar in behavior to printf:
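#include <stdarg.h>
#include <stdio.h>
extern FILE* global_fp; /* assumed to be opened elsewhere */
void log_printf(const char* data,...)
{
static char str[256] = {0};
va_list args;
va_start(args,data);
vsnprintf(str,sizeof(str),data,args);
va_end(args);
/* Write with fputs: passing str as a format string would misinterpret any '%' in it. */
fputs(str,global_fp);
fputs(str,stdout);
}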
P.S.: the example above is not thread-safe, and is only given here as an example...
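The standard way to read the unnamed arguments one at a time is the va_arg macro from <stdarg.h>, used between va_start and va_end. The sketch below assumes the caller passes a count followed by exactly that many int values; the function name sum_ints is made up for this example.

```c
#include <stdarg.h>

/* Sums `count` int arguments; the caller must pass exactly that many ints. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);  /* fetch the next argument as an int */
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) adds the three trailing values. As with printf, nothing checks that the count and the types match the arguments actually passed.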

What could be case of use `int x = x;` expression (C language)?

I have a library written in C. In the code I found a few lines like int x = x;. I need to rewrite all these pieces of code so that they compile with the /Zw flag. In some places it means int x = some_struct->x;, but in other cases I don't understand what it is. In some places it is the first use of the variable x. So in which cases could such an int x = x; expression be used?
void oc_enc_tokenize_dc_frag_list(oc_enc_ctx *_enc,int _pli,
const ptrdiff_t *_coded_fragis,ptrdiff_t _ncoded_fragis,
int _prev_ndct_tokens1,int _prev_eob_run1){
const ogg_int16_t *frag_dc;
ptrdiff_t fragii;
unsigned char *dct_tokens0;
unsigned char *dct_tokens1;
ogg_uint16_t *extra_bits0;
ogg_uint16_t *extra_bits1;
ptrdiff_t ti0;
ptrdiff_t ti1r;
ptrdiff_t ti1w;
int eob_run0;
int eob_run1;
int neobs1;
int token;
int eb;
int token1=token1;
int eb1=eb1;
/*Return immediately if there are no coded fragments; otherwise we'd flush
any trailing EOB run into the AC 1 list and never read it back out.*/
if(_ncoded_fragis<=0)return;
frag_dc=_enc->frag_dc;
dct_tokens0=_enc->dct_tokens[_pli][0];
dct_tokens1=_enc->dct_tokens[_pli][1];
extra_bits0=_enc->extra_bits[_pli][0];
extra_bits1=_enc->extra_bits[_pli][1];
ti0=_enc->ndct_tokens[_pli][0];
ti1w=ti1r=_prev_ndct_tokens1;
eob_run0=_enc->eob_run[_pli][0];
/*Flush any trailing EOB run for the 1st AC coefficient.
This is needed to allow us to track tokens to the end of the list.*/
eob_run1=_enc->eob_run[_pli][1];
if(eob_run1>0)oc_enc_eob_log(_enc,_pli,1,eob_run1);
/*If there was an active EOB run at the start of the 1st AC stack, read it
in and decode it.*/
if(_prev_eob_run1>0){
token1=dct_tokens1[ti1r];
eb1=extra_bits1[ti1r];
ti1r++;
eob_run1=oc_decode_eob_token(token1,eb1);
Code example: the variable token1. This is the first use of token1 in the file; token1 never appears in other files, and it's not global or static anywhere...
Update: with the /Zw flag: error C4700: uninitialized local variable 'token1' used
Without the flag, everything works fine with this lib.
Update 2
It's the theora 1.1.1 lib.
Summary
On the advice of the people in the comments, I replaced every int x = x; with int x = 0 and everything works fine =) Thanks everyone for the answers.
If you literally have int x = x;, there isn't much use of it. This piece attempts to initialize x with itself, that is, with the value of an uninitialized variable.
This may suppress some compiler warnings/errors related to uninitialized or unused variables. But some compilers can catch these dubious cases as well.
This probably also results in undefined behavior from the C standard's view point.
EDIT: Random Number Bug in Debian Linux is an article (with further links) about use and abuse of uninitialized variables and the price one may pay one day.
It prevents the compiler from emitting a warning that the variable is unused.
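A minimal sketch of the replacement described in the question's summary, using made-up names (first_token_or_zero, have_run) rather than the theora code: the variable is only assigned on one branch, so it gets an explicit initial value instead of the int x = x; self-initialization.

```c
/* Hypothetical helper: returns the first token when a run is active, else 0. */
int first_token_or_zero(int have_run, const unsigned char *tokens)
{
    int token1 = 0;          /* instead of: int token1 = token1; */

    if (have_run > 0)
        token1 = tokens[0];  /* only assigned on this path */
    return token1;
}
```

With the explicit initializer, every path reads a defined value, so the program no longer relies on the compiler tolerating a read of an uninitialized variable.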

Resources