Dynamic array with Frama-C and Eva - c

In https://stackoverflow.com/a/57116260/946226 I learned how to verify that a function foo that operates on a buffer (given by a begin and an end pointer) really only reads from it, by creating a representative main function that calls it:
#include <stddef.h>
#define N 100
char test[N];
extern char *foo(char *, char *);
int main() {
    char *beg, *end;
    beg = &test[0];
    end = &test[0] + N;
    foo(beg, end);
}
but this does not catch bugs that only appear when the buffer is very short.
I tried the following:
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include "__fc_builtin.h"
extern char *foo(char *, char *);
int main() {
    int n = Frama_C_interval(0, 255);
    uint8_t *test = malloc(n);
    if (test != NULL) {
        for (int i=0; i<n; i++) test[i] = Frama_C_interval(0, 255);
        char *beg, *end;
        beg = &test[0];
        end = &test[0] + n;
        foo(beg, end);
    }
}
But that does not work:
[eva:alarm] frama-main.c:14: Warning:
out of bounds write. assert \valid(test + i);
Can I make it work?

As mentioned in anol's comment, none of the abstract domains available within Eva is capable of keeping track of the relation between n and the length of the memory block returned by malloc. Hence, for all practical purposes, it will not be possible to get rid of the warning in such circumstances in a real-life analysis. Generally speaking, it is important to prepare an initial state that leads to precise bounds for the buffers that are manipulated throughout the program (while their contents can stay much more abstract).
That said, for smaller experiments, and if you don't mind wasting (quite a lot of) CPU cycles, it is possible to cheat a little bit by instructing Eva to consider each possible length separately. This is done with a few annotations and command-line options (Frama-C 19.0 Potassium only):
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include "__fc_builtin.h"
extern char *foo(char *, char *);
int main() {
    int n = Frama_C_interval(0, 255);
    //# split n;
    uint8_t *test = malloc(n);
    if (test != NULL) {
        //# loop unroll n;
        for (int i=0; i<n; i++) {
            Frama_C_show_each_test(n, i, test);
            test[i] = Frama_C_interval(0, 255);
        }
        char *beg, *end;
        beg = &test[0];
        end = &test[0] + n;
        foo(beg, end);
    }
}
Launch Frama-C with
frama-c -eva file.c \
-eva-precision 7 \
-eva-split-limit 256 \
-eva-builtin malloc:Frama_C_malloc_fresh
In the code, //# split n indicates that Eva should consider separately each possible value of n at this point. It goes along with -eva-split-limit 256 (by default, Eva won't split if the expression can have more than 100 values). //# loop unroll n asks to unroll the loop n times instead of merging results for all steps.
For the other command-line options, -eva-precision 7 sets the various parameters controlling Eva's precision to sensible values. It goes from 0 (less precise than default) up to 11 (maximal precision; don't try it on anything longer than a dozen lines). -eva-builtin malloc:Frama_C_malloc_fresh instructs Eva to create a fresh base address for each call to malloc it encounters. Otherwise, you'd get a single base for all lengths, defeating the purpose of splitting on n in the first place.
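For reference, foo itself is left external in the question; a minimal hypothetical stub that only reads from [beg, end), just so the harness above can be exercised end-to-end, might look like this:
/* Hypothetical stub for foo (not part of the question): returns a pointer to
   the first 'x' in [beg, end), or end if none is found. It only reads from
   the buffer, which is the property the harness is meant to exercise. */
char *foo(char *beg, char *end) {
    for (char *p = beg; p < end; p++) {
        if (*p == 'x')
            return p;
    }
    return end;
}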

Related

Pass a 2D char array to a function in C

I'm a beginning programmer who is confused with passing a two dimensional array to a function. I think it may just be a simple syntax problem. I've looked for an answer, but nothing I've found seems to help, or is too far above my level for me to understand.
I declare the array and the function in the main function as shown below, and after initializing the array, attempt to call the function:
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
int const ROWS = 8;
int const COLS = 8;
int main(int argc, char** argv) {
    char board[ROWS][COLS];
    bool canReach(char board[][], int i, int j);
    //initialize array
    //values of i and j given in a for loop
    canReach(board, i, j);
    return (EXIT_SUCCESS);
}
While writing the function outside the main function, I defined it exactly the same as I did in the main function.
bool canReach(char board[][], int i, int j){
    //Functions purpose
}
When I attempt to build the program, I'm given this error twice and the program does not build:
error: array has incomplete element type 'char[][]'
bool canReach(char board[][], int i, int j)
^
Please note that I'm trying to pass the entire array to the function, and not just a single value. What can I do to fix this problem? I would appreciate it if it didn't have to use pointers, as I find them quite confusing. Also, I've tried to leave out things that I thought weren't important, but I may have missed something I needed, or kept in things I didn't. Thank you for your time in helping out this starting programmer!
You can pass arrays as function arguments if you define their size.
bool canReach(char board[ROWS][COLS], int i, int j);
When the size is unknown, pointers are the way.
bool canReach(char* board, int i, int j);
You should know that arrays != pointers, but pointers can store the address of an array.
Here is a demonstrative program
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
bool canReach( int n, int m, char board[][m] )
{
    for ( int i = 0; i < n; i++ )
    {
        for ( int j = 0; j < m; j++ )
        {
            board[i][j] = 0;
        }
    }
    return printf( "Hello SaarthakSaxena" );
}

int main( void )
{
    const int ROWS = 8;
    const int COLS = 8;
    char board[ROWS][COLS];
    canReach( ROWS, COLS, board );
    return EXIT_SUCCESS;
}
Its output is
Hello SaarthakSaxena
Defining a function inside another function (here: main) is not allowed in C. That is an extension of some compilers (e.g. gcc), but should not be used.
You have to specify the dimensions of the array. C arrays do not carry implicit size information the way arrays in higher-level languages do.
It also is not a good idea to use const variables for array dimensions in C. Instead
#define ROWS 8
#define COLS 8
Assuming i and j are the indexes of an element in the array, you can use the signature:
bool canReach(size_t rows, size_t cols, char board[rows][cols],
              size_t i, size_t j);
This allows passing arrays of (run-time) variable size to the function. If the dimensions are guaranteed to be fixed at run-time:
bool canReach(char board[ROWS][COLS], size_t i, size_t j);
But only if using the macros above. It does not work with the const variables.
Both versions tell the compiler which dimensions the array has, so it can calculate the address of each element. The first dimension might be omitted, but nothing is gained by that and it would inhibit optional bounds checking (a C11 option). Note that the 1D case char ca[] is just a special instance of this general rule: you can always omit only the left(/outer)most dimension.
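For instance, the fixed-size signature above with the first dimension omitted would read (a sketch, using the same COLS macro as before):
bool canReach(char board[][COLS], size_t i, size_t j);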
Note that I changed the index types to the (unsigned) size_t, as that is the appropriate type for array indexing and will generate a conversion warning if warnings are properly enabled (strongly recommended). You can still use int, but then you have to ensure no value becomes negative.
Hint: If you intend to store non-character integers in the array or do arithmetic on the elements, you should specify the signedness of the char type. Plain char can be either unsigned or signed, depending on the implementation, so use unsigned char or signed char, depending on what you want.
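Putting these notes together, a minimal self-contained sketch of the variably-sized signature with size_t indices could look like this (the "a square is reachable if it holds a dot" rule is only a placeholder assumption, since the real rule isn't shown in the question):
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define ROWS 8
#define COLS 8

/* VLA-style parameter: the dimensions come first so they are in scope
   for the array declarator. */
static bool canReach(size_t rows, size_t cols,
                     char board[rows][cols], size_t i, size_t j)
{
    /* Placeholder rule (assumption, not from the question):
       a square is "reachable" if it holds a '.' character. */
    return i < rows && j < cols && board[i][j] == '.';
}

int main(void)
{
    char board[ROWS][COLS];

    for (size_t r = 0; r < ROWS; r++)        /* fill the board with '.' */
        for (size_t c = 0; c < COLS; c++)
            board[r][c] = '.';

    printf("%d\n", canReach(ROWS, COLS, board, 3, 4)); /* prints 1 */
    return 0;
}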

Unexpected performance with global variables

I am getting a strange result using global variables. This question was inspired by another question. In the code below if I change
int ncols = 4096;
to
static int ncols = 4096;
or
const int ncols = 4096;
the code runs much faster and the assembly is much simpler.
//c99 -O3 -Wall -fopenmp foo.c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
int nrows = 4096;
int ncols = 4096;
//static int ncols = 4096;
char* buff;
void func(char* pbuff, int * _nrows, int * _ncols) {
    for (int i=0; i<*_nrows; i++) {
        for (int j=0; j<*_ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}
int main(void) {
    buff = calloc(ncols*nrows, sizeof*buff);
    double dtime = -omp_get_wtime();
    for(int k=0; k<100; k++) func(buff, &nrows, &ncols);
    dtime += omp_get_wtime();
    printf("time %.16e\n", dtime/100);
    return 0;
}
I also get the same result if char* buff is an automatic variable (i.e. not global or static). I mean:
//c99 -O3 -Wall -fopenmp foo.c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
int nrows = 4096;
int ncols = 4096;
void func(char* pbuff, int * _nrows, int * _ncols) {
    for (int i=0; i<*_nrows; i++) {
        for (int j=0; j<*_ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}
int main(void) {
    char* buff = calloc(ncols*nrows, sizeof*buff);
    double dtime = -omp_get_wtime();
    for(int k=0; k<100; k++) func(buff, &nrows, &ncols);
    dtime += omp_get_wtime();
    printf("time %.16e\n", dtime/100);
    return 0;
}
If I change buff to be a short pointer then the performance is fast and does not depend on whether ncols is static or constant, or whether buff is automatic. However, when I make buff an int* pointer I observe the same effect as with char*.
I thought this may be due to pointer aliasing so I also tried
void func(int * restrict pbuff, int * restrict _nrows, int * restrict _ncols)
but it made no difference.
Here are my questions:
1. When buff is either a char* or an int* global pointer, why is the code faster when ncols is declared static or const?
2. Why does buff being an automatic variable instead of global or static make the code faster?
3. Why does it make no difference when buff is a short pointer?
4. If this is due to pointer aliasing, why does restrict have no noticeable effect?
Note that I'm using omp_get_wtime() simply because it's convenient for timing.
As has been noted in the comments, some of these elements allow GCC to make different assumptions when optimizing; the most impactful optimization at play here is most likely loop vectorization. Therefore,
Why is the code faster?
The code is faster because the hot part of it, the loops in func, has been optimized with auto-vectorization. Indeed, when ncols is qualified with static/const, GCC emits:
note: loop vectorized
note: loop peeled for vectorization to enhance alignment
which is visible if you turn on -fopt-info-loop or -fopt-info-vec (or either of those with a further -optimized suffix, which has the same effect here).
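For example, reusing the compile line from the listing's comment, the notes can be requested with something like the following (the exact wording of the output varies between GCC versions):
gcc -std=c99 -O3 -Wall -fopenmp -fopt-info-vec-optimized foo.c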
Why does buff being an automatic variable instead of global or static make the code faster?
In this case, GCC is able to compute the number of loop iterations, which it needs in order to apply vectorization. When buff has external storage (the default when nothing else is specified), this computation fails and the whole vectorization is immediately skipped; when buff is local, the analysis carries on and succeeds.
Why does it make no difference when buff is a short pointer?
Why should it? func accepts a char* which may alias anything.
If this is due to pointer aliasing why does restrict have no noticeable effect?
I don't think it is, because GCC can see that the pointers don't alias at the point where func is invoked: restrict isn't needed.
A const will most likely always yield code that is as fast as, or faster than, a read/write variable, since the compiler knows that the variable won't be changed, which in turn enables a whole lot of optimization options.
Declaring a file scope variable int or static int should not affect performance much, as it will still be allocated at the very same place: the .data section.
But as mentioned in comments, if the variable is global, the compiler might have to assume that some other file (translation unit) might modify it and therefore block some optimization. I suppose this is what's happening.
But this shouldn't be any concern anyhow, since there is never a reason to declare a global variable in C, period. Always declare them as static to prevent the variable from getting abused for spaghetti-coding purposes.
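If a variable must stay a non-static global for some reason, one common workaround, sketched here on func from the question, is to copy the loop bounds into locals before the hot loops; the local copies cannot be modified through pbuff, whatever the storage class of the globals, so the optimizer gets fixed trip counts:
void func(char* pbuff, int * _nrows, int * _ncols) {
    int nr = *_nrows;   /* local copies: writes through pbuff cannot change them */
    int nc = *_ncols;
    for (int i=0; i<nr; i++) {
        for (int j=0; j<nc; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}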
In general I'd also question your benchmarking results. On Windows you should be using QueryPerformanceCounter or similar.
https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx

Segmentation fault (core dumped) in the following code. What's wrong?

I am relatively new to C. I have encountered quite a few segmentation faults but I was able to find the error within a few minutes. However this one's got me confused. Here's a new function I was trying to write. This basically is the C equivalent of the python code
r=t[M:N]
Here's my C code with a test case
#include <stdio.h>
char* subarraym(char* a, int M, int N)
{
    char* s;
    int i;
    for (i=M; i<N; i++){ s[i-M]=a[i]; }
    return s;
}
main()
{
    char* t="Aldehydes and Ketones";
    char* r=subarraym(t,2,10);
    printf("%c\n",r[4]);
    return 0;
}
The expected answer was 'd'. However I got a segmentation fault.
Extra Info: I was using GCC
Your code will not work because your sub-array pointer is never initialized. You could copy the sub-array, but then you would have to manage the memory, and that's overkill for your problem.
In C, arrays are usually passed around as pairs of pointer and number of elements. For example:
void do_something(char *p, int np);
If you follow this idiom, then getting a sub-array is trivial, assuming no overflow:
void do_something_sub(char *p, int np, int m, int n)
{
    do_something(p + m, n - m);  /* the sub-array [m, n) has n - m elements */
}
Checking and managing overflow is also easy, but it is left as an exercise to the reader.
Note 1: Generally, you will not write a function such as do_something_sub(), just call do_something() directly with the proper arguments.
Note 2: Some people prefer to use size_t instead of int for array sizes. size_t is an unsigned type, so you will never have negative values.
Note 3: In C, strings are just like char arrays, but the length is determined by ending them with a NUL char, instead of passing around the length. So to get a NUL-terminated substring, you have to either copy the substring to another char array or modify the original string and overwrite the char just past the end of the substring with a NUL.
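For completeness, here is a small sketch of the in-place variant from Note 3; it modifies the original string, so it only works on a writable array, not on a string literal like the char *t in the question:
#include <stdio.h>

int main(void)
{
    char t[] = "Aldehydes and Ketones";   /* writable copy, not a string literal */
    char *r = &t[2];                      /* start of the substring t[2:10] */
    t[10] = '\0';                         /* cut the string just after index 9 */
    printf("%s\n", r);                    /* prints "dehydes " */
    return 0;
}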
From
...,10);
you expect to receive N - M = 8 chars (plus 1 for a 0-terminator if you want a string), so provide that memory to the function somehow.
Not doing so, but writing to invalid memory by
char * s; /* note, that s is NOT initialised, so it points "nowhere". */
...
s[i-M] = ...
provokes undefined behaviour.
Possible solution to provide memory for such a case can be found in this answer: https://stackoverflow.com/a/25230722/694576
You need to allocate the necessary memory.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char* subarraym(char *a, int M, int N){
    if(N < 0){
        N += strlen(a);
        if(N < 0)
            N = 0;
    }
    int len = N - M;
    char *s = calloc(len+1, sizeof(char)); //memory allocated for the substring
    return memcpy(s, &a[M], len);
}
int main(){
    char *t="Aldehydes and Ketones";
    char *r=subarraym(t,2,10);
    printf("%c\n",r[4]);
    free(r);
    return 0;
}

Pointer to pointer or global variables?

Below I have two examples of code that do the same thing and give the same output. In the first, I use pointer-to-pointer argument passing to eliminate the use of ans as a global. In the second, I made ans a global, which eliminated the additional uses of * needed when dealing with pointer to pointer:
Example 1:
// pointer to pointer
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
unsigned char serial[] = {
    0x1,0x2,0x3,0x4
};
void checkSerial(unsigned char* buf, unsigned char ** ans)
{
    int i;
    unsigned char *part;
    part = 0;
    i=2;
    part = &buf[i];
    *ans = (unsigned char*)malloc(2);
    memset(*ans,0,2);
    memcpy(*ans,part,2);
    printf("0x%x\n",**ans);
    ++(*ans);
    printf("0x%x\n",**ans);
}
int main(void)
{
    unsigned char *ans, *buf;
    while(1)
    {
        buf = malloc(4);
        memset(buf,0,4);
        memcpy(buf, serial, sizeof(serial));
        checkSerial(buf, &ans);
        --ans;
        printf("the value is 0x%x\n", *ans);
        free(buf);
        free(ans);
        sleep(3);
    }
    return 0;
}
Example 2:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
unsigned char serial[] = {
    0x1,0x2,0x3,0x4
};
unsigned char ans[2];
void checkSerial(unsigned char* buf)
{
    int i;
    unsigned char *part;
    part = 0;
    i=2;
    part = &buf[i];
    int j;
    for(j=0;j<2;j++)
    {
        ans[j] = part[j];
    }
    printf("0x%x\n",*ans);
    ++(*ans);
    printf("0x%x\n",*ans);
}
int main(void)
{
    unsigned char *buf;
    while(1)
    {
        buf = malloc(4);
        memset(buf,0,4);
        memcpy(buf, serial, sizeof(serial));
        checkSerial(buf);
        printf("the value is 0x%x\n", *ans);
        free(buf);
        sleep(3);
    }
    return 0;
}
Which technique is preferred in C?
Avoid global variables when they are not necessary. Going with the first example is preferable.
Global variables are easily accessible by every function; they can be read or modified by any part of the program, making it difficult to remember or reason about every possible use.
Keep variables as close to the scope they are being used in as possible. This prevents unexpected values for your variables and potential naming issues.
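As a sketch of that advice applied to the question's code, a third option is to have checkSerial return the allocated answer, so ans is neither a global nor a pointer-to-pointer parameter (a hypothetical rewrite, with error handling kept minimal):
#include <stdlib.h>
#include <string.h>

/* Hypothetical alternative: the callee allocates and returns the answer;
   the caller owns it and must free it. */
unsigned char *checkSerial(const unsigned char *buf)
{
    unsigned char *ans = malloc(2);
    if (ans != NULL)
        memcpy(ans, &buf[2], 2);   /* same two bytes as in the question's examples */
    return ans;
}
main would then do ans = checkSerial(buf); and free(ans) when done, with no & or extra * involved.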
I personally don't like defining global variables where there are ways to avoid them.
But some people say that the concept of pointers is very confusing; I don't feel that way, though.
My advice: if you get confused by pointers, avoid them by defining a global variable. Otherwise, use pointers... :)
TL;DR: Solutions 1 and 2 are both bad.
The way you wrote the example makes malloc useless, since you know the size of ans and buf at compile-time; if those sizes really are known at compile-time, just don't use malloc at all and declare the variables on the stack. In C, generally avoid dynamic memory allocation as much as possible and prefer buffers that can hold the maximum size a buffer can have in your application; that avoids this kind of problem in the first place. The only place where dynamic memory allocation can be useful is for buffers whose sizes are unknown at compile-time, but you can still avoid it (see below). If buf is an incoming message and ans the answer to this message, the size of ans can be unknown at compile-time, at least if you use variable-length messages.
Your version 2 is not working and cannot work! First, you declared ans as an array of size 1 and iterated over it up to index 2 (you have now edited that). Second, to declare the array ans as a global you would need to know its size at compile-time, and of course if you knew its size at compile-time you would just declare the array ans inside the function checkSerial. Moreover, when you declare a file-scope variable that is used by several functions in C, don't forget to declare it static; otherwise it can be accessed from all files in your project.
A solution avoiding dynamic allocation is shown below. Notice that it avoids the disadvantages of both of your solutions (the pointer to pointer and the global variable), and moreover your program cannot leak memory since you don't use dynamic allocation:
#include <stdint.h>
#include <stddef.h>

enum { MSG_MAX_SIZE = 256 };

typedef struct message {
    uint8_t payload[MSG_MAX_SIZE];
    size_t msg_size;
} message_t;

void checkSerial(const message_t *buf, message_t *ans)
{
    //parse buf and determine size of answer
    ...
    ...
    //fill answer payload
    ans->msg_size = buf->payload[42]; // placeholder: answer size derived from the incoming message
}

int main(void)
{
    while (1) {
        message_t buf;
        getMsg(&buf);   // fills the incoming message (provided elsewhere)
        message_t ans;
        checkSerial(&buf, &ans);
    }
}

Microchip C18 - Weird code behavior (maybe extended-mode / non-extended-mode related)

I have this weird problem with the Microchip C18 compiler for PIC18F67J60.
I have created a very simple function that should return the index of a Sub-String in a larger String.
I don't know what's wrong, but the behavior seems to be related to whether extended mode is enabled or not.
With Extended-Mode enabled in MPLAB.X I get:
The memcmppgm2ram function returns zero all the time.
With Extended-Mode disabled in MPLAB.X I get:
The value of iterator variable i counts as: 0, 1, 3, 7, 15, 21
I'm thinking some stack issue or something, because this is really weird.
The complete code is shown below.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char bigString[] = "this is a big string";
unsigned char findSubStr(char *str, const rom char *subStr, unsigned char n, unsigned char m)
{
    unsigned char i;
    for (i=0; i < n-m; i++)
    {
        if(0 == memcmppgm2ram(&str[i], (const far rom void*)subStr, m))
            return i;
    }
    return n; // not found
}
void main(void)
{
    char n;
    n = findSubStr(bigString, (const rom void*)"big", sizeof(bigString), 3);
}
memcmppgm2ram() expects a pointer to data memory (ram) as its first argument. You are passing a pointer to a string literal, which is located in program memory (rom).
You can use memcmppgm() instead, or copy the other string to ram using memcpypgm2ram() or strcpypgm2ram().
Unfortunately I can't test this, as I don't have access to this compiler at the moment.
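For what it's worth, here is a hedged sketch of the copy-to-ram idea that avoids relying on the exact prototypes of the pgm-copy helpers (which I also can't verify here): copy the short search string from program memory into a ram buffer by hand, then compare with plain memcmp on two ram pointers.
unsigned char findSubStr(char *str, const rom char *subStr,
                         unsigned char n, unsigned char m)
{
    char tmp[8];                      /* assumes the search string is short */
    unsigned char i, k;
    for (k = 0; k < m && k < sizeof tmp; k++)
        tmp[k] = subStr[k];           /* read from program memory into ram */
    for (i = 0; i < n-m; i++)
    {
        if (memcmp(&str[i], tmp, m) == 0)
            return i;
    }
    return n; // not found
}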
