Trying to understand C sizing better - c

First things first, the code works, but it didn't for a while, and I'm trying to understand why what I did fixes it.
So I have a function:
int array_size(const char **array) {
int i = 0;
while (array[i] != NULL) ++i;
return i;
}
I also have this pointer which I started with one element and a call to a function which modifies local_mig:
int main(void) {
char **local_mig = malloc(sizeof(char *) * 1);
populate_local_mig(&local_mig);
int size = array_size(local_mig); // 9
}
This function looks like this (note the comment on second to last line):
void populate_local_mig(char ***local_mig) {
// ...above here reads a directory with 5 .sql files
while ((directory = readdir(dir)) != NULL) {
int d_name_len = strlen(directory->d_name);
char *file_name = malloc(sizeof(char) * (d_name_len + 1));
strcpy(file_name, (const char *)directory->d_name);
size_t len = strlen(file_name);
if (len > 4 && strcmp(file_name + len - 4, ".sql") == 0) {
(*local_mig)[i] = malloc(sizeof(char) * (len + 1));
strcpy((*local_mig)[i], file_name);
++i;
*local_mig = realloc(*local_mig, sizeof(char *) * (i + 1));
}
}
//(*local_mig)[i] = NULL;
}
Still with me? Good.
Later on, I call array_size(local_mig); and it returns 9. What the? I was expecting 5. So naturally when I iterate over local_mig later, I eventually segfault when it tries to read the 6th element.
So, I added (*local_mig)[i] = NULL; and suddenly everything was ok and it returned 5, like it should have.
All along I figured since I allocated exactly enough space to fit each character array, that the size would obviously be the number of times I resized local_mig.
Turns out I was wrong... very very wrong. But why, I ask...

If you don't set the last pointer in your list to NULL, you will encounter undefined behavior in your array_size function, as it rolls right past the end of the array (with no marker to stop it) and into memory that you probably do not own and is not initialized.
The unpredicted size of 9 is the result of the aforementioned undefined behavior. It's probably the result of whatever was in memory at the time. Really, though, with UB, anything can happen.

The loop in array_size eventually gets up to testing array[i] != NULL, where i is the last index in the space you allocated with realloc.
If you actually did set this entry to NULL then all is well. But if you didn't: uninitialized values are different to null pointers. Reading an uninitialized value may cause a crash, or the compiler may optimize the program based on the assumption that you never read uninitialized values because the language specification says you aren't meant to do that!
A likely result is that this last entry will appear to contain a junk value which probably does not match NULL. And then your loop continues to read past the end of the allocated space, with unpredictable results.
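To make the fix concrete, here is a minimal sketch rather than the asker's code. `list_append` is a hypothetical helper (it does not appear in the question) that grows the array and rewrites the NULL sentinel on every call, so `array_size` always has something to stop at. It assumes the list pointer starts out as NULL and reports allocation failure through its return value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts entries up to (not including) the NULL sentinel. */
int array_size(char **array) {
    int i = 0;
    while (array[i] != NULL) ++i;
    return i;
}

/*
 * Hypothetical helper: appends a copy of s to a NULL-terminated list that
 * currently holds count strings. *list may be NULL when count is 0.
 * Returns 0 on success, -1 on allocation failure (the list is unchanged).
 */
int list_append(char ***list, int count, const char *s) {
    assert(list != NULL && s != NULL);

    size_t len = strlen(s) + 1;            /* + 1 for the '\0' */
    char *copy = malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, s, len);

    /* Room for the old strings, the new one, and the NULL sentinel. */
    char **grown = realloc(*list, ((size_t)count + 2) * sizeof **list);
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    grown[count] = copy;
    grown[count + 1] = NULL;               /* the marker array_size needs */
    *list = grown;
    return 0;
}
```

Because the sentinel is written inside the helper, there is no single line at the end of the loop that can be forgotten or commented out.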


How to fix segfault caused by a realloc going out of bounds?

Hello and TIA for your help. As I am new to posting questions, I welcome any feedback on how this question has been asked. I have searched SO extensively without finding what I was looking for.
I'm still working on it, and I'm not really good at C.
My purpose is extracting data from certain specific tags from a given XML and writing it to file. My issue arises because as I try to fill up the data struct I created for this purpose, at a certain point the realloc() function gives me a pointer to an address that's out of bounds.
If you look at this example
#include <stdio.h>
int main() {
char **arrayString = NULL;
char *testString;
testString = malloc(sizeof("1234567890123456789012345678901234567890123456789"));
strcpy(testString, "1234567890123456789012345678901234567890123456789");
int numElem = 0;
while (numElem < 50) {
numElem++;
arrayString = realloc(arrayString, numElem * sizeof(char**));
arrayString[numElem-1] = malloc(strlen(testString)+1);
strcpy(arrayString[numElem-1], testString);
}
printf("done\n");
return 0;
}
it does something similar to, but simpler than, my code. It basically tries to fill up the char** with C strings, but it segfaults. (Yes, I understand I am using strcpy and not its safer alternatives, but as far as I understand it copies until the '\0', which is automatically included when you write a string between "", and that's all I need.)
I'll explain in more depth below.
In this code I make use of libxml2, but you don't need to know it to help me.
I have a custom struct declared this way:
struct List {
char key[24][15];
char **value[15];
int size[15];
};
struct List *list; //i've tried to make this static after reading that it could make a difference but to no avail
It is filled up with the necessary key values. list->size[] is initialized with zeros, to keep track of how many values I've inserted in value.
value is declared this way because for each key I need an array of char* to store each and every value associated with it. (I thought this through, but it could be the wrong approach and I am welcome to suggestions, though that's not the purpose of the question.)
I loop through the XML file, and for each node I do a strcmp between the name of the node and each of my keys. When there is a match, the index of that key is used as an index into the value matrix. I then try to extend the allocated memory for the C string matrix and afterwards for the single char*.
The "broken" code follows, where:
read is the index of the matching key mentioned above,
reader is the xmlNode,
string held the name of the xmlNode but has since been freed, so treat it as a fresh char*,
list is the struct declared above.
if (xmlTextReaderNodeType(reader) == 3 && read >= 0)
{
/* pull out the node value */
xmlChar *value;
value = xmlTextReaderValue(reader);
if (value != NULL) {
free(string);
string=strdup(value);
/*increment array size */
list->size[read]++;
/* allocate char** */
list->value[read]=realloc(list->value[read],list->size[read] * sizeof(char**));
if (list->value[read] == NULL)
return 16;
/*allocate string (char*) memory */
list->value[read][list->size[read]-1] = realloc(list->value[read][list->size[read]-1], sizeof(char*)*sizeof(string));
if (list->value[read][list->size[read]-1] == NULL)
return 16;
/*write string in list */
strcpy(list->value[read][list->size[read]-1], string);
}
/*free memory*/
xmlFree(value);
}
xmlFree(name);
free(string);
I'd expect this to allocate the char** and then the char*, but after a few iterations of this code (which is a function wrapped in a while loop) I get a segfault.
Analyzing this with gdb (I'm not an expert with it, I just learned it on the fly), I noticed that the code seems to work as expected for 15 iterations. At the 16th iteration, after the size is incremented, list->value[read][list->size[read]-1] points to 0x51, which gdb marks as an address out of bounds. The realloc only changes it to 0x3730006c6d782e31, still marked as out of bounds. I would expect it to point at the last allocated value.
Here is an image of that: https://imgur.com/a/FAHoidp
How can I properly allocate the needed memory without going out of bounds?
Your code has quite a few problems:
You are not including all the appropriate headers. How did you get this to compile? If you are using malloc and realloc, you need to #include <stdlib.h>. If you are using strlen and strcpy, you need to #include <string.h>.
Not really a mistake, but sizeof only needs parentheses when it is applied to a type name; for an expression they are optional.
Stop using sizeof str to get the length of a string. It only works when str is an array or a string literal; applied to a char * it yields the size of the pointer, which is what happens with sizeof(string) in your XML code. The correct and safe approach is strlen(str)+1.
Don't use sizeof(type) as argument to malloc, calloc or realloc. Instead, use sizeof *ptr. This will avoid your incorrect numElem * sizeof(char**) and instead replace it with numElem * sizeof *arrayString, which correctly translates to numElem * sizeof(char*). This time, though, you were saved by the pure coincidence that sizeof(char**) == sizeof(char*), at least on GCC.
If you are dynamically allocating memory, you must also deallocate it manually when you no longer need it. Use free for this purpose: free(testString);, free(arrayString);.
Not really a mistake, but if you want to cycle through elements, use a for loop, not a while loop. This way your intention is known by every reader.
This code compiles fine on GCC:
#include <stdio.h> //NULL, printf
#include <stdlib.h> //malloc, realloc, free
#include <string.h> //strlen, strcpy
int main()
{
char** arrayString = NULL;
char* testString;
testString = malloc(strlen("1234567890123456789012345678901234567890123456789") + 1);
strcpy(testString, "1234567890123456789012345678901234567890123456789");
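for (int numElem = 1; numElem <= 50; numElem++) // 50 elements, as in the original while loop
{
arrayString = realloc(arrayString, numElem * sizeof *arrayString);
arrayString[numElem - 1] = malloc(strlen(testString) + 1);
strcpy(arrayString[numElem - 1], testString);
}
for (int i = 0; i < 50; i++)
free(arrayString[i]); // every string was allocated separately
free(arrayString);
free(testString);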
printf("done\n");
return 0;
}
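The list above does not touch the crash in the XML excerpt itself, so here is a hedged sketch of that part. When realloc enlarges the char** array, the new last slot holds an indeterminate value; passing that slot to realloc, as the excerpt does, is undefined behavior and would explain the 0x51 seen in gdb. The struct and function names below are hypothetical (a cut-down, single-key stand-in for the question's struct List); the return code 16 mirrors the excerpt.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, single-key stand-in for the question's struct List. */
struct Values {
    char **items;   /* must start as NULL */
    int size;       /* must start as 0 */
};

/* Appends a copy of string; returns 0 on success, 16 on allocation failure. */
int values_push(struct Values *v, const char *string) {
    assert(v != NULL && string != NULL);

    /* Grow the pointer array by one slot; keep the old array on failure. */
    char **grown = realloc(v->items, ((size_t)v->size + 1) * sizeof *v->items);
    if (grown == NULL) return 16;
    v->items = grown;

    /* The new slot is uninitialized, so allocate fresh memory for it with
       malloc, sized by strlen (sizeof on a char * gives the pointer size). */
    size_t len = strlen(string) + 1;
    char *copy = malloc(len);
    if (copy == NULL) return 16;
    memcpy(copy, string, len);

    v->items[v->size] = copy;
    v->size++;                  /* count the slot only once it is valid */
    return 0;
}
```

Incrementing the size last means a failed allocation never leaves a counted but unusable slot behind.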

Function receives a pointer to double, allocates memory and fills resulted array of doubles [duplicate]

My goal is to pass a pointer to double to a function, dynamically allocate memory inside of the function, fill resulted array with double values and return filled array. After lurking attentively everywhere in StackOverflow, I have found two related topics, namely Initializing a pointer in a separate function in C and C dynamically growing array. Accordingly, I have tried to write my own code. However, the result was not the same as it was described in aforementioned topics. This program was run using both gcc and Visual Studio.
First trial.
int main()
{
double *p;
int count = getArray(&p);
<...print content of p...>
return 0;
}
int getArray(double *p)
{
int count = 1;
while(1)
{
if(count == 1)
p = (double*)malloc(sizeof(double));
else
p = (double*)realloc(p, count*sizeof(double));
scanf("%lf", &p[count-1]);
<...some condition to break...>
count++;
}
<... print the content of p ...>
return count;
}
(Here comes the warning from compiler about incompatible argument type. Ignore it).
Input:
1.11
2.22
3.33
Output:
1.11
2.22
3.33
0.00
0.00
0.00
Second trial.
int main()
{
double *p;
int count = getArray(&p);
<...print content of p...>
return 0;
}
int getArray(double **p)
{
int count = 1;
while(1)
{
if(count == 1)
*p = (double*)malloc(sizeof(double));
else
{
double ** temp = (double*)realloc(*p, count*sizeof(double));
p = temp;
}
scanf("%lf", &(*p)[count-1]);
<...some condition to break...>
count++;
}
<... print the content of p ...>
return count;
}
Input:
1.11
2.22
Segmentation error.
I tried this method on several different *nix machines, it fails when the loop uses realloc. SURPRISINGLY, this code works perfect using Visual Studio.
My questions are: the first program allocates and reallocates the memory and even seems to pass the allocated memory back to main(), but all the values are zeroed. What is the problem? As for the second program, what causes the segmentation fault?
The right way of doing it is like this:
int getArray(double **p)
{
int count = 0;
while(1)
{
if(count == 0)
*p = malloc(sizeof(**p));
else
*p = realloc(*p, (count+1)*sizeof(**p));
scanf("%lf", &((*p)[count]));
<...some condition to break...>
count++;
}
<...print content of p...>
return count;
}
If you pass a pointer to a function and you want to change not only the value it points at but also the address it holds, you HAVE to pass a pointer to that pointer (here a double **). Otherwise the function only modifies its own copy of the pointer, and the only alternative is to return the new pointer.
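A small sketch of that difference, using two hypothetical functions that are not part of the question: the first receives a copy of the caller's pointer, the second receives its address.

```c
#include <assert.h>
#include <stdlib.h>

/* Gets a copy of the caller's pointer: the assignment is lost on return.
   The block is freed here only so that the demo does not leak. */
void alloc_by_value(double *p) {
    p = malloc(sizeof *p);
    free(p);
}

/* Gets the address of the caller's pointer, so it can change the pointer.
   Returns 0 on success, -1 if the allocation fails. */
int alloc_by_address(double **p) {
    assert(p != NULL);
    *p = malloc(sizeof **p);
    if (*p == NULL) return -1;
    **p = 1.11;
    return 0;
}
```

After `alloc_by_value(a)` the caller's `a` is unchanged; after `alloc_by_address(&a)` it points at the new block, which the caller must eventually free.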
And save yourself some trouble by using sizeof(var) instead of sizeof(type). If you write int *p; p = malloc(sizeof(int));, then you are writing the same thing (int) twice, which means that you can mess things up if they don't match, which is exactly what happened to you. This also makes it harder to change the code afterwards, because you need to change at multiple places. If you instead write int *p; p = malloc(sizeof(*p)); that risk is gone.
Plus, don't cast malloc. It's completely unnecessary.
One more thing you always should do when allocating (and reallocating) is to check if the allocation was successful. Like this:
if(count == 0)
*p = malloc(sizeof(**p));
else
*p = realloc(*p, (count+1)*sizeof(**p));
if(!*p) { /* Handle error */ }
Also note that realloc accepts a NULL pointer, in which case it behaves like malloc, so the separate malloc is not necessary as long as *p starts out as NULL. Just use the realloc call without the if statement. One thing worth mentioning: if you want to be able to continue execution when realloc fails, you should NOT assign its return value directly to *p. On failure realloc returns NULL and leaves the old block untouched, so overwriting *p would lose your only pointer to it. Do it like this instead:
int getArray(double **p)
{
int count = 0;
// If *p is not pointing to allocated memory or NULL, the behavior
// of realloc will be undefined.
*p = NULL;
while(1)
{
void *tmp = realloc(*p, (count+1)*sizeof(**p));
if(!tmp) {
fprintf(stderr, "Fail allocating");
exit(EXIT_FAILURE);
}
*p = tmp;
// I prefer fgets and sscanf. Makes it easier to avoid problems
// with remaining characters in stdin and it makes debugging easier
const size_t str_size = 100;
char str[str_size];
if(! fgets(str, str_size, stdin)) {
fprintf(stderr, "Fail reading");
exit(EXIT_FAILURE);
}
if(sscanf(str, "%lf", &((*p)[count])) != 1) {
fprintf(stderr, "Fail converting");
exit(EXIT_FAILURE);
}
count++;
// Just an arbitrary exit condition
if ((*p)[count-1] < 1) {
printf("%lf\n", (*p)[count-1]); // count was already incremented, so the last value is at count-1
break;
}
}
return count;
}
You mentioned in comments below that you're having trouble with pointers in general. That's not unusual. It can be a bit tricky, and it takes some practice to get used to it. My best advice is to learn what * and & really mean and really think things through. * is the dereference operator, so *p is the value that exists at address p. **p is the value that exists at address *p. The address operator & is kind of an inverse to *, so *&x is the same as x. Also remember that the [] operator used for indexing is just syntactic sugar. It works like this: p[5] translates to *(p+5), which has the funny effect that p[5] is the same as 5[p].
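The identities in the paragraph above can be checked directly. This is only an illustrative sketch; the function name and the sample array are made up for the purpose.

```c
#include <assert.h>

/* Each assert restates one identity from the paragraph above. */
void check_pointer_identities(void) {
    int a[8] = {0, 10, 20, 30, 40, 50, 60, 70};
    int *p = a;
    int x = 42;
    int *row = &a[2];
    int **pp = &row;

    assert(*&x == x);            /* & and * cancel out */
    assert(p[5] == *(p + 5));    /* indexing is pointer arithmetic */
    assert(p[5] == 5[p]);        /* addition commutes, so this is legal C */
    assert((*pp)[1] == a[3]);    /* dereference first, then index */
    assert(*pp[0] == a[2]);      /* index pp first, then dereference */
}
```

The last two lines show why the parentheses matter with pointers to pointers: `[]` binds tighter than `*`, so `*pp[1]` would read `pp[1]`, which does not exist here.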
In my first version of above code, I used p = tmp instead of *p = tmp and when I constructed a complete example to find that bug, I also used *p[count] instead of (*p)[count]. Sorry about that, but it does emphasize my point. When dealing with pointers, and especially pointers to pointers, REALLY think about what you're writing. *p[count] is equivalent to *(*(p+count)) while (*p)[count] is equivalent to *((*p) + count) which is something completely different, and unfortunately, none of these mistakes was caught even though I compiled with -Wall -Wextra -std=c18 -pedantic-errors.
You mentioned in comments below that you need to cast the result of realloc. That probably means that you're using a C++ compiler, and in that case you need to cast, and it should be (double *). In that case, change to this:
double *tmp = (double*)realloc(*p, (count+1)*sizeof(**p));
if(!tmp) {
fprintf(stderr, "Fail allocating");
exit(EXIT_FAILURE);
}
*p = tmp;
Note that I also changed the type of the pointer. In C, it does not matter what type of pointer tmp is, but in C++ it either has to be a double* or you would need to do another cast: *p = (double*)tmp

Abort trap: 6 error with arrays in c

The following code compiled fine yesterday for a while, started giving the abort trap: 6 error at one point, then worked fine again for a while, and again started giving the same error. All the answers I've looked up deal with strings of some fixed specified length. I'm not very experienced in programming so any help as to why this is happening is appreciated. (The code is for computing the Zeckendorf representation.)
If I simply use printf to print the digits one by one instead of using strings the code works fine.
#include <string.h>
// helper function to compute the largest fibonacci number <= n
// this works fine
void maxfib(int n, int *index, int *fib) {
int fib1 = 0;
int fib2 = 1;
int new = fib1 + fib2;
*index = 2;
while (new <= n) {
fib1 = fib2;
fib2 = new;
new = fib1 + fib2;
(*index)++;
if (new == n) {
*fib = new;
}
}
*fib = fib2;
(*index)--;
}
char *zeckendorf(int n) {
int index;
int newindex;
int fib;
char *ans = ""; // I'm guessing the error is coming from here
while (n > 0) {
maxfib(n, &index, &fib);
n -= fib;
maxfib(n, &newindex, &fib);
strcat(ans, "1");
for (int j = index - 1; j > newindex; j--) {
strcat(ans, "0");
}
}
return ans;
}
Your guess is quite correct:
char *ans = ""; // I'm guessing the error is coming from here
That makes ans point to a read-only array of one character, whose only element is the string terminator. Trying to append to this will write out of bounds and give you undefined behavior.
One solution is to dynamically allocate memory for the string, and if you don't know the size beforehand then you need to reallocate to increase the size. If you do this, don't forget to add space for the string terminator, and to free the memory once you're done with it.
Basically, you have two approaches when you want to receive a string from a function in C:
Caller allocates buffer (either statically or dynamically) and passes it to the callee as a pointer and size. Callee writes data to buffer. If it fits, it returns success as a status. If it does not fit, returns error. You may decide that in such case either buffer is untouched or it contains all data fitting in the size. You can choose whatever suits you better, just document it properly for future users (including you in future).
Callee allocates buffer dynamically, fills the buffer and returns pointer to the buffer. Caller must free the memory to avoid memory leak.
In your case the zeckendorf() function can determine how much memory is needed for the string. The index of the largest Fibonacci number not exceeding the parameter determines the length of the result. Add 1 for the terminating zero and you know how much memory you need to allocate.
So, if you choose the first approach, you need to pass two additional parameters to the zeckendorf() function, char *buffer and int size, and write to buffer instead of ans. You also need some marker to know whether it's the first iteration of the while() loop. If it is, after maxfib(n, &index, &fib); check the condition index+1<=size. If the condition is true, you can proceed with your function. If not, you can return an error immediately.
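As a rough illustration of the first approach, here is a self-contained sketch. It is not the question's maxfib-based loop: `zeckendorf_into` is a hypothetical name, the digits are produced by a plain greedy walk down the Fibonacci sequence 1, 2, 3, 5, ..., and size is taken as a size_t rather than the int suggested above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Caller-allocates variant. Writes the Zeckendorf digits of n, most
 * significant first, into buffer as a '\0'-terminated string.
 * Returns 0 on success, -1 if n < 1 or the buffer is too small.
 * Assumes long long is wider than int, so lo + hi cannot overflow.
 */
int zeckendorf_into(int n, char *buffer, size_t size) {
    assert(buffer != NULL);
    if (n < 1) return -1;

    /* lo and hi are consecutive Fibonacci numbers from 1, 2, 3, 5, ... */
    long long lo = 1, hi = 2;
    size_t digits = 1;
    while (hi <= n) {
        long long next = lo + hi;
        lo = hi;
        hi = next;
        digits++;
    }

    /* lo is now the largest Fibonacci number <= n. */
    if (digits + 1 > size) return -1;   /* digits plus the '\0' */

    long long rest = n;
    for (size_t i = 0; i < digits; i++) {
        if (lo <= rest) {
            buffer[i] = '1';
            rest -= lo;
        } else {
            buffer[i] = '0';
        }
        long long prev = hi - lo;       /* step one Fibonacci number down */
        hi = lo;
        lo = prev;
    }
    buffer[digits] = '\0';
    return 0;
}
```

The size check happens before anything is written, so on failure the caller's buffer is left untouched, which is one of the two documented behaviors suggested above.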
For second approach initialize the ans as:
char *ans = NULL;
after maxfib(n, &index, &fib); add:
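if(ans==NULL) {
ans=malloc(index+1);
if(ans==NULL) return NULL; // allocation failed
ans[0]='\0'; // strcat needs a valid (empty) string to append to
}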
and continue as you did. Return ans from the function. Remember to call free() in the caller when the result is no longer needed, to avoid a memory leak.
In both cases remember to write the terminating \0 to the buffer.
There is also a third approach. You can declare ans as:
static char ans[20];
inside zeckendorf(). The function shall behave as in the first approach, but the buffer and its size are hardcoded. I recommend to #define BUFSIZE 20, declare the variable as static char ans[BUFSIZE]; and use BUFSIZE when checking the available size. Because a static buffer keeps its contents between calls, reset it (ans[0] = '\0';) at the start of each call before appending. Please be aware that this works only in a single-threaded environment, and every call to zeckendorf() will overwrite the previous result. Consider the following code.
char *a,*b;
a=zeckendorf(10);
b=zeckendorf(15);
printf("%s\n",a);
printf("%s\n",b);
The zeckendorf() function always returns the same pointer. So a and b would point to the same buffer, where the string for 15 would be stored. So you either need to copy the result somewhere, or do the processing in the proper order:
a=zeckendorf(10);
printf("%s\n",a);
b=zeckendorf(15);
printf("%s\n",b);
As a rule of thumb, the majority of Linux standard C library functions use either the first or the third approach (strdup() is a notable exception that uses the second).
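The overwrite problem of the third approach is easy to reproduce with any function that returns a static buffer. The sketch below uses a hypothetical stand-in, `format_number`, instead of zeckendorf() so that it stays short.

```c
#include <assert.h>
#include <stdio.h>

#define BUFSIZE 20

/* Hypothetical stand-in for a function that returns a static buffer. */
const char *format_number(int n) {
    static char ans[BUFSIZE];
    snprintf(ans, sizeof ans, "%d", n);   /* overwrites the previous result */
    return ans;
}

/* Both calls hand back the same address, so the first result is gone. */
void show_overwrite(void) {
    const char *a = format_number(10);
    const char *b = format_number(15);
    assert(a == b);                       /* one shared buffer */
    printf("%s %s\n", a, b);              /* prints "15 15" */
}
```

Copying the first result into caller-owned storage before making the second call avoids the surprise.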

Segmentation fault on malloc

After running this function many (not sure exactly how many) times, it seg faults on a simple memory allocation. Why would this suddenly happen? I did notice something strange in GDB. In the function that calls it, normally there's 6-digit long hex value for wrd (wrd = 0x605140 for example), however on the call where it crashes, the hex value is only two digits long. (wrd=0x21). I also checked the wrd->length, and it's 3.
The line that it crashes on is...
char *word_temp = malloc(wrd->length * sizeof(char));
EDIT:
Here's the code that creates the wrd...
while(fgets(input, 100, src) != 0)
{
int i = 0;
while(input[i] != '\0')
{
i++;
}
struct word *wrd = malloc(sizeof(struct word));
wrd->letters = input;
wrd->length = i;
If I'm getting an overflow, how do I fix that?
Looks like wrd->length does not include the terminating '\0'.
Fix 1, allocate word_temp like this:
char *word_temp = malloc( wrd->length + 1 );
Fix 2, include the '\0' by modifying your length count loop:
int i = 0;
while(input[i++] != '\0') {}
This will increment i one more time than the code in the question, which is easy to see if you consider the case of input being empty.
Note that you need to do either fix 1 or fix 2, not both. Choose whichever works with the rest of your code.
You probably have a second issue with this line:
wrd->letters = input;
It does not copy input, it copies the pointer. If you change contents of input, contents of wrd->letters changes too, because they point to same memory location. Also if input is a local char array, then once it goes out of scope, wrd->letters becomes a dangling pointer, which will be overwritten by other data, and modifying it after that will result in memory corruption.
Possible fix (depending on rest of your code) is to use strdup:
wrd->letters = strdup(input);
Remember that it is now allocated from heap, so when done, you must remember to do
free(wrd->letters);
About wrd being 0x21: that indicates either memory corruption, or that you actually have two separate wrd variables and one is left uninitialized.
For example, maybe wrd is a function parameter struct word *wrd, in which case you only modify the local value in function, it does not get passed back to the caller. To modify the pointer of caller, you need to have pointer to pointer: struct word **wrd and then do (*wrd) = malloc... and (*wrd)->letters... etc.
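Putting these points together, a constructor could look roughly like the sketch below. The field types of struct word and the names `word_create` and `word_destroy` are assumptions, since the question does not show them. The function copies the input, reserves room for the '\0', and hands the new object back through a pointer to pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout; the question only shows the two member names. */
struct word {
    char *letters;
    size_t length;    /* characters, not counting the '\0' */
};

/* Stores a newly allocated word in *out. Returns 0 on success, -1 on failure. */
int word_create(struct word **out, const char *input) {
    assert(out != NULL && input != NULL);

    struct word *w = malloc(sizeof *w);
    if (w == NULL) return -1;

    w->length = strlen(input);
    w->letters = malloc(w->length + 1);          /* + 1 for the terminator */
    if (w->letters == NULL) {
        free(w);
        return -1;
    }
    memcpy(w->letters, input, w->length + 1);    /* own copy, not an alias */

    *out = w;                                    /* visible to the caller */
    return 0;
}

/* Releases both allocations; accepts NULL like free does. */
void word_destroy(struct word *w) {
    if (w != NULL) {
        free(w->letters);
        free(w);
    }
}
```

Since letters is a private copy, reusing the input buffer for the next fgets call no longer changes words that were created earlier.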

strcpy behaving differently if destination string is uninitialized

I'm working in C trying to create a Huffman decoder. This piece of code only works if codearray comes in uninitialized, otherwise it gives me a segmentation fault. However, valgrind complains that codearray is uninitialized if I do it that way. I went through it with ddd and the segmentation fault happens once strcpy is called and I cannot figure out why.
void printtree_inorder(node* n,char* code,char* letarray,char** codearray)
{
if (n == NULL) {
return;
}
static int counter=0;
appenddigit(code,'0');
printtree_inorder(n -> left,code,letarray,codearray);
remdigit(code);
if (n->let!='\0') {
letarray[counter]=n->let;
strcpy(codearray[counter],code);
counter++;
}
appenddigit(code,'1');
printtree_inorder(n -> right,code,letarray,codearray);
remdigit(code);
}
Here is the calling function:
char code[100]={'\0'};
char** codearray=(char**)malloc(numchars*sizeof(char*));
for (i=0;i<numchars;i++) {
codearray[i]=(char*)malloc(100*sizeof(char));
}
char* letarray=(char*)malloc((numchars+1)*sizeof(char));
letarray[0]='\0';
printtree_inorder(root,code,letarray,codearray);
for (i=0;i<numchars;i++) {
codearray[i]=(char*)malloc(100*sizeof(char));
}
Is this the code you are talking about? It is not really initialization code; it makes room for the data.
char** codearray=(char**)malloc(numchars*sizeof(char*));
just creates an array of char *, but the pointers in it do not point to any valid memory yet.
So your "initialization code" just makes sure that the memory for each string actually exists.
The other thing that really scares me is that your counter variable is static.
Calling
printtree_inorder(root,code,letarray,codearray);
printtree_inorder(root,code,letarray,codearray);
will also end in a segmentation fault, since counter will already have reached numchars when you call it a second time (from outside).
So, let's rewrite your code a bit and make it safer:
char* code = (char *)malloc(numchars + 1);
memset(code, 0, numchars + 1);
char* letarray = (char *)malloc(numchars + 1);
memset(letarray, 0, numchars + 1);
char** codearray = (char **)malloc(numchars * sizeof(char *));
memset(codearray, 0, numchars * sizeof(char *));
int counter = 0; // shared by the whole traversal, reset before every top-level call
printtree_inorder(root, code, letarray, codearray, &counter);
free(code);
// do not forget to free the other allocations later as well
void printtree_inorder(node* n,char* code,char* letarray,char** codearray, int* counter)
{
if (n == NULL) {
return;
}
appenddigit(code,'0');
printtree_inorder(n -> left,code,letarray,codearray, counter);
remdigit(code);
if (n->let!='\0')
{
letarray[*counter] = n->let;
codearray[*counter] = strdup(code); // strdup allocates exactly enough room for the copy
++*counter; // passed by pointer so the increment survives the recursive calls
}
appenddigit(code,'1');
printtree_inorder(n -> right,code,letarray,codearray, counter);
remdigit(code);
}
Probably in the "initialized" call the array isn't really correctly initialized at all, therefore the function crashes.
When "not initialized", the array probably contains values that (by chance) don't lead to a segmentation fault, depending on what the program has done previously with the memory that ends up being used for codearray.
The function tries to copy a string to wherever codearray[counter] points to:
strcpy(codearray[counter],code);
In the function call you show this codearray[counter] is a random value since only the array was malloc'ed, but the elements weren't initialized to any specific values. strcpy() then tries to write to that random memory address.
You have to allocate memory for the copy of the string, for example by using strdup() instead of strcpy().
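As a sketch of that last point: the hypothetical helper below (not from the question) stores a freshly allocated copy in a slot instead of copying into wherever the slot happens to point. It uses malloc and memcpy rather than strdup() only to stay within standard C before C23, and it assumes every slot was set to NULL when the array was created.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stores a copy of code in codearray[index], releasing any earlier copy.
 * Every slot must have been initialized to NULL beforehand.
 * Returns 0 on success, -1 on allocation failure (the slot is unchanged).
 */
int store_code(char **codearray, size_t index, const char *code) {
    assert(codearray != NULL && code != NULL);

    size_t len = strlen(code) + 1;        /* + 1 for the '\0' */
    char *copy = malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, code, len);

    free(codearray[index]);               /* free(NULL) is a no-op */
    codearray[index] = copy;
    return 0;
}
```

With this shape the caller never has to guess a fixed size such as 100 bytes per code.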
