Dynamically allocating and copying an array - c

I sometimes see code like this:
char* copyStr(char* input) {
int inputLength;
char *answer;
inputLength = strlen(input);
answer = malloc(inputLength + 1);
answer = input;
return answer;
}
People often say this code doesn't work and that this pattern
answer = malloc(inputLength + 1);
answer = input;
makes no sense. Why is it so? To my eye, the code is OK. It allocates the right amount of memory for the answer, and then copies the input to the answer. And it seems to work in my tests, for example
int main()
{
printf ("%s\n", copyStr("Hello world!"));
}
does what I expect it to do. So what's wrong with it?

To put it simply, this code:
var = foo();
var = bar();
is 100% equivalent to this in all [1] situations:
foo();
var = bar();
Furthermore, if foo() has no side effects, it's 100% equivalent to just the last line:
// foo();
var = bar();
This goes for ANY function, including malloc. If we forget for a moment what malloc does and just focus on what has just been said, we can quickly realize what's written in the comments in this code:
answer = malloc(inputLength + 1);
// Here, the variable answer contains the return value from the call to malloc
answer = input;
// Here, it contains the value of input. The old value is overwritten, and
// is - unless you saved it in another variable - permanently lost.
What malloc does is really simple. It returns a pointer to a memory block, or a NULL pointer if the allocation failed. [2] That's it. What you are doing with a call like ptr = malloc(size) is absolutely nothing more fancy than storing that address in the pointer variable ptr. And pointer variables are, in the same way, no more fancy than other variables like int or float. An int stores an integer. A pointer stores a memory address. There's no magic here.
[1] It's 100% equivalent unless you're doing really fancy stuff like reading the variable var with an external program.
[2] malloc(0) can return a non-null pointer, but in practice it does not make a difference since it would be undefined behavior to dereference it, and allocating zero bytes is a pretty pointless (haha, point) operation.
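A minimal sketch of what those comments describe. The helper name broken_copy is hypothetical (it is not in the question) and mirrors copyStr. The only change is that it frees the block before overwriting the pointer, so that the sketch itself does not leak; the point is that the returned pointer is simply input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the question's copyStr. */
char *broken_copy(char *input) {
    assert(input != NULL);
    char *answer = malloc(strlen(input) + 1); /* answer holds malloc's return value */
    free(answer);    /* release the block now, because the next line forgets its address */
    answer = input;  /* answer now holds the same address as input; no characters were copied */
    return answer;
}
```

Comparing the return value with the argument (broken_copy(text) == text) shows that no second string ever existed.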

To answer this question, let's look at a somewhat simpler code fragment first.
int answer;
answer = 42;
answer = 0;
Even the most cursory of observers would notice that the first assignment
answer = 42;
is useless. It places the value of 42 into answer, only to be thrown away and replaced with 0 at the very next instant of time. So that line of code can be thrown away completely.
Let's verify this by looking at optimised assembly code generated by a C compiler. The line answer = 42; indeed has no effect on the resulting machine code.
Now compare this to the code in question
answer = malloc(inputLength + 1);
answer = input;
If reasoning by analogy is valid in this case, then we must conclude that the first assignment is useless and can be omitted. We place something (the result of malloc) in answer, only to be thrown away and replaced by something else a moment later.
Of course we cannot say whether it is applicable without further research, but we can confirm our suspicion by looking at the generated assembly again. And it is confirmed. The compiler does not even generate any calls to malloc and strlen! They are indeed useless.
So where does this intuition
It allocates the right amount of memory for the answer, and then copies the input to the answer
break down?
The problem lies in the eternal confusion between pointers and arrays.
One may often see claims that in C, arrays are pointers, or that pointers are arrays, or that arrays and pointers are interchangeable, or any number of variations thereof. These claims are all false and misleading. Pointers and arrays are completely different things. They often work together, but that's a far cry from being one and the same. Let's break down pointers and arrays in the code example.
input is a pointer variable
input (presumably) points into a string, which is an array of char
answer is another pointer variable
malloc(...) dynamically allocates a new array of char and returns a pointer that points into said array
answer = malloc(...) copies that pointer to answer, now answer points into the array allocated by malloc
answer = input copies another pointer (that we have already seen above) into answer
now answer and input point into the same string, and the result of malloc is forgotten and thrown away
So this explains why your code is doing what you expect it to do. Instead of having two identical copies of the string "Hello world!" you have just one string and two different pointers into it. That might seem like just what the doctor ordered, but it breaks down as soon as we do something ever so slightly complicated. For example, code like this
char *lineArray[MAX_LINES];
char buffer[BUF_LEN];
int i = 0;
while (i < MAX_LINES && fgets(buffer, BUF_LEN, stdin)) {
lineArray[i++] = copyStr(buffer);
}
will end up with every element of lineArray pointing into the same string (buffer), instead of into a bunch of different lines taken from stdin.
OK, so now we have established that answer = input copies a pointer. But we want to copy an array, which we have just allocated space for! How do we do that?
Since our arrays are presumably NUL-terminated character strings, we can use a standard library function designed for copying NUL-terminated character strings.
strcpy(answer, input);
For other arrays we can use memcpy. The main difference is that we have to pass down the array length.
memcpy(answer, input, inputLength + 1);
Both variants will work in our case, but the first one is preferred because it reaffirms that we are dealing with strings. Here's the fixed copyStr for completeness:
char* copyStr(char* input) {
int inputLength;
char *answer;
inputLength = strlen(input);
answer = malloc(inputLength + 1);
strcpy(answer, input);
return answer;
}
Incidentally, it works almost the same as the widely available strdup function, which is specified by POSIX and has been part of the C standard since C23 (strdup has a better signature and working error checks, which we have omitted here).
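For reference, here is a sketch of the same function with the error checks put back. The name copy_str_checked and the choice to return NULL on failure are assumptions made for illustration, not something taken from the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of copyStr with error checks.
   Returns NULL if input is NULL or the allocation fails.
   The caller owns the result and must free() it. */
char *copy_str_checked(const char *input) {
    if (input == NULL) {
        return NULL;
    }
    size_t length = strlen(input);      /* strlen returns size_t, not int */
    char *answer = malloc(length + 1);  /* +1 for the terminating NUL */
    if (answer == NULL) {
        return NULL;
    }
    memcpy(answer, input, length + 1);  /* copies the characters and the NUL */
    return answer;
}
```

With this version, changing the original buffer after the call leaves the copy untouched, which is exactly what the lineArray loop needs.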

Related

realloc-ing a struct with flexible array

I ran into a rather weird problem,
I have the following code:
typedef struct{
char *a;
char *b;
char *c;
}Str;
typedef struct{
int size;
str array[]; //flexible array.
}strArr;
The purpose here is to allocate a,b, and c for the new element from the realloc.
StrArr *arr;
int arrSize;
arrSize = 1;
arr = malloc(sizeof(strArr)+sizeof(int)*arrSize);
arr->size++;
arr = realloc(arr, sizeof(strArr)+sizeof(int)*arr->size);
arr->array[arr->size-1].a = malloc(sizeof(char)*75);
arr->size++;
card = realloc(arr, sizeof(strArr)+sizeof(int)*arr->size);
The question is: whenever arr is realloc'd to be one bigger, do you have to allocate memory for the strings of the new element? This code will fail if it is run because it gives me glibc detected at the second realloc. What am I doing wrong? If i take off the malloc statement in the middle it runs. Also, if i try a strcpy into arr->array[arr->size-1].a, it would segfault.
Any help would be appreciated.
Thank you.
There are numerous issues with this code, enough to suggest that whatever you're experiencing can't be reproduced. Nonetheless, there are sufficient problems to cause instability (i.e. segmentation violations). I'm going to assume you meant to use a lowercase s in str rather than an uppercase S in Str; it only makes sense that way. The same goes for StrArr, which should be strArr with a lowercase s.
At which point have you assigned arr->size a value in order for arr->size++; to be useful? That itself is a mistake, but that's interlaced into another mistake:
arr = realloc(arr, sizeof(strArr)+sizeof(int)*arr->size);
That turns out to be a major issue, as you continue to use the uninitialised variable in critical pieces of logic again and again. Nonetheless, once that issue is resolved, the next mistake is this:
Anything that resembles the pattern X = realloc(X, Y); is suspicious. It's the Xes. Those should be different. You're not supposed to just replace the values like that. I mean, it'll work, kind of... but it's not much more effort to do it properly, and unless done properly, this won't be valgrind-friendly. That should be a big deal to you, because valgrind is a tool that helps us identify memory leaks!
You should store this into a temporary variable:
void *temp = realloc(X, Y);
... and then you can handle errors, perhaps by cleaning up and exiting properly:
if (temp == NULL) {
perror("realloc");
/* free(X); // what would valgrind cease complaining about? */
exit(EXIT_FAILURE);
}
... and replacing X with temp:
X = temp;
sizeof(int) should not be assumed to be the same size as sizeof str (whatever str is). Given the type of arr->array, I would expect sizeof str or, better yet, here's a nice pattern to keep in mind:
// X = realloc(Y, Z); or ...
void *temp = realloc(arr, sizeof *arr + arr->size * sizeof arr->array[0]);
// XXX: handle errors
The question is: whenever arr is realloc'd to be one bigger, do you have to allocate memory for the strings of the new element?
The strings themselves should be in a separate storage location to the list nodes. What is this? Strings and list nodes, in the same array?!
I suppose it might make sense if by strings you mean fixed-width, null padded fields. Fixing the width of the field makes expressing the array in a one-dimensional space much easier.
Otherwise, you should keep your strings allocated separately from your list nodes. Doing so in a manner that leaves the down-stream programmer in complete control of the storage is, if I may add, kinda nice, though you lose that the moment you use realloc, malloc, etc. (and thus the moment you use VLAs, hmmmm!)...
What am I doing wrong?
I think I've picked apart your code sufficiently to say:
Initialise all of your variables before you use them. In this case, there are some variables pointed at by arr which are used without first being initialised.
Don't assume sizeof(int) and sizeof (/*any pointer type*/) have the same width. There are very real systems where this won't be true.
Remember to use that X = realloc(Y, Z); pattern, followed by error handling, followed by Y = X;.
I'm still not sure whether forcing down-stream programmers to rely upon malloc/realloc/etc and free is necessary, or even beneficial, here.
Also, if i try a strcpy into arr->array[arr->size-1].a, it would segfault.
Yes, well... there's that phantom arr->size-related issue again!
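Putting those points together, here is a rough sketch of growing the flexible array by one element. The helper name strArr_push and its return convention are my own assumptions; the struct definitions are the question's, with the spelling made consistent. Note that the new element's a, b and c start out as NULL, so each string still needs its own allocation before you strcpy into it.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char *a;
    char *b;
    char *c;
} str;

typedef struct {
    int size;     /* number of elements in array */
    str array[];  /* flexible array member */
} strArr;

/* Hypothetical helper: grow *arr by one element whose pointers are NULL.
   Pass the address of a NULL pointer to create the first element.
   Returns 0 on success, -1 on failure (*arr is left unchanged and still valid). */
int strArr_push(strArr **arr) {
    assert(arr != NULL);
    int old_size = (*arr != NULL) ? (*arr)->size : 0;
    size_t bytes = sizeof **arr + (size_t)(old_size + 1) * sizeof (*arr)->array[0];
    strArr *temp = realloc(*arr, bytes);  /* never assign straight back to *arr */
    if (temp == NULL) {
        return -1;
    }
    temp->array[old_size] = (str){ NULL, NULL, NULL };
    temp->size = old_size + 1;            /* size is always initialised here */
    *arr = temp;
    return 0;
}
```

After a successful call, something like arr->array[arr->size - 1].a = malloc(75); gives the new element storage for one string; those strings have to be freed individually before the array itself.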

String starting state in C

Sorry if this is a bit of a starter question but I am pretty new to C. I am using the GCC compiler. When I write a program with a string in, if the string is beyond a certain length it appears to start with some contents. I am worried about just overwriting it as it could be being used by another program. Here is an example code that shows the issue:
#include <stdio.h>
// Using the GCC Compiler
// Why is there already something in MyString?
int main(void) {
char MyString[250];
printf("%s", MyString);
getch();
return 0;
}
How do I SAFELY avoid this issue? Thanks for your help.
Why is there already something in MyString?
MyString is not initialized and can contain anything.
To initialize to an empty string:
char MyString[250] = { 0 };
or as pointed out by unwind in his answer:
char MyString[250] = "";
which is more readable (and consistent with the following).
To initialize to a string:
char myString[250] = "some-string";
I am worried about just overwriting it as it could be being used by another program
Each running instance of your program will have its own myString.
For some reason many are recommending the array-style initialization of
char myString[50] = { 0 };
however, since this array is intended to be used as a string, I find it far clearer and more intuitive (and simpler syntactically) to use a string initializer:
char myString[50] = "";
This does exactly the same thing, but makes it quite a lot clearer that what you intend to initialize the array as is in fact an empty string.
The situation you're seeing with "random" data is just what happens to be in the array, since you are not initializing it you simply get what happens to be there. This does not mean that the memory is being used by some other program at the same time, so you don't need to worry about that. You do need to worry about handing a pointer to an array of char that is not properly 0-terminated to any C function expecting a string, though.
Technically you are then invoking undefined behavior, which is something you should avoid. It can easily crash your program, since there's no telling how far away into memory you might end up. Operating systems are free to kill processes that try to access memory that they're not allowed to touch.
Properly initializing the array to an empty string avoids this issue.
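If you want to convince yourself that the two spellings of the initializer are equivalent, a small check along these lines should do. The function names are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: returns 1 if every one of the count bytes is zero. */
int all_zero(const char *bytes, size_t count) {
    assert(bytes != NULL);
    for (size_t i = 0; i < count; i++) {
        if (bytes[i] != '\0') {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if both initializer spellings leave the whole array zeroed. */
int initializers_agree(void) {
    char with_braces[250] = { 0 };
    char with_string[250] = "";
    return all_zero(with_braces, sizeof with_braces)
        && all_zero(with_string, sizeof with_string);
}
```

In both cases the elements that are not explicitly initialized are set to zero, so the array is a valid empty string either way.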
The problem is that your string is not initialized.
A C-String ends with '\0', so you should simply put something like
MyString[0] = '\0';
after your declaration. This way you make sure that functions like printf work the way you expect them to work.
char MyString[250] = {0};
or, if you are able to use C++, prefer
std::string
Since you have not initialized the char array to any value, it'll contain some garbage value. It's a good programming practice to use something like:
char MyString[250] = "My Array"; // If you know the array to be used
char MyString[250] = {'\0'}; // If you don't intend to fill the char array data during initialization

C char* pointers pointing to same location where they definitely shouldn't

I'm trying to write a simple C program on Ubuntu using Eclipse CDT (yes, I'm more comfortable with an IDE and I'm used to Eclipse from Java development), and I'm stuck with something weird. On one part of my code, I initialize a char array in a function, and it is by default pointing to the same location with one of the inputs, which has nothing to do with that char array. Here is my code:
char* subdir(const char input[], const char dir[]){
[*] int totallen = strlen(input) + strlen(dir) + 2;
char retval[totallen];
strcpy(retval, input);
strcat(retval, dir);
...}
Ok, at the part I've marked with [*], there is a breakpoint. Even at that breakpoint, when I check my locals, I see that retval is pointing to the same address as my argument input. It's not even possible, as input comes from another function and retval is created in this function. Is it me being inexperienced with C and missing something, or is there a bug somewhere in the C compiler?
It seems so obvious to me that they shouldn't point to the same location (and a valid one, of course; they aren't NULL). When the code goes on, it literally messes up everything; I get random characters and shapes in console and the program crashes.
I don't think it makes sense to check the address of retval BEFORE it appears, it being a VLA and all (by definition the compiler and the debugger don't know much about it, it's generated at runtime on the stack).
Try checking its address after its point of definition.
EDIT
I just read the "I get random characters and shapes in console". It's obvious now that you are returning the VLA and expecting things to work.
A VLA is only valid inside the block where it was defined. Using it outside is undefined behavior and thus very dangerous. Even if the size were constant, it still wouldn't be valid to return it from the function. In this case you most definitely want to malloc the memory.
What cnicutar said.
I hate people who do this, so I hate me ... but ... Arrays of non-constant size (VLAs) are a C99 feature and are not supported by standard C++. Of course GCC has extensions to make it happen.
Under the covers you are essentially doing an _alloca, so your odds of blowing out the stack are proportional to who has access to abuse the function.
Finally, I hope it doesn't actually get returned, because that would be returning a pointer to a stack allocated array, which would be your real problem since that array is gone as of the point of return.
In C++ you would typically use a string class.
In C you would either pass a pointer and length in as parameters, or a pointer to a pointer (or return a pointer) and specify that the caller should call free() on it when done. These solutions all suck because they are prone to leaks or truncation or overflow. :/
Well, your fundamental problem is that you are returning a pointer to the stack allocated VLA. You can't do that. Pointers to local variables are only valid inside the scope of the function that declares them. Your code results in Undefined Behaviour.
At least I am assuming that somewhere in the ..... in the real code is the line return retval.
You'll need to use heap allocation, or pass a suitably sized buffer to the function.
As well as that, you only need +1 rather than +2 in the length calculation - there is only one null-terminator.
Try changing retval to a character pointer and allocating your buffer using malloc().
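Something like the following sketch, for instance. The name subdir_alloc is hypothetical, and returning NULL on allocation failure is my assumption about how you'd want errors reported.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-based version of subdir.
   Returns NULL if the allocation fails; the caller must free() the result. */
char *subdir_alloc(const char *input, const char *dir) {
    assert(input != NULL && dir != NULL);
    size_t input_len = strlen(input);
    size_t dir_len = strlen(dir);
    char *retval = malloc(input_len + dir_len + 1);  /* only one terminator is needed */
    if (retval == NULL) {
        return NULL;
    }
    memcpy(retval, input, input_len);
    memcpy(retval + input_len, dir, dir_len + 1);    /* includes dir's terminator */
    return retval;
}
```

Because the buffer lives on the heap, it is still valid after the function returns, unlike the VLA in the question.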
Pass the two string arguments as char * or const char *
Rather than returning char *, you should just pass another parameter with a string pointer that you already malloc'd space for.
Return bool or int describing what happened in the function, and use the parameter you passed to store the result.
Lastly don't forget to free the memory since you're having to malloc space for the string on the heap...
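#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//retstr is not a const like the other two; it must have room for both strings plus the terminator
bool subdir(const char *input, const char *dir, char *retstr){
    strcpy(retstr, input);
    strcat(retstr, dir);
    return true;
}
int main(void)
{
    char h[]="Hello ";
    char w[]="World!";
    char *greet=malloc(strlen(h)+strlen(w)+1); //Size of the result plus room for the terminator!
    if (greet == NULL) return EXIT_FAILURE; //malloc can fail
    subdir(h,w,greet);
    printf("%s",greet);
    free(greet); //release the heap memory when done
    return 0;
}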
This will print: "Hello World!" added together by your function.
Also, when a string created on the fly has to outlive the function that builds it, you must malloc. A VLA such as char retval[totallen]; is valid C99 inside the function, but it ceases to exist as soon as the function returns, so returning a pointer to it cannot work.

How to prevent dangling pointers/junk in c?

I'm new to C and haven't really grasped when C decides to free an object and when it decides to keep an object.
heap_t is a pointer to a struct heap.
heap_t create_heap(){
heap_t h_t = (heap_t)malloc(sizeof(heap));
h_t->it = 0;
h_t->len = 10;
h_t->arr = (token_t)calloc(10, sizeof(token));
//call below a couple of times to fill up arr
app_heap(h_t, ENUM, "enum", 1);
return h_t;
}
putting h_t through
int app_heap(heap_t h, enum symbol s, char* word, int line){
int it = h->it;
int len = h->len;
if (it + 1 < len ){
token temp;
h->arr[it] = temp;
h->arr[it].sym = s;
h->arr[it].word = word;
h->arr[it].line = line;
h->it = it + 1;
printf(h->arr[it].word);
return 1;
} else {
h->len = len*2;
h->arr = realloc(h->arr, len*2);
return app_heap(h, s, word, line);
}
}
Why does my h_t->arr fill up with junk and eventually I get a segmentation fault? How do I fix this? Any C coding tips/styles to avoid stuff like this?
First, to answer your question about the crash, I think the reason you are getting a segmentation fault is that you fail to multiply len by sizeof(token) in the call to realloc. You end up writing past the end of the block that has been allocated, eventually triggering a segfault.
As far as "deciding to free an object and when [...] to keep an object" goes, C does not decide any of it for you: it simply does it when you tell it to by calling free, without asking you any further questions. This "obedience" ends up costing you sometimes, because you can accidentally free something you still need. It is a good idea to NULL out the pointer, to improve your chance of catching the issue faster (unfortunately, this is not enough to eliminate the problem altogether, because of shared pointers).
free(h->arr);
h -> arr = NULL; // Doing this is a good practice
To summarize, managing memory in C is a tedious task that requires a lot of thinking and discipline. You need to check the result of every allocation call to see if it has failed, and perform many auxiliary tasks when it does.
C does not "decide" anything, if you have allocated something yourself with an explicit call to e.g. malloc(), it will stay allocated until you free() it (or until the program terminates, typically).
I think this:
token temp;
h->arr[it] = temp;
h->arr[it].sym = s;
/* more accesses */
is very weird, the first two lines don't do anything sensible.
As pointed out by dasblinkenlight, you're failing to scale the re-allocation into bytes, which will cause dramatic shrinkage of the array when it tries to grow, and corrupt it totally.
You shouldn't cast the return values of malloc() and realloc(), in C.
Remember that realloc() might fail, in which case you will lose your pointer if you overwrite it like you do.
Lots of repetition in your code, i.e. realloc(h->arr, len*2) instead of realloc(h->arr, h->len * sizeof *h->arr) and so on.
Note how the last bullet point also fixes the realloc() scaling bug mentioned above.
You're not reallocating to the proper size. The realloc statement needs to be:
h->arr = realloc(h->arr, sizeof(token) * len*2);
                         ^^^^^^^^^^^^^
(Or perhaps better: h->arr = realloc(h->arr, sizeof *h->arr * h->len);)
In C, you are responsible for freeing the memory you allocate. You have to free() the memory you've malloc/calloc/realloc'ed when it's suitable to do so. The C runtime never frees anything, except when the program has terminated (some more esoteric systems might not release the memory even then).
Also, try to be consistent: the general form for allocating is always T *foo = malloc(sizeof *foo), and don't duplicate stuff.
e.g.
h_t->arr = (token_t)calloc(10, sizeof(token));
           ^^^^^^^^^       ^^  ^^^^^^^^^^^^^
Don't cast the return value of malloc in C. It's unnecessary and might hide a serious compiler warning and bug if you forget to include stdlib.h
the cast is token_t but the sizeof applies to token, why are they different, and are they the same type as *h_t->arr ?
You already have the magic 10 value, use h_t->len
If you ever change the type of h_t->arr, you have to remember to change the sizeof(..)
So make this
h_t->arr = calloc(h_t->len, sizeof *h_t->arr);
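To tie the advice together, here is a rough sketch of the append function with the size scaled properly and realloc's result checked before the old pointer is replaced. The type definitions are assumed, since the question doesn't show them, and the name app_heap_sketch is made up to keep it apart from the original.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed definitions: the question does not show these types. */
enum symbol { ENUM };
typedef struct { enum symbol sym; char *word; int line; } token;
typedef struct { int it; int len; token *arr; } heap;

/* Appends one token, doubling the array when it is full.
   Assumes h->len >= 1. Returns 1 on success, 0 if realloc fails
   (in that case the heap is left exactly as it was). */
int app_heap_sketch(heap *h, enum symbol s, char *word, int line) {
    assert(h != NULL && h->len >= 1);
    if (h->it == h->len) {
        int new_len = h->len * 2;
        /* scale the element count into bytes using the pointed-to type */
        token *temp = realloc(h->arr, (size_t)new_len * sizeof *h->arr);
        if (temp == NULL) {
            return 0;
        }
        h->arr = temp;
        h->len = new_len;
    }
    h->arr[h->it].sym = s;
    h->arr[h->it].word = word;  /* stores the pointer only; the characters are not copied */
    h->arr[h->it].line = line;
    h->it++;
    return 1;
}
```

A loop replaces the recursion from the question only as a matter of taste; the important parts are the sizeof scaling and the temporary pointer.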
The two main causes of dangling pointers in C are not assigning NULL to a pointer after freeing its allocated memory, and shared pointers.
There is a solution to the first problem: automatically nulling out the pointer.
void SaferFree(void *AFree[])
{
free(AFree[0]);
AFree[0] = NULL;
}
The caller, instead of calling
free(p);
will call
SaferFree(&p);
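One caveat with the function above: if p is, say, a char *, then &p is a char **, which does not convert to void ** without a cast, so compilers will complain. A macro is one way around that. The name FREE_AND_NULL below is hypothetical, and this is only a sketch of the idea.

```c
#include <stdlib.h>

/* Hypothetical macro: frees the block and nulls the caller's own pointer variable.
   It works for any object pointer type because no void ** conversion is involved.
   The argument is evaluated twice, so pass a plain variable,
   not an expression with side effects. */
#define FREE_AND_NULL(ptr) \
    do {                   \
        free(ptr);         \
        (ptr) = NULL;      \
    } while (0)
```

Since free(NULL) does nothing, calling FREE_AND_NULL(p) twice on the same variable is harmless, which is the whole point of nulling the pointer.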
With respect to the second issue, which is harder to solve:
The rule of three (from C++) says:
If you need to explicitly declare either the destructor, copy constructor or copy assignment operator yourself, you probably need to explicitly declare all three of them.
Sharing a pointer in C is simply copying it (copy assignment). It means that applying the rule of three (or the general rule of zero) when programming in C obliges the programmer to supply a way to construct and especially destruct such an assignment, which is possible, but not an easy task, especially since C does not supply a destructor that is implicitly invoked as in C++.

Pointer initialization and string manipulation in C

I have this function which is called about 1000 times from main(). When I initialize a pointer in this function using malloc(), a seg fault occurs, possibly because I did not free() it before leaving the function. Now, I tried free()ing the pointer before returning to main, but it's of no use; eventually a seg fault occurs.
The above scenario being one thing, how do i initialize double pointers (**ptr) and pointer to array of pointers (*ptr[])?
Is there a way to copy a string ( which is a char array) into an array of char pointers.
char arr[]; (Lets say there are fifty such arrays)
char *ptr_arr[50]; Now i want point each such char arr[] in *ptr_arr[]
How do i initialize char *ptr_arr[] here?
What are the effects of uninitialized pointers in C?
Does strcpy() append the '\0' on its own or do we have to do it manually? How safe is strcpy() compared to strncpy()? Like wise with strcat() and strncat().
Thanks.
Segfault can be caused by many things. Do you check the pointer after the malloc (if it's NULL)? Step through the lines of the code to see exactly where it happens (and ask a separate question with more details and code).
You don't seem to understand the relation of pointers and arrays in C. First, a pointer to array of pointers is defined like type*** or type**[]. In practice, only twice-indirected pointers are useful. Still, you can have something like this, just dereference the pointer enough times and do the actual memory allocation.
This is messy. Should be a separate question.
They most likely crash your program, BUT this is undefined, so you can't be sure. They might have the address of an already used memory "slot", so there might be a bug you don't even notice.
From your question, my advice would be to google "pointers in C" and read some tutorials to get an understanding of what pointers are and how to use them - there's a lot that would need to be repeated in an SO answer to get you up to speed.
It's hard to answer your first question without seeing some code -- Segmentation Faults are tricky to track down and seeing the code would be more straightforward.
Double pointers are not more special than single pointers as the concepts behind them are the same. For example...
char *c = malloc(4);
char **cc = &c; // cc holds the address of the pointer variable c
I'm not quite sure what c) is asking, but to answer your last question, using uninitialized pointers is undefined behavior in C, i.e. you shouldn't rely on any specific result happening.
EDIT: You seem to have added a question since I replied...
strcpy(..) will indeed copy the null terminator of the source string to the destination string.
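As for pointing each element of char *ptr_arr[] at an existing char array, here is one way it could look. This is only a sketch: the names point_at_rows, LINE_COUNT and LINE_SIZE are invented, and I'm assuming the char arrays are rows of a two-dimensional array (three rows instead of fifty, to keep it short).

```c
#include <assert.h>
#include <stddef.h>

#define LINE_COUNT 3   /* assumed small count for illustration; the question has fifty */
#define LINE_SIZE 16   /* assumed size of each char array */

/* Points each element of ptr_arr at the matching row. No characters are copied. */
void point_at_rows(char *ptr_arr[], char rows[][LINE_SIZE], size_t count) {
    assert(ptr_arr != NULL && rows != NULL);
    for (size_t i = 0; i < count; i++) {
        ptr_arr[i] = rows[i];  /* rows[i] converts to a pointer to its first char */
    }
}
```

A char **pp can then be set to ptr_arr and indexed the same way (pp[1], pp[2], ...). Since only addresses are stored, a later change to a row is visible through the pointer; if you want independent copies you need malloc plus strcpy for each element.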
for part 'a', maybe this helps:
void myfunction(void) {
int * p = (int *) malloc (sizeof(int));
free(p);
}
int main () {
int i;
for (i = 0; i < 1000; i++)
myfunction();
return 0;
}
Here's a nice introduction to pointers from Stanford.
A pointer is a special type of variable which holds the address or location of another variable. Pointers point to these locations by keeping a record of the spot at which they were stored. Pointers to variables are found by recording the address at which a variable is stored. It is always possible to find the address of a piece of storage in C using the special & operator. For instance: if location were a float type variable, it would be easy to find a pointer to it called location_ptr
float location;
float *location_ptr,*address;
location_ptr = &(location);
or
address = &(location);
The declarations of pointers look a little strange at first. The star * symbol which stands in front of the variable name is C's way of declaring that variable to be a pointer. The four lines above make two identical pointers to a floating point variable called location, one of them is called location_ptr and the other is called address. The point is that a pointer is just a place to keep a record of the address of a variable, so they are really the same thing.
A pointer is a bundle of information that has two parts. One part is the address of the beginning of the segment of memory that holds whatever is pointed to. The other part is the type of value that the pointer points to the beginning of. This tells the computer how much of the memory after the beginning to read and how to interpret it. Thus, if the pointer is of type pointer to int, the segment of memory read will be sizeof(int) bytes long (typically four bytes, i.e. 32 bits, on common platforms) and be interpreted as an integer. In the case of a function, the type is the type of value that the function will return, although the address is the address of the beginning of the function executable.
You've added an additional question about strcpy/strncpy.
strcpy is actually safer in one respect, as long as the destination buffer is large enough for the source.
It copies a nul terminated string, and it adds the nul terminator to the copy. i.e. you get an exact duplicate of the original string.
strncpy on the other hand has two distinct behaviours:
if the source string is fewer than 'n' characters long, it acts just as strcpy, nul terminating the copy (and it pads the rest of the 'n' characters with nuls)
if the source string is greater than or equal to 'n' characters long, then it simply stops copying when it gets to 'n', and leaves the string unterminated. It is therefore necessary to always nul-terminate the resulting string to be sure it's still valid:
char dest[123];
strncpy(dest, source, 123);
dest[122] = '\0';
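If you find yourself repeating those three lines, they can be wrapped up. The helper below is hypothetical (the name copy_truncating and the return convention are my own), and it is just a sketch of the pattern above with the buffer size passed in rather than hard-coded.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies at most dest_size - 1 characters and always
   terminates dest. Returns 1 if source fit completely, 0 if it was truncated. */
int copy_truncating(char *dest, size_t dest_size, const char *source) {
    assert(dest != NULL && source != NULL);
    if (dest_size == 0) {
        return 0;  /* no room even for the terminator */
    }
    strncpy(dest, source, dest_size - 1);
    dest[dest_size - 1] = '\0';  /* strncpy does not terminate on truncation */
    return strlen(source) < dest_size;
}
```

Calling it as copy_truncating(dest, sizeof dest, source) keeps the size in one place, so the 123/122 pair can't drift apart.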

Resources