Is there anything I should know about using strtok on a malloced string?
In my code I have (in general terms)
char* line=getline();
Parse(dest,line);
free(line);
where getline() is a function that returns a char * to some malloced memory.
and Parse(dest, line) is a function that does parsing on line, storing the results in dest (which has been partially filled earlier, from other information).
Parse() calls strtok() a variable number of times on line, and does some validation.
Each token (a pointer to what is returned by strtok()) is put into a queue 'til I know how many I have.
They are then copied onto a malloc'd char** in dest.
Now free(line)
and a function that free's each part of the char*[] in dest, both come up on valgrind as:
"Address 0x5179450 is 8 bytes inside a block of size 38 free'd"
or something similar.
I'm considering refactoring my code not to store the tokens directly in the char** but instead to store a copy of each (by mallocing strlen(token)+1 bytes, then using strcpy()).
There is a function strdup which allocates memory and then copies another string into it.
You ask:
Is there anything I should know about
using strtok on a malloced string?
There are a number of things to be aware of. First, strtok() modifies the string as it processes it, inserting nulls ('\0') where the delimiters are found. This is not a problem with allocated memory (that's modifiable!); it is a problem if you try passing a constant string to strtok().
Second, you must have as many calls to free() as you do to malloc() and calloc() (but realloc() can mess with the counting).
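A minimal sketch of both points, assuming a hypothetical helper named count_tokens (not from the question): it tokenizes a heap-allocated, writable copy so that even a constant string can be passed in, and it pairs its single malloc() with a single free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in `text` without modifying it,
   by running strtok() over a writable heap copy.
   Returns -1 if the copy cannot be allocated. */
int count_tokens(const char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    char *copy = malloc(strlen(text) + 1);   /* +1 for the terminator */
    if (copy == NULL)
        return -1;
    strcpy(copy, text);

    int count = 0;
    /* strtok() overwrites each delimiter it finds in `copy` with '\0'. */
    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims))
        count++;

    free(copy);   /* one free() for the one malloc() above */
    return count;
}
```

Calling count_tokens("Hello/world/fine/you", "/") is safe even though the argument is a string literal, because only the copy is modified.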
In my code I have (in general terms)
char* line=getline();
Parse(dest,line);
free(line);
Unless Parse() allocates the space it keeps, you cannot use the dest structure (or, more precisely, the pointers into the line within the dest structure) after the call to free(). The free() releases the space that was allocated by getline() and any use of the pointers after that yields undefined behaviour. Note that undefined behaviour includes the option of 'appearing to work, but only by coincidence'.
where getline() is a function that
returns a char * to some malloced
memory, and Parse(dest, line) is a
function that does parsing on line,
storing the results in dest (which
has been partially filled earlier,
from other information).
Parse() calls strtok() a variable
number of times on line, and does some
validation. Each token (a pointer to
what is returned by strtok()) is put
into a queue 'til I know how many I
have.
Note that the pointers returned by strtok() are all pointers into the single chunk of space allocated by getline(). You have not described any extra memory allocation.
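To make that concrete, here is a small sketch with a hypothetical function, second_token_offset, that reports how far into the buffer the second token starts. A valgrind message such as "8 bytes inside a block" would be consistent with trying to free a token pointer that begins 8 bytes into line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes `buf` in place and returns the offset of
   the second token from the start of `buf`, or -1 if there are fewer than
   two tokens. The offset shows that the token lives inside `buf` itself. */
ptrdiff_t second_token_offset(char *buf, const char *delims)
{
    assert(buf != NULL && delims != NULL);

    char *first = strtok(buf, delims);
    if (first == NULL)
        return -1;

    char *second = strtok(NULL, delims);
    if (second == NULL)
        return -1;

    return second - buf;   /* no new memory was allocated for the token */
}
```

For a buffer holding "name value rest" and a space delimiter, the second token starts at offset 5, and the space at offset 4 has been overwritten with '\0'.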
They are then copied onto a malloc'd
char** in dest.
This sounds as if you copy the pointers from strtok() into an array of pointers, but you do not attend to copying the data that those pointers are pointing at.
Now free(line) and a function that
free's each part of the char*[] in
dest,
both come up on valgrind as:
"Address 0x5179450 is 8 bytes inside a block of size 38 free'd"
or something similar.
The first free() of the 'char *[]' part of dest probably has a pointer to line and therefore frees the whole block of memory. All the subsequent frees on the parts of dest are trying to free an address not returned by malloc(), and valgrind is trying to tell you that. The free(line) operation then fails because the first free() of the pointers in dest already freed that space.
I'm considering refactoring my code
[to] store a copy of them [...].
The refactoring proposed is probably sensible; the function strdup() already mentioned by others will do the job neatly and reliably.
Note that after refactoring, you will still need to release line, but you will not release any of the pointers returned by strtok(). They are just pointers into the space managed by (identified by) line and will all be released when you release line.
Note that you will need to free each of the separately allocated (strdup()'d) strings as well as the array of character pointers that are accessed via dest.
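A sketch of that refactoring, under the assumption that the tokens end up in a plain array of strings. The names copy_tokens and free_tokens are made up for illustration and are not the questioner's Parse(). Note that strdup() is POSIX rather than ISO C (before C23), hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() on strict compilers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frees `n` separately allocated strings, then the array holding them. */
void free_tokens(char **tokens, size_t n)
{
    if (tokens == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        free(tokens[i]);
    free(tokens);
}

/* Tokenizes `line` in place and returns a malloc'd array of strdup'd
   copies, storing the number of tokens in *count.
   Returns NULL if an allocation fails. */
char **copy_tokens(char *line, const char *delims, size_t *count)
{
    assert(line != NULL && delims != NULL && count != NULL);

    size_t n = 0, cap = 4;
    char **out = malloc(cap * sizeof *out);
    if (out == NULL)
        return NULL;

    for (char *tok = strtok(line, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        if (n == cap) {                       /* grow the pointer array */
            char **bigger = realloc(out, 2 * cap * sizeof *out);
            if (bigger == NULL) {
                free_tokens(out, n);
                return NULL;
            }
            out = bigger;
            cap *= 2;
        }
        out[n] = strdup(tok);                 /* independent copy of the token */
        if (out[n] == NULL) {
            free_tokens(out, n);
            return NULL;
        }
        n++;
    }
    *count = n;
    return out;
}
```

With this shape, the caller can free(line) right after copy_tokens() returns, and later call free_tokens() on the result; neither free touches the other's memory.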
Alternatively, do not free line immediately after calling Parse(). Have dest record the allocated pointer (line) and free that when it frees the array of pointers. You still do not release the pointers returned by strtok(), though.
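That alternative might look like the following sketch. The struct parsed type and its two functions are hypothetical stand-ins for dest; the point is only that the structure owns line, and that the token pointers are never passed to free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for `dest`. */
struct parsed {
    char  *line;     /* owned: the buffer returned by malloc() */
    char **tokens;   /* owned array of pointers INTO line (not owned strings) */
    size_t count;
};

/* Tokenizes `line` in place and takes ownership of it on success.
   Returns 0 on success, -1 on allocation failure (caller keeps `line`). */
int parsed_init(struct parsed *p, char *line, const char *delims)
{
    assert(p != NULL && line != NULL && delims != NULL);

    /* A string of length L holds at most L/2 + 1 tokens. */
    size_t cap = strlen(line) / 2 + 1;
    p->tokens = malloc(cap * sizeof *p->tokens);
    if (p->tokens == NULL)
        return -1;

    p->line = line;
    p->count = 0;
    for (char *tok = strtok(line, delims); tok != NULL;
         tok = strtok(NULL, delims))
        p->tokens[p->count++] = tok;
    return 0;
}

/* Releases the pointer array and the line; the tokens go with the line. */
void parsed_free(struct parsed *p)
{
    free(p->tokens);
    free(p->line);
    p->tokens = NULL;
    p->line = NULL;
    p->count = 0;
}
```

This saves one allocation per token, at the cost of keeping the whole line alive for as long as any token is in use.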
They are then copied onto a malloc'd char** in dest.
The strings are copied, or the pointers are copied? The strtok function modifies the string you give it so that it can give you pointers into that same string without copying anything. When you get tokens from it, you must copy them. Either that or keep the input string around as long as any of the token pointers are in use.
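Many people recommend that you avoid strtok altogether because it's error-prone. Also, strtok keeps its position in hidden static state, so it is not safe to call from several threads at once unless the C runtime keeps that state per thread; on a runtime that does not, concurrent use can corrupt the parse or crash your app.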
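One commonly suggested alternative is strtok_r(), which is POSIX rather than ISO C (Windows offers a similar strtok_s). It keeps its position in a caller-supplied pointer instead of hidden state, which also allows nested tokenizing. A sketch, with a made-up function name and a made-up record format ("a,b;c,d"):

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r() on strict compilers */
#include <assert.h>
#include <string.h>

/* Hypothetical example: counts comma-separated fields across all
   semicolon-separated records in `buf`. `buf` is modified in place.
   Each loop has its own state pointer, so the loops do not interfere. */
int count_fields(char *buf)
{
    assert(buf != NULL);

    int fields = 0;
    char *outer_state;
    char *inner_state;

    for (char *rec = strtok_r(buf, ";", &outer_state); rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {
        for (char *field = strtok_r(rec, ",", &inner_state); field != NULL;
             field = strtok_r(NULL, ",", &inner_state))
            fields++;
    }
    return fields;
}
```

The same nesting written with plain strtok() would not work, because the inner loop would overwrite the outer loop's saved position.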
1. In your Parse(), strtok() only writes '\0' at every matching delimiter position. There is nothing special about this step, and strtok() is easy to use; of course, it cannot be used on a read-only buffer.
2. For each substring obtained in Parse(), copy it to its own malloc()ed buffer. A simple conceptual example for storing the substrings looks like the code below, though it may not match your real code exactly:
char **dest = malloc(N * sizeof *dest);
for (size_t i = 0; i < N; i++) {
    /* room for the substring plus its terminating '\0' */
    dest[i] = malloc(strlen(sub_strings[i]) + 1);
    strcpy(dest[i], sub_strings[i]);
    /* NOTE: the two lines above could be just one line:
       dest[i] = strdup(sub_strings[i]); */
}
3. Free dest, conceptually again:
for (size_t i = 0; i < N; i++) {
    free(dest[i]);   /* each copied substring */
}
free(dest);          /* then the array of pointers */
4. Calling free(line) is nothing special either, and it doesn't affect "dest" at all.
"dest" and "line" use different memory buffers, so you can perform step 4 before step 3 if you prefer. If you follow the steps above, no errors should occur. It seems you made a mistake in step 2 of your code.
Related
I've been testing out interactions between malloc() and various string functions in order to try to learn more about how pointers and memory work in C, but I'm a bit confused about the following interactions.
char *myString = malloc(5); // enough space for 5 characters (no '\0')
strcpy(myString, "Hello"); // shouldn't work since there isn't enough heap memory
printf("%s, %zu\n", myString, strlen(myString)); // also shouldn't work without '\0'
free(myString);
Everything above appears to work properly. I've tried using printf() for each character to see if the null terminator is present, but '\0' appears to just print as a blank space anyways.
My confusion lies in:
String literals will always have an implicit null terminator.
strcpy should copy over the null terminator onto myString, but there isn't enough allocated heap memory
printf/strlen shouldn't work unless myString has a terminator
Since myString apparently has a null terminator, where is it? Did it just get placed at a random memory location? Is the above code an error waiting to happen?
Addressing your three points:
String literals will always have an implicit null terminator.
Correct.
strcpy should copy over the null terminator onto myString, but there isn't enough allocated heap memory
strcpy has no way of knowing how large the destination buffer is, and will happily write past the end of it, overwriting whatever comes after the buffer in memory. For more on this kind of off-the-end access, look up 'buffer overrun' or 'buffer overflow'; these are common security weaknesses.
For a safer version, use strncpy, which takes a maximum number of bytes to write so that it does not run past the end of the destination. Be aware that strncpy does not add a null terminator when the source is at least that long, so you must terminate the buffer yourself.
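A minimal sketch of a bounded copy built on strncpy. The helper name copy_bounded is made up; it always terminates the destination explicitly, because strncpy leaves it unterminated when the source does not fit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies at most dst_size - 1 characters of `src`
   into `dst` and always null-terminates the result.
   Returns 1 if all of `src` fit, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';   /* strncpy does not guarantee this */

    return strlen(src) < dst_size;
}
```

With a 5-byte buffer, copying "Hello" this way yields the truncated string "Hell" and a return value of 0, instead of writing a sixth byte past the end.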
printf/strlen shouldn't work unless myString has a terminator
The phrase 'shouldn't work' is a bit vague here. printf/strlen/etc will continue reading through memory until a null terminator is found, which could be immediately after the string or could be thousands of bytes away (in your case you have written the null terminator to the memory immediately after myString so printf/strlen/etc will stop there).
Lastly:
Is the above code an error waiting to happen?
Yes. You are overwriting memory that has not been allocated, which could cause any manner of problems depending on what happened to be overwritten.
From the strcpy man page:
If the destination string of a strcpy() is not large enough, then anything might happen. Overflowing fixed-length string buffers is a favorite cracker technique for taking complete control of the machine. Any time a program reads or copies data into a buffer, the program first needs to check that there's enough space. This may be unnecessary if you can show that overflow is impossible, but be careful: programs can get changed over time, in ways that may make the impossible possible.
Say I have the code:
char* word = malloc (sizeof(char) * 6);
strcpy(word, "hello\0extra");
puts(word);
free(word);
This compiles just fine and Valgrind has no issue, but is there actually a problem? It seems like I am writing into memory that I don't own.
Also, a separate issue, but when I do overfill my buffer with something like
char* word = malloc (sizeof(char) * 6);
strcpy(word, "1234567\0");
puts(word);
free(word);
It prints out 1234567 and Valgrind does catch the problem. What are the consequences of doing something like this? It seems to work every time. Please correct me if this is wrong, but from what I understand, it is possible for another program to take the memory past the 6 and write into it. If that happened, would printing the word just go on forever until it encounters a nul character? That character has just been really confusing for me in learning C strings.
The first strcpy is okay
strcpy(word, "hello\0extra");
You create a char array constant and pass a pointer to it to strcpy. All characters up to and including the first \0 are copied; the remainder is ignored.
But wait... You have some extra characters. This makes your const data section a bit larger. That could be a problem in an embedded environment where flash space is scarce. But there is no run-time problem.
strcpy(word, "hello\0extra");
This is valid because the second parameter must be a well-formed string, and it is: the \0 as its 6th character makes it a string of length 5.
strcpy(word, "1234567\0");
Here you are writing to memory that you did not allocate: the string needs 8 bytes including its terminator, but the buffer has only 6. This is undefined behaviour and might cause a crash (segmentation fault).
With your first call to strcpy, NUL is inserted into the middle of the string. That means that functions that deal with null-terminated strings will think of your string as stopping with the first NUL, and the rest of your string is ignored. However, free will free all of it and valgrind will not report a problem because malloc will store the length of the buffer in the allocation table and free will use that entry to determine how many bytes to free. In other words, malloc and free are not meant to deal with null-terminated strings, so the NUL in the middle of the string will not affect them. Instead, free determines the length of the string based on how many bytes you allocated in the first place.
With the second example, you overflow the end of the buffer that was allocated by malloc. The results of that are undefined. In theory, the memory that you are writing to could have been allocated by another call to malloc; in your example nothing is done with the memory after your buffer, so it appears harmless here, although an overflow like this may still corrupt the allocator's own bookkeeping. The string-processing functions think of your string as ending with the first NUL, not with the end of the buffer allocated by malloc, so all of the string is printed out.
Your first question has a couple good answers already. About your second question, on the consequences of writing one byte past the end of your malloced memory:
It's doubtful that mallocing 6 bytes and writing 7 into it will cause a crash. malloc likes to align memory on certain boundaries, so it's not likely to give you six bytes right at the end of a page, such that there would be an access violation at byte 7. But if you malloc 65536 bytes and try to write past the end of that, your program might crash. Writing to invalid memory works a lot of the time, which makes debugging tricky, because you get random crashes only in certain situations.
I am trying to understand how to free up memory fully after calls to strtok(). I read most of the answered questions here and none seemed to address the point of my confusion. If this is a duplicate, feel free to point me in the direction of something that answers my question.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
char * aliteral = "Hello/world/fine/you";
char * allocatedstring;
char * token;
int i=0;
allocatedstring=(char *) malloc(sizeof(allocatedstring)*21);
allocatedstring=strcpy(allocatedstring,aliteral);
token = strtok(allocatedstring, "/");
token = strtok(NULL, "/");
token = strtok(NULL, "/");
token = strtok(NULL, "/");
printf("%s\n",allocatedstring);
printf("%s\n",token);
free(allocatedstring);
return 0;
}
Freeing allocatedstring here only frees up the string up to the first \0 character that replaced strtok's delimiter. So it only clears up until "Hello". I checked that using eclipse debugger and monitoring the memory addresses.
How do I clear the rest of it? I tried 2 things: having 1 extra pointer point to the start of allocatedstring and freeing that (didn't work), and freeing token after the call to strtok() (didn't work either).
So how do I clean up the parts of allocatedstring that are now between \0 's ?
EDIT : To clarify, seeing the memory address blocks in eclipse debugger, I was seeing the string "HELLO WORLD FINE YOU" in the memory blocks that were initially allocated by the call to malloc. After the call to free(), the blocks containing "HELLO" and the first \0 turned to gibberish, but the rest of the blocks kept the characters "FINE YOU". I assumed that meant that they were not freed.
free has no knowledge of \0 terminated strings.
free will free what was allocated with malloc, and should work properly in this situation.
If your evidence that free is not working is simply that the string data still exists, then you misunderstand free.
It does not zero out the memory. It simply marks it as available for use.
The original data remains in that memory. But the memory may be allocated by the next caller of malloc, and that caller will be able to overwrite your data at will, because you don't own that data anymore!
If you want that memory cleared (such as, if it contains a password or a security key), you must clear it out with something like memset, before you call free.
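A sketch of clearing before freeing. One caveat, stated as an assumption about optimizing compilers: a plain memset() immediately followed by free() may be removed as a dead store, so this sketch writes through a volatile pointer instead. The helper names are made up; some platforms ship purpose-built functions for this (for example explicit_bzero on glibc and the BSDs), which are preferable where available.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: overwrites `size` bytes at `p` with zeros.
   The volatile qualifier is there to discourage the compiler from
   dropping the writes as dead stores. */
void wipe(void *p, size_t size)
{
    assert(p != NULL || size == 0);

    volatile unsigned char *bytes = p;
    for (size_t i = 0; i < size; i++)
        bytes[i] = 0;
}

/* Hypothetical helper: wipes the block, then releases it.
   `size` must be the size originally requested from malloc();
   free() cannot report it. */
void wipe_and_free(void *p, size_t size)
{
    if (p == NULL)
        return;
    wipe(p, size);
    free(p);
}
```

The caller has to remember the allocation size, since neither free() nor the pointer itself carries it.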
Again, free only marks the memory as "unallocated, available for use by malloc", and does not clear out the contents.
PS Some debugging systems, such as Visual Studio, will overwrite freed data, but only to make it obvious in the debugger that it has been freed. That behavior is not required by the C standard, and only aids in debugging. Typically, freed memory may be filled with something like 0xdeadbeef.
You have a minor issue on this line:
allocatedstring=(char *) malloc(sizeof(allocatedstring)*21);
sizeof(allocatedstring) is equal to sizeof (char *); you're allocating enough space for 21 pointers to char, not 21 characters. This isn't a problem, since a char * is going to be at least as large as a char, but indicates some confusion about what types you're dealing with.
That said, you don't have to worry about the size of allocatedstring or *allocatedstring; since you're allocating enough space to hold the literal, you can do the following:
allocatedstring = malloc( strlen( aliteral ) + 1 ); // note no cast
As for the behavior you're seeing...
free is releasing all the memory associated with allocatedstring; the fact that part of the memory hadn't yet been overwritten when you checked isn't surprising, because free doesn't affect the contents of that memory; it will contain whatever was last written to it until something else allocates and uses it.
If I have a character pointer that contains NULL bytes is there any built in function I can use to find the length or will I just have to write my own function? Btw I'm using gcc.
EDIT:
Should have mentioned the character pointer was created using malloc().
If you have a pointer then the ONLY way to know the size is to store the size separately or to have a unique value that terminates the string (typically '\0'). If you have neither of these, it simply cannot be done.
EDIT: since you have specified that you allocated the buffer using malloc then the answer is the paragraph above. You need to either remember how much you allocated with malloc or simply have a terminating value.
If you happen to have an array (like: char s[] = "hello\0world";) then you could resort to sizeof(s). But be very careful, the moment you try it with a pointer, you will get the size of the pointer, not the size of an array. (but strlen(s) would equal 5 since it counts up to the first '\0').
In addition, arrays decay to pointers when passed to functions. So if you pass the array to a function, you are back to square one.
NOTE:
void f(int *p) {}
and
void f(int p[]) {}
and
void f(int p[10]) {}
are all the same. In all 3 versions, p is a pointer, not an array.
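A short sketch of the difference, using a made-up function name. Inside the function only a pointer is visible, so sizeof yields the pointer size; in the caller, where s is a true array, sizeof yields the full 12 bytes including the embedded and final '\0'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reports the size the callee can see.
   `p` is a pointer here, whatever the caller passed in. */
size_t size_seen_by_callee(const char *p)
{
    assert(p != NULL);
    return sizeof p;   /* always sizeof(char *), never the array size */
}
```

For char s[] = "hello\0world"; the caller gets sizeof s == 12 and strlen(s) == 5, while size_seen_by_callee(s) returns sizeof(char *).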
How do you know where the string ends, if it contains NULL bytes as part of it? Certainly no built in function can work with strings like that. It'll interpret the first null byte as the end of the string.
If you want the length, you'll have to store it yourself. Keep in mind that no standard library string functions will work correctly on strings like these.
You'll need to keep track of the length yourself.
C strings are null terminated, meaning that the first null character signals the end of the string. All builtin string functions rely on this, so if you have a buffer that can contain NULLs as part of the data then you can't use them.
Since you're using malloc then you may need to keep track of two sizes: the size of your allocated buffer, and how many characters within that buffer constitute valid data.
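One way to keep track of both sizes is to carry them alongside the pointer. The struct byte_buf type and its functions below are hypothetical, a minimal sketch rather than a complete buffer implementation (it never grows, for brevity).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer for data that may contain '\0' bytes. */
struct byte_buf {
    unsigned char *data;
    size_t len;   /* bytes of valid data */
    size_t cap;   /* bytes allocated with malloc() */
};

/* Allocates `cap` bytes. Returns 0 on success, -1 on failure. */
int byte_buf_init(struct byte_buf *b, size_t cap)
{
    assert(b != NULL && cap > 0);
    b->data = malloc(cap);
    if (b->data == NULL)
        return -1;
    b->len = 0;
    b->cap = cap;
    return 0;
}

/* Appends `n` raw bytes. Returns -1 (and changes nothing) if they
   would not fit in the remaining capacity. */
int byte_buf_append(struct byte_buf *b, const void *src, size_t n)
{
    assert(b != NULL && src != NULL);
    if (n > b->cap - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);   /* memcpy ignores '\0' bytes */
    b->len += n;
    return 0;
}

void byte_buf_free(struct byte_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

After appending the 5 bytes of "ab\0cd", len is 5 even though strlen() on the same data would report 2.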
Does a string created with 'strcpy' need to be freed? And how to free it?
Edit: The destination is allocated like this:
char* buffer[LEN];
strcpy itself doesn't allocate memory for the destination string so, no, it doesn't have to be freed.
Of course, if something else had allocated memory for it, then, yes, that memory should be freed eventually but that has nothing to do with strcpy.
That previous statement seems to be the case since your definition is an array of character pointers rather than an array of characters:
char* buffer[LEN];
and each element will almost certainly be allocated with something like:
buffer[n] = malloc (length);
It's a good idea to start thinking in terms of responsibility for malloc'ed memory. By that, I mean passing a malloc'ed memory block may also involve passing the responsibility for freeing it at some point.
You just need to figure out (or decide, if it's your code) whether the responsibility for managing the memory goes along with the memory itself. With strcpy, even if you pass in an already-malloc'ed block for the destination, the responsibility is not being passed so you will still have to free that memory yourself. This allows you to easily pass in a malloc'ed or non-malloc'ed buffer without having to worry about it.
You may be thinking of strdup which is basically making a copy of a string by first allocating the memory for it. The string returned from that needs to be freed, definitely.
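The contrast can be sketched with two made-up functions: one copies into storage the caller already owns, so it creates nothing to free, while the other returns newly allocated memory that the caller becomes responsible for. strdup() is POSIX (and C23), hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() on strict compilers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: copies "hello" into caller-owned storage.
   Ownership does not change; how (or whether) `dst` is freed depends
   only on how the caller obtained it.
   Returns 0 on success, -1 if `dst` is too small. */
int copy_greeting(char *dst, size_t dst_size)
{
    assert(dst != NULL);
    if (dst_size < sizeof "hello")   /* 6 bytes including '\0' */
        return -1;
    strcpy(dst, "hello");
    return 0;
}

/* Hypothetical: returns a newly allocated copy (or NULL on failure).
   The caller must free() the result. */
char *dup_greeting(void)
{
    return strdup("hello");
}
```

A stack array passed to copy_greeting() needs no free(); a malloc'd block passed to it still needs the free() it needed before the call; the result of dup_greeting() always needs one.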
If you use
char buffer[6];
strcpy(buffer, "hello");
for example, then buffer is freed when the end of its scope is reached.
On the other hand,
char *buffer;
buffer = malloc(sizeof(char) * 6);
strcpy(buffer, "hello");
this way you need to free the memory you allocated.
But it doesn't actually have anything to do with strcpy, only about how you allocate your string.
You provide a pointer to the destination buffer for strcpy, so it depends on how you allocated that buffer as to whether it needs to be freed and how to free it.
For example if you allocated the buffer using malloc, then yes you will need to free it. If you allocated the buffer on the stack then no you will not, it will be released automatically when it goes out of scope.
The strcpy function copies a string into a buffer you need to get some other way (such as malloc); you should free that buffer using whatever mechanism is correct for how you allocated it.
strcpy() doesn't create a string, it only copies a string. Memory allocation is completely separated from that process.
So you have to take care of memory to which the string is copied - if it was allocated dynamically you have to free it at some point. Since you seem to have a stack-allocated buffer you don't have to do anything special - the buffer will be reclaimed when it goes out of scope.