I've been testing out interactions between malloc() and various string functions in order to try to learn more about how pointers and memory work in C, but I'm a bit confused about the following interactions.
char *myString = malloc(5); // enough space for 5 characters (no '\0')
strcpy(myString, "Hello"); // shouldn't work since there isn't enough heap memory
printf(%s, %zd\n", myString, strlen(myString)); // also shouldn't work without '\0'
free(myString);
Everything above appears to work properly. I've tried using printf() for each character to see if the null terminator is present, but '\0' appears to just print as a blank space anyways.
My confusion lies in:
String literals will always have an implicit null terminator.
strcpy should copy over the null terminator onto myString, but there isn't enough allocated heap memory
printf/strlen shouldn't work unless myString has a terminator
Since myString apparently has a null terminator, where is it? Did it just get placed at a random memory location? Is the above code an error waiting to happen?
Addressing your three points:
String literals will always have an implicit null terminator.
Correct.
strcpy should copy over the null terminator onto myString, but there isn't enough allocated heap memory
strcpy has no way of knowing how large the destination buffer is, and will happily write past the end of it (overwritting whatever is after the buffer in memory. For information on this off-the-end-access look up 'buffer overrun' or 'buffer overflow'. These are common security weaknesses).
For a safer version, use strncpy which takes the length of the destination buffer as an argument so as not to write past the end of it.
printf/strlen shouldn't work unless myString has a terminator
The phrase 'shouldn't work' is a bit vague here. printf/strlen/etc will continue reading through memory until a null terminator is found, which could be immediately after the string or could be thousands of bytes away (in your case you have written the null terminator to the memory immediately after myString so printf/strlen/etc will stop there).
Lastly:
Is the above code an error waiting to happen?
Yes. You are overwriting memory that has not been allocated which could cause any manor of problems depending on what happened to be overwritten.
From the strcpy man page:
If the destination string of a strcpy() is not large enough, then anything might happen. Overflowing fixed-length string buffers is a favorite cracker technique for taking complete control of the machine. Any time a program reads or copies data into a buffer, the program first needs to check that there's enough space. This may be unnecessary if you can show that overflow is impossible, but be careful: programs can get changed over time, in ways that may make the impossible possible.
Related
I have a C function that takes in a char* pointer. One of the function's preconditions is that the pointer argument is a null-terminated string
void foo(char *str) {
int length = strlen(str);
// ...
}
If str isn't a pointer to a null-terminated string, then strlen crashes. Is there a portable way to ensure that a char* pointer really does point to a null-terminated string?
I was thinking about using VirtualQuery to find lowest address after str that's not readable, and if we haven't seen a null-terminator between the beginning of str and that address, then str doesn't point to a null-terminated string.
No, there is no portable way to do that. A null-terminated string can be arbitrarily long (up to SIZE_MAX bytes) -- and so can a char array that isn't null-terminated. A function that takes a char* argument has no way of knowing how big a chunk of valid memory it points to, if any. A check would have to traverse memory until it finds a null character, which means that if there is no null character in array, it will go past the end of it, causing undefined behavior.
That's why the standard C library functions that take string pointers as arguments have undefined behavior of the argument doesn't point to a string. (Checking for a NULL pointer would be easy enough, but that would catch only one error case at the cost of slower execution for valid arguments.)
EDIT : Responding to your question's title:
Portable way to check if a char* pointer is a null-terminated string
a pointer cannot be a string. It may or may not be a pointer to a string.
To prove null termination of a string, you don't just have to prove that a null char exists, you have to prove that it exists at exactly the right spot (no later, but also no earlier). To do that you need to know the intended content or at least length of the string, at which point it is very simple to do the verification...
Consider e.g. a device w/o virtual memory: That means you can iterate over the whole address space without triggering any kind of interrupts.
If your stack is at a higher address than the heap and your compiler puts a copy of '\0' on the stack (instead of only keeping it in a register or using it as an immediate value), you are suddenly guaranteed that any string on the heap will be weakly zero-terminated in the sense that you will always be able to consider the '\0' that your verification code put on the stack as the zero-terminator.
The other answers are correct, but here's another way of thinking about it.
If the pointer points to a buffer of n chars, none of which are '\0', then as soon as you try to examine the n + 1 character, you're in the realm of undefined behavior. So, to scan to see if there's a '\0', it's not enough to know some upper bound of where the end of the buffer is, you have to know exactly where the end of the buffer is.
C doesn't give you a way to know that, other than to require that the caller provide it to you. VirtualQuery (assuming it were portable) is not enough because there may be other objects immediately after the buffer in memory. While it might appear to work on many implementations, the fact that you're relying on undefined behavior means that it's necessarily non-portable.
The best you can do is put an upper bounds on the size of the string with the strn functions. So if you are writing a library call and don't trust the caller, document your call noting that strings cannot be above a specific reasonable size and check:
#define MAXNAME 32
if (strnlen(sketchyName,MAXNAME)==MAXNAME) return ERROR;
As others have pointed out, there is no portable way to do this. The reason is that it isn't useful.
Normal semantics are to check for NULL only, and assume if a non-NULL is passed, it's valid. After all, there's likely to be a NULL somewhere after your pointer. The only other possibility is that you run into unmapped memory. It's more likely however that even with a bogus pointer, you find a NULL. That means a bogus 2000 character string will still get past the check.
Say I have the code:
char* word = malloc (sizeof(char) * 6);
strcpy(word, "hello\0extra");
puts(word);
free(word);
This compiles just find and Valgrind has no issue, but is there actually a problem? It seems like I am writing into memory that I don't own.
Also, a separate issue, but when I do overfill my buffer with something like
char* word = malloc (sizeof(char) * 6);
strcpy(word, "1234567\0");
puts(word);
free(word);
It prints out 1234567 and Valgrind does catch the problem. What are the consequences of doing something like this? It seems to work every time. Please correct me if this is wrong, but from what I understand, it is possible for another program to take the memory past the 6 and write into it. If that happened, will printing the word just go on forever until it encounter a nul character? That character has just been really confusing for me in learning C strings.
The first strcpy is okay
strcpy(word, "hello\0extra");
You create a char array constant and pass the pointer to strcpy. All characters (including the first \0) is copied, the remainder is ignored.
But wait... You have some extra characters. This makes your const data section a bit larger. Could be a problem in embedded environment where flash space is rare. But there is no run-time problem.
strcpy(word, "hello\0extra");
This is valid because the second paramter should be a well formed string and it is because you have a \0 as your 6th character which forms a string of length 5.
strcpy(word, "1234567\0");
Here you are accessing memory which you don't own/allocated so this is an access violation and might cause crash.(seg fault)
With your first call to strcpy, NUL is inserted into the middle of the string. That means that functions that deal with null-terminated strings will think of your string as stopping with the first NUL, and the rest of your string is ignored. However, free will free all of it and valgrind will not report a problem because malloc will store the length of the buffer in the allocation table and free will use that entry to determine how many bytes to free. In other words, malloc and free are not meant to deal with null-terminated strings, so the NUL in the middle of the string will not affect them. Instead, free determines the length of the string based on how many bytes you allocated in the first place.
With the second example, you overflow the end of the buffer that was allocated by malloc. The results of that are undefined. In theory, that memory that you are writing to could have been allocated by another call to malloc, but in your example, nothing is done with the memory after your buffer, so it is harmless. The string-processing functions think of your string as ending with the first NUL, not with the end of the buffer allocated by malloc, so all of the string is printed out.
Your first question has a couple good answers already. About your second question, on the consequences of writing one byte past the end of your malloced memory:
It's doubtful that mallocing 6 bytes and writing 7 into it will cause a crash. malloc likes to align memory on certain boundaries, so it's not likely to give you six bytes right at the end of a page, such that there would be an access violation at byte 7. But if you malloc 65536 bytes and try to write past the end of that, your program might crash. Writing to invalid memory works a lot of the time, which makes debugging tricky, because you get random crashes only in certain situations.
I have 2 questions..
is it necessary to add a termination character when executing the following commands against a char *string ?
strcpy();
strncpy();
Is it necessary to allocate memory before before doing any operation with the above to function against the char *string ?
for example..
char *str;
str = malloc(strlen(texttocopy));
strcpy(texttocopy, str); // see the below edit
Please explain.
EDIT :
in the above code I inverted the argument. it is just typo i made while asking the question here. The correct way should be
strcpy(str, texttocopy); // :)
The strcpy function always adds the terminator, but strncpy may not do it in some cases.
And for the second question, yes you need to make sure there is enough memory allocated for the destination. In your example you have not allocated enough memory, and will have a buffer overflow. Remember that strlen returns the length of the string without counting the terminator. You also have inverted the arguments to strcpy, the destination is the first argument.
'strcpy' function copies data from source to destination address including with '\0' termination character . 'strncpy' function copies data as the same way but if there is no termination character '\0' exists in the first n bytes to be copied, termination character will not be copied then and you will need to add it by yourself to terminate the string.
You will always have to statically or dynamically allocate a memory space to play with. Therefore, you should declare a character array or dynamically allocate a chunk of memory first then you can play nice with your strings
I'm writing a C code for a class. This class requires that our code compile and run on the school server, which is a sparc solaris machine. I'm running Linux x64.
I have this line to parse (THIS IS NOT ACTUAL CODE BUT IS INPUT TO MY PROGRAM):
while ( cond1 ){
I need to capture the "while" and the "cond1" into separate strings. I've been using strtok() to do this. In Linux, the following lines:
char *cond = NULL;
cond = (char *)malloc(sizeof(char));
memset(cond, 0, sizeof(char));
strcpy(cond, strtok(NULL, ": \t\(){")); //already got the "while" out of the line
will correctly capture the string "cond1".Running this on the solaris machine, however, gives me the string "cone1".
Note that in plenty of other cases within my program, strings are being copied correctly. (For instance, the "while") was captured correctly.
Does anyone know what is going on here?
The line:
cond = (char *)malloc(sizeof(char));
allocates exactly one char for storage, into which you are then copying more than one - strcpy needs to put, at a bare minimum, the null terminator but, in your case, also the results of your strtok as well.
The reason it may work on a different system is that some implementations of malloc will allocate at a certain resolution (e.g., a multiple of 16 bytes) no matter what actual value you ask for, so you may have some free space there at the end of your buffer. But what you're attempting is still very much undefined behaviour.
The fact that the undefined behaviour may be to work sometimes in no way abrogates your responsibility to avoid such behaviour.
Allocate enough space for storing the results of your strtok and you should be okay.
The safest way to do this is to dynamically allocate the space so that it's at least as big as the string you're passing to strtok. That way there can be no possibility of overflow (other than weird edge cases where other threads may modify the data behind your back but, if that were the case, strtok would be a very bad choice anyway).
Something like (if instr is your original input string):
cond = (char*)malloc(strlen(instr)+1);
This guarantees that any token extracted from instr will fit within cond.
As an aside, sizeof(char) is always 1 by definition, so you don't need to multiply by it.
cond is being allocated one byte. strcpy is copying at least two bytes to that allocation. That is, you are writing more bytes into the allocation than there is room for.
One way to fix it to use char *cond = malloc (1000); instead of what you've got.
You only allocated memory for 1 character but you trying to store at least 6 characters (you need space for the terminating \0). The quick and dirty way to solve this is just say
char cond[128]
instead of malloc.
Is there anything I should know about using strtok on a malloced string?
In my code I have (in general terms)
char* line=getline();
Parse(dest,line);
free(line);
where getline() is a function that returns a char * to some malloced memory.
and Parse(dest, line) is a function that does parsing online, storing the results in dest, (which has been partially filled earlier, from other information).
Parse() calls strtok() a variable number of times on line, and does some validation.
Each token (a pointer to what is returned by strtok()) is put into a queue 'til I know how many I have.
They are then copied onto a malloc'd char** in dest.
Now free(line)
and a function that free's each part of the char*[] in dest, both come up on valgrind as:
"Address 0x5179450 is 8 bytes inside a block of size 38 free'd"
or something similar.
I'm considering refactoring my code not to directly store the tokens on the the char** but instead store a copy of them (by mallocing space == to strlen(token)+1, then using strcpy()).
There is a function strdup which allocates memory and then copies another string into it.
You ask:
Is there anything I should know about
using strtok on a malloced string?
There are a number of things to be aware of. First, strtok() modifies the string as it processes it, inserting nulls ('\0') where the delimiters are found. This is not a problem with allocated memory (that's modifiable!); it is a problem if you try passing a constant string to strtok().
Second, you must have as many calls to free() as you do to malloc() and calloc() (but realloc() can mess with the counting).
In my code I have (in general terms)
char* line=getline();
Parse(dest,line);
free(line);
Unless Parse() allocates the space it keeps, you cannot use the dest structure (or, more precisely, the pointers into the line within the dest structure) after the call to free(). The free() releases the space that was allocated by getline() and any use of the pointers after that yields undefined behaviour. Note that undefined behaviour includes the option of 'appearing to work, but only by coincidence'.
where getline() is a function that
returns a char * to some malloced
memory, and Parse(dest, line) is a
function that does parsing online,
storing the results in dest (which
has been partially filled earlier,
from other information).
Parse() calls strtok() a a variable
number of times on line, and does some
validation. Each token (a pointer to
what is returned by strtok()) is put
into a queue 'til I know how many I
have.
Note that the pointers returned by strtok() are all pointers into the single chunk of space allocated by getline(). You have not described any extra memory allocation.
They are then copied onto a malloc'd
char** in dest.
This sounds as if you copy the pointers from strtok() into an array of pointers, but you do not attend to copying the data that those pointers are pointing at.
Now free(line) and a function that
free's each part of the char*[] in
dest,
both come up on valgrind as:
"Address 0x5179450 is 8 bytes inside a block of size 38 free'd"
or something similar.
The first free() of the 'char *[]' part of dest probably has a pointer to line and therefore frees the whole block of memory. All the subsequent frees on the parts of dest are trying to free an address not returned by malloc(), and valgrind is trying to tell you that. The free(line) operation then fails because the first free() of the pointers in dest already freed that space.
I'm considering refactoring my code
[to] store a copy of them [...].
The refactoring proposed is probably sensible; the function strdup() already mentioned by others will do the job neatly and reliably.
Note that after refactoring, you will still need to release line, but you will not release any of the pointers returned by strtok(). They are just pointers into the space managed by (identified by) line and will all be released when you release line.
Note that you will need to free each of the separately allocated (strdup()'d) strings as well as the array of character pointers that are accessed via dest.
Alternatively, do not free line immediately after calling Parse(). Have dest record the allocated pointer (line) and free that when it frees the array of pointers. You still do not release the pointers returned by strtok(), though.
they are then copied on to to a malloc'd char** in dest.
The strings are copied, or the pointers are copied? The strtok function modifies the string you give it so that it can give you pointers into that same string without copying anything. When you get tokens from it, you must copy them. Either that or keep the input string around as long as any of the token pointers are in use.
Many people recommend that you avoid strtok altogether because it's error-prone. Also, if you're using threads and the CRT is not thread-aware, strtok will likely crash your app.
1 in your parse(), strtok() only writes '\0' at every matching position. actually this step is nothing special. using strtok() is easy. of course it cannot be used on read-only memory buffer.
2 for each sub-string got in parse(), copy it to a malloc()ed buffer accordingly. if i give a simple example for storing the sub-strings, it looks like the below code, say conceptually, though it might not be exactly the same as your real code:
char **dest;
dest = (char**)malloc(N * sizeof(char*));
for (i: 0..N-1) {
dest[i] = (char*)malloc(LEN);
strcpy(dest[i], sub_strings[i]);
NOTE: above 2 lines could be just one line as below
dest[i] = strdup(sub_string[i]);
}
3 free dest, conceptually again:
for (i: 0..N-1) {
free(dest[i]);
}
free(dest);
4 call free(line) is nothing special too, and it doesn't affect your "dest" even a little.
"dest" and "line" use different memory buffer, so you can perform step 4 before step 3 if preferred. if you had following above steps, no errors would occur. seems you made mistacks in step 2 of your code.