Overflow not detected when writing nul character in middle of string? - c

Say I have the code:
char* word = malloc (sizeof(char) * 6);
strcpy(word, "hello\0extra");
puts(word);
free(word);
This compiles just fine and Valgrind reports no issue, but is there actually a problem? It seems like I am writing into memory that I don't own.
Also, a separate issue, but when I do overfill my buffer with something like
char* word = malloc (sizeof(char) * 6);
strcpy(word, "1234567\0");
puts(word);
free(word);
It prints out 1234567 and Valgrind does catch the problem. What are the consequences of doing something like this? It seems to work every time. Please correct me if this is wrong, but from what I understand, it is possible for another program to take the memory past the 6 and write into it. If that happened, would printing the word just go on forever until it encounters a NUL character? That character has just been really confusing for me in learning C strings.

The first strcpy is okay
strcpy(word, "hello\0extra");
You create a char array constant and pass a pointer to it to strcpy. All characters up to and including the first \0 are copied; the remainder is ignored.
But wait... you have some extra characters. This makes your const data section a bit larger, which could be a problem in an embedded environment where flash space is scarce. But there is no run-time problem.

strcpy(word, "hello\0extra");
This is valid because the second parameter must be a well-formed string, and it is: the \0 as the 6th character terminates a string of length 5, so 6 bytes are copied, which exactly fills the buffer.
strcpy(word, "1234567\0");
Here you write to memory that you did not allocate: the string needs 8 bytes (7 characters plus the terminator) but the buffer has only 6. This is undefined behavior and might cause a crash (segmentation fault).

With your first call to strcpy, the NUL is in the middle of the string literal. Functions that deal with null-terminated strings treat the string as ending at the first NUL, and the rest is ignored, so strcpy writes only 6 bytes and stays within the buffer; that is why Valgrind does not report a problem. free releases the whole block regardless: the allocator keeps track of the size of each block it hands out, and free uses that record to determine how many bytes to release. In other words, malloc and free know nothing about null-terminated strings, so a NUL in the middle of the buffer does not affect them. free works from the size of the block you allocated in the first place, not from the length of the string.
With the second example, you overflow the end of the buffer that was allocated by malloc. The results of that are undefined. In theory, the memory that you are writing to could belong to another allocation made by malloc, but in your example nothing else uses the memory just after your buffer, so the overflow appears harmless. The string-processing functions think of your string as ending with the first NUL, not with the end of the buffer allocated by malloc, so all of the string is printed out.

Your first question has a couple good answers already. About your second question, on the consequences of writing one byte past the end of your malloced memory:
It's doubtful that mallocing 6 bytes and writing 7 into it will cause a crash. malloc likes to align memory on certain boundaries, so it's not likely to give you six bytes right at the end of a page, such that there would be an access violation at byte 7. But if you malloc 65536 bytes and try to write past the end of that, your program might crash. Writing to invalid memory works a lot of the time, which makes debugging tricky, because you get random crashes only in certain situations.

Related

malloc and strcpy interactions

I've been testing out interactions between malloc() and various string functions in order to try to learn more about how pointers and memory work in C, but I'm a bit confused about the following interactions.
char *myString = malloc(5); // enough space for 5 characters (no '\0')
strcpy(myString, "Hello"); // shouldn't work since there isn't enough heap memory
printf("%s, %zu\n", myString, strlen(myString)); // also shouldn't work without '\0'
free(myString);
Everything above appears to work properly. I've tried using printf() for each character to see if the null terminator is present, but '\0' appears to just print as a blank space anyways.
My confusion lies in:
String literals will always have an implicit null terminator.
strcpy should copy over the null terminator onto myString, but there isn't enough allocated heap memory
printf/strlen shouldn't work unless myString has a terminator
Since myString apparently has a null terminator, where is it? Did it just get placed at a random memory location? Is the above code an error waiting to happen?
Addressing your three points:
String literals will always have an implicit null terminator.
Correct.
strcpy should copy over the null terminator onto myString, but there isn't enough allocated heap memory
strcpy has no way of knowing how large the destination buffer is, and will happily write past the end of it, overwriting whatever comes after the buffer in memory. For more on this kind of off-the-end access, look up 'buffer overrun' or 'buffer overflow'; these are common security weaknesses.
For a safer version, use strncpy, which takes a maximum number of bytes to write so that it does not go past the end of the destination. Note that strncpy does not add a null terminator if the source is at least that long, so you must terminate the result yourself.
printf/strlen shouldn't work unless myString has a terminator
The phrase 'shouldn't work' is a bit vague here. printf/strlen/etc will continue reading through memory until a null terminator is found, which could be immediately after the string or could be thousands of bytes away (in your case you have written the null terminator to the memory immediately after myString so printf/strlen/etc will stop there).
Lastly:
Is the above code an error waiting to happen?
Yes. You are overwriting memory that has not been allocated, which could cause any manner of problems depending on what happens to be overwritten.
From the strcpy man page:
If the destination string of a strcpy() is not large enough, then anything might happen. Overflowing fixed-length string buffers is a favorite cracker technique for taking complete control of the machine. Any time a program reads or copies data into a buffer, the program first needs to check that there's enough space. This may be unnecessary if you can show that overflow is impossible, but be careful: programs can get changed over time, in ways that may make the impossible possible.

c strcat overwrite source string?

I'm a Java programmer struggling to pick up C. In particular, I am struggling to understand strcat(). If I call:
strcat(dst, src);
I get that strcat() will modify my dst String. But shouldn't it leave the src String alone? Consider the below code:
#include<stdio.h>
#include<string.h>
void printStuff(char* a, char* b){
printf("----------------------------------------------\n");
printf("src: (%zu chars)\t\"%s\"\n",strlen(a),a);
printf("dst: (%zu chars)\t\"%s\"\n",strlen(b),b);
printf("----------------------------------------------\n");
}
int main()
{
char src[25], dst[25];
strcpy(src, "This is source123");
strcpy(dst, "This is destination");
printStuff(src, dst);
strcat(dst, src);
printStuff(src, dst);
return 0;
}
Which produces this output on my Linux box, compiling with GCC:
----------------------------------------------
src: (17 chars) "This is source123"
dst: (19 chars) "This is destination"
----------------------------------------------
----------------------------------------------
src: (4 chars) "e123"
dst: (36 chars) "This is destinationThis is source123"
----------------------------------------------
I'm assuming that the full "This is source123" String is still in memory and strcat() has advanced the char* src pointer forward 13 chars. But why? Why 13 chars? I've played around with the length of the dst string, and it definitely has an impact on the src pointer after strcat() is done. But I don't understand why...
Also... how would you debug this, in GDB, say? I tried "step" to step into the strcat() function, but I guess that function wasn't analyzed by the debugger; "step" did nothing.
Thanks!
-ROA
PS - One quick note, I did read through similar strcat() posts on this site, but didn't see one that seemed to directly apply to my question. Apologies if I missed the post which did.
Your destination doesn't have enough memory allocated to hold the new concatenated string. In this case this means that src is probably being overwritten by strcat due to it writing beyond the bounds of dst.
Allocate enough memory for dst and it should work without it overwriting the source string.
Note that the new memory segment that holds the concatenated string needs to be at least the combined length of the two strings (in your case 36) plus space for the null terminator.
Yes, I'm sure everything to do with manual memory management comes with some difficulty if your background is strictly Java.
With regard to anything related to C strings, it will probably be useful to put everything you know about Java Strings out of your head. The closest Java analogs of C strings are char[] and byte[]. Even there you can get in trouble, however, because Java performs bounds-checking for you, but C does not. In fact, C allows you to do all manner of things that you oughtn't to do, all the while standing back and quietly murmuring, "who knows what will happen if you do that?".
In particular, when you call strcat() or any other function that writes into a char array, you are responsible for ensuring that there is enough space in the destination array to accommodate the characters. If there isn't, then the resulting behavior is undefined (who knows what will happen?). You exercised just such undefined behavior.
Generally speaking, you need to do one or more of these things:
Have a hard upper bound on the size that could be needed, and allocate at least that much space, or
Know how much space you have, and work within that space (e.g. truncate any excess), or
Track how much space you have and how much space you need, and allocate more space as needed (being sure to later free all dynamically allocated space when you no longer need it).

Why does this intentionally incorrect use of strcpy not fail horribly?

Why does the below C code using strcpy work just fine for me? I tried to make it fail in two ways:
1) I tried strcpy from a string literal into allocated memory that was too small to contain it. It copied the whole thing and didn't complain.
2) I tried strcpy from an array that was not NUL-terminated. The strcpy and the printf worked just fine. I had thought that strcpy copied chars until a NUL was found, but none was present and it still stopped.
Why don't these fail? Am I just getting "lucky" in some way, or am I misunderstanding how this function works? Is it specific to my platform (OS X Lion), or do most modern platforms work this way?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char *src1 = "123456789";
char *dst1 = (char *)malloc( 5 );
char src2[5] = {'h','e','l','l','o'};
char *dst2 = (char *)malloc( 6 );
printf("src1: %s\n", src1);
strcpy(dst1, src1);
printf("dst1: %s\n", dst1);
strcpy(dst2, src2);
printf("src2: %s\n", src2);
dst2[5] = '\0';
printf("dst2: %s\n", dst2);
return 0;
}
The output from running this code is:
$ ./a.out
src1: 123456789
dst1: 123456789
src2: hello
dst2: hello
First, copying into an array that is too small:
C has no protection for going past array bounds, so if there is nothing sensitive at dst1[5..9], then you get lucky, and the copy goes into memory that you don't rightfully own, but it doesn't crash either. However, that memory is not safe, because it has not been allocated to your variable. Another variable may well have that memory allocated to it, and later overwrite the data you put in there, corrupting your string later on.
Secondly, copying from an array that is not null-terminated:
Even though we're usually taught that memory is full of arbitrary data, huge chunks of it are zeroed out. Even though you didn't put a null terminator in src2, chances are good that src2[5] happens to be \0 anyway. This makes the copy succeed. Note that this is NOT guaranteed, and could fail on any run, on any platform, at any time. But you got lucky this time (and probably most of the time), and it worked.
Overwriting beyond the bounds of allocated memory causes Undefined Behavior.
So in a way yes you got lucky.
Undefined behavior means anything can happen and the behavior cannot be explained as the Standard, which defines the rules of the language, does not define any behavior.
EDIT:
On second thought, I would say you are really unlucky here that the program works fine and does not crash. That it works now does not mean it will always work; in fact, it is a ticking bomb.
As per Murphy's Law:
"Anything that can go wrong will go wrong"["and most likely at the most inconvenient possible moment"]
[ ]- Is my edit to the Law :)
Yes, you're quite simply getting lucky.
Typically, the heap is contiguous. This means that when you write past the malloced memory, you could be corrupting the following memory block, or some internal data structures that may exist between user memory blocks. Such corruption often manifests itself long after the offending code, which makes debugging this type of bugs difficult.
You're probably getting the NULs because the memory happens to be zero-filled (which isn't guaranteed).
As @Als said, this is undefined behaviour. This may crash, but it doesn't have to.
Many memory managers allocate in larger chunks of memory and then hand it to the "user" in smaller chunks, probably a multiple of 4 or 8 bytes. So your write over the boundary probably simply writes into the extra bytes allocated. Or it overwrites one of the other variables you have.
You're not malloc-ing enough bytes for dst1. The first string, "123456789", is 10 bytes (including the null terminator), and {'h','e','l','l','o'} needs 6 bytes once a null terminator is added, so src2 should be declared with room for it. As written, the code clobbers memory, which leads to undefined (i.e. odd) behavior.

Strcpy() corrupts the copied string in Solaris but not Linux

I'm writing a C code for a class. This class requires that our code compile and run on the school server, which is a sparc solaris machine. I'm running Linux x64.
I have this line to parse (THIS IS NOT ACTUAL CODE BUT IS INPUT TO MY PROGRAM):
while ( cond1 ){
I need to capture the "while" and the "cond1" into separate strings. I've been using strtok() to do this. In Linux, the following lines:
char *cond = NULL;
cond = (char *)malloc(sizeof(char));
memset(cond, 0, sizeof(char));
strcpy(cond, strtok(NULL, ": \t(){")); //already got the "while" out of the line
will correctly capture the string "cond1". Running this on the Solaris machine, however, gives me the string "cone1".
Note that in plenty of other cases within my program, strings are being copied correctly. (For instance, the "while" was captured correctly.)
Does anyone know what is going on here?
The line:
cond = (char *)malloc(sizeof(char));
allocates exactly one char for storage, into which you are then copying more than one - strcpy needs to put, at a bare minimum, the null terminator but, in your case, also the results of your strtok as well.
The reason it may work on a different system is that some implementations of malloc will allocate at a certain resolution (e.g., a multiple of 16 bytes) no matter what actual value you ask for, so you may have some free space there at the end of your buffer. But what you're attempting is still very much undefined behaviour.
The fact that the undefined behaviour may be to work sometimes in no way abrogates your responsibility to avoid such behaviour.
Allocate enough space for storing the results of your strtok and you should be okay.
The safest way to do this is to dynamically allocate the space so that it's at least as big as the string you're passing to strtok. That way there can be no possibility of overflow (other than weird edge cases where other threads may modify the data behind your back but, if that were the case, strtok would be a very bad choice anyway).
Something like (if instr is your original input string):
cond = (char*)malloc(strlen(instr)+1);
This guarantees that any token extracted from instr will fit within cond.
As an aside, sizeof(char) is always 1 by definition, so you don't need to multiply by it.
cond is being allocated one byte. strcpy is copying at least two bytes to that allocation. That is, you are writing more bytes into the allocation than there is room for.
One way to fix it is to use char *cond = malloc (1000); instead of what you've got, provided no token can be longer than 999 characters.
You only allocated memory for 1 character but you are trying to store at least 6 characters (you need space for the terminating \0). The quick and dirty way to solve this is just say
char cond[128]
instead of malloc.

Using strtok() on an allocated string?

Is there anything I should know about using strtok on a malloced string?
In my code I have (in general terms)
char* line=getline();
Parse(dest,line);
free(line);
where getline() is a function that returns a char * to some malloced memory.
and Parse(dest, line) is a function that parses line, storing the results in dest (which has been partially filled earlier, from other information).
Parse() calls strtok() a variable number of times on line, and does some validation.
Each token (a pointer to what is returned by strtok()) is put into a queue 'til I know how many I have.
They are then copied onto a malloc'd char** in dest.
Now free(line)
and a function that free's each part of the char*[] in dest, both come up on valgrind as:
"Address 0x5179450 is 8 bytes inside a block of size 38 free'd"
or something similar.
I'm considering refactoring my code not to directly store the tokens on the char** but instead store a copy of them (by mallocing space equal to strlen(token)+1, then using strcpy()).
There is a function strdup which allocates memory and then copies another string into it.
You ask:
Is there anything I should know about
using strtok on a malloced string?
There are a number of things to be aware of. First, strtok() modifies the string as it processes it, inserting nulls ('\0') where the delimiters are found. This is not a problem with allocated memory (that's modifiable!); it is a problem if you try passing a constant string to strtok().
Second, you must have as many calls to free() as you do to malloc() and calloc() (but realloc() can mess with the counting).
In my code I have (in general terms)
char* line=getline();
Parse(dest,line);
free(line);
Unless Parse() allocates the space it keeps, you cannot use the dest structure (or, more precisely, the pointers into the line within the dest structure) after the call to free(). The free() releases the space that was allocated by getline() and any use of the pointers after that yields undefined behaviour. Note that undefined behaviour includes the option of 'appearing to work, but only by coincidence'.
where getline() is a function that
returns a char * to some malloced
memory, and Parse(dest, line) is a
function that parses line,
storing the results in dest (which
has been partially filled earlier,
from other information).
Parse() calls strtok() a variable
number of times on line, and does some
validation. Each token (a pointer to
what is returned by strtok()) is put
into a queue 'til I know how many I
have.
Note that the pointers returned by strtok() are all pointers into the single chunk of space allocated by getline(). You have not described any extra memory allocation.
They are then copied onto a malloc'd
char** in dest.
This sounds as if you copy the pointers from strtok() into an array of pointers, but you do not attend to copying the data that those pointers are pointing at.
Now free(line) and a function that
free's each part of the char*[] in
dest,
both come up on valgrind as:
"Address 0x5179450 is 8 bytes inside a block of size 38 free'd"
or something similar.
The first free() of the 'char *[]' part of dest probably has a pointer to line and therefore frees the whole block of memory. All the subsequent frees on the parts of dest are trying to free an address not returned by malloc(), and valgrind is trying to tell you that. The free(line) operation then fails because the first free() of the pointers in dest already freed that space.
I'm considering refactoring my code
[to] store a copy of them [...].
The refactoring proposed is probably sensible; the function strdup() already mentioned by others will do the job neatly and reliably.
Note that after refactoring, you will still need to release line, but you will not release any of the pointers returned by strtok(). They are just pointers into the space managed by (identified by) line and will all be released when you release line.
Note that you will need to free each of the separately allocated (strdup()'d) strings as well as the array of character pointers that are accessed via dest.
Alternatively, do not free line immediately after calling Parse(). Have dest record the allocated pointer (line) and free that when it frees the array of pointers. You still do not release the pointers returned by strtok(), though.
They are then copied onto a malloc'd char** in dest.
The strings are copied, or the pointers are copied? The strtok function modifies the string you give it so that it can give you pointers into that same string without copying anything. When you get tokens from it, you must copy them. Either that or keep the input string around as long as any of the token pointers are in use.
Many people recommend that you avoid strtok altogether because it's error-prone. Also, if you're using threads and the CRT is not thread-aware, strtok will likely crash your app.
1. In your Parse(), strtok() only writes '\0' at every delimiter position it matches. There is nothing special about this step, and using strtok() is easy, although of course it cannot be used on a read-only memory buffer.
2. For each substring obtained in Parse(), copy it into its own malloc()ed buffer. As a simple example of storing the substrings, conceptually it looks like the code below, though it may not match your real code exactly:
char **dest = malloc(N * sizeof *dest);
for (size_t i = 0; i < N; i++) {
    /* these two lines could be replaced by: dest[i] = strdup(sub_strings[i]); */
    dest[i] = malloc(strlen(sub_strings[i]) + 1);
    strcpy(dest[i], sub_strings[i]);
}
3. Free dest, conceptually again:
for (size_t i = 0; i < N; i++) {
    free(dest[i]);
}
free(dest);
4. Calling free(line) is nothing special either, and it doesn't affect "dest" at all.
"dest" and "line" use different memory buffers, so you can perform step 4 before step 3 if you prefer. If you follow the steps above, no errors should occur. It seems you made a mistake in step 2 of your code.
