Does C's strcat() overwrite the source string?

I'm a Java programmer struggling to pick up C. In particular, I am struggling to understand strcat(). If I call:
strcat(dst, src);
I get that strcat() will modify my dst String. But shouldn't it leave the src String alone? Consider the below code:
#include<stdio.h>
#include<string.h>
void printStuff(char* a, char* b){
printf("----------------------------------------------\n");
printf("src: (%zu chars)\t\"%s\"\n",strlen(a),a);
printf("dst: (%zu chars)\t\"%s\"\n",strlen(b),b);
printf("----------------------------------------------\n");
}
int main()
{
char src[25], dst[25];
strcpy(src, "This is source123");
strcpy(dst, "This is destination");
printStuff(src, dst);
strcat(dst, src);
printStuff(src, dst);
return 0;
}
Which produces this output on my Linux box, compiling with GCC:
----------------------------------------------
src: (17 chars) "This is source123"
dst: (19 chars) "This is destination"
----------------------------------------------
----------------------------------------------
src: (4 chars) "e123"
dst: (36 chars) "This is destinationThis is source123"
----------------------------------------------
I'm assuming that the full "This is source123" String is still in memory and strcat() has advanced the char* src pointer forward 13 chars. But why? Why 13 chars? I've played around with the length of the dst string, and it definitely has an impact on the src pointer after strcat() is done. But I don't understand why...
Also... how would you debug this, in GDB, say? I tried "step" to step into the strcat() function, but I guess that function wasn't analyzed by the debugger; "step" did nothing.
Thanks!
-ROA
PS - One quick note, I did read through similar strcat() posts on this site, but didn't see one that seemed to directly apply to my question. Apologies if I missed the post which did.

Your destination doesn't have enough memory allocated to hold the concatenated string: dst is 25 bytes, but the result needs 37. strcat therefore writes beyond the bounds of dst, and src is probably being overwritten because it happens to sit just after dst in memory. The src pointer itself never moves; your output is consistent with src starting 32 bytes after the start of dst, so only the last 4 of the 36 characters land in it.
Allocate enough memory for dst and strcat will no longer overwrite the source string.
Note that the buffer holding the concatenated string needs to be at least the combined length of the two strings (36 in your case) plus one byte for the null terminator.
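For the program in the question, the simplest fix is to declare dst with at least 37 bytes. When the lengths are not known in advance, the size can be computed at run time instead. Here is a minimal sketch of that idea; concat_alloc is a hypothetical helper name, not a standard function:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): returns a newly
 * allocated string holding a followed by b, or NULL if malloc fails.
 * The caller must free() the result. */
char *concat_alloc(const char *a, const char *b)
{
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);

    /* Room for both strings plus one byte for the null terminator. */
    char *out = malloc(len_a + len_b + 1);
    if (out == NULL) {
        return NULL;
    }

    memcpy(out, a, len_a);
    memcpy(out + len_a, b, len_b + 1); /* +1 copies b's terminator too */
    return out;
}
```

Called as concat_alloc("This is destination", "This is source123"), it allocates 37 bytes, and neither input is modified.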

Yes, I'm sure everything to do with manual memory management comes with some difficulty if your background is strictly Java.
With regard to anything related to C strings, it will probably be useful to put everything you know about Java Strings out of your head. The closest Java analogs of C strings are char[] and byte[]. Even there you can get in trouble, however, because Java performs bounds-checking for you, but C does not. In fact, C allows you to do all manner of things that you oughtn't to do, all the while standing back and quietly murmuring, "who knows what will happen if you do that?".
In particular, when you call strcat() or any other function that writes into a char array, you are responsible for ensuring that there is enough space in the destination array to accommodate the characters. If there isn't, then the resulting behavior is undefined (who knows what will happen?). You exercised just such undefined behavior.
Generally speaking, you need to do one or more of these things:
Have a hard upper bound on the size that could be needed, and allocate at least that much space, or
Know how much space you have, and work within that space (e.g. truncate any excess), or
Track how much space you have and how much space you need, and allocate more space as needed (being sure to later free all dynamically allocated space when you no longer need it).
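The third option above (track the space you have and grow it as needed) could be sketched roughly like this. The struct and function names are made up for illustration, and the doubling growth policy is just one reasonable choice:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer; names are illustrative only. */
struct strbuf {
    char  *data;  /* heap storage, null-terminated whenever non-NULL */
    size_t len;   /* characters stored, excluding the terminator */
    size_t cap;   /* bytes allocated */
};

/* Appends src, growing the allocation when needed.
 * Returns 0 on success, -1 if memory could not be allocated. */
int strbuf_append(struct strbuf *sb, const char *src)
{
    size_t add = strlen(src);
    size_t need = sb->len + add + 1;   /* +1 for the terminator */

    if (need > sb->cap) {
        size_t new_cap = sb->cap ? sb->cap : 16;
        while (new_cap < need) {
            new_cap *= 2;
        }
        char *p = realloc(sb->data, new_cap);
        if (p == NULL) {
            return -1;                 /* old buffer is still valid */
        }
        sb->data = p;
        sb->cap = new_cap;
    }

    memcpy(sb->data + sb->len, src, add + 1);
    sb->len += add;
    return 0;
}

/* Releases the storage and resets the buffer to empty. */
void strbuf_free(struct strbuf *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = 0;
    sb->cap = 0;
}
```

Start from a zero-initialized buffer (struct strbuf sb = {0};), append as often as you like, and call strbuf_free when done.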

Related

Confusion in "strcat function in C assumes the destination string is large enough to hold contents of source string and its own."

So I read that strcat function is to be used carefully as the destination string should be large enough to hold contents of its own and source string. And it was true for the following program that I wrote:
#include <stdio.h>
#include <string.h>
int main(){
char *src, *dest;
printf("Enter Source String : ");
fgets(src, 10, stdin);
printf("Enter destination String : ");
fgets(dest, 20, stdin);
strcat(dest, src);
printf("Concatenated string is %s", dest);
return 0;
}
But not true for the one that I wrote here:
#include <stdio.h>
#include <string.h>
int main(){
char src[11] = "Hello ABC";
char dest[15] = "Hello DEFGIJK";
strcat(dest, src);
printf("concatenated string %s", dest);
getchar();
return 0;
}
This program ends up adding both without considering that destination string is not large enough. Why is it so?
The strcat function has no way of knowing exactly how long the destination buffer is, so it assumes that the buffer passed to it is large enough. If it's not, you invoke undefined behavior by writing past the end of the buffer. That's what's happening in the second piece of code.
The first piece of code is also invalid because both src and dest are uninitialized pointers. When you pass them to fgets, it reads whatever garbage value they contain, treats it as a valid address, then tries to write values to that invalid address. This is also undefined behavior.
One of the things that makes C fast is that it doesn't check to make sure you follow the rules. It just tells you the rules and assumes that you follow them, and if you don't bad things may or may not happen. In your particular case it appeared to work but there's no guarantee of that.
For example, when I ran your second piece of code it also appeared to work. But if I changed it to this:
#include <stdio.h>
#include <string.h>
int main(){
char dest[15] = "Hello DEFGIJK";
strcat(dest, "Hello ABC XXXXXXXXXX");
printf("concatenated string %s", dest);
return 0;
}
The program crashes.
I think your confusion is not actually about the definition of strcat. Your real confusion is that you assumed that the C compiler would enforce all the "rules". That assumption is quite false.
Yes, the first argument to strcat must be a pointer to memory sufficient to store the concatenated result. In both of your programs, that requirement is violated. You may be getting the impression, from the lack of error messages in either program, that perhaps the rule isn't what you thought it was, that somehow it's valid to call strcat even when the first argument is not a pointer to enough memory. But no, that's not the case: calling strcat when there's not enough memory is definitely wrong. The fact that there were no error messages, or that one or both programs appeared to "work", proves nothing.
Here's an analogy. (You may even have had this experience when you were a child.) Suppose your mother tells you not to run across the street, because you might get hit by a car. Suppose you run across the street anyway, and do not get hit by a car. Do you conclude that your mother's advice was incorrect? Is this a valid conclusion?
In summary, what you read was correct: strcat must be used carefully. But let's rephrase that: you must be careful when calling strcat. If you're not careful, all sorts of things can go wrong, without any warning. In fact, many style guides recommend not using functions such as strcat at all, because they're so easy to misuse if you're careless. (Functions such as strcat can be used perfectly safely as long as you're careful -- but of course not all programmers are sufficiently careful.)
The strcat() function is indeed to be used carefully because it doesn't protect you from anything. If the source string isn't null-terminated, the destination string isn't null-terminated, or the destination doesn't have enough space, strcat will still copy data. Therefore, it is easy to overwrite data you didn't mean to overwrite. It is your responsibility to make sure you have enough space. Using strncat() instead of strcat will also give you some extra safety, provided you pass it the space remaining in the destination rather than the destination's total size.
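A common mistake with strncat is to pass the total size of the destination as the count; the count is the maximum number of characters to append, and strncat then writes a terminator on top of that. A minimal sketch of a wrapper that gets the arithmetic right (append_bounded is a hypothetical name):

```c
#include <string.h>

/* Hypothetical wrapper: appends as much of src as fits in dst, which
 * must already hold a null-terminated string shorter than dst_size.
 * Returns 1 if all of src was appended, 0 if it was truncated. */
int append_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t room = dst_size - used - 1;  /* keep one byte for '\0' */

    /* strncat copies at most `room` characters, then always writes '\0'. */
    strncat(dst, src, room);
    return strlen(src) <= room;
}
```

With char dest[15] = "Hello DEFGIJK", appending "Hello ABC" this way adds only the single character that still fits and reports truncation, instead of writing past the array.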
Edit Here's an example:
#include <stdio.h>
#include <string.h>
int main()
{
char s1[16] = {0};
char s2[16] = {0};
strcpy(s2, "0123456789abcdefOOPS WAY TOO LONG");
/* ^^^ purposefully copy too much data into s2 */
printf("-%s-\n",s1);
return 0;
}
I never assigned to s1, so the output should ideally be --. However, because of how the compiler happened to arrange s1 and s2 in memory, the output I actually got was -OOPS WAY TOO LONG-. The strcpy(s2,...) overwrote the contents of s1 as well.
On gcc, -Wall or -Wstringop-overflow will help you detect situations like this one, where the compiler knows the size of the source string. However, in general, the compiler can't know how big your data will be. Therefore, you have to write code that makes sure you don't copy more than you have room for.
Both snippets invoke undefined behavior - the first because src and dest are not initialized to point anywhere meaningful, and the second because you are writing past the end of the array.
C does not enforce any kind of bounds checking on array accesses - you won't get an "Index out of range" exception if you try to write past the end of an array. You may get a runtime error if you try to access past a page boundary or clobber something important like the frame pointer, but otherwise you just risk corrupting data in your program.
Yes, you are responsible for making sure the target buffer is large enough for the final string. Otherwise the results are unpredictable.
I'd like to point out what is actually happening in the 2nd program in order to illustrate the problem.
It allocates 15 bytes at the memory location starting at dest and copies 14 bytes into it (including the null terminator):
char dest[15] = "Hello DEFGIJK";
...and 11 bytes at src with 10 bytes copied into it:
char src[11] = "Hello ABC";
The strcat() call then copies 10 bytes (9 chars plus the null terminator) from src into dest, starting right after the 'K' in dest. The resulting string at dest will be 23 bytes long including the null terminator. The problem is, you allocated only 15 bytes at dest, and the memory adjacent to that memory will be overwritten, i.e. corrupted, leading to program instability, wrong results, data corruption, etc.
Note that the strcat() function knows nothing about the amount of memory you've allocated at dest (or src, for that matter). It is up to you to make sure you've allocated enough memory at dest to prevent memory corruption.
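The byte counting above can be turned into a check that runs before the strcat() call. This is only a sketch; concat_size and concat_fits are hypothetical helper names:

```c
#include <stddef.h>
#include <string.h>

/* Bytes needed to hold dst followed by src, including the terminator. */
size_t concat_size(const char *dst, const char *src)
{
    return strlen(dst) + strlen(src) + 1;
}

/* Hypothetical guard: 1 if the concatenation fits in dst_size bytes. */
int concat_fits(size_t dst_size, const char *dst, const char *src)
{
    return concat_size(dst, src) <= dst_size;
}
```

For "Hello DEFGIJK" and "Hello ABC", concat_size returns 13 + 9 + 1 = 23, so concat_fits(15, ...) is false and the call should be skipped or the buffer enlarged.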
By the way, the first program doesn't allocate memory at dest or src at all, so your calls to fgets() are corrupting memory starting at those locations.

C how strcpy works and Does it change the size of the original string?

I have this code..
#include <stdio.h>
#include <string.h>
int main() {
char a[6]="Hello";
char b[]="this is mine";
strcpy(a,b);
printf("%zu\n",sizeof(a));
printf("%zu\n",sizeof(b));
printf("%s\n",a);
printf("%s\n",b);
printf("%zu\n",sizeof(a));
return 0;
}
6
13
this is mine
this is mine
6
I want to ask: even though I have copied the larger string into a, its size is not changing, but its contents are. Why?
Any help would be appreciated..
You cannot change the size of array a. strcpy is meant to be very fast, so it assumes that you, the caller, pass an array large enough to fit the copied string. What you have done is overwrite a's null terminator and change memory past what you allocated. This often causes a program to crash, although a simple example like this one may appear to run correctly.
The array a has a size of six char that cannot be changed. When you copy the other, longer string into the array, you overrun the array, introducing both instability and security concerns to the program.
When the computer loads the program into memory, the string literal "Hello" is typically loaded into read-only memory as a constant, the space needed for the array is allocated on the stack, and finally the string is copied into the array.
In this case, the source string is longer than the destination array: a can hold 6 characters, while the string you are copying into it occupies 13 bytes including the null terminator. This leads to a buffer overrun, which can cause bugs at the very least. Worse than that is the potential for information leaks and even more disastrous security consequences.
Please refer to the strcpy man page for details.
In this example code, it minimally worked, but this is definitely something to avoid.
The size is not changing because for the compiler the array a has a fixed size and you cannot change it and no one can.
The contents are changing because no check is performed on the bounds of the array, and it's working only by coincidence. The code you posted has undefined behavior. One possible outcome is that it works, as it does in your case, but that will not necessarily always happen: add a variable to your main() function, for example, and it might stop working.
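If you would rather truncate than overrun, one option is snprintf, which never writes more than the size you give it and always terminates the result when that size is non-zero. A minimal sketch, with copy_truncate as a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dst, truncating if necessary,
 * and always null-terminates when dst_size > 0.
 * Returns 1 if the whole string fit, 0 if it was truncated or
 * snprintf reported an error. */
int copy_truncate(char *dst, size_t dst_size, const char *src)
{
    int n = snprintf(dst, dst_size, "%s", src);
    return n >= 0 && (size_t)n < dst_size;
}
```

With char a[6], copy_truncate(a, sizeof a, "this is mine") stores "this " (5 characters plus the terminator) and returns 0; sizeof a is still 6, exactly as before.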

Overflow not detected when writing nul character in middle of string?

Say I have the code:
char* word = malloc (sizeof(char) * 6);
strcpy(word, "hello\0extra");
puts(word);
free(word);
This compiles just fine and Valgrind has no issue, but is there actually a problem? It seems like I am writing into memory that I don't own.
Also, a separate issue, but when I do overfill my buffer with something like
char* word = malloc (sizeof(char) * 6);
strcpy(word, "1234567\0");
puts(word);
free(word);
It prints out 1234567 and Valgrind does catch the problem. What are the consequences of doing something like this? It seems to work every time. Please correct me if this is wrong, but from what I understand, it is possible for another program to take the memory past the 6 and write into it. If that happened, will printing the word just go on forever until it encounter a nul character? That character has just been really confusing for me in learning C strings.
The first strcpy is okay
strcpy(word, "hello\0extra");
You create a char array constant and pass a pointer to it to strcpy. All characters up to and including the first \0 are copied; the remainder is ignored.
But wait... You have some extra characters. This makes your const data section a bit larger. That could be a problem in an embedded environment where flash space is scarce. But there is no run-time problem.
strcpy(word, "hello\0extra");
This is valid because the second parameter should be a well-formed string, and it is: the \0 as its 6th character terminates a string of length 5.
strcpy(word, "1234567\0");
Here you are writing to memory that you don't own and never allocated, so this is undefined behavior and might cause a crash (segmentation fault).
With your first call to strcpy, there is a NUL in the middle of the source string. Functions that deal with null-terminated strings treat the string as stopping at the first NUL, so the rest is ignored and only 6 bytes are written, which fits in the buffer. free will still free the whole block, and Valgrind will not report a problem, because the allocator keeps track of the size of each block it hands out and free uses that record to determine how many bytes to release. In other words, malloc and free are not meant to deal with null-terminated strings, so the NUL in the middle of the string does not affect them. Instead, free works from the number of bytes you allocated in the first place.
With the second example, you overflow the end of the buffer that was allocated by malloc. The results of that are undefined. In theory, the memory you are writing to could belong to another allocation or to the allocator's own bookkeeping, but in your example nothing else uses the memory just after your buffer, so the overflow appeared harmless. The string-processing functions treat your string as ending at the first NUL, not at the end of the buffer allocated by malloc, so all of the string is printed out.
Your first question has a couple good answers already. About your second question, on the consequences of writing one byte past the end of your malloced memory:
It's doubtful that mallocing 6 bytes and writing 7 into it will cause a crash. malloc likes to align memory on certain boundaries, so it's not likely to give you six bytes right at the end of a page, such that there would be an access violation at byte 7. But if you malloc 65536 bytes and try to write past the end of that, your program might crash. Writing to invalid memory works a lot of the time, which makes debugging tricky, because you get random crashes only in certain situations.
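For both examples in the question, a safer pattern is to size the allocation from the string itself instead of hard-coding 6. The sketch below (duplicate_string is a hypothetical name) also shows that the string functions only ever see the part before the first NUL:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s sized from strlen(s),
 * or NULL if malloc fails. The caller must free() the result. */
char *duplicate_string(const char *s)
{
    size_t size = strlen(s) + 1;   /* strlen stops at the first '\0' */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, s, size);
    }
    return copy;
}
```

For "hello\0extra" the literal occupies 12 bytes, but strlen reports 5, so 6 bytes are allocated and copied. For "1234567" the helper allocates 8 bytes, which is what the second example needed.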

How strcpy works behind the scenes?

This may be a very basic question for some. I was trying to understand how strcpy actually works behind the scenes. For example, in this code
#include <stdio.h>
#include <string.h>
int main ()
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%zu\n", sizeof(s));
return 0;
}
I am declaring s to be a static array with a size smaller than that of the source. I thought it wouldn't print the whole word, but it did print world isnsadsdas. So I thought that strcpy might be allocating more space if the destination is smaller than the source. But when I check sizeof(s), it is still 6, yet it is printing out more than that. How is that actually working?
You've just caused undefined behaviour, so anything can happen. In your case, you're getting lucky and it's not crashing, but you shouldn't rely on that happening. Here's a simplified strcpy implementation (but it's not too far off from many real ones):
char *strcpy(char *d, const char *s)
{
char *saved = d;
while (*s)
{
*d++ = *s++;
}
*d = 0;
return saved;
}
sizeof is just returning you the size of your array from compile time. If you use strlen, I think you'll see what you expect. But as I mentioned above, relying on undefined behaviour is a bad idea.
http://natashenka.ca/wp-content/uploads/2014/01/strcpy8x11.png
strcpy is considered dangerous for reasons like the one you are demonstrating. The two buffers you created are local variables stored in the stack frame of the function. Here is roughly what the stack frame looks like:
http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Call_stack_layout.svg/342px-Call_stack_layout.svg.png
FYI, the stack typically grows downward through memory: newer entries are placed at lower addresses than older ones (this does not mean the variables themselves are stored backwards). So if you write far enough past the end of a local buffer, you write forward over whatever stack variables sit after it, break into other sections of the frame, and eventually overwrite the return address. The result is that, if you are clever, you have full control of where the function returns. You could make it do anything really, but it isn't YOU that is the concern.
As you seem to know from making your first buffer 6 chars long for a 5-character string, C strings end in a null byte \x00. The strcpy function copies bytes until the source byte is 0, but it does not check that the destination is that long, which is why it can copy past the end of the array. This is also why your printf reads past the size of the buffer: it reads until \x00. Interestingly, the strcpy may have written into the data of a, depending on the order the compiler placed the arrays on the stack, so a fun exercise could be to also print a and see if you get something like 'snsadsdas'. I can't be sure what it would look like even if it is polluting a, because there are sometimes padding bytes between stack entries for various reasons.
If this buffer holds say, a password to check in code with a hashing function, and you copy it to a buffer in the stack from wherever you get it (a network packet if a server, or a text box, etc) you very well may copy more data from the source than the destination buffer can hold and give return control of your program to whatever user was able to send a packet to you or try a password. They just have to type the right number of characters, and then the correct characters that represent an address to somewhere in ram to jump to.
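You can use strcpy if you check the bounds and maybe trim the source string, but it is considered bad practice. There are functions that take a maximum length, such as strncpy: http://www.cplusplus.com/reference/cstring/strncpy/ . Be aware that strncpy does not null-terminate the destination when the source is at least as long as the limit, so you have to add the terminator yourself.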
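Here is one way a length-limited copy with strncpy might look. It is a sketch, not a drop-in replacement, and copy_bounded is a hypothetical name:

```c
#include <string.h>

/* Hypothetical wrapper around strncpy that always terminates dst.
 * dst_size must be at least 1. Returns 1 if src fit, 0 if truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    /* strncpy writes exactly dst_size - 1 bytes here, padding with '\0'
     * when src is shorter; it does NOT terminate when src is longer. */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
    return strlen(src) < dst_size;
}
```

With char s[6], copy_bounded(s, sizeof s, "world isnsadsdas") leaves "world" in s and returns 0, so the caller can tell that the input was cut off.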
Oh, and lastly, this is all called a buffer overflow. Some compilers mitigate it with a stack canary: a random value placed between a function's local buffers and its return address. Before the function returns, the generated code checks that the value is unchanged and terminates the program if it is not. This solves a lot of security problems, but it is still sometimes possible to get around the check. It just becomes a lot harder to do right.
In C there is no bounds checking of arrays; it's a trade-off: better performance at the risk of shooting yourself in the foot.
strcpy() doesn't care whether the target buffer is big enough, so copying too many bytes will cause undefined behavior.
That is one of the reasons a bounds-checked variant, strcpy_s(), which takes the target buffer size, was introduced (it is part of the optional Annex K of C11, so not every C library provides it).
Note that sizeof(s) is determined at compile time. Use strlen() to find the number of characters stored in s. When you perform strcpy(), the contents of the destination are replaced by the source string, not appended to, so your output won't be "Helloworld isnsadsdas".
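#include <stdio.h>
#include <string.h>
int main(void)
{
char s[20] = "Hello"; /* large enough to hold the contents of a */
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%zu\n", strlen(s)); /* strlen returns size_t, so use %zu */
return 0;
}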
You are relying on undefined behaviour, inasmuch as the compiler has chosen to place the two arrays where your code happens to work. This may not work in future.
As to the sizeof operator, this is figured out at compile time.
Once you use adequate array sizes you need to use strlen to fetch the length of the strings.
The best way to understand how strcpy works behind the scene is...reading its source code!
You can read the source for GLibC : http://fossies.org/dox/glibc-2.17/strcpy_8c_source.html . I hope it helps!
At the end of every C string there is a null terminator character '\0' which marks the end of the string.
strcpy() performs its task until it sees the '\0' character.
printf() with %s also performs its task until it sees the '\0' character.
The sizeof operator, on the other hand, is not interested in the content of the array, only its allocated size (how big it is supposed to be), so it does not take into account where the string actually ends (how big it actually is).
As opposed to sizeof, strlen() is interested in how long the string actually is (not how long it was supposed to be): it counts the characters until it reaches the terminating '\0', which it does not include in the count.
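To make the difference concrete, here is roughly what strlen does internally. This is an illustrative re-implementation under the hypothetical name string_length, not the library's actual source:

```c
#include <stddef.h>

/* Illustrative re-implementation of what strlen does; not the
 * library's actual source. */
size_t string_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {   /* stop at the terminator, do not count it */
        n++;
    }
    return n;
}
```

For char a[20] = "world isnsadsdas", sizeof a is 20 (the array's size, fixed at compile time) while string_length(a) is 16 (the characters before the terminator).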
A more compact implementation is:
char *strcpy(char *p,char const *q)
{
char *saved=p;
while(*p++=*q++);
return saved;
}

Strcpy() corrupts the copied string in Solaris but not Linux

I'm writing C code for a class. This class requires that our code compile and run on the school server, which is a SPARC Solaris machine. I'm running Linux x64.
I have this line to parse (THIS IS NOT ACTUAL CODE BUT IS INPUT TO MY PROGRAM):
while ( cond1 ){
I need to capture the "while" and the "cond1" into separate strings. I've been using strtok() to do this. In Linux, the following lines:
char *cond = NULL;
cond = (char *)malloc(sizeof(char));
memset(cond, 0, sizeof(char));
strcpy(cond, strtok(NULL, ": \t\(){")); //already got the "while" out of the line
will correctly capture the string "cond1". Running this on the Solaris machine, however, gives me the string "cone1".
Note that in plenty of other cases within my program, strings are being copied correctly. (For instance, the "while" was captured correctly.)
Does anyone know what is going on here?
The line:
cond = (char *)malloc(sizeof(char));
allocates exactly one char for storage, into which you are then copying more than one - strcpy needs to put, at a bare minimum, the null terminator but, in your case, also the results of your strtok as well.
The reason it may work on a different system is that some implementations of malloc will allocate at a certain resolution (e.g., a multiple of 16 bytes) no matter what actual value you ask for, so you may have some free space there at the end of your buffer. But what you're attempting is still very much undefined behaviour.
The fact that the undefined behaviour may be to work sometimes in no way abrogates your responsibility to avoid such behaviour.
Allocate enough space for storing the results of your strtok and you should be okay.
The safest way to do this is to dynamically allocate the space so that it's at least as big as the string you're passing to strtok. That way there can be no possibility of overflow (other than weird edge cases where other threads may modify the data behind your back but, if that were the case, strtok would be a very bad choice anyway).
Something like (if instr is your original input string):
cond = (char*)malloc(strlen(instr)+1);
This guarantees that any token extracted from instr will fit within cond.
As an aside, sizeof(char) is always 1 by definition, so you don't need to multiply by it.
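Putting that together, a sketch of the token extraction with properly sized allocations might look like the following. second_token is a hypothetical helper, and it works on a private copy of the line because strtok modifies the string it scans:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of the second token of line,
 * or NULL if there is no second token or allocation fails.
 * The caller must free() the result. */
char *second_token(const char *line, const char *delims)
{
    size_t size = strlen(line) + 1;
    char *work = malloc(size);       /* strtok modifies its input */
    if (work == NULL) {
        return NULL;
    }
    memcpy(work, line, size);

    char *result = NULL;
    char *tok = strtok(work, delims);        /* first token, e.g. "while" */
    if (tok != NULL) {
        tok = strtok(NULL, delims);          /* second token, e.g. "cond1" */
    }
    if (tok != NULL) {
        size_t tok_size = strlen(tok) + 1;   /* token plus terminator */
        result = malloc(tok_size);
        if (result != NULL) {
            memcpy(result, tok, tok_size);
        }
    }

    free(work);
    return result;
}
```

Calling second_token("while ( cond1 ){", ": \t(){") returns a 6-byte allocation holding "cond1", regardless of how the platform's malloc rounds small requests.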
cond is being allocated one byte. strcpy is copying at least two bytes to that allocation. That is, you are writing more bytes into the allocation than there is room for.
One way to fix it is to use char *cond = malloc(1000); instead of what you've got.
You only allocated memory for 1 character, but you are trying to store at least 6 characters (you need space for the terminating \0). The quick and dirty way to solve this is to just write
char cond[128];
instead of calling malloc.

Resources