Confusion about strcat with memcpy - c

Okay, so I have seen a few implementations of the strcat function that use memcpy. I understand that it is efficient, since there is no need to allocate. But how do you avoid overwriting the contents of the source string with the resultant string?
For example, let's take:
char *str1 = "Hello";
char *str2 = "World";
str1 = strcat(str1, str2);
How do I ensure that str2 isn't overwritten with the contents of the resultant "HelloWorld" string?
Also, if strings are nothing but char arrays, and arrays are supposed to have a fixed size, then isn't it unsafe to copy more bytes into the array than it can hold without reallocating memory?

It's not just unsafe, it's undefined behavior.
First of all, you're trying to modify a string literal, which inherently invokes UB.
Secondly, regarding the size of the destination buffer, quoting the man page (emphasis mine)
The strcat() function appends the src string to the dest string, overwriting the terminating null byte ('\0') at the end of dest, and then adds a terminating null byte. The strings may not overlap, and the dest string must have enough space for the result. If dest is not large enough, program behavior is unpredictable; [...]

I understand that it is efficient, since there is no need to allocate.
That's an incorrect understanding. Neither memcpy nor strcat allocates memory. Both require that you pass pointers to a sufficient amount of valid memory. If that is not the case, the program is subject to undefined behavior.
Your posted code is subject to undefined behavior for a couple of reasons:
str1 points to a string literal, which typically resides in a read-only portion of the program.
str1 does not have enough memory to hold the string "HelloWorld" and the terminating null character.
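To make the point concrete, here is a minimal sketch of a memcpy-based concatenation that leaves both inputs untouched by writing into a freshly allocated buffer. The name concat_new is made up for illustration; it is not a standard library function, and this is only one possible way to do it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the C library): returns a newly
 * allocated string holding a followed by b, or NULL if malloc fails.
 * Neither input is modified; the caller must free() the result. */
char *concat_new(const char *a, const char *b)
{
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    char *out = malloc(len_a + len_b + 1);  /* +1 for the terminating '\0' */

    if (out == NULL)
        return NULL;
    memcpy(out, a, len_a);                  /* copy a without its terminator */
    memcpy(out + len_a, b, len_b + 1);      /* copy b including its terminator */
    return out;
}
```

Called as concat_new("Hello", "World"), this yields a new string "HelloWorld" that the caller later releases with free(). The arguments are only read, so neither string literal is modified.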

Related

Why does this simple C program crash at runtime?

I tried the following simple C program but it crashes at runtime without giving any output. What is wrong here? How can I solve this problem?
#include <stdio.h>
#include <string.h>
int main(void)
{
char *s1="segmentation";
char *s2="fault";
char *s3=strcat(s1,s2);
printf("concatanated string is %s",s3);
}
So this is the aggregated answer for this question:
You should not try to alter a string literal in any way. According to the C standard, altering string literals causes undefined behaviour:
"It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined."
But let's say, for the sake of discussion, that s1 is not a string literal. You still need a large enough buffer for strcat to work on: strcat finds the terminating null character and starts writing the appended string there. If your buffer is not big enough, you will write outside the boundaries of your array, again causing undefined behaviour.
Because strcat appends to its first argument.
I.e. the result will be stored in s1, not in s3 (strcat returns its first argument, so s3 would just point to the same buffer).
You should allocate more memory for s1.
I.e.:
char* s1 = malloc(sizeof(char) * (13 + 6)); //length of your 2 strings
strcpy(s1, "segmentation");
char *s2="fault";
strcat(s1,s2);
printf("concatanated string is %s",s1);
Others are focusing on the lack of space in s1 for string concatenation. However, the bigger problem here is that you are trying to modify a string literal, which is undefined behavior. Defining s1 as a char array that has enough space should work:
char s1[20] = "segmentation";
char *s2 = "fault";
strcat(s1,s2);
printf("concatanated string is %s",s1);
char *s1="segmentation";
s1 points to a string literal, which must not be modified and will usually reside in read-only memory. If you look at the strcat definition:
char *strcat(char *dest, const char *src) here
dest -- This is pointer to the destination array, which should contain a C string, and should be large enough to contain the concatenated resulting string.
So when you call char *s3=strcat(s1,s2); you are trying to modify the string literal, which results in a segmentation fault.
The most problematic thing here is that you declared s1 and s2 as char * and not as const char *. Always use const in such a case: a string initialized this way lives in read-only memory.
If you want to extend the string in s1, you should not initialize it as you did; you should allocate the memory for s1 on the stack or in dynamic memory.
Example for allocating on the stack:
char s1[100] = "segmentation";
Example for allocating in the dynamic memory:
char *s1 = malloc(100 * sizeof(char));
strcpy(s1, "segmentation");
I used 100 here as I assume that this is enough for your string. You should always allocate at least the length of your final string + 1 bytes.
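If you append in place into a fixed-size buffer, it can help to check the remaining room before writing. The following is a rough sketch under that assumption; append_checked is a hypothetical helper name, not something the C library provides.

```c
#include <string.h>

/* Hypothetical helper: appends src to the string in dest only if the
 * result, including the terminating '\0', fits in dest_size bytes.
 * Returns 0 on success, -1 if there is not enough room (dest is left
 * unchanged in that case). dest must already hold a valid C string. */
int append_checked(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);
    size_t extra = strlen(src);

    /* We need used + extra + 1 <= dest_size. */
    if (used >= dest_size || extra >= dest_size - used)
        return -1;
    memcpy(dest + used, src, extra + 1);  /* copies the terminator too */
    return 0;
}
```

With char s1[20] = "segmentation"; a call such as append_checked(s1, sizeof s1, "fault") succeeds, while the same call on a 13-byte array is refused instead of writing past the end.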
Found a similar one here on comp.lang.c. It also answers in depth.
the main problem here is that space for the concatenated result is not
properly allocated. C does not provide an automatically-managed string
type. C compilers allocate memory only for objects explicitly
mentioned in the source code (in the case of strings, this includes
character arrays and string literals). The programmer must arrange for
sufficient space for the results of run-time operations such as string
concatenation, typically by declaring arrays, or by calling malloc.
strcat() performs no allocation; the second string is appended to the
first one, in place. The first (destination) string must be writable
and have enough room for the concatenated result. Therefore, one fix
would be to declare the first string as an array:
The original call to strcat in the question actually has two problems:
the string literal pointed to by s1, besides not being big enough for
any concatenated text, is not necessarily writable at all.
look at the definition of strcat()
char *strcat(char *dest, const char *src)
dest -- This is pointer to the destination array, which should contain a C string, and should be large enough to contain the concatenated resulting string.
src -- This is the string to be appended. This should not overlap the destination.
s1 is not large enough to hold the concatenated string, which causes a write beyond its limit. That causes the run-time failure.
Try this:
char *s1="segmentation";
char *s2="fault";
char* s3 = malloc(strlen(s1) + strlen(s2) + 1); /* sizeof(s1) is only the size of the pointer */
strcpy(s3, s1);
strcat(s3, s2);

strcpy behaving differently when two pointers are assigned strings in different ways

I am sorry, I might be asking a dumb question, but I want to understand: is there any difference between the assignments below? strcpy works in the first case but not in the second case.
char *str1;
*str1 = "Hello";
char *str2 = "World";
strcpy(str1,str2); //Works as expected
char *str1 = "Hello";
char *str2 = "World";
strcpy(str1,str2); //SEGMENTATION FAULT
How does the compiler understand each assignment? Please clarify.
Edit: In the first snippet you wrote *str1 = "Hello", which is equivalent to assigning to str1[0]. That is obviously wrong, because str1 is uninitialized and therefore is an invalid pointer. If we assume that you meant str1 = "Hello", then you are still wrong:
According to the C specification, attempting to modify a string literal results in undefined behavior: literals may be stored in read-only storage (such as .rodata) or combined with other string literals, so both snippets that you provided will yield undefined behavior.
I can only guess that in the second snippet the compiler is storing the string in some read-only storage, while in the first one it doesn't, so it works, but it's not guaranteed.
Sorry, both examples are very wrong and lead to undefined behaviour, that might or might not crash. Let me try to explain why:
In the first snippet, str1 is an uninitialized (dangling) pointer. That means str1 points somewhere arbitrary in your memory, and writing through str1 can have arbitrary consequences: for example, a crash or overwriting some data in memory (e.g. other local variables or variables in other functions; anything is possible).
The line *str1 = "Hello"; is also wrong (even if str1 were a valid pointer), as *str1 has type char (not char *) and is the first character of str1, which is dangling. However, you assign it a pointer ("Hello", type char *), which is a type error that your compiler will tell you about.
In the second snippet, str1 is a valid pointer but presumably points to read-only memory (hence the crash). Normally, string literals are stored in read-only data in the binary; you cannot write to them, but that's exactly what strcpy(str1,str2); tries to do.
A more correct example of what you want to achieve might be (with an array on the stack):
#define STR1_LEN 128
char str1[STR1_LEN] = "Hello"; /* array with space for 128 characters */
char *str2 = "World";
strncpy(str1, str2, STR1_LEN);
str1[STR1_LEN - 1] = 0; /* be sure to terminate str1 */
Other option (with dynamically managed memory):
#define STR1_LEN 128
char *str1 = malloc(STR1_LEN); /* allocate dynamic memory for str1 */
char *str2 = "World";
/* we should check here that str1 is not NULL, which would mean 'out of memory' */
strncpy(str1, str2, STR1_LEN);
str1[STR1_LEN - 1] = 0; /* be sure to terminate str1 */
free(str1); /* free the memory for str1 */
str1 = NULL;
EDIT: @chqrlie requested in the comments that the #define should be named STR1_SIZE, not STR1_LEN, presumably to reduce confusion because it's not the length in characters of the "string" but the size of the buffer allocated. Furthermore, @chqrlie requested not to give examples with the strncpy function. That wasn't really my choice, as the OP used strcpy, which is very dangerous, so I picked the closest function that can be used correctly. But yes, I should probably have added that the use of strcpy, strncpy, and similar functions is not recommended.
There seems to be some confusion here. Both fragments invoke undefined behaviour. Let me explain why:
char *str1; defines a pointer to characters, but it is uninitialized. If this definition occurs in the body of a function, its value is invalid. If this definition occurs at the global level, it is initialized to NULL.
*str1 = "Hello"; is an error: you are assigning a string pointer to the character pointed to by str1. str1 is uninitialized, so it does not point to anything valid, and you cannot assign a pointer to a character. You should have written str1 = "Hello";. Furthermore, the string "Hello" is constant, so the definition of str1 really should be const char *str1;.
char *str2 = "World"; Here you define a pointer to a constant string "World". This statement is correct, but it would be better to define str2 as const char *str2 = "World"; for the same reason as above.
strcpy(str1,str2); //Works as expected NO it does not work at all! str1 does not point to a char array large enough to hold a copy of the string "World" including the final '\0'. Given the circumstances, this code invokes undefined behaviour, which may or may not cause a crash.
You mention that the code works as expected. It only appears to. What really happens is this: str1 is uninitialized. If it pointed to an area of memory that cannot be written, writing to it would likely have crashed the program with a segmentation fault. But if it happens to point to an area of memory where you can write, the statement *str1 = "Hello"; will modify the first byte of this area, and then strcpy(str1, "World"); will modify the first 6 bytes at that place. The string pointed to by str1 will then be "World", as expected, but you have overwritten some area of memory that may be used for other purposes. Your program may consequently crash later in unexpected ways, a very hard bug to find! This is definitely undefined behaviour.
The second fragment invokes undefined behaviour for a different reason:
char *str1 = "Hello"; No problem, but should be const.
char *str2 = "World"; OK too, but should also be const.
strcpy(str1,str2); //SEGMENTATION FAULT Of course it is invalid: you are trying to overwrite the constant character string "Hello" with the characters from the string "World". It would work if the string constant was stored in modifiable memory, and would cause even greater confusion later in the program as the value of the string constant was changed. Luckily, most modern environments prevent this by storing string constants in read-only memory. Trying to modify said memory causes a segmentation violation, i.e. you are accessing the data segment of memory in a faulty way.
You should use strcpy() only to copy strings to character arrays you define as char buffer[SOME_SIZE]; or allocate as char *buffer = malloc(SOME_SIZE); with SOME_SIZE large enough to hold what you are trying to copy plus the final '\0'
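As a rough illustration of copying into a buffer of known size, the sketch below uses snprintf so that the destination is always terminated and truncation can be detected. The helper name copy_bounded is an assumption made for this example, not an existing function.

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dest (capacity dest_size bytes)
 * and always null-terminates. Returns 0 if the whole string fit,
 * -1 if it was truncated or dest_size is 0. */
int copy_bounded(char *dest, size_t dest_size, const char *src)
{
    int n;

    if (dest_size == 0)
        return -1;
    /* snprintf writes at most dest_size - 1 characters plus '\0' and
     * returns the length the full output would have had. */
    n = snprintf(dest, dest_size, "%s", src);
    return (n >= 0 && (size_t)n < dest_size) ? 0 : -1;
}
```

With char buffer[8]; copying "World" succeeds, whereas copying "segmentation" reports truncation and leaves the terminated prefix "segment" in the buffer.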
Both snippets are wrong, even if "it works" in your first case. Hopefully this is only an academic question! :)
First let's look at *str1 which you are trying to modify.
char *str1;
This declares a dangling pointer, that is, a pointer holding some unspecified address in memory. Here the program is simple and nothing important is at stake, but you could have modified very critical data!
char *str = "Hello";
This declares a pointer which will point to a protected section of the memory that even the program itself cannot change during execution, this is what a segmentation fault means.
To use strcpy(), the first parameter should be a writable char array, for example one dynamically allocated with malloc(). In fact, don't use strcpy(); learn to use strncpy() instead because it is safer.

C - strcpy pointer

I want to ask about strcpy. I got problem here. Here is my code:
char *string1 = "Sentence 1";
char *string2 = "A";
strcpy(string1, string2);
I think there is no problem in my code there. The address of the first character in string1 and string2 are sent to the function strcpy. There should be no problem in this code, right?
Anybody please help me solve this problem or explain to me..
Thank you.
There is a problem -- your pointers are each pointing to memory which you cannot write to; they're pointing to constants which the compiler builds into your application.
You need to allocate space in writable memory (the stack via char string1[<size>]; for example, or the heap via char *string1 = malloc(<size>);). Be sure to replace <size> with the amount of buffer space you need, and add at least one extra byte for null termination. If you malloc(), be sure you free() later!
This gives undefined behaviour. The compiler may allow it, due to a quirk of history (string literals aren't const), but you're basically trying to overwrite data which on many platforms you simply cannot modify.
From linux man pages:
char *strcpy(char *dest, const char *src);
The strcpy() function copies the string pointed to by src,
including the terminating null byte ('\0'), to the buffer pointed to
by dest. The strings may not overlap, and the destination string
dest must be large enough to receive the copy.
You have a problem with your *dest pointer, since it's pointing to a string literal instead of allocated, modifiable memory. Try defining string one as char string1[BUFFER_LENGTH]; or allocate it dynamically with malloc().

having memcpy problem

char *a=NULL;
char *s=NULL;
a=(char *)calloc(1,(sizeof(char)));
s=(char *)calloc(1,(sizeof(char)));
a="DATA";
memcpy(s,a,(strlen(a)));
printf("%s",s);
Can you please tell me why it's printing DATA½½½½½½½½■ε■? How can I print only DATA? Thanks
Strings in C are terminated by a zero character value (nul).
strlen returns the number of characters before the zero.
So you are not copying the zero.
printf keeps going, printing whatever is in the memory after s until it hits a zero.
You also are only creating a buffer of size 1, so you are writing data over whatever is after s, and you leak the memory calloc'd to a before you set a to be a literal.
Allocate the memory for s after finding the length of the string, allocating one more byte to include the nul terminator, then copy a into s. You don't need to allocate anything for a as the C runtime looks after storing the literal "DATA".
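A minimal sketch of that order of operations (measure first, allocate length + 1, then copy including the terminator) could look like this. The function name copy_string is invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of src, or NULL
 * if the allocation fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    size_t size = strlen(src) + 1;  /* characters plus terminating '\0' */
    char *s = malloc(size);

    if (s != NULL)
        memcpy(s, src, size);       /* the terminator is copied as well */
    return s;
}
```

With const char *a = "DATA"; and char *s = copy_string(a);, printing s with printf("%s", s) shows only DATA, because the copy ends in '\0'.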
strlen only counts the chars without the terminator '\0'.
Without this terminator printf does not know the end of the string.
Solution:
memcpy(s,a,(strlen(a)+1));
You are first allocating memory, then throwing that memory away by re-assigning the pointer using a string literal. Your arguments to calloc() also look very wrong.
Also, memcpy() is not a string copying function, it doesn't include the terminator. You should use strcpy().
The best way to print only DATA would seem to be
puts("DATA");
You need to be more clear on what you want to do, to get help with the pointers/allocations/copying.
Your
a="DATA";
trashes the pointer to the allocated memory. It does not copy "DATA" into the memory, which would not be large enough to store it anyway, since
a=(char *)calloc(1,(sizeof(char)));
allocates a single char. While
memcpy(s,a,(strlen(a)));
copies what a now points to (the string literal "DATA") into the memory pointed to by s. But again, s points to a single allocated char, and copying more than 1 char will overwrite something and is a bug.
strlen(a) gives you 4 (the length of "DATA") and memcpy copies exactly 4 chars. But to know where a string ends, C uses the convention of putting a final "null" char ('\0') at its end. So indeed "DATA" is, in memory, 'D' 'A' 'T' 'A' '\0'.
All string-related functions expect the null byte, and printf does not stop printing until it finds it.
To copy strings, use strcpy (or strncpy) instead; strcpy copies the string together with its final null byte. (strcpy is less "secure" since you can overflow the destination buffer.)
But the biggest problem I can see here is that you reserve a single char only for a (and you trash it then) and s, so DATA\0 won't fit anywhere.
You are reserving space for 1 character so you are actually using the memory of some other variable when you are writing "DATA" (which is 4 characters + the trailing \0 to mark the end of the string).
a=(char *)calloc(1,(sizeof(char)));
For this example you would need 5 characters or more:
a=(char *)calloc(5, (sizeof(char)));
You need to store a terminating \0 after that DATA string so printf() will know to stop printing.
You could replace memcpy with strcat:
strcat(s, a);
should do it.
Note, however, that there's a bug earlier on:
calloc(1,sizeof(char))
will only allocate a single byte! That's certainly not enough! Depending on the implementation, your program may or may not crash.

I'm new to C, can someone explain why the size of this string can change?

I have never really done much C but am starting to play around with it. I am writing little snippets like the one below to try to understand the usage and behaviour of key constructs/functions in C. I wrote the one below to understand the difference between char* string and char string[] and how the lengths of strings work. Furthermore, I wanted to see if sprintf could be used to concatenate two strings and store the result in a third string.
What I discovered was that the third string I was using to store the concatenation of the other two had to be set with char string[] syntax or the binary would die with SIGSEGV (Address boundary error). Setting it using the array syntax required a size so I initially started by setting it to the combined size of the other two strings. This seemed to let me perform the concatenation well enough.
Out of curiosity, though, I tried expanding the "concatenated" string to be longer than the size I had allocated. Much to my surprise, it still worked and the string size increased and could be printf'd fine.
My question is: Why does this happen, is it invalid or have risks/drawbacks? Furthermore, why is char str3[length3] valid but char str3[7] causes "SIGABRT (Abort)" when sprintf line tries to execute?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main() {
char* str1 = "Sup";
char* str2 = "Dood";
int length1 = strlen(str1);
int length2 = strlen(str2);
int length3 = length1 + length2;
char str3[length3];
//char str3[7];
printf("%s (length %d)\n", str1, length1); // Sup (length 3)
printf("%s (length %d)\n", str2, length2); // Dood (length 4)
printf("total length: %d\n", length3); // total length: 7
printf("str3 length: %d\n", (int)strlen(str3)); // str3 length: 6
sprintf(str3, "%s<-------------------->%s", str1, str2);
printf("%s\n", str3); // Sup<-------------------->Dood
printf("str3 length after sprintf: %d\n", // str3 length after sprintf: 29
(int)strlen(str3));
}
This line is wrong:
char str3[length3];
You're not taking the terminating zero into account. It should be:
char str3[length3+1];
You're also trying to get the length of str3, while it hasn't been set yet.
In addition, this line:
sprintf(str3, "%s<-------------------->%s", str1, str2);
will overflow the buffer you allocated for str3. Make sure you allocate enough space to hold the complete string, including the terminating zero.
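One way to size the buffer for a formatted result, sketched here as an assumption rather than the only approach, is to ask snprintf for the required length first (a C99 feature: with a NULL buffer and size 0 it only returns the length). The helper name join_with is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated string made of a, sep
 * and b in that order, or NULL on failure. Caller must free() it. */
char *join_with(const char *a, const char *sep, const char *b)
{
    char *out;
    /* First pass: compute the length without writing anything. */
    int needed = snprintf(NULL, 0, "%s%s%s", a, sep, b);

    if (needed < 0)
        return NULL;
    out = malloc((size_t)needed + 1);        /* +1 for the terminating '\0' */
    if (out == NULL)
        return NULL;
    /* Second pass: the buffer is now known to be large enough. */
    snprintf(out, (size_t)needed + 1, "%s%s%s", a, sep, b);
    return out;
}
```

For the strings in the question, join_with(str1, "<-------------------->", str2) would allocate exactly as many bytes as the formatted text needs, so nothing is written past the end of the buffer.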
void main() {
char* str1 = "Sup"; // a pointer to the statically allocated sequence of characters {'S', 'u', 'p', '\0' }
char* str2 = "Dood"; // a pointer to the statically allocated sequence of characters {'D', 'o', 'o', 'd', '\0' }
int length1 = strlen(str1); // the length of str1 without the terminating \0 == 3
int length2 = strlen(str2); // the length of str2 without the terminating \0 == 4
int length3 = length1 + length2;
char str3[length3]; // declare an array of 7 characters, uninitialized
So far so good. Now:
printf("str3 length: %d\n", (int)strlen(str3)); // What is the length of str3? str3 is uninitialized!
C is a primitive language. It doesn't have strings. What it does have is arrays and pointers. A string is a convention, not a datatype. By convention, people agree that "an array of chars is a string, and the string ends at the first null character". All the C string functions follow this convention, but it is a convention. It is simply assumed that you follow it, or the string functions will break.
So str3 is not a 7-character string. It is an array of 7 characters. If you pass it to a function which expects a string, then that function will look for a '\0' to find the end of the string. str3 was never initialized, so it contains random garbage. In your case, apparently, there was a '\0' after the 6th character so strlen returns 6, but that's not guaranteed. If it hadn't been there, then it would have read past the end of the array.
sprintf(str3, "%s<-------------------->%s", str1, str2);
And here it goes wrong again. You are trying to copy the string "Sup<-------------------->Dood\0" into an array of 7 characters. That won't fit. Of course the C function doesn't know this, it just copies past the end of the array. Undefined behavior, and will probably crash.
printf("%s\n", str3); // Sup<-------------------->Dood
And here you try to print the string stored at str3. printf is a string function. It doesn't care (or know) about the size of your array. It is given a string, and, like all other string functions, determines the length of the string by looking for a '\0'.
Instead of trying to learn C by trial and error, I suggest that you go to your local bookshop and buy an "introduction to C programming" book. You'll end up knowing the language a lot better that way.
There is nothing more dangerous than a programmer who half understands C!
What you have to understand is that C doesn't actually have strings, it has character arrays. Moreover, the character arrays don't have associated length information. Instead, string length is determined by iterating over the characters until a null byte is encountered. This implies that every char array should be at least strlen + 1 characters in length.
C doesn't perform array bounds checking. This means that the functions you call blindly trust you to have allocated enough space for your strings. When that isn't the case, you may end up writing beyond the bounds of the memory you allocated for your string. For a stack allocated char array, you'll overwrite the values of local variables. For heap-allocated char arrays, you may write beyond the memory area of your application. In either case, the best case is you'll error out immediately, and the worst case is that things appear to be working, but actually aren't.
As for the assignment, you can't write something like this:
char *str;
sprintf(str, ...);
and expect it to work -- str is an uninitialized pointer, so the value is "not defined", which in practice means "garbage". Pointers are memory addresses, so an attempt to write to an uninitialized pointer is an attempt to write to a random memory location. Not a good idea. Instead, what you want to do is something like:
char *str = malloc(sizeof(char) * (n + 1)); /* n is the length of the string to store */
which allocates n+1 characters worth of storage and stores the pointer to that storage in str. Of course, to be safe, you should check whether or not malloc returns null. And when you're done, you need to call free(str).
The reason your code works with the array syntax is because the array, being a local variable, is automatically allocated, so there's actually a free slice of memory there. That's (usually) not the case with an uninitialized pointer.
As for the question of how the size of a string can change, once you understand the bit about null bytes, it becomes obvious: all you need to do to change the size of a string is futz with the null byte. For example:
char str[] = "Foo bar";
str[1] = (char)0; // I'd use the character literal, but this editor won't let me
At this point, the length of the string as reported by strlen will be exactly 1. Or:
char str[] = "Foo bar";
str[7] = '!';
after which strlen will probably crash, because it will keep trying to read more bytes from beyond the array boundary. It might encounter a null byte and then stop (and of course, return the wrong string length), or it might crash.
I've written all of one C program, so expect this answer to be inaccurate and incomplete in a number of ways, which will undoubtedly be pointed out in the comments. ;-)
Your str3 is too short: you need to add room for the "<-------------------->" string literal and an extra byte for the null terminator.
Out of curiosity, though, I tried
expanding the "concatenated" string to
be longer than the size I had
allocated. Much to my surprise, it
still worked and the string size
increased and could be printf'd fine.
The behaviour is undefined so it may or may not segfault.
strlen returns the length of the string without the trailing NULL byte (\0, 0x00) but when you create a variable to hold the combined strings you need to add that 1 character.
char str3[length3 + 1];
…and you should be all set.
C strings are '\0' terminated and require an extra byte for that, so you should at least declare
char str3[length3 + 1]
which will do the job.
In sprintf() you are writing beyond the space allocated for str3. This may cause any kind of undefined behavior (if you are lucky, it will crash). strlen() just searches for a null character starting from the memory location you specified, and it happens to find one at the 29th position. It could just as well be the 129th, i.e. it will behave very erratically.
A few important points:
Just because it works doesn't mean it's safe. Going past the end of a buffer is always unsafe, and even if it works on your computer, it may fail under a different OS, different compiler, or even a second run.
I suggest you think of a char array as a container and a string as an object that is stored inside the container. In this case, the container must be 1 character longer than the object it holds, since a "null character" is required to indicate the end of the object. The container is a fixed size, and the object can change size (by moving the null character).
The first null character in the array indicates the end of the string. The remainder of the array is unused.
You can store different things in a char array (such as a sequence of numbers). It just depends on how you use it. But string functions such as printf() or strcat() assume that there is a null-terminated string to be found there.
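The container/object picture can be sketched in a few lines: the array keeps its size while the string inside it shrinks when the null character moves. The function truncate_string below is a made-up helper for illustration only.

```c
#include <string.h>

/* Hypothetical helper: shortens the string held in buf to at most
 * new_len characters by moving the terminating '\0'. The array that
 * holds the string keeps its size. Returns the resulting length. */
size_t truncate_string(char *buf, size_t new_len)
{
    if (new_len < strlen(buf))
        buf[new_len] = '\0';  /* the rest of the array is simply unused */
    return strlen(buf);
}
```

Given char str[] = "Foo bar";, sizeof str is 8 both before and after truncate_string(str, 1), while strlen(str) drops from 7 to 1.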
