Concatenating a string with itself twice gives a segmentation fault - c

#include <stdio.h>
#include <string.h>
int main(void)
{
char a[100] = "hi";
strcat(a, a);
strcat(a, a);
printf("%s\n", a);
return 0;
}

From the definition of strcat in the C language standard, §7.21.3.1/2
If copying takes place between objects that overlap, the behavior is undefined.
With my compiler, the program crashes even when it's done once: strcat(a, a); copies the first character of the second argument over the '\0' at the end of the first argument, then the second character of the second argument after it, and so on until it encounters a '\0' in the second argument. That never happens, because that '\0' was overwritten when the first character was copied.

From strcat(3) manpage:
DESCRIPTION
The strcat() and strncat() functions append a copy of the null-terminated
string s2 to the end of the null-terminated string s1, then add a
terminating '\0'. The string s1 must have sufficient space to hold the
result.
The strncat() function appends not more than n characters from s2, and
then adds a terminating '\0'.
The source and destination strings should not overlap, as the behavior is
undefined.

This is undefined behavior.
If you're in a loop, reading each char from a string, until \0 is found, but at the same time you're appending (writing) chars to the end of it, when is the loop going to end?

I suspect it's because the implementation is overwriting the null byte at the end:
Start: a = {h,i,\0}, src -> a[0], dst -> a[2]
next: a = {h,i,h}, src -> a[1], dst -> a[3]
next: a = {h,i,h,i}, src -> a[2], dst -> a[4]
next: a = {h,i,h,i,h}, src -> a[3], dst -> a[5]
...
Because you overwrote the null terminator, your source string never ends, and the function keeps copying until it tries to access memory it shouldn't and segfaults.
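One way to double a string in place without calling strcat(a, a) is to measure the length first and then copy exactly that many bytes with memcpy, so the source and destination regions never overlap. This is only a minimal sketch: the helper name append_self and the capacity parameter cap are my own assumptions, not something from the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends the string stored in buf to itself.
 * cap is the total size of the buffer in bytes.
 * Returns 0 on success, -1 if the doubled string would not fit. */
int append_self(char *buf, size_t cap)
{
    size_t len = strlen(buf);            /* measured before any byte is written */

    /* The result needs 2 * len characters plus the terminator. */
    if (cap == 0 || len > (cap - 1) / 2)
        return -1;

    memcpy(buf + len, buf, len);         /* [0, len) -> [len, 2*len): no overlap */
    buf[2 * len] = '\0';                 /* write the new terminator last */
    return 0;
}
```

With char a[100] = "hi";, calling append_self(a, sizeof a) twice should leave "hihihihi" in a, which is what the two strcat calls in the question were meant to produce.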

Related

C Arbitrary length string

I have a question about how the length of an array is determined
#include <stdio.h>
#include <string.h>
int main()
{
char str[] = "s";
long unsigned a = strlen(str);
scanf("%s", str);
printf("%s\n%lu\n", str, a);
return 0;
}
In the above program, I assign the string "s" to a char array.
I thought the length of str[] is 1, so we cannot store more than the length of the array. But it behaves differently: if I read a string using scanf, it is stored in str[] without any error. What is the length of the array str?
Sample I/O:
Hello
Hello
1
Your str is an array of char initialized with "s", that is, it has size 2 and length 1. The size is one more than the length because a NUL string terminator character (\0) is added at the end.
Your str array can hold at most two char. Trying to write more will cause your program to access memory past the end of the array, which is undefined behavior.
What actually happens though, is that since the str array is stored somewhere in memory (on the stack), and that memory region is far larger than 2 bytes, you are actually able to write past the end without causing a crash. This does not mean that you should. It's still undefined behavior.
Since your array has size 2, it can only hold a string of length 1, plus its terminator. To use scanf() and correctly avoid writing past the end of the array, you can use the field width specifier: a numeric value after the % and before the s, like this:
scanf("%1s", str);
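The same width rule applies to any buffer size: the number in the format must be at most one less than the array size, so that the terminator still fits. Here is a small sketch of that rule. It uses sscanf on an in-memory string instead of scanf so it does not depend on terminal input, and the helper name read_word and the 16-byte buffer are assumptions of mine.

```c
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited word from input
 * into a 16-byte buffer. The width 15 leaves one byte for the '\0'
 * that the %s conversion always appends. */
int read_word(const char *input, char out[16])
{
    return sscanf(input, "%15s", out);   /* returns 1 if a word was stored */
}
```

Input longer than 15 characters is cut off at the width instead of running past the end of the array.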
When an array is declared without specifying its size, the size is determined by its initializers.
In this array declaration
char str[] = "s";
a string literal is used as the initializer. A string literal is a sequence of characters that includes a terminating zero character. That is, the string literal "s" has two characters { 's', '\0' }.
Its characters are used to initialize the elements of the array str in order.
So if you write
printf( "sizeof( str ) = %zu\n", sizeof( str ) );
then the output will be 2. The length of a string is determined as the number of characters before the terminating zero character. So if you write
#include <string.h>
//...
printf( "strlen( str ) = %zu\n", strlen( str ) );
then the output will be 1.
If you try to write data outside an array, you get undefined behavior, because memory that does not belong to the array will be overwritten. In some cases you get the expected result. In other cases the program terminates abnormally. That is, the behavior of the program is undefined.
The array str has size 2: 1 byte for the character 's' and one for the terminating null byte. What you're doing is writing past the end of the array. Doing so invokes undefined behavior.
When your code has undefined behavior, it could crash, it could output strange results, or it could (as in this case) appear to work properly. Also, making a seemingly unrelated change such as a printf call for debugging or an unused local variable can change how undefined behavior manifests itself.

How to store '\0' in a char array

Is it possible to store the char '\0' inside a char array and then store different characters after? For example
char* tmp = "My\0name\0is\0\0";
I was taught that is actually called a string list in C, but when I tried to print the above (using printf("%s\n", tmp)), it only printed
"My".
Yes, it is certainly possible. However, you cannot use that array as a string and get at the content stored after the first '\0'.
By definition, a string is a char array terminated by the null character, '\0'. All string-related functions stop at the first null byte (for example, printf() with the %s format specifier stops there even if the argument holds more content after a '\0').
Quoting C11, chapter §7.1.1, Definitions of terms
A string is a contiguous sequence of characters terminated by and including the first null
character. [...]
However, for byte-by-byte processing, you're good to go as long as you stay within the allocated memory region.
The problem you are having is with the function you are using to print tmp. Functions like printf will assume that the string is null terminated, so it will stop when it sees the first \0
If you try the following code you will see more of the value in tmp
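#include <unistd.h> /* write() is POSIX, not standard C */
int main(void){
const char* tmp = "My\0name\0is\0\0";
write(1, tmp, 12); /* writes all 12 bytes, including the embedded '\0' bytes */
return 0;
}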
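If the goal is to treat such a buffer as a list of strings, you can walk it one string at a time: strlen gives the length of the current entry, and the next entry starts one byte after its terminator. The sketch below assumes the list ends with an empty string (two '\0' bytes in a row), as in the question; the function name print_string_list is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Prints each string in a list laid out as "a\0b\0c\0\0" and returns
 * how many strings were printed. An empty string marks the end of the list. */
size_t print_string_list(const char *list)
{
    size_t count = 0;

    for (const char *p = list; *p != '\0'; p += strlen(p) + 1) {
        printf("%s\n", p);   /* %s stops at this entry's own terminator */
        count++;
    }
    return count;
}
```

Called with the tmp pointer from the question, this should print My, name and is on separate lines.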

Is it undefined behavior if the destination string in strcat function is not null terminated?

The following program
// Code taken from http://ideone.com/AXClWb
#include <stdio.h>
#include <string.h>
#define SIZE1 5
#define SIZE2 10
#define SIZE3 15
int main(void){
char a[SIZE1] = "Hello";
char b[SIZE2] = " World";
char res[SIZE3] = {0};
for (int i=0 ; i<SIZE1 ; i++){
res[i] = a[i];
}
strcat(res, b);
printf("The new string is: %s\n",res);
return 0;
}
has well defined behavior. As per the requirement, source string b is null terminated. But what would be the behavior if the line
char res[SIZE3] = {0}; // Destination string
is replaced with
char res[SIZE3];
Does the standard explicitly say that the destination string must be null terminated too?
TL;DR Yes.
Since this is a language-lawyer question, let me add my two cents to it.
Quoting C11, chapter §7.24.3.1/2 (emphasis mine)
char *strcat(char * restrict s1,const char * restrict s2);
The strcat function appends a copy of the string pointed to by s2 (including the
terminating null character) to the end of the string pointed to by s1. The initial character
of s2 overwrites the null character at the end of s1.[...]
and, by definition, a string is null-terminated, quoting §7.1.1/1
A string is a contiguous sequence of characters terminated by and including the first null
character.
So, if the source char array is not null-terminated (i.e., not a string), strcat() may very well go beyond the bounds in search of the end which invokes undefined behavior.
As per your question, char res[SIZE3]; being an automatic local variable, will contain indeterminate value, and if used as the destination of strcat(), will invoke UB.
I think man explicitly says that
Description
The strcat() function appends the src string to the dest string, overwriting the terminating null byte ('\0') at the end of dest, and then adds a terminating null byte. The strings may not overlap, and the dest string must have enough space for the result. If dest is not large enough, program behavior is unpredictable; buffer overruns are a favorite avenue for attacking secure programs.
Emphasis mine
BTW, I think strcat starts by searching for the null terminator in the dest string before it concatenates the new string, so it is obviously UB as long as the dest string has automatic storage and is left uninitialized.
In the proposed code
for (int i=0 ; i<SIZE1 ; i++){
res[i] = a[i];
}
copies 5 chars of a, but not the null terminator, to res, so the bytes from 5 to 14 are uninitialized.
The standard also describes a safer variant, strcat_s:
K.3.7.2.1 The strcat_s function
Synopsis
#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>
errno_t strcat_s(char * restrict s1,
rsize_t s1max,
const char * restrict s2);
Runtime-constraints
2 Let m denote the value s1max - strnlen_s(s1, s1max) upon entry to
strcat_s.
We can see that strnlen_s always returns a valid size for the dest buffer. From my point of view, this function was introduced to avoid the UB in the question.
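Since strcat_s belongs to the optional Annex K and many C libraries do not provide it, here is a rough portable sketch of the same idea: find the destination's terminator with a bounded search, compute the remaining space m, and refuse to append if the source does not fit. The name checked_append and its return convention are my own assumptions; this is not the Annex K function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not strcat_s itself: appends src to dst only if
 * dst holds a terminated string within dstmax bytes and the result fits.
 * Returns 0 on success, -1 otherwise (dst is left unchanged on failure). */
int checked_append(char *dst, size_t dstmax, const char *src)
{
    /* Bounded search, so an unterminated dst is detected, not overrun. */
    const char *end = memchr(dst, '\0', dstmax);
    if (end == NULL)
        return -1;

    size_t used = (size_t)(end - dst);
    size_t m = dstmax - used;            /* bytes left, including the terminator slot */
    size_t n = strlen(src);
    if (n >= m)
        return -1;                       /* src plus its '\0' would not fit */

    memcpy(dst + used, src, n + 1);      /* copies src together with its '\0' */
    return 0;
}
```

The point of the bounded search is that a destination without a terminator is reported as an error instead of sending the function past the end of the array.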
If you leave res uninitialized then, after copying a into res (in the for loop), there's no NUL terminator in res. So the behaviour of strcat() is undefined if the destination string doesn't contain a NUL byte.
Basically, strcat() requires both of its arguments to be strings (i.e. both must contain the terminating NUL byte). Otherwise, it's undefined behaviour. This is obvious from the description of strcat():
§7.23.3.2, strcat() function
The strcat function appends a copy of the string pointed to by s2
(including the terminating null character) to the end of the string
pointed to by s1. The initial character of s2 overwrites the null
character at the end of s1.
(emphasis mine).
If char res[SIZE3]; is on the stack, its contents are indeterminate.
You can't know whether there is a zero byte anywhere within res, so yes, strcat-ing to it is undefined.
If char res[SIZE3]; is an uninitialized global, it'll be all zeros, which will make it behave as an empty c-string, and strcating to it will be safe (as long as SIZE3 is large enough for what you're appending).
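To make the uninitialized variant well defined, it is enough to write the terminator yourself after copying the characters of a, before res is ever passed to strcat. A minimal sketch reusing SIZE1 and SIZE3 from the question; the function name build_greeting is hypothetical.

```c
#include <string.h>

#define SIZE1 5
#define SIZE3 15

/* Copies the SIZE1 characters of a (which has no terminator) into res,
 * terminates res explicitly, then appends the string b.
 * The caller must ensure SIZE1 + strlen(b) < SIZE3. */
void build_greeting(char res[SIZE3], const char a[SIZE1], const char *b)
{
    memcpy(res, a, SIZE1);   /* res[0..4] now hold the characters of a */
    res[SIZE1] = '\0';       /* res is a valid string from here on */
    strcat(res, b);          /* defined: strcat can find the end of res */
}
```

After the explicit res[SIZE1] = '\0'; it no longer matters what the remaining bytes of res contained.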

strncat() is copying to the same string again

I am trying to concatenate two strings in C programming. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char const *argv[])
{
/* code */
char s1[3],s2[34];
strncat(s1,"mv ",3);
strncat(s2," /home/xxxxxxx/.local/share/Trash/",34);
printf("%s \n",s1);
return 0;
}
When I try to print the value in s1, it prints mv /home/xxxxxxx/.local/share/Trash as the output. Why is the value I am putting in s2 getting added to the string s1? If the question has already been asked, please post the link.
You have undefined behavior in your code, as neither s1 nor s2 is initialized. Uninitialized (non-static) local variables have indeterminate values, and it's unlikely that they contain the string terminator that strncat needs to find the end of the string and know where to append the source string.
After you fix the above, you also have another case of undefined behavior when strncat tries to write the string terminator beyond the end of the arrays.
Also, you're not concatenating the two strings, as s1 and s2 are two unrelated arrays, you just append the literal strings to the end of the two arrays but not together.
What you could do is to allocate an array big enough to hold both strings, and the string terminator, then copy the first string into the array, and then append the second string.
Or use e.g. snprintf (or _snprintf if using the Microsoft Windows runtime library) to construct the string.
char s[100];
snprintf(s, sizeof(s), "mv %s %s", somepath, someotherpath);
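snprintf returns the number of characters it would have written given enough space, so the return value can be used to detect truncation. Here is a small sketch of that check; the wrapper name build_mv_command is an assumption of mine, while somepath and someotherpath are the placeholders from the snippet above.

```c
#include <stdio.h>

/* Hypothetical helper: builds "mv <somepath> <someotherpath>" in out.
 * Returns 0 on success, -1 on an encoding error or if the result
 * did not fit into outsize bytes (out is then truncated but terminated). */
int build_mv_command(char *out, size_t outsize,
                     const char *somepath, const char *someotherpath)
{
    int n = snprintf(out, outsize, "mv %s %s", somepath, someotherpath);

    if (n < 0 || (size_t)n >= outsize)
        return -1;
    return 0;
}
```

Checking the return value matters here because a silently truncated mv command could operate on the wrong path.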
s1 is defined as
char s1[3],
that is, it has three elements (characters). When strncat was executed
strncat(s1,"mv ",3);
these three elements were filled with { 'm', 'v', ' ' }
After that, the array does not have a terminating zero.
The %s format specifier in printf outputs a character array until the terminating zero is encountered. As array s1 does not have a terminating zero, printf continues to output the bytes beyond the array. Because array s2 comes after array s1
char s1[3],s2[34];
it is also output until a terminating zero is encountered.
Take into account that strncat requires the target string to be zero-terminated. However, you did not initialize s1, so the behaviour of the program is undefined.
If you want the program to work correctly, you have to define array s1 as having four characters
char s1[4] = { '\0' },s2[34];
strncat(s1,"mv ",4);
Or it would be simpler to write
char s1[4] = { '\0' },s2[34];
strcat( s1, "mv " );
Or even the following way
char s1[4],s2[34];
strcpy( s1, "mv " );
that is, it would be better to use strcpy instead of strncat
when I try to print the value in s1 it prints mv /home/ashwini/.local/share/Trash as the output.
This is undefined behavior: s1 is not null-terminated, because it has a three-character string in a space of three characters; there's no space for the null terminator.
Your s2 string buffer happens to be located in the adjacent region of memory, so it gets printed as well, until printf runs into the null terminator of s2.
Allocating more memory to s1 and accounting for null termination would fix this problem:
char s1[4],s2[36];
s1[0] = '\0';
strncat(s1,"mv ", 4);
s2[0] = '\0';
strncat(s2," /home/xxxxxxx/.local/share/Trash/", 36);
However, strncat is not a proper function for working with regular strings: it is designed for use with fixed-length strings, which are no longer in widespread use. Unfortunately, the C standard library does not include strlcat, which has proper semantics for "regular" C strings. It is available on many systems as a library extension, though.
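If you do stay with strncat, note that its third argument limits how many characters are appended, not the size of the destination, and that a terminator is always written in addition to those characters. The usual way to respect the buffer is therefore to pass the remaining space minus one. A minimal sketch, with a hypothetical helper name bounded_append:

```c
#include <stddef.h>
#include <string.h>

/* Appends as much of src as fits into dst. dst has dstsize bytes in
 * total and must already contain a terminated string. */
void bounded_append(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);

    if (used + 1 >= dstsize)
        return;                            /* no room for even one more character */

    /* Leave one byte for the '\0' that strncat always adds. */
    strncat(dst, src, dstsize - used - 1);
}
```

With char s1[4] = ""; the call bounded_append(s1, sizeof s1, "mv "); should leave "mv " in s1, with the terminator inside the array.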

I'm having problems appending the value of a char variable to a string in C

I've been trying to use strcat(array, &charVariable) to add charVariable to array.
When I display the array, it displays this: ╠╠╠╠╠╠╠╠.
Can anyone help me out?
The problem is that strcat expects a pointer to a null-terminated character sequence. In your case, you are passing an address of a stand-alone char variable. If the item in memory immediately after the char variable is not zero, you will trigger undefined behavior (appending garbage characters or crashing).
Here is how to do it correctly:
char tmp[2];
tmp[0] = charVariable;
tmp[1] = '\0';
strcat(array, tmp);
strcat is meant to be used on strings, which have a null byte \0 at the end. You can try the approach suggested by dasblinkenlight, but a more efficient approach (assuming array has enough space to add the extra char, which you'd have to assume anyway to use strcat) is:
size_t len = strlen(array);
array[len] = charVariable;
array[len+1] = '\0';
Rules for using strcat():
The target string must already have a null character ('\0') terminator, otherwise strcat() will not be able to find the end of it.
The target string must be big enough to hold the string you are appending to it, in addition to the characters and the terminating null character that it already contains.
The source string you are appending to the target string must also have a null terminator, otherwise strcat() will not be able to find the end of it.
Addendum
@dasblinkenlight gives one proper way to do it. Here is another:
size_t len;
len = strlen(array);
array[len] = charVariable;
array[len+1] = '\0';
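Both approaches assume that array has room for one more character. If that is not guaranteed, the capacity can be checked first. A small sketch of such a check; the helper name append_char and the cap parameter (the total size of array in bytes) are assumptions of mine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends the single character c to the string in
 * array, which has cap bytes in total.
 * Returns 0 on success, -1 if there is no room for c plus the '\0'. */
int append_char(char *array, size_t cap, char c)
{
    size_t len = strlen(array);

    if (len + 2 > cap)           /* need one byte for c and one for '\0' */
        return -1;

    array[len] = c;
    array[len + 1] = '\0';
    return 0;
}
```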

Resources