C Arbitrary length string - c

I have a question about how the length of an array is determined.
#include <stdio.h>
#include <string.h>
int main()
{
char str[] = "s";
long unsigned a = strlen(str);
scanf("%s", str);
printf("%s\n%lu\n", str, a);
return 0;
}
In the above program, I assign the string "s" to a char array.
I thought the length of str[] was 1, so we could not store more than the length of the array. But it behaves differently: if I read a string using scanf, it is stored in str[] without any error. What is the length of the array str?
Sample I/O :
Hello
Hello 1

Your str is an array of char initialized with "s", that is, it has size 2 and length 1. The size is one more than the length because a NUL string terminator character (\0) is added at the end.
Your str array can hold at most two char. Trying to write more will cause your program to access memory past the end of the array, which is undefined behavior.
What actually happens though, is that since the str array is stored somewhere in memory (on the stack), and that memory region is far larger than 2 bytes, you are actually able to write past the end without causing a crash. This does not mean that you should. It's still undefined behavior.
Since your array has size 2, it can only hold a string of length 1, plus its terminator. To use scanf() and correctly avoid writing past the end of the array, you can use the field width specifier: a numeric value after the % and before the s, like this:
scanf("%1s", str);
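The field-width idea can be sketched with a slightly larger buffer. The helper name read_word and the 16-byte size are assumptions made for illustration, and sscanf is used instead of scanf only so that the input can be supplied as a string:

```c
#include <stdio.h>

#define WORD_SIZE 16   /* hypothetical buffer size: 15 characters + '\0' */

/* Reads one whitespace-delimited word from `input` into `dst`, which must
 * have room for WORD_SIZE bytes. The width in "%15s" is WORD_SIZE - 1, so
 * at most 15 characters plus the terminator are stored.
 * Returns 1 if a word was read, 0 otherwise. */
int read_word(const char *input, char *dst)
{
    return sscanf(input, "%15s", dst) == 1;
}
```

Longer input is cut off at 15 characters rather than written past the end of the buffer.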

When an array is declared without specifying its size, the size is determined by the initializers used.
In this declaration of an array
char str[] = "s";
a string literal is used as the initializer. A string literal is a sequence of characters that includes a terminating zero character. That is, the string literal "s" has two characters: { 's', '\0' }.
Its characters are used to initialize the elements of the array str sequentially.
So if you write
printf( "sizeof( str ) = %zu\n", sizeof( str ) );
then the output will be 2. The length of a string is determined as the number of characters before the terminating zero character. So if you write
#include <string.h>
//...
printf( "strlen( str ) = %zu\n", strlen( str ) );
then the output will be 1.
If you try to write data outside an array, you get undefined behavior because memory that does not belong to the array is overwritten. In some cases you can get the expected result. In other cases the program can terminate abnormally. That is, the behavior of the program is undefined.

The array str has size 2: 1 byte for the character 's' and one for the terminating null byte. What you're doing is writing past the end of the array. Doing so invokes undefined behavior.
When your code has undefined behavior, it could crash, it could output strange results, or it could (as in this case) appear to work properly. Also, making a seemingly unrelated change such as a printf call for debugging or an unused local variable can change how undefined behavior manifests itself.

Related

Why strcmp giving different response for complete filled character array?

#include <stdio.h>
#include <string.h>
void main()
{
char a[10]="123456789";
char b[10]="123456789";
int d;
d=strcmp(a,b);
printf("\nstrcmp(a,b) %d", (strcmp(a,b)==0) ? 0:1);
printf("compare Value %d",d);
}
Output:
strcmp(a,b) 0
compare value 0
The same program responds differently when the arrays are filled completely, that is, with 10 characters. In that case the values differ.
#include <stdio.h>
#include <string.h>
void main()
{
char a[10]="1234567890";
char b[10]="1234567890";
int d;
d=strcmp(a,b);
printf("\nstrcmp(a,b) %d", (strcmp(a,b)==0) ? 0:1);
printf("compare Value %d",d);
}
Output:
strcmp(a,b) 1
compare value -175
Why does strcmp respond differently when the string fills the whole array?
The behaviour of your second snippet is undefined.
There's no room for the null-terminator, which is relied upon by strcmp, when you write char a[10]="1234567890";. This causes strcmp to overrun the array.
One remedy is to use strncmp.
Another one is to use char a[]="1234567890"; (with b adjusted similarly) and let the compiler figure out the array length which will be, in this case, 11.
According to the definitions of terms used in the C Standard (7.1.1 Definitions of terms)
1 A string is a contiguous sequence of characters terminated by and
including the first null character....The length of a string is the
number of bytes preceding the null character and the value of a string
is the sequence of the values of the contained characters, in order.
According to the description of function strcmp
2 The strcmp function compares the string pointed to by s1 to
the string pointed to by s2.
According to the section 6.7.9 Initialization Of the Standard
14 An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating
null character if there is room or if the array is of unknown size)
initialize the elements of the array.
In the first program arrays a and b initialized by string literals have room to store the terminating zero.
char a[10]="123456789";
char b[10]="123456789";
Thus the arrays contain strings and the function strcmp may be applied to them.
In the second program, arrays a and b do not have room to store the terminating zero
char a[10]="1234567890";
char b[10]="1234567890";
So the arrays do not contain strings and the function strcmp must not be applied to them. Doing so has undefined behaviour, because the comparison stops only when it finds either unequal characters beyond the arrays (the arrays themselves hold equal characters) or a terminating zero.
You can get a valid result if you limit the comparison to the sizes of the arrays. To do that, use another standard function, strncmp.
A call can look, for example, like this:
strncmp( a, b, sizeof( a ) );
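A minimal sketch of that bounded comparison follows. The function names are made up for illustration, and memcpy is used to fill the arrays so that it is explicit that no terminator is stored:

```c
#include <string.h>

/* Compares two fixed-size char arrays of `n` bytes each that may lack a '\0'.
 * strncmp stops at a null character or after n bytes, whichever comes first,
 * so it never reads past the arrays. Returns 1 when they compare equal. */
int fixed_arrays_equal(const char *a, const char *b, size_t n)
{
    return strncmp(a, b, n) == 0;
}

/* Demo: two 10-byte arrays completely filled with digits, no terminator. */
int demo_full_arrays(void)
{
    char a[10], b[10];
    memcpy(a, "1234567890", sizeof a);   /* copies 10 bytes, not the '\0' */
    memcpy(b, "1234567890", sizeof b);
    return fixed_arrays_equal(a, b, sizeof a);
}
```

The bound passed to strncmp has to be the array size; the arrays still must not be passed to functions such as strcmp or printf("%s") that expect a terminated string.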
In your second case,
char a[10]="1234567890";
char b[10]="1234567890";
your arrays are not null-terminated, so they cannot be used as strings. Any function from the string family that operates on them invokes undefined behavior (as it will go past the allocated memory in search of the null terminator).
You are better off using
char a[ ]="1234567890";
char b[ ]="1234567890";
to leave the size allocation to the compiler and avoid the null-termination issue. The compiler will allocate enough memory to hold the supplied initializer as well as the terminating null.
That said, void main() should be int main(void), at least to conform to the standard.
You declare and initialize your arrays with string literals (but leave no space for the nul terminator), and string manipulation functions require a C-style string (terminated with '\0') to be passed as an argument.
So, in your second program, your arrays -
char a[10]="1234567890";
char b[10]="1234567890";
have no space for the '\0' character, so passing them to strcmp invokes undefined behavior.
Increase size of your arrays -
char a[11]="1234567890"; //or char a[]="1234567890";

2-D character array

#include<stdio.h>
void main()
{
char a[10][5] = {"hi", "hello", "fellow"};
printf("%s",a[0]);
}
Why this code printing only hi
#include<stdio.h>
void main()
{
char a[10][5] = {"hi", "hello", "fellow"};
printf("%s",a[1]);
}
While this code is printing "hellofellow"
Why this code printing only hi
You've told printf to print the string stored at a[0], and that string happens to be "hi".
While this code is printing "hellofellow"
This one is by coincidence, in fact your code ought to be rejected by the compiler due to a constraint violation:
No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
The string "fellow", specifically the 'w' at the end of it does not fit within the char[5] being initialised, and this violates the C standard. Perhaps also by coincidence, your compiler provides an extension (making it technically a non-C compiler), and so you don't see the error messages that I do:
prog.c:3:6: error: return type of 'main' is not 'int' [-Werror=main]
void main()
^
prog.c: In function 'main':
prog.c:5:37: error: initializer-string for array of chars is too long [-Werror]
char a[10][5] = {"hi", "hello", "fellow"};
^
Note that the second error message is complaining about "fellow", but not "hello". Your "hello" initialisation is valid by exception:
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
The relevant part is the parenthetical: the terminating null character is included only "if there is room or if the array is of unknown size". If there isn't enough room for the terminating '\0' character, it is simply left out of the initialisation.
Your code:
char a[10][5] = {"hi", "hello", "fellow"};
This allocates 10 arrays of char[5].
"hello" takes up all 5 bytes, so there is no room for the terminating \0, and printing a[1] runs on into a[2].
"fellow" is too big for a[2]. The extra "w" does not run over into a[3]: a compiler that accepts the initializer at all (GCC does, with a warning) typically drops it, so a[2] holds "fello" without a terminator.
Aside from being undefined behavior, it is unclear what you were trying to do.
This gives undefined behaviour, because strings must be null-terminated.
The element "hello" has a length of 5, so it leaves no room for the terminator.
Declare your array as a[10][7] and you will get the intended output.
See here -https://ideone.com/c2zUs0
Why this code printing only hi
Because a[0][2] is null indicating termination thus giving you hi.
This is undefined behavior due to insufficient space to store \0 character.
Please note that the memory allocated is 5 bytes per string in your array of strings. Thus, for a[1] there is not sufficient memory to store the \0 character, as all five bytes are taken up by "hello".
Thus, the subsequent memory is read until the \0 character is found.
Thus, you can change the line:
char a[10][5] = {"hi", "hello", "fellow"};
to
char a[][7] = {"hi", "hello", "fellow"};
Why this code printing only hi
This is because the \0 character is already encountered at a[0][2] and thus the reading of the characters is stopped.
What Your Code Does:
Look at the following statement:
char a[10][5] = {"hi", "hello", "fellow"};
It allocates 10 rows. 5 characters are allocated for each index of a.
What is the Problem:
Strings are null-terminated: a null terminator always has to be stored in addition to the given characters, so the size needed for the array is numOfCharacters+1, where the extra byte is for the null terminator. When you initialize the array with exactly as many characters as its size, the null terminator is skipped. A character array is normally printed up to the first \0 (null terminator), so without one, printing continues past the end of the row.
The Solution:
No need to worry about this problem, all you need to do is just to set the size equal to the numOfCharactersInString + 1. You can use the following statement:
char a[10][7] = {"hi", "hello", "fellow"};
Since the largest string is "fellow" which contains 6 characters, you need to set the size 6 + 1 that is why the statement should use char a[10][7] instead of char a[10][5]
Hope it helps.
When you declare a 2-D character array as
char a[10][5] = {"hi", "hello", "fellow"};
char a[10][5] reserves memory for 10 arrays of 5 bytes each, which is enough for strings of 4 characters + 1 '\0' character. A point to note is that the array elements are stored in contiguous memory locations.
a[0] points to the first string, a[1] to the second and so on.
Also when you initialize an array partially the other uninitialized elements become 0 instead of being garbage values.
Now in your case,after initialization if you try to visualize the array it would be something like
hi\0\0\0hellofello\0\0...
Now the command
printf("%s",a[0]);
prints characters starting from 'h' of "hi" and stops printing when a '\0' is encountered so "hi" is printed.
Now for the second case,
printf("%s",a[1]);
characters are printed starting from the 'h' of "hello" until a '\0' is encountered. Now the '\0' character is encountered only after printing "hellofello", and hence the output.
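To check which rows actually hold a terminated string, something along these lines could be used. It is only a sketch: the helper names are invented here, and the rows are filled with memcpy to mimic what the initializer stores for "hi" and "hello":

```c
#include <string.h>

#define ROWS 10
#define COLS 5

/* Returns 1 if the row of `width` bytes contains a '\0', i.e. holds a string. */
int row_is_terminated(const char *row, size_t width)
{
    return memchr(row, '\0', width) != NULL;
}

/* Builds the layout described above and reports on rows 0 and 1.
 * Bit 0 of the result is set if a[0] is terminated, bit 1 if a[1] is. */
int demo_rows(void)
{
    char a[ROWS][COLS] = {{0}};      /* every byte starts out as zero */
    memcpy(a[0], "hi", 3);           /* 'h', 'i', '\0' */
    memcpy(a[1], "hello", COLS);     /* exactly 5 bytes, no room for '\0' */
    return row_is_terminated(a[0], COLS)
         | (row_is_terminated(a[1], COLS) << 1);
}
```

Only a row for which row_is_terminated returns 1 is safe to print with %s.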

strncat() is copying to the same string again

I am trying to concatenate two strings in C programming. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char const *argv[])
{
/* code */
char s1[3],s2[34];
strncat(s1,"mv ",3);
strncat(s2," /home/xxxxxxx/.local/share/Trash/",34);
printf("%s \n",s1);
return 0;
}
When I try to print the value in s1, it prints mv /home/xxxxxxx/.local/share/Trash as the output. Why is the value I am putting into s2 getting added to the string s1? If the question has already been asked, please post the link.
You have undefined behavior in your code, as neither s1 nor s2 are initialized. Uninitialized (non-static) local variables have indeterminate values, and it's unlikely that they have the string terminator that strncat needs to find the end of the string to know where to append the source string.
After you fix the above, you also have another case of undefined behavior when strncat tries to write the string terminator beyond the end of the arrays.
Also, you're not concatenating the two strings, as s1 and s2 are two unrelated arrays, you just append the literal strings to the end of the two arrays but not together.
What you could do is to allocate an array big enough to hold both strings, and the string terminator, then copy the first string into the array, and then append the second string.
Or use e.g. snprintf (or _snprintf if using the Microsoft Windows runtime library) to construct the string.
char s[100];
snprintf(s, sizeof(s), "mv %s %s", somepath, someotherpath);
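A slightly fuller sketch of the snprintf approach checks the return value so that truncation does not go unnoticed. The function name build_mv_command and its parameters are hypothetical:

```c
#include <stdio.h>

/* Writes "mv <src> <dst_dir>" into `out`, which has room for `out_size` bytes.
 * snprintf never writes more than out_size bytes and returns the length the
 * full result would have had, which makes truncation detectable.
 * Returns 0 on success, -1 if the result did not fit or formatting failed. */
int build_mv_command(char *out, size_t out_size,
                     const char *src, const char *dst_dir)
{
    int needed = snprintf(out, out_size, "mv %s %s", src, dst_dir);
    if (needed < 0 || (size_t)needed >= out_size)
        return -1;
    return 0;
}
```

When -1 is returned, the buffer still holds a terminated (but shortened) string, so it should not be used as a command.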
s1 is defined as
char s1[3],
that is, it has three elements (characters). When strncat was executed
strncat(s1,"mv ",3);
these three elements were filled with { 'm', 'v', ' ' }
After that, this array does not contain the terminating zero.
The format specifier %s in printf outputs a character array until the terminating zero is encountered. As array s1 does not contain the terminating zero, printf continues to output the bytes beyond the array. Because array s2 is located after array s1 in this case
char s1[3],s2[34];
s2 is output as well, until the terminating zero is encountered.
Take into account that strncat requires the target string to be zero-terminated. However, you did not initialize s1, so the behaviour of the program is undefined.
If you want the program to work correctly, you have to define array s1 as having four characters
char s1[4] = { '\0' },s2[34];
strncat(s1,"mv ",4);
Or it would be simpler to write
char s1[4] = { '\0' },s2[34];
strcat( s1, "mv " );
Or even the following way
char s1[4],s2[34];
strcpy( s1, "mv " );
that is, it would be better to use function strcpy instead of strncat
when I try to print the value in s1 it prints mv /home/xxxxxxx/.local/share/Trash as the output.
This is undefined behavior: s1 is not null-terminated, because it has a three-character string in a space of three characters; there's no space for the null terminator.
Your s2 string buffer happens to be located in the adjacent region of memory, so it gets printed as well, until printf runs into the null terminator of s2.
Allocating more memory to s1 and accounting for null termination would fix this problem:
char s1[4],s2[36];
s1[0] = '\0';
strncat(s1, "mv ", sizeof(s1) - 1); // the bound leaves room for the terminator
s2[0] = '\0';
strncat(s2, " /home/xxxxxxx/.local/share/Trash/", sizeof(s2) - 1);
However, strncat is not a proper function for working with regular strings: it is designed for use with fixed-length strings, which are no longer in widespread use. Unfortunately, the C standard library does not include strlcat, which has proper semantics for "regular" C strings. It is available on many systems as a library extension, though.
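If strncat is used anyway, the bound has to be the space still free in the destination, not the destination's total size. A rough sketch, with append_bounded being a made-up helper name:

```c
#include <string.h>

/* Appends as much of `src` as fits into `dst`. `dst` has room for `dst_size`
 * bytes and must already hold a terminated string.
 * Returns 0 if all of src was appended, -1 if it had to be truncated. */
int append_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t room = dst_size - used - 1;   /* free bytes, excluding the '\0' */
    strncat(dst, src, room);             /* strncat always adds a terminator */
    return strlen(src) <= room ? 0 : -1;
}
```

Subtracting 1 is what keeps the terminator that strncat always writes inside the array.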

strlen and size of for character arrays

I have the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char p[5];
char q[]="Hello";
int i=0;
strcpy(p,"Hello");
printf("strlen(p)=%d\n",strlen(p));
printf("sizeof(p)=%d\n",sizeof(p));
printf("strlen(q)=%d\n",strlen(q));
printf("sizeof(q)=%d\n",sizeof(q));
for(i=0;i<6;i++)
{
printf("p[%d]=%c\tq[%d]=%c\n",i,p[i],i,q[i]);
}
return 0;
}
The output that I get is:
strlen(p)=5
sizeof(p)=5
strlen(q)=5
sizeof(q)=6
p[0]=H q[0]=H
p[1]=e q[1]=e
p[2]=l q[2]=l
p[3]=l q[3]=l
p[4]=o q[4]=o
p[5]= q[5]=
I know that declaring an array like q[]="some string" sets the size of the array according to the number of characters in the string constant, but why is there a difference in the output of sizeof() for the two kinds of array declaration?
How do strlen() and printf() know when to stop? No null character was added when declaring the two arrays.
There are multiple questions in your question.
strcpy(p,"Hello");
This is illegal since p is only 5 chars long, so there's no room left for the terminating 0 added by strcpy. Consequently it is either not 0-terminated or the 0 byte was added outside the available space. Calling strlen on it is also undefined behavior, or fishy at least.
Calling sizeof on p is okay and yields the correct value of 5.
Calling strlen(q) yields 5 because q indeed contains a 0 terminator - implicitly added by initializing with a string literal - and there are 5 chars before the 0
Since it contains a 0 terminator, q is really an array of 6
characters so sizeof yields 6.
char p[5];
strcpy(p,"Hello");
copies 5 characters into p and writes the terminating null-character ('\0') at 6th position, i.e. out of the bounds of this array, which yields undefined behavior.
From manual page of strcpy:
"If the destination string of a strcpy() is not large enough, then anything might happen. Any time a program reads or copies data into a buffer, the program first needs to check that there's enough space."
char p[5];
strcpy(p,"Hello");
This strcpy writes a 0 into p[5]. So it's out of bounds. The sizeof(p) is still 5 though.
You have written over the end of p. It's incorrect and results in undefined behavior. In
this case nothing bad happened and it went unnoticed.
The other string you have, has a length of 5 and a sizeof 6.
The q char array also contains the null terminating character, whereas the fixed size of p doesn't allow the null character to be copied in. Notice that strlen looks for the null character to count the characters of a string, so calling it on an array without one is undefined behavior.
sizeof(q) is 6, since it contains null terminator.
p does not hold enough space for the null terminator - so strlen(p) can be any random value. This is called undefined behavior.
Strings in C are terminated by a NUL character '\0';
This is why sizeof(q) returns 6, it has enough space to store the '\0' at the end.
You've sized p yourself to be able to hold 5 characters, not enough for the trailing '\0'.
So, this code is undefined behaviour:
strcpy(p, "Hello");
This is copying the '\0' into p[5], which is out-of-bounds.
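One way to avoid the out-of-bounds write is to check the size before copying. This is just a sketch, and copy_checked is a hypothetical helper, not a standard function:

```c
#include <string.h>

/* Copies `src` into `dst` only if it fits, terminator included.
 * `dst_size` is the size of the destination array in bytes.
 * Returns 0 on success, -1 (leaving dst untouched) if src is too long. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len + 1 > dst_size)          /* +1 for the trailing '\0' */
        return -1;
    memcpy(dst, src, len + 1);
    return 0;
}
```

With char p[5], copy_checked(p, sizeof p, "Hello") refuses the copy, while a char[6] destination accepts it.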
Question: why is there a difference in the output of sizeof() for both the types of array declaration?
Answer: This statement declares an array named q of type char[6], whose elements are initialized from "Hello".
char q[] = "Hello";
sizeof(q) is 6 because the string "Hello" is comprised of 'H','e','l','l','o','\0', which includes the NULL char in the count.
This statement declares an array named p of type char[5], that is, a memory location where 5 chars are reserved.
char p[5];
Note that, depending on how the compiler aligns and pads variables, there may be unused bytes right after p. C won't complain if you reference or assign p[5] (the sixth char, one past the end of the p[] array), but doing so is still undefined behavior.
sizeof(p) is 5 because the compiler has recorded how big the memory location you declared for p. So sizeof(p) and sizeof(q) return different values because p and q are declared differently and refer to different entities.
Question: How does the strlen() & the printf() know when to stop, there was no null character added while declaring the two arrays.
Answer: Both strlen() calls count chars until they reach the null terminator, which both p and q happen to have, at least until the memory location at p+5 is assigned another value. This is because p and q are both allocated on the stack, next to other variables. Look at the addresses of p, q, and the integer i. Here is your function with additional variables added to help illustrate where p and q are located:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define min(a,b) (((a)<(b))?(a):(b))
#define max(a,b) (((a)<(b))?(b):(a))
int main()
{
char m0 = 'X';
char p[5];
char m1 = 'Y';
char q[]="Hello";
char m2 = 'Z';
int i=0;
strcpy(p,"World");
printf("strlen(p)=%d\n",strlen(p));
printf("sizeof(p)=%d\n",sizeof(p));
printf("strlen(q)=%d\n",strlen(q));
printf("sizeof(q)=%d\n",sizeof(q));
for(i=0;i<6;i++)
{
printf("p[%d]=%c\tq[%d]=%c\n",i,p[i],i,q[i]);
}
printf("m0=%x, %c\n",&m0,m0);
printf(" p=%x\n",p);
printf("m1=%x, %c\n",&m1,m1);
printf(" q=%x\n",q);
printf("m2=%x, %c\n",&m2,m2);
char *x;
for(x=min(&m0,&m2);x<max(&m0,&m2);x++)
{
printf("x[%x]=%c\n",x,*x);
}
return 0;
}
Observe that m0, m1, and m2 are adjacent to the arrays p[] and q[]. When run on my Linux system, we observe that the strcpy of "World" modifies the value of m0 (replaces the 'X' with '\0').
strlen(p)=5
sizeof(p)=5
strlen(q)=5
sizeof(q)=6
p[0]=W q[0]=H
p[1]=o q[1]=e
p[2]=r q[2]=l
p[3]=l q[3]=l
p[4]=d q[4]=o
p[5]= q[5]=
m0=bfbea6a7,
p=bfbea6a2
m1=bfbea6a1, Y
q=bfbea69b
m2=bfbea69a, Z
x[bfbea69a]=Z
x[bfbea69b]=H
x[bfbea69c]=e
x[bfbea69d]=l
x[bfbea69e]=l
x[bfbea69f]=o
x[bfbea6a0]=
x[bfbea6a1]=Y
x[bfbea6a2]=W
x[bfbea6a3]=o
x[bfbea6a4]=r
x[bfbea6a5]=l
x[bfbea6a6]=d
x[bfbea6a7]=
A C literal string such as "Hello" or "World" is terminated by the NULL char, and includes that char in the size of the string. The strcpy() function copies the entire string, including the NULL char at the end.
You should use strncpy (and terminate the result yourself), or check the destination size first. Note that when you used strcpy(p,"Hello"), you copied more characters (counting the null terminator) than p[] had allocated. That is something you want to avoid. C does not do bounds checking on arrays, so it will let you perform the strcpy, though a tool such as lint would detect this error.
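The strncpy route needs one extra step, because strncpy does not add a terminator when the source is as long as the bound or longer. A small sketch, with copy_truncating being an invented name:

```c
#include <string.h>

/* Copies at most dst_size - 1 characters of `src` into `dst` and always
 * terminates the result. Returns 1 if src was truncated, 0 otherwise. */
int copy_truncating(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 1;                     /* no room even for the terminator */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';         /* strncpy alone may leave this out */
    return strlen(src) >= dst_size;
}
```

With char p[5], copying "World" this way stores "Worl" and reports truncation instead of writing to p[5].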

I'm new to C, can someone explain why the size of this string can change?

I have never really done much C but am starting to play around with it. I am writing little snippets like the one below to try to understand the usage and behaviour of key constructs/functions in C. I wrote the one below to understand the difference between char* string and char string[] and how the lengths of strings work. Furthermore, I wanted to see whether sprintf could be used to concatenate two strings and store the result in a third string.
What I discovered was that the third string I was using to store the concatenation of the other two had to be set with char string[] syntax or the binary would die with SIGSEGV (Address boundary error). Setting it using the array syntax required a size so I initially started by setting it to the combined size of the other two strings. This seemed to let me perform the concatenation well enough.
Out of curiosity, though, I tried expanding the "concatenated" string to be longer than the size I had allocated. Much to my surprise, it still worked and the string size increased and could be printf'd fine.
My question is: why does this happen, and is it invalid or does it have risks/drawbacks? Furthermore, why is char str3[length3] valid while char str3[7] causes "SIGABRT (Abort)" when the sprintf line executes?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main() {
char* str1 = "Sup";
char* str2 = "Dood";
int length1 = strlen(str1);
int length2 = strlen(str2);
int length3 = length1 + length2;
char str3[length3];
//char str3[7];
printf("%s (length %d)\n", str1, length1); // Sup (length 3)
printf("%s (length %d)\n", str2, length2); // Dood (length 4)
printf("total length: %d\n", length3); // total length: 7
printf("str3 length: %d\n", (int)strlen(str3)); // str3 length: 6
sprintf(str3, "%s<-------------------->%s", str1, str2);
printf("%s\n", str3); // Sup<-------------------->Dood
printf("str3 length after sprintf: %d\n", // str3 length after sprintf: 29
(int)strlen(str3));
}
This line is wrong:
char str3[length3];
You're not taking the terminating zero into account. It should be:
char str3[length3+1];
You're also trying to get the length of str3, while it hasn't been set yet.
In addition, this line:
sprintf(str3, "%s<-------------------->%s", str1, str2);
will overflow the buffer you allocated for str3. Make sure you allocate enough space to hold the complete string, including the terminating zero.
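One way to allocate enough space is to let snprintf compute the required length first. The sketch below assumes a C99 snprintf (a NULL buffer with size 0 is allowed there); the name join3 is made up for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns a newly allocated string holding a + sep + b, or NULL on failure.
 * The caller is responsible for calling free() on the result. */
char *join3(const char *a, const char *sep, const char *b)
{
    /* First pass: with a NULL buffer and size 0, snprintf only reports
     * how many characters would be written, not counting the '\0'. */
    int len = snprintf(NULL, 0, "%s%s%s", a, sep, b);
    if (len < 0)
        return NULL;

    char *out = malloc((size_t)len + 1);   /* +1 for the terminating zero */
    if (out == NULL)
        return NULL;

    snprintf(out, (size_t)len + 1, "%s%s%s", a, sep, b);
    return out;
}
```

This way the buffer size is derived from the complete formatted string, separator included, rather than from the two input lengths alone.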
void main() {
char* str1 = "Sup"; // a pointer to the statically allocated sequence of characters {'S', 'u', 'p', '\0' }
char* str2 = "Dood"; // a pointer to the statically allocated sequence of characters {'D', 'o', 'o', 'd', '\0' }
int length1 = strlen(str1); // the length of str1 without the terminating \0 == 3
int length2 = strlen(str2); // the length of str2 without the terminating \0 == 4
int length3 = length1 + length2;
char str3[length3]; // declare an array of 7 characters, uninitialized
So far so good. Now:
printf("str3 length: %d\n", (int)strlen(str3)); // What is the length of str3? str3 is uninitialized!
C is a primitive language. It doesn't have strings. What it does have is arrays and pointers. A string is a convention, not a datatype. By convention, people agree that "an array of chars is a string, and the string ends at the first null character". All the C string functions follow this convention, but it is a convention. It is simply assumed that you follow it, or the string functions will break.
So str3 is not a 7-character string. It is an array of 7 characters. If you pass it to a function which expects a string, then that function will look for a '\0' to find the end of the string. str3 was never initialized, so it contains random garbage. In your case, apparently, there was a '\0' after the 6th character so strlen returns 6, but that's not guaranteed. If it hadn't been there, then it would have read past the end of the array.
sprintf(str3, "%s<-------------------->%s", str1, str2);
And here it goes wrong again. You are trying to copy the string "Sup<-------------------->Dood\0" into an array of 7 characters. That won't fit. Of course the C function doesn't know this, it just copies past the end of the array. Undefined behavior, and will probably crash.
printf("%s\n", str3); // Sup<-------------------->Dood
And here you try to print the string stored at str3. printf is a string function. It doesn't care (or know) about the size of your array. It is given a string, and, like all other string functions, determines the length of the string by looking for a '\0'.
Instead of trying to learn C by trial and error, I suggest that you go to your local bookshop and buy an "introduction to C programming" book. You'll end up knowing the language a lot better that way.
There is nothing more dangerous than a programmer who half understands C!
What you have to understand is that C doesn't actually have strings, it has character arrays. Moreover, the character arrays don't have associated length information -- instead, string length is determined by iterating over the characters until a null byte is encountered. This implies, that every char array should be at least strlen + 1 characters in length.
C doesn't perform array bounds checking. This means that the functions you call blindly trust you to have allocated enough space for your strings. When that isn't the case, you may end up writing beyond the bounds of the memory you allocated for your string. For a stack allocated char array, you'll overwrite the values of local variables. For heap-allocated char arrays, you may write beyond the memory area of your application. In either case, the best case is you'll error out immediately, and the worst case is that things appear to be working, but actually aren't.
As for the assignment, you can't write something like this:
char *str;
sprintf(str, ...);
and expect it to work -- str is an uninitialized pointer, so the value is "not defined", which in practice means "garbage". Pointers are memory addresses, so an attempt to write to an uninitialized pointer is an attempt to write to a random memory location. Not a good idea. Instead, what you want to do is something like:
char *str = malloc(sizeof(char) * (string_length + 1));
which allocates string_length + 1 characters' worth of storage (string_length stands for the length of the string you want to store) and stores the pointer to that storage in str. Of course, to be safe, you should check whether or not malloc returns null. And when you're done, you need to call free(str).
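Put together, the allocate, check, and copy steps might look like the following sketch. dup_string is a hypothetical helper written for this answer, not a library function:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of `src`, or NULL if malloc fails.
 * The caller must free() the result when done with it. */
char *dup_string(const char *src)
{
    size_t len = strlen(src);
    char *copy = malloc(len + 1);     /* +1 for the terminating '\0' */
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, len + 1);       /* copies the terminator too */
    return copy;
}
```

A caller would do something like char *s = dup_string("Dood"); use s, and then call free(s).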
The reason your code works with the array syntax is because the array, being a local variable, is automatically allocated, so there's actually a free slice of memory there. That's (usually) not the case with an uninitialized pointer.
As for the question of how the size of a string can change, once you understand the bit about null bytes, it becomes obvious: all you need to do to change the size of a string is futz with the null byte. For example:
char str[] = "Foo bar";
str[1] = (char)0; // I'd use the character literal, but this editor won't let me
At this point, the length of the string as reported by strlen will be exactly 1. Or:
char str[] = "Foo bar";
str[7] = '!';
after which strlen will probably crash, because it will keep trying to read more bytes from beyond the array boundary. It might encounter a null byte and then stop (and of course, return the wrong string length), or it might crash.
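The safe direction, shortening a string by moving the null byte, can be wrapped in a small helper. This is only an illustration, and truncate_at is an invented name:

```c
#include <string.h>

/* Shortens the string in `s` to at most `n` characters by writing a '\0'.
 * Only the string gets shorter; the array that holds it keeps its size.
 * Returns the resulting string length. */
size_t truncate_at(char *s, size_t n)
{
    if (n < strlen(s))
        s[n] = '\0';
    return strlen(s);
}
```

After truncate_at(str, 1) on char str[] = "Foo bar", strlen(str) is 1 while sizeof(str) is still 8, and the old characters after the new terminator remain in the array.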
I've written all of one C program, so expect this answer to be inaccurate and incomplete in a number of ways, which will undoubtedly be pointed out in the comments. ;-)
Your str3 is too short: you need to add an extra byte for the null terminator, plus the length of the "<-------------------->" string literal.
Out of curiosity, though, I tried
expanding the "concatenated" string to
be longer than the size I had
allocated. Much to my surprise, it
still worked and the string size
increased and could be printf'd fine.
The behaviour is undefined so it may or may not segfault.
strlen returns the length of the string without the trailing NULL byte (\0, 0x00) but when you create a variable to hold the combined strings you need to add that 1 character.
char str3[length3 + 1];
…and you should be all set.
C strings are '\0'-terminated and require an extra byte for that, so at the very least you should declare
char str3[length3 + 1];
to make room for the terminator.
In sprintf() you are writing beyond the space allocated for str3. This may cause any kind of undefined behavior (if you are lucky, it will crash). strlen() just searches for a null character starting from the memory location you specified, and here it finds one at the 29th position. It could just as well be the 129th, i.e. it behaves very erratically.
A few important points:
Just because it works doesn't mean it's safe. Going past the end of a buffer is always unsafe, and even if it works on your computer, it may fail under a different OS, different compiler, or even a second run.
I suggest you think of a char array as a container and a string as an object that is stored inside the container. In this case, the container must be 1 character longer than the object it holds, since a "null character" is required to indicate the end of the object. The container is a fixed size, and the object can change size (by moving the null character).
The first null character in the array indicates the end of the string. The remainder of the array is unused.
You can store different things in a char array (such as a sequence of numbers). It just depends on how you use it. But string functions such as printf() or strcat() assume that there is a null-terminated string to be found there.

Resources