strlen and size of for character arrays - c

I have the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char p[5];
char q[]="Hello";
int i=0;
strcpy(p,"Hello");
printf("strlen(p)=%d\n",strlen(p));
printf("sizeof(p)=%d\n",sizeof(p));
printf("strlen(q)=%d\n",strlen(q));
printf("sizeof(q)=%d\n",sizeof(q));
for(i=0;i<6;i++)
{
printf("p[%d]=%c\tq[%d]=%c\n",i,p[i],i,q[i]);
}
return 0;
}
The output that I get is:
strlen(p)=5
sizeof(p)=5
strlen(q)=5
sizeof(q)=6
p[0]=H q[0]=H
p[1]=e q[1]=e
p[2]=l q[2]=l
p[3]=l q[3]=l
p[4]=o q[4]=o
p[5]= q[5]=
I know declaring array like q[]="some string" sets the size of the array equal to the number of characters in the string const, but why is there a difference in the output of sizeof() for both the types of array declaration?
How does the strlen() & the printf() know when to stop, there was no null character added while declaring the two arrays.

There are multiple questions in your question.
strcpy(p,"Hello");
This is illegal since p is only 5 chars long, so there's no room
left for the terminating 0 added by strcpy. Consequently it is
either not 0-terminated or the 0 byte was added outside the available
space - calling strlen on it is also undefined behavior or fishy at
least
Calling sizeof on p is okay and yields the correct value of 5.
Calling strlen(q) yields 5 because q indeed contains a 0 terminator - implicitly added by initializing with a string literal - and there are 5 chars before the 0
Since it contains a 0 terminator, q is really an array of 6
characters so sizeof yields 6.

char p[5];
strcpy(p,"Hello");
copies 5 characters into p and writes the terminating null-character ('\0') at 6th position, i.e. out of the bounds of this array, which yields undefined behavior.
From manual page of strcpy:
"If the destination string of a strcpy() is not large enough, then anything might happen. Any time a program reads or copies data into a buffer, the program first needs to check that there's enough space."

char p[5];
strcpy(p,"Hello");
This strcpy writes a 0 into p[5]. So it's out of bounds. The sizeof(p) is still 5 though.
You have written over the end of p. It's incorrect and results in undefined behavior. In
this case nothing bad happened and it went unnoticed.
The other string you have, has a length of 5 and a sizeof 6.

The q char array also contains the null terminating character. While the fixed size of p doesn't allow the null character to be copied in. Notice that strlen will check for the null character to count the amount of characters of a string, therefore not having one will probably cause undefined behavior.

sizeof(q) is 6, since it contains null terminator.
p does not hold enough space for the null terminator - so strlen(p) can be any random value. This is called undefined behavior.

Strings in C are terminated by a NUL character '\0';
This is why sizeof(q) returns 6, it has enough space to store the '\0' at the end.
You've sized p yourself to be able to hold 5 characters, not enough for the trailing '\0'.
So, this code is undefined behaviour:
strcpy(p, "Hello");
This is copying the '\0' into p[5], which is out-of-bounds.

Question: why is there a difference in the output of sizeof() for both the types of array declaration?
Answer: This statement declares a variable named q, with type char[], pointing at a memory location that holds "Hello".
char q[] = "Hello";
sizeof(q) is 6 because the string "Hello" is comprised of 'H','e','l','l','o','\0', which includes the NULL char in the count.
This statement declares a variable named p, with type char[], pointing to a memory location where 5 char's are reserved.
char p[5];
Note that depending upon memory alignment flags to the compiler, you may actually have 6, 8, or more char's reserved at the location reserved to p. And C won't complain if you reference or assign p[5] (which is the ordinal sixth char in the p[] array).
sizeof(p) is 5 because the compiler has recorded how big the memory location you declared for p. So sizeof(p) and sizeof(q) return different values because p and q are declared differently and refer to different entities.
Question: How does the strlen() & the printf() know when to stop, there was no null character added while declaring the two arrays.
Answer: Both strlen() function calls count the number of non-NULL char's. So both strlen function calls count char's until they locate the NULL terminator. Which both p and q have, at least until the memory location at p+5 is assigned another value. This is because p and q are both allocated on the stack. Look at the addresses of p, q, and the integer i. Here is your function with additional variables added to help illustrate where p and q are located,
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define min(a,b) (((a)<(b))?(a):(b))
#define max(a,b) (((a)<(b))?(b):(a))
int main()
{
char m0 = 'X';
char p[5];
char m1 = 'Y';
char q[]="Hello";
char m2 = 'Z';
int i=0;
strcpy(p,"World");
printf("strlen(p)=%d\n",strlen(p));
printf("sizeof(p)=%d\n",sizeof(p));
printf("strlen(q)=%d\n",strlen(q));
printf("sizeof(q)=%d\n",sizeof(q));
for(i=0;i<6;i++)
{
printf("p[%d]=%c\tq[%d]=%c\n",i,p[i],i,q[i]);
}
printf("m0=%x, %c\n",&m0,m0);
printf(" p=%x\n",p);
printf("m1=%x, %c\n",&m1,m1);
printf(" q=%x\n",q);
printf("m2=%x, %c\n",&m2,m2);
char *x;
for(x=min(&m0,&m2);x<max(&m0,&m2);x++)
{
printf("x[%x]=%c\n",x,*x);
}
return 0;
}
Observe that m0, m1, and m2 are adjacent to the arrays p[] and q[]. When run on my Linux system, we observe that the strcpy of "World" modifies the value of m0 (replaces the 'X' with '\0').
strlen(p)=5
sizeof(p)=5
strlen(q)=5
sizeof(q)=6
p[0]=W q[0]=H
p[1]=o q[1]=e
p[2]=r q[2]=l
p[3]=l q[3]=l
p[4]=d q[4]=o
p[5]= q[5]=
m0=bfbea6a7,
p=bfbea6a2
m1=bfbea6a1, Y
q=bfbea69b
m2=bfbea69a, Z
x[bfbea69a]=Z
x[bfbea69b]=H
x[bfbea69c]=e
x[bfbea69d]=l
x[bfbea69e]=l
x[bfbea69f]=o
x[bfbea6a0]=
x[bfbea6a1]=Y
x[bfbea6a2]=W
x[bfbea6a3]=o
x[bfbea6a4]=r
x[bfbea6a5]=l
x[bfbea6a6]=d
x[bfbea6a7]=
A C literal string such as "Hello" or "World" is terminated by the NULL char, and includes that char in the size of the string. The strcpy() function copies the entire string, including the NULL char at the end.
You should use strncpy, or check the destination string size. Note that when you used strcpy(p,q), you copied more characters (the NULL terminator) than p[] had allocated. That is something you want to avoid. C does not do boundary checking on arrays, so it will let you perform the strcpy. Though lint would detect this error.

Related

C Arbitrary length string

I have a doubt how the length for an array is allocated
#include <stdio.h>
#include <string.h>
int main()
{
char str[] = "s";
long unsigned a = strlen(str);
scanf("%s", str);
printf("%s\n%lu\n", str, a);
return 0;
}
In the above program, I assign the string "s" to a char array.
I thought the length of str[] is 1. so we cannot store more than the length of the array. But it behaves differently. If I reading a string using scanf it is stored in str[] without any error. What was the length of the array str?
Sample I/O :
Hello
Hello 1
Your str is an array of char initialized with "s", that is, it has size 2 and length 1. The size is one more than the length because a NUL string terminator character (\0) is added at the end.
Your str array can hold at most two char. Trying to write more will cause your program to access memory past the end of the array, which is undefined behavior.
What actually happens though, is that since the str array is stored somewhere in memory (on the stack), and that memory region is far larger than 2 bytes, you are actually able to write past the end without causing a crash. This does not mean that you should. It's still undefined behavior.
Since your array has size 2, it can only hold a string of length 1, plus its terminator. To use scanf() and correctly avoid writing past the end of the array, you can use the field width specifier: a numeric value after the % and before the s, like this:
scanf("%1s", str);
When an array is declared without specifying its size when the size is determined by the used initializers.
In this declaration of an array
char str[] = "s";
there is used a string literal as an initializer. A string literal is a sequence of characters terminated by an included zero-terminating character. That is the string literal "s" has two characters { 's', '\0' }.
Its characters are used to initialize sequentially elements of the array str.
So if you will write
printf( "sizeof( str ) = %zu\n", sizeof( str ) );
then the output will be 2. The length of a string is determinate as a number of characters before the terminating zero character. So if you will write
#include <string.h>
//...
printf( "strlen( str ) = %zu\n", strlen( str ) );
then the output will be 1.
If you will try to write data outside an array then you will get undefined behavior because a memory that does not belong to the array will be overwritten. In some cases you can get the expected result. In other cases the program can finish abnormally. That is the behavior of the program is undefined.
The array str has size 2: 1 byte for the character 's' and one for the terminating null byte. What you're doing is writing past the end of the array. Doing so invokes undefined behavior.
When your code has undefined behavior, it could crash, it could output strange results, or it could (as in this case) appear to work properly. Also, making a seemingly unrelated change such as a printf call for debugging or an unused local variable can change how undefined behavior manifests itself.

Is it undefined behavior if the destination string in strcat function is not null terminated?

The following program
// Code has taken from http://ideone.com/AXClWb
#include <stdio.h>
#include <string.h>
#define SIZE1 5
#define SIZE2 10
#define SIZE3 15
int main(void){
char a[SIZE1] = "Hello";
char b[SIZE2] = " World";
char res[SIZE3] = {0};
for (int i=0 ; i<SIZE1 ; i++){
res[i] = a[i];
}
strcat(res, b);
printf("The new string is: %s\n",res);
return 0;
}
has well defined behavior. As per the requirement, source string b is null terminated. But what would be the behavior if the line
char res[SIZE3] = {0}; // Destination string
is replaced with
char res[SIZE3];
Does standard says explicitly about the destination string to be null terminated too?
TL;DR Yes.
Since this is a language-lawyer question, let me add my two cents to it.
Quoting C11, chapter §7.24.3.1/2 (emphas is mine)
char *strcat(char * restrict s1,const char * restrict s2);
The strcat function appends a copy of the string pointed to by s2 (including the
terminating null character) to the end of the string pointed to by s1. The initial character
of s2 overwrites the null character at the end of s1.[...]
and, by definition, a string is null-terminated, quoting §7.1.1/1
A string is a contiguous sequence of characters terminated by and including the first null
character.
So, if the source char array is not null-terminated (i.e., not a string), strcat() may very well go beyond the bounds in search of the end which invokes undefined behavior.
As per your question, char res[SIZE3]; being an automatic local variable, will contain indeterminate value, and if used as the destination of strcat(), will invoke UB.
I think man explicitly says that
Description
The strcat() function appends the src string to the dest string, overwriting the terminating null byte ('\0') at the end of dest, and then adds a terminating null byte. The strings may not overlap, and the dest string must have enough space for the result. If dest is not large enough, program behavior is unpredictable; buffer overruns are a favorite avenue for attacking secure programs.
Enphasis mine
BTW I think strcat starts searching for the null terminator into the dest string before to concatenate the new string, so it is obviously UB, as far as dest string has automatic storage.
In the proposed code
for (int i=0 ; i<SIZE1 ; i++){
res[i] = a[i];
}
Copy 5 chars of a and not the null terminator to res string, so other bytes from 5 to 14 are uninitialized.
Standard also says about safaer implementation strcat-s
K.3.7.2.1 The strcat_s function
Synopsis
#define _ _STDC_WANT_LIB_EXT1_ _ 1
#include <string.h>
errno_t strcat_s(char * restrict s1,
rsize_t s1max,
const char * restrict s2);
Runtime-constraints
2 Let m denote the value s1max - strnlen_s(s1, s1max) upon entry to
strcat_s.
We can see that strlen_s always return them valid size for the dest buffer. From my point of view this implementation was introduced to avoid the UB of the question.
If you leave res uninitialized then, after the copying a into res (in for loop), there's no NUL terminator in res. So, the behaviour of strcat() is undefined if the destination string doesn't contain a NUL byte.
Basically strcat() requires both of its arguments to be strings (i.e. both must contain the terminating NUL byte). Otherwise, it's undefined behaviour. This
is obvious from the description of strcat():
§7.23.3.2, strcat() function
The strcat function appends a copy of the string pointed to by s2
(including the terminating null character) to the end of the string
pointed to by s1. The initial character of s2 overwrites the null
character at the end of s1.
(emphasis mine).
If char res[SIZE3]; is on the stack, it'll have random/undefined stuff in it.
You'll never know whether there'll be a zero byte within res[SIZE3], so yes, strcatting to that is undefined.
If char res[SIZE3]; is an uninitialized global, it'll be all zeros, which will make it behave as an empty c-string, and strcating to it will be safe (as long as SIZE3 is large enough for what you're appending).

Why strcmp giving different response for complete filled character array?

#include <stdio.h>
#include <string.h>
void main()
{
char a[10]="123456789";
char b[10]="123456789";
int d;
d=strcmp(a,b);
printf("\nstrcmp(a,b) %d", (strcmp(a,b)==0) ? 0:1);
printf("compare Value %d",d);
}
Output:
strcmp(a,b) 0
compare value 0
If the same program response is different when increase the array to full value, I mean 10 characters. That time the values are different.
#include <stdio.h>
#include <string.h>
void main()
{
char a[10]="1234567890";
char b[10]="1234567890";
int d;
d=strcmp(a,b);
printf("\nstrcmp(a,b) %d", (strcmp(a,b)==0) ? 0:1);
printf("compare Value %d",d);
}
Output:
strcmp(a,b) 1
compare value -175
Why strcmp responding differently when the string is reached full value of array ?
The behaviour of your second snippet is undefined.
There's no room for the null-terminator, which is relied upon by strcmp, when you write char a[10]="1234567890";. This causes strcmp to overrun the array.
One remedy is to use strncmp.
Another one is to use char a[]="1234567890"; (with b adjusted similarly) and let the compiler figure out the array length which will be, in this case, 11.
According to the definitions of terms used in the C Standard (7.1.1 Definitions of terms)
1 Astring is a contiguous sequence of characters terminated by and
including the first null character....The length of a string is the
number of bytes preceding the null character and the value of a string
is the sequence of the values of the contained characters, in order.
According to the description of function strcmp
2 The strcmp function compares the string pointed to by s1 to
the string pointed to by s2.
According to the section 6.7.9 Initialization Of the Standard
14 An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating
null character if there is room or if the array is of unknown size)
initialize the elements of the array.
In the first program arrays a and b initialized by string literals have room to store the terminating zero.
char a[10]="123456789";
char b[10]="123456789";
Thus the array contain string and the function strcmp may be applied to these arrays.
In the second program arrays a and b do not have a room to store the terminating zero
char a[10]="1234567890";
char b[10]="1234567890";
So the arrays do not contain strings and the function strcmp may not be applied to the arrays. Otherwise it will have undefined behaviour because it will stop when it finds either non-equal characters beyond the arrays (because the arrays have all equal characters) or a terminating zero.
You could get a valid result if you limit the comparison with the sizes of the arrays. To do that you have to use another standard function strncmp
Its call can look for example the following way
strncmp( a, b, sizeof( a ) );
In your second case,
char a[10]="1234567890";
char b[10]="1234567890";
you arrays are not null-terminated, so they cannot be used as strings. Any function operating on string family will invoke undefined behavior, (as they will go past the allocated memory in search of the null-terminator).
You better be using
char a[ ]="1234567890";
char b[ ]="1234567890";
to leave the size allocation to the compiler to avoid the null-termination issue. Compiler will allocate enough memory to hold the supplied initializer as well as the terminating null.
That said, void main() should br int main(void) at least to conform to the standards.
You declare and initialize your array with string literal(but no space for nul termiantor) and also string manipulation function requires C-style string to be passed as argument (terminated with '\0') .
So ,in your second program your arrays -
char a[10]="1234567890";
char b[10]="1234567890";
There is no space for '\0' character , so this invokes undefined behavior.
Increase size of your arrays -
char a[11]="1234567890"; //or char a[]="1234567890";

Basic C question on copying char arrays to char pointers

I have some doubts in basic C programming.
I have a char array and I have to copy it to a char pointer. So I did the following:
char a[] = {0x3f, 0x4d};
char *p = a;
printf("a = %s\n",a);
printf("p = %s\n",p);
unsigned char str[] = {0x3b, 0x4b};
unsigned char *pstr =str;
memcpy(pstr, str, sizeof str);
printf("str = %s\n",str);
printf("pstr = %s\n",pstr);
My printf statements for pstr and str get appended with the data "a".
If I remove memcpy I get junk. Can some C Guru enlighten me?
Firstly, C strings (the %s in printf) are expected to be NUL-terminated. You're missing the terminators. Try char a[] = {0x3f, 0x4d, 0} (same goes for str).
Secondly, pstr and str point to the same memory, so your memcpy is a no-op. This is a minor point compared to the first one.
Add a null terminator, cause that's what you printf expects:
char a[] = {0x3f, 0x4d, '\0'};
The standard way C strings are represented is that in memory, they are a sequence of non-zero bytes representing the characters, followed by a zero (or NULL) byte. You should declare:
char a[] = {0x3f, 0x4d, 0};
When you assign a string pointer (as in unsigned char *pstr = str;) both pointers point to the same memory area, and thus the same characters. There is no need to copy the characters.
When you do need to copy characters, you should be using strlen(), the sizeof() operator returns the number of bytes its argument uses in memory. sizeof(pointer) is the number of bytes the pointer uses, not the length of the string. You find the length of a string (i.e. the number of bytes it occupies in memory) with the strlen() function. Also, there are standard functions to copy C strings. You should rely on those to do the right thing:
strcpy(pstr, str);
printf's %s expects a 0-terminated string, your strings aren't. The uninitialized memory following your arrays may however happen to start with a 0-byte, in which case your code will appear to be correct - it still isn't.
You're declaring an array "str", then pointing to it with pstr. Note that you have no null-terminating character, so after using memcpy you copy the block to itself with no null terminator, as a string requires. Thus, printf can't find the end of the string and continues printing until it finds a 0 (or '\0' in character terms)
Agreed. You'll have to add a null byte at the end of your array of chars.
char a[] = {0x3f, 0x4d, '\0'};
The reason being is that you're creating a string without declaring where it actually ends. Your memcpy() function copies *str to *pstr and automatically adds a null byte for you, which is why it works.
Without memcpy() there the string never knows when to end, so it reaches into subsequent memory addresses and returns whatever random values are stored there. When you're creating a string out of characters, always remember to end it with a null byte.

I'm new to C, can someone explain why the size of this string can change?

I have never really done much C but am starting to play around with it. I am writing little snippets like the one below to try to understand the usage and behaviour of key constructs/functions in C. The one below I wrote trying to understand the difference between char* string and char string[] and how then lengths of strings work. Furthermore I wanted to see if sprintf could be used to concatenate two strings and set it into a third string.
What I discovered was that the third string I was using to store the concatenation of the other two had to be set with char string[] syntax or the binary would die with SIGSEGV (Address boundary error). Setting it using the array syntax required a size so I initially started by setting it to the combined size of the other two strings. This seemed to let me perform the concatenation well enough.
Out of curiosity, though, I tried expanding the "concatenated" string to be longer than the size I had allocated. Much to my surprise, it still worked and the string size increased and could be printf'd fine.
My question is: Why does this happen, is it invalid or have risks/drawbacks? Furthermore, why is char str3[length3] valid but char str3[7] causes "SIGABRT (Abort)" when sprintf line tries to execute?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main() {
char* str1 = "Sup";
char* str2 = "Dood";
int length1 = strlen(str1);
int length2 = strlen(str2);
int length3 = length1 + length2;
char str3[length3];
//char str3[7];
printf("%s (length %d)\n", str1, length1); // Sup (length 3)
printf("%s (length %d)\n", str2, length2); // Dood (length 4)
printf("total length: %d\n", length3); // total length: 7
printf("str3 length: %d\n", (int)strlen(str3)); // str3 length: 6
sprintf(str3, "%s<-------------------->%s", str1, str2);
printf("%s\n", str3); // Sup<-------------------->Dood
printf("str3 length after sprintf: %d\n", // str3 length after sprintf: 29
(int)strlen(str3));
}
This line is wrong:
char str3[length3];
You're not taking the terminating zero into account. It should be:
char str3[length3+1];
You're also trying to get the length of str3, while it hasn't been set yet.
In addition, this line:
sprintf(str3, "%s<-------------------->%s", str1, str2);
will overflow the buffer you allocated for str3. Make sure you allocate enough space to hold the complete string, including the terminating zero.
void main() {
char* str1 = "Sup"; // a pointer to the statically allocated sequence of characters {'S', 'u', 'p', '\0' }
char* str2 = "Dood"; // a pointer to the statically allocated sequence of characters {'D', 'o', 'o', 'd', '\0' }
int length1 = strlen(str1); // the length of str1 without the terminating \0 == 3
int length2 = strlen(str2); // the length of str2 without the terminating \0 == 4
int length3 = length1 + length2;
char str3[length3]; // declare an array of7 characters, uninitialized
So far so good. Now:
printf("str3 length: %d\n", (int)strlen(str3)); // What is the length of str3? str3 is uninitialized!
C is a primitive language. It doesn't have strings. What it does have is arrays and pointers. A string is a convention, not a datatype. By convention, people agree that "an array of chars is a string, and the string ends at the first null character". All the C string functions follow this convention, but it is a convention. It is simply assumed that you follow it, or the string functions will break.
So str3 is not a 7-character string. It is an array of 7 characters. If you pass it to a function which expects a string, then that function will look for a '\0' to find the end of the string. str3 was never initialized, so it contains random garbage. In your case, apparently, there was a '\0' after the 6th character so strlen returns 6, but that's not guaranteed. If it hadn't been there, then it would have read past the end of the array.
sprintf(str3, "%s<-------------------->%s", str1, str2);
And here it goes wrong again. You are trying to copy the string "Sup<-------------------->Dood\0" into an array of 7 characters. That won't fit. Of course the C function doesn't know this, it just copies past the end of the array. Undefined behavior, and will probably crash.
printf("%s\n", str3); // Sup<-------------------->Dood
And here you try to print the string stored at str3. printf is a string function. It doesn't care (or know) about the size of your array. It is given a string, and, like all other string functions, determines the length of the string by looking for a '\0'.
Instead of trying to learn C by trial and error, I suggest that you go to your local bookshop and buy an "introduction to C programming" book. You'll end up knowing the language a lot better that way.
There is nothing more dangerous than a programmer who half understands C!
What you have to understand is that C doesn't actually have strings, it has character arrays. Moreover, the character arrays don't have associated length information -- instead, string length is determined by iterating over the characters until a null byte is encountered. This implies, that every char array should be at least strlen + 1 characters in length.
C doesn't perform array bounds checking. This means that the functions you call blindly trust you to have allocated enough space for your strings. When that isn't the case, you may end up writing beyond the bounds of the memory you allocated for your string. For a stack allocated char array, you'll overwrite the values of local variables. For heap-allocated char arrays, you may write beyond the memory area of your application. In either case, the best case is you'll error out immediately, and the worst case is that things appear to be working, but actually aren't.
As for the assignment, you can't write something like this:
char *str;
sprintf(str, ...);
and expect it to work -- str is an uninitialized pointer, so the value is "not defined", which in practice means "garbage". Pointers are memory addresses, so an attempt to write to an uninitialized pointer is an attempt to write to a random memory location. Not a good idea. Instead, what you want to do is something like:
char *str = malloc(sizeof(char) * (string length + 1));
which allocates n+1 characters worth of storage and stores the pointer to that storage in str. Of course, to be safe, you should check whether or not malloc returns null. And when you're done, you need to call free(str).
The reason your code works with the array syntax is because the array, being a local variable, is automatically allocated, so there's actually a free slice of memory there. That's (usually) not the case with an uninitialized pointer.
As for the question of how the size of a string can change, once you understand the bit about null bytes, it becomes obvious: all you need to do to change the size of a string is futz with the null byte. For example:
char str[] = "Foo bar";
str[1] = (char)0; // I'd use the character literal, but this editor won't let me
At this point, the length of the string as reported by strlen will be exactly 1. Or:
char str[] = "Foo bar";
str[7] = '!';
after which strlen will probably crash, because it will keep trying to read more bytes from beyond the array boundary. It might encounter a null byte and then stop (and of course, return the wrong string length), or it might crash.
I've written all of one C program, so expect this answer to be inaccurate and incomplete in a number of ways, which will undoubtedly be pointed out in the comments. ;-)
Your str3 is too short - you need to add extra byte for null-terminator and the length of "<-------------------->" string literal.
Out of curiosity, though, I tried
expanding the "concatenated" string to
be longer than the size I had
allocated. Much to my surprise, it
still worked and the string size
increased and could be printf'd fine.
The behaviour is undefined so it may or may not segfault.
strlen returns the length of the string without the trailing NULL byte (\0, 0x00) but when you create a variable to hold the combined strings you need to add that 1 character.
char str3[length3 + 1];
…and you should be all set.
C strings are '\0' terminated and require an extra byte for that, so at least you should do
char str3[length3 + 1]
will do the job.
In sprintf() ypu are writing beyond the space allocated for str3. This may cause any type of undefined behavior (If you are lucky then it will crash). In strlen(), it is just searching for a NULL character from the memory location you specified and it is finding one in 29th location. It can as well be 129 also i.e. it will behave very erratically.
A few important points:
Just because it works doesn't mean it's safe. Going past the end of a buffer is always unsafe, and even if it works on your computer, it may fail under a different OS, different compiler, or even a second run.
I suggest you think of a char array as a container and a string as an object that is stored inside the container. In this case, the container must be 1 character longer than the object it holds, since a "null character" is required to indicate the end of the object. The container is a fixed size, and the object can change size (by moving the null character).
The first null character in the array indicates the end of the string. The remainder of the array is unused.
You can store different things in a char array (such as a sequence of numbers). It just depends on how you use it. But string function such as printf() or strcat() assume that there is a null-terminated string to be found there.

Resources