Strcpy: behaving more like 'strcut' - c

char a[3], b[3];
strcpy(a,"abc");
printf("a1 = %s\n", a);
strcpy(b,a);
printf("a2 = %s\n", a);
printf("b = %s\n", b);
From how I understand strcpy to work the output would be:
a1 = abc
a2 = abc
b = abc
Instead I obtain
a1 = abc
a2 =
b = abc
Why when I call strcpy the second time does it (apparently) erase the contents of a?
Thanks

This is a buffer overflow problem - your a and b are too short – they don't have room for the null terminator. What is happening is a is just after b in memory, so when strcpy(b,a) executes, the null terminator stored at the end of b is actually the same memory location as the first character of a. This makes a suddenly an empty string.
For starters, make the lengths of the arrays 4 instead of 3. This is okay in sandbox/play/learning mode, but consider in production code:
Use safer string functions (e.g. strncpy) to avoid buffer overflows.
Use character arrays/buffers that support variable size or pre-calculation of the size required to fit your data.

Since you arrays are too small and do not have room for the null terminator you are most likely overwriting a when you try to copy a to b since the strcpy does not know when to stop copying. This declaration would fix the problem for this particular program:
char a[4], b[4];
In the general case you need to ensure that your destination has enough to space to accommodate the source as well as the null terminator.
This example gives you a better idea of what is going on, this is just for demonstration purposes and you should use code like this for anything else but to learn. This works for me in ideone and you can see if live here but may not work properly in other compilers since we are invoking undefined behavior:
#include <stdio.h>
#include <string.h>
int main()
{
char a[3], b[4];
// a will have a lower address in memory than b
printf("%p %p\n", a, b);
// "abc" is a null terminated literal use a size of 4 to force a copy of null
strncpy(a,"abc",4);
// printf will not overrun buffer since we terminated it
printf("a2 = %s\n", a);
// explicitly only copy 3 bytes
strncpy(b,a,3);
// manually null terminate b
b[3] = '\0' ;
// So we can prove we are seeing b's contents
b[0] = 'z' ;
// This will overrun into b now since b[0] is no longer null
printf("a2 = %s\n", a);
printf("b = %s\n", b);
}

The first strcpy(a,"abc") is already wrong. Don't get confused with char array versus C-String... a C-String is a always a char array, but a char array is NOT always a C-String.
A C-String must have a '\0' char in the end. So when you do strcpy "abc" -> a[3] you are actually moving the following 4 bytes to your array { 'a', 'b', 'c', '\0' }
Because a and b were created together, b is right ahead a. When you print out a it goes fine IN THIS CASE because printf() still can find a '\0' to identify as the end of string, despite it's wrong... because your '\0' char is the the area reserved to b.
The following problems are all related to the same thing...
The solution is: the buffer to your C-String must the the maximum size of your string + 1, so you can guarantee you will have room for the '\0' char. If you need for more details, google for "C-String" or "null-terminated string".

You've made a very common beginners mistake. In C, there is no string primitive; when we talk about strings, we're really talking about null-terminated character arrays (or buffers, I don't care what nomenclature you like). so your char[3] will hold a string of 2 letters, plus the null terminator. Another subtle issue is that in memory, they will be laid out on the stack as a[0]a[1]a[2]b[0]b[1]b[2]--and this is the reason you didn't crash when you deserved to. See "abc" is REALLY "abc\0", so a[3] == c and b[0] == \0, and since the behavior is undefined when strings overlap (as these do), I suspect that your implementation just copied chars until it copies a \0. That being the case, strcpy(a, b) will result in a being an empty string.
On the other hand, your program works as it was written to. What you wrote isn't what you meant :)

Related

string gets filled with garbage

i got a string and a scanf that reads from input until it finds a *, which is the character i picked for the end of the text. After the * all the remaining cells get filled with random characters.
I know that a string after the \0 character if not filled completly until the last cell will fill all the remaining empty ones with \0, why is this not the case and how can i make it so that after the last letter given in input all the remaining cells are the same value?
char string1 [100];
scanf("%[^*]s", string1);
for (int i = 0; i < 100; ++i) {
printf("\n %d=%d",i,string1[i]);
}
if i try to input something like hello*, here's the output:
0=104
1=101
2=108
3=108
4=111
5=0
6=0
7=0
8=92
9=0
10=68
You have an uninitialized array:
char string1 [100];
that has indeterminate values. You could initialize the array like
char string1 [100] = { 0 };
or
char string1 [100] = "";
In this call
scanf("%[^*]s", string1);
you need to remove the trailing character s, because %[] and %s are distinct format specifiers. There is no %[]s format specifier. It should look like this:
scanf("%[^*]", string1);
The array contains a string terminated by the zero character '\0'.
So to output the string you should write for example
for ( int i = 0; string1[i] != '\0'; ++i) {
printf( "%c", string1[i] ); // or putchar( string1[i] );
putchar( '\n' );
or like
for ( int i = 0; string1[i] != '\0'; ++i) {
printf("\n %d=%c",i,string1[i]);
putchar( '\n' );
or just
puts( string1 );
As for your statement
printf("\n %d=%d",i,string1[i]);
then it outputs each character (including non-initialized characters) as integers due to using the conversion specifier d instead of c. That is the function outputs internal ASCII representations of characters.
I know that a string after the \0 character if not filled completly
until the last cell will fill all the remaining empty ones with \0
No, that's not true.
It couldn't be true: there is no length to a string. No where neither the compiler nor any function can even know what is the size of the string. Only you do. So, no, string don't autofill with '\0'
Keep in minds that there aren't any string types in C. Just pointer to chars (sometimes those pointers are constant pointers to an array, but still, they are just pointers. We know where they start, but there is no way (other than deciding it and being consistent while coding) to know where they stop.
Sure, most of the time, there is an obvious answer, that make obvious for any reader of the code what is the size of the allocated memory.
For example, when you code
char string1[20];
sprintf(string1, "hello");
it is quite obvious for a reader of that code that the allocated memory is 20 bytes. So you may think that the compiler should know, when sprinting in it of sscaning to it, that it should fill the unused part of the 20 bytes with 0. But, first of all, the compiler is not there anymore when you will sscanf or sprintf. That occurs at runtime, and compiler is at compilation time. At run time, there is not trace of that 20.
Plus, it can be more complicated than that
void fillString(char *p){
sprintf(p, "hello");
}
int main(){
char string1[20];
string1[0]='O';
string1[1]='t';
fillString(&(string1[2]));
}
How in this case does sprintf is supposed to know that it must fill 18 bytes with the string then '\0'?
And that is for normal usage. I haven't started yet with convoluted but legal usages. Such as using char buffer[1000]; as an array of 50 length-20 strings (buffer, buffer+20, buffer+40, ...) or things like
union {
char str[40];
struct {
char substr1[20];
char substr2[20];
} s;
}
So, no, strings are not filled up with '\0'. That is not the case. It is not the habit in C to have implicit thing happening under the hood. And that could not be the case, even if we wanted to.
Your "star-terminated string" behaves exactly as a "null-terminated string" does. Sometimes the rest of the allocated memory is full of 0, sometimes it is not. The scanf won't touch anything else that what is strictly needed. The rest of the allocated memory remains untouched. If that memory happened to be full of '\0' before the call to scanf, then it remains so. Otherwise not. Which leads me to my last remark: you seem to believe that it is scanf that fills the memory with non-null chars. It is not. Those chars were already there before. If you had the feeling that some other methods fill the rest of memory with '\0', that was just an impression (a natural one, since most of the time, newly allocated memory are 0. Not because a rule says so. But because that is the most frequent byte to be found in random area of memory. That is why uninitialized variables bugs are so painful: they occur only from times to times, because very often uninitialized variables are 0, just by chance, but still they are)
The easiest way to create a zeroed array is to use calloc.
Try replacing
char string1 [100];
with
char *string1=calloc(1,100);

Why one string writes two char in C? [duplicate]

I'm currently learning C and I'm confused with differences between char array and string, as well as how they work.
Question 1:
Why is there a difference in the outcomes of source code 1 and source code 2?
Source code 1:
#include <stdio.h>
#include <string.h>
int main(void)
{
char c[2]="Hi";
printf("%d\n", strlen(c)); //returns 3 (not 2!?)
return 0;
}
Source code 2:
#include <stdio.h>
#include <string.h>
int main(void)
{
char c[3]="Hi";
printf("%d\n", strlen(c)); //returns 2 (not 3!?)
return 0;
}
Question 2:
How is a string variable different from a char array? How to declare them with the minimum required index numbers allowing \0 to be stored if any (please read the codes below)?
char name[index] = "Mick"; //should index be 4 or 5?
char name[index] = {'M', 'i', 'c', 'k'}; //should index be 4 or 5?
#define name "Mick" //what is the size? Is there a \0?
Question 3:
Does the terminating NUL ONLY follow strings but not char arrays? So the actual value of the string "Hi" is [H][i][\0] and the actual value of the char array "Hi" is [H][i]?
Question 4:
Suppose c[2] is going to store "Hi" followed by a \0 (not sure how this is done, using gets(c) maybe?). So where is the \0 stored? Is it stored "somewhere" after c[2] to become [H][i]\0 or will c[2] be appended with a \0 to become c[3] which is [H][i][\0]?
It is quite confusing that sometimes there is a \0 following the string/char array and causes trouble when I compare two variables by if (c1==c2) as it most likely returns FALSE (0).
Detailed answers are appreciated. But keeping your answer brief helps my understanding :)
Thank you in advance!
Answer 1: In code 1 you have a char array that is not a string; in code 2 you have a char array that is also a string.
Answer 2: A string is a char array in which (at least) one element has the value 0; if you leave the size part empty, the compiler will automatically fill it with the minimum possible value.
char astring[] = "foobar"; /* compiler automagically uses 7 for size */
printf("%d\n", (int)sizeof astring);
Answer 3: a char array in which one of the elements is NUL is a string; a char array where no elements are NUL is not a string.
Answer 4: an array defined to hold two elements (char c[2];) cannot hold three elements. If it is going to be a string it can only be the empty string or a string with 1 character.
Question 1:
Why is there a difference in the outcomes of source code 1 and source
code 2?
Source code 1:
#include <stdio.h>
#include <string.h>
int main()
{
char c[2]="Hi";
printf("%d", strlen(c)); //returns 3 (not 2!?)
getchar();
}
Source code 2:
#include <stdio.h>
#include <string.h>
int main()
{
char c[3]="Hi";
printf("%d", strlen(c)); //returns 2 (not 3!?)
getchar();
}
answer:
Because in the first case, c[] is only holding "Hi". strlen looks for a zero at the end, and, depending on exactly what is behind c[] finds one sooner or later, or crashes. We can't say without knowing exactly what is in the memory behind the c[] array.
Question 2:
How is a string variable different from a char array? How to declare
them with the minimum required index numbers allowing \0 to be stored
if any (please read the codes below)?
char name[index] = "Mick"; //should index be 4 or 5?
char name[index] = {'M', 'i', 'c', 'k'}; //should index be 4 or 5?
answer
Really depends on what you want to do. Probably 5 if you want to actually use the content as a string. But there's nothing saying you can't store "Mick" in a 4 character array - you just can't use strlen to find out how long it is, because strlen will continue to 5 and quite possibly (much) further to find the length, and if there is no zero in the next several memory locations, it could lead to a crash, because eventually, there won't be valid memory addresses to read.
#define name "Mick" //what is the size? Is there a \0?
This has absolutely no size at all, until you use name somwhere. #defines are not part of what the compiler sees - the pre-processor will replace name with "Mick" if you use name anywhere - and hopefully, that's in a place the compiler can make sense of. And then the same rules apply as in previous answer - it depends on how you want to use the array of characters. For correct operation with strlen, strpy, and nearly all other str... functions, you need a zero at the end.
Question 3:
Does the terminating null ONLY follow strings but not char arrays? So
the actual value of the string "Hi" is [H][i][\0] and the actual value
of the char array "Hi" is [H][i]?
Yes, no, maybe. It all depends on how you USE the "Hi" string literal (that's the technical name for 'something within double quotes'). If the compiler is "allowed", it will put a zero at the end. But if you initialize an array to a given size, it will stuff the bytes in there, and if there isn't room for a zero, that's your problem, not the compiler's.
Question 4:
Suppose c[2] is going to store "Hi" followed by a \0 (not sure how
this is done, using gets(c) maybe?). So where is the \0 stored? Is it
stored "somewhere" after c[2] to become [H][i]\0 or will c[2] be
appended with a \0 to become c[3] which is [H][i][\0]?
In c[2], beyond the 'H', 'i', there is no telling what is stored [technically, it could well be "the end of the earth" - in computer terms, that's "memory that can't be read - in which case strlen on that WILL crash your program, because strlen reads beyond the end of the earth]. But if could also be a zero, a one, the letter 'a', the number 42, or any other 8-bit [1] value.
It is quiet confusing that sometimes there is a \0 following the
string/char array and causes trouble when I compare two variables by
if (c1==c2) as it most likely returns FALSE (0).
If c1 and c2 are char arrays, that will ALWAYS be false since c1 and c2 are never going to have the same address, and when using an array in C in that way, it becomes "the address in memory of the first element in the array". So no matter what teh contents of c1 and c2 is, their address can never be the same [because they are two different variables, and two variables can not have the same location in memory - that's like trying to park two cars in a parking space large enough only for one car - and no, crushing either car is not allowed in our thought experiment].
[1] Char isn't guaranteed to be 8 bits. But lets inore that for now.
Running source code one is undefined behavior because strlen() requires a NUL-terminated string, which c[2] = "Hi"; /* = { 'H', 'i' } */ is not. A string differs from a char array in that a string is a char array with at least one NUL byte somewhere in the array.
The remaining answers should follow easily from this fact.
To autosize a char array to match the size of a string literal at initialization, simply specify no array size:
char c[] = "This will automatically size the c array (including the NUL).";
Note that you cannot compare char arrays with the == operator. You have to use
if (strcmp(c1, c2) == 0) {
/* Equal. */
} else {
/* Not equal. */
}
strlen() works on \0 terminating characters and in C all strings should be \0 terminated. So when you have given only 2 spaces for 2 characters H and i but there is no room for \0. Hence you are getting Undefined Behavior in strlen().
In case of char c[3] = "Hi"; there is \0 at the third place and strlen() will calculate the actual length.
How to declare them with the minimum required index numbers allowing \0 to be stored if any ?
When you are not sure about the size of char array , Do like this :
char c1[] = "Mike"; // strlen = 4
char c2[] = "Omkant" // strlen = 6
NOTE :
EDIT :In the above case where no size is mentioned explicitly , Do not confuse with sizeof with the strlen().
strlen() returns only number of charaters
sizeof gives number of characters plus one more (for \0 character).
So sizeof always gives exactly 1 more than the number returned by strlen().

why does this simple code gives not work at times????then when i restart my code blocks it does work at times

#include <stdio.h>
#include <stdlib.h>
main()
{
char *a,*b,*c={0};
int i=0,j=0;
a=(char *)malloc(20*sizeof(char));
b=(char *)malloc(20*sizeof(char));
c=(char *)malloc(20*sizeof(char));
printf("Enter two strings:");
gets(a);
gets(b);
while(a[i]!=NULL)
{
c[i]=a[i];
i++;
}
while(b[j]!=NULL)
{
c[i]=b[j];
i++;
j++;
}
printf("The concated string is %s",c);
}
this is crazy........i spend one whole night it didn't work and then next night it suddenly works perfectly....i'm confused
There are many things wrong with your code.
Not all of them matter if all you care about is getting the code to work.
However, I have tried here to show you different misconceptions that are clear from your code, and show you how to code it better.
You are misunderstanding what NULL means. a NULL pointer doesn't point at anything
Strings are terminated with '\0' which is an ASCII NUL, not the same thing, though both use the value 0.
char* s = "hello";
The above string is actually 6 characters long. 5 bytes for the hello, 1 for the '\0' that is stuck at the end. Incidentally, this means that you can only have strings up to 19 characters long because you need to reserve one byte for the terminal '\0'
char* r = NULL;
The pointer r is pointing at nothing. There is no '\0' there, and if you attempt to look at r[0], you will crash.
As Ooga pointed out, you missed terminating with '\0' which is going to create random errors because your printf will keep going to try to print until the first zero byte it finds. Whether you crash on any particular run is a matter of luck. Zeros are common, so usually you will stop before you crash, but you will probably print out some junk after the string.
Personally, I would rather crash than have the program randomly print out the wrong thing. At least when you crash, you know something is wrong and can fix it.
You also seem to have forgotten to free the memory you malloc.
If you are going to use malloc, you should use free at the end:
int* a = malloc(20);
...
free(a);
You also are only mallocing 20 characters. If you go over that, you will do horrible things in memory. 20 seems too short, you will have only 19 characters plus the null on the end to play with but if you do have 20 characters each in a and b, you would need 40 characters in c.
If this is an assignment to use malloc, then use it, but you should free when you are done. If you don't have to use malloc, this example does not show a reason for using it since you are allocating a small, constant amount of memory.
You are initializing c:
char* c = {0};
In a way that makes no sense.
The {0} is an array with a single zero value. c is pointing to it, but then you immediately point it at something else and never look at your little array again.
You probably mean that C is pointing to nothing at first.
That would be:
char* c = NULL;
but then you are immediately wiping out the null, so why initialize c, but not a and b?
As a general rule, you should not declare values and initialize them later. You can always do something stupid and use them before they are initialized. Instead, initialize as you declared the:
int* a = malloc(20);
int* b = malloc(20);
int* c = malloc(40);
Incidentally, the size of a char is by definition 1, so:
20* sizeof(char)
is the same as 20.
You probably saw an example like:
20 * sizeof(int)
Since sizeof(int) which is not 1 the above does something. Typically sizeof(int) is 4 bytes, so the above would allocate 80 bytes.
gets is unsafe, since it doesn't say how long the buffer is
ALWAYS use fgets instead of gets. (see below).
Many computers have been hacked using this bug (see http://en.wikipedia.org/wiki/Robert_Tappan_Morris)
Still, since malloc is not really needed, in your code, you really should write:
enum { SIZE = 128 };
char a[SIZE];
fgets(a, SIZE, STDIN);
char b[SIZE];
fgets(b, SIZE, STDIN);
char c[SIZE*2];
int i;
int j = 0;
for (i = 0; a[i] != '\0' && i < 127; i++)
c[j++] = a[i];
for (i; b[i] != '\0' && i < 127; i++)
c[j++] = a[i];
c[j] = '\0';
...
Last, I don't know if you are learning C or C++. I will simply point out that this kind of programming is a lot easier in C++ where a lot of the work is done for you. You can first get concatenation done the easy way, then learn all the pointer manipulation which is harder.
#include <string>
#include <iostream>
using namespace std;
int main() {
string a,b,c;
getline(cin, a); // read in a line
getline(cin, b);
c = a + b;
cout << c;
}
Of course, you still need to learn this low-level pointer stuff to be a sophisticated programmer in C++, but if the purpose is just to read in and concatenate lines, C++ makes it a lot easier.
You are not properly null-terminating c. Add this before the printf:
c[i] = '\0';
Leaving out null-termination will seem to work correctly if the char at i happens to be 0, but you need to set it to be sure.
The string c is not being terminated with a null char, this means printf does not know where to stop and will likely segfault your program when it overruns. The reason you may be getting sporadic success is that there is a random chance the malloced area has been pre zeroed when you allocate it, if this is the case it will succeed as a null char is represented as a literal 0 byte.
there are two solutions available to you here, first you could manually terminate the string with a null char as so:
c[i] = '\0';
Second you can use calloc instead of malloc, it guarantees the memory is always pre zeroed.
As a side note you should likely add some length checking to your code to ensure c will not overflow if A and B are both over 10. (or just make c 40 long)
I hope this helps.

K&R code example confusion: copy string

In chapter one, the K&R introduces a function copy as follows:
void copy(char to[], char from[]) {
/* copy from from[] to to[], assumes sufficient space */
int i = 0;
while ((to[i] = from[i]) != '\0') {
i++;
}
}
Tinkering around a little with this function, I had some unexpected results.
Example program:
int main() {
char a[3] = {'h', 'a', '\n'};
char b[3];
printf("a: %s", a); // prints ha
copy(b, a);
printf("a: %s", a); // prints nothing
printf("b: %s", b); // prints ha
return 0;
}
Now to my questions:
Why does the copying from a to b work, that is why does the while loop in copy ever terminate even though a does not contain a '\0'?
Why is a mutated?
You're probably experiencing a buffer overflow.
As a is not terminated properly (missing \0), copy copies from a to b as long as it finds no \0. Therefore, there are more bytes written to b as it can contain which then overflow into a (platform dependent, undefined behaviour).
The part that overflows is a \0 after a, thus making a a zero-length string.
Your stack probably looks like this:
b a
[ arbitrary memory ][ 0 0 0 ][ h a \n ][ 0 ? ? ? ? ]
The ? indicates unknown data as we don't know what lies there and it is nowhere specified.
But we know that there has to be a 0, or otherwise printf would print much more garbage.
copy copies until there is a zero found after the beginning a. There is, by chance, a
0 after the end of a, which then gets copied to b. As b is already filled with the
contents from a, b overflows into a, making your stack look like this:
b a
[ arbitrary memory ][ h a \n ][ 0 a \n ][ 0 ? ? ? ? ]
As there's a \0 at the beginning of a, printf assumes a to be empty.
This copy function relies on a terminating nullbyte to determine when it should stop copying. When you use a string constant, it is automatically null-terminated. However, normal arrays are not null-terminated, so the function continues to access memory behind the end of a until one of the bytes there happens to be a \0. When the copy function stops copying depends on the contents of that memory area. You do not know what, if anything, happens to be there, so you have no idea how long copy will keep on copying or what will happen.
Even though you did not explicitly put a '\0' in your variable a, it is very likely that a[3] (which is technically out of bounds) happens to be a zero.
Lots of memory is usually filled with Zeros / \0 values, although there is absolutely no guarantee of this.
That explains why you successfully copied the string to b.
Your copy function is wrong in the sense it "thinks" its copying arrays of ints:
void copy(int to[], int from[]) {
When it should be copying arrays of characters:
void copy(char to[], int char[]) {
It works, because your while loop isn't doing an logical operation, its doing an asignment.
So it's as long going through the loop, as long the asigned value is true.(and thats as long it's non zero) So its terminated when the '\0' (whats 0) is reached.
and it will anywhen probably terminate (with undefined behaviour) even if there is no '\0', because after leaving your boundarys the chance of the apearance of any zero valued byte is pretty big. but after leaving your array boundaries, the behavior of your program could be anything. (it could even let nasal dragons spawn ;) )
The reason why b doesn't print anything could be grounded in the Byte order, as you are wrting int values into a Byte array, so it could be that the first Byte of the Byteorder is placed in first Byte and is 0 so from outside it will be treat as a '\0' even if your purpose was to treat the whole int as a sign and not as 4 single values.
p.s.: This would even break strict-aliasing rules.
Better to examine whether to or from is NULL before the while loop below.
while ((to[i] = from[i]) != '\0'){
i++;
}
To your questions:
1.The loop terminates because something stored in memory after a equals to 0 (or '\0'), which is undefined.
2.a may change or not after calling copy which is undefined as well. Both variables stored after a and the position b stored at affect. In the case nemo mentioned above, a mutated after function copy was executed.
Always insert a '\0' at the end of a char array for security.

About string length, terminating NUL, etc

I'm currently learning C and I'm confused with differences between char array and string, as well as how they work.
Question 1:
Why is there a difference in the outcomes of source code 1 and source code 2?
Source code 1:
#include <stdio.h>
#include <string.h>
int main(void)
{
char c[2]="Hi";
printf("%d\n", strlen(c)); //returns 3 (not 2!?)
return 0;
}
Source code 2:
#include <stdio.h>
#include <string.h>
int main(void)
{
char c[3]="Hi";
printf("%d\n", strlen(c)); //returns 2 (not 3!?)
return 0;
}
Question 2:
How is a string variable different from a char array? How to declare them with the minimum required index numbers allowing \0 to be stored if any (please read the codes below)?
char name[index] = "Mick"; //should index be 4 or 5?
char name[index] = {'M', 'i', 'c', 'k'}; //should index be 4 or 5?
#define name "Mick" //what is the size? Is there a \0?
Question 3:
Does the terminating NUL ONLY follow strings but not char arrays? So the actual value of the string "Hi" is [H][i][\0] and the actual value of the char array "Hi" is [H][i]?
Question 4:
Suppose c[2] is going to store "Hi" followed by a \0 (not sure how this is done, using gets(c) maybe?). So where is the \0 stored? Is it stored "somewhere" after c[2] to become [H][i]\0 or will c[2] be appended with a \0 to become c[3] which is [H][i][\0]?
It is quite confusing that sometimes there is a \0 following the string/char array and causes trouble when I compare two variables by if (c1==c2) as it most likely returns FALSE (0).
Detailed answers are appreciated. But keeping your answer brief helps my understanding :)
Thank you in advance!
Answer 1: In code 1 you have a char array that is not a string; in code 2 you have a char array that is also a string.
Answer 2: A string is a char array in which (at least) one element has the value 0; if you leave the size part empty, the compiler will automatically fill it with the minimum possible value.
char astring[] = "foobar"; /* compiler automagically uses 7 for size */
printf("%d\n", (int)sizeof astring);
Answer 3: a char array in which one of the elements is NUL is a string; a char array where no elements are NUL is not a string.
Answer 4: an array defined to hold two elements (char c[2];) cannot hold three elements. If it is going to be a string it can only be the empty string or a string with 1 character.
Question 1:
Why is there a difference in the outcomes of source code 1 and source
code 2?
Source code 1:
#include <stdio.h>
#include <string.h>
int main()
{
char c[2]="Hi";
printf("%d", strlen(c)); //returns 3 (not 2!?)
getchar();
}
Source code 2:
#include <stdio.h>
#include <string.h>
int main()
{
char c[3]="Hi";
printf("%d", strlen(c)); //returns 2 (not 3!?)
getchar();
}
answer:
Because in the first case, c[] is only holding "Hi". strlen looks for a zero at the end, and, depending on exactly what is behind c[] finds one sooner or later, or crashes. We can't say without knowing exactly what is in the memory behind the c[] array.
Question 2:
How is a string variable different from a char array? How to declare
them with the minimum required index numbers allowing \0 to be stored
if any (please read the codes below)?
char name[index] = "Mick"; //should index be 4 or 5?
char name[index] = {'M', 'i', 'c', 'k'}; //should index be 4 or 5?
answer
Really depends on what you want to do. Probably 5 if you want to actually use the content as a string. But there's nothing saying you can't store "Mick" in a 4 character array - you just can't use strlen to find out how long it is, because strlen will continue to 5 and quite possibly (much) further to find the length, and if there is no zero in the next several memory locations, it could lead to a crash, because eventually, there won't be valid memory addresses to read.
#define name "Mick" //what is the size? Is there a \0?
This has absolutely no size at all, until you use name somwhere. #defines are not part of what the compiler sees - the pre-processor will replace name with "Mick" if you use name anywhere - and hopefully, that's in a place the compiler can make sense of. And then the same rules apply as in previous answer - it depends on how you want to use the array of characters. For correct operation with strlen, strpy, and nearly all other str... functions, you need a zero at the end.
Question 3:
Does the terminating null ONLY follow strings but not char arrays? So
the actual value of the string "Hi" is [H][i][\0] and the actual value
of the char array "Hi" is [H][i]?
Yes, no, maybe. It all depends on how you USE the "Hi" string literal (that's the technical name for 'something within double quotes'). If the compiler is "allowed", it will put a zero at the end. But if you initialize an array to a given size, it will stuff the bytes in there, and if there isn't room for a zero, that's your problem, not the compiler's.
Question 4:
Suppose c[2] is going to store "Hi" followed by a \0 (not sure how
this is done, using gets(c) maybe?). So where is the \0 stored? Is it
stored "somewhere" after c[2] to become [H][i]\0 or will c[2] be
appended with a \0 to become c[3] which is [H][i][\0]?
In c[2], beyond the 'H', 'i', there is no telling what is stored [technically, it could well be "the end of the earth" - in computer terms, that's "memory that can't be read - in which case strlen on that WILL crash your program, because strlen reads beyond the end of the earth]. But if could also be a zero, a one, the letter 'a', the number 42, or any other 8-bit [1] value.
It is quiet confusing that sometimes there is a \0 following the
string/char array and causes trouble when I compare two variables by
if (c1==c2) as it most likely returns FALSE (0).
If c1 and c2 are char arrays, that will ALWAYS be false since c1 and c2 are never going to have the same address, and when using an array in C in that way, it becomes "the address in memory of the first element in the array". So no matter what teh contents of c1 and c2 is, their address can never be the same [because they are two different variables, and two variables can not have the same location in memory - that's like trying to park two cars in a parking space large enough only for one car - and no, crushing either car is not allowed in our thought experiment].
[1] Char isn't guaranteed to be 8 bits. But lets inore that for now.
Running source code one is undefined behavior because strlen() requires a NUL-terminated string, which c[2] = "Hi"; /* = { 'H', 'i' } */ is not. A string differs from a char array in that a string is a char array with at least one NUL byte somewhere in the array.
The remaining answers should follow easily from this fact.
To autosize a char array to match the size of a string literal at initialization, simply specify no array size:
char c[] = "This will automatically size the c array (including the NUL).";
Note that you cannot compare char arrays with the == operator. You have to use
if (strcmp(c1, c2) == 0) {
/* Equal. */
} else {
/* Not equal. */
}
strlen() works on \0 terminating characters and in C all strings should be \0 terminated. So when you have given only 2 spaces for 2 characters H and i but there is no room for \0. Hence you are getting Undefined Behavior in strlen().
In case of char c[3] = "Hi"; there is \0 at the third place and strlen() will calculate the actual length.
How to declare them with the minimum required index numbers allowing \0 to be stored if any ?
When you are not sure about the size of char array , Do like this :
char c1[] = "Mike"; // strlen = 4
char c2[] = "Omkant" // strlen = 6
NOTE :
EDIT :In the above case where no size is mentioned explicitly , Do not confuse with sizeof with the strlen().
strlen() returns only number of charaters
sizeof gives number of characters plus one more (for \0 character).
So sizeof always gives exactly 1 more than the number returned by strlen().

Resources