C: strange string phenomenon

Could someone explain this phenomenon?
#include "stdio.h"
#include "stdlib.h"
int main()
{
char foo[]="foo";
char bar[3]="bar";
printf("%s",foo);
printf("\n");
printf("%s",bar);
return 0;
}
Result:
foo
barfoo
If I change the order and create bar before foo, I get a correct output.
#include "stdio.h"
#include "stdlib.h"
int main()
{
char bar[3]="bar";
char foo[]="foo";
printf("%s",foo);
printf("\n");
printf("%s",bar);
return 0;
}
Result:
foo
bar
And one more.
#include "stdio.h"
#include "stdlib.h"
int main()
{
char foobar[]="foobar";
char FOO[3]={'F','O','O','\0'};
char BAR[3]="BAR";
printf("%s",foobar);
printf("\n");
printf("%s",FOO);
printf("\n");
printf("%s",BAR);
return 0;
}
Result:
foobar
FOOfoobar
BARFOOfoobar

The string "bar" is four characters long: {'b', 'a', 'r', '\0'}. If you explicitly specify the array length then you need to allocate at least four characters:
char bar[4]="bar";
When you do this:
char bar[3]="bar";
printf("%s",bar);
You are invoking undefined behavior as the bar variable has no null terminator. Anything could happen. In this case specifically, the compiler has laid out the two arrays contiguously in memory:
'b' 'a' 'r' 'f' 'o' 'o' '\0'
^           ^
bar[3]      foo[4]
When you print bar it keeps reading until it finds a null terminator, any null terminator. Since bar has none it keeps going until it finds the one at the end of "foo\0".
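One way to check the point above without triggering the undefined behavior is to look for the terminator only inside the array's own bounds. This is a minimal sketch, not part of the original answer; has_terminator is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first n bytes of buf contain a '\0', otherwise 0.
   memchr never reads past n bytes, so this is safe even when the
   array holds no terminator at all.
   (has_terminator is a hypothetical helper, not a standard function.) */
int has_terminator(const char *buf, size_t n)
{
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}
```

For char bar[3]="bar"; the call has_terminator(bar, sizeof bar) returns 0, while for char bar[4]="bar"; it returns 1.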

If you declare char bar[3]="bar";, then you will declare a char array with no room for the null terminator. So printf() will just carry on reading chars from memory, printing them to the console, until it encounters a '\0'.

The other posters have already explained to you that in your
char bar[3] = "bar";
example the string terminator does not fit into the array, so the string ends up non-terminated. Formally speaking, it is not even a string (since strings are required to be terminated by definition). You are attempting to print a non-string as a string (using %s format specifier), which results in undefined behavior. Undefined behavior is exactly what you observe.
In C++ language (for example) the
char bar[3] = "bar";
declaration would be illegal, since C++ does not allow the zero terminator to "fall off" in a declaration like that. C allows it, but only for the implicit zero terminator character. The
char bar[3] = "barr";
declaration is illegal in both C and C++.
Again, the "missing zero" trick works in C with the implicit zero terminator character only. It doesn't work with any explicit initializer: you are not allowed to explicitly specify more initializers than there are elements in the array. Which brings us to your third example. In your third example you have
char FOO[3] = { 'F', 'O', 'O', '\0' };
declaration, which explicitly specifies 4 initializers for an array of size 3. This is illegal in C. Your third example is not compilable. If your compiler accepted it without a diagnostic message, your compiler must be broken. The behavior of your third program cannot be explained by C language, since it is not a C program.

As you hinted at knowing in your line char FOO[3]={'F','O','O','\0'}; this is a null termination issue. The problem is that the null terminator is a character. If you allocate memory for 3 characters, you can't put 4 characters in that location (a compiler that accepts this typically keeps the first 3 and drops the rest).

You're missing the \0 at the end of the string, and an array with 4 elements is declared as FOO[4], not FOO[3].

In the first example you haven't got a null-terminated string. It just so happens that they are laid out in memory contiguously and thus the behaviour can be explained as the run over from one string to the other.
In the next example FOO is of size 3, but you are giving it four elements. In the same manner, BAR is not null terminated.
char FOO[3]={'F','O','O','\0'};
char BAR[3]="BAR";

bar is not null terminated, so printf keeps following the array until it gets to a '\0' character. In this case the stack happens to be arranged such that bar and foo are right next to each other in memory. printf is not told the array's size; the only way it knows where a string ends is by finding a null terminator. So if you laid out your stack in memory it would look like:
 0   1   2   3   4   5   6
'b' 'a' 'r' 'f' 'o' 'o' '\0'
^bar begins ^foo begins
By saying foo[] the compiler sets the size of foo based on the constant string it's initialized with. It's smart enough to make it 4 characters to include the null terminator, '\0'.
To solve this, the size of bar should actually be 4, i.e.:
char bar[4] = "bar"; // extra space for the null terminator
or better, let the compiler figure it out like you did with foo:
char bar[] = "bar"; // compiler adds the null terminator ('\0')

char bar[3]="bar"; doesn't leave enough space to add the terminating '\0' character.
If you do char bar[4]="bar"; you should get the result you expect.

Please read this detailed answer which will give insight into this...
The reason you're seeing "funny" things is because the strings are not NUL terminated...

The culprit is this line:
char bar[3]="bar";
This causes only 'b', 'a' and 'r' to be in the length 3 array that you have created.
Now, as it so happens, the string in foo is 'f', 'o', 'o' and '\0', and it was allocated at a location contiguous with bar. So the memory looked like:
b | a | r | f | o | o | \0
I hope this makes it clear.

This line
char bar[3]="bar";
causes undefined behaviour once bar is printed, as "bar" is four characters counting the '\0'. So your bar array should be four bytes.
Undefined behaviour means anything can happen, including good and bad things.

Related

Why can I store a string in the memory address of a char?

I'm starting to understand pointers and how to dereference them etc. I've been practising with ints but I figured a char would behave similarly. Use the * to dereference, use the & to access the memory address.
But in my example below, the same syntax is used to set the address of a char and to save a string to the same variable. How does this work? I think I'm just generally confused and maybe I'm overthinking it.
#include <stdio.h>
int main()
{
char *myCharPointer;
char charMemoryHolder = 'G';
myCharPointer = &charMemoryHolder;
printf("%s\n", myCharPointer);
myCharPointer = "This is a string.";
printf("%s\n", myCharPointer);
return 0;
}
First, you need to understand how "strings" work in C.
"Strings" are stored as an array of characters in memory. Since there is no way of determining how long the string is, a NUL character, '\0', is appended after the string so that we know where it ends.
So for example if you have a string "foo", it may look like this in memory:
--------------------------------------------
| 'f' | 'o' | 'o' | '\0' | 'k' | 'b' | 'x' | ...
--------------------------------------------
The things after '\0' are just stuff that happens to be placed after the string, which may or may not be initialised.
When you assign a "string" to a variable of type char *, what happens is that the variable will point to the beginning of the string, so in the above example it will point to 'f'. (In other words, if you have a string str, then str == &str[0] is always true.) When you assign a string to a variable of type char *, you are actually assigning the address of the zeroth character of the string to the variable.
When you pass this variable to printf(), it starts at the pointed address, then goes through each char one by one until it sees '\0' and stops. For example if we have:
char *str = "foo";
and you pass it to printf(), it will do the following:
Dereference str (which gives 'f')
Dereference (str+1) (which gives 'o')
Dereference (str+2) (which gives another 'o')
Dereference (str+3) (which gives '\0' so the process stops).
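The steps above can be written out as a loop. This is only a sketch of the idea, not how any particular printf is implemented; count_until_nul is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Walks the string the way the steps above describe:
   dereference, advance by one, stop at '\0'.
   Returns how many characters came before the terminator. */
size_t count_until_nul(const char *str)
{
    assert(str != NULL);
    size_t n = 0;
    while (*(str + n) != '\0') {
        n++;
    }
    return n;
}
```

For "foo" the loop dereferences str, str+1 and str+2, then stops at str+3, so it returns 3. If the memory it is given contains no '\0', the loop has nothing to stop it, which is exactly the problem with unterminated data.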
This also leads to the conclusion that what you're currently doing is actually wrong. In your code you have:
char charMemoryHolder = 'G';
myCharPointer = &charMemoryHolder;
printf("%s\n", myCharPointer);
When printf() sees the %s specifier, it goes to address pointed to by myCharPointer, in this case it contains 'G'. It will then try to get next character after 'G', which is undefined behaviour. It might give you the correct result every now and then (if the next memory location happens to contain '\0'), but in general you should never do this.
Several comments
Static strings in C are treated as a (char *) to a null-terminated array of characters. E.g. "ab" would essentially be a char * to a block of memory with 97 98 0. (97 is 'a', 98 is 'b', and 0 is the null termination.)
Your code myCharPointer = &charMemoryHolder; followed by printf("%s\n", myCharPointer) is not safe. printf should be passed a null-terminated string, and there's no guarantee that the memory immediately following your character charMemoryHolder contains the value 0.
In C, string literals evaluate to pointers to read-only arrays of chars (except when used to initialize char arrays). This is a special case in the C language and does not generalize to other pointer types. A char * variable may hold the address of either a single char variable or the start address of an array of characters. In this case the array is a string of characters which has been stored in a static region of memory.
charMemoryHolder is a variable that has an address in memory.
"This is a string." is a string constant that is stored in memory and also has an address.
Both of these addresses can be stored in myCharPointer and dereferenced to access the first character.
In the case of printf("%s\n", myCharPointer), the pointer will be dereferenced and the character displayed, then the pointer is incremented. It repeats this until it finds a null (value zero) character and stops.
Hopefully you are now wondering what happens when you are pointing to the single 'G' character, which is not null-terminated like a string constant. The answer is "undefined behavior" and will most likely print random garbage until it finds a zero value in memory, but could print exactly the correct value, hence "undefined behavior". Use %c to print the single character.
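A small sketch of the %c suggestion, written with snprintf so the result can be inspected. format_one_char is a hypothetical helper name, and the buffer-based shape is just an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Formats the single char that p points to into out (capacity n).
   "%c" is given the char value itself (*p), so nothing stored
   after *p is ever read. Returns what snprintf returns. */
int format_one_char(char *out, size_t n, const char *p)
{
    assert(out != NULL && p != NULL);
    return snprintf(out, n, "%c", *p);
}
```

With char charMemoryHolder = 'G'; passing &charMemoryHolder fills out with "G" and nothing else, because the pointer is dereferenced exactly once.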

2-D character array

#include<stdio.h>
void main()
{
char a[10][5] = {"hi", "hello", "fellow"};
printf("%s",a[0]);
}
Why does this code print only hi?
#include<stdio.h>
void main()
{
char a[10][5] = {"hi", "hello", "fellow"};
printf("%s",a[1]);
}
While this code is printing "hellofellow"
Why does this code print only hi?
You've told printf to print the string stored at a[0], and that string happens to be "hi".
While this code is printing "hellofellow"
This one is by coincidence, in fact your code ought to be rejected by the compiler due to a constraint violation:
No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
The string "fellow", specifically the 'w' at the end of it does not fit within the char[5] being initialised, and this violates the C standard. Perhaps also by coincidence, your compiler provides an extension (making it technically a non-C compiler), and so you don't see the error messages that I do:
prog.c:3:6: error: return type of 'main' is not 'int' [-Werror=main]
void main()
     ^
prog.c: In function 'main':
prog.c:5:37: error: initializer-string for array of chars is too long [-Werror]
    char a[10][5] = {"hi", "hello", "fellow"};
                                    ^
Note that the second error message is complaining about "fellow", but not "hello". Your "hello" initialisation is valid by exception:
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
The emphasis is mine. What the emphasised section states is that if there isn't enough room for a terminal '\0' character, that won't be used in the initialisation.
Your code:
char a[10][5] = {"hi", "hello", "fellow"};
Allocates 10 arrays of char[5].
"hello" takes up all 5, so there is no room for the terminating \0, and printing it runs on into the next row.
"fellow" is too big for a char[5]. A compiler that accepts it anyway will typically drop the 'w' rather than spill it into a[3], so a[2] holds "fello" with no terminator.
Aside from being undefined behavior, it is unclear what you were trying to do.
This gives undefined behaviour, because strings must be null-terminated and the element "hello" already has a length of 5.
Declare your array as a[10][7] and you will get the intended output.
See here: https://ideone.com/c2zUs0
Why does this code print only hi?
Because a[0][2] is null indicating termination thus giving you hi.
This is undefined behavior due to insufficient space to store \0 character.
Please note that the memory allocated is 5 bytes per string in your array of strings. Thus, for a[1] there is not enough memory to store the \0 character, as all five bytes are taken by "hello".
Thus, the subsequent memory is read until the \0 character is found.
Thus, you can change the line:
char a[10][5] = {"hi", "hello", "fellow"};
to
char a[][7] = {"hi", "hello", "fellow"};
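Here is a short sketch of what the corrected declaration gives you. The names words, word_count, word_at and word_length are made up for the example; only the a[][7] shape comes from the answer above.

```c
#include <assert.h>   /* assert, static_assert (C11) */
#include <stddef.h>
#include <string.h>

/* Row width 7 fits the longest word, "fellow" (6 letters), plus '\0'. */
static const char words[][7] = {"hi", "hello", "fellow"};

static_assert(sizeof words[0] == 7, "each row holds 6 letters plus the terminator");

/* Number of rows the compiler inferred from the initializer list. */
size_t word_count(void)
{
    return sizeof words / sizeof words[0];
}

/* Row i as an ordinary terminated string. */
const char *word_at(size_t i)
{
    assert(i < word_count());
    return words[i];
}

/* Length of row i; safe because every row now ends in '\0'. */
size_t word_length(size_t i)
{
    return strlen(word_at(i));
}
```

Since every row has room for its terminator, word_at(1) is just "hello" and no longer runs into the next row.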
Why does this code print only hi?
This is because the \0 character is already encountered at a[0][2] and thus the reading of the characters is stopped.
What Your Code Does:
Look at the following statement:
char a[10][5] = {"hi", "hello", "fellow"};
It allocates 10 rows. 5 characters are allocated for each index of a.
What is the Problem:
Strings are null-terminated: a null terminator always has to be stored in addition to the given characters, so the array size actually used is numOfCharacters + 1, the extra byte being for the null terminator. When you initialize the array with exactly as many characters as its size, the null terminator is skipped. A character array is normally printed up to the first \0 (null terminator), so without one, printing runs past the end of the row.
The Solution:
All you need to do is set the size equal to numOfCharactersInString + 1. You can use the following statement:
char a[10][7] = {"hi", "hello", "fellow"};
The largest string is "fellow", which contains 6 characters, so you need a size of 6 + 1. That is why the statement should use char a[10][7] instead of char a[10][5].
Hope it helps.
When you declare a 2-D character array as
char a[10][5] = {"hi", "hello", "fellow"};
char a[10][5] reserves memory to store 10 strings each of length 5 which means 4 characters + 1 '\0' character. A point to note is that the array elements are stored in contiguous memory locations.
a[0] points to the first string, a[1] to the second and so on.
Also when you initialize an array partially the other uninitialized elements become 0 instead of being garbage values.
Now in your case, after initialization, if you try to visualize the array it would be something like
hi\0\0\0hellofello\0\0...
Now the command
printf("%s",a[0]);
prints characters starting from 'h' of "hi" and stops printing when a '\0' is encountered so "hi" is printed.
Now for the second case,
printf("%s",a[1]);
characters are printed starting from the 'h' of "hello" till a '\0' is encountered. Now the '\0' character is encountered only after printing "hellofello", and hence the output.
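The zero-filling of a partially initialized array can be checked directly. This is a minimal sketch with a valid initializer only ("hi"), so it stays within defined behavior; nonzero_bytes, ROWS and COLS are hypothetical names.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

enum { ROWS = 10, COLS = 5 };

static_assert(COLS >= sizeof "hi", "a row must fit the word plus its terminator");

/* Counts the bytes that are not '\0' in a 10x5 array where only the
   first row has an initializer. Every byte without an initializer is
   set to zero by the language, not left as garbage. */
size_t nonzero_bytes(void)
{
    char grid[ROWS][COLS] = {"hi"};
    size_t count = 0;
    for (size_t r = 0; r < ROWS; r++) {
        for (size_t c = 0; c < COLS; c++) {
            if (grid[r][c] != '\0') {
                count++;
            }
        }
    }
    return count;
}
```

Only the 'h' and the 'i' are non-zero, so the function returns 2. Those zero bytes are what eventually stop printf in the layout shown above.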

Unknown erroneous characters are being added to strings?

I am learning C. Some characters are being added automatically to my program. What am I doing wrong?
#include <stdio.h>
#include <string.h>
int main() {
char test1[2]="xx";
char test2[2]="xx";
printf("test is %s and %s.\n", test1, test2);
return 0;
}
Here is how I am running it on Fedora 20.
gcc -o problem problem.c
./problem
test is xx?}� and xx#.
I would expect the answer would be test is xx and xx.
The issue is that string literals such as "xx" have an extra character that is the nul-termination, \0, that is, it is composed of the characters 'x', 'x' and '\0'.
This is how functions that take char* and treat them as strings know the extent of the strings. Your arrays are simply one element too short, missing the nul-terminator. By passing char* that don't point to a nul-terminated string to a function that expects one, you are invoking undefined behaviour.
You can initialize them like this instead:
char test[] = "xx";
This will result in test having the correct length of 3. You can test that using the sizeof operator. Of course, you can also be explicit about the length:
char test[3] = "xx";
but this is more error-prone.
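The sizeof check mentioned above can also be done at compile time. This sketch assumes a C11 compiler for static_assert; test_size and test_length are hypothetical helper names.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>

static const char test[] = "xx";

/* Compilation fails if the array ever loses room for the terminator. */
static_assert(sizeof test == sizeof "xx", "test must hold 'x', 'x' and the terminator");

/* Total storage in bytes, terminator included. */
size_t test_size(void)
{
    return sizeof test;
}

/* Length as string functions see it; the terminator is not counted. */
size_t test_length(void)
{
    return strlen(test);
}
```

Here test_size() is 3 and test_length() is 2. Changing the declaration to test[2] makes the static_assert fail instead of producing garbage at run time.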
When you define a String in C like this
char A[] = "hello";
It gets initialized something like this
A = { 'h', 'e', 'l', 'l', 'o', '\0'}
That last null character is needed for it to be a string. So in your code
char test1[2]="xx";
you have made the test1 character array 2 characters long, leaving no space for the null character.
To correct your program, you can either not give the size of the character array, like
char test1[]="xx";
or give it one more than the number of characters you are filling in, like
char test1[3]="xx";
In your code char test1[2]="xx", char test1[2] creates a kind of "container" for two chars, but the actual string "xx" implicitly has three chars: 'x', 'x' and '\0', where '\0' marks the end of the string. This '\0' tells printf where it should stop reading the input string. In your case printf doesn't get it, as it doesn't fit into test1, so printf reads on to some random zero in memory, printing everything it meets on the way.
You should change your declaration to the following:
char test1[3]="xx";

storing of strings in char arrays in C

#include<stdio.h>
int main()
{
char a[5]="hello";
puts(a); //prints hello
}
Why does the code compile correctly? We need six places to store "hello", correct?
The C compiler will let you run off the end of arrays, it does no checks of that sort.
The C compiler allows you to explicitly ask for no null terminator.
char a[] = "Hello"; /* adds a terminator implicitly */
char a[6] = "Hello"; /* adds a terminator implicitly */
char a[5] = "Hello"; /* skips it */
Any value smaller than 5 results in an error.
As for why - one possibility is that your strings are of a fixed size, or are being used as buffers of byte values. In these cases you do not need a null terminator.
Best practice is to use char a[] so the compiler can set it to the correct value (including terminator) automatically.
a doesn't contain a null terminated string (extra initializers for fixed size arrays - such as the null terminator in "hello" - are discarded), so the behaviour when a pointer to that array is passed to puts is undefined.
In my experience, a lot of compilers will let you get away with compiling this. It will usually crash at runtime, though (because you don't have a null terminator).
C char array initialization includes the terminating null only if there is room or if the array dimensions are not specified.
You need 6 characters to store "hello" as a null terminated string. But char arrays are not constrained to store nul terminated string, you may need the array for another purpose and forcing an additional nul character in those cases would be pointless.
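As an illustration of that "another purpose", here is a sketch that treats a char[5] as a fixed-width field rather than a string. TAG_LEN, same_tag and format_tag are hypothetical names; the point is only that every call is told the length explicitly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { TAG_LEN = 5 };   /* hypothetical fixed field width */

/* Compares two fixed-width fields byte by byte; no terminator involved. */
int same_tag(const char a[TAG_LEN], const char b[TAG_LEN])
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, TAG_LEN) == 0;
}

/* Formats a fixed-width field into out (capacity n). The precision in
   "%.*s" limits reading to at most TAG_LEN bytes, so the field does
   not need a '\0' of its own. */
int format_tag(char *out, size_t n, const char tag[TAG_LEN])
{
    assert(out != NULL && tag != NULL);
    return snprintf(out, n, "%.*s", TAG_LEN, tag);
}
```

With char a[5]="hello"; format_tag writes "hello" into a large enough buffer, whereas puts(a) would keep reading past the array.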
That is because in C memory management is done manually, unlike in Java and a few other languages.
The six places you need are not checked for during compilation: the program reserved only five places in memory for the characters even though six are expected, and the compiler did not check. Once you store into or read past those five places, you risk a runtime error.
"hello" string is kept in read-only memory with 0 in the end. "a" points to this string, this is why the program may work correctly. But I think that generally this is undefined behavior.
It is necessary to see Assembly code generated by compiler to see what happens exactly. If you want to get junk output in this situation, try:
char a[5] = {'h', 'e', 'l', 'l', 'o'};
The C compiler you are using does not check that the string literal fits into the char array. You need 6 characters in the array to fit the literal "Hello" since the literal includes a terminating zero. Modern compilers, such as Visual C++ 2010 do check these things and give you an error.

C string initializer doesn't include terminator?

I am a little confused by the following C code snippets:
printf("Peter string is %zu bytes\n", sizeof("Peter")); // Peter string is 6 bytes
This tells me that when C compiles a string in double quotes, it will automatically add an extra byte for the null terminator.
printf("Hello '%s'\n", "Peter");
The printf function knows when to stop reading the string "Peter" because it reaches the null terminator, so ...
char myString[2][9] = {"123456789", "123456789" };
printf("myString: %s\n", myString[0]);
Here, printf prints all 18 characters because there's no null terminators (and they wouldn't fit without taking out the 9's). Does C not add the null terminator in a variable definition?
Your string is [2][9]. Those [9] are ['1', '2', etc... '8', '9']. Because you only gave it room for 9 chars in the second array dimension, and because you used all 9, it has no room to place a '\0' character. Redefine your char array:
char string[2][10] = {"123456789", "123456789"};
And it should work.
Sure it does, you just aren't leaving enough room for the '\0' byte. Making it:
char string[2][10] = { "123456789", "123456789" };
Will work as you expect (will just print 9 characters).
If you tell C that an array is a given size, C cannot make the array any larger. It would be disobeying you if it did so! Remember that not every char array contains a null terminated string. Sometimes the array (as used) is truly an array of (individual) char. The compiler doesn't know what you are doing and cannot read your mind.
This is why C allows you to initialize a char array where the null terminator won't fit but everything else will. Try your example with a string one byte longer and the compiler will complain.
Note that your example will compile but will not do what you expect, as the contents are not (null terminated) strings. With GCC, running your example, I see the string I should, followed by garbage.
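If the rows really have to stay 9 bytes wide, one option is to copy a row into a slightly larger buffer and add the terminator yourself before printing. This is a sketch under that assumption; ROW_LEN and row_to_string are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { ROW_LEN = 9 };   /* width of each unterminated row, as in the question */

/* Copies one ROW_LEN-byte row into out and appends the terminator that
   the row itself has no room for. out must hold ROW_LEN + 1 bytes. */
void row_to_string(char out[ROW_LEN + 1], const char row[ROW_LEN])
{
    assert(out != NULL && row != NULL);
    memcpy(out, row, ROW_LEN);   /* copies exactly 9 bytes, never more */
    out[ROW_LEN] = '\0';
}
```

After row_to_string(out, myString[0]), printing out with %s gives the 9 digits and stops there.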
Alternatively, you can use:
char* myString[2] = {"123456789", "123456789" };
Like this, the initializer computes the right size for your null terminated strings.
C allows unterminated strings, C++ does not.
C allows character arrays to be initialized with string constants. It also allows a string constant initializer to contain exactly one more character than the array it initializes, i.e., the implicit terminating null character of the string may be ignored. For example:
char name1[] = "Harry"; // Array of 6 char
char name2[6] = "Harry"; // Array of 6 char
char name3[] = { 'H', 'a', 'r', 'r', 'y', '\0' };
// Same as 'name1' initialization
char name4[5] = "Harry"; // Array of 5 char, no null char
C++ also allows character arrays to be initialized with string constants, but always includes the terminating null character in the initialization. Thus the last initializer (name4) in the example above is invalid in C++.
Is there a reason why the compiler doesn't warn that there isn't enough room for the 0 byte? I get a warning if I try to add another '9' that won't fit, but it doesn't seem to care about dropping the 0 byte?
The '\0' byte isn't its problem. Most of the time, if you have this:
char code[9] = "123456789";
The next byte will be off the edge of the variable, but will be unused memory, and will most likely be 0 (unless you malloc() and don't set the values before using them). So most of the time it works, even if it's bad for you.
If you're using gcc, you might also want to use the -Wall flag, or one of the other (million) warning flags. This might help (not sure).
