I am learning C. Some characters are being added automatically to my program. What am I doing wrong?
#include <stdio.h>
#include <string.h>
int main() {
char test1[2]="xx";
char test2[2]="xx";
printf("test is %s and %s.\n", test1, test2);
return 0;
}
Here is how I am running it on Fedora 20.
gcc -o problem problem.c
./problem
test is xx?}� and xx#.
I would expect the output to be: test is xx and xx.
The issue is that a string literal such as "xx" has an extra character, the nul terminator \0; that is, it is composed of the characters 'x', 'x' and '\0'.
This is how functions that take char* and treat them as strings know the extent of the strings. Your arrays are simply one element too short, missing the nul-terminator. By passing char* that don't point to a nul-terminated string to a function that expects one, you are invoking undefined behaviour.
You can initialize them like this instead:
char test[] = "xx";
This will result in test having the correct length of 3. You can test that using the sizeof operator. Of course, you can also be explicit about the length:
char test[3] = "xx";
but this is more error-prone.
When you define a string in C like this
char A[] = "hello";
It gets initialized something like this
char A[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
That last null character is needed for it to be a string. So in your code
char test1[2]="xx";
You have made the test1 character array to be 2 characters long, leaving no space for the null character.
To correct your program, you can either not give the size of the character array, like
char test1[]="xx";
Or give it one more than the number of characters you are filling in, like
char test1[3]="xx";
In your code, char test1[2]="xx" creates a kind of "container" for two chars, but the string literal "xx" actually has three: 'x', 'x' and '\0', where '\0' marks the end of the string. printf uses this '\0' to know where to stop reading. In your case the '\0' doesn't fit into test1, so printf keeps reading past the array until it happens to hit some zero byte in memory, printing everything it meets on the way.
You should change your declaration to the following:
char test1[3]="xx";
I am learning C and I came across the pointers.
Even though I learned more with this tutorial than from the textbook I still wonder about the char pointers.
If I program this
#include <stdio.h>
int main()
{
char *ptr_str;
ptr_str = "Hello World";
printf(ptr_str);
return 0;
}
The result is
Hello World
I don't understand how there isn't an error while compiling since the pointer ptr_str is pointing directly to the text and not to the first character of the text. I thought that only this would work
#include <stdio.h>
int main()
{
char *ptr_str;
char var_str[] = "Hello World";
ptr_str = var_str;
printf(ptr_str);
return 0;
}
So in the first example how was I pointing directly to the text?
Your code works because string literals are essentially static arrays.
ptr_str = "Hello World";
is treated by the compiler as if it were
static char __tmp_0[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };
ptr_str = __tmp_0;
(except trying to modify the contents of a string literal has undefined behavior).
You can even apply sizeof to a string literal and you'll get the size of the array: sizeof "Hello" is 6, for example.
In the context of assignment to a char pointer the 'value' of a string literal is the address of its first character.
so
ptr_str = "Hello World";
sets ptr_str to the address of the 'H'
Why would the first one not work? As you have seen, it does.
String literals are arrays. From §6.4.5p6 of the C11 standard (N1570):
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.
In the first case, the literal array decayed into a pointer to its first element, so the decayed pointer points to 'H'. You assigned that pointer to ptr_str and passed it to printf, which used it as the format string and printed every character until it reached the \0. That's all that happened, and this is how you ended up pointing directly to the text. (Because the text is used as a format string there, printf("%s", ptr_str) is the safer way to print it.)
Note that the second case is quite different from the first: in the second case a copy is made, and that copy can be modified (trying to modify the first one would be undefined behavior). We are basically initializing a char array with the contents of the string literal.
I'm using a template from my teacher and at the beginning of the code is says:
#include "lab8.h"
void main(void)
{
int response;
int count;
string words[MAX_COUNT];
Later on in the function, a whole lot of words get put inside the words string. So I was like looking at that last line and got confused. I thought char declared strings? What does that last line even do? I also noticed in a couple of function parameter lists later on, there was entered "string words" instead of what I expected that mention char or something.
EDIT:
typedef char string[MAX_LENGTH];
had been written in the .h file; I didn't see it.
C does not have a basic data type called string.
Check the lab8.h file carefully. Most likely, string is a typedef for an array of char.
Essentially, string words[MAX_COUNT]; defines an array of MAX_COUNT elements, each of type string.
C does not have a dedicated string data type. In C, a string is a sequence of character values followed by a zero-valued byte. Strings are stored as arrays of char, but not all arrays of char contain strings.
For example,
char word[] = { 'h', 'e', 'l', 'l', 'o', 0 };
stores the string "hello" in the array variable word. The array size is taken from the size of the initializer, which is 6 (5 characters plus the 0 terminator). The zero-valued byte serves as a sentinel value for string handling functions like strlen, strcpy, strcat, and for arguments to printf and scanf that use the %s and %[ conversion specifiers.
By contrast,
char arr[] = { 'h', 'e', 'l', 'l', 'o' };
stores a sequence of character values, but since there's no terminating 0-valued byte, this sequence is not considered a string, and you would not want to use it as an argument to any string-handling function (since there's no terminator, the function has no way of knowing where the string ends and will wind up attempting to access memory outside of the array, which can lead to anything from garbage output to a crash).
Without seeing the contents of lab8.h, I'm going to speculate that the string type is a typedef for an array of char, something like
#define MAX_STRING_LENGTH 20 // or some other value
typedef char string[MAX_STRING_LENGTH];
Thus, an array of string is an array of arrays of char; it would be equivalent to
char words[MAX_COUNT][MAX_STRING_LENGTH];
So each words[i] is a MAX_STRING_LENGTH-element array of char.
The following code snippet gives unexpected output in Turbo C++ compiler:
char a[]={'a','b','c'};
printf("%s",a);
Why doesn't this print abc? In my understanding, strings are implemented as one dimensional character arrays in C.
Secondly, what is the difference between %s and %2s?
This is because your string is not zero-terminated. This will work:
char a[]={'a','b','c', '\0'};
The %2s specifies the minimum width of the printout. Since you are printing a 3-character string, it has no effect. If you used %5s, however, your string would be padded on the left with two spaces.
char a[]={'a','b','c'};
Well one problem is that strings need to be null terminated:
char a[]={'a','b','c', 0};
Without changing the original char array, you can also use
char a[]={'a','b','c'};
printf("%.3s",a);
or
char a[]={'a','b','c'};
printf("%.*s",(int)sizeof(a),a);
or
char a[]={'a','b','c'};
fwrite(a,3,1,stdout);
or
char a[]={'a','b','c'};
fwrite(a,sizeof(a),1,stdout);
Because you aren't using a string. To be considered as a string you need the 'null termination': '\0' or 0 (yes, without quotes).
You can achieve this by two forms of initializations:
char a[] = {'a', 'b', 'c', '\0'};
or let the compiler add the terminator for you:
char a[] = "abc";
Whenever we store a string in C, there is always one extra character at the end to mark the end of the string.
That extra character is the null character '\0'.
In your above program you are missing the null character.
You can define your string as
char a[] = "abc";
to get the desired result.
Could someone explain this phenomenon?
#include <stdio.h>
#include <stdlib.h>
int main()
{
char foo[]="foo";
char bar[3]="bar";
printf("%s",foo);
printf("\n");
printf("%s",bar);
return 0;
}
Result:
foo
barfoo
If I change the order and create bar before foo, I get a correct output.
#include <stdio.h>
#include <stdlib.h>
int main()
{
char bar[3]="bar";
char foo[]="foo";
printf("%s",foo);
printf("\n");
printf("%s",bar);
return 0;
}
Result:
foo
bar
And one more.
#include <stdio.h>
#include <stdlib.h>
int main()
{
char foobar[]="foobar";
char FOO[3]={'F','O','O','\0'};
char BAR[3]="BAR";
printf("%s",foobar);
printf("\n");
printf("%s",FOO);
printf("\n");
printf("%s",BAR);
return 0;
}
Result:
foobar
FOOfoobar
BARFOOfoobar
The string "bar" is four characters long: {'b', 'a', 'r', '\0'}. If you explicitly specify the array length then you need to allocate at least four characters:
char bar[4]="bar";
When you do this:
char bar[3]="bar";
printf("%s",bar);
You are invoking undefined behavior as the bar variable has no null terminator. Anything could happen. In this case specifically, the compiler has laid out the two arrays contiguously in memory:
'b' 'a' 'r' 'f' 'o' 'o' '\0'
^           ^
bar[3]      foo[4]
When you print bar it keeps reading until it finds a null terminator, any null terminator. Since bar has none it keeps going until it finds the one at the end of "foo\0".
If you declare char bar[3]="bar";, then you will declare a char array with no room for the null terminator. So printf() will just carry on reading chars from memory, printing them to the console, until it encounters a '\0'.
The other posters have already explained to you that in your
char bar[3] = "bar";
example the string terminator does not fit into the array, so the string ends up non-terminated. Formally speaking, it is not even a string (since strings are required to be terminated by definition). You are attempting to print a non-string as a string (using %s format specifier), which results in undefined behavior. Undefined behavior is exactly what you observe.
In C++ language (for example) the
char bar[3] = "bar";
declaration would be illegal, since C++ does not allow the zero terminator to "fall off" in a declaration like that. C allows it, but only for the implicit zero terminator character. The
char bar[3] = "barr";
declaration is illegal in both C and C++.
Again, the "missing zero" trick works in C with the implicit zero terminator character only. It doesn't work with any explicit initializer: you are not allowed to explicitly specify more initializers than there are elements in the array. Which brings us to your third example. In your third example you have
char FOO[3] = { 'F', 'O', 'O', '\0' };
declaration, which explicitly specifies 4 initializers for an array of size 3. This is illegal in C. Your third example is not compilable. If your compiler accepted it without a diagnostic message, your compiler must be broken. The behavior of your third program cannot be explained by C language, since it is not a C program.
As your line char FOO[3]={'F','O','O','\0'}; suggests you already know, this is a null-termination issue. The problem is that the null terminator is a character. If you allocate memory for 3 characters, you can't put 4 characters in that location (compilers typically warn about the excess initializer and keep only the first 3).
You're missing the \0 at the end of the string, and an array with 4 elements is declared as FOO[4], not FOO[3].
In the first example you haven't got a null-terminated string. It just so happens that they are laid out in memory contiguously and thus the behaviour can be explained as the run over from one string to the other.
In the next example FOO is of size 3, but you are giving it four elements. In the same way, BAR is not null terminated.
char FOO[3]={'F','O','O','\0'};
char BAR[3]="BAR";
bar is not null terminated, so printf keeps reading past the end of the array until it gets to a '\0' character. The stack is arranged such that bar and foo are right next to each other in memory. printf has no way of knowing an array's size; the only way it knows where a string ends is by finding the null terminator. So if you laid out your stack in memory it would look like:
 0   1   2   3   4   5    6
'b' 'a' 'r' 'f' 'o' 'o' '\0'
^bar begins ^foo begins
By writing foo[], you let the compiler set the size of foo based on the string literal it is initialized with. It's smart enough to make it 4 characters to include the null terminator, '\0'.
To solve this, the size of bar should actually be 4, ie:
char bar[4] = "bar"; // extra space for null terminator
or better, let the compiler figure it out like you did with foo:
char bar[] = "bar"; // compiler adds null term character('\0')
char bar[3]="bar"; doesn't leave enough space to add the terminating '\0' character.
If you do char bar[4]="bar";, you should get the result you expect.
The reason you're seeing "funny" things is because the strings are not NUL terminated...
The culprit is this line:
char bar[3]="bar";
This causes only 'b', 'a' and 'r' to be in the length 3 array that you have created.
Now, as it happens, the string in foo is 'f', 'o', 'o' and '\0', and it was placed in memory immediately after bar. So the memory looked like:
b | a | r | f | o | o | \0
I hope this makes it clear.
This line
char bar[3]="bar";
causes undefined behaviour, as "bar" is four characters long once you count the '\0'. So your bar array should be four bytes.
Undefined behaviour means anything can happen, including good and bad things.
I am a little confused by the following C code snippets:
printf("Peter string is %zu bytes\n", sizeof("Peter")); // Peter string is 6 bytes
This tells me that when C compiles a string in double quotes, it will automatically add an extra byte for the null terminator.
printf("Hello '%s'\n", "Peter");
The printf function knows when to stop reading the string "Peter" because it reaches the null terminator, so ...
char myString[2][9] = {"123456789", "123456789" };
printf("myString: %s\n", myString[0]);
Here, printf prints all 18 characters because there are no null terminators (and they wouldn't fit without taking out the 9s). Does C not add the null terminator in a variable definition?
Your array is [2][9]. Each [9] row holds ['1', '2', ... '8', '9']. Because you only gave each row room for 9 chars in the second dimension, and you used all 9, there is no room for a '\0' character. Redefine your char array:
char string[2][10] = {"123456789", "123456789"};
And it should work.
Sure it does, you just aren't leaving enough room for the '\0' byte. Making it:
char string[2][10] = { "123456789", "123456789" };
Will work as you expect (will just print 9 characters).
If you tell C that an array is a given size, C cannot make the array any larger. It would be disobeying you if it did so! Remember that not every char array contains a null terminated string. Sometimes the array (as used) is truly an array of (individual) char. The compiler doesn't know what you are doing and cannot read your mind.
This is why C allows you to initialize a char array where the null terminator won't fit but everything else will. Try your example with a string one byte longer and the compiler will complain.
Note that your example will compile but will not do what you expect, as the contents are not (null terminated) strings. With GCC, running your example, I see the string I should, followed by garbage.
Alternatively, you can use:
const char *myString[2] = { "123456789", "123456789" };
This way each element points to a string literal, and the compiler sizes each literal correctly, including its null terminator.
C allows unterminated strings, C++ does not.
C allows character arrays to be initialized with string constants. It also allows a string constant initializer to contain exactly one more character than the array it initializes, i.e., the implicit terminating null character of the string may be ignored. For example:
char name1[] = "Harry"; // Array of 6 char
char name2[6] = "Harry"; // Array of 6 char
char name3[] = { 'H', 'a', 'r', 'r', 'y', '\0' };
// Same as 'name1' initialization
char name4[5] = "Harry"; // Array of 5 char, no null char
C++ also allows character arrays to be initialized with string constants, but always includes the terminating null character in the initialization. Thus the last initializer (name4) in the example above is invalid in C++.
Is there a reason why the compiler doesn't warn that there isn't enough room for the 0 byte? I get a warning if I try to add another '9' that won't fit, but it doesn't seem to care about dropping the 0 byte?
The '\0' byte isn't its problem. Most of the time, if you have this:
char code[9] = "123456789";
the next byte will be past the end of the variable. That memory may happen to contain 0 (unless, for example, you malloc() and don't set the values before using them), so the program often appears to work, even though reading that byte is undefined behavior and bad for you.
If you're using gcc, you might also want to use the -Wall flag, or one of the other (million) warning flags. This might help (not sure).