C language, char=char with unexpected results - c

Hi everybody and thanks in advance for any help, this is the situation:
#define N 12
[..]
char vect[N][2];
char strng[2];
[..]
vect[i][0]=strng[2]; //this two lines are in a simple for cycle
vect[i][2]=strng[0];
Now, if in string[2] I have "c 2", what I expect in vect[i][0] is '2' and in vect[i][1] 'c'.
I use code::blocks and watching vect I have instead "2#", but it could be "2À" as well.
Can you help me? Where am I wrong?

Array indexes goes from zero up to the size minus one. So using e.g. strng[2] you access the third entry in the two-entry array. Accessing an array out of bounds leads to undefined behavior and the data will be indeterminate.
You should also remember that all strings in C are one more character than reported by e.g. strlen, and that extra character is a special terminator character. So if you want a two-character string, you really need three characters: Two for the string, and one for the terminator.

Rewrite these statements
vect[i][0]=strng[2]; //this two lines are in a simple for cycle
vect[i][2]=strng[0];
the following way
vect[i][0]=strng[1]; //this two lines are in a simple for cycle
vect[i][1]=strng[0];
provided that string contains two characters { 'c', '2' }.
Take into account that array string can not have string literal "c 2", because you defined it as
char strng[2];
that is it can contain only two characters.
If you want that the array would contain indeed "c 2" then you have to define it either as
char strng[3];
or as
char strng[4];
if you want to include the terminating zero.
In this case you may write
vect[i][0]=strng[2]; //this two lines are in a simple for cycle
vect[i][1]=strng[0];

Assuming strng literally contains "c 2", then your memory is the issue. strng[2] contains 3 cells iirc. 2 for holding chars and then a null terminator (ie \0). so when you try to access strng[2], (which you cant because you can only go to N-1 cells, where N is the number allocated for it) it contains undefined results, since it isnt null terminated and you are reaching beyond memory you allocated

Related

How char array behaves for longer strings?

I asked this question as one of multiple questions here. But people asked me to ask them separately. So why this question.
Consider below code lines:
char a[5] = "geeks"; //1
char a3[] = {'g','e','e','k','s'}; //d
printf("a:%s,%u\n",a,sizeof(a)); //5
printf("a3:%s,%u\n",a3,sizeof(a3)); //j
printf("a[5]:%d,%c\n",a[5],a[5]);
printf("a3[5]:%d,%c\n",a3[5],a3[5]);
Output:
a:geeksV,5
a3:geeks,5
a[5]:86,V
a3[5]:127,
However the output in original question was:
a:geeks,5
a3:geeksV,5
The question 1 in original question was:
Does line #1 adds \0? Notice that sizeof prints 5 in line #5 indicating \0 is not there. But then, how #5 does not print something like geeksU as in case of line #j? I feel \0 does indeed gets added in line #1, but is not considered in sizeof, while is considered by printf. Am I right with this?
Realizing that the output has changed (for same online compiler) when I took out only those code lines which are related to first question in original question, now I doubt whats going on here? I believe these are undefined behavior by C standard. Can someone shed more light? Possibly for another compiler?
Sorry again for asking 2nd question.
char a[5] = "geeks"; //1
Here, you specify the array's size as '5', and initialize it with 5 characters.
Therefore, you do not have a "C string", which by definition is ended by a NUL. (0).
printf("a:%s,%u\n",a,sizeof(a)); //5
The array itself still has a size of 5, which is correctly reported by the sizeof operator, but your call to printf is undefined behaviour and could print anything after the arrray's contents - it will just keep looking at the next address until it finds a 0 somewhere. That could be immediately, or it could print a 1000000 garbage characters, or it could cause some sort of segfault or other crash.
char a3[] = {'g','e','e','k','s'}; //d
Because you don't specify the array's size, the compiler will, through the initialization syntax, determine the size of the array. However, the way you chose to initialize a3, it will still only provide 5 bytes of length.
The reason for that is that your initialization just is an initialization list, and not a "string". Therefore, your subsequent call to printf also is undefined behaviour, and it is just luck that at the position a3[5] there seems to be a 0 in your case.
Effectively, both examples have the very same error.
You could have it different thus:
char a3[] = "geeks";
Using a string literal for initialization of the array with unspecified size will cause the compiler to allocate enough memory to hold the string and the additional NUL-terminator, and sizeof (a3) will now yield 6.
"geeks" here is a string literal in C.
When you define "geeks" the compiler automatically adds the NULL character to the end. This makes it 6 characters long.
But you are assigning it to char a[5]. This will cause undefined behaviour.
As mentioned by #DavidBowling, in this case the following condition applies
(Section 6.7.8.14) C99 standard.
An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array
the elements "geeks" will be copied into the array 'a' but the NULL character will not be copied.
So in this case when you try to print the array, it will continue printing until it encounters a \0 in the memory.
From the further print statements it is seen that a[5] has the value V. Presumably the next byte on your system is \0 and the array print stops.
So, in your system, at that instance, "geeksV" is printed.

The mechanics of populating an array [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 6 years ago.
Improve this question
I tried to populate an array of structs defined as follows:
typedef struct{
char directive[5];
}directive_nfo_t;
By using the following:
directive_nfo_t directive_list[]=
{
{"ALIGN"},{"ASCII"},{"BSS"},{"BYTE"},{"END"},{"EQU"},{"ORG"}
};
To my surprise, the first few elements were corrupted like so:
[0]= ALIGNASCIIBSS
[1]= ASCIIBSS
[2]= BSS
...
Until I made the following change:
typedef struct{
char directive[6]; <-- made char array +1
}directive_nfo_t;
Then the first few arrays were correct like so:
[0]= ALIGN
[1]= ASCII
[2]= BSS
...
My question is what happens in the background to explain this behavior?
Regards.
In C, a string is a sequence of character values followed by a 0-valued terminator; the string "ASCII" is represented by the character sequence 'A', 'S', 'C', 'I', 'I', 0. Thus, you need a six-element array to store the string.
For a string that's N characters long, you need an array of N+1 characters to store it.
When you explicitly initialize a char array as string literal in the way you do:
char some_array[] = {"ALIGN"};
the compiler actually populates the 0th to 4th "position" (total of 5 positions) with the characters inside quotation marks, but also the fifth position with \0 without requiring you do it explicitly (if it has space enough). So the size equals 6. You exceed the boundaries if you don't count the \0 character into the size calculation and restrict the size to 5. Compiler would omit the terminating character.
In your case it looks as if the first element of the next member "overwrote" what should have been the omitted \0 character of the previous, since you haven't reserved a place for it. In fact the "mechanics of populating the array" boils down to the compiler writing as much data as could fit inside the boundaries. The address of the first position of the next member string logically corresponds to your assignment, although the \0 from the previous is missing.
Since your printf() format tag was %s, the function printed the characters until it reached the first \0 character, which is in fact undefined behavior.
That's why
char directive[6];
was correct size assignment in your code.
If the char array is big enough, C compiler automatically places a '\0' after the text.
If it is just large enough for the text, that terminator is omitted, which is what has happened here.
If there isn't even room for the text, the compiler will say something like "too many initialisers" or "array bounds overflow".
The struct array elements are adjacent in memory. The first two items lack a terminator, so the second item printed only stops at the terminator after the third item. The first item, is also printed until it reaches that same terminator. By making the array size 6, the compiler was able to place a terminator after every item.
Unlike in C++, C allows you to (unintentionally) shoot yourself in the feet, by allowing to omit NUL terminating character '\0' in the char array initializer when there is no room for it. Your case can be narrowed down to a simple array definition such as:
char str[5] = "ALFAP";
which is a syntatic shortcut to:
char str[5] = {'A', 'L', 'F', 'A', 'P'};
It may be kind of misleading, because in different context the same "ALFAP" represets string literal, that always has the ending NUL character:
char* str = "ALFAP" // here, "ALFAP" always contains NUL character at the end
My question is what happens in the background to explain this behavior? Regards.
You have an array of struct directive_nfo_t type and each struct directive_nfo_t holds array of five characters (in your first example).
The output that you were getting when you have 5 character array in directive_nfo_t type was basically due to two things-
Array elements are stored in consecutive memory locations.
In C, the abstract idea of a string is implemented with just null terminated array of characters.
When you have declared an array of directive_nfo_t type, each element of directive_nfo_t is stored in consecutive memory location and each element has 5 character array(which are also stored in consecutive locations) in it. And in your Initialization list({"ALIGN"},{"ASCII"},{"BSS"},{"BYTE"},{"END"},{"EQU"},{"ORG"}) for the array, you have used all the 5 characters in storing your data in first two elements of directive_nfo_t ("ALIGN" and "ASCII"). As, in C, functions which operate on character array to implement abstract idea of string, assume that a string will be terminated by using a null character at the end. Therefore, in the first two elements of directive_nfo_t array, the printf will keep on printing characters until it reaches null character(which it will find in element storing character array "BSS"). After printing ALIGN, printf will access the first character of second element of the array of directive_nfo_t (character A of ASCII). It occurred because there was not space for null character in the first element of array of directive_nfo_t type and compiler wouldn't add characters beyond array size as it does array bound check. From the third element of you array, you have enough space for null character and hence, printf works as expected.
You will get UNDEFINED BEHAVIOR if you allocate less memory to store your character array and use those functions which assume null terminated character array. Always set the size of the character array to MAX + 1 when you want to store maximum MAX characters in your array.

C character array and its length

I am studying now C with "C Programming Absolute Beginner's Guide" (3rd Edition) and there was written that all character arrays should have a size equal to the string length + 1 (which is string-termination zero length). But this code:
#include <stdio.h>
main()
{
char name[4] = "Givi";
printf("%s\n",name);
return 0;
}
outputs Givi and not Giv. Array size is 4 and in that case it should output Giv, because 4 (string length) + 1 (string-termination zero character length) = 5, and the character array size is only 4.
Why does my code output Givi and not Giv?
I am using MinGW 4.9.2 SEH for compilation.
You are hitting what is considered to be undefined behavior. It's working now, but due to chance, not correctness.
In your case, it's because the memory in your program is probably all zeroed out at the beginning. So even though your string is not terminated properly, it just so happens that the memory right after it is zero, so printf knows when to stop.
+-----------------------+
|G|i|v|i|\0|\0|... |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Other languages, such as Java, have safeguards against this sort of situations. Languages like C, however, do less hand holding, which, on the one hand, allows more flexibility, but on the other, give you much, much more ways to shoot you in the foot with subtle issues such as this one. In other words, if your code compiles, that doesn't mean it's correct and it won't blow up now, in 5 minutes or in 5 years.
In real life, this is almost never the case, and your string might end up getting stored next to other things, which would always end up getting printed out together with your string. You never want this. Situations like this might lead to crashes, exploits and leaked confidential information.
See the following diagram for an example. Imagine you're working on a web server and the string "secret"--a user's password or key is stored right next to your harmless string:
+-----------------------+
|G|i|v|i|s|e|c|r|e|t |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Every time you would output what you would think is "Givi", you'd end up printing out the secret string, which is not what you want.
The byte after the last character always has to be 0, otherwise printf would not know when the string is terminanted and would try to access bytes (or chars) while they are not 0.
As Andrei said, apparently it just happened, that the compiler put at least one byte with the value 0 after your string data, so printf recognized the end of the string.
This can vary from compiler to compiler and thus is undefined behaviour.
There could, for instance, be a chance to have printf accessing an address, which your program is not allowed to. This would result in a crash.
In C text strings are stored as zero terminated arrays of characters. This means that the end of a text string is indicated by a special character, a numeric value of zero (0), to indicate the end of the string.
So the array of text characters to be used to store a C text string must include an array element for each of the characters as well as an additional array element for the end of string.
All of the C text string functions (strcpy(), strcmp(), strcat(), etc.) all expect that the end of a text string is indicated by a value of zero. This includes the printf() family of functions that print or output text to the screen or to a file. Since these functions depend on seeing a zero value to terminate the string, one source of errors when using C text strings is copying too many characters due to a missing zero terminator or copying a long text string into a smaller buffer. This type of error is known as a buffer overflow error.
The C compiler will perform some types of adjustments for you automatically. For instance:
char *pText = "four"; // pointer to a text string constant, compiler automatically adds zero to an additional array element for the constant "four"
char text[] = "four"; // compiler creates a array with 5 elements and puts the characters four in the first four array elements, a value of 0 in the fifth
char text[5] = "four"; // programmer creates array of 5 elements, compiler puts the characters four in the first four array elements, a value of 0 in the fifth
In the example you provided a good C compiler should issue at the minimum a warning and probably an error. However it looks like your compiler is truncating the string to the array size and is not adding the additional zero string terminator. And you are getting lucky in that there is a zero value after the end of the string. I suppose there is also the possibility that the C compiler is adding an additional array element anyway but that would seem unlikely.
What your book states is basically right, but there is missing the phrase "at least". The array can very well be larger.
You already stated the reason for the min length requirement. So what does that tell you about the example? It is crap!
What it exhibits is called undefined behaviour (UB) and might result in daemons flying out your nose for the printf() - not the initializer. It is just not covered by the C standard (well ,the standard actually says this is UB), so the compiler (and your libraries) are not expected to behave correctly.
For such cases, no terminator will be appended explicitly, so the string is not properly terminated when passed to `printf()".
Reason this does not produce an error is likely some legacy code which did exploit this to safe some bytes of memory. So, instead of reporting an error that the implicit trailing '\0' terminator does not fit, it simply does not append it. Silently truncating the string literal would also be a bad idea.
The following line:
char name[4] = "Givi";
May give warning like:
string for array of chars is too long
Because the behavior is Undefined, still compiler may pass it. But if you debug, you will see:
name[0] 'G'
name[1] 'i'
name[2] 'V'
name[3] '\0'
And so the output is
Giv
Not Give as you mentioned in the question!
I'm using GCC compiler.
But if you write something like this:
char name[4] = "Giv";
Compiles fine! And output is
Giv

Array fill in C

I have this problem with a lot of arrays in my program, and I can't understand why. I think I miss something on array theory.
"Someone" adds at the end of my arrays some sort of char characters such as ?^)(&%. For example if I have an array of lenght 5 with "hello", so it's full, sometimes it prints hello?()/&%%. I can undesrtand it can occur if it's of 10 elements and i use only 5, so maybe the other 5 elements get some random values, but if it's full, where the hell gets those strange values?
I partially solve it by manaully adding at the end the character '\0'.
For example this problem occurs, sometimes, when I try to fill an array from another array (i read a line form a test file with fgets, then I have to extract single words):
...
for(x=0;fgets(c,500,fileb);x++) { // read old local file
int l=strlen(c);
i=0;
for (k=0;k<(l-34);k++) {
if(c[k+33]!='\n') {
userDatabaseLocalPath[k]=c[k+33];
}
}
Thanks
Strings in C are terminated by a character with the value 0, often referred to as a character literal, i.e. '\0'.
A character array of size 5 can not hold the string hello, since the terminator doesn't fit. Functions expecting a terminator will be confused.
To declare an array holding a string, the best syntax to use is:
char greeting[] = "hello";
This way, you don't need to specify the length (count the characters), since the compiler does that for you. And you also don't need to include the terminator, it's added automatically so the above will create this, in memory:
+-+-+-+-+-+--+
greeting: |h|e|l|l|o|\0|
+-+-+-+-+-+--+
You say that you have problems "filling an array from another longer array", this sounds like an operation most referred to as string copying. Since strings are just arrays with terminators, you can't blindly copy a longer string over a shorter, unless you know that there is extra space.
Given the above, this code would invoke undefined behavior:
strcpy(greeting, "hi there!");
since the string being copied into the greeting array is longer than what the array has space for.
This is typically avoided by using "known to be large enough" buffers, or adding checks that manually keep track of the space used. There is a function called strncpy() which sort of does this, but I would not recommend using it since its exact semantics are fairly odd.
You are facing the issue of boundary limit for the array.. if the array is of size 5 , then its not necessary that the sixth location which will be \0 be safe.. As it is not memory reserved/assigned for your array.. If after sometime some other application accesses this memory and writes to it. you will lose the \0 resulting in the string helloI accessed this space being read. which is what you are getting.

Strings without a '\0' char?

If by mistake,I define a char array with no \0 as its last character, what happens then?
I'm asking this because I noticed that if I try to iterate through the array with while(cnt!='\0'), where cnt is an int variable used as an index to the array, and simultaneously print the cnt values to monitor what's happening the iteration stops at the last character +2.The extra characters are of course random but I can't get it why it has to stop after 2.Does the compiler automatically inserts a \0 character? Links to relevant documentation would be appreciated.
To make it clear I give an example. Let's say that the array str contains the word doh(with no '\0'). Printing the cnt variable at every loop would give me this:
doh+
or doh^
and so on.
EDIT (undefined behaviour)
Accessing array elements outside of the array boundaries is undefined behaviour.
Calling string functions with anything other than a C string is undefined behaviour.
Don't do it!
A C string is a sequence of bytes terminated by and including a '\0' (NUL terminator). All the bytes must belong to the same object.
Anyway, what you see is a coincidence!
But it might happen like this
,------------------ garbage
| ,---------------- str[cnt] (when cnt == 4, no bounds-checking)
memory ----> [...|d|o|h|*|0|0|0|4|...]
| | \_____/ -------- cnt (big-endian, properly 4-byte aligned)
\___/ ------------------ str
If you define a char array without the terminating \0 (called a "null terminator"), then your string, well, won't have that terminator. You would do that like so:
char strings[] = {'h', 'e', 'l', 'l', 'o'};
The compiler never automatically inserts a null terminator in this case. The fact that your code stops after "+2" is a coincidence; it could just as easily stopped at +50 or anywhere else, depending on whether there happened to be \0 character in the memory following your string.
If you define a string as:
char strings[] = "hello";
Then that will indeed be null-terminated. When you use quotation marks like that in C, then even though you can't physically see it in the text editor, there is a null terminator at the end of the string.
There are some C string-related functions that will automatically append a null-terminator. This isn't something the compiler does, but part of the function's specification itself. For example, strncat(), which concatenates one string to another, will add the null terminator at the end.
However, if one of the strings you use doesn't already have that terminator, then that function will not know where the string ends and you'll end up with garbage values (or a segmentation fault.)
In C language the term string refers to a zero-terminated array of characters. So, pedantically speaking there's no such thing as "strings without a '\0' char". If it is not zero-terminated, it is not a string.
Now, there's nothing wrong with having a mere array of characters without any zeros in it, as long as you understand that it is not a string. If you ever attempt to work with such character array as if it is a string, the behavior of your program is undefined. Anything can happen. It might appear to "work" for some magical reasons. Or it might crash all the time. It doesn't really matter what such a program will actually do, since if the behavior is undefined, the program is useless.
This would happen if, by coincidence, the byte at *(str + 5) is 0 (as a number, not ASCII)
As far as most string-handling functions are concerned, strings always stop at a '\0' character. If you miss this null-terminator somewhere, one of three things will usually happen:
Your program will continue reading past the end of the string until it finds a '\0' that just happened to be there. There are several ways for such a character to be there, but none of them is usually predictable beforehand: it could be part of another variable, part of the executable code or even part of a larger string that was previously stored in the same buffer. Of course by the time that happens, the program may have processed a significant amount of garbage. If you see lots of garbage produced by a printf(), an unterminated string is a common cause.
Your program will continue reading past the end of the string until it tries to read an address outside its address space, causing a memory error (e.g. the dreaded "Segmentation fault" in Linux systems).
Your program will run out of space when copying over the string and will, again, cause a memory error.
And, no, the C compiler will not normally do anything but what you specify in your program - for example it won't terminate a string on its own. This is what makes C so powerful and also so hard to code for.
I bet that an int is defined just after your string and that this int takes only small values such that at least one byte is 0.

Resources