C programming: Array prints two characters on its own

I'm trying to print a 2 character string. This is the part of that code.
char arraytwo[3];
// 2 characters
for (i = 'a'; i <= 'z'; i++)
{
    arraytwo[0] = i;
    for (j = 'a'; j <= 'z'; j++)
    {
        arraytwo[1] = j;
        printf("%s\n", arraytwo);
    }
}
The output I am getting is this. For some reason it keeps adding "AZ" at the end of each iteration. What am I missing?
aaAZ
abAZ
acAZ
adAZ
aeAZ
afAZ
agAZ
ahAZ
aiAZ
ajAZ
akAZ

What you're missing is the definition of a string: it has to be null-terminated.
Quoting C11, chapter §7.1.1, (emphasis mine)
A string is a contiguous sequence of characters terminated by and including the first null
character. [....]
In your case, arraytwo
has automatic storage duration and is not initialized explicitly;
is never null-terminated manually.
So, technically, arraytwo does not hold a string.
When it is passed as the argument for the %s conversion specifier, printf reads past the end of the array in search of a null terminator, which causes undefined behavior.
Also quoting chapter §7.21.6.1
s
If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type. Characters from the array are
written up to (but not including) the terminating null character. [....]
Solution:
Either initialize all array elements to 0, for example char arraytwo[3] = {0};
or null-terminate the array manually, for example arraytwo[2] = '\0';
before using the array as a string.

You were on the right track in giving the array a length of 3, but you didn't initialize it to all 0s. So at the beginning of your snippet the array contains indeterminate garbage, and whatever follows the array in memory is equally unpredictable.
When you pass a pointer to printf for a %s conversion, printf outputs the memory starting at the pointed-to character and advances the pointer until it hits a '\0'.
In your case this happened after 2 stray characters, but it does not have to. And if the bell character ('\a') happens to be among the bytes printf walks over, your computer might even beep.

Related

How does an array terminate?

As we know, a string terminates with '\0'.
That lets the code know where the string ends and protects against reading garbage values.
But how does an array terminate?
If '\0' were used, it would be taken as 0, which is a valid integer.
So how does the compiler know that the array has ended?

C does not perform bounds checking on arrays. That's part of what makes it fast. However that also means it's up to you to ensure you don't read or write past the end of an array. So the language will allow you to do something like this:
int arr[5];
arr[10] = 4;
But if you do, you invoke undefined behavior. So you need to keep track of how large an array is yourself and ensure you don't go past the end.
Note that this also applies to character arrays. A character array can be treated as a string if it contains a sequence of characters terminated by a null byte. So this is a string:
char str[10] = "hello";
And so is this:
char str[5] = { 'h', 'i', 0, 0, 0 };
But this is not:
char str[5] = "hello"; // no space for the null terminator.

C doesn't provide any protections or guarantees to you about 'knowing the array is ended.' That's on you as the programmer to keep in mind in order to avoid accessing memory outside your array.

The C language does not have a native string type. In C, strings are actually one-dimensional arrays of characters terminated by a null character '\0'.
From C Standard#7.1.1p1 [emphasis mine]
A string is a contiguous sequence of characters terminated by and including the first null character. The term multibyte string is sometimes used instead to emphasize special processing given to multibyte characters contained in the string or to avoid confusion with a wide string. A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.
A string is a special case of a character array: one that is terminated by a null character '\0'. All the standard library string functions read their input string based on this rule, i.e. they read until the first null character.
The null character '\0' has no special significance in arrays of any other type in C.
So, apart from strings, for all other kinds of array the programmer is supposed to keep track of the number of elements explicitly.
Also note that the first null character ('\0') marks the end of the string, but it does not stop you from reading beyond it.
Consider this example:
#include <stdio.h>
int main(void) {
    char str[5] = {'H', 'i', '\0', 'z'};
    printf ("%s\n", str);
    printf ("%c\n", str[3]);
    return 0;
}
When you print the string
printf ("%s\n", str);
the output you will get is - Hi
because with %s format specifier, printf() writes every byte up to and not including the first null terminator [note the use of null character in the strings], but you can also print the 4th character of array as it is within the range of char array str though beyond first '\0' character
printf ("%c\n", str[3]);
the output you will get is - z
Additional:
Trying to access an array beyond its size leads to undefined behavior: the program may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

It’s just a matter of convention. If you wanted to, you could totally write code that handled array termination (for arrays of any type) via some sentinel value. Here’s an example that does just that, arbitrarily using -1 as the sentinel:
int length(int arr[]) {
    int i;
    for (i = 0; arr[i] != -1; i++) {}
    return i;
}
However, this is obviously utterly impractical: you couldn’t use -1 as a value in the array any longer.
By contrast, for C strings the sentinel value '\0' is less problematic because it’s expected that normal text won’t contain this character. This assumption is kind of valid. But even so there are obviously many strings which do contain '\0' as a valid character, and null-termination is therefore by no means universal.
One very common alternative is to store strings in a struct that looks something like this:
struct string {
    unsigned int length;
    char *buffer;
};
That is, we explicitly store a length alongside a buffer. This buffer isn’t null-terminated (although in practice it often has an additional terminal '\0' byte for compatibility with C functions).
Anyway, the answer boils down to: For C strings, null termination is a convenient convention. But it is only a convention, enforced by the C string functions (and by the C string literal syntax). You could use a similar convention for other array types but it would be prohibitively impractical. This is why other conventions developed for arrays. Notably, most functions that deal with arrays expect both an array and a length parameter. This length parameter determines where the array terminates.

Printf prints more than the size of an array

So, I'm rewriting the tar extract command, and I stumbled upon a weird problem:
In short, I allocate a HEADER struct that contains multiple char arrays, let's say:
struct HEADER {
    char foo[42];
    char bar[12];
};
When I fprintf foo, I get a 3 character-long string, which is OK since the fourth character is a '\0'. But when I print bar, I have 25 characters that are printed.
How can I get only the 12 characters of bar?
EDIT The fact that the array isn't null terminated is 'normal' and cannot be changed, otherwise I wouldn't have so much trouble with it. What I want to do is parse the x first characters of my array, something like
char res[13];
magicScanf(res, 12, bar);
res[12] = '\0';
EDIT It turns out the string WAS null-terminated already. I thought it wasn't, since that was the most logical explanation for my bug. As that's another question, I'll accept an answer that matches the problem described. If someone has an idea as to why sprintf could've printed 25 characters INCLUDING 2 \0, I would be glad to hear it.

You can print strings without NUL terminators by including a precision:
printf ("%.25s", s);
or, if your precision is unknown at compilation time:
printf ("%.*s", length, s);

The problem is that the size of an array is lost when it is passed to a function: the function receives only a pointer to the first element. Thus, fprintf does not know the size of the array and can only stop at a \0.

No, unless you have supplied the precision, fprintf() has no magical way to know the size of the array supplied as argument to %s, it still relies on the terminating null.
Quoting C11, chapter §7.21.6.1, (emphasis mine)
s
If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type. Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written. If the
precision is not specified or is greater than the size of the array, the array shall
contain a null character.
So, if your array is not null-terminated, you must use a precision to avoid out-of-bounds access.

void printbar(struct HEADER *h) {
printf("%.12s", h->bar);
}
You can use it like this
struct HEADER data[100];
/* ... */
printbar(data + 42); /* print data[42].bar */
Note that if one of the 12 bytes of bar has a value of zero, not all of them get printed.
You might be better off printing them one by one
void printbar(struct HEADER *h) {
    /* cast to unsigned char so bytes >= 0x80 are not sign-extended */
    printf("%02x", (unsigned char)h->bar[0]);
    for (int i = 1; i < 12; i++) printf(" %02x", (unsigned char)h->bar[i]);
}

Simple single char array encryption needs an artificially long array to work?

I'm running a simple encryption on a single char array. It doesn't seem to work when the array size is less than or equal to 1, even though only a single char is changing.
The code below works because the size of yesCrypto is 10 (or anything greater than 1).
char noCrypto[] = "H"; //sets an array to hold unencrypted H
char yesCrypto[10]; //sets array to hold encrypted H
yesCrypto[0]=noCrypto[0]+1;
//takes 'H' from noCrypto and turns it into an 'I' and moves it into yesCrypto.
printf("Encrypted string is '%s'\n", yesCrypto);
//prints Encrypted version of 'H', 'I'
The code below does not work when the size of yesCrypto is 0; it also does not work when the size is 1.
char noCrypto[] = "H"; //sets an array to hold unencrypted H
char yesCrypto[1]; //sets array to hold encrypted H
yesCrypto[0]=noCrypto[0]+1;
//takes 'H' from noCrypto and turns it into an 'I' and moves it into yesCrypto.
printf("Encrypted string is '%s'\n", yesCrypto);
//prints 'IH'
Side question: why is it printing IH when it is not working properly?

Code is attempting to print a character array that is not a string using "%s".
yesCrypto[] is not guaranteed to be null-terminated.
char yesCrypto[10];
yesCrypto[0] = noCrypto[0]+1;
printf("Encrypted string is '%s'\n", yesCrypto); // bad
Instead, limit printing or append a null character.
// 1 is the maximum number of characters to print
printf("Encrypted string is '%.*s'\n", 1, yesCrypto);
// or
yesCrypto[1] = '\0';
printf("Encrypted string is '%s'\n", yesCrypto);
OP's 2nd code is just bad as object arrays of length 0 lack defined behavior.
// bad
char yesCrypto[0];
OP's edited post uses char yesCrypto[1];. In that case use
yesCrypto[0] = noCrypto[0]+1;
printf("Encrypted string is '%.*s'\n", 1, yesCrypto);
// or
printf("Encrypted character is '%c'\n", yesCrypto[0]);
Fundamentally, printing encrypted data as a string is a problem as the encrypted character array may contain a null character in numerous places and a string requires a null character and ends with the first one.

In the first case, you're supplying an array (as an argument to %s) which is not null-terminated.
Quoting C11, chapter §7.21.6.1,
s
If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type. Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written. If the
precision is not specified or is greater than the size of the array, the array shall
contain a null character.
In this case, yesCrypto being an automatic local array and left uninitialized, the contents are indeterminate, so there's no guarantee of a null being present in the array. So the usage causes undefined behavior.
What you're seeing in the second case is undefined behavior, too.
Quoting C11, chapter §6.7.6.2
In addition to optional type qualifiers and the keyword static, the [ and ] may delimit
an expression or *. If they delimit an expression (which specifies the size of an array), the
expression shall have an integer type. If the expression is a constant expression, it shall
have a value greater than zero. [...]
So the latter code (containing char yesCrypto[0];) violates a constraint, and its behavior is undefined.
A note on why this might not produce a compilation error:
gcc does have an extension which supports zero-length arrays, but the use case is very specific and, since C99, the "flexible array member" is the standardized alternative to this extension.
Finally, for
...also does not work when set to 1....
an array of size 1 lacks the space for a null terminator, raising the same issue as in the very first case. To put it in simple words, to make a char array behave like a string containing n characters, you need
the size of the array to be at least n+1
index n to contain a null character ('\0').

K&R - section 1.9: understanding character arrays (and incidentally buffers)

Let's start with a very basic question about character arrays that I could not understand from the description in the book:
Does every character array end with '\0'?
Is the length of it always equal to the number of characters + 1 for '\0'?
meaning that if I specify a character array length of 10 I would be able to store only 9 characters that are not '\0'?
or does the '\0' come after the last array slot, so all 10 slots could be used for any character and an 11th non-reachable slot would contain the '\0' char?
Going further into the example in this section, it defines a getline() function that reads a string and counts the number of characters in it.
you can see the entire code here (in this example getline() was changed to gline(), since getline() is already defined in newer stdio.h libraries)
Here's the function:
int getline(char s[], int lim) {
    int c, i;
    for (i = 0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i) {
        s[i] = c;
    }
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}
It is explained that the array stores the input in this manner:
[h][e][l][l][o][\n][\0]
and the function will return a count of 6, including the '\n' char,
but this is only true if the loop exits because of a '\n' char.
If the loop exits because it has reached its limit, it will return an array like this (as I understand it):
[s][n][a][z][z][y][\0]
now the count will also be 6.
Comparing both strings will return that they're equal when clearly "snazzy" is a longer word than "hello",
and so this code has a bug (by my personal requirements, as I would like to not count '\n' as part of the string).
Trying to fix this I tried (among many other things) to remove adding the '\n' char to the array and not incrementing the counter,
and I found out incidentally that when entering more characters than the array could store, the extra characters wait in the input buffer,
and would later be passed to the getline() function, so if I would enter:
"snazzy lolz\n"
it would use it up like this:
first getline() call: [s][n][a][z][z][y][\0]
second getline() call: [ ][l][o][l][z][\n][\0]
This change also introduced an interesting bug, if I try to enter a string that is exactly 7 characters long (including '\n') the program would quit straight away because it would pass a '\0' char to the next getline() call which would return 0 and would exit the while loop that calls getline() in main().
I am now confused as to what to do next.
How can I make it not count the '\n' char but also avoid the bug it created?
Many thanks

There is a convention in C that strings end with a null character. On that convention, all your questions are based. So
Does every character array end with '\0'?
No. It ends with \0 only because the programmer put it there.
Is the length of it always equal to the number of characters + 1 for '\0'?
Yes, but only because of this convention. That is why, for example, you allocate one more byte (char) than the length of the string: to accommodate this \0.
Strings are stored in character arrays such as char s[32]; or char *s = malloc(strlen(name) + 1);

Does every character array end with '\0'?
No; strings are a special case: they are character arrays with a nul (\0) terminator. This is more a convention than a feature of the language, although it is part of the language insofar as string literals have a nul terminator. Moreover, in a character string the nul appears at the end of the string, not the end of the array; the array holding the string may be longer than the string it holds.
So the nul merely indicates the end of a string in a character array. If the character array represents data other than a string, then it may contain zero elements anywhere.
Is the length of it always equal to the number of characters + 1 for '\0'?
Again you are conflating strings with character arrays. They are not the same. A string happens to use a character array as a container. A string requires an array that is at least the length of the string plus one.
meaning that if I specify a character array length of 10 I would be
able to store only 9 characters that are not '\0'?
You will be able to store 10 characters of any value. If however you choose to interpret the array as a string, the string comprises only those characters up-to and including the first nul character.
or does the '\0' come after the last array slot, so all 10 slots could
be used for any character and an 11th non-reachable slot would contain
the '\0' char?
The nul is at the end of the string, not the end of the array, and certainly not after the end of the array.
Comparing both strings will return that they're equal when clearly
"snazzy" is a longer word than "hello",
In what world are those strings equal? They have equal length, not equal content.
and so this code has a bug (by my personal requirements, as I would
like to not count '\n' as part of the string).
Someone else's code not doing what you require is hardly a bug; that implementation is by design and is identical to the behaviour of the standard library fgets() function. If you require different behaviour, then you are of course free to implement to your needs; just omit the part:
if (c == '\n') {
s[i] = c;
++i;
}
To explicitly discard any remaining characters of the line, the removed code above may be replaced with the following (the EOF check keeps the loop from spinning forever at end of input):
while (c != '\n' && c != EOF) {
    c = getchar();
}
One reason why you might not do that is that the data may be coming from a file redirected to stdin.
One reason for retaining the '\n' is that it enables detection of incomplete input, which may be useful in some cases. For example, you may want all the data in the line, regardless of length and despite a necessarily finite buffer length. A string returned without a newline would then indicate that there is more data to be read, and you could write code to handle that situation.

What is the purpose of array[index] = 0;?

char dev[20] = "dev_name";
char dest[32];
strncpy(dest,dev,sizeof(dev));
dest[sizeof(dev)-1] = 0;
What does dest[sizeof(dev)-1] = 0; mean?

In your code, assuming size is analogous to sizeof,
dev_name[size(dec_name)-1] = 0;
dev_name[size(dec_name)-1] refers to the last element of the array; remember that C arrays use 0-based indexing.
By definition, a string in C is essentially a char array with null termination, so if you want to use a char array as a string, it must be null-terminated.
0 is the ASCII value of NUL (the null character). So, essentially, you're putting a null terminator into the char array.

Does it mean all the element of that array are assigned zero?
No, it does not mean this.
Assuming you meant strncpy(dest,dev_name,sizeof(dev_name)); /* Extra bracket */ and dev_name and sizeof in your last line, you are assigning the NUL character to the last element to mark the end of the name array.
When you write a string literal like "foo", it is automatically NUL-terminated by the compiler. When you fill your own arrays, you sometimes need to mark the end of the string manually.
From man strncpy
The strncpy() function is similar, except that at most n bytes of src
are copied. Warning: If there is no null byte among the first n bytes
of src, the string placed in dest will not be null-terminated.
The explicit null termination is added to handle the warning case in your code snippet.
dev_name[0], dev_name[1], dev_name[2] etc. are the first, second, third ... characters of your string. Assuming the device name has fewer than 31 characters, it is automatically NUL-terminated after strncpy and you don't need to do anything.
If the name has exactly 31 characters, the last character (the 32nd) is already '\0' (ASCII code 0), and writing 0 over it again does not make any difference.
If the name has more than 31 characters (the corner case), the last character is not NUL, and dev_name[sizeof(dev_name)-1] = 0; will make the name NUL-terminated.