Confusion about strings in C

Confusion about strings in C - c

Char arrays have continuously confused me in C.
Here is the following code:
char tcp_port[100], udp_port[6];
tcp_port[99] = '\0'; udp_port[5] = '\0';
fscanf(fp, " tcp_port=%s", tcp_port);
fscanf(fp, " udp_port=%s", udp_port);
printf("%s\n", tcp_port); printf("%s\n", udp_port);
This works and prints out the right number. However, since tcp_port has 100 elements, how do those just disappear when printing? The port is only 5 characters long and the last element is null terminated. Does printf just ignore those unintialized elements, and do those uninitialized elements contain random data?

Yes, printf() only prints the characters up to the first \0 character. All C string functions do this. They also automatically append that \0 character when necessary, like the scanf() function there. That's why it's called a "0-terminated string".
The other elements can contain anything and they will be completely ignored. In practice, they usually contain random junk, but it depends on a variety of factors.
Note that when you allocate memory you must keep that \0 character in mind. Your tcp_port string can only at most 99 characters, because the last one must be 0.

Related

Why is the last character of a string not captured?

What happens to the last (nth) character of a n-character string when I try to output the string?
I've included my code, sample input and output below that highlights that the last character I input is lost.
Code:
char buffer[10];
fgets(buffer, sizeof(buffer), stdin);
printf("%s", buffer);
return 0;
Input:
aaaaaaaaab (that's 9 a's followed by 1 b)
Output:
aaaaaaaaa (9 a's)

For an array of characters to be treated as a propper string, its last character must be a null terminator (or null byte) '\0'.
The fgets function, in particular always makes sure that this character is added to the char array, so for a size argument of 10 it stores the first 9 caracters in the array and a null byte in the last available space, if the input is larger than or equal to 9.
Be aware that the unread characters, like b in your sample case, will remain in the input buffer stdin, and can disrupt future input reads.
This null byte acts as a sentinel, and is used by functions like printf to know where the string ends, needless to say that this character is not printable.
If you pass a non null terminated array of characters to printf this will amount to undefined behavior.
Many other functions in the standard library (and others) rely on this to work properly so it's imperative that you make sure that all your strings are properly null terminated.

char array in a struct data type

I actually have a question regarding the concept of a char array, especially the one which is declared and initialized like below.
char aString[10] = "";
What i was taught was that this array can store up to 10 characters (index 0-9) and that at index 10 there is an automatically placed null terminating character (i know that accessing it would not be right) such that if we use string handling functions (printf, scanf, strcmp, etc.) they would know when the string stops.
However when I tried making a struct data type like below,
typedef struct customer{
char accountNum[10];
char name[100];
char idNum[15];
char address[200];
char dateOfBirth[10];
unsigned long long int balance;
char dateOpening[10];
}CUSTOMER;
inserted 10 characters into accountNum (any method, e.g. scanf), and printf it, what is printed out will be accountNum and values in the first word of name (i know that printf will stop at a space or a '\0'). This indicates that a char array does not have a terminating null at the end of the array.
Does this mean that if we have a char array of size 10 (char aString[10]), its maximum number of char it can store is 9 characters? or does things work differently in a struct? It would be nice if someone can help me the concept because it seems like i may have been working with undefined behaviour this whole time.

char aString[10] = "";
What i was taught was that this array can store up to 10 characters (index 0-9)
Yes.
and that at index 10 there is an automatically placed null terminating character
That is wrong. For one thing, index 10 would be out of bounds of the array. The compiler will certainly not initialize data outside of the memory it has reserved for the array.
What actually happens is that the compiler will copy the entire string literal including the null-terminator into the array, and if there are any remaining elements then they will be set to zeros. If the string literal is longer than the array can hold, the compile will simply fail.
In your example, the string literal has a length of 1 char (the null terminator), so the entire array ends up initialized with zeros.
i know that accessing it would not be right
There is no problem with accessing the null terminator, as long as it is inside the bounds of the array.
such that if we use string handling functions (printf, scanf, strcmp, etc.) they would know when the string stops.
Yes, they expect C-style strings and so will look for a null terminator - unless they are explicitly told the actual string length, ie by using a precision modifier for %s, or using strncmp(), etc.
However when I tried making a struct data type like below,
<snip>
inserted 10 characters into accountNum (any method, e.g. scanf), and printf it, what is printed out will be accountNum and values in the first word of name
That means you either forgot to null-terminate accountNum, or you likely overflowed it by writing too many characters into it. For instance, that is very easy to do when misusing scanf(), strcpy(), etc.
i know that printf will stop at a space or a '\0'
printf() does not stop on a space, only on a null terminator. Unless you tell it the max length explicitly, eg:
CUSTOMER c;
strncpy(c.accountNum, "1234567890", 10); // <-- will not be null terminated!
printf("%.10s", c.accountNum); // <-- stops after printing 10 chars!
If it has not encountered a null terminator by the time it reaches the 10th character, it will stop itself.
This indicates that a char array does not have a terminating null at the end of the array.
An array is just an array, there is no terminator, only a size. If you want to treat a character array as a C-style string, then you are responsible for making sure the array contains a nul character in it. But that is just semantics of the character data, the compiler will not do anything to ensure that behavior for you (except for in the one case of initializing a character array with a string literal).
Does this mean that if we have a char array of size 10 (char aString[10]), its maximum number of char it can store is 9 characters?
Its maximum storage will always be 10 chars, period. But if you want to treat the array as a C-style string, then one of those chars must be a nul.
or does things work differently in a struct?
No. Where an array is used does not matter. The compiler treats all array the same, regardless of context (except for the one special case of initializing a character array with a string literal).

What i was taught was that this array can store up to 10 characters (index 0-9) and that at index 10 there is an automatically placed null terminating character (i know that accessing it would not be right) such that if we use string handling functions (printf, scanf, strcmp, etc.) they would know when the string stops.
Yes, but accessing the null terminating character is absolutely safe.
inserted 10 characters into accountNum (any method, e.g. scanf), and printf it, what is printed out will be accountNum and values in the first word of name (i know that printf will stop at a space or a '\0'). This indicates that a char array does not have a terminating null at the end of the array.
printf does not stop for a space, only for a null terminating character. In this case, printf will print all characters until it sees '\0'.
Does this mean that if we have a char array of size 10 (char aString[10]), its maximum number of char it can store is 9 characters?
Yes.
or does things work differently in a struct?
There is no difference.

Printf prints more than the size of an array

So, I'm rewriting the tar extract command, and I stumbled upon a weird problem:
In short, I allocate a HEADER struct that contains multiple char arrays, let's say:
struct HEADER {
char foo[42];
char bar[12];
}
When I fprintf foo, I get a 3 character-long string, which is OK since the fourth character is a '\0'. But when I print bar, I have 25 characters that are printed.
How can I do to only get the 12 characters of bar?
EDIT The fact that the array isn't null terminated is 'normal' and cannot be changed, otherwise I wouldn't have so much trouble with it. What I want to do is parse the x first characters of my array, something like
char res[13];
magicScanf(res, 12, bar);
res[12] = '\0'
EDIT It turns out the string WAS null-terminated already. I thought it wasn't since it was the most logic possibility for my bug. As it's another question, I'll accept an answer that matched the problem described. If someone has an idea as to why sprintf could've printed 25 characters INCLUDING 2 \0, I would be glad.

You can print strings without NUL terminators by including a precision:
printf ("%.25s", s);
or, if your precision is unknown at compilation time:
printf ("%.*s", length, s);

The problem is that the size of arrays are lost when calling a function. Thus, the fprintf function does not know the size of the array and can only end at a \0.

No, unless you have supplied the precision, fprintf() has no magical way to know the size of the array supplied as argument to %s, it still relies on the terminating null.
Quoting C11, chapter §7.21.6.1, (emphasis mine)
s
If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type.280) Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written. If the
precision is not specified or is greater than the size of the array, the array shall
contain a null character.
So, in case your array is not null terminated, you must use a precision wo avoid out of bound access.

void printbar(struct HEADER *h) {
printf("%.12s", h->bar);
}
You can use it like this
struct HEADER data[100];
/* ... */
printbar(data + 42); /* print data[42].bar */
Note that if one of the 12 bytes of bar has a value of zero, not all of them get printed.
You might be better off printing them one by one
void printbar(struct HEADER *h) {
printf("%02x", h->bar[0]);
for (int i = 1; i < 12; i++) printf(" %02x", h->bar[i]);
}

What is the significance of '\0' in the character array? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I know that every string in C ends with '\0' character. It is very useful in cases when we need to know when the string ends. However, I am unable to comprehend its use in printing a string and printing a string without it. I have the following code:-
/* Printing out an array of characters */
#include<stdio.h>
#include<conio.h>
int main()
{
char a[7]={'h','e','l','l','o','!','\0'};
int i;
/* Loop where we do not care about the '\0' */
for(i=0;i<7;i++)
{
printf("%c",a[i]);
}
printf("\n");
/* Part which prints the entire character array as string */
printf("%s",a);
printf("\n");
/* Loop where we care about the '\0' */
for(i=0;i<7&&a[i]!='\0';i++)
{
printf("%c",a[i]);
}
}
The output is:-
hello!
hello!
hello!
I am unable to understand the difference. any explanations?

In this case:
for(i=0;i<7;i++)
{
printf("%c",a[i]);
}
You loop for a number of times (7) and then quit. That is the end condition of the loop. It terminates, no matter anything else.
In the other case, you also loop for 7 times and no more and you just added another condition, which really serves no function as you already keeping a count of things. If you did the following:
int index = 0;
while (a[index] != '\0') { printf("%c", a[index]); index++; }
now you would depend on the zero termination character being there, if it wasn't in the string, you while loop would go on forever until the program crashed or something terminated it forcedly. Probably printing garbage on your screen.

\0 is not part of data in character string. It is indicator of end of string. If length of string is not known, look for this indicator. With its help you can replace your cycle of:
for(i=0;i<7&&a[i]!='\0';i++) { ...
with:
for(int i=0; a[i]; ++i) { ...
So, for-loops and printf are displaying the same string. The only difference how you print it.

'\0' does not correspond to a displayable character; that's why the first and last versions appear to be the same.
The second version is the same because under the hood, printf is just iterating until it hits the '\0'.

The purpose of the terminating zero character is to terminate the string, i.e. to indirectly encode the string length information in the string itself. If you somehow already know the length of your string, you can write code that works correctly without relying on that terminating zero character. That's basically all.
Now, in your code sample the first cycle does something that does not make much sense. It prints 7 characters from a string that actually has length 6. I.e. it attempts to print the terminating zero as well.

When you want to print a string from first character until end. Knowing the length of that string is not necessary when the string ends with \0 (Print characters until \0). So you don't need any extra variable to store the length of string.
In fact a string can have many various representations but minimizing the consumed memory (which it was important to C designers) leads designers to define zero-terminated strings.
Each string representation has its trade off between speed, memory and flexibility. For example you can have your string definition same as Pascal string which stores length of the string at first element of array but it causes that string to have limited length, but retrieving the length of string is faster that zero-terminated strings (Counting each character until \0).

I am unable to comprehend its use in printing a string and printing a string without it
Normally you don't print a string character by character like that. You print the whole string. In such cases, your C library will print until it finds a zero.

When printing a string of variable length, there has to be some 'signal' to indicate that you have reached the end. Generally, this is the '\0' character. Most C standard calls, like strcpy, strcat, printf, etc. depend on the string being zero-terminated, thus ending in a '\0' character. This corresponds to your second example.
The first example is printing a string of fixed length, which is simply a far less common occurence.
The third example combines both, it looks for a zero-terminator ('\0' character) ór 7 characters maximum. This corresponds to calls like strncpy, for example.

The purpose of the terminating zero character is to terminate the string, i.e. to indirectly encode the string length information in the string itself. If you somehow already know the length of your string, you can write code that works correctly without relying on that terminating zero character. That's basically all.
Now, in your code sample the first cycle does something that does not make much sense. It prints 7 characters from a string that actually has length 6. I.e. it attempts to print the terminating zero as well. Why it is doing that - I don't know. In other words, the first output generated by your code is formally different from the rest, since it includes the effect of printing a zero character right after the ! sign. On your platform that effect just happened to be "invisible" on the screen, which is why you probably assumed that the first output is the same as the other ones. However, if you redirect the output to a file, you will be able to see that it is actually quite different.
The other output methods in your code simply output the string up to (and not including) the terminating zero character. The last cycle has redundant condition checking, since you know that the cycle will stop at zero character, before i will have a chance to hit 7.
Other than that, I don't know what "difference" you might be asking about. Please, clarify your question, if this doesn't answer it.

In your loop, you actually print the nul character. Generally this has no effect since it is a non-printing, non-control character. However printf("%s",a); will not output the nul at all - it uses it as a sentinel value. So you loop is not equivalent to %s formatted output.
If you try say:
char a[] = "123456" ;
char b[]={'h','e','l','l','o','!' } ; // No terminator
char c[] = "ABCDEF" ;
printf( "%s", a ) ;
printf( "%s", b ) ;
printf( "%s", c ) ;
You might clearly see why the nul terminator is essential. In my case it output:
123456
hello!╠╠╠╠╠╠╠╠╠╠123456
ABCDEF
Your mileage may vary - the result is undefined behaviour, but in this case the output is running through to the adjacent string, but the compiler has inserted some unused space between them with "junk" in it. I packed a string either side of the un-terminated string because there is no way of telling how a particular compiler orders data in memory. Incidentally when I declared the strings static if the strings, the string b was output with no "run-on". Sometimes the surrounding "junk" may happen to already be zero.

Null termination of char array

Consider following case:
#include<stdio.h>
int main()
{
char A[5];
scanf("%s",A);
printf("%s",A);
}
My question is if char A[5] contains only two characters. Say "ab", then A[0]='a', A[1]='b' and A[2]='\0'.
But if the input is say, "abcde" then where is '\0' in that case. Will A[5] contain '\0'?
If yes, why?
sizeof(A) will always return 5 as answer. Then when the array is full, is there an extra byte reserved for '\0' which sizeof() doesn't count?

If you type more than four characters then the extra characters and the null terminator will be written outside the end of the array, overwriting memory not belonging to the array. This is a buffer overflow.
C does not prevent you from clobbering memory you don't own. This results in undefined behavior. Your program could do anything—it could crash, it could silently trash other variables and cause confusing behavior, it could be harmless, or anything else. Notice that there's no guarantee that your program will either work reliably or crash reliably. You can't even depend on it crashing immediately.
This is a great example of why scanf("%s") is dangerous and should never be used. It doesn't know about the size of your array which means there is no way to use it safely. Instead, avoid scanf and use something safer, like fgets():
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
Example:
if (fgets(A, sizeof A, stdin) == NULL) {
/* error reading input */
}
Annoyingly, fgets() will leave a trailing newline character ('\n') at the end of the array. So you may also want code to remove it.
size_t length = strlen(A);
if (A[length - 1] == '\n') {
A[length - 1] = '\0';
}
Ugh. A simple (but broken) scanf("%s") has turned into a 7 line monstrosity. And that's the second lesson of the day: C is not good at I/O and string handling. It can be done, and it can be done safely, but C will kick and scream the whole time.

As already pointed out - you have to define/allocate an array of length N + 1 in order to store N chars correctly. It is possible to limit the amount of characters read by scanf. In your example it would be:
scanf("%4s", A);
in order to read max. 4 chars from stdin.

character arrays in c are merely pointers to blocks of memory. If you tell the compiler to reserve 5 bytes for characters, it does. If you try to put more then 5 bytes in there, it will just overwrite the memory past the 5 bytes you reserved.
That is why c can have serious security implementations. You have to know that you are only going to write 4 characters + a \0. C will let you overwrite memory until the program crashes.
Please don't think of char foo[5] as a string. Think of it as a spot to put 5 bytes. You can store 5 characters in there without a null, but you have to remember you need to do a memcpy(otherCharArray, foo, 5) and not use strcpy. You also have to know that the otherCharArray has enough space for those 5 bytes.

You'll end up with undefined behaviour.
As you say, the size of A will always be 5, so if you read 5 or more chars, scanf will try to write to a memory, that it's not supposed to modify.
And no, there's no reserved space/char for the \0 symbol.

Any string greater than 4 characters in length will cause scanf to write beyond the bounds of the array. The resulting behavior is undefined and, if you're lucky, will cause your program to crash.
If you're wondering why scanf doesn't stop writing strings that are too long to be stored in the array A, it's because there's no way for scanf to know sizeof(A) is 5. When you pass an array as the parameter to a C function, the array decays to a pointer pointing to the first element in the array. So, there's no way to query the size of the array within the function.
In order to limit the number of characters read into the array use
scanf("%4s", A);

There isn't a character that is reserved, so you must be careful not to fill the entire array to the point it can't be null terminated. Char functions rely on the null terminator, and you will get disastrous results from them if you find yourself in the situation you describe.
Much C code that you'll see will use the 'n' derivatives of functions such as strncpy. From that man page you can read:
The strcpy() and strncpy() functions return s1. The stpcpy() and
stpncpy() functions return a
pointer to the terminating `\0' character of s1. If stpncpy() does not terminate s1 with a NUL
character, it instead returns a pointer to s1[n] (which does not necessarily refer to a valid mem-
ory location.)
strlen also relies on the null character to determine the length of a character buffer. If and when you're missing that character, you will get incorrect results.

the null character is used for the termination of array. it is at the end of the array and shows that the array is end at that point. the array automatically make last character as null character so that the compiler can easily understand that the array is ended.

\0 is an terminator operator which terminates itself when array is full
if array is not full then \0 will be at the end of the array
when you enter a string it will read from the end of the array

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight