Why does printf function ignore the latter \0? [duplicate] - c

I am stuck with some features that \0 has.
I know that \0 is the null character and that it is a terminator marking the end of a string.
int j;
j = printf("abcdef\0abcdefg\0");
printf("%d", j);
return 0;
When I tried to print "abcdef\0abcdefg\0", C only printed the string 'abcdef' and the count '6', instead of both 'abcdef' and 'abcdefg', which would add up to 13. Why does this happen?

"abcdef\0abcdefg\0", the string literal, is effectively a static, const (for all intents and purposes) char array and so it has an associated size that the compiler maintains:
#include <stdio.h>
#define S "abcdef\0abcdefg\0"
//^string literals implicitly add a(nother) hidden \0 at the end
int main()
{
printf("%zu\n", sizeof(S)); //prints 16
}
But arrays are treated specially in C and passing them as a parameter to a function or almost any operator converts them to a pointer to their first element.
Pointers do not have an associated size.
When you pass a char const* to a function (e.g., printf), the function receives just one number--the address of the first element.
The way printf and most string functions in C obtain the size is by counting characters until the first '\0'.
If you pass a pointer to the first element of a char array that has explicit embedded zeros in it, then for a function that counts until the first '\0', the string effectively ends at the first '\0'.

A string in C is a sequence of characters followed by a NUL character, which is '\0'. There is no separate "length" field.
When a string appears as a literal, such as "hello", what actually gets stored in memory is:
'h', 'e', 'l', 'l', 'o', '\0'
So you can see that if your string itself contains a '\0', as far as any of the C standard library functions are concerned, that's the end of the string.
Once printf sees the first '\0' in your string, it stops printing and returns, because that's the end of the format string. printf has no way of knowing that there's another string after the '\0'. Maybe there is--or maybe there's just random other program data in memory after that point. It can't tell the difference.
If you want to actually print the '\0' characters, then you need to have some other way to track the "real" length of the string and use a function that accepts that length as a parameter. Alternately you could add the '\0' characters during the formatting process by specifying %c in the format string and passing 0 as the character value.

Here
j = printf("abcdef\0abcdefg\0"); /* printf stops printing when it encounters the first \0, hence it prints abcdef */
printf() starts printing from the base address of the string literal "abcdef\0abcdefg\0", i.e. from a, until it encounters the first \0 char. So it prints abcdef.
---------------------------------------------------------------
| a | b | c | d | e | f | \0 | a | b | c | d | e | f | g | \0 |
---------------------------------------------------------------
0x100 0x101 ...          0x106
|                        |
printf starts printing   printf sees the first \0 here;
from 0x100 (assume this  it stops printing and returns.
is the base address of
the string literal)
And then printf() returns the number of characters printed, i.e. 6.
printf("%d", j); /* prints 6 */
From the manual page of printf
RETURN VALUE
Upon successful return, these functions return the number of characters printed (excluding the null byte used to end output to strings).

Related

Why the strlen() function doesn't return the correct length for a hex string?

I have a hex string for example \xF5\x17\x30\x91\x00\xA1\xC9\x00\xDF\xFF, when trying to use strlen() function to get the length of that hex string it returns 4!
const char string_[] = { "\xF5\x17\x30\x91\x00\xA1\xC9\x00\xDF\xFF" };
unsigned int string_length = strlen(string_);
printf("%d", string_length); // the result: 4
Is the strlen() function dealing with that hex as a string, or is something unclear to me?
For string functions in the C standard library, a character with value zero, also called a null character, marks the end of a string. Your string contains \x00, which designates a null character, so the string ends there. There are four non-null characters before it, so strlen returns four.
C 2018 7.1.1 1 says:
A string is a contiguous sequence of characters terminated by and including the first null character… The length of a string is the number of bytes preceding the null character…
C 2018 7.24.6.3 2 says:
The strlen function computes the length of the string pointed to by s [its first argument].
You could compute the size of your array as sizeof string_ (because it is an array of char) or sizeof string_ / sizeof *string_ (to compute the number of elements regardless of type), but this will include a terminating null character because defining an array with [] and letting the length be computed from a string literal initializer includes the terminating null character of the string literal. You may need to hard-code the length of the array, possibly using #define to define a preprocessor macro, and use that length in the array definition and in other places where the length is needed.
It is because you have zero at index [4]
string_[0] == 0xF5
string_[1] == 0x17
string_[2] == 0x30
string_[3] == 0x91
string_[4] == 0
...
"\xf5" puts char having integer value 0xf5 at position [0]
To see it as a string you need to escape the \ character
const char string_[] = "\\xF5\\x17\\x30\\x91\\x00\\xA1\\xC9\\x00\\xDF\\xFF";
At compile time, your "string" appears as consecutive hex values expressed in C syntax inside a pair of quotation marks.
strlen() is a run time function that scans through a series of bytes, looking for the first instance of a zero-value byte.
It's good to understand the difference between "compile time" and "run time".

What does char a[50][50] mean in C?

I'm working on a homework that has to do with strings.
Here's the code
int main(){
char a[50][50];
int n;
printf("Enter the value of n\n");
scanf("%d",&n);
printf("Enter %d names\n",n);
fflush(stdin);
for(int i=0; i<n; i++){
gets(a[i]);
}
I tried to change the char a[50][50] into char a[50] but the entire program didn't run, and I got this error message: "Invalid conversion from 'char' to 'char*'"
I don't really understand how this works.
char a[50][50] declares a to be an array of 50 arrays of 50 char.
Then a[0] is an array of 50 char, and so are a[1], a[2], a[3], and so on up to a[49]. There are 50 separate arrays, and each of them has 50 char.
Since a[0] is an array of 50 char, a[0][0] is a char. In general, a[i][j] is character j of array i.
gets(a[i]) says to read characters from input and put them into a[i]. For this to work, a[i] must be an array of char—gets reads multiple characters and puts them in the array. If a[i] were a single character, gets could not work.
Although gets(a[i]) says to put characters into a[i], it works by passing an address instead of passing the array. When an array is used in an expression other than as the operand of sizeof or the address operator &, C automatically converts it to a pointer to its first element. Since a[i] is an array, it is automatically converted to a pointer to its first element (a pointer to a[i][0]). gets receives this pointer and uses it to fill in characters that it reads from the standard input stream.
char a[50][50] declares a as a 50-element array of 50-element arrays of char. That means each a[i] is a 50-element array of char. It will be laid out in memory like:
+---+
a: | | a[0][0]
+---+
| | a[0][1]
+---+
| | a[0][2]
+---+
...
+---+
| | a[0][49]
+---+
| | a[1][0]
+---+
| | a[1][1]
+---+
...
+---+
| | a[1][49]
+---+
| | a[2][0]
+---+
...
This code is storing up to 50 strings, each up to 49 characters long, in a (IOW, each a[i] can store a 49-character string). In C, a string is a sequence of character values including a 0-valued terminator. For example, the string "hello", is represented as the sequence {'h', 'e', 'l', 'l', 'o', 0}. That trailing 0 marks the end of the string. String handling functions and output functions like puts and printf with the %s specifier need that 0 terminator in order to process the string correctly.
Strings are stored in arrays of character type, either char (for ASCII, UTF-8, or EBCDIC character sets) or wchar_t for "wide" strings (character sets that require more than 8 or so bits to encode). An N-character string requires an array that's at least N+1 elements wide to account for the 0 terminator.
Unless it is the operand of the sizeof or unary & operator, or is a string literal used to initialize an array of character type, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
When you call
gets( a[i] );
the expression a[i] is converted from type "50-element array of char" to "pointer to char", and the value of the expression is the address of the first element of the array (&a[i][0]) [1]. gets will read characters from standard input and store them to the array starting at that address. Note that gets is no longer part of the standard C library - it was removed in the 2011 version of the standard because it is unsafe. C does not require any sort of bounds checking on array accesses - if you type in more characters than the target buffer is sized to hold (in this case, 50), those extra characters will be written to memory immediately following the last element of the array, which can cause all sorts of mayhem. Buffer overflows are a popular malware exploit. You should replace the gets call with
fgets( a[i], 50, stdin );
which will read up to 49 characters into a[i] from standard input. Note that any excess characters are left in the input stream.
Also, the behavior of fflush is not defined for input streams [2] - there's no good, safe, portable way to clear excess input except to read it using getchar or fgetc.
[1] This is why you got the error message you did when you changed a from char [50][50] to char [50] - in that case, a[i] has type char, not char *, and the value of a[i] is not an address.
[2] Microsoft's Visual Studio C compiler is a notable exception - it will clear excess input from the input stream. However, that's specific to MSVC, and not portable across different compilers. The operation is also a little nonsensical with respect to "flush" semantics.
Basically, in C this declares an array of 50 elements (indices 0 to 49), each of which is itself an array of 50 char.
That program seems to store n names in the array a. It first asks for the number of names, and then the names. The function char *gets(char *str) stores each line in a separate entry of a.
a has 2 dimensions. The first refers to the number of names, and the second is for the length of each name. Something like a[number_of_names][length_of_name]
However, it will probably crash if the user provides an n > 50, or if a name contains more than 49 chars.
Also, gets() is dangerous. See this other post.
EDIT: Changing a to one dimension makes the program try to store a whole line inside a single char, hence the error.

Confused why function isn't printing address

Could anyone explain why running the following code prints only the newline character?
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv) {
int x = 12;
char *s = (char *) &x;
printf("%s\n", s);
return 0;
}
Since we're casting &x as a string, shouldn't what is printed be the string representation of the address of x (presumably some hexadecimal memory address)?
A string is a sequence of characters, terminated by the special character '\0'. When you print a string using the "%s" format, the printf function takes the address as a base address and prints characters from that base until it finds the terminator. If the "string" isn't actually a string, you have undefined behavior.
If you want to print an address you should use the "%p" format:
printf("Address of variable x is %p\n", (void *) &x);
Your code exhibits undefined behavior because you are trying to print an int's address using %s.
%s in printf family of function is used to print \0 terminated character array or c-type strings
From C11 specs, 7.21.6.1 The fprintf function
(8) %s: If no l length modifier is present, the argument shall be a pointer to
the initial element of an array of character type.280) Characters from
the array are written up to (but not including) the terminating null
character. If the precision is specified, no more than that many bytes
are written. If the precision is not specified or is greater than the
size of the array, the array shall contain a null character.
280) No special provisions are made for multibyte characters
And later
(9) If a conversion specification is invalid, the behavior is
undefined.282) If any argument is not the correct type for the
corresponding conversion specification, the behavior is undefined.
282) See ‘‘future library directions’’ (7.31.11).
One of the many possibilities that may happen is: (I am assuming a lot about the implementation here)
your int appears in memory as following 4 bytes
s (not guaranteed to hold the same address)
s s+1 s+2 s+3 s+4
+---+---+---+---+
| 0 | 0 | 0 | 12|
+---+---+---+---+
&x
or
s (not guaranteed to hold the same address)
s   s+1 s+2 s+3 s+4
+---+---+---+---+
| 12| 0 | 0 | 0 |
+---+---+---+---+
&x
Where 12 or form-feed or \f is a non-printable ascii character and may not print anything on the screen.
When you reinterpret it as char * and print, an empty string is printed followed by the newline. Although this is not guaranteed and anything may happen from crashing to printing indefinitely (or even worse).
Correct way to print an int is:
printf("%d\n", x);

Storing a format specifier in a pointer

What actually happens when I do this?
{
char * str = "%d\n";
str++;
str++;
printf(str-2,300);
return 0;
}
Intuitively, it appears that the number on the screen will be 300, but I want to know, what gets stored in str.
Edit: It will be great if someone can tell me, when do we actually do this?
Thanks!
str is a memory address, initially the address of the % sign of the string literal %d\n. This literal is created because it is in your code.
Two increments make str point to the character \n and when this is the case, str - 2 is the address of the % sign. So printf sees the format string %d\n, and as usual it prints the first argument after the format string as an integer. The fact is, printf does not care about the origins of the format string. It doesn't matter whether you create it on the fly or hard-code it.
We don't do this generally. Sometimes you need to fiddle with a character pointer to scan a string, extract something out of a string, or to skip some prefix of a string.
str is a pointer on the stack. It initially points to (ie, holds the address of) the start of the string literal "%d\n" (this is probably stored in a read-only section of your program by the compiler).
Let's say for example the string literal (the "%d\n") is stored at 0x5000. So (assuming UTF-8 or ASCII) the byte at 0x5000 is %, the byte at 0x5001 is d, 0x5002 is \n (the newline) and at 0x5003 it is \0 (the terminating null character).
str is initially holding the address 0x5000. str++ would increase it to 0x5001, meaning it now points to the string "d\n", ie one character into the string literal "%d\n". Likewise, str++ again moves it to 0x5002, ie the string "\n", two characters into the string literal "%d\n". Note that all of these are still terminated by the null character at 0x5003 (how C knows when the string ends).
The printf call has the format string as the first argument. str at this point holds 0x5002, so the call is saying 'Use the format string starting at 0x5002 - 2 = 0x5000', which turns out to be the same string that we started with.
Thus it will be the same as calling
printf("%d\n",300)
and will print out 300.
Well, you are declaring a char pointer. This pointer holds the address in memory where the following bytes are stored: % (1 byte), d (1 byte), \n (1 byte in memory on every platform; on Windows it is expanded to two bytes only when written to a text stream) and \0, the null terminating byte that ends your string.
Then you increment your pointer value (which is the address of the first byte) by two, then decrement by two. So basically you do nothing. Thus when calling printf(), str-2 will point to %d\n, and the null terminating byte makes the format string end exactly after %d\n.
So at the end of the day what you are doing is:
printf("%d\n", 300); Hence the 300 output.

Doesn't %[] or %[^] specifier in scanf(),sscanf() or fscanf() store the input in null-terminated character array?

Here's what Beej's C guide (LINK) tells about the %[] format specifier:
It allows you to specify a set of characters to be stored away (likely in an array of chars). Conversion stops when a character that is not in the set is matched.
I would appreciate if you can clarify some basic questions that arise from this premise:
1) Is the input fetched by those two format specifiers stored in the arguments (of type char*) as a character array or a character array with a \0 terminating character (string)? If not a string, how to make it store as a string, in cases like the program below where we want to fetch a sequence of characters as a string and stop when a particular character (in the negated character set) is encountered?
2) My program seems to suggest that processing stops for the %[^|] specifier when the negated character | is encountered. But when it starts again for the next format specifier, does it start from the negated character where it had stopped earlier? In my program I intend to ignore the | hence I used %*c. But I tested and found that if I use %c and an additional argument of type char, then the character | is indeed stored in that argument.
3) And lastly but crucially for me, what is the difference between passing a character array for a %s format specifier in printf() and a string (NULL terminated character array)? In my other program titled character array vs string, I've passed a character array (not NULL terminated) for a %s format specifier in printf() and it gets printed just as a string would. What is the difference?
//Program to illustrate %[^] specifier
#include<stdio.h>
int main()
{
char *ptr="fruit|apple|lemon",type[10],fruit1[10],fruit2[10];
sscanf(ptr, "%[^|]%*c%[^|]%*c%s", type,fruit1, fruit2);
printf("%s,%s,%s",type,fruit1,fruit2);
}
//character array vs string
#include<stdio.h>
int main()
{
char test[10]={'J','O','N'};
printf("%s",test);
}
Output JON
//Using %c instead of %*c
#include<stdio.h>
int main()
{
char *ptr="fruit|apple|lemon",type[10],fruit1[10],fruit2[10],char_var;
sscanf(ptr, "%[^|]%c%[^|]%*c%s", type,&char_var,fruit1, fruit2);
printf("%s,%s,%s,and the character is %c",type,fruit1,fruit2,char_var);
}
Output fruit,apple,lemon,and the character is |
It is null terminated. From sscanf():
The conversion specifiers s and [ always store the null terminator in addition to the matched characters. The size of the destination array must be at least one greater than the specified field width.
The excluded characters are unconsumed by the scan set and remain to be processed. An alternative format specifier:
if (sscanf(ptr, "%9[^|]|%9[^|]|%9s", type,fruit1, fruit2) == 3)
The array is actually null terminated as remaining elements will be zero initialized:
char test[10]={'J','O','N' /*,0,0,0,0,0,0,0*/ };
If it was not null terminated then it would keep printing until a null character was found somewhere in memory, possibly overrunning the end of the array causing undefined behaviour. It is possible to print a non-null terminated array:
char buf[] = { 'a', 'b', 'c' };
printf("%.*s", 3, buf);
1) Are the input fetched by those two format specifiers stored in the
arguments(of type char*) as a character array or a character array
with a \0 terminating character (string)? If not a string, how to make
it store as a string , in cases like the program below where we want
to fetch a sequence of characters as a string and stop when a
particular character (in the negated character set) is encountered?
They're stored in ASCIIZ format - with a NUL/'\0' terminator.
2) My program seems to suggest that processing stops for the %[^|]
specifier when the negated character | is encountered.But when it
starts again for the next format specifier,does it start from the
negated character where it had stopped earlier?In my program I intend
to ignore the | hence I used %*c.But I tested and found that if I use
%c and an additional argument of type char,then the character | is
indeed stored in that argument.
It shouldn't consume the next character. Show us your code or it didn't happen ;-P.
3) And lastly but crucially for me,what is the difference between
passing a character array for a %s format specifier in printf() and a
string(NULL terminated character array)?In my other program titled
character array vs string,I've passed a character array(not NULL
terminated) for a %s format specifier in printf() and it gets printed
just as a string would.What is the difference?
(edit: the following addresses the question above, which talks about array behaviours generally and is broader than the code snippet in the question that specifically posed the case char[10] = "abcd"; and is safe)
%s must be passed a pointer to an ASCIIZ text... even if that text is explicitly in a char array, it's the mandatory presence of the NUL terminator that defines the textual content and not the array length. You must NUL terminate your character array or you have undefined behaviour. You might get away with it sometimes: strncpy into the array will NUL terminate it if-and-only-if there's room to do so; static arrays start with all-0 content, so if you only overwrite before the final character you'll have a NUL; and your char[10] example happens to have the elements for which values aren't specified populated with NULs. But you should generally take responsibility for making sure that something guarantees NUL termination.
