What is the purpose of the s==NULL case for mbrtowc? - c

mbrtowc is specified to handle a NULL pointer for the s (multibyte character pointer) argument as follows:
If s is a null pointer, the mbrtowc() function shall be equivalent to the call:
mbrtowc(NULL, "", 1, ps)
In this case, the values of the arguments pwc and n are ignored.
As far as I can tell, this usage is largely useless. If ps is not storing any partially-converted character, the call will simply return 0 with no side effects. If ps is storing a partially-converted character, then since '\0' is not valid as the next byte in a multibyte sequence ('\0' can only be a string terminator), the call will return (size_t)-1 with errno==EILSEQ and leave ps in an undefined state.
The intended usage seems to have been to reset the state variable, particularly when NULL is passed for ps and the internal state has been used, analogous to mbtowc's behavior with stateful encodings, but this is not specified anywhere as far as I can tell, and it conflicts with the semantics for mbrtowc's storage of partially-converted characters (if mbrtowc were to reset state when encountering a 0 byte after a potentially-valid initial subsequence, it would be unable to detect this dangerous invalid sequence).
If mbrtowc were specified to reset the state variable only when s is NULL, but not when it points to a 0 byte, a desirable state-reset behavior would be possible, but such behavior would violate the standard as written. Is this a defect in the standard? As far as I can tell, there is absolutely no way to reset the internal state (used when ps is NULL) once an illegal sequence has been encountered, and thus no correct program can use mbrtowc with ps==NULL.

Since a '\0' byte must convert to a null wide character regardless of shift state (5.2.1.2 Multibyte characters), and the mbrtowc() function is specified to reset the shift state when it converts to a wide null character (7.24.6.3.2/3 The mbrtowc function), calling mbrtowc( NULL, "", 1, ps) will reset the shift state stored in the mbstate_t pointed to by ps. And if mbrtowc( NULL, "", 1, NULL) is called to use the library's internal mbstate_t object, it will be reset to an initial state. See the end of the answer for cites of the relevant bits of the standard.
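To make the claim above concrete, here is a minimal sketch of the s == NULL call on a caller-owned state object. The helper name null_call_leaves_initial_state is made up for illustration, and the sketch only exercises the easy case where the state is already initial (it assumes the default "C" locale); it does not show what happens after a partially-converted character.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: performs the s == NULL call on a caller-owned state
 * and reports whether the state afterwards is an initial conversion state. */
int null_call_leaves_initial_state(void)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);              /* zero-valued means initial state */

    /* With s == NULL this is specified to behave like
     * mbrtowc(NULL, "", 1, &st); the pwc and n arguments are ignored. */
    size_t r = mbrtowc(NULL, NULL, 0, &st);

    return r == 0 && mbsinit(&st) != 0;
}
```

mbsinit() is the standard way to ask whether an mbstate_t describes an initial conversion state, so it is a reasonable check to run after any attempted reset.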
I'm by no means particularly experienced with the C standard multibyte conversion functions (my experience with this kind of thing has been using the Win32 APIs for conversion).
If mbrtowc() processes an 'incomplete char' that's cut short by a 0 byte, it should return (size_t)(-1) to indicate an invalid multibyte char (and thus detect the dangerous situation you describe). In that case the conversion/shift state is unspecified (and I think you're basically hosed for that string). The multibyte 'sequence' that a conversion was attempted on but contains a '\0' is invalid and never will be valid with subsequent data. If the '\0' wasn't intended to be part of the converted sequence, then it shouldn't have been included in the count of bytes available for processing.
If you're in a situation where you might get additional, subsequent bytes for a partial multibyte char (say from a network stream), the n you passed for the partial multibyte char shouldn't include a 0 byte, so you'll get a (size_t)(-2) returned. In this case, if you pass a '\0' while in the middle of the partial conversion, you'll lose the fact that there's an error and as a side-effect reset the mbstate_t state in use (whether it's your own or the internal one being used because you passed in a NULL pointer for ps). I think I'm essentially restating your question here.
However I think it is possible to detect and handle this situation, but unfortunately it requires keeping track of some state yourself:
#include <stdio.h>
#include <wchar.h>

#define MB_ERROR ((size_t)(-1))
#define MB_PARTIAL ((size_t)(-2))

// function to get a stream of multibyte characters from somewhere;
// returns the next byte as an unsigned char value, or EOF
int get_next(void);

int bar(void)
{
    int ch; // must be an int so that EOF stays distinct from every byte value
    wchar_t wc;
    mbstate_t state = {0};
    int in_partial_convert = 0;

    while ((ch = get_next()) != EOF)
    {
        char c = (char)ch;
        size_t result = mbrtowc( &wc, &c, 1, &state);

        switch (result) {
        case MB_ERROR:
            // this multibyte char is invalid
            return -1;
        case MB_PARTIAL:
            // do nothing yet, we need more data
            // but remember that we're in this state
            in_partial_convert = 1;
            break;
        case 1:
            // output the completed wide char
            in_partial_convert = 0; // no longer in the middle of a conversion
            putwchar(wc);
            break;
        case 0:
            if (in_partial_convert) {
                // this 'last' multibyte char was malformed
                // return an error condition
                return -1;
            }
            // end of the multibyte string
            // we'll handle similar to EOF
            return 0;
        }
    }
    // input that ends in the middle of a multibyte char is also an error
    return in_partial_convert ? -1 : 0;
}
Maybe not an ideal situation, but I think it shows it's not completely broken so as to be impossible to use.
Standards citations:
5.2.1.2 Multibyte characters
A multibyte character set may have a state-dependent encoding, wherein each sequence of multibyte characters begins in an initial shift state and enters other locale-specific shift states when specific multibyte characters are encountered in the sequence. While in the initial shift state, all single-byte characters retain their usual interpretation and do not alter the shift state. The interpretation for subsequent bytes in the sequence is a function of the current shift state.
A byte with all bits zero shall be interpreted as a null character independent of shift state.
A byte with all bits zero shall not occur in the second or subsequent bytes of a multibyte character.
7.24.6.3.2/3 The mbrtowc function
If the corresponding wide character is the null wide character, the resulting state described is the initial conversion state

In 5.2.1.2, Multibyte characters, the C Standard states:
A byte with all bits zero shall be interpreted as a null character independent of shift
state. Such a byte shall not occur as part of any other multibyte character.
The Standard seems to differentiate between shift state and conversion state, as, for example, 7.24.6 mentions:
The conversion state described by the pointed-to object is altered as needed to track the shift state, and the position within a multibyte character, for the associated multibyte character sequence.
(emphasis added by me). However, I think that the intent is to interpret a byte with all zero bits as the null character regardless of the mbstate_t value, which encodes the entire conversion state, particularly as "Such a byte shall not occur as part of any other multibyte character" implies that the null byte cannot occur within a multibyte character. If a null byte does occur in errant input where the second, third, etc. byte of a multibyte character should be, then I interpret the Standard as saying that the partial multibyte character at the EOF is silently ignored.
My reading of 7.24.6.3.2, The mbrtowc function, for the case when s is NULL is thus: the next 1 byte completes the null wide character, the return value of mbrtowc is 0, and the resulting state is the initial conversion state because:
If the corresponding wide character is the null wide character, the resulting state described is the initial conversion state.
By passing NULL for both s and ps, the internal mbstate_t of mbrtowc is reset to the initial state.

Related

mbrtowc: howto determine number of characters to skip if null character is read

According to the C99 specification the mbrtowc function returns 0
if the next n or fewer bytes complete the multibyte character that
corresponds to the null wide character (which is the value stored).
What is the best way to continue reading the input immediately after the encoded null character?
My current solution is to convert the null wide character with the given encoding in order to determine the number of input bytes to skip for the next call to mbrtowc. But there might be a more elegant way to do this.
Additionally I wonder what the rationale behind this behaviour of mbrtowc might be.
One byte. The null byte always represents the null character regardless of shift state, and cannot participate as part of a multibyte character. The source for this is:
5.2.1.2 Multibyte characters
...
A byte with all bits zero shall be interpreted as a null character independent of shift state. Such a byte shall not occur as part of any other multibyte character.
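As a rough sketch of what "one byte" means in practice, the loop below walks a buffer that may contain embedded null characters and advances by one byte whenever mbrtowc returns 0. The function name count_wide_chars is invented for this example, and it assumes, as the answer does, that the null character occupies exactly one byte of input; an encoding that places shift sequences directly in front of the null byte would need more care.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the wide characters encoded in buf[0..len),
 * including embedded null characters. Returns (size_t)-1 if an invalid or
 * truncated multibyte sequence is found. */
size_t count_wide_chars(const char *buf, size_t len)
{
    mbstate_t st;
    size_t count = 0;
    memset(&st, 0, sizeof st);

    while (len > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, buf, len, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;
        if (r == 0)
            r = 1;      /* assumption: the null character is the single byte 0 */

        buf += r;
        len -= r;
        count++;
    }
    return count;
}
```

Because the length is passed explicitly, the loop does not stop at the first null character the way a plain string function would.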

Iterating backwards Multibyte String - C

I know I can iterate forwards through a multibyte string, in C, using mbrtowc(). But what if I wanted to iterate backwards? In other words, how do I find the previous valid multibyte character? I tried the following method and it at least partially works on my Ubuntu system using the default en_US.UTF-8 locale:
char *str = "\xc2\xa2\xc2\xa1xyzwxfd\xc2\xa9", *tmp = NULL;
wchar_t wc = 0;
size_t ret = 0, width = 1;
mbstate_t state = {0};
//Iterate through 2 characters using mbrtowc()
tmp = str;
tmp += mbrtowc(&wc, tmp, MB_CUR_MAX, &state);
tmp += mbrtowc(&wc, tmp, MB_CUR_MAX, &state);
//This is a simplified version of my code. I didnt test this
//exact code but this general idea did work.
for(tmp--; (ret = mbrtowc(&wc, tmp, width, &state)) == (size_t)(-1) || ret == (size_t)(-2); width++, tmp--)
if(width == MB_CUR_MAX) printf("error\n");
printf("last multibyte character %lc\n", wc);
The idea is simple: just iterate backwards one byte at a time until we find a valid multibyte character as defined by mbrtowc(). My question is: can I rely on this to work for any possible multibyte locale, or only for encodings with special properties? Also, more specifically, is mbstate_t being used incorrectly? I mean, could the change in direction affect the validity of mbstate_t? And can I guarantee that 'ret' will be only one of (size_t)(-1) or (size_t)(-2), rather than either? I currently assume that 'ret' could be either, depending on the definitions of an incomplete and an invalid multibyte character.
If you need to deal with any theoretically-possible multibyte encoding, then it is not possible to iterate backwards. There is no requirement that a multibyte encoding have the property that no proper suffix of a valid multibyte sequence is a valid multibyte sequence. (As it happens, your algorithm requires an even stronger property, because you might recognize a multibyte sequence starting in the middle of one valid sequence and continuing into the next sequence.)
Also, you cannot predict (again, in general) the multibyte state if the multibyte encoding has shift states. If you back-up over a multibyte sequence which changes the state, you have no idea what the previous state was.
UTF-8 was designed with this in mind. It does not have shift states, and it clearly marks the octets (bytes) which can start a sequence. So if you know that the multibyte encoding is UTF-8, you can easily iterate backwards. Just scan backwards for a character not in the range 0x80-0xBF. (UTF-16 and UTF-32 are also easily iterated in either direction, but you need to read them as two-/four-byte code units, respectively, because a misaligned read is quite likely to be a correct codepoint.)
If you don't know that the multibyte encoding is UTF-8, then there is simply no robust algorithm to iterate backwards. All you can do is iterate forwards and remember the starting position and mbstate of each character.
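A minimal sketch of the "iterate forwards and remember" approach could look like the following. The struct char_mark and the function record_marks are names I made up for illustration; the sketch stores the byte offset and the conversion state in effect before each character, and it assumes an embedded null character takes exactly one byte.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical record: where a character starts and the conversion
 * state that was in effect just before it was decoded. */
struct char_mark {
    size_t    offset;
    mbstate_t state;
};

/* Walks s[0..len) forwards and fills marks[] with one entry per character.
 * Returns the number of characters found, or (size_t)-1 on an invalid or
 * truncated sequence, or if marks[] is too small. */
size_t record_marks(const char *s, size_t len,
                    struct char_mark *marks, size_t max_marks)
{
    mbstate_t st;
    size_t pos = 0, count = 0;
    memset(&st, 0, sizeof st);

    while (pos < len) {
        if (count == max_marks)
            return (size_t)-1;
        marks[count].offset = pos;
        marks[count].state  = st;       /* state before this character */

        size_t r = mbrtowc(NULL, s + pos, len - pos, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;
        if (r == 0)
            r = 1;                      /* embedded null: assumed one byte */

        pos += r;
        count++;
    }
    return count;
}
```

Stepping "backwards" is then just a matter of walking the marks array in reverse and, if a character needs to be decoded again, restarting mbrtowc from the saved offset with a copy of the saved state.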
Fortunately, these days there is really little reason to support multibyte encodings other than Unicode encodings.
For UTF-8 you can take advantage of an encoding property of the bytes that follow the first one: the additional bytes of a multibyte char (and only those) have the bit pattern 10xx xxxx.
So if you go backwards and hit a char c such that (c & 0xC0)==0x80, you can skip it.
For other multibyte encoding you don't necessarily have such a simple solution as the lead and following bytes are in ranges that overlap.
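For the UTF-8 case, the backward step can be sketched in a few lines. The function name utf8_prev_index is hypothetical, and the sketch assumes the input is already known to be valid UTF-8 and that i sits on a character boundary; it does no validation of its own.

```c
#include <stddef.h>

/* Hypothetical helper: given valid UTF-8 in s and an index i that sits on a
 * character boundary, returns the index of the first byte of the character
 * that ends just before i. Returns 0 when i is already 0. */
size_t utf8_prev_index(const char *s, size_t i)
{
    while (i > 0) {
        --i;
        /* Continuation bytes look like 10xx xxxx; anything else starts
         * a character. */
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            break;
    }
    return i;
}
```

With the string from the question, "\xc2\xa2\xc2\xa1xyz", calling utf8_prev_index(s, 4) steps back over the continuation byte 0xa1 and lands on index 2, the 0xc2 that starts the second character.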

What string operations do the 0x0d and 0xff (from a terminator canary) protect against

It is stated here:
A terminator canary contains NULL(0x00), CR (0x0d), LF (0x0a) and EOF (0xff) --
four characters that should terminate most string operations, rendering the
overflow attempt harmless.
I know the null (0x00) can help prevent strcpy, strncpy, stpcpy, and strcat.
Also LF (0x0A) can work for gets and fgets.
What are 0x0D and 0xFF good for stopping?
The purpose of the "canary" is to detect when a buffer has been overrun. It's placed on the stack just before the return address, after any buffers allocated in the current stack frame. If its value changes, the stack-checking code knows a buffer has been overrun and aborts the program before it can do any damage.
The problem with this is that if the attacker overwrites the canary with the same value it had before, the buffer overrun won't be detected. To make this more difficult, either a random number is used as the canary, so the attacker can't predict it, or the special "terminator canary" value you're asking about is used. The byte values that make up the terminator canary were chosen because they terminate various copy operations used by programs. If these values are in the string ("shellcode") the attacker uses to try to overwrite the return address, then most code that would copy the string will stop before overwriting the return address.
Here are examples of memory copying operations that would be terminated before overwriting the return address if the terminator canary appears in the source input:
NUL termination
char buf[10];
strcpy(buf, src);
The example above is the most common case. Any operation copying a C string will stop at the first NUL (0) byte in the source string.
EOL termination
char buf[10];
gets(buf);
Using gets is a common beginner mistake and doesn't normally appear in production code, but it's not hard to write more sophisticated code that reads a line but doesn't take care to not overflow the buffer. What marks the end-of-line depends on the convention. The usual Unix convention is to use a single line feed character (LF, 0x0A), but Windows uses the carriage return and line feed (CR LF, 0x0D 0x0A) sequence. Since the terminator canary contains both the CR and LF byte values, both EOL terminators are present in the canary. Any operation copying a single line will stop before overwriting the return address.
Broken EOF termination
char buf[10];
char *dest = buf;
while(1) {
char c = getchar();
if (c == EOF) {
break;
}
*dest++ = c;
}
How the EOF terminator works here is harder to see and explain than in the other two examples. In addition to the buffer overflow bug, this code contains another bug that leads to 0xFF being interpreted as EOF. Like using gets, this is also a rookie mistake, but one that's more common in production code. The bug is caused by using char c instead of int c.
The value returned by getchar is actually an int rather than a char. This makes it possible to distinguish valid byte values from the special EOF return value, which on most systems is -1. Byte values read from the file are returned as unsigned char values converted to int. So the byte value '\xFF' in the file is returned as the int value 255. When this is assigned to the char variable c, it's truncated from (on most systems these days) a 32-bit signed integer value to an 8-bit integer value, which is signed on platforms where plain char is signed. This turns the 32-bit signed integer value 255 into the 8-bit signed integer value -1. The same conversion also turns the 32-bit signed integer value -1 (EOF) into the 8-bit signed integer value -1.
Since both '\xFF' and EOF end up being converted to -1, they both end up comparing as equal to EOF. This means that in the example code above when either value is returned by getchar the loop will terminate. Any code that makes this sort of mistake will stop copying at the first '\xFF' byte in the source input. However code that uses getchar, getc, fgetc or a similar function correctly, assigning the return value to an int, will continue to copy past any '\xFF' bytes.
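The collision can be reproduced without reading any input. This is only a sketch: the two function names are hypothetical, signed char is used explicitly so the result does not depend on whether plain char is signed, and the result for 0xFF assumes the usual situation where EOF is -1 and the out-of-range conversion wraps around.

```c
#include <stdio.h>

/* Hypothetical helper: mimics storing getchar()'s result in a signed
 * 8-bit char before comparing it with EOF (the buggy pattern). */
int matches_eof_via_signed_char(int byte)
{
    signed char c = (signed char)byte;  /* 255 typically becomes -1 here */
    return c == EOF;                    /* c is promoted back to int */
}

/* Hypothetical helper: mimics the correct pattern, keeping the int. */
int matches_eof_via_int(int byte)
{
    int c = byte;
    return c == EOF;
}
```

On such a platform matches_eof_via_signed_char(0xFF) reports a match while matches_eof_via_int(0xFF) does not, which is exactly why only the buggy loop stops at the 0xFF byte of the canary.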
Summary
The terminator canary makes it so an attacker can't exploit a given buffer overrun in code, if the overrunning code uses a NUL (0), EOL (0x0D and/or 0x0A) or a broken EOF (0xFF) comparison to terminate the copy. My guess is that most cases of buffer overruns are made unexploitable by NUL being in the terminator canary. Many of the remainder would be protected by the EOL characters, while the broken EOF byte probably doesn't have much applicability at all.

Is fgets() returning NULL with a short buffer compliant?

In unit testing a function containing fgets(), came across an unexpected result when the buffer size n < 2. Obviously such a buffer size is foolish, but the test is exploring corner cases.
Simplified code:
#include <errno.h>
#include <stdio.h>
void test_fgets(char * restrict s, int n) {
    FILE *stream = stdin;
    s[0] = 42;
    // %p expects a void *, so cast the pointers explicitly
    printf("< s:%p n:%d stream:%p\n", (void *)s, n, (void *)stream);
    char *retval = fgets(s, n, stream);
    printf("> errno:%d feof:%d ferror:%d retval:%p s[0]:%d\n\n",
           errno, feof(stream), ferror(stream), (void *)retval, s[0]);
}
int main(void) {
    char s[100];
    test_fgets(s, sizeof s); // Entered "123\n" and works as expected
    test_fgets(s, 1); // fgets() --> NULL, feof() --> 0, ferror() --> 0
    test_fgets(s, 0); // Same as above
    return 0;
}
What is surprising is that fgets() returns NULL and neither feof() nor ferror() are 1.
The C spec, below, seems silent on this rare case.
Questions:
Is returning NULL without setting either feof() or ferror() compliant behavior?
Could a different result be compliant behavior?
Does it make a difference if n is 1 or less than 1?
Platform: gcc version 4.5.3 Target: i686-pc-cygwin
Here is an excerpt from the C11 Standard, some emphasis mine:
7.21.7.2 The fgets function
The fgets function reads at most one less than the number of characters specified by n [...]
The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
Related postings
How to use feof and ferror for fgets (minishell in C)
Trouble creating a shell in C (Seg-Fault and ferror)
fputs(), fgets(), ferror() questions and C++ equivalents
Return value of fgets()
[Edit] Comments on answers
@Shafik Yaghmour presented the overall issue well: since the C spec does not say what to do when fgets neither reads any data nor writes any data to s (n <= 0), it is Undefined Behavior. So any reasonable response should be acceptable, such as returning NULL, setting no flags and leaving the buffer alone.
As to what should happen when n == 1, @Oliver Matthews' answer and @Matt McNabb's comment point to the C spec's lack of clarity for a buffer of n == 1. The C spec seems to favor returning the buffer pointer with s[0] == '\0' for a buffer of n == 1, but is not explicit enough.
The behavior is different in newer releases of glibc, for n == 1, it returns s which indicates success, this is not an unreasonable reading of 7.19.7.2 The fgets function paragraph 2 which says (it is the same in both C99 and C11, emphasis mine):
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of characters specified by n
from the stream pointed to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
Not terribly useful, but it does not violate anything said in the standard: it will read at most 0 characters and null-terminate. So the results you are seeing look like a bug that was fixed in later releases of glibc. It is also clearly neither an end of file nor a read error, as covered in paragraph 3:
[...]If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
As far as the final case where n == 0 this looks like it is simply undefined behavior. The draft C99 standard section 4. Conformance paragraph 2 says (emphasis mine):
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.
The wording is the same in C11. It is impossible to read at most -1 characters and it is neither an end of file nor a read error. So we have no explicit definition of the behavior in this case. Looks like a defect but I cannot find any defect reports that cover this.
tl;dr: that version of glibc has a bug for n=1, the spec has (arguably) an ambiguity for n<1; but I think newer glibc's take the most sensible option.
So, the c99 spec is basically the same.
The behavior for test_fgets(s, 1) is wrong. glibc 2.19 gives the correct output (retval != NULL, s[0] == '\0').
The behavior for test_fgets(s, 0) is undefined, really. It wasn't successful (you can't read at most -1 characters), but it doesn't hit either of the two 'return null' criteria (EOF with 0 characters read; read error).
However, GCC's behavior is arguably correct (returning the pointer to the unchanged s would also be OK) - feof isn't set, because it hasn't hit eof; ferror isn't set because there wasn't a read error.
I suspect the logic in gcc (not got the source to hand) has an 'if n<=0 return null' near the top.
[edit:]
On reflection, I actually think that glibc's behavior for n=0 is the most correct response it could give:
No eof read, so feof()==0
No reads, so no read error could have happened, so ferror=0
Now as for the return value
- fgets cannot have read -1 characters (it's impossible). If fgets returned the passed in pointer back, it would look like a successful call.
- Ignoring this corner case, fgets commits to returning a null-terminated string. If it didn't in this case, you couldn't rely on it. But fgets will set the character after the last character read into the array to null. Given we read in -1 characters (apparently) on this call, would that make it set the 0th character to null?
So, the sanest choice is to return null (in my opinion).
The C Standard (C11 n1570 draft) specifies fgets() this way (some emphasis mine):
7.21.7.2 The fgets function
Synopsis
#include <stdio.h>
char *fgets(char * restrict s, int n,
FILE * restrict stream);
Description
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
Returns
The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
The phrase "reads at most one less than the number of characters specified by n" is not precise enough. A negative number cannot represent a number of characters, but 0 does mean no characters. Reading at most -1 characters does not seem possible, so the case of n <= 0 is not defined by the Standard, and as such has undefined behavior.
For n = 1, fgets is specified as reading at most 0 characters, which it should succeed at unless the stream is invalid or in an error condition. The phrase "A null character is written immediately after the last character read into the array" is ambiguous as no characters have been read into the array, but it makes sense to interpret this special case as meaning s[0] = '\0';. The specification for gets_s offers the same reading, with the same imprecision. Again the behavior is not explicitly defined, so it is undefined (see note 1).
The specification of snprintf is more precise, the case of n = 0 is explicitly specified, with useful semantics attached. Unfortunately, such semantics cannot be implemented for fgets:
7.21.6.5 The snprintf function
Synopsis
#include <stdio.h>
int snprintf(char * restrict s, size_t n,
const char * restrict format, ...);
Description
The snprintf function is equivalent to fprintf, except that the output is written into an array (specified by argument s) rather than to a stream. If n is zero, nothing is written, and s may be a null pointer. Otherwise, output characters beyond the n-1st are discarded rather than being written to the array, and a null character is written at the end of the characters actually written into the array. If copying takes place between objects that overlap, the behavior is undefined.
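As an aside, the "useful semantics" of n = 0 can be shown with a short sketch: calling snprintf with a null buffer and a size of zero writes nothing and returns the number of characters the full output would need. The wrapper name formatted_length is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many characters (excluding the
 * terminating null) printing value with "%d" would produce. Nothing is
 * written because n is 0, so passing a null pointer for s is allowed. */
int formatted_length(int value)
{
    return snprintf(NULL, 0, "%d", value);
}
```

A caller can then allocate formatted_length(value) + 1 bytes and call snprintf a second time to produce the string itself.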
The specification for gets_s() also clarifies the case of n = 0 and makes it a runtime constraint violation:
K.3.5.4.1 The gets_s function
Synopsis
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);
Runtime-constraints
s shall not be a null pointer. n shall neither be equal to zero nor be greater than RSIZE_MAX. A new-line character, end-of-file, or read error shall occur within reading n-1 characters from stdin.
If there is a runtime-constraint violation, s[0] is set to the null character, and characters are read and discarded from stdin until a new-line character is read, or end-of-file or a read error occurs.
Description
The gets_s function reads at most one less than the number of characters specified by n from the stream pointed to by stdin, into the array pointed to by s. No additional characters are read after a new-line character (which is discarded) or after end-of-file. The discarded new-line character does not count towards number of characters read. A null character is written immediately after the last character read into the array.
If end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then s[0] is set to the null character, and the other elements of s take unspecified values.
Recommended practice
The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.
Returns
The gets_s function returns s if successful. If there was a runtime-constraint violation, or if end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then a null pointer is returned.
The C library you are testing seems to have a bug for this case, which was fixed in later versions of glibc. Returning NULL should mean some kind of failure condition (the opposite of success): end-of-file or read-error. Other cases such as invalid stream or stream not open for reading are more or less explicitly described as undefined behavior.
The cases of n = 0 and n < 0 are not defined. Returning NULL is a sensible choice, but it would be useful to clarify the description of fgets() in the Standard to require n > 0 as is the case for gets_s.
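Until the Standard is clarified, code that cannot rule out small values of n can sidestep the question with a thin wrapper. This is only a sketch of one possible policy, not anything the Standard prescribes: the name checked_fgets is made up, n <= 0 is treated as a failure that leaves the buffer alone, and n == 1 is treated as a successful read of zero characters.

```c
#include <stdio.h>

/* Hypothetical wrapper around fgets with explicit handling of the corner
 * cases discussed above. The policy for n <= 0 and n == 1 is a choice made
 * for this sketch. */
char *checked_fgets(char *s, int n, FILE *stream)
{
    if (s == NULL || stream == NULL || n <= 0)
        return NULL;            /* nothing sensible can be done; buffer untouched */

    if (n == 1) {
        s[0] = '\0';            /* "read at most 0 characters", then terminate */
        return s;
    }

    return fgets(s, n, stream); /* the well-defined case */
}
```

Neither special case touches the stream, so feof() and ferror() stay as they were, which matches the observation that no end-of-file or read error has actually occurred.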
Note that there is another specification issue for fgets: the type of the n argument should have been size_t instead of int, but this function was originally specified by the C authors before size_t was even invented, and kept unchanged in the first C Standard (C89). Changing it then was considered unacceptable because they were trying to standardize existing usage: the signature change would have created inconsistencies across C libraries and broken well written existing code that uses function pointers or unprototyped functions.
Note 1: The C Standard specifies in paragraph 2 of 4. Conformance that If a “shall” or “shall not” requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words “undefined behavior” or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe “behavior that is undefined”.

Can the null character be used to represent the zero character?

The C99 standard requires that "A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string." (5.2.1.2) It then goes on to list 99 other characters that must be in the execution set. Can a character set be used in which the null character is one of these 99 characters? In particular, is it allowed that '0' == '\0' ?
Edit: Everyone is pointing out that in ASCII, '0' is 0x30. This is true, but the standard doesn't mandate the use of ASCII.
No matter if you use ASCII, EBCDIC or something "self-crafted", '0' must be distinct from '\0', for the reason you mention yourself:
A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string. (5.2.1.2)
If the null character terminates a character string, it cannot be contained in that string. It is the only character which cannot be contained in a string; all other characters can be used and thus must be distinct from 0.
I don't think the standard states that each of the characters that it lists (including the null character) has a distinct value, other than that the digits do. But a "character set" containing a value 0 that allegedly represents 91 of the 100 required characters is clearly not really a character set containing the required 100 characters. So this is either:
part of the English-language definition of "a character set",
obvious from context,
a very minor flaw in the text of the standard, that it should spell it out to prevent wilful misinterpretation by a faithless implementer.
Take your pick.
If '0' == '\0', you would not be able to tell the end of a string apart from a '0' character.
So it would be rather hard to use something like "0_any_string", because it already starts with '0'.
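The point about "0_any_string" can be checked directly. In this sketch the function name is hypothetical, and the _Static_assert (C11) simply encodes the conclusion argued in these answers rather than quoting a sentence from the Standard.

```c
#include <string.h>

/* Compile-time check reflecting the conclusion above: the digit zero is
 * not the null character. */
_Static_assert('0' != '\0', "the digit zero must not be the null character");

/* Hypothetical helper: if '0' were the null character, this string would
 * be terminated by its very first character and have length 0. */
size_t length_of_zero_prefixed_string(void)
{
    return strlen("0_any_string");
}
```

The function returns 12 because the leading '0' is an ordinary character and the string runs until the implicit terminating null.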
No, it can't. A character set must be described by an injective function, i.e. a function that maps each character to its own distinct binary value. Mapping 2 characters to the same value would make the character set ambiguous, i.e. the computer couldn't map the data back to a single character since more than one would fit.
The C99 standard poses another restriction by forcing the mapping of null character to a specific binary value. Given the above paragraph this means that no other character can have a value identical to null.
The integer constant literal 0 has different meanings depending upon
the context in which it's used. In all cases, it is still an integer
constant with the value 0, it is just described in different ways.
If a pointer is being compared to the constant literal 0, then this is
a check to see if the pointer is a null pointer. This 0 is then
referred to as a null pointer constant. The C standard defines that 0
cast to the type void * is both a null pointer and a null pointer
constant.
What is the difference between NULL, '\0' and 0

Resources