Is snprintf always null terminating the destination buffer?
In other words, is this sufficient:
char dst[10];
snprintf(dst, sizeof (dst), "blah %s", somestr);
or do you have to do it like this, in case somestr is too long to fit?
char dst[10];
somestr[sizeof (dst) - 1] = '\0';
snprintf(dst, sizeof (dst) - 1, "blah %s", somestr);
I am interested both in what the standard says and what some popular libc might do which is not standard behavior.
As the other answers establish: It should:
snprintf ... Writes the results to a character string buffer. (...)
will be terminated with a null character, unless buf_size is zero.
So all you have to take care of is that you don't pass a zero-size buffer to it, because (obviously) it cannot write a zero to "nowhere".
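A minimal sketch of the first variant from the question, assuming a C99-conforming snprintf. The helper name format_blah is made up for illustration; it relies on the guaranteed terminator and uses the return value only to detect truncation:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not from the question): formats "blah <somestr>"
 * into dst. With a conforming snprintf and dst_size > 0, dst is always
 * null-terminated on return, even when the output was cut short. */
bool format_blah(char *dst, size_t dst_size, const char *somestr)
{
    int needed = snprintf(dst, dst_size, "blah %s", somestr);

    /* The output is complete only if there was no error and the
     * formatted length plus the terminator fits in the buffer. */
    return needed >= 0 && (size_t)needed < dst_size;
}
```

With char dst[10], a long somestr leaves the first nine characters plus the terminator in dst and the helper returns false. No manual termination of somestr or dst is needed.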
However, beware that Microsoft's library historically did not have a function called snprintf, only a function called _snprintf (note the leading underscore), which does not always append a terminating null. Here are the docs (VS 2012, roughly the same for VS 2013):
http://msdn.microsoft.com/en-us/library/2ts7cx93%28v=vs.110%29.aspx
Return Value
Let len be the length of the formatted data string (not including the
terminating null). len and count are in bytes for _snprintf, wide
characters for _snwprintf.
If len < count, then len characters are stored in buffer, a
null-terminator is appended, and len is returned.
If len = count, then len characters are stored in buffer, no
null-terminator is appended, and len is returned.
If len > count, then count characters are stored in buffer, no
null-terminator is appended, and a negative value is returned.
(...)
Visual Studio 2015 (VC14) apparently introduced the conforming snprintf function, but the legacy one with the leading underscore and the non null-terminating behavior is still there:
The snprintf function truncates the output when len is greater
than or equal to count, by placing a null-terminator at
buffer[count-1]. (...)
For all functions other than snprintf, if len = count, len
characters are stored in buffer, no null-terminator is appended,
(...)
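If code has to build with both old and new Microsoft toolchains, one common workaround is a thin wrapper that terminates the buffer itself. The following is only a sketch under that assumption: the name safe_snprintf is hypothetical, and the _MSC_VER < 1900 check is used here to mean "older than Visual Studio 2015".

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: always leaves dst null-terminated when size > 0.
 * The return value is passed through unchanged, so it still differs
 * between the legacy and the conforming implementations on truncation. */
int safe_snprintf(char *dst, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
#if defined(_MSC_VER) && _MSC_VER < 1900
    n = _vsnprintf(dst, size, fmt, ap);   /* legacy: may not terminate */
#else
    n = vsnprintf(dst, size, fmt, ap);    /* C99: terminates if size > 0 */
#endif
    va_end(ap);

    if (size > 0)
        dst[size - 1] = '\0';             /* harmless when already terminated */
    return n;
}
```

Forcing the last byte to zero is redundant on a conforming library, but it costs almost nothing and removes the dependency on which variant is in use.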
According to snprintf(3) manpage.
The functions snprintf() and vsnprintf() write at most size bytes (including the trailing null byte ('\0')) to str.
So, yes, no need to terminate if size >= 1.
According to the C standard, unless the buffer size is 0, vsnprintf() and snprintf() null-terminate their output.
The snprintf() function shall be equivalent to sprintf(), with the addition of the n argument which states the size of the buffer referred to by s. If n is zero, nothing shall be written and s may be a null pointer. Otherwise, output bytes beyond the n-1st shall be discarded instead of being written to the array, and a null byte is written at the end of the bytes actually written into the array.
So, if you need to know how big a buffer to allocate, use a size of zero, and you can then use a null pointer as the destination. Note that I linked to the POSIX pages, but these explicitly say that there is not intended to be any divergence between Standard C and POSIX where they cover the same ground:
The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of POSIX.1-2008 defers to the ISO C standard.
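The size-zero trick can be sketched as follows, assuming a C99-conforming vsnprintf (the legacy Microsoft variants return -1 instead of the required length, so this does not work with them). The function name format_alloc is made up for illustration:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed, null-terminated string holding
 * the formatted output, or NULL on failure. The caller must free() it. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf = NULL;

    va_start(ap, fmt);
    va_copy(ap2, ap);                       /* the list is consumed twice */

    /* First pass: size 0 and a null destination only measure the length. */
    int len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);

    /* Second pass: format into a buffer of exactly len + 1 bytes. */
    if (len >= 0 && (buf = malloc((size_t)len + 1)) != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);

    va_end(ap2);
    return buf;
}
```

The va_copy matters: a va_list that has been passed to vsnprintf cannot be reused for the second call.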
Be wary of the Microsoft version of vsnprintf(). It definitely behaves differently from the standard C version when there is not enough space in the buffer (it returns -1 where the standard function returns the required length). It is not entirely clear that the Microsoft version null terminates its output under error conditions, whereas the standard C version does.
However, note that Microsoft has changed the rules (vsnprintf()) since this answer was originally written:
Beginning with the UCRT in Visual Studio 2015 and Windows 10, vsnprintf is no longer identical to _vsnprintf. The vsnprintf function conforms to the C99 standard; _vsnprintf is kept for backward compatibility with older Visual Studio code.
Similar comments apply to snprintf() and sprintf().
Note also the answers to Do you use the TR 24731 safe functions? (see MSDN for the Microsoft version of the vsprintf_s()) and the Mac solution for the safe alternatives to unsafe C standard library functions?
Some older versions of SunOS did weird things with snprintf: they might not have NUL-terminated the output, and they had return values that didn't match what everyone else was doing. But anything that has been released in the past 10 years has been doing what C99 says.
The ambiguity starts with the C Standard itself. Both C99 and C11 have identical descriptions of the snprintf function. Here is the description from C99:
7.19.6.5 The snprintf function
Synopsis
1 #include <stdio.h>
int snprintf(char * restrict s, size_t n, const char * restrict format, ...);
Description
2 The snprintf function is equivalent to fprintf, except that the output is written into an array (specified by argument s) rather than to a stream. If n is zero, nothing is written, and s may be a null pointer. Otherwise, output characters beyond the n-1st are discarded rather than being written to the array, and a null character is written at the end of the characters actually written into the array. If copying takes place between objects that overlap, the behavior is undefined.
Returns
3 The snprintf function returns the number of characters that would have been written had n been sufficiently large, not counting the terminating null character, or a negative value if an encoding error occurred. Thus, the null-terminated output has been completely written if and only if the returned value is nonnegative and less than n.
On the one hand the sentence
Otherwise, output characters beyond the n-1st are discarded rather than being written to the array, and a null character is written at the end of the characters actually written into the array
says that
if (the s points to a 3-character-long array, and) n is 3, then 2 characters will be written, and the characters beyond the 2nd one are discarded; then the null character is written after those 2 (and the null character will be the 3rd character written).
And this I believe answers the original question.
THE ANSWER:
If copying takes place between objects that overlap, the behavior is undefined.
If n is 0 then nothing is written to the output
otherwise, if no encoding errors are encountered, the output is ALWAYS null-terminated (regardless of whether the output fits in the output array or not; if not, then some characters are discarded so that the output array is never overflowed),
otherwise (if encoding errors are encountered) the output can stay non-null-terminated.
On the other hand
The last sentence
Thus, the null-terminated output has been completely written if and only if the returned value is nonnegative and less than n
gives ambiguity (or my English is not good enough). I can interpret this sentence in at least two ways:
1. The output is null-terminated if and only if the returned value is nonnegative and less than n (which means that if the returned value is not less than n, i.e. the output (including the terminating null character) does not fit in the array, then the output is not null-terminated).
2. The output is complete (no characters have been discarded) if and only if the returned value is nonnegative and less than n.
I believe that the interpretation 1 above contradicts THE ANSWER, causes misunderstanding and lengthy discussions. That is why the last sentence describing the snprintf function needs a change in order to remove any ambiguity (which gives grounds for writing a Proposal to the C language Standard).
The example of non-ambiguous wording I believe can be taken from http://en.cppreference.com/w/c/io/fprintf (see 4)), thanks to #"Martin Ba" for the link.
See also the question "snprintf: Are there any C Standard Proposals/plans to change the description of this func?".
Related
In the various cases that a buffer is provided to the standard library's many string functions, is it guaranteed that the buffer will not be modified beyond the null terminator? For example:
char buffer[17] = "abcdefghijklmnop";
sscanf("123", "%16s", buffer);
Is buffer now required to equal "123\0efghijklmnop"?
Another example:
char buffer[10];
fgets(buffer, 10, fp);
If the read line is only 3 characters long, can one be certain that the 6th character is the same as before fgets was called?
The C99 draft standard does not explicitly state what should happen in those cases, but by considering multiple variations, you can show that it must work a certain way so that it meets the specification in all cases.
The standard says:
%s - Matches a sequence of non-white-space characters.252)
If no l length modifier is present, the corresponding argument shall be a
pointer to the initial element of a character array large enough to accept the
sequence and a terminating null character, which will be added automatically.
Here's a pair of examples that show it must work the way you are proposing to meet the standard.
Example A:
char buffer[4] = "abcd";
char buffer2[10]; // Note that this could be placed at what would be buffer+4
sscanf("123 4", "%s %s", buffer, buffer2);
// Result is buffer = "123\0"
// buffer2 = "4\0"
Example B:
char buffer[17] = "abcdefghijklmnop";
char* buffer2 = &buffer[4];
sscanf("123 4", "%s %s", buffer, buffer2);
// Result is buffer = "123\04\0"
Note that the interface of sscanf doesn't provide enough information to really know that these were different. So, if Example B is to work properly, it must not mess with the bytes after the null character in Example A. This is because it must work in both cases according to this bit of spec.
So implicitly it must work as you stated due to the spec.
Similar arguments can be placed for other functions, but I think you can see the idea from this example.
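The first example from the question can be turned into a small check. This is only a sketch that observes what a given C library does; it does not prove what the standard requires, and the function name sscanf_keeps_tail is invented for illustration:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if sscanf left the bytes after its terminator untouched,
 * 0 otherwise. Observes the behavior of the C library in use. */
int sscanf_keeps_tail(void)
{
    char buffer[17] = "abcdefghijklmnop";

    if (sscanf("123", "%16s", buffer) != 1)
        return 0;

    /* Expect "123", a null, the old "efghijklmnop", and the old final null:
     * all 17 bytes are compared, including the one embedded in the middle. */
    return memcmp(buffer, "123\0efghijklmnop", sizeof buffer) == 0;
}
```

On the implementations the answers here describe as typical, the function should return 1, since the remainder of the buffer is left alone.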
NOTE:
Providing size limits in the format, such as "%16s", could change the behavior. By the specification, it would be functionally acceptable for sscanf to zero out a buffer to its limits before writing the data into the buffer. In practice, most implementations opt for performance, which means they leave the remainder alone.
When the intent of the specification is to do this sort of zeroing out, it is usually explicitly specified. strncpy is an example. If the length of the string is less than the maximum buffer length specified, it will fill the rest of the space with null characters. The fact that this same "string" function could return a non-terminated string as well makes this one of the most common functions for people to roll their own version.
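A sketch of the kind of "roll your own" replacement mentioned above, built on strncpy itself. The name bounded_copy is hypothetical. It shows both documented properties: short strings are padded with null characters, and long strings need an explicit terminator:

```c
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity size bytes) and
 * always terminates the result when size > 0. */
void bounded_copy(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return;

    /* strncpy pads with '\0' up to size - 1 bytes when src is short,
     * but writes no terminator when src has size - 1 or more characters. */
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}
```

After bounded_copy(b, 8, "ab") every byte of b from index 2 onward is zero, which is exactly the "modifies the buffer beyond the terminator" behavior that sscanf and fgets are not specified to have.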
As far as fgets, a similar situation could arise. The only gotcha is that the specification explicitly states that if nothing is read in, the buffer remains untouched. An acceptable functional implementation could sidestep this by checking to see if there is at least one byte to read before zeroing out the buffer.
Each individual byte in the buffer is an object. Unless some part of the function description of sscanf or fgets mentions modifying those bytes, or even implies their values may change e.g. by stating their values become unspecified, then the general rule applies: (emphasis mine)
6.2.4 Storage durations of objects
2 [...] An object exists, has a constant address, and retains its last-stored value throughout its lifetime. [...]
It's this same principle that guarantees that
#include <stdio.h>
int a = 1;
int main() {
printf ("%d\n", a);
printf ("%d\n", a);
}
attempts to print 1 twice. Even though a is global, printf can access global variables, and the description of printf doesn't mention not modifying a.
Neither the description of fgets nor that of sscanf mentions modifying buffers past the bytes that actually were supposed to be written (except in the case of a read error), so those bytes don't get modified.
The standard is somewhat ambiguous on this, but I think a reasonable reading of it is that the answer is: yes, it's not allowed to write more bytes to the buffer than it read+null. On the other hand, a stricter reading/interpretation of the text could conclude that the answer is no, there's no guarantee. Here's what a publicly available draft says about fgets.
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
There's a guarantee about how much it is supposed to read from the input, i.e. stop reading at newline or EOF and not read more than n-1 bytes. Although nothing is said explicitly about how much it's allowed to write to the buffer, it is common knowledge that fgets's n parameter is used to prevent buffer overflows. It's a little strange that the standard uses the ambiguous term read, which may not necessarily imply that fgets can't write to the buffer more than n bytes, if you want to nitpick on the terminology it uses. But note that the same "read" terminology is used about both issues: the n-limit and the EOF/newline limit. So if you interpret the n-related "read" as a buffer-write limit, then [for consistency] you can/should interpret the other "read" the same way, i.e. not write more than what it read when the string is shorter than the buffer.
On the other hand, if you distinguish between the uses of the phrasal verb "read into" (="write") and just "read", then you can't read the committee's text the same way. You are guaranteed that it won't "read into" (="write to") the array more than n bytes, but if the input string is terminated sooner by newline or EOF you're only guaranteed that the rest (of the input) won't be "read"; whether that implies it won't be "read into" (="written to") the buffer is unclear under this stricter reading. The crucial keyword is "into", which is elided, so the problem is whether the completion given by me in brackets in the following modified quote is the intended interpretation:
No additional characters are read [into the array] after a new-line character (which is retained) or after end-of-file.
Frankly, a single postcondition stated as a formula (which would be pretty short in this case) would have been a lot more helpful than the verbiage I quoted...
I can't be bothered to try and analyze their writeup about the *scanf family, because I suspect it's going to be even more complicated given all the other things that happen in those functions; their writeup for fscanf is about five pages long... But I suspect a similar logic applies.
is it guaranteed that the buffer will not be modified beyond the null
terminator?
No, there's no guarantee.
Is buffer now required to equal "123\0efghijklmnop"?
Yes. But that's only because you've used correct parameters to your string-related functions. Should you mess up the buffer length, the input modifiers to sscanf and such, your program will still compile, but it will most likely fail at runtime.
If the read line is only 3 characters long, can one be certain that the 6th character is the same as before fgets was called?
Yes. Once fgets() figures out you have a 3-character input string, it stores the input in the provided buffer, and it doesn't care about the rest of the provided space at all.
Is buffer now required to equal "123\0efghijklmnop"?
Here buffer just consists of the string 123, guaranteed to be terminated with NUL.
Yes, the memory allocated for the array buffer will not get deallocated. However, with "%16s" you are restricting the string so that at most 16 char elements can be read into buffer at any point in time. What ends up in it then depends on whether you write just a single char or the maximum that buffer can take.
For example:
char buffer[4096] = "abc";
actually does something like the following:
memcpy(buffer, "abc", sizeof("abc"));
memset(&buffer[sizeof("abc")], 0, sizeof(buffer)-sizeof("abc"));
The standard requires that if any part of a char array is initialized, the remainder of the array is initialized to zero, up to its memory boundary.
There are no guarantees from the standard, which is why sscanf and fgets are recommended to be used (with respect to the size of the buffer) as you show in your question (and using fgets is considered preferable to gets).
However, some standard functions use null-terminator in their work, e.g. strlen (but I suppose you ask about string modification)
EDIT:
In your example
fgets(buffer, 10, fp);
characters after the 10th are guaranteed to be left untouched (the content and length of buffer are not considered by fgets)
EDIT2:
Moreover, when using fgets, keep in mind that '\n' will be stored in the buffer, e.g.
"123\n\0fghijklmnop"
instead of expected
"123\0efghijklmnop"
Depends on the function in use (and to a lesser degree its implementation). sscanf will start writing when it encounters its first non-whitespace character, and continue writing until its first whitespace character, where it will add a finishing 0 and return. But a function like strncpy (famously) zeroes out the rest of the buffer.
There is however nothing in the C standard which mandates how these functions behave.
Lately, I noticed a strange case I would like to verify:
By SUS, for %n in a format string, the respective int will be set to the-amount-of-bytes-written-to-the-output.
Additionally, for snprintf(dest, 3, "abcd"), dest will point to "ab\0". Why? Because no more than n (n = 3) bytes are to be written to the output (the dest buffer).
I deduced that for the code:
int written;
char dest[3];
snprintf(dest, 3, "abcde%n", &written);
written will be set to 2 (null termination excluded from count).
But from a test I made using GCC 4.8.1, written was set to 5.
Was I misinterpreting the standard? Is it a bug? Is it undefined behaviour?
Edit:
#wildplasser said:
... the behavior of %n in the format string could be undefined or implementation defined ...
And
... the implementation has to simulate processing the complete format string (including the %n) ...
#par said:
written is 5 because that's how many characters would be written at the point the %n is encountered. This is correct behavior. snprintf only copies up to size characters minus the trailing null ...
And:
Another way to look at this is that the %n wouldn't have even been encountered if it only processed up to 2 characters, so it's conceivable to expect written to have an invalid value...
And:
... the whole string is processed via printf() rules, then the max-length is applied ...
Can it be verified by the standard, a standard draft, or some official source?
written is 5 because that's how many characters would be written at the point the %n is encountered. This is correct behavior. snprintf only copies up to size characters minus the trailing null (so in your case 3-1 == 2). You have to separate the string formatting behavior from the only-write-so-many-characters behavior.
Another way to look at this is that the %n wouldn't have even been encountered if it only processed up to 2 characters, so it's conceivable to expect written to have an invalid value. That's where there would be a bug, if you were expecting something valid in written at the point %n was encountered (and there wasn't).
So remember, the whole string is processed via printf() rules, then the max-length is applied.
It is not a bug: ISO C99 says
The snprintf function is equivalent to fprintf
[...]
output characters beyond the n-1st are discarded rather than being written to the array
[...]
So it just discards trailing output but otherwise behaves the same.
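The observation from the question can be reproduced with a few lines. This is a sketch that assumes a C library where %n is enabled (some platforms disable it for security reasons); the function name percent_n_demo is made up:

```c
#include <stdio.h>

/* Runs the call from the question. Stores the value assigned through %n
 * in *written and returns snprintf's own result. dest needs 3 bytes. */
int percent_n_demo(char dest[3], int *written)
{
    *written = -1;    /* sentinel: stays -1 if %n is never processed */

    /* Formatting covers all of "abcde"; only the storing is limited to
     * 2 characters plus the terminator. */
    return snprintf(dest, 3, "abcde%n", written);
}
```

On a library that follows the reading given in the answers, dest ends up as "ab", while both the return value and the value stored through %n reflect the full untruncated length of 5.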
While unit testing a function containing fgets(), I came across an unexpected result when the buffer size n < 2. Obviously such a buffer size is foolish, but the test is exploring corner cases.
Simplified code:
#include <errno.h>
#include <stdio.h>
void test_fgets(char * restrict s, int n) {
FILE *stream = stdin;
s[0] = 42;
printf("< s:%p n:%d stream:%p\n", (void *)s, n, (void *)stream);
char *retval = fgets(s, n, stream);
printf("> errno:%d feof:%d ferror:%d retval:%p s[0]:%d\n\n",
errno, feof(stream), ferror(stream), (void *)retval, s[0]);
}
int main(void) {
char s[100];
test_fgets(s, sizeof s); // Entered "123\n" and works as expected
test_fgets(s, 1); // fgets() --> NULL, feof() --> 0, ferror() --> 0
test_fgets(s, 0); // Same as above
return 0;
}
What is surprising is that fgets() returns NULL and neither feof() nor ferror() are 1.
The C spec, below, seems silent on this rare case.
Questions:
Is returning NULL without setting feof() nor ferror() compliant behavior?
Could a different result be compliant behavior?
Does it make a difference if n is 1 or less than 1?
Platform: gcc version 4.5.3 Target: i686-pc-cygwin
Here is an abstract from the C11 Standard, some emphasis mine:
7.21.7.2 The fgets function
The fgets function reads at most one less than the number of characters specified by n [...]
The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
Related postings
How to use feof and ferror for fgets (minishell in C)
Trouble creating a shell in C (Seg-Fault and ferror)
fputs(), fgets(), ferror() questions and C++ equivalents
Return value of fgets()
[Edit] Comments on answers
#Shafik Yaghmour well presented the overall issue: since the C spec does not mention what to do when it does not read any data nor write any data to s when (n <= 0), it is Undefined Behavior. So any reasonable response should be acceptable such as return NULL, set no flags, leave buffer alone.
As to what should happen when n==1, #Oliver Matthews answer and #Matt McNabb comment indicate a C spec's lack of clarity considering a buffer of n == 1. The C spec seems to favor a buffer of n == 1 should return the buffer pointer with s[0] == '\0', but is not explicit enough.
The behavior is different in newer releases of glibc: for n == 1, fgets returns s, which indicates success. This is not an unreasonable reading of 7.19.7.2 The fgets function, paragraph 2, which says (it is the same in both C99 and C11, emphasis mine):
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of characters specified by n
from the stream pointed to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
Not terribly useful, but it does not violate anything said in the standard: it will read at most 0 characters and null-terminate. So the results you are seeing look like a bug that was fixed in later releases of glibc. It is also clearly not an end of file nor a read error, as covered in paragraph 3:
[...]If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
As far as the final case where n == 0 this looks like it is simply undefined behavior. The draft C99 standard section 4. Conformance paragraph 2 says (emphasis mine):
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.
The wording is the same in C11. It is impossible to read at most -1 characters and it is neither an end of file nor a read error. So we have no explicit definition of the behavior in this case. Looks like a defect but I cannot find any defect reports that cover this.
tl;dr: that version of glibc has a bug for n=1, the spec has (arguably) an ambiguity for n<1; but I think newer glibc's take the most sensible option.
So, the c99 spec is basically the same.
The behavior for test_fgets(s, 1) is wrong. glibc 2.19 gives the correct output (retval!=null, s[0]==null).
The behavior for test_fgets(s,0) is undefined, really. It wasn't successful (you can't read at most -1 characters), but it doesn't hit either of the two 'return null' criteria (EOF& 0 read; read error).
However, GCC's behavior is arguably correct (returning the pointer to the unchanged s would also be OK) - feof isn't set, because it hasn't hit eof; ferror isn't set because there wasn't a read error.
I suspect the logic in gcc (not got the source to hand) has an 'if n<=0 return null' near the top.
[edit:]
On reflection, I actually think that glibc's behavior for n=0 is the most correct response it could give:
No eof read, so feof()==0
No reads, so no read error could have happened, so ferror=0
Now as for the return value
- fgets cannot have read -1 characters (it's impossible). If fgets returned the passed in pointer back, it would look like a successful call.
- Ignoring this corner case, fgets commits to returning a null-terminated string. If it didn't in this case, you couldn't rely on it. But fgets will set the character after the last character read into the array to null. Given that we (apparently) read in -1 characters on this call, would that mean setting the 0th character to null?
So, the sanest choice is to return null (in my opinion).
The C Standard (C11 n1570 draft) specifies fgets() this way (some emphasis mine):
7.21.7.2 The fgets function
Synopsis
#include <stdio.h>
char *fgets(char * restrict s, int n,
FILE * restrict stream);
Description
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
Returns
The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
The phrase reads at most one less than the number of characters specified by n is not precise enough. A negative number cannot represent a number of characters*, but 0 does mean no characters. Reading at most -1 characters does not seem possible, so the case of n <= 0 is not defined by the Standard, and as such has undefined behavior.
For n = 1, fgets is specified as reading at most 0 characters, which it should succeed at unless the stream is invalid or in an error condition. The phrase A null character is written immediately after the last character read into the array is ambiguous, as no characters have been read into the array, but it makes sense to interpret this special case as meaning s[0] = '\0';. The specification for gets_s offers the same reading, with the same imprecision. Again the behavior is not explicitly defined, so it is undefined [1].
The specification of snprintf is more precise, the case of n = 0 is explicitly specified, with useful semantics attached. Unfortunately, such semantics cannot be implemented for fgets:
7.21.6.5 The snprintf function
Synopsis
#include <stdio.h>
int snprintf(char * restrict s, size_t n,
const char * restrict format, ...);
Description
The snprintf function is equivalent to fprintf, except that the output is written into an array (specified by argument s) rather than to a stream. If n is zero, nothing is written, and s may be a null pointer. Otherwise, output characters beyond the n-1st are discarded rather than being written to the array, and a null character is written at the end of the characters actually written into the array. If copying takes place between objects that overlap, the behavior is undefined.
The specification for gets_s() also clarifies the case of n = 0 and makes it a runtime constraint violation:
K.3.5.4.1 The gets_s function
Synopsis
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);
Runtime-constraints
s shall not be a null pointer. n shall neither be equal to zero nor be greater than RSIZE_MAX. A new-line character, end-of-file, or read error shall occur within reading n-1 characters from stdin.
If there is a runtime-constraint violation, s[0] is set to the null character, and characters are read and discarded from stdin until a new-line character is read, or end-of-file or a read error occurs.
Description
The gets_s function reads at most one less than the number of characters specified by n from the stream pointed to by stdin, into the array pointed to by s. No additional characters are read after a new-line character (which is discarded) or after end-of-file. The discarded new-line character does not count towards number of characters read. A null character is written immediately after the last character read into the array.
If end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then s[0] is set to the null character, and the other elements of s take unspecified values.
Recommended practice
The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.
Returns
The gets_s function returns s if successful. If there was a runtime-constraint violation, or if end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then a null pointer is returned.
The C library you are testing seems to have a bug for this case, which was fixed in later versions of glibc. Returning NULL should mean some kind of failure condition (the opposite of success): end-of-file or read error. Other cases such as an invalid stream or a stream not open for reading are more or less explicitly described as undefined behavior.
The cases of n = 0 and n < 0 are not defined. Returning NULL is a sensible choice, but it would be useful to clarify the description of fgets() in the Standard to require n > 0 as is the case for gets_s.
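Until the Standard is clarified, portable code can pin these corner cases down itself. The following is a sketch of one possible policy, not something any answer here prescribes; the wrapper name checked_fgets is hypothetical:

```c
#include <stdio.h>

/* Hypothetical wrapper around fgets with explicit corner-case handling:
 * - n <= 0 (or null arguments): return NULL and leave the buffer alone;
 * - n == 1: store an empty string and report success without reading;
 * - otherwise: defer to fgets. */
char *checked_fgets(char *s, int n, FILE *stream)
{
    if (s == NULL || stream == NULL || n <= 0)
        return NULL;

    if (n == 1) {
        s[0] = '\0';    /* room for the terminator only */
        return s;
    }

    return fgets(s, n, stream);
}
```

Note that a NULL result for n <= 0 still comes with feof() and ferror() both clear, so callers that need to tell the cases apart have to check n themselves.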
Note that there is another specification issue for fgets: the type of the n argument should have been size_t instead of int, but this function was originally specified by the C authors before size_t was even invented, and kept unchanged in the first C Standard (C89). Changing it then was considered unacceptable because they were trying to standardize existing usage: the signature change would have created inconsistencies across C libraries and broken well written existing code that uses function pointers or unprototyped functions.
[1] The C Standard specifies in paragraph 2 of 4. Conformance that: If a “shall” or “shall not” requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words “undefined behavior” or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe “behavior that is undefined”.
I've seen several usage of fgets (for example, here) that go like this:
char buff[7]="";
(...)
fgets(buff, sizeof(buff), stdin);
The interest being that, if I supply a long input like "aaaaaaaaaaa", fgets will truncate it to "aaaaaa" here, because the 7th character will be used to store '\0'.
However, when doing this:
int i=0;
for (i=0;i<7;i++)
{
buff[i]='a';
}
printf("%s\n",buff);
I will always get 7 'a's, and the program will not crash. But if I try to write 8 'a's, it will.
As I saw it later, the reason for this is that, at least on my system, when I allocate char buff[7] (with or without =""), the 8th byte (counting from 1, not from 0) gets set to 0. From what I guess, things are done like this precisely so that a for loop with 7 writes, followed by a string formatted read, could succeed, whether the last character to be written was '\0' or not, and thus avoiding the need for the programmer to set the last '\0' himself, when writing chars individually.
From this, it follows that in the case of
fgets(buff, sizeof(buff), stdin);
and then providing an input that is too long, the resulting buff string will automatically have two '\0' characters: one inside the array, and one right after it that was written by the system.
I have also observed that doing
fgets(buff,(sizeof(buff)+17),stdin);
will still work, and output a very long string, without crashing. From what I guessed, this is because fgets will keep writing until sizeof(buff)+17, and the last char to be written will precisely be a '\0', ensuring that any forthcoming string reading process would terminate properly (although the memory is messed up anyway).
But then, what about fgets(buff, (sizeof(buff)+1),stdin);? This would use up all the space that was rightfully allocated in buff, and then write a '\0' right after it, thus overwriting...the '\0' previously written by the system. In other words, yes, fgets would go out of bounds, but it can be proven that when adding only one to the length of the write, the program will never crash.
So in the end, here comes the question: why does fgets always terminate its write with a '\0', when another '\0', placed by the system right after the array, already exists? Why not do as in the one-by-one for-loop write, which can access the whole of the array and write anything the programmer wants, without endangering anything?
Thank you very much for your answer!
EDIT: indeed, there is no proof possible, as long as I do not know whether this 8th '\0' that mysteriously appears upon allocation of buff[7], is part of the C standard or not, specifically for string arrays. If not, then...it's just luck that it works :-)
but it can be proven that when adding only one to the length of the write, the program will never crash.
No! You can't prove that! Not in the sense of a mathematical proof. You have only shown that on your system, with your compiler, with those particular compiler settings you used, with particular environment configuration, it might not crash. This is far from a mathematical proof!
In fact, although the C standard guarantees that you can take the address of "one place after the last element of an array", it also states that dereferencing that address (i.e. trying to read or write through it) is undefined behaviour.
That means that an implementation can do anything in this case. It can even do what you expect with naive reasoning (i.e. work, but by sheer luck), but it may also crash, or it may format your HD (if you are very, very unlucky). This is especially true when writing system software (e.g. a device driver or a program running on the bare metal), i.e. when there is no OS to shield you from the nastiest consequences of writing bad code!
Edit This should answer the question made in a comment (C99 draft standard):
7.19.7.2 The fgets function
Synopsis
#include <stdio.h>
char *fgets(char * restrict s, int n,
FILE * restrict stream);
Description
The fgets function reads at most one less than the number of characters specified by n
from the stream pointed to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is retained) or after end-of-file. A
null character is written immediately after the last character read into the array.
Returns
The fgets function returns s if successful. If end-of-file is encountered and no
characters have been read into the array, the contents of the array remain unchanged and a
null pointer is returned. If a read error occurs during the operation, the array contents are
indeterminate and a null pointer is returned.
Edit: Since it seems that the problem lies in a misunderstanding of what a string is, this is the relevant excerpt from the standard (emphasis mine):
7.1.1 Definitions of terms
A string is a contiguous sequence of characters terminated by and including the first null
character. The term multibyte string is sometimes used instead to emphasize special
processing given to multibyte characters contained in the string or to avoid confusion
with a wide string. A pointer to a string is a pointer to its initial (lowest addressed)
character. The length of a string is the number of bytes preceding the null character and
the value of a string is the sequence of the values of the contained characters, in order.
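To make that definition concrete, here is a small sketch. The helper names holds_string and fill_string are made up for illustration. The first checks whether an array actually contains a string (that is, a null character within its own bounds); the second fills an array while leaving room for the terminator:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first `size` bytes of buf contain a null character,
   i.e. the array holds a string in the standard's sense; 0 otherwise. */
int holds_string(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Writes size-1 copies of c followed by a null character, so every
   write stays inside the array. Does nothing if size is 0. */
void fill_string(char *buf, size_t size, char c)
{
    if (size == 0)
        return;
    memset(buf, c, size - 1);
    buf[size - 1] = '\0';
}
```

For char buff[7], writing 'a' into all 7 elements leaves holds_string(buff, sizeof buff) false, so passing buff to printf("%s") reads past the array. After fill_string(buff, sizeof buff, 'a') the array holds a 6-character string and nothing outside it is relied upon.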
From C11 standard draft:
The fgets function reads at most one less than the number of characters specified by n
from the stream pointed to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is retained) or after end-of-file. A
null character is written immediately after the last character read into the array.
The fgets function returns s if successful. If end-of-file is encountered and no
characters have been read into the array, the contents of the array remain unchanged and a
null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
The behaviour you describe is undefined.
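A defined way to handle over-long input is to pass the real array size and look for the retained new-line. The sketch below assumes a hypothetical helper read_line and status enum, neither of which comes from the standard or from the question:

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_COMPLETE, LINE_TRUNCATED, LINE_NONE };

/* Hypothetical helper: reads at most size-1 characters into buf with
   fgets() and reports whether a whole line fit. */
enum line_status read_line(char *buf, int size, FILE *stream)
{
    if (size <= 0 || fgets(buf, size, stream) == NULL)
        return LINE_NONE;           /* nothing read: end-of-file or error */

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        return LINE_COMPLETE;       /* new-line retained: whole line read */
    if (feof(stream))
        return LINE_COMPLETE;       /* input ended before any new-line */
    return LINE_TRUNCATED;          /* buffer filled first; more may follow */
}
```

With char buff[7] and the input "aaaaaaaaaaa" followed by a new-line, the first call stores "aaaaaa" and returns LINE_TRUNCATED; calling again picks up the remainder, and no byte outside buff is ever written.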
What is the role of 1 and 2 in these snprintf functions? Could anyone please explain it
snprintf(argv[arg++], strlen(pbase) + 2 + strlen("ivlpp"), "%s%ccivlpp", pbase, sep);
snprintf(argv[arg++], strlen(defines_path) + 1, "-F\"%s\"", defines_path);
The +2 is meant to allow for the terminating null and the character inserted by the %c format, which would be exactly the right amount of space for formatting the first string. But (as 6502 points out), the size actually passed is one byte short, because the strlen("ivlpp") doesn't match the civlpp in the format itself. This means that the last character (the second 'p') will be truncated in the output.
The +1 likewise causes snprintf() to truncate the formatted data. The format string contains 4 literal characters (-F and the two double quotes), and you need to allow for the terminating null, so the code should pass strlen(defines_path)+5. As it is, snprintf() truncates the data, leaving off the last 4 characters.
I'm dubious about whether the code really works reliably...the memory allocation is not shown, but will have to be quite complex - or it will have to over-allocate to ensure that there is no danger of buffer overflow.
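One way to avoid hand-counting the literal characters is to let snprintf() measure the output first. This is only a sketch: make_define_flag is a hypothetical helper, and it assumes a C99-conforming snprintf() (a zero size with a null pointer is allowed there, but not by some older non-conforming libraries):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd string holding -F"<defines_path>",
   or NULL on failure. The caller frees the result. */
char *make_define_flag(const char *defines_path)
{
    /* With size 0, snprintf() stores nothing and returns the length needed. */
    int len = snprintf(NULL, 0, "-F\"%s\"", defines_path);
    if (len < 0)
        return NULL;

    size_t size = (size_t)len + 1;      /* +1 for the terminating null */
    char *out = malloc(size);
    if (out != NULL)
        snprintf(out, size, "-F\"%s\"", defines_path);
    return out;
}
```

The measured length comes out as strlen(defines_path) + 4, matching the hand count above, and the allocation and the size argument can no longer drift apart when the format string changes.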
Since a comment from the OP says:
I don't know the use of snprintf()
int snprintf(char *restrict s, size_t n, const char *restrict format, ...);
The snprintf() function formats data like printf(), but it writes it to a string (the s in the name) instead of to a file. The first n in the name indicates that the function is told exactly how long the string is, and snprintf() therefore ensures that the output data is null terminated (unless the length is 0). It reports how long the string should have been; if the reported value is longer than the value provided, you know the data got truncated.
So, overall, snprintf() is a relatively safe way of formatting strings, provided you use it correctly. The examples in the question do not demonstrate 'using it correctly'.
One gotcha: if you work on MS Windows, be aware that the MSVC implementation of snprintf() does not exactly follow the C99 standard (and it looks a bit as though MS no longer provides snprintf() at all; only various alternatives such as _snprintf()). I forget the exact deviation, but I think it means that the string is not properly null-terminated in all circumstances when it should be longer than the space provided.
With locally defined arrays, you normally use:
nbytes = snprintf(buffer, sizeof(buffer), "format...", ...);
With dynamically allocated memory, you normally use:
nbytes = snprintf(dynbuffer, dynbuffsize, "format...", ...);
In both cases, you check whether nbytes contains a non-negative value less than the size argument; if it does, your data is OK; if the value is equal to or larger, then your data got chopped (and you know how much space you needed to allocate).
The C99 standard says:
The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a negative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.
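Applied to the "blah %s" call from the question, that check might look like the sketch below. The function name format_blah and its 0 / 1 / -1 return convention are invented for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: formats "blah <somestr>" into dst.
   Returns 0 if the whole string fit, 1 if it was truncated, -1 on error. */
int format_blah(char *dst, size_t size, const char *somestr)
{
    int nbytes = snprintf(dst, size, "blah %s", somestr);
    if (nbytes < 0)
        return -1;                  /* encoding error */
    if ((size_t)nbytes >= size)
        return 1;                   /* output was cut to size-1 characters */
    return 0;                       /* complete and null-terminated */
}
```

With char dst[10], a short argument such as "x" fits and yields 0; a longer one yields 1 and, with a conforming snprintf(), leaves the first 9 characters plus a terminating null in dst.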
The programmer whose code you are reading doesn't know how to use snprintf properly. The second argument is the buffer size, so it should almost always look like this:
snprintf(buf, sizeof buf, "...", ...);
The above is for situations where buf is an array, not a pointer. In the latter case you have to pass the buffer size along:
snprintf(buf, bufsize, "...", ...);
Computing the buffer size is unneeded.
By the way, since you tagged the question as Qt-related: there is a very nice QString class that you should use instead.
At a first look both seem incorrect.
In the first case the correct computation would be path + sep + name + NUL, so 2 would seem OK, but for the name the strlen call uses ivlpp while the format string uses civlpp, which is one char longer.
In the second case the number of chars added is 4 (-F"") so the number to add should be 5 because of the ending NUL.