Real implementation where sizeof(size_t) < sizeof(unsigned int)

I know that the C standard allows for implementations where
(sizeof(unsigned) > sizeof(size_t))
or
(sizeof(int) > sizeof(ptrdiff_t))
is true. But are there any real implementations where one of these is true?
Background
I wrote a function similar to asprintf() (since asprintf() is not portable), and snprintf() returns an int but takes a size_t argument, so should I check whether leni (shown below) is at least SIZE_MAX in this code?
va_copy(atmp, args);
int leni = vsnprintf(NULL, 0, format, atmp); //get the size of the new string
va_end(atmp);
if (leni < 0)
    //do some error handling
if (leni >= SIZE_MAX) //do I need this part?
    //error handling
size_t lens = ((size_t)leni) + 1;
char *newString = malloc(lens);
if (!newString)
    //do some error handling
if (vsnprintf(newString, lens, format, args) != lens - 1)
    //do some error handling

While the standard doesn't guarantee that INT_MAX is smaller than SIZE_MAX, the function vsnprintf guarantees that the returned value will not be greater than SIZE_MAX.
If the function succeeds, then the return value must be less than its second argument [1]. This argument has the type size_t, thus the return value must be less than SIZE_MAX [2].
And if you're not convinced, you can always use a preprocessor directive that evaluates INT_MAX > SIZE_MAX, and only then include the code that checks the result of vsnprintf, as in the sketch below.
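A minimal sketch of that approach (assuming the usual INT_MAX and SIZE_MAX macros from <limits.h> and <stdint.h>, and the leni variable from the question's code):

#include <limits.h>
#include <stdint.h>   /* SIZE_MAX */

/* ... after: int leni = vsnprintf(NULL, 0, format, atmp); ... */
#if INT_MAX > SIZE_MAX
    /* int can hold values a size_t cannot on this implementation */
    if (leni >= (int)SIZE_MAX) {
        /* error handling: leni + 1 would not fit in a size_t */
    }
#endif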
[1] The identifier n mentioned in the standard citation below is the second argument to vsnprintf.
[2] (Quoted from ISO/IEC 9899:201x §7.21.6.12 The vsnprintf function, ¶3)
The vsnprintf function returns the number of characters that would have been written had n been sufficiently large, not counting the terminating null character, or a negative value if an encoding error occurred. Thus, the null-terminated output has been completely written if and only if the returned value is nonnegative and less than n.

Related

Why does snprintf() take a size_t size limit, but returns an int number of chars printed?

The venerable snprintf() function...
int snprintf( char *restrict buffer, size_t bufsz, const char *restrict format, ... );
returns the number of characters it prints, or rather, the number it would have printed had it not been for the buffer size limit.
takes the size of the buffer in characters/bytes.
How does it make sense for the buffer size to be size_t, but for the return type to be only an int?
If snprintf() is supposed to be able to print more than INT_MAX characters into the buffer, surely it must return an ssize_t or a size_t with (size_t) - 1 indicating an error, right?
And if it is not supposed to be able to print more than INT_MAX characters, why is bufsz a size_t rather than, say, an unsigned or an int? Or - is it at least officially constrained to hold values no larger than INT_MAX?
printf predates the existence of size_t and similar "portable" types -- when printf was first standardized, the result of a sizeof was an int.
This is also the reason why the argument read from the printf argument list for a * width or precision in the format is an int rather than a size_t.
snprintf is more recent, so the size it takes as an argument was defined to be a size_t, but the return value was kept as an int to make it the same as printf and sprintf.
Note that you can print more than INT_MAX characters with these functions, but if you do, the return value is unspecified. On most platforms, an int and a size_t will both be returned in the same way (in the primary return value register), it is just that a size_t value may be out of range for an int. So many platforms actually return a size_t (or ssize_t) from all of these routines and things being out of range will generally work out ok, even though the standard does not require it.
The discrepancy between size and return has been discussed in the standards group in the thread https://www.austingroupbugs.net/view.php?id=761. Here is the conclusion posted at the end of that thread:
Further research has shown that the behavior when the return value would overflow int was clarified by WG14 in C99 by adding it into the list of undefined behaviors in Annex J. It was updated in C11 to the following text:
"J.2 Undefined behavior
The behavior is undefined in the following circumstances:
[skip]
— The number of characters or wide characters transmitted by a formatted output function (or written to an array, or that would have been written to an array) is greater than INT_MAX (7.21.6.1, 7.29.2.1)."
Please note that this description does not mention the size argument of snprintf or the size of the buffer.
How does it make sense for the buffer size to be size_t, but for the return type to be only an int?
The official C99 rationale document does not discuss these particular considerations, but presumably it's for consistency and (separate) ideological reasons:
all of the printf-family functions return an int with substantially the same significance. This was defined (for the original printf, fprintf, and sprintf) well before size_t was invented.
type size_t is in some sense the correct type for conveying sizes and lengths, so it was used for the second arguments to snprintf and vsnprintf when those were introduced (along with size_t itself) in C99.
If snprintf() is supposed to be able to print more than INT_MAX characters into the buffer, surely it must return an ssize_t or a size_t with (size_t) - 1 indicating an error, right?
That would be a more internally-consistent design choice, but nope. Consistency across the function family seems to have been chosen instead. Note that none of the functions in this family have documented limits on the number of characters they can output, and their general specification implies that there is no inherent limit. Thus, they all suffer from the same issue with very long outputs.
And if it is not supposed to be able to print more than INT_MAX characters, why is bufsz a size_t rather than, say, an unsigned or an int? Or - is it at least officially constrained to hold values no larger than INT_MAX?
There is no documented constraint on the value of the second argument, other than the implicit one that it must be representable as a size_t. Not even in the latest version of the standard. But note that there is also nothing that says that type int cannot represent all the values that are representable by size_t (though indeed it can't in most implementations).
So yes, implementations will have trouble behaving according to the specifications when very large data are output via these functions, where "very large" is implementation-dependent. As a practical matter, then, one should not rely on using them to emit very large outputs in a single call (unless one intends to ignore the return value).
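In practice that usually means checking the return value against the buffer size rather than trusting it for huge outputs. A minimal sketch of the usual idiom (build_path, dir, and name are hypothetical, not from the question):

#include <stdio.h>

void build_path(const char *dir, const char *name)
{
    char buf[256];
    int r = snprintf(buf, sizeof buf, "%s/%s", dir, name);
    if (r < 0) {
        /* encoding error */
    } else if ((size_t)r >= sizeof buf) {
        /* output did not fit: buf holds a truncated, null-terminated string */
    } else {
        /* buf holds the complete string of length r */
    }
}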
If snprintf() is supposed to be able to print more than INT_MAX characters into the buffer, surely it must return an ssize_t or a size_t with (size_t) - 1 indicating an error, right?
Not quite.
C also has an Environmental limit for fprintf() and friends.
"The number of characters that can be produced by any single conversion shall be at least 4095." C17dr § 7.21.6.1 15
Anything over 4095 per % conversion risks portability, so int, even at 16 bits (INT_MAX = 32767), suffices for most purposes in portable code.
Note: ssize_t is not part of the C spec.

Assign result of sizeof() to ssize_t

It happened to me that I needed to compare the result of sizeof(x) to a ssize_t.
Of course GCC gave an error (lucky me (I used -Wall -Wextra -Werror)), and I decided to do a macro to have a signed version of sizeof().
#define ssizeof (ssize_t)sizeof
And then I can use it like this:
for (ssize_t i = 0; i < ssizeof(x); i++)
The problem is, do I have any guarantees that SSIZE_MAX >= SIZE_MAX? I imagine that sadly this is never going to be true.
Or at least that sizeof(ssize_t) == sizeof(size_t), which would cut half of the values but would still be close enough.
I didn't find any relation between ssize_t and size_t in the POSIX documentation.
Related question:
What type should be used to loop through an array?
There is no guarantee that SSIZE_MAX >= SIZE_MAX. In fact, it is very unlikely to be the case, since size_t and ssize_t are likely to be corresponding unsigned and signed types, so (on all actual architectures) SIZE_MAX > SSIZE_MAX. Casting an unsigned value to a signed type which cannot hold that value is Undefined Behaviour. So technically, your macro is problematic.
In practice, at least on 64-bit platforms, you're unlikely to get into trouble if the value you are converting to ssize_t is the size of an object which actually exists. But if the object is theoretical (eg sizeof(char[3][1ULL<<62])), you might get an unpleasant surprise.
Note that the only valid negative value of type ssize_t is -1, which is an error indication. You might be confusing ssize_t, which is defined by Posix, with ptrdiff_t, which is defined in standard C since C99. These two types are the same on most platforms, and are usually the signed integer type corresponding to size_t, but none of those behaviours is guaranteed by either standard. However, the semantics of the two types are different, and you should be aware of that when you use them:
ssize_t is returned by a number of Posix interfaces in order to allow the function to signal either a number of bytes processed or an error indication; the error indication must be -1. There is no expectation that any possible size will fit into ssize_t; the Posix rationale states that:
A conforming application would be constrained not to perform I/O in pieces larger than {SSIZE_MAX}.
This is not a problem for most of the interfaces which return ssize_t because Posix generally does not require interfaces to guarantee to process all data. For example, both read and write accept a size_t which describes the length of the buffer to be read/written and return an ssize_t which describes the number of bytes actually read/written; the implication is that no more than SSIZE_MAX bytes will be read/written even if more data were available. However, the Posix rationale also notes that a particular implementation may provide an extension which allows larger blocks to be processed ("a conforming application using extensions would be able to use the full range if the implementation provided an extended range"), the idea being that the implementation could, for example, specify that return values other than -1 were to be interpreted by casting them to size_t. Such an extension would not be portable; in practice, most implementations do limit the number of bytes which can be processed in a single call to the number which can be reported in ssize_t.
ptrdiff_t is (in standard C) the type of the result of the difference between two pointers. In order for subtraction of pointers to be well defined, the two pointers must refer to the same object, either by pointing into the object or by pointing at the byte immediately following the object. The C committee recognised that if ptrdiff_t is the signed equivalent of size_t, then it is possible that the difference between two pointers might not be representable, leading to undefined behaviour, but they preferred that to requiring that ptrdiff_t be a larger type than size_t. You can argue with this decision -- many people have -- but it has been in place since C90 and it seems unlikely that it will change now. (Current standard wording, §6.5.6/9: "If the result is not representable in an object of that type [ptrdiff_t], the behavior is undefined.")
As with Posix, the C standard does not define undefined behaviour, so it would be a mistake to interpret that as forbidding the subtraction of two pointers in very large objects. An implementation is always allowed to define the result of behaviour left undefined by the standard, so that it is completely valid for an implementation to specify that if P and Q are two pointers to the same object where P >= Q, then (size_t)(P - Q) is the mathematically correct difference between the pointers even if the subtraction overflows. Of course, code which depends on such an extension won't be fully portable, but if the extension is sufficiently common that might not be a problem.
As a final point, the ambiguity of using -1 both as an error indication (in ssize_t) and as a possibly castable result of pointer subtraction (in ptrdiff_t) is not likely to be present in practice provided that size_t is as large as a pointer. If size_t is as large as a pointer, the only way that the mathematically correct value of P-Q could be (size_t)(-1) (aka SIZE_MAX) is if the object that P and Q refer to is of size SIZE_MAX, which, given the assumption that size_t is the same width as a pointer, implies that the object plus the following byte occupy every possible pointer value. That contradicts the requirement that some pointer value (NULL) be distinct from any valid address, so we can conclude that the true maximum size of an object must be less than SIZE_MAX.
Please note that you can't actually do this.
The largest possible object in x86 Linux is just below 0xB0000000 in size, while SSIZE_MAX is 0x7FFFFFFF.
I haven't checked whether read and friends can actually handle the largest possible objects, but if they can, it would work like this:
ssize_t result = read(fd, buf, count);
if (result != -1) {
    size_t offset = (size_t) result;
    /* handle success */
} else {
    /* handle failure */
}
You may find libc is busted. If so, this would work if the kernel is good:
ssize_t result = sys_read(fd, buf, count);
if (result >= 0 || result < -256) {
    size_t offset = (size_t) result;
    /* handle success */
} else {
    errno = (int)-result;
    /* handle failure */
}
ssize_t is a POSIX type, it's not defined as part of the C standard. POSIX defines that ssize_t must be able to handle numbers in the interval [-1, SSIZE_MAX], so in principle it doesn't even need to be a normal signed type. The reason for this slightly weird definition is that the only place ssize_t is used is as the return value for read/write/etc. functions.
In practice it's always a normal signed type of the same size as size_t. But if you want to be really pedantic about your types, you shouldn't use it for other purposes than handling return values for IO syscalls. For a general "pointer-sized" signed integer type C89 defines ptrdiff_t. Which in practice will be the same as ssize_t.
Also, if you look at the official spec for read(), you'll see that for the 'nbyte' argument it says that 'If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined.'. So even if a size_t is capable of representing larger values than SSIZE_MAX, it's implementation-defined behavior to use larger values than that for the IO syscalls (the only places where ssize_t is used, as mentioned). And similar for write() etc.
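Given that, a cautious caller can simply cap each request at SSIZE_MAX. A minimal sketch (read_capped is a hypothetical wrapper, assuming the POSIX read(), ssize_t, and SSIZE_MAX declarations):

#include <limits.h>   /* SSIZE_MAX */
#include <unistd.h>   /* read, ssize_t */

/* Issue read() with a count no larger than SSIZE_MAX so the result
   never falls into implementation-defined territory. */
ssize_t read_capped(int fd, void *buf, size_t len)
{
    if (len > (size_t)SSIZE_MAX)
        len = (size_t)SSIZE_MAX;
    return read(fd, buf, len);
}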
I'm gonna take this on as an X-Y problem. The issue you have is that you want to compare a signed number to an unsigned number. Rather than casting the result of sizeof to ssize_t, you should check whether your ssize_t value is less than zero. If it is, then you know it is less than your size_t value. If not, then you can cast it to size_t and then do the comparison.
For an example, here's a compare function that returns -1 if the signed number is less than the unsigned number, 0 if equal, or 1 if the signed number is greater than the unsigned number:
int compare(ssize_t signed_number, size_t unsigned_number) {
    int ret;
    if (signed_number < 0 || (size_t) signed_number < unsigned_number) {
        ret = -1;
    }
    else {
        ret = (size_t) signed_number > unsigned_number;
    }
    return ret;
}
If all you wanted was the equivalent of the < operation, you can go a bit simpler with something like this:
(signed_number < 0 || (size_t) signed_number < unsigned_number)
That line will give you 1 if signed_number is less than unsigned_number and it limits the branching overhead. Just takes an extra < operation and a logical-OR.
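A hypothetical use, rewriting the loop from the question in terms of compare() so that sizeof never has to be cast to a signed type (x is the array from the question):

for (ssize_t i = 0; compare(i, sizeof(x)) < 0; i++) {
    /* ... */
}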

What can I assume about the behaviour of atoi() on error?

The standard C library function atoi is documented in ISO 9899:2011 as:
7.22.1 Numeric conversion functions
1 The functions atof, atoi, atol, and atoll need not affect the value of the integer expression errno on an error. If the value of the result cannot be represented, the behavior is undefined.
...
7.22.1.2 The atoi, atol, and atoll functions
Synopsis
#include <stdlib.h>
int atoi(const char *nptr);
long int atol(const char *nptr);
long long int atoll(const char *nptr);
Description
2 The atoi, atol, and atoll functions convert the initial portion of the string pointed to by nptr to int, long int, and long long int representation, respectively. Except for the behavior on error, they are equivalent to
atoi: (int)strtol(nptr, (char **)NULL, 10)
atol: strtol(nptr, (char **)NULL, 10)
atoll: strtoll(nptr, (char **)NULL, 10)
Returns
3 The atoi, atol, and atoll functions return the converted value.
What is the intended behaviour when the string pointed to by nptr cannot be parsed as an integer? The following four opinions seem to exist:
1. No conversion is performed and zero is returned. This is the documentation given by some references like this one.
2. Behaviour is like that of strtol except that errno might not be set. This emerges from taking “Except for the behavior on error” as a reference to §7.22.1 ¶1.
3. Behaviour is unspecified. This is what POSIX says:
The call atoi(str) shall be equivalent to:
(int) strtol(str, (char **)NULL, 10)
except that the handling of errors may differ. If the value cannot be represented, the behavior is undefined.
Furthermore, the section Application Usage states:
The atoi() function is subsumed by strtol() but is retained because it is used extensively in existing code. If the number is not known to be in range, strtol() should be used because atoi() is not required to perform any error checking.
Note that POSIX claims that the specification is aligned to ISO 9899:1999 (which contains the same language as ISO 9899:2011 as far as I'm concerned):
The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of POSIX.1-2008 defers to the ISO C standard.
According to my local POSIX committee member, this is the historical behaviour of UNIX.
4. Behaviour is undefined. This interpretation arises because §7.22.1.2 ¶2 never explicitly says what happens on error. Behaviour that is neither defined nor explicitly implementation-defined nor unspecified is undefined.
Which of these interpretations is correct? Please try to refer to authoritative documentation.
What is the intended behaviour when the string pointed to by nptr cannot be parsed as an integer?
To be clear, this question applies to
// Case 1
value = atoi("");
value = atoi(" ");
value = atoi("wxyz");
and not the following:
// Case 2
// NULL does not point to a string
value = atoi(NULL);
// Convert the initial portion, yet has following junk
value = atoi("123xyz");
value = atoi("123 ");
And maybe (or maybe not) the following, depending on how integer is read.
// Case 3
// Can be parsed as an _integer_, yet overflows an `int`.
value = atoi("12345678901234567890123456789012345678901234567890");
The "non-Case 2" behavior of ato*() depends on the meaning of error in
The atoi, atol, and atoll functions convert the initial portion of the string pointed to by nptr to int, long int, and long long int representation, respectively. Except for the behavior on error, they are equivalent to
atoi: (int)strtol(nptr, (char **)NULL, 10)
...
C11dr §7.22.1.2 2
Certainly error includes Case 3: "If the correct value is outside the range of representable values". strto*(), though maybe not ato*(), in this case does set the error number errno defined in <errno.h>. Since the specification of ato*() says nothing about behavior on this error (overflow), the result is UB per
Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. C11dr §4 2
For Case 1, the behavior of strto*() is well defined and is not specified to affect errno. The spec goes into detail (§7.22.1.4 4) and calls these "no conversion", not an error. So it can be asserted that the Case 1 strto*() behavior is not an error, but a "no conversion". Thus per ...
"If no conversion could be performed, zero is returned. C11dr §7.22.1.4 8
... atoi("") must return 0.
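If the difference matters in your code, the usual portable route is to skip atoi and call strtol directly, checking both the end pointer and errno. A sketch (parse_int is a hypothetical helper, not anything from the standard):

#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Returns 0 on success and stores the value through out; returns -1 on
   "no conversion" (Case 1) or when the value does not fit in an int (Case 3).
   Trailing junk after the number is ignored, as with atoi (Case 2). */
int parse_int(const char *nptr, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(nptr, &end, 10);
    if (end == nptr)
        return -1;                              /* no conversion */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;                              /* out of range for int */
    *out = (int)v;
    return 0;
}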

Can an implementation that has sizeof (int) == 1 "fully conform"? [duplicate]

This question already has answers here:
Can sizeof(int) ever be 1 on a hosted implementation?
According to the C standard, any characters returned by fgetc are returned in the form of unsigned char values, "converted to an int" (that quote comes from the C standard, stating that there is indeed a conversion).
When sizeof (int) == 1, many unsigned char values are outside of range. It is thus possible that some of those unsigned char values might end up being converted to an int value (the result of the conversion being "implementation-defined or an implementation-defined signal is raised") of EOF, which would be returned despite the file not actually being in an erroneous or end-of-file state.
I was surprised to find that such an implementation actually exists. The TMS320C55x CCS manual documents UCHAR_MAX having a corresponding value of 65535, INT_MAX having 32767, fputs and fopen supporting binary mode... What's even more surprising is that it seems to describe the environment as a fully conforming, complete implementation (minus signals).
The C55x C/C++ compiler fully conforms to the ISO C standard as defined by the ISO specification ...
The compiler tools come with a complete runtime library. All library functions conform to the ISO C library standard. ...
Is such an implementation that can return a value indicating errors where there are none, really fully conforming? Could this justify using feof and ferror in the condition section of a loop (as hideous as that seems)? For example, while ((c = fgetc(stdin)) != EOF || !(feof(stdin) || ferror(stdin))) { ... }
The function fgetc() returns an int value in the range of unsigned char only when a proper character is read, otherwise it returns EOF which is a negative value of type int.
My original answer (I changed it) assumed that there was an integer conversion to int, but this is not the case, since actually the function fgetc() is already returning a value of type int.
I think that, to be conforming, the implementation has to make fgetc() return nonnegative values in the range of int, unless EOF is returned.
In this way, the range of values from 32768 to 65535 will never be associated with character codes in the TMS320C55x implementation.
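A minimal sketch of the kind of loop the question suggests (just copying stdin to stdout), relying on feof/ferror rather than on the EOF value alone:

#include <stdio.h>

int main(void)
{
    int c;
    /* Stop only when fgetc returns EOF *and* the stream reports end-of-file
       or an error; on a target where sizeof(int) == 1, a character value may
       compare equal to EOF without either condition holding. */
    while ((c = fgetc(stdin)) != EOF || (!feof(stdin) && !ferror(stdin))) {
        putchar(c);
    }
    return ferror(stdin) ? 1 : 0;
}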

strtol using errno

I have the following code:
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
int main(void)
{
    int data;
    char *tmp;
    data = strtol("23ef23", &tmp, 10);
    printf("%d", errno);
    getchar();
    return 0;
}
The output is 0 ...
Why?
I am using Visual Studio 2010 (C++).
The code must be C89 compatible.
strtol only sets errno for overflow conditions, not to indicate parsing failures. For that purpose, you have to check the value of the end pointer, but you need to store a pointer to the original string:
char const * const str = "blah";
char *endptr;
long n = strtol(str, &endptr, 0);
if (endptr == str) { /* no conversion was performed */ }
else if (*endptr == '\0') { /* the entire string was converted */ }
else { /* the unconverted rest of the string starts at endptr */ }
I think the only required error values are for underflow and overflow.
Conversely, if the entire string has been consumed in the conversion, you have *endptr == '\0', which may be an additional thing you might want to check.
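For overflow, which is the one error strtol is specified to report through errno, the same fragment can be extended (a sketch reusing str and endptr from above, with <errno.h> included; note that errno has to be cleared before the call, since no library function resets it to zero):

errno = 0;
long n = strtol(str, &endptr, 0);
if (endptr == str)        { /* no conversion was performed */ }
else if (errno == ERANGE) { /* out of range: n is LONG_MAX or LONG_MIN */ }
else                      { /* n holds the converted value */ }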
Your logic does not fit with the 'spec'; see the reference copied below.
An invalid value does not necessarily set errno.
(copy follows)
long int strtol ( const char * str, char ** endptr, int base );
Convert string to long integer
Parses the C string str interpreting its content as an integral number of the specified base, which is returned as a long int value.
The function first discards as many whitespace characters as necessary until the first non-whitespace character is found. Then, starting from this character, takes as many characters as possible that are valid following a syntax that depends on the base parameter, and interprets them as a numerical value. Finally, a pointer to the first character following the integer representation in str is stored in the object pointed by endptr.
If the value of base is zero, the syntax expected is similar to that of integer constants, which is formed by a succession of:
An optional plus or minus sign
An optional prefix indicating octal or hexadecimal base ("0" or "0x" respectively)
A sequence of decimal digits (if no base prefix was specified) or either octal or hexadecimal digits if a specific prefix is present
If the base value is between 2 and 36, the format expected for the integral number is a succession of the valid digits and/or letters needed to represent integers of the specified radix (starting from '0' and up to 'z'/'Z' for radix 36). The sequence may optionally be preceded by a plus or minus sign and, if base is 16, an optional "0x" or "0X" prefix.
If the first sequence of non-whitespace characters in str is not a valid integral number as defined above, or if no such sequence exists because either str is empty or it contains only whitespace characters, no conversion is performed.
Parameters
str
C string containing the representation of an integral number.
endptr
Reference to an object of type char*, whose value is set by the function to the next character in str after the numerical value.
This parameter can also be a null pointer, in which case it is not used.
Return Value
On success, the function returns the converted integral number as a long int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, LONG_MAX or LONG_MIN is returned, and the global variable errno is set to ERANGE.
It has been 10 years since the question was first posted, but the problem does not age. The answers given are either out of date (yet true for their time) or a bit confusing, because I had to search further.
I saw this in a book and came across this post while searching for its meaning; while checking the page for strtol, I ended up on the cplusplus.com page for the errno macro.
Your question has 2 parts to answer here:
First, let's make a note of these 2 things about errno:
1- errno can be anything during the execution of a program, because no function resets it (unless your own function does so)
errno is set to zero at program startup ...
any function ... can modify its value ...
no ... function sets its value back to zero
2- one has to reset it before calling a function that may use it.
should be reset ... to zero before the call ... since ... previous ... function may have altered its value
Your program is pretty small, so no function seems to be there to change it. The only things that touch errno are program startup, which sets it to zero, and strtol in case of an error.
Yet your program shows errno is 0, and this is confusing, because one expects that 23ef23 would not be converted to a number since it includes letters. However, this expectation is wrong: you actually get a number from this string, so there is really no error here and no change is made to errno. This is the second part of the answer.
You will find this definition on the strtol page:
... takes as many characters as possible that are valid following a syntax that depends on the base parameter, and interprets them as a numerical value ... a pointer to the first character following is stored.
Instead of a long explanation, the following print statement and its output will suffice to visualize the above definition:
printf("%d %d %s",data,errno,tmp);
23 0 ef23
If you set the base to 16, the output would be 2354979 0 . And base 2 would give 0 0 23ef23, showing that strtol will not freak out if it does not find a number. The only error it will give is ERANGE, for breaching the limits:
If the value read is out of the range of representable values by a long int, the function returns LONG_MAX or LONG_MIN (defined in <climits>), and errno is set to ERANGE.
You have to set errno to 0 before you call strtol. Otherwise you cannot tell whether a nonzero errno was left over from an earlier call or was set by strtol.
You have to check that tmp is not the same as the original string pointer.
If data == 0 and tmp still points to the start of the string, then the input data is in the incorrect format. errno need not be set by the implementation if the input data is not in the expected format.
On strtol, strtoll, strtoul, and strtoull functions C says:
(C99, 7.20.1.4p7) If the subject sequence is empty or does not have the expected form, no conversion is performed; the value of nptr is stored in the object pointed to by endptr, provided that endptr is not a null pointer.
(C99, 7.20.1.4p9) The strtol, strtoll, strtoul, and strtoull functions return the converted value, if any. If no conversion could be performed, zero is returned.

Resources