It happened that I needed to compare the result of sizeof(x) to an ssize_t.
Of course GCC gave an error (lucky me, I use -Wall -Wextra -Werror), so I decided to write a macro to get a signed version of sizeof():
#define ssizeof (ssize_t)sizeof
And then I can use it like this:
for (ssize_t i = 0; i < ssizeof(x); i++)
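For reference, here is the idea as a complete, compilable sketch. ssize_t is POSIX, not ISO C, so <sys/types.h> is assumed; the macro is written function-like here just so the cast binds tightly to one operand:

```c
#include <stddef.h>
#include <sys/types.h>   /* ssize_t (POSIX, not ISO C) */

/* signed wrapper around sizeof; the cast's safety is exactly
   what the rest of this question is about */
#define ssizeof(x) ((ssize_t)sizeof(x))

/* walk a small array with a signed index: no sign-compare warning */
static ssize_t count_bytes(void)
{
    char x[16];
    ssize_t n = 0;
    for (ssize_t i = 0; i < ssizeof(x); i++)
        n++;
    return n;
}
```
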
The problem is: do I have any guarantee that SSIZE_MAX >= SIZE_MAX? I imagine that, sadly, this is never going to be true.
Or at least that sizeof(ssize_t) == sizeof(size_t), which would cut the representable range in half but would still be close enough.
I didn't find any relation between ssize_t and size_t in the POSIX documentation.
Related question:
What type should be used to loop through an array?
There is no guarantee that SSIZE_MAX >= SIZE_MAX. In fact, it is very unlikely to be the case, since size_t and ssize_t are likely to be corresponding unsigned and signed types, so that (on all actual architectures) SIZE_MAX > SSIZE_MAX. Converting an unsigned value to a signed type which cannot represent that value is implementation-defined behaviour (per §6.3.1.3, either the result is implementation-defined or an implementation-defined signal is raised), so it is not something you can portably rely on. So technically, your macro is problematic.
In practice, at least on 64-bit platforms, you're unlikely to get into trouble if the value you are converting to ssize_t is the size of an object which actually exists. But if the object is theoretical (eg sizeof(char[3][1ULL<<62])), you might get an unpleasant surprise.
Note that the only valid negative value of type ssize_t is -1, which is an error indication. You might be confusing ssize_t, which is defined by Posix, with ptrdiff_t, which has been in standard C since C89. These two types are the same on most platforms, and are usually the signed integer type corresponding to size_t, but none of those behaviours is guaranteed by either standard. However, the semantics of the two types are different, and you should be aware of that when you use them:
ssize_t is returned by a number of Posix interfaces in order to allow the function to signal either a number of bytes processed or an error indication; the error indication must be -1. There is no expectation that any possible size will fit into ssize_t; the Posix rationale states that:
A conforming application would be constrained not to perform I/O in pieces larger than {SSIZE_MAX}.
This is not a problem for most of the interfaces which return ssize_t because Posix generally does not require interfaces to guarantee to process all data. For example, both read and write accept a size_t which describes the length of the buffer to be read/written and return an ssize_t which describes the number of bytes actually read/written; the implication is that no more than SSIZE_MAX bytes will be read/written even if more data were available. However, the Posix rationale also notes that a particular implementation may provide an extension which allows larger blocks to be processed ("a conforming application using extensions would be able to use the full range if the implementation provided an extended range"), the idea being that the implementation could, for example, specify that return values other than -1 were to be interpreted by casting them to size_t. Such an extension would not be portable; in practice, most implementations limit the number of bytes which can be processed in a single call to the number which can be reported in ssize_t.
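That contract is why real code loops over short reads and writes rather than assuming one call processes everything; a hedged sketch of the write side (write_all is a name invented for this example):

```c
#include <stddef.h>
#include <unistd.h>   /* write(), ssize_t (POSIX) */

/* Write the whole buffer, retrying on short writes.  Each individual
   write() may process at most SSIZE_MAX bytes, per the Posix contract
   discussed above.  Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1)
            return -1;        /* -1 is the only error indication */
        buf += n;             /* n is in [0, len], so this is safe */
        len -= (size_t)n;
    }
    return 0;
}
```
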
ptrdiff_t is (in standard C) the type of the result of the difference between two pointers. In order for subtraction of pointers to be well defined, the two pointers must refer to the same object, either by pointing into the object or by pointing at the byte immediately following the object. The C committee recognised that if ptrdiff_t is the signed equivalent of size_t, then it is possible that the difference between two pointers might not be representable, leading to undefined behaviour, but they preferred that to requiring that ptrdiff_t be a larger type than size_t. You can argue with this decision -- many people have -- but it has been in place since C90 and it seems unlikely that it will change now. (Current standard wording from C11, §6.5.6/9: "If the result is not representable in an object of that type [ptrdiff_t], the behavior is undefined.")
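A minimal illustration of the well-defined case (both pointers into the same array, with the one-past-the-end pointer allowed):

```c
#include <stddef.h>

/* Subtracting pointers into the same array yields a ptrdiff_t
   measured in elements, not bytes. */
static ptrdiff_t length_via_pointers(void)
{
    double a[100];
    double *first = &a[0];
    double *past_end = &a[100];   /* one past the end: valid to form */
    return past_end - first;      /* element count of the array */
}
```
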
As with Posix, the C standard does not define undefined behaviour, so it would be a mistake to interpret that as forbidding the subtraction of two pointers in very large objects. An implementation is always allowed to define the result of behaviour left undefined by the standard, so that it is completely valid for an implementation to specify that if P and Q are two pointers to the same object where P >= Q, then (size_t)(P - Q) is the mathematically correct difference between the pointers even if the subtraction overflows. Of course, code which depends on such an extension won't be fully portable, but if the extension is sufficiently common that might not be a problem.
As a final point, the ambiguity of using -1 both as an error indication (in ssize_t) and as a possibly castable result of pointer subtraction (in ptrdiff_t) is not likely to be a problem in practice provided that size_t is as large as a pointer. If size_t is as large as a pointer, the only way that the mathematically correct value of P-Q could be (size_t)(-1) (aka SIZE_MAX) is if the object that P and Q refer to is of size SIZE_MAX, which, given the assumption that size_t is the same width as a pointer, implies that the object plus the following byte occupy every possible pointer value. That contradicts the requirement that some pointer value (NULL) be distinct from any valid address, so we can conclude that the true maximum size of an object must be less than SIZE_MAX.
Please note that you can't actually do this.
The largest possible object in x86 Linux is just below 0xB0000000 in size, while SSIZE_MAX is 0x7FFFFFFF.
I haven't checked whether read and friends can actually handle the largest possible objects, but if they can, it would work like this:
ssize_t result = read(fd, buf, count);
if (result != -1) {
    size_t offset = (size_t) result;
    /* handle success */
} else {
    /* handle failure */
}
You may find libc is busted. If so, this would work if the kernel is good:
ssize_t result = sys_read(fd, buf, count);
if (result >= 0 || result < -256) {
    size_t offset = (size_t) result;
    /* handle success */
} else {
    errno = (int)-result;
    /* handle failure */
}
ssize_t is a POSIX type; it's not defined as part of the C standard. POSIX defines that ssize_t must be able to handle numbers in the interval [-1, SSIZE_MAX], so in principle it doesn't even need to be a normal signed type. The reason for this slightly weird definition is that the only place ssize_t is used is as the return value for read/write/etc. functions.
In practice it's always a normal signed type of the same size as size_t. But if you want to be really pedantic about your types, you shouldn't use it for purposes other than handling return values of IO syscalls. For a general "pointer-sized" signed integer type, C89 defines ptrdiff_t, which in practice will be the same as ssize_t.
Also, if you look at the official spec for read(), you'll see that for the 'nbyte' argument it says that 'If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined.'. So even if a size_t is capable of representing larger values than SSIZE_MAX, it's implementation-defined behavior to use larger values than that for the IO syscalls (the only places where ssize_t is used, as mentioned). And similar for write() etc.
I'm gonna take this on as an X-Y problem. The issue you have is that you want to compare a signed number to an unsigned number. Rather than casting the result of sizeof to ssize_t, you should check whether your ssize_t value is less than zero. If it is, then you know it is less than your size_t value. If not, then you can safely cast it to size_t and do the comparison.
For an example, here's a compare function that returns -1 if the signed number is less than the unsigned number, 0 if equal, or 1 if the signed number is greater than the unsigned number:
int compare(ssize_t signed_number, size_t unsigned_number) {
    int ret;
    if (signed_number < 0 || (size_t) signed_number < unsigned_number) {
        ret = -1;
    }
    else {
        ret = (size_t) signed_number > unsigned_number;
    }
    return ret;
}
If all you wanted was the equivalent of < operation, you can go a bit simpler with something like this:
(signed_number < 0 || (size_t) signed_number < unsigned_number)
That expression yields 1 if signed_number is less than unsigned_number, and it limits the branching overhead: just one extra < operation and a logical OR.
Related
The venerable snprintf() function...
int snprintf( char *restrict buffer, size_t bufsz, const char *restrict format, ... );
returns the number of characters it prints, or rather, the number it would have printed had it not been for the buffer size limit.
takes the size of the buffer in characters/bytes.
How does it make sense for the buffer size to be size_t, but for the return type to be only an int?
If snprintf() is supposed to be able to print more than INT_MAX characters into the buffer, surely it must return an ssize_t or a size_t with (size_t)-1 indicating an error, right?
And if it is not supposed to be able to print more than INT_MAX characters, why is bufsz a size_t rather than, say, an unsigned or an int? Or is it at least officially constrained to hold values no larger than INT_MAX?
printf predates the existence of size_t and similar "portable" types -- when printf was first standardized, the result of a sizeof was an int.
This is also the reason why the argument in the printf argument list read for a * width or precision in the format is an int rather than a size_t.
snprintf is more recent, so the size it takes as an argument was defined to be a size_t, but the return value was kept as an int to make it the same as printf and sprintf.
Note that you can print more than INT_MAX characters with these functions, but if you do, the return value is unspecified. On most platforms, an int and a size_t will both be returned in the same way (in the primary return value register), it is just that a size_t value may be out of range for an int. So many platforms actually return a size_t (or ssize_t) from all of these routines and things being out of range will generally work out ok, even though the standard does not require it.
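In practice, then, a caller has to treat the int result three ways: error, truncated, or complete. A sketch (format_checked is an illustrative helper, not a standard function):

```c
#include <stdio.h>

/* Format "hello, <name>" into buf of size bufsz.
   Returns 0 on success, 1 if the output was truncated,
   -1 on an encoding error -- the three cases the int
   return value of snprintf distinguishes. */
static int format_checked(char *buf, size_t bufsz, const char *name)
{
    int n = snprintf(buf, bufsz, "hello, %s", name);
    if (n < 0)
        return -1;           /* encoding error */
    if ((size_t)n >= bufsz)
        return 1;            /* n is the length it *wanted* to write */
    return 0;
}
```
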
The discrepancy between size and return has been discussed in the standards group in the thread https://www.austingroupbugs.net/view.php?id=761. Here is the conclusion posted at the end of that thread:
Further research has shown that the behavior when the return value would overflow int was clarified by WG14 in C99 by adding it into the list of undefined behaviors in Annex J. It was updated in C11 to the following text:
"J.2 Undefined behavior
The behavior is undefined in the following circumstances:
[skip]
— The number of characters or wide characters transmitted by a formatted output function (or written to an array, or that would have been written to an array) is greater than INT_MAX (7.21.6.1, 7.29.2.1)."
Please note that this description does not mention the size argument of snprintf or the size of the buffer.
How does it make sense for the buffer size to be size_t, but for the return type to be only an int?
The official C99 rationale document does not discuss these particular considerations, but presumably it's for consistency and (separate) ideological reasons:
all of the printf-family functions return an int with substantially the same significance. This was defined (for the original printf, fprintf, and sprintf) well before size_t was invented.
type size_t is in some sense the correct type for conveying sizes and lengths, so it was used for the second arguments to snprintf and vsnprintf when those were introduced (along with size_t itself) in C99.
If snprintf() is supposed to be able to print more than INT_MAX characters into the buffer, surely it must return an ssize_t or a size_t with (size_t)-1 indicating an error, right?
That would be a more internally-consistent design choice, but nope. Consistency across the function family seems to have been chosen instead. Note that none of the functions in this family have documented limits on the number of characters they can output, and their general specification implies that there is no inherent limit. Thus, they all suffer from the same issue with very long outputs.
And if it is not supposed to be able to print more than INT_MAX characters, why is bufsz a size_t rather than, say, an unsigned or an int? Or - is it at least officially constrained to hold values no larger than INT_MAX?
There is no documented constraint on the value of the second argument, other than the implicit one that it must be representable as a size_t. Not even in the latest version of the standard. But note that there is also nothing that says that type int cannot represent all the values that are representable by size_t (though indeed it can't in most implementations).
So yes, implementations will have trouble behaving according to the specifications when very large data are output via these functions, where "very large" is implementation-dependent. As a practical matter, then, one should not rely on using them to emit very large outputs in a single call (unless one intends to ignore the return value).
If snprintf() is supposed to be able to print more than INT_MAX characters into the buffer, surely it must return an ssize_t or a size_t with (size_t)-1 indicating an error, right?
Not quite.
C also has an Environmental limit for fprintf() and friends:
"The number of characters that can be produced by any single conversion shall be at least 4095." (C17 §7.21.6.1 ¶15)
Anything over 4095 characters per % conversion risks portability, so int, even at 16 bits (INT_MAX = 32767), suffices for most purposes in portable code.
Note: ssize_t is not part of the C spec.
There was this range checking function that required two signed integer parameters:
range_limit(long int lower, long int upper)
It was called with range_limit(0, controller_limit). I needed to expand the range check to also include negative numbers up to the 'controller_limit' magnitude.
I naively changed the call to
range_limit(-controller_limit, controller_limit)
Although it compiled without warnings, this did not work as I expected.
I missed that controller_limit was unsigned integer.
In C, simple integer calculations can lead to surprising results. For example these calculations
0u - 1;
or more relevant
unsigned int ui = 1;
-ui;
result in 4294967295 of type unsigned int (aka UINT_MAX, assuming a 32-bit int). As I understand it, this is due to the integer conversion rules and the modulo arithmetic of unsigned operands.
By definition, unsigned arithmetic does not overflow but rather "wraps-around". This behavior is well defined, so the compiler will not issue a warning (at least not gcc) if you use these expressions calling a function:
#include <stdio.h>
void f_l(long int li) {
printf("%li\n", li); // outputs: 4294967295
}
int main(void)
{
unsigned int ui = 1;
f_l(-ui);
return 0;
}
Try this code for yourself!
So instead of passing a negative value I passed a ridiculously high positive value to the function.
My fix was to cast from unsigned integer into int:
range_limit(-(int)controller_limit, controller_limit);
Obviously, integer modulo behavior in combination with the integer conversion rules allows for subtle mistakes that are hard to spot, especially as the compiler does not help in finding them.
As the compiler does not emit any warnings and you can come across these kind of calculations any day, I'd like to know:
If you have to deal with unsigned operands, how do you best avoid the unsigned integers modulo arithmetic pitfall?
Note:
While gcc does not provide any help in detecting unsigned modulo arithmetic (at the time of writing), clang does: the compiler flag "-fsanitize=unsigned-integer-overflow" enables detection of wrap-around (using "-Wconversion" is not sufficient), however not at compile time but at runtime. Try it for yourself!
Further reading:
Seacord: Secure Coding in C and C++, Chapter 5, Integer Security
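One way to harden the cast-based fix is to range-check the unsigned value before negating it, so the conversion can never go out of range. A sketch (negate_checked is a hypothetical helper; controller_limit is the question's variable):

```c
#include <limits.h>

/* Negate an unsigned value safely.  Returns 0 and stores the
   negation in *out on success; returns -1 if the value cannot
   be represented in an int (so -(int)u would be out of range). */
static int negate_checked(unsigned int u, int *out)
{
    if (u > (unsigned int)INT_MAX)
        return -1;            /* would be implementation-defined */
    *out = -(int)u;
    return 0;
}
```

With this, range_limit(-(int)controller_limit, controller_limit) becomes a checked two-step call instead of a silent wrap.
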
Using signed integers does not change the situation at all.
A C implementation is under no obligation to raise a run-time warning or error as a response to Undefined Behaviour. Undefined Behaviour is undefined, as it says; the C standard provides absolutely no requirements or guidance about the outcome. A particular implementation can choose any mechanism it sees fit in response to Undefined Behaviour, including explicitly defining the result. (If you rely on that explicit definition, your program is no longer portable to other compilers with different or undocumented behaviour. Perhaps you don't care.)
For example, GCC defines the result of out-of-bounds integer conversions and some bitwise operations in Implementation-defined behaviour section of its manual.
If you're worried about integer overflow (and there are lots of times you should be worried about it), it's up to you to protect yourself.
For example, instead of allowing:
unsigned_counter += 5;
to overflow, you could write:
if (unsigned_counter > UINT_MAX - 5) {
    /* Handle the error */
}
else {
    unsigned_counter += 5;
}
And you should do that in cases where integer overflow will get you into trouble. A common example, which can (and has!) lead to buffer-overflow exploits, comes from checking whether a buffer has enough room for an addition:
if (buffer_length + added_length >= buffer_capacity) {
    /* Reallocate buffer or fail */
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
If buffer_length + added_length overflows -- in either signed or unsigned arithmetic -- the necessary reallocation (or failure) won't trigger and the memcpy will overwrite memory or segfault or do something else you weren't expecting.
It's easy to fix, so it's worth getting into the habit:
if (added_length >= buffer_capacity
        || buffer_length >= buffer_capacity - added_length) {
    /* Reallocate buffer or fail */
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
Another similar case where you can get into serious trouble is when you are using a loop and your increment is more than one.
This is safe:
for (i = 0; i < limit; ++i) ...
This could lead to an infinite loop:
for (i = 0; i < limit; i += 2) ...
The first one is safe -- assuming i and limit are the same type -- because i + 1 cannot overflow if i < limit. The most it can be is limit itself. But no such guarantee can be made about i + 2, since limit could be INT_MAX (or whatever is the maximum value for the integer type being used). Again, the fix is simple: compare the difference rather than the sum.
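Concretely, the stride-2 loop can be rewritten to test the remaining distance so that i + 2 is never computed near the limit. A sketch (stride2_count is an illustrative function that just counts the iterations; the count matches the naive loop on every input):

```c
/* Iterate i = 0, 2, 4, ... over [0, limit) without ever computing
   i + 2 when it could overflow; returns the iteration count. */
static unsigned long stride2_count(unsigned int limit)
{
    unsigned long count = 0;
    unsigned int i = 0;
    while (i < limit) {
        count++;              /* loop body would go here */
        if (limit - i <= 2)
            break;            /* incrementing would exit or overflow */
        i += 2;
    }
    return count;
}
```
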
If you're using GCC and you don't care about full portability, you can use the GCC overflow-detection builtins to help you. They're also documented in the GCC manual.
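A sketch of those builtins in use (a GCC/Clang extension, not ISO C; add_checked is an illustrative wrapper):

```c
#include <limits.h>

/* Add delta to *counter only if the sum fits.  Returns 0 on
   success, -1 if the addition would wrap.
   __builtin_add_overflow returns nonzero when the mathematical
   sum does not fit in the result object. */
static int add_checked(unsigned int *counter, unsigned int delta)
{
    unsigned int result;
    if (__builtin_add_overflow(*counter, delta, &result))
        return -1;
    *counter = result;
    return 0;
}
```
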
I know that the C standard allows for implementations where
(sizeof(unsigned) > sizeof(size_t))
or
(sizeof(int) > sizeof(ptrdiff_t))
is true. But are there any real implementations where one of these is true?
Background
I wrote a function similar to asprintf() (since asprintf() is not portable), and snprintf() return an int but needs a size_t argument, so should I check if leni (shown below) is not less than SIZE_MAX in this code?
va_copy(atmp, args);
int leni = vsnprintf(NULL, 0, format, atmp); // get the size of the new string
va_end(atmp);
if (leni < 0)
    // do some error handling
if ((size_t)leni >= SIZE_MAX) // do I need this part?
    // error handling
size_t lens = (size_t)leni + 1;
char *newString = malloc(lens);
if (!newString)
    // do some error handling
if (vsnprintf(newString, lens, format, args) != leni)
    // do some error handling
While the standard doesn't forbid INT_MAX from being larger than SIZE_MAX, the function vsnprintf guarantees that the value it returns on success will not be greater than SIZE_MAX.
If the function succeeds, then the return value must be less than its second argument [1]. That argument has the type size_t, thus the return value must be less than SIZE_MAX [2].
And if you're not convinced, you can always use a preprocessor directive that compares INT_MAX with SIZE_MAX, and only include the extra check on the result of vsnprintf when it is needed.
[1] The identifier n mentioned in the standard citation below is the second argument to vsnprintf.
[2] Quoted from ISO/IEC 9899:201x, 7.21.6.12 The vsnprintf function, ¶3:
The vsnprintf function returns the number of characters that would have been written had n been sufficiently large, not counting the terminating null character, or a negative value if an encoding error occurred. Thus, the null-terminated output has been completely written if and only if the returned value is nonnegative and less than n.
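The preprocessor guard suggested above might look like this (both limit macros are usable in #if arithmetic, so the check costs nothing on ordinary targets):

```c
#include <limits.h>
#include <stdint.h>

/* On exotic targets where an int could exceed SIZE_MAX, the
   vsnprintf result needs an extra range check before the cast. */
#if INT_MAX > SIZE_MAX
#define VSNPRINTF_NEEDS_RANGE_CHECK 1
#else
#define VSNPRINTF_NEEDS_RANGE_CHECK 0
#endif

static int needs_range_check(void)
{
    return VSNPRINTF_NEEDS_RANGE_CHECK;
}
```
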
The C99 standard suggests that the type size_t is large enough to store the size of any object, as it is the resulting type of the sizeof operator.
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. ...
The value of the result ... is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).
Since SIZE_MAX (from <stdint.h>) is defined to be the largest value that a size_t can store, it should follow that the largest object would have a size equal to SIZE_MAX. That would in fact be helpful, but alas it seems as though we'd be asking quite a lot to allocate anything even one quarter that size.
Are there any implementations where you can actually declare (or otherwise allocate) an object as large as SIZE_MAX?
It certainly doesn't seem to be the common case... In C11 the optional rsize_t type and its corresponding RSIZE_MAX macro were introduced. It's supposed to be a runtime-constraint violation if any standard C function is used with a value greater than RSIZE_MAX as an rsize_t argument. This seems to imply that the largest object might be RSIZE_MAX bytes. However, this doesn't seem to be widely supported, either!
Are there any implementations where RSIZE_MAX exists and you can actually declare (or otherwise allocate) an object as large as RSIZE_MAX?
I think all C implementations will allow you to declare an object of that size. The OS may refuse to load the executable, as there isn't enough memory.
Likewise, all C runtime libraries will allow you to attempt to allocate that size of memory, however, it will probably fail as there isn't that much memory, neither virtual nor real.
Just think: if size_t is a type equal to the machine word size (32 bits, 64 bits), then the highest addressable memory cell (byte) is at 2^32 - 1 (or 2^64 - 1). Given there are lower memory interrupt vectors, BIOS and OS, not to mention the code and data of your program, there is never this amount of memory available.
Oh, now I see the issue. Assuming a realloc growth factor of ~1.5, there are two integer limits. The first is SIZE_MAX/3: above that, size*3/2 will overflow. But at sizes that large the low bits are insignificant, so we can reverse the operator order to size/2*3 and still grow by ~1.5 (that imposes the second limit, SIZE_MAX/3*2). Beyond that, fall back to SIZE_MAX itself.
After that, we just binary-search the amount the system can actually allocate (in the range from the grown size down to the minimal required size).
int
grow(char **data_p, size_t *size_p, size_t min)
{
    size_t size = *size_p;
    if (size == 0)
        size = 1; /* seed the growth: 0 * 3 / 2 would stay 0 forever */
    while (size < min)
        size = (size <= SIZE_MAX / 3     ? size * 3 / 2 :
                size <= SIZE_MAX / 3 * 2 ? size / 2 * 3 : SIZE_MAX);
    if (size != *size_p) {
        size_t ext = size - min; /* headroom above the required minimum */
        char *data;
        for (;; ext /= 2)        /* binary-search downward toward min */
            if ((data = realloc(*data_p, min + ext)) || ext == 0)
                break;
        if (data == NULL) return -1; // ENOMEM
        *data_p = data;
        *size_p = min + ext;
    }
    return 0;
}
And no OS-dependent or manual limits needed!
As you can see, the original question is a consequence of a [probably] imperfect implementation that doesn't treat edge cases with respect. It doesn't matter whether any of the questioned systems exist now: any algorithm that is supposed to work near the integer limits should take proper care of them.
(Please note that the code above assumes char data and doesn't account for other element sizes; supporting those would add more complex overflow checks.)
I am passing an array of vertex indices in some GL code; each element is a GLushort.
I want to terminate with a sentinel so as to avoid having to laboriously pass the array length each time alongside the array itself.
#define SENTINEL ( (GLushort) -1 ) // edit thanks to answers below
:
GLushort verts[] = {0, 0, 2, 1, 0, 0, SENTINEL};
I cannot use 0 to terminate, as some of the elements have value 0.
Can I use -1?
To my understanding this would wrap to the maximum integer GLushort can represent, which would be ideal.
But is this behaviour guaranteed in C?
(I cannot find an INT_MAX-equivalent constant for this type, otherwise I would be using that.)
If GLushort is indeed an unsigned type, then (GLushort)-1 is the maximum value for GLushort. The C standard guarantees that. So, you can safely use -1.
For example, C89 didn't have SIZE_MAX macro for the maximum value for size_t. It could be portably defined by the user as #define SIZE_MAX ((size_t)-1).
Whether this works as a sentinel value in your code depends on whether (GLushort)-1 is a valid, non-sentinel value in your code.
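A quick sanity check of that guarantee, assuming (as the question does) that GLushort is unsigned short:

```c
#include <limits.h>
#include <stddef.h>

typedef unsigned short GLushort;   /* assumption: matches GL's typedef */
#define SENTINEL ((GLushort)-1)    /* converts to USHRT_MAX, guaranteed */

/* count vertices up to (not including) the sentinel */
static size_t count_verts(const GLushort *verts)
{
    size_t n = 0;
    while (verts[n] != SENTINEL)
        n++;
    return n;
}
```
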
GLushort is an UNSIGNED_SHORT type, typedefed to unsigned short, which, although C does not guarantee it, OpenGL assumes to have a [0, 2^16 - 1] range (Chapter 4.3 of the specification). On practically every mainstream architecture this somewhat dangerous assumption holds true, too (I'm not aware of one where unsigned short has a different size).
As such, you can use -1, but it is awkward because you will have a lot of casts and if you forget a cast for example in an if() statement, you can be lucky and get a compiler warning about "comparison can never be true", or you can be unlucky and the compiler will silently optimize the branch out, after which you spend days searching for the reason why your seemingly perfect code executes wrong. Or worse yet, it all works fine in debug builds and only bombs in release builds.
Therefore, using 0xffff as jv42 has advised is much preferable; it avoids this pitfall altogether.
I would create a global constant of value:
const GLushort GLushort_SENTINEL = (GLushort)(-1);
I think this is perfectly elegant as long as signed integers are represented using 2's complement.
I didn't remember whether that's guaranteed by the C standard, but it is virtually guaranteed for most CPUs (in my experience).
Edit: Apparently this is guaranteed by the C standard....
If you want a named constant, you shouldn't use a const qualified variable as proposed in another answer. They are really not the same. Use either a macro (as others have said) or an enumeration type constant:
enum { GLushort_SENTINEL = -1 };
The standard guarantees that this is always an int (really just another name for the constant -1) and that, on conversion to your unsigned type, it always yields that type's maximum value.
Edit: or you could have it
enum { GLushort_SENTINEL = (GLushort)-1 };
if you fear that on some architectures GLushort could be narrower than unsigned int.