C/GL: Using -1 as sentinel on array of unsigned integers

C/GL: Using -1 as sentinel on array of unsigned integers - c

I am passing an array of vertex indices in some GL code... each element is a GLushort
I want to terminate with a sentinel so as to avoid having to laboriously pass the array length each time alongside the array itself.
#define SENTINEL ( (GLushort) -1 ) // edit thanks to answers below
:
GLushort verts = {0, 0, 2, 1, 0, 0, SENTINEL};
I cannot use 0 to terminate as some of the elements have value 0
Can I use -1?
To my understanding this would wrap to the maximum integer GLushort can represent, which would be ideal.
But is this behaviour guaranteed in C?
(I cannot find a MAX_INT equivalent constant for this type, otherwise I would be using that)

If GLushort is indeed an unsigned type, then (GLushort)-1 is the maximum value for GLushort. The C standard guarantees that. So, you can safely use -1.
For example, C89 didn't have SIZE_MAX macro for the maximum value for size_t. It could be portably defined by the user as #define SIZE_MAX ((size_t)-1).
Whether this works as a sentinel value in your code depends on whether (GLushort)-1 is a valid, non-sentinel value in your code.

GLushort is an UNSIGNED_SHORT type which is typedefed to unsigned short, and which, although C does not guarantee it, OpenGL assumes as a value with a 2^16-1 range (Chapter 4.3 of the specification). On practically every mainstream architecture, this somewhat dangerous assumption holds true, too (I'm not aware of one where unsigned short has a different size).
As such, you can use -1, but it is awkward because you will have a lot of casts and if you forget a cast for example in an if() statement, you can be lucky and get a compiler warning about "comparison can never be true", or you can be unlucky and the compiler will silently optimize the branch out, after which you spend days searching for the reason why your seemingly perfect code executes wrong. Or worse yet, it all works fine in debug builds and only bombs in release builds.
Therefore, using 0xffff as jv42 has advised is much preferrable, it avoids this pitfall alltogether.

I would create a global constant of value:
const GLushort GLushort_SENTINEL = (GLushort)(-1);
I think this is perfectly elegant as long as signed integers are represented using 2's complement.
I don't remember if thats guaranteed by the C standard, but it is virtually guaranteed for most CPU's (in my experience).
Edit: Appparently this is guaranteed by the C standard....

If you want a named constant, you shouldn't use a const qualified variable as proposed in another answer. They are really not the same. Use either a macro (as others have said) or an enumeration type constant:
enum { GLushort_SENTINEL = -1; };
The standard guarantees that this always is an int (really another name of the constant -1) and that it always will translate into the max value of your unsigned type.
Edit: or you could have it
enum { GLushort_SENTINEL = (GLushort)-1; };
if you fear that on some architectures GLushort could be narrower than unsigned int.

Related

How to insert booleans into a bitfield in C89

As far as I understand, in C89 all boolean expressions are of type integer. This also means that function parameters that represent bool usually get represented by an int parameter.
Now my question is how I can most ideally take such an int and put it into a bitfield so that it only occupies one bit (let's ignore padding for now).
The first thing here is which type to use. Using int or any other unsigned type doesn't work, because when there is only one bit, only -1 and 0 can be represented (at least with two's complement).
While -1 technically evaluates as true, this is not ideal because actually assigning it without undefined behavior can be quite tricky from what I understand.
So an unsigned type should be chosen for the bitfield:
typedef struct bitfield_with_boolean {
unsigned int boolean : 1;
} bitfield_with_boolean;
The next question is then how to assign that bitfield. Just taking an int or similar won't work because the downcast truncates the value so if the lowest bit wasn't set, a value that would previously evaluate to true would now suddenly evaluate to false.
As far as I understand, the boolean operators are guaranteed to always return either 0 or 1. So my idea to solve this problem would be something like this:
#define to_boolean(expression) (!!(expression))
So in order to assign the value I would do:
bitfield_with_boolean to_bitfield(int boolean) {
bitfield_with_boolean bitfield = {to_boolean(boolean)};
return bitfield;
}
Is that correct, and or is there a better way?
NOTE:
I know the problem is completely solved starting with C99 because casting to _Bool is guaranteed to always result in either a 0 or a 1. Where 0 is only the result if the input had a value of 0.

Yes, your solution is correct. However, I wouldn't hide it behind a macro, and I wouldn't name a macro using all_lowercase letters.
!!var is sufficiently idiomatic that I'd say it's fine in code.
Alternatives include var != 0 and, of course, var ? 1 : 0.

What's the point in specifying unsigned integers with "U"?

I have always, for as long as I can remember and ubiquitously, done this:
for (unsigned int i = 0U; i < 10U; ++i)
{
// ...
}
In other words, I use the U specifier on unsigned integers. Now having just looked at this for far too long, I'm wondering why I do this. Apart from signifying intent, I can't think of a reason why it's useful in trivial code like this?
Is there a valid programming reason why I should continue with this convention, or is it redundant?

First, I'll state what is probably obvious to you, but your question leaves room for it, so I'm making sure we're all on the same page.
There are obvious differences between unsigned ints and regular ints: The difference in their range (-2,147,483,648 to 2,147,483,647 for an int32 and 0 to 4,294,967,295 for a uint32). There's a difference in what bits are put at the most significant bit when you use the right bitshift >> operator.
The suffix is important when you need to tell the compiler to treat the constant value as a uint instead of a regular int. This may be important if the constant is outside the range of a regular int but within the range of a uint. The compiler might throw a warning or error in that case if you don't use the U suffix.
Other than that, Daniel Daranas mentioned in comments the only thing that happens: if you don't use the U suffix, you'll be implicitly converting the constant from a regular int to a uint. That's a tiny bit extra effort for the compiler, but there's no run-time difference.
Should you care? Here's my answer, (in bold, for those who only want a quick answer): There's really no good reason to declare a constant as 10U or 0U. Most of the time, you're within the common range of uint and int, so the value of that constant looks exactly the same whether its a uint or an int. The compiler will immediately take your const int expression and convert it to a const uint.
That said, here's the only argument I can give you for the other side: semantics. It's nice to make code semantically coherent. And in that case, if your variable is a uint, it doesn't make sense to set that value to a constant int. If you have a uint variable, it's clearly for a reason, and it should only work with uint values.
That's a pretty weak argument, though, particularly because as a reader, we accept that uint constants usually look like int constants. I like consistency, but there's nothing gained by using the 'U'.

I see this often when using defines to avoid signed/unsigned mismatch warnings. I build a code base for several processors using different tool chains and some of them are very strict.
For instance, removing the ‘u’ in the MAX_PRINT_WIDTH define below:
#define MAX_PRINT_WIDTH (384u)
#define IMAGE_HEIGHT (480u) // 240 * 2
#define IMAGE_WIDTH (320u) // 160 * 2 double density
Gave the following warning:
"..\Application\Devices\MartelPrinter\mtl_print_screen.c", line 106: cc1123: {D} warning:
comparison of unsigned type with signed type
for ( x = 1; (x < IMAGE_WIDTH) && (index <= MAX_PRINT_WIDTH); x++ )
You will probably also see ‘f’ for float vs. double.

I extracted this sentence from a comment, because it's a widely believed incorrect statement, and also because it gives some insight into why explicitly marking unsigned constants as such is a good habit.
...it seems like it would only be useful to keep it when I think overflow might be an issue? But then again, haven't I gone some ways to mitigating for that by specifying unsigned in the first place...
Now, let's consider some code:
int something = get_the_value();
// Compute how many 8s are necessary to reach something
unsigned count = (something + 7) / 8;
So, does the unsigned mitigate potential overflow? Not at all.
Let's suppose something turns out to be INT_MAX (or close to that value). Assuming a 32-bit machine, we might expect count to be 229, or 268,435,456. But it's not.
Telling the compiler that the result of the computation should be unsigned has no effect whatsoever on the typing of the computation. Since something is an int, and 7 is an int, something + 7 will be computed as an int, and will overflow. Then the overflowed value will be divided by 8 (also using signed arithmetic), and whatever that works out to be will be converted to an unsigned and assigned to count.
With GCC, arithmetic is actually performed in 2s complement so the overflow will be a very large negative number; after the division it will be a not-so-large negative number, and that ends up being a largish unsigned number, much larger than the one we were expecting.
Suppose we had specified 7U instead (and maybe 8U as well, to be consistent). Now it works.. It works because now something + 7U is computed with unsigned arithmetic, which doesn't overflow (or even wrap around.)
Of course, this bug (and thousands like it) might go unnoticed for quite a lot of time, blowing up (perhaps literally) at the worst possible moment...
(Obviously, making something unsigned would have mitigated the problem. Here, that's pretty obvious. But the definition might be quite a long way from the use.)

One reason you should do this for trivial code1 is that the suffix forces a type on the literal, and the type may be very important to produce the correct result.
Consider this bit of (somewhat silly) code:
#define magic_number(x) _Generic((x), \
unsigned int : magic_number_unsigned, \
int : magic_number_signed \
)(x)
unsigned magic_number_unsigned(unsigned) {
// ...
}
unsigned magic_number_signed(int) {
// ...
}
int main(void) {
unsigned magic = magic_number(10u);
}
It's not hard to imagine those function actually doing something meaningful based on the type of their argument. Had I omitted the suffix, the generic selection would have produced a wrong result for a very trivial call.
1 But perhaps not the particular code in your post.

In this case, it's completely useless.
In other cases, a suffix might be useful. For instance:
#include <stdio.h>
int
main()
{
printf("%zu\n", sizeof(123));
printf("%zu\n", sizeof(123LL));
return 0;
}
On my system, it will print 4 then 8.
But back to your code, yes it makes your code more explicit, nothing more.

Using bit operations to "turn off" binary digits of a pointer

I was able to use bit operations to "turn off" binary digits of a number.
Ex:
x = x & ~(1<<0)
x = x & ~(1<<1)
(and repeat until desired number of digits starting from the right are changed to 0)
I would like to apply this technique to a pointer's address.
Unfortunately, the & operator cannot be used with pointers. Using the same lines of code as above, where x is a pointer, the compiler says "invalid operands to binary & (have int and int)."
I tried to typecast the pointers as ints, but that doesn't work as I assume the ints are too small (and I just realized I'm not allowed to cast).
(note: though this is part of a homework problem, I've already reasoned out why I need to turn off some digits after a good couple hours, so I'm fine in that regard. I'm simply trying to see if I can get a clever technique to do what I want to do here).
Restrictions: I cannot use loops, conditionals, any special functions, constants greater than 255, division, mod.
(edit: added restrictions to the bottom)

Use uintptr_t from <stdint.h>. You should always use unsigned types for bit twiddling, and (u)intptr_t is specifically chosen to be able to hold a pointer's value.
Note however that adjusting a pointer manually and dereferencing it is undefined behaviour, so watch your step. You shall be able to recover the exact original value of the pointer (or another valid pointer) before doing so.
Edit : from your comment I understand that you don't plan on dereferencing the twiddled pointer at all, so no undefined behaviour for you. Here is how you can check if your pointers share the same 64-byte block :
uintptr_t p1 = (uintptr_t)yourPointer1;
uintptr_t p2 = (uintptr_t)yourPointer2;
uintptr_t mask = ~(uintptr_t)63u; // Shave off 5 low-order bits
return (p1 & mask) == (p2 & mask);

C language standard library includes the (optional though) type intptr_t, for which there is guarantee that "any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer".
Of course if you perform bitwise operation on the integer than the result is undefined behaviour.

Edit:
How unfortunate haha. I need a function to show two pointers are in
the same 64-byte block of memory. This holds true so long as every
digit but the least significant 6 digits of their binary
representations are equal. By making sure the last 6 digits are all
the same (ex: 0), I can return true if both pointers are equal. Well,
at least I hope so.
You should be able to check if they're in the same 64 block of memory by something like this:
if ((char *)high_pointer - (char *)low_pointer < 64) {
// do stuff
}
Edit2: This is likely to be undefined behaviour as pointed out by chris.
Original post:
You're probably looking for intptr_t or uintptr_t. The standard says you can cast to and from these types to pointers and have the value equal to the original.
However, despite it being a standard type, it is optional so some library implementations may choose not to implement it. Some architectures might not even represent pointers as integers so such a type wouldn't make sense.
It is still better than casting to and from an int or a long since it is guaranteed to work on implementations that supply it. Otherwise, at least you'll know at compile time that your program will break on a certain implementation/architecture.
(Oh, and as other answers have stated, manually changing the pointer when casted to an integer type and dereferencing it is undefined behaviour)

How is {int i=999; char c=i;} different from {char c=999;}?

My friend says he read it on some page on SO that they are different,but how could the two be possibly different?
Case 1
int i=999;
char c=i;
Case 2
char c=999;
In first case,we are initializing the integer i to 999,then initializing c with i,which is in fact 999.In the second case, we initialize c directly with 999.The truncation and loss of information aside, how on earth are these two cases different?
EDIT
Here's the link that I was talking of
why no overflow warning when converting int to char
One member commenting there says --It's not the same thing. The first is an assignment, the second is an initialization
So isn't it a lot more than only a question of optimization by the compiler?

They have the same semantics.
The constant 999 is of type int.
int i=999;
char c=i;
i created as an object of type int and initialized with the int value 999, with the obvious semantics.
c is created as an object of type char, and initialized with the value of i, which happens to be 999. That value is implicitly converted from int to char.
The signedness of plain char is implementation-defined.
If plain char is an unsigned type, the result of the conversion is well defined. The value is reduced modulo CHAR_MAX+1. For a typical implementation with 8-bit bytes (CHAR_BIT==8), CHAR_MAX+1 will be 256, and the value stored will be 999 % 256, or 231.
If plain char is a signed type, and 999 exceeds CHAR_MAX, the conversion yields an implementation-defined result (or, starting with C99, raises an implementation-defined signal, but I know of no implementations that do that). Typically, for a 2's-complement system with CHAR_BIT==8, the result will be -25.
char c=999;
c is created as an object of type char. Its initial value is the int value 999 converted to char -- by exactly the same rules I described above.
If CHAR_MAX >= 999 (which can happen only if CHAR_BIT, the number of bits in a byte, is at least 10), then the conversion is trivial. There are C implementations for DSPs (digital signal processors) with CHAR_BIT set to, for example, 32. It's not something you're likely to run across on most systems.
You may be more likely to get a warning in the second case, since it's converting a constant expression; in the first case, the compiler might not keep track of the expected value of i. But a sufficiently clever compiler could warn about both, and a sufficiently naive (but still fully conforming) compiler could warn about neither.
As I said above, the result of converting a value to a signed type, when the source value doesn't fit in the target type, is implementation-defined. I suppose it's conceivable that an implementation could define different rules for constant and non-constant expressions. That would be a perverse choice, though; I'm not sure even the DS9K does that.
As for the referenced comment "The first is an assignment, the second is an initialization", that's incorrect. Both are initializations; there is no assignment in either code snippet. There is a difference in that one is an initialization with a constant value, and the other is not. Which implies, incidentally, that the second snippet could appear at file scope, outside any function, while the first could not.

Any optimizing compiler will just make the int i = 999 local variable disappear and assign the truncated value directly to c in both cases. (Assuming that you are not using i anywhere else)

It depends on your compiler and optimization settings. Take a look at the actual assembly listing to see how different they are. For GCC and reasonable optimizations, the two blocks of code are probably equivalent.

Aside from the fact that the first also defines an object iof type int, the semantics are identical.

i,which is in fact 999
No, i is a variable. Semantically, it doesn't have a value at the point of the initialization of c ... the value won't be known until runtime (even though we can clearly see what it will be, and so can an optimizing compiler). But in case 2 you're assigning 999 to a char, which doesn't fit, so the compiler issues a warning.

will ~ operator change the data type?

When I read someone's code I find that he bothered to write an explicite type cast.
#define ULONG_MAX ((unsigned long int) ~(unsigned long int) 0)
When I write code
1 #include<stdio.h>
2 int main(void)
3 {
4 unsigned long int max;
5 max = ~(unsigned long int)0;
6 printf("%lx",max);
7 return 0;
8 }
it works as well. Is it just a meaningless coding style?

The code you read is very bad, for several reasons.
First of all user code should never define ULONG_MAX. This is a reserved identifier and must be provided by the compiler implementation.
That definition is not suitable for use in a preprocessor #if. The _MAX macros for the basic integer types must be usable there.
(unsigned long)0 is just crap. Everybody should just use 0UL, unless you know that you have a compiler that is not compliant with all the recent C standards with that respect. (I don't know of any.)
Even ~0UL should not be used for that value, since unsigned long may (theoretically) have padding bits. -1UL is more appropriate, because it doesn't deal with the bit pattern of the value. It uses the guaranteed arithmetic properties of unsigned integer types. -1 will always be the maximum value of an unsigned type. So ~ may only be used in a context where you are absolutely certain that unsigned long has no padding bits. But as such using it makes no sense. -1 serves better.
"recasting" an expression that is known to be unsigned long is just superfluous, as you observed. I can't imagine any compiler that bugs on that.
Recasting of expression may make sense when they are used in the preprocessor, but only under very restricted circumstances, and they are interpreted differently, there.
#if ((uintmax_t)-1UL) == SOMETHING
..
#endif
Here the value on the left evalues to UINTMAX_MAX in the preprocessor and in later compiler phases. So
#define UINTMAX_MAX ((uintmax_t)-1UL)
would be an appropriate definition for a compiler implementation.
To see the value for the preprocessor, observe that there (uintmax_t) is not a cast but an unknown identifier token inside () and that it evaluates to 0. The minus sign is then interpreted as binary minus and so we have 0-1UL which is unsigned and thus the max value of the type. But that trick only works if the cast contains a single identifier token, not if it has three as in your example, and if the integer constant has a - or + sign.

They are trying to ensure that the type of the value 0 is unsigned long. When you assign zero to a variable, it gets cast to the appropriate type.
In this case, if 0 doesn't happen to be an unsigned long then the ~ operator will be applied to whatever other type it happens to be and the result of that will be cast.
This would be a problem if the compiler decided that 0 is a short or char.
However, the type after the ~ operator should remain the same. So they are being overly cautious with the outer cast, but perhaps the inner cast is justified.
They could of course have specified the correct zero type to begin with by writing ~0UL.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight