assuming two arbitrary timestamps:
uint32_t timestamp1;
uint32_t timestamp2;
Is there a standard-conformant way to get a signed difference of the two, besides the obvious variants of converting to a bigger signed type and the rather verbose if-else?
Beforehand it is not known which one is larger, but it is known that the difference is at most 20 bits, so it will fit into a 32-bit signed integer.
int32_t difference = (int32_t)( (int64_t)timestamp1 - (int64_t)timestamp2 );
This variant has the disadvantage that 64-bit arithmetic may not be supported by the hardware, and it is of course only possible if a larger type exists at all (what if the timestamp is already 64-bit?).
The other version
int32_t difference;
if (timestamp1 > timestamp2) {
difference = (int32_t)(timestamp1 - timestamp2);
} else {
difference = - ((int32_t)(timestamp2 - timestamp1));
}
is quite verbose and involves conditional jumps.
That leaves us with
int32_t difference = (int32_t)(timestamp1 - timestamp2);
Is this guaranteed to work from the standard's perspective?
You can use a union type pun based on
typedef union
{
int32_t _signed;
uint32_t _unsigned;
} u;
Perform the calculation in unsigned arithmetic, assign the result to the _unsigned member, then read the _signed member of the union as the result:
u result = {._unsigned = timestamp1 - timestamp2};
result._signed; // yields the result
This is portable to any platform that implements the fixed-width types upon which we are relying (an implementation is not required to provide them). Two's complement is guaranteed for the signed member and, at the "machine" level, two's complement signed arithmetic is indistinguishable from unsigned arithmetic. There's no conversion or memcpy-type overhead here: a good compiler will compile out what's essentially standardese syntactic sugar.
(Note that this is undefined behaviour in C++.)
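For reference, a minimal compilable sketch of this approach (the timestamp values are invented for illustration):
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

typedef union
{
    int32_t _signed;
    uint32_t _unsigned;
} u;

int main(void)
{
    uint32_t timestamp1 = 1000;   /* example values only */
    uint32_t timestamp2 = 2000;

    /* the subtraction wraps modulo 2^32; the union reinterprets the bits */
    u result = {._unsigned = timestamp1 - timestamp2};
    printf("%" PRId32 "\n", result._signed);   /* prints -1000 */
    return 0;
}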
Bathsheba's answer is correct but for completeness here are two more ways (which happen to work in C++ as well):
uint32_t u_diff = timestamp1 - timestamp2;
int32_t difference;
memcpy(&difference, &u_diff, sizeof difference);
and
uint32_t u_diff = timestamp1 - timestamp2;
int32_t difference = *(int32_t *)&u_diff;
The latter is not a strict aliasing violation because that rule explicitly allows punning between signed and unsigned versions of an integer type.
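As a compilable sketch of the memcpy variant (the timestamp values are again made up; the same wrapper works for the pointer-cast version):
#include <stdint.h>
#include <inttypes.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    uint32_t timestamp1 = 1000, timestamp2 = 2000;   /* example values only */

    uint32_t u_diff = timestamp1 - timestamp2;       /* wraps modulo 2^32 */
    int32_t difference;
    memcpy(&difference, &u_diff, sizeof difference); /* reinterpret the bits */

    printf("%" PRId32 "\n", difference);             /* prints -1000 */
    return 0;
}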
The suggestion:
int32_t difference = (int32_t)(timestamp1 - timestamp2);
will work on any actual machine that exists and offers the int32_t type, but technically is not guaranteed by the standard (the result is implementation-defined).
The conversion of an unsigned integer value to a signed integer is implementation defined. This is spelled out in section 6.3.1.3 of the C standard regarding integer conversions:
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
On implementations people are most likely to use, the conversion will occur the way you expect, i.e. the representation of the unsigned value will be reinterpreted as a signed value.
Specifically GCC does the following:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object
of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N
to be within range of the type; no signal is raised.
MSVC:
When a long integer is cast to a short, or a short is cast to a char,
the least-significant bytes are retained.
For example, this line
short x = (short)0x12345678L;
assigns the value 0x5678 to x, and this line
char y = (char)0x1234;
assigns the value 0x34 to y.
When signed variables are converted to unsigned and vice versa, the
bit patterns remain the same. For example, casting -2 (0xFE) to an
unsigned value yields 254 (also 0xFE).
So for these implementations, what you proposed will work.
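Purely as an illustration (the result is implementation-defined in general, but follows from the GCC and MSVC behaviour quoted above; the values are made up):
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint32_t timestamp1 = 100, timestamp2 = 200;   /* example values only */

    /* 100 - 200 wraps to 4294967196; converting that back to int32_t is
       implementation-defined, but GCC/MSVC reduce it modulo 2^32 */
    int32_t difference = (int32_t)(timestamp1 - timestamp2);

    printf("%" PRId32 "\n", difference);   /* prints -100 on such implementations */
    return 0;
}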
Rebranding Ian Abbott's macro-packaging of Bathsheba's answer as an answer:
#define UTOS32(a) ((union { uint32_t u; int32_t i; }){ .u = (a) }.i)
int32_t difference = UTOS32(timestamp1 - timestamp2);
Summarizing the discussions on why this is more portable than a simple typecast: The C standard (back to C99, at least) specifies the representation of int32_t (it must be two's complement), but not in all cases how it should be cast from uint32_t.
Finally, note that Ian's macro, Bathsheba's answer, and M.M's answers all also work in the more general case where the counters are allowed to wrap around 0, as is the case, for example, with TCP sequence numbers.
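To illustrate the wrap-around case (the values are invented; the result follows from the modulo-2^32 subtraction):
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

#define UTOS32(a) ((union { uint32_t u; int32_t i; }){ .u = (a) }.i)

int main(void)
{
    /* timestamp1 has wrapped past 0; timestamp2 is 10 ticks before the wrap */
    uint32_t timestamp1 = 5;
    uint32_t timestamp2 = UINT32_MAX - 9;

    int32_t difference = UTOS32(timestamp1 - timestamp2);
    printf("%" PRId32 "\n", difference);   /* prints 15 */
    return 0;
}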
Related
I am working on a project where I often need to interpret certain variables as signed or unsigned values and do signed operations on them; however, in multiple cases subtle, seemingly insignificant changes swapped an unsigned interpretation for a signed one, while in other cases I couldn't force C to interpret a value as signed and it remained unsigned. Here are two examples:
int32_t pop();
//Version 1
push((int32_t)( (-1) * (pop() - pop()) ) );
//Version 2
int32_t temp1 = pop();
int32_t temp2 = pop();
push((int32_t)( (-1) * (temp1 - temp2) ) );
/*Another example */
//Version 1
int32_t get_signed_argument(uint8_t* argument) {
return (int32_t)( (((int32_t)argument[0] << 8) & (int32_t)0x0000ff00 | (((int32_t)argument[1]) & (int32_t)0x000000ff) );
}
//Version 2
int16_t get_signed_argument(uint8_t* argument) {
return (int16_t)( (((int16_t)argument[0] << 8) & (int16_t)0xff00 | (((int16_t)argument[1]) & (int16_t)0x00ff) );
}
In the first example, version 1 does not seem to multiply the value by -1, while version 2 does, but the only difference is storing the intermediate values of the calculation in temporary variables in one case and not doing so in the other.
In the second example the value returned by version 1 is the unsigned interpretation of the same bytes as the returned value of version 2, which interprets it in 2's complement. The only difference is using int16_t or int32_t.
In both cases I am using signed types (int32_t, int16_t), but this doesn't seem to be sufficient to interpret them as signed values. Can you please explain why these differences cause a difference in signedness? Where can I find more information on this? How can I use the shorter version of the first example, but still get signed values? Thank you in advance!
I assume pop() returns an unsigned type. If so, the expression pop() - pop() will be performed using unsigned arithmetic, which is modular and wraps around if the second pop() is larger than the first one (BTW, C doesn't specify a particular order of evaluation, so there's no guarantee which popped value will be first or second).
As a result, the value that you multiply by -1 might not be the difference you expect; if there was wraparound, it could be a large positive value rather than a negative value.
You can get the equivalent of the temporaries if you cast at least one of the function calls directly.
push(-1 * ((int32_t)pop() - pop()));
If you just want to convert a binary buffer to longer signed integers, for example data received from somewhere (I assume little endian):
#include <stdio.h>
#include <stdint.h>

int16_t bufftoInt16(const uint8_t *buff)
{
return (uint16_t)buff[0] | ((uint16_t)buff[1] << 8);
}
int32_t bufftoInt32(const uint8_t *buff)
{
return (uint32_t)buff[0] | ((uint32_t)buff[1] << 8) | ((uint32_t)buff[2] << 16) | ((uint32_t)buff[3] << 24) ;
}
int32_t bufftoInt32_2bytes(const uint8_t *buff)
{
int16_t result = (uint16_t)buff[0] | ((uint16_t)buff[1] << 8);
return result;
}
int main()
{
int16_t x = -5;
int32_t y = -10;
int16_t w = -5567;
printf("%hd %d %d\n", bufftoInt16(&x), bufftoInt32(&y), bufftoInt32_2bytes(&w));
return 0;
}
Casting bytes to signed integers works in a completely different way from unsigned shifting.
The result of an expression in C has its type determined by the types of the component operands of that expression, not by any cast you may apply to that result. As Barmar comments above, to force the type of the result you must cast one of the operands.
I am working on a project where I often need to interpret certain variables as signed or unsigned values and do signed operations on them.
That seems fraught. I take you to mean that you want to reinterpret objects' representations as having different types (varying only in signedness) in different situations, or perhaps that you want to convert values as if you were reinterpreting object representations. This sort of thing generally produces a mess, though you can handle it if you take sufficient care. That can be easier if you are willing to depend on details of your implementation, such as its representations of various types.
It is imperative in such matters to know and understand all the rules for implicit conversions, both the integer promotions and the usual arithmetic conversions, and under which circumstances they apply. It is essential to understand the effect of these rules on the evaluation of your expressions -- both the type and the value of all intermediate and final results.
For example, the best you can hope for with respect to the cast in
push((int32_t)( (-1) * (temp1 - temp2) ) );
is that it is useless. If the value is not representable in that type then (it being a signed integer type) a signal may be raised, and if not, then the result is implementation-defined. If the value is representable, however, then the conversion does not change it. In any case, the result is not exempted from further conversion to the type of push()'s parameter.
For another example, the difference between version 1 and version 2 of your first example is largely which values are converted, when (but see also below). If the two indeed produce different results then it follows that the return type of pop() is different from int32_t. In that case, if you want to convert those to a different type to perform an operation on them then you must in fact do that. Your version 2 accomplishes that via assigning the pop() results to variables of the desired type, but it would be more idiomatic to perform the conversions via casts:
push((-1) * ((int32_t)pop() - (int32_t)pop()));
Beware, however, that if the results of the pop() calls depend on their order -- if they pop elements off a stack, for instance -- then you have a further problem: the relative order in which those operands are evaluated is unspecified, and you cannot safely assume that it will be consistent. For that reason, not because of typing considerations, your version 2 is preferable here.
Overall, however, if you have a stack whose elements may represent values of different types, then I would suggest making the element type a union (if the type of each element is implicit from context) or a tagged union (if elements need to carry information about their own types). For example,
union integer {
int32_t _signed;
uint32_t _unsigned;
};
union integer pop();
void push(union integer i);
union integer first = pop();
union integer second = pop();
push((union integer) { ._signed = second._signed - first._signed });
To help you see what's happening in your code, I've included the text of the standard that explains how automatic type conversions are done (for integers), along with the section on bitwise shifting since that works a bit differently. I then step through your code to see exactly what intermediate types exist after each operation.
Relevant parts of the standard
6.3.1.1 Boolean, characters, and integers
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
6.3.1.8 Usual Arithmetic Conversions
(I'm just summarizing the relevant parts here.)
Integer promotion is done.
If they are both signed or both unsigned, they are both converted to the larger type.
If the unsigned type is larger, the signed type is converted to the unsigned type.
If the signed type can represent all values of the unsigned type, the unsigned type is converted to the signed one.
Otherwise, they are both converted to the unsigned type of the same size as the signed type.
(Basically, if you've got a OP b, the size of the type used will be the largest of int, type(a), type(b), and it
will prefer types that can represent all values representable by type(a) and type(b). And finally, it favors signed types.
Most of the time, that means it'll be int.)
6.5.7 Bitwise shift operators
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
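As a quick self-contained illustration of the integer promotions summarized above (my own example, not taken from the question):
#include <stdint.h>

void promotion_example(void)
{
    uint8_t a = 200, b = 100;
    /* both operands are promoted to int before the addition, so the result
       is 300, not 44 -- no wrap-around at 8 bits happens here */
    int sum = a + b;
    (void)sum;
}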
How all that applies to your code.
I'm skipping the first example for now, since I don't know what type pop() returns. If you add that information to your
question, I can address that example as well.
Let's step through what happens in this expression (note that you had an extra ( after the first cast in your version; I've removed that):
(((int32_t)argument[0] << 8) & (int32_t)0x0000ff00 | (((int32_t)argument[1]) & (int32_t)0x000000ff) )
Some of these conversions depend on the relative sizes of the types.
Let INT_TYPE be the larger of int32_t and int on your system.
((int32_t)argument[0] << 8)
argument[0] is explicitly cast to int32_t
8 is already an int, so no conversion happens
(int32_t)argument[0] is converted to INT_TYPE.
The left shift happens and the result has type INT_TYPE.
(Note that if argument[0] could have been negative, the shift would be undefined behavior. But since it was originally unsigned, you're safe here.)
Let a represent the result of those steps.
a & (int32_t)0x0000ff00
0x0000ff00 is explicitly cast to int32_t.
Usual arithmetic conversions. Both sides are converted to INT_TYPE. Result is of type INT_TYPE.
Let b represent the result of those steps.
(((int32_t)argument[1]) & (int32_t)0x000000ff)
Both of the explicit casts happen
Usual arithmetic conversions are done. Both sides are now INT_TYPE.
Result has type INT_TYPE.
Let c represent that result.
b | c
Usual arithmetic conversions; no changes since they're both INT_TYPE.
Result has type INT_TYPE.
Conclusion
So none of the intermediate results are unsigned here. (Also, most of the explicit casts were unnecessary, especially if sizeof(int) >= sizeof(int32_t) on your system).
Additionally, since you start with uint8_t values, never shift more than 8 bits, and store all the intermediate results in types of at least 32 bits, the top 16 bits will always be 0 and the values will all be non-negative, which means that the signed and unsigned types represent all the values you could have here exactly the same.
What exactly are you observing that makes you think it's using unsigned types where it should use signed ones? Can we see example inputs and outputs along with the outputs you expected?
Edit:
Based on your comment, it appears that the reason it isn't working the way you expected is not that the type is unsigned, but that you're generating the bitwise representations of 16-bit signed ints and storing them in 32-bit signed ints. Get rid of all the casts you have other than the (int32_t)argument[0] ones (and change those to (int)argument[0]; int is generally the size that the system operates on most efficiently, so your operations should use int unless you have a specific reason to use another size). Then cast the final result to int16_t.
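A sketch of what that might look like, keeping the question's function and parameter names (the masks from the question are dropped since the bytes are already 8-bit, and int is assumed to be wider than 16 bits):
#include <stdint.h>

int16_t get_signed_argument(const uint8_t *argument)
{
    /* assemble the big-endian 16-bit value in an int; the final cast to
       int16_t reinterprets values above 32767 as negative, which is
       implementation-defined but behaves as expected on the common
       implementations discussed above */
    return (int16_t)(((int)argument[0] << 8) | argument[1]);
}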
I wouldn't expect the value that gets printed to be the initial negative value. Is there something I'm missing for type casting?
#include<stdint.h>
int main() {
int32_t color = -2451337;
uint32_t color2 = (uint32_t)color;
printf("%d", (uint32_t)color2);
return 0;
}
int32_t color = -2451337;
uint32_t color2 = (uint32_t)color;
The cast is unnecessary; if you omit it, exactly the same conversion will be done implicitly.
For any conversion between two numeric types, if the value is representable in both types, the conversion preserves the value. But since color is negative, that's not the case here.
For conversion from a signed integer type to an unsigned integer type, the result is implementation-defined (or it can raise an implementation-defined signal, but I don't know of any compiler that does that).
Under most compilers, conversions between integer types of the same size just copies or reinterprets the bits making up the representation. The standard requires int32_t to use two's-complement representation, so if the conversion just copies the bits, then the result will be 4292515959.
(Other results are permitted by the C standard, but not likely to be implemented by real-world compilers. The standard permits one's-complement and sign-and-magnitude representations for signed integer types, but specifically requires int32_t to use two's complement; a C compiler for a one's-complement CPU probably just wouldn't define int32_t.)
printf("%d", (uint32_t)color2);
Again, the cast is unnecessary, since color2 is already of type uint32_t. But the "%d" format requires an argument of type int, which is a signed type (that may be as narrow as 16 bits). In this case, the uint32_t value isn't converted to int. Most likely the representation of color2 will be treated as if it were an int object, but the behavior is undefined, so as far as the C standard is concerned quite literally anything could happen.
To print a uint32_t value, you can use the PRIu32 macro defined in <inttypes.h>:
printf("%" PRIu32, color2);
Or, perhaps more simply, you can convert it to the widest unsigned integer type and use "%ju":
printf("%ju", (uintmax_t)color2);
This will print the implementation-defined value (probably 4292515959) of color2.
And you should add a newline \n to the format string.
More quibbles:
You're missing #include <stdio.h>, which is required if you call printf.
int main() is ok, but int main(void) is preferred.
You took a bunch of bits (stored in a signed value). You then told the CPU to interpret that bunch of bits as unsigned. You then told the CPU to render the same bunch of bits as signed again (%d). You would therefore see the same as you first entered.
C just deals in bunches of bits. If the value you had chosen had been near the representational limit of the type(s) involved (read up on two's-complement representation), then we might have seen some funky effects, but the value you happened to choose wasn't, so you got back what you put in.
I would like to convert an unsigned int to uint64 inside a C function. uint64 is defined in the R package int64.
EDIT
This question is about conversion from C unsigned int data type to uint64 R language data type.
"int64 package has been developped so that 64 bit integer vectors are represented using only R data structures, i.e data is not represented as external pointers to some C++ object. Instead, each 64 bit integer is represented as a couple of regular 32 bit integers, each of them carrying half the bits of the underlying 64 bit integer. This was a choice by design so that 64 bit integer vectors can be serialized and used as data frame columns."
An unsigned int is required to be able to store values in at least the range 0 to 65535. An int64_t (that's the portable version, from <stdint.h>) can store values between -(2^63) and 2^63 - 1. There's a problem here, which is that an unsigned int might be 64 bits in length and might represent values outside the range of an int64_t (see §5.2.4.2.1 p1 of the C standard, and the section below).
Here's what the standard says:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.60)
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
60) The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
Ignoring an implementation-defined signal corresponding to a computational exception is undefined behaviour.
In the case of unsigned-to-signed conversions, I suggest defining your behaviour explicitly. Saturation is the easiest: when your unsigned int value is greater than INT64_MAX, the conversion results in INT64_MAX. This looks something like x > INT64_MAX ? INT64_MAX : x. Wrapping LIA-style (e.g. unsigned int x = UINT_MAX; ++x == 0) is possible for int64_t because of the guarantee that int64_t won't contain padding, but more work is necessary to make portability guarantees. I suggest something like (x & INT64_MIN) > INT64_MAX ? -(x & INT64_MAX) : x & INT64_MAX, if you can find some assertion that your int64 will have the same representation as the C standard int64_t.
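A minimal sketch of the saturating variant (the function name is my own invention; it assumes unsigned int is no wider than 64 bits):
#include <stdint.h>

/* hypothetical helper: convert an unsigned int to int64_t, saturating at
   INT64_MAX; the comparison only matters if unsigned int is wider than 63 bits */
static int64_t uint_to_int64_saturate(unsigned int x)
{
    return (uint64_t)x > (uint64_t)INT64_MAX ? INT64_MAX : (int64_t)x;
}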
All operations on "standard" signed integer types in C (short, int, long, etc) exhibit undefined behaviour if they yield a result outside of the [TYPE_MIN, TYPE_MAX] interval (where TYPE_MIN, TYPE_MAX are the minimum and the maximum integer value respectively. that can be stored by the specific integer type.
According to the C99 standard, however, all intN_t types are required to have a two's complement representation:
7.18.1.1 Exact-width integer types
1. The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two's complement representation. Thus, int8_t denotes a signed integer type with a width of exactly 8 bits.
Does this mean that intN_t types in C99 exhibit well-defined behaviour in case of an integer overflow? For example, is this code well-defined?
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(void)
{
printf("Minimum 32-bit representable number: %" PRId32 "\n", INT32_MAX + 1);
return 0;
}
No, it doesn't.
The requirement for a 2's-complement representation for values within the range of the type does not imply anything about the behavior on overflow.
The types in <stdint.h> are simply typedefs (aliases) for existing types. Adding a typedef doesn't change a type's behavior.
Section 6.5 paragraph 5 of the C standard (both C99 and C11) still applies:
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined.
This doesn't affect unsigned types because unsigned operations do not overflow; they're defined to yield the wrapped result, reduced modulo TYPE_MAX + 1. Except that unsigned types narrower than int are promoted to (signed) int, and can therefore run into the same problems. For example, this:
unsigned short x = USHRT_MAX;
unsigned short y = USHRT_MAX;
unsigned short z = x * y;
causes undefined behavior if short is narrower than int. (If short and int are 16 and 32 bits, respectively, then 65535 * 65535 yields 4294836225, which exceeds INT_MAX.)
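A common way to sidestep that trap (a sketch, applied to the example above) is to force the arithmetic into an unsigned type that won't be promoted back to int:
#include <limits.h>

void example(void)
{
    unsigned short x = USHRT_MAX;
    unsigned short y = USHRT_MAX;
    /* (unsigned int)x forces the multiplication to be done in unsigned int,
       where wrap-around is well-defined; converting the product back to
       unsigned short is also well-defined (reduced modulo USHRT_MAX + 1) */
    unsigned short z = (unsigned int)x * y;
    (void)z;   /* suppress unused-variable warnings */
}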
Although storing an out-of-range value to a signed type stored in memory will generally store the bottom bits of the value, and reloading the value from memory will sign-extend it, many compilers' optimizations may assume that signed arithmetic won't overflow, and the effects of overflow may be unpredictable in many real scenarios. As a simple example, on a 16-bit DSP which uses its one 32-bit accumulator for return values (e.g. TMS3205X), given int16_t foo(int16_t bar) { return bar+1; }, a compiler would be free to load bar, sign-extended, into the accumulator, add one to it, and return. If the calling code were e.g. long z = foo(32767), the code might very well set z to 32768 rather than -32768.
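A sketch of the calling pattern described (the behaviour genuinely varies by compiler and target, which is the point of the answer):
#include <stdint.h>
#include <stdio.h>

int16_t foo(int16_t bar) { return bar + 1; }   /* result doesn't fit in int16_t when bar == 32767 */

int main(void)
{
    long z = foo(32767);
    /* on a typical 32-bit-int platform the conversion back to int16_t is
       implementation-defined and usually yields -32768; on the accumulator-based
       DSP described above, z might instead end up as 32768 */
    printf("%ld\n", z);
    return 0;
}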
I've read and wondered about the source code of sqlite
static int strlen30(const char *z){
const char *z2 = z;
while( *z2 ){ z2++; }
return 0x3fffffff & (int)(z2 - z);
}
Why use strlen30() instead of strlen() (in string.h)?
The commit message that went in with this change states:
[793aaebd8024896c] part of check-in [c872d55493] Never use strlen(). Use our own internal sqlite3Strlen30() which is guaranteed to never overflow an integer. Additional explicit casts to avoid nuisance warning messages. (CVS 6007) (user: drh branch: trunk)
(This is my answer from Why reimplement strlen as loop+subtraction?, but it was closed.)
I can't tell you the reason why they had to re-implement it, and why they chose int instead of size_t as the return type. But about the function:
/*
** Compute a string length that is limited to what can be stored in
** lower 30 bits of a 32-bit signed integer.
*/
static int strlen30(const char *z){
const char *z2 = z;
while( *z2 ){ z2++; }
return 0x3fffffff & (int)(z2 - z);
}
Standard References
The standard says in (ISO/IEC 14882:2003(E)) 3.9.1 Fundamental Types, 4.:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer. 41)
...
41): This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer
type
That part of the standard does not define overflow-behaviour for signed integers. If we look at 5. Expressions, 5.:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined, unless such an expression is a constant expression (5.19), in which case the program is ill-formed. [Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function.]
So much for overflow.
As for subtracting two pointers to array elements, 5.7 Additive operators, 6.:
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as ptrdiff_t in the cstddef header (18.1). [...]
Looking at 18.1:
The contents are the same as the Standard C library header stddef.h
So let's look at the C standard (I only have a copy of C99, though), 7.17 Common Definitions :
The types used for size_t and ptrdiff_t should not have an integer conversion rank
greater than that of signed long int unless the implementation supports objects
large enough to make this necessary.
No further guarantee made about ptrdiff_t. Then, Annex E (still in ISO/IEC 9899:TC2) gives the minimum magnitude for signed long int, but not a maximum:
#define LONG_MAX +2147483647
Now what are the maxima for int, the return type for sqlite - strlen30()? Let's skip the C++ quotation that forwards us to the C-standard once again, and we'll see in C99, Annex E, the minimum maximum for int:
#define INT_MAX +32767
Summary
Usually, ptrdiff_t is not bigger than signed long, which is not smaller than 32 bits.
int is only defined to be at least 16 bits long.
Therefore, subtracting two pointers may give a result that does not fit into the int of your platform.
We remember from above that for signed types, a result that does not fit yields undefined behaviour.
strlen30() applies a bitwise AND to the pointer-subtraction result:
         |            32 bit             |
ptr_diff |10111101111110011110111110011111| // could be even larger
&        |00111111111111111111111111111111| // == 0x3FFFFFFF
         ----------------------------------
=        |00111101111110011110111110011111| // truncated
That prevents undefined behaviour by truncating the pointer-subtraction result to a maximum value of 0x3FFFFFFF = 1073741823 (decimal).
I am not sure why they chose exactly that value, because on most machines only the most significant bit tells the signedness. It could have made sense versus the standard to choose the minimum INT_MAX, but 1073741823 is indeed slightly strange without knowing more details (though it of course perfectly does what the comment above their function says: truncate to 30 bits and prevent overflow).
The CVS commit message says:
Never use strlen(). Use our own internal sqlite3Strlen30() which is guaranteed to never overflow an integer. Additional explicit casts to avoid nuisance warning messages. (CVS 6007)
I couldn't find any further reference to this commit or explanation how they got an overflow in that place. I believe that it was an error reported by some static code analysis tool.