Is SSE2 signed integer overflow undefined?

Signed integer overflow is undefined in C and C++. But what about signed integer overflow within the individual fields of an __m128i? In other words, is this behavior defined in the Intel standards?
#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h>

union SSE2
{
    __m128i m_vector;
    uint32_t m_dwords[sizeof(__m128i) / sizeof(uint32_t)];
};

int main()
{
    union SSE2 reg = {_mm_set_epi32(INT32_MAX, INT32_MAX, INT32_MAX, INT32_MAX)};
    reg.m_vector = _mm_add_epi32(reg.m_vector, _mm_set_epi32(1, 1, 1, 1));
    printf("%08" PRIX32 "\n", (uint32_t) reg.m_dwords[0]);
    return 0;
}
[myria@polaris tests]$ gcc -m64 -msse2 -std=c11 -O3 sse2defined.c -o sse2defined
[myria@polaris tests]$ ./sse2defined
80000000
Note that the 4-byte-sized fields of an SSE2 __m128i are considered signed.

You are asking about a specific implementation issue (using SSE2) and not about the standard. You've answered your own question "signed integer overflow is undefined in C".
When you are dealing with C intrinsics you aren't even programming in C! These insert assembly instructions in line. It is done in a somewhat portable way, but it is no longer true that your data is a signed integer. It is a vector type being passed to an SSE intrinsic. YOU are then casting that to an integer and telling C that you want to see the result of that operation. Whatever bytes happen to be there when you cast is what you will see, and that has nothing to do with signed arithmetic in the C standard.
Things are a bit different if the compiler inserts SSE instructions (say in a loop). Now the compiler is guaranteeing that the result is the same as a signed 32 bit operation ... UNLESS there is undefined behaviour (e.g. an overflow) in which case it can do whatever it likes.
Note also that undefined doesn't mean unexpected: whatever behaviour you observe for auto-vectorization might be consistent and repeatable (maybe it does always wrap on your machine). But that might not hold for all surrounding code, or for all compilers. And if the compiler selects different instructions depending on the availability of SSSE3, SSE4, or AVX*, it might not even hold for all processors, since it may make different code-gen choices for different instruction sets that do or don't take advantage of signed overflow being UB.
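As a hedged illustration of that point (this hypothetical function is not from the original answer): the compiler may well vectorize the loop below with paddd, but because the source-level addition is a signed int operation, an iteration that overflows is still undefined behaviour at the C level.
/* The compiler may emit paddd for this loop, but if any a[i] == INT_MAX
   the signed addition is still undefined behaviour in C. */
void add_one(int *a, int n)
{
    for (int i = 0; i < n; i++)
        a[i] += 1;
}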
EDIT:
Okay, well now that we are asking about "the Intel standards" (which don't exist, I think you mean the x86 standards), I can add something to my answer. Things are a little bit convoluted.
Firstly, the intrinsic _mm_add_epi32 is defined by Microsoft to match Intel's intrinsics API definition (https://software.intel.com/sites/landingpage/IntrinsicsGuide/ and the intrinsic notes in Intel's x86 assembly manuals). They cleverly define it as doing to a __m128i the same thing the x86 PADDD instruction does to an XMM register, with no more discussion (e.g. is it a compile error on ARM or should it be emulated?).
Secondly, PADDD isn't only a signed addition! It is a 32 bit binary add. x86 uses two's complement for signed integers, and adding them is the same binary operation as unsigned base 2. So yes, paddd is guaranteed to wrap. There is a good reference for all the x86 instructions here.
So what does that mean: again, the assumption in your question is flawed because there isn't even any overflow. So the output you see should be defined behaviour. Note that it is defined by Microsoft and x86 (not by the C Standard).
Other x86 compilers also implement Intel's intrinsics API the same way, so _mm_add_epi32 is portably guaranteed to just wrap.
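For what it's worth, here is a minimal stand-alone sketch (not from the original answers) of that guarantee: the lane holding INT32_MAX + 1 has the same bits as well-defined unsigned 32-bit wraparound, so both values below print 80000000.
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <emmintrin.h>

int main(void)
{
    __m128i v = _mm_add_epi32(_mm_set1_epi32(INT32_MAX), _mm_set1_epi32(1));
    uint32_t lane0  = (uint32_t) _mm_cvtsi128_si32(v);  /* extract lane 0 */
    uint32_t scalar = (uint32_t) INT32_MAX + 1u;        /* defined unsigned wrap */
    printf("%08" PRIX32 " %08" PRIX32 "\n", lane0, scalar);
    return 0;
}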

This isn't "signed integer overflow within the fields of an __m128i". This is a function call. (Being a compiler intrinsic is just an optimization, much like inlining, and that doesn't interact with the C standard as long as the as-if rule is respected)
Its behavior must follow the contract (preconditions, postconditions) that the function developer documented. Usually intrinsics are documented by the compiler vendor, although they tend to coordinate the naming and contract of intrinsics to aid in porting code.

Related

Do we have atomic uint32 type in C?

sig_atomic_t is a typedef of int. But I am curious: is there an atomic type that corresponds to uint32_t?
C11 defines following typedefs to atomic types in <stdatomic.h>:
atomic_bool
atomic_char
atomic_schar
atomic_uchar
atomic_short
atomic_ushort
atomic_int
atomic_uint
atomic_long
atomic_ulong
atomic_llong
atomic_ullong
atomic_char16_t
atomic_char32_t
atomic_wchar_t
atomic_int_least8_t
atomic_uint_least8_t
atomic_int_least16_t
atomic_uint_least16_t
atomic_int_least32_t
atomic_uint_least32_t
atomic_int_least64_t
atomic_uint_least64_t
atomic_int_fast8_t
atomic_uint_fast8_t
atomic_int_fast16_t
atomic_uint_fast16_t
atomic_int_fast32_t
atomic_uint_fast32_t
atomic_int_fast64_t
atomic_uint_fast64_t
atomic_intptr_t
atomic_uintptr_t
atomic_size_t
atomic_ptrdiff_t
atomic_intmax_t
atomic_uintmax_t
There is no atomic_uint32_t, so your options are:
You can use _Atomic(uint32_t) directly (see the sketch after this list).
You can use one of existing alternative types (atomic_uint_least32_t, atomic_uint_fast32_t or even atomic_char32_t) if this fits your purpose (probably it doesn't).
You can assume atomic_uint is 32-bit and use it as a replacement. This is actually one of the more portable options, as most OSes (*BSDs, Linux, Windows) define int as a 32-bit type.
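A minimal sketch of the first option, assuming a C11 compiler that does not define __STDC_NO_ATOMICS__ (the counter and its use are purely illustrative):
#include <stdatomic.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static _Atomic(uint32_t) counter = 0;

int main(void)
{
    atomic_fetch_add(&counter, 1u);        /* atomic read-modify-write */
    uint32_t now = atomic_load(&counter);  /* atomic read */
    printf("%" PRIu32 "\n", now);
    return 0;
}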
Ivan's answer is great (I hope you accept his, not this one), but it's worth mentioning that some compilers (I'm looking at you, MSVC) don't support C11 atomics.
If you're not concerned with such compilers, use C11 atomics.
If, OTOH, you need a bit more portability, you may want to take a look at the atomic module in Portable Snippets (disclaimer: it's one of my projects, so take this suggestion with a grain of salt). There is no unsigned 32-bit atomic, but there are 32- and 64-bit signed atomic types which work well with a lot of compilers, including old (pre-C11) GCC, clang, and ICC, as well as suncc, ARM, and a few others.

Should enum never be used in an API?

I am using a C library provided to me already compiled. I have limited information on the compiler, version, options, etc., used when compiling the library. The library interface uses enum both in structures that are passed and directly as passed parameters.
The question is: how can I assure or establish that when I compile code to use the provided library, that my compiler will use the same size for those enums? If it does not, the structures won't line up, and the parameter passing may be messed up, e.g. long vs. int.
My concern stems from the C99 standard, which states that the enum type:
shall be compatible with char, a signed integer type, or an unsigned
integer type. The choice of type is implementation-defined, but shall
be capable of representing the values of all the members of the
enumeration.
As far as I can tell, so long as the largest value fits, the compiler can pick any type it darn well pleases, effectively on a whim, potentially varying not only between compilers, but different versions of the same compiler and/or compiler options. It could pick 1, 2, 4, or 8-byte representations, resulting in potential incompatibilities in both structures and parameter passing. (It could also pick signed or unsigned, but I don't see a mechanism for that being a problem in this context.)
Am I missing something here? If I am not missing something, does this mean that enum should never be used in an API?
Update:
Yes, I was missing something. While the language specification doesn't help here, as noted by @Barmar the Application Binary Interface (ABI) does. Or if it doesn't, then the ABI is deficient. The ABI for my system indeed specifies that an enum must be a signed four-byte integer. If a compiler does not obey that, then it is a bug. Given a complete ABI and compliant compilers, enum can be used safely in an API.
APIs that use enum are depending on the assumption that the compiler will be consistent, i.e. given the same enum declaration, it will always choose the same underlying type.
While the language standard doesn't specifically require this, it would be quite perverse for a compiler to do anything else.
Furthermore, all compilers for a particular OS need to be consistent with the OS's ABI. Otherwise, you would have far more problems, such as the library using 64-bit int while the caller uses 32-bit int. Ideally, the ABI should constrain the representation of enums, to ensure compatibility.
More generally, the language specification only ensures compatibility between programs compiled with the same implementation. The ABI ensures compatibility between programs compiled with different implementations.
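As a hedged sketch (not part of the original answer), assuming a C11 compiler: a build-time check can catch a mismatch between the compiler's choice and the 4-byte assumption the library's ABI makes. The enum here is hypothetical.
#include <assert.h>

enum lib_color { LIB_RED, LIB_GREEN, LIB_BLUE };  /* hypothetical API enum */

/* Fails to compile if this translation unit's enum size differs from the
   4-byte integer the ABI (and the prebuilt library) expects. */
static_assert(sizeof(enum lib_color) == 4,
              "enum size does not match the ABI assumption");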
From the question:
The ABI for my system indeed specifies that an enum must be a signed four-byte integer. If a compiler does not obey that, then it is a bug.
I'm surprised about that. I suspect that in reality your compiler will select a 64-bit (8-byte) size for your enum if you define an enumerated constant with a value larger than 2^32.
On my platforms (MinGW gcc 4.6.2 targeting x86 and gcc 4.4 on Linux targeting x86_64), the following code says that I get both 4- and 8-byte enums:
#include <stdio.h>

enum { a } foo;
enum { b = 0x123456789 } bar;

int main(void) {
    printf("%lu\n", (unsigned long) sizeof(foo));
    printf("%lu", (unsigned long) sizeof(bar));
    return 0;
}
I compiled with -Wall -std=c99 switches.
I guess you could say that this is a compiler bug. But the alternatives of removing support for enumerated constants larger than 2^32 or always using 8-byte enums both seem undesirable.
Given that these common versions of GCC don't provide a fixed size enum, I think the only safe action in general is to not use enums in APIs.
Further notes for GCC
Compiling with "-pedantic" causes the following warnings to be generated:
main.c:4:8: warning: integer constant is too large for 'long' type [-Wlong-long]
main.c:4:12: warning: ISO C restricts enumerator values to range of 'int' [-pedantic]
The behavior can be tailored via the -fshort-enums and -fno-short-enums switches.
Results with Visual Studio
Compiling the above code with VS 2008 x86 causes the following warnings:
warning C4341: 'b' : signed value is out of range for enum constant
warning C4309: 'initializing' : truncation of constant value
And with VS 2013 x86 and x64, just:
warning C4309: 'initializing' : truncation of constant value

Type specifications in platform ABIs

Which of these items can safely be assumed to be defined in any practically-usable platform ABI?
Value of CHAR_BIT
Size, alignment requirements and object representation of:
void*, size_t, ptrdiff_t
unsigned char and signed char
intptr_t and uintptr_t
float, double and long double
short and long long
int and long (but here I expect a "no")
Pointer to an object type for which the platform ABI specifies these properties
Pointer to function whose type only involves types for which the platform ABI specifies these properties
Object representation of a null object pointer
Object representation of a null function pointer
For example, if I have a library (compiled by an unknown, but ABI-conforming compiler) which publishes this function:
void* foo(void *bar, size_t baz, void* (*qux)());
can I assume to be able to safely call it in my program regardless of the compiler I use?
Or, taken the other way round, if I am writing a library, is there a set of types such that if I limit the library's public interface to this set, it will be guaranteed to be usable on all platforms where it builds?
I don't see how you can expect any library to be universally compatible. If that were possible, there would not be so many compiled variations of libraries.
For example, you could call a 64-bit library from a 16-bit program as long as you set up the call correctly. But you would have to know you're calling a 64-bit based library.
Portability is a much-talked-about goal, but few truly achieve it. After 30+ years of system-level, firmware, and application programming, I think of it as more of a fantasy than a goal. Unfortunately, hardware forces us to optimize for the hardware. Therefore, when I write a library, I use the following:
Compile for ABI
Use a pointer to a structure for input and output for all function calls:
int lib_func(struct lib_input *input, struct lib_output *output);  /* struct tags are placeholders */
Where the returned int indicates errors only. I make all error codes unique. I require the user to call an init function prior to any use of the library. The user calls it as:
lib_init(sizeof(int), sizeof(char *), sizeof(long), sizeof(long long));
So that I can decide if there will be any trouble or modify any assumptions if needed. I also add a function allowing the user to learn my data sizes and alignment in addition to version numbers.
This is not to say the user or I am expected to "on-the-fly" modify code or spend lots of CPU power reworking structures. But this allows the application to make absolutely sure it's compatible with me and vice-versa.
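A hedged sketch of what the library side of that lib_init check might look like (the signature and error code are illustrative, not taken from the answer):
#include <stddef.h>

/* Returns 0 if the caller was built with the same basic type sizes as the
   library, or a nonzero error code on a mismatch. */
int lib_init(size_t int_sz, size_t ptr_sz, size_t long_sz, size_t llong_sz)
{
    if (int_sz  != sizeof(int)  || ptr_sz   != sizeof(char *) ||
        long_sz != sizeof(long) || llong_sz != sizeof(long long))
        return 1;  /* ABI mismatch between caller and library */
    return 0;
}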
The other option which I have employed in the past, is to simply include several entry-point functions with my library. For example:
int lib_func32();
int lib_func16();
int lib_func64();
It makes a bit of a mess for you, but you can then fix it up using the preprocessor:
#ifdef LIB_USE32
#define lib_function lib_func32
#endif
You can do the same with data structures but I'd recommend using the same size data structure regardless of CPU size -- unless performance is a top-priority. Again, back to the hardware!
The final option I explore is whether to have entry functions of all sizes and styles which convert the input to my library's expectations, as well as my library's output.
For example, your lib_func32(&input, &output) can be compiled to expect a 32-bit aligned, 32-bit pointer but it converts the 32-bit struct into your internal 64-bit struct then calls your 64 bit function. When that returns, it reformats the 64-bit struct to its 32-bit equivalent as pointed to by the caller.
int lib_func32(struct lib_input32 *input32, struct lib_output32 *output32)
{
    /* struct tags are placeholders for the library's real 32- and 64-bit layouts */
    struct lib_input64 input64;
    struct lib_output64 output64;
    int retval;

    lib_convert32_to_64(input32, &input64);
    retval = lib_func64(&input64, &output64);
    lib_convert64_to_32(&output64, output32);
    return retval;
}
In summary, a totally portable solution is not viable. Even if you begin with total portability, eventually you will have to deviate. This is when things truly get messy. You break your style for deviations which then breaks your documentation and confuses users. I think it's better to just plan it from the start.
Hardware will always cause you to have deviations. Just consider how much trouble 'endianness' causes -- not to mention the number of CPU cycles which are used each day swapping byte orders.
The C standard contains an entire section in the appendix summarizing just that:
J.3 Implementation-defined behavior
A completely random subset:
The number of bits in a byte
Which of signed char and unsigned char is the same as char
The text encodings for multibyte and wide strings
Signed integer representation
The result of converting a pointer to an integer and vice versa (6.3.2.3). Note that this means any pointer, not just object pointers.
Update: To address your question about ABIs: An ABI (application binary interface) is not a standardized concept, and it isn't said anywhere that an implementation must even specify an ABI. The ingredients of an ABI are partly the implementation-defined behaviour of the language (though not all of it; e.g. signed-to-unsigned conversion is implementation defined, but not part of an ABI), and most of the implementation-defined aspects of the language are dictated by the hardware (e.g. signed integer representation, floating point representation, size of pointers).
However, more important aspects of an ABI are things like how function calls work, i.e. where the arguments are stored, who's responsible for cleaning up the memory, etc. It is crucial for two compilers to agree on those conventions in order for their code to be binarily compatible.
In practice, an ABI is usually the result of an implementation. Once the compiler is complete, it determines -- by virtue of its implementation -- an ABI. It may document this ABI, and other compilers, and future versions of the same compiler, may like to stick to those conventions. For C implementations on x86, this has worked rather well and there are only a few, usually well documented, free parameters that need to be communicated for code to be interoperable. But for other languages, most notably C++, you have a completely different picture: There is nothing coming near a standard ABI for C++ at all. Microsoft's compiler breaks the C++ ABI with every release. GCC tries hard to maintain ABI compatibility across versions and uses the published Itanium ABI (ironically for a now dead architecture). Other compilers may do their own, completely different thing. (And then you have of course issues with C++ standard library implementations, e.g. does your string contain one, two, or three pointers, and in which order?)
To summarize: many aspects of a compiler's ABI, especially pertaining to C, are dictated by the hardware architecture. Different C compilers for the same hardware ought to produce compatible binary code as long as certain aspects like function calling conventions are communicated properly. However, for higher-level languages all bets are off, and whether two different compilers can produce interoperable code has to be decided on a case-by-case basis.
If I understand your needs correctly, the fixed-width uint*_t style types are the only ones that give you a binary compatibility guarantee; of course int and char will as well, but the others tend to differ. For example, long differs between Windows and Linux: Windows treats it as 4 bytes while 64-bit Linux treats it as 8 bytes. If you really depend on the ABI, you have to plan for the platforms you are going to deliver on, and maybe use typedefs to make things standardized and readable (see the sketch below).
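A hedged sketch of that typedef idea, assuming C99's <stdint.h> is available on every platform you target (the names and the function are illustrative only):
#include <stdint.h>

/* Fixed-width aliases keep the same object size on every conforming ABI. */
typedef int32_t  lib_i32;
typedef uint64_t lib_u64;

/* hypothetical public API function built only from fixed-width types */
lib_i32 lib_count_items(lib_u64 mask);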

-ansi -pedantic 64 bits length integers in C

I would like to know if there is an equivalent to int64_t in C that would work on 32- and 64-bit platforms and that is compliant with gcc's -ansi and -pedantic modes.
I found this interesting post, but it relates to C++.
I tried to use long long but I get an "integer overflow in expression" [-Woverflow] error. Moreover, long long is not supported by ISO C90.
I also tried what is suggested in this post, but I still have a -Woverflow error when using int64_t.
Any solutions?
In C89 (which the -ansi flag requests), there is no standard way to get a 64-bit integer. You have to rely on the types provided by your implementation.
In C99, some implementations may define int64_t, since it is an optional type. As for long long (introduced in C99), there is no guarantee that its width is exactly 64 bits, only that it is at least 64 bits.
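A minimal sketch of how that plays out in practice, assuming you can move from -ansi to -std=c99 -pedantic: <stdint.h> defines INT64_MAX exactly when the optional int64_t exists, so its presence can be detected at compile time (the typedef name is just an example).
#include <stdint.h>

#ifdef INT64_MAX
typedef int64_t my_i64;  /* the implementation provides the exact-width type */
#else
#error "this implementation has no 64-bit exact-width integer type"
#endif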

128-bit arithmetic on x64 in C

When implementing bignums on x86, obviously the most efficient choice for digit size is 32 bits. However, you need arithmetic up to twice the digit size (i.e. 32+32=33, 32*32=64, 64/32=32). Fortunately, not only does x86 provide this, but it's also accessible from portable C (uint64_t).
Similarly, on x64 it would be desirable to use 64-bit digits. This would require 128 bit arithmetic (i.e. 64+64=65, 64*64=128, 128/64=64). Fortunately, x64 provides this. Unfortunately, it's not accessible from portable C, though obviously one could dip into assembly.
So my question is whether it's accessible from non-portable C. Do any C compilers on x64 provide access to this, and if so, what's the syntax?
(Note that I'm not talking about 128-bit vectors that are strictly treated as collections of 32 or 64-bit words with no carry propagation between them, but about actual 128-bit integer operations.)
GCC 4.1 introduced initial 128-bit integer support with the __int128_t and __uint128_t built-in types, but the type was only officially documented from GCC 4.6 onward as __int128 / unsigned __int128.
Clang also supports those types, although I don't know since when. The first version on Godbolt (3.0.0) does support __int128_t, though.
ICC gained the same support in version 13.0.0: 128-bit integers supporting +, -, *, /, and % in the Intel C Compiler?
See also
Is there a 128 bit integer in gcc?
What gcc versions support the __int128 intrinsic type?
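A minimal sketch of the GCC/Clang built-in type mentioned above, assuming a 64-bit target (the values are arbitrary):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t a = 0xFFFFFFFFFFFFFFFFu, b = 0xFFFFFFFFFFFFFFFFu;
    unsigned __int128 p = (unsigned __int128) a * b;  /* full 64x64 -> 128 product */
    printf("%016llx %016llx\n",
           (unsigned long long) (p >> 64),            /* high 64 bits: fffffffffffffffe */
           (unsigned long long) (uint64_t) p);        /* low 64 bits:  0000000000000001 */
    return 0;
}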
If you're on MSVC then there's no direct support for a 128-bit type but there are many intrinsics helping you do 128-bit operations:
64*64=128: _mul128(), _umul128(), __mulh(), __umulh()
128/64=64: _div128(), _udiv128()
64+64=65: The carry in an addition can be easily obtained by comparing the low part of the sum with any of the operands:
#include <stdint.h>

typedef struct {
    uint64_t H, L;
} uint128;

static inline uint128 add(uint128 a, uint128 b)
{
    uint128 c;
    c.L = a.L + b.L;                 // add low parts
    c.H = a.H + b.H + (c.L < a.L);   // add high parts plus the carry out of the low add
    return c;
}
The same approach can be used for 128-bit subtraction.
There are also intrinsics for shifting, although implementing these is trivial: __shiftleft128(), __shiftright128().
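As a hedged sketch (not MSVC's intrinsics), a plain-C left shift on the uint128 struct from the addition example above, valid for shift counts 1 through 63 only:
static inline uint128 shl(uint128 a, unsigned n)
{
    uint128 c;
    c.H = (a.H << n) | (a.L >> (64 - n));  /* bits carried up from the low word */
    c.L = a.L << n;
    return c;
}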
If you're on an unsupported compiler then just use fixed-width types from one of the many available libraries; that will be much faster than arbitrary precision. For example ttmath::UInt<4> (a 128-bit int type built from four 32-bit limbs), or (u)int128_t in Boost.Multiprecision and calccrypto/uint128_t. An arbitrary-precision arithmetic library like GMP is just too costly for this. One example: Optimization story: Switching from GMP to gcc's __int128 reduced run time by 95%
You may want to check the GNU Multiple Precision Arithmetic Library:
http://gmplib.org/
