-ansi -pedantic 64-bit integers in C

I would like to know if there is an equivalent to int64_t in C that works on both 32-bit and 64-bit platforms and that compiles cleanly under gcc's -ansi and -pedantic modes.
I found this interesting post, but it relates to C++.
I tried to use long long but I get an "integer overflow in expression [-Woverflow]" error. Moreover, long long is not supported by ISO C90.
I also tried what is suggested in this post, but I still get a -Woverflow error when using int64_t.
Any solutions?

In C89 (required by the -ansi flag), there is no standard way to get a 64-bit integer. You have to rely on the types provided by your implementation.
In C99, some implementations may define int64_t, since it is an optional type. As for long long (introduced in C99), it is guaranteed to be at least 64 bits wide, but not exactly 64.
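As a hedged sketch of that "rely on your implementation" advice (the type name and the compiler tests below are illustrative, not exhaustive), one common pattern is to select a 64-bit type per compiler and, on gcc, wrap the non-C89 type in __extension__ so that -ansi -pedantic stays quiet:
#if defined(_MSC_VER)
typedef __int64 my_int64;                   /* MSVC built-in 64-bit type */
#elif defined(__GNUC__)
__extension__ typedef long long my_int64;   /* __extension__ silences -ansi -pedantic */
#else
typedef long my_int64;                      /* fallback: hope long is 64 bits here */
#endif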

Related

Are there any well-established/standardized ways to use fixed-width integers in C89?

Some background:
The header stdint.h has been part of the C standard since C99. It provides typedefs that are guaranteed to be 8-, 16-, 32-, and 64-bit integers, both signed and unsigned. This header is not part of the C89 standard, though, and I haven't yet found any straightforward way to ensure that my data types have a known width.
Getting to the actual topic
The following code is how SQLite (written in C89) defines 64-bit integers, but I don't find it convincing. That is, I don't think it's going to work everywhere. Worst of all, it could fail silently:
/*
** CAPI3REF: 64-Bit Integer Types
** KEYWORDS: sqlite_int64 sqlite_uint64
**
** Because there is no cross-platform way to specify 64-bit integer types
** SQLite includes typedefs for 64-bit signed and unsigned integers.
*/
#ifdef SQLITE_INT64_TYPE
typedef SQLITE_INT64_TYPE sqlite_int64;
typedef unsigned SQLITE_INT64_TYPE sqlite_uint64;
#elif defined(_MSC_VER) || defined(__BORLANDC__)
typedef __int64 sqlite_int64;
typedef unsigned __int64 sqlite_uint64;
#else
typedef long long int sqlite_int64;
typedef unsigned long long int sqlite_uint64;
#endif
typedef sqlite_int64 sqlite3_int64;
typedef sqlite_uint64 sqlite3_uint64;
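As a hedged aside (not part of SQLite's header): one way to keep a wrong choice from failing silently is to follow the typedefs with a C89-compatible compile-time width check, assuming 8-bit chars:
/* Negative array size => compilation error if sqlite_int64 is not 8 bytes. */
typedef char sqlite_int64_width_check[(sizeof(sqlite_int64) == 8) ? 1 : -1];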
So, this is what I've been doing so far:
Checking that the "char" data type is 8 bits long, since that's not guaranteed. If the preprocessor macro "CHAR_BIT" is not equal to 8, compilation fails.
Now that "char" is known to be 8 bits long, I create a struct containing an array of several unsigned chars, which correspond to the bytes of the integer.
I write "operator" functions for my datatypes: addition, multiplication, division, modulo, conversion from/to string, etc.
I have abstracted this process in a header file (a rough sketch of the idea is below), which is the best I can do with what I know, but I wonder if there is a more straightforward way to achieve this.
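A hedged, minimal sketch of those steps (the names are made up for illustration; the real header presumably provides many more operations):
#include <limits.h>

#if CHAR_BIT != 8
#error "this library requires 8-bit chars"
#endif

/* A 64-bit unsigned value held as 8 explicit bytes, least significant first. */
struct my_uint64 {
    unsigned char byte[8];
};

/* Example "operator": addition with a manual carry, C89-compatible. */
static void my_uint64_add(struct my_uint64 *r,
                          const struct my_uint64 *a,
                          const struct my_uint64 *b)
{
    unsigned int carry = 0;
    int i;
    for (i = 0; i < 8; i++) {
        unsigned int sum = (unsigned int)a->byte[i] + b->byte[i] + carry;
        r->byte[i] = (unsigned char)(sum & 0xFFu);
        carry = sum >> 8;
    }
}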
I'm asking because I want to write a portable C library.
First, you should ask yourself whether you really need to support implementations that don't provide <stdint.h>. It was standardized in 1999, and even many pre-C99 implementations are likely to provide it as an extension.
Assuming you really need this, Doug Gwyn, a member of the ISO C standard committee, created an implementation of several of the new headers for C9x (as C99 was then known), compatible with C89/C90. The headers are in the public domain and should be reasonably portable.
http://www.lysator.liu.se/(nobg)/c/q8/index.html
(As I understand it, the name "q8" has no particular meaning; he just chose it as a reasonably short and unique search term.)
One rather nasty quirk of integer types in C stems from the fact that many "modern" implementations will have, for at least one size of integer, two incompatible signed types of that size with the same bit representation and likewise two incompatible unsigned types. Most typically the types will be 32-bit "int" and "long", or 64-bit "long" and "long long". The "fixed-sized" types will typically alias to one of the standard types, though implementations are not consistent about which one.
Although compilers used to assume that accesses to one type of a given size might affect objects of the other, the authors of the Standard didn't mandate that they do so (probably because there would have been no point ordering people to do what they would do anyway, and they couldn't imagine any sane compiler writer doing otherwise; once compilers started exploiting that freedom, it was politically difficult to revoke the "permission"). Consequently, if one has a library which stores data in a 32-bit "int" and another which reads that data through a 32-bit "long", the only way to be assured of correct behavior is either to disable aliasing analysis altogether (probably the sanest choice while using gcc) or else to add gratuitous copy operations (being careful that gcc doesn't optimize them out and then use their absence as an excuse to break the code, something it sometimes does as of 6.2).
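A hedged illustration of the "gratuitous copy" option (not from the original answer; it assumes int and long are both 32 bits on the target, which is the scenario described above):
#include <string.h>

/* Read a value that some other library stored through a 32-bit "int *"
 * into a 32-bit "long", without a type-punned pointer dereference. */
static long read_int_storage_as_long(const int *stored)
{
    long out;
    memcpy(&out, stored, sizeof out);  /* byte-wise copy; no pointer aliasing */
    return out;
}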

Is SSE2 signed integer overflow undefined?

Signed integer overflow is undefined in C and C++. But what about signed integer overflow within the individual fields of an __m128i? In other words, is this behavior defined in the Intel standards?
#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h>
union SSE2
{
    __m128i m_vector;
    uint32_t m_dwords[sizeof(__m128i) / sizeof(uint32_t)];
};

int main()
{
    union SSE2 reg = {_mm_set_epi32(INT32_MAX, INT32_MAX, INT32_MAX, INT32_MAX)};
    reg.m_vector = _mm_add_epi32(reg.m_vector, _mm_set_epi32(1, 1, 1, 1));
    printf("%08" PRIX32 "\n", (uint32_t) reg.m_dwords[0]);
    return 0;
}
[myria@polaris tests]$ gcc -m64 -msse2 -std=c11 -O3 sse2defined.c -o sse2defined
[myria@polaris tests]$ ./sse2defined
80000000
Note that the 4-byte-sized fields of an SSE2 __m128i are considered signed.
You are asking about a specific implementation issue (using SSE2) and not about the standard. You've answered your own question "signed integer overflow is undefined in C".
When you are dealing with C intrinsics you aren't even programming in C! These intrinsics insert assembly instructions in line. They do it in a somewhat portable way, but it is no longer true that your data is a signed integer. It is a vector type being passed to an SSE intrinsic. YOU are then casting that to an integer and telling C that you want to see the result of that operation. Whatever bytes happen to be there when you cast is what you will see, and that has nothing to do with signed arithmetic in the C standard.
Things are a bit different if the compiler inserts SSE instructions itself (say, when vectorizing a loop). Now the compiler is guaranteeing that the result is the same as a signed 32-bit operation ... UNLESS there is undefined behaviour (e.g. an overflow), in which case it can do whatever it likes.
Note also that undefined doesn't mean unexpected ... whatever behaviour you observe for auto-vectorization might be consistent and repeatable (maybe it does always wrap on your machine), but that might not hold in all cases for surrounding code, or for all compilers. And if the compiler selects different instructions depending on the availability of SSSE3, SSE4, or AVX*, it may not hold for all processors either, since it can make different code-gen choices for different instruction sets that do or don't take advantage of signed overflow being UB.
EDIT:
Okay, well now that we are asking about "the Intel standards" (which don't exist, I think you mean the x86 standards), I can add something to my answer. Things are a little bit convoluted.
Firstly, the intrinsic _mm_add_epi32 is defined by Microsoft to match Intel's intrinsics API definition (https://software.intel.com/sites/landingpage/IntrinsicsGuide/ and the intrinsic notes in Intel's x86 assembly manuals). They cleverly define it as doing to a __m128i the same thing the x86 PADDD instruction does to an XMM register, with no more discussion (e.g. is it a compile error on ARM or should it be emulated?).
Secondly, PADDD isn't only a signed addition! It is a 32-bit binary add. x86 uses two's complement for signed integers, and adding them is the same binary operation as unsigned base-2 addition. So yes, paddd is guaranteed to wrap. There is a good reference for all the x86 instructions here.
So what does that mean: again, the assumption in your question is flawed because there isn't even any overflow. So the output you see should be defined behaviour. Note that it is defined by Microsoft and x86 (not by the C Standard).
Other x86 compilers also implement Intel's intrinsics API the same way, so _mm_add_epi32 is portably guaranteed to just wrap.
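A hedged scalar analogue (not part of either answer) of why the printed value is what it is: performing the same addition with well-defined unsigned 32-bit arithmetic produces the identical bit pattern, because two's-complement addition and unsigned addition are the same binary operation:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t x = (uint32_t)INT32_MAX; /* 0x7FFFFFFF */
    uint32_t y = x + 1u;              /* unsigned wrap-around is defined: 0x80000000 */
    printf("%08" PRIX32 "\n", y);
    return 0;
}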
This isn't "signed integer overflow within the fields of an __m128i". This is a function call. (Being a compiler intrinsic is just an optimization, much like inlining, and that doesn't interact with the C standard as long as the as-if rule is respected)
Its behavior must follow the contract (preconditions, postconditions) that the function developer documented. Usually intrinsics are documented by the compiler vendor, although they tend to coordinate the naming and contract of intrinsics to aid in porting code.

Should enum never be used in an API?

I am using a C library provided to me already compiled. I have limited information on the compiler, version, options, etc., used when compiling the library. The library interface uses enum both in structures that are passed and directly as passed parameters.
The question is: how can I assure or establish that when I compile code to use the provided library, that my compiler will use the same size for those enums? If it does not, the structures won't line up, and the parameter passing may be messed up, e.g. long vs. int.
My concern stems from the C99 standard, which states that the enum type:
shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined, but shall be capable of representing the values of all the members of the enumeration.
As far as I can tell, so long as the largest value fits, the compiler can pick any type it darn well pleases, effectively on a whim, potentially varying not only between compilers, but different versions of the same compiler and/or compiler options. It could pick 1, 2, 4, or 8-byte representations, resulting in potential incompatibilities in both structures and parameter passing. (It could also pick signed or unsigned, but I don't see a mechanism for that being a problem in this context.)
Am I missing something here? If I am not missing something, does this mean that enum should never be used in an API?
Update:
Yes, I was missing something. While the language specification doesn't help here, as noted by @Barmar the Application Binary Interface (ABI) does. Or if it doesn't, then the ABI is deficient. The ABI for my system indeed specifies that an enum must be a signed four-byte integer. If a compiler does not obey that, then it is a bug. Given a complete ABI and compliant compilers, enum can be used safely in an API.
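As a hedged sketch of how one might "assure or establish" the size in practice (not from the question or answers; api_status_t is a made-up example enum), a C89-compatible compile-time check can verify that the compiler's choice matches the four bytes the ABI promises:
typedef enum { API_OK = 0, API_FAILED = 1 } api_status_t;

/* Negative array size => compilation error if the enum is not 4 bytes. */
typedef char api_status_size_check[(sizeof(api_status_t) == 4) ? 1 : -1];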
APIs that use enum are depending on the assumption that the compiler will be consistent, i.e. given the same enum declaration, it will always choose the same underlying type.
While the language standard doesn't specifically require this, it would be quite perverse for a compiler to do anything else.
Furthermore, all compilers for a particular OS need to be consistent with the OS's ABI. Otherwise, you would have far more problems, such as the library using 64-bit int while the caller uses 32-bit int. Ideally, the ABI should constrain the representation of enums, to ensure compatibility.
More generally, the language specification only ensures compatibility between programs compiled with the same implementation. The ABI ensures compatibility between programs compiled with different implementations.
From the question:
The ABI for my system indeed specifies that an enum must be a signed four-byte integer. If a compiler does not obey that, then it is a bug.
I'm surprised by that. I suspect that in reality your compiler will select a 64-bit (8-byte) size for your enum if you define an enumerated constant with a value larger than 2^32.
On my platforms (MinGW gcc 4.6.2 targeting x86 and gcc 4.4 on Linux targeting x86_64), the following code shows that I get both 4- and 8-byte enums:
#include <stdio.h>

enum { a } foo;
enum { b = 0x123456789 } bar;

int main(void) {
    /* cast to unsigned long so %lu is correct on both 32- and 64-bit targets */
    printf("%lu\n", (unsigned long) sizeof(foo));
    printf("%lu\n", (unsigned long) sizeof(bar));
    return 0;
}
I compiled with -Wall -std=c99 switches.
I guess you could say that this is a compiler bug. But the alternatives of removing support for enumerated constants larger than 2^32 or always using 8-byte enums both seem undesirable.
Given that these common versions of GCC don't provide a fixed size enum, I think the only safe action in general is to not use enums in APIs.
Further notes for GCC
Compiling with "-pedantic" causes the following warnings to be generated:
main.c:4:8: warning: integer constant is too large for 'long' type [-Wlong-long]
main.c:4:12: warning: ISO C restricts enumerator values to range of 'int' [-pedantic]
The behavior can be tailored via the -fshort-enums and -fno-short-enums switches.
Results with Visual Studio
Compiling the above code with VS 2008 x86 causes the following warnings:
warning C4341: 'b' : signed value is out of range for enum constant
warning C4309: 'initializing' : truncation of constant value
And with VS 2013 x86 and x64, just:
warning C4309: 'initializing' : truncation of constant value

128-bit arithmetic on x64 in C

When implementing bignums on x86, obviously the most efficient choice for digit size is 32 bits. However, you need arithmetic up to twice the digit size (i.e. 32+32=33, 32*32=64, 64/32=32). Fortunately, not only does x86 provide this, but it's also accessible from portable C (uint64_t).
Similarly, on x64 it would be desirable to use 64-bit digits. This would require 128 bit arithmetic (i.e. 64+64=65, 64*64=128, 128/64=64). Fortunately, x64 provides this. Unfortunately, it's not accessible from portable C, though obviously one could dip into assembly.
So my question is whether it's accessible from non-portable C. Do any C compilers on x64 provide access to this, and if so, what's the syntax?
(Note that I'm not talking about 128-bit vectors that are strictly treated as collections of 32 or 64-bit words with no carry propagation between them, but about actual 128-bit integer operations.)
GCC 4.1 introduced initial 128-bit integer support with the __int128_t and __uint128_t built-in types, but the 128-bit type has been officially supported since GCC 4.6 as __int128 / unsigned __int128.
Clang also supports those types, although I don't know since when; the first version on Godbolt (3.0.0) does support __int128_t, though.
ICC gained the same support in version 13.0.0: 128-bit integers supporting +, -, *, /, and % in the Intel C Compiler?
See also
Is there a 128 bit integer in gcc?
What gcc versions support the __int128 intrinsic type?
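A hedged sketch assuming GCC or Clang on x86-64: the full 64*64=128 multiply the question asks about, using the built-in type mentioned above:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t a = 0xFFFFFFFFFFFFFFFFull;
    uint64_t b = 3;
    unsigned __int128 p = (unsigned __int128) a * b;  /* full 128-bit product */
    /* prints "high: 0000000000000002  low: FFFFFFFFFFFFFFFD" */
    printf("high: %016" PRIX64 "  low: %016" PRIX64 "\n",
           (uint64_t)(p >> 64), (uint64_t) p);
    return 0;
}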
If you're on MSVC then there's no direct support for a 128-bit type but there are many intrinsics helping you do 128-bit operations:
64*64=128: _mul128(), _umul128(), __mulh(), __umulh()
128/64=64: _div128(), _udiv128()
64+64=65: The carry in an addition can be easily obtained by comparing the low part of the sum with any of the operands:
#include <stdint.h>

typedef struct uint128 {
    uint64_t H, L;
} uint128;

static inline uint128 add(uint128 a, uint128 b)
{
    uint128 c;
    c.L = a.L + b.L;                /* add low parts */
    c.H = a.H + b.H + (c.L < a.L);  /* add high parts plus the carry from the low add */
    return c;
}
The same thing can be used for 128-bit subtraction:
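A hedged sketch of that subtraction, reusing the uint128 type from the addition example above; the borrow is the comparison in the other direction:
static inline uint128 sub(uint128 a, uint128 b)
{
    uint128 c;
    c.L = a.L - b.L;                /* subtract low parts */
    c.H = a.H - b.H - (a.L < b.L);  /* subtract high parts and the borrow */
    return c;
}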
There are also intrinsics for shifting although implementing these is trivial: __shiftleft128(), __shiftright128()
If you're on an unsupported compiler, then just use fixed-width types from one of the many available libraries; that will be much faster than going to full arbitrary precision. For example ttmath::UInt<4> (a 128-bit int type with four 32-bit limbs), or (u)int128_t in Boost.Multiprecision and calccrypto/uint128_t. An arbitrary-precision arithmetic library like GMP is just too costly for this. One example: Optimization story: Switching from GMP to gcc's __int128 reduced run time by 95%
You may want to check the GNU Multiple Precision Arithmetic Library:
http://gmplib.org/

How to Declare a 32-bit Integer in C

What's the best way to declare an integer type which is always 4 bytes on any platform? I'm not worried about certain devices or old machines that have 16-bit int.
#include <stdint.h>
int32_t my_32bit_int;
C doesn't concern itself very much with exact sizes of integer types. C99 introduced the header stdint.h, which is probably your best bet. Include it and you can use, e.g., int32_t. Of course, not every platform may support it.
Corey's answer is correct for "best", in my opinion, but a simple "int" will also work in practice (given that you're ignoring systems with 16-bit int). At this point, so much code depends on int being 32-bit that system vendors aren't going to change it.
(See also why long is 32-bit on lots of 64-bit systems and why we have "long long".)
One of the benefits of using int32_t, though, is that you're not perpetuating this problem!
You could hunt down a copy of Brian Gladman's brg_types.h if you don't have stdint.h.
brg_types.h will discover the sizes of the various integers on your platform and will create typedefs for the common sizes: 8, 16, 32 and 64 bits.
You need to include inttypes.h instead of stdint.h because stdint.h is not available on some platforms such as Solaris, and inttypes.h will include stdint.h for you on systems such as Linux.
If you include inttypes.h then your code is more portable between Linux and Solaris.
This link explains what I'm saying:
HP link about inttypes.h
And this link has a table showing why you don't want to use long or int if you intend your data type to have a specific number of bits.
IBM link about portable data types
C99 or later
Use <stdint.h>.
If your implementation supports 2's complement 32-bit integers then it must define int32_t.
If not then the next best thing is int_least32_t which is an integer type supported by the implementation that is at least 32 bits, regardless of representation (two's complement, one's complement, etc.).
There is also int_fast32_t which is an integer type at least 32-bits wide, chosen with the intention of allowing the fastest operations for that size requirement.
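A hedged illustration of the three C99 choices just described (declarations only; which concrete types they map to is implementation-defined):
#include <stdint.h>

int32_t       exact;   /* exactly 32 bits, two's complement; optional            */
int_least32_t least;   /* smallest type with at least 32 bits; always available  */
int_fast32_t  fast;    /* "fastest" type with at least 32 bits; always available */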
ANSI C
You can use long, which is guaranteed to be at least 32 bits wide as a result of the minimum range requirements specified by the standard.
If you would rather use the smallest integer type to fit a 32-bit number, then you can use preprocessor statements like the following with the macros defined in <limits.h>:
#include <limits.h>

#define TARGET_MAX 2147483647L

#if SCHAR_MAX >= TARGET_MAX
typedef signed char int32;
#elif SHRT_MAX >= TARGET_MAX
typedef short int32;
#elif INT_MAX >= TARGET_MAX
typedef int int32;
#else
typedef long int32;
#endif

#undef TARGET_MAX
If stdint.h is not available for your system, make your own. I always have a file called "types.h" that have typedefs for all the signed/unsigned 8, 16, and 32 bit values.
You can declare a signed or unsigned 32-bit integer with int32_t or uint32_t:
int32_t variable_name;
uint32_t variable_name;
Also, depending on your target platforms, you can use autotools for your build system.
It will check whether stdint.h/inttypes.h exist and, if they don't, will create appropriate typedefs in a "config.h".
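A hedged sketch of the pattern this describes, assuming a config.h generated by autoconf's AC_CHECK_HEADERS (the fallback typedefs are illustrative and assume a 32-bit int on the targets that lack stdint.h):
#include "config.h"

#ifdef HAVE_STDINT_H
#include <stdint.h>
#else
typedef int int32_t;            /* assumption: int is 32 bits on these targets */
typedef unsigned int uint32_t;
#endif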
stdint.h is the obvious choice, but it's not necessarily available.
If you're using a portable library, it's possible that it already provides portable fixed-width integers.
For example, SDL has Sint32 (S is for “signed”), and GLib has gint32.
