In Cython one can use exact-width integral types by importing them from stdint, e.g.
from libc.stdint cimport int32_t
Looking through stdint.pxd, we see that int32_t is defined as
cdef extern from "<stdint.h>" nogil:
...
ctypedef signed int int32_t
Does this mean that if I use int32_t in my Cython code, this type is just an alias for signed int (int), which might in fact be only 16 bits wide?
The issue is the same for all the other integral types.
They should be fine.
The typedefs that are actually used come from the C stdint.h header, and those are almost certainly right.
The Cython typedef
ctypedef signed int int32_t
is really just there so that Cython understands that the type is an integer and that it's signed. It isn't what's actually used in the generated C code. Since it's in a cdef extern block, it tells Cython "a typedef like this exists" rather than acting as the real definition.
No.
On a platform where signed int is not 32 bits wide, int32_t would be typedef'd to a type that actually is 32 bits wide.
If no such type is available -- e.g. on a platform where the maximum integer width is 16 bits, or where all integers are 64 bits, or where CHAR_BIT does not equal 8 -- the exact-width types would not be defined. (Yes, the exact-width types are optional. That is why there are least-width types as well.)
Disclaimer: This is speaking from a purely C perspective. I have no experience with Cython whatsoever. But it would be very surprising (and a bug) if this would not be covered adequately in Cython as well.
And as @JörgWMittag points out in his comment, the alternative is of course to simply not support any platform where signed int isn't 32 bits wide.
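If you want to convince yourself on a particular platform, a minimal C-level check (my own sketch, assuming a C11 compiler for _Static_assert) will simply refuse to compile when the width is wrong:

#include <stdint.h>
#include <limits.h>

/* Compilation fails if int32_t is not exactly 32 bits wide. */
_Static_assert(CHAR_BIT == 8, "exact-width types assume 8-bit bytes");
_Static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t is not 32 bits wide");

int main(void) { return 0; }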
Related
I have recently become interested in switching from writing C99 code to writing plain ANSI C (C89), as the new features in the language are not worth giving up the extreme portability and reliability of writing in ANSI C. One of the biggest features I thought I would miss in the transition from C99 to C89 would be the stdint.h standard library file; or so I thought. According to this site, there is no stdint.h file in the C89 standard, which is also what I found on Wikipedia. I wanted to make sure that this was indeed the case, so I wrote a minimal test program that I expected would not compile when providing the flags -ansi and -pedantic-errors in both GCC and Clang:
#include <stdio.h>
#include <stdint.h>
int main(void)
{
printf("The size of an int8_t is %ld.\n", sizeof(int8_t));
printf("The size of an int16_t is %ld.\n", sizeof(int16_t));
printf("The size of an int32_t is %ld.\n", sizeof(int32_t));
printf("The size of an int64_t is %ld.\n", sizeof(int64_t));
printf("The size of a uint8_t is %ld.\n", sizeof(uint8_t));
printf("The size of a uint16_t is %ld.\n", sizeof(uint16_t));
printf("The size of a uint32_t is %ld.\n", sizeof(uint32_t));
printf("The size of a uint64_t is %ld.\n", sizeof(uint64_t));
return 0;
}
However, what I found was not a compiler error, nor a warning, but a program that compiled! Since it worked just as well on either compiler, I'm under the assumption that this is not a bug in the compiler. The output, for reference, was what one would expect for a working C99 implementation:
The size of an int8_t is 1.
The size of an int16_t is 2.
The size of an int32_t is 4.
The size of an int64_t is 8.
The size of a uint8_t is 1.
The size of a uint16_t is 2.
The size of a uint32_t is 4.
The size of a uint64_t is 8.
I have a few questions about this "feature".
Should I be able to rely upon a stdint.h header being provided for a C89 program?
If not, what steps would I have to take in creating a header that functions the same as stdint.h?
How did programmers, in the time before C99, solve this problem of having reliable sizes for integers in their programs in a platform-agnostic manner?
Should I be able to rely upon a stdint.h header being provided for a C89 program?
No. You said you picked C89 for portability reasons and then the first thing you reach for is non-portable extensions...
If not, what steps would I have to take in creating a header that functions the same as stdint.h?
How did programmers, in the time before C99, solve this problem of having reliable sizes for integers in their programs in a platform-agnostic manner?
With a forest of macros, like the one in this answer, for example. If building as C89, you typedef all the names present in stdint.h for the given platform; otherwise, on standard C, you just include stdint.h.
So you'll need your own "notstdint.h" which contains all of this, and then you have to port it to each system where the integer sizes are different. And yes, this makes C89 less portable than standard C.
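A minimal sketch of what such a "notstdint.h" could look like (my own illustration using the limits.h macros; a real header would need more cases, 64-bit types, and care on platforms where these names already exist):

/* notstdint.h -- a sketch of fixed-width typedefs for a C89-only build */
#ifndef NOTSTDINT_H
#define NOTSTDINT_H

#include <limits.h>

typedef signed char    int8_t;
typedef unsigned char  uint8_t;

#if USHRT_MAX == 0xFFFF
typedef short          int16_t;
typedef unsigned short uint16_t;
#else
#error "no 16-bit type found for this platform"
#endif

#if UINT_MAX == 0xFFFFFFFF
typedef int            int32_t;
typedef unsigned int   uint32_t;
#elif ULONG_MAX == 0xFFFFFFFF
typedef long           int32_t;
typedef unsigned long  uint32_t;
#else
#error "no 32-bit type found for this platform"
#endif

#endif /* NOTSTDINT_H */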
In the days of C89 when writing code for microcomputers, one could simply use:
typedef unsigned char uint8;
typedef signed char int8;
typedef unsigned short uint16;
typedef signed short int16;
typedef unsigned long uint32;
typedef signed long int32;
At the time, int might be either 16 or 32 bits, but all the other types would have the indicated sizes on commonplace implementations for systems that could handle them (including all microcomputers). Compilers for 16-bit systems would allow a pointer that was freshly cast from int* to short* to access objects of type int, so there was no need to worry about whether int16_t should be short or int. Likewise, those for 32-bit systems would allow a pointer that was freshly cast from int* to long* to access objects of type int, so it wasn't necessary to worry about whether int32_t should be int or long.
Suppose I wish to write a C program (C99 or C2011) that I want to be completely portable and not tied to a particular architecture.
It seems that I would then want to make a clean break from the old integer types (int, long, short) and friends and use only int8_t, uint8_t, int32_t and so on (perhaps using the least and fast versions as well).
What then is the return type of main? Or must we stick with int? Is it required by the standard to be int?
GCC-4.2 allows me to write
#include <stdint.h>
#include <stdio.h>
int32_t main() {
printf("Hello\n");
return 0;
}
but I cannot use uint32_t or even int8_t because then I get
hello.c:3: warning: return type of ‘main’ is not ‘int’
This is because of a typedef, no doubt. It seems this is one case where we are stuck with having to use the unspecified-size types, since it's not truly portable unless we leave the return type up to the target architecture. Is this interpretation correct? It seems odd to have "just one" plain old int in the code base, but I am happy to be pragmatic.
Suppose I wish to write a C program (C99 or C2011) that I want to be
completely portable and not tied to a particular architecture.
It seems that I would then want to make a clean break from the old
integer types (int, long, short) and friends and use only int8_t,
uint8_t, int32_t and so on (perhaps using the least and fast
versions as well).
These two statements are contradictory. That's because whether uint32_t, uint8_t, and the like are available at all is actually implementation-defined (C11, 7.20.1.1/3: Exact-width integer types).
If you want your program to be truly portable, you must use the built-in types (int, long, etc.) and stick to the minimum ranges defined in the C standard (C11, 5.2.4.2.1: Sizes of integer types).
For example, the standard says that both short and int must cover at least the range -32767 to 32767. So if you want to store a value outside that range, say 42000, you'd use a long instead.
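A small illustration of that rule (my own sketch; 42000 needs a long only because int is merely guaranteed to reach 32767):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* int must only reach 32767, so 42000 might not fit in it; long must
       reach at least 2147483647, so it is the portable choice here. */
    long population = 42000;

    printf("population = %ld (INT_MAX here is %d)\n", population, INT_MAX);
    return 0;
}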
The return type of main is required by the Standard to be int in C89, C99 and in C11.
Now, the exact-width integer types are aliases for standard integer types. So if you use the right alias for int it will still be valid.
For example:
int32_t main(void)
is valid if int32_t happens to be a typedef for int.
I came across the data type int32_t in a C program recently. I know that it stores 32 bits, but don't int and int32 do the same?
Also, I want to use char in a program. Can I use int8_t instead? What is the difference?
To summarize: what is the difference between int32, int, int32_t, int8 and int8_t in C?
Between int32 and int32_t (and likewise between int8 and int8_t) the difference is pretty simple: the C standard defines int8_t and int32_t, but does not define anything named int8 or int32 -- the latter (if they exist at all) probably come from some other header or library (most likely one that predates the addition of int8_t and int32_t in C99).
Plain int is quite a bit different from the others. Where int8_t and int32_t each have a specified size, int can be any size >= 16 bits. At different times, both 16 bits and 32 bits have been reasonably common (and for a 64-bit implementation, it should probably be 64 bits).
On the other hand, int is guaranteed to be present in every implementation of C, where int8_t and int32_t are not. It's probably open to question whether this matters to you though. If you use C on small embedded systems and/or older compilers, it may be a problem. If you use it primarily with a modern compiler on desktop/server machines, it probably won't be.
Oops -- missed the part about char. You'd use int8_t instead of char if (and only if) you want an integer type guaranteed to be exactly 8 bits in size. If you want to store characters, you probably want to use char instead. Its size can vary (in terms of number of bits), but it's guaranteed to be exactly one byte. One slight oddity though: there's no guarantee about whether a plain char is signed or unsigned (and many compilers can make it either one, depending on a compile-time flag). If you need it to be signed or unsigned, you need to specify that explicitly.
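To make that last point concrete, here is a small sketch of mine contrasting the character types with int8_t (assuming the platform provides int8_t at all):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    char          c  = 'A';  /* for character data; signedness is implementation-defined */
    signed char   sc = -5;   /* explicitly signed, at least 8 bits */
    unsigned char uc = 200;  /* explicitly unsigned, at least 8 bits */
    int8_t        i8 = -5;   /* exactly 8 bits, two's complement, only if provided */

    printf("%c %d %u %d\n", c, sc, (unsigned)uc, (int)i8);
    return 0;
}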
The _t types are typedefs in the stdint.h header, while int is a built-in fundamental data type. This makes the _t types available only if stdint.h exists; int, on the other hand, is guaranteed to exist.
Always keep in mind that the size is variable if it is not explicitly specified, so if you declare
int i = 10;
the compiler may make it a 16-bit integer on some systems and a 32-bit integer on others (or a 64-bit integer on newer systems).
In embedded environments this can lead to weird results (especially when handling memory-mapped I/O, or even a simple array), so it is highly recommended to use fixed-size types. In legacy systems you may come across
typedef short INT16;
typedef int INT32;
typedef long INT64;
Starting with C99, the standard added the stdint.h header file, which provides a similar set of typedefs.
On a Windows-based system, you may see entries in the stdint.h header file such as
typedef signed char int8_t;
typedef signed short int16_t;
typedef signed int int32_t;
typedef unsigned char uint8_t;
There is more to it, such as the minimum-width and exact-width integer types; it is not a bad idea to explore stdint.h for a better understanding.
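As a sketch of the memory-mapped I/O point above (the register address and bit position here are made up purely for illustration):

#include <stdint.h>

/* Hypothetical 32-bit control register: with plain "unsigned int" the access
   width would depend on the compiler; with uint32_t it is pinned to 32 bits. */
#define CTRL_REG (*(volatile uint32_t *)0x40001000u)

void enable_peripheral(void)
{
    CTRL_REG |= (uint32_t)1u << 3;  /* 32-bit read-modify-write of bit 3 */
}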
I am sending a file from client to server using TCP. To mark the end of the file, I'd like to send the file size before the actual data, so I use the stat system call to find the size of the file. This is of type off_t. I'd like to know how many bytes it occupies so that I can read it properly on the server side. It is defined in <sys/types.h>, but I do not understand the definition. It just defines __off_t or __off64_t to be off_t. Where should I look for __off_t? Also, is it a convention that __ is prefixed to most things in header files? It scares me when I read header files to try to understand them better. How can I read a header file more effectively?
#ifndef __off_t_defined
# ifndef __USE_FILE_OFFSET64
typedef __off_t off_t;
# else
typedef __off64_t off_t;
# endif
# define __off_t_defined
#endif
Since this answer still gets voted up, I want to point out that you should almost never need to look in the header files. If you want to write reliable code, you're much better served by looking in the standard. So, the answer to "where can I find the complete definition of off_t" is "in a standard, rather than a header file". Following the standard means that your code will work today and tomorrow, on any machine.
In this case, off_t isn't defined by the C standard. It's part of the POSIX standard, which you can browse here.
Unfortunately, off_t isn't very rigorously defined. All I could find to define it is on the page on sys/types.h:
blkcnt_t and off_t shall be signed integer types.
This means that you can't be sure how big it is. If you're using GNU C, you can use the instructions in the answer below to ensure that it's 64 bits. Or, better, you can convert it to a standards-defined size before putting it on the wire. This is how projects like Google's Protocol Buffers work (although that is a C++ project).
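A rough sketch of that conversion on the sending side (my own illustration; error handling is minimal, and a real program would loop until write() has sent all eight bytes):

#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

/* Widen off_t to a fixed 64-bit value and emit it as 8 big-endian octets. */
int send_file_size(int sockfd, const char *path)
{
    struct stat st;
    unsigned char buf[8];
    uint64_t size;
    int i;

    if (stat(path, &st) != 0)
        return -1;

    size = (uint64_t)st.st_size;            /* known width, whatever off_t is */
    for (i = 0; i < 8; i++)
        buf[i] = (unsigned char)(size >> (56 - 8 * i));   /* big-endian */

    return write(sockfd, buf, sizeof buf) == (ssize_t)sizeof buf ? 0 : -1;
}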
For completeness here's the answer to "which header file defines off_t?":
On my machine (and most machines using glibc) you'll find the definition in bits/types.h (as a comment says at the top, never directly include this file), but it's obscured a bit in a bunch of macros. An alternative to trying to unravel them is to look at the preprocessor output:
#include <stdio.h>
#include <sys/types.h>
int main(void) {
    off_t blah;
    return 0;
}
And then:
$ gcc -E sizes.c | grep __off_t
typedef long int __off_t;
....
However, if you want to know the size of something, you can always use the sizeof() operator.
Edit: Just saw the part of your question about the __. This answer has a good discussion. The key point is that names starting with __ are reserved for the implementation (so you shouldn't start your own definitions with __).
As the "GNU C Library Reference Manual" says
off_t
This is a signed integer type used to represent file sizes.
In the GNU C Library, this type is no narrower than int.
If the source is compiled with _FILE_OFFSET_BITS == 64 this
type is transparently replaced by off64_t.
and
off64_t
This type is used similar to off_t. The difference is that
even on 32 bit machines, where the off_t type would have 32 bits,
off64_t has 64 bits and so is able to address files up to 2^63 bytes
in length. When compiling with _FILE_OFFSET_BITS == 64 this type
is available under the name off_t.
Thus, if you want a reliable way of representing a file size between client and server, you can:
Use the off64_t type and the stat64() function accordingly (it fills the structure stat64, which itself contains an off64_t field). The off64_t type guarantees the same size on 32-bit and 64-bit machines.
As mentioned before, compile your code with -D_FILE_OFFSET_BITS=64 and use the usual off_t and stat() (a sketch follows at the end of this answer).
Convert off_t to the fixed-size type int64_t (C99 standard).
Note: my book 'C in a Nutshell' says that it is in the C99 standard, but optional in the implementation. The newer C11 standard says:
7.20.1.1 Exact-width integer types
1 The typedef name intN_t designates a signed integer type with width N ,
no padding bits, and a two’s complement representation. Thus, int8_t
denotes such a signed integer type with a width of exactly 8 bits.
without mentioning whether they are required or optional.
And about implementation:
7.20 Integer types <stdint.h>
... An implementation shall provide those types described as ‘‘required’’,
but need not provide any of the others (described as ‘‘optional’’).
...
The following types are required:
int_least8_t uint_least8_t
int_least16_t uint_least16_t
int_least32_t uint_least32_t
int_least64_t uint_least64_t
All other types of this form are optional.
Thus, in general, the C standard cannot guarantee fixed-size types. But most compilers (including gcc) do support them.
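For the second option, a minimal sketch (mine, not from the manual) would be built with cc -D_FILE_OFFSET_BITS=64 file.c so that off_t and stat() transparently become their 64-bit variants:

#include <sys/stat.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    struct stat st;

    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }
    /* Cast to a type with a known printf format, whatever off_t is here. */
    printf("%s is %lld bytes\n", argv[1], (long long)st.st_size);
    return 0;
}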
If you are having trouble tracing the definitions, you can use the preprocessed output of the compiler which will tell you all you need to know. E.g.
$ cat test.c
#include <stdio.h>
$ cc -E test.c | grep off_t
typedef long int __off_t;
typedef __off64_t __loff_t;
__off_t __pos;
__off_t _old_offset;
typedef __off_t off_t;
extern int fseeko (FILE *__stream, __off_t __off, int __whence);
extern __off_t ftello (FILE *__stream) ;
If you look at the complete output you can even see the exact header file location and line number where it was defined:
# 132 "/usr/include/bits/types.h" 2 3 4
typedef unsigned long int __dev_t;
typedef unsigned int __uid_t;
typedef unsigned int __gid_t;
typedef unsigned long int __ino_t;
typedef unsigned long int __ino64_t;
typedef unsigned int __mode_t;
typedef unsigned long int __nlink_t;
typedef long int __off_t;
typedef long int __off64_t;
...
# 91 "/usr/include/stdio.h" 3 4
typedef __off_t off_t;
If you are writing portable code, the answer is "you can't tell"; the good news is that you don't need to. Your protocol should involve writing the size as, e.g., 8 octets in big-endian format (ideally with a check that the actual size fits in 8 octets).
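A sketch of the receiving end of such a protocol (my own illustration, assuming the sender has already verified that the size fits in 8 octets):

#include <stdint.h>

/* Rebuild the file size from 8 big-endian octets read off the socket. */
uint64_t decode_size(const unsigned char buf[8])
{
    uint64_t size = 0;
    int i;

    for (i = 0; i < 8; i++)
        size = (size << 8) | buf[i];
    return size;
}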
Is there any difference between uint and unsigned int?
I've looked on this site, but all the questions refer to C# or C++.
I'd like an answer about the C language.
If it is relevant, note that I'm using GCC under Linux.
uint isn't a standard type - unsigned int is.
Some systems may define uint as a typedef.
typedef unsigned int uint;
On those systems they are the same. But uint is not a standard type, so not every system will support it, and thus it is not portable.
I am expanding a bit on the answers by Erik, Teoman Soygul, and taskinoor.
uint is not standard.
Hence using your own shorthand like this is discouraged:
typedef unsigned int uint;
If you are after platform specificity instead (e.g., you need to specify the number of bits your integers occupy), include stdint.h:
#include <stdint.h>
will expose the following standard categories of integers:
Integer types having certain exact widths
Integer types having at least certain specified widths
Fastest integer types having at least certain specified widths
Integer types wide enough to hold pointers to objects
Integer types having greatest width
For instance,
Exact-width integer types
The typedef name intN_t designates a signed integer type with width
N, no padding bits, and a two's-complement representation. Thus,
int8_t denotes a signed integer type with a width of exactly 8 bits.
The typedef name uintN_t designates an unsigned integer type with
width N. Thus, uint24_t denotes an unsigned integer type with a width
of exactly 24 bits.
This category defines:
int8_t
int16_t
int32_t
uint8_t
uint16_t
uint32_t
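A small sketch of mine touching each of those categories (uintptr_t and the exact-width types are optional, so this assumes a platform that provides them):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int32_t       exact = 100000;            /* exactly 32 bits            */
    int_least16_t least = 30000;             /* at least 16 bits           */
    int_fast16_t  fast  = 30000;             /* fastest type of >= 16 bits */
    uintptr_t     addr  = (uintptr_t)&exact; /* wide enough for a pointer  */
    intmax_t      big   = INTMAX_MAX;        /* greatest-width integer     */

    printf("%" PRId32 " %" PRIdLEAST16 " %" PRIdFAST16 " %" PRIuPTR " %jd\n",
           exact, least, fast, addr, big);
    return 0;
}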
All of the answers here fail to mention the real reason for uint.
It's obviously a typedef of unsigned int, but that doesn't explain its usefulness.
The real question is,
Why would someone want to typedef a fundamental type to an abbreviated
version?
To save on typing?
No, they did it out of necessity.
Consider the C language; a language that does not have templates.
How would you go about stamping out your own vector that can hold any type?
You could do something with void pointers,
but a closer emulation of templates would have you resorting to macros.
So you would define your template vector:
#define define_vector(type) \
typedef struct vector_##type { \
impl \
};
Declare your types:
define_vector(int)
define_vector(float)
define_vector(unsigned int)
And upon generation, realize that the types ought to be a single token:
typedef struct vector_int { impl };
typedef struct vector_float { impl };
typedef struct vector_unsigned int { impl };
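That last expansion is not valid C, because the struct tag must be a single token. Giving the two-word type a one-token alias (a sketch continuing the example above) is exactly what makes the macro usable:

typedef unsigned int uint;

define_vector(uint)   /* expands to: typedef struct vector_uint { impl }; */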
unsigned int is a built-in (standard) type, so if you want your project to be cross-platform, always use unsigned int, as it is guaranteed to be supported by all compilers (hence it being the standard).
uint is a possible and proper abbreviation for unsigned int. It is more readable. But it is not standard C. You can define and use it (like any other typedef) on your own responsibility.
But unfortunately some system headers define uint too. I found the following in sys/types.h from a current compiler (ARM):
# ifndef _POSIX_SOURCE
//....
typedef unsigned short ushort; /* System V compatibility */
typedef unsigned int uint; /* System V compatibility */
typedef unsigned long ulong; /* System V compatibility */
# endif /*!_POSIX_SOURCE */
It seems to be a concession for legacy sources written to the Unix System V standard. To switch off this undesired behaviour (because I want to
#define uint unsigned int
myself), I first set
#define _POSIX_SOURCE
A system header should not define things that are not standard, but unfortunately many such things are defined there.
See also my web page https://www.vishia.org/emc/html/Base/int_pack_endian.html#truean-uint-problem-admissibleness-of-system-definitions and https://www.vishia.org/emc.