Difference between int32_t and int_fast32_t [duplicate] - c

This question already has answers here:
What is the difference between intXX_t and int_fastXX_t?
(4 answers)
Closed 9 years ago.
What is the difference between the two? I know that int32_t is exactly 32 bits regardless of the environment, but, as its name suggests it's fast, how much faster can int_fast32_t really be compared to int32_t? And if it's significantly faster, why is that?

C is specified in terms of an idealized, abstract machine. But real-world hardware has behavioural characteristics that are not captured by the language standard. The _fast types are type aliases that allow each platform to specify types which are "convenient" for the hardware.
For example, if you had an array of 8-bit integers and wanted to mutate each one individually, this would be rather inefficient on contemporary desktop machines, because their load operations usually want to fill an entire processor register, which is either 32 or 64 bits wide (a "machine word"). So a lot of the loaded data ends up wasted, and, more importantly, you cannot parallelize the loading and storing of two adjacent array elements, because they live in the same machine word and thus need to be loaded, modified, and stored sequentially.
The _fast types are usually as wide as a machine word, if that's feasible. That is, they may be wider than you need and thus consume more memory (and thus are harder to cache!), but your hardware may be able to access them faster. It all depends on the usage pattern, though. (E.g. an array of int_fast8_t would probably be an array of machine words, and a tight loop modifying such an array may well benefit significantly.)
The only way to find out whether it makes any difference is to compare!
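For instance, a minimal (and admittedly rough) comparison might look like the sketch below; it simply sums a large array with each type, and whatever difference it shows is entirely platform- and compiler-dependent:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

enum { N = 1 << 22, REPS = 50 };   /* ~4M elements, summed 50 times */

int main(void)
{
    int32_t      *a = malloc(N * sizeof *a);
    int_fast32_t *b = malloc(N * sizeof *b);
    if (!a || !b)
        return 1;
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = i; }

    int64_t s1 = 0, s2 = 0;

    clock_t t0 = clock();
    for (int r = 0; r < REPS; ++r)
        for (int i = 0; i < N; ++i)
            s1 += a[i];
    clock_t t1 = clock();
    for (int r = 0; r < REPS; ++r)
        for (int i = 0; i < N; ++i)
            s2 += b[i];
    clock_t t2 = clock();

    printf("int32_t:      sum=%lld, %.3f s\n", (long long)s1, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("int_fast32_t: sum=%lld, %.3f s\n", (long long)s2, (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(a);
    free(b);
    return 0;
}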

int32_t is an integer which is exactly 32 bits wide. It is useful if you want, for example, to create a struct with an exact memory layout.
int_fast32_t is the "fastest" integer for your current processor that is at least as big as an int32_t. I don't know whether there is really a gain on current processors (x86 or ARM).
But I can at least outline a real case: I used to work with a 32-bit PowerPC processor. When accessing misaligned 16-bit int16_t values, it was inefficient because the processor first had to realign them in one of its 32-bit registers. For non-memory-mapped data, since we didn't have memory restrictions, it was more efficient to use int_fast16_t (which was in fact a 32-bit int).

Related

Why are programming languages strictly reliant on primitive types, and is assembly more flexible? [closed]

Why don't we have data types that are 4 bits in size? Why can't we make them if we are so inclined? I have seen bit-fields, but I have heard they are not portable, and perhaps not used as well? I think this is a result of how the machine interprets the place-value of the bit's location. (big-endian, little-endian)
typedef struct
{
    int b1 : 1;
    int b2 : 1;
    /* ... b3 through b31, each 1 bit wide ... */
    int b32 : 1;
} bitfield32;
We also cannot make a bit-field like this which is bigger than any primitive type. So why the restrictions? Can it be done in assembly?
Why don't we have data types that are 4 bits in size?
Who said that? The Intel 4004 CPU had 4-bit registers and 4-bit memory operands. 4-bit data types are natural for this CPU since they're supported directly.
The Intel 8051 CPU can manipulate single bits of memory directly and so you can have 1-bit variables on it.
These are just two examples of CPUs where data types can be very small, smaller than now ubiquitous 8-bit bytes.
Why can't we make them if we are so inclined?
We can. You can either make a CPU with direct support for 4-bit data types or you can simulate 4-bit variables as parts of larger variables.
You could pack 2 4-bit variables into an 8-bit byte. The problem with this approach is that you need to use more instructions to extract 4 bits from 8-bit registers or memory locations (you need a shift and a mask (AND) instruction for this) and similarly you need more instructions to properly save 4-bit values into the halves of an 8-bit byte (load 8 bits, clear the old value in the 4-bit half (with mask/AND), shift the new value, combine it with the rest and save back). Obviously, this would negatively affect the code size of your program and its speed. Besides, 4-bit variables aren't very useful since they can hold so little information. For these reasons simulation of smaller types isn't very popular.
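To make the extra shift-and-mask work concrete, here is a small sketch (the helper names are made up for illustration) that packs two 4-bit values into a single byte and extracts them again:

#include <stdint.h>
#include <stdio.h>

/* Store a 4-bit value in either the low or the high nibble of a byte. */
static uint8_t set_nibble(uint8_t byte, int high, uint8_t value)
{
    value &= 0x0F;                                        /* keep only 4 bits      */
    if (high)
        return (uint8_t)((byte & 0x0F) | (value << 4));   /* clear high, insert    */
    return (uint8_t)((byte & 0xF0) | value);              /* clear low, insert     */
}

static uint8_t get_nibble(uint8_t byte, int high)
{
    return high ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
}

int main(void)
{
    uint8_t b = 0;
    b = set_nibble(b, 0, 0x9);   /* low nibble  = 9 */
    b = set_nibble(b, 1, 0x3);   /* high nibble = 3 */
    printf("byte=0x%02X low=%u high=%u\n", b, get_nibble(b, 0), get_nibble(b, 1));
    return 0;
}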
I have seen bit-fields, but I have heard they are not portable, and perhaps not used as well?
They are used. They exist precisely because in some (but not all) applications they are very useful. If you have many short variables, it may be advantageous to pack several of them into a byte or a machine word and thus reduce memory waste.
They aren't entirely unportable. They have limited portability in C and C++ because the language standards do not define precisely the location and layout of bit-fields within the larger data types. That's done intentionally to allow compiler writers to utilize CPU features most effectively when working with bit-fields.
Section 6.7.2.1 (Structure and union specifiers), paragraph 10, of the C standard from 1999 says this:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
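As a quick illustration of that implementation-defined layout, the sketch below prints the size and raw bytes of a small bit-field struct; the output may legitimately differ between compilers and ABIs:

#include <stdio.h>
#include <string.h>

struct flags {
    unsigned int ready   : 1;
    unsigned int error   : 1;
    unsigned int channel : 4;   /* packed into one storage unit */
};

int main(void)
{
    struct flags f = { 1, 0, 5 };
    unsigned char raw[sizeof f];

    memcpy(raw, &f, sizeof f);  /* inspect the underlying bytes */
    printf("sizeof(struct flags) = %zu\n", sizeof f);
    for (size_t i = 0; i < sizeof f; ++i)
        printf("byte %zu = 0x%02X\n", i, raw[i]);
    return 0;
}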
I think this is a result of how the machine interprets the place-value of the bit's location. (big-endian, little-endian)
Yes, that's part of the reason. But that's in no way specific to bit-fields exclusively. Regular, non-bit-field types have this problem as well.
We also cannot make a bit-field like this which is bigger than any primitive type. So why the restrictions? Can it be done in assembly?
If your C (or C++) compiler (or interpreter) can simulate types bigger than those supported by the CPU directly, you can have even 256-bit bit-fields.
But again, if the CPU's largest directly supported type is 32-bit, the use of larger types (whether bit-fields or not), such as 128-bit ones, is going to incur more code and certain performance penalties, since a single CPU instruction (or just a couple of them) won't be able to manipulate such big data values. You'll need more instructions and more time to execute those extra instructions.
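As a rough illustration of that cost, a 128-bit addition simulated with two 64-bit halves in portable C might look like the sketch below; note the extra carry handling that a native-width addition would not need:

#include <stdint.h>
#include <stdio.h>

/* A 128-bit unsigned integer simulated with two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

static u128 u128_add(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);   /* carry out of the low half */
    return r;
}

int main(void)
{
    u128 a = { UINT64_MAX, 0 };           /* 2^64 - 1            */
    u128 b = { 1, 0 };
    u128 c = u128_add(a, b);              /* expect lo=0, hi=1   */
    printf("hi=%llu lo=%llu\n", (unsigned long long)c.hi, (unsigned long long)c.lo);
    return 0;
}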
Yes, this can be done in assembly. Anything can be done in assembly so long as you're willing to write the code and make it work. :)
Why don't we have data types that are 4 bits in size?
Because the smallest unit that modern processors like x86 can work on is 8 bits.
Why can't we make them if we are so inclined?
You can, but they would be much slower, because you'd have to do more work than normal since the processor doesn't support them directly.
Can it be done in assembly?
Assembly is just a bunch of instructions for your processor, so you could only do it efficiently if your processor supports it.
A lot of this will make sense if you actually study digital design; until then, you'll have to just take these restrictions as granted.

Fastest integer type for common architectures

The stdint.h header lacks an int_fastest_t and uint_fastest_t to correspond with the {,u}int_fastX_t types. For instances where the width of the integer type does not matter, how does one pick the integer type that allows processing the greatest quantity of bits with the least penalty to performance? For example, if one was searching for the first set bit in a buffer using a naive approach, a loop such as this might be considered:
// return the bit offset of the first 1 bit
// (uint_fastest_t and ffsX are hypothetical: the question's stand-ins for
// "the fastest unsigned type" and an ffs() for that width)
size_t find_first_bit_set(void const *const buf)
{
    uint_fastest_t const *p = buf; // use the fastest type for comparison to zero
    for (; *p == 0; ++p);          // advance p while no bits are set
    // return offset of first set bit
    return (size_t)(p - (uint_fastest_t const *)buf) * sizeof *p * CHAR_BIT
           + ffsX(*p) - 1;
}
Naturally, using char would result in more operations than int. But long long might result in more expensive operations than the overhead of using int on a 32 bit system and so on.
My current assumption is for the mainstream architectures, the use of long is the safest bet: It's 32 bit on 32 bit systems, and 64 bit on 64 bit systems.
int_fast8_t is always the fastest integer type in a correct implementation. There can never be integer types smaller than 8 bits (because CHAR_BIT>=8 is required), and since int_fast8_t is the fastest integer type with at least 8 bits, it's thus the fastest integer type, period.
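If you want to see what a particular implementation actually chose for these typedefs, a quick check like the one below prints their widths (on glibc/x86-64, for example, int_fast16_t and int_fast32_t typically come out as 8 bytes):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Which type each "fast" alias maps to is implementation-specific;
     * printing the widths shows what your compiler/libc actually picked. */
    printf("int_fast8_t : %zu bytes\n", sizeof(int_fast8_t));
    printf("int_fast16_t: %zu bytes\n", sizeof(int_fast16_t));
    printf("int_fast32_t: %zu bytes\n", sizeof(int_fast32_t));
    printf("int_fast64_t: %zu bytes\n", sizeof(int_fast64_t));
    return 0;
}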
Theoretically, int is the best bet. It should map to the CPU's native register size, and thus be "optimal" in the sense you're asking about.
However, you may still find that an int-64 or int-128 is faster on some CPUs than an int-32, because although these are larger than the register size, they will reduce the number of iterations of your loop, and thus may work out more efficient by minimising the loop overheads and/or taking advantage of DMA to load/store the data faster.
(For example, on ARM-2 processors it took 4 memory cycles to load one 32-bit register, but only 5 cycles to load two sequentially, and 7 cycles to load 4 sequentially. The routine you suggest above would be optimised to use as many registers as you could free up (8 to 10 usually), and could therefore run up to 3 or 4 times faster by using multiple registers per loop iteration)
The only way to be sure is to write several routines and then profile them on the specific target machine to find out which produces the best performance.
I'm not sure I really understand the question, but why aren't you just using int? Quoting from my free draft copy of the (wrong, i.e. C++) standard, "Plain ints have the natural size suggested by the architecture of the execution environment."
But I think that if you want to have the optimal integer type for a certain operation, it will be different depending on which operation it is. Trying to find the first bit in a large data buffer, or finding a number in a sequence of integers, or moving them around, could very well have completely different optimal types.
EDIT:
For whatever it's worth, I did a small benchmark. On my particular system (Intel i7 920 with Linux, gcc -O3) it turns out that long ints (64 bits) are quite a bit faster than plain ints (32 bits), on this particular example. I would have guessed the opposite.
If you want to be certain you've got the fastest implementation, why not benchmark each one on the systems you're expecting to run on instead of trying to guess?
The answer is int itself. At least in C++, where 3.9.1/2 of the standard says:
Plain ints have the natural size suggested by the architecture of the execution environment
I expect the same is true for C, though I don't have any of the standards documents.
I would guess that the types size_t (for an unsigned type) and ptrdiff_t (for a signed type) will usually correspond to quite efficient integer types on any given platform.
But nothing can prove that other than inspecting the produced assembler and doing benchmarks.
Edit, including the different comments, here and in other replies:
size_t and ptrdiff_t are the only typedefs that are normative in C99 and for which one may make a reasonable assumption that they are related to the architecture.
There are 5 different possible ranks for the standard integer types (char, short, int, long, long long). All the forces push towards having types of width 8, 16, 32, 64 and, in the near future, 128. As a consequence, int will be stuck at 32 bits. Its definition will have nothing to do with efficiency on the platform, but will just be constrained by that width requirement.
If you're compiling with gcc, I'd recommend using __builtin_ffs() for finding the first set bit:
Built-in Function: int __builtin_ffs (unsigned int x)
Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero.
This will be compiled into (often a single) native assembly instruction.
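For example, a concrete version of the loop from the question, using unsigned long long words and __builtin_ffsll (this sketch assumes the buffer is suitably aligned and contains at least one set bit), could look like this:

#include <limits.h>
#include <stddef.h>

/* Sketch: scan word by word, then use GCC's __builtin_ffsll on the first
 * non-zero word to find the exact bit position. */
size_t find_first_bit_set_ull(const void *buf)
{
    const unsigned long long *p = buf;

    while (*p == 0)                        /* skip words that are entirely zero */
        ++p;
    return (size_t)(p - (const unsigned long long *)buf)
               * sizeof *p * CHAR_BIT
           + (size_t)__builtin_ffsll((long long)*p) - 1;
}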
It is not possible to answer this question since the question is incomplete. As an analogy, consider the question:
What is the fastest vehicle
A Bugatti Veyron? Certainly fast, but no good for going from London to New York.
What is missing from the question, is the context the integer will be used in. In the original example above, I doubt you'd see much difference between 8, 32 or 64 bit values if the array is large and sparse since you'll be hitting memory bandwidth limits before cpu limits.
The main point is, the architecture does not define what size the various integer types are, it's the compiler designer that does that. The designer will carefully weigh up the pros and cons for various sizes for each type for a given architecture and pick the most appropriate.
I guess the 32-bit int on the 64-bit system was chosen because, for most of the operations ints are used for, 32 bits are enough. Since memory bandwidth is a limiting factor, saving on memory use was probably the overriding factor.
For all existing mainstream architectures long is the fastest type at present for loop throughput.

Determine word size of my processor

How do I determine the word size of my CPU? If I understand correctly, an int should be one word, right? I'm not sure if I am correct.
So would just printing sizeof(int) be enough to determine the word size of my processor?
Your assumption about sizeof(int) is untrue; see this.
Since you must know the processor, OS and compiler at compilation time, the word size can be inferred using predefined architecture/OS/compiler macros provided by the compiler.
However, while on simpler and most RISC processors the word size, bus width, register size and memory organisation are often consistently one value, this may not be true for more complex CISC and DSP architectures, with various sizes for floating-point registers, accumulators, bus width, cache width, general-purpose registers, etc.
Of course, this begs the question of why you might need to know this. Generally you would use the type appropriate to the application, and trust the compiler to provide any optimisation. If optimisation is what you think you need this information for, then you would probably be better off using the C99 'fast' types. If you need to optimise a specific algorithm, implement it for a number of types and profile it.
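If a compile-time answer is all you need, a rough sketch using common predefined macros (this list is illustrative, not exhaustive) might look like:

#include <stdio.h>

/* Rough compile-time guess at the native word size using predefined,
 * compiler-specific macros. */
#if defined(__x86_64__) || defined(_M_X64) || defined(__aarch64__) || defined(__powerpc64__)
#define NATIVE_WORD_BITS 64
#elif defined(__i386__) || defined(_M_IX86) || defined(__arm__)
#define NATIVE_WORD_BITS 32
#else
#define NATIVE_WORD_BITS (8 * (int)sizeof(void *))  /* fallback; not usable in #if */
#endif

int main(void)
{
    printf("Guessed native word size: %d bits\n", (int)NATIVE_WORD_BITS);
    return 0;
}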
an int should be one word right?
As I understand it, that depends on the data size model. For an explanation, see 64-bit and Data Size Neutrality for UNIX systems. For example, Linux 32-bit is ILP32, and Linux 64-bit is LP64. I am not sure about the differences across Windows systems and versions, other than I believe all 32-bit Windows systems are ILP32.
How do I determine the word size of my CPU?
That depends. Which version of the C standard are you assuming? What platform are we talking about? Is this a compile-time or run-time determination you're trying to make?
The C header file <limits.h> may define WORD_BIT and/or __WORDSIZE.
sizeof(int) is not always the "word" size of your CPU. The most important question here is why you want to know the word size.... are you trying to do some kind of run-time and CPU specific optimization?
That being said, on Windows with Intel processors, the nominal word size will be either 32 or 64 bits and you can easily figure this out:
if your program is compiled for 32 bits, then the nominal word size is 32 bits
if you have compiled a 64-bit program, then the nominal word size is 64 bits.
This answer sounds trite, but it's true to the first order. There are, however, some important subtleties. Even though the x86 registers on a modern Intel or AMD processor are 64 bits wide, you can only (easily) use their 32-bit widths in 32-bit programs, even though you may be running a 64-bit operating system. This is true on Linux and OS X as well.
Moreover, on most modern CPUs the data bus width is wider than the standard ALU registers (EAX, EBX, ECX, etc.). This bus width can vary; some systems have 128-bit or even 192-bit wide buses.
If you are concerned about performance, then you also need to understand how the L1 and L2 data caches work. Note that some modern CPUs have an L3 cache. Caches also include a unit called the write buffer.
Make a program that does some kind of integer operation many times, like an integer version of the SAXPY algorithm. Run it for different word sizes, from 8 to 64 bits (i.e. from char to long long).
Measure the time each version spends running the algorithm. If there is one specific version that takes noticeably less time than the others, the word size used for that version is probably the native word size of your computer. On the other hand, if there are several versions that take more or less the same time, pick the one with the greatest word size.
Note that even with this technique you can get false data: your benchmark, compiled using Turbo C and running on an 80386 processor under DOS, will report that the word size is 16 bits, just because the compiler doesn't use the 32-bit registers to perform integer arithmetic, but calls internal functions that do the 32-bit version of each arithmetic operation.
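A sketch of that benchmarking idea is shown below; the volatile qualifier keeps the compiler from deleting the loops, but it also blocks vectorization, so treat the numbers only as a rough hint:

#include <stdio.h>
#include <time.h>

/* Generate one SAXPY-style loop per integer width and time each. */
enum { N = 1 << 20, REPS = 100 };

#define DEFINE_SAXPY(name, type)                              \
    static double name(void)                                  \
    {                                                         \
        static volatile type x[N], y[N];                      \
        clock_t t0 = clock();                                 \
        for (int r = 0; r < REPS; ++r)                        \
            for (int i = 0; i < N; ++i)                       \
                y[i] = (type)(3 * x[i] + y[i]);               \
        return (double)(clock() - t0) / CLOCKS_PER_SEC;       \
    }

DEFINE_SAXPY(saxpy_8,  signed char)
DEFINE_SAXPY(saxpy_16, short)
DEFINE_SAXPY(saxpy_32, int)
DEFINE_SAXPY(saxpy_64, long long)

int main(void)
{
    printf(" 8-bit: %.3f s\n", saxpy_8());
    printf("16-bit: %.3f s\n", saxpy_16());
    printf("32-bit: %.3f s\n", saxpy_32());
    printf("64-bit: %.3f s\n", saxpy_64());
    return 0;
}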
"Additionally, the size of the C type long is equal to the word size, whereas the size of the int type is sometimes less than that of the word size. For example, the Alpha has a 64-bit word size. Consequently, registers, pointers, and the long type are 64 bits in length."
source: http://books.msspace.net/mirrorbooks/kerneldevelopment/0672327201/ch19lev1sec2.html
Keeping this in mind, the following program can be executed to find out the word size of the machine you're working on:
#include <stdio.h>

int main(void)
{
    long l;
    short s = (8 * sizeof(l));
    printf("Word size of this machine is %hi bits\n", s);
    return 0;
}
In short: There's no good way. The original idea behind the C data types was that int would be the fastest (native) integer type, long the biggest etc.
Then came operating systems that originated on one CPU and were then ported to different CPUs whose native word size was different. To maintain source code compatibility, some of the OSes broke with that definition and kept the data types at their old sizes, and added new, non-standard ones.
That said, depending on what you actually need, you might find some useful data types in stdint.h, or compiler-specific or platform-specific macros for various purposes.
To use at compile time: sizeof(void*)
Whatever your reason for wanting to know the size of the processor, it doesn't really matter.
The size of the processor is the amount of data that the Arithmetic Logic Unit (ALU) of one CPU core can work on at a single point in time. A CPU core's ALU operates on an accumulator register, so the size of a CPU in bits is the size of its accumulator register in bits.
You can find the size of the accumulator from the data sheet of the processor, or by writing a small assembly language program.
Note that the effective usable size of the accumulator register can change in some processors (like ARM) based on the mode of operation (Thumb and ARM modes). That means the size of the processor also changes based on the mode for those processors.
It is common in many architectures to have the virtual address pointer size and integer size the same as the accumulator size. This is only to take advantage of the accumulator register in different processor operations, but it is not a hard rule.
Many think of memory as an array of bytes. But the CPU has another view of it, which is about memory granularity. Depending on the architecture, the memory granularity could be 2, 4, 8, 16 or even 32 bytes. Memory granularity and address alignment have a great impact on the performance, stability and correctness of software.
Consider a granularity of 4 bytes and an unaligned memory access to read 4 bytes. In that case a read (75% of them, if the address increases one byte at a time) takes two extra read instructions plus two shift operations and finally a bitwise instruction to assemble the final result, which is a performance killer. Further, atomic operations could be affected, as they must be indivisible. Other side effects involve caches, synchronization protocols, CPU internal bus traffic, the CPU write buffer, and you can guess what else. A practical test could be run on a circular buffer to see how different the results could be.
CPUs from different manufacturers, depending on the model, have different registers which are used in general and specific operations. For example, modern CPUs have extensions with 128-bit registers. So the word size is not only about the type of operation but also about memory granularity. Word size and address alignment are beasts which must be taken care of. There are some CPUs on the market which do not take care of address alignment and simply ignore it. And guess what happens?
As others have pointed out, how are you interested in calculating this value? There are a lot of variables.
sizeof(int) != sizeof(word). The sizes of byte, word, double word, etc. have never changed since their creation, for the sake of API compatibility, in the Windows API world at least, even though a processor word size is the natural size an instruction can operate on. For example, in MSVC/C++/C#, sizeof(int) is four bytes, even in 64-bit compilation mode. MSVC/C++ has __int64 and C# has the Int64/UInt64 (non-CLS-compliant) value types. There are also type definitions for WORD, DWORD and QWORD in the Win32 API that have never changed from two bytes, four bytes and eight bytes respectively, as well as UINT_PTR/INT_PTR on Win32 and UIntPtr/IntPtr in C# that are guaranteed to be big enough to represent a memory address and a reference type respectively. AFAIK (and I could be wrong, if such architectures still exist), I don't think anyone has to deal with near/far pointers anymore, so if you're on C/C++/C#, sizeof(void*) and Unsafe.SizeOf<IntPtr>() should be enough to determine your maximum "word" size in a compliant, cross-platform way, and if anyone can correct that, please do so! Also, the sizes of intrinsic types in C/C++ are vague in their definition.
C data type sizes - Wikipedia

Word and Double Word integers in C

I am trying to implement a simple, moderately efficient bignum library in C. I would like to store digits using the full register size of the system it's compiled on (presumably 32 or 64-bit ints). My understanding is that I can accomplish this using intptr_t. Is this correct? Is there a more semantically appropriate type, i.e. something like intword_t?
I also know that with GCC I can easily do overflow detection on a 32-bit machine by upcasting both arguments to 64-bit ints, which will occupy two registers and take advantage of instructions like IA-32's ADC (add with carry). Can I do something similar on a 64-bit machine? Is there a 128-bit type I can upcast to which will compile to use these instructions if they're available? Better yet, is there a standard type that represents twice the register size (like intdoubleptr_t) so this could be done in a machine-independent fashion?
Thanks!
Any reason not to use size_t? size_t is 4 bytes on a 32-bit system and 8 bytes on a 64-bit system, and is probably more portable than using WORD_SIZE (I think WORD_SIZE is gcc-specific, no?)
I am not aware of any 128-bit value on 64-bit systems, could be wrong here but haven't come across that type in the kernel or regular user apps.
I'd strongly recommend using the C99 <stdint.h> header. It declares int32_t, int64_t, uint32_t, and uint64_t, which look like what you really want to use.
EDIT: As Alok points out, int_fast32_t, int_fast64_t, etc. are probably what you want to use. The number of bits you specify should be the minimum you need for the math to work, i.e. for the calculation to not "roll over".
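As a sketch of the upcasting trick the question describes, here is limb-by-limb addition with 32-bit limbs and a 64-bit intermediate (on a 64-bit target you could use 64-bit limbs with a 128-bit intermediate where one exists, e.g. GCC's unsigned __int128):

#include <stddef.h>
#include <stdint.h>

/* Adds two n-limb numbers a and b into out; returns the final carry. */
uint32_t add_limbs(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; ++i) {
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;  /* cannot overflow 64 bits  */
        out[i] = (uint32_t)sum;                        /* low 32 bits              */
        carry  = (uint32_t)(sum >> 32);                /* carry into the next limb */
    }
    return carry;
}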
The optimization comes from the fact that the CPU doesn't have to waste cycles realigning data, padding the leading bits on a read, or doing a read-modify-write on a write. Truth is, a lot of processors (such as recent x86s) have hardware in the CPU that optimizes these accesses pretty well (at least the padding and read-modify-write parts), since they're so common and usually only involve transfers between the processor and cache.
So the only thing left for you to do is make sure the accesses are aligned: take sizeof(int_fast32_t) or whatever and use it to make sure your buffer pointers are aligned to that.
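A minimal version of that check might be the sketch below (the text above uses sizeof; C11's _Alignof(int_fast32_t) would be the stricter measure):

#include <stdint.h>

/* Returns nonzero if p is aligned for int_fast32_t accesses. */
int is_aligned_for_fast32(const void *p)
{
    return ((uintptr_t)p % sizeof(int_fast32_t)) == 0;
}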
Truth is, this may not amount to that much improvement (due to the hardware optimizing transfers at runtime anyway), so writing something and timing it may be the only way to be sure. Also, if you're really crazy about performance, you may need to look at SSE or AltiVec or whatever vectorization tech your processor has, since that will outperform anything you can write that is portable when doing vectored math.

C Programming Data Types

A question was asked, and I am not sure whether I gave an accurate answer or not.
The question was: why use int, why not char, and why are they separate? It's all just bits reserved in memory, so why do data types have categories?
Can anyone shed some light upon it?
char is the smallest addressable chunk of memory – suits well for manipulating data buffers, but can't hold more than 256 distinct values (if char is 8 bits which is usual) and therefore not very good for numeric calculations. int is usually bigger than char – more suitable for calculations, but not so suitable for byte-level manipulation.
Remember that C is sometimes used as a higher level assembly language - to interact with low level hardware. You need data types to match machine-level features, such as byte-wide I/O registers.
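For example, a byte-wide status register is often accessed through a volatile uint8_t pointer; the address and bit position below are made up purely for illustration:

#include <stdint.h>

/* Illustration only: the address 0x4000 and the "ready" bit are hypothetical.
 * On a real target the device datasheet supplies the actual values, and
 * uint8_t matches a byte-wide hardware register exactly. */
#define STATUS_REG (*(volatile uint8_t *)0x4000u)

int device_ready(void)
{
    return (STATUS_REG & 0x01u) != 0;   /* bit 0: hypothetical "ready" flag */
}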
From Wikipedia, C (programming language):
C's primary use is for "system programming", including implementing operating systems and embedded system applications, due to a combination of desirable characteristics such as code portability and efficiency, ability to access specific hardware addresses, ability to "pun" types to match externally imposed data access requirements, and low runtime demand on system resources.
In the past, computers had little memory. That was the prime reason why you had different data types. If you needed a variable to only hold small numbers, you could use an 8-bit char instead of using a 32-bit long. However, memory is cheap today. Therefore, this reason is less applicable now but has stuck anyway.
However, bear in mind that every processor has a default data type in the sense that it operates at a certain width (usually 32-bit). So, if you used an 8-bit char, the value would need to be extended to 32-bits and back again for computation. This may actually slow down your algorithm slightly.
The standard mandates very few limitations on char and int:
A char must be able to hold an ASCII value, that is, 7 bits minimum (EDIT: CHAR_BIT is at least 8 according to the C standard). It is also the smallest addressable block of memory.
An int is at least 16 bits wide and the "recommended" default integer type. This recommendation is left to the implementation (your C compiler.)
In general, there are algorithms and designs which are abstractions, and data types help in implementing those abstractions. For example, there is a good chance that a weight is represented as a non-integer number, which is best stored as a float/double, i.e. a type with a fractional part.
I hope this helps.
int is the "natural" integer type, you should use it for most computations.
char is essentially a byte; it's the smallest addressable memory unit. char is not 8 bits wide on all platforms, although that is the case most of the time.
