Efficiency of different integer sizes on a 64-bit CPU

In a 64-bit CPU, if the int is 32 bits whereas the long is 64 bits, would the long be more efficient than the int?

The main problem with your question is that you did not define "efficient". There are several possible efficiency related differences.
Of course if you need to use 64 bits, then there's no question. But sometimes you could use 32 bits and you wonder if it would be better to use 64 bits instead.
Data Size Efficiency
Using 32 bits will use less memory. This is more efficient, especially if you use a lot of them. Not only is it more efficient in the sense that you are less likely to swap, but you will also have fewer cache misses. If you use just a few, the efficiency difference is irrelevant.
Code Size Efficiency
This is heavily dependent on the architecture. Some architectures will need longer instructions to manipulate 32-bit values, others will need longer instructions to manipulate 64-bit values, and for others it will make no difference. On Intel processors, for example, 32 bits is the default operand size even for 64-bit code. Smaller code may have a slight advantage both in cache behavior and in pipeline usage. But which operand size produces smaller code depends on the architecture.
Execution Speed Efficiency
In general there should be no difference beyond the one implied by code size. Once the instruction has been decoded, the timing of the execution itself is generally identical. However, once again, this is architecture specific. There are architectures that do not have native 32-bit arithmetic, for example.
My suggestion:
If it's just some local variables or data in small structures that you do not allocate in huge quantities, use int and do it in a way that does not assume a size, so that a new version of the compiler or a different compiler that uses a different size for int will still work.
However, if you have huge arrays or matrices, then use the smallest type you can and make sure its size is explicit.
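As a rough illustration of the "huge arrays" advice, here is a minimal sketch that compares the storage needed for the same number of elements in two explicitly sized types. The element count SAMPLE_COUNT and the function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_COUNT 1000000u  /* hypothetical element count */

/* Bytes needed for SAMPLE_COUNT elements of an explicitly sized 32-bit type. */
size_t bytes_as_int32(void) { return SAMPLE_COUNT * sizeof(int32_t); }

/* Bytes needed for the same number of 64-bit elements. */
size_t bytes_as_int64(void) { return SAMPLE_COUNT * sizeof(int64_t); }
```

With a million elements the 64-bit version needs twice the memory, which is where the cache and swap effects mentioned above come from.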

On the common x86-64 architecture, 32-bit arithmetic is never slower than 64-bit arithmetic. So int is always the same speed as or faster than long. On other architectures that don't actually have built-in 32-bit arithmetic, such as the MMIX, this might not hold.
Basic wisdom holds: Write it without considering such micro-optimizations and if necessary, profile and optimize.

If you are trying to store 64 bits of data, use a long. If you aren't going to need the 64 bits use the regular 32 bit int.

Yes, a 64bit number would be more efficient than a 32bit number.
On a 64bit CPU most compilers would give you 64bit if you ask for a long int though.
To see the size with your current compiler:
#include <limits.h>
#include <stdio.h>
int main(void) {
    long int foo;
    /* sizeof yields a size_t, so it is printed with %zu */
    printf("The size of a long int is: %zu bytes\n", sizeof(foo));
    printf("The size of a long int is: %zu bits\n", sizeof(foo) * CHAR_BIT);
    return 0;
}
If your cpu is running in 64bit mode you can expect that the CPU will use that regardless of what you ask. All the registers are 64bit, the operations are 64bit so if you want to get a 32bit result it will generally convert the 64bit result to 32bit for you.
The limits.h on my system defines long int as:
/* Minimum and maximum values a `signed long int' can hold. */
# if __WORDSIZE == 64
# define LONG_MAX 9223372036854775807L
# else
# define LONG_MAX 2147483647L
# endif
# define LONG_MIN (-LONG_MAX - 1L)


Operating on the Rightmost/Leftmost n Bits, Not All the Bits of an Integer Type Variable

In a programming-task, I have to add a smaller integer in variable B (data type int)
to a larger integer (20 decimal integer) in variable A (data type long long int),
then compare A with variable C which is also as large integer (data type long long int) as A.
What I realized, since I add a smaller B to A,
I don't need to check all the digits of A when I compare that with C, in other words, we don't need to check all the bits of A and C.
Given that I know, how many bits from the right I need to check, say n-bits,
is there a way/technique to check only those specific n-bits from the right (not all the bits of A, C) to make the program faster in c programming language?
Because for comparing all the bits take more time, and since I am working with large number, the program becomes slower.
Every time I search on Google, bit-masking appears, which uses all the bits of A and C. That doesn't do what I am asking for, so I am probably not using the correct terminology. Please help.
The initial comments on this post made me think there is no way, but I found the following -
Bit Manipulation by University of Colorado Boulder
(#cuboulder, after 7:45)
...the bit band region is accessed via a bit band alias, each bit in a
supported bit band region has its own unique address and we can access
that bit using a pointer to its bit band alias location, the least
significant bit in an alias location can be set or cleared and that
will be mapped to the bit in the corresponding data or peripheral
memory, unfortunately this will not help you if you need to write to
multiple bit locations in memory dependent operations only allow a
single bit to be cleared or set...
Is the above what I am asking for? If yes, then
where can I find the details as a beginner?
Updated question:
Is there a way/technique to check only those specific n-bits from the right (not all the bits of A, C) to make the program faster in c programming language (or any other language) that makes the program faster?
Your assumption that comparing fewer bits is faster might be true in some cases but is probably not true in most cases.
I'm only familiar with x86 CPUs. An x86-64 processor has 64-bit wide registers. These can be accessed as 64-bit registers, but their lower bits can also be accessed as 32, 16 and 8-bit registers. There are processor instructions which work with the 64, 32, 16 or 8-bit part of the registers. Comparing 8 bits is one instruction but so is comparing 64 bits.
If the 32-bit comparison were faster than the 64-bit comparison, you could gain some speed. But it seems like there is no speed difference for current processor generations. (Check out the "cmp" instruction with the link to uops.info from #harold.)
If your long long data type is actually bigger than the word size of your processor, then it's a different story. E.g. if your long long is 64 bits but you are on a 32-bit processor, then these values cannot be handled in one register and you would need multiple instructions. So if you know that comparing only the lower 32 bits would be enough, this could save some time.
Also note that comparing only e.g. 20 bits would actually take more time than comparing 32 bits. You would have to mask off the 12 highest bits and then compare 32 bits. So you would need a bitwise and instruction in addition to the comparison.
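To make the "mask plus compare" point concrete, here is a minimal sketch of comparing only the n lowest bits of two 64-bit values. The function name low_bits_equal is made up for this example. Note that it does strictly more work than a plain a == c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the n least significant bits of a and c are equal.
   n must be between 1 and 64. */
bool low_bits_equal(uint64_t a, uint64_t c, unsigned n) {
    assert(n >= 1 && n <= 64);
    /* Mask with the low n bits set. Shifting a 64-bit value by 64 is
       undefined, so n == 64 is handled separately. */
    uint64_t mask = (n == 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
    /* XOR leaves a 1 wherever the operands differ; the mask drops the high bits. */
    return ((a ^ c) & mask) == 0;
}
```

For example, low_bits_equal(0xABCD1234, 0xFFFF1234, 16) is true because only the upper halves differ.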
As you see, this is very processor specific, and you are down at the processor's opcode level. As #RawkFist wrote in his comment, you could try to get the C compiler to create such instructions, but that does not automatically mean that this is even faster.
All of this is only relevant if these operations are executed a lot. I'm not sure what you are doing. If e.g. you add many values B to A and compare them to C each time it might be faster to start with C, subtract the B values from it and compare with 0. Because the compare-operation works internally like a subtraction. So instead of an add and a compare instruction a single subtraction would be enough within the loop. But modern CPUs and compilers are very smart and optimize a lot. So maybe the compiler automatically performs such or similar optimizations.
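The "start with C and count down" idea could look roughly like this. It is only a sketch under assumptions the question does not confirm: that the B values arrive in an array, that C is not smaller than A, and that nothing overflows. The function name reaches_target is made up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if adding the first k values of b to a gives exactly c
   for some k >= 1. Assumes c >= a and no overflow. */
bool reaches_target(long long a, long long c, const int *b, size_t count) {
    long long remaining = c - a;      /* distance still to cover */
    for (size_t i = 0; i < count; i++) {
        remaining -= b[i];            /* one subtraction per iteration */
        if (remaining == 0) {         /* test against zero instead of A == C */
            return true;
        }
    }
    return false;
}
```

Whether this is any faster than the straightforward add-and-compare loop depends on the compiler and CPU, so it would have to be measured.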
Try this question.
Is there a way/technique to check only those specific n-bits from the right (not all the bits of A, C) to make the program faster in c programming language (or any other language) that makes the program faster?
Yes - when A + B != C. We can short-cut the comparison once a difference is found: from least to most significant.
No - when A + B == C. All bits need comparison.
Now back to OP's original question
Is there a way/technique to check only those specific n-bits from the right (not all the bits of A, C) to make the program faster in c programming language (or any other language) that makes the program faster?
No. In order to do so, we need to out-think the compiler. A well enabled compiler itself will notice any "tricks" available for long long + (signed char)int == long long and emit efficient code.
Yet what about really long compares? How about a custom uint1000000 for A and C?
For long compares of a custom type, a quick compare can be had.
First, select a fast working type. unsigned is a prime candidate.
typedef unsigned ufast;
Now define the wide integer.
#include <limits.h>
#include <stdbool.h>
#define UINT1000000_N (1000000/(sizeof(ufast) * CHAR_BIT))
typedef struct {
// Least significant first
ufast digit[UINT1000000_N];
} uint1000000;
Perform the addition and compare one "digit" at a time.
bool uint1000000_fast_offset_compare(const uint1000000 *A, unsigned B,
    const uint1000000 *C) {
  ufast carry = B;
  for (unsigned i = 0; i < UINT1000000_N; i++) {
    ufast sum = A->digit[i] + carry;
    if (sum != C->digit[i]) {
      return false;  // Difference found: stop early
    }
    carry = sum < A->digit[i];  // Unsigned wrap-around means a carry out
  }
  return true;
}

C: Is using char faster than using int?

Since char is only 1 byte long, is it better to use char when dealing with 8-bit unsigned integers?
I was trying to create a struct for storing rgb values of a color.
struct color
{
    unsigned int r: 8;
    unsigned int g: 8;
    unsigned int b: 8;
};
Now since it is int, it allocates 4 bytes of memory in my case. But if I replace them with unsigned char, they will take 3 bytes of memory as intended (on my platform).
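The two layouts being compared can be checked with sizeof. This is only a sketch; the struct names color_bits and color_bytes are made up here, and the exact sizes are implementation-defined (4 and 3 bytes are typical, not guaranteed).

```c
#include <stddef.h>

/* The bit-field version from the question: the storage unit is an unsigned int. */
struct color_bits {
    unsigned int r : 8;
    unsigned int g : 8;
    unsigned int b : 8;
};

/* The alternative with one unsigned char per channel. */
struct color_bytes {
    unsigned char r;
    unsigned char g;
    unsigned char b;
};

size_t color_bits_size(void)  { return sizeof(struct color_bits); }
size_t color_bytes_size(void) { return sizeof(struct color_bytes); }
```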
No. Maybe a tiny bit.
First, this is a very platform dependent question.
However the <stdint.h> header was introduced to help with this.
Some hardware platforms are optimised for a particular size of operand and have an overhead to using smaller (in bit-size) operands even though logically the smaller operand requires less computation.
You should find that uint_fast8_t is the fastest unsigned integer type with at least 8 bits (#include <stdint.h> to use it).
That may be the same as unsigned char or unsigned int depending on whether the answer to your question is 'yes' or 'no' respectively(*).
So the idea would be that if you're speed focused you'd use uint_fast8_t and the compiler will pick the fastest type fitting your purpose.
There are a couple of downsides to this scheme.
One is that if you create very vast quantities of data performance can be impaired (and limits reached) because you're using an 'oversized' type for the purpose.
On a platform where a 4-byte int is faster than a 1-byte char you're using about 4 times as much memory as you need.
If your platform is small or your scale large that can be a significant overhead.
Also you need to be careful that if the underlying type isn't the minimum size you expect then some calculations may be confounded.
Arithmetic 'clocks' neatly for unsigned operands but obviously at different sizes if uint_fast8_t isn't in fact 8-bits.
It's platform dependent what the following returns:
#include <stdint.h>
int foo() {
    uint_fast8_t x=255;
    ++x;              /* wraps to 0 only if uint_fast8_t is exactly 8 bits wide */
    if(x==0) {
        return 1;
    }
    return 0;
}
The overhead of dealing with potentially outsized types can claw back your gains.
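One way to get 8-bit wrap-around regardless of how wide uint_fast8_t really is would be to mask explicitly, which is exactly the kind of extra work that can eat the gain. A minimal sketch, with made-up function names:

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes that this implementation chose for uint_fast8_t. */
size_t fast8_size(void) { return sizeof(uint_fast8_t); }

/* Increment that wraps at 256 no matter how wide uint_fast8_t is. */
uint_fast8_t next_channel_value(uint_fast8_t x) {
    return (uint_fast8_t)((x + 1u) & 0xFFu);
}
```

Here next_channel_value(255) gives 0 on every platform, at the cost of one extra and operation per increment.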
I tend to agree with Knuth that "premature optimisation is the root of all evil" and would say you should only get into this sort of cleverness if you need it.
Use a typedef such as typedef uint8_t color_comp; for now and get the application working before trying to shave off fractions of a second later!
I don't know what your application does but it may be that it's not compute intensive in RGB channels and the bottleneck (if any) is elsewhere. Maybe you find some high load calculation where it's worth dealing with uint_fast8_t conversions and issues.
The wisdom of Knuth is that you probably don't know where that is until later anyway.
(*) It could be unsigned long or indeed some other type. The only constraint is that it is an unsigned type and at least 8 bits.

Why int16+int16 is faster than int16+int8?

I decided to make a benchmark to test how fast each C99 type is, out of curiosity.
My benchmark creates an array of the following struct:
typedef struct
{
    int x;
    int y;
    short spdx;
    short spdy;
    unsigned char type;
} defaultTypes;
Then I do this operation on the entire struct, multiple times, to simulate a game update loop:
while(counter < ARRAY_MAX)
I tried several types, like int_fast8_t and double.
Later I decided to test what if I made the "spdx" variable bigger too? So I made a version where both the position (x, y) and the speed (spdx, spdy) variables are int16_t
To my surprise, it is SLIGHTLY, but only SLIGHTLY, faster than the int16_t + int8_t version: 11% faster, to be more exact (compared, for example, to doubles, which run at a quarter of the speed of the int16_t+int16_t version).
For most other speed differences (floats being slower, bigger variables being slower, and so on) I think I know the reasons, but I don't know why a bigger structure (16, 16, 16, 16, 8) is FASTER than a smaller one (even with padding, 16, 16, 8, 8, 8).
Thus, why doing int16_t+=int16_t is 11% faster than int16_t+=int8_t? Someone suggested it had to do with integer promotion, but I am not sure about that.
Important note (this seemingly affects the results): I compiled this with MinGW, targeting 32-bit, and ran it on a 64-bit x86. (I ran a test targeting 64-bit only once, so I am not confident in its results, but the performance gap seems to be 2% instead of 11%.)
I think the answer has to do with the fact that your variables are signed. For two (signed) variables with the same type, the CPU can add them using a native instruction. However, when the types are different, even just different sizes, it has to convert one to the other. This may or may not be supported in hardware, but it is still a necessary step. Consider if your int16 was 256 and your int8 was -1. You would expect to get back an int16 value of 255. If the CPU just added the bit patterns together, then you'd get something very different, because the 8-bit representation of -1 is not the same as the 16-bit representation of -1. First it has to convert the 8-bit number to a 16-bit number, then add them together.
This is what is meant by "integer promotion": the compiler recognizes that the vars are of different types, and uses the appropriate assembly code to do the conversion. It "promotes" the int8 to an int16 before doing the add operation.
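A small sketch of the difference, using 256 and -1 as the operands. The function names are made up; add_raw_bits only imitates what would happen if the 8-bit pattern were zero-extended instead of sign-extended.

```c
#include <stdint.h>

/* Adds an 8-bit signed speed to a 16-bit position the way C does it:
   both operands are promoted (sign-extended) to int before the addition. */
int16_t add_promoted(int16_t pos, int8_t spd) {
    return (int16_t)(pos + spd);
}

/* What the result would be if the 8-bit pattern were zero-extended:
   -1 (0xFF) would be treated as 255. */
int16_t add_raw_bits(int16_t pos, int8_t spd) {
    return (int16_t)(pos + (uint8_t)spd);
}
```

add_promoted(256, -1) gives 255, while add_raw_bits(256, -1) gives 511, which is why the conversion step cannot be skipped for signed values.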
I'll bet if you changed them to unsigned, the difference in speed would disappear.

Why is size_t better?

The title is actually a bit misleading, but I wanted to keep it short. I've read about why I should use size_t and I often found statements like this:
size_t is guaranteed to be able to express the maximum size of any object, including any array
I don't really understand what that means. Is there some kind of cap on how much memory you can allocate at once and size_t is guaranteed to be large enough to count every byte in that memory block?
Follow-up question:
What determines how much memory can be allocated?
Let's say the biggest object your compiler/platform can have is 4 GB. size_t then is 32 bits. Now let's say you recompile your program on a 64-bit platform able to support objects of size 2^43 - 1. size_t will be at least 43 bits long (but normally it will be 64 bits at this point). The point is that you only have to recompile the program. You don't have to change all your ints to long (if int is 32 bits and long is 64 bits) or from int32_t to int64_t.
(if you are asking yourself why 43 bit, let's say that Windows Server 2008 R2 64bit doesn't support objects of size 2^63 nor objects of size 2^62... It supports 8 TB of addressable space... So 43 bit!)
Many programs written for Windows considered a pointer to be as big as a DWORD (a 32-bit unsigned integer). These programs can't be recompiled for 64-bit without rewriting large swaths of code. Had they used DWORD_PTR (an unsigned value guaranteed to be as big as necessary to contain a pointer) they wouldn't have had this problem.
The point of size_t is similar, but different!
size_t isn't guaranteed to be able to contain a pointer!!
(the DWORD_PTR of Microsoft Windows is)
This, in general, is not guaranteed to work:
void *p = ...
size_t p2 = (size_t)p;
For example, on the old DOS "platform", the maximum size of an object was 64 KB, so size_t needed to be 16 bits, BUT a far pointer needed to be at least 20 bits, because the 8086 had a memory space of 1 MB (in the end a far pointer was 16 + 16 bits, because the memory of an 8086 was segmented).
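In standard C, the counterpart of DWORD_PTR for this job is uintptr_t from <stdint.h> rather than size_t. A minimal sketch of a pointer round trip follows; the function name is made up, and uintptr_t is an optional type, so this assumes the platform provides it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Converts a pointer to an integer and back again.
   Assumes the implementation provides the optional type uintptr_t. */
bool pointer_round_trips(void *p) {
    uintptr_t as_integer = (uintptr_t)p;   /* wide enough for a void pointer */
    void *back = (void *)as_integer;
    return back == p;
}
```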
Basically it means that size_t is guaranteed to be large enough to index any array and to hold the size of any data type.
It is preferred over using just int, because the size of int and other integer types can be smaller than what can be indexed. For example int is usually 32-bits long which is not enough to index large arrays on 64-bit machines. (This is actually a very common problem when porting programs to 64-bit.)
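As a sketch of what that looks like in practice, here is a loop whose length and index are both size_t, so it keeps working unchanged when the buffer is larger than an int can count. The function name sum_bytes is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums a byte buffer. The length and the index are both size_t,
   so the loop can cover any object the implementation can create. */
uint64_t sum_bytes(const unsigned char *data, size_t length) {
    uint64_t total = 0;
    for (size_t i = 0; i < length; i++) {
        total += data[i];
    }
    return total;
}
```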
That is exactly the reason.
The maximum size of any object in a given programming language is determined by a combination of the OS, the CPU architecture and the compiler/linker in use.
size_t is defined to be big enough to hold the size value of the largest possible object.
This usually means that size_t is typedef'ed to be the same as the largest int type available.
So on a 32 bit environment it would typically be 4 bytes and in a 64 bit system 8 bytes.
size_t is defined for the platform that you are compiling for. Hence it can represent the maximum for that platform.
size_t is the type of the result of the sizeof operator (see 7.17 in C99); therefore it must be able to describe the largest possible object the system can represent.

int v/s. long in C

On my system, I get:
sizeof ( int ) = 4
sizeof ( long ) = 4
When I checked with a C program, both int & long overflowed to the negative after:
a = 2147483647;
If both can represent the same range of numbers, why would I ever use the long keyword?
int has a minimum range of -32767 to 32767, whereas long has a minimum range of -2147483647 to 2147483647.
If you are writing portable code that may have to compile on different C implementations, then you should use long if you need that range. If you're only writing non-portable code for one specific implementation, then you're right - it doesn't matter.
Because sizeof(int)==sizeof(long) isn't always true. int normally represents the fastest size with at least 2*8 bits. long, on the other hand, is at least 4*8 bits.
C defines a number of integer types and specifies the relation of their sizes. Basically, what it says is that sizeof(long long) >= sizeof(long) >= sizeof(int) >= sizeof(short) >= sizeof(char), and that sizeof(char) == 1.
But the actual sizes are not defined, and depend on the architecture you are running on. On a 32-bit PC, int and long are typically four bytes and long long is 8 bytes. But on a 64-bit system, long is typically 8 bytes, and thus different from int.
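A small sketch of how the guaranteed minimums and the actual implementation can be checked from <limits.h>. The function names are made up; the first check holds on every conforming implementation, while the second depends on the platform.

```c
#include <limits.h>
#include <stdbool.h>

/* Checks the minimum ranges the C standard promises for int and long. */
bool minimum_ranges_hold(void) {
    return INT_MAX >= 32767 && LONG_MAX >= 2147483647L;
}

/* True when long can hold values that int cannot on this implementation. */
bool long_is_wider_than_int(void) {
    return LONG_MAX > INT_MAX;
}
```

On a system where sizeof(int) and sizeof(long) are both 4, long_is_wider_than_int() would return false; on a typical 64-bit Linux system it would return true.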
There are also types called uintptr_t and intptr_t that are wide enough to hold a data pointer converted to an integer (they are optional in the standard, but most implementations provide them).
The important thing to remember is to not assume that you can, for example, store pointer values in a long or an int. Being portable is probably more important than you think, and it is likely that you will want to compile your code on a 64-bit system in the near future.
I think it's more of a compiler issue nowadays, since computers have become much faster and demand larger numbers than was the case before.
On a different platform or with a different compiler, int and long may be different.
If you don't plan to port your code to anything else or use a different machine, then pick the one you want, it won't make a difference.
It depends on the compiler, and you might want to check this out: What does the C++ standard state the size of int, long type to be?
The size of built-in data types varies depending on the C implementation, but they all have minimum ranges. Nowadays, int is typically 4 bytes long (32 bits) because most OSes are 32-bit. Note that char will always be 1 byte.
The size of a data type depends upon the compiler. Different compilers have different sizes for int and other data types.
So if you write code that is going to run on different machines, you should use long, or choose the type according to the range of values that your variable may hold.
