How to treat 64-bit words on a CUDA device? - c

I'd like to handle directly 64-bit words on the CUDA platform (eg. uint64_t vars).
I understand, however, that addressing space, registers and the SP architecture are all 32-bit based.
I actually found this to work correctly (on my CUDA cc1.1 card):
__global__ void test64Kernel( uint64_t *word )
{
(*word) <<= 56;
}
but I don't know, for example, how this affects registers usage and the operations per clock cycle count.

Whether addresses are 32-bit or anything else does not affect what data types you can use. In your example you have a pointer (32-bit, 64-bit, 3-bit (!) - doesn't matter) to a 64-bit unsigned integer.
64-bit integers are supported in CUDA but of course for every 64-bit value you are storing twice as much data as a 32-bit value and so will use more registers and arithmetic operations will take longer (adding two 64-bit integers will just expand it out onto the smaller datatypes using carries to push into the next sub-word). The compiler is an optimising compiler, so will try to minimise the impact of this.
Note that using double precision floating point, also 64-bit, is only supported in devices with compute capability 1.3 or higher (i.e. 1.3 or 2.0 at this time).

Related

What does it mean for a program to be 32 or 64 bit? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 3 years ago.
Improve this question
This question: How many bits does a WORD contain in 32/64 bit OS respectively?, mentions that word size refers to the bit size of a processor register - which i take to mean the number of bits that a computer processor operates on / i.e. the smallest 'indivisible' amount of bits that a processor operates on.
Is that correct? Using software like Word/Excel/etc, the installers have the option for a 32bit or a 64bit installation. What is the difference?
Since the computer architecture is fixed, it would seem to me that software that is '32 bit' would be designed to align with a computer architecture that has a 32 bit architecture. And a 64 bit program would make efforts to align instruction sets with 64 bit word sizes.
Is that correct?
A very similar question is asked here: From a programming point of view, what does it mean when a program is 32 or 64 bit? - and the accepted answer mentions that the difference is the amount of memory that can be allocated to an application. But this is too vague - unless 32 bit / 64 bit software as a concept is completely unrelated to 32 bit / 64 bit word processor size?
Word size is a major difference, but it's not the only one. It tends to define the number of bits a CPU is "rated" for, but word size and overall capability are only loosely related. And overall capability is what matters.
On an Intel or AMD CPU, 32-bit vs. 64-bit software really refers to the mode in which the CPU operates when running it. 32-bit mode has fewer/smaller registers and instructions available, but the most important limitation is the amount of memory available. 32-bit software is generally limited to using between 2GB and just under 4GB of memory.
Each byte of memory has a unique address, which is not very different from each house having a unique postal address. A memory address is just a number that a program can use to find a piece of data again once it has saved it in memory, and each byte of memory has to have an address. If an address is 32 bits, then there are 2^32 possible addresses, and that means 2^32 addressable bytes of memory. On today's Intel/AMD CPUs, the size of a memory address is the same as the size of the registers (although this wasn't always true).
With 32 bit addresses, 4GB (2^32 bytes) can be addressed by the program, however up to half of that space is reserved by the OS. Into the available memory space must fit program code, data, and often also files being accessed. In today's PCs, with many gigabytes of RAM, this fails to take advantage of available memory. That is the main reason why 64-bit has become popular. 64-bit CPUs were available and widely used (typically in 32-bit mode) for several years, until memory sizes larger than 2GB became common, at which point 64-bit mode started to offer real-world advantages and it became popular. 64 bits of memory address space provides 16 exabytes of addressable memory (~18 quintillion bytes), which is more than any current software can use, and certainly no PC has anywhere near that much RAM.
The majority of data used in typical applications, even in 64-bit mode, does not need to be 64-bit and so most of it is still stored in 32-bit (or even smaller) formats. The common ASCII and UTF-8 representations of text use 8-bit data formats. If the program needs to move a large block of text from one place to another in memory, it may try to do it 64 bits at a time, but if it needs to interpret the text, it will probably do it 8 bits at a time. Similarly, 32 bits is a common size for integers (maximum range of +/- 2^31, or approximately +/- 2.1 billion). 2.1 billion is enough range for many uses. Graphics data is usually naturally represented pixel by pixel, and each pixel, usually, contains at most 32 bits of data.
There are disadvantages to using 64-bit data needlessly. 64-bit data takes up more space in memory, and more space in the CPU cache (very fast memory used by the CPU for short-term storage). Memory can only transfer data at a maximum rate, and 64-bit data is twice as big. This can reduce performance if used wastefully. And if it's necessary to support both 32-bit and 64-bit versions of software, using 32-bit values where possible can reduce the differences between the two versions and make development easier (doesn't always work out that way, though).
Prior to 32-bit, the address and word size were usually different (e.g. 16-bit 8086/88 with 20-bit memory addresses but 16-bit registers, or 8-bit 6502 with 16-bit memory addresses, or even early 32-bit ARM with 26-bit addresses). While no programmer ever turned up their nose at better registers, memory space was usually the real driving force for each advancing generation of technology. This is because most programmers rarely work directly with registers, but do work directly with memory, and memory limitations directly cause unpleasantness for the programmer, and in the 32-bit to 64-bit case, for the user as well.
To sum up, while there are real and important technological differences between the various bit sizes, what 32-bit or 64-bit (or 16-bit or 8-bit) really means is simply a collection of capabilities that tend to be associated with CPUs of a particular technological generation, and/or software that takes advantage of those capabilities. Word length is a part of that, but not the only, or necessarily the most important part.
Source: Have been programmer through all these technological eras.
The answer you reference describes benefits of 64-bit over 32-bit. As far as what's actually different about the program itself, it depends on your perspective.
Generally speaking, the program source code does not have to be different at all. Most programs can be written so that they compile perfectly well as either 32-bit or 64-bit programs, as controlled by appropriate choice of compiler and / or compiler options. There is often some impact on the source, however, in that a (C) compiler targeting 64-bit may choose to define its types differently. In particular, long int is ubiquitously 32 bits wide on 32-bit platforms, but it is 64 bits wide on many (but not all) 64-bit platforms. This can be a source of bugs in code that makes unwarranted assumptions about such details.
The main differences are all in the binary. 64-bit programs make use of the full instruction sets of their 64-bit target CPUs, which invariably contain instructions that 32-bit counterpart CPUs do not contain. They will use registers that 32-bit counterpart CPUs do not have. They will use function-call conventions appropriate for their target CPU, which often means passing more arguments in registers than 32-bit programs do. Use of these and other facilities of 64-bit CPUs affords functional advantages such as the ability to use more memory and (sometimes) improved performance.
A program runs on top of a given architecture (arch, or ISA), which is implemented by processors. Typically, an architecture defines a "main" word size, which is the size most of the registers and operations on those registers run (although you can design architectures that work differently). This is usually called the "native" word size, although an architecture may allow operations using different sized registers.
Further, processors use memory, and need to address that memory somehow -- this means operating with those addresses. Therefore, the addresses are typically able to be stored and manipulated like any other number, which means you have registers capable of holding them. Although it is not required that those registers to match the word size nor it is required that an address is computed out of a single register, in some architectures this is the case.
Throughout history, there have been many architectures of different word sizes, even weird ones. Nowadays, you can easily find processors around you that are not just 32-bit and 64-bit, but also e.g. 8-bit and 16-bit (typically in embedded devices). In the typical desktop computer, you are using x86 or x64, which are 32-bit and 64-bit respectively.
Therefore, when you say that a program is 32-bit or 64-bit, you are referring to a particular architecture. In the popular desktop scenario, you are referring to x86 vs. x64. There are many questions, articles and books discussing the differences between the two.
Now, a final note: for compatibility reasons, x64 processors can operate in different modes, one of which is capable of running the 32-bit code from x86. This means that if your computer is x64 (likely) and if your operating system has support for it (also likely, e.g. Windows 64-bit), it can still run programs compiled for x86.
Using software like Word/Excel/etc, the installers have the option for a 32bit or a 64bit installation. What is the difference?
This depends on the CPU used:
On SPARC CPUs, the difference between "32-bit" and "64-bit" programs is exactly what you think:
64-bit programs use additional operations that are not supported by 32-bit SPARC CPUs. On the other hand the Solaris or Linux operating system places the data accessed by 64-bit programs in memory areas which can only be accessed using 64-bit instructions. This means that a 64-bit program even MUST use instructions not supported by 32-bit CPUs.
For x86 CPUs this is different:
Modern x86 CPUs have different operating modes and they can execute different types of code. In the different modes, they can execute 16-, 32- or 64-bit code.
In 16-, 32- and 64-bit code, the CPU interprets the bytes differently:
The bytes (hexadecimal) b8 4e 61 bc 00 c3 would be interpreted as:
mov eax,0xbc614e
ret
... in 32-bit code and as:
mov ax,0x614e
mov sp,0xc300
... in 16 bit code.
The bytes in the EXE file of the "64-bit installation" and of the "32-bit installation" must be interpreted differently by the CPU.
And a 64 bit program would make efforts to align instruction sets with 64 bit word sizes.
16-bit code (see above) can access 32-bit registers when the CPU is not a 16-bit CPU.
So a "16-bit program" can access 32-bit registers on a 32- or 64-bit x86 CPU.
mentions that word size refers to the bit size of a processor register
Generally yes (though there are some exceptions/complications)
which I take to mean the number of bits that a computer processor operates on / i.e. the smallest 'indivisible' amount of bits that a processor operates on.
No, most processor architectures can work on values smaller than their native word size. A better (but not perfect) definition would be the largest piece of data that the processor can process (through the main integer datapath) as a single unit.
In general on modern 32-bit and 64-bit systems pointers are the same size as the word size, though on many 64-bit systems not all bits of said pointer are actually usable. It is possible to have a memory model where addressable memory is greater than the system's native word size and it was common to do so in the 8-bit and 16-bit eras, but it has fallen out of favour since the introduction of 32-bit CPUs.
Since the computer architecture is fixed
While the physical architecture is of course fixed, many processors have multiple operating modes with different instructions and registers available to the programmer. In 64-bit mode, the full features of the CPU are available, in 32-bit mode, the processor presents a backwards compatible interface which limits the features and the address space. The modes are sufficiently different that code must be compiled for a particular mode.
As a general rule, an OS running in 64-bit mode can support applications running in 32-bit mode but not vice-versa.
So a 32-bit application runs in 32-bit mode on either a 32-bit processor running a 32-bit OS, a 64-bit processor running a 32-bit OS or a 64-bit processor running a 64-bit OS.
A 64-bit application on the other hand normally runs only on a 64-bit processor running a 64-bit OS.
The information you have is a good part of the picture, but not all of it. I'm not a processor expert, so there are likely some details that my answer will be missing.
The 32 bit vs 64 bit is related to the processor architecture. An increase in word size does a few things:
Larger word size enables more instructions to be defined. For instance, and 8-bit processor that does a single load instruction can only have 256 total instructions, where a larger word size allows more instructions to be defined in the processors micro-code. Obviously, there is a limit to how many truly useful instructions are defined.
More data can be processed with a single instruction cycle as there are more bits available. This speeds up execution.
Like you stated, it also allows access to a larger memory space without having to do things like multiple address cycles, or multiplexing high/low data words.
When the processor architecture moves from 32-bit to 64-bit, the chip manufacturer will likely maintain compatibility with the previous instruction set, so that all the software that was developed previously will still run on the new architecture. When you target the 64-bit architecture, the compiler will have new instructions available and memory addressing schemes with which to process data more efficiently.
Short answer: This is a convention based solely on the width of the underlying data bus
An n-bit program is a program that is optimized for an n-bit CPU. Said otherwise a 64-bit program is a binary program compiled for a 64 bit CPU. A 64 bit CPU, in turn, is one taking advantage of a 64-bit data bus for the exchange of data between CPU and memory.
That's as simple, but you can read more below.
The definition actually redirects to understanding what is a 32/64 bit CPU, indirectly to what is a 32/64 bit operating system, and how compilers optimize binaries for a given architecture.
Optimization here encompasses the format of the binary itself. 32 bit and 64-bit binaries for a given OS, e.g. a Windows binary, have different formats. However, a given 64 bit OS, e.g. Windows 64, will be able to read and launch a 32-bit binary file written for the 32-bit version and a 32-bit wide data bus.
32/64 bit CPU, first definition
The CPU can store/recall a certain quantity of data in memory in a single instruction. A 32-bit CPU can transfer 4 bytes (32 bits) at once and a 64-bit CPU can transfer 8 bytes (64 bits) at once. So "32/64 bit" prefix comes from the quantity of RAM transferred in a single read/write cycle.
This quantity impacts the execution time: The fewer transfer cycles are required, the less the CPU waits for the memory, the program executes faster. It's like carrying a large quantity of water with a small or a large bucket.
The size of the bucket (the number of bits used for data transfer) is used to indicate how efficient the architecture is, hence for the same CPU, a 32-bit application is less efficient than a 64-bit application.
32/64 bit CPU, technical definition
Obviously, the RAM and the CPU must be both able to manage a 32/64-bit data transfer, which in turn determines the number of wires used to connect the CPU to the RAM (system bus). 32/64 bit is actually the number of wires/tracks composing the data bus (usually named the bus "width").
(Wikipedia: System bus - The data bus width determines the prefix 32/64 bit for a CPU, a program, an OS, ...)
(Another bus is the address bus, which is usually wider, but the address bus width is irrelevant in naming a CPU as 32 or 64 bit CPU. This address bus width determines the total quantity of RAM which can be reached / "addressed" by the CPU, e.g. 2 GB or 32 GB. As for the control bus, it is a small bus used to synchronize everything connected to the data bus, in particular, it indicates when the data bus is stable and ready to be sampled in a data transfer operation).
When bits are transferred between the CPU and the RAM, the voltage on the different copper tracks of the data bus must be stable prior to reading data on the bus, else one or more bit values would be wrong. It takes less time to stabilize 8 bits than 64 bits, so increasing the data bus width is not without problems to solve.
32/64 bit program: A compiler matter
Programs don't always need to transfer 4 bytes (32-bit data bus) or 8 bytes (64-bit data bus), so they use different instructions to read 1 byte, 2 bytes, 4 bytes, and 8 bits, for performance reasons.
Binaries (native assembly language programs) are written either with the 32-bit architecture in mind, or the 64-bit architecture, and the associated instruction set. So the name 32/64 bit program.
The choice of the target architecture is a matter of compiler/compiler options used when converting the source program into a binary. Most compilers are able to produce a 32 bit or a 64 binary from the same source program. That's why you'll find both versions of an application when downloading your preferred program or tool.
However, most programs rely on ready-made libraries written by other programmers (e.g. a video editing program may use FFmpeg library). To produce a fully 64-bit application, the compiler (actually the link editor, but let's keep it simple) needs to access a 64-bit version of any library used, which may not be possible.
This also applies to operating systems themselves, as an OS is just a suite of individual programs and libraries. However, an OS is itself a kind of big library for the user programs, acting as a gateway between the computer hardware and the user programs, for efficiency and security reasons. The way OS is written car prevent the user programs to access the full potential of the underlying CPU architecture.
32-bit program compatibility with 64-bit CPU
A 64-bit operating system is able to run a 32-bit binary on a 64-bit architecture, as the 64 bit CPU instruction set is retro-compatible. However, some adjustments are required.
In addition of the data bus width and read/write instructions subset, there are many other differences between 32 bit and 64 bit CPU (register operations, memory caches, data alignment/boundaries, timing, ...).
Running a 32-bit program on a 64-bit architecture:
is more efficient than running it on an older 32-bit architecture (almost solely due to CPU clock speed improvement compared to older 32/64 bit CPU generations)
is less efficient than running the same application compiled into a 64-bit binary to take advantage of the 64-bit architecture, in particular, the ability to transfer 64 bits at once from/to memory.
When compiling a source into a 32-bit binary, the compiler will still use small buckets, instead of the larger available with the 64-bit data bus. This has the largest impact on execution speed, compared to the same application compiled to use large buckets.
For information, the applications compiled into 16 bit Windows binaries (earlier versions of Windows running on 80-286 CPU with a 16-bit data bus) are not fully supported anymore, though there is still a possibility on Windows 10 to activate NTVDM.
The case of .NET, Java and other interpreted "byte-code"
While until recent years, compilers were used to translate a source program (e.g. a C++ source) into a machine language program, this method is now in regression.
The main problem is that machine language for some CPU is not the same than for another (think about differences between a smartphone using an ARM chip and a server using an Intel chip). You definitely can't use the same binary on both hardware, they are not talking the same language, and even if this were possible it would be inefficient on both machines due to the huge differences in how they work.
The current idea is to use an intermediate representation (IR) of the instructions, derived from the source. Java (Sun, sadly now Oracle) and IL (Microsoft) are such intermediate representations. The same IR file can be used on any OS supporting the IR.
Once the OS opens the file, it performs the final compilation into the "local" machine language understood by the actual CPU and taking into account the final architecture on which to run the program. For example, for Microsoft .NET, the universal version is executed by a CoreCLR virtual machine located on the final computer. There is usually no notion of data bus width in such intermediate languages, hence less and less application will have this n-bit prefix.
However we cannot forget the actual architecture, so there will be still 32 and 64 bit versions produced for the CoreCLR to optimize the final code, even if the application itself, at the IR level, is not optimized for a given architecture (only one IR version to download and install).

size of int variable

How the size of int is decided?
Is it true that the size of int will depend on the processor. For 32-bit machine, it will be 32 bits and for 16-bit it's 16.
On my machine it's showing as 32 bits, although the machine has 64-bit processor and 64-bit Ubuntu installed.
It depends on the implementation. The only thing the C standard guarantees is that
sizeof(char) == 1
and
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
and also some representable minimum values for the types, which imply that char is at least 8 bits long, int is at least 16 bit, etc.
So it must be decided by the implementation (compiler, OS, ...) and be documented.
It depends on the compiler.
For eg : Try an old turbo C compiler & it would give the size of 16 bits for an int because the word size (The size the processor could address with least effort) at the time of writing the compiler was 16.
Making int as wide as possible is not the best choice. (The choice is made by the ABI designers.)
A 64bit architecture like x86-64 can efficiently operate on int64_t, so it's natural for long to be 64 bits. (Microsoft kept long as 32bit in their x86-64 ABI, for various portability reasons that make sense given the existing codebases and APIs. This is basically irrelevant because portable code that actually cares about type sizes should be using int32_t and int64_t instead of making assumptions about int and long.)
Having int be int32_t actually makes for better, more efficient code in many cases. An array of int use only 4B per element has only half the cache footprint of an array of int64_t. Also, specific to x86-64, 32bit operand-size is the default, so 64bit instructions need an extra code byte for a REX prefix. So code density is better with 32bit (or 8bit) integers than with 16 or 64bit. (See the x86 wiki for links to docs / guides / learning resources.)
If a program requires 64bit integer types for correct operation, it won't use int. (Storing a pointer in an int instead of an intptr_t is a bug, and we shouldn't make the ABI worse to accommodate broken code like that.) A programmer writing int probably expected a 32bit type, since most platforms work that way. (The standard of course only guarantees 16bits).
Since there's no expectation that int will be 64bit in general (e.g. on 32bit platforms), and making it 64bit will make some programs slower (and almost no programs faster), int is 32bit in most 64bit ABIs.
Also, there needs to be a name for a 32bit integer type, for int32_t to be a typedef for.
It is depends on the primary compiler.
if you using turbo c means the integer size is 2 bytes.
else you are using the GNU gccompiler means the integer size is 4 bytes.
it is depends on only implementation in C compiler.
The size of integer is basically depends upon the architecture of your system.
Generally if you have a 16-bit machine then your compiler will must support a int of size 2 byte.
If your system is of 32 bit,then the compiler must support for 4 byte for integer.
In more details,
The concept of data bus comes into picture yes,16-bit ,32-bit means nothing but the size of data bus in your system.
The data bus size is required for to determine the size of an integer because,The purpose of data bus is to provide data to the processor.The max it can provide to the processor at a single fetch is important and this max size is preferred by the compiler to give a data at time.
Basing upon this data bus size of your system the compiler is
designed to provide max size of the data bus as the size of integer.
x06->16-bit->DOS->turbo c->size of int->2 byte
x306->32-bit>windows/Linux->GCC->size of int->4 byte
Yes. int size depends on the compiler size.
For 16 bit integer the range of the integer is between -32768 to 32767. For 32 & 64 bit compiler it will increase.

Micro-optimizations: using intptr_t for flag/bool types

From what I understand, the definition of intptr_t varies by architecture -- it is guaranteed to have the capacity to represent a pointer that can access all of the uniform address space of a process.
Nginx (popular open source web-server) defines a type that is used as a flag(boolean) and this a typedef to intptr_t. Now using the x86-64 architecture as an example -- which has access to a plethora of instructions covering operands of all sizes -- why define the flag to be intptr_t ? Surely the tradition of using a 32-bit bool type would fit the bill just as well ?
I have gone over the 32-bit Vs. 8-bit bools argument myself when I was a new developer, and the conclusion was that 32-bit bools perform better for the common case because of the intricacies of processor design. Why then do need to move to 64-bit bools ?
The only people who really know why nginx uses intptr_t for a boolean type are the nginx developers.
As you say, 32-bit bools often perform better than 8-bit bools for the common case. I have done no benchmarking myself, but it sounds not unreasonable to me that for certain situation on x86-64 a 64-bit bool beats a 32-bit bool. For example, in the nginx source I noticed that most ngnx_flag_t's occur in structs with other (u)intptr_t typedef'ed types. A 32-bit bool might not save space here due to alignment padding.
I do find the choice for intptr_t a bit odd as it is an optional C99 type with the intent of converting to/from void *. But as far as I can see it is never used as such. Perhaps this type gives the best approximation for 'native' word sized type?
64bit bool sounds like a terrible idea for x86-64. I'd guess that the person who wrote it was thinking about 32bit machines with 32bit pointers at the time.
Modern x86 has very good support for unaligned loads/stores, and for unpacking a byte to fill a register on the fly. If x86 is the primary target, 8bit boolean should be preferred, esp in cases where it saves bytes leading to less cache usage. In the rare case where cache is not an issue at all, 32bit is the natural size, and will maybe save an instruction in some cases where a boolean is added or multiplied directly with an int, by allowing the boolean to be used as a memory operand, instead of being loaded with a movzx.
For the usual case of test&branch on a boolean, Intel and AMD CPUs have literally zero difference in performance between 8bit and 32bit operands, whether it's in memory or a register.

Determine word size of my processor

How do I determine the word size of my CPU? If I understand correct an int should be one word right? I'm not sure if I am correct.
So should just printing sizeof(int) would be enough to determine the word size of my processor?
Your assumption about sizeof(int) is untrue; see this.
Since you must know the processor, OS and compiler at compilation time, the word size can be inferred using predefined architecture/OS/compiler macros provided by the compiler.
However while on simpler and most RISC processors, word size, bus width, register size and memory organisation are often consistently one value, this may not be true to more complex CISC and DSP architectures with various sizes for floating point registers, accumulators, bus width, cache width, general purpose registers etc.
Of course it begs the question why you might need to know this? Generally you would use the type appropriate to the application, and trust the compiler to provide any optimisation. If optimisation is what you think you need this information for, then you would probably be better off using the C99 'fast' types. If you need to optimise a specific algorithm, implement it for a number of types and profile it.
an int should be one word right?
As I understand it, that depends on the data size model. For an explanation for UNIX Systems, 64-bit and Data Size Neutrality. For example Linux 32-bit is ILP32, and Linux 64-bit is LP64. I am not sure about the difference across Window systems and versions, other than I believe all 32-bit Window systems are ILP32.
How do I determine the word size of my CPU?
That depends. Which version of C standard are you assuming. What platforms are we talking. Is this a compile or run time determination you're trying to make.
The C header file <limits.h> may defines WORD_BIT and/or __WORDSIZE.
sizeof(int) is not always the "word" size of your CPU. The most important question here is why you want to know the word size.... are you trying to do some kind of run-time and CPU specific optimization?
That being said, on Windows with Intel processors, the nominal word size will be either 32 or 64 bits and you can easily figure this out:
if your program is compiled for 32-bits, then the nominal word size is 32-bits
if you have compiled a 64-bit program then then the nominal word size is 64-bits.
This answer sounds trite, but its true to the first order. But there are some important subtleties. Even though the x86 registers on a modern Intel or AMD processor are 64-bits wide; you can only (easily) use their 32-bit widths in 32-bit programs - even though you may be running a 64-bit operating system. This will be true on Linux and OSX as well.
Moreover, on most modern CPU's the data bus width is wider than the standard ALU registers (EAX, EBX, ECX, etc). This bus width can vary, some systems have 128 bit, or even 192 bit wide busses.
If you are concerned about performance, then you also need to understand how the L1 and L2 data caches work. Note that some modern CPU's have an L3 cache. Caches including a unit called the Write Buffer
Make a program that does some kind of integer operation many times, like an integer version of the SAXPY algorithm. Run it for different word sizes, from 8 to 64 bits (i.e. from char to long long).
Measure the time each version spends while running the algorithm. If there is one specific version that lasts noticeably less than the others, the word size used for that version is probably the native word size of your computer. On the other way, if there are several versions that last more or less the same time, pick up the one which has the greater word size.
Note that even with this technique you can get false data: your benchmark, compiled using Turbo C and running on a 80386 processor through DOS will report that the word size is 16 bits, just because the compiler doesn't use the 32-bit registers to perform integer aritmetic, but calls to internal functions that do the 32-bit version of each aritmetic operation.
"Additionally, the size of the C type long is equal to the word size, whereas the size of the int type is sometimes less than that of the word size. For example, the Alpha has a 64-bit word size. Consequently, registers, pointers, and the long type are 64 bits in length."
source: http://books.msspace.net/mirrorbooks/kerneldevelopment/0672327201/ch19lev1sec2.html
Keeping this in mind, the following program can be executed to find out the word size of the machine you're working on-
#include <stdio.h>
int main ()
{
long l;
short s = (8 * sizeof(l));
printf("Word size of this machine is %hi bits\n", s);
return 0;
}
In short: There's no good way. The original idea behind the C data types was that int would be the fastest (native) integer type, long the biggest etc.
Then came operating systems that originated on one CPU and were then ported to different CPUs whose native word size was different. To maintain source code compatibility, some of the OSes broke with that definition and kept the data types at their old sizes, and added new, non-standard ones.
That said, depending on what you actually need, you might find some useful data types in stdint.h, or compiler-specific or platform-specific macros for various purposes.
To use at compile time: sizeof(void*)
What every may be the reason for knowing the size of the processor it don't matter.
The size of the processor is the amount of date that Arthematic Logic Unit(ALU) of One CPU Core can work on at a single point of time. A CPU Cores's ALU will on Accumulator Register at any time. So, The size of a CPU in bits is the the size of Accumulator Register in bits.
You can find the size of the accumulator from the data sheet of the processor or by writing a small assembly language program.
Note that the effective usable size of Accumulator Register can change in some processors (like ARM) based on mode of operations (Thumb and ARM modes). That means the size of the processor will also change based on the mode for that processors.
It common in many architectures to have virtual address pointer size and integer size same as accumulator size. It is only to take advantage of Accumulator Register in different processor operations but it is not a hard rule.
Many thinks of memory as an array of bytes. But CPU has another view of it. Which is about memory granularity. Depending on architecture, there would be 2, 4, 8, 16 or even 32 bytes memory granularity. Memory granularity and address alignment have great impact on performance, stability and correctness of software. Consider a granularity of 4 bytes and an unaligned memory access to read in 4 bytes. In this case every read, 75% if address is increasing by one byte, takes two more read instructions plus two shift operations and finally a bitwise instruction for final result which is performance killer. Further atomic operations could be affected as they must be indivisible. Other side effects would be caches, synchronization protocols, cpu internal bus traffic, cpu write buffer and you guess what else. A practical test could be run on a circular buffer to see how the results could be different. CPUs from different manufacturers, based on model, have different registers which will be used in general and specific operations. For example modern CPUs have extensions with 128 bits registers. So, the word size is not only about type of operation but memory granularity. Word size and address alignment are beasts which must be taken care about. There are some CPUs in market which does not take care of address alignment and simply ignore it if provided. And guess what happens?
As others have pointed out, how are you interested in calculating this value? There are a lot of variables.
sizeof(int) != sizeof(word). the size of byte, word, double word, etc have never changed since their creation for the sake of API compatibility in the windows api world at least. Even though a processor word size is the natural size an instruction can operate on. For example, in msvc/cpp/c#, sizeof(int) is four bytes. Even in 64bit compilation mode. Msvc/cpp has __int64 and c# has Int64/UInt64(non CLS compliant) ValueType's. There are also type definitions for WORD DWORD and QWORD in the win32 API that have never changed from two bytes, four bytes, and eight bytes respectively. As well as UINT/INT_PTR on Win32 and UIntPtr/IntPtr on c# that are guranteed to be big enough to represent a memory address and a reference type respectively. AFAIK, and I could be wrong if arch's still exist, I don't think anyone has to deal with, nor do, near/far pointers exist anymore, so if you're on c/cpp/c#, sizeof(void*) and Unsafe.SizeOf{IntPtr}() would be enough to determine your maximum "word" size I would think in a compliant cross-platform way, and if anyone can correct that, please do so! Also, sizes of intrinsic types in c/cpp are vague in size definition.
C data type sizes - Wikipedia

Word and Double Word integers in C

I am trying to implement a simple, moderately efficient bignum library in C. I would like to store digits using the full register size of the system it's compiled on (presumably 32 or 64-bit ints). My understanding is that I can accomplish this using intptr_t. Is this correct? Is there a more semantically appropriate type, i.e. something like intword_t?
I also know that with GCC I can easily do overflow detection on a 32-bit machine by upcasting both arguments to 64-bit ints, which will occupy two registers and take advantage of instructions like IA31 ADC (add with carry). Can I do something similar on a 64-bit machine? Is there a 128-bit type I can upcast to which will compile to use these instructions if they're available? Better yet, is there a standard type that represents twice the register size (like intdoubleptr_t) so this could be done in a machine independent fashion?
Thanks!
Any reason not to use size_t? size_t is 4 bytes on a 32-bit system and 8 bytes on a 64-bit system, and is probably more portable than using WORD_SIZE (I think WORD_SIZE is gcc-specific, no?)
I am not aware of any 128-bit value on 64-bit systems, could be wrong here but haven't come across that type in the kernel or regular user apps.
I'd strongly recommend using the C99 <stdint.h> header. It declares int32_t, int64_t, uint32_t, and uint64_t, which look like what you really want to use.
EDIT: As Alok points out, int_fast32_t, int_fast64_t, etc. are probably what you want to use. The number of bits you specify should be the minimum you need for the math to work, i.e. for the calculation to not "roll over".
The optimization comes from the fact that the CPU doesn't have to waste cycles realigning data, padding the leading bits on a read, and doing a read-modify-write on a write. Truth is, a lot of processors (such as recent x86s) have hardware in the CPU that optimizes these access pretty well (at least the padding and read-modify-write parts), since they're so common and usually only involve transfers between the processor and cache.
So the only thing left for you to do is make sure the accesses are aligned: take sizeof(int_fast32_t) or whatever and use it to make sure your buffer pointers are aligned to that.
Truth is, this may not amount to that much improvement (due to the hardware optimizing transfers at runtime anyway), so writing something and timing it may be the only way to be sure. Also, if you're really crazy about performance, you may need to look at SSE or AltiVec or whatever vectorization tech your processor has, since that will outperform anything you can write that is portable when doing vectored math.

Resources