Is there a standard way to detect bit width of hardware? - c

Variables of type int are allegedly "one machine-type word in length",
but in embedded systems, C compilers for 8-bit micros tend to have a 16-bit int (and an 8-bit unsigned char). For larger parts int behaves as expected:
on 16-bit micros int is 16 bits, on 32-bit micros int is 32 bits, and so on.
So, is there a standard way to test it, something like BITSIZEOF( int )?
Like "sizeof", but for bits instead of bytes.
This was my first idea:
register c=1;
int bitwidth=0;

do
{
    bitwidth++;
} while (c <<= 1);

printf("Register bit width is : %d", bitwidth);
But c is taken as an int, and since it's common for 8-bit compilers to use a 16-bit int, it gives me 16 as the result. It seems there is no standard that says "int" must match the register width (or, if there is, it isn't respected).
Why do I want to detect it? Suppose I need many variables that hold fewer than 256 values, so they could be 8, 16, or 32 bits. Using the right size (matching the memory and registers) will speed things up and save memory, and if this can't be decided in code, I have to re-write the function for every architecture.
EDIT
After reading the answers I found this good article:
http://embeddedgurus.com/stack-overflow/category/efficient-cc/page/4/
I will quote the conclusion:
Thus the bottom line is this. If you want to start writing efficient, portable embedded code, the first step you should take is start using the C99 data types 'least' and 'fast'. If your compiler isn't C99 compliant then complain until it is – or change vendors. If you make this change I think you'll be pleasantly surprised at the improvements in code size and speed that you'll achieve.

I have to re-write the function for every architecture
No you don't. Use C99's stdint.h, which has types like uint_fast8_t, a type of at least 8 bits (so it can hold your 256 values) that the platform handles quickly.
Then, no matter the platform, the types will change accordingly and you don't change anything in your code. If your platform has no set of these defined, you can add your own.
Far better than rewriting every function.
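For instance, here is a sketch of how the question's "variables that need less than 256 values" might look with the 'least' and 'fast' typedefs (the function and its names are made up for illustration):

#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only. uint_least8_t keeps the stored array small;
   uint_fast8_t lets the working values live in whatever width the target
   handles fastest, so the same source is "right-sized" on 8-, 16- and
   32-bit parts. */
size_t count_small(const uint_least8_t *samples, size_t n, uint_fast8_t limit)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i)
        if (samples[i] < limit)
            ++hits;
    return hits;
}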

To answer your deeper question more directly, if you have a need for very specific storage sizes that are portable across platforms, you should use something like stdint.h, which defines storage types specified by their number of bits.
For example, uint32_t is always unsigned 32 bits and int8_t is always signed 8 bits.

#include <limits.h>

/* number of bits in the object representation of an int */
const int bitwidth = sizeof(int) * CHAR_BIT;

The ISA you're compiling for is already known to the compiler when it runs over your code, so your best bet is to detect it at compile time. Depending on your environment, you could use everything from autoconf/automake style stuff to lower level #ifdef's to tune your code to the specific architecture it'll run on.
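For example, a minimal compile-time sketch using only standard limits.h macros (the assumption that you only care about "at least 32 bits" is mine, not the answer's):

#include <limits.h>

/* If int already has at least 32 value bits, use it; otherwise fall back
   to long, which the standard guarantees to be at least 32 bits wide. */
#if INT_MAX >= 2147483647L
typedef int counter_t;
#else
typedef long counter_t;
#endif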

I don't exactly understand what you mean by "there is no standard that says int must match the register width". In the original C language specification (C89/90) the type int is implied in certain contexts when no explicit type is supplied. Your register c is equivalent to register int c and that is perfectly standard in C89/90. Note also that the C language specification requires type int to support at least the -32767...+32767 range, meaning that on any platform int will have at least 16 value-forming bits.
As for the bit width... sizeof(int) * CHAR_BIT will give you the number of bits in the object representation of type int.
Theoretically though, the value representation of type int is not guaranteed to use all bits of its object representation. If you need to determine the number of bits used for value representation, you can simply analyze the INT_MIN and INT_MAX values.
P.S. Looking at the title of your question, I suspect that what you really need is just the CHAR_BIT value.
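A short sketch of that INT_MAX analysis (standard C only):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int value_bits = 1;                     /* the sign bit */
    unsigned int m;

    for (m = INT_MAX; m != 0; m >>= 1)
        ++value_bits;                       /* magnitude bits present in INT_MAX */

    printf("int: %d value bits out of %zu storage bits\n",
           value_bits, sizeof(int) * CHAR_BIT);
    return 0;
}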

Does an unsigned char or unsigned short suit your needs? Why not use that? If not, you should be using compile time flags to bring in the appropriate code.

I think that in this case you don't need to know how many bits your architecture has. Just use variables as small as possible if you want to optimize your code.

Related

Is there a general way in C to make a number greater than 64 bits or smaller than 8 bits?

First of all, I am aware that what I am trying to do might be outside the C standard.
I'd like to know if it is possible to make a uint4_t/int4_t or uint128_t/int128_t type in C.
I know I could do this using bitshifts and complex functions, but can I do it without those?
You can use bitfields within a structure to get fields narrower than a uint8_t, but, the base datatype they're stored in will not be any smaller.
struct SmallInt
{
    unsigned int a : 4;
};

will give you a structure with a member called a that is 4 bits wide.
Individual storage units (bytes) are no less than CHAR_BIT bits wide¹; even if you create a struct with a single 4-bit bitfield, the associated object will always take up a full storage unit.
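A quick sketch showing that claim:

#include <stdio.h>

struct SmallInt
{
    unsigned int a : 4;        /* a 4-bit field... */
};

int main(void)
{
    /* ...but the object still occupies whole bytes, often sizeof(unsigned int). */
    printf("sizeof(struct SmallInt) = %zu\n", sizeof(struct SmallInt));
    return 0;
}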
There are multiple precision libraries such as GMP that allow you to work with values that can't fit into 32 or 64 bits. Might want to check them out.
¹ 8 bits minimum, but may be wider.
In practice, if you want very wide numbers (but that is not specified in standard C11) you probably want to use some arbitrary-precision arithmetic external library (a.k.a. bignums). I recommend using GMPlib.
In some cases, for tiny ranges of numbers, you might use bitfields inside struct to have tiny integers. Practically speaking, they can be costly (the compiler would emit shift and bitmask instructions to deal with them).
See also this answer mentioning __int128_t as an extension in some compilers.

Would making plain int 64-bit break a lot of reasonable code?

Until recently, I'd considered the decision by most systems implementors/vendors to keep plain int 32-bit even on 64-bit machines a sort of expedient wart. With modern C99 fixed-size types (int32_t and uint32_t, etc.) the need for there to be a standard integer type of each size 8, 16, 32, and 64 mostly disappears, and it seems like int could just as well be made 64-bit.
However, the biggest real consequence of the size of plain int in C comes from the fact that C essentially does not have arithmetic on smaller-than-int types. In particular, if int is larger than 32-bit, the result of any arithmetic on uint32_t values has type signed int, which is rather unsettling.
Is this a good reason to keep int permanently fixed at 32-bit on real-world implementations? I'm leaning towards saying yes. It seems to me like there could be a huge class of uses of uint32_t which break when int is larger than 32 bits. Even applying the unary minus or bitwise complement operator becomes dangerous unless you cast back to uint32_t.
Of course the same issues apply to uint16_t and uint8_t on current implementations, but everyone seems to be aware of and used to treating them as "smaller-than-int" types.
As you say, I think that the promotion rules really are the killer. uint32_t would then promote to int and all of a sudden you'd have signed arithmetic where almost everybody expects unsigned.
This would be mostly hidden in places where you do just arithmetic and assign back to a uint32_t. But it could be deadly in places where you do comparisons against constants. Whether code that relies on such comparisons without doing an explicit cast is reasonable, I don't know. Casting constants like (uint32_t)1 can become quite tedious. I personally at least always use the suffix U for constants that I want to be unsigned, but this already is not as readable as I would like.
Also bear in mind that uint32_t etc. are not guaranteed to exist. Not even uint8_t. The enforcement of that is an extension from POSIX. So in that sense C as a language is far from being able to make that move.
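A small sketch of that promotion trap; on today's 32-bit-int platforms the first branch runs, whereas with a hypothetical 64-bit int the subtraction would happen in signed arithmetic and the second branch would run:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t a = 0, b = 1;

    if (a - b > 0)
        puts("unsigned wrap-around: a - b is a huge positive value");
    else
        puts("signed promotion: a - b is -1 (int is wider than 32 bits)");
    return 0;
}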
"Reasonable Code"...
Well... the thing about development is, you write and fix it and then it works... and then you stop!
And maybe you've been burned a lot so you stay well within the safe ranges of certain features, and maybe you haven't been burned in that particular way so you don't realize that you're relying on something that could kind-of change.
Or even that you're relying on a bug.
On olden Mac 68000 compilers, int was 16 bit and long was 32. But even then most extant C code assumed an int was 32, so typical code you found on a newsgroup wouldn't work. (Oh, and Mac didn't have printf, but I digress.)
So, what I'm getting at is, yes, if you change anything, then some things will break.
With modern C99 fixed-size types (int32_t and uint32_t, etc.) the need for there to be a standard integer type of each size 8, 16, 32, and 64 mostly disappears,
C99 has fixed-sized typeDEFs, not fixed-size types. The native C integer types are still char, short, int, long, and long long. They are still relevant.
The problem with ILP64 is that it has a great mismatch between C types and C99 typedefs.
int8_t = char
int16_t = short
int32_t = nonstandard type
int64_t = int, long, or long long
From 64-Bit Programming Models: Why LP64?:
Unfortunately, the ILP64 model does not provide a natural way to describe 32-bit data types, and must resort to non-portable constructs such as __int32 to describe such types. This is likely to cause practical problems in producing code which can run on both 32 and 64 bit platforms without #ifdef constructions. It has been possible to port large quantities of code to LP64 models without the need to make such changes, while maintaining the investment made in data sets, even in cases where the typing information was not made externally visible by the application.
DEC Alpha and OSF/1 Unix was one of the first 64-bit versions of Unix, and it used 64-bit integers - an ILP64 architecture (meaning int, long and pointers were all 64-bit quantities). It caused lots of problems.
One issue I've not seen mentioned - which is why I'm answering at all after so long - is that if you have a 64-bit int, what size do you use for short? Both 16 bits (the classical, change nothing approach) and 32 bits (the radical 'well, a short should be half as long as an int' approach) will present some problems.
With the C99 <stdint.h> and <inttypes.h> headers, you can code to fixed size integers - if you choose to ignore machines with 36-bit or 60-bit integers (which is at least quasi-legitimate). However, most code is not written using those types, and there are typically deep-seated and largely hidden (but fundamentally flawed) assumptions in the code that will be upset if the model departs from the existing variations.
Notice Microsoft's ultra-conservative LLP64 model for 64-bit Windows. That was chosen because too much old code would break if the 32-bit model was changed. However, code that had been ported to ILP64 or LP64 architectures was not immediately portable to LLP64 because of the differences. Conspiracy theorists would probably say it was deliberately chosen to make it more difficult for code written for 64-bit Unix to be ported to 64-bit Windows. In practice, I doubt whether that was more than a happy (for Microsoft) side-effect; the 32-bit Windows code had to be revised a lot to make use of the LP64 model too.
There's one code idiom that would break if ints were 64-bits, and I see it often enough that I think it could be called reasonable:
checking if a value is negative by testing if ((val & 0x80000000) != 0)
This is commonly found in checking error codes. Many error code standards (like Windows' HRESULT) use bit 31 to represent an error. And code will sometimes check for that error either by testing bit 31 or sometimes by checking if the error is a negative number.
Microsoft's macros for testing HRESULT use both methods - and I'm sure there's a ton of code out there that does similar without using the SDK macros. If MS had moved to ILP64, this would be one area that caused porting headaches that are completely avoided with the LLP64 model (or the LP64 model).
Note: if you're not familiar with terms like "ILP64", please see the mini-glossary at the end of the answer.
I'm pretty sure there's a lot of code (not necessarily Windows-oriented) out there that uses plain-old-int to hold error codes, assuming that those ints are 32-bits in size. And I bet there's a lot of code with that error status scheme that also uses both kinds of checks (< 0 and bit 31 being set) and which would break if moved to an ILP64 platform. These checks could be made to continue to work correctly either way if the error codes were carefully constructed so that sign-extension took place, but again, many such systems I've seen construct the error values by or-ing together a bunch of bitfields.
Anyway, I don't think this is an unsolvable problem by any means, but I do think it's a fairly common coding practice that would cause a lot of code to require fixing up if moved to an ILP64 platform.
Note that I also don't think this was one of the foremost reasons for Microsoft to choose the LLP64 model (I think that decision was largely driven by binary data compatibility between 32-bit and 64-bit processes, as mentioned in MSDN and on Raymond Chen's blog).
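A sketch of the two checks diverging: int64_t stands in here for what an ILP64 int would be, and the error value is made up:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int64_t err = 0x80004005;   /* bit 31 set, but positive as a 64-bit value */

    printf("bit-31 test says:   %s\n", (err & 0x80000000) ? "error" : "success");
    printf("negative test says: %s\n", (err < 0) ? "error" : "success");
    /* With a 32-bit int both tests agree; with a 64-bit int only the first fires. */
    return 0;
}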
Mini-Glossary for the 64-bit Platform Programming Model terminology:
ILP64: int, long, pointers are 64-bits
LP64: long and pointers are 64-bits, int is 32-bits (used by many (most?) Unix platforms)
LLP64: long long and pointers are 64-bits, int and long remain 32-bits (used on Win64)
For more information on 64-bit programming models, see "64-bit Programming Models: Why LP64?"
While I don't personally write code like this, I'll bet that it's out there in more than one place... and of course it'll break if you change the size of int.
int i, x = getInput();

for (i = 0; i < 32; i++)
{
    if (x & (1 << i))
    {
        // Do something
    }
}
Well, it's not like this story is all new. With "most computers" I assume you mean desktop computers. There already has been a transition from 16-bit to 32-bit int. Is there anything at all that says the same progression won't happen this time?
Not particularly. int is 64 bit on some 64 bit architectures (not x64).
The standard does not actually guarantee you get 32 bit integers, just that (u)int32_t can hold one.
Now if you are depending on int being the same size as ptrdiff_t, you may be broken.
Remember, C does not guarantee that the machine even is a binary machine.

Fastest integer type for common architectures

The stdint.h header lacks an int_fastest_t and uint_fastest_t to correspond with the {,u}int_fastX_t types. For instances where the width of the integer type does not matter, how does one pick the integer type that allows processing the greatest quantity of bits with the least penalty to performance? For example, if one was searching for the first set bit in a buffer using a naive approach, a loop such as this might be considered:
// return the bit offset of the first 1 bit
size_t find_first_bit_set(void const *const buf)
{
    uint_fastest_t const *p = buf; // use the fastest type for comparison to zero
    for (; *p == 0; ++p)           // inc p while no bits are set
        ;
    // return offset of first bit set
    return ((char const *)p - (char const *)buf) * CHAR_BIT + ffsX(*p) - 1;
}
Naturally, using char would result in more operations than int. But long long might result in more expensive operations than the overhead of using int on a 32 bit system and so on.
My current assumption is for the mainstream architectures, the use of long is the safest bet: It's 32 bit on 32 bit systems, and 64 bit on 64 bit systems.
int_fast8_t is always the fastest integer type in a correct implementation. There can never be integer types smaller than 8 bits (because CHAR_BIT>=8 is required), and since int_fast8_t is the fastest integer type with at least 8 bits, it's thus the fastest integer type, period.
Theoretically, int is the best bet. It should map to the CPU's native register size, and thus be "optimal" in the sense you're asking about.
However, you may still find that an int-64 or int-128 is faster on some CPUs than an int-32, because although these are larger than the register size, they will reduce the number of iterations of your loop, and thus may work out more efficient by minimising the loop overheads and/or taking advantage of DMA to load/store the data faster.
(For example, on ARM-2 processors it took 4 memory cycles to load one 32-bit register, but only 5 cycles to load two sequentially, and 7 cycles to load 4 sequentially. The routine you suggest above would be optimised to use as many registers as you could free up (8 to 10 usually), and could therefore run up to 3 or 4 times faster by using multiple registers per loop iteration)
The only way to be sure is to write several routines and then profile them on the specific target machine to find out which produces the best performance.
I'm not sure I really understand the question, but why aren't you just using int? Quoting from my free draft copy of the (wrong, i.e. C++) standard, "Plain ints have the natural size suggested by the architecture of the execution environment."
But I think that if you want to have the optimal integer type for a certain operation, it will be different depending on which operation it is. Trying to find the first bit in a large data buffer, or finding a number in a sequence of integers, or moving them around, could very well have completely different optimal types.
EDIT:
For whatever it's worth, I did a small benchmark. On my particular system (Intel i7 920 with Linux, gcc -O3) it turns out that long ints (64 bits) are quite a bit faster than plain ints (32 bits), on this particular example. I would have guessed the opposite.
If you want to be certain you've got the fastest implementation, why not benchmark each one on the systems you're expecting to run on instead of trying to guess?
The answer is int itself. At least in C++, where 3.9.1/2 of the standard says:
Plain ints have the natural size suggested by the architecture of the execution environment
I expect the same is true for C, though I don't have any of the standards documents.
I would guess that the types size_t (for an unsigned type) and ptrdiff_t (for a signed type) will usually correspond to quite efficient integer types on any given platform.
But nothing can prove that other than inspecting the produced assembler and doing benchmarks.
Edit, including the different comments, here and in other replies:
size_t and ptrdiff_t are the only typedefs that are normative in C99 and for which one may make a reasonable assumption that they are related to the architecture.
There are 5 different possible ranks for standard integer types (char, short, int, long, long long). All the forces push towards having types of width 8, 16, 32, 64 and, in the near future, 128. As a consequence int will be stuck at 32 bits. Its definition will have nothing to do with efficiency on the platform, but will just be constrained by that width requirement.
If you're compiling with gcc, I'd recommend using __builtin_ffs() for finding the first bit set:
Built-in Function: int __builtin_ffs (unsigned int x)
Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero.
This will be compiled into (often a single) native assembly instruction.
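For example, a GCC-specific sketch of the question's loop rewritten around that builtin, using the long variant __builtin_ffsl so the scan word matches the register width (it assumes at least one bit in the buffer is set):

#include <limits.h>
#include <stddef.h>

size_t find_first_bit_set(void const *buf)
{
    unsigned long const *p = buf;

    while (*p == 0)                      /* skip words that are entirely zero */
        ++p;

    /* __builtin_ffsl returns 1 + index of the least significant set bit */
    return (size_t)((char const *)p - (char const *)buf) * CHAR_BIT
           + (size_t)(__builtin_ffsl((long)*p) - 1);
}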
It is not possible to answer this question since the question is incomplete. As an analogy, consider the question:
What is the fastest vehicle
A Bugatti Veyron? Certainly fast, but no good for going from London to New York.
What is missing from the question, is the context the integer will be used in. In the original example above, I doubt you'd see much difference between 8, 32 or 64 bit values if the array is large and sparse since you'll be hitting memory bandwidth limits before cpu limits.
The main point is, the architecture does not define what size the various integer types are, it's the compiler designer that does that. The designer will carefully weigh up the pros and cons for various sizes for each type for a given architecture and pick the most appropriate.
I guess the 32 bit int on the 64 bit system was chosen because for most operations where ints are used, 32 bits are enough. Since memory bandwidth is a limiting factor, saving on memory use was probably the overriding factor.
For all existing mainstream architectures long is the fastest type at present for loop throughput.

Different int sizes on my computer and Arduino

I'm working on a spare-time project, writing some server code for an Arduino Duemilanove, but before I test this code on the controller I am testing it on my own machine (an OS X based MacBook). I am using ints in some places, and I am worried that this will bring up strange errors when the code is compiled and run on the Arduino Duemilanove, because the Arduino handles ints as 2 bytes and my MacBook handles ints as 4 bytes. I'm not a hardcore C and C++ programmer, so I am a bit worried about how an experienced programmer would handle this situation.
Should I restrict the code with a typedef that wraps my own definition of an int restricted to 2 bytes? Or is there another way around it?
Your best bet is to use the stdint.h header. It defines typedefs that explicitly refer to the signedness and size of your variables. For example, a 16-bit unsigned integer is a uint16_t. It's part of the C99 standard, so it's available pretty much everywhere. See:
http://en.wikipedia.org/wiki/Stdint.h
The C standard defines an int as being a signed type large enough to at least hold all integers between -32768 and 32767 - implementations are free to choose larger types, and any modern 32-bit system will choose a 32-bit integer. However, as you've seen, some embedded platforms still use 16-bit ints. I would recommend using uint16_t or uint32_t if your arduino compiler supports it; if not, use preprocessor macros to typedef those types yourself.
The correct way to handle the situation is to choose the type based on the values it will need to represent:
If it's a general small integer, and the range -32767 to 32767 is OK, use int;
Otherwise, if the range -2147483647 to 2147483647 is OK, use long;
Otherwise, use long long.
If the range -32767 to 32767 is OK and space efficiency is important, use short (or signed char, if the range -127 to 127 is OK).
As long as you have made no other assumptions than these (i.e. always using sizeof instead of assuming the width of the type), your code will be portable.
In general, you should only need to use the fixed-width types from stdint.h for values that are being exchanged through a binary interface with another system - ie. being read from or written to the network or a file.
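For example, a minimal sketch of that binary-interface case (the function name and the little-endian choice are mine):

#include <stdint.h>
#include <stdio.h>

/* Write a 16-bit value in a fixed (little-endian) byte order so that an
   Arduino with a 16-bit int and a desktop with a 32-bit int produce and
   read exactly the same bytes. */
static void write_u16_le(FILE *f, uint16_t v)
{
    unsigned char bytes[2];
    bytes[0] = (unsigned char)(v & 0xFFu);
    bytes[1] = (unsigned char)(v >> 8);
    fwrite(bytes, 1, sizeof bytes, f);
}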
Will you need values smaller than −32,768 or bigger than +32,767? If not, ignore the different sizes. If you do need them, there's stdint.h with fixed-size integers, signed and unsigned, called intN_t/uintN_t (N = number of bits). It's C99, but most compilers will support it. Note that using integers with a size bigger than the CPU's word size (16 bit in this case) will hurt performance, as there are no native instructions for handling them.
Avoid using the type int, as its size can depend upon architecture/compiler.
Use short and long instead.

Smart typedefs

I've always used typedef in embedded programming to avoid common mistakes:
int8_t - 8 bit signed integer
int16_t - 16 bit signed integer
int32_t - 32 bit signed integer
uint8_t - 8 bit unsigned integer
uint16_t - 16 bit unsigned integer
uint32_t - 32 bit unsigned integer
The recent Embedded Muse (issue 177, not on the website yet) introduced me to the idea that it's useful to have some performance-specific typedefs. This standard suggests having typedefs that indicate you want the fastest type that has a minimum size.
For instance, one might declare a variable using int_fast16_t, but it would actually be implemented as an int32_t on a 32 bit processor, or int64_t on a 64 bit processor, as those would be the fastest types of at least 16 bits on those platforms. On an 8 bit processor it would be int16_t, to meet the minimum size requirement.
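For example, a quick sketch to see what the 'fast' typedefs map to on a given compiler (nothing here is from the Embedded Muse article):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    printf("int_fast16_t: %zu bytes, int16_t: %zu bytes\n",
           sizeof(int_fast16_t), sizeof(int16_t));
    return 0;
}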
Having never seen this usage before I wanted to know
Have you seen this in any projects, embedded or otherwise?
Any possible reasons to avoid this sort of optimization in typedefs?
For instance, one might declare a variable using int_fast16_t, but it would actually be implemented as an int32_t on a 32 bit processor, or int64_t on a 64 bit processor, as those would be the fastest types of at least 16 bits on those platforms
That's what int is for, isn't it? Are you likely to encounter an 8-bit CPU any time soon, where that wouldn't suffice?
How many unique datatypes are you able to remember?
Does it provide so much additional benefit that it's worth effectively doubling the number of types to consider whenever I create a simple integer variable?
I'm having a hard time even imagining the possibility that it might be used consistently.
Someone is going to write a function which returns an int_fast16_t, and then someone else is going to come along and store that value into an int16_t.
Which means that in the obscure case where the fast variants are actually beneficial, it may change the behavior of your code. It may even cause compiler errors or warnings.
Check out stdint.h from C99.
The main reason I would avoid this typedef is that it allows the type to lie to the user. Take int16_t vs int_fast16_t. Both type names encode the size of the value into the name. This is not an uncommon practice in C/C++. I personally use the size specific typedefs to avoid confusion for myself and other people reading my code. Much of our code has to run on both 32 and 64 bit platforms and many people don't know the various sizing rules between the platforms. Types like int32_t eliminate the ambiguity.
If I had not read the 4th paragraph of your question and instead just saw the type name, I would have assumed it was some scenario specific way of having a fast 16 bit value. And I obviously would have been wrong :(. For me it would violate the "don't surprise people" rule of programming.
Perhaps if it had another distinguishing verb, letter, acronym in the name it would be less likely to confuse users. Maybe int_fast16min_t ?
When I am looking at int_fast16_t, and I am not sure about the native width of the CPU it will run on, it may make things complicated, for example with the ~ operator.
int_fast16_t i = 10;
int16_t j = 10;

if (~i != ~j) {
    // scary !!!
}
Somehow, I would like to willfully use 32 bit or 64 bit based on the native width of the processor.
I'm actually not much of a fan of this sort of thing.
I've seen this done many times (in fact, we even have these typedefs at my current place of employment)... For the most part, I doubt their true usefulness... It strikes me as change for changes sake... (and yes, I know the sizes of some of the built ins can vary)...
I commonly use size_t; it happens to be the fastest address-sized type, a habit I picked up from embedded work. It never caused any issues or confusion in embedded circles, but it actually began causing me problems when I began working on 64-bit systems.
