Standard byte sizes for variables in C? - c

So, I was writing an implementation of ye olde SHA1 algorithm in C (I know it's insecure, it's for the Matasano problems), and for some of the variables it's pretty crucial that they're exactly 32 bits long. Having read that unsigned long int is 32 bits by standard, I just used that, and then spent 4 hours trying to find out why the hell my hashes were coming out all wrong, until I thought to check what sizeof(unsigned long int) came out to be. Spoiler: it was 8 bytes, i.e. 64 bits.
Now of course I'm using uint32_t, and always will in the future, but could someone (preferably, someone who has more discretion and less gullibility than I do) please point me to where the actual standards for variable sizes are written down for modern C? Or tell me why this question is misguided, if it is?

The minimum sizes are:
char - 8-bit
short - 16-bit
int - 16-bit
long - 32-bit
long long - 64-bit
There are no maximum sizes; the compiler writer chooses whatever they think will work best for the target platform. The corresponding unsigned types have the same minimum sizes.
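If you want to see what your particular compiler actually chose, a quick sanity check is to print the sizes yourself. A minimal sketch; the numbers it prints are whatever your implementation picked, and only the minimums above are guaranteed:

#include <stdio.h>

int main(void)
{
    /* Only the minimum sizes are guaranteed; the actual values are implementation-defined. */
    printf("char:      %zu byte(s)\n", sizeof(char));   /* always exactly 1 */
    printf("short:     %zu byte(s)\n", sizeof(short));
    printf("int:       %zu byte(s)\n", sizeof(int));
    printf("long:      %zu byte(s)\n", sizeof(long));
    printf("long long: %zu byte(s)\n", sizeof(long long));
    return 0;
}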

Related

Is it possible to create a custom sized variable type in c?

Good evening, sorry in advance if my English is bad, I'm French.
So, in C, there are different variable types, for example int, long, ... that take a number of bytes depending on the type, and if I'm not wrong the "largest" type is long long int (or just long long), which takes 8 bytes of memory (like long, which is weird, so if someone could explain that to me too, thanks).
So my first question is: can I create my own custom variable type that takes, for example, 16 bytes, or am I forced to use strings if the number is too high for long long (or unsigned long long)?
You can create custom types of all sorts, and if you want an "integer" type that is 16 bytes wide you could create a custom struct and pair two long longs together. But then you'd have to implement all the arithmetic on those types manually. This was quite common in the past, when 16-bit (and even 32-bit) machines were most common: you'd have "bigint" libraries to do, say, 64-bit integer math. That's less useful now that most machines are 64-bit, or at least have native long long support on 32-bit targets.
You used to see libraries with stuff like this quite often:
typedef struct BigInt {
    unsigned long long high;   /* upper 64 bits */
    unsigned long long low;    /* lower 64 bits */
} BigInt;
// Arithmetic functions:
BigInt BigIntAdd(BigInt a, BigInt b);
// etc.
These have faded away somewhat because the current typical CPU register width is 64 bits, which allows for an enormous range of values, and unless you're working with very specialized data, it's no longer "common" in normal programming tasks to need values outside that range. As #datenwolf is explicit and correct about in the comments below, if you find the need for such functionality in production code, seek out a reliable and debugged library for it. (Writing your own could be a fun exercise, though this sort of thing is likely to be a bug farm if you try to just whip it up as a quick step along the way to other work.) As Eric P indicates in the comments above, clang offers a native way of doing this without a third-party library.
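For illustration only, here is a minimal sketch of what the BigIntAdd declared above might look like. It's just one way to propagate the carry using unsigned wraparound, not any particular library's implementation:

typedef struct BigInt {        /* same layout as the struct above */
    unsigned long long high;
    unsigned long long low;
} BigInt;

BigInt BigIntAdd(BigInt a, BigInt b)
{
    BigInt r;
    r.low  = a.low + b.low;    /* unsigned addition wraps modulo 2^64 on overflow */
    r.high = a.high + b.high;
    if (r.low < a.low)         /* wraparound happened, so carry into the high half */
        r.high += 1;
    return r;
}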
(The weird ambiguities or equivalences about the widths of long and long long are mostly historical, and if you didn't evolve with the platforms it's confusing and kind of unnecessary. See the comment on the question about this: the C standard defines minimum sizes for the integer types but doesn't say they have to be different from each other; historically the types char, short, int, long and long long were often useful ways of distinguishing e.g. 8-, 16-, 32-, and 64-bit sizes, but it's a bit of a mess now, and if you want a particular size, modern platforms provide uint32_t and friends to guarantee it rather than relying on the "classic" C types.)
Obviously you can. Preferably you should not use strings, because computations with those will be a lot more complicated and slower.
Also, you may not want to use individual bytes as the digits, but rather the second-largest datatype available on your compiler, because detecting overflow can be cumbersome if you're using the largest datatype.
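One way to read that advice: store the number in "limbs" narrower than the widest type you have, so every intermediate sum fits in the bigger type and the carry falls out of the upper bits. A minimal sketch, assuming 32-bit limbs stored least-significant first (the function name and layout are just for illustration):

#include <stddef.h>
#include <stdint.h>

/* dst = a + b, where each number is nlimbs 32-bit limbs, least significant first. */
void limb_add(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t nlimbs)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < nlimbs; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;  /* can't overflow uint64_t */
        dst[i] = (uint32_t)sum;                        /* keep the low 32 bits */
        carry  = sum >> 32;                            /* 0 or 1, carried into the next limb */
    }
    /* A nonzero carry here means the true result needs one more limb than dst has. */
}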

Motivation for using size_t uint32 uint64 etc

When I read some code, for integers, they use a bunch of different types such as size_t, uint32, uint64, etc.
What is the motivation or purpose of doing this?
Why not just use int?
Is it related to cross-platform portability? Or is it something low-level?
Sometimes the code makes sense to me because they just want a 32-bit int or something.
But what is size_t?
Please help me make this clear.
These are for platform-independence.
size_t is, by definition, the type returned by sizeof. It is large enough to represent the size of the largest object on the target system.
Not so many years ago, 32 bits would have been enough for any platform. 64 bits is enough today. But who knows how many bits will be needed 5, 10, or 50 years from now?
By writing your code not to care -- i.e., always use size_t when you mean "size of an object" -- you can write code that will actually compile and run 5, 10, or 50 years from now. Or at least have a fighting chance.
Use the types to say what you mean. If for some reason you require a specific number of bits (probably only when dealing with an externally-defined format), use a size-specific type. If you want something that is "the natural word size of the machine" -- i.e., fast -- use int.
If you are dealing with a programmatic interface like sizeof or strlen, use the data type appropriate for that interface, like size_t.
And never try to assign one type to another unless it is large enough to hold the value by definition.
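To make "use the types to say what you mean" concrete, here is a small sketch: size_t for object sizes, a fixed-width type only where an external format dictates it, and plain int for ordinary counting. The record layout is made up for the example:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header: the file format says this field is exactly 32 bits. */
struct record_header {
    uint32_t payload_len;
};

/* Total length of a list of strings: these are sizes of objects, so size_t throughout. */
size_t total_length(const char *const *strings, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(strings[i]);   /* strlen already returns size_t */
    return total;
}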
The motivation to use them is because you can't rely on int, short or long to have any particular size - a mistake made by too many programmers far too many times in the past. If you look not too far back in history, there was a transition from 16 bit to 32 bit processors, which broke lots of code because people had wrongly relied on int being 16 bits. The same mistake was made thereafter when people relied on int to be 32 bits, and still do so even to this day.
Not to mention the terms int, short and long have been truly nuked by language designers who all decide to make them mean something different. A Java programmer reading some C will naively expect long to mean 64 bits. These terms are truly meaningless - they don't specify anything about a type, and I facepalm every time I see a new language released that still uses the terms.
The standard fixed-width integer types were a necessity so you can use the type you actually want. They should've deprecated int, short and long decades ago.
For info on size_t, see the Stack Overflow question: What is size_t in C?
You're right for uint32 and uint64 that they're just being specific about the number of bits that they would like, and that the compiler should interpret them as unsigned.
There are many possible reasons for choosing an underlying type for an integer value. The most obvious one is the size of the maximum possible value that you can store -- uint32 can store a maximum value roughly twice as large as int32 (2^32 - 1 instead of 2^31 - 1), which might be desirable. int64 will be able to store a number much larger than int32 - up to 2^63 - 1 instead of 2^31 - 1.
There are other possible reasons as well. If you're directly reading binary data from some source (file, socket, etc), it is necessary to make sure it's interpreted correctly. If someone writes a uint32 and you interpret it as an int32, it's possible that you interpret a very large positive number as a negative number (overflow).
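A tiny sketch of that pitfall: the same 32-bit pattern, written as unsigned and read back as signed, comes out negative (this assumes the usual two's-complement representation, which is what virtually all current hardware uses):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint32_t written = 3000000000u;                   /* fits in uint32, exceeds INT32_MAX */
    int32_t read_back;
    memcpy(&read_back, &written, sizeof read_back);   /* reinterpret the same four bytes */
    printf("%" PRIu32 " became %" PRId32 "\n", written, read_back);  /* prints a negative value */
    return 0;
}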
size_t is an implementation-defined unsigned integer type, not necessarily a plain unsigned int; it's typically 32 bits on 32-bit platforms and 64 bits on 64-bit platforms.
For most day-to-day programming, the size of an integer doesn't really matter all that much. But sometimes it is good to be specific. This is especially useful in low-level or embedded programming. Another place it is useful is scientific or computationally intensive tasks where it might be wasteful to use an int that is bigger than necessary.
The advantage of size_t is that it is unsigned. On the one hand it is nice to use size_t because it adds more information about what the argument should be (i.e. not negative). On the other hand it ties you down less than unsigned int, since its actual width is left to the platform.

What is the historical context for long and int often being the same size?

According to numerous answers here, long and int are both 32 bits in size on common platforms in C and C++ (Windows & Linux, 32 & 64 bit.) (I'm aware that the standard doesn't mandate exact sizes, but in practice, these are the observed sizes.)
So my question is, how did this come about? Why do we have two types that are the same size? I previously always assumed long would be 64 bits most of the time, and int 32. I'm not saying it "should" be one way or the other, I'm just curious as to how we got here.
From the C99 rationale (PDF) on section 6.2.5:
[...] In the 1970s, 16-bit C (for the PDP-11) first represented file information with 16-bit integers, which were rapidly obsoleted by disk progress. People switched to a 32-bit file system, first using int[2] constructs which were not only awkward, but also not efficiently portable to 32-bit hardware.
To solve the problem, the long type was added to the language, even though this required C on the PDP-11 to generate multiple operations to simulate 32-bit arithmetic. Even as 32-bit minicomputers became available alongside 16-bit systems, people still used int for efficiency, reserving long for cases where larger integers were truly needed, since long was noticeably less efficient on 16-bit systems. Both short and long were added to C, making short available for 16 bits, long for 32 bits, and int as convenient for performance. There was no desire to lock the numbers 16 or 32 into the language, as there existed C compilers for at least 24- and 36-bit CPUs, but rather to provide names that could be used for 32 bits as needed.
PDP-11 C might have been re-implemented with int as 32-bits, thus avoiding the need for long; but that would have made people change most uses of int to short or suffer serious performance degradation on PDP-11s. In addition to the potential impact on source code, the impact on existing object code and data files would have been worse, even in 1976. By the 1990s, with an immense installed base of software, and with widespread use of dynamic linked libraries, the impact of changing the size of a common data object in an existing environment is so high that few people would tolerate it, although it might be acceptable when creating a new environment. Hence, many vendors, to avoid namespace conflicts, have added a 64-bit integer to their 32-bit C environments using a new name, of which long long has been the most widely used. [...]
Historically, most of the sizes and types in C can be traced back to the PDP-11 architecture. That had bytes, words (16 bits) and doublewords (32 bits). When C and UNIX were moved to another machine (the Interdata 8/32, I believe), the word length was 32 bits. To keep the source compatible, long and int were defined so that, strictly,
sizeof(short) ≤ sizeof(int) ≤ sizeof(long).
Most machines now end up with sizeof(int) == sizeof(long) because 16 bits is no longer convenient, but we have long long to get 64 bits if needed.
Update: strictly speaking, I should have said "compilers", because different compiler implementors can make different decisions for the same instruction set architecture. GCC and Microsoft, for example.
Back in the late 70s and early 80s many architectures were 16 bit, so typically char was 8 bit, int was 16 bit and long was 32 bit. In the late 80s there was a general move to 32 bit architectures and so int became 32 bits but long remained at 32 bits.
Over the last 10 years there has been a move towards 64 bit computing and we now have a couple of different models, the most common being LP64, where ints are still 32 bits and long is now 64 bits.
Bottom line: don't make any assumptions about the sizes of different integer types (other than what's defined in the standard of course) and if you need fixed size types then use <stdint.h>.
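As a concrete version of that bottom line, a short sketch using the <stdint.h> names, whose widths are pinned down on every platform that provides them:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t word = UINT32_MAX;   /* exactly 32 bits wherever it exists */
    int64_t  big  = INT64_MAX;    /* exactly 64 bits, regardless of what long happens to be */

    printf("word = %" PRIu32 ", big = %" PRId64 "\n", word, big);
    printf("sizeof(long) here = %zu bytes\n", sizeof(long));  /* 4 on LLP64, 8 on LP64 */
    return 0;
}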
As I understand it, the C standard requires that a long be at least 32 bits long, and at least as long as an int. An int, on the other hand, was intended to be the natural word size of the architecture, though in practice most 64-bit platforms keep it at 32 bits.
Bear in mind that, when the standards were drawn up, 32-bit machines were not common; originally, an int would probably have been the native 16 bits, and a long would have been twice as long at 32 bits.
In 16-bit operating systems, int was 16-bit and long was 32-bit. After the move to Win32, both became 32 bits. When moving to a 64-bit OS, it was considered a good idea to keep the size of long unchanged, so existing code doesn't break when it is compiled as 64-bit. New types (like the Microsoft-specific __int64, size_t, etc.) may be used in 64-bit programs.

Size of an Integer in C

Does the ANSI C specification call for size of int to be equal to the word size (32 bit / 64 bit) of the system?
In other words, can I decipher the word size of the system based on the space allocated to an int?
The size of the int type is implementation-dependent, but cannot be shorter than 16 bits. See the Minimum Type Limits section here.
This Linux kernel development site claims that the size of the long type is guaranteed to be the machine's word size, but that statement is likely to be false: I couldn't find any confirmation of that in the standard, and long is only 32 bits wide on Win64 systems (since these systems use the LLP64 data model).
The language specification recommends that int should have the natural "word" size for the hardware platform. However, it is not strictly required. As you may have noticed, to simplify the 32-bit-to-64-bit code transition, some modern implementations prefer to keep int as a 32-bit type even if the underlying hardware platform has a 64-bit word size.
And as Frederic already noted, in any case the size may not be smaller than 16 value-forming bits.
The original intention was that int would be the word size - the most efficient data-processing size. Still, what tends to happen is that massive amounts of code are written that assume the size of int is X bits, and when the hardware that code runs on moves to a larger word size, the carelessly-written code breaks. Compiler vendors have to keep their customers happy, so they say "OK, we'll leave int sized as before, but we'll make long bigger now". Or, "ahhh... too many people complained about us making long bigger, so we'll create a long long type while leaving sizeof(int) == sizeof(long)". So, these days, it's all a mess:
Does the ANSI C specification call for size of int to be equal to the word size (32 bit / 64 bit) of the system?
Pretty much the idea, but it doesn't insist on it.
In other words, can I decipher the word size of the system based on the space allocated to an int?
Not in practice.
You should check your system-provided limits.h header file. The INT_MAX declaration should help you back-calculate how many bits an int actually has on your platform. For details look into http://www.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html
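A minimal sketch of that back-calculation with <limits.h>: CHAR_BIT is the number of bits in a byte, so CHAR_BIT * sizeof(int) gives the bit width, and INT_MAX shows the value range your implementation chose:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("int is %zu bits wide on this implementation\n", (size_t)CHAR_BIT * sizeof(int));
    printf("INT_MAX  = %d\n",  INT_MAX);
    printf("LONG_MAX = %ld\n", LONG_MAX);
    return 0;
}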

Different int sizes on my computer and Arduino

I'm working on a spare-time project, writing some server code for an Arduino Duemilanove, but before I test this code on the controller I am testing it on my own machine (an OS X-based MacBook). I am using ints in some places, and I am worried that this will bring up strange errors when the code is compiled and run on the Arduino Duemilanove, because the Arduino handles ints as 2 bytes while my MacBook handles ints as 4 bytes. I'm not a hardcore C and C++ programmer, so I am a bit worried about how an experienced programmer would handle this situation.
Should I restrict the code with a typedef that wraps my own definition of an int that is restricted to 2 bytes? Or is there another way around it?
Your best bet is to use the stdint.h header. It defines typedefs that explicitly refer to the signedness and size of your variables. For example, a 16-bit unsigned integer is a uint16_t. It's part of the C99 standard, so it's available pretty much everywhere. See:
http://en.wikipedia.org/wiki/Stdint.h
The C standard defines an int as being a signed type large enough to at least hold all integers between -32767 and 32767 - implementations are free to choose larger types, and any modern 32-bit system will choose a 32-bit integer. However, as you've seen, some embedded platforms still use 16-bit ints. I would recommend using uint16_t or uint32_t if your Arduino compiler supports it; if not, use preprocessor macros to typedef those types yourself.
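If the toolchain really had no <stdint.h>, the preprocessor fallback mentioned above could look roughly like this sketch; the my_uint16_t/my_uint32_t names are made up for the example, and the limit checks against <limits.h> are what does the work:

#include <limits.h>

/* Fallback fixed-width typedefs for a toolchain without <stdint.h>. */
#if USHRT_MAX == 0xFFFF
typedef unsigned short my_uint16_t;
#elif UINT_MAX == 0xFFFF
typedef unsigned int my_uint16_t;
#else
#error "no 16-bit unsigned type found"
#endif

#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int my_uint32_t;
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long my_uint32_t;
#else
#error "no 32-bit unsigned type found"
#endif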
The correct way to handle the situation is to choose the type based on the values it will need to represent:
If it's a general small integer, and the range -32767 to 32767 is OK, use int;
Otherwise, if the range -2147483647 to 2147483647 is OK, use long;
Otherwise, use long long.
If the range -32767 to 32767 is OK and space efficiency is important, use short (or signed char, if the range -127 to 127 is OK).
As long as you have made no other assumptions than these (i.e. always using sizeof instead of assuming the width of the type), your code will be portable.
In general, you should only need to use the fixed-width types from stdint.h for values that are being exchanged through a binary interface with another system - ie. being read from or written to the network or a file.
Will you need values smaller than −32,768 or bigger than +32,767? If not, ignore the different sizes. If you do need them, there's stdint.h with fixed-size integers, signed and unsigned, called intN_t/uintN_t (N = number of bits). It's C99, but most compilers will support it. Note that using integers with a size bigger than the CPU's word size (16 bits in this case) will hurt performance, as there are no native instructions for handling them.
Avoid using the type int, as its size can depend upon architecture / compiler.
Use short and long instead.
