size_t vs. uintptr_t

The C standard guarantees that size_t is a type that can hold any array index. This means that, logically, size_t should be able to hold any pointer type. I've read on some sites that I found on the Googles that this is legal and/or should always work:
void *v = malloc(10);
size_t s = (size_t) v;
So then in C99, the standard introduced the intptr_t and uintptr_t types, which are signed and unsigned types guaranteed to be able to hold pointers:
uintptr_t p = (uintptr_t) v;
So what is the difference between using size_t and uintptr_t? Both are unsigned, and both should be able to hold any pointer type, so they seem functionally identical. Is there any real compelling reason to use uintptr_t (or better yet, a void *) rather than a size_t, other than clarity? In an opaque structure, where the field will be handled only by internal functions, is there any reason not to do this?
By the same token, ptrdiff_t has been a signed type capable of holding pointer differences, and therefore capable of holding most any pointer, so how is it distinct from intptr_t?
Aren't all of these types basically serving trivially different versions of the same function? If not, why? What can't I do with one of them that I can't do with another? If so, why did C99 add two essentially superfluous types to the language?
I'm willing to disregard function pointers, as they don't apply to the current problem, but feel free to mention them, as I have a sneaking suspicion they will be central to the "correct" answer.

"size_t is a type that can hold any array index. This means that, logically, size_t should be able to hold any pointer type"
Not necessarily! Hark back to the days of segmented 16-bit architectures for example: an array might be limited to a single segment (so a 16-bit size_t would do) BUT you could have multiple segments (so a 32-bit intptr_t type would be needed to pick the segment as well as the offset within it). I know these things sound weird in these days of uniformly addressable unsegmented architectures, but the standard MUST cater for a wider variety than "what's normal in 2009", you know!-)

Regarding your statement:
"The C standard guarantees that size_t is a type that can hold any array index. This means that, logically, size_t should be able to hold any pointer type."
This is actually a fallacy (a misconception resulting from incorrect reasoning)(a). You may think the latter follows from the former but that's not actually the case.
Pointers and array indexes are not the same thing. It's quite plausible to envisage a conforming implementation that limits arrays to 65536 elements but allows pointers to address any value into a massive 128-bit address space.
C99 states that the upper limit of a size_t variable is defined by SIZE_MAX and this can be as low as 65535 (see C99 TR3, 7.18.3, unchanged in C11). Pointers would be fairly limited if they were restricted to this range in modern systems.
In practice, you'll probably find that your assumption holds, but that's not because the standard guarantees it. Because it actually doesn't guarantee it.
(a) This is not some form of personal attack by the way, just stating why your statements are erroneous in the context of critical thinking. For example, the following reasoning is also invalid:
All puppies are cute. This thing is cute. Therefore this thing must be a puppy.
The cuteness or otherwise of puppies has no bearing here; all I'm stating is that the two facts do not lead to the conclusion, because the first two sentences allow for the existence of cute things that are not puppies.
This is similar to your first statement not necessarily mandating the second.

I'll let all the other answers stand for themselves regarding the reasoning with segment limitations, exotic architectures, and so on.
Isn't the simple difference in names reason enough to use the proper type for the proper thing?
If you're storing a size, use size_t. If you're storing a pointer, use intptr_t. A person reading your code will instantly know that "aha, this is a size of something, probably in bytes", and "oh, here's a pointer value being stored as an integer, for some reason".
Otherwise, you could just use unsigned long (or, in these here modern times, unsigned long long) for everything. Size is not everything, type names carry meaning which is useful since it helps describe the program.
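For instance, a minimal sketch of that naming convention (the identifiers are made up for illustration, and since intptr_t is optional this assumes the implementation provides it):
#include <stdint.h>
#include <stdlib.h>

void example(void)
{
    size_t   buf_len = 128;             /* a size: the type says "this counts bytes/elements"   */
    void    *buf     = malloc(buf_len);
    intptr_t as_int  = (intptr_t)buf;   /* a pointer stored as an integer, and the name says so */
    (void)as_int;                       /* silence the unused-variable warning in this sketch   */
    free(buf);
}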

It's possible that the size of the largest array is smaller than a pointer. Think of segmented architectures - pointers may be 32-bits, but a single segment may be able to address only 64KB (for example the old real-mode 8086 architecture).
While these aren't commonly in use in desktop machines anymore, the C standard is intended to support even small, specialized architectures. There are still embedded systems being developed with 8 or 16 bit CPUs for example.

I would imagine (and this goes for all type names) that it better conveys your intentions in code.
For example, even though unsigned short and wchar_t are the same size on Windows (I think), using wchar_t instead of unsigned short shows the intention that you will use it to store a wide character, rather than just some arbitrary number.

Looking both backwards and forwards, and recalling that various oddball architectures were scattered about the landscape, I'm pretty sure they were trying to wrap all existing systems and also provide for all possible future systems.
So sure, the way things settled out, we have so far not needed that many types.
But even in LP64, a rather common paradigm, we needed size_t and ssize_t for the system call interface. One can imagine a more constrained legacy or future system, where using a full 64-bit type is expensive and they might want to punt on I/O ops larger than 4GB but still have 64-bit pointers.
I think you have to wonder: what might have been developed, what might come in the future. (Perhaps 128-bit distributed-system internet-wide pointers, but no more than 64 bits in a system call, or perhaps even a "legacy" 32-bit limit. :-) Imagine that legacy systems might get new C compilers...
Also, look at what existed around then. Besides the zillion 286 real-mode memory models, how about the CDC 60-bit word / 18-bit pointer mainframes? How about the Cray series? Never mind normal ILP64, LP64, LLP64. (I always thought Microsoft was pretentious with LLP64; it should have been P64.) I can certainly imagine a committee trying to cover all bases...

size_t vs. uintptr_t
In addition to other good answers:
size_t is defined in <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h>, <uchar.h>, <wchar.h>. It is at least 16-bit.
uintptr_t is defined in <stdint.h>. It is optional: a compliant library might not define it, likely because there is no integer type wide enough to round-trip a void * through uintptr_t and back to void *.
Both are unsigned integer types.
Note: the optional companion intptr_t is a signed integer type.
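A small sketch of that round trip, valid only on implementations that do provide uintptr_t (converting a void * to uintptr_t and back yields a pointer that compares equal to the original):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int x = 42;
    void *p = &x;
    uintptr_t u = (uintptr_t)p;   /* pointer -> integer                        */
    void *q = (void *)u;          /* integer -> pointer; q compares equal to p */
    printf("%d\n", p == q);       /* prints 1                                  */
    return 0;
}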

int main(void) {
    int a[4] = {0, 1, 5, 3};
    int a0 = a[0];      /* ordinary indexing                       */
    int a1 = *(a + 1);  /* a[1]: indexing is pointer arithmetic    */
    int a2 = *(2 + a);  /* a[2]: the addition commutes             */
    int a3 = 3[a];      /* a[3]: same expression written backwards */
    return a2;
}
Implying that intptr_t must always substitute for size_t and vice versa.

Related

When should I just use "int" versus more sign-specific or size-specific types?

I have a little VM for a programming language implemented in C. It supports being compiled under both 32-bit and 64-bit architectures as well as both C and C++.
I'm trying to make it compile cleanly with as many warnings enabled as possible. When I turn on CLANG_WARN_IMPLICIT_SIGN_CONVERSION, I get a cascade of new warnings.
I'd like to have a good strategy for when to use int versus either explicitly unsigned types, and/or explicitly sized ones. So far, I'm having trouble deciding what that strategy should be.
It's certainly true that mixing them—using mostly int for things like local variables and parameters and using narrower types for fields in structs—causes lots of implicit conversion problems.
I do like using more specifically sized types for struct fields because I like the idea of explicitly controlling memory usage for objects in the heap. Also, for hash tables, I rely on unsigned overflow when hashing, so it's nice if the hash table's size is stored as uint32_t.
But, if I try to use more specific types everywhere, I find myself in a maze of twisty casts everywhere.
What do other C projects do?
Just using int everywhere may seem tempting, since it minimizes the need for casting, but there are several potential pitfalls you should be aware of:
An int might be shorter than you expect. Even though, on most desktop platforms, an int is typically 32 bits, the C standard only guarantees a minimum width of 16 bits. Could your code ever need numbers larger than 2^15 − 1 = 32,767, even for temporary values? If so, don't use an int. (You may want to use a long instead; a long is guaranteed to be at least 32 bits.)
Even a long might not always be long enough. In particular, there is no guarantee that the length of an array (or of a string, which is a char array) fits in a long. Use size_t (or ptrdiff_t, if you need a signed difference) for those.
In particular, a size_t is defined to be large enough to hold any valid array index, whereas an int or even a long might not be. Thus, for example, when iterating over an array, your loop counter (and its initial / final values) should generally be a size_t, at least unless you know for sure that the array is short enough for a smaller type to work. (But be careful when iterating backwards: size_t is unsigned, so for(size_t i = n-1; i >= 0; i--) is an infinite loop! Using i != SIZE_MAX or i != (size_t) -1 should work, though; or use a do/while loop, but beware of the case n == 0!) A sketch of a safe reverse loop follows these points.
An int is signed. In particular, this means that int overflow is undefined behavior. If there's ever any risk that your values might legitimately overflow, don't use an int; use an unsigned int (or an unsigned long, or uintNN_t) instead.
Sometimes, you just need a fixed bit length. If you're interfacing with an ABI, or reading / writing a file format, that requires integers of a specific length, then that's the length you need to use. (Of course, in such situations, you may also need to worry about things like endianness, and so may sometimes have to resort to manually packing data byte-by-byte anyway.)
All that said, there are also reasons to avoid using the fixed-length types all the time: not only is int32_t awkward to type all the time, but forcing the compiler to always use 32-bit integers is not always optimal, particularly on platforms where the native int size might be, say, 64 bits. You could use, say, C99 int_fast32_t, but that's even more awkward to type.
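As a concrete sketch of the reverse-iteration pitfall mentioned a few points above (the function is invented for illustration):
#include <stddef.h>

void clear_in_reverse(double *a, size_t n)
{
    /* for (size_t i = n - 1; i >= 0; i--) never terminates: i is unsigned, so i >= 0 is always true */
    for (size_t i = n; i-- > 0; ) {   /* decrement before the body; also handles n == 0 correctly */
        a[i] = 0.0;
    }
}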
Thus, here are my personal suggestions for maximum safety and portability:
Define your own integer types for casual use in a common header file, something like this:
#include <limits.h>
typedef int i16;               /* int always has at least 16 bits                 */
typedef unsigned int u16;
#if UINT_MAX >= 4294967295U    /* unsigned int has at least 32 bits here...       */
typedef int i32;
typedef unsigned int u32;
#else                          /* ...otherwise fall back to long / unsigned long  */
typedef long i32;
typedef unsigned long u32;
#endif
Use these types for anything where the exact size of the type doesn't matter, as long as they're big enough. The type names I've suggested are both short and self-documenting, so they should be easy to use in casts where needed, and minimize the risk of errors due to using a too-narrow type.
Conveniently, the u32 and u16 types defined as above are guaranteed to be at least as wide as unsigned int, and thus can be used safely without having to worry about them being promoted to int and causing undefined overflow behavior.
Use size_t for all array sizes and indexing, but be careful when casting between it and any other integer types. Optionally, if you don't like to type so many underscores, typedef a more convenient alias for it too.
For calculations that assume overflow at a specific number of bits, either use uintNN_t, or just use u16 / u32 as defined above and explicit bitmasking with &. If you choose to use uintNN_t, make sure to protect yourself against unexpected promotion to int; one way to do that is with a macro like:
#define u(x) (0U + (x))
which should let you safely write e.g.:
uint32_t a = foo(), b = bar();
uint32_t c = u(a) * u(b); /* this is always unsigned multiply */
For external ABIs that require a specific integer length, again define a specific type, e.g.:
typedef int32_t fooint32; /* foo ABI needs 32-bit ints */
Again, this type name is self-documenting, with regard to both its size and its purpose.
If the ABI might actually require, say, 16- or 64-bit ints instead, depending on the platform and/or compile-time options, you can change the type definition to match (and rename the type to just fooint) — but then you really do need to be careful whenever you cast anything to or from that type, because it might overflow unexpectedly.
If your code has its own structures or file formats that require specific bitlengths, consider defining custom types for those too, exactly as if it was an external ABI. Or you could just use uintNN_t instead, but you'll lose a little bit of self-documentation that way.
For all these types, don't forget to also define the corresponding _MIN and _MAX constants for easy bounds checking. This might sound like a lot of work, but it's really just a couple of lines in a single header file.
Finally, remember to be careful with integer math, especially overflows.
For example, keep in mind that the difference of two n-bit signed integers may not fit in an n-bit int. (It will fit into an n-bit unsigned int, if you know it's non-negative; but remember that you need to cast the inputs to an unsigned type before taking their difference to avoid undefined behavior!)
Similarly, to find the average of two integers (e.g. for a binary search), don't use avg = (lo + hi) / 2, but rather e.g. avg = lo + (hi + 0U - lo) / 2; the former will break if the sum overflows.
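A minimal sketch of that midpoint idiom, assuming 0 <= lo <= hi:
int midpoint(int lo, int hi)
{
    /* (hi + 0U - lo) is computed in unsigned arithmetic, so the intermediate value cannot overflow */
    return lo + (int)((hi + 0U - lo) / 2);
}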
You seem to know what you are doing, judging from the linked source code, which I took a glance at.
You said it yourself - using "specific" types makes you have more casts. That's not an optimal route to take anyway. Use int as much as you can, for things that do not mandate a more specialized type.
The beauty of int is that it abstracts over the types you speak of. It is optimal in all cases where you need not expose the construct to a system unaware of int. It is your own tool for abstracting the platform for your program(s). It may also give you speed, size and alignment advantages, depending on the platform.
In all other cases, e.g. where you want to deliberately stay close to machine specifications, int can and sometimes should be abandoned. Typical cases include network protocols where the data goes on the wire, and interoperability facilities - bridges of sorts between C and other languages, or kernel assembly routines accessing C structures. But don't forget that sometimes you would want to use int even in these cases, as it follows the platform's own "native" or preferred word size, and you might want to rely on that very property.
With platform types like uint32_t, a kernel might want to use these (although it may not have to) in its data structures if these are accessed from both C and assembler, as the latter doesn't typically know what int is supposed to be.
To sum up, use int as much as possible and resort to moving from more abstract types to "machine" types (bytes/octets, words, etc) in any situation which may require so.
As to size_t and other "usage-suggestive" types - as long as usage follows the semantics inherent to the type - say, using size_t for, well, size values of all kinds - I would not contest it. But I would not liberally apply it to anything just because it is supposedly guaranteed to be the largest type (regardless of whether that is actually true). That's a hidden trap you don't want to step on later. Code has to be self-explanatory to the degree possible, I would say - having a size_t where none is naturally expected would raise eyebrows, for good reason. Use size_t for sizes. Use offset_t for offsets. Use [u]intN_t for octets, words, and such things. And so on.
This is about applying semantics inherent in a particular C type, to your source code, and about the implications on the running program.
Also, as others have illustrated, don't shy away from typedef, as it gives you the power to efficiently define your own types, an abstraction facility I personally value. A good program source code may not even expose a single int, nevertheless relying on int aliased behind a multitude of purpose-defined types. I am not going to cover typedef here, the other answers hopefully will.
Keep large numbers that are used to access members of arrays, or to control buffers, as size_t.
For an example of a project that makes use of size_t, refer to GNU's dd.c, line 155.
Here are a few things I do. Not sure they're for everyone but they work for me.
Never use int or unsigned int directly. There always seems to be a more appropriately named type for the job.
If a variable needs to be a specific width (e.g. for a hardware register or to match a protocol) use a width-specific type (e.g. uint32_t).
For array iterators, where I want to access array elements 0 through n, this should also be unsigned (no reason to access any index less than 0) and I use one of the fast types (e.g. uint_fast16_t), selecting the type based on the minimum size required to access all array elements. For example, if I have a for loop that will iterate through 24 elements max, I'll use uint_fast8_t and let the compiler (or stdint.h, depending how pedantic we want to get) decide which is the fastest type for that operation. (A sketch follows this list.)
Always use unsigned variables unless there is a specific reason for them to be signed.
If your unsigned variables and signed variables need to play together, use explicit casts and be aware of the consequences. (Luckily this will be minimized if you avoid using signed variables except where absolutely necessary.)
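As a sketch of the iterator rule above (the array and its 24-element size are just an example):
#include <stdint.h>

void fill_pattern(int dest[24])
{
    /* 24 fits easily in 8 bits, so let the implementation pick the fastest type at least that wide */
    for (uint_fast8_t i = 0; i < 24; i++) {
        dest[i] = (int)i * 2;   /* the explicit cast keeps the signed/unsigned mixing visible */
    }
}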
If you disagree with any of those or have recommended alternatives please let me know in the comments! That's the life of a software developer... we keep learning or we become irrelevant.
Always.
Unless you have specific reasons for using a more specific type - such as being on a 16-bit platform and needing integers greater than 32767, or needing to ensure proper byte order and signedness for data exchange over a network or in a file (and unless you're resource constrained, consider transferring data in "plain text", meaning ASCII or UTF-8 if you prefer).
My experience has shown that "just use 'int'" is a good maxim to live by and makes it possible to turn out working, easily maintained, correct code quickly every time. But your specific situation may differ, so take this advice with a bit of well-deserved scrutiny.
Most of the time, using int is not ideal. The main reason is that int is signed, and signed overflow is undefined behaviour; signed integers can also be negative, something you don't need for most integers. Prefer unsigned integers. Secondly, data types carry meaning and are a (very limited) way to document the range of values a variable may hold. If you use int, you imply that you expect this variable to sometimes hold negative values, that these values probably do not always fit into 8 bits but always fit into INT_MAX, which can be as low as 32767. Do not assume an int is 32 bits.
Always think about the possible values of a variable and choose the type accordingly. I use the following rules:
Use unsigned integers except when you need to be able to handle negative numbers.
If you want to index an array, from the start, use size_t except when there are good reasons not to. Almost never use int for it: an int can be too small, and there is a high chance of creating a UB bug that isn't found during testing because you never tested with arrays large enough.
Same for array sizes and sizes of other objects: prefer size_t.
If you need to index an array with negative indices, which you may need for image processing, prefer ptrdiff_t. But be aware that ptrdiff_t can be too small, although that is rare.
If you have arrays that never exceed a certain size, you may use the uint_fastN_t, uintN_t, or uint_leastN_t types. This can make a lot of sense, especially on an 8-bit microcontroller.
Sometimes, unsigned int can be used instead of uint_fast16_t, similarly int for int_fast16_t.
To handle the value of a single byte (or character, though this is not a full character, because UTF-8 and Unicode sometimes use more than one code point per character), use int. int can store -1 if you need an indicator for "error" or "not set", and a character literal is of type int. (This is true for C; for C++ you may use a different strategy.) There is the extremely rare possibility that a machine uses sizeof(int)==1 && CHAR_MIN==0, where a byte cannot be handled with an int, but I have never seen such a machine.
It can make sense to define your own types for different purposes.
Use explicit cast where casts are needed. This way the code is well defined and has the least amount of unexpected behaviour.
After a certain size, a project needs a list/enum of the native integer data types. You can use macros built on the _Generic expression from C11, which only needs to handle bool, signed char, short, int, long, long long and their unsigned counterparts to get the underlying native type from a typedef'd one. This way your parsers and similar parts only need to handle 11 integer types, not the 56 standard integer types (if I counted correctly) plus a bunch of other non-standard ones.
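A sketch of such a _Generic-based mapping (the macro and typedef names are made up; plain char is listed separately here because it is a distinct type from signed char and unsigned char):
#include <stdio.h>

#define NATIVE_TYPE_NAME(x) _Generic((x), \
    _Bool: "_Bool", char: "char", \
    signed char: "signed char", unsigned char: "unsigned char", \
    short: "short", unsigned short: "unsigned short", \
    int: "int", unsigned int: "unsigned int", \
    long: "long", unsigned long: "unsigned long", \
    long long: "long long", unsigned long long: "unsigned long long", \
    default: "not a standard integer type")

typedef unsigned long object_id_t;   /* hypothetical project typedef */

int main(void)
{
    object_id_t id = 0;
    size_t n = 0;
    printf("%s\n", NATIVE_TYPE_NAME(id));   /* prints the underlying type, e.g. "unsigned long" */
    printf("%s\n", NATIVE_TYPE_NAME(n));    /* prints whatever size_t is an alias for           */
    return 0;
}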

Use of unsigned on the heap

I'm trying to find all instances in a C program that I've written in which I can improve performance. In a few places the program operates very slowly on account of large arrays allocated to the heap, where some of these arrays are integer arrays. I've read, here on stackoverflow and through other sources, that we should always make use of "unsigned" if there is no instance of a negative integer.
While I don't have many instances of division by factors of two, where performance could see a significant boost by making this change, is there a difference in the handling of memory for a large array of int vs. unsigned int? Similarly, does initializing a large array of ints with calloc operate differently when initializing the same array with unsigned int?
Thanks!
In ISO C (C99), int is signed and has a minimum range of -32767 through 32767 inclusive; unsigned int has a minimum range of 0 through 65535 inclusive.
They are both represented with the same number of bits, and thus allocating X ints versus X unsigned ints does not make any difference at all.
As Mat pointed out, calloc doesn't even care what types you're giving it.
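A small sketch of that point: calloc sees only a count and an element size, so the two requests below are identical.
#include <stdlib.h>

int main(void)
{
    size_t n = 1000000;
    int          *a = calloc(n, sizeof *a);   /* n * sizeof(int) zero-filled bytes            */
    unsigned int *b = calloc(n, sizeof *b);   /* exactly the same request: same size, zeroed  */
    free(a);
    free(b);
    return 0;
}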
I think you're misinterpreting the advice you think you got. Whether to use signed or unsigned integer types can go either way with respect to performance, but unless you really need to optimize a particular bottleneck that you have measured to be a bottleneck, you should always be choosing the types that convey the intended semantic, not a type that "somebody told you is faster".
As for the specific question you asked, there is no way to search for data of a particular type "on the heap". Objects in C do not carry their types with them as part of their representation; while they formally have types, the type is represented in the code that accesses them, not in the objects themselves. So if you want to search for use of signed or unsigned integer objects, you should search your source code, not the heap. But again, this is not the way to solve a performance problem.
Firstly, your "always" in "should always make use of "unsigned" if there is no instance of a negative integer" is overly strong. It is largely a matter of personal preference and/or coding standard. There are arguments in favor of either approach. I personally prefer to follow that rule, i.e. all naturally unsigned quantities are represented by unsigned types in my code. But using a signed type in such cases does not necessarily represent a design error.
Secondly, all this has nothing to do with dynamic memory allocation or large arrays. How such arrays behave in memory is completely independent from what kind of data you store in them. Proper memory management is important for achieving good performance, but this is a completely independent matter, not in any way related to the question using signed or unsigned integer types.
Thirdly, even though unsigned types formally perform better in integer arithmetic, and even if your code performs massive amount of work with those integers, switching from signed to unsigned is unlikely to produce any notable improvement in performance.
Heap allocations in C are done with malloc, which knows nothing about your intent.

Why aren't the C-supplied integer types good enough for basically any project?

I'm much more of a sysadmin than a programmer. But I do spend an inordinate amount of time grovelling through programmers' code trying to figure out what went wrong. And a disturbing amount of that time is spent dealing with problems when the programmer expected one definition of __u_ll_int32_t or whatever (yes, I know that's not real), but either expected the file defining that type to be somewhere other than it is, or (and this is far worse but thankfully rare) expected the semantics of that definition to be something other than it is.
As I understand C, it deliberately doesn't make width definitions for integer types (and that this is a Good Thing), but instead gives the programmer char, short, int, long, and long long, in all their signed and unsigned glory, with defined minima which the implementation (hopefully) meets. Furthermore, it gives the programmer various macros that the implementation must provide to tell you things like the width of a char, the largest unsigned long, etc. And yet the first thing any non-trivial C project seems to do is either import or invent another set of types that give them explicitly 8, 16, 32, and 64 bit integers. This means that as the sysadmin, I have to have those definition files in a place the programmer expects (that is, after all, my job), but then not all of the semantics of all those definitions are the same (this wheel has been re-invented many times) and there's no non-ad-hoc way that I know of to satisfy all of my users' needs here. (I've resorted at times to making a <bits/types_for_ralph.h>, which I know makes puppies cry every time I do it.)
What does trying to define the bit-width of numbers explicitly (in a language that specifically doesn't want to do that) gain the programmer that makes it worth all this configuration management headache? Why isn't knowing the defined minima and the platform-provided MAX/MIN macros enough to do what C programmers want to do? Why would you want to take a language whose main virtue is that it's portable across arbitrarily-bitted platforms and then typedef yourself into specific bit widths?
When a C or C++ programmer (hereinafter addressed in second-person) is choosing the size of an integer variable, it's usually in one of the following circumstances:
You know (at least roughly) the valid range for the variable, based on the real-world value it represents. For example,
numPassengersOnPlane in an airline reservation system should accommodate the largest supported airplane, so needs at least 10 bits. (Round up to 16.)
numPeopleInState in a US Census tabulating program needs to accommodate the most populous state (currently about 38 million), so needs at least 26 bits. (Round up to 32.)
In this case, you want the semantics of int_leastN_t from <stdint.h>. It's common for programmers to use the exact-width intN_t here, when technically they shouldn't; however, 8/16/32/64-bit machines are so overwhelmingly dominant today that the distinction is merely academic.
You could use the standard types and rely on constraints like “int must be at least 16 bits”, but a drawback of this is that there's no standard maximum size for the integer types. If int happens to be 32 bits when you only really needed 16, then you've unnecessarily doubled the size of your data. In many cases (see below), this isn't a problem, but if you have an array of millions of numbers, then you'll get lots of page faults.
Your numbers don't need to be that big, but for efficiency reasons, you want a fast, “native” data type instead of a small one that may require time wasted on bitmasking or zero/sign-extension.
This is the int_fastN_t types in <stdint.h>. However, it's common to just use the built-in int here, which in the 16/32-bit days had the semantics of int_fast16_t. It's not the native type on 64-bit systems, but it's usually good enough.
The variable is an amount of memory, an array index, or a pointer cast to an integer, and thus needs a size that depends on the amount of addressable memory.
This corresponds to the typedefs size_t, ptrdiff_t, intptr_t, etc. You have to use typedefs here because there is no built-in type that's guaranteed to be memory-sized.
The variable is part of a structure that's serialized to a file using fread/fwrite, or called from a non-C language (Java, COBOL, etc.) that has its own fixed-width data types.
In these cases, you truly do need an exact-width type.
You just haven't thought about the appropriate type, and use int out of habit.
Often, this works well enough.
So, in summary, all of the typedefs from <stdint.h> have their use cases. However, the usefulness of the built-in types is limited due to:
Lack of maximum sizes for these types.
Lack of a native memsize type.
The arbitrary choice between LP64 (on Unix-like systems) and LLP64 (on Windows) data models on 64-bit systems.
As for why there are so many redundant typedefs of fixed-width (WORD, DWORD, __int64, gint64, FINT64, etc.) and memsize (INT_PTR, LPARAM, VPTRDIFF, etc.) integer types, it's mainly because <stdint.h> came late in C's development, and people are still using older compilers that don't support it, so libraries need to define their own. Same reason why C++ has so many string classes.
Sometimes it is important. For example, most image file formats require an exact number of bits/bytes be used (or at least specified).
If you only wanted to share a file created by the same compiler on the same computer architecture, you would be correct (or at least things would work). But, in real life things like file specifications and network packets are created by a variety of computer architectures and compilers, so we have to care about the details in these case (at least).
The main reason the fundamental types can't be fixed is that a few machines don't use 8-bit bytes. Enough programmers don't care, or actively want not to be bothered with support for such beasts, that the majority of well-written code demands a specific number of bits wherever overflow would be a concern.
It's better to specify a required range than to use int or long directly, because asking for "relatively big" or "relatively small" is fairly meaningless. The point is to know what inputs the program can work with.
By the way, usually there's a compiler flag that will adjust the built-in types. See INT_TYPE_SIZE for GCC. It might be cleaner to stick that into the makefile, than to specialize the whole system environment with new headers.
If you want portable code, you want the code you write to function identically on all platforms. If you have
int i = 32767;
you can't say for certain what i+1 will give you on all platforms.
This is not portable. Some compilers (on the same CPU architecture!) will give you -32768 and some will give you 32768. Some perverted ones will give you 0. That's a pretty big difference. Granted if it overflows, this is Undefined Behavior, but you don't know it is UB unless you know exactly what the size of int is.
If you use the standard fixed-width integer types (from <stdint.h>, introduced in the ISO/IEC 9899:1999 standard), then you know exactly what the result of +1 will be.
int16_t i = 32767;
i + 1 will overflow an int16_t (and on most compilers, storing the result back will make i appear to be -32768)
uint16_t j = 32767;
j + 1 gives 32768
int8_t i = 32767;   // should be a warning but maybe not; most compilers will set i to -1
i + 1 gives 0       // in this case the addition itself didn't overflow, since it is done as int
uint8_t j = 32767;  // should be a warning but maybe not; most compilers will set j to 255
(uint8_t)(j + 1) gives 0
int32_t i = 32767;
i + 1 gives 32768
uint32_t j = 32767;
j + 1 gives 32768
There are two opposing forces at play here:
The need for C to adapt to any CPU architecture in a natural way.
The need for data transferred to/from a program (over a network, on disk, in a file, etc.) to be interpreted correctly by a program running on any architecture.
The "CPU matching" need has to do with inherent efficiency. There is CPU quantity which is most easily handled as a single unit which all arithmetic operations easily and efficiently are performed on, and which results in the need for the fewest bits of instruction encoding. That type is int. It could be 16 bits, 18 bits*, 32 bits, 36 bits*, 64 bits, or even 128 bits on some machines. (* These were some not-well-known machines from the 1960s and 1970s which may have never had a C compiler.)
Transferring binary data requires that record fields have the same size and alignment on both ends, so for this it is quite important to have control of data sizes. There are also endianness and possibly other representation details to worry about, such as floating-point formats.
A program which forces all integer operations to be 32 bit in the interests of size compatibility will work well on some CPU architectures, but not others (especially 16 bit, but also perhaps some 64-bit).
Using the CPU's native register size is preferable if all data interchange is done in a non-binary format, like XML or SQL (or any other ASCII encoding).

Reasons to use (or not) stdint

I already know that stdint is used when you need specific variable sizes for portability between platforms. I don't really have such an issue for now, but what are the cons and pros of using it besides the fact already mentioned above?
Looking for this on Stack Overflow and other sites, I found two links that deal with the subject:
codealias.info - this one talks about the portability of the stdint.
stackoverflow - this one is more specific about uint8_t.
These two links are great, especially if one is looking to know more about the main reason for this header - portability. But for me, what I like most about it is that I think uint8_t is cleaner than unsigned char (for storing an RGB channel value, for example), int32_t looks more meaningful than simply int, etc.
So, my question is: exactly what are the cons and pros of using stdint besides the portability? Should I use it just in some specific parts of my code, or everywhere? If everywhere, how can I use functions like atoi(), strtok(), etc. with it?
Thanks!
Pros
Using well-defined types makes the code far easier and safer to port, as you won't get any surprises when for example one machine interprets int as 16-bit and another as 32-bit. With stdint.h, what you type is what you get.
Using int etc also makes it hard to detect dangerous type promotions.
Another advantage is that by using int8_t instead of char, you know that you always get a signed 8 bit variable. char can be signed or unsigned, it is implementation-defined behavior and varies between compilers. Therefore, the default char is plain dangerous to use in code that should be portable.
If you want to give the compiler hints that a variable should be optimized, you can use the uint_fastN_t types, which tell the compiler to use the fastest possible integer type that is at least N bits wide. Most of the time this doesn't matter; the compiler is smart enough to make optimizations on type sizes no matter what you have typed. Between sequence points, the compiler can implicitly use another type than the one specified, as long as it doesn't affect the result.
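A small sketch of the char signedness point above:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    char    c = (char)0xFF;   /* implementation-defined: -1 where char is signed, 255 where it is unsigned */
    int8_t  s = -1;           /* always signed, always exactly 8 bits (where int8_t exists)                */
    uint8_t u = 0xFF;         /* always unsigned, always exactly 8 bits                                    */
    printf("%d %d %u\n", c, (int)s, (unsigned)u);
    return 0;
}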
Cons
None.
Reference: MISRA-C:2004, rule 6.3: "typedefs that indicate size and signedness shall be used in place of the basic types".
The only reason to use uint8_t rather than unsigned char (aside from aesthetic preference) is if you want to document that your program requires char to be exactly 8 bits. uint8_t exists if and only if CHAR_BIT==8, per the requirements of the C standard.
The rest of the intX_t and uintX_t types are useful in the following situations:
reading/writing disk/network (but then you also have to use endian conversion functions)
when you want unsigned wraparound behavior at an exact cutoff (but this can be done more portably with the & operator; see the sketch after this list).
when you're controlling the exact layout of a struct because you need to ensure no padding exists (e.g. for memcmp or hashing purposes).
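A sketch of the wraparound point above, counting modulo 256 both ways (the function and variables are made up for illustration):
#include <stdint.h>

static uint8_t  counter8 = 255;
static unsigned counter  = 255;

void tick(void)
{
    counter8 = (uint8_t)(counter8 + 1);   /* wraps to 0 through the exact-width type            */
    counter  = (counter + 1) & 0xFFu;     /* wraps to 0 through explicit masking; more portable */
}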
On the other hand, the uint_least8_t, etc. types are useful anywhere that you want to avoid using wastefully large or slow types but need to ensure that you can store values of a certain magnitude. For example, while long long is at least 64 bits, it might be 128-bit on some machines, and using it when what you need is just a type that can store 64 bit numbers would be very wasteful on such machines. int_least64_t solves the problem.
I would avoid using the [u]int_fastX_t types entirely since they've sometimes changed on a given machine (breaking the ABI) and since the definitions are usually wrong. For instance, on x86_64, the 64-bit integer type is considered the "fast" one for 16-, 32-, and 64-bit values, but while addition, subtraction, and multiplication are exactly the same speed whether you use 32-bit or 64-bit values, division is almost surely slower with larger-than-necessary types, and even if they were the same speed, you're using twice the memory for no benefit.
Finally, note that the arguments some answers have made about the inefficiency of using int32_t for a counter when it's not the native integer size are technically mostly correct, but it's irrelevant to correct code. Unless you're counting some small number of things where the maximum count is under your control, or some external (not in your program's memory) thing where the count might be astronomical, the correct type for a count is almost always size_t. This is why all the standard C functions use size_t for counts. Don't consider using anything else unless you have a very good reason.
Cons
The primary reason the C language does not specify the size of int or long, etc. is for computational efficiency. Each architecture has a natural, most-efficient size, and the designers specifically empowered and intended the compiler implementor to use the natural native data size data for speed and code size efficiency.
In years past, communication with other machines was not a primary concern (most programs were local to the machine), so the predictability of each data type's size was of little importance.
Insisting that a particular architecture use a particular size int to count with is a really bad idea, even though it would seem to make other things easier.
In a way, thanks to XML and its brethren, data type size again is no longer much of a concern. Shipping machine-specific binary structures from machine to machine is again the exception rather than the rule.
I use stdint types for one reason only, when the data I hold in memory shall go on disk/network/descriptor in binary form. You only have to fight the little-endian/big-endian issue but that's relatively easy to overcome.
The obvious reason not to use stdint is when the code is size-independent, in maths terms everything that works over the rational integers. It would produce ugly code duplicates if you provided a uint*_t version of, say, qsort() for every expansion of *.
I use my own types in that case, derived from size_t when I'm lazy or the largest supported unsigned integer on the platform when I'm not.
Edit, because I ran into this issue earlier:
I think it's noteworthy that at least uint8_t, uint32_t and uint64_t are broken in Solaris 2.5.1.
So for maximum portability I still suggest avoiding stdint.h (at least for the next few years).

Why can't I declare bitfields as automatic variables?

I want to declare a bitfield with the size specified using a colon (I can't remember what the syntax is called). I want to write this:
void myFunction()
{
    unsigned int thing : 12;
    ...
}
But GCC says it's a syntax error (it thinks I'm trying to write a nested function). I have no problem doing this though:
struct thingStruct
{
    unsigned int thing : 4;
};
and then putting one such struct on the stack
void myFunction()
{
    struct thingStruct thing;
    ...
}
This leads me to believe that it's being prevented by syntax, not semantic issues.
So why won't the first example work? What am I missing?
The first example won't work because you can only declare bitfields inside structs. This is syntax, not semantics, as you said, but there it is. If you want a bitfield, use a struct.
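For example, a sketch of wrapping the 12-bit field from the question in a local struct:
void myFunction(void)
{
    struct { unsigned int thing : 12; } local = { 0 };
    local.thing = 4095;   /* 12 bits hold values 0..4095     */
    local.thing += 1;     /* wraps around to 0, modulo 2^12  */
}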
Why would you want to do such a thing? A 12-bit bit field would, on all common architectures, be padded to at least 16 or 32 bits anyway.
If you want to ensure the width of an integer variable, use the types in <inttypes.h>, e.g. int16_t or int32_t.
As others have said, bitfields must be declared inside a struct (or union, but that's not really useful). Why? Here are two reasons.
Mainly, it's to make the compiler writer's job easier. Bitfields tend to require more machine instructions to extract the bits from the bytes. Only fields can be bitfields, and not variables or other objects, so the compiler writer doesn't have to worry about them if there is no . or -> operator involved.
But, you say, sometimes the language designers make the compiler writer's job harder in order to make the programmer's life easier. Well, there is not a lot of demand from programmers for bitfields outside structs. The reason is that programmers pretty much only bother with bitfields when they're going to cram several small integers inside a single data structure. Otherwise, they'd use a plain integral type.
Other languages have integer range types, e.g., you can specify that a variable ranges from 17 to 42. There isn't much call for this in C because C never requires that an implementation check for overflow. So C programmers just choose a type that's capable of representing the desired range; it's their job to check bounds anyway.
C89 (i.e., the version of the C language that you can find just about everywhere) offers a limited selection of types that have at least n bits. There's unsigned char for 8 bits, unsigned short for 16 bits and unsigned long for 32 bits (plus signed variants). C99 offers a wider selection of types called uint_least8_t, uint_least16_t, uint_least32_t and uint_least64_t. These types are guaranteed to be the smallest types with at least that many value bits. An implementation can provide types for other number of bits, such as uint_least12_t, but most don't. These types are defined in <stdint.h>, which is available on many C89 implementations even though it's not required by the standard.
Bitfields provide a consistent syntax to access certain implementation-dependent functionality. The most common purpose of that functionality is to place certain data items into bits in a certain way, relative to each other. If two items (bit-fields or not) are declared as consecutive items in a struct, they are guaranteed to be stored consecutively. No such guarantee exists with individual variables, regardless of storage class or scope. If a struct contains:
struct foo {
    unsigned bar: 1;
    unsigned boz: 1;
};
it is guaranteed that bar and boz will be stored consecutively (most likely in the same storage location, though I don't think that's actually guaranteed). By contrast, if 'bar' and 'boz' were single-bit automatic variables, there would be no telling where they would be stored, so there'd be little benefit to having them as bitfields. If they did share space with some other variable, it would be hard to make sure that different functions reading and writing different bits in the same byte didn't interfere with each other.
Note that some embedded-systems compilers do expose a genuine 'bit' type, whose values are packed eight to a byte. Such compilers generally have an area of memory which is allocated for storing nothing but bit variables, and the processors for which they generate code have atomic instructions to test, set, and clear individual bits. Since the memory locations holding the bits are only accessed using such instructions, there's no danger of conflicts.

Resources