8 bit enum, in C

I have to store instructions, commands that I will be receiving via serial.
The commands will be 8 bits long.
I need to preserve a direct correspondence between each command name and its value, so that I don't have to translate an 8-bit number received over serial into some other type.
I'd like to use Enumerations to deal with them in my code.
The trouble is that on this platform an enumeration corresponds to a 16-bit integer.
The platform is AVR ATmega169V microcontroller, on the Butterfly demo board.
It is an 8-bit system with some limited support for 16-bit operations.
It is not a fast system and has about 1 KB of RAM.
It doesn't have any luxuries like file I/O or an operating system.
So any suggestions as to what type I should be using to store 8-bit commands?
There has got to be something better than a massive header of #defines.

gcc's -fshort-enums might be useful:
Allocate to an "enum" type only as many bytes as it needs for the declared range of possible values. Specifically, the "enum" type will be equivalent to the smallest integer type which has enough room.
In fact, here's a page with a lot of relevant information. I hope you come across many GCC switches you never knew existed. ;)
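As a rough illustration (the command names below are made up), compiling the following with and without -fshort-enums should change what sizeof reports for the enum type:

#include <stdio.h>

enum command { CMD_NOP = 0x00, CMD_READ = 0x01, CMD_RESET = 0xFF };

int main(void)
{
    /* On AVR, without -fshort-enums this typically prints 2 (the enum is int-sized);
       with -fshort-enums it should print 1, since every value fits in a byte. */
    printf("%u\n", (unsigned)sizeof(enum command));
    return 0;
}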

You are trying to solve a problem that does not exist.
Your question is tagged C. In the C language, enum types in value context are fully compatible with integral types and behave just like other integral types. When used in expressions, they are subjected to exactly the same integral promotions as other integral types. Once you take that into account, you should realize that if you want to store values described by enumeration constants in an 8-bit integral type, all you have to do is choose a suitable generic 8-bit integral type (say int8_t) and use it instead of the enum type. You'll lose absolutely nothing by storing your enum constant values in an object of type int8_t (as opposed to an object explicitly declared with the enum type).
The issue you describe would exist in C++, where enum types are separated much farther from other integral types. In C++ using an integral type in place of enum type for the purpose of saving memory is more difficult (although possible). But not in C, where it requires no additional effort whatsoever.
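A minimal sketch of that approach (the command names and handler are hypothetical): the enum only supplies named constants, while the actual storage is a plain 8-bit type:

#include <stdint.h>

enum { CMD_NOP = 0x00, CMD_READ = 0x01, CMD_WRITE = 0x02 };  /* names only, no enum-typed variables */

void handle_command(uint8_t raw)     /* raw byte as received over serial */
{
    uint8_t cmd = raw;               /* one byte of storage, no translation needed */

    switch (cmd) {                   /* enum constants compare like any other integer */
    case CMD_READ:
        /* ... */
        break;
    case CMD_WRITE:
        /* ... */
        break;
    default:
        break;
    }
}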

I don't see why an enum wouldn't work. Comparisons to, and assignments from, an enum should all work fine with the default widening. Just be careful that your 8 bit values are signed correctly (I would think you would want unsigned extension).
You will get 16-bit comparisons this way; I hope that won't be a performance problem (it shouldn't be, since your platform has at least some 16-bit support).

Microsoft's C compiler allows you to do something like this, but it's an extension (it's standard in C++0x):
enum Foo : unsigned char {
    blah = 0,
    blargh = 1
};
Since you tagged GCC, I'm not entirely sure if the same thing is possible, but GCC might have an extension in gnu99 mode or something for it. Give it a whirl.

I'd recommend sticking with enum in any case, for the following reasons:
This solution allows you to map command values directly to what your serial protocol expects.
If you are really on a 16-bit architecture, there is not much to gain from moving to an 8-bit type. Think about aspects other than the one byte of memory saved.
With some compilers I have used, the actual enum size was the minimal number of bytes needed (enums that fit in a byte used only a byte, then 16 bits, then 32).
To begin with, you should not care about the real type width. Only if you really need compact storage should you use compiler flags such as -fshort-enums on the GNU compiler, and I don't recommend them unless you really need them.
As a last option, you can keep the enum as the presentation type for the commands and convert to/from a byte with two simple operations when storing/restoring a command value in memory, encapsulating this in one place (see the sketch below). These are very simple operations, so you can even inline them. This lets you use only one byte for storage while still performing operations on the more convenient enum type, defined as you like.
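A minimal sketch of that last option, assuming a made-up command set; the enum stays the "working" type, while storage is a single byte:

#include <stdint.h>

enum command { CMD_NOP, CMD_READ, CMD_WRITE };          /* hypothetical command set */

/* Keep the enum <-> byte conversion in one place. */
static inline uint8_t command_to_byte(enum command c)  { return (uint8_t)c; }
static inline enum command byte_to_command(uint8_t b)  { return (enum command)b; }

static uint8_t stored_command;                          /* only one byte of RAM is used */

void store_command(enum command c) { stored_command = command_to_byte(c); }
enum command load_command(void)    { return byte_to_command(stored_command); }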

An answer relevant to the ARC compiler:
(Quoted from DesignWare MetaWare C/C++ Programmer’s Guide for ARC; section 11.2.9.2)
Size of Enumerations
The size of an enum type depends on the status of the toggle *Long_enums*.
■ If toggle *Long_enums* is off, the enum type maps to the smallest of one, two, or four bytes, such that all values can be represented.
■ If toggle *Long_enums* is on, an enum maps to four bytes (matching the AT&T Portable C Compiler convention).

Related

What does the following mean in the context of the C programming language?

From Modern C by Jens Gustedt,
Representations of values on a computer can vary “culturally” from architecture to architecture or are determined by the type the programmer gave to the value. Therefore, we should try to reason primarily about values and not about representations if we want to write portable code.
If you already have some experience in C and in manipulating bytes and bits, you will need to make an effort to actively “forget” your knowledge for most of this section. Thinking about concrete representations of values on your computer will inhibit you more than it helps.
Takeaway - C programs primarily reason about values and not about their representation.
Question 1: What kind of 'representations' of values is the author talking about? Could I be given an example where this 'representation' varies from architecture to architecture, and also an example of how representations of values are determined by the type the programmer gave to the value?
Question 2: What's the purpose of specifying a data type in the C language? I know that's a rule of the language, but I have heard that's how a compiler knows how much memory to allocate to an object. Is that the only use, albeit a crucial one? I've heard there isn't a need to specify a data type in Python.
What kind of 'representations' of values is the author talking about?
https://en.wikipedia.org/wiki/Two%27s_complement vs https://en.wikipedia.org/wiki/Ones%27_complement vs https://en.wikipedia.org/wiki/Offset_binary. Generally https://en.wikipedia.org/wiki/Signed_number_representations.
But also the vast space of floating-point number formats https://en.wikipedia.org/wiki/Floating-point_arithmetic#IEEE_754:_floating_point_in_modern_computers - IEEE 754, minifloat, bfloat16, etc.
Could I be given an example where this 'representation' varies from architecture to architecture?
Your PC uses two's complement; compare https://superuser.com/questions/1137182/is-there-any-existing-cpu-implementation-which-uses-ones-complement .
Ah - and of course, most notably https://en.wikipedia.org/wiki/Endianness .
also an example of how representations of values are determined by the type the programmer gave to the value?
(float)1 is represented in IEEE 754 as 0b00111111100000000000000000000000 https://www.h-schmidt.net/FloatConverter/IEEE754.html .
(unsigned)1 with 32-bit int is represented as 0b00.....0001.
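A small sketch (assuming 32-bit float and unsigned) that makes this concrete: the same value 1 has a different byte pattern depending on its type, and the order the bytes print in depends on the machine's endianness:

#include <stdio.h>

static void dump_bytes(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    for (size_t i = 0; i < n; i++)
        printf("%02X ", bytes[i]);   /* byte order depends on endianness */
    printf("\n");
}

int main(void)
{
    float    f = 1.0f;   /* IEEE 754 single precision: 0x3F800000 */
    unsigned u = 1u;     /* plain binary: 0x00000001 */

    dump_bytes(&f, sizeof f);
    dump_bytes(&u, sizeof u);
    return 0;
}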
What's the purpose of specifying a data type in the C language?
To use computer resources efficiently. There is no point in reserving 2 gigabytes to store 8 bits of data. The type determines the range of values that can be "contained" in a variable. You communicate that upper/lower range of allowed values to the compiler, and the compiler generates nice and fast code. (There is also Ada, where you literally specify the range of a type, like type Day_type is range 1 .. 31;).
Programs are written using https://en.wikipedia.org/wiki/Harvard_architecture . Variables at block scope are put on the stack https://en.wikipedia.org/wiki/Stack_(abstract_data_type)#Hardware_stack . The idea is that you have to know in advance how many bytes to reserve from the stack. Types communicate just that.
have heard that's how a compiler knows how much memory to allocate to an object?
The type communicates to the compiler how much memory to allocate for an object, but it also communicates the range of values and the representation (float and _Float32 might be similar, yet different). Overflowing addition of two ints is undefined behavior; overflowing addition of two unsigned values is fine and wraps around. There are differences.
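A tiny illustration of that difference (a fragment, not a full program):

#include <limits.h>

unsigned int u = UINT_MAX;   /* u + 1 is well defined: it wraps around to 0        */
int          i = INT_MAX;    /* i + 1 is undefined behavior: signed overflow in C  */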
Is that the only use, albeit crucial?
The most important use of types is to clearly communicate the purpose of your code to other developers.
char character;
int numerical_variable;
uint_least8_t variable_with_8_bits_that_is_optimized_for_size;
uint_fast8_t variable_with_8_bits_that_is_optimized_for_speed;
wchar_t wide_character;
FILE *this_is_a_file;
I've heard there isn't a need to specify a data type in Python.
This is literally the difference between statically typed programming languages and dynamically typed programming languages. https://en.wikipedia.org/wiki/Type_system#Type_checking

Is there a general way in C to make a number greater than 64 bits or smaller than 8 bits?

First of all, I am aware that what I am trying to do might be outside the C standard.
I'd like to know if it is possible to make a uint4_t/int4_t or uint128_t/int128_t type in C.
I know I could do this using bitshifts and complex functions, but can I do it without those?
You can use bit-fields within a structure to get fields narrower than a uint8_t, but the base data type they're stored in will not be any smaller.
struct SmallInt
{
unsigned int a : 4;
};
will give you a structure with a member called a that is 4 bits wide.
Individual storage units (bytes) are no less than CHAR_BIT bits wide¹; even if you create a struct with a single 4-bit bit-field, the associated object will always take up at least one full storage unit.
There are multiple precision libraries such as GMP that allow you to work with values that can't fit into 32 or 64 bits. Might want to check them out.
¹ 8 bits minimum, but may be wider.
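A quick sketch of what that means in practice for the struct above:

#include <stdio.h>

struct SmallInt { unsigned int a : 4; };

int main(void)
{
    struct SmallInt s = { 0 };

    s.a = 15;            /* largest value a 4-bit unsigned field can hold */
    s.a += 1;            /* wraps to 0; well defined for unsigned bit-fields */

    /* The struct still occupies at least one full byte (often sizeof(unsigned int)). */
    printf("a = %u, sizeof = %u\n", (unsigned)s.a, (unsigned)sizeof s);
    return 0;
}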
In practice, if you want very wide numbers (which standard C11 does not specify), you probably want to use some external arbitrary-precision arithmetic library (a.k.a. bignums). I recommend using GMPlib.
In some cases, for tiny ranges of numbers, you might use bit-fields inside a struct to get tiny integers. Practically speaking, they can be costly (the compiler will emit shift and bit-mask instructions to deal with them).
See also this answer mentioning __int128_t as an extension in some compilers.
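On compilers that provide it (GCC and Clang on 64-bit targets, as a non-standard extension), __int128 can be used directly; a minimal sketch:

#include <stdio.h>

int main(void)
{
    unsigned __int128 big = (unsigned __int128)1 << 100;   /* 2^100, wider than 64 bits */

    /* printf has no conversion specifier for __int128, so print it in two halves. */
    printf("high: %llu, low: %llu\n",
           (unsigned long long)(big >> 64),
           (unsigned long long)big);
    return 0;
}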

Define data type of enum in C for memory

I could not find a direct answer to this, but is it possible to force a certain data type for an enum in C?
e.g., I have an enum for a state machine that will only hold a handful of states, so for memory and performance reasons it would be great to define the enum as a byte or some other short data type. Is there any way to get this behaviour in C, or even in the Arduino IDE?
Any help is appreciated
The C standard says that enumeration constants, that is the "members" of the enum, must be compatible with type int. But the enumeration variable itself is allowed to be of other integer types. If you think this doesn't make any sense, it is because it doesn't: the C standard is irrational when it comes to enums.
As for how to pick which integer type an enumeration variable corresponds to, that's unfortunately a decision made by the compiler, not the programmer. On an 8-bit Atmel, an enum variable is either 8 or 16 bits.
Several compilers give an option to set the size of an enum through non-standard compiler options. Using such features might not be a good idea regardless, as it makes the code non-portable.
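For example, one such non-standard mechanism is GCC's packed type attribute (GCC is what the Arduino toolchain uses underneath), which shrinks the enum type to the smallest integer type that fits; a sketch, with the usual portability caveat:

/* GCC extension; non-portable, but shrinks the enum type to one byte here. */
enum __attribute__((packed)) state { STATE_IDLE, STATE_RUNNING, STATE_STOPPED };

enum state current_state;   /* sizeof(current_state) is 1 with this attribute */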
However, regardless of the size of an enum, the compiler may (and likely will) optimize expressions where the enum is present, just as it may optimize any expression containing small integer types to not use int for the calculation, as the C standard otherwise mandates through integer promotion.
In case you have very extreme performance requirements, don't use enums, but uint8_t. But if you had extreme performance requirements, then you wouldn't use a hobbyist 8-bit MCU in the first place! So it turns out that your concern is a non-issue.
Go ahead and use enum and let the compiler worry about optimizations.
As pointed out before, enum constants are basically integer constants. They do not take up space themselves; they are not stored anywhere in the executable code.
However, if you use an enum type to define a variable, then the variable probably takes as much space as an int. If you need to save this space, you can use any other integer type able to store the necessary range. For example, instead of:
enum state {S1, S2, S3, S4};
enum state stack[1000];
...
stack[i] = S2;
...
you can write:
enum state {S1, S2, S3, S4};
typedef unsigned char my_state_type;
my_state_type stack[1000];
...
stack[i] = S2;
...
Enums are not "stored" as variables use memory locations. Enums are used in statements. These statements are compiled and in the compiled result (machine code, assembler), the enums are replaced with their value. For example, an enum constant named MY_ENUM1 with value 1 will be replaced in the assembler as
mov ax, 1
Hence they will take up the smallest amount of memory as is required for the instruction operand.
It looks to me as if the main value of enums in C is more descriptive code that is presumably easier to read and maintain. There does not seem to be any real benefit to using an enum over a bunch of #define statements. Even when declaring a variable of type enum color, say my_hue, the compiler (gcc at least) does not report an error if you assign an 'invalid' value (-100 or +100) to my_hue. And as an undesirable side effect, you have the compiler choosing variable types that do not behave as expected in perfectly reasonable loops.

Why aren't the C-supplied integer types good enough for basically any project?

I'm much more of a sysadmin than a programmer. But I do spend an inordinate amount of time grovelling through programmers' code trying to figure out what went wrong. And a disturbing amount of that time is spent dealing with problems when the programmer expected one definition of __u_ll_int32_t or whatever (yes, I know that's not real), but either expected the file defining that type to be somewhere other than it is, or (and this is far worse but thankfully rare) expected the semantics of that definition to be something other than it is.
As I understand C, it deliberately doesn't make width definitions for integer types (and that this is a Good Thing), but instead gives the programmer char, short, int, long, and long long, in all their signed and unsigned glory, with defined minima which the implementation (hopefully) meets. Furthermore, it gives the programmer various macros that the implementation must provide to tell you things like the width of a char, the largest unsigned long, etc. And yet the first thing any non-trivial C project seems to do is either import or invent another set of types that give them explicitly 8, 16, 32, and 64 bit integers. This means that as the sysadmin, I have to have those definition files in a place the programmer expects (that is, after all, my job), but then not all of the semantics of all those definitions are the same (this wheel has been re-invented many times) and there's no non-ad-hoc way that I know of to satisfy all of my users' needs here. (I've resorted at times to making a <bits/types_for_ralph.h>, which I know makes puppies cry every time I do it.)
What does trying to define the bit-width of numbers explicitly (in a language that specifically doesn't want to do that) gain the programmer that makes it worth all this configuration management headache? Why isn't knowing the defined minima and the platform-provided MAX/MIN macros enough to do what C programmers want to do? Why would you want to take a language whose main virtue is that it's portable across arbitrarily-bitted platforms and then typedef yourself into specific bit widths?
When a C or C++ programmer (hereinafter addressed in second-person) is choosing the size of an integer variable, it's usually in one of the following circumstances:
You know (at least roughly) the valid range for the variable, based on the real-world value it represents. For example,
numPassengersOnPlane in an airline reservation system should accommodate the largest supported airplane, so needs at least 10 bits. (Round up to 16.)
numPeopleInState in a US Census tabulating program needs to accommodate the most populous state (currently about 38 million), so needs at least 26 bits. (Round up to 32.)
In this case, you want the semantics of int_leastN_t from <stdint.h>. It's common for programmers to use the exact-width intN_t here, when technically they shouldn't; however, 8/16/32/64-bit machines are so overwhelmingly dominant today that the distinction is merely academic.
You could use the standard types and rely on constraints like “int must be at least 16 bits”, but a drawback of this is that there's no standard maximum size for the integer types. If int happens to be 32 bits when you only really needed 16, then you've unnecessarily doubled the size of your data. In many cases (see below), this isn't a problem, but if you have an array of millions of numbers, then you'll get lots of page faults.
Your numbers don't need to be that big, but for efficiency reasons, you want a fast, “native” data type instead of a small one that may require time wasted on bitmasking or zero/sign-extension.
These are the int_fastN_t types in <stdint.h>. However, it's common to just use the built-in int here, which in the 16/32-bit days had the semantics of int_fast16_t. It's not the native type on 64-bit systems, but it's usually good enough.
The variable is an amount of memory, an array index, or a cast pointer, and thus needs a size that depends on the amount of addressable memory.
This corresponds to the typedefs size_t, ptrdiff_t, intptr_t, etc. You have to use typedefs here because there is no built-in type that's guaranteed to be memory-sized.
The variable is part of a structure that's serialized to a file using fread/fwrite, or called from a non-C language (Java, COBOL, etc.) that has its own fixed-width data types.
In these cases, you truly do need an exact-width type.
You just haven't thought about the appropriate type, and use int out of habit.
Often, this works well enough.
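A small sketch (all names are made up) mapping those cases onto the corresponding typedefs:

#include <stdint.h>
#include <stddef.h>

int_least32_t num_people_in_state;   /* case 1: known range, needs at least 26 bits         */
int_fast16_t  loop_counter;          /* case 2: small range, but want a fast "native" type  */
size_t        bytes_allocated;       /* case 3: an amount of memory / array index           */

struct on_disk_record {              /* case 4: serialized layout needs exact widths        */
    uint32_t id;
    uint16_t flags;
    uint8_t  version;
    uint8_t  reserved;
};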
So, in summary, all of the typedefs from <stdint.h> have their use cases. However, the usefulness of the built-in types is limited due to:
Lack of maximum sizes for these types.
Lack of a native memsize type.
The arbitrary choice between LP64 (on Unix-like systems) and LLP64 (on Windows) data models on 64-bit systems.
As for why there are so many redundant typedefs of fixed-width (WORD, DWORD, __int64, gint64, FINT64, etc.) and memsize (INT_PTR, LPARAM, VPTRDIFF, etc.) integer types, it's mainly because <stdint.h> came late in C's development, and people are still using older compilers that don't support it, so libraries need to define their own. Same reason why C++ has so many string classes.
Sometimes it is important. For example, most image file formats require an exact number of bits/bytes be used (or at least specified).
If you only wanted to share a file created by the same compiler on the same computer architecture, you would be correct (or at least things would work). But, in real life things like file specifications and network packets are created by a variety of computer architectures and compilers, so we have to care about the details in these case (at least).
The main reason the fundamental types can't be fixed is that a few machines don't use 8-bit bytes. Enough programmers don't care, or actively want not to be bothered with support for such beasts, that the majority of well-written code demands a specific number of bits wherever overflow would be a concern.
It's better to specify a required range than to use int or long directly, because asking for "relatively big" or "relatively small" is fairly meaningless. The point is to know what inputs the program can work with.
By the way, usually there's a compiler flag that will adjust the built-in types. See INT_TYPE_SIZE for GCC. It might be cleaner to stick that into the makefile, than to specialize the whole system environment with new headers.
If you want portable code, you want the code you write to function identically on all platforms. If you have
int i = 32767;
you can't say for certain what i+1 will give you on all platforms.
This is not portable. Some compilers (on the same CPU architecture!) will give you -32768 and some will give you 32768. Some perverted ones will give you 0. That's a pretty big difference. Granted if it overflows, this is Undefined Behavior, but you don't know it is UB unless you know exactly what the size of int is.
If you use the standard integer definitions (from <stdint.h>, introduced by the ISO/IEC 9899:1999 standard), then you know exactly what adding 1 will give you.
int16_t i16 = 32767;
i16 = i16 + 1;       // overflows when stored back; on most compilers i16 will appear to be -32768

uint16_t j16 = 32767;
j16 = j16 + 1;       // j16 is now 32768

int8_t i8 = 32767;   // should be a warning, but maybe not; most compilers will set i8 to -1
i8 = i8 + 1;         // i8 is now 0 (in this case the addition itself didn't overflow)

uint8_t j8 = 32767;  // should be a warning, but maybe not; most compilers will set j8 to 255
j8 = j8 + 1;         // j8 is now 0

int32_t i32 = 32767;
i32 = i32 + 1;       // i32 is now 32768

uint32_t j32 = 32767;
j32 = j32 + 1;       // j32 is now 32768
There are two opposing forces at play here:
The need for C to adapt to any CPU architecture in a natural way.
The need for data transferred to/from a program (network, disk, file, etc.) to be correctly interpreted by a program running on any architecture.
The "CPU matching" need has to do with inherent efficiency. There is CPU quantity which is most easily handled as a single unit which all arithmetic operations easily and efficiently are performed on, and which results in the need for the fewest bits of instruction encoding. That type is int. It could be 16 bits, 18 bits*, 32 bits, 36 bits*, 64 bits, or even 128 bits on some machines. (* These were some not-well-known machines from the 1960s and 1970s which may have never had a C compiler.)
Data transfer in binary form requires that record fields have the same size and alignment on both ends. For this it is quite important to have control of data sizes. There are also endianness and possibly differing binary representations, such as floating-point formats, to worry about.
A program which forces all integer operations to be 32 bit in the interests of size compatibility will work well on some CPU architectures, but not others (especially 16 bit, but also perhaps some 64-bit).
Using the CPU's native register size is preferable if all data interchange is done in a non-binary format, like XML or SQL (or any other ASCII encoding).

Reasons to use (or not) stdint

I already know that stdint is used when you need specific variable sizes for portability between platforms. I don't really have such an issue for now, but what are the cons and pros of using it besides the fact already mentioned above?
Searching for this on Stack Overflow and other sites, I found two links that deal with the theme:
codealias.info - this one talks about the portability of stdint.
stackoverflow - this one is more specific about uint8_t.
These two links are great, especially if one is looking to learn more about the main reason for this header: portability. But for me, what I like most about it is that I think uint8_t is cleaner than unsigned char (for storing an RGB channel value, for example), int32_t looks more meaningful than simply int, etc.
So, my question is: exactly what are the cons and pros of using stdint besides the portability? Should I use it just in some specific parts of my code, or everywhere? If everywhere, how can I use functions like atoi(), strtok(), etc. with it?
Thanks!
Pros
Using well-defined types makes the code far easier and safer to port, as you won't get any surprises when for example one machine interprets int as 16-bit and another as 32-bit. With stdint.h, what you type is what you get.
Using int etc also makes it hard to detect dangerous type promotions.
Another advantage is that by using int8_t instead of char, you know that you always get a signed 8 bit variable. char can be signed or unsigned, it is implementation-defined behavior and varies between compilers. Therefore, the default char is plain dangerous to use in code that should be portable.
If you want to give the compiler a hint that a variable should be optimized, you can use the uint_fastN_t types, which tell the compiler to use the fastest possible integer type that is at least N bits wide. Most of the time this doesn't matter; the compiler is smart enough to make optimizations on type sizes no matter what you have typed. Between sequence points, the compiler can implicitly change the type to another one than specified, as long as it doesn't affect the result.
Cons
None.
Reference: MISRA-C:2004 rule 6.3."typedefs that indicate size and signedness shall be used in place of the basic types".
EDIT : Removed incorrect example.
The only reason to use uint8_t rather than unsigned char (aside from aesthetic preference) is if you want to document that your program requires char to be exactly 8 bits. uint8_t exists if and only if CHAR_BIT==8, per the requirements of the C standard.
The rest of the intX_t and uintX_t types are useful in the following situations:
reading/writing disk/network (but then you also have to use endian conversion functions)
when you want unsigned wraparound behavior at an exact cutoff (but this can be done more portably with the & operator; see the sketch after this list).
when you're controlling the exact layout of a struct because you need to ensure no padding exists (e.g. for memcmp or hashing purposes).
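A sketch of that second point, assuming a counter that should wrap at 256:

#include <stdint.h>

static uint8_t  counter8;        /* wraps automatically at 2^8 */
static unsigned counter_masked;  /* any unsigned type plus masking */

void tick(void)
{
    counter8++;                                       /* 255 -> 0 */
    counter_masked = (counter_masked + 1) & 0xFFu;    /* same effect, more portable */
}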
On the other hand, the uint_least8_t, etc. types are useful anywhere that you want to avoid using wastefully large or slow types but need to ensure that you can store values of a certain magnitude. For example, while long long is at least 64 bits, it might be 128-bit on some machines, and using it when what you need is just a type that can store 64 bit numbers would be very wasteful on such machines. int_least64_t solves the problem.
I would avoid using the [u]int_fastX_t types entirely since they've sometimes changed on a given machine (breaking the ABI) and since the definitions are usually wrong. For instance, on x86_64, the 64-bit integer type is considered the "fast" one for 16-, 32-, and 64-bit values, but while addition, subtraction, and multiplication are exactly the same speed whether you use 32-bit or 64-bit values, division is almost surely slower with larger-than-necessary types, and even if they were the same speed, you're using twice the memory for no benefit.
Finally, note that the arguments some answers have made about the inefficiency of using int32_t for a counter when it's not the native integer size are technically mostly correct, but it's irrelevant to correct code. Unless you're counting some small number of things where the maximum count is under your control, or some external (not in your program's memory) thing where the count might be astronomical, the correct type for a count is almost always size_t. This is why all the standard C functions use size_t for counts. Don't consider using anything else unless you have a very good reason.
cons
The primary reason the C language does not specify the size of int or long, etc. is computational efficiency. Each architecture has a natural, most-efficient size, and the designers specifically empowered and intended the compiler implementor to use that natural, native data size for speed and code-size efficiency.
In years past, communication with other machines was not a primary concern—most programs were local to the machine—so the predictability of each data type's size was of little concern.
Insisting that a particular architecture use a particular size int to count with is a really bad idea, even though it would seem to make other things easier.
In a way, thanks to XML and its brethren, data type size again is no longer much of a concern. Shipping machine-specific binary structures from machine to machine is again the exception rather than the rule.
I use stdint types for one reason only, when the data I hold in memory shall go on disk/network/descriptor in binary form. You only have to fight the little-endian/big-endian issue but that's relatively easy to overcome.
The obvious reason not to use stdint is when the code is size-independent, in maths terms everything that works over the rational integers. It would produce ugly code duplicates if you provided a uint*_t version of, say, qsort() for every expansion of *.
I use my own types in that case, derived from size_t when I'm lazy or the largest supported unsigned integer on the platform when I'm not.
Edit, because I ran into this issue earlier:
I think it's noteworthy that at least uint8_t, uint32_t and uint64_t are broken in Solaris 2.5.1.
So for maximum portability I still suggest avoiding stdint.h (at least for the next few years).
