I have several enums that serve as type constants. For example:
enum item_type {
street,
town,
lake,
border,
...
};
The enum values are used in code to designate object types, and are written out to disk as part of data files. This mostly works well, but there is one drawback:
There is no way to remove an enum member (because it is no longer used) without changing the integer values of all subsequent members. So any such change would make the code incompatible with existing data files.
Is there some good technique for avoiding this problem? Maybe some preprocessor trick?
The only solution I can think of is to explicitly set all the integer values. While that would work, it is hard to read and manage for big enums.
Note: This problem comes from the source code of Navit, which uses several such "type enums" (though they are actually hidden behind some macros).
If you want to remove items very rarely, you could do something like
enum item_type {
street,
town,
//lake,
border = town+2,
...
};
i.e. only explicitly assign a value to the item immediately following the one you remove.
Since compatibility is very important to you, it'd be more reliable to just bite the bullet and explicitly number all items
enum item_type {
street = 0,
town = 1,
//lake = 2,
border = 3,
...
};
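As a rough sketch of the fully numbered approach, a compile-time check can pin the values that existing data files rely on. This assumes a C11 compiler (for static_assert); the helper item_type_is_known is a made-up name for illustration and is not taken from Navit.

```c
#include <assert.h>

/* Every value is written out, so removing a member never shifts the others. */
enum item_type {
    street = 0,
    town   = 1,
    /* 2 was lake; retired, do not reuse */
    border = 3
};

/* C11 compile-time check: the build fails if an on-disk value changes. */
static_assert(border == 3, "border must keep its on-disk value");

/* Hypothetical helper: returns 1 if raw (as read from a data file)
 * names an item type that still exists, 0 otherwise. */
int item_type_is_known(int raw)
{
    switch (raw) {
    case street:
    case town:
    case border:
        return 1;
    default:
        return 0;   /* retired or unknown value */
    }
}
```

A reader of old data files could call item_type_is_known on each stored value and skip or report records that still carry the retired value 2.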
I ended up declaring a macro UNUSED, which expands to
UNUSED_<linenumber>. Unused enum values can then simply be replaced by
UNUSED. The macro expands to a unique identifier on each line where it is
used; otherwise the compiler would complain about duplicate enum
entries if it were used multiple times inside one enum.
This is slightly ugly if you have many "gaps". Still, I chose this
solution over simonc's solution because it is easy to read (it keeps the regular enum values free of visual
clutter like three=zero+2) and does not require magic numbers.
Admittedly this only makes sense if the gaps are few and far between.
For large gaps simonc's solution looks better.
Complete example:
#include <stdio.h>
#define UNUSED UNUSED_P(__LINE__)
#define UNUSED_P(x) UNUSED_P2(x)
#define UNUSED_P2(x) UNUSED_##x
enum e {
zero,
UNUSED,
UNUSED,
three,
};
int main(void){
printf("int value of 'three': %d\n",three);
return 0;
}
The double replacement is adapted from, among others, this question:
c++ - How, exactly, does the double-stringize trick work?
I already have, say, a struct smallbox with two primitive variables (int identifier, int size) in it. This smallbox is part of higher structs that are used to build, e.g., queues.
Now, in one part of my project I have an issue for which my solution is to expand this smallbox so that it carries another piece of information, such as int costs_to_send_it. Since I am not allowed to change my base structs, is there a way to expand this struct in some fashion, like method overloading in Java? Will I still be able to use all the operations that I have on my higher structs while having the new struct smallbox with the new attribute inside instead of the old one?
This sentence determines the answer: “[Will] I still be able to use all operation that I have on my higher structs while having the new struct smallbox with color attribute inside instead of the old one?” The answer is no.
If the headers and routines involved were completely separate, there are some compiling and linking “games” you could play—compiling one set of source files with one definition of the structure and another set of source files with another definition of the structure and ensuring they never interacted in ways depending on the structure definition. However, since you ask whether the operations defined using one definition could be used with the alternate definition, you are compelling one set of code to use both definitions. (An alternate solution would be to engineer one source file to use different names for its routines under different circumstances, and then you could compile it twice, once for one definition of the structure and once for another, and then you could use the “same” operations on the different structures, but they would actually be different routines with different names performing the “same” operation in some sense.)
While you could define the structure differently within different translation units, when the structure or any type derived from it (such as a pointer to the structure) is used with a routine in a different translation unit, the type the routine is expecting to receive as a parameter must be compatible with the type that is passed to it as an argument, aside from some rules about signed types, adding qualifiers, and so on that do not help here.
For two structures to be compatible, there must be a one-to-one correspondence between their members, which must themselves be of compatible types (C 2018 6.2.7 1). Two structures with different numbers of members do not have a one-to-one correspondence.
is there a way to expand this struct in some fashion like methods
overloading in java or so?
In method overloading, the compiler chooses among same-named methods by examining the arguments to each invocation of a method of that name. Observe that that is an entirely localized decision: disregarding questions of optimization, the compiler's choice here affects only code generation for a single statement.
Will I still be able to use all operation
that I have on my higher structs while having the new struct smallbox
with color attribute inside instead of the old one?
I think what you're looking for is polymorphism, not overloading. Note well that in Java (and C++ and the other languages I know of that support this) it is based on a type / subtype relationship between differently-named types. I don't know of any language that lets you redefine type names and use the two distinct types as if they were the same in any sense. Certainly C does not.
There are some alternatives, however. Most cleanly-conforming would involve creating a new, differently-named structure type that contains an instance of the old:
struct sb {
int id;
int size;
};
struct sb_metered {
struct sb box;
int cost;
};
Functions that deal in individual instances of these objects by pointer, not by value, can be satisfied easily:
#include <stdio.h>

int get_size(struct sb *box) {
    return box->size;
}

int get_cost(struct sb_metered *metered_box) {
    return metered_box->cost;
}

int main(void) {
    struct sb_metered b = { { 1, 17 }, 42 };

    /* id lives in the embedded struct sb, so it is reached through b.box */
    printf("id: %d, size: %d, cost: %d\n",
           b.box.id,
           get_size(&b.box),
           get_cost(&b));
    return 0;
}
Note that this does not allow you to form arrays of the supertype (struct sb) that actually contain instances of the subtype, nor to pass or return structure objects of the subtype by value as if they were objects of the supertype.
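Although an array of struct sb cannot hold the extended objects themselves, an array of pointers to struct sb can refer to both kinds. The following is only a sketch under that assumption; total_size and as_metered are hypothetical helper names. It relies on the C guarantee that a pointer to a structure, suitably converted, points to its first member and vice versa.

```c
#include <assert.h>
#include <stddef.h>

struct sb {
    int id;
    int size;
};

struct sb_metered {
    struct sb box;   /* must stay the first member for the cast below */
    int cost;
};

/* Sums the sizes of n boxes reached through base-type pointers. */
int total_size(struct sb *const boxes[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(boxes[i] != NULL);
        total += boxes[i]->size;
    }
    return total;
}

/* Recovers the extended struct from a pointer to its embedded base.
 * Only valid when b really points at the box member of an sb_metered. */
struct sb_metered *as_metered(struct sb *b)
{
    return (struct sb_metered *)b;
}
```

Existing routines that take struct sb * keep working unchanged when handed &m.box for some struct sb_metered m; only code that needs the cost has to know about the extended type.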
My question comes from a larger coding perspective, but I'm trying to understand it with a simple example. Let's say I have a few lines of code:
int main(void) {
int input_1 = 10;
int input_2 = 10;
/* some stuff */
return 0;
}
After reading about design principles (I am not sure whether they are specific to one programming language; I hope they are generic), I came to know that the above is valid C code but dirty code, because it does not follow the DRY (Don't Repeat Yourself) principle: the magic number 10 is repeated.
First, my doubt is: does the C standard say anything about such coding best practices? I read the spec but could not find anything definite.
I then modified the code as below to avoid the label "dirty code":
int main(void) { /* I'm not 100 percent sure that this is not dirty code ? */
const int value = 10; /*assigning 10 to const variable*/
int input_1 = value;
int input_2 = value;
/* some stuff */
return 0;
}
Is the modified version correct, or can I do something better? Finally, if these design principles are so strongly recommended, why don't compilers produce any warning?
This is more about avoiding magic numbers. Your 10 should have some semantic meaning if you claim it's "the same 10". Then you should do something like
#define FROBNUM 10 // use a name here that explains the meaning of the number
int main(void) {
int input_1 = FROBNUM;
int input_2 = FROBNUM;
/* some stuff */
return 0;
}
Introducing a const is unnecessary; macros solve this problem nicely. DRY is addressed here: the macro definition is the single source of the concrete value.
If there is on the other hand no semantic relationship between the two 10 values, #define two macros instead. This isn't "repeating yourself" if they indeed have a different meaning. Don't misunderstand DRY here.
Side note about your version with const: it has two flaws.
The name value isn't semantic at all, so nothing is gained; the number is still magic.
With this declaration, you introduce a new object of automatic storage duration and type int, which you don't really need. A good compiler would optimize it away, but better not rely on that -- that's why a macro fits better here.
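To illustrate the point about semantic names, here is a small sketch. MAX_RETRIES and BUFFER_LEN are made-up names, chosen only to show two constants that happen to share the value 10 but mean different things. It also shows one practical difference: a macro expands to a constant expression and can size a file-scope array, which a const int object cannot do in C.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RETRIES 10   /* hypothetical: how often an operation is retried */
#define BUFFER_LEN  10   /* hypothetical: capacity of the buffer below */

static_assert(BUFFER_LEN >= 1, "buffer needs at least one slot");

/* File-scope array: its size must be a constant expression, so a macro
 * works here where a const int object would not. */
static int buffer[BUFFER_LEN];

/* Fills the buffer with 0, 1, 2, ... and returns the number of slots. */
size_t fill_buffer(void)
{
    for (size_t i = 0; i < BUFFER_LEN; i++)
        buffer[i] = (int)i;
    return BUFFER_LEN;
}

/* Returns 1 while another attempt is allowed, 0 otherwise. */
int may_retry(int attempts_so_far)
{
    return attempts_so_far < MAX_RETRIES;
}
```

If the buffer later needs 64 slots, only BUFFER_LEN changes and the retry limit is unaffected, which is why the two values get separate names.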
DRY mostly refers to there being one single source of truth. Certain business rules or reusable code patterns should only be expressed once, especially if they may be altered in the future. Examples include code to calculate shipping fees or tax rates, which you want to code exactly once and alter exactly in one place if they change; or the instantiation of a database adapter which you can alter in exactly one place when the database details change.
DRY does not mean that you must reduce every line of code which looks similar to another line of code into one single line.
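A minimal sketch of the "single source of truth" idea, using the tax-rate example mentioned above. The rate, the macro name TAX_RATE_BASIS_POINTS, and both function names are assumptions made up for illustration.

```c
#include <assert.h>

/* Hypothetical rate: 7.50 %, stored in basis points (1/100 of a percent). */
#define TAX_RATE_BASIS_POINTS 750L

/* The only place that knows how tax is computed; amounts are in cents.
 * Fractions of a cent are truncated, which is a simplification. */
long tax_for(long net_cents)
{
    assert(net_cents >= 0);
    return net_cents * TAX_RATE_BASIS_POINTS / 10000L;
}

/* Every caller goes through tax_for instead of repeating the formula. */
long gross_price(long net_cents)
{
    return net_cents + tax_for(net_cents);
}
```

When the rate or the rounding rule changes, only the macro or tax_for has to be touched.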
The names of enum elements are liable to collide with the names of other enum elements, variable names, etc.
enum Fruit
{
apple,
orange
};
typedef enum Fruit Fruit;
enum Color
{
red,
orange // <-- ERROR
};
typedef enum Color Color;
char apple='a'; // <-- ERROR
Is there a C99 compliant solution to avoid collision other than prefixing every enum element name?
Side note: this question has already an answer for C++
How to avoid name conflicts for two enum values with the same name in C++?
I'm looking for a C99 solution.
In C, there is no solution other than prefixing the names of the enum values.
As pointed out in the OP, C++ has a number of mechanisms, of which enum class is probably indicated for modern code. However, in practice the result is the same: you end up prefixing the name of the enum element with the name of the enum. Arguably, Fruit::orange is tidier than FruitOrange, but really it makes little difference to my eyes.
In some parallel universe, it would be great to have a language in which you could write:
Fruit selected = orange;
and have the compiler deduce the namespace of the constant on the right-hand side. But I don't see how that language could be C. C doesn't have namespaces in that sense, and even if it did, the type system only allows conversions; you cannot condition the syntax of the RHS of an operator based on the LHS (and I use the word syntax deliberately, because name lookup is a syntactic property in C).
Even if you did have some language hack which sometimes implicitly inserted an enum namespace, you would still need the explicit prefix on any comparison, because
if (apple > orange)
does not have a context in which deduction could take place, even though the fact that enum values in C are all of type int does make FruitApple and FruitOrange comparable.
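For reference, the prefixing approach applied to the code from the question might look like the sketch below. The FruitApple/FruitOrange spelling follows the answer; the Color names and fruit_name are assumptions added for illustration, and static_assert needs C11 rather than C99.

```c
#include <assert.h>

/* Each enumerator carries its enum's name as a prefix, so the two
 * "orange" constants no longer collide. */
enum Fruit { FruitApple, FruitOrange };
enum Color { ColorRed, ColorOrange };

char apple = 'a';   /* no clash with FruitApple */

/* Enumerators are plain int constants, so they can be compared directly. */
static_assert(FruitApple < FruitOrange, "enumerators compare as ints");

/* Hypothetical helper that maps a Fruit to a printable name. */
const char *fruit_name(enum Fruit f)
{
    switch (f) {
    case FruitApple:  return "apple";
    case FruitOrange: return "orange";
    }
    return "unknown";
}
```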
An Example
Suppose we have a text to write that could be converted to "uppercase or lowercase" and printed "at left, center or right".
Specific case implementation (too many functions)
void writeInUpperCaseAndCentered(char *str) { /* ... */ }
void writeInLowerCaseAndCentered(char *str) { /* ... */ }
void writeInUpperCaseAndLeft(char *str) { /* ... */ }
and so on...
vs
Many Argument function (bad readability and even hard to code without a nice autocompletion IDE)
void write(char *str, int toUpper, int centered) { /* ... */ }
vs
Context dependent (hard to reuse, hard to code, use of ugly globals, and sometimes even impossible to "detect" a context)
void writeComplex(char *str)
{
    // analyze str and perhaps some global variables and
    // (under who knows what rules) put it center/left/right and upper/lowercase
}
And perhaps there are other options (which are welcome).
The question is:
Is there any good practice or experience/academic advice for this (recurrent) trilemma?
EDIT:
What I usually do is combine the "specific case" implementation with an internal (I mean not in the header) general many-argument function, implementing only the cases that are used and hiding the ugly code, but I don't know if there is a better way. This kind of thing makes me realize why OOP was invented.
I'd avoid your first option because, as you say, the number of functions you end up having to implement (though possibly only as macros) can grow out of control. The count doubles when you decide to add italic support, and doubles again for underline.
I'd probably avoid the second option as well. Again, consider what happens when you find it necessary to add support for italics or underlines. Now you need to add another parameter to the function, find all of the places where you called the function, and update those calls. In short, annoying, though once again you could probably simplify the process with appropriate use of macros.
That leaves the third option. You can actually get some of the benefits of the other alternatives with this using bitflags. For example
#define WRITE_FORMAT_LEFT 1
#define WRITE_FORMAT_RIGHT 2
#define WRITE_FORMAT_CENTER 4
#define WRITE_FORMAT_BOLD 8
#define WRITE_FORMAT_ITALIC 16
....
write(char *string, unsigned int format)
{
if (format & WRITE_FORMAT_LEFT)
{
// write left
}
...
}
EDIT: To answer Greg S.
I think that the biggest improvement is that if I decide, at this point, to add support for underlined text, it takes only two steps:
Add #define WRITE_FORMAT_UNDERLINE 32 to the header
Add the support for underlines in write().
At this point I can call write(..., ... | WRITE_FORMAT_UNDERLINE) wherever I like. More to the point, I don't need to modify pre-existing calls to write, which I would have to do if I added a parameter to its signature.
Another potential benefit is that it allows you to do something like the following:
#define WRITE_ALERT_FORMAT (WRITE_FORMAT_CENTER | \
WRITE_FORMAT_BOLD | \
WRITE_FORMAT_ITALIC)
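A compact sketch of how such a flag-driven function could look. It renders into a buffer instead of printing, so the result is easy to check. The function name format_text, the width parameter, the WRITE_FORMAT_UPPER flag (value 64) and the WRITE_TITLE_FORMAT preset are assumptions added for illustration; only the three alignment flags come from the answer above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define WRITE_FORMAT_LEFT    1u
#define WRITE_FORMAT_RIGHT   2u
#define WRITE_FORMAT_CENTER  4u
#define WRITE_FORMAT_UPPER  64u   /* hypothetical flag, not in the answer */

/* Preset built by OR-ing flags, in the spirit of WRITE_ALERT_FORMAT. */
#define WRITE_TITLE_FORMAT (WRITE_FORMAT_CENTER | WRITE_FORMAT_UPPER)

/* Renders str into out, padded with spaces to width columns.
 * Returns 0 on success, -1 if out is too small or str is wider than width. */
int format_text(char *out, size_t out_size, const char *str,
                size_t width, unsigned int format)
{
    size_t len, pad, left, i;

    assert(out != NULL && str != NULL);
    len = strlen(str);
    if (len > width || width + 1 > out_size)
        return -1;

    pad = width - len;
    if (format & WRITE_FORMAT_RIGHT)
        left = pad;
    else if (format & WRITE_FORMAT_CENTER)
        left = pad / 2;
    else
        left = 0;               /* WRITE_FORMAT_LEFT, also the default */

    memset(out, ' ', width);
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)str[i];
        out[left + i] = (format & WRITE_FORMAT_UPPER) ? (char)toupper(c)
                                                      : (char)c;
    }
    out[width] = '\0';
    return 0;
}
```

Adding another style later means defining one more flag and one more branch; existing calls keep compiling because the signature does not change.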
I prefer the argument way.
Because there's going to be some code that all the different scenarios need to use. Making a function out of each scenario will produce code duplication, which is bad.
Instead of using an argument for each different case (toUpper, centered etc..), use a struct. If you need to add more cases then you only need to alter the struct:
typedef struct {
int toUpper;
int centered;
// etc...
} cases;
void write(char *str, cases c) { /* ... */ }
I'd go for a combination of methods 1 and 2.
Code a method (A) that has all the arguments you need/can think of right now and a "bare" version (B) with no extra arguments. This version can call the first method with the default values. If your language supports it add default arguments. I'd also recommend that you use meaningful names for your arguments and, where possible, enumerations rather than magic numbers or a series of true/false flags. This will make it far easier to read your code and what values are actually being passed without having to look up the method definition.
This gives you a limited set of methods to maintain and 90% of your usages will be the basic method.
If you need to extend the functionality later add a new method with the new arguments and modify (A) to call this. You might want to modify (B) to call this as well, but it's not necessary.
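Since C has no default arguments, the (A)/(B) split described above could be sketched as two functions, with enumerations in place of true/false flags. All names here (write_text_ex, write_text, the text_case and text_align enums, the width parameter) are hypothetical, and the functions render into a buffer rather than printing.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum text_case  { CASE_KEEP, CASE_UPPER, CASE_LOWER };
enum text_align { ALIGN_LEFT, ALIGN_CENTER, ALIGN_RIGHT };

/* Method (A): takes every option. Pads str to width columns in out.
 * Returns 0 on success, -1 if the text does not fit. */
int write_text_ex(char *out, size_t out_size, const char *str,
                  size_t width, enum text_case tc, enum text_align ta)
{
    size_t len, pad, left, i;

    assert(out != NULL && str != NULL);
    len = strlen(str);
    if (len > width || width >= out_size)
        return -1;

    pad = width - len;
    left = (ta == ALIGN_RIGHT) ? pad : (ta == ALIGN_CENTER) ? pad / 2 : 0;

    memset(out, ' ', width);
    for (i = 0; i < len; i++) {
        int c = (unsigned char)str[i];
        if (tc == CASE_UPPER)
            c = toupper(c);
        else if (tc == CASE_LOWER)
            c = tolower(c);
        out[left + i] = (char)c;
    }
    out[width] = '\0';
    return 0;
}

/* Method (B): the bare version, which forwards the default values. */
int write_text(char *out, size_t out_size, const char *str)
{
    assert(str != NULL);
    return write_text_ex(out, out_size, str, strlen(str),
                         CASE_KEEP, ALIGN_LEFT);
}
```

A call such as write_text_ex(buf, sizeof buf, "Hi", 6, CASE_UPPER, ALIGN_CENTER) reads without looking up the definition, which is the readability gain the answer describes.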
I've run into exactly this situation a number of times -- my preference is none of the above, but instead to use a single formatter object. I can supply it with the number of arguments necessary to specify a particular format.
One major advantage of this is that I can create objects that specify logical formats instead of physical formats. This allows, for example, something like:
Format title = {upper_case, centered, bold};
Format body = {lower_case, left, normal};
write(title, "This is the title");
write(body, "This is some plain text");
Decoupling the logical format from the physical format gives you roughly the same kind of capabilities as a style sheet. If you want to change all your titles from italic to bold-face, change your body style from left justified to fully justified, etc., it becomes relatively easy to do that. With your current code, you're likely to end up searching through all your code and examining "by hand" to figure out whether a particular lower-case, left-justified item is body-text that you want to re-format, or a foot-note that you want to leave alone...
As you already mentioned, one striking point is readability: writeInUpperCaseAndCentered("Foobar!") is much easier to understand than write("Foobar!", true, true), although you could eliminate that problem by using enumerations. On the other hand, having arguments avoids awkward constructions like:
if(foo)
writeInUpperCaseAndCentered("Foobar!");
else if(bar)
writeInLowerCaseAndCentered("Foobar!");
else
...
In my humble opinion, this is a very strong argument (no pun intended) for the argument way.
I suggest more cohesive functions as opposed to superfunctions that can do all kinds of things unless a superfunction is really called for (printf would have been quite awkward if it only printed one type at a time). Signature redundancy should generally not be considered redundant code. Technically speaking it is more code, but you should focus more on eliminating logical redundancies in your code. The result is code that's much easier to maintain with very concise, well-defined behavior. Think of this as the ideal when it seems redundant to write/use multiple functions.
I read somewhere about giving enums default values like so:
typedef enum {
MarketNavigationTypeNone = 0,
MarketNavigationTypeHeirachy = 1,
MarketNavigationTypeMarket = 2
} MarketNavigationLevelType;
...but I can't remember the value of doing this. If I don't give them default values, and someone later reorders the enum, what are the risks?
If I always use the enum name and never refer to them by their integer value, are there any risks?
The only possible problem I can think of is if I'm initialising an enum from an int value from a DB and the enum is reordered; then the app would break.
Those are not default values; you are giving them the values they will always have.
If you don't initialize them explicitly, the first enumerator's value is zero. For all others, if there is no initializer, their value is the value of the previous enumerator increased by one.
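The numbering rule can be checked at compile time. This is only an illustrative sketch with made-up enumerator names, and it assumes a C11 compiler for static_assert.

```c
#include <assert.h>

enum implicit_values {
    first,          /* 0: the first enumerator starts at zero */
    second,         /* 1: previous value plus one */
    jumped = 10,    /* explicit value */
    after_jump      /* 11: counting continues from the explicit value */
};

static_assert(first == 0 && second == 1, "implicit numbering starts at 0");
static_assert(after_jump == 11, "numbering continues after an explicit value");

/* Same idea with the order swapped: the implicit values swap too, which
 * is what breaks integers already stored in a database or file. */
enum reordered_values { r_second, r_first };

static_assert(r_first == 1, "reordering moved this enumerator from 0 to 1");
```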
There are two reasons for giving them explicit values:
you don't want them to have the values they'd have otherwise
you want to make it clear what value they have (for you or other developers)
If you always refer to them by their name and never explicitly use an integral value for comparison or assignment, explicitly giving them a value is not needed.
In general this only matters if the enum is exposed to some kind of external API or it is going to be used to exchange data via data files or other means. If the enum is only ever used within your app and nowhere else then the actual values don't matter.