I've just been going back over a bit of C study using Ivor Horton's Beginning C book. I got to the part about declaring constants, which seems to get mixed up with variables in the same sentence.
Just to clarify, what is the difference in specifying constants and variables in C, and really, when do you need to use a constant instead of a variable? I know folks say to use a constant when the information doesn't change during program execution but I can't really think of a time when a variable couldn't be used instead.
A variable, as you can guess from the name, varies over time. If it doesn't vary, you lose nothing by saying so explicitly. When you tell the compiler that the value will not change, the compiler can do a whole bunch of optimizations, like directly inlining the value and never allocating any space for the constant on the stack.
However, you cannot always count on your compiler to be smart enough to correctly determine whether a value will change once set. In any situation where the compiler cannot determine this with 100% confidence, it will err on the side of safety and assume the value could change. This can result in various performance impacts, like avoiding inlining, not optimizing certain loops, or creating object code that is not as parallelism-friendly.
Because of this, and since readability is also important, you should strive to use an explicit constant whenever possible and leave variables for things that can actually change.
As to why constants are used instead of literal numbers:
1) It makes code more readable. Everyone knows what 3.14 is (hopefully); not everyone knows that 3.07 is the income tax rate in PA. This is an example of domain-specific knowledge, and not everyone maintaining your code (e.g., tax software) in the future will know it.
2) It saves work when you make a change. Going and changing every 3.07 to 3.18 if the tax rate changes in the future will be annoying. You always want to minimize changes and ideally make a single change. The more concurrent changes you have to make, the higher the risk that you will forget something, leading to errors (see the sketch after this list).
3) You avoid risky errors. Imagine that there were two states with an income tax rate of 3.07, and then one of them changes to 3.18 while the other stays at 3.07. By just going and replacing, you could end up with severe errors. Of course, many integer or string constant values are more common than "3.07". For example, the number 7 could represent the number of days in the week, or something else. In large programs, it is very difficult to determine what each literal value means.
4) In the case of string text, it is common to use symbolic names for strings so that the string pools can change quickly when supporting multiple languages.
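A minimal sketch of the idea, with hypothetical names; if the rate changes, only the one definition has to change:

#define PA_INCOME_TAX_RATE 3.07   /* single definition; a static const double works too */

double pa_tax_due(double income)
{
    return income * PA_INCOME_TAX_RATE / 100.0;
}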
Note that in addition to variables and "constant variables", there are also some languages with enumerations. An enumeration allows you to define a type for a small group of constants (e.g., return values), so using them provides type safety.
For example, if I have an enumeration for the days of the weeks and for the months, I will be warned if I assign a month into a day. If I just use integer constants, there will be no warning when day 3 is assigned to month 3. You always want type safety, and it improves readability. Enumerations are also better for defining order. Imagine that you have constants for the days of the week, and now you want your week to start on Monday rather than Sunday.
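As a hedged sketch (the names are illustrative): in C the check is weaker than in C++, but the intent is visible to readers, and some compilers can warn about the mismatch (e.g. clang with -Wenum-conversion):

enum weekday { SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY };
enum month   { JANUARY, FEBRUARY, MARCH /* ... */ };

enum weekday d = MARCH;   /* compiles in C, but the mistake is at least expressible
                             and some compilers will warn; with plain int constants,
                             "day 3" and "month 3" are indistinguishable */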
Using constants is more a way of defensive programming, to protect yourself from yourself, from accidentally changing the value somewhere in the code when you're coding at 2 a.m. or before having drunk your coffee.
Technically, yes, you can use a variable instead.
Constants have several advantages over variables.
Constants provide some level of guarantee that code can't change the underlying value. This is not of much importance for a smaller project, but matters on a larger project with multiple components written by multiple authors.
Constants also provide a strong hint to the compiler for optimization. Since the compiler knows the value can't change, it doesn't need to load the value from memory and can optimize the code to work for only the exact value of the constant (for instance, the compiler can use shifts for multiplication/division if the const is a power of 2.)
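For example (a sketch with a hypothetical block size): with a compile-time constant that is a power of two, the division below can be compiled as a shift instead of a real divide:

enum { BLOCK_SIZE = 64 };               /* known constant, power of two */

unsigned blocks_needed(unsigned total_bytes)
{
    /* the compiler may emit (total_bytes + 63) >> 6 here */
    return (total_bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
}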
Constants are also inherently static - you can declare the constant and its value in a header file, and not have to worry about defining it in exactly one place.
For one, performance optimization.
More importantly, this is for human readers. Remember that your target audience is not only the compiler. It helps to express yourself in code, and avoid comments.
const int spaceTimeDimensions = 4;

if (gpsSatellitesAvailable >= spaceTimeDimensions)
    Good();
For a low-level language like C, constants allow for several compilation optimizations.
For a programming language in general, you don't really need them. High-level dynamic languages such as Ruby and JavaScript don't have them (or at least not in a true constant sense). Variables are used instead, just as you suggested.
A constant is for when you just want to share a value in memory that doesn't change.
The const keyword is often used for function parameters, particularly pointers, to indicate that the memory the pointer points to will not be modified by the function. Look at the declaration of strcpy, for instance:
char *strcpy(char *dest, const char *src);
Otherwise, for example, a declaration such as
const int my_magic_no = 54321;
might be preferred over:
#define MY_MAGIC_NO 54321
for type safety reasons.
It's a really easy way to trap a certain class of errors. If you declare a variable const, and accidentally try to modify it, the compiler will call you on it.
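A minimal sketch (the exact diagnostic wording varies by compiler):

#include <stdio.h>

int main(void)
{
    const int max_retries = 3;

    /* max_retries = 5; */   /* uncommenting this line is rejected at compile time,
                                e.g. "assignment of read-only variable" */
    printf("max retries: %d\n", max_retries);
    return 0;
}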
Constants are also useful when declaring and initializing values that serve a fixed purpose, such as the bound at the start of a loop or the value checked in an if-else condition.
For more reference, feel free to read either of the following articles:
Constants in C Programming Language
Variables in C Language
Not using const can mean that someone on a team project declares int FORTY_TWO = 42 and another team member later sets FORTY_TWO = 41 somewhere else. Then the end of the world happens and you also lose the answer to life. With const, none of this can ever happen. Plus, a const may be stored in a different region of memory than normal variables and can be more efficient.
I am aware of the fact that the C standard allows string literals with identical content to be stored in different locations, at least that's what I have been told, and what I take away from other posts here on SO, e.g. this or this one. However it strikes me as odd that the equality of location for these literals is not required by the standard, since it would guarantee smaller executables and speed up equality checks on string literals considerably, making them an O(1) operation instead of O(n).
I would like to know what arguments - from an implementer's POV - make it appealing to allow the locations of these literals to differ. Do compilers do any kind of optimization to make the savings from comparing the literals' locations irrelevant? I am well aware that doing such a comparison on the location would be useless if you compared a literal with a variable pointing to a different location containing the same string, but I am trying to understand how the people who make the standard look at this.
I can think of arguments why you would not want to do that, e.g. the subtle errors you might introduce, when you make a location based comparison an operation supported by the standard, but I am not entirely satisfied with what I could come up with.
I hope some of you can shed light on this.
Edit 1: First of all thank you for your answers and comments. Beyond that I would like to add some thoughts on some of the answers given:
#hvd: I think this is a problem for the specific additional optimization, not the idea of having a single instance per string literal.
#Shafik: I think your question makes it clear to me why having this set in stone would not allow for a lot of useful usages. It could only be used in code that is limited to the translation unit's scope. Once two files with the same string literal are compiled independently of each other, both would contain their own string literal at their own location. Objects would have to use an external reference or be recompiled every time they are combined with other objects containing the same literal.
I think I am sufficiently convinced that the less strict implementation spec as John Bollinger and FUZxxl suggested is preferable, given how little could be gained by JUST specifying that string literals should exist only once per translation unit.
Aside from older compilers that simply want to avoid doing unnecessary work, the requirement would not necessarily be useful even today.
Suppose you have one translation unit with the string literals "a" and "ba". Suppose you also have an optimising compiler that notices this translation unit's "a" can be optimised to "ba"+1.
Now suppose you have another translation unit with the string literals "a" and "ca". The same compiler would then optimise that translation unit's "a" to "ca"+1.
If the first translation unit's "a" must compare equal to the second translation unit's "a", compilers cannot merge strings like this, even though it is a useful optimisation to save space. (As FUZxxl points out in the comments, some linkers do this, and if one of those linkers is used, the compiler doesn't need to. Not all linkers do this, though, so it may still be a worthwhile optimisation in the compiler.)
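A small sketch of the kind of sharing in question; whether the two pointers coincide is up to the implementation:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *p = "ba";
    const char *q = "a";

    /* The implementation may share storage so that q == p + 1,
       or it may keep the two literals entirely separate. */
    printf("same location: %s\n", (q == p + 1) ? "yes" : "no");
    printf("same contents: %s\n", (strcmp(q, p + 1) == 0) ? "yes" : "no");
    return 0;
}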
The C standard has been traditionally written in a way that makes writing a basic C compiler a comparably simple task. This is important because a C compiler is usually among the first things that need to be provided on a new platform due to the ubiquity of the C language.
For this reason, the C standard:
provides syntax like the register keyword to aid dumb compilers,
does not mandate any optimizations,
does not specify many aspects of its behaviour.
Suppose pp is a pointer to an array of structs of length n. [was dynamically allocated]
Suppose I want to create a copy of that array of structs and make a pointer to it, the following way:
struct someStruct* pp2 = malloc(_appropriate_size_);
memcpy(pp2, pp, _appropriate_length_);
I also can make a loop and do pp2[i] = pp[i] for 0 <= i < n.
What is the difference between these approaches, and which is better and why?
There is no definitive answer for all architectures. You need to do profiling to figure out what method is best.
However, IMHO, I would imagine that memcpy would be faster, simply because somebody has taken the time to tune it for your particular architecture/platform and is able to use particular nuances to speed things up.
The former uses identifiers that are forbidden in C: anything prefixed with _. The latter is invalid in C89, as struct assignment was rationalised in C99. Assuming neither of these factors cause issues, there is no functional difference.
Better isn't well defined; if you define "better" as "compiles in C89", then the former might be better. However, if you define "better" as "has no undefined behaviour in C99", then the latter is better. If you define "better" as more optimal, then there is no definitive answer: one implementation may produce poor performance for both, while another may produce poor performance for one and perfectly optimal code for the other, or even the same code for both. This is pretty unlikely to be a bottleneck in your algorithm, anyway...
I would expect memcpy to be faster - it is usually tuned for the underlying platform and may possibly use DMA-initiated transfers (without L1/L2 cache copies). The for-loop may possibly involve extra transfers. However, it depends on how smart the underlying compiler is - if it spots a statically defined value for n, it may replace the loop with memcpy. It is worth timing the routines or checking the assembly code, as Mystical mentioned.
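For reference, a sketch of both approaches with the sizes spelled out (the members of struct someStruct are a placeholder assumption):

#include <stdlib.h>
#include <string.h>

struct someStruct { int a; double b; };   /* placeholder members */

struct someStruct *copy_structs(const struct someStruct *pp, size_t n)
{
    struct someStruct *pp2 = malloc(n * sizeof *pp2);
    if (pp2 == NULL)
        return NULL;

    /* one block copy of all n elements ... */
    memcpy(pp2, pp, n * sizeof *pp2);

    /* ... equivalent in effect to the element-by-element loop:
       for (size_t i = 0; i < n; i++) pp2[i] = pp[i]; */
    return pp2;
}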
I'm building a minishell in C, and have come to a roadblock that seems like it could be easily fixed by using global variables (3 to be exact). The reason I think globals are necessary is that the alternative would be to pass these variables to almost every single function in my program.
The variables are mainargc, mainargv, and shiftedMArgV. The first two are the number of arguments, and argument list passed to main, respectively. The variable shiftedMArgV is the argument list, however it may have been shifted. I'm trying to create builtin functions shift and unshift, and would make shiftedMArgV point to different arguments.
So, would it be stupid to just make these global? Otherwise I will have to revise a very large amount of code, and I'm not sure I'd be losing anything by making them global.
If I do make them global, would doing so from the main header file be foolish?
Thanks for the help guys, if you need any clarification just ask.
As an alternative to global variables, consider 'global functions':
extern int msh_mainArgC(void);
extern char **msh_mainArgV(void);
extern char **msh_shiftedArgV(void);
The implementations of those functions are trivial, but it allows you control over the access to the memory. And if you need to do something fancy, you can change the implementation of the functions. (I chose to capitalize the C and V to make the difference more visible; when only the last character of an 8-12 letter name is different, it is harder to spot the difference.)
There'd be an implementation file that would define these functions. In that file, there'd be static variables storing the relevant information, and functions to set and otherwise manipulate the variables. In principle, if you slap enough const qualifiers around, you could ensure that the calling code cannot modify the data except via the functions designed to do so (or by using casts to remove the const-ness).
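A sketch of what such an implementation file might look like (msh_setArgs and msh_shiftArgs are hypothetical names, added only to show where the data would be set and manipulated):

/* msh_args.c */
static int    mainArgC;
static char **mainArgV;
static char **shiftedArgV;

void msh_setArgs(int argc, char **argv)      /* called once from main() */
{
    mainArgC    = argc;
    mainArgV    = argv;
    shiftedArgV = argv;
}

void msh_shiftArgs(int by)                   /* used by the shift builtin */
{
    shiftedArgV += by;
}

int    msh_mainArgC(void)    { return mainArgC; }
char **msh_mainArgV(void)    { return mainArgV; }
char **msh_shiftedArgV(void) { return shiftedArgV; }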
Whether this is worthwhile for you is debatable. But it might be. It is a more nearly 'object-oriented' style of operation. It is an alternative to consider and then discard, rather than something to leave unconsidered.
Note that your subsystems that use these functions might have one function that collects the global values and then passes these down to its subordinate functions. This saves the subordinates from having to know where the values came from; they just operate on them correctly. If there are global variables, you have to worry about aliasing: is a function passed values (copies of the global variables) while also accessing the global variables directly? With the functions, you don't have to worry about that in the same way.
I would say that it is not stupid, but that you should proceed with a certain caution.
The reason globals are usually avoided is not that they should never be used, but rather that their usage has frequently led programmers to crash and burn. Through experience one learns the difference between the right time and the wrong time to use them.
If you have thought deeply about the problem you are trying to solve, considered the code you've written to solve it, and also considered the future of this code (i.e. whether you are compromising maintainability), and you feel that a global is either unavoidable or better represents the coded solution, then you should go with the global.
Later, you may crash and burn, but that experience will help you discern what a better choice would have been. Conversely, if you feel as though not using the globals may lead to crashage and burnage, then this is your prior experience saying you should use them. You should trust such instincts.
Dijkstra has a paper in which he discusses the harm the goto statement may cause, but his discussion also, in my opinion, explains some of our difficulties with globals. It may be worth a read.
This answer and this answer may also be of use.
Globals are OK as long as they are really globals in a logical way, and not just a means to make your life easier. For example, globals can describe an environment in which your program executes or, in other words, attributes that are relevant at the system level of your app.
Pretty much all complex software I ever worked with had a set of well defined globals. There's nothing wrong with that. They range from just a handful to about a dozen. In the latter case they're usually grouped logically in structs.
Globals are usually exposed in a header file as externs, and then defined in a source file. Just remember that globals are shared between threads and thus must be protected, unless it makes more sense to declare them with thread local storage.
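For example (a minimal sketch with a hypothetical setting):

/* shell_state.h */
extern int g_interactive;       /* shared, logically global setting */

/* shell_state.c */
int g_interactive = 0;          /* single definition */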
For a shell, you have a lot more state than this. You have the state of remapped file descriptors (which you need to track for various reasons), trap dispositions, set options, the environment and shell variables, ...
In general, global variables are the wrong solution. Instead, all of the state should be kept in a context structure of some sort, a pointer to which is passed around everywhere. This is good program design, and usually it allows you to have multiple instances of the same code running in the same process (e.g. multiple interpreters, multiple video decoders, etc.).
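A minimal sketch of that context-structure style (the names are illustrative, not from the original code):

struct shell_ctx {
    int    argc;               /* original argument count */
    char **argv;               /* original argument vector */
    char **shifted_argv;       /* adjusted by the shift/unshift builtins */
};

int builtin_shift(struct shell_ctx *ctx, int by)
{
    ctx->shifted_argv += by;   /* no global state touched */
    return 0;
}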
With that said, the shell is a very special case, because it also deals with a lot of global state you can't keep in a structure: signal dispositions, file descriptors and mappings, child processes, process groups, controlling terminal, etc. It may be possible to abstract a lot of this with an extra layer so that you can emulate the required behavior while keeping clean contexts that could exist in multiplicity within a single process, but that's a much more difficult task than writing a traditional shell. As such, I might give yourself some leeway to write your shell "the lazy way" using global variables. If this is a learning exercise, though, try to carefully identify each time you're introducing global variables or state, why you're doing it, and how you might be able to implement the program differently without having global state. This will be very useful to you in the future.
20 years ago, there were (almost) no compiler optimizations. So, we started to use some hacks, such as:
Use pointers, not array indexes.
Don't use small functions (such as swap()), use macros or write the code directly.
Today, we have complex compiler optimizations. Array indexes and pointers are the same. If we use -O3 (I know, it's dangerous), the compiler will remove all functions except main().
So, are the small hacks in the old books (Programming Pearls, The C Programming Language) useless today? Do they just make the code more unreadable?
Programming Pearls is about optimisation at the algorithm level, not at the code level, so it's still highly relevant today.
Code micro-optimisations are another story though, and many of the old tricks are now either redundant or even harmful. There are still important techniques that can be applied to performance-critical code today, but these also may become redundant/harmful at some point in the future. You need to keep up-to-date with advances in CPU micro-architecture and compiler technology and use only what's appropriate (and only when absolutely needed of course - premature optimisation being the root of all evil.)
"Use pointers, not array indexes."
Using pointers has never been more efficient than using array indexes. Even the old drafts of ANSI C specified that they are equivalent:
3.3.2.1 Array subscripting
The definition of the subscript operator [] is that E1[E2] is identical to
(*(E1+(E2)))
"Don't use small functions (such as swap()), use macros or write the code directly."
This has been obsolete for quite a while. C99 introduced the inline keyword, but even before that, compilers were free to inline parts of the code. It makes no sense to write such function-like macros today for efficiency reasons.
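For instance, a sketch contrasting the two styles (swap_int is just an illustrative name):

/* a small function the compiler is free to inline (explicitly so since C99) */
static inline void swap_int(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* the old macro style it replaces: harder to read and easier to misuse */
#define SWAP_INT(a, b) do { int tmp = (a); (a) = (b); (b) = tmp; } while (0)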
"So, the small hacks in the old books (Programming Pearls, The C Programming Language) are useless today? They are just make the code more unreadable?"
Please note that what follows here is just my personal opinion and not a consensus among the world's programmer community: I would personally say that those two books are not only useless, they are harmful. Not so much because of various optimization tricks, but mainly because of the horrible, unreadable coding style and the heavy reliance on poorly-defined behavior. Both books are also filled with bugs and typos, so you can't even read them without the errata next to you.
Those hacks are still useful in case you are not allowed to turn on optimization for whatever reason. Sometimes the compiler will also not be able to optimize code because it does not know about the intended and unintended side effects of a certain piece of code.
It really depends on what requirements you have. In my experience, there are still things you can express in better ways in order to help the compiler understand your intention better. It's always a trade-off whether to sacrifice readability in order to gain a better compilation result.
Basically, yes. But, if you do find a particularly ridiculous example of a missed optimization opportunity, then you should report it to the developers!
Braindead source code will always produce braindead machine code though: to a certain extent the compiler still has to do what you say, rather than what you meant, although many common idioms are recognised and "fixed" (the rule is that it has got to be impossible to tell that it's been altered without using a debugger).
And then there are still tricks, new and old, that are useful, at least on some architectures.
For example, if you have a loop that counts from 0 to 100 and does something to an array, some compilers might reverse the counter and make it go from 100 down to zero (because comparing against zero is cheaper than comparing against another constant), but they can't do that if your loop has a side effect. If you don't care that the side effect happens in reverse order, then you can get better code if you reverse the counter yourself.
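A sketch of the idea, assuming the order of iteration genuinely does not matter:

long sum_squares(const int *a, int n)
{
    long total = 0;
    for (int i = n - 1; i >= 0; i--)   /* count down: the loop test is a compare against zero */
        total += (long)a[i] * a[i];
    return total;
}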
Another useful trick that GCC has is __builtin_expect(expr, bool), with which you can tell the compiler that expr is likely to be true or false, so it can optimize branches accordingly. Similarly, __builtin_unreachable() can tell GCC that something can't happen, so it doesn't have to allow for the case where it does.
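For example (handle_alloc_failure is a hypothetical function; the builtin itself is a GCC/Clang extension):

#include <stdlib.h>

void handle_alloc_failure(void);            /* hypothetical error handler */

void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if (__builtin_expect(p == NULL, 0))     /* tell the compiler the NULL branch is unlikely */
        handle_alloc_failure();
    return p;
}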
In general though, the compiler is good enough that you really don't need to care unless your program spends 90% of its runtime in that one tiny function. (For example, memcpy is still typically written in assembler).
I have just discovered the joy of bitflags. I have several questions related to "best-practices" regarding the use of bitflags in C. I learned everything from various examples I found on the web but still have questions.
In order to save space, I am using a single 32-bit integer field in a struct (A->flag) to represent several different sets of boolean properties. In all, 20 different bits are #defined. Some of these are truly presence/absence flags (STORAGE-INTERNAL vs. STORAGE-EXTERNAL). Others have more than two values (e.g. a mutually exclusive set of formats: FORMAT-A, FORMAT-B, FORMAT-C). I have defined macros for setting specific bits (and simultaneously turning off mutually exclusive bits). I have also defined macros for testing if specific combinations of bits are set in the flag.
However, what is lost in the above approach is the specific grouping of flags that is best captured by enums. For writing functions, I would like to use enums (e.g., STORAGE-TYPE and FORMAT-TYPE), so that function definitions look nice. I expect to use enums only for passing parameters and #defined macros for setting and testing flags.
(a) How do I define flag (A->flag) as a 32-bit integer in a portable fashion (across 32-bit/64-bit platforms)?
(b) Should I worry about potential size differences in how A->flag vs. #defined constants vs. enums are stored?
(c) Am I making things unnecessarily complicated, meaning should I just stick to using #defined constants for passing parameters as ordinary ints? What else should I worry about in all this?
I apologize for the poorly articulated question. It reflects my ignorance about potential issues.
There is a C99 header that was intended to solve that exact problem (a), but for some reason Microsoft doesn't implement it. Fortunately, you can get <stdint.h> for Microsoft Windows here. Every other platform will already have it. The 32-bit int types are uint32_t and int32_t. These also come in 8-, 16-, and 64-bit flavors.
So, that takes care of (a).
(b) and (c) are kind of the same question. We do make assumptions whenever we develop something. You assume that C will be available. You assume that <stdint.h> can be found somewhere. You could always assume that int was at least 16 bits and now a >= 32 bit assumption is fairly reasonable.
In general, you should try to write conforming programs that don't depend on layout, but they will make assumptions about word length. You should worry about performance at the algorithm level, that is, am I writing something that is quadratic, polynomial, exponential?
You should not worry about performance at the operation level until (a) you notice a performance lag, and (b) you have profiled your program. You need to get your job done without bogging down worrying about individual operations. :-)
Oh, I should add that you particularly don't need to worry about low level operation performance when you are writing the program in C in the first place. C is the close-to-the-metal go-as-fast-as-possible language. We routinely write stuff in php, python, ruby, or lisp because we want a powerful language and the CPU's are so fast these days that we can get away with an entire interpreter, never mind a not-perfect choice of bit-twiddle-word-length ops. :-)
You can use bit-fields and let the compiler do the bit twiddling. For example:
struct PropertySet {
    unsigned internal_storage : 1;
    unsigned format           : 4;
};

int main(void) {
    struct PropertySet x = { 0 };          /* initialise before reading the bit-fields */
    struct PropertySet y[10] = { { 0 } };  /* array of structures containing bit-fields */

    if (x.internal_storage) x.format |= 2;
    if (y[2].internal_storage) y[2].format |= 2;
    return 0;
}
Edited to add array of structures
As others have said, your problem (a) is resolvable by using <stdint.h> and either uint32_t or uint_least32_t (if you want to worry about Burroughs mainframes which have 36-bit words). Note that MSVC does not support C99, but #DigitalRoss shows where you can obtain a suitable header to use with MSVC.
Your problem (b) is not an issue; C will type convert safely for you if it is necessary, but it probably isn't even necessary.
The area of most concern is (c) and in particular the format sub-field. There, 3 values are valid. You can handle this by allocating 3 bits and requiring that the 3-bit field is one of the values 1, 2, or 4 (any other value is invalid because of too many or too few bits set). Or you could allocate a 2-bit number, and specify that either 0 or 3 (or, if you really want to, one of 1 or 2) is invalid. The first approach uses one more bit (not currently a problem since you're only using 20 of 32 bits) but is a pure bitflag approach.
When writing function calls, there is no particular problem writing:
some_function(FORMAT_A | STORAGE_INTERNAL, ...);
This will work whether FORMAT_A is a #define or an enum (as long as you specify the enum value correctly). The called code should check whether the caller had a lapse in concentration and wrote:
some_function(FORMAT_A | FORMAT_B, ...);
But that is an internal check for the module to worry about, not a check for users of the module to worry about.
If people are going to be switching bits in the flags member around a lot, the macros for setting and unsetting the format field might be beneficial. Some might argue that any pure boolean fields barely need it, though (and I'd sympathize). It might be best to treat the flags member as opaque and provide 'functions' (or macros) to get or set all the fields. The less people can get wrong, the less will go wrong.
Consider whether using bit-fields works for you. My experience is that they lead to big code and not necessarily very efficient code; YMMV.
Hmmm...nothing very definitive here, so far.
I would use enums for everything because those are guaranteed to be visible in a debugger where #define values are not.
I would probably not provide macros to get or set bits, but I'm a cruel person at times.
I would provide guidance on how to set the format part of the flags field, and might provide a macro to do that.
Like this, perhaps:
enum { ..., FORMAT_A = 0x0010, FORMAT_B = 0x0020, FORMAT_C = 0x0040, ... };
enum { FORMAT_MASK = FORMAT_A | FORMAT_B | FORMAT_C };
#define SET_FORMAT(flag, newval) (((flag) & ~FORMAT_MASK) | (newval))
#define GET_FORMAT(flag) ((flag) & FORMAT_MASK)
SET_FORMAT is safe if used accurately but horrid if abused. One advantage of the macros is that you could replace them with a function that validated things thoroughly if necessary; this works well if people use the macros consistently.
For question (a), if you are using C99 (you probably are), you can use the uint32_t type defined in the <stdint.h> header file.
Regarding (c): if your enumerations are defined correctly you should be able to pass them as arguments without a problem. A few things to consider:
Enumeration storage is often compiler-specific, so depending on what kind of development you are doing (you don't mention if it's Windows vs. Linux vs. embedded vs. embedded Linux :) ) you may want to visit the compiler options for enum storage to make sure there are no issues there. I generally agree with the above consensus that the compiler should cast your enumerations appropriately - but it's something to be aware of.
In the case that you are doing embedded work, many static quality checking programs such as PC Lint will "bark" if you start getting too clever with enums, #defines, and bitfields. If you are doing development that will need to pass through any quality gates, this might be something to keep in mind. In fact, some automotive standards (such as MISRA-C) get downright irritable if you try to get tricky with bitfields.
"I have just discovered the joy of bitflags." I agree with you - I find them very useful.
I added comments to each answer above. I think I have some clarity. It seems enums are cleaner, as they show up in the debugger and keep fields separate. Macros can be used for setting and getting values.
I have also read that enums are stored as small integers - which, as I understand it, is not a problem with the boolean tests, as these would be performed starting at the rightmost bits. But can enums be used to store large integers (1 << 21)??
thanks again to you all. I have already learned more than I did two days ago!!
~Russ