Arithmetic operations between constants (C)

Consider this code:
#define A 5
#define B 3
int difference = A - B;
Is the value of "difference" hardcoded as 2 at compile time, or is it calculated at runtime?

The A and B macros are a bit of a distraction. This:
#define A 5
#define B 3
int difference = A - B;
is exactly equivalent to this:
int difference = 5 - 3;
so let's discuss the latter.
5 - 3 is a constant expression, which is an expression that "can be evaluated during translation rather than runtime, and accordingly may be used in any place that a constant may be". It's also an *integer constant expression*. For example, a case label must be an integer constant expression, so you could write either this:
switch (foo) {
case 2: /* this is a constant */
...
}
or this:
switch (foo) {
case 5 - 3: /* this is a constant expression */
...
}
But note that the definition says that it can be evaluated during translation, not that it must be. There are some contexts that require constant expressions, and in those contexts the expression must be evaluated at compile time.
But assuming that difference is declared inside some function, the initializer is not one of those contexts.
Any compiler worth what you pay for it (even if it's free) will reduce 5 - 3 to 2 at compile time, and generate code that stores the value 2 in difference. But it's not required to do so. The C standard specifies the behavior of programs; it doesn't specify how that behavior must be implemented. But it's safe to assume that whatever compiler you're using will replace 5 - 3 by 2.
Even if you write:
int difference = 2;
a compiler could legally generate code that loads the value 5 into a register, subtracts 3 from it, and stores the contents of the register into difference. That would be a silly thing to do, but the language standard doesn't exclude it.
As long as the final result is that difference has the value 2, the language standard doesn't care how it's done.
On the other hand, if you write:
switch (foo) {
case 5 - 3: /* ... */
case 2: /* ... */
}
then the compiler must compute the result so it can diagnose the error (you can't have two case labels with the same value).
Finally, if you define difference at file scope (outside any function), then the initial value does have to be constant. But the real distinction in that case is not whether 5 - 3 will be evaluated at compile time, it's whether you're allowed to use a non-constant expression.
Reference: The latest draft of the 2011 C standard is N1570 (large PDF); constant expressions are discussed in section 6.6.

The standard does not specify this sort of thing. It says nothing about potential optimizations like this (and for good reason: a standard defines semantics, not implementation).
Why not look at the disassembly for your compiler? That will give you a definitive answer.
...
So let's do that.
Here is the output from VC++ 10:
#include <iostream>
#define A 5
#define B 3
int main() {
int x = A - B;
std::cout << x; // make sure the compiler doesn't toss it away
010A1000 mov ecx,dword ptr [__imp_std::cout (10A2048h)]
010A1006 push 2
010A1008 call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (10A2044h)]
return 0;
010A100E xor eax,eax
As you can see, it just replaced the occurrence of x with a static value of 2 and pushed it onto the stack for the call to cout. It did not evaluate the expression at runtime.


Can two implementation defined identical expressions give different results?

Related to: Three questions: Is NULL - NULL defined? Is (uintptr_t)NULL - (uintptr_t)NULL defined?
Let's consider:
Case 1:
(uintptr_t)NULL - (uintptr_t)NULL will the result always be zero?
Case 2 (inspired by Eric's comment):
uintptr_t x = (uintptr_t)NULL;
will x - x be always zero?
Case 3:
uintptr_t x = (uintptr_t)NULL, y = (uintptr_t)NULL;
Will x-y be always zero?
Case 4:
void *a;
/* .... */
uintptr_t x = (uintptr_t)a, y = (uintptr_t)a;
Will x-y be always zero?
If not - why?
Can two implementation defined identical expressions give different results?
Yes. It's "implementation-defined" - all rules are up to implementation. An imaginary implementation may look like this:
int main() {
void *a = 0;
#pragma MYCOMPILER SHIFT_UINTPTR 0
printf("%d\n", (int)(uintptr_t)a); // prints 0
#pragma MYCOMPILER SHIFT_UINTPTR 5
printf("%d\n", (int)(uintptr_t)a); // prints 5
}
Still such an implementation would be insane on most platforms.
I could imagine an example: an architecture that has to deal with memory in "banks", where the compiler uses a #pragma switch to select the bank used for dereferencing pointers.
(uintptr_t)NULL - (uintptr_t)NULL will the result always be zero?
Not necessarily.
will x - x be always zero?
Yes: uintptr_t is an unsigned integer type, so it has to obey the laws of arithmetic.
Will x-y be always zero?
Not necessarily.
Will x-y be always zero?
Not necessarily.
If not - why?
The result of a conversion from void* to uintptr_t is implementation-defined; the implementation may convert the pointer value to a different uintptr_t value each time, which would result in a non-zero difference between the values.
I could imagine an example: on some imaginary architecture pointers are 48 bits wide, while uintptr_t is 64 bits. A compiler for such an architecture simply "doesn't care" what is in the extra 16 bits, and when converting uintptr_t back to a pointer it uses only the 48 bits. When converting a pointer to uintptr_t, the compiler leaves whatever garbage value was in the registers in the extra 16 bits, because that is fast on that specific architecture and because those bits will never be used when converting back.
You won't find a system in use anywhere where NULL isn't defined as some form of 0, be it the literal value or a void * with value 0, so all of your checks will either work as you'd expect or be syntax errors (you can't subtract void * values).
Could it be defined as anything else, though? Theoretically. But it's still a constant, so subtracting it from itself (language and type allowing) will still yield 0.

Risks of adding uint to enum in function call

I have a function that returns a float number:
float function(myenum value);
I then have an enum
typedef enum
{
a = 0,
b,
c
} myenum;
I want to do the following:
function(a+1);
And I wonder if there are any risks other than the risk of unexpected behaviour if the enum changes. My question might seem dumb but I have to make sure that there are no risks of what I'm doing.
Please don't ask questions on why it's done like this. Because I don't know. I just need to know if it's safe or not.
This is safe. Moreover, the standard guarantees that a+1 is b and a+2 is c in the scenario that you describe:
C99 standard, section 6.7.2.2, part 3: If the first enumerator has no =, the value of its enumeration constant is 0. Each subsequent enumerator with no = defines its enumeration constant as the value of the constant expression obtained by adding 1 to the value of the previous enumeration constant.
It's safe. As you seem to recognise yourself, it's really working against the way enums are intended to work, which is as arbitrary labels. However, sometimes you want an ordering such that a < b < c. If a = 0, b = 1, and c = 2 in some firm sense, then you don't want an enum; you want a variable of type int.

Using multiplied macro in array declaration

I know the following is valid code:
#define SOMEMACRO 10
int arr[SOMEMACRO];
which would result as int arr[10].
If I wanted to make an array 2x size of that (and still need the original macro elsewhere), is this valid code?
#define SOMEMACRO 10
int arr[2 * SOMEMACRO];
which would be int arr[2 * 10] after precompilation. Is this still considered as constant expression by the compiler?
After a quick look it seems to work, but is this defined behavior?
Yes, it will work. The macro is expanded textually by the preprocessor, so arr[2 * SOMEMACRO] becomes arr[2 * 10], which is perfectly valid.
To inspect the preprocessed output you can use cc -E foo.c.
Is this still considered as constant expression by the compiler?
Yes. That's the difference between a constant expression and a literal: a constant expression need not be a single literal, but can be any expression whose value can be computed at compile time (i.e. a combination of literals or other constant expressions).
(Just for the sake of clarity: of course literals are still considered constant expressions.)
However, in C, the size of an array need not be a compile-time constant. C99 supports variable-length arrays (VLAs), and C11 keeps them as an optional feature, so
size_t sz = ...; /* some size calculated at runtime */
int arr[sz];
is valid C as well.
Yes you can use this expression. It will not result in UB.
Note that an array size may be any integer constant expression:
#define i 5
#define j 4
int a[i + j * 10] = {0};
The value of the size expression i + j * 10 is calculated at compile time.
Yes, as long as it evaluates to a valid size it's a constant expression, and if it worked for you then the compiler handled it just fine.
As you know, we can't write
int x;
scanf("%d", &x);
int arr[2 * x];
and call 2 * x a constant, because it isn't one (C99 does allow this form as a variable-length array, though). What you've written is a compile-time constant, so you're good to go.

Does the C99 standard permit assignment of a variable to itself?

Does the C99 standard allow variables to be assigned to themselves? For instance, are the following valid:
int a = 42;
/* Case 1 */
a = a;
/* Case 2 */
int *b = &a;
a = *b;
While I suspect Case 1 is valid, I'm hesitant to say the same for Case 2.
In the case of an assignment, is the right side completely evaluated before assigning the value to the variable on the left -- or is a race condition introduced when dereferencing a pointer to the variable being assigned?
Both cases are perfectly valid, since the value of a is only used to determine the value that is to be stored, not to determine the object in which this value is to be stored.
In essence in an assignment you have to distinguish three different operations
determine the object to which the value is to be stored
evaluate the RHS
store the determined value in the determined object
The first two of these three operations can be done in any order, even in parallel. The third is obviously a consequence of the other two, so it comes after them.
This is perfectly valid; you are only using the previous value to determine the value to be stored. This is covered in the draft C99 standard, section 6.5, paragraph 2, which says:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
One of the examples of valid code is as follows:
i = i + 1;
The sequence point rules for C and C++ cover the different places where a sequence point can occur.
C99 6.5.16.1 Simple assignment
3 If the value being stored in an object is read from another object that overlaps in any way
the storage of the first object, then the overlap shall be exact and the two objects shall
have qualified or unqualified versions of a compatible type; otherwise, the behavior is
undefined.
I think the example code satisfies the "overlap" condition. Since the two objects are qualified or unqualified versions of a compatible type, the behavior is well-defined.
Also 6.5.16 Assignment operators
4 The order of evaluation of the operands is unspecified. If an attempt is made to modify
the result of an assignment operator or to access it after the next sequence point, the
behavior is undefined.
Still, there's no "attempt to modify the result", so the code is valid.
Assuming the compiler doesn't optimize the first statement away entirely, there is even a potential race condition here (in multithreaded code). On most architectures, if a is stored in memory, a = a will be compiled into two move instructions (memory to register, register to memory) and is therefore not atomic.
Here is an example:
int a = 1;
int main()
{ a = a; }
Result on an Intel x86_64 with gcc 4.7.1
4004f0: 8b 05 22 0b 20 00 mov 0x200b22(%rip),%eax # 601018 <a>
4004f6: 89 05 1c 0b 20 00 mov %eax,0x200b1c(%rip) # 601018 <a>
I can't see a C compiler not permitting a = a. Such an assignment may occur serendipitously due to macros, without the programmer even knowing it. It may not generate any code at all, but that is an optimization issue.
#define FOO (a)
...
a = FOO;
Sample code readily compiles and my review of the C standard shows no prohibition.
As to race conditions, @Yu Hao answers that well: there is no race condition.

Difference and definition of literal and symbolic constants in C?

I am having trouble getting to grips with the definition and uses of symbolic and literal constants, and I was wondering if anyone could explain them and highlight their differences. Thanks!
A literal constant is a value typed directly into your program wherever it is needed. For example
int tempInt = 10;
tempInt is a variable of type int; 10 is a literal constant. You can't assign a value to 10, and its value can't be changed. A symbolic constant is a constant that is represented by a name, just as a variable is represented. Unlike a variable, however, after a constant is initialized, its value can't be changed.
If your program has one integer variable named students and another named classes, you could compute how many students you have, given a known number of classes, if you knew there were 15 students per class:
students = classes * 15;
A symbol is something that the compiler deals with. The compiler treats a const pretty much the way it treats a variable. On the other hand, a #define is something the compiler is not even aware of, because the preprocessor transforms it into its value. It's like search-and-replace. If you do
#define A 5
and then
b += A;
the preprocessor translates it into
b += 5;
and all the compiler sees is the number 5.
(Borrowing from earlier posts)
A literal constant is a value typed directly into your program wherever it is needed. For example
int breakpoint = 10;
The variable breakpoint is an integer (int); 10 is a literal constant. You can't assign a value to 10, and its value can't be changed. Unlike a variable, a constant can't be changed after it is assigned a value (initialized).
A symbol is something that the compiler deals with. In this example, TEN is a symbolic constant created using the #define directive. A #define is something the compiler is not even aware of, because the preprocessor transforms it into its assigned (defined) value. The preprocessor searches out and replaces every symbolic constant inside your program with its value.
#define TEN 10 /* These two lines of code become one... */
breakpoint += TEN; /* after running through the precompiler */
The preprocessor translates it into
breakpoint += 10;
The compiler never sees TEN, only its assigned value, 10. Why is this useful? Suppose the value needs to change from 10 to 11. Rather than searching through the entire program and changing every use of the literal constant 10, you change the definition of the single symbolic constant TEN to 11 and let the preprocessor make the changes for you.
I think what you mean is that a literal constant is a primitive expression like "string" or 2 or false, while a symbolic one is when you give it a name, like const int MagicNumber = 42. Both can be used as expressions, but you can refer to the latter by name. This is useful when you use the same constant in many places.
