Getting an address using both square brackets and ampersand - c

Consider:
int a[2] = {0,1};
int *address_of_second = (&a[1]);
I assume this works because it's translated to &*(a+1) and then the & and * cancel each other out, but can I count on it, or is it compiler-specific? That is, does the C standard have anything to say about this?
Is this a decent way to write it?
Personally I think that writing:
int *address_of_second = a + 1;
is better. Do you agree?
Thanks.

You can count on the behaviour in your first code example.
I assume it works because it's translated to &*(a+1) and then the & and * cancel each other out, but can I count on it, or is it compiler-specific? That is, does the standard have anything to say about this?
a[1] is the same as *(a + 1), and &a[1] is the same as &(*(a + 1)), which gives you a pointer (&) to the (dereferenced, *) int at a + 1. This is well-defined behaviour, which you can count on.
int *address_of_second = a + 1;
This is readable, but not quite as readable as &a[1] in my opinion. &a[1] explicitly shows that a is a pointer, that you're referencing an offset of that pointer, and that you're getting a pointer to that offset. a + 1 is slightly more ambiguous in that the actual line doesn't tell you anything about what a is (you can deduce that it's a pointer but for all you know from that snippet a could be just an int).
Even so, that's just my opinion. You're free to make up your own style decisions, so long as you understand that, behind the scenes, they're the same at the lowest level.
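For what it's worth, here's a minimal sketch (variable names are mine) showing that both spellings produce the same pointer:
#include <assert.h>
#include <stdio.h>

int main(void)
{
    int a[2] = {0, 1};
    int *p1 = &a[1];     /* subscript, then address-of */
    int *p2 = a + 1;     /* plain pointer arithmetic */
    assert(p1 == p2);    /* guaranteed by the standard */
    printf("%d\n", *p1); /* prints 1 */
    return 0;
}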

It is a matter of style, or of code convention if you work in a group. Both forms are also valid C++ on all conforming compilers.
I prefer the former :)

Both work. My personal preference would be either &(a[1]) or a+1. I prefer the bracket placement in the first because it makes it very clear what I'm taking the address of. I'd tend to use that in most situations. If the code in question is doing a lot of pointer arithmetic, then I might use the second form.

Related

GCC accepts index[var] (e.g. 2[a]) and treats it the same as var[index] (e.g. a[2]) [duplicate]

As Joel points out in Stack Overflow podcast #34, in The C Programming Language (a.k.a. K&R), there is mention of this property of arrays in C: a[5] == 5[a]
Joel says that it's because of pointer arithmetic but I still don't understand. Why does a[5] == 5[a]?
The C standard defines the [] operator as follows:
a[b] == *(a + b)
Therefore a[5] will evaluate to:
*(a + 5)
and 5[a] will evaluate to:
*(5 + a)
a is a pointer to the first element of the array. a[5] is the value that's 5 elements further from a, which is the same as *(a + 5), and from elementary school math we know those are equal (addition is commutative).
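A quick sketch verifying the identity (purely illustrative):
#include <stdio.h>

int main(void)
{
    int a[6] = {10, 20, 30, 40, 50, 60};
    /* all four spellings designate the same element */
    printf("%d %d %d %d\n", a[5], 5[a], *(a + 5), *(5 + a)); /* 60 60 60 60 */
    return 0;
}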
Because array access is defined in terms of pointers. a[i] is defined to mean *(a + i), which is commutative.
I think something is being missed by the other answers.
Yes, p[i] is by definition equivalent to *(p+i), which (because addition is commutative) is equivalent to *(i+p), which (again, by the definition of the [] operator) is equivalent to i[p].
(And in array[i], the array name is implicitly converted to a pointer to the array's first element.)
But the commutativity of addition is not all that obvious in this case.
When both operands are of the same type, or even of different numeric types that are promoted to a common type, commutativity makes perfect sense: x + y == y + x.
But in this case we're talking specifically about pointer arithmetic, where one operand is a pointer and the other is an integer. (Integer + integer is a different operation, and pointer + pointer is nonsense.)
The C standard's description of the + operator (N1570 6.5.6) says:
For addition, either both operands shall have arithmetic type, or one
operand shall be a pointer to a complete object type and the other
shall have integer type.
It could just as easily have said:
For addition, either both operands shall have arithmetic type, or the left
operand shall be a pointer to a complete object type and the right operand
shall have integer type.
in which case both i + p and i[p] would be illegal.
In C++ terms, we really have two sets of overloaded + operators, which can be loosely described as:
pointer operator+(pointer p, integer i);
and
pointer operator+(integer i, pointer p);
of which only the first is really necessary.
So why is it this way?
C++ inherited this definition from C, which got it from B (the commutativity of array indexing is explicitly mentioned in the 1972 Users' Reference to B), which got it from BCPL (manual dated 1967), which may well have gotten it from even earlier languages (CPL? Algol?).
So the idea that array indexing is defined in terms of addition, and that addition, even of a pointer and an integer, is commutative, goes back many decades, to C's ancestor languages.
Those languages were much less strongly typed than modern C is. In particular, the distinction between pointers and integers was often ignored. (Early C programmers sometimes used pointers as unsigned integers, before the unsigned keyword was added to the language.) So the idea of making addition non-commutative because the operands are of different types probably wouldn't have occurred to the designers of those languages. If a user wanted to add two "things", whether those "things" are integers, pointers, or something else, it wasn't up to the language to prevent it.
And over the years, any change to that rule would have broken existing code (though the 1989 ANSI C standard might have been a good opportunity).
Changing C and/or C++ to require putting the pointer on the left and the integer on the right might break some existing code, but there would be no loss of real expressive power.
So now we have arr[3] and 3[arr] meaning exactly the same thing, though the latter form should never appear outside the IOCCC.
And, of course
("ABCD"[2] == 2["ABCD"]) && (2["ABCD"] == 'C') && ("ABCD"[2] == 'C')
The main reason for this was that back in the '70s when C was designed, computers didn't have much memory (64KB was a lot), so the C compiler didn't do much syntax checking. Hence "X[Y]" was rather blindly translated into "*(X+Y)".
This also explains the "+=" and "++" syntaxes. Everything in the form "A = B + C" had the same compiled form. But if B was the same object as A, then an assembly-level optimization was available, and since the compiler wasn't bright enough to recognize it, the developer had to make it explicit (A += C). Similarly, if C was 1, a different assembly-level optimization was available, and again the developer had to make it explicit, because the compiler didn't recognize it. (More recent compilers do, so those syntaxes are largely unnecessary these days.)
One thing no one seems to have mentioned about Dinah's problem with sizeof:
You can only add an integer to a pointer; you can't add two pointers together. That way, when adding a pointer to an integer, or an integer to a pointer, the compiler always knows which operand has a size that needs to be taken into account.
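A small sketch of that scaling, assuming nothing beyond standard pointer arithmetic:
#include <stdio.h>

int main(void)
{
    double d[4];
    double *p = d;
    /* p + 1 advances by one element, i.e. sizeof(double) bytes */
    printf("%zu\n", (size_t)((char *)(p + 1) - (char *)p)); /* e.g. 8 */
    return 0;
}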
To answer the question literally: it is not always true that x == x.
double zero = 0.0;
double a[] = { 0,0,0,0,0, zero/zero}; // NaN
cout << (a[5] == 5[a] ? "true" : "false") << endl;
prints
false
I just found out this ugly syntax can be "useful", or at least very fun to play with, when you want to deal with an array of indices that refer to positions in the same array. It can replace nested square brackets and make the code more readable!
int a[] = { 2 , 3 , 3 , 2 , 4 };
int s = sizeof a / sizeof *a; // s == 5
for(int i = 0 ; i < s ; ++i) {
cout << a[a[a[i]]] << endl;
// ... is equivalent to ...
cout << i[a][a][a] << endl; // but I prefer this one, it's easier to increase the level of indirection (without loop)
}
Of course, I'm quite sure that there is no use case for that in real code, but I found it interesting anyway :)
Nice question/answers.
Just want to point out that C pointers and arrays are not the same, although in this case the difference is not essential.
Consider the following declarations:
int a[10];
int* p = a;
In a.out, the symbol a is at an address that's the beginning of the array, and symbol p is at an address where a pointer is stored, and the value of the pointer at that memory location is the beginning of the array.
For pointers in C, we have
a[5] == *(a + 5)
and also
5[a] == *(5 + a)
Hence it is true that a[5] == 5[a].
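A sketch that makes the array/pointer distinction visible at run time (the actual addresses will vary, of course):
#include <stdio.h>

int main(void)
{
    int a[10];
    int *p = a;
    printf("a  = %p\n", (void *)a);  /* address of the array's first element */
    printf("&a = %p\n", (void *)&a); /* same address: the array itself */
    printf("p  = %p\n", (void *)p);  /* same value as a */
    printf("&p = %p\n", (void *)&p); /* different: p has its own storage */
    return 0;
}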
Not an answer, but just some food for thought.
If a class has an overloaded index/subscript operator, the expression 0[x] will not work:
class Sub
{
public:
int operator [](size_t nIndex)
{
return 0;
}
};
int main()
{
Sub s;
s[0];
0[s]; // ERROR
}
Since we don't have access to an int "class" (int is a built-in type), this cannot be done:
class int
{
int operator[](const Sub&);
};
There is a very good explanation of this in A Tutorial on Pointers and Arrays in C by Ted Jensen, who explained it as follows:
In fact, this is true, i.e. wherever one writes a[i] it can be replaced with *(a + i) without any problems. In fact, the compiler will create the same code in either case. Thus we see that pointer arithmetic is the same thing as array indexing. Either syntax produces the same result.
This is NOT saying that pointers and arrays are the same thing, they are not. We are only saying that to identify a given element of an array we have the choice of two syntaxes, one using array indexing and the other using pointer arithmetic, which yield identical results.
Now, looking at this last expression, part of it, (a + i), is a simple addition using the + operator, and the rules of C state that such an expression is commutative. That is, (a + i) is identical to (i + a). Thus we could write *(i + a) just as easily as *(a + i).
But *(i + a) could have come from i[a]! From all of this comes the curious truth that if:
char a[20];
writing
a[3] = 'x';
is the same as writing
3[a] = 'x';
I know the question is answered, but I couldn't resist sharing this explanation.
I remember it from Principles of Compiler Design.
Let's assume a is an int array, the size of int is 2 bytes, and the base address of a is 1000.
a[5] works out to:
base address of a + (5 * sizeof(element type of a))
i.e. 1000 + (5 * 2) = 1010
Similarly, when the C code is broken down into three-address code, 5[a] becomes:
base address of a + (sizeof(element type of a) * 5)
i.e. 1000 + (2 * 5) = 1010
So both expressions point to the same location in memory, and hence a[5] == 5[a].
This explanation is also the reason why negative indexes work in C: accessing a[-5] gives
base address of a + (-5 * sizeof(element type of a))
i.e. 1000 + (-5 * 2) = 990
which refers to the object at location 990.
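A legal variant of that negative-index arithmetic, sketched with a pointer into the middle of an array (indexing before the start of an actual array would be undefined):
#include <stdio.h>

int main(void)
{
    int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    int *p = &a[5];
    /* p[-5] is *(p - 5), which is a[0]; still inside the array, so well defined */
    printf("%d\n", p[-5]); /* prints 0 */
    return 0;
}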
To the C compiler,
a[i]
i[a]
*(a + i)
are just different ways to refer to the same element in an array, and not at all weird once you know the rule.
In C, arr[3] and 3[arr] are the same, and their equivalent pointer notations are *(arr + 3) and *(3 + arr). By contrast, [arr]3 and [3]arr are not correct and will result in a syntax error, just as (arr + 3)* and (3 + arr)* are not valid expressions. The reason is that the dereference operator must be placed before the address yielded by the expression, not after it.
A little bit of history now. Among other languages, BCPL had a fairly major influence on C's early development. If you declared an array in BCPL with something like:
let V = vec 10
that actually allocated 11 words of memory, not 10. Typically V was the first word, and contained the address of the immediately following word. So unlike C, naming V went to that location and picked up the address of the zeroth element of the array. Therefore array indirection in BCPL, expressed as
let J = V!5
really did have to do J = !(V + 5) (using BCPL syntax) since it was necessary to fetch V to get the base address of the array. Thus V!5 and 5!V were synonymous. As an anecdotal observation, WAFL (Warwick Functional Language) was written in BCPL, and to the best of my memory tended to use the latter syntax rather than the former for accessing the nodes used as data storage. Granted this is from somewhere between 35 and 40 years ago, so my memory is a little rusty. :)
The innovation of dispensing with the extra word of storage and having the compiler insert the base address of the array when it was named came later. According to the C history paper this happened at about the time structures were added to C.
Note that ! in BCPL was both a unary prefix operator and a binary infix operator, in both cases performing indirection; the binary form simply added the two operands before doing the indirection. Given the word-oriented nature of BCPL (and B), this actually made a lot of sense. The restriction to "pointer and integer" became necessary in C when it gained data types and sizeof became a thing.
Because it's useful to avoid confusing nesting.
Would you rather read this:
array[array[head].next].prev
or this:
head[array].next[array].prev
Incidentally, C++ has a similar commutative property for function calls. Rather than writing g(f(x)) as you must in C, you may use member functions to write x.f().g(). Replace f and g with lookup tables and you can write g[f[x]] (functional style) or (x[f])[g] (OOP style). The latter gets really nice with structs containing indices: x[xs].y[ys].z[zs]. Using the more common notation that's zs[ys[xs[x].y].z].
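A compilable C sketch of that index-chasing style (the struct and data here are made up for illustration):
#include <stdio.h>

struct node { int next; int prev; };

int main(void)
{
    /* a tiny circular list stored as indices into the same array */
    struct node array[3] = { {1, 2}, {2, 0}, {0, 1} };
    int head = 0;
    printf("%d\n", array[array[head].next].prev); /* prints 0 */
    printf("%d\n", head[array].next[array].prev); /* same element: prints 0 */
    return 0;
}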
Well, this is a feature that is only possible because of the language support.
The compiler interprets a[i] as *(a+i) and the expression 5[a] evaluates to *(5+a). Since addition is commutative it turns out that both are equal. Hence the expression evaluates to true.
In C
int a[]={10,20,30,40,50};
int *p=a;
printf("%d\n",*p++);//output will be 10
printf("%d\n",*a++);//will give an error
Pointer p is a "variable", array name a is a "mnemonic" or "synonym",
so p++ is valid but a++ is invalid.
a[2] equals 2[a] because the internal operation on both is pointer arithmetic: each is calculated as *(a + 2), which equals *(2 + a).
Because the C compiler always converts array notation into pointer notation:
a[5] = *(a + 5), and likewise 5[a] = *(5 + a) = *(a + 5)
So both are equal.
C was based on BCPL. BCPL directly exposed memory as a sequence of addressable words. The unary operator !X (also known as LV) gave you the contents of the address location X. For convenience there was also a binary operator X!Y equivalent to !(X+Y) which gave you the contents of the Y'th word of an array at location X, or equivalently, the X'th word of an array at location Y.
In C, X!Y became X[Y], but the original BCPL semantics of !(X+Y) show through, which accounts for why the operator is commutative.

Wrong multi-dimensional array variables increase in C

I'm curious about the behaviour, in C, of a multidimensional array indexed with increments as below:
int x[10][10];
y = x[++i, ++j];
I know that is the wrong way. I just want to know what the compiler does in this case and what the consequences would be if a programmer did this in his code.
That is the comma operator, misused. ++i, ++j yields the value of j + 1 and has two side effects (modifying i and j). The whole thing basically means ++i; y = x[++j];, which will work or not depending on the type of y (x[++j] is a whole row here, so y would need to be an int pointer).
what the consequences would be if a programmer did this in his code
Well, most likely other programmers will give him/her murderous looks.
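To spell out what the compiler sees, here is a sketch (the types are made explicit; note that the row must be bound to an int pointer):
#include <stdio.h>

int main(void)
{
    int x[10][10] = {{0}};
    int i = 0, j = 0;
    /* the comma operator evaluates ++i, discards the result, then yields ++j */
    int *y = x[(++i, ++j)]; /* parentheses only for clarity; same as ++i; y = x[++j]; */
    printf("i=%d j=%d first=%d\n", i, j, y[0]); /* i=1 j=1 first=0 */
    return 0;
}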

a++ vs a = a + 1 which is useful in efficient memory programming and how?

This was my interview question at HP. I answered that a++ takes fewer instructions than a = a + 1.
I want to know which is better for efficient programming, and how the two are different from each other.
Hoping for a quick and positive response.
In C, there would be no difference, if the compiler is smart.
In C++, it depends on what type a is, and whether the ++ operator is overloaded. To complicate matters even more, the = operator can be overloaded too, and a = a + 1 might not be the same as a++. For even more complication, the + operator can also be overloaded, so an innocent looking piece of code such as a = a + 1 might have drastic implications.
So, without some context, you simply cannot know.
First of all, in C++ this will depend on type of a. Clearly a can be of class type and have those operators overloaded and without knowing the details it's impossible to decide which is more efficient.
That said, both in C and C++ whatever looks cleaner is preferable. First write clear code, then profile it and see if it's intolerably slow.
I think I would answer in an implementation-independent way. a++ is easier for me to read because it just shows what it does, whereas with a = a + 1 I first have to scan the whole addition. I prefer to go for the choice that's more foolproof.
The former, a++, evaluates to the prior value, so you can use it to express things in sometimes surprisingly simpler manners. For instance
// copy, until '\0' is hit.
while(*dest++ = *source++) ;
Apart from these considerations, I don't think either of them is more efficient, assuming you are dealing with basic integer types.
I am not an expert in microprocessor design, but I guess many processors have an INC or DEC instruction. If the data type is int, then the increment can be done in one instruction. But a = a + 1 requires more: first the add, then the assignment. So a++ should be faster, obviously assuming that a is not a complex data type.
However, a smart compiler should do this kind of optimization.
With an optimizing compiler they are identical. The interview question is moot.
As far as I know, as a standalone statement there's no difference between a++ and a = a + 1.
HOWEVER, there is a difference when they are used as expressions: a++ evaluates to the value a had before the increment, while a = a + 1 (like ++a) evaluates to the new value.
As for the mechanics, a = a + 1 has to take the value of a, add one to it, and then store the result back to a, whereas an increment can often be a single assembly instruction.
You can notice the difference with these two examples:
Example 1
int a = 1;
int x = a++; //x will be 1
Example 2
int a = 1;
int x = ++a; //x will be 2
BE AWARE! Most compilers optimize this today. If you have a++ somewhere in your code it will MOST likely be optimized to a single assembly instruction.
Even more efficient in many cases is ++a, though when a is an int or a pointer it is not going to make any difference.
The rationale for why these increments can be more efficient than a = a + 1 is that the increment is one instruction, whereas the instructions involved in adding 1 to a and then assigning it back are something like:
get the address of a
push its contents onto the stack
push 1 to the stack
add them
get the address of a (possibly already stored)
write (pop) from the stack into this address
Really it all boils down to what your compiler optimizes.
Let's take the optimal case where a is an int. Then normally your compiler will make a++ and a = a + 1 exactly the same.
Both forms increment by the fixed amount 1, interpreted for the type of the variable: if a is an int or a float you'll get 1 -> 2 or 3.4 -> 4.4 in either case, and if a is a pointer into an array, both forms advance it to the next element, since pointer arithmetic scales by the element size. Only for a class type (in C++) can the two genuinely diverge, because a++ uses an overloaded operator++ while a = a + 1 goes through operator+ and operator=, which might do something else or not be defined at all.
Long answer short, I'd say a++ is better:
your code is clearer and shorter
you can manipulate a wider range of variable types
and it should be more efficient, since (I think, but I'm not sure) ++ is a basic operator on the same level as << etc.: it modifies the variable directly, while a = a + 1, if not optimized by your compiler, requires more operations, adding a to another number and then assigning the result.
++a;
a+=1;
a=a+1;
Which notation should we use? Why?
We prefer the first version, ++a, because it more directly expresses the idea of incrementing. It says what we want to do (increment a) rather than how to do it (add 1 to a and then write the result to a).
In general, a way of saying something in a program is better than another if it more directly expresses an idea. The result is more concise and easier for a reader to understand. If we wrote a = a + 1, a reader could easily wonder whether we really meant to increment by 1. Maybe we just mistyped a = b + 1, a = a + 2, or even a = a - 1. With ++a there are far fewer opportunities for such doubts.
Note: this is a logical argument about readability and correctness, not an argument about efficiency, contrary to popular belief.
Modern compilers tend to generate exactly the same code from a=a+1 as for ++a when a is one of the built-in types.
from http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-13.15 :
++i is sometimes faster than, and is never slower than, i++.
Note that for floating-point numbers this makes no difference either: a++ is defined to behave like a = a + 1 (apart from the value of the expression), so the same addition and the same rounding take place in both cases.

What are the common undefined/unspecified behavior for C that you run into? [closed]

An example of unspecified behavior in the C language is the order of evaluation of arguments to a function. It might be left to right or right to left, you just don't know. This would affect how foo(c++, c) or foo(++c, c) gets evaluated.
What other unspecified behavior is there that can surprise the unaware programmer?
A language-lawyer question. Hmkay.
My personal top 3:
violating the strict aliasing rule
violating the strict aliasing rule
violating the strict aliasing rule
:-)
Edit: Here is a little example that does it wrong twice (assume 32-bit ints and little-endian):
float funky_float_abs (float a)
{
unsigned int temp = *(unsigned int *)&a;
temp &= 0x7fffffff;
return *(float *)&temp;
}
That code tries to get the absolute value of a float by bit-twiddling with the sign bit directly in the representation of a float.
However, accessing an object through a pointer obtained by casting from one pointer type to another is not valid C. The compiler may assume that pointers to different types don't point to the same chunk of memory. This holds for all kinds of pointers except void * and char * (signedness does not matter).
In the case above I do that twice. Once to get an int-alias for the float a, and once to convert the value back to float.
There are three valid ways to do the same.
Use a char or void pointer during the cast. These always alias to anything, so they are safe.
float funky_float_abs (float a)
{
float temp_float = a;
// valid, because it's a char pointer. These are special.
unsigned char * temp = (unsigned char *)&temp_float;
temp[3] &= 0x7f;
return temp_float;
}
Use memcpy. memcpy takes void pointers, so it will force aliasing as well (remember to #include <string.h>):
float funky_float_abs (float a)
{
int i;
float result;
memcpy (&i, &a, sizeof (int));
i &= 0x7fffffff;
memcpy (&result, &i, sizeof (int));
return result;
}
The third valid way: use unions. This is explicitly not undefined since C99:
float funky_float_abs (float a)
{
union
{
unsigned int i;
float f;
} cast_helper;
cast_helper.f = a;
cast_helper.i &= 0x7fffffff;
return cast_helper.f;
}
My personal favourite undefined behaviour is that if a non-empty source file doesn't end in a newline, behaviour is undefined.
I suspect it's true though that no compiler I will ever see has treated a source file differently according to whether or not it is newline terminated, other than to emit a warning. So it's not really something that will surprise unaware programmers, other than that they might be surprised by the warning.
So for genuine portability issues (which mostly are implementation-dependent rather than unspecified or undefined, but I think that falls into the spirit of the question):
char is not necessarily (un)signed.
int can be any size from 16 bits.
floats are not necessarily IEEE-formatted or conformant.
integer types are not necessarily two's complement, and signed integer arithmetic overflow causes undefined behaviour (modern hardware won't crash, but some compiler optimizations will result in behaviour different from wraparound even though that's what the hardware does: for example, if (x+1 < x) may be optimized as always false when x has signed type; see the -fstrict-overflow option in GCC, and the sketch after this list).
"/", "." and ".." in a #include have no defined meaning and can be treated differently by different compilers (this does actually vary, and if it goes wrong it will ruin your day).
Really serious ones that can be surprising even on the platform you developed on, because behaviour is only partially undefined / unspecified:
POSIX threading and the ANSI memory model. Concurrent access to memory is not as well defined as novices think. volatile doesn't do what novices think. Order of memory accesses is not as well defined as novices think. Accesses can be moved across memory barriers in certain directions. Memory cache coherency is not required.
Profiling code is not as easy as you think. If your test loop has no effect, the compiler can remove part or all of it. inline has no defined effect.
And, as I think Nils mentioned in passing:
VIOLATING THE STRICT ALIASING RULE.
My favorite is this:
// what does this do?
x = x++;
To answer some comments: it is undefined behaviour according to the standard. Seeing this, the compiler is allowed to do anything, up to and including formatting your hard drive.
See for example this comment here. The point is not that you can see there is a possible reasonable expectation of some behaviour. Because of the C++ standard and the way the sequence points are defined, this line of code is actually undefined behaviour.
For example, if we had x = 1 before the line above, then what would the valid result be afterwards? Someone commented that it should be
x is incremented by 1
so we should see x == 2 afterwards. However this is not actually true, you will find some compilers that have x == 1 afterwards, or maybe even x == 3. You would have to look closely at the generated assembly to see why this might be, but the differences are due to the underlying problem. Essentially, I think this is because the compiler is allowed to evaluate the two assignments statements in any order it likes, so it could do the x++ first, or the x = first.
Dividing something by a pointer to something. Just won't compile for some reason... :-)
result = x/*y;
Another issue I encountered (which is defined, but definitely unexpected).
char is evil.
signed or unsigned depending on what the compiler feels
not mandated as 8 bits
I can't count the number of times I've corrected printf format specifiers to match their argument. Any mismatch is undefined behavior.
No, you must not pass an int (or long) to %x - an unsigned int is required
No, you must not pass an unsigned int to %d - an int is required
No, you must not pass a size_t to %u or %d - use %zu
No, you must not print a pointer with %d or %x - use %p and cast to a void *
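A sketch with the specifiers matched along those lines:
#include <stdio.h>

int main(void)
{
    int i = -1;
    unsigned int u = 42u;
    size_t n = sizeof(int);
    int *p = &i;
    printf("%d\n", i);         /* int          -> %d */
    printf("%x\n", u);         /* unsigned int -> %x (or %u) */
    printf("%zu\n", n);        /* size_t       -> %zu */
    printf("%p\n", (void *)p); /* pointer      -> %p, cast to void * */
    return 0;
}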
I've seen a lot of relatively inexperienced programmers bitten by multi-character constants.
This:
"x"
is a string literal (which is of type char[2] and decays to char* in most contexts).
This:
'x'
is an ordinary character constant (which, for historical reasons, is of type int).
This:
'xy'
is also a perfectly legal character constant, but its value (which is still of type int) is implementation-defined. It's a nearly useless language feature that serves mostly to cause confusion.
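A short illustration; the exact value of 'xy' (and even sizeof(int)) varies by implementation:
#include <stdio.h>

int main(void)
{
    printf("%zu\n", sizeof 'x'); /* sizeof(int), e.g. 4: 'x' is an int in C */
    printf("%zu\n", sizeof "x"); /* 2: the array {'x', '\0'} */
    int v = 'xy';                /* legal, but the value is implementation-defined */
    printf("%d\n", v);           /* e.g. 30841 ('x'*256 + 'y') on some platforms */
    return 0;
}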
A compiler doesn't have to tell you that you're calling a function with the wrong number of parameters/wrong parameter types if the function prototype isn't available.
The clang developers posted some great examples a while back, in a post every C programmer should read. Some interesting ones not mentioned before:
Signed integer overflow - no it's not ok to wrap a signed variable past its max.
Dereferencing a NULL Pointer - yes this is undefined, and might be ignored, see part 2 of the link.
The EEs here just discovered that a >> -2 is a bit fraught.
I nodded and told them it was not natural.
Be sure to always initialize your variables before you use them! When I had just started with C, that caused me a number of headaches.

Resources