Is the difference of two pointers of the same type always one?
#include <stdio.h>
#include <string.h>

int main(void) {
    int a = 5, b = 10, c;
    int *p = &a, *q = &b;

    c = p - q;
    printf("%d", c);
    return 0;
}
The output is 1.
I don't get the reasoning behind it.
The behavior is undefined.
C99 6.5.6 paragraph 9 says:
When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array object;
the result is the difference of the subscripts of the two array
elements.
Paragraph 7 in the same section says:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
Section 4 paragraph 2 says:
If a "shall" or "shall not" requirement that appears outside of a
constraint is violated, the behavior is undefined. Undefined behavior
is otherwise indicated in this International Standard by the words
"undefined behavior" or by the omission of any explicit definition of
behavior. There is no difference in emphasis among these three; they
all describe "behavior that is undefined".
3.4.3 defines the term "undefined behavior" as:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
NOTE Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during translation
or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of a
diagnostic message).
Given the declaration:
int a = 5, b = 10, c;
it's likely that evaluating &b - &a will yield a result that seems reasonable, such as 1 or -1. (Reasonable results are always a possible symptom of undefined behavior; it's undefined, not required to crash.) But the compiler is under no obligation to place a and b at any particular locations in memory relative to each other, and even if it does so, the subtraction is not guaranteed to be meaningful. An optimizing compiler is free to transform your program in ways that assume that its behavior is well defined, resulting in code that can behave in arbitrarily bad ways if that assumption is violated.
By writing &b - &a, you are in effect promising the compiler that that's a meaningful operation. As Henry Spencer famously said, "If you lie to the compiler, it will get its revenge."
Note that it's not just the result of the subtraction that's undefined, it's the behavior of the program that evaluates it.
Oh, did I mention that the behavior is undefined?
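For contrast, here is a minimal sketch of a subtraction the standard does define: both pointers refer to elements of the same array object, so the result is simply the difference of their subscripts (the names arr, p, and q are just for illustration):

#include <stdio.h>
#include <stddef.h>

int main(void) {
    int arr[2] = {5, 10};
    int *p = &arr[1], *q = &arr[0];   /* both point into the same array */

    ptrdiff_t d = p - q;              /* well defined: difference of subscripts */
    printf("%td\n", d);               /* prints 1, and is guaranteed to */
    return 0;
}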
In fact, the behaviour of your program is undefined. The output happens to be 1 on your platform/compiler, but it could just as easily be something else.
This code exhibits undefined behaviour because pointer arithmetic is only defined when both operands point into the same array object (or one past its end).
The value of 1 comes about because the compiler has placed a and b next to each other in memory. Another compiler could do something different. Indeed the same compiler could do something different the next time you change the code.
The C compiler knows the size of each type. For example, suppose p is an int pointer holding the address 0x0010. If you add 1 to it (p++ or p = p + 1), its value becomes 0x0014, because an int occupies 4 bytes on that platform.
As for your question: a and b are declared one after the other, and on your platform they happen to be laid out next to each other in memory, so their addresses differ by 4 bytes. The compiler knows that an int is 4 bytes, so when you subtract two int pointers it divides the byte difference by 4, which gives 1.
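If you want to see that scaling without invoking undefined behavior, here is a minimal sketch using an array so the subtraction is defined; the 4-byte size of int is an assumption about the platform, so the first number may differ on yours:

#include <stdio.h>

int main(void) {
    int arr[2];
    int *p = &arr[0], *q = &arr[1];

    /* byte distance between the two elements: sizeof(int), e.g. 4 */
    printf("%zu bytes apart\n", (size_t)((char *)q - (char *)p));

    /* element distance: the byte distance divided by sizeof(int), i.e. 1 */
    printf("%td elements apart\n", q - p);
    return 0;
}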
Pardon the confusing question title, but I was unsure how to phrase it more clearly.
In C, accessing an array out of bounds is classified as undefined behavior. However, array elements are guaranteed to be laid out contiguously in memory, and the array subscript operator is syntactic sugar for pointer arithmetic (e.g., x[3] == *(x + 3)). Therefore, I would personally expect the behavior of the code below to be well-defined:
int array[10][10];
int i = array[0][15]; // i == array[1][5]?
If my interpretation of the standard is correct, this would be undefined behavior. Am I wrong?
According to the standard, it is clearly undefined behaviour, as such a case is explicitly listed in section J.2 Undefined behavior (found in an online C99 standard draft):
An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
It can still be the case that your example will work, and I have actually seen a lot of such code in practice; however, to be accurate, it is UB.
The Standard makes very clear that given unsigned char arr[10][10];, an attempt to access arr[0][x] would yield UB if x exceeds 9.
I think it is equally clear, however, that the authors of the Standard intended to allow code to take the address of any object, including a multi-dimensional array, as a character pointer, and then index that pointer to access all the bytes of the object.
If the Standard were to say that the arr[0] yields a pointer of type char* which can only be used to access the first ten elements, but (char*)arr would yield a pointer that can access the entire array, that would accommodate both objectives above, but I see nothing in the Standard that would suggest that arr[0] and (char*)arr are not equivalent to each other.
Most likely, the authors of the Standard expected that implementations would seek to behave sensibly in such corner cases whether or not the Standard described them fully. I'm not sure whether clang and gcc conform to such expectations with regard to this particular issue, but such expectations don't hold true in general.
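As a sketch of the byte-wise access described above, the pointer here is derived from the whole array object rather than from arr[0], which keeps it within the bounds of the object it was formed from (the contents are made up for illustration):

#include <stdio.h>

int main(void) {
    unsigned char arr[10][10] = {0};
    arr[1][5] = 7;

    /* character-type pointer to the whole 2D array, not to arr[0] */
    unsigned char *p = (unsigned char *)&arr;

    unsigned sum = 0;
    for (size_t i = 0; i < sizeof arr; i++)   /* all 100 bytes */
        sum += p[i];
    printf("%u\n", sum);                      /* prints 7 */
    return 0;
}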
As other answers already pointed out where the standard says that this is UB, I will add something that is not discussed in other answers.
The expression arr[0][15] is equivalent to (arr[0])[15] due to operator precedence.
Now, since going out of bounds of a 1D array is undefined behavior, the expression arr[0][15], which is the same as (arr[0])[15], is also UB, because 15 is outside the valid index range 0..9 in your example.
Basically, since going out of bounds of a 1D array is undefined behavior, we get undefined behavior if either index i or j in the expression (arr[i])[j] == arr[i][j] exceeds xrange - 1 = 10 - 1 = 9 or yrange - 1 = 10 - 1 = 9, respectively (for your given example).
Note that xrange and yrange are shorthand for the sizes you specified as the rows and columns of the array, which are both 10 in your example.
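If the goal is to treat the 100 ints as one flat sequence, here is a sketch of a way to do it without either subscript leaving its own range: split the flat index into a row and a column.

#include <stdio.h>

int main(void) {
    int array[10][10] = {0};
    array[1][5] = 42;

    int i = 15;
    /* row = i / 10, column = i % 10: neither subscript goes out of range */
    printf("%d\n", array[i / 10][i % 10]);   /* prints 42 */
    return 0;
}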
Is the behaviour of this code well defined?
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    void *ptr = (char *)0x01;
    size_t val;

    ptr = (char *)ptr + 1;
    val = (size_t)(uintptr_t)ptr;
    printf("%zu\n", val);
    return 0;
}
I mean, can we assign some fixed number to a pointer and increment it even if it is pointing to some random address? (I know that you can not dereference it)
The assignment:
void *ptr = (char *)0x01;
Is implementation defined behavior because it is converting an integer to a pointer. This is detailed in section 6.3.2.3 of the C standard regarding Pointers:
5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined,
might not be correctly aligned, might not point to an entity
of the referenced type, and might be a trap representation.
As for the subsequent pointer arithmetic:
ptr = (char *)ptr + 1;
This is dependent on a few things.
First, the current value of ptr may be a trap representation as per 6.3.2.3 above. If it is, the behavior is undefined.
Next is the question of whether 0x1 points to a valid object. Adding a pointer and an integer is only valid if both the pointer operand and the result point to elements of an array object (a single object counts as an array of size 1) or one element past the array object. This is detailed in section 6.5.6:
7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a
pointer to the first element of an array of length one with the type
of the object as its element type
8 When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer
operand. If the pointer operand points to an element of an array
object, and the array is large enough, the result points to an element
offset from the original element such that the difference of the
subscripts of the resulting and original array elements equals the
integer expression. In other words, if the expression P points to the
i-th element of an array object, the expressions (P)+N (equivalently, N+(P) ) and (P)-N (where N has the value n ) point to,
respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an
array object, the expression (P)+1 points one past the last element of
the array object, and if the expression Q points one past the
last element of an array object, the expression (Q)-1 points to
the last element of the array object. If both the pointer
operand and the result point to elements of the same array
object, or one past the last element of the array object, the
evaluation shall not produce an overflow; otherwise, the behavior is
undefined. If the result points one past the last element of the
array object, it shall not be used as the operand of a unary
* operator that is evaluated.
On a hosted implementation the value 0x1 almost certainly does not point to a valid object, in which case the addition is undefined. An embedded implementation could however support setting pointers to specific values, and if so it could be the case that 0x1 does in fact point to a valid object. If so, the behavior is well defined, otherwise it is undefined.
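For comparison, here is a sketch of the conversions the standard does sanction on a hosted implementation: a pointer to a real object is converted to uintptr_t and back, and the arithmetic stays inside that object. (uintptr_t is itself an optional type, so even this depends on the implementation providing it.)

#include <stdio.h>
#include <stdint.h>

int main(void) {
    char buf[4] = "abc";

    /* pointer -> uintptr_t -> pointer round trip of a valid object pointer
       is guaranteed to compare equal to the original (7.20.1.4) */
    uintptr_t n = (uintptr_t)(void *)buf;
    char *p = (void *)n;

    p = p + 1;             /* well defined: still points into buf */
    printf("%c\n", *p);    /* prints 'b' */
    return 0;
}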
No, the behaviour of this program is undefined. Once an undefined construct is reached in a program, any future behaviour is undefined. Paradoxically, any past behaviour is undefined too.
The result of void *ptr = (char*)0x01; is implementation-defined, due in part to the fact that a char * can have a trap representation.
But the behaviour of the ensuing pointer arithmetic in the statement ptr = (char *)ptr + 1; is undefined. This is because pointer arithmetic is only valid within arrays including one past the end of the array. For this purpose an object is an array of length one.
Yes, the code is defined; its result is implementation-defined rather than undefined. See ISO/IEC 9899:2011 [6.3.2.3]/5 and note 67.
The C language was originally created as a system programming language. Systems programming required manipulating memory-mapped hardware, which meant stuffing hard-coded addresses into pointers, sometimes incrementing those pointers, and reading and writing data at the resulting addresses. To that end, assigning an integer to a pointer and manipulating that pointer with arithmetic is defined by the language. By making it implementation-defined, the language allows all kinds of things to happen: from the classic halt-and-catch-fire to raising a bus error when you try to dereference an odd address.
The difference between undefined behaviour and implementation-defined behaviour is basically undefined behaviour means "don't do that, we don't know what will happen" and implementation-defined behaviour means "it's OK to go ahead and do that, it's up to you to know what will happen."
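To make that concrete, here is a sketch of the memory-mapped-hardware idiom; the address 0x40021000 and the register name are made up for illustration, and the construct is only meaningful on a target whose implementation documents integer-to-pointer conversions this way:

#include <stdint.h>

/* hypothetical device register address: not a real register map */
#define STATUS_REG ((volatile uint32_t *)0x40021000u)

void wait_until_ready(void) {
    /* spin until bit 0 of the (made-up) status register is set */
    while ((*STATUS_REG & 1u) == 0u) {
        /* busy-wait */
    }
}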
It is undefined behavior.
From N1570 (emphasis added):
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
If the value is a trap representation, reading it is undefined behavior:
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.
And
An identifier is a primary expression, provided it has been declared as designating an object (in which case it is an lvalue) or a function (in which case it is a function designator).
Therefore, the line void *ptr = (char *)0x01; is already potentially undefined behavior, on an implementation where (char*)0x01 or (void*)(char*)0x01 is a trap representation. The left-hand side is an lvalue expression that does not have character type and reads a trap representation.
On some hardware, loading an invalid pointer into a machine register could crash the program, so this was a forced move by the standards committee.
The Standard does not require that implementations process integer-to-pointer conversions in a meaningful fashion for any particular integer values, or even for any possible integer values other than Null Pointer Constants. The only thing it guarantees about such conversions is that a program which stores the result of such a conversion directly into an object of suitable pointer type and does nothing with it except examine the bytes of that object will, at worst, see Unspecified values. While the behavior of converting an integer to a pointer is Implementation-Defined, nothing would forbid any implementation (no matter what it actually does with such conversions!) from specifying that some (or even all) of the bytes of the representation have Unspecified values, and that some (or even all) integer values may behave as though they yield trap representations.
The only reasons the Standard says anything at all about integer-to-pointer conversions are that:
In some implementations, the construct is meaningful, and some programs for those implementations require it.
The authors of the Standard did not like the idea that a construct which was used on some implementations would represent a constraint violation on others.
It would have been odd for the Standard to describe a construct but then specify that it has Undefined Behavior in all cases.
Personally, I think the Standard should have allowed implementations to treat integer-to-pointer conversions as constraint violations if they don't define any situations where they would be useful, rather than require that compilers accept the meaningless code, but that wasn't the philosophy at the time.
I think it would be simplest to say that any operation involving integer-to-pointer conversions with anything other than intptr_t or uintptr_t values received from pointer-to-integer conversions invokes Undefined Behavior, but then note that it is common for quality implementations intended for low-level programming to process Undefined Behavior "in a documented manner characteristic of the environment". The Standard doesn't specify when implementations should process programs that invoke UB in that fashion but instead treats it as a Quality of Implementation issue.
If an implementation specifies that integer-to-pointer conversions operate in a fashion that would define the behavior of
char *p = (char*)1;
p++;
as equivalent to char *p = (char*)2;, then the implementation should be expected to work that way. On the other hand, an implementation could define the behavior of integer-to-pointer conversion in such a way that even:
char *p = (char*)1;
char *q = p; // Not doing any arithmetic here--just a simple assignment
would release nasal demons. On most platforms, a compiler where arithmetic on pointers produced by integer-to-pointer conversions behaved oddly would not be viewed as a high-quality implementation suitable for low-level programming. A programmer that is not intending to target any other kind of implementations could thus expect such constructs to behave usefully on compilers for which the code was suitable, even though the Standard does not require it.
main()
{
    char buffer[6]="hello";
    char *ptr3 = buffer +8;
    char *str;

    for(str=buffer;str <ptr3;str++)
        printf("%d \n",str);
}
Here, ptr3 points outside the array bounds. However, when I run this program I get consecutive memory locations (for example, 1000...1007). According to the C standard, a pointer pointing more than one past the array bound is explicitly undefined behavior.
My question is: how does the above code result in undefined behavior?
There are multiple occurrences of undefined behavior in your program.
For starters you're calling printf without the required #include <stdio.h>, and main() should be int main(void). That's not what you're asking about, but you should fix it.
char buffer[6]="hello";
This is ok.
char *ptr3 = buffer +8;
Evaluating the expression buffer +8 has undefined behavior. N1570 6.5.6 specifies the behavior of the + addition operator, and paragraph 8 says:
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
Computing the pointer value by itself has undefined behavior, even if you never dereference it or access its value.
char *str;
for(str=buffer;str <ptr3;str++)
printf("%d \n",str);
You're passing a char* value to printf, but %d requires an argument of type int. Passing a value of the wrong type to printf also has undefined behavior.
If you want to print the pointer value, you need to write:
printf("%p\n", (void*)str);
which will likely print the pointer value in hexadecimal, depending on the implementation. (I've removed the unnecessary trailing space.)
When str points to buffer[5], str++ is valid; it causes str to point just past the end of buffer. (Dereferencing str after that would have undefined behavior, but you don't do that.) Incrementing str again after that has undefined behavior. The comparison str < ptr3 also has undefined behavior, since ptr3 has an invalid value -- but you already triggered undefined behavior when you initialized ptr3, so this is just icing on the proverbial cake.
Keep in mind that "undefined behavior" means that the C standard does not define the behavior. It doesn't mean that the program will crash or print an error message. In fact the worst possible consequence of undefined behavior is that the code seems to "work"; it means that you have a bug, but it's going to be difficult to diagnose and fix it.
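Putting those fixes together, a corrected sketch that keeps the end pointer at one past the last element (the furthest the standard allows) and prints each address with %p might look like this:

#include <stdio.h>

int main(void) {
    char buffer[6] = "hello";
    char *end = buffer + 6;              /* one past the last element: allowed */
    char *str;

    for (str = buffer; str < end; str++)
        printf("%p\n", (void *)str);     /* %p expects a void * */

    return 0;
}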
You are printing the pointer value itself (an address). If you want the value it points to, you need to use the dereference operator (*) in the printf.
The other thing is, if you want to see characters rather than their numeric codes, you should use %c in printf.
printf("%c\n",*str);
In C, you can always add two numbers. You can always add an integer to a pointer, or subtract two pointers. You will always get an "answer": the compiler will generate code and the code will execute. That's no assurance that the answer is valid, useful, or even defined.
The C standard defines the language. Within the scope of what the syntax admits, it defines what's valid -- what definitely means something -- and what's not. When you color outside those lines, the compiler may produce weird code or no code. In C, it's not the job of the compiler to anticipate every weird circumstance and arrive at a reasonable answer. The compiler writer assumes the programmer knows the rules, and is not required to verify he followed them.
There are lots of examples of valid syntax that's meaningless or undefined. In math, you cannot take the log of a negative, and you cannot divide by zero. Dividing by zero doesn't yield zero or not zero; the operation is undefined.
In your case, ptr3 has a value, duly computed, 8 larger than buffer. That's the result of some pointer arithmetic. So far, so good.
But just because you have a pointer doesn't mean it points to anything. (void*) 0 is explicitly guaranteed not to point to anything. Likewise, your ptr3 doesn't point to anything. It needn't even be a value 8 larger than buffer. Section 6.5.6 of the C standard defines the result of adding an integer to a pointer, and puts it this way:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
When you say you are getting consecutive memory locations (for example, 1000...1007), what you're seeing is behavior. You had to see some behavior. And that behavior is undefined. According to the standard, you could see some other behavior, such as wrapping back to 1000 or 0.
What the compiler accepts and what the standard defines are two different things.
Due to the way memory is laid out for a struct and its members, I was able to do the following:
typedef struct
{
    int a;
    int b;
    int c;
} myStruct;
In main():
myStruct aStruct;
aStruct.a = 1;
aStruct.b = 2;
aStruct.c = 3;
int *aBadBoyPointer = &aStruct.a;
printf("%i %i %i", *(aBadBoyPointer), *(++aBadBoyPointer), *(++aBadBoyPointer));
Easy enough.
The above line causes a warning:
Unsequenced modification and access to aBadBoyPointer
But it compiles and runs fine, printing out 1, 2, 3, respectively.
My question here:
For the sake of doing things correctly, could you cite a scenario where this would break miserably, one that confirms the compiler's warning that this is a bad practice / bad way of doing things?
Or perhaps this is in fact "a good way" to do things under some rare circumstances?
Addendum:
Aside from this part that causes Undefined Behavior:
printf("%i %i %i", *(aBadBoyPointer), *(++aBadBoyPointer), *(++aBadBoyPointer));
What I really like to know is:
Is it considered OK (good practice) to use a pointer to one member of the struct and then access the other members through it (by incrementing, which the memory layout of the struct members makes possible), or is this a practice that is frowned upon by the Standard?
And secondly, as mentioned above, if this is indeed a bad practice, would there ever be cases where using a pointer this way within a struct to reach another member is beneficial under certain circumstances?
There are a few problems here. The warning you are seeing is due to the unspecified order of evaluation of the arguments to the function. The C99 draft standard, section 6.5.2.2 Function calls, paragraph 10 says:
The order of evaluation of the function designator, the actual arguments, and
subexpressions within the actual arguments is unspecified, but there is a sequence point
before the actual call.
You are also modifying a variable more than once between sequence points, which is undefined behavior and cannot be relied on to work. Section 6.5 Expressions, paragraph 2 says (emphasis mine going forward):
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression.72) Furthermore, the prior value shall be read only to determine the value to be stored.73)
Also, note that the standard allows padding between the elements of a struct; beyond that, a scalar is considered an array of one element, so incrementing beyond that array and then performing indirection would be undefined as well. This is covered in section 6.5.6 Additive operators, paragraph 7:
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
and going more than one past the array bounds, or dereferencing the one-past-the-end pointer, is undefined per section 6.5.6 Additive operators, paragraph 8:
[...]If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
We can see that, depending on the optimization level, gcc will output:
3 3 2
or:
3 3 3
neither of which is the desired output.
The standards-compliant way to access your struct's members via a pointer would be to use offsetof, which requires including stddef.h. Accessing member a would look like this:
*( (int*) ((char*)aBadBoyPointer+offsetof(myStruct, a)) )
   ^       ^                     ^
   3       2                     1
There are three elements here:
1. Use offsetof to determine the offset in bytes of the member
2. Cast to char * since we need pointer arithmetic in bytes
3. Cast back to int * since that is the correct type
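Putting that together, here is a self-contained sketch of the offsetof approach, in which no pointer is ever stepped outside the member it was derived from:

#include <stdio.h>
#include <stddef.h>

typedef struct {
    int a;
    int b;
    int c;
} myStruct;

int main(void) {
    myStruct s = { 1, 2, 3 };
    char *base = (char *)&s;   /* byte-wise view of the whole object */

    /* locate each member by its byte offset, then read it with the right type */
    printf("%i %i %i\n",
           *(int *)(base + offsetof(myStruct, a)),
           *(int *)(base + offsetof(myStruct, b)),
           *(int *)(base + offsetof(myStruct, c)));
    return 0;
}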
I agree with the existing answers (such unsequenced access invokes Undefined Behavior and is bad).
However, to make a concrete example, I compiled your code in MS Visual Studio. The output (in both Debug and Release mode) is:
3 3 3
In addition to what was already said in Shafik Yaghmour's and nmichaels' answers, you must also observe that some compilers will align the variables in the structure, usually on 4-byte boundaries. For instance:
struct something {
    char a;
    char b;
};
This structure seems to occupy 2 bytes, but it may occupy 8, because the compiler may pad each element in the structure so that it covers a memory region whose size is divisible by 4. There would then be 6 bytes that are just garbage, but they are still reserved. In such an example, reading the structure as a contiguous sequence of char would fail.
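A quick way to check what your own compiler does is to print the sizes; the exact numbers are implementation-defined, so the second struct below may or may not contain padding after a:

#include <stdio.h>

struct something {
    char a;
    char b;
};

struct mixed {
    char a;
    int  b;   /* alignment of int typically forces padding after 'a' */
};

int main(void) {
    printf("sizeof(struct something) = %zu\n", sizeof(struct something));
    printf("sizeof(struct mixed)     = %zu\n", sizeof(struct mixed));
    return 0;
}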
Function arguments are not evaluated in an order determined by the C standard (C99 §6.5.2.2), so your code, which modifies aBadBoyPointer more than once between sequence points, invokes undefined behavior. A different compiler or the same compiler on a different platform might give you different results. Under no circumstances is invoking undefined behavior a good way to do things.
For reference, the text of the standard says:
10 The order of evaluation of the function designator, the actual arguments, and
subexpressions within the actual arguments is unspecified, but there is a sequence point
before the actual call.
Addendum
To answer the second part of your question, the C compiler is free to add padding in between members of a struct per §6.7.2.1 paragraph 12:
Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
There are cases where structures can behave like arrays, and incrementing a pointer to a member can work out for you (see #pragma pack and __attribute__((packed))) but your code will be decidedly (if explicitly) non-portable and you may run into some compiler bugs. In general, use arrays and enumerations for this sort of thing instead of structures.
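Here is a sketch of the array-plus-enumeration alternative suggested above, assuming the three fields really are homogeneous ints (the names are illustrative):

#include <stdio.h>

enum field { FIELD_A, FIELD_B, FIELD_C, FIELD_COUNT };

int main(void) {
    int fields[FIELD_COUNT] = { 1, 2, 3 };

    /* walking an array with an index or a pointer is fully defined */
    for (int i = 0; i < FIELD_COUNT; i++)
        printf("%i ", fields[i]);
    printf("\n");
    return 0;
}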
The compiler is free to add padding between each of a structure's members. If it did, the OP's code would fail miserably.
Also, this might be undefined behaviour, as one may not dereference a pointer that points outside an array's bounds. However, I'm unsure whether one may consider
int a;
an array of one element.