I never thought I would be asking this question, but I have no idea why this happens.
const int a = 3;
int *ptr;
ptr = (int*)( &a );
printf( "A=%d\n", &a );
*ptr = 5;
printf( "A=%d\n", ptr );
printf( "A=%d\n", a );
printf( "A=%d\n", *ptr );
Output
A=6945404
A=6945404
A=3
A=5
How can this happen? How can one memory location hold two different values? I searched around, and all I found was "undefined behavior is undefined." Well, that does not make any sense. There must be an explanation.
Edit
I get it, Mark's answer makes a lot of sense, but I still wonder: const was added to the language so that the user does not change the value unintentionally. I get that old compilers let you do that, but I tried this on VS 2012 and got the same behavior. Then again, as haccks said, one memory location can't hold two values, yet it looks like it does; so where is the second value stored?
The optimizer can determine that a is a constant value, and replace any reference to it with the literal 3. That explains what you see, although there's no guarantee that's what's actually happening. You'd need to study the generated assembly output for that.
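For illustration, the program may behave as if the source had been rewritten like this (a sketch of one plausible optimization, not a guaranteed outcome):
*ptr = 5;                 /* the store to a's storage still happens */
printf( "A=%d\n", 3 );    /* 'a' folded to the literal 3 at compile time */
printf( "A=%d\n", *ptr ); /* this load goes through the pointer and sees 5 */
On that reading, the 3 lives in the instruction stream as an immediate operand while the 5 lives in a's storage, which is why one "location" appears to hold two values.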
Modifying a const variable through a non-const pointer results in undefined behavior. Most likely the optimizer is substituting the original value in this line:
printf( "A=%d\n", a );
Look at the disassembly to verify this.
The C Standard, subclause 6.7.3, paragraph 6 [ISO/IEC 9899:2011], states:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
In fact your program invokes undefined behavior for two reasons:
1. You are printing an address with the wrong conversion specifier, %d. The correct specifier for a pointer is %p.
2. You are modifying a variable declared with the const qualifier.
If the behavior is undefined then anything could happen. You may get either an expected or an unexpected result.
The standard says about it:
3.4.3 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
The problem is that the type of ptr is "pointer to int" not "pointer to const int".
You are then casting the address of 'a' (a const int) to be of type "pointer to int" and storing that address in ptr. The effect of this is that you are casting away the const-ness of a const variable.
This results in undefined behavior so your results may vary from compiler to compiler.
It is possible for the compiler to store 'a' in program ROM, since it knows 'a' is a const value that can never be changed. When you lie to the compiler and cast away the const-ness of 'a' so that you can modify it through ptr, the write through ptr may not actually be able to modify the value of 'a', since that data may be stored in program ROM. Instead of giving you a crash, this time the compiler decided to point ptr at a different location holding a different value. But anything could have happened, since this behavior is undefined.
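For comparison, the const-correct version keeps the compiler on your side: the invalid write becomes a compile-time error instead of undefined behavior. A minimal sketch:
#include <stdio.h>

int main(void)
{
    const int a = 3;
    const int *ptr = &a;       /* pointer to const: no cast, no lie */
    /* *ptr = 5; */            /* error: assignment of read-only location */
    printf( "A=%d\n", *ptr );  /* prints 3 */
    return 0;
}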
Related
Is this behavior defined or not?
volatile long (*volatile ptr)[1] = (void*)NULL;
volatile long v = (long) *ptr;
printf("%ld\n", v);
It works because, by dereferencing a pointer to an array, we receive the array itself, and that array then decays to a pointer to its first element.
Updated demo: https://ideone.com/DqFF6T
Also, GCC even considers the following code a constant expression:
volatile long (*ptr2)[1] = (void*)NULL;
enum { this_is_constant_in_gcc = ((void*)ptr2 == (void*)*ptr2) };
printf("%d\n", this_is_constant_in_gcc);
Basically, it dereferences ptr2 at compile time.
This:
long (*ptr)[1] = NULL;
Is declaring a pointer to an "array of 1 long" (more precisely, the type is long int (*)[1]), with the initial value of NULL. Everything is fine so far; any pointer can be NULL.
Then, this:
long v = (long) *ptr;
Is dereferencing the NULL pointer, which is undefined behavior. All bets are off: if your program does not crash, the following statement could print any value or do anything else, really.
Let me make this clear one more time: undefined behavior means that anything can happen. There is no explanation as to why anything strange happens after invoking undefined behavior, nor does there need to be. The compiler could very well emit 16-bit Real Mode x86 assembly, produce a binary that deletes your entire home folder, emit the Apollo 11 Guidance Computer assembly code, or whatever else. It is not a bug. It's perfectly conforming to the standard.
The only reason your code seems to work is because GCC decides, purely by coincidence, to do the following (Godbolt link):
mov QWORD PTR [rbp-8], 0 ; put NULL on the stack
mov rax, QWORD PTR [rbp-8]
mov QWORD PTR [rbp-16], rax ; move NULL to the variable v
Causing the NULL-dereference to never actually happen. This is most probably a consequence of the undefined behavior in dereferencing ptr ¯\_(ツ)_/¯
Funnily enough, I previously said in a comment:
dereferencing NULL is invalid and will basically always cause a segmentation fault.
But of course, since it is undefined behavior, that "basically always" is wrong. I think this is the first time I have ever seen a null-pointer dereference not cause a SIGSEGV.
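In any case, if the goal is merely to avoid the undefined dereference, a guard suffices. A minimal sketch (my addition, not from the original post):
long (*ptr)[1] = NULL;
if (ptr != NULL) {          /* only dereference a valid pointer */
    long v = (long) *ptr;   /* pointer-to-long cast is implementation-defined */
    printf("%ld\n", v);
}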
Is this behavior defined or not?
Not.
long (*ptr)[1] = NULL;
long v = (long) *ptr;
printf("%ld\n", v);
It works because, by dereferencing a pointer to an array, we receive the array itself, and that array then decays to a pointer to its first element.
No, you are confusing type with value. It is true that the expression *ptr on the second line has type long[1], but evaluating that expression produces undefined behavior regardless of the data type, and regardless of the automatic conversion that would be applied to the result if it were defined.
The relevant section of the spec is paragraph 6.5.3.2/4:
The unary * operator denotes indirection. If the operand points to a
function, the result is a function designator; if it points to an
object, the result is an lvalue designating the object. If the operand
has type ''pointer to type'', the result has type ''type''. If an
invalid value has been assigned to the pointer, the behavior of the
unary * operator is undefined.
A footnote goes on to clarify that
[...] Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer [...]
It may "work" for you in an empirical sense, but from a language perspective, any output at all or none is a conforming result.
Update:
It may be interesting to note that the answer would be different for explicitly taking the address of *ptr than it is for supposing that array decay will overcome the undefinedness of the dereference. The standard provides that, as a special case, where the operand of the unary & operator is the result of a unary * operator, neither of those operators is evaluated. Provided that all relevant constraints are satisfied, the result is as if they were both omitted altogether, except that it is never an lvalue.
Thus, this is ok:
long (*ptr)[1] = NULL;
long v = (long) &*ptr;
printf("%ld\n", v);
On many implementations it will reliably print 0, but do note that C does not specify that it must be 0.
The key distinction here is that in this case, the * operation is not evaluated (per spec). The * operation in the original code is evaluated, notwithstanding the fact that if the pointer value were valid, the resulting array would be converted right back to a pointer (of a different type, but to the same location). That does suggest an obvious shortcut that implementations may take with the original code, and they may take it, if they wish, without regard to whether ptr's value is valid, because if it is invalid then they can do whatever they want.
To just answer your questions:
Is dereferencing a NULL pointer to array valid in C?
No.
Is this behavior defined or not?
It is classified as "undefined behavior", so it is not defined.
Never mind that this trick with the array may happen to work on some implementations, and that there is no real need to do it (I assume you are asking out of curiosity): it is not valid per the C standard to dereference a NULL pointer in any way, and doing so causes undefined behavior.
Anything can happen when you put such statements into your program.
Look at the answers on this question, which explain why:
What EXACTLY is meant by "de-referencing a NULL pointer"?
One quote from Adam Rosenfield's answer:
A null pointer is a pointer that does not point to any valid data (but it is not the only such pointer). The C standard says that it is undefined behavior to dereference a null pointer. This means that absolutely anything could happen: the program could crash, it could continue working silently, or it could erase your hard drive (although that's rather unlikely).
Is this behavior defined or not?
The behavior is undefined because you are applying the * operator to a pointer that compares equal to a null pointer constant.
The following stackoverflow thread tries to explain what undefined behavior is: Undefined, unspecified and implementation-defined behavior
main()
{
char buffer[6]="hello";
char *ptr3 = buffer +8;
char *str;
for(str=buffer;str <ptr3;str++)
printf("%d \n",str);
}
Here, ptr3 points outside the array bounds. However, if I run this program, I get consecutive memory locations (e.g. 1000...1007). Yet according to the C standard, a pointer pointing more than one past the array bound is explicitly undefined behavior.
My question is: how does the above code result in undefined behavior?
There are multiple occurrences of undefined behavior in your program.
For starters, you're calling printf without the required #include <stdio.h>, and main() should be int main(void). That's not what you're asking about, but you should fix it.
char buffer[6]="hello";
This is ok.
char *ptr3 = buffer +8;
Evaluating the expression buffer +8 has undefined behavior. N1570 6.5.6 specifies the behavior of the + addition operator, and paragraph 8 says:
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
Computing the pointer value by itself has undefined behavior, even if you never dereference it or access its value.
char *str;
for(str=buffer;str <ptr3;str++)
printf("%d \n",str);
You're passing a char* value to printf, but %d requires an argument of type int. Passing a value of the wrong type to printf also has undefined behavior.
If you want to print the pointer value, you need to write:
printf("%p\n", (void*)str);
which will likely print the pointer value in hexadecimal, depending on the implementation. (I've removed the unnecessary trailing space.)
When str points to buffer[5], str++ is valid; it causes str to point just past the end of buffer. (Dereferencing str after that would have undefined behavior, but you don't do that.) Incrementing str again after that has undefined behavior. The comparison str < ptr3 also has undefined behavior, since ptr3 has an invalid value -- but you already triggered undefined behavior when you initialized ptr3, so this is just icing on the proverbial cake.
Keep in mind that "undefined behavior" means that the C standard does not define the behavior. It doesn't mean that the program will crash or print an error message. In fact the worst possible consequence of undefined behavior is that the code seems to "work"; it means that you have a bug, but it's going to be difficult to diagnose and fix it.
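Putting those fixes together, a corrected sketch of the program (buffer + 6, one past the end, is the last pointer value you may legally compute):
#include <stdio.h>

int main(void)
{
    char buffer[6] = "hello";
    char *end = buffer + 6;          /* one past the last element: valid */
    char *str;
    for (str = buffer; str < end; str++)
        printf("%p\n", (void*)str);  /* %p with a void* cast is well defined */
    return 0;
}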
You are seeing the address stored in the pointer. If you want the value, you need to use the dereference (*) operator in the printf.
The other thing is, if you want to see characters and not ASCII codes, you should use %c in printf:
printf("%c\n",*str);
In C, you can always add two numbers. You can always add an integer to a pointer, or subtract two pointers. You will always get an "answer": the compiler will generate code and the code will execute. That's no assurance that answer is valid, useful, or even defined.
The C standard defines the language. Within the scope of what the syntax admits, it defines what's valid -- what definitely means something -- and what's not. When you color outside those lines, the compiler may produce weird code or no code. In C, it's not the job of the compiler to anticipate every weird circumstance and arrive at a reasonable answer. The compiler writer assumes the programmer knows the rules, and is not required to verify he followed them.
There are lots of examples of valid syntax that's meaningless or undefined. In math, you cannot take the log of a negative number, and you cannot divide by zero. Dividing by zero doesn't yield zero or not-zero; the operation is undefined.
In your case, ptr3 has a value, duly computed, 8 larger than buffer. That's the result of some pointer arithmetic. So far, so good.
But just because you have a pointer doesn't mean it points to anything. (void*) 0 is explicitly guaranteed not to point to anything. Likewise, your ptr3 doesn't point to anything. It needn't even hold a value 8 larger than buffer. Section 6.5.6 of the C standard defines the result of adding an integer to a pointer, and puts it this way:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
When you say, I am getting consecutive memory locations (for ex.1000.....1007), what you're seeing is behavior. You had to see some behavior. And that behavior is undefined. According to the standard, you could see some other behavior, such as wrapping back to 1000 or 0.
What the compiler accepts and what the standard defines are two different things.
This question is more of an academic one, seeing as there is no valid reason to write your own offsetof macro anymore. Nevertheless, I've seen this home-grown implementation pop-up here and there:
#define offsetof(s, m) ((size_t) &(((s *)0)->m))
Which is, technically speaking, dereferencing a NULL pointer (AFAICT):
C11(ISO/IEC 9899:201x) §6.3.2.3 Pointers Section 3
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant
So the above implementation is, according to how I read the standard, the same as writing:
#define offsetof(s, m) ((size_t) &(((s *)NULL)->m))
It does make me wonder whether, by changing one tiny detail, the following definition of offsetof would be completely legal and reliable:
#define offsetof(s, m) (((size_t)&(((s *) 1)->m)) - 1)
Seeing as, instead of 0, 1 is used as a pointer, and I subtract 1 at the end, the result should be the same. I'm no longer using a NULL pointer. As far as I can tell the results are the same.
So basically: is there any reason why using 1 instead of 0 in this offsetof definition might not work? Can it still cause UB in certain cases, and if so: when and how? Basically, what I'm asking is: am I missing anything here?
Both definitions invoke undefined behavior: in the first definition a null pointer is dereferenced, and in your second definition you are dereferencing an invalid pointer (a pointer that does not point to a valid object). It is not possible in C to write a portable version of the offsetof macro.
Defect Report #44 says:
"In particular, this is why the offsetof macro exists: there was otherwise no portable means to compute such translation-time constants."
(DR#44 is for C89 but nothing has changed in the language in C99 and C11 that would allow a portable implementation.)
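For completeness, the portable route is simply to use the offsetof supplied by <stddef.h>; a minimal sketch (struct s here is just an example type):
#include <stddef.h>
#include <stdio.h>

struct s {
    char c;
    int  m;
};

int main(void)
{
    /* The implementation's own offsetof: portable by definition. */
    printf("offsetof(struct s, m) = %zu\n", offsetof(struct s, m));
    return 0;
}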
I believe the behaviour is implementation-defined. In 6.3.2.3 of n1256:
5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
One problem is that your created pointer does not point to an object.
6.2.4 Storage durations of objects
The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, 33) and retains
its last-stored value throughout its lifetime. 34) If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.
and
J.2 Undefined behaviour
- The value of a pointer to an object whose lifetime has ended is used (6.2.4).
3.19.2 indeterminate value: either an unspecified value or a trap representation
When you convert 1 to a pointer, and the created pointer does not point to an object, the value of the pointer becomes indeterminate. You then use the pointer. Both of those cause undefined behavior.
The conversion of an integer to a pointer is also problematic:
6.3.2.3 Pointers
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation. 67)
The implementation of offsetof that dereferences a NULL pointer invokes undefined behavior. In this implementation it is assumed that the hypothetical structure begins at address 0. You may assume it begins at 1 instead, and yes, that invokes UB too; not because you are dereferencing a null pointer, but because you are dereferencing an invalid pointer.
Nothing in any version of the C standard would forbid a compiler from doing anything it wanted with any macro that would attempt to achieve the effect without defining a storage location to hold the indicated object. Nonetheless, a form like:
#define offsetof(s, m) ((char*)&(((s*)0)->m)-(char*)0)
would probably be pretty safe for pre-C99 compilers. Note that it generates an integer by subtracting one char* from another. That is specified to work and yield a constant value when the pointers access parts of the same valid object, and it will in practice work on any compiler which doesn't notice that a null pointer isn't a valid object. By contrast, the effect of casting a pointer to an integer or vice versa varies across platforms, and there are many platforms where (int)(((char*)&foo)+1) - (int)(char*)&foo may not yield 1.
Note also that the meaning of "Undefined Behavior" has changed recently. It used to be that Undefined Behavior meant that the specification didn't say what compilers had to do, but most compilers would generally choose (sometimes arbitrarily) behavior that was mathematically correct or would make sense on the underlying platform. For example, on a 32-bit processor, int32_t foo=2147483647; foo+=(unsigned char)x; if (foo > 100) ... a compiler might determine that for any possible value of x the mathematically-correct value assigned to foo would be in the range 2147483647 to 2147483903, and thus greater than 100 in any case. Or it might perform the operation using two's-complement arithmetic and perform the comparison on a possibly-wrapped-around value. Newer compilers, however, may do something even more interesting.
A new compiler may look at an expression like the example with foo and infer that if x is zero then foo must remain 2147483647, and that if x is non-zero the compiler is allowed to do whatever it likes; it may therefore infer that the LSB of x must equal zero when the statement is executed, so if the code is preceded by a test for (unsigned char)x==0, that expression would always be true. Given code like the offsetof macro, which would generate Undefined Behavior regardless of the values of any variables, a compiler would be entitled to eliminate not just any code using it, but also any preceding code which could not by any defined means cause program execution to terminate.
Note that casting a non-zero integer literal to a pointer is only Undefined Behavior if there does not exist any object whose address has been taken and cast to an integer so as to yield that same value. Thus, a compiler would not be able to recognize a variant of the pointer-difference-based offsetof macro which cast some non-zero value to a pointer as exhibiting Undefined Behavior unless it could determine that the number in question did not correspond to any pointer. On the other hand, an attempt to cast a non-zero integer to a pointer would on some systems perform a validation check to ensure that the pointer is valid; such a system may then trap if it isn't.
You're not actually dereferencing the pointer; what you're doing is more akin to pointer addition, so using zero should be fine.
The difference of two pointers of the same type is always one.
#include <stdio.h>
#include <string.h>

int main() {
    int a = 5, b = 10, c;
    int *p = &a, *q = &b;
    c = p - q;
    printf("%d", c);
    return 0;
}
Output is 1.
I don't get the reasoning behind it.
The behavior is undefined.
C99 6.5.6 paragraph 9 says:
When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array object;
the result is the difference of the subscripts of the two array
elements.
Paragraph 7 in the same section says:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
Section 4 paragraph 2 says:
If a "shall" or "shall not" requirement that appears outside of a
constraint is violated, the behavior is undefined. Undefined behavior
is otherwise indicated in this International Standard by the words
"undefined behavior" or by the omission of any explicit definition of
behavior. There is no difference in emphasis among these three; they
all describe "behavior that is undefined".
3.4.3 defines the term "undefined behavior" as:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
NOTE Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during translation
or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of a
diagnostic message).
Given the declaration:
int a = 5, b = 10, c;
it's likely that evaluating &b - &a will yield a result that seems reasonable, such as 1 or -1. (Reasonable results are always a possible symptom of undefined behavior; it's undefined, not required to crash.) But the compiler is under no obligation to place a and b at any particular locations in memory relative to each other, and even if it does so, the subtraction is not guaranteed to be meaningful. An optimizing compiler is free to transform your program in ways that assume that its behavior is well defined, resulting in code that can behave in arbitrarily bad ways if that assumption is violated.
By writing &b - &a, you are in effect promising the compiler that that's a meaningful operation. As Henry Spencer famously said, "If you lie to the compiler, it will get its revenge."
Note that it's not just the result of the subtraction that's undefined, it's the behavior of the program that evaluates it.
Oh, did I mention that the behavior is undefined?
In fact, the behaviour of your program is undefined. The output happens to be 1 on your platform/compiler, but it could just as easily be something else.
This code exhibits undefined behaviour because pointer arithmetic is only defined when the operands are both in the same array or struct.
The value of 1 comes about because the compiler has placed a and b next to each other in memory. Another compiler could do something different. Indeed the same compiler could do something different the next time you change the code.
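By contrast, when both pointers do point into the same array, the subtraction is well defined. A minimal sketch:
#include <stdio.h>

int main(void)
{
    int arr[10];
    int *p = &arr[2], *q = &arr[7];
    printf("%td\n", q - p);   /* prints 5: the difference in elements */
    return 0;
}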
The C compiler knows the size of each type. For example, suppose P is an int pointer holding the address 0x0010. If you add 1 to P (P++ or P = P + 1), the value of P becomes 0x0014, because int is 4 bytes here.
As for your question: a and b are declared in tandem, so in physical memory they may sit next to each other, with their starting addresses 4 bytes apart. The compiler knows that int is 4 bytes on this platform, so when you subtract two int pointers, it divides the raw address difference by 4.
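To see that scaling in action, compare the raw byte distance with the element distance (a sketch; it assumes nothing beyond whatever sizeof(int) happens to be):
#include <stdio.h>

int main(void)
{
    int arr[2];
    char *c0 = (char*)&arr[0], *c1 = (char*)&arr[1];
    printf("byte distance:    %td\n", c1 - c0);            /* sizeof(int), e.g. 4 */
    printf("element distance: %td\n", &arr[1] - &arr[0]);  /* always 1 */
    return 0;
}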