gcc and clang are giving different results - c

#include <stdio.h>

int main(int argc, char *argv[]){
    int a;
    int *b = &a;
    a = 10;
    printf("%d %d\n", a, *b);

    int p = 20;
    int *q;
    *q = p;
    printf("%d %d\n", p, *q);

    int *t = NULL;
    return 0;
}
The above program, when compiled with gcc, gives a segmentation fault on execution. But when compiled with clang, it executes without a segmentation fault. Can anybody give the reason? The gcc version is 9.3.0 and the clang version is 10.0.0; the OS is Ubuntu 20.04.

Problem:
The problem does not stem from the compiler; it is in the code itself, specifically *q = p. When you dereference a pointer, i.e. use *, you access the memory at the address stored in the pointer. Here the code is invalid because no memory was ever assigned to q: it points to nowhere (at least to nowhere we'd like it to point). You can't store anything in the memory it points to, because that memory doesn't exist or is some random location named by whatever garbage value happens to be stored in q.
Behavior explained:
Given the above, and knowing that the value stored in q can be anything, you can and should expect different results from different compilers, from different versions of the same compiler, or even from the same compiler on the same machine across different executions of the same program. At some point q may even point exactly where you want it to, in a one-in-a-trillion chance, and the program would then give you the expected result. That is why the behavior of your program is called undefined.
Fixing it:
As already stated, q needs to point to some valid memory location before you can store a value in that location. You can do that either by allocating memory and assigning it to q, or by making q point to an existing valid memory address, e.g.:
int p;
int *q;
q = &p; // now q points to p, i.e. its value is the address of p
Now you can do:
*q = 10; // stores 10 in the memory address pointed to by (stored in) q, the address of p
// p is now 10
Or
int value = 20;
*q = value; // stores a copy of value at the address pointed to by q, again p
// p is now 20
Extra:
Note that if you use extra warning flags like -Wall or -Wextra among others, the compiler is likely to warn you about faulty constructs like the one you have.
gcc warning flags manual
clang warning flags manual

I'm not an expert in compilers, but you are for sure triggering some undefined behaviour right here:
int *q;
*q = p;
printf("%d %d\n", p, *q);
You are dereferencing pointer q before initializing it.
There can be several reasons why this segfaults (or rather, doesn't segfault). q could point to any memory location; in the Clang case it could, for example, happen to hold the old value of b after it was popped from the stack, so the write lands in non-restricted memory.
Not sure what your original intentions were with this piece of code, though.

The reason is that anything can happen if you use a variable which has not been initialized. If you compile this program with warnings enabled you should get a warning like
t.c:10:5: warning: ‘q’ is used uninitialized in this function [-Wuninitialized]
*q = p;
~~~^~~
Before initialization, a variable can have any value, namely whatever happens to be at the memory location where the variable is allocated. That's why the runtime behavior is unpredictable. The following picture illustrates the situation before the assignment of p:
Since we don't know where q points, we cannot dereference (follow) the pointer.

Can anybody give the reason?
The reason comes from the C language itself and how compilers were constructed.
First, the C language: your code invokes undefined behavior. Using an uninitialized variable is undefined behavior on its own, and on top of that you are applying the * operator to an "invalid" pointer. Bottom line: there is undefined behavior.
Now, because there is undefined behavior, compilers can do what they want and generate code however they want. In short, there are no requirements.
Because of that, compiler writers do not care what their compilers do in undefined-behavior cases. The two compilers were constructed differently and act differently in this specific case when compiling this specific code. It was not deliberate; no one cares about this case, so some unrelated design decisions in unrelated areas resulted in the differing behavior of the two compilers.
The specific reasons why the behaviors of the two compilers differ can only come from inspecting the source code of both compilers. In this case, inspecting llvm with its documentation and gcc with the gcc developer options will be helpful along the way.

The line *q=p; uses the value of q which is uninitialized; accessing an uninitialized variable is undefined behaviour, allowing the compilers to interpret both that line of code and anything preceding or following the line in any way at all.
It'd probably give different results for different levels of optimization as well.

Related

Following C code compiles and runs, but is it undefined behaviour?

I posted a question about some pointer issues I've been having earlier in this question:
C int pointer segmentation fault several scenarios, can't explain behaviour
From some of the comments, I've been led to believe that the following:
#include <stdlib.h>
#include <stdio.h>

int main(){
    int *p;
    *p = 1;
    printf("%d\n", *p);
    return 0;
}
is undefined behaviour. Is this true? I do this all the time, and I've even seen it in my C course.
However, when I do
#include <stdlib.h>
#include <stdio.h>

int main(){
    int *p = NULL;
    *p = 1;
    printf("%d\n", *p);
    return 0;
}
I get a seg fault right before printing the contents of p (after the line *p = 1;). Does this mean I should always have been mallocing any time I actually assign a value for a pointer to point to?
If that's the case, then why does char *string = "this is a string" always work?
I'm quite confused, please help!
This:
int *p;
*p = 1;
Is undefined behavior because p isn't pointing anywhere. It is uninitialized. So when you attempt to dereference p you're essentially writing to a random address.
What undefined behavior means is that there is no guarantee what the program will do. It might crash, it might output strange results, or it may appear to work properly.
This is also undefined behavior:
int *p=NULL;
*p = 1;
Because you're attempting to dereference a NULL pointer.
This works:
char *string = "this is a string";
Because you're initializing string with the address of a string constant. It's not the same as the other two cases. It's actually the same as this:
char *string;
string = "this is a string";
Note that here string isn't being dereferenced. The pointer variable itself is being assigned a value.
Yes, doing int *p; *p = 1; is undefined behavior. You are dereferencing an uninitialized pointer (accessing the memory to which it points). If it works, it is only because the garbage in p happened to be the address of some region of memory which is writable, and whose contents weren't critical enough to cause an immediate crash when you overwrote them. (But you still might have corrupted some important program data causing problems you won't notice until later...)
An example as blatant as this should trigger a compiler warning. If it doesn't, figure out how to adjust your compiler options so it does. (On gcc, try -Wall -O).
Pointers have to point to valid memory before they can be dereferenced. That could be memory allocated by malloc, or the address of an existing valid object (p = &x;).
char *string = "this is a string"; is perfectly fine because this pointer is not uninitialized; you initialized it! (The * in char *string is part of its declaration; you aren't dereferencing it.) Specifically, you initialized it with the address of some memory which you asked the compiler to reserve and fill in with the characters this is a string\0. Having done that, you can safely dereference that pointer (though only to read, since it is undefined behavior to write to a string literal).
is undefined behaviour. Is this true?
Sure is. It just looks like it's working on your system with what you've tried, but you're performing an invalid write. The version where you set p to NULL first is segfaulting because of the invalid write, but it's still technically undefined behavior.
You can only write to memory that's been allocated. If you don't need the pointer, the easiest solution is to just use a regular int.
int p = 1;
In general, avoid pointers when you can, since automatic variables are much easier to work with.
Your char* example works because of the way strings work in C: there's a block of memory with the sequence "this is a string\0" somewhere in memory, and your pointer points at it. This would be read-only memory though, and trying to change it (e.g., string[0] = 'T';) is undefined behavior.
With the line
char *string = "this is a string";
you are making the pointer string point to a place in read-only memory that contains the string "this is a string". The compiler/linker will ensure that this string will be placed in the proper location for you and that the pointer string will be pointing to the correct location. Therefore, it is guaranteed that the pointer string is pointing to a valid memory location without any further action on your part.
However, in the code
int *p;
*p = 1;
p is uninitialized, which means it is not pointing to a valid memory location. Dereferencing p will therefore result in undefined behavior.
It is not necessary to always use malloc to make p point to a valid memory location. It is one possible way, but there are many other possible ways, for example the following:
int i;
int *p;
p = &i;
Now p is also pointing to a valid memory location and can be safely dereferenced.
Consider the code:
#include <stdio.h>
int main(void)
{
int i=1, j=2;
int *p;
... some code goes here
*p = 3;
printf("%d %d\n", i, j);
}
Would the statement *p = 3; write to i, to j, or to neither? It would write to i or j if p points to that object, but not if p points somewhere else. If the ... portion of the code doesn't do anything with p, then p might happen to point to i, or to j, or to something within the stdout object, or anywhere at all. If it happens to point to i or j, then the write *p = 3; might affect that object without other side effects, but if it points to information within stdout that controls where output goes, it might cause the following printf to behave in unpredictable fashion. In a typical implementation, p might point anywhere, and there will be so many things to which p might point that it would be impossible to predict all of the possible effects of writing to them.
Note that the Standard classifies many actions as "Undefined Behavior" with the intention that many or even most implementations will extend the semantics of the language by documenting their behavior. Most implementations, for example, extend the meaning of the << operator to allow it to be used to multiply negative numbers by power of two. Even on implementations that extend the language to specify that an assignment like *p = 3; will always perform a word-sized write of the value 3 to the indicated address, with whatever consequence results, there would be relatively few platforms(*) where it would be possible to fully characterize all possible effects of that action in cases where nothing is known about the value of p. In cases where pointers are read rather than written, some systems may be able to offer useful behavioral guarantees about the effect of arbitrary stray reads, but not all(**).
(*) Some freestanding platforms which keep code in read-only storage may be able to uphold some behavioral guarantees even if code writes to arbitrary pointer addresses. Such behavioral guarantees may be useful in systems whose state might be corrupted by electrical interference, but even when targeting such systems writing to a stray pointer would never be useful.
(**) On many platforms, stray reads will either yield a meaningless value without side effects or force an abnormal program termination, but on an Apple II with a Disk II card in the customary slot-6 location, if code reads from address 0xC0EF within a second of performing a disk access, the drive head will start overwriting whatever happens to be on the last track accessed. This is by design (software that needs to write to the disk does so by accessing address 0xC0EF, and having the hardware respond to both reads and writes required one less logic gate, and thus one less chip, than hardware that responded only to writes), but it does mean that code must be careful not to perform any stray reads.

Do C pointers (always) start with a valid memory address?

Do C pointers (always) start with a valid memory address? For example, if I have the following piece of code:
int *p;
*p = 5;
printf("%i", *p); // shows 5
Why does this piece of code work? According to books (that I read), a pointer always needs a valid memory address, and they give the following and similar examples:
int *p;
int v = 5;
p = &v;
printf("%i", *p); // shows 5
Do C pointers (always) start with a valid memory address?
No.
Why does this code work?
The code invokes undefined behavior. If it appears to work on your particular system with your particular compiler options, that's merely a coincidence.
No. Uninitialized local variables have indeterminate values and using them in expressions where they get evaluated cause undefined behavior.
The behaviour is undefined. A C compiler can optimize the pointer access away, noting that p itself is in fact never used, only the object *p, and replace *p with q, effectively producing a program that corresponds to this source code:
#include <stdio.h>

int main(void) {
    int q = 5;
    printf("%i", q); // shows 5
}
Such is the case when I compile the program with GCC 7.3.0 and the -O3 switch: no crash. I get a crash if I compile it without optimization. Both programs are standard-conforming interpretations of the code, given that dereferencing a pointer which does not point to a valid object has undefined behaviour.
No.
In older times, it was common to initialize a pointer to a selected memory address (e.g. one linked to hardware):
char *start_memory = (char *)0xffffb000;
The compiler has no way to find out whether this is a valid address. This involves a cast, so it is cheating.
Consider
static int *p;
p will have the value NULL, which doesn't point to a valid address (on Linux the kernel keeps that address unmapped; another OS could use the memory around address 0 to store some data).
But you may also create uninitialized variables, which have indeterminate initial values (relying on those values is wrong).

Is casting a pointer to a double pointer acceptable within C?

I was just curious whether this is correct for assigning the value 888 to c, and if it is not, then why. I haven't found anything saying it is not, and when I looked in the C language specification it appeared as if it was correct.
int** ppi;
int c = 6;
ppi = (int**)(&c);
*ppi = 888;
I have used it within several IDE's and with several compilers, but none have given me an error. However, some of my friends have said that this code should throw an error.
I was trying to change the value of c without adding in an intermediate pointer.
I know the following will work, but I was not sure if doing it the above way would work as well.
int** ppi;
int* pi;
int c = 6;
pi = &c;
ppi = &pi;
**ppi = 888;
The code causes undefined behaviour in 4 different ways; it is certainly not "correct" or "acceptable" as some of the other answers seem to be suggesting.
Firstly, *ppi = 888; attempts to assign an int to an lvalue of type int *. This violates constraint 6.5.16.1/1 of the assignment operator, which lists the types that may be assigned to each other; integer to pointer is not in the list.
Being a constraint violation, the compiler must issue a diagnostic and may refuse to compile the program. If the compiler does generate a binary then that is outside the scope of the C Standard, i.e. completely undefined.
Some compilers, in their default mode of operation, will issue the diagnostic and then proceed as if you had written *ppi = (int *)888;. This brings us to the next set of issues.
The behaviour of casting 888 to int * is implementation-defined. It might not be correctly aligned (causing undefined behaviour), and it might be a trap representation (also causing undefined behaviour). Furthermore, even if those conditions pass, there is no guarantee that (int *)888 has the same size or representation as (int)888 as your code relies on.
The next major issue is that the code violates the strict aliasing rule. The object declared as int c; is written using the lvalue *ppi which is an lvalue of type int *; and int * is not compatible with int.
Yet another issue is that the write may write out of bounds. If int is 4 bytes and int * is 8 bytes, you tried to write 8 bytes into a 4-byte allocation.
Another problem from earlier in the program is that ppi = (int**)(&c); will cause undefined behaviour if c is not correctly aligned for int *, e.g. perhaps the platform has 4-byte alignment for int and 8-byte alignment for pointers.
This is not acceptable. Unless you have some really good reason to know that there's an int being stored at the memory address 888, this is invalid code which will lead to either crashes or undefined behavior if you dereference the pointer twice (and if you don't plan to do that, there's little point in using an int **).
ppi contains a pointer that points to a memory location that itself contains a pointer to an int.
int c=6; creates storage for an int and puts the value 6 into that storage giving:
ppi : [ some pointer ]
c : [ 6 ]
The line
ppi = (int**)(&c)
is telling the compiler: "never mind that &c is a pointer to int; assume it's a pointer that holds a pointer to int, and store that in ppi". So at this point, ppi will contain the address of c (whatever that may be). So we have
ppi : [ &c ]
c : [ 6 ]
The next line
*ppi = 888;
is telling the compiler: "Store the value 888 at the location ppi points to."
So ppi points at c, which contains 6, so we'd expect the value of c to be modified to 888. But wait: c is an int, and depending on how much space an int takes, it may not be big enough to hold a pointer-sized write. This is the biggest problem here.
int** ppi;
int c = 6;
ppi = (int**)(&c); // Cast from int* to int** may be lossy or trap due to alignment issues
*ppi = 888; // 888 is not an int* nor implicitly convertible. Whether casting
// is allowed, and what that means, depends on the implementation
Regarding compilation giving you an error:
While the last assignment forces the compiler to give a diagnostic message, any singular one is enough. Whether that is called an error, how much detail it contains, and if that breaks the build is at the discretion of the implementation. There are probably options.

const int *ptr=500; where exactly is it stored

I know that with const int *ptr we can change the address but cannot change the value; i.e., ptr itself will be stored in a read-write section (the stack) and the object or entity will be stored in a read-only section of the data segment. So we can change the address held by the pointer ptr, but cannot change the object, which is constant.
int main()
{
const int *ptr=500;
(*ptr)++;
printf("%d\n",*ptr);
}
The output is a compile-time error along the lines of "assignment of read-only location '*ptr'".
int main()
{
const int *ptr=500;
ptr++;
printf("%d\n",*ptr);
}
No compilation errors, but at runtime the output is "segmentation fault".
I agree with the first one. But why am I getting a segmentation fault in the second one? Where exactly are they stored?
The reason for the segmentation fault is different from what you think.
It is not because of const.
It is because you are not allowed to access the area that you are trying to access when doing *ptr
When you make a pointer to "something", you are still not allowed to access the data (aka dereference the pointer) until you have made the pointer point to some memory that belongs to you.
Example:
int x = 0;
int* p = (int*)500;
int a = *p; // Invalid - p is not pointing to any memory that belongs to the program
p = &x;
int b = *p; // Fine - p is pointing to the variable x
p++;
int c = *p; // Invalid - p is not pointing to any memory that belongs to the program
The "invalid" code may give a segmentation fault. On the other hand, it may also just execute and produce unexpected results (or even worse: produce the expected result).
Lots of confusion here.
and the object or entity will be stored in read-only section of data segement
No, there is no requirement for where the pointed-at object is stored. This is only determined by any qualifiers/specifiers such as const or static, when declaring the pointed-at object.
const int *ptr=500;
This is not valid C and the code must result in a compiler message. An integer cannot be assigned to a pointer; there must be a conversion in between. GCC has a known flaw here: you have to configure it to be a standards-conforming compiler, gcc -std=c11 -pedantic-errors.
If you had code such as const int *ptr=(int*)500; which is valid C, then it would set the pointer to point at address 500. If there is an int at address 500, the code will work fine. If there is no memory there that you are allowed to access, then you will get some implementation-defined behavior like a crash - memory mapping is beyond the scope of the language.
(*ptr)++;
This is not valid C and the code must result in a compiler message. You are not allowed to modify a read-only location.
Overall, your compiler seems very poorly configured. GCC, correctly configured, gives 2 compiler errors.
const int *ptr=500; // WRONG
This declares a local variable which is a pointer to some constant integer. The const just tells the compiler that it is not allowed to update (overwrite) the memory cell reached by dereferencing the pointer.
However, your code is not correct; you probably want:
const int *ptr = (const int*)500;
The pointer is initialized to address 500 (you initialize the pointer).
On most systems, that address (and the following ones, e.g. address 504, since sizeof(int) is 4) is outside the virtual address space. So dereferencing it (with *ptr) is undefined behavior and would often give a segmentation fault. See also this.
ptr will be stored in read-write section(stack) and the object or entity will be stored in read-only section of data segement.
This is wrong. Nothing is done at compilation time to keep the memory zone in a read-only text segment (however, most compilers do put most literals and const static or global data, defined at compile time, there). You merely forbid the compiler from updating the pointed-to thing (without a cast).
If you need a read-only memory zone at runtime, you need to ask your OS for it (e.g. using mmap(2) & mprotect(2) on Linux). BTW protection works in pages.
On Linux, use pmap(1) (or proc(5), e.g. read sequentially the pseudo file /proc/self/maps from your program). You may want to add
char cmdbuf[64];
snprintf(cmdbuf, sizeof(cmdbuf), "pmap %d", (int) getpid());
system(cmdbuf);
before any dereference of ptr in your code to understand what is its virtual address space.
Try
cat /proc/self/maps
and
cat /proc/$$/maps
and understand their output (notice that $$ is expanded to the pid of your shell). You might also experiment with strace(1) on your faulty program (which you should compile with gcc -Wall -g).

Evaluating a condition containing an uninitialized pointer - UB, but can it crash?

Somewhere on the forums I encountered this:
Any attempt to evaluate an uninitialized pointer variable
invokes undefined behavior. For example:
int *ptr; /* uninitialized */
if (ptr == NULL) ...; /* undefined behavior */
What is meant here?
Is it meant that if I ONLY write:
if(ptr==NULL){int t;};
this statement is already UB?
Why? I am not dereferencing the pointer right?
(I noticed there maybe terminology issue, by UB in this case, I referred to: will my code crash JUST due to the if check?)
Using uninitialized variables invokes undefined behavior. It doesn't matter whether it is a pointer or not.
int i;
int j = 7 * i;
is undefined as well. Note that "undefined" means that anything can happen, including a possibility that it will work as expected.
In your case:
int *ptr;
if (ptr == NULL) { int i = 0; /* this line does nothing at all */ }
ptr might contain anything; it can be some random trash, but it could be NULL too. This code will most likely not crash, since you are just comparing the value of ptr to NULL. We don't know whether execution enters the condition's body, and we can't even be sure that the value can be read successfully; therefore, the behavior is undefined.
Your pointer is not initialized. Your statement would be the same as:
int a;
if (a == 3){int t;}
since a is not initialized, its value can be anything, so you have undefined behavior. It doesn't matter whether you dereference your pointer or not. (If you did dereference it, you might well get a segfault.)
The C99 draft standard says it is undefined clearly in Annex J.2 Undefined behavior:
The value of an object with automatic storage duration is used while it is
indeterminate (6.2.4, 6.7.8, 6.8).
and the normative text has an example that also says the same thing in section 6.5.2.5 Compound literals paragraph 17 which says:
Note that if an iteration statement were used instead of an explicit goto and a labeled statement, the lifetime of the unnamed object would be the body of the loop only, and on entry next time around p would have an indeterminate value, which would result in undefined behavior.
and the draft standard defines undefined behavior as:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
and notes that:
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
As Shafik has pointed out, the C99 standard draft declares any use of uninitialized variables with automatic storage duration undefined behaviour. That amazes me, but that's how it is. My rationale for the pointer case comes below, but similar reasons must hold for other types as well.
After int *pi; if (pi == NULL){} your program is allowed to do arbitrary things. In reality, on PCs, nothing will happen. But there are architectures out there which have illegal address values, much like NaN floats, that will cause a hardware trap when they are loaded into a register. These architectures, unheard of to us modern PC users, are the reason for this provision. Cf. e.g. How does a hardware trap in a three-past-the-end pointer happen even if the pointer is never dereferenced?.
The behavior of this is undefined because of how the stack is used for function calls. When a function is called, the stack grows to make space for the variables within that function's scope, but this memory space is not cleared or zeroed out.
This can be shown to be unpredictable in code like the following:
#include <stdio.h>

void test()
{
    int *ptr;
    printf("ptr is %p\n", ptr);
}

void another_test()
{
    test();
}

int main()
{
    test();
    test();
    another_test();
    test();
    return 0;
}
This simply calls the test() function multiple times, and test() just prints the garbage value that happens to be in 'ptr'. You might expect to get the same result each time, but as the stack is manipulated, the leftover data at ptr's location changes and cannot be known in advance.
On my machine running this program results in this output:
ptr is 0x400490
ptr is 0x400490
ptr is 0x400575
ptr is 0x400585
To explore this a bit more, consider the possible security implications of using pointers that you have not intentionally set yourself
#include <stdio.h>

void test()
{
    int *ptr;
    printf("ptr is %p\n", ptr);
}

void something_different()
{
    int *not_ptr_or_is_it = (int*)0xdeadbeef;
}

int main()
{
    test();
    test();
    something_different();
    test();
    return 0;
}
This results in something that is undefined even though it looks predictable. It is undefined because on some machines this will work the same way and on others it might not work at all; it's part of the magic that happens when your C code is converted to machine code.
ptr is 0x400490
ptr is 0x400490
ptr is 0xdeadbeef
Some implementations may be designed in such a way that an attempted rvalue conversion of an invalid pointer may cause arbitrary behavior. Other implementations are designed in such a way that an attempt to compare any pointer object with null will never do anything other than yield 0 or 1.
Most implementations target hardware where pointer comparisons simply compare bits without regard for whether those bits represent valid pointers. The authors of many such implementations have historically considered it so obvious that a pointer comparison on such hardware should never have any side-effect other than to report that pointers are equal or report that they are unequal that they seldom bothered to explicitly document such behavior.
Unfortunately, it has become fashionable for implementations to aggressively "optimize" Undefined Behavior by identifying inputs that would cause a program to invoke UB, assuming such inputs cannot occur, and then eliminating any code that would be irrelevant if such inputs were never received. The "modern" viewpoint is that because the authors of the Standard refrained from requiring side-effect-free comparisons on implementations where such a requirement would impose significant expense, there's no reason compilers for any platform should guarantee them.
You're not dereferencing the pointer, so you don't end up with a segfault. It will not crash. I don't understand why anyone thinks that comparing two numbers will crash. It's nonsense. So again:
IT WILL NOT CRASH. PERIOD.
But it's still UB. You don't know what memory address the pointer contains. It may or may not be NULL. So your condition if (ptr == NULL) may or may not evaluate to true.
Back to my IT WILL NOT CRASH statement. I've just tested the pointer going from 0 to 0xFFFFFFFF on the 32-bit x86 and ARMv6 platforms. It did not crash.
I've also tested the 0..0xFFFFFFFF and 0xFFFFFFFF00000000..0xFFFFFFFFFFFFFFFF ranges on an amd64 platform. Checking the full range would take a few thousand years, I guess.
Again, it did not crash.
I challenge the commenters and downvoters to show a platform and value where it crashes. Until then, I'll probably be able to survive a few negative points.
There is also an SO link to trap representation, which also indicates that it will not crash.
