GCC: why constant variables not placed in .rodata

GCC: why constant variables not placed in .rodata - c

I've always believed that GCC would place a static const variable to .rodata segments (or to .text segments for optimizations) of an ELF or such file. But it seems not that case.
I'm currently using gcc (GCC) 4.7.0 20120505 (prerelease) on a laptop with GNU/Linux. And it does place a static constant variable to .bss segment:
/*
* this is a.c, and in its generated asm file a.s, the following line gives:
* .comm a,4,4
* which would place variable a in .bss but not .rodata(or .text)
*/
static const int a;
int main()
{
int *p = (int*)&a;
*p = 0; /* since a is in .data, write access to that region */
/* won't trigger an exception */
return 0;
}
So, is this a bug or a feature? I've decided to file this as a bug to bugzilla but it might be better to ask for help first.
Are there any reasons that GCC can't place a const variable in .rodata?
UPDATED:
As tested, a constant variable with an explicit initialization(like const int a = 0;) would be placed into .rodata by GCC, while I left the variable uninitialized. Thus this question might be closed later -- I didn't present a correct question maybe.
Also, in my previous words I wrote that the variable a is placed in '.data' section, which is incorrect. It's actually placed into .bss section since not initialized. Text above now is corrected.

The compiler has made it a common, which can be merged with other compatible symbols, and which can go in bss (taking no space on disk) if it ends up with no explicitly initialized definition. Putting it in rodata would be a trade-off; you'd save memory (commit charge) at runtime, but would use more space on disk (potentially a lot for a huge array).
If you'd rather it go in rodata, use the -fno-common option to GCC.

Why GCC does it? Can't really answer that question without asking the developers themselves. If I'm allowed to speculate, I'd wager it has to do with optimization--compilers don't have to enforce const.
That said, I think it's better if we look at the language itself, particularly undefined behavior. There are a few mentions of undefined behavior, but none of them go in-depth.
Modifying a constant is undefined behavior. Const is a contract, and that is especially true in C (and C++).
"But what if I const_cast away the const and modify y anyway?" Then you have undefined behavior.
What undefined behavior means is that the compiler is allowed to do quite literally anything it wants, and whatever the compiler decides to do will not be considered a violation of the ISO 9899 standard.
3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
ISO/IEC 9899:1999, §3.4.3
What this means is that, because you have invoked undefined behavior, anything the compiler does is technically correct by way of not being incorrect. Ergo, it is correct for GCC to take...
static const int a = 0;
...and turn it into a .rodata symbol, while taking...
static const int a; // guaranteed to be zero
...and turning it into a .bss symbol.
In the former case, any attempt to modify a--even by proxy--will typically result in a segmentation violation, causing the kernel to force-kill the running program. In the latter case, the program will probably run without crashing.
That said, it is not reasonable to guess which one the compiler will do. Const is a contract, and it is up to you, the programmer, to uphold that contract by not modifying data that is supposed to be constant. Violating that contract means undefined behavior, and all the portability issues and program bugs that come with it.
So GCC can do a couple things.
It might write the symbol to .rodata, giving it protection under the OS kernel
It might write the object to somewhere where memory protection is not guaranteed, in which case...
It might change the value
It might change the value and immediately change it back
It might completely delete the offending code under the rationale that the value isn't changing (0 -> 0), essentially optimizing...
int main(){
int *p = &a;
*p = 0;
return 0;
}
...to...
int main(void){
return 0;
}
It might even send a model T-800 back in time to terminate your parents before you're born.
All of these behaviors are legal (well, legal in the sense of adhering to the standard), so the bug report was not warranted.

writing to an object that has been declared const qualified is undefined behavior: anything can happen, even that.
There is no way in C to declare the object itself to be unmutable, you only forbid it to be mutable through the particular access that you have to it. Here you have an int*, so modification is "allowed" in the sense that the compiler is not forced to issue a diagnostic. Doing a cast in C means that you suppose to know what you are doing.

Are there any reasons that GCC can't place a const variable in .rodata?
Your program is optimized by the compiler (even in -O0 some optimizations are done). Constant propagation is done: http://en.wikipedia.org/wiki/Constant_folding
Try to deceive the compiler like this (note that this program is still technically undefined behavior):
#include <stdio.h>
static const int a;
int main(void)
{
*(int *) &a = printf(""); // compiler cannot assume it is 0
printf("%d\n", a);
return 0;
}

Related

Why this program will never terminate with flag `-O3`?

The program below has different behaviors with different option levels. When I compile it with -O3, it will never terminate. when I compile it with -O0, it will always terminate very soon.
#include <stdio.h>
#include <pthread.h>
void *f(void *v) {
int *i = (int *)v;
*i = 0;
printf("set to 0!\n");
return NULL;
}
int main() {
const int c = 1;
int i = 0;
pthread_t thread;
void *ptr = (void *)&c;
while (c) {
i++;
if (i == 1000) {
pthread_create(&thread, NULL, &f, ptr);
}
}
printf("done\n");
}
This is the result of running it with different optimization flags.
username#hostname:/src$ gcc -O0 main.c -o main
username#hostname:/src$ ./main
done
set to 0!
set to 0!
username#hostname:/src$ gcc -O3 main.c -o main
username#hostname:/src$ ./main
set to 0!
set to 0!
set to 0!
set to 0!
set to 0!
set to 0!
^C
username#hostname:/src$
The answer given by the professor's slide is like this:
Will it always terminate?
Depends of gcc options
With –O3 (all optimisations): no
Why?
The variable c is likely to stay local in a register, hence it will not be shared.
Solution « volatile »
Thank you for your replies. I now realize that volatile is a keyword in C. The description of the volatile keyword:
A volatile specifier is a hint to a compiler that an object may change its values in ways not specified by the language so that aggressive optimizations must be avoided.
According to my understanding, there is a shared register that stores the c value when we use -O3 flag. So the main thread and sub-thread will share it. In this case, if a sub-thread modifies c to 0, the main thread will get 0 when it wants to read c to compare in the while(c) statement. Then, the loop stops.
There is no register storing c that can be shared by the main thread and sub-threads when we use -O0 flag. Though the c is modified by a sub-thread, this change may not be written to memory and just be stored in a register, or it is written to memory while the main thread just uses the old value which is read and saved in a register. As a result, the loop is infinite.
If I declared the c value with const: const volatile int c = 1;, the program will terminate finally even if we compiled it with -O3. I guess all threads will read c from the main memory and write back to the main memory if they change the c value.
I know, according to the specifications or rules about C language, we are not allowed to modify a value that is declared by the const keyword. But I don't understand what is un behavior.
I wrote a test program:
#include "stdio.h"
int main() {
const int c = 1;
int *i = &c;
*i = 2;
printf("c is : %d\n", c);
}
output
username#hostname:/src$ gcc test.c -o test
test.c: In function ‘main’:
test.c:9:14: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
9 | int *i = &c;
| ^
username#hostname:/src$ ./test
c is : 2
username#hostname:/src$
The result is 2 which means a variable declared with the const can be modified but this behavior is not suggested, right?
I also tried changing the judgment condition. If it is changed to while (1){ from while(c){, the loop will be an infinite one no matter using -O0 or -O3
This program is not a good one as it violates the specifications or rules of C language. Actually it comes from the lecture about software security.
Can I just understand like this? All threads share the same register storing c when we compile the program with -O0.
While the value c is in un-shared registers, so main thread is not informed when sub-threads modify value c when we use -O3. Or, while(c){ is replaced by while(1){ when we use -O3 so the loop is infinite.
I know this question can be solved easily if I check the generated assembly code. But I am not good at it.

This is undefined behavior. Per 6.7.3 Type qualifiers, paragraph 6 of the (draft) C11 standard:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
There's no requirement for any particular behavior on the program. How it behaves is literally outside the specifications of the C language.
Your professor's observation of how it behaves may be correct. But he goes off the rails. There is no "why" for undefined behavior. What happens can change with changes to compiler options, particulars of the source code, time of day, or phase of the moon. Anything. Any expectation for any particular behavior is unfounded.
And
Solution « volatile »
is flat-out WRONG.
volatile does not provide sufficient guarantees for multithreaded access. See Why is volatile not considered useful in multithreaded C or C++ programming?.
volatile can appear to "work" because of particulars of the system, or just because any race conditions just don't happen to be triggered in an observable manner, but that doesn't make it correct. It doesn't "work" - you just didn't observe any failure. "I didn't see it break" does not mean "it works".
Note that some C implementations do define volatile much more extensively than the C standard requires. Microsoft in particular defines volatile much more expansively, making volatile much more effective and even useful and correct in multithreaded programs.
But that does not apply to all C implementations. And if you read that link, you'll find it doesn't even apply to Microsoft-compiled code running on ARM hardware...

The professor's explanation is not quite right.
The initial value of c is 1, which is truthy. It's declared as a constant, so its value can't change. Thus, the condition in while (c) is guaranteed to always be true, so there's no need to test the variable at all when the program is running. Just generate code for an infinite loop.
This optimization of not reading the variable is not done when optimization is disabled. In practice, declaring the variable volatile also forces it to be read whenever the variable is referenced in code.
Note that optimizations are implementation-dependent. Assigning to a const variable by accessing it through a non-const pointer results in undefined behavior, so any result is possible.
The typical use of a const volatile variable is for variables that reference read-only hardware registers that can be changed asynchronously (e.g. I/O ports on microcontrollers). This allows the application to read the register but code that tries to assign to the variable will not compile.

The explanation of "The variable c is likely to stay local in a register, hence it will not be shared." is not quite right. Or I'm having trouble parsing its precise meaning.
Once you take a pointer to it, the compiler has to put it into memory, unless it can convince itself that the pointer will not be used.
Here https://godbolt.org/z/YavbYxqoE
mov DWORD PTR [rsp+4], 1
and
lea rcx, [rsp+4]
suggest to me that the compiler has put the variable on the stack.
It's just that the while loop is not checking it for changes due to it being advertised as const.

Different ways of suppressing 'uninitialized variable warnings' in C

I have encountered multiple uses of the uninitialized_var() macro designed to get rid of warnings like:
warning: ‘ptr’ is used uninitialized in this function [-Wuninitialized]
For GCC (<linux/compiler-gcc.h>) it is defined such a way:
/*
* A trick to suppress uninitialized variable warning without generating any
* code
*/
#define uninitialized_var(x) x = x
But also I discovered that <linux/compiler-clang.h> has the same macro defined in a different way:
#define uninitialized_var(x) x = *(&(x))
Why we have two different definitions? For what reason the first way may be insufficient? Is the first way insufficient just for Clang or in some other cases too?
c example:
#define uninitialized_var(x) x = x
struct some {
int a;
char b;
};
int main(void) {
struct some *ptr;
struct some *uninitialized_var(ptr2);
if (1)
printf("%d %d\n", ptr->a, ptr2->a); // warning about ptr, not ptr2
}

Compilers are made to recognize certain constructs as indications that the author intended something deliberately, when the compiler would otherwise warn about it. For example, given if (b = a), GCC and Clang both warn that an assignment is being used as a conditional, but they do not warn about if ((b = a)) even though it is equivalent in terms of the C standard. This particular construct with extra parentheses has simply been set as a way to tell the compiler the author truly intends this code.
Similarly, x = x has been set as a way to tell GCC not to warn about x being uninitialized. There are times where a function may appear to have a code path in which an object is used without being initialized, but the author knows the function is intended not to be used with parameters that would ever cause that particular code path to be executed and, for reasons of efficiency, they want to silence the compiler warning rather than add an initialization that is not actually necessary for program correctness.
Clang was presumably designed not to recognize GCC’s idiom for this and needed a different method.

Why we have two different definitions?
Unclear, but I speculate that it's because Clang still produces a warning for x = x when x is uninitialized, but not for x = *(&(x)). Under almost every circumstance* in which one of those expressions has well-defined behavior, the other has the same well-defined behavior. Under other circumstances, such as when the value of x is undefined or indeterminate, both have undefined behavior, or the behavior of x = x is defined and that of x = *(&(x)) undefined, so the latter provides no advantage.
For what reason the first way may be insufficient?
Because the behavior of both is undefined in the use cases for which they seem to be intended. It is not at all surprising, then, that different compilers handle them differently.
Is the first way insufficient just for Clang or in some other cases too?
Both expressions' meaning and behavior is undefined. In one sense, then, one cannot safely conclude that either is sufficient for anything. In the empirical sense of whether using one or the other fools certain compilers into not emitting warnings that they otherwise would, and still should, emit, it is likely that there were, are, and / or will be compilers that handle the undefined behavior associated with both of those expressions differently than GCC and Clang do.
* The exception being when x is declared with register storage class, in which case the second expression has undefined behavior regardless of whether x has a well-defined value.

Is the behaviour of a program that has undefined behaviour on an unreachable path defined? [duplicate]

This question already has answers here:
Can code that will never be executed invoke undefined behavior?
(9 answers)
Closed 4 years ago.
Consider
void swap(int* a, int* b)
{
if (a != b){
*a = *a ^ *b;
*b = *a ^ *b;
*a = *a ^ *b;
}
}
int main()
{
int a = 0;
int b = 1;
swap(&a, &b); // after this b is 0 and a is 1
return a > b ? 0 : a / b;
}
swap is an attempt to fool the compiler into not optimising out the program.
Is the behaviour of this program defined? a / b is never reachable, but if it was then you'd get a division by zero.

It is not necessary to base a position on this question on the usefulness of any given code construct or practice, nor on anything written about C++, whether in its standard or in another SO answer, no matter how similar C++'s definitions may be. The key thing to consider is C's definition of undefined behavior:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
(C2011, 3.4.3/1; emphasis added)
Thus, undefined behavior is triggered temporally ("upon use" of a construct or data), not by mere presence.* It is convenient that this is consistent for undefined behavior arising from data and that arising from program constructs; the standard need not have been consistent there. And as another answer describes, this "upon use" definition is a good design choice, as it allows programs to avoid executing undefined behaviors associated with erroneous data.
On the other hand, if a program does execute undefined behavior then it follows from the standard's definition that the whole behavior of the program is undefined. This consequent undefinedness is a more general kind arising from the fact that the UB associated directly with the erroneous data or construct could, in principle, include altering the behavior of other parts of the program, even retroactively (or apparently so). There are of course extra-lingual limitations on what could happen -- so no, nasal demons will not actually be making any appearances -- but those are not necessarily as strong as one might suppose.
* Caveat: some program constructs are used at translation time. These produce UB in program translation, with the result that every execution of the program has wholly-undefined behavior. For a somewhat stupid example, if your program source does not end with an unescaped newline then the program's behavior is completely undefined (see C2011, 5.1.1.2/1, point 2).

The behavior of an expression that is not evaluated is irrelevant to the behavior of a program. Behavior that would be undefined if the expression were evaluated has no bearing on the behavior of the program.
If it did, then this code would be useless:
if (p != NULL)
…; // Use pointer p.
(Your XORs could have undefined behavior, as they may produce a trap representation. You can defeat optimization for academic examples like this by declaring an object to be volatile. If an object is volatile, the C implementation cannot know whether its value may change due to external means, so each use of the object requires the implementation to read its value.)

In general, code which would invoke Undefined Behavior if executed must not have any effect if it is not executed. There are, however, a few cases where real-world implementations may behave in contrary fashion and refuse to generate code which, while not a constraint violation, could not possibly execute in defined behavior.
extern struct foo z;
int main(int argc, char **argv)
{
if (argc > 2) z;
return 0;
}
By my reading of the Standard, it explicitly characterizes lvalue conversions on incomplete types as invoking Undefined Behavior (among other things, it's unclear what an implementation could generate code for such a thing), so the Standard would impose no requirements upon behavior if argc is 3 or more. I can't identify any constraint in the Standard that the above code would violate, however, nor any reason behavior should not be fully defined if argc is 2 or less. Nonetheless, many compilers including gcc and clang reject the above code entirely.

How to modify value of const variable? [duplicate]

This question already has answers here:
Can we change the value of an object defined with const through pointers?
(11 answers)
Closed 5 years ago.
I am able to change the value of const modified variable in gcc but not in other compilers.
I have tried this code on gcc, which updates the value of i and j (11). With an online compiler, I get different values.
#include<stdio.h>
void main() {
const int i=10;
int *j;
j = &i;
(*j)++;
printf("address of j is %p address of i is %p\n",j,&i);
printf("i is %d and j is %d\n",i,*j);
}

Yes, you can do it with a little hack.
#include <stdio.h>
int main(){
const int a = 0;
*(int *)&a = 39;
printf("%d", a);
}
In the above code, a is a const int. With the little hack, you can change the constant value.
Update: Explanation
In the above code, a is defined as a const. For example a has a memory addr 0x01 and therefore &a returns the same. When it is casted with (int *) it becomes another variable referred as a pointer to the const. When it is accessed with * again, the another variable can be accessed without violation of the const policy because it is not the original variable, but the changes are reflected because it is referred as address to the pointer.
This will work on older versions like Borland C++ or Turbo C++, however no one is using it now a days.
It's "undefined behaviour", meaning that based on the standard you can't predict what will happen when you try this. It may do different things depending on the particular machine, compiler, and state of the program.
In this case, what will most often happen is that the answer will be "yes". A variable, const or not, is just a location in memory, and you can break the rules of const and simply overwrite it. (Of course this will cause a severe bug if some other part of the program is depending on its const data being constant!)
However in some cases -- most typically for const static data -- the compiler may put such variables in a read-only region of memory. MSVC, for example, usually puts const static ints in .text segment of the executable, which means that the operating system will throw a protection fault if you try to write to it, and the program will crash.
In some other combination of compiler and machine, something entirely different may happen. The one thing you can predict for sure is that this pattern will annoy whoever has to read your code.
Try this and let me know.

How to modify value of const variable?
No! You shouldn't modify a const variable.
The whole point of having a const variable is to be not able to modify it. If you want a variable which you should be able to modify, simply don't add a const qualifier on it.
Any code which modify's a const forcibly through (pointer)hackery invokes Undefined Behavior.
An Undefined Behavior means that the code is non conforming to the standard specifications laid out by the C standard and hence not a valid code. Such a code can show any behavior and it is allowed to do so.

By defining i as const, you promised not to modify it. The compiler can rely on that promise and assume that it's not modified. When you print the value of i, the compiler can just print 10 rather than loading whatever value is currently stored in i.
Or it can choose to load the value. Or it can cause your program to crash when you try to modify i. The behavior is undefined.
You'll likely see different behavior with gcc depending on the optimization options (-O1, -O3).
Oh, and void main() is incorrect; it should be int main(void). If your textbook tells you to use void main(), you should get a better book.

Take a look at Can we change the value of an object defined with const through pointers?
Long story short, it's an undefined behaviour. It can result dependently on compiler/machine.

You can not modify the value of const variable if you compile this code in gcc it will show
error: invalid conversion from ‘const int*’ to ‘int*’ [-fpermissive]
and show undefined behave
The maiin thing is that we can only modify the variable when we can access the address without address we can't do anything. That's the same thing with Register storage class.

How undefined is undefined behavior?

I'm not sure I quite understand the extent to which undefined behavior can jeopardize a program.
Let's say I have this code:
#include <stdio.h>
int main()
{
int v = 0;
scanf("%d", &v);
if (v != 0)
{
int *p;
*p = v; // Oops
}
return v;
}
Is the behavior of this program undefined for only those cases in which v is nonzero, or is it undefined even if v is zero?

I'd say that the behavior is undefined only if the users inserts any number different from 0. After all, if the offending code section is not actually run the conditions for UB aren't met (i.e. the non-initialized pointer is not created neither dereferenced).
A hint of this can be found into the standard, at 3.4.3:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
This seems to imply that, if such "erroneous data" was instead correct, the behavior would be perfectly defined - which seems pretty much applicable to our case.
Additional example: integer overflow. Any program that does an addition with user-provided data without doing extensive check on it is subject to this kind of undefined behavior - but an addition is UB only when the user provides such particular data.

Since this has the language-lawyer tag, I have an extremely nitpicking argument that the program's behavior is undefined regardless of user input, but not for the reasons you might expect -- though it can be well-defined (when v==0) depending on the implementation.
The program defines main as
int main()
{
/* ... */
}
C99 5.1.2.2.1 says that the main function shall be defined either as
int main(void) { /* ... */ }
or as
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
int main() is not equivalent to int main(void). The former, as a declaration, says that main takes a fixed but unspecified number and type of arguments; the latter says it takes no arguments. The difference is that a recursive call to main such as
main(42);
is a constraint violation if you use int main(void), but not if you use int main().
For example, these two programs:
int main() {
if (0) main(42); /* not a constraint violation */
}
int main(void) {
if (0) main(42); /* constraint violation, requires a diagnostic */
}
are not equivalent.
If the implementation documents that it accepts int main() as an extension, then this doesn't apply for that implementation.
This is an extremely nitpicking point (about which not everyone agrees), and is easily avoided by declaring int main(void) (which you should do anyway; all functions should have prototypes, not old-style declarations/definitions).
In practice, every compiler I've seen accepts int main() without complaint.
To answer the question that was intended:
Once that change is made, the program's behavior is well defined if v==0, and is undefined if v!=0. Yes, the definedness of the program's behavior depends on user input. There's nothing particularly unusual about that.

Let me give an argument for why I think this is still undefined.
First, the responders saying this is "mostly defined" or somesuch, based on their experience with some compilers, are just wrong. A small modification of your example will serve to illustrate:
#include <stdio.h>
int
main()
{
int v;
scanf("%d", &v);
if (v != 0)
{
printf("Hello\n");
int *p;
*p = v; // Oops
}
return v;
}
What does this program do if you provide "1" as input? If you answer is "It prints Hello and then crashes", you are wrong. "Undefined behavior" does not mean the behavior of some specific statement is undefined; it means the behavior of the entire program is undefined. The compiler is allowed to assume that you do not engage in undefined behavior, so in this case, it may assume that v is non-zero and simply not emit any of the bracketed code at all, including the printf.
If you think this is unlikely, think again. GCC may not perform this analysis exactly, but it does perform very similar ones. My favorite example that actually illustrates the point for real:
int test(int x) { return x+1 > x; }
Try writing a little test program to print out INT_MAX, INT_MAX+1, and test(INT_MAX). (Be sure to enable optimization.) A typical implementation might show INT_MAX to be 2147483647, INT_MAX+1 to be -2147483648, and test(INT_MAX) to be 1.
In fact, GCC compiles this function to return a constant 1. Why? Because integer overflow is undefined behavior, therefore the compiler may assume you are not doing that, therefore x cannot equal INT_MAX, therefore x+1 is greater than x, therefore this function can return 1 unconditionally.
Undefined behavior can and does result in variables that are not equal to themselves, negative numbers that compare greater than positive numbers (see above example), and other bizarre behavior. The smarter the compiler, the more bizarre the behavior.
OK, I admit I cannot quote chapter and verse of the standard to answer the exact question you asked. But people who say "Yeah yeah, but in real life dereferencing NULL just gives a seg fault" are more wrong than they can possibly imagine, and they get more wrong with every compiler generation.
And in real life, if the code is dead you should remove it; if it is not dead, you must not invoke undefined behavior. So that is my answer to your question.

If v is 0, your random pointer assignment never gets executed, and the function will return zero, so it is not undefined behaviour

When you declare variables (especially explicit pointers), a piece of memory is allocated (usually an int). This peace of memory is being marked as free to the system but the old value stored there is not cleared (this depends on the memory allocation being implemented by the compiler, it might fill the place with zeroes) so your int *p will have a random value (junk) which it has to interpret as integer. The result is the place in memory where p points to (p's pointee). When you try to dereference (aka. access this piece of the memory), it will be (almost every time) occupied by another process/program, so trying to alter/modify some others memory will result in access violation issues by the memory manager.
So in this example, any other value then 0 will result in undefined behavior, because no one knows what *p will point to at this moment.
I hope this explanation is of any help.
Edit: Ah, sorry, again few answers ahead of me :)

It is simple. If a piece of code doesn't execute, it doesn't have a behavior!!!, whether defined or not.
If input is 0, then the code inside if doesn't run, so it depends on the rest of the program to determine whether the behavior is defined (in this case it is defined).
If input is not 0, you execute code that we all know is a case of undefined behavior.

I would say it makes the whole program undefined.
The key to undefined behavior is that it is undefined. The compiler can do whatever it wants to when it sees that statement. Now, every compiler will handle it as expected, but they still have every right to do whatever they want to - including changing parts unrelated to it.
For example, a compiler may choose to add a message "this program may be dangerous" to the program if it detects undefined behavior. This would change the output whether or not v is 0.

Your program is pretty-well defined. If v == 0 then it returns zero. If v != 0 then it splatters over some random point in memory.
p is a pointer, its initial value could be anything, since you don't initialise it. The actual value depends on the operating system (some zero memory before giving it to your process, some don't), your compiler, your hardware and what was in memory before you ran your program.
The pointer assignment is just writing into a random memory location. It might succeed, it might corrupt other data or it might segfault - it depends on all of the above factors.
As far as C goes, it's pretty well defined that unintialised variables do not have a known value, and your program (though it might compile) will not be correct.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight