Why does this program segfault upon printf? [duplicate] - c

I'm new to C++ and just trying to get the hang of it. It generally seems not too bad, but I stumbled upon this weird/pathological segfaulting behavior:
#include <cstdio>
#include <iostream>
using namespace std;

int main () {
    int* b;
    *b = 27;
    int c = *b;
    cout << "c points to " << c << endl; //OK
    printf( "b points to %d\n", *b); //OK
    // cout << "b points to " << (*b) << endl; - Not OK: segfaults!
    return 0;
}
This program, as given, produces what you'd expect:
c points to 27
b points to 27
On the other hand, if you uncomment the second-to-last line, you get a program that crashes (segfaults) at runtime. Why? This is a valid pointer.

int* b points to an unknown memory address because it wasn't initialized. If you initialized it to the null pointer value (0 or NULL before C++11, nullptr since C++11), you'd almost certainly get a segfault earlier. The problem lies in the fact that you allocated space for the pointer but not for the data it points to. If you instead did this:
int c = 27;
int* b = &c;
cout << "c points to " << c << endl;
printf ("b points to %d\n", *b);
cout << "b points to " << (*b) << endl;
Things would work because int* b refers to a memory location that is accessible by your program (since the memory is actually a part of your program).
If you leave a pointer uninitialized or assign a null value to it, you can't dereference it until it points to a memory address that you KNOW you can access. For example, dynamic allocation with the new operator reserves memory for the data:
int* b = new int();
*b = 27;
int c = *b;
//output
delete b;
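Since this answer was merged in from a C question, here is a minimal C sketch of the same two fixes, assuming malloc/free as C's counterpart to new/delete. The function names read_through_local and read_through_heap are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Fix 1: point at an object that already exists.
   b is valid only while c is alive (until this function returns). */
int read_through_local(void)
{
    int c = 27;
    int *b = &c;          /* b now holds the address of a real object */
    assert(b == &c);
    return *b;            /* well-defined: b points to c */
}

/* Fix 2: allocate storage for the int (C's counterpart to new/delete).
   Returns -1 if malloc fails, so a null pointer is never dereferenced. */
int read_through_heap(void)
{
    int *b = malloc(sizeof *b);
    if (b == NULL)
        return -1;
    *b = 27;              /* well-defined: b points to allocated storage */
    int c = *b;
    free(b);              /* release the storage; b must not be used afterwards */
    return c;
}
```

Either way, the key point is the same: b is given the address of real storage before anything is written through it.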

Update 3
My answer to Where exactly does C++ standard say dereferencing an uninitialized pointer is undefined behavior? gives a much better explanation of why using an uninitialized pointer is undefined behavior. The basic logic comes from the C++ draft standard, section 24.2 Iterator requirements, specifically section 24.2.1 In general, paragraphs 5 and 10, which respectively say (emphasis mine):
[...][ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] [...] Dereferenceable values are always non-singular.
Update 2
This was originally an answer to a C question with nearly identical circumstances, but the original question I answered was merged with this one. I am updating my answer to include an answer specific to the new question and to the C++ draft standard.
b has not been initialized, so its value is indeterminate, but you used indirection on b, which is undefined behavior.
One possible simple fix would be to assign b to the address of an existing variable, for example:
int a ;
int* b = &a;
Another option would have been to use dynamic allocation via new.
For completeness' sake, we can see this is undefined behavior by going to the draft C++ standard section 5.3.1 Unary operators paragraph 1, which says (emphasis mine):
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.[...]
and if we then go to section 3.10 Lvalues and rvalues, paragraph 1 says (emphasis mine):
An lvalue (so called, historically, because lvalues could appear on the left-hand side of an assignment expression) designates a function or an object. [...]
but b does not point to a valid object.
Original Answer
You did not allocate any memory for either f or b, but you used indirection on both, which is undefined behavior.
Update
It is worth noting that cranking up the warning levels should have flagged this as a problem; for example, using gcc -Wall gives me the following warning for this code:
warning: 'f' is used uninitialized in this function [-Wuninitialized]
The simplest fix would be to assign f to point to a valid object like so:
char a ;
char *f = &a ;
Another option would be to use dynamic allocation; if you don't have a handy reference, the C FAQ is not a bad place to start.
For completeness' sake, if we look at the C99 draft standard Annex J.2 Undefined behavior, paragraph 1 says:
The behavior is undefined in the following circumstances:
and includes the following bullet:
The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.8, 6.8).
f and b are both automatic variables, and their values are indeterminate since they are not initialized.
It is not clear from reading the referenced sections which statement makes it undefined, but section 6.5.2.5 Compound literals paragraph 17, which is part of normative text, has an example that uses the same language and says:
[...]next time around p would have an indeterminate value, which would result in undefined behavior.
In the C11 draft standard the paragraph is 16.

The pointer is valid inasmuch as it has a value, but the memory it points to probably isn't accessible. It's your OS telling you that you are touching memory which isn't yours.
I'm frankly surprised it doesn't crash earlier than that.
Here's why:
int* b; // b is uninitialized.
*b = 27;
Where does b point? It might be somewhere valid, or somewhere totally off-limits. You can usually bet on the latter.
Here's a better way to do what you want.
int b1 = 27;
int *b = &b1;
Now b points to the location on the stack where b1's value is stored.

This is because f is a pointer, and it needs to be given valid memory to point to before you dereference it.

General rule: initialize a variable before using it.
char* f; declares a variable, and *f is a use of that variable. Like any variable, f must be initialized before it is used.

Related

Question on "array objects" and undefined behavior

In C, suppose for a pointer p we do *p++ = 0. If p points to an int variable, is this defined behavior?
You can do arithmetic resulting in pointing one past the end of an "array object" per the standard, but I am unable to find a really precise definition of "array object" in the standard. I don't think in this context it means just an object explicitly defined as an array, because p=malloc(sizeof(int)); ++p; pretty clearly is intended to be defined behavior.
If a variable does not qualify as an "array object", then as far as I can tell *p++ = 0 is undefined behavior.
I am using the C23 draft, but an answer citing the C11 standard would probably answer the question too.
Yes it is well-defined. Pointer arithmetic is defined by the additive operators so that's where you need to look.
C17 6.5.6/7
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
That is, int x; is to be regarded as equivalent to int x[1]; for the purpose of determining valid pointer arithmetic.
Given int x; int* p = &x; *p++ = 0;, it is fine to point one item past x, but not to dereference that item:
C17 6.5.6/8
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation
shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
This behavior has not changed in the various revisions of the standard. It's the very same from C90 to C23.
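A minimal sketch of the case from the question under the 6.5.6/7 rule quoted above; the function name one_past_scalar_demo is hypothetical. It forms and compares the incremented pointer but never dereferences it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Treats the lone int x like int x[1]: after *p++ = 0, p holds the
   one-past-the-end pointer &x + 1, which may be compared but not dereferenced. */
bool one_past_scalar_demo(int *out_value)
{
    assert(out_value != NULL);
    int x = 42;
    int *p = &x;
    *p++ = 0;                          /* stores 0 into x, then advances p past x */
    bool is_one_past = (p == &x + 1);  /* forming and comparing &x + 1 is defined */
    /* Evaluating *p here would be undefined behavior (C17 6.5.6/8). */
    *out_value = x;
    return is_one_past;
}
```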
There are two separate questions: 1. What constructs does the Standard specify that correct conforming implementations should process meaningfully, and 2. What constructs do clang and gcc actually process meaningfully. The clear intention of the Standard is to define the behavior of comparing a pointer "one past" an array object with a pointer to the start of another array object that happens to immediately follow it. The actual behavior of clang and gcc tells another story, however.
Given the source code:
#include <stdint.h>
extern int x[],y[];
int test1(int *p)
{
    y[0] = 1;
    if (p == x+1)
        *p = 2;
    return y[0];
}
int test2(int *p)
{
    y[0] = 1;
    uintptr_t p1 = 3*(uintptr_t)(x+1);
    uintptr_t p2 = 5*(uintptr_t)p;
    if (5*p1 == 3*p2)
        *p = 2;
    return y[0];
}
Both clang and gcc will recognize in both functions that the *p=2 assignment will only run if p happens to be equal to a one-past pointer to x, and will conclude as a consequence that it would be impossible for p to equal y. Construction of an executable example where clang and gcc would erroneously make this assumption is difficult without the ability to execute a program containing two compilation units, but examination of the generated machine code at https://godbolt.org/z/x78GMqbrv will reveal that every ret instruction is immediately preceded by mov eax,1, which loads the return value with 1.
Note that the code in test2 doesn't compare pointers, nor even compare integers that are directly formed from pointers, but the fact that clang and gcc are able to show that the numbers being compared can only be equal if the pointers happened to be equal is sufficient for test2() to, as perceived by clang or gcc, invoke UB if the function is passed a pointer to y, and y happens to equal x+1.

Is a Normal Structure Variable a Decayed Pointer or a Stack Frame?

#include <stdio.h>

struct student
{
    int roll_no;
    int physics;
    int chem;
    float percentage;
};
int main()
{
    struct student s = {1, 96, 98, 97.00};
    printf("%d", *((int *) &s + 1));
}
In the code above, we know that 's' is a structure variable. My question is whether 's' is a decayed pointer or whether a separate stack frame is allocated for the structure. If it were a decayed pointer, we wouldn't need to put '&' before 's', so I'm guessing it must be a stack frame, because it behaves just like a normal variable.
If we print 's', the first member's value gets printed, and if we print '&s', the first member's address gets printed.
If it is anything other than this, kindly help us understand that too.
Also, how do I access a char array that is a member of the structure? I tried *((char*)&s + offset_value); but it is not working.
Your help will be much appreciated.
whether 's' is a decay pointer…
s is a structure. Structures are not automatically converted to pointers the way arrays are. (Some people say “decay,” but this is a colloquial term and is inaccurate.)
… or a separate stack frame is getting allocated for the structure,…
With s defined as you show inside main, the compiler automatically allocates space for it. In common C implementations, the compiler uses stack space for this (except that optimization may result in alternatives including using processor registers instead, incorporating use of the structure into other expressions, or eliminating part or all of the structure entirely). If the compiler does use stack space for s, it uses space within the stack frame for main; it does not allocate a separate stack frame.
(Generally, a stack frame is the space on the stack used by one function, sometimes with some modifications such as “leaf” functions that do not call other routines and may not establish a full stack frame.)
printf("%d", *((int *) &s + 1));
*((int *) &s + 1) is not proper C code because its behavior is not defined by the C standard and likely not by your compiler either.
&s is the address of the structure s. When we convert it to (int *), that gives the address of the first member of the structure, s.roll_no. That is because the C standard gives us a rule that says we can convert a pointer to a structure to a pointer to its first member. C 2018 6.7.2.1 15 says:
… A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa…
Then you add 1 to this pointer, (int *) &s + 1. That also has defined behavior. From above (int *) &s points to s.roll_no, so it is equivalent to &s.roll_no. Then (int *) &s + 1 is equivalent to &s.roll_no + 1. C 2018 6.5.6 specifies what happens with addition. I will omit specifics of the rules, but essentially &s.roll_no + 1 is defined and produces a pointer that points just beyond s.roll_no.
However, even if the next member of the structure, physics, is just beyond s.roll_no (see footnote 1), the C standard does not define what happens when * is applied to it. This is because C 2018 6.5.6 8 says:
If the result [of the addition] points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
In the context of 6.5.6, s.roll_no is an “array object” that contains just one element, and &s.roll_no + 1 points one beyond its last element. So *(&s.roll_no + 1) and, equivalently, *((int *) &s + 1) violate this rule. C 2018 4 2 says that when a rule like this is violated, the behavior is not defined:
If a "shall" or "shall not" requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined…
Footnote
1 This is not guaranteed because C implementations are allowed to insert padding between members in structures. However, typical C implementations will not insert padding between members of identical types, as it is not needed for alignment, and this can be checked at compile time. But the statements above apply even if there is no padding.
Struct parameters are passed by value unless you pass a pointer with the & operator. Array parameters are passed as pointers. A local variable lives in the stack frame (or a register).
You might find the offsetof macro from <stddef.h> interesting; a traditional (not strictly portable) way to define it is:
#define offsetof(S, F) (((char *)&((S *)0)->F - (char *)0))
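For the char-array part of the question, a minimal sketch assuming a hypothetical name member added to the question's struct: the standard offsetof from <stddef.h> gives the byte offset, and adding it to (const char *)&s stays within the bytes of s. The helper names name_by_offset and physics_by_offset are illustrative only. For the int member, the sketch copies bytes with memcpy rather than dereferencing a converted int pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The question's struct, with a hypothetical char array member added. */
struct student {
    int roll_no;
    int physics;
    int chem;
    float percentage;
    char name[16];
};

/* Returns a pointer to s->name computed from its byte offset.
   Note: no unary * here; dereferencing would yield only the first char. */
const char *name_by_offset(const struct student *s)
{
    assert(s != NULL);
    return (const char *)s + offsetof(struct student, name);
}

/* Reads s->physics from its byte offset by copying the bytes into an int,
   instead of dereferencing a converted int pointer. */
int physics_by_offset(const struct student *s)
{
    int value;
    assert(s != NULL);
    memcpy(&value, (const char *)s + offsetof(struct student, physics), sizeof value);
    return value;
}
```

Accessing members by name (s.physics, s.name) is still the simplest option; offset-based access mainly helps when the member is chosen at run time.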

Is reading an uninitialized value always undefined behaviour? Or are there exceptions to it?

An obvious example of undefined behavior (UB), when reading a value, is:
int a;
printf("%d\n", a);
What about the following examples?
int i = i; // `i` is not initialized when we are reading it by assigning it to itself.
int x; x = x; // Is this the same as above?
int y; int z = y;
Are all three examples above also UB, or are there exceptions to it?
Each of the three lines triggers undefined behavior. The key part of the C standard, that explains this, is section 6.3.2.1p2 regarding Conversions:
Except when it is the operand of the sizeof operator, the _Alignof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion. If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; additionally, if the lvalue has atomic type, the value has the non-atomic version of the type of the lvalue; otherwise, the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined. If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
In each of the three cases, an uninitialized variable is used as the right-hand side of an assignment or initialization (which for this purpose is equivalent to an assignment) and undergoes lvalue-to-rvalue conversion. The final sentence of the quoted passage applies here, as the objects in question have not been initialized.
This also applies to the int i = i; case as the lvalue on the right side has not (yet) been initialized.
There was debate in a related question that the right side of int i = i; is UB because the lifetime of i has not yet begun. However, that is not the case. From section 6.2.4 p5 and p6:
5 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic storage duration, as do some compound literals. The result of attempting to indirectly access an object with automatic storage duration from a thread other than the one with which the object is associated is implementation-defined.
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
So in this case the lifetime of i begins before the declaration is encountered. int i = i; is therefore still undefined behavior, but not for this reason.
The final sentence of 6.3.2.1p2 does, however, leave room for using an uninitialized variable without undefined behavior: namely, when the variable in question has had its address taken. For example:
int a;
printf("%p\n", (void *)&a);
printf("%d\n", a);
In this case it is not undefined behavior if:
The implementation does not have trap representations for the given type, OR
The value chosen for a happens to not be a trap representation.
In that case, the value of a is unspecified. In particular, this will be the case with GCC and Microsoft Visual C++ (MSVC) in this example, as these implementations do not have trap representations for integer types.
Using an uninitialized object with automatic storage duration invokes UB.
Using an object with static storage duration that has no explicit initializer is well defined, because such objects are initialized to zero.
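#include <stdio.h>

int a;
int foo(void)
{
    static int b;
    int c;
    int d = d; //UB
    // static int e = e; // not valid C: a static object's initializer must be a constant expression
    printf("%d\n", a); //OK
    printf("%d\n", b); //OK
    printf("%d\n", c); //UB
    return 0;
}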
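A small sketch of the static-versus-automatic distinction from this answer; the identifiers file_scope_value and count_calls are hypothetical. Objects with static storage duration (file scope, or declared static) start at zero, while the automatic variable is given an explicit initializer before it is read.

```c
#include <assert.h>

/* File scope: static storage duration, implicitly initialized to 0. */
int file_scope_value;

/* Returns how many times it has been called. calls is initialized to 0
   once, before program startup, and keeps its value between calls. */
int count_calls(void)
{
    static int calls;     /* implicitly 0, never indeterminate */
    int step = 1;         /* automatic: explicitly initialized before it is read */
    calls += step;
    assert(calls > 0);    /* sanity check: the counter only increases */
    return calls;
}
```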
In cases where an action on an object of some type might have unpredictable consequences on platforms where the type has trap representations, but have at-least-somewhat predictable behavior for types that don't, the Standard will seek to avoid distinguishing platforms that do or don't define the behavior by throwing everything into the catch-all category of "Undefined Behavior".
With regard to the behavior of uninitialized or partially-initialized objects, I don't think there's ever been a consensus over exactly which corner cases must be treated as though objects were initialized with Unspecified bit patterns, and which cases need not be treated in such fashion.
For example, given something like:
#include <string.h>

struct ztstr15 { char dat[16]; } x,y;
void test(void)
{
    struct ztstr15 hey;
    strcpy(hey.dat, "Hey");
    x=hey;
    y=hey;
}
Depending upon how x and y will be used, there are at least four ways it might be useful to have an implementation process the above code:
1. Squawk if an attempt is made to copy any automatic-duration object that isn't fully initialized. This could be very useful in cases where one must avoid leakage of confidential information.
2. Zero-fill all unused portions of hey. This would prevent leakage of confidential information on the stack, but wouldn't flag code that might cause such leakage if the data weren't zero-filled.
3. Ensure that all parts of x and y are identical, without regard for whether the corresponding members of hey were written.
4. Write the first four bytes of x and y to match those of hey, but leave some or all of the remaining portions holding whatever they held before test() was called.
I don't think the Standard was intended to pass judgment as to whether some of those approaches would be better or worse than others, but it would have been awkward to write the Standard in a manner that would define the behavior of test() while allowing for option #3. The optimizations facilitated by #3 would only be useful if programmers could safely write code like the above in cases where client code wouldn't care about the contents of x.dat[4..15] and y.dat[4..15]. If the only way to guarantee anything about the behavior of that function were to ensure that all portions of hey were written, including those whose values would be irrelevant to program behavior, that would nullify any optimization advantage approach #3 could have offered.
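Approach #2 can also be requested explicitly in the source. A minimal sketch, assuming the struct tag ztstr15 from the example and a hypothetical function name test_zero_filled: the {0} initializer zero-fills every element of hey.dat before strcpy writes "Hey", so x and y never receive indeterminate bytes.

```c
#include <assert.h>
#include <string.h>

struct ztstr15 { char dat[16]; };

struct ztstr15 x, y;

/* Variant of test() that fully initializes hey before copying it. */
void test_zero_filled(void)
{
    struct ztstr15 hey = {0};   /* dat[0] is 0; all other elements are zero-initialized */
    strcpy(hey.dat, "Hey");     /* writes 'H', 'e', 'y', '\0' into dat[0..3] */
    assert(strlen(hey.dat) == 3);
    x = hey;
    y = hey;
}
```

The cost is that the implementation can no longer skip writing dat[4..15], which is exactly the optimization approach #3 was meant to allow.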

Variables in memory

If I have two int variables, a and b, and a pointer ptr that points to b (ptr = &b), then incrementing it with ptr++ should make it point at a, if I'm not wrong. I thought this was possible because, when compiled, a and b are on the stack and b's address is 4 bytes lower than a's. But when I print *ptr on the next line, I only get an address.
Code:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int a = 52;
    int b = 12;
    int *ptr;
    ptr = &b;
    printf("%d\n",*ptr);
    ptr++;
    printf("\n%d",*ptr);
    return 0;
}
But if I add printf("%d",&a);, then the last printf works as I expected and prints the value of a.
Code:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int a = 52;
    int b = 12;
    printf("%d\n",&a);
    int *ptr;
    ptr = &b;
    printf("%d\n",*ptr);
    ptr++;
    printf("\n%d",*ptr);
    return 0;
}
Can someone explain to me why this happens?
The compiler is free to arrange local variables in any order it chooses on the stack. In fact the C standard doesn't even mention a stack. That's an implementation detail left up to the compiler.
Adding a seemingly unrelated line of code can result in the compiler deciding to place variables on the stack in a different order than it did without the additional code. So you can't depend on this behavior when writing your code. Doing so is undefined behavior, which you have experienced.
Performing pointer arithmetic on variables that are not part of the same array is also undefined behavior.
C11 draft standard n1570:
6.5.2.4 Postfix increment and decrement operators
2 [...] See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers.[...]
6.5.6 Additive operators
7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
8 [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
After ptr = &b; and ptr++;, dereferencing ptr in printf("\n%d",*ptr); is undefined behavior.
You can't guarantee that the a and b variables are stored anywhere near each other in memory, and it's plain unsafe to try to "travel" from one to the other by pointer increments and rely on the results. What you're doing is delving into the realm of undefined behavior; you shouldn't do that.
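If the goal is to step from one int to the next with ptr++, a well-defined alternative is to make both values elements of the same array, as 6.5.6 requires. A minimal sketch; the function name next_after_b and the array vals are hypothetical.

```c
#include <assert.h>

/* vals[0] plays the role of b and vals[1] the role of a.
   Because both live in one array object, ptr++ is defined and
   the resulting pointer may be dereferenced. */
int next_after_b(void)
{
    int vals[2] = {12, 52};
    int *ptr = &vals[0];      /* like ptr = &b */
    ptr++;                    /* now points at vals[1], still inside the array */
    assert(ptr == &vals[1]);
    return *ptr;              /* well-defined: reads 52 */
}
```

Unlike two separate locals, the layout of array elements is guaranteed, so the result does not depend on how the compiler arranges the stack.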

Restricted pointer questions

I'm a little confused about the rules regarding restricted pointers. Maybe someone out there can help me out.
Is it legal to define nested restricted pointers as follows:
int* restrict a;
int* restrict b;
a = malloc(sizeof(int));
// b = a; <-- assignment here is illegal, needs to happen in child block
// *b = rand();
while(1)
{
    b = a; // Is this legal? Assuming 'b' is not modified outside the while() block
    *b = rand();
}
Is it legal to derive a restricted pointer value as follows:
int* restrict c;
int* restrict d;
c = malloc(sizeof(int)*101);
d = c;
for(int i = 0; i < 100; i++)
{
    *d = i;
    d++;
}
c = d; // c is now set to the 101st element, is this legal assuming d isn't accessed?
*c = rand();
Thanks!
Andrew
For reference, here's the restrict qualifier's rather convoluted definition (from C99 6.7.3.1 "Formal definition of restrict"):
Let D be a declaration of an ordinary identifier that provides a means of designating an object P as a restrict-qualified pointer to type T.
If D appears inside a block and does not have storage class extern, let B denote the block. If D appears in the list of parameter declarations of a function definition, let B denote the associated block. Otherwise, let B denote the block of main (or the block of whatever function is called at program startup in a freestanding environment).
In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E. Note that "based" is defined only for expressions with pointer types.
During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.
Here an execution of B means that portion of the execution of the program that would correspond to the lifetime of an object with scalar type and automatic storage duration associated with B.
My reading of the above is that in your first question, a cannot be assigned to b, even inside a "child" block: the result is undefined. Such an assignment could be made if b were declared in that sub-block, but since b is declared at the same scope as a, the assignment cannot be made.
For question 2, the assignments between c and d also result in undefined behavior (in both cases).
The relevant bit from the standard (for both questions) is:
If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment.
Since the restricted pointers are associated with the same block, it's not possible for block B2 to begin before the execution of B, or for B2 to end prior to the assignment (since B and B2 are the same block).
The standard gives an example that makes this pretty clear (I think - the clarity of the restrict definition's 4 short paragraphs is on par with C++'s name resolution rules):
EXAMPLE 4:
The rule limiting assignments between restricted pointers does not distinguish between a function call and an equivalent nested block. With one exception, only "outer-to-inner" assignments between restricted pointers declared in nested blocks have defined behavior.
{
    int * restrict p1;
    int * restrict q1;
    p1 = q1; // undefined behavior
    {
        int * restrict p2 = p1; // valid
        int * restrict q2 = q1; // valid
        p1 = q2; // undefined behavior
        p2 = q2; // undefined behavior
    }
}
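A minimal sketch of two patterns the rules above do allow, modeled on EXAMPLE 4; the function names copy_ints and fill_via_inner are hypothetical. The first uses restrict-qualified parameters; the second makes an outer-to-inner assignment, where the inner restrict pointer is declared in a block nested inside the one its source pointer belongs to.

```c
#include <assert.h>
#include <stddef.h>

/* restrict parameters: the caller promises that, during this call,
   no element modified through dst is also accessed through src. */
void copy_ints(size_t n, int *restrict dst, const int *restrict src)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Outer-to-inner assignment: p2 is declared in a block nested inside
   the function body that p1 belongs to, like p2 = p1 in EXAMPLE 4. */
int fill_via_inner(int *restrict p1, size_t n)
{
    assert(n > 0);
    {
        int *restrict p2 = p1;    /* valid: inner pointer based on the outer one */
        for (size_t i = 0; i < n; i++)
            p2[i] = (int)i;
    }
    return p1[n - 1];             /* inner block has ended; use p1 again */
}
```

Calling copy_ints with overlapping dst and src ranges would break the promise and make the behavior undefined, even though nothing in the compiler is required to diagnose it.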
The restrict type qualifier is an indication to the compiler that, if the memory addressed by the restrict-qualified pointer is modified, no other pointer will access that same memory. The compiler may choose to optimize code involving restrict-qualified pointers in a way that might otherwise result in incorrect behavior. It is the responsibility of the programmer to ensure that restrict-qualified pointers are used as they were intended to be used. Otherwise, undefined behavior may result.
As you can see from the above description, both of your assignments are illegal; they may work in executables produced by some compilers but break in others. Don't expect the compiler itself to emit errors or warnings, as restrict just gives it an opportunity to perform certain optimizations, which it can choose not to perform, like in the case of volatile.
