Using malloc with static pointers - c

I know that declaring a static variable and initializing it in this way:
static int *st_ptr = malloc(sizeof(int));
generates a compile error (initializer element is not constant), and that using separate statements solves it:
static int *st_ptr;
st_ptr = malloc(5*sizeof(int));
I need to understand the difference between initialization and assignment in this case. Why does the second form solve the problem?

First, let's have a brief on initialization vs. assignment.
Initialization:
This specifies the initial value of an object. It takes place only at the point where the variable is defined. The value used to initialize the object is called an initializer. From C11, chapter 6.7.9,
An initializer specifies the initial value stored in an object.
Assignment:
Assignment is assigning (or setting) the value of a variable, at any (valid) given point of time of execution. Quoting the standard, chapter 6.5.16,
An assignment operator stores a value in the object designated by the left operand.
In case of simple assignment (= operator),
In simple assignment (=), the value of the right operand is converted to the type of the
assignment expression and replaces the value stored in the object designated by the left
operand.
That said, I think, your query has to do with the initialization of static object.
For the first case,
static int *st_ptr = malloc(sizeof(int));
Quoting from C11 standard document, chapter §6.7.9, Initialization, paragraph 4,
All the expressions in an initializer for an object that has static or thread storage duration
shall be constant expressions or string literals.
and regarding the constant expression, from chapter 6.6 of the same document, (emphasis mine)
Constant expressions shall not contain assignment, increment, decrement, function-call,
or comma operators, except when they are contained within a subexpression that is not
evaluated.
Clearly, malloc(sizeof(int)) is a function call and therefore not a constant expression, so it cannot be used to initialize a static object.
For the second case,
static int *st_ptr;
st_ptr = malloc(5*sizeof(int));
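you are not giving the static object an explicit initializer, so it is implicitly initialized to a null pointer. In the next statement, you assign the return value of malloc() to it, and assignment is not subject to the constant-expression constraint. So your compiler does not produce any complaints.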
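To make the distinction concrete, here is a minimal sketch. The names table, constant_initialized and null_by_default are made up for illustration and are not from the question. It shows an initializer that the constraint accepts, the rejected one as a comment, and the implicit null value a static pointer gets when it has no initializer.

```c
#include <stddef.h>

static int table[5];   /* hypothetical array with static storage duration */

int *constant_initialized(void)
{
    /* Accepted: the address of a static object is an address constant. */
    static int *st_ptr = table;
    /* static int *bad = malloc(sizeof(int));   rejected: a function call is not constant */
    return st_ptr;
}

int *null_by_default(void)
{
    /* No initializer: an object with static storage duration starts as a null pointer. */
    static int *st_ptr;
    return st_ptr;
}
```

Assigning the result of malloc() to such a pointer later is an ordinary run-time statement, which is why the two-statement form compiles.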

When a variable is declared static inside a function, it is typically placed in either the "data segment" or the "bss segment", depending on whether it was explicitly initialized. The variable exists in the binary image and must have a constant value. Remember: static variables inside a function are created before main() starts, so they can't be initialized by a function call, because the program is not running yet (no expressions are evaluated and no functions are called). The initializer must therefore be constant, or be omitted altogether.
static int *st_ptr = malloc(sizeof(int));
Here, you tie the creation of st_ptr to malloc. But malloc is a function that needs to run, and st_ptr must be created before any function runs, which is an impossible state.
static int *st_ptr;
st_ptr = malloc(5*sizeof(int));
Here, st_ptr is created without an explicit initializer; its creation is not tied to any function.
Each time the function runs, malloc is called, so the call to malloc and the creation of st_ptr are independent.
But as I stated in the comment, this is an extremely dangerous practice: you allocate more and more memory through the same variable and lose the previous block each time. One way to avoid that is to free(st_ptr) at the end of every call, but then you don't need it to be static in the first place.
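If the pointer really has to be static, one possible alternative (my own sketch, not something proposed in the answer above) is to allocate only while the pointer is still null, so repeated calls reuse the same block. The helper name get_buffer is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: hands out the same 5-int buffer on every call. */
int *get_buffer(void)
{
    static int *st_ptr;                        /* null pointer until the first call */
    if (st_ptr == NULL) {
        st_ptr = malloc(5 * sizeof *st_ptr);   /* runs only while st_ptr is null */
    }
    return st_ptr;                             /* still NULL if malloc failed */
}
```

The block is never freed in this sketch; it lives until the program exits, which may or may not be acceptable for your use case.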

Roughly, initialization of a static object in C is when the compiler writes binary data into the executable file; assignment is an operation performed by actual executable code.
So, static int i = 5 makes the compiler emit the data word 5 into the executable's data section, while int i = func() makes the compiler generate several CPU instructions, such as a call to invoke the subroutine and a mov to store the result.
Thus static int i = func() would require both 1) a value computed before main() (as this is an initialization) and 2) a piece of user code to execute (which only makes sense in the context of a running program instance). It's possible to resolve this by generating a hidden initialization routine that executes before main(); C++ actually does this. But C has no such feature, so static variables may be initialized only with constant expressions.

Related

Is the C11 formal definition of restrict consistent with implementation?

In trying to answer a recent question (Passing restrict qualified pointers to functions?), I could not find how the C11 standard is consistent with practice.
I'm not trying to call out the standard or anything, most things that look inconsistent I just am not understanding right, but my question is best posed as an argument against the definition used in the standard, so here it is.
It seems to be commonly accepted that a function can take a restrict qualified pointer and both work on it and have its own function calls work on it. For example,
// set C to componentwise difference and D to componentwise sum
void dif_sum(float* restrict C, float* restrict D, size_t n)
{
size_t i = 0;
while(i<n) C[i] = C[i] - D[i],
D[i] += C[i] + D[i],
++i;
}
// set A to componentwise sum of squares of A and B
// set B to componentwise product of A and B
void prod_squdif(float* restrict A, float* restrict B, size_t n)
{
size_t i = 0;
float x;
dif_sum(A,B,n);
while(i<n) x = ( (A[i]*=A[i]) - B[i]*B[i] )/2,
A[i] -= x,
B[i++] = x/2;
}
What seems to be the common understanding is that restrict pointers need to reference independent space within their declaring block. So, prod_squdif is valid because nothing lexically within its defining block accesses the arrays identified by A or B other than through those pointers.
To demonstrate my concern with the standard, here is the standard formal definition of restrict (according to the committee draft, if you have the final version and it is different, let me know!):
6.7.3.1 Formal definition of restrict
1 Let D be a declaration of an ordinary identifier that provides a means of designating an object P as a restrict-qualified pointer to type T.
2 If D appears inside a block and does not have storage class extern, let B denote the block. If D appears in the list of parameter declarations of a function definition, let B denote the associated block. Otherwise, let B denote the block of main (or the block of whatever function is called at program startup in a freestanding environment).
3 In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E. Note that ‘‘based’’ is defined only for expressions with pointer types.
4 During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.
5 Here an execution of B means that portion of the execution of the program that would correspond to the lifetime of an object with scalar type and automatic storage duration associated with B.
6 A translator is free to ignore any or all aliasing implications of uses of restrict.
[Examples not included because they are not formally significant.]
Identifying execution of B with expressions lexically contained therein might be seen as supported by the following excerpt from 6.2.4, item 6:
"...Entering an enclosed block or calling a function suspends, but does not end,
execution of the current block..."
However, part 5 of the formal definition of restrict explicitly defines the block B to correspond to the lifetime of an object with automatic storage declared in B (in my example, B is the body of prod_squdif). This clearly overrides any definition of the execution of a block found elsewhere in the standard. The following excerpt from the standard defines lifetime of an object.
6.2.4 Storage durations of objects, item 2
The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime. 34) If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
Then the execution of dif_sum is clearly included in the execution of B. I don't think there is any question there. But then the lvalues in dif_sum that read and modify elements of A and B (via C and D) are clearly not based on A and B (they follow sequence points where A and B could have been repointed to copies of their content without changing the locations identified by the lvalues). This is undefined behavior. Note that what lvalues or sequence points are discussed in item 4 is not restricted; as it is stated, there is no reason to restrict lvalues and sequence points to those lexically corresponding to the block B, and so lvalues and sequence points within a function call play just like they do in the body of the calling function.
On the other hand, the generally accepted use of restrict seems implied by the fact that the formal definition explicitly allows C and D to be assigned the values of A and B. This suggests that some meaningful access to A and B through C and D is allowed. However, as argued above, such access is undefined for any element modified through either the outer or inner function call, and at least read by the inner call. This seems contrary to the apparent intent of allowing the assignment in the first place.
Of course, intent has no formal place in the standard, but it does seem suggestive that the common interpretation of restrict, rather than what seems actually defined, is what is intended.
In summary, if the execution of B is interpreted as the execution of every statement during the lifetime of B's automatic storage, then function calls can't work with the contents of restrict pointers passed to them.
It seems unavoidable to me that there should be some exception stating reads and writes within functions or sub blocks are not considered, but that at most one assignment within such a sub block (and other sub blocks, recursively) may be based on any particular restrict pointer in the outer block.
I have really gone over the standard, both today and yesterday. I really can't see how the formal definition of restrict could possibly be consistent with the way it seems to be understood and implemented.
EDIT: As has been pointed out, violating the restrict contract results in undefined behavior. My question is not about what happens when the contract is violated. My question can be restated as follows:
How can the formal definition of restrict be consistent with access to array elements through function calls? Does such access, within a calling function, not constitute access not based on the restrict pointer passed to the function?
I am looking for an answer based in the standard, as I agree that restrict pointers should be able to be passed through function calls. It just seems that this is not the consequence of the formal definition in the standard.
EDIT
I think the main problem with communicating my question is related to the definition of "based on". I will try to present my question a little differently here.
The following is an informal tracking of a particular call to prod_squdif. This is not intended as C code, it is just an informal description of the execution of the function's block.
Note that this execution includes the execution of the called function, per item 5 of the formal definition of restrict: "Here an execution of B means that portion of the execution of the program that would correspond to the lifetime of an object with scalar type and automatic storage duration associated with B."
// 1. prod_squdif is called
prod_squdif( (float[1]){2}, (float[1]){1}, 1 )
// 2. dif_sum is called
dif_sum(A,B,n) // assigns C=A and D=B
// 3. while condition is evaluated
0<1 // true
// 4. 1st assignment expression
C[0] = C[0] - D[0] // C[0] == 1
// 5. 2nd assignment expression
D[0] += C[0] + D[0] // D[0] == 3
// 6. increment
++i // i == 1
// 7. test
1<1 // false
// return to calling function
// 8. test
0<1 // true
// 9. 1st assignment expression
x = ( (A[0]*=A[0]) - B[0]*B[0] )/2 // x == -4
// 10. 2nd assignment expression
A[0] -= -4 // A[0] == 5
// 11. 3rd assignment expression
B[i++/*evaluates to 0*/] = -4/2 // B[0] == -2
// 12. test
1<1 // false
// prod_squdif returns
So, the test for the restrict contract is given by item 4 in the formal definition of restrict: "During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: ... Every other lvalue used to access the value of X shall also have its address based on P..."
Let L be the lvalue on the left of the portion of the execution marked '4' above (C[0]). Is &L based on A? I.e., is C based on A?
See item 3 of the formal definition of restrict: "...a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E...".
Take as a sequence point the end of item 3 above. (At this sequence point) modifying A to point to a copy of the array object into which it formerly pointed would NOT change the value of C.
Thus C is not based on A. So A[0] is modified by an lvalue not based on A. Since it is also read by an lvalue that is based on A (item 10), this is undefined behavior.
My question is: Is it correct to therefore conclude that my example invokes undefined behavior and thus the formal definition of restrict is not consistent with common implementation?
Suppose we have a function with nested blocks like this:
void foo()
{
T *restrict A = getTptr();
{
T *restrict B = A;
{
#if hypothetical
A = copyT(A);
#endif
useTptr(B + 1);
}
}
}
It would seem that, at the point where useTptr(B + 1) is called, the hypothetical change to A would no longer affect the value of B + 1. However, a different sequence point can be found, such that a change to A does affect the value of B + 1:
void foo()
{
T *restrict A = getTptr();
#if hypothetical
A = copyT(A);
#endif
{
T *restrict B = A;
{
useTptr(B + 1);
}
}
}
and C11 draft standard n1570 6.7.3.1 Formal definition of restrict only demands that there be some such sequence point, not that all sequence points exhibit this behavior.
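As a small sketch of the pattern this reading permits (the function names add_into and double_then_add are hypothetical, and the comments reflect the interpretation argued here rather than an authoritative ruling), a function can forward its own restrict-qualified parameters to another function, provided the arrays really do not overlap:

```c
#include <stddef.h>

/* dst[i] += src[i]; the caller promises that dst and src do not overlap. */
void add_into(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}

/* Forwards its own restrict-qualified parameters to add_into(). */
void double_then_add(float *restrict a, const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] *= 2.0f;
    /* dst receives the value of a at the call; under the reading above, a
       hypothetical change to a just before the call would change dst, so
       accesses through dst count as based on a. */
    add_into(a, b, n);
}
```

Calling double_then_add with two distinct arrays is the case the common understanding treats as well defined; passing the same array for both parameters would break the contract.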
I'm really not sure exactly what your question is.
It sounds like you're asking:
Q: Gee, will "restrict" still apply if I violate the "restrict"
contract? Like in the "remove_zeroes()" example?
The answer, of course, is "No - it won't".
Here are two links that might clarify the discussion. Please update your post with (a) more explicit question(s):
Realistic usage of the C99 'restrict' keyword?
Is it legal to assign a restricted pointer to another pointer, and use the second pointer to modify the value?
https://en.wikipedia.org/wiki/Restrict

Is there anything wrong with `something_t* x = malloc(sizeof(*x))`?

I'm writing some extremely repetitive code in C (reading XML), and I found that writing my code like this makes it easier to copy and paste code in a constructor*:
something_t* something_new(void)
{
something_t* obj = malloc(sizeof(*obj));
/* initialize */
return obj;
}
What I'm wondering is, is it safe to use sizeof(*obj) like this, when I just defined obj? GCC isn't showing any warnings and the code works fine, but GCC tends to have "helpful" extensions so I don't trust it.
* And yes, I realize that I should have just written a Python program to write my C program, but it's almost done already.
something_t* obj = malloc(sizeof(*obj));
What I'm wondering is, it is safe to use sizeof(*obj)
like this, when I just defined obj?
You have a declaration consisting of:
type-specifier something_t
declarator * obj
=
initializer malloc(sizeof(*obj))
;
The C standard says in section Scopes of identifiers:
Structure, union, and enumeration tags have scope that begins just
after the appearance of the tag in a type specifier that declares the
tag. Each enumeration constant has scope that begins just after the
appearance of its defining enumerator in an enumerator list. Any other
identifier has scope that begins just after the completion of its
declarator.
Since obj has scope that begins just after the completion of its declarator, it is guaranteed by the standard that the identifier used in the initializer refers to the just defined object.
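A minimal sketch of the constructor with the allocation checked follows. The struct layout is an assumption made purely for illustration, since the question does not show the real something_t.

```c
#include <stdlib.h>

/* Hypothetical layout; the real something_t is not shown in the question. */
typedef struct something {
    int id;
    double value;
} something_t;

something_t *something_new(void)
{
    /* obj is already in scope inside its own initializer, and sizeof
       does not evaluate *obj, so nothing is dereferenced here. */
    something_t *obj = malloc(sizeof(*obj));
    if (obj == NULL)
        return NULL;          /* allocation failed */
    obj->id = 0;              /* initialize */
    obj->value = 0.0;
    return obj;
}
```

A side benefit of this form is that the malloc line stays correct if the type of obj changes later.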
If we write the sizeof like this:
something_t* obj = malloc(sizeof(obj));
it allocates only as many bytes as the pointer variable itself occupies (for example, four bytes on a typical 32-bit platform).
something_t* obj = malloc(sizeof(*obj));
This uses the size of the type the pointer points to.
For example,
char *p;
printf("%zu\n",sizeof(p));
This prints the size of a pointer (four on a typical 32-bit platform).
printf("%zu\n",sizeof(*p));
Now it prints one, the size of a char. So when we use *p in sizeof, it takes the pointed-to data type. I don't know whether this is the answer you are expecting.
It's safe. For example,
n = sizeof(*(int *)NULL);
In this case, no null pointer access occurs, because the compiler can calculate the size of the operand without knowing the value of "*(int *)NULL" at run time.
The C89/90 standard guarantees that a 'sizeof' expression is a constant expression; it is translated into a constant (e.g. 0x04) and embedded into the binary at compile time.
In the C99 standard, a 'sizeof' expression is not always a compile-time constant, because of the introduction of variable length arrays. For example,
n = sizeof(int [*(int *)NULL]);
In this case, the value of "*(int *)NULL" needs to be known at run time to calculate the size of 'int[]', so the operand is actually evaluated.
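The two cases can be observed without touching a null pointer. This is a small sketch with made-up function names; it assumes a compiler that supports variable length arrays (they are mandatory in C99 and optional in C11).

```c
#include <stddef.h>

/* sizeof does not evaluate an operand of non-VLA type, so i stays 0. */
int counter_after_sizeof(void)
{
    int i = 0;
    size_t unused = sizeof(i++);   /* i++ is never executed */
    (void)unused;
    return i;
}

/* With a variable length array type the size is computed at run time.
   n must be greater than zero. */
size_t vla_bytes(size_t n)
{
    return sizeof(int[n]);         /* n * sizeof(int) */
}
```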

Safe to pass pointer to auto variable to function?

Suppose I have a function that declares and initializes two local variables – which by default have the storage duration auto. This function then calls a second function, to which it passes the addresses of these two local variables. Can this second function safely use these pointers?
A trivial programmatic example, to supplement that description:
#include <stdio.h>
int adder(int *a, int *b)
{
return *a + *b;
}
int main()
{
auto int a = 5; // `auto' is redundant; included for clarity
auto int b = 3;
// adder() gets the addresses of two auto variables! is this an issue?
int result = adder(&a, &b);
printf("5 + 3 = %d\n", result);
return 0;
}
This program works as expected, printing 5 + 3 = 8.
Usually, when I have questions about C, I turn to the standard, and this was no exception. Specifically, I checked ISO/IEC 9899, §6.2.4. It says there, in part:
4
An object whose identifier is declared with no linkage and without
the storage-class specifier static has automatic storage duration.
5
For such an object that does not have a variable length array type,
its lifetime extends from entry into the block with which it is
associated until execution of that block ends in any way. (Entering an
enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively,
a new instance of the object is created each time. The initial value
of the object is indeterminate. If an initialization is specified for
the object, it is performed each time the declaration is reached in
the execution of the block; otherwise, the value becomes indeterminate
each time the declaration is reached.
Reading this, I reason the following points:
Variables a and b have storage duration auto, which I've made explicit using the auto keyword.
Calling the adder() function corresponds to the parenthetical in clause 5, in the partial quote above. That is, entering the adder() function "suspends, but does not end," the execution of the current block (which is main()).
Since the main() block is not "end[ed] in any way," storage for a and b is guaranteed. Thus, accessing them using the addresses &a and &b, even inside adder(), should be safe.
My question, then, is: am I correct in this? Or am I just getting "lucky," and accessing memory locations that, by happenstance, have not been overwritten?
P.S. I was unable to find an exact answer to this question through either Google or SO's search. If you can, mark this as a duplicate and I'll delete it.
Yes, it is safe and basically your assumptions are correct. The lifetime of an automatic object is from the entry in the block where it has been declared until the block terminates.
(C99, 6.2.4p5) "For such an object [...] its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way."
Your reasoning is correct for your particular function call chain, and you have read and quoted the relevant portions of the standard. This is a perfectly valid use of pointers to local variables.
Where you have to be wary is if the function stores the pointer values in a structure that has a lifetime longer than its own call. Consider two functions, foo(), and bar():
int *g_ptr;
void bar (int *p) {
g_ptr = p;
}
void foo () {
int x = 10;
bar(&x);
}
int main () {
foo ();
/* ...do something with g_ptr? */
return 0;
}
In this case, the variable x's lifetime ends when foo() returns. However, the pointer to x has been stored in g_ptr by bar(). In this case, it was an error for foo() to pass a pointer to its local variable x to bar().
What this means is that in order to know whether or not it is valid to pass a pointer to a local variable to a function, you have to know what that function will do with it.
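For contrast, here is a sketch of a variant of foo() and bar() that stays valid. It assumes the callee only needs the value, so it copies it instead of keeping the pointer; g_value and last_value are hypothetical names.

```c
static int g_value;          /* static storage: outlives every call */

/* Copies the pointed-to value instead of retaining the pointer. */
void bar(const int *p)
{
    g_value = *p;            /* p is dereferenced only while x is alive */
}

void foo(void)
{
    int x = 10;
    bar(&x);                 /* fine: bar() does not keep &x */
}

/* Safe to call after foo() has returned. */
int last_value(void)
{
    return g_value;
}
```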
Those variables are allocated in the stack. As long as you do not return from the function that declared them, they remain valid.
As I'm not yet allowed to comment, I'd rather write another answer as amendment to jxh's answer above:
Please see my elaborate answer here for a similar question. This contains a real world example where the aliasing in the called function makes your code break even though it follows all the c-language rules.
Even though it is legal in the C-language I consider it as harmful to pass pointers to automatic variables in a function call. You never know (and often you don't want to know) what exactly the called function does with the passed values. When the called function establishes an alias, you get in big trouble.

Static array initialization in C

I am reading the book Let us C by Yashavant Kanetkar.
In the Array of Pointers section there is a section of code which is giving me problems:
#include <stdio.h>
int main()
{
static int a[]={0,1,2,3,4}; //-----------(MY PROBLEM)
int *p[]={a,a+1,a+2,a+3,a+4};
printf("%p %p %d\n",(void *)p,(void *)*p,*(*p)); // %p, not %u, is the conversion for pointers
return 0;
}
What I don't understand is why the array a has to be declared static. I tried declaring it without the static keyword, but I got an error saying "illegal". Please help.
C90 (6.5.7) had
All the expressions in an initializer for an object that has static storage duration or in an initializer list for an object that has aggregate or union type shall be constant expressions.
And you are initializing an object that has an aggregate type, so the values must be known at compile time, and the addresses of automatic variables are not.
Note this has changed in C99 (6.7.8/4)
All the expressions in an initializer for an object that has static storage duration shall be constant expressions or string literals.
The constraint on object with aggregate or union type has been removed and I've not found it placed somewhere else. Your code with static removed should be accepted by a C99 compiler (it is by gcc -std=c99 for instance, which seems to confirm that I've not overlooked a constraint elsewhere).
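As a sketch of what a C99 (or later) compiler should accept, here is the same pointer table built from an automatic array, wrapped in a hypothetical function so the result can be checked:

```c
/* Returns *p[index]; index must be in the range 0..4. */
int element_via_pointer_table(int index)
{
    int a[] = {0, 1, 2, 3, 4};                   /* automatic storage, no static */
    int *p[] = {a, a + 1, a + 2, a + 3, a + 4};  /* non-constant initializers: fine since C99 */
    return *p[index];
}
```

Compiled in C90 mode, the initializer of p is what triggers the diagnostic, because the addresses of a's elements are not constant expressions there.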
My guess would be that the contents of an array initialiser have to be a compile-time constant. By using static on a local variable in a function you essentially make that variable global, except with a local scope.

What does the C compiler do with different types of declarations?

I understand this:
int i = 3; // declaration with definition
It tells the compiler to:
Reserve space in memory to hold integer value.
Associate name with memory location.
Store the value 3 at this location.
But what does this declaration tell the compiler:
int i; // declaration
The declaration tells the compiler to reserve space for the variable i and associate the name i with that space (your points 1. and 2.).
If i is a global variable it is initialized to 0.
If it is local, the value of i is indeterminate (probably garbage, i.e. some random value) and you should assign to it before reading it.
There are two cases: at file scope (i.e. for a global declaration), and in a function.
In a function, the declaration int i; does two things: it declares a variable called i whose type is int, and it reserves some storage in memory to put a value of type int. What it does not do is give the variable a value. The storage used by i will still contain whatever garbage was there before. You need to initialize the variable, i.e. assign a value to it, before you can read a value from it. Good compilers will warn you if you don't initialize the variable.
At file scope, int i also declares a variable called i. The rest depends on other things: this is known as a tentative definition. You can have multiple such declarations in your file. At most one of these is allowed to have an initializer, making it a full-fledged definition. If none of the declarations of i at file scope have an initializer, the declaration is also a definition, and there is an implicit initialization to 0. Thus:
int i;
/* ... more code ...*/
int i;
is valid, and i will be initialized to 0 (assuming these are the only declarations of i at file scope). Whereas:
int i;
int i = 3;
is also valid, and i will be initialized to 3 when the program starts.
In practice, at file scope, there's often a difference between leaving the initialization implicit and explicitly initializing to 0. Many compilers will store an explicit 0 in the binary, but let the operating system initialize implicit zeroes automatically when the program is loaded. Don't worry about this unless you have a large global array (which shouldn't happen often) or you work on tiny embedded systems.
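Both file-scope cases described above can be put in one translation unit. This is a minimal sketch; the accessor functions read_i and read_j and the second variable j are hypothetical additions so the values can be checked.

```c
int i;        /* tentative definition */
int i = 3;    /* the one definition that carries an initializer */

int j;        /* tentative definition */
int j;        /* tentative again: still a single object, implicitly 0 */

int read_i(void) { return i; }
int read_j(void) { return j; }
```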
It says to reserve space for an integer called i. What is stored there is up to the compiler and is indeterminate.
It does the same thing as your previous declaration:
allocates space on the stack for the integer
the compiler associates a name with the space (your running program won't do this, necessarily)
the integer is not initialized.
Others have pretty much answered the question, but I will mention two points that (I think ) haven't been mentioned so far:
int i;
defines i to be an int, with garbage in it (unless i is "global"). Such garbage might be a trap representation, which means that using it could be "bad":
A trap representation is a set of bits which, when interpreted as a value of a specific type, causes undefined behavior. Trap representations are most commonly seen on floating point and pointer values, but in theory, almost any type could have trap representations. An uninitialized object might hold a trap representation. This gives the same behavior as the old rule: access to uninitialized objects produces undefined behavior.
Also, int i; could also be a tentative definition, which means that you're telling the compiler: "i is an int, and I will define it later. If I don't, then define it for me.". Here is a very good explanation of why C has tentative definitions.
There are three kinds of memory for objects:
1) external (often called "global", but that really refers to scope), where objects are created before the program runs; 2) stack (created during run time); 3) heap (e.g. malloced).
"int i;" creates the object either in external memory or on the stack. If it's in a function, it's created on the stack (unless "static" is also used).
Objects created in external memory are initialized to zero unless they are explicitly initialized (e.g., "int i = 3;").
You can create an external object in a function by using the "static" keyword.
int a; // external memory with "global" scope. Initialized to 0 implicitly.
static int b; // external memory with file (module) scope. Initialized to 0 implicitly.
int c = 3; // external memory initialized to 3.
void f(void)
{
int d; // created on the stack. Goes away when the block exits. Filled with random trash because there is no initialization.
int e = 4; // stack object initialized to 4.
static int f; // "f" is external but not global. Like all externals, it's implicitly initialized to zero.
static int g = 3; // An external like f but initialized to 3.
}
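The difference between the two kinds of locals shows up across calls. A small sketch with hypothetical function names:

```c
/* Static local: implicitly 0 at program start, keeps its value between calls. */
int next_static(void)
{
    static int calls;
    calls += 1;
    return calls;
}

/* Automatic local: created afresh on every call, so it must be initialized. */
int next_automatic(void)
{
    int calls = 0;
    calls += 1;
    return calls;
}
```

next_static() counts up over successive calls, while next_automatic() returns 1 every time.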

Resources