No change in heap? - c

I wrote a simple function to check whether malloc works. I create a large array (1024 × 1024 × 1024 doubles), fill it with numbers, but the heap usage does not seem to change. Here is the code:
#include <stdio.h>
#include <assert.h> // For assert()
#include <stdlib.h> // For malloc(), free() and realloc()
#include <unistd.h> // For sleep()
static void create_array_in_heap()
{
double* b;
b = (double*)malloc(sizeof(double) * 1024 * 1024 * 1024);
assert(b != NULL); // Check that the allocation succeeded
int i;
for (i=0; i<1024*1024*1024; i++);
b[i] = 1;
sleep(10);
free(b);
}
int main()
{
create_array_in_heap();
return 0;
}
(Screenshot of the Linux system monitor.)
Any ideas why?

EDIT: a simpler explanation is given in the comments. (Note the stray ; after the for statement, which leaves the loop body empty.) But my answer applies once the ; has been removed.
An aggressive optimizing compiler, such as Clang (Compiler Explorer link), can see that the only important part of your function create_array_in_heap is the call to sleep. The rest has no functional value, since you only fill a memory block to eventually discard it, and is removed by the compiler. This is the entirety of your program compiled by Clang 7.0.0 with -O2:
main: # #main
pushq %rax
movl $10, %edi
callq sleep
xorl %eax, %eax
popq %rcx
retq
In order to benchmark any aspect of a program, the program should have been designed to output a result (computing and discarding the result is too easy for the compiler to optimize into nothing). The result should also be computed from inputs that aren't known at compile-time, otherwise the computation always produces the same result and can be optimized by constant propagation.
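A minimal sketch of that advice, assuming the stray ; has been removed: the element count and fill value arrive as parameters (so a caller can take them from argv), and the function returns a sum for the caller to print. The name fill_and_sum is made up for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not from the question: fills n doubles with `value`
 * and returns their sum, or -1.0 if the allocation fails. */
double fill_and_sum(size_t n, double value)
{
    double *b = malloc(n * sizeof *b);
    if (b == NULL)
        return -1.0;

    for (size_t i = 0; i < n; i++)   /* note: no ';' after the for header */
        b[i] = value;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += b[i];

    free(b);
    return sum;
}
```

A caller could print fill_and_sum(strtoull(argv[1], NULL, 10), 1.0). Even then an optimizer may restructure the loops, so inspecting the generated assembly remains the way to confirm what actually runs.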

Related

Volatile/modified return address

Consider a C function (with external linkage) like the following one:
void f(void **p)
{
/* do something with *p */
}
Now assume that f is being called in a way such that p points to the return address of f on the stack, as in the following code (assuming the System V AMD64 ABI):
leaq -8(%rsp), %rdi
callq f
What may happen is that the code of f modifies the return address on the stack by assigning a value to *p. Thus the compiler will have to treat the return address on the stack as a volatile value. How can I tell the compiler, gcc in my case, that the return address is volatile?
Otherwise, the compiler could, at least in principle, generate the following code for f:
pushq %rbp
movq 8(%rsp), %r10
pushq %r10
## do something with (%rdi)
popq %r10
popq %rbp
addq $8, %rsp
jmpq *%r10
Admittedly, it is unlikely that a compiler would ever generate code like this but it does not seem to be forbidden without any further function attributes. And this code wouldn't notice if the return address on the stack is being modified in the middle of the function because the original return address is already retrieved at the beginning of the function.
P.S.: As has been suggested by Peter Cordes, I should better explain the purpose of my question: It is about garbage collecting dynamically generated machine code using a moving garbage collector. The function f stands for the garbage collector. The caller of f may be a function whose code is being moved around while f is running, so I came up with the idea of letting f know the return address so that f may modify it according to whether the memory area the return address points to has been moved.
Using the SysV ABI (Linux, FreeBSD, Solaris, Mac OS X / macOS) on AMD64/x86-64, you only need a trivial assembly function wrapped around the actual garbage collector function.
The following f.s defines void f(void *), and calls the real GC, real_f(void *, void **), with the added second parameter pointing to the return address.
.file "f.s"
.text
.p2align 4,,15
.globl f
.type f, @function
f:
movq %rsp, %rsi
call real_f
ret
.size f, .-f
If real_f() already has two other parameters, use %rdx (for the third) instead of %rsi. If three to five, use %rcx, %r8, or %r9, respectively. SysV ABI on AMD64/x86-64 only supports up to six non-floating-point parameters in registers.
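For read-only access no assembly is needed: GCC (and Clang) provide the __builtin_return_address extension. The sketch below only reports the return address. It does not yield a pointer to the stack slot, so it cannot replace the wrapper above when the address has to be rewritten. current_return_address is a hypothetical name.

```c
#include <stddef.h>

/* GCC/Clang extension: __builtin_return_address(0) yields the address the
 * current function will return to. noinline keeps this a real call, so the
 * result refers to this function's own caller. */
__attribute__((noinline))
void *current_return_address(void)
{
    return __builtin_return_address(0);
}
```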
Let's test the above with a small example.c:
#include <stdlib.h>
#include <stdio.h>
extern void f(void *);
void real_f(void *arg, void **retval)
{
printf("real_f(): Returning to %p instead of %p.\n", arg, *retval);
*retval = arg;
}
int main(void)
{
printf("Function and label addresses:\n");
printf("%p f()\n", f);
printf("%p real_f()\n", real_f);
printf("%p one_call:\n", &&one_call);
printf("%p one_fail:\n", &&one_fail);
printf("%p one_skip:\n", &&one_skip);
printf("\n");
printf("f(one_skip):\n");
fflush(stdout);
one_call:
f(&&one_skip);
one_fail:
printf("At one_fail.\n");
fflush(stdout);
one_skip:
printf("At one_skip.\n");
fflush(stdout);
return EXIT_SUCCESS;
}
Note that the above relies on both GCC behaviour (&& providing the address of a label) as well as GCC behaviour on AMD64/x86-64 architecture (object and function pointers being interchangeable), as well as the C compiler not making any of the myriad optimizations they are allowed to do to the code in main().
(It does not matter if real_f() is optimized; it's just that I was too lazy to work out a better example in main(). For example, one that creates a small function in an executable data segment that calls f(), with real_f() moving that data segment, and correspondingly adjusting the return address. That would match OP's scenario, and is just about the only practical use case for this kind of manipulation I can think of. Instead, I just hacked a crude example that might or might not work for others.)
Also, we might wish to declare f() as having two parameters (they would be passed in %rdi and %rsi) too, with the second being irrelevant, to make sure the compiler does not expect %rsi to stay unchanged. (If I recall correctly, the SysV ABI lets us clobber it, but I might remember wrong.)
On this particular machine, compiling the above with
gcc -Wall -O0 f.s example.c -o example
running it
./example
produces
Function and label addresses:
0x400650 f()
0x400659 real_f()
0x400729 one_call:
0x400733 one_fail:
0x40074c one_skip:
f(one_skip):
real_f(): Returning to 0x40074c instead of 0x400733.
At one_skip.
Note that if you tell GCC to optimize the code (say, -O2), it will make assumptions about the code in main() it is perfectly allowed to do by the C standard, but which may lead to all three labels having the exact same address. This happens on my particular machine and GCC-5.4.0, and of course causes an endless loop. It does not reflect on the implementation of f() or real_f() at all, only that my example in main() is quite poor. I'm lazy.

incompatible return type from struct function - C

When I attempt to run this code as it is, I receive the compiler message "error: incompatible types in return". I marked the location of the error in my code. If I take the line out, then the compiler is happy.
The problem is I want to return a value representing invalid input to the function (which in this case is calling f2(2).) I only want a struct returned with data if the function is called without using 2 as a parameter.
I feel the only two ways to go are to either:
make the function return a struct pointer instead of a struct by value, but then my caller function will look funny as I have to change y.b to y->b, and the operation may be slower due to the extra step of fetching data from memory.
allocate extra memory, zero-fill it, and set the return value to the struct in that location in memory (example: return x[nnn]; instead of return x[0];). This approach will use more memory and some processing to zero-fill it.
Ultimately, I'm looking for a solution that will be fastest and cleanest (in terms of code) in the long run. If I have to be stuck with using -> to address members of elements then I guess that's the way to go.
Does anyone have a solution that uses the least cpu power?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct{
long a;
char b;
}ab;
static char dat[500];
ab f2(int set){
ab* x=(ab*)dat;
if (set==2){return NULL;}
if (set==1){
x->a=1;
x->b='1';
x++;
x->a=2;
x->b='2';
x=(ab*)dat;
}
return x[0];
}
int main(){
ab y;
y=f2(1);
printf("%c",y.b);
y.b='D';
y=f2(0);
printf("%c",y.b);
return 0;
}
If you care about speed, it is implementation specific.
Notice that the Linux x86-64 ABI defines that a struct of two (exactly) scalar members (that is, integers, doubles, or pointers, -which all fit in a single machine register- but not struct etc... which are aggregate data) is returned thru two registers (without going thru the stack), and that is quite fast.
BTW
if (set==2){return NULL;} //wrong
is obviously wrong. You could code:
if (set==2) return (ab){0,0};
Also,
ab* x=(ab*)dat; // suspicious
looks suspicious to me (since you return x[0]; later). You are not guaranteed that dat is suitably aligned (e.g. to 8 or 16 bytes), and on some platforms (notably x86-64) if dat is misaligned you are at least losing performance (actually, it is undefined behavior).
BTW, I would suggest to always return with statements like return (ab){l,c}; (where l is an expression convertible to long and c is an expression convertible to char); this is probably the easiest to read, and will be optimized to load the two return registers.
Of course if you care about performance, for benchmarking purposes, you should enable optimizations (and warnings), e.g. compile with gcc -Wall -Wextra -O2 -march=native if using GCC; on my system (Linux/x86-64 with GCC 5.2) the small function
ab give_both(long x, char y)
{ return (ab){x,y}; }
is compiled (with gcc -O2 -march=native -fverbose-asm -S) into:
.globl give_both
.type give_both, @function
give_both:
.LFB0:
.file 1 "ab.c"
.loc 1 7 0
.cfi_startproc
.LVL0:
.loc 1 7 0
xorl %edx, %edx # D.2139
movq %rdi, %rax # x, x
movb %sil, %dl # y, D.2139
ret
.cfi_endproc
You can see that all the code uses registers, and no memory is used at all.
I would use the return value as an error code, and the caller passes in a pointer to his struct such as:
int f2(int set, ab *outAb); // returns -1 if invalid input and 0 otherwise

How are arguments passed to function pointers in C?

In the following code snippet (Reference), compare is called from main() without any parameters being passed.
I assume it is taking ((char *)&key, (char *)string) as the two parameters needed for the function call. But how does it work internally in C? Does the compiler fill in the arguments when compare is called?
#include <search.h>
#include <string.h>
#include <stdio.h>
#define CNT 2
int compare(const void *arg1,const void *arg2)
{
return (strncmp(*(char **)arg1, *(char **)arg2, strlen(*(char **)arg1)));
}
int main(void)
{
char **result;
char *key = "PATH";
unsigned int num = CNT;
char *string[CNT] = {
"PATH = d:\\david\\matthew\\heather\\ed\\simon","LIB = PATH\\abc" };
/* The following statement finds the argument that starts with "PATH" */
if ((result = (char **)lfind((char *)&key, (char *)string, &num,
sizeof(char *), compare)) != NULL)
printf("%s found\n", *result);
else
printf("PATH not found \n");
return 0;
}
How do function pointers in C work?
compare is called from main() without any parameters being passed.
No. It is just referred to from main.
Essentially, it tells lfind "Hey, if you want to compare entries, take this function - it can be called exactly the way you expect it."
lfind, then, knows about that and whenever it needs to compare two entries, it puts the arguments wherever it would put them on a normal call (on the stack, into the right registers or wherever, depending on the calling conventions of the architecture) and performs the call to the given address. This is called an indirect call and is usually a bit more expensive than a direct call.
Let's switch to a simpler example:
#include <stdio.h>
int add1(int x) {
return x + 1;
}
int times2(int x) {
return x * 2;
}
int indirect42(int(*f)(int)) {
// Call the function we are passed with 42 and return the result.
return f(42);
}
int indirect0(int(*f)(int)) {
// Call the function we are passed with 0 and return the result.
return f(0);
}
int main() {
printf("%d\n", add1(33));
printf("%d\n", indirect42(add1));
printf("%d\n", indirect0(add1));
printf("%d\n", times2(33));
printf("%d\n", indirect42(times2));
printf("%d\n", indirect0(times2));
return 0;
}
Here, I call the function add1() on my own first, then I tell two other functions to use that function for their own purpose. Then I do the same with another function.
And this works perfectly - both functions are called with 33 by me, then with 42 and with 0. And the results - 34, 43 and 1 for the first and 66, 84 and 0 for the 2nd function - match the expectations.
If we have a look at the x86 assembler output, we see
…
indirect42:
.LFB13:
.cfi_startproc
movl 4(%esp), %eax
movl $42, 4(%esp)
jmp *%eax
.cfi_endproc
The function gets the given function pointer into %eax, then puts the 42 where it is expected and then calls %eax. (More precisely, since I enabled optimization, it jumps there instead of calling. The logic remains the same.)
The function doesn't know which function is to be called, but how it is to be called.
indirect0:
.LFB14:
.cfi_startproc
movl 4(%esp), %eax
movl $0, 4(%esp)
jmp *%eax
.cfi_endproc
The function does the same as the other one, but it passes 0 to whatever function it gets.
Then, as it comes to calling all this stuff, we get
movl $33, (%esp)
call add1
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
Here we have a normal function call.
movl $add1, (%esp)
call indirect42
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
Here, the address of add1 is pushed to the stack and given to indirect42 so that function can do with it as it wants.
movl $add1, (%esp)
call indirect0
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
The same with indirect0.
(other stuff snipped as it works the same)
There's no magic. You're not actually calling it from main, you're just passing along a reference to it when you call lfind. The compiler lets you do this because it already knows (a) what kind of function is expected there (ie, with those two args) and (b) knows that the one you're using matches correctly, so no complaints. If you'd defined compare differently (wrongly), it wouldn't have compiled.
Inside of lfind, it's invoked directly with the two parameters by name. Function pointers like this seem slippery when you first see them, but in C they're actually laboriously explicit-- you must pass in a reference to a function that matches the function signature declared in lfind's declaration. And inside of lfind, that code must call the passed-in function in the defined way.
Function pointers work exactly the same as regular function calls: the parameters are pushed on the stack (or placed in registers, depending on the calling convention) and a subroutine call is made. The only difference is where the call target comes from. With a function pointer, the location is loaded from the given variable, whereas with a direct function call, the location is statically generated by the compiler as an offset from wherever the code block is loaded.
The best way to understand this is actually using GDB and look at the instructions emitted by the compiler. Try it with a less complex code example, then work up from there. You will save time in the long run.
Syntax for lfind is
void *lfind (const void *key, const void *base, size_t *nmemb, size_t size, comparison_fn_t compar)
The lfind function searches in the array with *nmemb elements of size bytes pointed to by base for an element which matches the one pointed to by key. The function pointed to by compar is used to decide whether two elements match.
A pointer to the compar function is passed to lfind. lfind then calls compar internally for each element it examines, passing key as arg1 and a pointer to the current element of base as arg2 (this is the order glibc uses).
GNU Libc defines comparison_fn_t in stdlib.h if _GNU_SOURCE is set.
#include <stdlib.h>
#ifndef HAVE_COMPARISON_FN_T
typedef int (*comparison_fn_t)(const void *, const void *);
#endif
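To make the indirect call concrete, here is a sketch of what a linear search like lfind might look like inside. my_lfind is a hypothetical stand-in written for this illustration; the real library implementation may differ. compare is a const-correct variant of the function from the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for lfind(): walks the array and lets the
 * caller-supplied function decide whether an element matches. */
void *my_lfind(const void *key, const void *base, size_t *nmemb, size_t size,
               int (*compar)(const void *, const void *))
{
    const char *elem = base;
    for (size_t i = 0; i < *nmemb; i++, elem += size) {
        /* The caller chose compar; the arguments are supplied here. */
        if (compar(key, elem) == 0)
            return (void *)elem;
    }
    return NULL;
}

/* Same test as in the question: does the string *arg2 start with *arg1? */
int compare(const void *arg1, const void *arg2)
{
    const char *k = *(char *const *)arg1;
    const char *s = *(char *const *)arg2;
    return strncmp(k, s, strlen(k));
}
```

main() only names compare when it calls the search function. The two arguments come from the loop inside my_lfind, which is where the call actually happens.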

Will compilers optimize double logical negation in conditionals?

Consider the following hypothetical type:
typedef struct Stack {
unsigned long len;
void **elements;
} Stack;
And the following hypothetical macros for dealing with the type (purely for enhanced readability.) In these macros I am assuming that the given argument has type (Stack *) instead of merely Stack (I can't be bothered to type out a _Generic expression here.)
#define stackNull(stack) (!stack->len)
#define stackHasItems(stack) (stack->len)
Why do I not simply use !stackNull(x) for checking if a stack has items? I thought that this would be slightly less efficient (read: not noticeable at all really, but I thought it was interesting) than simply checking stack->len because it would lead to double negation. In the following case:
int thingy = !!31337;
printf("%d\n", thingy);
if (thingy)
doSomethingImportant(thingy);
The string "1\n" would be printed, and it would be impossible to optimize the conditional (well actually, only impossible if the thingy variable didn't have a constant initializer or was modified before the test, but we'll say in this instance that 31337 is not a constant) because (!!x) is guaranteed to be either 0 or 1.
But I'm wondering if compilers will optimize something like the following
int thingy = wellOkaySoImNotAConstantThingyAnyMore();
if (!!thingy)
doSomethingFarLessImportant();
Will this be optimized to actually just use (thingy) in the if statement, as if the if statement had been written as
if (thingy)
doSomethingFarLessImportant();
If so, does it expand to (!!!!!thingy) and so on? (however this is a slightly different question, as this can be optimized in any case, !thingy is !!!!!thingy no matter what, just like -(-(-(1))) = -1.)
In the question title I said "compilers", by that I mean that I am asking if any compiler does this, however I am particularly interested in how GCC will behave in this instance as it is my compiler of choice.
This seems like a pretty reasonable optimization and a quick test using godbolt with this code (see it live):
#include <stdio.h>
void func( int x)
{
if( !!x )
{
printf( "first\n" ) ;
}
if( !!!x )
{
printf( "second\n" ) ;
}
}
int main()
{
int x = 0 ;
scanf( "%d", &x ) ;
func( x ) ;
}
seems to indicate gcc does well, it generates the following:
func:
testl %edi, %edi # x
jne .L4 #,
movl $.LC1, %edi #,
jmp puts #
.L4:
movl $.LC0, %edi #,
jmp puts #
we can see from the first line:
testl %edi, %edi # x
It just uses x without doing any operations on it. Also notice that the optimizer is clever enough to combine both tests into one, since if the first condition is true the other must be false.
Note I used printf and scanf for side effects to prevent the optimizer from optimizing all the code away.
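The reason the optimizer is free to do this is that the two spellings mean the same thing as conditions. The small sketch below (function names invented for this illustration) shows that !!x only normalizes the value to 0 or 1, while the branch taken is identical with or without it.

```c
/* !!x maps any nonzero int to 1 and leaves 0 as 0. */
int normalize(int x)
{
    return !!x;
}

/* Two spellings of the same test: both take the first branch
 * exactly when x is nonzero. */
int branch_plain(int x)
{
    if (x)
        return 10;
    return 20;
}

int branch_double_negated(int x)
{
    if (!!x)
        return 10;
    return 20;
}
```

Whether a particular compiler emits identical code for the two branch functions still has to be checked in its assembly output, as done above for gcc.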

Memory allocation for global and local variables

I have learnt that memory for global variables is allocated at program startup, whereas memory for local variables is allocated whenever a function call is made.
Case 1:
I have declared a global integer array of size 63500000 and memory used is 256 MB
Ideone Link
#include <stdio.h>
int a[63500000];
int main()
{
printf ("This code requires about 250 MB memory\n");
return 0;
}
Case 2:
I have declared a local integer array of same size in main() and memory used is 1.6 MB
Ideone link
#include <stdio.h>
int main()
{
int a[63500000]= {1,5,0};
printf ("This code requires only 1.6 MB \n");
//printf ("%d\n", a[0]);
return 0;
}
Case 3:
I have declared a local integer array of same size in another function and memory used is 1.6 MB
Ideone Link
#include <stdio.h>
void f()
{
int a[63500000];
}
int main()
{
f();
return 0;
}
Please explain why there is a difference in the memory used, or is my understanding of memory allocation wrong?
First of all: the ideone compiler is GCC.
So, what does GCC do when you compile this?:
void foo ()
{
int a[63500000];
}
gcc -S -O2 foo.c generates:
foo:
pushl %ebp
movl %esp, %ebp
popl %ebp
ret
i.e. nothing is allocated on the stack, at all.
The array is simply optimized away by GCC because it is never used.
GCC won't do this with a global, because it is possible that a global is used in another compilation unit, and so it isn't sure that it is never used. Also: The global is not on the stack (since it is a global).
Now, let's see what happens when you actually use the local array:
int bar (int a, int b, int c)
{
int f[63500000];
f[a] = 9;
f[b] = 7;
return f[c];
}
Things are very different:
bar:
pushl %ebp
movl %esp, %ebp
subl $254000000, %esp
movl 8(%ebp), %eax
movl $9, -254000000(%ebp,%eax,4)
movl 12(%ebp), %eax
movl $7, -254000000(%ebp,%eax,4)
movl 16(%ebp), %eax
movl -254000000(%ebp,%eax,4), %eax
leave
ret
This line: subl $254000000, %esp corresponds to the size of the array. i.e. memory is allocated on the stack.
Now, what if I tried to use the bar function in a program:
int bar (int a, int b, int c)
{
int f[63500000];
f[a] = 9;
f[b] = 7;
return f[c];
}
int main (void)
{
return bar (0, 0, 0);
}
We already saw that the bar function allocates 250 or so megabytes on the stack. On my default GNU/Linux install, the stack size is limited to 8MB. So when the program runs, it causes a "Segmentation fault". I can increase the limit if I want, by executing the following in a shell:
ulimit -s 1000000 #i.e. allow stack size to grow close to 1GB
Then I can run the program, and it will indeed run.
The reason why it fails on the ideone website is that they have limited the stack size when executing programs (and they should, otherwise malicious users could mess up their system).
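The limit that ulimit -s adjusts can also be read from inside a program with the POSIX getrlimit() call. This is a sketch for POSIX systems only; stack_soft_limit is a name made up here. Note that getrlimit() reports bytes, whereas ulimit -s in bash works in units of 1024 bytes.

```c
#include <sys/resource.h>

/* Stores the soft stack limit (in bytes) in *out and returns 0, or
 * returns -1 if getrlimit() fails. The value may be RLIM_INFINITY
 * when the stack size is unlimited. */
int stack_soft_limit(rlim_t *out)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;
    *out = lim.rlim_cur;
    return 0;
}
```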
Cases 2, 3
Variables that you define inside functions are allocated on the stack. That means that the associated memory is cleaned up (the stack is "popped") when the function exits.
Case 1
Variables defined in global scope are allocated in a data segment (or, generally, a memory space requested from the operating system) that exists for the lifetime of the process.
Additionally
Memory allocated using malloc is allocated from a heap and remains allocated until explicitly released using free.
Note that a modern OS may well provide address space requested by a program, but not physically back that address space with RAM until the memory (or a portion of the memory often called a page) is physically accessed.
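The last point can be tried out with a sketch like the following, which assumes a POSIX system (for sysconf). It reserves memory with malloc and then writes one byte per page, which is the access that makes the OS back a page with RAM. touch_pages is a hypothetical name, and whether untouched pages really stay unbacked depends on the OS and the allocator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocates `bytes`, writes one byte in every page of the block, and
 * returns the number of pages written (0 on failure or for bytes == 0). */
size_t touch_pages(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return 0;

    unsigned char *p = malloc(bytes);
    if (p == NULL)
        return 0;

    size_t touched = 0;
    for (size_t off = 0; off < bytes; off += (size_t)page) {
        /* volatile access so the store is not optimized away */
        ((volatile unsigned char *)p)[off] = 1;
        touched++;
    }
    free(p);
    return touched;
}
```

Watching the process in a system monitor between the malloc call and the loop, and again after the loop, is one way to see the resident memory grow only once the pages are written.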
Cases 2 and 3 would result in a stack overflow, because you are asking for about 254 MB of stack memory (63,500,000 ints of 4 bytes each) whereas the stack is typically limited to 8 MB on Linux. This results in unpredictable behaviour, typically a crash with a core dump.
This answer explains the various sections of the process address space (.text, .bss, .data) and how the different kinds of variables are allocated.
