I'm trying to understand how printf works in C for a simple case. I wrote the following program:
#include "stdio.h"
int main(int argc, char const *argv[])
{
printf("Test %s\n", argv[1]);
return 0;
}
Running objdump on the binary, I noticed that Test %s\n resides in .rodata:
objdump -sj .rodata bin
bin: file format elf64-x86-64
Contents of section .rodata:
08e0 01000200 54657374 2025730a 00 ....Test %s..
So formatted printing seems to perform additional copying of the pattern from .rodata to somewhere else.
After compiling it and running it under strace ./bin rr, I noticed a brk syscall before the actual write. So running it under gdb with
catch syscall brk
catch syscall write
shows that in my case the current break equals 0x555555756000, but it is then set to 0x555555777000. When the write occurs, the formatted string
x/s $rsi
0x555555756260: "Test rr\n"
resides between the "old" and "new" break. After the write occurs, the program exits.
QUESTION: Why do we allocate so many pages, and why doesn't the break return to the previous one after the write syscall occurs? Is there any reason to use brk instead of mmap for such formatting?
brk() (and its companion sbrk()) is in effect an mmap() specialized for manipulating the heap size. It is there for historical reasons; the libc could also use mmap() or mremap() directly.
The heap is expanded as additional memory is allocated, for example with malloc(), which happens internally in the libc, for example to have enough space to build the actual string from the format string and the parameters, or for many other internal things (e.g. the output buffers when using buffered I/O with the f* function family).
If some parts of the heap are not used anymore, they are often not deallocated automatically, for two main reasons: the heap may be fragmented, and/or the amount of unused heap may not exceed the threshold that would justify the operation, because the memory might be needed again soon.
As a side note: the format string itself is certainly not copied from the read-only section to the heap; that would be completely useless. But the result string is (usually) built on the heap.
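You can watch the break move yourself by sampling it with sbrk(0) around the first printf. A minimal sketch (sbrk needs _DEFAULT_SOURCE on modern glibc, and the exact addresses will vary from run to run):

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    void *before = sbrk(0);                /* current program break */
    printf("break before: %p\n", before);  /* the first printf may grow the heap */
    void *after = sbrk(0);
    printf("break after:  %p\n", after);
    return 0;
}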
Why do we allocate so many pages?
Using a system call is costly, so the library asks for more than you need at that moment, because it is highly probable that you will want more very soon. Managing memory in user mode is less costly. It is a matter of granularity.
and why doesn't the break return to
the previous one after the write syscall occurs?
Again, why free it if the probability that you will ask for more soon is high?
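If you do want the memory handed back, glibc exposes malloc_trim(3) to request it explicitly. A hedged sketch (glibc-specific; an ordinary printf call does not need this):

#include <malloc.h>   /* glibc-specific */
#include <stdlib.h>

int main(void)
{
    void *p = malloc(1 << 20);
    free(p);
    /* ask glibc to return free heap pages to the kernel now,
       instead of waiting for its own threshold heuristics */
    malloc_trim(0);
    return 0;
}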
Is there any reason to use brk instead of mmap for such formatting?
It is a matter of choice; this depends on the implementation.
Aside: your question is more about memory allocation policy than about understanding printf (which is merely the context here).
I want to know how I can use a resource monitor of any kind (htop, top, etc.) to track the memory usage of a process. Let's write a simple C program.
int main() {
    while (1) {}
    return 0;
}
After compilation, the executable a.out is only 16 KB:
$ ls -lah ./a.out
-rwxr-xr-x 1 user staff 16K May 17 08:43 ./a.out
As I understand it, the code has no variables, no malloc, and no statements of any kind that require additional memory beyond the code itself, which will be loaded into memory when running. Some additional memory for the stack pointer, frame pointer, etc. is expected, but it shouldn't be much.
Interestingly, when I run the code, the System Monitor gives a very different opinion.
I am using macOS, and the monitor states that the virtual memory usage is 30 GB+!
Okay?! Maybe this is due to some optimization, or some unique technique macOS uses to manage memory. Let's try running it in an Ubuntu virtual machine with 1 GB of memory.
I know this looks more reasonable than 30 GB, but 2356 KB?
Am I looking at the wrong indicator?
As I understand it, the code has no variables, no malloc, and no statements of any kind that require additional memory beyond the code itself, which will be loaded into memory when running.
Your code doesn't have much; but it is typically linked with startup code that does things like preprocess command-line arguments, initialize parts of the C library, and call your main().
You'll also have a stack (e.g. so that the startup code can call your main()) that consumes memory (whether you use it or not).
When your program is started, the executable loader will also "load" (map into your virtual address space) any shared libraries (e.g. the C standard library, which is likely needed by the startup code you didn't write, even if you don't use it yourself).
The other thing that can happen is that when the startup code initializes the C standard library, the C standard library can initialize the heap (for things like malloc()), and something (the rest of the C standard library's initialization, the remainder of the startup code) could use malloc() even though the code you wrote doesn't use it.
Of course operating systems/virtual memory management use pages; so the size of each of your program's sections (.text, .data, etc.), each section in each shared library, your stack, your heap, etc., is rounded up to the page size. Depending on which computer it is, the page size might be 4 KiB (16 KiB for recent ARM/M1 Apple machines); and if the startup code you didn't create wants 1 byte in the .data section, it costs 4 KiB (or 16 KiB) of memory.
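If you're curious what the page size is on your machine, you can query it with the POSIX sysconf() call; a minimal sketch:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* ask the OS for its virtual memory page size */
    long page = sysconf(_SC_PAGESIZE);
    printf("page size: %ld bytes\n", page);
    return 0;
}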
I am using macOS, and the monitor states that the virtual memory usage is 30 GB+!
I'd guess that most of it is space that was allocated for the heap, where a tiny amount of the space is used and most isn't. If you assume that there's 176 KiB of private memory (used by your program and its startup code) and 440 KiB of shared memory (used by shared libraries), and note that "32.54 GiB" is roughly 34120000 KiB; then maybe it's "34120000 - (176 + 440) = 34119384 KiB of space that was allocated but isn't actually being used".
I know this looks more reasonable than 30 GB, but 2356 KB?
Continuing the assumption that it's mostly "allocated but not used" heap space, it's good to understand how the heap works. "Allocated but not used" space costs almost nothing, but asking the OS to allocate space (e.g. because the program actually used it all and ran out of "allocated but not used" space) involves some overhead. For this reason the C library tends to ask the OS for large pieces of "allocated but not used" space (to minimize the overhead by reducing the chance of needing to ask the OS for more space) and then splits it into tiny pieces when you call malloc().
With this in mind, and not forgetting that the startup code and libraries are "generic" and not likely to be optimized specifically for any one program, you can say that the best size for the heap's "allocated but not used" space is impossible to determine, but ranges from "maybe too small, but it doesn't matter much" to "maybe too big, but nobody cares". Different compilers and/or libraries and/or operating systems make different decisions; so the amount of "allocated but not used" space varies.
Am I looking at the wrong indicator?
I don't know (it depends on why you're looking at memory stats to begin with).
On modern machines the total virtual address space may be 131072 GiB (where most is "not allocated"), so if you're worried that "allocated but not used" space is going to cause you to run out of "not allocated" space later then you're looking at the right indicator.
Typically people care more about (some subset of) "allocated and actually used space" though.
If you're worried about consuming too much actual RAM (e.g. worried about increasing the chance that swap space will be used by the OS, which could reduce performance of all software and not just yours) then you'd want to look at the "Real Memory Size"; but I suspect that this includes shared memory (which would be used by many programs and not just your program).
So, I was writing a program that tells you the number of contiguous subarrays whose sum equals a certain value.
I have written the code, but when I try to run it in VC Express 2010, it reports this error:
Unhandled exception at 0x010018e7 in test 9.exe: 0xC00000FD: Stack overflow.
I have tried searching for a solution on this website and others, but I can't find anything that fixes the error in this code (the others all use recursion, while I don't).
I would be really grateful if you could kindly explain what causes this error in my code and how to fix it. Any help would be appreciated. Thank you.
Here is my code:
#include <stdio.h>
int main()
{
    int n, k, a = 0, t = 0;
    unsigned long int i[1000000];
    int v1, v2 = 0, v3;
    scanf("%d %d", &n, &k);
    for (v3 = 0; v3 < n; v3++)
    {
        scanf("%lu", &i[v3]);
    }
    do
    {
        for (v1 = v2; v1 < n; v1++)
        {
            t = i[v1] + t;
            if (t == k)
            {
                a++;
                break;
            }
        }
        t = 0;
        v2++;
    } while (v2 != n);
    printf("%d", a);
    return 0;
}
Either move
unsigned long int i[1000000];
outside of main, thus making it a global variable (not an automatic one), or better yet, use some C dynamic heap allocation:
// inside main; needs #include <stdlib.h>
unsigned long int *i = calloc(1000000, sizeof(unsigned long int));
if (!i) { perror("calloc"); exit(EXIT_FAILURE); }
BTW, for such a pointer I would use (for readability reasons) some name other than i. And near the end of main you had better call free(i); to avoid memory leaks.
Also, you could move these two lines to after the read of n and use calloc(n, sizeof(unsigned long int)) instead of calloc(1000000, sizeof(unsigned long int)); then you can handle arrays bigger than a million elements, if your computer and system provide enough resources for that. A sketch of the whole corrected program follows.
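Here is that sketch (the pointer is renamed arr for readability, and I use signed long elements to avoid signed/unsigned comparison surprises with k; the subarray logic itself is unchanged):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n, k, count = 0;
    int v1, v2 = 0, v3;
    long t = 0;

    if (scanf("%d %d", &n, &k) != 2 || n <= 0)
        return EXIT_FAILURE;

    /* heap allocation, sized from the actual input */
    long *arr = calloc(n, sizeof *arr);
    if (!arr) { perror("calloc"); exit(EXIT_FAILURE); }

    for (v3 = 0; v3 < n; v3++)
        scanf("%ld", &arr[v3]);

    do {
        for (v1 = v2; v1 < n; v1++) {
            t += arr[v1];
            if (t == k) { count++; break; }
        }
        t = 0;
        v2++;
    } while (v2 != n);

    printf("%d\n", count);
    free(arr);
    return 0;
}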
Your initial code declares an automatic variable, which goes into the call frame of main on your call stack (which has a limited size, typically a megabyte or a few of them). On some operating systems there is a way to increase the size of that call stack (in an OS-specific way). BTW, each thread has its own call stack.
As a rule of thumb, your C functions (including main) should avoid having call frames bigger than a few kilobytes. With the GCC compiler, you could invoke it with gcc -Wall -Wextra -Wframe-larger-than=1024 -g to get useful warnings and debug information.
Read the virtual address space wikipage; it has a nice picture worth many words. Later, find the way to query, on your operating system, the virtual address space of your process (on Linux, use proc(5), e.g. cat /proc/$$/maps, etc.). In practice, your virtual address space is likely to contain many segments (perhaps a dozen, sometimes thousands). Often, the dynamic linker or some other part of your program (or of your C standard library) uses memory-mapped files. The standard C heap (managed by malloc etc.) may be organized in several segments.
If you want to understand more about virtual address space, take the time to read a good book, like Operating Systems: Three Easy Pieces (freely downloadable).
If you want to query the organization of the virtual address space in some process, you need to find an operating-system specific way to do that (on Linux, for a process of pid 1234, use /proc/1234/maps or /proc/self/maps from inside the process).
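For instance, here is a Linux-specific sketch that prints its own memory map from inside the process:

#include <stdio.h>

int main(void)
{
    /* Linux-specific: /proc/self/maps lists every segment
       of this process's virtual address space */
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) { perror("fopen"); return 1; }
    int c;
    while ((c = fgetc(f)) != EOF)
        putchar(c);
    fclose(f);
    return 0;
}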
Memory is laid out much differently than the simple four-segment model of long ago. The answer to the question can be generalized this way: the system handles global or dynamically allocated memory differently than local variables. Whereas the memory for local variables is limited in size, the memory for dynamic allocation or global variables is not subject to such a tight constraint.
In modern systems there is the concept of a virtual address space. The process created from your program gets a chunk of it, and that portion of memory is responsible for holding what the program needs.
For dynamic allocation, the process can request more memory, and depending on the other processes and so on, the request is serviced. For a dynamic or global array there is no such per-process limit (of course there is a system-wide one; a process can't request all memory). That's why dynamic allocation or a global variable won't cause the process to run out of its allotted memory the way the automatic-lifetime storage for local variables can.
Basically, you can check your stack size,
for example on Linux: ulimit -s (in kilobytes),
and then decide how to organize your code around that.
As a matter of principle, I would never allocate a big piece of memory on the stack: unless you know exactly the depth of your call chain and its stack use, it is hard to control precisely how much stack is consumed at run time.
When the OS loads a process into memory, it initializes the stack pointer to the virtual address where it has decided the stack should go in the process's virtual address space, and program code uses this register to know where stack variables are. My question is: how does malloc() know at what virtual address the heap starts? Does the heap always sit at the end of the data segment, and if so, how does malloc() know where that is? Or is it even one contiguous area of memory, rather than being randomly interspersed with other global variables in the data section?
malloc implementations are dependent on the operating system, and so is the way they find the beginning of the heap. On UNIX, this can be accomplished by calling sbrk(0) at initialization time. On other operating systems the process is different.
Note that you can implement malloc without knowing the location of the heap: initialize the free list to NULL, and call sbrk (or a similar function) with the allocation size each time no free element of the appropriate size is found.
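For illustration, a toy bump allocator along those lines (a sketch only: toy_malloc is a made-up name, there is no free list, it is not thread-safe, and real allocators are far more careful):

#define _DEFAULT_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static void *toy_malloc(size_t size)
{
    /* push the program break up by the requested size; sbrk returns
       the previous break, which is the start of the new block */
    void *p = sbrk((intptr_t)size);
    return (p == (void *)-1) ? NULL : p;
}

int main(void)
{
    int *a = toy_malloc(100 * sizeof *a);
    if (a) {
        a[0] = 42;
        printf("%d stored at %p\n", a[0], (void *)a);
    }
    return 0;
}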
This is only about Linux implementations of malloc.
Many malloc implementations on Linux or POSIX use the mmap(2) syscall to get a quite big range of memory; then they can use munmap(2) to release it.
(It looks like sbrk(2) might not be used a lot any more; in particular, it is not ASLR friendly and might not be multi-thread friendly)
Both these syscalls may be quite expensive, so some implementations ask for memory (using mmap) in quite large chunks (e.g. chunks of one or a few megabytes). Then they manage the free space, e.g. as linked lists of blocks, etc. They handle small mallocs and large mallocs differently.
The mmap syscall usually does not hand out memory ranges at fixed addresses (notably because of ASLR).
Try, on your system, running a simple program printing the result of a single malloc (of e.g. 128 ints), as sketched below. You will probably observe different addresses from one run to the next (because of ASLR). And strace(1)-ing it is very instructive. Try also cat /proc/self/maps (or print the lines of /proc/self/maps inside your program). See proc(5).
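A minimal version of that experiment:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p = malloc(128 * sizeof *p);
    /* with ASLR enabled, the printed address usually changes
       from one run to the next */
    printf("%p\n", (void *)p);
    free(p);
    return 0;
}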
So there is no need to "start" the heap at some fixed address, and on many systems that does not even make any sense. The kernel hands out ranges of virtual addresses at random pages.
BTW, both GNU libc and musl libc are free software. You should look inside the source code of their malloc implementations. I find the source code of musl libc very readable.
On Windows, you use the Heap functions to get process heap memory. The C runtime allocates memory blocks on the heap using HeapAlloc and then uses that to fulfil malloc requests.
I'm reading standard input on Linux. I give read a buffer of insufficient length (only two characters); the buffer should overflow and a segmentation fault should occur. However, the program runs fine. Why?
Compiled with:
gcc file.c -ansi
Run with:
echo abcd | ./a.out
Program:
#include <stdio.h>
#include <stdlib.h>   /* malloc */
#include <unistd.h>   /* read */
#define STDIN 0
int main() {
    /* This buffer is intentionally too small for input */
    char *smallBuffer = (char *) malloc(sizeof(char) * 2);
    int readedBytes;
    readedBytes = read(STDIN, smallBuffer, sizeof(char) * 4);
    printf("Readed: %i, String:'%s'\n", readedBytes, smallBuffer);
    return 0;
}
Output:
Readed: 4, String:'abcd'
It is generally wrong to expect a segmentation fault in cases like this. You see, buffer overflows result in undefined behavior, which means the behavior of such code is unpredictable. It may or may not result in a segmentation fault.
Technically, when you allocate a buffer of two bytes, for example, there are two possible scenarios.
The first is when the buffer is allocated on the stack. The stack itself is larger than 2 bytes, and if you overflow that buffer, the memory protection unit will still allow you to write to the memory "outside" it. In this case you won't get a segmentation fault, but you could mess up other variables stored nearby on the stack; this kind of situation is generally referred to as "stack smashing".
The second possible scenario is allocating the memory dynamically (i.e. using malloc()). In that case it is very likely that the actually allocated buffer is larger, or is placed on the same page as memory allocated/reserved before. In that case the program can write past the two-byte buffer. It may or may not receive a segmentation violation signal, but the behavior is undefined nevertheless.
Such cases are sometimes hard to find without taking special care. There are tools that help to trace issues like this; Valgrind is one of them, for example.
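Compilers can also instrument the code for you. For instance, AddressSanitizer (supported by recent GCC and Clang) will flag the overflowing read at run time:
gcc -fsanitize=address -g file.c && echo abcd | ./a.out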
On a side note, you may only expect a segmentation fault if you know for sure that the virtual address you are using is invalid, or is protected against reading, writing, or execution by the memory protection unit (which might not exist at all on the hardware your application runs on).
Hope it helps. Good Luck!
malloc guarantees to provide at least the amount of memory you request. To see the error, you can use a program such as Valgrind, which reports the following:
==22265== Syscall param read(buf) points to unaddressable byte(s)
==22265== at 0x4F188B0: __read_nocancel (syscall-template.S:82)
==22265== by 0x4005B4: main (in /home/def/p/cm/Git/git/a.out)
==22265== Address 0x51f1042 is 0 bytes after a block of size 2 alloc'd
==22265== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22265== by 0x400595: main (in /home/def/p/cm/Git/git/a.out)
In this case, the program overwrites some of its own memory; the OS does not notice this.
A segmentation fault occurs when a process tries to access memory that does not belong to it. However, an operating system assigns memory not on a per-byte basis but in larger blocks called pages (a size of 4 KB is frequently used). So when you allocate two bytes, the heap manager places those two bytes on some memory page (either previously allocated or a new one), and the whole page is marked as belonging to your process. It is highly probable that these two bytes will not end up at the very end of the page, so your program can write past them without any OS exception at the time of writing (but most probably it will fire at you later).
Too small a buffer is not a guarantee that the program will crash. It depends on what data exists in the bytes following the buffer, how the compiler arranges the executable, and how the operating system organizes memory.
Chances are that the bytes following your buffer already "belong" to your program and are padding or otherwise store nothing of import.
The 3rd parameter is not the size of the buffer but the number of bytes to read. So you call the function and say "here's a stream; read 4 bytes from it and put them in this buffer". But read doesn't know the size of the buffer (it only knows how much data is available), so it reads as much as it can and puts it in your buffer, assuming the buffer you supplied is large enough. What you get, then, is memory corruption. Your program may work fine in this simple case, but usually it just fails unexpectedly in some other place.
I think you should pay particular attention to what malloc() really does: under Linux, a malloc() call is not only unlikely to fail, it does not even grant you a real reservation of space when it returns a positive response.
This behaviour is typically called an "optimistic memory allocation strategy", or "overcommit", and it is strictly tied to the kernel. Programming in C under Linux is not that easy; in my opinion you should switch to C++. You will find a familiar syntax to start with, it makes much more sense to use C++ for productivity than C these days, and with a simple RAII approach C++ is safer than C.
Maybe it is different from platform to platform, but
when I compile using gcc and run the code below, I get 0 every time on my Ubuntu 11.10.
#include <stdio.h>
#include <stdlib.h>
int main()
{
double *a = malloc(sizeof(double)*100);
printf("%f", *a);
}
Why does malloc behave like this even though there is calloc?
Doesn't it mean there is an unwanted performance overhead, initializing the values to 0 even when you sometimes don't want that?
EDIT: Oh, my previous example was not initializing; it just happened to use a "fresh" block.
What I was precisely looking for is why it initializes the memory when allocating a large block:
int main()
{
    int *a = malloc(sizeof(int)*200000);
    a[10] = 3;
    printf("%d\n", *(a+10));
    free(a);
    a = malloc(sizeof(double)*200000);
    printf("%d\n", *(a+10));
}
OUTPUT: 3
0 (initialized)
But thanks for pointing out that there is a SECURITY reason for this behavior when mallocing (I never thought about it)! Sure enough, the OS has to zero the memory when handing out a fresh block, or a large one.
Short Answer:
It doesn't; it just happens to be zero in your case. (Also, your test case doesn't show that all the data is zero; it only shows whether one element is zero.)
Long Answer:
When you call malloc(), one of two things will happen:
It recycles memory that was previously allocated and freed by the same process.
It requests new page(s) from the operating system.
In the first case, the memory will contain data leftover from previous allocations. So it won't be zero. This is the usual case when performing small allocations.
In the second case, the memory will come from the OS. This happens when the program runs out of recycled memory, or when you request a very large allocation (as is the case in your example).
Here's the catch: Memory coming from the OS will be zeroed for security reasons.*
When the OS gives you memory, it could have been freed by a different process, so that memory could contain sensitive information such as a password. To prevent you from reading such data, the OS zeroes it before it gives it to you.
*I note that the C standard says nothing about this. This is strictly an OS behavior. So this zeroing may or may not be present on systems where security is not a concern.
To give more of a performance background to this:
As @R. mentions in the comments, this zeroing is why you should always use calloc() instead of malloc() + memset(): calloc() can take advantage of this fact to avoid a separate memset().
On the other hand, this zeroing is sometimes a performance bottleneck. In some numerical applications (such as the out-of-place FFT), you need to allocate a huge chunk of scratch memory. Use it to perform whatever algorithm, then free it.
In these cases, the zeroing is unnecessary and amounts to pure overhead.
The most extreme example I've seen is a 20-second zeroing overhead for a 70-second operation with a 48 GB scratch buffer. (Roughly 30% overhead.)
(Granted, the machine was rather short on memory bandwidth.)
The obvious solution is to simply reuse the memory manually. But that often requires breaking through established interfaces (especially if it's part of a library routine).
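In code, the difference is simply this (a sketch; whether calloc actually skips the clearing depends on the allocator and the allocation size):

#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t n = 1 << 20;

    /* may memset pages the OS has already zeroed */
    double *a = malloc(n * sizeof *a);
    if (a) memset(a, 0, n * sizeof *a);

    /* can hand back freshly mmap'd (already zero) pages untouched */
    double *b = calloc(n, sizeof *b);

    free(a);
    free(b);
    return 0;
}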
The OS will usually clear fresh memory pages before handing them to your process, so that it can't look at an older process's data. This means that the first time you initialize a variable (or malloc something) it will often be zero, but if you ever reuse that memory (by freeing it and malloc-ing again, for instance), then all bets are off.
This inconsistency is precisely why uninitialized variables are such a hard-to-find bug.
As for the unwanted performance overhead: avoiding unspecified behaviour is probably more important. Whatever small performance boost you could gain in this case won't compensate for the hard-to-find bugs you will have to deal with if someone slightly modifies the code (breaking previous assumptions) or ports it to another system (where the assumptions might have been invalid in the first place).
Why do you assume that malloc() initializes to zero? It just so happens that the first call to malloc() results in a call to the sbrk or mmap system calls, which allocate a page of memory from the OS. The OS is obliged to provide zero-initialized memory for security reasons (otherwise, data from other processes would be visible!). So you might think the OS wastes time zeroing the page. But no! In Linux, there is a special system-wide singleton page called the zero page, and that page gets mapped copy-on-write, which means that only when you actually write to it will the OS allocate a different page and initialize it. I hope this answers your question regarding performance: the memory paging model allows memory usage to be somewhat lazy, by supporting multiple mappings of the same page plus the ability to handle the case when the first write occurs.
If you call free(), the glibc allocator will return the region to its free lists, and when malloc() is called again, you might get that same region back, dirty with the previous data. Eventually, free() might return the memory to the OS by making system calls again.
Notice that the glibc man page for malloc() explicitly says that the memory is not cleared, so by the "contract" of the API you cannot assume that it gets cleared. Here's the original excerpt:
malloc() allocates size bytes and returns a pointer to the allocated memory.
The memory is not cleared. If size is 0, then malloc() returns either NULL,
or a unique pointer value that can later be successfully passed to free().
If you'd like, you can read more of that documentation if you are worried about performance or other side effects.
I modified your example to contain two identical allocations. Now it is easy to see that malloc doesn't zero-initialize memory:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    {
        double *a = malloc(sizeof(double)*100);
        *a = 100;
        printf("%f\n", *a);
        free(a);
    }
    {
        double *a = malloc(sizeof(double)*100);
        printf("%f\n", *a);
        free(a);
    }
    return 0;
}
Output with gcc 4.3.4
100.000000
100.000000
From gnu.org:
Very large blocks (much larger than a page) are allocated with mmap (anonymous or via /dev/zero) by this implementation.
The standard does not dictate that malloc() should initialize the values to zero. It just happens on your platform that it may be set to zero, or it may have been zero at the specific moment you read that value.
Your code doesn't demonstrate that malloc initialises its memory to 0; that could be done by the operating system before the program starts. To see which is the case, write a different value to the memory, free it, and call malloc again. You will probably get the same address, but you will have to check this. If so, you can look at what it contains. Let us know!
malloc doesn't initialize memory to zero. It returns the memory to you as it is, without touching it or changing its value.
So, why do we get those zeros?
Before answering this question we should understand how malloc works:
When you call malloc, it checks whether the glibc allocator already has memory of the requested size.
If it does, it will return that memory to you. This memory usually comes from a previous free operation, so in most cases it holds garbage values (which may or may not be zero).
On the other hand, if it can't find memory, it will ask the OS to allocate memory for it, by calling sbrk or mmap system calls.
The OS returns a zero-initialized page for security reasons as this memory may have been used by another process and carries valuable information such as passwords or personal data.
You can read about it yourself at this link:
Neighboring chunks can be coalesced on a free no matter what their
size is. This makes the implementation suitable for all kinds of
allocation patterns without generally incurring high memory waste
through fragmentation.
Very large blocks (much larger than a page) are allocated with mmap
(anonymous or via /dev/zero) by this implementation.
In some implementations, calloc exploits this property of the OS: it asks the OS to allocate pages for it, so the memory is guaranteed to be zero-initialized without calloc clearing it itself.
Do you know that it is definitely being initialised? Or is it possible that the area returned by malloc() just frequently has 0 at its beginning?
Never, ever count on any compiler to generate code that will initialize memory to anything. malloc simply returns a pointer to n bytes of memory someplace; hell, it might even be in swap.
If the contents of the memory are critical, initialize it yourself.