I need to find the dynamic assembly instruction counts for my C program, broken down by instruction type. The output I expect is similar to the following:
mov 200
pop 130
jne 48
I tried valgrind --tool=callgrind --cache-sim=yes --dump-instr=yes <my program name> and viewed the result in KCachegrind. I did find the instruction types, but the count information was nowhere to be seen. I would also like to filter the output to discard instructions that come from system libraries etc.
I need to find out the address and the size of memory allocated using malloc in some specific functions and parts of my program. I did some heap profiling, but it only gives the whole heap size. Any suggestions?
I want to know which memory locations are accessed by a function of my program. In other words, I need to find out the memory access pattern of my program. Will counting loads help? If yes, how can I count them?
Take a look at objdump:
http://sourceware.org/binutils/docs/binutils/objdump.html
I'd get started with objdump -S myprog
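Note that objdump gives you a static disassembly, so anything you count from it is an occurrence count, not a dynamic execution count; for dynamic counts you still need a tool that actually runs the program (callgrind, perf, a Pin tool, etc.). As a rough sketch of the static case, assuming GNU objdump's tab-separated disassembly layout and using myprog as a placeholder name, you could tally mnemonics like this:
objdump -d myprog | awk -F'\t' '/^ +[0-9a-f]+:/ { split($3, m, " "); if (m[1] != "") n[m[1]]++ } END { for (i in n) print i, n[i] }' | sort -k2 -rn
The awk filter keeps only the instruction lines (address, raw bytes, mnemonic) and counts the first word of the mnemonic column, giving output in the same "mov 200" shape as asked for above.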
Related
I have an m.c:
extern void a(char*);
int main(int ac, char **av){
static char string [] = "Hello , world!\n";
a(string);
}
and an a.c:
#include <unistd.h>
#include <string.h>
void a(char* s){
write(1, s, strlen(s));
}
I compile and build these as:
g++ -c -g -std=c++14 -MMD -MP -MF "m.o.d" -o m.o m.c
g++ -c -g -std=c++14 -MMD -MP -MF "a.o.d" -o a.o a.c
g++ -o linux m.o a.o -lm -lpthread -ldl
Then, I examine the executable, linux thus:
objdump -drwxCS -Mintel linux
The output of this on my Ubuntu 16.04.6 starts off with:
start address 0x0000000000400540
then, later, is the init section:
00000000004004c8 <_init>:
4004c8: 48 83 ec 08 sub rsp,0x8
Finally, is the fini section:
0000000000400704 <_fini>:
400704: 48 83 ec 08 sub rsp,0x8
400708: 48 83 c4 08 add rsp,0x8
40070c: c3 ret
The program references the string Hello , world!\n, which is in the .data section, obtained with the command:
objdump -sj .data linux
Contents of section .data:
601030 00000000 00000000 00000000 00000000 ................
601040 48656c6c 6f202c20 776f726c 64210a00 Hello , world!..
All of this tells me that the executable has been created so as to be loaded at actual memory addresses starting from around 0x0000000000400540 (the start address, just above .init) and that the program accesses data at actual memory addresses extending to at least 601040 (in the .data section).
I base this on Chapter 7 of "Linkers & Loaders" by John R Levine, where he states:
A linker combines a set of input files into a single output file that
is ready to be loaded at a specific address.
My question is about the next line.
If, when the program is loaded, storage at that address isn't
available, the loader has to relocate the loaded program to reflect
the actual load address.
(1) Suppose I have another executable that is currently running on my machine, already using the memory space between 400540 and 601040. How is it decided where to start my new executable, linux?
(2) Related to this, in Chapter 4, it is stated:
..ELF objects...are loaded in about the middle of the address space so
the stack can grow down below the text segment and the heap can grow
up from the end of the data, keeping the total address space in use
relatively compact.
Suppose a previously running application started at, say, 200000 and now linux starts around 400540. There is no clash or overlap of memory addresses. But as the programs continue, suppose the heap of the previous application creeps up to 300000, while the stack of the newly launched linux has grown downward to 310000. Soon, there will be a clash/overlap of memory addresses. What happens when the clash eventually occurs?
If, when the program is loaded, storage at that address isn't available, the loader has to relocate the loaded program to reflect the actual load address.
Not all file formats support this:
GCC for 32-bit Windows will add the information required for the loader in the case of dynamic libraries (.dll). However, the information is not added to executable files (.exe), so such an executable file must be loaded to a fixed address.
Under Linux it is a bit more complicated; but here, too, many (typically older 32-bit) executable files cannot be loaded to different addresses, while dynamic libraries (.so) can be loaded to different addresses.
Suppose I have another executable that is currently running on my machine already using the memory space between 400540 and 601040 ...
Modern computers (all x86 32-bit computers) have a paging MMU which is used by most modern operating systems. This is some circuit (typically in the CPU) which translates addresses seen by the software to addresses seen by the RAM. In your example, 400540 could be translated to 1234000, so accessing the address 400540 will actually access the address 1234000 in RAM.
The point is: Modern OSs use different MMU configurations for different tasks. So if you start your program again, a different MMU configuration is used that translates address 400540 seen by the software to address 2345000 in RAM. Both programs using address 400540 can run at the same time because one program will actually access address 1234000 and the other will access address 2345000 in RAM when they access address 400540.
This means that some address (e.g. 400540) will never be "already in use" when the executable file is loaded.
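You can see this for yourself with a small C program. The sketch below assumes a non-PIE build (e.g. gcc -no-pie -o addr_demo addr_demo.c, file name made up), so the variable keeps its fixed link-time address; with a PIE executable ASLR additionally randomizes the address per run, but two processes still cannot clash:
#include <stdio.h>
#include <unistd.h>

static char buffer[16] = "Hello";   /* ends up in the .data segment */

int main(void)
{
    /* Two instances started at the same time will typically print the same
       virtual address for buffer, yet they do not interfere: the MMU maps
       that address to a different physical page in each process. */
    printf("pid %d: buffer is at %p\n", (int)getpid(), (void *)buffer);
    sleep(30);   /* keep the process alive so a second copy can be started */
    return 0;
}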
The address may already be in use when a dynamic library (.so/.dll) is loaded because these libraries share the memory with the executable file.
... how is it decided where to start my new executable linux?
Under Linux the executable file will be loaded to the fixed address if it was linked in a way that it cannot be moved to another address. (As already said: This was typical for older 32-bit files.) In your example the "Hello world" string would be located at address 0x601040 if your compiler and linker created the executable that way.
However, most 64-bit executables can be loaded to a different address. Linux will load them at some random address for security reasons, making it more difficult for viruses or other malware to attack the program.
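If you want to check which case your own executable falls into, the ELF header tells you (linux here is the executable name from the question):
readelf -h linux | grep Type
A fixed-address executable is reported as EXEC, while a position-independent one, which the kernel may load at a random base address, is reported as DYN (like a shared object).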
... so the stack can grow down below the text segment ...
I've never seen this memory layout in any operating system:
Both under Linux and under Solaris the stack was located at the end of the address space (somewhere around 0xBFFFFF00), while the text segment was loaded quite close to the start of the memory (maybe address 0x401000).
... and the heap can grow up from the end of the data, ...
suppose the heap of the previous application creeps up ...
Many implementations since the late 1990s do not use the heap any more. Instead, they use mmap() to reserve new memory.
According to the manual page of brk(), the heap was declared as "legacy feature" in the year 2001, so it should not be used by new programs any longer.
(However, according to Peter Cordes malloc() still seems to use the heap in some cases.)
Unlike "simple" operating systems like MS-DOS, Linux does not allow you to "simply" use the heap; you have to call the function brk() to tell Linux how much heap you want to use.
If a program uses the heap and needs more heap than is available, the brk() function returns an error code and malloc() simply returns NULL.
However, this situation typically happens because no more RAM is available and not because the heap overlaps with some other memory area.
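A small, hedged illustration of how glibc's malloc() still mixes the two mechanisms (small requests usually come from the brk()-managed heap, large ones above a tunable threshold of roughly 128 KiB go through mmap()); running it under strace -e brk,mmap makes the difference visible:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *small = malloc(64);            /* usually served from the brk()-managed heap */
    void *large = malloc(1024 * 1024);   /* usually a separate mmap() region           */

    if (small == NULL || large == NULL) {
        fprintf(stderr, "malloc failed\n");  /* out of memory, not an address clash */
        return 1;
    }
    free(large);
    free(small);
    return 0;
}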
... while the stack of the newly launched linux has grown downward to ...
Soon, there will be a clash/overlap of the memory addresses. What happens when the clash eventually occurs?
Indeed, the size of the stack is limited.
If you use too much stack, you have a "stack overflow".
This program will intentionally use too much stack - just to see what happens:
.globl _start
_start:
sub $0x100000, %rsp      # move the stack pointer down by 1 MiB each iteration
push %rax                # touch the newly claimed stack memory ...
push %rax
jmp _start               # ... and repeat until the stack limit is exceeded
In the case of an operating system with an MMU (such as Linux), your program will crash with an error message:
~$ ./example_program
Segmentation fault (core dumped)
~$
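The same effect can be produced from C. Compiled without optimization (gcc -O0, otherwise the compiler may turn the recursion into a loop), unbounded recursion eventually exhausts the stack and the process dies with SIGSEGV, just like the assembly example above; a sketch:
static void recurse(int depth)
{
    char pad[4096];            /* consume roughly one page of stack per call */
    pad[0] = (char)depth;      /* touch it so the page is really used        */
    recurse(depth + 1);
}

int main(void)
{
    recurse(0);
    return 0;
}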
EDIT/ADDENDUM
Is stack for all running programs located at the "end"?
In older Linux versions, the stack was located near (but not exactly at) the end of the virtual memory accessible by the program: Programs could access the address range from 0 to 0xBFFFFFFF in those Linux versions. The initial stack pointer was located around 0xBFFFFE00. (The command line arguments and environment variables came after the stack.)
And is this the end of actual physical memory? Will not the stack of different running programs then get mixed up? I was under the impression that all of the stack and memory of a program remains contiguous in actual physical memory, ...
On a computer using an MMU, the program never sees physical memory:
When the program is loaded, the OS will search some free area of RAM - maybe it finds some at the physical address 0xABC000. Then it configures the MMU in a way that the virtual addresses 0xBFFFF000-0xBFFFFFFF are translated to the physical addresses 0xABC000-0xABCFFF.
This means: Whenever the program accesses address 0xBFFFFE20 (for example using a push operation), the physical address 0xABCE20 in the RAM is actually accessed.
It is not possible at all for a program to access a specific physical address directly.
If you have another program running, the MMU is configured in a way that the addresses 0xBFFFF000-0xBFFFFFFF are translated to the addresses 0x345000-0x345FFF when the other program is running.
So if one of the two programs will perform a push operation and the stack pointer is 0xBFFFFE20, the address 0xABCE20 in RAM will be accessed; if the other program performs a push operation (with the same stack pointer value), the address 0x345E20 will be accessed.
Therefore, the stacks will not mix up.
OSs not using an MMU but supporting multi-tasking (examples are the Amiga 500 or early Apple Macintoshes) will of course not work this way. Such OSs use special file formats (and not ELF) which are optimized for running multiple programs without MMU. Compiling programs for such OSs is much more complex than compiling programs for Linux or Windows. And there are even restrictions for the software developer (example: functions and arrays should not be too long).
Also, does each program have its own stack pointer, base pointer, registers, etc.? Or does the OS just have one set of these registers to be shared by all programs?
(Assuming a single-core CPU), the CPU has one set of registers; and only one program can run at the same time.
When you start multiple programs, the OS will switch between the programs. This means program A runs for (for example) 1/50 second, then program B runs for 1/50 second, then program A runs for 1/50 second and so on. It appears to you as if the programs run at the same time.
When the OS switches from program A to program B, it must first save the values of the registers (of program A). Then it must change the MMU configuration. Finally it must restore program B's register values.
Yes, objdump on this executable shows the addresses where its segments will be mapped. (Linking collects sections into segments: What's the difference of section and segment in ELF file format.) .data and .text get linked into different segments with different permissions (read+write vs. read+exec).
If, when the program is loaded, storage at that address isn't available
That could only happen when loading a dynamic library, not the executable itself. Virtual memory means that each process has its own private virtual address space, even if they were started from the same executable. (This is also why ld can always pick the same default base address for the text and data segments, not trying to slot every executable and library on the system into a different spot in a single address space.)
An executable is the first thing that gets to lay claim to parts of that address space, when it's loaded/mapped by the OS's ELF program loader. That's why traditional (non-PIE) ELF executables can be non-relocatable, unlike ELF shared objects like /lib/libc.so.6
If you single-step a program with a debugger, or include a sleep, you'll have time to look at less /proc/<PID>/maps. Or cat /proc/self/maps to have cat show you its own map. (Also /proc/self/smaps for more detailed info on each mapping, like how much of it is dirty, using hugepages, etc.)
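A tiny C program can also dump its own map, which avoids racing against the process exiting; a minimal sketch:
#include <stdio.h>

int main(void)
{
    /* Print this process's own memory map: text, data, heap, stack and any
       shared libraries, together with the addresses they are mapped at. */
    char line[512];
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof line, maps) != NULL)
        fputs(line, stdout);
    fclose(maps);
    return 0;
}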
(Newer GNU/Linux distros configure GCC to make PIE executables by default: 32-bit absolute addresses no longer allowed in x86-64 Linux?. In that case objdump would only see addresses relative to a base of 0 or 1000 or something. And compiler-generated asm would have used PC-relative addressing, not absolute.)
I have a question asking me to explain in what regions of a linux memory map a procedure is stored. The question instructs me to use objdump -h to find this information.
Now, I am a little bit confused what "regions in memory" means.
I know that for a given procedure we have certain registers that we work with (say %eax, %edx...), and also that each variable has a memory location it is stored in (say 8(%ebp)). In addition, I know that we have the %esp and %ebp registers to "take care" of the stack.
I also ran objdump -h on my file, but from what I get I cannot tell anything specific.
So should I just mention the registers being used and the memory addresses where the variables of this procedure are being stored?
I believe your question is asking where the linker has designated your actual code to reside in memory when it's loaded by the operating system. This area of code would be represented by the program counter register, or %EIP on x86.
Typically on Linux, program code as well as read-only variables are stored in the lower regions of mapped memory for the process, with the stack in the upper regions (i.e., the stack grows down).
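To check this concretely, objdump -h lists the sections along with the VMA column (the virtual address at which each section is mapped), and nm tells you which section a given procedure's symbol lands in (T or t means the text, i.e. code, section). Here myprog and my_proc are placeholders:
objdump -h myprog
nm myprog | grep my_proc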
You could easily do an internet search for linux memory map; after all, it is your homework, and you would learn how to problem-solve and do research.
Each program has certain segments, here are a few:
bss - uninitialized data
data - initialized data (strings, arrays etc...)
text - code "procedures"
Sections are placed relative to the start address of the program, with positive or negative offsets.
Here is a good page:
http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory
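As a hedged illustration of how typical C objects map onto those segments (exact section names can vary with compiler and options):
/* where common C objects typically end up */
int        uninitialized_global;        /* .bss    - uninitialized data  */
int        initialized_global = 42;     /* .data   - initialized data    */
const char message[] = "hello";         /* .rodata - read-only data      */

int main(void)                          /* .text   - code ("procedures") */
{
    return initialized_global;
}
Compiling this and running objdump -h (or nm) on the result shows each symbol's section and the addresses it occupies.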
My goal altogether is to figure out, from a post-mortem core file, why a specific process is consuming a lot of memory. Is there a summary that I can get somehow? Obviously valgrind is out of the question, because I can't get access to the live process.
First of all, getting output similar to /proc/"pid"/maps would help, but
maintenance info sections
(as described here: GDB: Listing all mapped memory regions for a crashed process) in gdb didn't show me heap memory consumption.
info proc map
is an option, as I can get access to a machine with the exact same code, but as far as I have seen it is not correct. My process was using 700 MB, but the maps shown there only accounted for some 10 MB. And I didn't see the .so files there which are visible in
maintenance print statistics
Do you know any other command which might be useful?
I can always instrument the code, but that's not easy. And chasing all the allocated data through pointers is like looking for a needle in a haystack.
Do you have any ideas?
Postmortem debugging of this sort in gdb is a bit of an art more than a science.
The most important tool for it, in my opinion, is the ability to write scripts that run inside of gdb. The manual will explain it to you. The reason I find this so useful is that it lets you do things like walk data structures and print out information about them.
Another possibility for you here is to instrument your version of malloc -- write a new malloc function that saves statistics about what is being allocated so that you can then look at those post mortem. You can, of course, call the original malloc to do the actual memory allocation work.
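One common way to do that on Linux (not necessarily what is meant above) is an LD_PRELOAD interposer. A minimal sketch, assuming glibc, built with gcc -shared -fPIC -o malloc_stats.so malloc_stats.c -ldl and run as LD_PRELOAD=./malloc_stats.so ./your_program (file names are illustrative):
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stddef.h>

static size_t total_calls, total_bytes;

void *malloc(size_t size)
{
    static void *(*real_malloc)(size_t);
    if (real_malloc == NULL)
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");

    void *p = real_malloc(size);
    total_calls++;
    total_bytes += size;
    /* Keep logging simple here; fprintf can itself allocate, so a production
       version would write to a preallocated buffer or use write(2). */
    fprintf(stderr, "malloc(%zu) = %p  [calls=%zu, bytes=%zu]\n",
            size, p, total_calls, total_bytes);
    return p;
}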
I'm sorry that I can't give you an obvious and simple answer that will simply yield an immediate fix for you here -- without tools like valgrind this is a very hard job.
If it's Linux, you don't have to worry about adding stats to your malloc. Use the utility called 'memusage'.
For a sample program (sample_mem.c) like the one below:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    int i = 1000;
    char *buff = NULL;
    srand(time(NULL));
    while (i--)
    {
        buff = malloc(rand() % 64);   /* allocate a random block of 0-63 bytes */
        free(buff);
    }
    return 0;
}
The output of memusage will be:
$memusage sample_mem
Memory usage summary: heap total: 31434, heap peak: 63, stack peak: 80
total calls total memory failed calls
malloc| 1000 31434 0
realloc| 0 0 0 (nomove:0, dec:0, free:0)
calloc| 0 0 0
free| 1000 31434
Histogram for block sizes:
0-15 253 25% ==================================================
16-31 253 25% ==================================================
32-47 247 24% ================================================
48-63 247 24% ================================================
But if you're writing a malloc wrapper, you can make your program core dump after a given number of malloc calls, so that you can get a clue.
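A sketch of that idea (the wrapper name and the threshold are made up; an LD_PRELOAD interposer would work just as well as an explicit wrapper):
#include <stdlib.h>

static unsigned long malloc_count;

void *counting_malloc(size_t size)       /* call this in place of malloc() */
{
    if (++malloc_count == 100000UL)      /* threshold chosen for the investigation */
        abort();                         /* dump core here (if ulimit -c allows)   */
    return malloc(size);
}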
You might be able to use a simple tool like log-malloc.c which compiles into a shared library which is LD_PRELOADed before your application and logs all the malloc-type functions to a file. At least it might help narrow down the search in your dump.
For an assignment I need to translate some C declarations to assembly using only AVR directives.
I'm wondering if anyone could give me some advice on learning how to do this.
For example:
translate 'char c;' and 'char* d;' to assembly statements
Please note, this is my first week learning assembly.
Any help/advice would be appreciated.
First, char c; and char* d; are declarations, not statements.
What you can do is dump the assembly output of your C program with the avr-gcc option -S:
# Dump assembly to stdout
avr-gcc -mmcu=your_avr_mcu -S -c source.c -o -
Then you can reuse the relevant assembly output parts to create inline assembler statements.
Look here on how to write inline assember with avr-gcc:
http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
Without a compiler whose output you can disassemble (avr-gcc is easy to come by), it may be difficult to understand what happens when a high-level language is compiled.
With declarations like that you are simply declaring that you want a variable or an address. That doesn't necessarily, or immediately, place anything in the assembly. Often dead code and other things are removed by the compiler. Other times it is not until the very end of the compile process that you know where your variable may end up. Sometimes a char only ever lives in a register, so that variable has a home only for a short period of the program. Sometimes the variable lives longer and has to have a home the whole time the program is running; then there are not enough registers to keep it in one forever, so it will get a memory location allocated to it.
Likewise a pointer is an address which also lives in registers or in memory locations.
Ideally you have a compiler you can experiment with, adding C code and seeing what happens. Either way, you need to get the instruction set documentation for the desired processor family:
http://www.atmel.com/Images/doc0856.pdf
Look at the add operation, for example add rd,rr: it shows you that d and r are both between 0 and 31, so you could have add r10,r23. Looking at the operation, that means r10 = r10 + r23. If you had char variables that you wanted to add together in C, this is one of the instructions the compiler might use.
There are two lds instructions: one that is a single 16-bit word and one that takes two 16-bit words (usually the assembler chooses for you). rd is between 0 and 31 and k is an address in the memory space. If you have a global variable declared, it will likely be accessed using lds or sts. So k is a pointer, a fixed pointer. Your char * in C can turn into a fixed number at compile time, depending on what you do with that pointer in your code. If it is a dynamic thing, then look at the flavors of ld and st that use register pairs. So you might see a char * pointer turn into a pair of registers, or a pair of memory locations that hold the pointer itself; then the code might use the X, Y, or Z register pairs (see ld and st), and maybe an adiw to apply an offset to that pointer before using ld or st.
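As a hedged sketch in avr-gcc/avr-as syntax (symbol names and register choices are made up, and a real build would normally leave this to the compiler), the two declarations from the question might be expressed with directives and then used like this:
        .section .bss
c:      .skip 1              ; char c;  - one uninitialized byte
d:      .skip 2              ; char *d; - a 16-bit data pointer on AVR

        .section .text
        lds  r24, c          ; r24 = c            (fixed address, via lds/sts)
        lds  r30, d          ; Z low byte  <- low byte of d
        lds  r31, d+1        ; Z high byte <- high byte of d
        ld   r25, Z          ; r25 = *d           (indirect through the Z pair)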
I have a simulator, http://github.com/dwelch67/avriss, but it needs work and is not fully debugged (unless you want to learn the instruction set by examining a simulator and its potential bugs). simavr and some others are out there that you can use to watch your code execute: http://gitorious.org/simavr
I’m trying to debug a memory leak problem. I’m using mtrace() to get a malloc/free/realloc trace. I’ve run my program and now have a huge log file. So far so good. But I have problems interpreting the file. Look at these lines:
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x1502570 0x68
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x1502620 0x30
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x2aaab43a1700 0xa80
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x1501460 0xa64
The strange thing about this is that one call site (same return address) is responsible for 4 allocations.
Even stranger:
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x2aaab43a1700 0xa2c
…
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x2aaab43a1700 0xa80
Between those two lines the block 0x2aaab43a1700 is never being freed.
Does anyone know how to explain this? How could one call result in 4 allocations? And how could malloc return an address which was already allocated previously?
edit 2008/09/30:
The script to analyze the mtrace() output provided by GLIBC (mtrace.pl) isn't of any help here. It will just say: Alloc 0x2aaab43a1700 duplicate. But how could this happen?
You're looking at the direct output of mtrace, which is extremely confusing and counterintuitive. Luckily, there is a perl script (called mtrace, found within glibc-utils) which makes parsing this output much easier.
Compile your build with debugging enabled, and run mtrace like so:
$ gcc -g -o test test.c
$ MALLOC_TRACE=mtrace.out ./test
$ mtrace test mtrace.out
Memory not freed:
-----------------
Address Size Caller
0x094d9378 0x400 at test.c:6
The output should be a lot easier to digest.
The function that is allocating the memory is being called more than once. The caller address points to the code that did the allocation, and that code is simply being run more than once.
Here is an example in C:
#include <stdlib.h>   /* malloc */
#include <mcheck.h>   /* mtrace */

void *allocate(void)
{
    return malloc(1000);
}

int main(void)
{
    mtrace();
    allocate();
    allocate();
    return 0;
}
The output from mtrace is:
Memory not freed:
-----------------
Address Size Caller
0x0000000000601460 0x3e8 at 0x4004f6
0x0000000000601850 0x3e8 at 0x4004f6
Note how the caller address is identical? This is why the mtrace analysing script says they are identical: the same bug is being hit more than once, resulting in several memory leaks.
Compiling with debug flags (-g) is helpful if you can:
Memory not freed:
-----------------
Address Size Caller
0x0000000000601460 0x3e8 at /home/andrjohn/development/playground/test.c:6
0x0000000000601850 0x3e8 at /home/andrjohn/development/playground/test.c:6
One possible explanation is that the same function is allocating buffers of different sizes; one such example is strdup.
For the second question, it is possible that the runtime is allocating some "static" scratch area which is not intended to be freed until the process is terminated. At that point, the OS will clean up after the process anyway.
Think about it this way: in Java, there are no destructors, and no guarantees that finalization will be ever called for any object.
Try running your app under valgrind. It might give you a better view about what is actually being leaked.