I am trying to debug an executable that does not work properly (does not receive segmentation fault, it just doesn't do what he should do) using WinDbg. I would like to see a call stack with all the functions that are called while running the executable. Is this possible in WinDbg or any other debugger?
yes as i commented use wt (watch and trace)
Read the docs
it can be configured in several ways
like only first level calls
upto nth level calls only
only in specific modules
only in main module etc
below is a simple trace of a function in ntdll that crosses um-km boundary
0:000> u . l1
ntdll!LdrpInitializeProcess+0x11bf:
76ff6113 e870fffdff call ntdll!NtQueryInformationProcess (76fd6088)
0:000> bp .+5 //set a bp on return address
0:000> bl
0 e 76ff6118 0001 (0001) 0:**** ntdll!LdrpInitializeProcess+0x11c4
0:000> wt
2 0 [ 0] ntdll!NtQueryInformationProcess
27 0 [ 0] aswhook
1 0 [ 1] aswhook
28 1 [ 0] aswhook
1 0 [ 1] 0x6efc0480
1 0 [ 1] 0x6efc0485
2 0 [ 1] ntdll!NtQueryInformationProcess
2 0 [ 2] ntdll!KiFastSystemCall
1 0 [ 1] ntdll!NtQueryInformationProcess
46 8 [ 0] aswhook
3 0 [ 1] aswhook
Breakpoint 0 hit
eax=00000000 ebx=7ffdf000 ecx=e8cb8789 edx=ffffffff esi=ffffffff edi=00000000
eip=76ff6118 esp=0018f59c ebp=0018f6f4 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!LdrpInitializeProcess+0x11c4:
76ff6118 85c0 test eax,eax
0:000>
I've written two system calls in linux, and I measure both of their resource usages with getrusage within the system call. However, most of the results I get are 0, and I'm not sure if that makes sense. Here's the output:
[ 4103.028728] DELTA RESULTS:
[ 4103.028746] u.tv_sec: 0
[ 4103.028748] s.tv_sec: 0
[ 4103.028749] u.tv_usec: 0
[ 4103.028751] s.tv_usec: 971765
[ 4103.028753] maxrss: 0
[ 4103.028755] ixrss: 0
[ 4103.028756] idrss: 0
[ 4103.028758] isrss: 0
[ 4103.028760] minflt: 0
[ 4103.028761] majflt: 0
[ 4103.028763] nswap: 0
[ 4103.028765] inblock: 0
[ 4103.028766] oublock: 0
[ 4103.028768] msgsnd: 0
[ 4103.028769] msgrcv: 0
[ 4103.028771] nsignals: 0
[ 4103.028773] nvcsw: 199
[ 4103.028774] nivcsw: 5
[ 4103.028961] CONTROL RESULTS:
[ 4103.028966] u.tv_sec: 0
[ 4103.028968] s.tv_sec: 0
[ 4103.028970] u.tv_usec: 1383
[ 4103.028972] s.tv_usec: 971998
[ 4103.028974] maxrss: 2492
[ 4103.028975] ixrss: 0
[ 4103.028977] idrss: 0
[ 4103.028978] isrss: 0
[ 4103.028980] minflt: 75
[ 4103.028982] majflt: 0
[ 4103.028984] nswap: 0
[ 4103.028986] inblock: 24
[ 4103.028987] oublock: 0
[ 4103.028989] msgsnd: 0
[ 4103.028991] msgrcv: 0
[ 4103.028992] nsignals: 0
[ 4103.028994] nvcsw: 200
[ 4103.028996] nivcsw: 5
I just want to know if this output is passable, or if it's a sign that somethings wrong, so I didn't put any of the source code. Thank you!
This looks right; I would not expect the syscall to make any changes in the vast majority of these metrics, which are measuring resources accounted to the process, not kernel resources. You should only see a change if you make a syscall like mmap that allocates new resources to the process, or one like read that ends up storing to previously copy-on-write memory belonging to the process.
With that said, I don't think calling getrusage like this makes much sense at all. It's normally for getting aggregate usage over a process's lifetime, not measuring deltas. Some of the more esoteric things may be hard to measure deltas for in other ways, but just time (real or cpu) can be measued via clock_gettime.
We have a project written in ANSI C. Generally the memory consumption was not a big concern, but now we have a request to fit our program into 256 KB of RAM. I don't have this exact platform on hands, so I compile my project under 32 bit x86 Linux (because it provides enough different tools to evaluate the memory consumption), optimize what I can, remove some features and eventually I have to have the conclusion: what features we need to sacrifice to be able to run on very small systems (if we're able at all). First of all I did a research what exactly a memory size in linux and it seems I have to optimize the RSS size, not VSZ. But in linux even a smallest program which prints "Hello world!" once a second consumes 285-320 KB in RSS:
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
unsigned char cuStopCycle = 0;
void SigIntHandler(int signo)
{
printf("SIGINT received, terminating the program\n");
cuStopCycle = 1;
}
int main()
{
signal( SIGINT, SigIntHandler);
while(!cuStopCycle)
{
printf("Hello, World!\n");
sleep(1);
}
printf("Exiting...\n");
}
user#Ubuntu12-vm:~/tmp/prog_size$ size ./prog_size
text data bss dec hex filename
1456 272 12 1740 6cc ./prog_size
root#Ubuntu12-vm:/home/app# ps -C prog_size -o pid,rss,vsz,args
PID RSS VSZ COMMAND
22348 316 2120 ./prog_size
Obviously this program will perfectly run on small PLCs, with 64KB of RAM. It is just linux loads a lot of libs. I generate a map file for this program and all this data + bss comes from the CRT library. I need to mention that if I add some code to this project - 10,000 times "a = a + b" or manipulate arrays 2000 long int variables, I see the difference in code size, bss size but eventually the RSS size of the process is the same, it doesn't affected)
So I take this as a baseline, the point I want to reach (and which I will never reach, because I need more functions than just print a message once a second).
So here comes my project, where I removed all extra features, removed all auxiliary functions, removed everything except the basic functionality. There are some ways to optimize more, but not that much, what could be removed is already taken away:
root#Ubuntu12-vm:/home/app/workspace/proj_sizeopt/Cmds# ls -l App
-rwxr-xr-x 1 root root 42520 Jul 13 18:33 App
root#Ubuntu12-vm:/home/app/workspace/proj_sizeopt/Cmds# size ./App
text data bss dec hex filename
37027 404 736 38167 9517 ./App
So I have ~36KB of code and ~1KB of data. I do not call malloc inside of my project, I use a shared memory allocation with a wrapper library so I can control how much memory is allocated:
The total memory size allocated is 2052 bytes
Under the hood there are malloc calls obviously, if I substitute 'malloc' calls with my function which summarize all alloc requests I see that ~2.3KB of memory is allocated:
root#Ubuntu12-vm:/home/app/workspace/proj_sizeopt/Cmds# LD_PRELOAD=./override_malloc.so ./App
Malloc allocates 2464 bytes total
Now I run my project amd see that it consumes 600KB of RAM.
root#Ubuntu12-vm:/home/app/workspace/proj_sizeopt# ps -C App -o pid,rss,vsz,args
PID RSS VSZ COMMAND
22093 604 2340 ./App
I do not understand why it eats so much memory. The code size is small. There is not much memory allocated. The size of data is small. Why it takes so much memory? I tried to analyze the mapping of the process:
root#Ubuntu12-vm:/home/app/workspace/proj_sizeopt# pmap -x 22093
22093: ./App
Address Kbytes RSS Dirty Mode Mapping
08048000 0 28 0 r-x-- App
08052000 0 4 4 r---- App
08053000 0 4 4 rw--- App
09e6a000 0 4 4 rw--- [ anon ]
b7553000 0 4 4 rw--- [ anon ]
b7554000 0 48 0 r-x-- libpthread-2.15.so
b756b000 0 4 4 r---- libpthread-2.15.so
b756c000 0 4 4 rw--- libpthread-2.15.so
b756d000 0 8 8 rw--- [ anon ]
b7570000 0 300 0 r-x-- libc-2.15.so
b7714000 0 8 8 r---- libc-2.15.so
b7716000 0 4 4 rw--- libc-2.15.so
b7717000 0 12 12 rw--- [ anon ]
b771a000 0 16 0 r-x-- librt-2.15.so
b7721000 0 4 4 r---- librt-2.15.so
b7722000 0 4 4 rw--- librt-2.15.so
b7731000 0 4 4 rw-s- [ shmid=0x70000c ]
b7732000 0 4 4 rw-s- [ shmid=0x6f800b ]
b7733000 0 4 4 rw-s- [ shmid=0x6f000a ]
b7734000 0 4 4 rw-s- [ shmid=0x6e8009 ]
b7735000 0 12 12 rw--- [ anon ]
b7738000 0 4 0 r-x-- [ anon ]
b7739000 0 104 0 r-x-- ld-2.15.so
b7759000 0 4 4 r---- ld-2.15.so
b775a000 0 4 4 rw--- ld-2.15.so
bfb41000 0 12 12 rw--- [ stack ]
-------- ------- ------- ------- -------
total kB 2336 - - -
And it looks like the program size (in RSS) is only 28KB, the rest is consumed by shared libraries. BTW I do not use posix threads, I do not explicitly link to it, but somehow the linker anyway links this library I have no idea why (this is not really important). If we look at the mapping in more details:
root#Ubuntu12-vm:/home/app/workspace/proj_sizeopt# cat /proc/22093/smaps
08048000-08052000 r-xp 00000000 08:01 344838 /home/app/workspace/proj_sizeopt/Cmds/App
Size: 40 kB
Rss: 28 kB
Pss: 28 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 28 kB
Private_Dirty: 0 kB
Referenced: 28 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
...
09e6a000-09e8b000 rw-p 00000000 00:00 0 [heap]
Size: 132 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 4 kB
Referenced: 4 kB
Anonymous: 4 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
...
b7570000-b7714000 r-xp 00000000 08:01 34450 /lib/i386-linux-gnu/libc-2.15.so
Size: 1680 kB
Rss: 300 kB
Pss: 7 kB
Shared_Clean: 300 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 300 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
...
b7739000-b7759000 r-xp 00000000 08:01 33401 /lib/i386-linux-gnu/ld-2.15.so
Size: 128 kB
Rss: 104 kB
Pss: 3 kB
Shared_Clean: 104 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 104 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
...
bfb41000-bfb62000 rw-p 00000000 00:00 0 [stack]
Size: 136 kB
Rss: 12 kB
Pss: 12 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 12 kB
Referenced: 12 kB
Anonymous: 12 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
So I see that RSS size for my project is 40KB, but only 28 KB is used. Does it mean that this project will fit into 256 KB of RAM?
The heap size is 132KB but only 4 KB is used. Why is that? I'm sure it will be different on the small embedded platform.
The stack is 136KB but only 12KB is used.
GLIBC/LD obviously consume some memory, but what exactly memory will it be on the embedded platform?
I do not look at PSS because it doesn't make any sense in my case, I look only at RSS.
What conclusions can I draw from this picture? How exactly to evaluate memory consumption by the application? Look at the RSS size of the process? Or subtract from this size RSS of all mapped system libraries? What is about heap/stack size?
I would be very grateful for any advises, notes, memory consumption optimizations techniques, DOs and DON'Ts for platforms with extremely small amount of RAM (except obvious - keep amount of data and code to the very minimum).
I also will appreciate an explanation WHY the program with small amount of code and data (and which doesn't allocate much memory) still consumes a lot of RAM in RSS.
Thank you in advance
... fit our program into 256 KB of RAM. I don't have this exact platform on hands, so I compile my project under 32 bit x86 Linux..
And what you now see is that the Linux platform tools make reasonable assumptions of your possible need of stack and heap, given that it nows you run on a big machine, and links-in a reasonable set of library functions for your needs. Some you won't need, but it gives them to you "for free".
To fit in 256 Kb on your target platform, you must compile for your target platform and link with the target platform's libraries (and CRT) using the target platform's linker.
Those will make different assumptions, use possibly smaller linbrary footprints, make smaller assumptions on stack and heap space, etcetera. For example, create "Hello World" for the target platform and check its needs on that target platform. Or use a realistic simulator of the target platform and libraries (and not to forget, OS, whch partly dictates what the libraries must do).
And if it is then still too big, you have to re-write or tweak the whole CRT and all libraries....
the program needs to be compiled/linked with the embedded device in mind.
For best results use a makefile
use the 'rt' library written for the embedded device
use the start.s file, located, via the makefile, where execution begins.
use 'static' in the linker parameters
use the linker parameters to not include any libraries but what is specifically requested.
do not use libraries written for your development machine. Only use libraries written for the embedded device.
do NOT include stdio.h, etc. unless specifically written for the embedded device
do NOT call printf() inside a signal handler.
if possible, do not call printf() at all.
instead write a small char output function and have it perform the output through the uart.
do not use signals, instead use interrupts
the resulting application will not run on your PC., but, once loaded, will run on the 256k device
do not call sleep(), rather write your own function that uses a device timer peripheral, that sets the timer and puts the device into powerdown mode.
the time interrupt needs to bring the device out of the powerdown mode.
in the makefile, specifically set the size of the stack, heap, etc.
have the link step output a .map file. study that map file until you understand everything in it.
use a compiler/linker that is specific for the embedded device
you probably will need to include a function that initializes the peripherals on the device, like the clock, the uart, the timer, the watchdog, and any other built in peripherals that the code actually uses.
you will need a file that allocates the interrupt table, and a small function to handle each of the interrupts, even though most of those functions will do nothing beyond clearing the appropriate interrupt pending flag and returning from the interrupt
you will probably need a function to periodically refresh the watchdog, conditionally, depending on an indication that the main function is still cycling regularily. I.E the main function loop and the initialization function will refresh the watchdog
I'm running redis-server and according to prstat it's utilizing 3721M of memory. When I run info in redis-cli, I get this for memory:
used_memory:8739028
used_memory_human:8.33M
used_memory_rss:8739028
mem_fragmentation_ratio:1.00
This is running on a 2GB cloud instance, so my memory utilization is always over. However, even when I flushall on redis it doesn't seem to effect the memory consumption.
Could this be do to a setting that maybe is allocating memory or any idea why redis is using so much over my actual data?
Thank you!
EDIT:
Also just to add, I'm running the same setup (application/redis) on another instance and the memory consumption is only 554M with used_memory_human:5.68M of data.
Response from pmap:
[root#]# pmap -x 27425
27425: /opt/local/bin/redis-server /opt/local/etc/redis.conf
Address Kbytes RSS Anon Locked Mode Mapped File
08046000 8 8 8 - rw--- [ stack ]
08050000 208 160 - - r-x-- redis-server
08093000 8 8 4 - rwx-- redis-server
08095000 3807836 3804160 3764928 - rwx-- [ heap ]
FE9BD000 12 12 - - r-x-- libpthread.so.1
FEAF0000 456 456 - - r-x-- libnsl.so.1
FEB72000 8 8 4 - rw--- libnsl.so.1
FEB74000 20 16 4 - rw--- libnsl.so.1
FEBA0000 304 304 - - r-x-- libm.so.2
FEBFB000 16 16 - - rwx-- libm.so.2
FEDA0000 4 4 - - r--s- dev:531,392 ino:3736139179
FEDB0000 4 4 - - rwxs- [ anon ]
FEDC0000 24 12 8 - rwx-- [ anon ]
FEDD0000 4 4 4 - rwx-- [ anon ]
FEDE0000 1216 1216 - - r-x-- libc.so.1
FEF10000 36 36 24 - rwx-- libc.so.1
FEF19000 8 8 4 - rwx-- libc.so.1
FEF20000 4 4 4 - rwx-- [ anon ]
FEF30000 56 56 - - r-x-- libsocket.so.1
FEF4E000 4 4 - - rw--- libsocket.so.1
FEF50000 4 4 4 - rwx-- [ anon ]
FEF60000 4 4 - - r-x-- libdl.so.1
FEF70000 4 4 - - r--s- ld.config
FEF80000 4 4 4 - rw--- [ anon ]
FEF90000 4 - - - rw--- [ anon ]
FEFA0000 4 4 4 - rwx-- [ anon ]
FEFB0000 4 4 - - rwx-- [ anon ]
FEFB7000 208 208 - - r-x-- ld.so.1
FEFFB000 8 8 4 - rwx-- ld.so.1
FEFFD000 4 4 - - rwx-- ld.so.1
The limits has nothing enabled in the config, here's virtual memory and advanced:
VIRTUAL MEMORY
vm-enabled no
vm-swap-file /tmp/redis.swap
vm-max-memory 0
vm-page-size 32
vm-pages 134217728
vm-max-threads 4
ADVANCED CONFIG
hash-max-zipmap-entries 512
hash-max-zipmap-value 64
Question is a bit old but I had this issue before. If I remember correctly, I shutdown my redis instance, deleted the RDB file if applicable and restart redis. If don't want to remove the RDB file because it contains data you need, I do NOT recommend this. You can try to DEL what you can, then do a SAVE (I believe redis does SAVE when it's shutdown so that might be redundant) and then restart.
It would be good time to upgrade redis to latest stable if you haven't.
I read in a book (can't recollect the name) that in a 32 bit system text section always starts at 0x0848000. But when I do readelf -S example_executable it does not reflect the same information. Why is that? Do other sections (bss,data,rodata etc) also start at fixed addresses? How can I find the alignment boundary for these sections?
There is a good explanation here of how Linux virtual memory works when allocating storage for a particular program.
ua alberta - linux memory allocation
The designers of the compiler/linker tool chain need to allocate an arbitary address for particular blocks of memory. In order to make it easier for other components of the tool chain like debuggers and profilers they always allocatate the same block to the same addresses. The actual addresses chosen are completely arbitrary.
When the program is loaded the virtual address will be mapped to some random piece of free memory (this is mostly done in hardware). This mapping is done on a per process basis to several programs can address virtual address x'0848000' but be pointed at different "real" memory addresses.
It all depends on the implementation on the particular machine.For a linux machine the behaviour will be different than that from windows machine.
Note however that the virtual memory addresses need to start at some fixed address in order to make life easier for debuggers.However the real addresses will be different depending on the pages available in RAM.
If you look the output of readelf -S more carefully you will notice you will notice subtracting the offset from the address indeed gives you 0x0848000.
As i had mentioned earlier this magic number 0x0848000 will depend on the type of executable format.
here is the output i get on my ubuntu 32 bit machine:
readelf -S ~/a.out
There are 29 section headers, starting at offset 0x1130:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 08048134 000134 000013 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 08048148 000148 000020 00 A 0 0 4
[ 3] .note.gnu.build-i NOTE 08048168 000168 000024 00 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0804818c 00018c 000020 04 A 5 0 4
[ 5] .dynsym DYNSYM 080481ac 0001ac 000050 10 A 6 1 4
[ 6] .dynstr STRTAB 080481fc 0001fc 00004c 00 A 0 0 1
[ 7] .gnu.version VERSYM 08048248 000248 00000a 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 08048254 000254 000020 00 A 6 1 4
[ 9] .rel.dyn REL 08048274 000274 000008 08 A 5 0 4
[10] .rel.plt REL 0804827c 00027c 000018 08 A 5 12 4
[11] .init PROGBITS 08048294 000294 000030 00 AX 0 0 4
[12] .plt PROGBITS 080482c4 0002c4 000040 04 AX 0 0 4
[13] .text PROGBITS 08048310 000310 00018c 00 AX 0 0 16
[14] .fini PROGBITS 0804849c 00049c 00001c 00 AX 0 0 4
There is no consistent address between operating systems and architectures for the text section or any other section. Additionally position independent code and address space layout randomizations make these values even inconsistent on the same machine for some systems.