The following is the result of running pmap on Solaris. It shows two heaps, but in my understanding a process has only one heap: a large contiguous region of memory that can be expanded or shrunk with brk. For anonymous memory, a process can have many regions, managed with mmap/munmap. Is my understanding correct, or am I misreading the pmap output?
sol9# pmap -sx `pgrep testprog`
...
00022000 3960 3960 3960 - 8K rwx-- [ heap ]
00400000 131072 131072 131072 - 4M rwx-- [ heap ]
...
FF390000 8 8 - - 8K r-x-- libc_psr.so.1
FF3B0000 8 8 8 - 8K rwx-- [ anon ]
...
total Kb 135968 135944 135112 -
You are both correct and misreading the pmap output. If you had run pmap -x, the results would probably be less confusing, showing the heap just once; but since you added the -s flag, pmap breaks the heap down into segments with different page sizes.
The addresses starting at 0x00022000 are not aligned properly to be mapped with a 4MB page, so they use 3960KB of 8K pages: 0x00022000 + (3960 * 1024) = 0x00400000.
At 0x00400000 the address is properly aligned for 4MB pages, so the heap switches to using the larger pages, which require fewer page table entries.
If you wanted to ensure that your heap starts at the proper alignment, so that the whole thing uses 4MB pages instead of starting with 8K pages until it reaches an alignment boundary, you would link your program with -M /usr/lib/ld/map.bssalign.
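For example, the link step would look something like this (a sketch; the compiler driver and file name are assumptions, not taken from the question):

cc -o testprog testprog.c -M /usr/lib/ld/map.bssalign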
A slightly more in-depth explanation can be found in the Page Size and Memory Layout blog post from Solaris Application Programming author Darryl Gove.
For instance, let's suppose that instead of buffers growing in the opposite direction of the stack, they grow in the same direction. If I have a character buffer containing the string "Hello world", then instead of 'H' being placed at the lowest address, it is placed at the highest address, and so on.
If an input string copied to a buffer were to overflow, it could not overwrite the return address of the function, but certainly there are other things it could overwrite. My question is: if the input string were long enough, what things could be overwritten? Are there library functions that exist between the heap and the stack that could be overwritten? Can heap variables be overwritten? I assume that variables in the data and bss sections can be overwritten, but is the text segment protected from writes?
The layout of processes in memory varies from system to system. This answer covers Linux under x86_64 processors.
There is a nice article illustrating the memory layout for Linux processes here.
If the buffer is a local variable, then it will be on the stack, along with other local variables. The first thing you are likely to hit if you overflow the buffer is other local variables in the same function.
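As a rough illustration, consider the sketch below (overflowing a buffer is undefined behavior, and the layout of locals is compiler- and optimization-dependent, so the neighboring variable may or may not actually be clobbered):

#include <stdio.h>
#include <string.h>

int main(void) {
    char buf[8];
    int neighbor = 42;

    /* Writing 16 bytes into an 8-byte buffer is undefined behavior;
       on many stack layouts the excess bytes land in adjacent locals. */
    memset(buf, 'A', 16);

    printf("neighbor = %d\n", neighbor);  /* may no longer print 42 */
    return 0;
}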
When you reach the end of the stack, there is a randomly sized offset before the next used segment of memory. If you continue writing into this address space you will trigger a segfault (since that part of the address space has no mapping at all).
Assuming you managed to skip over the random offset without crashing, and continued overwriting, the next thing it might hit is the memory mapping segment. This segment contains file mappings, including those used to map dynamic shared libraries into the address space, and anonymous mappings. The dynamic libraries are going to be read-only, but if the process had any RW mappings in place you could perhaps overwrite data in them.
After this segment comes another random offset before you hit the heap. Again if you tried to write into the address space of the random offset you would trigger a crash.
Below the heap comes another random offset, followed by the BSS, Data and finally text segments. Static variables within BSS and Data could be overwritten. The text segment should not be writable.
You can inspect the memory map of a process using the pmap command.
The answer to your question depends entirely on what operating system is being used, as well as what hardware architecture. The operating system lays out logical memory in a certain fashion, and the architecture sometimes reserves (very low) memory for specific purposes as well.
One thing to understand is that traditional processes can access their entire logical memory space, but very little of this capacity is typically used. The most likely effect of what you describe is that you'll try to access some unallocated memory and you'll get a segfault in response, crashing your program.
That said, you definitely can modify these other segments of memory, but what happens when you do so depends on their read/write permissions. For example, the typical memory layout you learn in school is:
Low memory to high memory:
.text - program code
.data - initialized static variables
.bss - uninitialized static variables
.heap - grows up
memory map segments - dynamic libraries
.stack - grows down
The .text segment is marked read only / executable by default, so if you attempt to write to a .text memory location you'll get a segmentation fault. It's possible to change .text to writeable, but this is in general a terrible idea.
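A minimal way to see this in action (a sketch; the exact signal and behavior depend on the platform and toolchain):

#include <stdio.h>

int main(void) {
    char *p = (char *)main;  /* an address inside the .text segment */
    *p = 0;                  /* .text is mapped read-only/executable, so this
                                typically raises SIGSEGV */
    printf("unreachable on most systems\n");
    return 0;
}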
The .data, .bss, .heap, and .stack segments are all readable/writeable by default, so you can overwrite those sections without any program errors.
The memory map segment(s) all have their own permissions to deal with as well. Some of these segments are writeable, most are not (so writing to them creates segfaults).
The last thing to note is that most modern OSes will randomize the locations of these segments to make things more difficult for hackers. This may introduce gaps between different segments (which will again cause segfaults if you try to access them).
On Linux, you can print out a process' memory map with the command pmap. The following is the output of this program on an instance of vim:
10636: vim hello.text
0000000000400000 2112K r-x-- vim
000000000080f000 4K r---- vim
0000000000810000 88K rw--- vim
0000000000826000 56K rw--- [ anon ]
0000000000851000 2228K rw--- [ anon ]
00007f7df24c6000 8212K r--s- passwd
00007f7df2ccb000 32K r-x-- libnss_sss.so.2
00007f7df2cd3000 2044K ----- libnss_sss.so.2
00007f7df2ed2000 4K r---- libnss_sss.so.2
00007f7df2ed3000 4K rw--- libnss_sss.so.2
00007f7df2ed4000 48K r-x-- libnss_files-2.17.so
00007f7df2ee0000 2044K ----- libnss_files-2.17.so
00007f7df30df000 4K r---- libnss_files-2.17.so
00007f7df30e0000 4K rw--- libnss_files-2.17.so
00007f7df30e1000 24K rw--- [ anon ]
00007f7df30e7000 103580K r---- locale-archive
00007f7df960e000 8K r-x-- libfreebl3.so
00007f7df9610000 2044K ----- libfreebl3.so
00007f7df980f000 4K r---- libfreebl3.so
00007f7df9810000 4K rw--- libfreebl3.so
00007f7df9811000 8K r-x-- libutil-2.17.so
00007f7df9813000 2044K ----- libutil-2.17.so
00007f7df9a12000 4K r---- libutil-2.17.so
00007f7df9a13000 4K rw--- libutil-2.17.so
00007f7df9a14000 32K r-x-- libcrypt-2.17.so
00007f7df9a1c000 2044K ----- libcrypt-2.17.so
00007f7df9c1b000 4K r---- libcrypt-2.17.so
00007f7df9c1c000 4K rw--- libcrypt-2.17.so
00007f7df9c1d000 184K rw--- [ anon ]
00007f7df9c4b000 88K r-x-- libnsl-2.17.so
00007f7df9c61000 2044K ----- libnsl-2.17.so
00007f7df9e60000 4K r---- libnsl-2.17.so
00007f7df9e61000 4K rw--- libnsl-2.17.so
00007f7df9e62000 8K rw--- [ anon ]
00007f7df9e64000 88K r-x-- libresolv-2.17.so
00007f7df9e7a000 2048K ----- libresolv-2.17.so
00007f7dfa07a000 4K r---- libresolv-2.17.so
00007f7dfa07b000 4K rw--- libresolv-2.17.so
00007f7dfa07c000 8K rw--- [ anon ]
00007f7dfa07e000 152K r-x-- libncurses.so.5.9
00007f7dfa0a4000 2044K ----- libncurses.so.5.9
00007f7dfa2a3000 4K r---- libncurses.so.5.9
00007f7dfa2a4000 4K rw--- libncurses.so.5.9
00007f7dfa2a5000 16K r-x-- libattr.so.1.1.0
00007f7dfa2a9000 2044K ----- libattr.so.1.1.0
00007f7dfa4a8000 4K r---- libattr.so.1.1.0
00007f7dfa4a9000 4K rw--- libattr.so.1.1.0
00007f7dfa4aa000 144K r-x-- liblzma.so.5.0.99
00007f7dfa4ce000 2044K ----- liblzma.so.5.0.99
00007f7dfa6cd000 4K r---- liblzma.so.5.0.99
00007f7dfa6ce000 4K rw--- liblzma.so.5.0.99
00007f7dfa6cf000 384K r-x-- libpcre.so.1.2.0
00007f7dfa72f000 2044K ----- libpcre.so.1.2.0
00007f7dfa92e000 4K r---- libpcre.so.1.2.0
00007f7dfa92f000 4K rw--- libpcre.so.1.2.0
00007f7dfa930000 1756K r-x-- libc-2.17.so
00007f7dfaae7000 2048K ----- libc-2.17.so
00007f7dface7000 16K r---- libc-2.17.so
00007f7dfaceb000 8K rw--- libc-2.17.so
00007f7dfaced000 20K rw--- [ anon ]
00007f7dfacf2000 88K r-x-- libpthread-2.17.so
00007f7dfad08000 2048K ----- libpthread-2.17.so
00007f7dfaf08000 4K r---- libpthread-2.17.so
00007f7dfaf09000 4K rw--- libpthread-2.17.so
00007f7dfaf0a000 16K rw--- [ anon ]
00007f7dfaf0e000 1548K r-x-- libperl.so
00007f7dfb091000 2044K ----- libperl.so
00007f7dfb290000 16K r---- libperl.so
00007f7dfb294000 24K rw--- libperl.so
00007f7dfb29a000 4K rw--- [ anon ]
00007f7dfb29b000 12K r-x-- libdl-2.17.so
00007f7dfb29e000 2044K ----- libdl-2.17.so
00007f7dfb49d000 4K r---- libdl-2.17.so
00007f7dfb49e000 4K rw--- libdl-2.17.so
00007f7dfb49f000 20K r-x-- libgpm.so.2.1.0
00007f7dfb4a4000 2048K ----- libgpm.so.2.1.0
00007f7dfb6a4000 4K r---- libgpm.so.2.1.0
00007f7dfb6a5000 4K rw--- libgpm.so.2.1.0
00007f7dfb6a6000 28K r-x-- libacl.so.1.1.0
00007f7dfb6ad000 2048K ----- libacl.so.1.1.0
00007f7dfb8ad000 4K r---- libacl.so.1.1.0
00007f7dfb8ae000 4K rw--- libacl.so.1.1.0
00007f7dfb8af000 148K r-x-- libtinfo.so.5.9
00007f7dfb8d4000 2048K ----- libtinfo.so.5.9
00007f7dfbad4000 16K r---- libtinfo.so.5.9
00007f7dfbad8000 4K rw--- libtinfo.so.5.9
00007f7dfbad9000 132K r-x-- libselinux.so.1
00007f7dfbafa000 2048K ----- libselinux.so.1
00007f7dfbcfa000 4K r---- libselinux.so.1
00007f7dfbcfb000 4K rw--- libselinux.so.1
00007f7dfbcfc000 8K rw--- [ anon ]
00007f7dfbcfe000 1028K r-x-- libm-2.17.so
00007f7dfbdff000 2044K ----- libm-2.17.so
00007f7dfbffe000 4K r---- libm-2.17.so
00007f7dfbfff000 4K rw--- libm-2.17.so
00007f7dfc000000 132K r-x-- ld-2.17.so
00007f7dfc1f8000 40K rw--- [ anon ]
00007f7dfc220000 4K rw--- [ anon ]
00007f7dfc221000 4K r---- ld-2.17.so
00007f7dfc222000 4K rw--- ld-2.17.so
00007f7dfc223000 4K rw--- [ anon ]
00007ffcb46e7000 132K rw--- [ stack ]
00007ffcb475f000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 163772K
The segment starting at 0x851000 is actually the start of the heap (which pmap will tell you with more verbose reporting modes, but the more verbose mode didn't fit).
I think your question reflects a fundamental misunderstanding of how things work in an operating system. Things like "buffers" and "stack" tend not to be defined by the operating system.
The operating system divides memory into kernel and user areas (and some systems have additional, protected areas).
The layout of the user area is usually defined by the linker. The linker creates executables that instruct the loader how to set up the address space. Various linkers have different levels of control. Generally, the default linker settings group the sections of program as something like:
- Read/execute
- Read/no execute
- Read/write/initialized
- Read/write/demand zero
On some linkers you can create multiple program sections with these attributes.
You ask:
"If I have a character buffer containing the string "Hello world", instead of 'H' being placed at the lowest address, it is placed at the highest address, and so on."
In a von Neumann machine, memory is independent of its usage. The same memory block can simultaneously be interpreted as a string, floating point, integer, or instruction. You can put your letters in any order you want, but most software libraries will not recognize them in reverse order. If your own libraries can handle strings stored backwards, knock yourself out.
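To make that concrete, here is a sketch of storing a string "backwards" so its first character sits at the highest address (store_backwards is a hypothetical helper, not a library function):

#include <stdio.h>
#include <string.h>

/* Store s so that its first character sits at the HIGHEST address of
   buf (of size n), mimicking the reversed-growth hypothetical. */
static void store_backwards(char *buf, size_t n, const char *s) {
    size_t len = strlen(s);
    for (size_t i = 0; i < len && i < n; i++)
        buf[n - 1 - i] = s[i];
}

int main(void) {
    char buf[16] = {0};
    store_backwards(buf, sizeof buf, "Hello world");
    /* Read it back by walking downward from the top. */
    for (size_t i = 0; i < sizeof buf; i++) {
        char c = buf[sizeof buf - 1 - i];
        putchar(c ? c : '.');
    }
    putchar('\n');
    return 0;
}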
"My question is -- if the input string was long enough, what things could be overwritten?"
It could be anything.
"Are there library functions that exist between the heap and the stack that could be overwritten?"
That depends upon what your linker did.
"Can heap variables be overwritten?"
The heap can be overwritten.
"I assume that variables in the data and bss sections can be overwritten, but is the text segment protected from writes?
Generally, yes.
I have a question about setting the stack size of pthread using pthread_attr_setstacksize():
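For context, the stack size was set roughly like this (a minimal sketch; the thread body and error handling are assumptions):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    /* ... thread work that may need a deep stack ... */
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* Request a 5 MiB stack (use 8 * 1024 * 1024 for the 8 MiB case). */
    int rc = pthread_attr_setstacksize(&attr, 5 * 1024 * 1024);
    if (rc != 0)
        fprintf(stderr, "pthread_attr_setstacksize: %d\n", rc);

    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}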
From my understanding, the stack of a pthread lies in an anonymous mmapped region of its creating process. When I set the thread's stack size to 5M and 8M respectively, I see that it does affect the size of the mmapped region, but both use (almost) the same amount of physical memory:
Partial result of the pmap command [stack with size 5M]:
00007f97f8b52000 7172K rw--- [ anon ]
Partial result of the pmap command [stack with size 8M]:
00007f8784606000 10244K rw--- [ anon ]
Partial result of the top command [stack with size 5M]:
VIRT RES SWAP USED
25160 7236 0 7236
Partial result of the top command [stack with size 8M]:
VIRT RES SWAP USED
22088 7196 0 7196
In my program, I want to use a larger stack size to prevent a stack overflow; what I want to confirm here is that by using a large stack size I will not consume more physical memory, just a larger virtual address range. Is this correct?
If you need a larger stack size to prevent overflow, that implies at some point you'll actually be using the larger size (i.e., your stack will be deeper than the default would allow).
In that case, there's some point where your program would have crashed with the default stack size, where it instead has another page allocated to its address space. So, in some sense, it could use more physical memory.
How many of the pages allocated to your process actually reside in memory at one time, however, depends on your OS, memory pressure, other processes, etc.
Edit: I updated my question with the details of my benchmark
For benchmarking purposes, I am trying to setup 1GB pages in a Linux 3.13 system running on top of two Intel Xeon 56xx ("Westmere") processors. For that I modified my boot parameters to add support for 1GB pages (10 pages). These boot parameters only contain 1GB pages and not 2MB ones. Running hugeadm --pool-list leads to:
Size Minimum Current Maximum Default
1073741824 10 10 10 *
My kernel boot parameters are taken into account. In my benchmark I am allocating 1GiB of memory that I want to be backed by a 1GiB huge page using:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define PROTECTION (PROT_READ | PROT_WRITE)
#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)

uint64_t size = 1UL * 1024 * 1024 * 1024;
void *memory = mmap(0, size, PROTECTION, FLAGS, -1, 0); /* fd is -1 for anonymous mappings */
if (memory == MAP_FAILED) {
    perror("mmap");
    exit(1);
}
sleep(200);
Looking at /proc/meminfo while the bench is sleeping (the sleep call above), we can see that one huge page has been allocated:
AnonHugePages: 4096 kB
HugePages_Total: 10
HugePages_Free: 9
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Note: I disabled THP (through the /sys file system) before running the bench, so I guess the AnonHugePages field reported by /proc/meminfo represents the huge pages allocated by THP before stopping it.
At this point we might think that all is fine, but unfortunately my bench leads me to believe that many 2MiB pages are being used rather than one 1GiB page. Here is the explanation:
This bench randomly accesses the allocated memory through pointer chasing: a first step fills the memory to enable pointer chasing (each cell points to another cell), and in a second step the bench navigates through the memory using
pointer = *pointer;
Using the perf_event_open system call, I am counting data TLB read misses for the second step of the bench only. When the allocated memory size is 64MiB, I count a very small number of data TLB read misses: 0.01% of my 6400000 memory accesses. In other words, 64MiB of memory can be kept in the TLB. As soon as the allocated memory size is greater than 64MiB, I see data TLB read misses. For a memory size equal to 128MiB, 50% of my 6400000 memory accesses miss in the TLB. 64MiB appears to be the size that can fit in the TLB, and 64MiB = 32 entries (as reported below) * 2MiB pages. I conclude that I am not using 1GiB pages but 2MiB ones.
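For illustration, the two steps look roughly like this (a sketch, not the actual bench; the single-cycle permutation code is an assumption):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t n = (64UL * 1024 * 1024) / sizeof(void *);   /* 64 MiB of cells */
    void **cells = malloc(n * sizeof(void *));
    size_t *idx = malloc(n * sizeof(size_t));
    if (!cells || !idx) { perror("malloc"); return 1; }

    /* Step 1: build a random single-cycle permutation (Sattolo's algorithm)
       and link each cell to the next, so every access is a random one. */
    for (size_t i = 0; i < n; i++) idx[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = rand() % i;                          /* j in [0, i) */
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
    for (size_t i = 0; i + 1 < n; i++) cells[idx[i]] = &cells[idx[i + 1]];
    cells[idx[n - 1]] = &cells[idx[0]];

    /* Step 2: chase pointers; TLB misses are counted around this loop. */
    void **pointer = &cells[0];
    for (long i = 0; i < 6400000; i++) pointer = (void **)*pointer;

    printf("%p\n", (void *)pointer);                    /* keep the chase live */
    free(cells); free(idx);
    return 0;
}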
Can you see any explanation for this behavior?
Moreover, the cpuid tool, reports the following about the tlb on my system:
cache and TLB information (2):
0x5a: data TLB: 2M/4M pages, 4-way, 32 entries
0x03: data TLB: 4K pages, 4-way, 64 entries
0x55: instruction TLB: 2M/4M pages, fully, 7 entries
0xb0: instruction TLB: 4K, 4-way, 128 entries
0xca: L2 TLB: 4K, 4-way, 512 entries
L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx):
L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax):
L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx):
As you can see, there is no information about 1GiB pages. How many such pages can be cached in the TLB ?
TL;DR
You (specifically, your processor) cannot benefit from 1GB pages in this scenario, but your code is correct without modifications on systems that can.
Long version
I followed these steps to attempt to reproduce your problem.
My System: Intel Core i7-4700MQ, 32GB RAM 1600MHz, Chipset H87
svn co https://github.com/ManuelSelva/c4fun.git
cd c4fun.git/trunk
make. Discovered a few dependencies were needed and installed them. The build failed, but mem_load did build and link, so I did not pursue the rest further.
Rebooted the system, appending at GRUB time to the boot arguments the following:
hugepagesz=1G hugepages=10 default_hugepagesz=1G
which reserves 10 1GB pages.
cd c4fun.git/trunk/mem_load
Ran several tests using mem_load in random-access pattern mode, pinning it to core 3, which is something other than 0 (the bootstrap processor).
./mem_load -a rand -c 3 -m 1073741824 -i 1048576
This resulted in approximately nil TLB misses.
./mem_load -a rand -c 3 -m 10737418240 -i 1048576
This resulted in approximately 60% TLB misses. On a hunch I did
./mem_load -a rand -c 3 -m 4294967296 -i 1048576
This resulted in approximately nil TLB misses. On a hunch I did
./mem_load -a rand -c 3 -m 5368709120 -i 1048576
This resulted in approximately 20% TLB misses.
At this point I downloaded the cpuid utility. It gave me this for cpuid -1 | grep -i tlb:
cache and TLB information (2):
0x63: data TLB: 1G pages, 4-way, 4 entries
0x03: data TLB: 4K pages, 4-way, 64 entries
0x76: instruction TLB: 2M/4M pages, fully, 8 entries
0xb5: instruction TLB: 4K, 8-way, 64 entries
0xc1: L2 TLB: 4K/2M pages, 8-way, 1024 entries
L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx):
L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax):
L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx):
As you can see, my TLB has 4 entries for 1GB pages. This explains my results well: for 1GB and 4GB arenas, the 4 slots of the TLB are entirely sufficient to satisfy all accesses. For 5GB arenas in random-access pattern mode, only 4 of the 5 pages can be mapped through the TLB, so chasing a pointer into the remaining one will cause a miss. The probability of chasing a pointer into the unmapped page is 1/5, so we expect a miss rate of 1/5 = 20%, and we get that. For 10GB, 4/10 pages are mapped and 6/10 aren't, so the miss rate will be 6/10 = 60%, and we got that.
So your code works without modifications on my system at least. Your code does not appear to be problematic then.
I then did some research on CPU-World, and while not all CPUs are listed with TLB geometry data, some are. The only one I saw that matched your cpuid printout exactly (there could be more) is the Xeon Westmere-EP X5650; CPU-World does not explicitly say that the Data TLB0 has entries for 1GB pages, but does say the processor has "1 GB large page support".
I then did more research and finally nailed it. An author at RealWorldTech makes an off-hand comment (admittedly, I have yet to find a source for this) in a discussion of the memory subsystem of Sandy Bridge. It reads as follows:
After address generation, uops will access the DTLB to translate from a virtual to a physical address, in parallel with the start of the cache access. The DTLB was mostly kept the same, but the support for 1GB pages has improved. Previously, Westmere added support for 1GB pages, but fragmented 1GB pages into many 2MB pages since the TLB did not have any 1GB page entries. Sandy Bridge adds 4 dedicated entries for 1GB pages in the DTLB.
(Emphasis added)
Conclusion
Whatever nebulous concept "CPU supports 1GB pages" represents, Intel thinks it does not imply "TLB supports 1GB page entries". I'm afraid that you will not be able to use 1GB pages on an Intel Westmere processor to reduce the number of TLB misses.
That, or Intel is hoodwinking us by distinguishing huge pages (in the TLB) from large pages.
I was looking at where the address ranges of the stack, heap, and shared libraries start. I see two entries for the shared library (which I created) and for a.out, and three entries for ld and libc; the rest are the starting addresses of the anonymous and stack regions.
kg>pmap 24545
24545: ./a.out
003d3000 4K r-x-- [ anon ]
004d9000 4K r-x-- /home/trng3/sh/POC/libfile_sys.so
004da000 4K rwx-- /home/trng3/sh/POC/libfile_sys.so
08048000 4K r-x-- /home/trng3/sh/POC/a.out
08049000 4K rwx-- /home/trng3/sh/POC/a.out
46f46000 100K r-x-- /lib/ld-2.5.so
46f5f000 4K r-x-- /lib/ld-2.5.so
46f60000 4K rwx-- /lib/ld-2.5.so
46f68000 1244K r-x-- /lib/libc-2.5.so
4709f000 8K r-x-- /lib/libc-2.5.so
470a1000 4K rwx-- /lib/libc-2.5.so
470a2000 12K rwx-- [ anon ]
b7f8a000 4K rw--- [ anon ]
b7fa1000 4K rw-s- /dev/zero (deleted)
b7fa2000 8K rw--- [ anon ]
bfc0f000 84K rw--- [ stack ]
Why is it that we have two copies instead of one? Is one from the disk and the other one currently in memory? What is the purpose of having two copies of the same data in memory?
They are not multiple copies, they are just different segments with different permissions. Look at the executable:
08048000 4K r-x-- /home/trng3/sh/POC/a.out
08049000 4K rwx-- /home/trng3/sh/POC/a.out
You can see that the first mapping has r-x permissions and the second mapping has rwx permissions. Ordinarily, the second mapping would have rw permissions, but maybe your processor isn't capable of setting no-execute permissions, maybe the feature is turned off, maybe the program was compiled with an executable data segment, or maybe the processor doesn't have the required granularity.
I think i386 without PAE has very coarse granularity for the NX-bit, so that might explain why the data segments are executable but the stack isn't.
46f46000 100K r-x-- /lib/ld-2.5.so
46f5f000 4K r-x-- /lib/ld-2.5.so
46f5f000 - 46f46000 = 25 * 4K = 100K. It's the last segment of the file. I still can't explain why, but I found this.
I have an application I have been trying to make "memory leak free". I have been through solid testing on Linux using TotalView's MemoryScape and no leaks were found. I have ported the application to Solaris (SPARC) and there is a leak I am trying to find...
I have used libumem on Solaris and it seems to me like it also picks up NO leaks...
Here is my startup command:
LD_PRELOAD=libumem.so UMEM_DEBUG=audit ./link_outbound config.ini
Then I immediately checked prstat on Solaris to see what the startup memory usage was:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
9471 root 44M 25M sleep 59 0 0:00:00 1.1% link_outbou/3
Then I started to send thousands of messages to the application... and over time the prstat numbers grew:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
9471 root 48M 29M sleep 59 0 0:00:36 3.5% link_outbou/3
And just before I eventually stopped it:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
9471 root 48M 48M sleep 59 0 0:01:05 5.3% link_outbou/3
Now the interesting part: here is what libumem reports for this application that is showing 48 MB of memory:
pgrep link
9471
# gcore 9471
gcore: core.9471 dumped
# mdb core.9471
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::findleaks
BYTES LEAKED VMEM_SEG CALLER
131072 7 ffffffff79f00000 MMAP
57344 1 ffffffff7d672000 MMAP
24576 1 ffffffff7acf0000 MMAP
458752 1 ffffffff7ac80000 MMAP
24576 1 ffffffff7a320000 MMAP
131072 1 ffffffff7a300000 MMAP
24576 1 ffffffff79f20000 MMAP
------------------------------------------------------------------------
Total 7 oversized leaks, 851968 bytes
CACHE LEAKED BUFCTL CALLER
----------------------------------------------------------------------
Total 0 buffers, 0 bytes
>
The "7 oversized leaks, 851968 bytes" never changes if I send 10 messages through the application or 10000 messages...it is always "7 oversized leaks, 851968 bytes". Does that mean that the application is not leaking according to "libumem"?
What is so frustrating is that on Linux the memory stays constant and never changes... yet on Solaris I see this slow but steady growth.
Any idea what this means? Am I using libumem correctly? What could be causing prstat to show memory growth here?
Any help on this would be greatly appreciated....thanks a million.
If the SIZE column doesn't grow, you're not leaking.
RSS (resident set size) is how much of that memory you are actively using, it's normal that that value changes over time. If you were leaking, SIZE would grow over time (and RSS could stay constant, or even shrink).
Check out this page.
The preferred options are UMEM_DEBUG=default, UMEM_LOGGING=transaction, and LD_PRELOAD=libumem.so.1. Those are the options I use for debugging Solaris memory leak problems, and they work fine for me.
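Combined with the startup command from the question, the invocation would look something like:

UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 ./link_outbound config.ini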
Based on my experience with RedHat REL version 5 and Solaris SunOS 5.9/5.10, a Linux process's memory footprint doesn't increase gradually; instead, it seems to grab a large chunk of memory when it needs extra and use it for a long run (purely based on observation; I haven't done any research on its memory allocation mechanism). So you should send a lot more data (10K messages is not big).
You can try the dtrace tool to check the memory problem on Solaris.
Jack