I am involved in cross-platform software development in an embedded Linux environment. I see that my application runs very slowly once it starts using 30+ MB of memory.
I have followed two different approaches, but both end up with the same result.
Approach 1
Allocate using valloc (aligned memory is required). I kept a count of the memory allocated; after reaching 30 MB the application slows down.
Approach 2
Allocate a large amount of memory (40 MB) at initialization. Further allocations are done from this partition (nothing is ever freed throughout the program's execution). Again the application slows down once 30+ MB is in use; with anything less than 30 MB the application runs fine.
PS: I could use this approach because uniform blocks of memory were being allocated. Total available memory is 128 MB.
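Roughly, Approach 2 looks like the sketch below (simplified; arena_init/arena_alloc and the exact details are illustrative, not my real code):
#include <stddef.h>
#include <stdlib.h>

#define POOL_SIZE (40u * 1024u * 1024u)   /* the 40 MB partition reserved at startup */

static unsigned char *pool;     /* base of the pre-allocated partition */
static size_t         used;     /* bump pointer; nothing is ever freed */

/* Reserve the whole partition once at initialization. */
static int arena_init(void)
{
    pool = malloc(POOL_SIZE);   /* the real code uses valloc() for page alignment */
    used = 0;
    return pool ? 0 : -1;
}

/* Hand out blocks from the partition; align must be a power of two. */
static void *arena_alloc(size_t size, size_t align)
{
    size_t start = (used + align - 1) & ~(align - 1);
    if (start + size > POOL_SIZE)
        return NULL;            /* partition exhausted */
    used = start + size;
    return pool + start;
}

int main(void)
{
    if (arena_init() != 0)
        return 1;
    int *block = arena_alloc(1024 * sizeof(int), sizeof(int));  /* example block */
    return block ? 0 : 1;
}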
I am wondering why my application slows down upon accessing the memory, even though allocation was successful.
# more /proc/meminfo
MemTotal: 92468 kB
MemFree: 50812 kB
Buffers: 0 kB
Cached: 22684 kB
SwapCached: 0 kB
Active: 4716 kB
Inactive: 18540 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 92468 kB
LowFree: 50812 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 600 kB
Mapped: 952 kB
Slab: 7256 kB
PageTables: 108 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 46232 kB
Committed_AS: 1736 kB
VmallocTotal: 262136 kB
VmallocUsed: 10928 kB
VmallocChunk: 246812 kB
Total available memory is 128MB
How sure are you of that number? If it's the total system memory, some amount will be used up by the kernel, drivers, and other processes. It could be that after allocating 30MB of memory, you start swapping to disk. That would certainly explain a sudden slowness.
I am wondering why my application slows down upon accessing the memory, even though allocation was successful.
Possibly because Linux won't actually back the virtual allocation with physical pages until you write to them. So although your malloc() succeeds, it's only really updating the page tables until such time as you actually use that memory.
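You can see this happen by counting minor page faults around the first write to the block. A rough sketch (using getrusage(); not tested on your particular target):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

int main(void)
{
    struct rusage before, after;
    size_t size = 30 * 1024 * 1024;      /* 30 MB */

    char *p = malloc(size);              /* succeeds, but no physical pages are committed yet */
    if (!p)
        return 1;

    getrusage(RUSAGE_SELF, &before);
    memset(p, 0, size);                  /* first write faults the pages in */
    getrusage(RUSAGE_SELF, &after);

    printf("minor faults during memset: %ld\n",
           after.ru_minflt - before.ru_minflt);
    free(p);
    return 0;
}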
Looking at your meminfo dump above, I think you have around 50MB free on the system:
MemFree: 50812 kB
Furthermore, around 22MB are in use as cache:
Cached: 22684 kB
I wonder whether your app using over 30MB of memory might push the kernel's VM to the point that it decides to start freeing up cached data. If that should happen you might expect a slowdown if e.g. filesystem buffers that you might have been using get flushed from the cache.
I note you don't have swap enabled (SwapTotal is 0kB). If you did, your app could be causing the VM to thrash.
If I were you trying to debug this, I would try running "top" as my app hit the 30MB memory-use point and see whether any kernel threads are suddenly getting busier. I would also use "vmstat" to track system I/O and memory cache/buffer allocations. Finally, I'd try having a bit of a poke around the /proc filesystem to see if you can glean anything there (for example, /proc/sys/vm might be worth looking at).
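If watching top on the target is awkward, you could also have the app itself log MemFree and Cached from /proc/meminfo as it crosses the 30 MB mark. A rough sketch (the helper name is just for illustration):
#include <stdio.h>
#include <string.h>

/* Print the MemFree: and Cached: lines from /proc/meminfo. */
void log_meminfo(void)
{
    char line[128];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "MemFree:", 8) == 0 ||
            strncmp(line, "Cached:", 7) == 0)
            fputs(line, stdout);
    }
    fclose(f);
}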
Related
Hi, I need your help again: I need to analyze the smaps of a C program.
Check this out:
The other lines provide a number of extra details, including:
The mapping size (Size)
The proportional share of this mapping that belongs to the process (Pss)
The amount of the mapping that currently sits in RAM (Rss)
The amount of memory currently marked as referenced or accessed (Referenced)
How many dirty and clean private pages are in the mapping (Private_Dirty and Private_Clean respectively). Note: even if a page is part of a MAP_SHARED mapping, if it has only one pte mapped, it is counted as private (not as shared).
The amount of memory that does not belong to any file (Anonymous)
The amount of would-be-anonymous memory that is used but stored on swap (Swap)
And this is a typical smaps entry:
Size: 4 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 4 kB
Private_Dirty: 0 kB
Referenced: 4 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
VmFlags: rd mr mw me dw sd
So I just want to know how much memory the process takes. As I understand it, I need to sum the Pss (the actual physical memory allocated) with the Rss (the virtual memory utilized). Am I wrong?
I am working on very large datasets. I am trying to allocate 16 GB in one single array.
For some reason I don't know, if I try to access position (let's say) 600 million, I find that the position isn't accessible and I get a segmentation fault at run time.
Does anyone know why this happens?
My architecture is 64-bit, therefore it should be possible to address 16 billion addresses, or at least that is what I think.
My call is:
int* array = (int*) malloc(sizeof(int)* 1000000000 * 4);
Thank you all!
@ScottChamberlain, @Sanhadrin: it fails silently because it doesn't return a NULL pointer. As you may have noticed, this array represents a matrix. Before allocating it this way, I tried to allocate it with a pointer to pointers; this required more space in memory (8 billion bytes more) in order to store the address of each pointer. That way my program got killed, while now it doesn't, but when I try to access some addresses I get a segmentation fault.
Edit: if I allocate 10 blocks of 160 million (or even more) I don't get any error and the memory is allocated. The problem is in allocating one big block. My question now becomes: is there a way to overcome this limit?
Edit 2: @Sanhadrin, your hypotheses are all correct, except for the fact that I use gcc.
I am reporting here the contents of the /proc/meminfo file:
MemTotal: 198049828 kB
MemFree: 113419800 kB
Buffers: 153064 kB
Cached: 5689680 kB
SwapCached: 124780 kB
Active: 73880720 kB
Inactive: 8998084 kB
Active(anon): 70843644 kB
Inactive(anon): 6192548 kB
Active(file): 3037076 kB
Inactive(file): 2805536 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 201273340 kB
SwapFree: 164734524 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 76915376 kB
Mapped: 16376 kB
Shmem: 72 kB
Slab: 190352 kB
SReclaimable: 124660 kB
SUnreclaim: 65692 kB
KernelStack: 3432 kB
PageTables: 259828 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 300298252 kB
Committed_AS: 160461824 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 733424 kB
VmallocChunk: 34258351392 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 195520 kB
DirectMap2M: 37507072 kB
DirectMap1G: 163577856 kB
The limited information makes this difficult to answer for sure, but it seems:
You're on an x86-64 architecture
You may be running Linux
You have at least 16GB of RAM, but it's uncertain whether you have over 16GB of free RAM
You're compiling with gcc with settings configured to create a 64-bit binary
Your calls to malloc() are returning a seemingly valid (non-null) pointer
Indexing into that memory may result in a segmentation fault
If you read malloc's man page, it says:
Notes
By default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available. In case it turns out that the system is out of memory, one or more processes will be killed by the OOM killer. For more information, see the description of /proc/sys/vm/overcommit_memory and /proc/sys/vm/oom_adj in proc(5), and the Linux kernel source file Documentation/vm/overcommit-accounting.
Following up by reading man proc, it states:
/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode.
Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
In mode 0, calls of mmap(2) with MAP_NORESERVE are not
checked, and the default check is very weak, leading to the
risk of getting a process "OOM-killed". Under Linux 2.4, any
nonzero value implies mode 1.
So, depending on the setting of overcommit_memory, malloc() may return a valid pointer even when the requested space is not available, in the belief that by the time you use that much memory, other processes will have been terminated, freeing up the needed space. That's not the case here because you're using it immediately - meaning you actually don't have 16GB of free space to work with in the first place. Further:
In mode 2 (available since Linux 2.6), the total virtual
address space that can be allocated (CommitLimit in
/proc/meminfo) is calculated as
CommitLimit = (total_RAM - total_huge_TLB) *
overcommit_ratio / 100 + total_swap
where:
* total_RAM is the total amount of RAM on the system;
* total_huge_TLB is the amount of memory set aside for
huge pages;
* overcommit_ratio is the value in
/proc/sys/vm/overcommit_ratio; and
* total_swap is the amount of swap space.
For example, on a system with 16GB of physical RAM, 16GB of
swap, no space dedicated to huge pages, and an
overcommit_ratio of 50, this formula yields a CommitLimit of
24GB.
Since Linux 3.14, if the value in
/proc/sys/vm/overcommit_kbytes is nonzero, then CommitLimit is
instead calculated as:
CommitLimit = overcommit_kbytes + total_swap
So, at the very least, you can do a better job of preventing it from overcommitting, so that malloc() fails as expected - but the underlying issue is that you're asking for an extremely large amount of space that you seemingly don't have. You can check /proc/meminfo to see how much is actually free at any one time, and other memory statistics, to see what the issue is and what your real limits are.
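To make concrete where the failure actually surfaces, here is a sketch (sizes taken from your malloc() call; this is an illustration, not a fix):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t n = (size_t)1000000000 * 4;            /* 4 billion ints, as in the question */
    int *array = malloc(n * sizeof *array);       /* 16 GB request */

    if (array == NULL) {          /* with overcommit on, this check may never trip */
        perror("malloc");
        return 1;
    }

    /* The commitment only happens here, page by page, as memory is written.
       If the space isn't really there, this is where the process gets killed
       (or faults) rather than malloc() returning NULL. */
    memset(array, 0, n * sizeof *array);

    free(array);
    return 0;
}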
If you need to allocate memory blocks of very large sizes, you should use your operating system's services to map virtual memory into the process, not malloc().
If you have any hope of allocating 16GB on your system, you need to do it that way.
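On Linux that would look roughly like the following (a sketch with minimal error handling; note that physical pages are still only committed when first touched):
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t bytes = (size_t)16 * 1024 * 1024 * 1024;      /* 16 GiB */

    /* Ask the kernel for the mapping directly rather than via malloc(). */
    int *array = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (array == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    array[600000000] = 42;       /* pages are still faulted in on first touch */

    munmap(array, bytes);
    return 0;
}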
On Unix systems, can I find the physical memory address for a given virtual memory address? If yes, how?
The real problem I'm trying to solve is, how can I find out if the OS maps two virtual addresses to the exact same physical region?
E.g. in the below smaps example, how do I know if both memory regions are, in fact, physically identical?
cat /proc/<pid>/smaps
...
7f7165d42000-7f7265d42000 r--p 00000000 00:14 641846 /run/shm/test (deleted)
Size: 4194304 kB
Rss: 4194304 kB
Pss: 2097152 kB
...
VmFlags: rd mr mw me nr sd
7f7265d42000-7f7365d42000 rw-s 00000000 00:14 641846 /run/shm/test
Size: 4194304 kB
Rss: 4194304 kB
Pss: 2097152 kB
...
VmFlags: rd wr sh mr mw me ms sd
...
Bonus: is there a way to simply do it programmatically in C?
I tried to look for duplicates but could not find a pertinent one.
On Linux you can do it by parsing files in /proc/<pid>, namely, maps and pagemap. There is a little user-space tool that does it for you here.
Compile it (no special options are needed), run page-types -p <pid> -l -N, find the virtual page address in the first column, read the physical address in the second column.
It should be straightforward to turn this into a library and use it programmatically. Bear in mind that some operations of the utility require root access (such as reading /proc/kpageflags); however, none is needed for this task.
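If you want to skip the tool and do it directly in C, the core is just reading the 64-bit pagemap entry for the page containing your address. A rough sketch based on the format in Documentation/admin-guide/mm/pagemap.rst (bit 63 = present, bits 0-54 = PFN); note that on newer kernels the PFN field reads as zero without CAP_SYS_ADMIN:
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Translate a virtual address in the current process to a physical address.
   Returns 0 on failure. */
uint64_t virt_to_phys(uintptr_t vaddr)
{
    long page = sysconf(_SC_PAGESIZE);
    uint64_t entry = 0;
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return 0;

    off_t offset = (off_t)(vaddr / page) * sizeof entry;
    ssize_t got = pread(fd, &entry, sizeof entry, offset);
    close(fd);
    if (got != (ssize_t)sizeof entry)
        return 0;

    if (!(entry & (1ULL << 63)))                 /* bit 63: page present in RAM */
        return 0;
    uint64_t pfn = entry & ((1ULL << 55) - 1);   /* bits 0-54: page frame number */
    return pfn * (uint64_t)page + (vaddr % page);
}

int main(void)
{
    int x = 42;                                  /* touch it so it is resident */
    printf("virt %p -> phys 0x%llx\n", (void *)&x,
           (unsigned long long)virt_to_phys((uintptr_t)&x));
    return 0;
}
If two virtual addresses resolve to the same physical address here, they are backed by the same physical page.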
I have an application that I have been trying to get "memory-leak free"; I have been through solid testing on Linux using TotalView's MemoryScape and found no leaks. I have ported the application to Solaris (SPARC) and there is a leak I am trying to find...
I have used libumem on Solaris and it seems to me like it also picks up NO leaks...
Here is my startup command:
LD_PRELOAD=libumem.so UMEM_DEBUG=audit ./link_outbound config.ini
Then I immediately checked prstat on Solaris to see what the startup memory usage was:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
9471 root 44M 25M sleep 59 0 0:00:00 1.1% link_outbou/3
Then I started to send thousands of messages to the application... and over time the size reported by prstat grew:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
9471 root 48M 29M sleep 59 0 0:00:36 3.5% link_outbou/3
And just before I eventually stopped it:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
9471 root 48M 48M sleep 59 0 0:01:05 5.3% link_outbou/3
Now the interesting part: when I use libumem on this application that is showing 48 MB of memory, it reports the following:
pgrep link
9471
# gcore 9471
gcore: core.9471 dumped
# mdb core.9471
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::findleaks
BYTES LEAKED VMEM_SEG CALLER
131072 7 ffffffff79f00000 MMAP
57344 1 ffffffff7d672000 MMAP
24576 1 ffffffff7acf0000 MMAP
458752 1 ffffffff7ac80000 MMAP
24576 1 ffffffff7a320000 MMAP
131072 1 ffffffff7a300000 MMAP
24576 1 ffffffff79f20000 MMAP
------------------------------------------------------------------------
Total 7 oversized leaks, 851968 bytes
CACHE LEAKED BUFCTL CALLER
----------------------------------------------------------------------
Total 0 buffers, 0 bytes
>
The "7 oversized leaks, 851968 bytes" never changes if I send 10 messages through the application or 10000 messages...it is always "7 oversized leaks, 851968 bytes". Does that mean that the application is not leaking according to "libumem"?
What is so frustrating is that on Linux the memory stays constant and never changes... yet on Solaris I see this slow but steady growth.
Any idea what this means? Am I using libumem correctly? What could be causing the PRSTAT to be showing memory growth here?
Any help on this would be greatly appreciated....thanks a million.
If the SIZE column doesn't grow, you're not leaking.
RSS (resident set size) is how much of that memory you are actively using; it's normal for that value to change over time. If you were leaking, SIZE would grow over time (and RSS could stay constant, or even shrink).
Check out this page.
The preferred options are UMEM_DEBUG=default, UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1. Those are the options I use for debugging Solaris memory-leak problems, and they work fine for me.
Based on my experience with Red Hat RHEL 5 and Solaris SunOS 5.9/5.10, a Linux process's memory footprint doesn't increase gradually; instead it seems to grab a large chunk of memory when it needs more and use it for a long time (purely based on observation, I haven't done any research on its memory-allocation mechanism). So you should send a lot more data (10K messages are not much).
You can try the dtrace tool to check for memory problems on Solaris.
Jack
In top, I noticed that my C program (using CUDA 3.2) has a virtual size of 28g or more (looking at VIRT), on every run right from the beginning. This doesn't make ANY sense to me. The resident memory makes sense and is only around 2g on my largest data set. I know at some point in the past the virtual size was not so large, but I'm not sure when the change occurred.
Why would my process use 28g of virtual memory (or why would top's VIRT be so large)? I understand that VIRT includes the executable binary (only 437K), shared libraries, and "data area". What is the "data area"? How can I find out how much memory the shared libraries require? What about other elements of my process's total memory?
Contents of /proc/<pid>/smaps (1022 lines) here: http://pastebin.com/fTJJneXr
One of the entries from smaps shows that a single mapping accounts for MOST of it, but it has no label... how can I find out what this "blank" entry with 28 GB is?
200000000-900000000 ---p 00000000 00:00 0
Size: 29360128 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
--
ubuntu 11.04 64-bit
16 GB RAM
UVA requires CUDA to allocate enough virtual memory to map all of both GPU and system memory. Please see post #5 in the following thread on the NVIDIA forums:
These two regions would be the culprits:
200000000-900000000 ---p 00000000 00:00 0
Size: 29360128 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
7f2e9deec000-7f2f131ec000 rw-s 33cc0c000 00:05 12626 /dev/nvidia0
Size: 1920000 kB
Rss: 1920000 kB
Pss: 1920000 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 1920000 kB
Referenced: 1920000 kB
Anonymous: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
The first segment is a 30GB anonymous private segment, with no access to it allowed, mapped from 0x200000000-0x900000000. A bit mysterious, indeed - probably something to do with the nvidia driver's internal workings (maybe it wants to prevent allocations with those specific addresses?). It's not actually occupying any memory though - Rss is zero, and the access flags (---p) are set to deny all access, so (at the moment) actually allocating any memory to it won't happen. It's just a reserved section in your address space.
The other bit is the /dev/nvidia0 mapping, of two gigabytes. This is likely a direct mapping of part of the video card's RAM. It's not occupying memory as such - it's just reserving part of your address space to use to communicate with hardware.
So it's not really something to worry about. If you want to know how much memory you're really using, add up the Rss figures for all other memory segments (use the Private_* entries instead if you want to skip shared libraries and such).
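If you want to automate that accounting, summing the relevant fields of /proc/self/smaps takes only a few lines of C. A rough sketch (totals both Rss and the Private_* fields so you can compare):
#include <stdio.h>
#include <string.h>

/* Add up the Rss: and Private_* fields of /proc/self/smaps (values in kB). */
int main(void)
{
    char line[256];
    long kb, rss_total = 0, private_total = 0;
    FILE *f = fopen("/proc/self/smaps", "r");
    if (!f)
        return 1;

    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "Rss: %ld kB", &kb) == 1)
            rss_total += kb;
        else if (sscanf(line, "Private_Clean: %ld kB", &kb) == 1 ||
                 sscanf(line, "Private_Dirty: %ld kB", &kb) == 1)
            private_total += kb;
    }
    fclose(f);

    printf("Rss total:     %ld kB\n", rss_total);
    printf("Private total: %ld kB\n", private_total);
    return 0;
}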