I have a program whose total memory footprint is about 100 MiB (VM size in top, while stopped in gdb) that is trying to open a new (not-yet-existent) compressed log file using gzopen. This fails with errno set to ENOMEM, despite the fact that the system has 6 GB of memory completely free (not even holding caches) and lots of space on the filesystem (that would be ENOSPC, I know). Are there more obscure issues that could cause this? Is something in the library incidentally allocating gigabytes upon gigabytes of memory for no good reason?
For the record, my ulimits are set to unlimited.
No, there is nothing in zlib that would allocate more than a MiB or two. zlib itself only ever sets errno to zero; on its own, it never sets errno to ENOMEM. The library functions it calls may, though. What version of zlib are you using?
Turns out zlib was not returning ENOMEM. It was bailing out because we had passed it the mode string "w+", which is invalid because a gzip file can't be read and written at the same time. The ENOMEM was just whatever happened to be left in errno by previous library/system calls.
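For reference, a minimal sketch of opening a new compressed log correctly (the path is made up); gzopen rejects "w+" because a gzip stream can be opened for reading or for writing, but not both:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void) {
    /* "w+" would make gzopen fail: a gzip file cannot be opened for
     * reading and writing at the same time. Use "wb" (optionally with a
     * compression level, e.g. "wb9"). */
    gzFile log = gzopen("/tmp/example.log.gz", "wb");
    if (log == NULL) {
        /* gzopen returns NULL on failure; errno may describe a file-open
         * error, but as the thread shows, don't assume it's meaningful
         * for other failure causes such as an invalid mode. */
        fprintf(stderr, "gzopen failed: %s\n", strerror(errno));
        return 1;
    }
    gzprintf(log, "hello from the logger\n");
    gzclose(log);
    return 0;
}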
I have a program where I read a file with fgetc(), and one of the questions asked is: "can using mmap() and munmap() reduce the number of cache misses?"
To test it I wrote a quick and dirty piece of code that, depending on an argument given on the command line, reads a file character by character either with fgetc() or through the address returned by mmap(). I ran valgrind --tool=cachegrind on the program to measure the number of cache misses, and mmap does not reduce the number of cache misses; it increases them.
I have searched the Internet all day for resources that would help me understand why. I would have expected the opposite: the file is mapped into a contiguous memory zone and we read it from the first character to the last. So why does it increase cache misses?
I am looking for any particular resources or explanation that might help me understand what's really going on.
Thanks in advance.
There are several caches. I guess you are talking about the kernel file system cache (or page cache), not about the CPU cache.
You could use the madvise(2) syscall to give hints for memory mappings (after mmap, or pass MAP_POPULATE to mmap(2)), or use posix_fadvise(2) to give hints (before read) for file I/O.
If using stdio(3) you probably want a larger buffer (e.g. 64 KiB or more); see setvbuf(3). Notice that GNU glibc's fopen(3) may be able to mmap the file when given the m extension in the mode string.
See also readahead(2). And linuxatemyram.
Don't hope for miracles; the bottleneck is the disk I/O hardware.
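To illustrate the setvbuf(3) and posix_fadvise(2) suggestions, here is a minimal sketch of the fgetc() side (the file name is made up):

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>

int main(void) {
    FILE *f = fopen("bigfile.dat", "r");   /* file name is made up */
    if (!f) { perror("fopen"); return 1; }

    /* Give stdio a much larger buffer than the default; this must be
     * done before the first read. */
    static char buf[64 * 1024];
    setvbuf(f, buf, _IOFBF, sizeof buf);

    /* Hint to the kernel that we will read the file sequentially. */
    posix_fadvise(fileno(f), 0, 0, POSIX_FADV_SEQUENTIAL);

    long bytes = 0;
    for (int c; (c = fgetc(f)) != EOF; )
        bytes++;
    printf("%ld bytes read\n", bytes);
    fclose(f);
    return 0;
}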
I read the manual on malloc() in Solaris and found that malloc() can set the EAGAIN error there.
The manual writes:
EAGAIN There is not enough memory available to allocate size bytes of memory; but the application could try again later.
Personally, I think that if malloc() returns NULL there must be a memory leak or some other persistent problem. If that is the case, how would trying again later help?
So I want to know, in what conditions can malloc() set EAGAIN errno? Has anyone encountered such situation?
Standard malloc() does not set errno to EAGAIN on failure.
Under Unix, malloc() will most probably set errno to ENOMEM.
In general, errno EAGAIN means "Resource temporarily unavailable", i.e. the operating system may have the resource available after some time.
It is just a way of saying "right now I don't have enough memory, but I will try to free some in the near future, and then I can give it to you".
This may be related to the way operating systems usually allocate memory to processes - even if the memory is free()'d it does not return to the operating system, but is still reserved for that process.
I am only speculating, but perhaps in the case of EAGAIN the system will try to reclaim the unused memory assigned to other processes. This may take time, hence the EAGAIN return code.
I would suggest using sleep() after receiving EAGAIN and then trying it again. After the second call either memory will be allocated or another error returned. If it's ENOMEM, then the case is clear, there's no memory. If it's EAGAIN again... It's up to you.
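In code, the suggestion might look like this minimal sketch (assuming Solaris' documented EAGAIN behaviour; the function name is ours):

#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* One retry after a short sleep when malloc() reports EAGAIN. */
void *malloc_retry(size_t size) {
    void *p = malloc(size);
    if (p == NULL && errno == EAGAIN) {
        sleep(1);           /* give the OS time to reclaim memory */
        p = malloc(size);   /* if this fails too, give up */
    }
    return p;
}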
The C standard doesn't require malloc() to set errno on failure; setting it is an additional feature of the Solaris implementation of malloc().
Note that malloc still returns NULL on failure.
So you can still check the return value of malloc() and not bother with errno; checking for NULL is the standard malloc() behaviour and should be enough on all occasions. errno just provides additional information about the failure, which may be helpful on certain occasions.
Generally speaking, checking errno makes sense only together with the return code; relying on errno alone may or may not indicate an actual failure.
opengroup.org (POSIX) says:
Upon successful completion with size not equal to 0, malloc() shall return a pointer to the allocated space. If size is 0, either a null pointer or a unique pointer that can be successfully passed to free() shall be returned. Otherwise, it shall return a null pointer [CX] and set errno to indicate the error.
ERRORS
The malloc() function shall fail if:
[ENOMEM]
[CX] Insufficient storage space is available.
POSIX malloc description
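In practice that means checking the return value first and consulting errno only for diagnostics. A minimal sketch:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    errno = 0;
    void *p = malloc(SIZE_MAX / 2);   /* deliberately huge request; may
                                         still succeed on systems that
                                         overcommit memory */
    if (p == NULL) {
        /* POSIX guarantees errno is set (ENOMEM); plain ISO C does not. */
        fprintf(stderr, "malloc failed: %s\n",
                errno ? strerror(errno) : "(errno not set)");
        return 1;
    }
    free(p);
    return 0;
}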
Solaris comes from a different UNIX lineage, and you will encounter lots of differences between POSIX and base Solaris. The most glaring example for new users is usually awk: Solaris has an ancient awk in /usr/bin/awk, /usr/xpg4/bin/awk is more "modern", and /usr/bin/nawk is what you use when porting shell scripts to Solaris. These anachronisms go way back and exist so that old utilities and scripts keep working on new versions of Solaris.
I'm writing a program in C. I have two main development machines, both Macs: one running OS X 10.5 on 32-bit hardware, the other running OS X 10.6 on 64-bit hardware. The program works fine when compiled and run on the 64-bit machine. However, when I compile the exact same program on the 32-bit machine, it runs for a while and then crashes somewhere inside malloc. Here's the backtrace:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xeeb40fe0
0x9036d598 in small_malloc_from_free_list ()
(gdb) bt
#0 0x9036d598 in small_malloc_from_free_list ()
#1 0x90365286 in szone_malloc ()
#2 0x903650b8 in malloc_zone_malloc ()
#3 0x9036504c in malloc ()
#4 0x0000b14c in xmalloc (s=2048) at Common.h:185
...
xmalloc is my custom wrapper which just calls exit if malloc returns NULL, so it's not running out of memory.
If I link the same code with -ltcmalloc it works fine, so I strongly suspect that it's a bug somewhere inside OS X 10.5's default allocator. It may be that my program is causing some memory corruption somewhere and that tcmalloc somehow doesn't get tripped up by it. I tried to reproduce the failure by doing the same sequence of mallocs and frees in a different program but that worked fine.
So my questions are:
Has anyone seen this bug before? Or, alternatively
How can I debug something like this? E.g., is there a debug version of OS X's malloc?
BTW, these are the linked libraries:
$ otool -L ./interp
./interp:
/usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 111.1.5)
Update: Yeah, it was heap corruption due to writing past the end of an array; it's working now. I should have run valgrind before posting the question. I was nevertheless interested in techniques (other than valgrind) for protecting against this kind of corruption, so thanks for that.
Have you read the manual page for malloc() on MacOS X? In part, it says:
DEBUGGING ALLOCATION ERRORS
A number of facilities are provided to aid in debugging allocation errors in applications. These
facilities are primarily controlled via environment variables. The recognized environment variables
and their meanings are documented below.
ENVIRONMENT
The following environment variables change the behavior of the allocation-related functions.
MallocLogFile <f>
Create/append messages to the given file path instead of writing to
the standard error.
MallocGuardEdges
If set, add a guard page before and after each large block.
MallocDoNotProtectPrelude
If set, do not add a guard page before large blocks, even if the
MallocGuardEdges environment variable is set.
MallocDoNotProtectPostlude
If set, do not add a guard page after large blocks, even if the
MallocGuardEdges environment variable is set.
MallocStackLogging
If set, record all stacks, so that tools like leaks can be used.
MallocStackLoggingNoCompact
If set, record all stacks in a manner that is compatible with the
malloc_history program.
MallocStackLoggingDirectory
If set, records stack logs to the directory specified instead of saving
them to the default location (/tmp).
MallocScribble
If set, fill memory that has been allocated with 0xaa bytes. This
increases the likelihood that a program making assumptions about the contents of freshly allocated memory will fail. Also if set, fill memory
that has been deallocated with 0x55 bytes. This increases the likelihood
that a program will fail due to accessing memory that is no longer allocated.
MallocCheckHeapStart <s>
If set, specifies the number of allocations <s> to wait before beginning
periodic heap checks every <n> as specified by MallocCheckHeapEach. If
MallocCheckHeapStart is set but MallocCheckHeapEach is not specified, the
default check repetition is 1000.
MallocCheckHeapEach <n>
If set, run a consistency check on the heap every <n> operations.
MallocCheckHeapEach is only meaningful if MallocCheckHeapStart is also
set.
MallocCheckHeapSleep <t>
Sets the number of seconds to sleep (waiting for a debugger to attach)
when MallocCheckHeapStart is set and a heap corruption is detected. The
default is 100 seconds. Setting this to zero means not to sleep at all.
Setting this to a negative number means to sleep (for the positive number
of seconds) only the very first time a heap corruption is detected.
MallocCheckHeapAbort <b>
When MallocCheckHeapStart is set and this is set to a non-zero value,
causes abort(3) to be called if a heap corruption is detected, instead of
any sleeping.
MallocErrorAbort
If set, causes abort(3) to be called if an error was encountered in
malloc(3) or free(3), such as calling free(3) on a pointer previously
freed.
MallocCorruptionAbort
Similar to MallocErrorAbort but will not abort in out of memory conditions, making it more useful to catch only those errors which will cause
memory corruption. MallocCorruptionAbort is always set on 64-bit processes.
That said, I'd still use valgrind first.
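For example, one might launch the binary from the question with a few of these variables set:

MallocGuardEdges=1 MallocScribble=1 MallocCheckHeapStart=1000 ./interp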
Has anyone seen this bug before
Yes, this is a common programming bug, and it is almost certainly in your code. See http://www.efnetcpp.org/wiki/Heap_Corruption
How can I debug something like this?
See the Tools section of the above link.
Here's the situation:
I'm analysing a programs' interaction with a driver by using an LD_PRELOADed module that hooks the ioctl() system call. The system I'm working with (embedded Linux 2.6.18 kernel) luckily has the length of the data encoded into the 'request' parameter, so I can happily dump the ioctl data with the right length.
However, quite a lot of this data has pointers to other structures, and I don't know the lengths of those (this is what I'm investigating, after all). So I'm scanning the data for pointers and dumping the data at those positions. I'm worried this could leave my code open to segfaults if a pointer is close to a segment boundary (and my early testing seems to show this is the case).
So I was wondering what I can do to pre-emptively check whether the current process owns a particular address before trying to dereference it. Is this even possible?
Edit: just an update, as I forgot to mention something that could be very important: the target system is MIPS-based, although I'm also testing my module on my x86 machine.
Open a file descriptor to /dev/null and try write(null_fd, ptr, size). If it returns -1 with errno set to EFAULT, the memory is invalid. If it returns size, the memory is safe to read. There may be a more elegant way to query memory validity/permissions with some POSIX invention, but this is the classic simple way.
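A sketch of that probe (the helper name is ours):

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if [p, p+len) is readable by this process, 0 if it is not,
 * -1 on an unrelated error. A short write count can mean only part of
 * the range was readable. */
int mem_readable(const void *p, size_t len) {
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, p, len);   /* fails with EFAULT if invalid */
    int ok = (n == (ssize_t)len) ? 1 : (errno == EFAULT ? 0 : -1);
    close(fd);
    return ok;
}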
If your embedded linux has the /proc/ filesystem mounted, you can parse the /proc/self/maps file and validate the pointer/offsets against that. The maps file contains the memory mappings of the process, see here
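A minimal sketch of that approach (the function name is ours, and error handling is kept short):

#include <stdint.h>
#include <stdio.h>

/* Returns 1 if addr falls inside a readable mapping, 0 otherwise,
 * -1 if /proc/self/maps can't be opened. */
int addr_mapped_readable(uintptr_t addr) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;
    char line[256];
    int found = 0;
    while (fgets(line, sizeof line, maps)) {
        unsigned long lo, hi;
        char perms[5];
        /* Lines look like: "08048000-08056000 r-xp 00000000 ..." */
        if (sscanf(line, "%lx-%lx %4s", &lo, &hi, perms) == 3 &&
            addr >= lo && addr < hi && perms[0] == 'r') {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}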
I know of no such possibility. But you may be able to achieve something similar. As man 7 signal mentions, SIGSEGV can be caught. Thus, I think you could:
Start with dereferencing a byte sequence known to be a pointer
Access one byte after the other, at some time triggering SIGSEGV
In SIGSEGV's handler, mark a variable that is checked in the loop of step 2
Quit the loop, this page is done.
There are several problems with that.
Since several buffers may live in the same page, you might output what you think is one buffer but is, in reality, several. You may be able to help with that by also LD_PRELOADing Electric Fence, which would, AFAIK, cause the application to allocate a whole page for every dynamically allocated buffer. So you would not output several buffers thinking they are one, but you still don't know where the buffer ends and would output much garbage at the end. Also, stack-based buffers can't be helped by this method.
You don't know where the buffers end.
Untested.
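Roughly, and equally untested, the probe could look like this; this sketch uses sigsetjmp/siglongjmp instead of the flag variable from step 3 (note that jumping out of a SIGSEGV handler is formally undefined behaviour, though it works in practice on Linux):

#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf probe_env;

static void on_segv(int sig) {
    (void)sig;
    siglongjmp(probe_env, 1);   /* escape the faulting access */
}

/* Returns 1 if the byte at p can be read, 0 if reading it faults. */
int byte_readable(const volatile char *p) {
    struct sigaction sa, old;
    sa.sa_handler = on_segv;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);
    int ok = 0;
    if (sigsetjmp(probe_env, 1) == 0) {
        (void)*p;   /* may fault and jump to the handler */
        ok = 1;
    }
    sigaction(SIGSEGV, &old, NULL);   /* restore the old handler */
    return ok;
}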
Can't you just check for the segment boundaries? (I'm guessing by segment boundaries you mean page boundaries?)
If so, page boundaries are well delimited (either 4 KiB or 8 KiB), so simple masking of the address should deal with it.
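For example (a sketch):

#include <stdint.h>
#include <unistd.h>

/* How many bytes remain between ptr and the end of its page; probing can
 * then stop (or re-check validity) at that boundary. */
size_t bytes_to_page_end(const void *ptr) {
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);   /* 4096 or 8192 */
    return (size_t)(page - ((uintptr_t)ptr & (page - 1)));
}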
I have recently been working on a video player for a CCTV system on Windows. Since the program has to decode and play many video streams at the same time, I think it might run into situations where malloc fails, and I have added a check after every malloc.
But generally speaking, in the open source code I've read, I seldom find any checking of the result of malloc. So when malloc fails, most programs will just crash. Isn't that unacceptable?
My colleagues who write server programs on Linux allocate enough memory up front for 100 client connections. So although the program might refuse the 101st client, it will never see malloc fail. Is that approach also suitable for desktop applications?
On Linux, malloc() will essentially never fail: memory is overcommitted, and under memory pressure the OOM killer is triggered instead and begins killing random processes until the system falls over. Since Linux is the most popular UNIX derivative in use today, many developers have learned to just never check the result of malloc(). That's probably why your colleagues ignore malloc() failures.
On OSes which support failures, I've seen two general patterns:
Write a custom procedure which checks the result of malloc(), and calls abort() if allocation failed. For example, the GLib and GTK+ libraries use this approach.
Store a global list of "purge-able" allocations, such as caches, which can be cleared in the event of allocation failure. Then try the allocation again, and if it still fails, report it via the standard error reporting mechanisms (which do not perform dynamic allocation). A sketch of this follows below.
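A sketch of the second pattern (purge_caches() and report_oom() are hypothetical hooks the application would supply):

#include <stdlib.h>

void purge_caches(void);   /* hypothetical: frees purge-able allocations */
void report_oom(void);     /* hypothetical: reports without allocating   */

/* Purge caches and retry once before reporting failure. */
void *malloc_with_purge(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        purge_caches();
        p = malloc(size);
        if (p == NULL)
            report_oom();
    }
    return p;
}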
Follow the standardized API
Even on Linux, ulimit can be used to get a prompt malloc error return. It's just that it defaults to unlimited.
There is a definite pressure to conform to published standards. On most systems, in the long run, and eventually even on Linux, malloc(3) will return a correct indication of failure. It is true that desktop systems have virtual memory and demand paging, but even then not checking malloc(3) only works in a debugged program with no memory leaks. If anything goes wrong, someone will want to set a ulimit and track it down. Suddenly, the malloc check makes sense.
Using the result of malloc without checking for NULL is unacceptable in code that might be used on platforms where malloc can fail; there it will tend to result in crashes and unpredictable behaviour. I can't foresee the future and don't know where my code will go, so I would write code that checks for malloc returning NULL. Better to die than to behave unpredictably!
Strategies for what to do if malloc fails depend upon the kind of application and how much confidence you have in the libraries you are using. In some situations the only safe thing to do is halt the whole program.
The idea of preallocating a known quota of memory and parcelling it out in chunks, thereby steering clear of ever actually exhausting memory, is a good one if your application's memory usage is predictable. You can extend this to writing your own memory management routines for use by your code.
It depends on the type of application you are working on. If the application's work is divided into discrete tasks where an individual task can be allowed to fail, then a failed memory allocation can be recovered from gracefully.
But in many cases, the only reasonable way to respond to a malloc failure is by terminating the program. Allowing your code to just crash on the inevitable null dereference will achieve that. It would certainly always be better to dump a log entry or error message explaining the error, but in the real world we work on limited schedules. Sometimes the return on investment of pedantic error handling isn't there.
Always check, and pre-allocate a buffer that can be freed in this case so you can warn the user to save his data and shut down the application.
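Sketched out, with names and sizes of our choosing:

#include <stdio.h>
#include <stdlib.h>

static void *oom_reserve;   /* held from startup, released on OOM */

void oom_reserve_init(void) {
    oom_reserve = malloc(64 * 1024);   /* size is arbitrary */
}

/* On allocation failure, free the reserve so that the warn-and-save
 * path has some memory to work with. */
void handle_oom(void) {
    free(oom_reserve);
    oom_reserve = NULL;
    fprintf(stderr, "Out of memory: please save your data and exit.\n");
}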
It depends on the app you're writing. Of course you always need to check the return value of malloc(). However, handling OOM gracefully only makes sense in very few cases, such as low-level crucial system services, or when writing a library that might be used by them. Having a malloc wrapper that aborts on OOM is hence very common in many apps and frameworks. Often those wrappers are named xmalloc() or similar.
GLib's g_malloc() is aborting, too.
If you are going to handle huge amounts of memory, and want to make statements to Linux like "now I have memory area ABC and I don't need the B piece, do as you wish", have a look at the mmap()/madvise() family of functions available in the stock GNU C library. Depending on your usage patterns, the code can end up even simpler than using malloc. This API can also be used to help Linux not waste memory by caching files you are going to read or write only once.
They are nicely documented in the GNU libc info documentation.
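A sketch of the "don't need the B piece" idea with anonymous memory (sizes are illustrative):

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    /* Map three contiguous 4 MiB pieces as one region ABC, then tell the
     * kernel the middle piece may be reclaimed. */
    size_t piece = 4u << 20;
    char *abc = mmap(NULL, 3 * piece, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (abc == MAP_FAILED) { perror("mmap"); return 1; }

    madvise(abc + piece, piece, MADV_DONTNEED);   /* "do as you wish" */

    munmap(abc, 3 * piece);
    return 0;
}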
It is usually impossible for a program to handle running out of memory. What are you going to do, open a file and log something? If you try to allocate a large block and it fails, you may have a fallback and retry with a smaller buffer; but if you fail to allocate 10 bytes, there is not much you can do. And checking for NULL constantly convolutes the code. For that reason I usually add a custom function that does the checking and aborts on failure:
#include <stdlib.h>

/* Abort on allocation failure so callers never have to check for NULL. */
static void* xmalloc(size_t sz) {
    void* p = malloc(sz);
    if (!p) abort();
    return p;
}