CloseHandle necessary when using HeapDestroy? - c

I allocated an array of HANDLEs on a heap, and each handle is associated with a thread.
Once I'm finished with the work, do I have to call CloseHandle() on each of them before calling HeapDestroy()? Or does the latter call make the first useless?

Always close a handle once you've finished with it - it's good practice. The Windows kernel keeps tables that track which handles have been assigned and to whom, so it is in your best interest to close them.
Handle leaks are also a real thing: a caller requests a handle but never closes it, and the leaked handles pile up over time.
Not closing handles can occasionally cause other problems as well (e.g. sharing violations, if you opened a handle to a file with sharing denied and kept the handle open after you no longer needed it).
To be precise, though, handles are not real pointers - the Windows kernel translates them through an internal, undocumented, non-exported table that maps each handle to the actual address of the kernel object it refers to.

Yes, certainly you must first close the handles! Windows does not know (or care) what data you have stored in your heap, so it cannot close the handles automatically.
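As a minimal sketch of the shutdown order (Win32; it assumes the HANDLE array was allocated from hHeap with HeapAlloc, and that count does not exceed MAXIMUM_WAIT_OBJECTS):

#include <windows.h>

void shutdown_workers(HANDLE hHeap, HANDLE *threads, DWORD count) {
    /* Let the worker threads finish, then close each thread handle explicitly. */
    WaitForMultipleObjects(count, threads, TRUE, INFINITE);
    for (DWORD i = 0; i < count; i++)
        CloseHandle(threads[i]);

    /* Only now tear down the heap: HeapDestroy releases the array's memory,
       but it knows nothing about the handles that were stored in it. */
    HeapDestroy(hHeap);
}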

Related

garbage collection for `fopen()`?

Boehm GC only deals with memory allocation. But what if one wants garbage collection to also handle fopen(), so that fclose() is no longer needed - is there a way to do that in C?
P.S.
For example, PyPy takes the garbage collection approach to deal with opening files.
The most obvious effect of this is that files (and sockets, etc) are not promptly closed when they go out of scope. For files that are opened for writing, data can be left sitting in their output buffers for a while, making the on-disk file appear empty or truncated.
http://doc.pypy.org/en/latest/cpython_differences.html
In case it's not obvious, nothing Boehm GC does is guaranteed by the C standard. The whole library is a huge heap of undefined behavior that happens to work on some (many?) real-world implementations. The more advanced C implementations become, especially in the area of safety, the less likely any of it is to continue working.
With that said, I don't see any reason the same principle couldn't be extended to FILE* handles. The problem, however, is that with it necessarily being a conservative GC, false positives for remaining references would prevent the file from being closed, and that has visible consequences on the state of the process and the filesystem. If you explicitly fflush in the right places, it might be acceptably only-half-broken, though.
There's absolutely no meaningful way to do this with file descriptors, on the other hand, because they are small integers. You'll essentially always have false positives for remaining references.
TL;DR: Yes, but. More but than yes.
First things first. Since the standard C library must itself automatically garbage collect open file handles in the exit() function (see standard quotes below), it is not necessary to ever call fclose as long as:
You are absolutely certain that your program will eventually terminate either by returning from main() or by calling exit().
You don't care how much time elapses before the file is closed (making data written to the file available to other processes).
You don't need to be informed if the close operation failed (perhaps because of disk failure).
Your process will not open more than FOPEN_MAX files, and will not attempt to open the same file twice. (FOPEN_MAX must be at least eight, but that includes the three standard streams.)
Of course, aside from very simple toy applications, those guarantees are pretty restrictive, particularly for files opened for writing. For a start, how are you going to guarantee that the host does not crash or get powered down (voiding condition 1)? So most programmers regard it as very bad style to not close all open files.
All the same, it is possible to imagine an application which only opens files for reading. In that case, the most serious issue with never calling fclose will be the last one, the simultaneous open file limit. Five (the minimum FOPEN_MAX of eight, less the three standard streams) is a pretty small number, and even though most systems have much higher limits, they almost all have limits; if an application runs long enough, it will inevitably open too many files. (The restriction on opening the same file twice might be a problem, too, although not all operating systems impose such a limit, and few impose it on files opened only for reading.)
As it happens, these are precisely the issues that garbage collection can, in theory, help solve. With a bit of work, it is possible to get a garbage collector to help manage the number of simultaneously open files. But... as mentioned, there are a number of Buts. Here's a few:
The standard library is under no obligation to dynamically allocate FILE objects using malloc, or indeed to dynamically allocate them at all. (A library which only allowed eight open files might have an internal statically allocated array of eight FILE structures, for example.) So the garbage collector might never see the storage allocations. In order to involve the garbage collector in the removal of FILE objects, every FILE* needs to be wrapped inside a dynamically-allocated proxy (a "handle"), and every interface which takes or returns FILE* pointers must be wrapped with one which creates a proxy. That's not too much work, but there are a lot of interfaces to wrap and the use of the wrappers basically relies on source modification; you might find it difficult to introduce FILE* proxies if some files are opened by external library functions.
Although the garbage collector can be told what to do before it deletes certain objects (see below), most garbage collector libraries have no interface which provides for an object creation limit other than the availability of memory. The garbage collector can only solve the "too many open files" problem if it knows how many files are allowed to be open simultaneously, but it doesn't know and it doesn't have a way for you tell it. So you have to arrange for the garbage collector to be called manually when this limit is about to be breached. Of course, since you are already wrapping all calls to fopen, as per point 1, you can add this logic to your wrapper, either by tracking the open file count, or by reacting to an error indication from fopen(). (The C standard doesn't specify a portable mechanism for detecting this particular error, but Posix says that fopen should fail and set errno to EMFILE if the process has too many files open. Posix also defines the ENFILE error value for the case where there are too many files open in total over all processes; it's probably worthwhile to consider both of these cases.)
In addition, the garbage collector doesn't have a mechanism to limit garbage collection to a single resource type. (It would be very difficult to implement this in a mark-sweep garbage collector, such as the BDW collector, because all used memory needs to be scanned to find live pointers.) So triggering garbage collection whenever all file descriptor slots are used up could turn out to be quite expensive.
Finally, the garbage collector does not guarantee that garbage will be collected in a timely manner. If there is no resource pressure, the garbage collector could stay dormant for a long time, and if you are relying on the garbage collector to close your files, that means that the files could remain open for an unlimited amount of time even though they are no longer in use. So the first two conditions in the original list of requirements for omitting fclose() continue to be in force, even with a garbage collector.
So. Yes, but, but, but, but. Here's what the Boehm GC documentation recommends (abbreviated):
Actions that must be executed promptly… should be handled by explicit calls in the code.
Scarce system resources should be managed explicitly whenever convenient. Use [garbage collection] only as a backup mechanism for the cases that would be hard to handle explicitly.
If scarce resources are managed with [the garbage collector], the allocation routine for that resource (e.g. open file handles) should force a garbage collection (two if that doesn't suffice) if it finds itself short of the resource.
If extremely scarce resources are managed (e.g. file descriptors on systems which have a limit of 20 open files), it may be necessary to introduce a descriptor caching scheme to hide the resource limit.
Now, suppose you've read all of that, and you still want to do it. It's actually pretty simple. As mentioned above, you need to define a proxy object, or handle, which holds a FILE*. (If you are using Posix interfaces like open() which use file descriptors -- small integers -- instead of FILE structures, then the handle holds the fd. This is a different object type, obviously, but the mechanism is identical.)
In your wrapper for fopen() (or open(), or any of the other calls which return open FILE*s or files), you dynamically allocate a handle, and then (in the case of the Boehm GC) call GC_register_finalizer to tell the garbage collector what function to call when the resource is about to be deleted. Almost all GC libraries have some such facility; search for finalizer in their documentation. Here's the documentation for the Boehm collector, out of which I extracted the list of warnings above.
Watch out to avoid race conditions when you are wrapping the open call. The recommended practice is as follows:
Dynamically allocate the handle.
Initialize its contents to a sentinel value (such as -1 or NULL) which indicates that the handle has not yet been assigned to an open file.
Register a finalizer for the handle. The finalizer function should check for the sentinel value before attempting to call fclose(), so registering the handle at this point is fine.
Open the file (or other such resource).
If the open succeeds, reset the handle to hold the value returned from the open. If the open fails because of resource exhaustion, trigger a manual garbage collection and retry as necessary. (Be careful to limit the number of retries for a single open; sometimes you need to collect twice, but three consecutive failures probably indicate some other kind of problem.)
If the open eventually succeeded, return the handle. Otherwise, optionally deregister the finalizer (if your GC library allows that) and return an error indication.
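Here is a minimal sketch of such a wrapper, using the Boehm collector's GC_MALLOC, GC_register_finalizer and GC_gcollect; the handle type, the gc_fopen name and the retry count are just illustrative assumptions:

#include <gc.h>     /* Boehm GC; may be <gc/gc.h> depending on the installation */
#include <stdio.h>
#include <errno.h>

typedef struct { FILE *fp; } file_handle;

/* Finalizer: called by the collector once the handle becomes unreachable. */
static void close_file(void *obj, void *client_data) {
    file_handle *h = obj;
    (void)client_data;
    if (h->fp != NULL) {        /* sentinel check: only close if the open succeeded */
        fclose(h->fp);
        h->fp = NULL;
    }
}

file_handle *gc_fopen(const char *path, const char *mode) {
    file_handle *h = GC_MALLOC(sizeof *h);
    h->fp = NULL;                                        /* sentinel, set before registering */
    GC_register_finalizer(h, close_file, NULL, NULL, NULL);

    for (int attempt = 0; attempt < 3; attempt++) {
        h->fp = fopen(path, mode);
        if (h->fp != NULL)
            return h;                                    /* success: return the proxy */
        if (errno != EMFILE && errno != ENFILE)
            break;                                       /* not a resource-exhaustion failure */
        GC_gcollect();                                   /* reclaim unreferenced handles, then retry */
    }
    return NULL;                                         /* error indication to the caller */
}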
Obligatory C standard quotes
Returning from main() is the same as calling exit()
§5.1.2.2.3 (Program termination): (Only applies to hosted implementations)
If the return type of the main function is a type compatible with int, a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0.
Calling exit() flushes all file buffers and closes all open files
§7.22.4.4 (The exit function):
Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed…

Cleanup when user forces my program to exit?

In my C program, I have many data structures that are dynamically allocating memory. If the user does any of the following:
Alt-F4
Clicks on the X
Force quits via Task Manager, etc
then my program will have a memory leak. How do I set up an on-close handler so that I can clean up my global variables before my program is shut down unnaturally?
(Your terminology strongly suggests a Windows-specific application so I have written this answer accordingly. On other modern operating systems, the terminology is different, but the behavior is the same, except for some embedded environments, but if you were coding for those you would not have asked this question.)
First, the OS will automatically deallocate all memory associated with your process when it is terminated. You never have to worry about RAM remaining allocated.1
Second, you can get notified upon Alt-F4 / window close. I don't know the details, never having written a GUI application for Windows, but there should be some way you can arrange to get it treated the same as selecting Quit from the application's menus; your "toolkit" might even do this for you by default. The proper use of this notification is to save files to disk and cleanly shut down network dialogues. (The OS will also automatically close all file handles and disconnect all network sockets, but it won't flush unsaved changes for you, nor will it send application-layer goodbye messages.)
Third, you cannot receive a notification upon "force quit", and this is by design, because it's the software equivalent of the big red emergency stop button on an industrial robot -- it's for situations where making the process go away right now, no matter what, is more important than any potential data loss. As you can't do anything about this, you should not worry about it.
1 As discussed in the comments: yes, this means it is not necessary to deallocate memory manually before exiting. In fact, it is more efficient if you don't, as discussed here and here. The "common wisdom" that manual deallocation is required dates to the early days of microcomputers; (some iterations of) CP/M and MS-DOS did indeed need each application to deallocate memory manually; nowadays it is an instance of cargo-cult programming.
That said, there are some good reasons to deallocate memory manually before exiting. The most obvious is that some leak-detection tools need you to do it, because they do not know the difference between "memory that the application could have freed if it had bothered" and "memory that the application has lost all valid pointers to, and cannot free anymore". A less obvious case is if you are writing a library, or there's a possibility that your application might get turned into a library in the future. Libraries need to be able to manually free all of their allocations upon request by the application, because the application might need to set up, use, and tear down the library many times in the lifetime of one process. (However, it will be more efficient if the application sets up the library once in its lifetime, uses it repeatedly, and then doesn't bother tearing it down before exiting.)
You can register a function to be called at exit, with atexit. This should handle normal termination (by returning from main or a call to exit anywhere in the program), and Alt+F4 and clicking the X too. However, there is nothing your program can do when it is killed (force quit in the task manager). Fortunately, most modern operating systems can and do clean up the memory used by your program, even when it dies of non-natural causes.
Here is an example of a program using atexit. The program creates (or truncates) a file called atexit.txt. When the program is terminated normally it writes Exited! to that file and closes the handle.
#include <stdlib.h>
#include <stdio.h>

FILE *quitFp;

void onExit(void) {
    fputs("Exited!", quitFp);
    fclose(quitFp);
}

int main(void) {
    quitFp = fopen("atexit.txt", "w+");
    if (!quitFp) {
        return 1;
    }
    atexit(onExit);
    puts("Press enter to exit");
    getchar();
    return 0;
}
On Windows 8.1, compiled with MinGW/GCC it works for normal return and exit and when I click the X on the console window.

How to close a memory mapped file in C that I did not explicitly open?

How can I close a memory mapped file that I did not explicitly open using mmap in C? I'm able to see the name of the memory-mapped file using lsof, and I'd like to somehow get the address and size of the mapping so that I can call munmap. I know that I should be able to access some memory info via /proc/self/mem/ but I'm not entirely sure how to proceed from there, or whether this would be the safe/correct way of doing so. Thanks.
EDIT
I'm using a library that writes to a device file. What I'm trying to do is simulate this device becoming temporarily physically unavailable and have the program recover from this. Part of the device becoming available again is having its driver reinitialized; this cannot happen while my process still holds a reference, since the module count won't be zero and the kernel will therefore not allow unloading. I was able to identify all the file descriptors and close them, but lsof points to a shared memory mapping that still references the device. Since I did not explicitly open this, and the code that did so is not accessible to me, I was hoping there was still a way to close it.
I would suggest the most likely safe solution is to use the exec() system call to return to a known state.
What you are asking for is how to yank the memory mapping out from some library function; which sets you up for undefined behavior in the not too distant future, probably involving the heap manager. You don't want to debug that.
OP = Heisenbug. Well that's singularly fitting.
As for the question that now appears after edit; we have here the XY problem. What would happen on device failure is not freeing the memory mapping, but most likely changing it so that all access to the region yields SIGBUS, which is a lot harder to simulate.

Atomically write 64kB

I need to write something like 64 kB of data atomically in the middle of an existing file. That is, all or nothing should be written. How do I achieve that in Linux/C?
I don't think it's possible, or at least there's not any interface that guarantees as part of its contract that the write would be atomic. In other words, if there is a way that's atomic right now, that's an implementation detail, and it's not safe to rely on it remaining that way. You probably need to find another solution to your problem.
If however you only have one writing process, and your goal is that other processes either see the full write or no write at all, you can just make the changes in a temporary copy of the file and then use rename to atomically replace it. Any reader that already had a file descriptor open to the old file will see the old contents; any reader opening it newly by name will see the new contents. Partial updates will never be seen by any reader.
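A sketch of that copy-and-rename pattern (POSIX; write_new_contents() is a placeholder for whatever produces the new file body):

#include <stdio.h>
#include <unistd.h>

int replace_file(const char *path, int (*write_new_contents)(FILE *)) {
    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s.tmp.%ld", path, (long)getpid());

    FILE *out = fopen(tmp, "w");
    if (!out)
        return -1;

    int ok = write_new_contents(out) == 0
          && fflush(out) == 0
          && fsync(fileno(out)) == 0;      /* make sure the new copy is on disk */
    if (fclose(out) != 0)
        ok = 0;

    /* rename() atomically replaces 'path': readers see either the old contents
       or the new contents, never a mix. */
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}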
There are a few approaches to modify file contents "atomically". While technically the modification itself is never truly atomic, there are ways to make it seem atomic to all other processes.
My favourite method in Linux is to take a write lease using fcntl(fd, F_SETLEASE, F_WRLCK). It will only succeed if fd is the only open descriptor to the file; that is, nobody else (not even this process) has the file open. Also, the file must be owned by the user running the process, or the process must run as root, or the process must have the CAP_LEASE capability, for the kernel to grant the lease.
When successful, the lease owner process gets a signal (SIGIO by default) whenever another process is opening or truncating the file. The opener will be blocked by the kernel for up to /proc/sys/fs/lease-break-time seconds (45 by default), or until the lease owner releases or downgrades the lease or closes the file, whichever is shorter. Thus, the lease owner has dozens of seconds to complete the "atomic" operation, without any other process being able to see the file contents.
There are a couple of wrinkles one needs to be aware of. One is the privileges or ownership required for the kernel to grant the lease. Another is that the other party opening or truncating the file will only be delayed; the lease owner cannot use replacement (hardlinking or renaming a new file into place) to sidestep this, because the delayed opener will still open the original file. Also, renaming, hardlinking, and unlinking/deleting the file do not affect the file contents, and are therefore not affected at all by file leases.
Remember also that you need to handle the signal generated. You can use fcntl(fd, F_SETSIG, signum) to change the signal. I personally use a trivial signal handler -- one with an empty body -- to catch the signal, but there are other ways too.
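A minimal sketch of taking the lease (Linux-specific; _GNU_SOURCE is needed for F_SETLEASE, and the empty-body handler keeps the default SIGIO action from terminating the process):

#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static void lease_break(int sig) { (void)sig; }   /* empty body: just note the interruption */

int take_write_lease(int fd) {
    struct sigaction sa = { 0 };
    sa.sa_handler = lease_break;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGIO, &sa, NULL);                  /* SIGIO is the default lease-break signal */

    /* Succeeds only if fd is the only open descriptor to the file and we have
       the necessary ownership, CAP_LEASE, or root privileges. */
    if (fcntl(fd, F_SETLEASE, F_WRLCK) == -1)
        return -1;
    return 0;
}

/* ... do the modification, then release with fcntl(fd, F_SETLEASE, F_UNLCK); */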
A portable method to achieve semi-atomicity is to use a memory map using mmap(). The idea is to use memmove() or similar to replace the contents as quickly as possible, then use msync() to flush the changes to the actual storage medium.
If the memory map offset in the file is a multiple of the page size, the mapped pages reflect the page cache. That is, any other process reading the file, in any way -- mmap() or read() or their derivatives -- will immediately see the changes made by the memmove(). The msync() is only needed to make sure the changes are also stored on disk, in case of a system crash -- it is basically equivalent to fsync().
To avoid preemption (the kernel interrupting the action because the current timeslice is up) and page faults, I'd first read the mapped data to make sure the pages are in memory, and then call sched_yield(), before the memmove(). Reading the mapped data should fault the pages into the page cache, and sched_yield() releases the rest of the timeslice, making it extremely likely that the memmove() is not interrupted by the kernel in any way. (If you do not make sure the pages are already faulted in, the kernel will likely interrupt the memmove() for each page separately. You won't see that in your process, but other processes will see the modifications occur in page-sized chunks.)
This is not exactly atomic, but it is practical: it does not give you any guarantees, only makes the race window very very short; therefore I call this semi-atomic.
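A sketch of that memory-map approach (assumptions: offset is a multiple of the page size and offset+len lies within the existing file):

#include <sys/mman.h>
#include <sys/types.h>
#include <sched.h>
#include <string.h>
#include <unistd.h>

int overwrite_region(int fd, off_t offset, const unsigned char *src, size_t len) {
    unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, offset);
    if (map == MAP_FAILED)
        return -1;

    /* Touch every page first, so the memmove() below is not interrupted by
       page faults while other processes can observe the region. */
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char sink = 0;
    for (size_t i = 0; i < len; i += (size_t)page)
        sink ^= map[i];
    (void)sink;

    sched_yield();               /* start the critical part with a fresh timeslice */
    memmove(map, src, len);      /* the quick replacement; readers see it via the page cache */

    msync(map, len, MS_SYNC);    /* push the change to stable storage, like fsync() */
    return munmap(map, len);
}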
Note that this method is compatible with file leases. One could try to take a write lease on the file, but fall back to leaseless memory mapping if the lease is not granted within some acceptable time period, say a second or two. I'd use timer_create() and timer_settime() to create the timeout timer, and the same empty-body signal handler to catch the SIGALRM signal; that way the fcntl() is interrupted (returns -1 with errno == EINTR) when the timeout occurs -- with the timer interval set to some small value (say 25000000 nanoseconds, or 0.025 seconds) so it repeats very often after that, interrupting syscalls if the initial interrupt is missed for any reason.
Most userspace applications create a copy of the original file, modify the contents of the copy, then replace the original file with the copy.
Each process that opens the file will only see complete changes, never a mix of old and new contents. However, anyone keeping the file open will only see the original contents, and will not be aware of any changes (unless they check for them). Most text editors do check, but daemons and other processes usually do not bother.
Remember that in Linux, the file name and its contents are two separate things. You can open a file, unlink/remove it, and still keep reading and modifying the contents for as long as you have the file open.
There are other approaches, too. I do not want to suggest any specific approach, because the optimal one depends heavily on the circumstances: Do the other processes keep the file open, or do they always (re)open it before reading the contents? Is atomicity preferred or absolutely required? Is the data plain text, structured like XML, or binary?
EDITED TO ADD:
Please note that there is no way to guarantee beforehand that the file will be successfully modified atomically. Not in theory, and not in practice.
You might encounter a write error with the disk full, for example. Or the drive might hiccup at just the wrong moment. I'm only listing three practical ways to make it seem atomic in typical use cases.
The reason write leases are my favourite is that I can always use fcntl(fd,F_GETLEASE,&ptr) to check whether the lease is still valid or not. If not, then the write was not atomic.
High system load is unlikely to cause the lease to be broken for a 64k write, if the same data has been read just prior (so that it will likely be in page cache). If the process has superuser privileges, you can use setpriority(PRIO_PROCESS,getpid(),-20) to temporarily raise the process priority to maximum while taking the file lease and modifying the file. If the data to be overwritten has just been read, it is extremely unlikely to be moved to swap; thus swapping should not occur, either.
In other words, while it is quite possible for the lease method to fail, in practice it is almost always successful -- even without the extra tricks mentioned in this addendum.
Personally, I simply check if the modification was not atomic, using the fcntl() call after the modification, prior to msync()/fsync() (making sure the data hits the disk in case a power outage occurs); that gives me an absolutely reliable, trivial method to check whether the modification was atomic or not.
For configuration files and other sensitive data, I too recommend the rename method. (Actually, I prefer the hardlink approach used for NFS-safe file locking, which amounts to the same thing but uses a temporary name to detect naming races.) However, it has the problem that any process keeping the file open will have to check and reopen the file, voluntarily, to see the changed contents.
Disk writes cannot be atomic without a layer of abstraction. You should keep a journal and revert if a write is interrupted.
As far as I know, a write() of less than PIPE_BUF bytes is atomic, although that guarantee is only stated for pipes and FIFOs, so I never rely on it for regular files. If the programs that access the file are written by you, you can use flock() to achieve exclusive access; this system call places an advisory lock on the file, and other processes that know about the lock can honour it before accessing the file. A sketch follows.
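A minimal sketch of that cooperative flock() approach; it only helps if every program touching the file takes the same lock:

#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

int write_region_locked(const char *path, off_t offset,
                        const void *buf, size_t len) {
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    flock(fd, LOCK_EX);                     /* block until we hold the exclusive lock */
    ssize_t n = pwrite(fd, buf, len, offset);
    fsync(fd);                              /* make sure the data reaches the disk */
    flock(fd, LOCK_UN);

    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}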

Choice of Linux IPC technique

I am building an application which takes as its input an executable, executes it, and keeps track of dynamic memory allocations (among other things) to help track down memory errors.
After reading the name of the executable I create a child process, link the executable with my module (which includes my version of the malloc family of functions) and execute the executable provided by the user. The parent process will consist of a GUI (using the Qt framework) where I want to display warnings/errors/number of allocations.
I need to communicate the number of mallocs/frees and a series of warning messages to the parent process in real time. After the user's application has finished executing, I wish to display the number of memory leaks. (I have taken care of all the backend coding needed for this in the shared library I link against.)
Real-Time:
I thought of two different approaches to communicate this information:
The child process writes to two pipes (one for recording whether an allocation/free happened, and another for writing a single integer denoting a warning message).
Simply sending a signal to denote that an allocation has happened, plus a signal for each of the warning messages. I would map these to the actual warning strings in the parent process.
Is the signal version as efficient as using a pipe? Is it feasible? Is there any better choice, as I do care about efficiency? :)
After the user's application finishes executing:
I need to send the whole data structure I use to keep track of memory leaks here. This could possibly be very large so I am not sure which IPC method would be the most efficient.
Thanks for your time
I would suggest a Unix-domain socket: it's a little more flexible than a pipe, can be configured for datagram mode (which saves you from having to find message boundaries), and makes it easy to move to a network interface later.
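For example, a small sketch of that setup using a SOCK_DGRAM socketpair between the GUI parent and the traced child (the message format here is just an assumption):

#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return 1;

    pid_t pid = fork();
    if (pid == 0) {                              /* child: the instrumented program */
        close(sv[0]);
        const char msg[] = "malloc count=1 bytes=64";   /* hypothetical record format */
        send(sv[1], msg, sizeof msg, 0);         /* each send is one whole datagram */
        _exit(0);
    }

    close(sv[1]);                                /* parent: the GUI / reader side */
    char buf[256];
    ssize_t n = recv(sv[0], buf, sizeof buf, 0);
    if (n > 0)
        printf("event: %.*s\n", (int)n, buf);
    wait(NULL);
    return 0;
}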
Signals are definitely not the way to do this. In general, signals are best avoided whenever possible.
A pipe solution is fine. You could also use shared memory, but that would be more vulnerable to accidental corruption by the target application.
I suggest a combination of shared memory and a socket. Have a shared memory area, say 1MB, and log all your information in some standard format in that buffer. If/when the buffer fills or the process terminates you send a message, via the socket, to the reader. After the reader ACKs you can clear the buffer and carry on.
To answer caf's concern about the target application corrupting the shared memory, just use the mprotect system call to remove permissions (set PROT_NONE) on the shared memory area before giving control to your target process. Naturally this means you'll have to set PROT_READ|PROT_WRITE before updating your log on each allocation; I'm not sure whether it is still a performance win with the mprotect calls thrown in.
EDIT: in case it isn't blindingly obvious, you can have multiple buffers (or one divided into N parts) so you can pass control back to the target process immediately and not wait for the reader to ACK. Also, given enough computation resources the reader can run as often as it wants reading the currently active buffer and performing real-time updates to the user or whatever it's reading for.
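And a sketch of the mprotect() toggling described above (the buffer must be page-aligned, e.g. obtained from mmap; log_event() is a name invented for illustration):

#include <sys/mman.h>
#include <string.h>

static unsigned char *log_buf;     /* page-aligned shared mapping */
static size_t log_size;            /* a multiple of the page size */
static size_t log_used;

void log_event(const char *msg, size_t len) {
    /* Open the window only while the wrapped allocator writes its record. */
    mprotect(log_buf, log_size, PROT_READ | PROT_WRITE);
    if (log_used + len <= log_size) {
        memcpy(log_buf + log_used, msg, len);
        log_used += len;
    }
    mprotect(log_buf, log_size, PROT_NONE);    /* locked again before returning to the target */
}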
