When we write C programs we make calls to malloc or printf. But do we need to check every call? What guidelines do you use?
e.g.
char error_msg[BUFFER_SIZE];
if (fclose(file) == EOF) {
    /* perror() appends ": ", the reason and a newline itself */
    snprintf(error_msg, sizeof error_msg, "Error closing %s", filename);
    perror(error_msg);
}
The answer to your question is: "Do whatever you want", there is no written rule, BUT the right question is "What do users want in case of failure".
Let me explain: if you are a student writing a test program, for example, there is no absolute need to check for errors; it may be a waste of time.
Now, if your code may be distributed or used by other people, that is quite different: put yourself in the shoes of future users. Which message would you prefer when something goes wrong with an application:
Core was generated by `./cut --output-d=: -b1,1234567890- /dev/fd/63'.
Program terminated with signal SIGSEGV, Segmentation fault.
or
MySuperApp failed to start MySuperModule because there is not enough space on the disk.
Try to free space on disk, then relaunch the app.
If this error persists, contact us at support@mysuperapp.com
As it has already been addressed in the comment, you have to consider two types of error:
A fatal error is one that kills your program (app / server / site / whatever it is). It renders it unusable, either by crashing or by putting it in some state whereby it can't do its useful work, e.g. a failed memory allocation or a full disk.
A non-fatal error is one where something messes up, but the program can continue to do what it's supposed to do, e.g. a file is not found, and the server keeps serving the other users who did not request the thing that caused the error.
Source : https://www.quora.com/What-is-the-difference-between-an-error-and-a-fatal-error
Only do error checking if your program has to behave differently when an error is detected. Let me illustrate this with an example: assume you have used a temporary file in your program and you use the unlink(2) system call to erase that temporary file at the end of the program. Do you have to check whether the file has been successfully erased? Let's analyse the problem with some common sense: if you check for errors, will you be able (inside the program) to do something else to cope with the failure? A failure here is uncommon (if you created the file, it's rare that you will not be able to erase it, but something can happen in the meantime, for example a change in directory permissions that forbids you from writing to the directory anymore). But what can you do in that case? Is there a different way to erase the temporary file? Probably not, so checking for a possible error from the unlink(2) system call will be almost useless in that case.
Of course, this doesn't always apply; you have to use common sense while programming. Errors when writing to files should always be considered, as they are mostly caused by access permissions or full filesystems (in the latter case, even trying to generate a log message can be useless, as you have filled your disk, or not; that depends). Do you always know the environment precisely enough to decide that a full-filesystem error can be ignored? Suppose your program has to connect to a server. Should a connect(2) failure be acted upon? Probably, most of the time: at the very least, a message with the protocol error (or the cause of the failure) must be given to the user. Assuming everything goes OK can save you time in a prototype, but production programs have to cope with what can happen.
When I want to use the return value of a function, I check it before using it.
For example, a function that returns a pointer may return NULL, so keep a NULL check before using the pointer.
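As a minimal sketch of that advice, the helper below checks the pointer returned by malloc() before touching it. The name copy_string is hypothetical and only serves as an illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of s, or NULL if the
 * allocation fails. The caller owns the result and must free() it. */
char *copy_string(const char *s)
{
    assert(s != NULL);              /* passing NULL is a caller bug */

    size_t size = strlen(s) + 1;    /* include the terminating '\0' */
    char *copy = malloc(size);
    if (copy == NULL)               /* check before using the pointer */
        return NULL;

    memcpy(copy, s, size);
    return copy;
}
```

A caller would apply the same rule one level up: test the result of copy_string() against NULL before reading from it.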
Related
I have a file; let's call it log. I need to remove some bytes, say n bytes, from the start of the file only. The issue is that this file is referenced by file pointers in other programs, and these programs may write to log at any time. I can't re-create the file, otherwise those file pointers would malfunction (I am not sure about that either).
I tried to google it, but all the suggestions are to rewrite to a new file.
Is there any solution for it?
I can suggest two options:
Ring buffer: Use a memory mapped file as your logging medium, and use it as a ring buffer. You will need to manually manage where the last written byte is, and wrap around your ring appropriately as you step over the end of the ring. This way, your logging file stays a constant size, but you can't tail it like a regular file. Instead, you will need to write a special program that knows how to walk the ring buffer when you want to display the log.
Multiple small log files: Use some number of smaller log files that you log to, and remove the oldest file as the collection of files grows beyond the size of logs you want to maintain. If the most recent log file is always named the same, you can use the standard tail -F utility to follow the log contents perpetually. To avoid issues of multiple programs manipulating the same file, your logging code can send logs as messages to a single logging daemon.
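The ring-buffer option can be sketched roughly as follows. This is only an illustration under stated assumptions: a POSIX system, a single writer, a fixed capacity of 4096 bytes, and a file layout (write position followed by the data area) invented for this example. The names ring_log, ring_open, ring_write and ring_close are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define RING_CAPACITY 4096          /* assumed size of the log area in bytes */

/* Assumed layout of the mapped file: write position, then the data area. */
struct ring_log {
    size_t head;                    /* offset in data[] of the next byte to write */
    char data[RING_CAPACITY];
};

/* Maps path (creating it if needed) and returns the ring, or NULL on error. */
struct ring_log *ring_open(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return NULL;

    /* Give the file its fixed size; a new file is zero-filled, so head == 0. */
    if (ftruncate(fd, (off_t)sizeof(struct ring_log)) == -1) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, sizeof(struct ring_log), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    struct ring_log *ring = p;
    if (ring->head >= RING_CAPACITY)    /* guard against a damaged file */
        ring->head = 0;
    return ring;
}

/* Appends len bytes, wrapping around and overwriting the oldest data. */
void ring_write(struct ring_log *ring, const char *buf, size_t len)
{
    assert(ring != NULL && buf != NULL);

    for (size_t i = 0; i < len; i++) {
        ring->data[ring->head] = buf[i];
        ring->head = (ring->head + 1) % RING_CAPACITY;
    }
}

/* Unmaps the ring; returns 0 on success, -1 on error. */
int ring_close(struct ring_log *ring)
{
    return munmap(ring, sizeof(struct ring_log));
}
```

With several writing processes, the update of head would additionally need a lock or an atomic operation, which this sketch leaves out. A reader program would start at head and walk RING_CAPACITY bytes, wrapping around, to print the oldest data first.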
So... you want to change the file, but you cannot. The reason you cannot is that other programs are using the file. In general terms, you appear to need to:
stop all the other programs messing with the file while you change it -- to chop now unwanted stuff off the front;
inform the other programs that you have changed it -- so they can re-establish their file-pointers.
I guess there must be a mechanism to allow the other programs to change the file without tripping over each other... so perhaps you can extend that? [If all the other programs are children of the main program, then if the children all O_APPEND, you have a fighting chance of doing this, perhaps with the help of a file-lock or a semaphore (which may already exist?). But if the programs are this intimately related, then @jxh has other, probably better, suggestions.]
But, if you cannot change the other programs in any way, you appear to be stuck, except...
...perhaps you could try 'sparse' files? On (recent-ish) Linux (at least) you can fallocate() with FALLOC_FL_PUNCH_HOLE, to remove the stuff you don't want without affecting the other programs' file-pointers. Of course, sooner or later the other programs may overflow the file-pointer, but that may be a more theoretical than practical issue.
Note: Please read to the end before marking this as duplicate. While it's similar, the scope of what I'm looking for in an answer extends beyond what the previous question was asking for.
Widespread practice, which I tend to agree with, tends to be treating close purely as a resource-deallocation function for file descriptors rather than a potential IO operation with meaningful failure cases. And indeed, prior to the resolution of issue 529, POSIX left the state of the file descriptor (i.e. whether it was still allocated or not) unspecified after errors, making it impossible to respond portably to errors in any meaningful way.
However, a lot of GNU software goes to great lengths to check for errors from close, and the Linux man page for close calls failure to do so "a common but nevertheless serious programming error". It cites NFS and quotas as circumstances under which close might produce an error but does not give details.
What are the situations under which close might fail, on real-world systems, and are they relevant today? I'm particularly interested in knowing whether there are any modern systems where close fails for any non-NFS, non-device-node-specific reasons, and as for NFS or device-related failures, under what conditions (e.g. configurations) they might be seen.
Once upon a time (24 March 2007), Eric Sosman had the following tale to share in the comp.lang.c newsgroup:
(Let me begin by confessing to a little white lie: It wasn't
fclose() whose failure went undetected, but the POSIX close()
function; this part of the application used POSIX I/O. The lie
is harmless, though, because the C I/O facilities would have
failed in exactly the same way, and an undetected failure would
have had the same consequences. I'll describe what happened in
terms of C's I/O to avoid dwelling on POSIX too much.)
The situation was very much as Richard Tobin described.
The application was a document management system that loaded a
document file into memory, applied the user's edits to the in-
memory copy, and then wrote everything to a new file when told
to save the edits. It also maintained a one-level "old version"
backup for safety's sake: the Save operation wrote to a temp
file, and then if that was successful it deleted the old backup,
renamed the old document file to the backup name, and renamed the
temp file to the document. bak -> trash, doc -> bak, tmp -> doc.
The write-to-temp-file step checked almost everything. The
fopen(), obviously, but also all the fwrite()s and even a final
fflush() were checked for error indications -- but the fclose()
was not. And on one system it happened that the last few disk
blocks weren't actually allocated until fclose() -- the I/O
system sat atop VMS' lower-level file access machinery, and a
little bit of asynchrony was inherent in the arrangement.
The customer's system had disk quotas enabled, and the
victim was right up close to his limit. He opened a document,
edited for a while, saved his work thus far, and exceeded his
quota -- which went undetected because the error didn't appear
until the unchecked fclose(). Thinking that the save succeeded,
the application discarded the old backup, renamed the original
document to become the backup, and renamed the truncated temp
file to be the new document. The user worked a little longer
and saved again -- same thing, except you'll note that this time
the only surviving complete file got deleted, and both the
backup and the master document file are truncated. Result: the
whole document file became trash, not just the latest session
of work but everything that had gone before.
As Murphy would have it, the victim was the boss of the
department that had purchased several hundred licenses for our
software, and I got the privilege of flying to St. Louis to be
thrown to the lions.
[...]
In this case, the failure of fclose() would (if detected) have
stopped the delete-and-rename sequence. The user would have been
told "Hey, there was a problem saving the document; do something
about it and try again. Meanwhile, nothing has changed on disk."
Even if he'd been unable to save his latest batch of work, he would
at least not have lost everything that went before.
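The save sequence from this story, with the missing check added, might look roughly like the sketch below. It is a simplified illustration, not the application's actual code: the backup rotation is left out, the name save_document is hypothetical, and replacing an existing file with rename() is assumed to behave as on POSIX systems (ISO C leaves that case implementation-defined).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes data to tmp_path, then renames it over doc_path.
 * Returns 0 on success, -1 on failure. On failure doc_path is left untouched. */
int save_document(const char *doc_path, const char *tmp_path,
                  const void *data, size_t len)
{
    assert(doc_path != NULL && tmp_path != NULL && data != NULL);

    FILE *fp = fopen(tmp_path, "wb");
    if (fp == NULL)
        return -1;

    int ok = (fwrite(data, 1, len, fp) == len);
    if (fflush(fp) == EOF)
        ok = 0;
    if (fclose(fp) == EOF)          /* the check that was missing in the story */
        ok = 0;

    if (!ok) {
        remove(tmp_path);           /* best effort clean-up of the partial file */
        return -1;
    }

    /* Only now is it safe to replace the document. */
    return rename(tmp_path, doc_path) == 0 ? 0 : -1;
}
```

The point is the ordering: nothing on disk is replaced until every write step, including fclose(), has reported success.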
Consider the inverse of your question: "Under what situations can we guarantee that close will succeed?" The answer is:
when you call it correctly, and
when you know that the file system the file is on does not return errors from close in this OS and kernel version
If you are convinced that your program doesn't have any logic errors and you have complete control over the kernel and file system, then you don't need to check the return value of close.
Otherwise, you have to ask yourself how much you care about diagnosing problems with close. I think there is value in checking and logging the error for diagnostic purposes:
If a coder makes a logic error and passes an invalid fd to close, then you'll be able to quickly track it down. This may help to catch a bug early before it causes problems.
If a user runs the program in an environment where close does return an error when (for example) data was not flushed, then you'll be able to quickly diagnose why the data got corrupted. It's an easy red flag because you know the error should not occur.
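Checking and logging for diagnostic purposes can be as small as the wrapper sketched here. The name close_logged and the message format are made up for this example; logging to stderr is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: closes fd and logs any failure to stderr.
 * what is a label for the log message. Returns 0 on success, -1 on error,
 * with errno preserved for the caller. */
int close_logged(int fd, const char *what)
{
    assert(what != NULL);

    if (close(fd) == -1) {
        int saved = errno;          /* fprintf() may overwrite errno */
        fprintf(stderr, "close(%s, fd=%d) failed: %s\n",
                what, fd, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

Closing the same descriptor twice, a typical logic error, then shows up in the log as EBADF instead of passing silently.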
I'm working on several programs right now, and have become frustrated over some of the haphazard ways I'm debugging my programs and logging errors. As such, I've decided to take a couple days to write an error library that I can use across all of my programs. I do most of my development in Windows, making extensive use of the Windows API, but the key here is flexibility: this library ideally needs to remain flexible, offering the programmer notification options in console apps and GUI apps on Windows and Unix-like environments.
My initial idea is to use one library that uses preprocessor conditional inclusion of the headers for windows and unix based on the current environment. For example, in a win32 app, console error message notification (while possible) isn't necessary; instead, a simple
MessageBoxA/W(hWndParent, TEXT("Some error message that makes sense in context"), TEXT("Application Name"), MB_ICONERROR | MB_OK)
would make the most sense. On the other hand, on linux things are a little more complex:
GtkWidget *w;
w = gtk_message_dialog_new(pOwner, GTK_DIALOG_MODAL, GTK_MESSAGE_ERROR, GTK_BUTTONS_OK, TEXT("Some error message"));
gtk_window_set_title(GTK_WINDOW(w), TEXT("Application Name"));
On either platform, simple file logging with more information about the error (function, file, line, etc.) would also be useful in pinpointing the source and tracing the flow.
Moreover, logging should be possible: even when a function requiring error logging/notification is called, it should be possible to show the sequence of function calls in the program, if that level of logging is activated.
Thus, my initial considerations are to have a library that incorporates all of these features, with minimal overhead by dint of preprocessor conditionals. I think it would make the most sense to break this up into a couple structs:
The "main" struct, which is passed to every function in the main program requiring logging, and contains
A bitmask containing the status codes for which file logging is necessary (e.g., NOERROR | WARNING | CRITICALERROR or CRITICALERROR | CRASH)
A pointer to an "output" struct
A linked list of "error information" nodes
The "output" struct, which handles printing to file, displaying message boxes, and printing to the console
The "error information" struct, which contains information about the error (function, file, line, message, type, etc.)
These are just my initial thoughts about this library. Can any of you think of any more information that I should include? Another major issue for me is atomicity of error addition: it's likely that an error will be created by a different thread from the one logging it, so I need to make sure that creating and adding an error node is actually an atomic operation. Thus, mutexes would likely be the way I'd go about synchronization.
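One possible shape for these structs, with a mutex guarding the list, is sketched below. Everything here is an assumption made for illustration: POSIX threads are used for the lock (a Windows build would need a different primitive), the "output" struct is left opaque, and all names (err_status, err_info, err_log, err_log_init, err_log_add) and the 128-byte message size are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Status codes, usable as a bitmask. */
enum err_status {
    ERR_NOERROR       = 1 << 0,
    ERR_WARNING       = 1 << 1,
    ERR_CRITICALERROR = 1 << 2,
    ERR_CRASH         = 1 << 3
};

/* The "error information" node. */
struct err_info {
    const char *function;           /* expected to be literals, e.g. __func__ */
    const char *file;               /* e.g. __FILE__ */
    int line;
    enum err_status status;
    char message[128];
    struct err_info *next;
};

struct err_output;                  /* the "output" struct, left opaque here */

/* The "main" struct passed to functions that need logging. */
struct err_log {
    unsigned file_mask;             /* statuses for which file logging is on */
    struct err_output *output;
    struct err_info *head, *tail;
    pthread_mutex_t lock;           /* protects head and tail */
};

int err_log_init(struct err_log *log, unsigned file_mask)
{
    assert(log != NULL);
    log->file_mask = file_mask;
    log->output = NULL;
    log->head = log->tail = NULL;
    return pthread_mutex_init(&log->lock, NULL) == 0 ? 0 : -1;
}

/* Builds the node outside the lock, then links it in under the lock.
 * Returns 0 on success, -1 if the node could not be allocated. */
int err_log_add(struct err_log *log, enum err_status status,
                const char *function, const char *file, int line,
                const char *message)
{
    assert(log != NULL && message != NULL);

    struct err_info *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->function = function;
    node->file = file;
    node->line = line;
    node->status = status;
    snprintf(node->message, sizeof node->message, "%s", message);
    node->next = NULL;

    pthread_mutex_lock(&log->lock);
    if (log->tail != NULL)
        log->tail->next = node;
    else
        log->head = node;
    log->tail = node;
    pthread_mutex_unlock(&log->lock);
    return 0;
}
```

Only the two pointer updates happen under the lock, so threads adding errors block each other for as short a time as possible.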
Thanks for the help!
In the case of not CRITICALERROR | CRASH, where the app would be expected to continue after the logging call, it would be better to queue each log struct (on a thread-safe producer-consumer queue) off to a logging thread that performs the requested action/s. The logging thread would normally free the structs after handling them. Some advantages:
1) The action taken for each logging request is reduced to mallocing the struct, loading it and pushing it onto the queue. If the disk is temporarily busy, or has high latency because it's on a network, or becomes actually unavailable, the calling thread/s will continue to run almost normally. With this, the set of apps that will fail just because the logging has been turned on is reduced. A user with some intermittent problem that you cannot reproduce can be instructed to turn on the logger with little chance that the logging will affect normal operations or, worse, introduce delays that cover up the bug.
2) For the same reasons as (1), adding/changing logger functionality, even at runtime, is much easier. For example, maybe you want to restrict the size of log files or raise a new date/timestamped log file every day. A 'normal' call would introduce a long delay into the calling thread while the old file was closed and the new one opened. If the logging is queued off, all you get is a temporary increase in the number of queued log structs.
3) Controlling the logging is easier. In a GUI app, the logger could have its own form where the logging options can be modified. You could have a 'New log file now' button which, when clicked, queued a 'LOGCONTROL' struct to the logging thread, along with all the other logging messages. When the thread gets it, it opens a new log file.
4) Forwarding the log messages is fairly easy. Maybe you want to watch the logged messages, as well as write them to disk: queue up a 'LOGCONTROL' struct that instructs the thread to save a function pointer passed in the struct and henceforth call this function with subsequent logging messages after writing them to disk. The function passed could queue up the messages to your GUI for display in a 'terminal' type window (PostMessage on Windows; Qt etc. have similar functionality to allow data to be passed to the GUI). Sure, on *nix, you could open a console window and 'tail -f' the log file, but this will not appear particularly elegant to a GUI user, is more difficult for users to manage, and is anyway not available as standard on Windows (how many users know how to copy and paste from a console window and email you the error message?).
Another possibility is that the logging thread might be instructed to stream the log text to a remote server - another 'LOGCONTROL' struct could pass the hostname/port to the logger thread. The temporary delays of opening the network connection to the server would not matter because of the queued communications.
5) 'Lazy writing' and other such performance enhancements become easier, but:
Disadvantages:
1) The main one is that when the log call returns to the requestor, the log operation has probably not yet happened. This is very bad news in the case of CRITICALERROR | CRASH, and can be unacceptable in some cases even with 'ordinary' logging of progress messages etc. There should be an option to bypass the queue in these cases and make a direct disk write/flush: fopen/CreateFile a separate 'Direct.log', append, write, flush, close. Slow, but secure, just in case the app explodes after the log call returns.
2) More complex, so more development, more conditionals, bigger API interface include.
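The direct bypass from disadvantage 1 (append, write, flush, close) could be sketched with standard C I/O like this. The name log_direct is hypothetical. Note that fflush() only hands the data to the operating system; surviving a crash of the whole machine would additionally need something like fsync(), which is left out here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: appends one line to the file at path and closes it
 * again before returning. Returns 0 on success, -1 if any step failed. */
int log_direct(const char *path, const char *message)
{
    assert(path != NULL && message != NULL);

    FILE *fp = fopen(path, "a");    /* "a": always append at the end */
    if (fp == NULL)
        return -1;

    int ok = (fputs(message, fp) != EOF) && (fputc('\n', fp) != EOF);
    if (fflush(fp) == EOF)          /* push buffered data to the OS */
        ok = 0;
    if (fclose(fp) == EOF)
        ok = 0;
    return ok ? 0 : -1;
}
```

Because the file is opened and closed on every call, the message has left the process by the time the call returns, which is what matters if the app explodes right afterwards.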
Rgds,
Martin
Hi I use this for another language but you could research it and follow its design
http://www.gurock.com/smartinspect/
regards
When malloc() fails, what would be the best way to handle the error? If it fails, I want to immediately exit the program, which I would normally do using exit(). But in this special case, I'm not quite sure if exit() would be the way to go here.
In library code, it's absolutely unacceptable to call exit or abort under any circumstances except when the caller broke the contract of your library's documented interface. If you're writing library code, you should gracefully handle any allocation failures, freeing any memory or other resources acquired in the attempted operation and returning an error condition to the caller. The calling program may then decide to exit, abort, reject whatever command the user gave which required excessive memory, free some unneeded data and try again, or whatever makes sense for the application.
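A minimal sketch of that pattern follows: a constructor that makes several allocations, releases whatever it already acquired when a later one fails, and reports the failure to the caller instead of exiting. The struct record and the functions record_create and record_destroy are hypothetical names for this example; the assert() only covers the "caller broke the contract" case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct record {
    char *name;
    unsigned char *payload;
    size_t payload_len;
};

/* Returns a new record, or NULL if any allocation failed. On failure,
 * everything acquired so far has been released again. */
struct record *record_create(const char *name, size_t payload_len)
{
    assert(name != NULL && payload_len > 0);    /* documented contract */

    struct record *rec = malloc(sizeof *rec);
    if (rec == NULL)
        return NULL;

    size_t name_size = strlen(name) + 1;
    rec->name = malloc(name_size);
    if (rec->name == NULL) {
        free(rec);                  /* undo the first allocation */
        return NULL;
    }
    memcpy(rec->name, name, name_size);

    rec->payload = calloc(payload_len, 1);
    if (rec->payload == NULL) {
        free(rec->name);            /* undo both earlier allocations */
        free(rec);
        return NULL;
    }
    rec->payload_len = payload_len;
    return rec;
}

void record_destroy(struct record *rec)
{
    if (rec != NULL) {
        free(rec->payload);
        free(rec->name);
        free(rec);
    }
}
```

The caller sees a single NULL and can decide for itself whether to exit, retry with less data, or reject the request.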
In all cases, if your application is holding data which has not been synchronized to disk and which has some potential value to the user, you should make every effort to ensure that you don't throw away this data on allocation failures. The user will almost surely be very angry. It's best to design your applications so that the "save" function does not require any allocations, but if you can't do that in general, you might instead want to perform frequent auto-save-to-temp-file operations or provide a way of dumping the memory contents to disk in a form that's not the standard file format (which might for example require ugly XML and ZIP libraries, each with their own allocation needs, to write) but instead a more "raw dump" which your application can read and recover from on the next startup.
If malloc() returns NULL it means that the allocation was unsuccessful. It's up to you to deal with this error case. I personally find it excessive to exit your entire process because of a failed allocation. Deal with it some other way.
Use Both?
It depends on whether the core file will be useful. If no one is going to analyze it, then you may as well simply _exit(2) or exit(3).
If the program will sometimes be used locally and you intend to analyze any core files produced, then that's an argument for using abort(3).
You could always choose conditionally, so, with --debug use abort(3) and without it use exit.
My application uses lseek() to seek the desired position to write data.
The file is successfully opened using open() and my application was able to use lseek() and write() lots of times.
At a given time, for some users and not easily reproducible, lseek() returns -1 with an errno of 9. The file is not closed before this and the file handle (int) isn't reset.
After this, another file is created; open() is okay again and lseek() and write() works again.
To make it even worse, this user tried the complete sequence again and all was well.
So my question is, can the OS close the file handle for me for some reason?
What could cause this? A file indexer or file scanner of some sort?
What is the best way to solve this; is this pseudo code the best solution?
(never mind the code layout, will create functions for it)
int fd = open(...);
if (fd > -1) {
    off_t result = lseek(fd, ....);
    if (result == -1 && errno == 9) {
        close(fd); // make sure we try to close nicely
        fd = open(...);
        result = lseek(fd, ....);
    }
}
Anybody experience with something similar?
Summary: file seek and write works okay for a given fd and suddenly gives back errno=9 without a reason.
So my question is, can the OS close the file handle for me for some reason? What could cause this? A file indexer or file scanner of some sort?
No, this will not happen.
What is the best way to solve this; is this pseudo code the best solution? (never mind the code layout, will create functions for it)
No, the best way is to find the bug and fix it.
Anybody experience with something similar?
I've seen fds getting messed up many times, resulting in EBADF in some of the cases and blowing up spectacularly in others. The causes have been:
buffer overflows: overflowing something and writing a nonsense value into an 'int fd;' variable.
silly bugs in some corner case, where someone wrote if(fd = foo[i].fd) when they meant if(fd == foo[i].fd)
race conditions between threads: some thread closes the wrong file descriptor that some other thread wants to use.
If you can find a way to reproduce this problem, run your program under 'strace', so you can see what's going on.
The OS will not close file handles randomly (I am assuming a Unix-like system). If your file handle is closed, then there is something wrong with your code, most probably elsewhere (thanks to the C language and the Unix API, this can be really anywhere in the code, and may be due to, e.g., a slight buffer overflow in some piece of code which looks completely unrelated).
Your pseudo-code is the worst solution, since it will give you the impression of having fixed the problem, while the bug still lurks.
I suggest that you add debug prints (i.e. printf() calls) wherever you open and close a file or socket. Also, try Valgrind.
(I just had yesterday a spooky off-by-1 buffer overflow, which damaged the least significant byte of a temporary slot generated by the compiler to save a CPU register; the indirect effect was that a structure in another function appeared to be shifted by a few bytes. It took me quite some time to understand what was going on, including some thorough reading of Mips assembly code).
I don't know what type of setup you have, but the following scenario, could I think produce such an effect (or else one similar to it). I have not tested this to verify, so please take it with a grain of salt.
If the file/device you are opening is implemented as a server application (e.g. NFS), consider what could happen if the server application goes down / restarts / reboots. The file descriptor, though originally valid at the client end, might no longer map to a valid file handle at the server end. This can conceivably lead to a sequence of events wherein the client will get EBADF.
Hope this helps.
No, the OS should not close file handles just like that, and other applications (file scanners etc.) should not be able to do it.
Do not work around the problem, find its source. If you don't know what the reason for your problem was, you will never know if your workaround actually does work.
Check your assumptions. Is errno set to 0 before the call? Is fd really valid at the point the call is being made? (I know you said it is, but did you check it?)
What is the output of puts( strerror( 9 ) ); on your platform?
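These checks can be gathered in a small diagnostic wrapper, sketched below under the assumption of a POSIX system. It clears errno before the call, and on failure prints both the number and the strerror() text. The name seek_checked and the message format are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: seeks to an absolute offset and reports failures.
 * Returns the new position, or (off_t)-1 with errno set on error. */
off_t seek_checked(int fd, off_t offset)
{
    assert(offset >= 0);            /* absolute offsets are never negative */

    errno = 0;                      /* make sure a stale value is not reported */
    off_t pos = lseek(fd, offset, SEEK_SET);
    if (pos == (off_t)-1) {
        int saved = errno;          /* fprintf() may overwrite errno */
        fprintf(stderr, "lseek(fd=%d, offset=%lld) failed: errno=%d (%s)\n",
                fd, (long long)offset, saved, strerror(saved));
        errno = saved;
    }
    return pos;
}
```

Logging the fd value together with the error makes it easier to spot a descriptor that was overwritten or closed elsewhere. Comparing errno against the symbolic EBADF rather than the literal 9 also keeps the check portable.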