Best practice for handling input checks for a void function - C

This may be a huge noob question, but I am relatively new to C and to using 'assert'.
Say I'm building a large program and have a void function test() which takes in an array and performs some manipulation on the array.
Now, as I build this program, I want to make sure that all the inputs to my functions are valid, so I want to make sure the array passed into test() is valid (i.e. not NULL, let's say).
I can write something like:
if (array == NULL) return;
However, when I'm testing and it just returns, it becomes hard to know whether my method succeeded at manipulating my array unless I check the array itself. Is it normal practice to add an assert in this case to check my condition, purely for my own debugging purposes? I've heard that assert is not compiled into production code, so the assert would only be there to help me, the programmer, test and debug. It seems kind of weird to have both the if statement and an assert, but I don't see how the if statement alone could quickly tell me whether my test method succeeded, and I don't see how assert could be a valid check for production code. So it seems like they're both needed?

If the contract of your function is that it requires a valid pointer, the best possible behavior is to crash loudly when a null or otherwise invalid pointer is passed. You can't test the validity of a pointer in general, but in the case of null pointers, dereferencing them will crash on most systems anyway. An assert would be an appropriate way of documenting this and ensuring a crash (unless NDEBUG is defined) to aid in diagnosing usage errors.
Changing your function to return an error status is not a good idea. It complicates the interface and lets the contract violation go unnoticed until later (or not at all if the caller does not check the return value).
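As a minimal sketch of this approach (the parameter types and the doubling loop are illustrative; the question only says the function manipulates an array):

#include <assert.h>
#include <stddef.h>

void test(int *array, size_t len)
{
    assert(array != NULL);      /* documents the contract; crashes in debug builds */
    for (size_t i = 0; i < len; i++)
        array[i] *= 2;          /* some manipulation of the array */
}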

You have the basic ideas.
Asserts are used to ensure some condition never occurs. The assert you've indicated (assert(a != NULL)) would be used if it is never valid to call the function with a NULL a.
As such, also testing if (a == NULL) would make no sense. The assert indicates you consider this invalid. If you've significantly tested your code with the assert, you've "proven" (at least that's the idea) that you never call the function with a as NULL.
However, if by design you intend that the function should gracefully ignore a when it is null, and do nothing, then a test is more appropriate. The assert would not be, because if you intend that the function is to be called with a null for some PURPOSE in mind, you don't need to be informed when that happens by having debug mode trigger a breakpoint.
Which is to say, it makes little sense to combine the two.
A more "advanced" usage may check other values for validity. Say I have some class which processes bitmaps, and the bitmap is "owned" by the class, dynamically allocated and deleted appropriately. If I'm going to call any function in the class that performs operations on the bitmap, I must know it could never be NULL. Asserts would be appropriate to test the member pointer storing the bitmap data to be sure those functions aren't being called when the pointer IS null.
This is key to using asserts. You're attempting to prove the condition never occurs during debugging. That's the basic concept. Since you're grappling with the possibility that it may be valid for such a value TO BE null, and the function should still otherwise operate gracefully, you may find the combination of asserts AND tests to be reasonable. That is, you want debug builds to alert you when some user of your code makes a call while the value IS null, but you still don't want production code to crash if it happens to BE null.
Sometimes that test is a performance hit you don't want to accept, so you fire asserts instead, so that consumers of your code know they're doing something you declare should never be done.
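A sketch of what that combination might look like, assuming a NULL argument is tolerated in release builds but treated as a programmer error during development (parameter names and the elided body are illustrative):

#include <assert.h>
#include <stddef.h>

void test(int *a, size_t len)
{
    assert(a != NULL);   /* debug builds: break here so you notice the bad call */
    if (a == NULL)
        return;          /* release builds: degrade gracefully instead of crashing */
    /* ... manipulate the array ... */
    (void)len;           /* body elided in this sketch */
}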

Related

Fast way to check if file pointer points to a valid file

I am looking for a fast (this is performance-critical code), safe and cross-platform way to check whether a FILE* in fact points to a file after a successful previous call to fopen().
Asking for current position with ftell() is one approach, but
I doubt that it is the fastest, most accurate or safest approach, or that there is no better, more straightforward way dedicated to this.
If a call to fopen has succeeded, but you want to know whether you've just opened a file or something else, I know of two general approaches:
Use fstat on the file descriptor (or stat on the same pathname you just opened), then inspect the mode bits.
Attempt to seek on the file descriptor. If this works as expected it's probably a file; if it doesn't it's a pipe or a socket or something like that.
The code for (1) might look like
#include <sys/stat.h>   /* for struct stat, fstat, S_IFMT, S_IFREG */

struct stat st;
fstat(fileno(fp), &st);
if ((st.st_mode & S_IFMT) == S_IFREG)
    /* it's a regular file */
To perform (2) I normally seek to offset 1, then test to see what offset I'm at. If I'm at 1, it's a seekable file, and I rewind to 0 for the rest of the program. But if I'm still at 0, it's not a seekable file. (And of course I do this once, right after I open the file, and record the result in my own flag associated with the open file, so the performance hit is minimal.)
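A sketch of approach (2), wrapped in a hypothetical helper (the name is_seekable and the exact offsets are illustrative, following the description above):

#include <stdio.h>

static int is_seekable(FILE *fp)
{
    /* seek to offset 1, then check where we actually ended up */
    int seekable = (fseek(fp, 1L, SEEK_SET) == 0) && (ftell(fp) == 1L);
    rewind(fp);   /* put the stream back at offset 0 for the rest of the program */
    return seekable;
}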
In C there are three kinds of pointer values:
Values that are NULL, because the programmer initialized them (or because they took advantage of default static initialization).
Values that were returned by a pointer-returning function such as fopen or malloc (and that have not yet been passed to fclose or free).
Values where neither 1 nor 2 is true.
And the simple fact is that if you have a pointer of kind 3, there is no mechanism in the language that will tell you whether the pointer is valid or not. If you have a pointer p that might have been obtained from malloc or not, but you can't remember, there is no way to ask the compiler or run-time system to tell you if it currently points to valid memory. If you have a FILE pointer fp that might have been obtained from fopen or not, but you can't remember, there is no way to ask the compiler or run-time system to tell you if it currently "points to" a valid file.
So it's up to you, the programmer, to keep track of pointer values, and to use programming practices which help you determine whether pointer values are valid or not.
Those ways include the following:
Always initialize pointer variables, either to NULL, or to point to something valid.
When you call a function that returns a pointer, such as fopen or malloc, always test the return value to see if it's NULL, and if it is, return early or print an error message or whatever is appropriate.
When you're finished with a dynamically-allocated pointer, and you release it by calling fclose or free or the equivalent, always set it back to NULL.
If you do these three things religiously, then you can test to see if a pointer is valid by doing
if(p != NULL)
or
if(p)
Similarly, and again if you do those things religiously, you can test to see if a pointer is invalid by doing
if(p == NULL)
or
if(!p)
But those tests work reliably only if you have performed steps 1 and 3 religiously. If you haven't, it's possible -- and quite likely -- for various pointer values to be non-NULL but invalid.
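A minimal sketch of those three rules applied together (the allocation size and messages are arbitrary):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *p = NULL;                 /* rule 1: initialize to NULL */

    p = malloc(100);                /* rule 2: test the return value */
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        return 1;
    }

    /* ... use p ... */

    free(p);
    p = NULL;                       /* rule 3: reset after freeing */

    if (p)                          /* now this test is reliable */
        puts("p is valid");
    else
        puts("p is not valid");

    return 0;
}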
The above is one strategy. I should point out that steps 1 and 3 are not strictly necessary. The other strategy is to apply step 2 religiously, and to never keep around -- never attempt to use -- a pointer that might be null. If functions like fopen or malloc return NULL, you either exit the program immediately, or return immediately from whatever function you're in, typically with a failure code that tells your caller you couldn't do your job (because you couldn't open the file you needed, or you couldn't allocate the memory you needed). In a program that applies rule 2 religiously, you don't even need to test pointers for validity, because all pointer values in such programs are valid. (Well, all pointers are valid as long as Rule 2 was applied religiously. If you forget to apply Rule 2 even once, things can begin to break down.)
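And a sketch of the second strategy, as a function that applies rule 2 religiously and reports failure to its caller rather than ever keeping a possibly-null pointer around (count_lines is a hypothetical example):

#include <stdio.h>

int count_lines(const char *path, long *out)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;                  /* tell the caller we couldn't do our job */

    long lines = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            lines++;

    fclose(fp);
    *out = lines;
    return 0;                       /* fp was valid for its entire lifetime */
}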
Trying to programmatically detect an invalid pointer is a little like hunting for witches.
Supposedly, one way to detect a witch was to hold her underwater. If she died, she was an ordinary human. But if she used her magical powers to avoid drowning, that meant she was a witch — so you killed her. (I can't remember just now if this was ever considered a "legitimate" method, or just a joke out of Monty Python and the Holy Grail.)
But, similarly, if you have a pointer that might be valid or might be invalid, and you try to test it by calling a function that acts on the pointer — like calling ftell on an unknown FILE pointer — there are two possible outcomes:
If the pointer was valid, the function will return normally.
But if the pointer was invalid, the behavior is undefined. In particular, it's significantly likely that the program will crash. That is, the function will not return normally, and it will not return with an error code, either. It will not return at all, because the program will have crashed, and your code (that was going to do one thing or the other depending on whether the pointer was or wasn't valid) won't run at all, because the whole program won't be running any more.
So, once again, if a pointer might or might not be valid, you (that is, explicit code in your program) must keep track of this fact somehow. If you have an unknown pointer value, for which you've lost track of its status, there is no well-defined way to determine its validity.

How to define a getter function

I'm writing a library which allows the user to build Bayesian nets. The structure of a net is encapsulated and the user cannot access its fields; however they can get and set some of them. Say, for instance, you want to write the accessor function for the field table (which is basically an array of doubles). Between the following options, which would be more appropriate?
First option:
int getTable(Net *net, double *res)
Second option:
double *getTable(Net *net)
In the first option, the user provides a pointer to the array where the table values will be written. The function copies the table values into res, leaving the user with no direct access to the internal structure. Modifying res leaves the table of the net unchanged. Of course, another function (say getTableSize()) is provided so that res can be allocated correctly. This seems safe (the internal structure stays coherent) and has the advantage that you can return an error code if something goes wrong. The downside is that this approach can be slower than the next one, since it involves a copy. Typically, the number of values in table may vary from just 1 to a couple of hundred.
In the second option, the function directly returns the pointer to the internal values. In the docs, I would specify that the user must not free the pointer or modify the values. Freeing the pointer would likely result in a segmentation fault and memory leaks once further operations on the net are performed. Modifying the table wouldn't result in any apparent error, but the internal coherence would be broken and the results of subsequent calculations might be very wrong and very hard for the user to debug.
Which option do you prefer? Is there other stuff to consider? Is there another approach to prefer?
Personally I would go for the first option because of the ability to return an error code. This would also solve your problem of the user wrongly freeing the returned value. And passing a pointer to a variable declared on the stack is easy.
Note that with the second option you can also make it clearer that the returned value must not be modified or freed, by returning a const pointer like this:
const double * getTable(Net *net);
That way, the caller cannot modify the values unless they cast away the const, but I think worrying about that would be going a bit too far, since the caller would then be intentionally breaking your interface.
More info on constness can be found on Wikipedia.
I think a good habit is to always demand a return code from functions that can fail for some reason.
Error handling is much more efficient when working with return codes.
I'd go with option one.
Also, I don't know if this is a mistake or not, but option two returns a pointer to double; if handing back the pointer itself is the intended behavior, then option one should have the signature:
int getTable(Net *net, double **res)
Additionally, as Eugene Sh. mentioned in the comments, some environments might not even support malloc (some embedded devices firmware comes to mind), so giving the user a choice whether to pass in a malloc'd variable or a stack allocated variable is also a good selling point for option one.
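For concreteness, a minimal sketch of option one might look like this (the Net layout shown is purely hypothetical; in the real library the struct is opaque to callers):

#include <string.h>

typedef struct Net {
    double *table;       /* internal table values */
    int     table_size;  /* number of values in table */
    /* ... other encapsulated fields ... */
} Net;

int getTableSize(Net *net)
{
    return (net != NULL) ? net->table_size : -1;
}

int getTable(Net *net, double *res)
{
    if (net == NULL || res == NULL)
        return -1;                         /* error code for bad arguments */
    memcpy(res, net->table, (size_t)net->table_size * sizeof(double));
    return 0;                              /* success: res now holds a copy */
}

The caller would query getTableSize() first, then pass in a buffer (stack-allocated or malloc'd) of at least that many doubles.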
I have two points for you to consider:
You want memory deallocation to be in the same place where allocation is. If you allocate memory in a function and return pointer to it, then the caller of the function has to know how to free it and has to remember to free it. Maintaining that kind of code would be a nightmare.
Premature optimization is the root of all evil. Do not optimize your code until you are certain (i.e. you measured) that exactly that part of code is causing problems.
That said, the first option is the only option.

How to handle error conditions in a void function

I'm making a data structures and algorithms library in C for learning purposes (so this doesn't necessarily have to be bullet-proof), and I'm wondering how void functions should handle errors on preconditions. If I have a function for destroying a list as follows:
void List_destroy(List* list) {
    /*
    ...
    free()'ing pointers in the list. Nothing to return.
    ...
    */
}
Which has a precondition that list != NULL, otherwise the function will blow up in the caller's face with a segfault.
So as far as I can tell I have a few options: one, I throw in an assert() statement to check the precondition, but that means the function would still blow up in the caller's face (which, as far as I have been told, is a big no-no when it comes to libraries), but at least I could provide an error message; or two, I check the precondition, and if it fails I jump to an error block and just return;, silently chugging along, but then the caller doesn't know the List* was NULL.
Neither of these options seem particularly appealing. Moreover, implementing a return value for a simple destroy() function seems like it should be unnecessary.
EDIT: Thank you everyone. I settled on implementing (in all my basic list functions, actually) consistent behavior for NULL List* pointers being passed to the functions. All the functions jump to an error block and exit(1) as well as report an error message to stderr along the lines of "Cannot destroy NULL list." (or push, or pop, or whatever). I reasoned that there's really no sensible reason why a caller should be passing NULL List* pointers anyway, and if they didn't know they were then by all means I should probably let them know.
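In code, the behaviour described in this edit might look like the following sketch (the message text and exit(1) follow the edit; List is the question's own type and the freeing body is elided):

#include <stdio.h>
#include <stdlib.h>

void List_destroy(List* list) {
    if (list == NULL) {
        fprintf(stderr, "Cannot destroy NULL list.\n");
        exit(1);                /* no sensible reason to pass NULL, so bail loudly */
    }
    /* ... free()'ing pointers in the list ... */
}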
Destructors (in the abstract sense, not the C++ sense) should indeed never fail, no matter what. Consistent with this, free is specified to return without doing anything if passed a null pointer. Therefore, I would consider it reasonable for your List_destroy to do the same.
However, a prompt crash would also be reasonable, because in general the expectation is that C library functions crash when handed invalid pointers. If you take this option, you should crash by going ahead and dereferencing the pointer and letting the kernel fire a SIGSEGV, not by assert, because assert has a different crash signature.
Absolutely do not change the function signature so that it can potentially return a failure code. That is the mistake made by the authors of close() for which we are still paying 40 years later.
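A minimal sketch of the free-like behaviour recommended here (again using the question's List type, with the body elided):

void List_destroy(List* list) {
    if (list == NULL)
        return;                 /* like free(NULL): do nothing, successfully */
    /* ... free()'ing pointers in the list ... */
}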
Generally, you have several options if a constraint of one of your functions is violated:
Do nothing, successfully
Return some value indicating failure (or set something pointed-to by an argument to some error code)
Crash randomly (i.e. introduce undefined behaviour)
Crash reliably (i.e. use assert or call abort or exit or the like)
Here (though this is my personal opinion) is a good rule of thumb:
The first option is the right choice if you think it's OK not to obey the constraints (i.e. they aren't real constraints); a good example of this is free.
The second option is the right choice if the caller can't know in advance whether the call will succeed; a good example is fopen.
The third and fourth options are a good choice if the former two don't apply. A good example is memcpy. I prefer the use of assert (a form of the fourth option) because, depending on whether NDEBUG is defined at compile time, it gives you both: a reliable crash for people who are unwilling to read your documentation, and plain undefined behaviour for people who do read it (they will avoid it by obeying your constraints). Dereferencing a pointer argument can itself serve as an assert, because it will make your program crash (which is the right thing; people not reading your documentation should crash as early as possible) if they pass an invalid pointer.
So, in your case, I would make it similar to free and would succeed without doing anything.
HTH
If you don't wish to return any value from the function, then it is a good idea to have one more argument for an error code.
void List_destroy(List* list, int* ErrCode) {
    *ErrCode = ...
}
Edit:
Changed & to * as question is tagged for C.
I would say that simply returning in case the list is NULL would make sense, as this would indicate that the list is empty (not an error condition). If list is an invalid pointer, you can't detect that; let the kernel handle it for you by giving a segfault, and let the programmer fix it.

Should I really worry about fixing null derefs if there are no crashes?

Clang's scan-build reports quite a few null pointer dereferences in my project; however, I don't really see any unusual behavior (in 6 years of using it), i.e.:
Dereference of null pointer (loaded from variable chan)
char *tmp;
CList *chan = NULL;
/* This is weird because chan is set via do_lookup so why could it be NULL? */
chan = do_lookup(who, me, UNLINK);
if (chan)
    tmp = do_lookup2(you, me, 0);
prot(get_sec_var(chan->zsets));
                 ^^^^
I know null derefs can cause crashes, but is this really as big a security concern as some people make it out to be? What should I do in this case?
It is Undefined Behavior to dereference a NULL pointer. It can show any behavior: it might crash or it might not, but you MUST fix those!
The truth about Undefined Behavior is that it obeys Murphy's Law
"Anything that can go wrong will go wrong"
It makes no sense checking chan for NULL at one point:
if (chan)
    tmp = do_lookup2(you,me,0); /* not evaluated if `chan` is NULL */
prot(get_sec_var(chan->zsets)); /* will be evaluated in any case */
... yet NOT checking it right at the next line.
Don't you have to execute both of these statements within the if branch?
Clang is warning you because you check for chan being NULL, and then you unconditionally dereference it in the next line anyway. This cannot possibly be correct. Either do_lookup cannot return NULL, then the check is useless and should be removed. Or it can, then the last line can cause undefined behaviour and MUST be fixed. Als is 100% correct: NULL pointer dereferences are undefined behaviour and are always a potential risk.
Probably you want to enclose your code in a block, so that all of it is governed by the check for NULL, and not just the next line.
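As a sketch of that suggestion, the fragment from the question would become (assuming the do_lookup2 result is also only wanted when chan is valid):

chan = do_lookup(who, me, UNLINK);
if (chan) {
    tmp = do_lookup2(you, me, 0);
    prot(get_sec_var(chan->zsets));  /* only reached when chan is non-NULL */
}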
You have to fix these as soon as possible. Or probably sooner. The Standard says that the NULL pointer is a pointer that points to "no valid memory location", so dereferencing it is undefined behaviour. It means that it may work, it may crash, and it may do strange things at other parts of your program, or maybe cause daemons to fly out of your nose.
Fix them. Now.
Here's how: put the dereference statement inside the if - doing otherwise (as you do: checking for NULL then dereferencing anyways) makes no sense.
if (pointer != NULL) {
    something = pointer->field;
}
^^ this is good practice.
If you have never experienced problems with this code, it's probably because:
do_lookup(who, me, UNLINK);
always returns a valid pointer.
But what will happen if this function changes? Or its parameters vary?
You definitely have to check for NULL pointers before dereferencing them.
if (chan)
    prot(get_sec_var(chan->zsets));
If you are absolutely sure that neither do_lookup nor its parameters will ever change (and you can bet the safe execution of your program on it), and the cost of changing all occurrences of similar functions is excessively high compared to the benefit of doing so, then:
you may decide to leave your code broken.
Many programmers did that in the past, and many more will do that in the future. Otherwise what would explain the existence of Windows ME?
If your program crashes because of a NULL pointer dereference, this can be classified as a Denial of Service (DoS).
If this program is used together with other programs (e.g. they invoke it), the security aspects now start to depend on what those other programs do when this one crashes. The overall effect can be the same DoS or something worse (exploitation, sensitive info leakage, and so on).
If your program does not crash because of a NULL pointer dereference and instead continues running while corrupting itself and possibly the OS and/or other programs within the same address space, you can have a whole spectrum of security issues.
Don't put broken code on the line (or online), unless you can afford to deal with the consequences of potential hacking.

Should my library handle SIGSEGV on bad pointer input?

I'm writing a small library that takes a FILE * pointer as input.
If I immediately check this FILE * pointer and find it leads to a segfault, is it more correct to handle the signal, set errno, and exit gracefully; or to do nothing and use the caller's installed signal handler, if he has one?
The prevailing wisdom seems to be "libraries should never cause a crash." But my thinking is that, since this particular signal is certainly the caller's fault, I shouldn't attempt to hide that information from him. He may have his own handler installed to react to the problem in his own way. The same information CAN be retrieved with errno, but the default disposition for SIGSEGV was set for a good reason, and passing the signal up respects this philosophy by either forcing the caller to handle his errors, or by crashing and protecting him from further damage.
Would you agree with this analysis, or do you see some compelling reason to handle SIGSEGV in this situation?
Taking over signal handlers is not a library's business; I'd say it's somewhat offensive of a library to do so unless explicitly asked. To minimize crashes, a library may validate its input to some extent. Beyond that: garbage in, garbage out.
The prevailing wisdom seems to be "libraries should never cause a crash."
I don't know where you got that from - if they pass an invalid pointer, you should crash. Any library will.
I would consider it reasonable to check for the special case of a NULL pointer. But beyond that, if they pass junk, they violated the function's contract and they get a crash.
This is a subjective question, and possibly not fit for SO, but I will present my opinion:
Think about it this way: If you have a function that takes a nul-terminated char * string and is documented as such, and the caller passes a string without the nul terminator, should you catch the signal and slap the caller on the wrist? Or should you let it crash and make the bad programmer using your API fix his/her code?
If your code takes a FILE * pointer, and your documentation says "pass any open FILE *", and they pass a closed or invalidated FILE * object, they've broken the contract. Checking for this case would slow down the code of people who properly use your library to accommodate people who don't, whereas letting it crash will keep the code as fast as possible for the people who read the documentation and write good code.
Do you expect someone who passes an invalid FILE * pointer to check for and correctly handle an error? Or are they more likely to blindly carry on, causing another crash later, in which case handling this crash may just disguise the error?
Kernels shouldn't crash if you feed them a bad pointer, but libraries probably should. That doesn't mean you should do no error checking; a good program dies immediately in the face of unreasonably bad data. I'd much rather a library call bail with assert(f != NULL) than to just trundle on and eventually dereference the NULL pointer.
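A minimal sketch of that stance (lib_read_header is a hypothetical library entry point; the cheap NULL check plus an early crash on anything else is the point):

#include <assert.h>
#include <stdio.h>

int lib_read_header(FILE *f)
{
    assert(f != NULL);   /* invalid input is the caller's bug: die as early as possible */
    return fgetc(f);     /* a closed or garbage FILE * will likely crash right here */
}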
Sorry, but people who say a library should crash are just being lazy (perhaps in how much thought they give the problem, as well as in development effort). Libraries are collections of functions. Library code should not "just crash" any more than other functions in your software should "just crash".
Granted, libraries may have some issues around how to pass errors across the API boundary, if multiple languages or (relatively) exotic language features like exceptions would normally be involved, but there's nothing TOO special about that. Really, it's just part of the burden of writing libraries, as opposed to in-application code.
Except where you really can't justify the overhead, every interface between systems should implement sanity checking, or better, design by contract, to prevent security issues, as well as bugs.
There are a number of ways to handle this. What you should probably do, in order of preference, is one of the following:
Use a language that supports exceptions (or better, design by contract) within libraries, and throw an exception on or allow the contract to fail.
Provide an error handling signal/slot or hook/callback mechanism, and call any registered handlers (see the sketch after this list). Require that, when your library is initialised, at least one error handler is registered.
Support returning some error code in every function that could possibly fail, for any reason. But this is the old, relatively insane way of doing things from C (as opposed to C++) days.
Set some global "an error has occurred" flag, and allow clearing that flag before calls. This is also old, and completely insane, mostly because it moves the error status maintenance burden to the caller, AND is unsafe when it comes to threading.
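A minimal sketch of the second option, a registered error callback, in C (all names here, lib_error_handler, lib_set_error_handler and lib_report_error, are illustrative and not from any real library):

#include <stdio.h>
#include <stdlib.h>

typedef void (*lib_error_handler)(int code, const char *msg);

static lib_error_handler g_handler = NULL;

void lib_set_error_handler(lib_error_handler h)
{
    g_handler = h;
}

/* Called internally by the library whenever a precondition or operation fails. */
static void lib_report_error(int code, const char *msg)
{
    if (g_handler != NULL) {
        g_handler(code, msg);        /* let the caller decide how to react */
    } else {
        fprintf(stderr, "library error %d: %s\n", code, msg);
        abort();                     /* no handler registered: fail loudly */
    }
}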
