I have studied the signal and alarm functions, but I am not satisfied with what I have read, so I thought that looking at their definitions might help me.
These might help -- good explanations:
1. http://www.gnu.org/software/libc/manual/html_node/Signal-Handling.html#Signal-Handling
2. http://www.cs.cf.ac.uk/Dave/C/node24.html
3. http://www.thegeekstuff.com/2012/03/catch-signals-sample-c-code/
The signal() function in the C library is usually a thin wrapper around a system call to the underlying kernel. alarm() does a bit more work, but again falls back on the kernel's idea of time handling and signal delivery.
If you really want to know how they work, you'll have to dig into the source of a Unix(-like) kernel. Be warned: the code you'll find is probably very complex. Kernel programmers have to handle some very exotic corner cases and be wary of weird uses that might lead to security problems, all while keeping the code as fast as possible (it's code that'll be used hundreds or thousands of times a second on millions of machines).
Next best would be to check out a book on Unix internals.
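For the user-space side of the picture, here is a minimal sketch of the two calls working together -- install a SIGALRM handler, then ask the kernel for an alarm (the handler name and the 2-second delay are just illustrative):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_alarm = 0;

static void on_alarm(int sig)
{
    (void)sig;
    got_alarm = 1;                 /* only set a flag inside the handler */
}

int main(void)
{
    signal(SIGALRM, on_alarm);     /* register the handler */
    alarm(2);                      /* ask the kernel to deliver SIGALRM in ~2 seconds */

    while (!got_alarm)
        pause();                   /* sleep until any signal arrives */

    puts("alarm fired");
    return 0;
}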
Man pages are the best place to learn about the behaviour and usage of any system-defined function. From your question, I guess you actually mean the implementations of the functions.
But ask yourself why you need to look into the code. All it does is raise more questions, since most of the variables and functions used will be defined somewhere else, and you will need to look those up too.
If you are so dissatisfied, then try some tough problems involving these functions.
I really want to put more in the body to explain the question… but the title really covers it all. As far as I can suss, librt is more “official” (it’s a standard part of libc?), but I also remember seeing that Node.js uses libeio. Which should I spend more time looking into? What about portability? How different are their APIs?
(I’d appreciate it if somebody with ≥1,500 rep could add the tags “libeio” and “librt” to this question, as I cannot.)
libeio wraps standard calls in threads, and handles a large swath of the common system calls.
librt's asynchronous I/O (POSIX AIO) only covers a few calls -- aio_read and aio_write, for example, but not stat.
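For a rough feel of the librt (POSIX AIO) side, here is a minimal polling sketch; the file name is just an arbitrary example, and on glibc you link with -lrt:

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[256];
    int fd = open("/etc/hostname", O_RDONLY);   /* arbitrary example file */
    if (fd < 0)
        return 1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;

    aio_read(&cb);                              /* queue the asynchronous read */

    while (aio_error(&cb) == EINPROGRESS)       /* poll until it completes */
        usleep(1000);

    ssize_t n = aio_return(&cb);                /* bytes read, or -1 on error */
    if (n > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(fd);
    return 0;
}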
In a current project I dared to do away with the old 0 rule, i.e. returning 0 on success of a function. How is this seen in the community? The logic that I am imposing on the code (and therefore on my co-workers and all subsequent maintenance programmers) is:
> 0: for any kind of success/fulfilment, that is, a positive outcome
== 0: for signalling no progress, busy, or unfinished, that is, zero information about the outcome
< 0: for any kind of error/infeasibility, that is, a negative outcome
Sitting in between a lot of hardware units with unpredictable response times in a realtime system, many of the functions need to convey exactly this ternary logic, so I decided it was legitimate to throw away the minimalistic standard return logic, at the cost of a few WTFs on the programmers' side.
Opinions?
PS: On a side note, the Roman Empire collapsed because the Romans, with their number system lacking the 0, never knew when their C functions succeeded!
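For concreteness, here is a minimal sketch of the convention described above; the function name and the magic numbers are purely hypothetical:

#include <stdio.h>

/* Hypothetical poll of a hardware unit:
   > 0  : result is ready (positive outcome)
   == 0 : still busy, no information yet
   < 0  : error (negative outcome) */
static int poll_unit(int attempts_so_far)
{
    if (attempts_so_far > 5)
        return -1;                 /* gave up: error */
    if (attempts_so_far < 3)
        return 0;                  /* still busy */
    return 42;                     /* measurement available */
}

int main(void)
{
    for (int attempt = 0; ; attempt++) {
        int rc = poll_unit(attempt);
        if (rc > 0)  { printf("result: %d\n", rc); break; }
        if (rc == 0) continue;     /* try again later */
        puts("unit failed");
        break;
    }
    return 0;
}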
"Your program should follow an existing convention if an existing convention makes sense for it."
Source: The GNU C Library
By deviating from such a widely known convention, you are creating a high level of technical debt. Every single programmer that works on the code will have to ask the same questions, every consumer of a function will need to be aware of the deviation from the standard.
http://en.wikipedia.org/wiki/Exit_status
I think you're overstating the status of this mythical "rule". Much more often, it's that a function returns a nonnegative value on success indicating a result of some sort (number of bytes written/read/converted, current position, size, next character value, etc.), and that negative values, which otherwise would make no sense for the interface, are reserved for signalling error conditions. On the other hand, some functions need to return unsigned results, but zero never makes sense as a valid result, and then zero is used to signal errors.
In short, do whatever makes sense in the application or library you are developing, but aim for consistency. And I mean consistency with external code too, not just your own code. If you're using third-party or library code that follows a particular convention and your code is designed to be closely coupled to that third-party code, it might make sense to follow that code's conventions so that other programmers working on the project don't get unwanted surprises.
And finally, as others have said, whatever your convention, document it!
It is fine as long as you document it well.
I think it ultimately depends on the customers of your code.
In my last system we used more or less the same coding system as yours, with "0" meaning "I did nothing at all" (e.g. calling Init() twice on an object). This worked perfectly well and everybody who worked on that system knew this was the convention.
However, if you are writing an API that can be sold to external customers, or writing a module that will be plugged into an existing, "standard-RC" system, I would advise you to stick to the 0-on-success rule, in order to avoid future confusion and possible pitfalls for other developers.
And as per your PS: when in Rome, do as the Romans do :-)
I think you should follow the Principle of Least Astonishment:

The POLA states that, when two elements of an interface conflict, or are ambiguous, the behaviour should be that which will least surprise the user; in particular, a programmer should try to think of the behavior that will least surprise someone who uses the program, rather than the behavior that is natural from knowing the inner workings of the program.
If your code is for internal consumption only, you may get away with it, though. So it really depends on the people your code will impact :)
There is nothing wrong with doing it that way, assuming you document it in a way that ensures others know what you're doing.
However, as an alternative, it might be worth exploring the option of returning an enumerated type defining the codes. Something like:
enum returnCode {
    SUCCESS,
    FAILURE,
    NO_CHANGE
};
That way, it's much more obvious what your code is doing -- self-documenting, even. But it might not be an option, depending on your code base.
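A caller might then look like this (a sketch; the function name tryUpdate() is hypothetical):

#include <stdio.h>

enum returnCode { SUCCESS, FAILURE, NO_CHANGE };

/* hypothetical operation reporting the three-way outcome */
static enum returnCode tryUpdate(int value)
{
    if (value < 0)  return FAILURE;
    if (value == 0) return NO_CHANGE;
    return SUCCESS;
}

int main(void)
{
    switch (tryUpdate(42)) {
    case SUCCESS:   puts("updated");        break;
    case NO_CHANGE: puts("nothing to do");  break;
    case FAILURE:   puts("update failed");  break;
    }
    return 0;
}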
It is a convention only. I have worked with many APIs that abandon the principle when they want to convey more information to the caller. As long as you're consistent with this approach, any experienced programmer will quickly pick up the standard. What is hard is when each function uses a different approach, e.g. the Win32 API.
In my opinion (and that's the opinion of someone who tends to do out-of-band error messaging thanks to working in Java), I'd say it is acceptable if your functions are of a kind that require strict return-value processing anyway.
So if the return value of your method has to be inspected at all points where it's called, then such a non-standard solution might be acceptable.
If, however, the return value might be ignored or just checked for success at some points, then the non-standard solution causes quite a few problems (for example, you can no longer use the if (!myFunction()) ohNoesError(); idiom).
What is your problem? It is just a convention, not a law. If your logic makes more sense for your application, then it is fine, as long as it is well documented and consistent.
On Unix, exit status is unsigned, so this approach won't work if you ever have to run your program there, and it will confuse all your Unix programmers to no end. (I looked it up just now to make sure, and discovered to my surprise that Windows uses a signed exit status.) So I guess it will probably only mostly confuse your Windows programmers. :-)
I'd find another method to pass status between processes. There are many to choose from, some quite simple. You say "at the cost of a few WTF's on the programmers side" as if that's a small cost, but it sounds like a huge cost to me. Re-using an int in C is a minuscule benefit to be gained from confusing other programmers.
You need to decide on a case-by-case basis. Think about the API and what you need to return. If your function only needs to return success or failure, I'd say give it an explicit type of bool (C99 has a bool type now) and return true for success and false for failure. That way things like:
if (!doSomething())
{
    // failure processing
}
read naturally.
In many cases, however, you want to return some data value, in which case some specific value that is unused (or unlikely to be used) must be reserved for the failure case. For example, the Unix system call open() has to return a file descriptor. 0 is a valid file descriptor, as is theoretically any positive number (up to the maximum a process is allowed), so -1 is chosen as the failure case.
In other cases, you need to return a pointer. NULL is an obvious choice for failure of pointer returning functions. This is because it is highly unlikely to be valid and on most systems can't even be dereferenced.
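To illustrate both conventions side by side, a small sketch using standard calls only:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* -1 marks failure for open() because any value >= 0 is a valid descriptor */
    int fd = open("does-not-exist.txt", O_RDONLY);
    if (fd == -1)
        fprintf(stderr, "open failed: %s\n", strerror(errno));
    else
        close(fd);

    /* NULL marks failure for pointer-returning functions such as malloc() */
    char *p = malloc(64);
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    free(p);
    return 0;
}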
One of the most important considerations is whether the caller and the called function or program will be updated by the same person at any given time. If you are maintaining an API where a function will return the value to a caller written by someone who may not even have access to your source code, or when it is the return code from a program that will be called from a script, only violate conventions for very strong reasons.
You are talking about passing information across a boundary between different layers of abstraction. Violating the convention ties both the caller and the callee to a different protocol increasing the coupling between them. If the different convention is fundamental to what you are communicating, you can do it. If, on the other hand, it is exposing the internals of the callee to the caller, consider whether you can hide the information.
I have to use C for one project and I am thinking of using longjmp/setjmp for error handling, as I think it will be much easier to handle errors in one central place than to propagate return codes. I would appreciate some leads on how to do this.
I am particularly concerned with resource cleanup being correctly done if any such error occurs.
Also, how do I handle errors in multi-threaded programs using this approach?
Even better, is there some C library that already exists for error/exception handling?
Have a look at this example/tutorial:
http://www.di.unipi.it/~nids/docs/longjump_try_trow_catch.html
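In the same spirit as that tutorial (this is a bare sketch of the core pattern, not the tutorial's actual macros), centralized error handling with setjmp/longjmp looks roughly like this:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf error_ctx;          /* where longjmp() will return to */

/* hypothetical worker that reports failure by jumping out */
static void do_work(int fail)
{
    if (fail)
        longjmp(error_ctx, 42);    /* unwinds straight back to setjmp() */
    puts("work succeeded");
}

int main(void)
{
    int err = setjmp(error_ctx);   /* returns 0 on the initial call */
    if (err == 0)
        do_work(1);                /* the "try" block */
    else
        printf("caught error %d\n", err);   /* the central "catch" block */
    return 0;
}

Note that any resources acquired between the setjmp() and the longjmp() are not released automatically, which is exactly the cleanup problem the answers below warn about.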
If you are worried about resource cleanup, you have to seriously wonder whether longjmp() and setjmp() are a good idea.
If you design your resource allocation system so that you can in fact clean up accurately, then it is OK -- but that design tends to be tricky, and it is typically incomplete if the standard libraries that your code uses themselves allocate resources that must be released. It requires extraordinary care, and because it is not wholly reliable, it is not suitable for long-running systems that might need to survive repeated uses of the setjmp()/longjmp() calls (they'll leak, grow, and eventually cause problems).
Symbian implemented its Leave mechanism in terms of longjmp(), and it serves as a good walkthrough of all the things you need to do.
Symbian has a global 'cleanup stack' onto which you push and from which you pop the things you want cleaned up should a jump happen. This is the manual alternative to the automatic stack unwinding that a C++ compiler does when a C++ exception is thrown.
Symbian had 'trap harnesses' that it would jump out to; these could be nested.
(Symbian more recently reimplemented it in terms of C++ exceptions, but the interface remains unchanged).
Altogether, I think that proper C++ exceptions are less prone to coding errors and much faster than rolling your own C equivalent.
(Modern C++ compilers are very good at 'zero overhead' exceptions when they are not thrown, for example; setjmp(), by contrast, has to store the state of the registers even when the jump is never taken, so it can fundamentally never be as fast as exceptions.)
Using C++ as a better C, where you only adopt exceptions and RAII, would be a good route should using longjmp() for exception emulation be tempting to you.
I have only ever found one use for setjmp()/longjmp() and it wasn't to do with error handling.
There really is no need to use it for that since it can always be refactored into something easier to follow. The use of setjmp()/longjmp() is very similar to goto in that it can be easily abused. Anything that makes your code less readable is a bad idea in general. Note that I'm not saying they're inherently bad, just that they can lead to bad code easier than the alternatives.
FWIW, the one place they were invaluable was a project I did in the early days of the industry (MS-DOS 6 time frame). I managed to put together a co-operative multi-threading library using Turbo C which used those functions in a yield() function to switch tasks.
I'm pretty certain I haven't touched them (or had the need to) since those days.
Exceptions are by far a better general mechanism, but in the deep dark days of C past, I wrote a processor emulator that included a command shell. The shell used setjmp/longjmp for interrupt handling (i.e., the processor is running, the user hits break/Ctrl-C, the code traps SIGINT and longjmps back to the shell).
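That pattern might look roughly like this (a sketch; it uses sigsetjmp/siglongjmp so the signal mask is restored as well, and pause() stands in for the emulator's run loop):

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static sigjmp_buf shell_ctx;

static void on_sigint(int sig)
{
    (void)sig;
    siglongjmp(shell_ctx, 1);      /* abandon whatever was running, back to the shell */
}

int main(void)
{
    signal(SIGINT, on_sigint);

    for (;;) {
        if (sigsetjmp(shell_ctx, 1) != 0)   /* non-zero means we arrived via Ctrl-C */
            puts("\ninterrupted -- back at the prompt");

        printf("emu> ");
        fflush(stdout);

        pause();                   /* stand-in for "run the emulated CPU until interrupted" */
    }
}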
I've used setjmp/longjmp reasonably tidily, to escape from within a callback, without having to negotiate my way up through various other library levels.
That case (if I recall correctly) was where code within a yacc-generated parser could detect a (non-syntactical) problem and wanted to abandon the parse but give a reasonably useful error report back to the caller on the other side of all the yacc-generated code. Another example was within a callback called from an Expat parser. In each case, there were other ways of doing this, but they seemed more cumbersome and obscure than simply bailing out in this way.
As other answers have pointed out, though, it's necessary to be careful about clean-up, and very thoughtful about making sure that the longjmp code is callable only within the scope of the region dynamically protected by the setjmp.
Doing it in the context of multi-threaded programming? I'm sure that's not impossible, but Oooh: get out your family-pack of aspirin now. It's probably wise to keep the setjmp/longjmp pairs as close together as possible. As long as a matching setjmp/longjmp pair is within the same thread, I expect you'll be OK, but ... be careful out there.
One of the hardest things for me to initially adjust to was my first intense experience programming with pthreads in C. I was used to knowing exactly what the next line of code to be run would be and most of my debugging techniques centered around that expectation.
What are some good techniques to debugging with pthreads in C? You can suggest personal methodologies without any added tools, tools you use, or anything else that helps you debug.
P.S. I do my C programming using gcc in linux, but don't let that necessarily restrain your answer
Valgrind is an excellent tool to find race conditions and pthreads API misuses. It keeps a model of accesses to program memory (and perhaps to shared resources) and will detect missing locks even when the bug is benign (which of course means that it will completely unexpectedly become less benign at some later point).
To use it, you invoke valgrind --tool=helgrind; see its manual. There is also valgrind --tool=drd (see its manual). Helgrind and DRD use different models, so they detect overlapping but possibly different sets of bugs. False positives may also occur.
Anyway, valgrind has saved countless hours of debugging (not all of them though :) for me.
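To see what such a run looks like, here is a deliberately racy toy program; build it with gcc -g -pthread race.c -o race and run valgrind --tool=helgrind ./race to have the missing lock reported:

#include <pthread.h>
#include <stdio.h>

static int counter = 0;            /* shared and deliberately unprotected */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        counter++;                 /* data race: no lock around the update */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %d\n", counter);   /* usually less than 200000 */
    return 0;
}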
One of the things that will surprise you about debugging threaded programs is that you will often find the bug changes, or even goes away, when you add printf's or run the program in the debugger (this is colloquially known as a Heisenbug).
In a threaded program, a Heisenbug usually means you have a race condition. A good programmer will look for shared variables or resources that are order-dependent. A crappy programmer will try to blindly fix it with sleep() statements.
Debugging a multithreaded application is difficult. A good debugger such as GDB (with optional DDD front end) for the *nix environment or the one that comes with Visual Studio on windows will help tremendously.
In the 'thinking' phase, before you start coding, use the State Machine concept. It can make the design much clearer.
printf's can help you understand the dynamics of your program. But they clutter up the source code, so wrap them in a macro DEBUG_OUT() and, in its definition, enable it with a boolean flag. Better still, set/clear this flag with a signal that you send via 'kill -USR1'. Send the output to a log file with a timestamp (a sketch follows at the end of this answer).
Also consider using assert(), and then analyze your core dumps using gdb and ddd.
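One way such a DEBUG_OUT() macro might look (a sketch with hypothetical names; opening the log file on every call is chosen purely for brevity):

#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t debug_on = 0;

static void toggle_debug(int sig)
{
    (void)sig;
    debug_on = !debug_on;          /* flipped from outside via: kill -USR1 <pid> */
}

/* log with a timestamp, but only when debugging is enabled */
#define DEBUG_OUT(...)                                       \
    do {                                                     \
        if (debug_on) {                                      \
            FILE *log = fopen("debug.log", "a");             \
            if (log) {                                       \
                fprintf(log, "[%ld] ", (long)time(NULL));    \
                fprintf(log, __VA_ARGS__);                   \
                fclose(log);                                 \
            }                                                \
        }                                                    \
    } while (0)

int main(void)
{
    signal(SIGUSR1, toggle_debug);
    for (;;) {
        DEBUG_OUT("still alive, worker state = %d\n", 42);
        sleep(1);
    }
}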
My approach to multi-threaded debugging is similar to single-threaded, but more time is usually spent in the thinking phase:
Develop a theory as to what could be causing the problem.
Determine what kind of results could be expected if the theory is true.
If necessary, add code that can disprove or verify your results and theory.
If your theory is true, fix the problem.
Often, the 'experiment' that proves the theory is the addition of a critical section or mutex around suspect code. I will then try to narrow down the problem by systematically shrinking the critical section. Critical sections are not always the best fix (though can often be the quick fix). However, they're useful for pinpointing the 'smoking gun'.
Like I said, the same steps apply to single-threaded debugging, though there it is far too easy to just jump into a debugger and have at it. Multi-threaded debugging requires a much stronger understanding of the code, as I usually find that running multi-threaded code through a debugger doesn't yield anything useful.
Also, Helgrind is a great tool. Intel's Thread Checker performs a similar function for Windows, but costs a lot more than Helgrind.
I pretty much develop in an exclusively multi-threaded, high performance world so here's the general practice I use.
Design- the best optimization is a better algorithm:
1) Break your functions into LOGICALLY separable pieces. This means that a call does "A" and ONLY "A" -- not A, then B, then C...
2) NO SIDE EFFECTS: Abolish all naked global variables, static or not. If you cannot fully abolish side effects, isolate them to a few locations (concentrate them in the code).
3) Make as many isolated components RE-ENTRANT as possible. This means they're stateless -- they take all their inputs as constants and only manipulate DECLARED, logically constant parameters to produce the output. Pass by value instead of by reference wherever you can (see the sketch after this list).
4) If you have state, make a clear separation between stateless sub-assemblies and the actual state machine. Ideally the state machine will be a single function or class manipulating stateless components.
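A tiny sketch of points 3 and 4 -- a stateless, re-entrant helper driven by one explicit piece of state (all names are hypothetical):

#include <stdio.h>

/* Re-entrant: no globals, no static state; the result depends only on the
   inputs, which are passed by value. */
static double scale_and_offset(double sample, double gain, double offset)
{
    return sample * gain + offset;
}

/* The stateful part is kept separate: a single, explicit state struct that
   drives the stateless helper above. */
struct filter_state {
    double gain;
    double offset;
};

static double filter_step(const struct filter_state *st, double sample)
{
    return scale_and_offset(sample, st->gain, st->offset);
}

int main(void)
{
    struct filter_state st = { 2.0, 0.5 };
    printf("%f\n", filter_step(&st, 1.0));   /* prints 2.500000 */
    return 0;
}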
Debugging:
Threading bugs tend to come in two broad flavors -- races and deadlocks. As a rule, deadlocks are much more deterministic.
1) Do you see data corruption?: YES => Probably a race.
2) Does the bug arise on EVERY run, rather than just on some runs?: YES => Likely a deadlock (races are generally non-deterministic).
3) Does the process ever hang?: YES => There's a deadlock somewhere. If it only hangs sometimes, you probably have a race too.
Breakpoints often act much like synchronization primitives THEMSELVES in the code, because they're logically similar -- they force execution to stall in the current context until some other context (you) sends a signal to resume. This means that you should view any breakpoints you have in code as altering its multi-threaded behavior, and breakpoints WILL affect race conditions but (in general) not deadlocks.
As a rule, this means you should remove all breakpoints, identify the type of bug, THEN reintroduce them to try and fix it. Otherwise, they simply distort things even more.
I tend to use lots of breakpoints. If you don't actually care about the thread function, but do care about its side effects, a good time to check them might be right before it exits or loops back to its waiting state or whatever else it's doing.
When I started doing multithreaded programming I... stopped using debuggers.
For me the key point is good program decomposition and encapsulation.
Monitors are the easiest way of error-free multithreaded programming (see the sketch at the end of this answer).
If you cannot avoid complex lock dependencies, then it is easy to check whether they are cyclic -- wait until the program hangs and check the stack traces using 'pstack'.
You can break cyclic locks by introducing some new threads and asynchronous communication buffers.
Use assertions, and make sure to write single-threaded unit tests for particular components of your software -- you can then run them in a debugger if you want.
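The monitor mentioned above, in C, is typically just a struct whose fields are only touched while holding its mutex, plus a condition variable for waiting. A minimal sketch (compile with -pthread; the names are hypothetical):

#include <pthread.h>

/* A tiny "monitor": the data is only touched while holding the mutex,
   and consumers wait on the condition variable until there is work. */
struct monitor_queue {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    int             item;
    int             has_item;
};

static void mq_put(struct monitor_queue *q, int value)
{
    pthread_mutex_lock(&q->lock);
    q->item = value;
    q->has_item = 1;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static int mq_get(struct monitor_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (!q->has_item)                 /* re-check the predicate after every wakeup */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int value = q->item;
    q->has_item = 0;
    pthread_mutex_unlock(&q->lock);
    return value;
}

int main(void)
{
    struct monitor_queue q = { PTHREAD_MUTEX_INITIALIZER,
                               PTHREAD_COND_INITIALIZER, 0, 0 };
    mq_put(&q, 7);
    return mq_get(&q) == 7 ? 0 : 1;
}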
I am writing an academic project about extremely long functions in the Linux kernel.
For that purpose, I am looking for examples of real-life functions that are extremely long (a few hundred lines of code) that you don't consider bad programming (i.e., they wouldn't benefit from decomposition or the use of a dispatch table).
Have you ever written or seen such code? Can you post or link to it, and explain why it is so long?
I have been getting amazing help from the community here - any idea that will be taken into the project will be properly credited.
Thanks,
Udi
The longest functions that I have ever written all have one thing in common: a very large switch statement. There are times when you have to switch on a long list of items, and it would only make things harder to understand if you tried to refactor some of the options into a separate function. Having large switch statements makes the cyclomatic complexity go through the roof, but it is often better than the alternative implementations.
It was the last one before I got fired.
A previous job: An extremely long case statement, IIRC 1000+ lines. This was long before objects. Each option was only a few lines long. Breaking it up would have made it less clear. There were actually a pair of such routines doing different things to the same underlying set of data types.
Sorry, I don't have the code anymore and it isn't mine to post, anyway.
The longest function that I didn't see as being horrible would be the key method of a custom CPU VM. As with #epotter, this involved a big switch statement. In fact, I'd say a lot of the methods that I find resist being cleanly broken down or improved in readability involve switch statements.
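For a sense of why such a function grows so long, here is a toy byte-code interpreter whose dispatch loop is one big switch; a real VM simply has hundreds of cases instead of four:

#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const int *code)
{
    int stack[64], sp = 0;

    for (int pc = 0; ; pc++) {
        switch (code[pc]) {
        case OP_PUSH:  stack[sp++] = code[++pc];          break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];   break;
        case OP_PRINT: printf("%d\n", stack[sp - 1]);       break;
        case OP_HALT:  return;
        /* ... a real VM would have hundreds of cases here ... */
        }
    }
}

int main(void)
{
    const int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(program);                  /* prints 5 */
    return 0;
}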
Unfortunately, you won't often find this type of subroutine checked in or posted somewhere if it's auto-generated during a build step using some sort of code generator.
So look for projects that have C generated from another language.
Besides performance, I think the size of the call stack in kernel space is 8 KB (please verify the size). Also, as far as I know, kernel code is fairly specific. If some code is unlikely to be re-used in the future, why bother making it a separate function, given the function call overhead?
I could imagine that when speed is important (such as when holding some sort of lock in the kernel) you would not want to break up a function because of the overhead of making a function call. When compiled, parameters have to be pushed onto the stack and data has to be popped off before returning. Therefore you may have a large function for efficiency reasons.