How do I explain 'main()'? - c

I'm creating a presentation on how to program in C, and since I'm fairly new to C, I want to check whether my assumptions are correct and what I am missing.
Every C program has to have an entry point for the OS to know where to begin execution. This is defined by the main() function. This function always has a return value, whether it be user defined or an implicit return 0;.
Since this function is returning something, we must define the type of the thing it returns.
This is where my understanding starts to get hazy...
Why does the entry point need to have a return value?
Why does it have to be an int?
What does the OS do with the address of int main() after the program executes?
What happens in that address when say a segfault or some other error halts the program without reaching a return statement?

Every program terminates with an exit code. This exit code is determined by the return of main().
Programs typically return 0 for success or 1 for failure, but you can choose to use exit codes for other purposes.

1 and 2 are because the language says so.
For 3: Most operating systems have some sort of process management, and a process exits by invoking a suitable operating system service to do so, which takes a status value as an argument. For example, both DOS and Linux have "exit" system calls which accept one numeric argument.
For 4: Following from the above, operating systems typically also allow processes to die in response to receiving a signal which is not ignored or handled. In a decent OS you should be able to distinguish whether a process has exited normally (and retrieve its exit status) or been killed because of a signal (and retrieve the signal number). For instance, in Linux the wait system call provides this service.
Exit statuses and signals provide a simple mechanism for processes to communicate with one another in a generic way without the need for a custom communications infrastructure. It would be significantly more tedious and cumbersome to use an OS which didn't have such facilities or something equivalent.

Related

C difference between main thread and other threads

Is there a difference between the first thread and other threads created during runtime. Because I have a program where to abort longjmp is used and a thread should be able to terminate the program (exit or abort don't work in my case). Could I safely use pthread_kill_other_threads_np and then longjmp?
I'm not sure what platform you're talking about, but pthread_kill_other_threads_np is not a standard function and not a remotely reasonable operation, any more than free_all_malloced_memory would be. Process termination inherently involves the termination of all threads atomically with respect to each other (they don't see each other terminate).
As for longjmp, while there is nothing wrong with longjmp, you cannot use it to jump to a context in a different thread.
It sounds like you have an XY problem here; you've asked about whether you can use (or how to use) particular tools that are not the right tool for whatever it is you want, without actually explaining what your constraints are.

exit() function in C — what happens if we do not write this?

I have seen many posts regarding this question. Many say that exit(EXIT_SUCCESS) should be called for successful termination, and exit(EXIT_FAILURE) for unsuccessful termination.
What I want to ask is: What if we do not call the exit() function and instead what if we write return 0 or return -1? What difference does it make?
What happens if successful termination does not happen? What are its effects?
It is said that if we call the exit() function, the program becomes portable --
"portable" in what sense? How can one function make the entire code portable?
It is said that execution returns to the parent. What happens if execution does not return to the parent?
My questions may seem to be silly but I need answers for all of these to get rid of my ambiguity between return and exit.
Return returns a value from a function. Exit will exit your program. When called within the main function, they are essentially the same. In fact, most standard libc _start functions call main and then call exit on the result of main anyway.
Nothing, directly. The caller (usually your shell) will get the return value, and be able to know from that whether your program succeeded or not.
I don't know about this. I don't know what you mean here. exit is a standard function, and you may use it if you wish, or you can decide not to. Personally, I prefer to return from main only, and only return error status from other functions, to let the caller decide what to do with it. It's usually bad form to write an API that kills the program on anything but an unrecoverable error that the program can't manage.
If execution didn't return to the parent (which is either a shell or some other program), it would just hang forever. Your OS makes sure that this doesn't happen in most ordinary cases.

C goto different function

I'm working with an embedded system where the exit() call doesn't seem to exist.
I have a function that calls malloc and rather than let the program crash when it fails I'd rather exit a bit more gracefully.
My initial idea was to use goto however the labels seem to have a very limited scope (I'm not sure, I've never used them before "NEVER USE GOTO!!1!!").
I was wondering if it is possible to goto a section of another function or if there are any other creative ways of exiting a C program from an arbitrary function.
void main() {
//stuff
a();
exit:
return;
}
void a() {
//stuff
//if malloc failed
goto exit;
}
Thanks for any help.
Options:
since your system is non-standard (or perhaps is standard but non-hosted), check its documentation for how to exit.
try abort() (warning: this will not call atexit handlers).
check whether your system allows you to send a signal to yourself that will kill yourself.
return a value from a() indicating error, and propagate that via error returns all the way back to main.
check whether your system has setjmp/longjmp. These are difficult to use correctly but they do provide what you asked for: the ability to transfer execution from anywhere in your program (not necessarily including a signal/interrupt handler, but then you probably wouldn't be calling malloc in either of those anyway) to a specific point in your main function.
if your embedded system is such that your program is the only code that runs on it, then instead of exiting you could call some code that goes into an error state: perhaps an infinite loop, that perhaps flashes an LED or otherwise indicates that badness has happened. Maybe you can provoke a reboot.
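Why don't you use return values?
/* in a(); ptr is the pointer returned by malloc */
if (ptr == NULL)
    return 1;
else
    return 0;
...........
/* in main(); a() returns nonzero when malloc failed */
if (a())
    return;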
goto cannot possibly jump to another function.
Normally the advice is simply not to use goto, but in this case what you are asking for is not possible at all.
How to deal with this? There are a few solutions.
Check return code or value of problematic functions and act accordingly.
Use setjmp/longjmp. This advice should be considered even more evil than using goto itself, but it does support jumping from one function to another.
Embedded systems rarely have any variation of exit(), as that function doesn't necessarily make any sense in the given context. Where does the controller of an elevator or a toaster exit to?
In multitasking embedded systems there could be a system call to exit or terminate a process, leaving only an idle process alive that simply runs a busy loop: while (1); or in some cases executes a privileged instruction to go into power saving mode: while (1) { asm("halt"); }
In embedded systems one possible method to "recover" from error is to asm("trap #0"); or any equivalent of calling an interrupt vector, that implements graceful system shutdown with dumping core to flash drive or outputting an error code to UART.

What's the purpose of exit(0) ?

I understand that exit(1) indicates an error, for example:
if (something went wrong)
exit(EXIT_FAILURE);
But what's the purpose of using exit(EXIT_SUCCESS); ?
When dealing with processes, maybe? E.g. for fork()?
thanks
This gives the part of the system that invokes the program (usually the command shell) a way to check if the program terminated normally or not.
Edit - start -
By the way, it is possible to query the exit code of an interactive command as well through the use of the $? shell variable. For instance this failed ls command yields an exit code of value 2.
$ ls -3
ls: invalid option -- '3'
Try `ls --help' for more information.
$ echo $?
2
Edit - end -
Imagine a batch file (or shell script) that invokes a series of programs and depending on the outcome of each run may choose some action or the other. This action may consist of a simple message to the user, or the invocation of some other program or set of programs.
This is a way for a program to return a status of its run.
Also, note that zero denotes no problem, any non-zero value indicates a problem.
Programs will often use different non-zero values to pass more information back (other than just non-normal termination). So the non-zero exit value then serves as a more specific error code that can identify a particular problem. This of course depends on the meanings of the code being available (usually/hopefully in the documentation)
For instance, the ls man page has this bit of information at the bottom:
Exit status is 0 if OK, 1 if minor problems, 2 if serious trouble.
For Unix/Linux man pages, look for the section titled EXIT STATUS to get this information.
You can exit your program with return only from the main function. To exit the program from anywhere else, you can call exit(EXIT_SUCCESS). For example, when the user clicks an exit button.
exit() is a C library function that ends up invoking the operating system's exit system call. There's always good information on functions like this if you check the man pages:
http://linux.die.net/man/3/exit
On a Linux box, you can simply type man exit into a terminal and this information will come up.
There are two ways of 'normally' exiting a program: returning from main(), or calling exit(). Normally exit() is used, and thought of, for signalling a failure. However, if you are not in main(), you must still exit somehow. exit(0) is usually used to terminate the process when not in main().
main() is actually not a special function to the operating system, only to the runtime environment. The 'function' that actually gets loaded is normally defined as _start() (this is handled by the linker, and beyond the scope of this answer), written in assembly, which simply prepares the environment and calls main(). Upon return from main(), it also calls exit() with the return value from main().

Why do system calls return EFAULT instead of sending a segfault?

To be clear, this is a design rather than an implementation question
I want to know the rationale behind why POSIX behaves this way. POSIX system calls, when given an invalid memory location, return EFAULT rather than crashing the userspace program (by sending a SIGSEGV), which makes their behavior inconsistent with userspace functions.
Why? Doesn't this just hide memory bugs? Is it a historical mistake or is there a good reason for it?
Because system calls are executed by the kernel, not by the user program --- when the system call occurs, the user process halts and waits for the kernel to finish.
The kernel itself, of course, isn't allowed to seg fault, so it has to manually check all the address areas the user process gives it. If one of these checks fails, the system call fails with EFAULT. So in this situation a segmentation fault hasn't actually happened --- it's been avoided by the kernel explicitly checking to make sure all the addresses are valid. Hence it makes sense that no signal is sent.
In addition, if a signal were sent, there'd be no way the kernel could attach a meaningful program counter to the signal, because the user process isn't actually executing when the system call is running. This means there'd be no way for the user process to produce decent diagnostics, restart the failed instruction, etc.
To summarise: mostly historical, but there is actual logic to the reasoning. Like EINTR, this doesn't make it any less irritating to deal with.
Well, what would you want to happen? A system call is a request to the system. If you ask: "when does the ferry to Munchen leave?" would you like the program to crash, or to get return = -1 with errno = ENOHARBOR? If you ask the system to put your car into your handbag, would you like to have your handbag destroyed, or a return of -1 with errno set to EBAGTOOSMALL?
There is a technical detail: arguments passed between user land and system land have to be copied when entering or leaving the system call. Mostly for security reasons the system is very reluctant to write into user space. (Linux has the copy_to_user function for this, and copy_from_user for the opposite direction, which check that the user-space address range is valid before doing the actual copying.)
Why? Doesn't this just hide memory bugs?
On the contrary. It allows your program to handle the error (impossible in this case), and terminate gracefully. But the program must check the return value from system calls and inspect errno. In the case of SIGSEGV, there is very little for your program to do, so mapping EFAULT to SIGSEGV would be a bad idea.
System calls were designed to always return (or block indefinitely...), whether they succeed or fail.
And a technical aspect could be that {segmentation faults, bus errors, floating point exceptions, ...} are (often) generated by hardware exceptions (traps).
