Why does system() exist? - c

Many papers and such mention that calls to 'system()' are unsafe and unportable. I do not dispute their arguments.
I have noticed, though, that many Unix utilities have a C library equivalent. If not, the source is available for a wide variety of these tools.
While many papers and such recommend against goto, there are those who can make an argument for its use, and there are simple reasons why it's in C at all.
So, why do we need system()? How much existing code relies on it that can't easily be changed?

sarcastic answer Because if it didn't exist people would ask why that functionality didn't exist...
better answer
Many of the system functionality is not part of the 'C' standard but are part of say the Linux spec and Windows most likely has some equivalent. So if you're writing an app that will only be used on Linux environments then using these functions is not an issue, and as such is actually useful. If you're writing an application that can run on both Linux and Windows (or others) these calls become problematic because they may not be portable between system. The key (imo) is that you are simply aware of the issues/concerns and program accordingly (e.g. use appropriate #ifdef's to protect the code etc...)

The closest thing to an official "why" answer you're likely to find is the C89 Rationale. 4.10.4.5 The system function reads:
The system function allows a program to suspend its execution temporarily in order to run another program to completion.
Information may be passed to the called program in three ways: through command-line argument strings, through the environment, and (most portably) through data files. Before calling the system function, the calling program should close all such data files.
Information may be returned from the called program in two ways: through the implementation-defined return value (in many implementations, the termination status code which is the argument to the exit function is returned by the implementation to the caller as the value returned by the system function), and (most portably) through data files.
If the environment is interactive, information may also be exchanged with users of interactive devices.
Some implementations offer built-in programs called "commands" (for example, date) which may provide useful information to an application program via the system function. The Standard does not attempt to characterize such commands, and their use is not portable.
On the other hand, the use of the system function is portable, provided the implementation supports the capability. The Standard permits the application to ascertain this by calling the system function with a null pointer argument. Whether more levels of nesting are supported can also be ascertained this way; assuming more than one such level is obviously dangerous.
Aside from that, I would say mainly for historical reasons. In the early days of Unix and C, system was a convenient library function that fulfilled a need that several interactive programs needed: as mentioned above, "suspend[ing] its execution temporarily in order to run another program". It's not well-designed or suitable for any serious tasks (the POSIX requirements for it make it fundamentally non-thread-safe, it doesn't admit asynchronous events to be handled by the calling program while the other program is running, etc.) and its use is error-prone (safe construction of command string is difficult) and non-portable (because the particular form of command strings is implementation-defined, though POSIX defines this for POSIX-conforming implementations).
If C were being designed today, it almost certainly would not include system, and would either leave this type of functionality entirely to the implementation and its library extensions, or would specify something more akin to posix_spawn and related interfaces.

Many interactive applications offer a way for users to execute shell commands. For instance, in vi you can do:
:!ls
and it will execute the ls command. system() is a function they can use to do this, rather than having to write their own fork() and exec() code.
Also, fork() and exec() aren't portable between operating systems; using system() makes code that executes shell commands more portable.

Related

What are common uses for the system (3) command?

I came across the command while reading the famous C Language Book (1988). Is the command commonly used today?
From the book (section 7.8.4):
The function system(char *s) executes the command contained in the
character string s, then resumes execution of the current program. The
contents of s depend strongly on the local operating system. As a
trivial example, on UNIX systems, the statement
system("date");
causes the program date to be run ...
I was under the impression that fork-and-exec is the main way to run another program from the current one...
system it the function from the standard C library that allows a C program to invoke an external (meaning OS level) command.
(Almost) everything is in the above sentence: the function is standard C, meaning that is is supported by any conformant implementation. But what OS does is err... just OS dependant.
It should the prefered way for writing portable programs (because it is standard C) but unfortunately:
not all OS support same commands and/or same syntax
it is known to have some caveats on most systems
The latter part is related to security: many OS (at least all I know) have a configurable path where a command is searched, and in that case the system function does use that path. The problem is that by changing the path, the program can invoke in reality a command that is not the one intended by the programmer, if someone managed to install a different command with same name in a place they control, and also managed to change the path.
This is the reason why system is generally frowned upon and careful programmers only rely on lower level system dependant functions like fork+exec on Unix like or CreateProcess on Windows, or alternatively use absolute paths for the commands called from system. But then you need a rather complex configuration way to adapt that absolute path to various systems...

Are functions such as printf() implemented differently for Linux and Windows

Something I still don't fully understand. For example, standard C functions such as printf() and scanf() which deal with sending data to the standard output or getting data from the standard input. Will the source code which implements these functions be different depending on if we are using them for Windows or Linux?
I'm guessing the quick answer would be "yes", but do they really have to be different?
I'm probably wrong , but my guess is that the actual function code be the same, but the lower layer functions of the OS that eventually get called by these functions are different. So could any compiler compile these same C functions, but it is what gets linked after (what these functions depend on to work on lower layers) is what gives us the required behavior?
Will the source code which implements these functions be different
depending on if we are using them for Windows or Linux?
Probably. It may even be different on different Linuxes, and for different Windows programs. There are several distinct implementations of the C standard library available for Linux, and maybe even more than one for Windows. Distinct implementations will have different implementation code, otherwise lawyers get involved.
my guess is that the actual function code be the same, but the lower
layer functions of the OS that eventually get called by these
functions are different. So could any compiler compile these same C
functions, but it is what gets linked after (what these functions
depend on to work on lower layers) is what gives us the required
behavior?
It is conceivable that standard library functions would be written in a way that abstracts the environment dependencies to some lower layer, so that the same source for each of those functions themselves can be used in multiple environments, with some kind of environment-specific compatibility layer underneath. Inasmuch as the GNU C library supports a wide variety of environments, it serves as an example of the general principle, though Windows is not among the environments it supports. Even then, however, the environment distinction would be effective even before the link stage. Different environments have a variety of binary formats.
In practice, however, you are very unlikely to see the situation you describe for Windows and Linux.
Yes, they have different implementations.
Moreover you might be using multiple different implementations on the same OS. For example:
MinGW is shipped with its own implementation of standard library which is different from the one used by MSVC.
There are many different implementations of C library even for Linux: glibc, musl, dietlibc and others.
Obviously, this means there is some code duplication in the community, but there are many good reasons for that:
People have different views on how things should be implemented and tested. This alone is enough to "fork" the project.
License: implementations put some restrictions on how they can be used and might require some actions from the end user (GPL requires you to share your code in some cases). Not everyone can follow those requirements.
People have very different needs. Some environments are multithreaded, some are not. printf might need or might not need to use some thread synchronization mechanisms. Some people need locale support, some don't. All this can bloat the code in the end, not everyone is willing to pay for things they do not use. Even strerror is vastly different on different OSes.
Aforementioned synchronization mechanisms are usually OS-specific and work in specific ways. Same can be said about locale handling, signal handling and other things, including the actual data writing and reading.
Some implementations add non-standard extensions that can make your life easier. Not all of those make sense on other OSes. For example glibc adds 'e' mode specifier to open file with O_CLOEXEC flag. This doesn't make sense for Windows.
Many complex things cannot be implemented in pure C and require some compiler-specific extensions. This can tie implementation to a limited number of compilers.
In the end, it is much simpler to have many C libraries, than trying to create a one-size-fits-all implementation.
As you say the higher level parts of the implementation of something like printf, like the code used to format the string using the arguments, can be written in a cross-platform way and be shared between Linux and Windows. I'm not sure if there's a C library that actually does it though.
But to interact with the hardware or use other operating system facilities (such as when printf writes to the console), the libc implementation has to use the OS's interface: the system calls. And these are very different between Windows and Unix-likes, and different even among Unix-likes (POSIX specifies a lot of them but there are OS specific extensions). For example here you can find system call tables for Linux and Windows.
There are two parts to functions like printf(). The first part parses the format string, and assembles an array of characters ready for output. If this part is written in C, there's no reason preventing it being common across all C libraries, and no reason preventing it being different, so long the standard definition of what printf() does is implemented. As it happens, different library developers have read the standard's definition of printf(), and have come up with different ways of parsing and acting on the format string. Most of them have done so correctly.
The second part, the bit that outputs those characters to stdout, is where the differences come in. It depends on using the kernel system call interface; it's the kernel / OS that looks after input/output, and that is done in a specific way. The source code required to get the Linux kernel to output characters is very different to that required to get Windows to output characters.
On Linux, it's usual to use glibc; this does some elaborate things with printf(), buffering the output characters in a pipe until a newline is output, and only then calling the Linux system call for displaying characters on the screen. This means that printf() calls from separate threads are neatly separated, each being on their own line. But the same program source code, compiled against another C library for Linux, won't necessarily do the same thing, resulting in printf() output from different threads being all jumbled up and unreadable.
There's also no reason why the library that contains printf() should be written in C. So long as the same function calling convention as used by the C compiler is honoured, you could write it in assembler (though that'd be slightly mad!). Or Ada (calling convention might be a bit tricky...).
Will the source code which implements these functions be different
Let us try another point-of-view: competition.
No. Competitors in industry are not required by the C spec to share source code to issue a compliant compiler - nor would various standard C library developers always want to.
C does not require "open source".

definition of function printf in C language

I have read that C language does not include instructions for input and for output and that printf, scanf, getchar, putchar are actually functions.
Which are the primitive C language instructions to obtain the function printf , then?
Thank you.
If you want to use printf, you have to #include <stdio.h>. That file declares the function.
If you where thinking about how printf is implemented: printf might internally call any other functions and probably goes down to putc (also part of the C runtime) to write out the characters one-by-one. Eventually one of the functions needs to really write the character to the console. How this is done depends on the operating system. On Linux for example printf might internally call the Linux write function. On Windows printf might internally call WriteConsole.
The function printf is documented here; in fact, it is not part of the C language itself. The language itself does not provide a means for input and output. The function printf is defined in a library, which can be accessed using the compiler directive #include <stdio.h>.
No programming language provides true "primitives" for I/O. Any I/O "primitives" rely on lower abstraction levels, in this language or another.
I/O, at the lowest level, needs to access hardware. You might be looking at BIOS interrupts, hardware I/O ports, memory-mapped device controlers, or something else entirely, depending on the actual hardware your program is running on.
Because it would be a real burden to cater for all these possibilities in the implementation of the programming language, a hardware abstraction layer is employed. Individual I/O controllers are accessed by hardware drivers, which in turn are controlled by the operating system, which is providing I/O services to the application developer through a defined, abstract API. These may be accessed directly (e.g. by user-space assembly), or wrapped further (e.g. by the implementation of a programming language's interpreter, or standard library functions).
Whether you are looking at "commands" like (bash) echo or (Python) print, or library functions like (Java) System.out.println() or (C) printf() or (C++) std::cout, is just a syntactic detail: Any I/O is going through several layers of abstraction, because it is easier, and because it protects you from all kinds of erroneous or malicious program behaviour.
Those are the "primitives" of the respective language. If you're digging down further, you are leaving the realm of the language, and enter the realm of its implementation.
I once worked on a C library implementation myself. Ownership of the project has passed on since, but basically it worked like this:
printf() was implemented by means of vfprintf() (as was, eventually, every function of the *printf() family).
vfprintf() used a couple of internal helpers to do the fancy formatting, writing into a buffer.
If the buffer needed to be flushed, it was passed to an internal writef() function.
this writef() function needed to be implemented differently for each target system. On POSIX, it would call write(). On Win32, it would call WriteFile(). And so on.

Is there a way to test whether thread safe functions are available in the C standard library?

In regards to the thread safe functions in newer versions of the C standard library, is there a cross-platform way to tell if these are available via pre-processor definition? I am referring to functions such as localtime_r().
If there is not a standard way, what is the reliable way in GCC? [EDIT] Or posix systems with unistd.h?
There is no standard way to test that, which means there is no way to test it across all platforms. Tools like autoconf will create a tiny C program that calls this function and then try to compile and link it. It this works, looks like the function exists, if not, then it may not exist (or the compiler options are wrong and the appropriate CFLAGS need to be set).
So you have basically 6 options:
Require them to exist. Your code can only work on platforms where they exist; period. If they don't exist, compilation will fail, but that is not your problem, since the platform violates your minimum requirements.
Avoid using them. If you use the non-thread safe ones, maybe protected by a global lock (e.g. a mutex), it doesn't matter if they exist or not. Of course your code will then only work on platforms with POSIX mutexes, however, if a platform has no POSIX mutexes, it won't have POSIX threads either and if it has no POSIX threads (and I guess you are probably using POSIX threads w/o supporting any alternative), why would you have to worry about thread-safety in the first place?
Decide at runtime. Depending on the platform, either do a "weak link", so you can test at runtime if the function was found or not (a pointer to the function will point to NULL if it wasn't) or alternatively resolve the symbol dynamically using something like dlsym() (which is also not really portable, but widely supported in the Linux/UNIX world). However, in that case you need a fallback if the function is not found at runtime.
Use a tool like autoconf, some other tool with similar functionality, or your own configuration script to determine this prior to start of compilation (and maybe set preprocessor macros depending on result). In that case you will also need a fallback solution.
Limit usage to well known platforms. Whether this function is available on a certain platform is usually known (and once it is available, it won't go away in the future). Most platforms expose preprocessor macros to test what kind of platform that is and sometimes even which version. E.g. if you know that GNU/Linux, Android, Free/Open/NetBSD, Solaris, iOS and MacOS X all offer this function, test if you are compiling for one of these platforms and if yes, use it. If the code is compiled for another platform (or if you cannot determine what platform that is), it may or may not offer this function, but since you cannot say for sure, better be safe and use the fallback.
Let the user decide. Either always use the fallback, unless the user has signaled support or do it the other way round (which makes probably more sense), always assume it is there and in case compilation fails, offer a way the user can force your code into "compatibility mode", by somehow specifying that thread-safe-functions are not available (e.g. by setting an environment variable or by using a different make target). Of course this is the least convenient method for the (poor) user.

Ncurses programs in pseudo-terminals

In my continuing attempt to understand how pseudo-terminals work, I have written a small program to try to run bash.
The problem is, my line-breaking seems to be off. (The shell prompt only appears AFTER I press enter.)
Furthermore, I still cannot properly use ncurses programs, like vi. Can anyone tell me how to setup the pseudo-terminal for this?
My badly written program can be found here, I encourage you to compile it. The operating system is GNU/Linux, thanks.
EDIT: Compile like this: gcc program.c -lutil -o program
EDIT AGAIN: It looks like the issue with weird spacing was due to using printf(), still doesn't fix the issue with ncurses programs though.
There are several issues in your program. Some are relatively easy to fix - others not so much:
forkpty() and its friends come from BSD and are not POSIX compatible. They should be avoided for new programs. From the pty(7) manual page:
Historically, two pseudoterminal APIs have evolved: BSD and System V. SUSv1 standardized a pseudoterminal API based on the System V API, and this API should be employed in all new programs that use pseudoterminals.
You should be using posix_openpt() instead. This issue is probably not critical, but you should be aware of it.
You are mixing calls to raw system calls (read(), write()) and file-stream (printf(), fgets()) functions. This is a very good way to confuse yourself. In general you should choose one approach and stick with it. In this case, it's probably best to use the low-level system calls (read(), write()) to avoid any issues that would arise from the presence of the I/O buffers that the C library functions use.
You are assuming a line-based paradigm for your terminals, by using printf() and fgets(). This is not always true, especially when dealing with interactive programs like vim.
You are assuming a C-style single-byte null-terminated string paradigm. Terminals normally deal with characters and bytes - not strings. And while most character set encodings avoid using a zero byte, not all do so.
As a result of (2), (3) and (4) above, you are not using read() and write() correctly. You should be using their return values to determine how many bytes they processed, not string-based functions like strlen().
This is the issue that, in my opinion, will be most difficult to solve: You are implicitly assuming that:
The terminal (or its driver) is stateless: It is not. Period. There are at least two stateful controls that I suspect are the cause of ncurses-based programs not working correctly: The line mode and the local echo control of the terminal. At least these have to match between the parent/master and the slave terminal in order to avoid various strange artifacts.
The control-interface of a terminal can be passed-through just by passing the bytes back and forth: It is not always so. Modern virtual terminals allow for a degree of out-of-band control via ioctl() calls, as described for Linux here.
The simplest way to deal with this issue is probably to set the parent terminal to raw mode and let the slave pseudo-terminal driver deal with the awkward details.
You may want to have a look at this program which seems to work fine. It comes from the book The Linux Programming Interface and the full source code is here. Disclaimer: I have not read the book, nor am I promoting it - I just found the program using Google.

Resources