I'm learning C programming language and trying to do some basic stuff with it. The problem I faced is how to parse command line arguments. I read this answer and tried to find the library summary for unistd.h functions int the standard N1570. But unfortunately it is not defined there (Work only for POSIX compliant OS as far as I can understand).
AFAIK Windows is not really POSIX compliant so something like in the answer I referred to above
#include <unistd.h>
int main(int argc, char *argv[]){
int opt;
while((opt = getopt(argc, argv, "ilv") != -1){
//do some with it
}
}
is not really portable.
QUESTION: The only standardized and portable way to parse command line args is to do it yourself?
Is there a standardized way to parse command line arguments in C?
Yes there is. Not standardized by the C committee, but by others. The most commonly widespread is POSIX with it's Utility Conventions and getopt utility. Using GNU Argument Syntax with argp is just fun and cool. There is also commonly used GNUs getopt_long for supporting long arguments with getopt.
The only standardized and portable way to parse command line args is to do it yourself?
There is none standardized in C11 way to parse command line. The C11 only specifies, that the main arguments are strings, but they're value is just implementation-defined:
If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup. The intent is to supply to the program information determined prior to program startup from elsewhere in the hosted environment.
The C standard doesn't introduce an abstraction of a "command line", so an abstraction consisting of "command line arguments" and furthermore "command line arguments parsing" is just not defined. I think introducing a standardized way to parse command line arguments is out of scope for a C standard.
The most portable way is to write and use the most portable code. Indeed the most portable code would not use any non-standard C libraries and "do it yourself" in a most portable manner.There is little sense to target all possible architectures and environments. If you are going only for GNU/Linux, I would go with getopt if you want to stay portable to some crazy environment and for argp if you want just to target GNU/Linux specific systems. getopt is really widespread, even a library for embedded systems like newlib has getopt and getopt_long implemented. For others, you can just copy/include the code for getopt from other sources into your program, thus protecting against environments that doesn't have it.
Related
This question already has answers here:
Parsing command-line arguments in C
(14 answers)
Closed 1 year ago.
I need to parse command options as given below in C:
./myBinary --option1 name --option2 age --option3 address
Getopt only support -l , -a kind of flags. Any suggestions?
The only solid way is parsing those arguments without depending on external libraries. There is no widely accepted standard for handling command line arguments.
The best you have is the standardised call to main providing access to command line arguments:
int main(int argc, char *argv[]) {
}
There are major differences between operating systems, but it's worse than that.
Windows tends to use /option or -option, but many tools borrow from other platforms.
Linux tends to go the GNU way, offering both --option and -o via getopt_long.
BSD and Unix tend to use only the short -o, but some applications may be ported from Linux.
Many macOS tools use -option but borrows from BSD and sometimes Linux.
So only GNU provides a "standard" library for parsing such options. Consider it a non-portable extension.
Long story short: you need to provide more details about your target platform(s) to get a better answer, but don't expect some perfect solution.
I'm currently reading up on and experimenting with the different possibilities of running programs from within C code on Linux. My use cases cover all possible scenarios, from simply running and forgetting about a process, reading from or writing to the process, to reading from and writing to it.
For the first two, popen() is very easy to use and works well. I understand that it uses some version of fork() and exec() internally, then invokes a shell to actually run the command.
For the third scenario, popen() is not an option, as it is unidirectional. Available options are:
Manually fork() and exec(), plus pipe() and dup2() for input/output
posix_spawn(), which internally uses the above as need be
What I noticed is that these can achieve the same that popen() does, but we can completely avoid the invoking of an additional sh. This sounds desirable, as it seems less complex.
However, I noticed that even examples on posix_spawn() that I found on the Internet do invoke a shell, so it would seem there must be a benefit to it. If it is about parsing command line arguments, wordexp() seems to do an equally good job.
What is the reason behind benefit of invoking a shell to run the desired process instead of running it directly?
Edit: I realized that my wording of the question didn't precisely reflect my actual interest - I was more curious about the benefits of going through sh rather than the (historical) reason, though both are obviously connected, so answers for both variations are equally relevant.
Invoking a shell allows you to do all the things that you can do in a shell.
For example,
FILE *fp = popen("ls *", "r");
is possible with popen() (expands all files in the current directory).
Compare it with:
execvp("/bin/ls", (char *[]){"/bin/ls", "*", NULL});
You can't exec ls with * as argument because exec(2) will interpret * literally.
Similarly, pipes (|), redirection (>, <, ...), etc., are possible with popen.
Otherwise, there's no reason to use popen if you don't need shell - it's unnecessary. You'll end up with an extra shell process and all the things that can go wrong in a shell go can wrong in your program (e.g., the command you pass could be incorrectly interpreted by the shell and a common security issue). popen() is designed that way. fork + exec solution is cleaner without the issues associated with a shell.
The glib answer is because the The POSIX standard ( http://pubs.opengroup.org/onlinepubs/9699919799/functions/popen.html ) says so. Or rather, it says that it should behave as if the command argument is passed to /bin/sh for interpretation.
So I suppose a conforming implementation could, in principle, also have some internal library function that would interpret shell commands without having to fork and exec a separate shell process. I'm not actually aware of any such implementation, and I suspect getting all the corner cases correct would be pretty tricky.
The 2004 version of the POSIX system() documentation has a rationale that is likely applicable to popen() as well. Note the stated restrictions on system(), especially the one stating "that the process ID is different":
RATIONALE
...
There are three levels of specification for the system() function. The
ISO C standard gives the most basic. It requires that the function
exists, and defines a way for an application to query whether a
command language interpreter exists. It says nothing about the command
language or the environment in which the command is interpreted.
IEEE Std 1003.1-2001 places additional restrictions on system(). It
requires that if there is a command language interpreter, the
environment must be as specified by fork() and exec. This ensures, for
example, that close-on- exec works, that file locks are not inherited,
and that the process ID is different. It also specifies the return
value from system() when the command line can be run, thus giving the
application some information about the command's completion status.
Finally, IEEE Std 1003.1-2001 requires the command to be interpreted
as in the shell command language defined in the Shell and Utilities
volume of IEEE Std 1003.1-2001.
Note the multiple references to the "ISO C Standard". The latest version of the C standard requires that the command string be processed by the system's "command processor":
7.22.4.8 The system function
Synopsis
#include <stdlib.h>
int system(const char *string);
Description
If string is a null pointer, the system function determines
whether the host environment has a command processor. If string
is not a null pointer, the system function passes the string
pointed to by string to that command processor to be executed
in a manner which the implementation shall document; this might then
cause the program calling system to behave in a non-conforming
manner or to terminate.
Returns
If the argument is a null pointer, the system function
returns nonzero only if a command processor is available. If
the argument is not a null pointer, and the system function
does return, it returns an implementation-defined value.
Since the C standard requires that the systems "command processor" be used for the system() call, I suspect that:
Somewhere there's a requirement in POSIX that ties popen() to the system() implementation.
It's much easier to just reuse the "command processor" entirely since there's also a requirement to run as a separate process.
So this is the glib answer twice-removed.
The question asked to me is that even if we supplied integer/float arguments at the command prompt they are still treated as strings or not in C language. I am not sure about that can any help just little bit. Is this true or not in C language and why? And what about others like Java or python ?
It is true, independent of the language, that the command line arguments to programs on Unix are strings. If the program chooses to interpret them as numbers, that is fine, but it is because the program (or programmer) chose to do so. Similarly, the runtime support for a language might alter the arguments passed by the o/s into integer or float types, but the o/s passes strings to that runtime (I know of no language that does this, but I don't claim to know all languages).
To see this, note that the ways to execute a program are the exec*() family of functions, and each of those takes a string which is the name of the program to be executed, and an array of strings which are the arguments to be passed to the program. Functions such as system() and popen() are built atop the exec*() family of functions — they also use fork(), of course. Even the posix_spawn() function takes an array of pointers to strings for the arguments of the spawned program.
It's not unlike mailing a letter without an envelope. We all agree to use the common enclosure known as an envelope. Operating systems pass parameters to programs using the common item known as a string of characters. It's beyond the scope of the operating system to understand what the program wants to do with the parameters.
There are some exceptions, one which comes to mind is the passing of parameters to a Linux Kernel Module. These can be passed as items other than strings.
Basically, this is an issue of creating an interface between the operating system and the program. any program. Remember that programs are not always written in C, And you don't even know whether there are things like float or int in the language.
You want to be able to pass several arguments (with natural delimiters), which may easily encode arbitrary information. In C, strings can be of arbitrary length, and the only constraint on them is that a zero byte in them signifies the end of the string. this is a highly flexible and natural way to pass arbitrary information to the program.
So you can never supply an arbitrary integer/float arguments directly to a program; The operating system (Unix / Linux / Windows / etc.) won't let you. You don't have any tool that gives you that interface, in the same way that you can't pass a mouse-click as an argument. All you supply is a sequence of characters.
Since Unix and C were designed together, it is also part of the C programming language, and from there it worked its way to C++, Java, Python and most other modern programming languages, and the same way into Linux, Windows and most other operating systems.
I have heard that in Windows, parameters are passed a single parameter, and then the program splits it into arguments, either in its runtime libraries, or sometimes, in the actual code.
I've heard that most C/C++ compilers do it in runtime libararies (for example, TCC - Tiny C Compiler, which I downloaded)
Are there any C compilers I can download, that don't? Any links to them?
And in such a compiler, would argsv[0] have the whole string?
Added
It's based on what this person (jdedb) said in Super User question Can't pipe or redirect Cygwin grep output, after seeming to suggest that I ask on Stack Overflow.
"It's up to the called program to split the command tail into words, if it wants to operate in Unix (and C language) fashion. (The runtime support libraries of most C and C++ language implementations for Win32 do this splitting behind the scenes."
He said it's the compilers.. But according to Necrolis, it's not the compiler.
(added- Necrolis commented correcting my misreading, compiler!=runtime library)
If you are on Windows, just use GetCommandLine. This is how most CRT wrappers get the command line to split to start with.
As for your actual question, it's not the compiler, but the CRT startup wrapper that they use. If you implement mainCRTstartup, and override the entrypoint with it, you can do whatever you want. A good example of how it works can be seen here.
That "parameter splitting" is the way mandated by the C99 Standard (PDF file) in 5.1.2.2.1.
If an implementation (compiler + library + options) recognizes but does not separate the program name from the other parameters (and parameters from each other) it is not conforming.
Of course, if you use a free-standing implementation none of this applies.
In the C / Unix environment I work in, I see some developers using __progname instead of argv[0] for usage messages. Is there some advantage to this? What's the difference between __progname and argv[0]. Is it portable?
__progname isn't standard and therefore not portable, prefer argv[0]. I suppose __progname could lookup a string resource to get the name which isn't dependent on the filename you ran it as. But argv[0] will give you the name they actually ran it as which I would find more useful.
Using __progname allows you to alter the contents of the argv[] array while still maintaining the program name. Some of the common tools such as getopt() modify argv[] as they process the arguments.
For portability, you can strcopy argv[0] into your own progname buffer when your program starts.
There is also a GNU extension for this, so that one can access the program invocation name from outside of main() without saving it manually. One might be better off doing it manually, however; thus making it portable as opposed to relying on the GNU extension. Nevertheless, I here provide an excerpt from the available documentation.
From the on-line GNU C Library manual (accessed today):
"Many programs that don't read input from the terminal are designed to exit if any system call fails. By convention, the error message from such a program should start with the program's name, sans directories. You can find that name in the variable program_invocation_short_name; the full file name is stored the variable program_invocation_name.
Variable: char * program_invocation_name
This variable's value is the name that was used to invoke the program running in the current process. It is the same as argv[0]. Note that this is not necessarily a useful file name; often it contains no directory names.
Variable: char * program_invocation_short_name
This variable's value is the name that was used to invoke the program running in the current process, with directory names removed. (That is to say, it is the same as program_invocation_name minus everything up to the last slash, if any.)
The library initialization code sets up both of these variables before calling main.
Portability Note: These two variables are GNU extensions. If you want your program to work with non-GNU libraries, you must save the value of argv[0] in main, and then strip off the directory names yourself. We added these extensions to make it possible to write self-contained error-reporting subroutines that require no explicit cooperation from main."
I see at least two potential problems with argv[0].
First, argv[0] or argv itself may be NULL if execve() caller was evil or careless enough. Calling execve("foobar", NULL, NULL) is usually an easy and fun way to prove an over confident programmer his code is not sig11-proof.
It must also be noted that argv will not be defined outside of main() while __progname is usually defined as a global variable you can use from within your usage() function or even before main() is called (like non standard GCC constructors).
It's a BSDism, and definitely not portable.
__progname is just argv[0], and examples in other replies here show the weaknesses of using it. Although not portable either, I'm using readlink on /proc/self/exe (Linux, Android), and reading the contents of /proc/self/exefile (QNX).
If your program was run using, for instance, a symbolic link, argv[0] will contain the name of that link.
I'm guessing that __progname will contain the name of the actual program file.
In any case, argv[0] is defined by the C standard. __progname is not.