Asymmetry between main and launching a process - c

A C main function has the signature
int main(int argc, char** argv) {
}
It receives an array of command line parameters. But the functions for launching an application, e.g. CreateProcess or ShellExecute, take the command line as just two strings: one for the application to launch and one for all of its parameters. Why are the parameters not specified as an array, too? Why does every application that launches other applications have to deal with escaping command line parameters, e.g. when invoking a compare tool with two arbitrary file names that might contain spaces or quotes?

On very few systems does program execution actually start at main (or WinMain) or a similar function. Instead, the compiler tells the linker to use a special entry function, which usually doesn't take any arguments in the C sense of the word.
The command-line arguments (if any) may be passed through special registers at the assembly level, or they need to be fetched using OS-specific functions (like GetCommandLine in the Windows API).
On Windows, the GetCommandLine function does indeed return the command line as a single string, just as it was passed to e.g. CreateProcess.
For a Windows console program, the special "entry" function does some other initialization (like setting up stdin etc.), and then calls GetCommandLine to get the command-line arguments, which it then parses into an array suitable for the main function, which is then called.
If you look at the POSIX world (where e.g. Linux and macOS live), it has the exec family of functions, which do indeed take an array for the arguments, or a variable-argument list which is turned into such an array.
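To illustrate the POSIX side, here is a minimal sketch, assuming a POSIX system with fork, execvp and waitpid available. The helper name run_and_wait is made up for this example. Because the arguments travel as an array, file names with spaces or quotes need no escaping at all.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (looked up via PATH) with the given
 * NULL-terminated argument array and wait for it to finish.
 * Returns the child's exit status, 127 if the exec itself failed,
 * or -1 if the child could not be created or did not exit normally. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);   /* each array element arrives as one argument */
        _exit(127);              /* only reached if execvp failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A compare tool could be started with an array such as { "diff", "my file.txt", "other \"quoted\" name.txt", NULL } and both names would reach its main as single argv entries.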

Related

Execute a C program from within another C program as if it was a function call (in Windows)?

Is it possible to call a separate C program (.exe file) from within a C program, as if it were a function?
I would like to be able to pass arguments of any kind (like any other function) to this separate program, and get the return value (so it can be used in the host program).
I imagine that the arguments can be passed by using int argc, char *argv[], but I don't know if it's possible to pass integers, arrays, pointers to structures and so on.
On the other hand, I've read that the return value from the main function is system specific. Since I'm using Windows, are there any limitations on this return value (type, size, etc.)? Can it be anything that could be used as a return value in any normal function?
Thanks!
What you describe is the basic premise of the Unix operating system. Unix was designed to allow accomplishing very complex tasks by chaining several commands, piping the (text) output of one command into the input of the next (this was pretty revolutionary back then).
As klutt already suggested, you can accomplish the same with a Windows executable. To his list, I would add learning how to redirect the input/output of a program to a file handle.
Windows PowerShell extended this concept to allow passing data types other than text to special commands known as cmdlets. To write your own, however, you need support from the .NET Framework or .NET Core, so you must do so from a managed language such as C# or C++/CLI.
Keep in mind that spawning a whole process is an extremely expensive operation (compared to simply calling a linked function), so there is some significant overhead you need to be aware of.
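Since everything crossing the process boundary through argv is text, the called program has to convert it back itself. A minimal sketch of that conversion for an integer argument, using only standard C; the helper name parse_int_arg is made up for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert one command-line argument to an int.
 * Returns 0 and stores the value in *out on success.
 * Returns -1 if the text is empty, has trailing junk, or is out of range. */
int parse_int_arg(const char *text, int *out)
{
    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return -1;                      /* not a whole decimal number */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -1;                      /* does not fit in an int */
    *out = (int)value;
    return 0;
}
```

Pointers and structures cannot usefully be passed this way, because the child runs in its own address space; such data has to be serialized to text or bytes and sent through arguments, a pipe, or a file.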

Call other Tcl commands from a custom command (Tcl_CmdProc)

At first glance (see the evidence below), it looks like while a Tcl_CmdProc has control, the interpreter is waiting for it to return and can't accept any other calls in the meantime.
So, how do I make any calls into Tcl before returning like e.g. a user-defined function would do? I guess I may need to set up a new call stack frame in the interpreter or something (and unwind it later). Tcl_CreateCommand man page says nothing on this matter.
The big picture is like this:
I'm fixing https://bugs.python.org/issue33257 . The TkinterHandlers.py example uses Python event handlers that are implemented as custom Tcl commands under the hood. Currently, their implementation releases the "Tcl lock" (a Python-specific lock that it wraps all Tcl calls with) while executing Python code and reacquires it to Tcl_SetObjResult at the end -- thus allowing other calls to the same interpreter in the meantime.
Now, if another call into the interpreter is actually made during this time frame, Tcl aborts shortly with a message on stderr: TclStackFree: incorrect freePtr. Call out of sequence?
And if I make the custom command hold on to the Tcl lock, it later freezes trying to acquire the lock again because it itself also needs to make a Tcl call sometimes. Now, I can make the lock reentrant, but without knowing how to handle the interpreter right, I'll probably break it, too.
To keep this question on topic, I'm specifically asking about how to handle the interpreter, and make Tcl calls in particular, from a Tcl_CmdProc. The specific situation is solely for exposition to illustrate my needs. If this is actually explained in some doc that I couldn't find, linking to it and reciting some key points would be sufficient.
To call a Tcl command from C code, you've got a choice between two API function families. One is Tcl_EvalObjv, and the other is Tcl_Eval. Each has a number of variants, but the only variant I'll mention is Tcl_EvalObjEx.
Tcl_EvalObjv
This function invokes a single Tcl command, with no processing of substitutions in arguments (unless the command itself does them, of course). It has this signature:
int Tcl_EvalObjv(Tcl_Interp *interp,
int objc,
Tcl_Obj *const objv[],
int flags);
It takes the description of what command to call and what arguments to pass to it as a C array of Tcl value references (in argument objv) where the array is of length objc; Tcl guarantees to not modify the array itself, but might transform the values if it does type conversions. The values must all have a non-zero reference count (and all values start with a zero reference count from their birthing Tcl_NewObj call). The interp is the interpreter context, and flags can usually be zero.
The result is a Tcl exception code; if it is TCL_OK, the result of the call can be retrieved from the interpreter using Tcl_GetObjResult, and if the exception code is TCL_ERROR then there was an error and you should usually pass that on out (perhaps adding to the stack trace with Tcl_AddErrorInfo). Other exception codes are possible; it's usually best to just pass those straight on out without doing any further processing (unless you're making something loop-like, when you should pay attention to TCL_BREAK and TCL_CONTINUE).
Tcl_Eval
This function evaluates a Tcl script, not just a single command, and that includes processing substitutions in arguments. It has this signature:
int Tcl_Eval(Tcl_Interp *interp,
const char *script);
The script is any old C string; Tcl won't modify it, but it will parse, bytecode-compile, and execute it. It's up to you to provide the script in a form that will execute a single command without surprises. The interp argument and the result of the function call are the same as for Tcl_EvalObjv.
If you're interested in using this for running a single command, you're actually better off using Tcl_EvalObjv or…
Tcl_EvalObjEx.
This is like Tcl_Eval except it takes the script as a Tcl value reference (and takes flags too).
int Tcl_EvalObjEx(Tcl_Interp *interp,
Tcl_Obj *objPtr,
int flags);
Again, make sure the objPtr has a non-zero reference count before passing it into this function. (It may adjust the reference count during execution.) Again, interp and the result are as documented for Tcl_EvalObjv, and flags is too.
The advantage of this for calling single commands is that you can call Tcl_NewListObj (or any other list-building function) to make the script value; doing so guarantees that there will be no surprise substitutions. But you could also go directly to invoking the command with Tcl_EvalObjv. But if you want to process anything more complex than a single simple call to a command, this is a good place to start as it has a key advantage that plain Tcl_Eval doesn't: it can make the type of the script passed in via objPtr be one that caches the compiled bytecode, allowing quite a reasonable performance gain in some circumstances.
Note that Tcl_EvalObjv is effectively the API that Tcl calls internally to invoke all user code and perform all I/O. (“Effectively” because things get more complex in Tcl 8.6.)
Within a Tcl_CmdProc, all these functions can be called as usual; no special processing or "handling of the interpreter" is needed. If this doesn't work for you and causes crashes or similar, the interpreter is not at fault; something else must be wrong with your code.

Passing main arguments to init functions of gui libraries

When initializing libraries such as Qt and GTK+, you have to pass the main arguments into the function that initializes the library. Why? What is the library doing with them?
Both Qt and GTK+ are designed to respond to certain command line flags for convenience. (Both respond to various environment variables as well.) You don't necessarily have to send argv and argc to the corresponding init functions, but it doesn't hurt, particularly if you intend to take advantage of the features.
Here's what the GTK+ documentation for gtk_init() has to say:
Although you are expected to pass the argc , argv parameters from
main() to this function, it is possible to pass NULL if argv is not
available or commandline handling is not required.
argc and argv are adjusted accordingly so your own code will never see
those standard arguments.
A full list of the command line options that GTK+ accepts is here.
Qt's QApplication similarly accepts command line arguments and removes the ones that it accepts. This is documented along with the accepted arguments in the QApplication constructor documentation.
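The "accept and remove" behaviour can be pictured with a small sketch. This is not how GTK+ or Qt are implemented; it only assumes a library that reserves one flag for itself. The function strip_flag and the flag --mylib-debug are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove every occurrence of flag from argv[1..],
 * shifting the remaining entries down and updating *argc.
 * argv must have room for a terminating NULL at argv[*argc], as the
 * array passed to main does. Returns the number of entries removed. */
int strip_flag(int *argc, char **argv, const char *flag)
{
    if (*argc < 1)
        return 0;

    int kept = 1;       /* argv[0], the program name, always stays */
    int removed = 0;
    for (int i = 1; i < *argc; i++) {
        if (strcmp(argv[i], flag) == 0)
            removed++;
        else
            argv[kept++] = argv[i];
    }
    argv[kept] = NULL;
    *argc = kept;
    return removed;
}
```

A library init function working like this would take &argc and argv from main, note that its flag was present, and leave the application with an argument list that contains only the application's own options.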

how to use C to complete a wildcard function?

I am a rookie in C, and now I want to use C to implement a wildcard function. For example, I am writing a photo processing program named myphoto, and I want to use it like this: myphoto ./photos/*.png, and then myphoto will process all the png files in the directory one by one.
I would like to solve this problem as easily as possible, without using regular expressions, and I came up with an idea that maybe I could use the EXEC function to execute a command, but the EXEC function only returns an int, not a char*.
So how can I solve this problem? thanks!
It is operating system specific. I'm giving a POSIX and Linux point of view (on Windows it is different, and I don't know it).
Notice that if you write the program myprog.c, compile it into myprog, and then run myprog photos/*.png, the main function in myprog.c gets an array of strings (declare int main(int argc, char **argv); the array argv then holds argc strings). The expansion is done by the shell before it starts your myprog executable. See execve(2).
On Linux and POSIX systems: read glob(7); you may want to use glob(3) and/or fnmatch(3) and/or wordexp(3). These functions are useful mostly if some data (e.g. a line in a file) contains photos/*.jpeg and your program wants to "glob" that. You don't need to "glob" the arguments of main; this has already been done by your shell.
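For the case where the pattern comes from data rather than from the command line, here is a minimal sketch using fnmatch(3), assuming a POSIX system. The helper name count_matches is made up for this example.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the n names match the
 * shell-style pattern (with flags 0, '*' and '?' match any character). */
size_t count_matches(const char *pattern, const char *const names[], size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (fnmatch(pattern, names[i], 0) == 0)   /* 0 means "matches" */
            hits++;
    }
    return hits;
}
```

fnmatch only compares strings. To expand a pattern against the actual contents of a directory, glob(3) does the directory reading and the matching in one call.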
Read Advanced Linux Programming

Using '__progname' instead of argv[0]

In the C / Unix environment I work in, I see some developers using __progname instead of argv[0] for usage messages. Is there some advantage to this? What's the difference between __progname and argv[0]. Is it portable?
__progname isn't standard and therefore not portable; prefer argv[0]. I suppose __progname could look up a string resource to get a name that doesn't depend on the filename you ran it as. But argv[0] will give you the name the program was actually run as, which I would find more useful.
Using __progname allows you to alter the contents of the argv[] array while still maintaining the program name. Some of the common tools such as getopt() modify argv[] as they process the arguments.
For portability, you can copy argv[0] (e.g. with strcpy) into your own progname buffer when your program starts.
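A minimal sketch of that approach in standard C. The names save_progname, saved_progname and the "unknown" fallback are made up for this example; the sketch assumes '/' as the directory separator and also guards against a NULL argv[0].

```c
#include <stdio.h>
#include <string.h>

/* Private copy of the program name; "unknown" is an arbitrary fallback. */
static char progname_buf[256] = "unknown";

/* Hypothetical helper: call once at the top of main with argv[0].
 * Stores the name without leading directories. A NULL, empty, or
 * slash-terminated argument leaves the previous value in place. */
void save_progname(const char *argv0)
{
    if (argv0 == NULL || *argv0 == '\0')
        return;
    const char *slash = strrchr(argv0, '/');
    const char *base = slash ? slash + 1 : argv0;
    if (*base == '\0')
        return;
    /* snprintf truncates safely if the name is longer than the buffer. */
    snprintf(progname_buf, sizeof progname_buf, "%s", base);
}

/* Usable from any function, e.g. a usage() routine outside main. */
const char *saved_progname(void)
{
    return progname_buf;
}
```

Because the name lives in the program's own buffer, later changes to argv by argument-processing code do not affect it.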
There is also a GNU extension for this, so that one can access the program invocation name from outside of main() without saving it manually. One might be better off doing it manually, however; thus making it portable as opposed to relying on the GNU extension. Nevertheless, I here provide an excerpt from the available documentation.
From the on-line GNU C Library manual (accessed today):
"Many programs that don't read input from the terminal are designed to exit if any system call fails. By convention, the error message from such a program should start with the program's name, sans directories. You can find that name in the variable program_invocation_short_name; the full file name is stored the variable program_invocation_name.
Variable: char * program_invocation_name
This variable's value is the name that was used to invoke the program running in the current process. It is the same as argv[0]. Note that this is not necessarily a useful file name; often it contains no directory names.
Variable: char * program_invocation_short_name
This variable's value is the name that was used to invoke the program running in the current process, with directory names removed. (That is to say, it is the same as program_invocation_name minus everything up to the last slash, if any.)
The library initialization code sets up both of these variables before calling main.
Portability Note: These two variables are GNU extensions. If you want your program to work with non-GNU libraries, you must save the value of argv[0] in main, and then strip off the directory names yourself. We added these extensions to make it possible to write self-contained error-reporting subroutines that require no explicit cooperation from main."
I see at least two potential problems with argv[0].
First, argv[0] or argv itself may be NULL if the caller of execve() was evil or careless enough. Calling execve("foobar", NULL, NULL) is usually an easy and fun way to prove to an overconfident programmer that his code is not sig11-proof.
It must also be noted that argv will not be defined outside of main() while __progname is usually defined as a global variable you can use from within your usage() function or even before main() is called (like non standard GCC constructors).
It's a BSDism, and definitely not portable.
__progname is just argv[0], and examples in other replies here show the weaknesses of using it. Although not portable either, I'm using readlink on /proc/self/exe (Linux, Android), and reading the contents of /proc/self/exefile (QNX).
If your program was run using, for instance, a symbolic link, argv[0] will contain the name of that link.
I'm guessing that __progname will contain the name of the actual program file.
In any case, argv[0] is defined by the C standard. __progname is not.
