Command Line Argument Counting - c

This is a simple C program that prints the number of command line arguments passed to it:
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("%d\n", argc);
}
When I give the input
file_name *
It prints 623 instead of 2 on my PC (operating system: Windows 7), but it gives the correct output in other cases. Is * a reserved character for command line arguments?
Note this program gives correct output for the following input:
file_name *Rafi
Output = 2

On a Unix command line, the shell is responsible for handling wildcards. yourapp * will run yourapp, and pass the name of ALL of the non-hidden files in the current directory as arguments. In your case, that's 622 files (623 = 622 files + name of the program).
On Windows, applications are responsible for wildcard parsing, so argc is 2: 1 for the name of the program (argv[0]) and 1 for the wildcard (argv[1] = *).

That * gets expanded by the shell or the runtime library (the former on *nixes, the latter on Windowses), and instead of literal * you get the names of all the files in the current working directory.

As others have mentioned, you're getting the 'shell wildcard expansion' or 'globbing' where the * is used as a wildcard to match file names to place in the argv array.
On Unix systems this is performed by the shell and has nothing (or little) to do with the C runtime.
On Windows systems, this functionality is not performed by the shell (unless possibly if you're using some Unix-like shell replacement like Cygwin). The globbing functionality may or may not be performed by the C runtime's initialization depending on what tools and/or linker options you use:
if you're using Microsoft's compiler, the C runtime will not perform globbing by default, and you would get an argc value of 2 in your example. However, if you ask the linker to link in setargv.obj (or wsetargv.obj if you have a Unicode build), then globbing is added to the runtime initialization and you'll get behavior similar to Unix's. setargv.obj has been distributed with MSVC for as long as I can remember, but it's still little known. I believe that most Windows programs perform their own wildcard expansion.
if you're using the MinGW/GCC tool chain, the C runtime will perform globbing before calling main() (at least it does for MinGW 4.6.1 - I suspect it's been in MinGW for a long time). I think MinGW might not perform globbing for GUI programs. You can disable MinGW's globbing behavior with one of the following:
define a global variable named _CRT_glob and initialize it to 0:
int _CRT_glob = 0;
link in the lib/CRT_noglob.o object file (I think this might be order dependent - you may need to place it before any libraries):
gcc c:/mingw/lib/CRT_noglob.o main.o -o main.exe

The problem is that the shell expands * into all the file names (that don't start with a .) in the current directory. This is all about the shell and very little to do with the C program.
The value of argc includes 1 for the program's own name, plus one for each argument passed by the shell.
Try:
filename *
filename '*'
The first will give you 623 (give or take - but it is time you cleaned up that directory!). The second will give you 2.

Related

How is it possible that Cygwin seemingly manages to bypass the MS C Runtime library enabling a C program to get its argv like a Linux machine would?

I'll explain what I mean.
On Windows I understand that a C program has the choice of calling GetCommandLine() or of using argv.
And I understand that a windows implementation of C compiler would make C programs implicitly call the MS C Runtime Library, which will take the command line (perhaps outputted by GetCommandLine()), that isn't separated into arguments, and it'll take that as input and parse it, putting it into argv. This link mentions about that https://learn.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments?view=msvc-170
And from what I understand, on Linux, what's written after the command at the command line, goes straight from the shell to argv. No external library doing the parsing. The shell calls a POSIX function called execv and figures out what the arguments are and passes them to execv which passes them to the program's argv.
I use these programs for some tests
C:\blah>type w.c
#include <stdio.h>
#include <windows.h>
int main(int argc, char *argv[]) {
printf("%s\n", GetCommandLine());
return 0;
}
C:\blah>w.exe "asdf" erw
w.exe "asdf" erw
C:\blah>
C:\blah>type w2.c
#include <stdio.h>
int main(int argc, char *argv[]) {
int i = 0;
while (argv[i]) {
printf("argv[%d] = %s\n", i, argv[i]);
i++;
}
return 0;
}
C:\blah>w2 abc "def"
argv[0] = w2
argv[1] = abc
argv[2] = def
C:\blah>
And w2.c can be run from linux too
root@ubuntu:~# ./w2 abc "def"
argv[0] = ./w2
argv[1] = abc
argv[2] = def
root@ubuntu:~#
I notice that there are some cases where the MS C Runtime gives a different parsing, to Linux. (Linux of course wouldn't be using the MS C Runtime)
For example, this link https://learn.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments?view=msvc-170 mentions this command line input a\\\b d"e f"g h and expected outputs.
C:\blah>w2 a\\\b d"e f"g h
argv[0] = w2
argv[1] = a\\\b
argv[2] = de fg
argv[3] = h
C:\blah>
Whereas on Linux, one gets
root@ubuntu:~# ./w2 a\\\b d"e f"g h
argv[0] = ./w2
argv[1] = a\b
argv[2] = de fg
argv[3] = h
So now the interesting test was, what would Cygwin do
user@comp /cygdrive/c/blah
$ ./w2 a\\\b d"e f"g h
argv[0] = C:\blah\w2.exe
argv[1] = a\b
argv[2] = de fg
argv[3] = h
Cygwin manages to get the result that a linux machine would give.
But it's running an EXE file that was compiled on Windows and that I'd have thought must be using the MS C Runtime library. And when running the EXE file from CMD outside Cygwin, it does look like it's using the MS C Runtime Library. So how is Cygwin seemingly managing to bypass that, leading the program to give the result that a Linux machine would give?
How is this possible?! What is going on?!
I conversed with somebody that knows about cygwin.. They said that cygwin can detect whether an executable is a windows executable, or a cygwin executable. And the ldd command can do so. And a cygwin executable will be linked to cygwin1.dll. A program like sysinternals process explorer can show what DLLs are linked to a running process e.g. it shows that bash.exe is linked to cygwin1.dll. But the ldd command is more useful here as it shows also for commands that aren't kept open. $ ldd /bin/bash.exe showed some NT related DLLs, but also cygwin1.dll. Whereas $ldd ./w.exe showed just NT related Dlls, no cygwin1.dll.
And they said that this file winsup/cygwin/winf.cc is very relevant to that. I have it on my system https://gist.github.com/gartha1/4a2871b7f22ef85b5c8c0b08674b6f57 I see it has stuff about argv
Some comments, and C guys I conversed with, have indicated to me (as far as I understood them) that Linux has some compiler-specific C runtime libraries. And when people say C runtime libraries, they tend to include POSIX functions like execv, which is technically not part of the C standard but part of the POSIX standard. And the runtime libraries apply before main starts and after main finishes.
I was looking at it from the point of view is, this is the command line, i've typed, and what is then sent to argv, and how. But another perspective is, looking at what's sent to argv, and taking a step back and what's the value at GetCommandLine() that'd produce that. And also I think, looking at the command line typed, and seeing what it sends or would send to GetCommandLine().
The MS C Runtime starts with GetCommandLine() and then parses the result into argv (the Win32 function for that kind of conversion is CommandLineToArgvW()). The documentation at https://learn.microsoft.com/en-us/windows/win32/api/processenv/nf-processenv-getcommandlinea describes "GetCommandLine as an alias which automatically selects the ANSI or Unicode version of this function" and says "To convert the command line to an argv style array of strings, pass the result from GetCommandLineA to CommandLineToArgW." As for the question "What is the difference between the `A` and `W` functions in the Win32 API?", the answer there is "The A functions use Ansi (not ASCII) strings as input and output, and the W functions use Unicode string instead .."
So, what MS C Runtime sees when it does GetCommandLine() is very significant. And I think Cygwin's linux shell e.g. bash , does its parsing.. which is described by info bash and includes "word splitting"(separating arguments), and quote removal.
calc.exe is useful because it stays open so I can look at the command line with WMIC. That's clearer than using w.exe(from cmd), to determine what the command line is.
To use some simple examples with calc.exe trying calling it with command line of
calc "abc"
calc a\a
In the case of calc "abc", what gets into argv in Cygwin and plain cmd is the same. And what gets seen by GetCommandLine() won't really need any adjustment, though cygwin sanitises what it makes available to GetCommandLine() a bit.
Looking with CMD, we see
C:\>w calc abc
w calc abc
C:\>w calc "abc"
w calc "abc"
C:\>w2 calc abc
argv[0] = w2
argv[1] = calc
argv[2] = abc
C:\>w2 calc "abc"
argv[0] = w2
argv[1] = calc
argv[2] = abc
So a value from GetCommandLine() of calc "abc" or calc abc are equivalent
I'm using wmiccalc.bat, which runs the line wmic process where caption="calc.exe" get commandline | calc "abc"
C:\>calc "abc" <ENTER>
C:\>wmiccalc.bat<ENTER>
calc "abc"
Now see what happens if I run calc from cygwin, what the command line is
$ calc "abc" &
$ ./wmiccalc.bat
C:\Windows\System32\calc.exe abc
It is using a slightly sanitised command line that won't change anything in terms of what is sent to argv(what it gives the runtime to send to argv), relative to what the pure cmd call of calc.exe will end up (via the runtime), sending through to argv.
In both cases it'd be the MS C Runtime. That gets run.
What Cygwin did was it took the "abc" and said, well, bash will want abc in argv, so it constructed a command line that (when sent through the MS C Runtime), would/will send abc to argv.
Now let's look at this example
2. calc a\a
This is slightly different to the first example. 'cos not just what is sent (via the MS C runtime), to argv in the cygwin case and the cmd case are different..
What is produced by the MS C Runtime, is different.
Cygwin sends what it wants to send, to produce the output that bash wants produced.
C:\>calc a\a
C:\>wmiccalc.bat
calc a\a
From Windows, that's the command line
And from that command line, The MS C Runtime will send the following to argv
>w2 a\a
argv[0] = w2
argv[1] = a\a
If though an executable in linux gets a command line like a\a , it treats the backslash as an escape character.. so it wouldn't have a\a going to an argv.
$ echo a\a
aa
So if I do
$ calc a\a &
$ ./wmiccalc.bat
C:\Windows\System32\calc.exe aa
So cygwin will use a very different command line.. a command line of aa not a\a
$ ./w2 a\a
argv[0] = ......\w2.exe
argv[1] = aa
And that makes sense, because if we look at CMD, a command line a\a gets what we'd want if having that command line on windows.
>w2 a\a
argv[0] = w2
argv[1] = a\a
>
Whereas a command line of aa i.e. the MS C Runtime seeing a GetCommandLine() result of aa, gets what we'd want in argv if running it from linux or bash
>w2 aa
argv[0] = w2
argv[1] = aa
>
So, if you run the executable on windows plain CMD not cygwin, you get what it should show for Windows.
And if you run the executable from Cygwin, Cygwin's shell (e.g. bash) parses the command line and then constructs the Windows call to the program, giving the MS C Runtime a command line from which it will put the right things into argv, i.e. what a Linux machine would show. So it's not bypassing the MS C Runtime. It's using it cleverly. It's saying "Having parsed the output given to me by the Linux shell (e.g. bash), I know what argv values I want, so I'll put together a command line that takes into account how the MS C Runtime parses things, so as to get the argv values I want".
By the way
One of the comments corrects one of the things I wrote in my question.. I wrote
And from what I understand, on Linux, what's written after the command
at the command line, goes straight from the shell to argv. No external
library doing the parsing. The shell calls a POSIX function called
execv and figures out what the arguments are and passes them to execv
which passes them to the program's argv.
But actually, there's a C Runtime used by compilers on linux.. The POSIX function execv would be considered to be part of that. If somebody didn't want to call it C Runtime, they could call it C/POSIX runtime.
Also some comments to the question helped correct some misconceptions in areas of lack of clarity in the question e.g.
To the question of "And am I correct in thinking that the shell passes the command line or some function of it, to the Runtime, which puts the command line into argv?"
this comment explained how what the shell wants the arguments to be, will eventually get to main(And thus argv). Never going straight there, and not even from the shell straight to the runtime.. From shell to OS to runtime.
"@barlop: The shell passes the command line to the operating system, probably by calling CreateProcess (one of the arguments of that function is the command line). The operating system then creates a new process, which causes the C run-time library to take control. The run-time library will probably call the Windows API function GetCommandLine and will use the returned information to set argc and argv, before it calls main." – Andreas Wenzel
"Consider that the shell (bash?) in Cygwin does its own parsing of the command line before any Windows function is called to launch the application. Since this shell is more compatible to a Linux shell, I'd expect the same outcome, in contrast to the parsing of CMD." – the busybee
Anyhow, I think this addresses what is happening.. How the command line typed into cygwin is transformed to a string seen by GetCommandLine() and gets the result using the MS C Runtime library.
I used two simple examples but they would explain it for the case given in the question too.

Get pre-shebang executable path in MacOS (equivalent to getauxval(AT_EXECFN) )

For the problem described at bash - Detect if a script is being run via shebang or was specified as a command line argument - Unix & Linux Stack Exchange, we need to distinguish between cases when a script is run via shebang and as an argument to the interpreter.
An answer to that question suggests getting the pre-shebang executable name using getauxval(AT_EXECFN) -- which works, but only in Linux.
Since the Pyenv project also officially supports MacOS, we need an equivalent for that if we are to consider that solution.
I've checked Finding current executable's path without /proc/self/exe -- but both _dyld_get_image_name(0) and _NSGetExecutablePath give the post-shebang name. Here's a sample program that I used to do the checking (see the question link above on how it's used; its compilation result needs to be put in place of the python3 Bash script given in that question):
#include <stdio.h>
#include <unistd.h>
/*#include <sys/auxv.h>*/
#include <mach-o/dyld.h>
#include <sys/param.h>
#include <alloca.h>
int main(int argc, char** argv) {
//char *at_execfn = (char*)getauxval(AT_EXECFN);
//const char *at_execfn = _dyld_get_image_name(0);
char *at_execfn = (char*)alloca(MAXPATHLEN);
uint32_t at_execfn_len = MAXPATHLEN;
_NSGetExecutablePath(at_execfn,&at_execfn_len);
printf("original executable: '%s'\n",at_execfn);
for(int i=0; i<argc; i++) {
printf("'%s'\n",argv[i]);
}
execvp("python3",argv);
}
This answer is based on the following assumptions; I'm sure others will vet whether they are true, but to my understanding, they are:
Python scripts will only use the shebang if they are executed directly.
Otherwise, the first command line argument will always be python, python3, or some other variation (python3.x, etc.).
You can already get the path to the original file, which is good because you can read what the shebang says, but you don't yet know whether the shebang was used, right? Python 3.10 offers an appealing solution: sys.orig_argv, which includes all the command line arguments, not just those from the program name forward as you get with normal sys.argv.
However, I'm sure you won't be implementing a 3.10-exclusive feature into pyenv! If that is the case, you can see the older C-API Py_GetArgcArgv, whose docs simply state:
Get the original command line arguments, before Python modified them.
Either way, I think that having the file path so you can read the shebang is the first part of the puzzle. The second part is figuring out if the shebang was actually used, and I think that the answer is in the command line arguments for most cases.

How to pass run time arguments to a function in c through a shell script

I have a shell script which has to take arguments from the command line and pass it to a function in C. I tried to search but didn't find understandable solutions. Kindly help me out.
Should the arguments be passed via an option as a command in the shell script?
I have a main function like this:
int main(int argc, char *argv[])
{
if(argc>1)
{
if(!strcmp(argv[1], "ABC"))
{
}
else if(!strcmp(argv[1], "XYZ"))
{
}
}
}
How to pass the parameters ABC/XYZ from the command line through a shell script which in turn uses a makefile to compile the code?
You cannot meaningfully compare strings with == which is a pointer equality test. You could use strcmp as something like argc>1 && !strcmp(argv[1], "XYZ"). The arguments of main have certain properties, see here.
BTW, main's argc is at least 1. So your test argc==0 is never true. Generally argv[0] is the program name.
However, if you use GNU glibc (e.g. on Linux), it provides several ways for parsing program arguments.
There are conventions and habits regarding program arguments, and you'd better follow them. POSIX specifies getopt(3), but on GNU systems, getopt_long is even more handy.
Be also aware that globbing is done by the shell on Unix-like systems. See glob(7).
(On Windows, things are different, and the command line might be parsed by some startup routine à la crt0)
In practice, you'd better use some system functions for parsing program arguments. Some libraries provide a way for that, e.g. GTK has gtk_init_with_args. Otherwise, if you have it, use getopt_long ...
Look also, for inspiration, into the source code of some free software program. You'll find many of them on github or elsewhere.
How to pass the parameters ABC/XYZ from the command line through a shell script
If you compile your C program into an executable, e.g. /some/path/to/yourexecutable, you just have to run a command like
/some/path/to/yourexecutable ABC
and if the directory /some/path/to/ containing yourexecutable is in your PATH variable, you can simply run yourexecutable ABC. How to set that PATH variable (which you can query using echo $PATH in your Unix shell) is a different question (you could edit some shell startup file, perhaps your $HOME/.bashrc, with a source code editor such as GNU emacs, vim, gedit, etc...; you could run some export PATH=.... command with an appropriate, colon-separated, sequence of directories).
which in turn uses a makefile to compile the code?
Then you should look into that Makefile and you'll know what is the executable file.
You are using and coding on/for Linux, so you should read something about Linux programming (e.g. ALP or something newer; see also intro(2) & syscalls(2)...) and you need to understand more about operating systems (so read Operating Systems: Three Easy Pieces).
See following simple example:
$ cat foo.c
#include <stdio.h>
int main(int argc, char ** argv)
{
int i;
for (i = 0; i < argc; ++i) {
printf("[%d] %s\n", i, argv[i]);
}
return 0;
}
$ gcc foo.c
$ ./a.out foo bar
[0] ./a.out
[1] foo
[2] bar
$

Command line arguments without the hyphen

How can I parse arguments without the hyphen in C?
I.e. virsh install vm
or
git pull origin master
When I tried it out, if there is no - prefix, everything just gets ignored and argc returns 1 (argv[0] is the program call).
I'm using Linux, but it would be nice if there was a cross platform method to achieve this.
UPDATE: the problem was me using a # in front of the first argument, I was trying to pass in #XX eg number_program #12. Needless to say this doesn't work.
Are you using some library to parse the arguments for you? There is no special 'hyphen' arguments when passing in parameters to a C program specifically. Parse argv however you like.
For example:
#include <stdio.h>
int main(int argc, char **argv)
{
int i;
for(i=0; i<argc; i++) {
//dont do this without proper input validation
printf("%s\n", argv[i]);
}
return 0;
}
Example run:
$ ./a.out test test test -hyphen
./a.out
test
test
test
-hyphen
argv contains the program name and the arguments to the program, in the order they were given in the command line.* Hyphens aren't special; they just make it easy for both people and computers to separate options from other args.
If you want to interpret args a certain way, that's your prerogative. That's what git does, basically interpreting argv[1] (if it exists, of course) as the name of a subcommand. And you don't need any libraries in order to do that. You just need to decide how you want the args interpreted.
* Modulo some cross-platform differences in how args are parsed; *nix typically does some pre-parsing for you and expands wildcard patterns, for example. You won't have 100% cross-platform compatibility unless you understand those differences and are ready for them.

running shell script using c programming

Hello everyone. I am writing a program in which I have to run a shell script from a C program. So far I have separated the arguments, and from searching I found that exec should be used to run shell scripts,
but I am totally confused, as there are many variants of exec and from reading the man pages I cannot tell which is most suitable.
Also, in some exec functions the first argument is
path
and in others it is
pointer to file
What is the difference, and what should I write in its place? Kindly guide me.
Thanks.
Running a shell script from a C program is usually done using
#include <stdlib.h>
int system(const char *command);
where command is the command line to run, here the pathname of the script, e.g.
int rc = system ("/home/username/bin/somescript.sh");
If you need the stdout of the script, look at the popen man page.
#include <stdio.h>
#include <stdlib.h>
#define SHELLSCRIPT "\
i=0\n\
while [ \"$i\" -lt 10 ]\n\
do\n\
echo \"Count: $i\"\n\
i=$((i + 1))\n\
done\n\
"
int main(void)
{
puts("Will execute sh with the following script:");
puts(SHELLSCRIPT);
puts("Starting now:");
system(SHELLSCRIPT);
return 0;
}
Reference:
http://www.unix.com/programming/216190-putting-bash-script-c-program.html
All exec* library functions are ultimately convenience wrappers over the execve() system call. Just use the one that you find more convenient.
The ones that end in p (execlp(), execvp()) use the $PATH environment variable to find the program to run. For the others you need to give a path to the program (absolute or relative) as the first argument.
The ones ending in e (execle(), execve()) allow you to define the environment (using the last argument). This way you avoid potential problems with $PATH, $IFS and other dangerous environment variables.
The ones with a v in their name take an array to specify the arguments to the program to run, while the ones with an l take the arguments as variable arguments, ending in (char *)NULL. As an example, execle() is very convenient for constructing a fixed invocation, while execv* allow for a number of arguments that varies programmatically.
