Writing a portable command line wrapper in C - c

I'm writing a perl module called perl5i. Its aim is to fix a swath of common Perl problems in one module (using lots of other modules).
To invoke it on the command line for one liners you'd write: perl -Mperl5i -e 'say "Hello"' I think that's too wordy so I'd like to supply a perl5i wrapper so you can write perl5i -e 'say "Hello"'. I'd also like people to be able to write scripts with #!/usr/bin/perl5i so it must be a compiled C program.
I figured all I had to do was push "-Mperl5i" onto the front of the argument list and call perl. And that's what I tried.
#include <unistd.h>
#include <stdlib.h>
/*
 * Meant to mimic the shell command
 *     exec perl -Mperl5i "$@"
 *
 * This is a C program so it works in a #! line.
 */
int main (int argc, char* argv[]) {
    int i;
    /* This value is set by a program which generates this C file */
    const char* perl_cmd = "/usr/local/perl/5.10.0/bin/perl";
    /* Room for the argc original entries, "-Mperl5i" and the terminating NULL */
    char* perl_args[argc+2];
    perl_args[0] = argv[0];
    perl_args[1] = "-Mperl5i";
    /* i runs up to argc so the NULL stored in argv[argc] is copied as well */
    for( i = 1; i <= argc; i++ ) {
        perl_args[i+1] = argv[i];
    }
    return execv( perl_cmd, perl_args );
}
Windows complicates this approach. Apparently programs in Windows are not passed an array of arguments, they are passed all the arguments as a single string and then do their own parsing! Thus something like perl5i -e "say 'Hello'" becomes perl -Mperl5i -e say 'Hello' and Windows can't deal with the lack of quoting.
So, how can I handle this? Wrap everything in quotes and escapes on Windows? Is there a library to handle this for me? Is there a better approach? Could I just not generate a C program on Windows and write it as a perl wrapper as it doesn't support #! anyway?
UPDATE: To be more clear, this is shipped software so solutions that require using a certain shell or tweaking the shell configuration (for example, alias perl5i='perl -Mperl5i') aren't satisfactory.

For Windows, use a batch file.
perl5i.bat
@echo off
perl -Mperl5i %*
%* is all the command line parameters minus %0.
On Unixy systems, a similar shell script will suffice.
Update:
I think this will work, but I'm no shell wizard and I don't have a *nix system handy to test.
perl5i
#!bash
perl -Mperl5i "$@"
Update Again:
DUH! Now I understand your #! comment correctly. My shell script will work from the CLI but not in a #! line, since #!foo requires that foo is a binary file.
Disregard previous update.
It seems like Windows complicates everything.
I think your best bet there is to use a batch file.
You could use a file association, associate .p5i with perl -Mperl5i %*. Of course this means mucking about in the registry, which is best avoided IMO. Better to include instructions on how to manually add the association in your docs.
Yet another update
You might want to look at how parl does it.

I can't reproduce the behaviour you describe:
/* main.c */
#include <stdio.h>
int main(int argc, char *argv[]) {
    int i;
    for (i = 0; i < argc; i++) {
        printf("%s\n", argv[i]);
    }
    return 0;
}
C:\> ShellCmd.exe a b c
ShellCmd.exe
a
b
c
That's with Visual Studio 2005.

Windows is always the odd case. Personally, I wouldn't try to code for the Windows environment exception. Some alternatives are using "bat wrappers" or ftype/assoc Registry hacks for a file extension.
Windows ignores the shebang line when running from a DOS command shell, but ironically uses it when CGI-ing Perl in Apache for Windows. I got tired of coding #!c:/perl/bin/perl.exe directly in my web programs because of portability issues when moving to a *nix environment. Instead I created a c:\usr\bin directory on my workstation and copied the perl.exe binary from its default location, typically c:\perl\bin for AS Perl and c:\strawberry\perl\bin for Strawberry Perl. So in web development mode on Windows my programs wouldn't break when migrated to a Linux/UNIX webhost, and I could use a standard issue shebang line "#!/usr/bin/perl -w" without having to go SED crazy prior to deployment. :)
In the DOS command shell environment I just either set my PATH explicitly or create a ftype pointing to the actual perl.exe binary with embedded switch -Mperl5i. The shebang line is ignored.
ftype p5i=c:\strawberry\perl\bin\perl.exe -Mperl5i %1 %*
assoc .pl=p5i
Then from the DOS command line you can just call "program.pl" by itself instead of "perl -Mperl5i program.pl"
So the "say" statement worked in 5.10 without any additional coaxing just by entering the name of the Perl program itself, and it would accept a variable number of command line arguments as well.

Use CommandLineToArgvW to build your argv, or just pass your command line directly to CreateProcess.
Of course, this requires a separate Windows-specific solution, but you said you're okay with that, this is relatively simple, and often coding key pieces specifically to the target system helps integration (from the users' POV) significantly. YMMV.
If you want to run the same program both with and without a console, you should read Raymond Chen on the topic.

On Windows, at the system level, the command-line is passed to the launched program as a single UTF-16 string, so any quotes entered in the shell are passed as is. So the double quotes from your example are not removed. This is quite different from the POSIX world where the shell does the job of parsing and the launched program receives an array of strings.
I've described here the behavior at the system level. However, between the system and your C (or Perl) program there is usually the C standard library, which parses the system command line string and hands the result to main() or wmain() as argv[]. This is done inside your process, so you can still access the original command line string with GetCommandLineW() if you really want to control how the parsing is done, or to get the string in its full UTF-16 encoding.
To learn more about the Windows command-line parsing quirks, read the following:
http://www.autohotkey.net/~deleyd/parameters/parameters.htm#WIN
http://blogs.msdn.com/b/oldnewthing/archive/2006/05/15/597984.aspx
You may also be interested in the code of the wrapper I wrote for Padre on Win32: this is a GUI program (which means that it will not open a console if launched from the Start menu) called padre.exe that embeds perl to launch the padre Perl script. It also does a small trick: it changes argv[0] to point to perl.exe so that $^X will be something usable to launch external perl scripts.
The execv you are using in your example code is just an emulation in the C library of the POSIX-like behavior. In particular it will not add quotes around your arguments so that the launched perl works as expected. You have to do that yourself.
Note that because the client is responsible for parsing, each client can do it the way it wants. Many let the libc do it, but not all. So generic command-line generation rules on Windows cannot exist: the rules depend on the program launched.
You may still be interested in a "best effort" implementation such as Win32::ShellQuote.
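To make the "you have to quote it yourself" point concrete, here is a minimal sketch of quoting a single argument so that a program using the Microsoft C runtime parser (or CommandLineToArgvW) reads it back unchanged. The function name quote_windows_arg is hypothetical, and the sketch assumes the target really does use those parsing rules; it does not escape cmd.exe metacharacters such as ^ or &, which matter only when the line passes through the shell.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Append n backslashes at w and return the new write position. */
static char *put_backslashes(char *w, size_t n)
{
    while (n-- > 0)
        *w++ = '\\';
    return w;
}

/*
 * Hypothetical helper: return a malloc'd copy of arg, quoted for the
 * Microsoft C runtime argument parser. The caller frees the result.
 * Returns NULL if memory allocation fails.
 */
char *quote_windows_arg(const char *arg)
{
    assert(arg != NULL);
    size_t len = strlen(arg);

    /* Nothing to protect: copy the argument unchanged. */
    if (len > 0 && strpbrk(arg, " \t\n\v\"") == NULL) {
        char *copy = malloc(len + 1);
        if (copy != NULL)
            memcpy(copy, arg, len + 1);
        return copy;
    }

    /* Worst case every character doubles; add two quotes and the NUL. */
    char *out = malloc(2 * len + 3);
    if (out == NULL)
        return NULL;

    char *w = out;
    *w++ = '"';
    for (const char *p = arg; ; p++) {
        size_t backslashes = 0;
        while (*p == '\\') {
            backslashes++;
            p++;
        }
        if (*p == '\0') {
            /* Double them so the closing quote is not escaped. */
            w = put_backslashes(w, 2 * backslashes);
            break;
        } else if (*p == '"') {
            /* Double them, then add one more to escape the quote. */
            w = put_backslashes(w, 2 * backslashes + 1);
            *w++ = '"';
        } else {
            /* Backslashes not followed by a quote are literal. */
            w = put_backslashes(w, backslashes);
            *w++ = *p;
        }
    }
    *w++ = '"';
    *w = '\0';
    return out;
}
```

A wrapper would quote each element of its argv this way, join the results with spaces, and pass the resulting string to CreateProcess or to the C library's exec emulation.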

If you were able to use C++ then perhaps Boost.Program_options would help:
http://www.boost.org/doc/libs/1_39_0/doc/html/program_options.html

Related

Execute any command-line shell like into execve

In case this is helpful, here's my environment: debian 8, gcc (with std = gnu99).
I am facing the following situation:
In my C program, I get a string (char* via a socket).
This string represents a bash command to execute (like 'ls ls').
This command can be any bash, as it may be complex (pipelines, lists, compound commands, coprocesses, shell function definitions ...).
I can not use system or popen to execute this command, so I use currently execve.
My concern is that I have to "filter" certain commands.
For example, the rm command may only be applied inside the "/home/test/" directory. All other destinations are prohibited.
So I have to prevent the command "rm -r /" but also "ls ls && rm -r /".
So I have to parse the command line that is given to me, find all the commands in it and apply filters to them.
And that's when I'm begin to be really lost.
The command can be of any complexity, so if I want to build pipelines (execve executes one command at a time) or find all the commands in order to apply my filters, I'll have to develop a parser identical to that of sh.
I do not like reinventing the wheel, especially if I make it square.
So I wonder if there is a feature in the C library (or the GNU one) for that.
I have heard of wordexp, but I do not see how to manage pipelines, redirections and so on with it (in fact, it does not seem to be made for this), and I do not see how I can retrieve all the commands inside the command line.
I read the man page of sh(1) to see if I can use it to "parse" but not execute a command, but so far I have found nothing.
Do I need to code a parser from scratch?
Thanks for reading, and I apologize for my bad English: it's not my mother tongue (thanks, Google Translate ...).
Your problem:
I am facing the following situation: In my C program, I get a string
(char* via a socket). This string represents a bash command to execute
(like 'ls ls'). This command can be any bash, as it may be complex
(pipelines, lists, compound commands, coprocesses, shell function
definitions ...).
How do you plan on authenticating who is at the other end of the socket connection?
You need to implement a command parser, with security considerations? Apparently to run commands remotely, as implied by "I get a string (char* via a socket)"?
The real solution:
How to set up SSH without passwords
Your aim
You want to use Linux and OpenSSH to automate your tasks. Therefore
you need an automatic login from host A / user a to Host B / user b.
You don't want to enter any passwords, because you want to call ssh
from a within a shell script.
Seriously.
That's how you solve this problem:
I receive on a socket a string that is a shell command and I have to
execute it. But before execute it, i have to ensure that there is not
a command in conflict with all the rules (like 'rm only inside this
directory, etc etc). For executing the command, I can't use system or
popen (I use execve). The rest is up to me.
Given
And that's when I'm begin to be really lost.
Because what you're being asked to do is implement security features and command parsing. Go look at the amount of code in SSH and bash.
Your OS comes with security features. SSH does authentication.
Don't try to reinvent those. You won't do it well - no one can. Look how long it's taken for bash and SSH to get where they are security-wise. (Hint: it's decades because there's literally decades of history and knowledge that were built into bash and SSH when they were first coded...)

How shell commands execute

I am a newbie and looking for some info.
Thanks in advance.
What is the difference between echo "Hello World!" and a C program which prints "Hello World!" using printf?
How do shell commands get executed? For example, if I give ls it lists all the files in the directory. Is there an executable binary which is run when we enter ls in the shell?
Please let me know if you guys have any links or sources to make this clear.
There are two main types of "commands" that the shell can execute. Built-in commands are executed by the shell itself - no new program is started. Simply typing echo in a shell prompt is an example of such a built-in command.
On the other hand, other commands execute external programs (also called binaries) - and ls is an example of this kind of command.
So, if you run echo in a shell, it's executed by the shell itself, but if you write a C program that performs the same action, it will be run as an external program. As a matter of fact, most Linux systems come with such a binary, located at /bin/echo.
Why does it sometimes make sense to have both a built-in command and a program to accomplish the same task? Built-in commands are faster to execute as there is some cost involved in running an external program. But built-ins have some drawbacks, too: they can't be too complex as this would make the shell big and slow; they can not be upgraded separately from the shell and from each other; finally, there are situations where an external program which is not your shell would like to run an application: it can run external programs but it can't execute shell built-ins directly since it's not the shell. So sometimes it makes sense to have it both ways. Apart from echo, time is another example of this double approach.
The shell is just a user level way of interacting with the operating system, or the kernel. That's one of the reasons it's called a shell. The shell itself (sh, csh, tcsh, ksh, zsh, bash, etc...) is essentially just a binary the operating system executes to allow you to execute other binaries.
It generally gives a lot of other functionality though like built in functions (echo, fg, jobs, etc...), an interpreted language (for x in ..., if then, etc...), command history, and so on...
So the shell binary (or process) interprets any text entered into it (like echo) and runs the corresponding functions in its code. Built-in functions (like echo) don't need to create a new process, but if the text is interpreted as a request to execute a binary (vim, emacs, gcc, test, true, false, etc...) the shell will create a new process for it (unless you prefix it with exec), and execute it.
So, echo "Hello World!" just runs code in the shell (process). A printf("Hello World!") would be in a separate binary that the shell would create a new process for (fork), and have the operating system execute (exec).
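The fork-then-exec sequence described above can be sketched in a few lines of C. This is only an illustration of the idea, assuming a POSIX system; run_external is a made-up name, and a real shell also handles job control, redirections and signals.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run an external program the way a shell does.
 * argv is a NULL-terminated argument array and argv[0] is looked up
 * in PATH. Returns the exit status of the program, or -1 on failure.
 */
int run_external(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();              /* create a new process */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the program. */
        execvp(argv[0], argv);
        _exit(127);                  /* only reached if exec failed */
    }

    /* Parent (the "shell"): wait for the child to finish. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A built-in such as echo skips all of this: the shell simply calls its own code without creating a child process.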

How to run one TCL script from a batch/job file by passing command line arguments?

I am writing scripts in tcl for a project I am working on.
I wanted to automate things as much as possible and wanted to not touch the source code of the script as far as possible. I want to run the main script file from a .bat or .job file sort of thing where I pass the command to execute the script along with the arguments.
I have referred to this post on stackoverflow:
How to run tcl script in other tcl script?
And have done pretty much the same thing. However, since my script is naked code rather than a single huge proc, I don't have the "args" parameter to read the arguments I wanted to pass.
For example, if script1.tcl is the main file containing the naked code, I want a file script2.job or script2.bat such that,
<command-to-run-script1.tcl> <mandatory-args> <optional-args>
is the content of the file.
Any suggestions on how I can implement the same?
To run a Tcl script, passing in some arguments, do:
tclsh script1.tcl theFirstArgument theSecondArgument ...
That's how it works in CMD scripts/BAT files on Windows, and in shell scripts on all Unixes. You might want to put quotes around some of the arguments too, but that's just absolutely normal running of a program with arguments. (The tclsh might need to be tclsh8.5 or tclsh85 or … well, it depends on how it's installed. And script1.tcl might need to be a full path to the script.)
Inside the script, the arguments (starting at theFirstArgument) will appear in the Tcl list in the global argv variable. Note that this is not args, which is a feature of procedures. There are lots of ways of parsing the list of arguments, but any quoting supplied during the call itself should have been already stripped.
Here's a very simple version:
foreach argument $argv {
puts "Oh, I seem to have a >>$argument<<"
}
You probably need something more elaborate! There are many possibilities though, so be sure to say exactly what you need if you want more focussed ideas.
If you're calling Tcl from another Tcl script, you need to use exec to do it. On the other hand, you can make things a bit easier for yourself in other ways:
exec [info nameofexecutable] script1.tcl theFirstArgument theSecondArgument ...
The info nameofexecutable command returns the name of the Tcl interpreter program (often tclsh8.5 or wish86 or …)

How does grep work?

I am trying to understand how grep works.
When I say grep "hello" *.*, does grep get 2 arguments — (1) string to be searched i.e. "hello" and (2) path *.*? Or does the shell convert *.* into something that grep can understand?
Where can I get the source code of grep? I came across this GNU grep link. One of the README files says it's different from Unix grep. How so?
I want to look at source of FreeBSD version of grep and also Linux version of it (if they are different).
The power of grep is the magic of automata theory. The name comes from the ed command g/re/p: globally search for a regular expression and print the matching lines. It works by constructing an automaton (a very simple "virtual machine": not Turing Complete); it then "executes" the automaton against the input stream.
The automaton is a graph or network of nodes or states. The transition between states is determined by the input character under scrutiny. Special automatons like + and * work by having transitions that loop back to themselves. Character classes like [a-z] are represented by a fan: one start node with branches for each character out to the "spokes"; and usually the spokes have a special "epsilon transition" to a single final state so it can be linked up with the next automaton to be built from the regular expression (the search string). The epsilon transitions allow a change of state without moving forward in the string being searched.
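As a rough illustration of "executing" an automaton against input, here is a hand-built machine for the single pattern ab*c, matched against a whole string. This is a toy sketch, not how any real grep is implemented: a real grep builds the state table from the regular expression at run time and searches for the pattern anywhere in each line. The state and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* States of a hand-built automaton for the pattern ab*c. */
enum { S_START, S_AFTER_A, S_ACCEPT, S_DEAD };

/* The transition is chosen by the input character under scrutiny. */
static int next_state(int state, char c)
{
    switch (state) {
    case S_START:
        return c == 'a' ? S_AFTER_A : S_DEAD;
    case S_AFTER_A:
        if (c == 'b')
            return S_AFTER_A;        /* b* loops back to the same state */
        if (c == 'c')
            return S_ACCEPT;
        return S_DEAD;
    default:
        return S_DEAD;               /* nothing may follow the final c */
    }
}

/* Return 1 if the whole string matches ab*c, otherwise 0. */
int matches_ab_star_c(const char *text)
{
    assert(text != NULL);
    int state = S_START;
    for (const char *p = text; *p != '\0'; p++)
        state = next_state(state, *p);
    return state == S_ACCEPT;
}
```

The loop reads each input character exactly once and never backtracks, which is the property that makes the automaton approach fast.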
Edit: It appears I didn't read the question very closely.
When you type a command-line, it is first pre-processed by the shell. The shell performs alias substitutions and filename globbing. After substituting aliases (they're like macros), the shell chops up the command-line into a list of arguments (space-delimited). This argument list is passed to the main() function of the executable command program as an integer count (often called argc) and a pointer to a NULL-terminated ((void *)0) array of nul-terminated ('\0') char arrays.
Individual commands make use of their arguments however they wish. But many Unix programs will print a friendly help message if given the -h argument (since it begins with a minus sign, it's called an option); grep is an exception, since its -h suppresses the file name prefix on output lines. GNU software will also accept a "long-form" option --help.
Since there are a great many differences between different versions of Unix programs the most reliable way to discover the exact syntax that a program requires is to ask the program itself. If that doesn't tell you what you need (or it's too cryptic to understand), you should next check the local manpage (man grep). And for gnu software you can often get even more info from info grep.
The shell does the globbing (conversion from * form to filenames). You can see this with a simple C program:
#include <stdio.h>
int main(int argc, char **argv) {
    for(int i=1; i<argc; i++) {
        printf("%s\n", argv[i]);
    }
    return 0;
}
And then run it like this:
./print_args *
You'll see it prints out what matched, not * literally. If you invoke it like this:
./print_args '*'
You'll see it gets a literal *.
The shell expands the '*.*' into a list of file names and passes the expanded list of file names to the program such as grep. The grep program itself does not do expansion of file names.
So, in answer to your question: grep does not get 2 arguments; the shell converts '*.*' into something grep can understand.
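If you are curious what that expansion step looks like from C, POSIX provides glob(3), which performs the same kind of pattern matching against the file system. The sketch below only counts the matches; count_glob_matches is a made-up name, and an actual shell applies further rules (quoting, options such as set -f) before it gets this far.

```c
#include <assert.h>
#include <glob.h>
#include <stddef.h>

/*
 * Hypothetical helper: expand a wildcard pattern the way a shell would
 * before starting grep, and return how many path names it produced.
 * Returns 0 when nothing matches and -1 on any other error.
 */
long count_glob_matches(const char *pattern)
{
    assert(pattern != NULL);

    glob_t results;
    int rc = glob(pattern, 0, NULL, &results);
    if (rc == GLOB_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;

    /* results.gl_pathv[0 .. gl_pathc-1] holds the expanded names. */
    long count = (long)results.gl_pathc;
    globfree(&results);
    return count;
}
```

A program like grep never needs to call this itself on Unix, because by the time main() runs its argv already contains the expanded names.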
GNU grep is different from Unix grep in supporting extra options, such as -w and -B and -A.
It looks to me like FreeBSD uses the GNU version of grep:
http://svnweb.freebsd.org/base/stable/8/gnu/usr.bin/grep/
How grep sees the wildcard argument depends on your shell. (Standard) Bourne shell has a switch (-f) to disable file name globbing (see man pages).
You may activate this switch in a script with
set -f

cmd- comma to separate parameters Compared to space?

I have some questions on the subject, of commas compared to space, in delimiting parameters.
They are questions that C programmers familiar with the cmd prompt, may be able to throw some light on..
I know that when doing
c:\>program a b c
there are 4 parameters [0]=program [1]=a [2]=b [3]=c
According to hh ntcmds.chm concepts..
Shell overview
; and , are used to separate parameters
; or , command1 parameter1;parameter2 Use to separate command parameters.
I see dir a,b gives the same result as dir a b
but
c:\>program a,b,c
gives parameters [0]=program [1]=a,b,c
So do some? or all? windows commands use ; and , ? and is that interpretation within the code of each command, or done by the shell like with space?
And if it is in the code of each command.. how would I know which do it?
I notice that documentation of explorer.exe mentions the comma,e.g. you can do
explorer /e,.
but DIR /? does not mention it, but can use it. And a typical c program doesn't take , as a delimiter at all.. So is it the case that the shell doesn't use comma to delimit, it uses space. And windows commands that do, do so 'cos they are (all?) written to delimit the parameters the shell has given them further when commas are used?
There are two differences here between Unix and Windows:
Internal commands such as DIR are built into the shell; their command line syntax doesn't have to follow the same rules as for regular programs
On Windows, programs are responsible for parsing their own command lines. The shell parses redirects and pipes, then passes the rest of the command line to the program in one string
Windows C programs built using Visual Studio use the command line parser in the Microsoft C runtime, which is similar to a typical Unix shell parser and obeys spaces and quotation marks.
I've never seen a C program that uses , or ; as a command line separator. I was aware of the special case for explorer /e,., but I'd never seen the dir a,b example until just now.
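To show why program a,b,c arrives as a single argument, here is a simplified sketch of the splitting rules the C runtime applies: only spaces and tabs separate arguments, double quotes group, and backslashes are special only in front of a double quote. The function name split_msvc_cmdline is hypothetical, and the sketch leaves out details of the real runtime, such as the separate handling of the program name and of doubled quotes inside a quoted string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split cmdline into arguments. The unescaped text
 * is written to buf, which must hold at least strlen(cmdline) + 1 bytes.
 * Pointers to the arguments are stored in args; the count is returned.
 */
size_t split_msvc_cmdline(const char *cmdline, char *buf,
                          char *args[], size_t max_args)
{
    assert(cmdline != NULL && buf != NULL && args != NULL);
    const char *p = cmdline;
    char *out = buf;
    size_t count = 0;

    while (*p != '\0') {
        while (*p == ' ' || *p == '\t')      /* skip separators */
            p++;
        if (*p == '\0' || count == max_args)
            break;

        args[count++] = out;
        int in_quotes = 0;
        while (*p != '\0' && (in_quotes || (*p != ' ' && *p != '\t'))) {
            size_t backslashes = 0;
            while (*p == '\\') {
                backslashes++;
                p++;
            }
            if (*p == '"') {
                /* 2n backslashes give n; an odd one escapes the quote. */
                for (size_t i = 0; i < backslashes / 2; i++)
                    *out++ = '\\';
                if (backslashes % 2 == 1)
                    *out++ = '"';
                else
                    in_quotes = !in_quotes;
                p++;
            } else {
                /* Backslashes not followed by a quote are literal. */
                for (size_t i = 0; i < backslashes; i++)
                    *out++ = '\\';
                if (*p != '\0' && (in_quotes || (*p != ' ' && *p != '\t')))
                    *out++ = *p++;
            }
        }
        *out++ = '\0';
    }
    return count;
}
```

Commas and semicolons get no special treatment anywhere in this loop, so a,b,c stays in one piece unless the program splits it further itself.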
Batch files use a comma or semicolon as an alternative argument separator.
Test batch file:
@echo %1/%2/%3
Test run:
> test.cmd 1,2,3
1/2/3
> test.cmd 1;2 3
1/2/3
And, as you note, dir uses it, and copy as well. Those are both shell built-ins and probably run through a parser similar to the one for batch files (it isn't exactly the same, since you can do things like cd.. or dir/s, which aren't possible for anything else). I guess (note: speculation) this is some sort of backwards compatibility that goes back to the DOS or even CP/M days. Nowadays you probably should just use spaces. And as Tim notes, the C runtime dictates certain things about arguments and how they are supposed to be parsed. Many other languages/frameworks follow that convention, but not necessarily all. PowerShell, for example, has completely different argument handling, and this can sometimes be a surprise when interacting with native programs from within it (that being said, PowerShell cmdlets and functions are not programs that can be executed elsewhere, but neither are batch files).
