I have some questions about commas versus spaces as parameter delimiters.
C programmers familiar with the command prompt may be able to shed some light on them.
I know that when doing
c:\>program a b c
there are 4 parameters [0]=program [1]=a [2]=b [3]=c
According to hh ntcmds.chm concepts..
Shell overview
; and , are used to separate parameters
; or , command1 parameter1;parameter2 Use to separate command parameters.
I see dir a,b gives the same result as dir a b
but
c:\>program a,b,c
gives parameters [0]=program [1]=a,b,c
So do some or all Windows commands use ; and ,? And is that interpretation done within the code of each command, or by the shell, as it is with spaces?
And if it is in the code of each command.. how would I know which do it?
I notice that documentation of explorer.exe mentions the comma,e.g. you can do
explorer /e,.
but DIR /? does not mention it, even though DIR accepts it. And a typical C program doesn't take , as a delimiter at all. So is it the case that the shell doesn't use the comma to delimit, only the space, and that the Windows commands that do accept commas do so because they are (all?) written to further split the parameters the shell has given them?
There are two differences here between Unix and Windows:
Internal commands such as DIR are built into the shell; their command line syntax doesn't have to follow the same rules as for regular programs
On Windows, programs are responsible for parsing their own command lines. The shell parses redirects and pipes, then passes the rest of the command line to the program in one string
Windows C programs built using Visual Studio use the command line parser in the Microsoft C runtime, which is similar to a typical Unix shell parser and obeys spaces and quotation marks.
I've never seen a C program that uses , or ; as a command line separator. I was aware of the special case for explorer /e,., but I'd never seen the dir a,b example until just now.
Batch files use a comma or semicolon as an alternative argument separator.
Test batch file:
@echo %1/%2/%3
Test run:
> test.cmd 1,2,3
1/2/3
> test.cmd 1;2 3
1/2/3
And, as you note, dir uses it, and copy as well. Those are both shell built-ins and probably run through a parser similar to the one for batch files (it isn't exactly the same, since you can do things like cd.. or dir/s, which aren't possible for anything else). I guess (note: speculation) this is some sort of backwards compatibility that goes back to the DOS or even CP/M days. Nowadays you should probably just use spaces. And as Tim notes, the C runtime dictates certain things about arguments and how they are supposed to be parsed. Many other languages and frameworks follow that convention, but not necessarily all. PowerShell, for example, has completely different argument handling, and this can sometimes be a surprise when interacting with native programs from within it (that being said, PowerShell cmdlets and functions are not programs that can be executed elsewhere, but neither are batch files).
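The batch-file behaviour shown in the test run above can be modelled with a few lines of Python. This is only a rough sketch of how I understand cmd.exe to split batch parameters, not its actual algorithm: the function name split_batch_params is made up for illustration, and the inclusion of = and tab as delimiters is my assumption (the test above only demonstrates space, comma and semicolon).

```python
# Rough model of how cmd.exe splits the parameters of a batch file.
# Assumption: space, tab, comma, semicolon and equals all act as
# delimiters outside double quotes; quotes are kept in the parameter.
DELIMITERS = " \t,;="


def split_batch_params(tail):
    """Split the text after the batch file name into %1, %2, ... values."""
    params = []
    current = []
    in_quotes = False
    for ch in tail:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)  # the quote stays part of the parameter
        elif ch in DELIMITERS and not in_quotes:
            if current:  # runs of delimiters count as one separator
                params.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        params.append("".join(current))
    return params


if __name__ == "__main__":
    print(split_batch_params("1,2,3"))
    print(split_batch_params("1;2 3"))
```

Both calls give the same three parameters, which matches the 1/2/3 output of test.cmd. A program that reads argv[] from the C runtime would not split on the comma or semicolon.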
Related
I have a small python script:
# args.py
import sys; print(sys.argv)
How can I write a .bat wrapper file that forwards all of the arguments to this script?
To eliminate my shell from the tests, I'm going to invoke it as:
import subprocess
import sys
def test_bat(*args):
    return subprocess.check_output(['args.bat'] + list(args), encoding='ascii')
The obvious choice of batch file
@echo off
python args.py %*
Works for simple cases:
>>> test_bat('a', 'b', 'c')
"['args.py', 'a', 'b', 'c']\n"
>>> test_bat('a', 'b c')
"['args.py', 'a', 'b c']\n"
But rapidly falls apart when tried on arbitrary strings:
>>> test_bat('a b', 'c\n d')
"['args.py', 'a b', 'c']\n" # missing d
>>> test_bat('a', 'b^^^^^c')
"['args.py', 'a', 'b^c']\n" # missing ^^^^
Is it even possible to make a bat file pass on its arguments unmodified?
To prove it's not subprocess causing the issue - try running the above with
def test_py(*args):
    return subprocess.check_output([sys.executable, 'args.py'] + list(args), encoding='ascii')
All of the tests behave as expected
Similar questions:
Get list of passed arguments in Windows batch script (.bat) - does not address lossless forwarding
Redirecting passed arguments to a windows batch file - addresses the same ideas as my question, but incorrectly closed as a duplicate of the above, and with less clear test-cases
Forwarding all batch file parameters to inner command - question does not consider corner-cases, accepted answer does not work for them
In short: There is no robust way to pass arguments through as-is via a batch file, because of how cmd.exe interprets arguments; note that cmd.exe is invariably involved, given that it is the interpreter needed to execute batch files, even if you invoke the batch file using an API that requests no shell involvement.
The problem in a nutshell:
On Windows, invoking an external program requires use of a command line as a single string for technical reasons. Therefore, even using an array-based, shell-free way of invoking an external program requires automated composition of a command line in which the individual arguments are embedded.
E.g., Python's subprocess.check_output() accepts the target executable and its arguments individually, as the elements of an array, as demonstrated in the question.
The target executable is invoked directly, using a command line that was automatically composed behind the scenes, without using the platform's shell as an intermediary (the way that Python's os.system() call does, for instance) - unless it so happens the target executable itself requires that shell as the executing interpreter, as is the case with cmd.exe for batch files.
Composing the command line requires selective double-quoting and escaping of embedded " chars. when embedding the individual arguments; typically that involves:
Using enclosing double-quoting ("..."), but only around arguments that contain whitespace (spaces).
Escaping embedded double quotes as \"
Notably, no other characters trigger double-quoting or individual escaping, even though those characters may have special meaning to a given shell.
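To see the composition rules described above in action, you can ask Python for the command line it would build. A minimal sketch: subprocess.list2cmdline is the helper the subprocess module uses for this on Windows (it is not part of the documented public API, so treat relying on it as an assumption); the wrapper name compose_command_line is mine.

```python
import subprocess


def compose_command_line(argv):
    """Return the single command-line string built from an argument list."""
    # list2cmdline quotes only arguments containing spaces or tabs (or empty
    # ones) and escapes embedded double quotes as \" -- nothing else.
    return subprocess.list2cmdline(argv)


if __name__ == "__main__":
    samples = (
        ["args.bat", "a", "b c"],
        ["args.bat", 'say "hi"'],
        ["args.bat", "a|b", "b^^^^^c"],
    )
    for argv in samples:
        print(compose_command_line(argv))
```

The last sample is the interesting one: a|b and b^^^^^c come out unquoted and unescaped, which is harmless for most programs but not for cmd.exe.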
While this approach works well with most external programs, it does NOT work reliably with batch files:
Unfortunately, cmd.exe doesn't treat the arguments as literals, but interprets them as if you had submitted the batch-file call in an interactive console (Command Prompt).
Combined with how the command line is composed (as described above), this results in many ways that the arguments can be misinterpreted and break the invocation altogether.
The primary problem is that arguments that end up unquoted in the command line that cmd.exe sees may break the invocation, namely if they contain characters such as &, |, > or <.
Even if the invocation doesn't break, characters such as ^ may get misinterpreted.
See below for specific examples of problematic arguments.
Trying to work around the problem on the calling side with embedded quoting - e.g., using '"^^^^^"' as an argument in Python - does not work, because most languages, including Python, use \" to escape " characters behind the scenes, which cmd.exe does not recognize (it only recognizes "").
Hypothetically, you could painstakingly ^-escape individual characters in whitespace-free arguments, but not only is that quite cumbersome, it still wouldn't address all issues - see below.
Jeb's answer commendably addresses some of these issues inside the batch file, but it is quite complex and it too cannot address all issues - see next point.
There is no way to work around the following fundamental restrictions:
cmd.exe fundamentally cannot handle arguments with embedded newlines (line breaks):
Parsing the argument list simply stops at the first newline encountered.
CR (0xD) chars. in isolation are quietly removed.
The interpretation of % as part of an environment-variable reference (e.g., %OS%) cannot be suppressed:
%% does NOT help, because, curiously and unfortunately, the parsing rules of an interactive cmd.exe session apply(!), where the only way to suppress expansion is to employ the "variable-name disrupter trick", e.g., %^OS%, which only works in unquoted arguments - in double-quoted arguments, you fundamentally cannot prevent expansion.
You're lucky if the env. variable happens not to exist; the token is then left alone (e.g., %NoSuchVar% or %No Such Var%; note that cmd.exe does support variable names with spaces).
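Since these problems cannot be fixed after the fact, one defensive option is to check arguments on the calling side before handing them to a batch file. The following is a minimal sketch based only on the restrictions listed in this answer; the function name batch_arg_risks and the wording of the messages are made up, and the quoting test assumes the caller composes the command line the way described earlier (double quotes only around arguments containing whitespace).

```python
def batch_arg_risks(arg):
    """Return a list of reasons why `arg` may not survive a batch-file call."""
    risks = []
    if "\n" in arg:
        risks.append("newline: parsing of the argument list stops there")
    if "\r" in arg:
        risks.append("CR: quietly removed")
    if "%" in arg:
        risks.append("%: may be expanded as an environment-variable reference")
    if '"' in arg:
        risks.append('double quote: passed as \\", which cmd.exe does not recognize')
    # Assumption: the caller double-quotes only arguments with whitespace.
    will_be_quoted = " " in arg or "\t" in arg or arg == ""
    if not will_be_quoted and any(ch in arg for ch in "&|<>^"):
        risks.append("unquoted metacharacter: one of & | < > ^")
    return risks


if __name__ == "__main__":
    for sample in ("abc", "a & b", "a|b", "c\n d", "%OS%"):
        print(repr(sample), batch_arg_risks(sample))
```

An empty list does not prove an argument is safe; it only means none of the cases named here were detected.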
Examples of whitespace-free arguments that either break batch-file invocation or result in unwanted alteration of the value:
^^^^^
^ in unquoted strings is cmd.exe's escape character that escapes the next character, i.e., treats it as a literal; ^^ therefore represents a literal, single ^, so the above yields ^^, with the last ^ getting discarded
a|b
| separates commands in a pipeline, so cmd.exe will attempt to pipe the part of the command line before | to a command named b and the invocation will most likely break or, perhaps worse, will not work as intended and execute something it shouldn't.
To make this work, you'd need to define the argument as 'a^^^|b' (sic) on the Python side.
Note that a & b would not be affected, because the embedded whitespace would trigger double-quoting on the Python side, and use of & inside "..." is safe.
Other characters that pose similar problems are & < >
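The caret arithmetic in these examples is easier to follow with a small simulation. This is a simplified model of a single cmd.exe parsing pass over a string, written for illustration only (the function name cmd_caret_pass is made up, and real cmd.exe parsing has more phases than this): outside double quotes, ^ makes the next character literal and is itself dropped, and a trailing ^ with nothing after it disappears.

```python
def cmd_caret_pass(text):
    """Apply one simplified cmd.exe caret-escape pass to `text`."""
    out = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "^" and not in_quotes:
            i += 1  # drop the caret itself
            if i < len(text):
                out.append(text[i])  # the escaped character is kept literally
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    once = cmd_caret_pass("b^^^^^c")
    twice = cmd_caret_pass(once)
    print(once, twice)
```

If you assume an argument goes through two such passes on its way to the target program (once when cmd.exe parses the batch-file call, and once more when %* is expanded inside the batch file), the model reproduces both the b^c result from the question and the reason 'a^^^|b' needs three carets.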
Interesting question, but it's tricky.
The main problem is that %* can't be used here, as it modifies the content or fails completely, depending on the content.
To get the unmodified argv, you should use a technique like the one in Get list of passed arguments in Windows batch script (.bat).
@echo off
SETLOCAL DisableDelayedExpansion
SETLOCAL
for %%a in (1) do (
    set "prompt=$_"
    echo on
    for %%b in (1) do rem * #%*#
    @echo off
) > argv.txt
ENDLOCAL
for /F "delims=" %%L in (argv.txt) do (
    set "argv=%%L"
)
SETLOCAL EnableDelayedExpansion
set "argv=!argv:*#=!"
set "argv=!argv:~0,-2!"
REM argv now contains the unmodified content of %* .
c:\dev\Python35-32\python.exe args.py !argv!
This can be used to build a wrapper with limitations.
Carriage returns can't be fetched at all.
Line feeds currently can not be fetched in a safe way
I have one C program and one shell script, and I'd like to "source" the shell script from my C program.
I tried using the system() function. With it I can run the script properly, but my colors don't work.
For example instead of CYAN - I defined it as:
CYAN='\e[96m'
it shows only \e[96m and some functions just failed with message:
./myscript.sh: 27: [: y: unexpected operator
Is there some solution?
A program that is not itself the shell cannot "source" a file of shell commands as the shell itself can do. A program can run such a file as a script, either directly or by invoking a shell to run it, but the script then gets its own environment, and any changes it applies to that environment do not propagate to the parent process's environment.
Programs receive their environment as a function of program startup. If you want a variable to be set in a program's environment then by far the easiest thing to do is arrange for it to be set when the program is invoked, either by exporting it from the parent process's environment or by wrapping program launch in a script that arranges for the same. There are additional alternatives on the process startup side, as well.
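The one-way nature of environment inheritance is easy to check. Here is a small sketch, in Python rather than C purely for brevity, that starts child processes with subprocess; the variable name DEMO_COLOR and the helper names are made up for the demo.

```python
import os
import subprocess
import sys

VAR = "DEMO_COLOR"  # hypothetical variable name, used only for this demo


def run_child(code):
    """Run a short Python snippet in a child process and return its stdout."""
    return subprocess.check_output([sys.executable, "-c", code], text=True).strip()


def demo():
    os.environ.pop(VAR, None)
    # 1. The child changes its own environment; the parent never sees it.
    run_child(f"import os; os.environ[{VAR!r}] = 'cyan'")
    seen_by_parent = os.environ.get(VAR)
    # 2. The parent sets the variable before launch; the child inherits it.
    os.environ[VAR] = "cyan"
    seen_by_child = run_child(f"import os; print(os.environ.get({VAR!r}))")
    return seen_by_parent, seen_by_child


if __name__ == "__main__":
    print(demo())
```

The first element of the result stays None because the child's assignment dies with the child, which is the same reason a script run through system() cannot set variables for the calling program.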
If a C program wants to alter its environment after startup, then it can use the setenv() and unsetenv() functions. Those are defined by POSIX, not C itself, but if we're talking about sourcing shell commands then it seems reasonable to assume a POSIX context.
Additionally, if you are trying to define CYAN as a shell variable whose contents are an ANSI escape sequence, then your syntax is wrong. No escape sequences at all are recognized within ordinary single quotes (even closing single quote cannot be escaped). Within double quotes the backslash does function as an escape character, but in a strict sense: C-style character codes are not supported there. If, again, you're processing that in the shell, as opposed to in C, then you appear to want
CYAN=$'\e[96m'
(Note the $, which is essential for \e to be recognized as representing the "escape" character, and which causes the shell to recognize a few other C-style escape sequences as well.)
I'm not that familiar with how it's done in Linux, but I understand that on Windows a program can either call the Win32 function GetCommandLine() and see the whole command line, or use argv[], in which case the C compiler's runtime library takes the output of GetCommandLine() and breaks it into parameters itself, so the program sees each parameter.
When an internal command is called, it could be one that takes parameters or one that doesn't. ECHO doesn't. Maybe ECHO uses GetCommandLine()?
But let's say the command isn't echo, but one that parses parameters, a command like DEL or DIR. Those commands can use ; as a delimiter.
C:\blah>DEL a;b;c<ENTER>
If it were a C program that used argv[] then ; would not be interpreted as a delimiter, only space would.
Is there any known mechanism that it is using to parse the command line, that uses ; as a delimiter?
Do we have the algorithm? or the library that it uses to parse?
I'm implementing a Linux shell for my weekend assignment and I am having some problems implementing wildcard matching as a feature in the shell. As we all know, shells are a complete language by themselves, e.g. bash, ksh, etc. I don't need to implement the complete features like control structures, jobs etc. But how to implement the *?
A quick analysis gives you the following result:
echo *
lists all the files in the current directory. Is this the only logical manifestation of the shell? I mean, not considering the language-specific features of bash, is this what a shell does, internally? Replace a * with all the files in the current directory matching the pattern?
Also, I have heard about Perl Compatible Regular Expressions, but it seems too complex to use a third-party library.
Any suggestions, links, etc.? I will try to look at the source code as well, for bash.
This is called "globbing" and the function performing this is named the same: glob(3)
Yes, that's what the shell does. It replaces a word containing '*' with the names of all matching files and folders in the cwd. The patterns are not true regular expressions but a much simpler wildcard syntax, built mainly from '?' (any single character) and '*' (any run of characters), and matched against the file and folder names in the cwd.
Note that a backslashed \* and a '*' enclosed in single or double quotes (' or ") are not replaced (the backslash and quotes are removed before the word is passed to the executed command).
If you want more control than glob gives, the standard function fnmatch performs just glob matching.
Note that shells also perform other word expansions (e.g. tilde expansion, "~" → "/home/user"), which should be done before glob expansion if you're doing filename matching manually. (Or use wordexp.)
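To make the expansion step concrete, here is a small sketch of what a shell does with one word, written in Python with its fnmatch module standing in for the C fnmatch function. The function name expand_word is made up, the list of names is passed in instead of being read from the directory so the example stays self-contained, and the two conventions it follows (names starting with a dot are skipped unless the pattern starts with a dot, and a pattern with no match is passed through unchanged) are how I understand POSIX shells to behave by default.

```python
import fnmatch


def expand_word(word, names):
    """Expand one command-line word against a list of directory entries."""
    if not any(ch in word for ch in "*?["):
        return [word]  # no wildcard characters: nothing to expand
    matches = sorted(
        name
        for name in names
        if fnmatch.fnmatchcase(name, word)
        and (word.startswith(".") or not name.startswith("."))
    )
    # With no match, the word is handed to the command as typed.
    return matches or [word]


if __name__ == "__main__":
    entries = ["b.c", "a.txt", ".hidden", "notes"]
    for word in ("*", "*.c", "?otes", "*.zip"):
        print(word, "->", expand_word(word, entries))
```

In a real shell you would call this only on words that were not quoted or backslash-escaped, and replace the word in the argument list with all of the returned names before calling exec.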
I'm writing a perl module called perl5i. Its aim is to fix a swath of common Perl problems in one module (using lots of other modules).
To invoke it on the command line for one liners you'd write: perl -Mperl5i -e 'say "Hello"' I think that's too wordy so I'd like to supply a perl5i wrapper so you can write perl5i -e 'say "Hello"'. I'd also like people to be able to write scripts with #!/usr/bin/perl5i so it must be a compiled C program.
I figured all I had to do was push "-Mperl5i" onto the front of the argument list and call perl. And that's what I tried.
#include <unistd.h>
#include <stdlib.h>
/*
* Meant to mimic the shell command
* exec perl -Mperl5i "$@"
*
* This is a C program so it works in a #! line.
*/
int main (int argc, char* argv[]) {
    int i;
    /* This value is set by a program which generates this C file */
    const char* perl_cmd = "/usr/local/perl/5.10.0/bin/perl";
    /* Room for the argc original entries, "-Mperl5i" and the terminating NULL */
    char* perl_args[argc+2];
    perl_args[0] = argv[0];
    perl_args[1] = "-Mperl5i";
    /* i runs up to argc so that the NULL terminator argv[argc] is copied too */
    for( i = 1; i <= argc; i++ ) {
        perl_args[i+1] = argv[i];
    }
    return execv( perl_cmd, perl_args );
}
Windows complicates this approach. Apparently programs in Windows are not passed an array of arguments, they are passed all the arguments as a single string and then do their own parsing! Thus something like perl5i -e "say 'Hello'" becomes perl -Mperl5i -e say 'Hello' and Windows can't deal with the lack of quoting.
So, how can I handle this? Wrap everything in quotes and escapes on Windows? Is there a library to handle this for me? Is there a better approach? Could I just not generate a C program on Windows and write it as a perl wrapper as it doesn't support #! anyway?
UPDATE: To be more clear, this is shipped software so solutions that require using a certain shell or tweaking the shell configuration (for example, alias perl5i='perl -Mperl5i') aren't satisfactory.
For Windows, use a batch file.
perl5i.bat
@echo off
perl -Mperl5i %*
%* is all the command line parameters minus %0.
On Unixy systems, a similar shell script will suffice.
Update:
I think this will work, but I'm no shell wizard and I don't have an *nix system handy to test.
perl5i
#!bash
perl -Mperl5i "$@"
Update Again:
DUH! Now I understand your #! comment correctly. My shell script will work from the CLI but not in a #! line, since #!foo requires that foo is a binary file.
Disregard previous update.
It seems like Windows complicates everything.
I think your best bet there is to use a batch file.
You could use a file association, associate .p5i with perl -Mperl5i %*. Of course this means mucking about in the registry, which is best avoided IMO. Better to include instructions on how to manually add the association in your docs.
Yet another update
You might want to look at how parl does it.
I can't reproduce the behaviour you describe:
/* main.c */
#include <stdio.h>
int main(int argc, char *argv[]) {
    int i;
    for (i = 0; i < argc; i++) {
        printf("%s\n", argv[i]);
    }
    return 0;
}
C:\> ShellCmd.exe a b c
ShellCmd.exe
a
b
c
That's with Visual Studio 2005.
Windows is always the odd case. Personally, I wouldn't try to code for the Windows environment exception. Some alternatives are using "bat wrappers" or ftype/assoc Registry hacks for a file extension.
Windows ignores the shebang line when running from a DOS command shell, but ironically uses it when CGI-ing Perl in Apache for Windows. I got tired of coding #!c:/perl/bin/perl.exe directly in my web programs because of portability issues when moving to a *nix environment. Instead I created a c:\usr\bin directory on my workstation and copied the perl.exe binary from its default location, typically c:\perl\bin for AS Perl and c:\strawberry\perl\bin for Strawberry Perl. So in web development mode on Windows my programs wouldn't break when migrated to a Linux/UNIX webhost, and I could use a standard issue shebang line "#!/usr/bin/perl -w" without having to go SED crazy prior to deployment. :)
In the DOS command shell environment I just either set my PATH explicitly or create a ftype pointing to the actual perl.exe binary with embedded switch -Mperl5i. The shebang line is ignored.
ftype p5i=c:\strawberry\perl\bin\perl.exe -Mperl5i %1 %*
assoc .pl=p5i
Then from the DOS command line you can just call "program.pl" by itself instead of "perl -Mperl5i program.pl"
So the "say" statement worked in 5.10 without any additional coaxing just by entering the name of the Perl program itself, and it would accept a variable number of command line arguments as well.
Use CommandLineToArgvW to build your argv, or just pass your command line directly to CreateProcess.
Of course, this requires a separate Windows-specific solution, but you said you're okay with that, this is relatively simple, and often coding key pieces specifically to the target system helps integration (from the users' POV) significantly. YMMV.
If you want to run the same program both with and without a console, you should read Raymond Chen on the topic.
On Windows, at the system level, the command-line is passed to the launched program as a single UTF-16 string, so any quotes entered in the shell are passed as is. So the double quotes from your example are not removed. This is quite different from the POSIX world where the shell does the job of parsing and the launched program receives an array of strings.
I've described here the behavior at the system level. However, between your C (or your Perl) program there is usually the C standard library that is parsing the system command line string to give it to main() or wmain() as argv[]. This is done inside your process, but you can still access the original command line string with GetCommandLineW() if you really want to control how the parsing is done, or get the string in its full UTF-16 encoding.
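To illustrate the in-process parsing step, here is a sketch (in Python, for readability) of the splitting rules as I understand the Microsoft C runtime to apply them: whitespace separates arguments outside double quotes, 2n backslashes before a " give n backslashes and toggle quoting, 2n+1 backslashes before a " give n backslashes and a literal ", and backslashes anywhere else are literal. The function name split_msvc is made up, and the sketch deliberately ignores two details of the real runtime: the special handling of the program name, and the "" sequence inside a quoted region.

```python
def split_msvc(cmdline):
    """Split a Windows command-line string roughly the way MSVC's CRT does."""
    args, current = [], []
    in_quotes = False
    have_arg = False  # lets an empty "" still count as an argument
    i, n = 0, len(cmdline)
    while i < n:
        ch = cmdline[i]
        if ch == "\\":
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                current.append("\\" * (count // 2))
                if count % 2:
                    current.append('"')  # odd count: the quote is literal
                else:
                    in_quotes = not in_quotes  # even count: the quote toggles
                i = j + 1
            else:
                current.append("\\" * count)  # backslashes stay as they are
                i = j
            have_arg = True
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


if __name__ == "__main__":
    print(split_msvc("perl -Mperl5i -e \"say 'Hello'\""))
    print(split_msvc("perl -Mperl5i -e say 'Hello'"))
```

The second call shows what happens when a wrapper rebuilds the command line without re-adding the quotes: say and 'Hello' arrive as two separate arguments, because single quotes mean nothing to this parser.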
To learn more about the Windows command-line parsing quirks, read the following:
http://www.autohotkey.net/~deleyd/parameters/parameters.htm#WIN
http://blogs.msdn.com/b/oldnewthing/archive/2006/05/15/597984.aspx
You may also be interested by the code of the wrapper I wrote for Padre on Win32: this is a GUI program (which means that it will not open a console if launched from the Start menu) called padre.exe that embeds perl to launch the padre Perl script. It also does a small trick: it changes argv[0] to point it to perl.exe so that $^X will be something usable to launch external perl scripts.
The execv you are using in your example code is just an emulation in the C library of the POSIX-like behavior. In particular it will not add quotes around your arguments so that the launched perl works as expected. You have to do that yourself.
Note that because the client is responsible for parsing, each client can do it the way it wants. Many let the libc do it, but not all. So generic command-line generation rules on Windows cannot exist: the rules depend on the program launched.
You may still be interested in a "best effort" implementation such as Win32::ShellQuote.
If you were able to use C++ then perhaps Boost.Program_options would help:
http://www.boost.org/doc/libs/1_39_0/doc/html/program_options.html