How do I perform a string replace in batch on %* arguments? - batch-file

I would like to perform a string replace on the arguments given to my batch file, for example, converting the backslashes in %~dp0 to forward slashes.
I have tried %~dp0:\=/ and %~dp0:\=/%, but neither replaces the slashes. The only working solution I have found is messy:
set dir=%~dp0
set dir=%dir:\=/%
Is there a better way to do it without setting a separate variable and hopefully in a single line?

Short answer - No.
String manipulation (substring and search-and-replace syntax such as %var:\=/%) works only on environment variables, not on parsed arguments (%1) or for loop tokens (%%i).
It is generally considered good practice to save your parsed arguments into named variables anyway, for two reasons:
Readability - transferring your arguments into named variables makes your code easier to debug and understand after you have written it.
Overwriting - inside a subroutine invoked with call :function param1 param2, %1, %2, ... refer to the subroutine's own parameters, so the script's original argument values are not accessible within that scope.

Related

How to safely pass an arbitrary text as parameter to a program in a shell script?

I'm writing a GUI application for character recognition that uses Tesseract. I want to allow the user to specify a custom shell command to be executed with /bin/sh -c when the text is ready.
The problem is the recognized text can contain literally anything, for example && rm -rf some_dir.
My first thought was to do what many other programs do: the user types the command in a text entry, and special strings in the command (like printf() format specifiers; in my case, %t) are replaced by the appropriate data. Then the whole string is passed to execvp(). qBittorrent, for example, works this way.
The problem is that even if I properly escape the text before replacing %t, nothing prevents the user from adding extra quotes around the specifier:
echo '%t' >> history.txt
So the full command to be executed is:
echo ''&& rm -rf some_dir'' >> history.txt
Obviously, that's a bad idea.
The second option is to let the user choose only an executable (with a file selection dialog), so I can pass the text from Tesseract directly as argv[1] to execvp(). The idea is that the executable can be a script in which users put anything they want and access the text with "$1". That way, command injection should not be possible (I think). Here's an example script a user could create:
#!/bin/sh
echo "$1" >> history.txt
Are there any pitfalls with this approach? Or is there a better way to safely pass arbitrary text as a parameter to a program in a shell script?
In-Band: Escaping Arbitrary Data In An Unquoted Context
Don't do this. See the "Out-Of-Band" section below.
To make an arbitrary C string (containing no NULs) evaluate to itself when used in an unquoted context in a strictly POSIX-compliant shell, you can use the following steps:
Prepend a ' (moving from the required initial unquoted context to a single-quoted context).
Replace each literal ' within the data with the string '"'"'. These characters work as follows:
' closes the initial single-quoted context.
" enters a double-quoted context.
' is, in a double-quoted context, literal.
" closes the double-quoted context.
' re-enters single-quoted context.
Append a ' (closing the single-quoted context and returning to the required initial unquoted context).
This works correctly in a POSIX-compliant shell because the only character that is not literal inside of a single-quoted context is '; even backslashes are parsed as literal in that context.
However, this only works correctly when the sigils (such as %t) are used only in an unquoted context (which puts the onus on your users to get things right), and when the shell is strictly POSIX-compliant. Also, in the worst case the string generated by this transform can be up to 5x longer than the original (each ' becomes five characters), so be careful about how the memory for the transform is allocated.
(One might ask why '"'"' is advised instead of '\''; this is because backslashes change their meaning when used inside legacy backtick command substitution syntax, so the longer form is more robust.)
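As a rough illustration, here is the in-band transform above written in Python (chosen only for brevity; the answer itself is about C strings). The function name sh_single_quote is hypothetical. Python's standard shlex.quote() uses the same '"'"' replacement, and shlex.split() follows POSIX quoting rules, so the example uses it to check that the quoted form parses back to the original text. This is still the approach the answer advises against.

```python
import shlex


def sh_single_quote(data: str) -> str:
    """Escape arbitrary text for an *unquoted* context in a POSIX shell.

    1. prepend '  (enter a single-quoted context)
    2. replace each literal ' with '"'"'
    3. append '   (close the single quotes, back to the unquoted context)
    """
    if "\0" in data:
        # C strings (and therefore argv/env entries) cannot contain NUL.
        raise ValueError("data must not contain NUL characters")
    return "'" + data.replace("'", "'\"'\"'") + "'"


if __name__ == "__main__":
    payload = "it's && rm -rf some_dir"
    quoted = sh_single_quote(payload)
    print(quoted)  # 'it'"'"'s && rm -rf some_dir'
    # POSIX-style parsing yields exactly one word equal to the original text.
    assert shlex.split(quoted) == [payload]
```

Note that this only helps if the placeholder sits in an unquoted position; the '%t' example from the question still breaks it.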
Out-Of-Band: Environment Variables, Or Command-Line Arguments
Data should only be passed out-of-band from code, such that it's never run through the parser at all. When invoking a shell, there are two straightforward ways to do this (other than using files): Environment variables, and command-line arguments.
In both of the mechanisms below, only user_provided_shell_script needs to be trusted (though it must also be trusted not to introduce new vulnerabilities of its own; invoking eval or any moral equivalent voids all guarantees, but that's the user's problem, not yours).
Using Environment Variables
Excluding error handling (if setenv() returns a nonzero result, this should be treated as an error, and perror() or similar should be used to report to the user), this will look like:
setenv("torrent_name", torrent_name_str, 1);
setenv("torrent_category", torrent_category_str, 1);
setenv("save_path", path_str, 1);
/* the shell script should use "$torrent_name", etc. */
system(user_provided_shell_script);
A few notes:
While values can be arbitrary C strings, it's important that the variable names be restricted -- either hardcoded constants as above, or prefixed with a constant (lowercase 7-bit ASCII) string and tested to contain only characters which are permissible shell variable names. (A lower-case prefix is advised because POSIX-compliant shells use only all-caps names for variables that modify their own behavior; see the POSIX spec on environment variables, particularly the note that "The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities").
Environment space is a limited resource: on modern Linux, environment variables and command-line arguments share a combined limit (by default a quarter of the stack size limit, typically 2 MiB), and any single argument or environment string is limited to 128 KiB. Setting large environment variables can therefore cause execve()-family calls with large command lines to fail, so validating that lengths are within reasonable domain-specific limits is wise.
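The two notes above (restricted, prefixed variable names and bounded value sizes) can be sketched in Python as follows. This is not the answer's C code: the ocr_ prefix, MAX_VALUE_BYTES, and the build_env/run_hook helpers are all hypothetical choices for illustration.

```python
import os
import re
import subprocess

# Hypothetical constant lowercase prefix (lowercase names are reserved for
# applications, so they cannot change the shell's own behaviour).
PREFIX = "ocr_"
NAME_RE = re.compile(r"[a-z_][a-z0-9_]*\Z")
# Hypothetical domain-specific cap; choose whatever suits your data.
MAX_VALUE_BYTES = 64 * 1024


def build_env(values: dict[str, str]) -> dict[str, str]:
    """Copy os.environ and add PREFIX-ed, validated entries."""
    env = dict(os.environ)
    for key, value in values.items():
        name = PREFIX + key
        if not NAME_RE.match(name):
            raise ValueError(f"invalid variable name: {name!r}")
        if "\0" in value:
            raise ValueError(f"{name}: value contains NUL")
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            raise ValueError(f"{name}: value too long")
        env[name] = value
    return env


def run_hook(user_script: str, values: dict[str, str]) -> int:
    """Run the user's script; the data reaches it only via the environment."""
    return subprocess.run(["/bin/sh", "-c", user_script],
                          env=build_env(values)).returncode
```

A user script would then read the text as "$ocr_text", e.g. run_hook(r'printf "%s\n" "$ocr_text" >> history.txt', {"text": recognized_text}), where recognized_text is whatever Tesseract returned.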
Using Command-Line Arguments:
This version requires an explicit API, such that the user configuring the trigger command knows which value will be passed in $1, which will be passed in $2, etc.
/* You'll need to do the usual fork() before this, and the usual waitpid() after
 * if you want to let it complete before proceeding.
 * Lots of Q&A entries on the site already show the context.
 */
execl("/bin/sh", "sh", "-c", user_provided_shell_script, /* "sh" is the shell's own argv[0] */
      "sh",                  /* this is $0 in the script */
      torrent_name_str,      /* this is $1 in the script */
      torrent_category_str,  /* this is $2 in the script */
      path_str,              /* this is $3 in the script */
      (char *)NULL);         /* execl() needs a NULL pointer terminator */
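The same argument layout can be sketched in Python, where subprocess runs /bin/sh directly with an explicit argument list instead of a command string. hook_argv and run_hook_args are hypothetical helper names; the word after the script becomes $0, and the data words become $1, $2, ..., so they are never parsed as shell code.

```python
import subprocess


def hook_argv(user_script: str, *data: str) -> list[str]:
    """Build the argv for: sh -c SCRIPT sh DATA1 DATA2 ..."""
    # "sh" after the script is $0; each element of data becomes $1, $2, ...
    return ["/bin/sh", "-c", user_script, "sh", *data]


def run_hook_args(user_script: str, *data: str) -> int:
    # With a list and no shell=True, subprocess executes /bin/sh itself with
    # this exact argv, so no extra shell re-parses the data words.
    return subprocess.run(hook_argv(user_script, *data)).returncode
```

For the script from the question this would be run_hook_args('echo "$1" >> history.txt', recognized_text); the script must still quote "$1" itself.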
Any time you're running commands where user input could possibly make its way into them, you must escape for the shell context.
There's no built-in function in C to do this, so you're on your own, but the basic idea is to either render user parameters as properly escaped strings or pass them as separate arguments to an execution function (e.g., the exec family).

Perfectly forwarding arguments in batch

I have a small python script:
# args.py
import sys; print(sys.argv)
How can I write a .bat wrapper file that forwards all of the arguments to this script?
To eliminate my shell from the tests, I'm going to invoke it as:
import subprocess
import sys

def test_bat(*args):
    return subprocess.check_output(['args.bat'] + list(args), encoding='ascii')
The obvious choice of batch file
@echo off
python args.py %*
Works for simple cases:
>>> test_bat('a', 'b', 'c')
"['args.py', 'a', 'b', 'c']\n"
>>> test_bat('a', 'b c')
"['args.py', 'a', 'b c']\n"
But rapidly falls apart when tried on arbitrary strings:
>>> test_bat('a b', 'c\n d')
"['args.py', 'a b', 'c']\n" # missing d
>>> test_bat('a', 'b^^^^^c')
"['args.py', 'a', 'b^c']\n" # missing ^^^^
Is it even possible to make a bat file pass on its arguments unmodified?
To prove it's not subprocess causing the issue - try running the above with
def test_py(*args):
    return subprocess.check_output([sys.executable, 'args.py'] + list(args), encoding='ascii')
All of the tests behave as expected.
Similar questions:
Get list of passed arguments in Windows batch script (.bat) - does not address lossless forwarding
Redirecting passed arguments to a windows batch file - addresses the same ideas as my question, but incorrectly closed as a duplicate of the above, and with less clear test-cases
Forwarding all batch file parameters to inner command - question does not consider corner-cases, accepted answer does not work for them
In short: There is no robust way to pass arguments through as-is via a batch file, because of how cmd.exe interprets arguments; note that cmd.exe is invariably involved, given that it is the interpreter needed to execute batch files, even if you invoke the batch file using an API that requests no shell involvement.
The problem in a nutshell:
On Windows, invoking an external program requires use of a command line as a single string for technical reasons. Therefore, even using an array-based, shell-free way of invoking an external program requires automated composition of a command line in which the individual arguments are embedded.
E.g., Python's subprocess.check_output() accepts the target executable and its arguments individually, as the elements of an array, as demonstrated in the question.
The target executable is invoked directly, using a command line that was automatically composed behind the scenes, without using the platform's shell as an intermediary (the way that Python's os.system() call does, for instance) - unless it so happens the target executable itself requires that shell as the executing interpreter, as is the case with cmd.exe for batch files.
Composing the command line requires selective double-quoting of the individual arguments and escaping of embedded " characters; typically that involves:
Using enclosing double-quoting ("..."), but only around arguments that contain whitespace (spaces).
Escaping embedded double quotes as \"
Notably, no other characters trigger double-quoting or individual escaping, even though those characters may have special meaning to a given shell.
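To see these composition rules in action, the Python sketch below calls subprocess.list2cmdline(), the helper CPython uses to build the Windows command line from an argument list. It is an internal, undocumented function, so treat this as an illustration rather than a supported API. Only the argument containing a space is wrapped in double quotes and the embedded " is escaped as \"; a|b and ^^^^^ pass through untouched, which is exactly what cmd.exe then misinterprets.

```python
import subprocess

args = ["args.bat", "a b", 'say "hi"', "a|b", "^^^^^"]
# Quotes only arguments with a space or tab (or empty ones) and turns " into \".
print(subprocess.list2cmdline(args))
# args.bat "a b" "say \"hi\"" a|b ^^^^^
```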
While this approach works well with most external programs, it does NOT work reliably with batch files:
Unfortunately, cmd.exe doesn't treat the arguments as literals, but interprets them as if you had submitted the batch-file call in an interactive console (Command Prompt).
Combined with how the command line is composed (as described above), this results in many ways that the arguments can be misinterpreted and break the invocation altogether.
The primary problem is that arguments that end up unquoted in the command line that cmd.exe sees may break the invocation, namely if they contain characters such as &, |, > or <.
Even if the invocation doesn't break, characters such as ^ may get misinterpreted.
See below for specific examples of problematic arguments.
Trying to work around the problem on the calling side with embedded quoting (e.g., using '"^^^^^"' as an argument in Python) does not work, because most languages, including Python, use \" to escape " characters behind the scenes, which cmd.exe does not recognize (it only recognizes "").
Hypothetically, you could painstakingly ^-escape individual characters in whitespace-free arguments, but not only is that quite cumbersome, it still wouldn't address all issues - see below.
Jeb's answer commendably addresses some of these issues inside the batch file, but it is quite complex and it too cannot address all issues - see next point.
There is no way to work around the following fundamental restrictions:
cmd.exe fundamentally cannot handle arguments with embedded newlines (line breaks):
Parsing the argument list simply stops at the first newline encountered.
CR (0xD) characters in isolation are quietly removed.
The interpretation of % as part of an environment-variable reference (e.g, %OS%) cannot be suppressed:
%% does NOT help, because, curiously and unfortunately, the parsing rules of an interactive cmd.exe session apply(!), where the only way to suppress expansion is to employ the "variable-name disrupter trick", e.g., %^OS%, which only works in unquoted arguments - in double-quoted arguments, you fundamentally cannot prevent expansion.
You're lucky if the environment variable happens not to exist; the token is then left alone (e.g., %NoSuchVar% or %No Such Var%; note that cmd.exe does support variable names with spaces).
Examples of whitespace-free arguments that either break batch-file invocation or result in unwanted alteration of the value:
^^^^^
^ in unquoted strings is cmd.exe's escape character: it causes the next character to be treated as a literal. ^^ therefore represents a single literal ^, so the above yields ^^, with the last ^ getting discarded.
a|b
| separates commands in a pipeline, so cmd.exe will attempt to pipe the part of the command line before | to a command named b and the invocation will most likely break or, perhaps worse, will not work as intended and execute something it shouldn't.
To make this work, you'd need to define the argument as 'a^^^|b' (sic) on the Python side.
Note that a & b would not be affected, because the embedded whitespace would trigger double-quoting on the Python side, and use of & inside "..." is safe.
Other characters that pose similar problems are &, <, and >.
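The painstaking ^-escaping mentioned above can be sketched in Python as follows. caret_escape is a hypothetical helper, and it rests on several assumptions: the argument contains no whitespace (otherwise Python adds double quotes, inside which carets are literal), the batch file re-parses the value exactly once more via %* (hence passes=2, matching the a^^^|b example), and no parenthesised blocks or delayed expansion are involved. It refuses arguments it cannot protect, such as those with newlines or %...% references.

```python
import re

# Characters cmd.exe treats specially in unquoted text. Assumption: no
# parenthesised blocks and no delayed expansion, so ( ) and ! are ignored.
CMD_SPECIAL = "^&|<>"


def caret_escape(arg: str, passes: int) -> str:
    """^-escape a whitespace-free argument for `passes` rounds of cmd parsing."""
    if any(c.isspace() for c in arg):
        # Covers spaces (which trigger "..." quoting) as well as CR and LF,
        # which cmd.exe cannot forward at all.
        raise ValueError("whitespace, CR or LF cannot be forwarded this way")
    if re.search(r"%[^%]+%", arg):
        raise ValueError("%...% may be expanded as a variable")
    for _ in range(passes):
        # Each round of parsing strips one level of ^ escapes.
        arg = "".join("^" + c if c in CMD_SPECIAL else c for c in arg)
    return arg
```

For the wrapper in the question, caret_escape("a|b", 2) produces a^^^|b; even so, this only narrows the problem rather than solving it.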
Interesting question, but it's tricky.
The main problem is that %* can't be used here, as it modifies the content or fails completely, depending on the content.
To get the unmodified argv, you should use a technique like the one in Get list of passed arguments in Windows batch script (.bat).
@echo off
SETLOCAL DisableDelayedExpansion
SETLOCAL
for %%a in (1) do (
set "prompt=$_"
echo on
for %%b in (1) do rem * #%*#
@echo off
) > argv.txt
ENDLOCAL
for /F "delims=" %%L in (argv.txt) do (
set "argv=%%L"
)
SETLOCAL EnableDelayedExpansion
set "argv=!argv:*#=!"
set "argv=!argv:~0,-2!"
REM argv now contains the unmodified content of %* .
c:\dev\Python35-32\python.exe args.py !argv!
This can be used to build a wrapper, with limitations:
Carriage returns can't be fetched at all.
Line feeds currently cannot be fetched in a safe way.

Use of the # (hash character; Number sign; ASCII code 35) in batch files? [duplicate]

I've come across a batch file (portableshell.bat from the portable version of Strawberry Perl) that uses # and I can't understand why. I've searched online but there seems to be no reference to this usage. I need to imitate the batch file's functionality, but I'm wary of doing so without understanding exactly what it does.
What's the purpose of # here:
set drive=%~dp0
set drivep=%drive%
if #%drive:~-1%# == #\# set drivep=%drive:~0,-1%
and here:
if not #%1# == ## "%drivep%\perl\bin\perl.exe" %* & goto END
The # character is there so that empty/undefined values are handled properly in the comparison: if the variable expands to nothing, the # characters still leave a non-empty operand on each side (e.g., ## == ##) instead of a missing one, which would be a syntax error.
This purpose can be served by almost any character (as long as it doesn't have another meaning in this context). Common choices are [] and {}. "" can also be used.
According to dbenham in the comments, using anything other than "" is generally bad practice. Quotes can fail when the variable contains its own quotes, but they are safe when the expanded value cannot contain quotes, as when expanding paths. The only way that is safe in all cases is delayed expansion.
Also, there is no need to put the # (or [, or whatever is used) on both sides of a variable expansion; one side is enough to prevent an empty operand. In the batch file from the question, it is apparently used on both sides just for symmetry.
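A tiny Python model makes the failure mode concrete. expand is a hypothetical stand-in for cmd.exe's textual substitution of %1, splitting on whitespace only roughly approximates how IF reads its operands, and the commands are made up for illustration.

```python
def expand(line: str, arg1: str) -> str:
    """Very rough model: cmd.exe pastes the text of %1 into the line."""
    return line.replace("%1", arg1)


for template in ("if %1 == help goto usage", "if #%1# == #help# goto usage"):
    print(expand(template, "").split())
# ['if', '==', 'help', 'goto', 'usage']          <- left operand is gone
# ['if', '##', '==', '#help#', 'goto', 'usage']  <- still a full comparison
```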
Thanks to CodeCaster for the help

How can I include a pipe in a variable?

I'm trying to put a pipe in a variable. However, I can't seem to.
If I do set elfheader=|ELF, I just get:
"'ELF' is not recognized as an internal or external command,
operable program or batch file."
This is because batch executes things literally.
I've tried something like:
set elfheader=(
|ELF
)
and it doesn't work.
I know why this happens: it's because batch executes things literally and does this (however, that is not the question I am asking):
"okay, oh, we want to set a variable. currently (... oh, pipe. so set that to ( and let's go execute the next command. hmm... what is "ELF"? what?"
TL;DR/To clarify, I am trying to put "|ELF" in a variable in the batch programming language. However, it fails.
EDIT: I am not asking which characters are allowed. I want to put this exact string in a variable; answers that only list what cannot be used are what I want to avoid.
Supposing you want to assign the pipe character | to a variable, there are two options:
To use quotation marks around the entire assignment expression:
set "elfheader=|ELF"
Note that the quotation marks do not become part of the value. I recommend this method, because it avoids special handling of other special characters as well; in addition, it prevents unintended trailing spaces after the closing " from being appended to the value. However, this syntax only works when command extensions are enabled, which is the default anyway.
To use escaping by the character ^:
set elfheader=^|ELF
This also works with command extensions disabled (which is rarely the case, though), but you need to take care of every single special character, like ^, &, <, >, |, and, when the command line is placed within a parenthesised block of code, also ( and ), and, when delayed expansion is enabled, also !.

Using exec on each file in a bash script

I'm trying to write a basic find command for an assignment (without using find). Right now I have an array of files I want to exec something on. The syntax would look like this:
-exec /bin/mv {} ~/.TRASH
I have an array called current that holds all of the files. A second array, called arguments, holds only /bin/mv, {}, and ~/.TRASH (since I shift the -exec out).
I need it so that every file gets passed into {} and exec is called on it.
I'm thinking I should use sed to replace the contents of {} like this (within a for loop):
for i in "${current[@]}"; do
sed "s#$i#{}"
#exec stuff?
done
How do I exec the other arguments though?
You can do something like this:
cmd='-exec /bin/mv {} ~/.TRASH'
current=(test1.txt test2.txt)
for f in "${current[@]}"; do
eval $(sed "s/{}/$f/;s/-exec //" <<< "$cmd")
done
Be very careful with eval command though as it can do nasty things if input comes from untrusted sources.
Here is an attempt to avoid eval (thanks to @gniourf_gniourf for his comments):
current=( test1.txt test2.txt )
arguments=( "/bin/mv" "{}" ~/.TRASH )
for f in "${current[@]}"; do
"${arguments[@]/\{\}/$f}"
done
You are lucky that your design is not too bad: your arguments are in an array.
But you certainly don't want to use eval.
So, if I understand correctly, you have an array of files:
current=( [0]='/path/to/file1' [1]='/path/to/file2' ... )
and an array of arguments:
arguments=( [0]='/bin/mv' [1]='{}' [2]='/home/alex/.TRASH' )
Note that you don't have the tilde here, since Bash already expanded it.
To perform what you want:
for i in "${current[@]}"; do
( "${arguments[@]//'{}'/"$i"}" )
done
Observe the quotes.
This will replace all occurrences of {} in the fields of arguments with the expansion of $i, i.e., with the filename¹, and execute this expansion. Note that each field of the array expands to exactly one argument (thanks to the quotes), so this is safe with respect to spaces, glob characters, etc. This is really the safest and most correct way to proceed. Every solution using eval is potentially dangerous and broken (unless special quoting is used, e.g., with printf '%q', but that would make the method uselessly awkward). By the way, using sed is also broken in at least two ways.
Note that I enclosed the expansion in a subshell, so that it's impossible for the user to interfere with your script. Without this, and depending on how your full script is written, it's very easy to break your script by (maliciously) changing some variables or cd-ing somewhere else. Running your arguments in a subshell, or in a separate process (e.g., a separate instance of bash or sh, though this adds extra overhead), is really mandatory for obvious security reasons!
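For comparison, the same design (one argv element per argument, {} replaced inside each element, no re-parsing, each command in its own process) can be sketched in Python. expand_exec and exec_each are hypothetical names, and replacing {} everywhere within an argument follows the GNU find choice described in the notes.

```python
import subprocess


def expand_exec(arguments: list[str], filename: str) -> list[str]:
    """Replace every {} inside each argument, keeping one element per argument."""
    return [arg.replace("{}", filename) for arg in arguments]


def exec_each(arguments: list[str], files: list[str]) -> None:
    for f in files:
        # An explicit argv list runs in a separate process without a shell,
        # so spaces or glob characters in f are never re-interpreted.
        subprocess.run(expand_exec(arguments, f), check=True)
```

Unlike the Bash version, this can only run external programs, not Bash builtins.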
Note that with your script, the user has direct access to all the Bash builtins (this is a huge pro), compared to some more standard find versions²!
¹ Note that POSIX clearly specifies that this behavior is implementation-defined:
If a utility_name or argument string contains the two characters "{}", but not just the two characters "{}", it is implementation-defined whether find replaces those two characters or uses the string without change.
In our case, we chose to replace all occurrences of {} with the filename. This is the same behavior as, e.g., GNU find. From man find:
The string {} is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find.
² POSIX also specifies that calling builtins is not defined:
If the utility_name names any of the special built-in utilities (see Special Built-In Utilities), the results are undefined.
In your case, it's well defined!
I think that trying to implement (in pure Bash) a find command is a wonderful exercise that should teach you a lot… especially if you get relevant feedback. I'd be happy to review your code!
