Why does FOR fail with delayed sub-string expansion after IN? - batch-file

Sub-string expansion works within the set of a for loop (that is the parenthesised part after in) when immediate expansion is used (write %%I instead of %I in a batch file):
set "X=123"
for %I in (%X:~1,1%) do @echo %I
However, it fails when delayed expansion is applied:
for %I in (!X:~1,1!) do @echo %I
I would expect the output to be:
2
But instead it is:
!X:~1
1!
Why is this and how can I prevent that?
I know I could work around this by quoting the set and using the ~ modifier, but that is not what I want here:
for %I in ("!X:~1,1!") do @echo %~I
The following command line fails too:
for %I in (!X:*2=!) do @echo %I
The unexpected output is:
!
FOR command lines using the /L and /R switches fail with such sub-string syntax too.
It surely has something to do with the fact that , and = are token separators for cmd, just like SPACE, TAB, ; and others.
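The Python model below shows what the two expressions are meant to produce for X=123, that is, what each loop should echo if the set were not mangled. It is only a model, not cmd.exe itself, and the helper names substring and replace_through are hypothetical. It assumes the usual SET semantics:
- a negative position counts from the end;
- a negative length drops characters from the end;
- a leading * replaces everything up to and including the first case-insensitive match.

```python
from typing import Optional


def substring(value: str, pos: int = 0, length: Optional[int] = None) -> str:
    """Model of %VAR:~pos,length%: a negative pos counts from the end,
    a negative length drops that many characters from the end."""
    result = value[pos:]
    return result if length is None else result[:length]


def replace_through(value: str, search: str, replacement: str = "") -> str:
    """Model of %VAR:*search=replacement%: everything up to and including the
    first case-insensitive match of search is replaced."""
    idx = value.lower().find(search.lower())  # approximation of cmd's case folding
    if idx < 0:
        return value                          # no match: value stays unchanged
    return replacement + value[idx + len(search):]


X = "123"
print(substring(X, 1, 1))       # expected result of %X:~1,1% and !X:~1,1!
print(replace_through(X, "2"))  # expected result of !X:*2=!
```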

According to the thread How does the Windows Command Interpreter (CMD.EXE) parse scripts? and also numerous comments here, the answer lies in the special way for loops are parsed.
The key is the following excerpt of this answer (see the italic text in particular):
Phase 2) Process special characters, tokenize, and build a cached command block:
[...]
Three commands get special handling - IF, FOR, and REM
[...]
FOR is split in two after the DO. A syntax error in the FOR construction will result in a fatal syntax error.
The portion through DO is the actual FOR iteration command that flows all the way through phase 7
All FOR options are fully parsed in phase 2.
The IN parenthesized clause treats <LF> as <space>. After the IN clause is parsed, all tokens are concatenated together to form a single token.
Consecutive token delimiters collapse into a single space throughout the FOR command through DO.
Because delayed expansion happens after the FOR command has been parsed as described, token separators such as SPACE, TAB, comma, semicolon and = are converted to a single SPACE. Hence a sub-string expansion expression like !X:~1,1! is changed to !X:~1 1!, and a sub-string substitution expression like !X:*2=! is changed to !X:*2 !; both are invalid syntax.
Therefore, to solve the issue, escape the token separators with ^:
for %I in (!X:~1^,1!) do @echo %I
and:
for %I in (!X:*2^=!) do @echo %I
(By the way, there is a very similar problem with if statements.)
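The phase-2 normalisation quoted above can be sketched as a simplified Python model. It is not cmd.exe's code, and collapse_for_set is a hypothetical name. The model rests on these assumptions:
- It only knows the delimiters named in the text.
- It ignores double quotes entirely.
- A ^ keeps the next character literal and is itself removed.

```python
DELIMS = " \t,;="  # token delimiters named in the text (SPACE, TAB, comma, semicolon, =)


def collapse_for_set(clause: str) -> str:
    """Collapse each run of unescaped delimiters into one space; ^ makes the
    next character literal and is itself removed. Quotes are not handled."""
    out = []
    pending_space = False
    i = 0
    while i < len(clause):
        ch = clause[i]
        escaped = ch == "^" and i + 1 < len(clause)
        if escaped:
            ch = clause[i + 1]
        if ch in DELIMS and not escaped:
            pending_space = True      # remember the run, emit at most one space
        else:
            if pending_space and out:
                out.append(" ")       # leading delimiters produce no space
            pending_space = False
            out.append(ch)
        i += 2 if escaped else 1
    return "".join(out)


for clause in ("!X:~1,1!", "!X:*2=!", "!X:~1^,1!", "!X:*2^=!"):
    print(f"({clause}) -> {collapse_for_set(clause)}")
```

The first result, !X:~1 1!, is consistent with the question's output: a set split at the space into the two elements !X:~1 and 1!.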

Related

Why does batch interpret comments?

I have a simple batch test file test.bat with the following lines:
@echo off
REM IF "%~version_info" == "" echo No version information found
echo test
When I run it, I expect to get test, but instead I get:
The following usage of the path operator in batch-parameter
substitution is invalid: %~version_info" == "" echo No version information found
For valid formats type CALL /? or FOR /?
The syntax of the command is incorrect.
Why does batch try to interpret the comment? Or what is happening here? If I take the comment out, the script prints out test as expected.
Also, the documentation doesn't mention anything about this.
I believe it's a consequence of the parsing sequence. In this case, it's a problem, but suppose you code (as I have done):
set "debug=rem"
%debug% echo some debug data
First, we replace the values in %vars%, then we interpret the line, using the first token as the command to execute. The above construct allows the command to be varied.
So there is a method to the madness...
The reason for this is the parsing sequence of batch scripts.
The very first thing that happens is the (poor¹) %-sign handling, that is, the expansion of normal variables (%VAR%) and command line arguments (%1, %2, etc., and %*). Commands, and therefore even rem, are recognised in a later parsing phase.
The string %~ is an invalid argument reference, because it is followed neither by a valid modifier or combination of modifiers (f, d, p, n, x, s, a, t, z, $PATH:) nor by a numeric digit.
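The order "percent handling first, command recognition later" can be illustrated with a simplified Python model. It is not cmd.exe's code, and percent_phase and run_line are hypothetical names. The model has these limits:
- The $PATH: form and the file-system modifiers are not implemented.
- %~N only strips quotes roughly.
- A lone % with no partner is assumed to be dropped.

```python
MODIFIERS = "fdpnxsatz"  # %~ modifiers listed above; the $PATH: form is omitted


def percent_phase(line, args, env):
    """Simplified model of batch-file phase 1: % handling only."""
    out, i, n = [], 0, len(line)
    while i < n:
        if line[i] != "%":
            out.append(line[i])
            i += 1
            continue
        nxt = line[i + 1] if i + 1 < n else ""
        if nxt == "%":                      # %% -> literal %
            out.append("%")
            i += 2
        elif nxt.isdigit():                 # %0 .. %9
            k = int(nxt)
            out.append(args[k] if k < len(args) else "")
            i += 2
        elif nxt == "*":                    # %* -> all arguments
            out.append(" ".join(args[1:]))
            i += 2
        elif nxt == "~":                    # %~[modifiers]digit
            j = i + 2
            while j < n and line[j].lower() in MODIFIERS:
                j += 1
            if j >= n or not line[j].isdigit():
                raise ValueError("invalid path operator: " + line[i:])
            if j > i + 2:
                raise NotImplementedError("path modifiers are not modelled")
            k = int(line[j])
            out.append(args[k].strip('"') if k < len(args) else "")  # rough quote removal
            i = j + 1
        else:                               # %NAME% -> value, or nothing if undefined
            end = line.find("%", i + 1)
            if end < 0:                     # assumption: a lone % is simply dropped
                i += 1
                continue
            out.append(env.get(line[i + 1:end], ""))
            i = end + 1
    return "".join(out)


def run_line(line, args=("test.bat",), env=None):
    """Phase 1 runs first; only then is the first token inspected."""
    expanded = percent_phase(line, list(args), env or {})
    tokens = expanded.split(None, 1)
    if tokens and tokens[0].upper() == "REM":
        return None                         # comment: the rest is never executed
    return expanded


print(run_line("echo test"))
print(run_line("%debug% echo some debug data", env={"debug": "rem"}))
```

In this model the REM line with %~version_info raises an error before REM is ever seen. Writing %% or prefixing %= avoids that.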
Refer to this thread: How does the Windows Command Interpreter (CMD.EXE) parse scripts?
¹ In my opinion the % expansion is faulty. %~ results in an error, and so do %VAR:= and %VAR:*= when variable VAR is defined. Meanwhile, a variable expansion like %VAR:[*]search=[replace]% or %VAR:~[position][,[length]]% is aborted when VAR is not defined (so %VAR:~%STR% expands to ~text when STR is set to text).
It is not ignoring it.
In batch files you need to write %% rather than %, so it is simply warning you that the substitution is invalid. cmd still reads comment lines, and since your comment contains what looks like a parameter substitution written in an invalid form, it warns you.
By doing this, you will not get the warning:
@echo off
REM IF "%%~version_info" == "" echo No version information found
echo test
Why does batch interpret comments?
It doesn't interpret comments, but it has to parse lines; that's the problem.
First, the parser reads a line.
Then it expands all percent expressions and then it takes a look at the first token in the line.
If the first token is REM then the remaining stuff will not be interpreted anymore (redirection, delayed expansion, pipes, ampersands, ..., are all ignored)
The problem is that the parser expands all percent expressions first; when there is an expression like %~, the parser throws an error message.
If you do not want to modify the commented code, you can use:
REM %= IF "%~version_info" == "" echo No version information found
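Though this is slightly ugly, it avoids interpreting the %~. The text between the first % and the % of %~ (= IF ") is treated as the name of an undefined variable and expands to nothing, so no %~ is left to be rejected.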
You can find more information on comments in batch in this answer
As to why this is the case, the whole batch thing sounds a bit broken.

How to pass a parameter to a batch file containing a % without it 'breaking'?

The Problem
In a main batch file, values are pulled from a .txt file (and SET as values of variables within this batch file). These values may each contain % characters.
These are read from the .txt file with no issues. However, when a variable with a value containing a % character is passed to a second batch file, the second batch file interprets any % characters as a variable expansion. (Note: There is no control over the second batch file.)
Example
echo %PERCENTVARIABLE%
Output: I%LOVE%PERCENT%CHARACTERS%
When passed to a second file and then echoed, it would (probably) become IPERCENT, as %LOVE% and %CHARACTERS% are interpreted as unset variables.
Research
I found the syntax to find and replace elements within a string in a batch file, as I thought I could potentially replace a % character with %% in order to escape it. However I cannot get it to work.
The syntax is -
set string=This is my string to work with.
set string=%string:work=play%
echo %string%
Where the output would then be This is my string to play with..
Questions
Is it possible to escape % characters using the find and replace syntax in a variable? (If not, is there another way?)
Is it advisable to do so? (Could using these escape characters cause any issue in the second batch file which (as mentioned above) we would have no control over?)
Is there another way to handle this issue, if the above is not possible?
There are no simple rules that can be applied in all situations.
There are a few issues that make working with string literals in parameters difficult:
Poison characters like &, |, etc. must be escaped or quoted. Escaping is difficult because it can be confusing as to how many times to escape. So the recommendation is to usually quote the string.
Token delimiters like <space>, <tab>, =, ; and , cannot be included in a parameter value unless it is quoted.
A CALL to a script will double any quoted % characters, and there is no way to prevent this. Executing a script without CALL will not double the % characters. But if a script calls another script and expects control to be returned, then CALL must be used.
So we have a catch-22: On the one hand, we want to quote parameters to protect against poison characters and spaces (token delimiters). But to protect percents we don't want to quote.
The only reliable way to pass string literals without risk of value corruption is to pass them by reference via environment variables.
The value to be passed should be stored in an environment variable. Quotes and/or escapes and/or percent doubling are used to get the necessary characters into the value, but that is very manageable.
The name of the variable is passed in as a parameter.
The script accesses the value via delayed expansion. For example, if the first parameter is the name of a variable containing the value, then it is accessed as !%1!. Delayed expansion must be enabled before that syntax can be used - simply issue setlocal enableDelayedExpansion.
The beauty of delayed expansion is you never have to worry about corruption of poison characters, spaces, or percents when the variable is expanded.
Here is an example that shows how the following string literal can be passed to a subroutine:
"<%|,;^> This & that!" & the other thing! <%|,;^>
@echo off
setlocal enableDelayedExpansion
set "parm1="^<%%^|,;^^^^^> This ^& that^^!" & the other thing^! <%%|,;^^^>"
echo The value before CALL is !parm1!
call :test parm1
exit /b
:test
echo The value after CALL is !%1!
-- OUTPUT --
The value before CALL is "<%|,;^> This & that!" & the other thing! <%|,;^>
The value after CALL is "<%|,;^> This & that!" & the other thing! <%|,;^>
But you state that you have no control over the 2nd called script. So the above elegant solution won't work for you.
If you were to show the code of the 2nd script, and show exactly what value you were trying to pass, then I might be able to give a solution that would work in that isolated situation. But there are some values that simply cannot be passed unless delayed expansion is used with variable names. (Actually, another option is to put the value in a file and read the value from the file, but that also requires change to your 2nd script)
Maybe...?
input.txt
I%LOVE%PERCENT%CHARACTERS%
batch1.bat
@echo off
setlocal enableDelayedExpansion
set/P var=<input.txt
echo(In batch 1 var content: %var%
set "var=!var:%%=%%%%!"
call batch2.bat "%var%"
endlocal
exit/B
batch2.bat
@echo off
set "var=%~1"
echo(In batch 2 var content: %var%
exit/B
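Why the doubling in batch1.bat helps can be sketched with a Python model of one batch-file % pass. It rests on these points:
- CALL runs an extra round of % expansion over the already-expanded command line.
- Undefined variables expand to nothing, as the question expects.
- In a batch file, phase 1 turns set "var=!var:%%=%%%%!" into set "var=!var:%=%%!", so delayed expansion doubles every %.

percent_pass is a hypothetical helper, and argument references like %1 are not modelled.

```python
def percent_pass(text, env):
    """One round of batch-file % expansion: %% -> %, %NAME% -> value
    (nothing if undefined). Argument references like %1 are not modelled."""
    out, i = [], 0
    while i < len(text):
        if text[i] != "%":
            out.append(text[i])
            i += 1
        elif text.startswith("%%", i):
            out.append("%")
            i += 2
        else:
            end = text.find("%", i + 1)
            if end < 0:          # assumption: a lone % is dropped
                i += 1
            else:
                out.append(env.get(text[i + 1:end], ""))
                i = end + 1
    return "".join(out)


value = "I%LOVE%PERCENT%CHARACTERS%"

# CALL's extra % pass over the expanded "%var%" argument, without doubling:
naive = percent_pass(value, {})
# With every % doubled first (what !var:%=%%! does):
doubled = value.replace("%", "%%")
safe = percent_pass(doubled, {})

print(naive)
print(safe)
```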

Sub-string expansion with empty string causes error in If clause

I have the following code snippet:
if "%ARGV:~,1%"==":" echo %ARGV% begins with a colon.
As long as variable ARGV contains a non-empty value, or, more precisely, as long as it is defined, everything works as expected: if the string in ARGV begins with a colon, the echo command is executed.
However, as soon as I clear variable ARGV, a syntax error arises:
echo was unexpected at this time.
What is going on here? The syntax looks perfectly fine, so why does that command line fail?
Even How does the Windows Command Interpreter (CMD.EXE) parse scripts?, one of the most helpful threads here for such things, does not explain this behaviour.
When I do the same directly in command prompt, everything is in order. Moreover, when I try it using delayed expansion no error occurs either.
My companion answer to jeb's answer to "How does the Windows Command Interpreter (CMD.EXE) parse scripts?" does explain the behavior.
My companion answer gives the necessary details on how % expansion works to fully predict the behavior.
If you keep ECHO ON, then you can see the result of the expansion, and the error message makes sense:
test.bat
@echo on
@set "ARGV="
if "%ARGV:~,1%"==":" echo %ARGV% begins with a colon.
-- output --
C:\test>test
echo was unexpected at this time.
C:\test>if "~,1" echo begins with a colon.
The important rules from my answer that explain the expansion result are:
1) (Percent) Starting from the left, scan each character for %. If found then
  1.1 (escape %) ... not relevant
  1.2 (expand argument) ... not relevant
  1.3 (expand variable)
    Else if command extensions are disabled then ... not relevant
    Else if command extensions are enabled then look at the next string of characters, breaking before %, : or <LF>, and call them VAR (may be an empty list). If VAR breaks before : and the subsequent character is % then include : as the last character in VAR and break before %.
      If next character is % then replace %VAR% with the value of VAR (replace with nothing if VAR is not defined) and continue the scan
      Else if next character is : then
        If VAR is undefined then remove %VAR: and continue the scan.
        ... Remainder is not relevant
Starting with
if "%ARGV:~,1%"==":" echo %ARGV% begins with a colon.
The variable expansion expands all of the following strings to nothing because the variable is undefined:
%ARGV:
%"==":
%ARGV%
And you are left with:
if "~,1" echo begins with a colon.
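The rules above can be replayed with a simplified Python model. It was written for this explanation and is not cmd.exe's code; expand_percent and substring are hypothetical names. It has these limits:
- Argument references are omitted.
- The search=replace form is omitted.
- A lone % is assumed to be simply dropped.

With ARGV undefined, it yields the line shown above, with two spaces where %ARGV% vanished.

```python
def substring(value, spec):
    """Apply a '~pos[,len]' spec the way %VAR:~pos,len% does."""
    pos_text, _, len_text = spec[1:].partition(",")
    result = value[int(pos_text or 0):]
    return result[:int(len_text)] if len_text else result


def expand_percent(line, env):
    """Variable part of batch-file phase 1, following rule 1.3 quoted above.
    Simplified: no %0-%9 arguments and no search=replace form."""
    out, i, n = [], 0, len(line)
    while i < n:
        if line[i] != "%":
            out.append(line[i])
            i += 1
            continue
        if line.startswith("%%", i):            # 1.1: %% -> %
            out.append("%")
            i += 2
            continue
        j = i + 1                               # VAR breaks before % : or <LF>
        while j < n and line[j] not in "%:\n":
            j += 1
        if line.startswith(":%", j):            # %VAR:% -> ':' belongs to VAR
            j += 1
        var = line[i + 1:j]
        if j < n and line[j] == "%":            # %VAR% -> value or nothing
            out.append(env.get(var, ""))
            i = j + 1
        elif j < n and line[j] == ":":
            if var not in env:                  # undefined: remove "%VAR:" only
                i = j + 1
                continue
            end = line.find("%", j + 1)
            if end < 0 or not line.startswith("~", j + 1):
                raise NotImplementedError("only %VAR:~pos,len% is modelled")
            out.append(substring(env[var], line[j + 1:end]))
            i = end + 1
        else:                                   # assumption: a lone % is dropped
            i += 1
    return "".join(out)


line = 'if "%ARGV:~,1%"==":" echo %ARGV% begins with a colon.'
print(expand_percent(line, {}))                # ARGV undefined
print(expand_percent(line, {"ARGV": ":foo"}))  # ARGV defined
```

The same model also reproduces the footnote case from the "Why does batch interpret comments?" thread: %VAR:~%STR% becomes ~text when only STR=text is defined.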
It works with delayed expansion because the IF statement is parsed before delayed expansion (explained in jeb's answer within phase 2)
Everything works from the command line because the command line variable expansion does not remove the string when the variable is not defined. (loosely explained in jeb's answer near the bottom within CmdLineParser:, Phase1(Percent))

Escaping exclamation marks required in replace string but not in search string (substring replacement with delayed expansion on)?

Supposing one wants to replace certain substrings by exclamation marks using the substring replacement syntax while delayed expansion is enabled, they have to use immediate (normal) expansion, because the parser cannot distinguish between !s for expansion and literal ones.
However, why does one have to escape exclamation marks in the replacement string? And why is it not necessary and even disruptive when exclamation marks in the search string are escaped?
The following script replaces the !s in a string with ` and afterwards reverses the replacement, so I expect the result to equal the initial string (which of course must not contain any back-ticks of its own):
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem This is the test string:
set "STRING=string!with!exclamation!marks!"
set "DELOFF=%STRING%"
set "DELOFF=%DELOFF:!=`%"
set "DELOFF=%DELOFF:`=!%"
setlocal EnableDelayedExpansion
set "DELEXP=!STRING!"
set "DELEXP=%DELEXP:!=`%"
set "DELEXP=%DELEXP:`=!%"
echo(original string: !STRING!
echo(normal expansion: !DELOFF!
echo(delayed expansion: !DELEXP!
endlocal
endlocal
exit /B
This result is definitely not what I want, the last string is different:
original string: string!with!exclamation!marks!
normal expansion: string!with!exclamation!marks!
delayed expansion: stringexclamation
As soon as I take the line...:
set "DELEXP=%DELEXP:`=!%"
...and replace the ! there with ^!, hence escaping the exclamation mark in the replace string, the result is exactly what I expect:
original string: string!with!exclamation!marks!
normal expansion: string!with!exclamation!marks!
delayed expansion: string!with!exclamation!marks!
When I try other escaping combinations though (escape the exclamation mark in both the replace and the search string, or in the latter only), the result is again the aforementioned unwanted one.
I walked through the post How does the Windows Command Interpreter (CMD.EXE) parse scripts? but could not find an explanation for that behaviour. I learned that normal (immediate, percent) expansion is accomplished long before delayed expansion occurs and before any exclamation marks are even recognised. Caret recognition and escaping also seem to happen afterwards. In addition, there are even quotation marks around the strings, which usually hide carets from the parser.
Actually, for the substring replacement itself there is no escaping required. It becomes necessary for the later parsing phases only. This is why:
However, why does one have to escape exclamation marks in the replacement string?
The thing is that immediate (normal, %) expansion is done at quite an early stage, whereas delayed expansion (!), as the name implies, is accomplished as one of the last steps. Hence an immediately expanded string also passes through the delayed expansion phase. As proof, with delayed expansion enabled, set variable VAR to Value!X! and X to 0, then execute echo %VAR%: you will get Value0 as the result.
But back to the initial question, when using immediate substring replacement, the replacement string is part of the expanded value, so it is also passed through the delayed expansion phase. Therefore, a literal exclamation mark must be escaped in order not to be consumed by the delayed expansion. This implies that the escaping is not needed for the replacement itself, it is actually done afterwards, so the given replace string including the escaping is applied literally.
And why is it not necessary and even disruptive when exclamation marks in the search string are escaped?
Since caret recognition, and hence escaping, happens after immediate expansion, the search string is treated literally. Furthermore, the search string is replaced and therefore not included in the output of the immediate substring replacement, so it does not pass through the delayed expansion phase.
Let us look at the original example (excerpt only):
set "STRING=string!with!exclamation!marks!"
setlocal EnableDelayedExpansion
set "DELEXP=!STRING!"
set "DELEXP=%DELEXP:!=`%"
set "DELEXP=%DELEXP:`=!%"
echo(delayed expansion: !DELEXP!
endlocal
The replacement set "DELEXP=%DELEXP:!=`%" searches for !. The resulting value is string`with`exclamation`marks`.
Using set "DELEXP=%DELEXP:^!=`%" would search for ^! literally, so of course no occurrences would be found. All the literal ! of the original string would be kept and finally consumed by delayed expansion.
The replacement set "DELEXP=%DELEXP:`=!%" replaces ` with ! correctly, giving string!with!exclamation!marks!, but these ! are consumed by delayed expansion afterwards.
The escaped replacement %DELEXP:`=^!% replaces ` with ^! literally, so the result is string^!with^!exclamation^!marks^!. The escaping is processed afterwards during the delayed expansion phase, leaving literal ! characters and thus the final string string!with!exclamation!marks!.
According to the post How does the Windows Command Interpreter (CMD.EXE) parse scripts?, there is a second phase where escaping occurs, namely the delayed expansion phase. This is the one that applies to the example in the original question. The first escaping, during the special character recognition phase, is disabled by the surrounding quotation marks (omitting them would require double escaping like ^^!).
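The two steps can be modelled in Python as a sketch. It is not cmd.exe's code; percent_replace and delayed_phase are hypothetical names, and the model makes these assumptions:
- The percent replacement is a literal, case-insensitive search in which carets and exclamation marks mean nothing.
- Delayed expansion only runs when the text contains a !.
- During delayed expansion, ^ escapes the next character, !NAME! becomes the value (or nothing), and an unpaired ! is removed.

```python
import re


def percent_replace(value, search, replacement):
    """%VAR:search=replacement%: literal, case-insensitive search; carets and
    exclamation marks have no special meaning at this stage."""
    return re.sub(re.escape(search), lambda _m: replacement, value, flags=re.IGNORECASE)


def delayed_phase(text, env):
    """Delayed expansion (simplified): only runs if the text contains '!'."""
    if "!" not in text:
        return text
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch == "^":
            out.append(text[i + 1:i + 2])  # escaped char (empty if ^ is last)
            i += 2
        elif ch == "!":
            end = text.find("!", i + 1)
            if end < 0:
                i += 1                     # unpaired ! is removed
            else:
                out.append(env.get(text[i + 1:end], ""))
                i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


original = "string!with!exclamation!marks!"
ticks = delayed_phase(percent_replace(original, "!", "`"), {})
plain = delayed_phase(percent_replace(ticks, "`", "!"), {})
escaped = delayed_phase(percent_replace(ticks, "`", "^!"), {})
search_escaped = delayed_phase(percent_replace(original, "^!", "`"), {})
print(ticks, plain, escaped, search_escaped, sep="\n")
```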

Ignore percent sign in batch file

I have a batch file which moves files from one folder to another. The batch file is generated by another process.
Some of the files I need to move have the string "%20" in them:
move /y "\\myserver\myfolder\file%20name.txt" "\\myserver\otherfolder"
This fails as it tries to find a file with the name:
\\myserver\myfolder\file0name.txt
Is there any way to ignore %? I'm not able to alter the file generated to escape this, such as by doubling percent signs (%%), escaping with / or ^ (caret), etc.
You need to use %% in this case. Normally using a ^ (caret) would work, but for % signs you need to double up.
It is the same as in the case of %%1, %%i or echo.%%~dp1, because % indicates input either from a command (arguments, FOR variables) or from a variable (when surrounded with %, as in %variable%).
To achieve what you need:
move /y "\\myserver\myfolder\file%%20name.txt" "\\myserver\otherfolder"
I hope this helps!
The question's title is very generic, which inevitably draws many readers looking for a generic solution.
By contrast, the OP's problem is exotic: needing to deal with an auto-generated batch file that is ill-formed and cannot be modified: % signs are not properly escaped in it.
The accepted answer provides a clever solution to the specific - and exotic - problem, but is bound to create confusion with respect to the generic question.
If we focus on the generic question:
How do you use % as a literal character in a batch file / on the command line?
Inside a batch file, always escape % as %%, whether in unquoted strings or not; the following yields My %USERNAME% is jdoe, for instance:
echo My %%USERNAME%% is %USERNAME%
echo "My %%USERNAME%% is %USERNAME%"
On the command line (interactively) - as well as when using the shell-invoking functions of scripting languages - the behavior fundamentally differs from that inside batch files: technically, % cannot be escaped there and there is no single workaround that works in all situations:
In unquoted strings, you can use the "^ name-disrupter" trick: for simplicity, place a ^ before every % char, but note that you're not technically escaping % that way (see below for more); e.g., the following again yields something like My %USERNAME% is jdoe:
echo My ^%USERNAME^% is %USERNAME%
In double-quoted strings, you cannot escape % at all, but there are workarounds:
You can use unquoted strings as above, which then requires you to additionally ^-escape all other shell metacharacters, which is cumbersome; these metacharacters are: <space> & | < > "
Alternatively, unless you're invoking a batch file, you can individually double-quote % chars as part of a compound argument (most external programs and scripting engines parse a compound argument such as "%"USERNAME"%" as verbatim string %USERNAME%):
some_exe My "%"USERNAME"%" is %USERNAME%
From scripting languages, if you know you're calling a binary executable, you may be able to avoid the whole problem by forgoing the shell-invoking functions in favor of the "shell-free" variants, such as using execFileSync instead of execSync in Node.js.
Optional background information re command-line (interactive) use:
Tip of the hat to jeb for his help with this section.
On the command line (interactively), % technically cannot be escaped at all; while ^ is generally cmd.exe's escape character, it does not apply to %.
As stated, there is no solution for double-quoted strings, but there are workarounds for unquoted strings:
The reason that "^ name-disrupter" trick (something like ^%USERNAME^%) works is:
It "disrupts" the variable name; that is, in the example above cmd.exe looks for a variable named USERNAME^, which (hopefully) doesn't exist.
On the command line - unlike in batch files - references to undefined variables are retained as-is.
Technically, a single ^ inside the variable name - anywhere inside it, as long as it's not next to another ^ - is sufficient, so that %USERNAME^%, for instance, would be sufficient, but I suggest adopting the convention of methodically placing ^ before each and every % for simplicity, because it also works for cases such as up 20^%, where the disruption isn't even necessary, but is benign, so you can apply it methodically, without having to think about the specifics of the input string.
A ^ before an opening %, while not necessary, is benign, because ^ escapes the very next character, whether that character needs escaping - or, in this case, can be escaped - or not. The net effect is that such ^ instances are ultimately removed from unquoted strings.
Largely hypothetical caveat: ^ is actually a legal character in variable names (see jeb's example in the comments); if your variable name ends with ^, simply place the "disruptive" ^ somewhere else in the variable name, as long as it's not directly next to another ^ (as that would cause a ^ to appear in the resulting string).
That said, in the (very unlikely) event that your variable has a name such as ^b^, you're out of luck.
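The two command-line steps involved can be modelled in Python as a sketch:
1. percent expansion that keeps references to undefined variables;
2. caret removal outside double quotes.

cmdline_percent and caret_phase are hypothetical names. The model assumes that after an undefined reference the scan resumes at its closing %, and it ignores every other special character.

```python
def cmdline_percent(text, env):
    """Command-line % expansion (simplified): a defined %NAME% is replaced,
    a reference to an undefined variable is kept as-is."""
    out, i = [], 0
    while i < len(text):
        if text[i] != "%":
            out.append(text[i])
            i += 1
            continue
        end = text.find("%", i + 1)
        if end < 0:                  # no closing %: keep the rest
            out.append(text[i:])
            break
        name = text[i + 1:end]
        if name in env:
            out.append(env[name])
            i = end + 1
        else:                        # undefined: keep "%NAME", rescan at closing %
            out.append(text[i:end])
            i = end
    return "".join(out)


def caret_phase(text):
    """Outside double quotes ^ escapes the next character and is removed;
    inside quotes it is kept literally."""
    out, quoted, i = [], False, 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            quoted = not quoted
        elif ch == "^" and not quoted and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)


env = {"USERNAME": "jdoe"}
for cmd in ('echo My ^%USERNAME^% is %USERNAME%', 'echo "My ^%USERNAME^% is %USERNAME%"'):
    print(caret_phase(cmdline_percent(cmd, env)))
```

In the quoted variant the carets survive, which is why the trick does not work inside double quotes.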
In batch files, the percent sign may be "escaped" by using a double percent sign (%%).
That way, a single percent sign will be used within the command line. (From http://www.robvanderwoude.com/escapechars.php)
I think I've got a partial solution working. If you're only looking to transfer files that have the "%20" string in their name and not looking for a broader solution, you can make a second batch file call the first with %%2 as the second parameter. This way, when the script hits the %2 in the file name and fetches the second parameter, it will replace the %2 with an escaped %2, leaving the file name unchanged.
Hope this works!
How to "escape" inside a batch file without modifying the file
The original question is about a generated file, that can't be modified, but contains lines like:
move /y "\\myserver\myfolder\file%20name.txt" "\\myserver\otherfolder"
That can be partly solved by calling the script with proper arguments (%1, %2, ...):
@echo off
set "per=%%"
call generated_file.bat %%per%%1 %%per%%2 %%per%%3 %%per%%4
This simply sets the arguments to:
arg1="%1"
arg2="%2"
...
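Why the per variable is needed can be traced with a Python model of three % passes:
1. the caller's phase 1;
2. the extra pass CALL performs;
3. the generated file's own pass.

expand and the caller name wrapper.bat are hypothetical. Splitting the CALL line on spaces stands in for cmd's argument parsing, and a lone % is assumed to be dropped.

```python
def expand(text, env, args):
    """One batch-file % pass: %%, %0-%9 and %NAME% (undefined -> nothing)."""
    out, i, n = [], 0, len(text)
    while i < n:
        nxt = text[i + 1] if i + 1 < n else ""
        if text[i] != "%":
            out.append(text[i])
            i += 1
        elif nxt == "%":
            out.append("%")
            i += 2
        elif nxt.isdigit():
            k = int(nxt)
            out.append(args[k] if k < len(args) else "")
            i += 2
        else:
            end = text.find("%", i + 1)
            if end < 0:                 # assumption: a lone % is dropped
                i += 1
            else:
                out.append(env.get(text[i + 1:end], ""))
                i = end + 1
    return "".join(out)


env = {"per": "%"}                      # set "per=%%"
caller = ["wrapper.bat"]                # hypothetical name of the calling script
call_line = "call generated_file.bat %%per%%1 %%per%%2 %%per%%3 %%per%%4"

# Phase 1 of the caller, then the extra % pass performed by CALL.
after_call = expand(expand(call_line, env, caller), env, caller)
callee_args = after_call.split()[1:]    # %0 = generated_file.bat, %1 = "%1", ...

# Without the helper variable, CALL's pass already expands %1 to the caller's (empty) argument.
naive = expand(expand("call generated_file.bat %%1 %%2", env, caller), env, caller)

move_line = r'move /y "\\myserver\myfolder\file%20name.txt" "\\myserver\otherfolder"'
print(after_call)
print(expand(move_line, {}, callee_args))  # %2 expands to the literal text %2
```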
How to add a literal percent sign on the command line
mklement0 describes the problem that escaping the percent sign on the command line is tricky, and that inside quotes it seems to be impossible.
But as always it can be solved with a little trick.
for %Q in ("%") do echo "file%~Q20name.txt"
%Q contains "%" and %~Q expands to only %, independent of quotes.
Or to avoid the %~ use
for /F %Q in ("%") do echo "file%Q20name.txt"
You should be able to use a caret (^) to escape a percent sign.
Editor's note: The link is dead now; either way: It is % itself that escapes %, but only in batch files, not at the command prompt; ^ never escapes %, but at the command prompt it can be used indirectly to prevent variable expansion, in unquoted strings only.
The reason %2 is disappearing is that the batch file is substituting the second argument passed in, and you seem not to have a second argument. One way to work around that would be to actually try foo.bat ^%1 ^%2..., so that when a %2 is encountered in a command, it is actually substituted with a literal %2.

Resources