Batch script Latin characters - batch-file

I am writing a batch script that goes through some directories performing a specific task, something like the following:
set DBCreationScript=//Here I set the full path for the script
echo %DBCreationScript%
The problem is that the path contains some Latin characters (ç, ã, á), and when I run the script the output shows strange characters instead of the ones I typed. The batch script is in ANSI encoding.
I already tried setting the script encoding to UTF-8, but apparently the batch interpreter can't handle the control characters that appear at the beginning of the file.
Any thoughts?

Save the batch file in OEM encoding (a decent editor should allow this) or change the code page prior to running it with
chcp 1252
You can also save it as UTF-8 without signature (BOM) and use
chcp 65001
but down that path lies peril and dragons await to eat you (in short: It's usually painful and has a few weird side-effects).
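As a rough sketch of the second option (not a definitive recipe), the script below assumes the file is saved in Windows-1252, the usual "ANSI" encoding on Western European systems, and switches the console to that code page before any line containing accented characters is read. The path is hypothetical and only stands in for the real one.

```batch
@echo off
rem Assumption: this file is saved as Windows-1252 ("ANSI").
rem Switch the console to the same code page so the bytes below decode correctly.
chcp 1252 >nul

rem Hypothetical path containing the characters from the question.
set "DBCreationScript=C:\Projetos\Criação\script.sql"
echo %DBCreationScript%
```

Whether the characters also display correctly depends on the console font; a raster font may still show substitutes even though the variable holds the right path.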

Related

En Dash in file path batch job query

Hoping someone can help.
I have a batch job (Windows env) which simply copies a file to another folder.
copy "\\ACP-MS-NAS21\Global\MEC Daily Productivity\Business Analysts\Master_List\HCP_Master_List.xlsx" ^
"\\ACP-MS-NAS21\Global\CSD [?] DWP Medical Services\CSL_CSD_DB\Master_List"
But I get the following error:
The filename, directory name, or volume label syntax is incorrect.
I can see there's an en dash in the file path which I believe is causing the problem.
Is there any way to have a wildcard in the file path or any other way the job can recognise this?
Thanks in advance.
PS. newbie at batch programming, coding etc, so please can explanations be in plain English. Many thanks
Command-line windows (and, by extension, BAT files) operate in an OEM codepage by default. Exactly which codepage is determined by your OS settings (Language for non-Unicode programs or similar). Therefore you normally cannot use any character outside ASCII or outside whatever codepage you have.
Save script as UTF-8
To use those characters you need to work in a codepage that actually contains them. UTF-8 is the best one for this task (and probably the only one that will work in your case).
First, save your script as UTF-8. In Notepad you can select the encoding in the Save As dialog.
If your editor does not let you select UTF-8 without a BOM, leave the first line of your BAT file blank, because some editors prefix the file with a special header called a BOM that helps detect the encoding. If yours does and you leave the first line blank, you will get a Bad command or file name error as soon as the script starts, but this won't prevent it from running properly.
Select UTF-8 codepage
Now your script is in UTF-8, but the Windows command processor will still execute it as if it were in the OEM codepage, thus corrupting all special characters. To specify the encoding, add the following command, preferably as the first non-blank line of your script, ahead of any comments (those may contain non-ASCII characters, with unpredictable results):
CHCP 65001
CHCP changes the current codepage to 65001, which is the codepage number for UTF-8.
This works because Latin letters and digits have the same encoding in UTF-8 as in ASCII and the OEM codepages. Your script therefore starts executing in the OEM codepage, but since the CHCP 65001 command itself contains no non-ASCII characters, it is understood correctly. All following commands are executed in UTF-8 and may contain non-ASCII characters.
You may now insert the en dash into the filename and it won't be replaced with ? upon saving.
Set UTF-8 font
Unfortunately, the default console window font does not display Unicode characters correctly, so you won't be able to see non-ASCII characters as intended.
To solve this, right-click the command-line window's title bar, select Properties, and change the font to a Unicode one. Consolas, Lucida Console and Courier New should work.
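Putting the steps together, a script along these lines is one way it could look. This is only a sketch: it assumes the file is saved as UTF-8 without a BOM, and the server, share and folder names are made up for illustration.

```batch
@echo off
rem Assumption: file saved as UTF-8 without BOM.
rem Everything up to and including the CHCP line is plain ASCII,
rem so it is read correctly under the default OEM code page.
chcp 65001 >nul

rem From here on, lines may contain non-ASCII characters such as the en dash.
rem Hypothetical UNC paths, for illustration only.
copy "\\SERVER\Share\Source\Master_List.xlsx" "\\SERVER\Share\CSD – Services\Master_List"
```

The code page stays at 65001 in that console window after the script ends, unless the script or the user switches it back.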
Thanks for all the responses.
I've had to make a design change to my DB so have managed to get around the need to do this now (phew :))
Thanks again

How does ¯¯¯¯¯ become ùùùùùùùùù

I have tried: echo ¯¯¯¯¯
but the result becomes
ùùùùùùùùù
This was not the expected output.
The expected output was ¯¯¯¯¯
My previous solution, which saves the batch script as UTF-8 without a BOM and uses codepage 65001, seems to have issues with both the console and the C runtime, as user @eryksun mentioned.
@eryksun also mentioned in our chat:
Like I said, all of the codepages are supersets of ASCII, so what I mean is to limit the rest of the batch script to just ASCII characters, because they can be decoded properly regardless of the console codepage.
Unicode UTF-8
chcp 65001
echo ¯¯¯¯¯
chcp [Original Codepage]
Explanation by @eryksun:
CMD decodes line by line, i.e. you can change to codepage 65001 just for the non-ASCII lines and then switch back to the original codepage.
If you don't use an editor such as Notepad++ that can save UTF-8 without a BOM (byte order mark), CMD will see the first line as an error since it doesn't know to ignore a BOM.
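To fill in the [Original Codepage] placeholder without hard-coding a number, the active code page can be read from the output of CHCP first. The sketch below assumes the file is saved as UTF-8 without a BOM and that CHCP prints its number after a colon, as it does on English systems; the variable name cp is just an illustrative choice.

```batch
@echo off
rem Remember the active code page, e.g. from "Active code page: 850".
for /f "tokens=2 delims=:." %%x in ('chcp') do set "cp=%%x"

rem Switch to UTF-8 only for the line that contains non-ASCII characters.
chcp 65001 >nul
echo ¯¯¯¯¯

rem Restore the original code page; the rest of the script stays ASCII-only.
chcp %cp% >nul
```

Keeping every other line in pure ASCII means those lines decode the same way whichever code page happens to be active.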

Controlling codepages in a cmd window when running batch scripts

I have problems controlling character code pages in a Windows cmd window, or rather in DOS scripts (.bat files) I use for certain tasks on my Windows 7 office computer.
Here is the problem:
One of my scripts is used to open certain files in their respective programmes, e.g.
C:\Stuff\Büroeinrichtung\MyFile.xlsx
The crucial thing here is the u-umlaut (ü) in the directory name.
In my script I use
Start "" "C:\Stuff\Büroeinrichtung\MyFile.xlsx"
to start Excel and open the file.
This works as long as I tell my text editor (Notepad++) to encode the script using codepage 850 (Western European), as this is what the cmd windows on my machine use by default.
However, I want to be able to use scripts that are encoded in something else, primarily UTF-8 or UTF-8-BOM. From answers to another question posted here I learned that, in principle, I can put a command in the script that makes the cmd window change the codepage, e.g. chcp 65001 for UTF-8. So my script would then look like
chcp 65001
pause :: this is here just to have some visual control while testing
Start "" "C:\Stuff\Büroeinrichtung\MyFile.xlsx"
pause :: dito
But whatever I do, I cannot get this to run. The cmd window accepts the codepage change and then stops at the pause (in Line 2), but when I hit Enter to continue, one of three things happens:
either I get an alert that something is wrong with the ü (other, fancy characters are displayed), or
I get an alert that a directory of that name wasn't found (again, obviously something is wrong with the ü, whose actual bytes seem to represent something else), or
the cmd window just disappears (apparently crashed, and apparently never reaching Line 4, where a new pause would halt it).
I tried all possible combinations of codepages called in the script and various encodings for the script file (.bat) itself but did not get anywhere.
So, to cut a long story short: what do I have to do so that, in a script encoded in UTF-8 (or similar) and run on a machine that uses codepage 850 by default, the character ü (u-umlaut) in a directory name is understood as exactly ü and nothing else?

Is it possible to use extended ASCII characters in a BAT file?

I have a bunch of dynamically created *.BAT files. These BAT files are used to create folders in a server. Just one line in each BAT file, such as: MKDIR \NetworkShare\abc\123
This "abc\123" string is from a database.
It ran fine for a while, creating thousands of subfolders on demand, until today, when it failed to create a subfolder that has a "close single quote" (Alt + 0146 if typed at the DOS prompt) in the string.
I did some research and found that this "close single quote" is an extended ASCII character. It can't be saved properly in an ANSI BAT file (it ends up as something else). I tried Unicode and UTF-8 BAT files, but that doesn't work.
The closest I got was using a binary editor to make sure the byte is code 146, but code 146 gives me Æ (Alt + 146), not the "close single quote" (Alt + 0146).
I know I can manually type special characters in DOS prompt (by using keyboard Alt + ).
But is there a way to properly save this "close single quote" (Alt + 0146) in BAT file so I can execute them dynamically?
The host system is Windows Server 2003 US-English.
Thank you for this CHCP 65001 trick. It led to a proper solution.
I took the following steps to resolve the issue:
+++++++++++++++++++
Prepare the BAT Text File (either manually or dynamically)
+++++++++++++++++++
(1) Make the first line blank (this is necessary, because there are hidden chars in the first line for UTF-8 text file)
(2) Put CHCP 65001 as second line
(3) main line here: MKDIR \networkshare\abc(right single quote-->this is special extended ASCII char)\123
(4) make sure the BAT file saved as UTF-8
+++++++++++++++++++
Now it's the CMD.EXE trick
+++++++++++++++++++
(1) Start cmd.exe
(2) Open the properties of the cmd.exe window (the black screen)
(3) Make sure the window font is a TrueType one, i.e. marked "TT". By default it is a raster font, which cannot display special characters properly. (This is the key step)
(4) Now I can run my BAT file and it handles those extended ASCII chars properly.
Try changing the code page of your batch file to UTF-8: Insert this line at the top of your batch file and save the file as UTF-8:
chcp 65001
Be careful though: creating folders with non-ASCII letters can break programs that rely on older APIs or libraries, or that simply assume all folder and file names are ASCII.

Running BAT/CMD file with accented characters in it

I have a Windows batch file which has an instruction to execute an EXE file in a location whose path contains accented characters. Following are the contents of the batch file.
@echo off
C:\español\jre\bin\java.exe -version
The path C:\español\jre\bin\java.exe exists and is correct. I can run this command directly in cmd.exe, but when I run it from a bat/cmd file it fails with "The system cannot find the path specified".
One way to fix this is by setting code page to 1252 (that works for me). But I'm afraid we'd have to set code pages for any non-English locale and figuring out which code page to use is pretty difficult.
Is there an alternative approach to fix this problem? Maybe a command-line option or something else?
Another way of doing this, in Windows, is by using wordpad.exe:
Run wordpad.exe
Write your script as you usually do, with accents
Choose Save as > Other formats
Choose to save it as Text document MS-DOS (*.txt)
Change the file extension from .txt to .bat
I had the same problem, and this answer solved it. Basically you have to wrap your script with a bunch of commands to change your terminal codepage, and then to restore it.
@echo off
for /f "tokens=2 delims=:." %%x in ('chcp') do set cp=%%x
chcp 1252>nul
:: your stuff here ::
chcp %cp%>nul
Worked like a charm!
I'm using Notepad++ and it has an option to change "character sets", OEM-US did the trick. ;)
Since you have @echo off you can't see what your batch file is sending to the command prompt. Reproducing your problem with that turned off, it seems the ñ character gets misinterpreted, since the output I see is:
C:\espa±ol\jre\bin\java -version
The system cannot find the path specified.
I was able to get it to work by echoing the command into the batch file from the command prompt, i.e.
echo C:\español\jre\bin\java.exe -version>>test.bat
This seems to translate the character into whatever the command prompt is looking for, though I've only tested it with English locale set so I don't know if it'll work in all situations for you. Also, if you open the batch in a text editor like notepad it looks wrong (C:\espa¤ol\jre\bin\java.exe)
Use Alt + 0164 for ¤ instead of Alt + 164 ñ in a batch file... It will look odd, but your script should run.
You can use Visual Studio Code, which lets you select the encoding you want. Click the encoding in the lower right corner and choose the "Save with Encoding" option. Select DOS and it will save the accented characters.
I also had the same problem. I was trying to create a simple XCOPY batch file to copy a spreadsheet from one folder to another. Its name had the "é" character in it, and it refused to copy.
Even trying to use Katalin's and Metalcoder's suggestions didn't work on my neolithic Windows XP machine. Then I suddenly thought: Why not keep things as simple as possible (as I am myself extremely simple-minded when it comes to computers) and just substitute, in the batch file code, "é" with the wildcard character "?".
And guess what? It worked!
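For anyone wanting to try the same trick, here is a minimal sketch. The folder and file names are hypothetical; the point is only that the script itself contains nothing but ASCII, so the code page no longer matters.

```batch
@echo off
rem "?" stands in for each accented letter, so this file is pure ASCII.
rem Hypothetical names: the real file would be something like "résumé.xlsx".
xcopy "C:\Reports\r?sum?.xlsx" D:\Backup\
```

One caveat: the wildcard matches any character in that position, so a file such as rasuma.xlsx in the same folder would be copied too.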
