Batch File Character Encoding on Windows 7?

How do I know what character encoding to save a batch file as?
I've heard that Windows uses different character sets for cmd and for GUI programs.

It depends on the code page that cmd is currently using.
C:\Users\>chcp
Active code page: 437
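Code page 437 is (I believe) the default OEM code page on US English Windows systems; other locales use other defaults, such as 850 in Western Europe. Its lower half (bytes 0x00 to 0x7F) is identical to ASCII, and its upper half (0x80 to 0xFF) holds accented letters, Greek letters, and box-drawing characters.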
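As a quick illustration of what "identical to ASCII" means here, this Python sketch decodes single bytes with code pages 437 and 850. Python is used only because it shows the bytes directly; its cp437 and cp850 codecs are assumed to mirror the Windows code pages, and the helper decode_byte is hypothetical. The lower half agrees with ASCII, while the same upper-half byte can mean different characters in different OEM code pages.

```python
def decode_byte(value: int, codepage: str) -> str:
    """Decode a single byte value with the given code page, e.g. "cp437"."""
    return bytes([value]).decode(codepage)

# Bytes 0x00-0x7F decode to exactly the ASCII characters in code page 437.
ascii_compatible = all(decode_byte(b, "cp437") == chr(b) for b in range(0x80))

# Bytes 0x80-0xFF are where OEM code pages disagree with each other.
byte_9b_in_437 = decode_byte(0x9B, "cp437")  # cent sign in code page 437
byte_9b_in_850 = decode_byte(0x9B, "cp850")  # o with stroke in code page 850
```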

You can use chcp to display and change the active code page.
Show chcp usage:
C:\> chcp /?
Show current code page:
C:\> chcp
Set the active code page (example):
C:\> chcp 437

I think the question is about localization, e.g. for echo messages in a .bat file (and not about line endings or anything like that).
The question for me is then: how can a .bat file make cmd.exe treat it as UTF-8 instead of the local default code page? Is there a cue to cmd.exe that will make that work?
In scripts built and used in an international setting, this matters.

Related

Open URL that contains umlaut with batch

I want to open a URL in Chrome with a batch file. This works for normal URLs, but not for URLs with umlauts.
start chrome.exe https://trends.google.de/trends/explore?q=mähroboter
I cannot use "ae" as a replacement for "ä", as it will give me different results on Google Trends.
When I keep it like this, the URL in my browser changes to
https://trends.google.de/trends/explore?q=mA4hroboter
which again gives me the wrong results. It needs to be "ä".
I tried playing around with the file encoding. It is currently UTF-8 without BOM. I tried UTF-8 with BOM and ANSI, converting back and forth. Nothing seemed to work. What can I do to make it work?
URLs must be URL encoded with percent-encoded bytes.
That means the German umlaut ä in a URL must first be UTF-8 encoded as the two bytes with hexadecimal values C3 A4 and then percent-encoded, resulting in %C3%A4 in the URL string:
https://trends.google.de/trends/explore?q=m%C3%A4hroboter
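The two steps can be checked with a short Python sketch (for illustration only; urllib.parse.quote is Python's standard percent-encoder and encodes text as UTF-8 by default):

```python
from urllib.parse import quote

# Step 1: UTF-8 encode the umlaut.
utf8_bytes = "ä".encode("utf-8")

# Step 2: percent-encode each byte as %XX with uppercase hex digits.
manual = "".join(f"%{b:02X}" for b in utf8_bytes)

# quote() performs both steps for the whole query value.
query = quote("mähroboter")
url = "https://trends.google.de/trends/explore?q=" + query
print(url)  # https://trends.google.de/trends/explore?q=m%C3%A4hroboter
```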
In a batch file, a percent sign must be escaped with an additional percent sign so that the Windows command processor interprets it as a literal character and not as
the beginning of a batch file argument reference, as explained in the help of command CALL output on running call /? in a command prompt window, or
the beginning of a loop variable reference, as explained in the help of command FOR output on running for /? in a command prompt window, or
the beginning or end of an environment variable reference, as explained in the help of command SET output on running set /? in a command prompt window.
So the batch file must contain:
start chrome.exe https://trends.google.de/trends/explore?q=m%%C3%%A4hroboter
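A minimal Python sketch of generating that batch line automatically. The helper escape_for_batch is hypothetical and only doubles percent signs; other characters that are special to cmd.exe, such as &, would still need separate handling, for example quoting.

```python
from urllib.parse import quote

def escape_for_batch(text: str) -> str:
    """Double every percent sign so cmd.exe reads it as a literal %."""
    return text.replace("%", "%%")

url = "https://trends.google.de/trends/explore?q=" + quote("mähroboter")
batch_line = "start chrome.exe " + escape_for_batch(url)
print(batch_line)
# start chrome.exe https://trends.google.de/trends/explore?q=m%%C3%%A4hroboter
```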

Controlling codepages in a cmd window when running batch scripts

I have problems controlling character code pages in a Windows cmd window, or rather in DOS scripts (.bat files) I use for certain tasks on my Windows 7 office computer.
Here is the problem:
One of my scripts is used to open certain files in their respective programmes, e.g.
C:\Stuff\Büroeinrichtung\MyFile.xlsx
The crucial thing here is the u-umlaut (ü) in the directory name.
In my script I use
Start "" "C:\Stuff\Büroeinrichtung\MyFile.xlsx"
to start Excel and open the file.
This works as long as I tell my text editor (Notepad++) to encode the script using codepage 850 (Western European), as this is what the cmd windows on my machine use by default.
However, I want to be able to use scripts that are encoded in something else, primarily UTF-8 or UTF-8-BOM. From answers to another question posted here I learned that, in principle, I can put a command in the script that makes the cmd window change the code page, e.g. chcp 65001 for UTF-8. So my script would then look like this:
chcp 65001
pause :: this is here just to have some visual control while testing
Start "" "C:\Stuff\Büroeinrichtung\MyFile.xlsx"
pause :: ditto
But: whatever I do, I do not get this running. The cmd window nicely accepts the change to the codepage, then stops due to pause (in Line 2), but on hitting "enter" to continue I
either get an alert that something is wrong with the ü (other, fancy, characters displayed), or
I get an alert that a directory of that name wasn't found (again obviously something wrong with the ü, whose actual bits seem to represent something else), or
the cmd window just disappears (apparently crashed, and apparently never reaching Line 4 where a new pause would halt it).
I tried all possible combinations of codepages called in the script and various encodings for the script file (.bat) itself but did not get anywhere.
So, to make a long story short: in a script encoded in UTF-8 (or similar) that is going to run on a machine using code page 850 by default, what do I have to do so that the character ü (u-umlaut) in a directory name is understood by the script as exactly ü and nothing else?
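A Python sketch of what happens at the byte level, assuming cmd.exe decodes the batch file with the active code page (850 here). Code page 850 stores ü as one byte and UTF-8 stores it as two; a UTF-8 file read as code page 850 turns those two bytes into two box-drawing characters, which matches the "fancy characters" described in the question.

```python
text = "Büroeinrichtung"

# How the umlaut is stored on disk under each encoding.
cp850_bytes = "ü".encode("cp850")  # one byte
utf8_bytes = "ü".encode("utf-8")   # two bytes

# A UTF-8 file decoded as code page 850 turns each byte into its own character.
misread = text.encode("utf-8").decode("cp850")  # 'B├╝roeinrichtung'

# Encoding and decoding with the same code page round-trips cleanly.
roundtrip = text.encode("cp850").decode("cp850")
```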

How to fix a batch file with Hebrew characters?

I created a batch file that contains characters in Hebrew.
ECHO אאאאא
The result is אאא on running the batch file.
How can I fix it?
It looks like you saved your batch file as UTF-8 without byte order mark (BOM), which stores the Hebrew letter Aleph (Unicode code point U+05D0) as two bytes.
The batch code below, saved in a UTF-8 encoded file without BOM, changes the code page to UTF-8 (65001) before the characters are written to the console window.
@echo off
chcp 65001 >nul
ECHO אאאא
Instead of the multi-byte encoding UTF-8, it would also be possible to use the single-byte code page 862, which maps this letter to code value 80 (hexadecimal, 128 decimal). The batch file must then also be saved with code page 862.
@echo off
chcp 862 >nul
ECHO אאאא
Code page 862 is the OEM code page for Hebrew.
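The difference between the two encodings of Aleph can be shown in Python (used only for illustration; its cp862 codec is assumed to match the Windows OEM code page 862):

```python
aleph = "\u05d0"  # HEBREW LETTER ALEF

utf8_bytes = aleph.encode("utf-8")   # multi-byte sequence D7 90
cp862_bytes = aleph.encode("cp862")  # single byte 80

# Show both byte sequences as uppercase hex.
print(utf8_bytes.hex(" ").upper(), cp862_bytes.hex().upper())
```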
Console windows usually use OEM code pages. If you open a command prompt window and run chcp in it, you can see which code page is set by default on your machine.
But setting the code page in the batch file to match the encoding of the batch file does not automatically mean that the Hebrew letters are displayed correctly in the console window when the batch file runs.
The font used for the console window must also support code page 862, i.e. the Hebrew letters of the Unicode table.
Because the Hebrew characters were displayed incorrectly in a command prompt window with the default font setting Raster Fonts on my English Windows 7 x64 machine (which uses code page 850 by default in console windows), I clicked the icon on the left side of the title bar of the command prompt window, clicked Properties in the menu that opened, and selected Consolas on the Font tab. The Hebrew letters were now displayed differently than with Raster Fonts, but still not correctly, so Consolas does not support Hebrew letters on my machine either. Next I tried the font Lucida Console, but again the Hebrew letters were not displayed correctly. In other words, none of the three fonts available on my machine for console windows can display the Hebrew letters in a console window with the correct glyphs.
If you don't know anything about text encoding, read this brief overview of Unicode on a power tips page for the text editor UltraEdit.
The command prompt environment is not really designed for Unicode. In Windows Control Panel, open Region and Language and select the Administrative tab. There you can set the system locale for non-Unicode programs. There is also a link to a help page explaining what this setting is for: with Hebrew (Israel) selected, it sets the default font and code page for single-byte encoded text in the Windows GUI (Windows-1255) and in console windows (OEM 862).

Running BAT/CMD file with accented characters in it

I have a Windows batch file which has an instruction to execute an EXE file in a location whose path contains accented characters. Following are the contents of the batch file.
@echo off
C:\español\jre\bin\java.exe -version
The path C:\español\jre\bin\java.exe exists and is correct. I can run this command directly in cmd.exe, but when I run it from a .bat/.cmd file it fails with "The system cannot find the path specified".
One way to fix this is to set the code page to 1252 (that works for me). But I'm afraid we would have to set the code page for every non-English locale, and figuring out which code page to use is pretty difficult.
Is there an alternative approach to fix this problem? Maybe a command-line option or something else?
Another way of doing this on Windows is to use WordPad (wordpad.exe):
Run wordpad.exe
Write your script as you usually do, with accents
Choose Save as > Other formats
Choose to save it as Text document MS-DOS (*.txt)
Change the file extension from .txt to .bat
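Saving as MS-DOS text presumably makes WordPad write the file in the system's OEM code page rather than in ANSI or UTF-8. A rough Python equivalent, assuming the OEM code page is 850 (chcp shows the actual value; US English systems use 437); the output file name run.bat is hypothetical:

```python
script = "@echo off\r\nC:\\español\\jre\\bin\\java.exe -version\r\n"

# Encode with the console's OEM code page (assumed 850 here) so that
# cmd.exe, which reads the file with that code page, sees the same text.
oem_bytes = script.encode("cp850")

# Write raw bytes so no further encoding or newline translation happens.
with open("run.bat", "wb") as f:  # hypothetical output file name
    f.write(oem_bytes)
```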
I had the same problem, and this answer solved it. Basically, you have to wrap your script in a few commands that change the console's code page and then restore it.
@echo off
for /f "tokens=2 delims=:." %%x in ('chcp') do set cp=%%x
chcp 1252>nul
:: your stuff here ::
chcp %cp%>nul
Worked like a charm!
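The for /f line takes the second token of the chcp output, splitting on : and . (presumably because some localized versions end the line with a period, for example German Windows prints "Aktive Codepage: 850."). A Python approximation of that tokenization with the hypothetical helper second_token; note that the token keeps its leading space, which chcp %cp% tolerates because the space only separates the argument:

```python
import re

def second_token(line: str, delims: str = ":.") -> str:
    """Approximate for /f "tokens=2 delims=:.": split on any delimiter
    character, drop empty tokens, and return the second one."""
    tokens = [t for t in re.split("[" + re.escape(delims) + "]", line) if t]
    return tokens[1] if len(tokens) > 1 else ""

english = second_token("Active code page: 437")
german = second_token("Aktive Codepage: 850.")
print(repr(english), repr(german))  # ' 437' ' 850'
```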
I'm using Notepad++ and it has an option to change "character sets", OEM-US did the trick. ;)
Since you have @echo off, you can't see what your batch file is sending to the command prompt. Reproducing your problem with that line removed, it seems the ñ character gets misinterpreted, since the output I see is:
C:\espa±ol\jre\bin\java -version
The system cannot find the path specified.
I was able to get it to work by echoing the command into the batch file from the command prompt, i.e.
echo C:\español\jre\bin\java.exe -version>>test.bat
This seems to translate the character into whatever the command prompt is looking for, though I've only tested it with an English locale, so I don't know if it will work in all situations for you. Also, if you open the batch file in a text editor like Notepad, it looks wrong (C:\espa¤ol\jre\bin\java.exe).
Use Alt+0164 (¤) instead of Alt+164 (ñ) in a batch file. It will look odd, but your script should run.
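The garbled characters in these answers can be reproduced in Python (illustration only, assuming ANSI code page 1252 and OEM code page 850): ñ is byte 0xF1 in code page 1252 but 0xA4 in code page 850, and cmd.exe and Notepad each decode the other's byte as a different character. This also explains the Alt-code trick: Alt+0164 types the ANSI character for byte 0xA4 (¤), which is the byte the OEM code page reads as ñ.

```python
path = "C:\\español\\jre\\bin\\java.exe"

# Saved as ANSI (code page 1252), ñ becomes byte 0xF1 ...
ansi_bytes = path.encode("cp1252")
# ... which cmd.exe, reading with OEM code page 850, shows as a plus-minus sign.
seen_by_cmd = ansi_bytes.decode("cp850")  # 'C:\\espa±ol\\jre\\bin\\java.exe'

# Saved in the OEM code page, ñ becomes byte 0xA4 ...
oem_bytes = path.encode("cp850")
# ... which Notepad, reading as ANSI, shows as the currency sign.
seen_by_notepad = oem_bytes.decode("cp1252")  # 'C:\\espa¤ol\\jre\\bin\\java.exe'
```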
You can use Visual Studio Code, which lets you select the encoding you want to use. Click the encoding shown in the lower right corner and choose the option "Save with Encoding". Select DOS, and it will save the accented characters correctly.
I also had the same problem. I was trying to create a simple XCOPY batch file to copy a spreadsheet from one folder to another. Its name had the "é" character in it, and it refused to copy.
Even Katalin's and Metalcoder's suggestions didn't work on my neolithic Windows XP machine. Then I suddenly thought: why not keep things as simple as possible (as I am myself extremely simple-minded when it comes to computers) and just replace "é" in the batch file code with the wildcard character "?".
And guess what? It worked!
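The ? trick works because the batch file then contains only ASCII characters, so its encoding no longer matters, while ? matches any single character in the file name. Python's fnmatch is used below only as an analogy (cmd.exe wildcard matching differs in some details), and the file names are hypothetical; the second check shows the downside that the pattern can match other files too:

```python
from fnmatch import fnmatchcase

pattern = "Budget ?t?.xlsx"  # hypothetical name with each é replaced by ?

matches_accented = fnmatchcase("Budget été.xlsx", pattern)   # ? matches é
matches_plain = fnmatchcase("Budget ete.xlsx", pattern)      # ...but also e
matches_longer = fnmatchcase("Budget etude.xlsx", pattern)   # one char per ?
print(matches_accented, matches_plain, matches_longer)  # True True False
```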

Batch script Latin characters

I am writing a batch script that goes through some directories performing a specific task, something like the following:
rem Here I set the full path for the script:
set DBCreationScript=...
echo %DBCreationScript%
The problem is that the path contains some Latin characters (ç, ã, á), and when I run the script, the output shows strange characters instead of the ones I typed. The batch script is in ANSI encoding.
I already tried saving the script as UTF-8, but apparently the batch interpreter can't handle the control characters that appear at the beginning of the file.
Any thoughts?
Save the batch file in OEM encoding (a decent editor should allow this) or change the code page prior to running it with
chcp 1252
You can also save it as UTF-8 without signature (BOM) and use
chcp 65001
but down that path lies peril and dragons await to eat you (in short: It's usually painful and has a few weird side-effects).
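The "control characters" at the beginning of a UTF-8 file with signature are the byte order mark EF BB BF. A short Python sketch of what cmd.exe sees when it decodes that signature with an OEM code page (437 or 850 assumed): the three bytes become visible characters stuck to the first command, so it is no longer recognized.

```python
import codecs

bom = codecs.BOM_UTF8                           # the three signature bytes
data = "@echo off\r\n".encode("utf-8-sig")      # UTF-8 with BOM ("signature")

# Decoded with an OEM code page, the BOM turns into three printable characters.
first_line_437 = data.decode("cp437").splitlines()[0]  # '∩╗┐@echo off'
first_line_850 = data.decode("cp850").splitlines()[0]  # '´╗┐@echo off'
```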

Resources