I am currently working on a batch file and using the code page 437. I want the batch to echo the right pointing double angle quotation mark, but can't figure out for the life of me how or find what the equivalent would be.
The Wikipedia article about code page 437 lists all characters this code page defines with their code values.
The character right-pointing double angle quotation mark » has Unicode code value U+00BB. It is available also in code page 437 with decimal code value 175 (hexadecimal AF).
So the best option is to write this batch file in a text editor that supports editing files in OEM code page 437.
If you use Windows Notepad, which cannot edit text files in an arbitrary code page, you have to save the file with ANSI encoding and insert the macron character ¯ instead. The macron has decimal code value 175 (hex. AF) in code page Windows-1252, which is the default system ANSI code page in countries that have code page 437 as the system OEM code page. The console then interprets that byte as ».
"ANSI" is used here as the common Windows term for a one-byte-per-character encoding. The code pages Windows-1252 and 437 were not actually specified by the American National Standards Institute (ANSI).
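If no suitable editor is at hand, another way to get the right byte into the file is to generate the batch file from a small program that writes byte 0xAF directly. This is only a minimal sketch in C. The function name write_guillemet_batch is made up for illustration and is not part of any tool mentioned above.

```c
#include <stdio.h>

/* Writes a two-line batch script to `out`. The echo line contains the raw
   byte 0xAF, which a console using code page 437 displays as the
   right-pointing double angle quotation mark.
   `out` should be opened in binary mode so the bytes are written unchanged.
   Returns 0 on success, -1 on a write error. */
int write_guillemet_batch(FILE *out)
{
    static const unsigned char script[] =
        "@echo off\r\n"
        "echo \xAF\r\n";
    size_t len = sizeof script - 1;   /* do not write the trailing NUL */

    return fwrite(script, 1, len, out) == len ? 0 : -1;
}
```

A caller would open the target file with fopen(name, "wb"), pass the handle to this function and close it afterwards. Opening the result in Notepad would show ¯ in the echo line, which is expected.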
I'm making a program in C for a project where it has to print to a file everything it prints to the console. The problem is that I have to print some special characters like 'Ç', so I use the ASCII codes. It prints fine to the console, but what it prints to the file is incorrect. Here is an example:
printf(" %c", 128);
output to console: Ç
fprintf(output, " %c", 128);
output to file: €
I ran the command chcp in cmd and it tells me I'm using code page 850, and I used the codes from that code page, so I don't know what the problem is. The program writes to a txt file that I open in Notepad.
Ç is 128 in code page 437 or 850 etc., encodings which are sometimes used by Windows consoles. The same code 128 is € in code page 1252 or 1250, encodings which are quite often used by Windows graphical applications. The only reasonable way to proceed is to have your consoles use the same encoding as the graphical applications; for this, you can use the command chcp 1252 (change code page) in the console at the command prompt.
(Note: for chcp to be effective, the console must use a TrueType font such as Lucida Console or Consolas.)
Your command line (console) and whatever you use to display the file use different encodings.
Both times the byte 128 is written, but in some Extended ASCII variant (see also Wikipedia) it is interpreted as a C with cedilla,
whereas a common Windows encoding interprets it as the Euro symbol.
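To make the point concrete, here is a small sketch in C that reports what byte 128 stands for in the code pages named in these answers. The function names are hypothetical, and the table covers only those four code pages.

```c
#include <stdio.h>

/* Returns the Unicode code point that byte 0x80 (decimal 128) stands for
   in the given code page, or 0 if the code page is not in this small table.
   Only the four code pages named in the answers above are covered. */
unsigned int meaning_of_byte_128(int codepage)
{
    switch (codepage) {
    case 437:
    case 850:
        return 0x00C7;  /* LATIN CAPITAL LETTER C WITH CEDILLA */
    case 1250:
    case 1252:
        return 0x20AC;  /* EURO SIGN */
    default:
        return 0;
    }
}

/* Prints one line describing how the byte is read under `codepage`. */
void print_meaning_of_byte_128(int codepage)
{
    unsigned int cp = meaning_of_byte_128(codepage);

    if (cp != 0)
        printf("code page %d: byte 128 is U+%04X\n", codepage, cp);
    else
        printf("code page %d: not in this table\n", codepage);
}
```

The byte written by printf and fprintf is identical in both cases. Only the program that later displays it (console or Notepad) decides which character appears.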
I am trying to run a program from the command line with parameters. My password contains the symbol '£', and I could not find a way to escape it.
It is always a good idea to enclose a parameter string in double quotes, especially a strong password that contains characters other than ASCII letters and digits.
But care must be taken when using characters in batch files that are not in the ASCII table, i.e. characters whose code point value (byte value) is greater than 127 decimal.
When a batch file is written in Windows Notepad and saved with ANSI encoding, characters with a code point value greater than 127 are saved using the code page determined by the Windows Region and Language settings. For North American and Western European countries this is code page Windows-1252. The pound sign has decimal value 163 (hexadecimal: A3) in this code page.
A command process, however, uses a different code page, which can be seen by opening a command prompt window and running the command CHCP (change code page) without any parameter. This command outputs the active code page of the command process, which also depends on the Windows Region and Language settings. By default, a command process uses code page OEM 437 in North American countries and OEM 850 in Western European countries. The pound sign has the decimal value 156 in both code page 437 and code page 850.
In other words, you need to know which byte(s) the application that checks the password expects for the pound sign:
A byte with value 163 if the password was defined using a GUI application.
A byte with value 156 if the password was defined from within a command prompt window.
One or more other byte values, depending on the code page and character encoding (ANSI, OEM, UTF-8, UTF-16) in use when the password with the pound sign was defined. For example, UTF-8 encodes a pound sign as 2 bytes with the decimal values 194 and 163.
So what to write into the batch file?
Well, you have to find that out by yourself.
For example, suppose the password was defined from within a command prompt window using code page 850, so the pound sign in the stored password is a single byte with value 156. If the batch file is edited in Notepad using code page 1252, the character œ must be typed in the password string to get a byte with value 156 into the batch file.
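The byte values listed above can be summarized in a few lines of C. This is just an illustrative sketch. The function name pound_sign_bytes is invented here, and only the encodings named in this answer are handled.

```c
#include <stddef.h>

/* Stores in `out` the byte(s) that encode the pound sign (U+00A3) in the
   given code page and returns how many bytes were stored, or 0 if the code
   page is not handled here. `out` must have room for 2 bytes. */
size_t pound_sign_bytes(int codepage, unsigned char out[2])
{
    switch (codepage) {
    case 1252:              /* Windows-1252, typical for GUI applications */
        out[0] = 163;
        return 1;
    case 437:               /* OEM code pages used by the console */
    case 850:
        out[0] = 156;
        return 1;
    case 65001:             /* UTF-8 */
        out[0] = 194;
        out[1] = 163;
        return 2;
    default:
        return 0;
    }
}
```

Whichever of these byte sequences the password check expects is the one that has to end up in the batch file, no matter which character the editor shows for it.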
Thank you for your detailed answer @Mofi.
Background: My CMD program calls SQLPlus and the database password contains a '£'.
Summing this up into a short fix, the following steps worked for me.
The fix:
Open your script in a robust text editor (e.g. Atom, Notepad++, etc.)
Change the file encoding (of the text editor) to CP-1252
Add chcp 1252>nul to the top of your script
Run your script and enjoy the results!
As you have found, handling of the UK pound sign is a trap for the unwary in batch files.
The issue here is that the UK pound sign £ is not an ASCII character, so it is processed differently by the command prompt and by Windows GUI programs like Notepad.
A solution that worked for me was to change the code page in the batch file to 65001 (UTF-8) before using the £ sign.
This idea was discussed at Change the active console Code Page, which explains that the default code page is determined by the Windows Locale.
For example, put this code at the start of your batch file:
@echo off
:: Change the code page to Unicode/65001 before using non-ascii characters.
chcp 65001
When I write a C program and try to output special characters (like ä ö ü ß) with printf() in the cmd window on Windows 10, it only shows something like ▒▒▒▒▒▒▒▒▒▒▒▒
But if I just type them in the cmd window without a C program being executed, it displays these characters properly.
When I change the console type to standard output in NetBeans, the output is correct as well.
I tried to change the code page of cmd, but it didn't fix the problem.
I use the gcc c compiler.
The reason is the use of different code pages for character encoding.
A GUI text editor that stores program code in a file with one byte per character uses code page Windows-1252 in Western European and North American countries.
A console window opened on running a console application uses an OEM code page, which is OEM 850 in Western European countries and OEM 437 in North American countries.
So for ÄÖÜäöüß you need different byte values in the code to get those characters displayed as expected in the console window, at least when the program runs in Western European and North American countries.
Character Windows-1252 OEM 850
Ä \xC4 \x8E
Ö \xD6 \x99
Ü \xDC \x9A
ä \xE4 \x84
ö \xF6 \x94
ü \xFC \x81
ß \xDF \xE1
The code page used by default in a console window can be seen by opening a command prompt window and running either chcp (change code page) or mode, both of which display the active code page.
The default code page for GUI applications and console applications on a computer for a user account depends on the Windows region and language settings for this user account.
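As a rough illustration, the byte values can be written into C string literals with hexadecimal escapes. The sketch below assumes ü is 0xFC in Windows-1252 and 0x81 in OEM 850. The names umlauts_cp1252, umlauts_cp850 and umlauts_for_codepage are made up for this example.

```c
#include <stddef.h>

/* The seven letters A-umlaut, O-umlaut, U-umlaut, a-umlaut, o-umlaut,
   u-umlaut and sharp s as raw bytes for two code pages. Adjacent string
   literals are used so that a hex escape can never swallow a following
   hex digit. */
static const char umlauts_cp1252[] =
    "\xC4" "\xD6" "\xDC" "\xE4" "\xF6" "\xFC" "\xDF";
static const char umlauts_cp850[] =
    "\x8E" "\x99" "\x9A" "\x84" "\x94" "\x81" "\xE1";

/* Returns the byte string to print for the given code page,
   or NULL if the code page is not covered by this sketch. */
const char *umlauts_for_codepage(int codepage)
{
    switch (codepage) {
    case 1252: return umlauts_cp1252;
    case 850:  return umlauts_cp850;
    default:   return NULL;
    }
}
```

Printing the string returned for 850 with fputs() in a console whose active code page is 850 should then display the letters correctly, while the 1252 string is the one a GUI editor would have stored.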
Some web pages you should read to better understand character encoding:
Character encoding (English Wikipedia article)
On the Goodness of Unicode by Tim Bray
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
What's the best default new file format? (UltraEdit forum topic)
Programmers should not write non-ASCII characters into strings output by a compiled executable, because the binary representation (bytes) of those characters in the executable depends on the code page the compiler uses. It is better to use hexadecimal notation when the active code page at execution time is known, or is set by the application itself before the string is output.
It is also possible to store strings in the executable in Unicode, determine the encoding of the output handle before outputting any string, and convert each Unicode string to the encoding of the output handle before it is written to the output handle.
And of course it depends on used output font how the bytes in the strings in the executable are finally really displayed on screen.
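The second approach (keep the text as Unicode and convert on output) might look roughly like this in C. It is only a sketch: the table covers just the seven German letters discussed here, the names encode_char and encode_string are hypothetical, and how the target code page is determined is left to the caller.

```c
#include <stddef.h>

/* One letter with its Unicode code point and its byte in two code pages. */
struct letter {
    unsigned int  unicode;
    unsigned char cp1252;
    unsigned char cp850;
};

static const struct letter letters[] = {
    { 0x00C4, 0xC4, 0x8E },  /* A with diaeresis */
    { 0x00D6, 0xD6, 0x99 },  /* O with diaeresis */
    { 0x00DC, 0xDC, 0x9A },  /* U with diaeresis */
    { 0x00E4, 0xE4, 0x84 },  /* a with diaeresis */
    { 0x00F6, 0xF6, 0x94 },  /* o with diaeresis */
    { 0x00FC, 0xFC, 0x81 },  /* u with diaeresis */
    { 0x00DF, 0xDF, 0xE1 },  /* sharp s */
};

/* Returns the byte for `codepoint` in code page 1252 or 850.
   ASCII (below 128) is the same in both; anything unknown becomes '?'. */
unsigned char encode_char(unsigned int codepoint, int codepage)
{
    size_t i;

    if (codepoint < 128)
        return (unsigned char)codepoint;
    for (i = 0; i < sizeof letters / sizeof letters[0]; i++) {
        if (letters[i].unicode == codepoint) {
            if (codepage == 1252)
                return letters[i].cp1252;
            if (codepage == 850)
                return letters[i].cp850;
        }
    }
    return '?';
}

/* Converts `n` code points from `text` into `out`, which must hold
   n + 1 bytes, and NUL-terminates the result. */
void encode_string(const unsigned int *text, size_t n, int codepage, char *out)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = (char)encode_char(text[i], codepage);
    out[n] = '\0';
}
```

On Windows a real program would more likely let the operating system do this conversion for every character. The hand-written table only shows the idea.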
So, I'm trying to get some input from the user in a C program, doing fscanf(stdin, "%s", buffer).
When I input the character å I get a value of 134, which corresponds to code page 437.
But when I use the Windows function GetACP() I get 1252 as the active code page, and 134 doesn't match å in that code page. I tried setting the code page to UTF-8, but that didn't give me any input at all.
Is there a way of getting the corresponding code page for user input and converting that to Unicode? Or is there a better way of getting the input?
I've been looking around a lot and I can't find much info on this.
The code page used by the console window is called the OEM code page for historical reasons. You can get the default code page with GetOEMCP and the currently selected code page with GetConsoleCP.
You can set the console to use UTF-8 with the command chcp 65001, but Microsoft does not guarantee it to work in all cases.
If you don't need normal C++ I/O to the console, you can use the Console Functions instead e.g. WriteConsoleW to output a Unicode string.
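For the conversion itself, once you know the input is in code page 437, each byte can be mapped to a Unicode code point. The sketch below hard-codes only bytes 0x80 to 0x9F of code page 437, which is enough for å (134). The function name cp437_to_unicode is made up for this example. On Windows, the usual route would be to pass the value from GetConsoleCP to MultiByteToWideChar instead of keeping a table by hand.

```c
/* Unicode code points for code page 437 bytes 0x80..0x9F. */
static const unsigned short cp437_80_9f[32] = {
    0x00C7, 0x00FC, 0x00E9, 0x00E2, 0x00E4, 0x00E0, 0x00E5, 0x00E7,
    0x00EA, 0x00EB, 0x00E8, 0x00EF, 0x00EE, 0x00EC, 0x00C4, 0x00C5,
    0x00C9, 0x00E6, 0x00C6, 0x00F4, 0x00F6, 0x00F2, 0x00FB, 0x00F9,
    0x00FF, 0x00D6, 0x00DC, 0x00A2, 0x00A3, 0x00A5, 0x20A7, 0x0192
};

/* Maps one code page 437 byte to a Unicode code point.
   ASCII bytes map to themselves. Bytes above 0x9F are not covered by
   this sketch and yield U+FFFD (REPLACEMENT CHARACTER). */
unsigned int cp437_to_unicode(unsigned char byte)
{
    if (byte < 0x80)
        return byte;
    if (byte <= 0x9F)
        return cp437_80_9f[byte - 0x80];
    return 0xFFFD;
}
```

With this table the value 134 read by fscanf maps to U+00E5, which is å.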
In my C program I've had to swap my Unicode box-drawing characters for escaped characters from DOS code page 437 to get it to work in the Windows command prompt. Is it possible to change the code page of gnome-terminal to display these characters correctly when natively compiling the program for Linux?
Thanks.
From https://nethackwiki.com/wiki/IBMgraphics
The current gnome-terminal does not have a setting for code page 437, but it does support other code pages that are equivalent for NetHack's purposes, such as 862 (Hebrew).
To set code page 862 on gnome-terminal:
Select Terminal->Set Character Encoding->Add or Remove.
In the pane on the left, select the line with description Hebrew and encoding IBM862.
Click the right-pointing arrow between the two panes.
Click Close.
The above steps only need to be done once for the lifetime of the Gnome installation. Once done, it is sufficient to:
Select Terminal, Set Character Encoding, and then Hebrew (IBM862).
It should be noted that the current default gnome-terminal font in Ubuntu Jaunty fully supports DECgraphics as long as eight_bit_tty is set to false.
If you need these characters, you should use their correct Unicode codepoint values and output them as UTF-8. Or, if you prefer, you can output them as wide characters and let the standard library's locale system take care of converting them to UTF-8 or another "native" encoding the user has selected (which might even be CP437, although I've never seen a system setup that poorly...).
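The first suggestion (output the Unicode code points as UTF-8) needs nothing more than a small encoder. Here is a minimal sketch in C. The function name utf8_encode is not from any library, and the caller is assumed to write the resulting bytes to a terminal that expects UTF-8.

```c
#include <stddef.h>

/* Encodes one Unicode code point as UTF-8 into `out` (room for 4 bytes).
   Returns the number of bytes written, or 0 for values that are not valid
   code points (surrogates or anything above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, the double-line corner that code page 437 stores as byte 0xC9 is U+2554 in Unicode, and the encoder turns it into the three bytes E2 95 94, which can be written with fwrite().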