Print UTF-8 characters with printf from hexadecimal ints - C

Kind of a trivial thing, but ...
I want to print Japanese characters in plain C, starting from their hexadecimal values.
From this table I know that the first character in it, あ, has the decimal entity &#12354; and the hex entity &#x3042;, etc.
But how do I use these two numbers to print all the characters on the command line?

If your terminal is set to UTF-8 and your locale is set correctly, you can write:
char s[] = "あ";
You can also try:
char s[] = {0xe3, 0x81, 0x82, 0x0};
(the latter is the UTF-8 encoding of "あ"), and then just printf("%s", s);
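As a minimal self-contained sketch of this byte-string approach (assuming the terminal really is UTF-8):
#include <stdio.h>

int main(void)
{
    /* UTF-8 byte sequence for U+3042 (あ), plus a terminating NUL */
    char s[] = "\xe3\x81\x82";
    printf("%s\n", s);   /* shows あ on a UTF-8 terminal */
    return 0;
}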

If __STDC_ISO_10646__ is defined, wchar_t holds Unicode code points, and you can do something like:
printf("%lc", (wchar_t)0x3042);

Related

Use the letter ñ in C

I have to store the letter ñ in a char[] and I haven't been able to do it. I tried doing this:
char example[1];
example[0] = 'ñ';
When compiling I get this:
$ gcc example.c
error: character too large for enclosing character literal type
example[0] = 'ñ';
Does anyone know how to do this?
Since you mention High Sierra, you are presumably using a Mac running macOS 10.13.3 (High Sierra), the same as me.
This comes down to code sets and locales, and it can get tricky. Mac terminals use UTF-8 by default, and ñ is Unicode character U+00F1, which requires two bytes, 0xC3 and 0xB1, to represent it in UTF-8. The compiler is letting you know that one byte isn't big enough to hold two bytes of data. (In single-byte code sets such as ISO 8859-1 or 8859-15, ñ has character code 0xF1. The resemblance between 0xF1 and U+00F1 is not a coincidence: Unicode code points U+0000 to U+00FF are the same as in ISO 8859-1. ISO 8859-15 is a more modern variant of 8859-1, with the Euro symbol € and 7 other differences from 8859-1.)
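To see those two bytes for yourself, here is a quick sketch that dumps the encoding of the literal (it assumes the source file is saved as UTF-8, which is the default on macOS):
#include <stdio.h>

int main(void)
{
    const char *s = "ñ";   /* UTF-8 source: two bytes, 0xC3 0xB1 */
    for (const char *p = s; *p != '\0'; p++)
        printf("0x%02X ", (unsigned char)*p);
    putchar('\n');         /* prints: 0xC3 0xB1 */
    return 0;
}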
One option is to change the character set that your terminal works with, and then adapt your code to suit that code set. Another is to work around the problem by using wide characters, wchar_t:
#include <locale.h>
#include <wchar.h>

void function(void);

void function(void)
{
    wchar_t example[1];
    example[0] = L'ñ';
    putwchar(example[0]);
    putwchar(L'\n');
}

int main(void)
{
    setlocale(LC_ALL, "");
    function();
    return 0;
}
This compiles and runs. If you omit the call to setlocale(LC_ALL, "");, it doesn't work as intended: it generates just the octal byte \361 (aka 0xF1) and a newline, which shows up as ? on the terminal. With setlocale(), it generates two bytes (\303\261 in octal, aka 0xC3 and 0xB1) and you see ñ in the console output.
You can use "extended ascii". This chart shows that 'ñ' can be represented in extended ascii as 164.
example[0] = (char)164;
You can print this character just like any other character
putchar(example[0]);
As noted in the comments above, this will depend on your environment. It might work on your machine but not another one.
The better answer is to use Unicode, for example:
wchar_t example = L'\u00F1';
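Here is that idea as a complete sketch (it assumes a UTF-8 locale is configured in the environment):
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");      /* let the C library encode ñ for the terminal */
    wchar_t example = L'\u00F1';
    printf("%lc\n", example);   /* prints ñ */
    return 0;
}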
This really depends on which character set / locale you will be using. If you want to hardcode ñ as a Latin-1 character, this example program does that:
#include <stdio.h>

int main(void)
{
    char example[2] = {'\xF1'};   /* 0xF1 is ñ in Latin-1; example[1] is zero-initialized */
    printf("%s", example);
    return 0;
}
This, however, results in this output on my system that uses UTF-8:
$ ./a.out
�
So if you want to use non-ASCII strings, I'd recommend not representing them as char arrays directly. If you really need to use char directly, the UTF-8 sequence for ñ is two chars wide and can be written as such (again with a terminating '\0' for good measure):
char s[3] = "\xC3\xB1";
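Printing that array then works like printing any other string, as in this minimal sketch:
#include <stdio.h>

int main(void)
{
    char s[3] = "\xC3\xB1";   /* UTF-8 bytes for ñ, plus the '\0' */
    printf("%s\n", s);        /* shows ñ on a UTF-8 terminal */
    return 0;
}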

Why does storing Unicode characters in char work?

I have a program I made to test I/O from a terminal:
#include <stdio.h>

int main()
{
    char *input[100];
    scanf("%s", input);
    printf("%s", input);
    return 0;
}
It works as it should with ASCII characters, but it also works with Unicode characters and emoji.
Why is this?
Your code works because the input and output streams have the same encoding, and you do not do anything with the data in between.
Basically, you type something, which is converted into a sequence of bytes that is stored in input; then you send that same sequence of bytes back to stdout, which converts them back into readable characters.
As long as the encoding and decoding processes are compatible, you will get the "expected" result.
Now, what happens if you try to use standard C "string" functions? Let's assume you typed "♠Hello" in your terminal. You will get the expected output, but:
strlen(input) -> 8
input[0] -> some strange character
input[3] -> H
You see? You may be able to store whatever you want in a char array, but it does not mean you should. If you want to deal with extended character sets, use wchar_t instead.
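For comparison, a sketch of the wide-character version, where each character counts as a single unit regardless of how many bytes its UTF-8 form needs:
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    const wchar_t *w = L"\u2660Hello";   /* ♠Hello as a wide string */
    printf("wcslen: %zu\n", wcslen(w));  /* prints 6: one per character */
    return 0;
}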
You're probably running on Linux with your terminal set to UTF-8, so scanf produces UTF-8 and printf can output it. UTF-8 is designed so that a char[] can store it. I say char[] and not char deliberately, because non-ASCII characters need more than one byte.
Your program has undefined behavior.
scanf("%s", input);
expects a pointer to a char buffer, but with
char *input[100];
input is an array of pointers to char, i.e. it decays to a pointer to pointer to char.
Your program may appear to work because the buffer you pass to scanf happens to be of sufficient size to store the Unicode characters, and the bytes you type contain no null byte in between; but it may just as well fail, because a C implementation (on your machine or any other) is allowed to do anything in cases of UB.
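A corrected sketch with a plain char buffer and a bounded read:
#include <stdio.h>

int main(void)
{
    char input[100];                 /* an array of char, not of pointers */
    if (scanf("%99s", input) == 1)   /* bound the read, leaving room for '\0' */
        printf("%s\n", input);
    return 0;
}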

C: Display special characters with printf()

I wanted to know how to display special characters with printf().
I'm writing a string conversion program from text to Code128 (a barcode encoding).
For this type of encoding I need to display characters such as Î, Ç, È, Ì.
Example:
string to convert: EPE196000100000002260500004N
expected result: ÌEPEÇ3\ *R 6\ R $ÈNZÎ
printf result typed: ╠EPEÇ3\ *R 6\ R $ÇNZ╬
EDIT: I can only use C in this program, no C++ at all. All the answers I've found so far are for C++, not C, so I'm asking how to do it in C ^^
I've found it:
#include <locale.h>
#include <stdio.h>

int main(void)
{
    setlocale(LC_ALL, "");
    printf("%c%c%c%c\n", 'Î', 'Ç', 'È', 'Ì');
    return 0;
}
Thank you all for your answers, it helped me a lot! :)
If your console is in UTF-8, it is possible to just print the UTF-8 hex representation of your symbols. See this similar answer for C++: Special Characters on Console.
The following line prints heart:
printf("%c%c%c\n", '\xE2', '\x99', '\xA5');
However, since you print '\xCC', '\xC8', '\xCE', '\xC7' and get 4 different symbols, it means that the console encoding is some kind of ASCII extension. You probably have an encoding like this one: http://asciiset.com/. In that case you would need the characters '\x8c' and '\x8d'. Unfortunately, there are no capital versions of those symbols in that encoding, so you need some other encoding for your console, for example Latin-1 (ISO/IEC 8859-1).
For the Windows console (a complete version, with the needed header):
#include <stdio.h>
#include <windows.h>

int main(void)
{
    UINT oldcp = GetConsoleOutputCP(); // save the current console encoding

    SetConsoleOutputCP(1252);
    // print in cp1252 (Latin 1) encoding: each byte => one symbol
    printf("%c%c%c%c\n", '\xCC', '\xC8', '\xCE', '\xC7');

    SetConsoleOutputCP(CP_UTF8);
    // 3 hex bytes in UTF-8 => one 'heart' symbol
    printf("%c%c%c\n", '\xE2', '\x99', '\xA5');

    SetConsoleOutputCP(oldcp); // restore the original encoding
    return 0;
}
The console font should support Unicode (for example 'Lucida Console'). It can be changed manually in the console properties, since the default font may be 'Raster Fonts'.

What's the use of universal characters on POSIX system?

In C one can pass Unicode characters to printf() like this:
printf("some unicode char: %c\n", "\u00B1");
But the problem is that on POSIX-compliant systems char is always 8 bits, and most UTF-8 characters, such as the one above, are wider and don't fit into a char, so as a result nothing is printed on the terminal. I can do this to achieve the effect, however:
printf("some unicode char: %s\n", "\u00B1");
The %s placeholder is expanded automatically and a Unicode character is printed on the terminal. The standard also says:
If the hexadecimal value for a universal character name is less than 0x20 or in the range 0x7F-0x9F (inclusive), or if the universal character name designates a character in the basic source character set, then the program is ill-formed.
When I do this:
printf("letter a: %c\n", "\u0061");
gcc says:
error: \u0061 is not a valid universal character
So this technique is also unusable for printing ASCII characters. The Wikipedia article on characters, http://en.wikipedia.org/wiki/Character_(computing)#cite_ref-3, says:
A char in the C programming language is a data type with the size of exactly one byte, which in turn is defined to be large enough to contain any member of the basic execution character set and UTF-8 code units.
But is this doable on POSIX systems?
Use of universal characters in byte-based strings depends on the compile-time and run-time character encodings matching, so it's generally not a good idea except in certain situations. However, they work very well in wide string and wide character literals: printf("%ls", L"\u00B1"); or printf("%lc", L'\u00B1'); will print U+00B1 in the correct encoding for your locale.
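As a self-contained sketch; the setlocale call is what allows the wide character to be converted to the locale's encoding on output:
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");                /* use the environment's locale */
    printf("%ls\n", L"\u00B1");           /* wide string: prints ± */
    printf("%lc\n", (wint_t)L'\u00B1');   /* wide character: prints ± */
    return 0;
}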

How to Print UTF-16 Characters in C?

I have a file containing UTF-16 characters. I read in the file and can store the characters either in a uint16_t array or a char array (is there a better choice?).
But how do i print those characters?
I'm assuming you want to print to stdout or stderr. One method would be to use libiconv to convert from UTF-16 to UTF-32 (also known as UCS-4) into a wide-character string (wchar_t). You could then use wprintf and friends to print to the standard streams.
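A sketch of that approach using POSIX iconv; the encoding names "UTF-16LE" and "WCHAR_T" are what glibc and GNU libiconv accept, and the sample bytes are an assumption standing in for data read from your file:
#include <iconv.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");

    /* sample input: UTF-16LE code units for "あ±" (stand-in for file data) */
    unsigned char utf16[] = {0x42, 0x30, 0xB1, 0x00};
    wchar_t wide[8] = {0};   /* zero-filled, so the output stays terminated */

    iconv_t cd = iconv_open("WCHAR_T", "UTF-16LE");
    if (cd == (iconv_t)-1) { perror("iconv_open"); return 1; }

    char *in = (char *)utf16;
    char *out = (char *)wide;
    size_t inleft = sizeof utf16;
    size_t outleft = sizeof wide - sizeof(wchar_t);
    if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t)-1) {
        perror("iconv");
        return 1;
    }
    iconv_close(cd);

    wprintf(L"%ls\n", wide);   /* prints あ± on a capable terminal */
    return 0;
}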
