fprintf() issues utf-8 linux - c

OK, I managed to print UTF-8 encoded characters to the terminal, but printing to a file is not working as I expected. I'm using wchar.h and locale.h like this:
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
    setlocale(LC_ALL, "");
    wint_t index = 0;
    FILE *fpinout = fopen("UTF-8.txt", "w");
    for (index = 0; index < 0x200; index++) {
        printf("%i:\t%lc\n", index, index); // works fine, prints UTF-8 chars to the terminal
        fprintf(fpinout, "%i\t%lc", index, index); // does not work, output is weird
    }
    fclose(fpinout);
}
I tried to use index there both as wint_t and wchar_t.
My UTF-8.txt file looks like this:
र㄀ĉल㌂̉ऴ㔄ԉश㜆܉स㤈उ〱ਉㄱଉ㈱ఉ㌱ഉ㐱ฉ㔱༉㘱ဉ㜱ᄉ㠱ሉ㤱ጉ〲ᐉㄲᔉ㈲ᘉ㌲ᜉ㐲᠉㔲ᤉ㘲ᨉ㜲ᬉ㠲ᰉ㤲ᴉ〳ḉㄳἉ㈳ ㌳℉㐳∉㔳⌉㘳␉㜳
┉㠳☉㤳✉〴⠉ㄴ⤉㈴⨉㌴⬉㐴Ⰹ㔴ⴉ㘴⸉㜴⼉㠴〉㤴ㄉ〵㈉ㄵ㌉㈵㐉㌵㔉㐵㘉㔵㜉㘵㠉㜵㤉㠵㨉㤵㬉〶㰉ㄶ㴉㈶㸉㌶㼉㐶䀉㔶䄉㘶䈉
㜶䌉㠶䐉㤶䔉〷䘉ㄷ䜉㈷䠉㌷䤉㐷䨉㔷䬉㘷䰉㜷䴉㠷三㤷伉〸倉ㄸ儉㈸刉㌸匉㐸吉㔸唉㘸嘉㜸圉㠸堉㤸変〹娉ㄹ嬉㈹尉㌹崉㐹帉
㔹弉㘹怉㜹愉㠹戉㤹按〱रㅤ㄰攉〱लㅦ㌰有〱ऴㅨ㔰椉〱शㅪ㜰欉〱सㅬ㤰洉ㄱरㅮㄱ漉ㄱलㅰ㌱焉ㄱऴㅲ㔱猉ㄱशㅴ㜱甉ㄱसㅶ㤱眉
㈱रㅸㄲ礉㈱लㅺ㌲笉㈱ऴㅼ㔲紉㈱शㅾ㜲缉㈱स胂㈱ह臂㌱र苂㌱ऱ菂㌱ल蓂㌱ळ藂㌱ऴ蛂㌱व蟂㌱श裂㌱ष观㌱स諂㌱ह诂㐱र賂㐱ऱ跂㐱ल軂㐱
ळ迂㐱ऴ郂㐱व釂㐱श鋂㐱ष鏂㐱स铂㐱ह闂㔱र雂㔱ऱ韂㔱ल飂㔱ळ駂㔱ऴ髂㔱व鯂㔱श鳂㔱ष鷂㔱स黂㔱ह鿂㘱रꃂ㘱ऱꇂ㘱लꋂ㘱ळꏂ㘱ऴ꓂
㘱वꗂ㘱शꛂ㘱षꟂ㘱सꣂ㘱ह꧂㜱रꫂ㜱ऱꯂ㜱ल곂㜱ळ귂㜱ऴ껂㜱व꿂㜱श냂㜱ष뇂㜱स닂㜱ह돂㠱र듂㠱ऱ뗂㠱ल뛂㠱ळ럂㠱ऴ룂㠱व맂㠱श뫂
㠱ष믂㠱स볂㠱ह뷂㤱र뻂㤱ऱ뿂㤱ल胃㤱ळ臃㤱ऴ苃㤱व菃㤱श蓃㤱ष藃㤱स蛃㤱ह蟃〲र裃〲ऱ觃〲ल諃〲ळ诃〲ऴ賃〲व跃〲श軃〲ष迃〲स郃〲ह
釃ㄲर鋃ㄲऱ鏃ㄲल铃ㄲळ闃ㄲऴ雃ㄲव韃ㄲश飃ㄲष駃ㄲस髃ㄲह鯃㈲र鳃㈲ऱ鷃㈲ल黃㈲ळ鿃㈲ऴꃃ㈲वꇃ㈲शꋃ㈲षꏃ㈲स꓃㈲हꗃ㌲रꛃ㌲ऱꟃ㌲
लꣃ㌲ळ꧃㌲ऴ꫃㌲वꯃ㌲श곃㌲ष귃㌲स껃㌲ह꿃㐲र냃㐲ऱ뇃㐲ल닃㐲ळ돃㐲ऴ듃㐲व뗃㐲श뛃㐲ष럃㐲स룃㐲ह맃㔲र뫃㔲ऱ믃㔲ल볃㔲ळ뷃㔲ऴ뻃
㔲व뿃
Any help is appreciated.

Written this way, you are effectively writing UTF-32. Opening the file in binary mode won't help; it will remain UTF-32LE.
You need to convert to UTF-8. Either use the ICU library or the C functions wctomb / wcstombs / wcslen ( http://man7.org/linux/man-pages/man3/wctomb.3.html ). Be aware that the wctomb family of functions is locale-dependent (for example, it often won't handle Japanese correctly under a Greek locale).
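Since the wctomb family depends on the current locale, one alternative is to encode the code points by hand. The following is only a minimal sketch of that idea, not the wctomb approach itself; the names utf8_encode and write_utf8_table are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not encodable
   (a surrogate or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points in UTF-8 */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Write one "number<TAB>character<NEWLINE>" line per code point in
   [first, last). Returns 0 on success, -1 on a write error. */
int write_utf8_table(FILE *fp, uint32_t first, uint32_t last)
{
    uint32_t cp;
    assert(fp != NULL);
    for (cp = first; cp < last; cp++) {
        unsigned char buf[4];
        size_t n = utf8_encode(cp, buf);
        if (n == 0)
            continue; /* skip values that cannot be encoded */
        if (fprintf(fp, "%lu\t", (unsigned long)cp) < 0) return -1;
        if (fwrite(buf, 1, n, fp) != n) return -1;
        if (fputc('\n', fp) == EOF) return -1;
    }
    return 0;
}
```

Calling write_utf8_table(fp, 0x20, 0x200) on a file opened with fopen would cover most of the range from the question while skipping the ASCII control characters, which tend to confuse editors that guess a file's encoding.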

Related

Change the character encode in PostgreSQL C language function

I am using PostgreSQL 9.5 64bit version on windows server.
The character encoding of the database is set to UTF8.
I'd like to create a function that manipulates multibyte strings.
(e.g. cleansing, replace etc.)
I copied C logic for manipulating characters from another system.
The logic assumes that the character encoding is SJIS.
I do not want to change that logic, so I want to convert from UTF-8 to SJIS inside the PostgreSQL C function,
like the convert_to function does. (However, convert_to returns bytea, and I want the result as TEXT.)
Please tell me how to convert from UTF-8 to SJIS in C.
Create Function Script:
CREATE FUNCTION CLEANSING_STRING(character varying)
RETURNS character varying AS
'$libdir/MyFunc/CLEANSING_STRING.dll', 'CLEANSING_STRING'
LANGUAGE c VOLATILE STRICT;
C Source:
#include <stdio.h>
#include <string.h>
#include <postgres.h>
#include <port.h>
#include <fmgr.h>
#include <stdlib.h>
#include <builtins.h>
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
extern PGDLLEXPORT Datum CLEANSING_STRING(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(CLEANSING_STRING);
Datum CLEANSING_STRING(PG_FUNCTION_ARGS)
{
// Get Arg
text *arg1 = (text *)PG_GETARG_TEXT_P(0);
// Text to Char[]
char *arg;
arg = text_to_cstring(arg1);
// UTF8 to Sjis
//Char *sjisChar[] = foo(arg); // something like that..
// Copied from other system.(Assumes that the character code is sjis.)
cleansingString(sjisChar);
replaceStrimg(sjisChar);
// Sjis to UTF8
//arg = bar(sjisChar); // something like that..
//Char[] to Text and Return
PG_RETURN_TEXT_P(cstring_to_text(arg));
}
I got it working using the approach suggested in the comments on the question.
#include <mb/pg_wchar.h> //Add to include.
...
Datum CLEANSING_STRING(PG_FUNCTION_ARGS)
{
// Get Arg
text *arg1 = (text *)PG_GETARG_TEXT_P(0);
// Text to Char[]
char *arg;
arg = text_to_cstring(arg1);
// UTF8 to Sjis
char *sjisChar = pg_server_to_any(arg, strlen(arg), PG_SJIS);
// Copied from other system.(Assumes that the character code is sjis.)
cleansingString(sjisChar);
replaceStrimg(sjisChar);
// Sjis to UTF8
arg = pg_any_to_server(sjisChar, strlen(sjisChar), PG_SJIS); //It converts from SJIS to server (UTF 8), the third argument sets the encoding of the conversion source.
//Char[] to Text and Return
PG_RETURN_TEXT_P(cstring_to_text(arg));
}

Visual Studio C Program: How to print symbols for card suits?

I'm trying to make a card game and I want to use the actual card suit symbols to print cards as so:
5♣ J♦ 10♠ Q♥
Problem is I literally have zero idea how to code these symbols to print successfully in a program.
You'll need to use the Unicode characters for those symbols, along with a font that supports them. This page lists the Unicode code points for the various suits. They are:
Spade = U+2660, Heart = U+2665, Diamond = U+2666, Club = U+2663
These are the black suits. There are also characters for white suits.
You'll also need to use wchar_t to represent the characters rather than char, which isn't wide enough, and use functions like wprintf for output.
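As a rough sketch of the wchar_t and wprintf approach, something like the following could work. The enum and the function names suit_symbol and print_card are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical suit enumeration for illustration. */
enum suit { SUIT_SPADE, SUIT_HEART, SUIT_DIAMOND, SUIT_CLUB, SUIT_COUNT };

/* Return the black suit symbol (as a wide character) for a suit. */
wchar_t suit_symbol(enum suit s)
{
    /* Code points listed above, in the order of enum suit. */
    static const wchar_t symbols[SUIT_COUNT] = {
        0x2660, /* spade   */
        0x2665, /* heart   */
        0x2666, /* diamond */
        0x2663  /* club    */
    };
    assert((int)s >= 0 && (int)s < SUIT_COUNT);
    return symbols[s];
}

/* Print one card, e.g. rank L"10" and SUIT_SPADE, followed by a space.
   Returns whatever wprintf returns (negative on failure). */
int print_card(const wchar_t *rank, enum suit s)
{
    assert(rank != NULL);
    return wprintf(L"%ls%lc ", rank, (wint_t)suit_symbol(s));
}
```

Before the first call to print_card, the program would normally call setlocale(LC_ALL, "") from locale.h so the wide characters can be converted for output, and it should not mix printf and wprintf on the same stream. Depending on the console, extra setup may still be needed before the symbols display.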
With Windows console font set to "Lucida Console" the following works:
#include <stdio.h>
int main (void)
{
int i;
for(i=3; i<=6; i++)
printf("%c", i);
printf("\n");
return 0;
}
Program output:
♥♦♣♠
Similarly with "Consolas" font.

c doesn't print "┌──┐" character correctly

Good afternoon, I'm facing a problem on my c code and I don't know what is causing it.
Every time I try to print characters like "┌──┐", my program prints strange characters instead.
I'm using Qt Creator on Windows, with Qt version 5.5.0 MSVC 64 bits. The compiler is the Microsoft Visual C++ Compiler 12.0 (amd64).
I tried changing the locale but with no success. The only way I found to print these characters was to define them as int variables with the ASCII code and printing them, but it led to some really extensive and ugly coding, like this:
int cSupEsq = 218; //'┌'
int cSupDir = 191; //'┐'
int cInfEsq = 192; //'└'
int cInfDir = 217; //'┘'
int mVert = 179; //'│'
int mHor = 196; //'─'
int espaco = 255; //' '
int letraO = 111; //'o'
//Inicia limpando a tela da aplicação
clrscr();
//Linha 1
printf("%c", cSupEsq);
for (i = 1; i < 79; i++) { printf("%c", mHor); }
printf("%c", cSupDir);
Is there any way I can make the program treat these characters correctly? What could be causing this problem?
Your solution of using the OEM code points is the right way to go. Code page 850/437 is the default code page for the console, so it should work. You could also call SetConsoleOutputCP to ensure the correct code page is used for the console.
That said, what happens without your workaround is that the source file is saved using a different code page, i.e. not code page 850/437. The in-memory representation of the source code is Unicode (probably UTF-8); when you save the file, the in-memory characters are mapped to the target code page of the file.
What you can do is save the file with code page 850/437 as the target. I don't know how to do this in Qt Creator (if it is possible at all). In Visual Studio, for example, you can click the down arrow on the Save button, select "Save with encoding", and then choose the target code page, in your case code page 850. This ensures that the in-memory code points are mapped correctly in the file to be compiled.
I hope that helps explain the issue.
It shouldn't be necessary to print the characters one at a time. Instead, you can use an escape sequence:
printf("\xDA\xBF\xC0\xD9\xB3\xC4\xFF");
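Building on the escape-sequence idea, a whole border row can be assembled in a buffer and printed with a single call. This is just a sketch; box_top_row is a made-up helper, and the byte values are the code page 437/850 codes from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill buf with the top border of a box that is `width` characters wide:
   one top-left corner, width-2 horizontal bars, one top-right corner,
   then a terminating NUL. The bytes are code page 437/850 values.
   Returns buf, or NULL if width < 2 or the buffer is too small. */
char *box_top_row(char *buf, size_t bufsize, size_t width)
{
    assert(buf != NULL);
    if (width < 2 || bufsize < width + 1)
        return NULL;
    buf[0] = '\xDA';                    /* top-left corner  */
    memset(buf + 1, '\xC4', width - 2); /* horizontal bars  */
    buf[width - 1] = '\xBF';            /* top-right corner */
    buf[width] = '\0';
    return buf;
}
```

With a char row[81] buffer, box_top_row(row, sizeof row, 80) followed by fputs(row, stdout) would produce the same 80-column line as the loop in the question, assuming the console is using code page 437 or 850.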

Printing Greek characters in C

Is there any way to print Greek characters in C?
I'm trying to print out the word "ΑΝΑΓΡΑΜΜΑΤΙΣΜΟΣ"
with:
printf("ΑΝΑΓΡΑΜΜΑΤΙΣΜΟΣ");
but I get some random symbols as output in the console.
Set your console font to a Unicode TrueType font and emit the data using an "ANSI" mechanism (that's assuming Windows... ). For example this code prints γειά σου:
#include <stdio.h>
#include <windows.h>
int main()
{
SetConsoleOutputCP(1253); //"ANSI" Greek
printf("\xE3\xE5\xE9\xDC \xF3\xEF\xF5"); // encoded as windows-1253
return 0;
}
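To avoid writing the windows-1253 bytes by hand, the Greek letters can be mapped from their Unicode code points. The sketch below is an assumption-laden illustration: greek_to_cp1253 and encode_cp1253 are invented names, and only ASCII plus the range U+0391 to U+03CE (Α to ώ) is handled, where the windows-1253 byte is the code point minus 0x2D0.

```c
#include <assert.h>
#include <stddef.h>

/* Map one Greek code point to its windows-1253 byte.
   Covers only U+0391..U+03CE; U+03A2 is unassigned in Unicode.
   Returns 0 for anything outside that range. */
unsigned char greek_to_cp1253(unsigned long cp)
{
    if (cp < 0x0391 || cp > 0x03CE || cp == 0x03A2)
        return 0;
    return (unsigned char)(cp - 0x02D0);
}

/* Convert n code points (ASCII or Greek letters) to a NUL-terminated
   windows-1253 string in out. Returns n on success, or 0 if the buffer
   is too small or a character is not supported by this sketch. */
size_t encode_cp1253(const unsigned long *cps, size_t n,
                     char *out, size_t outsize)
{
    size_t i;
    assert(cps != NULL && out != NULL);
    if (outsize < n + 1)
        return 0;
    for (i = 0; i < n; i++) {
        unsigned char b = (cps[i] < 0x80) ? (unsigned char)cps[i]
                                          : greek_to_cp1253(cps[i]);
        if (b == 0)
            return 0; /* unsupported character (or an embedded NUL) */
        out[i] = (char)b;
    }
    out[n] = '\0';
    return n;
}
```

The resulting buffer can be passed to printf("%s", ...) after SetConsoleOutputCP(1253), in the same way as the hard-coded string above.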
Use a console that supports Unicode, like Console2
Use wprintf or similar functions
Always use Unicode :)

Wrong glyphs displayed when using emWin and Korean fonts

I am using SEGGER emWin on an embedded system.
I have downloaded a Korean font: Korean True Type Font
And converted the font to C language data statements.
When I printed the text: 한국어 ("Korean"), nothing printed out.
The hex code for the text (UTF-8) is: \xED\x95\x9C\xEA\xB5\xAD\xEC\x96\xB4
I opened up the font in the Font Creator and noticed the glyph at offset 0xED does not match the first glyph in the text. Also, there are no glyphs at offset 0xED95 or 0x95ED.
I converted the file using 16-bit Unicode.
The hex code for the text was determined by using Google Translate, then copying the text into Notepad, saving the text as UTF-8 and then opening up the text file with a hex editor.
How do I get the hex string to print the appropriate glyphs?
Am I having a Unicode vs. UTF-8 issue?
Edit 1:
I am not calling any functions to change the encoding, as I am confused on that part.
Here's the essential code:
// alphabetize languages for display
static const Languages_t Language_map[] =
{
{"Deutsch", ESG_LANG_German__Deutsch_},
{"English", ESG_LANG_English},
{"Espa\303\261ol", ESG_LANG_Spanish__Espanol_},
{"Fran\303\247ais", ESG_LANG_French__Francais_}, /* parasoft-suppress MISRA2004-7_1 "octal sequence needed for text accents on foreign language text" */
{"Italiano", ESG_LANG_Italian__Italiano_},
{"Nederlands", ESG_LANG_Dutch__Nederlands_},
{"Portugu\303\252s", ESG_LANG_Portuguese__Portugues_}, /* parasoft-suppress MISRA2004-7_1 "octal sequence needed for text accents on foreign language text" */
{"Svenska", ESG_LANG_Swedish__Svenska_},
{"\xED\x95\x9C\xEA\xB5\xAD\xEC\x96\xB4",ESG_LANG_Korean}, // UTF-8
// {"\xFF\xFE\x5c\xD5\x6D\xAD\xB4\xC5", ESG_LANG_Korean}, // Unicode
};
for (index = ESG_LANG_English; index < ESG_LANG_MAX_LANG; index++)
{
if (index == ESG_LANG_Korean)
{
GUI_SetFont(&Font_KTimesSSK22_12pt);
}
else
{
GUI_SetFont(&GUI_FontMyriadPro_Semibold_22pt);
}
if (index == language)
{
GUI_SetColor(ESG_WHITE);
}
else
{
GUI_SetColor(ESG_AMR_BLUE);
}
(void) GUI_SetTextAlign(GUI_TA_HCENTER);
GUI_DispStringAt(Language_map[index].name,
(signed int)Language_position[index].x,
(signed int)Language_position[index].y);
}
//...
void GUI_DispStringAt(const char GUI_UNI_PTR *s, int x, int y) {
GUI_LOCK();
GUI_pContext->DispPosX = x;
GUI_pContext->DispPosY = y;
GUI_DispString(s);
GUI_UNLOCK();
}
The GUI_UNI_PTR is not for Unicode, but for "Universal":
/* Define "universal pointer". Normally, this is not needed (define will expand to nothing)
However, on some systems (AVR - IAR compiler) it can be necessary ( -> __generic),
since a default pointer can access RAM only, not the built-in Flash
*/
#ifndef GUI_UNI_PTR
#define GUI_UNI_PTR
#define GUI_UNI_PTR_USED 0
#else
#define GUI_UNI_PTR_USED 1
#endif
emWin is performing correctly.
The system is set up for UTF-8 encoding.
The issue is finding a TrueType Unicode font that contains all the glyphs (bitmaps) for the Korean language. Many fonts claim to support Korean, but their glyphs are not at the Unicode code points.
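For checking a candidate font, it helps to know which code points the UTF-8 string actually refers to, because the glyph positions in a Unicode font correspond to code points, not to the raw UTF-8 bytes. Here is a minimal decoding sketch; utf8_decode is a hypothetical helper written for this post and is not part of emWin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s (at most len bytes available).
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 if the sequence is malformed or truncated.
   Minimal sketch: overlong forms and surrogates are not rejected. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t value;

    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;

    /* The lead byte tells how many bytes the sequence has. */
    if (s[0] < 0x80)                { value = s[0];        need = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { value = s[0] & 0x1F; need = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { value = s[0] & 0x0F; need = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { value = s[0] & 0x07; need = 4; }
    else
        return 0; /* stray continuation byte or invalid lead byte */

    if (len < need)
        return 0;

    /* Each continuation byte contributes its low 6 bits. */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        value = (value << 6) | (s[i] & 0x3F);
    }
    *cp = value;
    return need;
}
```

Applied to the string from the question, \xED\x95\x9C decodes to U+D55C, \xEA\xB5\xAD to U+AD6D, and \xEC\x96\xB4 to U+C5B4. Those are the positions to look for in the font editor, rather than 0xED or 0xED95.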
