C store and print wchar_t

I want to store a string containing characters from the extended ASCII table, and print them.
I tried:
wchar_t wp[] = L"Росси́йская Акаде́мия Нау́к ";
printf("%S", wp);
I can compile but when I run it, nothing is actually displayed in my terminal.
Could you help me please?
Edit: In response to this comment:
wprintf(L"%s", wp);
Sorry, I forgot to mention that I can only use write(); I was only using printf for my first attempts.

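If you want wide characters as output (wchar_t is 16 bits on Windows and 32 bits on Linux and most Unix systems), use the code suggested by Michael, but with %ls, which is the standard conversion for a wchar_t string (in standard C, %s in wprintf still expects a char string; only Microsoft's legacy behavior treats it as wide):
wprintf(L"%ls", wp);
If you need UTF-8 output, you have to use iconv() to convert between the two. See question 7469296 as a starting point.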

You need to call setlocale() first and use %ls in printf():
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main(int argc, char *argv[])
{
setlocale(LC_ALL, "");
// setlocale(LC_ALL, "C.UTF-8"); // this also works
wchar_t wp[] = L"Росси́йская Акаде́мия Нау́к";
printf("%ls\n", wp);
return 0;
}
For more about setlocale(), refer to Displaying wide chars with printf

Related

How can I print filled / unfilled square character using printf in c?

I am trying to print "□" and "■" using c.
I tried printf("%c", (char)254u); but it didn't work.
Any help? Thank you!

I do not know what (char)254u is supposed to be in your code. First set the locale to a UTF-8 one, then just printf the character. That is it.
#include <locale.h>
#include <stdio.h>
int main()
{
setlocale(LC_CTYPE, "en_US.UTF-8");
printf("%lc\n", L'□'); // L'' is a wchar_t literal, which matches %lc
return 0;
}

You can also print it directly, as long as the source file is saved as UTF-8 and the terminal uses UTF-8:
#include <stdio.h>
int main()
{
printf("■");
return 0;
}

On Windows, you can print Unicode characters using _setmode, which sets the file translation mode of a file descriptor:
#include <fcntl.h>
#include <stdio.h>
#include <io.h>
int main(void) {
_setmode(_fileno(stdout), _O_U16TEXT);
wprintf(L"\x25A0\x25A1\n");
return 0;
}
output
■□

As other answers have mentioned, you need to set a locale that uses UTF-8, the encoding defined by the Unicode Standard. Then you can print the character with %lc by passing its Unicode code point: ■ is U+25A0 and □ is U+25A1 (254, i.e. U+00FE, is þ). Here is a minimal code example:
#include <stdio.h>
#include <locale.h>
int main() {
setlocale(LC_CTYPE, "en_US.UTF-8"); // Defined proper UTF-8 locale
printf("%lc\n", 0x25A0); // '%lc' takes a wide character instead of a char; U+25A0 is ■
return 0;
}
If you want to store it in a variable, you must use a wchar_t, which allows the number to be mapped to its Unicode symbol. This answer provides more detail.
wchar_t x = 0x25A1; // U+25A1 is □
printf("%lc\n", x);

Reading and Printing Wide Character String

I want to read wide characters in C and print their uppercase versions.
Here's my code:
#include <stdio.h>
#include <wctype.h>
#include <wchar.h>
#include <locale.h>
int main(){
wchar_t sentence[100];
setlocale(LC_ALL, "");
void Edit(wchar_t str[]);
printf("Enter sentence -> ");
wscanf(L"%[^\n]ls", sentence);
Edit(sentence);
getchar();
return 0;
}
void Edit(wchar_t str[]){
int i = -1;
while(str[++i])
if(iswalpha(str[i])) //get rid of whitespaces and other characters
putwchar(towupper(str[i]));
}
It seems that the problem lies in wscanf: in fact, if I initialize the string like this:
wchar_t sentence[] = L"è";
and then call Edit on it without reading or asking for a string, it works.
I am using Windows 10 with the US International keyboard layout, so to type 'è' I have to press ` + e. I also tried to copy and paste it with Ctrl+V, but that doesn't work either. I am using MinGW with GCC version 6.3.0. I also tried this on my MacBook and it doesn't work there.
The problem is that if I input "kèy", I want "KÈY" as output. Instead, I get "KSY". I don't know why 'è' outputs 'S', and I tried other vowels and got similarly random characters. However, if I initialize the string as "kèy", I get "KÈY".
Update
I edited wscanf as:
wscanf(L"%ls", sentence);
And it works on my MacBook, but not on Windows! Also, this way I can't input spaces, because wscanf stops at the first space.
Update 2
I found something really interesting:
Using this snippet:
int main(){
setlocale(LC_ALL, "");
wchar_t frase[100];
fwide(stdin, 1);
wscanf(L"%ls", frase);
wprintf(L"%ls", frase);
return 0;
}
I found this table, and when I input 'è' I get 'S', which is what the Win column describes. So I tried to input 'Þ' and I got 'è'! I think the problem is the code page of cmd; I am using code page 850.
I found a solution
I used system("chcp 1252") to change character set and it WORKED!

Don't use wscanf; use fgetws. It lets you read whitespace and protects against buffer overflow. In general it's best practice (especially with user-supplied input) to guard against buffer overflow. I edited your code below.
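#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

void Edit(wchar_t str[]);

int main(){
wchar_t sentence[100];
setlocale(LC_ALL, "");
wprintf(L"Enter sentence -> "); // wide output keeps stdout wide-oriented for putwchar
if (fgetws(sentence, 100, stdin) == NULL) // reads spaces too, at most 99 wide chars
return 1;
Edit(sentence);
getwchar(); // stdin is wide-oriented now, so use the wide getchar
return 0;
}
void Edit(wchar_t str[]){
int i = -1;
while(str[++i])
if(iswalpha(str[i])) //get rid of whitespaces and other characters
putwchar(towupper(str[i]));
}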
You may also want to define your buffer size, and properly allocate and free memory as well.

Why Unicode characters are not displayed properly in terminal with GCC?

I've written a small C program:
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
int main() {
wprintf(L"%s\n", setlocale(LC_ALL, "C.UTF-8"));
wchar_t chr = L'┐';
wprintf(L"%c\n", chr);
}
Why doesn't this print the character ┐ ?
Instead it prints gibberish.
I've checked:
tried compiling without setlocale, same result
the terminal itself can print the character, I can copy-paste it to terminal from text-editor, it's gnome-terminal on Ubuntu
GCC version is 4.8.2

wprintf is a version of printf which takes a wide string as its format string, but otherwise behaves just the same: %c is still treated as char, not wchar_t. So instead you need to use %lc to format a wide character. And since your strings are ASCII you may as well use printf. For example:
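#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main(void) {
const char *loc = setlocale(LC_ALL, "C.UTF-8");
printf("%s\n", loc ? loc : "(setlocale failed)"); // setlocale returns NULL on failure
wchar_t chr = L'┐';
printf("%lc\n", chr);
return 0;
}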

Reading and printing chinese characters using fread() and printf()?

I am trying to read Chinese characters from an infile, and I have found a few questions on the subject here but nothing that works for me or suits my needs. I am using the fread() implementation from this question, but it is not working. I am running Linux.
#define UNICODE
#ifdef UNICODE
#define _UNICODE
#else
#define _MBCS
#endif
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char * argv[]) {
FILE *infile = fopen(argv[1], "r");
wchar_t test[2] = L"\u4E2A";
setlocale(LC_ALL, "");
printf("%ls\n", test); //test
wcscpy(test, L"\u4F60"); //test
printf("%ls\n", test); //test
for (int i = 0; i < 5; i++){
fread(test, 2, 2, infile);
printf("%ls\n", test);
}
return 0;
}
I use the following text file to test it:
一个人
两本书
三张桌子
我喜欢一个猫
and the program outputs:
个
你
������
Anyone have any wisdom on the subject?
Edit: Also, that's all of my code because I'm not sure where it fails. There's some stuff in there where I test to make sure I can print unicode wchars that isn't entirely relevant to the question.

If you really need to read a UTF-8 file (or, more precisely, a file in the locale's character encoding) one codepoint at a time, you can use fscanf as below. Do note that this reads codepoints, not characters: a character may consist of several codepoints because of combining marks, and some codepoints are most definitely not printable.
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <string.h>
#include <stdlib.h>
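int
main(int argc, char *argv[])
{
if (argc < 2) {
fprintf(stderr, "usage: %s FILE\n", argv[0]);
return 1;
}
FILE *infile = fopen(argv[1], "r");
if (infile == NULL) {
perror(argv[1]);
return 1;
}
wchar_t test[2] = L"\u4E2A";
setlocale(LC_ALL, "");
printf("%ls\n", test); //test
wcscpy(test, L"\u4F60"); //test
printf("%ls\n", test); //test
for (int i = 0; i < 5; i++) {
// %1ls converts one multibyte character (skipping whitespace) into test
if (fscanf(infile, "%1ls", test) != 1)
break; // end of file or an invalid sequence
printf("%ls\n", test);
}
fclose(infile);
return 0;
}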
Most of the time you probably won't need the locale functionality, because UTF-8 generally just works if you treat it as an opaque encoding. Part of this is because every byte of a non-ASCII character is in the 128..255 range, so it can never be confused with an ASCII byte; under RFC 3629 only 194..244 (0xC2..0xF4) occur as lead bytes, and 192, 193 and 245..255 never occur at all. Another part is that continuation bytes are always 128..191 (0x80..0xBF) and lead bytes never are, which means an error will just break one character, not the rest of the stream. (Okay, the codepoints vs characters remark is only really there to convince you that dividing UTF-8 up into "characters" probably won't do what you want.)

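You are telling fread to read two 2-byte values in each call; however, the characters you want to read have 3-byte UTF-8 encodings. On top of that, fread copies raw bytes into the wchar_t array without decoding them: on Linux, where wchar_t is 4 bytes, test[0] ends up holding four undecoded UTF-8 bytes rather than a codepoint, which is why %ls prints garbage. In general, you need to decode the UTF-8 stream as a whole, not in fixed-size byte chunks.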

Output unicode wchar_t character

Just trying to output this unicode character ☒ in C using MinGW. I first put it on a buffer using swprintf, and then write it to the stdout using wprintf.
#include <stdio.h>
int main(int argc, char **argv)
{
wchar_t buffer[50];
wchar_t c = L'☒';
swprintf(buffer, L"The character is: %c.", c);
wprintf(buffer);
return 0;
}
The output under Windows 8 is:
The character is: .
Other characters such as Ɣ don't work either.
What am I doing wrong?

You're using %c, but %c is for char, even when you use it from wprintf(). Use %lc, because the argument is a wchar_t.
swprintf(buffer, L"The character is: %lc.", c);
This kind of error should normally be caught by compiler warnings, but it doesn't always happen. In particular, catching this error is tricky because both %c and %lc actually take int arguments, not char and wchar_t (the difference is how they interpret the int).

To output Unicode (or to be more precise UTF-16LE) to the Windows console, you have to change the file translation mode to _O_U16TEXT or _O_WTEXT. The latter one includes the BOM which isn't of interest in this case.
The file translation mode can be changed with _setmode. But it takes a file descriptor (abbreviated fd) and not a FILE *! You can get the corresponding fd from a FILE * with _fileno.
Here's an example that should work with MinGW and its variants, and also with various Visual Studio versions.
#define _CRT_NON_CONFORMING_SWPRINTFS
#include <stdio.h>
#include <io.h>
#include <fcntl.h>
int
main(void)
{
wchar_t buffer[50];
wchar_t c = L'Ɣ';
_setmode(_fileno(stdout), _O_U16TEXT);
swprintf(buffer, L"The character is: %c.", c);
wprintf(buffer);
return 0;
}

This works for me:
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main(int argc, char **argv)
{
wchar_t buffer[50];
wchar_t c = L'☒';
if (!setlocale(LC_CTYPE, "")) {
fprintf(stderr, "Cannot set locale\n");
return 1;
}
swprintf(buffer, sizeof buffer / sizeof buffer[0], L"The character is %lc.", c); // size in wchar_t units
wprintf(buffer);
return 0;
}
What I changed:
I added wchar.h include required by the use of swprintf
I added the buffer size, counted in wide characters rather than bytes, as the second argument of swprintf, as required by C
I changed %c conversion specification to %lc
I changed the locale using setlocale

This FAQ explains how to use Unicode / wide characters in MinGW:
https://sourceforge.net/p/mingw-w64/wiki2/Unicode%20apps/
