I have been given this school project. I have to alphabetically sort a list of items by Czech rules. Before digging deeper, I decided to test it on a 16 by 16 matrix, so I did this:
typedef struct {
wint_t **field;
}LIST;
...
setlocale(LC_CTYPE,NULL);
....
list->field=(wint_t **)malloc(16*sizeof(wint_t *));
for(int i=0;i<16;i++)
list->field[i]=(wint_t *)malloc(16*sizeof(wint_t));
In another function I am trying to assign a char. Like this:
sorted->field[15][15] = L'C';
wprintf(L"%c\n",sorted->field[15][15]);
Everything is fine. Char is printed. But when I try to change it to
sorted->field[15][15] = L'Č';
It says: Extraneous characters in wide character constant ignored. (Xcode) And the printing part is skipped. The main.c file is in UTF-8. If I try to print this:
printf("ěščřžýááíé\n");
It prints it out as written. I am not sure if I should allocate mem using wint_t or wchar_t or if I am doing it right. I tested it with both but none of them works.
clang seems to support entering arbitrary character values into wide character constants and wide strings with the \x notation:
wchar_t c = L'\x2126';
This compiles without a warning.
Edit: Adapting what I find on wikipedia about wide characters, the following works for me:
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>
int main(void)
{
setlocale(LC_ALL,"");
wchar_t myChar1 = L'\x2126';
wchar_t myChar2 = 0x2126; // hexadecimal encoding of char Ω using UTF-16
wprintf(L"This is char: %lc \n",myChar1);
wprintf(L"This is char: %lc \n",myChar2);
}
and prints nice Ω characters in my terminal. Make sure that your terminal is able to interpret UTF-8 characters.
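A minimal sketch of the same idea applied to the 16 by 16 matrix and the Czech letter from the question, assuming a C99 compiler. A universal character name (`\u010C` for Č) keeps the source file pure ASCII, so the compiler's source-encoding settings no longer matter. The helper names `make_grid`, `free_grid` and `put_c_caron` are hypothetical and not part of the original code.

```c
#include <stdlib.h>
#include <wchar.h>

/* Allocate a rows x cols grid of wide characters, zero-filled.
   Returns NULL on allocation failure. (Hypothetical helper.) */
wchar_t **make_grid(size_t rows, size_t cols)
{
    wchar_t **grid = malloc(rows * sizeof *grid);
    if (grid == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++) {
        grid[i] = calloc(cols, sizeof **grid);
        if (grid[i] == NULL) {
            /* Undo the rows allocated so far. */
            while (i > 0)
                free(grid[--i]);
            free(grid);
            return NULL;
        }
    }
    return grid;
}

/* Release a grid created by make_grid. */
void free_grid(wchar_t **grid, size_t rows)
{
    if (grid == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(grid[i]);
    free(grid);
}

/* Store the letter C with caron (U+010C) using an ASCII-only escape. */
void put_c_caron(wchar_t **grid, size_t row, size_t col)
{
    grid[row][col] = L'\u010C';
}
```

After `setlocale(LC_ALL, "")`, the stored character could be printed with `wprintf(L"%lc\n", (wint_t)grid[15][15]);`.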
I am a beginner at C programming. I researched my problem but didn't find an answer, so I am asking here. My problem is:
I want to convert a hex array to a string. For example:
this is my input: uint8_t hex_in[4]={0x10,0x01,0x00,0x11};
and I want a string output like this: "10010011"
I tried some solutions, but they give me "101011", dropping the zeros.
How can I obtain an 8-digit string?
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(){
char dene[2];
uint8_t hex_in[4]={0x10,0x01,0x00,0x11};
//sprintf(dene, "%x%*x%x%x", dev[0],dev[1],2,dev[2],dev[3]);
//sprintf(dene, "%02x",hex_in[1]);
printf("dene %s\n",dene);
}
In order to store the output in a string, the string must be large enough. In this case it must hold 8 digits plus the null terminator, so 9 bytes. An array of 2 only holds 1 digit plus the null terminator.
Then you can print each number with %02x or %02X to get 2 digits. Lower-case x gives lower case abcdef, upper-case X gives ABCDEF - otherwise they are equivalent.
Corrected code:
#include <stdio.h>
#include <stdint.h>
int main(void)
{
char str[9];
uint8_t hex_in[4]={0x10,0x01,0x00,0x11};
sprintf(str,"%02x%02x%02x%02x", hex_in[0],hex_in[1],hex_in[2],hex_in[3]); // no "\n" here: it would not fit in str[9], and puts() adds one
puts(str);
}
Though pedantically, you should always print uint8_t and other types from stdint.h using the PRIx8 etc specifiers from inttypes.h:
#include <inttypes.h>
sprintf(str,"%02"PRIx8"%02"PRIx8"%02"PRIx8"%02"PRIx8,
hex_in[0],hex_in[1],hex_in[2],hex_in[3]);
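For arrays of other lengths, the same `%02x` idea can be put in a loop. The following is a minimal sketch, not part of the original answer; the function name `bytes_to_hex` and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write len bytes as 2*len lowercase hex digits plus a null terminator.
   Returns 0 on success, -1 if out_size is too small. (Hypothetical helper.) */
int bytes_to_hex(const uint8_t *in, size_t len, char *out, size_t out_size)
{
    if (out_size < 2 * len + 1)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* %02x pads each byte to two digits, so 0x01 becomes "01".
           The size 3 covers two digits plus snprintf's terminator. */
        snprintf(out + 2 * i, 3, "%02x", (unsigned)in[i]);
    }
    out[2 * len] = '\0';
    return 0;
}
```

Called as `bytes_to_hex(hex_in, 4, str, sizeof str)` with `char str[9]`, this fills `str` with the eight digits from the question.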
Following code works:
char *text = "中文";
printf("%s", text);
Then I'm trying to print this text via its Unicode code points, which are 0x4e2d for "中" and 0x6587 for "文":
And sure, nothing prints out.
I'm trying to understand what happens when I store a multi-byte string in a char* and how to print a multi-byte string from its Unicode code points. Furthermore, what does "Format specifier '%ls' requires 'wchar_t *' argument instead of 'wchar_t *'" mean?
Thanks for any help.
Edit:
I'm on macOS (High Sierra 10.13.6), with CLion
$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
wchar_t *arr = malloc(2 * sizeof(wchar_t));
arr[0] = 0x4e2d;
arr[1] = 0x6587;
First, the above string is not null-terminated. The printf function knows where the array begins, but it has no idea where the array ends or what size it has. You have to add a zero at the end to make it a null-terminated C string.
To print this null-terminated wide string, use printf("%ls", arr); on Unix-like systems (including macOS). On Windows, use wprintf(L"%s", arr); (that's a completely different thing, it actually treats the string as UTF-16).
Make sure to add setlocale(LC_ALL, "C.UTF-8"); or setlocale(LC_ALL, ""); on Unix-like systems.
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
int main()
{
setlocale(LC_ALL, "C.UTF-8");
//print single character:
printf("%lc\n", 0x00004e2d);
printf("%lc\n", 0x00006587);
printf("%lc\n", 0x0001F310);
wchar_t *arr = malloc((2 + 1)* sizeof(wchar_t));
arr[0] = 0x00004e2d;
arr[1] = 0x00006587;
arr[2] = 0;
printf("%ls\n", arr);
return 0;
}
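The null-termination point can be checked without involving the terminal or the locale. This is a minimal sketch under the assumption that wchar_t can hold every code point passed in (true where wchar_t is 32 bits, as on Linux and macOS); the function name `wide_from_code_points` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Copy n code points into a newly allocated, null-terminated wide string.
   Assumes each code point fits in wchar_t. The caller frees the result.
   Returns NULL on allocation failure. (Hypothetical helper.) */
wchar_t *wide_from_code_points(const uint32_t *cps, size_t n)
{
    wchar_t *s = malloc((n + 1) * sizeof *s); /* +1 for the terminator */
    if (s == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        s[i] = (wchar_t)cps[i];
    s[n] = L'\0'; /* without this, wcslen and printf("%ls") run past the end */
    return s;
}
```

With the terminator in place, `wcslen` reports the number of characters and `printf("%ls\n", s)` knows where to stop.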
Aside,
In UTF32, code points always need 4 bytes (example 0x00004e2d) This can be represented with a 4 byte data type char32_t (or wchar_t in POSIX).
In UTF8, code points need 1, 2, 3, or 4 bytes. The UTF8 encoding of an ASCII character needs one byte, while 中 needs 3 bytes (or 3 char values). You can confirm this by running this code:
printf("A:%zu 中:%zu 🙂:%zu\n", strlen("A"), strlen("中"), strlen("🙂")); // strlen returns size_t, so use %zu; needs <string.h>
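The byte counts follow directly from the code point's value. The sketch below is an illustration rather than part of the original answer, and the function name `utf8_len` is hypothetical; it returns how many bytes UTF-8 needs for a given code point.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes UTF-8 uses for one code point.
   Returns 0 for values that are not valid code points
   (UTF-16 surrogates and anything above U+10FFFF). */
size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80)
        return 1;               /* ASCII */
    if (cp < 0x800)
        return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogate range is not encodable */
    if (cp < 0x10000)
        return 3;               /* e.g. U+4E2D */
    if (cp <= 0x10FFFF)
        return 4;               /* e.g. U+1F642 */
    return 0;
}
```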
So we can't use a single char in UTF8. We can use strings instead:
const char* x = u8"中";
We can use normal string functions in C, like strcpy etc. But some standard C functions don't work. For example, strchr cannot find 中, because it searches for a single char value and 中 is three of them; strstr, which searches for a substring, does work. This is usually not a problem because characters such as "print format specifiers" are all ASCII and are one byte.
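As a minimal sketch of searching for a multi-byte character, the character can be passed as a short string and looked up with strstr. This relies on UTF-8 being self-synchronizing: a complete encoded character never matches in the middle of another one. The function name `utf8_find` is hypothetical, and the offsets it returns are in bytes, not characters.

```c
#include <stddef.h>
#include <string.h>

/* Byte offset of the first occurrence of a UTF-8 encoded character
   (passed as a null-terminated string) in text, or -1 if absent. */
long utf8_find(const char *text, const char *ch)
{
    const char *p = strstr(text, ch);
    return p != NULL ? (long)(p - text) : -1;
}
```

For example, `utf8_find("A\xE4\xB8\xAD", "\xE4\xB8\xAD")` looks for 中 (bytes E4 B8 AD) after an ASCII "A"; the escapes are used here only to keep the example independent of the source file's encoding.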
I've written a small C program:
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
int main() {
wprintf(L"%s\n", setlocale(LC_ALL, "C.UTF-8"));
wchar_t chr = L'┐';
wprintf(L"%c\n", chr);
}
Why doesn't this print the character ┐ ?
Instead it prints gibberish.
I've checked:
tried compiling without setlocale, same result
the terminal itself can print the character, I can copy-paste it to terminal from text-editor, it's gnome-terminal on Ubuntu
GCC version is 4.8.2
wprintf is a version of printf which takes a wide string as its format string, but otherwise behaves just the same: %c is still treated as char, not wchar_t. So instead you need to use %lc to format a wide character. And since your strings are ASCII you may as well use printf. For example:
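#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main(void) {
printf("%s\n", setlocale(LC_ALL, "C.UTF-8"));
wchar_t chr = L'┐';
printf("%lc\n", (wint_t)chr); // %lc expects a wint_t argument
}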
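The effect of %lc can also be observed without a terminal by formatting into a wide buffer with swprintf. This is only a sketch; the wrapper name `format_wide_char` is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Format one wide character into buf (capacity n wide characters,
   including the terminator) using %lc.
   Returns the number of wide characters written, or a negative
   value on error, exactly as swprintf does. */
int format_wide_char(wchar_t *buf, size_t n, wchar_t ch)
{
    return swprintf(buf, n, L"%lc", (wint_t)ch);
}
```

With %lc the wide character is copied into the buffer as is, which is what the corrected program relies on when printing.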
I want to store a string with characters from the extended ASCII table, and print them.
I tried:
wchar_t wp[] = L"Росси́йская Акаде́мия Нау́к ";
printf("%S", wp);
I can compile but when I run it, nothing is actually displayed in my terminal.
Could you help me please?
Edit: In response to this comment:
wprintf(L"%s", wp);
Sorry, I forgot to mention that I can only use write(), as was only using printf for my first attempts.
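If you want wide chars as output (16 bits each on Windows), use the following code, as suggested by Michael. Note that %s accepts a wchar_t string only in Microsoft's wprintf; the standard conversion for a wide string is %ls:
wprintf(L"%s", wp);
If you need UTF-8 output, you have to use iconv() for conversion between the two. See question 7469296 as a starting point.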
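Since only write() is allowed, one alternative to iconv() is to encode the wide string by hand. The following is a rough sketch, not the answer's method: it assumes each wchar_t holds one full Unicode code point (32-bit wchar_t, as on Linux and macOS; it would reject UTF-16 surrogate pairs on Windows), and the names `wide_to_utf8` and `write_wide` are hypothetical.

```c
#include <stddef.h>
#include <unistd.h>
#include <wchar.h>

/* Encode a wide string as UTF-8 into dst (cap bytes, terminator included).
   Returns the number of bytes written, excluding the terminator,
   or (size_t)-1 if dst is too small or a code point is invalid. */
size_t wide_to_utf8(const wchar_t *src, char *dst, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return (size_t)-1;
    for (; *src != L'\0'; src++) {
        unsigned long cp = (unsigned long)*src;
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF) || n + need >= cap)
            return (size_t)-1;
        if (need == 1) {
            dst[n++] = (char)cp;
        } else if (need == 2) {
            dst[n++] = (char)(0xC0 | (cp >> 6));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[n++] = (char)(0xE0 | (cp >> 12));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[n++] = (char)(0xF0 | (cp >> 18));
            dst[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    dst[n] = '\0';
    return n;
}

/* Write a short wide string to fd as UTF-8 using only write().
   Returns 0 on success, -1 on failure (a partial write counts as failure). */
int write_wide(int fd, const wchar_t *ws)
{
    char buf[256];
    size_t len = wide_to_utf8(ws, buf, sizeof buf);
    if (len == (size_t)-1)
        return -1;
    return write(fd, buf, len) == (ssize_t)len ? 0 : -1;
}
```

A call such as `write_wide(1, wp);` would then send the string to standard output without going through printf; the fixed 256-byte buffer is an arbitrary choice for the sketch.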
You need to call setlocale() first and use %ls in printf():
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main(int argc, char *argv[])
{
setlocale(LC_ALL, "");
// setlocale(LC_ALL, "C.UTF-8"); // this also works
wchar_t wp[] = L"Росси́йская Акаде́мия Нау́к";
printf("%ls\n", wp);
return 0;
}
For more about setlocale(), refer to Displaying wide chars with printf
I'm trying to printf an ANSI character greater than 127 using an unsigned char. The problem is that the character I get is wrong. For example, if I try to print character number 161 (¡) I get character number 237 (í). Why?
Yeah, sorry. So, I am using CodeBlocks on Windows 8.1 64 bit. This is the code:
unsigned char uc = 160;
...
printf("unsigned char considered': %c\n",uc);
...
I think you mean you are doing this:
#include <stdio.h>
int main ()
{
unsigned char c = 160;
printf ("The character is %c\n",c);
return 0;
}
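How characters with the top bit set are displayed depends on the encoding your terminal uses. If you have a UTF-8 terminal, for instance, it will be expecting UTF-8 multi-byte sequences, and a lone byte such as 160 or 161 is not a valid one. A Windows console typically interprets output in its OEM code page (for example CP437 or CP850, where byte 161 is í), not in the ANSI code page, which would explain the result you see.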
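For the UTF-8 terminal case, a minimal sketch of the conversion is shown below. It assumes the byte is meant as an ISO-8859-1 (Latin-1) character, whose values coincide with the first 256 Unicode code points; a Windows console with an OEM code page needs a different mapping. The function name `latin1_to_utf8` is hypothetical.

```c
#include <stddef.h>

/* Encode one ISO-8859-1 (Latin-1) byte as a null-terminated UTF-8 string.
   out must have room for 3 bytes.
   Returns the number of bytes written, excluding the terminator. */
size_t latin1_to_utf8(unsigned char c, char out[3])
{
    if (c < 0x80) {
        /* ASCII is unchanged in UTF-8. */
        out[0] = (char)c;
        out[1] = '\0';
        return 1;
    }
    /* 0x80..0xFF become a two-byte sequence: 110000xx 10xxxxxx. */
    out[0] = (char)(0xC0 | (c >> 6));
    out[1] = (char)(0x80 | (c & 0x3F));
    out[2] = '\0';
    return 2;
}
```

Character 161 (¡) becomes the two bytes C2 A1, which can then be printed with `printf("%s\n", out)`.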