wcscoll returns different result than expected

Consider this code:
#include <wchar.h>
#include <locale.h>
#include <stdio.h>
int main(void) {
    setlocale(LC_ALL, "pl_PL.UTF-8");
    printf("%d\n", wcscoll(L"ą", L"b"));
    return 0;
}
The output is
158
But I'd expect it to be -1, since ą comes just before b in the Polish alphabet. Why did it return 158? And if not this way, how can one compare words alphabetically?

I tried it on my Linux machine and I get 1 as output, positive just like yours.
Then I edited the supported locales in /etc/locale.gen, uncommented pl_PL.UTF-8 (not enabled by default), and ran sudo locale-gen. Now it gives -4, which is negative, as expected.
The conclusion is that your system configuration, as is, does not support the selected locale.

Check the return value of setlocale; it's probably not recognizing your country/codepage string.
MS locale names use dashes, not underscores. If you're on Windows, try passing pl-PL instead of pl_PL.UTF-8.

Related

Let nCurses handle UTF8 character by terminal

Ncurses has a whole family of print functions for wide characters: https://linux.die.net/man/3/mvwaddnwstr
The problem with those is that they depend on glibc, so if some UTF-8 character has not yet been added to glibc it won't be printed properly. An example is: ✅
The solution I can see is to let ncurses hand the character over to the terminal. By that I mean some sort of print function that accepts a UTF-8 string as hex bytes and lets the terminal find the correct font, like the snippet below does:
#include <iostream>
#include <unistd.h>
int main()
{
    // The check mark U+2705 is three bytes in UTF-8.
    write(1, "\xe2\x9c\x85", 3);
}
Is this possible with nCurses?

Defining _POSIX_C_SOURCE as 2 causes error when changing code page on Windows CMD with MinGW GCC

I've been writing a Linux program that's meant to write non-English characters to the terminal, and I've recently been porting it to Windows. I've run into an issue when changing the code page and font of the terminal: having the symbolic constant _POSIX_C_SOURCE defined beforehand seems to change the behavior of the code and makes it unable to print non-English characters properly. For reference, this is my code.
#include <windows.h>
#include <stdio.h>
int main()
{
    SetConsoleCP(CP_UTF8);
    SetConsoleOutputCP(CP_UTF8);
    HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
    CONSOLE_FONT_INFOEX cfie;
    ZeroMemory(&cfie, sizeof(cfie));
    cfie.cbSize = sizeof(cfie);
    lstrcpyW(cfie.FaceName, L"Lucida Console");
    SetCurrentConsoleFontEx(hStdOut, 0, &cfie);
    printf("Ћирилични текст\n");
    return 0;
}
This is what the program prints out depending on whether I do or don't define the constant in a command line argument while compiling.
C:\Users\User\Desktop>gcc test.c
C:\Users\User\Desktop>a.exe
Ћирилични текст
C:\Users\User\Desktop>gcc -D_POSIX_C_SOURCE=2 test.c
C:\Users\User\Desktop>a.exe
������������������ ����������

This is because outputting to standard output is done literally byte-by-byte when POSIX compliance is in effect. It uses a different implementation of what is done inside the printf function.

Does btowc(c) always return ( c in 0..127 ? c : WEOF )?

Is btowc(3) locale-dependent? I thought that with LANG=en_US.iso88591 it would return some European characters for bytes between 128 and 255, but it returns WEOF.
$ printf '\xFF\n' | iconv -f iso88591
ÿ
$ LANG=en_US.iso88591 ./a.out
255 -1
#include <stdio.h>
#include <wchar.h>
int main() {
    int i = 0xFF;
    printf("%d %d\n", i, btowc(i));
}

On my system anyway, going:
#include <locale.h>
//...
setlocale(LC_CTYPE, "en_US.iso88591");
causes the output to be 255 255. So this indicates that it does seem to be locale-dependent, although the C standard doesn't explicitly say that it is, as far as I can see. (It says that the mbs* function family is locale-dependent, but doesn't say so for btowc.)
Your post looks like you are expecting the LANG environment variable to change the locale at program startup. That variable can affect how gcc reads your source files, but it has no run-time effect unless the program itself calls setlocale. The C standard says that every program starts up in the "C" locale.

strftime not giving correct output with %C option - Solaris 10

We are using /usr/xpg4/bin as default path in our profile.
We are printing the output of variable "curr_date" here:
lt = time(NULL);
ltime = localtime(&lt);
strftime(curr_date, sizeof(curr_date), "%m/%d/%y%C", ltime);
We get the output as "06/27/13Thu Jun 27 02:39:34 PDT" instead of "06/27/1320".
Do you know what should be the format specifiers that should work here?
Thanks

The use of /usr/xpg4/bin in your $PATH only selects the standards-compliant commands; it does not change function calls in your programs to use the standards-compliant versions.
As described in the Solaris standards(5) man page there are various #defines and compiler flags you need to use to specify compliance for various standards.
For instance, taking your code snippet and expanding it to this standalone test program:
#include <sys/types.h>
#include <time.h>
#include <stdio.h>
int main(int argc, char **argv)
{
    time_t lt;
    struct tm *ltime;
    char curr_date[80];
    lt = time(NULL);
    ltime = localtime(&lt);
    strftime(curr_date, sizeof(curr_date), "%m/%d/%y%C", ltime);
    printf("%s\n", curr_date);
    return 0;
}
Then compiling with the different flags shows the different behavior:
% cc -o /tmp/strftime /tmp/strftime.c
% /tmp/strftime
06/30/13Sun Jun 30 20:28:00 PDT 2013
% cc -xc99 -D_XOPEN_SOURCE=600 -o /tmp/strftime /tmp/strftime.c
% /tmp/strftime
06/30/1320
The default mode is backwards compatible with traditional Solaris code, where %C expands to the full date and time. The second form requests compliance with the C99 and XPG6 (Unix03) standards, where %C is the century.

Have a good look at the code between the call to strftime() and the printing of curr_date. You're overwriting curr_date somewhere, because the start of what you print is correct. There might also be something fishy with the memory management of curr_date: how is it defined, and did you allocate memory for it?
Set a breakpoint right after strftime() and you'll see it holds the expected string.

Greek letters in Windows Console

I'm writing a program in C and I want to have Greek characters in the menu when I run it in cmd.exe. Someone said that in order to include Greek characters you have to use a printf that goes something like this:
printf(charset:IS0-1089:uffe);
but they weren't sure.
Does anyone know how to do that?

Assuming Windows, you can set your console font to a Unicode TrueType font and emit the data using an "ANSI" mechanism.
This code prints γειά σου:
#include <stdio.h>
#include <windows.h>
int main() {
    SetConsoleOutputCP(1253); //"ANSI" Greek
    printf("\xE3\xE5\xE9\xDC \xF3\xEF\xF5");
    return 0;
}
The hex codes represent γειά σου when encoded as windows-1253. If you use an editor that saves data as windows-1253, you can use literals instead. An alternative would be to use either OEM 737 (that really is a DOS encoding) or use Unicode.
I used SetConsoleOutputCP to set the console code page, but you could type the command chcp 1253 prior to running the program instead.

You can print a Unicode character by using printf like this:
printf("\u0220\n");
This will print Ƞ.

I think this might only work if your console supports Greek. Probably what you want to do is to map characters to Greek by their numeric codes. The example is for C#, but the same idea applies in C. The codes are Unicode code points:
913 to 937 = upper case Greek letters (930 is unassigned)
945 to 969 = lower case Greek letters
Read more at Suite101: Working with the Greek Alphabet and C#: How to Display ASCII Codes Correctly when Creating a C# Application.

One way to do this is to print a wide string. Unfortunately, Windows needs a bit of non-standard setup to make this work. This code does that setup inside #if blocks.
#include <locale.h>
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
/* This has been reported not to autodetect correctly on tdm-gcc. */
#ifndef MS_STDLIB_BUGS // Allow overriding the autodetection.
# if ( _WIN32 || _WIN64 )
# define MS_STDLIB_BUGS 1
# else
# define MS_STDLIB_BUGS 0
# endif
#endif
#if MS_STDLIB_BUGS
# include <io.h>
# include <fcntl.h>
#endif
void init_locale(void)
// Does magic so that wprintf() can work.
{
    // Constant for fwide().
    static const int wide_oriented = 1;
#if MS_STDLIB_BUGS
    // Windows needs a little non-standard magic.
    static const char locale_name[] = ".1200";
    _setmode( _fileno(stdout), _O_WTEXT );
#else
    // The correct locale name may vary by OS, e.g., "en_US.utf8".
    static const char locale_name[] = "";
#endif
    setlocale( LC_ALL, locale_name );
    fwide( stdout, wide_oriented );
}
int main(void)
{
    init_locale();
    wprintf(L"μουσάων Ἑλικωνιάδων ἀρχώμεθ᾽\n");
    return EXIT_SUCCESS;
}
This has to be saved as UTF-8 with a BOM in order for older versions of Visual Studio to read it properly. Your console also has to be set to a monospaced Unicode font, such as Lucida Console, to display it properly. To mix wide strings in with ASCII strings, the standard defines the %ls and %lc format specifiers to printf(), although I’ve found these don’t work everywhere.
An alternative is to set the console to UTF-8 mode (On Windows, do this with chcp 65001.) and then print the UTF-8 string with printf(u8"μουσάων Ἑλικωνιάδων ἀρχώμεθ᾽\n");. UTF-8 is a second-class citizen on Windows, but that usually works. Try to run that without setting the code page first, though, and you will get garbage.
