vswprintf keeps prefixing a Byte Order Mark character

I am still a rookie with C, and even newer to wide chars in C.
The code below should show
4 points to Smurfs
but it shows the same text with an extra, invisible character in front:
 4 points to Smurfs
In gdb I see this:
(gdb) p buffer
$1 = L" 4 points to Smurfs",
But when I copy paste from the console, the spaces are magically gone:
(gdb) p buffer
$1 = L"4 points to Smurfs",
Also, buffer[0] contains this according to gdb:
65279 L' '
Apparently the character in question, decimal 65279, is the Unicode character ZERO WIDTH NO-BREAK SPACE (U+FEFF), i.e. a byte order mark. I retyped the code, making sure I did not enter this character, and I don't know where it comes from. I also opened the code in Notepad as suggested in https://stackoverflow.com/a/9691839/7602, and there are no extra characters there.
I wouldn't care, if only ncurses would stop showing this as a space.
Code (heavily cut down):
#include <time.h>
#include <stdio.h>
#include <errno.h>
#include <wchar.h>
#include <stdarg.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <locale.h>
#define NCURSES_WIDECHAR 1
#include <ncursesw/ncurses.h>
#include "types.h"
#include "defines.h"
#include "externs.h"

WINDOW *term;

/* row column color n arguments */
void rccn(int row, int col, const wchar_t *fmt, ...)
{
    wchar_t buffer[80];
    int size;
    va_list args;

    va_start(args, fmt);
    size = vswprintf(buffer, 80, fmt, args);
    va_end(args);

    /* vswprintf returns a negative value on error or when the output
       does not fit the buffer, so test for that rather than >= 80 */
    if (size < 0) {
        mvaddwstr(row, col, L"Possible hacker detected!");
    } else {
        mvaddwstr(row, col, buffer);
    }
}

int main(void)
{
    int ch;

    setlocale(LC_ALL, "");
    term = initscr();
    rccn(1, 1, L"%i points to %ls", 4, L"Smurfs");
    ch = getch();
    return EXIT_SUCCESS;
}
The problem goes 'away' with
rccn(1, 1, L"%i points to %ls", 4, L"Smurfs" + 1);
as if the wide encoding of the string constant had added that character in front.

Found it.
I had followed a tutorial that advised adding this compiler flag:
-fwide-exec-charset=utf-32
My code was not running on Cygwin at all, and I read that Windows is UTF-16 centered, so I removed that compiler flag and it started working on Cygwin.
Then, out of curiosity, I removed the compiler flag on Raspbian too, and it now works as expected there as well: no more byte order marks.
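The likely mechanism (my reading, not something the answer above spells out): plain "utf-32" does not specify a byte order, so the charset conversion prepends a BOM (U+FEFF) to every wide string literal, whereas an endianness-qualified charset such as utf-32le should not. A minimal sketch to check whether literals carry the extra code unit:
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* With -fwide-exec-charset=utf-32 the "empty" literal is expected to
       hold one code unit (the BOM, U+FEFF); without the flag it is empty. */
    wprintf(L"wcslen(L\"\") = %zu\n", wcslen(L""));
    wprintf(L"first code unit of L\"x\": U+%04X\n", (unsigned)L"x"[0]);
    return 0;
}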

Related

Illegal Instruction :4

I was writing a C program to execute the terminal command "history 10". I ran the program with the clang compiler in my Mac terminal, and it shows the error "Illegal Instruction: 4".
My code is:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string.h>

int main()
{
    char cmd[10];
    strcpy(cmd, "history 10");
    system(cmd);
    return 0;
}
You overrun your buffer: the cmd array holds only 10 characters, but "history 10" is 10 characters plus the implicit terminating zero byte, so strcpy writes 11 bytes into it.
Get rid of the buffer and just do
system("history 10");
Or declare the buffer long enough to accommodate your current command, and possibly some future ones. Something like this:
char cmd[500];
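A sketch of that second option; the snprintf call is my own addition (the answer above only enlarges the buffer), shown because it bounds the write to the buffer size instead of trusting the string length:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char cmd[500];                               /* room for longer commands */
    snprintf(cmd, sizeof cmd, "history %d", 10); /* bounded: cannot overrun cmd */
    system(cmd);
    return 0;
}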

How can I print card suit characters in C Win32 console application?

I have seen a few questions on how to print these characters, but none of the methods appear to be working. I suspect it is because I am making a Win32 console application, based on some of the comments I read.
Here is an example of what I have tried in my code currently. It only prints question-mark boxes, or, if I change things around, question marks or random symbols.
I have tried defining these at the top:
#define SPADE '\x06'
#define CLUB '\x05'
#define HEART '\x03'
#define DIAMOND '\x04'
Inside the function, these are some of the things I've tried. I have left the plain S, D, H, C letters in place in case I can't figure it out:
printf("%lc", SPADE);
//printf("♠");
//printf("S");
printf("%lc", HEART);
//printf("♥");
//printf("H");
printf("%lc", DIAMOND);
//printf("♦");
//printf("D");
printf("%lc", CLUB);
//printf("♣");
//printf("C");
UTF-16 wchar_t and the wide-character functions are needed on Windows.
#include <windows.h>
#include <wchar.h> // wcslen

int main()
{
    DWORD n;
    HANDLE hout = GetStdHandle(STD_OUTPUT_HANDLE);
    const wchar_t *buf = L"♠♥♦♣\n";
    WriteConsoleW(hout, buf, (DWORD)wcslen(buf), &n, 0);
    return 0;
}
The following code will compile with Visual Studio:
#include <stdio.h>
#include <io.h>    // for _setmode
#include <fcntl.h> // for _O_U16TEXT

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"♠♥♦♣\n");
    return 0;
}
After setting the mode to UTF-16, you have to call _setmode(_fileno(stdout), _O_TEXT) if you wish to use printf again.
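A minimal sketch of that round trip, as my own illustration of the point above: keep the mode _setmode returns and restore it before going back to narrow output:
#include <stdio.h>
#include <io.h>
#include <fcntl.h>

int main()
{
    int oldmode = _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"♠♥♦♣\n");                 /* wide output while in UTF-16 mode */
    _setmode(_fileno(stdout), oldmode); /* restore before using printf again */
    printf("back to narrow output\n");
    return 0;
}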

wchar_t* with UTF8 chars in MSVC

I am trying to format wchar_t* with UTF-8 characters using vsnprintf and then printing the buffer using printf.
Given the following code:
/*
    This code is a modified version of the KB sample:
    https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_73/rtref/vsnprintf.htm

    The usage of `setlocale` is required by my real-world scenario,
    but can be modified if that fixes the issue.
*/
#include <wchar.h>
#include <stdarg.h>
#include <stdio.h>
#include <locale.h>
#ifdef MSVC
#include <windows.h>
#endif

void vout(char *string, char *fmt, ...)
{
    setlocale(LC_CTYPE, "en_US.UTF-8");
    va_list arg_ptr;
    va_start(arg_ptr, fmt);
    vsnprintf(string, 100, fmt, arg_ptr);
    va_end(arg_ptr);
}

int main(void)
{
    setlocale(LC_ALL, "");
#ifdef MSVC
    SetConsoleOutputCP(65001); // with or without; no dice
#endif
    char string[100];
    wchar_t arr[] = { 0x0119 };
    vout(string, "%ls", arr);
    printf("This string should have 'ę' (e with ogonek / tail) after colon: %s\n", string);
    return 0;
}
I compiled with gcc v5.4 on Ubuntu 16 to get the desired output in BASH:
gcc test.c -o test_vsn
./test_vsn
This string should have 'ę' (e with ogonek / tail) after colon: ę
However, on Windows 10 with CL v19.10.25019 (VS 2017), I get weird output in CMD:
cl test.c /Fetest_vsn /utf-8
.\test_vsn
This string should have 'T' (e with ogonek / tail) after colon: e
(the ę before colon becomes T and after the colon is e without ogonek)
Note that I used CL's /utf-8 switch (introduced in VS 2015), which apparently makes no difference here, with or without it. Based on their blog post:
There is also a /utf-8 option that is a synonym for setting “/source-charset:utf-8” and “/execution-charset:utf-8”.
(my source file already has a BOM / UTF-8-ness, and the execution charset is apparently not helping)
What could be the minimal amount of changes to the code / compiler switches to make the output look identical to that of gcc?
Based on #RemyLebeau's comment, I modified the code to use the wide (w) variants of the printf APIs; with that, MSVC's output on Windows matches gcc's on Unix.
Additionally, instead of changing the codepage, I now use _setmode to change the FILE translation mode.
/*
    This code is a modified version of the KB sample:
    https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_73/rtref/vsnprintf.htm

    The usage of `setlocale` is required by my real-world scenario,
    but can be modified if that fixes the issue.
*/
#include <wchar.h>
#include <stdarg.h>
#include <stdio.h>
#include <locale.h>
#ifdef _WIN32
#include <io.h>    // for _setmode
#include <fcntl.h> // for _O_U16TEXT
#endif

void vout(wchar_t *string, wchar_t *fmt, ...)
{
    setlocale(LC_CTYPE, "en_US.UTF-8");
    va_list arg_ptr;
    va_start(arg_ptr, fmt);
    vswprintf(string, 100, fmt, arg_ptr);
    va_end(arg_ptr);
}

int main(void)
{
    setlocale(LC_ALL, "");
#ifdef _WIN32
    int oldmode = _setmode(_fileno(stdout), _O_U16TEXT);
#endif
    wchar_t string[100];
    wchar_t arr[] = { 0x0119, L'\0' };
    vout(string, L"%ls", arr);
    wprintf(L"This string should have 'ę' (e with ogonek / tail) after colon: %ls\r\n", string);
#ifdef _WIN32
    _setmode(_fileno(stdout), oldmode);
#endif
    return 0;
}
Alternatively, we can use fwprintf and provide stdout as the first argument. To do the same on stderr, e.g. with fwprintf(stderr, format, ...) or vfwprintf(stderr, format, args), we would need to _setmode stderr as well.
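A sketch of that stderr case (my own illustration; the message text is arbitrary): each FILE stream has its own translation mode, so stderr needs its own _setmode call:
#include <stdio.h>
#include <io.h>
#include <fcntl.h>

int main(void)
{
    _setmode(_fileno(stderr), _O_U16TEXT);        /* stdout's mode does not apply here */
    fwprintf(stderr, L"error: %ls\n", L"\u0119"); /* prints an e with ogonek */
    return 0;
}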

C++ vector.assign (contents of char array) works in WinXP-32, fails in Win10-64; why?

On this StackExchange topic (first answer, and the last comment on that answer), I learned to assign the contents of my char array (which I had just read from a file) into my vector. The following code works fine on Windows XP 32-bit with Visual Studio 2010, but fails on Windows 10 64-bit with Studio 2012. Both projects use the Unicode character set. The contents of the file myConfig.txt are (separated by tabs):
words 3 mobius lagrange gauss
I am a complete noob, so if some mistake seems too stupid for anyone to make, go ahead and assume I made it.
The code:
#include "stdafx.h"
#include <windows.h>
#include <vector>
#include <iterator>
#include <string.h>
#include <wchar.h>
#include <tchar.h>
using namespace std;
vector<wchar_t> wvec;
int n;
wchar_t ss[256];
FILE* pfile;
int _tmain(int argc, _TCHAR* argv[])
{
fopen_s(&pfile,"myConfig.txt","r");
fwscanf_s(pfile,L"%ls",&ss);
wprintf(L"var name is %ls\n",ss);
fwscanf_s(pfile,L"%d",&n);
printf("num words is %d\n",n);
for (int i=0; i<3; i++) {
fwscanf_s(pfile,L"%ls",&ss);
wvec.clear();
n=wcslen(ss);
wprintf(L"vec empty %ls length %d\n",vec,vec.size());
wvec.assign(ss,ss+n+1); // +1 to contain null char
wprintf(L"vec sz %d filled %ls\n",wvec.size(),wvec);
}
printf("press Enter to finish\n");
getchar();
return 0;
}
On the Win10 machine, the output says, in part, "vec sz 7 filled ???", when I run/debug from the development environment. When I run the exe in the x64\Release folder, the corresponding line of output says "vec sz 5 filled ???", while the contents of myConfig.txt are exactly the same.
On the XP machine, the output is perfect.
Finally answered my own question: RTFM. It should not be
fwscanf_s(pfile, L"%ls", &ss);
It should be
fwscanf_s(pfile, L"%ls", ss, (unsigned)_countof(ss));
I dunno why it worked in XP (convenient #bytes/char?) but, whatever.
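A minimal corrected sketch, condensed by me under the assumption of the same myConfig.txt layout: with the MSVC _s readers, every %s / %ls conversion takes a size argument right after the buffer, while numeric conversions like %d take none:
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>

int main(void)
{
    FILE* pfile;
    wchar_t ss[256];
    int n;

    if (fopen_s(&pfile, "myConfig.txt", "r") != 0)
        return 1;
    fwscanf_s(pfile, L"%ls", ss, (unsigned)_countof(ss)); /* size follows the buffer */
    fwscanf_s(pfile, L"%d", &n);                          /* no size for %d */
    wprintf(L"var name is %ls, num words %d\n", ss, n);
    fclose(pfile);
    return 0;
}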

Printing a unicode box in C

I'm trying to print this medium shade unicode box in C: ▒
(I'm doing the exercises in K&R and then got sidetracked on the one about making a histogram...). I know my unix term (Mac OSX) can display the box because I saved a text file with the box, and used cat textfilewithblock and it printed the block.
So far I initially tried:
#include <stdio.h>
#include <wchar.h>

int main() {
    wprintf(L"▒\n");
    return 0;
}
and nothing printed
iMac-2$ ./a.out
iMac-2:clang vik$
I did a search and found this: unicode hello world for C?
It seems I still have to set a locale (even though the executing environment is UTF-8? I'm still trying to figure out why this step is necessary). But anyway, it works! (After a bit of a struggle, finally realizing that the proper string was en_US.UTF-8 rather than en_US.utf8, which I had read somewhere...)
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
    setlocale(LC_ALL, "en_US.UTF-8");
    wprintf(L"▒\n");
    return 0;
}
Output is as follows:
iMac-2$ ./a.out
▒
iMac-2$
But when I try the following code, putting in the UTF-8 hex for the box (0xe29692, which I got from http://www.utf8-chartable.de/unicode-utf8-table.pl?start=9472&unicodeinhtml=dec) rather than pasting in the box itself, it doesn't work again.
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
    setlocale(LC_ALL, "en_US.UTF-8");
    wchar_t box = 0xe29692;
    wprintf(L"%lc\n", box);
    return 0;
}
I'm clearly missing something but can't quite figure out what it is.
The Unicode value of the MEDIUM SHADE code point is not 0xe29692, it is 0x2592. <E2><96><92> is the 3-byte encoding of this code point in UTF-8.
You can print this thing either using the wide char APIs:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    setlocale(LC_ALL, "en_US.UTF-8");
    wchar_t box = 0x2592;
    wprintf(L"%lc\n", box); // or simply printf("%lc\n", box);
    return 0;
}
Or simply by printing the UTF-8 encoding directly:
#include <stdio.h>

int main(void) {
    printf("\xE2\x96\x92\n");
    return 0;
}
Or if your text editor encodes the source file in UTF-8:
#include <stdio.h>

int main(void) {
    printf("▒\n");
    return 0;
}
But be aware that this will not work: putchar('▒'); putchar outputs a single byte, while '▒' is a multi-byte character constant whose int value is implementation-defined and does not fit in one byte.
Also for full unicode support and a few more goodies, I recommend using iTerm2 on MacOS.
The box character is U+2592, which translates to 0xE2 0x96 0x92 in UTF-8. This adaptation of your third program mostly works for me:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "en_US.UTF-8");
    wchar_t box = 0xe29692;
    wprintf(L"%lc\n", box);
    wprintf(L"\n\nX\n\n");
    box = L'\u2592'; // 0xE2 0x96 0x92 = U+2592
    wprintf(L"%lc\n", box);
    wprintf(L"\n\n0x%.8X\n\n", box);
    box = 0x2592;
    wprintf(L"%lc\n", box);
    return 0;
}
The output I get is:
X
▒
0x00002592
▒
The first print operation produces nothing of use; the others work.
Testing on Mac OS X 10.10.5. I happen to be compiling with GCC 5.3.0 (which I compiled), but I got the same output with XCode 7.0.2 and clang.
