Accepting single byte special characters into the windows terminal (testing on cygwin) - c

I am testing a C program in the windows terminal. I mocked up a quick example of the section I am having issues with. The example is as follows:
$ cat test.c
#include <stdio.h>
#include <stdlib.h>
int main() {
char var[6];
scanf("%s", var);
int i=0;
while(var[i] != '\0') {
printf("%x ", var[i]);
i++;
}
return 0;
}
When I use a string of "normal" characters such as "aa", the output is "61 61" as expected (hex 61 is the letter "a"). When I try to input a special character such as í (0xA1 in code page 437/850, or U+00ED in Unicode), I get the following output:
$ ./a.exe
í
ffffffc3 ffffffad
The UTF-8 chart at http://www.utf8-chartable.de/ shows that the accented 'i' is in fact encoded as 0xc3ad. How can I enter this character as the single byte 0xA1, since I really want to input 0xA1 into the terminal, not 0xc3ad? I am copying and pasting it from "charmap". I even tried saving a text file in ANSI with the character and copying and pasting from that, but I still get 0xc3ad. Please assist me.
EDIT: Running the same on a mac also gives me c3ad.

Related

I have difficulties with putwchar() in c

Certainly, my problem is not new, so I apologize if my error is simply too stupid.
I just wanted to become familiar with putwchar and simply wrote the following little piece of code:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main(void)
{
char *locale = setlocale(LC_ALL, "");
printf ("Locale: %s\n", locale);
//setlocale(LC_CTYPE, "de_DE.utf8");
wchar_t hello[]=L"Registered Trademark: ®®\nEuro sign: €€\nBritisch Pound: ££\nYen: ¥¥\nGerman Umlauts: äöüßÄÖÜ\n";
int index = 0;
while (hello[index]!=L'\0'){
//printf("put liefert: %d\n", putwchar(hello[index++]));
putwchar(hello[index++]);
};
}
Now, the output is simply:
Locale: de_DE.UTF-8
Registered Trademark: ��
Euro sign: ��
Britisch Pound: ��
Yen: ��
German Umlauts: �������
[1]+ Fertig gedit versuch.c
None of the non-ASCII chars appeared on the screen.
As you can see in the comment (I did notice that I must not mix putwchar and printf in the same program, hence the line is commented out), putwchar returned the proper Unicode code point for each character I wanted to print. Thus, the call is supposed to work (at least to my understanding).
The C source is encoded in UTF-8:
$ file versuch.c
versuch.c: C source, UTF-8 Unicode text
My system is Ubuntu Linux 20.04.05.
Compiler: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
I would greatly appreciate any advice on this one.
As stated above: I simply expected the trademark sign, yen, € and the umlauts äöüßÄÖÜ to appear.
You shouldn't mix normal and wide output on the same stream.
I get the expected output if I change this early print:
printf ("Locale: %s\n", locale);
into a wide print:
wprintf(L"Locale: %s\n", locale);
Then the subsequent putwchar() calls write the expected characters.
You cannot mix narrow and wide I/O on the same stream (C11 7.21.2). If you want putwchar, you cannot use printf. Start with wprintf instead (with a wide format string):
wprintf (L"Locale: %s\n", locale);
You can simply print those wide characters as shown below:
wprintf(L"Registered Trade Mark: %ls\n", L"®®");
wprintf(L"Euro Sign: %ls\n", L"€€");
wprintf(L"British Pound: %ls\n", L"££");
wprintf(L"Yen: %ls\n", L"¥¥");
wprintf(L"German Umlauts: %ls\n", L"äöüßÄÖÜ");
Please refer to:
https://stackoverflow.com/a/37587933/2805824
https://stackoverflow.com/a/7696033/2805824

In linux program cannot erase utf-8 character completely using backspace

I have a C program which waits for user's input
#include <stdio.h>
int main()
{
getchar();
return 0;
}
Now I run it and input some Chinese characters like 测试测试. When I then press Backspace, I find that I cannot erase these characters completely (part of the text remains).
I found that termios has a flag, IUTF8, but why doesn't it work?
UPDATE ON 2022/12/31:
To describe my question in more detail: I have a program like the one above.
Now I run it and enter some Chinese characters (without pressing Enter).
Then I keep pressing Backspace until nothing more can be erased, but half of the content is still displayed on my screen. That is clearly wrong; how can I make erasing work properly?
I know this may seem like a stupid question. I just want typing UTF-8 characters (such as Chinese characters) to be more comfortable.
I found that the shell handles this well; what can I do to make my program behave the same way?
By the way, this is my locale output
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=zh_CN.UTF-8
LC_TIME=zh_CN.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=zh_CN.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=zh_CN.UTF-8
LC_NAME=zh_CN.UTF-8
LC_ADDRESS=zh_CN.UTF-8
LC_TELEPHONE=zh_CN.UTF-8
LC_MEASUREMENT=zh_CN.UTF-8
LC_IDENTIFICATION=zh_CN.UTF-8
LC_ALL=
Use GNU readline to provide a shell-like interface, with Tab autocompletion, correct input handling, et cetera.
To compile the following example program, make sure you have the libreadline-dev package installed. The readline library needed to run the program will already be installed, because so many applications that are installed by default require it already.
// SPDX-License-Identifier: CC0-1.0
// Compile using
// gcc -Wall -O2 $(pkg-config --cflags readline) example.c $(pkg-config --libs readline) -o example
#define _GNU_SOURCE
#include <stdlib.h>
#include <locale.h>
#include <readline/readline.h>
#include <readline/history.h>
#include <stdio.h>
int main(void)
{
char *line;
setlocale(LC_ALL, "");
while (1) {
line = readline(NULL); // No prompt
// Input line is in 'line'; exit if end of input or empty line.
if (!line || *line == '\0')
break;
// Do something with 'line'
// Discard the dynamically allocated line
free(line);
}
return 0;
}
When using the GNU readline library, the library takes over the standard input, and handles character deletion (and many other things) at the terminal (termios) level. It works absolutely fine with file and pipe inputs as well, and is what e.g. bash shell uses for interactive input.

Latin Capital Letter 'E' with Circumflex (Ê)

In a C program on Windows 10, I need to print the word TYCHÊ on the screen, but I cannot print the letter Ê (hex code: \xCA):
#include <stdlib.h>
#include <stdio.h>
char *Word;
int main(int argc, char* argv[]){
Word = "TYCH\xCA";
printf("%s", Word);
}
What's wrong?
Windows is a pain when it comes to printing Unicode text, but the following should work with all modern compilers (MSVC 19 or later, g++ 9 or greater) on all modern Windows systems (Windows 10 or greater), in both Windows Console and Windows Terminal:
#include <iostream>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
std::cout << "TYCHÊ" << "\n";
}
Make sure your compiler takes UTF-8 as the input character set. For MSVC 19 you need a flag. I think it is the default for later versions, but I am unsure on that point:
cl /EHsc /W4 /Ox /std:c++17 /utf-8 example.cpp
g++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 example.cpp
EDIT: Dangit, I misread the language tag again. :-(
Here’s some C:
#include <stdio.h>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
printf( "%s\n", "TYCHÊ" );
return 0;
}
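You can try this line:
printf("%s%c", Word, 0x2580 + 82);
This can print your Ê. (%c converts its argument to unsigned char, so this sends the single byte 0xD2, which is Ê in code page 850 but not in every code page.)
I used CLion to work this out; in another IDE it may not give the same result.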
In the Windows Command Line you should choose the Code Page 65001:
CHCP 65001
If you want to silently do that directly from the source code:
system("CHCP 65001 > NUL");
In the C source code you should use the <locale.h> standard header.
#include <locale.h>
At the beginning of your program execution you can write:
setlocale(LC_ALL, "");
The empty string "" selects the default locale and encoding of the underlying system (which you previously chose to be Unicode).
However, this answer of mine is just a patch, not a solution.
It will help you print French characters, at most.
Handling encodings on the Windows command line is not straightforward.
See, for example: Command Line and UTF-8 issues

How to read string including escape characters in C using popen() in Linux?

I have to display all the files and sub-directories in a directory in Linux (more specifically, Ubuntu 19.10) using popen() in C. The relevant code is given below. When I debug this code, the "list" variable contains only the output up to the first "\n" character, which is ".:\n". How can I work around this so that I get the whole output of popen(), including the newline characters?
#include <stdio.h>
int main()
{
FILE *read_file;
char list[1000];
read_file = popen("ls -R","r");
fgets(list, 1000, read_file);
pclose(read_file);
printf("%s", list);
return(0);
}

behaviour of escape characters

#include <stdio.h>
int main(void)
{
printf("az\b\b");
printf("s\ni");
return 0;
}
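The above program, when compiled with gcc, gives the output:
sz
i
Can someone help us understand this output?
That's because your console interprets '\b' as a backspace: it moves the cursor one position to the left without erasing anything. After "az", the two backspaces put the cursor back on the 'a'; the 's' then overwrites it, leaving "sz". The '\n' moves to the next line, where 'i' is printed.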

Resources