In Linux a C program cannot erase UTF-8 characters completely using backspace

I have a C program which waits for user's input
#include <stdio.h>
int main()
{
getchar();
return 0;
}
Now I run it and type some Chinese characters like 测试测试. When I then press Backspace, I find I cannot erase these characters completely (some blank space remains).
I found that termios has an IUTF8 flag, but why doesn't it work?
UPDATE ON 2022/12/31:
Let me describe my question in more detail. I have the program shown above.
Now I run it and enter some Chinese characters (without pressing Enter).
Then I keep pressing the Backspace key (until nothing more can be erased), but half of the content still remains on my screen. How can I make erasing work correctly?
I know this may be a silly question; I just want typing UTF-8 characters (such as Chinese characters) to be more comfortable.
I found that the shell handles this well; how can I make my program behave the same way?
By the way, this is my locale output
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=zh_CN.UTF-8
LC_TIME=zh_CN.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=zh_CN.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=zh_CN.UTF-8
LC_NAME=zh_CN.UTF-8
LC_ADDRESS=zh_CN.UTF-8
LC_TELEPHONE=zh_CN.UTF-8
LC_MEASUREMENT=zh_CN.UTF-8
LC_IDENTIFICATION=zh_CN.UTF-8
LC_ALL=

Use GNU readline to provide a shell-like interface, with Tab autocompletion, correct input handling, et cetera.
To compile the following example program, make sure you have the libreadline-dev package installed. The readline library needed to run the program will already be installed, because many applications installed by default already depend on it.
// SPDX-License-Identifier: CC0-1.0
// Compile using
// gcc -Wall -O2 $(pkg-config --cflags readline) example.c $(pkg-config --libs readline) -o example
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <readline/readline.h>
#include <readline/history.h>
int main(void)
{
char *line;
setlocale(LC_ALL, "");
while (1) {
line = readline(NULL); // No prompt
// Input line is in 'line'; exit if end of input or empty line.
if (!line || *line == '\0')
break;
// Do something with 'line'
// Discard the dynamically allocated line
free(line);
}
return 0;
}
When using the GNU readline library, the library takes over the standard input and handles character deletion (and many other things) at the terminal (termios) level. It works absolutely fine with file and pipe inputs as well, and is what e.g. the bash shell uses for interactive input.

Related

libreadline - fgetc returns different values for the enter key

fgetc returns a different value for the enter key after
calling libreadline's rl_callback_handler_install(). It changes from line feed (\n) to carriage return (\r).
How is this possible? I've read the source but could not figure out what mechanism is used to achieve this.
Also, though less important: is this a feature or a bug?
// compile with gcc -o main.o main.c -lreadline
#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>
static void foo_rl_callback(char *line)
{
// do stuff
}
static void get_enter_key(void)
{
printf("press enter!\n");
printf("fgetc=%d\n", fgetc(stdin));
}
int main(int argc, char *argv[])
{
printf("readline: %s\n", rl_library_version);
get_enter_key();
rl_callback_handler_install(NULL, foo_rl_callback);
get_enter_key();
rl_callback_handler_remove();
get_enter_key();
return 0;
}
output (assuming user only presses the enter key):
readline: 8.1
press enter!
fgetc=10
press enter!
fgetc=13
press enter!
fgetc=10
I'm not 100% sure, but I believe this is done by prepare_terminal_settings (there are several versions of that function; the link goes to the version that should be in use on any system shipped in the past 15 years or so).
This function uses tcsetattr to twiddle a whole bunch of flags that control the behavior of a Unix terminal or pseudo-terminal. In particular, it turns off the ICANON bit and also clears the ICRNL input flag, which is the setting that normally converts a U+000D CARRIAGE RETURN coming down the serial line into U+000A LINE FEED; with it cleared, the '\r' reaches the application unchanged.
While readline is active, you should be using only the readline API to interact with the terminal, not fgetc(stdin).

Latin Capital Letter 'E' with Circumflex (Ê)

In a C program on Windows 10, I need to print the word TYCHÊ on the screen, but I cannot print the letter Ê (hex code \xCA):
#include <stdlib.h>
#include <stdio.h>
char *Word;
int main(int argc, char* argv[]){
Word = "TYCH\xCA";
printf("%s", Word);
}
What's wrong?
Windows is a pain when it comes to printing Unicode text, but the following should work with all modern compilers (MSVC 19 or later, g++ 9 or greater) on all modern Windows systems (Windows 10 or greater), in both Windows Console and Windows Terminal:
#include <iostream>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
std::cout << "TYCHÊ" << "\n";
}
Make sure your compiler takes UTF-8 as the input character set. For MSVC 19 you need a flag. I think it is the default for later versions, but I am unsure on that point:
cl /EHsc /W4 /Ox /std:c++17 /utf-8 example.cpp
g++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 example.cpp
EDIT: Dangit, I misread the language tag again. :-(
Here’s some C:
#include <stdio.h>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
printf( "%s\n", "TYCHÊ" );
return 0;
}
You can try this line:
printf("%s%c", Word, 0x2580 + 82);
It can print your Ê, presumably because 0x2580 + 82 is 0x25D2, which %c truncates to the byte 0xD2, and 0xD2 happens to be Ê in the OEM code page 850 that cmd windows use by default.
I used CLion to solve it; another IDE may not give the same result.
In the Windows command line you should select code page 65001:
CHCP 65001
If you want to silently do that directly from the source code:
system("CHCP 65001 > NUL");
In the C source code you should use the <locale.h> standard header.
#include <locale.h>
At the beginning of your program execution you can write:
setlocale(LC_ALL, "");
The empty string "" initializes the locale to the default encoding of the underlying system (which you previously chose to be Unicode).
However, this answer of mine is just a patch, not a solution.
It will help you print French characters, at most.
Handling encoding in the Windows command line is not straightforward.
See, for example: Command Line and UTF-8 issues

Segmentation fault before main when using key args [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I get a segmentation fault before main() when I start the program with ./somename.o -s 4.
It works fine when running ./somename.o without any option arguments.
main.c
#include <stdio.h>
#include <stdlib.h>
#include "input.h"
#include "output.h"
int main(int argc, char** argv) {
input_handler(argc, argv);
pretty_print();
return 0;
}
input.h
#include "data.h"
#include"func.h"
#include <getopt.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
void input_handler(int argc, char** argv);
data.h
#pragma once
void(*func) (void);
void(*input) (void);
static struct Matrix {
int size;
int** A;
}matrix;
GitHub:
https://github.com/sandderson/lab2
EDIT:
added include guards
Also some useful info:
I use Windows Subsystem for Linux.
I compile with makefile and following sequence:
gcc -c func.c
gcc -c input.c
gcc -c main.c
gcc -c output.c
gcc main.o func.o input.o output.o -o Lab2.o
Your call to getopt_long uses "sdi" as the options string, which means that -s, -d and -i are possible options, and that none of them take an argument (since none are followed by a colon). See man getopt for details.
But when you are handling the -s option, you do:
matrix.size = atoi(optarg);
which assumes optarg will be set up to point to an argument. It isn't, because as far as getopt_long is concerned, -s doesn't take an argument. Thus, it has its initial value (NULL) and atoi attempts to use that as a string. Unsurprisingly, a segmentation fault results.
Moreover, your attempt to bracket the error by inserting printf calls fails because you have failed to ensure that the printf is flushed to the actual output device. Stdio buffering makes printf a notoriously inaccurate tool for demonstrating the sequence of actions inside a program; you really cannot assume that an error preceded a call to printf just because the output from the printf was not visible.
Ideally, you should do both of the following (although either one would be sufficient in most cases):
Send debugging output to stderr using fprintf
Terminate debugging lines with a newline character
Eg: fprintf(stderr, "%s\n", "dlfkg");, although you could use a better message.
(Even if you do that, it is possible that the line output to the terminal is overwritten or otherwise fails to be presented as a result of a segfault which occurs soon afterwards. But your odds of seeing the message are a lot better.)
But if you do neither of those things, then the most likely outcome is that the characters printed will only be placed in the stdio buffer, where they will stay until the buffer becomes full or a newline is printed (if the device is line-buffered, for which there is no guarantee). When the program blows up as a result of the segfault, the stdio buffers vanish into thin air, so nothing ever gets printed. Thus the non-appearance of the line tells you precisely nothing about the sequence of events.
The small amount of extra typing would have been a lot less than asking this question here and responding to the resulting comments. Just sayin'

Defining _POSIX_C_SOURCE as 2 causes error when changing code page on Windows CMD with MinGW GCC

I've been writing a Linux program that's meant to write non-English characters to the terminal, and I've recently been porting it to Windows. I've run into some issues when trying to change the code page and the font of the terminal: having the symbolic constant _POSIX_C_SOURCE previously defined seems to change the behavior of the code and makes it incapable of properly printing non-English characters. For reference, this is my code.
#include <windows.h>
#include <stdio.h>
int main()
{
SetConsoleCP(CP_UTF8);
SetConsoleOutputCP(CP_UTF8);
HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
CONSOLE_FONT_INFOEX cfie;
ZeroMemory(&cfie, sizeof(cfie));
cfie.cbSize = sizeof(cfie);
lstrcpyW(cfie.FaceName, L"Lucida Console");
SetCurrentConsoleFontEx(hStdOut, 0, &cfie);
printf("Ћирилични текст\n");
return 0;
}
This is what the program prints out depending on whether I do or don't define the constant in a command line argument while compiling.
C:\Users\User\Desktop>gcc test.c
C:\Users\User\Desktop>a.exe
Ћириличан текст
C:\Users\User\Desktop>gcc -D_POSIX_C_SOURCE=2 test.c
C:\Users\User\Desktop>a.exe
������������������ ����������
This is because, when POSIX compliance is in effect, MinGW substitutes a different implementation of printf, one that writes to standard output literally byte by byte. The console then decodes each byte on its own, so multi-byte UTF-8 sequences never arrive intact and are shown as replacement characters.

How to make an ncurses program work with other Linux utils?

Suppose I have an ncurses program which does some work on the curses screen and finally prints something to stdout. Call this program c.c, compiled to a.out.
I expect cat $(./a.out) to first fire up ncurses; after some interaction, a.out quits and prints c.c to stdout, which is read by cat, thus printing the contents of the file c.c.
#include <stdio.h>
#include <ncurses.h>
int main() {
initscr();
noecho();
cbreak();
printw("hello world");
refresh();
getch();
endwin();
fprintf(stdout, "c.c");
return 0;
}
I also expect ./a.out | xargs vim and ls | ./a.out | xargs less to work.
But when I type ./a.out | xargs vim, hello world never shows up. The commands seem not to run in order, and vim does not open c.c.
What is the correct way to make an ncurses program work with other Linux utilities?
Pipes use the standard output (stdout) and standard input (stdin).
The simplest way: rather than using initscr, which initializes the output to use the standard output, use newterm, which allows you to choose the output and input streams, e.g.,
newterm(NULL, stderr, stdin);
rather than
initscr();
which is (almost) the same as
newterm(NULL, stdout, stdin);
By the way, when you include <ncurses.h> (or <curses.h>), there is no need to include <stdio.h>.
If you wanted to use your program in the middle of a pipe, that is more complicated: you would have to drain the standard input and open the actual terminal device. But that's another question (and has already been answered).
Further reading:
initscr, newterm, endwin, isendwin, set_term, delscreen -
curses screen initialization and manipulation routines
ncurses works by writing a bunch of ANSI escapes to stdout, which the terminal will interpret. You can run ./a.out > file and then inspect the file to see what you're actually writing. It'll be immediately obvious why programs get confused:
$ cat -vE file
^[(B^[)0^[[?1049h^[[1;24r^[[m^O^[[4l^[[H^[[Jhello world^[[24;1H^[[?1049l^M^[[?1l^[>c.c
The correct way of doing this is to skip all the graphical/textual UI parts when you detect that stdout is not a terminal, i.e. it's consumed by a program instead of a user:
#include <unistd.h>
#include <stdio.h>
#include <ncurses.h>
int main() {
if(isatty(1)) {
// Output is a terminal. Show stuff to the user.
initscr();
noecho();
cbreak();
printw("hello world");
refresh();
getch();
endwin();
} else {
// Output is consumed by a program.
// Skip UI.
}
fprintf(stdout, "c.c");
return 0;
}
This is the canonical Unix behavior.
If you instead want to force your UI to be shown regardless, you can draw your UI on stderr.
