Latin Capital Letter 'E' with Circumflex (Ê) - c

In a C program on Windows 10, I need to print the word TYCHÊ on the screen, but the letter Ê (hex code \xCA) does not print correctly:
#include <stdlib.h>
#include <stdio.h>
char *Word;
int main(int argc, char* argv[]){
Word = "TYCH\xCA";
printf("%s", Word);
}
What's wrong?

Windows is a pain when it comes to printing Unicode text, but the following should work with all modern compilers (MSVC 19 or later, g++ 9 or greater) on all modern Windows systems (Windows 10 or greater), in both Windows Console and Windows Terminal:
#include <iostream>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
std::cout << "TYCHÊ" << "\n";
}
Make sure your compiler takes UTF-8 as the input character set. For MSVC 19 you need a flag. I think it is the default for later versions, but I am unsure on that point:
cl /EHsc /W4 /Ox /std:c++17 /utf-8 example.cpp
g++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 example.cpp
EDIT: Dangit, I misread the language tag again. :-(
Here’s some C:
#include <stdio.h>
#include <windows.h>
int main()
{
SetConsoleOutputCP( CP_UTF8 );
printf( "%s\n", "TYCHÊ" );
return 0;
}

You can try this line:
printf("%s%c", Word, 0x2580 + 82);
It can print your Ê.
I used CLion to work this out; another IDE may not give the same result.

In the Windows Command Line you should choose the Code Page 65001:
CHCP 65001
If you want to silently do that directly from the source code:
system("CHCP 65001 > NUL");
In the C source code you should use the <locale.h> standard header.
#include <locale.h>
At the beginning of your program execution you can write:
setlocale(LC_ALL, "");
The empty string "" selects the default locale and encoding of the underlying system (which you previously set to UTF-8 with CHCP).
However, this answer of mine is just a patch, not a solution.
It will help you print French characters, at most.
Handling encodings in the Windows command line is not straightforward.
See, for example: Command Line and UTF-8 issues

Related

In linux program cannot erase utf-8 character completely using backspace

I have a C program which waits for user's input
#include <stdio.h>
int main()
{
getchar();
return 0;
}
Now I run it and input some Chinese characters like 测试测试. When I then press Backspace, I find that I cannot erase these characters completely (some of them remain on screen).
I found that termios has a flag called IUTF8, but why doesn't it work?
UPDATE ON 2022/12/31:
I will try to describe my question in more detail. I have a program like the one above.
Now I run it and enter some Chinese characters (without pressing Enter).
Then I keep pressing Backspace until nothing more can be erased, but half of the content is still displayed on my screen. This is not what I expect; how can I make erasing work properly?
I know it may be a stupid question. I just want typing UTF-8 characters (such as Chinese characters) to be more comfortable.
I found that the shell handles this well. What can I do to make my program behave the same way?
By the way, this is my locale output
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=zh_CN.UTF-8
LC_TIME=zh_CN.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=zh_CN.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=zh_CN.UTF-8
LC_NAME=zh_CN.UTF-8
LC_ADDRESS=zh_CN.UTF-8
LC_TELEPHONE=zh_CN.UTF-8
LC_MEASUREMENT=zh_CN.UTF-8
LC_IDENTIFICATION=zh_CN.UTF-8
LC_ALL=
Use GNU readline to provide a shell-like interface, with Tab autocompletion, correct input handling, et cetera.
To compile the following example program, make sure you have the libreadline-dev package installed. The readline library needed to run the program will already be installed, because so many applications that are installed by default require it already.
// SPDX-License-Identifier: CC0-1.0
// Compile using
// gcc -Wall -O2 $(pkg-config --cflags readline) example.c $(pkg-config --libs readline) -o example
#define _GNU_SOURCE
#include <stdlib.h>
#include <locale.h>
#include <readline/readline.h>
#include <readline/history.h>
#include <stdio.h>
int main(void)
{
char *line;
setlocale(LC_ALL, "");
while (1) {
line = readline(NULL); // No prompt
// Input line is in 'line'; exit if end of input or empty line.
if (!line || *line == '\0')
break;
// Do something with 'line'
// Discard the dynamically allocated line
free(line);
}
return 0;
}
When using the GNU readline library, the library takes over the standard input, and handles character deletion (and many other things) at the terminal (termios) level. It works absolutely fine with file and pipe inputs as well, and is what e.g. bash shell uses for interactive input.

Simple C Program Lags [Homework]

For an assignment, we are to find vulnerabilities in a certain C program and exploit them using various buffer overflow attacks. However, when I run the .out file in the terminal with its input argument, it just stalls and doesn't do anything.
Even when I run GDB, that just lags too. I'm not looking for a solution to the assignment, I'm just looking for reasons why it's not running?
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
void partialwin()
{
printf("Achieved 1/2!\n");
}
void fullwin(){
printf("Achieved 2/2\n");
}
void vuln(){
char buffer[36];
gets(buffer);
printf("Buffer contents are %s\n",buffer);
}
int main(int argc,char**argv){
vuln();
}
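Provided your source file is called assignment1.c and you're using gcc, this should work ($ is your command prompt, which may be different on your platform). Note that the program ignores its command-line arguments and reads a line from standard input with gets(), so it waits until you type something and press Enter: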
$ gcc assignment1.c
$ ./a.out
Hello
Buffer contents are Hello
$

Linux, field_buffer does not provide a UTF-8 string

In a C program for Linux, with ncursesw and form, I need to read the string stored in a field, with support for UTF-8 characters. When ASCII only is used, it is pretty simple, because the string is stored as an array of char:
char *dest;
...
dest = field_buffer(field[0], 0);
If I try to type a non-ASCII UTF-8 character in the field with this code, the character does not appear and is not handled. In this answer for UTF-8 it is suggested to use ncursesw. But with the following code (written following this guide)
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <locale.h>
int main()
{
...
setlocale(LC_ALL, "");
...
initscr();
wchar_t *dest;
...
dest = field_buffer(field[0], 0);
}
the compiler produces a warning:
warning: assignment from incompatible pointer type [enabled by default]
dest = field_buffer(field[0], 0);
^
How to obtain from the field an array of wchar_t?
ncursesw uses get_wch instead of getch, so which function does it use instead of field_buffer()? I couldn't find it by googling.
The program is compiled in a system with the following locale:
$ locale
LANG=it_IT.UTF-8
LANGUAGE=
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
LC_COLLATE="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8"
LC_MESSAGES="it_IT.UTF-8"
LC_PAPER="it_IT.UTF-8"
LC_NAME="it_IT.UTF-8"
LC_ADDRESS="it_IT.UTF-8"
LC_TELEPHONE="it_IT.UTF-8"
LC_MEASUREMENT="it_IT.UTF-8"
LC_IDENTIFICATION="it_IT.UTF-8"
LC_ALL=
It supports and uses UTF-8 as a default. With a locale like this, when the ncursesw environment is used, the C program should be able to save UTF-8 characters into a char array.
In order to correctly set up ncursesw it is very important to follow all the steps of the mentioned guide. In particular, the program should have the header
#define _XOPEN_SOURCE_EXTENDED
#include <ncursesw/form.h>
#include <stdio.h>
#include <locale.h>
The program should be compiled as
gcc -o executable_file source_file.c -lncursesw -lformw
and the program should contain
setlocale(LC_ALL, "");
before initscr();. With all these conditions satisfied, the string can be saved into a normal char array, as if ncurses and ASCII were used instead of ncursesw and UTF-8. As specified by John Bollinger in the comments, the function field_buffer can only return a char *, so there is no point in assigning its result to any other type such as wchar_t *.

Compiling with Mingw

I have a bunch of C files and header files in the folder. When I compile the C files with MinGW compiler, it shows that there is no such file or directory. But I have all the files in the same folder. How do I get them to compile?
I have attached the code for your reference (file computil.c):
#include <stdio.h>
#include <computil.h>
#include <dataio.h>
int getc_skip_marker_segment(const unsigned short marker, unsigned char **cbufptr, unsigned char *ebufptr)
{
int ret;
unsigned short length;
ret = getc_ushort(&length, cbufptr, ebufptr);
if(ret)return(ret);
length -= 2;
if(((*cbufptr)+length) >= ebufptr)
{
fprintf(stderr, "ERROR : getc_skip_marker_segment : ");
fprintf(stderr, "unexpected end of buffer when parsing ");
fprintf(stderr, "marker %d segment of length %d\n", marker, length);
return(-2);
}
(*cbufptr) += length;
return(0);
}
I am compiling it with gcc -c computil.c.
I believe you are going to have to add the current directory to the list of "standard places" that gcc searches. When you use <computil.h> instead of "computil.h", a Unix-style compiler won't look in the current directory.
For a quick fix to that, add -I. to the gcc command line. (dash, capital eye, period):
gcc -I. -c computil.c
If that's an application include file intended to be found where the source files are found, then you should change the include line to:
#include "computil.h"
That's one of the valuable nuances from Classic C that got lost in the ANSI standardization process. Standard C lets the compiler decide whether there's a difference between <> bracketed and "" quoted headers. It makes a difference in Unix, and GNU ("GNU's Not Unix!"), well, pretty much is Unix, only better in places.
To put it simply, #include <header.h> means "search in the compiler's own library directories", while #include "header.h" means "search first in the same directory as the .c file that made the #include".
I don't believe gcc has any library headers named computil.h and dataio.h, so the code won't compile.

Accepting single byte special characters into the windows terminal (testing on cygwin)

I am testing a C program in the windows terminal. I mocked up a quick example of the section I am having issues with. The example is as follows:
$ cat test.c
#include <stdio.h>
#include <stdlib.h>
int main() {
char var[6];
scanf("%s", var);
int i=0;
while(var[i] != '\0') {
printf("%x ", var[i]);
i++;
}
return 0;
}
When I use a string of "normal" characters such as "aa", the output is "61 61" as expected (hex 61 is the letter "a"). When I try to input a special character such as í (0xA1 in code page 437/850, U+00ED in Unicode), I get the following output:
$ ./a.exe
í
ffffffc3 ffffffad
The UTF-8 table at http://www.utf8-chartable.de/ shows that this character is in fact encoded as 0xc3ad in UTF-8. How can I copy and paste this character as 0xA1? I really want to input the single byte 0xA1 into the terminal, not 0xc3ad. I am copying and pasting it from "charmap". I even tried saving a text file in ANSI with the character and copying and pasting from that, but I still get 0xc3ad. Please assist me.
EDIT: Running the same on a mac also gives me c3ad.

Resources