(Android NDK) Strings containing non-ASCII characters get cut off - c

I'm trying to port a program written in C to Android using the NDK and JNI, and I'm stuck with a ridiculous problem which is driving me crazy.
To make it short, if I do this...
char str[1024];
sprintf(str, "Hellö, this is söme stränge letters.");
...strlen(str) returns 35, as expected. Right?
But if I include a specifier, and do this...
char str[1024];
sprintf(str, "Hellö again. Here's a number: %d", 1);
...strlen(str) returns 4.
Do you see what's happening? It appears the NDK can't (or won't?) accept non-ASCII characters in strings, if I try to format them.
Any time I include a non-ASCII character (a byte value above 127) in the format string, the output gets cut off at that point, as if the string were NUL-terminated there.
Is this a bug? Is this expected behaviour?
Ultimately, my question is: What can I do to solve this?
Many thanks in advance.

A "preview" version of Android 5.0 had some issues that were fixed in the final release. See this bug report for more information.
If you get a hex dump of the .o file (with e.g. xxd on Linux) and search for a fragment of the string, you can see how it's encoded in the executable. If it's valid UTF-8 -- I get c3 b6 for 'ö' when I compile with desktop gcc -- then it should work. If it's using some other encoding, the Android libc may reject it as invalid.
If the string in the binary doesn't appear to be UTF-8, check your makefiles for things like -fexec-charset=.

Related

Can I change the text color through sprintf in C?

I'm new to C and I came across this code and it was confusing me:
sprintf(banner1, "\e[37╔═╗\e[37┌─┐\e[37┌┐┌\e[37┌─┐\e[37┌─┐\e[37┌─┐\e[37┌─┐\e[37m\r\n");
sprintf(banner2, "\e[37╠═╝\e[37├─┤\e[37│││\e[37│ ┬\e[37├─┤\e[37├┤\e[37 ├─┤\e[37m\r\n");
sprintf(banner3, "\e[37╩ \e[37┴ ┴┘\e[37└┘\e[37└─┘\e[37┴ ┴\e[37└─┘\e[37┴ ┴\e[37m\r\n");
I was just confused, as I don't know what \e[37 and \r\n mean. And can I change the colors?
This looks like an attempt to use ANSI terminal color escapes and Unicode box drawing characters to write the word "PANGAEA" in a large, stylized, colorful manner. I'm guessing it's part of a retro-style BBS or MUD system, intended to be interacted with over telnet or ssh. It doesn't work, because whoever wrote it made a bunch of mistakes. Here's a corrected, self-contained program:
#include <stdio.h>

/* Note: \e is a GCC/Clang extension for the ESC character; \033 is the portable spelling. */
int main(void)
{
    printf("\e[31m╔═╗\e[32m┌─┐ \e[33m┌┐┌\e[34m┌─┐\e[35m┌─┐\e[36m┌─┐\e[37m┌─┐\e[0m\n");
    printf("\e[31m╠═╝\e[32m├─┤ \e[33m│││\e[34m│ ┬\e[35m├─┤\e[36m├┤ \e[37m├─┤\e[0m\n");
    printf("\e[31m╩ \e[32m┴ ┴┘\e[33m┘└┘\e[34m└─┘\e[35m┴ ┴\e[36m└─┘\e[37m┴ ┴\e[0m\n");
    return 0;
}
The mistakes were: using \r\n instead of plain \n, leaving out the m at the end of each and every escape sequence, and a number of typos in the actual letters (missing spaces and the like).
I deliberately changed sprintf(bannerN, ... to printf to make it a self-contained program instead of a fragment of a larger system, and changed the actual color codes used for each letter to make it a more interesting demo. When I run this program on my computer, each letter comes out in a different color.
The program will only work on your computer if your terminal emulator supports both ANSI color escapes and printing UTF-8 with no special ceremony. Most Unix-style operating systems nowadays support both by default; I don't know about Windows.

Why does sprintf not work when \"%s\" is used?

I'm using sprintf with the IMXRT1021 NXP microcontroller but not getting the required output.
Library: Redlib (nohost-nf)
I have tried both ways but the result is the same.
sprintf(at,"AT=\x22%s\x22,\x22%s\x22\r\n","abcdef","123456");
sprintf(at,"AT=\"%s\",\"%s\"\r\n","abcdef","123456");
Expected output:
AT="abcdef","123456"\r\n
Actual output:
AT=\"abcdef\",\"123456\"\r\n
It depends on how you are inspecting the string.
If you were to output this into a terminal, the string you would see is the one you expected:
AT="abcdef","123456" # plus newline etc.
However, the C representation of that string is:
"AT=\"abcdef\",\"123456\"\r\n"

Stray 377 and 376

I am new to Linux and I am trying to compile a simple C program, which I wrote in a text editor:
#include<stdio.h>
void main(){
printf("Hello!");
}
I typed gcc -o main main.c
and the following issue shows up
main.c:1:1: error: stray '\377' in program
# i n c l u d e < s t d i o . h >
main.c:1:2: error: stray '\376' in program
This happens whenever I compile a C or C++ program.
\377 and \376 are the octal escapes for the bytes 0xFF and 0xFE. In that order they form the UTF-16 little-endian byte order mark (the code point U+FEFF). Your compiler doesn't expect those bytes in your source code.
You need to change the encoding of your source file to either be UTF-8 or ASCII. Given the number of text editors that exist and the lack of that information in your question I cannot list every possibility for how to do that.
You could just do this in a bash shell:
cat > program.c
// File content here
^D
This will create a file called "program.c" with "// File content here" as its content, in UTF-8.
Your text editor is saving the program in the wrong character encoding. Save it as ASCII plain text and try again.
There is no text but encoded text.
With your editor, you have chosen to save your text file with the UTF-16LE character encoding (presumably).
Any program that reads a text file must know the character encoding of the text file. It could accept one documented character encoding (only or default) and/or allow you to tell it which you used.
This could work
gcc -finput-charset=UTF-16LE main.c
but since you have include files, the include files must use the same character encoding. On my system, they use UTF-8 (and include ©, which is good because gcc chokes on the bytes for that, letting me know that I've messed up).
Note: It's not very common to save a C source file (or almost any text file) as UTF-16. UTF-8 is very common for all types of text files. (ASCII is not very common either. You might not find it as an option in many text editors. Historically, MS-DOS did not support it and Windows only got it very late and only for the sake of completeness.)

wprintf with UNICODE (Hebrew) characters

I have a wchar_t array with English and Hebrew characters and when I print it with wprintf() it prints to console the English characters only. When I'm using _wsetlocale( LC_ALL, L"Hebrew" ) I get the Hebrew characters as "????".
The machine I'm working on supports Hebrew of course.
BTW - using c:\windows\system32\cmd.exe and 'dir' on a directory with Hebrew characters, also shows "???" instead of Hebrew.
Any idea?
Have you confirmed that your console font can handle Unicode characters? Most don't. You might try the Consolas font.
When I've run into this before, I've found this article by Michael Kaplan to be extremely helpful.
Basically Microsoft's C runtime library isn't implemented very well to allow this.
You can do _setmode(_fileno(stdout), _O_U16TEXT); (declared in <io.h>, with _O_U16TEXT coming from <fcntl.h>) and then writing with wcout or wprintf will work. However, trying to use cout or printf, or anything else that doesn't write UTF-16, will then cause the program to crash.

Writing to a file in Unicode

I am having some problems writing to a file in Unicode from my C program. I am trying to write a Unicode Japanese string to a file. When I check the file, though, it is empty. If I try a non-Unicode string it works just fine. What am I doing wrong?
setlocale(LC_CTYPE, "");
FILE* f;
f = _wfopen(COMMON_FILE_PATH,L"w");
fwprintf(f,L"日本語");
fclose(f);
Oh about my system:
I am running Windows. And my IDE is Visual Studio 2008.
You might need to add the encoding to the mode. Possibly this:
f = _wfopen(COMMON_FILE_PATH,L"w, ccs=UTF-16LE");
Doing the same with fopen() works for me here. I'm using Mac OS X, so I don't have _wfopen(); assuming _wfopen() isn't returning bad stuff to you, your code should work.
Edit: I tested on cygwin, too - it also seems to work fine.
I cannot find a reference to _wfopen on either of my boxes. However, I don't see why opening the file with fopen should cause a problem; all you need is a file pointer.
What matters is whether the C library recognizes the Unicode values internally and writes the corresponding bytes to the file correctly.
Try just using fopen as Carl suggested, it should work properly.
Edit: if it still doesn't work you may try defining the characters as their integer values and pushing them with fwprintf(), I know that's cumbersome and not a good fix in the long run, but it should work as well.
