I am trying to print unicode to the terminal under linux using the wchar_t type defined in the wchar.h header. I have tried the following:
#include <wchar.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
/*
char* direct = "\xc2\xb5";
fprintf(stderr, "%s\n", direct);
*/
wchar_t* dir_lit = L"μ";
wchar_t* uni_lit = L"\u03BC";
wchar_t* hex_lit = L"\xc2\xb5";
fwprintf(stderr,
L"direct: %ls, unicode: %ls, hex: %ls\n",
dir_lit,
uni_lit,
hex_lit);
return 0;
}
and compiled it using gcc -O0 -g -std=c11 -o main main.c.
This produces the output direct: m, unicode: m, hex: ?u (based on a terminal with LANG=en_US.UTF-8). In hex:
00000000 64 69 72 65 63 74 3a 20 6d 2c 20 75 6e 69 63 6f |direct: m, unico|
00000010 64 65 3a 20 6d 2c 20 68 65 78 3a 20 3f 75 0a |de: m, hex: ?u.|
0000001f
The only way that I have managed to obtain the desired output of μ is via the code commented out above (as a char* consisting of hex escapes).
I have also tried to print using the wcstombs function:
void print_wcstombs(wchar_t* str)
{
char buffer[100];
wcstombs(buffer, str, sizeof(buffer));
fprintf(stderr, "%s\n", buffer);
}
If I call for example print_wcstombs(dir_lit), then nothing is printed at all, so this approach does not seem to work at all.
I would be content with the hex digit solution in principle; however, the length of the string is not calculated correctly (it should be one, but is two bytes long), so formatting via printf does not work correctly.
Is there any way to handle / print unicode literals the way I intend using the wchar_t type?
With your program as-is, I compiled and ran it to get
direct: ?, unicode: ?, hex: ?u
I then included <locale.h> and added a setlocale(LC_CTYPE, ""); at the very beginning of the main() function, which, when run using a Unicode locale (LANG=en_US.UTF-8), produces
direct: μ, unicode: μ, hex: Âµ
(Code point 0xC2 is Â in Unicode and 0xB5 is µ (U+00B5 MICRO SIGN, as opposed to U+03BC GREEK SMALL LETTER MU); hence the characters seen for the 'hex' output. Results might vary in an environment that does not use Unicode for wide characters.)
Basically, to output wide characters you need to set the ctype locale so the stdio system knows how to convert them to the multibyte ones expected by the underlying system.
The updated program:
#include <wchar.h>
#include <stdio.h>
#include <locale.h>
int main(int argc, char *argv[])
{
setlocale(LC_CTYPE, "");
wchar_t* dir_lit = L"μ";
wchar_t* uni_lit = L"\u03BC";
wchar_t* hex_lit = L"\xc2\xb5";
fwprintf(stderr,
L"direct: %ls, unicode: %ls, hex: %ls\n",
dir_lit,
uni_lit,
hex_lit);
return 0;
}
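The wcstombs attempt from the question most likely fails for the same reason: without setlocale, the program runs in the "C" locale, where μ cannot be converted, so wcstombs returns (size_t)-1 and the buffer is never filled. Here is a minimal sketch of a checked variant. The name print_wide_checked and the 100-byte buffer are only illustrative, and it assumes setlocale(LC_CTYPE, "") has been called first if non-ASCII output is expected:

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Converts str to the current locale's multibyte encoding and prints it,
 * followed by a newline, to stream.
 * Returns 0 on success, or -1 if some character cannot be represented in
 * the current locale (for example μ in the default "C" locale). */
int print_wide_checked(FILE *stream, const wchar_t *str)
{
    char buffer[100];
    /* Leave one byte free so the result can always be terminated. */
    size_t n = wcstombs(buffer, str, sizeof buffer - 1);

    if (n == (size_t)-1)
        return -1;          /* conversion failed, buffer is not usable */
    buffer[n] = '\0';
    fprintf(stream, "%s\n", buffer);
    return 0;
}
```

Checking the return value makes the failure visible instead of silently printing an unfilled buffer. As for the length problem, wcslen on the wide string counts wide characters rather than bytes, so it gives 1 for L"μ".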
I have a program in C, which calculates sha256 hash of input file. It is using the openSSL library. Here is the core of the program:
#include <openssl/sha.h>
SHA256_CTX ctx;
unsigned char buffer[512];
SHA256_Init(&ctx);
SHA256_Update(&ctx, buffer, len);
SHA256_Final(buffer, &ctx);
fwrite(&buffer,32,1,stdout);
I need to change it to calculate sha512 hash instead.
Can I just (naively) change all the names of the functions from SHA256 to SHA512, and then in the last step fwrite 64 bytes, instead of the 32 bytes ? Is that all, or do I have to make more changes ?
Yes, this will work. The man page for the SHA family of functions lists the following:
int SHA256_Init(SHA256_CTX *c);
int SHA256_Update(SHA256_CTX *c, const void *data, size_t len);
int SHA256_Final(unsigned char *md, SHA256_CTX *c);
unsigned char *SHA256(const unsigned char *d, size_t n,
unsigned char *md);
...
int SHA512_Init(SHA512_CTX *c);
int SHA512_Update(SHA512_CTX *c, const void *data, size_t len);
int SHA512_Final(unsigned char *md, SHA512_CTX *c);
unsigned char *SHA512(const unsigned char *d, size_t n,
unsigned char *md);
...
SHA1_Init() initializes a SHA_CTX structure.
SHA1_Update() can be called repeatedly with chunks of the message
to be hashed (len bytes at data).
SHA1_Final() places the message digest in md, which must have space
for SHA_DIGEST_LENGTH == 20 bytes of output, and erases the
SHA_CTX.
The SHA224, SHA256, SHA384 and SHA512 families of functions operate
in the same way as for the SHA1 functions. Note that SHA224 and
SHA256 use a SHA256_CTX object instead of SHA_CTX. SHA384 and
SHA512 use SHA512_CTX. The buffer md must have space for the
output from the SHA variant being used (defined by
SHA224_DIGEST_LENGTH, SHA256_DIGEST_LENGTH, SHA384_DIGEST_LENGTH
and SHA512_DIGEST_LENGTH). Also note that, as for the SHA1()
function above, the SHA224(), SHA256(), SHA384() and SHA512()
functions are not thread safe if md is NULL.
To confirm, let's look at some code segments. First with SHA256:
SHA256_CTX ctx;
unsigned char buffer[512];
char *str = "this is a test";
int len = strlen(str);
strcpy(buffer,str);
SHA256_Init(&ctx);
SHA256_Update(&ctx, buffer, len);
SHA256_Final(buffer, &ctx);
fwrite(&buffer,32,1,stdout);
When run as:
./test1 | od -t x1
Outputs:
0000000 2e 99 75 85 48 97 2a 8e 88 22 ad 47 fa 10 17 ff
0000020 72 f0 6f 3f f6 a0 16 85 1f 45 c3 98 73 2b c5 0c
0000040
Which matches the output of:
echo -n "this is a test" | openssl sha256
Which is:
(stdin)= 2e99758548972a8e8822ad47fa1017ff72f06f3ff6a016851f45c398732bc50c
Now the same code with the changes you suggested:
SHA512_CTX ctx;
unsigned char buffer[512];
char *str = "this is a test";
int len = strlen(str);
strcpy(buffer,str);
SHA512_Init(&ctx);
SHA512_Update(&ctx, buffer, len);
SHA512_Final(buffer, &ctx);
fwrite(&buffer,64,1,stdout);
The output when passed through "od" gives us:
0000000 7d 0a 84 68 ed 22 04 00 c0 b8 e6 f3 35 ba a7 e0
0000020 70 ce 88 0a 37 e2 ac 59 95 b9 a9 7b 80 90 26 de
0000040 62 6d a6 36 ac 73 65 24 9b b9 74 c7 19 ed f5 43
0000060 b5 2e d2 86 64 6f 43 7d c7 f8 10 cc 20 68 37 5c
0000100
Which matches the output of:
echo -n "this is a test" | openssl sha512
Which is:
(stdin)= 7d0a8468ed220400c0b8e6f335baa7e070ce880a37e2ac5995b9a97b809026de626da636ac7365249bb974c719edf543b52ed286646f437dc7f810cc2068375c
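The snippets above write the raw digest bytes and rely on od to make them readable. If hex text comparable to the openssl command-line output is wanted directly, a small helper can do the formatting. This is a sketch that does not depend on OpenSSL, and the name digest_to_hex is just an illustration:

```c
#include <stddef.h>

/* Renders len digest bytes as lowercase hex.
 * out must have room for 2 * len + 1 bytes (two digits per byte plus
 * the terminating NUL). */
void digest_to_hex(const unsigned char *md, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[md[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[md[i] & 0x0F];  /* low nibble  */
    }
    out[2 * len] = '\0';
}
```

With SHA512_Final(buffer, &ctx) as above, one would call it with len set to SHA512_DIGEST_LENGTH and an output array of 2 * SHA512_DIGEST_LENGTH + 1 chars.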
I am using macOS and openssl. This works for me:
#include <openssl/sha.h>
#include <stdio.h>
#include <string.h>
int main() {
unsigned char data[] = "some text";
unsigned char hash[SHA512_DIGEST_LENGTH];
SHA512(data, strlen((char *)data), hash);
for (int i = 0; i < SHA512_DIGEST_LENGTH; i++)
printf("%02x", hash[i]);
putchar('\n');
}
I compile using,
~$ gcc -o sha512 sha512.c \
-I /usr/local/opt/openssl/include \
-L /usr/local/opt/openssl/lib \
-lcrypto
~$ ./sha512
e2732baedca3eac1407828637de1dbca702c3fc9ece16cf536ddb8d6139cd85dfe7464b8235
b29826f608ccf4ac643e29b19c637858a3d8710a59111df42ddb5
NEW EDIT:
Basically, the example I provided wasn't accurate. In my real application the string will of course not always be "C:/Users/Familjen-Styren/Documents/V\u00E5gformer/20140104-0002/text.txt". Instead I will have an input window in Java, where I will "escape" the Unicode characters to universal character names. They will then be "unescaped" in C (I do this to avoid problems with passing multibyte characters from Java to C). So here is an example where I actually ask the user to input a string (filename):
#include <stdio.h>
#include <string.h>
int func(const char *fname);
int main()
{
char src[100];
scanf("%99s", src); /* pass the array itself and limit the input to the buffer size */
printf("%s\n", src);
int exists = func((const char*) src);
printf("Does the file exist? %d\n", exists);
return exists;
}
int func(const char *fname)
{
FILE *file;
if (file = fopen(fname, "r"))
{
fclose(file);
return 1;
}
return 0;
}
And now it will treat the universal character names as just a part of the actual filename. So how do I "unescape" the universal character names included in the input?
FIRST EDIT:
So I compile this example like this: "gcc -std=c99 read.c" where 'read.c' is my source file. I need the -std=c99 parameter because I'm using the prefix '\u' for my universal character name. If I change it to '\x' it works fine, and I can remove the -std=c99 parameter. But in my real application the input will not use the prefix '\x' instead it will be using the prefix '\u'. So how do I work around this?
This code gives the desired result but for my real application I can't really use '\x':
#include <stdio.h>
#include <string.h>
int func(const char *fname);
int main()
{
char *src = "C:/Users/Familjen-Styren/Documents/V\x00E5gformer/20140104-0002/text.txt";
int exists = func((const char*) src);
printf("Does the file exist? %d\n", exists);
return exists;
}
int func(const char *fname)
{
FILE *file;
if (file = fopen(fname, "r"))
{
fclose(file);
return 1;
}
return 0;
}
ORIGINAL:
I've found a few examples of how to do this in other programming languages like javascript but I couldn't find any example on how to do this in C. Here is a sample code which produces the same error:
#include <stdio.h>
#include <string.h>
int func(const char *fname);
int main()
{
char *src = "C:/Users/Familjen-Styren/Documents/V\u00E5gformer/20140104-0002/text.txt";
int len = strlen(src); /* This returns 68. */
char fname[len];
sprintf(fname,"%s", src);
int exists = func((const char*) src);
printf("%s\n", fname);
printf("Does the file exist? %d\n", exists); /* Outputs 'Does the file exist? 0' which means it doesn't exist. */
return exists;
}
int func(const char *fname)
{
FILE *file;
if (file = fopen(fname, "r"))
{
fclose(file);
return 1;
}
return 0;
}
If I instead use the same string without universal character names:
#include <stdio.h>
#include <string.h>
int func(const char *fname);
int main()
{
char *src = "C:/Users/Familjen-Styren/Documents/Vågformer/20140104-0002/text.txt";
int exists = func((const char*) src);
printf("Does the file exist? %d\n", exists); /* Outputs 'Does the file exist? 1' which means it does exist. */
return exists;
}
int func(const char *fname)
{
FILE *file;
if (file = fopen(fname, "r"))
{
fclose(file);
return 1;
}
return 0;
}
it will output 'Does the file exist? 1', which means it does indeed exist. But the problem is that I need to be able to handle universal character names. So how do I unescape a string which contains universal character names?
Thanks in advance.
I'm re-editing the answer in the hope of making it clearer. First of all, I'm assuming you are familiar with this: http://www.joelonsoftware.com/articles/Unicode.html. It is required background knowledge when dealing with character encodings.
Now I'm starting with a simple test program, test.c, which I typed on my Linux machine:
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#define BUF_SZ 255
void test_fwrite_universal(const char *fname)
{
printf("test_fwrite_universal on %s\n", fname);
printf("In memory we have %d bytes: ", (int)strlen(fname)); /* strlen returns size_t, so cast for %d */
for (unsigned i=0; i<strlen(fname); ++i) {
printf("%x ", (unsigned char)fname[i]);
}
printf("\n");
FILE* file = fopen(fname, "w");
if (file) {
fwrite((const void*)fname, 1, strlen(fname), file);
fclose(file);
file = NULL;
printf("Wrote to file successfully\n");
}
}
int main()
{
test_fwrite_universal("file_\u00e5.txt");
test_fwrite_universal("file_å.txt");
test_fwrite_universal("file_\u0436.txt");
return 0;
}
The source file is encoded as UTF-8. On my Linux machine my locale is en_US.UTF-8.
So I compile and run the program like this:
gcc -std=c99 test.c -fexec-charset=UTF-8 -o test
test
test_fwrite_universal on file_å.txt
In memory we have 11 bytes: 66 69 6c 65 5f c3 a5 2e 74 78 74
Wrote to file successfully
test_fwrite_universal on file_å.txt
In memory we have 11 bytes: 66 69 6c 65 5f c3 a5 2e 74 78 74
Wrote to file successfully
test_fwrite_universal on file_ж.txt
In memory we have 11 bytes: 66 69 6c 65 5f d0 b6 2e 74 78 74
Wrote to file successfully
The source file is in UTF-8, my locale is based on UTF-8, and the execution character set for char is UTF-8.
In main I call the function test_fwrite_universal three times with character strings. The function prints each string byte by byte, then creates a file with that name and writes the string into it.
We can see that "file_\u00e5.txt" and "file_å.txt" are the same: 66 69 6c 65 5f c3 a5 2e 74 78 74
and sure enough (http://www.fileformat.info/info/unicode/char/e5/index.htm) the UTF-8 representation of code point U+00E5 is: c3 a5
In the last example I used \u0436, which is the Cyrillic letter ж (UTF-8 d0 b6).
Now let's try the same on my Windows machine. Here I use MinGW and compile the same code:
C:\test>gcc -std=c99 test.c -fexec-charset=UTF-8 -o test.exe
C:\test>test
test_fwrite_universal on file_å.txt
In memory we have 11 bytes: 66 69 6c 65 5f c3 a5 2e 74 78 74
Wrote to file successfully
test_fwrite_universal on file_å.txt
In memory we have 11 bytes: 66 69 6c 65 5f c3 a5 2e 74 78 74
Wrote to file successfully
test_fwrite_universal on file_╨╢.txt
In memory we have 11 bytes: 66 69 6c 65 5f d0 b6 2e 74 78 74
Wrote to file successfully
So it looks like something went horribly wrong: printf is not writing the characters properly, and the file names on disk also look wrong.
Two things are worth noting: in terms of byte values the file name is the same on both Linux and Windows, and the content of the file is also correct when opened with something like Notepad++.
The reason for the problem is the C standard library on Windows and its locale. On Linux the system locale is UTF-8, whereas on Windows my default code page is CP-437. When I call functions such as printf or fopen, they assume the input is in CP-437, where c3 a5 is actually two characters.
Before we look at a proper Windows solution, let's try to explain why you get different results for file_å.txt vs file_\u00e5.txt.
I believe the key is the encoding of your text file. If I write the same test.c in CP-437:
C:\test>iconv -f UTF-8 -t cp437 test.c > test_lcl.c
C:\test>gcc -std=c99 test_lcl.c -fexec-charset=UTF-8 -o test_lcl.exe
C:\test>test_lcl
test_fwrite_universal on file_å.txt
In memory we have 11 bytes: 66 69 6c 65 5f c3 a5 2e 74 78 74
Wrote to file successfully
test_fwrite_universal on file_å.txt
In memory we have 10 bytes: 66 69 6c 65 5f 86 2e 74 78 74
Wrote to file successfully
test_fwrite_universal on file_╨╢.txt
In memory we have 11 bytes: 66 69 6c 65 5f d0 b6 2e 74 78 74
Wrote to file successfully
I now get a difference between file_å and file_\u00e5. The character å in the source file is actually encoded as 0x86. Notice that this time the second string is 10 bytes long, not 11.
If we look at the file and tell Notepad++ to use UTF-8 we will see a funny result. The same goes for the actual data written to the file.
Finally, how to get the damn thing working on Windows. Unfortunately, it seems that it is impossible to use the standard library with UTF-8 encoded strings: on Windows you can't set the C locale to UTF-8. See: What is the Windows equivalent for en_US.UTF-8 locale?.
However we can work around this with wide characters:
#include <stdio.h>
#include <string.h>
#include <windows.h>
#define BUF_SZ 255
void test_fopen_windows(const char *fname)
{
wchar_t buf[BUF_SZ] = {0};
int sz = MultiByteToWideChar(CP_UTF8, 0, fname, strlen(fname), (LPWSTR)buf, BUF_SZ-1);
wprintf(L"converted %d characters\n", sz);
wprintf(L"Converting to wide characters %ls\n", buf);
FILE* file =_wfopen(buf, L"w");
if (file) {
fwrite((const void*)fname, 1, strlen(fname), file);
fclose(file);
wprintf(L"Wrote file %ls successfully\n", buf);
}
}
int main()
{
test_fopen_windows("file_\u00e5.txt");
return 0;
}
To compile use:
gcc -std=gnu99 -fexec-charset=UTF-8 test_wide.c -o test_wide.exe
_wfopen is not ANSI compliant, and -std=c99 defines __STRICT_ANSI__, which hides it, so you should use -std=gnu99 to have that function declared.
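For the runtime case from the NEW EDIT in the question, the escapes arrive as plain text in the input, so the compiler never sees them and they have to be decoded by hand. Below is a minimal sketch of such a decoder. The function name unescape_ucn is made up for this example, and it assumes only 4-digit \uXXXX escapes, no surrogate pairs, and UTF-8 as the desired output encoding:

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Encodes cp (at most 0xFFFF) as UTF-8 into out; returns the byte count. */
static size_t utf8_encode(unsigned cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Copies src to dst, replacing every \uXXXX sequence by its UTF-8 bytes.
 * Malformed escapes are copied through unchanged.
 * Returns 0 on success, -1 if dst is too small. */
int unescape_ucn(const char *src, char *dst, size_t dst_size)
{
    size_t o = 0;

    while (*src) {
        unsigned cp = 0;
        int is_escape = 0;

        if (src[0] == '\\' && src[1] == 'u') {
            int i;
            is_escape = 1;
            for (i = 0; i < 4; i++) {
                /* Stops at the first non-hex byte, including the NUL. */
                int h = hex_value(src[2 + i]);
                if (h < 0) { is_escape = 0; break; }
                cp = cp * 16 + (unsigned)h;
            }
        }
        if (is_escape) {
            char enc[3];
            size_t n = utf8_encode(cp, enc);
            if (o + n >= dst_size) return -1;   /* keep room for the NUL */
            memcpy(dst + o, enc, n);
            o += n;
            src += 6;                           /* skip "\uXXXX" */
        } else {
            if (o + 1 >= dst_size) return -1;
            dst[o++] = *src++;
        }
    }
    if (dst_size == 0) return -1;
    dst[o] = '\0';
    return 0;
}
```

On Linux with a UTF-8 locale the result can be passed straight to fopen. On Windows it would still need the MultiByteToWideChar and _wfopen step shown above.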
The array size is wrong: it leaves out the ".txt" and the terminating \0, and it ignores that an encoded non-ASCII character takes up more than 1 byte.
// length of the string without the universal character name.
// C:/Users/Familjen-Styren/Documents/Vågformer/20140104-0002/text
// 123456789012345678901234567890123456789012345678901234567890123
//          1         2         3         4         5         6
// int len = 63;
// C:/Users/Familjen-Styren/Documents/Vågformer/20140104-0002/text.txt
int len = 100;
char *src = "C:/Users/Familjen-Styren/Documents/V\u00E5gformer/20140104-0002/text.txt";
char fname[len];
// or if you can use VLA
char fname[strlen(src)+1];
sprintf(fname, "%s", src);
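If the destination size is fixed rather than derived from strlen(src)+1, snprintf can report when the copy would not fit, instead of writing past the end as sprintf does. A small sketch follows; copy_path is a hypothetical helper name, and it assumes a C99-conforming snprintf:

```c
#include <stdio.h>

/* Copies src into dst, always NUL-terminating when dst_size > 0.
 * Returns 0 if the whole string fit, -1 if it was truncated or an
 * encoding error occurred. */
int copy_path(char *dst, size_t dst_size, const char *src)
{
    /* snprintf returns the length the full result would have had. */
    int needed = snprintf(dst, dst_size, "%s", src);

    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;
    return 0;
}
```

Called as copy_path(fname, sizeof fname, src), this would have flagged the undersized fname array from the question rather than overflowing it.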
I've created a C program to write to a serial port (/dev/ttyS0) on an embedded ARM system. The kernel running on the embedded ARM system is Linux version 3.0.4, built with the same cross-compiler as the one listed below.
My cross-compiler is arm-linux-gcc (Buildroot 2011.08) 4.3.6, running on an Ubuntu x86_64 host (3.0.0-14-generic #23-Ubuntu SMP). I have used the stty utility to set up the serial port from the command line.
Mysteriously, it seems that the program will refuse to run on the embedded ARM system if a single line of code is present. If the line is removed, the program will run.
Here is a full code listing replicating the problem:
EDIT: I now close the file on error, as suggested in the comments below.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <termios.h>
int test();
void run_experiment();
int main()
{
run_experiment();
return 0;
}
void run_experiment()
{
printf("Starting program\n");
test();
}
int test()
{
int fd;
int ret;
fd = open("/dev/ttyS0", O_RDWR | O_NOCTTY);
printf("fd = %d\n", fd); /* fd is a signed int and is -1 on failure */
if (fd < 0)
{
close(fd);
return 0;
}
fcntl(fd, F_SETFL, 0);
printf("Now writing to serial port\n");
//TODO:
// segfault occurs due to line of code here
// removing this line causes the program to run properly
ret = write( fd, "test\r\n", sizeof("test\r\n") );
if (ret < 0)
{
close(fd);
return 0;
}
close(fd);
return 1;
}
The output of this program on the ARM system is the following:
Segmentation fault
However, if I remove the line listed above and recompile the program, the problem goes away, and the output is the following:
Starting program
fd = 3
Now writing to serial port
What could be going wrong here, and how do I debug the problem? Would this be an issue with the code, with the cross-compiler, or with a version of the OS?
I have also tried various combinations of O_WRONLY and O_RDWR without O_NOCTTY when opening the file, but the problem still persists.
As suggested by @wildplasser in the comments below, I have replaced the test function with the following code, heavily based on the code at another site (http://www.warpspeed.com.au/cgi-bin/inf2html.cmd?..\html\book\Toolkt40\XPG4REF.INF+112).
However, the program still doesn't run, and I receive the mysterious Segmentation Fault again.
Here is the code:
int test()
{
int fh;
FILE *fp;
char *cp;
if (-1 == (fh = open("/dev/ttyS0", O_RDWR)))
{
perror("Unable to open");
return EXIT_FAILURE;
}
if (NULL == (fp = fdopen(fh, "w")))
{
perror("fdopen failed");
close(fh);
return EXIT_FAILURE;
}
for (cp = "hello world\r\n"; *cp; cp++)
fputc( *cp, fp);
fclose(fp);
return 0;
}
This is very mysterious, since using other programs that I have written, I can use the write() function in a similar fashion to write to sysfs files, without any problem.
HOWEVER, if the program is exactly in the same structure, then I cannot write to /dev/null.
BUT I can successfully write to a sysfs file using exactly the same program!
If the segfault occurred at a particular line in the function, then I would assume that the function call would be causing the segfault. However, the full program does not run!
UPDATE: To provide more information, here is the cross-compiler information used to build on ARM system:
$ arm-linux-gcc --v
Using built-in specs.
Target: arm-unknown-linux-uclibcgnueabi
Configured with: /media/RESEARCH/SAS2-version2/device-system/buildroot/buildroot-2011.08/output/toolchain/gcc-4.3.6/configure --prefix=/media/RESEARCH/SAS2-version2/device-system/buildroot/buildroot-2011.08/output/host/usr --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --target=arm-unknown-linux-uclibcgnueabi --enable-languages=c,c++ --with-sysroot=/media/RESEARCH/SAS2-version2/device-system/buildroot/buildroot-2011.08/output/host/usr/arm-unknown-linux-uclibcgnueabi/sysroot --with-build-time-tools=/media/RESEARCH/SAS2-version2/device-system/buildroot/buildroot-2011.08/output/host/usr/arm-unknown-linux-uclibcgnueabi/bin --disable-__cxa_atexit --enable-target-optspace --disable-libgomp --with-gnu-ld --disable-libssp --disable-multilib --enable-tls --enable-shared --with-gmp=/media/RESEARCH/SAS2-version2/device-system/buildroot/buildroot-2011.08/output/host/usr --with-mpfr=/media/RESEARCH/SAS2-version2/device-system/buildroot/buildroot-2011.08/output/host/usr --disable-nls --enable-threads --disable-decimal-float --with-float=soft --with-abi=aapcs-linux --with-arch=armv5te --with-tune=arm926ej-s --disable-largefile --with-pkgversion='Buildroot 2011.08' --with-bugurl=http://bugs.buildroot.net/
Thread model: posix
gcc version 4.3.6 (Buildroot 2011.08)
Here is the makefile that I am using to compile my code:
CC=arm-linux-gcc
CFLAGS=-Wall
datacollector: datacollector.o
clean:
rm -f datacollector datacollector.o
UPDATE: Using the debugging suggestions given in the comments and answers below, I found that the segfault was caused by including the \r escape sequence in the string. For some strange reason, the compiler doesn't like the \r escape sequence, and will cause a segfault without running the code.
If the \r escape sequence is removed, then the code runs as expected.
Thus, the offending line of code should be changed to the following:
ret = write( fd, "test\n", sizeof("test\n") );
So for the record, a full test program that actually runs is the following (could someone comment?):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <termios.h>
int test();
void run_experiment();
int main()
{
run_experiment();
return 0;
}
void run_experiment()
{
printf("Starting program\n");
fflush(stdout);
test();
}
int test()
{
int fd;
int ret;
char *msg = "test\n";
// NOTE: This does not work and will cause a segfault!
// even if the fflush is called after each printf,
// the program will still refuse to run
//char *msg = "test\r\n";
fd = open("/dev/ttyS0", O_RDWR | O_NOCTTY);
printf("fd = %d\n", fd); /* fd is a signed int and is -1 on failure */
fflush(stdout);
if (fd < 0)
{
close(fd);
return 0;
}
fcntl(fd, F_SETFL, 0);
printf("Now writing to serial port\n");
fflush(stdout);
ret = write( fd, msg, strlen(msg) );
if (ret < 0)
{
close(fd);
return 0;
}
close(fd);
return 1;
}
EDIT: As an aside to all of this, is it better to use:
ret = write( fd, msg, sizeof(msg) );
or is it better to use:
ret = write( fd, msg, strlen(msg) );
Which is better? Is it better to use sizeof() or strlen()? It appears that some of the data in the string is truncated and not written to the serial port using the sizeof() function.
As I understand from Pavel's comment below, it is better to use strlen() if msg is declared as char*.
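To spell out the difference: when msg is a char*, sizeof(msg) is the size of the pointer (4 bytes on a 32-bit ARM target), which would explain the truncated output. With a string literal, sizeof("test\r\n") is 7 because it also counts the terminating NUL, so the earlier write() sent one byte too many. A sketch of a helper that always uses strlen and also retries on partial writes; write_string is an illustrative name, not something from the original program:

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Writes the whole NUL-terminated string msg (without the NUL) to fd.
 * Returns 0 on success, -1 on a write error. */
int write_string(int fd, const char *msg)
{
    size_t left = strlen(msg);   /* number of characters, not sizeof(char *) */

    while (left > 0) {
        ssize_t n = write(fd, msg, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal, try again */
            return -1;
        }
        msg  += n;               /* write() may accept only part of the data */
        left -= (size_t)n;
    }
    return 0;
}
```

In the test program this would replace the write() call with write_string(fd, msg).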
Moreover, it appears that gcc is not creating a proper binary when the escape sequence \r is being used to write to a tty.
Referring to the last test program given in my post above, the following line of code causes a segfault without the program running:
char *msg = "test\r\n";
As suggested by Igor in the comments, I have run the gdb debugger on the binary with the offending line of code. I had to compile the program with the -g switch.
The gdb debugger is being run natively on the ARM system, and all binaries are being built for the ARM architecture on the host using the same Makefile. All binaries are being built using the arm-linux-gcc cross-compiler.
The output of gdb (running natively on the ARM system) is as follows:
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-unknown-linux-uclibcgnueabi"...
"/programs/datacollector": not in executable format: File format not recognized
(gdb) run
Starting program:
No executable file specified.
Use the "file" or "exec-file" command.
(gdb) file datacollector
"/programs/datacollector": not in executable format: File format not recognized
(gdb)
However, if I change the single line of code to the following, the binary compiles and runs properly. Note that the \r escape sequence is missing:
char *msg = "test\n";
Here is the output of gdb after changing the single line of code:
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-unknown-linux-uclibcgnueabi"...
(gdb) run
Starting program: /programs/datacollector
Starting program
fd = 4
Now writing to serial port
test
Program exited normally.
(gdb)
UPDATE:
As suggested by Zack in an answer below, I have now run a test program on the embedded
Linux system. Although Zack gives a detailed script to run on the embedded system, I was
unable to run the script due to the lack of development tools (compiler and headers) installed in the root file system.
In lieu of installing these tools, I simply compiled the nice test program that Zack provided in the script and
used the strace utility. The strace utility was run on the embedded system.
At last, I think that I understand what is happening.
The bad binary was transferred to the embedded system over FTP, using an SPI-to-Ethernet bridge (KSZ8851SNL).
There is a driver for the KSZ8851SNL in the Linux kernel.
It appears that either the Linux kernel driver, the proftpd server software running on the embedded system, or the actual hardware itself (KSZ8851SNL)
was somehow corrupting the binary. The binary runs well on the embedded system when it is transferred on an SD card instead.
Here is the output of strace on the testz binary transferred to the embedded Linux system over the Ethernet serial link:
Bad binary tests:
# strace ./testz /dev/null
execve("./testz", ["./testz", "/dev/null"], [/* 17 vars */]) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x4000000, -1, 0) = 0x40089000
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Segmentation fault
# strace ./testz /dev/ttyS0
execve("./testz", ["./testz", "/dev/ttyS0"], [/* 17 vars */]) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x4000000, -1, 0) = 0x400ca000
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Segmentation fault
#
Here is the output of strace on the testz binary transferred on SD card to the embedded Linux system:
Good binary tests:
# strace ./testz /dev/null
execve("./testz", ["./testz", "/dev/null"], [/* 17 vars */]) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x4000000, -1, 0) = 0x40058000
open("/lib/libc.so.0", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=298016, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x4000000, -1, 0) = 0x400b8000
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\240\230\0\0004\0\0\0"..., 4096) = 4096
mmap2(NULL, 348160, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40147000
mmap2(0x40147000, 290576, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x40147000
mmap2(0x40196000, 4832, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x47) = 0x40196000
mmap2(0x40198000, 14160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40198000
close(3) = 0
munmap(0x400b8000, 4096) = 0
stat("/lib/ld-uClibc.so.0", {st_mode=S_IFREG|0755, st_size=25296, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x4000000, -1, 0) = 0x400c4000
set_tls(0x400c4470, 0x400c4470, 0x4007b088, 0x400c4b18, 0x40) = 0
mprotect(0x40196000, 4096, PROT_READ) = 0
mprotect(0x4007a000, 4096, PROT_READ) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B115200 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B115200 opost isig icanon echo ...}) = 0
open("/dev/null", O_RDWR|O_NOCTTY|O_NONBLOCK) = 3
write(3, "1\n", 2) = 2
write(3, "12\n", 3) = 3
write(3, "123\n", 4) = 4
write(3, "1234\n", 5) = 5
write(3, "12345\n", 6) = 6
write(3, "1\r\n", 3) = 3
write(3, "12\r\n", 4) = 4
write(3, "123\r\n", 5) = 5
write(3, "1234\r\n", 6) = 6
close(3) = 0
exit_group(0) = ?
# strace ./testz /dev/ttyS0
execve("./testz", ["./testz", "/dev/ttyS0"], [/* 17 vars */]) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x4000000, -1, 0) = 0x400ed000
open("/lib/libc.so.0", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=298016, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x4000000, -1, 0) = 0x40176000
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\240\230\0\0004\0\0\0"..., 4096) = 4096
mmap2(NULL, 348160, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40238000
mmap2(0x40238000, 290576, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x40238000
mmap2(0x40287000, 4832, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x47) = 0x40287000
mmap2(0x40289000, 14160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40289000
close(3) = 0
munmap(0x40176000, 4096) = 0
stat("/lib/ld-uClibc.so.0", {st_mode=S_IFREG|0755, st_size=25296, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x4000000, -1, 0) = 0x400d1000
set_tls(0x400d1470, 0x400d1470, 0x40084088, 0x400d1b18, 0x40) = 0
mprotect(0x40287000, 4096, PROT_READ) = 0
mprotect(0x40083000, 4096, PROT_READ) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B115200 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B115200 opost isig icanon echo ...}) = 0
open("/dev/ttyS0", O_RDWR|O_NOCTTY|O_NONBLOCK) = 3
write(3, "1\n", 21
) = 2
write(3, "12\n", 312
) = 3
write(3, "123\n", 4123
) = 4
write(3, "1234\n", 51234
) = 5
write(3, "12345\n", 612345
) = 6
write(3, "1\r\n", 31
) = 3
write(3, "12\r\n", 412
) = 4
write(3, "123\r\n", 5123
) = 5
write(3, "1234\r\n", 61234
) = 6
close(3) = 0
exit_group(0) = ?
EDIT: Read on for gory details, but the quick answer is, your FTP client is corrupting your program. This is an intentional feature of FTP, which can be turned off by typing binary at the FTP prompt before get whatever or put whatever. If you're using a graphical FTP client it should have a checkbox somewhere with the same effect. Or switch to scp, which does not have this inconvenient feature.
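ASCII-mode transfers rewrite line-ending bytes, which would be consistent with only the build containing "\r\n" being damaged. One way to see whether a given binary is at risk is to count the CR LF pairs it contains. This is a rough sketch: count_crlf is a made-up helper, and it assumes that line-ending translation is the only thing ASCII mode changes and that bare LF bytes survive a Unix-to-Unix round trip:

```c
#include <stddef.h>

/* Counts CR LF (0x0d 0x0a) byte pairs in buf. */
size_t count_crlf(const unsigned char *buf, size_t len)
{
    size_t i, count = 0;

    for (i = 0; i + 1 < len; i++) {
        if (buf[i] == 0x0d && buf[i + 1] == 0x0a)
            count++;
    }
    return count;
}
```

Under those assumptions a result of zero means ASCII mode may leave the file intact by luck, and any other value means bytes are likely to be altered. Comparing file sizes or checksums on both ends is the more reliable check.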
First off, there is no difference in the generated assembly code
between (one of the) working object files and the broken object file.
$ objdump -dr dc-good.o > dc-good.s
$ objdump -dr dc-bad.o > dc-bad.s
$ diff -u dc-good.s dc-bad.s
--- dc-good.s 2012-01-21 08:20:05.318518596 -0800
+++ dc-bad.s 2012-01-21 08:20:10.954566852 -0800
@@ -1,5 +1,5 @@
-dc-good.o: file format elf32-littlearm
+dc-bad.o: file format elf32-littlearm
Disassembly of section .text:
In fact, there are only two bytes that differ between the good and
bad object files. (You misunderstood what I was asking for with
"test\r\n" versus "testX\n": I wanted the two strings to be the
same length, so that everything would have the same offset in the
object files. Fortunately, your compiler padded the shorter string to
the same length as the longer string, so everything has the same
offset anyway.)
$ hd dc-good.o > dc-good.x
$ hd dc-bad.o > dc-bad.x
$ diff -u1 dc-good.x dc-bad.x
--- dc-good.x 2012-01-21 08:17:28.713174977 -0800
+++ dc-bad.x 2012-01-21 08:17:39.129264489 -0800
@@ -154,3 +154,3 @@
00000990 53 74 61 72 74 69 6e 67 20 70 72 6f 67 72 61 6d |Starting program|
-000009a0 00 00 00 00 74 65 73 74 58 0a 00 00 2f 64 65 76 |....testX.../dev|
+000009a0 00 00 00 00 74 65 73 74 58 0d 0a 00 2f 64 65 76 |....testX.../dev|
000009b0 2f 74 74 79 53 30 00 00 66 64 20 3d 20 25 75 0a |/ttyS0..fd = %u.|
@@ -223,3 +223,3 @@
00000de0 61 72 69 65 73 2f 64 61 74 61 63 6f 6c 6c 65 63 |aries/datacollec|
-00000df0 74 6f 72 2d 62 61 64 2d 62 69 6e 61 72 79 2d 32 |tor-bad-binary-2|
+00000df0 74 6f 72 2d 62 61 64 2d 62 69 6e 61 72 79 2d 31 |tor-bad-binary-1|
00000e00 00 46 49 4c 45 00 5f 5f 73 74 61 74 65 00 5f 5f |.FILE.__state.__|
The first difference is the difference that should be there: 74 65 73
74 58 0a 00 00 is the correct encoding of "testX\n" (with one byte
of padding), 74 65 73 74 58 0d 0a 00 is the correct encoding of
"testX\r\n". The other difference appears to be debugging
information: the name of the directory in which you compiled the
programs. This is harmless.
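If hd and diff are not at hand, the same comparison can be sketched in a few lines of C. This is only an illustration: diff_offsets is a name I made up, and it assumes both files have already been read into equally sized buffers.

```c
#include <stddef.h>

/*
 * Hypothetical helper: compare two buffers of n bytes each.
 * Stores the offsets of the first `max` mismatching bytes in `out`
 * and returns the total number of mismatching bytes.
 */
size_t diff_offsets(const unsigned char *a, const unsigned char *b,
                    size_t n, size_t *out, size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            if (count < max)
                out[count] = i;   /* remember where the mismatch is */
            count++;
        }
    }
    return count;
}
```

Fed the eight string bytes from the two dumps above, it should report mismatches at offsets 5 and 6 only, which are the 0a/0d and 00/0a pairs.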
The object files are as they should be, so at this point we can rule
out a bug in the compiler or the assembler. Now let's look at the
executables.
$ hd dc-good > dc-good.xe
$ hd dc-bad > dc-bad.xe
$ diff -u1 dc-good.xe dc-bad.xe
--- dc-good.xe 2012-01-21 08:31:33.456437417 -0800
+++ dc-bad.xe 2012-01-21 08:31:38.388480238 -0800
@@ -120,3 +120,3 @@
00000770 f0 af 1b e9 53 74 61 72 74 69 6e 67 20 70 72 6f |....Starting pro|
-00000780 67 72 61 6d 00 00 00 00 74 65 73 74 58 0a 00 00 |gram....testX...|
+00000780 67 72 61 6d 00 00 00 00 74 65 73 74 58 0d 0a 00 |gram....testX...|
00000790 2f 64 65 76 2f 74 74 79 53 30 00 00 66 64 20 3d |/dev/ttyS0..fd =|
@@ -373,3 +373,3 @@
00001750 63 6f 6c 6c 65 63 74 6f 72 2d 62 61 64 2d 62 69 |collector-bad-bi|
-00001760 6e 61 72 79 2d 32 00 46 49 4c 45 00 5f 5f 73 74 |nary-2.FILE.__st|
+00001760 6e 61 72 79 2d 31 00 46 49 4c 45 00 5f 5f 73 74 |nary-1.FILE.__st|
00001770 61 74 65 00 5f 5f 67 63 73 00 73 74 64 6f 75 74 |ate.__gcs.stdout|
Same two differences, different offsets within the executable. This
is also as it should be. We can rule out a bug in the linker as well
(if it was screwing up the address of the string, it would have to be
screwing it up the same way in both executables and they both ought to
crash).
At this point I think we are looking at a bug in your C library or
kernel. To pin it down further, I would like you to try this test
script. Run it as sh testz.sh on the ARM board, and send us the
complete output.
#! /bin/sh
set -e
cat >testz.c <<\EOF
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define W(f, s) write(f, s, sizeof s - 1)
int
main(int ac, char **av)
{
int f;
if (ac != 2) return 2;
f = open(av[1], O_RDWR|O_NOCTTY|O_NONBLOCK);
if (f == -1) return 1;
W(f, "1\n");
W(f, "12\n");
W(f, "123\n");
W(f, "1234\n");
W(f, "12345\n");
W(f, "1\r\n");
W(f, "12\r\n");
W(f, "123\r\n");
W(f, "1234\r\n");
close(f);
return 0;
}
EOF
arm-linux-gcc -Wall -g testz.c -o testz
set +e
strace ./testz /dev/null
echo ----
strace ./testz /dev/ttyS0
echo ----
exit 0
I've looked at the damaged binary you provided and now I know what's wrong.
$ ls -l testz*
-rwxr-x--- 1 zack zack 7528 Dec 31 1979 testz-bad
-rwxr-x--- 1 zack zack 7532 Jan 21 16:35 testz-good
Ignore the odd datestamp; see how the -bad version is four bytes smaller than the -good version? There were exactly four \r characters in the source code. Let's have a look at the differences in the hex dumps. I've pulled out the interesting bit of the diff and shuffled it around a little to make it easier to see what's going on.
00000620 00 00 00 00 31 32 33 34 0a 00 00 00 31 32 33 34 |....1234....1234|
-00000630 35 0a 00 00 31 0d 0a 00 31 32 0d 0a 00 00 00 00 |5...1...12......|
+00000630 35 0a 00 00 31 0a 00 31 32 0a 00 00 00 00 31 32 |5...1..12.....12|
-00000640 31 32 33 0d 0a 00 00 00 31 32 33 34 0d 0a 00 00 |123.....1234....|
+00000640 33 0a 00 00 00 31 32 33 34 0a 00 00 00 00 00 00 |3....1234.......|
-00000650 00 00 00 00 68 84 00 00 1c 84 00 00 00 00 00 00 |....h...........|
+00000650 68 84 00 00 1c 84 00 00 00 00 00 00 01 00 00 00 |h...............|
The file transfer is replacing 0d 0a (that is, \r\n) sequences with 0a (just \n). This causes everything after this point in the file to be displaced four bytes from where it's supposed to be. The code is before this point, and so are all the ELF headers that the kernel looks at, which is why you don't get
execve("./testz-bad", ["./testz-bad", "/dev/null"], [/* 36 vars */]) = -1 ENOEXEC (Exec format error)
from the test script; instead, you get a segfault inside the dynamic loader, because the DYNAMIC segment (which tells the dynamic loader what to do) is after the displacement starts.
$ readelf -d testz-bad 2> /dev/null
Dynamic section at offset 0x660 contains 13 entries:
Tag Type Name/Value
0x00000035 (<unknown>: 35) 0xc
0x0000832c (<unknown>: 832c) 0xd
0x00008604 (<unknown>: 8604) 0x19
0x00010654 (<unknown>: 10654) 0x1b
0x00000004 (HASH) 0x1a
0x00010658 (<unknown>: 10658) 0x1c
0x00000004 (HASH) 0x4
0x00008108 (<unknown>: 8108) 0x5
0x0000825c (<unknown>: 825c) 0x6
0x0000815c (<unknown>: 815c) 0xa
0x00000098 (<unknown>: 98) 0xb
0x00000010 (SYMBOLIC) 0x15
0x00000000 (NULL) 0x3
Contrast:
$ readelf -d testz-good
Dynamic section at offset 0x660 contains 18 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libc.so.0]
0x0000000c (INIT) 0x832c
0x0000000d (FINI) 0x8604
0x00000019 (INIT_ARRAY) 0x10654
0x0000001b (INIT_ARRAYSZ) 4 (bytes)
0x0000001a (FINI_ARRAY) 0x10658
0x0000001c (FINI_ARRAYSZ) 4 (bytes)
0x00000004 (HASH) 0x8108
0x00000005 (STRTAB) 0x825c
0x00000006 (SYMTAB) 0x815c
0x0000000a (STRSZ) 152 (bytes)
0x0000000b (SYMENT) 16 (bytes)
0x00000015 (DEBUG) 0x0
0x00000003 (PLTGOT) 0x10718
0x00000002 (PLTRELSZ) 56 (bytes)
0x00000014 (PLTREL) REL
0x00000017 (JMPREL) 0x82f4
0x00000000 (NULL) 0x0
The debugging information is also after the displacement, which is why gdb didn't like the program.
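To make the displacement concrete, here is a minimal sketch of the observed corruption: every 0d 0a pair collapses to a lone 0a, and everything after it slides toward the start of the file. The function name collapse_crlf is mine, and it models only the effect seen in the dumps, not how any particular FTP client is implemented.

```c
#include <stddef.h>

/*
 * Hypothetical model of the corruption: copy n bytes from src to dst,
 * dropping every '\r' that is immediately followed by '\n'.
 * dst must have room for n bytes. Returns the number of bytes written.
 */
size_t collapse_crlf(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] == '\r' && i + 1 < n && src[i + 1] == '\n')
            continue;             /* drop the CR; the LF is copied next */
        dst[out++] = src[i];
    }
    return out;
}
```

Running it over the 16 bytes at offset 0x630 of the good binary gives 14 bytes that match the start of the same row in the bad binary. Two \r\n pairs in that row account for two of the four missing bytes.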
So why this very particular corruption? It's not a bug in anything; it's an intentional feature of your FTP client, which defaults to transferring files in "text mode", which means (among other things) that it converts DOS-style line endings (\r\n) to Unix-style (\n). Because that would be what you wanted if this were 1991 and you were transferring text files off your IBM PC to your institutional file server. It is basically never what is wanted nowadays, even if you are moving text files around. Fortunately, you can turn it off: just type binary at the FTP prompt before the file transfer commands. Unfortunately, as far as I know there is no way to make that stick; you have to do that every time. I recommend switching to scp, which always transfers files verbatim and is also easier to operate from build automation.
First things first - the fact that you only see the segfault is NOT indicative that the program failed to run at all. What happens is that the output from the printf calls is buffered by stdio, and when the program segfaults, whatever is still sitting in the buffer is never written out.
If you add
fflush(stdout);
after every printf, you'll see your output prior to the segfault.
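Here is a small sketch of why the flush matters. It writes through a fully buffered stdio stream and asks the kernel how big the file is before and after fflush. The function names are invented for this example, and it uses a temporary file rather than stdout so the effect is easy to measure; I'm assuming a POSIX system for fileno and fstat.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fstat() */
#endif
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file behind fp as the kernel sees it, or -1 on error. */
static long size_on_disk(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/*
 * Hypothetical demo: write msg to a temporary file through stdio and
 * record the on-disk size before and after fflush().
 * Returns 0 on success, -1 on any setup or I/O error.
 */
int buffered_write_demo(const char *msg, long *before, long *after)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    /* Ask for full buffering explicitly so the demo does not rely on defaults. */
    if (setvbuf(fp, NULL, _IOFBF, BUFSIZ) != 0 || fputs(msg, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    *before = size_on_disk(fp);   /* text is still in the stdio buffer */
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    *after = size_on_disk(fp);    /* now the kernel has received it */
    fclose(fp);
    return 0;
}
```

For a short message I would expect *before to be 0 and *after to be the message length. A crash between those two points loses the text, which is the situation described above.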
Now, in your original program, what's the point of the fcntl(fd, F_SETFL, 0); call? What are you trying to achieve with it? Are you trying to turn off non-blocking mode? What if you don't make that call?
As to your second test, I see that you are using perror, but again the lack of error messages doesn't tell you that the program isn't running - it just tells you that you didn't get any error messages, and you still aren't flushing stdout, so you'll never see the printf from run_experiment.
I also see that in your second test you're doing an fdopen with read mode, then trying to write to that FILE pointer. While that certainly shouldn't crash, it also certainly shouldn't work.
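A quick way to see the fdopen problem in isolation is the sketch below. The function name is made up, and the clean failure is what I would expect from glibc; as far as I know the C standard does not promise any particular behaviour here, so treat the result as an observation rather than a guarantee.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for fdopen() */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical check: open path read/write, wrap the descriptor in a
 * read-mode stream, then try to write through that stream.
 * Returns 1 if the write was rejected, 0 if it went through,
 * -1 if the file or the stream could not be opened.
 */
int read_stream_rejects_write(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    FILE *fp = fdopen(fd, "r");   /* the descriptor is writable, the stream is not */
    if (fp == NULL) {
        close(fd);
        return -1;
    }
    int rejected = (fputs("hello\n", fp) == EOF) || ferror(fp);
    fclose(fp);                   /* also closes fd */
    return rejected ? 1 : 0;
}
```

Calling it with "/dev/null" avoids touching the serial port. If it returns 1, the fix in the real program is to pass a mode such as "r+" or "w" to fdopen.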
Now, outside of your program, are you sure the serial port works OK? Try doing 'cat > /dev/ttyS0' and see what happens, just to be sure it's not something wonky with the hardware.