Append text to an existing UTF16LE file - c

How can I write to an existing file with UTF16LE encoding? I've already used fopen(file, "a"); but the resulting file will be like this:
<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
㰼㱤㱯㱣㰾㰊㰼㱰㱡㱧㱥㰠㱮㱡㱭㱥㰽㰢㱎㱏㱒㱍㱁㱌㰢㰾㰊㰼㱦㱩㱥㱬㱤㰠㱮㱡㱭㱥㰽㰢㱉㱤㱥㱮㱴㱩㱦㱩㱣㱡㱴㱩㱯㱮㸢㱔㱃㰳㰶㰰㰴㰰㰱㰭㰭㰭㰭㰱㰲㰷㰼㰯㱦㱩㱥㱬㱤㰾㰊㰼㱦㱩㱥㱬㱤㰠㱮㱡㱭㱥㰽㰢㱔㱲㱡㱣㱥㱡㱢㱩㱬㱩㱴㱹㸢㰱㰳㱖㱖㱖㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰰㰰㰼㰯㱦㱩㱥㱬㱤㰾㰊㰼㱦㱩㱥㱬㱤㰠㱮㱡㱭㱥㰽㰢㱄㱥㱳㱣㱲㱩㱰㱴㱩㱯㱮㸢㱄㱥㱳㱣㱲㱩㱰㱴㱩㱯㱮㰀㰼㰯㱦㱩㱥㱬㱤㰾㰊㰼㰯㱰㱡㱧㱥㰾㰊㰼㰯㱤㱯㱣㰾㰊
I don't know how I can append a 2-byte character to this file.

A UTF-16 character is not necessarily 2 bytes wide. It may be 2 bytes
or 4 bytes (read up here).
The weird output you have posted most likely result from appending wchar_ts
directly to the file, generating UTF-16 characters with a byte order that is
the reverse of the right one, and these UTF-16 characters lie up in the
"oriental" heights of the UTF-16 range.
Assuming from your question's tags that you are working with GCC on Linux,
you may use the iconv library by including <inconv.h> to import
a character-encoding conversion api. Here is a specimen program
that converts the wchar_t array:
L'A',L'P',L'P',L'E',L'N',L'D',L'A',L'G',L'E' // "APPENDAGE"
to UTF-16LE and appends the result to the file "tdata.txt". It hard-codes
a limit of 64 bytes on the converted length of output.
#include <stdio.h>
#include <stdlib.h>
#include <iconv.h>
#include <assert.h>
#define MAXOUT 64
int main(void)
{
wchar_t appendage [] = {
L'A',L'P',L'P',L'E',L'N',L'D',L'A',L'G',L'E'
};
wchar_t * inp = appendage;
char converted[MAXOUT];
char * outp = converted;
size_t remain_in = sizeof(appendage);
size_t remain_out = MAXOUT;
size_t conversions;
size_t written;
char const *tfile = "../tdata.txt";
// Create the right converter from wchar_t to UTF-16LE
iconv_t iconvdesc = iconv_open("UTF-16LE","WCHAR_T");
if (iconvdesc == (iconv_t) -1) {
perror("error: conversion from wchar_t to UTF-16LE is not available");
exit(EXIT_FAILURE);
}
FILE * fp = fopen(tfile,"a");
if (!fp) {
fprintf(stderr,"error: cannot open \"%s\" for append\n",tfile,stderr);
perror(NULL);
exit(EXIT_FAILURE);
}
// Do the conversion.
conversions =
iconv(iconvdesc, (char **)&inp, &remain_in, (char **)&outp, &remain_out);
if (conversions == (size_t)-1) {
perror("error: iconv() failed");
exit(EXIT_FAILURE);
}
assert(remain_in == 0);
// Write the UTF-16LE
written = fwrite(converted,1,MAXOUT - remain_out,fp);
assert(written == MAXOUT - remain_out);
fclose(fp);
iconv_close(iconvdesc);
exit(EXIT_SUCCESS);
}
For GCC, wchar_t is 4 bytes wide, hence wide enough for any UTF-16. For
Microsoft's compilers it is 2-bytes wide.
Documentation of <iconv.h> is here

Related

Why can't my editor understand characters like é, è, à [duplicate]

My setup: gcc-4.9.2, UTF-8 environment.
The following C-program works in ASCII, but does not in UTF-8.
Create input file:
echo -n 'привет мир' > /tmp/вход
This is test.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 10
int main(void)
{
char buf[SIZE+1];
char *pat = "привет мир";
char str[SIZE+2];
FILE *f1;
FILE *f2;
f1 = fopen("/tmp/вход","r");
f2 = fopen("/tmp/выход","w");
if (fread(buf, 1, SIZE, f1) > 0) {
buf[SIZE] = 0;
if (strncmp(buf, pat, SIZE) == 0) {
sprintf(str, "% 11s\n", buf);
fwrite(str, 1, SIZE+2, f2);
}
}
fclose(f1);
fclose(f2);
exit(0);
}
Check the result:
./test; grep -q ' привет мир' /tmp/выход && echo OK
What should be done to make UTF-8 code work as if it was ASCII code - not to bother how many bytes a symbol takes, etc. In other words: what to change in the example to treat any UTF-8 symbol as a single unit (that includes argv, STDIN, STDOUT, STDERR, file input, output and the program code)?
#define SIZE 10
The buffer size of 10 is insufficient to store the UTF-8 string привет мир. Try changing it to a larger value. On my system (Ubuntu 12.04, gcc 4.8.1), changing it to 20, worked perfectly.
UTF-8 is a multibyte encoding which uses between 1 and 4 bytes per character. So, it is safer to use 40 as the buffer size above.
There is a big discussion at How many bytes does one Unicode character take? which might be interesting.
Siddhartha Ghosh's answer gives you the basic problem. Fixing your code requires more work, though.
I used the following script (chk-utf8-test.sh):
echo -n 'привет мир' > вход
make utf8-test
./utf8-test
grep -q 'привет мир' выход && echo OK
I called your program utf8-test.c and amended the source like this, removing the references to /tmp, and being more careful with lengths:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 40
int main(void)
{
char buf[SIZE + 1];
char *pat = "привет мир";
char str[SIZE + 2];
FILE *f1 = fopen("вход", "r");
FILE *f2 = fopen("выход", "w");
if (f1 == 0 || f2 == 0)
{
fprintf(stderr, "Failed to open one or both files\n");
return(1);
}
size_t nbytes;
if ((nbytes = fread(buf, 1, SIZE, f1)) > 0)
{
buf[nbytes] = 0;
if (strncmp(buf, pat, nbytes) == 0)
{
sprintf(str, "%.*s\n", (int)nbytes, buf);
fwrite(str, 1, nbytes, f2);
}
}
fclose(f1);
fclose(f2);
return(0);
}
And when I ran the script, I got:
$ bash -x chk-utf8-test.sh
+ '[' -f /etc/bashrc ']'
+ . /etc/bashrc
++ '[' -z '' ']'
++ return
+ alias 'r=fc -e -'
+ echo -n 'привет мир'
+ make utf8-test
gcc -O3 -g -std=c11 -Wall -Wextra -Werror utf8-test.c -o utf8-test
+ ./utf8-test
+ grep -q 'привет мир' $'в?\213?\205од'
+ echo OK
OK
$
For the record, I was using GCC 5.1.0 on Mac OS X 10.10.3.
This is more of a corollary to the other answers, but I'll try to explain this from a slightly different angle.
Here is Jonathan Leffler's version of your code, with three slight changes: (1) I made explicit the actual individual bytes in the UTF-8 strings; and (2) I modified the sprintf formatting string width specifier to hopefully do what you are actually attempting to do. Also tangentially (3) I used perror to get a slightly more useful error message when something fails.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 40
int main(void)
{
char buf[SIZE + 1];
char *pat = "\320\277\321\200\320\270\320\262\320\265\321\202"
" \320\274\320\270\321\200"; /* "привет мир" */
char str[SIZE + 2];
FILE *f1 = fopen("\320\262\321\205\320\276\320\264", "r"); /* "вход" */
FILE *f2 = fopen("\320\262\321\213\321\205\320\276\320\264", "w"); /* "выход" */
if (f1 == 0 || f2 == 0)
{
perror("Failed to open one or both files"); /* use perror() */
return(1);
}
size_t nbytes;
if ((nbytes = fread(buf, 1, SIZE, f1)) > 0)
{
buf[nbytes] = 0;
if (strncmp(buf, pat, nbytes) == 0)
{
sprintf(str, "%*s\n", 1+(int)nbytes, buf); /* nbytes+1 length specifier */
fwrite(str, 1, 1+nbytes, f2); /* +1 here too */
}
}
fclose(f1);
fclose(f2);
return(0);
}
The behavior of sprintf with a positive numeric width specifier is to pad with spaces from the left, so the space you tried to use is superfluous. But you have to make sure the target field is wider than the string you are printing in order for any padding to actually take place.
Just to make this answer self-contained, I will repeat what others have already said. A traditional char is always exactly one byte, but one character in UTF-8 is usually not exactly one byte, except when all your characters are actually ASCII. One of the attractions of UTF-8 is that legacy C code doesn't need to know anything about UTF-8 in order to continue to work, but of course, the assumption that one char is one glyph cannot hold. (As you can see, for example, the glyph п in "привет мир" maps to the two bytes -- and hence, two chars -- "\320\277".)
This is clearly less than ideal, but demonstrates that you can treat UTF-8 as "just bytes" if your code doesn't particularly care about glyph semantics. If yours does, you are better off switching to wchar_t as outlined e.g. here: http://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html
However, the standard wchar_t is less than ideal when the standard expectation is UTF-8. See e.g. the GNU libunistring documentation for a less intrusive alternative, and a bit of background. With that, you should be able to replace char with uint8_t and the various str* functions with u8_str* replacements and be done. The assumption that one glyph equals one byte will still need to be addressed, but that becomes a minor technicality in your example program. An adaptation is available at http://ideone.com/p0VfXq (though unfortunately the library is not available on http://ideone.com/ so it cannot be demonstrated there).
The following code works as required:
#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>
#define SIZE 10
int main(void)
{
setlocale(LC_ALL, "");
wchar_t buf[SIZE+1];
wchar_t *pat = L"привет мир";
wchar_t str[SIZE+2];
FILE *f1;
FILE *f2;
f1 = fopen("/tmp/вход","r");
f2 = fopen("/tmp/выход","w");
fgetws(buf, SIZE+1, f1);
if (wcsncmp(buf, pat, SIZE) == 0) {
swprintf(str, SIZE+2, L"% 11ls", buf);
fputws(str, f2);
}
fclose(f1);
fclose(f2);
exit(0);
}
Probably your test.c file is not stored in UTF-8 format and for that reason "привет мир" string is ASCII - and the comparison failed. Change text encoding of source file and try again.

fgets statement reads first line and not sure how to modify because I have to return a pointer [duplicate]

I need to copy the contents of a text file to a dynamically-allocated character array.
My problem is getting the size of the contents of the file; Google reveals that I need to use fseek and ftell, but for that the file apparently needs to be opened in binary mode, and that gives only garbage.
EDIT: I tried opening in text mode, but I get weird numbers. Here's the code (I've omitted simple error checking for clarity):
long f_size;
char* code;
size_t code_s, result;
FILE* fp = fopen(argv[0], "r");
fseek(fp, 0, SEEK_END);
f_size = ftell(fp); /* This returns 29696, but file is 85 bytes */
fseek(fp, 0, SEEK_SET);
code_s = sizeof(char) * f_size;
code = malloc(code_s);
result = fread(code, 1, f_size, fp); /* This returns 1045, it should be the same as f_size */
The root of the problem is here:
FILE* fp = fopen(argv[0], "r");
argv[0] is your executable program, NOT the parameter. It certainly won't be a text file. Try argv[1], and see what happens then.
You cannot determine the size of a file in characters without reading the data, unless you're using a fixed-width encoding.
For example, a file in UTF-8 which is 8 bytes long could be anything from 2 to 8 characters in length.
That's not a limitation of the file APIs, it's a natural limitation of there not being a direct mapping from "size of binary data" to "number of characters."
If you have a fixed-width encoding then you can just divide the size of the file in bytes by the number of bytes per character. ASCII is the most obvious example of this, but if your file is encoded in UTF-16 and you happen to be on a system which treats UTF-16 code points as the "native" internal character type (which includes Java, .NET and Windows) then you can predict the number of "characters" to allocate as if UTF-16 were fixed width. (UTF-16 is variable width due to Unicode characters above U+FFFF being encoded in multiple code points, but a lot of the time developers ignore this.)
I'm pretty sure argv[0] won't be an text file.
Give this a try (haven't compiled this, but I've done this a bazillion times, so I'm pretty sure it's at least close):
char* readFile(char* filename)
{
FILE* file = fopen(filename,"r");
if(file == NULL)
{
return NULL;
}
fseek(file, 0, SEEK_END);
long int size = ftell(file);
rewind(file);
char* content = calloc(size + 1, 1);
fread(content,1,size,file);
return content;
}
If you're developing for Linux (or other Unix-like operating systems), you can retrieve the file-size with stat before opening the file:
#include <stdio.h>
#include <sys/stat.h>
int main() {
struct stat file_stat;
if(stat("main.c", &file_stat) != 0) {
perror("could not stat");
return (1);
}
printf("%d\n", (int) file_stat.st_size);
return (0);
}
EDIT: As I see the code, I have to get into the line with the other posters:
The array that takes the arguments from the program-call is constructed this way:
[0] name of the program itself
[1] first argument given
[2] second argument given
[n] n-th argument given
You should also check argc before trying to use a field other than '0' of the argv-array:
if (argc < 2) {
printf ("Usage: %s arg1", argv[0]);
return (1);
}
argv[0] is the path to the executable and thus argv[1] will be the first user submitted input. Try to alter and add some simple error-checking, such as checking if fp == 0 and we might be ble to help you further.
You can open the file, put the cursor at the end of the file, store the offset, and go back to the top of the file, and make the difference.
You can use fseek for text files as well.
fseek to end of file
ftell the offset
fseek back to the begining
and you have size of the file
Kind of hard with no sample code, but fstat (or stat) will tell you how big the file is. You allocate the memory required, and slurp the file in.
Another approach is to read the file a piece at a time and extend your dynamic buffer as needed:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define PAGESIZE 128
int main(int argc, char **argv)
{
char *buf = NULL, *tmp = NULL;
size_t bufSiz = 0;
char inputBuf[PAGESIZE];
FILE *in;
if (argc < 2)
{
printf("Usage: %s filename\n", argv[0]);
return 0;
}
in = fopen(argv[1], "r");
if (in)
{
/**
* Read a page at a time until reaching the end of the file
*/
while (fgets(inputBuf, sizeof inputBuf, in) != NULL)
{
/**
* Extend the dynamic buffer by the length of the string
* in the input buffer
*/
tmp = realloc(buf, bufSiz + strlen(inputBuf) + 1);
if (tmp)
{
/**
* Add to the contents of the dynamic buffer
*/
buf = tmp;
buf[bufSiz] = 0;
strcat(buf, inputBuf);
bufSiz += strlen(inputBuf) + 1;
}
else
{
printf("Unable to extend dynamic buffer: releasing allocated memory\n");
free(buf);
buf = NULL;
break;
}
}
if (feof(in))
printf("Reached the end of input file %s\n", argv[1]);
else if (ferror(in))
printf("Error while reading input file %s\n", argv[1]);
if (buf)
{
printf("File contents:\n%s\n", buf);
printf("Read %lu characters from %s\n",
(unsigned long) strlen(buf), argv[1]);
}
free(buf);
fclose(in);
}
else
{
printf("Unable to open input file %s\n", argv[1]);
}
return 0;
}
There are drawbacks with this approach; for one thing, if there isn't enough memory to hold the file's contents, you won't know it immediately. Also, realloc() is relatively expensive to call, so you don't want to make your page sizes too small.
However, this avoids having to use fstat() or fseek()/ftell() to figure out how big the file is beforehand.

Reading text file with non-english character in C

Is it possible to read a text file hat has non-english text?
Example of text in file:
E 37
SVAR:
Fettembolisyndrom. (1 poäng)
Example of what is present in buffer which stores "fread" output using "puts" :
E 37 SVAR:
Fettembolisyndrom.
(1 poäng)
Under Linux my program was working fine but in Windows I am seeing this problem with non-english letters. Any advise how this can be fixed?
Program:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
int debug = 0;
int main(int argc, char* argv[])
{
if (argc < 2)
{
puts("ERROR! Please enter a filename\n");
exit(1);
}
else if (argc > 2)
{
debug = atoi(argv[2]);
puts("Debugging mode ENABLED!\n");
}
FILE *fp = fopen(argv[1], "rb");
fseek(fp, 0, SEEK_END);
long fileSz = ftell(fp);
fseek(fp, 0, SEEK_SET);
char* buffer;
buffer = (char*) malloc (sizeof(char)*fileSz);
size_t readSz = fread(buffer, 1, fileSz, fp);
rewind(fp);
if (readSz == fileSz)
{
char tmpBuff[100];
fgets(tmpBuff, 100, fp);
if (!ferror(fp))
{
printf("100 characters from text file: %s\n", tmpBuff);
}
else
{
printf("Error encounter");
}
}
if (strstr("FRÅGA",buffer) == NULL)
{
printf("String not found!");
}
return 0;
}
Sample output
Text file
Summary: If you read text from a file encoded in UTF-8 and display it on the console you must either set the console to UTF-8 or transcode the text from UTF-8 to the encoding used by the console (in English-speaking countries, usually MS-DOS code page 437 or 850).
Longer explanation
Bytes are not characters and characters are not bytes. The char data type in C holds a byte, not a character. In particular, the character Å (Unicode <U+00C5>) mentioned in the comments can be represented in many ways, called encodings:
In UTF-8 it is two bytes, '\xC3' '\x85';
In UTF-16 it is two bytes, either '\xC5' '\x00' (little-endian UTF-16), or '\x00' '\xC5' (big-endian UTF-16);
In Latin-1 and Windows-1252, it is one byte, '\xC5';
In MS-DOS code page 437 and code page 850, it is one byte, '\x8F'.
It is the responsibility of the programmer to translate between the internal encoding used by the program (usually but not always Unicode), the encoding used in input or output files, and the encoding expected by the display device.
Note: Sometimes, if the program does not do much with the characters it reads and outputs, one can get by just by making sure that the input files, the output files, and the display device all use the same encoding. In Linux, this encoding is almost always UTF-8. Unfortunately, on Windows the existence of multiple encodings is a fact of life. System calls expect either UTF-16 or Windows-1252. By default, the console displays Code Page 437 or 850. Text files are quite often in UTF-8. Windows is old and complicated.

how to read and process utf-8 characters in one char in c from the file

how i can to read and process utf-8 characters in one char in c from the file
this is my code
FILE *file = fopen(fileName, "rb");
char *code;
size_t n = 0;
if (file == NULL) return NULL;
fseek(file, 0, SEEK_END);
long f_size = ftell(file);
fseek(file, 0, SEEK_SET);
code = malloc(f_size);
char a,b;
while (!feof(file)) {
fscanf(file, "%c", &a);
code[n++] = a;
// i want to modify "a" (current char) in here
}
code[n] = '\0';
this is file content
~”م‘‎iاk·¶;R0ثp9´
-پ‘“گAéI‚sہئzOU,HدلKŒ©َض†ُ­ ت6‘گA=…¢¢³qد4â9àr}hw O‍Uجy.4a³‎M;£´`د$r(q¸Œçً£F 6pG|ںJr(TîsشR
Chars can commonly hold 255 different values (1 byte), or in other words, just the ASCII table (it could use the extended table if you make it unsigned). For handling UTF-8 characters i would recommend using another type like wchar_t (if a wide character in your compiler means as an UTF-8), otherwise use char_32 if you're using C++11, or a library to deal with your data like ICU.
Edit
This example code explains how to deal with UTF-8 in C. Note that you have to make sure that wchar_t in your compiler can store an UTF-8.
#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>
main() {
FILE *file=fopen("Testing.txt", "r, ccs=UTF-8");
wchar_t sentence[100000], ch=1;
int n=0;
char*loc = setlocale(LC_ALL, "");
printf("Locale set to: %s\n", loc);
if(file==NULL){
printf("Error processing file\n");
} else {
while((ch = fgetwc(file)) != 65535){
/* The end of file value may vary depending of the wchar_t!*/
/* wprintf(L"%lc", ch); */
sentence[n]=ch+1; /*Example modification*/
n++;
}
}
fclose(file);
file=fopen("Testing.txt", "w, ccs=UTF-8");
fputws(sentence, file);
wprintf(L"%ls", sentence);
fclose(file);
return 0;
}
Your system locale
The char*loc = setlocale(LC_ALL, ""); will help you see your current system locale. Make sure is in UTF-8 if your using linux, if you're using windows then you'll have to stick to one language. This is not a problem if you don't want to print the characters.
How to open the file
Firstly, I opened it for reading it as text file instead of reading it as binary file. Also I have to open the file using the UTF-8 formating (I think in linux it will be as your locale, so the ccs=UTF-8 won't be necessary). Even though in windows we're stuck with one language, the file still has to be read in UTF-8.
Using compatible functions with the characters
For this we'll use the functions inside the wchar.h library (like wprintf and fgetwc). The problem with the other functions is that they are limited to the range of a char, giving the wrong value.
I used as an example this:
¿khñà?
hello
~”م‘‎iاk·¶;R0ثp9´ -پ‘“گAéI‚sہئzOU,HدلKŒ©َض†ُ­ ت6‘گA=…¢¢³qد4â9àr}hw O‍Uجy.4a³‎M;£´`د$r(q¸Œçً£F 6pG|ںJr(TîsشR
In the last part of the program It overwrites the file with the acumulated modified string.
You could try changing sentence[n]=ch+1; to sentence[n]=ch; to check in your original file if it reads and outputs the file correctly (and uncomment the wprintf to check the output).

C read binary stdin

I'm trying to build an instruction pipeline simulator and I'm having a lot of trouble getting started. What I need to do is read binary from stdin, and then store it in memory somehow while I manipulate the data. I need to read in chunks of exactly 32 bits one after the other.
How do I read in chunks of exactly 32 bits at a time? Secondly, how do I store it for manipulation later?
Here's what I've got so far, but examining the binary chunks I read further, it just doesn't look right, I don't think I'm reading exactly 32 bits like I need.
char buffer[4] = { 0 }; // initialize to 0
unsigned long c = 0;
int bytesize = 4; // read in 32 bits
while (fgets(buffer, bytesize, stdin)) {
memcpy(&c, buffer, bytesize); // copy the data to a more usable structure for bit manipulation later
// more stuff
buffer[0] = 0; buffer[1] = 0; buffer[2] = 0; buffer[3] = 0; // set to zero before next loop
}
fclose(stdin);
How do I read in 32 bits at a time (they are all 1/0, no newlines etc), and what do I store it in, is char[] okay?
EDIT: I'm able to read the binary in but none of the answers produce the bits in the correct order — they are all mangled up, I suspect endianness and problems reading and moving 8 bits around ( 1 char) at a time — this needs to work on Windows and C ... ?
What you need is freopen(). From the manpage:
If filename is a null pointer, the freopen() function shall attempt to change the mode of the stream to that specified by mode, as if the name of the file currently associated with the stream had been used. In this case, the file descriptor associated with the stream need not be closed if the call to freopen() succeeds. It is implementation-defined which changes of mode are permitted (if any), and under what circumstances.
Basically, the best you can really do is this:
freopen(NULL, "rb", stdin);
This will reopen stdin to be the same input stream, but in binary mode. In the normal mode, reading from stdin on Windows will convert \r\n (Windows newline) to the single character ASCII 10. Using the "rb" mode disables this conversion so that you can properly read in binary data.
freopen() returns a filehandle, but it's the previous value (before we put it in binary mode), so don't use it for anything. After that, use fread() as has been mentioned.
As to your concerns, however, you may not be reading in "32 bits" but if you use fread() you will be reading in 4 chars (which is the best you can do in C - char is guaranteed to be at least 8 bits but some historical and embedded platforms have 16 bit chars (some even have 18 or worse)). If you use fgets() you will never read in 4 bytes. You will read in at least 3 (depending on whether any of them are newlines), and the 4th byte will be '\0' because C strings are nul-terminated and fgets() nul-terminates what it reads (like a good function). Obviously, this is not what you want, so you should use fread().
Consider using SET_BINARY_MODE macro and setmode:
#ifdef _WIN32
# include <io.h>
# include <fcntl.h>
# define SET_BINARY_MODE(handle) setmode(handle, O_BINARY)
#else
# define SET_BINARY_MODE(handle) ((void)0)
#endif
More details about SET_BINARY_MODE macro here: "Handling binary files via standard I/O"
More details about setmode here: "_setmode"
I had to piece the answer together from the various comments from the kind people above, so here is a fully-working sample that works - only for Windows, but you can probably translate the windows-specific stuff to your platform.
#include "stdafx.h"
#include "stdio.h"
#include "stdlib.h"
#include "windows.h"
#include <io.h>
#include <fcntl.h>
int main()
{
char rbuf[4096];
char *deffile = "c:\\temp\\outvideo.bin";
size_t r;
char *outfilename = deffile;
FILE *newin;
freopen(NULL, "rb", stdin);
_setmode(_fileno(stdin), _O_BINARY);
FILE *f = fopen(outfilename, "w+b");
if (f == NULL)
{
printf("unable to open %s\n", outfilename);
exit(1);
}
for (;; )
{
r = fread(rbuf, 1, sizeof(rbuf), stdin);
if (r > 0)
{
size_t w;
for (size_t nleft = r; nleft > 0; )
{
w = fwrite(rbuf, 1, nleft, f);
if (w == 0)
{
printf("error: unable to write %d bytes to %s\n", nleft, outfilename);
exit(1);
}
nleft -= w;
fflush(f);
}
}
else
{
Sleep(10); // wait for more input, but not in a tight loop
}
}
return 0;
}
For Windows, this Microsoft _setmode example specifically shows how to change stdin to binary mode:
// crt_setmode.c
// This program uses _setmode to change
// stdin from text mode to binary mode.
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
int main( void )
{
int result;
// Set "stdin" to have binary mode:
result = _setmode( _fileno( stdin ), _O_BINARY );
if( result == -1 )
perror( "Cannot set mode" );
else
printf( "'stdin' successfully changed to binary mode\n" );
}
fgets() is all wrong here. It's aimed at human-readable ASCII text terminated by end-of-line characters, not binary data, and won't get you what you need.
I recently did exactly what you want using the read() call. Unless your program has explicitly closed stdin, for the first argument (the file descriptor), you can use a constant value of 0 for stdin. Or, if you're on a POSIX system (Linux, Mac OS X, or some other modern variant of Unix), you can use STDIN_FILENO.
fread() suits best for reading binary data.
Yes, char array is OK, if you are planning to process them bytewise.
I don't know what OS you are running, but you typically cannot "open stdin in binary". You can try things like
int fd = fdreopen (fileno (stdin), outfname, O_RDONLY | OPEN_O_BINARY);
to try to force it. Then use
uint32_t opcode;
read(fd, &opcode, sizeof (opcode));
But I have no actually tried it myself. :)
I had it right the first time, except, I needed ntohl ... C Endian Conversion : bit by bit

Resources