Reading binary files in C, on UNIX systems (Ubuntu 10.10) - c

This is my first time programming in C, and on a UNIX system. I am trying to do something fairly simple. I have a binary file that is an image of a Compact Flash camera card, and consists of a few JPG images. I am trying to read through the file, find the byte sequence corresponding to FF D8 FF E0, or FF D8 FF E1, the signifiers of the beginning of a JPG file, then write everything between that signifier and the next to a new JPG file.
At the moment I am just trying to get my computer to print out the file as is, by reading it in 512 size blocks, the stated size of the blocks in the original file system. I have the following code:
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
int main (int argc, char *argv[])
{
FILE * raw;
FILE * currentimage;
char buf[512];
char beg1[32] = "11111111110110001111111111100001";
char beg2[32] = "11111111110110001111111111100000";
raw = fopen("card.raw", "rb");
while((fread(buf, sizeof(raw), 512, raw) > 0))
{
printf(buf);
printf("\n");
}
}
It just prints out the file formatted into what I presume is ASCII, so it looks like a bunch of gobbledegook. How can I get this data formatted to either binary 1's and 0's or, even better, hex 0-F's?
Any help would be much appreciated.
P.S. beg1 and beg2 correspond to the binary values of the hex values I am looking for, but they are not really relevant to the rest of the code I have at the moment.

Instead of printf(buf); you would need to loop through each byte and do printf("%02x ", byte). Take a look at the source of hexdump here:
http://qa.coreboot.org/docs/doxygen/hexdump_8c_source.html

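You should read up on what printf() does: in no programming language that I know should you ever use data as the first argument to printf. The first argument is a format string, which for the way you used it should be "%s". Even then, "%s" stops at the first zero byte, so it is no use for binary data. To see hex output, replace your loop with this:
size_t size;
while ((size = fread(buf, 1, sizeof(buf), raw)) > 0)
{
for (size_t i = 0; i < size; i++)
{
printf("%02X", (unsigned char) buf[i]);
}
printf("\n");
}
Note that the element size passed to fread() is 1 and the count is sizeof(buf): the original call used sizeof(raw), which is the size of a pointer, and so asked for more bytes than buf can hold.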
To answer your question about comparison in C before printing:
The numerical data is right there in buf: buf[i] is a char, which on most platforms is signed and ranges from -128 to 127, so cast it to unsigned char to get the byte value from 0 to 255. If you want to look at its hex representation, you can do:
sprintf(some_other_buffer, "%02X", (unsigned char) buf[i]);
Then you can perform string manipulation on some_other_buffer, knowing it's a 2-character string (so the buffer needs at least 3 bytes, including the terminating '\0').
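Since the bytes are already numbers, the JPEG markers from the question can also be compared directly, without converting anything to text. The following is only a minimal sketch, assuming the data is held in an unsigned char buffer; find_jpeg_start is a made-up helper name, not something from the original code:

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first FF D8 FF E0 or
   FF D8 FF E1 sequence in buf, or -1 if there is none. */
long find_jpeg_start(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (buf[i] == 0xFF && buf[i + 1] == 0xD8 && buf[i + 2] == 0xFF &&
            (buf[i + 3] == 0xE0 || buf[i + 3] == 0xE1)) {
            return (long)i;
        }
    }
    return -1;
}
```

It could be called on each 512-byte block after fread(). The buffer should be unsigned char, because with a signed char the comparison against 0xFF never succeeds. A marker that straddles two blocks would be missed by this sketch.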

Related

Reading text file with non-english character in C

Is it possible to read a text file that has non-English text?
Example of text in file:
E 37
SVAR:
Fettembolisyndrom. (1 poäng)
Example of what is present in buffer which stores "fread" output using "puts" :
E 37 SVAR:
Fettembolisyndrom.
(1 poäng)
Under Linux my program was working fine, but on Windows I am seeing this problem with non-English letters. Any advice on how this can be fixed?
Program:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
int debug = 0;
int main(int argc, char* argv[])
{
if (argc < 2)
{
puts("ERROR! Please enter a filename\n");
exit(1);
}
else if (argc > 2)
{
debug = atoi(argv[2]);
puts("Debugging mode ENABLED!\n");
}
FILE *fp = fopen(argv[1], "rb");
fseek(fp, 0, SEEK_END);
long fileSz = ftell(fp);
fseek(fp, 0, SEEK_SET);
char* buffer;
buffer = (char*) malloc (sizeof(char)*fileSz);
size_t readSz = fread(buffer, 1, fileSz, fp);
rewind(fp);
if (readSz == fileSz)
{
char tmpBuff[100];
fgets(tmpBuff, 100, fp);
if (!ferror(fp))
{
printf("100 characters from text file: %s\n", tmpBuff);
}
else
{
printf("Error encounter");
}
}
if (strstr("FRÅGA",buffer) == NULL)
{
printf("String not found!");
}
return 0;
}
Summary: If you read text from a file encoded in UTF-8 and display it on the console you must either set the console to UTF-8 or transcode the text from UTF-8 to the encoding used by the console (in English-speaking countries, usually MS-DOS code page 437 or 850).
Longer explanation
Bytes are not characters and characters are not bytes. The char data type in C holds a byte, not a character. In particular, the character Å (Unicode <U+00C5>) mentioned in the comments can be represented in many ways, called encodings:
In UTF-8 it is two bytes, '\xC3' '\x85';
In UTF-16 it is two bytes, either '\xC5' '\x00' (little-endian UTF-16), or '\x00' '\xC5' (big-endian UTF-16);
In Latin-1 and Windows-1252, it is one byte, '\xC5';
In MS-DOS code page 437 and code page 850, it is one byte, '\x8F'.
It is the responsibility of the programmer to translate between the internal encoding used by the program (usually but not always Unicode), the encoding used in input or output files, and the encoding expected by the display device.
Note: Sometimes, if the program does not do much with the characters it reads and outputs, one can get by just by making sure that the input files, the output files, and the display device all use the same encoding. In Linux, this encoding is almost always UTF-8. Unfortunately, on Windows the existence of multiple encodings is a fact of life. System calls expect either UTF-16 or Windows-1252. By default, the console displays Code Page 437 or 850. Text files are quite often in UTF-8. Windows is old and complicated.
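To make the byte values listed above concrete, here is a small sketch that transcodes Latin-1 (ISO-8859-1) bytes to UTF-8. It assumes the input really is Latin-1; Windows-1252 and the MS-DOS code pages need a lookup table or a library such as iconv instead. The function name latin1_to_utf8 is made up for illustration:

```c
#include <stddef.h>

/* Converts n Latin-1 bytes to UTF-8. out must have room for 2*n + 1 bytes.
   Returns the number of UTF-8 bytes written, not counting the terminator. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;                                   /* ASCII is unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte 110000xx */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation 10xxxxxx */
        }
    }
    out[o] = '\0';
    return o;
}
```

Feeding it the single Latin-1 byte 0xC5 (Å) produces the two bytes 0xC3 0x85, matching the UTF-8 entry in the list.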

How to detect the character encoding of command line arguments in mingw

Is it safe to assume they are ISO-8859-15 (Windows-1252?), or is there some function I can call to query this? The end goal is to convert them to UTF-8.
Background:
The problem described by this question arises because XMLStarlet assumes its command line arguments are UTF-8. Under Windows it seems they are actually ISO-8859-15 (Windows-1252?), or at least adding the following to the beginning of main makes things work:
char **utf8argv = malloc(sizeof(char*) * (argc+1));
utf8argv[argc] = NULL;
{
iconv_t windows2utf8 = iconv_open("UTF-8", "ISO-8859-15");
int i;
for (i = 0; i < argc; i++) {
const char *arg = argv[i];
size_t len = strlen(arg);
size_t outlen = len*2 + 1;
char *utfarg = malloc(outlen);
char *out = utfarg;
size_t ret = iconv(windows2utf8,
&arg, &len,
&out, &outlen);
if (ret == (size_t)-1) { /* iconv() reports errors as (size_t)-1; "ret < 0" is never true for a size_t */
perror("iconv");
utf8argv[i] = NULL;
continue;
}
out[0] = '\0';
utf8argv[i] = utfarg;
}
argv = utf8argv;
}
Testing Encoding
The following program prints out the bytes of its first argument in decimal:
#include <string.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
for (int i = 0; i < strlen(argv[1]); i++) {
printf("%d ", (unsigned char) argv[1][i]);
}
printf("\n");
return 0;
}
chcp reports code page 850, so the characters æ and Æ should be 145 and 146, respectively.
C:\Users\npostavs\tmp>chcp
Active code page: 850
But we see 230 and 198 reported which matches 1252:
C:\Users\npostavs\tmp>cmd-chars æÆ
230 198
Passing characters outside of codepage causes lossy transformation
Making a shortcut to cmd-chars.exe with arguments αβγ (these are not present in codepage 1252) gives
C:\Users\npostavs\tmp>shortcut-cmd-chars.lnk
97 223 63
Which is aß?.
You can call CommandLineToArgvW with a call to GetCommandLineW as the first argument to get the command-line arguments in an argv-style array of wide strings. This is the only portable Windows way, especially with the code page mess; Japanese characters can be passed via a Windows shortcut for example. After that, you can use WideCharToMultiByte with a code page argument of CP_UTF8 to convert each wide-character argv element to UTF-8.
Note that calling WideCharToMultiByte with an output buffer size (byte count) of 0 will allow you to determine the number of UTF-8 bytes required for the number of characters specified (or the entire wide string including the null terminator if you wish to pass -1 as the number of wide characters to simplify your code). Then you can allocate the required number of bytes using malloc et al. and call WideCharToMultiByte again with the correct number of bytes instead of 0. If this was performance-critical, a different solution would probably be best, but since this is a one-time function to get command-line arguments, I'd say any decrease in performance would be negligible.
Of course, don't forget to free all of your memory, including calling LocalFree with the pointer returned by CommandLineToArgvW as the argument.
For more information on these functions and how to use them, see the MSDN documentation.
The command-line parameters are in the system default codepage, which varies depending on system settings. Rather than specify a specific source charset at all, you can specify "char" or "" instead and let iconv_open() figure out what the system charset actually is:
iconv_t windows2utf8 = iconv_open("UTF-8", "char");
Otherwise, you are better off retrieving the command-line as UTF-16 instead of as Ansi, and then you can convert it directly to UTF-8 using iconv_open("UTF-8", "UTF-16LE"), or WideCharToMultiByte(CP_UTF8) like Chrono suggested.
It seems that you are under windows.
In this case, you can make a system() call to run the CHCP command.
#include <stdlib.h> // Uses: system()
#include <stdio.h>
// .....
// 1st: Store the present windows codepage in a text file:
system("CMD /C \"CHCP > myenc.txt\"");
// 2nd: Read the first line in the file:
FILE *F = fopen("myenc.txt", "r");
char buffer[100];
fgets(buffer, sizeof buffer, F);
fclose(F);
// 3rd: Analyze the loaded string to find the Windows codepage:
int codepage = my_CHCP_analizer_func(buffer);
// The function my_CHCP_analizer_func() must be written by you,
// and it has to take into account the way in which CHCP prints the information.
Finally, the codepages sent by CHCP can be checked for example here:
Windows Codepages

Problems with fscanf and numbers stored on file in native int format, not text

Data at start of file in HEX 78000000 0300497B ..............
equating to int32 120 and char 03 followed by a load of other char data,
data written by another of my programs and viewed in hex dump mode.
When trying to read it back with another program I have tried..
int j,padNumber;
char rot;
j=fscanf(fp,"%d%c",&padNumber,&rot); // insists on returning j=0,padNumber=0 & rot=0
whereas
char c1,c2,c3,c4,rot;
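j=fscanf(fp,"%c%c%c%c%c",&c1,&c2,&c3,&c4,&rot);// gives
j=5,c1='x',c2='\0',c3='\0',c4='\0',rot='\x03'
which equates to my on file data.
Why can't I get my int back in native format?
Use fread when reading from a native binary dump. fscanf's %d parses decimal text, and the first byte in the file, 0x78, is the character 'x' rather than a digit, so the conversion fails and nothing is stored.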
fread(&padNumber, sizeof padNumber, 1, fp);
fread(&rot, sizeof rot, 1, fp);
Or if you're on a Unix platform, the direct read syscall could work too.
#include <unistd.h>
...
read(fd, &padNumber, sizeof padNumber);
read(fd, &rot, sizeof rot);
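If the file might be read on a machine with a different byte order or int size, the bytes can be assembled explicitly instead of being read straight into an int. This is a minimal sketch, assuming the layout shown in the hex dump (a 32-bit little-endian integer followed by one byte) and a file opened in "rb" mode. read_record is a made-up name, and this is an alternative to the plain fread calls, not a correction of them:

```c
#include <stdint.h>
#include <stdio.h>

/* Reads one record: a 32-bit little-endian integer followed by one byte.
   Returns 1 on success, 0 on a short read or error. */
int read_record(FILE *fp, int32_t *padNumber, unsigned char *rot)
{
    unsigned char raw[5];
    if (fread(raw, 1, sizeof raw, fp) != sizeof raw) {
        return 0;
    }
    /* Least-significant byte comes first in the file. */
    uint32_t v = (uint32_t)raw[0] | ((uint32_t)raw[1] << 8) |
                 ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
    *padNumber = (int32_t)v;
    *rot = raw[4];
    return 1;
}
```

Checking the return value also catches truncated files, which the bare fread calls would silently ignore.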

Read and write to binary files in C?

Does anyone have an example of code that can write to a binary file. And also code that can read a binary file and output to screen. Looking at examples I can write to a file ok But when I try to read from a file it is not outputting correctly.
Reading and writing binary files is pretty much the same as any other file, the only difference is how you open it:
unsigned char buffer[10];
FILE *ptr;
ptr = fopen("test.bin","rb"); // r for read, b for binary
fread(buffer,sizeof(buffer),1,ptr); // read 10 bytes to our buffer
You said you can read it, but it's not outputting correctly... keep in mind that when you "output" this data, you're not reading ASCII, so it's not like printing a string to the screen:
for(int i = 0; i<10; i++)
printf("%u ", buffer[i]); // prints a series of bytes
Writing to a file is pretty much the same, with the exception that you're using fwrite() instead of fread():
FILE *write_ptr;
write_ptr = fopen("test.bin","wb"); // w for write, b for binary
fwrite(buffer,sizeof(buffer),1,write_ptr); // write 10 bytes from our buffer
Since we're talking Linux.. there's an easy way to do a sanity check. Install hexdump on your system (if it's not already on there) and dump your file:
mike@mike-VirtualBox:~/C$ hexdump test.bin
0000000 457f 464c 0102 0001 0000 0000 0000 0000
0000010 0001 003e 0001 0000 0000 0000 0000 0000
...
Now compare that to your output:
mike@mike-VirtualBox:~/C$ ./a.out
127 69 76 70 2 1 1 0 0 0
hmm, maybe change the printf to a %X to make this a little clearer:
mike@mike-VirtualBox:~/C$ ./a.out
7F 45 4C 46 2 1 1 0 0 0
Hey, look! The data matches up now*. Awesome, we must be reading the binary file correctly!
*Note that hexdump prints 16-bit words by default, so on a little-endian machine each pair of bytes appears swapped (457f for the bytes 7F 45), but the data is the same. Running hexdump -C shows the bytes in file order.
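The only thing that changes between those two runs is the format specifier. Here is a small sketch of the same byte rendered three ways; describe_byte is a hypothetical helper, not part of the program above:

```c
#include <stdio.h>

/* Formats one byte as decimal, as a character and as two hex digits.
   Returns the number of characters snprintf() would write. */
int describe_byte(unsigned char b, char *out, size_t outsz)
{
    return snprintf(out, outsz, "%d %c %02X", b, b, (unsigned)b);
}
```

For the byte 65 this yields the text "65 A 41". Non-printable bytes would put control characters in the middle field, so real code would probably check isprint() first.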
There are a few ways to do it. If I want to read and write binary I usually use open(), read(), write(), close(). Which are completely different than doing a byte at a time. You work with integer file descriptors instead of FILE * variables. fileno will get an integer descriptor from a FILE * BTW. You read a buffer full of data, say 32k bytes at once. The buffer is really an array which you can read from really fast because it's in memory. And reading and writing many bytes at once is faster than one at a time. It's called a blockread in Pascal I think, but read() is the C equivalent.
I looked but I don't have any examples handy. OK, these aren't ideal because they also are doing stuff with JPEG images. Here's a read, you probably only care about the part from open() to close(). fbuf is the array to read into,
sb.st_size is the file size in bytes from a stat() call.
fd = open(MASKFNAME,O_RDONLY);
if (fd != -1) {
read(fd,fbuf,sb.st_size);
close(fd);
splitmask(fbuf,(uint32_t)sb.st_size); // look at lines, etc
have_mask = 1;
}
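One thing the snippet above glosses over is that read() may return fewer bytes than requested, even on a regular file. Here is a hedged sketch of a loop that keeps reading until the buffer is full, assuming a POSIX system; read_full is a made-up name:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Keeps calling read() until n bytes have arrived, end of file is reached,
   or an error occurs. Returns the number of bytes read, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t n)
{
    size_t total = 0;
    while (total < n) {
        ssize_t got = read(fd, (char *)buf + total, n - total);
        if (got < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted by a signal: retry */
            }
            return -1;
        }
        if (got == 0) {
            break;          /* end of file */
        }
        total += (size_t)got;
    }
    return (ssize_t)total;
}
```

In the example above it would stand in for the bare read(fd,fbuf,sb.st_size) call, with the return value compared against sb.st_size.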
Here's a write: (here pix is the byte array, jwidth and jheight are the JPEG width and height so for RGB color we write height * width * 3 color bytes). It's the # of bytes to write.
void simpdump(uint8_t *pix, char *nm) { // makes a raw aka .data file
int sdfd;
sdfd = open(nm, O_WRONLY | O_CREAT, 0644); // O_CREAT requires a mode argument
if (sdfd == -1) {
printf("bad open\n");
exit(-1);
}
printf("width: %i height: %i\n",jwidth,jheight); // to the console
write(sdfd,pix,(jwidth*jheight*3));
close(sdfd);
}
Look at man 2 open, also read, write, close. Also this old-style jpeg example.c: https://github.com/LuaDist/libjpeg/blob/master/example.c I'm reading and writing an entire image at once here. But they're binary reads and writes of bytes, just a lot at once.
"But when I try to read from a file it is not outputting correctly." Hmmm. If you read a number 65 that's (decimal) ASCII for an A. Maybe you should look at man ascii too. If you want a 1 that's ASCII 0x31. A char variable is a tiny 8-bit integer really, if you do a printf as a %i you get the ASCII value, if you do a %c you get the character. Do %x for hexadecimal. All from the same number between 0 and 255.
I'm quite happy with my "make a weak pin storage program" solution. Maybe it will help people who need a very simple binary file IO example to follow.
$ ls
WeakPin my_pin_code.pin weak_pin.c
$ ./WeakPin
Pin: 45 47 49 32
$ ./WeakPin 8 2
Need 4 ints to write a new pin!
$ ./WeakPin 8 2 99 49
Pin saved.
$ ./WeakPin
Pin: 8 2 99 49
$
$ cat weak_pin.c
// a program to save and read 4-digit pin codes in binary format
#include <stdio.h>
#include <stdlib.h>
#define PIN_FILE "my_pin_code.pin"
typedef struct { unsigned short a, b, c, d; } PinCode;
int main(int argc, const char** argv)
{
if (argc > 1) // create pin
{
if (argc != 5)
{
printf("Need 4 ints to write a new pin!\n");
return -1;
}
unsigned short _a = atoi(argv[1]);
unsigned short _b = atoi(argv[2]);
unsigned short _c = atoi(argv[3]);
unsigned short _d = atoi(argv[4]);
PinCode pc;
pc.a = _a; pc.b = _b; pc.c = _c; pc.d = _d;
FILE *f = fopen(PIN_FILE, "wb"); // create and/or overwrite
if (!f)
{
printf("Error in creating file. Aborting.\n");
return -2;
}
// write one PinCode object pc to the file *f
fwrite(&pc, sizeof(PinCode), 1, f);
fclose(f);
printf("Pin saved.\n");
return 0;
}
// else read existing pin
FILE *f = fopen(PIN_FILE, "rb");
if (!f)
{
printf("Error in reading file. Abort.\n");
return -3;
}
PinCode pc;
fread(&pc, sizeof(PinCode), 1, f);
fclose(f);
printf("Pin: ");
printf("%hu ", pc.a);
printf("%hu ", pc.b);
printf("%hu ", pc.c);
printf("%hu\n", pc.d);
return 0;
}
$
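This is an example that copies a binary file, such as a JPG image or WMV video, one byte at a time.
FILE *fout;
FILE *fin;
int ch; /* int, not char, so that EOF can be told apart from the byte 0xFF */
fin = fopen("D:\\pic.jpg", "rb");
if (fin == NULL)
{ printf("\n Unable to open the file ");
exit(1);
}
fout = fopen("D:\\newpic.jpg", "wb");
if (fout == NULL)
{ printf("\n Unable to create the output file ");
fclose(fin);
exit(1);
}
while ((ch = fgetc(fin)) != EOF)
{
printf("%02X ", (unsigned) ch); /* show each byte as hex */
fputc(ch, fout); /* write the byte unchanged */
}
printf("data read and copied");
fclose(fin);
fclose(fout);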
I really struggled to find a way to read a binary file into a byte array in C++ that would output the same hex values I see in a hex editor. After much trial and error, this seems to be the fastest way to do so without extra casts. It loads the entire file into memory, but only prints the first 1000 bytes.
string Filename = "BinaryFile.bin";
size_t size = 0;
uint8_t* ByteArray = NULL;
FILE* pFile = fopen(Filename.c_str(), "rb");
if (pFile != NULL) // check before seeking, not after
{
fseek(pFile, 0L, SEEK_END);
size = ftell(pFile);
fseek(pFile, 0L, SEEK_SET);
ByteArray = new uint8_t[size];
for (size_t counter = 0; counter < size; counter++) { // "<", not "<=", to stay inside the array
ByteArray[counter] = fgetc(pFile);
}
fclose(pFile);
}
for (size_t i = 0; i < size && i < 1000; i++) {
printf("%02X ", ByteArray[i]);
}
delete[] ByteArray;
This question is linked to the question "How to write binary data file on C and plot it using Gnuplot" by CAMILO HG. The real problem has two parts: 1) write the binary data file, 2) plot it using Gnuplot.
The first part has been answered very clearly here, so I have nothing to add.
For the second, the easy way is to point people to the Gnuplot manual, and I am sure someone could find a good answer there, but I could not find one on the web, so I am going to explain one solution (it belongs under the original question, but I am new to Stack Overflow and cannot answer there):
After writing your binary data file using fwrite(), create a very simple C program, a reader. The reader has the same structure as the writer, but uses fread() instead of fwrite(). It is very easy to generate: copy the writing part of your original code into reader.c and change write to read (and "wb" to "rb"). In addition, you could include some checks on the data, for example that the length of the file is correct. Finally, the program needs to print the data to standard output using printf().
To be clear, your program runs like this:
$ ./reader data.dat
X_position Y_position (it must be a comment for Gnuplot)*
1.23 2.45
2.54 3.12
5.98 9.52
OK, with this program, in Gnuplot you only need to pipe the standard output of the reader into Gnuplot, like this:
plot '< ./reader data.dat'
This line runs the reader program and connects its output to Gnuplot, which plots the data.
*Because Gnuplot is going to read the output of the program, you must know what Gnuplot can and cannot read and plot.
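The core of such a reader might look like the sketch below. It assumes the data file is nothing but (x, y) pairs of double written with fwrite(), which is a guess, since the original writer is not shown here. The name dump_pairs is invented for illustration:

```c
#include <stdio.h>

/* Reads (x, y) pairs of doubles written with fwrite() and prints them as
   two text columns. Returns the number of pairs printed. */
long dump_pairs(FILE *in, FILE *out)
{
    double xy[2];
    long count = 0;
    fprintf(out, "# X_position Y_position\n");  /* '#' starts a Gnuplot comment */
    while (fread(xy, sizeof xy[0], 2, in) == 2) {
        fprintf(out, "%g %g\n", xy[0], xy[1]);
        count++;
    }
    return count;
}
```

A main() for reader.c would open argv[1] with "rb" and call dump_pairs(fp, stdout).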
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
unsigned char buf[4096];
size_t n;
FILE *in, *out;
if(argc != 3) {
printf("Input and output file are to be specified\n");
exit(1);
}
in = fopen(argv[1], "rb"); /* open for read */
out = fopen(argv[2], "wb"); /* open for write */
if(in == NULL || out == NULL) {
printf("Cannot open an input and an output file.\n");
exit(1);
}
/* fread()/fwrite() replace the non-standard getw()/putw(), which would
drop trailing bytes when the size is not a multiple of sizeof(int) */
while((n = fread(buf, 1, sizeof buf, in)) > 0) fwrite(buf, 1, n, out);
fclose(in);
fclose(out);
return 0;
}

C read binary stdin

I'm trying to build an instruction pipeline simulator and I'm having a lot of trouble getting started. What I need to do is read binary from stdin, and then store it in memory somehow while I manipulate the data. I need to read in chunks of exactly 32 bits one after the other.
How do I read in chunks of exactly 32 bits at a time? Secondly, how do I store it for manipulation later?
Here's what I've got so far, but examining the binary chunks I read further, it just doesn't look right, I don't think I'm reading exactly 32 bits like I need.
char buffer[4] = { 0 }; // initialize to 0
unsigned long c = 0;
int bytesize = 4; // read in 32 bits
while (fgets(buffer, bytesize, stdin)) {
memcpy(&c, buffer, bytesize); // copy the data to a more usable structure for bit manipulation later
// more stuff
buffer[0] = 0; buffer[1] = 0; buffer[2] = 0; buffer[3] = 0; // set to zero before next loop
}
fclose(stdin);
How do I read in 32 bits at a time (they are all 1/0, no newlines etc), and what do I store it in, is char[] okay?
EDIT: I'm able to read the binary in but none of the answers produce the bits in the correct order — they are all mangled up, I suspect endianness and problems reading and moving 8 bits around ( 1 char) at a time — this needs to work on Windows and C ... ?
What you need is freopen(). From the manpage:
If filename is a null pointer, the freopen() function shall attempt to change the mode of the stream to that specified by mode, as if the name of the file currently associated with the stream had been used. In this case, the file descriptor associated with the stream need not be closed if the call to freopen() succeeds. It is implementation-defined which changes of mode are permitted (if any), and under what circumstances.
Basically, the best you can really do is this:
freopen(NULL, "rb", stdin);
This will reopen stdin to be the same input stream, but in binary mode. In the normal mode, reading from stdin on Windows will convert \r\n (Windows newline) to the single character ASCII 10. Using the "rb" mode disables this conversion so that you can properly read in binary data.
freopen() returns the stream on success and a null pointer on failure, so check the result before going on. After that, use fread() as has been mentioned.
As to your concerns, however, you may not be reading in "32 bits" but if you use fread() you will be reading in 4 chars (which is the best you can do in C - char is guaranteed to be at least 8 bits but some historical and embedded platforms have 16 bit chars (some even have 18 or worse)). If you use fgets() with a size of 4 you will never read in 4 bytes. You will read in at most 3 (fewer if one of them is a newline), and the byte after them will be '\0' because C strings are nul-terminated and fgets() nul-terminates what it reads (like a good function). Obviously, this is not what you want, so you should use fread().
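Here is a sketch of reading one 32-bit word at a time with fread(). It assumes the input stores each word most-significant byte first (big-endian), which is an assumption about the data, not something the question states. read_word_be is a made-up name:

```c
#include <stdint.h>
#include <stdio.h>

/* Reads exactly 4 bytes and assembles them most-significant byte first.
   Returns 1 on success, 0 at end of input or on a short read. */
int read_word_be(FILE *in, uint32_t *word)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof b, in) != sizeof b) {
        return 0;
    }
    *word = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
            ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return 1;
}
```

Assembling the value with shifts gives the same result on any host, so no separate byte-swapping step is needed. Typical use would be while (read_word_be(stdin, &w)) { ... } after stdin has been switched to binary mode.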
Consider using SET_BINARY_MODE macro and setmode:
#ifdef _WIN32
# include <io.h>
# include <fcntl.h>
# define SET_BINARY_MODE(handle) setmode(handle, O_BINARY)
#else
# define SET_BINARY_MODE(handle) ((void)0)
#endif
More details about SET_BINARY_MODE macro here: "Handling binary files via standard I/O"
More details about setmode here: "_setmode"
I had to piece the answer together from the various comments from the kind people above, so here is a fully-working sample that works - only for Windows, but you can probably translate the windows-specific stuff to your platform.
#include "stdafx.h"
#include "stdio.h"
#include "stdlib.h"
#include "windows.h"
#include <io.h>
#include <fcntl.h>
int main()
{
char rbuf[4096];
char *deffile = "c:\\temp\\outvideo.bin";
size_t r;
char *outfilename = deffile;
FILE *newin;
freopen(NULL, "rb", stdin);
_setmode(_fileno(stdin), _O_BINARY);
FILE *f = fopen(outfilename, "w+b");
if (f == NULL)
{
printf("unable to open %s\n", outfilename);
exit(1);
}
for (;; )
{
r = fread(rbuf, 1, sizeof(rbuf), stdin);
if (r > 0)
{
size_t w;
for (size_t nleft = r; nleft > 0; )
{
w = fwrite(rbuf + (r - nleft), 1, nleft, f); // continue after the bytes already written
if (w == 0)
{
printf("error: unable to write %lu bytes to %s\n", (unsigned long)nleft, outfilename);
exit(1);
}
nleft -= w;
fflush(f);
}
}
else
{
Sleep(10); // wait for more input, but not in a tight loop
}
}
return 0;
}
For Windows, this Microsoft _setmode example specifically shows how to change stdin to binary mode:
// crt_setmode.c
// This program uses _setmode to change
// stdin from text mode to binary mode.
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
int main( void )
{
int result;
// Set "stdin" to have binary mode:
result = _setmode( _fileno( stdin ), _O_BINARY );
if( result == -1 )
perror( "Cannot set mode" );
else
printf( "'stdin' successfully changed to binary mode\n" );
}
fgets() is all wrong here. It's aimed at human-readable ASCII text terminated by end-of-line characters, not binary data, and won't get you what you need.
I recently did exactly what you want using the read() call. Unless your program has explicitly closed stdin, for the first argument (the file descriptor), you can use a constant value of 0 for stdin. Or, if you're on a POSIX system (Linux, Mac OS X, or some other modern variant of Unix), you can use STDIN_FILENO.
fread() suits best for reading binary data.
Yes, char array is OK, if you are planning to process them bytewise.
I don't know what OS you are running, but you typically cannot "open stdin in binary". You can try things like
int fd = fdreopen (fileno (stdin), outfname, O_RDONLY | OPEN_O_BINARY);
to try to force it. Then use
uint32_t opcode;
read(fd, &opcode, sizeof (opcode));
But I have not actually tried it myself. :)
I had it right the first time, except, I needed ntohl ... C Endian Conversion : bit by bit
