So here's what I got:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
int main()
{
FILE *fin;
struct STR {
float d;
int x;
} testing;
testing.d = 11.12;
testing.x = 31121;
fin = fopen("output.txt", "w");
//fprintf(fin,"%7.4f %7d\n",testing.d,testing.x);
fwrite(&testing, sizeof(struct STR),1,fin);
fclose(fin);
return 0;
}
So what happens when I compile and run? I get this:
"…ë1A‘y "
When I comment out the fwrite and use the fprintf, I get this:
"11.1200 31121"
Can someone explain this to me? I tried running it on windows and on linux, and both times the output was obscure.
Also, I guess while we're on the subject, how come the size of the text file with "11.1200 31121" is 16 bytes? I thought that integers (on a 32-bit machine) were 4 bytes each? Is it 16 bytes because there are 16 total characters in the txt file?
Thanks
You are opening the file as a text file but writing binary data to it, so the contents are not human-readable. fwrite() copies the in-memory bytes of the struct as they are; to get the values back you need to read them with fread(). fprintf(), by contrast, writes text, as you saw.
So
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
struct STR {
float d;
int x;
} testing;
int main()
{
FILE *file;
testing.d = 11.12;
testing.x = 31121;
file = fopen("output.txt", "wb");
if (file != NULL)
{
fwrite(&testing, sizeof(struct STR), 1, file);
fclose(file);
}
file = fopen("output.txt", "rb");
if ((file != NULL) && (fread(&testing, sizeof(struct STR), 1, file) == 1))
{
fprintf(stdout, "%f -- %d\n", testing.d, testing.x);
fclose(file);
}
return 0;
}
should make it clear.
As iharob said, you're writing binary data, which gets displayed as nonsense characters in the current locale rather than as human-readable ASCII. As for the size: the 16 bytes you measured belong to the text file, and yes, that is one byte per character (the two width-7 fields, the space between them and the newline). The binary file holds sizeof(struct STR) bytes instead, typically 8 here (a 4-byte float plus a 4-byte int). A compiler is allowed to add padding to a structure to keep its members aligned, so always use sizeof rather than adding up the member sizes yourself.
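To see what fwrite() actually stored, it can help to dump the struct's bytes as hex instead of viewing the file as text. A minimal sketch; bytes_to_hex is a hypothetical helper name, not a library function:

```c
#include <stdio.h>
#include <string.h>

/* Format the raw bytes of any object as space-separated hex pairs.
   `out` must hold at least 3 * len bytes (two digits per byte, the
   separators, and the terminating NUL). Returns the string length,
   or -1 if there is nothing to format or the buffer is too small. */
int bytes_to_hex(const void *obj, size_t len, char *out, size_t outsz)
{
    const unsigned char *p = obj;
    size_t i, used = 0;

    if (len == 0 || outsz < 3 * len)
        return -1;
    for (i = 0; i < len; i++)
        used += (size_t)sprintf(out + used, i ? " %02x" : "%02x", p[i]);
    return (int)used;
}
```

Calling bytes_to_hex(&testing, sizeof testing, buf, sizeof buf) and printing buf shows one hex pair per byte that fwrite() would write, padding included, so the length of the dump also tells you sizeof(struct STR) on your platform.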
If you really want to serialize your data in a portable binary format, or transmit it over a network, you should both use an exact-width type such as int32_t rather than int (which has been 16, 32 or 64 bits and might have other widths than those) and also convert to a specific endianness rather than whatever the native byte order happens to be. The classic solution is htonl(). Also, write out each field separately to avoid problems with padding, or use a compiler extension to pack your structure and turn padding off.
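The field-by-field approach could look roughly like this. It is only a sketch: put_u32_be and encode_record are illustrative names, it assumes float is a 32-bit IEEE 754 type, and it uses plain shifts in place of htonl() so that it does not need a networking header.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value as 4 bytes, most significant byte first. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Encode the record one field at a time into exactly 8 bytes, so no
   structure padding ever reaches the file.
   Assumption: sizeof(float) == 4 (IEEE 754 single precision). */
void encode_record(unsigned char out[8], float d, int32_t x)
{
    uint32_t bits;

    memcpy(&bits, &d, sizeof bits);   /* copy the float's bit pattern */
    put_u32_be(out, bits);
    put_u32_be(out + 4, (uint32_t)x); /* signed -> unsigned is well defined */
}
```

The resulting 8-byte buffer can then be written with fwrite(out, 1, 8, file) to a file opened with "wb". A reader reverses the steps: rebuild each 32-bit value from its 4 bytes, then memcpy the first one back into a float.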
I am trying to read Chinese characters from an infile, and I have found a few questions on the subject here but nothing that works for me or suits my needs. I am using the fread() implementation from this question, but it is not working. I am running Linux.
#define UNICODE
#ifdef UNICODE
#define _UNICODE
#else
#define _MBCS
#endif
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char * argv[]) {
FILE *infile = fopen(argv[1], "r");
wchar_t test[2] = L"\u4E2A";
setlocale(LC_ALL, "");
printf("%ls\n", test); //test
wcscpy(test, L"\u4F60"); //test
printf("%ls\n", test); //test
for (int i = 0; i < 5; i++){
fread(test, 2, 2, infile);
printf("%ls\n", test);
}
return 0;
}
I use the following text file to test it:
一个人
两本书
三张桌子
我喜欢一个猫
and the program outputs:
个
你
������
Anyone have any wisdom on the subject?
Edit: Also, that's all of my code because I'm not sure where it fails. There's some stuff in there where I test to make sure I can print unicode wchars that isn't entirely relevant to the question.
If you really need to read a UTF-8 (or rather a locale charmap) file one codepoint at a time, you can use fscanf as below. Note, though, that these are codepoints, not characters: a character may consist of multiple codepoints because of combining codes, and some codepoints are most definitely not printable.
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <string.h>
#include <stdlib.h>
int
main(int argc, char *argv[])
{
FILE *infile = fopen(argv[1], "r");
wchar_t test[2] = L"\u4E2A";
setlocale(LC_ALL, "");
printf("%ls\n", test); //test
wcscpy(test, L"\u4F60"); //test
printf("%ls\n", test); //test
for (int i = 0; i < 5; i++) {
fscanf(infile, "%1ls", test);
printf("%ls\n", test);
}
return 0;
}
Most of the time you probably won't need the locale functionality, because UTF-8 generally just works if you treat it as an opaque encoding. One reason is that every byte of a non-ASCII character lies in the range 128..253 (not a typo: 254 and 255 are unused). Another is that the bytes 128..191 are always continuation bytes, while the start bytes of multi-byte characters are 192..253 (194..244 in current UTF-8), which means an error will just break one character, not the rest of the stream. (Okay, the codepoints vs. characters remark is really only there to convince you that dividing UTF-8 up into "characters" probably won't do what you want.)
You are telling fread to read two 2-byte values in each call; however, the characters you want to read have 3-byte UTF-8 encodings. In general, you need to decode the UTF-8 stream as a whole, not in fixed-sized byte chunks.
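For illustration, decoding by hand means looking at the first byte to learn the sequence length and then folding in the continuation bytes. A minimal sketch; utf8_decode is a hypothetical name, and it deliberately skips the stricter checks (overlong forms, surrogates) that a real decoder should make:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the n bytes at s.
   Returns the number of bytes consumed, or 0 if the sequence is
   truncated or malformed. Sketch only: overlong encodings and
   surrogate code points are not rejected. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)      { len = 1; c = s[0]; }         /* ASCII */
    else if (s[0] < 0xC0) { return 0; }                  /* stray continuation byte */
    else if (s[0] < 0xE0) { len = 2; c = s[0] & 0x1F; }
    else if (s[0] < 0xF0) { len = 3; c = s[0] & 0x0F; }
    else if (s[0] < 0xF8) { len = 4; c = s[0] & 0x07; }
    else                  { return 0; }                  /* not a valid start byte */
    if (len > n)
        return 0;                                        /* sequence cut short */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                                    /* expected 10xxxxxx */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *cp = c;
    return len;
}
```

With this shape you would read the file into a byte buffer (opened with "rb") and call utf8_decode in a loop, advancing by the returned length each time.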
I'm trying to calculate the entropy of a .exe file by giving it as an input. However, I'm getting a zero value instead of an answer.
Entropy of a file can be understood as the negative of the summation of pi*log(pi) over every character in the file, where pi is the relative frequency of character i. I'm trying to calculate the entropy of a .exe file. However, I keep getting '0'. The '.exe' file for sure has an output.
Below is my code.
#include <stdio.h>
#include <stdlib.h>
#include "stdbool.h"
#include <string.h>
#include <conio.h>
#include <math.h>
#define MAXLEN 100
int makehist( char *S, int *hist, int len) {
int wherechar[256];
int i,histlen;
histlen=0;
for (i=0;i<256;i++)
wherechar[i]=-1;
for (i=0;i<len;i++) {
if (wherechar[(int)S[i]]==-1) {
wherechar[(int)S[i]]=histlen;
histlen++;
}
hist[wherechar[(int)S[i]]]++;
}
return histlen;
}
double entropy(int *hist, int histlen, int len) {
int i;
double H;
H=0;
for (i=0;i<histlen;i++) {
H-=(double)hist[i]/len*log((double)hist[i]/len);
}
return H;
}
void main() {
char S[100];
int len,*hist,histlen;
int num;
double H;
int i=0;
int count =0;
FILE*file = fopen("freq.exe","r");
while (fscanf(file,"%d",&num)>0)
{
S[i]=num;
printf("%d",S[i]);
i++;
}
hist=(int*)calloc(i,sizeof(int));
histlen=makehist(S,hist,i);
H=entropy(hist,histlen,i);
printf("%lf\n",H);
getch();
}
while (fscanf(file,"%d",&num)>0)
This reads numbers encoded as leading white space, optional sign, and a sequence of digits. As soon as some other character is encountered in your file (probably the first byte), your loop will stop. You need to read raw bytes, with getc or fread.
Also, please consider doing the most basic debugging before submitting a question to StackOverflow. Surely your printf in that loop never printed anything, yet you don't mention this in your question and apparently didn't investigate why.
Some other issues:
#define MAXLEN 100
This is never used.
void main()
This is not a valid definition of main. Use
int main(void)
char S[100];
You have undefined behavior if the input contains more than 100 chars, and a .exe file surely will. You really should be feeding the bytes into your histogram calculation as you read them, rather than storing them in a buffer. Easiest is to make wherechar and histlen globals, but you could also put everything you need into a struct and pass a pointer to the struct, together with each byte, to makehist, and again pass a pointer to the struct to entropy.
FILE*file = fopen("freq.exe","r");
Binary files must be opened with "rb" (doesn't matter on linux but does on Windows).
Also, you should check whether fopen succeeds.
hist=(int*)calloc(i,sizeof(int));
hist should have 256 elements. If you allocate this first, then you can process each byte as it is read per above.
You do a divide by zero in entropy if the file is empty ... you should check for len == 0.
wherechar[(int)S[i]] is undefined behavior if the file has chars with negative values, as it surely will. You should use unsigned char instead of char, and then the casts aren't necessary.
This line seems to be reading numbers:
fscanf(file,"%d",&num)
But I don't really expect to find many numbers in an EXE file.
They'd be random byte-values of all different types.
Numbers are only the digits 0-9 (and - & + signs as well).
I am attempting to read the size values from the header of a .pgm image file (mars.pgm), and assign the resulting values to the integer variables u and v using sscanf.
When executed the program prints P5 832 700 127 in the first line, which is correct (the 832 and 700 are the size values that I want to pick out).
In the second line that is meant to print u and v variables two very large numbers are printed, instead of the 832 and 700 values.
I cannot figure out why this is not working as desired. When using the small test program (located at the bottom of the post) sscanf picks out the values from a string like I expected it to.
#include<stdio.h>
#include <string.h>
int main()
{
FILE *fin;
fin= fopen ("mars.pgm","r+");
if (fin == NULL)
{
printf ("ERROR");
fclose(fin);
}
int u,v,i,d,c;
char test[20];
for (i=0; i<=20; i++)
{
test[i]=getc(fin);
}
sscanf(test,"%d,%d,%d,%d",&c,&u,&v,&d);
printf("%s\n",test);
printf("%d %d",u, v);
fclose(fin);
}
small test Program
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void main(void)
{
int a;
char s[3];
s[0]='1';
s[1]=' ';
s[2]='2';
sscanf(s,"%d",&a);
printf("%d",a);
}
First of all, I advise you to make a small test: initialize your variables with a 0, for instance, and verify what value they are holding after read operation.
Then, try removing , characters from your format string. Check if it works then.
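The behavior you see happens because sscanf() and its relatives match the format literally and stop at the first mismatch. Your data has no commas, so matching stops where the format expects one, and the remaining variables are never assigned; they keep whatever garbage they held before. Note also that the header begins with P5, which %d cannot match at all. Check the return value of sscanf(): it tells you how many fields were actually converted.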
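As a sketch of what a matching format could look like for the header shown in the question (P5 832 700 127): parse_pgm_header is a hypothetical helper, and it assumes the header contains no '#' comment lines, which real PGM files are allowed to have.

```c
#include <stdio.h>

/* Parse "P<n> <width> <height> <maxval>" from a NUL-terminated buffer.
   Returns 1 if all four fields were converted, 0 otherwise.
   Assumption: no '#' comment lines inside the header. */
int parse_pgm_header(const char *s, int *magic, int *w, int *h, int *maxval)
{
    /* The literal 'P' must match exactly; a space in the format matches
       any run of whitespace, including newlines. */
    return sscanf(s, "P%d %d %d %d", magic, w, h, maxval) == 4;
}
```

The buffer passed in has to be a proper string, so leave room for the terminating '\0' and set it after copying the bytes from the file.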
I am primarily a Python programmer, but I've been working with C because Python is too slow for graphics (20 fps moving fractals FTW). I'm hitting a sticking point though...
I wrote a little file in a hex editor to test with. When I try reading the first byte, 5A, it correctly gives me 90 with a program like this...
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
FILE *data;
int main(int argc, char* argv[])
{
data=fopen("C:\\vb\\svotest1.vipc","r+b");
unsigned char number;
fread(&number,1,1,data);
printf("%d\n",number);
}
But when I try reading the first four bytes, 5A F3 5B 20, into an integer I get 542896986
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
FILE *data;
int main(int argc, char* argv[])
{
data=fopen("C:\\vb\\svotest1.vipc","r+b");
unsigned long number;
fread(&number,1,4,data);
printf("%d\n",number);
}
It should be 1525898016!!!
The problem is it has reversed the byte order. GAH! Of course! The way this program works will depend on the machine. And now that we are on the subject, even the byte won't work on every machine!
So I need help... In Python I can use struct.pack and struct.unpack to pack data into bytes using any format (long, short, single, double, signed, unsigned, big endian, little endian) and unpack it. I need something like this in C... I could write it myself but I don't know how.
The easiest way to handle this portably is probably to use htonl on the data before you write it to the file (so the file will be in network/big endian order) and ntohl when you read (converts the network order data to the local convention, whatever that is).
If you have to do much more than a few values of one type, you may want to look into a more complete library for the purpose, such as Sun XDR or Google Protocol Buffers.
I think you can work out the machine's endianness (http://en.wikipedia.org/wiki/Endianness) and then read the bytes in the matching order.
Like this?
unsigned long bin = 0;
unsigned char temp = 0;
for(unsigned char i = 0; i < 4; i++) {
fread(&temp,1,1,data);
bin = (bin << 8) | temp;
}
This makes the first byte read the most significant one (big endian). I think that's what you want.
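A slightly fuller sketch of the same idea, closer in spirit to Python's struct.unpack with '>I' and '<I'. The function names are invented for this example. Because the value is built from individual bytes with shifts, the result does not depend on the byte order of the machine running the code.

```c
#include <stdint.h>
#include <stdio.h>

/* Interpret 4 bytes as an unsigned 32-bit value, most significant first. */
uint32_t unpack_u32_be(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Interpret 4 bytes as an unsigned 32-bit value, least significant first. */
uint32_t unpack_u32_le(const unsigned char b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}

/* Read 4 bytes from a stream opened in binary mode and decode them as
   big endian. Returns 1 on success, 0 on a short read. */
int read_u32_be(FILE *f, uint32_t *out)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof b, f) != sizeof b)
        return 0;
    *out = unpack_u32_be(b);
    return 1;
}
```

For the bytes 5A F3 5B 20 from the question, the big-endian reading is 1525898016 and the little-endian reading is 542896986. When printing, use a format that matches the type, for example "%lu" with a cast to unsigned long.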
I wish to open a binary file, to read the first byte of the file and finally to print the hex value (in string format) to stdout (ie, if the first byte is 03 hex, I wish to print out 0x03 for example). The output I get does not correspond with what I know to be in my sample binary, so I am wondering if someone can help with this.
Here is the code:
#include <stdio.h>
#include <fcntl.h>
int main(int argc, char* argv[])
{
int fd;
char raw_buf[1],str_buf[1];
fd = open(argv[1],O_RDONLY|O_BINARY);
/* Position at beginning */
lseek(fd,0,SEEK_SET);
/* Read one byte */
read(fd,raw_buf,1);
/* Convert to string format */
sprintf(str_buf,"0x%x",raw_buf);
printf("str_buf= <%s>\n",str_buf);
close (fd);
return 0;
}
The program is compiled as follows:
gcc rd_byte.c -o rd_byte
and run as follows:
rd_byte BINFILE.bin
Knowing that the sample binary file used has 03 as its first byte, I get the output:
str_buf= <0x22cce3>
What I expect is
str_buf= <0x03>
Where is the error in my code?
Thank you for any help.
You're printing the value of the pointer raw_buf, not the memory at that location:
sprintf(str_buf,"0x%x",raw_buf[0]);
As Andreas said, str_buf is also not big enough. But: no need for a second buffer, you could just call printf directly.
printf("0x%x",raw_buf[0]);
Less is more...
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char* argv[]) {
int fd;
unsigned char c;
/* needs error checking */
fd = open(argv[1], O_RDONLY);
read(fd, &c, sizeof(c));
close(fd);
printf("<0x%x>\n", c);
return 0;
}
seeking is not needed
if you want to read a byte use an unsigned char
printf will do the format
I think that you are overcomplicating things and using non-portable constructs where they aren't really necessary.
You should be able to just do:
#include <stdio.h>
int main(int argc, char** argv)
{
if (argc < 2)
return 1; /* TODO: better error handling */
FILE* f = fopen(argv[1], "rb");
/* TODO: check f is not NULL */
/* Read one byte */
int first = fgetc(f);
if (first != EOF)
printf("first byte = %x\n", (unsigned)first);
/* TODO else read failed, empty file?? */
fclose(f);
return 0;
}
str_buf has a size of only 1 (char str_buf[1];); it should be at least 5 bytes long (4 for 0xXX plus the terminating \0).
Moreover, change
sprintf(str_buf,"0x%x",raw_buf);
to
sprintf(str_buf,"0x%x",*raw_buf);
otherwise you'll print the address of the raw_buf pointer, instead of its value (that you obtain by dereferencing the pointer).
Finally, make sure raw_buf is unsigned. The standard specifies that the signedness of char (where not explicitly specified) is implementation-defined, i.e., every implementation decides whether it is signed or not. In practice, on many implementations char is signed by default unless you're compiling with a particular flag. When dealing with bytes, always make sure they are unsigned; otherwise you'll get surprising results should you want to convert them to integers.
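Both points (buffer size and unsigned bytes) fit in a few lines. A minimal sketch, where format_byte_hex is a name made up for this example:

```c
#include <stdio.h>

/* Format one byte as "0xNN". `out` needs room for 4 characters plus
   the terminating NUL. Returns 0 on success, -1 if it is too small. */
int format_byte_hex(char *out, size_t outsz, unsigned char byte)
{
    /* snprintf never writes past outsz and returns the length it needed. */
    int n = snprintf(out, outsz, "0x%02x", (unsigned)byte);

    return (n > 0 && (size_t)n < outsz) ? 0 : -1;
}
```

Taking the byte as unsigned char is what keeps the output to two hex digits. Where plain char is signed, a byte such as 0xE3 stored in a char becomes negative, and passing it to %x would typically print ffffffe3 instead of e3.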
Using the information from the various responses above (thank you all!) I would like to post this piece of code which is a trimmed down version of what I finally used.
There is, however, a difference between what the following code does and what was described in my original question: this code does not read the first byte of the binary file header as described originally, but instead reads the 11th and 12th bytes (offsets 10 & 11) of the input binary file (a .DBF file). The 11th and 12th bytes contain the length of a data record (this is what I want to know in fact) with the Least Significant Byte positioned first: for example, if the 11th and 12th bytes are respectively 0x06 0x08, then the length of a data record would be 0x0806 bytes, or 2054 bytes in decimal.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char* argv[]) {
int fd;
unsigned int dec;
unsigned char c[2]; /* two bytes are read below, so two elements are needed */
char hex_buf[5]; /* 4 hex digits plus the terminating \0 */
/* No error checking, etc. done here for brevity */
/* Open the file given as the input argument */
fd = open(argv[1], O_RDONLY);
/* Position ourselves on the 11th byte aka offset 10 of the input file */
lseek(fd,10,SEEK_SET);
/* read 2 bytes into memory location c */
read(fd, c, sizeof(c));
close(fd);
/* write the data at c to the buffer hex_buf in the required (reverse) byte order + formatted */
sprintf(hex_buf,"%.2x%.2x",c[1],c[0]);
printf("Hexadecimal value:<0x%s>\n", hex_buf);
/* parse the hex digits in hex_buf back into the unsigned integer dec */
sscanf(hex_buf, "%x", &dec);
printf("Answer: Size of a data record=<%u>\n", dec);
return 0;
}
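As a side note, the round trip through a hex string is not strictly needed: the two bytes can be combined arithmetically. A small sketch, with u16_from_le as a hypothetical helper name:

```c
#include <stdint.h>

/* Combine two bytes stored least-significant first into one 16-bit value. */
uint16_t u16_from_le(const unsigned char b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));
}
```

With the example bytes 0x06 0x08 this gives 0x0806, i.e. 2054, the same record length as above.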