Reading numbers from file in C

I am primarily a Python programmer, but I've been working with C because Python is too slow for graphics (20 fps moving fractals FTW). I'm hitting a sticking point though...
I wrote a little file in a hex editor to test with. When I try reading the first byte, 5A, it correctly gives me 90 with a program like this...
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

FILE *data;

int main(int argc, char* argv[])
{
    data = fopen("C:\\vb\\svotest1.vipc", "r+b");
    unsigned char number;
    fread(&number, 1, 1, data);
    printf("%d\n", number);
}
But when I try reading the first four bytes, 5A F3 5B 20, into an integer I get 542896986
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

FILE *data;

int main(int argc, char* argv[])
{
    data = fopen("C:\\vb\\svotest1.vipc", "r+b");
    unsigned long number;
    fread(&number, 1, 4, data);
    printf("%d\n", number);
}
It should be 1525898016!!!
The problem is it has reversed the byte order. GAH! Of course! The way this program works will depend on the machine. And now that we are on the subject, even the byte won't work on every machine!
So I need help... In Python I can use struct.pack and struct.unpack to pack data into bytes using any format (long, short, single, double, signed, unsigned, big endian, little endian) and unpack it. I need something like this in C... I could write it myself but I don't know how.

The easiest way to handle this portably is probably to use htonl on the data before you write it to the file (so the file will be in network/big endian order) and ntohl when you read (converts the network order data to the local convention, whatever that is).
If you have to do much more than a few values of one type, you may want to look into a more complete library for the purpose, such as Sun XDR or Google Protocol Buffers.
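For a single 32-bit value, a minimal sketch of that round trip might look like the following (assuming a POSIX system, where htonl/ntohl live in <arpa/inet.h>; on Windows they come from <winsock2.h>, and the file name numbers.bin is just a placeholder):
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl/ntohl; on Windows include <winsock2.h> instead */

int main(void)
{
    const char *path = "numbers.bin";   /* placeholder file name */
    uint32_t value = 1525898016;

    /* Write the value in network (big-endian) byte order. */
    FILE *out = fopen(path, "wb");
    if (out == NULL) return 1;
    uint32_t be = htonl(value);
    fwrite(&be, sizeof be, 1, out);
    fclose(out);

    /* Read it back and convert from network order to whatever the host uses. */
    FILE *in = fopen(path, "rb");
    if (in == NULL) return 1;
    uint32_t raw;
    if (fread(&raw, sizeof raw, 1, in) == 1)
        printf("%u\n", (unsigned)ntohl(raw));
    fclose(in);
    return 0;
}
This prints 1525898016 regardless of the host's endianness, because the on-disk format is pinned to big endian.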

I think you can figure out the endianness (http://en.wikipedia.org/wiki/Endianness) and then read the bytes in the right order.
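If you'd rather detect the host's byte order at run time than pin the file format, one common sketch (not the only way) is to look at where a known integer keeps its low byte:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t probe = 1;   /* 0x00000001 */
    unsigned char first_byte = *(unsigned char *)&probe;

    /* On a little-endian machine the least significant byte comes first in memory. */
    if (first_byte == 1)
        puts("little-endian host");
    else
        puts("big-endian host");
    return 0;
}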

Like this?
unsigned long bin = 0;
unsigned char temp = 0;
for (unsigned char i = 0; i < 4; i++) {
    fread(&temp, 1, 1, data);
    bin = (bin << 8) | temp;
}
This results in the first part being the most significant. I think that's what you want.

Related

Why is fwrite writing strange things?

So here's what I got:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
int main()
{
    FILE *fin;
    struct STR {
        float d;
        int x;
    } testing;

    testing.d = 11.12;
    testing.x = 31121;

    fin = fopen("output.txt", "w");
    //fprintf(fin,"%7.4f %7d\n",testing.d,testing.x);
    fwrite(&testing, sizeof(struct STR), 1, fin);
    fclose(fin);

    return 0;
}
So what happens when I compile and run? I get this:
"…ë1A‘y "
When I comment out the fwrite and use the fprintf, I get this:
"11.1200 31121"
Can someone explain this to me? I tried running it on windows and on linux, and both times the output was obscure.
Also, I guess while we're on the subject, how come the size of the text file with "11.1200 31121" is 16 bytes? I thought that integers (on a 32-bit machine) were 4 bytes each? Is it 16 bytes because there are 16 total characters in the txt file?
Thanks
You are opening the file as a text file, but you are writing binary data, so it's not human readable. To read it back properly you need fread(); fprintf(), by contrast, writes text, as you can check.
So
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
struct STR {
    float d;
    int x;
} testing;

int main()
{
    FILE *file;

    testing.d = 11.12;
    testing.x = 31121;

    file = fopen("output.txt", "wb");
    if (file != NULL)
    {
        fwrite(&testing, sizeof(struct STR), 1, file);
        fclose(file);
    }

    file = fopen("output.txt", "rb");
    if ((file != NULL) && (fread(&testing, sizeof(struct STR), 1, file) == 1))
    {
        fprintf(stdout, "%f -- %d\n", testing.d, testing.x);
        fclose(file);
    }

    return 0;
}
should make it clear.
As iharob said, you're writing binary data that get interpreted as nonsense characters in the current locale, not human-readable ASCII. As for the 16 bytes: that is the size of the text file, and it comes from the format string rather than from the structure. "%7.4f %7d\n" prints "11.1200" (7 characters), a space, 31121 padded to a width of 7 characters, and a newline, which is 16 characters in total. The structure written by fwrite() is a different size (typically 8 bytes here: a 4-byte float plus a 4-byte int), and in general compilers may insert padding between or after members to satisfy alignment requirements, so sizeof(struct STR) need not equal the sum of the member sizes.
If you really want to serialize your data in a portable binary format, or transmit it over a network, you should both use an exact-width type such as int32_t rather than int (which has been 16, 32 or 64 bits and might have other widths than those) and also convert to a specific endianness rather than whatever the native byte order happens to be. The classic solution is htonl(). Also, write out each field separately to avoid problems with padding, or use a compiler extension to pack your structure and turn padding off.
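As a concrete illustration of the field-by-field approach, here is a hedged sketch for the struct above; write_record and the file name output.bin are made up for the example, and it assumes float is a 32-bit IEEE-754 type (true on common platforms, but not guaranteed by the C standard) and that htonl is available from <arpa/inet.h> (or <winsock2.h> on Windows):
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl */

struct STR {
    float d;
    int   x;
};

/* Serialize one record field by field, big-endian, with no padding bytes. */
static int write_record(FILE *f, const struct STR *r)
{
    uint32_t bits;

    /* Reinterpret the float's bits as a 32-bit integer (assumes 32-bit IEEE-754 float). */
    memcpy(&bits, &r->d, sizeof bits);
    bits = htonl(bits);
    if (fwrite(&bits, sizeof bits, 1, f) != 1) return -1;

    /* Write the int as a fixed-width 32-bit value. */
    uint32_t xi = htonl((uint32_t)r->x);
    if (fwrite(&xi, sizeof xi, 1, f) != 1) return -1;

    return 0;
}

int main(void)
{
    struct STR testing = { 11.12f, 31121 };
    FILE *f = fopen("output.bin", "wb");
    if (f != NULL) {
        write_record(f, &testing);
        fclose(f);
    }
    return 0;
}
The reader does the mirror image: fread each 32-bit field, pass it through ntohl, and memcpy the float's bits back.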

correct use of malloc in function with passed uint32_t array pointer

I'm having difficulty using malloc in a function where I read a binary file of 4-byte unsigned integers, free the passed array reference, re-malloc it to the new size and then try to access members of the array. I think the problem is due to the uint32_t type, as the array seems to be treated as an array of 8-byte integers rather than 4-byte ones. I'm not exactly sure where I am going wrong; it might be the use of malloc (i.e. maybe I need to tell it to create uint32_t elements in a different way from what I am doing), or maybe it's something else. Code:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <malloc.h>
int filecheck(uint32_t **sched, int *count) {
    int v, size = 0;
    FILE *f = fopen("/home/pi/schedule/default", "rb");
    if (f == NULL) {
        return 0;
    }

    fseek(f, 0, SEEK_END);
    size = ftell(f);
    fseek(f, 0, SEEK_SET);

    int schedsize = sizeof(uint32_t);
    int elementcount = size / schedsize;

    free(*sched);
    *sched = malloc(size);
    if (elementcount != fread(sched, schedsize, elementcount, f)) {
        free(*sched);
        return 0;
    }
    fclose(f);

    // This works correctly and prints data as expected
    for (v = 0; v < elementcount; v++) {
        printf("Method 1 %02d %u \n", v, ((uint32_t*)sched)[v]);
    }

    // This skips every other byte but does not print any numbers > 32 bit unsigned
    for (v = 0; v < elementcount; v++) {
        printf("Method 2 %02d %u \n", v, sched[v]);
    }

    // This treats the binary file as if it was 64 bit uints, printing 64 bit numbers
    for (v = 0; v < elementcount; v++) {
        printf("Method 3 %02d %lu \n", v, sched[v]);
    }

    *count = elementcount;
    return 1;
}

int main() {
    uint32_t *sched = NULL;
    int i, count = 0;

    if (filecheck(&sched, &count)) {
        for (i = 0; i < count; i++) { // At the next line I get a segmentation fault
            printf("Method 4 %02d %u\n", i, sched[i]);
        }
    } else {
        printf("Error\n");
    }
    return 0;
}
I added the file reading code as requested. Trying to access the array sched in main() the way I am doing prints the data I am reading as if it were 8-byte integers. So I guess sched is being seen as an array of 64-bit integers rather than the 32-bit type I defined it as. I guess this is because I freed it and used malloc on it again. But I believe I instructed malloc that the type it should create is uint32_t, so I am confused as to why the data is not being treated as such.
Edit: Found a way to make it work, but I'm not sure if it's right (method 1). Is this the only and cleanest way to get this array treated as the uint32_t type? Surely the compiler should know what type I am dealing with without me having to cast it every time I use it.
Edit: Actually, when I try to access it in main, I get a seg fault. I added a comment in the code to reflect that. I didn't notice this before as I was using the for print loop in several places to trace how the data was being viewed by the compiler.
Edit: Not sure if there is a way I can add the binary data file. I am making it by hand in a hex editor and its exact content is not that important. In the bin file, FF FF FF FF FF FF FF FF should be read as 2 uint32_t values, rather than as 1 64-bit integer. fread() AFAIK does not care about this, it just fills the buffer, but I stand to be corrected if that's wrong.
TIA, Pete
// This works correctly and prints data as expected
for (v = 0; v < elementcount; v++) {
    printf("Method 1 %02d %u \n", v, ((uint32_t*)sched)[v]);
}
This is not quite right, because sched is a uint32_t**, and you allocated to *sched. And also because %u is not how you should print uint32_t.
First dereference sched and then apply the array index access.
You want printf("%02d %" PRIu32 "\n", v, (*sched)[v]); (PRIu32 comes from <inttypes.h>).
// This skips every other byte but does not print any numbers > 32 bit unsigned
for (v = 0; v < elementcount; v++) {
    printf("Method 2 %02d %u \n", v, sched[v]);
}
Yeah, that's because sched[v] is a uint32_t*, and since it's a pointer type and you're probably running on a 64-bit machine, it's probably 64 bits... so you're iterating in 8-byte increments instead of 4-byte ones and trying to print pointers with %u. It also reads out of bounds, because 8 × elementcount bytes is more than the 4 × elementcount bytes that were allocated, and that is likely to cause a segmentation fault.
// This treats the binary file as if it was 64 bit uints, printing 64 bit numbers
for (v = 0; v < elementcount; v++) {
    printf("Method 3 %02d %lu \n", v, sched[v]);
}
This is like the second example only you're printing with the marginally more appropriate but still incorrect %lu.
You need to fread() into *sched instead of sched, as well, or you will cause UB and this will very likely cause a segmentation fault when you try to read the array.
Also, don't rely on the for loop in main doing anything sensible: *count = elementcount is only reached after the bad fread(), so by that point the behaviour is already undefined and count (and the program as a whole) can't be trusted until that is fixed.
Other comments:
As you've just learned, don't cast the result of malloc() in C.
main() should return a value. 0 if everything went fine.
You can write some uint32_t to a file as bytes using fwrite(), of course.
Check the result of malloc() to see if it succeeded.
Use size_t for sizes and counts instead of int, and print it accordingly (%zu).
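Putting those fixes together, a corrected sketch of filecheck might look something like this (it keeps your hard-coded path and overall structure; PRIu32 comes from <inttypes.h>):
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>

int filecheck(uint32_t **sched, size_t *count)
{
    FILE *f = fopen("/home/pi/schedule/default", "rb");
    if (f == NULL)
        return 0;

    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (size <= 0) { fclose(f); return 0; }

    size_t elementcount = (size_t)size / sizeof(uint32_t);

    free(*sched);
    *sched = malloc(elementcount * sizeof **sched);
    if (*sched == NULL) { fclose(f); return 0; }

    /* Read into the buffer *sched points to, not into the pointer itself. */
    if (fread(*sched, sizeof **sched, elementcount, f) != elementcount) {
        free(*sched);
        *sched = NULL;
        fclose(f);
        return 0;
    }
    fclose(f);

    *count = elementcount;
    return 1;
}

int main(void)
{
    uint32_t *sched = NULL;
    size_t i, count = 0;

    if (filecheck(&sched, &count)) {
        for (i = 0; i < count; i++)
            printf("%02zu %" PRIu32 "\n", i, sched[i]);
    } else {
        printf("Error\n");
    }
    free(sched);
    return 0;
}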

I'm unable to calculate the entropy of a .exe file

I'm trying to calculate the entropy of a .exe file by giving it as an input. However, I'm getting a zero value instead of an answer.
The entropy of a file can be understood as the summation of (pi*log(pi)) over every character in the file. I'm trying to calculate the entropy of a .exe file, but I end up getting '0'. The .exe file definitely has content.
Below is my code.
#include <stdio.h>
#include <stdlib.h>
#include "stdbool.h"
#include <string.h>
#include <conio.h>
#include <math.h>
#define MAXLEN 100

int makehist(char *S, int *hist, int len) {
    int wherechar[256];
    int i, histlen;

    histlen = 0;
    for (i = 0; i < 256; i++)
        wherechar[i] = -1;

    for (i = 0; i < len; i++) {
        if (wherechar[(int)S[i]] == -1) {
            wherechar[(int)S[i]] = histlen;
            histlen++;
        }
        hist[wherechar[(int)S[i]]]++;
    }
    return histlen;
}

double entropy(int *hist, int histlen, int len) {
    int i;
    double H;

    H = 0;
    for (i = 0; i < histlen; i++) {
        H -= (double)hist[i]/len * log((double)hist[i]/len);
    }
    return H;
}

void main() {
    char S[100];
    int len, *hist, histlen;
    int num;
    double H;
    int i = 0;
    int count = 0;

    FILE *file = fopen("freq.exe", "r");
    while (fscanf(file, "%d", &num) > 0)
    {
        S[i] = num;
        printf("%d", S[i]);
        i++;
    }

    hist = (int*)calloc(i, sizeof(int));
    histlen = makehist(S, hist, i);
    H = entropy(hist, histlen, i);
    printf("%lf\n", H);
    getch();
}
while (fscanf(file,"%d",&num)>0)
This reads numbers encoded as leading white space, optional sign, and a sequence of digits. As soon as some other character is encountered in your file (probably the first byte), your loop will stop. You need to read raw bytes, with getc or fread.
Also, please consider doing the most basic debugging before submitting a question to StackOverflow. Surely your printf in that loop never printed anything, yet you don't mention this in your question and apparently didn't investigate why.
Some other issues:
#define MAXLEN 100
This is never used.
void main()
This is not a valid definition of main. Use
int main(void)
char S[100];
You have undefined behavior if the input contains more than 100 chars, and a .exe file surely will. You really should be feeding the bytes into your histogram calculation as you read them, rather than storing them in a buffer. Easiest is to make wherechar and histlen globals, but you could also put everything you need into a struct and pass a pointer to the struct, together with each byte, to makehist, and again pass a pointer to the struct to entropy.
FILE*file = fopen("freq.exe","r");
Binary files must be opened with "rb" (doesn't matter on linux but does on Windows).
Also, you should check whether fopen succeeds.
hist=(int*)calloc(i,sizeof(int));
hist should have 256 elements. If you allocate this first, then you can process each byte as it is read per above.
You do a divide by zero in entropy if the file is empty ... you should check for len == 0.
wherechar[(int)S[i]] is undefined behavior if the file has chars with negative values, as it surely will. You should use unsigned char instead of char, and then the casts aren't necessary.
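Putting those points together, one hedged sketch of the whole thing (streaming bytes straight into a 256-entry histogram, so no fixed-size buffer is needed) could be:
#include <stdio.h>
#include <math.h>

int main(void)
{
    FILE *file = fopen("freq.exe", "rb");   /* "rb": the input is binary */
    if (file == NULL) {
        fprintf(stderr, "cannot open file\n");
        return 1;
    }

    /* Count how often each byte value 0..255 occurs. */
    long hist[256] = {0};
    long len = 0;
    int c;
    while ((c = getc(file)) != EOF) {
        hist[c]++;
        len++;
    }
    fclose(file);

    if (len == 0) {
        fprintf(stderr, "empty file\n");
        return 1;
    }

    /* H = -sum p_i * log(p_i), taken over the byte values that actually occur. */
    double H = 0.0;
    for (int i = 0; i < 256; i++) {
        if (hist[i] > 0) {
            double p = (double)hist[i] / len;
            H -= p * log(p);
        }
    }
    printf("%lf\n", H);
    return 0;
}
(Use log2 instead of log if you want the entropy in bits per byte.)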
This line seems to be reading numbers:
fscanf(file,"%d",&num)
But I don't really expect to find many numbers in an EXE file.
They'd be random byte-values of all different types.
Numbers, as far as fscanf's %d is concerned, are only the digits 0-9 (plus optional - and + signs).

Can't assign wide char into wide char field.

I have been given this school project. I have to alphabetically sort list of items by Czech rules. Before I dig deeper, I have decided to test it on a 16 by 16 matrix so I did this:
typedef struct {
    wint_t **field;
} LIST;
...
setlocale(LC_CTYPE, NULL);
...
list->field = (wint_t **)malloc(16 * sizeof(wint_t *));
for (int i = 0; i < 16; i++)
    list->field[i] = (wint_t *)malloc(16 * sizeof(wint_t));
In another function I am trying to assign a char. Like this:
sorted->field[15][15] = L'C';
wprintf(L"%c\n",sorted->field[15][15]);
Everything is fine. Char is printed. But when I try to change it to
sorted->field[15][15] = L'Č';
It says: Extraneous characters in wide character constant ignored. (Xcode) And the printing part is skipped. The main.c file is in UTF-8. If I try to print this:
printf("ěščřžýááíé\n");
It prints it out as written. I am not sure if I should allocate memory using wint_t or wchar_t, or if I am doing it right. I tested it with both, but neither of them works.
clang seems to support entering arbitrary byte sequences into wide strings with the \x notation:
wchar_t c = L'\x2126';
This compiles without notice.
Edit: Adapting what I find on wikipedia about wide characters, the following works for me:
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>
int main(void)
{
    setlocale(LC_ALL, "");
    wchar_t myChar1 = L'\x2126';
    wchar_t myChar2 = 0x2126; // hexadecimal encoding of char Ω using UTF-16
    wprintf(L"This is char: %lc \n", myChar1);
    wprintf(L"This is char: %lc \n", myChar2);
}
and prints nice Ω characters in my terminal. Make sure that your terminal is able to interpret UTF-8 characters.

C Programming : how do I read and print out a byte from a binary file?

I wish to open a binary file, to read the first byte of the file and finally to print the hex value (in string format) to stdout (ie, if the first byte is 03 hex, I wish to print out 0x03 for example). The output I get does not correspond with what I know to be in my sample binary, so I am wondering if someone can help with this.
Here is the code:
#include <stdio.h>
#include <fcntl.h>
int main(int argc, char* argv[])
{
    int fd;
    char raw_buf[1], str_buf[1];

    fd = open(argv[1], O_RDONLY|O_BINARY);

    /* Position at beginning */
    lseek(fd, 0, SEEK_SET);
    /* Read one byte */
    read(fd, raw_buf, 1);
    /* Convert to string format */
    sprintf(str_buf, "0x%x", raw_buf);
    printf("str_buf= <%s>\n", str_buf);

    close(fd);
    return 0;
}
The program is compiled as follows:
gcc rd_byte.c -o rd_byte
and run as follows:
rd_byte BINFILE.bin
Knowing that the sample binary file used has 03 as its first byte, I get the output:
str_buf= <0x22cce3>
What I expect is
str_buf= <0x03>
Where is the error in my code?
Thank you for any help.
You're printing the value of the pointer raw_buf, not the memory at that location:
sprintf(str_buf,"0x%x",raw_buf[0]);
As Andreas said, str_buf is also not big enough. But: no need for a second buffer, you could just call printf directly.
printf("0x%x",raw_buf[0]);
Less is more...
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char* argv[]) {
    int fd;
    unsigned char c;

    /* needs error checking */
    fd = open(argv[1], O_RDONLY);
    read(fd, &c, sizeof(c));
    close(fd);

    printf("<0x%x>\n", c);
    return 0;
}
seeking is not needed
if you want to read a byte, use an unsigned char
printf will do the formatting
I think that you are overcomplicating things and using non-portable constructs where they aren't really necessary.
You should be able to just do:
#include <stdio.h>
int main(int argc, char** argv)
{
    if (argc < 2)
        return 1; /* TODO: better error handling */

    FILE* f = fopen(argv[1], "rb");
    /* TODO: check f is not NULL */

    /* Read one byte */
    int first = fgetc(f);

    if (first != EOF)
        printf("first byte = %x\n", (unsigned)first);
    /* TODO else read failed, empty file?? */

    fclose(f);
    return 0;
}
str_buf has a size of only 1 (char str_buf[1];); it should be at least 5 bytes long (4 for "0xXX" plus the terminating \0).
Moreover, change
sprintf(str_buf,"0x%x",raw_buf);
to
sprintf(str_buf,"0x%x",*raw_buf);
otherwise you'll print the address of the raw_buf pointer, instead of its value (that you obtain by dereferencing the pointer).
Finally, make sure raw_buf is unsigned. The standard says that the signedness of plain char (where not explicitly specified) is implementation-defined, i.e. every implementation decides whether it is signed or not. In practice, on most implementations it is signed by default unless you compile with a particular flag. When dealing with bytes, always make sure they are unsigned; otherwise you'll get surprising results if you convert them to integers.
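A quick illustration of that surprise, assuming a platform where plain char is signed:
#include <stdio.h>

int main(void)
{
    char          s = 0xF3;   /* implementation-defined; typically ends up as -13 */
    unsigned char u = 0xF3;   /* always 243 */

    /* When promoted to int, a signed char sign-extends to a negative value. */
    printf("signed char:   %d\n", s);   /* typically -13 */
    printf("unsigned char: %d\n", u);   /* always   243 */
    return 0;
}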
Using the information from the various responses above (thank you all!) I would like to post this piece of code which is a trimmed down version of what I finally used.
There is, however, a difference between what the following code does and what was described in my original question: this code does not read the first byte of the binary file header as described originally, but instead reads the 11th and 12th bytes (offsets 10 & 11) of the input binary file (a .DBF file). The 11th and 12th bytes contain the length of a data record (this is what I actually want to know), with the least significant byte positioned first: for example, if the 11th and 12th bytes are respectively 0x06 0x08, then the length of a data record would be 0x0806 bytes, or 2054 bytes in decimal.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>   /* for lseek/read/close */

int main(int argc, char* argv[]) {
    int fd;
    unsigned int dec;
    unsigned char c[2];   /* room for the two bytes we read */
    char hex_buf[6];

    /* No error checking, etc. done here for brevity */

    /* Open the file given as the input argument */
    fd = open(argv[1], O_RDONLY);

    /* Position ourselves on the 11th byte aka offset 10 of the input file */
    lseek(fd, 10, SEEK_SET);

    /* Read 2 bytes into memory location c */
    read(fd, c, sizeof(c));
    close(fd);

    /* Write the data at c to the buffer hex_buf in the required (reverse) byte order, formatted */
    sprintf(hex_buf, "%.2x%.2x", c[1], c[0]);
    printf("Hexadecimal value:<0x%s>\n", hex_buf);

    /* Convert the hex string in hex_buf back to a decimal value in dec */
    sscanf(hex_buf, "%x", &dec);
    printf("Answer: Size of a data record=<%u>\n", dec);

    return 0;
}
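A small aside on the design: the sprintf/sscanf round trip works, but the same little-endian value can be assembled directly with shifts, with no text buffers at all. A standalone sketch using the example bytes from the description above:
#include <stdio.h>

int main(void)
{
    /* Example bytes from the question: 0x06 then 0x08, least significant byte first. */
    unsigned char c[2] = { 0x06, 0x08 };

    /* Assemble the little-endian 16-bit value directly with shifts. */
    unsigned int record_len = (unsigned int)c[0] | ((unsigned int)c[1] << 8);

    printf("Size of a data record=<%u>\n", record_len);   /* prints 2054 */
    return 0;
}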
