I'm dabbling in a bit of Game Boy save state hacking, and I'm currently getting my head around reading from a binary file in C. I understand that char is one byte so I'm reading in chars using the following code
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
int main () {
FILE *fp = NULL;
char buffer[100];
int filesize;
fp = fopen("save file.srm", "r+"); //open
if(fp == NULL) printf("File loading error number %i\n", errno);
fseek(fp, 0L, SEEK_END); //seek to end to get file size
filesize = ftell(fp);
printf("Filesize is %i bytes\n",filesize);
fseek(fp, 0, SEEK_SET); //set read point at start of file
fread(buffer, sizeof buffer, 1, fp);//read into buffer
for(int i=70;i<80;i++) printf("%x\n", buffer[i]); //display
fclose(fp);
return(0);
}
The output I get is
Filesize is 32768 bytes
ffffff89
66
ffffff85
2
2
ffffff8b
44
ffffff83
c
0
I'm trying to load in bytes, so I want each row of the output to have maximum value 0xff. Can someone explain why I'm getting ffffff8b?
if(fp == NULL) printf("File loading error number %i\n", errno);
When you detect an error, do not just print a message. Either exit the program or do something to correct for the error.
char buffer[10];
Use unsigned char for working with raw data. char may be signed, which can cause undesired effects.
fread(buffer, strlen(buffer)+1, 1, fp);
buffer has not been initialized at this point, so the behavior of strlen(buffer) is not defined by the C standard. In any case, you do not want to use the length of the string currently in buffer as the size for fread. You want the size of the array. So use sizeof buffer (without the +1).
for(int i=0;i<10;i++)
Do not iterate to ten. Iterate to the number of bytes put into the buffer by fread. fread returns a size_t value that is the number of items read. If you call it as size_t n = fread(buffer, 1, sizeof buffer, fp);, the number of items (in n) will be the number of bytes read, since passing 1 as the second argument says each item to read is one byte.
printf("%x\n", buffer[i]);
To print an unsigned char, use %hhx. Because your buffer had signed char elements, some of them were negative. When used in this printf, they were promoted to negative int values. Then, because of the %x, printf attempted to print them as unsigned int values. All the extra bits from the negative values in two’s complement form showed up.
Very simply: char can be signed or unsigned by default: that's down to the compiler. In your case, it appears to be signed.
When you pass the char of buffer[i] to printf(), it is promoted to int, and sign-extended if the original char value had its top bit set. Hence anything that's in the range 0x80-0xff gets a lot of fs prefixing the value.
If you declare buffer to be unsigned char, this problem should not occur. But you should, in combination with that, use "%hhx" rather than "%x" for your printf() format, since the hh length modifier forces printf() to mask the input value so that only those bits applicable to an unsigned char (given that you're using the x specifier) are used.
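Putting the advice above together, here is a minimal sketch (the helper names read_bytes, byte_to_hex and print_bytes are made up for illustration, not part of the original code). It assumes the buffer is declared as unsigned char, takes the byte count from fread's return value, and prints with %02hhx so no byte is ever shown wider than two hex digits:

```c
#include <stdio.h>

/* Read up to cap bytes from fp into buf; returns how many were read. */
size_t read_bytes(FILE *fp, unsigned char *buf, size_t cap)
{
    /* Element size is 1, so the return value is a byte count. */
    return fread(buf, 1, cap, fp);
}

/* Write b as two lowercase hex digits plus a terminator into out. */
void byte_to_hex(unsigned char b, char out[3])
{
    snprintf(out, 3, "%02hhx", b);
}

/* Print n bytes, one per line, never wider than two hex digits. */
void print_bytes(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%02hhx\n", buf[i]);
}
```

In the question's program this would be used roughly as size_t n = read_bytes(fp, buffer, sizeof buffer); followed by print_bytes(buffer + 70, 10); once n is known to be at least 80.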
I recently started dabbling in C again, a language I'm not particularly proficient at and, in fact, keep forgetting (I mostly code in Python). My idea here is to read data from a hypothetically large file as chunks and then process the data accordingly. For now, I'm simulating this by actually loading the whole file into a buffer of type short with fread. This method will be changed, since it would be a very bad idea for, say, a file that's 1 GB, I'd think. The end goal is to read a chunk as one, process, move the cursor, read another chunk and so on.
The file in question is 43 bytes and has the phrase "The quick brown fox jumps over the lazy dog". This size is convenient because it's a prime number, so no matter how many bytes I split it into, there will always be trailing garbage (due to the buffer having leftover space?). Data processing in this case is just printing out the shorts as two chars after byte manipulation (see code below)
#include <stdio.h>
#include <stdlib.h>
#define MAX_BUFF_SIZE 1024
long file_size(FILE *f)
{
if (fseek(f, 0, SEEK_END) != 0) exit(EXIT_FAILURE); // Move cursor to the end
long file_size = ftell(f); // Determine position to get file size
rewind(f);
return file_size;
}
int main(int argc, char* argv[])
{
short buff[MAX_BUFF_SIZE] = {0}; // Initialize to 0 remove trailing garbage
char* filename = argv[1];
FILE* fp = fopen(filename, "r");
if (fp)
{
size_t size = sizeof(buff[0]); // Size in bytes of each chunk. Fixed to 2 bytes
int nmemb = (file_size(fp) + size - 1) / size; // Number of chunks to read from file
// (ceil f_size/size)
printf("Should read at most %d chunks\n", nmemb);
short mask = 0xFF; // Mask to take first or second byte
size_t num_read = fread(buff, size, nmemb, fp);
printf("Read %zu chunks\n\n", num_read); // Seems to have read more? Look into.
for (int i=0; i<nmemb; i++) {
char first_byte = buff[i] & mask;
char second_byte = (buff[i] >> 8) & mask; // Identity for 2 bytes. Keep mask for consistency
printf("Chunk %02d: 0x%04x | %c %c\n", // Remember little endian (bytes reversed)
i, buff[i], first_byte, second_byte);
}
fclose(fp);
} else
{
printf("File %s not found\n", filename);
return 1;
}
return 0;
}
Now yesterday, on printing out the last chunk of data I was getting "Chunk 21: 0xffff9567 | g". The last (first?) byte (0x67) is g, and I did expect some trailing garbage, but I don't understand why it was printing out so many bytes when the variable buff has shorts in it. At that point I was just printing the hex as %x, not %04x, and buff was not initialized to 0. Today, I decided to initialize it to 0 and not only did the garbage disappear, but I can't recreate the problem even after leaving buff uninitialized again.
So here are my questions that hopefully aren't too abstract:
Does fread look beyond the file when reading data and does it remove trailing garbage itself, or is it up to us?
Why was printf showing an int when the buffer is a short? (I assume %x is for ints) and why can't I replicate the behaviour even after leaving buff without initialization?
Should I always initialize the buffer to zero to remove trailing garbage? What's the usual approach here?
I hope these aren't too many, or too vague, questions, and that I was clear enough. Like I said, I don't know much about C but find low-mid level programming very interesting, especially when it comes to direct data bit/byte manipulation.
Hope you have a great day!
EDIT 1:
Some of you wisely suggested I use num_read instead of nmemb on the loop, since that's the return value of fread, but that means I'll discard the rest of the file (nmemb is 22 but num_read is 21). Is that the usual approach? Also, thank you for pointing out that %x was casting to unsigned int, hence the 4 bytes instead of 2.
EDIT 2:
For clarification, and since I misspoke in a comment, I'd like to keep the remaining byte (or data), while discarding the rest, which is undefined. I don't know if this is the usual approach since if I use num_read in the loop, whatever is left over at the end is discarded, data or not. I'm more interested in knowing what the usual approach is: discard leftover data or remove anything that we know is undefined, in this case one of the bytes.
I can't understand why function fread() behaves differently in these 2 examples:
1)
I have a structure with a short and a char (size is 4 bytes including padding) and an array of three such structures. If I write each short and char of each structure separately with fwrite() and then read that file with fread() into a variable whose type is that structure, I will read 4 bytes at a time (there will be 9 bytes in the file), so you can see that one byte will be left in the 3rd iteration (and one byte will be lost in each iteration). What happens is that there is no 3rd read because I'm left with one byte and fread has to read 4 bytes.
2)
A simpler example: if I write a 1-byte char to a file with fwrite() and then put the content of that file into a 4-byte int with fread(), the integer will get that data.
Why does this happen? Why does the data get read in one case but not in the other if EOF is reached?
Here is the first example:
int main()
{
struct X { short int s; char c; } y, x[]=
{{0x3132,'3'},{0x3435,'6'},{0x3738,'9'}};
FILE *fp=fopen("FILE.DAT","wb+");
if (fp)
{
for(int i=0;i<sizeof(x)/sizeof(x[i]);)
{
fwrite(&x[i].s,sizeof(x[i].s),1,fp);
fwrite(&x[i].c,sizeof(x[i].c),1,fp);
i++;
}
rewind(fp);
for(int i=0;fread(&y,sizeof(y),1,fp);)
printf("%d:%x %c\n",++i, y.s, y.c);
fclose(fp);
}
return 0;
}
Second example:
int main()
{
FILE *fp=fopen("FILE.DAT","wb+");
char c = 'a';
fwrite(&c, sizeof(c), 1, fp);
rewind(fp);
int num;
fread(&num, sizeof(num), 1, fp);
fclose(fp);
return 0;
}
Why does the data get read in one case but not in the other if EOF is reached?
"What happens is that there is no 3rd read because I'm left with one byte and fread has to read 4 bytes." is a questionable premise.
The 1st code did read 3 times. After that, there were no bytes left to read.
In both codes, the last read was a partial read with a fread() return value of 0.
(The first code did not print the result of the 3rd read.)
With fread(), a return value of 0 does not mean "end-of-file" was immediately encountered with nothing read. Instead, 0 means a complete read did not occur due to:
* "end-of-file" or partial read.
* rare I/O error.
Why does this happen?
In the 2nd code, results may differ because the value read is indeterminate:
fread() ... If a partial element is read, its value is indeterminate. [1] C11dr §7.21.8.1 2
fread(&num, sizeof(num), 1, fp) result may or may not be as expected.
A more informative example
#include <stdio.h>
#include <stdlib.h>
int main(void) {
FILE *fp = fopen("FILE.DAT", "wb+");
char c = 'a';
printf(" %8X\n", c);
fwrite(&c, sizeof(c), 1, fp);
rewind(fp);
unsigned num = rand();
printf(" %8X\n", num);
size_t len = fread(&num, sizeof(num), 1, fp);
printf("%zu %8X\n", len, num);
len = fread(&num, sizeof(num), 1, fp);
printf("%zu\n", len);
fclose(fp);
return 0;
}
Output
61 as expected
5851F42D as expected - some random value
0 5851F461 Indeterminate! (in this case, looks like the LSByte was replaced.)
0 as expected
Moral of the story: assess the return value of fread() before relying on what was read into the buffer.
[1] indeterminate value: either an unspecified value or a trap representation
... when EOF is reached ...
EOF isn't "reached". Many <stdio.h> functions return EOF as a signal that something went wrong, giving no indication what that something is. If you want to know what went wrong after receiving the signal, test with feof() and/or ferror().
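To make the "full read, partial read, end-of-file, or error" distinction concrete, here is a small sketch. The enum and the function read_record are hypothetical names introduced only for illustration. It assumes you read with an element size of 1, so that fread reports how many bytes of the record actually arrived instead of collapsing a partial record to 0:

```c
#include <stdio.h>

enum read_status { READ_FULL, READ_PARTIAL, READ_EOF, READ_ERROR };

/* Try to read one record of `size` bytes into rec.
   *got receives the number of bytes actually stored. */
enum read_status read_record(FILE *fp, void *rec, size_t size, size_t *got)
{
    /* Element size 1: the return value counts bytes, not whole records. */
    *got = fread(rec, 1, size, fp);
    if (*got == size)
        return READ_FULL;
    if (ferror(fp))
        return READ_ERROR;      /* the short read was caused by an I/O error */
    /* Otherwise the stream ended, either mid-record or cleanly. */
    return *got > 0 ? READ_PARTIAL : READ_EOF;
}
```

With the 9-byte file from the first example and a 4-byte record, this reports two full records, then a partial one with 1 byte, then end-of-file. Only the first *got bytes of a partial record are meaningful.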
I need to be able to make sure my array is correctly receiving values from the file card.raw through fread.
I am not confident about using arrays with pointers, so if anybody could help me with the theory here, it would be greatly appreciated. Thanks in advance.
The code is supposed to take literally one block of size 512 bytes and stick it into the array. Then I am just using a debugger and printf to examine the array's output.
/**
* recover.c
*
* Computer Science 50
* Problem Set 4
*
* Recovers JPEGs from a forensic image.
*/
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(int argc, char* argv[])
{
//Size of EACH FAT JPEG in bytes
#define FILESIZE 512
unsigned char* buffer[FILESIZE];
///Step 1: Open jpeg
FILE* readfrom = fopen("card.raw", "rb");
if (readfrom == NULL)
{
printf("Could not open");
}
///Step 2: Find Beginning of JPEG. The first digits will be 255216255 Then 224 or 225
fread(&buffer, FILESIZE, 1, readfrom);
for(int x = 0; x < FILESIZE; x++)
{
printf("%d = %c\n", x, buffer[x]);
}
fclose(readfrom);
}
Use return values from input functions. fread() reports how many elements were read - code might not have read 512. Swap FILESIZE, 1 to detect the number of characters/bytes read.
// fread(&buffer, FILESIZE, 1, readfrom);
size_t count = fread(&buffer, 1, FILESIZE, readfrom);
Only print out up to the number of elements read. Recommend hexadecimal (and maybe decimal) output rather than character.
for(size_t x = 0; x < count; x++) {
// printf("%d = %c\n", x, buffer[x]);
printf("%3zu = %02X % 3u\n", x, buffer[x], buffer[x]);
}
If the fopen() failed, best to not continue with for() and fclose().
if (readfrom == NULL) {
printf("Could not open");
return -1;
}
The second parameter is the size, in bytes, of each element to be read.
The third parameter is the number of elements, each one with a size of <second parameter> bytes.
So, swap your second and third parameters.
Replace unsigned char* buffer[FILESIZE]; with unsigned char buffer[FILESIZE];. For now, you have an array of unsigned char *, when you need an array of unsigned char. Because an array name decays to a pointer to its first element when passed to a function, you don't need to take its address. In the fread call, replace &buffer with buffer.
It must go like this: fread(buffer, 1, FILESIZE, readfrom);
One more thing: add a return with a specific error code after printf("Could not open");, because if the file hasn't been opened, you cannot read from it, can you? And add return 0; at the end of main.
And take your #define out of main.
Read more about fread here: http://www.cplusplus.com/reference/cstdio/fread/
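As a rough sketch of those corrections combined (read_block and starts_like_jpeg are illustrative names, and BLOCK_SIZE stands in for the question's FILESIZE), assuming a plain unsigned char array and the signature bytes listed in the question's own comment (255 216 255, then 224 or 225):

```c
#include <stdio.h>

#define BLOCK_SIZE 512

/* Read one block; returns the number of bytes actually stored in buf. */
size_t read_block(FILE *fp, unsigned char buf[BLOCK_SIZE])
{
    /* Size 1, count BLOCK_SIZE: the return value is a byte count. */
    return fread(buf, 1, BLOCK_SIZE, fp);
}

/* Nonzero if the first n bytes of buf begin with 255 216 255
   followed by 224 or 225. */
int starts_like_jpeg(const unsigned char *buf, size_t n)
{
    return n >= 4
        && buf[0] == 0xFF && buf[1] == 0xD8 && buf[2] == 0xFF
        && (buf[3] == 0xE0 || buf[3] == 0xE1);
}
```

A caller would declare unsigned char buffer[BLOCK_SIZE];, call size_t count = read_block(readfrom, buffer);, and only look at buffer[0] through buffer[count - 1].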
On this Wikipedia page there is a sample C program reading and printing first 5 bytes from a file:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char buffer[5] = {0}; /* initialized to zeroes */
int i;
FILE *fp = fopen("myfile", "rb");
if (fp == NULL) {
perror("Failed to open file \"myfile\"");
return EXIT_FAILURE;
}
for (i = 0; i < 5; i++) {
int rc = getc(fp);
if (rc == EOF) {
fputs("An error occurred while reading the file.\n", stderr);
return EXIT_FAILURE;
}
buffer[i] = rc;
}
fclose(fp);
printf("The bytes read were... %x %x %x %x %x\n", buffer[0], buffer[1], buffer[2], buffer[3], buffer[4]);
return EXIT_SUCCESS;
}
The part I don’t understand is that it uses getc function which returns an int and stores it in an array of chars - how is it possible to store ints in a char array ?
Technically, C allows you to "shorten" a value by assigning it to a type that is smaller than the original. The specification doesn't say EXACTLY what happens when you do that (because of technicalities on some machines where slightly weird things happen), but in practice, on nearly all machines that you are likely to use, unless you work on museum pieces or some very special hardware, it simply acts as if the "upper" bits of the larger number have been "cut off".
And in this particular case, getc is specifically designed to return something that fits in a char, except for the case when it returns EOF, which often has the value -1. Although quite often, char may well support having the value -1 too, but it's not guaranteed to be the case (if char is an unsigned type - something the C and C++ standards support equally with char being a signed type that can be -1).
Check this out:-
If the integer value returned by getc() is stored into a variable of
type char and then compared against the integer constant EOF, the
comparison may never succeed, because sign-extension of a variable of
type char on widening to integer is implementation-defined.
Yes, getc() returns an integer. However, except for the special return value EOF, the returned value is always an unsigned char converted to int (0 to 255 with 8-bit bytes).
Therefore, after checking for EOF, the value can be stored in an unsigned char without data loss. Storing it in a plain char that happens to be signed relies on an implementation-defined conversion for values above 127, which on two's complement machines normally just keeps the same bit pattern.
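The pattern the answers describe can be sketched like this (read_with_getc is a hypothetical helper, not something from the Wikipedia sample): keep the result of getc() in an int, compare that int against EOF, and only then narrow it to a byte.

```c
#include <stdio.h>

/* Read up to n bytes with getc(); returns the count stored in buf. */
size_t read_with_getc(FILE *fp, unsigned char *buf, size_t n)
{
    size_t i = 0;
    while (i < n) {
        int rc = getc(fp);            /* keep the int so EOF stays distinguishable */
        if (rc == EOF)
            break;                    /* end of file or read error */
        buf[i++] = (unsigned char)rc; /* rc is now known to be a byte value */
    }
    return i;
}
```

Because the comparison happens on the int, a data byte of 0xFF is never mistaken for EOF, which is the failure mode the quoted passage warns about.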
Here I am using two different functions to calculate a CRC16 for any type of file (.txt, .tar, .tar.gz, .bin, .scr, .sh, etc.), with sizes varying from 1 KB to 5 GB.
I want to achieve this:
* cross-platform
* less time-consuming
* has to work properly for any type of file and any size
I get the same CRC value from both functions, but can anyone tell me which one is better for calculating a CRC16 for any type of file of any size on different platforms?
Here we have to consider all byte values from 0 to 255.
Can anybody please suggest which one better fits my requirements?
Code of both functions :
The first one uses int as the type of readChar:
int CRC16_int(const char* filePath) {
//Declare variable to store CRC result.
unsigned short result;
//Declare loop variables.
int intInnerLoopIndex;
result = 0xffff; //initialize result variable to perform CRC checksum calculation.
//Store message which read from file.
//char content[2000000];
//Create file pointer to open and read file.
FILE *readFile;
//Use to read character from file.
int readChar;
//open a file for Reading
readFile = fopen(filePath, "rb");
//Checking file is able to open or exists.
if (!readFile) {
fputs("Unable to open file %s", stderr);
}
/*
Here reading file and store into variable.
*/
int chCnt = 0;
while ((readChar = getc(readFile)) != EOF) {
//printf("charcater is %c\n",readChar);
//printf("charcater is %c and int is %d \n",readChar,readChar);
result ^= (short) (readChar);
for (intInnerLoopIndex = 0; intInnerLoopIndex < 8; intInnerLoopIndex++) {
if ((result & 0x0001) == 0x0001) {
result = result >> 1; //Perform bit shifting.
result = result ^ 0xa001; //Perform XOR operation on result.
} else {
result = result >> 1; //Perform bit shifting.
}
}
//content[chCnt] = readChar;
chCnt++;
}
printf("\nCRC data length in file: %d", chCnt);
//This is final CRC value for provided message.
return (result);
}
The second one uses unsigned char as the type of readChar:
int CRC16_unchar(const char* filePath) {
unsigned int filesize;
//Declare variable to store CRC result.
unsigned short result;
//Declare loop variables.
unsigned int intOuterLoopIndex, intInnerLoopIndex;
result = 0xffff; //initialize result variable to perform CRC checksum calculation.
FILE *readFile;
//Use to read character from file.
//The problem is if you read a byte from a file with the hex value (for example) 0xfe,
//then the char value will be -2 while the unsigned char value will be 254.
//This will significantly affect your CRC
unsigned char readChar;
//open a file for Reading
readFile = fopen(filePath, "rb");
//Checking file is able to open or exists.
if (!readFile) {
fputs("Unable to open file %s", stderr);
}
fseek(readFile, 0, SEEK_END); // seek to end of file
filesize = ftell(readFile); // get current file pointer
fseek(readFile, 0, SEEK_SET); // seek back to beginning of file
/*
Here reading file and store into variable.
*/
int chCnt = 0;
for (intOuterLoopIndex = 0; intOuterLoopIndex < filesize; intOuterLoopIndex++) {
readChar = getc(readFile);
printf("charcater is %c and int is %d\n",readChar,readChar);
result ^= (short) (readChar);
for (intInnerLoopIndex = 0; intInnerLoopIndex < 8; intInnerLoopIndex++) {
if ((result & 0x0001) == 0x0001) {
result = result >> 1; //Perform bit shifting.
result = result ^ 0xa001; //Perform XOR operation on
} else {
result = result >> 1; //Perform bit shifting.
}
}
chCnt++;
}
printf("\nCRC data length in file: %d", chCnt);
return (result);
}
Please Help me to figure out this problem
Thanks
First things first. Don't do file reading (or whatever the source is) and CRC calculation in the same function. This is bad design. File reading is typically not completely platform independent (although POSIX is your best friend), but CRC calculation can be done very platform independently. Also, you might want to reuse your CRC algorithm for other kinds of data sources which aren't accessed with fopen().
To give you a hint, the CRC function I always drop in to my projects has this prototype:
uint16_t Crc16(const uint8_t* buffer, size_t size,
uint16_t polynomial, uint16_t crc);
You don't have to call the function once and feed it the complete contents of the file. Instead you can loop through the file in blocks and call the function for each block. The polynomial argument in your case is 0xA001 (which is BTW a polynomial in 'reversed' form), and the crc argument is set to 0xFFFF the first time. Each subsequent time you call the function you pass the previous return value of the function to the crc argument.
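The answer only gives the prototype. As an assumption about what the body could look like, here is a sketch that reuses the bitwise loop from the question's own code (XOR the byte in, then shift right eight times, applying the reversed polynomial whenever the low bit is set):

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise, reflected CRC-16 over `size` bytes, continuing from `crc`.
   The body is an illustrative guess; only the prototype is from the answer. */
uint16_t Crc16(const uint8_t *buffer, size_t size,
               uint16_t polynomial, uint16_t crc)
{
    for (size_t i = 0; i < size; i++) {
        crc ^= buffer[i];                 /* fold the next byte into the low bits */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (uint16_t)((crc >> 1) ^ polynomial);
            else
                crc >>= 1;
        }
    }
    return crc;
}
```

Since the running value is passed in and returned, calling it once over a whole buffer and calling it block by block (feeding each return value into the next call) should give the same result.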
In your second code fragment (CRC16_unchar) you first determine the file size and then read that number of bytes. Don't do that; it unnecessarily limits you to handling files of at most 4 GB (in most cases). Just reading until EOF is cleaner IMHO.
Furthermore I see that you are struggling with signed/unsigned bytes. Do know that
printf doesn't know if you pass a signed or unsigned integer. You tell printf with '%d' or '%u' how to interpret the integer.
Even in C itself there is hardly a difference between a signed and unsigned integer. C won't magically change the value of 255 to -1 if you do int8_t x = 255.
See this answer for more details about when C uses the signedness of an integer: When does the signedness of an integer really matter?. Rule of thumb: Just always use uint8_t for handling raw bytes.
So both functions are fine regarding signedness/integer size.
EDIT: As other users indicated in their answers, read the file in blocks instead of per byte:
uint16_t CRC16_int(const char* filePath) {
FILE *readFile;
uint8_t buf[1024]; /* not const: fread() writes into it */
size_t len;
uint16_t result = 0xffff;
/* Open a file for reading. */
readFile = fopen(filePath, "rb");
if (readFile == NULL) {
exit(1);
}
/* Read until EOF. */
while ( (len = fread(buf, 1, sizeof(buf), readFile)) > 0 ) { /* size 1, so len is a byte count */
result = Crc16(buf, len, 0xA001, result);
}
/* readFile could be in error state, check it with ferror() or feof() functions. */
return result;
}
Also you should alter your function prototype to make it possible to return an error, e.g.:
// Return true when successful, false on error. CRC is stored in result.
bool CRC16_int(const char* filePath, uint16_t *result)
You want to read and write 8-bit bytes using unsigned char instead of plain char because char can be either signed or unsigned and that's up to the compiler (allowed by the C standard). So, the value you get from getc() should be converted to unsigned char prior to being used in the CRC calculations. You could also fread() into an unsigned char. If you work with signed chars, sign extension of chars into ints will likely break your CRC calculations.
Also, per the C standard fseek(FilePtr, 0, SEEK_END) has undefined behavior for binary streams and binary streams need not meaningfully support SEEK_END in fseek(). In practice, though, this usually works as we want.
Another thing you should consider is checking for I/O errors. Your code is broken in this respect.
The datatype you do the calculation with should, in my opinion, not be the same that you read from the file. Doing one function call into the runtime library to read a single byte is simply not efficient. You should probably read on the order of 2-4 KB at a time, and then iterate over each returned "chunk" in whatever manner you choose.
There's also absolutely no point in reading in the size of the file in advance, you should simply read until reading returns less data than expected, in which case you can inspect feof() and ferror() to figure out what to do, typically just stop since you're done. See the fread() manual page.
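A minimal sketch of that read loop, assuming a 4 KB block and a made-up helper name (scan_stream). To stay self-contained it just counts the bytes and XORs them together; a real CRC update would take the place of the inner loop:

```c
#include <stdio.h>

/* Visit every byte of fp in 4 KB blocks without asking for the file size.
   Returns 0 on success, -1 if the stream reported an I/O error. */
int scan_stream(FILE *fp, unsigned long long *total, unsigned char *xor_all)
{
    unsigned char block[4096];
    size_t len;

    *total = 0;
    *xor_all = 0;
    /* A short or zero-length read ends the loop; len is a byte count. */
    while ((len = fread(block, 1, sizeof block, fp)) > 0) {
        for (size_t i = 0; i < len; i++)
            *xor_all ^= block[i];     /* placeholder for the per-byte work */
        *total += len;
    }
    /* fread() returned 0: decide whether that was EOF or an error. */
    return ferror(fp) ? -1 : 0;
}
```

The total is kept in an unsigned long long so the count is not capped by a 32-bit size, which is the limit the earlier answer warns about.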