I have one encrypted file named encrypt.
I calculated a CRC-16 for this file and stored the result in an unsigned short, which is 2 bytes (16 bits) in size.
Now I want to append these 2 bytes of CRC value at the end of the file, then read those last 2 bytes back from the file and compare them against the CRC. How can I achieve this?
I used this code
fseek(readFile, filesize, SEEK_SET);
fprintf(readFile,"%u",result);
Here filesize is the size of my original encrypted file, and after seeking I write result, which is an unsigned short, but the file ends up with 5 bytes written for it.
File content after this:
testsgh
30549
The original file data is testsgh, but the CRC got written as the text 30549. I want to store this value in 2 bytes. How can I do that?
You should open the file in binary append mode:
FILE *out = fopen("myfile.bin", "ab");
This will eliminate the need to seek to the end.
Then, you need to use a direct write, not a print which converts the value to a string and writes the string. You want to write the bits of your unsigned short checksum:
const size_t wrote = fwrite(&checksum, sizeof checksum, 1, out);
This succeeded if and only if the value of wrote is 1.
However, please note that this risks introducing endianness errors, since it writes the value using your machine's local byte order. To be on the safe side, it's cleaner to decide on a byte ordering and implement it directly. For big-endian:
const unsigned char check_bytes[2] = { checksum >> 8, checksum & 255 };
const size_t wrote = fwrite(check_bytes, sizeof check_bytes, 1, out);
Again, we expect wrote to be 1 after the call to indicate that both bytes were successfully written.
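If it helps, here is a minimal sketch (mine, not part of the answer above) of the read-back and comparison the question asks for. It assumes the big-endian layout from the write above; verify_checksum and the path argument are illustrative names, and it relies on fseek with SEEK_END working on the binary stream, which is the common case.
#include <stdio.h>

/* Returns 1 if the last two bytes match `checksum`, 0 if not, -1 on error. */
int verify_checksum(const char *path, unsigned short checksum)
{
    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return -1;

    /* The checksum occupies the last two bytes of the file. */
    if (fseek(in, -2L, SEEK_END) != 0) {
        fclose(in);
        return -1;
    }

    unsigned char check_bytes[2];
    if (fread(check_bytes, sizeof check_bytes, 1, in) != 1) {
        fclose(in);
        return -1;
    }
    fclose(in);

    /* Reassemble in the same big-endian order used when writing. */
    const unsigned short stored = (unsigned short)((check_bytes[0] << 8) | check_bytes[1]);
    return stored == checksum;
}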
Use fwrite(), not fprintf(). I don't have access to a C compiler at the moment, but fwrite(&result, sizeof(result), 1, readFile); should work.
You could do something like this:
unsigned char c1, c2;
c1 = (unsigned char)(result >> 8);
c2 = (unsigned char)(result & 0xFF);
and then append c1 and c2 at the end of the file. When you read the file back, just do the opposite:
result = ( (unsigned)c1 << 8 ) + (unsigned)c2;
Hope that helps.
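A quick sketch of that round trip (my illustration, not part of the answer above), assuming the file name encrypt from the question and an example CRC value:
#include <stdio.h>

int main(void)
{
    unsigned short result = 30549;                   /* example CRC value */
    unsigned char c1 = (unsigned char)(result >> 8);
    unsigned char c2 = (unsigned char)(result & 0xFF);

    /* Append the two bytes to the end of the file. */
    FILE *out = fopen("encrypt", "ab");
    if (out == NULL)
        return 1;
    fputc(c1, out);
    fputc(c2, out);
    fclose(out);

    /* Read the last two bytes back and rebuild the value in the same order. */
    FILE *in = fopen("encrypt", "rb");
    if (in == NULL || fseek(in, -2L, SEEK_END) != 0)
        return 1;
    int b1 = fgetc(in);
    int b2 = fgetc(in);
    fclose(in);
    if (b1 == EOF || b2 == EOF)
        return 1;

    unsigned short read_back = (unsigned short)(((unsigned)b1 << 8) + (unsigned)b2);
    printf("wrote %u, read back %u\n", result, read_back);
    return 0;
}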
You can write single bytes with %c formatting, e.g.
fprintf(readfile, "%c%c", result % 256, result / 256);
btw: readfile is misleading, when you write to it :-)
Basically I have a text file that contains a number. I changed the number to 0 to start and then I read 2 bytes from the file (because an int is 2 bytes) and I converted it to an int. I then print the results, however it's printing out weird results.
So when I have 0 it prints out 2608 for some reason.
I'm going off a document that says I need to read through a file where the offset of bytes 0 to 1 represents a number. So this is why I'm reading bytes instead of characters...
I imagine the issue is due to reading bytes instead of reading by characters, so if this is the case can you please explain why it would make a difference?
Here is my code:
void readFile(FILE *file) {
char buf[2];
int numRecords;
fread(buf, 1, 2, file);
numRecords = buf[0] | buf[1] << 8;
printf("numRecords = %d\n", numRecords);
}
I'm not really sure what the buf[0] | buf[1] << 8 does, but I got it from another question... So I suppose that could be the issue as well.
The number 0 in your text file will actually be represented as a 1-byte hex number 0x30. 0x30 is loaded to buf[0]. (In the ASCII table, 0 is represented by 0x30)
The next byte, buf[1], gets whatever follows in the file; in this case the value is 0x0a. (0x0a is \n in the ASCII table)
Combining those two by buf[0] | buf[1] << 8 results in 0x0a30 which is 2608 in decimal. Note that << is the bit-wise left shift operator.
(Also, the size of int type is 4-byte in many systems. You should check that out.)
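Since the file stores the number as text, a hedged alternative (my sketch, not part of the answer above) is to parse it as text instead of reading raw bytes:
#include <stdio.h>

void readFile(FILE *file) {
    int numRecords = 0;
    /* The file holds the number as ASCII characters, so parse it as text. */
    if (fscanf(file, "%d", &numRecords) == 1)
        printf("numRecords = %d\n", numRecords);
    else
        printf("could not parse a number\n");
}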
You can read directly into the integer:
fread(&numRecords, sizeof(numRecords), 1, file);
You need to check sizeof(int) on your system; if it's four bytes, you need to declare numRecords as a short int rather than an int.
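If the file really does hold a raw 2-byte value, a safer sketch than guessing at sizeof(int) is to use a fixed-width type from <stdint.h> (this is my suggestion, not part of the answer above; readRecordCount is just an illustrative name):
#include <stdint.h>
#include <stdio.h>

void readRecordCount(FILE *file) {
    uint16_t numRecords = 0;               /* always exactly 2 bytes */
    if (fread(&numRecords, sizeof numRecords, 1, file) != 1) {
        fprintf(stderr, "short read\n");
        return;
    }
    /* Note: this still reads the bytes in the machine's own byte order. */
    printf("numRecords = %u\n", (unsigned)numRecords);
}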
Let's say I have 2 variables:
int var1 = 1; //1 byte
int var2 = 2; //1 byte
I want to combine these and encode as a 32bit unsigned integer (uint32_t). By combining them, it would be 2 bytes. I'd then fill the remaining space with 2 bytes of 0 padding. This is to write to a file, hence the need for this specific type of encoding.
So by combining the above example variables, the output I need is:
1200 //4 bytes
There's no need to go the roundabout way of "combining" the values into an uint32_t. Binary files are streams of bytes, so writing single bytes is very possible:
FILE * const out = fopen("myfile.bin", "wb");
const int val1 = 1;
const int val2 = 2;
if(out != NULL)
{
fputc(val1, out);
fputc(val2, out);
// Pad the file to four bytes, as originally requested. Not needed, though.
fputc(0, out);
fputc(0, out);
fclose(out);
}
This uses fputc() to write single bytes to the file. It takes an integer argument for the value to write, but treats it as unsigned char internally, which is essentially "a byte".
Reading back would be just as simple, using e.g. fgetc() to read out the two values, and of course checking for failure. You should check the writes above too; I omitted that for brevity.
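For illustration, a sketch of that read-back (mine, not part of the answer), assuming the same myfile.bin written above:
#include <stdio.h>

int main(void)
{
    FILE * const in = fopen("myfile.bin", "rb");
    if (in != NULL)
    {
        const int val1 = fgetc(in);
        const int val2 = fgetc(in);
        fclose(in);

        /* fgetc() returns EOF on failure, so check before trusting the values. */
        if (val1 != EOF && val2 != EOF)
            printf("val1 = %d, val2 = %d\n", val1, val2);
    }
    return 0;
}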
I have a program in C that writes a frequency table to a binary file.
The frequency table is an array filled with structs that contains an int and a char.
So I have to write an unsigned int counter and an unsigned char character to the file (multiple times).
I know that an integer normally uses 4 bytes; however, I also know that the int counter can never be bigger than 2^24-1.
So I could use 4 bytes to write the counter and the character to the file => 3 bytes for counter and 1 byte for the character. This would also be easy to read.
Is there an easy way to do this in C without using special libraries?
Yes, there is a very easy way of doing it in C. You can combine a char, which is one byte on all platforms, with an int of up to 24 bits in size by shifting the char by 24 bits to the left:
uint32_t toWrite = ((uint32_t)(unsigned char)myChar << 24) | myCount;
When you read the data back, perform the opposite operation:
uint32_t fromFile;
uint32_t myCount = fromFile & 0xFFFFFF;
char myChar = (fromFile >> 24) & 0xFF;
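To show the file round trip end to end, here is a small sketch of mine (the file name freq.bin and the sample values are made up). Note that writing the packed uint32_t with fwrite uses the machine's native byte order, so it is only portable between machines of the same endianness:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    unsigned char myChar = 'e';
    uint32_t myCount = 123456;                        /* fits in 24 bits */
    uint32_t toWrite = ((uint32_t)myChar << 24) | (myCount & 0xFFFFFF);

    FILE *f = fopen("freq.bin", "wb");
    if (f == NULL)
        return 1;
    if (fwrite(&toWrite, sizeof toWrite, 1, f) != 1) { fclose(f); return 1; }
    fclose(f);

    uint32_t fromFile = 0;
    f = fopen("freq.bin", "rb");
    if (f == NULL)
        return 1;
    if (fread(&fromFile, sizeof fromFile, 1, f) != 1) { fclose(f); return 1; }
    fclose(f);

    uint32_t count = fromFile & 0xFFFFFF;
    unsigned char character = (fromFile >> 24) & 0xFF;
    printf("%c: %u\n", character, (unsigned)count);
    return 0;
}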
fread(cur, 2, 1, fin)
I am sure I will feel stupid when I get an answer to this, but what is happening?
cur is a pointer to code_cur, a short (2 bytes); fin is a stream open for binary reading.
If my file is 00101000 01000000
what I get in the end is
code_cur = 01000000 00101000
Why is that? I am not posting any context yet because the problem really boils down to this (at least for me) unexpected behaviour.
And, in case this is the norm, how can I obtain the desired effect?
P.S.
I should probably add that, in order to 'view' the bytes, I am printing their integer value.
printf("%d\n",code_cur)
I tried it a couple times and it seemed reliable.
As others have pointed out you need to learn more on endianness.
You don't know it but your file is (luckily) in Network Byte Order (which is Big Endian). Your machine is little endian, so a correction is needed. Needed or not, this correction is always recommended as this will guarantee that your program runs everywhere.
Do something similar to this:
{
uint16_t tmp;
if (1 == fread(&tmp, 2, 1, fin)) { /* Check fread finished well */
code_cur = ntohs(tmp);
} else {
/* Treat error however you see fit */
perror("Error reading file");
exit(EXIT_FAILURE); // requires #include <stdlib.h>
}
}
ntohs() will convert your value from file order to your machine's order, whatever it is, big or little endian.
This is why htonl and htons (and friends) exist. They're not part of the C standard library, but they're available on pretty much every platform that does networking.
"htonl" means "host to network, long"; "htons" means "host to network, short". In this context, "long" means 32 bits, and "short" means 16 bits (even if the platform declares "long" to be 64 bits). Basically, whenever you read something from the "network" (or in your case, the stream you're reading from), you pass it through "ntoh*". When you're writing out, you pass it through "hton*"
You can permutate those function names in whatever way you want, except for the silly ones (no, there is no ntons, and no stonl either)
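A sketch of the full round trip with those functions (mine, not from the answers above); arpa/inet.h is POSIX rather than standard C, and the file name data.bin is just an example:
#include <arpa/inet.h>   /* POSIX header for htons()/ntohs() */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t code_cur = 0x2840;

    FILE *f = fopen("data.bin", "wb");
    if (f == NULL)
        return 1;
    uint16_t wire = htons(code_cur);                 /* host order -> network (big-endian) */
    fwrite(&wire, sizeof wire, 1, f);
    fclose(f);

    f = fopen("data.bin", "rb");
    if (f == NULL)
        return 1;
    uint16_t tmp = 0;
    if (fread(&tmp, sizeof tmp, 1, f) == 1)
        printf("read back 0x%04x\n", (unsigned)ntohs(tmp));   /* network -> host order */
    fclose(f);
    return 0;
}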
As others have pointed out, this is an endianess issue.
The Most Significant Byte differs in your file and your machine. Your file has big-endian (MSB first) and your machine is little-endian (MSB last or LSB first).
To understand what's happening, let's create a file with some binary data:
uint8_t buffer[2] = {0x28, 0x40}; // hexadecimal for 00101000 01000000
FILE * fp = fopen("file.bin", "wb"); // opens or creates file as binary
fwrite(buffer, 1, 2, fp); // write two bytes to file
fclose(fp);
The file.bin was created and holds the binary value 00101000 01000000, let's read it:
uint8_t buffer[2] = {0, 0};
FILE * fp = fopen("file.bin", "rb");
fread(buffer, 1, 2, fp); // read two bytes from file
fclose(fp);
printf("0x%02x, 0x%02x\n", buffer[0], buffer[1]);
// The above prints 0x28, 0x40, as expected and in the order we wrote previously
So everything works because we are reading byte by byte, and single bytes don't have an endianness problem (bit ordering exists too, but it is not visible at this level, so you can ignore it here).
Anyways, as you noticed, here's what happens when you try to read the short directly:
FILE * fp_alt = fopen("file.bin", "rb");
short incorrect_short = 0;
fread(&incorrect_short, 1, 2, fp_alt);
fclose(fp_alt);
printf("Read short as machine endianess: %hu\n", incorrect_short);
printf("In hex, that is 0x%04x\n", incorrect_short);
// We get the incorrect decimal of 16424 and hex of 0x4028!
// The machine inverted our short because of the way the endianess works internally
The worst part is that on a big-endian machine the above code would return the correct number, leaving you unaware that your code is endian-specific and not portable between processors!
It's nice to use ntohs from arpa/inet.h to convert the endianness, but I find it strange: it's a whole (non-standard) library made for network communication, used to solve an issue that comes from reading files, and it solves it by reading the value incorrectly from the file and then 'translating' it, instead of just reading it correctly in the first place.
In higher languages we often see functions to handle reading endianess from file instead of converting the value because we (usually) know how a file structure is and its endianess, just look at Javascript Buffer's readInt16BE method, straight to the point and easy to use.
Motivated by this simplicity, I created a function that reads a 16-bit integer below (but it's very easy to change to 8, 32 or 64 bits if you need to):
#include <stdint.h> // necessary for specific int types
// Advances and reads a single signed 16-bit integer from the file descriptor as Big Endian
// Writes the value to 'result' pointer
// Returns 1 if succeeds or 0 if it fails
int8_t freadInt16BE(int16_t * result, FILE * f) {
uint8_t buffer[sizeof(int16_t)];
if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
return 0;
*result = (int16_t)((buffer[0] << 8) | buffer[1]);
return 1;
}
Usage is simple (error handling omitted for brevity):
FILE * fp = fopen("file.bin", "rb"); // Open file as binary
short code_cur = 0;
freadInt16BE(&code_cur, fp);
fclose(fp);
printf("Read Big-Endian (MSB first) short: %hu\n", code_cur);
printf("In hex, that is 0x%04x\n", code_cur);
// The above code prints 0x2840 correctly (decimal: 10304)
The function will fail (return 0) if the stream is NULL (for example, the file doesn't exist or couldn't be opened) or if it did not contain the 2 bytes to be read at the current position.
As a bonus, if you happen to find a file that is little-endian, you can use this function:
// Advances and reads a single signed 16-bit integer from the file descriptor as Little Endian
// Writes the value to 'result' pointer
// Returns 1 if succeeds or 0 if it fails
int8_t freadInt16LE(int16_t * result, FILE * f) {
uint8_t buffer[sizeof(int16_t)];
if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
return 0;
*result = (int16_t)((buffer[1] << 8) | buffer[0]);
return 1;
}
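For reference, the 32-bit big-endian variant mentioned earlier would follow the same pattern; this version is my adaptation, not part of the original answer:
#include <stdint.h>
#include <stdio.h>

// Advances and reads a single signed 32-bit integer from the file descriptor as Big Endian
// Writes the value to 'result' pointer
// Returns 1 if succeeds or 0 if it fails
int8_t freadInt32BE(int32_t * result, FILE * f) {
    uint8_t buffer[sizeof(int32_t)];
    if (!result || !f || sizeof(int32_t) != fread((void *) buffer, 1, sizeof(int32_t), f))
        return 0;
    *result = (int32_t)(((uint32_t)buffer[0] << 24) | ((uint32_t)buffer[1] << 16)
                      | ((uint32_t)buffer[2] << 8)  |  (uint32_t)buffer[3]);
    return 1;
}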
I need to read 32-bit instructions from a binary file.
So what I have right now is:
unsigned char buffer[4];
fread(buffer,sizeof(buffer),1,file);
which will put 4 bytes in an array
How should I approach connecting those 4 bytes together in order to process the 32-bit instruction later?
Or should I even start in a different way and not use fread?
My (admittedly weird) method right now is to create an array of 32 ints and then fill it with bits from the buffer array.
The answer depends on how the 32-bit integer is stored in the binary file. (I'll assume that the integer is unsigned, because it really is an id, and use the type uint32_t from <stdint.h>.)
Native byte order The data was written out as an integer on this machine. Just read the integer with fread:
uint32_t op;
fread(&op, sizeof(op), 1, file);
Rationale: fread reads the raw representation of the integer into memory. The matching fwrite does the reverse: it writes the raw representation to the file. If you don't need to exchange the file between platforms, this is a good method to store and read data.
Little-endian byte order The data is stored as four bytes, least significant byte first:
uint32_t op = 0u;
op |= getc(file); // 0x000000AA
op |= getc(file) << 8; // 0x0000BBaa
op |= getc(file) << 16; // 0x00CCbbaa
op |= (uint32_t)getc(file) << 24; // 0xDDccbbaa
Rationale: getc reads a char and returns an integer between 0 and 255. (The case where the stream runs out and getc returns the negative value EOF is not considered here for brevity, viz. laziness.) Build your integer by shifting each byte you read by a multiple of 8 and OR-ing it with the existing value. The comments sketch how it works: the capital letters are the byte just read, the lower-case letters were already there, and zeros have not yet been assigned.
Big-endian byte order The data is stored as four bytes, least significant byte last:
uint32_t op = 0u;
op |= (uint32_t)getc(file) << 24; // 0xAA000000
op |= getc(file) << 16; // 0xaaBB0000
op |= getc(file) << 8; // 0xaabbCC00
op |= getc(file); // 0xaabbccDD
Rationale: Pretty much the same as above, only that you shift the bytes in another order.
You can imagine little-endian and big-endian as writing the number one hundred and twenty-three (CXXIII) as either 321 or 123. The bit-shifting is similar to shifting decimal digits when dividing or multiplying by powers of 10, only that here you shift by 8 bits to multiply by 2^8 = 256.
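For completeness, here is a sketch (mine; read_u32_le is just an illustrative name) of the little-endian read with the EOF handling that was skipped above for brevity:
#include <stdint.h>
#include <stdio.h>

/* Reads four bytes, least significant first; returns 1 on success, 0 on a short read. */
int read_u32_le(uint32_t *op, FILE *file)
{
    uint32_t value = 0u;
    for (int shift = 0; shift < 32; shift += 8) {
        int byte = getc(file);
        if (byte == EOF)
            return 0;
        value |= (uint32_t)byte << shift;
    }
    *op = value;
    return 1;
}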
Add
unsigned int instruction;
memcpy(&instruction,buffer,4);
to your code. This will copy the 4 bytes of buffer into a single 32-bit variable, so you get the 4 bytes connected :)
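As a self-contained version of that idea, here is my sketch with the header memcpy needs and a fixed-width type so the size really is 32 bits (read_instruction is just an illustrative name):
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read 4 bytes and copy them into a 32-bit variable (machine's native byte order). */
int read_instruction(uint32_t *instruction, FILE *file)
{
    unsigned char buffer[4];
    if (fread(buffer, sizeof(buffer), 1, file) != 1)
        return 0;
    memcpy(instruction, buffer, sizeof(buffer));
    return 1;
}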
If you know that the int in the file is the same endian as the machine the program's running on, then you can read straight into the int. No need for a char buffer.
unsigned int instruction;
fread(&instruction,sizeof(instruction),1,file);
If you know the endianness of the int in the file, but not the machine the program's running on, then you'll need to add and shift the bytes together.
unsigned char buffer[4];
unsigned int instruction;
fread(buffer,sizeof(buffer),1,file);
//big-endian
instruction = (buffer[0]<<24) + (buffer[1]<<16) + (buffer[2]<<8) + buffer[3];
//little-endian
instruction = (buffer[3]<<24) + (buffer[2]<<16) + (buffer[1]<<8) + buffer[0];
Another way to think of this is that it's a positional number system in base 256, so you combine the bytes just like you combine digits in base 10:
257
= 2*100 + 5*10 + 7
= 2*10^2 + 5*10^1 + 7*10^0
So you can also combine them using Horner's rule.
//big-endian
instruction = ((buffer[0]*256 + buffer[1])*256 + buffer[2])*256 + buffer[3];
//little-endian
instruction = ((buffer[3]*256 + buffer[2])*256 + buffer[1])*256 + buffer[0];
#luser droog
There are two bugs in your code.
The size of the variable "instruction" is not guaranteed to be 4 bytes: for example, Turbo C assumes sizeof(int) to be 2, and obviously your program fails in that case. But, what is much more important and not so obvious: your program will also fail in case sizeof(int) is more than 4 bytes! To understand this, consider the following example:
#include <stdio.h>

int main()
{
    const unsigned char a[4] = {0x21, 0x43, 0x65, 0x87};
    const unsigned char* p = a;
    unsigned long x = (((((p[3] << 8) + p[2]) << 8) + p[1]) << 8) + p[0];
    printf("%08lX\n", x);
    return 0;
}
This program prints "FFFFFFFF87654321" under amd64, because an unsigned char value is promoted to SIGNED INT when it is used in arithmetic! So, changing the type of the variable "instruction" from "int" to "long" does not solve the problem.
The only way is to write something like:
unsigned long instruction = 0;
const unsigned char* p = buffer + 3;
for (int i = 0; i < 4; i++, p--) {
    instruction <<= 8;
    instruction += *p;
}