Repeatedly fread() 16 bits from binary file - c

I'm reading a binary file. The first 16 bits represent an array index, the next 16 represent the number of 16-bit items about to be listed, and then the remaining multiples of 16 represent all those 16-bit items. For example, the following hex-dump of the file 'program':
30 00 00 02 10 00 F0 25
represents index 0x3000, with 0x0002 elements following, which are 0x1000 and 0xF025.
FILE *fp = fopen(program, "rb");
char indexChar, nItemsChar;
u_int16_t index, nItems;
fread(&indexChar, 2, 1, fp);
fread(&nItemsChar, 2, 1, fp);
index = strtol(&indexChar, NULL, 16);
nItems = strtol(&nItemsChar, NULL, 16);
for (u_int16_t i = 0; i < nItems; ++i)
{
    fread(state->mem + index + i, 2, 1, fp);
}
I'm not even sure if this approach works, because I get EXC_BAD_ACCESS when trying to fread() into nItemsChar. What am I doing wrong?

You are confusing ASCII (text) file I/O with binary.
The program is crashing at fread(&nItemsChar, 2, 1, fp) because you read 2 bytes into a 1-byte memory space (actually it may already be corrupting memory on the previous fread).
You then use strtol, which converts ASCII text to a long int, but the values you read are binary, not text.
Instead, just use
fread(&index, sizeof(index), 1, fp);
fread(&nItems, sizeof(nItems), 1, fp);
and then the for loop. Note that this assumes the file was written with the same endianness as your processor/configuration.
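If the file's byte order is fixed (the hex dump above suggests big-endian), a byte-wise read avoids that assumption entirely. A minimal sketch, assuming big-endian storage:

#include <stdint.h>
#include <stdio.h>

/* Read one big-endian 16-bit value; returns 0 on success, -1 on short read. */
static int read_be16(FILE *fp, uint16_t *out)
{
    unsigned char b[2];
    if (fread(b, 1, 2, fp) != 2)
        return -1;
    *out = (uint16_t)((b[0] << 8) | b[1]);
    return 0;
}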

uint16_t index, nItems, *items;
fread(&index, sizeof(uint16_t), 1, fp);
fread(&nItems, sizeof(uint16_t), 1, fp);
items = calloc(nItems, sizeof(uint16_t));
fread(items, sizeof(uint16_t), nItems, fp);

Related

Value of CRC changing every time program is run

I am writing a CLI utility in C that analyzes PNG files and outputs data about them. More specifically, it prints out the length, CRC and type values of each chunk in the PNG file. I am using the official specification for the PNG file format, and it says that each chunk has a CRC value encoded in it for data integrity.
My tool runs fine and outputs the correct values for length and type, and outputs what appears to be a correct value for the CRC (as in, it is formatted as 4-byte hexadecimal). The only problem is that every time I run the program, the value of the CRC changes. Is this normal, and if not, what could be causing it?
Here is the main part of the code
CHUNK chunk;
BYTE buffer;
int i = 1;
while (chunk.type != 1145980233) { // 1145980233 is just the decimal equivalent of 'IEND'; it is a magic
                                   // number that signals our program that the IEND chunk has been reached
    printf("============\nCHUNK: %i\n", i);
    // Read LENGTH value; we have to buffer and then append to length hex-digit-by-hex-digit to account
    // for reversals of byte order when reading the infile (I'm not sure why this reversal only happens here)
    for (unsigned j = 0; j < 4; ++j) {
        fread(&buffer, 1, sizeof(BYTE), png);
        chunk.length = (chunk.length | buffer) << 8; // If length is 0b4e and buffer is 67, this makes sure
                                                     // that length ends up 0b4e67 and not 0b67
    }
    chunk.length = chunk.length >> 8; // The bit-shifting above adds an extra 00 to the end of length;
                                      // this gets rid of that
    printf("LENGTH: %u\n", chunk.length);
    // Read TYPE value
    fread(&chunk.type, 4, sizeof(BYTE), png);
    // Print out TYPE in chars
    printf("TYPE: ");
    printf("%c%c%c%c\n", chunk.type & 0xff, (chunk.type & 0xff00)>>8, (chunk.type & 0xff0000)>>16, (chunk.type & 0xff000000)>>24);
    // Allocate LENGTH bytes of memory for data
    chunk.data = calloc(chunk.length, sizeof(BYTE));
    // Populate DATA
    for (unsigned j = 0; j < chunk.length; ++j) {
        fread(&buffer, 1, sizeof(BYTE), png);
    }
    // Read CRC value
    for (unsigned j = 0; j < 4; ++j) {
        fread(&chunk.crc, 1, sizeof(BYTE), png);
    }
    printf("CRC: %x\n", chunk.crc);
    printf("\n");
    i++;
}
here are some preprocessor directives and global variables
#define BYTE uint8_t
typedef struct {
    uint32_t length;
    uint32_t type;
    uint32_t crc;
    BYTE* data;
} CHUNK;
here are some examples of the output I am getting
Run 1 -
============
CHUNK: 1
LENGTH: 13
TYPE: IHDR
CRC: 17a6a400
============
CHUNK: 2
LENGTH: 2341
TYPE: iCCP
CRC: 17a6a41e
Run 2 -
============
CHUNK: 1
LENGTH: 13
TYPE: IHDR
CRC: 35954400
============
CHUNK: 2
LENGTH: 2341
TYPE: iCCP
CRC: 3595441e
Run 3 -
============
CHUNK: 1
LENGTH: 13
TYPE: IHDR
CRC: 214b0400
============
CHUNK: 2
LENGTH: 2341
TYPE: iCCP
CRC: 214b041e
As you can see, the CRC values are different each time, yet within each run they are all fairly similar. My intuition tells me this should not be the case and that the CRC values should not be changing at all.
Just to make sure, I also ran
$ cat test.png > file1
$ cat test.png > file2
$ diff -s file1 file2
Files file1 and file2 are identical
so accessing the file two different times doesn't change the CRC values in it, as expected.
Thanks,
This:
fread(&chunk.crc, 1, sizeof(BYTE), png);
keeps overwriting the first byte of chunk.crc with each byte read from the file. The other three bytes of chunk.crc are never written, so you are seeing whatever was randomly in memory at those locations when your program started. You will note that the 00 and 1e at the ends are consistent, since that is the one byte that is actually being written.
Same problem with this in your data reading loop:
fread(&buffer, 1, sizeof(BYTE), png);
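Since chunk.data has already been allocated with calloc, one fix for the data loop (a sketch, ignoring short-read handling, using the chunk and png variables from the question) is to read the whole payload in a single call:

fread(chunk.data, 1, chunk.length, png); // read all LENGTH payload bytes into the buffer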
An unrelated error is that you are accumulating bytes in a 32-bit integer thusly:
chunk.length = (chunk.length | buffer)<<8;
and then after the end of that loop, rolling it back down:
chunk.length = chunk.length>>8;
That will always discard the most significant byte of the length, since you push it off the top of the 32 bits, and then roll eight zero bits back down in its place. Instead you need to do it like this:
chunk.length = (chunk.length << 8) | buffer;
and then all 32 bits are retained, and you don't need to fix it at the end.
This is a bad idea:
fread(&chunk.type, 4, sizeof(BYTE), png);
because it is not portable. What ends up in chunk.type depends on the endianness of the architecture the program runs on. For "IHDR", you will get 0x52444849 on a little-endian machine and 0x49484452 on a big-endian machine.
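A portable alternative, as a sketch, is to read the four bytes explicitly and assemble them big-endian, which is the byte order PNG uses for all multi-byte integers:

BYTE b[4];
fread(b, 1, 4, png); // error handling omitted for brevity
chunk.type = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
           | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];

The same pattern fixes the CRC read.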

How to read binary inputs from a file in C

What I need to do is read binary input from a file. The input is, for example (binary dump):
00000000 00001010 00000100 00000001 10000101 00000001 00101100 00001000 00111000 00000011 10010011 00000101
What I did is,
char* filename = vargs[1];
BYTE buffer;
FILE *file_ptr = fopen(filename, "rb");
fseek(file_ptr, 0, SEEK_END);
size_t file_length = ftell(file_ptr);
rewind(file_ptr);
for (int i = 0; i < file_length; i++)
{
    fread(&buffer, 1, 1, file_ptr); // read 1 byte
    printf("%d ", (int)buffer);
}
But the problem here is that I need to divide those binary inputs up in certain ways so that I can use them as commands (e.g. 101 in the input means add two numbers).
When I run the program with the code I wrote, it gives me output like:
0 0 10 4 1 133 1 44 8 56 3 147 6
which shows in ASCII numbers.
How can I read the inputs as binary numbers, not ASCII numbers?
The inputs should be used in this way:
0 # Padding for the whole file!
0000|0000 # Function 0 with 0 arguments
00000101|00|000|01|000 # MOVE the value 5 to register 0 (000 is MOV function)
00000011|00|001|01|000 # MOVE the value 3 to register 1
000|01|001|01|100 # ADD registers 0 and 1 (100 is ADD function)
000|01|0000011|10|000 # MOVE register 0 to 0x03
0000011|10|010 # POP the value at 0x03
011 # Return from the function
00000110 # 6 instructions in this function
I am trying to implement something like assembly-language commands.
Can someone please help me out with this problem?
Thanks!
You need to understand the difference between data and its representation. You are correctly reading the data in binary. When you print the data, printf() gives the decimal representation of the binary data. Note that 00001010 in binary is the same as 10 in decimal and 00000100 in binary is 4 in decimal. If you convert each sequence of bits into its decimal value, you will see that the output is exactly correct. You seem to be confusing the representation of the data as it is output with how the data is read and stored in memory. These are two different and distinct things.
The next step to solve your problem is to learn about bitwise operators: |, &, ~, >>, and <<. Then use the appropriate combination of operators to extract the data you need from the stream of bits.
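For instance, here is a sketch of pulling fields out of one byte with masks and shifts. The field widths follow the asker's instruction layout and are only illustrative:

#include <stdio.h>

int main(void)
{
    unsigned char byte = 0x2C;          /* 00101100, one byte from the dump above */
    unsigned func = byte & 0x7;         /* low 3 bits  -> 100 (ADD in the asker's scheme) */
    unsigned mode = (byte >> 3) & 0x3;  /* next 2 bits -> 01 */
    printf("func=%u mode=%u\n", func, mode);
    return 0;
}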
The format you use does not fall on byte boundaries, so you need to read your bits into a circular buffer and parse them with a state machine.
Read "in binary" or "in text" is quite the same thing, the only thing that change is your interpretation of the data. In your exemple you are reading a byte, and you are printing the decimal value of that byte. But you want to print the bit of that char, to do that you just need to use binary operator of C.
For example:
#include <stdio.h>
#include <stdbool.h>
#include <limits.h>

struct binary_circle_buffer {
    size_t i;
    unsigned char buffer;
};

bool read_bit(struct binary_circle_buffer *bcn, FILE *file, bool *bit) {
    if (bcn->i == CHAR_BIT) {
        size_t ret = fread(&bcn->buffer, sizeof bcn->buffer, 1, file);
        if (!ret) {
            return false;
        }
        bcn->i = 0;
    }
    *bit = bcn->buffer & ((unsigned char)1 << bcn->i++); // LSB first; this may be the wrong order for your format, so test it
    // *bit = bcn->buffer & (((unsigned char)UCHAR_MAX / 2 + 1) >> bcn->i++); // MSB first
    return true;
}

int main(void)
{
    struct binary_circle_buffer bcn = { .i = CHAR_BIT };
    FILE *file = stdin; // replace with your file
    bool bit;
    size_t i = 0;
    while (read_bit(&bcn, file, &bit)) {
        // here you must code your state machine to parse the instructions
        printf(bit ? "1" : "0");
        if (++i >= CHAR_BIT) { // group the output by bytes for readability
            i = 0;
            printf(" ");
        }
    }
}
It would be difficult to help you further; you are basically asking for help writing a virtual machine.

Is there any way to represent a 4-byte number as a 16-byte number?

I have to encrypt some data using a 4-byte IV. However, the encryption algorithm that I am using (AES128) needs a 16-byte (128-bit) key. Say my code is as follows:
#include <gcrypt.h>

void encrypt() {
    int IV = 6174;
    gcry_cipher_hd_t hd;
    errStatus = gcry_cipher_open(&hd, GCRY_CIPHER_AES128, GCRY_CIPHER_MODE_CBC, 0);
    errStatus = gcry_cipher_setkey(hd, keyBuffer, 16);
    gcry_cipher_setiv(hd, &IV, 16);
    gcry_cipher_encrypt(hd, output, 16, bytesToEncrypt, 16);
}
Say keyBuffer contains a 16-byte key, and bytesToEncrypt and output are my input and output respectively. How do I go about properly giving the IV?
Can you try this, to avoid the buffer overflow? I haven't tested it myself, though:
int IV[4] = { 6174 }; // remaining elements are zero-initialized: { 6174, 0, 0, 0 }
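A sketch of a more explicit variant, using fixed-width types so the IV buffer is exactly 16 bytes regardless of the platform's int size (note the copy uses host byte order):

#include <stdint.h>
#include <string.h>

uint8_t iv[16] = {0};                  /* 16 bytes, zero-padded */
uint32_t ivValue = 6174;
memcpy(iv, &ivValue, sizeof ivValue);  /* place the 4-byte value at the front */
/* then: gcry_cipher_setiv(hd, iv, sizeof iv); */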

Reading SQLite header

I was trying to parse the header from an SQLite database file, using this (fragment of the actual) code:
struct Header_info {
    char *filename;
    char *sql_string;
    uint16_t page_size;
};

int read_header(FILE *db, struct Header_info *header)
{
    assert(db);
    uint8_t sql_buf[100] = {0};
    /* load the header */
    if (fread(sql_buf, 100, 1, db) != 1) {
        return ERR_SIZE;
    }
    /* copy the string */
    header->sql_string = strdup((char *)sql_buf);
    /* verify that we have a proper header */
    if (strcmp(header->sql_string, "SQLite format 3") != 0) {
        return ERR_NOT_HEADER;
    }
    memcpy(&header->page_size, (sql_buf + 16), 2);
    return 0;
}
Here are the relevant bytes of the file I'm testing it on:
0000000: 5351 4c69 7465 2066 6f72 6d61 7420 3300 SQLite format 3.
0000010: 1000 0101 0040 2020 0000 c698 0000 1a8e .....@  ........
Following this spec, the code looks correct to me.
Later I print header->page_size with this line:
printf("\tPage size: %"PRIu16"\n", header->page_size);
But that line prints out 16, instead of the expected 4096. Why? I'm almost certain it's some basic thing that I've just overlooked.
It's an endianness problem. x86 is little-endian: in memory, the least significant byte is stored first. When you copy the bytes 10 00 into a uint16_t on a little-endian architecture, they are therefore interpreted as 0x0010, which is 16 instead of 4096.
Your problem is therefore that memcpy is not an appropriate tool to read the value.
See the following section of the SQLite file format spec:
1.2.2 Page Size
The two-byte value beginning at offset 16 determines the page size of
the database. For SQLite versions 3.7.0.1 and earlier, this value is
interpreted as a big-endian integer and must be a power of two between
512 and 32768, inclusive. Beginning with SQLite version 3.7.1, a page
size of 65536 bytes is supported. The value 65536 will not fit in a
two-byte integer, so to specify a 65536-byte page size, the value at
offset 16 is 0x00 0x01. This value can be interpreted as a big-endian
1 and thought of as a magic number to represent the 65536 page size.
Or one can view the two-byte field as a little-endian number and say
that it represents the page size divided by 256. These two
interpretations of the page-size field are equivalent.
It seems to be an endianness issue. If you are on a little-endian machine, this line:
memcpy(&header->page_size, (sql_buf + 16), 2);
copies the two bytes 10 00 into an uint16_t which will have the low-order byte at the lower address.
You can do this instead:
header->page_size = sql_buf[17] | (sql_buf[16] << 8);
Update
For the record, note that the solution I propose works regardless of the endianness of the machine (see this article by Rob Pike).
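The same idea generalizes to any field width. A small sketch of byte-order-independent helpers, assuming uint8_t/uint16_t/uint32_t from <stdint.h>:

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* usage: header->page_size = read_be16(sql_buf + 16); */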

Reading bytes from a text file that has the form of machine code in C?

I have a text file with machine code in this form:
B2 0A 05
B2 1A 01
B3 08 00 17
B2 09 18
where an instruction has this format:
OP Mode Operand
Note: Operand could be 1 or 2 bytes.
Where (for example):
OP = B2
Mode = 0A
Operand = 05
How can I read the bytes into a variable, as shown in the above example?
When I read the file I get individual characters. I have an array of pointers into which I read individual lines, but I still cannot solve the problem of reading a byte.
Any ideas or suggestions?
I hope I am not confusing anyone here.
Thank you.
Consider using fscanf. You can use the %x format specifier to read hexadecimal integers.
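A minimal sketch of that approach; fp here is a hypothetical FILE* opened on the text file:

unsigned int byte;
while (fscanf(fp, "%2x", &byte) == 1) {
    // byte now holds the next value, e.g. 0xB2 for the first token above
}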
Verify that the file is opened in binary mode ("rb").
Use fread to read one byte at a time:
unsigned char opcode;
unsigned char mode;
unsigned int operand;

fread(&opcode, 1, sizeof(opcode), data_file);
fread(&mode, 1, sizeof(mode), data_file);

// Use mode and opcode to determine how many bytes to read
if (opcode == 0xB2)
{
    unsigned char byte_operand = 0;
    fread(&byte_operand, 1, sizeof(byte_operand), data_file);
    operand = byte_operand;
}
if (opcode == 0xB3)
{
    if (mode == 0x08)
    {
        // Two-byte operand: read each byte and assemble, rather than
        // fread-ing sizeof(operand) (usually 4) bytes into the int
        unsigned char hi = 0, lo = 0;
        fread(&hi, 1, sizeof(hi), data_file);
        fread(&lo, 1, sizeof(lo), data_file);
        operand = ((unsigned int)hi << 8) | lo;
    }
}
A more efficient method is to read in chunks or blocks of data into a buffer and parse the buffer using a pointer to const unsigned char:
unsigned char *buffer = malloc(MAX_BUFFER_SIZE);
unsigned char *p_next_byte = NULL;
if (buffer)
{
    fread(buffer, sizeof(unsigned char), MAX_BUFFER_SIZE, data_file);
    p_next_byte = buffer;
    opcode = *p_next_byte++;
    mode = *p_next_byte++;
    Get_Operand(&operand,
                &p_next_byte,
                opcode,
                mode);
}
A safer design is to use a function, Get_Byte(), which returns the next data byte (and reloads buffers if necessary).
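A sketch of that Get_Byte() idea, with illustrative names (none of this is an existing API):

#include <stdio.h>

#define READER_BUF_SIZE 4096

typedef struct {
    FILE *fp;
    unsigned char buf[READER_BUF_SIZE];
    size_t pos;
    size_t len;
} ByteReader;

/* Return the next byte, refilling the buffer as needed; -1 on EOF or error. */
static int Get_Byte(ByteReader *r)
{
    if (r->pos == r->len) {
        r->len = fread(r->buf, 1, READER_BUF_SIZE, r->fp);
        r->pos = 0;
        if (r->len == 0)
            return -1;
    }
    return r->buf[r->pos++];
}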
