Value of CRC changing every time program is run - c

I am writing a CLI utility in C that analyzes PNG files and outputs data about them. More specifically, it prints the length, CRC and type values of each chunk in the PNG file. I am using the official specification for the PNG file format, which says that each chunk has a CRC value encoded in it for data integrity.
My tool runs fine: it outputs the correct values for length and type, and what appears to be a plausible value for the CRC (it is formatted as 4 bytes of hexadecimal). The only problem is that every time I run the program, the value of the CRC changes. Is this normal, and if not, what could be causing it?
Here is the main part of the code:
CHUNK chunk;
BYTE buffer;
int i = 1;
while (chunk.type != 1145980233) { // 1145980233 is a magic number that signals our program that IEND chunk
// has been reached it is just the decimal equivalent of 'IEND'
printf("============\nCHUNK: %i\n", i);
// Read LENGTH value; we have to buffer and then append to length hexdigit-by-hexdigit to account for
// reversals of byte-order when reading infile (im not sure why this reversal only happens here)
for(unsigned j = 0; j < 4; ++j) {
fread(&buffer, 1, sizeof(BYTE), png);
chunk.length = (chunk.length | buffer)<<8; // If length is 0b4e and buffer is 67 this makes sure that length
// ends up 0b4e67 and not 0b67
}
chunk.length = chunk.length>>8; // Above bitshifting ends up adding an extra 00 to end of length
// This gets rid of that
printf("LENGTH: %u\n", chunk.length);
// Read TYPE value
fread(&chunk.type, 4, sizeof(BYTE), png);
// Print out TYPE in chars
printf("TYPE: ");
printf("%c%c%c%c\n", chunk.type & 0xff, (chunk.type & 0xff00)>>8, (chunk.type & 0xff0000)>>16, (chunk.type & 0xff000000)>>24);
// Allocate LENGTH bytes of memory for data
chunk.data = calloc(chunk.length, sizeof(BYTE));
// Populate DATA
for(unsigned j = 0; j < chunk.length; ++j) {
fread(&buffer, 1, sizeof(BYTE), png);
}
// Read CRC value
for(unsigned j = 0; j < 4; ++j) {
fread(&chunk.crc, 1, sizeof(BYTE), png);
}
printf("CRC: %x\n", chunk.crc);
printf("\n");
i++;
}
Here are the relevant preprocessor directives and type definitions:
#define BYTE uint8_t
typedef struct {
uint32_t length;
uint32_t type;
uint32_t crc;
BYTE* data;
} CHUNK;
Here are some examples of the output I am getting:
Run 1 -
============
CHUNK: 1
LENGTH: 13
TYPE: IHDR
CRC: 17a6a400
============
CHUNK: 2
LENGTH: 2341
TYPE: iCCP
CRC: 17a6a41e
Run 2 -
============
CHUNK: 1
LENGTH: 13
TYPE: IHDR
CRC: 35954400
============
CHUNK: 2
LENGTH: 2341
TYPE: iCCP
CRC: 3595441e
Run 3 -
============
CHUNK: 1
LENGTH: 13
TYPE: IHDR
CRC: 214b0400
============
CHUNK: 2
LENGTH: 2341
TYPE: iCCP
CRC: 214b041e
As you can see, the CRC values are different on each run, yet within a single run they are all fairly similar. My intuition tells me this should not be the case and that the CRC values should not change between runs.
Just to make sure, I also ran
$ cat test.png > file1
$ cat test.png > file2
$ diff -s file1 file2
Files file1 and file2 are identical
so accessing the file two different times doesn't change the CRC values in it, as expected.
Thanks,

This:
fread(&chunk.crc, 1, sizeof(BYTE), png);
keeps overwriting the first byte of chunk.crc with each byte read from the file, so only the last of the four CRC bytes survives there. The other three bytes of chunk.crc are never written, so you are seeing whatever happened to be in memory at those locations when your program started. Note that the 00 and 1e at the ends are consistent between runs, since that is the one byte that is being written.
The data-reading loop has a similar problem: every byte is read into buffer and never stored in chunk.data:
fread(&buffer, 1, sizeof(BYTE), png);
An unrelated error is that you are accumulating bytes in a 32-bit integer thusly:
chunk.length = (chunk.length | buffer)<<8;
and then after the end of that loop, rolling it back down:
chunk.length = chunk.length>>8;
That will always discard the most significant byte of the length, since you push it off the top of the 32 bits, and then roll eight zero bits back down in its place. Instead you need to do it like this:
chunk.length = (chunk.length << 8) | buffer;
and then all 32 bits are retained, and you don't need to fix it at the end.
This is a bad idea:
fread(&chunk.type, 4, sizeof(BYTE), png);
because it is not portable. What you end up with in chunk.type depends on the endianness of the architecture it is running on. For "IHDR", you will get 0x52444849 on a little-endian machine, and 0x49484452 on a big-endian machine.
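As a minimal sketch of the portable alternative, the four bytes can be read into a byte array and assembled with shifts, which gives the same value on any host. The helper name read_u32_be is hypothetical (it is not from the question or from any library), and the sketch assumes the field is stored most significant byte first, as PNG's length, type and CRC fields are.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read a 32-bit big-endian value from fp into *out.
 * Returns 1 on success, 0 on a short read or I/O error. */
static int read_u32_be(FILE *fp, uint32_t *out)
{
    unsigned char b[4];

    /* Read all four bytes at once and check that we really got them. */
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return 0;

    /* Assemble the value arithmetically, so the result does not depend
     * on the byte order of the machine running the code. */
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 1;
}
```

With a helper like this, length, type and CRC could all be read the same way, and the end-of-file test would compare the type against 0x49454E44 ("IEND") regardless of platform. The chunk data itself can be read in a single call such as fread(chunk.data, 1, chunk.length, png).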

Related

How to read binary inputs from a file in C

What I need to do is to read binary inputs from a file. The inputs are for example (binary dump),
00000000 00001010 00000100 00000001 10000101 00000001 00101100 00001000 00111000 00000011 10010011 00000101
What I did is,
char* filename = vargs[1];
BYTE buffer;
FILE *file_ptr = fopen(filename,"rb");
fseek(file_ptr, 0, SEEK_END);
size_t file_length = ftell(file_ptr);
rewind(file_ptr);
for (int i = 0; i < file_length; i++)
{
fread(&buffer, 1, 1, file_ptr); // read 1 byte
printf("%d ", (int)buffer);
}
But the problem is that I need to split those binary inputs into fields so that I can use them as commands (e.g. 101 in the input means add two numbers).
But when I run the program with the code I wrote, this provides me an output like:
0 0 10 4 1 133 1 44 8 56 3 147 6
which shows in ASCII numbers.
How can I read the inputs as binary numbers, not ASCII numbers?
The inputs should be used in this way:
0 # Padding for the whole file!
0000|0000 # Function 0 with 0 arguments
00000101|00|000|01|000 # MOVE the value 5 to register 0 (000 is MOV function)
00000011|00|001|01|000 # MOVE the value 3 to register 1
000|01|001|01|100 # ADD registers 0 and 1 (100 is ADD function)
000|01|0000011|10|000 # MOVE register 0 to 0x03
0000011|10|010 # POP the value at 0x03
011 # Return from the function
00000110 # 6 instructions in this function
I am trying to implement something like assembly-language commands.
Can someone please help me out with this problem?
Thanks!
You need to understand the difference between data and its representation. You are correctly reading the data in binary. When you print the data, printf() gives the decimal representation of the binary data. Note that 00001010 in binary is the same as 10 in decimal and 00000100 in binary is 4 in decimal. If you convert each sequence of bits into its decimal value, you will see that the output is exactly correct. You seem to be confusing the representation of the data as it is output with how the data is read and stored in memory. These are two different and distinct things.
The next step to solve your problem is to learn about bitwise operators: |, &, ~, >>, and <<. Then use the appropriate combination of operators to extract the data you need from the stream of bits.
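To make the role of those operators concrete, here is a small sketch that pulls a bit field out of a value with a shift and a mask. The function name extract_bits and its parameters are illustrative assumptions, not part of the question's format, and the sketch assumes width is between 1 and 31.

```c
#include <stdint.h>

/* Hypothetical helper: return `width` bits of `value`, starting `shift`
 * bits above the least significant bit. Assumes 1 <= width <= 31. */
static uint32_t extract_bits(uint32_t value, unsigned shift, unsigned width)
{
    /* Build a mask with the low `width` bits set, e.g. width 3 -> 0b111. */
    uint32_t mask = (UINT32_C(1) << width) - 1;

    /* Move the field down to bit 0, then discard everything above it. */
    return (value >> shift) & mask;
}
```

For the byte 10000101 (133) from the dump, extract_bits(0x85, 4, 4) yields the upper four bits, 1000, and extract_bits(0x85, 0, 3) yields the lowest three bits, 101.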
The format you use is not divisible by a byte, so you need to read your bits into a circular buffer and parse it with a state machine.
Reading "in binary" or "in text" is much the same thing; the only thing that changes is your interpretation of the data. In your example you are reading a byte and printing the decimal value of that byte. What you want is to print the bits of that byte, and for that you just need C's bitwise operators.
For example:
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <limits.h>
struct binary_circle_buffer {
size_t i;
unsigned char buffer;
};
bool read_bit(struct binary_circle_buffer *bcn, FILE *file, bool *bit) {
if (bcn->i == CHAR_BIT) {
size_t ret = fread(&bcn->buffer, sizeof bcn->buffer, 1, file);
if (!ret) {
return false;
}
bcn->i = 0;
}
*bit = bcn->buffer & ((unsigned char)1 << bcn->i++); // takes bits least significant first
// if your format stores bits most significant first, use this line instead:
// *bit = bcn->buffer & (((unsigned char)UCHAR_MAX / 2 + 1) >> bcn->i++);
return true;
}
int main(void)
{
struct binary_circle_buffer bcn = { .i = CHAR_BIT };
FILE *file = stdin; // replace by your file
bool bit;
size_t i = 0;
while (read_bit(&bcn, file, &bit)) {
// here you must write your state machine to parse the instructions
printf(bit ? "1" : "0");
if (++i >= CHAR_BIT) { // print a space after every full byte
i = 0;
printf(" ");
}
}
}
Helping you much further would be difficult; you are basically asking for help coding a virtual machine...

Determine if a message is too long to embed in an image

I created a program that embeds a message in a PPM file by messing with the last bit in each byte in the file. The problem I have right now is that I don't know if I am checking if a message is too long or not correctly. Here's what I've got so far:
int hide_message(const char *input_file_name, const char *message, const char *output_file_name)
{
unsigned char * data;
int n;
int width;
int height;
int max_color;
//n = 3 * width * height;
int code = load_ppm_image(input_file_name, &data, &n, &width, &height, &max_color);
if (code)
{
// return the appropriate error message if the image doesn't load correctly
return code;
}
int len_message;
int count = 0;
unsigned char letter;
// get the length of the message to be hidden
len_message = (int)strlen(message);
if (len_message > n/3)
{
fprintf(stderr, "The message is longer than the image can support\n");
return 4;
}
for(int j = 0; j < len_message; j++)
{
letter = message[j];
int mask = 0x80;
// loop through each byte
for(int k = 0; k < 8; k++)
{
if((letter & mask) == 0)
{
//set right most bit to 0
data[count] = 0xfe & data[count];
}
else
{
//set right most bit to 1
data[count] = 0x01 | data[count];
}
// shift the mask
mask = mask>>1 ;
count++;
}
}
// create the null character at the end of the message (00000000)
for(int b = 0; b < 8; b++){
data[count] = 0xfe & data[count];
count++;
}
// write a new image file with the message hidden in it
int code2 = write_ppm_image(output_file_name, data, n, width, height, max_color);
if (code2)
{
// return the appropriate error message if the image doesn't load correctly
return code2;
}
return 0;
}
So I'm checking to see if the length of the message (len_message) is longer than n/3, which is the same thing as width*height. Does that seem correct?
The check you're currently doing tests whether the message has more bytes than the image has pixels. Because you're only using 1 bit per pixel to encode the message, you need to check whether the message has more bits than the image has pixels.
So you need to do this:
if (len_message*8 > n/3)
In addition to @dbush's remarks about checking the number of bits in your message, you appear not to be accounting for all the bytes available to you in the image. Normal ("raw", P6-format) PPM images use three color samples per pixel, at either 8 or 16 bits per sample. Thus, the image contains at least 3 * width * height bytes of color data, and maybe as many as 6 * width * height.
On the other hand, the point of steganography is to make the presence of a hidden message difficult to detect. In service to that objective, if you have a PPM with 16 bits per sample then you probably want to avoid modifying the more-significant bytes of the samples. Or if you don't care about that, then you might as well use the whole low-order byte of each sample in that case.
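One way to express the capacity check in terms of bytes rather than pixels is sketched below. It assumes 8-bit samples, one hidden bit in the least significant bit of each sample byte, and the 8 zero bits that the question's code writes as a terminator. The name message_fits is hypothetical, and n_bytes stands for the n (3 * width * height) from the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `message` plus its terminating NUL
 * can be hidden with one bit per sample byte, 0 otherwise.
 * n_bytes is the number of sample bytes in the image. */
static int message_fits(const char *message, size_t n_bytes)
{
    size_t len = strlen(message);

    /* Each character needs 8 carrier bytes, and the terminator needs
     * 8 more, so (len + 1) characters must fit into n_bytes / 8 slots. */
    return len + 1 <= n_bytes / 8;
}
```

For example, a two-character message needs 24 sample bytes under these assumptions: 16 for the characters and 8 for the terminator.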
Additionally, PPM files record the maximum possible value of any sample, which does not need to be the same as the maximum value of the underlying type. It is possible for your technique to change the actual maximum value to be greater than the recorded maximum, and if you do not then change the maximum-value field as well then the inconsistency could be a tip-off that the file has been tampered with.
Furthermore, raw PPM format affords the possibility of multiple images of the same size in one file. The file header does not express how many there are, so you have to look at the file size to tell. You can use the bytes of every image in the file to hide your message.

Get the character dominant from a string

As the title says, I am trying to write a function that returns the character that dominates (occurs most often) in a string. I might be able to figure it out, but something seems to be wrong with my logic and I have failed so far. If someone can come up with a working approach, I would be extremely glad. Thank you.
I say "in a string" to simplify things. I am actually working on buffered data containing a BMP image, trying to output the base color (the dominant pixel).
What I have for now is this unfinished function:
RGB
bitfox_get_primecolor_direct
(char *FILE_NAME)
{
dword size = bmp_dgets(FILE_NAME, byte);
FILE* fp = fopen(convert(FILE_NAME), "r");
BYTE *PIX_ARRAY = malloc(size-54+1), *PIX_CUR = calloc(sizeof(RGB), sizeof(BYTE));
dword readed, i, l;
RGB color, prime_color;
fseek(fp, 54, SEEK_SET); readed = fread(PIX_ARRAY, 1, size-54, fp);
for(i = 54; i<size-54; i+=3)
{
color = bitfox_pixel_init(PIXEL_ARRAY[i], PIXEL_ARRAY[i+1], PIXEL_ARRAY[i+2);
memmove(PIX_CUR, color, sizeof(RGB));
for(l = 54; l<size-54; l+=3)
{
if (PIX_CUR[2] == PIXEL_ARRAY[l] && PIX_CUR[1] == PIXEL_ARRAY[l+1] &&
PIX_CUR[0] == PIXEL_ARRAY[l+2])
{
}
Note that RGB is a struct containing 3 bytes (R, G and B).
I know that's not much, but that's all I have for now.
Is there any way I can finish this?
If you want this done fast throw a stack of RAM at it (if available, of course). You can use a large direct-lookup table with the RGB trio to manufacture a sequence of 24bit indexes into a contiguous array of counters. In partial-pseudo, partial code, something like this:
// create a zero-filled 2^24 array of unsigned counters.
uint32_t *counts = calloc(256*256*256, sizeof(*counts));
uint32_t max_count = 0; // highest count seen so far
uint32_t max_idx = 0; // packed RGB index that holds that count
// enumerate your buffer of RGB values, three bytes at a time:
unsigned char rgb[3];
while (getNextRGB(src, rgb)) // returns false when no more data.
{
uint32_t idx = (((uint32_t)rgb[0]) << 16) | (((uint32_t)rgb[1]) << 8) | (uint32_t)rgb[2];
if (++counts[idx] > max_count)
{
max_count = counts[idx];
max_idx = idx;
}
}
R = (max_idx >> 16) & 0xFF;
G = (max_idx >> 8) & 0xFF;
B = max_idx & 0xFF;
// free when you have no more images to process. for each new
// image you can memset the buffer to zero and reset the max
// for a fresh start.
free(counts);
That's it. If you can afford to throw a big hunk of memory at this (it would be 64MB in this case: 4 bytes per entry for 16.7M entries), then the whole job becomes O(N). If you have a succession of images to process, you can simply memset() the array back to zeros, clear max_count, and repeat for each additional file. Finally, don't forget to free your memory when finished.
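For reference, the same lookup-table idea can be written as a self-contained function over an in-memory pixel buffer. This is only a sketch: the name dominant_color and its parameters are assumptions, the buffer is taken to hold tightly packed 3-byte pixels with no row padding, and the count and the winning index are tracked in separate variables.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: find the most frequent 3-byte pixel in `pix`,
 * which holds n_pixels tightly packed triples. The winner is written to
 * *out as (byte0 << 16) | (byte1 << 8) | byte2.
 * Returns 1 on success, 0 if the buffer is empty or allocation fails. */
static int dominant_color(const unsigned char *pix, size_t n_pixels,
                          uint32_t *out)
{
    uint32_t best_idx = 0, best_count = 0;
    uint32_t *counts;
    size_t i;

    if (n_pixels == 0)
        return 0;

    /* One zero-initialised counter per possible 24-bit colour (64MB). */
    counts = calloc((size_t)1 << 24, sizeof *counts);
    if (counts == NULL)
        return 0;

    for (i = 0; i < n_pixels; ++i) {
        const unsigned char *p = pix + 3 * i;
        uint32_t idx = ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
        uint32_t c = ++counts[idx];

        /* Remember both the highest count and the colour that has it. */
        if (c > best_count) {
            best_count = c;
            best_idx = idx;
        }
    }

    free(counts);
    *out = best_idx;
    return 1;
}
```

The packed result simply follows the byte order of the buffer, so for BMP pixel data, which is stored blue-green-red, the top byte of the result would be the blue component. On a tie, the colour that reaches the highest count first wins.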
Best of luck.

Repeatedly fread() 16 bits from binary file

I'm reading a binary file. The first 16 bits represent an array index, the next 16 represent the number of 16-bit items about to be listed, and then the remaining multiples of 16 represent all those 16-bit items. For example, the following hex-dump of the file 'program':
30 00 00 02 10 00 F0 25
represents index 0x3000, with 0x0002 elements following, which are 0x1000 and 0xF025.
FILE *fp = fopen(program, "rb");
char indexChar, nItemsChar;
u_int16_t index, nItems;
fread (&indexChar, 2, 1, fp);
fread (&nItemsChar, 2, 1, fp);
address = strtol(&indexChar, NULL, 16);
nItems = strtol(&nItemsChar, NULL, 16);
for (u_int16_t i = 0; i < nItems; ++i)
{
fread (state->mem + index + i, 2, 1, fp);
}
I'm not even sure if this approach works because I get EXC_BAD_ACCESS when trying to fRead() into nItemsChar. What am I doing wrong?
You are confusing ASCII (text) file I/O with binary I/O.
The program crashes at fread(&nItemsChar, 2, 1, fp) because you are reading 2 bytes into a 1-byte object (the previous fread has the same problem and may already have corrupted memory).
You then try to use strtol, which converts ASCII text to a long int, but the values you read are binary.
Instead just use
fread(&index, sizeof(index),1,fp);
fread(&nItems, sizeof(nItems),1,fp);
and then the for loop. Note that this assumes that the file is written with the same endianness as your processor/configuration.
uint16_t index, count, *items;
fread (&index, sizeof(uint16_t), 1, fp); // array index
fread (&count, sizeof(uint16_t), 1, fp); // number of 16-bit items that follow
items = (uint16_t*)calloc(count, sizeof(uint16_t));
fread (items, sizeof(uint16_t), count, fp);
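The snippets above read raw bytes straight into uint16_t objects, so they only give the expected values when the host byte order matches the file. The hex dump in the question (30 00 meaning 0x3000) suggests the file stores the most significant byte first. Under that assumption, a byte-order-independent read could look like the sketch below; the helper name read_u16_be is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read a 16-bit big-endian value from fp into *out.
 * Returns 1 on success, 0 on a short read or I/O error. */
static int read_u16_be(FILE *fp, uint16_t *out)
{
    unsigned char b[2];

    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return 0;

    /* First byte is the high half, second byte is the low half. */
    *out = (uint16_t)(((unsigned)b[0] << 8) | b[1]);
    return 1;
}
```

Calling it once for the index, once for the item count, and then once per item reads the example file as 0x3000, 0x0002, 0x1000 and 0xF025 on both little-endian and big-endian machines.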

Using bzip2 low-level routines to compress chunks of data

The Overview
I am using the low-level calls in the libbzip2 library: BZ2_bzCompressInit(), BZ2_bzCompress() and BZ2_bzCompressEnd() to compress chunks of data to standard output.
I am migrating working code from higher-level calls, because I have a stream of bytes coming in and I want to compress those bytes in sets of discrete chunks (a discrete chunk is a set of bytes that contains a group of tokens of interest — my input is logically divided into groups of these chunks).
A complete group of chunks might contain, say, 500 chunks, which I want to compress to one bzip2 stream and write to standard output.
Within a set, using the pseudocode I outline below, if my example buffer is able to hold 101 chunks at a time, I would open a new stream, compress 500 chunks in runs of 101, 101, 101, 101, and one final run of 96 chunks that closes the stream.
The Problem
The issue is that my bz_stream structure instance, which keeps track of the number of compressed bytes in a single pass of the BZ2_bzCompress() routine, seems to claim to be writing more compressed bytes than the total bytes in the final, compressed file.
For example, the compressed output could be a file with a true size of 1234 bytes, while the number of reported compressed bytes (which I track while debugging) is somewhat higher than 1234 bytes (say 2345 bytes).
My rough pseudocode is in two parts.
The first part is a rough sketch of what I do to compress a subset of chunks (and I know that I have another subset coming after this one):
bz_stream bzStream;
unsigned char bzBuffer[BZIP2_BUFFER_MAX_LENGTH] = {0};
unsigned long bzBytesWritten = 0UL;
unsigned long long cumulativeBytesWritten = 0ULL;
unsigned char myBuffer[UNCOMPRESSED_MAX_LENGTH] = {0};
size_t myBufferLength = 0;
/* initialize bzStream */
bzStream.next_in = NULL;
bzStream.avail_in = 0U;
bzStream.avail_out = 0U;
bzStream.bzalloc = NULL;
bzStream.bzfree = NULL;
bzStream.opaque = NULL;
int bzError = BZ2_bzCompressInit(&bzStream, 9, 0, 0);
/* bzError checking... */
do
{
/* read some bytes into myBuffer... */
/* compress bytes in myBuffer */
bzStream.next_in = myBuffer;
bzStream.avail_in = myBufferLength;
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
do
{
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
/* error checking... */
bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
cumulativeBytesWritten += bzBytesWritten;
/* write compressed data in bzBuffer to standard output */
fwrite(bzBuffer, 1, bzBytesWritten, stdout);
fflush(stdout);
}
while (bzError == BZ_OK);
}
while (/* while there is a non-final myBuffer full of discrete chunks left to compress... */);
Now we wrap up the output:
/* read in the final batch of bytes into myBuffer (with a total byte size of `myBufferLength`... */
/* compress remaining myBufferLength bytes in myBuffer */
bzStream.next_in = myBuffer;
bzStream.avail_in = myBufferLength;
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
do
{
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);
/* bzError error checking... */
/* increment cumulativeBytesWritten by `bz_stream` struct `total_out_*` members */
bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
cumulativeBytesWritten += bzBytesWritten;
/* write compressed data in bzBuffer to standard output */
fwrite(bzBuffer, 1, bzBytesWritten, stdout);
fflush(stdout);
}
while (bzError != BZ_STREAM_END);
/* close stream */
bzError = BZ2_bzCompressEnd(&bzStream);
/* bzError checking... */
The Questions
Am I calculating cumulativeBytesWritten (or, specifically, bzBytesWritten) incorrectly, and how would I fix that?
I have been tracking these values in a debug build, and I do not seem to be "double counting" the bzBytesWritten value. This value is counted and used once to increment cumulativeBytesWritten after each successful BZ2_bzCompress() pass.
Alternatively, am I not understanding the correct use of the bz_stream state flags?
For example, does the following compress and keep the bzip2 stream open, so long as I keep sending some bytes?
bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
Likewise, can the following statement compress data so long as at least some bytes are available from the bzStream.next_in pointer (BZ_RUN), and then wrap up the stream when there are no more bytes available (BZ_FINISH)?
bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);
Or, am I not using these low-level calls correctly at all? Should I go back to using the higher-level calls to continuously append a grouping of compressed chunks of data to one main file?
There's probably a simple solution to this, but I've been banging my head on the table for a couple days in the course of debugging what could be wrong, and I'm not making much progress. Thank you for any advice.
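In answer to my own question, it appears I was miscalculating the number of bytes written. I should not use the total_out_* members for this: they hold the running total of compressed bytes for the whole stream, not the number produced by the latest BZ2_bzCompress() call, so adding them up on every pass counts earlier output again. The following correction works properly: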
bzBytesWritten = sizeof(bzBuffer) - bzStream.avail_out;
The rest of the calculations follow.
