Appending bits to a file in C - c

I'm working on a CRC32 program for a project and I've hit another stumbling block. We get a 32-bit UINT back from the ASM code we have, and in order to test the algorithm, we need to append those exact bits to the end of the text file we threw into the algorithm, and we're kind of at a loss of how to do that. We tried fprint, but that transformed the int into a char and changed the bits. Same deal with fwrite. Is there some way to do this with fwrite we're missing? Any help would be appreciated.

You have to open the file in binary mode.
It's also possible you have to flip the bytes (if the ASM code returns them in a different endianness than expected. If the target is big-endian, than htonl will work).

Related

Writing number as raw binary to file

So I'm trying to write the value of a long as raw binary to a file with fwrite(), here's a test file:
#include<stdio.h>
int main() {
FILE* stream;
stream = fopen("write", "wb");
long number = 0xfff;
fwrite(&number, sizeof(long), 1, stream);
}
When opening the written file in a text editor, this is the output:
ff0f 0000 0000 0000
while I was expecting something like 0000 0000 0000 0fff.
How do I write the desired result correctly as raw binary?
Read about Endianness, which states how a bytes are arranged in a word (or double/quad word, et cetera) in a computer system.
I'm assuming you've coded and compiled this example on a X86 system, which is little-endian, so, the least significant bits COME FIRST. The opposite of that arrangement is called big-endian.
Now, it is clear that your objective in this exercise is to marshall (or pickle, depending on how your prefer your jargon) some bytes to be later retrieved, possibly by another program.
If you develop a program that uses fread() and and reads the data in the same way (using sizeof(long) so you don't read too much data) and in a machine with the same endianness, it will magically work, and the number you expect is gonna be back. But, if you compile and run the "read" tool in a machine with the opposite endianness, reading the same input file, your number will be garbled.
If your objective is to marshall data, you should be better off with a tool to help you marshall your bytes in a way that is endianness-agnostic, that is, a library that helps you get the data in the correct order. There are libraries out there that take care of that for you.
There's no problem. You're seeing it as ff0f 0000 0000 0000 cause of the endian of the machine!
Try using fread() instead!
as other have pointed out in the comments, this is an endianness "issue". That it, it is only an issue if you are going to run your software on systems with an other endianness.
This very useful resource is the first result for the "c endian" google search, at least for me. I hope it helps you.
Edit: I will dig a bit more into the details below.
To determine what is the endianness of the machine you are currently running on, write a known value (for example, 0xAABBCCDD) into memory, then take a pointer to that value and cast it to a char* or other 1-byte data type. Read the same value again, byte by byte, and compare the ordering with what you wrote. If they are the same, you are on a big endian machine. If not... Then you could be on a little (more probable) or middle (less probable) endian machine. You can in either case generate a "swap map" for reordering the bytes; which is the reason why I chose four different byte values in the above example.
But, how to swap those values? An easy way, as indicated here, is to use bit shifts with masks. You can also use tables and swap their values around.

How to properly work with file upon encoding and decoding it?

It doesn't matter how I exactly encrypt and decode files. I operate with file as a char massive, everything is almost fine, until I get file, which size is not divide to 8 bytes. Because I can encrypt and decode file each round 8 bytes, because of particular qualities of algorithm (size of block must be 64 bit).
So then, for example, I face .jpg and tried simply add spaces to end of file, result file can't be opened ( ofc. with .txt files nothing bad happen).
Is any way out here?
If you want information about algorithm http://en.wikipedia.org/wiki/GOST_(block_cipher).
UPD: I can't store how many bytes was added, because initial file can be deleted or moved. And, what we are suppose to do then we know only key and have encrypted file.
Do you need padding.
The best way to do this would be to use PKCS#7.
However GOST is not so good, better using AES-CBC.
There is an ongoing similar discussion in "python-channel".

GIF LZW decompression hints?

I've read through numerous articles on GIF LZW decompression, but I'm still confused as to how it works or how to solve, in terms of coding, the more fiddly bits of coding.
As I understand it, when I get to the byte stream in the GIF for the LZW compressed data, the stream tells me:
Minimum code size, AKA number of bits the first byte starts off with.
Now, as I understand it, I have to either add one to this for the clear code, or add two for clear code and EOI code. But I'm confused as to which of these it is?
So say I have 3 colour codes (01, 10, 11), with EOI code assumed (as 00) will the byte that follows the minimum code size (of 2) be 2 bits, or will it be 3 bits factoring in the clear code? Or is the clear code/EOI code both already factored into the minimum size?
The second question is, what is the easiest way to read in dynamically sized bits from a file? Because reading an odd numbers of bits (3 bits, 12 bits etc) from an even numbered byte (8) sounds like it could be messy and buggy?
To start with your second question: yes you have to read the dynamically sized bits from an 8bit bytestream. You have to keep track of the size you are reading, and the number of unused bits left from previous read operations (used for correctly putting the 'next byte' from the file).
IIRC there is a minimum code size of 8 bits, which would give you a clear code of 256 (base 10) and an End Of Input of 257. The first stored code is then 258.
I am not sure why you did not looked up the source of one of the public domain graphics libraries. I know I did not because back in 1989 (!) there were no libraries to use and no internet with complete descriptions. I had to implement a decoder from an example executable (for MS-DOS from Compuserve) that could display images and a few GIF files, so I know that can be done (but it is not the most efficient way of spending your time).

Reading a binary file bit by bit

I know the function below:
size_t fread(void *ptr, size_t size_of_elements, size_t number_of_elements, FILE *a_file);
It only reads byte by byte, my goal is to be able to read 12 bits at a time and then take them into an array. Any help or pointers would be greatly appreciated!
Adding to the first comment, you can try reading one byte at a time (declare a char variable and write there), and then use the bitwise operators >> and << to read bit by bit. Read more here: http://www.cprogramming.com/tutorial/bitwise_operators.html
Many years ago, I wrote some I/O routines in C for a Huffman encoder. This needs to be able to read and write on the granularity of bits rather than bytes. I created functions analogous to read(2) and write(2) that could be asked to (say) read 13 bits from a stream. To encode, for example, bytes would be fed into the coder and variable numbers of bits would emerge the other side. I had a simple structure with a bit pointer into the current byte being read or written. Every time it went off the end, it flushed the completed byte out and reset the pointer to zero. Unfortunately that code is long gone, but it might be an idea to pull apart an open-source Huffman coder and see how the problem was solved there. Similarly, base64 coding takes 3 bytes of data and turns them into 4 (or vice versa).
I've implemented a couple of methods to read/write files bit by bit. Here they are. Whether it is viable or not for your use case, you have to decide that for yourself. I've tried to make the most readable, optimized code i could, not being a seasoned C developer (for now).
Internally, it uses a "bitCursor" to store information about previous bits that don't yet fit a full byte. It has who data fields: d stores data and s stores size, or the amount of bits stored in the cursor.
You have four functions:
newBitCursor(), which returns a bitCursor object with default values
{0,0}. Such a cursor is needed at the beginning of a sequence of
read/write operations to or from a file.
fwriteb(void *ptr, size_t sizeb, size_t rSizeb, FILE *fp, bitCursor
*c), which writes sizeb rightmost bits of the value stored in ptr to fp.
fcloseb(FILE *fp, bitCursor *c), which writes a remaining byte, if
the previous writes did not exactly encapsulate all data needed to
be written, that is probably almost always the case...
freadb(void *ptr, size_t sizeb, size_t rSizeb, FILE *fp, bitCursor
*c), which reads sizeb bits and bitwise ORs them to *ptr. (it is, therefore, your responsibility to init *ptr as 0)
More info is provided in the comments. Have Fun!
Edit: It has come to my knowledge today that when i made that i assumed Little Endian! :P Oops! It's always nice to realize how much of a noob i still am ;D.
Edit: GNU's Binary File Descriptors.
Read the first two bytes from your a_file file pointer and check the bits in the least or greatest byte — depending on the endianness of your platform (x86 is little-endian) — using bitshift operators.
You can't really put bits into an array, as there isn't a datatype for bits. Rather than keeping 1's and 0's in an array, which is inefficient, it seems cheaper just to keep the two bytes in a two-element array (say, of type unsigned char *) and write functions to map those two bytes to one of 4096 (2^12) values-of-interest.
As a complication, on subsequent reads, if you want to fread through the pointer every 12 bits, you would read only one byte, using the left-over bits from the previous read to build a new 12-bit value. If there are no leftovers, you would need to read two bytes.
Your mapping functions would need to address the second case where bits were used from previous read, because the two bytes would need different mapping. To do this efficiently, a modulus on a read-counter could be used to swap between two mappings.
read 2 bytes and do bit wise operations will get it done for the next time read 2nd bytes onwards apply the bit-wise operations will get back you expected . . . .
For your problem you can see this demo program which read 2byte but actual information is only 12 bit.As well as this type of things are used it bit wise access.
fwrite() is a standard library function which take the size argument as byte and of type int.So it is not possible exactly 12bit read.If the file you create then create like below as well as read as below it solve your problem.
If that file is special file which not written by you then follow the standard provided for that file to read I think they also writing like this only.Or you can provide the axact where it I may help you.
#include<stdio.h>
#include<stdlib.h>
struct node
{
int data:12;
}NODE;
int main()
{
FILE *fp;
fp=fopen("t","w");
NODE.data=1024;
printf("%d\n",NODE.data);
fwrite(&NODE,sizeof(NODE),1,fp);
NODE.data=0;
NODE.data=2048;
printf("%d\n",(unsigned)NODE.data);
fwrite(&NODE,sizeof(NODE),1,fp);
fclose(fp);
fp=fopen("t","r");
fread(&NODE,sizeof(NODE),1,fp);
printf("%d\n",NODE.data);
fread(&NODE,sizeof(NODE),1,fp);
printf("%d\n",NODE.data);
fclose(fp);
}

After encryption, an exe file becomes non-executable

After writing a basic LFSR-based stream cipher encryption module in C, I tried it on usual text files, and then on a .exe file in Windows. However, after decrypting it back the file is not running, giving some error about being a 16-bit. Evidently some error in decrypting. Or are files made so that if I tamper with their binary code they become corrupted?
I'm checking my program on text-files in the hope of locating any error on my part. However, the question is had anyone tried running your own encryption programs on an executable file? Is their any obvious answer to this?
There is nothing special about executables. They are obviously binary files and thus contain 00 bytes and bytes >127. As long as your algorithm is binary safe, it should work.
Compare the original file and the decrypted file using a hex-editor. To see how they differ.
The error you get means that you didn't decrypt the executable header correctly, so the decryption mistake must already affect the first few bytes of your file.
Evidently some error in decrypting. An exe is a bag 'o bytes just like any other file, there's no magic. You are merely likely to run into byte values that you won't get in a text file. Like a zero.
A decryption process should be the inverse of its encryption. In other words, Decrypt(Encrypt(X)) == X for all inputs X, of all possible lengths, of all possible byte values.
I suggest you build yourself a test harness that will run some pairwise checks with randomised data so you can prove to yourself that the two transformations do indeed cancel each other out. I mean something like:
for length from 0 to 1000000:
generate a string of that length with random contents
encrypt it to a fresh memory buffer
decrypt it to a fresh memory buffer
compare the decrypted string with the original string
Do this first of all on in-memory strings so you can isolate the algorithm from your file-handling code.
Once you've proved the algorithm is properly inverting, you can then do the same for files; as others have said you might well be running into issues with handling binary files, that's a common gotcha.

Resources