Explanation of HEX value representation and Endianness - c

I was working on a script to basically output some sample data as a binary blob.
I'm a new intern in the software field and vaguely remember the idea of endianness.
I realize that for big-endian the most significant byte comes first and the rest follow down the memory block.
If I have 0x03000201 and the data is being parsed to output 0 1 2, how does this happen, and what is being done to make that work in terms of bits, bytes, etc.?
I am wondering, in the example posted below, how the numbers are extracted to form 0 1 2 when printing out the data stored in the variables.
For example: I am creating a couple lines of the binary blob using this file:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    FILE *file;
    int buffer = 0x03000201;
    int buffer2 = 0x010203;
    file = fopen("test.bin", "wb");
    if (file != NULL)
    {
        fwrite(&buffer, sizeof(buffer), 1, file);
        fwrite(&buffer2, sizeof(buffer2), 1, file);
        fclose(file);
    }
    return 0;
}
I then created a Python script to parse this data:
import struct
with open('test.bin','rb') as f:
    while True:
        data = f.read(4)
        if not data: break
        var1, var2, var3 = struct.unpack('=BHB', data)
        print(var1, var2, var3)

Big or little endianness defines how to interpret a sequence of bytes longer than one byte and how to store those in memory. Wikipedia will help you with that.
I was really just looking to understand how 0x03000201, when read 2 bytes at a time and reprinted, yields 0 1 2.
You don't read 2 bytes at a time, you read 4 bytes: data = f.read(4)
f.read(size) reads some quantity of data and returns it as a string (a bytes object when the file is opened in binary mode under Python 3).
You unpack data using =BHB - byte, 2 bytes, byte. Endianness comes into play only when you unpack data, all other IO calls in your code deal with byte sequences.
Experiment with unpack() and read the "Byte Order, Size, and Alignment" section of the struct documentation. You may also look at the file data with a hex editor of your choice.
And if, after your research, you have a concrete question, ask here.

Related

Using getchar() to read from file

I have an assignment and basically I want to read all the bytes from an audio file using getchar() like this:
while ((ch = getchar()) != EOF)
At some point I have to read 4 consecutive bytes that stand for size of file and I can't understand the following:
If the file my program is reading is for example 150 bytes in size, that is enough to be stored in 1 of the 4 bytes, which means that 3 of the bytes will be 0 and the last one will be 150 in that case. I understand that I need to read all 4 bytes, through 4 repetitions of the while in the above section of code, in order to get all the information I need, but what exactly is getchar() going to return to my variable, as it returns the ASCII code for the character it just read?
Also what happens for larger numbers, that can't be stored in a single byte?
I can't comment since I don't have enough reputation. I am perplexed by your question, as I do not understand what you mean or what you are trying to achieve.
The function getchar() returns a single byte at a time. Only upon reading your question did I check the manual and run a few tests: a multi-byte character is not returned in one call, it arrives as several consecutive bytes. Here is the simple code I used to check this:
int c;
printf("Enter character: ");
while ((c = getchar()) != EOF && c != '\n')
    printf("%02x ", (unsigned)c);
The character I used (it will probably not render properly here) is the Stack Overflow glyph I use in my polybar, 溜; here it shows as an Asian character.
Not only that, but getchar() will return EOF when arriving at the end of the file (or when an error occurs), as stated in the Linux manual:
https://linux.die.net/man/3/getchar
Also, upon further reading, it depends on how the file stores the data: if it is big-endian the four bytes read will be 0, 0, 0, 150; if it is little-endian they will be 150, 0, 0, 0. That assumes you read one byte at a time and not 4 at once as you described.
As for a "solution" to your question, why not use fread(), or one of its relatives, to read the 4 bytes at once, since it does the job properly?
EDIT
As asked in the comments, the following "concatenates" the values bit-wise. I used scanf because I was too lazy to manually check every ASCII key. This assumes the file is big-endian, i.e. 0, 0, 0, 150; otherwise invert the order in which the << shifts are applied and it should "just werk™".
#include <stdio.h>
#include <stdlib.h>
unsigned char c[4];
unsigned int dosomething(){
    unsigned int result=0;
    /* c[0] is the most significant byte (big-endian order) */
    result= (unsigned int)c[0]<< 24 | (unsigned int)c[1]<< 16 | (unsigned int)c[2]<< 8 | (unsigned int)c[3];
    return result;
}
int main(int argc, char const *argv[]){
    for (size_t i = 0; i < 4; i++)
    {
        printf("Enter character: ");
        /* %hhu matches unsigned char; %u would write past the one-byte element */
        scanf ("%hhu", &c[i]);
        printf("%u\n", (unsigned int)c[i]);
        //printf("%s",c);
    }
    printf("%u",dosomething());
    return 0;
}
As for fread(), it is used like this: fread(pointertodatatoread, sizeofdata, sizeofarray, filepointer);
For an in-depth look, here is the manual:
https://www.tutorialspoint.com/c_standard_library/c_function_fread.htm
This should be asked in a different thread, as I feel it is another question.
If the file my program is reading is for example 150 bytes in size, that is enough to be stored in 1 of the 4 bytes, which means that 3 of the bytes will be 0 and the last one will be 150 in that case. I understand that I need to read all 4 bytes in order to get all the information I need, but what exactly is getchar() going to return to my variable, as it returns the ASCII code for the character it just read?
getchar doesn't know anything about ASCII. It returns the numeric value of the byte it reads, or a special code, represented by EOF, if it cannot read a byte. If you treat the byte as an ASCII code then that's a matter of interpretation.
Thus, if your file size is encoded as three zero bytes followed by one byte with value 150, then getchar() will return that as 0, 0, 0, and 150 on four consecutive calls.

How to read a binary into an array

Say I have a 90 megabyte file. It's not encrypted, but it is binary.
I want to store this file into a table as an array of byte values so I can process the file byte by byte.
I can spare up to 2 GB of RAM, so keeping track of which bytes have been processed, which bytes have yet to be processed, and the processed bytes themselves would all be fine. I don't particularly care how long it takes to process.
How should I approach this?
Note I've expanded and rewritten this answer due to Egor's comment.
You first need the file open in binary mode. The distinction is important on Windows, where the default text mode will change line endings from CR+LF into C newlines. You do this by specifying a mode argument to io.open of "rb".
Although you can read a file one byte at a time, in practice you will want to work through the file in buffers. Those buffers can be fairly large, but unless you know you are handling only small files in a one-off script, you should avoid reading the entire file into a buffer with file:read"*a" since that will cause various problems with very large files.
Once you have a file open in binary mode, you read a chunk of it using buffer = file:read(n), where n is an integer count of bytes in the chunk. Using a moderately sized power of two will likely be the most efficient. The return value will either be nil, or will be a string of up to n bytes. If it is less than n bytes long, that was the last buffer in the file. (If reading from a socket, pipe, or terminal, however, a read of fewer than n bytes may only indicate that no more data has arrived yet, depending on lots of other factors too complex to explain in this sentence.)
The string in buffer can be processed any number of ways. As long as #buffer is not too big, then {buffer:byte(1,-1)} will return an array of integer byte values for each byte in the buffer. Too big partly depends on how your copy of Lua was configured when it was built, and may depend on other factors such as available memory as well. #buffer > 1E6 is certainly too big. In the example that follows, I used buffer:byte(i) to access each byte one at a time. That works for any size of buffer, at least as long as i remains an integer.
Finally, don't forget to close the file.
Here's a complete example, lightly tested. It reads a file a buffer at a time, and accumulates the total size and the sum of all bytes. It then prints the size, sum, and average byte value.
-- sum all bytes in a file
local name = ...
assert(name, "Usage: "..arg[0].." filename")
file = assert(io.open(name, "rb"))
local sum, len = 0,0
repeat
    local buffer = file:read(1024)
    if buffer then
        len = len + #buffer
        for i = 1, #buffer do
            sum = sum + buffer:byte(i)
        end
    end
until not buffer
file:close()
print("length:",len)
print("sum:",sum)
print("mean:", sum / len)
Run with Lua 5.1.4 on my Windows box using the example as its input, it reports:
length: 402
sum: 30374
mean: 75.557213930348
To split the contents of a string s into an array of bytes use {s:byte(1,-1)}.

Conversion from binary file to hex in C

I am trying to write a simple program to upload files to my server. I'd like to convert binary files to hex. I have written something, but it does not work properly.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
static int bufferSize = 1024;
FILE *source;
FILE *dest;
int n;
int counter;
int main() {
    unsigned char buffer[bufferSize];
    source = fopen("server.pdf", "rb");
    if (source) {
        dest = fopen("file_test", "wb");
        while (!feof(source)) {
            n = fread(buffer, 1, bufferSize, source);
            counter += n;
            strtol(buffer, NULL, 2);
            fwrite(buffer, 1, n, dest);
        }
    }
    else {
        printf("Error");
    }
    fclose(source);
    fclose(dest);
}
I use strtol to convert binary to hex. After invoking this code I still have strange characters in my file_test file.
I'd like to upload a file to a server, for example a PDF file. But first I have to write a program that will convert this file to a hex file. I'd like each line of the hex file to be 1024 characters long. After that, I will upload this file line by line with PL/SQL.
EDIT: I completely misunderstood what the OP was aiming for. He wants to convert his pdf file to its hex representation, as I see now, because he wants to put that file in a text blob field in some database table. I still claim the exercise is a complete waste of time, since blobs can contain binary data: that's what blobs were invented for. Blob means binary large object.
You said: "I'd like to upload file on server, for example pdf file. But firstly I have to write a program, that will convert this file to hex file."
You don't have to, and must not, write any such conversion program.
You have to first understand and internalize the idea that hex notation is only an easy-to-read representation of binary. If you think, as you seem to, that you have to "convert" a pdf file to hex, then you are mistaken. A pdf file is a binary file is a binary file. You don't "convert" anything, not unless you want to change the binary!
You must abandon, delete, discard, defenestrate, forget about, and expunge your notion of "converting" any binary file to anything else. See, hex exists only as a human-readable presentation format for binary, each hex digit representing four contiguous binary digits.
To put it another way: hex representation is for human consumption only, unsuitable (almost always) for program use.
For an example: suppose your pdf file holds a four-bit string "1100," whose human-readable hex representation can be 'C'. When you "convert" that 1100 to hex the way you want to do it, you replace it by the ASCII character 'C', whose decimal value is 67. You can see right away that's not what you want to do and you immediately see also that it's not even possible: the decimal value 67 needs seven bits and won't fit in your four bits of "1100".
HTH
Your code is fantastically confused.
It's reading in the data, then doing a strtol() call on it, with a base of 2, and then ignoring the return value. What's the point in that?
To convert the first loaded byte of data to hexadecimal string, you should probably use something like:
char hex[8];
sprintf(hex, "%02x", (unsigned int) buffer[0] & 0xff);
Then write hex to the output file. You need to do this for all bytes loaded, of course, not just buffer[0].
Also, as a minor point, you can't call feof() before you've tried reading the file. It's better to not use feof() and instead check the return value of fread() to detect when it fails.
strtol converts a string containing the textual representation of a number (in the given base) to a long, if I am not mistaken. You probably want to convert something like the binary bytes "OK" to 4F 4B... To do that you can use, for example, sprintf(aString, "%02x", aChar).

Writing structure variables into a file, problem

Hi, I have to write the contents of a structure variable into a file. I have a working program but the output looks distorted, as you will see when you look at the output. The simple version of the code is below and the output follows.
Code:
#include <stdio.h>
#include <stdlib.h>
struct mystruct
{
    char xx[20];
    char yy[20];
    int zz;
};
void filewrite(struct mystruct *myvar)
{
    FILE *f;
    f = fopen("trace.bin","wb");
    if (f == NULL)
    {
        printf("\nUnable to create the file");
        exit(0);
    }
    fwrite(myvar, sizeof(struct mystruct), 1, f);
    fclose(f);
}
void main()
{
    struct mystruct myvar = {"Rambo 1", "Rambo 2", 1234};
    filewrite(&myvar);
}
Output:
(1. Where is the integer '1234'? I need that intact.)
(2. Why does some random character appear here?)
trace.bin
Rambo 1Rambo 2Ò
Your program is correct and the output too...
You are writing a binary file containing the raw data from memory.
The integer zz gets written to disk as 4 bytes (or 2 depending on the size of an int on your system), with the least significant byte first (Intel machine I guess).
1234 (decimal) gets written as 0xD2, 0x04, 0x00, 0x00 to disk.
0xD2 is a Ò when you look at it in text form. The 0x04 and the 0x00's are non-printable characters so they don't show.
First, in general it's not a good practice to copy non-packed struct-types to files since the compiler can add padding to the struct in order to align it in memory. Thus you will end up with either a non-portable implementation, or some garbled output where someone else tries to read your file, and the bits/bytes are not placed at the correct offset because of the compiler's padding bytes.
Second, I'm not sure how you are reading your file back (it appears you just copied it into a buffer and tried to print that), but the last set of bytes is an int type at the end ... it's not going to be a null-terminated string, so the manner in which it prints will not look "correct" ... printing non-null-terminated strings as strings can also lead to buffer overflows resulting in segmentation faults, etc.
In order to read back the contents of the file in a human-readable format, you would need to open the file and read contents back into the correct data-structures/types, and then appropriately call printf or some other means of converting the binary data to ASCII data for a print-out.
I don't recommend dumping memory directly into a file; you should use some serialization method (e.g. if you have a pointer in the struct, you are doomed). I recommend Google's Protocol Buffers if the data will be shared between multiple applications.

Reading a binary file 1 byte at a time

I am trying to read a binary file in C 1 byte at a time and after searching the internet for hours I still can not get it to retrieve anything but garbage and/or a seg fault. Basically the binary file is in the format of a list that is 256 items long and each item is 1 byte (an unsigned int between 0 and 255). I am trying to use fseek and fread to jump to the "index" within the binary file and retrieve that value. The code that I have currently:
unsigned int buffer;
int index = 3; // any index value
size_t indexOffset = 256 * index;
fseek(file, indexOffset, SEEK_SET);
fread(&buffer, 256, 1, file);
printf("%d\n", buffer);
Right now this code is giving me random garbage numbers and seg faulting. Any tips as to how I can get this to work right?
You're confusing bytes with int. The common type for a byte is unsigned char. Most bytes are 8 bits wide. If the data you are reading is 8 bits, you will need to read in 8 bits:
#define BUFFER_SIZE 256
unsigned char buffer[BUFFER_SIZE];
/* Read in 256 8-bit numbers into the buffer */
size_t bytes_read = 0;
bytes_read = fread(buffer, sizeof(unsigned char), BUFFER_SIZE, file_ptr);
// Note: sizeof(unsigned char) is for emphasis
The reason for reading all the data into memory is to keep the I/O flowing. There is an overhead associated with each input request, regardless of the quantity requested. Reading one byte at a time, or seeking to one position at a time is the worst case.
Here is an example of the overhead required for reading 1 byte:
Tell OS to read from the file.
OS searches to find the file location.
OS tells disk drive to power up.
OS waits for disk drive to get up to speed.
OS tells disk drive to position to the correct track and sector.
-->OS tells disk to read one byte and put into drive buffer.
OS fetches data from drive buffer.
Disk spins down to a stop.
OS returns 1 byte to your program.
In your program design, the above steps will be repeated 256 times. With everybody's suggestion, the line marked with "-->" will read 256 bytes. Thus the overhead is executed only once instead of 256 times to get the same quantity of data.
In your code you are trying to read 256 bytes to the address of one int. If you want to read one byte at a time, call fread(&buffer, 1, 1, file); (See fread).
But a simpler solution will be to declare an array of bytes, read it all together and process it after that.
unsigned char buffer; // note: 1 byte
fread(&buffer, 1, 1, file);
It is time to read the man pages, I believe.
Couple of problems with the code as it stands.
The prototype for fread is:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
You've set the size to 256 (bytes) and the count to 1. That's fine, that means "read one lump of 256 bytes, shove it into the buffer".
However, your buffer is on the order of 2-8 bytes long (or, at least, vastly smaller than 256 bytes), so you have a buffer overrun. You probably want to use fread(&buffer, 1, 1, file).
Furthermore, you're writing byte data to an int pointer. This will work on one endianness (little-endian, in fact), so you'll be fine on Intel architecture and from that learn bad habits that WILL come back and bite you, one of these days.
Try real hard to only write byte data into byte-organised storage, rather than into ints or floats.
You are trying to read 256 bytes into a 4-byte integer variable called "buffer". You are overwriting the next 252 bytes of other data.
It seems like buffer should either be unsigned char buffer[256]; or you should be doing fread(&buffer, 1, 1, f) and in that case buffer should be unsigned char buffer;.
Alternatively, if you just want a single character, you could just leave buffer as int (unsigned is not needed because C99 guarantees a reasonable minimum range for plain int) and simply say:
buffer = fgetc(f);

Resources