I am trying to write the binary representation of an integer into a file, expecting to see its bytes in hexadecimal when I inspect the file, but I don't get the result I expected.
uint32_t a = 1;
FILE * file = fopen("out.txt", "ab+");
fwrite(&a, sizeof(uint32_t), 1, file );
I expect to get (little endian)
1000 0000
but instead I get in the file
0100 0000
The machine running this snippet of code is 32-bit Ubuntu Linux (little-endian).
Can someone explain why this is? Is the file's content consistent with the integer representation on my machine?
Cheers.
Assuming each of those groups of two digits is a byte, what you're seeing is correct:
01 00 00 00
Little endian orders bytes, not nybbles within bytes. So what you have is:
01 00 00 00
|| || || ||
|| || || == -> 0 * 256 * 256 * 256
|| || == ----> 0 * 256 * 256
|| == -------> 0 * 256
== ----------> 1
Related
I am trying to implement an IJVM and read a binary file.
I understand that an .ijvm file contains a 32-bit magic number and any number of data blocks and that a data block has three parts.
My intention is to first read and store the magic number which is always of constant size and then the data block to a different array.
The .ijvm file looks like this:
1d ea df ad 00 01 00 00 00 00 00 00 00 00 00 00
00 00 00 07 10 30 10 31 60 fd ff
with the first 4 bytes (1d ea df ad) being the magic n. and the rest the data block.
After reading the file I determine the total size being 27 bytes, thus the rest should be 23 bytes.
However, no matter what I try, even though the magic number and data block themselves are read and stored correctly, I always get a wrong data block size instead of the 23 bytes I expect.
byte_t bufferMagic[4];
byte_t *dataBlock;
FILE *fp;
uint32_t filelen;
uint32_t dataBlocklen;
fp = fopen(binary_file, "r");
fseek(fp, 0, SEEK_END); //compute the size of the file
filelen = ftell(fp);
fseek(fp, 0, SEEK_SET);
fprintf(stderr,"file:%s is %d bytes long\n",binary_file,filelen); // outputs 27
//read magic number
fread(bufferMagic,1,4,fp);
fprintf(stderr, "Magic number: 0x%02hhx%02hhx%02hhx%02hhx \n",
bufferMagic[0],bufferMagic[1],bufferMagic[2],bufferMagic[3]);
//read data block
dataBlock = (byte_t*)malloc(sizeof(byte_t) * (filelen - 4));
//dataBlocklen = ftell(fp); //outputs 4
dataBlocklen = sizeof(dataBlock); //outputs 8
fread(dataBlock,1,filelen - 4,fp); //reads data block correctly
Can you please explain what I am missing? Why does neither attempt at dataBlocklen give 23 bytes?
//dataBlocklen = ftell(fp); //outputs 4
This returns 4 because the current file offset is at the 4th byte: you did an fread of the 4-byte magic number before calling ftell.
fread(bufferMagic,1,4,fp);
and
dataBlocklen = sizeof(dataBlock); //outputs 8
This returns 8 because dataBlock is a pointer, and the sizeof a pointer is 8 bytes on your machine.
Thanks for the replies, everyone was useful in helping me understand how this works.
A friend sent me this piece of C code asking how it worked (he doesn't know either). I don't usually work with C, but this piqued my interest. I spent some time trying to understand what was going on but in the end I couldn't fully figure it out. Here's the code:
void knock_knock(char *s){
while (*s++ != '\0')
printf("Bazinga\n");
}
int main() {
int data[5] = { -1, -3, 256, -4, 0 };
knock_knock((char *) data);
return 0;
}
Initially I thought it was just a fancy way to print the data in the array (yeah, I know :\), but then I was surprised when I saw it didn't print 'Bazinga' 5 times, but 8. I searched around and figured out it was working with pointers (I'm a total amateur when it comes to C), but I still couldn't figure out why 8.

I searched a bit more and found out that pointers are usually 8 bytes in C, and I verified that by printing sizeof(s) before the loop; sure enough, it was 8. I thought that was it: it was just iterating over the length of the pointer, so it would make sense that it printed Bazinga 8 times. It also was clear to me now why they'd use Bazinga as the string to print: the data in the array was meant to be just a distraction.

So I tried adding more data to the array, and sure enough it kept printing 8 times. Then I changed the first number of the array, -1, to check whether the data truly was meaningless or not, and this is where I got confused. It didn't print 8 times anymore, but just once. Surely the data in the array wasn't just a decoy, but for the life of me I couldn't figure out what was going on.
Using the following code
#include<stdio.h>
void knock_knock(char *s)
{
while (*s++ != '\0')
printf("Bazinga\n");
}
int main()
{
int data[5] = { -1, -3, 256, -4, 0 };
printf("%08X - %08X - %08X\n", data[0], data[1], data[2]);
knock_knock((char *) data);
return 0;
}
You can see that HEX values of data array are
FFFFFFFF - FFFFFFFD - 00000100
The function knock_knock prints Bazinga until the pointed-to value is 0x00, due to
while (*s++ != '\0')
But the pointer here points to chars, so it advances a single byte each iteration; the first 0x00 is reached when accessing the "first" (lowest-order) byte of the third value of the array.
You need to look at the bytewise representation of the data in the integer array data. Assuming an integer is 4 bytes, the representation below gives the numbers in hex:
-1 --> FF FF FF FF
-3 --> FF FF FF FD
256 --> 00 00 01 00
-4 --> FF FF FF FC
0 --> 00 00 00 00
The array data is these numbers stored in little-endian format, i.e. the least significant byte comes first. So,
data = {FF FF FF FF FD FF FF FF 00 01 00 00 FC FF FF FF 00 00 00 00};
The function knock_knock goes through this data bytewise and prints Bazinga for every non-zero byte. It stops at the first zero found, which comes after 8 bytes.
(Note: the size of an integer can also be 2 or 8 bytes, but given that your pointer size is 8 bytes, I am guessing that the size of an integer here is 4 bytes.)
It is easy to understand what occurs here if you output the array in hex as a character array. Here is how to do this:
#include <stdio.h>
int main(void)
{
int data[] = { -1, -3, 256, -4, 0 };
const size_t N = sizeof( data ) / sizeof( *data );
char *p = ( char * )data;
for ( size_t i = 0; i < N * sizeof( int ); i++ )
{
printf( "%02X ", ( unsigned char )p[i] ); /* cast prevents sign extension of negative chars */
if ( ( i + 1) % sizeof( int ) == 0 ) printf( "\n" );
}
return 0;
}
The program output is
FF FF FF FF
FD FF FF FF
00 01 00 00
FC FF FF FF
00 00 00 00
So the string "Bazinga" will be output as many times as there are non-zero bytes before the first zero byte in the representations of the integers in the array. As can be seen, the first two negative numbers have no zero bytes in their representations.
However, the number 256 always has such a byte at the very beginning of its internal representation on a little-endian machine. So the string will be output exactly eight times, provided that sizeof( int ) is equal to 4.
I'm reading some code that performs bitwise operations on an int and stores them in an array. I've worked out the binary representation of each step and included this beside the code.
On another computer, the array buff is received as a message and displayed in hex as [42,56,da,1,0,0]
My question is: how could you figure out what the original number was from the hex bytes? I get that 42 and 56 are the ASCII equivalents of 'B' and 'V'. But how do you get the number 423 from da 01 00 00?
Thanks
DA 01 00 00 is the little-endian representation of 0x000001DA, or just 0x1DA. This, in turn, is 256 + 13 * 16 + 10 = 474. Maybe you had this number earlier, changed the program later, and forgot to recompile?
Seen from the other side, 423 is 0x1a7…
As glgl says, it should be a7; you can see where that comes from:
buff[2] = reading&0xff; // 10100111 = 0xa7
buff[3] = (reading>>8)&0xff; //00000001 = 1
buff[4] = (reading>>16)&0xff; //00000000 = 0
buff[5] = (reading>>24)&0xff; //00000000 = 0
I am currently enrolled in a CS107 class which makes the following assumptions:
sizeof(int) == 4
sizeof(short) == 2
sizeof(char) == 1
big endianness
My professor showed the following code:
int arr[5];
((short*)(((char*) (&arr[1])) + 8))[3] = 100;
Here are the 20 bytes representing arr:
|....|....|....|....|....|
My professor states that &arr[1] points here, which I agree with.
|....|....|....|....|....|
x
I now understand that (char*) makes the pointer the width of a char (1 byte) instead of the width of an int (4 bytes).
What I don't understand is the + 8, which my professor says points here:
|....|....|....|....|....|
x
But shouldn't it point here, since it is going forwards 8 times the size of a char (1 byte)?
|....|....|....|....|....|
x
Let's take it step by step. Your expression can be decomposed like this:
((short*)(((char*) (&arr[1])) + 8))[3]
-----------------------------------------------------
char *base = (char *) &arr[1];
char *base_plus_offset = base + 8;
short *cast_into_short = (short *) base_plus_offset;
cast_into_short[3] = 100;
base_plus_offset points at byte location 12 within the array. cast_into_short[3] refers to a short value at location 12 + sizeof(short) * 3, which, in your case is 18.
The expression will set the two bytes starting 18 bytes after the start of arr to the value 100.
#include <stdio.h>
int main() {
int arr[5];
char* start=(char*)&arr;
char* end=(char*)&((short*)(((char*) (&arr[1])) + 8))[3];
printf("sizeof(int)=%zu\n",sizeof(int));
printf("sizeof(short)=%zu\n",sizeof(short));
printf("offset=%td <- THIS IS THE ANSWER\n",(end-start));
printf("100=%04x (hex)\n",100);
for(size_t i=0;i<5;++i){
printf("arr[%zu]=%d (%08x hex)\n",i,arr[i],arr[i]);
}
}
Possible Output:
sizeof(int)=4
sizeof(short)=2
offset=18 <- THIS IS THE ANSWER
100=0064 (hex)
arr[0]=0 (00000000 hex)
arr[1]=0 (00000000 hex)
arr[2]=0 (00000000 hex)
arr[3]=0 (00000000 hex)
arr[4]=6553600 (00640000 hex)
With all your professor's shenanigans, he has shifted you 1 integer, 8 chars/bytes, and 3 shorts: 4 + 8 + 6 = 18 bytes. Bingo.
Notice this output reveals the machine I ran this on to have 4 byte integers, 2 byte short (common) and be little-endian because the last two bytes of the array were set to 0x64 and 0x00 respectively.
I find your diagrams dreadfully confusing because it isn't very clear if you mean the '|' to be addresses or not.
|....|....|....|....|
012345678901234567890
^ 1 ^ ^ 2
A X C S B
Including the bars ('|'), A is the start of arr and B is 'one past the end' (a legal concept in C).
X is the address referred to by the expression &Arr[1].
C by the expression (((char*) (&arr[1])) + 8).
S by the whole expression.
S and the byte following it are assigned to, and what that means depends on the endianness of your platform.
I leave it as an exercise to determine what the output would be on a similar but big-endian platform. Anyone?
I notice from the comments you're big-endian and I'm little-endian (stop sniggering).
You only need to change one line of the output.
Here's some code that can show you which byte gets modified on your system, along with a breakdown of what is happening:
#include <stdio.h>
int main( int argc, char* argv[] )
{
int arr[5];
int i;
for( i = 0; i < 5; i++ )
arr[i] = 0;
printf( "Before: " );
for( i = 0; i < sizeof(int)*5; i++ )
printf( "%2.2X ", ((char*)arr)[i] );
printf( "\n" );
((short*)(((char*) (&arr[1])) + 8))[3] = 100;
printf( "After: " );
for( i = 0; i < sizeof(int)*5; i++ )
printf( "%2.2X ", ((char*)arr)[i] );
printf( "\n" );
return 0;
}
Start from the innermost:
int pointer to (arr + 4)
&arr[1]
|...|...|...|...|...
Xxxx
char pointer to (arr + 4)
(char*)(&arr[1])
|...|...|...|...|...
X
char pointer to (arr + 4 + 8)
(((char*) (&arr[1])) + 8)
|...|...|...|...|...
X
short pointer to (arr + 4 + 8)
(short*)(((char*) (&arr[1])) + 8)
|...|...|...|...|...
Xx
short at (arr + 4 + 8 + (3 * 2)) (this is an array index)
((short*)(((char*) (&arr[1])) + 8))[3]
|...|...|...|...|...
Xx
Exactly which byte gets modified here depends on the endianness of your system. On my little-endian x86 I get the following output:
Before: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
After: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 64 00
Good Luck with your course.
I find myself writing a simple program to extract data from a bmp file. I just got started and I am at one of those WTF moments.
When I run the program and supply this image: http://www.hack4fun.org/h4f/sites/default/files/bindump/lena.bmp
I get the output:
type: 19778
size: 12
res1: 0
res2: 54
offset: 2621440
The actual image size is 786,486 bytes. Why is my code reporting 12 bytes?
The header format specified in http://en.wikipedia.org/wiki/BMP_file_format matches my BMP_FILE_HEADER structure, so why is it getting filled with the wrong information?
The image file doesn't appear to be corrupt and other images are giving equally wrong outputs. What am I missing?
#include <stdio.h>
#include <stdlib.h>
typedef struct {
unsigned short type;
unsigned int size;
unsigned short res1;
unsigned short res2;
unsigned int offset;
} BMP_FILE_HEADER;
int main (int args, char ** argv) {
char *file_name = argv[1];
FILE *fp = fopen(file_name, "rb");
BMP_FILE_HEADER file_header;
fread(&file_header, sizeof(BMP_FILE_HEADER), 1, fp);
if (file_header.type != 'MB') {
printf("ERROR: not a .bmp");
return 1;
}
printf("type: %i\nsize: %i\nres1: %i\nres2: %i\noffset: %i\n", file_header.type, file_header.size, file_header.res1, file_header.res2, file_header.offset);
fclose(fp);
return 0;
}
Here the header in hex:
0000000 42 4d 36 00 0c 00 00 00 00 00 36 00 00 00 28 00
0000020 00 00 00 02 00 00 00 02 00 00 01 00 18 00 00 00
The length field is the bytes 36 00 0c 00, which are in Intel (little-endian) order; handled as a 32-bit value, this is 0x000c0036, or decimal 786,486 (which matches the saved file size).
Probably your C compiler is aligning each field to a 32-bit boundary. Enable a pack structure option, pragma, or directive.
There are two mistakes I could find in your code.
First mistake: you have to pack the structure to 1-byte alignment, so every field occupies exactly the size it is meant to and the compiler doesn't align fields to, for example, 4-byte boundaries. In your code, the compiler inserted 2 bytes of padding after the 2-byte type field, so every later field was read from the wrong file offset. The trick for this is using a compiler directive that packs the next struct:
#pragma pack(1)
typedef struct {
unsigned short type;
unsigned int size;
unsigned short res1;
unsigned short res2;
unsigned int offset;
} BMP_FILE_HEADER;
Now it should be aligned properly.
The other mistake is in here:
if (file_header.type != 'MB')
You are comparing a short, which is 2 bytes, with a character constant. 'MB' is a multi-character constant whose value is implementation-defined, and the compiler is probably giving you a warning about it; canonically, single quotes contain just one character of 1-byte size.
To get around this, you can combine the two 1-byte characters, which are known ('M' and 'B'), into a 16-bit word yourself. For example:
if (file_header.type != (('M' << 8) | 'B'))
If you see this expression, this will happen:
'M' (which is 0x4D in ASCII) shifted 8 bits to the left results in 0x4D00; now you can OR in the next character on the right: 0x4D00 | 0x42 = 0x4D42 (where 0x42 is 'B' in ASCII). Thinking like this, you could just write:
if (file_header.type != 0x4D42)
Then your code should work.