Issue in converting little Endian hexdump output to Big Endian (C-programming) - c

I'm struggling with a problem that requires me to perform a hex dump of an object file I've created with fopen().
I've declared the necessary integer variable (in hex) as follows:
//Declare variables
int code = 0xCADE;
The output must be big Endian so I've swapped the bytes in this manner:
//Swap bytes
int swapped = (code>>8) | (code<<8);
I then opened the file for binary output in this manner:
//Open file for binary writing
FILE *dest_file = fopen(filename, "wb");
Afterwards, I write the variable code (which corresponds to a 16 bit word) to the file in the following manner using fwrite():
//Write out first word of header (0xCADE) to file
fwrite(&swapped, sizeof(int), 1, dest_file);
After compiling, running, and performing a hexdump on the file the contents have been written to, I observe the following output:
0000000 ca de ca 00
0000004
Basically everything is correct up until the extra "ca 00". I am unsure why that is there and need it removed so that my output is just:
0000000 ca de
0000004
I know the endianness problem has been addressed extensively on Stack Overflow, but after performing a search, I am unclear as to how to classify this problem. How can I approach it so that the "ca 00" is removed?
Thanks very much.
EDIT:
I've changed both:
//Declare variables
int code = 0xCADE;
//Swap bytes
int swapped = (code>>8) | (code<<8);
to:
//Declare variables
unsigned short int code = 0xCADE;
//Swap bytes
unsigned short int swapped = (code>>8) | (code<<8);
And I observe:
0000000 ca de 00 00
0000004
Which gets me closer to what I need but there's still that extra "00 00". Any help is appreciated!

You are telling fwrite to write sizeof(int) bytes, which on your system evaluates to 4 (the size of int is 4 bytes there). If you only want to write two bytes, just do:
fwrite(&swapped, 2, 1, dest_file);

To reduce confusion, code that reorders bytes should operate on bytes (uint8_t or unsigned char) rather than on multi-byte types like int.
To swap two bytes:
char bytes[2];
char temp;
fread(bytes, 2, 1, file1);
temp = bytes[0];
bytes[0] = bytes[1];
bytes[1] = temp;
fwrite(bytes, 2, 1, file2);
If you use int, you are probably deceiving yourself by assuming that its size is 2 (it is most likely 4), and by assuming things about how your system writes an int to a file, which may be incorrect. If you work with individual bytes, there can be no surprises: your code does exactly what it looks like it does.

Related

Parsing ID3V2 Frames in C

I have been attempting to retrieve ID3V2 Tag Frames by parsing through the mp3 file and retrieving each frame's size. So far I have had no luck.
I have effectively allocated memory to a buffer to aid in reading the file and have been successful in printing out the header version, but I am having difficulty retrieving both the header and frame sizes. For the header frame size I get 1347687723, although viewing the file in a hex editor I see 05 2B 19.
Two snippets of my code:
typedef struct { //typedef structure used to read tag information
    char tagid[3];               //0-2 "ID3"
    unsigned char tagversion;    //3 $04
    unsigned char tagsubversion; //4 00
    unsigned char flags;         //5-6 %abc0000
    uint32_t size;               //7-10 4 * %0xxxxxxx
} ID3TAG;
if (buff) {
    fseek(filename, 0, SEEK_SET);
    fread(&Tag, 1, sizeof(Tag), filename);
    if (memcmp(Tag.tagid, "ID3", 3) == 0)
    {
        printf("ID3V2.%02x.%02x.%02x \nHeader Size:%lu\n", Tag.tagversion,
               Tag.tagsubversion, Tag.flags, Tag.size);
    }
}
Due to memory alignment, the compiler has inserted 2 bytes of padding between flags and size. If your struct were laid out in memory with no padding, size would be at offset 6 (from the beginning of the struct). Since a 4-byte element must be at an address that is a multiple of 4, the compiler adds 2 bytes so that size moves to the closest multiple-of-4 address, which here is 8. So when you read from your file, size contains bytes 8-11. If you print *(uint32_t *)((unsigned char *)&Tag.size - 2), you'll likely get the correct result (strictly speaking that access is not well defined, but it illustrates the offset).
To fix that, you can read fields one by one.
ID3v2 header structure is consistent across all ID3v2 versions (ID3v2.0, ID3v2.3 and ID3v2.4).
Its size is stored as a big-endian synch-safe int32
Synchsafe integers are
integers that keep its highest bit (bit 7) zeroed, making seven bits
out of eight available. Thus a 32 bit synchsafe integer can store 28
bits of information.
Example:
255 (%11111111) encoded as a 16 bit synchsafe integer is 383
(%00000001 01111111).
Source : http://id3.org/id3v2.4.0-structure § 6.2
Below is a straightforward, real-life C# implementation that you can easily adapt to C
public int DecodeSynchSafeInt32(byte[] bytes)
{
return
bytes[0] * 0x200000 + //2^21
bytes[1] * 0x4000 + //2^14
bytes[2] * 0x80 + //2^7
bytes[3];
}
=> Using the values you read in your hex editor (00 05 EB 19), the actual tag size should be 112025 bytes.
By coincidence I am also working on an ID3V2 reader. The doc says that the size is encoded in four 7-bit bytes, so you need another step to convert the byte array into an integer. I don't think just reading those bytes as an int will work, because of the zero bit on top of each byte.

why does a integer type need to be little-endian?

I am curious about little-endian, and I know that most computers use the little-endian method.
So, I practiced through a program; the source is below.
int main() {
    int flag = 31337;
    char c[10] = "abcde";
    int flag2 = 31337;
    return 0;
}
When I looked at the stack via gdb,
I noticed that there were 0x00007a69 0x00007a69 .... ... ... .. .... ...
0x62610000 0x00656463 .. ...
So, I have two questions.
For one thing,
how can the value of char c[10] be below flag?
I expected the value of flag2 at the top of the stack, the value of char c[10] below flag2, and the value of flag below char c[10],
like this
7a69
"abcde"
7a69
Second,
I expected the values to be stored in little-endian order.
Indeed, the value of "abcde" was stored as '6564636261'.
However, the value of 31337 wasn't shown in little-endian order.
It was just '7a69'.
I thought it should be '697a'.
Why doesn't the integer type conform to little-endian?
There is some confusion in your understanding of endianness, stack and compilers.
First, the locations of variables on the stack may not have anything to do with the order in which they appear in the code. The compiler is free to move them around as it wants, unless they are part of a struct, for example. Compilers usually try to use memory as efficiently as possible, so reordering is needed. For example, the order char, int, char, int would require 16 bytes (on a 32-bit machine, due to alignment padding), whereas int, int, char, char would require only 12 bytes.
Second, there is no "endianness" in char arrays. They are just that: arrays of values. If you put "abcde" there, the values have to be in that order. If you used, for example, UTF-16, then endianness would come into play, since one code unit (not necessarily one character) would require two bytes (on a machine with 8-bit bytes). Those would be stored depending on endianness.
Decimal value 31337 is 0x00007a69 as a 32-bit hexadecimal number. If you ask a debugger to show it, it will show it as such whatever the endianness. The only way to see how it sits in memory is to dump it as bytes; then it would be 0x69 0x7a 0x00 0x00 on a little-endian machine.
Also, even though little endian is very popular, it's mainly because x86 hardware is popular. Many processors have used big endian (SPARC, PowerPC, MIPS amongst others) order and some (like older ARM processors) could run in either one, depending on the requirements.
There is also a term "network byte order", which actually is big endian. This relates to times before little endian machines became most popular.
Integer byte order is an arbitrary processor design decision. Why for example do you appear to be uncomfortable with little-endian? What makes big-endian a better choice?
Well probably because you are a human used to reading numbers from left-to-right; but the machine hardly cares.
There is in fact a reasonable argument that it is intuitive for the least-significant-byte to be placed in the lowest order address; but again, only from a human intuition point-of-view.
GDB shows you 0x62610000 0x00656463 because it is interpreting the data (...abcde...) as 32-bit words on a little-endian system.
It could be either way, but the reasonable default is to use native endianness.
Data in memory is just a sequence of bytes. If you tell it to show it as a sequence (array) of short ints, it changes what it displays. Many debuggers have advanced memory view features to show memory content in various interpretations, including string, int (hex), int (decimal), float, and many more.
You got a few excellent answers already.
Here is a little code to help you understand how variables are laid out in memory, either using little-endian or big-endian:
#include <stdio.h>

void show_var(char *varname, unsigned char *ptr, size_t size) {
    size_t i;
    printf("%s:\n", varname);
    for (i = 0; i < size; i++) {
        printf("pos %zu = %2.2x\n", i, *ptr++);
    }
    printf("--------\n");
}

int main() {
    int flag = 31337;
    char c[10] = "abcde";
    show_var("flag", (unsigned char *)&flag, sizeof(flag));
    show_var("c", (unsigned char *)c, sizeof(c));
}
On my Intel i5 Linux machine it produces:
flag:
pos 0 = 69
pos 1 = 7a
pos 2 = 00
pos 3 = 00
--------
c:
pos 0 = 61
pos 1 = 62
pos 2 = 63
pos 3 = 64
pos 4 = 65
pos 5 = 00
pos 6 = 00
pos 7 = 00
pos 8 = 00
pos 9 = 00
--------

writing a byte with the "write" system call in C

Using the write system call, I am trying to write a number to a file. I want the file referred to by fileid to contain 4 as '04' (expected outcome).
unsigned int g = 4;
if (write(fileid, &g, (size_t) sizeof(int)) == -1)
{
    perror("Error"); exit(1);
}
I get the output '0000 0004' in my file. If I put 1 instead of sizeof(int), I get '00'.
Is there a specific type that I missed ?
PS. I have to read this value from the file as well, so if there isn't such a type, I'm not quite sure how I would go about doing that.
Whether writing 1 byte of g produces 00 or 04 depends on the architecture. Usually, 32-bit integers are stored in memory using little-endian order, meaning the least significant byte comes first; therefore the 32-bit int 4 is stored as 04 00 00 00 and the first byte is 04.
But this is not always true. Some architectures use big-endian order, so the byte order in memory is the same as it reads as a 32-bit hexadecimal number: 00 00 00 04.
Wikipedia Article.
sizeof(int) returns 4, so the code is actually writing four bytes.
Change the type of 'g' from
unsigned int
to
unsigned char
... and, change
sizeof(int)
to
sizeof(unsigned char) .. or sizeof(g)
Then you should see that only one byte '04' will be written.
In this circumstance I would recommend using uint8_t, which is defined in <stdint.h>. On basically all systems you will ever encounter, this is a typedef for unsigned char, but using this name makes it clearer that the value in the variable is being treated as a number, not a character.
uint8_t g = 4;
if (write(fileid, &g, 1) != 1) {
    perror("write");
    exit(1);
}
(sizeof(char) == 1 by definition, and therefore so is sizeof(uint8_t).)
To understand why your original code did not behave as you expected, read up on endianness.
If you want to save only one byte, it will be more appropriate to create a variable that is of size one byte and save it using write.
unsigned int g = 4;
unsigned char c = (unsigned char)g;
if (write(fileid, &c, 1) == -1)
{
    perror("Error"); exit(1);
}
Any truncation then happens explicitly in the program, not implicitly when data goes in and out of files.

Integer Conversion for Char Array

I've been trying to brush up on my C recently and was writing a program to manually parse through a PNG file.
I viewed the PNG file in a hex editor and noticed a stream of bytes that looked like
00 00 00 0D
in hex format.
This string supposedly represents a length that I am interested in.
I used getc(file) to pull in the bytes of the PNG file.
I created a char array as
char example[8];
to store the characters retrieved from getc.
Now I have populated example, and printing it with
printf("%#x, %#x, %#x, %#x", example[0]....
shows 0, 0, 0, 0xd, which is exactly what I want.
However when I use
int x = atoi(example)
or
int x = strtol(example, NULL, 16)
I get back zero in both cases (I was expecting 13). Am I missing something fundamental?
atoi converts strings like "0" to their numeric equivalent, in this case 0. What you have instead is the string "\0\0\0\0\0\0\0\r", which contains no numeric characters at all.
If you want to interpret your bytes as a number you could do something like
char example[4] = {0, 0, 0, 0xd};
printf("%d\n", *(uint32_t*) example);
You will notice (in case you're using an x86 CPU) that you get 218103808 instead of 13,
due to little-endianness: the farther right a byte is, the more significant it becomes.
As PNG uses big endian you can simply use be32toh (big endian to host endianness):
uint32_t *n = (uint32_t *) example;
printf("%u\n", be32toh(*n));
atoi and strtol expect text strings, while you have an array of binary values. To combine the individual bytes in an array to a larger integer, try something like:
uint32_t x = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3];
atoi etc. operate on (ASCII) strings.
You would get 123 for "123", which is, in bytes, 49 50 51 0.
What you have instead is binary 00 00 00 7B ... (well, endianess matters too).
Simple, but in this case wrong, solution (ignoring endianness):
cast the array address to int * and then read the value with *.
As integers in PNG are supposed to be big endian in any case,
the pointer cast would only work on big-endian machines.
As a portable solution, shifting the bytes by 24, 16, 8, 0 and OR-ing them together will do.

How can I add 2 byte CRC at the end of File

I have one encrypted file named encrypt.
I calculated a CRC-16 for this file and stored the result in an unsigned short, which is 2 bytes (16 bits) in size.
Now I want to append the 2-byte CRC value at the end of this file, then read those last 2 bytes back from the file and compare the CRC. How can I achieve this?
I used this code
fseek(readFile, filesize, SEEK_SET);
fprintf(readFile,"%u",result);
Here filesize is the size of my original encrypted file, and after seeking there I write result, which is an unsigned short, but fprintf writes 5 bytes to the file.
The file content after this is:
testsgh
30549
The original file data is testsgh and the appended CRC is 30549, but I want to store this value in 2 bytes. How can I do that?
You should open the file in binary append mode:
FILE *out = fopen("myfile.bin", "ab");
This will eliminate the need to seek to the end.
Then, you need to use a direct write, not a print which converts the value to a string and writes the string. You want to write the bits of your unsigned short checksum:
const size_t wrote = fwrite(&checksum, sizeof checksum, 1, out);
This succeeded if and only if the value of wrote is 1.
However, please note that this risks introducing endianness errors, since it writes the value using your machine's local byte order. To be on the safe side, it's cleaner to decide on a byte ordering and implement it directly. For big-endian:
const unsigned char check_bytes[2] = { checksum >> 8, checksum & 255 };
const size_t wrote = fwrite(check_bytes, sizeof check_bytes, 1, out);
Again, we expect wrote to be 1 after the call to indicate that both bytes were successfully written.
Use fwrite(), not fprintf(). I don't have access to a C compiler at the moment, but fwrite(&result, sizeof(result), 1, readFile); should work.
You could do something like this:
unsigned char c1, c2;
c1 = (unsigned char)(result >> 8);
c2 = (unsigned char)(result & 0xFF);
and then append c1 and c2 at the end of the file. When you read the file back, just do the opposite:
result = ( (unsigned)c1 << 8 ) + (unsigned)c2;
Hope that helps.
You can write single characters with %c formatting, e.g.
fprintf(readfile, "%c%c", result % 256, result / 256);
(note this writes the low byte first, i.e. little endian).
btw: readfile is misleading, when you write to it :-)
