I've been trying to brush up on my C recently and was writing a program to manually parse through a PNG file.
I viewed the PNG file in a hex editor and noticed a stream of bytes that looked like
00 00 00 0D
in hex format.
This string supposedly represents a length that I am interested in.
I used getc(file) to pull in the bytes of the PNG file.
I created a char array as
char example[8];
to store the characters retrieved from getc.
Now, I have populated example and printing it with
printf("%#x, %#x, %#x, %#x", example[0]....
shows 0, 0, 0, 0xd which is exactly what I want.
However when I use
int x = atoi(example)
or
int x = strtol(example, NULL, 16)
I get back zero in both cases (I was expecting 13). Am I missing something fundamental?
atoi converts strings like "0" to its numeric equivalent, in this case 0. What you have instead is the string "\0\0\0\0\0\0\0\r" which is nowhere near numeric characters.
If you want to interpret your bytes as a number you could do something like
char example[4] = {0, 0, 0, 0xd};
printf("%d\n", *(uint32_t*) example);
You will notice (in case you're using a x86 CPU) that you will get 218103808 instead of 13
due to little endianness: the farther you go right the more significant the number gets.
As PNG uses big endian you can simply use be32toh (big endian to host endianess):
uint32_t* n = example;
printf("%u\n", be32toh(*n)
atoi and strtol expect text strings, while you have an array of binary values. To combine the individual bytes in an array to a larger integer, try something like:
uint32_t x = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3];
atoi etc. operates on (ascii) strings.
You would get 123 for "123", which is in bytes 49 50 41 0.
What you have instead is binary 00 00 00 7B ... (well, endianess matters too).
Simple, but in this case wrong solution (ignoring endianess):
Cast the array address to int* and then get a value with *.
As integers in PNG are supposed to be big endian in any case,
the pointer casting would only work with big endian machines.
As portable solution, shifting the bytes with 24,16,8,0 and binary-orĀ“ing them will do.
Related
I want to write the 4 bytes of an int32_t to a binary file in big-endian order. I used fwrite() directly with a pointer to my int32_t, and it somewhat works, but the problem is that my integer is written in little-endian order, with the smallest bytes written first. For example, if I write:
int32_t specialInt = 262;
fwrite(&specialInt, 4, 1, myFile);
and I open it with my hex editor, I see:
06 01 00 00 ...
which is backwards compared to how I want it. I would like:
00 00 01 06 ...
How should I get my int_32t to be in big-endian order? Is there a built-in C library function that will get the bytes in the correct order, or should I use memcpy() to put the bytes into a temp char array, then write the bytes one by one backwards into the file?
Thanks pmg for writing the answer in the comment:
Assuming CHAR_BIT == 8:
unsigned char value[4];
value[0] = (uint32_t)specialInt >> 24;
value[1] = (uint32_t)specialInt >> 16;
value[2] = (uint32_t)specialInt >> 8;
value[3] = (uint32_t)specialInt;
fwrite(value, 4, 1, myFile);
You could use htonl() function from POSIX 2001 standard available in arpa/inet.h header. See https://linux.die.net/man/3/ntohl
Big endian is Internet byte order, known as "network byte order".
You need to transform int32_t to uint32_t. This cast is defined by C standard. Next transform it to network endian via htonl() and then write it to a file:
int32_t specialInt = 262;
uint32_t encodedInt = htonl((uint32_t) specialInt);
_Static_assert(sizeof encodedInt == 4, "Oops");
fwrite(&encodedInt, 4, 1, myFile);
It could be abbreviated a bit with a compound literal.
fwrite(&(uint32_t) { htonl((uint32_t)specialInt) }, 4, 1, myFile);
Suppose I have two arrays.
uint8_t[SIZE] src = { 0 };
uint32_t[SIZE] dst = { 0 };
uint8_t* srcPtr; // Points to current src value
uint32_t* dstPtr; // Points to current dst value
src holds values that sometimes need to be put into dst. Importantly, the values from src may be 8-bit, 16-bit, or 32-bit, and aren't necessarily properly aligned. So, suppose I wish to use memcpy() like below, to copy a 16-bit value
memcpy(dstPtr, srcPtr, 2);
Will I run into an endianness issue here? This works fine on little-endian systems, since if I want to copy 8, then srcPtr has 08 then 00 the bytes at dstPtr will be 08 00 00 00 and the value will be 8, as expected.
But if I were on a big-endian system, srcPtr would be 00 then 08, and the bytes at dstPtr will be 00 08 00 00 (I presume), which would take on a value of 524288.
What would be an endian-independent way to write this copy?
Will I run into an endianness issue here?
Not necessarily endianness issues per se, but yes, the specific approach you describe will run into issues with integer representation.
This works fine on
little-endian systems, since if I want to copy 8, then srcPtr has 08
then 00 the bytes at dstPtr will be 08 00 00 00 and the value will be
8, as expected.
You seem to be making an assumption there, either
that more bytes of the destination will be modified than you actually copy, or perhaps
that relevant parts of the destination are pre-set to all zero bytes.
But you need to understand that memcpy() will copy exactly the number of bytes requested. No more than that will be read from the specified source, and no more than that will be modified in the destination. In particular, the data types of the objects to which the source and destination pointers point have no effect on the operation of memcpy().
What would be an endian-independent way to write this copy?
The most natural way to do it would be via simple assignment, relying on the compiler to perform the necessary conversion:
*dstPtr = *srcPtr;
However, I take your emphasis on the prospect that the arrays might not aligned as a concern that it may be unsafe to dereference the source and / or destination pointer. That will not, in fact, be the case for pointers to char, but it might be the case for pointers to other types. For cases where you take memcpy as the only safe way to read from the arrays, the most portable method for converting value representations is still to rely on the implementation. For example:
uint8_t* srcPtr = /* ... */;
uint32_t* dstPtr = /* ... */;
uint16_t srcVal;
uint32_t dstVal;
memcpy(&srcVal, srcPtr, sizeof(srcVal));
dstVal = srcVal; // conversion is automatically performed
memcpy(dstPtr, &dstVal, sizeof(dstVal));
Will I run into an endianness issue here?
Yes. You're not copying, you're converting from one format to another (packing several unsigned integers into a single larger unsigned integer).
What would be an endian-independent way to write this copy?
The simple way is to make the conversion explicit, like:
for(int i = 0; i < something; i++) {
dest[i] = (uint32_t)src[i*4] | ((uint32_t)src[i*4+1] << 8) |
((uint32_t)src[i*4+2] << 16) | ((uint32_t)src[i*4+3] << 24);
}
However, for cases where using memcpy() works it's likely to be faster, and this won't change after compiling; so you could do something like:
#ifdef BIG_ENDIAN
for(int i = 0; i < something; i++) {
dest[i] = (uint32_t)src[i*4] | ((uint32_t)src[i*4+1] << 8) |
((uint32_t)src[i*4+2] << 16) | ((uint32_t)src[i*4+3] << 24);
}
#else
memcpy(dest, src, something*4);
#endif
Note: you'd also have to define the BIG_ENDIAN macro when appropriate - e.g. maybe a -D BIG_ENDIAN command line argument when starting the compiler when you know the target architecture needs it.
I'm storing 16-bit values in src which aren't 16-bit-aligned which then need to be put into a 64-bit integer
That adds another problem - some architectures do not allow misaligned accesses. You need to use explicit conversion (read 2 separate uint8_t, not a misaligned uint16_t) to avoid this problem too.
I'm struggling with a problem that requires I perform a hex dump to an object file I've created with the function fopen().
I've declared the necessary integer variable (in HEX) as follows:
//Declare variables
int code = 0xCADE;
The output must be big Endian so I've swapped the bytes in this manner:
//Swap bytes
int swapped = (code>>8) | (code<<8);
I then opened the file for binary output in this manner:
//Open file for binary writing
FILE *dest_file = fopen(filename, "wb");
Afterwards, I write the variable code (which corresponds to a 16 bit word) to the file in the following manner using fwrite():
//Write out first word of header (0xCADE) to file
fwrite(&swapped, sizeof(int), 1, dest_file);
After compiling, running, and performing a hexdump on the file in which the contents have been written to, I observe the following output:
0000000 ca de ca 00
0000004
Basically everything is correct up until the extra "ca 00". I am unsure why that is there and need it removed so that my output is just:
0000000 ca de
0000004
I know the Endianness problem has been addressed extensively on the stack, but after performing a serach, I am unclear as to how to classify this problem. How can I approach this problem so that "ca 00" is removed?
Thanks very much.
EDIT:
I've changed both:
//Declare variables
int code = 0xCADE;
//Swap bytes
int swapped = (code>>8) | (code<<8);
to:
//Declare variables
unsigned short int code = 0xCADE;
//Swap bytes
unsigned short int swapped = (code>>8) | (code<<8);
And I observe:
0000000 ca de 00 00
0000004
Which gets me closer to what I need but there's still that extra "00 00". Any help is appreciated!
You are telling fwrite to write sizeof(int) bytes, which on your system evaluates to 4 bytes (the size of int is 4). If you want to write two bytes, just do:
fwrite(&swapped, 2, 1, dest_file);
To reduce confusion, code that reorders bytes should use bytes (uint8 or char) and not multi-byte types like int.
To swap two bytes:
char bytes[2];
char temp;
fread(bytes, 2, 1, file1);
temp = bytes[0];
bytes[0] = bytes[1];
bytes[1] = temp;
fwrite(bytes, 2, 1, file2);
If you use int, you probably deceive yourself assuming that its size is 2 (while it's most likely 4), and assuming anything about how your system writes int to files, which may be incorrect. While if you work with bytes, there cannot be any surprises - your code does exactly what it looks like it does.
I am curious about little-endian
and I know that computers almost have little-endian method.
So, I praticed through a program and the source is below.
int main(){
int flag = 31337;
char c[10] = "abcde";
int flag2 = 31337;
return 0;
}
when I saw the stack via gdb,
I noticed that there were 0x00007a69 0x00007a69 .... ... ... .. .... ...
0x62610000 0x00656463 .. ...
So, I have two questions.
For one thing,
how can the value of char c[10] be under the flag?
I expected there were the value of flag2 in the top of stack and the value of char c[10] under the flag2 and the value of flag under the char c[10].
like this
7a69
"abcde"
7a69
Second,
I expected the value were stored in the way of little-endian.
As a result, the value of "abcde" was stored '6564636261'
However, the value of 31337 wasn't stored via little-endian.
It was just '7a69'.
I thought it should be '697a'
why doesn't integer type conform little-endian?
There is some confusion in your understanding of endianness, stack and compilers.
First, the locations of variables in the stack may not have anything to do with the code written. The compiler is free to move them around how it wants, unless it is a part of a struct, for example. Usually they try to make as efficient use of memory as possible, so this is needed. For example having char, int, char, int would require 16 bytes (on a 32bit machine), whereas int, int, char, char would require only 12 bytes.
Second, there is no "endianness" in char arrays. They are just that: arrays of values. If you put "abcde" there, the values have to be in that order. If you would use for example UTF16 then endianness would come into play, since then one part of the codeword (not necessarily one character) would require two bytes (on a normal 8-bit machine). These would be stored depending on endianness.
Decimal value 31337 is 0x007a69 in 32bit hexadecimal. If you ask a debugger to show it, it will show it as such whatever the endianness. The only way to see how it is in memory is to dump it as bytes. Then it would be 0x69 0x7a 0x00 0x00 in little endian.
Also, even though little endian is very popular, it's mainly because x86 hardware is popular. Many processors have used big endian (SPARC, PowerPC, MIPS amongst others) order and some (like older ARM processors) could run in either one, depending on the requirements.
There is also a term "network byte order", which actually is big endian. This relates to times before little endian machines became most popular.
Integer byte order is an arbitrary processor design decision. Why for example do you appear to be uncomfortable with little-endian? What makes big-endian a better choice?
Well probably because you are a human used to reading numbers from left-to-right; but the machine hardly cares.
There is in fact a reasonable argument that it is intuitive for the least-significant-byte to be placed in the lowest order address; but again, only from a human intuition point-of-view.
GDB shows you 0x62610000 0x00656463 because it is interpreting data (...abcde...) as 32bit words on a little endian system.
It could be either way, but the reasonable default is to use native endianness.
Data in memory is just a sequence of bytes. If you tell it to show it as a sequence (array) of short ints, it changes what it displays. Many debuggers have advanced memory view features to show memory content in various interpretations, including string, int (hex), int (decimal), float, and many more.
You got a few excellent answers already.
Here is a little code to help you understand how variables are laid out in memory, either using little-endian or big-endian:
#include <stdio.h>
void show_var(char* varname, unsigned char *ptr, size_t size) {
int i;
printf ("%s:\n", varname);
for (i=0; i<size; i++) {
printf("pos %d = %2.2x\n", i, *ptr++);
}
printf("--------\n");
}
int main() {
int flag = 31337;
char c[10] = "abcde";
show_var("flag", (unsigned char*)&flag, sizeof(flag));
show_var("c", (unsigned char*)c, sizeof(c));
}
On my Intel i5 Linux machine it produces:
flag:
pos 0 = 69
pos 1 = 7a
pos 2 = 00
pos 3 = 00
--------
c:
pos 0 = 61
pos 1 = 62
pos 2 = 63
pos 3 = 64
pos 4 = 65
pos 5 = 00
pos 6 = 00
pos 7 = 00
pos 8 = 00
pos 9 = 00
--------
Using the system call write, I am trying to write a number to a file. I want the file pointed by fileid to have 4 as '04'(expected outcome).
unsigned int g = 4;
if (write(fileid, &g, (size_t) sizeof(int) ) == -1)
{
perror("Error"); exit(1);
}
I get the output '0000 0004' in my file. If I put one instead of sizeof(int) I get 00.
Is there a specific type that I missed ?
PS. I have to read this value form the file also, so if there isn't a type I'm not quite sure how I would go about doing that.
If writing 1 byte of g will print 00 or 04 will depend on the architecture. Usually, 32-bit integers will be stored in the memory using little-endian, meaning the less significant byte comes first, therefore 32-bits int 4 is stored as 04 00 00 00 and the first byte is 04.
But this is not always true. Some architectures will store using big-endian, so the byte order in memory is the same as its read in 32-bit hexadecimal 00 00 00 04.
Wikipedia Article.
sizeof(int) will return 4; so actually, the code is writing four bytes.
Change the type of 'g' from
unsigned int
to
unsigned char
... and, change
sizeof(int)
to
sizeof(unsigned char) .. or sizeof(g)
Then you should see that only one byte '04' will be written.
In this circumstance I would recommend using uint8_t, which is defined in <stdint.h>. On basically all systems you will ever encounter, this is a typedef for unsigned char, but using this name makes it clearer that the value in the variable is being treated as a number, not a character.
uint8_t g = 4;
if (write(fileid, &g, 1) != 1) {
perror("write");
exit(1);
}
(sizeof(char) == 1 by definition, and therefore so is sizeof(uint8_t).)
To understand why your original code did not behave as you expected, read up on endianness.
If you want to save only one byte, it will be more appropriate to create a variable that is of size one byte and save it using write.
unsigned int g = 4;
unsinged char c = (unsigned char)g;
if (write(fileid, &c, 1 ) == -1)
{
perror("Error"); exit(1);
}
If you lose any data, it will be in the program and not in/out of files.