Big-endian / little-endian representation of 0x12345678 - arm

mem32[&100] = &12345678. If big-endian addressing is used, what is mem8[&101]?
I am getting the answer 56, and here is my reasoning:
In my understanding, in a big-endian system the most significant byte is stored first in memory, i.e.:
0x100  0x101  0x102  0x103
 78     56     34     12
But the "correct" answer is 34.
Could someone explain why, or tell me whether that answer is wrong?
EDIT: I realised my mistake. For a moment I forgot the number at the left end is the most significant!

For the 32 bit value 0x12345678, 12 is the most significant byte, and this comes first on a big endian system, followed by 34, 56, 78.
Big endian:
0x100 12
0x101 34
0x102 56
0x103 78
Little endian:
0x100 78
0x101 56
0x102 34
0x103 12
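If you want to see which layout your own machine uses, here is a minimal C sketch; inspecting a word through an unsigned char pointer is well-defined:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 0x12345678;
    unsigned char *bytes = (unsigned char *)&value;  /* byte-wise view of the word */

    for (int i = 0; i < 4; i++)
        printf("offset %d: %02x\n", i, bytes[i]);    /* little-endian prints 78 56 34 12 */
    return 0;
}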

You got it the wrong way round:
0x100  0x101  0x102  0x103
 78     56     34     12
This is 0x78563412 in BIG endian, or 0x12345678 in LITTLE endian.
The 32-bit value 0x12345678 in big endian is 12 34 56 78.

Related

How does little-endian processor save a number in their memory in C?

short a = 0x1234;
char *p = (char *)&a;   /* cast needed: &a has type short* */
printf("%x%x", *p, *(p+1));
output
3412
I'm curious about how memory stores the value 0x1234. I first thought 0x1234 would be saved as 0x3412 (reversed per byte) in memory, but according to LSB 0 bit numbering it seemed that memory would save 0x1234 as 0x2c48 (reversed per bit), and that a little-endian CPU would read 0x2c48 back as 0x1234. Is this correct?
On a little-endian system a multi-byte word is stored in reverse byte order.
So e.g. the 16-bit word 0x1234 will be stored with 0x34 in the low address, and 0x12 in the high address.
As a byte array it would be
uint8_t word[2] = { 0x34, 0x12 };
Bits of a byte are never reversed.
A 32-bit (four byte) word like 0x12345678 would be stored in the order 0x78, 0x56, 0x34 and 0x12 (low to high address).
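To verify this on your own machine, a minimal sketch using memcpy (the printed output assumes a little-endian CPU):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    uint16_t a = 0x1234;
    uint8_t raw[sizeof a];

    memcpy(raw, &a, sizeof a);              /* copy the word's bytes as stored in memory */
    printf("%02x %02x\n", raw[0], raw[1]);  /* little-endian: prints "34 12" */
    return 0;
}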
Little-endian processors store the least significant byte first, followed by the more significant bytes.
e.g. short int a = 0x1234 will be stored as
|------|------|
| 0x34 | 0x12 |
|------|------|
A 32 bit integer unsigned int b = 0x12345678 is stored as
|------|------|------|------|
| 0x78 | 0x56 | 0x34 | 0x12 |
|------|------|------|------|
It is impossible to find out which "order" your CPU stores bits in, because you cannot address individual bits.
Little-endian means the CPU stores the lowest byte first. But what does first mean? It means with the lowest address. Bits don't have addresses, so there's no way to tell which one is first. End of story.
Here's how we tell whether a CPU is little-endian with bytes:
LITTLE ENDIAN                      BIG ENDIAN

words:       0x1234                words:       0x1234
        /---------------\                 /---------------\
bytes:    0x34     0x12            bytes:    0x34     0x12
        /------\ /------\                 /------\ /------\
        +------+ +------+                 +------+ +------+
        | 0x34 | | 0x12 |                 | 0x34 | | 0x12 |
        +------+ +------+                 +------+ +------+
Address:  5000     5001            Address:  5001     5000
I have written the bytes in the same order on both sides, but the address is different.
That's how you tell the difference between little-endian and big-endian. You write a word to address 5000, then you check whether the byte with address 5000 contains 0x34 or 0x12.
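That test translates directly into C; a minimal sketch:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t word = 0x1234;
    unsigned char *first = (unsigned char *)&word;  /* byte at the lowest address */

    if (*first == 0x34)
        printf("little-endian\n");
    else
        printf("big-endian\n");
    return 0;
}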
Now, with bits you have this situation:
LITTLE BIT-ENDIAN                  BIG BIT-ENDIAN

bytes:       0x34                  bytes:       0x34
      /---------------\                  /---------------\
      +-+-+-+-+-+-+-+-+                  +-+-+-+-+-+-+-+-+
bits: |0|0|1|0|1|1|0|0|            bits: |0|0|1|0|1|1|0|0|
      +-+-+-+-+-+-+-+-+                  +-+-+-+-+-+-+-+-+
address:     5000                  address:     5000
I have written the bits in the same order on both sides, but the address is the same. Only the byte has an address. The individual bits don't have addresses. If they had addresses, you could ask for, say, bit 5000.4 and see whether it's 1 or a 0 to figure out whether bits are little-endian or big-endian, but they don't, so you can't.
If you store a byte and then read it back, you will get the same byte value, no matter whether your system has little-endian or big-endian bits.

Printing out stack gives weird positioning of

I'm currently trying to understand format string vulnerabilities in C, but to get there, I have to understand some behaviour of the memory stack that is weird, at least to me.
I have a program
#include <string.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
    char buffer[200];
    char key[] = "secret";
    printf("Location of key: %p\n", key);
    printf("Location of buffer: %p\n", &buffer);
    strcpy(buffer, argv[1]);
    printf(buffer);
    printf("\n");
    return 0;
}
which I call with
./form AAAA.BBBE.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
What I would expect is to get something like
... .41414141.42424245. ...
but I get
... .41414141.4242422e.30252e45. ... (there is some character in between B and E).
What is happening here?
I disabled ASLR and stack protection, and compiled with the -m32 flag.
I think your output is just fine. x86 is little-endian: the least significant byte of a number has the smaller address in memory, so 1000 (0x3E8) is stored as E8 03, not 03 E8 (that would be big-endian).
Let's assume that the compiler passes all arguments to printf on the stack, and that variadic arguments are laid out from the top of the stack onward (on x86 that means from lower addresses to higher addresses).
So, before calling printf our stack would look like this:
<return address><something>AAAA.BBBE.%08x.%<something>
^ - head of the stack
Or, if we spell each byte in hex:
<return address><something>414141412e424242452e253038782e25<something>
^ - head of the stack      A A A A . B B B E . % 0 8 x . %
Then you ask printf to take a lot of unsigned ints from the stack (32-bit, presumably) and print them in hexadecimal, separated by dots. It skips <return address> and some other details of the stack frame and starts from some point in the stack before buffer (because buffer is in the parent's stack frame). Suppose that at some point it takes the following chunk as a 4-byte int:
<return address><something>414141412e424242452e253038782e25<something>
^ - head of the stack      A A A A . B B B E . % 0 8 x . %
                             ^^^^^^^^
That is, our int is represented in memory by four bytes. Their values are, starting from the byte with the smallest address: 41 41 41 2e. As x86 is little-endian, 2e is the most significant byte, which means this sequence is interpreted as 0x2e414141 and printed as such.
Now, if we look at your output:
41414141.4242422e.30252e45
We see that there are three ints: 0x41414141 (stored as 41 41 41 41 in memory), 0x4242422e (stored as 2e 42 42 42 in memory because the least significant byte has the smallest address) and 0x30252e45 (stored as 45 2e 25 30 in memory). That is, in that case printf read the following bytes:
number one |number two |number three|
41 41 41 41|2e 42 42 42|45 2e 25 30 |
A A A A |. B B B |E . % 0 |
Which looks perfectly correct to me - it's beginning of buffer as expected.
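If you want to reproduce that interpretation step by hand, here is a small sketch that assembles four memory bytes the way a little-endian load does, using the bytes of "number two" above:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t mem[4] = { 0x2e, 0x42, 0x42, 0x42 };  /* lowest address first */

    /* little-endian load: the byte at the lowest address is least significant */
    uint32_t value = (uint32_t)mem[0]
                   | ((uint32_t)mem[1] << 8)
                   | ((uint32_t)mem[2] << 16)
                   | ((uint32_t)mem[3] << 24);

    printf("%08x\n", value);  /* prints 4242422e */
    return 0;
}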
This is essentially what you're outputting with the %08x formats, and you're on a little-endian machine:
41 41 41 41 2e 42 42 42 45 2e 25 30 38 78 2e 25 30 38 78 2e 25 30 38 78 2e
The first four bytes are all 41s, and flipped around they are still all 41s.
The next four bytes are 2e424242, which become 4242422e.
Then, 452e2530 becomes 30252e45.
It's easier to figure this out if you look at buffer in a memory window in your debugger.
By the way, you can print the address of buffer like this (without the &):
printf("Location of buffer: %p\n", buffer);
You're passing AAAA.BBBE.%08x... to printf as the format string. printf therefore expects an additional unsigned int argument for every %08x, but since you don't provide any, the behaviour is undefined.
You can read in the C Draft Standard (n1256):
If there are insufficient arguments for the format, the behavior is undefined.
You're getting hexadecimal output of whatever printf happens to find, which in your case is data from the stack.
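The remedy for the underlying vulnerability, for completeness, is to pass user input as data, never as the format string. Replacing the printf(buffer); line in the program above with either of these is safe:

printf("%s", buffer);   /* buffer is an argument here, not the format */
fputs(buffer, stdout);  /* avoids format processing entirely */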

Need clarification about unsigned char * in C

Given the code:
...
int x = 123;
...
unsigned char * xx = (char *) & x;
...
I have xx[0] = 123, xx[1] = 0, xx[2] = 0, etc.
Can someone explain what is happening here? I don't have a great understanding of pointers in general, so the simpler the better.
Thanks
You're accessing the bytes (chars) of a little-endian int in sequence. The number 123 in an int on a little-endian system will usually be stored as {123,0,0,0}. If your number had been 783 (256 * 3 + 15), it would be stored as {15,3,0,0}.
I'll try to explain all the pieces in ASCII pictures.
int x = 123;
Here, x is the symbol representing a location of type int. Type int is 4 bytes on most modern machines, whether 32-bit or 64-bit, though its exact size is compiler dependent. For this discussion, let's assume 32 bits (4 bytes).
Memory on x86 is managed little-endian, meaning that if a number requires multiple bytes (its value exceeds what one byte can hold: > 255 unsigned, or > 127 signed), the number is stored with the least significant byte at the lowest address. If your number were the hexadecimal 0x12345678, it would be stored as:
x: 78 <-- address that `x` represents
56 <-- x addr + 1 byte
34 <-- x addr + 2 bytes
12 <-- x addr + 3 bytes
Your number, decimal 123, is 7B hex, or 0000007B (all 4 bytes shown), so would look like:
x: 7B <-- address that `x` represents
00 <-- x addr + 1 byte
00 <-- x addr + 2 bytes
00 <-- x addr + 3 bytes
To make this clearer, let's make up a memory address for x, say, 0x00001000. Then the byte locations would have the following values:
Address Value
x: 00001000 7B
00001001 00
00001002 00
00001003 00
Now you have:
unsigned char * xx = (char *) & x;
Which defines a pointer to an unsigned char (an 8-bit, or 1-byte, unsigned value ranging 0-255) whose value is the address of your integer x. In other words, the value contained in xx is 0x00001000.
xx: 00
10
00
00
The ampersand (&) indicates you want the address of x. And, technically, the declaration isn't correct. It really should be cast properly as:
unsigned char * xx = (unsigned char *) & x;
So now you have a pointer, or address, stored in the variable xx. That address points to x:
Address Value
x: 00001000 7B <-- xx points HERE (xx has the value 0x00001000)
00001001 00
00001002 00
00001003 00
The value of xx[0] is what xx points to, offset by 0 bytes. The offsets are in bytes because the type of xx is a pointer to unsigned char, which is one byte, so each offset from xx is by the size of that type. The value of xx[1] is just one byte higher in memory, which is the value 00. And so on. Pictorially:
Address Value
x: 00001000 7B <-- xx[0], or the value at `xx` + 0
00001001 00 <-- xx[1], or the value at `xx` + 1
00001002 00 <-- xx[2], or the value at `xx` + 2
00001003 00 <-- xx[3], or the value at `xx` + 3
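For completeness, a runnable sketch of exactly this inspection (the printed values assume a little-endian machine):

#include <stdio.h>

int main(void) {
    int x = 123;
    unsigned char *xx = (unsigned char *)&x;  /* proper cast, as noted above */

    for (size_t i = 0; i < sizeof x; i++)
        printf("xx[%zu] = %u\n", i, xx[i]);   /* little-endian: 123 0 0 0 */
    return 0;
}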
Yeah, you're doing something you shouldn't be doing...
That said... one part of the result is that you're working on a little-endian processor. The int x = 123; statement allocates 4 bytes on the stack and initializes them with the value 123. Since the machine is little-endian, the memory looks like 123, 0, 0, 0; if it were big-endian, it would be 0, 0, 0, 123. Your char pointer points to the first byte of memory where x is stored.
unsigned char * xx = (char *) & x;
You take the address of x, tell the compiler it is a pointer to char, and assign it to xx, which is a pointer to unsigned char. The cast to (char *) just keeps the compiler happy (as noted above, it should really be (unsigned char *)).
Now if you print xx, or inspect it, what you see depends on the machine: the so-called little-endian or big-endian way of storing integers. x86 is little-endian and stores the bytes of an integer in reverse. So storing 0x00000123 stores the bytes 0x23 0x01 0x00 0x00, which is what you see when inspecting the location xx points to byte by byte.

What does it mean "bytes numbered from 0 (LSB) to 3 (MSB)"?

I should extract byte n from word x.
Example: getByte(0x12345678,1) = 0x56.
And it is written that bytes are numbered from 0 (LSB) to 3 (MSB), the meaning of which I can't understand.
Thank you.
Consider your 32 bit word (0x12345678) as 4 bytes:
Word  : 12 34 56 78 (hex)
Byte #:  3  2  1  0
        MSB <--- LSB
MSB = Most Significant Byte
LSB = Least Significant Byte
It means that you are supposed to consider x as composed of bytes, $x = \sum_{n=0}^{3} b_n \cdot 256^n$, and given x you are supposed to compute $b_n$. That is, $b_0$ is the least significant byte and $b_3$ is the most significant byte.
MSB and LSB mean Most Significant Byte and Least Significant Byte, respectively. A byte is an 8-bit number that can be directly represented by two hexadecimal digits. So the number 0x12345678 is a word containing 4 bytes: 12 34 56 78. The rightmost is the LSB and the leftmost is the MSB, so byte 1 is the SECOND byte from right to left.
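For reference, one way to implement such a getByte with shifts and masks; this is a sketch, and the exact prototype may differ in your assignment:

#include <stdio.h>

/* Extract byte n (0 = LSB, 3 = MSB) from a 32-bit word. */
unsigned getByte(unsigned x, int n) {
    return (x >> (n * 8)) & 0xff;  /* move byte n to the bottom, mask the rest */
}

int main(void) {
    printf("0x%02x\n", getByte(0x12345678, 1));  /* prints 0x56 */
    return 0;
}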

Compress a struct into a binary file? [C]

This is part of my homework that I'm having difficulties solving.
I have a simple structure:
typedef struct Client {
    char *lname;
    unsigned int id;
    unsigned int car_id;
} Client;
And the exercise is:
1. Create a text file named after the company name and branch number, with a .txt extension. The file contains all the clients' details.
2. The file you created in exercise 1 will be compressed; as a result, a binary file with a .cmpr extension is created.
I don't really have an idea how to implement 2.
I remember from the lectures that the professor said we have to use "all" of each variable, with the bitwise operators (<<, >>, |, &, ~), but I don't know how to use them.
I'm using C on Ubuntu, with GCC and Eclipse.
I'd be glad for some help. Thanks!
Let's say the file from step 1 looks like:
user1798362
2324
462345
where the three fields were simply printed on three lines. Note that the above is the text/readable (i.e. ASCII) representation of that file.
Looking at the contents of this file in hex(adecimal) representation we get (with the ASCII character printed below each byte value):
75 73 65 72 31 37 39 38 33 36 32 0a 32 33 32 34 0a 34 36 32 33 34 35 0a
u  s  e  r  1  7  9  8  3  6  2  nl 2  3  2  4  nl 4  6  2  3  4  5  nl
here nl is of course the newline character. You can count that there are 24 bytes.
In step 2 you have to invent another format that saves as many bits as possible. The simplest way to do this is to compress each of the three fields individually.
Similar to how the text format uses a nl to mark the end of a field, you also need a way to define where a binary field begins and ends. A common way is to put a length in front of the binary field's data. As a first step we could replace the nl's with a length and get:
58 75 73 65 72 31 37 39 38 33 36 32 20 32 33 32 34 30 34 36 32 33 34 35
-- u  s  e  r  1  7  9  8  3  6  2  -- 2  3  2  4  -- 4  6  2  3  4  5
For now we simply take a whole byte for the length in bits. Note that 58 hex is 88 decimal (i.e. 11 characters * 8 bits), the bit length of lname; 20 hex is 32 = 4 * 8, and 30 hex is 48 = 6 * 8. This does not compress anything, as it's still 24 bytes in total. But we already have a binary format, because 58, 20 and 30 now have a special meaning.
The next step would be to compress each field. This is where it gets tricky. The lname field consists of ASCII characters. In ASCII only 7 of the 8 bits are needed/used (see any ASCII table). For example the letter u in binary is 01110101. We can safely chop off the leftmost bit, which is always 0. This yields 1110101. The same can be done for all the characters. So you'll end up with 11 7-bit values -> 77 bits.
These 77 bits now must fit into 8-bit bytes. Here are the first 4 bytes, user, in binary representation, before chopping the leftmost bit off:
01110101 01110011 01100101 01110010
Chopping off a bit in C is done by shifting the byte (i.e. unsigned char) to the left with:
unsigned char byte = lname[0];
byte = byte << 1;
When you do this for all characters you get:
1110101- 1110011- 1100101- 1110010-
Here I use - to indicate the bits in these bytes that are now available to be filled; they became available by shifting all bits one place to the left. You now use one or more bits from the right side of the next byte to fill these - gaps. Doing this for these four bytes gives:
11101011 11001111 00101111 0010----
So now there's a gap of 4 bits that should be filled with the bits of the next character, 1, etc.
Filling up these gaps is done by using the binary operators in C which you mention. We already use the shift left <<. To combine 1110101- and 1110011- for example we do:
unsigned char *name;      // name MUST be unsigned to avoid problems with binary operators.
/* <allocate memory for name and read it from the text file> */
unsigned char bytes[10];  // 10 is just a random size that gives us enough space.

name[0] = name[0] << 1;   // We shift to the left in place here, so `name` is overwritten.
name[1] = name[1] << 1;   // idem.
bytes[0] = name[0] | (name[1] >> 7);
bytes[1] = name[1] << 1;
With name[1] >> 7 we have 1110011- >> 7, which gives 00000001: the leftmost bit moved to the rightmost position. With the bitwise OR operator | we then 'add' this bit to 1110101-, resulting in 11101011.
You have to do things like this in a loop to get all the bits in the correct bytes.
The new length of this name field is 11 * 7 = 77 bits, so we've shaved off a massive 11 bits :-) Note that with a one-byte length we assume the lname field will never be longer than 255 / 7 = 36 characters.
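As a sketch of what such a loop could look like: pack7 is a hypothetical helper name, and it assumes out is large enough.

#include <string.h>

/* Pack the low 7 bits of each of the `len` characters in `name` into `out`,
   most significant of the 7 bits first. Returns the number of bits written.
   Hypothetical helper; no overflow checking. */
unsigned pack7(const unsigned char *name, unsigned len, unsigned char *out)
{
    unsigned bitpos = 0;                  /* next free bit in `out` */
    memset(out, 0, (len * 7 + 7) / 8);    /* clear the bytes we will fill */

    for (unsigned i = 0; i < len; i++) {
        unsigned char c = name[i] & 0x7f; /* keep only the 7 significant bits */
        for (int b = 6; b >= 0; b--) {    /* emit them MSB-first */
            if (c & (1u << b))
                out[bitpos / 8] |= (unsigned char)(0x80u >> (bitpos % 8));
            bitpos++;
        }
    }
    return bitpos;                        /* for "user1798362": 77 */
}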
As with the bytes above, you can then coalesce the second length against the final bits of the lname field.
To compress the numbers you first read them in with fscanf(file, "%u", ...) into an unsigned int. There will be many 0s on the left side of this 4-byte unsigned int. The first number, 2324, for example is (shown in chunks of 4 bits only for readability):
0000 0000 0000 0000 0000 1001 0001 0100
which has 20 unused bits at the left.
You need to get rid of these. Take 32 minus the number of zeros on the left, and you get the bit length of this number. Add this length to the bytes array by coalescing its bits against those of the previous field, then add only the significant bits of the number. That would be:
1001 0001 0100
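Counting those significant bits is a small loop; bit_length is a hypothetical helper name:

/* Number of significant bits in v, i.e. 32 minus the leading zeros. */
unsigned bit_length(unsigned v)
{
    unsigned n = 0;
    while (v != 0) {   /* shift right until nothing is left */
        v >>= 1;
        n++;
    }
    return n;          /* bit_length(2324) == 12 */
}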
In C, when working with the bits of an 'int' (but also 'short', 'long', ... any variable/number larger than 1 byte), you must take byte-order or endianness into account.
When you do the above step twice, for both numbers, you're done. You then have a bytes array you can write to a file. Of course you must have kept track of where you were writing during the steps above, so you know the number of bytes. Note that in most cases there will be a few bits in the last byte that are not filled with data, but that doesn't hurt; it's simply unavoidable waste, a consequence of files being stored in chunks of 8 bits = 1 byte minimally.
When reading the binary file back, you get the reverse process. You read in an unsigned char bytes array. You then know that the first byte (i.e. bytes[0]) contains the bit length of the name field. You then fill in the bytes of lname byte by byte, by shifting and masking, etc.
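As a sketch of that reverse process, here is a hypothetical unpack7 that undoes the packing shown earlier, where start is the bit offset at which the name's data begins and nbits is the length read from the length byte:

/* Unpack nbits bits (7 per character) from `in`, starting at bit `start`,
   into `name`. Returns the number of characters produced.
   Hypothetical helper; assumes `name` is large enough. */
unsigned unpack7(const unsigned char *in, unsigned start, unsigned nbits,
                 unsigned char *name)
{
    unsigned chars = nbits / 7;
    for (unsigned i = 0; i < chars; i++) {
        unsigned char c = 0;
        for (int b = 0; b < 7; b++) {            /* read 7 bits, MSB-first */
            unsigned bit = start + i * 7 + b;
            c = (unsigned char)(c << 1);
            if (in[bit / 8] & (0x80u >> (bit % 8)))
                c |= 1;
        }
        name[i] = c;                             /* top bit restored as 0 */
    }
    return chars;
}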
Good luck!
