Adding one byte to a hexadecimal number - c

I seem to have confused myself so much that this doesn't make sense anymore.
1 byte = 8 bits.
So if I have a memory location such as
0xdeadbeef
3735928559 (base10)
1101 1110 1010 1101 1011 1110 1110 1111
Now if I add one byte to 0xdeadbeef, what is the binary sequence I'm adding? Is it 1000? If I add 1 bit, I get 0xdeadbee0, and if I add 1 bit 8 times, I get 0xdeadbef7. Which is correct?
I remember from microprocessors the counter incremented in PC += 4, which gives 0xdeadbef3, so I'm not sure which is the right answer.

What I understand from your question is that you are confused about adding a bit versus adding a byte to the counter.
Since memory addresses are measured in bytes (in programming languages), any arithmetic on them is done in bytes.
Adding 1 to the counter moves it to the byte right after the base address: adding 1 to 0xdeadbeef gives 0xdeadbef0.

I'm referring to memory locations.
So 0xdeadbeef is an address. If you increment it by 1 byte, you simply add 1 to it,
i.e. 0xdeadbeef + 1 = 0xdeadbef0.
Concluding: adding 1 to the address advances the pointer by 1 byte because memory is accessed at byte granularity and addresses count bytes. What you actually added is the number 1 (i.e. 0x00000001). If you want to advance by 4 bytes, you add 4 to the address, because memory is addressed in units of bytes.

What is "adding one byte"? A byte needs to have a value.
You need to add a number to a number, not a size (8 bits) to a number.
If the byte you are adding is 0x08 (1000 in binary), then
0xdeadbeef + 0x08 = 0xdeadbef7.

If, as I gather, that thing is a memory address, which is measured in "number of bytes after the base (zero) address", then moving one byte forward in memory increments the address by one.
Think of it like this: if you measure the distance from the start of the road in meters, moving forward one meter (your unit of measurement) increases the distance from the start by 1.
Be careful though: pointers in C (and C++) work in a slightly confusing (at first) way: if your pointer is of type T *, each arithmetic operation on it is performed in units of T, so the underlying memory address moves in steps of sizeof(T).
For example, if you have:
#include <stdio.h>

int main(void)
{
    int a[2];
    int *ptr = a;
    int *ptr2 = ptr + 1;
    printf("Delta in ints: %d\n", (int)(ptr2 - ptr));    // will print 1
    char *cptr = (char *)ptr;
    char *cptr2 = (char *)ptr2;
    printf("Delta in chars: %d\n", (int)(cptr2 - cptr)); // will print sizeof(int), typically 4
    return 0;
}

In computers, each memory location holds 8 bits (1 byte) of data. So adding 1 to a memory address simply moves it to the next byte.

When you add 1 to a word-sized address, the 1 is interpreted as a word-sized string of bits with a decimal value of 1.
In your case, 0xdeadbeef is 32 bits, or 4 bytes. So adding 1 is doing 0xdeadbeef + 0x00000001, and adding 300 is 0xdeadbeef + 0x0000012C.
In case you're wondering whether you can add a value wider than the word size to the address: you can't, because the address itself is only one word wide, so the result has to fit within (and wraps around at) the word size.
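
To make the arithmetic concrete, here is a minimal sketch (my own illustration, not from the answers above) that treats the address as a plain 32-bit number and prints the result of adding 1, 4 and 8:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint32_t addr = 0xdeadbeef;                        /* treat the address as a plain 32-bit number */
    printf("addr + 1 = 0x%08" PRIx32 "\n", addr + 1);  /* 0xdeadbef0: one byte further on */
    printf("addr + 4 = 0x%08" PRIx32 "\n", addr + 4);  /* 0xdeadbef3: one 4-byte word further (PC += 4) */
    printf("addr + 8 = 0x%08" PRIx32 "\n", addr + 8);  /* 0xdeadbef7: eight bytes further on */
    return 0;
}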

Related

Pointers in C with typecasting

#include <stdio.h>
int main()
{
    int a;
    char *x;
    x = (char *) &a;
    a = 512;
    x[0] = 1;
    x[1] = 2;
    printf("%d\n", a);
    return 0;
}
I'm not able to grasp how the output is 513, or why it's even machine-dependent. I can sense that typecasting is playing a major role, but what is happening behind the scenes? Can someone help me visualise this problem?
The int a is stored in memory as 4 bytes. The number 512 is represented on your machine as:
0 2 0 0
When you assign to x[0] and x[1], it changes this to:
1 2 0 0
which is the number 513.
This is machine-dependent, because the order of bytes in a multi-byte number is not specified by the C language.
For simplicity, assume the following:
size of int is 4 bytes
size of any pointer type is 8 bytes
size of char is 1 byte
In line 3 (x = (char *) &a;), x refers to a as a char: x thinks it is pointing to a char (it has no idea that a is actually an int).
Line 4 (a = 512;) is meant to confuse you. Don't be.
Line 5 (x[0] = 1;): since x thinks it is pointing to a char, this changes just the first byte of a.
Line 6 (x[1] = 2;): once again, x changes just the second byte of a.
Note that the bytes written in lines 5 and 6 overwrite parts of the value assigned in line 4.
The value of a is now 0...0000 0010 0000 0001 (513).
Now, when we print a as an int, all 4 bytes are considered, as expected.
Let me try to break this down for you in addition to the previous answers:
#include <stdio.h>
int main()
{
    int a;              // declares an integer called a
    char *x;            // declares a pointer to a character called x
    x = (char *) &a;    // points x to the first byte of a
    a = 512;            // writes 512 to the int variable
    x[0] = 1;           // writes 1 to the first byte
    x[1] = 2;           // writes 2 to the second byte
    printf("%d\n", a);  // prints the integer
    return 0;
}
Note that I wrote first byte and second byte. Depending on the byte order of your platform and the size of an integer you might not get the same results.
Let's look at the memory for 32-bit (4-byte) integers:
Little endian systems
first byte | second byte | third byte | fourth byte
0x00         0x02          0x00         0x00
Now assigning 1 to the first byte and 2 to the second one leaves us with this:
first byte | second byte | third byte | fourth byte
0x01         0x02          0x00         0x00
Notice that the first byte gets changed to 0x01 while the second was already 0x02.
This new number in memory is equivalent to 513 on little endian systems.
Big endian systems
Let's look at what would happen if you were trying this on a big endian platform:
first byte | second byte | third byte | fourth byte
0x00         0x00          0x02         0x00
This time assigning 1 to the first byte and 2 to the second one leaves us with this:
first byte | second byte | third byte | fourth byte
0x01         0x02          0x02         0x00
Which is equivalent to 16,908,800 as an integer.
I'm not able to grasp how the output is 513 or why it's even machine-dependent
The output is implementation-defined. It depends on the order of bytes in CPU's interpretation of integers, commonly known as endianness.
I can sense that typecasting is playing a major role
The code reinterprets the value of a, which is an int, as an array of bytes. It writes to the two initial bytes, which is guaranteed to be in range because an int is at least two bytes in size.
Can someone help me visualise this problem?
An int consists of multiple bytes. They can be addressed as one unit that represents an integer, but they can also be addressed as a collection of bytes. The value of an int depends on the number of bytes that you set, and on the order of these bytes in CPU's interpretation of integers.
It looks like your system stores the least significant byte at a lowest address, so the result of storing 1 and 2 at offsets zero and one produces this layout:
Byte 0 Byte 1 Byte 2 Byte 3
------ ------ ------ ------
1 2 0 0
Integer value can be computed as follows:
1 + 2*256 + 0*65536 + 0*16777216
By taking x, which is a char *, and pointing it to the address of a, which is an int, you can use x to modify the individual bytes that represent a.
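As a cross-check, here is a small sketch of that arithmetic (my own illustration, assuming the 4-byte little-endian layout shown above):

#include <stdio.h>

int main(void)
{
    unsigned char bytes[4] = {1, 2, 0, 0};   /* the byte layout after x[0] = 1; x[1] = 2; */
    /* least significant byte first, so each byte is weighted by a growing power of 256 */
    unsigned value = bytes[0] + bytes[1] * 256u + bytes[2] * 65536u + bytes[3] * 16777216u;
    printf("%u\n", value);                   /* prints 513 */
    return 0;
}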
The output you're seeing suggests that an int is stored in little-endian format, meaning the least significant byte comes first. This can change, however, if you run this code on a different system (e.g. a Sun SPARC machine, which is big-endian).
You first set a to 512. In hex, that's 0x200. So the memory for a, assuming a 32 bit int in little endian format, is laid out as follows:
-----------------------------
| 0x00 | 0x02 | 0x00 | 0x00 |
-----------------------------
Next you set x[0] to 1, which updates the first byte in the representation of a:
-----------------------------
| 0x01 | 0x02 | 0x00 | 0x00 |
-----------------------------
Then you set x[1] to 2, which updates the second byte in the representation of a (in this case leaving it unchanged, since that byte was already 0x02):
-----------------------------
| 0x01 | 0x02 | 0x00 | 0x00 |
-----------------------------
Now a has a value of 0x201, which in decimal is 513.
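If you want to see which layout your own machine uses, a tiny probe like the following (my own sketch, not part of the answers) dumps the bytes of a the same way the question's code views them:

#include <stdio.h>

int main(void)
{
    int a = 512;
    unsigned char *x = (unsigned char *)&a;  /* view a as raw bytes */
    for (size_t i = 0; i < sizeof a; i++)
        printf("%02x ", (unsigned)x[i]);     /* 00 02 00 00 on little endian, 00 00 02 00 on big endian (4-byte int) */
    printf("\n");
    return 0;
}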

How to determine if a byte is null in a word

I am reading the "strlen" source code from the glibc, and the trick developers found to speed it up is to read n bytes where n is the size of a long word, instead of reading 1 byte at each iteration.
I will assume that a long word has 4 bytes.
The tricky part is that every "chunk" of 4 bytes the function reads can contain a null byte, so at each iteration the function has to check whether the chunk contains one. They do it like this:
if (((longword - lomagic) & ~longword & himagic) != 0) { /* null byte found */ }
where longword is the chunk of data and himagic and lomagic are magic values defined as:
himagic = 0x80808080L;
lomagic = 0x01010101L;
Here is the comment for those values:
/* Bits 31, 24, 16, and 8 of this number are zero. Call these bits
the "holes." Note that there is a hole just to the left of
each byte, with an extra at the end:
bits: 01111110 11111110 11111110 11111111
bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
The 1-bits make sure that carries propagate to the next 0-bit.
The 0-bits provide holes for carries to fall into. */
How does this trick of finding the null byte work?
From the famous "Bit Twiddling Hacks" page by Sean Eron Anderson, here is a description of what is currently used in the glibc implementation you're referring to (Anderson calls the algorithm hasless(v, 1)):
The subexpression (v - 0x01010101UL), evaluates to a high bit set in
any byte whenever the corresponding byte in v is zero or greater than
0x80. The sub-expression ~v & 0x80808080UL evaluates to high bits set
in bytes where the byte of v doesn't have its high bit set (so the
byte was less than 0x80). Finally, by ANDing these two sub-expressions
the result is the high bits set where the bytes in v were zero, since
the high bits set due to a value greater than 0x80 in the first
sub-expression are masked off by the second.
It appears that the comment in the glibc source is confusing because it no longer matches what the code actually does: it describes an earlier implementation, the one Anderson presents just before introducing the hasless(v, 1) algorithm.
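To see the trick in isolation, here is a minimal sketch (my own, assuming 32-bit words; the helper name has_zero_byte is made up for this example):

#include <stdio.h>
#include <stdint.h>

/* Returns nonzero if any of the four bytes of v is 0x00. */
static int has_zero_byte(uint32_t v)
{
    const uint32_t himagic = 0x80808080UL;
    const uint32_t lomagic = 0x01010101UL;
    return ((v - lomagic) & ~v & himagic) != 0;
}

int main(void)
{
    printf("%d\n", has_zero_byte(0x41424344));  /* "ABCD": prints 0, no zero byte */
    printf("%d\n", has_zero_byte(0x41420044));  /* one byte is 0x00: prints 1 */
    return 0;
}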

converting little endian hex to big endian decimal in C

I am trying to understand and implement a simple file system based on FAT12. I am currently looking at the following snippet of code, and it's driving me crazy:
int getTotalSize(char *mmap)
{
    int *tmp1 = malloc(sizeof(int));
    int *tmp2 = malloc(sizeof(int));
    int retVal;
    *tmp1 = mmap[19];
    *tmp2 = mmap[20];
    printf("%d and %d read\n", *tmp1, *tmp2);
    retVal = *tmp1 + ((*tmp2) << 8);
    free(tmp1);
    free(tmp2);
    return retVal;
}
From what I've read so far, the FAT12 format stores integers in little endian format,
and the code above is getting the size of the file system, which is stored in the 19th and 20th bytes of the boot sector.
However, I don't understand why retVal = *tmp1 + ((*tmp2) << 8); works. Is the bitwise << 8 converting the second byte to decimal, or to big endian format?
Why is it only applied to the second byte and not the first one?
The bytes in question are (in little endian format):
40 0B
I tried converting them manually by switching the order first to
0B 40
and then converting from hex to decimal, and I get the right output. I just don't understand how adding the first byte to the shifted second byte does the same thing?
Thanks
The use of malloc() here is seriously facepalm-inducing. Utterly unnecessary, and a serious "code smell" (makes me doubt the overall quality of the code). Also, mmap clearly should be unsigned char (or, even better, uint8_t).
That said, the code you're asking about is pretty straight-forward.
Given two byte-sized values a and b, there are two ways of combining them into a 16-bit value (which is what the code is doing): you can either consider a to be the least-significant byte, or b.
Using boxes, the 16-bit value can look either like this:
+---+---+
| a | b |
+---+---+
or like this, if you instead consider b to be the most significant byte:
+---+---+
| b | a |
+---+---+
The way to combine the lsb and the msb into a 16-bit value is simply:
result = (msb * 256) + lsb;
UPDATE: The 256 comes from the fact that that's the "worth" of each successively more significant byte in a multibyte number. Compare it to the role of 10 in a decimal number (to combine two single-digit decimal numbers c and d you would use result = 10 * c + d).
Consider msb = 0x01 and lsb = 0x00, then the above would be:
result = 0x1 * 256 + 0 = 256 = 0x0100
You can see that the msb byte ended up in the upper part of the 16-bit value, just as expected.
Your code is using << 8 to do a bitwise shift to the left, which is the same as multiplying by 2^8, i.e. 256.
Note that result above is a value, i.e. not a byte buffer in memory, so its endianness doesn't matter.
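As a sketch of that formula in code (my own illustration; the helper name read_le16 is made up, and it assumes the two bytes hold a little-endian value, as FAT12 does):

#include <stdio.h>
#include <stdint.h>

/* Combine two bytes, least significant first, into a 16-bit value. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));     /* lsb + msb * 256 */
}

int main(void)
{
    uint8_t sector[2] = {0x40, 0x0B};          /* the bytes from the question */
    printf("%u\n", (unsigned)read_le16(sector)); /* prints 2880 (0x0B40) */
    return 0;
}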
I see no problem combining individual digits or bytes into larger integers.
Let's do decimal with 2 digits: 1 (least significant) and 2 (most significant):
1 + 2 * 10 = 21 (10 is the system base)
Let's now do base-256 with 2 digits: 0x40 (least significant) and 0x0B (most significant):
0x40 + 0x0B * 0x100 = 0x0B40 (0x100=256 is the system base)
The problem, however, is likely lying somewhere else, in how 12-bit integers are stored in FAT12.
A 12-bit integer occupies 1.5 8-bit bytes. And in 3 bytes you have 2 12-bit integers.
Suppose, you have 0x12, 0x34, 0x56 as those 3 bytes.
In order to extract the first integer you only need to take the first byte (0x12) and the 4 least significant bits of the second (0x04) and combine them like this:
0x12 + ((0x34 & 0x0F) << 8) == 0x412
In order to extract the second integer you need to take the 4 most significant bits of the second byte (0x03) and the third byte (0x56) and combine them like this:
(0x56 << 4) + (0x34 >> 4) == 0x563
If you read the official Microsoft's document on FAT (look up fatgen103 online), you'll find all the FAT relevant formulas/pseudo code.
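Here is a quick sketch of those two extractions (my own illustration using the example bytes above, not code from the FAT specification):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t b[3] = {0x12, 0x34, 0x56};              /* two packed 12-bit entries */
    unsigned first  = b[0] | ((b[1] & 0x0F) << 8);  /* 0x412 */
    unsigned second = (b[1] >> 4) | (b[2] << 4);    /* 0x563 */
    printf("0x%03X 0x%03X\n", first, second);       /* prints 0x412 0x563 */
    return 0;
}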
The << operator is the left shift operator. It takes the value to the left of the operator and shifts it left by the number given on the right side of the operator.
So in your case, it shifts the value of *tmp2 eight bits to the left, and combines it with the value of *tmp1 to generate a 16-bit value from two eight-bit values.
For example, let's say you have the integer 1. This is, in 16-bit binary, 0000000000000001. If you shift it left by eight bits, you end up with the binary value 0000000100000000, i.e. 256 in decimal.
The presentation (i.e. binary, decimal or hexadecimal) has nothing to do with it. All integers are stored the same way on the computer.

Which of the following is the correct output for the program given below?

Assume the machine is 32-bit, little-endian, and sizeof(int) is 4 bytes.
Given the following program:
line1: #include <stdio.h>
line2: int main() {
line3: int arr[3] = {2, 3, 4};
line4: char *p;
line5: p = (char *) arr;
line6: printf("%d", *p);
line7: p = p + 1;
line8: printf("%d\n", *p);
line9: return 0;
}
What is the expected output?
A: 2 3
B: 2 0
C: 1 0
D: garbage value
One thing that's bothering me is the cast of the integer pointer to a character pointer.
How important is the casting?
What is the compiler doing at line 5? (p = (char *) arr;)
What is happening at line 7? (p = p + 1;)
If the output is 2 0, then how is the 0 being printed?
(E) none of the above
However, provided that (a) you are on a little-endian machine (e.g. x86), and (b) sizeof(int) >= 2, this should print "20" (no space is printed between the two).
a) the casting is "necessary" to read the array one byte at a time instead of as a series of ints
b) this is just coercing the address of the first int into a pointer to char
c) increment the address stored in p by sizeof(char) (which is 1)
d) the second byte of the machine representation of the int is printed by line 8
(D), or compiler specific, as sizeof(int) (as well as endianness) is platform-dependent.
How important the casting is?
Casting, as a whole is an integral (pun unintended) part of the C language.
And what would the compiler do at line number 5?
It takes the address of the first element of arr and puts it in p.
And after line number 5, what's going on at line number 7?
It increments the pointer so it points to the next char from that memory address.
And if the output is 2 0, then how is the 0 being printed?
This is a combination of endianness and sizeof(int). Without the specs of your machine, there isn't much else I can do to explain.
However, assuming little endian and sizeof(int) == 4, we can see the following:
// let's mark these memory regions: |A|B|C|D|
int i = 2; // stored in memory as bytes 02 00 00 00 (regions A B C D, little endian)
char *ptr = (char *) &i; // now ptr points to 0x02 (A)
printf("%d\n", *ptr); // prints '2', because ptr points to 0x02 (A)
ptr++; // increment ptr, ptr now points to 0x00 (B)
printf("%d\n", *ptr); // prints '0', because ptr points to 0x00 (B)
1. Importance of the cast:
char *p;
This line declares a pointer to a character. That means it can dereference only one byte at a time, and pointer displacement is also one byte at a time.
p = (char *) arr;
2. The cast to char * is mainly there to avoid a compiler warning; the behaviour is the same without it.
Since p is a pointer to a character, as written above, p = p + 1 makes it point to the next byte.
printf("%d\n", *p);
%d formats the value as a decimal integer, so it is shown in decimal.
Here *p dereferences only one byte, so the memory organisation comes into the picture:
whether your machine is little endian (LSB first) or big endian (MSB first).
Given your answer, your machine is little endian, so the first value printed is 2.
The next byte must be zero, so the second output is 0.
In binary:
2 is represented byte-wise as 00-00-00-02,
but in memory it is stored as
02-00-00-00 (four bytes),
i.e. 02 in the first memory byte
and 00 in the second memory byte.
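For completeness, a tiny sketch (my own, assuming a little-endian machine with a 4-byte int) that dumps the bytes of the array, so you can see exactly what p and p + 1 point at:

#include <stdio.h>

int main(void)
{
    int arr[3] = {2, 3, 4};
    unsigned char *p = (unsigned char *)arr;
    /* On a little-endian machine with a 4-byte int this prints:
       02 00 00 00 03 00 00 00 04 00 00 00 */
    for (size_t i = 0; i < sizeof arr; i++)
        printf("%02x ", (unsigned)p[i]);
    printf("\n");
    return 0;
}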

Understand the following line

I read this code in a library which is used to display a bitmap (.bmp) to an LCD.
I'm having a really hard time understanding what is happening at the following lines, and how it happens.
Maybe someone can explain this to me.
uint16_t s, w, h;
uint8_t* buffer; // does get malloc'd
s = *((uint16_t*)&buffer[0]);
w = *((uint16_t*)&buffer[18]);
h = *((uint16_t*)&buffer[22]);
I guess it's not that hard for a real C programmer, but I am still learning, so I thought I'd just ask :)
As far as I understand, it somehow sticks two uint8_t variables together into a uint16_t.
Thanks in advance for your help here!
In the code you've provided, buffer (which is an array of bytes) is read, and values are extracted into s, w and h.
The (uint16_t*)&buffer[n] syntax means that you're taking the address of the nth byte of buffer and casting it to a uint16_t*. The cast tells the compiler to look at this address as if it points at a uint16_t, i.e. a pair of uint8_ts.
The additional * in the code dereferences the pointer, i.e. extracts the value from this address. Since the address now points at a uint16_t, a uint16_t value is extracted.
As a result:
s gets the value of the first uint16_t, i.e. bytes 0 and 1.
w gets the value of the tenth uint16_t, i.e. bytes 18 and 19.
h gets the value of the twelfth uint16_t, i.e. bytes 22 and 23.
The code:
takes two bytes at positions 0 and 1 in the buffer, sticks them together into an unsigned 16-bit value, and stores the result in s;
it does the same with bytes 18/19, storing the result in w;
ditto for bytes 22/23 and h.
It is worth noting that the code uses the native endianness of the target platform to decide which of the two bytes represents the top 8 bits of the result, and which represents the bottom 8 bits.
uint8_t* buffer; // pointer to 8 bits, i.e. a single byte
buffer points to a memory region of bytes -> |byte0|byte1|byte2|....
(uint16_t*)&buffer[0] // &buffer[0] is actually the same as buffer
(uint16_t*)&buffer[0] equals (uint16_t*)buffer; it points to 16 bits, i.e. a halfword
(uint16_t*)buffer points to memory: |byte0 byte1 = halfword0|byte2 byte3 = halfword1|....
w = *((uint16_t*)&buffer[18]);
Takes the address of byte 18 in buffer, reinterprets it as the address of a halfword, then reads the halfword at that address;
it's simply w = byte 18 and byte 19 stuck together forming a halfword.
h = *((uint16_t*)&buffer[22]);
h = byte 22 and byte 23 stuck together.
UPD More detailed explanation:
h = *((uint16_t*)&buffer[22]) =>
1) buffer[22] === the uint8_t (a.k.a. byte) at index 22 of buffer; let's call it byte22;
2) &buffer[22] === &byte22 === the address of byte22 in memory; it's of type uint8_t*, the same as buffer; let's call it byte22_address;
3) (uint16_t*)&buffer[22] === (uint16_t*)byte22_address; casts the address of a byte to the address of a halfword (two bytes stuck together) at the same location; let's call it halfword11_address;
4) h = *((uint16_t*)&buffer[22]) === *halfword11_address; the * operator takes the value at that address, which is halfword11, i.e. bytes 22 and 23 stuck together.
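One caveat with *(uint16_t*)&buffer[n] is that the byte at offset n may not be suitably aligned for a uint16_t on every platform. A common alternative is to memcpy the two bytes into a uint16_t; the sketch below is my own (the helper name load_u16 is made up) and still yields the machine's native byte order, exactly like the original expression:

#include <stdint.h>
#include <string.h>

/* Read a uint16_t from an arbitrarily aligned position in the buffer
   without casting the pointer; memcpy avoids any alignment trap, but the
   result still uses the machine's native byte order, just like the
   original *(uint16_t*)&buffer[n] expression. */
uint16_t load_u16(const uint8_t *buffer, size_t offset)
{
    uint16_t value;
    memcpy(&value, buffer + offset, sizeof value);
    return value;
}

/* Usage, mirroring the original snippet:
   s = load_u16(buffer, 0);
   w = load_u16(buffer, 18);
   h = load_u16(buffer, 22); */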
