How to determine if a byte is null in a word - c

I am reading the glibc "strlen" source code, and the trick its developers found to speed it up is to read n bytes per iteration, where n is the size of a long word, instead of reading 1 byte at a time.
I will assume that a long word has 4 bytes.
The tricky part is that every 4-byte "chunk" the function reads can contain a null byte, so at each iteration the function has to check whether the chunk contains one. They do it like this:
if (((longword - lomagic) & ~longword & himagic) != 0) { /* null byte found */ }
where longword is the chunk of data and himagic and lomagic are magic values defined as:
himagic = 0x80808080L;
lomagic = 0x01010101L;
Here is the comment for those values:
/* Bits 31, 24, 16, and 8 of this number are zero. Call these bits
the "holes." Note that there is a hole just to the left of
each byte, with an extra at the end:
bits: 01111110 11111110 11111110 11111111
bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
The 1-bits make sure that carries propagate to the next 0-bit.
The 0-bits provide holes for carries to fall into. */
How does this trick of finding the null byte work?

From the famous "Bit Twiddling Hacks" page by Sean Eron Anderson, here is a description of what is currently used in the glibc implementation you're referring to (Anderson calls the algorithm hasless(v, 1)):
The subexpression (v - 0x01010101UL), evaluates to a high bit set in
any byte whenever the corresponding byte in v is zero or greater than
0x80. The sub-expression ~v & 0x80808080UL evaluates to high bits set
in bytes where the byte of v doesn't have its high bit set (so the
byte was less than 0x80). Finally, by ANDing these two sub-expressions
the result is the high bits set where the bytes in v were zero, since
the high bits set due to a value greater than 0x80 in the first
sub-expression are masked off by the second.
It appears that the comment in the glibc source is confusing because it no longer matches what the code actually does: it describes what would have been an implementation of the algorithm that Anderson presents just before the hasless(v, 1) algorithm.
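If you want to play with it, here is a minimal sketch of the test as a standalone program (the has_zero_byte name and the test values are mine, assuming 32-bit words):

#include <stdio.h>
#include <stdint.h>

/* Nonzero iff some byte of v is zero; this is Anderson's hasless(v, 1). */
static int has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101UL) & ~v & 0x80808080UL) != 0;
}

int main(void)
{
    printf("%d\n", has_zero_byte(0x41424344)); /* "ABCD", no null byte: prints 0 */
    printf("%d\n", has_zero_byte(0x41420044)); /* null byte in the middle: prints 1 */
    return 0;
}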

Related

SWAR byte counting methods from 'Bit Twiddling Hacks' - why do they work?

Bit Twiddling Hacks contains the following macros, which count the number of bytes in a word x that are less than, or greater than, n:
#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)
#define countmore(x,n) \
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)
However, it doesn't explain why they work. What's the logic behind these macros?
Let's try for intuition on countmore.
First, ~0UL/255*(127-n) is a clever way of copying the value 127-n to all bytes in the word in parallel. Why does it work? ~0 is 255 in all bytes. Consequently, ~0/255 is 1 in all bytes. Multiplying by (127-n) does the "copying" mentioned at the outset.
The term ~0UL/255*127 is just a special case of the above where n is zero. It copies 127 into all bytes. That's 0x7f7f7f7f if words are 4 bytes. "Anding" with x zeros out the high order bit in each byte.
That's the first term, ((x)&~0UL/255*127). The result is the same as x except that the high bit in each byte is zeroed.
The second term ~0UL/255*(127-(n)) is as above: 127-n copied to each byte.
For any given byte x[i], adding the two terms gives us 127-n+x[i] if x[i]<=127. This quantity will have the high order bit set whenever x[i]>n. It's easiest to see this as adding two 7-bit unsigned numbers. The result "overflows" into the 8th bit because the result is 128 or more.
So it looks like the algorithm is going to use the 8th bit of each byte as a boolean indicating x[i]>n.
So what about the other case, x[i]>127? Here we know the byte is more than n because the algorithm stipulates n<=127. The 8th bit ought to be always 1. Happily, the sum's 8th bit doesn't matter because the next step "or"s the result with x. Since x[i] has the 8th bit set to 1 if and only if it's 128 or larger, this operation "forces" the 8th bit to 1 just when the sum might provide a bad value.
To summarize so far, the "or" result has the 8th bit set to 1 in its i'th byte if and only if x[i]>n. Nice.
The next operation &~0UL/255*128 sets everything to zero except all those 8th bits of interest. It's "anding" with 0x80808080...
Now the task is to find the number of these bits set to 1. For this, countmore uses some basic number theory. First it shifts right 7 bits so the bits of interest are b0, b8, b16... The value of this word is
b0 + b8*2^8 + b16*2^16 + ...
A beautiful fact is that 1 == 2^8 == 2^16 == ... mod 255. In other words, each 1 bit is 1 mod 255. It follows that finding mod 255 of the shifted result is the same as summing b0+b8+b16+...
Yikes. We're done.
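As a sanity check, here is a minimal sketch running countmore on a concrete word (assuming a 64-bit unsigned long; the test value is made up):

#include <stdio.h>

#define countmore(x,n) \
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)

int main(void)
{
    unsigned long x = 0x1234567811335577UL;
    /* bytes greater than 0x50: 0x56, 0x78, 0x55, 0x77 */
    printf("%lu\n", countmore(x, 0x50)); /* prints 4 */
    return 0;
}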
Let's analyse the countless macro. We can simplify it with the following code:
#define A(n) (0x0101010101010101UL * (0x7F+n))
#define B(x) (x & 0x7F7F7F7F7F7F7F7FUL)
#define C(x,n) (A(n) - B(x))
#define countless(x,n) (( C(x,n) & ~x & 0x8080808080808080UL) / 0x80 % 0xFF )
A(n) will be:
A(0) = 0x7F7F7F7F7F7F7F7F
A(1) = 0x8080808080808080
A(2) = 0x8181818181818181
A(3) = 0x8282828282828282
....
And for B(x), each byte of x is masked with 0x7F.
If we suppose x = 0xb0b1b2b3b4b5b6b7 and n = 0, then C(x,n) equals 0x(0x7F-b0)(0x7F-b1)(0x7F-b2)..., byte by byte.
For example, suppose x = 0x1234567811335577 and n = 0x50. Then:
A(0x50) = 0xCFCFCFCFCFCFCFCF
B(0x1234567811335577) = 0x1234567811335577
C(0x1234567811335577, 0x50) = 0xBD9B7957BE9C7A58
~(0x1234567811335577) = 0xEDCBA987EECCAA88
0xEDCBA987EECCAA88 & 0x8080808080808080UL = 0x8080808080808080
C(0x1234567811335577, 0x50) & 0x8080808080808080 = 0x8080000080800000
(0x8080000080800000 / 0x80) % 0xFF = 4 // counts the set high bits: the 4 bytes of x (0x12, 0x34, 0x11, 0x33) that are less than 0x50
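The worked example above can be checked directly with the original macro; a minimal sketch, again assuming a 64-bit unsigned long:

#include <stdio.h>

#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)

int main(void)
{
    unsigned long x = 0x1234567811335577UL;
    /* bytes less than 0x50: 0x12, 0x34, 0x11, 0x33 */
    printf("%lu\n", countless(x, 0x50)); /* prints 4 */
    return 0;
}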

Adding one byte to a hexadecimal number

I seem to have confused myself so much that this doesn't make sense anymore.
1 byte = 8 bits.
So if I have a memory location such as
0xdeadbeef
3735928559 (base10)
1101 1110 1010 1101 1011 1110 1110 1111
Now if I add one byte to 0xdeadbeef, what is the binary sequence I'm adding? Is it 1000? If I add 1 bit, I get 0xdeadbef0, and if I add 1 bit 8 times, I get 0xdeadbef7. Which is correct?
I remember from microprocessors the counter incremented in PC += 4, which gives 0xdeadbef3, so I'm not sure which is the right answer.
What I understand from your question is that you are confused between adding a bit and adding a byte to the counter.
Since memory addresses are measured in bytes (in programming languages), any arithmetic on them is done in bytes.
To increment the counter, adding 1 moves it one byte past the base address. Adding 1 to 0xdeadbeef increments it to 0xdeadbef0.
I'm referring to memory locations.
So 0xdeadbeef is an address. If you increment it by 1 byte, you simply add 1 to it.
i.e. 0xdeadbeef + 1 = 0xdeadbef0
To conclude: it looks like adding "1 bit" to the address moves the pointer by 1 byte, because memory is accessed at byte granularity and addresses count bytes. But what you actually added is the number 1 (i.e. 0x00000001). If you want to advance by 4 bytes, you add 4 to the address, because memory is addressed in units of bytes.
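Here is a minimal sketch of that arithmetic, just treating the address as a plain number (the use of uintptr_t to hold it is my choice):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uintptr_t addr = 0xdeadbeef;
    printf("%#lx\n", (unsigned long)(addr + 1)); /* 0xdeadbef0: one byte forward */
    printf("%#lx\n", (unsigned long)(addr + 4)); /* 0xdeadbef3: four bytes forward */
    return 0;
}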
What does "adding one byte" mean? A byte needs to have a value:
you add a number to a number, not a size (8 bits) to a number.
If the byte you are adding is 0x08 (1000 in binary), then
0xdeadbeef + 0x08 = 0xdeadbef7
If, as I gather, that thing is a memory address, it is measured in "number of bytes after the base (zero) address", so if you move one byte forward in memory, the memory address is incremented by one.
Think of it like this: if you measure the distance from the start of a road in meters, then moving forward one meter (your unit of measurement) increases the distance from the start by 1.
Be careful though: pointers in C (and C++) work in a slightly confusing (at first) way. If your pointer is of type T *, each arithmetic operation on it is performed in units of T, so the underlying memory address moves in steps of sizeof(T).
For example:
#include <stdio.h>

int main(void)
{
    int a[2];
    int *ptr = a;
    int *ptr2 = ptr + 1;
    printf("Delta in ints: %d\n", (int)(ptr2 - ptr));    /* prints 1 */

    char *cptr = (char *)ptr;
    char *cptr2 = (char *)ptr2;
    printf("Delta in chars: %d\n", (int)(cptr2 - cptr)); /* prints sizeof(int), typically 4 */
    return 0;
}
In computers, each memory location holds 8 bits (1 byte) of data. So adding 1 simply moves the address to the next byte of memory.
When you add 1 to an address of word size, the 1 is interpreted as a word-size string of bits with the decimal value 1.
In your case, 0xdeadbeef has 32 bits, or 4 bytes. So adding 1 is like doing 0xdeadbeef + 0x00000001, and adding 300 is like 0xdeadbeef + 0x0000012C.
Just in case you're wondering whether you can add more than word size to the address, it's not possible, because the address can't be bigger or smaller than the word size.

Understanding shifting and logical operations

I am trying to read the 'size' of an SD card. The sample example which I am having has following lines of code:
unsigned char xdata *pchar; // Pointer to external mem space for FLASH Read function;
pchar += 9; // Size indicator is in the 9th byte of CSD (Card specific data) register;
// Extract size indicator bits;
size = (unsigned int)((((*pchar) & 0x03) << 1) | (((*(pchar+1)) & 0x80) >> 7));
I am not able to understand what is actually being done in the above line where indicator bit is being extracted. Can somebody help me in understanding this?
The size is made up of bits from two bytes. One byte is at pchar, the other at pchar + 1.
((*pchar) & 0x03) takes the 2 least significant bits (chopping off the 6 most significant ones).
This result is shifted one bit to the left using << 1. For example:
11011010 (& 0x03/00000011)==> 00000010 (<< 1)==> 00000100 (-----10-)
Something similar is done with pchar + 1. For example:
11110110 (& 0x80/10000000)==> 10000000 (>> 7)==> 00000001 (-------1)
Then these two values are OR-ed together with |. So in this example you'd get:
00000100 | 00000001 = 00000101 (-----101)
But note that the 5 most significant bits will always be 0 (indicated above with -) because they were ANDed away.
To summarize: the first byte contributes two bits of the size, while the second byte contributes only one bit.
It seems the size indicator, say SI, consists of 3 bits, where *pchar contains the two most significant bits of SI in its lowest two bits (0x03) and *(pchar+1) contains the least significant bit of SI in its highest bit (0x80).
The first and second line figure out how to point to the data that you want.
Let's now go through the steps involved, from left to right.
The first portion takes the byte pointed to by pchar, ANDs it with 0x03, and shifts the result left by one bit.
That result is then ORed with a value built from the next byte, *(pchar+1), which is ANDed with 0x80 and then right-shifted by seven bits. Essentially, this portion just isolates the most significant bit of that byte and moves it down to the lowest position.
The result is essentially this:
Imagine pchar points to the byte where bits are represented by letters: ABCDEFGH.
The first part ANDs with 0x03, so we are left with 000000GH. This is then left shifted by one bit, so we are left with 00000GH0.
Same thing for the right portion. pchar+1 is represented by IJKLMNOP. With the first logical AND, we are left with I0000000. This is then right shifted seven times, so we have 0000000I. This is combined with the left-hand portion using the OR, giving 00000GHI, which is then cast to an unsigned int, which holds your size.
Basically, there are three bits that hold the size, but they are not byte aligned. As a result, some manipulation is necessary.
size = (unsigned int)((((*pchar) & 0x03) << 1) | (((*(pchar+1)) & 0x80) >> 7));
Can somebody help me in understanding this?
We have byte *pchar and byte *(pchar+1). Each byte consists of 8 bits.
Let's index the bits of each byte as 76543210, from most significant to least significant.
1. ((*pchar) & 0x03) << 1 means "zero all bits of *pchar except bits 0 and 1, then shift the result left by 1 bit":
76543210 --> xxxxxx10 --> xxxxx10x
2. ((*(pchar+1)) & 0x80) >> 7 means "zero all bits of *(pchar+1) except bit 7, then shift the result right by 7 bits":
76543210 --> 7xxxxxxx --> xxxxxxx7
3. ((((*pchar) & 0x03) << 1) | (((*(pchar+1)) & 0x80) >> 7)) means "combine all non-zero bits of the left and right operands into one byte":
xxxxx10x | xxxxxxx7 --> xxxxx107
So, in the result we have two low bits from *pchar and one high bit from *(pchar+1).
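Putting it together, a minimal sketch using the two example bytes from earlier, 0xDA and 0xF6 (the values are made up; a real CSD register would supply them):

#include <stdio.h>

int main(void)
{
    unsigned char byte9  = 0xDA; /* 11011010: contributes bits G and H */
    unsigned char byte10 = 0xF6; /* 11110110: contributes bit I */

    unsigned int size = ((byte9 & 0x03) << 1) | ((byte10 & 0x80) >> 7);
    printf("%u\n", size); /* 00000101 = 5 */
    return 0;
}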

converting little endian hex to big endian decimal in C

I am trying to understand and implement a simple file system based on FAT12. I am currently looking at the following snippet of code and it's driving me crazy:
int getTotalSize(char * mmap)
{
    int *tmp1 = malloc(sizeof(int));
    int *tmp2 = malloc(sizeof(int));
    int retVal;

    *tmp1 = mmap[19];
    *tmp2 = mmap[20];
    printf("%d and %d read\n", *tmp1, *tmp2);
    retVal = *tmp1 + ((*tmp2) << 8);
    free(tmp1);
    free(tmp2);
    return retVal;
};
From what I've read so far, the FAT12 format stores integers in little endian format,
and the code above is getting the size of the file system, which is stored in the 19th and 20th bytes of the boot sector.
However, I don't understand why retVal = *tmp1 + ((*tmp2) << 8); works. Is the bitwise << 8 converting the second byte to decimal, or to big endian format?
Why is it only doing it to the second byte and not the first one?
The bytes in question are (in little endian format):
40 0B
and I tried converting them manually by switching the order first to
0B 40
and then converting from hex to decimal, and I get the right output. I just don't understand how adding the first byte to the shifted second byte does the same thing.
Thanks
The use of malloc() here is seriously facepalm-inducing. Utterly unnecessary, and a serious "code smell" (makes me doubt the overall quality of the code). Also, mmap clearly should be unsigned char (or, even better, uint8_t).
That said, the code you're asking about is pretty straight-forward.
Given two byte-sized values a and b, there are two ways of combining them into a 16-bit value (which is what the code is doing): you can either consider a to be the least-significant byte, or b.
Using boxes, the 16-bit value can look either like this:
+---+---+
| a | b |
+---+---+
or like this, if you instead consider b to be the most significant byte:
+---+---+
| b | a |
+---+---+
The way to combine the lsb and the msb into a 16-bit value is simply:
result = (msb * 256) + lsb;
UPDATE: The 256 comes from the fact that that's the "worth" of each successively more significant byte in a multibyte number. Compare it to the role of 10 in a decimal number (to combine two single-digit decimal numbers c and d you would use result = 10 * c + d).
Consider msb = 0x01 and lsb = 0x00, then the above would be:
result = 0x1 * 256 + 0 = 256 = 0x0100
You can see that the msb byte ended up in the upper part of the 16-bit value, just as expected.
Your code is using << 8 to do a bitwise shift to the left, which is the same as multiplying by 2^8, i.e. 256.
Note that result above is a value, i.e. not a byte buffer in memory, so its endianness doesn't matter.
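As an aside, a minimal sketch of the same function without the malloc() noise (the const uint8_t parameter type is my choice):

#include <stdint.h>

/* Read the little-endian 16-bit value at offsets 19-20 of the boot sector. */
unsigned int getTotalSize(const uint8_t *mmap)
{
    return mmap[19] | ((unsigned int)mmap[20] << 8);
}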
I see no problem combining individual digits or bytes into larger integers.
Let's do decimal with 2 digits: 1 (least significant) and 2 (most significant):
1 + 2 * 10 = 21 (10 is the system base)
Let's now do base-256 with 2 digits: 0x40 (least significant) and 0x0B (most significant):
0x40 + 0x0B * 0x100 = 0x0B40 (0x100=256 is the system base)
The problem, however, is likely lying somewhere else, in how 12-bit integers are stored in FAT12.
A 12-bit integer occupies 1.5 8-bit bytes. And in 3 bytes you have 2 12-bit integers.
Suppose, you have 0x12, 0x34, 0x56 as those 3 bytes.
In order to extract the first integer you only need take the first byte (0x12) and the 4 least significant bits of the second (0x04) and combine them like this:
0x12 + ((0x34 & 0x0F) << 8) == 0x412
In order to extract the second integer you need to take the 4 most significant bits of the second byte (0x03) and the third byte (0x56) and combine them like this:
(0x56 << 4) + (0x34 >> 4) == 0x563
If you read the official Microsoft's document on FAT (look up fatgen103 online), you'll find all the FAT relevant formulas/pseudo code.
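A minimal sketch of those two formulas in code, using the three example bytes from above:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t b[3] = {0x12, 0x34, 0x56}; /* 3 bytes hold two 12-bit FAT12 entries */

    unsigned int first  = b[0] | ((b[1] & 0x0F) << 8); /* 0x412 */
    unsigned int second = (b[2] << 4) | (b[1] >> 4);   /* 0x563 */
    printf("%#x %#x\n", first, second);
    return 0;
}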
The << operator is the left shift operator. It takes the value on the left of the operator and shifts it by the number on the right side of the operator.
So in your case, it shifts the value of *tmp2 eight bits to the left and combines it with the value of *tmp1, generating a 16-bit value from two 8-bit values.
For example, lets say you have the integer 1. This is, in 16-bit binary, 0000000000000001. If you shift it left by eight bits, you end up with the binary value 0000000100000000, i.e. 256 in decimal.
The presentation (i.e. binary, decimal or hexadecimal) has nothing to do with it. All integers are stored the same way on the computer.

Large bit arrays in C

Our OS professor mentioned that for assigning a process id to a new process, the kernel incrementally searches for the first zero bit in an array whose size equals the maximum number of processes (~32,768 by default), where an allocated process id has 1 stored in it.
As far as I know, there is no bit data type in C. Obviously, there's something I'm missing here.
Is there any such special construct from which we can build up a bit array? How is this done exactly?
More importantly, what are the operations that can be performed on such an array?
Bit arrays are simply byte arrays where you use bitwise operators to read the individual bits.
Suppose you have a 1-byte char variable. This contains 8 bits. You can test if the lowest bit is true by performing a bitwise AND operation with the value 1, e.g.
char a = /*something*/;
if (a & 1) {
    /* lowest bit is true */
}
Notice that this is a single ampersand. It is completely different from the logical AND operator &&. This works because a & 1 "masks out" all bits except the first, so a & 1 is nonzero if and only if the lowest bit of a is 1. Similarly, you can check whether the second lowest bit is true by ANDing with 2, the third by ANDing with 4, and so on for successive powers of two.
So a 32,768-element bit array would be represented as a 4096-element byte array, where the first byte holds bits 0-7, the second byte holds bits 8-15, etc. To perform the check, the code would select the byte from the array containing the bit that it wanted to check, and then use a bitwise operation to read the bit value from the byte.
As far as what the operations are, like any other data type, you can read values and write values. I explained how to read values above, and I'll explain how to write values below; if you're really interested in understanding bitwise operations, see the Bit Twiddling Hacks page linked further down.
How you write a bit depends on whether you want to write a 0 or a 1. To write a 1-bit into a byte a, you perform the opposite of an AND operation: an OR operation, e.g.
char a = /*something*/;
a = a | 1; /* or a |= 1 */
After this, the lowest bit of a will be set to 1 whether it was set before or not. Again, you could write this into the second position by replacing 1 with 2, or into the third with 4, and so on for powers of two.
Finally, to write a zero bit, you AND with the inverse of the position you want to write to, e.g.
char a = /*something*/;
a = a & ~1; /* or a &= ~1 */
Now, the lowest bit of a is set to 0, regardless of its previous value. This works because ~1 will have all bits other than the lowest set to 1, and the lowest set to zero. This "masks out" the lowest bit to zero, and leaves the remaining bits of a alone.
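Generalizing from bit 0 to an arbitrary index i, here is a minimal sketch of a full read/set/clear cycle on a 32,768-bit array (the index 12345 is made up):

#include <stdio.h>

int main(void)
{
    unsigned char bits[4096] = {0}; /* 4096 bytes = 32,768 bits, all zero */
    unsigned int i = 12345;         /* some bit index to play with */

    bits[i / 8] |= 1u << (i % 8);                 /* set bit i */
    printf("%d\n", (bits[i / 8] >> (i % 8)) & 1); /* read bit i: prints 1 */
    bits[i / 8] &= ~(1u << (i % 8));              /* clear bit i */
    printf("%d\n", (bits[i / 8] >> (i % 8)) & 1); /* read bit i: prints 0 */
    return 0;
}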
A struct can assign its members bit sizes, but that's the extent of a "bit type" in C.
struct int_sized_struct {
    int foo:4;
    int bar:4;
    int baz:24;
};
The rest is done with bitwise operations. For example, searching that PID bitmap can be done like this:
#include <stdint.h>

extern uint32_t *process_bitmap;
uint32_t *p = process_bitmap;
uint32_t bit_offset = 0;
uint32_t bit_test;
uint32_t pid;

/* Scan the PID bitmap 32 entries per cycle, skipping fully-allocated words. */
while (*p == 0xffffffff) {
    p++;
}

/* Scan the 32-bit word that has an open slot for the free PID. */
bit_test = 0x80000000;
while ((*p & bit_test) == bit_test) {
    bit_test >>= 1;
    bit_offset++;
}

/* Each preceding word covers 32 PIDs. */
pid = (p - process_bitmap) * 32 + bit_offset;
This is roughly 32x faster than a simple for loop scanning an array with one byte per PID. (Actually, greater than 32x, since more of the bitmap will stay in the CPU cache.)
see http://graphics.stanford.edu/~seander/bithacks.html
There is no bit type in C, but bit manipulation is fairly straightforward. Some processors have bit-specific instructions which the code below would optimize to nicely; even without them it should be pretty fast. It may or may not be faster to use an array of 32-bit words instead of bytes. Inlining the functions would also help performance.
If you have the memory to burn, you can just use a whole byte to store one bit (or a whole 32-bit number, etc.), greatly improving performance at the cost of the memory used.
#define SIZE 4096 /* bytes: room for 32,768 bits */

unsigned char data[SIZE];

unsigned char get_bit ( unsigned int offset )
{
    //TODO: limit check offset
    if(data[offset>>3]&(1<<(offset&7))) return(1);
    else return(0);
}

void set_bit ( unsigned int offset, unsigned char bit )
{
    //TODO: limit check offset
    if(bit) data[offset>>3]|=1<<(offset&7);
    else data[offset>>3]&=~(1<<(offset&7));
}
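For instance, a quick driver for the two functions above might look like this (assuming #include <stdio.h> at the top; the index is made up):

int main(void)
{
    set_bit(12345, 1);
    printf("%u\n", get_bit(12345)); /* prints 1 */
    set_bit(12345, 0);
    printf("%u\n", get_bit(12345)); /* prints 0 */
    return 0;
}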
