Understanding the magic number 0x07EFEFEFF used for strlen optimization - c

I stumbled upon this answer regarding the utilization of the magic number 0x07EFEFEFF used for strlen's optimization, and here is what the top answer says:
Look at the magic bits. Bits number 16, 24 and 31 are 1. 8th bit is 0.
8th bit represents the first byte. If the first byte is not zero, 8th bit becomes 1 at this point. Otherwise it's 0.
16th bit represents the second byte. Same logic.
24th bit represents the third byte.
31st bit represents the fourth byte.
However, if I calculate result = ((a + magic) ^ ~a) & ~magic with a = 0x100, I find that result = 0x81010100, meaning that according to the top answerer, the second byte of a equals 0, which is obviously false.
What am I missing?
Thanks!

The bits only tell you if a byte is zero if the lower bytes are non-zero -- so it can only tell you the FIRST 0 byte, but not about bytes after the first 0.
bit8=1 means first byte is zero. Other bytes, unknown
bit8=0 means first byte is non-zero
bit8=0 & bit16=1 means second byte is zero, higher bytes unknown
bit8=0 & bit16=0 means first two bytes are non-zero.
Also, the last bit (bit31) only tells you about 7 bits of the last byte (and only if the first 3 bytes are non-zero) -- if it is the only bit set then the last byte is 0 or 128 (and the rest are non-zero).
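As a rough check of the rules above, here is a minimal sketch that evaluates the expression from the question and reads off only the lowest set flag. The names hole_flags, first_flagged_byte and check_question_example are made up for this illustration, and the magic value is assumed to be the 32-bit 0x7EFEFEFF.

```c
#include <assert.h>
#include <stdint.h>

#define MAGIC 0x7EFEFEFFu

/* The expression from the question: a set bit marks a hole that got no carry. */
static uint32_t hole_flags(uint32_t a)
{
    return (uint32_t)(((a + MAGIC) ^ ~a) & ~MAGIC);
}

/* Index (0 = lowest byte) of the first byte the flags report as zero, or -1
   if no flag is set. Only the LOWEST set flag is meaningful. */
static int first_flagged_byte(uint32_t a)
{
    uint32_t f = hole_flags(a);
    if (f & 0x00000100u) return 0;
    if (f & 0x00010000u) return 1;
    if (f & 0x01000000u) return 2;
    if (f & 0x80000000u) return 3;   /* top byte is 0x00 or 0x80 */
    return -1;
}

/* Reproduces the value from the question and the bit31 caveat. */
static void check_question_example(void)
{
    /* a = 0x100: the lowest byte is zero, so bit8 is set; the higher flags
       are set too, but they say nothing about the higher bytes. */
    assert(hole_flags(0x00000100u) == 0x81010100u);
    assert(first_flagged_byte(0x00000100u) == 0);

    /* bit31 alone: the top byte is 0x00 or 0x80, the other bytes are non-zero. */
    assert(hole_flags(0x00010101u) == 0x80000000u);
    assert(hole_flags(0x80010101u) == 0x80000000u);
}
```

So for a = 0x100 the flags are consistent with the answer: bit8 reports the zero in the first byte, and bit16 should simply not be read once a lower flag is set.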

Related

Correct length of VkPipelineMultisampleStateCreateInfo.pSampleMask

Unlike other Vulkan structs, where every type* pArrayName; field has a companion uint32_t arrayNameCount holding the array length, struct VkPipelineMultisampleStateCreateInfo does not define any sampleMaskCount for the field const VkSampleMask* pSampleMask;.
The Vulkan docs say the following about valid usage:
"If pSampleMask is not NULL, pSampleMask must be a pointer to an array of ⌈rasterizationSamples / 32⌉ VkSampleMask values."
But VkSampleCountFlagBits rasterizationSamples; is a bitwise value:
rasterizationSamples is a VkSampleCountFlagBits specifying the number of samples per pixel used in rasterization.
So far so good.
But VkSampleCountFlagBits is an enumeration of power-of-2 values, ranging from 0x01 to 0x40 (1 to 64 decimal). Possible ORed values may range from 1 to 127.
So I guess bitwise values ranging from 1 to 31 will result in a pSampleMask with length 0; values from 32 to 63 will give length 2, and so on.
Is that correct ?
I'm feeling really, really dumb!
When they say "rasterizationSamples" in the formula they almost certainly mean "the number of rasterization samples", not "the value of the rasterizationSamples bitmask".
Additionally, ⌈...⌉ means to round up to the nearest integer.
So, for rasterization sample counts from 1 to 32 (bitmask values 0x01 to 0x20), pSampleMask points to a single value. For rasterization sample counts from 33 to 64 (bitmask value 0x40), it points to an array of two values.
I notice that the bitmask values line up with the description of each bit (the 64-sample bit has the value 64, and so on), but it could be a coincidence.
rasterizationSamples must be exactly one of the VkSampleCountFlagBits values (not a bitwise OR of several).
Simply put, the number of bits in the pSampleMask array (which consists of 32-bit values) must be greater than or equal to the number of samples specified by rasterizationSamples (i.e. one VkSampleMask := uint32_t for all of them except SAMPLE_COUNT_64, which needs two).
It is somewhat funny that they didn't choose single uint64_t for the purpose. (maybe they plan adding 128 samples :)
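The ⌈rasterizationSamples / 32⌉ rule from the answers can be sketched as plain integer arithmetic. The helper name sample_mask_word_count is hypothetical and not part of the Vulkan API; it takes the sample count as a plain number, which works because each VK_SAMPLE_COUNT_n_BIT enumerant has the numeric value n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a Vulkan function: number of 32-bit VkSampleMask
   words that pSampleMask must point to for a given sample count. */
static uint32_t sample_mask_word_count(uint32_t samples)
{
    /* Exactly one sample-count bit may be set: a power of two from 1 to 64. */
    assert(samples >= 1 && samples <= 64 && (samples & (samples - 1)) == 0);
    return (samples + 31u) / 32u;   /* integer form of ceil(samples / 32) */
}
```

With this, every count from 1 to 32 yields one word and 64 yields two, matching the answers above.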

Get length of multibyte UTF-8 sequence

I am parsing some UTF-8 text but am only interested in characters in the ASCII range, i.e., I can just skip multibyte sequences.
I can easily detect the beginning of a sequence because the sign bit is set, so the char value is < 0. But how can I tell how many bytes are in the sequence so I can skip over it?
I do not need to perform any validation, i.e., I can assume the input is valid UTF-8.
Just strip out all bytes that are not valid ASCII; don't try to get cute and interpret bytes > 127 at all. This works as long as you don't have any combining sequences with a base character in the ASCII range. For those you would need to interpret the code points themselves.
Although Deduplicator's answer is more appropriate to the specific purpose of skipping over multibyte sequences, if there is a need to get the length of each such character, pass the first byte to this function:
int getUTF8SequenceLength (unsigned char firstPoint) {
    firstPoint >>= 4;
    firstPoint &= 7;
    if (firstPoint == 4) return 2;
    return firstPoint - 3;
}
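This returns the total length of the sequence, including the first byte. It must only be called with the first byte of a multibyte sequence, not with an ASCII byte. I'm using an unsigned char value as the firstPoint parameter here for clarity, but note this function will work the same way if the parameter is a signed char (right-shifting a negative value is implementation-defined in C, but on common implementations the low three bits come out the same).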
To explain:
UTF-8 uses bits 5, 6, and 7 of the first byte of a sequence (counting from 1 at the right) to indicate the remaining length. If all three are set, the sequence has 3 additional bytes. If the first two from the left are set, the sequence has 2 additional bytes. If the first from the left (the 7th bit) is set and the second is clear, the sequence has 1 additional byte; in that case the third bit is already part of the character data and may be 0 or 1. Hence, we want to examine these three bits (the value here is just an example):
11110111
 ^^^
The value is shifted down by 4 then AND'd with 7. This leaves the 1st, 2nd, and 3rd bits from the right as the only ones that can be set. The values of these bits are 1, 2, and 4 respectively.
00000111
     ^^^
If the value is now 4, we know only the first bit from the left (of the three we are considering) is set and can return 2.
After this, the value is 7, meaning all three bits are set, so the sequence is 4 bytes in total; 6, meaning the first two from the left are set, so the sequence is 3 bytes in total; or 5, meaning the first bit from the left plus a data bit, which is again a 2-byte sequence. Subtracting 3 gives the right total in each case.
This covers the range of valid Unicode characters expressed in UTF-8.
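To tie this back to the original goal of skipping multibyte sequences, here is a minimal sketch that keeps only the ASCII characters of a string. The names utf8_sequence_length and copy_ascii_only are made up for this example, the length test is written with masks rather than the shift trick above, and the input is assumed to be valid UTF-8 as the question allows.

```c
#include <assert.h>
#include <stddef.h>

/* Total length in bytes of the UTF-8 sequence that starts with `lead`.
   Assumes valid input, so `lead` is never a continuation byte (10xxxxxx). */
static size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;             /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    return 4;                              /* 11110xxx */
}

/* Copies only the ASCII characters of NUL-terminated `src` into `dst`, which
   must have room for strlen(src) + 1 bytes. Returns the number copied.
   A truncated sequence at the end of `src` would make this overrun. */
static size_t copy_ascii_only(const char *src, char *dst)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        size_t len = utf8_sequence_length((unsigned char)*src);
        if (len == 1)
            dst[n++] = *src;   /* keep ASCII */
        src += len;            /* step over the whole sequence */
    }
    dst[n] = '\0';
    return n;
}
```

Casting to unsigned char before testing the lead byte avoids depending on whether plain char is signed.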

Appending the message in MD5

I am trying to understand how the MD5 hashing algorithm works and have been reading the Wikipedia article about it.
After one pads the message so that its length (in bits) is congruent to 448 mod 512, one is supposed to
append length mod (2 pow 64) to message
From what I can understand, this means appending 64 bits representing the length of the message. I am a bit confused about how this is done.
My first question is: is this the length of the original unpadded message, or the length one gets after appending the 1 followed by zeros?
My second question is: Is the length the length in bytes? That is, if my message is one byte, would I append the message with 63 0's and then a 1. Or if the message is 10 bytes, then I would append the message with 60 0's and 1010.
The length of the unpadded message. From the MD5 RFC, 3.2:
A 64-bit representation of b (the length of the message before the
padding bits were added) is appended to the result of the previous
step. In the unlikely event that b is greater than 2^64, then only
the low-order 64 bits of b are used. (These bits are appended as two
32-bit words and appended low-order word first in accordance with the
previous conventions.)
The length is in bits. See MD5 RFC, 3.1:
The message is "padded" (extended) so that its length (in bits) is
congruent to 448, modulo 512. That is, the message is extended so
that it is just 64 bits shy of being a multiple of 512 bits long.
Padding is always performed, even if the length of the message is
already congruent to 448, modulo 512.
The MD5 spec is far more precise than the Wikipedia article. I always suggest reading the spec over the Wiki page if you want implementation-level detail.
Regarding "if my message is one byte, would I append the message with 63 0's and then a 1. Or if the message is 10 bytes, then I would append the message with 60 0's and 1010.":
Not quite. Don't forget the obligatory bit value "1" that is always appended at the start of the padding. From the spec:
Padding is performed as follows: a single "1" bit is appended to the
message, and then "0" bits are appended so that the length in bits of
the padded message becomes congruent to 448, modulo 512. In all, at
least one bit and at most 512 bits are appended.
This reference C implementation (disclaimer: my own) of MD5 may be of help; it's written so that it is hopefully easy to follow.
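Putting sections 3.1 and 3.2 together, the padding of a message made of whole bytes could be sketched as follows. The function md5_pad is a hypothetical helper written for this illustration (it is not taken from the RFC's reference code or from the implementation mentioned above), and it only handles messages whose length is a whole number of bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the MD5-padded form of `msg` into `out` and
   returns the padded length in bytes, or 0 if `out` is too small. */
static size_t md5_pad(const unsigned char *msg, size_t len,
                      unsigned char *out, size_t out_cap)
{
    /* Room for the 0x80 byte and the 8-byte length, rounded up to 64 bytes. */
    size_t padded = ((len + 8) / 64 + 1) * 64;
    uint64_t bit_len = (uint64_t)len * 8;   /* length of the UNPADDED message, in bits */
    size_t i;

    assert(out != NULL && (msg != NULL || len == 0));
    if (out_cap < padded)
        return 0;

    if (len > 0)
        memcpy(out, msg, len);
    out[len] = 0x80;                               /* the single "1" bit, then seven "0" bits */
    memset(out + len + 1, 0, padded - len - 9);    /* remaining "0" bits */
    for (i = 0; i < 8; i++)                        /* 64-bit length, low-order byte first */
        out[padded - 8 + i] = (unsigned char)(bit_len >> (8 * i));
    return padded;
}
```

For a one-byte message this gives the message byte, 0x80, 54 zero bytes, and then the length field holding 8 (bits), not 1. A 56-byte message already sits at 448 bits, so it gets a whole extra block and pads to 128 bytes.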

How the magic bits are improving the strlen function in glibc [duplicate]

I was going through the source of strlen in glibc. It uses magic bits to find the length of a string. Can someone please explain how this works?
Thank you
Let's say this function is looking through a string -- 4 bytes at a time, as explained by the comments (we assume long ints are 4 bytes) -- and the current "chunk" looks like this:
'\3' '\3' '\0' '\3'
00000011 00000011 00000000 00000011 (as a string: "\x03\x03\x00\x03")
The strlen function is just looking for the first zero byte in this string. It first determines, for each 4-byte chunk, whether there's any zero byte in there, by checking this magic_bits shortcut first: it adds the 4 bytes to this value:
01111110 11111110 11111110 11111111
Adding any non-zero bytes to this value will cause the 1's to overflow into the holes marked by zeroes, by propagating the carries. For our chunk, it'd look like this:
  11111111 111111          1 1111111   Carries
  00000011 00000011 00000000 00000011  Chunk
  01111110 11111110 11111110 11111111  Magic bits
+ -----------------------------------
  10000010 00000001 11111111 00000010
  ^      ^        ^        ^
(The hole bits are marked by ^'s.)
And, from the comments:
/* Look at only the hole bits. If any of the hole bits
are unchanged, most likely one of the bytes was a
zero. */
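If there are no zero bytes in the chunk, every hole bit receives a carry and therefore ends up different from the corresponding bit of the original chunk. Here, however, because of the zero byte, one hole bit (the third ^ from the left) did not receive a propagating carry and kept the chunk's value, so we can then go check which byte it was.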
Essentially, it speeds up the strlen calculation by applying some bit addition magic to 4-byte chunks to scan for zeroes, before narrowing down the search to single byte comparisons.
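The two-stage scan described above can be sketched as follows. This is not glibc's code: may_have_zero_byte and chunked_strlen are names invented for this example, the chunks are fixed at 4 bytes, and the caller is assumed to pass a buffer whose size is a multiple of 4 so that no read goes past the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAGIC_BITS 0x7EFEFEFFu

/* Nonzero if some hole bit was left unchanged by the addition, i.e. the chunk
   MAY contain a zero byte (a top byte of 0x80 also triggers it). */
static uint32_t may_have_zero_byte(uint32_t chunk)
{
    return (uint32_t)(((chunk + MAGIC_BITS) ^ ~chunk) & ~MAGIC_BITS);
}

/* Length of the NUL-terminated string in `buf`. Assumes `cap` is a multiple
   of 4; returns `cap` if no terminator is found in the first `cap` bytes. */
static size_t chunked_strlen(const char *buf, size_t cap)
{
    size_t i, j;
    assert(buf != NULL && cap % 4 == 0);
    for (i = 0; i < cap; i += 4) {
        uint32_t chunk;
        memcpy(&chunk, buf + i, sizeof chunk);   /* alignment-safe 4-byte load */
        if (may_have_zero_byte(chunk) == 0)
            continue;                             /* all four bytes are non-zero */
        for (j = 0; j < 4; j++)                   /* narrow down byte by byte */
            if (buf[i + j] == '\0')
                return i + j;
        /* false positive: no zero byte after all, keep scanning */
    }
    return cap;
}
```

For the chunk from the example, may_have_zero_byte(0x03030003) leaves only bit 16 set, which is the hole that did not receive a carry.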
The idea is, instead of comparing one byte at a time against zero, to check one unsigned long object at a time for a zero byte. This means checking 8 bytes at a time when sizeof (unsigned long) is 8.
With bit hacks, there is a well-known fast expression that determines whether any of the bytes is equal to zero. If one is, the bytes of the object are tested individually to find the first one that is zero. The advantage of using bitwise operations is that it reduces the number of branch instructions.
The bit hack expression to check whether one of the bytes of a multi-byte object is equal to zero is explained on the famous Stanford Bit Twiddling Hacks page, in
Determine if a word has a zero byte
http://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
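For the 8-bytes-at-a-time case, the expression from that page widens naturally to 64 bits. The sketch below is an adaptation for illustration only: the function names are made up, and the 64-bit constants are simply the page's 32-bit constants repeated.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit form of the "haszero" expression: nonzero exactly when at least
   one byte of v is 0x00. */
static uint64_t has_zero_byte64(uint64_t v)
{
    return (v - UINT64_C(0x0101010101010101)) & ~v & UINT64_C(0x8080808080808080);
}

/* Index of the lowest-order zero byte in v (0..7), or -1 if there is none. */
static int lowest_zero_byte64(uint64_t v)
{
    int i;
    if (has_zero_byte64(v) == 0)
        return -1;                 /* one test rules out all eight bytes */
    for (i = 0; i < 8; i++)        /* only now look at individual bytes */
        if (((v >> (8 * i)) & 0xFF) == 0)
            return i;
    return -1;                     /* not reached: the test has no false positives */
}

static void has_zero_byte64_examples(void)
{
    assert(has_zero_byte64(UINT64_C(0x0101010101010101)) == 0);
    assert(has_zero_byte64(UINT64_C(0x8080808080808080)) == 0);
    assert(lowest_zero_byte64(UINT64_C(0x1122334400667788)) == 3);
}
```

Unlike the 0x7EFEFEFF test, this expression does not report a false positive for a byte equal to 0x80, so the byte-by-byte loop only runs when a zero byte is really present.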

How to allocate a 32-byte aligned memory in C

Came across this question in one of the interview samples. A 16-byte aligned allocation has already been answered in How to allocate aligned memory only using the standard library?
But I have a specific question about the mask used to zero out the last 4 bits. The mask ~0x0F is used so that the resulting address is divisible by 16. What should be done to achieve the same for 32-byte alignment/divisibility?
First, the question you referred to is 16-byte alignment, not 16-bit alignment.
Regarding your actual question, you just want to mask off 5 bits instead of 4 to make the result 32-byte aligned. So it would be ~0x1F.
To clarify a bit:
To align a pointer to a 32 byte boundary, you want the last 5 bits of the address to be 0. (Since 100000 is 32 in binary, any multiple of 32 will end in 00000.)
0x1F is 11111 in binary. Since it's a pointer, it's actually some number of 0's followed by 11111 - for example, with 64-bit pointers, it would be 59 0's and 5 1's. The ~ means that these values are inverted - so ~0x1F is 59 1's followed by 5 0's.
When you take ptr & ~0x1F, the bitwise & causes all bits that are &'ed with 1 to stay the same, and all bits that are &'ed with 0 to be set to 0. So you end up with the original value of ptr, except that the last 5 bits have been set to 0. What this means is that we've subtracted some number between 0 and 31 in order to make ptr a multiple of 32, which was the goal.
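In an allocator the mask is normally applied after adding 31, so that the address is rounded up into the allocated block rather than down out of it. A minimal sketch under that assumption follows; aligned_malloc32 and aligned_free32 are hypothetical names, and the code assumes the usual flat address space where converting a pointer to uintptr_t and back behaves as expected.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 32u   /* must be a power of two */

/* Hypothetical helper: returns a 32-byte-aligned block of `size` bytes, or
   NULL. The pointer returned by malloc is stored just below the block. */
static void *aligned_malloc32(size_t size)
{
    void *raw;
    uintptr_t aligned;

    if (size > SIZE_MAX - (ALIGNMENT - 1) - sizeof(void *))
        return NULL;                       /* the size computation would overflow */
    /* Over-allocate: up to 31 bytes of slack plus room for the saved pointer. */
    raw = malloc(size + (ALIGNMENT - 1) + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip the saved-pointer slot, add 31, then clear the low 5 bits. */
    aligned = ((uintptr_t)raw + sizeof(void *) + (ALIGNMENT - 1)) & ~(uintptr_t)0x1F;
    assert((aligned & 0x1F) == 0);
    ((void **)aligned)[-1] = raw;          /* remembered for aligned_free32 */
    return (void *)aligned;
}

/* Releases a block obtained from aligned_malloc32. */
static void aligned_free32(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

Writing the mask as ~(uintptr_t)0x1F makes the inversion happen at pointer width, which is the "59 1's followed by 5 0's" pattern described above on a 64-bit system.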
