Find how many times a pattern occurs in a binary number using C

Read a 32-bit non-negative integer (unsigned int) and an 8-bit pattern (unsigned int); it is not necessary to validate the input. Both numbers are read in the decimal number system.
Determine the number of times a given pattern appears in the binary notation of a loaded 32-bit number.
It is not allowed to use the string.h library and aggregate data types.
For example:
32-bit number 514: 00000000 00000000 00000010 00000010
8-bit pattern 2: 00000010
So it should print that the pattern 2 occurs 2 times.
I'm not sure how to tackle this problem. I've tried to keep a counter that counts streaks, but it became too complicated too quickly.

This is a solution for specifically 8-bit patterns and 32-bit numbers:
int count = 0;
for (int i = 0; i < 4; i++)  // 32 bits = 4 bytes
    if (((number & (0xFFu << i * 8)) >> i * 8) == pattern)  // 0xFFu avoids signed-shift overflow at i = 3
        count++;
The general idea is, for each of the four byte-aligned 8-bit sequences in the number, mask it off (set all other bits to 0 by ANDing the number with 0xFF shifted by the required number of bits), then shift the masked sequence down to the least significant position so it can be compared with the pattern.
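Note that the loop above checks only the four byte-aligned positions, which happens to match the expected output for 514. If occurrences at arbitrary bit offsets should also count, an 8-bit window can be slid across all 25 possible positions. A minimal standalone sketch under that assumption (variable names are illustrative):

#include <stdio.h>

int main(void)
{
    unsigned int number, pattern;  /* both read in decimal, per the assignment */
    if (scanf("%u %u", &number, &pattern) != 2)
        return 1;

    int count = 0;
    for (int shift = 0; shift <= 32 - 8; shift++)    /* 25 possible windows */
        if (((number >> shift) & 0xFFu) == pattern)  /* isolate one 8-bit window */
            count++;

    printf("Pattern occurs %d times.\n", count);
    return 0;
}

For the example input 514 and 2 this also prints 2, since the only matches happen to sit at byte boundaries.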

Related

Understanding Byte swapping

I'm trying to figure out a way to create a little endian/big endian conversion for 64 bit integers (uint64_t) and while I find a lot of answers online as to how to do it, none of them explain what exactly is going on. For example, to get the nth byte of the integer I found this response:
int x = (number >> (8*n)) & 0xff;
Even though I understand the bit-shifting component (shifting 8n bits to the right), I don't see where the & and 0xff come in, or what they mean, aside from the fact that & is a bitwise AND operator.
So, how would this sort of logic apply to a big-endian/little-endian byte swapping method for 64 bit integers?
Thank you in advance.
It might be easiest to think of an analogy with decimal numbers:
Take the number 308. It has three digits, '3', '0', and '8'. By convention, digits to the left are more significant than digits to the right. But the convention could just as easily have been the other way...the digits could've been written in reverse order (e.g., 803).
Why is this relevant? Consider a hexadecimal representation of a number on a computer: 0xabcd0123. In a mathematically rigorous sense, one can view this number as 4 radix-256 digits. (i.e., 0xab, 0xcd, 0x01, 0x23). So, endianness is about the convention by which these radix-256 digits are ordered when written into memory.
Little-endian means "write the least significant digit to the lowest address";
Big-endian means "write the most significant digit to the lowest address".
So, on to the mechanics of processing endianness:
If you think of the decimal example above, how would you get each digit? The least significant digit is given by taking the number modulo 10 (i.e., 308 % 10 = 8). The second digit can be found by dividing the number by 10, then taking it modulo 10 (i.e., 308 / 10 = 30; 30 % 10 = 0) and so forth.
The process is exactly the same for binary data on a computer, except that it's treated as radix-256 instead of radix-10 like decimal digits. This is where a few tricks come in.
When doing modulo with a modulus that is a power of 2, you can do it via AND. Let m = 256 be our modulus. Because m is a power of 2, x % m is equivalent to x & (m - 1). (This is a numerical fact whose proof is out of scope for this answer.)
When doing division by a power of 2, you can do it via a right shift. Again, let m = 256 be our divisor. Because m = 2^8, x / m is equivalent to x >> 8.
Thus binary endian-specific serialization uses exactly the process above:
uint32_t val = 0xabcd0123;
(val & 0xff) is equivalent to (val % 256), and yields 0x23.
((val >> 8) & 0xff) is equivalent to ((val / 256) % 256), and yields 0x01.
((val >> 16) & 0xff) is equivalent to (((val / 256)/256) % 256), and yields 0xcd.
and so on. So now that you have access to the digits/bytes, you simply have to choose the order in which to store them. Per above, "big endian = most-significant at lowest address", "little endian = least-significant at lowest address".
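To close the loop on the original question, here is one sketch of a 64-bit byte swap built from exactly this digit-extraction idea (this code is not from the answer above, so treat it as an illustrative assumption):

#include <inttypes.h>
#include <stdio.h>

uint64_t swap64(uint64_t x)
{
    uint64_t result = 0;
    for (int n = 0; n < 8; n++) {
        uint64_t digit = (x >> (8 * n)) & 0xff;  /* extract the nth radix-256 digit */
        result = (result << 8) | digit;          /* append the digits in reversed order */
    }
    return result;
}

int main(void)
{
    printf("%016" PRIx64 "\n", swap64(UINT64_C(0x0123456789abcdef)));  /* efcdab8967452301 */
    return 0;
}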

Creating a byte (8 bits) from 4 2 bits

I am trying to figure out a way to get as much out of the limited memory in my microcontroller (32kb) and am seeking suggestions or pointers to an algorithm that performs what I am attempting to do.
Some background: I am sending Manchester-encoded bits out over SPI (Serial Peripheral Interface) directly from DMA. As the smallest possible unit I can store into DMA is a byte (8 bits), I am having to represent my 1's as 0b11110000 and my 0's as 0b00001111. This basically means that for every bit of information, I need to use a byte (8 bits) of memory, which is very inefficient.
If I could reduce this, so that my 1's are represented as 0b10 and my 0's as 0b01, I'd only have to use a quarter of a byte (2 bits) for every bit of information, which is fine for my solution.
Now, if I could save to DMA in bits, this would not be a problem, but of course I need to work with bytes. So I know the solution to my problem involves collecting the 8 bits (or in my case, 4 2bits) and then storing to DMA as a byte.
Questions:
Is there a standard way to solve this problem?
How can I somehow create an 8-bit number from a collection of four 2-bit numbers? I do not want the sum of these numbers, but the bit pattern they form when concatenated together.
For example, I have the following four 2-bit numbers (keeping in mind that 0b10 represents 1 and 0b01 represents 0). The type they are stored in is open to the solution, as obviously there is no such thing as a 2-bit type:
Number 1: 0b01, Number 2: 0b10, Number 3: 0b10, Number 4: 0b01
And I want to create the following 8 bit number from these:
8 Bit Number: 0b01 10 10 01 or without the spaces 0b01101001 (0x69)
I am programming in C.
It seems that you can pack four numbers a, b, c, d, each of which has the value zero or one, like so:
64 * (a + 1) + 16 * (b + 1) + 4 * (c + 1) + (d + 1)
This is using the fact that x + 1 encodes your two-bit integer: 1 becomes 0b10, and 0 becomes 0b01.
It's Manchester encoding, so 0b11110000 and 0b00001111 should be the only candidates. If so, the memory use can be reduced by a factor of 8:
uint8_t PackedByte = 0;
for (int i = 0; i < 8; i++) {
    PackedByte <<= 1;        // make room for the next bit
    if (buf[i] == 0xF0)      // 0b11110000 encodes a 1
        PackedByte++;
}
On the other hand, if it's Manchester encoding but the encoding may not be perfect, then there are 3 possible results: 0, 1, indeterminate.
uint8_t PackedByte = 0;
for (int i = 0; i < 8; i++) {
    int upper = BitCount(buf[i] >> 4);   // set bits in the high nibble
    int lower = BitCount(buf[i] & 0xF);  // set bits in the low nibble
    PackedByte <<= 1;                    // make room for the next bit
    if (upper > lower)
        PackedByte++;
    else if (upper == lower)
        Handle_Indeterminate();
}
Various simplifications are absent in the above; it is shown for the logic flow (BitCount() is assumed to count the set bits of its argument).
To get the number abcd from (a, b, c, d), you need to shift each number to its place and OR them together:
(a<<6)|(b<<4)|(c<<2)|d
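As a quick check, a minimal sketch applying that shift-and-OR expression to the question's example values (0b01, 0b10, 0b10, 0b01):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a = 1, b = 2, c = 2, d = 1;  /* 0b01, 0b10, 0b10, 0b01 */
    uint8_t packed = (uint8_t)((a << 6) | (b << 4) | (c << 2) | d);
    printf("0x%02X\n", (unsigned)packed);  /* prints 0x69, i.e. 0b01101001 */
    return 0;
}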

converting little endian hex to big endian decimal in C

I am trying to understand and implement a simple file system based on FAT12. I am currently looking at the following snippet of code and it's driving me crazy:
int getTotalSize(char * mmap)
{
    int *tmp1 = malloc(sizeof(int));
    int *tmp2 = malloc(sizeof(int));
    int retVal;

    *tmp1 = mmap[19];
    *tmp2 = mmap[20];
    printf("%d and %d read\n", *tmp1, *tmp2);
    retVal = *tmp1 + ((*tmp2) << 8);
    free(tmp1);
    free(tmp2);
    return retVal;
};
From what I've read so far, the FAT12 format stores integers in little-endian format, and the code above is getting the size of the file system, which is stored in the 19th and 20th bytes of the boot sector.
However, I don't understand why retVal = *tmp1 + ((*tmp2) << 8); works. Is the bitwise << 8 converting the second byte to decimal, or to big-endian format?
Why is it only applied to the second byte and not the first one?
The bytes in question are (in little-endian format):
40 0B
I tried converting them manually by switching the order first to
0B 40
and then converting from hex to decimal, and I get the right output. I just don't understand how adding the first byte to the shifted second byte does the same thing?
Thanks
The use of malloc() here is seriously facepalm-inducing. Utterly unnecessary, and a serious "code smell" (makes me doubt the overall quality of the code). Also, mmap clearly should be unsigned char (or, even better, uint8_t).
That said, the code you're asking about is pretty straight-forward.
Given two byte-sized values a and b, there are two ways of combining them into a 16-bit value (which is what the code is doing): you can either consider a to be the most significant byte, or b.
Using boxes, the 16-bit value can look either like this:
+---+---+
| a | b |
+---+---+
or like this, if you instead consider b to be the most significant byte:
+---+---+
| b | a |
+---+---+
The way to combine the lsb and the msb into a 16-bit value is simply:
result = (msb * 256) + lsb;
UPDATE: The 256 comes from the fact that that's the "worth" of each successively more significant byte in a multibyte number. Compare it to the role of 10 in a decimal number (to combine two single-digit decimal numbers c and d you would use result = 10 * c + d).
Consider msb = 0x01 and lsb = 0x00, then the above would be:
result = 0x1 * 256 + 0 = 256 = 0x0100
You can see that the msb byte ended up in the upper part of the 16-bit value, just as expected.
Your code is using << 8 to shift bits to the left, which is the same as multiplying by 2^8, i.e. 256.
Note that result above is a value, i.e. not a byte buffer in memory, so its endianness doesn't matter.
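For what it's worth, a minimal sketch of the same function without the needless heap allocations (assuming the buffer parameter can be changed to an unsigned byte pointer, as suggested above):

#include <stdint.h>

int getTotalSize(const uint8_t *mmap)
{
    /* little-endian: byte 19 is the lsb, byte 20 is the msb */
    return mmap[19] | (mmap[20] << 8);
}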
I see no problem combining individual digits or bytes into larger integers.
Let's do decimal with 2 digits: 1 (least significant) and 2 (most significant):
1 + 2 * 10 = 21 (10 is the system base)
Let's now do base-256 with 2 digits: 0x40 (least significant) and 0x0B (most significant):
0x40 + 0x0B * 0x100 = 0x0B40 (0x100=256 is the system base)
The problem, however, likely lies somewhere else, in how 12-bit integers are stored in FAT12.
A 12-bit integer occupies 1.5 8-bit bytes. And in 3 bytes you have 2 12-bit integers.
Suppose, you have 0x12, 0x34, 0x56 as those 3 bytes.
In order to extract the first integer you only need take the first byte (0x12) and the 4 least significant bits of the second (0x04) and combine them like this:
0x12 + ((0x34 & 0x0F) << 8) == 0x412
In order to extract the second integer you need to take the 4 most significant bits of the second byte (0x03) and the third byte (0x56) and combine them like this:
(0x56 << 4) + (0x34 >> 4) == 0x563
If you read the official Microsoft's document on FAT (look up fatgen103 online), you'll find all the FAT relevant formulas/pseudo code.
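A small sketch of that extraction, using the example bytes above (0x12, 0x34, 0x56):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint8_t bytes[3] = { 0x12, 0x34, 0x56 };

    /* combine each byte/half-byte exactly as described above */
    unsigned first  = bytes[0] | ((bytes[1] & 0x0F) << 8);  /* 0x412 */
    unsigned second = (bytes[2] << 4) | (bytes[1] >> 4);    /* 0x563 */

    printf("0x%03X 0x%03X\n", first, second);
    return 0;
}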
The << operator is the left-shift operator. It takes the value to the left of the operator and shifts it by the number of bits given on the right side of the operator.
So in your case, it shifts the value of *tmp2 eight bits to the left, and combines it with the value of *tmp1 to generate a 16 bit value from two eight bit values.
For example, let's say you have the integer 1. This is, in 16-bit binary, 0000000000000001. If you shift it left by eight bits, you end up with the binary value 0000000100000000, i.e. 256 in decimal.
The presentation (i.e. binary, decimal or hexadecimal) has nothing to do with it. All integers are stored the same way on the computer.

How to create a mask with the least significant bits set to 1 in C

Can someone please explain this function to me?
A mask with the least significant n bits set to 1.
Ex:
n = 6 --> 0x2F, n = 17 --> 0x1FFFF // I don't get these at all, especially how n = 6 --> 0x2F
Also, what is a mask?
The usual way is to take a 1, and shift it left n bits. That will give you something like: 00100000. Then subtract one from that, which will clear the bit that's set, and set all the less significant bits, so in this case we'd get: 00011111.
A mask is normally used with bitwise operations, especially and. You'd use the mask above to get the 5 least significant bits by themselves, isolated from anything else that might be present. This is especially common when dealing with hardware that will often have a single hardware register containing bits representing a number of entirely separate, unrelated quantities and/or flags.
A mask is a common term for an integer value that is bit-wise ANDed, ORed, XORed, etc with another integer value.
For example, if you want to extract the 8 least significant bits of an int variable, you do variable & 0xFF. 0xFF is a mask.
Likewise if you want to set bits 0 and 8, you do variable | 0x101, where 0x101 is a mask.
Or if you want to invert the same bits, you do variable ^ 0x101, where 0x101 is a mask.
To generate a mask for your case you should exploit the simple mathematical fact that if you add 1 to your mask (the mask having all its least significant bits set to 1 and the rest to 0), you get a value that is a power of 2.
So, if you generate the closest power of 2, then you can subtract 1 from it to get the mask.
Positive powers of 2 are easily generated with the left shift << operator in C.
Hence, 1 << n yields 2^n. In binary it's 10...0 with n 0s.
(1 << n) - 1 will produce a mask with n lowest bits set to 1.
Now, you need to watch out for overflows in left shifts. In C (and in C++) you can't legally shift a variable left by as many bit positions as the variable has, so if ints are 32-bit, 1<<32 results in undefined behavior. Signed integer overflows should also be avoided, so you should use unsigned values, e.g. 1u << 31.
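Putting those cautions together, a minimal sketch (assuming a 32-bit unsigned int) that avoids both the signed-overflow and the full-width-shift pitfalls:

#include <stdio.h>

unsigned int low_bits_mask(unsigned int n)
{
    if (n >= 32)               /* shifting a 32-bit value by 32 is undefined */
        return 0xFFFFFFFFu;
    return (1u << n) - 1u;     /* n least significant bits set */
}

int main(void)
{
    printf("n = 6  -> 0x%X\n", low_bits_mask(6));   /* 0x3F */
    printf("n = 17 -> 0x%X\n", low_bits_mask(17));  /* 0x1FFFF */
    return 0;
}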
For both correctness and performance, the best way to accomplish this has changed since this question was asked back in 2012 due to the advent of BMI instructions in modern x86 processors, specifically BLSMSK.
Here's a good way of approaching this problem, while retaining backwards compatibility with older processors.
This method is correct, whereas the current top answers produce undefined behavior in edge cases.
Clang and GCC, when allowed to optimize using BMI instructions, will condense gen_mask() to just two ops. With supporting hardware, be sure to add compiler flags for BMI instructions:
-mbmi -mbmi2
#include <inttypes.h>
#include <stdio.h>

uint64_t gen_mask(const uint_fast8_t msb) {
    const uint64_t src = (uint64_t)1 << msb;
    return (src - 1) ^ src;
}

int main() {
    uint_fast8_t msb;
    for (msb = 0; msb < 64; ++msb) {
        printf("%016" PRIx64 "\n", gen_mask(msb));
    }
    return 0;
}
First, for those who only want the code to create the mask:
uint64_t bits = 6;
uint64_t mask = ((uint64_t)1 << bits) - 1;
// Results in 0b111111 (or 0x3F)
Thanks to @Benni who asked about using bits = 64. If you need the code to support this value as well, you can use:
uint64_t bits = 6;
uint64_t mask = (bits < 64)
    ? ((uint64_t)1 << bits) - 1
    : (uint64_t)0 - 1;
For those who want to know what a mask is:
A mask is usually a name for a value that we use to manipulate other values using bitwise operations such as AND, OR, XOR, etc.
Short masks are usually represented in binary, where we can explicitly see all the bits that are set to 1.
Longer masks are usually represented in hexadecimal, which is really easy to read once you get the hang of it.
I believe your first example should be 0x3f.
0x3f is hexadecimal notation for the number 63, which is 111111 in binary, so the last 6 bits (the least significant 6 bits) are set to 1.
The following little C program will calculate the correct mask:
#include <stdio.h>

int mask_for_n_bits(int n)
{
    int mask = 0;
    for (int i = 0; i < n; ++i)
        mask |= 1 << i;
    return mask;
}

int main(int argc, char const *argv[])
{
    printf("6: 0x%x\n17: 0x%x\n", mask_for_n_bits(6), mask_for_n_bits(17));
    return 0;
}
0x2F is 0010 1111 in binary - this should be 0x3f, which is 0011 1111 in binary and which has the 6 least-significant bits set.
Similarly, 0x1FFFF is 0001 1111 1111 1111 1111 in binary, which has the 17 least-significant bits set.
A "mask" is a value that is intended to be combined with another value using a bitwise operator like &, | or ^ to individually set, unset, flip or leave unchanged the bits in that other value.
For example, if you combine the mask 0x3F with some value n using the & operator, the result will have zeroes in all but the 6 least significant bits, and those 6 bits will be copied unchanged from the value n.
In the case of an & mask, a binary 0 in the mask means "unconditionally set the result bit to 0" and a 1 means "set the result bit to the input bit". For an | mask, a 0 in the mask sets the result bit to the input bit and a 1 unconditionally sets the result bit to 1; and for an ^ mask, a 0 sets the result bit to the input bit and a 1 sets the result bit to the complement of the input bit.

Homework - C bit puzzle - Perform % using C bit operations (no looping, conditionals, function calls, etc)

I'm completely stuck on how to do this homework problem and looking for a hint or two to keep me going. I'm limited to 20 operations (= doesn't count in this 20).
I'm supposed to fill in a function that looks like this:
/* Supposed to do x%(2^n).
   For example: for x = 15 and n = 2, the result would be 3.
   Additionally, if positive overflow occurs, the result should be the
   maximum positive number, and if negative overflow occurs, the result
   should be the most negative number. */
int remainder_power_of_2(int x, int n){
    int twoToN = 1 << n;
    /* Magic...? How can I do this without looping? We are assuming it is a
       32 bit machine, and we can't use constants bigger than 8 bits
       (0xFF is valid for example).
       However, I can make a 32 bit number by ORing together a bunch of stuff.
       Valid operations are: << >> + ~ ! | & ^ */
    return theAnswer;
}
I was thinking maybe I could shift the twoToN over left... until I somehow check (without if/else) that it is bigger than x, and then shift back to the right once... then xor it with x... and repeat? But I only have 20 operations!
Hint: In the decimal system, to do a modulo by a power of 10, you just keep the last few digits and zero out the others. E.g., 12345 % 100 = 00045 = 45. Well, in a computer, numbers are binary, so you have to zero out the binary digits (bits). Look at the various bit-manipulation operators (&, |, ^) to do so.
Since binary is base 2, remainders mod 2^N are exactly represented by the rightmost bits of a value. For example, consider the following 32 bit integer:
00000000001101001101000110010101
This has the two's complement value of 3461525. The remainder mod 2 is exactly the last bit (1). The remainder mod 4 (2^2) is exactly the last 2 bits (01). The remainder mod 8 (2^3) is exactly the last 3 bits (101). Generally, the remainder mod 2^N is exactly the last N bits.
In short, you need to be able to take your input number, and mask it somehow to get only the last few bits.
A tip: say you're using mod 64. The value of 64 in binary is:
00000000000000000000000001000000
The remainder you're interested in is the last 6 bits. I'll provide you a sequence of operations that can transform that number into a mask (but I'm not going to tell you what they are, you can figure them out yourself :D)
00000000000000000000000001000000 // starting value
11111111111111111111111110111111 // ???
11111111111111111111111111000000 // ???
00000000000000000000000000111111 // the mask you need
Each of those steps equates to exactly one operation that can be performed on an int type. Can you figure them out? Can you see how to simplify my steps? :D
Another hint:
00000000000000000000000001000000 // 64
11111111111111111111111111000000 // -64
Since your divisor is always a power of two, it's easy.
uint32_t remainder(uint32_t number, uint32_t power)
{
    power = 1 << power;             /* 'power' is the exponent n, so this yields the divisor 2^n */
    return (number & (power - 1));
}
Suppose you input number as 5 and divisor as 2
`00000000000000000000000000000101` number
AND
`00000000000000000000000000000001` divisor - 1
=
`00000000000000000000000000000001` remainder (what we expected)
Suppose you input number as 7 and divisor as 4
`00000000000000000000000000000111` number
AND
`00000000000000000000000000000011` divisor - 1
=
`00000000000000000000000000000011` remainder (what we expected)
This only works as long as the divisor is a power of two (except for divisor = 1), so use it carefully.
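As a quick sanity check against the homework's own example (x = 15, n = 2 should give 3), here is a minimal usage sketch; note that it ignores the assignment's extra requirements for negative values and overflow saturation:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint32_t x = 15, n = 2;
    /* mask off all but the n least significant bits */
    printf("%" PRIu32 "\n", x & ((UINT32_C(1) << n) - 1));  /* prints 3, i.e. 15 % 4 */
    return 0;
}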

Resources