Project Euler Problem 10 - Efficient Algorithm - c

I attempted Project Euler's problem 10 with the naive algorithm, and the running time looks like hours. So I googled for an efficient algorithm and found this one by Shlomi Fish.
The code is reproduced below:
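#include <stdio.h>
#include <string.h>
#include <math.h>

/* The declarations of limit and bitmask were not shown in the question;
   these are assumed (problem 10 sums the primes below two million). */
#define limit 2000000
unsigned char bitmask[limit / 8 + 1];

int main(int argc, char * argv[])
{
    int p, i;
    int mark_limit;
    long long sum = 0;
    memset(bitmask, '\0', sizeof(bitmask));
    mark_limit = (int)sqrt(limit);
    for (p=2 ; p <= mark_limit ; p++)
    {
        if (! ( bitmask[p>>3]&(1 << (p&(8-1))) ) )
        {
            /* It is a prime. */
            sum += p;
            for (i=p*p;i<=limit;i+=p)
            {
                bitmask[i>>3] |= (1 << (i&(8-1)));
            }
        }
    }
    /* Numbers above sqrt(limit) that are still unmarked are prime. */
    for (; p <= limit; p++)
    {
        if (! ( bitmask[p>>3]&(1 << (p&(8-1))) ) )
        {
            sum += p;
        }
    }
    /* Assumed ending: the question cut the program off here. */
    printf("%lld\n", sum);
    return 0;
}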
I have problems understanding the code. Specifically, how is this bit-shifting code able to determine whether a number is prime or not?
if (! ( bitmask[p>>3]&(1 << (p&(8-1))) ) )
{
    /* It is a prime. */
    sum += p;
    for (i=p*p;i<=limit;i+=p)
    {
        bitmask[i>>3] |= (1 << (i&(8-1)));
    }
}
Can someone please explain this code block to me, especially this part: bitmask[p>>3]&(1 << (p&(8-1)))? Thank you very much.

The code is a modified Sieve of Eratosthenes. It packs the flag for each number into a single bit: 0 = prime, 1 = composite. The bit shifting is there to get to the correct bit in the byte array.
bitmask[p>>3]
is equivalent to
bitmask[p / 8]
which selects the correct byte in the bitmask[] array.
(p&(8-1))
equals p & 7, which selects the lower 3 bits of p. For non-negative p this is equivalent to p % 8.
Overall we are selecting bit (p % 8) of byte bitmask[p / 8]. That is we are selecting the bit in the bitmask[] array which represents the number p.
The 1 << (p % 8) sets up a 1 bit correctly located in a byte. This is then AND'ed with the bitmask[p / 8] byte to see if that particular bit is set or not, thus checking whether p is a prime number.
The overall statement equates to if (isPrime(p)), using the already completed part of the sieve to help extend the sieve.
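To make the byte and bit selection concrete, here is a minimal sketch that wraps the two expressions in helper functions. The names is_marked, mark and sum_primes, and the small LIMIT of 100, are assumptions made for illustration; they are not part of the original program.

```c
#include <string.h>

#define LIMIT 100                      /* hypothetical small limit for illustration */

static unsigned char bitmask[LIMIT / 8 + 1];

/* Nonzero if the bit for number n is set, i.e. n was marked composite. */
static int is_marked(int n)
{
    return bitmask[n / 8] & (1u << (n % 8));   /* same as bitmask[n>>3] & (1 << (n&7)) */
}

/* Set the bit for number n, marking it composite. */
static void mark(int n)
{
    bitmask[n / 8] |= (unsigned char)(1u << (n % 8));
}

/* Sum of all primes <= LIMIT, using the same sieve idea as the question. */
static long long sum_primes(void)
{
    long long sum = 0;
    memset(bitmask, 0, sizeof bitmask);
    for (int p = 2; p <= LIMIT; p++) {
        if (!is_marked(p)) {           /* bit still 0: p is prime */
            sum += p;
            for (int i = p * p; i <= LIMIT; i += p)
                mark(i);
        }
    }
    return sum;
}
```

With LIMIT set to 100, sum_primes() returns 1060, the sum of the 25 primes below 100.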

The bitmask is acting as an array of bits. Since you can't address bits individually, you first have to access the byte and then modify a bit within it. Shifting right by 3 is the same as dividing by 8, which puts you on the right byte. The one is then shifted into place by the remainder.
x>>3 is equivalent to x/8 (for non-negative x)
x&(8-1) is equivalent to x%8 (for non-negative x)
But on some older systems, the bit manipulations may have been faster.
The line below sets the ith bit, where i has been determined not to be prime because it is a multiple of another prime number:
bitmask[i>>3] |= (1 << (i&(8-1)));
This line checks that the pth bit is not set, which means it is prime, since if it wasn't prime it would have been set by the line above.
if (! ( bitmask[p>>3]&(1 << (p&(8-1))) ) )

Related

left Rotate algorithm binary value in c

I'm trying to left rotate binary value:
int left_side=0;
for (int i = 0; i < n; i++)
{
left_side = left_side | ( ( number & ( 1<<BITS-i )) >> (BITS+i+1)-n );
}
BITS indicates the length of the binary value, n is the distance for rotation.
Example: 1000 , and n=1 which means the solution will be: 0001.
For some reason I don't understand, when I rotate it (from the left to the right side), I get the wrong result. Take the number 253, which is 11111101 in binary, and n=3 (distance): the result from my code in binary is 101 (which is 5).
Why isn't the answer 7? What did I miss in the condition in this loop?
Thanks.
You want to rotate your number n left by a specific number of bits.
Thus you have to shift your number to the left using n << amount and put the bits that fall off the left end back on the right. This is done by moving the bits [NUMBER_BITS-amount, NUMBER_BITS[ down to [0, amount[ using a right shift.
For instance, if your number is a uint32_t you can use the following code (or easily adapt it to other types).
#include <stdint.h>
uint32_t RotateLeft(uint32_t n, int amount) {
    amount &= 31; /* keep the shift count in 0..31 */
    /* (32 - amount) & 31 avoids an undefined shift by 32 when amount is 0 */
    return (n << amount)|(n >> ((32 - amount) & 31));
}
I think there are 2 approaches:
rotate by 1 bit n times
rotate by n % BITS bits only once
The rotation wraps around, so rotating by any multiple of BITS (8 in the demonstration below) changes nothing; reducing n modulo BITS first also covers requests such as rotating 100 times. I think the second approach is easier to understand, implement and read, and why loop 100 times if it can be done once?
Algorithm:
I will use 8 bits for demonstration, even though int is minimum 16 bits. The original MSB is marked by a dot.
.11110000 rl 2 == 110000.11
Well what has happened? 2 bits went right and the rest (BITS - 2) went left. That is just shift left and shift right "combined".
a = .11110000 << 2 == 110000.00
b = .11110000 >> (BITS - 2) == 000000.11
c = a | b
c == 110000.11
Easy, isn't it? Just remember to use n % BITS first and to use an unsigned type.
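#include <limits.h>
#define BITS ((int)(CHAR_BIT * sizeof(unsigned int))) /* width of the rotated type */
unsigned int rotateLeft(unsigned int number, int n) {
    n %= BITS;
    if (n < 0) n += BITS;      /* a negative count rotates the other way */
    if (n == 0) return number; /* shifting by BITS below would be undefined */
    unsigned int left = number << n;
    unsigned int right = number >> (BITS - n);
    return left | right;
}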
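As a check against the 8-bit numbers used in the question, here is a minimal sketch of the same shift-and-OR idea restricted to 8 bits. The helper name rotl8 is an assumption for illustration only.

```c
#include <stdint.h>

/* Rotate an 8-bit value left by n positions (hypothetical helper). */
static uint8_t rotl8(uint8_t value, unsigned n)
{
    n %= 8;                                    /* rotating by 8 is a no-op */
    if (n == 0)
        return value;
    /* value is promoted to int, so the cast drops the bits shifted past bit 7 */
    return (uint8_t)((value << n) | (value >> (8 - n)));
}
```

Under these assumptions rotl8(253, 3) evaluates to 239 (11101111), and the three bits that wrapped around, 253 >> 5, are 7 (111), which is the value the question expected for the left side.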

Convert int to binary string of certain size

I'm struggling to adapt to C after programming in Java for some time and I need help. What I'm looking for is a method that takes following input:
Integer n, the one to be converted to binary string (character array).
Integer length, which defines the length of the string (positions from the left not filled with the binary numbers are going to be set to default 0).
//Here's some quick code in Java to get a better understanding of what I'm looking for:
public static String convertToBinary(int length, int n) {
    return String.format("%1$" + length + "s", Integer.toBinaryString(n)).replace(' ', '0');
}
System.out.println(convertToBinary(8,1));
// OUTPUT:
00000001 (not just 1 or 01)
Any hints on what the equivalent of this would be in C? Also, could you provide me with an example of how the resulting binary string should be returned?
(not a duplicate, since what I'm looking for is '00000001', not simply '1')
The C standard library does not contain an equivalent function to Integer.toBinaryString(). The good news is, writing such a function won't be too complicated, and if you're in the process of learning C, this problem is fairly ideal for learning how to use the bitwise operators.
You'll want to consult an existing tutorial or manual for all the details, but here are a few examples of the sort of things that would be useful for this or similar tasks. All numbers are unsigned integers in these examples.
n >> m shifts all bits in n right by m steps, and fills in zeros on the left side. So if n = 13 (1101 in binary), n >> 1 would be 6 (i.e. 110), and n >> 2 would be 3 (i.e. 11).
n << m does the same thing, but shifting left. 3 << 2 == 12. This is equivalent to multiplying n by 2 to the power of m. (If it isn't obvious why that is, you'll want to think about how binary numbers are represented for awhile until you understand it clearly; it'll make things easier if you have an intuitive understanding of that property.)
n & m evaluates to a number such that each bit of the result is 1 if and only if it's 1 in both n and m. e.g. 12 & 5 == 4, (1100, 0101, and 0100 being the respective representations of 12, 5, and 4).
So putting those together, n & (1 << i) will be nonzero if and only if bit i is set: 1 obviously only has a single bit set, 1 << i moves it to the appropriate position, and n & (1 << i) checks if that position also has a 1 bit for n. (keeping in mind that the rightmost/least significant bit is bit 0, not bit 1.) So using that, it's a simple matter of checking each bit individually to see if it's 1 or 0, and you have your binary conversion function.
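The bit-by-bit check described above could be sketched as follows. This is only one possible shape: the name to_binary and the caller-supplied buffer are assumptions for illustration, and the function expects 0 < length <= the number of bits in an unsigned int.

```c
/* Write the lowest `length` bits of n into buf as '0'/'1' characters,
   most significant bit first. buf must hold at least length + 1 chars.
   Assumes 0 < length <= number of bits in unsigned int. */
static char *to_binary(unsigned int n, int length, char *buf)
{
    for (int i = 0; i < length; i++) {
        int bit = length - 1 - i;              /* bit index, counted from bit 0 */
        buf[i] = (n & (1u << bit)) ? '1' : '0';
    }
    buf[length] = '\0';
    return buf;
}
```

Called with a 9-byte buffer as to_binary(1, 8, buf), this fills buf with 00000001. Letting the caller own the buffer avoids malloc and free; returning a freshly allocated string would be the other obvious design.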
like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
char *convertToBinary(int length, int n) {
    unsigned num = (unsigned)n;
    int n_bit = CHAR_BIT * sizeof(num);
    if(length > n_bit){
        fprintf(stderr, "specified length greater than maximum length.\n");
        length = n_bit;//or expand size?
    }
    char *bin = malloc(n_bit + 1);//static char bin[CHAR_BIT * sizeof(num)+1]; If you change, memmove(-->return p;), free is not necessary.
    if(bin == NULL)
        return NULL;
    memset(bin, '0', n_bit);
    bin[n_bit] = 0;
    char *p = bin + n_bit;
    do {
        *--p = "01"[num & 1];
        num >>= 1;
    }while(num);
    int bits = bin + n_bit - p;
    if(bits < length){
        p -= length - bits;
        return memmove(bin, p, length + 1);
    } else if(bits > length){
        fprintf(stderr, "Specified length is not enough.(%s but length is %d)\n", p, length);
        return memmove(bin, p, bits+1);//or cut off
        /*
        free(bin);
        return ""; or return NULL;
        */
    }
    /* bits == length: the digits still have to be moved to the front of the buffer */
    return memmove(bin, p, bits + 1);
}
int main(void){
    char *sbin = convertToBinary(8, 1);
    if(sbin)
        puts(sbin);
    free(sbin);
    return 0;
}

Sieve of Eratosthenes using a bit array

I have a bit array prime[] of unsigned int. I wish to implement a Sieve of Eratosthenes using this array, by having each bit represent a number n. That is, given n, the array element that holds the bit that corresponds to n would be prime[n/32] and the specific bit would be in position n%32.
My testBitIs0(int n) function returns 1 when the number is prime (if its bit == 0), otherwise 0:
return ( (prime[n/32] & (1 << (n%32) )) != 0);
My setBit(int n) function simply sets the bit to 1 at the corresponding position:
int i = n/32;
int pos = n%32;
unsigned int flag = 1;
flag = flag << pos;
prime[i] = prime[i] | flag;
The issue I'm having is that setBit doesn't seem to set the bit correctly. When I call it with multiples of a prime number (such as 4, 6, 8, etc. for the number 2), the next time I run this line:
if(testBitIs0(i)) { ... }
With i = 4/6/8/etc it will still return 1 when it should return 0.
Can someone please check my code to make sure I am implementing this correctly? Thanks.
This looks like it does what you're after. There's a bit array and some bit twiddling functions too.
http://bcu.copsewood.net/dsalg/bitwise/bitwise.html
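One thing that stands out in the code as posted: testBitIs0 compares the masked value with != 0, which yields 1 when the bit is set, the opposite of what the description says it should do. That alone may explain the symptom. A minimal sketch of the two functions with the comparison flipped, assuming a 32-bit unsigned int and a hypothetical array size MAX_N, might look like this:

```c
#define MAX_N 1000                         /* hypothetical sieve size */

static unsigned int prime[MAX_N / 32 + 1]; /* zero-initialized: nothing crossed out yet */

/* Returns 1 when the bit for n is 0 (n not yet crossed out), otherwise 0. */
static int testBitIs0(int n)
{
    /* 1u keeps the shift unsigned, so shifting into bit 31 is well defined */
    return (prime[n / 32] & (1u << (n % 32))) == 0;
}

/* Sets the bit for n to 1, marking n as composite. */
static void setBit(int n)
{
    prime[n / 32] |= 1u << (n % 32);
}
```

With this version testBitIs0(4) returns 1 before setBit(4) is called and 0 afterwards.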

How do the bit manipulations in this bit-sorting code work?

Jon Bentley, in Column 1 of his book Programming Pearls, introduces a technique for sorting a sequence of non-zero positive integers using bit vectors.
I have taken the program bitsort.c from here and pasted it below:
/* Copyright (C) 1999 Lucent Technologies */
/* From 'Programming Pearls' by Jon Bentley */
/* bitsort.c -- bitmap sort from Column 1
* Sort distinct integers in the range [0..N-1]
*/
#include <stdio.h>
#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000
int a[1 + N/BITSPERWORD];
void set(int i)
{
    int sh = i>>SHIFT;
    a[i>>SHIFT] |= (1<<(i & MASK));
}
void clr(int i) { a[i>>SHIFT] &= ~(1<<(i & MASK)); }
int test(int i){ return a[i>>SHIFT] & (1<<(i & MASK)); }
int main()
{   int i;
    for (i = 0; i < N; i++)
        clr(i);
    /*Replace above 2 lines with below 3 for word-parallel init
    int top = 1 + N/BITSPERWORD;
    for (i = 0; i < top; i++)
        a[i] = 0;
    */
    while (scanf("%d", &i) != EOF)
        set(i);
    for (i = 0; i < N; i++)
        if (test(i))
            printf("%d\n", i);
    return 0;
}
I understand what the functions clr, set and test are doing and explain them below: ( please correct me if I am wrong here ).
clr clears the ith bit
set sets the ith bit
test returns the value at the ith bit
Now, I don't understand how the functions do what they do. I am unable to figure out all the bit manipulation happening in those three functions.
The first 3 constants are inter-related. BITSPERWORD is 32. This you'd want to set based on your compiler+architecture. SHIFT is 5, because 2^5 = 32. Finally, MASK is 0x1F which is 11111 in binary (ie: the bottom 5 bits are all set). Equivalently, MASK = BITSPERWORD - 1.
The bitset is conceptually just an array of bits. This implementation actually uses an array of ints, and assumes 32 bits per int. So whenever we want to set, clear or test (read) a bit we need to figure out two things:
which int (of the array) is it in
which of that int's bits are we talking about
Because we're assuming 32 bits per int, we can just divide by 32 (and truncate) to get the array index we want. Dividing by 32 (BITSPERWORD) is the same as shifting to the right by 5 (SHIFT). So that's what the a[i>>SHIFT] bit is about. You could also write this as a[i/BITSPERWORD] (and in fact, you'd probably get the same or very similar code assuming your compiler has a reasonable optimizer).
Now that we know which element of a we want, we need to figure out which bit. Really, we want the remainder. We could do this with i%BITSPERWORD, but it turns out that i&MASK is equivalent. This is because BITSPERWORD is a power of 2 (2^5 in this case) and MASK is the bottom 5 bits all set.
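Since the word size depends on the compiler and architecture, the same three operations can also be written with plain division and remainder and the width taken from limits.h. This is only a sketch: the names set_bit, clear_bit, test_bit, words, BITS_PER_WORD and N_BITS are made up for illustration, and it uses unsigned words rather than the int array of the original program.

```c
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned int) * CHAR_BIT)   /* 32 on most platforms */
#define N_BITS 1000                                       /* hypothetical capacity */

static unsigned int words[N_BITS / BITS_PER_WORD + 1];

/* Word index is i / BITS_PER_WORD, bit position inside it is i % BITS_PER_WORD. */
static void set_bit(unsigned int i)   { words[i / BITS_PER_WORD] |=  (1u << (i % BITS_PER_WORD)); }
static void clear_bit(unsigned int i) { words[i / BITS_PER_WORD] &= ~(1u << (i % BITS_PER_WORD)); }
static int  test_bit(unsigned int i)  { return (words[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1u; }
```

On a platform with 32-bit unsigned int, set_bit(37) touches words[1] and sets bit 5 in it, because 37 / 32 is 1 and 37 % 32 is 5.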
Basically it is an optimized bucket sort:
reserve a bit array of length n bits.
clear the bit array (the first for loop in main).
read the items one by one (they must all be distinct).
set the i'th bit in the bit array if the read number is i.
iterate over the bit array.
if the bit is set then print the position.
Or in other words (for values below 10, sorting the 3 numbers 4, 6, 2):
start with an empty 10-bit array (which usually fits in one integer)
0000000000
read 4 and set the bit in the array
0000100000
read 6 and set the bit in the array
0000101000
read 2 and set the bit in the array
0010101000
iterate the array and print every position in which the bits are set to one.
2, 4, 6
sorted.
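The walk-through above fits in a few lines of C. This is a minimal sketch for values below 10 that keeps all the bits in a single unsigned int; the names bitmap_sort and RANGE are assumptions for illustration.

```c
#include <stddef.h>

#define RANGE 10                     /* values must lie in [0, RANGE) */

/* Sorts `count` distinct values from `in` into `out`; returns how many were written. */
static size_t bitmap_sort(const int *in, size_t count, int *out)
{
    unsigned int bits = 0;           /* RANGE bits fit in one unsigned int */
    size_t written = 0;

    for (size_t k = 0; k < count; k++)
        bits |= 1u << in[k];         /* set the bit for each value read */

    for (int i = 0; i < RANGE; i++)  /* walk the bits in increasing order */
        if (bits & (1u << i))
            out[written++] = i;

    return written;
}
```

For the input 4, 6, 2 the output array holds 2, 4, 6.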
Starting with set():
A right shift of 5 is the same as dividing by 32. It does that to find which int the bit is in.
MASK is 0x1f or 31. ANDing with the address gives the bit index within the int. It's the same as the remainder of dividing the address by 32.
Shifting 1 left by the bit index ("1<<(i & MASK)") results in an integer which has just 1 bit in the given position set.
ORing sets the bit.
The line "int sh = i>>SHIFT;" is a wasted line, because they didn't use sh again beneath it, and instead just repeated "i>>SHIFT"
clr() is basically the same as set, except instead of ORing with 1<<(i & MASK) to set the bit, it ANDs with the inverse to clear the bit. test() ANDs with 1<<(i & MASK) to test the bit.
The bitsort will also remove duplicates from the list, because it only records each integer at most once. A sort that uses integer counters instead of bits, so that it can count more than one of each value, is called a counting sort.
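For comparison, here is a minimal sketch of the counter-based variant, which keeps duplicates because each value gets a full counter instead of a single bit. The names counting_sort and MAX_VALUE are assumptions for illustration.

```c
#include <stddef.h>

#define MAX_VALUE 10                 /* values must lie in [0, MAX_VALUE) */

/* Sorts `count` values from `in` into `out`, keeping duplicates. */
static void counting_sort(const int *in, size_t count, int *out)
{
    size_t counts[MAX_VALUE] = {0};  /* one counter per possible value */
    size_t written = 0;

    for (size_t k = 0; k < count; k++)
        counts[in[k]]++;             /* tally each value read */

    for (int v = 0; v < MAX_VALUE; v++)
        for (size_t c = 0; c < counts[v]; c++)
            out[written++] = v;      /* emit v as many times as it occurred */
}
```

For the input 4, 6, 2, 4 the output array holds 2, 4, 4, 6, where the bit version would have printed 4 only once.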
The bit magic is used as a special addressing scheme that works well with row sizes that are powers of two.
If you try to understand this (note: I'd rather use bits-per-row than bits-per-word, since we're talking about a bit matrix here):
// supposing an int of 1 bit would exist...
int1 bits[BITSPERROW * N]; // an array of N x BITSPERROW elements
// set bit at x,y:
int linear_address = y*BITSPERROW + x;
bits[linear_address] = 1; // or 0
// 0 1 2 3 4 5 6 7 8 9 10 11 ... 31
// . . . . . . . . . . . . .
// . . . . X . . . . . . . . -> x = 4, y = 1 => i = (1*32 + 4)
The statement linear_address = y*BITSPERROW + x also means that x = linear_address % BITSPERROW and y = linear_address / BITSPERROW.
When you optimize this in memory by using 1 word of 32 bits per row, you get the fact that a bit at column x can be set using
int bitrow = 0;
bitrow |= 1 << (x);
Now when we iterate over the bits, we have the linear address, but need to find the corresponding word.
int column = linear_address % BITSPERROW;
int bit_mask = 1 << column; // meaning for the xth column,
// you take 1 and shift that bit x times
int row = linear_address / BITSPERROW;
So to set the i'th bit, you can do this:
bits[ i / BITSPERROW ] |= 1 << ( i % BITSPERROW );
An extra gotcha is that the modulo operator can be replaced by a bitwise AND, and the / operator can be replaced by a shift, if the second operand is a power of two.
a % BITSPERROW == a & ( BITSPERROW - 1 ) == a & MASK
a / BITSPERROW == a >> ( log2(BITSPERROW) ) == a >> SHIFT
This ultimately boils down to the very dense, yet hard-to-understand-for-the-uninitiated notation
a[ i >> SHIFT ] |= ( 1 << (i&MASK) );
But I don't see the algorithm working for e.g. 40 bits per word.
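On the 40-bits remark: the division and remainder forms work for any row width, and only the shift-and-mask shortcut needs a power of two. A small sketch shows the difference; the helper names row_of, column_of and column_by_mask are assumptions for illustration.

```c
/* Row and column of bit `a` for an arbitrary row width (any width > 0). */
static unsigned row_of(unsigned a, unsigned width)    { return a / width; }
static unsigned column_of(unsigned a, unsigned width) { return a % width; }

/* Shortcut form: only agrees with column_of when width is a power of two. */
static unsigned column_by_mask(unsigned a, unsigned width) { return a & (width - 1); }
```

For width 32 both column functions agree (bit 45 is in column 13 either way). For width 40 they do not: bit 8 is in column 8, but 8 & 39 is 0, so the mask form picks the wrong bit.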
Quoting the excerpts from Bentley's original article in DDJ, this is what the code does at a high level:
/* phase 1: initialize set to empty */
for (i = 0; i < n; i++)
    bit[i] = 0
/* phase 2: insert present elements */
for each i in the input file
    bit[i] = 1
/* phase 3: write sorted output */
for (i = 0; i < n; i++)
    if bit[i] == 1
        write i on the output file
A few doubts:
1. Why does it need 32 bits?
2. Can we do this in Java by creating a HashMap with keys from 0000000 to 9999999 and values 0 or 1 based on the presence/absence of the bit? What are the implications for such a program?

Fastest way to count number of bit transitions in an unsigned int

I'm looking for the fastest way of counting the number of bit transitions in an unsigned int.
If the int contains: 0b00000000000000000000000000001010
The number of transitions is: 4
If the int contains: 0b00000000000000000000000000001001
The number of transitions is: 3
Language is C.
int numTransitions(int a)
{
    int b = a >> 1; // sign-extending shift properly counts bits at the ends
    int c = a ^ b;  // xor marks bits that are not the same as their neighbors on the left
    return CountBits(c); // count number of set bits in c
}
For an efficient implementation of CountBits see http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
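CountBits is not defined in the snippet above. A minimal self-contained sketch, assuming 32-bit values and using the widely known parallel (SWAR) bit count, could look like the following. Note that it works on uint32_t, so the shift fills with zeros rather than sign-extending: a set top bit is counted as a transition against an implied 0 above it. The names count_bits and num_transitions are assumptions for illustration.

```c
#include <stdint.h>

/* Parallel bit count for a 32-bit value. */
static unsigned count_bits(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* sums of 2-bit groups */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* sums of 4-bit groups */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* sums of 8-bit groups */
    return (v * 0x01010101u) >> 24;                   /* add the four byte sums */
}

/* Number of positions where a bit differs from its left neighbor. */
static unsigned num_transitions(uint32_t a)
{
    return count_bits(a ^ (a >> 1));
}
```

For the two values in the question, num_transitions(0xA) is 4 and num_transitions(0x9) is 3.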
Fastest depends on your scenario:
Since you specified a fixed-size datatype (unsigned int), a lookup table is possible. But if you need this operation only once, the constant overhead of initializing the table is too big, and simply scanning and counting through the int is far faster.
I guess the overall best would be a combination: a lookup table for a byte or word (256 or 64k entries is not that much), and then combine the bytes/words by their last/first bit.
In C/C++ I would do the following:
unsigned int Transitions(unsigned int value)
{
    unsigned int result = 0;
    for (unsigned int markers = value ^ (value >> 1); markers; markers = markers >> 1)
    {
        if (markers & 0x01) result++;
    }
    return result;
}
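The lookup-table idea mentioned above can be sketched for 32-bit values as follows. This is a variant rather than exactly what was described: instead of stitching bytes together at their boundaries, it builds the XOR marker word first and then looks up the bit count of each of its four bytes. The names bits_in_byte, init_table and transitions_lut are assumptions for illustration.

```c
#include <stdint.h>

static unsigned char bits_in_byte[256];   /* bits_in_byte[b] = number of 1 bits in b */

/* Fill the table once before the first lookup. */
static void init_table(void)
{
    for (int b = 1; b < 256; b++)
        bits_in_byte[b] = (unsigned char)((b & 1) + bits_in_byte[b / 2]);
}

/* Transitions via four table lookups on the XOR marker word. */
static unsigned transitions_lut(uint32_t value)
{
    uint32_t m = value ^ (value >> 1);    /* 1 where a bit differs from its left neighbor */
    return bits_in_byte[m & 0xFF] + bits_in_byte[(m >> 8) & 0xFF]
         + bits_in_byte[(m >> 16) & 0xFF] + bits_in_byte[m >> 24];
}
```

After init_table() has run, transitions_lut(0xA) is 4 and transitions_lut(0x9) is 3, matching the loop version above.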
Here's the code using arithmetic shift + xor and Kernighan's method for bit counting:
#include <assert.h>
int count_transitions(int x)
{
    assert((-1 >> 1) < 0); // check for arithmetic shift
    int count = 0;
    for(x ^= (x >> 1); x; x &= x - 1)
        ++count;
    return count;
}
What language?
I would loop 64 times and bit shift your number to inspect each of the bits, storing the previous bit and comparing it to the current one. If it's different, increment your count.
OK, by transitions you mean that if you walk through the string of 0s and 1s, you count each occurrence where a 0 follows a 1 or a 1 follows a 0.
This is easy by shifting bits out and counting the changes:
transitions(n)
    result = 0
    prev = n mod 2
    n = n div 2
    while n<>0
        if n mod 2 <> prev then
            result++
            prev = n mod 2
        fi
        n = n div 2
    elihw
    return result
You can replace mod 2 with & 1 and div 2 with >> 1.
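A direct C translation of the pseudocode with those replacements might look like this (the function name is carried over from the pseudocode; the unsigned types are an assumption). One thing to be aware of: the loop stops once the remaining value is 0, so it does not count the step from the highest set bit into the leading zeros.

```c
/* Counts changes between neighboring bits, stopping at the highest set bit. */
static unsigned transitions(unsigned int n)
{
    unsigned result = 0;
    unsigned prev = n & 1u;        /* n mod 2 */
    n >>= 1;                       /* n div 2 */
    while (n != 0) {
        if ((n & 1u) != prev) {
            result++;
            prev = n & 1u;
        }
        n >>= 1;
    }
    return result;
}
```

As a consequence, transitions(0xA) returns 3 and transitions(0x9) returns 2, one less than the counts given in the question.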

Resources