Long long int makes my Sieve of Eratosthenes super slow? - c

I have a program that requires me to find primes up till 10**10-1 (10,000,000,000). I wrote a Sieve of Eratosthenes to do this, and it worked very well (and accurately) as high as 10**9 (1,000,000,000). I confirmed its accuracy by having it count the number of primes it found, and it matched the value of 50,847,534 on the chart found here. I used unsigned int as the storage type and it successfully found all the primes in approximately 30 seconds.
However, 10**10 requires that I use a larger storage type: long long int. Once I switched to this, the program runs significantly slower (it's been 3 hours plus and it's still working). Here is the relevant code:
typedef unsigned long long ul_long;
typedef unsigned int u_int;
ul_long max = 10000000000;
u_int blocks = 1250000000;
char memField[1250000000];
char mapBit(char place) { //convert 0->0x80, 1->0x40, 2->0x20, and so on
return 0x80 >> (place);
}
for (u_int i = 2; i*i < max; i++) {
if (memField[i / 8] & activeBit) { //Use correct memory block
for (ul_long n = 2 * i; n < max; n += i) {
char secondaryBit = mapBit(n % 8); //Determine bit position of n
u_int activeByte = n / 8; //Determine correct memory block
if (n < 8) { //Manual override memory block and bit for first block
secondaryBit = mapBit(n);
activeByte = 0;
}
memField[activeByte] &= ~(secondaryBit); //Set the flag to false
}
}
activeBit = activeBit >> 1; //Check the next
if (activeBit == 0x00) activeBit = 0x80;
}
I figure that since 10**10 is 10x larger than 10**9, it should take 10 times as long. Where is the flaw in this? Why did changing to long long cause such significant performance issues, and how can I fix this? I recognize that the numbers get larger, so it should be somewhat slower, but only towards the end. Is there something I'm missing?
Note: I realize long int should technically be large enough, but my limits.h says it isn't even though I'm compiling 64 bit. That's why I use long long int, in case anyone was wondering. Also, keep in mind, I have no computer science training; I'm just a hobbyist.
edit: just ran it in "Release" as x86-64 with some of the debug statements suggested. I got the following output:
looks like I hit the u_int bound. I don't know why i is getting that large.

Your program has an infinite loop in for (u_int i = 2; i*i < max; i++). i is an unsigned int so i*i wraps at 32-bit and is always less than max. Make i an ul_long.
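To see the wrap concretely, here is a minimal sketch. It assumes unsigned int is 32 bits wide, as in the question, and uses the fixed-width types from stdint.h to make that explicit. The helper names square32 and square64 are made up for illustration and are not part of the original program.

```c
#include <stdint.h>

/* Product computed in 32 bits: the result is reduced modulo 2^32. */
static uint32_t square32(uint32_t i) {
    return i * i;
}

/* Product computed in 64 bits: wide enough for the square of any 32-bit i. */
static uint64_t square64(uint32_t i) {
    return (uint64_t)i * i;
}
```

With i = 100000, square64 returns 10000000000 while square32 returns 1410065408 (10**10 modulo 2**32). A 32-bit product can never reach 10**10, so the test i*i < max never fails.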
Note that you should use a simpler bit pattern: 1 to 0x80 for bits 0 to 7.
Here is a complete version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef unsigned long long ul_long;
typedef unsigned int u_int;
#define TESTBIT(a, bit) (a[(bit) / 8] & (1 << ((bit) & 7)))
#define CLEARBIT(a, bit) (a[(bit) / 8] &= ~(1 << ((bit) & 7)))
ul_long count_primes(ul_long max) {
size_t blocks = (max + 7) / 8;
unsigned char *memField = malloc(blocks);
if (memField == NULL) {
printf("cannot allocate memory for %llu bytes\n",
(unsigned long long)blocks);
return 0;
}
memset(memField, 255, blocks);
CLEARBIT(memField, 0); // 0 is not prime
CLEARBIT(memField, 1); // 1 is not prime
// clear bits from max upward, so only numbers below max are counted
for (ul_long i = max; i < blocks * 8ULL; i++) {
CLEARBIT(memField, i);
}
for (ul_long i = 2; i * i < max; i++) {
if (TESTBIT(memField, i)) { //Check if i is prime
for (ul_long n = 2 * i; n < max; n += i) {
CLEARBIT(memField, n); //Reset all multiples of i
}
}
}
unsigned int bitCount[256];
for (int i = 0; i < 256; i++) {
bitCount[i] = (((i >> 0) & 1) + ((i >> 1) & 1) +
((i >> 2) & 1) + ((i >> 3) & 1) +
((i >> 4) & 1) + ((i >> 5) & 1) +
((i >> 6) & 1) + ((i >> 7) & 1));
}
ul_long count = 0;
for (size_t i = 0; i < blocks; i++) {
count += bitCount[memField[i]];
}
printf("count of primes up to %llu: %llu\n", max, count);
free(memField);
return count;
}
int main(int argc, char *argv[]) {
if (argc > 1) {
for (int i = 1; i < argc; i++) {
count_primes(strtoull(argv[i], NULL, 0));
}
} else {
count_primes(10000000000);
}
return 0;
}
It completes in 10 seconds for 10^9 and 131 seconds for 10^10:
count of primes up to 1000000000: 50847534
count of primes up to 10000000000: 455052511

Related

Time complexity of bit reverse function in C

#include <stdio.h>
unsigned int reverseBits(unsigned int num)
{
unsigned int reverse_num = 0;
for(int i = 0; i < sizeof(unsigned int) * 8; ++i)
{
reverse_num = (reverse_num | (num & 1));
num = num >> 1;
if(i != (sizeof(unsigned int) * 8) - 1)
reverse_num = reverse_num << 1;
}
return reverse_num;
}
int main()
{
unsigned int num = 0;
scanf("%u", &num);
printf("bit reverse of %u is %u\n", num, reverseBits(num));
return 0;
}
What is the time complexity of this bit-reversing function? If we change the input type to uint8_t/uint16_t/uint64_t, the for loop runs for the size of the input * 8 times. This function runs in constant time for a given input size, so what is the time complexity of this function in big "O" notation?
O(n), for n bits.
For uint8_t, the algorithm runs in 8 steps.
For uint16_t, the algorithm runs in 16 steps.
etc.
I'm no expert, but some instruction sets might have a one-cycle bit reverse (use __asm__), so you can run in O(n) for n bytes; eight times faster. Some compilers might do this automatically if you use -O3.
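For comparison, a fixed-width word can also be reversed in log2(n) steps by swapping progressively larger groups of bits. The sketch below assumes a 32-bit input. The name reverse32 is hypothetical, and this is a different technique from the loop in the question, not a rewrite of it.

```c
#include <stdint.h>

/* Reverse the bits of a 32-bit word in 5 mask-and-swap steps (log2(32) = 5). */
static uint32_t reverse32(uint32_t x) {
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* swap adjacent bits */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* swap bit pairs */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  /* swap nibbles */
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);  /* swap bytes */
    x = (x >> 16) | (x << 16);                                /* swap 16-bit halves */
    return x;
}
```

Each step doubles the size of the groups being swapped, so a w-bit word needs log2(w) steps rather than w loop iterations.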

Effective bits calculation along the array in specified position on STM32

I'm wondering if someone knows an efficient approach to counting the bits at a specified position across an array.
Assuming that OP wants to count active bits
size_t countbits(uint8_t *array, int pos, size_t size)
{
uint8_t mask = 1 << pos;
uint32_t result = 0;
while(size--)
{
result += *array++ & mask;
}
return result >> pos;
}
You can just loop the array values and test for the bits with a bitwise and operator, like so:
int arr[] = {1,2,3,4,5};
// 1 - 001
// 2 - 010
// 3 - 011
// 4 - 100
// 5 - 101
int i, bitcount = 0;
for (i = 0; i < 5; ++i){
if (arr[i] & (1 << 2)){ //testing and counting the 3rd bit
bitcount++;
}
}
printf("%d", bitcount); //2
Note that I opted for 1 << 2, which tests the 3rd bit from the right (the third least significant bit), just because it is easier to show. bitcount now holds 2, which is the number of values whose 3rd bit is set to 1.
Take a look at the result in Ideone
In your case you would need to check for the 5th bit which can be represented as:
1 << 4
0x10 (binary 10000)
16
And the 8th bit:
1 << 7
0x80 (binary 10000000)
128
So adjusting this to your bits would give you:
int i, bitcount8 = 0, bitcount5 = 0;
for (i = 0; i < your_array_size_here; ++i){
if (arr[i] & 0x80){ //8th bit, same as 1 << 7
bitcount8++;
}
if (arr[i] & 0x10){ //5th bit, same as 1 << 4
bitcount5++;
}
}
If you need to count many of them, then this solution isn't great and you'd be better off creating an array of bit counts, and calculating them with another for loop:
int i, j, bitcounts[8] = {0};
for (i = 0; i < your_array_size_here; ++i){
for (j = 0; j < 8; ++j){
//j will be catching each bit with the increasing shift lefts
if (arr[i] & (1 << j)){
bitcounts[j]++;
}
}
}
And in this case you would access the bit counts by their index:
printf("%d", bitcounts[2]); //2
Check this solution in Ideone as well
Let the bit position difference (e.g. 7 - 4 in this case) be diff.
If 2^diff > n, then the code can add both bits at the same time.
void count(const uint8_t *Array, size_t n, int *bit7sum, int *bit4sum) {
unsigned sum = 0;
unsigned mask = 0x90;
while (n > 0) {
n--;
sum += Array[n] & mask;
}
*bit7sum = sum >> 7;
*bit4sum = (sum >> 4) & 0x07;
}
If the processor has a fast multiply and n is still not too large (n < pow(2,14) in this case, or n < pow(2,8) in the general case), the two bits can be moved further apart before adding:
void count2(const uint8_t *Array, size_t n, int *bit7sum, int *bit4sum) {
// assume 32 bit or wider unsigned
unsigned sum = 0;
unsigned mask1 = 0x90;
unsigned m = 1 + (1u << 11); // to move bit 7 to the bit 18 place
unsigned mask2 = (1u << 18) | (1u << 4);
while (n > 0) {
n--;
sum += ((Array[n] & mask1)*m) & mask2;
}
*bit7sum = sum >> 18;
*bit4sum = (((1u << 18) - 1) & sum) >> 4;
}
Algorithm: the code uses a mask, a multiply, and a second mask to separate the 2 bits. The lower bit remains in its low position while the upper bit is shifted to the upper bits. Then a parallel add occurs.
The loop avoids any branching aside from the loop itself. This can make for fast code. YMMV.
With even larger n, break it down into multiple calls to count2()
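A sketch of that chunking, under stated assumptions: count2() is repeated here with 32-bit arithmetic so the block stands alone, and the wrapper count_chunked, the CHUNK constant, and the unsigned long totals are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest n for which the bit-4 sum still fits below bit 18. */
#define CHUNK ((size_t)((1u << 14) - 1))

/* Mask, multiply, mask: bit 4 stays in place, bit 7 is copied up to bit 18.
   Only valid for n <= CHUNK. */
static void count2(const uint8_t *a, size_t n, unsigned *bit7sum, unsigned *bit4sum) {
    uint32_t sum = 0;
    const uint32_t m = 1u + (1u << 11);
    const uint32_t mask2 = (1u << 18) | (1u << 4);
    while (n > 0) {
        n--;
        sum += ((a[n] & 0x90u) * m) & mask2;
    }
    *bit7sum = sum >> 18;
    *bit4sum = (sum & ((1u << 18) - 1)) >> 4;
}

/* Split a long array into pieces that count2() can handle and add up the results. */
static void count_chunked(const uint8_t *a, size_t n,
                          unsigned long *bit7sum, unsigned long *bit4sum) {
    *bit7sum = 0;
    *bit4sum = 0;
    while (n > 0) {
        size_t len = n < CHUNK ? n : CHUNK;
        unsigned s7, s4;
        count2(a, len, &s7, &s4);
        *bit7sum += s7;
        *bit4sum += s4;
        a += len;
        n -= len;
    }
}
```

The chunk size is one less than 2^14 so that even an all-ones chunk cannot carry the bit-4 sum into the bit-18 field.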

Is there a more optimal way to approach some of these functions?

I completed some bit manipulation exercises out of a textbook recently and have grasped onto some of the core ideas behind manipulating bits firmly. My main concern with making this post is for optimizations to my current code. I get the hunch that there are some functions that I could approach better. Do you have any recommendations for the following code?
#include <stdio.h>
#include "funcs.h"
// basically sizeof(int) using bit manipulation
unsigned int int_size(){
int size = 0;
for(unsigned int i = ~00u; i > 0; i >>= 1, size++);
return size;
}
// get a bit at a specific nth index
// index starts with 0 on the most significant bit
unsigned int bit_get(unsigned int data, unsigned int n){
return (data >> (int_size() - n - 1)) & 1;
}
// set a bit at a specific nth index
// index starts with 0 on the most significant bit
unsigned int bit_set(unsigned int data, unsigned int n){
return data | (1 << (int_size() - n - 1));
}
// gets the bit width of the data (<32)
unsigned int bit_width(unsigned int data){
int width = int_size();
for(; width > 0; width--)
if((data & (1 << width)) != 0)
break;
return width + 1;
}
// print the data contained in an unsigned int
void print_data(unsigned int data){
printf("%016X = ",data);
for(int i = 0; i < int_size(); i++)
printf("%X",bit_get(data,i));
putchar('\n');
}
// search for pattern in source (where pattern is n wide)
unsigned int bitpat_search(unsigned int source, unsigned int pattern,
unsigned int n){
int right = int_size() - n;
unsigned int mask = 0;
for(int i = 0; i < n; i++)
mask |= 1 << i;
for(int i = 0; i < right; i++)
if(((source & (mask << (right - i))) >> (right - i) ^ pattern) == 0)
return i - bit_width(source);
return -1;
}
// extract {count} bits from data starting at {start}
unsigned int bitpat_get(unsigned int data, int start, int count){
if(start < 0 || count < 0 || int_size() <= start || int_size() <= count || bit_width(data) != count)
return -1;
unsigned int mask = 1;
for(int i = 0; i < count; i++)
mask |= 1 << i;
mask <<= int_size() - start - count;
return (data & mask) >> (int_size() - start - count);
}
// set {count} bits (basically width of {replace}) in {*data} starting at {start}
void bitpat_set(unsigned int *data, unsigned int replace, int start, int count){
if(start < 0 || count < 0 || int_size() <= start || int_size() <= count || bit_width(replace) != count)
return;
unsigned int mask = 1;
for(int i = 0; i < count; i++)
mask |= 1 << i;
*data = ((*data | (mask << (int_size() - start - count))) & ~(mask << (int_size() - start - count))) | (replace << (int_size() - start - count));
}
Because your int_size() function returns the same value each time, you could save some time there:
unsigned int int_size(){
static unsigned int size = 0;
if (size == 0)
for(unsigned int i = ~00u; i > 0; i >>= 1, size++);
return size;
}
That way it will calculate the value only once.
But replacing all calls of this function by sizeof(int)*8 would be much better.
I looked through your code and there's nothing that jumps out at me.
Overall, don't sweat the small stuff. If the code runs and works fine, no worries. If you are really concerned about performance, go ahead and run your code through a profiler.
Overall, I will say that the one thing you might be dealing with is the "paranoia" I see in your code regarding the width of an int. I generally use the fixed-length types in stdint.h and give the caller some options regarding what length of ints (i.e. uint8_t, uint16_t, uint32_t, etc.) they want to deal with.
Also, C has bit-fields, which let you give individual bits or groups of bits inside a struct their own names.
unsigned int int_size(){
return __builtin_popcount(~0u); // number of set bits in an all-ones unsigned int
}
This should be faster than looping. Note that __builtin_popcount is a GCC/Clang builtin, not standard C.
Including int_size() in all the others seems like it's going to kill performance unless the compiler is really good at optimizing that loop out.
You could use a uint32_t instead of an int and then you would know up front the size.
You could also use sizeof(int) to get the size in bytes of an int and multiply by 8. I haven't seen an environment that defined a byte to be other than 8 bits, but the standard does seem to allow for it in saying it is implementation defined.
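A minimal sketch of that suggestion, using CHAR_BIT from limits.h instead of a literal 8. The UINT_BITS macro is a hypothetical name, and the sketch assumes unsigned int has no padding bits, which holds on common platforms but is not guaranteed by the standard.

```c
#include <limits.h>

/* Width of unsigned int in bits, evaluated at compile time. */
#define UINT_BITS ((unsigned int)(sizeof(unsigned int) * CHAR_BIT))

/* Get bit n, where index 0 is the most significant bit (n must be < UINT_BITS). */
static unsigned int bit_get(unsigned int data, unsigned int n) {
    return (data >> (UINT_BITS - n - 1)) & 1u;
}

/* Set bit n, where index 0 is the most significant bit (n must be < UINT_BITS). */
static unsigned int bit_set(unsigned int data, unsigned int n) {
    return data | (1u << (UINT_BITS - n - 1));
}
```

Shifting 1u rather than 1 keeps the shift in unsigned arithmetic, which matters when the result lands in the top bit.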

masking most significant bit

I wrote this function to remove the most significant bit in every byte. But this function doesn't seem to be working the way I wanted it to be.
The output file size is always '0', and I don't understand why nothing is being written to the output file. Is there a better, simpler way to remove the most significant bit in every byte?
In relation to shift operators, section 6.5.7 of the C standard says:
If the value of the right operand is negative or is greater than or
equal to the width of the promoted left operand, the behavior is
undefined.
So firstly, remove nBuffer << 8;. Even if it were well defined, it wouldn't be an assignment operator.
As people have mentioned, you'd be better off using CHAR_BIT than 8. I'm pretty sure, instead of 0x7f you mean UCHAR_MAX >> 1 and instead of 7 you meant CHAR_BIT - 1.
Let's just focus on nBuffer and bit_count, here. I shall comment out anything that doesn't use either of these.
bit_count += 7;
if (bit_count == 7*8)
{
*out_buf++ = nBuffer;
/*if((write(out_fd, bit_buf, sizeof(char))) == -1)
oops("Cannot write on the file", "");*/
nBuffer << 8;
bit_count -= 8;
}
nBuffer = 0;
bit_count = 0;
At the end of this code, what is the value of nBuffer? What about bit_count? What impact would that have on your second loop? while (bit_count > 0)
Now let's focus on the commented out code:
if((write(out_fd, bit_buf, sizeof(char))) == -1)
oops("Cannot write on the file", "");
Where are you assigning a value to bit_buf? Using an uninitialised variable is undefined behaviour.
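If the goal is only to clear the top bit of each byte in a buffer, as the question title suggests, a sketch using the UCHAR_MAX >> 1 mask mentioned above could look like this. The function name clear_top_bits is hypothetical, and the file I/O from the question is left out.

```c
#include <limits.h>
#include <stddef.h>

/* Clear the most significant bit of every byte in buf. */
static void clear_top_bits(unsigned char *buf, size_t len) {
    const unsigned char mask = UCHAR_MAX >> 1;  /* 0x7f when CHAR_BIT is 8 */
    for (size_t i = 0; i < len; i++) {
        buf[i] &= mask;
    }
}
```

This keeps each byte in place with its top bit zeroed. It does not pack the remaining 7-bit values together, which is what the bit_count logic in the question appears to attempt.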
Instead of going through all of the bits to find the high one, this goes through only the 1 bits. high() returns the high bit of the argument, or zero if the argument is zero.
inline int high(int n)
{
int k;
do {
k = n ^ (n - 1);
n &= ~k;
} while (n);
return (k + 1) >> 1;
}
inline int drop_high(int n)
{
return n ^ high(n);
}
unsigned char remove_most_significant_bit(unsigned char b)
{
int bit;
for(bit = 0; bit < 8; bit++)
{
unsigned char mask = (0x80 >> bit);
if( mask & b) return b & ~mask;
}
return b;
}
void remove_most_significant_bit_from_buffer(unsigned char* b, int length)
{
int i;
for(i=0; i<length;i++)
{
b[i] = remove_most_significant_bit(b[i]);
}
}
void test_it()
{
unsigned char data[8];
int i;
for(i = 0; i < 8; i++)
{
data[i] = (1 << i) + i;
}
for(i = 0; i < 8; i++)
{
printf("%d\r\n", data[i]);
}
remove_most_significant_bit_from_buffer(data, 8);
for(i = 0; i < 8; i++)
{
printf("%d\r\n", data[i]);
}
}
I won't go through your entire answer to provide your reworked code, but removing the most significant bit is easy. This comes from the fact that the most significant bit can easily be found by using log base 2 converted to an integer.
#include <stdio.h>
#include <math.h>
int RemoveMSB(int a)
{
return a ^ (1 << (int)log2(a));
}
int main(int argc, char const *argv[])
{
int a = 4387;
printf("MSB of %d is %d\n", a, (int)log2(a));
a = RemoveMSB(a);
printf("MSB of %d is %d\n", a, (int)log2(a));
return 0;
}
Output:
MSB of 4387 is 12
MSB of 291 is 8
As such, 4387 in binary is 1000100100011 with a most significant bit at 12.
Likewise, 291 in binary is 0000100100011 with a most significant bit at 8.
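The same result can be obtained with integer operations only, which avoids floating point and also handles an input of 0 (where log2 is not defined). This is a sketch for 32-bit unsigned values, with the hypothetical name clear_highest_set_bit.

```c
#include <stdint.h>

/* Clear the highest set bit of a; returns 0 when a is 0. */
static uint32_t clear_highest_set_bit(uint32_t a) {
    uint32_t m = a;
    /* Smear the highest set bit into every lower position. */
    m |= m >> 1;
    m |= m >> 2;
    m |= m >> 4;
    m |= m >> 8;
    m |= m >> 16;
    /* m >> 1 now covers every position below the highest set bit. */
    return a & (m >> 1);
}
```

For 4387 this returns 291, matching the output above.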

How do I get bit-by-bit data from an integer value in C?

I want to extract bits of a decimal number.
For example, 7 is binary 0111, and I want to get 0 1 1 1 all bits stored in bool. How can I do so?
OK, a loop is not a good option, can I do something else for this?
If you want the k-th bit of n, then do
(n & ( 1 << k )) >> k
Here we create a mask, apply the mask to n, and then right shift the masked value to get just the bit we want. We could write it out more fully as:
int mask = 1 << k;
int masked_n = n & mask;
int thebit = masked_n >> k;
You can read more about bit-masking here.
Here is a program:
#include <stdio.h>
#include <stdlib.h>
int *get_bits(int n, int bitswanted){
int *bits = malloc(sizeof(int) * bitswanted);
int k;
for(k=0; k<bitswanted; k++){
int mask = 1 << k;
int masked_n = n & mask;
int thebit = masked_n >> k;
bits[k] = thebit;
}
return bits;
}
int main(){
int n=7;
int bitswanted = 5;
int *bits = get_bits(n, bitswanted);
printf("%d = ", n);
int i;
for(i=bitswanted-1; i>=0;i--){
printf("%d ", bits[i]);
}
printf("\n");
}
As requested, I decided to extend my comment on forefinger's answer to a full-fledged answer. Although his answer is correct, it is needlessly complex. Furthermore all current answers use signed ints to represent the values. This is dangerous, as right-shifting of negative values is implementation-defined (i.e. not portable) and left-shifting can lead to undefined behavior (see this question).
By right-shifting the desired bit into the least significant bit position, masking can be done with 1. No need to compute a new mask value for each bit.
(n >> k) & 1
As a complete program, computing (and subsequently printing) an array of single bit values:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv)
{
unsigned
input = 0b0111u,
n_bits = 4u,
*bits = (unsigned*)malloc(sizeof(unsigned) * n_bits),
bit = 0;
for(bit = 0; bit < n_bits; ++bit)
bits[bit] = (input >> bit) & 1;
for(bit = n_bits; bit--;)
printf("%u", bits[bit]);
printf("\n");
free(bits);
}
Assuming that you want to calculate all bits as in this case, and not a specific one, the loop can be further changed to
for(bit = 0; bit < n_bits; ++bit, input >>= 1)
bits[bit] = input & 1;
This modifies input in place and thereby allows the use of a constant width, single-bit shift, which may be more efficient on some architectures.
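One caveat: (n >> k) & 1 is only defined when k is smaller than the width of n. A small helper can guard against that. This is a sketch, with get_bit a hypothetical name, and returning 0 for out-of-range positions is an assumed convention.

```c
#include <limits.h>

/* Return bit k of n (k = 0 is the least significant bit).
   Positions at or beyond the width of unsigned are reported as 0. */
static unsigned get_bit(unsigned n, unsigned k) {
    if (k >= sizeof(unsigned) * CHAR_BIT) {
        return 0u;  /* shifting by this much would be undefined behavior */
    }
    return (n >> k) & 1u;
}
```

For n = 7, positions 0 through 2 give 1 and position 3 gives 0, matching the 0 1 1 1 from the question.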
Here's one way to do it—there are many others:
bool b[4];
int v = 7; // number to dissect
for (int j = 0; j < 4; ++j)
b [j] = 0 != (v & (1 << j));
It is hard to understand why use of a loop is not desired, but it is easy enough to unroll the loop:
bool b[4];
int v = 7; // number to dissect
b [0] = 0 != (v & (1 << 0));
b [1] = 0 != (v & (1 << 1));
b [2] = 0 != (v & (1 << 2));
b [3] = 0 != (v & (1 << 3));
Or evaluating constant expressions in the last four statements:
b [0] = 0 != (v & 1);
b [1] = 0 != (v & 2);
b [2] = 0 != (v & 4);
b [3] = 0 != (v & 8);
Here's a very simple way to do it;
int main()
{
int s=7,l=1;
vector <bool> v;
v.clear();
while (l <= 4)
{
v.push_back(s%2);
s /= 2;
l++;
}
for (l=(v.size()-1); l >= 0; l--)
{
cout<<v[l]<<" ";
}
return 0;
}
Using std::bitset:
int value = 123;
std::bitset<sizeof(int) * 8> bits(value); // the template argument is a bit count, not a byte count
std::cout << bits.to_string();
@prateek thank you for your help. I rewrote the function with comments for use in a program. Increase 8 for more bits (up to 32 for an integer).
std::vector <bool> bits_from_int (int integer) // discern which bits of PLC codes are true
{
std::vector <bool> bool_bits;
// repeatedly divide the integer by 2; a remainder of 1 means the bit is 1, otherwise it's 0
for (int i = 0; i < 8; i++)
{
bool_bits.push_back (integer%2); // remainder of dividing by 2
integer /= 2; // integer equals itself divided by 2
}
return bool_bits;
}
#include <stdio.h>
int main(void)
{
int number = 7; /* signed */
int vbool[8 * sizeof(int)];
int i;
for (i = 0; i < 8 * sizeof(int); i++)
{
vbool[i] = (((unsigned)number << i) >> (8 * sizeof(int) - 1)) & 1; /* top bit first; unsigned avoids signed-shift UB */
printf("%d", vbool[i]);
}
return 0;
}
If you don't want any loops, you'll have to write it out:
#include <stdio.h>
#include <stdbool.h>
int main(void)
{
int num = 7;
#if 0
bool arr[4] = { (num&1) ?true: false, (num&2) ?true: false, (num&4) ?true: false, (num&8) ?true: false };
#else
#define BTB(v,i) ((v) & (1u << (i))) ? true : false
bool arr[4] = { BTB(num,0), BTB(num,1), BTB(num,2), BTB(num,3)};
#undef BTB
#endif
printf("%d %d %d %d\n", arr[3], arr[2], arr[1], arr[0]);
return 0;
}
As demonstrated here, this also works in an initializer.
