Why is (n+1)(BASE-1) added when calculating the NMAX of Adler-32? - zlib

The code in zlib/adler32.c is:
#define BASE 65521U /* largest prime smaller than 65536 */
#define NMAX 5552
/* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1 */
But I think it should be calculated like this:
255n(n+1)/2 + n < 2^32 - 1
What is the purpose of adding (n+1)(BASE-1)?
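One way to read the zlib comment: the two running sums do not start at zero when a block begins. After the previous modulo reduction, each of them can be as large as BASE-1. If every one of the next n bytes is 255, the second sum grows by n times the starting first sum plus 255n(n+1)/2, on top of its own starting value, which gives the (n+1)(BASE-1) term. The sketch below checks that bound in 64-bit arithmetic; the helper names adler_worst_sum2 and adler_nmax are made up for illustration and are not part of zlib.

```c
#include <stdint.h>

#define BASE 65521U /* largest prime smaller than 65536 */

/* Worst-case value of the second Adler-32 sum after n bytes of 0xFF,
   assuming both running sums start the block at BASE-1. */
uint64_t adler_worst_sum2(uint64_t n)
{
    return 255 * n * (n + 1) / 2 + (n + 1) * (BASE - 1);
}

/* Largest n whose worst case still fits in an unsigned 32-bit value. */
unsigned adler_nmax(void)
{
    uint64_t n = 0;
    while (adler_worst_sum2(n + 1) <= UINT32_MAX)
        ++n;
    return (unsigned)n;
}
```

With BASE = 65521 the loop stops at 5552, which matches the NMAX defined in the source.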


Uniform distribution in arc4random_uniform and PCG

Both arc4random_uniform from OpenBSD and the PCG library by Melissa O'Neill use a similar-looking algorithm to generate an unbiased unsigned integer value below an exclusive upper bound.
inline uint64_t
pcg_setseq_64_rxs_m_xs_64_boundedrand_r(struct pcg_state_setseq_64 * rng,
uint64_t bound) {
uint64_t threshold = -bound % bound;
for (;;) {
uint64_t r = pcg_setseq_64_rxs_m_xs_64_random_r(rng);
if (r >= threshold)
return r % bound;
}
}
Isn't -bound % bound always zero? If it's always zero then why have the loop and the if statement at all?
OpenBSD does the same thing.
uint32_t
arc4random_uniform(uint32_t upper_bound)
{
uint32_t r, min;
if (upper_bound < 2)
return 0;
/* 2**32 % x == (2**32 - x) % x */
min = -upper_bound % upper_bound;
/*
* This could theoretically loop forever but each retry has
* p > 0.5 (worst case, usually far better) of selecting a
* number inside the range we need, so it should rarely need
* to re-roll.
*/
for (;;) {
r = arc4random();
if (r >= min)
break;
}
return r % upper_bound;
}
Apple's version of arc4random_uniform computes the threshold differently.
u_int32_t
arc4random_uniform(u_int32_t upper_bound)
{
u_int32_t r, min;
if (upper_bound < 2)
return (0);
#if (ULONG_MAX > 0xffffffffUL)
min = 0x100000000UL % upper_bound;
#else
/* Calculate (2**32 % upper_bound) avoiding 64-bit math */
if (upper_bound > 0x80000000)
min = 1 + ~upper_bound; /* 2**32 - upper_bound */
else {
/* (2**32 - (x * 2)) % x == 2**32 % x when x <= 2**31 */
min = ((0xffffffff - (upper_bound * 2)) + 1) % upper_bound;
}
#endif
/*
* This could theoretically loop forever but each retry has
* p > 0.5 (worst case, usually far better) of selecting a
* number inside the range we need, so it should rarely need
* to re-roll.
*/
for (;;) {
r = arc4random();
if (r >= min)
break;
}
return (r % upper_bound);
}
Because bound is a uint64_t, -bound is evaluated modulo 2^64. The result is 2^64−bound, not −bound.
Then -bound % bound calculates the residue of 2^64−bound modulo bound. This equals the residue of 2^64 modulo bound.
By setting threshold to this and rejecting numbers that are less than threshold, the routine reduces the accepted interval to 2^64−threshold numbers. The result is an interval whose size is a multiple of bound.
From a number r selected in that interval, the routine returns r % bound. Due to the trimming of the interval, there are an equal number of occurrences of each residue, so the result has no bias for any residue over any other.
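To make the arithmetic concrete, here is a minimal 32-bit sketch of the same idea. The names reject_threshold and bounded are hypothetical, and next32 stands for any caller-supplied source of uniform 32-bit values; none of this is taken from PCG or OpenBSD.

```c
#include <assert.h>
#include <stdint.h>

/* Rejection threshold: 2^32 mod bound, computed without 64-bit math.
   Unsigned negation wraps, so (uint32_t)(-bound) is 2^32 - bound. */
uint32_t reject_threshold(uint32_t bound)
{
    assert(bound != 0);
    return (uint32_t)(-bound) % bound;
}

/* Draw a value in [0, bound) from next32, a hypothetical generator
   of uniform 32-bit values supplied by the caller. */
uint32_t bounded(uint32_t (*next32)(void), uint32_t bound)
{
    uint32_t threshold = reject_threshold(bound);
    for (;;) {
        uint32_t r = next32();
        if (r >= threshold)   /* accepted range size is a multiple of bound */
            return r % bound;
    }
}
```

For bound = 6 the threshold is 4, because 2^32 = 4294967296 leaves remainder 4 when divided by 6. The threshold is zero only when bound divides 2^32 exactly, that is, when bound is a power of two.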

Is there any way to reduce time complexity of the function given below?

This is the function to calculate the factorial of very large numbers. It works perfectly fine, but the time complexity is quite high. How can I reduce it?
This function is called once.
Current time to find the factorial of 1 million: 40 000 ms
Expected time: 10 000 ms
static void calcfactorial(unsigned int n)
{
unsigned int carry, i, j;
len = factorial[0] = 1;
for (i = 1; i < LEN; i++)
factorial[i] = 0;
for (i = 2; i <= n; i++)
{
carry = 0;
for (j = 0; j < len; j++)
{
factorial[j] = factorial[j] * i + carry;
carry = factorial[j] / 10;
factorial[j] = factorial[j] % 10;
}
while (carry)
{
factorial[len++] = carry % 10;
carry = carry / 10;
}
}
}
Since you only need a four-fold improvement in time, the following may suffice:
Use a wider (unsigned) integer type, such as uint64_t.
Instead of calculating in base ten, use the largest power of ten, B, such that B•N fits in the integer type (see footnote 1), where N is the number you are computing the factorial of. For example, for 64-bit integers and 1,000,000!, you could use base 10^13.
When doing multiplications, do not multiply by every digit in the product array, as the loop for (j = 0; j < len; j++) does. All digits beyond the first start as zero, and they slowly become non-zero as work progresses. Track the highest non-zero digit, and do multiplications only up to that digit, until the product carries into the next digit.
Similarly, the low digits become zero as the work progresses, due to accumulating factors of the base in the factorial. Track the lowest non-zero digit, and start work there.
A program demonstrating these is below.
A significant cost in this program is the divisions by the base. If you switch to a power-of-two base, these become bitwise operations (shifts for division and bitwise AND operations for remainders), which are much cheaper. This should speed up computing the factorial considerably. However, the final product will have to be converted to decimal for output. That will have a lower cost than computing entirely in decimal, so it is a win.
After that, you might consider this answer in Computer Science Stack Exchange. It suggests restructuring the factorial as powers of primes and using repeated squaring to compute the powers of primes, which are then multiplied.
This answer suggests using Stirling's approximation, n! ≈ sqrt(2πn)•(n/e)^n, which would require more sophisticated mathematics and programming.
Footnote
1 The purpose of using a power of ten is that the result can then be printed directly from its base-B digits.
Demonstration
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
/* Define T to be an unsigned integer type. The larger the better up to the
widest type supported efficiently (generally the widest type for which the
processor has arithmetic instructions).
Define PRIuT to be a printf conversion specifier to print a type T as an
unsigned decimal numeral.
*/
typedef uint64_t T;
#define PRIuT PRIu64
// Return the base-10 logarithm of x, rounded down.
static unsigned ilog10(T x)
{
unsigned log = 0;
for (; 10 <= x; x /= 10)
++log;
return log;
}
// Return 10**x.
static T iexp10(unsigned x)
{
T power = 1;
for (; 0 < x; --x)
power *= 10;
return power;
}
int main(void)
{
// Set the value we want the factorial of.
static const T N = 1000000;
// Set the maximum value of T by using wrapping.
static const T MaximumT = -1;
/* Determine the maximum number of decimal digits we can use easily:
Given a number x with Digits decimal digits, x*N will be
representable in a T.
*/
unsigned Digits = ilog10(MaximumT/N);
/* Set Base to 10**Digits. This is the numerical base we will do
arithmetic in -- like base ten, but more efficient if it is bigger.
*/
T Base = iexp10(Digits);
/* Set an array size that is sufficient to contain N!
For 1 < N, N! < N**N, so the number of digits in N! is less than
log10(N**N) = N * log10(N). Since we are using ilog10, which rounds
down, we add 1 to it to round up, ensuring we have enough room.
Then we divide that number of digits by the number of digits we will
have in each array element (and round up, by subtracting one before the
division and adding one after), and that is the number of array
elements we allocate.
*/
size_t S = (N * (ilog10(N)+1) - 1) / Digits + 1;
T *Product = malloc(S * sizeof *Product);
if (!Product)
{
fprintf(stderr,
"Error, unable to allocate %zu bytes.\n", S * sizeof *Product);
exit(EXIT_FAILURE);
}
/* Initialize the array to 1. L and H remember the index of the lowest
and highest non-zero array element, respectively. Since all the
elements before L or after H are zero, we do not need to use them in
the multiplication.
*/
Product[0] = 1;
size_t L = 0, H = 0;
// Multiply the product by the numbers from 2 to N.
for (T i = 2; i <= N; ++i)
{
// Start with no carry.
T carry = 0;
/* Multiply each significant base-Base digit by i, add the carry in,
and separate the carry out. We start separately with the lowest
non-zero element so we can track if it becomes zero.
*/
while (1)
{
T t = Product[L] * i + carry;
carry = t / Base;
if ((Product[L] = t % Base)) // Leave when digit is non-zero.
break;
++L; // If digit is zero, increase L.
}
for (size_t j = L+1; j <= H; ++j)
{
T t = Product[j] * i + carry;
carry = t / Base;
Product[j] = t % Base;
}
// If there is a final carry out, put it in a new significant digit.
if (0 != carry)
Product[++H] = carry;
}
/* Print the result. The first base-Base digit is printed with no
leading zeros. All subsequent base-Base digits are printed with
leading zeros as needed to ensure exactly Digits decimal digits are
printed. The field width for * must be an int, hence the cast.
*/
printf("%" PRIuT, Product[H]);
for (size_t j = H; 0 < j--;)
printf("%0*" PRIuT, (int) Digits, Product[j]);
printf("\n");
free(Product);
}
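The power-of-two suggestion in the answer can be sketched as follows. This is only an illustration under stated assumptions: the base is 2^32, the number is stored little-endian in an array of 32-bit limbs, and the names mul_small and factorial_limbs are made up. Converting the result to decimal for printing is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply the little-endian base-2^32 number in limbs[0..len-1] by m.
   Returns the new length; the caller guarantees room for one more limb. */
size_t mul_small(uint32_t *limbs, size_t len, uint32_t m)
{
    uint64_t carry = 0;
    for (size_t j = 0; j < len; ++j) {
        uint64_t t = (uint64_t)limbs[j] * m + carry;
        limbs[j] = (uint32_t)t;   /* t mod 2^32: a truncation, no division */
        carry = t >> 32;          /* t / 2^32: a shift */
    }
    if (carry != 0)
        limbs[len++] = (uint32_t)carry;
    return len;
}

/* Compute n! into limbs (capacity cap). Returns the number of limbs used,
   or 0 if there is no spare limb left for a possible carry. */
size_t factorial_limbs(uint32_t *limbs, size_t cap, uint32_t n)
{
    assert(limbs != NULL);
    if (cap == 0)
        return 0;
    limbs[0] = 1;
    size_t len = 1;
    for (uint32_t i = 2; i <= n; ++i) {
        if (len == cap)
            return 0;             /* conservative: stop before overflowing */
        len = mul_small(limbs, len, i);
    }
    return len;
}
```

For example, calling factorial_limbs with n = 20 fills two limbs that together hold 2432902008176640000. The low-zero and high-zero tracking from the demonstration above could be added to this loop in the same way.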

How to generate 12 digit random number in C?

I'm trying to generate 12 digit random numbers in C, but it's always generating 10 digit numbers.
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
void main()
{
srand(time(NULL));
long r = rand();
r = r*100;
printf("%ld",r);
}
rand() returns an int value in the range of [0...RAND_MAX]
Based on the C spec, RAND_MAX >= 32767 and RAND_MAX <= INT_MAX.
Call rand() multiple times to create a wide value
unsigned long long rand_atleast12digit(void) {
unsigned long long r = rand();
#if RAND_MAX >= 999999999999
#elif RAND_MAX >= 999999
r *= RAND_MAX + 1ull;
r += rand();
#else
r *= RAND_MAX + 1ull;
r += rand();
r *= RAND_MAX + 1ull;
r += rand();
#endif
return r;
}
The above returns a number in the range of 0 to at least 999,999,999,999. To reduce that to only that range, code could use return r % 1000000000000;.
Using % likely does not create a balanced distribution of random numbers. Other posts address how to cope with that, such as this good one, incorporated as follows.
#if RAND_MAX >= 999999999999
#define R12DIGIT_DIVISOR (RAND_MAX/1000000000000)
#elif RAND_MAX >= 999999
#define RAND_MAX_P1 (RAND_MAX+1LLU)
#define R12DIGIT_DIVISOR ((RAND_MAX_P1*RAND_MAX_P1-1)/1000000000000)
#else
#define RAND_MAX_P1 (RAND_MAX+1LLU)
#define R12DIGIT_DIVISOR ((RAND_MAX_P1*RAND_MAX_P1*RAND_MAX_P1-1)/1000000000000)
#endif
unsigned long long rand_12digit(void) {
unsigned long long retval;
do {
retval = rand_atleast12digit() / R12DIGIT_DIVISOR;
} while (retval >= 1000000000000); /* the quotient can exceed 10^12, so reject all such values */
return retval;
}
Note that the quality of rand() is not well defined, so repeated calls may not provide high quality results.
OP's code fails if long is 32-bit, as it lacks the range for a 12-decimal-digit value. @Michael Walz
If long is wide enough, *100 will always make the least 2 decimal digits 00, which is not very random. @Alexei Levenkov
long r = rand();
r = r*100;
The result of rand is int, which means you can't get a 12 digit number directly from it.
If you need a value that always has 12 digits, you need to make sure the values fall in a particular range.
The sample below assumes that only some of the numbers need to have 12 digits. You just need 8 extra bits, so shifting and OR'ing two results produces a number in the range 0 to 0x7fffffffff (assuming RAND_MAX is 0x7fffffff), which is often 12 digits long when printed as decimal:
r = rand();
r = (r << 8) | rand();
PS: Make sure the variable that will store the result is big enough to store the 12 digit number.
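If every result has to be exactly 12 digits long, one simple option is to build the number digit by digit and force the leading digit to be non-zero. This is only a sketch: the name rand_exactly12digit is made up, and rand() % 10 carries a small bias whenever RAND_MAX + 1 is not a multiple of 10.

```c
#include <stdlib.h>

/* Return a value in [100000000000, 999999999999], i.e. always 12 digits. */
unsigned long long rand_exactly12digit(void)
{
    /* Leading digit is 1..9 so the number never has fewer than 12 digits. */
    unsigned long long r = 1 + (unsigned long long)(rand() % 9);
    /* Append the remaining 11 digits, each in 0..9. */
    for (int i = 1; i < 12; ++i)
        r = r * 10 + (unsigned long long)(rand() % 10);
    return r;
}
```

Print the result with %llu, since a 12-digit value does not fit in a 32-bit long.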
My simple way to generate random strings or numbers is:
static char *ws_generate_token(size_t length) {
static char charset[] = "1234567890"; // generate numbers only
//static char charset[] = "abcdefghijklmnopqrstuvwxyz1234567890"; to generate random string
char *randomString = NULL;
if (length) {
randomString = malloc(sizeof(char) * (length + 1));
if (randomString) {
for (size_t n = 0; n < length; n++) {
int key = rand() % (int)(sizeof(charset) -1);
randomString[n] = charset[key];
}
randomString[length] = '\0';
}
}
return randomString;
}
How the code works:
Create an array of chars containing the allowed characters (digits, letters, etc.).
Generate a random index X in the range [0, array length).
Take the character at position X in the array of chars.
Finally, append this character to the string (or number) being built.
How to use it:
#define TOKEN_LENGTH 12
char *token;
token = ws_generate_token(TOKEN_LENGTH);
Conversion from string to integer (a 12-digit value does not fit in a 32-bit int, so use long long):
long long token_int = atoll(token);
Don't forget:
free(token); // free the memory when you finish
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void)
{
int i, n;
time_t t;
n = 5;
/* Initialize the random number generator from the current time */
srand((unsigned) time(&t));
/* Print 5 random numbers from 0 to 49 */
for( i = 0 ; i < n ; i++ )
{
printf("%d\n", rand() % 50);
}
return(0);
}

How to generate a random number from whole range of int in C?

unsigned const number = minimum + (rand() % (maximum - minimum + 1));
I know how to (easily) generate a random number within a range such as from 0 to 100. But what about a random number from the full range of int (assume sizeof(int) == 4), that is from INT_MIN to INT_MAX, both inclusive?
I don't need this for cryptography or the like, but an approximately uniform distribution would be nice, and I need a lot of those numbers.
The approach I'm currently using is to generate 4 random numbers in the range from 0 to 255 (inclusive) and do some messy casting and bit manipulations. I wonder whether there's a better way.
On my system RAND_MAX is 32767, which is 15 bits. So for a 32-bit unsigned value, just call rand() three times and combine the results with shift, OR, and mask.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void){
unsigned rando, i;
srand((unsigned)time(NULL));
for (i = 0; i < 3; i++) {
rando = ((unsigned)rand() << 17) | ((unsigned)rand() << 2) | ((unsigned)rand() & 3);
printf("%u\n", rando);
}
return 0;
}
Program output:
3294784390
3748022412
4088204778
For reference I'm adding what I've been using:
int random_int(void) {
assert(sizeof(unsigned int) == sizeof(int));
unsigned int accum = 0;
size_t i = 0;
for (; i < sizeof(int); ++i) {
accum <<= 8; /* make room for the next byte */
accum |= rand() & 0xFF; /* take the low 8 bits of rand() */
}
// Attention: Implementation defined!
return (int) accum;
}
But I like Weather Vane's solution better because it uses fewer rand() calls and thus makes more use of the (hopefully good) distribution generated by it.
We should be able to do something that works no matter what the range of rand() or what size result we're looking for just by accumulating enough bits to fill a given type:
// can be any unsigned type.
typedef uint32_t uint_type;
#define RAND_UINT_MAX ((uint_type) -1)
uint_type rand_uint(void)
{
// these are all constant and factor is likely a power of two.
// therefore, the compiler has enough information to unroll
// the loop and can use an immediate form shl in-place of mul.
uint_type factor = (uint_type) RAND_MAX + 1;
uint_type factor_to_k = 1;
uint_type cutoff = factor ? RAND_UINT_MAX / factor : 0;
uint_type result = 0;
while ( 1 ) {
result += rand() * factor_to_k;
if (factor_to_k <= cutoff)
factor_to_k *= factor;
else
return result;
}
}
Note: Makes the minimum number of calls to rand() necessary to populate all bits.
Let's verify this gives a uniform distribution.
At this point we could just cast the result of rand_uint() to type int and be done, but it's more useful to get output in a specified range. The problem is: How do we reach INT_MAX when the operands are of type int?
Well... We can't. We'll need to use a type with greater range:
int uniform_int_distribution(int min, int max)
{
// [0,1) -> [min,max]
double canonical = rand_uint() / (RAND_UINT_MAX + 1.0);
return floor(canonical * (1.0 + max - min) + min);
}
As a final note, it may be worthwhile to implement the random function in terms of type double instead, i.e., accumulate enough bits for DBL_MANT_DIG and return a result in the range [0,1). In fact this is what std::generate_canonical does.
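A minimal sketch of that last idea, under two assumptions that are not from the answer: double has a 53-bit significand (IEEE 754 binary64), and the low 15 bits of rand() are reasonably uniform. The name rand_canonical is borrowed loosely from the C++ function and is otherwise hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Return a double in [0, 1) built from 53 random bits.
   Assumes double has a 53-bit significand (IEEE 754 binary64). */
double rand_canonical(void)
{
    uint64_t bits = 0;
    /* RAND_MAX is at least 32767, so 15 bits per call are always available. */
    for (int i = 0; i < 4; ++i)
        bits = (bits << 15) | (uint64_t)(rand() & 0x7FFF);
    bits >>= 7;                       /* keep 53 of the 60 collected bits */
    return (double)bits * 0x1.0p-53;  /* scale by 2^-53 */
}
```

Because the largest possible value of bits is 2^53 - 1, which a double represents exactly, the result never reaches 1.0.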

how to find the amount of characters in unsigned long long

Hi, I have two questions. The first is the one in the title, and the other is this:
Is unsigned long long the biggest integer type (the one that can hold the most digits)?
I need an integer that can hold a few million characters (digits). Is this possible? I'm coding in C.
This leads to my other question: how can I display the number of digits on the screen? Does it need to be like this?
printf("%d", intName.length)
Thanks, everyone!
I am assuming that when you refer to the amount of characters you mean the number of digits in the number. If so, then this question has everything you need to know and includes code similar to this:
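int numberOfDigits(unsigned long long n)
{
if (n == 0)
return 1; /* zero still prints as one digit */
/* n is unsigned, so no abs() is needed. log10 works on double, so the
count can be off by one for very large values near a power of ten. */
return (int) floor( log10( (double) n ) ) + 1;
}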
As for holding a few million digits, you probably want to look into a library such as the GNU Multiple Precision Arithmetic Library (GMP), which includes the function
size_t mpz_sizeinbase( const mpz_t op, int base )
which will tell you how many digits your number has.
printf("%llu", xxxxxxxxx);
the ll (el-el) long-long modifier with the u (unsigned) conversion
you can also use
uint64_t a;
uint32_t b;
But you need to include the inttypes.h header, which gives you types such as int32_t, int64_t, and uint64_t.
C99 provides intmax_t (and uintmax_t) which will be the largest supported integer type (typically 64 bit.)
Assuming you have a conforming C99 snprintf then you can get the number of digits with:
length = snprintf(NULL, 0, "%llu", value);
for an unsigned long long value (and %ju for uintmax_t.)
Otherwise you'll have to pass a buffer in (yuk) or do something manually like:
length = value < 10 ? 1 :
value < 100 ? 2 :
...
also yuk!
But this is all pretty irrelevant if you really do want million-digit integers, in which case you'll need to use a library such as gmp to work with such big numbers.
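For ordinary built-in integers, the two approaches mentioned in this answer can be sketched side by side. The function names digits_loop and digits_snprintf are made up for illustration; the loop is a compact stand-in for the chain of comparisons.

```c
#include <stdio.h>

/* Count decimal digits by repeated division; 0 counts as one digit. */
int digits_loop(unsigned long long value)
{
    int length = 1;
    while (value >= 10) {
        value /= 10;
        ++length;
    }
    return length;
}

/* Same count via snprintf with a null buffer and size 0, which returns
   the number of characters that would have been written. */
int digits_snprintf(unsigned long long value)
{
    return snprintf(NULL, 0, "%llu", value);
}
```

Both versions stay in integer arithmetic, so they avoid the rounding problems a floating-point log10 can have near large powers of ten.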
The maximal decimal length of an unsigned integer of a given type of bit length bitlen is given by 1 + floor(log10(2^bitlen-1)) (mathematically, without taking overflows and rounding errors into account). The approximation 1/log2(10) ~ 4004.0/13301 (obtained with continued fractions, see http://en.wikipedia.org/wiki/Continued_fraction) leads to the formula 1 + bitlen * 4004 / 13301 (computationally, i.e. the division rounds down). Mathematical details are given in the comments of the snippet below.
#include <limits.h>
#include <stdio.h>
/**
* Maximal number of digits in the decimal representation of an unsigned type.
*
* floor( log2(2^bitlen - 1) / log2(10) ) == floor( bitlen / log2(10) )
* otherwise an integer n would exist with
* log2(2^bitlen - 1) / log2(10) < n < bitlen / log2(10)
* log2(2^bitlen - 1) < n * log2(10) < bitlen
* 2^bitlen - 1 < 2^(n * log2(10)) < 2^bitlen
* 2^bitlen - 1 < (2^log2(10))^n < 2^bitlen
* 2^bitlen - 1 < 10^n < 2^bitlen
* which is impossible
*
* 1 / log2(10) ~ 0.301029995663981
* 4004 / 13301 ~ 0.30102999774453
*
* 1 + floor( log10(2^bitlen - 1) )
* == 1 + floor( log2(2^bitlen - 1) / log2(10) )
* == 1 + floor( bitlen / log2(10) )
* <= 1 + floor( bitlen * 4004.0 / 13301 )
* == 1 + bitlen * 4004 / 13301
* with equality for bitlen <= 13300 == 8 * 1662.5
*/
#define DECLEN(unsigned_t) (1 + CHAR_BIT*sizeof(unsigned_t) * 4004 / 13301)
int main(int argc, char *argv[]) {
printf("unsigned char : %zu\n", DECLEN(unsigned char));
printf("short unsigned : %zu\n", DECLEN(short unsigned));
printf("unsigned : %zu\n", DECLEN(unsigned));
printf("long unsigned : %zu\n", DECLEN(long unsigned));
printf("long long unsigned : %zu\n", DECLEN(long long unsigned));
return 0;
}
