I implemented a fast-power algorithm to perform the operation (x^y) % m.
Since m is very large (4294434817), I used long long to store the result. However, long long still doesn't seem to be wide enough during the calculation. For example, I got a negative number for (3623752876 * 3623752876) % 4294434817.
Is there any way to fix this?
All three of those constants are between 2^31 and 2^32.
The type unsigned long long is guaranteed to be able to store values up to at least 2^64-1, which exceeds the product 3623752876 * 3623752876.
So just use unsigned long long for the calculation. long long is wide enough to hold the individual constants, but not the product.
You could also use uint64_t, defined in <stdint.h>. Unlike unsigned long long, it's guaranteed to be exactly 64 bits wide. Since you don't really need an exact width of 64 bits (128-bit arithmetic would work just as well), uint_least64_t or uint_fast64_t is probably more suitable. But unsigned long long is arguably simpler, and in this case it will work correctly. (uint64_t is not guaranteed to exist, though on any C99 or later implementation it almost certainly will.)
For larger values, including intermediate results, you'll likely need to use something wider than unsigned long long, which is likely to require some kind of multi-precision arithmetic. The GNU GMP library is one possibility. Another is to use a language that has built-in support for arbitrary-width integer arithmetic (such as Python).
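To make that concrete, here is a minimal sketch of the whole fast-power routine using unsigned long long (the name modpow and the square-and-multiply structure are my own illustration, not code from the question; it assumes m < 2^32 so that every intermediate product fits):

#include <stdio.h>

/* (base^exp) % mod by square-and-multiply; safe while mod < 2^32,
   because every product of two values below 2^32 fits in the
   at-least-64-bit unsigned long long */
static unsigned long long modpow(unsigned long long base,
                                 unsigned long long exp,
                                 unsigned long long mod)
{
    unsigned long long result = 1 % mod;

    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

int main(void)
{
    /* the constants from the question: x^2 % m */
    printf("%llu\n", modpow(3623752876ULL, 2, 4294434817ULL));
    return 0;
}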
This answer is based on the calculation (x * x) % y, although the question is not entirely clear.
Use uint64_t: although unsigned int is large enough to hold the operands and the result, it won't hold the product.
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    unsigned x = 3623752876;
    unsigned m = 4294434817;
    uint64_t r;

    /* cast one operand so the multiplication is done in 64 bits */
    r = ((uint64_t)x * x) % m;
    printf("%u\n", (unsigned)r);
    return 0;
}
Program output:
3896043471
We can use the power of modular arithmetic to do such calculations. The fundamental property of multiplication in modular arithmetic for two numbers a and b states:
(a*b)%m = ((a%m)*(b%m))%m
If m is small enough that (m-1)*(m-1) fits into the long long int type (roughly, if m fits into half of its width), then the above calculation will never overflow. For example, (7 * 8) % 5 = ((7 % 5) * (8 % 5)) % 5 = (2 * 3) % 5 = 1.
Since you are trying to do modular exponentiation, you can read more about it here.
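If m is too large for that to hold (say, close to 2^63), a common workaround, sketched here under the assumption of an at-least-64-bit type and with a helper name (mulmod) of my own choosing, is to build the modular multiplication out of modular additions and doublings so that no intermediate value ever exceeds 2*m:

#include <stdint.h>

/* (a * b) % m without overflow, assuming m <= 2^63 so that
   neither result + a nor a + a below can wrap */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    uint64_t result = 0;

    a %= m;
    while (b > 0) {
        if (b & 1) {
            result += a;
            if (result >= m) result -= m;
        }
        a += a;              /* double a modulo m */
        if (a >= m) a -= m;
        b >>= 1;
    }
    return result;
}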
Related
unsigned long hash(char *str)
{
    unsigned long hash = 5381;
    int c;

    while ((c = *str++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    return hash % NUM_BUCKETS;
}
With this code, when you put 20 letters into the function (such as zzzzzzzzzzzzzzzzzzzzzzzzzzzz) you get a huge number as output. How does the long hold the numbers if it is restricted to only 32 bits?
You should first check that an unsigned long is 32 bits. If you're getting values above (roughly) 4.2 billion, it's almost certainly wider than that (a).
You can check this by compiling and running the following program:
#include <limits.h>
#include <stdio.h>

int main(void) {
    printf("%d\n%zu\n", CHAR_BIT, sizeof(unsigned long));
    return 0;
}
The first value is the number of bits in a byte, the second the number of bytes in an unsigned long. Multiplying the two will therefore give you the number of bits in the unsigned long type.
On my system, I get 8 and 8, indicating a 64-bit size.
(a) The ISO C standard does not mandate an exact size for the original types found in C (though it may for things like uint32_t). In fact it doesn't directly even mandate the number of bits at all.
What it does mandate is the minimum range requirements, which for unsigned long is 0..4294967295 (the 4.2 billion I mentioned before).
However, an implementation is free to provide you with something larger, such as a 128-bit type, which would give you a range from zero up to about 10^38, or a hundred million million million million million million.
As an aside, I could have used billions, trillions, or even quadrillions but:
there's sometimes disagreement as to the actual powers of ten they represent; and
the use of many "million" suffixes drives home the size more than a single rarely-known word like "undecillion" or "sextillion".
unsigned long is at least 32 bits, but it can be larger. It's a 64-bit type with most compilers on most 64-bit processors except under Windows. So a function returning unsigned long can return a value that's larger than 2^32.
However, the function you show is guaranteed to return a number in the range from 0 to NUM_BUCKETS - 1 inclusive. If you see a value that's NUM_BUCKETS or larger, what you're seeing is not a value returned by this function. Maybe there's a bug in your code. Make sure that you've enabled all reasonable warnings on your compiler and that you've resolved them correctly (not by blindly adding a cast). If you still don't understand your program's output, use a debugger and inspect intermediate values. If you still don't understand what your program is doing, ask online, with complete code that reproduces the problem.
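For what it's worth, here is a minimal, self-contained harness around the function (the value of NUM_BUCKETS is my assumption, since the question doesn't show it), demonstrating that every result stays below NUM_BUCKETS:

#include <stdio.h>

#define NUM_BUCKETS 1024   /* assumed value; not shown in the question */

unsigned long hash(char *str)
{
    unsigned long hash = 5381;
    int c;

    while ((c = *str++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    return hash % NUM_BUCKETS;
}

int main(void)
{
    /* both results are guaranteed to be in 0..NUM_BUCKETS-1 */
    printf("%lu\n", hash("zzzzzzzzzzzzzzzzzzzz"));
    printf("%lu\n", hash("hello"));
    return 0;
}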
Original question
I have a piece of code here:
unsigned long int a = 100000;
int a = 100000UL;
Do the above two lines represent the same thing?
Revised question
#include <stdio.h>

int main(void)
{
    long int x = 50000 * 1024000;
    printf("%ld\n", x);
    return 0;
}
For a long int, my compiler uses 8 bytes, so the max range is (2^63-1). So here 50000*1024000 results in something which is definitely less than the max range of long int. So why does my compiler warn of overflow and give the wrong output?
Original question
The two definitions are not the same.
The types of the variables are different — unsigned long versus (signed) int. The behaviour of these types is quite different because of the difference in signedness. They also may have quite different ranges of valid values.
Technically, the numeric constants are different too; the first is a (signed) int unless int cannot hold the value 100,000, in which case it will be (signed) long instead. That will be converted to unsigned long and assigned to the first a. The other constant is an unsigned long value because of the UL integer suffix, and will be converted to int; if int cannot hold the value 100,000, that conversion is implementation-defined. It is legitimate, though very unusual these days, for int to be a 16-bit signed type (sizeof(int) == 2 with CHAR_BIT == 8). That size is normally associated with short, and int is normally a 32-bit signed type, but the standard does not rule out the alternative.
Most likely, the two variants of a both end up holding the value 100,000, but they are not the same because of the difference in signedness.
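A small demonstration (with the variables renamed so both declarations can coexist; on any implementation where int can hold 100,000, both print the same value):

#include <stdio.h>

int main(void)
{
    unsigned long int a1 = 100000;  /* int constant converted to unsigned long */
    int a2 = 100000UL;              /* unsigned long constant converted to int */

    printf("%lu %d\n", a1, a2);     /* prints: 100000 100000 */
    return 0;
}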
Revised question
The arithmetic is done in terms of the two operands of the * operator, and those are 50000 and 1024000. Each of those fits in a 32-bit int, so the calculation is done as int — and the result would be 51200000000, but that requires at least 36 bits to represent the value, so you have 32-bit arithmetic overflow, and the result is undefined behaviour.
After the arithmetic is complete, the int result is converted to 64-bit long — not before.
The compiler is correct to warn, and because you invoked undefined behaviour, anything that is printed is 'correct'.
To fix the code, you can write:
#include <stdio.h>

int main(void)
{
    long x = 50000L * 1024000L;
    printf("%ld\n", x);
    return 0;
}
Strictly, you only need one of the two L suffixes, but symmetry suggests using both. You could use one or two (long) casts instead if you prefer. You can save on spaces too, if you wish, but they help the readability of the code.
The long int and int are not necessarily the same, but they might be. Unsigned and signed are not the same thing. Numerical constants can represent the same value without being the same thing, as in 100000 and 100000UL (the former being a signed int, the latter an unsigned long).
With my compiler, c is 54464 (16 bits truncated) and d is 10176.
But with gcc, c is 120000 and d is 600000.
What is the true behavior? Is the behavior undefined? Or is my compiler wrong?
unsigned short a = 60000;
unsigned short b = 60000;
unsigned long c = a + b;
unsigned long d = a * 10;
Is there an option to alert on these cases?
-Wconversion warns on:
void foo(unsigned long a);
foo(a+b);
but doesn't warn on:
unsigned long c = a + b;
First, you should know that in C the standard integer types do not have a fixed precision (number of representable values); the standard only requires a minimal precision (range) for each type. This results in the following typical bit sizes (the standard allows for larger, more complex representations):
char: 8 bits
short: 16 bits
int: 16 (!) bits
long: 32 bits
long long (since C99): 64 bits
Note: The actual limits (which imply a certain precision) of an implementation are given in limits.h.
Second, the type an operation is performed in is determined by the types of its operands, not by the type of the left side of an assignment (because assignments are also just expressions). For this, the types given above are sorted by conversion rank. Operands with smaller rank than int are converted to int first. For other operands, the one with smaller rank is converted to the type of the other operand. These are the usual arithmetic conversions.
Your implementation seems to use a 16-bit unsigned int, the same size as unsigned short, so a and b are converted to unsigned int and the operation is performed with 16 bits. For unsigned types, the operation is performed modulo 65536 (2 to the power of 16); this is called wrap-around (it is not guaranteed for signed types!). The result is then converted to unsigned long and assigned to the variables.
For gcc, I assume this was compiled for a PC or a 32-bit CPU. For these, (unsigned) int typically has 32 bits, while (unsigned) long is required to have at least 32 bits. So there is no wrap-around for the operations.
Note: For the PC, the operands are converted to int, not unsigned int. This is because int can already represent all values of unsigned short, so unsigned int is not required. This can result in unexpected (actually: undefined) behaviour if the result of the operation overflows a signed int!
If you need types of defined size, see stdint.h (since C99) for uint16_t, uint32_t. These are typedefs to types with the appropriate size for your implementation.
You can also cast one of the operands (not the whole expression!) to the type of the result:
unsigned long c = (unsigned long)a + b;
or, using types of known size:
#include <stdint.h>
...
uint16_t a = 60000, b = 60000;
uint32_t c = (uint32_t)a + b;
Note that due to the conversion rules, casting one operand is sufficient.
Update (thanks to #chux):
The cast shown above works without problems. However, if a has a larger conversion rank than the typecast, this might truncate its value to the smaller type. While this can be easily avoided, as all types are known at compile-time (static typing), an alternative is to multiply by 1 of the wanted type:
unsigned long c = ((unsigned long)1U * a) + b;
This way the larger rank of the type given in the cast or a (or b) is used. The multiplication will be eliminated by any reasonable compiler.
Another approach, which avoids having to know the target type's name at all, uses the typeof() gcc extension:
unsigned long c;
... many lines of code
c = ((typeof(c))1U * a) + b;
a + b will be computed as an unsigned int (the fact that it is assigned to an unsigned long is not relevant). The C standard mandates that this sum wraps around modulo one more than the largest value an unsigned int can hold. On your system, it looks like an unsigned int is 16 bits, so the result is computed modulo 65536.
On the other system, it looks like int and unsigned int are larger, and therefore capable of holding the larger numbers. What happens now is quite subtle (acknowledgement to #PascalCuoq): because all values of unsigned short are representable in int, a + b will be computed as an int. (Only if short and int are the same width, or some values of unsigned short otherwise cannot be represented as int, will the sum be computed as unsigned int.)
Although the C standard does not specify a fixed size for either unsigned short or unsigned int, your program's behaviour is well-defined. Note that this is not true for a signed type, though.
As a final remark, you can use the sized types uint16_t, uint32_t etc. which, if supported by your compiler, are guaranteed to have the specified size.
In C, the types char, short (and their unsigned counterparts) and float should be considered "storage" types: they're designed to optimize storage, but they are not the "native" size that the CPU prefers and they are not used directly for computations.
For example, when you have two char values in an expression, they are first converted to int, and then the operation is performed; the reason is that the CPU works better with int. Traditionally the same happened for float, which in old-style C was always implicitly converted to double for computations (modern C no longer requires this, although float arguments are still promoted to double in variadic calls).
In your code the computation a+b is a sum of two unsigned integers; in C there's no way of computing the sum of two unsigned shorts... what you can do is store the final result in an unsigned short that, thanks to the properties of modulo math, will be the same.
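A short sketch that makes the promotions visible (the printed values are what you'd see under the two int widths discussed above):

#include <stdio.h>

int main(void)
{
    unsigned short a = 60000, b = 60000;

    unsigned long c = a + b;                 /* sum computed in (unsigned) int */
    unsigned long d = (unsigned long)a + b;  /* sum forced to unsigned long */

    printf("c = %lu, d = %lu\n", c, d);
    /* with 32-bit int: c = 120000, d = 120000
       with 16-bit int: c = 54464 (wrapped), d = 120000 */
    return 0;
}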
Why are constants in C suffixed with L or UL, etc.? For example
unsigned long x = 12345678UL;
My question is: what is the significance of this, and are there any advantages to doing it?
Because any number such as 12345 is treated as an int in C. The problem comes when you try to do bitwise operations on it: the result can then overflow (or be shifted out entirely).
This can cause serious, hard-to-trace errors and bugs. To avoid that, when a larger constant number is to be assigned to an (unsigned) long variable, the UL and L suffixes are used.
UL is to tell the compiler to treat the integer token as an unsigned long rather than int
L is to tell the compiler to treat the integer token as long rather than int.
The suffix of an integer constant forces a minimum integer type, but the compiler will choose a larger one (consistent with some constraints on signedness) if the number cannot be represented in it (see C11 6.4.4.1, in particular the table after §5).
If all you do is use the constant to initialize a variable, you don't need any suffix (except for the edge case of a number that is in range of unsigned long long, but not long long - in that case, any of the unsigned suffixes u, ul or ull as well as octal or hexadecimal representation can be used - decimal integer constants without suffix only promote to signed types).
Suffixes become important if you use the constants in more complex expressions because they will determine the result of it, eg
32u << 30
has type unsigned and will truncate the value, whereas
32ull << 30
won't.
I saw the question linked in the comments, but my question is different. I want to know the reason behind doing this.
The reason is overflow:
#include <stdio.h>

int main(void)
{
    unsigned long a = 1U << 32;   // (1U = unsigned int)
    unsigned long b = 1UL << 32;  // (1UL = unsigned long int)

    printf("%lu %lu\n", a, b);
    return 0;
}
On my computer int is 32 bits and long is 64, this is the output:
0 4294967296
This is what happens:
a  00000000000000000000000000000001  1U  (32 bits)
   ^--- << 32: shift count >= width of type, the bit is shifted out (undefined behaviour)
b  0000000000000000000000000000000000000000000000000000000000000001  1UL  (64 bits)
                                  ^--- << 32: the bit lands on bit 32, giving 4294967296
The reason is that C chooses the type of a numeric constant from its value alone (an unsuffixed decimal constant gets the first of int, long and long long that can hold it), never from the context in which it is used, unless the programmer explicitly says otherwise via the various "numerical constant suffixes" (like UL). You can argue this is a design flaw.
Note that these "numerical constant suffixes" only really push the problem to larger numbers, while also causing portability problems. For a simple example, consider uint64_t myNumber = 0xFDECBA9876543210UL; (before C99 this would break on a compiler where long int is only 32 bits; since C99 the constant quietly becomes unsigned long long instead).
For another example, consider uint128_t myNumber = 0xFDECBA9876543210FDECBA9876543210ULL;, which is broken on virtually all compilers (where long long is 64-bit or smaller), and there is no way to do it correctly (as there simply isn't any "128-bit suffix" that can be used).
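One hedged workaround for the 128-bit case, assuming the GCC/Clang extension unsigned __int128 (this is not standard C, and printf has no 128-bit conversion, so the halves are printed separately):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* build the 128-bit constant from two 64-bit halves */
    unsigned __int128 myNumber =
        ((unsigned __int128)UINT64_C(0xFDECBA9876543210) << 64)
        | UINT64_C(0xFDECBA9876543210);

    printf("%016" PRIx64 "%016" PRIx64 "\n",
           (uint64_t)(myNumber >> 64), (uint64_t)myNumber);
    return 0;
}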
I like to initialize my variables to some "dummy" value and have started to use int64_t and uint64_t. So far, it looks like there are at least three ways I could initialize an int64_t to a particular value (and with slight changes for the unsigned equivalent):
int64_t method_one = 0;
int64_t method_two = 0LL;
int64_t method_three = INT64_C(0);
I use GCC and target OS X and Linux. I'd like to pick a method that aims for ease of portability and clarity — but correctness, above all. Am I overthinking this, or is there a "best" or "most recommended" approach for initializing this variable type, for any particular value I throw at it (which is within its bounds, of course)?
int64_t method_one = 0;
...is perfectly reasonable. C99 (see e.g. draft here; yes, I know it's not the most recent standard any more, but it's the one that introduced the int<N>_t types) says that:
the 0 has type int (§6.4.4.1 para.5);
the type of the expression is int64_t (§6.5.16 para.3);
the type of the right-hand side will be converted to the type of the expression (§6.5.16.1 para.2);
this conversion will not change the value (§6.3.1.3 para.1).
So there's nothing wrong with that at all, and the lack of additional clutter makes it the most readable of the options when initialising to 0 or anything else in the range of an int.
int64_t method_two = 0LL;
int64_t is not guaranteed to be the same as long long; however, this should in fact work portably for any signed 64-bit value as well (and similarly ULL for unsigned 64-bit values): long long (and unsigned long long) should be at least 64 bits in a C99-compliant implementation (§5.2.4.2.1), so LL (and ULL) should always be safe for initialising 64-bit values.
int64_t method_three = INT64_C(0);
This is arguably a better option for values which may be outside the range of an int, as it expresses the intent more clearly: INT64_C(n) will expand to something appropriate for any n in (at least) a 64-bit range (see §7.18 in general, and particularly §7.18.4.1).
In practice, I might well use any of the above, depending on context. For example:
uint64_t counter = 0;
(Why add unnecessary clutter?)
uint64_t some_bit = 1ULL << 40;
(1 << 40 simply won't work unless int is unusually wide; and UINT64_C(1) << 40 seems less readable to me here.)
uint64_t some_mask = UINT64_C(0xFF00FF00FF00FF00);
(In this case, explicitly calling out the value as a 64-bit constant seems more readable to me than writing 0xFF00FF00FF00FF00ULL.)
Personally, I would use the third, which is the most portable way to achieve this.
#include <stdint.h>
int64_t method_three = INT64_C(0);
uint64_t method_three = UINT64_C(0);
Anyway, I don't think it's a very important thing.
According to the ANSI C standard, the suffix for a long long int and unsigned long long int is LL and ULL respectively:
octal or hexadecimal constant suffixed by ll or LL: long long int, unsigned long long int
decimal, octal, or hexadecimal constant suffixed by both u or U and ll or LL: unsigned long long int
If you know that int64_t is defined as:
typedef signed long long int int64_t;
Then method two is most definitely the correct one:
int64_t method_two = 0LL;
uint64_t method_two = 0ULL;
Edit:
Keeping in mind the portability issues, and the fact that it's not guaranteed to be defined as long long, then it would be better to use the third method:
INT64_C()
UINT64_C()
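A small usage sketch combining these with the matching printf macros from <inttypes.h>:

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int64_t  s = INT64_C(9000000000);           /* too big for a 32-bit int */
    uint64_t u = UINT64_C(0xFF00FF00FF00FF00);

    /* PRId64/PRIx64 expand to the right conversion specifiers */
    printf("%" PRId64 "\n%" PRIx64 "\n", s, u);
    return 0;
}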