I'm a beginner in the C language.
I suspect it may be due to overflow, but I could not solve this simple exercise:
write a program to compute the sum of the squares of all the natural numbers smaller than 10000
I initially tried:
#include <stdio.h>
int main() {
int a = 10000;
int square(int num) {
return num * num;
};
int total = 0;
while (a > 0) {
a--;
total += square(a);
//printf("a is %d and square is %d and total %d \n", a, square(a), total );
};
printf("total is %d ", total );
return total;
}
result: total is -1724114088
and here is the strange part:
...
a is 9936 and square is 98724096 and total 2063522144
a is 9935 and square is 98704225 and total -2132740927
...
So I tried changing total to long, and tried declaring the square function as long square(int num), but nothing changed.
Could you explain why the sum turns negative?
Is it due to overflow? But why does it not reset to 0 or a positive value, instead of going negative?
How can I know how many bits an int has on a computer that I don't know (e.g. in the cloud)?
E.g. I am coding here: https://www.programiz.com/c-programming/online-compiler/
What is the best practice to fix it?
Do not define functions inside functions.
int main() {
int square() { // NO!
Functions belong at file scope:
int square() { //OK
}
int main() { //OK
}
The code compiles only because some compilers (notably GCC) accept nested functions as an extension. It's not part of the C programming language.
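Putting both fixes together, a corrected sketch of the question's program might look like this (square at file scope, and a 64-bit accumulator, since the total overflows int as discussed below):

#include <stdio.h>

/* square() at file scope, where standard C requires it */
static int square(int num) {
    return num * num;  /* at most 9999 * 9999, which fits in a 32-bit int */
}

int main(void) {
    int a = 10000;
    long long total = 0;  /* at least 64 bits, wide enough for ~3.3e11 */
    while (a > 0) {
        a--;
        total += square(a);
    }
    printf("total is %lld\n", total);
    return 0;
}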
Could you explain why the sum turns negative?
See, for example, why the value of sum is coming out to be negative? and other questions. The sum "wraps around" on your platform.
Is it due to overflow?
Yes.
But why does it not reset to 0 or a positive value, instead of going negative?
Because systems nowadays use two's complement, it is simpler to implement a single hardware instruction for adding numbers than two separate instructions with special overflow semantics. Unsigned and signed two's-complement numbers behave identically under addition, so instead of giving overflow special semantics, signed numbers are added exactly as unsigned ones would be (the bits are simply added) and the result is then interpreted as a signed number (in a C program). Because the most significant bit becomes set, the number becomes negative.
Anyway, the compiler just does not care: because signed overflow is undefined behavior, it does not have to. The compiler simply generates a hardware instruction for signed addition, which behaves as explained above.
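You can reproduce the exact numbers from the question's trace with well-defined unsigned arithmetic and then reinterpret the bits (a minimal sketch; the conversion back to a signed type is what a typical two's-complement machine does):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    uint32_t total = 2063522144u;  /* the running total after adding 9936^2 */
    uint32_t next  = 98704225u;    /* 9935 * 9935 */
    uint32_t bits  = total + next; /* unsigned addition wraps; well-defined */
    /* The same bit pattern, read as a signed 32-bit integer, is the
     * negative value -2132740927 seen in the question's output. */
    printf("as unsigned = %" PRIu32 ", as signed = %" PRId32 "\n",
           bits, (int32_t)bits);
    return 0;
}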
How can I know how many bits an int has on a computer that I don't know?
You can check your compiler documentation.
But usually it's simpler to compile and run a short C program that uses CHAR_BIT (the number of bits in a byte) and sizeof(int) (the number of bytes in an int), and inspect its output. For example, a program such as:
#include <stdio.h>
#include <limits.h>
int main() {
printf("There are %d bits in int\n", (int)sizeof(int) * CHAR_BIT);
}
Note that the number of bits in a type does not only change with the platform and operating system; it can also change with the compiler, the compiler version, and the compilation options.
What is the best practice to fix it?
This depends on what behavior you want.
To calculate bigger values, use a bigger data type: long or long long. When the built-in types are not enough, move your program to a big-number library.
If you want to terminate the program in case of problems, you can check for overflow before each operation and call abort() or similar if it would happen.
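A minimal sketch of such a checked addition (the helper name is illustrative, not a standard API):

#include <limits.h>
#include <stdlib.h>

/* Adds term to total, aborting instead of silently wrapping. */
long long checked_add(long long total, long long term) {
    if (term > 0 && total > LLONG_MAX - term)
        abort();  /* would overflow upward */
    if (term < 0 && total < LLONG_MIN - term)
        abort();  /* would overflow downward */
    return total + term;
}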
Instead, you could have used a formula:
sum of squares of the first N natural numbers = (N * (N + 1) * (2 * N + 1)) / 6
For now, let N be 10000.
Even ignoring the division by 6 in the formula, the sum of squares is on the order of 10^12. It will not fit in a 32-bit integer. You should use a data type that can accommodate bigger values, like long or long long int.
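As a cross-check, a minimal sketch that evaluates the formula directly, with N = 9999 (the largest value the question's loop actually squares; long long is guaranteed at least 64 bits):

#include <stdio.h>

int main(void) {
    long long N = 9999;
    /* closed form for 0^2 + 1^2 + ... + N^2 */
    long long total = N * (N + 1) * (2 * N + 1) / 6;
    printf("total is %lld\n", total);  /* 333283335000 */
    return 0;
}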
Here's the modified code.
#include <stdio.h>
int main() {
int a = 10000;
int square(int num) {
return num * num;
};
// Change int to long long int
long long int total = 0;
while (a > 0) {
a--;
total += square(a);
//printf("a is %d and square is %d and total %d \n", a, square(a), total );
};
// Change %d to %lld
printf("total is %lld ", total );
return total;
}
You'll need to change all uses of int to long (this assumes long is 64 bits, as it is on most 64-bit non-Windows platforms):
#include <stdio.h>
int main() {
long a = 10000;
long square(long num) {
return num * num;
};
long total = 0;
while (a > 0) {
a--;
total += square(a);
//printf("a is %ld and square is %ld and total %ld \n", a, square(a), total );
};
printf("total is %ld ", total );
return 0;
}
which prints total is 333283335000
EDIT
Or you could just change the total, the return type of square, and perform the appropriate casts when computing the squared values:
#include <stdio.h>
int main() {
int a = 10000;
long square(int num) {
return (long)num * (long)num;
};
long total = 0;
while (a > 0) {
a--;
total += square(a);
//printf("a is %ld and square is %ld and total %ld \n", a, square(a), total );
};
printf("total is %ld ", total );
return 0;
}
Produces the same result shown above.
I'm implementing my own decrease-and-conquer method for computing a^n.
Here's the program:
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
double dncpow(int a, int n)
{
double p = 1.0;
if(n != 0)
{
p = dncpow(a, n / 2);
p = p * p;
if(n % 2)
{
p = p * (double)a;
}
}
return p;
}
int main()
{
int a;
int n;
int a_upper = 10;
int n_upper = 50;
int times = 5;
time_t t;
srand(time(&t));
for(int i = 0; i < times; ++i)
{
a = rand() % a_upper;
n = rand() % n_upper;
printf("a = %d, n = %d\n", a, n);
printf("pow = %.0f\ndnc = %.0f\n\n", pow(a, n), dncpow(a, n));
}
return 0;
}
My code works for small values of a and n, but a mismatch in the output of pow() and dncpow() is observed for inputs such as:
a = 7, n = 39
pow = 909543680129861204865300750663680
dnc = 909543680129861348980488826519552
I'm pretty sure that the algorithm is correct, but dncpow() is giving me wrong answers.
Can someone please help me rectify this? Thanks in advance!
It's as simple as this: these numbers are too large for your computer to represent exactly in a single variable. With a floating-point type, the exponent is stored separately, so it is still possible to represent a number near the true value by dropping the lowest bits of the mantissa.
Regarding this comment:
I'm getting similar outputs upon replacing 'double' with 'long long'. The latter is supposed to be stored exactly, isn't it?
If you call a function taking double, it won't magically operate on long long instead. Your value is simply converted to double and you'll just get the same result.
Even with a function handling long long (which has 64 bits on today's typical platforms), you can't deal with such large numbers: 64 bits aren't enough to store them. With an unsigned integer type, they will just "wrap around" on overflow. With a signed integer type, the behavior on overflow is undefined (but still somewhat likely a wrap-around). Either way, you'll get some number that has nothing to do with your expected result, which is arguably worse than the floating-point result that is merely imprecise.
For exact calculations on large numbers, the only way is to store them in an array (typically of unsigned integers like uintmax_t) and implement all the arithmetic yourself. That's a nice exercise, and a lot of work, especially when performance is of interest (the "naive" arithmetic algorithms are typically very inefficient).
For a real-life program, you won't reinvent the wheel here, as there are libraries for handling large numbers. The arguably best known is libgmp. Read the manuals there and use it.
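For instance, a minimal GMP sketch computing 7^39 exactly, the very case where the double-based dncpow() lost precision (compile with -lgmp):

#include <stdio.h>
#include <gmp.h>

int main(void) {
    mpz_t result;
    mpz_init(result);
    mpz_ui_pow_ui(result, 7, 39);        /* result = 7^39, exact */
    gmp_printf("7^39 = %Zd\n", result);  /* every digit is correct */
    mpz_clear(result);
    return 0;
}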
Given this simple random generator:
int i, r = 0;
for (i = 0; i < 50; i++) {
r = (1234 * r + 101) % (11000000);
printf("%d\n", r);
}
Surprisingly, I get negative values!
101
124735
10923091
192507
6553739
-7620565
-10842517
-10763989
-1860437
8188139
Aren't the values supposed to be positive? Can someone explain this?
You get negative values because your program has integer arithmetic overflows. The behavior is actually undefined for the signed type int. You should use a larger type to avoid this. Type unsigned long long is guaranteed to have at least 64 value bits, which is enough for the maximum intermediate result, 1234 * 10999999 + 101.
int i;
unsigned long long r = 0;
for (i = 0; i < 50; i++) {
r = (1234 * r + 101) % 11000000;
printf("%llu\n", r);
}
rici commented that r does not need to be a larger type since its value is in the range 0..10999999. This is not completely true, as type int may be too small to handle such values: the range of int can be as small as -32767..32767.
Nevertheless, the intermediate computation must be performed with a larger type to avoid arithmetic overflow. Here is the corresponding code:
int i, r = 0; // assuming 32-bit ints
for (i = 0; i < 50; i++) {
r = (1234ULL * r + 101) % 11000000;
printf("%d\n", r);
}
As you've seen in other answers, this behavior is due to overflow.
If you want to be able to detect stuff like this earlier, use gcc or clang's Undefined Behavior Sanitizer (UBSan).
$ /opt/clang+llvm-4.0.0-armv7a-linux-gnueabihf/bin/clang -fsanitize=undefined don.c
$ ./a.out
don.c:8:18: runtime error: signed integer overflow: 1234 * 10923091 cannot be represented in type 'int'
don.c, line 8, column 18 is the multiplication in this line: r = (1234*r +101) % (11000000);.
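gcc accepts the same option, e.g. gcc -fsanitize=undefined don.c, and prints a similar runtime diagnostic.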
You have to be careful, as your code produces overflows even if you switch to unsigned arithmetic.
Your int variable is probably a 32-bit integer, which overflows past 2,147,483,647. If you consider the worst case of your computation, you get 1,234 * 10,999,999 + 101 = 13,573,998,867 before the modulus operation is applied, and this leads to the error.
The best thing you can do is use a 64-bit number for this kind of calculation so it does not overflow, as in this sample code (you'll see different results, even for the outputs that used to be positive):
$ cat pru.c
#include <stdio.h>
#include <inttypes.h> /* uint64_t plus the PRIu64 printf macro */
int main()
{
uint64_t i, r=0;
for (i = 0; i < 50; i++) {
r = (1234*r +101) % (11000000);
printf("%llu\n", r);
}
}
which results in:
$ pru
101
124735
10923091
4094395
3483531
8677355
4856171
8515115
2652011
5581675
1787051
5221035
7757291
2497195
1538731
6794155
1987371
10415915
5239211
8186475
4110251
1049835
8496491
1669995
3773931
4030955
2198571
7036715
4306411
1111275
7313451
4798635
3515691
4362795
4689131
387755
5489771
9377515
10853611
6356075
396651
5467435
3814891
10575595
4284331
6864555
860971
6438315
2880811
1920875
This is correct: 1,234 * 10,999,999 + 101 = 13,573,998,867 (the largest intermediate result you can get) will never overflow a uint64_t, so the program produces correct results.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int power(int first,int second) {
int counter1 = 0;
long ret = 1;
while (counter1 != second){
ret *= first;
counter1 += 1;
}
return ret;
}
int main(int argc,char **argv) {
long one = atol(argv[1]);
long two = atol(argv[2]);
char word[30];
long finally;
printf("What is the operation? 'power','factorial' or 'recfactorial'\n");
scanf("%20s",word);
if (strcmp("power",word) == 0){
finally = power(one,two);
printf("%ld\n",finally);
return 0;
}
}
This function is intended to do the "power of" operation like on the calculator, so if I write: ./a.out 5 3 it will give me 5 to the power of 3 and print out 125
The problem is, in cases where the numbers are like: ./a.out 20 10, 20 to the power of 10, I expect to see the result of: 1.024 x 10^13, but it instead outputs 797966336.
What is the cause of the current output I am getting?
Note: I assume that this has something to do with the atol() and long data types. Are these not big enough to store the information? If not, any idea how to make it run for bigger numbers?
Sure, your inputs are long, but your power function takes and returns int! Apparently, that's 32-bit on your system … so, on your system, 1.024×10^13 is more than int can handle.
Make sure that you pick a type that's big enough for your data, and use it consistently. Even long may not be enough — check your system!
First and foremost, you need to change the return type and the parameter types of power() from int to long. Otherwise, on a system where long and int have different sizes,
the long arguments you pass may get truncated to int;
the returned value will be converted to int before returning, which can truncate the actual value.
After that, 1.024×10^13 (10240000000000) cannot be held by an int or by a 32-bit long. You need to use a data type with more width, like long long.
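A minimal corrected sketch along those lines (long long is guaranteed at least 64 bits, so 20^10 = 10240000000000 fits):

#include <stdio.h>
#include <stdlib.h>

/* power() reworked to take and return long long */
long long power(long long base, long long exp) {
    long long ret = 1;
    for (long long i = 0; i < exp; i++)
        ret *= base;
    return ret;
}

int main(int argc, char **argv) {
    if (argc < 3)
        return 1;
    long long one = atoll(argv[1]);
    long long two = atoll(argv[2]);
    printf("%lld\n", power(one, two));  /* ./a.out 20 10 -> 10240000000000 */
    return 0;
}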
one and two are long.
long one = atol(argv[1]);
long two = atol(argv[2]);
You call this function with them
int power(int first, int second);
But your function takes int, so there is an implicit conversion here, and it returns int as well. Your long values are converted to int, and the arithmetic then overflows int, which is undefined behaviour (see comments).
Quick answer:
The values of your power function get implicitly converted.
Change the function parameters to a type other than int that can hold larger values; one possible type would be long.
The input values get converted, and possibly truncated, to match the parameters of your function.
The result of the computation in the body of the function is again converted to match the return type, in your case int, which cannot hold values of this size.
Note 1: as the more experienced members have noted, the exact sizes are machine-specific; on your machine int is evidently too narrow for these values.
To make the answer complete:
the code mixes int and long and hopes for an answer that exceeds the range of long.
The answer is simply the result of trying to put 10 pounds of potatoes in a 5-pound sack.
... idea how to make it run for bigger numbers.
Use the widest integer available. Examples: uintmax_t, unsigned long long.
From C99 onward, the greatest representable integer will normally be UINTMAX_MAX.
#include <stdint.h>
uintmax_t power_a(long first, long second) {
long counter1 = 0;
uintmax_t ret = 1;
while (counter1 != second){ // number of iterations could be in the billions
ret *= first;
counter1 += 1;
}
return ret;
}
But let us avoid problematic behavior with negative numbers and improve the efficiency of the calculation from linear to logarithmic.
// return x raised to the y power
uintmax_t pow_jululu(unsigned long x, unsigned long y) {
uintmax_t z = 1;
uintmax_t base = x;
while (y) { // max number of iterations is the bit width of y: e.g. 64
if (y & 1) {
z *= base;
}
y >>= 1;
base *= base;
}
return z;
}
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc,char **argv) {
assert(argc >= 3);
unsigned long one = strtoul(argv[1], 0, 10);
unsigned long two = strtoul(argv[2], 0, 10);
uintmax_t finally = pow_jululu(one,two);
printf("%ju\n",finally);
return 0;
}
This approach has limits too. 1) z *= base can mathematically overflow for calls like pow_jululu(2, 1000). 2) base*base may mathematically overflow in the uncommon situation where unsigned long is more than half the width of uintmax_t. 3) some other nuances too.
If the built-in types are not enough, resort to others, e.g. long double or arbitrary-precision arithmetic; that is likely beyond the scope of this simple task.
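If you would rather detect case (1) than wrap silently, here is a hedged sketch of a checked multiply (the helper name is illustrative, not a standard API):

#include <stdint.h>
#include <stdbool.h>

/* Multiplies *z by base, returning false instead of wrapping
 * when the product would not fit in uintmax_t. */
static bool checked_mul(uintmax_t *z, uintmax_t base) {
    if (base != 0 && *z > UINTMAX_MAX / base)
        return false;  /* product would exceed uintmax_t */
    *z *= base;
    return true;
}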
You could use a long long, which is typically 8 bytes long, instead of the 4 bytes that long and int occupy on many platforms.
A 64-bit long long provides values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807, which should just about cover every value you may encounter here.
I do not understand why the sizeof operator is producing the following results:
sizeof( 2500000000 ) // => 8 (8 bytes).
... it returns 8, and when I do the following:
sizeof( 1250000000 * 2 ) // => 4 (4 bytes).
... it returns 4, rather than 8 (which is what I expected). Can someone clarify how sizeof determines the size of an expression (or data type) and why in my specific case this is occurring?
My best guess is that the sizeof operator is a compile-time operator.
Bounty Question: Is there a run time operator that can evaluate these expressions and produce my expected output (without casting)?
2500000000 doesn't fit in an int, so the compiler correctly gives it the type long (or long long: the first type in which it fits). 1250000000 does fit, and so does 2. The operand of sizeof isn't evaluated; only its type matters, and the type of int * int is int, so sizeof returns the size of an int.
Also, even if the operand were evaluated, the multiplication would overflow (undefined behavior), but its type would still be int, so the result would most likely still be 4.
Here:
#include <iostream>
int main()
{
long long x = 1250000000 * 2;
std::cout << x;
}
can you guess the output? If you think it's 2500000000, you'd be wrong. The type of the expression 1250000000 * 2 is int, because the operands are int and int and multiplication isn't automagically promoted to a larger data type if it doesn't fit.
http://ideone.com/4Adf97
So here, gcc says it's -1794967296, but it's undefined behavior, so that could be any number. This number does fit into an int.
In addition, if you cast one of the operands to the expected type (much like you cast integers when dividing if you're looking for a non-integer result), you'll see this working:
#include <iostream>
int main()
{
long long x = (long long)1250000000 * 2;
std::cout << x;
}
yields the correct 2500000000.
[Edit: I did not notice, initially, that this was posted as both C and C++. I'm answering only with respect to C.]
Answering your follow-up question, "Is there any way to determine the amount of memory allocated to an expression or variable at run time?": well, not exactly. The problem is that this is not a very well-formed question.
"Expressions", in C-the-language (as opposed to some specific implementation), don't actually use any memory. (Specific implementations need some code and/or data memory to hold calculations, depending on how many results will fit into CPU registers and so on.) If an expression result is not stashed away in a variable, it simply vanishes (and the compiler can often omit the run-time code to calculate the never-saved result). The language doesn't give you a way to ask about something it doesn't assume exists, i.e., storage space for expressions.
Variables, on the other hand, do occupy storage (memory). The declaration for a variable tells the compiler how much storage to set aside. Except for C99's Variable Length Arrays, though, the storage required is determined purely at compile time, not at run time. This is why sizeof x is generally a constant-expression: the compiler can (and in fact must) determine the value of sizeof x at compile time.
C99's VLAs are a special exception to the rule:
void f(int n) {
char buf[n];
...
}
The storage required for buf is not (in general) something the compiler can find at compile time, so sizeof buf is not a compile-time constant. In this case, buf actually is allocated at run time and its size is only determined then. So sizeof buf is a runtime-computed expression.
For most cases, though, everything is sized up front, at compile time, and if an expression overflows at run time, the behavior is undefined, implementation-defined, or well-defined depending on the type. Signed integer overflow, as in 1.25 billion multiplied by 2 when INT_MAX is just a little over 2.1 billion, results in "undefined behavior". Unsigned integers do modular arithmetic and thus allow you to calculate in GF(2^k).
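For instance, unsigned wrap-around is fully defined by the standard:

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* Unsigned arithmetic is defined to wrap modulo 2^N,
     * where N is the width of the type, so this is not UB. */
    unsigned int u = UINT_MAX;
    printf("%u\n", u + 1u);  /* prints 0 */
    return 0;
}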
If you want to make sure some calculation cannot overflow, that's something you have to calculate yourself, at run time. This is a big part of what makes multiprecision libraries (like gmp) hard to write in C—it's usually a lot easier, as well as faster, to code big parts of that in assembly and take advantage of known properties of the CPU (like overflow flags, or double-wide result-register-pairs).
Luchian answered it already. Just to complete it:
the C11 Standard states (the C++ standard has similar wording) that the type of an integer literal with no suffix designating the type is determined as follows.
From 6.4.4 Constants (C11 draft):
Semantics
4 The value of a decimal constant is computed base 10; that of an
octal constant, base 8; that of a hexadecimal constant, base 16. The
lexically first digit is the most significant.
5 The type of an integer constant is the first of the corresponding
list in which its value can be represented.
And the table is as follows:
Decimal constant:
int
long int
long long int
Octal or hexadecimal constant:
int
unsigned int
long int
unsigned long int
long long int
unsigned long long int
For octal and hexadecimal constants, even unsigned types are possible. So, depending on your platform, whichever type in the above list (int, long int, or long long int) fits first, in that order, will be the type of the integer literal.
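A small sketch of the effect, assuming a typical platform where int is 32 bits and long is 64 bits (e.g. LP64):

#include <stdio.h>

int main(void) {
    /* 2147483647 fits in a 32-bit int; 2147483648 does not,
     * so the latter gets type long int per the list above. */
    printf("%zu\n", sizeof(2147483647));   /* 4 */
    printf("%zu\n", sizeof(2147483648));   /* 8 */
    return 0;
}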
Another way to put the answer is to say that what is relevant to sizeof is not the value of the expression but its type. sizeof returns the memory size of a type that can be provided either explicitly, as a type name, or implicitly, as an expression. In the latter case, the compiler computes the type at compile time without actually evaluating the expression (following known rules; for instance, if you call a function, the resulting type is the type of the returned value).
As another poster stated, there is an exception for variable-length arrays (whose size is only known at run time).
In other words, you usually write things like sizeof(type) or sizeof expression, where the expression is an lvalue. The expression is almost never a complex computation (like the contrived example of calling a function above): that would be useless anyway, as it is not evaluated.
#include <stdio.h>
int main(){
struct Stype {
int a;
} svar;
printf("size=%d\n", sizeof(struct Stype));
printf("size=%d\n", sizeof svar);
printf("size=%d\n", sizeof svar.a);
printf("size=%d\n", sizeof(int));
}
Also notice that since sizeof is a language keyword, not a function, parentheses are not necessary before a trailing expression (the same kind of rule applies to the return keyword).
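One consequence worth showing: the operand of sizeof is not evaluated (except for variable-length arrays), so side effects inside it never happen:

#include <stdio.h>

int main(void) {
    int i = 0;
    size_t s = sizeof(i++);  /* i++ is never executed */
    printf("s = %zu, i = %d\n", s, i);  /* i is still 0 */
    return 0;
}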
For your follow-up question, there's no "operator", and there's no difference between the "compile time" size of an expression, and the "run time" size.
If you want to know if a given type can hold the result you're looking for, you can always try something like this:
#include <stdio.h>
#include <limits.h>
int main(void) {
int a = 1250000000;
int b = 2;
if ( (INT_MAX / (double) b) > a ) {
printf("int is big enough for %d * %d\n", a, b);
} else {
printf("int is not big enough for %d * %d\n", a, b);
}
if ( (LONG_MAX / (double) b) > a ) {
printf("long is big enough for %d * %d\n", a, b);
} else {
printf("long is not big enough for %d * %d\n", a, b);
}
return 0;
}
and a (slightly) more general solution, just for larks:
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
/* 'gssim' is 'get size of signed integral multiplication' */
size_t gssim(long long a, long long b);
int same_sign(long long a, long long b);
int main(void) {
printf("size required for 127 * 1 is %zu\n", gssim(127, 1));
printf("size required for 128 * 1 is %zu\n", gssim(128, 1));
printf("size required for 129 * 1 is %zu\n", gssim(129, 1));
printf("size required for 127 * -1 is %zu\n", gssim(127, -1));
printf("size required for 128 * -1 is %zu\n", gssim(128, -1));
printf("size required for 129 * -1 is %zu\n", gssim(129, -1));
printf("size required for 32766 * 1 is %zu\n", gssim(32766, 1));
printf("size required for 32767 * 1 is %zu\n", gssim(32767, 1));
printf("size required for 32768 * 1 is %zu\n", gssim(32768, 1));
printf("size required for -32767 * 1 is %zu\n", gssim(-32767, 1));
printf("size required for -32768 * 1 is %zu\n", gssim(-32768, 1));
printf("size required for -32769 * 1 is %zu\n", gssim(-32769, 1));
printf("size required for 1000000000 * 2 is %zu\n", gssim(1000000000, 2));
printf("size required for 1250000000 * 2 is %zu\n", gssim(1250000000, 2));
return 0;
}
size_t gssim(long long a, long long b) {
size_t ret_size;
if ( same_sign(a, b) ) {
if ( (CHAR_MAX / (long double) b) >= a ) {
ret_size = 1;
} else if ( (SHRT_MAX / (long double) b) >= a ) {
ret_size = sizeof(short);
} else if ( (INT_MAX / (long double) b) >= a ) {
ret_size = sizeof(int);
} else if ( (LONG_MAX / (long double) b) >= a ) {
ret_size = sizeof(long);
} else if ( (LLONG_MAX / (long double) b) >= a ) {
ret_size = sizeof(long long);
} else {
ret_size = 0;
}
} else {
if ( (SCHAR_MIN / (long double) llabs(b)) <= -llabs(a) ) {
ret_size = 1;
} else if ( (SHRT_MIN / (long double) llabs(b)) <= -llabs(a) ) {
ret_size = sizeof(short);
} else if ( (INT_MIN / (long double) llabs(b)) <= -llabs(a) ) {
ret_size = sizeof(int);
} else if ( (LONG_MIN / (long double) llabs(b)) <= -llabs(a) ) {
ret_size = sizeof(long);
} else if ( (LLONG_MIN / (long double) llabs(b)) <= -llabs(a) ) {
ret_size = sizeof(long long);
} else {
ret_size = 0;
}
}
return ret_size;
}
int same_sign(long long a, long long b) {
if ( (a >= 0 && b >= 0) || (a <= 0 && b <= 0) ) {
return 1;
} else {
return 0;
}
}
which, on my system, outputs:
size required for 127 * 1 is 1
size required for 128 * 1 is 2
size required for 129 * 1 is 2
size required for 127 * -1 is 1
size required for 128 * -1 is 1
size required for 129 * -1 is 2
size required for 32766 * 1 is 2
size required for 32767 * 1 is 2
size required for 32768 * 1 is 4
size required for -32767 * 1 is 2
size required for -32768 * 1 is 2
size required for -32769 * 1 is 4
size required for 1000000000 * 2 is 4
size required for 1250000000 * 2 is 8
Yes, sizeof() doesn't calculate the memory required for the result of that multiplication.
In the second case, both literals, 1250000000 and 2, each require 4 bytes of memory, hence sizeof returns 4. If one of the values had been above 2147483647 (2^31 - 1, the typical INT_MAX for a decimal constant), you would have gotten 8.
But I don't know how sizeof returned 8 for 2500000000; it returns 4 on my VS2012 compiler.
The C11 Draft is here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
You can find the Cx0 draft here: http://c0x.coding-guidelines.com/6.5.3.4.html
In both cases, section 6.5.3.4 is what you are looking for. Basically, your problem boils down to this:
// Example 1:
long long x = 2500000000;
int size = sizeof(x); // returns 8
// Example 2:
int x = 1250000000;
int y = 2;
int size = sizeof(x * y); // returns 4
In example 1, you have a long long (8 bytes), so it returns 8. In example 2, you have an int * int which returns an int, which is 4 bytes (so it returns 4).
To answer your bounty question: yes and no. sizeof will not calculate the size needed for the operation you are trying to perform, but it will tell you the size of the result if you perform the operation with the proper types:
long long x = 1250000000;
int y = 2;
int size = sizeof(x * y); // returns 8
// Alternatively
int size = sizeof(1250000000LL * 2); // returns 8
You have to tell it you are dealing with a large number or it will assume it is dealing with the smallest type it can (which in this case is int).
The simplest answer in one line:
sizeof is evaluated at COMPILE TIME; its operand only contributes a C type, and the value of the expression is completely ignored.
FURTHER DETAIL: ...therefore, when 2500000000 is compiled, it has to be stored as a long, as it is too large to fit in an int, so that argument is simply compiled as type long. However, 1250000000 and 2 both fit in type int, so that is the type passed to sizeof; since the compiler is only interested in the type, the multiplication is never evaluated and its result is never stored.