Comparing unsigned and signed int - c

I guess this is one of the classical questions.
As far as I know, a comparison between an unsigned and a signed int is performed using unsigned arithmetic, which means that a value of -1 becomes the unsigned maximum for 32 bits.
The code can be fixed either by declaring length to be an int, or by changing the test of the for loop to i < length.
Declaring length as an int is easy to understand, but why changing the test to i < length works is not.
Suppose we have the comparison 5 < -1. Performed using unsigned arithmetic, on my computer it becomes 5 < 4294967295. How can this be a solution? It seems like the loop will access undefined elements.
Code
float sum_elements(float a[], unsigned length)
{
int i;
float result = 0;
for (i = 0; i <= length-1; i++)
result += a[i];
return result;
}

Consider the condition.
i <= length-1
As you mentioned, if length is zero, length-1 wraps around to the largest unsigned value and you end up with a comparison like 5 <= 4294967295.
Changing the condition to "i < length" prevents this.
Changing the type of "i" to "unsigned" also makes sense because (a) it is an array index and (b) you are comparing it with an "unsigned".
So I would prefer this code.
float sum_elements(float a[], unsigned length)
{
unsigned i = 0;
//float result = 0.0; //Refer comment section.
double result = 0.0;
for (i = 0; i < length; i++)
result += (double)a[i];
return result;
}
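To see why the original test misbehaves for an empty array, here is a small sketch that isolates the two loop conditions. It is not taken from the question: the helper names and the cap parameter are made up for illustration, and the cap only exists so that the broken condition still terminates.

```c
/* Hypothetical helper: counts how often the ORIGINAL test
   `i <= length - 1` lets the loop body run, stopping after `cap` rounds. */
unsigned count_iterations_le(unsigned length, unsigned cap)
{
    unsigned count = 0;
    int i;
    /* length - 1 is computed in unsigned arithmetic, so for length == 0
       it wraps to the largest unsigned value. The cast spells out the
       conversion the compiler applies to i implicitly. */
    for (i = 0; (unsigned)i <= length - 1 && count < cap; i++)
        count++;
    return count;
}

/* Hypothetical helper: the same count for the corrected test `i < length`. */
unsigned count_iterations_lt(unsigned length, unsigned cap)
{
    unsigned count = 0;
    unsigned i;
    for (i = 0; i < length && count < cap; i++)
        count++;
    return count;
}
```

With length == 0 the first helper runs until it hits the cap, while the second one does not run at all. For any non-zero length both agree.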

Option #1:
for (i = 0; i <= (int)length-1; i++)
Option #2:
for (i = 0; i+1 <= length; i++)
Option #3:
for (i = 0; i < length; i++)

This is the compiler's job. When it parses your code, it keeps a table of your variables and their types. If it sees something like:
float a = b + 60
the 60 will be converted to 60.0 by the compiler.
I think the same thing happens here:
(unsigned int)length - (int)1
becomes:
(unsigned int)length - (unsigned int)1;
If you want the compiler to warn about comparisons between signed and unsigned operands, use the -Wextra flag (with GCC it enables -Wsign-compare for C code).
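The implicit conversion described above can be made visible with a short sketch. The helper names are hypothetical; the explicit cast stands in for what the compiler does on its own when an int meets an unsigned int in the same expression.

```c
#include <limits.h>

/* Hypothetical helper: what an int operand turns into when it is
   converted to unsigned (the value is reduced modulo UINT_MAX + 1). */
unsigned converted_operand(int value)
{
    return (unsigned)value;
}

/* Hypothetical helper: the mixed comparison `a < b` as it is actually
   evaluated, with a converted to unsigned first. */
int mixed_less_than(int a, unsigned b)
{
    return converted_operand(a) < b;
}
```

Under this rule -1 becomes UINT_MAX, so a test such as -1 < 1u is false even though it looks true on paper.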

A pedantic <= compare of an int against an unsigned would test for negativity first.
for (i = 0; i < 0 || ((unsigned) i) <= length-1; i++)
Removing the -1 helps to avoid overflow.
for (i = 0; i < 0 || ((unsigned) i) < length; i++)
A good compiler will likely optimize the code so 2 compares are not actually in the executable.
If -Wsign-conversion or its equivalent compiler option is not used, drop the cast for cleaner code (as suggested by @R..).
for (i = 0; i < 0 || i < length; i++)
As @chqrlie rightly commented, the comparison may work well, but subsequent operations on i may be a problem. In particular, when i == INT_MAX, the i++ is UB.
Better to use size_t (an unsigned type) for array size computation and indexing.
float sum_elements(float a[], size_t length) {
float result = 0;
size_t i;
for (i = 0; i < length; i++)
result += a[i];
return result;
}
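The negative-first test from this answer can also be wrapped in a small helper. This is only a sketch with a made-up name, meant for cases where the index really has to stay an int.

```c
/* Hypothetical helper: value-correct "a <= b" for an int and an unsigned. */
int int_le_unsigned(int a, unsigned b)
{
    /* A negative a is below every unsigned b. Otherwise a is
       non-negative and the cast to unsigned preserves its value. */
    return a < 0 || (unsigned)a <= b;
}
```

For example, int_le_unsigned(-1, 0u) is true, whereas the plain expression -1 <= 0u is false because -1 is converted to a huge unsigned value.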

Your code will not perform as expected in 2 cases:
if length == 0, then length - 1, computed using unsigned arithmetic, is a very large number, and the comparison i <= length - 1 is always true because it is also performed using unsigned arithmetic.
if length is larger than the maximum int value, i can never reach such a value. The comparison performed using unsigned arithmetic will work as expected, but the indexing a[i] will be incorrect on 64-bit systems, where the negative index will point outside the array.
The compiler correctly diagnoses a real problem. Using a signed type for i and comparing that to an unsigned length expression can lead to unexpected behavior. Correct the problem this way:
float sum_elements(float a[], unsigned length) {
double result = 0.0;
for (unsigned i = 0; i < length; i++) {
result += a[i];
}
return result;
}
Notes:
the types for length and i really should be size_t, as this may be a larger type than unsigned.
the sum should be computed using double arithmetic to achieve better precision than with float. Precision will be better, but still limited. Summing the array elements in a different order can produce a different result.
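The two notes about precision and ordering can be illustrated with a sketch. The function names are hypothetical, and the input values in the usage note were picked so that the float accumulator visibly drops a small term.

```c
#include <stddef.h>

/* Hypothetical helper: accumulates in float, as in the question. */
float sum_float(const float a[], size_t length)
{
    float result = 0.0f;
    for (size_t i = 0; i < length; i++)
        result += a[i];
    return result;
}

/* Hypothetical helper: accumulates in double and rounds once at the end. */
float sum_double(const float a[], size_t length)
{
    double result = 0.0;
    for (size_t i = 0; i < length; i++)
        result += a[i];
    return (float)result;
}
```

Summing {1e8f, 1.0f, -1e8f} with sum_float loses the 1.0f, because 100000001 is not representable as a float, whereas sum_double keeps it. Reordering the same values as {1e8f, -1e8f, 1.0f} lets even sum_float return 1.0f, which is the order dependence mentioned above.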

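You can also lose the i variable. This may save a little stack space and make the function slightly faster, although an optimizing compiler will likely generate similar code either way.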
float sum_elements(float a[], unsigned length)
{
float result = 0;
while (length--)
result += *a++;
return result;
}

Related

Conversion of string constant to numeric value using C

I have written a C program which uses two different algorithms to convert a string constant representing a numeric value to its integer value. For some reason, the first algorithm, atoi(), doesn't execute properly on large values, while the second algorithm, atoi_imp(), works fine. Is this an optimization issue or some other error? The problem is that the first function causes the program's process to terminate with an error.
#include <stdio.h>
#include <string.h>
unsigned long long int atoi(const char[]);
unsigned long long int atoi_imp(const char[]);
int main(void) {
printf("%llu\n", atoi("9417820179"));
printf("%llu\n", atoi_imp("9417820179"));
return 0;
}
unsigned long long int atoi(const char str[]) {
unsigned long long int i, j, power, num = 0;
for (i = strlen(str) - 1; i >= 0; --i) {
power = 1;
for (j = 0; j < strlen(str) - i - 1; ++j) {
power *= 10;
}
num += (str[i] - '0') * power;
}
return num;
}
unsigned long long int atoi_imp(const char str[]) {
unsigned long long int i, num = 0;
for (i = 0; str[i] >= '0' && str[i] <= '9'; ++i) {
num = num * 10 + (str[i] - '0');
}
return num;
}
atoi is part of the C standard library, with the signature int atoi(const char *);.
You are declaring that a function with that name exists, but giving it a different return type. Note that in C, the function name is the only thing that matters, and the toolchain can only trust what you tell it in the source code. If you lie to the compiler, like here, all bets are off.
You should select a different name for your own implementation to avoid issues.
As researched by @pmg, the C standard (C99 7.1.3) says that using names from the C standard library for your own global symbols (functions or global variables) is explicitly undefined behavior. Beware of nasal demons!
OK, there is at least one problem with your function atoi.
You are counting down on an unsigned value and checking whether it is greater than or equal to zero. That is always true: decrementing past zero wraps around to the maximum value instead of going negative.
The easiest fix is index shifting, i.e.:
unsigned long long int my_atoi(const char str[]) {
unsigned long long int i, j, power, num = 0;
for (i = strlen(str); i != 0; --i) {
power = 1;
for (j = 0; j < strlen(str) - i; ++j) {
power *= 10;
}
num += (str[i-1] - '0') * power;
}
return num;
}
Too late, but this may help. I wrote it for base 10; if you change the base, you need to take care of how the digit value is computed in *p-'0'.
I would use Horner's rule to compute the value.
#include <stdio.h>
int main(void)
{
    const char *a = "5363", *p = a;
    unsigned base = 10;
    unsigned long x = 0;
    while (*p) {
        /* Horner's rule: shift the accumulated value by one digit, then add the next digit */
        x *= base;
        x += (unsigned long)(*p - '0');
        p++;
    }
    printf("%lu\n", x);
    return 0;
}
Your function has an infinite loop: as i is unsigned, i >= 0 is always true.
It can be improved in different ways:
you should compute the length of str just once. strlen() is not cheap, it must scan the string until it finds the null terminator. The compiler is not always capable of optimizing away redundant calls for the same argument.
power could be computed incrementally, avoiding the need for a nested loop.
you should not use the name atoi as it is a standard function in the C library. Unless you implement its specification exactly and correctly, you should use a different name.
Here is a corrected and improved version:
unsigned long long int atoi_power(const char str[]) {
size_t i, len = strlen(str);
unsigned long long int power = 1, num = 0;
for (i = len; i-- > 0; ) {
num += (str[i] - '0') * power;
power *= 10;
}
return num;
}
Modified this way, the function should have performance similar to the atoi_imp version. Note, however, that they do not implement the same semantics: atoi_power must be given a string of digits, whereas atoi_imp accepts trailing characters.
As a matter of fact, neither atoi_imp nor atoi_power implements the specification of atoi extended to handle larger unsigned integers:
atoi ignores any leading white space characters,
atoi accepts an optional sign, either '+' or '-',
atoi consumes all following decimal digits; the behavior on overflow is undefined,
atoi ignores any trailing characters that are not decimal digits.
Given these semantics, the natural implementation of atoi is that of atoi_imp with extra tests. Note that even strtoull(), which you could use to implement your function, handles white space and an optional sign, although the conversion of negative values may give surprising results.
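As a rough illustration of "atoi_imp with extra tests", the four rules listed above could be sketched as follows. The function name and the long long return type are assumptions made for this example, and overflow is deliberately left unhandled, just as the standard leaves it undefined for atoi.

```c
#include <ctype.h>

/* Hypothetical sketch of atoi-like parsing with a wider result type.
   Overflow is not detected. */
long long parse_decimal_sketch(const char *s)
{
    unsigned long long num = 0;
    int negative = 0;

    /* 1. skip leading white space */
    while (isspace((unsigned char)*s))
        s++;

    /* 2. accept an optional sign */
    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        s++;
    }

    /* 3. consume decimal digits (Horner's rule, as in atoi_imp) */
    while (*s >= '0' && *s <= '9') {
        num = num * 10 + (unsigned)(*s - '0');
        s++;
    }

    /* 4. anything after the digits is ignored */
    return negative ? -(long long)num : (long long)num;
}
```

Called with "  -42abc" it yields -42, and an empty string yields 0.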

Issue when trying to generate large dataset

I want to create a large dataset of 60M elements, each of dim = 256 (later it will be written to a file). I wrote the following to do so:
#define N 60000000L
#define DIM 256
int main(){
int i,j;
double tmp, *data;
data = calloc(N*DIM, sizeof(double));
tmp = 1.0 / RAND_MAX;
for (i = 0; i < N*DIM; i++) data[i] = (double) rand() * tmp - 0.5;
// TODO: Save data buffer in file
free(data);
}
But I get this warning:
main.c:111:19: warning: comparison of constant 7695802368 with expression of type 'int' is always
true [-Wtautological-constant-out-of-range-compare]
for (i = 0; i < N*DIM; i++) data[i] = (double) rand() * tmp - 0.5;
~ ^ ~~~~~~~~~~
1 warning generated.
Can someone please give me a hint as to why this issue exists and how to get rid of it? Thank you
The int type on your machine cannot hold a number large enough to ever be >= 7695802368, which means your int type is 32 bits wide. You'll need to use a 64-bit type: long long int or, probably more appropriately, unsigned long long int.
Use size_t math with memory allocations and indexing.
It is the Goldilocks type for indexing/sizing, neither too narrow, nor too wide. Keep in mind that it is some unsigned type.
#include <stdlib.h>
//#define N 60000000L
#define N ((size_t) 60000000u)
#define DIM 256
int main(){
    size_t i; // use the best type for indexing
    double tmp, *data;
    size_t n = N*DIM;
    data = calloc(n, sizeof *data); // element size taken from the dereferenced pointer
    if (data) { // check allocation
        tmp = 1.0 / RAND_MAX;
        for (i = 0; i < n; i++) {
            data[i] = (double) rand() * tmp - 0.5;
        }
        // TODO: Save data buffer in file
        free(data);
    }
}
The warning explains it for you: you're comparing a signed integer, which probably maxes out at 2,147,483,647, against a value of over 7 billion. Since i can never reach that value, the test is always true. Did you maybe mean to write i < N?
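Since the element count here is a product of two sizes, it can also be worth checking that the product itself fits in size_t before allocating. The following is a minimal sketch with a hypothetical helper name, not code from any of the answers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates rows * cols zero-initialized doubles.
   Returns NULL if the element count does not fit in size_t or if the
   allocation itself fails. */
double *alloc_table(size_t rows, size_t cols)
{
    if (cols != 0 && rows > SIZE_MAX / cols)
        return NULL;   /* rows * cols would wrap around */
    return calloc(rows * cols, sizeof(double));
}
```

The caller still has to check the result for NULL and release it with free(), and the loop index over the returned buffer should be a size_t as well.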

Malloc() to create a new size for integers for use with math - Guidance needed

My goal is to create an integer type with a bigger size than 4 bytes, or 8 if I use long. I tried malloc to give more bytes in memory for a bigger integer, but it still broke on the 31st iteration (it gave a negative number). Here's my code:
int main()
{
int x = 31; //(normally an int can do up to 30 without going negative so this is my test number)
int i;
int *bigNum = NULL;
bigNum = malloc((sizeof(int)*2));
*bigNum = 1;
for (i=0; i<x; i++) {
*bigNum = *bigNum * 2;
printf("%d \n", *bigNum);
}
free(bigNum);
}
Output:
2
4
...
..
...
1073741824
-2147483648
Although you have allocated more memory for your integer, no other part of the system knows this, including:
the compiler doesn't know this;
the CPU chip doesn't know this.
printf doesn't know this.
So all calculations are just carried out using the native int size.
Note that you can't tell the CPU chip you use larger integers; it is a physical/design limitation of the chip.
Dereferencing an int * gives you an int no matter how much extra memory you allocate for it.
If you want a data type able to hold more information, try a long (although the only guarantee is that it will be at least as big as an int) or a long long, which is at least 64 bits wide.
If you want to handle integers beyond what your implementation provides, use a bignum library, like MPIR.
"goal is to create a integer type with a bigger size"
To handle multi-int integers, code also needs supporting functions for each basic operation:
int main(void) {
int x = 31;
RandBigNum *bigNum = RandBigNum_Init();
RandBigNum_Assign_int(bigNum, 1);
for (int i=0; i<x; i++) {
RandBigNum_Muliply_int(bigNum, 2);
RandBigNum_Print(bigNum);
printf(" \n");
}
Now, how might one implement all this? There are many approaches.
Below is a simple, incomplete and untested one. It is not necessarily a good approach, but it presents an initial idea of the details needed to build a big number library.
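#include <limits.h>
#include <stdlib.h>
// Numbers are all non-negative. The first array element is the number of digits in use
typedef unsigned RandBigNum;
#define RandBigNum_MAXP1 (UINT_MAX + 1ull)
RandBigNum *RandBigNum_Init(void) {
    // One zeroed element: a digit count of 0
    return calloc(1, sizeof(RandBigNum));
}
void RandBigNum_Muliply_int(RandBigNum *x, unsigned scale) {
    unsigned carry = 0;
    for (unsigned i = 1; i <= x[0]; i++) {
        unsigned long long product = 1ull * x[i] * scale + carry;
        x[i] = product % RandBigNum_MAXP1;  // low digit stays in place
        carry = product / RandBigNum_MAXP1; // high part moves to the next digit
    }
    if (carry) {
        unsigned n = x[0] + 2;
        // re-alloc check omitted; the caller's pointer is NOT updated if the
        // block moves, so a real version would have to return the new pointer
        x = realloc(x, sizeof *x * n);
        x[x[0] + 1] = carry; // append the new most significant digit
        x[0]++;
    }
}
// many other functions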
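For comparison, here is a smaller self-contained sketch of the same carry propagation idea. It avoids the reallocation question by using a fixed number of 32-bit digits; the type name, the function names and the capacity of four digits are all assumptions made for this example.

```c
#include <stdint.h>

#define BIG_LIMBS 4   /* hypothetical fixed capacity: 4 x 32 = 128 bits */

typedef struct {
    uint32_t limb[BIG_LIMBS];   /* least significant digit first */
} BigFixed;

/* Sets x to a small value, clearing all higher digits. */
void big_set_u32(BigFixed *x, uint32_t value)
{
    x->limb[0] = value;
    for (int i = 1; i < BIG_LIMBS; i++)
        x->limb[i] = 0;
}

/* Multiplies x by scale in place. Returns the carry out of the top
   digit; a non-zero result means the product did not fit. */
uint32_t big_mul_u32(BigFixed *x, uint32_t scale)
{
    uint64_t carry = 0;
    for (int i = 0; i < BIG_LIMBS; i++) {
        uint64_t product = (uint64_t)x->limb[i] * scale + carry;
        x->limb[i] = (uint32_t)product;   /* keep the low 32 bits */
        carry = product >> 32;            /* pass the high 32 bits on */
    }
    return (uint32_t)carry;
}
```

Starting from 1 and doubling 32 times leaves limb[0] at 0 and limb[1] at 1, which is exactly the step where a plain 32-bit int gives up. Printing such a number in decimal needs a further routine (repeated division by 10, for instance), which is left out here.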

Two's complement and loss of information in C

I want to compute the two's complement of a float value.
unsigned long Temperature ;
Temperature = (~(unsigned long)(564.48))+1;
But the problem is that the cast loses information: I get 564 instead of 564.48.
Can I compute the two's complement without a loss of information?
That is a very weird thing to do; floating-point numbers are not stored in two's complement, so it doesn't make a lot of sense.
Anyway, you can perhaps use the good old union trick:
union {
float real;
unsigned long integer;
} tmp = { 564.48 };
tmp.integer = ~tmp.integer + 1;
printf("I got %f\n", tmp.real);
When I tried it (on ideone) it printed:
I got -0.007412
Note that this relies on unspecified behavior, so it's possible it might break if your compiler does not implement the access in the most straightforward manner. This is distinct from undefined behavior (which would make the code invalid), but still not optimal. Someone did tell me that newer standards make it clearer, but I've not found an exact reference so ... consider yourself warned.
You can't use ~ over floats (it must be an integer type):
#include <stdio.h>
void print_binary(size_t const size, void const * const ptr)
{
unsigned char *b = (unsigned char *) ptr;
unsigned char byte;
int i, j;
for (i = size - 1; i >= 0; i--) {
for (j = 7; j >= 0; j--) {
byte = b[i] & (1 << j);
byte >>= j;
printf("%u", byte);
}
}
printf("\n");
}
int main(void)
{
float f = 564.48f;
char *p = (char *)&f;
size_t i;
print_binary(sizeof(f), &f);
for (i = 0; i < sizeof(float); i++) {
p[i] = ~p[i];
}
print_binary(sizeof(f), &f);
f += 1.f;
return 0;
}
Output:
01000100000011010001111010111000
10111011111100101110000101000111
Of course, print_binary is only there to check the result and can be removed. As pointed out by @barakmanos, print_binary assumes little endian; the rest of the code is not affected by endianness:
#include <stdio.h>
int main(void)
{
float f = 564.48f;
char *p = (char *)&f;
size_t i;
for (i = 0; i < sizeof(float); i++) {
p[i] = ~p[i];
}
f += 1.f;
return 0;
}
Casting a floating-point value to an integer value changes the "bit contents" of that value.
In order to perform two's complement on the "bit contents" of a floating-point value:
float f = 564.48f;
unsigned long Temperature = ~*(unsigned long*)&f+1;
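Make sure that sizeof(long) == sizeof(float), or use double instead of float. Note also that reading a float object through an unsigned long pointer violates C's aliasing rules, so copying the bytes into an integer of the same size with memcpy is the safer route.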
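A variant of the same idea that sidesteps both the size question and the pointer cast is to copy the bytes with memcpy into a fixed-width integer. This is a sketch with hypothetical function names, and it assumes that float is 32 bits wide, which the static assertion checks at compile time.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Hypothetical helper: returns the raw bit pattern of a float. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the bytes, no pointer cast */
    return bits;
}

/* Hypothetical helper: two's complement of that bit pattern. */
uint32_t float_bits_twos_complement(float f)
{
    return ~float_bits(f) + 1u;
}
```

On a platform that uses the IEEE 754 single-precision format, float_bits(564.48f) is 0x440D1EB8, matching the binary output shown in the earlier answer. Whether the complemented pattern means anything as a float again is a separate matter.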

comparison between signed and unsigned integer expressions [-Wsign-compare] warning

for ( i= 0; i < sizeof(r)/sizeof(r[0]); ++i ){
r[i]= 0;
}
So this is the for loop I'm having trouble with. How can I rewrite it so I don't get this warning:
comparison between signed and unsigned integer expressions [-Wsign-compare]
The sizeof operator yields an unsigned integer of type size_t. So use an index of the same type.
size_t i;
for (i = 0; i < sizeof(r)/sizeof(r[0]); ++i) {
r[i] = 0;
}
I recommend not using int size = sizeof(r)/sizeof(r[0]);. The range of size_t may greatly exceed the positive range of int, so the assignment could lose significant bits.
size_t is the type best used to index array variables. Remember, though, since it is some unsigned integer, it can not represent negative indexes.
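Because size_t cannot go negative, loops that walk an array backwards need a slightly different shape than with int. A common idiom is sketched below; the function name is made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper: index of the last element equal to value,
   or n if no element matches. */
size_t find_last(const int a[], size_t n, int value)
{
    /* i-- > 0 tests i before decrementing it, so the body only ever
       sees indexes n-1 down to 0 and the loop ends cleanly at zero. */
    for (size_t i = n; i-- > 0; ) {
        if (a[i] == value)
            return i;
    }
    return n;
}
```

Writing the condition as i >= 0 instead would never become false for an unsigned index.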
In your code:
for ( i= 0; i < sizeof(r)/sizeof(r[0]); ++i ){
r[i]= 0;
}
I think "i" is declared as an int. Try declaring it as "unsigned int", like this:
for (unsigned int i = 0; i < sizeof(r)/sizeof(r[0]); ++i ){
r[i]= 0;
}
Compile your code again and the warning should be gone.

Resources