C array comparison

Is the de facto method for comparing arrays in C to use memcmp from string.h?
I want to compare arrays of ints and doubles in my unit tests.
I am unsure whether to use something like:
double a[] = {1.0, 2.0, 3.0};
double b[] = {1.0, 2.0, 3.0};
size_t n = 3;
if (!memcmp(a, b, n * sizeof(double)))
    /* arrays equal */
or to write a bespoke is_array_equal(a, b, n) type function?

memcmp would do an exact comparison, which is seldom a good idea for floats, and would not follow the rule that NaN != NaN. For sorting, that's fine, but for other purposes, you might want to do an approximate comparison such as:
#include <math.h>      /* fabs */
#include <stdbool.h>
#include <stddef.h>    /* size_t */

bool dbl_array_eq(double const *x, double const *y, size_t n, double eps)
{
    for (size_t i = 0; i < n; i++)
        if (fabs(x[i] - y[i]) > eps)
            return false;
    return true;
}
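For the arrays from the question, a call might look like this (the tolerance of 1e-9 is an arbitrary choice for illustration):
double a[] = {1.0, 2.0, 3.0};
double b[] = {1.0, 2.0, 3.0};

if (dbl_array_eq(a, b, 3, 1e-9))
    /* arrays equal (within 1e-9) */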

Using memcmp is not generally a good idea. Let's start with the more complex cases and work down from there.
Though you mentioned int and double, I first want to concentrate on memcmp as a general solution, such as to compare arrays of type:
struct {
    char c;
    // 1
    int i;
    // 2
};
The main problem there is that implementations are free to add padding to structures at locations 1 and 2, making a byte-wise comparison report a difference even though the members themselves match perfectly.
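If that struct carried a tag, say struct item (a made-up name for illustration), a sketch of a member-by-member comparison that ignores any padding could look like this:
#include <stdbool.h>
#include <stddef.h>

struct item {
    char c;
    int i;
};

/* Compare two arrays of struct item member by member; padding bytes never matter. */
bool item_array_eq(const struct item *x, const struct item *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (x[i].c != y[i].c || x[i].i != y[i].i)
            return false;
    return true;
}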
Now down to doubles. You might think this would be better as there's no padding there. However, there are other problems.
The first is the treatment of NaN values. IEEE754 goes out of its way to ensure that NaN is not equal to any other value, including itself. For example, the code:
#include <stdio.h>
#include <string.h>

int main(void) {
    double d1 = 0.0 / 0.0, d2 = d1;

    if (d1 == d2)
        puts("Okay");
    else
        puts("Bad");

    if (memcmp(&d1, &d2, sizeof(double)) == 0)
        puts("Okay");
    else
        puts("Bad");

    return 0;
}
will output
Bad
Okay
illustrating the difference.
The second is the treatment of plus and minus zero. These should be considered equal for the purposes of comparison but, as the bit patterns are different, memcmp will say they are different.
Changing the declaration/initialisation of d1 and d2 in the above code to:
double d1 = 0.0, d2 = -d1;
will make this clear.
So, if structures and doubles are problematic, surely integers are okay. After all, they're always two's complement, yes?
No, actually they're not. ISO mandates one of three encoding schemes for signed integers, and the other two (ones' complement and sign/magnitude) suffer from a problem similar to doubles: the fact that both plus and minus zero exist.
So, while those zeros should arguably compare equal, the bit patterns are again different.
Even for unsigned integers, you have a problem (and it affects signed values as well). ISO states that these representations can have value bits and padding bits, and that the values of the padding bits are unspecified.
So, even for what may seem the simplest case, memcmp can be a bad idea.
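For the int arrays mentioned in the question, a plain element-wise loop avoids all of these representation issues, because == compares values rather than bit patterns; a minimal sketch:
#include <stdbool.h>
#include <stddef.h>

bool int_array_eq(const int *x, const int *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (x[i] != y[i])   /* value comparison; padding bits and sign encoding are irrelevant */
            return false;
    return true;
}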

Replace memset with memcmp in your code, and it works.
In your case (as the sizes of both arrays are identical and known at compile time) you can even do:
memcmp(a, b, sizeof(a));

The function you're looking for is memcmp, not memset. See the answers to this question for why it might not be a good idea to memcmp an array of doubles though.

memcmp compares two blocks of memory for the given number of bytes.
memset is used to initialise a buffer with a value for the given size.
Buffers can also be compared without memcmp in the following way; the same approach can be adapted for other data types.
#include <stdint.h>

int8_t array_1[] = { 1, 2, 3, 4 };
int8_t array_2[] = { 1, 2, 3, 4 };
uint8_t i;
uint8_t compare_result = 1;   /* assume equal until a mismatch is found */

for (i = 0; i < sizeof(array_1) / sizeof(array_1[0]); i++)
{
    if (array_1[i] != array_2[i])
    {
        compare_result = 0;
        break;
    }
}

Related

Why does a high-value input prevent an array from using the actual input value in C?

I'm making a function that takes a value using scanf_s and converts that into a binary value. The function works perfectly... until I put in a really high value.
I'm also doing this on VS 2019 in x64 in C
And in case it matters, I'm using
main(int argc, char* argv[])
for the main function.
Since I'm not sure what on earth is happening, here's the whole code I guess.
BinaryGet()
{
    // Declaring lots of stuff
    int x, y, z, d, b, c;
    int counter = 0;
    int doubler = 1;
    int getb;
    int binarray[2000] = { 0 };
    // I only have to change things to 1 now, ain't I smart?
    int binappend[2000] = { 0 };
    // Get number
    printf("Gimme a number\n");
    scanf_s("%d", &getb);
    // Because why not
    printf("\n");
    // Get the amount of binary places to be used (how many times getb divides by 2)
    x = getb;
    while (x > 1)
    {
        d = x;
        counter += 1;
        // Tried x /= 2, gave me infinity loop ;(
        x = d / 2;
    }
    // Fill the array with binary values (i.e. 1, 2, 4, 8, 16, 32, etc)
    for (b = 1; b <= counter; b++)
    {
        binarray[b] = doubler * 2;
        doubler *= 2;
    }
    // Compare the value of getb to binary values, subtract and repeat until getb = 0)
    c = getb;
    for (y = counter; c >= 1; y--)
    {
        // Printing c at each subtraction
        printf("\n%d\n", c);
        // If the value of c (a temp variable) compares right to the binary value, subtract that binary value
        // and put a 1 in that spot in binappend, the 1 and 0 list
        if (c >= binarray[y])
        {
            c -= binarray[y];
            binappend[y] += 1;
        }
        // Prevents buffer under? runs
        if (y <= 0)
        {
            break;
        }
    }
    // Print the result
    for (z = 0; z <= counter; z++)
    {
        printf("%d", binappend[z]);
    }
}
The problem is that when I put in the value 999999999999999999 (18 digits) it just prints 0 once and ends the function. The value of the digits doesn't matter though, 18 ones will have the same result.
However, when I put in 17 digits, it gives me this:
99999999999999999
// This is the input value after each subtraction
1569325055
495583231
495583231
227147775
92930047
25821183
25821183
9043967
655359
655359
655359
655359
131071
131071
131071
65535
32767
16383
8191
4095
2047
1023
511
255
127
63
31
15
7
3
1
// This is the binary
1111111111111111100100011011101
The binary value it gives me is 31 digits. I thought that it was weird that at 32, a convenient number, it gimps out, so I put in the value of the 32nd binary place minus 1 (2,147,483,647) and it worked. But adding 1 to that gives me 0.
Changing the type of array (unsigned int and long) didn't change this. Neither did changing the value in the brackets of the arrays. I tried searching to see if it's a limit of scanf_s, but found nothing.
I know for sure (I think) it's not the arrays, but probably something dumb I'm doing with the function. Can anyone help please? I'll give you a long-distance high five.
The problem is indeed related to the power-of-two size of the number you've noticed, but it's in this call:
scanf_s("%d", &getb);
The %d conversion means it is reading into a signed int, which on your platform is probably 32 bits; since it's signed, it can only go up to 2³¹-1 in the positive direction.
The conversion specifiers used by scanf() and related functions can accept larger data types though. For example %ld will accept a long int, and %lld will accept a long long int. Check the data type sizes for your platform, because a long int and an int might actually be the same size (32 bits), e.g. on Windows.
So if you use %lld instead, you should be able to read larger numbers, up to the range of a long long int, but make sure you change the target (getb) to match! Also if you're not interested in negative numbers, let the type system help you out and use an unsigned type: %llu for an unsigned long long.
Some details:
If scanf or its friends fail, the value in getb is indeterminate, i.e. uninitialised, and reading from it is undefined behaviour (UB). UB is an extremely common source of bugs in C, and you want to avoid it. Make sure your code only reads from getb if scanf tells you it worked.
In fact, in general it is not possible to avoid UB with scanf unless you're in complete control of the input (e.g. you wrote it out previously with some other, bug-free, software). While you can check the return value of scanf and related functions (it will return the number of fields it converts), its behaviour is undefined if, say, a field is too large to fit into the data type you have for it.
There's a lot more detail on scanf etc. here.
To avoid problems with not knowing what size an int is, or whether a long int is different on this platform or that, there is also the header stdint.h which defines integer types of a specific width, e.g. int64_t. The companion header inttypes.h provides matching macros for use with scanf(), like SCNd64. These are available from C99 onwards, but note that Windows' support of C99 in its compilers is incomplete and may not include this.
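A minimal sketch of that approach, using plain scanf and checking its return value (with scanf_s on MSVC the call would look the same):
#include <inttypes.h>   /* int64_t plus the SCNd64 / PRId64 macros */
#include <stdio.h>

int main(void)
{
    int64_t getb;
    printf("Gimme a number\n");
    if (scanf("%" SCNd64, &getb) != 1) {   /* only use getb if the conversion succeeded */
        printf("That wasn't a number I can read\n");
        return 1;
    }
    printf("You entered %" PRId64 "\n", getb);
    return 0;
}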
Don't be so hard on yourself, you're not dumb, C is a hard language to master and doesn't follow modern idioms that have developed since it was first designed.

How to compare long doubles with qsort and with regard to NaN?

How to compare long doubles with qsort() and with regard to not-a-number?
When sorting an array that might contain not-a-numbers, I would like to put all those NaNs at one end of the sorted array.
qsort() imposes some restriction on the compare function.
The function shall return an integer less than, equal to, or
greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second.
C11dr §7.22.5.2 3
When the same objects ... are passed more than once to the comparison function, the results shall be consistent with one another. That is, for qsort they shall define a total ordering on the array, ... the same object shall always compare the same way with the key.
§7.22.5 4
a > b is false when a <= b, or if a is not-a-number, or if b is not-a-number. So a > b is not the same as !(a <= b): they have opposite results if one of them is NaN.
If the compare function uses return (a > b) - (a < b);, the code would return 0 if one or both of a and b are NaN. The array would not sort as desired and the total ordering requirement is lost.
The long double aspect of this sort matters when using the classification functions like int isnan(real-floating x); or int isfinite(real-floating x);. I know isfinite(finite_long_double_more_than_DBL_MAX) might return false, so I have concerns that isnan(some_long_double) might do something unexpected.
I tried the below. It apparently sorts as desired.
Sub-question: Is compare() below sufficient to sort as desired? Any recommended simplifications? If not - how to fix?
(For this task, it is OK for values like 0.0L and -0.0L to sort in any way)
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

int compare(const void *a, const void *b) {
    const long double *fa = (const long double *) a;
    const long double *fb = (const long double *) b;
    if (*fa > *fb) return 1;
    if (*fa < *fb) return -1;
    if (*fa == *fb) {
        //return -memcmp(fa, fb, sizeof *fa); if -0.0, 0.0 order important.
        return 0;
    }
    // At least one of *fa or *fb is NaN
    // is *fa a non-NaN?
    if (!isnan(*fa)) return -1;
    if (!isnan(*fb)) return 1;
    // both NaN
    return 0;
    // return -memcmp(fa, fb, tbd size); if NaN order important.
}

int main(void) {
    long double x[] = { 0.0L / 0.0, 0.0L / 0.0, 0.0, 1.0L / 0.0, -0.0, LDBL_MIN,
                        LDBL_MAX, 42.0, -1.0L / 0.0, 867-5309, -0.0 };
    x[0] = -x[0];
    printf("unsorted: ");
    size_t n = sizeof x / sizeof x[0];
    for (size_t i = 0; i < n; i++) {
        printf("%.3Le,", x[i]);
    }
    printf("\nsorted: ");
    qsort(x, n, sizeof x[0], compare);
    for (size_t i = 0; i < n; i++) {
        printf("%.3Le,", x[i]);
    }
    puts("");
}
Output
unsorted: nan,-nan,0.000e+00,inf,-0.000e+00,3.362e-4932,1.190e+4932,4.200e+01,-inf,-4.442e+03,-0.000e+00,
sorted: -inf,-4.442e+03,-0.000e+00,0.000e+00,-0.000e+00,3.362e-4932,4.200e+01,1.190e+4932,inf,nan,-nan,
If I knew the compare function was correct, I'd post on Code Review for improvement ideas. Yet I am not confident enough that code works correctly with those pesky NaNs.
This is just a simple reordering of your tests, but it makes the special status of NaN clearer.
int compare(const void *a, const void *b)
{
    const long double fa = *(const long double *) a;
    const long double fb = *(const long double *) b;

    if (isnan(fa))
    {
        if (isnan(fb))
        {
            return 0;
        }
        return 1;
    }
    if (isnan(fb))
    {
        return -1;
    }
    if (fa > fb) return 1;
    if (fa < fb) return -1;
    /* no more comparisons needed */
    return 0;
}
As the tests for NaN are at the top and no NaNs get past them, the bottom three lines can safely be replaced with your
return (fa > fb) - (fa < fb);
Apart from the discussion of the different kinds of NaN (a bit like asking how many angels can dance on a CPU core), this ought to be stable enough for your purposes, and I can't see any possible issues with this code.
With Clang, neither -ffast-math nor -fdenormal-fp-math=[ieee|preserve-sign|positive-zero] yields other results. Neither does gcc with -ffast-math,
-funsafe-math-optimizations, or even -ffinite-math-only (the latter most likely because there are no operations other than a straight compare against NaN).
Just to be complete, I tested with both std::numeric_limits<double>::signaling_NaN(); and std::numeric_limits<double>::quiet_NaN(); (from the C++ header <limits>) as well – again, no difference in the sort order.
The NaN test
int isnan(real-floating x);
The isnan macro determines whether its argument value is a NaN. First, an argument represented in a format wider than its semantic type is converted to its semantic type. Then determination is based on the type of the argument.235
235 For the isnan macro, the type for determination does not matter unless the implementation supports NaNs in the evaluation type but not in the semantic type.
isnan(some_long_double) will work as hoped except on a rare platform.
int isunordered(real-floating x, real-floating y) acts like isnan() except that it accounts for both arguments.
On many platforms, code could use (a == a) as a candidate NaN test as that evaluates to 0 when a is NaN and 1 otherwise. Unfortunately, unless an implementation defines __STDC_IEC_559__, that is not certain to work.
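A tiny sketch of that candidate test, assuming an IEEE 754 implementation (where 0.0L/0.0L produces a quiet NaN, as in the question's own code):
#include <stdio.h>

int main(void) {
    long double nan_value = 0.0L / 0.0L;   /* NaN on IEEE 754 implementations */
    long double number = 42.0L;
    printf("%d %d\n", nan_value == nan_value, number == number);   /* prints 0 1 */
    return 0;
}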
The compare
>=, >, <, <= and C11 7.12.14 Comparison macros
Using >=, >, <, <= when at least one operand is a NaN can result in an "invalid" floating-point exception. So testing for NaN beforehand is prudent, as in the answer by @usr2564301.
C offers the macros isgreater(), isgreaterequal(), isless(), and islessequal(), which do the compare without raising the "invalid" floating-point
exception. This is a good alternative for double, yet the macros take a real-floating type which may differ from long double. isgreater(long_double_a, long_double_b) may evaluate as double and not provide the desired compare result.
The challenge with classify macros is that the semantic type may be narrower than long double.
The following uses the above ideas and, as I read the C spec, is well defined and functionally correct for all cases except the rare one: when long double supports NaN but the real-floating type (often double) does not.
#include <math.h>

// Compare 2 long double. All NaN are greater than numbers.
int compare(const void *a, const void *b) {
    const long double *fa = (const long double *) a;
    const long double *fb = (const long double *) b;

    if (!isunordered(*fa, *fb)) {
        return (*fa > *fb) - (*fa < *fb);
    }
    if (!isnan(*fa)) {
        return -1;                   // *fa is a number, *fb is NaN
    }
    return isnan(*fb) ? 0 : 1;       // both NaN: equal; otherwise *fa (NaN) is greater
}
Note: After reading many of the good comments and learning a great deal, I am posting this self-answer as provided in Can I answer my own question? in addition to accepting another answer.

C - erroneous output after multiplication of large numbers

I'm implementing my own decrease-and-conquer method for computing aⁿ (a raised to the power n).
Here's the program:
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>

double dncpow(int a, int n)
{
    double p = 1.0;
    if (n != 0)
    {
        p = dncpow(a, n / 2);
        p = p * p;
        if (n % 2)
        {
            p = p * (double)a;
        }
    }
    return p;
}

int main()
{
    int a;
    int n;
    int a_upper = 10;
    int n_upper = 50;
    int times = 5;
    time_t t;
    srand(time(&t));
    for (int i = 0; i < times; ++i)
    {
        a = rand() % a_upper;
        n = rand() % n_upper;
        printf("a = %d, n = %d\n", a, n);
        printf("pow = %.0f\ndnc = %.0f\n\n", pow(a, n), dncpow(a, n));
    }
    return 0;
}
My code works for small values of a and n, but a mismatch in the output of pow() and dncpow() is observed for inputs such as:
a = 7, n = 39
pow = 909543680129861204865300750663680
dnc = 909543680129861348980488826519552
I'm pretty sure that the algorithm is correct, but dncpow() is giving me wrong answers.
Can someone please help me rectify this? Thanks in advance!
Simple as that, these numbers are too large for what your computer can represent exactly in a single variable. With a floating point type, there's an exponent stored separately and therefore it's still possible to represent a number near the real number, dropping the lowest bits of the mantissa.
Regarding this comment:
I'm getting similar outputs upon replacing 'double' with 'long long'. The latter is supposed to be stored exactly, isn't it?
If you call a function taking double, it won't magically operate on long long instead. Your value is simply converted to double and you'll just get the same result.
Even with a function handling long long (which has 64 bits on nowadays' typical platforms), you can't deal with such large numbers. 64 bits aren't enough to store them. With an unsigned integer type, they will just "wrap around" to 0 on overflow. With a signed integer type, the behavior of overflow is undefined (but still somewhat likely a wrap around). So you'll get some number that has absolutely nothing to do with your expected result. That's arguably worse than the result with a floating point type, which is merely imprecise.
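For example, a quick sketch of the wrap-around with the inputs from the question (7 and 39); the exact value of 7³⁹ needs roughly 110 bits, so it cannot fit in 64:
#include <math.h>
#include <stdio.h>

int main(void)
{
    unsigned long long p = 1;
    for (int i = 0; i < 39; i++)
        p *= 7;                       /* wraps around modulo 2^64 */
    printf("unsigned long long: %llu\n", p);
    printf("double (rounded)  : %.0f\n", pow(7, 39));
    return 0;
}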
For exact calculations on large numbers, the only way is to store them in an array (typically of unsigned integers like uintmax_t) and implement all the arithmetics yourself. That's a nice exercise, and a lot of work, especially when performance is of interest (the "naive" arithmetic algorithms are typically very inefficient).
For some real-life program, you won't reinvent the wheel here, as there are libraries for handling large numbers. The arguably best known is libgmp. Read the manuals there and use it.
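As a sketch of what that looks like with libgmp (link with -lgmp), computing the example from the question exactly:
#include <gmp.h>
#include <stdio.h>

int main(void)
{
    mpz_t result;
    mpz_init(result);
    mpz_ui_pow_ui(result, 7, 39);        /* exact 7^39 as an arbitrary-precision integer */
    gmp_printf("7^39 = %Zd\n", result);
    mpz_clear(result);
    return 0;
}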

Why are a and b not swapped in this code?

Here's the code:
#include <stdio.h>

union
{
    unsigned u;
    double d;
} a, b;

int main(void)
{
    printf("Enter a, b:");
    scanf("%lf %lf", &a.d, &b.d);
    if (a.d > b.d)
    {
        a.u ^= b.u ^= a.u ^= b.u;
    }
    printf("a=%g, b=%g\n", a.d, b.d);
    return 0;
}
The a.u^=b.u^=a.u^=b.u; statement should have swapped a and b if a>b, but it seems that whatever I enter, the output will always be exactly my input.
a.u^=b.u^=a.u^=b.u; causes undefined behaviour by writing to a.u twice without a sequence point. See here for discussion of this code.
You could write:
unsigned tmp;
tmp = a.u;
a.u = b.u;
b.u = tmp;
which will swap a.u and b.u. However this may not achieve the goal of swapping the two doubles, if double is a larger type than unsigned on your system (a common scenario).
It's likely that double is 64 bits, while unsigned is only 32 bits. When you swap the unsigned members of the unions, you're only getting half of the doubles.
If you change d to float, or change u to unsigned long long, it will probably work, since they're likely to be the same size.
You're also causing UB by writing to the variables twice without a sequence point. The proper way to write the XOR swap is with multiple statements.
b.u ^= a.u;
a.u ^= b.u;
b.u ^= a.u;
For more about why not to use XOR for swapping, see Why don't people use xor swaps?
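If the goal is simply to swap the two doubles when a.d > b.d, the union members and the XOR trick can be dropped entirely; a minimal sketch with a temporary:
if (a.d > b.d)
{
    double tmp = a.d;
    a.d = b.d;
    b.d = tmp;
}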
In the usual environment, the memory sizes of the types unsigned and double are different.
That is why the variables do not appear to change.
Also, you cannot use an XOR swap directly on floating-point variables,
because they are represented completely differently in memory.

How to write a floating-point value accurately to a bin file

I am trying to dump the floating point values from my program to a bin file. Since I can't use any stdlib function, I am thinking of writing the value char by char to a big char array, which I dump to a file in my test application.
It's like
float a=3132.000001;
I will be dumping this to a char array in 4 bytes.
Code example would be:-
if((a < 1.0) && (a > 1.0) || (a > -1.0 && a < 0.0))
a = a*1000000 // 6 bit fraction part.
Can you please help me write this in a better way?
Assuming you plan to read it back into the same program on the same architecture (no endianness issues), just write the number out directly:
fwrite(&a, sizeof(a), 1, f);
or copy it with memcpy to your intermediate buffer:
memcpy(bufp, &a, sizeof(a));
bufp += sizeof(a);
If you have to deal with endianness issues, you could be sneaky. Cast the float to a long, and use htonl:
assert(sizeof(float) == sizeof(long)); // Just to be sure
long n = htonl(*(long*)&a);
memcpy(bufp, &n, sizeof(n));
bufp += sizeof(n);
Reading it back in:
assert(sizeof(float) == sizeof(long)); // Just to be sure
long n;
memcpy(&n, bufp, sizeof(n));
n = ntohl(n);
a = *(float*)&n;
bufp += sizeof(n);
Use frexp.
int32_t exponent, mantissa;
mantissa = frexp( a, &exponent ) / FLT_EPSILON;
The sign is captured in the mantissa. This should handle denormals correctly, but not infinity or NaN.
Writing exponent and mantissa will necessarily take more than 4 bytes, since the implicit mantissa bit was made explicit. If you want to write the float as raw data, the question is not about floats at all but rather handling raw data and endianness.
On the other end, use ldexp.
If you could use the standard library, printf has a format specifier just for this: %a. But maybe you consider frexp to be standard library too. Not clear.
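As a sketch of the round trip (it scales by 2²⁴ rather than dividing by FLT_EPSILON so that all 24 mantissa bits of a float survive; normal values only, no infinity or NaN handling):
#include <math.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    float a = 3132.000001f;

    /* Encode: integer mantissa (sign included) plus binary exponent. */
    int exp_part;
    int32_t mantissa = (int32_t)ldexpf(frexpf(a, &exp_part), 24);

    /* Decode: undo the scaling and reapply the exponent. */
    float restored = ldexpf((float)mantissa, exp_part - 24);

    printf("original = %.6f, restored = %.6f\n", a, restored);
    return 0;
}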
If you aren't worried about platform differences between the reader and the writer:
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
...
union float_bytes {
    float val;
    uint8_t a[sizeof(float)]; // This type can be unsigned char if you don't have stdint.h
};

size_t float_write(FILE * outf, float f) {
    union float_bytes fb = { .val = f };
    return fwrite(fb.a, 1, sizeof(float), outf);
}
There are shorter ways to turn a float into a byte array, but they involve more typecasting and are more difficult to read. Other methods of doing this probably do not make faster or smaller compiled code (though the union would make the debug code bigger).
If you are trying to store floats in a platform independent way then the easiest way to do it is to store it as a string (with lots of digits after the . ) . More difficult is to choose a floating point bit layout to use and convert all of your floats to/from that format as you read/write them. Probably just choose IEEE floating point at a certain width and a certain endian and stick with that.
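If the standard library turns out to be available after all, the %a conversion mentioned in an earlier answer gives an exact text round trip; a minimal sketch (the file name is made up):
#include <stdio.h>

int main(void)
{
    double a = 3132.000001;
    FILE *f = fopen("value.txt", "w+");
    if (!f)
        return 1;

    fprintf(f, "%a\n", a);     /* hexadecimal floating format: no rounding on output */
    rewind(f);

    double back = 0.0;
    if (fscanf(f, "%la", &back) == 1)
        printf("round trip exact: %d\n", a == back);   /* prints 1 */

    fclose(f);
    return 0;
}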
