Implementing simple type inference in C

How do I implement basic type inference? Nothing fancy, just inferring whether a given value is an integer, double, or float. For instance, if I had a token for each type (WHOLE_NUMBER, FLOAT_NUMBER, DOUBLE_NUMBER) and an expression like 4f + 2 + 5f, how would I deduce what type that is? My current idea was to just use the first type as the inferred type, which here would give float. However, this doesn't work in most cases. What would I have to do?

My current idea was to just use the first type as the inferred type
No. Usually, the expression's type is that of its "widest" term: if it contains a double, then it's a double; if not, but it contains a float, then it's a float; if it contains only integers, then it's an integer.
This applies to each parenthesized sub-expression.
Unless you make an explicit cast.
In your example above, there are 2 floats and an int, so it is a float. The compiler should warn you though, as any implicit conversion it has to make may result in a loss of data.

The way I would do it would be to cast into the most "accurate" or specific type. For example, if you add a bunch of integers together, the result can always be represented by an integer. The moment a floating-point value is included in the expression, the result must be a float, as the result of the calculation might be fractional due to the floating-point term in the addition.
Similarly, if there are any doubles in the expression, the answer must be a double, as down-casting to a float might result in loss of precision. So, the steps required to infer the type are:
Does the expression contain any doubles? If so, the result is a double - cast any integers or floats to double as appropriate. If not...
Does the expression contain any floats? If so, the result is a float - cast any integers to float as appropriate. If not...
The result is an integer, as the expression is entirely in terms of integers.
Different programming languages handle these sorts of situations differently, and it might be appropriate to add compiler warnings in situations where these automatic casts could cause a precision error. In general, make sure the behaviour of your compiler/interpreter is well-defined and predictable, such that any developer needing alternate behaviour can (and knows when to) use explicit casts if they need to preserve the accuracy of a calculation.
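As a minimal sketch of the steps above, assuming the token kinds WHOLE_NUMBER, FLOAT_NUMBER and DOUBLE_NUMBER from the question and a hypothetical infer_type helper that walks the terms of one (sub-)expression:

#include <stddef.h>  /* size_t */

typedef enum { WHOLE_NUMBER, FLOAT_NUMBER, DOUBLE_NUMBER } NumType;

/* The enum order encodes "width": later values are wider types. */
static NumType infer_type(const NumType *terms, size_t n)
{
    NumType widest = WHOLE_NUMBER;
    for (size_t i = 0; i < n; i++)
        if (terms[i] > widest)
            widest = terms[i];
    return widest;  /* for 4f + 2 + 5f this yields FLOAT_NUMBER */
}

Any implicit widening (and the corresponding warning) can then be decided by comparing each term's type against the inferred result.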

Related

Use cast or macro for C integer constant expressions

When declaring, or using in an arithmetic expression, an integer constant expression in C of a type defined in stdint.h (for instance uint64_t), one can either cast the constant to the desired type, as in (uint64_t)x, or use one of the macros for integer constant expressions, such as UINT64_C(x) (where x is an integer constant expression).
I'm more inclined to use the macro, but I'm wondering in which cases the two approaches are equivalent, in which they differ, and what could go wrong. More precisely: is there a case where using one would lead to a bug, but not with the other?
Thanks!
More precisely: is there a case where using one would lead to a bug, but not with the other?
Yes, there are such cases, though they are rather contrived. Unary operators such as cast operators have high precedence, but all the postfix operators have higher. Of those, the indexing operator, [], can be applied to an integer constant when the expression inside is a pointer to a complete type. Thus, given this declaration in scope:
int a[4] = { 1, 2, 3, 4 };
... the expression (uint64_t) 1[a] evaluates to a uint64_t with value 2, whereas the expression UINT64_C(1)[a] evaluates to an int with value 2. The type difference can cause different behavior to manifest. That can arise from different implicit conversion behavior, which is generally a subtle effect, or if these are used as control expressions for a generic selection then the overall expression can evaluate to wildly different things depending on which variation you use.
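A small, hedged demonstration of that difference, reusing the declaration of a above; _Generic reports which type the compiler sees for each expression:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int a[4] = { 1, 2, 3, 4 };

    /* Postfix [] binds tighter than the cast: this is (uint64_t)(1[a]), i.e. (uint64_t)a[1]. */
    puts(_Generic((uint64_t) 1[a], uint64_t: "uint64_t", default: "something else"));

    /* UINT64_C(1) is already a complete (suffixed) constant, so indexing yields the int element a[1]. */
    puts(_Generic(UINT64_C(1)[a], int: "int", default: "something else"));

    return 0;  /* prints "uint64_t" then "int" */
}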
However, I think there is no practical difference if you put the cast expression in parentheses: ((uint64_t) 1).
An esoteric but possible case: if the system has no exact 64-bit type (e.g. 32-bit int, 128-bit long), then (uint64_t)1 will fail to compile, whereas UINT64_C(1) will give something of the smallest unsigned integer type wider than 64 bits (the type corresponding to uint_least64_t).
The macro forms are likely to expand to suffixes rather than casts. But I can't think of any other situation where a conforming program would behave differently (other than the syntax precedence issue of course).
If the program is non-conforming then there are various possibilities, e.g. UINT64_C(-1) is undefined behaviour (no diagnostic required), as is UINT8_C(256). The macro argument must be an unsuffixed integer constant that is in range for the target type.
Keep it simple and make everything clear and obvious to the reader. I.e. avoid the preprocessor as much as possible, and only introduce a cast where absolutely necessary.

What does the C standard mean by "converted to its semantic type" for the floating-point macros?

I'll quote from N1570, but the C11 standard has similar wording:
The fpclassify macro classifies its argument value as NaN, infinite, normal, subnormal, zero, or into another implementation-defined category. First, an argument represented in a format wider than its semantic type is converted to its semantic type. Then classification is based on the type of the argument.
(my emphasis)
And a footnote:
Since an expression can be evaluated with more range and precision than its type has, it is important to know the type that classification is based on. For example, a normal long double value might become subnormal when converted to double, and zero when converted to float.
What does it mean for the argument to be "converted to its semantic type"? There is no definition of "semantic type" evident anywhere.
My understanding is that any excess precision is removed, as if the expression's value were stored in a variable of type float, double or long double, resulting in a value of the precision the programmer expected. In that case, using fpclassify() and friends on an lvalue would require no conversion with a non-optimising compiler. Am I correct, or are these functions much less useful than advertised?
(This question arises from comments to a Code Review answer)
The semantic type is simply the type of the expression as described elsewhere in the C standard, disregarding the fact that the value is permitted to be represented with excess precision and range. Equivalently, the semantic type is the type of the expression if clause 5.2.4.2.2 paragraph 9 (which says that floating-point values may be evaluated with excess range and precision) were not in the standard.
Converting an argument to its semantic type means discarding the excess precision and range (by rounding the value to the semantic type using whatever rounding rule is in effect for the operation).
Regarding your hypothesis that applying fpclassify to an lvalue does not require any conversion (because the value stored in an object designated by an lvalue must have already been converted to its semantic type when it was assigned), I am not sure that holds formally. Certainly when the object's value is updated by assignment, 5.2.4.2.2 paragraph 9 requires that excess range and precision be removed. But consider alternate ways of modifying the value, such as the postfix increment operator. Does that count as an assignment? Its specification in 6.5.2.4 paragraph 2 says to see the discussion of compound assignment for information on its conversions and effects. That is a bit vague. One would have to consider all possible ways of modifying an object and evaluate what the C standard says about them.
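As a hedged illustration of the footnote's example, assuming a platform where long double has more range than double (e.g. x87 extended precision): classification follows the semantic type of the argument expression.

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Normal as a long double, but below double's normal range. */
    long double ld = (long double)DBL_MIN / 4;

    printf("%d\n", fpclassify(ld) == FP_NORMAL);             /* 1: the semantic type is long double */
    printf("%d\n", fpclassify((double)ld) == FP_SUBNORMAL);  /* 1: the cast changes the semantic type */

    return 0;
}

On an implementation where long double has the same range as double, ld is already subnormal and the first line prints 0 instead.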

C fundamentals: double variable not equal to double expression?

I am working with an array of doubles called indata (in the heap, allocated with malloc), and a local double called sum.
I wrote two different functions to compare values in indata, and obtained different results. Eventually I determined that the discrepancy was due to one function using an expression in a conditional test, and the other function using a local variable in the same conditional test. I expected these to be equivalent.
My function A uses:
if (indata[i]+indata[j] > max) hi++;
and my function B uses:
sum = indata[i]+indata[j];
if (sum>max) hi++;
After going through the same data set and max, I end up with different values of hi depending on which function I use. I believe function B is correct, and function A is misleading. Similarly when I try the snippet below
sum = indata[i]+indata[j];
if ((indata[i]+indata[j]) != sum) etc.
that conditional will evaluate to true.
While I understand that floating-point numbers do not necessarily provide an exact representation, why does that inexact representation change when evaluated as an expression versus when stored in a variable? Is it recommended best practice to always store a double expression like this in a variable prior to a conditional? Thanks!
I suspect you're using 32-bit x86, the only common architecture subject to excess precision. In C, expressions of type float and double are actually evaluated as float_t and double_t, whose relationship to float and double is reflected in the FLT_EVAL_METHOD macro. On 32-bit x86, both are defined as long double because the FPU is not actually capable of performing arithmetic at single or double precision. (It has mode bits intended to allow that, but the behavior is slightly wrong and thus can't be used.)
Assigning to an object of type float or double is one way to force rounding and get rid of the excess precision, but you can also just add a gratuitous cast to (double) if you prefer to leave it as an expression without assignments.
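A hedged sketch using the names from the question (indata, max, hi, sum); either form is meant to discard the excess precision before the comparison, assuming the compiler honours the standard's rule that assignments and casts remove extra range and precision (GCC on x87 may need -fexcess-precision=standard for this):

/* Assignment to a double object rounds the sum to double precision. */
sum = indata[i] + indata[j];
if (sum > max) hi++;

/* A gratuitous cast has the same effect without a named variable. */
if ((double)(indata[i] + indata[j]) > max) hi++;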
Note that forcing rounding to the desired precision is not equivalent to performing the arithmetic at the desired precision; instead of one rounding step (during the arithmetic) you now have two (during the arithmetic, and again to drop unwanted precision), and in cases where the first rounding gives you an exact-midpoint, the second rounding can go in the 'wrong' direction. This issue is generally called double rounding, and it makes excess precision significantly worse than nominal precision for certain types of calculations.

Make C floating point literals float (rather than double)

It is well known that in C, floating point literals (e.g. 1.23) have type double. As a consequence, any calculation that involves them is promoted to double.
I'm working on an embedded real-time system that has a floating point unit that supports only single precision (float) numbers. All my variables are float, and this precision is sufficient. I don't need (nor can afford) double at all. But every time something like
if (x < 2.5) ...
is written, disaster happens: the slowdown can be up to two orders of magnitude. Of course, the direct answer is to write
if (x < 2.5f) ...
but this is so easy to miss (and difficult to detect until too late), especially when a 'configuration' value is #define'd in a separate file by a less disciplined (or just new) developer.
So, is there a way to force the compiler to treat all (floating point) literals as float, as if with suffix f? Even if it's against the specs, I don't care. Or any other solutions? The compiler is gcc, by the way.
The -fsingle-precision-constant flag can be used. It causes floating-point constants to be loaded in single precision even when this is not exact.
Note: this will also use single-precision constants in operations on double-precision variables.
Use warnings instead: -Wdouble-promotion warns about implicit float to double promotion, as in your example. -Wfloat-conversion will warn about cases where you may still be assigning doubles to floats.
This is a better solution than simply forcing double values to the nearest float value. Your floating-point code is still compliant, and you won't get any nasty surprises if a double value holds a positive value smaller than FLT_TRUE_MIN or greater than FLT_MAX (assuming IEEE-754).
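As a hedged sketch of what those warnings catch (hypothetical file name literals.c, compiled with something like gcc -c -Wdouble-promotion -Wfloat-conversion literals.c):

/* literals.c */
float x;

int compare_slow(void) { return x < 2.5; }   /* -Wdouble-promotion: x is promoted to double */
int compare_fast(void) { return x < 2.5f; }  /* stays in single precision, no warning */

void assign(double d) { x = d; }             /* -Wfloat-conversion: double narrowed to float */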
You can cast the defined constants to (float) wherever they are used; the optimizer should do its job. This is a portable solution.
#define LIMIT 2.5
if (x < (float)LIMIT) ...
The -Wunsuffixed-float-constants flag could be used too, perhaps combined with some of the other options in the accepted answer above. However, this probably won't catch unsuffixed constants in system headers; you would need -Wsystem-headers to catch those as well, which could generate a lot of warnings...

Is there a difference between writing/passing value with or without float "f" suffix?

I believe I should type less when possible. Any unnecessary keystroke takes a bit of time.
In the context of Objective-C, my question is:
Can I type this
[UIColor colorWithRed:0 green:0 blue:0 alpha:0.15];
[UIColor colorWithRed:0.45 green:0.45 blue:0.45 alpha:0.15];
or do I have to use the f suffix
[UIColor colorWithRed:0.0f green:0.0f blue:0.0f alpha:0.15f];
[UIColor colorWithRed:0.45f green:0.45f blue:0.45f alpha:0.15f];
1) Why does it work even without "f"?
2) If I do need to write f, is there still an exception for "0"? That is, if the value is zero, is it still OK without "f"?
What you are really asking is about type literals and implicit casting.
I haven't written C in ages, but this reference leads me to believe it's not dissimilar to C# (I can't speak about Objective-C).
The problem boils down to this:
0.0 is a literal notation for the value 0 as a double
0.0f is a literal notation for the value 0 as a float
0 is a literal notation for the value 0 as a int
Supplying an int when a float is expected is fine, as there exists an implicit cast from int to float that the compiler can use.
However, if you specify a double when a float is expected, there is no implicit cast. This is because there is a loss of precision going from double to float and the compiler wants you to explicitly say you're aware of that.
So, if you write 0.0 when a float is expected, expect your compiler to moan at you about loss of precision :)
P.S.
I believe I should type less when possible. Any unnecessary keystroke takes a bit of time.
I wouldn't worry too much about the number of keystrokes. You'll waste far more time on unclear code in your life than you will from typing. If you're time-conscious, your best bet is to write clear and explicit code.
When you type 0 that is an integer constant. If you then assign it to a float, it gets promoted automatically: this is similar to a typecast but the compiler does it automatically.
When you type 0.0f that means "zero floating point" which is directly assigned to the float variable.
There is no meaningful difference between either method in this case.
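A minimal C sketch of what these answers describe (hypothetical function takes_float): the unsuffixed literal 0.15 has type double and the plain 0 has type int, but both convert implicitly when a float parameter is expected.

#include <stdio.h>

static void takes_float(float alpha) { printf("%.9f\n", alpha); }

int main(void)
{
    takes_float(0.15);   /* double literal, implicitly converted to float */
    takes_float(0.15f);  /* float literal, no conversion needed */
    takes_float(0);      /* int literal, implicitly converted to float */
    return 0;
}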
The fact that you are asking this question indicates that you should be explicit, despite the extra keystroke. The last thing any programmer wants to do when starting to work on some code is say "WTF is happening here". Code is read more often than it is written, and you've just demonstrated that someone with your level of experience may not know what that code does.
Yes, it will work, and no, there's no compile-time or runtime downside of doing so, but code should be written for other people, not the compiler. The compiler doesn't care what junk you write; it will do its best with it regardless. Other programmers, on the other hand, may throw up their hands and step away from the keyboard.
In both cases, the compiled code is identical. (Tested with LLVM 5.1, Xcode 5.1.1.)
The compiler is automatically converting the integer, float and double literals to CGFloats. Note that CGFloat is a float on 32-bit and a double on 64-bit, so the compiler will make a conversion whether you use 0.15f or 0.15.
I advise not worrying about this. My preference is to use the fewest characters, not because it is easier to type but because it is easier to read.
