How can I minimize the code size of this program?

I have some problems with memory. Is it possible to reduce the memory footprint of the compiled program for this function?
It does some calculations with time variables {hh,mm,ss.0} and returns a time (in milliseconds) that depends on the current progress (_SHOOT_COUNT):
unsigned long hour_koef = 3600000UL;
unsigned long min_koef = 60000UL;

unsigned long timeToMillis(int *time)
{
    /* unsigned long constants: 1000*time[2] would overflow a 16-bit int */
    return hour_koef*time[0] + min_koef*time[1] + 1000UL*time[2] + 100UL*time[3];
}

float Func1(float x)
{
    return (x*x) / (x*x + (1-x)*(1-x));
}

float EaseFunction(byte percent, byte type)
{
    if (type == 0)
        return Func1(float(percent)/100);
    return 0; /* other types not implemented; a return was missing here */
}

unsigned long DelayEasyControl()
{
    long dd      = timeToMillis(D1);
    long dINfrom = timeToMillis(Din);
    long dOUTto  = timeToMillis(Dout);

    if (easyINmode == 0 && easyOUTmode == 0) return dd;

    if (easyINmode == 1 && easyOUTmode == 0)
    {
        if (_SHOOT_COUNT < duration)
            return dINfrom + (dd - dINfrom)*EaseFunction(_SHOOT_COUNT*100/duration, 0);
        else
            return dd;
    }

    if (easyOUTmode == 1)
    {
        if (_SHOOT_COUNT >= _SHOOT_activation && _SHOOT_activation != -1)
        {
            if ((_SHOOT_COUNT - _SHOOT_activation) < current_settings.delay_easyOUT_duration)
                return dOUTto - (dOUTto - dd)*(1 - EaseFunction((_SHOOT_COUNT - _SHOOT_activation)*100/duration, 0));
            else
                return dOUTto;
        }
        else
        {
            if (easyINmode == 0) return dd;
            else if (_SHOOT_COUNT < duration)
                return dINfrom + (dd - dINfrom)*EaseFunction(_SHOOT_COUNT*90/duration, 0);
            else return dd;
        }
    }
    return 0; /* unreachable in practice */
}

You mention that it's code size you want to optimize, and that you're doing this on an Arduino clone (based on the ATmega32U4).
Those controllers don't have hardware support for floating-point, so it's all going to be emulated in software which takes up a lot of code.
Try re-writing it to do fixed-point arithmetic, you will save a lot of code space that way.
You might see minor gains by optimizing the other data types too: e.g. uint16_t instead of long might suffice for some of the values, and marking small functions as inline can save the instructions needed for the call. The compiler might already be inlining them, of course.
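As a sketch of what the fixed-point rewrite could look like, here is the ease curve x*x/(x*x+(1-x)*(1-x)) from the question computed entirely with scaled integers; the scale factor of 1024 and the function name are my own choices, not from the original code:

```c
#include <stdint.h>

/* Ease curve x^2 / (x^2 + (1-x)^2) in fixed point.
   'percent' is 0..100; the return value is scaled so that
   1024 represents 1.0 (and 0 represents 0.0). */
uint32_t ease_fixed(uint8_t percent)
{
    uint32_t x = percent;                  /* 0..100                */
    uint32_t a = x * x;                    /* x^2, scale 10000      */
    uint32_t b = (100u - x) * (100u - x);  /* (1-x)^2, scale 10000  */
    return (a * 1024u) / (a + b);          /* rescale: 1024 == 1.0  */
}
```

The interpolation in DelayEasyControl would then multiply by this value and shift right by 10 instead of doing any float math.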

Most compilers have an option for optimizing for size; try that first. Then you may try a non-standard 24-bit float type available in some compilers for 8-bit MCUs, such as NXP's MRK III or MPLAB XC8:
By default, the XC8 compiler uses a 24-bit floating-point format that is a truncated form of the 32-bit format and that has eight bits of exponent but only 16 bits of signed mantissa.
Understanding Floating-Point Values
That'll reduce the floating-point math library size a lot without any code changes, but it may still be too big for your MCU. In that case you'll need to rewrite the program. The most effective solution is to switch to fixed-point (a.k.a. scaled integers), as unwind said, if you don't need very wide ranges. That's a lot faster and takes much less ROM than a software floating-point solution. Microchip's document above also suggests that solution:
The larger IEEE formats allow precise numbers, covering a large range of values to be handled. However, these formats require more data memory to store values of this type and the library routines that process these values are very large and slow. Floating-point calculations are always much slower than integer calculations and should be avoided if at all possible, especially if you are using an 8-bit device. This page indicates one alternative you might consider.
Also, you can try storing duplicated expressions like x*x and 1-x in a variable instead of computing them twice, as in (x*x)/(x*x+(1-x)*(1-x)); that helps a little if the compiler is too dumb. The same goes for conditions like easyINmode==0 and easyOUTmode==1 that are tested repeatedly.
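For instance, Func1 from the question could hoist the repeated subexpressions like this (same math, each piece computed once):

```c
float Func1(float x)
{
    float x2 = x * x;     /* x*x computed once */
    float y  = 1.0f - x;  /* 1-x computed once */
    return x2 / (x2 + y * y);
}
```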
Some other things:
ALL_CAPS should be used for macros and constants only
Identifiers that begin with an underscore followed by a capital letter are reserved for the implementation. C may also use them for future features like _Bool or _Atomic. See What are the rules about using an underscore in a C++ identifier? (Arduino is C++)
Use functions instead of macros for things that are reused many times, because the inline expansion will eat some space each time it's used

Related

Fast floating point abs function

What is the fastest way to take the absolute value of a standard 32 bit float on x86-64 architectures in C99? The builtin functions fabsf and fabs are not fast enough. My current approach is bit twiddling:
unsigned int tmp = *((unsigned int *)&f) & 0x7fffffff;
float abs = *((float *)&tmp);
It works but is ugly. And I'm not sure it is optimal?
Please stop telling me about type-punned pointers because it's not what I'm asking about. I know the code can be phrased using unions but it doesn't matter because on all compilers (written in the last 10 years) it will emit exactly the same code.
Less standard violations:
#include <stdint.h>
#include <math.h>

/* use type punning instead of pointer casts, to respect alignment and aliasing rules */
static inline float float2absf(float f) {
    /* the optimizer will evaluate the `if` at compile time and drop the library call */
    if (sizeof(float) == sizeof(uint32_t)) {
        union {
            float f;
            uint32_t i;
        } u;
        u.f = f;
        u.i &= 0x7fffffff;
        return u.f;
    }
    return fabsf(f);
}
IMHO, it would be safer to use the library function. This will improve code portability, especially on platforms where you might encounter a non-IEEE float representation or where type sizes might differ.
In general, once compiled for your platform, the library function should provide the fastest solution.
Having said that, library calls require both stack management and code jumps unless optimized away, which - for a simple bit-altering function - could result in more than twice the number of operations, as well as cache misses. In many cases this is avoidable by using compiler builtins, which the compiler may apply automatically (it can optimize library functions into inline instructions).
Your bit-approach is (in theory) correct and could optimize away the operations related to function calls, as well as improve code locality... although the same could be achieved using compiler builtins and optimizations.
Also, please note that your approach isn't standard compliant and assumes that sizeof(unsigned int) == sizeof(float)... type punning through a union improves that a little bit.
Finally, using an inline function works out like using a macro while keeping the code more readable, and it allows a fallback to the library function if the type sizes don't match.

looking for snprintf()-replacement

I want to convert a float (e.g. f=1.234) to a char-array (e.g. s="1.234"). This is quite easy with snprintf() but for some size and performance-reasons I can't use it (it is an embedded platform where snprintf() is too slow because it uses doubles internally).
So: how can I easily convert a float to a char-array (positive and negative floats, no exponential representation, maximum three digits after the dot)?
Thanks!
PS: to clarify this: the platform comes with a NEON FPU which can do 32 bit float operations in hardware but is slow with 64 bit doubles. The C-library for this platform unfortunately does not have a specific NEON/float variant of snprintf, so I need a replacement. Beside of that the complete snprintf/printf-stuff increases code size too much
For many microcontrollers a simplified printf function without float/double support is available. For instance, many platforms have newlib-nano, and Texas Instruments provides ustdlib.c.
With one of those non-float printf functions you could split up the printing to something using only integers like
float a = -1.2339f;
float b = a + ((a > 0) ? 0.0005f : -0.0005f);
int c = b;
int d = (int)(b * 1000) % 1000;
if (d < 0) d = -d;
printf("%d.%03d\n", c, d);
which outputs
-1.234
Do watch out for overflows of the integer on 8 and 16 bit platforms.
-edit-
Furthermore, as noted in the comments, rounding corner cases will give slightly different answers than printf's implementation.
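Putting the integer-split idea above into a helper (ftoa3 is a hypothetical name; it still calls snprintf, but only with integer conversions, so a printf implementation without float support suffices):

```c
#include <stdio.h>

/* Format a float with exactly three digits after the decimal point,
   using only integer printf conversions. Illustrative helper only. */
static void ftoa3(float a, char *out, size_t outsz)
{
    float b = a + ((a > 0) ? 0.0005f : -0.0005f); /* round half away from zero */
    int c = (int)b;                               /* integer part              */
    int d = (int)(b * 1000) % 1000;               /* three fractional digits   */
    if (d < 0) d = -d;
    /* emit the sign explicitly so that e.g. -0.5 prints as "-0.500" */
    snprintf(out, outsz, "%s%d.%03d", (b < 0 && c == 0) ? "-" : "", c, d);
}
```

The same overflow caveat applies: on 8- and 16-bit platforms b * 1000 can exceed the int range.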
You might check whether your stdlib provides strfromf, the low-level routine that converts a float into a string, normally used by printf and friends. If available, this might be lighter-weight than pulling in the entire stdio library (and indeed, that is the reason it was added in the ISO/IEC TS 18661 extensions for IEC 60559 arithmetic).

Efficient conversion from Indeterminate Value to Unspecified Value

Sometimes in C it is necessary to read a possibly-written item from a partially-written array, such that:
If the item has been written, the read will yield the value that was in fact written, and
If the item hasn't been written, the read will convert an Unspecified bit pattern to a value of the appropriate type, with no side-effects.
There are a variety of algorithms where finding a solution from scratch is expensive, but validating a proposed solution is cheap. If an array holds solutions for all cases where they have been found, and arbitrary bit patterns for other cases, reading the array, testing whether it holds a valid solution, and slowly computing the solution only if the one in the array isn't valid, may be a useful optimization.
If an attempt to read a non-written array element of a type like uint32_t could be guaranteed to always yield a value of the appropriate type, efficiently, such an approach would be easy and straightforward. Even if that requirement only held for unsigned char, it might still be workable. Unfortunately, compilers sometimes behave as though reading an Indeterminate Value, even of type unsigned char, may yield something that doesn't behave consistently as a value of that type. Further, discussions in a Defect Report suggest that operations involving Indeterminate Values yield Indeterminate results, so even given something like unsigned char x, *p=&x; unsigned y=*p & 255; unsigned z=(y < 256); it would be possible for z to receive the value 0.
From what I can tell, the function:
unsigned char solidify(unsigned char *p)
{
    unsigned char result = 0;
    unsigned char mask = 1;
    do
    {
        if (*p & mask) result |= mask;
        mask += (unsigned)mask; // Cast only needed for capricious type ranges
    } while(mask);
    return result;
}
would be guaranteed to always yield a value in the range of type unsigned char any time the storage identified can be accessed as that type, even if it happens to hold Indeterminate Value. Such an approach seems rather slow and clunky, however, given that the machine code required to obtain the desired effect should usually be equivalent to simply returning *p.
Are there any better approaches that would be guaranteed by the Standard to always yield a value within the range of unsigned char, even if the source value is Indeterminate?
Addendum
The ability to solidify values is necessary, among other things, when performing I/O with partially-written arrays and structures, in cases where nothing will care about what bits get output for the parts that were never set. Whether or not the Standard requires that fwrite be usable with partially-written structures or arrays, I would regard I/O routines that can be used in such fashion (writing arbitrary values for portions that weren't set) as being of higher quality than those which might jump the rails in such cases.
My concern is largely with guarding against optimizations which are unlikely to be used in dangerous combinations, but which could nonetheless occur as compilers get more and more "clever".
A problem with something like:
unsigned char solidify_alt(unsigned char *p)
{ return *p; }
is that compilers may combine an optimization which could be troublesome but tolerable in isolation, with one that would be good in isolation but deadly in combination with the first:
If the function is passed the address of an unsigned char which has been optimized to e.g. a 32-bit register, a function like the above may blindly return the contents of that register without clipping it to the range 0-255. Requiring that callers manually clip the results of such functions would be annoying but survivable if that were the only problem. Unfortunately...
Since the above function will "always" return a value 0-255, compilers may omit any "downstream" code that would try to mask the value into that range, check whether it was outside it, or otherwise do things that would be irrelevant for values outside the range 0-255.
Some I/O devices may require that code wishing to write an octet perform a 16-bit or 32-bit store to an I/O register, and may require that 8 bits contain the data to be written and other bits hold a certain pattern. They may malfunction badly if any of the other bits are set wrong. Consider the code:
void send_byte(unsigned char *p, unsigned int n)
{
    while(n--)
        OUTPUT_REG = solidify_alt(p++) | 0x0200;
}
void send_string4(char *st)
{
    unsigned char buff[5]; // Leave space for zero after 4-byte string
    strcpy((char*)buff, st);
    send_byte(buff, 4);
}
with the intended semantics that send_string4("Ok"); should send out an 'O', a 'k', a zero byte, and an arbitrary value 0-255. Since the code uses solidify_alt rather than solidify, a compiler could legally turn that into:
void send_string4(char *st)
{
    unsigned buff0, buff1, buff2, buff3;
    buff0 = st[0]; if (!buff0) goto STRING_DONE;
    buff1 = st[1]; if (!buff1) goto STRING_DONE;
    buff2 = st[2]; if (!buff2) goto STRING_DONE;
    buff3 = st[3];
STRING_DONE:
    OUTPUT_REG = buff0 | 0x0200;
    OUTPUT_REG = buff1 | 0x0200;
    OUTPUT_REG = buff2 | 0x0200;
    OUTPUT_REG = buff3 | 0x0200;
}
with the effect that OUTPUT_REG may receive values with bits set outside the proper range. Even if the output expression were changed to (((unsigned char)solidify_alt(p++) | 0x0200) & 0x02FF), a compiler could still simplify that to yield the code given above.
The authors of the Standard refrained from requiring compiler-generated initialization of automatic variables because it would have made code slower in cases where such initialization would be semantically unnecessary. I don't think they intended that programmers should have to manually initialize automatic variables in cases where all bit patterns would be equally acceptable.
Note, btw, that when dealing with short arrays, initializing all the values will be inexpensive and would often be a good idea, and when using large arrays a compiler would be unlikely to impose the above "optimization". Omitting the initialization in cases where the array is large enough that the cost matters, however, would make the program's correct operation reliant upon "hope".
This is not an answer, but an extended comment.
The immediate solution would be for the compiler to provide a built-in, for example assume_initialized(variable [, variable ... ]*), that generates no machine code, but simply makes the compiler treat the contents of the specified variable (either scalars or arrays) to be defined but unknown.
One can achieve a similar effect using a dummy function defined in another compilation unit, for example
void define_memory(void *ptr, size_t bytes)
{
    /* Nothing! */
}
and calling that (e.g. define_memory(some_array, sizeof some_array)), to stop the compiler from treating the values in the array as indeterminate; this works because at compile time, the compiler cannot determine the values are unspecified or not, and therefore must consider them specified (defined but unknown).
Unfortunately, that has serious performance penalties. The call itself, even though the function body is empty, has a performance impact. However, worse yet is the effect on the code generation: because the array is accessed in a separate compilation unit, the data must actually reside in memory in array form, and thus typically generates extra memory accesses, plus restricts the optimization opportunities for the compiler. In particular, even a small array must then exist, and cannot be implicit or reside completely in machine registers.
I have experimented with a few architecture- (x86-64) and compiler- (GCC) specific workarounds, using extended inline assembly to fool the compiler into believing that the values are defined but unknown (unspecified, as opposed to indeterminate) without generating any actual machine code -- this should not require machine code, just a small adjustment to how the compiler treats the arrays/variables -- but with about zero success.
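For reference, the kind of GCC/Clang-specific inline-assembly trick meant here looks roughly like this (an empty asm statement with a memory clobber; as said above, in my experiments these did not achieve the zero-cost behaviour I was after):

```c
#include <stddef.h>

/* Compiler-specific sketch (GCC/Clang extended asm, not standard C):
   an empty asm statement that claims to read the pointer and clobber
   memory, so the compiler must treat the buffer's current contents as
   defined-but-unknown. The asm emits no instructions itself, but the
   memory clobber still forces the data out of registers into memory. */
static inline void assume_initialized(void *ptr, size_t bytes)
{
    __asm__ __volatile__("" : : "r"(ptr), "r"(bytes) : "memory");
}
```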
Now, to the underlying reason why I wrote this comment.
Years and years ago, working on numerical computation code and comparing performance to a similar implementation in Fortran 95, I discovered the lack of a memrepeat(ptr, first, bytes) function: the counterpart to memmove() (as opposed to memcpy()), which would repeat the first first bytes at ptr over the region from ptr+first up to ptr+bytes-1. Like memmove(), it would work on the storage representation of the data, so even if ptr to ptr+first contained a trap representation, no trap would actually trigger.
Main use case is to initialize arrays with floating-point data (one-dimensional, multidimensional, or structures with floating-point members), by initializing the first structure or group of values, and then simply repeating the storage pattern over the entire array. This is a very common pattern in numerical computation.
As an example, using
double nums[7] = { 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0 };
memrepeat(nums, 2 * sizeof nums[0], sizeof nums);
yields
double nums[7] = { 7.0, 6.0, 7.0, 6.0, 7.0, 6.0, 7.0 };
(It is possible that the compiler could optimize the operation even better, if it was defined as e.g. memsetall(data, size, count), where size is the size of the duplicated storage unit, and count the total number of storage units (so count-1 units are actually copied). In particular, this allows easy implementation that uses nontemporal stores for the copies, reading from the initial storage unit. On the other hand, memsetall() can only copy full storage units unlike memrepeat(), so memsetall(nums, 2 * sizeof nums[0], 3); would leave the 7th element in nums[] unchanged -- i.e., in the above example, it'd yield { 7.0, 6.0, 7.0, 6.0, 7.0, 6.0, 1.0 }.)
Although you can trivially implement memrepeat() or memsetall(), even optimize them for a specific architecture and compiler, it is difficult to write a portable optimized version.
In particular, loop-based implementations that use memcpy() (or memmove()) yield quite inefficient code when compiled by e.g. GCC, because the compiler cannot coalesce a pattern of function calls into a single operation.
Most compilers often inline memcpy() and memmove() with internal, target-and-use-case-optimized versions, and doing that for such a memrepeat() and/or memsetall() function would make it portable. In Linux on x86-64, GCC inlines known-size calls, but keeps the function calls where the size is only known at runtime.
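For concreteness, a straightforward portable memrepeat() (the simple memcpy()-loop version which, as noted above, compilers do not coalesce into anything clever) might look like:

```c
#include <string.h>

/* Repeat the first 'first' bytes at 'ptr' until 'bytes' bytes are
   filled. Doubling the already-filled region on each pass keeps the
   number of memcpy() calls logarithmic in bytes/first. */
void memrepeat(void *ptr, size_t first, size_t bytes)
{
    unsigned char *p = ptr;
    size_t have = first;
    while (have < bytes) {
        size_t n = (have <= bytes - have) ? have : bytes - have;
        memcpy(p + have, p, n);  /* source and destination never overlap */
        have += n;
    }
}
```

With the nums example above, memrepeat(nums, 2 * sizeof nums[0], sizeof nums) fills the whole array with the 7.0/6.0 pattern.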
I did try to push it upstream, with some private and some public discussions on various mailing lists. The response was cordial, but clear: there is no way to get such features included into compilers, unless it is standardized by someone first, or you pique the interest of one of the core developers enough so that they want to try it themselves.
Because the C standards committee is mostly concerned with fulfilling the commercial interests of its corporate sponsors, there is zero chance of getting anything like that standardized into ISO C. (If there were, we really should push for basic features from POSIX like getline(), regex, and iconv to be included first; they'd have a much bigger positive impact on the code we can teach new C programmers.)
None of this piqued the interest of the core GCC developers either, so at that point, I lost my interest in trying to push it upstream.
If my experience is typical -- and discussing it with a few people it does seem like it is --, OP and others worrying about such things will better utilize their time to find compiler/architecture-specific workarounds, rather than point out the deficiencies in the standard: the standard is already lost, those people do not care.
Better spend your time and efforts in something you can actually accomplish without having to fight against windmills.
I think this is pretty clear. C11 3.19.2:
    indeterminate value: either an unspecified value or a trap representation
Period. It cannot be anything else than the two cases above.
So code such as unsigned z=(y < 256) can never yield 0, because x in your example cannot hold a value larger than 255. As per the representation of character types, 6.2.6, an unsigned char is not allowed to contain padding bits or trap representations.
Other types, on wildly exotic systems, could in theory hold values outside their range, padding bits and trap representations.
On real-world systems, that are extremely likely to use two's complement, trap representations do not exist. So the indeterminate value can only be unspecified. Unspecified, not undefined! There is a myth saying that "reading an indeterminate value is always undefined behavior". Save for trap representations and some other special cases, this is not true, see this. This is merely unspecified behavior.
Unspecified behavior does not mean that the compiler can run amok and make weird assumptions, as it can when it encounters undefined behavior. It will have to assume that the variable's value is in range. What the compiler cannot assume is that the value stays the same between reads; this was addressed by a DR.

C long double in golang

I am porting an algorithm from C to Go. And I got a little bit confused. This is the C function:
void gauss_gen_cdf(uint64_t cdf[], long double sigma, int n)
{
    int i;
    long double s, d, e;
    // Calculations ...
    for (i = 1; i < n - 1; i++) {
        cdf[i] = s;
    }
}
And in the for loop the value "s" is assigned to element "i" of the array cdf. How is this possible? As far as I know, a long double is a float64 (in the Go context). So I shouldn't be able to compile the C code, because I am assigning a long double to an array which just contains uint64 elements. But the C code works fine.
So can someone please explain why this is working?
Thank you very much.
UPDATE:
The original C code of the function can be found here: https://github.com/mjosaarinen/hilabliss/blob/master/distribution.c#L22
The assignment cdf[i] = s performs an implicit conversion to uint64_t. It's hard to tell if this is intended without the calculations you omitted.
In practice, long double as a type has considerable variance across architectures. Whether Go's float64 is an appropriate replacement depends on the architecture you are porting from. For example, on x86, long double is an 80-bit extended-precision type, but Windows systems are usually configured to compute results with only the 53-bit mantissa, which means that float64 could still be equivalent for your purposes.
EDIT In this particular case, the values computed by the sources appear to be static and independent of the input. I would just use float64 on the Go side and see if the computed values are identical to those of the C version when run on an x86 machine under real GNU/Linux (virtualization should be okay), to work around the Windows FPU issues. The choice of x86 is just a guess, because it is likely what the original author used. I do not understand the underlying cryptography, so I can't say whether a difference in the computed values impacts the security. (Also note that the C code does not seem to properly seed its PRNG.)
C long double in golang
The title suggests an interest in whether or not Go has an extended-precision floating-point type similar to long double in C.
The answer is:
Not as a primitive, see Basic types.
But arbitrary precision is supported by the math/big library.
Why is this working?
long double s = some_calculation();
uint64_t a = s;
It compiles because, unlike Go, C allows certain implicit type conversions. The integer portion of the floating-point value of s will be copied. Presumably the s value has been scaled such that it can be interpreted as a fixed-point value where, based on the linked library source, 0xFFFFFFFFFFFFFFFF (2^64-1) represents the value 1.0. To make the most of such assignments, it is worthwhile to use an extended floating-point type with 64 precision bits.
If I had to guess, I would say that the (crypto-related) library is using fixed-point here because they want to ensure deterministic results, see: How can floating point calculations be made deterministic?. And since the extended-precision floating point is only being used for initializing a lookup table, using the (presumably slow) math/big library would likely perform perfectly fine in this context.
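As a sketch of the scaling being described (the function name is mine; this assumes an x87-style 80-bit long double with a 64-bit mantissa, as on x86 GNU/Linux):

```c
#include <stdint.h>

/* Convert a value s in [0.0, 1.0] to the fixed-point convention of the
   linked code, where 2^64-1 represents 1.0. With a 64-bit mantissa the
   products for simple inputs are exact; with a 53-bit long double the
   result can be off by a few units in the last place. */
uint64_t to_fixed(long double s)
{
    return (uint64_t)(s * 18446744073709551615.0L);  /* s * (2^64 - 1) */
}
```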

Integer type with floating point semantics for C or D

I'm looking for an existing implementation for C or D, or advice in implementing, signed and/or unsigned integer types with floating point semantics.
That is to say, an integer type that behaves as floating point types do when doing arithmetic: Overflow produces infinity (-infinity for signed underflow) rather than wrapping around or having undefined behavior, undefined operations produce NaN, etc.
In essence, a version of floating point where the representable numbers fall evenly on the number line, instead of clustering around 0.
In addition, all operations should be deterministic; any given two's complement 32-bit architecture should produce the exact same result for the same computation, regardless of its implementation (whereas floating point may, and often will, produce slightly differing results).
Finally, performance is a concern, which has me worried about potential "bignum" (arbitrary-precision) solutions.
See also: Fixed-point and saturation arithmetic.
I do not know of any existing implementations of this.
But I would imagine implementing it would be a matter of (in D):
enum CheckedIntState : ubyte
{
    ok,
    overflow,
    underflow,
    nan,
}

struct CheckedInt(T)
    if (isIntegral!T)
{
    private T _value;
    private CheckedIntState _state;

    // Constructors, getters, conversion helper methods, etc.

    // And a bunch of operator overloads that check the
    // result on every operation and yield a CheckedInt!T
    // with an appropriate state.

    // You'll also want to overload opEquals and opCmp and
    // make them check the state of the operands so that
    // NaNs compare equal and so on.
}
Saturating arithmetic does what you want except for the part where undefined operations produce NaN; this is going to turn out to be problematic, because most saturating implementations use the full number range, and so there are not values left over to reserve for NaN. Thus, you probably can't easily build this on the back of saturating hardware instructions unless you have an additional "is this value NaN" field, which is rather wasteful.
Assuming that you're wedded to the idea of NaN values, all of the edge case detection will probably need to happen in software. For most integer operations, this is pretty straightforward, especially if you have a wider type available (let's assume long long is strictly larger than whatever integer type underlies myType):
myType add(myType x, myType y) {
    if (x == positiveInfinity && y == negativeInfinity ||
        x == negativeInfinity && y == positiveInfinity)
        return notANumber;
    long long wideResult = (long long)x + y;  /* widen before adding to avoid overflow */
    if (wideResult >= positiveInfinity) return positiveInfinity;
    if (wideResult <= negativeInfinity) return negativeInfinity;
    return (myType)wideResult;
}
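A concrete, self-contained instantiation of that sketch (all names and the choice of sentinel values are mine, purely for illustration) could be:

```c
#include <stdint.h>

/* Illustrative 16-bit "integer with float semantics": the two extreme
   values are reserved as +inf / -inf, and INT16_MIN serves as NaN. */
typedef int16_t myType;
#define POS_INF   ((myType)INT16_MAX)        /*  32767                  */
#define NEG_INF   ((myType)(INT16_MIN + 1))  /* -32767                  */
#define NOT_A_NUM ((myType)INT16_MIN)        /* -32768, reserved as NaN */

myType checked_add(myType x, myType y)
{
    if (x == NOT_A_NUM || y == NOT_A_NUM) return NOT_A_NUM;  /* NaN propagates */
    if ((x == POS_INF && y == NEG_INF) || (x == NEG_INF && y == POS_INF))
        return NOT_A_NUM;                    /* inf + (-inf) is undefined  */
    if (x == POS_INF || y == POS_INF) return POS_INF;
    if (x == NEG_INF || y == NEG_INF) return NEG_INF;
    int32_t wide = (int32_t)x + y;           /* widen before adding        */
    if (wide >= POS_INF) return POS_INF;     /* overflow saturates to +inf */
    if (wide <= NEG_INF) return NEG_INF;     /* underflow saturates to -inf */
    return (myType)wide;
}
```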
One solution might be to implement multiple-precision arithmetic with abstract data types. The book C Interfaces and Implementations by David Hanson has a chapter (interface and implementation) of MP arithmetic.
Doing calculations using scaled integers is also a possibility. You might be able to use his arbitrary-precision arithmetic, although I believe this implementation can't overflow. You could run out of memory, but that's a different problem.
In either case, you might need to tweak the code to return exactly what you want on overflow and such.
Source code (MIT license)
That page also has a link to buy the book from amazon.com.
Half of the requirements are satisfied by saturating arithmetic, which is implemented in e.g. ARM instructions, MMX and SSE.
As Stephen Canon also pointed out, one needs additional elements to track overflow / NaN. Some instruction sets (Atmel, at least) have a sticky flag to test for overflows (which could be used to differentiate inf from max_int), and perhaps the saturation ("Q") flag plus a reserved value could be used to mark NaN.
