Floating-point conversion without strtod/sprintf - c

Since I have decided to use UTF-16 internally in a program that should run on Windows and Linux, I need replacements for some string handling functions, since I do not want to convert to and from the native char representation for user-mode code. However, if float conversion is slow compared to running iconv, I can use a wrapper around strtod/sprintf like WINE did.

These conversions to and from decimal are difficult to make both fast and correct. The naïve (but correct) versions require multi-precision integers, an implementation of which you were perhaps not planning on depending on. In short, wrap your existing strtod/sprintf and do not worry about the overhead: it will be less than the performance lost to a naïve implementation of these functions.
In the “naïve incorrect” category, there is an implementation of strtod() floating around that many interpreters use as a fallback when the host lacks one. This implementation is terrible (it may return a result off by several ULPs), but if you do not mind that, you could adapt its code to operate on UTF-16 characters.
NOTE: there is a swprintf() in C99 I think, but it is for strings of wchar_t, which does not have to be UTF-16, so that may not work for you.
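To illustrate the wrapper approach, here is a minimal sketch (the function name and signature are mine, not from WINE or any existing library): it copies the ASCII-range prefix of the UTF-16 string into a small char buffer and delegates to strtod, so all the rounding correctness stays in the C library and the only added cost is the trivial transcoding.

#include <stdlib.h>
#include <stdint.h>

/* Hypothetical UTF-16 strtod wrapper: transcode the ASCII-range prefix,
   then let strtod do the actual (correctly rounded) conversion. */
double utf16_strtod(const uint16_t *s, const uint16_t **endptr)
{
    char buf[64];
    size_t i = 0;
    while (i < sizeof buf - 1 && s[i] != 0 && s[i] < 0x80) {
        buf[i] = (char)s[i];   /* code units below 0x80 are plain ASCII */
        i++;
    }
    buf[i] = '\0';
    char *end;
    double v = strtod(buf, &end);
    if (endptr)
        *endptr = s + (end - buf);   /* how many UTF-16 units were consumed */
    return v;
}

Note the fixed 64-byte buffer is itself an assumption; extremely long numeric strings would need a smarter copy loop.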

Related

How to print a float on stdout without printf()?

I'm on an environment that has no printf() or any equivalent, so I'm writing it myself. But I have no idea how to perform such a conversion for float types. I tried to see how gcc does it, but it's really hard to understand.
Floating-point formatting is very easy to get wrong. Writing a simplistic implementation that works for "most" numbers is deceptively easy, but it is likely to break on very large numbers, very small numbers, and numbers close to zero, not to mention IEEE 754 subnormals, infinities and NaN. It might also get the trailing decimals wrong, failing to provide a string representation that allows reproducing the float bit-by-bit.
Fortunately, there are libraries out there that implement the work of formatting floating-point numbers, either for education, for embedded systems, or to improve on some aspect of standard-library formatting. If possible, I recommend that you incorporate David Gay's dtoa library, which has been extensively tested in Python and elsewhere.
You can take a look at the musl libc implementation; musl is a lightweight libc.
In the fmt_fp function defined in src/stdio/vfprintf.c, they basically convert a float to a string for fprintf conversion specifiers like f.
If you search the internet for the keyword ftoa, you will find other implementations of functions that convert a float to a string.
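To give a feel for why "simplistic" is dangerous, here is a naive ftoa sketch of my own (not musl's fmt_fp): it truncates instead of rounding, overflows on large magnitudes, and ignores NaN, infinities and subnormals, which are exactly the failure modes listed above.

/* Naive double-to-string: integer part, then 'prec' truncated decimals.
   Not correctly rounded; breaks on huge values, NaN and infinity. */
static void naive_ftoa(double x, char *out, int prec)
{
    if (x < 0.0) { *out++ = '-'; x = -x; }

    unsigned long ip = (unsigned long)x;       /* integer part; overflows for large x */
    double frac = x - (double)ip;

    char tmp[32];                              /* integer digits, least significant first */
    int n = 0;
    do {
        tmp[n++] = (char)('0' + ip % 10);
        ip /= 10;
    } while (ip);
    while (n) *out++ = tmp[--n];

    *out++ = '.';
    while (prec-- > 0) {                       /* fractional digits, truncated not rounded */
        frac *= 10.0;
        int d = (int)frac;
        *out++ = (char)('0' + d);
        frac -= d;
    }
    *out = '\0';
}

The gap between this sketch and a correct implementation is exactly what dtoa-style libraries exist to close.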

Looking for Ansi C89 arbitrary precision math library

I wrote an Ansi C compiler for a friend's custom 16-bit stack-based CPU several years ago, but I never got around to implementing all the data types. Now I would like to finish the job, so I'm wondering if there are any math libraries out there that I can use to fill the gaps. I can handle the 16-bit integer data types since they are native to the CPU, and therefore I have all the math routines (i.e. +, -, *, /, %) done for them. However, since his CPU does not handle floating point, I have to implement floats/doubles myself. I also have to implement the 8-bit and 32-bit data types (both integer and floats/doubles). I'm pretty sure this has been done and redone many times, and since I'm not particularly looking forward to reinventing the wheel, I would appreciate it if someone would point me at a library that can help me out.
Now, I was looking at GMP, but it seems to be overkill (the library must be absolutely huge, and I'm not sure my custom compiler would be able to handle it), and it takes numbers in the form of strings, which would be wasteful for obvious reasons. For example:
mpz_set_str(x, "7612058254738945", 10);
mpz_set_str(y, "9263591128439081", 10);
mpz_mul(result, x, y);
This seems simple enough, and I like the API... but I would rather pass in an array than a string. For example, if I wanted to multiply two 32-bit longs together, I would like to be able to pass it two arrays of size two, where each array contains two 16-bit values that actually represent a 32-bit long, and have the library place the output into an output array. If I needed floating point, then I should be able to specify the precision as well.
This may seem like asking for too much but I'm asking in the hopes that someone has seen something like this.
Many thanks in advance!
Let's divide the answer.
8-bit arithmetic
This one is very easy. In fact, C already talks about this under the term "integer promotion". This means that if you have 8-bit data and you want to do an operation on them, you simply pad them with zero (or one if signed and negative) to make them 16-bit. Then you proceed with the normal 16-bit operation.
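A minimal sketch of the widening step in C, assuming a 16-bit int on the target:

#include <stdint.h>

/* Widen 8-bit values to 16 bits: zero-extend if unsigned, sign-extend
   (pad with ones when negative) if signed. The casts do the work. */
uint16_t zero_extend8(uint8_t x) { return (uint16_t)x; }
int16_t  sign_extend8(int8_t x)  { return (int16_t)x; }

/* After widening, an 8-bit addition is just the native 16-bit addition,
   truncated back to 8 bits when stored. */
int8_t add8(int8_t a, int8_t b)
{
    return (int8_t)((int16_t)a + (int16_t)b);
}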
32-bit arithmetic
Note: as far as the standard is concerned, you don't really need to have 32-bit integers.
This could be a bit tricky, but it is still not worth using a library for. For each operation, you would need to take a look at how you learned to do them in elementary school in base 10, and then do the same in base 2^16 for 2-digit numbers (each digit being one 16-bit integer). Once you understand the analogy with simple base-10 math (and hence the algorithms), you would need to implement them in the assembly of your CPU.
This basically means loading the most significant 16 bits into one register and the least significant 16 bits into another register, then following the algorithm for each operation. You would most likely need help from the carry, overflow and other flags.
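As a concrete illustration in C (the type and function names are mine), schoolbook addition of two such base-2^16 numbers looks like this; on the real CPU the explicit comparison is replaced by the carry flag:

#include <stdint.h>

/* A 32-bit value stored as two base-2^16 "digits". */
typedef struct { uint16_t lo, hi; } u32_t;

/* Column addition: add the low digits, detect the carry, propagate it
   into the high digits, exactly as in base-10 long addition. */
u32_t add32(u32_t a, u32_t b)
{
    u32_t r;
    r.lo = (uint16_t)(a.lo + b.lo);
    uint16_t carry = (uint16_t)(r.lo < a.lo);   /* wrapped around => carry out */
    r.hi = (uint16_t)(a.hi + b.hi + carry);
    return r;
}

Subtraction works the same way with a borrow; multiplication combines four 16x16 partial products following the usual long-multiplication layout.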
Floating point arithmetic
Note: as far as the standard is concerned, you don't really need to conform to IEEE 754.
There are various libraries already written for software-emulated floating point. You may find this gcc wiki page interesting:
GNU libc has a third implementation, soft-fp. (Variants of this are also used for Linux kernel math emulation on some targets.) soft-fp is used in glibc on PowerPC --without-fp to provide the same soft-float functions as in libgcc. It is also used on Alpha, SPARC and PowerPC to provide some ABI-specified floating-point functions (which in turn may get used by GCC); on PowerPC these are IEEE quad functions, not IBM long double ones.
Performance measurements with EEMBC indicate that soft-fp (as speeded up somewhat using ideas from ieeelib) is about 10-15% faster than fp-bit and ieeelib about 1% faster than soft-fp, testing on IBM PowerPC 405 and 440. These are geometric mean measurements across EEMBC; some tests are several times faster with soft-fp than with fp-bit if they make heavy use of floating point, while others don't make significant use of floating point. Depending on the particular test, either soft-fp or ieeelib may be faster; for example, soft-fp is somewhat faster on Whetstone.
One answer could be to take a look at the source code for glibc and see if you could salvage what you need.

Dumping struct in C

Is it a good idea to simply dump a struct to a binary file using fwrite?
e.g.

struct Foo {
    char name[100];
    double f;
    int bar;
} data;

fwrite(&data, sizeof(data), 1, fout);
How portable is it?
I think it's really a bad idea to just dump whatever the compiler gives you (padding, integer size, etc.), even if platform portability is not important.
I have a friend arguing that doing so is very common in practice.
Is it true?
Edit: What is the recommended way to write a portable binary file? Using some sort of library?
I'm also interested in how this is achieved (by specifying byte order, sizes, ...?).
That's certainly a very bad idea, for two reasons:
the same struct may have different sizes on different platforms due to alignment issues and compiler mood
the struct's elements may have different representations on different machines (think big-endian/little-endian, IEEE 754 vs. some other format, sizeof(int) on different platforms)
It rather critically matters whether you want the file to be portable, or just the code.
If you're only ever going to read the data back on the same C implementation (and that means with the same values for any compiler options that affect struct layout in any way), using the same definition of the struct, then the code is portable. It might be a bad idea for other reasons: difficulty of changing the struct, and in theory there could be security risks around dumping padding bytes to disk, or bytes after any NUL terminator in that char array. They could contain information that you never intended to persist. That said, the OS does it all the time in the swap file, so whatEVER, but try using that excuse when users notice that your document format doesn't always delete data they think they've deleted, and they just emailed it to a reporter.
If the file needs to be passed between different platforms then it's a pretty bad idea, because you end up accidentally defining your file format to be something like, "whatever MSVC on Win32 ends up writing". This could end up being pretty inconvenient to read and write on some other platform, and certainly the code you wrote in the first place won't do it when running on another platform with an incompatible storage representation of the struct.
The recommended way to write portable binary files, in order of preference, is probably:
Don't. Use a text format. Be prepared to lose some precision in floating-point values.
Use a library, although there's a bit of a curse of choice here. You might think ASN.1 looks all right, and it is as long as you never have to manipulate the stuff yourself. I would guess that Google Protocol Buffers is fairly good, but I've never used it myself.
Define some fairly simple binary format in terms of what each unsigned char in turn means. This is fine for characters[*] and other integers, but gets a bit tricky for floating-point types. "This is a little-endian representation of an IEEE-754 float" will do you OK provided that all your target platforms use IEEE floats. Which I expect they do, but you have to bet on that. Then, assemble that sequence of characters to write and interpret it to read: if you're "lucky" then on a given platform you can write a struct definition that matches it exactly, and use this trick. Otherwise do whatever byte manipulation you need to. If you want to be really portable, be careful not to use an int throughout your code to represent the value taken from bar, because if you do then on some platform where int is 16 bits, it won't fit. Instead use long or int_least32_t or something, and bounds-check the value on writing. Or use uint32_t and let it wrap.
[*] Until you hit an EBCDIC machine, that is. Not that anybody will seriously expect your files to be portable to a machine that plain text files aren't portable to either.
How fond are you of getting a call in the middle of the night? Either use a #pragma to pack them or write them variable by variable.
Yes, this sort of foolishness is very common, but that doesn't make it a good idea. You should write each field individually in a specified byte order; that avoids alignment and byte-order problems at the cost of a tiny bit of extra effort. Reading and writing field by field will also make your life easier when you upgrade your software and have to read your old data format, or if the underlying hardware architecture changes.
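As a sketch of that field-by-field approach for the Foo struct from the question (assuming the host uses IEEE-754 doubles, and picking little-endian as the on-disk byte order):

#include <stdint.h>
#include <string.h>
#include <stdio.h>

struct Foo { char name[100]; double f; int bar; };  /* same layout as in the question */

/* Emit a 32-bit value little-endian, one byte at a time, so neither
   struct padding nor host byte order ever reaches the disk. */
static void put_u32_le(uint32_t v, FILE *f)
{
    unsigned char b[4];
    for (int i = 0; i < 4; i++) b[i] = (unsigned char)(v >> (8 * i));
    fwrite(b, 1, 4, f);
}

static void put_u64_le(uint64_t v, FILE *f)
{
    unsigned char b[8];
    for (int i = 0; i < 8; i++) b[i] = (unsigned char)(v >> (8 * i));
    fwrite(b, 1, 8, f);
}

/* Write one Foo field by field; the double is stored as its IEEE-754 bit pattern. */
static void write_foo(const struct Foo *p, FILE *f)
{
    fwrite(p->name, 1, sizeof p->name, f);  /* fixed-size char array, no padding involved */
    uint64_t bits;
    memcpy(&bits, &p->f, sizeof bits);      /* reinterpret the double's bytes */
    put_u64_le(bits, f);
    put_u32_le((uint32_t)p->bar, f);        /* the format defines bar as 32 bits on disk */
}

Reading simply reverses the process, and because every byte's meaning is pinned down, the files stay valid across compilers, platforms and 32/64-bit builds.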

Adding 64 bit support to existing 32 bit code, is it difficult?

There is a library which I build against different 32-bit platforms. Now, 64-bit architectures must be supported as well. What are the most general strategies for extending existing 32-bit code to support 64-bit architectures? Should I use #ifdefs or something else?
The amount of effort involved will depend entirely on how well written the original code is. In the best possible case there will be no effort involved other than recompiling. In the worst case you will have to spend a lot of time making your code "64-bit clean".
Typical problems are:
assumptions about sizes of int/long/pointer/etc
assigning pointers <=> ints
relying on default argument or function result conversions (i.e. no function prototypes)
inappropriate printf/scanf format specifiers
assumptions about size/alignment/padding of structs (particularly in regard to file or network I/O, or interfacing with other APIs, etc)
inappropriate casts when doing pointer arithmetic with byte offsets
Simply don't rely on assumptions about the machine word size; always use sizeof, stdint.h, etc. Unless you rely on different library calls for different architectures, there should be no need for #ifdefs.
The easiest strategy is to build what you have with 64-bit settings and test the heck out of it. Some code doesn't need to change at all. Other code, usually code making wrong assumptions about the size of ints/pointers, will be much more brittle and will need to be modified to be non-dependent on the architecture.
Very often binary files containing binary records cause the most problems. This is especially true in environments where ints grow from 32-bit to 64-bit in the transition to a 64-bit build. Primarily this is because integers get written to files natively at their current (32-bit) width and are then read back with the wrong width in a 64-bit build where ints are 64-bit.
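One common remedy, sketched here, is to spell the on-disk record out with fixed-width types from stdint.h and take printf/scanf format specifiers from inttypes.h, so that a 32-bit and a 64-bit build agree on every field:

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* On-disk record with explicit widths: 'int' or 'long' here could change
   size between builds, int32_t/int64_t cannot. */
struct record_v1 {
    uint32_t id;
    int32_t  offset;
    int64_t  timestamp;
};

/* Matching format specifiers come from <inttypes.h> instead of guessing %d vs %ld. */
static void print_record(const struct record_v1 *r)
{
    printf("%" PRIu32 " %" PRId32 " %" PRId64 "\n", r->id, r->offset, r->timestamp);
}

Padding and byte order still need the field-by-field treatment discussed in the previous question, but fixing the widths removes the 32-vs-64-bit ambiguity from the record itself.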

initialize a variable statically (at compile time)

1) I've got many constants in my C algo.
2) my code works both in floating-point and fixed-point.
Right now, these constants are initialized by a function, float2fixed, whereby in floating-point it does nothing, while in fixed-point, it finds their fixed-point representation. For instance, 0.5f stays 0.5f if working in floating-point, whereas it uses the pow() routine and becomes 32768 if working in fixed-point and the fixed-point representation is Qx.16.
That's easy to maintain, but it takes a lot of time to actually compute these constants in fixed-point (pow is a floating-point function). In C++, I'd use some metaprogramming so the compiler computes these values at compile time and there's no hit at run-time. But in C, that's not possible. Or is it? Does anybody know of such a trick? Is any compiler clever enough to do that?
Looking forward to any answers.
Rather than using (unsigned)(x*pow(2,16)) to do your fixed-point conversion, write it as (unsigned)(0.5f * (1 << 16)).
This should be acceptable as a compile-time constant expression, since it involves only built-in operators.
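For example, a conversion macro in this spirit (a sketch; the Q16 format and the rounding rule are assumptions on my part) remains a compile-time constant expression because it involves only literals and built-in operators:

#include <stdint.h>

/* Qx.16 conversion using only built-in operators, so the compiler folds it
   at compile time; adding half an LSB rounds to nearest instead of truncating. */
#define FLOAT2Q16(x) ((int32_t)((x) * (1 << 16) + ((x) >= 0 ? 0.5 : -0.5)))

static const int32_t HALF_Q16 = FLOAT2Q16(0.5f);     /* 32768, as in the question */
static const int32_t PI_Q16   = FLOAT2Q16(3.14159f);

In a floating-point build the same names can simply be defined as the plain float literals, so the code that uses the constants does not change between the two modes.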
When using fixed-point, you can write a program that takes your floating-point values and converts them into correct, constant initializers for the fixed-point type, so you effectively add a step to the compilation that generates the fixed-point values.
One advantage of this will be that you can then define and declare your constants with const so that they won't change at run-time - whereas with the initialization functions, of course, the values have to be modifiable because they are calculated once.
I mean write a simple program that can scan for formulaic lines that might read:
const double somename = 3.14159;
it would read that and generate:
const fixedpoint_t somename = { ...whatever is needed... };
You design the operation to make it easy to manage for both notations - so maybe your converter always reads the file and sometimes rewrites it.
datafile.c: datafile.constants converter
	converter datafile.constants > datafile.c
In plain C, there's not much you can do. You need to do the conversion at some point, and the compiler doesn't give you any access to call interesting user-provided functions at compile time. Theoretically, you could try to coax the preprocessor to do it for you, but that's the quick road to total insanity (i.e. you'd have to implement pow() in macros, which is pretty hideous).
Some options I can think of:
Maintain a persistent cache on disk. At least then it'd only be slow once, though you still have to load it, make sure it's not corrupt, etc.
As mentioned in another comment, use template metaprogramming anyway and compile with a C++ compiler. Most C works just fine (arguably better) with a C++ compiler.
Hmm, I guess that's about all I can think of. Good luck.
Recent versions of GCC (around 4.3) added the ability to use GMP and MPFR to do some compile-time optimisations by evaluating more complex functions that are constant. That approach leaves your code simple and portable, and trusts the compiler to do the heavy lifting.
Of course, there are limits to what it can do, and it would be hard to know if it's optimizing a given instance without going and looking at the assembly. But it might be worth checking out. Here's a link to the description in the changelog
