C - What to do when double is not 8 bytes long?

I have a C program that I will be running on legacy machines, and I am a little worried about basic-type compatibility. I mean, when I ask "How can I guarantee that double will be 8 bytes long?", the answer is
"You can put this in your code:
static_assert(sizeof(double) == 8, "invalid double size");"
Ok, I get it...
But what do I do when I run the code and the assertion fails?
What are my options then? How can I deal with this problem? I need to run the program on that specific machine... Is it hard to adapt the program to a different size of double?
P.S. My program needs 8-byte doubles
EDIT: Why do I need 8-byte doubles?
I read some data from a serial port in binary format into a buffer, and from that buffer I need to extract IEEE 754 double-precision and single-precision numbers to do some calculations with them.

If the reason you need 8-byte doubles is because you are reading and writing them to data files that way, you need to step back and have a long, hard think about this. Reading and writing "binary" data files is one of those things people love to do because it's "easy" and "efficient" -- but of course it's wildly unportable. You can try to patch things up to make your binary data files vaguely portable, but it's actually pretty difficult, and by the time you do, it's no longer easy or efficient.
If you can possibly see your way clear to using text data files, just about all of your portability problems will magically vanish, and you won't have to worry about those data files any more.
You can read and write floating-point quantities to text data files using %a, which will do a good job of preserving accuracy and precision.
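For instance, a minimal round trip through a text file using %a (C99) might look like this sketch; the file name and value are just placeholders. Because %a prints the exact bits as hexadecimal floating point, the value read back compares equal to the original:

#include <stdio.h>

int main(void)
{
    double x = 0.1;
    double y = 0.0;
    FILE *f = fopen("data.txt", "w");
    if (!f) return 1;
    /* %a writes hexadecimal floating point, which round-trips exactly */
    fprintf(f, "%a\n", x);
    fclose(f);

    f = fopen("data.txt", "r");
    if (!f) return 1;
    /* %la reads the hex-float format back into a double */
    if (fscanf(f, "%la", &y) == 1)
        printf("round-trip exact: %s\n", x == y ? "yes" : "no");
    fclose(f);
    return 0;
}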

If you know the specific types of machines and C environment on which your code must run, then you can and should study their documentation to determine whether their characteristics match your data format. If they do, then you have no problem. If they don't, then you have to write FP emulation code.
If you can't make this determination prior to writing the code, or if characteristics of the target machines vary in a way that makes a difference here, then you need to write the emulation code regardless. Of course, you can use such code on any machine, so you might stop there. You could also, however, provide an alternative code path that uses native FP, and choose between the two paths either at build time (via conditional compilation) or at run time (via an explicit test of FP representation).
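A run-time test of the FP representation can be as simple as checking the bit pattern of a known constant. This is only a sketch: it assumes an unpadded 64-bit double whose byte order matches that of uint64_t:

#include <stdint.h>
#include <string.h>

/* Returns 1 if the native double appears to be IEEE 754 binary64
   (assuming double and uint64_t share the same byte order). */
static int double_is_ieee754_binary64(void)
{
    double d = 1.0;   /* binary64 bit pattern: 0x3FF0000000000000 */
    uint64_t bits;
    if (sizeof d != sizeof bits)
        return 0;
    memcpy(&bits, &d, sizeof bits);
    return bits == UINT64_C(0x3FF0000000000000);
}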

Related

Looking for Ansi C89 arbitrary precision math library

I wrote an Ansi C compiler for a friend's custom 16-bit stack-based CPU several years ago, but I never got around to implementing all the data types. Now I would like to finish the job, so I'm wondering if there are any math libraries out there that I can use to fill the gaps. I can handle the 16-bit integer data types since they are native to the CPU, and therefore I have all the math routines (i.e. +, -, *, /, %) done for them. However, since his CPU does not handle floating point, I have to implement floats/doubles myself. I also have to implement the 8-bit and 32-bit data types (both integer and floats/doubles). I'm pretty sure this has been done and redone many times, and since I'm not particularly looking forward to reinventing the wheel, I would appreciate it if someone would point me at a library that can help me out.
Now I was looking at GMP, but it seems to be overkill (the library must be huge; I'm not sure my custom compiler could handle it) and it takes numbers in the form of strings, which would be wasteful for obvious reasons. For example:
#include <gmp.h>

mpz_t x, y, result;
mpz_inits(x, y, result, NULL);
mpz_set_str(x, "7612058254738945", 10);
mpz_set_str(y, "9263591128439081", 10);
mpz_mul(result, x, y);
This seems simple enough, I like the api... but I would rather pass in an array rather than a string. For example, if I wanted to multiply two 32-bit longs together I would like to be able to pass it two arrays of size two where each array contains two 16-bit values that actually represent a 32-bit long and have the library place the output into an output array. If I needed floating point then I should be able to specify the precision as well.
This may seem like asking for too much but I'm asking in the hopes that someone has seen something like this.
Many thanks in advance!
Let's divide the answer.
8-bit arithmetic
This one is very easy. In fact, C already covers this under the term "integer promotion". This means that if you have 8-bit data and you want to do an operation on it, you simply extend it to 16 bits, padding with zeros (or ones, if it is signed and negative). Then you proceed with the normal 16-bit operation.
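For example, here is a sketch of an 8-bit addition expressed in C (using <stdint.h> names for clarity; your compiler would generate the promotion itself):

#include <stdint.h>

uint8_t add8(uint8_t a, uint8_t b)
{
    /* a and b are implicitly promoted to int before the addition;
       the cast truncates the result back to 8 bits. */
    return (uint8_t)(a + b);
}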
32-bit arithmetic
Note: as far as the standard is concerned, you don't really need to have 32-bit integers.
This could be a bit tricky, but it is still not worth using a library for. For each operation, think back to how you learned to do it in elementary school in base 10, and then do the same in base 2^16 with two-digit numbers (each digit being one 16-bit integer). Once you understand the analogy with simple base-10 math (and hence the algorithms), you would need to implement them in the assembly of your CPU.
This basically means loading the most significant 16 bits into one register and the least significant into another. Then follow the algorithm for each operation and perform it. You would most likely need help from the overflow and other flags.
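Expressed in C rather than your CPU's assembly, a 32-bit addition over two 16-bit limbs might look like this sketch (the limb order and names are my own choices):

#include <stdint.h>

/* out = a + b, where each 32-bit value is stored as two 16-bit limbs,
   least significant limb first; returns the carry out. */
unsigned add32(const uint16_t a[2], const uint16_t b[2], uint16_t out[2])
{
    uint32_t t = (uint32_t)a[0] + b[0];      /* add low limbs */
    out[0] = (uint16_t)t;
    t = (uint32_t)a[1] + b[1] + (t >> 16);   /* add high limbs plus carry */
    out[1] = (uint16_t)t;
    return (unsigned)(t >> 16);              /* final carry */
}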
Floating point arithmetic
Note: as far as the standard is concerned, you don't really need to conform to IEEE 754.
There are various libraries already written for software-emulated floating point. You may find this gcc wiki page interesting:
GNU libc has a third implementation, soft-fp. (Variants of this are also used for Linux kernel math emulation on some targets.) soft-fp is used in glibc on PowerPC --without-fp to provide the same soft-float functions as in libgcc. It is also used on Alpha, SPARC and PowerPC to provide some ABI-specified floating-point functions (which in turn may get used by GCC); on PowerPC these are IEEE quad functions, not IBM long double ones.
Performance measurements with EEMBC indicate that soft-fp (as speeded up somewhat using ideas from ieeelib) is about 10-15% faster than fp-bit and ieeelib about 1% faster than soft-fp, testing on IBM PowerPC 405 and 440. These are geometric mean measurements across EEMBC; some tests are several times faster with soft-fp than with fp-bit if they make heavy use of floating point, while others don't make significant use of floating point. Depending on the particular test, either soft-fp or ieeelib may be faster; for example, soft-fp is somewhat faster on Whetstone.
One answer could be to take a look at the source code for glibc and see if you could salvage what you need.

Compare two matching strings algorithms - Practical approach

I wrote two different algorithms that solve a particular case of string matching (implemented in C). I know that the theoretical big-O complexity of these algorithms is equal, but I think that in practice one is better than the other.
My question is: could someone recommend a paper or some reading that shows how to compare algorithms with a practical approach?
I have several test sets, and I'm interested in measuring execution time and memory usage. I need to take these measurements as independently as possible of the operating system and other programs that could be running concurrently.
Thanks!!!
You could compare your algorithms by generating the assembly code for each and comparing it.
You can generate the assembly code with the gcc -S mycode.c command.
I find that "looking at the code" is a good start. If one uses more variables and is more complicated than the other, it is probably slower.
However, there are of course clever tricks that can make a more complicated function actually run faster (for example, code that reads 8 bytes at a time; once you find a difference the code gets more complex, but for long strings that are largely similar there is a big win).
So, in the end, there is no substitute for actually running the code, using clock-cycle timing (RDTSC instruction on x86 processors, for example), or running a large loop to execute the code many times to give a reasonable length runtime.
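A minimal timing harness using the standard clock() function might look like the following sketch; my_match is a hypothetical stand-in for the algorithm under test:

#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the matcher under test. */
static int my_match(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

int main(void)
{
    const char *a = "some test string", *b = "some test string!";
    enum { RUNS = 1000000 };
    volatile int sink = 0;   /* keeps the calls from being optimized away */
    clock_t start, end;
    int i;

    start = clock();
    for (i = 0; i < RUNS; i++)
        sink += my_match(a, b);
    end = clock();

    printf("%.1f ns per call (sink=%d)\n",
           1e9 * (double)(end - start) / CLOCKS_PER_SEC / RUNS, sink);
    return 0;
}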
If your code isn't supposed to run on a single embedded target, you probably want to run the code on a set of different hardware to determine if the code that is faster on processor A is also faster on B, C and D type processors. Often this does work, but sometimes you can find that a particular processor model is faster for SOME operations, and another is faster for another (for example based on cache-size, etc).
It would also be very important, in the case of string operations, to try different input sizes and different points of difference (e.g. a long string where the difference comes early vs. a long string where the difference comes late). Sometimes the different approaches will show different results for short/long strings or early/late points of difference (and of course for "equal" strings that are long or short).
To complete all the comments: I found a book called "A Guide to Experimental Algorithmics" by Catherine C. McGeoch, and a professor recommended a practical paper to me.

Dumping struct in C

Is it a good idea to simply dump a struct to a binary file using fwrite?
e.g.
struct Foo {
    char name[100];
    double f;
    int bar;
} data;
fwrite(&data, sizeof(data), 1, fout);
How portable is it?
I think it's a really bad idea to just dump out whatever the compiler gives you (padding, integer sizes, etc.), even if platform portability is not important.
I have a friend who argues that doing so is very common in practice.
Is that true?
Edit: What is the recommended way to write a portable binary file? Using some sort of library?
I'm interested in how this is achieved, too (by specifying byte order, sizes, ...?).
That's certainly a very bad idea, for two reasons:
the same struct may have different sizes on different platforms due to alignment issues and compiler mood
the struct's elements may have different representations on different machines (think big-endian/little-endian, IEEE 754 vs. some other format, sizeof(int) on different platforms)
It rather critically matters whether you want the file to be portable, or just the code.
If you're only ever going to read the data back on the same C implementation (and that means with the same values for any compiler options that affect struct layout in any way), using the same definition of the struct, then the code is portable. It might be a bad idea for other reasons: difficulty of changing the struct, and in theory there could be security risks around dumping padding bytes to disk, or bytes after any NUL terminator in that char array. They could contain information that you never intended to persist. That said, the OS does it all the time in the swap file, so whatEVER, but try using that excuse when users notice that your document format doesn't always delete data they think they've deleted, and they just emailed it to a reporter.
If the file needs to be passed between different platforms then it's a pretty bad idea, because you end up accidentally defining your file format to be something like, "whatever MSVC on Win32 ends up writing". This could end up being pretty inconvenient to read and write on some other platform, and certainly the code you wrote in the first place won't do it when running on another platform with an incompatible storage representation of the struct.
The recommended way to write portable binary files, in order of preference, is probably:
Don't. Use a text format. Be prepared to lose some precision in floating-point values.
Use a library, although there's a bit of a curse of choice here. You might think ASN.1 looks all right, and it is as long as you never have to manipulate the stuff yourself. I would guess that Google Protocol Buffers is fairly good, but I've never used it myself.
Define some fairly simple binary format in terms of what each unsigned char in turn means. This is fine for characters[*] and other integers, but gets a bit tricky for floating-point types. "This is a little-endian representation of an IEEE-754 float" will do you OK provided that all your target platforms use IEEE floats. Which I expect they do, but you have to bet on that. Then, assemble that sequence of characters to write and interpret it to read: if you're "lucky" then on a given platform you can write a struct definition that matches it exactly, and use this trick. Otherwise do whatever byte manipulation you need to. If you want to be really portable, be careful not to use an int throughout your code to represent the value taken from bar, because if you do then on some platform where int is 16 bits, it won't fit. Instead use long or int_least32_t or something, and bounds-check the value on writing. Or use uint32_t and let it wrap.
[*] Until you hit an EBCDIC machine, that is. Not that anybody will seriously expect your files to be portable to a machine that plain text files aren't portable to either.
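As a sketch of the third option above, here is one way to write and read a 32-bit value with an explicitly little-endian layout, independent of the host's byte order (the helper names are my own):

#include <stdint.h>
#include <stdio.h>

/* Write v as four bytes, least significant first. */
void put_u32le(uint32_t v, FILE *f)
{
    unsigned char b[4];
    b[0] = (unsigned char)(v & 0xFF);
    b[1] = (unsigned char)((v >> 8) & 0xFF);
    b[2] = (unsigned char)((v >> 16) & 0xFF);
    b[3] = (unsigned char)((v >> 24) & 0xFF);
    fwrite(b, 1, 4, f);
}

/* Read four little-endian bytes back into a 32-bit value. */
uint32_t get_u32le(FILE *f)
{
    unsigned char b[4] = {0, 0, 0, 0};
    fread(b, 1, 4, f);
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}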
How fond are you of getting a call in the middle of the night? Either use a #pragma to pack them or write them variable by variable.
Yes, this sort of foolishness is very common but that doesn't make it a good idea. You should write each field individually in a specified byte order, that will avoid alignment and byte order problems at the cost of a little tiny bit of extra effort. Reading and writing field by field will also make your life easier when you upgrade your software and have to read your old data format or if the underlying hardware architecture changes.

What is an optimal format for saving large amounts of numerical data (GBs) from a C program?

I'm a physicist who normally deals with large amounts of numerical data generated using C programs. Typically, I store everything as columns in ASCII files, but this has led to massively large files. Given that I am limited in space, this is an issue and I'd like to be a little smarter about the whole thing. So...
Is there a better format than ASCII? Should I be using binary files, or perhaps a custom format via some library?
Should I be compressing each file individually, or the entire directory? In either case, what format should I use?
Thanks a lot!
In your shoes, I would consider the standard scientific data formats, which are much less space- and time-consuming than ASCII, but (while maybe not quite as bit-efficient as pure, machine-dependent binary formats) still offer standard documented and portable, fast libraries to ease the reading and writing of the data.
If you store data in pure binary form, the metadata is crucial to make any sense of the data again (are these numbers single or double precision, or integers, and of what length; what are the arrays' dimensions; etc.), and issues with archiving and retrieving matched data/metadata pairs can, and in practice occasionally do, make perfectly good datasets unusable -- a real pity and waste.
CDF, in particular, is "a self-describing data format for the storage and manipulation of scalar and multidimensional data in a platform- and discipline-independent fashion" with many libraries and utilities to go with it. As alternatives, you might also consider NetCDF and HDF -- I'm less familiar with those (and such tradeoffs as flexibility vs size vs speed issues) but, seeing how widely they're used by scientists in many fields, I suspect any of the three formats could give you very acceptable results.
If you need the files for a long time, because they are important experimental data that prove something for you, don't use binary formats. You will not be able to read them when your architecture changes. That's dangerous. Stick to text (yes, ASCII) files.
Choose a compression format that fits your needs. Is compression time an issue? Usually not, but check that for yourself. Is decompression time an issue? Usually yes, if you want to do data analysis on it. Under these conditions I'd go for bzip2: it is quite common nowadays, well tested, and foolproof. I'd compress files individually, since the larger your file, the larger the probability of losses (bits flip, etc.).
A terabyte disk is a hundred bucks. Hard to run out of space these days. Sure, storing the data in binary saves space. But there's a cost, you'll have a lot less choices to get the data out of the file again.
Check what your operating system can do. Windows supports automatic compression on folders, for example; the file contents get zipped by the file system without you having to do anything at all. Compression rates should compete well with raw binary data.
There's a lot of info you didn't include, but should think about:
1.) Are you storing integers or floats? What is the typical range of the numbers?
For example: storing small comma-separated integers in ASCII, such as "1,2,4,2,1", will average 2 bytes per datum, but storing them as binary would require 4 bytes per datum.
If your integers are typically 3 digits, then comma-separated vs binary won't matter much.
On the other hand, storing doubles (8-byte values) will almost certainly be smaller in binary format.
2.) How do you need to access these values? If you are not concerned about access time, compress away! On the other hand, if you need speedy, random access then compression will probably hinder you.
3.) Are some values frequently repeated? Then you may consider a Huffman encoding or a table of "short-cut" values.
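To make point 1 concrete for doubles, here is a rough sketch comparing the on-disk size of the same values written as text and as raw native binary (file names and the value are placeholders):

#include <stdio.h>

int main(void)
{
    FILE *ft = fopen("data.txt", "w");
    FILE *fb = fopen("data.bin", "wb");
    double v = 0.123456789012345;
    int i;

    if (!ft || !fb) return 1;
    for (i = 0; i < 1000000; i++) {
        fprintf(ft, "%.17g\n", v);     /* roughly 19-25 bytes per value as text */
        fwrite(&v, sizeof v, 1, fb);   /* exactly 8 bytes per value */
    }
    fclose(ft);
    fclose(fb);
    return 0;
}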

When to worry about endianness?

I have seen countless references about endianness and what it means, and I have no problems with that...
However, my coding project is a simple game to run on Linux and Windows, on standard "gamer" hardware.
Do I need to worry about endianness in this case? When would I need to worry about it?
My code is simple C and SDL+GL, the only complex data are basic media files (png+wav+xm), and the game data is mostly strings, integer booleans (for flags and such), and static-sized arrays. So far no user has had issues, so I am wondering whether adding checks is necessary (it will be done later, but there are more urgent issues IMO).
The times when you need to worry about endianness:
you are sending binary data between machines or processes (over a network or in a file). If the machines may have different byte orders, or the protocol used specifies a particular byte order (which it should), you'll need to deal with endianness.
you have code that accesses memory through pointers of different types (say, you access an unsigned int variable through a char *).
If you do these things, you're dealing with byte order whether you know it or not; it might be that you're dealing with it by assuming it's one way or the other, which may work fine as long as your code doesn't have to deal with a different platform.
In a similar vein, you generally need to deal with alignment issues in those same cases and for similar reasons. Once again, you might be dealing with it by doing nothing and having everything work fine because you don't have to cross platform boundaries (which may come back to bite you down the road if that does become a requirement).
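The second case above is easy to see with a small sketch that inspects an unsigned int through a char pointer:

#include <stdio.h>

int main(void)
{
    unsigned int x = 0x01020304;
    const unsigned char *p = (const unsigned char *)&x;
    size_t i;

    /* Prints 04 03 02 01 on a little-endian machine,
       01 02 03 04 on a big-endian one. */
    for (i = 0; i < sizeof x; i++)
        printf("%02x ", p[i]);
    putchar('\n');
    return 0;
}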
If you mean a PC by "standard gamer hardware", then you don't have to worry about endianness as it will always be little endian on x86/x64. But if you want to port the project to other architectures, then you should design it endianness-independently.
Whenever you receive/transmit data over a network, remember to convert to/from network and host byte order. The C functions htons, htonl, etc., or the equivalents in your language, should be used here.
Whenever you read multi-byte values (like UTF-16 characters or 32-bit ints) from a file, since that file might have originated on a system with a different endianness. If the file is UTF-16 or UTF-32 it probably has a BOM (byte-order mark). Otherwise, the file format will have to specify endianness in some way.
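For example, a round trip through network byte order using the POSIX functions (Winsock provides the same names on Windows):

#include <arpa/inet.h>   /* htonl/ntohl on POSIX systems */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t host = 0x12345678;
    uint32_t wire = htonl(host);   /* host order -> big-endian network order */
    uint32_t back = ntohl(wire);   /* network order -> host order */

    printf("round trip ok: %s\n", host == back ? "yes" : "no");
    return 0;
}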
You only need to worry about it if your game needs to run on different hardware architectures. If you are positive that it will always run on Intel hardware, then you can forget about it. If it will run on Linux, though, many people use architectures other than Intel, and you may end up having to think about it.
Are you distributing your game in source code form?
Because if you are distributing your game as a binary only, then you know exactly which processor families your game will run on. Also, the media files: are they user-generated (possibly via a level editor), or are they really only meant to be supplied by yourself?
If this is a truly closed environment (you distribute binaries and the game assets are not intended to be customized), then you know your own endianness risks and I personally wouldn't fool with it.
However, if you are distributing source and/or hoping people will customize their game, then you have a potential concern. With most of the desktop/laptop computers around these days moving to x86, though, I would think this is a diminishing concern.
The problems occur with networking (how the data is sent) and when you are doing bit fiddling across different processors, since different processors may store data differently in memory.
I believe PowerPC has the opposite endianness of Intel boards. You might be able to have a routine that sets the endianness depending on the architecture; I'm not sure if you can actually tell what the hardware architecture is in code... maybe someone smarter than me knows the answer to that question.
Now, in reference to your statement about "standard" gamer hardware: I would say that most any standard gamer is using consumer off-the-shelf solutions, so you're almost surely going to get the same endianness across the board. I'm sure someone will disagree with me, but that's my $.02.
Ha... I just noticed on the right there is a link related to the suggestion I had above:
Find Endianness through a c program
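For reference, the test from that linked question boils down to something like this sketch:

#include <stdio.h>

int main(void)
{
    unsigned int one = 1;

    /* If the lowest-addressed byte holds the 1, the machine is little-endian. */
    if (*(const unsigned char *)&one == 1)
        puts("little-endian");
    else
        puts("big-endian");
    return 0;
}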
