I've implemented some sorting algorithms (to sort integers) in C, carefully using uint64_t to store anything which has got to do with the data size (thus also counters and stuff), since the algorithms should be tested also with data sets of several giga of integers.
The algorithms should be fine, and there should be no problems about the amount of data allocated: data is stored on files, and we only load little chunks per time, everything works fine even when we choke the in-memory buffers to any size.
Tests with datasets up to 4 giga ints (thus 16GB of data) work fine (sorting 4Gint took 2228 seconds, ~37 minutes), but when we go above that (ie: 8 Gints) the algorithm doesn't seem to halt (it's been running for about 16 hours now).
I'm afraid the problem could be due to integer overflow, maybe a counter in a loop is stored on a 32 bits variable, or maybe we're calling some functions that works with 32 bits integers.
What else could it be?
Is there any easy way to check whether an integer overflow occurs at runtime?
This is compiler-specific, but if you're using gcc then you can compile with -ftrapv to issue SIGABRT when signed integral overflow occurs.
For example:
/* compile with gcc -ftrapv <filename> */
#include <signal.h>
#include <stdio.h>
#include <limits.h>
void signalHandler(int sig) {
printf("Overflow detected\n");
}
int main() {
signal(SIGABRT, &signalHandler);
int largeInt = INT_MAX;
int normalInt = 42;
int overflowInt = largeInt + normalInt; /* should cause overflow */
/* if compiling with -ftrapv, we shouldn't get here */
return 0;
}
When I run this code locally, the output is
Overflow detected
Aborted
Take a look at -ftrapv and -fwrapv:
-ftrapv
This option generates traps for signed overflow on addition, subtraction, multiplication operations.
-fwrapv
This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation. This flag enables some optimizations and disables other. This option is enabled by default for the Java front-end, as required by the Java language specification.
See also Integer overflow in C: standards and compilers, and Useful GCC flags for C.
clang now support dynamic overflow checks for both signed and unsigned integers. See -fsanitize=integer switch. For now it is only one C++ compiler with fully supported dynamic overflow checking for debug purpose.
If you are using Microsoft's compiler, there are options to generate code that triggers a SEH exception when an integer conversion cuts off non-zero bits. In places where this is actually desired, use a bitwise AND to remove the upper bits before doing the conversion.
The only sure fire way is to wrap operations on those integers into functions that perform bounds violation checking. This will of course slow down integer ops, but if your code asserts or halts on a boundary violation with a meaningful error message, that will go a long way towards helping you identify where the problem is.
As for your particular issue, keep in mind that general case sorting is O(nlogn), so the reason the algorithm is taking much longer could be due to the fact that the increase in time is not linear with respect to the data set size. Since also didn't mention how much physical memory is in the box and how much of it is used for your algorithm, there could possibly be page faulting to disk with the larger data set, thus potentially slowing things to a crawl.
Related
Assuming to have a counter which counts from 0 to 100, is there an advantage of declaring the counter variable as uint16_t instead of uint8_t.
Obviously if I use uint8_t I could save some space. On a processor with natural wordsize of 16 bits access times would be the same for both I guess. I couldn't think why I would use a uint16_t if uint8_t can cover the range.
Using a wider type than necessary can allow the compiler to avoid having to mask the higher bits.
Suppose you were working on a 16 bit architecture, then using uint16_t could be more efficient, however if you used uint16_t instead of uint8_t on a 32 bit architecture then you would still have the mask instructions but just masking a different number of bits.
The most efficient type to use in a cross-platform portable way is just plain int or unsigned int, which will always be the correct type to avoid the need for masking instructions, and will always be able to hold numbers up to 100.
If you are in a MISRA or similar regulated environment that forbids the use of native types, then the correct standard-compliant type to use is uint_fast8_t. This guarantees to be the fastest unsigned integer type that has at least 8 bits.
However, all of this is nonsense really. Your primary goal in writing code should be to make it readable, not to make it as fast as possible. Penny-pinching instructions like this makes code convoluted and more likely to have bugs. Also because it is harder to read, the bugs are less likely to be found during code review.
You should only try to optimize like this once the code is finished and you have tested it and found the particular part which is the bottleneck. Masking a loop counter is very unlikely to be the bottleneck in any real code.
Obviously if I use uint8_t I could save some space.
Actually, that's not necessarily obvious! A loop index variable is likely to end up in a register, and if it does there's no memory to be saved. Also, since the definition of the C language says that much arithmetic takes place using type int, it's possible that using a variable smaller than int might actually end up costing you space in terms of extra code emitted by the compiler to convert back and forth between int and your smaller variable. So while it could save you some space, it's not at all guaranteed that it will — and, in any case, the actual savings are going to be almost imperceptibly small in the grand scheme of things.
If you have an array of some number of integers in the range 0-100, using uint8_t is a fine idea if you want to save space. For an individual variable, on the other hand, the arguments are pretty different.
In general, I'd say that there are two reasons not to use type uint8_t (or, equivalently, char or unsigned char) as a loop index:
It's not going to save much data space (if at all), and it might cost code size and/or speed.
If the loop runs over exactly 256 elements (yours didn't, but I'm speaking more generally here), you may have introduced a bug (which you'll discover soon enough): your loop may run forever.
The interviewer was probably expecting #1 as an answer. It's not a guaranteed answer — under plenty of circumstances, using the smaller type won't cost you anything, and evidently there are microprocessors where it can actually save something — but as a general rule, I agree that using an 8-bit type as a loop index is, well, silly. And whether or not you agree, it's certainly an issue to be aware of, so I think it's a fair interview question.
See also this question, which discusses the same sorts of issues.
The interview question doesn't make much sense from a platform-generic point of view. If we look at code such as this:
for(uint8_t i=0; i<n; i++)
array[i] = x;
Then the expression i<n will get carried out on type int or larger because of implicit promotion. Though the compiler may optimize it to use a smaller type if it doesn't affect the result.
As for array[i], the compiler is likely to use a type corresponding to whatever address size the system is using.
What the interviewer was fishing for is likely that uint32_t on a 32 bitter tend to generate faster code in some situations. For those cases you can use uint_fast8_t, but more likely the compiler will perform optimizations no matter.
The only optimization uint8_t blocks the compiler from doing, is to allocate a larger variable than 8 bits on the stack. It doesn't however block the compiler from optimizing out the variable entirely and using a register instead. Such as for example storing it in an index register with the same width as the address bus.
Example with gcc x86_64: https://godbolt.org/z/vYscf3KW9. The disassembly is pretty painful to read, but the compiler just picked CPU registers to store anything regardless of the type of i, giving identical machine code between uint8_t anduint16_t. I would have been surprised if it didn't.
On a processor with natural wordsize of 16 bits access times would be the same for both I guess.
Yes this is true for all mainstream 16 bitters. Some might even manage faster code if given 8 bits instead of 16. Some exotic systems like DSP exist, but in case of lets say a 1 byte=16 bits DSP, then the compiler doesn't even provide you with uint8_t to begin with - it is an optional type. One generally doesn't bother with portability to wildly exotic systems, since doing so is a waste of everyone's time and money.
The correct answer: it is senseless to do manual optimization without a specific system in mind. uint8_t is perfectly fine to use for generic, portable code.
I am working on a tutorial for binary numbers. Something I have wondered for a while is why all the integer maximum and minimum values touch. For example for an unsigned byte 255 + 1 = 0 and 0 - 1 = 255. I understand all the binary math that goes into it, but why was the decision made to have them work this way instead of a straight number line that gives an error when the extremes are breached?
Since your example is unsigned, I assume it's OK to limited the scope to unsigned types.
Allowing wrapping is useful. For example, it's what allows you (and the compiler) to always reorder (and constant-fold) a sequence of additions and subtractions. Even something such as x + 3 - 1 could not be optimized to x + 2 if the language requires trapping, because it changes the conditions under which the expression would trap. Wrapping also mixes better with bit manipulation, with the interpretation of an unsigned number as a vector of bits it makes very little sense if there's trapping. That applies especially to shifts, but addition, subtraction and even multiplication also make sense on bitvectors and combine usefully with the usual bitwise operations.
The algebraic structure you get when allowing wrapping, Z/2kZ, is fairly nice (perhaps not as nice as modulo a prime, but that would interact badly with the bitvector interpretation and it doesn't match typical hardware) and well known, so it's not like anything particularly unexpected or weird will happen, it's not like a wrapped result is a "uselessly arbitrary" result.
And of course testing the carry flag (or whatever may be required) after just about every operation has a big direct overhead as well.
Trapping on "unsigned overflow" is both expensive and undesirable, at least if it is the default behaviour.
Why not "give an error when the extremes are breached"?
Error handling is one of the hardest things in software development. When an error happens, there are many possible ways software could be required to do:
Show an annoying message to the user? Like Hey user, you have just tried to add 1 to this variable, which is already too big. Stop that! - there often is no context to show the user, that would be of any help.
Throw an exception? (BTW C has support for that) - that would show a stack trace, if you happened to execute your code in a debugger. Otherwise, it would just crash - not bad (it won't corrupt your files) but not good either (can be exploited as a denial of service attack).
Write it to a log file? - sometimes it's the best thing to do - record the error and move on, so it can be debugged later.
The right thing to do depends on your code. So a generic programming language like C doesn't want to restrict you by providing any mandatory behavior.
Instead, C provides two guidelines:
For unsigned types like unsigned int or uint8_t or (usually) char - it provides silent wraparound, for best performance.
For signed types like int - it provides "undefined behavior", which makes it possible to "choose", in a very limited way, what will happen on overflow
Throw an exception if using -ftrapv in gcc
Silent wraparound if using -fwrapv in gcc
By default (no fancy command-line options) - the compiler may assume it will never happen, which may help it produce optimized code
The idea here is that you (the programmer) should think where checking for overflow is worth doing, and how to recover from overflow (if the language provided a standard error handling mechanism, it would deny you the latter part). This approach has maximum flexibility, (potentially) maximum performance, and (usually) hardest to do - which fits the philosophy of C.
I am working with a large C library where some array indices are computed using int.
I need to find a way to trap integer overflows at runtime in such way as to narrow to problematic line of code. Libc manual states:
FPE_INTOVF_TRAP
Integer overflow (impossible in a C program unless you enable overflow trapping in a hardware-specific fashion).
however gcc option -ffpe-trap suggests that those only apply to FP numbers?
So how I do enable integer overflow trap? My system is Xeon/Core2, gcc-4.x, Linux 2.6
I have looked through similar questions but they all boil to modifying the code. I need to know however which code is problematic in the first place.
If Xeons can't trap overflows, which processors can? I have access to non-emt64 machines as well.
I have found a tool designed for llvm meanwhile: http://embed.cs.utah.edu/ioc/
There doesn't seem to be however an equivalent for gcc/icc?
Ok, I may have to answer my own question.
I found gcc has -ftrapv option, a quick test does confirm that at least on my system overflow is trapped. I will post more detailed info as I learn more since it seems very useful tool.
Unsigned integer arithmetic does not overflow, of course.
With signed integer arithmetic, overflow leads to undefined behaviour; anything could happen. And optimizers are getting aggressive about optimizing stuff that overflows. So, your best bet is to avoid the overflow, rather than trapping it when it happens. Consider using the CERT 'Secure Integer Library' (the URL referenced there seems to have gone AWOL/404; I'm not sure what's happened yet) or Google's 'Safe Integer Operation' library.
If you must trap overflow, you are going to need to specify which platform you are interested in (O/S including version, compiler including version), because the answer will be very platform specific.
Do you know exactly which line the overflow is occuring on? If so, you might be able to look at the assembler's Carry flag if the operation in question caused an overflow. This is the flag that the CPU uses to do large number calculation and, while not available at the C level, might help you to debug the problem - or at least give you a chance to do something.
BTW, found this link for gcc (-ftrapv) that talks about an integer trap. Might be what you are looking for.
You can use inline assembler in gcc to use an instruction that might generate an overflow and then test the overflow flag to see if it actually does:
int addo(int a, int b)
{
asm goto("add %0,%1; jo %l[overflow]" : : "r"(a), "r"(b) : "cc" : overflow);
return a+b;
overflow:
return 0;
}
In this case, it tries to add a and b, and if it does, it goes to the overflow label. If there's no overflow, it continues, doing the add again and returning it.
This runs into the GCC limitation that an inline asm block cannot both output a value and maybe branch -- if it weren't for that, you wouldn't need a second add to actually get the result.
Related question: How to detect integer overflow?
In C code, should integer overflow be addressed whenever integers are added? It seems like pointers and array indexes should be checked at all. When should integer overflow be checked for?
When numbers are added in C without type explicitly mentioned, or printed with printf, when will overflow occur?
Is there a way to automatically detect integer arithmetic overflow?
I've heard about setjmp()- or longjmp()-based exception handling in C, but I think there's no native way of doing this. What I usually do is just make sure the types used are long enough to contain all the additions/multiplications I'll need to make.
The whole point of using C, as opposed to managed languages such as C#, which will throw an OverflowException, is precisely the fact that no CPU power is wasted on safety checks. C will simply turn the counter around, and go from FFFFFFFF to 00000000, so you can check for that (if a>b and such), but other than that I can just recommend using longer types. 64 bits (long long) should address all your needs.
Overflow won't occur when you print a number with printf, or at least I haven't heard of such a possibility. For additions, I'd just use adequate types and tell the compiler how to interpret the values so that you can avoid unnecessary casts (like, the literal "123" will be interpreted as 32 bit, but "123LL" will be 64 bit - same as with ".1f" vs. ".1").
For array indices - you should always make sure you don't read/write out of your array, as C in many cases will happily corrupt your data without causing an error.
As for when integer overflow should be checked for... Well, whenever it may occur and you don't want it to occur :).
The general answer is rarely. If the result should be valid, but overflowed instead then you should have used a larger type. If the largest type isn't sufficient then you should have used a big int library.
There is no automatic, standard way to detect this built into C. Some hardware supports it, but it isn't standard. This was covered in the thread you linked to.
The type of literals is always defined, it's just not always explicit. Here's a list of literal types. Generally performing arithmetic with literals will overflow either if you manage to overflow whatever type the compiler uses for intermediate operation, or when the result gets assigned to a type of lower precision that doesn't have enough space.
When you say "automatically detect overflow", what exactly do you mean? Overflow detection as a debugging tool, i.e. something that aborts our program is a way similar to a failed assertion? Or some kind of full-time run-time mechanism that would let you detect and handle the situation gracefully?
If you are interested in it as a debugging tool, then you should consult your compiler documentation. GCC, for example, provides -ftrapv option that "generates traps for signed overflow on addition, subtraction, multiplication operations" (see code generation options)
If I make a calculation with a variable where an intermediate part of the calculation goes higher then the bounds of that variable type, is there any hazard that some platforms may not like?
This is an example of what I'm asking:
int a, b;
a=30000;
b=(a*32000)/32767;
I have compiled this, and it does give the correct answer of 29297 (well, within truncating error, anyway). But the part that worries me is that 30,000*32,000 = 960,000,000, which is a 30-bit number, and thus cannot be stored in a 16-bit int. The end result is well within the bounds of an int, but I was expecting that whatever working part of memory would have the same size allocated as the largest source variables did, so an overflow error would occur.
This is just a small example to show my problem, I am trying to avoid using floating points by making the fraction be a fraction of the max amount able to be stored in that variable (in this case, a signed integer, so 32767 on the positive side), because the embedded system I'm using I believe does not have an FPU.
So how do most processors handle calculations out of the bounds of the source and destination variables?
On a 16-bit compiler/CPU, you can (almost) plan on that code giving incorrect results. This is a bit sad, since nearly every CPU (that has a multiply instruction at all) will produce and store the intermediate result, but no C compiler (of which I'm aware) will normally use it (and if you made a and b unsigned, it wouldn't be allowed to use it).
You have a few choices to deal with this. One is to write small muldiv function in assembly language that does the multiplication (preserving the high word) then the division on that, and finally returns the value to C when it's been reduced back into range.
Another option is to do the math on unsigned integers, which at least allow you to figure out when a problem occurred. Unfortunately, none of the choices is what I'd call particularly appealing though...
As far as I know, most if not all processors will hold results for a word * word multiplication in a double word -- meaning, an 8 bit * 8 bit is stored in a 16-bit register(s) on an 8-bit processor, a 32-bit * 32 bit operation is stored in a 64-bit register(s) on a 32-bit machine. (At least, that's how it's been on all the embedded microcontrollers I've used)
If that weren't the case, the processor would be severely crippled in the sense of only allowing half-word * half-word multiplication.
AFAIK this kind of thing is formally "undefined". You have to do the algebra necessary to prevent overflow. That's always your first choice. Numeric stability is no accident, it requires some care in deciding when and how to do division and multiplication.
Or, you have to guarantee that you'll use an intermediate result buffer that's big enough.
Using a large intermediate buffer is what some C compilers do anyway. The language, however, doesn't make any guarantees.
So, to be sure that it works, most folks do something like this.
short a= 30000;
int temp= a;
int temp2= (a*32000)/32767;
// here you can check for errors; if temp2 > 32767, you have overflow.
short b= a;
Signed integer overflow is undefined behavior.
Almost any implementation you could possibly meet will wrap around on integer overflow, because (a) everyone uses 2's complement, in which arithmetic operations are bitwise identical for signed and unsigned types of the same size, and (b) wraparound is the defined behavior of unsigned types in C.
So, on an implementation with a 16 bit int, I would expect the result 0 for your calculation (and that is the result that it must have if you'd used an unsigned 16 bit int). But I'd code against the possibility it might throw a hardware exception, explode, etc.
Note that if you do the calculation with two 16 bit short variables on a machine with a 32 bit int, then you will generally get the "right" answer 29297, because the intermediate value (a*32000) is an int, and only gets truncated back to short at the end. I say "generally" because converting an out-of-bounds integer value to a signed integer type either gives an unspecified result or else raises a signal. But again, any implementation you'll encounter in polite company just takes a modulus.
Are you sure your compiler has 16 bit integers? On most systems nowadays, ints are 32 bits. Another possible reason you aren't getting an error is that some compilers will recognize that it can compute something like this at compile time and will do so.
If you are really concerned that you will end up with overflow, you can sometimes reorder or factor the formula differently so that no intermediate terms will overflow. In your example that would be hard to do since all of your terms are near the limit of a 16 bit value. Do you need the number to be exactly right, or can you approximate? If you can, you can do something like this:
int a, b;
a=30000;
//b=(a*32000)/32767 ~= a * (32000/32768) = a *(125/128)
b = (a / 128) * 125 // if a=30000, b = 29250 - about 0.16% error
Another option would be to use larger sized types for intermediate terms. If your compiler had 16 bit ints and 32 bit longs, you could do something like this:
int a, b;
a=30000;
b=((long)a*32000L)/32767L;
Really, there's no set answer for how to handle overflow. You need to evaluate each case on its own and decide what the best solution is.
Your compiler and target processor both have to do with the sizes of the various data types.
Compilers will usually promote variables to the largest easy to work with size during calculations and then convert the results whatever size is needed for an assignment at the end.
There's also C rules that govern promoting to sizes which are more difficult to work with for some calculations. If you are compiling for an AVR, which has 8 bit registers but defines an int to be 16 bits, many calculations end up using more registers than you might think that they need because of this promotion and the fact that constant numbers in your code have to be thought of as being int or unsigned int unless the compiler can prove to itself that this won't effect the outcome of the calculations.
Try rewriting your code with various different sizes of integers (short, int, long, long long) and see how that goes. You may also want to write a simple program that prints out the sizeof( ) of the standard predefined types.
If you need to worry about the sizes of your integer variables and/or the intermediate results of your calculations then you should include and use things like uint32_t and int64_t for your declarations and type casting.