Why is sound gradual typing slow?

There are a whole bunch of articles making the claim that gradual typing is inherently slow. The most shocking claim comes from Takikawa et al. (2016), who report a 100-fold slowdown on gradually typed programs.
I simply don't believe it and want to know exactly where the slowdown comes from. I can believe that Typed Racket's implementation of gradual typing is slow, but not that gradual typing would be that slow in general.
Take for example this gradually typed Python program:
x = some_fun()
y: int = x
This is unsound because x can have a type other than int. However, if a smart compiler inserts the equivalent of an assert
x = some_fun()
assert type(x) == int
y: int = x
then the type checking becomes sound. Clearly, even if you insert hundreds of asserts everywhere in a program, a 100-fold slowdown is not realistic. What am I missing here?

Related

Influencing branchiness when branch behaviour is known

Before I begin, yes, I'm aware of the compiler built-ins __builtin_expect and __builtin_unpredictable (Clang). They do solve the issue to some extent, but my question is about something neither completely solves.
As a very simple example, suppose we have the following code.
void highly_contrived_example(unsigned int *numbers, unsigned int count) {
    unsigned int *const end = numbers + count;
    for (unsigned int *iterator = numbers; iterator != end; ++iterator)
        foo(*iterator % 2 == 0 ? 420 : 69);
}
Nothing complicated at all. Just calls foo() with 420 whenever the current number is even, and with 69 when it isn't.
Suppose, however, that it is known ahead of time that the data is guaranteed to look a certain way. For example, if it were always random, then a conditional select (csel (ARM), cmov (x86), etc) possibly would be better than a branch.⁰ If it were always in highly predictable patterns (e.g. always a lengthy stream of evens/odds before a lengthy stream of the other, and so on), then a branch would be better.⁰ __builtin_expect would not really solve the issue if the number of evens/odds were about equal, and I'm not sure whether the absence of __builtin_unpredictable would influence branchiness (plus, it's Clang-only).
My current "solution" is to lie to the compiler and use __builtin_expect with a high probability of whichever side, to influence the compiler to generate a branch in the predictable case (for simple cases like this, all it seems to do is change the ordering of the comparison to suit the expected probability), and __builtin_unpredictable to influence it to not generate a branch, if possible, in the unpredictable case.¹ Either that or inline assembly. That's always fun to use.
⁰ Although I have not actually done any benchmarks, I'm aware that even using a branch may not necessarily be faster than a conditional select for the given example. The example is only for illustrative purposes, and may not actually exhibit the problem described.
¹ Modern compilers are smart. More often than not, they can determine reasonably well which approach to actually use. My question is for the niche cases in which they cannot reasonably figure that out, and in which the performance difference actually matters.
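For concreteness, here is a minimal sketch of that workaround. The function names, the declaration of foo and the "almost always even" scenario are my own assumptions, and __builtin_unpredictable is Clang-only:

void foo(unsigned int);

void mostly_even(unsigned int *numbers, unsigned int count) {
    unsigned int *const end = numbers + count;
    for (unsigned int *iterator = numbers; iterator != end; ++iterator) {
        /* Claim the even case is overwhelmingly likely so the compiler keeps
           a branch and lays out the hot path first. */
        if (__builtin_expect(*iterator % 2 == 0, 1))
            foo(420);
        else
            foo(69);
    }
}

void unpredictable_mix(unsigned int *numbers, unsigned int count) {
    unsigned int *const end = numbers + count;
    for (unsigned int *iterator = numbers; iterator != end; ++iterator) {
        /* Tell Clang the condition is unpredictable, nudging it towards a
           conditional select (csel/cmov) instead of a branch. */
        if (__builtin_unpredictable(*iterator % 2 == 0))
            foo(420);
        else
            foo(69);
    }
}

Whether the compiler actually honours either hint still depends on the target and its cost model, which is exactly the gap the question is about.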

Assessing Haskell speed

In short: I need to check Haskell's speed on simple operations, and currently have poor results, but I'm not sure if I'm doing compilation/optimization right.
UPD: The problem has been answered (see the comments) - the trouble was a different integer type...
In details:
I work on a project where a number of services do bulk processing on data, so at least some of these services simply need to be fast. They do heavy calculations and manipulations on the data, not just extract-and-load. In other words, it is a matter of how many instances and hours are going to be spent on, say, each 1e15 records of data.
Currently we are considering adding a few more services to the project, and some colleagues are curious to try writing them in a different language from those already used. I'm more or less OK with that, but I insist we check the "core performance" of the proposed languages first. Speed testing is of course hard and controversial, so I propose we use a very simple test, with simple operations and without complex data structures etc. We agreed on the "poor" recursive Fibonacci function:
fib x
  | x < 2 = x
  | otherwise = fib (x - 2) + fib (x - 1)
main = print (fib 43)
I wrote it in several languages for comparison. The C version looks like:
#include <stdio.h>

int fib(int x) {
    return (x < 2) ? x : (fib(x - 2) + fib(x - 1));
}

int main(void) {
    printf("%d\n", fib(43));
}
I compile the former with ghc -O2 test.hs and the latter with gcc -O2 test.c. The GHC version is 8.4.3. Right now I see results differing by about 12 times: 2.2 seconds for the C version, 3 seconds for the Java version and 30 seconds for Haskell on the same machine.
I wonder if I got the compiler and the compiler options right, for I thought that since Haskell compiles to native code (?) it should be comparable to C and similar languages. I need hints on this.
P.S. Please don't say that the Fibonacci function could be written without "exponential" recursion - I know - but as I said, we need a test with a lot of simple calculations.
P.P.S. I don't mean that if we can't make Haskell faster we won't use it. But we would probably reserve it for some other service where time is spent mainly on input-output, so it won't matter. For the current batch of "speed-critical" services it's simply a matter of whether the customer will pay $10,000 or $120,000 monthly for these instances.

Computation Speed of Functions

I have a quick question about optimizing the speed of my code.
Is it faster to return a double from a function or to pass a pointer to the double as one of your arguments?
Returning double
double multiply(double x, double y) {
    return x * y;
}
Pointer argument
void multiply(double x, double y, double *xy) {
    *xy = x * y;
}
The sort of decision you're talking about here is often called a microoptimization - you're optimizing the code by making a tiny change rather than rethinking the overall strategy of the program. Typically, "microoptimization" has the connotation of "rewriting something in a less obvious way with the intent of squeezing out a little bit more performance." The language's explicit way of communicating data out of a function is through return values and programmers are used to seeing that for primitive types, so if you're going to go "off script" and use an outparameter like this, there should be a good reason to do so.
Are you absolutely certain that this code is executed so frequently that it's worth considering a rewrite that makes it harder to read in the interest of efficiency? A good optimizing compiler is likely to inline this function call anyway, so chances are there's not much of a cost at all. To make sure it's worth actually rewriting the code, you should start off by running a profiler and getting hard evidence that this particular code really is a bottleneck.
Once you've gathered evidence that this actually is slowing your program down, take a step back and ask - why are you doing so many multiplications? Rather than doing a microoptimization, you might ask whether there's a different algorithm you could use that requires fewer multiplications. Changes like those are more likely to lead to performance increases.
Finally, if you're sure this is the bottleneck, and you're sure that there is no way to get rid of the multiplications, then you should ask whether this change would do anything. And for that, the only real way to find out would be to make the change and measure the difference. This change is so minor that most good compilers would be smart enough to realize what you're doing and optimize accordingly. As a result, I'd be astonished if you actually saw a performance increase here.
To summarize:
Readability is so much more valuable than efficiency that, as a starting point, it's worth using the "right" language feature for the job. Just use a return statement initially.
If you have hard, concrete evidence that the function you have is a bottleneck, see whether you actually need to call the function that many times by looking for bigger-picture ways of optimizing the code.
If you absolutely need to call the function that many times, then make the change and see if it works - but don't expect that it's actually going to make a difference because the change is minor and optimizing compilers are smart.

Techniques for static code analysis in detecting integer overflows

I'm trying to find some effective techniques which I can base my integer-overflow detection tool on. I know there are many ready-made detection tools out there, but I'm trying to implement a simple one on my own, both for my personal interest in this area and also for my knowledge.
I know techniques like Pattern Matching and Type Inference, but I have read that more complicated code analysis techniques are required to detect integer overflows. There's also Taint Analysis, which can "flag" untrusted sources of data.
Is there some other technique, which I might not be aware of, which is capable of detecting integer overflows?
It may be worth trying the cppcheck static analysis tool, which claims to detect signed integer overflow as of version 1.67:
New checks:
- Detect shift by too many bits, signed integer overflow and dangerous sign conversion
Notice that it supports both C and C++ languages.
There is no overflow check for unsigned integers, as, per the Standard, unsigned types never overflow (they wrap around).
Here is some basic example:
#include <stdio.h>

int main(void)
{
    int a = 2147483647;
    a = a + 1;
    printf("%d\n", a);
    return 0;
}
With this code it reports:
$ ./cppcheck --platform=unix64 simple.c
Checking simple.c...
[simple.c:6]: (error) Signed integer overflow for expression 'a+1'
However I wouldn't expect too much from it (at least in the current version), as a slightly different program:
int a = 2147483647;
a++;
passes without noticing overflow.
It seems you are looking for some sort of value range analysis, to detect when a range would exceed the type's bounds. This is something that seems simple on the face of it, but is actually hard. There will be lots of false positives, and that's even without counting bugs in the implementation.
To ignore the details for a moment, you associate a pair [lower bound, upper bound] with every variable, and do some math to figure out the new bounds for every operator. For example if the code adds two variables, in your analysis you add the upper bounds together to form the new upper bound, and you add the lower bounds together to get the new lower bound.
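As a minimal sketch of that idea (the interval type, the widened 64-bit arithmetic and the warning flag are my own illustration, not any particular tool's API):

#include <limits.h>
#include <stdint.h>
#include <stdbool.h>

/* One abstract value: an inclusive range of possible run-time values of an int variable. */
typedef struct {
    int64_t lo;
    int64_t hi;
} interval_t;

/* Abstract addition: add the bounds pairwise, in a wider type so the analysis
   itself cannot overflow, and flag the result if it leaves the int range. */
static interval_t interval_add(interval_t a, interval_t b, bool *may_overflow) {
    interval_t r = { a.lo + b.lo, a.hi + b.hi };
    *may_overflow = r.hi > INT_MAX || r.lo < INT_MIN;
    return r;
}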
But of course it's not that simple. Firstly, what if there is non-straight-line code? ifs are not too bad: you can evaluate both sides and then take the union of the ranges afterwards (which can lose information - if two ranges have a gap between them, their union will span the gap). Loops require tricks; a naive implementation may run billions of iterations of analysis on a loop, or never terminate at all. Even if you use an abstract domain that has no infinite ascending chains, you can still get into trouble. The keywords to solve this are "widening operator" and (optionally, but probably a good idea) "narrowing operator".
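Continuing the sketch above, a widening step might look roughly like this (again only an illustration):

/* Widening: if a bound is still moving from one analysis iteration of the loop
   to the next, give up on it and jump straight to the extreme value, so the
   fixed-point computation is guaranteed to terminate. */
static interval_t widen(interval_t prev, interval_t next) {
    interval_t r = prev;
    if (next.lo < prev.lo) r.lo = INT64_MIN;   /* lower bound still falling */
    if (next.hi > prev.hi) r.hi = INT64_MAX;   /* upper bound still rising  */
    return r;
}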
It's even worse than that, because what's a variable? Your regular local variable of scalar type that never has its address taken isn't too bad. But what about arrays? Now you don't even know for sure which entry is being affected - the index itself may be a range! And then there's aliasing. That's far from a solved problem and causes many real world tools to make really pessimistic assumptions.
Also, function calls. You're going to call functions from some context, hopefully a known one (if not, then it's simple: you know nothing). That makes it hard: not only is there suddenly a lot more state to keep track of at the same time, there may also be several places a function could be called from, including itself. The usual response is to re-evaluate the function whenever the range of one of its arguments has been expanded, which once again could take billions of steps if not done carefully. There are also algorithms that analyze a function differently for different contexts, which can give more accurate results, but it's easy to spend a lot of time analyzing contexts that aren't different enough to matter.
Anyway if you've made it this far, you could read Accurate Static Branch Prediction by Value Range Propagation and related papers to get a good idea of how to actually do this.
And that's not all. Considering only the ranges of individual variables, without caring about the relationships between them (keyword: non-relational abstract domain), does badly on things that are really simple for a human reader, such as subtracting two variables that are always close together in value: the analysis produces a large range, on the assumption that they may be as far apart as their bounds allow. Even for something trivial such as
// assume x in [0 .. 10]
int y = x + 2;
int diff = y - x;
For a human reader, it's pretty obvious that diff = 2. In the analysis described so far, the conclusions would be that y in [2 .. 12] and diff in [-8, 12]. Now suppose the code continues with
int foo = diff + 2;
int bar = foo - diff;
Now we get foo in [-6, 14] and bar in [-18, 22]; even though bar is obviously 2 again, the range has doubled again. Now this was a simple example, and you could make up some ad-hoc hacks to detect it, but it's a more general problem. This effect tends to blow up the ranges of variables quickly and generate lots of unnecessary warnings. A partial solution is assigning ranges to differences between variables; then you get what's called a difference-bound matrix (unsurprisingly, this is an example of a relational abstract domain). They can get big and slow for interprocedural analysis, or if you want to throw non-scalar variables at them too, and the algorithms start to get more complicated. And they only get you so far - if you throw a multiplication into the mix (that includes x + x and variants), things still go bad very fast.
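To make the difference-bound matrix idea concrete, here is a bare-bones sketch; the fixed variable count and the encoding of single-variable bounds against a special "zero" variable are my own simplification:

#include <limits.h>

#define NVARS 4          /* v0 is the special "zero" variable, v1..v3 are program variables */
#define NO_BOUND INT_MAX /* no constraint known */

/* dbm[i][j] is an upper bound on (v_i - v_j); a plain bound on v_i alone is
   stored as a bound on (v_i - v0). */
static int dbm[NVARS][NVARS];

/* Tighten every entry to the bound implied by the others (Floyd-Warshall style):
   if v_i - v_k <= a and v_k - v_j <= b, then v_i - v_j <= a + b. */
static void dbm_close(void) {
    for (int k = 0; k < NVARS; ++k)
        for (int i = 0; i < NVARS; ++i)
            for (int j = 0; j < NVARS; ++j) {
                if (dbm[i][k] == NO_BOUND || dbm[k][j] == NO_BOUND)
                    continue;
                /* Use a wider type for the sum so the analysis itself cannot overflow. */
                long long via = (long long)dbm[i][k] + dbm[k][j];
                if (via < dbm[i][j])
                    dbm[i][j] = (int)via;
            }
}

On the diff example above, the constraints y - x <= 2 and x - y <= -2 are stored directly, so diff = y - x is recovered exactly instead of being widened to [-8, 12].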
So you can throw something else into the mix that can handle multiplication by a constant - see for example Abstract Domains of Affine Relations. This is very different from ranges, and won't by itself tell you much about the ranges of your variables, but you could use it to get more accurate ranges.
The story doesn't end there, but this answer is getting long. I hope this does not discourage you from researching this topic, it's a topic that lends itself well to starting out simple and adding more and more interesting things to your analysis tool.
Checking integer overflows in C:
When you add two 32-bit numbers and get a 33-bit result, the lower 32 bits are written to the destination, with the highest bit signalled as a carry flag. Many languages, including C, don't provide a way to access this carry, so you can use the limits in <limits.h> to check before you perform an arithmetic operation. Consider unsigned ints a and b:
If MAX - b < a, we know for sure that a + b would cause an overflow (here MAX is UINT_MAX from <limits.h>). An example is given in this C FAQ.
Watch out: As chux pointed out, this example is problematic with signed integers, because it won't handle MAX - b or MIN + b if b < 0. The example solution in the second link (below) covers all cases.
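A minimal illustration of that pre-check for unsigned and signed addition (the function and variable names are mine):

#include <limits.h>
#include <stdbool.h>

/* Unsigned: a + b wraps if and only if b exceeds the headroom above a. */
bool uadd_would_wrap(unsigned int a, unsigned int b) {
    return UINT_MAX - a < b;
}

/* Signed: check the headroom in the direction b pushes the sum,
   so both the b > 0 and b < 0 cases are covered. */
bool sadd_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}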
Multiplying numbers can cause an overflow, too. A solution is to cast the first operand to a type of double the width, then do the multiplication. Something like:
(typecast)a*b
Watch out: (typecast)(a*b) would be incorrect, because it performs (and truncates) the multiplication first and only casts afterwards.
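For example, with 32-bit ints widened to 64 bits before the multiply (a sketch assuming int is 32 bits; the names are mine):

#include <limits.h>
#include <stdbool.h>

/* Widen one operand first so the multiplication itself happens in 64 bits,
   then compare the full result against the 32-bit limits. */
bool smul_would_overflow(int a, int b) {
    long long wide = (long long)a * b;   /* cast before multiplying, not after */
    return wide > INT_MAX || wide < INT_MIN;
}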
A detailed technique for C can be found HERE. Using macros seems to be an easy and elegant solution.
I'd expect Frama-C to provide such a capability. Frama-C is focused on C source code, but I don't know if it is dialect-sensitive or specific. I believe it uses abstract interpretation to model values. I don't know if it specifically checks for overflows.
Our DMS Software Reengineering Toolkit has a variety of language front ends, including most major dialects of C. It provides control and data flow analysis, and also abstract interpretation for computing ranges, as foundations on which you can build an answer. My Google Tech Talk on DMS at about 0:28:30 specifically talks about how one can use DMS's abstract interpretation on value ranges to detect overflow (of an index on a buffer). A variation on checking the upper bound on array sizes is simply to check for values not exceeding 2^N. However, off the shelf DMS does not provide any specific overflow analysis for C code. There's room for the OP to do interesting work :=}

Writing float to array is taking too much time

I have the following code inside a loop:
ekp = e[k][p];
hkp = h[k][p];
uk = round(ekp);
u[k] = uk;
yk = (ekp - uk) / hkp;
y[p] = yk;
The variables are declared the following way:
float ekp, yk, hkp;
int uk;
float **e, *y, **h;
int *u;
I use local variables to store the values from the arrays so that I access the arrays fewer times. When I profile the code with Xcode, I get 9.3% of the total execution time on
y[p] = yk;
and only 2.7% on
u[k] = uk;
Why is there such a great difference between storing an int to an array and storing a float?
Would declaring the variables the following way be more efficient?
register float ekp, yk, hkp;
register int uk;
First of all, it is usually pointless to discuss a particular program's performance without any specific system and hardware in mind.
On a system where both int and float have the same size, there's no reason why there would be a performance difference. The division is what would take the most time in this code, and since the supposedly slow operation happens just after the division, I suspect that you shouldn't trust the profiling results all that much.
What happens if you change the code to
yk = (ekp - uk) / hkp;
u[k] = uk;
y[p] = yk;
There should be no difference, so if you experience one, the tool is not to be trusted. It might be that the yk variable gets optimized away, so that the source code lines don't correspond 1:1 to the machine code.
Would using declaring the variables the following way be more efficient?
No, register is an obsolete keyword from the dark ages, when compilers were barely able to optimize anything. A modern compiler doesn't need it; it will make much better optimization decisions than the programmer.
The register keyword can be used with a float. If the float doesn't fit in any of the registers, the compiler just ignores it; register is only a suggestion to the compiler, not mandatory.
As for the other question, I am not sure; I looked at the assembly code. Maybe you can try http://assembly.ynh.io/ to see the assembly code yourself.
This is telling you about 12% of the time.
What is the other 88% doing?
If you try this method you will find out.
Don't make the drunk's mistake of searching for keys under the street lamp because that's where the light is.

Resources