Common stack-based VM bytecode optimizations? - c

Ok, I'm posting this fearing that it might be closed before anyone ever reads it - I'm quite used to that - but I'll give it a try... even pointing me to the right direction or some existing answer that does contain a specific answer would definitely do...
So, after this brief intro...
I'm currently writting a bytecode interpreter, in C, (stack-based VM) for a programming language I have designed.
If you want to have a look at the supported opcodes, feel free to check them out here: https://github.com/arturo-lang/arturo/blob/master/src/vm/opcodes.h
There is nothing really special about the stack machine. Values are being pushed and popped, and operators and functions work on them, pushing the evaluation result back to the stack. So far so good.
Now, I'm at the point where all the core functionality is in and I'm trying to give it an extra boost by doing further optimizations.
Here's an example (and hopefully a rather illustrative one).
Input:
fibo: $(x){
if x<2 {
return 1
} {
return [fibo x-1] + [fibo x-2]
}
}
i: 0
loop i<34 {
print "fibo(" + i + ") = " + [fibo i]
i: i+1
}
Bytecode produced:
|== Data Segment /======================>
0 : [Func ]= function <5,1>
1 : [Int ]= 34
2 : [String]= fibo(
3 : [String]= ) =
==/ Data Segment =======================|
|== Bytecode Listing /======================>
0 :0 JUMP [Dword] 31
1 :5 LLOAD0
2 :6 IPUSH2
3 :7 CMPLT
4 :8 JMPIFNOT [Dword] 20
5 :13 IPUSH1
6 :14 RET
7 :15 JUMP [Dword] 30
8 :20 LLOAD0
9 :21 IPUSH1
10 :22 SUB
11 :23 GCALL0
12 :24 LLOAD0
13 :25 IPUSH2
14 :26 SUB
15 :27 GCALL0
16 :28 ADD
17 :29 RET
18 :30 RET
19 :31 CPUSH0
20 :32 GSTORE0
21 :33 IPUSH0
22 :34 GSTORE1
23 :35 GLOAD1
24 :36 CPUSH1
25 :37 CMPLT
26 :38 JMPIFNOT [Dword] 61
27 :43 CPUSH2
28 :44 GLOAD1
29 :45 ADD
30 :46 CPUSH3
31 :47 ADD
32 :48 GLOAD1
33 :49 GCALL0
34 :50 ADD
35 :51 DO_PRINT
36 :52 GLOAD1
37 :53 IPUSH1
38 :54 ADD
39 :55 GSTORE1
40 :56 JUMP [Dword] 35
41 :61 END
==/ Bytecode Listing =======================|
For anyone who has worked with compilers, bytecode interpreters or even JVM, the code above should be familiar.
What I want?
Ideas - general or specific ones - about how to further optimize my bytecode.
For examples, every *2 (that is: IPUSH2 followed by a MUL instruction) is converted to: IPUSH1, SHL since it's a faster operation.
What else would you suggest? Is there anywhere a list of such things to optimize? Could you suggest something concrete?
Thanks in advance! :)

The example you give is not particularly good, because the performance gain for an interpreter is very low if it makes a shift instead of a multiplication. The overhead of executing a single byte code instruction at all outnumbers the gain of this particular optimization in a order of several magnitudes.
The highest performance gain for an interpreter is to minimize the number of instructions that need to be performed. For example, accumulate two succeeding additions or subtractions on the same register to a single operation when possible.
To be able to make this kind of optimizations, you should try to identify so-called Basic Blocks (these are blocks where either all or no instructions are executed, i.e. no jumps in or out of the block happens) and optimize the number of instructions in those blocks by substituting several instructions into a single one while maintaining the same code semantics.
If you really mean it, you can also try to write a gcc backend for your language to compile it to bytecode; this way you can benefit from gcc's sophisticated optimization methods on the intermediate code representation (RTL).

Related

Does MATLAB support truly 1d arrays?

It would really help me reason about my MATLAB code if I didn't have to worry about accidental 2d operations. For instance, if I want to do element-wise multiplication of 1d arrays, but one is a row and another is a column, I end up with a 2d result.
>> a = 1:8;
>> a = a(:);
>> a .* cumsum(ones(8))
ans =
1 1 1 1 1 1 1 1
4 4 4 4 4 4 4 4
9 9 9 9 9 9 9 9
16 16 16 16 16 16 16 16
25 25 25 25 25 25 25 25
36 36 36 36 36 36 36 36
49 49 49 49 49 49 49 49
64 64 64 64 64 64 64 64
I'd like to prevent this type of thing, and likely other problems that I can't foresee, by keeping all my arrays 1d wherever I can. But every time I check the size() of vector, I get at least 2 elements back:
>> size(1:1:6)
ans =
1 6
>> size(linspace(0, 5, 10))
ans =
1 10
I've tried the suggestions at How to create single dimensional array in matlab? and some of the options here (PDF download), and I can't get a "truly" 1d array. How would you deal with this type of issue?
There is no such thing as 1D array. The documentation says (emphasis mine):
All MATLAB variables are multidimensional arrays, no matter what type of data. A matrix is a two-dimensional array often used for linear algebra.
You may use isvector, isrow and iscolumn to identify vectors, row vectors and column vectors respectively.
#Sardar has already said the last word. Another clue is ndims:
N = ndims(A) returns the number of dimensions in the array A. The
number of dimensions is always greater than or equal to 2. ...
But about your other question:
How would you deal with this type of issue?
There's not much you can do. Debug, find the mistake and fix it. If it's some one-time script, you are done. But if you are writing functions that may be used later, it's better to protect them from accepting arguments with unequal dimensions:
function myFunc(A, B)
if ndims(A)~=ndims(B) || any(size(A)~=size(B))
error('Matrix dimensions must agree.');
end
% ...
end
Or, if your function really needs them to be vectors:
function myFunc(A, B)
if ~isvector(A) || ~isvector(B) || any(size(A)~=size(B))
error('A and B must be vectors with same dimensions.');
end
% ...
end
You can also validate different attributes of arguments using validateattributes:
function myFunc(A, B)
validateattributes(A, {'numeric'},{'vector'}, 'myFunc', 'A')
validateattributes(B, {'numeric'},{'size', size(A)}, 'myFunc', 'B')
% ...
end
Edit:
Also, if the function only needs the inputs to be vectors and their orientation does not matter, you can modify them inside the function (thanks to #CrisLuengo for commenting).
function myFunc(A, B)
if ~isvector(A) || ~isvector(B) || length(A)~=length(B)
error('A and B must be vectors with the same length.');
end
A = A(:);
B = B(:);
% ...
end
However, this is not recommended when the output of the function is also a vector with the same size as the inputs. This is because the caller expects the output to be in the same orientation as the inputs, and if this is not the case, problems may arise.

How should I selectively sum multiple axes of an array?

What is the preferred approach in J for selectively summing multiple axes of an array?
For instance, suppose that a is the following rank 3 array:
]a =: i. 2 3 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
My goal is to define a dyad "sumAxes" to sum over multiple axes of my choosing:
0 1 sumAxes a NB. 0+4+8+12+16+20 ...
60 66 72 78
0 2 sumAxes a NB. 0+1+2+3+12+13+14+15 ...
60 92 124
1 2 sumAxes a NB. 0+1+2+3+4+5+6+7+8+9+10+11 ...
66 210
The way that I am currently trying to implement this verb is to use the dyad |: to first permute the axes of a, and then ravel the items of the necessary rank using ,"n (where n is the number axes I want to sum over) before summing the resulting items:
sumAxes =: dyad : '(+/ # ,"(#x)) x |: y'
This appears to work as I want, but as a beginner in J I am unsure if I am overlooking some aspect of rank or particular verbs that would enable a cleaner definition. More generally I wonder whether permuting axes, ravelling and summing is idiomatic or efficient in this language.
For context, most of my previous experience with array programming is with Python's NumPy library.
NumPy does not have J's concept of rank and instead expects the user to explicitly label the axes of an array to reduce over:
>>> import numpy
>>> a = numpy.arange(2*3*4).reshape(2, 3, 4) # a =: i. 2 3 4
>>> a.sum(axis=(0, 2)) # sum over specified axes
array([ 60, 92, 124])
As a footnote, my current implementation of sumAxes has the disadvantage of working "incorrectly" compared to NumPy when just a single axis is specified (as rank is not interchangeable with "axis").
Motivation
J has incredible facilities for handling arbitrarily-ranked arrays. But there's one facet of the language which is simultaneously almost universally useful as well as justified, but also somewhat antithetical to this dimensionality-agnostic nature.
The major axis (in fact, leading axes in general) are implicitly privileged. This is the concept that underlies, e.g. # being the count of items (i.e. the dimension of the first axis), the understated elegance and generality of +/ without further modification, and a host of other beautiful parts of the language.
But it's also what accounts for the obstacles you're meeting in trying to solve this problem.
Standard approach
So the general approach to solving the problem is just as you have it: transpose or otherwise rearrange the data so the axes that interest you become leading axes. Your approach is classic and unimpeachable. You can use it in good conscience.
Alternative approaches
But, like you, it niggles me a bit that we are forced to jump through such hoops in similar circumstances. One clue that we're kind of working against the grain of the language is the dynamic argument to the conjunction "(#x); usually arguments to conjunctions are fixed, and calculating them at runtime often forces us to use either explicit code (as in your example) or dramatically more complicated code. When the language makes something hard to do, it's usually a sign you're cutting against the grain.
Another is that ravel (,). It's not just that we want to transpose some axes; it's that we want to focus on one specific axis, and then run all the elements trailing it into a flat vector. Though I actually think this reflects more a constraint imposed by how we're framing the problem, rather than one in the notation. More on in the final section of this post.
With that, we might feel justified in our desire to address a non-leading axis directly. And, here and there, J provides primitives that allow us to do exactly that, which might be a hint that the language's designers also felt the need to include certain exceptions to the primacy of leading axes.
Introductory examples
For example, dyadic |. (rotate) has ranks 1 _, i.e. it takes a vector on the left.
This is sometimes surprising to people who have been using it for years, never having passed more than a scalar on the left. That, along with the unbound right rank, is another subtle consequence of J's leading-axis bias: we think of the right argument as a vector of items, and the left argument as a simple, scalar rotation value of that vector.
Thus:
3 |. 1 2 3 4 5 6
4 5 6 1 2 3
and
1 |. 1 2 , 3 4 ,: 5 6
3 4
5 6
1 2
But in this latter case, what if we didn't want to treat the table as a vector of rows, but as a vector of columns?
Of course, the classic approach is to use rank, to explicitly denote the the axis we're interested in (because leaving it implicit always selects the leading axis):
1 |."1 ] 1 2 , 3 4 ,: 5 6
2 1
4 3
6 5
Now, this is perfectly idiomatic, standard, and ubiquitous in J code: J encourages us to think in terms of rank. No one would blink an eye on reading this code.
But, as described at the outset, in another sense it can feel like a cop-out, or manual adjustment. Especially when we want to dynamically choose the rank at runtime. Notationally, we are now no longer addressing the array as a whole, but addressing each row.
And this is where the left rank of |. comes in: it's one of those few primitives which can address non-leading axes directly.
0 1 |. 1 2 , 3 4 ,: 5 6
2 1
4 3
6 5
Look ma, no rank! Of course, we now have to specify a rotation value for each axis independently, but that's not only ok, it's useful, because now that left argument smells much more like something which can be calculated from the input, in true J spirit.
Summing non-leading axes directly
So, now that we know J lets us address non-leading axes in certain cases, we simply have to survey those cases and identify one which seems fit for our purpose here.
The primitive I've found most generally useful for non-leading-axis work is ;. with a boxed left-hand argument. So my instinct is to reach for that first.
Let's start with your examples, slightly modified to see what we're summing.
]a =: i. 2 3 4
sumAxes =: dyad : '(< # ,"(#x)) x |: y'
0 1 sumAxes a
+--------------+--------------+---------------+---------------+
|0 4 8 12 16 20|1 5 9 13 17 21|2 6 10 14 18 22|3 7 11 15 19 23|
+--------------+--------------+---------------+---------------+
0 2 sumAxes a
+-------------------+-------------------+---------------------+
|0 1 2 3 12 13 14 15|4 5 6 7 16 17 18 19|8 9 10 11 20 21 22 23|
+-------------------+-------------------+---------------------+
1 2 sumAxes a
+-------------------------+-----------------------------------+
|0 1 2 3 4 5 6 7 8 9 10 11|12 13 14 15 16 17 18 19 20 21 22 23|
+-------------------------+-----------------------------------+
The relevant part of the definition of for dyads derived from ;.1 and friends is:
The frets in the dyadic cases 1, _1, 2 , and _2 are determined by the 1s in boolean vector x; an empty vector x and non-zero #y indicates the entire of y. If x is the atom 0 or 1 it is treated as (#y)#x. In general, boolean vector >j{x specifies how axis j is to be cut, with an atom treated as (j{$y)#>j{x.
What this means is: if we're just trying to slice an array along its dimensions with no internal segmentation, we can simply use dyad cut with a left argument consisting solely of 1s and a:s. The number of 1s in the vector (ie. the sum) determines the rank of the resulting array.
Thus, to reproduce the examples above:
('';'';1) <#:,;.1 a
+--------------+--------------+---------------+---------------+
|0 4 8 12 16 20|1 5 9 13 17 21|2 6 10 14 18 22|3 7 11 15 19 23|
+--------------+--------------+---------------+---------------+
('';1;'') <#:,;.1 a
+-------------------+-------------------+---------------------+
|0 1 2 3 12 13 14 15|4 5 6 7 16 17 18 19|8 9 10 11 20 21 22 23|
+-------------------+-------------------+---------------------+
(1;'';'') <#:,;.1 a
+-------------------------+-----------------------------------+
|0 1 2 3 4 5 6 7 8 9 10 11|12 13 14 15 16 17 18 19 20 21 22 23|
+-------------------------+-----------------------------------+
Et voila. Also, notice the pattern in the left hand argument? The two aces are exactly at the indices of your original calls to sumAxe. See what I mean by the fact that providing a value for each dimension smelling like a good thing, in the J spirit?
So, to use this approach to provide an analog to sumAxe with the same interface:
sax =: dyad : 'y +/#:,;.1~ (1;a:#~r-1) |.~ - {. x -.~ i. r=.#$y' NB. Explicit
sax =: ] +/#:,;.1~ ( (] (-#{.#] |. 1 ; a: #~ <:#[) (-.~ i.) ) ##$) NB. Tacit
Results elided for brevity, but they're identical to your sumAxe.
Final considerations
There's one more thing I'd like to point out. The interface to your sumAxe call, calqued from Python, names the two axes you'd like "run together". That's definitely one way of looking at it.
Another way of looking at it, which draws upon the J philosophies I've touched on here, is to name the axis you want to sum along. The fact that this is our actual focus is confirmed by the fact that we ravel each "slice", because we do not care about its shape, only its values.
This change in perspective to talk about the thing you're interested in, has the advantage that it is always a single thing, and this singularity permits certain simplifications in our code (again, especially in J, where we usually talk about the [new, i.e. post-transpose] leading axis)¹.
Let's look again at our ones-and-aces vector arguments to ;., to illustrate what I mean:
('';'';1) <#:,;.1 a
('';1;'') <#:,;.1 a
(1;'';'') <#:,;.1 a
Now consider the three parenthesized arguments as a single matrix of three rows. What stands out to you? To me, it's the ones along the anti-diagonal. They are less numerous, and have values; by contrast the aces form the "background" of the matrix (the zeros). The ones are the true content.
Which is in contrast to how our sumAxe interface stands now: it asks us to specify the aces (zeros). How about instead we specify the 1, i.e. the axis that actually interests us?
If we do that, we can rewrite our functions thus:
xas =: dyad : 'y +/#:,;.1~ (-x) |. 1 ; a: #~ _1 + #$y' NB. Explicit
xas =: ] +/#:,;.1~ -#[ |. 1 ; a: #~ <:###$#] NB. Tacit
And instead of calling 0 1 sax a, you'd call 2 xas a, instead of 0 2 sax a, you'd call 1 xas a, etc.
The relative simplicity of these two verbs suggests J agrees with this inversion of focus.
¹ In this code I'm assuming you always want to collapse all axes except 1. This assumption is encoded in the approach I use to generate the ones-and-aces vector, using |..
However, your footnote sumAxes has the disadvantage of working "incorrectly" compared to NumPy when just a single axis is specified suggests sometimes you want to only collapse one axis.
That's perfectly possible and the ;. approach can take arbitrary (orthotopic) slices; we'd only need to alter the method by which we instruct it (generate the 1s-and-aces vector). If you provide a couple examples of generalizations you'd like, I'll update the post here. Probably just a matter of using (<1) x} a: #~ #$y or ((1;'') {~ (e.~ i.###$)) instead of (-x) |. 1 ; a:#~<:#$y.

What does 'ln-ct' and 'nc-lns' mean in GNU complexity output?

According to the manual, GNU complexity scores look like this:
Score | ln-ct | nc-lns| file-name(line): proc-name
3 13 12 tokenize.c(507): skip_params
3 15 13 score.c(121): handle_stmt_block
3 22 17 tokenize.c(64): check_quote
Can anyone confirm if ln-ct means line count and explain what nc-lns means?
Why I didn't just test this, I don't know. Anyway, from messing around with a source file I can confirm:
ln-ct = line count
nc-lns = non-comment lines
Leaving this here in case it's helpful - the manual doesn't describe these terms.

Generating random poll numbers

I struggle with this simple problem: I want to create some random poll numbers. I have 4 variables I need to fill with data (actually an array of integer). These numbers should represent a random percentage. All percentages added will be 100% . Sounds simple.
But I think it isn't that easy. My first attempt was to generate a random number between 10 and base (base = 100), and substract the number from the base. Did this 3 times, and the last value was assigned the base. Is there a more elegant way to do that?
My question in a few words:
How can I fill this array with random values, which will be 100 when added together?
int values[4];
You need to write your code to emulate what you are simulating.
So if you have four choices, generate a sample size of random number (0..1 * 4) and then sum all the 0's, 1's, 2's, and 3's (remember 4 won't be picked). Then divide the counts by the sample size.
for (each sample) {
poll = random(choices);
survey[poll] += 1;
}
It's easy to use a computer to simulate things, simple simulations are very fast.
Keep in mind that you are working with integers, and integers don't divide nicely without converting them to floats or doubles. If you are missing a few percentage points, odds are it has to do with your integers dividing with remainders.
What you have here is a problem of partitioning the number 100 into 4 random integers. This is called partitioning in number theory.
This problem has been addressed here.
The solution presented there does essentially the following:
If computes, how many partitions of an integer n there are in O(n^2) time. This produces a table of size O(n^2) which can then be used to generate the kth partition of n, for any integer k, in O(n) time.
In your case, n = 100, and k = 4.
Generate x1 in range <0..1>, subtract it from 1, then generate x2 in range <0..1-x1> and so on. Last value should not be randomed, but in your case equal 1-x1-x2-x3.
I don't think this is a whole lot prettier than what it sounds like you've already done, but it does work. (The only advantage is it's scalable if you want more than 4 elements).
Make sure you #include <stdlib.h>
int prev_sum = 0, j = 0;
for(j = 0; j < 3; ++j)
{
values[j] = rand() % (100-prev_sum);
prev_sum += values[j];
}
values[3] = 100 - prev_sum;
It takes some work to get a truly unbiased solution to the "random partition" problem. But it's first necessary to understand what "unbiased" means in this context.
One line of reasoning is based on the intuition of a random coin toss. An unbiased coin will come up heads as often as it comes up tails, so we might think that we could produce an unbiased partition of 100 tosses into two parts (head-count and tail-count) by tossing the unbiased coin 100 times and counting. That's the essence of Edwin Buck's proposal, modified to produce a four-partition instead of a two-partition.
However, what we'll find is that many partitions never show up. There are 101 two-partitions of 100 -- {0, 100}, {1, 99} … {100, 0} but the coin sampling solution finds less than half of them in 10,000 tries. As might be expected, the partition {50, 50} is the most common (7.8%), while all of the partitions from {0, 100} to {39, 61} in total achieved less than 1.7% (and, in the trial I did, the partitions from {0, 100} to {31, 69} didn't show up at all.) [Note 1]
So that doesn't seem like a unbiased sample of possible partitions. An unbiased sample of partitions would return every partition with equal probability.
So another temptation would be to select the size of the first part of the partition from all the possible sizes, and then the size of the second part from whatever is left, and so on until we've reached one less than the size of the partition at which point anything left is in the last part. However, this will turn out to be biased as well, because the first part is much more likely to be large than any other part.
Finally, we could enumerate all the possible partitions, and then choose one of them at random. That will obviously be unbiased, but unfortunately there are a lot of possible partitions. For the case of 4-partitions of 100, for example, there are 176,581 possibilities. Perhaps that is feasible in this case, but it doesn't seem like it will lead to a general solution.
For a better algorithm, we can start with the observation that a partition
{p1, p2, p3, p4}
could be rewritten without bias as a cumulative distribution function (CDF):
{p1, p1+p2, p1+p2+p3, p1+p2+p3+p4}
where the last term is just the desired sum, in this case 100.
That is still a collection of four integers in the range [0, 100]; however, it is guaranteed to be in increasing order.
It's not easy to generate a random sorted sequence of four numbers ending in 100, but it is trivial to generate three random integers no greater than 100, sort them, and then find adjacent differences. And that leads to an almost unbiased solution, which is probably close enough for most practical purposes, particularly since the implementation is almost trivial:
(Python)
def random_partition(n, k):
d = sorted(randrange(n+1) for i in range(k-1))
return [b - a for a, b in zip([0] + d, d + [n])]
Unfortunately, this is still biased because of the sort. The unsorted list is selected without bias from the universe of possible lists, but the sortation step is not a simple one-to-one match: lists with repeated elements have fewer permutations than lists without repeated elements, so the probability of a particular sorted list without repeats is much higher than the probability of a sorted list with repeats.
As n grows large with respect to k, the number of lists with repeats declines rapidly. (These correspond to final partitions in which one or more of the parts is 0.) In the asymptote, where we are selecting from a continuum and collisions have probability 0, the algorithm is unbiased. Even in the case of n=100, k=4, the bias is probably ignorable for many practical applications. Increasing n to 1000 or 10000 (and then scaling the resulting random partition) would reduce the bias.
There are fast algorithms which can produce unbiased integer partitions, but they are typically either hard to understand or slow. The slow one, which takes time(n), is similar to reservoir sampling; for a faster algorithm, see the work of Jeffrey Vitter.
Notes
Here's the quick-and-dirty Python + shell test:
$ python -c '
from random import randrange
n = 2
for i in range(10000):
d = n * [0]
for j in range(100):
d[randrange(n)] += 1
print(' '.join(str(f) for f in d))
' | sort -n | uniq -c
1 32 68
2 34 66
5 35 65
15 36 64
45 37 63
40 38 62
66 39 61
110 40 60
154 41 59
219 42 58
309 43 57
385 44 56
462 45 55
610 46 54
648 47 53
717 48 52
749 49 51
779 50 50
788 51 49
723 52 48
695 53 47
591 54 46
498 55 45
366 56 44
318 57 43
234 58 42
174 59 41
118 60 40
66 61 39
45 62 38
22 63 37
21 64 36
15 65 35
2 66 34
4 67 33
2 68 32
1 70 30
1 71 29
You can brute force it by, creating a calculation function that adds up the numbers in your array. If they do not equal 100 then regenerate the random values in array, do calculation again.

How to identify breaks within an array of MATLAB?

I have an array in MATLAB containing elements such as
A=[12 13 14 15 30 31 32 33 58 59 60];
How can I identify breaks in values of data? For example, the above data exhibits breaks at elements 15 and 33. The elements are arranged in ascending order and have an increment of one. How can I identify the location of breaks of this pattern in an array? I have achieved this using a for and if statement (code below). Is there a better method to do so?
count=0;
for i=1:numel(A)-1
if(A(i+1)==A(i)+1)
continue;
else
count=count+1;
q(count)=i;
end
end
Good time to use diff and find those neighbouring differences that aren't equal to 1. However, this will return an array which is one less than the length of your input array because it finds pairwise differences up until the last element, so naturally there will be one less. As such, when you find the locations that aren't equal to 1, make sure you add 1 to the locations to account for this:
>> A=[12 13 14 15 30 31 32 33 58 59 60];
>> q = find(diff(A) ~= 1) + 1
q =
5 9
This tells us that locations 5 and 9 in your array is where the jump happens, and that's right for your example data.
However, if you want to find the locations before the jump happens, such as in your code, don't add 1 to the result:
>> q = find(diff(A) ~= 1)
q =
4 8

Resources