How to create an orthogonal array?

Suppose we have following three factors:
Factor A: 5 possible values
Factor B: 4 possible values
Factor C: 2 possible values
How can I construct an orthogonal array for these?
The main thing I don't understand is how to make the combinations. I remember we used to follow patterns such as '11112222', '11221122', '12121212', but it seems everyone has a different approach for filling in the values of the array.
Is there a standard approach?

There isn't a single neat algorithm that generates orthogonal arrays to order. Instead there are a variety of constructions that have been discovered in a host of different areas of mathematics, and some techniques for modifying orthogonal arrays to change their parameters in some way or another. For instance see http://www.itl.nist.gov/div898/handbook/pri/section3/pri33a.htm and http://www.win.tue.nl/~aeb/preprints/oa3.pdf. Many statistics packages have an orthogonal array design utility which uses these rules and a list of known orthogonal arrays to try and find an orthogonal array that will satisfy the requirements it has been given.
In your case I can find nothing closer at the moment than the design with six five-level factors at http://www.york.ac.uk/depts/maths/tables/l25.htm, which uses 25 runs. You can certainly discard three columns. Where the design has five levels and the experiment has only 4 (or 2), I would be inclined to relabel consistently, e.g. {1,2,3,4,5} -> {1,2,3,4,4} and {1,2,3,4,5} -> {1,2,1,2,1}, but I have no clear idea of what this does to the experimental properties.
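For the narrower case of three five-level factors, one classical construction can be written down directly. The sketch below is only an illustration in Python, not the method behind the linked table: it builds 25 runs as (i, j, (i + j) mod 5), checks that every pair of columns contains each level combination equally often, and then applies the relabelling suggested above to see what happens to that balance. All function names are made up for this example.

```python
from collections import Counter
from itertools import combinations, product


def l25_three_factors():
    """25 runs, three five-level columns: (i, j, (i + j) mod 5), levels 1..5."""
    return [(i + 1, j + 1, (i + j) % 5 + 1) for i, j in product(range(5), repeat=2)]


def pair_counts(rows, c1, c2):
    """How often each combination of levels occurs in columns c1 and c2."""
    return Counter((row[c1], row[c2]) for row in rows)


def is_strength_two(rows, levels):
    """True if, for every pair of columns, all level pairs occur equally often."""
    for c1, c2 in combinations(range(len(levels)), 2):
        counts = pair_counts(rows, c1, c2)
        n_pairs = levels[c1] * levels[c2]
        if len(counts) != n_pairs or len(set(counts.values())) != 1:
            return False
    return True


# Relabelling from the text: 5 -> 4 levels for B, 5 -> 2 levels for C.
B_MAP = {1: 1, 2: 2, 3: 3, 4: 4, 5: 4}
C_MAP = {1: 1, 2: 2, 3: 1, 4: 2, 5: 1}

design = l25_three_factors()
collapsed = [(a, B_MAP[b], C_MAP[c]) for a, b, c in design]

print(is_strength_two(design, (5, 5, 5)))     # True
print(is_strength_two(collapsed, (5, 4, 2)))  # False
print(pair_counts(collapsed, 0, 1)[(1, 4)])   # 2: level 4 of B is now doubled up
```

After collapsing, level 4 of B meets each level of A twice while the other levels of B meet it once, so the array is no longer balanced in the strict sense checked here.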

Computing orthogonal arrays can be expensive, so designs are generally made available in the form of a library.
The R package DoE.base has an oa.design() function that retrieves a design with a given number of factors and factor levels. For example, to retrieve a design for 3 factors with 3, 4 and 5 levels, use these commands:
library(DoE.base)
oa.design(nlevels=c(3,4,5))
In this case, the returned design is a full factorial with 60 runs. This still is an orthogonal array, but a much more expensive experiment than the alternatives with equal factor levels.
To obtain an orthogonal array for 3 factors with 5 levels each, use:
oa.design(nlevels=c(5,5,5))
A B C
1 1 5 4
2 2 1 5
3 3 4 5
4 3 5 2
5 5 2 4
6 3 3 3
7 5 5 5
8 5 4 3
9 2 5 3
10 5 1 2
11 4 1 3
12 5 3 1
13 4 4 4
14 1 1 1
15 1 2 3
16 3 2 1
17 2 3 4
18 4 3 2
19 4 5 1
20 3 1 4
21 1 3 5
22 1 4 2
23 4 2 5
24 2 2 2
25 2 4 1
Entering 3 factors with 4 levels each returns an orthogonal array of 16 runs, and entering 3 factors with 3 levels each returns an orthogonal array of 9 runs.
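One way to see why these run counts come out as they do: in a strength-2 array every pair of columns must show each level combination equally often, so the number of runs has to be a multiple of the product of the level counts for every pair of factors. The Python sketch below computes that multiple. It assumes strength 2, it is only a necessary condition (it does not prove that an array of that size exists), and the helper names are invented for this example.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def min_run_multiple(levels):
    """Smallest run count divisible by the level product of every factor pair."""
    pair_products = [x * y for x, y in combinations(levels, 2)]
    return reduce(lcm, pair_products, 1)


def full_factorial_size(levels):
    """Number of runs in the full factorial design."""
    return reduce(lambda x, y: x * y, levels, 1)


for levels in [(3, 4, 5), (5, 5, 5), (4, 4, 4), (3, 3, 3), (5, 4, 2)]:
    print(levels, min_run_multiple(levels), full_factorial_size(levels))
```

For levels 3, 4 and 5 the multiple is already 60, the size of the full factorial, which is consistent with the 60-run design returned above. For the 5, 4 and 2 levels in the question it is 40, again the full factorial.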
Alternatively, the Python package OApackage is available on PyPI (https://pypi.org/project/OApackage/).
For more information, see:
Complete Enumeration of Pure-Level and Mixed-Level Orthogonal Arrays, E.D. Schoen, P.T. Eendebak, M.V.M. Nguyen, Journal of Combinatorial Designs, Volume 18, Issue 2, pages 123-140, 2010.
Two-Level Designs to Estimate All Main Effects and Two-Factor Interactions, Pieter T. Eendebak, Eric D. Schoen, Technometrics Vol. 59 , Iss. 1, 2017

Related

Using Key operator to make a game but need more tree depth to create a more complex tree

I am working on a game using the Key operator to create simple trees of parent nodes connected to their children, like (1 3 2 7 11 12), with 1 as the parent node and 3 2 7 11 12 as its children. The array holds all the information Key needs to create the nested array, and of course it's extremely fast. But I actually need 2 or 3 more levels of depth. I can create a different tree construction on the 'same' array (second image). This different encoding (1 2 1 1 2 3 1 3 3.....) allows arbitrary nesting depth and works perfectly with just a simple array.
The Key transformation of the array may already carry enough information, with some more code to connect the child nodes to the needed depth. Are there APL/Co-dfns functions, the same or similar, for (1) transforming the array into the tree and (2) transforming it back? I am new to APL and focusing on rectangular arrays; tree wrangling is further down the road. I need speed close to that of Key because the arrays, and their nested arrays, are very long.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4
Using Key:
{⊂⍵}⌸1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4
(1 3 4 7 11 12) (2 5 13) (6 8 9 10) (14) (15) (16) (17) (18)
Using maybe Key and something else....
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1. 1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4
2. (1 3 4 (7 14) 11 12) (2 5 13) (6 (8 15) (9 (16 17)) 10) (,18)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 2 2 2 2 2 1 7 8 3 10 11 10 10 10 15 9
(different array for same tree encoding)
(1(7(8 (9 17)))) (2 3 4 5 6) (10(11 12) 13 14 (15 16))
({⊂⍵}⌸⍠ 2) 1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4
Perhaps using Variant on Key down the road?
There are some ways to do this, but the best method will depend on what you want to do with the results. If you really do have very large arrays, then producing the "nested children" representation of arrays is going to be expensive no matter how you compute them, because the underlying representation is expensive (though, no more expensive than the same sort of representation in another language).
Section 3.2 of (Hsu 2019) discusses this in detail:
"A data parallel compiler hosted on the GPU". Hsu, Aaron W.
https://scholarworks.iu.edu/dspace/handle/2022/24749
Generally speaking, if you intend to work with the data in some way, it is almost always faster and easier to work directly with the parent vector or depth vector representation instead of first converting to a record-type style representation.
One technique is to query the data in parent vector form first, to identify the relevant nodes over which you intend to work, and only then to extract the children nodes for that limited set using primitives like membership (∊) or where (⍸).
If you can describe the sort of operations you intend to perform over these nested representations, there might be a better algorithm that does not require the conversion.
If you do wish to simply create the full record-type representation, there is some conversion code in (Hsu 2019). You can also look at the P2D and D2P functions in the Co-dfns compiler:
https://github.com/Co-dfns/Co-dfns/blob/master/src/codfns/P2D.aplf
https://github.com/Co-dfns/Co-dfns/blob/master/src/codfns/D2P.aplf
These may give you some additional help in converting between the formats.
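As a rough illustration of the two flat representations, here is a sketch in Python rather than APL. It is not the Co-dfns code. It assumes the nodes are listed in depth-first preorder, uses 0-based indices, and lets a root point to itself in the parent vector; the function names are invented for this example.

```python
def depth_to_parent(depth):
    """Parent vector from a preorder depth vector; roots are their own parent."""
    parent = []
    last_at_depth = {}  # depth -> index of the most recent node at that depth
    for i, d in enumerate(depth):
        parent.append(i if d == 0 else last_at_depth[d - 1])
        last_at_depth[d] = i
    return parent


def parent_to_depth(parent):
    """Depth vector from a parent vector whose nodes are in preorder."""
    depth = []
    for i, p in enumerate(parent):
        # In preorder a parent always precedes its children, so depth[p] is known.
        depth.append(0 if p == i else depth[p] + 1)
    return depth


depth = [0, 1, 2, 1, 0, 1]
parent = depth_to_parent(depth)
print(parent)                   # [0, 0, 1, 0, 4, 4]
print(parent_to_depth(parent))  # [0, 1, 2, 1, 0, 1]
```

Both conversions are a single pass over a flat vector, which is part of why staying in these forms tends to be cheaper than building nested records.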
If you need to convert directly between the parent and record-type representation, you can use something akin to this:
kids←{0=≢k←⍸p=⍵:⍵ ⋄ ⍵,∇¨k~⍵}¨
And apply it to the root nodes of your tree like this:
kids ⍸p=⍳≢p
where p is your parent vector.
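For readers more at home outside APL, the same idea can be sketched in Python. This is a loose analogue of the kids function above, not a translation of any Co-dfns code: it assumes a 1-based parent vector in which roots point to themselves, and the name build_forest is made up.

```python
def build_forest(parent):
    """Nested children lists from a 1-based parent vector (roots point to themselves)."""
    children = {}
    roots = []
    for node, par in enumerate(parent, start=1):
        if par == node:
            roots.append(node)
        else:
            children.setdefault(par, []).append(node)

    def nest(node):
        # A leaf stays a plain number; an inner node becomes [node, child, child, ...].
        if node not in children:
            return node
        return [node] + [nest(kid) for kid in children[node]]

    return [nest(root) for root in roots]


# The parent vector from the question.
p = [1, 2, 1, 1, 2, 3, 1, 3, 3, 3, 1, 1, 2, 7, 8, 9, 16, 4]
print(build_forest(p))
# [[1, [3, 6, [8, 15], [9, [16, 17]], 10], [4, 18], [7, 14], 11, 12], [2, 5, 13]]
```

The recursion follows the depth of the tree, so a very deep tree would need an explicit stack instead.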
I hope this helps!

How should I selectively sum multiple axes of an array?

What is the preferred approach in J for selectively summing multiple axes of an array?
For instance, suppose that a is the following rank 3 array:
]a =: i. 2 3 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
My goal is to define a dyad "sumAxes" to sum over multiple axes of my choosing:
0 1 sumAxes a NB. 0+4+8+12+16+20 ...
60 66 72 78
0 2 sumAxes a NB. 0+1+2+3+12+13+14+15 ...
60 92 124
1 2 sumAxes a NB. 0+1+2+3+4+5+6+7+8+9+10+11 ...
66 210
The way that I am currently trying to implement this verb is to use the dyad |: to first permute the axes of a, and then ravel the items of the necessary rank using ,"n (where n is the number of axes I want to sum over) before summing the resulting items:
sumAxes =: dyad : '(+/ @ ,"(#x)) x |: y'
This appears to work as I want, but as a beginner in J I am unsure if I am overlooking some aspect of rank or particular verbs that would enable a cleaner definition. More generally I wonder whether permuting axes, ravelling and summing is idiomatic or efficient in this language.
For context, most of my previous experience with array programming is with Python's NumPy library.
NumPy does not have J's concept of rank and instead expects the user to explicitly label the axes of an array to reduce over:
>>> import numpy
>>> a = numpy.arange(2*3*4).reshape(2, 3, 4) # a =: i. 2 3 4
>>> a.sum(axis=(0, 2)) # sum over specified axes
array([ 60, 92, 124])
As a footnote, my current implementation of sumAxes has the disadvantage of working "incorrectly" compared to NumPy when just a single axis is specified (as rank is not interchangeable with "axis").
Motivation
J has incredible facilities for handling arbitrarily-ranked arrays. But there's one facet of the language which is simultaneously almost universally useful as well as justified, but also somewhat antithetical to this dimensionality-agnostic nature.
The major axis (in fact, the leading axes in general) is implicitly privileged. This is the concept that underlies, e.g., # being the count of items (i.e. the dimension of the first axis), the understated elegance and generality of +/ without further modification, and a host of other beautiful parts of the language.
But it's also what accounts for the obstacles you're meeting in trying to solve this problem.
Standard approach
So the general approach to solving the problem is just as you have it: transpose or otherwise rearrange the data so the axes that interest you become leading axes. Your approach is classic and unimpeachable. You can use it in good conscience.
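Since the question compares against NumPy, here is the same transpose, ravel and sum recipe sketched in Python. It is meant only to mirror the J approach step by step; in practice a.sum(axis=...) does this directly. The helper name sum_axes is made up, and NumPy is assumed to be installed.

```python
import numpy as np


def sum_axes(a, axes):
    """Sum over `axes` by moving them last, flattening them, and summing once."""
    axes = tuple(axes)
    # Like  x |: y  in J: the chosen axes become the trailing axes.
    moved = np.moveaxis(a, axes, tuple(range(-len(axes), 0)))
    kept_shape = moved.shape[:a.ndim - len(axes)]
    # Like  ,"(#x) : run the trailing axes together, then sum each flat cell.
    return moved.reshape(kept_shape + (-1,)).sum(axis=-1)


a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
print(sum_axes(a, (0, 1)))  # values 60 66 72 78, as in the J example
print(sum_axes(a, (0, 2)))  # values 60 92 124
print(sum_axes(a, (1, 2)))  # values 66 210
print(sum_axes(a, (1,)))    # a single axis works too, matching a.sum(axis=1)
```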
Alternative approaches
But, like you, it niggles me a bit that we are forced to jump through such hoops in similar circumstances. One clue that we're kind of working against the grain of the language is the dynamic argument to the conjunction "(#x); usually arguments to conjunctions are fixed, and calculating them at runtime often forces us to use either explicit code (as in your example) or dramatically more complicated code. When the language makes something hard to do, it's usually a sign you're cutting against the grain.
Another is that ravel (,). It's not just that we want to transpose some axes; it's that we want to focus on one specific axis, and then run all the elements trailing it into a flat vector. Though I actually think this reflects more a constraint imposed by how we're framing the problem, rather than one in the notation. More on this in the final section of this post.
With that, we might feel justified in our desire to address a non-leading axis directly. And, here and there, J provides primitives that allow us to do exactly that, which might be a hint that the language's designers also felt the need to include certain exceptions to the primacy of leading axes.
Introductory examples
For example, dyadic |. (rotate) has ranks 1 _, i.e. it takes a vector on the left.
This is sometimes surprising to people who have been using it for years, never having passed more than a scalar on the left. That, along with the unbound right rank, is another subtle consequence of J's leading-axis bias: we think of the right argument as a vector of items, and the left argument as a simple, scalar rotation value of that vector.
Thus:
3 |. 1 2 3 4 5 6
4 5 6 1 2 3
and
1 |. 1 2 , 3 4 ,: 5 6
3 4
5 6
1 2
But in this latter case, what if we didn't want to treat the table as a vector of rows, but as a vector of columns?
Of course, the classic approach is to use rank to explicitly denote the axis we're interested in (because leaving it implicit always selects the leading axis):
1 |."1 ] 1 2 , 3 4 ,: 5 6
2 1
4 3
6 5
Now, this is perfectly idiomatic, standard, and ubiquitous in J code: J encourages us to think in terms of rank. No one would blink an eye on reading this code.
But, as described at the outset, in another sense it can feel like a cop-out, or manual adjustment. Especially when we want to dynamically choose the rank at runtime. Notationally, we are now no longer addressing the array as a whole, but addressing each row.
And this is where the left rank of |. comes in: it's one of those few primitives which can address non-leading axes directly.
0 1 |. 1 2 , 3 4 ,: 5 6
2 1
4 3
6 5
Look ma, no rank! Of course, we now have to specify a rotation value for each axis independently, but that's not only ok, it's useful, because now that left argument smells much more like something which can be calculated from the input, in true J spirit.
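For comparison with the asker's NumPy background, a rough Python analogue of these rotations is sketched below. It assumes NumPy is installed; note that np.roll shifts in the opposite direction to J's |., hence the negated shifts.

```python
import numpy as np

table = np.array([[1, 2], [3, 4], [5, 6]])

# J rotates left for positive counts, np.roll shifts right, so the shifts are negated.
rows_rotated = np.roll(table, -1, axis=0)                  # like  1 |. table
cols_rotated = np.roll(table, shift=(0, -1), axis=(0, 1))  # like  0 1 |. table

print(rows_rotated.tolist())  # [[3, 4], [5, 6], [1, 2]]
print(cols_rotated.tolist())  # [[2, 1], [4, 3], [6, 5]]
```

As with the vector left argument in J, the tuple form gives one shift per axis, so the shifts can be computed from the input rather than fixed in the code.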
Summing non-leading axes directly
So, now that we know J lets us address non-leading axes in certain cases, we simply have to survey those cases and identify one which seems fit for our purpose here.
The primitive I've found most generally useful for non-leading-axis work is ;. with a boxed left-hand argument. So my instinct is to reach for that first.
Let's start with your examples, slightly modified to see what we're summing.
]a =: i. 2 3 4
sumAxes =: dyad : '(< @ ,"(#x)) x |: y'
0 1 sumAxes a
+--------------+--------------+---------------+---------------+
|0 4 8 12 16 20|1 5 9 13 17 21|2 6 10 14 18 22|3 7 11 15 19 23|
+--------------+--------------+---------------+---------------+
0 2 sumAxes a
+-------------------+-------------------+---------------------+
|0 1 2 3 12 13 14 15|4 5 6 7 16 17 18 19|8 9 10 11 20 21 22 23|
+-------------------+-------------------+---------------------+
1 2 sumAxes a
+-------------------------+-----------------------------------+
|0 1 2 3 4 5 6 7 8 9 10 11|12 13 14 15 16 17 18 19 20 21 22 23|
+-------------------------+-----------------------------------+
The relevant part of the definition for dyads derived from ;.1 and friends is:
The frets in the dyadic cases 1, _1, 2 , and _2 are determined by the 1s in boolean vector x; an empty vector x and non-zero #y indicates the entire of y. If x is the atom 0 or 1 it is treated as (#y)#x. In general, boolean vector >j{x specifies how axis j is to be cut, with an atom treated as (j{$y)#>j{x.
What this means is: if we're just trying to slice an array along its dimensions with no internal segmentation, we can simply use dyad cut with a left argument consisting solely of 1s and a:s. The number of 1s in the vector (ie. the sum) determines the rank of the resulting array.
Thus, to reproduce the examples above:
('';'';1) <@:,;.1 a
+--------------+--------------+---------------+---------------+
|0 4 8 12 16 20|1 5 9 13 17 21|2 6 10 14 18 22|3 7 11 15 19 23|
+--------------+--------------+---------------+---------------+
('';1;'') <@:,;.1 a
+-------------------+-------------------+---------------------+
|0 1 2 3 12 13 14 15|4 5 6 7 16 17 18 19|8 9 10 11 20 21 22 23|
+-------------------+-------------------+---------------------+
(1;'';'') <@:,;.1 a
+-------------------------+-----------------------------------+
|0 1 2 3 4 5 6 7 8 9 10 11|12 13 14 15 16 17 18 19 20 21 22 23|
+-------------------------+-----------------------------------+
Et voila. Also, notice the pattern in the left-hand argument? The two aces are exactly at the indices of your original calls to sumAxes. See what I mean about providing a value for each dimension smelling like a good thing, in the J spirit?
So, to use this approach to provide an analog to sumAxes with the same interface:
sax =: dyad : 'y +/@:,;.1~ (1;a:#~r-1) |.~ - {. x -.~ i. r=.#$y' NB. Explicit
sax =: ] +/@:,;.1~ ( (] (-@{.@] |. 1 ; a: #~ <:@[) (-.~ i.) ) #@$) NB. Tacit
Results elided for brevity, but they're identical to your sumAxes.
Final considerations
There's one more thing I'd like to point out. The interface to your sumAxes call, calqued from Python, names the two axes you'd like "run together". That's definitely one way of looking at it.
Another way of looking at it, which draws upon the J philosophies I've touched on here, is to name the axis you want to sum along. The fact that this is our actual focus is confirmed by the fact that we ravel each "slice", because we do not care about its shape, only its values.
This change in perspective to talk about the thing you're interested in, has the advantage that it is always a single thing, and this singularity permits certain simplifications in our code (again, especially in J, where we usually talk about the [new, i.e. post-transpose] leading axis)¹.
Let's look again at our ones-and-aces vector arguments to ;., to illustrate what I mean:
('';'';1) <@:,;.1 a
('';1;'') <@:,;.1 a
(1;'';'') <@:,;.1 a
Now consider the three parenthesized arguments as a single matrix of three rows. What stands out to you? To me, it's the ones along the anti-diagonal. They are less numerous, and have values; by contrast the aces form the "background" of the matrix (the zeros). The ones are the true content.
Which is in contrast to how our sumAxes interface stands now: it asks us to specify the aces (zeros). How about instead we specify the 1, i.e. the axis that actually interests us?
If we do that, we can rewrite our functions thus:
xas =: dyad : 'y +/@:,;.1~ (-x) |. 1 ; a: #~ _1 + #$y' NB. Explicit
xas =: ] +/@:,;.1~ -@[ |. 1 ; a: #~ <:@#@$@] NB. Tacit
And instead of calling 0 1 sax a, you'd call 2 xas a, instead of 0 2 sax a, you'd call 1 xas a, etc.
The relative simplicity of these two verbs suggests J agrees with this inversion of focus.
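In NumPy terms, the same inversion of focus amounts to naming the one axis to keep and summing over all the others. A minimal sketch, assuming NumPy is installed; the helper name sum_all_but is made up for this example:

```python
import numpy as np


def sum_all_but(a, keep):
    """Sum over every axis except `keep`, the one axis we actually care about."""
    other_axes = tuple(ax for ax in range(a.ndim) if ax != keep)
    return a.sum(axis=other_axes)


a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
print(sum_all_but(a, 2))  # values 60 66 72 78, like  2 xas a
print(sum_all_but(a, 1))  # values 60 92 124,   like  1 xas a
print(sum_all_but(a, 0))  # values 66 210,      like  0 xas a
```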
¹ In this code I'm assuming you always want to collapse all axes except one. This assumption is encoded in the approach I use to generate the ones-and-aces vector, using |..
However, your footnote ("sumAxes has the disadvantage of working 'incorrectly' compared to NumPy when just a single axis is specified") suggests that sometimes you want to collapse only one axis.
That's perfectly possible, and the ;. approach can take arbitrary (orthotopic) slices; we'd only need to alter the method by which we instruct it (i.e. how we generate the 1s-and-aces vector). If you provide a couple of examples of the generalizations you'd like, I'll update the post here. It's probably just a matter of using (<1) x} a: #~ #$y or ((1;'') {~ (e.~ i.@#@$)) instead of (-x) |. 1 ; a:#~<:#$y.

Divide a data set into bins of size n matlab

I have a data set of size 11490x1. The data is recorded every 0.25 seconds (i.e. at 4 Hz), so 1 second accounts for 4 data points. The goal is to create subsets covering 3 seconds each, so that I can look at the data 3 seconds at a time and analyze it. For example, if I had data such as [1 2 3 4 5 6 8 2 4 2 4 3 2 4 2 5 2 5 24 2 5 1 5 1], I want to have a subset [1 2 3 4 5 6 8 2 4 2 4 3], and so on.
Any help would be appreciated.
It really depends on how you plan to "analyse" your data. The simplest way is to use a loop:
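n = 4*3;                      % 4 samples per second * 3 seconds = 12 samples per bin
breaks = 0:n:numel(data);     % bin boundaries; a trailing partial bin is dropped
for i = 1:numel(breaks)-1
    sub = data(breaks(i)+1:breaks(i+1));
    %// do analysis
    %// OR sub{i} = data(breaks(i)+1:breaks(i+1));
end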
A vectorized approach might use reshape(data,12,[]) after padding or truncating data so that mod(numel(data),12)==0; each column of the result is then one 3-second bin.
A third way might be to break your matrix up into a cell array using mat2cell, or in a for loop like the one above but using sub{i}=... instead of sub=...
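The binning logic itself is independent of MATLAB. As a language-neutral illustration, here is a small Python sketch of the same idea. It drops a trailing partial bin, as the loop above does; the stand-in data and the function name are assumptions made for this example.

```python
def split_into_bins(data, bin_size):
    """Consecutive full bins of `bin_size` samples; a trailing partial bin is dropped."""
    n_full = len(data) // bin_size
    return [data[i * bin_size:(i + 1) * bin_size] for i in range(n_full)]


samples_per_second = 4
bin_size = samples_per_second * 3  # 3 seconds -> 12 samples per bin

data = list(range(11490))          # stand-in for the real 11490x1 recording
bins = split_into_bins(data, bin_size)
print(len(bins), len(data) % bin_size)  # 957 6
```

With 11490 samples there are 957 full bins and 6 samples left over, which is why the reshape approach needs padding or truncation first.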

Need some simple logic help, been stuck for a few hours

The problem asks to take any amount of numbers and find the highest possible sum of differences (using absolute values) between consecutive numbers. For example, the numbers 1, 2 and 3 would be arranged as 3 1 2 to get a sum of 3 (|3-1| = 2 and |1-2| = 1).
My first thought was to take the highest number in the list followed by the lowest number and continue that way to the end, but that doesn't work out, as the end of the list ends up holding all of the middle numbers, which accumulate almost no difference. The only other thing I have thought of is to try every possible order and return the highest sum, but with a longer list this will take far too long, and I assume there is a better way.
For reference here are some sample input and output numbers
9 2 5 3 1 -> 21
7 3 4 5 5 7 6 8 5 4 -> 24
Any help at all would be much appreciated, even if it's just pointing me in the right direction.
There are 2 approaches to this problem.
Approach 1:
Brute force.
Approach 2:
Figure out an algorithm for how to arrange the numbers.
I always like approach 2 better if it is feasible.
It seems reasonable that you would get a high sum if you order the numbers high-low-high-low-high...
So start by sorting the numbers and then divide them into two equally large groups of low and high numbers. If there is an odd number of numbers the middle number will be left over.
Then you just pick numbers alternately from the two groups.
It is easy to prove that the order of the interior numbers doesn't matter as long as you stick with the high-low-high-low ordering.
However, since the first and last numbers each have only one neighbour, they should be the numbers closest to the middle.
Finally, if you have an odd number of numbers, place the left-over middle number at the start or at the end, whichever gives the bigger difference.
Example:
7 3 4 5 5 7 6 8 5 4 -> [sort] -> 3 4 4 5 5 5 6 7 7 8
high numbers: 5 6 7 7 8
low numbers: 3 4 4 5 5
Arranged:
5 3 6 4 7 4 7 5 8 5 = 24
Example:
9 2 5 3 1 -> [sort] -> 1 2 3 5 9
high numbers: 5 9
low numbers: 1 2
left over: 3
Arranged:
3 5 1 9 2 = 21 (3 goes at the start, because |3-5| > |3-2|)
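The steps above can be sketched in Python as follows. This is one possible reading of the description, not a proven-optimal implementation: it interleaves the sorted high and low halves so that the smallest high and the largest low land at the two ends, and it includes a brute-force check that is only practical for short lists. All names are made up for this example.

```python
from itertools import permutations


def total(seq):
    """Sum of absolute differences between consecutive elements."""
    return sum(abs(a - b) for a, b in zip(seq, seq[1:]))


def arrange(nums):
    """High-low-high-low arrangement with the middle numbers at the ends."""
    s = sorted(nums)
    half = len(s) // 2
    low, high = s[:half], s[len(s) - half:]
    # Interleave ascending highs and lows: the result starts at the smallest
    # high and ends at the largest low, so both ends are "middle" numbers.
    body = [x for pair in zip(high, low) for x in pair]
    if len(s) % 2 == 0:
        return body
    middle = s[half]
    front, back = [middle] + body, body + [middle]
    return front if total(front) >= total(back) else back


def brute_force(nums):
    """Best total over all orderings; only feasible for short lists."""
    return max(total(p) for p in permutations(nums))


first = arrange([9, 2, 5, 3, 1])
print(first, total(first))                             # [3, 5, 1, 9, 2] 21
print(total(arrange([7, 3, 4, 5, 5, 7, 6, 8, 5, 4])))  # 24
print(brute_force([9, 2, 5, 3, 1]))                    # 21
```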

Partitioning by counts in SQL

This issue is related to a question I asked here. I have a table that looks like this:
Item Count
1 1
2 4
3 8
4 2
5 6
6 3
I need to group items whose count is, for example, less than 5 into a new group, and the total of each group should be at least 5. The result should look like this:
Item Group Count
1 1 1
2 1 4
3 2 8
4 3 2
5 4 6
6 3 3
How do I achieve this? Many thanks.
Why isn't this a correct result?
Item Group Count
1 1 1
2 2 4
3 3 8
4 4 2
5 5 6
6 1 3
Or this?
Item Group Count
1 1 1
2 2 4
3 3 8
4 4 2
5 5 6
6 6 3
Seems to me that you're trying to solve the problem 'how to group the items so as to minimize the number of groups and maximize the number of items in each group, w/o exceeding the limit 5', which sounds a lot like the Knapsack problem. Perhaps you should read Celko's SQL Stumper: The Class Scheduling Problem and the solutions proposed. Others have also approached this problem, e.g. And now for a completely inappropriate use of SQL Server. Heads up: this is not a trivial problem by any means. Any naive algorithm will die a slow death attempting to solve it on a 1M-row table...
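To make the ambiguity concrete, here is a deliberately simple greedy pass, sketched in Python rather than SQL. It is an assumption about what the asker wants, not a solution to the general packing problem: items at or above the threshold get their own group, and smaller items share an open group until its total reaches the threshold. The function name and the single-pass rule are made up for this example.

```python
def assign_groups(counts, threshold=5):
    """Greedy pass: big items get their own group, small ones share until >= threshold."""
    groups = {}
    next_group = 1
    open_group, open_total = None, 0
    for item, count in counts:
        if count >= threshold:
            groups[item] = next_group
            next_group += 1
            continue
        if open_group is None:
            # Start a new shared group for small items.
            open_group, open_total = next_group, 0
            next_group += 1
        groups[item] = open_group
        open_total += count
        if open_total >= threshold:
            open_group = None  # this group is full enough; open a new one next time
    return groups


table = [(1, 1), (2, 4), (3, 8), (4, 2), (5, 6), (6, 3)]
print(assign_groups(table))
# {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 3}
```

On the sample table this happens to reproduce the grouping in the question, but it makes no attempt to minimise the number of groups, and the last shared group can end up below the threshold.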

Resources