I would like to enumerate all the combinations (tuples of values) of 3 or more finite-valued variables which satisfy a given condition. In math notation:
For example (inspired by Project Euler problem 9): all tuples (a, b, c) with a ∈ 1..3, b ∈ 1..4, c ∈ 1..5 such that a ≤ b ≤ c.
The truth tables for two variables at a time are easy enough:
a ∘.≤ b
1 1 1 1
0 1 1 1
0 0 1 1
b ∘.≤ c
1 1 1 1 1
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1
After much head-scratching, I managed to combine them, by computing the ∧ of every 4-valued row of the former with each 4-valued column of the latter, and disclosing (⊃) on the correct axis, between 1 and 2:
⎕← tt ← ⊃[1.5] (⊂[2] a ∘.≤ b) ∘.∧ (⊂[1] b ∘.≤ c)
1 1 1 1 1
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1
0 0 0 0 0
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1
0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 0 1 1
Then I could use its ravel to filter all possible tuples of values:
⊃ (,tt) / , a ∘., b ∘., c
1 1 1
1 1 2
1 1 3
1 1 4
1 1 5
1 2 2
1 2 3
...
3 3 5
3 4 4
3 4 5
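For comparison, the same brute-force idea (build every tuple, then filter by the condition) could be sketched in Python. This is only an illustration of the filtering step, not a replacement for the APL; the sizes 3, 4, 5 are read off the truth tables above, and `ordered_triples` is a hypothetical helper name.

```python
from itertools import product

NA, NB, NC = 3, 4, 5  # sizes of the ranges of a, b and c (assumed from the tables above)

def ordered_triples(na, nb, nc):
    """Enumerate every (a, b, c) and keep those with a <= b <= c."""
    candidates = product(range(1, na + 1), range(1, nb + 1), range(1, nc + 1))
    return [(a, b, c) for a, b, c in candidates if a <= b <= c]

triples = ordered_triples(NA, NB, NC)
print(len(triples))   # number of tuples that pass the filter
print(triples[:3])    # first few tuples, in the same order as the APL result
```

Like the truth-table version, this visits all NA×NB×NC candidates before discarding the ones that fail the test.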
Is this the best approach to this particular class of problems in APL?
Is there an easier or faster formula for this example, or for the general case?
More generally, comparing my (naïve?) array approach above to traditional scalar languages, I can see that I'm translating each loop into an additional dimension: 3 nested loops become a rank-3 truth table:
for c in 1..NC:
    for b in 1..min(c, NB):
        for a in 1..min(b, NA):
            collect (a,b,c)
But in a scalar language one can effect optimizations along the way, for example breaking loops as soon as possible, or choosing the loop boundaries dynamically. In this case I don't even need to test for a ≤ b ≤ c, because it's implicit in the loop boundaries.
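A minimal runnable version of that loop nest, sketched in Python (the function name `ordered_triples_bounded` is hypothetical). Because the inner bounds depend on the outer variables, no explicit a ≤ b ≤ c test is needed:

```python
def ordered_triples_bounded(na, nb, nc):
    """Yield (a, b, c) with a <= b <= c using dynamic loop bounds only."""
    for c in range(1, nc + 1):
        for b in range(1, min(c, nb) + 1):      # b never exceeds c
            for a in range(1, min(b, na) + 1):  # a never exceeds b
                yield (a, b, c)

result = list(ordered_triples_bounded(3, 4, 5))
print(len(result))
```

Every iteration of the innermost loop produces a valid tuple, so no work is spent on rejected candidates.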
In this example both approaches have O(N³) complexity, so their runtime will only differ by a factor. But I'm wondering: how could I write the array solution in a more optimized way, if I needed to do so?
Are there any good books or online resources that address algorithmic issues or best practices in APL?
Here's an alternative approach. I'm not sure if it would run faster.
Following your algorithm for scalar languages, the possible values of c are
⎕IO←0
c←1+⍳NC
In the inner loops the values for b and a are
b←1+⍳¨NB⌊c
a←1+⍳¨¨NA⌊b
If we combine those
r←(⊂¨¨¨a,¨¨¨b),¨¨¨c
we get a nested array of (a,b,c) triplets which can be flattened and rearranged in a matrix
r←∊r
(((⍴r)÷3),3)⍴r
ADD:
Morten Kromberg sent me the following solution. On Dyalog APL it's ~ 30 times more efficient than the one above:
⎕IO←1
AddDim←{0≡⍵:⍪⍳⍺ ⋄ n←0⌈⍺-x←¯1+⊢/⍵ ⋄ (n⌿⍵),∊x+⍳¨n}
TTable←{⊃AddDim/⌽0,⍵}
TTable 3 4 5
I am a new R user and I'm using a multinomial regression (i.e. logistic regression with the response variable which has more than 2 classes.) with the function 'vglm' in R. In my dataset there are 11 continuous predictors and 1 response variable which is categorical with 3 classes.
I want to find the best subset of predictors for my regression, but I don't know how to do it. Is there a function for this, or must I do it manually? The functions for linear models don't seem suitable.
I have tried the bestglm function, but its results don't seem to be suitable for a multinomial regression.
I have also tried a shrinkage method, glmnet, which is related to the lasso. It keeps all the variables in the model. On the other hand, the multinomial regression using vglm reports some variables as insignificant.
I've searched a lot on the Internet, including this website, but haven't found a good answer. So I'm asking here because I really need help with this.
Thanks
There are a few basic steps involved to get what you want:
1. define the model grid of all potential predictor combinations
2. run the model on all potential combinations of predictors
3. use a criterion (or a set of multiple criteria) to select the best subset of predictors
The model grid can be defined with the following function:
# define model grid for best subset regression
# defines which predictors are on/off; all combinations presented
model.grid <- function(n){
  n.list <- rep(list(0:1), n)
  expand.grid(n.list)
}
For example, with 4 variables we get 2^n = 2^4 = 16 combinations. A value of 1 indicates the model predictor is on and a value of zero indicates the predictor is off:
model.grid(4)
   Var1 Var2 Var3 Var4
1     0    0    0    0
2     1    0    0    0
3     0    1    0    0
4     1    1    0    0
5     0    0    1    0
6     1    0    1    0
7     0    1    1    0
8     1    1    1    0
9     0    0    0    1
10    1    0    0    1
11    0    1    0    1
12    1    1    0    1
13    0    0    1    1
14    1    0    1    1
15    0    1    1    1
16    1    1    1    1
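To make the on/off grid concrete outside of R, here is a small Python sketch of the same idea. It is an illustration only; `model_grid` is a hypothetical name, and the row order is chosen to mimic the table above, where the first column varies fastest.

```python
from itertools import product

def model_grid(n):
    """Return all 2**n on/off combinations for n predictors.

    itertools.product varies the last position fastest, so each row is
    reversed to make the first column vary fastest, as in the table above.
    """
    return [row[::-1] for row in product((0, 1), repeat=n)]

grid = model_grid(4)
print(len(grid))  # 2**4 combinations
for row in grid[:4]:
    print(row)
```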
I provide another function below that will run all model combinations. It will also create a sorted dataframe table that ranks the different model fits using 5 criteria. The predictor combo at the top of the table is the "best" subset given the training data and the predictors supplied:
# function for best subset regression
# ranks predictor combos using 5 selection criteria
best.subset <- function(y, x.vars, data){
  # y       character string and name of dependent variable
  # x.vars  character vector with names of predictors
  # data    training data with y and x.vars observations
  require(dplyr)
  require(purrr)
  require(magrittr)
  require(forecast)
  length(x.vars) %>%
    model.grid %>%
    apply(1, function(x) which(x > 0, arr.ind = TRUE)) %>%
    map(function(x) x.vars[x]) %>%
    .[2:dim(model.grid(length(x.vars)))[1]] %>%
    map(function(x) tslm(paste0(y, " ~ ", paste(x, collapse = "+")), data = data)) %>%
    map(function(x) CV(x)) %>%
    do.call(rbind, .) %>%
    cbind(model.grid(length(x.vars))[-1, ], .) %>%
    arrange(., AICc)
}
You'll see the tslm() function is specified...others could be used such as vglm(), etc. Simply swap in the model function you want.
The function requires 4 installed packages. It simply configures the data and uses the map() function to iterate across all model combinations (i.e. no for loop). The forecast package then supplies the cross-validation function CV(), which returns the 5 metrics, or selection criteria, used to rank the predictor subsets.
Here is an application example lifted from the book "Forecasting: Principles and Practice." The example also uses data from the book, which is found in the fpp2 package.
library(fpp2)
# test the function
y <- "Consumption"
x.vars <- c("Income", "Production", "Unemployment", "Savings")
best.subset(y, x.vars, uschange)
The resulting table, which is sorted on the AICc metric, is shown below. The best subset minimizes the value of the metrics (CV, AIC, AICc, and BIC), maximizes adjusted R-squared and is found at the top of the list:
   Var1 Var2 Var3 Var4     CV    AIC   AICc    BIC   AdjR2
1     1    1    1    1 0.1163 -409.3 -408.8 -389.9 0.74859
2     1    0    1    1 0.1160 -408.1 -407.8 -391.9 0.74564
3     1    1    0    1 0.1179 -407.5 -407.1 -391.3 0.74478
4     1    0    0    1 0.1287 -388.7 -388.5 -375.8 0.71640
5     1    1    1    0 0.2777 -243.2 -242.8 -227.0 0.38554
6     1    0    1    0 0.2831 -237.9 -237.7 -225.0 0.36477
7     1    1    0    0 0.2886 -236.1 -235.9 -223.2 0.35862
8     0    1    1    1 0.2927 -234.4 -234.0 -218.2 0.35597
9     0    1    0    1 0.3002 -228.9 -228.7 -216.0 0.33350
10    0    1    1    0 0.3028 -226.3 -226.1 -213.4 0.32401
11    0    0    1    1 0.3058 -224.6 -224.4 -211.7 0.31775
12    0    1    0    0 0.3137 -219.6 -219.5 -209.9 0.29576
13    0    0    1    0 0.3138 -217.7 -217.5 -208.0 0.28838
14    1    0    0    0 0.3722 -185.4 -185.3 -175.7 0.15448
15    0    0    0    1 0.4138 -164.1 -164.0 -154.4 0.05246
Only 15 predictor combinations are profiled in the output, since the model combination with all predictors off has been dropped. Looking at the table, the best subset is the one with all predictors on. However, the second row uses only 3 of the 4 variables and its performance is roughly the same. Also note that after row 4 the model results begin to degrade. That's because income and savings appear to be the key drivers of consumption: once either of these two variables is dropped from the predictors, model performance drops significantly.
The performance of the custom function is solid since the results presented here match those of the book referenced.
A good day to you.
I have three 2D arrays (matrices) of 0s and 1s.
For each array, I rotate it 4 times clockwise and 4 times anticlockwise, then flip it and repeat the rotations. For each resulting orientation I repeat the same steps for the next array, and so on, trying to combine the arrays into a symmetric shape, a kind of Rubik's cube with 5 elements on each side. Two arrays fit together only where a 1 of array 1 meets a 0 of array 2.
The structure is of the following kind:
Following is my 3 arrays
0 0 1 0 1
1 1 1 1 1
0 1 1 1 0
1 1 1 1 1
0 1 0 1 1
-------------
0 1 0 1 0
0 1 1 1 0
1 1 1 1 1
0 1 1 1 0
0 0 1 0 0
-------------
1 0 1 0 0
1 1 1 1 1
0 1 1 1 0
1 1 1 1 1
0 1 0 1 0
-------------
This problem evolved from an earlier question of mine, "How to solve 5 * 5 Cube in efficient easy way".
Assume my rotate methods are as follows:
rotateLeft()
rotateRight()
flipSide()
for (firstArray) {
    element = single.rotateLeft();
    for (secondArray) {
        element2 = single.rotateLeft();
        if (element.combine(element2)) {
            for (thirdArray) {
            }
        }
    }
}
Currently I have 3 fixed arrays, but how exactly can I solve this problem efficiently?
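As a hedged illustration of the rotate-and-flip step, the following Python sketch enumerates the distinct orientations of one square array. The helper names (`rotate_right`, `flip`, `orientations`) are hypothetical and only loosely mirror the methods listed above. Note that anticlockwise rotations produce the same set of results as clockwise ones, so a square array has at most 8 distinct orientations (4 rotations, each with or without a flip).

```python
def rotate_right(m):
    """Rotate a square matrix 90 degrees clockwise."""
    return [list(row) for row in zip(*m[::-1])]

def flip(m):
    """Mirror the matrix left to right."""
    return [row[::-1] for row in m]

def orientations(m):
    """Return the distinct orientations reachable by rotating and flipping."""
    seen = []
    for start in (m, flip(m)):
        current = start
        for _ in range(4):
            if current not in seen:
                seen.append(current)
            current = rotate_right(current)
    return seen

# First array from the question.
first = [
    [0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 1],
]
print(len(orientations(first)))
```

Generating each array's orientations once up front keeps the nested search loops from repeating the same rotations.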
I have a table of rows which consist of zeros and numbers like this:
A B C D E F G H I J K L M N
0 0 0 4 3 1 0 1 0 2 0 0 0 0
0 1 0 1 4 0 0 0 0 0 1 0 0 0
9 5 7 9 10 7 2 3 6 4 4 0 1 0
I want to calculate the average of the numbers, including zeros, but starting from the first nonzero value, and put it into the column after the table's end. E.g. for the first row the first nonzero value is 4, so the average is 11/11; for the second row it is 7/13; for the last one, 67/14.
How could I do this using Excel formulas? Probably OFFSET with nested IF?
This still needs to be entered as an array formula (ctrl-shift-enter) but it isn't volatile:
=AVERAGE(INDEX(($A2:$O2),MATCH(TRUE,$A2:$O2<>0,0)):$O2)
or, depending on your locale's list separator:
=AVERAGE(INDEX(($A2:$O2);MATCH(TRUE;$A2:$O2<>0;0)):$O2)
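To check the arithmetic the formula is meant to reproduce, here is a short Python sketch of the same rule: skip leading zeros, then average everything from the first nonzero value to the end of the row. The function name and the handling of an all-zero row are assumptions, not part of the Excel solution.

```python
def average_from_first_nonzero(row):
    """Average of row values from the first nonzero entry to the end."""
    for i, value in enumerate(row):
        if value != 0:
            tail = row[i:]
            return sum(tail) / len(tail)
    return None  # assumption: an all-zero row has no defined average

rows = [
    [0, 0, 0, 4, 3, 1, 0, 1, 0, 2, 0, 0, 0, 0],
    [0, 1, 0, 1, 4, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [9, 5, 7, 9, 10, 7, 2, 3, 6, 4, 4, 0, 1, 0],
]
for row in rows:
    print(average_from_first_nonzero(row))
```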
The sum is the same no matter how many 0's you include, so all you need to worry about is what to divide it by, which you could determine using nested IFs, or take a cue from this: https://superuser.com/questions/671435/excel-formula-to-get-first-non-zero-value-in-row-and-return-column-header
Thank you, Scott Hunter, for the good reference.
I solved the problem using a huge formula, and I think it's a bit awkward.
Here it is:
=AVERAGE(INDIRECT(CELL("address";INDEX(A2:O2;MATCH(TRUE;INDEX(A2:O2<>0;;);0)));TRUE):O2)
The actual problem, which I got from an online competition, is as follows. I solved it, but my solution, which is in C, couldn't produce the answer in time for large numbers. I need to solve it in C.
Given below is a word from the English dictionary arranged as a matrix:
MATHE
ATHEM
THEMA
HEMAT
EMATI
MATIC
ATICS
Tracing the matrix is starting from the top left position and at each step move either RIGHT or DOWN, to reach the bottom right of the matrix. It is assured that any such tracing generates the same word. How many such tracings can be possible for a given word of length m+n-1 written as a matrix of size m * n?
1 ≤ m,n ≤ 10^6
I have to print the number of ways S the word can be traced as explained in the problem statement. If the number is larger than 10^9+7, I have to print S mod (10^9 + 7).
In the testcases, m and n can be very large.
Imagine traversing the matrix: whatever path you choose, you need to take exactly n+m-2 steps to make the word, of which n-1 are down and m-1 are to the right. Their order may change, but the counts n-1 and m-1 stay the same. So the problem reduces to choosing n-1 positions out of n+m-2, and the answer is
C(n+m-2,n-1)=C(n+m-2,m-1)
How to calculate C(n,r) for this problem:
You probably know how to multiply two numbers in modular arithmetic, i.e.
(a*b)%mod=(a%mod*b%mod)%mod,
Now, to calculate C(n,r) you also need to divide, but division in modular arithmetic is performed by multiplying with the modular multiplicative inverse of the number, i.e. the value a^-1 such that
((a)*(a^-1))%mod=1
Of course, a^-1 in modular arithmetic is generally not equal to 1/a. It can be computed with the extended Euclidean algorithm; since in your case mod is a prime number, Fermat's little theorem also gives
(a^(-1))=a^(mod-2)%mod
a^(mod-2) can be computed efficiently using the repeated squaring method.
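A minimal sketch of this computation in Python (the question asks for C, so treat this only as an outline of the arithmetic; `count_tracings` is a hypothetical name). It builds the numerator and denominator of C(n+m-2, min(m,n)-1) modulo the prime and performs a single modular inverse with the built-in three-argument `pow`, which uses fast exponentiation:

```python
MOD = 10**9 + 7  # prime modulus from the problem statement

def count_tracings(m, n, mod=MOD):
    """Return C(m+n-2, m-1) modulo a prime `mod` using Fermat's little theorem."""
    total = m + n - 2
    r = min(m - 1, n - 1)          # C(total, r) == C(total, total - r)
    num = den = 1
    for i in range(1, r + 1):
        num = num * (total - r + i) % mod
        den = den * i % mod
    # den is a product of numbers smaller than mod, so it is invertible.
    return num * pow(den, mod - 2, mod) % mod

print(count_tracings(7, 5))  # the 7 x 5 MATHEMATICS matrix from the question
```

The loop runs min(m, n) - 1 times, so even m = n = 10^6 needs only about a million multiplications per query.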
I would suggest a dynamic programming approach for this problem, since calculating factorials of large numbers takes a lot of time, especially since you have multiple queries.
Starting from a small matrix (say 2x1), keep finding solutions for bigger matrices. This works because, when finding the solution for a bigger matrix, you can reuse the values already calculated for smaller matrices and speed up your calculation.
The complexity of the above solution is, IMO, polynomial in M and N for an MxN matrix.
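One way the dynamic programming idea could look, sketched in Python under the assumption that the recurrence is ways(i, j) = ways(i-1, j) + ways(i, j-1) with ones along the top row and left column (`count_tracings_dp` is a hypothetical name). Only one row is kept in memory:

```python
MOD = 10**9 + 7

def count_tracings_dp(m, n, mod=MOD):
    """Count right/down paths in an m x n matrix with a rolling one-row table."""
    row = [1] * n                      # first row: exactly one way to reach each cell
    for _ in range(1, m):
        for j in range(1, n):
            # new row[j] = cell above (old row[j]) + cell to the left (row[j-1])
            row[j] = (row[j] + row[j - 1]) % mod
    return row[n - 1]

print(count_tracings_dp(5, 5))
```

This sketch performs on the order of m·n additions, so it is practical for small matrices only; at the stated limit of 10^6 for both m and n that would be about 10^12 steps.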
Use Laplace's triangle, incorrectly named also "binomial"
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

1 1 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

1 1 1 0 0
1 2 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0

1 1 1 1 0
1 2 3 0 0
1 3 0 0 0
1 0 0 0 0
0 0 0 0 0

1 1 1 1 1
1 2 3 4 0
1 3 6 0 0
1 4 0 0 0
1 0 0 0 0

1 1 1 1 1
1 2 3 4 5
1 3 6 10 0
1 4 10 0 0
1 5 0 0 0

1 1 1 1 1
1 2 3 4 5
1 3 6 10 15
1 4 10 20 0
1 5 15 0 0

1 1 1 1 1
1 2 3 4 5
1 3 6 10 15
1 4 10 20 35
1 5 15 35 0

1 1 1 1 1
1 2 3 4 5
1 3 6 10 15
1 4 10 20 35
1 5 15 35 70

Got it? Notice that the elements are binomial coefficients. The diagonal members are C^1_2, C^2_4, C^3_6, C^4_8, and so on. Choose the one you need.
I know that the title of this topic might be confusing, but I didn't know how to explain it in a single sentence!
I'll try to be clearer: I have a 2D array of boolean values, where every value states whether that particular position (or block) is alive or not.
Let's make an example:
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
This array contains 16 "alive" blocks, now I can "kill" some blocks, changing their state from 1 to 0.
What I would like to do is to know if after a "kill", the group splits in two or more separate groups, for example:
1 1 0 1
1 1 0 1
0 1 0 1
1 1 1 1
This shape is still "intact", since the group of 0 is not cutting any of the 1 groups, but in this case:
1 1 0 1
1 1 0 1
0 0 0 1
1 1 1 1
Now I've killed the only bit that was keeping all the 1s together, and the shape has been divided into two smaller groups!
I've tried checking the neighbours of the last killed bit, but then I can't be sure about other possible connections in the shape.
I've also tried a pathfinding algorithm, but this operation should be very fast and pathfinding is too complex.
How can I achieve this?
Pick any of the alive blocks and do a flood-fill and then check if it got to all the other live blocks.
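A minimal sketch of that check in Python, assuming blocks are connected only through their 4 orthogonal neighbours (diagonals are not counted) and that an empty grid counts as not split. The name `is_connected` is hypothetical.

```python
from collections import deque

def is_connected(grid):
    """Return True if all alive (truthy) cells form a single 4-connected group."""
    rows, cols = len(grid), len(grid[0])
    alive = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c]]
    if not alive:
        return True  # assumption: nothing alive means nothing was split
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:  # breadth-first flood fill from one alive block
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(alive)

intact = [[1, 1, 0, 1], [1, 1, 0, 1], [0, 1, 0, 1], [1, 1, 1, 1]]
split = [[1, 1, 0, 1], [1, 1, 0, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
print(is_connected(intact), is_connected(split))
```

The fill touches each alive block once, so the cost is linear in the number of cells.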