Efficiently Creating Multiple Variables Using apply in R

I have a data frame DF which contains numerous variables. Each variable is present twice because I am conducting an analysis of "couples".
Among others, DF has a series of indicators of diversity:
DF$div1.1, DF$div2.1, ..., DF$divN.1, DF$div1.2, ..., DF$divN.2
Similarly, it has a series of indicators of another characteristic:
DF$char1.1, DF$char2.1, ..., DF$charM.1, DF$char1.2, ..., DF$charM.2
Here's a link to an example of DF: http://shorttext.com/5d90dd64
In each case, the suffixes ".1" and ".2" identify the couple member considered.
My goal:
For each indicator divI and charJ, I want to create another variable DF$divchar that takes the value DF$divI.1 when DF$charJ.1>DF$charJ.2; and DF$divI.2 when DF$charJ.1<DF$charJ.2.
Here is the solution I came up with. It seems very intricate and sometimes behaves in strange ways:
I created a series of binary variables that take the value one if DF$charJ.1>DF$charJ.2. They are stored under DF$CharMax.1.
Here's how I created it:
DF$CharMax.1 <- as.data.frame(
sapply(1:length(nam),
function(n)
as.numeric(DF[names(DF)==names.1[n]]
>DF[names(DF)==names.2[n]])
))
I created the function BinaryExtract:
BinaryExtract <- function(var1, var2, extract) {var1*extract +var2*(1-extract)}
I created the matrix NameFull that contains all the possible combinations of div and char, separated with "YY"
NameFull <- sapply(c("div1",...,"divN")
, function(nam) paste(nam, names(DF$YMax.1), sep="YY"))
And then I create all my variables:
DF[, as.vector(NameFull)] <- lapply(as.vector(NameFull), function(e)
BinaryExtract(DF[,paste0(unlist(strsplit(e,"YY"))[1],".1")]
, DF[, paste0(unlist(strsplit(e,"YY"))[1],".2")]
, DF$charMax.1[unlist(strsplit(e,"YY"))[2]]))
My Problem
A. It looks like a very complicated solution for something that simple. What am I missing?
B. Moreover, when I print DF, just typing DF in the command window, I do not see the variables NameFull. They seem to appear with the names of char.
Here's what I get: http://shorttext.com/5d9102c
Similarly, I have tried to change all their names to get rid of the "YY" and it does not seem to work:
names(DF[, as.vector(NameFull)]) <- as.vector(c("div1",...,"divN"), sapply(, function(nam)
paste(nam, names(DF$YMax.1), sep=".")))
When I look at names(DF), I keep getting the old names with the "YY"
However, I do get a result if I explicitly call for them
> DF[,"divIYYcharJ"]
I would really appreciate any suggestions, comments and explanations. I am quite new to R and am more used to Stata. I feel there is something deeply inefficient here. Thanks
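To make the goal stated above concrete, here is a minimal, language-neutral sketch in Python rather than R. It is not the asker's code: the helper names pick_by_char and build_divchar, the dict-of-lists layout and the toy values are all assumptions made for illustration. Ties (charJ.1 equal to charJ.2) are not specified in the question; the sketch sends them to member 2, which is what BinaryExtract does with a 0/1 indicator.

```python
def pick_by_char(div1, div2, char1, char2):
    """Take div1[i] where char1[i] > char2[i], otherwise div2[i] (ties go to div2)."""
    return [d1 if c1 > c2 else d2
            for d1, d2, c1, c2 in zip(div1, div2, char1, char2)]


def build_divchar(table, div_names, char_names):
    """Return one new column per (div, char) pair, keyed as "divI.charJ"."""
    out = {}
    for d in div_names:
        for c in char_names:
            out[f"{d}.{c}"] = pick_by_char(
                table[f"{d}.1"], table[f"{d}.2"],
                table[f"{c}.1"], table[f"{c}.2"])
    return out


# Hypothetical toy data: two couples, one div indicator, one char indicator.
table = {
    "div1.1": [10, 20], "div1.2": [30, 40],
    "char1.1": [5, 1], "char1.2": [2, 7],
}
result = build_divchar(table, ["div1"], ["char1"])
print(result)
```

The point of the sketch is that the new columns can be built directly from the pair of names, without first storing an intermediate indicator table and then splitting combined names on "YY".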

Related

Is it possible to insert a matrix bigger than 10x10 in LaTeX? [duplicate]

I have a 3x12 matrix I'd like to input into my LaTeX (with amsmath) document but LaTeX seems to choke when the matrix gets larger than 3x10:
\begin{equation}
\textbf{e} =
\begin{bmatrix}
1&1&1&1&0&0&0&0&-1&-1&-1&-1\\
1&-1&0&0&1&1&-1&-1&0&0&1&-1\\
0&0&1&-1&1&-1&1&-1&1&-1&0&0
\end{bmatrix}
\end{equation}
The error "Extra alignment tab has been changed to \cr." tells me that I have more & than the bmatrix environment can handle. Is there a proper way to handle this? It also seems that the 1's and the -1's are aligned differently; is that also expected of bmatrix?
From the amsmath documentation (texdoc amsmath):
The amsmath package provides some environments for matrices beyond the basic array environment of LaTeX. The pmatrix, bmatrix, Bmatrix, vmatrix and Vmatrix have (respectively) ( ), [ ], { }, | |, and ∥ ∥ delimiters built in. For naming consistency there is a matrix environment sans delimiters. This is not entirely redundant with the array environment; the matrix environments all use more economical horizontal spacing than the rather prodigal spacing of the array environment. Also, unlike the array environment, you don’t have to give column specifications for any of the matrix environments; by default you can have up to 10 centered columns. (If you need left or right alignment in a column or other special formats you must resort to array.)
That is, bmatrix defaults to a maximum of 10 columns.
A footnote adds:
More precisely: The maximum number of columns in a matrix is determined by the counter MaxMatrixCols (normal value = 10), which you can change if necessary using LaTeX’s \setcounter or \addtocounter commands.
If you came to this page looking for the exact command (thanks to Scott Wales for the answer), you want this in your preamble:
\setcounter{MaxMatrixCols}{20}
Where you can replace 20 with the maximum number of columns you want.
The answer by Scott is correct, but I've since learned you can override the alignment. Taken from http://texblog.net/latex-archive/maths/matrix-align-left-right/
\makeatletter
\renewcommand*\env@matrix[1][c]{\hskip -\arraycolsep
\let\@ifnextchar\new@ifnextchar
\array{*\c@MaxMatrixCols #1}}
\makeatother
This now allows the command:
\begin{bmatrix}[r] ....
to produce right-aligned columns!
Instead of a bmatrix you can use +bmatrix from the tabularray package:
\documentclass{article}
\usepackage{tabularray}
\UseTblrLibrary{amsmath}
\begin{document}
\begin{equation}
\textbf{e} =
\begin{+bmatrix}
1&1&1&1&0&0&0&0&-1&-1&-1&-1\\
1&-1&0&0&1&1&-1&-1&0&0&1&-1\\
0&0&1&-1&1&-1&1&-1&1&-1&0&0
\end{+bmatrix}
\end{equation}
\end{document}

R automate testing?

Currently, I use the following script to test for something called Granger causality. Note: My main question is about the script structure, not the method.
#These values always have to be specified manually
dat <- data.frame(df[2],df[3])
lag = 2
# VAR-Model
V <- VAR(dat,p=lag,type="both")
V$var
summary(V)
wald.test(b=coef(V$var[[1]]), Sigma=vcov(V$var[[1]]), Terms=c(seq(2, by=2, length=lag)))
names(cof1[2])
wald.test(b=coef(V$var[[2]]), Sigma=vcov(V$var[[2]]), Terms=c(seq(1, by=2, length=lag)))
names(cof1[1])
The main issue is that I always have to change the testing pair in dat <- data.frame(..) manually. Further, I always enter "lag = x" manually after some stationarity tests that cannot really be automated.
Let's say I have to test the following pairs:
df[2],df[3]
df[2],df[4]
df[2],df[5]
df[6],df[7]
df[8],df[9]
Can I somehow state these pairs in an array for the test? Assuming I also know the lag for each testing pair, could I add that to it as well?
It would be perfect to output my test results directly into a table, instead of manually changing the data and then entering the result into Excel/LaTeX.
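The structure asked about here (a list of pairs with their lags, one loop, one results table) can be sketched as follows. This is a structural illustration in Python, not R, and not a working Granger test: run_pair_test is a hypothetical stand-in for fitting the VAR model and running the two Wald tests, the lag values are invented, and p_value is left empty on purpose.

```python
import csv
import io

# Each entry: (first column index, second column index, lag).
# The column pairs come from the question; the lag values are made up.
test_specs = [(2, 3, 2), (2, 4, 1), (2, 5, 3), (6, 7, 2), (8, 9, 1)]


def run_pair_test(col_a, col_b, lag):
    """Hypothetical placeholder for the VAR fit and Wald tests on one pair."""
    return {"x": col_a, "y": col_b, "lag": lag, "p_value": None}


# One loop over the specification table replaces the manual edits.
results = [run_pair_test(a, b, lag) for a, b, lag in test_specs]

# Collect everything into one table (CSV text here) instead of copying by hand.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["x", "y", "lag", "p_value"])
writer.writeheader()
writer.writerows(results)
table_text = buffer.getvalue()
print(table_text)
```

The design choice is to keep everything that changes between runs (pair and lag) in one data structure, so the testing code itself never has to be edited.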

Populating 2D-Arrays from CSV (without m*n-Loops)

While migrating an Excel-VBA project to Visual Basic 2010, I came across a problem when populating arrays.
In Excel-VBA I would do something like
Function mtxCorrel() As Variant
mtxCorrel = wsCorr.UsedRange
End Function
to read an m*n matrix (in this case n*n), conveniently stored in a worksheet, into an array for further use.
In VB2010 I obviously won't use an Excel worksheet as storage. CSV files (see below) seem like a decent alternative.
I want to populate a 2D array with the CSV contents without looping n*n times. Let's assume I already know n=4 for demonstration purposes.
This suggests that what I want to do can't be done.
Nevertheless, I still hope something like the following could work:
Function mtxCorrel() As Object
Dim array1(4, 4) As String
Using ioReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("C:\cm_KoMa.csv")
With ioReader
.TextFieldType = FileIO.FieldType.Delimited
.SetDelimiters(";")
' Here I want to...
' A) ...either populate the whole 2d-array with something like
array1 = .ReadToEnd()
' B) ... or populate the array by looping its 1d-"rows"
While Not .EndOfData
array1(.LineNumber, 0)= .ReadFields()
End While
End With
End Using
return array1
End Function
Notes:
I'm mainly interested in populating the array.
I'm less interested in potential errors in determining which CSV line belongs in which 1D "row", and not interested in checking n.
Appendix: sample CSV file:
1;0.5;0.9;0.3
0.5;1;0.6;0.2
0.9;0.6;1;0.1
0.3;0.2;0.1;1
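For comparison, option B from the question (filling the array one parsed row at a time) looks like this as a short Python sketch. It is only an illustration of the idea, not VB code: the sample text is the CSV from the appendix, held in a string instead of a file, and the name matrix is an assumption. The per-cell conversion to float is still a loop, just an implicit one inside the comprehension.

```python
import csv
import io

# The sample CSV from the appendix, as an in-memory string.
sample = (
    "1;0.5;0.9;0.3\n"
    "0.5;1;0.6;0.2\n"
    "0.9;0.6;1;0.1\n"
    "0.3;0.2;0.1;1\n"
)

# csv.reader yields one list of fields per line, so each row is added whole.
reader = csv.reader(io.StringIO(sample), delimiter=";")
matrix = [[float(field) for field in row] for row in reader]

print(matrix)
```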

Select random item from an array with certain probabilities and add it to the stage

It's quite a big task, but I'll try to explain.
I have an array of 200 strings and I want to randomly select one and add it to the stage using code. I have movieclips exported for ActionScript with the same class names as the strings in the array. Also, if possible, I would like to select the strings with set probabilities, e.g. the first has a 0.7 chance, the second 0.1, etc. Here is what I have currently:
var nameList:Array=["Jimmy","Bob","Fred"]
var instance:DisplayObject = createRandom(nameList);
addChild(instance);
function createRandom(typeArray:Array):*
{
// Select random String from typeArray.
var selection:String = typeArray[ int(Math.random() * typeArray.length) ];
// Create instance of relevant class.
var Type:Class = getDefinitionByName(selection) as Class;
// Return created instance.
return new Type();
}
All this throws the following error:
ReferenceError: Error #1065: Variable [class Jimmy] is not defined.
I've searched for similar threads, but none combine the three specific tasks of randomisation, predictability and addChild().
I think that you've got two problems: a language problem and a logic problem. In the .fla connected to your code above, find each symbol representing a name in the Library and write the associated name (e.g. Bob, Fred) into the 'AS linkage' column for that symbol: just the name, no punctuation.
Now getDefinitionByName() will find your 'Class'.
If you put a different graphic into each MovieClip (say, a piece of fruit or a picture of Bob, Jim or Fred) and run your program, you'll get a random something on stage each time.
That should solve your language problem. But the logic problem is a little harder, no?
That's why I pointed you to Mr. Kelly's solution (the first one, which for me is easier to grasp).
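For the logic problem, one common approach is to walk a running total of the probabilities until it passes a random threshold. The sketch below shows that idea in Python rather than ActionScript, and it is not necessarily the solution referred to above. The function name weighted_pick is hypothetical, the 0.7 and 0.1 weights come from the question, and the 0.2 for the third name is an assumption so that the weights sum to 1.

```python
import random


def weighted_pick(names, weights, rng=random):
    """Return one name; names[i] is chosen with probability weights[i] / sum(weights)."""
    threshold = rng.random() * sum(weights)
    running = 0.0
    for name, weight in zip(names, weights):
        running += weight
        if threshold < running:
            return name
    return names[-1]  # guard against floating-point round-off


names = ["Jimmy", "Bob", "Fred"]
weights = [0.7, 0.1, 0.2]  # 0.2 for Fred is assumed, not given in the question

# Draw many times with a fixed seed to see the proportions emerge.
rng = random.Random(0)
counts = {name: 0 for name in names}
for _ in range(10000):
    counts[weighted_pick(names, weights, rng)] += 1
print(counts)
```

The selected string would then be passed to getDefinitionByName() exactly as the uniformly chosen one is in createRandom().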

Prolog: Finding the Nth Element in a List

I am attempting to locate the nth element of a List in Prolog. Here is the code I am attempting to use:
Cells = [OK, _, _, _, _, _] .
...
next_safe(_) :-
facing(CurrentDirection),
delta(CurrentDirection, Delta),
in_cell(OldLoc),
NewLoc is OldLoc + Delta,
nth1(NewLoc, Cells, SafetyIdentifier),
SafetyIdentifier = OK .
Basically, I am trying to check to see if a given cell is "OK" to move into. Am I missing something?
There is a predefined predicate called nth0:
5 ?- nth0(1,[1,2,3],X).
X = 2.
6 ?- listing(nth0).
lists:nth0(A, B, C) :-
integer(A), !,
A>=0,
nth0_det(A, B, C).
lists:nth0(A, B, C) :-
var(A), !,
nth_gen(B, C, 0, A).
true.
Note that nth0 indexing starts from 0.
Hope this helps.
Louis, I'm not entirely clear on what you're aiming to do with this code, but a couple of comments that might hopefully help.
Things that start with a capital letter in Prolog are variables to be matched against in rules. _ is a special symbol that can be used in place of a variable name to indicate that any value can match.
next_safe(_) is therefore only capable of providing you with a true/false answer if you give it a specific value. One of the major benefits of Prolog is the ability to unify against a variable through backtracking (as ony said). This would mean that when written correctly you could just ask Prolog next_safe(X). and it would return all the possible values (safe moves) that unify with X.
To go back to the first point about capital letters. This means that OK is actually a variable waiting to be matched. It is effectively an empty box that you are trying to match against another empty box. I think what you're intending is to use the value ok which is different. You do not assign to variables in the same way that you do in other programming styles. Something like the following might be closer to what you are looking for, though I'm still not sure it's right as it looks like you're trying to assign things but I'm not certain how your nth1 works.
Cells = [ok, _, _, _, _, _] .
...
next_safe(NewLoc) :-
facing(CurrentDirection),
delta(CurrentDirection, Delta),
in_cell(OldLoc),
NewLoc is OldLoc + Delta,
nth1(NewLoc, Cells, ok).
