Create a matrix from variables using subsequent values - loops

I have data in Stata with three variables: a string id and two numeric variables (GPS data: latitude and longitude). I would like to convert the variables into a matrix in the following way (the lower table) so that I can calculate the distance between two id-spots for all combinations. Each newly created column (e.g., id_1) should hold the value of the original variable (e.g., id) from i rows further down (id[_n+i]), and so on. However, the following command only returns values until the last (n-th) row is reached; beyond that, the new variables are empty. Thus, the bottom half of the matrix is missing (the upper table: ///). For 2000 observations:
foreach num of numlist 1/2000 {
foreach var of varlist id num1 num2 {
gen `var'_`num'=`var'[_n+`num']
}
}

I am posting an answer in case anybody finds it useful.
// duplicate all observations so that the lead values wrap around and fill the whole matrix
expand 2, gen(dupindex)
forvalues i = 1/1999 {
    foreach var of varlist id num1 num2 {
        gen `var'`i' = `var'[_n+`i']
    }
}
// delete the unnecessary rows: expand appends the copies at the end (observations 2001-4000)
// (no columns need dropping, because the loop above only creates suffixes 1-1999)
drop if dupindex == 1
drop dupindex

Related

Problem with copying volatile data values to rows of a google sheet

I am trying to convert an old Excel VBA program to Google Sheets for projecting 30 years of financial results. The key issue is that the Excel VBA sheet has a row (A1:AD1) of 30 cells that each contain the volatile function RAND() to generate a random number in each cell through each of 1000 loops:
In each loop these cells 1) reset their random number, then 2) some rows of calculations are made based on the new random numbers, then 3) a resulting row of 30 new values is copy-pasted (values only) to another part of the sheet so the individual results can be statistically analyzed.
The copy-paste in Google Sheets is too slow and I hit the 6-minute execution timeout, so I am trying to accumulate each loop into a single 2D array, each row being one of the 1000 loop results from step 3 above, and then write the entire array back to the sheet at once. To get the 30 cells to recalculate each time I read a row, I force a new value to be set in a single cell of the spreadsheet at the beginning of each loop. That seems to work, but when I try to move the data from the row to the accumulation array and then write the values back to individual rows in the sheet at once, I always wind up with arrays that have too many dimensions, or with some other problem getting the data back to the sheet as a 1000 x 30 range of static data.
Here's the basic outline of where I am so far after many variations of this :
function test1(){
var sheet=SpreadsheetApp.getActive().getSheetByName('UST_Results'),
range;
var values=[];
var LargeArray= Array(new Array()) //new Array()
for (var i=0;i<3;i++) { // using a small number (3) rows to get it work
values =getRandRow(sheet)[0][i];
LargeArray(LargeArray.length) = JSON.stringify(values)
};
};
function getRandRow(sheet) {
var values_array =[];
let myData =[];
var cell = 0;
// force the Rand() to recalc before getting the next copy of A1:Ad1
for (var i=0;i<1;i++) {
var cell = sheet.getRange(i+5,1);
cell.setValue(i);
SpreadsheetApp.flush();
}
range = sheet.getRange('A1:ad1');
values_array = range.getValues();
myData.push(values_array);
return myData;
}
A picture of the sheet testing with just the rand numbers and not the derived row.
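The accumulation step described above (one 30-value row per loop, collected into a single 2D array) can be sketched in plain JavaScript. This is only a sketch under assumptions: readRandRow is a hypothetical stand-in for sheet.getRange('A1:AD1').getValues(), which returns a 1 x 30 two-dimensional array, and collectRows is an illustrative name that is not in the original post.

```javascript
// Hypothetical stand-in for sheet.getRange('A1:AD1').getValues():
// returns a 1 x 30 two-dimensional array of random numbers.
function readRandRow() {
  const row = [];
  for (let c = 0; c < 30; c++) {
    row.push(Math.random());
  }
  return [row];
}

// Collect one plain 30-element row per loop, so the result is loops x 30.
function collectRows(loops) {
  const largeArray = [];
  for (let i = 0; i < loops; i++) {
    const values = readRandRow(); // shape: [[v1, ..., v30]]
    largeArray.push(values[0]);   // push the inner row, not the 2D wrapper
  }
  return largeArray;
}

const results = collectRows(3);
console.log(results.length, results[0].length); // rows, columns
```

Pushing values[0] rather than values keeps the result two-dimensional. In Apps Script, an array of this shape could then be written in one call, for example sheet.getRange(startRow, 1, results.length, results[0].length).setValues(results), where startRow is a placeholder for wherever the static results should go.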

How do I count the number of rows in an array?

My task is to grab a 2-dimensional table from cells on a worksheet into a 2-dimensional array, delete some or all of the rows (right terminology?) from testing, and then paste what's left into a worksheet.
To determine the range for pasting I need to know the length of the edited array. This is where I'm challenged.
// This gets the array which is 3 columns wide and X rows (X will vary)
var termEmp = spreadsheet.getRangeByName("roeList").getValues();
// e.g. termEmp = [ ["Bob", 1, "day", "key"] ["Cindy", 2, "day", "it"] ["Laura", 1, "night", "we"] ]
// Then I find the number of rows that actually have data
numRows = termEmp[0].length; // result = 3
// A for loop with counter i tests if the second element equals 2 of each row and deletes each array row if it's there
// In this example I want to delete the row with Cindy because of the 2
// To do this I use the splice method to delete the second row, like so:
termEmp.splice(i,1); // i = 1 in the for loop
// After testing all elements, and deleting the rows I want, I then need to count the number of rows remaining (to create a range for pasting into the worksheet)
numRows = termEmp[0].length;
// This is SUPPOSED to count the number of rows remaining (first element is ALWAYS non-blank)
Here's my problem. For this example the number of rows after the splice goes from 3 to 2. I looked at the array to confirm this.
But in my code termEmp[0].length STAYS at 3. I can't use it to define my range for pasting.
What's needed to get the count right?
For number of rows, you can get the length of the full array.
var numRows = termEmp.length
What you're getting with termEmp[0].length is the number of columns in the first row.
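A minimal sketch of the difference, using made-up three-column data modelled on the question (the values are illustrative only). The loop walks backwards, which is one common way to avoid skipping rows while splicing; the question does not show how its own loop is written.

```javascript
const termEmp = [
  ["Bob", 1, "day"],
  ["Cindy", 2, "day"],
  ["Laura", 1, "night"],
];

// Walk backwards so that removing a row does not shift the rows still to be checked.
for (let i = termEmp.length - 1; i >= 0; i--) {
  if (termEmp[i][1] === 2) {
    termEmp.splice(i, 1); // delete the whole row
  }
}

const numRows = termEmp.length;    // rows remaining after the splice
const numCols = termEmp[0].length; // columns in the first row, unchanged by the splice
console.log(numRows, numCols); // prints: 2 3
```

Here the example happens to start with 3 rows and 3 columns, which is why termEmp[0].length can look like a row count until a row is removed.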
EDIT
OP indicated the answer "doesn't work" (which is false); however, as a courtesy, here is further code that addresses the follow-up question of identifying which members of an array are themselves arrays (effectively two-dimensional spreadsheet data). The code below takes all members of termEmp that are arrays and inserts them into cleanedArray.
var cleanedArray = [];
for(var i=0;i<termEmp.length;i++){
var singleMember = termEmp[i];
if(Array.isArray(singleMember)){
//makes a clean array with only 2d values
cleanedArray.push(singleMember);
}
}
var numberOfMembers = cleanedArray.length;
Logger.log(numberOfMembers);

An If Statement inside an apply

I'm trying to use apply() to go through an array by rows, look at a column of 1's and 0's and then populate another column in that same array by using a function if the first column is a one, and a different function if it's a 0.
So it would be something like...
apply(OutComes, 1, if(risk = 1) {OutComes[, "Age"] = Function_1} else{OutComes[, "Age"] = Function_2} )
where OutComes is the array in question and risk is the variable which determines which function we use.
The aim is that 2 functions determine life length and people fall into one of the two categories, each with its own function. Based on the risk group, I want to use a different function to calculate the age, but this doesn't seem to be working.
apply() needs a function as its third argument; you need to define one here, because no ready-made function does what you want.
Example: apply(OutComes, 1, sum) will return the sum of each row.
The output vector has as many elements as there are rows, so you can assign it to a variable and then add it with cbind or replace the values of an existing column.
apply(OutComes, 1, function(x) {
  if (x[n] == 1) {
    Function_1()
  } else {
    Function_2()
  }
}) -> new_age
# x : the row being processed
# n : column number for "risk"; alternatively use if (x["risk"] == 1)
# also note == instead of = in the if condition
OutComes = cbind(OutComes, new_age)
# or
OutComes[, "Age"] <- new_age  # for a data frame, OutComes$Age <- new_age also works

Append local macro in Stata

In Stata, I want to explore regressions with many combinations of different dependent and independent variables.
For this, I decided to use a loop that does all these regressions, and then saves the relevant results (coefficients, R2, etc.) in a matrix in a concise and convenient form.
For this matrix, I want to name rows and columns to make reading easier.
Here is my code so far:
clear
sysuse auto.dta
set more off
scalar i = 1
foreach v in price mpg {
foreach w in weight length {
quietly: reg `v' `w' foreign
local result_`v'_`w'_b = _b[`w']
local result_`v'_`w'_t = ( _b[`w'] / _se[`w'] )
local result_`v'_`w'_r2 = e(r2)
if scalar(i) == 1 {
mat A = `result_`v'_`w'_b', `result_`v'_`w'_t', `result_`v'_`w'_r2'
local rownms: var label `v'
}
if i > 1 {
mat A = A \ [`result_`v'_`w'_b', `result_`v'_`w'_t', `result_`v'_`w'_r2']
*local rownms: `rownms' "var label `v'"
}
scalar i = i+1
}
}
mat coln A = b t r2
mat rown A = `rownms'
matrix list A
It will give a resulting matrix A that looks like this:
. matrix list A

A[4,3]
                b           t          r2
Price   3.3207368   8.3882744    .4989396
Price   90.212391   5.6974982   .31538316
Price  -.00658789  -10.340218   .66270291
Price  -.22001836  -9.7510366   .63866239
Clearly, there is something not quite finished yet. The row names of the matrix should be "price, price, mpg, mpg" because that is what the dependent variable is in the four regressions.
In the code above, consider the now-commented-out line
*local rownms: `rownms' "var label `v'"
It is commented out because in the current form, it gives an error.
I wish to append the local macro rownms with the label (or name) of the variable on every iteration, producing Price Price Mileage (MPG) Mileage (MPG).
But I cannot seem to get the quotes right to append the macro with the label of the current variable.
Matrix row and column names are limited in what they can hold. In general, variable labels won't be very suitable.
Here is some simpler code.
sysuse auto.dta, clear
capture matrix drop A
local rownms
foreach v in price mpg {
foreach w in weight length {
quietly: reg `v' `w' foreign
mat A = nullmat(A) \ (_b[`w'], _b[`w']/_se[`w'], e(r2))
local rownms `rownms' `v':`w'
}
}
mat coln A = b t r2
mat rown A = `rownms'
matrix list A
Notes:
The nullmat() trick removes the need for a branch of the code on first and later runs through.
Putting results into locals and then taking them out again is not needed. To get out of the habit, think of this analogy. You have a pen in your hand. You put it in a box. You take it out again. Now you have a pen in your hand. Why do the box thing if you don't need to?
This works with your example, but the results are not very good.
local rownms `rownms' "`: var label `v''"

Google Script Overwriting Entire Sheet, Not Specific Range

In the function below, I grab data that is in multiple columns via a form response on my second sheet and place the information on my first sheet organized in rows.
I would like to have the first blank column after the data, currently G on my new sheet editable so that someone can come in and "approve" the contents of each row. Right now, when this script runs, it overwrites the contents of Column G. I thought the number 6 in the line with sh0.getRange(2, 1, aMain.length, 6).setValues(aMain); was telling the script to only put data into 6 columns... looks like that's not the case.
I also thought that I may be able to do a workaround by changing that line to sh0.getRange(2, 2 ... it would let me keep the first column as an editable column... that didn't work either.
Any suggestions to allow me to use this script and keep a column editable?
function SPLIT() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sh0 = ss.getSheets()[0], sh1 = ss.getSheets()[1];
// get data from sheet 2
var data = sh1.getDataRange().getValues();
// create array to hold data
var aMain = new Array();
// iterate through data and add to array
// i is the loop, j=3 is the column it starts to loop with, j<9 tells it where to stop.
// in the aMain.push line, use data[i][j] for the rows to search and put in the one column.
for(var i=1, dLen=data.length; i<dLen; i++) {
for(var j=5; j<9; j++) {
aMain.push([data[i][0],data[i][1],data[i][2],data[i][3],data[i][4],data[i][j]]);
}
// add array of data to first sheet
// in the last line, change the last number to equal the number of columns in your final sheet.
// the first number in getrange is the row the data starts on... 1 is column.
sh0.getRange(2, 1, aMain.length, 6).setValues(aMain);
}
}
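The reshaping that the comments in the function describe (five fixed columns repeated once for each of columns 5 to 8) can be sketched in plain JavaScript, outside Apps Script. The sample data and the unpivot helper are hypothetical, made up for illustration; only the loop bounds (i from 1, j from 5 to 8) come from the posted code.

```javascript
// Hypothetical form-response data: a header row, then 5 fixed columns plus 4 answer columns.
const data = [
  ["ts", "name", "email", "dept", "week", "a1", "a2", "a3", "a4"],
  ["t1", "Ann", "ann@x", "ops", "w1", "p", "q", "r", "s"],
  ["t2", "Ben", "ben@x", "dev", "w1", "u", "v", "w", "x"],
];

// Build one output row per answer column: the fixed columns followed by that single answer.
function unpivot(data, fixedCols, firstAnswerCol, endAnswerCol) {
  const out = [];
  for (let i = 1; i < data.length; i++) { // start at 1 to skip the header row
    for (let j = firstAnswerCol; j < endAnswerCol; j++) {
      out.push(data[i].slice(0, fixedCols).concat([data[i][j]]));
    }
  }
  return out;
}

const aMain = unpivot(data, 5, 5, 9);
console.log(aMain.length, aMain[0].length); // prints: 8 6
```

Every row built this way has exactly six elements, matching the width of 6 passed to getRange. Note that in the posted function the setValues call sits inside the outer for loop, so the growing array is rewritten on every pass; a single call such as sh0.getRange(2, 1, aMain.length, aMain[0].length).setValues(aMain) after the loop finishes would write the same data once.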

Resources