Make Stata command "drop" to continue even if variable not found - loops

I have a set of 18 Stata data files (one per year) whose names are:
{aindresp.dta, bindresp.dta, ... , rindresp.dta}
I want to eliminate some variables from each dataset. For this, I want to use the fact that many variables across datasets have the same name, plus a prefix given by the dataset prefix (a, b, c, ..., r). For example, the variable rach12 is called arach12 in dataset aindresp.dta, etc. Thus, to clean each dataset, I run a loop like the following:
clear all
local list a b c d e f g h i j k l m n o p q r
foreach var of local list {
use `var'indresp.dta
drop `var'rach12 `var'jbchc1 `var'jbchc2 `var'jbchc3 `var'xpchcf `var'xpchc
save `var'indresp.dta, replace
}
The actual loop is much larger. I am deleting around 200 variables.
The problem is that some variables change name over time or disappear after a few years, while other variables are added. Therefore, the loop stops as soon as a variable is not found: drop exits with an error when any name in its list does not exist, and the command has no option to force it to continue.
How can I achieve my goal? I would not like to go manually over each dataset.

help capture
You can just put capture in front of the drop and keep going, but it is a little better to flag which datasets fail.
In this sample code, I've presumed that there is no point to the save, replace if you didn't drop anything. The basic idea is that a failure of a command results in a non-zero error code accessible in _rc. This will be positive (true) if there was a failure and zero (false) otherwise.
A more elaborate procedure would be to loop over the variables concerned and flag specific variables not found.
clear all
local list a b c d e f g h i j k l m n o p q r
foreach var of local list {
use `var'indresp.dta
capture drop `var'rach12 `var'jbchc1 `var'jbchc2 `var'jbchc3 `var'xpchcf `var'xpchc
if _rc {
noisily di "Note: failure for `var'indresp.dta"
}
else save `var'indresp.dta, replace
}
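The logic of capture plus _rc can be sketched outside Stata. The Python below is only an analogy: a dataset is reduced to a list of column names, the wave prefixes and variable stems are made up for illustration, and try/except stands in for capture. It also shows why one missing name matters: like Stata's drop, the hypothetical drop_all_or_nothing helper drops nothing if any requested name is missing.

```python
class VariableNotFound(Exception):
    """Raised when a name in the drop list does not exist (compare Stata's r(111))."""


def drop_all_or_nothing(columns, names):
    """Mimic Stata's drop: if any requested name is missing, fail and drop nothing."""
    missing = [n for n in names if n not in columns]
    if missing:
        raise VariableNotFound(missing)
    return [c for c in columns if c not in names]


def clean_waves(datasets, stems):
    """Apply the same prefixed drop list to every wave.

    datasets maps a wave prefix ("a", "b", ...) to its list of column names.
    Returns the cleaned column lists and the prefixes whose drop failed.
    """
    cleaned, failed = {}, []
    for prefix, columns in datasets.items():
        names = [prefix + stem for stem in stems]
        try:  # plays the role of `capture`
            cleaned[prefix] = drop_all_or_nothing(columns, names)
        except VariableNotFound:  # like a non-zero _rc: flag it and skip the "save"
            failed.append(prefix)
    return cleaned, failed


waves = {"a": ["apid", "arach12", "ajbchc1"], "b": ["bpid", "brach12"]}
print(clean_waves(waves, ["rach12", "jbchc1"]))  # ({'a': ['apid']}, ['b'])
```

Wave "b" has no bjbchc1, so nothing is dropped from it and it is reported, just as the Stata loop prints a note and skips the save.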
See also Does Stata have any `try and catch` mechanism similar to Java?
EDIT:
If you want to drop whatever exists, then this should suffice for your problem.
clear all
local list a b c d e f g h i j k l m n o p q r
foreach var of local list {
use `var'indresp.dta
capture drop `var'rach12 `var'jbchc1 `var'jbchc2 `var'jbchc3 `var'xpchcf `var'xpchc
if _rc {
di "Note: problem for `var'indresp.dta"
checkdrop `var'rach12 `var'jbchc1 `var'jbchc2 `var'jbchc3 `var'xpchcf `var'xpchc
}
save `var'indresp.dta, replace
}
where checkdrop is something like
*! 1.0.0 NJC 1 April 2016
program checkdrop
version 8.2
foreach v of local 0 {
capture confirm var `v'
if _rc == 0 {
local droplist `droplist' `v'
}
else local badlist `badlist' `v'
}
if "`badlist'" != "" {
di _n "{p}{txt}variables not found: {res}`badlist'{p_end}"
}
if "`droplist'" != "" {
drop `droplist'
}
end
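What checkdrop does can be summarized as "partition the names into present and missing, drop the present ones, report the rest." A minimal Python sketch of that idea, treating a dataset as a list of column names (check_drop is a made-up name for this illustration). One difference: Stata's confirm variable by default also accepts unambiguous abbreviations, while this sketch only matches exact names.

```python
def check_drop(columns, names):
    """Drop the requested names that exist; report (and skip) the ones that don't."""
    present = [n for n in names if n in columns]
    missing = [n for n in names if n not in columns]
    if missing:
        print("variables not found:", " ".join(missing))
    kept = [c for c in columns if c not in present]
    return kept, missing


# Wave "b" lacks bjbchc1, so only brach12 is dropped and bjbchc1 is reported.
print(check_drop(["bpid", "brach12"], ["brach12", "bjbchc1"]))
```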

Related

Storing structural break points with "foreach" loop in stata

read.csv("C:\Users\easy\Desktop\workbook.csv")
I need to estimate the structural break point of a regression for each country in my dataset, store these break points, and display them in a table once the loop finishes. My dataset is panel data, which is why I need to loop over the countries.
I estimate the regression for each country in my countrynum variable and try to store the break point from each country's regression as follows:
foreach i in countrynum {
by countrynum, sort: reg y x1 x2 x3 if `i'== countrynum
est store `r'(breakdate)
}
Stata is returning the following error message:
( invalid name
) invalid name
r(7);
Any idea what is wrong with my code?
Assuming you make the syntax fixes that Nick Cox aptly laid out, what you are missing is estat sbsingle (or some other structural break command) before asking Stata for r(breakdate); see here for more. After that you could do something like this, assuming that your panels are identified by countrynum.
* EX DATA
webuse usmacro, clear
tempfile append
save `append', replace
append using `append', gen(countrynum)
* Define the program that runby will call (ssc install runby)
capture program drop panel_breakdate
program panel_breakdate
tsset date
regress fedfunds L.fedfunds
estat sbsingle
gen breakdate = r(breakdate)
end
runby panel_breakdate, by(countrynum) verbose
* After this, format breakdate however you please.
There is a lot wrong with your code, unfortunately, although you haven't noticed various errors because they are errors of meaning, not errors of syntax.
For a start,
foreach i in countrynum {
does not trigger a loop over the distinct values of countrynum. It is a loop over one item, the variable name countrynum.
So your test becomes
if countrynum == countrynum
which is always true, and the loop is no loop, but equivalent to
by countrynum, sort: reg y x1 x2 x3
est store `r'(breakdate)
The next problem is that the first command runs several regressions, but only the results of the last one (for the last country in sort order) remain in memory.
The error that Stata noticed is that it does not know what you mean by
`r'(breakdate)
You are, it seems, referring to a result that requires extra syntax to get
`r(breakdate)'
Positive suggestion: using statsby is a much better idea.
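The difference between looping over a name and looping over the distinct values of a variable can be mimicked in Python with plain lists. This is a rough analogy only, and the data are made up. The first function reproduces the problem in the question; the second mirrors the levelsof-style loop used in another answer.

```python
def rows_when_looping_over_name(countrynum):
    """Like `foreach i in countrynum`: one pass, and `if countrynum == countrynum` keeps every row."""
    subsets = []
    for _name in ["countrynum"]:  # a list of one item: the variable *name*
        subsets.append([row for row, c in enumerate(countrynum) if c == c])
    return subsets


def rows_when_looping_over_levels(countrynum):
    """Like `levelsof countrynum` + `foreach lev in `r(levels)'`: one subset per country."""
    subsets = {}
    for lev in sorted(set(countrynum)):  # distinct values, as levelsof returns them
        subsets[lev] = [row for row, c in enumerate(countrynum) if c == lev]
    return subsets


countrynum = [1, 1, 2, 2, 3]
print(rows_when_looping_over_name(countrynum))    # [[0, 1, 2, 3, 4]]
print(rows_when_looping_over_levels(countrynum))  # {1: [0, 1], 2: [2, 3], 3: [4]}
```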
General Solution
I believe I have a solution to your problem. The code needs to be run all at once because it relies on local macros. It worked for me on the usmacro test data, where I assigned the first 34 observations to country 1 and the rest to country 22. It should work for you as well, as long as your data are already tsset.
levelsof countrynum
foreach lev in `r(levels)' {
reg y x1 x2 x3 if countrynum == `lev'
estat sbsingle
scalar break`lev' = r(breakdate)
}
scalar list
As long as you have no scalars defined beforehand, scalar list will show only the break dates for the countries, each named break followed by the country number (for example, break22). Let me know if this doesn't work for you; it's difficult without any example data from you, but it works in my test environment.
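estat sbsingle performs a formal test for a single structural break at an unknown date, and its implementation is more involved than anything shown here. Purely to illustrate the shape of the per-country computation, here is a toy Python sketch for the simplest case, a shift in the mean of y with no regressors: try every candidate break inside a trimmed range, keep the one with the smallest total sum of squared residuals, and store one result per country under a break<country> key, like the scalars above. The 15% trimming is an assumption chosen for illustration, and the result is an observation index within each country, not a date.

```python
def mean_shift_break(y, trim=0.15):
    """Index k that best splits y into y[:k] and y[k:] with separate means.

    Candidate splits leave at least trim * len(y) observations (and at least one)
    on each side; the best split has the smallest total sum of squared residuals.
    """
    n = len(y)
    edge = max(1, int(trim * n))

    def ssr(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best_k, best_ssr = None, float("inf")
    for k in range(edge, n - edge + 1):
        total = ssr(y[:k]) + ssr(y[k:])
        if total < best_ssr:
            best_k, best_ssr = k, total
    return best_k


def breaks_by_country(countrynum, y, trim=0.15):
    """One break per country, stored under "break<country>"."""
    series = {}
    for c, v in zip(countrynum, y):
        series.setdefault(c, []).append(v)
    return {f"break{c}": mean_shift_break(vals, trim) for c, vals in sorted(series.items())}


countrynum = [1] * 20 + [22] * 20
y = [0.0] * 8 + [3.0] * 12 + [2.0] * 15 + [9.0] * 5
print(breaks_by_country(countrynum, y))  # {'break1': 8, 'break22': 15}
```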
Example
If you want to see how this works before you run it on your dataset, run the following commands all at once:
clear all
webuse usmacro
gen countrynum = 01 if _n < 35
replace countrynum = 22 if countrynum == .
tsset date
levelsof countrynum
foreach lev in `r(levels)' {
reg fedfunds L.fedfunds inflation if countrynum == `lev'
estat sbsingle
scalar break`lev' = r(breakdate)
}
scalar list
which will return the following in the Stata output:
. scalar list
break22 = 1980q4
break1 = 1958q1

Using a loop's positional parameters inside an inner loop in Raku

Here is the code:
my @s=<a b c d>;
for @s.kv {
for ($^k ... @s.elems) {
printf("%s ", $^v);
}
printf("\n");
}
Expected output is:
# a b c d
# b c d
# c d
# d
But it gives this error (possibly among others)
key 0, val 1 Too few positionals passed; expected 2 arguments but got 1
It looks like the positional variables of the main loop $^k and $^v can't be used inside the inner loop. How to fix it? Thanks.
Update: Typo inside inner loop fixed
So, for what you want to do, I'd approach it like this:
my @s = <a b c d>;
for ^@s.elems -> $start-index {
for @s[$start-index..*] -> $value {
printf("%s ", $value );
}
print("\n");
}
Though really I'd do this.
my @s = <a b c d>;
(^@s.elems).map( { @s[$_..*].join(" ").say } )
This takes the range from 0 up to (but not including) the number of elements in the array, then for each index takes the slice from that index to the end, joins it on spaces, and says it.
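For comparison, the same range-then-slice idea in Python (an analogy only, not Raku): range(len(items)) plays the role of ^@s.elems, and items[i:] the role of @s[$_..*].

```python
def suffixes(items):
    """For each start index i, the slice from i to the end."""
    return [items[i:] for i in range(len(items))]


for tail in suffixes(["a", "b", "c", "d"]):
    print(" ".join(tail))
# a b c d
# b c d
# c d
# d
```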
A note on placeholder variables like $^k: they are scoped to the current block only, which is why your code above does not work. Generally you only really want to use them in map, grep, or similar. Where possible, I'd advise naming your variables (for example with a pointy block, -> $k); named variables are also visible inside inner blocks.
Scimon Proctor's answer is essentially correct, but I'll try to explain why your example does not work. For starters, kv returns "an interleaved sequence of indexes and values", so this:
my @s=<a b c d>;
.say for @s.kv;
prints
0
a
1
b
2
c
3
d
Essentially, you're doing one turn of the loop for every key and value. Grouping them in pairs using rotor might be closer to what you're looking for:
.say for @s.kv.rotor(2)
which prints:
(0 a)
(1 b)
(2 c)
(3 d)
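The difference between the flat, interleaved output of kv and the pairs produced by rotor(2) can be mimicked in Python. This is an analogy only; pairs is a made-up helper, not a Raku or Python built-in.

```python
def pairs(flat):
    """Group an interleaved [k0, v0, k1, v1, ...] sequence into (k, v) tuples."""
    it = iter(flat)
    return list(zip(it, it))  # zip pulls two items from the same iterator per tuple


flat = [x for i, v in enumerate("abcd") for x in (i, v)]  # 0, 'a', 1, 'b', ... like @s.kv
print(flat)         # [0, 'a', 1, 'b', 2, 'c', 3, 'd']
print(pairs(flat))  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
```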
Since this pairs each value with its index, we can do:
my @s=<a b c d>;
for @s.kv.rotor(2) -> ($k, $) {
"{@s[$_]} ".print for ($k..^@s.elems);
printf("\n");
}
Please note that there was also an error in the inner loop: its range went beyond the actual indices in @s. But, again, Scimon's answer that uses map is much shorter, idiomatic and straightforward. This one just tries to do what your original program meant. As a matter of fact, we are throwing away the values, so this would actually be:
my @s=<a b c d>;
for @s.keys -> $k {
"{@s[$_]} ".print for ($k..^@s.elems);
printf("\n");
}
There is no need to use kv at all; the keys are enough.

Dafny: Using "forall" quantifiers with the "reads" or "modifies" clauses

So I am trying to implement Dijkstra's single-source shortest-paths algorithm in Dafny, based directly on the description of the algorithm in the CLRS algorithms book, as part of an undergraduate project. As part of the implementation, I have defined a "Vertex" object with two fields representing the current length of the shortest path from the source and the predecessor vertex:
class Vertex{
var wfs : int ;
var pred: Vertex;
}
As well as a "Graph" object that contains an array of "Vertex"-es:
class Graph{
var vertices: array<Vertex>;
....
I am trying to state some properties of the fields in each "Vertex" of the vertices array using a predicate in the "Graph" object:
predicate vertexIsValid()
reads this;
reads this.vertices;
{
vertices != null &&
vertices.Length == size &&
forall m :: 0 <= m < vertices.Length ==> vertices[m].wfs != 900000 &&
vertices[m].pred != null
}
To my understanding, the "reads" and "modifies" clauses in Dafny only work on one layer, and I'd have to tell Dafny that I am reading each entry in the vertices array (reads this.vertices[x]). I tried using a "forall" clause to do it:
forall m :: 0 <= m < vertices.Length ==> reads this.vertices[m]
but this doesn't seem to be a feature in Dafny. Does anyone know if there is a way to use quantifiers with the "reads" clause or otherwise tell Dafny to read the fields in each entry of an array containing objects?
Thanks for the help.
You can do that most easily by using a set as a reads clause.
For your example, this additional reads clause on vertexIsValid worked for me:
reads set m | 0 <= m < vertices.Length :: vertices[m]
You can think of this set expression as saying "the set of all elements vertices[m] where m is in bounds".

combining different cell arrays into one in MATLAB

I am trying to look for specific text in each element of an array and write the output to an Excel sheet in the same order (i.e. if there are elements in between without a match, their cells are left blank).
I used the following code within the loop:
EDIT:
[num,txt,~] = xlsread('protein-peptides.xls')
for i=1:size(txt)
str(i)=txt(i)
expression='\w*Pyro-glu from E\w*';
matchStr(i)=regexp(str(i),expression,'match','once');
ArrayOfStrings=vertcat(matchStr{:});
end
After the loop:
xlswrite(filename,ArrayOfStrings,1);
The output looks like the sample below:
1) The elements without a match are not shown as blanks.
2) Each character of the match is displayed in a different cell.
P y r o - g l u f r o m E
P y r o - g l u f r o m E
P y r o - g l u f r o m E
P y r o - g l u f r o m E
How do I keep the blank cells for elements without a match, and have the entire matching phrase in a single cell of the output?
I tried concatenating the cells, but that prints all the output in a single row, still with each character in a different cell.
I know you'd probably prefer to use xlswrite, but I wasn't able to get this to work (at least on OS X). It appears that xlswrite automatically strips out empty cells, as you observed. It also appears to be spreading each cell over multiple columns, which is bizarre and different from the behavior I remember. I wonder if there was a recent update (I'm using R2015b).
I was able to get this to work using simple fprintf calls to write a CSV file, which can be opened in Excel. Note that the termination character (\r) is critical; empty cells do not seem to be preserved in Excel if this is replaced by a linefeed. I also refactored your code a bit.
% Load data from Excel file
[~, txt, ~] = xlsread('protein-peptides.xls');
% Perform analysis
expression='\w*Pyro-glu from E\w*';
for i = 1:length(txt)
matches(i, 1) = regexp(txt(i), expression, 'match', 'once');
end
% Write data to CSV file
fid = fopen('test.csv', 'w+');
for i = 1:length(matches)
fprintf(fid, '%s\r', matches{i});
end
fclose(fid);
Input file rows
Pyro-glu from E
test
Pyro-glu from E
test
Output file rows
Pyro-glu from E
(empty)
Pyro-glu from E
(empty)
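The essential idea here is to emit exactly one output row per input row, with an empty string when there is no match, and to write each match as a single string rather than as a character array. A minimal Python sketch of that logic, with Python's re module standing in for MATLAB's regexp(..., 'once'); the function names are made up for illustration, and the "\r" terminator simply follows the answer above.

```python
import re

PATTERN = re.compile(r"\w*Pyro-glu from E\w*")


def extract_matches(cells):
    """Return one entry per input cell: the first match, or "" when there is none."""
    out = []
    for cell in cells:
        m = PATTERN.search(cell)
        out.append(m.group(0) if m else "")
    return out


def rows_to_text(rows, terminator="\r"):
    """Join rows into single-column text, one row per line, keeping empty rows."""
    return "".join(row + terminator for row in rows)


cells = ["Pyro-glu from E", "test", "Pyro-glu from E", "test"]
matches = extract_matches(cells)
print(matches)  # ['Pyro-glu from E', '', 'Pyro-glu from E', '']
```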

awk, custom delarray function

Can someone explain, why I am not getting expected results?
awk '
# I know there is delete array, but this is more portable
# that is what docs are saying, anyway wanted to test it out.
function delarray(a, i)
{
for (i in a)
delete a[i]
}
BEGIN {
a[3]=""
a[4]=""
for (e in a)
print e
delarray(a)
for (e in a)
print ".."
print e
}
'
Executing the above script, I expected to see:
3
4
.. (and nothing else)
(I used .. as a placeholder, thinking I wouldn't see anything else because the array values had been deleted.)
But the actual output is:
3
4
4    # Why this? And where are the two dots?
Also, the exit code was 1. Why is that?
Your delete function worked.
Since you're missing the braces around your second for (e in a) loop, its body contains only the print ".." statement. The array is empty by then, so that body never runs, which is why you don't see any dots.
The print e command simply prints the last value that was assigned to e (from the previous for (e in a) loop), which is 4.
But your function is not very useful since virtually all versions of awk allow the delete a command without an index. It's in the POSIX standard.
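The same control flow can be mimicked in Python, where, as in awk, a for loop over an empty collection never runs its body and leaves the loop variable holding whatever it had before. This is only an analogy: Python dicts iterate in insertion order, whereas the order of awk's for (e in a) is unspecified, so in awk the leftover value of e is simply whichever key the first loop visited last.

```python
def simulate(a):
    """Mimic the awk script's control flow on a dict standing in for the awk array."""
    a = dict(a)  # work on a copy
    out = []
    e = ""  # awk variables start out empty
    for e in a:
        out.append(str(e))
    a.clear()  # like delarray(a), or `delete a` without a subscript
    for e in a:  # body never runs: the dict is empty, so e is not reassigned
        out.append("..")
    out.append(str(e))  # outside the loop, like the unbraced `print e`
    return out


print(simulate({3: "", 4: ""}))  # ['3', '4', '4']
```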
You are missing curly braces around your second loop.
for (e in a) {
print ".."
print e
}
Output:
3
4
