I have the auto dataset and would like to create a few bar graphs:
sysuse auto, clear
local mpg "22 20 17"
local titles "Title1 Title2 Title3"
local path "twentytwo twenty seventeen"
foreach x of local mpg {
foreach y of local titles {
foreach z of local path {
keep if mpg==`x' & foreign==0
egen hv_rank=rank(price)
# delimit ;
graph bar price,
over (make, sort(hv_rank) reverse label(labsize(vsmall)))
ytitle("")
horizontal title("`y'", size(medium))
;
# delimit cr
graph save "$dir_gphs\mpg`z'f0-bal.gph", replace
drop hv_rank
sysuse auto, clear
}
}
}
I do not want a bar chart for every possible combination of the values of my three locals. Instead, I'd like x=22 to go with y=Title1 and z=twentytwo; likewise, x=20 should go with y=Title2 and z=twenty.
This must be a simple problem, but my searches so far have not turned up anything usable, probably because I do not know the right vocabulary for it.
Here is how I would approach the problem.
. local mpg 22 20 17
. local titles `" "Title 1" "Title 2" "Title 3" "'
. local path twentytwo twenty seventeen
.
. forvalues i = 1/3 {
2. local x : word `i' of `mpg'
3. local y : word `i' of `titles'
4. local z : word `i' of `path'
5. display `" `x' --- `y' --- `z' "'
6. }
22 --- Title 1 --- twentytwo
20 --- Title 2 --- twenty
17 --- Title 3 --- seventeen
Or alternatively
. local set1 22 "Title 1" twentytwo
. local set2 20 "Title 2" twenty
. local set3 17 "Title 3" seventeen
. forvalues i = 1/3 {
2. local x : word 1 of `set`i''
3. local y : word 2 of `set`i''
4. local z : word 3 of `set`i''
5. display `" `x' --- `y' --- `z' "'
6. }
22 --- Title 1 --- twentytwo
20 --- Title 2 --- twenty
17 --- Title 3 --- seventeen
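Both versions above hardcode the upper bound 3. A minimal sketch, assuming the three lists are meant to stay parallel, takes the bound from the first list with the word count extended macro function and stops if the lengths differ. The macro names n, nt and np are only illustrative.

```stata
local mpg    22 20 17
local titles `" "Title 1" "Title 2" "Title 3" "'
local path   twentytwo twenty seventeen

* count the elements of each list
local n  : word count `mpg'
local nt : word count `titles'
local np : word count `path'

* stop early if the lists are not parallel
if `n' != `nt' | `n' != `np' {
    display as error "lists differ in length"
    exit 198
}

* one loop over positions, as before
forvalues i = 1/`n' {
    local x : word `i' of `mpg'
    local y : word `i' of `titles'
    local z : word `i' of `path'
    display `" `x' --- `y' --- `z' "'
}
```

Adding a fourth value then only means extending the three lists; the loop bound follows automatically.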
As you say, you really want a single loop. Realising that depends on experience rather than finding some documentation.
I can't test this because it hinges on your local directory structure and a global macro that is not defined, so your example is not reproducible. I have made some incidental simplifications.
If your individual elements contained spaces, you would need double quotes to bind.
sysuse auto, clear
forval j = 1/3 {
    local x : word `j' of 22 20 17
    local title : word `j' of Title1 Title2 Title3
    local path : word `j' of twentytwo twenty seventeen
    graph bar price if mpg==`x' & foreign==0, ///
        over(make, sort(1) reverse label(labsize(vsmall))) ///
        ytitle("") horizontal title("`title'", size(medium))
    graph save "$dir_gphs\mpg`path'f0-bal.gph", replace
}
I have different folders with datasets called e.g.
3-1-1
3-1-2
3-2-1
3-2-2
the first placeholder is fixed, the second and third are elements of a list:
k1values = "1 2"
k2values = "1 2"
I want to do simple operations in my gnuplot script, e.g. cd into each of the above directories and read a line of a text file: cd into the folder, read the file, cd back out, and so on.
My first (1) idea was to connect system command and sprintf:
do for[i=1:words(k1values)]{
do for[j=1:words(k2values)]{
system sprintf("cd 3-%d-%d", i, j)
system 'pwd'
system 'cd ..'
}
}
With that, the same path is printed every time, so no cd is happening at all.
or system 'cd sprintf("3-%d-%d", i, j)'
Unfortunately, this is not working.
Error message: sh: 1: Syntax error: "(" unexpected
I also tried concatenating the values into a string and passing it as the path, which does not work either:
k1values = "1 2"
k2values = "1 2"
string1 = '3'
do for[i=1:words(k1values)]{
do for[j=1:words(k2values)]{
path = sprintf("%s-%d-%d", string1, i, j)
system sprintf("cd %s", path)
system 'pwd'
system 'cd ..'
}
}
I print the path for testing, but the working directory is not changed at all.
Thanks in advance!
Edit: The idea, in pseudocode, is this:
do for k1
do for k2
valueX = <readingCommand>
make dir "3-k1-k2/Pictures"
for int i = 0; i<valueX; i++
set output bla
plot "3-k1-k2/Data/i.txt" <options>
end for
end do for
end do for
Unless there is a reason we don't know yet, why do you want to change back and forth between the subdirectories?
Why not build the path/filename with a function, then load the desired file and plot the desired lines?
For example, if you have the following directory structure:
CurrentFolder
3-1-1
Data.dat
3-1-2
Data.dat
3-2-1
Data.dat
3-2-2
Data.dat
and the following files:
3-1-1/Data.dat
1 1.14
2 1.15
3 1.12
4 1.11
5 1.13
3-1-2/Data.dat
1 1.24
2 1.25
3 1.22
4 1.21
5 1.23
3-2-1/Data.dat
1 2.14
2 2.15
3 2.12
4 2.11
5 2.13
3-2-2/Data.dat
1 2.24
2 2.25
3 2.22
4 2.21
5 2.23
The following example loads all the files Data.dat from the corresponding subdirectories and plots the lines 2 to 4 (the lines have 0-based index, check help every).
Script:
### plot specific lines from files from different directories
reset session
k1values = "1 2"
k2values = "1 2"
string1 = '3'
myPath(i,j) = sprintf("%s-%s-%s",string1,word(k1values,i),word(k2values,j))
myFile(i,j) = sprintf("%s/%s",myPath(i,j),"Data.dat")
set key out
plot for [i=1:words(k1values)] for[j=1:words(k2values)] myFile(i,j) \
u 1:2 every ::1::3 w lp pt 7 ti myPath(i,j)
### end of script
Result: (plot image not included)
This is my final solution:
k1values = '0.5 1'
k2values = '0.5 1'
omega = 3
do for[i in k1values]{
do for[j in k2values]{
savingPoint = system('head -n 1 "3-'.i.'-'.j.'/<fileName>.dat" | tail -1')
number = savingPoint/<value>
do for[m = savingPoint:0:-<value>]{
set title <...>
set output <...>
plot ''.omega.'-'.i.'-'.j.'/Data/'.m.'.txt' <...>
}
}
}
<...> is a placeholder and irrelevant.
So this is how I finally iterate over the folders.
Within the second loop, a read command is executed and its result is assigned to a variable that the third loop needs. Note that i and j are strings here, but that does not matter.
I have a very large dataset but to cut it short I demonstrated the data with the following example:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(patid death dateofdeath)
1 0 .
2 0 .
3 0 .
4 0 .
5 1 15007
6 0 .
7 0 .
8 1 15526
9 0 .
10 0 .
end
format %d dateofdeath
I am trying to sample for a case-control study based on date of death. At this stage, I need to first create a variable with each date of death repeated for all the participants (hence we end up with a dataset with 20 participants) and a pairid equivalent to the patient id patid of the corresponding case.
I created a macro for one case (which works) but I am finding it difficult to have it repeated for all cases (where death==1) in a loop.
The successful macro is as follows:
local i "5" //patient id who died
gen pairid= `i'
gen matchedindexdate = dateofdeath
replace matchedindexdate=0 if pairid != patid
gsort matchedindexdate
replace matchedindexdate= matchedindexdate[_N]
format matchedindexdate %d
save temp`i'
and the loop I attempted is:
* (min and max patid id)
forval j = 1/10 {
count if patid == `j' & death==1
if r(N)=1 {
gen pairid= `j'
gen matchedindexdate = dateofdeath
replace matchedindexdate=0 if pairid != patid
gsort matchedindexdate
replace matchedindexdate= matchedindexdate[_N]
save temp/matched`j'
}
}
use temp/matched1, clear
forval i=2/10 {
capture append using temp/matched`i'
save matched, replace
}
but I get:
invalid syntax
How can I do the loop?
I finally had it solved, please check:
https://www.statalist.org/forums/forum/general-stata-discussion/general/1591811-how-to-create-a-loop-for-a-macro
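For readers who do not follow the link, here is a sketch of one way to write the loop; it is not necessarily what the linked thread proposes. It assumes the example data above, loops only over the cases found by levelsof, and stacks the copies in a temporary file (base and matched are illustrative names). Note also that an equality test in Stata is written ==, so the condition in the attempted loop would read if r(N) == 1.

```stata
clear
input float(patid death dateofdeath)
1 0 .
2 0 .
3 0 .
4 0 .
5 1 15007
6 0 .
7 0 .
8 1 15526
9 0 .
10 0 .
end
format %td dateofdeath

* keep a clean copy of the cohort to reload for each case
tempfile base matched
save `base'

* patient ids of the cases
levelsof patid if death == 1, local(cases)

* start with an empty file to append to
clear
save `matched', emptyok

foreach j of local cases {
    use `base', clear
    generate pairid = `j'
    * date of death of the current case, copied to every participant
    summarize dateofdeath if patid == `j', meanonly
    generate matchedindexdate = r(mean)
    format matchedindexdate %td
    append using `matched'
    save `matched', replace
}

use `matched', clear
count    // 20 observations: 10 participants for each of the 2 cases
```

Reloading the base file on every pass avoids the "already defined" error that a second generate pairid would otherwise raise, and looping over the cases only means no file is expected for patients who did not die.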
Let's say I have 60 variables, none with similar naming patterns. I want to assign labels to all variables, which I stored locally. So for example
local mylabels "dog cat bird"
However I am struggling with the exact expression of the loop. Do I have to store my variable range globally and then use a foreach? Or do I use forvalues?
Edit: I was referring to variable labels. I managed to create a loop, similar to the method used here http://www.stata.com/support/faqs/programming/looping-over-parallel-lists/. However I ran into a more difficult problem: my variables have no particular naming patterns, and the labels have special characters (spaces, commas, %-signs), and here is where my loop does not work.
Some example data (excuse the randomness):
gen Apples_ts_sum = .
gen Pears_avg_1y = .
gen Bananas_max_2y = .
And some example labels:
"Time series of apples, sum, %" "Average of pears, over 1 year"
"Maximum of bananas, over 2 years".
I ran into this entry by Nick Cox: http://www.stata.com/statalist/archive/2012-10/msg00285.html and tried to apply the mentioned parentheses method, like so:
local mylabels `" "Time series of apples, sum, %" "Average of pears, over 1 year" "Maximum of bananas, over 2 years" "'
But could not get it to work.
If you want to give all the variables the same label, for example "dog cat bird", then you can use the varlist option of the describe command. Let's say your 60 variables can be listed with the expression EXP. Then:
qui des EXP, varlist
foreach variable in `r(varlist)'{
label var `variable' "dog cat bird"
}
Edited:
Taking your example data, I created another local containing the variable names.
local myvar `" "Apples_ts_sum" "Pears_avg_1y" "Bananas_max_2y" "'
local mylabels `" "Time series of apples, sum, %" "Average of pears, over 1 year" "Maximum of bananas, over 2 years" "'
forval n = 1/3{
local a: word `n' of `mylabels'
local b: word `n' of `myvar'
di "variable `b', label `a'"
label var `b' "`a'"
}
Note that I manually created the list of variables. You can automatically create this list using the method I listed above, with des, varlist.
qui des , varlist
foreach var in `r(varlist)'{
local myvar_t "`myvar_t' `var'"
}
You can then use the local myvar_t instead of myvar in the above example.
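If the labels are listed in the same order as the variables in the dataset, the two steps can be combined. A sketch, assuming the three example variables are the only ones in memory (otherwise replace _all with the relevant varlist); the counter n and the macro lab are illustrative names:

```stata
clear
set obs 1
generate Apples_ts_sum  = .
generate Pears_avg_1y   = .
generate Bananas_max_2y = .

local mylabels `" "Time series of apples, sum, %" "Average of pears, over 1 year" "Maximum of bananas, over 2 years" "'

* walk through the variables and pick the label at the same position
local n 0
foreach v of varlist _all {
    local ++n
    local lab : word `n' of `mylabels'
    label variable `v' `"`lab'"'
}

describe
```

The compound double quotes around the label keep commas and other special characters intact when the label is passed to label variable.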
The data are set up with a bunch of information corresponding to an ID, which can show up more than once.
ID Data
1 X
1 Y
2 A
2 B
2 Z
3 X
I want a loop that signifies which instance of the ID I am looking at. Is it the first time, second time, etc? I want it as a string in the form _# so I have to go beyond the simple _n function in Stata, to my knowledge. If someone knows a way to do what I want without the loop let me know, but I would still like the answer.
I have the following loop in Stata
by ID: gen count_one = _n
gen count_two = ""
quietly forval j = 1/3 {
replace count_two = "_`j'" if count_one == `j'
}
The output now looks like this:
ID Data count_one count_two
1 X 1 _1
1 Y 2 _2
2 A 1 _1
2 B 2 _2
2 Z 3 _3
3 X 1 _1
The question is how can I replace the 16 above with to tell Stata to take the max of the count_one column because I need to run this weekly and that max will change and I want to reduce errors.
It's hard to understand why you want this, but it is one line whether you want numeric or string:
bysort ID : gen nummax = _N
bysort ID : gen strmax = "_" + string(_N)
Note that the sort order within ID is irrelevant to the number of observations for each.
Some parts of your question aren't clear ("...replace the 16 above with to tell Stata...") but:
Why don't you just use _n with tostring?
gsort +ID +Data
bys ID: g count_one=_n
tostring count_one, gen(count_two)
replace count_two="_"+count_two
Then to generate the max (answering the partial question at the end there) -- although note this value will be repeated across instances of each ID value:
bys ID: egen maxcount1=max(count_one)
or more elegantly:
bys ID: g maxcount2=_N
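Putting the pieces together on the example data: a sketch in which the _# string is built directly from _n, with no loop and no hardcoded maximum. The variable order is an assumption added here to preserve the original row order within each ID.

```stata
clear
input byte ID str1 Data
1 "X"
1 "Y"
2 "A"
2 "B"
2 "Z"
3 "X"
end

* remember the original row order
generate long order = _n

* instance number within ID as a string of the form _#
bysort ID (order): generate count_two = "_" + string(_n)

* number of observations per ID, repeated on every row of that ID
bysort ID: generate maxcount = _N

list, sepby(ID)
```

Because nothing depends on a fixed upper bound, the same lines can be rerun each week as the data grow.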
I have a number of variables whose name begins with the prefix indoor. What comes after indoor is not numeric (that would make everything simpler).
I would like a tabulation for each of these variables.
My code is the following:
local indoor indoor*
foreach i of local indoor {
tab `i' group, col freq exact chi2
}
The problem is that indoor in the foreach command resolves to indoor* and not to the list of the indoor questions, as I hoped. For this reason, the tab command is followed by too many variables (it can only handle two) and this results in an error.
The simple fix is to substitute the first command with:
local indoor <full list of indoor questions>
But this is what I would like to avoid, that is to have to find all the names for these variables and then paste them in the code. It seems there is a quicker fix for this but I can't think of any.
The trick is to use ds or unab to create the varlist expansion before asking Stata to loop over values in the foreach loop.
Here's an example of each:
******************! BEGIN EXAMPLE
** THIS FIRST SECTION SIMPLY CREATES SOME FAKE DATA & INDOOR VARS **
clear
set obs 10000
local suffix `c(ALPHA)'
token `"`suffix'"'
while "`1'" != "" {
g indoor`1'`2'`3' = 1+int((5-1+1)*runiform())
lab var indoor`1'`2'`3' "Indoor Values for `1'`2'`3'"
mac shift 1
}
g group = rbinomial(1,.5)
lab var group "GROUP TYPE"
** NOW, YOU SHOULD HAVE A BUNCH OF FAKE INDOOR
**VARS WITH ALPHA, NOT NUMERIC SUFFIXES
desc indoor*
**USE ds TO CREATE YOUR VARLIST FOR THE foreach LOOP:
ds indoor*
di "`r(varlist)'"
local indoorvars `r(varlist)'
local n 0
foreach i of local indoorvars {
**LET'S CLEAN UP YOUR TABLES A BIT WITH SOME HEADERS VIA display
local ++n
di in red "--------------------------------------------"
di in red "Table `n': `:var l `i'' by `:var l group'"
di in red "--------------------------------------------"
**YOUR tab TABLES
tab `i' group, col freq chi2 exact nolog nokey
}
******************! END EXAMPLE
OR using unab instead:
******************! BEGIN EXAMPLE
unab indoorvars: indoor*
di "`indoorvars'"
local n 0
foreach i of local indoorvars {
local ++n
di in red "--------------------------------------------"
di in red "Table `n': `:var l `i'' by `:var l group'"
di in red "--------------------------------------------"
tab `i' group, col freq chi2 nokey //I turned off exact to speed things up
}
******************! END EXAMPLE
The advantages of ds come into play if you want to select your indoor vars using a tricky selection rule, like selecting indoor vars based on information in the variable label or some other characteristic.
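As a sketch of such a selection rule, the has(varlabel ...) option of ds keeps only the variables whose label matches a pattern. The data, variable names and labels below are made up for illustration.

```stata
clear
set obs 100
set seed 12345
generate group = runiform() < 0.5

* three fake indoor variables with different labels
foreach s in A B C {
    generate indoor`s' = 1 + int(5*runiform())
}
label variable indoorA "Kitchen temperature"
label variable indoorB "Kitchen humidity"
label variable indoorC "Bedroom temperature"

* keep only the indoor variables whose label starts with Kitchen
ds indoor*, has(varlabel Kitchen*)
local picked `r(varlist)'

foreach v of local picked {
    tabulate `v' group, column
}
```

Storing r(varlist) in a local before the loop matters, since tabulate overwrites the stored results.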
You could do this with
foreach i of var `indoor' {
tab `i' group, col freq exact chi2
}
This would work. It is almost identical to the code in the question.
unab indoor : indoor*
foreach i of local indoor {
tab `i' group, col freq exact chi2
}
foreach v of varlist indoo* {
tab `v' group, col freq exact chi2
}