I'm trying to create a variable like this in Stata:
date
2012_1
2012_2
2013_1
2013_2
with the following loop:
forval y=2012/2013{
forval m=1/2{
display `m'
gen date = `y'_`m'
}
}
But I'm getting this error in the first iteration: 2012_1 invalid name. Sorry if the question is obvious, I'm a newbie in Stata.
You face more problems than you realise here, but all are simple.
The immediate problem with your loop is that a value such as 2012_1 is intended by you as a value of a variable, but if so it must explicitly be a string, surrounded by "". The reason is that underscore _ is only acceptable as part of a string. Stata is clearly puzzled by your command. The error message does not quite fit the situation, although it is correct that 2012_1 is not an acceptable name, meaning the name of a variable or scalar.
If you fixed that, your next problem would be that the second time around your loop the variable already exists, so generate is not allowed. You would need to use replace. So the generate statement should be taken outside the loop.
Then again, even with those problems fixed, all your loop does is overwrite every observation of the variable with a single value each time. At the end of your loops all observations would contain the constant value 2013_2.
Longer term, there is still a problem. Evidently you want a monthly date variable, but string date variables like that are of little use in Stata. They sort in the correct order, but they are essentially useless for statistics or graphics.
This is a better idea all round:
generate mdate = .
local i = 1
forval y = 2012/2013 {
forval m = 1/2 {
replace mdate = ym(`y', `m') in `i'
local ++i
}
}
That is still not good style. I guess that you don't really want only months 1 and 2, but we can't know what you really want.
Do this in Stata:
clear
set obs 48
generate mdate = ym(2011, 12) + _n
format mdate %tm
list
to get an idea of a better approach -- with no loops at all.
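Stata stores a monthly date as an integer count of months since January 1960 (1960m1 = 0), which is why ym() values can be sorted, differenced and plotted directly. The arithmetic can be sketched in Python; the functions ym and format_tm below are illustrative stand-ins for Stata's ym() function and %tm display format, not Stata code.

```python
def ym(year: int, month: int) -> int:
    """Months elapsed since January 1960, the value Stata's ym() returns."""
    return (year - 1960) * 12 + (month - 1)


def format_tm(mdate: int) -> str:
    """Render a monthly date the way Stata's %tm format displays it."""
    # divmod floors, so negative values (before 1960) also come out right
    year_offset, month_index = divmod(mdate, 12)
    return f"{1960 + year_offset}m{month_index + 1}"


# Mirror of: set obs 48 / generate mdate = ym(2011, 12) + _n
mdates = [ym(2011, 12) + n for n in range(1, 49)]

print(format_tm(mdates[0]), format_tm(mdates[-1]))  # 2012m1 2015m12
print(mdates[1] - mdates[0])  # 1: consecutive months differ by one
```

Because the stored values are plain integers, the month gap between two dates is a simple subtraction, which a string such as 2012_1 cannot offer.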
There are quite a few problems with your code. I'll go through them one by one.
`y'_`m' evaluates to 2012_1 in the first iteration. Since it contains an underscore, it cannot be interpreted as numeric. To be interpreted as a string value, it would have to be enclosed in "". In the end, Stata tries to interpret it as a variable name, but 2012_1 is not a valid name (names must start with a letter or an underscore), hence your error.
You could enclose your value in quotes to create a string variable: "`y'_`m'". This will work for the first iteration, but in the second iteration you will get an error, since the variable date already exists. After creating a variable, you can only replace it.
Finally, your code says nothing about which value goes to which observation. Even if you fixed the problems already mentioned, your variable would just contain the same value for all observations, namely the value from the last iteration of the loop. To replace only one observation, you have to specify in i, where i is the observation number.
All in all, this would be the amended code:
gen date = ""
local obs = 1
forval y=2012/2013{
forval m=1/2{
display `m'
replace date = "`y'_`m'" in `obs'
local ++obs
}
}
However, I would not recommend creating this type of date variable, as string variables are limited in what you can do with them. Stata's internal date format is the most convenient. If your values 1 and 2 represent half-years, you could create a half-yearly date variable; see help datetime for information on how to do this. Another option is to create a numeric variable containing the year and a second numeric variable containing 1 and 2.
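The difference between the string labels and the numeric alternatives can be mirrored in Python. In this sketch, lists stand in for Stata variables (list position = observation number), and the half-year count assumes the same scheme as Stata's %th dates, where 1960h1 = 0; none of this is Stata syntax.

```python
# Lists stand in for Stata variables; list position = observation number
date_labels = []  # string version, like replace date = "`y'_`m'" in `obs'
year_var = []     # numeric alternative: one variable for the year...
half_var = []     # ...and one for the half (1 or 2)

for y in range(2012, 2014):
    for h in range(1, 3):
        date_labels.append(f"{y}_{h}")
        year_var.append(y)
        half_var.append(h)

# Half-years counted from 1960h1 = 0, the scheme behind Stata's %th dates
half_year_dates = [(y - 1960) * 2 + (h - 1) for y, h in zip(year_var, half_var)]

print(date_labels)      # ['2012_1', '2012_2', '2013_1', '2013_2']
print(half_year_dates)  # [104, 105, 106, 107]
```

Each observation gets its own value, and the numeric version can be used in arithmetic (for example, the gap between two half-years), which the string labels cannot.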
I have a data frame (raw) in which one variable (iv1) can contain NAs. I want to replace each NA with a different random value drawn from the distribution of the existing scores in iv1, not with one single value. The sample size (n) can be anything from 100 to 1000.
I save the distribution to a new data frame (dbmi) because I want to keep raw and dbmi separate, and I calculate the mean and SD of the existing values of iv1 within dbmi. The following code works but replaces all of the NAs with just one value. I think I need to set up a for loop? Some kind of loop that finds the next NA, draws a new rnorm value, puts it in, moves on to the next one, and so on, but I can't figure out how to do that. Any help?
dbmi<-raw
attach(dbmi)
rawmean<-mean(dbmi$iv1,na.rm=TRUE)
rawsd<-sd(dbmi$iv1,na.rm=TRUE)
for (i in 1:n){
dbmi$iv1[is.na(dbmi$iv1)]<-rnorm(1,rawmean,rawsd)
}
I actually solved my own problem. I stored the positions [i] of the NAs in a variable called 'pull', then generated a new stream of random values into a variable called 'new', and used this code to substitute them:
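dbmi <- raw
rawmean <- mean(dbmi$iv1, na.rm = TRUE)
rawsd <- sd(dbmi$iv1, na.rm = TRUE)
pull <- which(is.na(dbmi$iv1))     # positions of the NAs
num <- length(pull)                # number of NAs to fill
new <- rnorm(num, rawmean, rawsd)  # one independent draw per NA
dbmi$iv1[pull] <- new              # vectorised assignment, no loop needed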
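The idea in the self-answer (find the NA positions, draw one value per position, assign them all at once) can be sketched in Python, with None playing the role of NA. The function name impute_normal is hypothetical. As in the R code, this draws from a normal distribution fitted to the observed values rather than resampling the observed scores themselves.

```python
import random
import statistics


def impute_normal(values, rng=None):
    """Replace each None with its own draw from N(mean, sd) of the observed values."""
    if rng is None:
        rng = random.Random()
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample SD, like R's sd()
    missing = [i for i, v in enumerate(values) if v is None]  # like `pull`
    draws = [rng.gauss(mean, sd) for _ in missing]  # like rnorm(num, ...)
    filled = list(values)  # leave the input untouched, like dbmi vs raw
    for i, draw in zip(missing, draws):
        filled[i] = draw
    return filled


iv1 = [21.0, None, 24.5, 19.0, None, 30.2, None]
print(impute_normal(iv1, random.Random(42)))
```

Passing a seeded random.Random makes the draws reproducible, much as set.seed() would in R.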
I have a dataset in SPSS with 311 different variables and 1304 cases. 99 of those variables contain ICD-9 and ICD-10 codes, which are sometimes only numeric (e.g. 303) and sometimes string (e.g. H233). I have made all of the variables string.
What I need to do is have SPSS go through each case and each of the 99 variables and check whether it finds any of the codes from a large list, i.e.:
("3180","3181","3182","330","33111","33119",
"3314","33189","3319","3320","3321","3330",
"3332","3334","3335","3337","3339","334",
"335","343","34501","34581","3590","3591",
"3592","3593","3361","3368","3379","3418",
"34290","343","3440","34481","3449","34511",
"3453","34541","34561","34571","34591","3481",
"3484","3491","43401","43491","359","740",
"741","742","7595","78003","9962","99663",
"V452","V5301","V5302")
If it finds any of them in my specified list of variables, I need it to set ccc_n = 1; otherwise ccc_n needs to equal 0. I tried COMPUTE ccc_n = 0. How can I accomplish this? I have tried do repeat, do if, loop and vector, but I can't seem to make it work.
Try this:
compute ccc_n=0.
do repeat vr=vr1 to vr99.
if any(vr, "3180","3181","3182","330","33111","33119" ....) ccc_n=1.
end repeat.
You should of course replace vr1 to vr99 with your real variable names (if they are not consecutive in the file you need to name them each separately). In the any() function enter all of your codes separated by commas.
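The per-case logic can be sketched in Python: every case starts at 0, and the flag becomes 1 as soon as any listed variable holds a listed code, so a later non-matching variable cannot reset it. The variable names dx1 to dx3 and the shortened code set are hypothetical.

```python
# Hypothetical subset of the code list; the full list would go here
CODES = {"3180", "3181", "3182", "330", "V452", "V5301", "V5302"}


def ccc_flag(case, code_vars, codes=CODES):
    """Return 1 if any of the code variables in this case holds a listed code, else 0."""
    # strip() guards against padded string values; missing variables count as ""
    return int(any(str(case.get(var, "")).strip() in codes for var in code_vars))


cases = [
    {"dx1": "303", "dx2": "H233", "dx3": ""},
    {"dx1": "401", "dx2": "3181", "dx3": ""},
    {"dx1": "", "dx2": "", "dx3": "V5302"},
]
code_vars = ["dx1", "dx2", "dx3"]
print([ccc_flag(c, code_vars) for c in cases])  # [0, 1, 1]
```

Using a set for the codes keeps each lookup fast even when the list is long.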
I have a variable that is created by a loop. The variable is large enough and in a complicated enough form that I want to save the variable each time it comes out of the loop with a different name.
PM25 is my variable, but I want to save it as PM25_year, where the year changes based on str = fname(13:end).
PM25 = permute(reshape(E',[c,r/nlay,nlay]),[2,1,3]); % Reshape and permute to achieve the right shape. Each face of the 3D should be one day
str = fname(13:end); % The year
% Third dimension is organized so that the data for each site is on a face
save('PM25_str', 'PM25_Daily_US.mat', '-append')
The str would be a year, like 2008. So the variable saved would be PM25_2008, then PM25_2009, etc. as it is created.
Defining new variables based on data isn't considered best practice, but you can store your data more efficiently using a cell array. You can store even a large, complicated variable like your PM25 variable within a single cell. Here's how you could go about doing it:
Place your PM25 data for each year into the cell array C using your loop:
for i = 1:numberOfYears
C{i} = PM25;
end
Resulting in something like this:
C = { PM25_2005, PM25_2006, PM25_2007 };
Now let's say you want to obtain your variable for the year 2006. This is easy (assuming you aren't skipping years). The first year of your data will correspond to position 1, the second year to position 2, etc. So to find the index of the year you want:
minYear = 2005;
yearDesired = 2006;
index = yearDesired - minYear + 1;
PM25_2006 = C{index};
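The same pattern can be sketched in Python, where a list plays the part of the cell array and a dictionary keyed by year (comparable to a containers.Map in MATLAB) removes the assumption that no years are skipped. The placeholder strings are hypothetical stand-ins for the real PM25 arrays.

```python
min_year = 2005
results = []  # indexed by position, like the cell array C
by_year = {}  # keyed by year, so skipped years are not a problem

for year in range(2005, 2008):
    pm25 = f"PM25 data for {year}"  # placeholder for the real 3-D array
    results.append(pm25)
    by_year[year] = pm25

year_desired = 2006
index = year_desired - min_year  # 0-based here; MATLAB's C{index} is 1-based
print(results[index] == by_year[year_desired])  # True
```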
You can do this using eval, but note that it's often not considered good practice. eval may be a security risk, as it allows user input to be executed as code. A better way to do this may be to use a cell array or an array of objects.
That said, I think this will do what you want:
for year = 2008:2014
eval(sprintf('PM25_%d = permute(reshape(E'',[c,r/nlay,nlay]),[2,1,3]);',year));
save('PM25_Daily_US.mat',sprintf('PM25_%d',year),'-append');
end
I do not recommend setting variables like this: there is no way to track these variables, and it prevents all the error checking that MATLAB does beforehand. This kind of code is handled entirely at runtime.
Anyway, if you have a really good reason for doing this, I recommend using the function assignin.
assignin('caller', ['myvar',num2str(1)], 63);
I am looking for advice on the piece of code below, which is meant to loop efficiently through a dataset (a cell array) and extract each column as a data vector.
[i,j]=size(fimat);
k=2;
while k<=j % looping through columns
[num2str(k-1),'yr']=cell2mat(fimat(:,k)); %extract each column as vector
k=k+1;
end
My problem lies in the following statement:
[num2str(k-1),'yr']
which correctly concatenates the number (given by variable k) and the string 'yr'. However, the syntax fails when assigning, for instance during the first iteration:
1yr=cell2mat(fimat(:,2))
The resulting error speaks for itself:
Error: An array for multiple LHS assignment cannot contain LEX_TS_STRING.
but I'm still trying to find a way to do it, so any feedback would be appreciated.
Thanks
First of all, in MATLAB a variable name cannot start with a digit. You should modify your code so that the variable name starts with a letter (MATLAB does not allow a leading underscore either).
For instance, ['yr' num2str(k-1)] or ['yr_' num2str(k-1)] would work.
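The naming rule can be checked mechanically. This Python sketch approximates MATLAB's isvarname: a letter first, then letters, digits or underscores, up to namelengthmax characters (63 in recent releases). It does not check reserved keywords such as end, so treat it as an illustration rather than a full reimplementation.

```python
import re

MATLAB_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
NAME_LENGTH_MAX = 63  # namelengthmax in recent MATLAB releases


def looks_like_matlab_varname(name: str) -> bool:
    """Approximate MATLAB's isvarname (reserved keywords are not checked)."""
    return len(name) <= NAME_LENGTH_MAX and MATLAB_NAME.fullmatch(name) is not None


for candidate in ["1yr", "_1yr", "yr1", "yr_1"]:
    print(candidate, looks_like_matlab_varname(candidate))
```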
Then, what you are trying to do is very strongly discouraged by everyone, including MathWorks. It would be much better to use a cell array yr and index it as yr{k} rather than numbered variable names:
yr = cell(j-1,1); % one cell per data column (columns 2..j)
for k = 2:j
yr{k-1} = cell2mat(fimat(:,k));
end
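The cell-array approach amounts to keeping every extracted column in one indexed container. A Python sketch with a hypothetical fimat, whose first column holds labels, shows the same idea with a list of lists.

```python
# Hypothetical stand-in for fimat: first column is a label, the rest are numeric
fimat = [
    ["A", 1.0, 2.0, 3.0],
    ["B", 4.0, 5.0, 6.0],
]

n_cols = len(fimat[0])
# yr[k - 1] holds column k (0-based), mirroring yr{k-1} = cell2mat(fimat(:,k))
yr = [[row[k] for row in fimat] for k in range(1, n_cols)]
print(yr)  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```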
Anyway, if you still want to do this, you can use eval:
k=2;
while k<=j
eval(['yr' num2str(k-1) ' = cell2mat(fimat(:,k));']);
k=k+1;
end
Best,
You cannot dynamically create variable names the way you did. The left side of the = must be an identifier, not a char array. The alternative I recommend is to use a cell array instead of individual variable names. For example:
yr{k-1}=cell2mat(fimat(:,k))
If you must use variable names with numbers, which I strongly recommend against, you have to use eval for that line. Alternatives that I strongly recommend checking before using eval are structs with dynamic field names and containers.Map.
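The key idea behind structs with dynamic field names and containers.Map is that names built at runtime become keys in a container rather than new variables. In Python the equivalent container is a dictionary; the data below is hypothetical.

```python
fimat = [["A", 1.0, 2.0], ["B", 3.0, 4.0]]  # hypothetical data

# Runtime-built names become dictionary keys instead of variables,
# much like s.(name) = ... with a struct or a containers.Map in MATLAB
columns = {}
for k in range(1, len(fimat[0])):
    name = f"yr{k}"
    columns[name] = [row[k] for row in fimat]

print(columns["yr2"])   # [2.0, 4.0]
print(sorted(columns))  # ['yr1', 'yr2']
```

Because every column lives in one container, it can be looped over, counted or saved as a whole, which is awkward with separately named variables.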
Here is my answer to the question, for sharing purposes. I hope it helps, and thanks to the contributors to this post.
[i,j]=size(fimat); %get dimension of dataset (of cell type)
numdata=cell2mat(fimat(1:i,2:j)); %extract only numeric from dataset
for k=1:j-1
eval(sprintf('yr%d = numdata(:,k)', k));
end
Admittedly, this question is not very interesting, but since the warnings in the SAS log can sometimes be very helpful, I'd like to know what is going on here.
Consider the following minimal example. In step0 we create a dataset. In step1 we want to copy the value of some variable from step0 to step1, but we have forgotten the correct name of the variable (or we remember it correctly, but someone changed it while we were away). I have written two versions of step1, named step1a and step1b.
Data step0;
Dog = 1;
run;
Data step1a;
value = cat;
run;
Data step1b;
array animals cat;
value = animals[1];
run;
Needless to say, both versions of step1 produce the same dataset, in this case a dataset with a single observation in which the variables 'value' and 'cat' are both missing.
However, when step1 is run the way step1a is written, the SAS log warns us that something is wrong:
NOTE: Variable cat is uninitialized.
We can go back to our code, notice that what we think was a cat was actually a dog all along, see the error of our ways and produce the correct dataset we had in mind.
When, on the other hand, step1 is run the way step1b is written, the SAS log acts as if everything is perfectly fine, and we can go out singing and dancing in the street only to find out years later that the value of dog is lost forever.
So the question is: why does SAS think in the second case that no warning is needed?
That's because you HAVE initialized the variable in the second version (step1b), via the array declaration. When you declare an array, any variables that do not already exist are created and initialized to numeric missing, unless you specify $ in the array definition (in which case they are character missing, with length 8) or you specify initial values.