Assigning unique values to each case of a variable inside loop - looper

I have a variable named subject. For each unique subject, 240 response latencies are recorded. The experimental condition is counterbalanced between subjects, depending on the subject ID. I want to read the subject ID (variable subject) and assign order = 1 if the ID is even, or order = 2 if the ID is odd. This assignment should be made for every row (i.e. 240 per subject).
I first used an if statement. The error I get is: the condition has length > 1 and only the first element will be used
I also tried ifelse like this:
ifelse(data1$subject%%2==1, data1$order<-1, data1$order<-2)
Output is generated, but it is not stored in the variable order.
Please help me make this work.

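Luckily, I found the answer.
The same ifelse works when its result is assigned to a new vector:
order<-ifelse(data1$subject%%2==1,1,2)
Note that this gives 1 to odd IDs and 2 to even IDs; swap the last two arguments if even IDs should get 1.
To include the new vector in the data frame, we can use:
data1<-cbind(data1,order)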
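For readers who want to see the parity rule outside R, here is a minimal Python sketch. It follows the rule as worded in the question (even ID gives 1, odd ID gives 2). The function name assign_order and the sample IDs are made up for illustration.

```python
def assign_order(subject_ids):
    """Return one order value per row: 1 for even subject IDs, 2 for odd ones."""
    return [1 if sid % 2 == 0 else 2 for sid in subject_ids]


# Hypothetical data: three subjects with two rows each (instead of 240).
subjects = [101, 101, 102, 102, 103, 103]
order = assign_order(subjects)
print(order)
```

As with ifelse, the whole column is computed in one step and then stored, rather than being assigned inside the condition.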

Related

Extract Data From Table Based on Multiple Criteria within ranges

I have a problem that I would like some help with.
I need to look up data in a table, which is shown below:
In the "Yellow" line I want to enter a quantity, which can be anything from 0 to 3000. If I enter, for example, 190 in "Yellow", which falls within "up to 200", then column F is selected. If I enter 1000, column H is selected.
Then I need to cross that with the rows, which are "up to x m2". That is, if I enter, for example, 0.3 in the "Green" line, then row 15 is selected. The result of the two lookups would be 1000 in this example.
I have already made a few attempts, and ended up with a monster formula:
=IFS(AND($E$20<=$F$14;$E$21<=$E$15);$F$15;AND($E$20<=$F$14;$E$21<=$E$16);$F$16;AND($E$20<=$F$14;$E$21<=$E$17);$F$17;AND($E$20<=$F$14;$E$21<=$E$18);$F$18;AND($E$20<=$G$14;$E$21<=$E$15);$G$15)
The formula continues like this to the end of the table. It is effective and does its job, but besides being huge, it will be difficult to edit later. I would like to improve it.
Any ideas?
I apologize to everyone who was confused by my earlier attempt to explain my problem. Thank you all.
As I understand it, you are looking up on two criteria. For the yellow criterion you want an exact match; for the green criterion, an exact match or the next larger value.
You can use INDEX/XMATCH for that, as follows, using the LET function in cell J3:
=LET(rng, B2:G5, upper, 1*TEXTAFTER(TEXTBEFORE(A2:A5, " m2"), " ", -1),
INDEX(rng, XMATCH(J2, upper, 1), XMATCH(J1, B1:G1))
)
or without LET function:
=INDEX(B2:G5, XMATCH(J2, 1*TEXTAFTER(TEXTBEFORE(A2:A5, " m2"), " ", -1), 1),
XMATCH(J1, B1:G1))
Note: The above approach doesn't require a helper column with the upper values. If such information is provided, as in the updated version of the question (column E), use the corresponding range instead.
Here is the output:
It assumes there is a space between the number and m2 in the green column. You need to standardize this in your input; for example, the last green row doesn't have a space. If the input is not consistent, clean it up first, either with the SUBSTITUTE function or manually, since it seems to be a typo.
The name upper contains the number associated with m2 in the green column, extracted using TEXTBEFORE and TEXTAFTER. The first XMATCH uses the third input argument (1) so that, if the value doesn't exist, it finds the next larger value. The second XMATCH looks for an exact match for the yellow value.
This is a well-known use case: a two-dimensional (two-way) lookup. For example, you can check: INDEX XMATCH XMATCH to perform 2-dimensional lookup, and adapt it to your specific case. You can also use the XLOOKUP function for similar situations.
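The two-way lookup logic described above can be sketched outside a spreadsheet. The following Python sketch is only an illustration: the function names, the row limits, the column keys and the table values are all hypothetical, not the data from the question. The row is matched as "exact or next larger" (like XMATCH with 1 as its third argument) and the column is matched exactly.

```python
def match_exact_or_next_larger(value, candidates):
    """Index of the smallest candidate that is >= value."""
    eligible = [(c, i) for i, c in enumerate(candidates) if c >= value]
    if not eligible:
        raise ValueError(f"{value} is larger than every candidate")
    return min(eligible)[1]


def two_way_lookup(table, row_limits, col_keys, row_value, col_value):
    """Pick the row by 'exact or next larger' and the column by exact match."""
    row = match_exact_or_next_larger(row_value, row_limits)
    col = col_keys.index(col_value)  # raises ValueError if there is no exact match
    return table[row][col]


# Hypothetical data: rows are "up to x m2", columns are quantities.
row_limits = [0.5, 1, 5, 10]
col_keys = [200, 500, 1000]
table = [
    [1000, 900, 800],
    [1200, 1100, 1000],
    [1400, 1300, 1200],
    [1600, 1500, 1400],
]
print(two_way_lookup(table, row_limits, col_keys, 0.3, 200))
```

If both criteria should be treated as "up to" bands, as in the original question, the column could be matched with match_exact_or_next_larger as well.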
I'm trying to do it the way you showed me. However, I don't use Excel, so I don't have the LET function.
I had to improvise.
I proceeded as follows:
I made a new column with the intended values.
All these values would be matched approximately. For example, if I enter a quantity of 150 and an area of 5 m2, the result is 1400, because that is within 200 units and within 5 m2. As another example, if I enter 499 units and 13 m2, the result is 360.
Currently, with this formula, I get approximate matches. However, the values are not matching up, and when I go above 1500 units I get this error: The value of Parameter 2 of the INDEX function, 5, is out of range.
Have a good year David.

Fill variable with a nested loop Stata

I'm trying to create a variable like this in Stata:
date
2012_1
2012_2
2013_1
2013_2
with the following loop:
forval y=2012/2013{
forval m=1/2{
display `m'
gen date = `y'_`m'
}
}
But I'm getting this error in the first iteration: 2012_1 invalid name. Sorry if the question is obvious; I'm a newbie in Stata.
You face more problems than you realise here, but all are simple.
The immediate problem with your loop is that a value such as 2012_1 is intended by you as a value of a variable, but if so it must explicitly be a string, surrounded by "". The reason is that underscore _ is only acceptable as part of a string. Stata is clearly puzzled by your command. The error message does not quite fit the situation, although it is correct that 2012_1 is not an acceptable name, meaning the name of a variable or scalar.
If you fixed that, your next problem would be that the second time around your loop the variable already exists, and so generate is unacceptable. You would need to replace. So, the generate statement should be taken outside the loop.
Then again, all your loop does, even with those problems fixed, is overwrite the whole variable each time with the same value. At the end of your loops all observations would contain the constant value 2013_2.
Longer term, there is still a problem. Evidently you want a monthly date variable, but monthly date variables like that are of little use in Stata. They sort in the correct order, but they are essentially useless for statistics or graphics.
This is a better idea all round:
generate mdate = .
local i = 1
forval y = 2012/2013 {
    forval m = 1/2 {
        replace mdate = ym(`y', `m') in `i'
        local ++i
    }
}
That is still not good style. I guess that you don't really want only months 1 and 2, but we can't know what you really want.
Do this in Stata:
clear
set obs 48
generate mdate = ym(2011, 12) + _n
format mdate %tm
list
to get an idea of a better approach -- with no loops at all.
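To see what the numeric monthly date holds, here is a small Python sketch of the arithmetic. Stata counts monthly dates as months elapsed since January 1960, so ym(2012, 1) is 624. The helper names ym and format_tm are chosen here to mirror the Stata function and display format; they are illustrative, not a reimplementation of Stata.

```python
def ym(year, month):
    """Months elapsed since January 1960 (the convention Stata's ym() uses)."""
    return (year - 1960) * 12 + (month - 1)


def format_tm(mdate):
    """Render a monthly date in the style of Stata's %tm format, e.g. 2012m1."""
    years, month_index = divmod(mdate, 12)
    return f"{1960 + years}m{month_index + 1}"


# Equivalent of: generate mdate = ym(2011, 12) + _n   for _n = 1, ..., 48
mdates = [ym(2011, 12) + n for n in range(1, 49)]
print(format_tm(mdates[0]), format_tm(mdates[-1]))
```

Because the values are consecutive integers, they sort correctly and differences between them are meaningful, which is what makes them usable for statistics and graphics.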
There are quite a few problems with your code. I'll go through them one by one.
`y'_`m' evaluates to 2012_1 in the first iteration. Since it contains an underscore, it cannot be interpreted as numeric. To be interpreted as a string value, it would have to be enclosed in "". In the end, Stata tries to interpret it as a variable, but 2012_1 is not a valid name (a name has to start with a letter), hence your error.
You could enclose your value in quotes to create a string variable: "`y'_`m'". This will work for the first iteration, but in the second iteration you will get an error, since the variable 'date' already exists. After creating a variable, you can only replace it.
Finally, your code says nothing about which value goes to which observation. Even if you fixed the problems already mentioned, your variable would just contain the same value for all observations, namely the value from the last iteration of the loop. To replace only one observation you have to specify in i, where i is the observation number.
All in all, this would be the amended code:
gen date = "."
local obs = 1
forval y=2012/2013{
    forval m=1/2{
        display `m'
        replace date = "`y'_`m'" in `obs'
        local ++obs
    }
}
However, I would not recommend creating this type of date variable, as string variables are limited in what you can do with them. Stata's internal date format is the most convenient. If your values 1 and 2 represent half years, you could create a half-yearly date variable; see help datetime for information on how to do this. Another option is to create a numeric variable containing the year, and a second numeric variable containing 1 and 2.
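The point about writing one value per observation, rather than overwriting the whole variable on every pass, can be sketched in Python. This is an illustration only; the function name fill_labels is made up, and the list stands in for a dataset column with four observations.

```python
def fill_labels(first_year, last_year, periods, n_obs):
    """Fill one observation per loop iteration, like: replace date = ... in `obs'."""
    date = [""] * n_obs   # the column is created once, before the loops
    obs = 0               # Python indexes from 0; Stata observations start at 1
    for year in range(first_year, last_year + 1):
        for period in range(1, periods + 1):
            date[obs] = f"{year}_{period}"   # only this observation changes
            obs += 1
    return date


date = fill_labels(2012, 2013, 2, 4)
print(date)
```

Without the running index, every pass would assign to all observations and only the last value would survive.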

replacing r variables with multiple or different values

I have a data frame (raw) in which one variable (iv1) can contain NAs. I want to replace each NA with a different random value drawn from the distribution of the existing scores in iv1, not with one single value. The sample size (n) can be anything from 100 to 1000.
I copy the data to a new data frame (dbmi) because I want to keep raw and dbmi separate, and calculate the mean and SD of the existing values of iv1 within dbmi. The following code works but replaces all of the NAs with just one value. I think I need to set up a for loop: some kind of loop that finds the next occurrence of an NA, generates a new rnorm value, puts it in, then goes on to the next one and does it again, and so on. But I can't figure out how to do that. Any help?
dbmi<-raw
attach(dbmi)
rawmean<-mean(dbmi$iv1,na.rm=TRUE)
rawsd<-sd(dbmi$iv1,na.rm=TRUE)
for (i in 1:n){
dbmi$iv1[is.na(dbmi$iv1)]<-rnorm(1,rawmean,rawsd)
}
I actually solved my own problem. I stored the positions of the NAs in a variable called 'pull', then generated a new stream of random values in a variable called 'new', and used this code to substitute them.
dbmi<-raw
rawmean<-mean(dbmi$iv1,na.rm=TRUE)
rawsd<-sd(dbmi$iv1,na.rm=TRUE)
# positions of the missing values
pull<-which(is.na(dbmi$iv1))
# one random draw per missing value
new<-rnorm(length(pull),rawmean,rawsd)
# the assignment is vectorised, so no loop is needed
dbmi$iv1[pull]<-new
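The same idea, one independent draw per missing value, can be sketched in Python. This is a minimal illustration: missing values are represented by None, the function name impute_from_normal and the sample data are made up, and a normal distribution is assumed, as in the rnorm approach.

```python
import random
import statistics


def impute_from_normal(values, rng):
    """Replace each None with its own draw from N(mean, sd) of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample SD, like R's sd()
    # rng.gauss is called once per missing entry, so every gap gets a different value
    return [rng.gauss(mean, sd) if v is None else v for v in values]


rng = random.Random(42)  # seeded only to make the sketch reproducible
iv1 = [21.5, None, 24.0, 19.8, None, 23.1]
filled = impute_from_normal(iv1, rng)
print(filled)
```

The original attempt produced a single value because rnorm(1, ...) returns one number, which is then recycled across all missing positions.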

Making two items of different variable types relate

I have an array of records called election (indexed from 1 to 4) whose records have about 6 fields, two of which this question focuses on. One field is called totvot (total votes, declared as integer) and the other is called nameC (candidate name, declared as string), and I want to use a for loop with an if statement to say:
For count := 1 to 4 do
begin
if (election[count].totvot>wc)then
wc:=election[count].nameC;
end;
What I'm doing above is, assuming all four entries election[count].totvot are populated, going through all four to find the highest number. For example, if the four entries hold 2, 3, 5 and 6, then 6 is the highest number, and it is in location four because it is the fourth number. Once I've found the highest number, I want to relate the name found in location four to that number, hence the line wc:=election[count].nameC. The problem is that I declared wc (standing for winning candidate) as an integer and election[count].nameC as a string. I don't know what code to use to get the two to relate (not convert integer to string, but relate). Below is the code that I used to declare and initialize the variables I mentioned.
Var
wc,rate,total,choice,count,totgen,totspe,totspo,y,r: integer;
Election:array[1..4]of Elect;
Begin
clrscr;
textcolor(10);
wc:=0;
for count:=1 to 4 do
begin
Election[count].totvot:=0;
Election[count].nameC:='';
So essentially, I just want to relate the highest value in election[count].totvot to the name of the candidate who has the most votes.
The index of your array (the value of count) already relates each person's name to their number of votes. There are several different solutions to what you wish to do. Much of the skill in coding is deciding which solution you think is best. Here are two approaches, described in words. Writing the code is up to you.
Create a loop (a for, while or repeat loop) that goes through the array from count:=1 to count:=4 and uses a new variable winner to store the index of the person with the highest score. Each pass of the loop would compare the votes of the winner so far with the next person's vote count, storing only the index of whichever is higher.
Suppose your winning candidate was number 2 in the array (count:=2); the winner's name would be election[2].nameC and the winning number of votes would be election[2].totvot.
Immediately before your existing for statement, add lines to introduce a new variable, e.g. winner (as above), and set its value to 1. Inside the for statement, compare the number of votes for the current candidate, election[count].totvot, to the value of election[winner].totvot; if election[count].totvot is larger, set winner:=count (so winner holds the index of the highest by the end of the loop). Then your winner's name is election[winner].nameC.
Both solutions would need minor improvements if joint winners were quite likely (eg small number of total votes cast).
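The index-tracking approach described above can be sketched as follows. Python is used purely for illustration; the candidate names and vote counts are made up, and Python lists are indexed from 0 rather than from 1 as in the Pascal array.

```python
from typing import NamedTuple


class Elect(NamedTuple):
    nameC: str    # candidate name
    totvot: int   # total votes


def index_of_winner(election):
    """Return the index of the record with the most votes (first one wins ties)."""
    winner = 0
    for count in range(1, len(election)):
        if election[count].totvot > election[winner].totvot:
            winner = count
    return winner


# Hypothetical candidates and vote totals.
election = [Elect("Ana", 2), Elect("Ben", 3), Elect("Cy", 5), Elect("Dee", 6)]
winner = index_of_winner(election)
print(election[winner].nameC, election[winner].totvot)
```

Storing the index instead of the vote count means the name and the total are both available afterwards through the same record, so no integer-to-string relation is needed.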
Your question got some down votes - possibly because your explanation was long and a bit hard to follow at times. If you can break a problem down into simple bullet pointed steps it will help and build your programming and problem solving skills as well.
Some Pascal references, hidden away in the wiki of the Pascal question tag, may be useful: https://stackoverflow.com/tags/pascal/info
From your code, I can see that an Elect has a candidate's name (nameC) and the number of votes (totvot). The way you're using wc (winning count?) tells me that it's holding a vote total. Since it's a number type, you can't put a string into it, so you'll have to use something else. I would add a new variable winnerName:
Var winnerName: String;
//Initialization code here
For count := 1 to 4 do
begin
  if (election[count].totvot>wc) then
  begin
    // begin..end keeps both assignments under the if
    wc:=election[count].totvot;
    winnerName:=election[count].nameC;
  end;
end;

Creating a vector with unique observations from a variable in Stata

What I am mainly trying to do is create a variable that holds, within each stratum of my sample (defined by an 'id' variable, for instance), the value that occurs most frequently (in that stratum) in another (string) variable. If tabulate* worked the way I need it to, my code would run like this:
gen new_class_within_id=""
forvalues i=1/80 {
tab class_var, matcell(x) if id==`i'
svmat x
sum x2
local name =x1 if x2==r(max)
replace new_class_within_id=`name' if id==`i'
}
That would be the general idea if tabulate permitted storing the unique observation names in a matrix; the code might contain some unintended errors too, of course. Since that does not seem to be possible with the above code, I thought I could use mkmat if I were able to store, within the loop, the unique observations in a vector with some additional coding. Would that be possible? Also, is there an easier way to do what I want?
*At first, I thought that using tabulate and extracting the results into a matrix would do what I need, but tabulate only lets me extract the frequencies, not the names of the observations. tabulate seemed promising because its output shows the unique values of a variable in a column, but I could not find a way to extract those values the way the output shows them.
I think I understand your question, but maybe I don't. Some code:
clear
set more off
input ///
id str1 anothvar
1 a
1 a
1 a
1 b
1 m
2 c
2 c
2 m
2 a
2 z
end
list, sepby(id)
*-----
bysort id anothvar : gen count = _N
bysort id (count): gen newvar = anothvar[_N]
list, sepby(id)
More work needs to be done if you have missings and/or ties.
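For comparison, the same "most frequent value within each id" logic can be sketched in Python with the standard library. This is an illustration under the same toy data as above; the function name most_frequent_by_group is made up, and ties are resolved by whichever value was seen first, which may not be what you want.

```python
from collections import Counter, defaultdict


def most_frequent_by_group(ids, values):
    """For every row, return the value that occurs most often within that row's id."""
    counts = defaultdict(Counter)
    for group, value in zip(ids, values):
        counts[group][value] += 1
    modes = {group: counter.most_common(1)[0][0] for group, counter in counts.items()}
    return [modes[group] for group in ids]


# Same toy data as the Stata example above.
ids = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
anothvar = ["a", "a", "a", "b", "m", "c", "c", "m", "a", "z"]
newvar = most_frequent_by_group(ids, anothvar)
print(newvar)
```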
