Summarize across a thousand variables for each observation in Stata

I have made a loop that creates a variable, expectedgpa.
So now I have 1,000 variables for each observation, labeled expectedgpa1, expectedgpa2...expectedgpa1000.
I want to get the average and standard deviation for all the expectedgpas for each observation.
So if I have this
Joe 1 2 1 2 4
Sally 2 4 2 4 3
Larry 3 3 3 3 3
I want a variable returned that gives
Joe 2
Sally 3
Larry 3
Any help?

First, for future questions:
Please post code showing what you've tried. Your question shows no research effort.
Second, to clarify the terminology:
You created 1000 variables, each one corresponding to some expected GPA. Each observation corresponds to a different person. As a result, you want three variables: one with the person's id and two more with the mean and sd of the GPA (by person).
This is my interpretation, at least.
One solution involves reshaping your data:
clear all
set more off
input ///
str5 id exgpa1 exgpa2 exgpa3 exgpa4 exgpa5
Joe 1 2 1 2 4
Sally 2 4 2 4 3
Larry 3 3 3 3 3
end
list
reshape long exgpa, i(id) j(exgpaid)
collapse (mean) mexgpa=exgpa (sd) sdexgpa=exgpa, by(id)
list
Instead of collapse, you can also run by id: summarize exgpa after the reshape, but this doesn't create new variables.
See help reshape, help collapse and help summarize for details.

You should not have created 1000 new variables without a strategy for how you were going to analyse them!
You could also use egen functions rowmean() and rowsd() and keep the same data structure.
A review of working "rowwise" in Stata is accessible at http://www.stata-journal.com/sjpdf.html?articlenum=pr0046
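If you want to check the numbers outside Stata, here is a minimal Python sketch using the example data from the question. The names data and row_stats are made up for illustration. It assumes the sample standard deviation (n - 1 in the denominator), which, as far as I know, is what collapse (sd) and rowsd() report.

```python
from statistics import mean, stdev

# Example data from the question: one list of expected GPAs per person.
data = {
    "Joe": [1, 2, 1, 2, 4],
    "Sally": [2, 4, 2, 4, 3],
    "Larry": [3, 3, 3, 3, 3],
}


def row_stats(values):
    """Return (mean, sample standard deviation) for one person's values."""
    return mean(values), stdev(values)


for person, values in data.items():
    m, sd = row_stats(values)
    print(person, m, round(sd, 4))
```

The means should agree with the Joe 2, Sally 3, Larry 3 result asked for in the question.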

Related

Google spreadsheet how to count data if contains a value

I've been trying to solve this but am struggling. Hopefully this is the right place to ask.
What I need to do is search a row for a certain word, say "cat". If that word is found within that row then take the value of another cell ("Gain") in that row and add this to a total. Then what I need to do is take that total, and divide it by the number of times "cat" was found within a group of rows. Is this possible?
Hopefully that explains what I am trying to do.
For example, my data looks something like this -
1 2 3 4 5 Gain
1/6/22 cat bear elephant sheep 7
2/6/22 dog cat mouse cow 12
3/6/33 cat cow horse goat 5
Cow total: 2
Total gain of rows containing cow / number of those rows: (12+5)/2 = 8.5
EDIT: What I have noticed is that if I use SUMIF it will work as long as the value I am searching for is in a single column. However, if it is spread out across multiple columns I get a value that isn't correct.

try:
=COUNTIFS(B1:E3, A6)
and then:
=INDEX(SUM(IF(B1:E3=A6, F1:F3, ))/COUNTIFS(B1:E3, A6))

This is the formula I use when I need to count how many cells have specific values in it within a range:
=COUNTif($J56:$J956,"=Cat")
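To make the intended arithmetic explicit, here is a small Python sketch of the same calculation on the sample data from the question. It is only a cross-check of the expected result, not a replacement for the spreadsheet formulas; the names rows and average_gain are hypothetical, and it assumes the word appears at most once per row.

```python
# Sample data from the question: (date, words in columns 1-4, gain).
rows = [
    ("1/6/22", ["cat", "bear", "elephant", "sheep"], 7),
    ("2/6/22", ["dog", "cat", "mouse", "cow"], 12),
    ("3/6/33", ["cat", "cow", "horse", "goat"], 5),
]


def average_gain(rows, word):
    """Average the gain over the rows whose word columns contain `word`."""
    gains = [gain for _date, words, gain in rows if word in words]
    if not gains:
        return None  # the word does not occur in any row
    return sum(gains) / len(gains)


print(average_gain(rows, "cow"))  # (12 + 5) / 2
```

For "cow" this reproduces the 8.5 given in the question.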

(excel) How to return an array from a sum of ranges?

I'm setting up a morphological table that will have to go through potentially a couple hundred items, so it's desirable for this process to not be done by hand.
Here's a small summary of the situation:
     fin  eng  op   fli
A    2    4    6    8
B    1    3    5    4
C    1    2    3    5
D    1    4    7    2
The first column holds named ranges A through D which have associated values from the 4 categories in row 1.
In a second table we create configurations based on which features are selected, something like this:
Config 1   Config 2
A          B
C          D
What I'm looking for is a formula that would read for each configuration which named range is selected, add the score for each category and return it in a simple array. Something like
Config 1 {3,6,9,13}, Config 2 {2,7,12,6}
So far I've found that the Indirect formula works exactly the way I want but I have to manually input each range. Something like:
=INDIRECT(A1)+INDIRECT(A2)
I've played around with different permutations of sum functions but instead of returning the arrays it returns the sum of the first values.
=SUM(INDIRECT(A1:A2))
Any suggestion would be welcome.
I know this would probably be much simpler with code, but this study needs to be done in Excel.

I'm not sure if this answers your question as it doesn't use named ranges, but you could try something like this:
=MMULT(SEQUENCE(1,4,1,0),$B$2:$E$5*COUNTIF(INDEX($H$2:$I$3,0,ROW()-ROW($A$7)+1),$A$2:$A$5))
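For checking the expected arrays outside Excel, the calculation can be sketched in Python as below. The dictionaries scores and configs simply restate the two tables from the question, and config_totals is a hypothetical helper name; this is a cross-check only, not a substitute for the formula.

```python
# Scores per named range, in the category order fin, eng, op, fli.
scores = {
    "A": [2, 4, 6, 8],
    "B": [1, 3, 5, 4],
    "C": [1, 2, 3, 5],
    "D": [1, 4, 7, 2],
}

# Which named ranges each configuration selects.
configs = {
    "Config 1": ["A", "C"],
    "Config 2": ["B", "D"],
}


def config_totals(selected, scores):
    """Add the score rows of the selected ranges, category by category."""
    return [sum(column) for column in zip(*(scores[name] for name in selected))]


for name, selected in configs.items():
    print(name, config_totals(selected, scores))
```

The results should match the {3,6,9,13} and {2,7,12,6} arrays from the question.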

Gnuplot re Plotting lines after restart count

I am quite new to Gnuplot, so sorry if the question is silly, but I have not found the solution yet.
I have a data file with this structure:
timestep=0
1 -1.367+00 -2.538572773308e-01
2 -1.351097897106e+00 -2.382132334519e-01
3 -1.372764576847e+00 -1.205983667912e-01
4 -1.33451163582e+00 -2.3438654806e-01
5 -2.414239606e+00 -2.683590584894e-01
6 -4.425446031e+00 -3.246530421864e-01
7 -6.438461740e+00 -4.589039346035e-01
...
timestep=1
...
timestep=2
...
So for every timestep the iteration count (which I want on the x axis) restarts.
There will be many timesteps, so if I plot them all together it is difficult to see each line.
So the question is: how can I plot the line for just one timestep?
The number of iterations of every time step is different.
Thanks

This question looks like a duplicate of this one.
Nevertheless, the idea is to plot every block separately:
plot for[in=0:2] 'file' index in u 1:2 w lines t columnheader(1)
Note that you need to wrap every header in double quotation marks, and that index only recognises blocks separated by two blank lines.
If you need a separate output for every block, then you need a do for construction like this:
do for [i=0:2] {
set output sprintf("%d.png", i)
plot 'file' index i u 1:2 not
}
UPDATE
I've checked it one more time, here is my minimal script:
set term png size 800, 600
set output "out.png"
plot for[in=0:1] 'file' index in u 1:2 w lp t columnheader(1)
And my "file" file:
"timestep=0"
1 0
2 3
3 2
4 1
5 6
"timestep=1"
1 4
2 3
3 9
4 6
5 3
The output should look like this: [plot image not available]

I think you can use plot <filename> every, more details here and another stackoverflow description here.
That way you can choose to plot not just one timestep; but say every second timestep, or every third as you choose.

Johnson-Trotter Permutation

I am trying to write a program that can generate permutations using the Johnson-Trotter method with varying number of elements. I am still confused on how to get the permutations exactly. For 5 elements, I can only get this far and then I get stuck. I am not asking for all of them just a few more so I can get the pattern down.
1 2 3 4 5
1 2 3 5 4
1 2 5 3 4
1 5 2 3 4
5 1 2 3 4

I found the following page:
http://introcs.cs.princeton.edu/java/23recursion/JohnsonTrotter.java.html
Try running this Java program step by step, it will surely be of great help...

I have used the Johnson-Trotter algorithm in a card-melding game. The princeton.edu link mentioned above is widely quoted, but pretty confusing. And the code I've seen is either recursive (yuck) or inefficient. I rewrote it as an iterator, and it works great in my game. See my other post here: https://stackoverflow.com/a/28241384/4266886 with (very short) code. Let me know if I can help further.
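To continue the sequence from the question, here is a minimal iterative sketch of the usual "largest mobile element" formulation of Johnson-Trotter. It is not the code from either link above; the function name johnson_trotter and the direction dictionary are my own choices for illustration.

```python
from itertools import islice


def johnson_trotter(n):
    """Yield the permutations of 1..n in Johnson-Trotter order."""
    perm = list(range(1, n + 1))
    # Direction each value points in: -1 is left, +1 is right.
    direction = {value: -1 for value in perm}
    yield tuple(perm)
    while True:
        # A value is mobile if it points at a smaller neighbour.
        # Find the position of the largest mobile value.
        mobile = -1
        for i, value in enumerate(perm):
            j = i + direction[value]
            if 0 <= j < n and perm[j] < value:
                if mobile == -1 or value > perm[mobile]:
                    mobile = i
        if mobile == -1:
            return  # no mobile value left: every permutation was produced
        value = perm[mobile]
        j = mobile + direction[value]
        # Swap the mobile value with the neighbour it points at.
        perm[mobile], perm[j] = perm[j], perm[mobile]
        # Reverse the direction of every value larger than the one moved.
        for other in perm:
            if other > value:
                direction[other] = -direction[other]
        yield tuple(perm)


# Print the first ten permutations of five elements.
for p in islice(johnson_trotter(5), 10):
    print(*p)
```

The first five lines printed are the ones already listed in the question. After 5 reaches the left end it can no longer move, so 4 takes one step left (giving 5 1 2 4 3), 5 reverses direction, and 5 then sweeps back to the right.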

Taguchi Method Programming Example

I've been asked to research some programming related to the "Taguchi Method", especially as it relates to Multi-variant testing. This is one of the first subjects I've tried to research that I've found zero, nada, zilch, code examples for, especially considering its mathematical basis.
I've found some books describing the math involved but it looks like I'm going to be doing some math brush up unless I can find some code examples I can relate to.
Is this one of those rare things that once you work out the programming, it's so valuable that no one shares? Or do I just fail at Taguchi + google?

Taguchi designs are the same thing as covering arrays. The basic idea is that if you have F data "fields" and every one can have N different values, it is possible to construct N^F different test cases. A covering array is basically a set of test cases that together cover all possible pairwise combinations of two field values, and the idea is to generate as small a set as possible. E.g. if F=3 and N=3, you have 27 possible test cases, but it is enough to have nine test cases if you aim for pairwise coverage:
Field A | Field B | Field C
---------------------------
1 1 1
1 2 2
1 3 3
2 1 2
2 2 3
2 3 1
3 1 3
3 2 1
3 3 2
In this table, you can choose any two fields and any two values and you can always find a row that contains the chosen values for the chosen fields.
Generating Taguchi designs in general is a difficult combinatorial problem.
You can generate Taguchi designs by various methods:
Branch and bound
Stochastic search (e.g. tabu search or simulated annealing)
Greedy search
Specific mathematical constructions for some specific structures
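As a small illustration of the last option, the nine-row table above can be reproduced with a Latin-square rule, and its pairwise coverage can be checked by brute force. This is a sketch for three fields only; build_design and covers_all_pairs are hypothetical names, and the rule C = (A + B - 2) mod N + 1 is just one construction that happens to match the table, not a general Taguchi generator.

```python
from itertools import combinations, product


def build_design(n=3):
    """Build n*n rows (a, b, c) where c follows a Latin-square rule."""
    return [
        (a, b, (a + b - 2) % n + 1)
        for a in range(1, n + 1)
        for b in range(1, n + 1)
    ]


def covers_all_pairs(rows, n):
    """Check that every pair of fields shows every pair of values 1..n."""
    wanted = set(product(range(1, n + 1), repeat=2))
    num_fields = len(rows[0])
    for f1, f2 in combinations(range(num_fields), 2):
        seen = {(row[f1], row[f2]) for row in rows}
        if not wanted <= seen:
            return False
    return True


design = build_design(3)
for row in design:
    print(*row)
print("pairwise coverage:", covers_all_pairs(design, 3))
```

Dropping any single row breaks the coverage, which is why nine test cases is the minimum here: each pair of fields alone already needs 3 x 3 = 9 distinct value pairs.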
