I'm new to Stata. I need to perform multiple imputation in Stata, but I have run into a problem. I followed the instructions using the following code:
use http://www.stata-press.com/data/r11/mheart1s20
mi describe
mi impute regress bmi attack smokes age female hsgrad, add(20)
I got everything described in the instructions. However, I want to find the output file (the completed data).
There is no separate output data file; the 'completed' data is in memory. If you do mi describe again, you'll see that the dataset in memory now contains M = 40 imputations, whereas the output from the previous mi describe showed it contained M = 20 imputations. So you have added 20 imputations to the dataset, as specified by the add(20) option to your mi impute command.
Summary
I have a database of ~1500 entries. Some of them have more than one entry (up to 5) under the same identifier. I have a pre-set list of IDs I would like to check to see if they have multiple records, and if so, pull all of them, so I can compare them for differences.
Current solution
I currently have this working correctly, using a method to pull the 1st, 2nd, 3rd, 4th and 5th response respectively. The current formula is:
=INDEX('DATABASE'!$B:$B,SMALL(IF($A2='DATABASE'!$A:$A,ROW('DATABASE'!$A:$A)-ROW('DATABASE'!$A$1)+1),1))
returns the 1st result,
=INDEX('DATABASE'!$B:$B,SMALL(IF($A2='DATABASE'!$A:$A,ROW('DATABASE'!$A:$A)-ROW('DATABASE'!$A$1)+1),2))
returns the 2nd, and so on. The first formula takes the value in A2 (ID2Check) and matches it against column A of the 'DATABASE' sheet, returning the first matching value from column B. The second formula does the same, except that it returns the second match from column B. ID2Check is populated by me, and Results 1-5 are the formula output. For example:
Database:
ID      Data Item
AAA     Lemons
111     Greenhouse
FOO     Computer
AAA     Monitor
CAT     Coffee
ORANGE  Pintglass
123     Birthday
FOO     Avengers
AAA     Plasters
FOO     NachoTaco
Code output:
ID2Check  Result 1  Result 2  Result 3   Result 4  Result 5
AAA       Lemons    Monitor
123       Birthday
FOO       Computer  Avengers  NachoTaco
Problem
This solution works correctly and pulls the expected values. The problem is that the formula needs to be entered 5 times per identifier, and there are ~1500 records to check, so I need to fill in 7500 formulas. Unsurprisingly, this leads to Excel crashing or becoming unresponsive for the best part of 30 minutes. I am looking for a solution that is more efficient and stable when run en masse. Any assistance would be appreciated.
Final Notes / Info
I'm using MSO 365, version 2202 (I cannot update beyond this). This will be run in the Desktop version of Excel. I would prefer this is done exclusively using formulas, but I am open to using Visual Basic if it would be otherwise impossible or incredibly inefficient.
=UNIQUE(C5:C15)
=TRANSPOSE(FILTER($D$5:$D$15;$C$5:$C$15=F5))
If you have access to the HSTACK() function, you could try:
=HSTACK(UNIQUE(A2:A11),TEXTSPLIT(TEXTJOIN("/",FALSE,BYROW(UNIQUE(A2:A11),LAMBDA(x,TEXTJOIN("|",TRUE,FILTER(B2:B11,A2:A11=x))))),"|","/",,,""))
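For reference, the grouping these formulas perform (one row per ID, with its matches spread across columns) can be sketched outside Excel. This is only a Python illustration of the logic, not an Excel solution; the rows are the sample data from the question, and the function name matches_by_id is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question's Database sheet: (ID, Data Item).
DATABASE = [
    ("AAA", "Lemons"), ("111", "Greenhouse"), ("FOO", "Computer"),
    ("AAA", "Monitor"), ("CAT", "Coffee"), ("ORANGE", "Pintglass"),
    ("123", "Birthday"), ("FOO", "Avengers"), ("AAA", "Plasters"),
    ("FOO", "NachoTaco"),
]


def matches_by_id(rows, ids_to_check, max_results=5):
    """Return {id: [items...]} for each requested ID, in sheet order."""
    grouped = defaultdict(list)
    for key, item in rows:  # a single pass over the data
        grouped[key].append(item)
    return {key: grouped[key][:max_results] for key in ids_to_check}


result = matches_by_id(DATABASE, ["AAA", "123", "FOO"])
for key, items in result.items():
    print(key, *items)
```

The point of the sketch is that the data is scanned once and grouped, rather than rescanned for every result cell, which is what makes the FILTER-based formulas lighter than 7500 separate INDEX/SMALL array formulas.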
Lua newbie here, having trouble with all the different ways of looping.
I adjusted a template script for mapping audio files in my filesystem to an audio sampler software, where each audio file goes into 1 group --> zone --> file.
This works as intended, but for a "Lite-Version" package of the software I only need some of the samples from my computer, and this is where I'm having problems with my loop functions.
This is the function causing issues:
for index, file in next, samples do
...
The table "samples" consists of 180 samples but I'm only creating 38 groups and zones in the audio software for putting audio files into, so obviously this returns an error.
ERROR:
" ...- PATH.LINE: bad argument #2 to '--index' (invalid index, size is 58 got 58)"
The 58 makes sense because it is 38 sample groups + 18 empty groups + 2 template groups that get duplicated.
Since that is still smaller than 180 it wants to keep going and I'm not sure how to tell it to stop there.
CODE:
local samples = {}
local root = 0
-- [[ FILE SYSTEM]]
local i = 1
for _,p in filesystem.directoryRecursive(folderPath) do
    if filesystem.isRegularFile(p) then
        if filesystem.extension(p) == '.wav' or filesystem.extension(p) == '.aif' or filesystem.extension(p) == '.aiff' then
            samples[i] = p
            i = i+1
        end
    end
end
After this I create the amount of groups I need in the audio software (38).
A group contains a zone which contains a sample
-- [[ ZONES & FILES ]]
-- Create zones and place one file in each of the created groups.
-- file is a string property of the object Zones
-- samples is a table, populated with paths to our samples
for index, file in next, samples do
    -- Initialize the zone variable.
    local z = Zone()
    -- Add a zone for each sample.
    instrument.groups[index +1].zones:add(z)
    -- Populate the attached zone with a sample from our table.
    z.file = file
    -- detect and set root note
    local detectedPitch = mir.detectPitch(index)
    z.rootKey = math.floor(detectedPitch + 0.5)
end
So my question is: how do I loop through the table samples but only do it up to 38 and not to 180?
I could do
for index = 1, #NUM_FACT_LAYERS do
but what do I do about file then?
samples only has the paths to the files, but z.file needs the string of the file.
z.file = file won't work in this case, because the numeric loop no longer defines file.
I guess that's why the previous script used for ... in.
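The idea in the last few lines can be sketched as follows: in the pair-wise loop, file is just the value stored at samples[index], so a bounded numeric loop can recover it by indexing. The sketch is written in Python purely for illustration; the file names, the limit constant NUM_GROUPS and the assigned list are stand-ins and not part of the sampler's API.

```python
# Stand-in for the Lua table `samples`: 180 file paths (hypothetical names).
samples = [f"/audio/sample_{n:03d}.wav" for n in range(1, 181)]

NUM_GROUPS = 38  # number of groups/zones actually created (from the question)

# A bounded numeric loop: look each path up by position instead of
# receiving it from the iterator. Stops at 38 even though 180 exist.
assigned = []
for index in range(min(NUM_GROUPS, len(samples))):
    file = samples[index]               # same value the pair-wise loop yields
    assigned.append((index + 1, file))  # 1-based, like Lua table indices

print(len(assigned), assigned[0], assigned[-1])
```

In Lua the corresponding loop would be for index = 1, 38 do with local file = samples[index] as its first line; the rest of the body can stay as it is.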
I have 50 external Excel files. For each of these files, call it #I, I import the data with the following SPSS Statistics 25 syntax:
GET DATA /TYPE=XLSX
/FILE='file#I.xlsx'
/SHEET=name 'Sheet2'
/CELLRANGE=full
/READNAMES=on
/ASSUMEDSTRWIDTH=32767.
EXECUTE.
DATASET NAME DataSet1 WINDOW=FRONT.
Then I rank the variables in file #I (WA and CI) and select at most one case, as follows:
RANK VARIABLES= WA CI (D)
/RANK
/PRINT=YES
/TIES=LOW.
COUNT SVAR= RWA RCI (1).
SELECT IF( SVAR=2).
EXECUTE.
The task is the following:
I need to print the sum of the RWA values, looping over each Excel file #I. RWA is either 1 or empty. If no case is selected (RWA is empty), that file should contribute 0 to the sum. The final outcome should be the number of times RWA and RCI share the same top rank across the 50 Excel files.
How can I do this in a smart way?
Since I can't see the real data files, the following is a little in the dark, but I think it should be a viable strategy (you might as well try :)):
* first defining a macro to stack all the files together.
define stackFiles ()
GET DATA /TYPE=XLSX /FILE='file1.xlsx'
/SHEET=name 'Sheet2' /CELLRANGE=full /READNAMES=on /ASSUMEDSTRWIDTH=32767 /keep WA CI.
compute source=1.
exe.
dataset name gen.
!do !i=2 !to 50
GET DATA /TYPE=XLSX /FILE=!concat("'file", !i, ".xlsx'")
/SHEET=name 'Sheet2' /CELLRANGE=full /READNAMES=on /ASSUMEDSTRWIDTH=32767 /keep WA CI.
compute source=!i.
exe.
add files /file=gen /file=*.
exe.
!doend.
!enddefine.
* now run the macro.
stackFiles .
* now for the rest of the analysis.
* first split the data by source file, then rank and select.
sort cases by source.
split file by source.
RANK VARIABLES= WA CI (D) /RANK /PRINT=YES /TIES=LOW.
COUNT SVAR= RWA RCI (1).
SELECT IF SVAR=2.
EXECUTE.
At this point you have up to 50 rows remaining: 0 or 1 from each original file. You can count or sum them using descriptives RWA.
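To make the counting logic concrete, here is a minimal sketch in Python rather than SPSS syntax. It assumes that rank 1 means the largest value (as with RANK ... (D) and TIES=LOW, where tied top values all receive rank 1) and counts a file when at least one row holds rank 1 on both WA and CI. The data and the function name top_ranks_coincide are made up for illustration, and missing values are not handled.

```python
def top_ranks_coincide(rows):
    """True if some row holds the highest WA and the highest CI at once.

    rows: list of (wa, ci) pairs for one file.
    """
    if not rows:
        return False  # no cases: the file contributes 0
    max_wa = max(wa for wa, _ in rows)
    max_ci = max(ci for _, ci in rows)
    return any(wa == max_wa and ci == max_ci for wa, ci in rows)


# Hypothetical data standing in for three of the Excel files.
files = {
    1: [(0.9, 0.8), (0.5, 0.4)],  # the same row tops both variables
    2: [(0.9, 0.1), (0.2, 0.7)],  # different rows top WA and CI
    3: [],                        # no cases at all
}

# Booleans sum as 0/1, so this is the number of files that qualify.
total = sum(top_ranks_coincide(rows) for rows in files.values())
print(total)
```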
I have a CSV file. It logs some data depending on the test condition.
The header of this CSV file looks like this:
UTC Time(s)  SVID-1  Constel-1  Status-1  Zij-1  SVID-2  Constel-2  Status-2  Zij-2  SVID-3  Constel-3  Status-3  Zij-3  .......
10102        1       G          P         0      2       G          P         0.3    3       G          A         --
.....
Apart from the UTC Time column, the number of columns may increase or decrease depending
on the test condition or the number of satellites I use.
If a satellite is added or removed, the corresponding SVID, Constel, Status and Zij columns will be present or absent accordingly.
I would like to know whether it is possible to create a variable at run time for each column without looking at the CSV file header in advance.
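One way to approach this is to read the header at run time and derive the per-satellite fields from the column names instead of hard-coding them. The following is a minimal sketch in Python (the question does not name a language, so that choice is an assumption). It also assumes a comma-separated file and column names of the form Name-N as shown above; the sample text and the function name read_satellite_rows are hypothetical.

```python
import csv
import io
import re

# Hypothetical sample in the layout described above (comma-separated).
SAMPLE = (
    "UTC Time(s),SVID-1,Constel-1,Status-1,Zij-1,"
    "SVID-2,Constel-2,Status-2,Zij-2\n"
    "10102,1,G,P,0,2,G,P,0.3\n"
)


def read_satellite_rows(stream):
    """Yield (utc_time, {sat_index: {field: value}}) for each data row."""
    reader = csv.DictReader(stream)        # column names come from the header
    pattern = re.compile(r"^(.+)-(\d+)$")  # e.g. "Status-2" -> ("Status", "2")
    for row in reader:
        sats = {}
        for name, value in row.items():
            m = pattern.match(name)
            if m:  # "UTC Time(s)" has no -N suffix and is skipped here
                field, idx = m.group(1), int(m.group(2))
                sats.setdefault(idx, {})[field] = value
        yield row["UTC Time(s)"], sats


rows = list(read_satellite_rows(io.StringIO(SAMPLE)))
for utc, sats in rows:
    print(utc, sats)
```

A dictionary keyed by satellite index is used instead of creating one named variable per column, so the same code works however many satellites appear in a given log.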
I have a dataset of patient data organised by the site at which they visited our mobile clinic. I have written a series of commands, such as freqs and crosstabs, to produce the analyses I need; however, I would like these to be run for the patients at each site rather than for the dataset as a whole.
If I had only one site, a simple filter command on the variable that specifies a patient's site would suffice, but I have 19 sites, so I would like to find a way to loop through my code to produce these outputs for each site. That is to say, for i in 1 to 19:
1. Take the i th site
2. Compute a filter for this i th site
3. Run the tables using this filtered data of patients at ith site
Here is my first attempt using DO REPEAT. I also tried using LOOP earlier.
However, it does not work: I keep getting an error even though these are closed loops.
Is there a way to do this in SPSS syntax? Bear in mind that I do not know Python well enough to do this using that plugin.
*LOOP #ind= 1 TO 19 BY 1.
DO REPEAT #ind= 1 TO 20.
****8888888888888888888888888888888888888888888888888888888 Select the Site here.
COMPUTE filter_site=(RCDSITE=#ind).
USE ALL.
FILTER BY filter_site.
**********************Step 3: Apply the necessary code for tables
*********Participation in the wellness screening, we actually do not care about those who did FP as we are not reporting it.
COUNT BIO= CheckB (1).
* COUNT FPS=CheckF(1).
* COUNT BnF= CheckB CheckF(1).
VAL LABEL BIO
1 ' Has the Wellness screening'
0 'Does not have the wellness screening'.
*VAL LABEL FPS
1 'Has the First patient survey'.
* VAL LABEL BnF
1 'Has either Wellness or FPS'
2 'Has both surveys done'.
FREQ BIO.
*************************Use simple math to calcuate those who only did the Wellness/First Patient survey FUB= F+B -FnB.
*******************************************************Executive Summary.
***********Blood Pressure.
FREQ BP.
*******************BMI.
FREQ BMI.
******************Waist Circumference.
FREQ OBESITY.
******************Glucose.
FREQ GLUCOSE.
*******************Cholesterol.
FREQ TC.
************************ Heamoglobin.
FREQ HAEMOGLOBIN.
*********************HIV.
FREQ HIV.
******************************************************************************I Lifestyle and General Health.
MISSING VALUES Gender GroupDep B8 to B13 ('').
******************Graphs 3.1
Is it just Frequencies you are producing? Try the SPLIT FILE command with the variable RCDSITE. That should be enough.
SPLIT FILE allows you to partition your data by up to eight variables. Each procedure will then automatically iterate over each group.
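Conceptually, splitting by site and then running a frequency table amounts to building one table per group. The sketch below shows that idea in Python only as an illustration; it is not SPSS syntax and is not needed to use SPLIT FILE. The records and the function name frequencies_by_site are made up.

```python
from collections import Counter, defaultdict

# Hypothetical records: (RCDSITE, BP category). Values are made up.
records = [
    (1, "normal"), (1, "high"), (1, "normal"),
    (2, "high"), (2, "high"),
    (3, "normal"),
]


def frequencies_by_site(rows):
    """Return {site: Counter of values}, i.e. one frequency table per site."""
    tables = defaultdict(Counter)
    for site, value in rows:
        tables[site][value] += 1
    return dict(tables)


tables = frequencies_by_site(records)
for site in sorted(tables):
    print(site, dict(tables[site]))
```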
If you need to group the results at a higher level than the procedure, that is, to run a bunch of procedures for each group before moving on to the next one so that all the output for a group is together, you can use the SPSSINC SPLIT DATASET and SPSSINC PROCESS FILES extension commands.
These commands require the Python Essentials. The Essentials and the commands can be downloaded from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral) if you have at least version 18.
HTH,
Jon Peck
A simple but perhaps not very elegant way is to select Data/Select Cases/If condition from the menu. There you enter the filter for site 1 and press Paste, not OK.
This gives you the filter as syntax code.
With some copy/paste/replace/repeat you can then get the freqs and all other results for each of the sites.