SPSS loop ROC analysis for lots of variables - loops

In SPSS, I would like to perform ROC analysis for lots of variables (989). The problem is that when I select all variables, it gives me the AUC values and the curves, but a case is excluded as soon as it has a missing value in any of the 989 variables. So I was thinking of putting a single-variable ROC analysis into a loop, but I have no idea how to do so. I have already named all the variables var1, var2, var3, ..., var988, var989.
So, how could I loop a ROC analysis? (Checking "Treat user-missing values as valid" doesn't do the trick.)
Thanks!

This sounds like a job for Python. It's usually the best solution for this sort of job in SPSS.
So here's a framework that might help you. I am woefully unfamiliar with ROC analysis, but this general pattern is applicable to all kinds of looping scenarios:
begin program.
import spss
for i in range(spss.GetVariableCount()):
    var = spss.GetVariableName(i)
    cmd = r'''
* your variable-wise analysis goes here --> use spss syntax, between the three ' no
* indentation is needed. since I don't know what your syntax looks like, we'll just
* run descriptives and frequencies for all your variables as an example
descriptives %(var)s
  /sta mean stddev min max.
fre %(var)s.
'''%locals()
    spss.Submit(cmd)
end program.
Just to quickly go over what this does: the for line tells SPSS to do the following as many times as there are variables in the active dataset, 989 in your case. The line after it defines a (Python) variable named var which contains the name of the variable at index i (0 to 988, the first variable in the dataset having index 0). Then we define a command for SPSS to execute. I like to put it in raw strings because that simplifies things like giving directories. The raw string starts at the r''' and ends at the closing '''. "spss.Submit(cmd)" hands the command defined after "cmd = " to SPSS for execution. Most importantly though, wherever the name of the variable would appear in your syntax, substitute it with "%(var)s".
If you put "set mprint on." a line above the "begin program." you'll see exactly what it does in the viewer.
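To make the substitution step concrete for the ROC case, here is a minimal sketch that only builds the command strings, so it can be tried in plain Python without SPSS. The state variable name "outcome" and the positive value 1 are assumptions, not taken from the question, and the subcommands shown are what I believe the ROC dialog pastes; check them against your SPSS version.

```python
# Sketch: build one ROC command per test variable.
# "outcome" (state variable) and 1 (positive value) are hypothetical placeholders.
ROC_TEMPLATE = r'''
ROC %(var)s BY %(state)s (%(positive)s)
  /PLOT=NONE
  /PRINT=SE
  /MISSING=EXCLUDE.
'''


def build_roc_commands(var_names, state="outcome", positive=1):
    """Return one ROC syntax string per name in var_names."""
    commands = []
    for var in var_names:
        # Same %-substitution idea as '%(var)s' % locals() in the framework above.
        commands.append(
            ROC_TEMPLATE % {"var": var, "state": state, "positive": positive}
        )
    return commands


# var1 ... var989, as named in the question
names = ["var%d" % n for n in range(1, 990)]
commands = build_roc_commands(names)
print(len(commands))  # 989
print(commands[0])
```

Inside a begin program block, each string would be handed to spss.Submit in turn. Because every run then contains a single test variable, a case is dropped only from the runs where that particular variable (or the state variable) is missing.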

Related

Looping through a set of variables for R package analysis

Here's a novice question; being new to R, this has got to be one.
I am trying to run an R package that analyzes "csv" data using the following R scripts:
library(agricolae)
LXTOUTPUT2<-with(RLINXTES2, lineXtester(Replication, Lines, Tester, Y))
All elements analyzed by the function "lineXtester" are numerics.
Analyzing 1 variable is fine. However, I have several variables to supply as "Y" and would like to run this through as one chunk.
I tried the "for loop" but couldn't find the right script that would cycle thru all variables.
Is there a better, faster option than a "for loop"? I read about "vectorizing", but R is still strange to me.
Would greatly appreciate your help.
Thank you.
My sincere apology. I was finally able to figure out my problem by reading and learning more about "vectorization" and applying it to my dataframe and accessing the elements using the [[]] indexing.
Indeed, it is much simpler and faster than using the "for loop".
Please disregard my request for help.
Thank you just the same.

Prolog - load 2D array from text file into program

I'm currently working on a Prolog program which would, logically, have some kind of "save/load" feature. I've gotten the save part to work, where I'm now (as a start) creating three *.txt files, which will contain a 2D array/list each. However, I'm facing some issues trying to load it back into the program.
What I have right now is something as simple as:
% Initialize globals
?- nb_setval(aisles_global, []).
% Load all previously saved data from the given .txt files
load_all():-
    exists_file('C:\\Users\\Xariez\\Desktop\\aisles.txt'),
    open('C:\\Users\\Xariez\\Desktop\\aisles.txt', read, InAisles),
    read_line_to_codes(InAisles, AisleString),
    % read_line_to_string(InAisles, AisleString),
    writeln(AisleString),
    nb_setval(aisles_global, AisleString),
    close(InAisles).
As previously mentioned, the files will have a 2D array each, but as an example:
aisles.txt
[["Beer", "Cider" ], [ "Milk", "Juice" ], ["Light Bread", "Dark Bread"]]
I've tried using both read_line_to_codes/2 and read_line_to_string/2. While it technically works when reading it into codes, I feel like it would quickly become annoying to reconstruct a 2D list/array since it's now got every character as a code. And while reading into a string succeeds in the reading part, we now have a string that LOOKS like a list, but isn't really one (if I've understood this situation correctly?). And hence I'm here.
If anyone has any ideas/help, that'd be greatly appreciated. Thanks!
Prolog has predicates for doing input/output of terms directly. You don't need to roll these yourself. Reading terms is done using read, while for writing there are several options.
Your best shot for writing is probably write_canonical, which will write terms in "canonical" syntax. This means that everything is quoted as needed (for example, an atom 'A' will be printed as 'A' and not as plain A like write would print it), and terms with operators are printed in prefix syntax, which means you get the same term even if the reader doesn't have the same operators declared (for example, x is y is printed as is(x, y)).
So you can write your output like:
dump(Aisles, Filename) :-
    open(Filename, write, OutAisles),
    write_canonical(OutAisles, Aisles),
    write(OutAisles, '.'),
    close(OutAisles).
Writing the . is necessary because read expects to read a term terminated by a period. Your reading predicate could be:
load(Aisles, Filename) :-
    open(Filename, read, InAisles),
    read(InAisles, Aisles),
    close(InAisles).
Running this using some example data:
?- aisles(As), dump(As, aisles).
As = [["Beer", "Cider"], x is y, 'A', _G1380, ["Milk", "Juice"], ["Light Bread", "Dark Bread"]].
?- load(As, aisles).
As = [["Beer", "Cider"], x is y, 'A', _G1338, ["Milk", "Juice"], ["Light Bread", "Dark Bread"]].
The contents of the file, as you can check in a text editor, is:
[["Beer","Cider"],is(x,y),'A',_,["Milk","Juice"],["Light Bread","Dark Bread"]].
Note the canonical syntax for is. You should almost certainly avoid writing variables, but this shouldn't be a problem in your case.

Renaming and creating variables in a list of Stata files

I have a list of Stata datasets: among some a variable tor is absent, and I want to add that variable if it doesn't exist.
The datasets contain a variable called xclass where x could be anything (e.g. Aclass, lclass, etc.). I would like to rename those variables to dec.
I want to create a variable adjusted which is "yes" if the file name contains adjusted and "no" if not.
I guess it would look something like:
Loop through list of datasets and their variables {
if variable contains pattern class
rename to dec
if no variable tor, then
gen str tor = total
if file name contains pattern adjusted
gen str adjusted = yes
else gen str adjusted = no
}
But then in proper Stata language.
So I've got this now, but it's not working, it doesn't do anything...
cd "C:\Users\test"
local filelist: dir "." files "*.dta", respectcase
foreach filename of local myfilelist {
    ds *class
    local found `r(varlist)'
    local nfound : word count `found'
    if `nfound' == 1 {
        rename `found' dec
    }
    else if `nfound' > 1 {
        di as err "warning: multiple *class variables in `filename'"
    }
    capture confirm var tor
    if !_rc == 0 {
        gen tor = "total"
    }
    gen adjusted = cond(strpos("`filename'", "_adjusted_"), "yes", "no")
}
This is not an answer, this is advice that won't fit into a comment.
What you are attempting is not elementary Stata. If indeed you are unfamiliar with Stata (not stata) you will find it challenging to automate this process. I'm sympathetic to you as a new user of Stata - it's a lot to absorb. And even worse if perhaps you are under pressure to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.
When I began using Stata in a serious way, I started by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.
All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax.
The Stata documentation is really exemplary - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry.
With that said, you will perhaps find the foreach command helpful for looping, the filelist command for obtaining a list of Stata datasets (not databases), and the ds command for obtaining a list of variable names within a Stata dataset. More subtly, the capture command will let you attempt to generate your tor variable and will simply fail gracefully if it already exists, saving a small amount of program logic.
The middle part can be sketched:
// assumes local macro filename contains file name
ds *class
local found `r(varlist)'
local nfound : word count `found'
if `nfound' == 1 {
    rename `found' dec
}
else if `nfound' > 1 {
    di as err "warning: multiple *class variables in `filename'"
}
capture confirm var tor
if _rc {
    gen tor = "total"
}
gen adjusted = cond(strpos("`filename'", "adjusted"), "yes", "no")
On managing lists of files: filelist (SSC) is very good; also see fs (SSC) for a different approach.
EDIT: Here is proof of concept for the last detail:
. local filename1 "something adjusted somehow"
. local filename2 "frog toad newt dragon"
. di cond(strpos("`filename1'", "adjusted"), "yes", "no")
yes
. di cond(strpos("`filename2'", "adjusted"), "yes", "no")
no
strpos("<string1>", "<string2>") returns a non-zero result, namely the starting position of the second string in the first if the first contains the second. Non-zero as an argument means true in Stata; zero means false.
See help strpos() and if desired help cond().
I can't see your filenames to comment or test your code, but one possible problem is that the local macro is not defined in the same namespace as that in which you are trying to evaluate the expression. (That's what local means.) A macro that isn't defined will be evaluated as an empty string, with the result you mention.

SPSS: Use index variable inside quotation marks

I have several datasets over which I want to run identical commands.
My basic idea is to create a vector with the names of the datasets and loop over it, using the specified name in my GET command:
VECTOR=(9) D = Name1 to Name9.
LOOP #i = 1 to 9.
GET
FILE = Directory\D(#i).sav
VALUE LABELS V1 to V8 'some text D(#i)'
LOOP END.
Now SPSS doesn't recognize that I want it to use the specific value of the vector D.
In Stata I'd use
local D(V1 to V8)
foreach D{
....`D' .....
}
You can't use VECTOR in this way, i.e. by using the GET command within a VECTOR/LOOP loop.
However, you can use DEFINE/!ENDDEFINE. This is SPSS's native macro facility. If you are not familiar with it, you will most likely need to do a lot of reading to understand its syntax and usage.
Here's an example:
DEFINE !RunJob ()
!DO !i = 1 !TO 9
GET FILE = !CONCAT("Directory\D(",!i,").sav").
VALUE LABELS V1 to V8 !QUOTE(!CONCAT("some text D(",!i,")")).
!DOEND
!ENDDEFINE.
SET MPRINT ON.
!RunJob.
SET MPRINT OFF.
All the code between DEFINE and !ENDDEFINE is the body of the macro, and the !RunJob. call near the end of the syntax then executes the procedures defined in the macro.
This is a very simple use of a macro with no parameters/arguments assigned, but there is scope for much more complexity.
If you are new to DEFINE/!ENDDEFINE, I would actually suggest you NOT invest time in learning it but instead learn Python programmability, which can be used to achieve the same (and much more) with relative ease compared to DEFINE/!ENDDEFINE.
A Python solution to your example would look like this (you will need Python programmability integration with your SPSS):
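BEGIN PROGRAM.
import spss
# range(1, 9 + 1) yields 1..9
for i in range(1, 9 + 1):
    # raw string so the backslash in the path is passed through unchanged
    spss.Submit(r"""
GET FILE = Directory\D(%(i)s).sav.
VALUE LABELS V1 to V8 'some text D(%(i)s)'.""" % locals())
END PROGRAM.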
As you will notice, the Python solution is much simpler.
@Caspar: use Python for SPSS for such jobs. SPSS macros have long been deprecated and are better avoided.
If you use Python for this, you don't even have to type in the file names: you can simply look up all file names in some folder that end with ".sav" as shown in this example.
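As a rough sketch of that lookup (not the linked example itself), the following builds one GET FILE command per ".sav" file in a folder. The function name build_get_commands and the folder "." are hypothetical, and the quoting assumes the paths contain no apostrophes.

```python
import glob
import os


def build_get_commands(folder):
    """Return one GET FILE command per .sav file found in folder, sorted by name."""
    pattern = os.path.join(folder, "*.sav")
    commands = []
    for path in sorted(glob.glob(pattern)):
        # Single-quote the path for SPSS; assumes no apostrophe in the path.
        commands.append("GET FILE = '%s'." % path)
    return commands


if __name__ == "__main__":
    # "." (the current folder) is only a placeholder; point it at your data folder.
    for cmd in build_get_commands("."):
        print(cmd)
```

Inside a BEGIN PROGRAM block, each command would be passed to spss.Submit, followed by whatever syntax should run on that file.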
HTH!
The Python approach is, as Ruben says, much superior to the old macro facility, but you can use the SPSSINC PROCESS FILES extension command to do tasks like this without any need to know Python. PROCESS FILES is included in the Python Essentials in recent versions of Statistics and can be downloaded from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral) for older versions.
The idea is that you create a syntax file that works on one data file, and PROCESS FILES iterates that over a list of input files or a wildcard specification. For each file, it defines file handles and macros that you can use in the syntax file to open and process the data.

String substitution using tcl API

Is there a way to (ab)use the tcl C-API to 'parse' a string, doing all the replacement (including sub commands in square brackets), but stopping before actually evaluating the resulting command line?
What I'm trying to do is create a command (in C, but I'll consider doing a tcl-wrapper, if there's an elegant way to do it there) which takes a block as a parameter (i.e. curly-braces-quoted-string). I'd like to take that block, split it up and perform substitutions in the same way as if it was to be executed, but stop there and interpret the resulting lines instead.
I've considered creating a namespace, where all valid first-words are defined as commands, however this list is so vast (and pretty much dynamic) so it quickly becomes too cumbersome. I also tried this approach but with the unknown command to intercept the different commands. However, unknown is used for a bunch of stuff, and cannot be bound to a namespace, so I'd have to define it whenever I execute the block, and set it back to whatever it was before when I'm done, which feels pretty shaky. On top of that I'd run the risk (fairly low risk, but not zero) of colliding with an actual command, so I'd very much prefer to not use the unknown command.
The closest I can get is Tcl_ParseCommand (and the rest of the family), which produces a parse tree, which I could manually evaluate. I guess I'll resort to doing it this way if there's no better solution, but I would of course prefer it, if there was an 'official' way..
Am I missing something?
Take a look at Tcl_SubstObj. It's the C equivalent of the [subst] command, which appears to be what you're looking for.
As you indicated in your comment, subst doesn't quite do what you're looking to do. If it helps, the following Tcl code may be what you're looking for:
> set mydata {mylist item $listitem group item {$group item}}
> set listitem {1 2 3}
> subst $mydata ;# error: can't read "group": no such variable
> proc groupsubst {data} {
    return [uplevel 1 list $data]
}
> groupsubst $mydata ;# mylist item {1 2 3} group item {$group item}
