SPSS: Use index variable inside quotation marks - loops

I have several datasets over which I want to run identical commands.
My basic idea is to create a vector with the names of the datasets and loop over it, using the specified name in my GET command:
VECTOR=(9) D = Name1 to Name9.
LOOP #i = 1 to 9.
GET
FILE = Directory\D(#i).sav
VALUE LABELS V1 to V8 'some text D(#i)'
LOOP END.
Now SPSS doesn't recognize that I want it to use the specific value of the vector D.
In Stata I'd use
local D(V1 to V8)
foreach D{
....`D' .....
}

You can't use VECTOR in this way, i.e. you can't run a GET command inside a VECTOR/LOOP structure.
However, you can use DEFINE/!ENDDEFINE. This is SPSS's native macro facility; if you are not familiar with it, you'll most likely need to do a lot of reading to understand its syntax and usage.
Here's an example:
DEFINE !RunJob ()
!DO !i = 1 !TO 9
GET FILE = !QUOTE(!CONCAT("Directory\D(", !i, ").sav")).
VALUE LABELS V1 to V8 !QUOTE(!CONCAT("some text D(", !i, ")")).
!DOEND
!ENDDEFINE.
SET MPRINT ON.
!RunJob.
SET MPRINT OFF.
All the code between DEFINE and !ENDDEFINE is the body of the macro. The !RunJob. call near the end then executes the commands defined in the macro.
This is a very simple use of a macro with no parameters/arguments, but there is scope for much more complexity.
If you are new to DEFINE/!ENDDEFINE, I would actually suggest you NOT invest time in learning it and instead learn Python programmability, which can achieve the same (and much more) with relative ease compared to DEFINE/!ENDDEFINE.
A Python solution to your example would look like this (you will need the Python programmability integration for your SPSS installation):
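BEGIN PROGRAM.
import spss
# range(1, 9 + 1) yields 1..9; the raw string keeps the backslash in the path literal.
for i in range(1, 9 + 1):
    spss.Submit(r"""
GET FILE = 'Directory\D(%(i)s).sav'.
VALUE LABELS V1 to V8 'some text D(%(i)s)'.""" % locals())
END PROGRAM.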
As you can see, the Python solution is considerably simpler.

@Caspar: use Python for SPSS for such jobs. SPSS macros have long been deprecated and are better avoided.
If you use Python for this, you don't even have to type in the file names: you can simply look up all file names in some folder that end with ".sav" as shown in this example.
HTH!
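A minimal sketch of the folder lookup described above, assuming the data files sit in a single folder and end in ".sav". The helper name build_get_commands is hypothetical, and "Directory" is just the placeholder from the question. The sketch only builds the syntax strings; inside BEGIN PROGRAM ... END PROGRAM each string would presumably be passed to spss.Submit instead of being printed.

```python
import glob
import os


def build_get_commands(folder):
    """Return one SPSS GET FILE command per .sav file found in `folder`.

    Hypothetical helper for illustration; it does not talk to SPSS itself.
    """
    pattern = os.path.join(folder, "*.sav")
    commands = []
    # sorted() makes the processing order predictable across platforms.
    for path in sorted(glob.glob(pattern)):
        commands.append("GET FILE = '%s'." % path)
    return commands


if __name__ == "__main__":
    # "Directory" is the placeholder folder name used in the question.
    for command in build_get_commands("Directory"):
        # Inside an SPSS program block this would be spss.Submit(command).
        print(command)
```

Building the commands separately from submitting them makes it easy to inspect the generated syntax before anything is run.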

The Python approach is as Ruben says much superior to the old macro facility, but you can use the SPSSINC PROCESS FILES extension command to do tasks like this without any need to know Python. PROCESS FILES is included in the Python Essentials in recent versions of Statistics but can be downloaded from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral) in older versions.
The idea is that you create a syntax file that works on one data file, and PROCESS FILES iterates that over a list of input files or a wildcard specification. For each file, it defines file handles and macros that you can use in the syntax file to open and process the data.

Related

Why is Jena tdb2.tdbquery optimization stuck on "Reorder/generic"

I am using apache-jena-4.5.0 and fuseki pretty much out-of-the-box. I had created a TDB2 dataset using fuseki, but have now shut it off and am using the Jena command-line utilities on a Windows box inside a bash shell.
My basic command is:
java -cp "*" tdb2.tdbquery --loc ~/path/to/databases/DEMO --explain --set arq:logExec=FINE --time --query ~/path/to/demoquery.txt
And my question is why does the output always contain only Reorder/generic like this:
15:56:00 INFO exec :: Reorder/generic
Even after I have tried all these:
successfully run tdb2.tdbstats and gotten a reasonable-looking temp.opt file as output
moved that temp.opt to each of /path/to/DEMO/stats.opt and /path/to/DEMO/Data-001/stats.opt
tried uppercase STATS.OPT for each since I'm on Windows, just to be sure
Still, I don't seem to be able to produce any output with Reorder/stats.
This question did not contain enough detail to answer. The intended question was why won't TDB2 optimize my query and the answer was in the SPARQL, not in the invocation of tdb2.tdbquery or the location of the stats.opt file.
My SPARQL contained multiple FROM clauses, which forced TDB into BGP mode (instead of quads) and thwarted any optimization. As best we can tell at the moment, one wishing to use the TDB2 optimizer should use either the default graph, or a combination of FROM NAMED and GRAPH which causes the evaluation of graphs one at a time.

SPSS loop ROC analysis for lots of variables

In SPSS, I would like to perform ROC analysis for lots of variables (989). The problem is that when I select all variables, it gives me the AUC values and the curves, but a case is excluded as soon as it has a missing value in any of the 989 variables. So I was thinking of putting a single-variable ROC analysis into a loop, but I have no idea how to do that. I have already named all the variables var1, var2, var3, ..., var988, var989.
So, how could I loop a ROC analysis? (Checking "Treat user-missing values as valid" doesn't do the trick)
Thanks!
This sounds like a job for Python. It's usually the best solution for this sort of job in SPSS.
So here's a framework that might help you. I am woefully unfamiliar with ROC analysis, but this general pattern is applicable to all kinds of looping scenarios:
begin program.
import spss
for i in range(spss.GetVariableCount()):
    var = spss.GetVariableName(i)
    cmd = r'''
* your variable-wise analysis goes here --> use spss syntax, between the three ' no
* indentation is needed. since I dont know what your syntax looks like, we'll just
* run descriptives and frequencies for all your variables as an example.
descriptives %(var)s
/sta mean stddev min max.
fre %(var)s.
'''%locals()
    spss.Submit(cmd)
end program.
Just to quickly go over what this does: in line 3 (the for loop) we tell SPSS to do the following as many times as there are variables in the active dataset, 989 in your case. In line 4 we define a (Python) variable named var which contains the name of the variable at index i (0 to 988, the first variable in the dataset having index 0). Then we define a command for SPSS to execute. I like to put it in raw strings because that simplifies things like giving directories. A raw string starts at the r''' and ends at the ''' in line 12. "spss.Submit(cmd)" passes the command defined after "cmd = " to SPSS for execution. Most importantly though, wherever the name of the variable would appear in your syntax, substitute it with "%(var)s".
If you put "set mprint on." a line above the "begin program." you'll see exactly what it does in the viewer.
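To see what the "%(var)s" substitution produces without running SPSS at all, here is a small plain-Python sketch. The helper name build_syntax and the names var1 to var3 are stand-ins for illustration; in the real job the names would come from spss.GetVariableName(i) and the result would go to spss.Submit.

```python
def build_syntax(var):
    """Fill the SPSS syntax template for one variable name."""
    template = r'''
descriptives %(var)s
/sta mean stddev min max.
fre %(var)s.
'''
    # The % operator replaces every %(var)s with the value stored under "var".
    return template % {"var": var}


if __name__ == "__main__":
    # var1..var3 stand in for the names SPSS would report for the dataset.
    for name in ["var1", "var2", "var3"]:
        print(build_syntax(name))
```

Passing an explicit dictionary does the same job as % locals() in the answer, but makes it obvious which values the template depends on.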

How to run one TCL script from a batch/job file by passing command line arguments?

I am writing scripts in tcl for a project I am working on.
I wanted to automate things as much as possible and wanted to not touch the source code of the script as far as possible. I want to run the main script file from a .bat or .job file sort of thing where I pass the command to execute the script along with the arguments.
I have referred to this post on stackoverflow:
How to run tcl script in other tcl script?
And have done pretty much the same thing. However, since my script is naked code rather than a single huge proc, I don't have the "args" parameter to read the arguments I want to pass.
For example, if script1.tcl is the main file containing the naked code, I want a file script2.job or script2.bat such that,
<command-to-run-script1.tcl> <mandatory-args> <optional-args>
is the content of the file.
Any suggestions on how I can implement the same?
To run a Tcl script, passing in some arguments, do:
tclsh script1.tcl theFirstArgument theSecondArgument ...
That's how it works in CMD scripts/BAT files on Windows, and in shell scripts on all Unixes. You might want to put quotes around some of the arguments too, but that's just the absolutely normal way of running a program with arguments. (The tclsh might need to be tclsh8.5 or tclsh85 or … well, it depends on how it's installed. And script1.tcl might need to be a full path to the script.)
Inside the script, the arguments (starting at theFirstArgument) will appear in the Tcl list in the global argv variable. Note that this is not args, which is a feature of procedures. There are lots of ways of parsing the list of arguments, but any quoting supplied during the call itself should have been already stripped.
Here's a very simple version:
foreach argument $argv {
    puts "Oh, I seem to have a >>$argument<<"
}
You probably need something more elaborate! There are many possibilities though, so describe exactly what you need if you want more focused suggestions.
If you're calling Tcl from another Tcl script, you need to use exec to do it. On the other hand, you can make things a bit easier for yourself in other ways:
exec [info nameofexecutable] script1.tcl theFirstArgument theSecondArgument ...
The info nameofexecutable command returns the name of the Tcl interpreter program (often tclsh8.5 or wish86 or …)

Create functions in matlab

How can I create a function with MATLAB so I can call it anywhere in my code?
I'm new to MATLAB so I will write a PHP example of the code I want to write in MATLAB!
function newmatlab($n){
    $n = $n + 1;
    return $n;
}
$array = array('1','2','3','4');
foreach ($array as $x) {
    $result[] = newmatlab($x);
}
print_r($result);
So in a nutshell, I need to loop over an array and apply a function to each item in it.
Can someone show me the above function written in MATLAB so I can understand better?
Note: I need this because I wrote code that analyzes a video file and then plots data on a graph. I then save this graph to Excel and jpg. My problem is that I have more than 200 videos to analyze, so I need to automate this code to loop over folders, analyze each *.avi file inside, and so on.
As others have said, the documentation covers this pretty thoroughly, but perhaps we can help you understand.
There are a handful of ways that you can define functions in Matlab, but probably the most useful for you to get started is to define one in an m-file. I'll use your example code. You can do this by creating a file called newmatlab.m in your project's directory that looks something like this
% newmatlab.m
function result = newmatlab(array)
    result = array + 1;
end
Note that the function has the same name as the file and that there is no explicit return statement - it figures that out by what you've named the output parameter(s) (result in this case).
Then, in the same directory, you can create a script (or another function) that calls your newmatlab function by that name:
% main.m (or whatever)
a = [1 2 3 4];
b = newmatlab(a)
That's it! This is a simplified explanation, but hopefully enough to get you started and then the documentation can help more.
PS: There's no "include" in Matlab; any functions that are defined in m-files in the current path are visible. You can find out what's in the path by using the path command. Roughly, it's going to consist of
Matlab's own directory
The MATLAB subdirectory of your Documents directory
The current working directory

C - Reading multiple files

I just had a general question about how to approach a certain problem I'm facing. I'm fairly new to C, so bear with me here. Say I have a folder with 1000+ text files; the files are not named in any kind of numbered order, but they are alphabetical. For my problem I have files of stock data, each named after the company's ticker. I want to write a program that will open each file, read the data, find the historical low, compare it to the current price, calculate the percent change, and then print it. Searching and calculating are not a problem; the problem is getting the program to go through and open each file.
The only way I can see to attack this is to create a text file containing all of the ticker symbols, have the program read that into an array, and then run a loop that opens the first filename in the array, performs the calculations, prints the output, closes the file, and then loops back around to the second element (the next ticker symbol) in the array. This would be fairly simple to set up (I think), but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this? I'm not really asking for code (unless there is some amazing function in C that will do this for me ;) ), just some advice from more experienced C programmers.
Thanks :)
Edit: This is on Linux, sorry I forgot to mention that!
Under Linux/Unix (BSD, OS X, POSIX, etc.) you can use opendir / readdir to go through the directory structure. No need to generate static files that need to be updated, when the file system has the information you want. If you only want a sub-set of stocks at a given time, then using glob would be quicker, there is also scandir.
I don't know what Win32 (Windows / Platform SDK) functions are called, if you are developing using Visual C++ as your C compiler. Searching MSDN Library should help you.
Assuming you're running on linux...
ls /path/to/text/files > names.txt
is exactly what you want.
opendir(); on linux.
http://linux.die.net/man/3/opendir
Example:
http://snippets.dzone.com/posts/show/5734
In pseudo code it would look like this, I cannot define the code as I'm not 100% sure if this is the correct approach...
for each directory entry
    scan the filename
    extract the ticker name from the filename
    open the file
    read the data
    create a record consisting of the filename, data.....
    close the file
    add the record to a list/array...
sort the list/array into alphabetical order based on
the ticker name in the filename...
You could vary it slightly if you wish, scan the filenames in the directory entries and sort them first by building a record with the filenames first, then go back to the start of the list/array and open each one individually reading the data and putting it into the record then....
Hope this helps,
best regards,
Tom.
There are no functions in standard C that have any notion of a "directory". You will need to use some kind of platform-specific function to do this. For some examples, take a look at this post from Cprogramming.com.
Personally, I prefer using the opendir()/readdir() approach as shown in the second example. It works natively under Linux and also on Windows if you are using Cygwin.
Approach 1) I would just have a specific directory in which I have ONLY these files containing the ticker data and nothing else. I would then use the C readdir API to list all files in the directory and iterate over each one performing the data processing that you require. Which ticker the file applies to is determined only by the filename.
Pros: Easy to code
Cons: It really depends where the files are stored and where they come from.
Approach 2) Change the file format so the ticker files start with a magic code identifying that this is a ticker file, and a string containing the name. As before use readdir to iterate through all files in the folder and open each file, ensure that the magic number is set and read the ticker name from the file, and process the data as before
Pros: More flexible than before. Filename needn't reflect name of ticker
Cons: Harder to code, file format may be fixed.
but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this?
I have solved the exact same problem a while back, albeit for personal uses :)
What I did was to use the OS shell commands to generate a list of those files and redirected the output to a text file and had my program run through them.
On UNIX, there's the handy glob function:
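#include <glob.h>
#include <stdio.h>
#include <string.h>
int main(void) {
    glob_t results;
    memset(&results, 0, sizeof(results));
    glob("*.txt", 0, NULL, &results);   /* collect every .txt file in the current directory */
    for (size_t i = 0; i < results.gl_pathc; i++)
        printf("%s\n", results.gl_pathv[i]);
    globfree(&results);
    return 0;
}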
On Linux or a related system, you could use the fts library. It's designed for traversing file hierarchies: man fts,
or even something as simple as readdir
If on Windows, you can use the Directory Management APIs. More specifically, use the FindFirstFile function with wildcards, in conjunction with FindNextFile.

Resources