Looping through a set of variables for R package analysis

Here's a novice question; being new to R, this has got to be one.
I am trying to run an R package that analyzes CSV data using the following R script:
library(agricolae)
LXTOUTPUT2<-with(RLINXTES2, lineXtester(Replication, Lines, Tester, Y))
All elements analyzed by the lineXtester function are numeric.
Analyzing one variable is fine. However, I have several variables to supply as "Y" and would like to run them all in one go.
I tried a for loop but couldn't find the right script to cycle through all the variables.
Instead of a for loop, is there a better, faster option? I read about "vectorizing", but R is still strange territory for me.
Would greatly appreciate your help.
Thank you.

My sincere apologies. I was finally able to figure out my problem by reading and learning more about vectorization, applying it to my data frame and accessing the elements with [[ ]] indexing.
Indeed, it is much simpler and faster than a for loop.
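In case it helps other beginners, here is a minimal sketch of what worked for me; the response-column names below are hypothetical, so substitute the actual column names in your data frame:

library(agricolae)

# Hypothetical response columns; replace with the real column names in RLINXTES2.
responses <- c("yield", "height", "tillers")

# lapply runs the analysis once per response column, using [[ ]] to pull each one out.
results <- lapply(responses, function(y) {
  lineXtester(RLINXTES2$Replication, RLINXTES2$Lines, RLINXTES2$Tester,
              RLINXTES2[[y]])
})
names(results) <- responses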
Please disregard my request for help.
Thank you just the same.

Related

How to do batch shorthand loop for large arrays?

Hi, I'm new to writing batch scripts, so my code is simple.
I know you should use a for loop to iterate over an array's contents, but I saw a shorthand method that retrieves the contents with indexes between a specified range, like so:
array[1-7]
It works, but when I use two-digit numbers it runs into problems.
What am I doing wrong here?
appreciate your help :)

Advice on reading multiple text files into an array with Ruby

I'm currently writing out a program in Ruby, which I'm fairly new at, and it requires multiple text files to be pushed into an array line by line.
I am currently unable to actually test my code since I'm at work and this is for personal use, but I'm seeking advice on whether my code is correct. I know how to read a file and push it to an array. If possible, can someone check it over and advise if I have the correct idea? I'm self-taught in Ruby and have no one to check my work.
I understand if this isn't the right place for trying to get this sort of advice and it's deleted/locked. Apologies if so.
contentsArray = []
Dir.glob('filepath').each do |filename|
  next if File.directory?(filename)
  r = File.open("#{path}#{filename}")
  r.each_line { |line| contentsArray.push(line) }
end
I'm hoping this snippet will take the lines from multiple files in the same directory and stick them in the array so I can later splice what's in there.
Thank you for the question.
First, let's assume that 'filepath' in Dir.glob('filepath') is a glob pattern matching your target files (I used Dir.glob('src/*.h').each do |filename| in my test).
Next, File.open("#{path}#{filename}") prepends another path to the already complete path you'll have in filename.
And lastly, although this is probably not the problem, the code opens each file and never closes it. The IO class provides a readlines method that takes care of opening and closing the file for you.
Here's some working code that you can adapt:
contentsArray = []
Dir.glob('filepath').each do |filename|
  next if File.directory?(filename)
  lines = IO.readlines(filename)
  contentsArray.concat(lines)
end
puts "#{contentsArray.length} LINES"
Here are references to the Ruby docs for the IO::readlines and Array::concat methods used:
https://ruby-doc.org/core-2.5.5/IO.html#method-i-readlines
https://ruby-doc.org/core-2.5.5/Array.html#method-i-concat
As an alternative to bailing out early with next, the code could conditionally execute on files, like this:
if File.file?(filename)
  lines = IO.readlines(filename)
  contentsArray.concat(lines)
end

SPSS loop ROC analysis for lots of variables

In SPSS, I would like to perform ROC analysis for lots of variables (989). The problem is that when I select all the variables, it gives me the AUC values and the curves, but a case is excluded as soon as it has a missing value in any of the 989 variables. So I was thinking of putting a single-variable ROC analysis into a loop, but I don't have any idea how to do so. I have already named all the variables var1, var2, var3, ..., var988, var989.
So, how could I loop a ROC analysis? (Checking "Treat user-missing values as valid" doesn't do the trick)
Thanks!
This sounds like a job for Python. It's usually the best solution for this sort of job in SPSS.
So here's a framework that might help you. I am woefully unfamiliar with ROC analysis, but this general pattern is applicable to all kinds of looping scenarios:
begin program.
import spss
for i in range(spss.GetVariableCount()):
    var = spss.GetVariableName(i)
    cmd = r'''
* Your variable-wise analysis goes here: put SPSS syntax between the triple quotes;
* no indentation is needed. Since I don't know what your syntax looks like, we'll
* just run descriptives and frequencies for all your variables as an example.
descriptives %(var)s
  /sta mean stddev min max.
fre %(var)s.
''' % locals()
    spss.Submit(cmd)
end program.
Just to quickly go over what this does: the for statement tells SPSS to do the following once for every variable in the active dataset, 989 times in your case. The next line defines a (Python) variable named var which contains the name of the dataset variable at index i (0 to 988, the first variable in the dataset having index 0). Then we define a command for SPSS to execute. I like to put it in a raw string because that simplifies things like giving directories. The raw string is opened by the r''' and ends at the closing ''' before % locals(). spss.Submit(cmd) hands the command defined after cmd = to SPSS for execution. Most importantly, wherever the name of the variable would appear in your syntax, substitute it with %(var)s.
If you put "set mprint on." on a line above the "begin program." you'll see exactly what it does in the viewer.
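Since your actual goal is ROC, here is how the same pattern might carry an ROC command instead. This is a sketch, not tested against your data: the outcome variable name outcome and its positive value 1 are assumptions, so adjust both, and check the ROC syntax against your SPSS version:

begin program.
import spss
for i in range(spss.GetVariableCount()):
    var = spss.GetVariableName(i)
    # Skip the outcome variable itself (hypothetical name).
    if var == 'outcome':
        continue
    cmd = r'''
ROC %(var)s BY outcome (1)
  /PLOT = CURVE(REFERENCE)
  /PRINT = SE
  /MISSING = EXCLUDE.
''' % locals()
    spss.Submit(cmd)
end program.

Because each ROC command sees only one test variable, a case is dropped only when that particular variable (or the outcome) is missing, which sidesteps the listwise exclusion you're running into.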

Linux Bash pass function argument to array name

I'm working on a script that has a number of functions in place which pull data from a few different arrays. We hope to keep the arrays individualized for reporting purposes. The information in the arrays does not change, and the only thing different between each function is which array name is used. Since all of the functions share 98% of their content, I'm trying to pull them into one single function for simplified management.
The issue I'm facing, though, is that I'm not able to figure out the correct syntax to obtain the length of an array based on the array name that is passed as the function argument. I can't post the actual script, but here is a mock-up that details a simplified version of what I'm testing with. I believe if we can get it working using the mock script below, I can transfer the needed changes to the actual script.
array1=(
"item1 123"
"item2 456"
)
array2=(
"stockA qwe"
"stockB asd"
"stockC zxc"
)
test() {
  local ref=${1}[@]
  IFS=$'\n'; for i in ${!ref}; do echo "$i"; done
}
test array1
test array2
The script above will echo the content of each array line based on argument 1 when the function and its argument are called, which is working as needed. I've tried many different combinations such as len=${#${1}[@]}, but I always receive a "bad substitution" error. The functions I mentioned before have while loops and for statements that use the array length to know when to stop, so being able to pull that information really ties it all together. What I'm hoping for is something like the flow below.
I plan to continue my research on this, but thank you for any help and knowledge that can be provided!
-Cyanide
I think the only solution is to create a copy of the array, then take the length of that array:
local ref=${1}[@]
copy=( "${!ref}" )
len=${#copy[@]}
Since bash does not allow chaining of the parameter expansion operators, I know of no shorter way to use both ${#...} and ${!...} on the same line.
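That said, if your bash is 4.3 or newer, a nameref avoids the copy entirely. A small sketch against the mock arrays above (the function name len_of is made up for this example):

len_of() {
  local -n arr=$1          # arr is now another name for the array called $1
  echo "${#arr[@]} items"
  local line
  for line in "${arr[@]}"; do
    echo "$line"
  done
}

len_of array1
len_of array2

The nameref gives you both the length ${#arr[@]} and the elements "${arr[@]}" without any indirect-expansion tricks.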

Fast way to add line/row number to text file

I have a file which has about 12 million lines; each line looks like this:
0701648016480002020000002030300000200907242058CRLF
What I'm trying to accomplish is adding a row number before the data; the numbers should have a fixed length.
The idea behind this is to be able to bulk insert this file into a SQLServer table, and then perform certain operations with it that require each line to have a unique identifier. I've tried doing this on the database side but I haven't been able to get good performance (under 4 minutes at least, and under 1 minute would be ideal).
Right now I'm trying a solution in Python that looks something like this.
file=open('file.cas', 'r')
lines=file.readlines()
file.close()
text = ['%d %s' % (i, line) for i, line in enumerate(lines)]
output = open("output.cas","w")
output.writelines(str("".join(text)))
output.close()
I don't know if this will work, but it'll help me get an idea of how it will perform and of side effects before I keep trying new things. I also thought of doing it in C so I have better memory control.
Will it help to do it in a low-level language? Does anyone know a better way to do this? I'm pretty sure it has been done, but I haven't been able to find anything.
Thanks
oh god no, don't read all 12 million lines in at once! If you're going to use Python, at least do it this way:
file = open('file.cas', 'r')
try:
    output = open('output.cas', 'w')
    try:
        output.writelines('%d %s' % tpl for tpl in enumerate(file))
    finally:
        output.close()
finally:
    file.close()
That uses a generator expression which runs through the file processing one line at a time.
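One detail from the question: the numbers need a fixed length. A zero-padded format string handles that. Eight digits is an assumption sized to the 12 million rows, and the two-argument enumerate (to start counting at 1) needs Python 2.6 or newer:

# Same generator expression, but zero-padded to a fixed width of 8 digits.
output.writelines('%08d %s' % tpl for tpl in enumerate(file, 1))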
Why don't you try cat -n?
Stefano is right:
$ time cat -n file.cas > output.cas
Use time just so you can see how fast it is. It'll be faster than Python since cat is pure C code.
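If the fixed, zero-padded width matters and you want to stay in the shell, awk can produce it directly (the eight-digit width is again an assumption):

$ awk '{printf "%08d %s\n", NR, $0}' file.cas > output.cas

cat -n, by contrast, right-justifies the number in a blank-padded field and separates it from the line with a tab, so check which format your bulk insert expects.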
