Passing an array as an argument from a Perl script to an R script

I am new to R, and I have a Perl script from which I want to call an R script that calculates something for me (what exactly is not important in this context). I want to pass as arguments an input file, an array containing some numbers, and the total number of clusters. medoid.r is the name of my R script.
my $R_out;
$R_out = qx{./script/medoid.r $output @cluster $NUMBER_OF_CLUSTERS};
My current R code looks like this. Right now I just print cluster to see what is inside.
args <- commandArgs(TRUE)
filename = args[1]
cluster = as.vector(args[2])
number_of_cluster = args[3]
matrix = read.table(filename, sep='\t', header=TRUE, row.names=1, quote="")
print(cluster)
Is it possible to give an array as an argument? How can I save it in R? Right now only the first number of the array is stored and printed, but I would like to have every number in a vector or something similar.

If you do this in Perl
$R_out = qx{./script/medoid.r $output @cluster $NUMBER_OF_CLUSTERS};
your command line will look similar to this:
./script/medoid.r output 111 222 333 3
assuming that $output is 'output' and @cluster = (111, 222, 333).
If you want to read that in R, you need to assign every element of args except the first and the last to cluster, and the last one to number_of_cluster. In Perl you can use shift and pop for that:
my @args = @_;
my $output = shift @args;
my $number = pop @args;
# now @args only contains the clusters
I don't know if those operators exist in R.
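For comparison, the same first/middle/last split can be sketched in bash, using the command line from the example above (output 111 222 333 3). The function name split_args is hypothetical and only illustrates which positional parameters end up where; it assumes at least two arguments are passed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "$@" into the first argument (the file name),
# the last argument (the cluster count) and everything in between (the clusters).
split_args() {
  output=$1                   # first argument, like Perl's shift
  number=${!#}                # last argument, like Perl's pop
  clusters=("${@:2:$#-2}")    # arguments 2 .. n-1
}

split_args output 111 222 333 3
printf 'output=%s number=%s clusters=%s\n' "$output" "$number" "${clusters[*]}"
```

The variables are deliberately left global so that the caller can read them after the function returns.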
You cannot pass a full data structure unless you serialize it in some way.

In Perl, qx expects a string as its argument. You can certainly use an array to generate that string, but ultimately it is still a string. You cannot "pass an array" to a system call; you can only pass command-line text/arguments.
Keep in mind that you are executing a system call that runs Rscript as a child process. As you describe the issue, there is no inter-process communication beyond the command line. Think of it this way: how would you type an array on the command line? You may have some textual way of representing an array, but you can't type the array itself. Different languages store and access arrays in memory differently, so arrays are not portable between two languages in the way you're suggesting.
All that said, there may be a simple solution for you. You haven't provided any information on the type of data you want to pass in your array. If it is simple enough, you can try passing it on the command line as delimited text and then breaking it up in your Rscript.
Here is an Rscript that shows you what I mean:
args = commandArgs(trailingOnly=TRUE)
filename = args[1]
cluster <- c(strsplit(args[2],"~"))
sprintf("Filename: %s",filename)
sprintf("Cluster list: %s",cluster)
print("Cluster:")
cluster
sprintf("First Item: %s",cluster[[1]][1])
Save it as "test.r" and try executing it with "Rscript test.r test.txt one~two" and you'll get the following output (tested on Rscript 46084, OpenBSD):
[1] "Filename: test.txt"
[1] "Cluster list: c(\"one\", \"two\")"
[1] "Cluster:"
[[1]]
[1] "one" "two"
[1] "First Item: one"
So all you'd have to do on the Perl side is join() your array using "~" or any other delimiter. Which delimiter is safe depends on your data, which you haven't shown.
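The join/split round trip itself is language-neutral. Here is a minimal bash sketch of it (not the Perl or R code itself), assuming "~" never appears inside an element:

```shell
#!/usr/bin/env bash
# Sketch of the "delimited string" idea: join an array with "~" on the
# sending side, then split it back into an array on the receiving side.
clusters=(111 222 333)

# Join: "${array[*]}" puts the first character of IFS between elements.
# The command substitution runs in a subshell, so the IFS change does not leak.
joined=$(IFS='~'; printf '%s' "${clusters[*]}")

# Split: read -ra breaks the string on IFS (here "~") into an array.
IFS='~' read -ra received <<< "$joined"

printf 'joined:   %s\n' "$joined"
printf 'received: %s element(s), first is %s\n' "${#received[@]}" "${received[0]}"
```

In Perl the join step would be join('~', @cluster); on the R side the split is the strsplit call in the script above.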
Summary: rethink how you want to communicate between Perl and Rscript. Consider sending the data as a delimited string (if it's small enough) and breaking it up on the other side. If that won't work, look into IPC, environment variables, or other options. There is no way to send an array reference on the command line.
Note: you may want to read up on the security risks of the different ways of making system calls in Perl.

Related

Shell script split a string by space

The bash shell script can split a given string by space into a 1D array.
str="a b c d e"
arr=($str)
# arr[0] is a, arr[1] is b, etc. arr is now an array, but what is the magic behind it?
But what exactly happens when we call arr=($str)? My understanding is that the parentheses here create a subshell, but what happens after that?
In an assignment, the parentheses simply indicate that an array is being created; this is independent of the use of parentheses as a compound command.
This isn't the recommended way to split a string, though. Suppose you have the string
str="a * b"
arr=($str)
When $str is expanded, the value undergoes both word splitting (which is what allows the array to have multiple elements) and pathname expansion. Your array will now have a as its first element and b as its last element, but one or more elements in between, depending on how many files in the current working directory * matches. A better solution is to use the read command:
read -ra arr <<< "$str"
Now the read command itself splits the value of $str without also applying pathname expansion to the result.
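A small sketch that makes the difference visible. It creates a scratch directory containing two files (file1 and file2, names chosen only for the demo) so that * has something to match:

```shell
#!/usr/bin/env bash
# Sketch: split "a * b" two ways and compare the resulting arrays.
str="a * b"

# Scratch directory with two files, so that * has something to match.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch file1 file2

arr_glob=($str)                 # word splitting AND pathname expansion
read -ra arr_read <<< "$str"    # word splitting only; * stays literal

printf 'unquoted: %s elements: %s\n' "${#arr_glob[@]}" "${arr_glob[*]}"
printf 'read -ra: %s elements: %s\n' "${#arr_read[@]}" "${arr_read[*]}"

cd - > /dev/null || exit 1
rm -r "$tmpdir"
```

With default shell options (no nullglob, failglob or dotglob), the first line reports 4 elements (a file1 file2 b) and the second reports 3 (a * b).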
It seems you've confused
arr=($str) # An array is created from the word-split value of str
with
(some command) # executing some command in a subshell
Note that
arr=($str) is different from arr=("$str") in that, in the latter, the double quotes prevent word splitting, i.e. the array will contain only one element: a b c d e.
You can check the difference between the two with
echo "${#arr[@]}"
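Both forms side by side, as a minimal sketch (the array names split_arr and whole_arr are only for illustration):

```shell
#!/usr/bin/env bash
# Compare unquoted and quoted expansion inside an array assignment.
str="a b c d e"

split_arr=($str)      # word splitting applies: five elements
whole_arr=("$str")    # quotes suppress word splitting: one element

echo "${#split_arr[@]}"   # prints 5
echo "${#whole_arr[@]}"   # prints 1
```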

bash array with square bracket strings

I want to make an array with string values that contain square brackets, but I keep getting unexpected output.
selections=()
for i in $choices
do
selections+=("role[${filenames[$i]}]")
done
echo ${selections[@]}
If choices were 1 and 2, and filenames[1] and filenames[2] held the values 'A' and 'B', I want the selections array to hold the strings role[A] and role[B].
Instead, the output I get is just roles.
I can make the code you presented produce the output you wanted, or not, depending on the values I assign to variables filenames and choices.
First, I observe that bash indexed arrays are indexed starting at 0, not 1. If you are using the values 1 and 2 as indices into array filenames, and if that is an indexed array with only two elements, then it may be that ${filenames[2]} expands to nothing. This would be the result if you initialize filenames like so:
# NOT WHAT YOU WANT:
filenames=(A B)
Instead, either assign array elements individually, or add a dummy value at index 0:
# Could work:
filenames=('' A B)
Next, I'm suspicious of choices. Since you're playing with arrays, I speculate that you may have initialized choices as an array, like so:
# NOT CONSISTENT WITH YOUR LATER USAGE:
choices=(1 2)
If you expand an array-valued variable without specifying an index, it is as if you specified index 0. With the above initialization, then, $choices would expand to just 1, not 1 2 as you intend. There are two possibilities: either initialize choices as a flat string:
# Could work:
choices='1 2'
or expand it differently:
# or expand it this way:
for i in "${choices[@]}"
Do not overlook the quotes, by the way: that particular form will expand to one word per array element, but without the quotes the array elements would be subject to word splitting and other expansions (though that's moot for the particular values you're using in this case).
The same advice applies to your echo command: if you do not quote the expansion, you have to analyze the code much more carefully to be confident that it will do what you intend in all cases. It will be subject not only to word splitting, but also to pathname expansion and a few others. In your case, pathname expansion may be performed depending on the names of the files in the working directory (thanks @CharlesDuffy). It is far safer to just quote.
Anyway, here is a complete demonstration incorporating your code verbatim and producing the output you want:
#!/bin/bash
filenames=('' 'A' 'B')
choices="1 2"
selections=()
for i in $choices
do
selections+=("role[${filenames[$i]}]")
done
echo ${selections[@]}
# better:
# echo "${selections[@]}"
Output:
role[A] role[B]
Finally, as I observed in comments, there is no way that your code could output "roles", as you claim it does, given the inputs (variable values) you claim it has. If that's in fact what you see, then either it is not related to the code you presented at all, or your inputs are different than you claim.
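For completeness, here is a sketch of the other variant discussed above: choices kept as an array, expanded with "${choices[@]}", and every expansion quoted. The values are the same ones used in the demonstration.

```shell
#!/usr/bin/env bash
# choices stays an array; the '' placeholder at index 0 keeps
# indices 1 and 2 pointing at A and B.
filenames=('' 'A' 'B')
choices=(1 2)

selections=()
for i in "${choices[@]}"; do
  selections+=("role[${filenames[i]}]")
done

# Quoted expansion: no word splitting and no pathname expansion of role[A].
printf '%s\n' "${selections[@]}"
```

This prints role[A] and role[B] on separate lines regardless of which files exist in the working directory.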

Can I format the output of a MATLAB command so that I can use it to declare a new variable?

It's best explained with an easier example. Say some script in MATLAB gives me a cell array of strings:
temp = dir;
names = {temp.name}'
ans =
'folder1'
'folder2'
'file1'
I would like to use this output in another script, in another MATLAB session. Ideally, in the second script I would write
names = {'folder1', 'folder2', 'file1'}
but this means copy-pasting the output under "ans =" and then manually adding the commas and curly brackets. In my case the cell array is quite large, so this is undesirable. Moreover, it feels clumsy, and there could be an easier way. Is there any way to make MATLAB print the output so that I do not have to do this?
I would also like to know the same for matrices instead of cell arrays.
I am aware of saving the variable in a .mat file and loading it, but I was wondering whether the above is also possible (it would be cleaner in my case).
Personally, I would advise using a cleaner way of handling this (such as .mat files).
But then again, sometimes the time spent setting these up is just not worth it for simple tasks that are unlikely to be repeated much.
For matrices there is a built-in function to do this; for cells, however, we would need to produce a string with the required format.
Matrix
For 1-D or 2-D matrices, mat2str provides this functionality:
mat2str(eye(2))
ans =
[1 0;0 1]
Cell
However to my knowledge there is no such builtin function for cells.
For a 1d cell array of strings the following will give the output in a copyable format:
['{',sprintf('''%s'' ',names{:}),'}']
ans =
{'folder1' 'folder2' 'file1' }
Note: the strings in the cells cannot contain the ' character.
If I understand you correctly, you are getting the names output from one script and want to use it within another script. Since you cannot pass it as a function argument, you are currently copying it over. One could do that with eval and some copy-and-paste:
names = {'folder1'
'folder2'
'file1'};
% create the command
n = length(names);
cmd = sprintf(['names = {',repmat('''%s'', ', 1, n-1) ,'''%s''}'], names{:}); % '%s, %s, ...., %s' format
% cmd contains the string: names = {'folder1', 'folder2', 'file1'}
% eval the cmd in script 2
eval(cmd) % evals the command names = {'folder1', 'folder2', 'file1'}
But this is generally very bad practice, as it gets insanely hard to debug if something goes wrong. It also makes you copy and paste things around, which I find uncomfortable. How about storing the names in a text file and loading them in the second script? That gets things done automatically.
names = {'folder1'
'folder2'
'file1'};
% write output to file, one name per line, so script 2 can split on newlines
fid = fopen('mynames.txt', 'w'); % open file to write something
fprintf(fid, '%s\n', names{:});
fclose(fid);
% here comes script 2
fid = fopen('mynames.txt', 'r'); % open file to read something
names_loaded = textscan(fid, '%s', 'Delimiter', '\n'); % one cell entry per line
names_loaded = names_loaded{:};
fclose(fid);
I think the key here is that you have a variable in one place and want to use it somewhere else.
In that situation you don't want to copy the output matlab generates, you just want to save the value itself.
After finding the result just do this:
save names
Later you can load this variable with
load names
Check doc save and doc load for more extensive examples. You may, for example, want to save all relevant variables in a file with a more generic name.

Generate variable name by merging another variable value (shell)

I have an array which contains
commName[0]="ls"
commName[1]="date"
commName[2]="crontab"
commName[3]="uname"
commName[4]="hostname"
Now the array doesn't always contain these. Sometimes it can have more indices sometimes less. And the values are not always ls,date,... They can be different. Bottom line, I don't know the size nor the values of the array when I'm coding.
Every array value (ls, date, ...) has its own unique address. For example, ls would have /home/test/ and date would have /home/test/test2/, etc. These addresses need to be stored in variables that will be used later in the code. So I should have the following variables for the given array:
$lsAddress
$dateAddress
$crontabAddress
$unameAddress
$hostnameAddress
Therefore, I need a way to create these variables (keep in mind that I don't know ls, date, uname, ... in advance).
My approach was this
for ((j=0 ; j<${#commName[@]} ; j++))
do
set commName[$j]Nick="hi"
echo $(${commName[$j]}Nick)
done
What I expected this to do was to create new variables for every index of the array and set them equal to hi (just for test purposes) and then access those new variables.
Also, the newly created variables must be accessible anywhere, so I can't use a temporary variable that keeps getting replaced.
However, this method isn't working... Is there any other way I can do this?
Use eval. Try this:
for ((j=0 ; j<${#commName[@]} ; j++))
do
param="${commName[$j]}Nick"
eval "$param=hi1"
eval "echo \$$param"
done
Use two parallel arrays, so that the entry in the command array matches with the corresponding entry in the address array.
commName[0]="ls"
commName[1]="date"
commName[2]="crontab"
commName[3]="uname"
commName[4]="hostname"
commAddress[0]="/home/test/" # ls
commAddress[1]="/home/test/test2" # date
# etc
Then, when you have a particular value of i, you know that ${commName[i]} and ${commAddress[i]} go together.
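A minimal sketch of the two-array approach, reusing the two addresses given in the question. The helper name address_of is hypothetical:

```shell
#!/usr/bin/env bash
# Parallel arrays: commName[i] and commAddress[i] always describe the same command.
commName=(ls date)
commAddress=(/home/test/ /home/test/test2)

# Hypothetical helper: print the address paired with the given command name.
address_of() {
  local i
  for i in "${!commName[@]}"; do        # iterate over the indices, not the values
    if [[ ${commName[i]} == "$1" ]]; then
      printf '%s\n' "${commAddress[i]}"
      return 0
    fi
  done
  return 1                               # no command with that name
}

for i in "${!commName[@]}"; do
  printf '%s -> %s\n' "${commName[i]}" "${commAddress[i]}"
done
address_of date
```

Since neither array depends on names known in advance, the script works for any number of commands as long as both arrays are filled in the same order.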
I recommend the two arrays, but you might also consider using bash's indirect parameter expansion instead.
$ commName[0]="ls"
$ lsAddress="/home/test"
$ name="${commName[0]}Address"
$ echo "${!name}"
/home/test
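If you really need named variables such as $lsAddress, bash's printf -v builtin can assign to a variable whose name is computed at run time, without eval, and indirect expansion reads it back. A sketch that reuses the question's paths and assumes every command name is a valid shell identifier (a name like my-cmd would fail):

```shell
#!/usr/bin/env bash
# Sketch: create one "<command>Address" variable per array entry without eval,
# then read each one back through indirect expansion.
commName=(ls date)
commAddress=(/home/test/ /home/test/test2)

for i in "${!commName[@]}"; do
  name="${commName[i]}Address"                 # e.g. lsAddress
  printf -v "$name" '%s' "${commAddress[i]}"   # assign to the variable named by $name
done

for cmd in "${commName[@]}"; do
  name="${cmd}Address"
  echo "$name=${!name}"   # ${!name} expands the variable whose name is stored in $name
done
```

Because the assignments happen at the top level of the script, lsAddress and dateAddress remain set for the rest of it.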

Linux Bash pass function argument to array name

I'm working on a script with a number of functions that pull data from a few different arrays. We want to keep the arrays separate for reporting purposes. The information in the arrays does not change, and the only difference between the functions is which array name is used. Since all of the functions share 98% of their content, I'm trying to merge them into a single function for simplified management.
The issue I'm facing is that I can't figure out the correct syntax to obtain the length of an array whose name is passed as the function argument. I can't post the actual script, but here is a mock-up of a simplified version of what I'm testing with. I believe that if we can get it working with the mock script below, I can transfer the needed changes to the actual script.
array1=(
"item1 123"
"item2 456"
)
array2=(
"stockA qwe"
"stockB asd"
"stockC zxc"
)
test() {
local ref=${1}[@]
IFS=$'\n'; for i in ${!ref}; do echo $i ; done
}
test array1
test array2
So far, the script above echoes each element of the array named by argument 1 when the function is called with that argument, which is working as needed. I've tried many different combinations such as len=${#${1}[@]}, but I always get a "bad substitution" error. The functions I mentioned before have while loops and for statements that use the array length to know when to stop, so being able to get that value really ties it all together. What I'm hoping for is something like the flow below
I plan to continue my research on this, but thank you for any help and knowledge that can be provided!
-Cyanide
I think the only solution is to create a copy of the array, then take the length of that array:
local ref=${1}[@]
copy=( "${!ref}" )
len=${#copy[@]}
Since bash does not allow chaining of the parameter expansion operators, I know of no shorter way to use both ${#...} and ${!...} on the same line.
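Here is a sketch that wraps the copy approach in a function, using the mock arrays from the question. For comparison, it also shows a different technique not mentioned in the answer: a nameref (local -n, available in bash 4.3 and later) gets the length without copying. The function names len_copy and len_nameref are hypothetical.

```shell
#!/usr/bin/env bash
array1=("item1 123" "item2 456")
array2=("stockA qwe" "stockB asd" "stockC zxc")

# Copy approach from the answer: expand the named array via ${!ref}, then count.
len_copy() {
  local ref="${1}[@]"
  local copy=( "${!ref}" )   # quoted, so elements containing spaces stay intact
  echo "${#copy[@]}"
}

# Alternative (bash 4.3 or later): a nameref makes arr an alias for the named array.
len_nameref() {
  local -n arr="$1"
  echo "${#arr[@]}"
}

len_copy array1      # prints 2
len_nameref array2   # prints 3
```

One caveat with the nameref version: calling it with an array that is itself named arr creates a circular reference, so pick a local name that callers are unlikely to use.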
