I'm trying to create n variables in bash...ideally an array with these n variables so that I may later go through and assign them to columns I read in from a csv file. I guess I'm just really confusing myself with syntax. Help is much appreciated!
The easiest way in bash to read a line from a CSV file and put it into an array is:
IFS=, read -r -a ARRAY < filename
The IFS=, at the beginning tells read to use , as a field separator. The option -a ARRAY tells read to put the results in a bash array named ARRAY (you could use any name; it doesn't need to be uppercase).
You would normally want to do that in a loop, something like:
while IFS=, read -r -a ARRAY; do
# do something with ARRAY
done < filename
This is not a very robust technique, since it will not work with quoted fields and especially not with embedded commas in quoted fields. There are CSV-parsing libraries for most languages; if you have any familiarity with Python, it might be a good choice.
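As a minimal sketch of the loop above, the following reads a small CSV and collects one column into an array. The sample data and the names csv_file, row, names, and quoted are assumptions made up for illustration. The final read shows the limitation with quoted fields.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; file contents are an assumption for illustration.
csv_file=$(mktemp)
printf '%s\n' 'id,name,score' '1,alice,90' '2,bob,85' > "$csv_file"

names=()
while IFS=, read -r -a row; do
    # row[0] is the first column, row[1] the second, and so on
    names+=("${row[1]}")
done < "$csv_file"
rm -f "$csv_file"

printf '%s\n' "${names[@]}"

# Limitation: a comma inside a quoted field is still treated as a separator
IFS=, read -r -a quoted <<< '1,"Smith, John",42'
echo "${#quoted[@]}"
```

The last line prints 4 rather than 3, because the comma inside "Smith, John" splits that field in two.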
Related
I'm not an expert in bash coding and I'm trying to do one interative-like code to help me in my work.
I have a file that contains some numbers (coordinates), and I'm trying to write a script that reads specific numbers from the file, stores them in an array, modifies that array with some arithmetic, and then replaces the numbers in the original file with the modified values. So far I've done everything except replacing the numbers in the file; I tried using sed, but it does not change the file. The original numbers are stored in an array called "readfile" and the new numbers are stored in an array called "d".
I'm trying to use sed in this way: sed -i 's/${readfile[$j]}/${d[$k]}/' file.txt
And I loop j and k to cover all the numbers in the arrays. Everything seems to work but the file is not being modified. After some digging, I'm noticing that sed is not reading the value of the array, but I do not know how to fix that.
Your help is really appreciated.
When a file isn't modified by sed -i, it means sed didn't find any matches to modify. Your pattern is wrong somehow.
After using " instead of ' so that the variables can actually be evaluated inside the string, look at the contents of the readfile array and check whether it actually matches the text. If it seems to match, look for special characters in the pattern, characters that would mean something specific to sed (the most common mistake is /, which will interfere with the search command).
The fix for special characters is either to (1) escape them, e.g. \/ instead of just /, or (2) (and especially for /) to use another delimiter for the search/replace command (instead of s/foo/bar/ you can use s|foo|bar| or s,foo,bar, etc - pretty much any delimiter works, so you can pick one that you know isn't in the pattern string).
If you post data samples and more of your script, we can look at where you went wrong.
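Since no data was posted, here is only a rough sketch of the two fixes described above: double quotes so the array values are expanded, and | as the delimiter. The file contents and array values are invented, and a single index i replaces the separate j and k counters. -i.bak is used instead of bare -i only because both GNU and BSD sed accept it; it leaves a backup copy behind.

```shell
#!/usr/bin/env bash
# Hypothetical data: the file contents and array values are assumptions.
file=$(mktemp)
printf '%s\n' 'x 1.50 2.50' 'y 3.50 4.50' > "$file"

readfile=(1.50 3.50)   # original numbers
d=(1.75 3.75)          # modified numbers

for i in "${!readfile[@]}"; do
    # Escape the dot so sed matches it literally instead of "any character"
    pattern=${readfile[$i]//./\\.}
    # Double quotes let the shell expand the variables before sed sees them
    sed -i.bak "s|${pattern}|${d[$i]}|" "$file"
done

result=$(<"$file")
echo "$result"
rm -f "$file" "$file.bak"
```

Note that this replaces only the first match on each line; adding the g flag would replace every match.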
I have a file with pipe delimiter and one record has more columns than expected.
For example:
File NPS.txt
1|a|10
2|b|20
3|c|30
4|d|40|old
The last line has more columns than expected, and I want to find its line number to understand what the problem is.
I found this command:
awk -F'|' '{print NF}' NPS.txt | sort | uniq -c
With this command I know that one line has an extra column, but I do not know which line it is.
I would use a bash script
a) Define a counter variable, starting at 0,
b) iterate over each line in your file, adding +1 to the counter at the beginning of each loop,
c) split each line into an array on the "|" delimiter, and log the counter if the array contains more than 3 elements. You can log to the console or write to a file.
It's been a while since I've scripted in Linux, but these references might help:
Intro:
https://www.guru99.com/introduction-to-shell-scripting.html
For Loops:
https://www.cyberciti.biz/faq/bash-for-loop/
Bash String Splitting
How do I split a string on a delimiter in Bash?
Making Scripts Executable
https://www.andrewcbancroft.com/blog/musings/make-bash-script-executable/
There may be a good one-liner out there, but it's not a difficult script to write.
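Steps a) to c) above can be sketched roughly as follows. The sample data is copied from the question into a temporary file; the names input, expected, line_no, fields, and bad_lines are placeholders chosen for this sketch.

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file.
input=$(mktemp)
printf '%s\n' '1|a|10' '2|b|20' '3|c|30' '4|d|40|old' > "$input"

expected=3      # assumed number of columns per line
line_no=0       # a) counter starts at 0
bad_lines=()

while IFS='|' read -r -a fields; do      # c) split the line on "|"
    line_no=$((line_no + 1))             # b) add 1 at the start of each loop
    if (( ${#fields[@]} > expected )); then
        bad_lines+=("$line_no")
        echo "line $line_no has ${#fields[@]} columns"
    fi
done < "$input"
rm -f "$input"
```

For the one-liner route, awk tracks the line number in NR, so something like awk -F'|' 'NF > 3 {print NR ": " $0}' NPS.txt should report the same line.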
I've seen some suggestions to use eval, but that assumes the quoted text is an entire string in itself, and not part of one.
Simple example:
Split the string
<SERVICETYPE Name="Two words">
So that we get
<SERVICETYPE
Name="Two words">
Is this possible? Ideally in a statement that I can then use to loop through the values. (Yes, I know Perl or something would be easier, but I don't have anything more useful than bash available, so I have to get this working.)
I'm currently splitting into an array with the following
IFS=" " read -ra xmlfield <<< ${xmlline}
for i in "${xmlfield[@]}"; do
But then of course that gives me:
<SERVICETYPE
Name="Two
words">
Which is a pain.
I am new to R and I have a Perl Script in which I want to call a R Script, which calculates something for me (not important what in this context). I want to give as arguments an input file, an array which contains some numbers and a number for a total number of clusters. medoid.r is the name of my R Script.
my $R_out;
$R_out = qx{./script/medoid.r $output @cluster $NUMBER_OF_CLUSTERS};
My current R code looks like this. Right now I just print cluster to see what is inside.
args <- commandArgs(TRUE)
filename = args[1]
cluster = as.vector(args[2])
number_of_cluster = args[3]
matrix = read.table(filename, sep='\t', header=TRUE, row.names=1, quote="")
print(cluster)
Is it possible to give an array as an argument? How can I save it in R? Right now only the first number of the array is stored and printed, but I would like to have every number in a vector or something similar.
If you do this in Perl
$R_out = qx{./script/medoid.r $output @cluster $NUMBER_OF_CLUSTERS};
your command line will look similar to this
./script/medoid.r output 111 222 333 3
assuming that $output is 'output' and @cluster = (111, 222, 333).
If you want to read that in R, you need to assign all elements after the first one in args to cluster but the last one, and the last one to number_of_cluster. In Perl you can use shift and pop for that.
my @args = @_;
my $output = shift @args;
my $number = pop @args;
# now @args only contains the clusters
I don't know if those operators exist in R.
You cannot pass a full data structure unless you serialize it in some way.
In perl, qx will expect a string as an argument. You may certainly use an array to generate that string, but ultimately it will still be a string. You cannot "pass an array" to a system call, you can only pass command-line text/arguments.
Keep in mind, you are executing a system call running Rscript as a child process. The way you're describing the issue, there is no inter-process communication beyond the command line. Think of it this way: how would you type an array on the command line? You may have some textual way of representing an array, but you can't type an array on the command line. Arrays are stored and accessed in memory differently by various different languages, and thus are not really portable between two languages like you're suggesting.
One solution: all that said, there may be a simple solution for you. You haven't provided any information on the type of data you want to pass in your array. If it is simple enough, you may try passing it on the command line as delimited text, and then break it up to use in your Rscript.
Here is an Rscript that shows you what I mean:
args = commandArgs(trailingOnly=TRUE)
filename = args[1]
cluster <- c(strsplit(args[2],"~"))
sprintf("Filename: %s",filename)
sprintf("Cluster list: %s",cluster)
print("Cluster:")
cluster
sprintf("First Item: %s",cluster[[1]][1])
Save it as "test.r" and try executing it with "Rscript test.r test.txt one~two" and you'll get the following output (tested on Rscript 46084, OpenBSD):
[1] "Filename: test.txt"
[1] "Cluster list: c(\"one\", \"two\")"
[1] "Cluster:"
[[1]]
[1] "one" "two"
[1] "First Item: one"
So, all you'd have to do on the Perl side is join() your array using "~" or any other delimiter; the right choice is highly dependent on your data, and you haven't provided it.
Summary: re-think how you want to communicate between Perl and Rscript. Consider sending the data as a delimited string (if it's the right size) and breaking it up on the other side. If that won't work, look into IPC, or consider environment variables or other options. There is no way to send an array reference on the command line.
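The delimited-string round trip does not depend on Perl or R. As a language-neutral illustration only, here is the same idea sketched in bash: join an array into one "~"-separated argument, then split it back on the receiving side. The values and the names cluster, joined, and received are assumptions for this sketch.

```shell
#!/usr/bin/env bash
cluster=(111 222 333)    # hypothetical values

# Sender side: join the elements with "~" into a single string.
# "${cluster[*]}" joins using the first character of IFS.
joined=$(IFS='~'; echo "${cluster[*]}")
echo "$joined"

# Receiver side: split the single argument back into an array
IFS='~' read -r -a received <<< "$joined"
printf '%s\n' "${received[@]}"
```

This only works if the delimiter can never occur inside the data, which is why the choice depends on what the array holds.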
Note: you may want to read up on security risks of different system calls in perl.
I have a set of value pairs in a text file and need to read them in two different arrays. The values in the file are stored in the following manner
100=5
300=10
19=30
I need to read 100, 300, and 19 into one array and 5, 10, and 30 into a different array. So far I'm able to read the values 5, 10, and 30, but how do I read the other values?
Below is the code I have to read the assigned values.
while read -r line; do declare $line; done <file
POSIX shell does not specify an array datatype (the tags only mention "shell"), so you cannot "read them in two different arrays" unless you're willing to use a shell which supports such a datatype.
This should work in Bash (untested):
keys=()
values=()
while IFS='=' read -r key value
do
keys+=("$key")
values+=("$value")
done < key_value_pairs.txt
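To check the result, the loop above can be run against the sample pairs from the question, and the two arrays walked in parallel by index. This is a sketch under assumptions: the data is written to a temporary file (pairs_file is a made-up name) instead of key_value_pairs.txt.

```shell
#!/usr/bin/env bash
# Sample pairs from the question, written to a temporary file.
pairs_file=$(mktemp)
printf '%s\n' '100=5' '300=10' '19=30' > "$pairs_file"

keys=()
values=()
while IFS='=' read -r key value; do
    keys+=("$key")
    values+=("$value")
done < "$pairs_file"
rm -f "$pairs_file"

# "${!keys[@]}" expands to the indices 0 1 2, shared by both arrays
for i in "${!keys[@]}"; do
    echo "${keys[$i]} -> ${values[$i]}"
done
```

Because each line is split only at the first "=", a value that itself contains "=" stays intact in value.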
References:
IFS
Word splitting
Arrays