using perl array as input to bash bedtools command - arrays

I'm wondering whether it is possible to use a Perl array as the input to a program called bedtools (http://bedtools.readthedocs.org/en/latest/).
The array is itself generated by bedtools via backticks in Perl. When I try to use the Perl array in another bedtools command, it complains that the argument list is too long, apparently because each word or number in the array is treated as a separate argument.
Example code:
my @constit_super = `bedtools intersect -wa -a $enhancers -b $super_enhancer`;
that works fine and can be viewed by:
print @constit_super;
which looks like this onscreen:
chr10 73629894 73634938
chr10 73636240 73639574
chr10 73639726 73657218
but then if I try to use this array in bedtools again, e.g.
my $bedtools = `bedtools merge -i @constit_super`;
then I get this error message:
Can't exec "/bin/sh": Argument list too long
Is there any way to use this Perl array in bedtools?
Many thanks
27/9/14: Thanks for the info on doing it via a file. However, sorry to be a pain, but I would really like to do this without writing a file if possible.

I haven't tested this but I think it would work.
bedtools is expecting one argument with the -i flag, the name of a .bed file. This was in the docs. You need to write your array to a file and then input it into the bedtools merge command.
open(my $fh, '>', "input.bed") or die $!;
print $fh join("", @constit_super);
close $fh;
Then you can sort it with this command from the docs:
`sort -k1,1 -k2,2n input.bed > input.sorted.bed`;
Finally, you can run your merge command.
my $bedtools = `bedtools merge -i input.sorted.bed`;
Hopefully this sets you on the right track.
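Since the follow-up asks for a way that avoids a temporary file, here is a minimal sketch of the same sort step fed through a pipe instead of the argument list. The sample intervals are the three lines from the question (shuffled); the final comment assumes bedtools is installed and reads from standard input when given `-i stdin`, and it reuses the `$enhancers` and `$super_enhancer` names from the question.

```shell
#!/bin/bash
# Stand-in for the output of `bedtools intersect` (tab-separated, deliberately unsorted).
intervals=$'chr10\t73639726\t73657218\nchr10\t73629894\t73634938\nchr10\t73636240\t73639574'

# The data travels on stdin, so it never becomes part of an argument list.
sorted=$(printf '%s\n' "$intervals" | sort -k1,1 -k2,2n)
printf '%s\n' "$sorted"

# Assumption: with bedtools available, the whole job could be a single pipeline:
#   bedtools intersect -wa -a "$enhancers" -b "$super_enhancer" \
#     | sort -k1,1 -k2,2n \
#     | bedtools merge -i stdin
```

From Perl, a pipeline like the commented one could be placed inside one pair of backticks, so the intermediate array is not interpolated into a command line at all.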

Related

Why does "echo $array" print all members of the array in this specific case instead of only the first member like in any other case?

I have encountered a very curious problem, while trying to learn bash.
Usually, echoing an array by its bare variable name like this outputs only the first member, Hello.
#!/bin/bash
declare -a test
test[0]="Hello"
test[1]="World"
echo $test # Only prints "Hello"
BUT, for some reason this piece of code prints out ALL members of the given array.
#!/bin/bash
declare -a files
counter=0
for file in "./*"
do
files[$counter]=$file
let $((counter++))
done
echo $files # prints "./file1 ./file2 ./file3" and so on
I can't wrap my head around why it outputs the whole array instead of only the first member. I think it has something to do with my usage of the for loop, but I was unable to find any concrete answer. It's driving me crazy!
Please send help!
When you quoted the pattern, you only created a single entry in your array:
$ declare -p files
declare -a files=([0]="./*")
If you had quoted the parameter expansion, you would see
$ echo "$files"
./*
Without the quotes, the expansion is subject to pathname generation, so echo receives multiple arguments, each of which is printed.
To build the array you expected, drop the quotes around the pattern. The results of pathname generation are not subject to further word-splitting (or recursive pathname generation), so no quotes would be needed.
for file in ./*
do
...
done
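The difference can be checked with a short sketch. It assumes a scratch directory created with `mktemp -d` and three hypothetical files named file1 to file3; the array names `quoted` and `unquoted` are only for illustration.

```shell
#!/bin/bash
# Scratch directory with three files to glob over.
tmpdir=$(mktemp -d)
touch "$tmpdir/file1" "$tmpdir/file2" "$tmpdir/file3"

# Quoted pattern: the loop runs once, storing the literal pattern string.
quoted=()
for file in "$tmpdir/*"; do
    quoted+=("$file")
done

# Unquoted pattern: pathname generation yields one word per matching file.
unquoted=()
for file in "$tmpdir"/*; do
    unquoted+=("$file")
done

echo "quoted:   ${#quoted[@]} element(s)"
echo "unquoted: ${#unquoted[@]} element(s)"
declare -p quoted

rm -rf "$tmpdir"
```

Using `"${files[@]}"` when printing shows every element, while `"$files"` is shorthand for element 0 only.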

Can I send a string to grep with commands and file names in a bash script?

Is it possible to send an array variable from the command line
(where argsGrep="$@" and the command-line input is something like -i Something) to a grep command,
e.g.
result=$(grep $argsGrep ./file)
When $argsGrep contains only the search term, it works just fine, but the moment it also contains a grep option, I can't get it to work at all.
Don't use the intermediate string. It will just break things.
Just expand "$@" at the point you need it.
If you must save the contents of "$@" for some reason then you must use another array.
argsarr=("$@")
result=$(grep "${argsarr[@]}" ./file)
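As a rough illustration of why the array matters, the sketch below passes an option plus a search term containing a space. The function name `search`, the variable `datafile`, and the sample lines are all made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: forwards its arguments to grep unchanged.
search() {
    local argsarr=("$@")            # one array element per original argument
    grep "${argsarr[@]}" "$datafile"
}

# Throwaway data file for the demonstration.
datafile=$(mktemp)
printf '%s\n' 'Something here' 'nothing' 'some thing else' > "$datafile"

# The pattern 'some thing' stays a single argument despite the space.
result=$(search -i 'some thing')
echo "$result"

rm -f "$datafile"
```

With a flat string such as `argsGrep="$@"`, the same call would be split into `-i`, `some` and `thing`, and grep would treat `thing` as a file name.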

Output from bash command not storing in array

My code below is very simple: all I'm doing is grepping a file with an IP address regex, and I want to store any IP addresses that match in @array2.
I know my regex works because I've tested it on a Linux command line and it works exactly how I want, but when it's integrated into the script it just returns blank. There should be 700 IPs stored in the array.
#!/usr/bin/perl
use warnings;
use strict;
my @array2 = `grep -Eo "\"\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\"" test1.txt`;
print @array2;
Backticks `` behave like a double quoted string by default.
Therefore you need to escape your backslashes:
my @array2 = `grep -Eo "\\"\\b[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\"" test1.txt`;
Alternatively, you can use a single quoted version of qx to avoid any interpolation:
my @array2 = qx'grep -Eo "\"\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\"" test1.txt';
However, the method I'd recommend is to not shell out at all, but instead do this logic in perl:
my @array2 = do {
open my $fh, '<', 'test1.txt' or die "Can't open file: $!";
grep /\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b/, <$fh>;
};
I really wouldn't mix bash and perl. It's just asking for pain. Perl can do it all natively.
Something like:
open (my $input_fh, "<", "test.txt" ) or die $!;
my @results = grep ( /\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/, <$input_fh> );
This does, however, require slurping the file into memory, which isn't optimal; I'd generally use a while loop instead.
The text inside the backticks undergoes double-quotish substitution. You will need to double your backslashes.
Running grep from inside Perl is dubious, anyway; just slurp in the text file and use Perl to find the matches.
The easiest way to retrieve the output from an external command is to use open():
open(my $fh, '-|', 'grep', '-Eo', '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}', 'test1.txt') or die "Can't run grep: $!";
my @array2 = <$fh>;
close($fh);
..though I think Sobrique's idea is the best answer here.
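For comparison, this is roughly what the shell side needs to receive. In the sketch below the pattern sits in single quotes, so every backslash reaches grep untouched, which is the state the Perl string has to deliver after its own interpolation. The sample file contents are invented, `\b` is assumed to be supported by the installed grep (it is in GNU grep), and `mapfile` needs bash 4 or later.

```shell
#!/bin/bash
# Invented sample input: two lines with an address, one without.
sample=$(mktemp)
printf '%s\n' 'host 10.0.0.1 up' 'no address here' 'peer 192.168.1.20 down' > "$sample"

# Single quotes keep \b and \. literal, so grep sees the regex as written.
mapfile -t ips < <(grep -Eo '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b' "$sample")

printf '%s\n' "${ips[@]}"

rm -f "$sample"
```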

bash4 read file into associative array

I am able to read file into a regular array with a single statement:
local -a ary
readarray -t ary < $fileName
What I can't get working is reading a file into an associative array.
I have control over file creation, so I would like to do this as simply as possible, without loops if at all possible.
The file content to be read in could look like this:
keyname=valueInfo
I am willing to replace = with another string if that cuts down on code, especially if it allows a single line of code as above.
And ...
Would it be possible to read such a file into an associative array using something like an until or from, i.e. read into the array until it hits a given word, or would I have to do this as part of a loop?
This would allow me to keep a lot of similar values in the same file but read them into separate arrays.
I looked at mapfile as well, but it does the same as readarray.
Finally ...
I am creating an options list - to select from - as below:
local -a arr=("${!1}")
select option in ${arr[*]}; do
echo ${option}
break
done
Works fine - however the list shown is not sorted. I would like to have it sorted if possible at all.
Hope it is ok to put all 3 questions into 1 as the questions are similar - all on arrays.
Thank you.
First thing, associative arrays are declared with -A not -a:
local -A ary
And if you want to declare a variable on global scope, use declare outside of a function:
declare -A ary
Or use declare -g inside a function if BASH_VERSION >= 4.2.
If your lines do have keyname=valueInfo, with readarray, you can process it like this:
readarray -t lines < "$fileName"
for line in "${lines[@]}"; do
key=${line%%=*}
value=${line#*=}
ary[$key]=$value ## Or simply ary[${line%%=*}]=${line#*=}
done
Using a while read loop can also be an option:
while IFS= read -r line; do
ary[${line%%=*}]=${line#*=}
done < "$fileName"
Or
while IFS== read -r key value; do
ary[$key]=$value
done < "$fileName"
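Putting the last variant together with the sorting question, a minimal sketch might look like this. It assumes bash 4 or later; the file contents, the variable `cfg`, and the array `sorted_keys` are invented for the example. Note that only the first `=` splits the line, so values may themselves contain `=`.

```shell
#!/bin/bash
declare -A ary

# Invented key=value file for the demonstration.
cfg=$(mktemp)
printf '%s\n' 'zeta=last value' 'alpha=first=with=equals' 'mid=2' > "$cfg"

# Split each line on the first '='; the remainder goes to value.
while IFS='=' read -r key value; do
    [[ -n $key ]] || continue      # skip blank lines
    ary[$key]=$value
done < "$cfg"
rm -f "$cfg"

# Associative arrays have no defined order, so sort the keys explicitly.
mapfile -t sorted_keys < <(printf '%s\n' "${!ary[@]}" | sort)

for key in "${sorted_keys[@]}"; do
    printf '%s -> %s\n' "$key" "${ary[$key]}"
done
```

The same sorted list could be handed to `select option in "${sorted_keys[@]}"` to get an ordered menu.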

Splitting string separated by comma into array values in shell script?

My data set(data.txt) looks like this [imageID,sessionID,height1,height2,x,y,crop]:
1,0c66824bfbba50ee715658c4e1aeacf6fda7e7ff,1296,4234,194,1536,0
2,0c66824bfbba50ee715658c4e1aeacf6fda7e7ff,1296,4234,194,1536,0
3,0c66824bfbba50ee715658c4e1aeacf6fda7e7ff,1296,4234,194,1536,0
4,0c66824bfbba50ee715658c4e1aeacf6fda7e7ff,1296,4234,194,1536,950
These are a set of values which I wish to use. I'm new to shell scripting :) I read the file line by line like this:
cat $FILENAME | while read LINE
do
string=($LINE)
# PROCESSING THE STRING
done
Now, in the code above, after getting the string, I wish to do the following:
1. Split the string into comma-separated values.
2. Store these values in arrays like imageID[], sessionID[].
I need to access these values to do image processing using ImageMagick.
However, I'm not able to perform the above steps correctly.
set -A doesn't work for me in bash on OS X (set -A is ksh syntax, not bash).
Posting an alternative solution using read -a in case someone needs it:
# init all your individual arrays here
imageId=(); sessionId=();
while IFS=, read -ra arr; do
imageId+=("${arr[0]}")
sessionId+=("${arr[1]}")
done < input.csv
# Print your arrays
echo "${imageId[@]}"
echo "${sessionId[@]}"
You can split and store the values like this (set -A is ksh syntax, so this needs ksh rather than bash):
oIFS="$IFS"; IFS=','
set -A str $string
IFS="$oIFS"
echo "${str[0]}";
echo "${str[1]}";
echo "${str[2]}";
Have a look here for more on Unix arrays.
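For readers on bash, a rough equivalent of the split above uses `read -a` with a here-string. The sketch takes the fourth sample row from the question as its input; the array name `str` matches the snippet above.

```shell
#!/bin/bash
# Fourth sample row from the question's data set.
string='4,0c66824bfbba50ee715658c4e1aeacf6fda7e7ff,1296,4234,194,1536,950'

# IFS is changed only for this one read command, so no save/restore is needed.
IFS=',' read -ra str <<< "$string"

echo "${str[0]}"   # imageID
echo "${str[1]}"   # sessionID
echo "${str[6]}"   # crop
```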
