Exclude elements of a list that are in another array - arrays

I'd like to create an array like this
#exclude = ("[INFO] Reading file", "[INFO] All file(s) read");
which contains items that I would like to ignore while looping through another array
The other array is #nyuulog which I've ready into from a file and looks similar to this:
[INFO] Uploading 37 article(s) from 3 file(s) totalling 23.98 MiB```
[INFO] Reading file 157.1.1.par2...
[INFO] Reading file 159.1.1.rar...
[INFO] Reading file 159.1.1.vol0+1.par2...
[INFO] All file(s) read...
[INFO] Finished uploading 23.98 MiB in 00:00:16.083 (1527.03 KiB/s). Raw upload: 2613.34 KiB/s
So I'm using this:
foreach $line(#nyuulog) {print $txtfile("$line\n");}
which writes all of the lines but I'm wanting to not write lines out to the filehandle that contains an element in the #exclude array.
Is there an easy way to do this? I've tried numerous attempts at using grep or the new Perl ~~ command (which I don't think applies in this situation) and can't get the right combination of commands.
Any help or pointing me in the right direction - would be greatly appreciated.
Thank you

Try this:
my $FilterRe = join("|", map({"(^\Q$_\E)"} #exclude));
my #Filtered = grep({!/$FilterRe/} #nyuulog);
Inspired by a question on perlmonks.

Construct a look-up hash for what's to be excluded, and filter the array with it
my %excl = map { $_ => 1 } #exclude;
my #filtered = grep { not $excl{$_} } #original;
This is about as efficient as list processing can be, O(N), and hopefully clear and easy.
Can also have it in a do block to avoid an extra variable (%excl) floating around
my #filtered = do {
my %excl = map { $_ => 1 } #exclude;
grep { not $excl{$_} } #original;
};

Related

I am not able to get any values to store in the array #AA

It is getting the correct inputs and printing them inside the for loop but when I try to send it to a function module later or if I try to print it outside the for loop it is empty.
What do I need to change?
#!/usr/bin/perl
use lib "."; # This pragma include the current working directory
use Mytools;
$inputfilename = shift #ARGV;
open (INFILE, $inputfilename) or die
("Error reading file $inputfilename: $! \n");
# Storing every line of the input file in array #file_array
while (<INFILE>){
$file_array[ $#file_array + 1 ] = $_;
}
my $protein;
my #AA;
foreach $protein (#file_array)
{
#AA = Mytools::dnaToAA($protein);
print "The main AA\n",#AA;
}
print "The main array",#file_array;
my $header1 = "AA";
my $header2 = "DNA";
Mytools::printreport($header1, $header2, \#AA, \#file_array);
You're overwriting the #AA in every iteration of the foreach loop.
Instead of
#AA = Mytools::dnaToAA($protein);
use
push #AA, Mytools::dnaToAA($protein);
See push.
Next time, try to post runnable code (see mre), i.e. avoid Mytools as they're irrelevant to the problem and make the code impossible to run for anyone else but you.

Benchmark channel creation NextFlow

I am performing a scatter-gather operation on NextFlow.
It looks like the following:
reads = PATH+"test_1.fq"
outdir = "results"
split_read_ch = channel.fromFilePairs(reads, checkIfExists: true, flat:true ).splitFastq( by: 10, file:"test_split" )
process Scatter_fastP {
tag 'Scatter_fastP'
publishDir outdir
input:
tuple val(name), path(reads) from split_read_ch
output:
file "${reads}.trimmed.fastq" into gather_fatsp_ch
script:
"""
fastp -i ${reads} -o ${reads}.trimmed.fastq
"""
}
gather_fatsp_ch.collectFile().view().println{ it.text }
I run this code with all the benchmarks options proposed by Nextflow (https://www.nextflow.io/docs/latest/tracing.html):
nextflow run main.nf -with-report nextflow_report -with-trace nextflow_trace -with-timeline nextflow_timeline -with-dag nextflow_dag.html
In these tracing files, I can find the resources and speed of the 10 Scatter_fastP processes.
But I would like to also measure the resources and speed of the creation of the split_read_ch and the gather_fastp_ch channels.
I have tried to include the channels' creation in processes but I cannot find a solution to make it work.
Is there a way to include the channel creation into the tracing files? Or is there a way I have not found to create these channels into processes?
Thank you in advance for your help.
Although Nextflow can parse FASTQ files and split them into smaller files etc, generally it's better to pass off these operations to another process or set of processes, especially if your input FASTQ files are large. This is beneficial in two ways: (1) your main nextflow process doesn't need to work as hard, and (2) you get granular task process stats in your nextflow reports.
The following example uses GNU split to split the input FASTQ files, and gathers the outputs using the groupTuple() operator and the groupKey() built-in to stream the collected values as soon as possible. You'll need to adapt for your non-gzipped inputs:
nextflow.enable.dsl=2
params.num_lines = 40000
params.suffix_length = 5
process split_fastq {
input:
tuple val(name), path(fastq)
output:
tuple val(name), path("${name}-${/[0-9]/*params.suffix_length}.fastq.gz")
shell:
'''
zcat "!{fastq}" | split \\
-a "!{params.suffix_length}" \\
-d \\
-l "!{params.num_lines}" \\
--filter='gzip > ${FILE}.fastq.gz' \\
- \\
"!{name}-"
'''
}
process fastp {
input:
tuple val(name), path(fastq)
output:
tuple val(name), path("${fastq.getBaseName(2)}.trimmed.fastq.gz")
"""
fastp -i "${fastq}" -o "${fastq.getBaseName(2)}.trimmed.fastq.gz"
"""
}
workflow {
Channel.fromFilePairs( './data/*.fastq.gz', size: 1 ) \
| split_fastq \
| map { name, fastq -> tuple( groupKey(name, fastq.size()), fastq ) } \
| transpose() \
| fastp \
| groupTuple() \
| map { key, fastqs -> tuple( key.toString(), fastqs ) } \
| view()
}

ref to a hash -> its member array -> this array's member's value. How to elegantly access and test?

I want to use an expression like
#{ %$hashref{'key_name'}[1]
or
%$hashref{'key_name}->[1]
to get - and then test - the second (index = 1) member of an array (reference) held by my hash as its "key_name" 's value. But, I can not.
This code here is correct (it works), but I would have liked to combine the two lines that I have marked into one single, efficient, perl-elegant line.
foreach my $tag ('doit', 'source', 'dest' ) {
my $exists = exists( $$thisSectionConfig{$tag});
my #tempA = %$thisSectionConfig{$tag} ; #this line
my $non0len = (#tempA[1] =~ /\w+/ ); # and this line
if ( !$exists || !$non0len) {
print STDERR "No complete \"$tag\" ... etc ... \n";
# program exit ...
}
I know you (the general 'you') can elegantly combine these two lines. Could someone tell me how I could do this?
This code it testing a section of a config file that has been read into a $thisSectionConfig reference-to-a-hash by Config::Simple. Each config file key=value pair then is (I looked with datadumper) held as a two-member array: [0] is the key, [1] is the value. The $tag 's are configuration settings that must be present in the config file sections being processed by this code snippet.
Thank you for any help.
You should read about Arrow operator(->). I guess you want something like this:
foreach my $tag ('doit', 'source', 'dest') {
if(exists $thisSectionConfig -> {$tag}){
my $non0len = ($thisSectionConfig -> {$tag} -> [1] =~ /(\w+)/) ;
}
else {
print STDERR "No complete \"$tag\" ... etc ... \n";
# program exit ...
}

many array element and i need to search a file and print array whcih are not present in the file and there should be no duplicate records in my output

Please help me on the below code :
I have an array with 155 elements and i have a file which has some elements of array inside it , i need all values of the array elements which are found in the file and also i need the array element to be printed as zero if the array element is not found in the file .
Thanks in advance, this is what i have tried.
args=("C9" "DP10" "DP11" "DP20" "DP21" "DP30" "DP31" "DP50" "FR31" "G128" "G402" "G602" "GA" "GI" "GT08" "GT14" "GT17" "GT25" "GT37" "GT67" "H6" "H7" "IL" "IM" "J6" "JD05" "JD09" "JD14" "JD25" "JD37" "K1" "K2" "L100" "L106" "L116" "L150" "L202" "L7" "L8" "L9" "LD11" "LD21" "LE09" "LE26" "LP11" "LP21" "LP31" "LP55" "LQ11" "LQ21" "LQ31" "LS07" "LT09" "LT10" "LT12" "LT15" "LT20" "LT22" "LT24" "LT25" "LT30" "LT38" "LT42" "LT43" "LT44" "LT48" "LT50" "LT59" "LT60" "LT65" "M395" "OV04" "OV07" "OV14" "OV18" "OV23" "OV27" "OV35" "OV39" "OV40" "OV79" "Q15" "Q150" "Q19" "QD11" "QD21" "QD31" "QD65" "QE11" "QE21" "QE31" "QF50" "QM25" "QP10" "QP15" "QP20" "QP30" "QP31" "QP50" "QT25" "QT50" "R39" "R40" "r57" "R9" "rc23" "RC27" "RC39" "rc7" "rc79" "S1" "S101" "S117" "S118" "S13p" "S18" "S202" "S317" "S318" "S319" "S40" "S408" "S67" "S76" "S82" "S99" "SD11" "SD12" "SD14" "SD17" "SD29" "SD3" "SD5" "SD98" "SF20" "SF74" "SR07" "SV19" "SV6p" "T402" "T602" "TG00" "TG17" "TG43" "TG8" "TG92" "WD09" "WD14" "WD17" "WD24" "WD29" "WD37" "WD43" "WWE1" "XR91")
MY CODE :
for loop i have used to traverse the elements search inside a file .
for i in ${args[#]}; do
grep $i file.txt
if [ $? -ne 0 ]; then
echo $i"","""0"
fi
done >> output.txt
TOTAL FILE:
C9,5015319
DP10,36870732
DP11,188
DP20,18728254
DP21,341182
DP30,8415555
DP31,2390000
DP50,12371853
FR31,24541
G128,49780
G402,2000
G602,2000
GA,879888
GT08,1580384
GT17,1968192
GT25,4104
GT37,21550
GT67,24770
H6,660652
IL,137651
JD05,1518400
JD14,325800
JD25,828600
JD37,357100
K1,261549
K2,4715330
L100,284
L116,80000
L7,200847
L8,3158
L9,5054495
LE09,75776
LE26,343410
LP11,1030
LP21,492
LP31,113
LP55,3
LQ11,6776000
LQ21,3543600
LQ31,4525600
LT09,682800
LT12,5715
LT15,568873
LT22,236077
LT24,702800
LT25,4600
LT38,28990
LT65,300125
M395,29600
OV14,462
OV18,86300
OV40,217899
Q150,678
QD11,1000022
QD31,50
QF50,58575
QM25,57900
QP10,1792153
QP15,953400
QP20,770000
QP30,179450
QP31,163223
QP50,8
QT50,66340
R39,62440
R40,18807
r57,3456
rc23,3370
RC27,2809
RC39,2570
rc7,7137
rc79,1296
S1,25007
S117,1000000
S13p,52313
S18,75000
S317,289148
S318,3046
S319,30000
S40,300
S408,4967
S76,28
S82,103238
S99,480
SD11,6719
SD12,23123
SD14,22595
SD17,100000
SD29,252392
SD3,20000
SD5,14090
SD98,653
SF20,1000
SF74,7330
SV19,26461
SV6p,154994
T402,2000
T602,2000
TG17,2031
TG8,2964
TG92,1759
WD17,131194
WD24,94589
WD29,202198
WD37,101794
WD43,112942
WWE1,9600
XR91,70000
EXPECTED OUTPUT :
The output should contain the values which are present in the file for each array element.
If not present the output should contain the array element as zero. For eg:
c9 is not present in the file
output of c9 should be
c9,0
Your approach is not bad. I just would use
^$i,
as a grep-pattern. With your current file data, it's not necessary, but maybe one day your file will contain things like
X,2354
XA,1234
and suddenly your algorithm will fail, if args contain the element X.
Also, the echo statement is unnecessarily complex. I would write it simply as
echo $x,0
You can also simplify the if, by combining it with the grep
if ! grep ^$i, file.txt
but this is mere cosmetics and a matter of taste.

Extract Text from Array - perl

I am trying to extract the Interface from an array created from an SNMP Query.
I want to create an array like THIS:
my #array = ( "Gig 11/8",
"Gig 10/1",
"Gig 10/4",
"Gig 10/2");
It currently looks like THIS:
my #array =
( "orem-g13ap-01 Gig 11/8 166 T AIR-LAP11 Gig 0",
"orem-g15ap-06 Gig 10/1 127 T AIR-LAP11 Gig 0",
"orem-g15ap-05 Gig 10/4 168 T AIR-LAP11 Gig 0",
"orem-g13ap-03 Gig 10/2 132 T AIR-LAP11 Gig 0");>
I am doing THIS:
foreach $ints (#array) {
#gig = substr("$ints", 17, 9);
print("Interface: #gig");
Sure it works, but the hostname [orem-g15ap-01] doesn't always stay the same length, it varies depending on the site. I need to extract the word "Gig" plus the next 6 characters. I have no idea what is the best way of doing this.
I am a novice at perl but trying. Thanks
# "I need to extract the word "Gig" plus the next 6 characters."
# This looks like a fixed-with file format, so consider using unpack.
foreach ( #lines ) {
my( $orem, $gig, $rest ) = unpack 'a17 a9 a*';
print "[$gig]\n";
}
If it's not fixed-with format, then you need to find out what the file spec is and then maybe use a regular expression, something like:
my( $orem, $gig, $rest ) = m/(\S+)\s+(.{9})(.*)/;
But this will not work in the general case without a proper file spec.
Stuff like that is what Perl is made for. Regular Expressions are the way to go. Read the perldoc perlre.
foreach $ints (#array) {
$ints =~ s/(Gig.{6})/$1/g;
}
So you want the second and third field.
my #array = map { /^\S+\s+(\S+\s\S+)/s } #source;
This one is like ikegami's, but I recommend that if you know how something you want looks, then by all means, specify that. Because this is done in a list context, any string that does not match the spec, returns an empty list--or is ignored.
my #results = map { m!(\bGig\s+\d+/d+)! } #array;

Resources