Using reference file to find desired data in another file - Perl - arrays

I used a Perl script to compare two arrays and retrieve data of interest. Now I want to use that list of retrieved data to get desired information from another list, using only the first part as an identifier but pulling all the information in that line.
Example data:
Reference:
apple
orange
pear
Search list:
apple 439
plum 657
orange 455
Result:
apple 439
orange 455
I've tried doing this with Array::Compare but haven't had any luck as it compares the entire line not just the first portion.
Thanks!
EDIT: Thanks to DVK I now have the following code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;
#Convert the first file into an array of keys @keys
my @keys = read_file('Matching_strains.txt');
#Convert the second file into an array of lines @lines2
my @lines = read_file('output2.txt');
#Convert that array of lines into a hash using map and split
my %data = map { split(/\s+/, $, 2) } @lines; # 2 limits # of entries
#Get a list of data for which keys are in the first list
my %final = map { exists $data{$_} ? ( $_=>$data{$_} ) : () } @keys;
#Print that hash out
print "%final\n";
But I'm getting a "Number found where operator expected" error for the my %data line. I've consulted perldoc, but I'm not sure what number it's referring to.
Thanks!

Convert the first file into an array of keys @keys
Left as an exercise for the reader
Convert the second file into an array of lines @lines
Left as an exercise for the reader
Convert that array of lines into a hash using map and split
my %data = map { split(/\s+/, $_, 2) } @lines; # 2 limits # of entries
Get a list of data for which keys are in the first list
my %final = map { exists $data{$_} ? ( $_=>$data{$_} ) : () } @keys;
Print that hash out
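The steps above can be put together as a minimal sketch. It assumes the two files have already been read into arrays (the in-memory @keys and @lines below are hypothetical stand-ins for the file contents), strips the trailing newlines so the keys compare equal to the first fields, and stores the complete line under each identifier so the whole line can be printed. The name %line_for is illustrative, not from the original answer.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the lines of the two files.
my @keys  = ("apple\n", "orange\n", "pear\n");
my @lines = ("apple 439\n", "plum 657\n", "orange 455\n");

# Strip trailing newlines so keys compare equal to the first fields.
chomp @keys;
chomp @lines;

# Map each identifier (first whitespace-separated field) to its full line.
my %line_for;
for my $line (@lines) {
    my ($id) = split ' ', $line, 2;   # 2 limits the number of fields
    $line_for{$id} = $line;
}

# Keep the lines whose identifier is in the reference list,
# in the order of the reference list.
my @result = map { exists $line_for{$_} ? $line_for{$_} : () } @keys;

print "$_\n" for @result;
```

With the sample data from the question this prints the apple and orange lines and skips pear, which has no entry in the search list.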

This sort of thing should achieve what you're after:
use warnings;
use strict;
open my $file1, '<', 'in.txt' or die $!;
open my $file2, '<', 'in.2.txt' or die $!;
my (%keys, %data);
while(<$file1>){
chomp;
$keys{$_} = 1;
}
while(<$file2>){
chomp;
my @split = split /\s/;
$data{$split[0]} = $split[1];
}
foreach (keys %keys){
print "$_ $data{$_}\n" if exists $data{$_};
}
apple 439
orange 455

Related

Search for string in file using array elements in Perl

I have an array that contains a set of unique elements: my_array = [aab, abc, def, fgh]
I have a file that contains these elements, some of them repeated.
I want to count how many times each unique element occurs; if an element is not repeated, its count is 1.
Example file:
i want to have aab but no i dont want abc
i want to have aab but no i dont want def
The output should be:
aab - 2
abc - 1
def - 1
I first tried to search for each element and print it, but it's not working:
use strict;
use warnings;
my @my_array;
@my_array =("abc", "aab", "def");
open (my $file, '<', 'filename.txt') or die;
my $value;
foreach $value (@my_array) {
while(<$file>) {
if ($_ =~ /$value/){
print "found : $value\n";
}
}
}
I also tried a second method:
use strict;
use warnings;
my @my_array;
@my_array =("abc", "aab", "def");
open (my $file, '<', 'filename.txt') or die;
while (<$file>) {
my $k=0;
if ($_ =~ /$my_array[$k]/) {
print "$my_array[$k]";
}
}
The sample input data does not specify whether lookup words can repeat within a line.
The following demo code assumes that lookup words do not repeat within a line.
If that assumption does not hold, each line should be split into tokens and each token inspected to get a correct count of lookup words.
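For the case where lookup words may repeat within a line, the token-based approach could look roughly like the sketch below. It assumes words are separated by whitespace and carry no attached punctuation; the second sample line and the name %is_lookup are made up for illustration.

```perl
use strict;
use warnings;

# Lookup words stored as hash keys for constant-time membership tests.
my %is_lookup = map { $_ => 1 } qw(abc aab def);

# Hypothetical sample lines; the second one repeats a lookup word.
my @text = (
    'i want to have aab but no i dont want abc',
    'aab again and aab once more',
);

my %count;
for my $line (@text) {
    # split ' ' breaks the line into whitespace-separated tokens.
    for my $token (split ' ', $line) {
        $count{$token}++ if $is_lookup{$token};
    }
}

print "$_ - $count{$_}\n" for sort keys %count;
```

Because every token is inspected, the two occurrences of aab on the second line are both counted.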
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my(%count,@lookup);
@lookup =('abc', 'aab', 'def');
while( my $line = <DATA> ) {
for ( @lookup ) {
$count{$_}++ if $line =~ /\b$_\b/;
}
}
say Dumper(\%count);
exit 0;
__DATA__
i want to have aab but no i dont want abc
i want to have aab but no i dont want def
Output
$VAR1 = {
'aab' => 2,
'abc' => 1,
'def' => 1
};
I'm a fan of the Algorithm::AhoCorasick::XS module for performing efficient searches for multiple strings at once. An example:
#!/usr/bin/env perl
use warnings;
use strict;
use Algorithm::AhoCorasick::XS;
my @words = qw/abc aab def/;
my $aho = Algorithm::AhoCorasick::XS->new(\@words);
my %counts;
while (my $line = <DATA>) {
$counts{$_}++ for $aho->matches($line);
}
for my $word (@words) {
printf "%s - %d\n", $word, $counts{$word}//1;
}
__DATA__
i want to have aab but no i dont want abc
i want to have aab but no i dont want def
outputs
abc - 1
aab - 2
def - 1
The $counts{$word}//1 bit in the output will give you a 1 if that word doesn't exist in the hash because it wasn't encountered in the text.
You can build an alternation pattern from the keywords and so match all that are on the line in one regex run, then populate a frequency hash with the matches:
use warnings;
use strict;
use feature 'say';
use Data::Dumper;
my @keywords = qw(aab abc def fgh);
my $re_w = join '|', @keywords;
my %freq;
while (<>) {
++$freq{$_} for /($re_w)/g
}
say Dumper \%freq;
The <> operator reads, line by line, the files whose names are given on the command line, so the program is run as prog.pl file. (Or open the file "manually" in the program.)
The for loop imposes list context on its expression, so that regex returns the list of matches (captures), as the match operator does in the list context, and the ++$freq{$_} expression works with them one at a time.
The code counts all instances of keywords that repeat on a line. If that's not desired please clarify (can add a call to List::Util::uniq before feeding the list of matches to the for loop).
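If each keyword should count at most once per line, the List::Util::uniq variant might look like the following sketch. The sample lines are hypothetical, and uniq is assumed to be available (it was added in List::Util 1.45).

```perl
use strict;
use warnings;
use List::Util qw(uniq);   # uniq needs List::Util 1.45 or later

my @keywords = qw(aab abc def fgh);
my $re_w = join '|', @keywords;

# Hypothetical sample lines; the first repeats a keyword.
my @text = ('aab abc aab', 'def aab');

my %freq;
for my $line (@text) {
    # The match in list context returns all captures;
    # uniq drops the duplicates found on the same line.
    ++$freq{$_} for uniq( $line =~ /($re_w)/g );
}

print "$_ - $freq{$_}\n" for sort keys %freq;
```

Here aab is counted once for each of the two lines, even though it appears twice on the first one.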
There are a number of other details that may need closer attention.
One example: if there are overlapping keywords, which one takes precedence? For instance, with keywords the and there, once the word there is encountered in the text should it be matched by there or by the? If it is there then keywords in the alternation pattern should be ordered from longest to shortest,
my $re_w = join '|', sort { length $b <=> length $a } @keywords;
Please clarify if there are additional considerations.
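The effect of ordering the alternation can be seen in a small sketch. The sample text is made up, and the quotemeta call is an addition not in the original answer; it guards against keywords that contain regex metacharacters.

```perl
use strict;
use warnings;

my $text = 'there is the cat';   # hypothetical sample text

# Shorter keyword first: "the" also wins inside "there".
my @unordered = $text =~ /(the|there)/g;

# Longest keyword first, with each keyword escaped by quotemeta.
my $re_w = join '|',
    map  { quotemeta }
    sort { length $b <=> length $a } qw(the there);
my @ordered = $text =~ /($re_w)/g;

print "unordered: @unordered\n";
print "ordered:   @ordered\n";
```

The regex engine tries alternatives left to right at each position, so only the longest-first pattern reports "there" as a match.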

Perl: Matching perl hash keys from two files and utilizing in new hash

I just started learning Perl, and was wondering if anyone could provide suggestions, relevant examples or resources regarding the coding problem described below.
I have two data files with tab-delimited columns, similar to the example below.
File#1:
GeneID ColA ColB
Gene01 5 15
Gene02 4 8
Gene03 25 5
File#2:
GeneID ColA ColC
Gene01 12 3
Gene03 5 20
Gene05 22 40
Gene06 88 2
The actual files I'm using have >50 columns and rows, but are similar to what's above.
First, I want to input the files, establish variables holding the column names for each file, and establish hashes using the column 1 genes as keys and the concatenated values of the other 2 columns per key.
This way there is one key per one value in each row of the hash.
My trouble is the third hash %commongenes. I need to find the keys that are the same in both hashes and use just those keys, and their associated values in both files, in the third hash. In the above example, this would be the following key value pairs:
File1: File2:
Gene01 5 15 Gene01 12 3
Gene03 25 5 Gene03 5 20
I know the following if statement is incorrect, but the concatenation of columns from both files is what I'd like to have.
if ($tmpArray1[0] eq $tmpArray2[0]){
$commongenes{$tmpArray2[0]} =
$tmpArray1[1].':'.$tmpArray1[2].':'.$tmpArray2[1].':'.$tmpArray2[2];
}
Here is the main body of the code below:
#!/usr/bin/perl -w
use strict;
my $file1=$ARGV[0];
my $file2=$ARGV[1];
open (FILE1, "<$file1") or die "Cannot open $file1 for processing!\n";
open (FILE2, "<$file2") or die "Cannot open $file2 for processing!\n";
my @fileLine1=<FILE1>;
my @fileLine2=<FILE2>;
my %file1_allgenes=();
my %file2_allgenes=();
my %commongenes =();
my ($file1_group0name, $file1_group1name, $file1_group2name)=('','','','');
my ($file2_group0name, $file2_group1name, $file2_group2name)=('','','','');
for (my $i=0; $i<=$#fileLine1 && $i<=$#fileLine2; $i++) {
chomp($fileLine1[$i]);
chomp($fileLine2[$i]);
my @tmpArray1=split('\t',$fileLine1[$i]);
my @tmpArray2=split('\t',$fileLine2[$i]);
if ($i==0) { ## Column Names and/or Letters
$file1_group0name=substr($tmpArray1[0],0,6);
$file1_group1name=substr($tmpArray1[1],0,4);
$file1_group2name=substr($tmpArray1[2],0,4);
$file2_group0name=substr($tmpArray2[0],0,6);
$file2_group1name=substr($tmpArray2[1],0,4);
$file2_group2name=substr($tmpArray2[2],0,4);
}
if ($i!=0) { ## Concatenated values in 3 separate hashes
if (! defined $file1_allgenes{$tmpArray1[0]}) {
$file1_allgenes{$tmpArray1[0]}=$tmpArray1[1].':'.$tmpArray1[2];
}
if (! defined $file2_allgenes{$tmpArray2[0]}) {
$file2_allgenes{$tmpArray2[0]}=$tmpArray2[1].':'.$tmpArray2[2];
}
if ($tmpArray1[0] eq $tmpArray2[0]){
$commongenes{$tmpArray2[0]} =
$tmpArray1[1].':'.$tmpArray1[2].':'.$tmpArray2[1].':'.$tmpArray2[2];
}
}
my @commongenes = %commongenes;
print "@commongenes\n\n";
}
Any suggestions are most appreciated.
Use a hash of arrays so you don't need to substr and concatenate the strings all the time.
#!/usr/bin/perl
use warnings;
use strict;
open my $F1, '<', 'file1' or die $!;
<$F1>; # Skip the header.
my %h;
while (<$F1>) {
my @cols = split;
$h{ $cols[0] } = [ @cols[ 1 .. $#cols ] ];
}
my %common;
open my $F2, '<', 'file2' or die $!;
<$F2>;
while (<$F2>) {
my @cols = split;
$common{ $cols[0] } = [ @{ $h{ $cols[0] } }, @cols[ 1 .. $#cols ] ]
if exists $h{ $cols[0] };
}
use Data::Dumper; print Dumper \%common;

creating hash from array in perl

I have an array that I want to convert into a hash table. Basically, I want @array[0] to be the keys of the hash, and @array[1] to be the values of the hash. Is there an easy way to do this in Perl? The code I have so far is as follows:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
unless( open(INFILE, "<", 'scratch/Drosophila/fb_synonym_fb_2014_05.tsv')) {
die "Cannot open file for reading: ", $!;
}
while(<INFILE>) {
my @values = split();
#convert values[0] to keys, values[1] to values
}
the file is available for download here
@array[0] (an array slice, used to return multiple elements) is a bad way of writing $array[0] (an array lookup, used to return a single element). use warnings; would have told you this.
To set a hash element, one uses
$hash{$key} = $val;
So the code becomes
my %hash;
while (<>) {
chomp;
my @fields = split /\t/;
$hash{ $fields[0] } = $fields[1];
}
Better yet,
my %hash;
while (<>) {
chomp;
my ($key, $val) = split /\t/;
$hash{$key} = $val;
}
The name of the file implies the fields are tab-separated, not whitespace separated, so I switched
split ' '
to
split /\t/
This required the addition of chomp.
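The difference between the two forms of split, and why chomp becomes necessary, can be seen in a short sketch. The sample line is made up; its fields contain spaces to show how the two separators behave.

```perl
use strict;
use warnings;

# Hypothetical tab-separated line whose fields contain spaces.
my $line = "Gene one\tvalue two\n";

# split ' ' splits on any run of whitespace, so the trailing newline
# is treated as a separator and never ends up in a field.
my @by_space = split ' ', $line;

# split /\t/ splits only on tabs, so the newline stays in the last field
# unless the line is chomped first.
my @by_tab_raw = split /\t/, $line;
chomp(my $clean = $line);
my @by_tab = split /\t/, $clean;

printf "%d fields with split ' '\n", scalar @by_space;
printf "%d fields with split /\\t/\n", scalar @by_tab;
```

Splitting on whitespace breaks the two fields into four words, while splitting on tabs keeps both fields intact once the newline has been removed.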

Perl compare array content to hash values

I'm trying to check the existence of words from a file in a hash, and if so to display the corresponding hash key and value.
So far, I've been able to do it with an array @motcherches whose values are given in the script.
I can't find a way to fill this array with words from an external file (here FILEDEUX).
Could you point me in the right direction?
(Please don't hesitate to correct my way of explaining my problem.)
#!/usr/bin/perl
use v5.10;
use strict;
use warnings;
my $hashtable = $ARGV[0];
my $textecompare = $ARGV[1];
open FILE,"<$hashtable" or die "Could not open $hashtable: $!\n";
open FILEDEUX,"<$textecompare" or die "Could not open $textecompare: $!\n";
while ( my $line = <FILE>)
{
chomp($line);
my %elements = split(" ",$line);
my @motcherches = qw/blop blup blip blap/;
foreach my $motcherches (@motcherches) {
while ((my $key, my $value) = each (%elements)) {
# Check if element is in hash :
say "$key , $value" if (exists $elements{$motcherches}) ;
}
}
}
close FILE;
close FILEDEUX;
EDIT: Example inputs
FILE (transformed into %elements hash)
Zieler player
Chilwell player
Ulloa player
Mahrez player
Ranieri coach
============================================================================
FILEDEUX (transformed into @motcherches array)
One save is quickly followed by another, and this time Zieler had to be a little sharper to keep it out.
Izaguirre gives Mahrez a taste of his own medicine by beating him all ends up down the left flank, although the Leicester winger gave up far too easily.
Claudio Ranieri will be happy with what he has seen from his side so far.
=============================================================================
Expected output:
Zieler player
Mahrez player
Ranieri coach
use strict;
use warnings;
use feature qw/ say /;
# Read in your paragraph of text here:
open my $text, '<', 'in.txt' or die $!;
my %data;
while(<$text>){
chomp;
$data{$_}++ for split; # by default split splits on ' '
}
my %players;
while(<DATA>){
chomp;
my @field = split;
say if $data{$field[0]};
}
__DATA__
Zieler player
Chilwell player
Ulloa player
Mahrez player
Ranieri coach
Everything is fine, but add the chomp to your program:
while ( my $line = <FILE>)
{
chomp($line);
my %elements = split(" ",$line);

How to read in multiple lines into one array or hash until the next line fits certain criteria in Perl?

I want to read multiple lines with the same element into one array or hash until reaching the next line with a different element. The elements have already been sorted, so lines with the same element are next to each other. For example:
1_1111 1234
1_1111 2234
1_1111 3234
1_1112 4234
1_1112 5234
1_1112 6234
1_1112 7234
1_1113 8234
1_1113 9234
I want to read the first three lines with the same element 1_1111 into one array, process it, then read the next few lines with the same element 1_1112.
my $key;
my @nums;
while (<>) {
my @fields = split;
if (@nums && $fields[0] ne $key) {
process($key, @nums);
@nums = ();
}
$key = $fields[0];
push @nums, $fields[1];
}
process($key, @nums) if @nums;
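The loop above calls a process routine that is not shown. A self-contained sketch with a hypothetical process (here it only records each group so the grouping can be inspected) and a shortened in-memory input might look like this:

```perl
use strict;
use warnings;

my @groups;   # collected only so the grouping can be inspected

# Hypothetical handler; a real one would do the per-group work.
sub process {
    my ($key, @nums) = @_;
    push @groups, "$key: @nums";
}

# Hypothetical sorted input, shortened from the question's sample.
my @input = (
    '1_1111 1234', '1_1111 2234',
    '1_1112 4234',
    '1_1113 8234', '1_1113 9234',
);

my ($key, @nums);
for (@input) {
    my @fields = split;
    # A change of key closes the current group.
    if (@nums && $fields[0] ne $key) {
        process($key, @nums);
        @nums = ();
    }
    $key = $fields[0];
    push @nums, $fields[1];
}
# Flush the final group, which no later key change would close.
process($key, @nums) if @nums;

print "$_\n" for @groups;
```

Only one group is held in memory at a time, which is the main advantage over reading the whole input into a hash when the input is large and already sorted.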
You can read the file into a hash of arrays:
#!/usr/bin/perl
use strict;
use warnings;
my %hash;
while (<DATA>) {
my ($key, $value) = split /\s+/;
push @{ $hash{$key} }, $value;
}
__DATA__
1_1111 1234
1_1111 2234
1_1111 3234
1_1112 4234
1_1112 5234
1_1112 6234
1_1112 7234
1_1113 8234
1_1113 9234
The keys of the hash correspond to the numbers in the left column, while the values are arrays of numbers from the right column. Now you can iterate through the hash and process it as you wish.
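Iterating over such a hash of arrays could look like the sketch below. The hash is filled in by hand with part of the sample data, and summing the values is only a placeholder for whatever processing is needed; %sum_for is an illustrative name.

```perl
use strict;
use warnings;

# Hand-built stand-in for the hash produced by the loop above.
my %hash = (
    '1_1111' => [ 1234, 2234, 3234 ],
    '1_1112' => [ 4234, 5234 ],
);

my %sum_for;
for my $key (sort keys %hash) {
    my @values = @{ $hash{$key} };   # dereference the array for this key
    my $sum = 0;
    $sum += $_ for @values;          # placeholder processing step
    $sum_for{$key} = $sum;
    printf "%s: %d values, sum %d\n", $key, scalar @values, $sum;
}
```

Sorting the keys gives a stable processing order, since Perl hashes do not preserve insertion order.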

Resources