So, well I am trying around again and now I am stuck.
while (<KOERGEBNIS>){
my $counter = 0;
my $curline = $_;
for (my $run = 0; $run < $arrayvalue; $run++){
if ($curline =~ m/#tidgef[$counter]/){
my $row = substr($curline, 0, 140);
push #array$counter, $row;
print "Row $. was saved in ID: #filtered[$counter]\n";
}
$counter++;
}
}
Background is that I want to save all lines beginning with the same 8 characters in the same array so I can count the lines and start working with those arrays. The only thing I could think of right now is with switch and cases but I thought I'd ask first before throwing this code to garbage.
Example:
if theres a line in a .txt like this:
50004000_xxxxxxxxxxxxxx31
50004000_xxxxxxxxxxxxxx33
60004001_xxxxxxxxxxxxxx11
60004001_xxxxxxxxxxxxxx45
I took the first 8 chars of each line and used uniq to filter duplicates and saved them in the array #tidgef, now I want to save Line1 and Line2 in #array1 or even better #array50004000 and Line4 and Line4 to #array2 or #array60004001.
I hope I explained my problem well enough! thank you guys
You're hovering dangerously close to an idea called "symbolic references" (also known as "use a variable to get a variable's name"). It's a very bad idea, for all sorts of reasons.
It's a much better idea to use this as an excuse to learn about complex data structures in Perl. It's not really clear what you want to do with this data, but this example should get you started:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;
my %lines;
while (<DATA>) {
chomp;
my $key = substr($_, 0, 8);
push #{$lines{$key}}, $_;
}
say Dumper \%lines;
__DATA__
50004000_xxxxxxxxxxxxxx31
50004000_xxxxxxxxxxxxxx33
60004001_xxxxxxxxxxxxxx11
60004001_xxxxxxxxxxxxxx45
You should think carefully about why you want arrays called #array50004000 #array60004001. Your program could create them, but you have no way of knowing what those names are. While the code is running, unless you are stepping through it with the debugger, they may be called #x and #y for all you know. You can't even dump their contents because you have no idea what to dump
What you're looking for is a hash, specifically a hash of arrays. Unlike the symbol table, there are operators like keys, values and each that will allow you to enquire what values have been stored in a hash
Your code would look something like this. I have used the example data from your question and put it into myfile
use strict;
use warnings 'all';
my %data;
open KOERGEBNIS, '<', 'myfile' or die $!;
while ( <KOERGEBNIS> ) {
chomp;
my ($key) = split /_/;
push #{ $data{$key} }, $_;
}
for my $key ( sort keys %data ) {
my $val = $data{$key};
print $key, "\n";
print " $_\n" for #$val;
print "\n";
}
output
50004000
50004000_xxxxxxxxxxxxxx31
50004000_xxxxxxxxxxxxxx33
60004001
60004001_xxxxxxxxxxxxxx11
60004001_xxxxxxxxxxxxxx45
Related
I'm totally new to Perl, and I've been assigned some task... I have to read a tab separated file, and then do some operations with the data in a DB. The .tsv file is like this:
ID Name Date
155 Pedro 1988-05-05
522 Mengano 2002-08-02
So far I thought that creating a multidimensional array with the data of the file will be a good solution to handle this data later. So I read the file line by line, skip the item title columns and save the values in an array. However, I'm having difficulties creating this multidimensional array... this is what I've done so far:
#Read file from path
my #array;
my $fh = path($filename)->openr_utf8;
while (my $line = <$fh>) {
chomp $line;
# skip comments and blank lines and title line
next if $line =~ /^\#/ || $line =~ /^\s*$/ || $line =~ /^\+/ || $line =~ /ID/;
#split each line into array
my #aux_line = split(/\s+/, $line);
push #array, #{ $aux_line };
}
Obviously, last line is not working... how could be done to create an array of arrays this way? I'm little bit lost with references... And somebody can think of a better way to store this data we read from file? Thank you!
You can also do this with map:
use Data::Dumper;
my #stuff = map {[split]} <$fh>;
print Dumper \#stuff;
(with maybe a grep to skip comments)
But it may suit your use case better to use an array of hashes :
my #stuff ;
chomp(my #header = split ' ', <$fh>);
while ( <$fh>) {
my %this_row;
#this_row{#header} = split;
push ( #stuff, \%this_row) ;
}
First, use strict and use warnings. That would instantly alert you about that your wrong way to get array reference tries to access completely different variable (Perl allows variable of different types have same names).
After that just change your last line to:
push #array, \#aux_line;
Why does the hash remove the first value apple:2 when I print the output?
use warnings;
use strict;
use Data::Dumper;
my #array = ("apple:2", "pie:4", "cake:2");
my %wordcount;
our $curword;
our $curnum;
foreach (#array) {
($curword, $curnum) = split(":",$_);
$wordcount{$curnum}=$curword;
}
print Dumper (\%wordcount);
Perl hash can only have unique keys, so
$wordcount{2} = "apple";
is later overwritten by
$wordcount{2} = "cake";
What you probably wanted to do was:
use warnings;
use strict;
use Data::Dumper;
my #array = ("apple:2", "pie:4", "cake:2");
my %wordcount;
for my $entry (#array) {
my ($word, $num) = split /:/, $entry;
push #{$wordcount{$num}}, $word;
}
print Dumper (\%wordcount);
This way, each entry in %wordcount relates a word count to an array of the words which appear that many times (assuming the :n in the notation indicates the count).
It is OK to be a beginner, but it is not OK to assume other people can read your mind.
Also, don't use global variables (our) when lexically scoped (my) will do.
I have this following input file:
test.csv
done_cfg,,,,
port<0>,clk_in,subcktA,instA,
port<1>,,,,
I want to store the elements of each CSV column into an array, but I always get error when I try to fetch those "null" elements in the csv when I run the script. Here's my code:
# ... assuming file was correctly opened and stored into
# ... a variable named $map_in
my $counter = 0;
while($map_in){
chomp;
#hold_csv = split(',',$_);
$entry1[$counter] = $hold_csv[0];
$entry2[$counter] = $hold_csv[1];
$entry3[$counter] = $hold_csv[2];
$entry4[$counter] = $hold_csv[3];
$counter++;
}
print "$entry1[0]\n$entry2[0]\n$entry3[0]\n$entry3[0]"; #test printing
I always got use of uninitialized value error whenever i fetch empty CSV cells
Can you help me locate the error in my code ('cause I know I have somewhat missed something on my code)?
Thanks.
This looks like CSV. So the tool for the job is really Text::CSV.
I will also suggest - having 4 different arrays with numbered names says to me that you're probably wanting a multi-dimensional data structure in the first place.
So I'd be doing something like:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Text::CSV;
my $csv = Text::CSV->new( { binary => 1 } );
open( my $input, "<", "input.csv" ) or die $!;
my #results;
while ( my $row = $csv->getline($input) ) {
push ( #results, \#$row );
}
print join ( ",", #{$results[0]} ),"\n";
print Dumper \#results;
close($input);
If you really want separate arrays, I'd suggest naming them something different, but you could do it like this:
push ( #array1, $$row[0] ); #note - double $, because we dereference
I will note - there's an error in your code - I doubt:
while($map_in){
is doing what you think it is.
When you're assigning $entryN, define a default value:
$entry1[$counter] = $hold_csv[0] || '';
same for other #entry
I think there is a typo in while($map_in) { it should be while (#map_in) {.
I have two arrays (Array1 and Array2). Array1 contains approximately 20,000 non-unique elements. Array2 contains approximately 90,000 unique elements. I am trying to count the number of elements in Array1 that also show up as elements of Array2. The perl script below does this successfully but slowly. Is there another method for counting the number of Array1 elements that exist in Array2 that is likely to be faster than this?
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
# This program counts the total number of elements in one array that exist in a separate array.
my $Original_Array_File = "U:/Perl/MasterArray.txt";
# Read in the master file into an array.
open my $file, '<', $Original_Array_File or die $!;
my #Array2 = <$file>;
close $file;
my $path = "C:/Files by Year/1993";
chdir($path) or die "Cant chdir to $path $!";
for my $new_file ( grep -f, glob('*.txt') ) {
open my ($new_fh), '<', $new_file;
my #Array1 = <$new_file>;
my #matched_array_count ;
foreach #Array1 {
++$matched_array_count if ($_ ~~ #Array2 ) ;
}
The primary source of your performance problem is:
foreach #Array1 {
++$matched_array_count if ($_ ~~ #Array2 );
}
According to the smartmatch documentation, the behavior of
$string ~~ #array
is like
grep { $string ~~ $_ } #array
and
$string1 ~~ $string2
is like
$string1 eq $string2
Putting these together with your original code snippet, we get something like:
foreach my $string1 #Array1 {
++$matched_array_count if grep { $string1 eq $_ } #Array2;
}
In other words, take the first element in #Array1 and compare it to every single element in #Array2, then take the second element in #Array1 and compare it to every single element in #Array2, and so on. This works out to
scalar #Array1 * scalar #Array2
comparisons, or roughly 1.8 billion total.
The way this is usually done in Perl is with a hash. It is significantly faster to do a single hash lookup than to search through every element in an array. The basic algorithm is:
Load your "haystack" (what you're searching in) into a hash.
Loop through your "needles" (what you're searching for) and use exists to see if there's a matching key in the hash.
In your particular case, you would load the contents of U:/Perl/MasterArray.txt into a hash and then perform
scalar #Array1
hash lookups, or ~20,000. That will be significantly faster than what you have now.
Here is an example of searching for words in a Linux dictionary file:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $file = '/usr/share/dict/linux.words';
open my $fh, '<', $file or die "Failed to open '$file': $!";
my %haystack = map { chomp; $_ => 1 } <$fh>;
my $count;
while (<DATA>) {
chomp;
$count++ if exists $haystack{$_};
}
say $count;
__DATA__
foo
bar
foobar
fubar
Output:
3
(Apparently, "foobar" is a word. "FUBAR" is a word, but "fubar" in lowercase is not.)
Smartmatch is experimental
You should also be aware that as of Perl 5.18.0, the smartmatch family of features are now experimental:
Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been a regular point of complaint. Although there are a number of ways in which it is useful, it has also proven problematic and confusing for both users and implementors of Perl. There have been a number of proposals on how to best address the problem. It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended. (emphasis added)
It looks like you want to compare many files.
You might split this up to use a pool of threads with one thread per file. Or if you are more memory constrained you might split one of the arrays into parts and have each part sent to a different thread.
I've looked through several threads on websites including this one to try and understand why I am getting an undeclared variable error for my usage of my $line . Each element of the #lines array is an array of strings.
The error is in line 25 and 27 with the $line[$count] statement
use strict;
use warnings;
my #lines;
my #sizes;
# read input from stdin file into 2d array
while(<>)
{
push(#lines, my #tokens = split(/\s+/, $_));
}
# search through each array for largest sizes in
# corresponding elements
for (my $count = 0; $count <= 5; $count++)
{
push(#sizes, 0);
foreach my $line (#lines)
{
if(length($line[$count])>$sizes[$count])
{
$sizes[$count] = length($line[$count]);
}
}
}
I can post the full code if it is necessary, but I am pretty sure the error must be in here somewhere.
The problem is here:
push(#lines, my #tokens = split(/\s+/, $_));
Pushing one array into another just adds all elements to the first array. So you are making a really long one dimensional array.
To fix this, use brackets to make an array reference:
push #lines, [ split(/\s+/, $_) ]; #No need for a temp variable.
Also, to access the array reference, you have to de-reference it. Both of these syntaxes are options:
${$line}[$count];
$line->[$count];
I think the second syntax is more readable.
Update: Also, you could simplify your code if you keep track of the longest lengths while you go through the file:
use strict;
use warnings;
use List::Util qw/max/;
my #lines;
my #sizes = (0)x6;
while(<>)
{
push #lines, [ my #tokens = split ];
#sizes = map { max ( length($tokens[$_]), $sizes[$_] ) } 0..$#tokens;
}
Note: The Data::Dumper core module is an invaluable tool when working with complex data structures in Perl.
use Data::Dumper;
print Dumper #lines;
This will print out the complete structure of whatever variable you give it. That way you can see if you actually created what you thought you did.