Indexing values from a file - database

I have a text file database that is structured like this (for each line): ID#VALUE1#VALUE2. Here's what it looks like:
1#foo#bar
2#boo#tall
3#34h3s#kdfjf8
4#asdff34#fret45
For my purposes it would be useful to have a mask like this:
$ID#$val1#$val2
so --- for the sake of the example --- a code like this:
print "type ID number:";
$ID=<>;
print "value1 is:$val1, and value2 is:$val2";
For a standard input of 1 ($ID = 1), my script should return:
value1 is:foo, and value2 is:bar
How do I load and index the file in such a way? I was thinking about using a hash table, but it doesn't quite work, as it returns only val1. Perhaps it can be done with a simple array. Is there a clever way to do that? What would it look like?
P.S. It is also important for me to be able to retrieve only $val1 OR only $val2.

If your keys are all numeric and there are no gaps in the sequence then you should be using an array, not a hash.
And you can get around the problem of storing more than one value in each element by storing a reference to an array of two values.
This program demonstrates:
use strict;
use warnings;
open my $fh, '<', 'data.txt' or die "can't open data file: $!";
my @data;

while (<$fh>) {
    chomp;
    my ($key, @values) = split /#/;
    $data[$key] = \@values;
}

print "type ID number: ";
my $id = <>;

printf "value1 is: %s, and value2 is: %s\n", @{$data[$id]};

Related

Iterate through a file multiple times, each time finding a regex and returning one line (perl)

I have one file with ~90k lines of text in 4 columns.
col1 col2 col3 value1
...
col1 col2 col3 value90000
A second file contains ~200 lines, each one corresponding to a value from column 4 of the larger file.
value1
value2
...
value200
I want to read in each value from the smaller file, find the corresponding line in the larger file, and return that line. I have written a Perl script that places all the values from the small file into an array, then iterates through that array using each value as a regex to search through the larger file. After some debugging, I feel like I have it almost working, but my script only returns the line corresponding to the LAST element of the array.
Here is the code I have:
open my $fh1, '<', $file1 or die "Could not open $file1: $!";
my @array = <$fh1>;
close $fh1;

my $count = 0;
while ($count < scalar @array) {
    my $value = $array[$count];
    open my $fh2, '<', $file2 or die "Could not open $file2: $!";
    while (<$fh2>) {
        if ($_ =~ /$value/) {
            my $line = $_;
            print $line;
        }
    }
    close $fh2;
    $count++;
}
This returns only:
col1 col2 col3 value200
I can get it to print each value of the array, so I know it's iterating through properly, but it's not using each value to search the larger file as I intended. I can also plug any of the values from the array into the $value variable and return the appropriate line, so I know the lines are there. I suspect my bug may have to do with either:
newlines in the array elements, since all the elements have a newline except the last one. I've tried chomp but get the same result.
or
something to do with the way I'm handling the second file with opening/closing. I've tried moving or removing the close command and that either breaks the code or doesn't help.
You should only read the 90k-line file once, checking each value from the other file against the fourth column of each line as you go, instead of reading the whole large file once per line of the smaller one:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;

my ($file1, $file2) = @ARGV;

# Read the file of strings to match against
open my $fh1, '<', $file1 or die "Could not open $file1: $!";
my %words = map { chomp; $_ => 1 } <$fh1>;
close $fh1;

# Process the data file in one pass
open my $fh2, '<', $file2 or die "Could not open $file2: $!";
while (my $line = <$fh2>) {
    chomp $line;
    # Only look at the fourth column
    my @fields = split /\s+/, $line, 4;
    say $line if exists $words{$fields[3]};
}
close $fh2;
Note this uses a straight-up string comparison (via hash key lookup) against the last column instead of regular expression matching; your sample data looks like that's all that's needed. If you're using actual regular expressions, let me know and I'll update the answer.
Your code does look like it should work, just horribly inefficiently. In fact, after adjusting your sample data so that more than one line matches, it does print out multiple lines for me.
A slightly different approach to the problem:
use warnings;
use strict;
use feature 'say';

my $values = shift;
open my $fh1, '<', $values or die "Could not open $values";
my @lookup = <$fh1>;
close $fh1;
chomp @lookup;

my $re = join '|', map { '\b' . $_ . '\b' } @lookup;

((split)[3]) =~ /$re/ && print while <>;
Run it as: script.pl value_file data_file
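If the condensed last line is hard to read, here is a more verbose equivalent of the same loop (a sketch, assuming the same two file arguments):

while (my $line = <>) {
    my @fields = split ' ', $line;         # split on whitespace, as (split) does
    print $line if $fields[3] =~ /$re/;    # match only the fourth column
}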

How to load a CSV file into a perl hash and access each element

I have a CSV file with the following information separated by commas ...
Owner,Running,Passing,Failing,Model
D42,21,54,543,Yes
T43,54,76,75,No
Y65,76,43,765,Yes
I want to open this CSV file and place its contents inside of a Perl hash in my program. I am also interested in the code needed to print a specific element inside of the hash. For example, how would I print the "Passing" count for the "Owner" Y65?
The code I currently have:
$file = "path/to/file";
open $f, '<', $files, or die "cant open $file"
while (my $line = <$f>) {
#inside here I am trying to take the containments of this file and place it into a hash. I have tried numerous ways of trying this but none have seemed to work. I am leaving this blank because I do not want to bog down the visibility of my code for those who are kind enough to help and take a look. Thanks.
}
As well as placing the CSV file inside of a hash, I also need to understand the syntax to print and navigate through specific elements. Thank you very much in advance.
Here is an example of how to put the data into a hash %owners and later (after having read the file) extract a "passing count" for a particular owner. I am using the Text::CSV module to parse the lines of the file.
use feature qw(say);
use open qw(:std :utf8);    # Assume UTF-8 files and terminal output
use strict;
use warnings qw(FATAL utf8);
use Text::CSV;

my $csv = Text::CSV->new()
    or die "Cannot use CSV: " . Text::CSV->error_diag();

my $fn = 'test.csv';
open my $fh, "<", $fn
    or die "Could not open file '$fn': $!";

my %owners;
my $header = $csv->getline( $fh );    # TODO: add error checking
while ( my $row = $csv->getline( $fh ) ) {
    next if @$row == 0;    # TODO: more error checking
    my ($owner, @values) = @$row;
    $owners{$owner} = \@values;
}
close $fh;

my $key   = 'Y65';
my $index = 1;
say "Passing count for $key = ", $owners{$key}->[$index];
Since it's not really clear what "load a CSV file into a perl hash" means (nor does it really make sense; an array of hashes, one per row, maybe, if you don't care about keeping the ordering of fields, but just a hash? What are the keys supposed to be?), let's focus on the rest of your question, in particular:
how I will print the "Passing" count for the "Owner" Y65.
There are a few other CSV modules that might be of interest and are much easier to use than Text::CSV:
Tie::CSV_File lets you access a CSV file like a 2D array. $foo[0][0] is the first field of the first row of the tied file.
So:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use Tie::CSV_File;

my $csv = "data.csv";
tie my @data, "Tie::CSV_File", $csv or die "Unable to tie $csv!";

for my $row (@data) {
    say $row->[2] and last if $row->[0] eq "Y65";
}
DBD::CSV lets you treat a CSV file like a table in a database you can run SQL queries on.
So:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use DBI;

my $csv = "data.csv";
my $dbh = DBI->connect("dbi:CSV:", undef, undef,
                       { csv_tables => { data => { f_file => $csv } } })
    or die $DBI::errstr;

my $owner = "Y65";
my $p = $dbh->selectrow_arrayref("SELECT Passing FROM data WHERE Owner = ?",
                                 {}, $owner);
say $p->[0] if defined $p;
Text::AutoCSV has a bunch of handy functions for working with CSV files.
So:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use Text::AutoCSV;

my $csv = "data.csv";
my $acsv = Text::AutoCSV->new(in_file => $csv) or die "Unable to open $csv!";
my $row = $acsv->search_1hr("OWNER", "Y65");
say $row->{"PASSING"} if defined $row;
This last one is probably closest to what I think you think you want.

Is there a way to create variable Arrays in Perl?

So, well I am trying around again and now I am stuck.
while (<KOERGEBNIS>){
    my $counter = 0;
    my $curline = $_;
    for (my $run = 0; $run < $arrayvalue; $run++){
        if ($curline =~ m/@tidgef[$counter]/){
            my $row = substr($curline, 0, 140);
            push @array$counter, $row;
            print "Row $. was saved in ID: @filtered[$counter]\n";
        }
        $counter++;
    }
}
Background is that I want to save all lines beginning with the same 8 characters in the same array so I can count the lines and start working with those arrays. The only thing I could think of right now is with switch and cases but I thought I'd ask first before throwing this code to garbage.
Example:
if there are lines in a .txt file like this:
50004000_xxxxxxxxxxxxxx31
50004000_xxxxxxxxxxxxxx33
60004001_xxxxxxxxxxxxxx11
60004001_xxxxxxxxxxxxxx45
I took the first 8 chars of each line, used uniq to filter duplicates, and saved them in the array @tidgef. Now I want to save Line1 and Line2 in @array1, or even better @array50004000, and Line3 and Line4 in @array2, or even better @array60004001.
I hope I explained my problem well enough! Thank you, guys.
You're hovering dangerously close to an idea called "symbolic references" (also known as "use a variable to get a variable's name"). It's a very bad idea, for all sorts of reasons.
It's a much better idea to use this as an excuse to learn about complex data structures in Perl. It's not really clear what you want to do with this data, but this example should get you started:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

use Data::Dumper;

my %lines;

while (<DATA>) {
    chomp;
    my $key = substr($_, 0, 8);
    push @{$lines{$key}}, $_;
}

say Dumper \%lines;

__DATA__
50004000_xxxxxxxxxxxxxx31
50004000_xxxxxxxxxxxxxx33
60004001_xxxxxxxxxxxxxx11
60004001_xxxxxxxxxxxxxx45
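Since the stated goal was to count the lines in each group, note that the counts fall straight out of this structure; for example (a hypothetical follow-up to the code above):

for my $key (sort keys %lines) {
    say "$key: " . scalar @{ $lines{$key} } . " lines";
}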
You should think carefully about why you want arrays called @array50004000 and @array60004001. Your program could create them, but you would have no way of knowing what those names are. While the code is running, unless you are stepping through it with the debugger, they may be called @x and @y for all you know. You can't even dump their contents, because you have no idea what to dump.
What you're looking for is a hash, specifically a hash of arrays. Unlike the symbol table, there are operators like keys, values and each that will allow you to enquire what values have been stored in a hash.
Your code would look something like this. I have used the example data from your question and put it into myfile.
use strict;
use warnings 'all';

my %data;

open KOERGEBNIS, '<', 'myfile' or die $!;

while ( <KOERGEBNIS> ) {
    chomp;
    my ($key) = split /_/;
    push @{ $data{$key} }, $_;
}

for my $key ( sort keys %data ) {
    my $val = $data{$key};
    print $key, "\n";
    print "  $_\n" for @$val;
    print "\n";
}
output
50004000
  50004000_xxxxxxxxxxxxxx31
  50004000_xxxxxxxxxxxxxx33

60004001
  60004001_xxxxxxxxxxxxxx11
  60004001_xxxxxxxxxxxxxx45

More clarification about the usage of split in Perl

I have this following input file:
test.csv
done_cfg,,,,
port<0>,clk_in,subcktA,instA,
port<1>,,,,
I want to store the elements of each CSV column into an array, but I always get an error when I try to fetch those "null" elements in the CSV when I run the script. Here's my code:
# ... assuming file was correctly opened and stored into
# ... a variable named $map_in

my $counter = 0;
while ($map_in) {
    chomp;
    @hold_csv = split(',', $_);
    $entry1[$counter] = $hold_csv[0];
    $entry2[$counter] = $hold_csv[1];
    $entry3[$counter] = $hold_csv[2];
    $entry4[$counter] = $hold_csv[3];
    $counter++;
}
print "$entry1[0]\n$entry2[0]\n$entry3[0]\n$entry4[0]"; # test printing
I always get a "use of uninitialized value" error whenever I fetch empty CSV cells.
Can you help me locate the error in my code (because I know I have missed something)?
Thanks.
This looks like CSV. So the tool for the job is really Text::CSV.
I will also suggest: having 4 different arrays with numbered names says to me that you probably want a multi-dimensional data structure in the first place.
So I'd be doing something like:
#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;
use Text::CSV;

my $csv = Text::CSV->new( { binary => 1 } );

open( my $input, "<", "input.csv" ) or die $!;

my @results;
while ( my $row = $csv->getline($input) ) {
    push( @results, \@$row );
}

print join( ",", @{ $results[0] } ), "\n";
print Dumper \@results;

close($input);
If you really want separate arrays, I'd suggest naming them something different, but you could do it like this:
push( @array1, $$row[0] );    # note: double $, because we dereference
I will note - there's an error in your code - I doubt:
while($map_in){
is doing what you think it is.
When you're assigning $entryN, define a default value:
$entry1[$counter] = $hold_csv[0] || '';
Do the same for the other @entry arrays.
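A small caveat on that suggestion: || replaces any false value, including a legitimate 0 in a cell, so on Perl 5.10 and later the defined-or operator is the safer choice:

$entry1[$counter] = $hold_csv[0] // '';    # only replaces undef, keeps '0'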
I think there is a typo in while ($map_in) {; it should be while (@map_in) {.

Search for, and remove column from CSV file

I'm trying to write a subroutine that will take two arguments, a filename and the column name inside a CSV file. The subroutine will search for the second argument (column name) and remove that column (or columns) from the CSV file and then return the CSV file with the arguments removed.
I feel like I've gotten through the first half of this sub (opening the file, retrieving the headers and values), but I can't seem to find a way to search the CSV file for the string that the user inputs and delete that whole column. Any ideas? Here's what I have so far.
sub remove_columns {
    my @Para  = @_;
    my $nargs = @Para;
    die "Insufficient arguments\n" if ($nargs < 2);
    open file, $file;
    $header = <file>;
    chomp $header;
    my @hdr = split ',', $header;
    while (my $line = <file>) {
        chomp $line;
        my @vals = split ',', $line;
        # hash that will allow me to access column names and values quickly
        my %h;
        for (my $i = 0; $i <= $#hdr; $i++) {
            $h{$hdr[$i]} = $i;
        }
        ....
    }
Here's where the search and removal will be done. I've been thinking about how to approach it; the CSV files I'll be modifying will be huge, so speed is a factor, but I can't come up with a good way to do it. I'm new to Perl, so I'm struggling a bit.
Here are a few hints that will hopefully get you going.
To remove the element at position $index of an array, use:
splice @array, $index, 1;
As speed is an issue, you probably want to construct an array of column numbers at the start and then loop over the elements of that array:
for my $index (@indices) {
    splice @array, $index, 1;
}
(This way is more idiomatic Perl than a for (my $i = 0; $i <= $#hdr; $i++) style loop.)
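One caveat worth adding: each splice shifts every element after the removed position down by one, so when removing more than one column, walk the indices from highest to lowest. A small sketch (with @indices assumed to hold the column numbers to delete):

for my $index (sort { $b <=> $a } @indices) {
    splice @array, $index, 1;    # deleting right-to-left keeps the remaining indices valid
}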
Another thing to consider: CSV format is surprisingly complicated. Might your data have a , within " ", such as
1,"column with a , in it"
If so, I would consider using something like Text::CSV.
You should probably look in the direction of Text::CSV
Or you can do something like this:
my $colnum;
my @columns = split(/,/, <$file>);
for (my $i = 0; $i < scalar(@columns); $i++) {
    if ($columns[$i] =~ /^$unwanted_column_name$/) {
        $colnum = $i;
        last;
    }
}
while (<$file>) {
    my @row = split(/,/, $_);
    splice(@row, $colnum, 1);
    # do something with the resulting array @row
}
Side notes:
you really should use strict and warnings;
split(/,/, <$file>) won't work with all CSV files
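To illustrate that second point, a quoted field containing a comma gets broken apart by a naive split (hypothetical data):

my @fields = split /,/, '1,"column with a , in it",2';
# yields ('1', '"column with a ', ' in it"', '2') - four fields instead of three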
There is an elegant way to remove some columns from an array. If I have the columns to remove in array @cols, and the headers in @headers, I can make an array of the indexes to preserve:
my %to_delete;
@to_delete{@cols} = ();
my @idxs = grep !exists $to_delete{ $headers[$_] }, 0 .. $#headers;
Then it's easy to make the new headers:
@headers[@idxs]
and also a new row from the columns read:
@columns[@idxs]
The same approach can be used, for example, for rearranging arrays. It is very fast and a pretty idiomatic Perl way to do this sort of task.
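Putting the pieces together, here is a minimal sketch of the whole subroutine using this index-slice approach (it assumes simple comma-separated data with no quoted fields; for anything messier, reach for Text::CSV as suggested above):

use strict;
use warnings;

sub remove_columns {
    my ($filename, @cols) = @_;
    die "Insufficient arguments\n" if @_ < 2;

    open my $fh, '<', $filename or die "Can't open $filename: $!";

    # Work out which column indexes to keep, based on the header line
    chomp( my $header = <$fh> );
    my @headers = split /,/, $header;
    my %to_delete;
    @to_delete{@cols} = ();
    my @idxs = grep { !exists $to_delete{ $headers[$_] } } 0 .. $#headers;

    # Emit the reduced header, then each reduced row
    print join( ',', @headers[@idxs] ), "\n";
    while ( my $line = <$fh> ) {
        chomp $line;
        my @fields = split /,/, $line;
        print join( ',', @fields[@idxs] ), "\n";
    }
    close $fh;
}

# Hypothetical usage: print the file without its "Passing" column
# remove_columns( 'input.csv', 'Passing' );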
