Select one random row from cycle of function with SQLite and Perl - database

Hi, I am trying to select one random row from this:
My source:
use DBI;
use CGI;
my $file = '.\input.txt'; # Name the file
open(FILE, $file) or die("Unable to open file");
my @data = <FILE>;
foreach my $line (@data)
{
    chomp $line;
    my $sth = $dbh->prepare("SELECT columnA FROM table WHERE columnA LIKE '%$line%'");
    $sth->execute;
    my $result = $sth->fetchall_arrayref;
    foreach my $row ( @$result ) {
        print "- ";
        print "@$row\n";
        print "<BR />";
    }
}
How can I print only ONE RANDOM row?
I tried something like this:
my $sth = $dbh->prepare("SELECT nazov_receptu FROM recepty WHERE nazov_receptu LIKE '%$line%' AND kategoria == 'p' AND (rowid = (abs(random()) % (select max(rowid)+1 from recepty)) or rowid = (select max(rowid) from recepty)) order by rowid limit 1;");
but it's not clear to me why it doesn't work.
I am using SQLite and printing it to web interface.
You can try it when you have
input.txt:
A
C
database:
id name
1 A
2 B
3 C
4 D
5 E
OUT:
A OR C (random)

Why not join the file arguments into the query right away instead of looping over them? Then it is a simple matter to extract a random index in Perl:
use strict;
use warnings; # Always use these two pragmas
my $file = '.\input.txt';
open my $fh, "<", $file or die "Unable to open file: $!";
chomp(my @data = <$fh>); # chomp all lines at once
my $query = "SELECT columnA FROM table WHERE ";
$query .= join " OR ", ( "columnA LIKE ?" ) x @data;
# add placeholder for each line
@data = map "%$_%", @data; # add wildcards
my $sth = $dbh->prepare($query);
$sth->execute(@data); # execute query with lines as argument
my $result = $sth->fetchall_arrayref;
my $randid = rand @$result; # find random index
my $row = $result->[ $randid ];
print "- @$row\n";
print "<BR />";
As you see, I've used placeholders, which is the proper way to use variables with queries. It also happens to be a simple way to handle an arbitrary amount of arguments. Because we include all lines in the query, we do not need a for loop.
I've also changed a few other small details: using three-argument open with a lexical file handle, including the error variable $! in the die statement, using proper indentation, and using strict and warnings (you should never code without them).
I've handled the randomization in Perl because it is simplest for me. It may be just as simple, and more efficient, to handle it in the SQL query. You can tack ORDER BY random() LIMIT 1 onto the end of the query, and that might work just fine too.
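As a rough sketch of that last suggestion, the query text and its bind values could be built like this. The helper name build_random_row_query is made up for illustration, and columnA and table are just the placeholder names used in the question; the sketch only builds the SQL string and does not touch a database.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBI): returns the SQL text followed by
# the bind values for a list of search terms.
sub build_random_row_query {
    my @terms = @_;
    die "need at least one search term\n" unless @terms;

    # One "columnA LIKE ?" placeholder per term, joined with OR.
    my $where = join ' OR ', ('columnA LIKE ?') x @terms;

    # Let SQLite pick the random row instead of doing it in Perl.
    my $sql = "SELECT columnA FROM table WHERE $where ORDER BY random() LIMIT 1";

    # Wrap every term in wildcards; these are passed to execute().
    my @binds = map { "%$_%" } @terms;
    return ($sql, @binds);
}

my ($sql, @binds) = build_random_row_query('A', 'C');
print "$sql\n";
print "binds: @binds\n";
```

The returned values would then be used as $dbh->prepare($sql) followed by $sth->execute(@binds), and a single fetchrow_array call would give the one random row.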

Perhaps order by random():
SELECT nazov_receptu FROM recepty ORDER BY RANDOM() LIMIT 1;
If you want to fetch only one random row, make sure to put this code outside the loop:
my $sth = $dbh->prepare("SELECT nazov_receptu FROM recepty ORDER BY RANDOM() LIMIT 1");
$sth->execute;
my ($nazov_receptu) = $sth->fetchrow_array;

Because your query is inside the foreach my $line (@data) loop, it will run once for each item in @data, getting a different random row each time. If you want it to only run once total, you need to move it outside of that loop (in addition to using "order by random() limit 1").

Related

Iterate through a file multiple times, each time finding a regex and returning one line (perl)

I have one file with ~90k lines of text in 4 columns.
col1 col2 col3 value1
...
col1 col2 col3 value90000
A second file contains ~200 lines, each one corresponding to a value from column 4 of the larger file.
value1
value2
...
value200
I want to read in each value from the smaller file, find the corresponding line in the larger file, and return that line. I have written a perl script that places all the values from the small file into an array, then iterates through that array using each value as a regex to search through the larger file. After some debugging, I feel like I have it almost working, but my script only returns the line corresponding to the LAST element of the array.
Here is the code I have:
open my $fh1, '<', $file1 or die "Could not open $file1: $!";
my @array = <$fh1>;
close $fh1;
my $count = 0;
while ($count < scalar @array) {
    my $value = $array[$count];
    open my $fh2, '<', $file2 or die "Could not open $file2: $!";
    while (<$fh2>) {
        if ($_ =~ /$value/) {
            my $line = $_;
            print $line;
        }
    }
    close $fh2;
    $count++;
}
This returns only:
col1 col2 col3 value200
I can get it to print each value of the array, so I know it's iterating through properly, but it's not using each value to search the larger file as I intended. I can also plug any of the values from the array into the $value variable and return the appropriate line, so I know the lines are there. I suspect my bug may have to do with either:
newlines in the array elements, since all the elements have a newline except the last one. I've tried chomp but get the same result.
or
something to do with the way I'm handling the second file with opening/closing. I've tried moving or removing the close command and that either breaks the code or doesn't help.
You should only be reading the 90k line file once, and checking each value from the other file against the fourth column of each line as you do, instead of reading the whole large file once per line of the smaller one:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
my ($file1, $file2) = @ARGV;
# Read the file of strings to match against
open my $fh1, '<', $file1 or die "Could not open $file1: $!";
my %words = map { chomp; $_ => 1 } <$fh1>;
close $fh1;
# Process the data file in one pass
open my $fh2, '<', $file2 or die "Could not open $file2: $!";
while (my $line = <$fh2>) {
    chomp $line;
    # Only look at the fourth column
    my @fields = split /\s+/, $line, 4;
    say $line if exists $words{$fields[3]};
}
close $fh2;
Note this uses a straight-up string comparison (via hash key lookup) against the last column instead of regular expression matching - your sample data looks like that's all that's needed. If you're using actual regular expressions, let me know and I'll update the answer.
Your code does look like it should work, just horribly inefficiently. In fact, after adjusting your sample data so that more than one line matches, it does print out multiple lines for me.
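To illustrate the difference between the hash lookup and regular expression matching mentioned above, here is a small sketch on made-up in-memory data (the values are invented for the example, not taken from the real files). An unanchored regex built from value1 also matches value10, while the exact lookup does not.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the two files.
my @wanted = ('value1', 'value20');
my @lines  = (
    'col1 col2 col3 value1',
    'col1 col2 col3 value10',
    'col1 col2 col3 value20',
    'col1 col2 col3 value200',
);

# Exact comparison of the fourth column via hash key lookup.
my %words = map { $_ => 1 } @wanted;
my @exact = grep { exists $words{ (split ' ', $_)[3] } } @lines;

# Unanchored regex match of each wanted value against the whole line.
my @loose = grep {
    my $line = $_;
    scalar grep { $line =~ /\Q$_\E/ } @wanted;
} @lines;

print "exact: ", scalar(@exact), " line(s)\n";   # exact: 2 line(s)
print "regex: ", scalar(@loose), " line(s)\n";   # regex: 4 line(s)
```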
Slightly different approach to the problem
use warnings;
use strict;
use feature 'say';
my $values = shift;
open my $fh1, '<', $values or die "Could not open $values";
my @lookup = <$fh1>;
close $fh1;
chomp @lookup;
my $re = join '|', map { '\b'.$_.'\b' } @lookup;
((split)[3]) =~ /$re/ && print while <>;
Run as script.pl value_file data_file

Assigning range to an array in Perl

I have a small problem. How can I assign a range into an array, like this one:
input file: clktest.spf
*
.GLOBAL vcc! vss!
*
.SUBCKT eclk_l_25h brg2eclk<1> brg2eclk<0> brg_cs_sel brg_out brg_stop cdivx<1>
+ eclkout1<24> eclkout1<23> eclkout1<22> eclkout1<21> eclkout1<20> eclkout1<19>
+ mc1_brg_dyn mc1_brg_outen mc1_brg_stop mc1_div2<1> mc1_div2<0> mc1_div3p5<1>
+ mc1_div3p5<0> mc1_div_mux<3> mc1_div_mux<2> mc1_div_mux<1> mc1_div_mux<0>
+ mc1_gsrn_dis<0> pclkt6_0 pclkt6_1 pclkt7_0 pclkt7_1 slip<1> slip<0>
+ ulc_pclkgpll0<1> ulc_pclkgpll0<0> ulq_eclkcib<1> ulq_eclkcib<0>
*
*Net Section
*
*|GROUND_NET 0
*
*|NET eclkout3<48> 2.79056e-16
*|P (eclkout3<48> X 0 54.8100 -985.6950)
*|I (RXR0<16>#NEG RXR0<16> NEG X 0 54.2255 -985.6950)
C1 RXR0<16>#NEG 0 5.03477e-17
C2 eclkout3<48> 0 2.28708e-16
Rk_6_1 eclkout3<48> RXR0<16>#NEG 0.110947
output (this should be the saved value in the array)
.SUBCKT eclk_l_25h brg2eclk<1> brg2eclk<0> brg_cs_sel brg_out brg_stop cdivx<1>
+ eclkout1<24> eclkout1<23> eclkout1<22> eclkout1<21> eclkout1<20> eclkout1<19>
+ mc1_brg_dyn mc1_brg_outen mc1_brg_stop mc1_div2<1> mc1_div2<0> mc1_div3p5<1>
+ mc1_div3p5<0> mc1_div_mux<3> mc1_div_mux<2> mc1_div_mux<1> mc1_div_mux<0>
+ mc1_gsrn_dis<0> pclkt6_0 pclkt6_1 pclkt7_0 pclkt7_1 slip<1> slip<0>
+ ulc_pclkgpll0<1> ulc_pclkgpll0<0> ulq_eclkcib<1> ulq_eclkcib<0>
*
*Net Section
my simple code:
#!/usr/bin/perl
use strict;
use warnings;
my $input = "clktest.spf";
open INFILE, $input or die "Can't open $input" ;
my @allports;
while (<INFILE>){
    @allports = /\.SUBCKT/ ... /\*Net Section/ ;
    print @allports;
}
Am I assigning the selected range into the array correctly? If not, how can I modify this code?
Thanks in advance.
The while loop only gives you one line at a time so you can't assign all of the lines you want at once. Use push instead to grow the array line by line.
Also, you should be using lexical file handles like $in_fh (rather than global ones like INFILE) with the three-parameter form of open, and you should include the $! variable in the die string so that you know why the open failed.
This is how your program should look
#!/usr/bin/perl
use strict;
use warnings;
my $input = 'clktest.spf';
open my $in_fh, '<', $input or die "Can't open $input: $!" ;
my @allports;
while ( <$in_fh> ) {
    push @allports, $_ if /\.SUBCKT/ ... /\*Net Section/;
}
print @allports;
Note that, if all you want to do is to print the selected lines from the file, you can forget about the array and replace push @allports, $_ with print.
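The range (flip-flop) operator used above can be tried without the input file. This is a minimal sketch on a made-up list of lines that only imitates the shape of clktest.spf; the operator turns on at the line matching the first pattern and stays on through the line matching the second.

```perl
use strict;
use warnings;

# Made-up lines imitating the layout of the .spf file.
my @lines = map { "$_\n" } (
    '*',
    '.SUBCKT demo a b',
    '+ c d',
    '*',
    '*Net Section',
    '*',
    'C1 a 0 1e-15',
);

my @allports;
for (@lines) {
    # True from the .SUBCKT line up to and including the *Net Section line.
    push @allports, $_ if /\.SUBCKT/ ... /\*Net Section/;
}

print @allports;
print "kept ", scalar(@allports), " of ", scalar(@lines), " lines\n";   # kept 4 of 7 lines
```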
The <INFILE> inside a while loop reads the file line by line, so it is not the right place to apply a regex that needs to cover more than one line. To get a substring, the simplest way is to join all the lines first, and only then apply your regex.
my $contents = "";
while ( <INFILE> ) {
$contents = $contents . $_;
}
$contents =~ s/.*(\.SUBCKT.*\*Net Section).*/$1/s; # remove unneeded part
Please note the /s modifier at the end of the substitution. It is required because $contents contains newlines.
To get the substring into an array, just use split:
my @allports = split("\n", $contents);

More clarification about the usage of split in Perl

I have this following input file:
test.csv
done_cfg,,,,
port<0>,clk_in,subcktA,instA,
port<1>,,,,
I want to store the elements of each CSV column into an array, but I always get an error when I try to fetch the "null" elements in the CSV when I run the script. Here's my code:
# ... assuming file was correctly opened and stored into
# ... a variable named $map_in
my $counter = 0;
while($map_in){
    chomp;
    @hold_csv = split(',',$_);
    $entry1[$counter] = $hold_csv[0];
    $entry2[$counter] = $hold_csv[1];
    $entry3[$counter] = $hold_csv[2];
    $entry4[$counter] = $hold_csv[3];
    $counter++;
}
print "$entry1[0]\n$entry2[0]\n$entry3[0]\n$entry3[0]"; #test printing
I always get a "use of uninitialized value" warning whenever I fetch empty CSV cells.
Can you help me locate the error in my code? (I know I have missed something.)
Thanks.
This looks like CSV. So the tool for the job is really Text::CSV.
I will also suggest - having 4 different arrays with numbered names says to me that you're probably wanting a multi-dimensional data structure in the first place.
So I'd be doing something like:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Text::CSV;
my $csv = Text::CSV->new( { binary => 1 } );
open( my $input, "<", "input.csv" ) or die $!;
my @results;
while ( my $row = $csv->getline($input) ) {
    push ( @results, \@$row );
}
print join ( ",", @{$results[0]} ),"\n";
print Dumper \@results;
close($input);
If you really want separate arrays, I'd suggest naming them something different, but you could do it like this:
push ( @array1, $$row[0] ); #note - double $, because we dereference
I will note - there's an error in your code - I doubt:
while($map_in){
is doing what you think it is.
When you're assigning $entryN, define a default value:
$entry1[$counter] = $hold_csv[0] // ''; # // needs Perl 5.10+; unlike ||, it keeps a cell containing 0
Do the same for the other @entry arrays.
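One likely source of the undefined values is that split drops trailing empty fields unless it is given a negative limit. A minimal sketch, using the first line of test.csv from the question:

```perl
use strict;
use warnings;

my $line = 'done_cfg,,,,';

# Default behaviour: trailing empty fields are removed.
my @default = split /,/, $line;

# A limit of -1 keeps every field, including trailing empty ones.
my @all = split /,/, $line, -1;

printf "default:  %d field(s)\n", scalar @default;   # default:  1 field(s)
printf "limit -1: %d field(s)\n", scalar @all;       # limit -1: 5 field(s)
```

With the negative limit, $hold_csv[1] to $hold_csv[3] are empty strings rather than undef, so printing them no longer triggers the warning.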
I think there is a typo in while($map_in) {: it tests the file handle itself instead of reading from it, so it should be while (<$map_in>) {.

Empty array in a perl while loop, should have input

Was working on this script when I came across a weird anomaly. When I go to print @extract after declaring it, it prints correctly the following:
------MMMMMMMMMMMMMMMMMMMMMMMMMM-M-MMMMMMMM
------SSSSSSSSSSSSSSSSSSSSSSSSSS-S-SSSSSDTA
------TIIIIIIIIIIIIITIIIVVIIIIII-I-IIIIITTT
Now the weird part: when I then try to print or return @extract (or $column) inside of the while loop, it comes up empty, thus rendering the rest of the script useless. I've never come across this before, and I haven't been able to find any documentation or people with similar problems. Below is the code; I marked with #<------ where the problems are and are not. Does anyone have any idea what is going on? Thank you kindly.
P.S. I am utilizing perl version 5.12.2
use strict;
use warnings;
#use diagnostics;
#use feature qw(say);
open (S, "Val nuc align.txt") || die "cannot open FASTA file to read: $!";
open (OUTPUT, ">output.txt");
my @extract;
my $sum = 0;
my @lines = <S>;
my @seq = ();
my $start = 0; #amino acid column start
my $end = 10; #amino acid column end
#Removing of the sequence tag until amino acid sequence composition (from >gi to )).
foreach my $line (@lines) {
    $line =~ s/\n//g;
    if ($line =~ />/g) {
        $line =~ s/>.*\]/>/g;
        push @seq, $line;
    }
    else {
        push @seq, $line;
    }
}
my $seq = join ('', @seq);
my @seq_prot = join "\n", split '>', $seq;
@seq_prot = grep {/[A-Z]/} @seq_prot;
#number of sequences
print OUTPUT "Number of sequences:", scalar (grep {defined} @seq_prot), "\n";
#selection of amino acid sequence. From $start to $end.
my @vertical_array;
while ( my $line = <@seq_prot> ) {
    chomp $line;
    my @split_line = split //, $line;
    for my $index ( $start..$end ) { #AA position, extracts whole columns
        $vertical_array[$index] .= $split_line[$index];
    }
}
# Print out your vertical lines
for my $line ( @vertical_array ) {
    my $extract = say OUTPUT for unpack "(a200)*", $line; #split at end of each column
    @extract = grep {defined} $extract;
}
print OUTPUT @extract; #<--------------- This prints correctly the input
#Count selected amino acids excluding '-'.
my %counter;
while (my $column = @extract) {
    print @extract; #<------------------------ Empty print, no input found
}
Update: Found the main problem to be with the unpack command. I thought I could use it to split the columns of my input at X elements (43 in this case). While this works, the minute I change $start to another number that is not 0 (say 200), the code brings up errors. It probably has something to do with the number of column elements not matching the lines. Will keep updated.
Write your last while loop the same way as your previous for loop. The assignment
my $column = @extract
is in scalar context, which does not give you the same result as:
for my $column (@extract)
Instead, it gives you the number of elements in the array. Try the second form and it should work.
However, I still have a concern: if @extract had anything in it, you would get an infinite loop. Is there any code between your two commented lines that you did not include?
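A minimal sketch of the context difference, using made-up values in place of the real @extract contents. It also shows one way a while loop can terminate, by consuming a copy of the array with shift; that variant is an illustration, not something taken from the question.

```perl
use strict;
use warnings;

my @extract = ('MMM', 'SSS', 'TII');   # made-up sample values

# Scalar assignment: $count receives the number of elements, not an element.
my $count = @extract;
print "count: $count\n";               # count: 3

# A for loop visits each element exactly once.
my @seen;
for my $column (@extract) {
    push @seen, $column;
}

# A while loop ends only if something shrinks the list, e.g. shift on a copy.
my @queue = @extract;
my @shifted;
while (defined(my $column = shift @queue)) {
    push @shifted, $column;
}

print "for:   @seen\n";                # for:   MMM SSS TII
print "while: @shifted\n";             # while: MMM SSS TII
```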

Search for, and remove column from CSV file

I'm trying to write a subroutine that will take two arguments, a filename and the column name inside a CSV file. The subroutine will search for the second argument (column name) and remove that column (or columns) from the CSV file and then return the CSV file with the arguments removed.
I feel like I've gotten through the first half of this sub (opening the file, retrieve the headers and values) but I can't seem to find a way to search the CSV file for the string that the user inputs and delete that whole column. Any ideas? Here's what I have so far.
sub remove_columns {
    my @Para = @_;
    my $nargs = @Para;
    die "Insufficient arguments\n" if ($nargs < 2);
    open file, $file;
    $header = <file>;
    chomp $header;
    my @hdr = split ',',$header;
    while (my $line = <file>){
        chomp $line;
        my @vals = split ',',$line;
        #hash that will allow me to access column name and values quickly
        my %h;
        for (my $i=0; $i<=$#hdr;$i++){
            $h{$hdr[$i]}=$i;
        }
        ....
}
Here's where the search and removal will be done. I've been thinking about how to go about this; the CSV files that I'll be modifying will be huge, so speed is a factor, but I can't seem to think of a good way to go about this. I'm new to Perl, so I'm struggling a bit.
Here are a few hints that will hopefully get you going.
To remove the element at position $index of an array, use:
splice @array,$index,1 ;
As speed is an issue, you probably want to construct an array of column numbers at the start and then loop over its elements, highest index first so that an earlier removal does not shift the positions still to be removed:
for my $index (sort { $b <=> $a } @indices) {
    splice @array,$index,1 ;
}
(this is more idiomatic Perl than a for (my $i=0; $i<=$#hdr;$i++) type of loop)
Another thing to consider - the CSV format is surprisingly complicated. Might your data contain fields with a , inside " ", such as
1,"column with a , in it"
I would consider using something like Text::CSV
You should probably look in the direction of Text::CSV
Or you can do something like this:
my $colnum;
my @columns = split(/,/, <$file>);
for(my $i = 0; $i < scalar(@columns); $i++) {
    if($columns[$i] =~ /^$unwanted_column_name$/) {
        $colnum = $i;
        last;
    };
};
while(<$file>) {
    my @row = split(/,/, $_);
    splice(@row, $colnum, 1);
    #do something with resulting array @row
};
Side note:
you really should use strict and warnings;
split(/,/, <$file>);
won't work with all CSV files
There is an elegant way to remove some columns from an array. If the columns to remove are in the array @cols and the headers are in @headers, I can build an array of the indexes to keep:
my %to_delete;
@to_delete{@cols} = ();
my @idxs = grep !exists $to_delete{$headers[$_]}, 0 .. $#headers;
Then it's easy to make the new headers
@headers[@idxs]
and also a new row from the columns read
@columns[@idxs]
The same approach can be used, for example, for rearranging arrays. It is a very fast and pretty idiomatic Perl way to do this sort of task.
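Put together, the slice approach might look like the sketch below. It works on a few made-up in-memory lines and uses a plain split on commas, so it assumes no quoted fields; real CSV data would be better read with Text::CSV, as noted in the other answers.

```perl
use strict;
use warnings;

# Made-up sample data: a header line followed by two rows.
my @lines = ('id,name,price', '1,apple,3', '2,pear,4');
my @cols  = ('name');                       # columns to remove

my @headers = split /,/, shift @lines;

# Indexes of the columns to keep.
my %to_delete;
@to_delete{@cols} = ();
my @idxs = grep { !exists $to_delete{ $headers[$_] } } 0 .. $#headers;

# Rebuild the header and every row from the kept indexes only.
my @out = (join ',', @headers[@idxs]);
for my $line (@lines) {
    my @columns = split /,/, $line, -1;     # -1 keeps trailing empty fields
    push @out, join ',', @columns[@idxs];
}

print "$_\n" for @out;
# id,price
# 1,3
# 2,4
```

Because @idxs is computed once from the header, each row costs only one split and one slice, which matters for the huge files mentioned in the question.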
