New here, and to perl as well.
I'm trying to write a perl script to do some data archiving at work. Our directory structure is separated at one level by year, then has every day/month inside that.
So, for example:
\2016\1-1
\2016\1-2
....
\2016\12-20
etc...
I'm trying to make this script usable for every test (some of which are run in the future), so I have a prompt for the user to enter the year, which I use to populate the directory structure in the code. I'm having trouble with the month-date portion.
Usually tests are 3-4 days long. Right now I'm asking the user for the test length, then subtracting one. I do that because a 4 day test starting on the 8th, would be the 8 ,9 ,10, and 11th. If I don't subtract one, and just add the duration to the start date, it would go to the 12th.
I can't figure out how to take a start date and duration, and get that range of dates into an array, that I can then call when I need to find/create directories for the archiving.
Ideally, I'd like to have the user enter the year, start date, and length of the test, then create an array that will hold the month-day for all dates of the test. Then I could call those array dates when I create/find the needed directories.
Right now I'm still working on getting the inputs from the user, so that is all that I have.
Couple notes: I don't have every Perl module available to me. I have been using DateTime, and I'm not 100% sure what other modules I could use. Also, the directory structure does not have leading zeros on the month/date (i.e., 2-5, NOT 02-05).
Here is the input code that I have so far. It gives me a start date of today (which I need to change), and gives me the correct end date based on the test length. As you can see, I have not been able to get very far in this:
use warnings;
use DateTime;
my #test_dates;
print "What NAS?";
my $NAS = <STDIN>;
chomp($NAS);
print "What is the Multi-Mission mode (ops/nonops)?";
my $MM_Mode = <STDIN>;
chomp($MM_Mode);
print "What is the data mode (ops/sim/tst)?";
my $Data_Mode = <STDIN>;
chomp($Data_Mode);
print "Which spacecraft (NPP/J01)?";
my $sc = <STDIN>;
chomp($sc);
print "What LOM is being cleaned?";
my $LOM = <STDIN>;
chomp($LOM);
print "What MDMZ is being cleaned?";
my $MDMZ = <STDIN>;
chomp($MDMZ);
print "How many days is the test?";
my $Length = ( <STDIN> - 1 );
my $date = DateTime->from_epoch( epoch => time );
my $Duration = DateTime::Duration->new( days => $Length );
print "NAS is $NAS\n";
print "Multi-Mission mode is $MM_Mode\n";
print "Data Mode is $Data_Mode\n";
print "LOM is $LOM\n";
print "MDMZ is $MDMZ\n";
printf $date->ymd('/');
print #test_dates;
I don't think you have it clear in your mind exactly what you need, but this should help
use strict;
use warnings 'all';
use Time::Piece;
use Time::Seconds 'ONE_DAY';
print 'Enter start date (YYYY-MM-DD): ';
chomp(my $start = <>);
$start = Time::Piece->strptime($start, '%Y-%m-%d');
print 'Enter number of days: ';
chomp(my $duration = <>);
{
my $end = $start + ONE_DAY * ($duration-1);
die "Period crosses end of year" if $start->year != $end->year;
}
my $date = $start;
while ( --$duration ) {
my $dir = $date->strftime('\%Y\%m-%d');
$dir =~ s/\D\K0+//g;
print $dir, "\n";
$date += ONE_DAY;
}
output
E:\Perl\source>perl date_duration.pl
Enter start date (YYYY-MM-DD): 2016-7-18
Enter number of days: 22
\2016\7-18
\2016\7-19
\2016\7-20
\2016\7-21
\2016\7-22
\2016\7-23
\2016\7-24
\2016\7-25
\2016\7-26
\2016\7-27
\2016\7-28
\2016\7-29
\2016\7-30
\2016\7-31
\2016\8-1
\2016\8-2
\2016\8-3
\2016\8-4
\2016\8-5
\2016\8-6
\2016\8-7
Here is an approach that uses DateTime. You don't need to create a DateTime::Duration object, DateTime does that implicitly for you whenever you use date math.
use strict;
use warnings 'all';
use DateTime;
use feature 'say';
my $duration = 4;
my ($year, $month, $day) = (2016, 4, 9);
my $start = DateTime->new( year => $year, month => $month, day => $day);
my #dates;
foreach my $i ( 0 .. ( $duration - 1 )) {
my $date = $start->clone->add( days => $i )->strftime('\%Y\%m-%d');
$date =~ s/(?<=\D)0+//g;
push #dates, $date;
}
say for #dates;
I left out the user input stuff and just set some values. Basically it builds each day, creates the folder names and puts them into an array.
Output
\2016\4-9
\2016\4-10
\2016\4-11
\2016\4-12
Related
Please, have a look!
I have number of days from 1 to 30 days, so I need to loop through number of days and identify a date for each correspondings days with one "for" loop, The started date is: $epoc = 2020-05-11; So, iam converting it into epoc seconds, and I found the last date in sec (30th day).I was able to solve it like this, but I need with one "for" loop. Is it possible?
#!/usr/bin/perl
use Time::Local;
use Time::Localtime;
$day_a1 = 1;
$day_a2 = 30;
my ($year, $month, $day) = split('-', $epoc);
$epoc = timelocal($s, $m, $h, $day, $month-1, $year-1900);
$interval=$day_a2*60*60*24;
$epoc1=$epoc+$interval;
print scalar(localtime($epoc1)), "\n";
#x1=();
#date1=();
for ($d = $day_a1; $d <= $day_a2; $d++){
push (#x1, $d);
}
for ($d = $epoc; $d <= $epoc1; $d+=86400){
print scalar(localtime($d));
push (#date1, $d);
}
print #x1;
print #date1;
You would make your life a lot easier if you used the tools that are available for you. In particular, Time::Piece and Time::Seconds have been part of the standard Perl distribution since 2007.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Time::Piece;
use Time::Seconds;
my $start = '2020-05-11';
# Add time of midday to avoid DST problems.
my $curr = Time::Piece->strptime("$start 12:00:00", '%Y-%m-%d %H:%M:%S');
for (1 .. 30) {
say $curr->strftime('%Y-%m-%d');
$curr += ONE_DAY;
}
Perhaps you intended to write code in following form
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Time::Local;
use Time::Localtime;
my $days = 30;
my $date = '2020-05-11';
my $day_sec = 60*60*24;
my ($year, $month, $day) = split('-', $date);
my $epoch = timelocal(0, 0, 0, $day, $month-1, $year-1900);
for(my $day=0; $day<$days; $day++) {
say scalar localtime($epoch);
$epoch += $day_sec;
}
I successfully create a hash of arrays, and I am using it to calculate log-odds scores for each DNA sequence from a file (Creating a hash of arrays for DNA sequences, Perl has input file format). I get a score for each sequence, but I get a warning for each calculation. Naturally, I want to clear up the warning. The warning is: Use of uninitialized value in string eq at line 148.
Here is a summarized version of the code (I can post the full code if necessary):
use strict;
use warnings;
use Data::Dumper;
#USER SPECIFICATIONS
print "Please enter the filename of the fasta sequence data: ";
my $filename1 = <STDIN>;
#Remove newline from file
chomp $filename1;
#Open the file and store each dna seq in hash
my %id2seq = ();
my %HoA = ();
my %loscore = ();
my $id = '';
open (FILE, '<', $filename1) or die "Cannot open $filename1.",$!;
my $dna;
while (<FILE>)
{
if($_ =~ /^>(.+)/)
{
$id = $1; #Stores 'Sequence 1' as the first $id, for example
}
else
{
$HoA{$id} = [ split(//) ]; #Splits the contents to allow for position reference later
$id2seq{$id} .= $_; #Creates a hash with each seq associated to an id number, used for calculating tables that have been omitted for space
$loscore{$id} .= 0; #Creates a hash with each id number to have a log-odds score
}
}
close FILE;
#User specifies motif width
print "Please enter the motif width:\n";
my $width = <STDIN>;
#Remove newline from file
chomp $width;
#Default width is 3 (arbitrary number chosen)
if ($width eq '')
{
$width = 3;
}
#Omitting code about $width<=0, creation of log-odds score hash to save space
foreach $id (keys %HoA, %loscore)
{
for my $pos (0..($width-1))
{
for my $base (qw( A C G T))
{
if ($HoA{$id}[$pos] eq $base) #ERROR OCCURS HERE
{
$loscore{$id} += $logodds{$base}[$pos];
}
elsif ( ! defined $HoA{$id}[$pos])
{
print "$pos\n";
}
}
}
}
print Dumper(\%loscore);
The output I get is:
Use of uninitialized value in string eq at line 148, <STDIN> line 2.
2
(This error repeats 4 times for each position - most likely due to matching to each $base?)
$VAR1 = {
'Sequence 15' => '-1.27764697876093',
'Sequence 4' => '0.437512962981119',
(continues for 29 sequences)
}
To summarize, I want to calculate the log-odds score of each sequence. I have a log-odds score hash %loscore that contains the score of a base at each location within a motif. The log-odds score is calculated by summing the referenced values. For example, if the log-odds table was
A 4 3 2
C 7 2 1
G 6 9 2
T 1 0 3
The log-odds score of the sequence CAG would be 7+3+2=12.
At the moment, I believe that the error occurs because of the way I split the strings of DNA to be put into the hash of arrays. As I previously stated, if you want all the code so you can copy-paste, I can provide it. I think the solution is pretty simple, and I just need someone to point me in the right direction. Any and all help is appreciated, and I can clarify as questions arise. Also, any tips that could help me to post more concise questions are appreciated (I know this one is lengthy, I just want to provide enough background information).
Here is the code that I am using to iterate through the `%HoA. It calculates a log-odds score for each sequence, then works through each sequence to find a maximum score for each sequence. Big thanks to everyone for helping out!
foreach $id (keys %HoA)
{
for my $pos1 (0..length($HoA{$id})-1)
{
for my $pos2 ($pos1..$pos1+($width-1))
{
for my $base (qw( A C G T))
{
if ($HoA{$id}[$pos2] eq $base)
{
for my $pos3 (0..$width-1)
{
$loscore{$id} += $logodds{$base}[$pos3];
if ($loscore{$id} > $maxscore{$id})
{
$maxscore{$id} = $loscore{$id};
}
}
}
elsif ( ! defined $HoA{$id}[$pos2])
{
print "$pos2\n";
}
}
}
}
}
I've conceptualized a function in my head,and I imagine it exists, but if it does, I don't know the name of it, and for that reason it's difficult to google.
Lets say I want to define two outcomes- I'll call them Pass and Fail
I want several different rules to apply to each of the two outcomes.
So, for clarity (hopefully), let's say imagine I have a file with different years stored on each line, or blank spaces.
File
line1 1978
line2 1989
line3
line4 1978
line5 1999
line6
line7 1768
Now lets say there is a user input of 1978 to compare to the file.
This date will be compared to each line in the file.
There can only be two results base on the comparison- they are Pass and Fail.
For a comparison to be in the Pass group it would need to meet the following criteria:
1) It matches the year in the file
2) Neither the input, or the line in the file contains a year
3) Only one line contains a year, and the other is blank
For a line to meet the Fail condition:
1) Both the input and the file record contain a year, but they are different.
So for my example input of 1978 the results would look like this
line1 1978 Pass
line2 1989 Fail
line3 Pass
line4 1978 Pass
line5 1999 Fail
line6 Pass
line7 1768 Fail
So basically I want to assign different (and multiple) condition to Pass and Fail as outlined above.
What I want to know is, what (if anything) is the name of this type of function.
What is the name of some sort of function where I can apply these rules to Pass and Fail. As far as I'm aware, I can't create Pass and Fail variables because there is multiple rules for each condition. Is there some other built in function I can use?
This should help you. It simply tests each line for the failure condition. Everything else is a pass.
use strict;
use warnings 'all';
my $input = 1978;
while ( my $line = <DATA> ) {
$line =~ s/\s*\z//;
my $result = $line && $input && $line ne $input ? 'Fail' : 'Pass';
printf "%4s %s\n", $line, $result;
}
__DATA__
1978
1989
1978
1999
1768
output
1978 Pass
1989 Fail
Pass
1978 Pass
1999 Fail
Pass
1768 Fail
Why does line 5 match? The input is 1978 and the file contains 1999? [Ah, you've just edited your question to fix that]
With the proviso that my code returns false for that match (which seems correct to me). Here's my version:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my #years = (1978, 1989, undef, 1978, 1999, undef, 1768);
my $year = 1978;
say compare($year, \#years);
sub compare {
my ($year, $years) = #_;
my #return;
foreach (#$years) {
push #return, 1 and next if ! defined $_;
push #return, 1 and next if $_ == $year;
push #return, 0;
}
return #return;
}
There's no standard library function or name for exactly what you've described. You have two arrays and you want to compare their elements using complex criteria. There are an infinite number of ways two arrays could be compared using complex criteria; we can't create a snappy name and write a standard library function for all of them!
Using the minimal assumptions, we can write this function as follows:
sub compareArrayElementsForEitherBlankSpaceOrMatch {
## takes two array references to compare; assumes they're the same length
my ($A,$B) = #_;
my #res;
if (#$A > 0) {
$#res = #$A-1; ## preallocate array
for (my $i = 0; $i < #$A; ++$i) {
$res[$i] = $A->[$i] eq ' ' || $B->[$i] eq ' ' || $A->[$i] eq $B->[$i] ? 1 : 0;
}
}
return #res;
}
Demo:
my #res = compareArrayElementsForEitherBlankSpaceOrMatch(
[' ', ' ' , 2000, 2000, 2000 ],
[' ', 2000, ' ' , 2000, 1900 ]
);
print("#res\n");'
## 1 1 1 1 0
Update: Based on your edit, you're not comparing two arrays against each other, but rather you are comparing one array against a single value. This can be accomplished as follows:
sub compareArrayElementsForEitherBlankSpaceOrMatch {
## takes an array reference and compares its elements with a given value
## an element is considered to match if it is blank or equal to the value
my ($A,$x) = #_;
my #res;
if (#$A > 0) {
$#res = #$A-1; ## preallocate array
for (my $i = 0; $i < #$A; ++$i) {
$res[$i] = $A->[$i] =~ m(^\s*$) || $A->[$i] eq $x ? 1 : 0;
}
}
return #res;
}
Demo:
my #res = compareArrayElementsForEitherBlankSpaceOrMatch(
[1978,1989,"",1978,1999,"",1768],
1978
);
print("#res\n");
## 1 0 1 1 0 1 0
Note that the line 5 result should be a fail, since 1999 does not equal 1978.
I enhanced the blank check to use a regex test against m(^\s*$) rather than a fixed equality check against a single space.
My original comment is still correct regarding the non-existence of a standard function or name for this exact operation.
I have few lines in my array #lines in which * shows me the start time of a command (like sync/fetch) and the line with same processID pid and the command without * shows me the end time. They may not be continuous always. I would like to get the startdate and enddate of a particular processID and cmd. Like for usera the cmd sync with processID 11859 started at 2015/01/13 13:53:01.491-05:00 and ended at 2015/01/13 13:55:01.492-05:00
Below is my approach in which I took a hash of array and used processID as key and did split the lines. This works fine only when the start and end lines of a command are continuous , but how can I make it work even when they are not continuous.
my %users;
foreach my $line (#lines) {
if ($line =~ m{(\*)+}) {
($stdate, $sttime, $pid, $user, $cmd) = split ' ', $line;
$startdate ="$stdate $sttime";
}
else {
($eddate, $edtime, $pid, $user, $cmd) = split ' ', $line;
$enddate = "$eddate $edtime";
}
$users{$pid} = [ $startdate, $enddate, $user, $cmd ];
}
Content in #lines:
2015/01/13 13:53:01.491-05:00 11859 usera *sync_cmd 7f1f9bfff700 10.101.17.111
2015/01/13 13:57:02.079-05:00 11863 userb *fetch_cmd 7f1f9bfff700 10.101.17.111
2015/01/13 13:59:02.079-05:00 11863 userb fetch_cmd 7f1f9bfff700 10.101.17.111
2015/01/13 13:55:01.492-05:00 11859 usera sync_cmd 7f1f9bfff700 10.101.17.111
I'm looking at your code and wondering why you're using a hash of arrays.
As far as I'm concerned, the purpose of array is a set of similar but ordered values.
Could you not instead do:
my %processes;
foreach (#lines) {
my ( $date, $time, $pid, $user, $cmd, #everything_else ) = split;
if ( $cmd =~ m/^\*/ ) {
#if command starts with a * - it started.
if ( defined $processes{$pid} ) {
print "WARNING: $pid reused\n";
}
$processes{$pid}{'start_date'} = $date;
$processes{$pid}{'time'} = $time;
$processes{$pid}{'user'} = $user;
$processes{$pid}{'cmd'} = $cmd;
}
else {
#cmd does not start with '*'.
if ( $processes{$pid}{'cmd'} =~ m/$cmd/ ) {
#this works, because 'some_command' is a substring of '*some_command'.
$processes{$pid}{'end_date'} = $date;
$processes{$pid}{'end_time'} = $time;
}
else {
print
"WARNING: $pid has a command of $cmd, where it started with $processes{$pid}{'cmd'}\n";
}
}
}
You might want some additional validation tests in case you've got e.g. a long enough log that pids get reused, or e.g. you've got a log that doesn't include both start and finish of a particular process.
When you assign to %users{$pid} you are presuming that the most recent $startdate and $enddate are both relevant. This problem is exacerbated by the fact that your variables that hold your field values have a scope larger than the foreach loop, allowing these values to bleed between records.
In the if block, you should assign the values of $startdate, $user, $cmd to the array. Individually or as a slice if you like. In the else block you should assign $enddate to it's element in the array.
Regex extra credit: You don't seem to really care if there is more that one * in a record, making the + in the regex superfluous. As an added bonus, without it the capturing group is also of no value. m{\*} should do quite nicely.
Hi I tried to select one random number from this:
My source:
use DBI;
use CGI;
my $file = '.\input.txt'; # Name the file
open(FILE, $file) or die("Unable to open file");
my #data = <FILE>;
foreach my $line (#data)
{
chomp $line
my $sth = $dbh->prepare("SELECT columnA FROM table WHERE columnA LIKE '%$line%'");
$sth->execute;
my $result = $sth->fetchall_arrayref;
foreach my $row ( #$result ) {
print "- ";
print "#$row\n";
print "<BR />";
}
}
How can I print only ONE RANDOM row???
I tried something like that:
my $sth = $dbh->prepare("SELECT nazov_receptu FROM recepty WHERE nazov_receptu LIKE '%$line%' AND kategoria == 'p' AND (rowid = (abs(random()) % (select max(rowid)+1 from recepty)) or rowid = (select max(rowid) from recepty)) order by rowid limit 1;");
but its not clear... i dont know why...
I am using SQLite and printing it to web interface.
You can try it when you have
input.txt:
A
C
database:
id name
1 A
2 B
3 C
4 D
5 E
OUT:
A OR C (random)
Why not join the file arguments into the query right away instead of looping over them? Then it is a simple matter to extract a random index in perl:
use strict;
use warnings; # Always use these two pragmas
my $file = '.\input.txt';
open my $fh, "<", $file or die "Unable to open file: $!";
chomp(my #data = <$fh>); # chomp all lines at once
my $query = "SELECT columnA FROM table WHERE ";
$query .= join " OR ", ( "columnA LIKE ?" ) x #data;
# add placeholder for each line
#data = map "%$_%", #data; # add wildcards
my $sth = $dbh->prepare($query);
$sth->execute(#data); # execute query with lines as argument
my $result = $sth->fetchall_arrayref;
my $randid = rand #$result; # find random index
my $row = $result->[ $randid ];
print "- #$row\n";
print "<BR />";
As you see, I've used placeholders, which is the proper way to use variables with queries. It also happens to be a simple way to handle an arbitrary amount of arguments. Because we include all lines in the query, we do not need a for loop.
As you see, I've also changed a few other small details, such as using three argument open with a lexical file handle, including the error variable $! in the die statement, using proper indentation, using strict and warnings (you should never code without them)
I've handled the randomization in perl because it is simplest for me. It may be as simple and more effective to handle in the SQL query. You may just tack on the ORDER BY random() LIMIT 1 to the end of it, and that might work just fine too.
Perhaps order by random(),
SELECT nazov_receptu FROM recepty ORDER BY RANDOM() LIMIT 1;
If you want to fetch only one random row, make sure to put this code out of the loop,
my $sth = $dbh->prepare("SELECT nazov_receptu FROM recepty ORDER BY RANDOM() LIMIT 1");
$sth->execute;
my ($nazov_receptu) = $sth->fetchrow_array;
Because your query is inside the foreach my $line (#data) loop, it will run once for each item in #data, getting a different random row each time. If you want it to only run once total, you need to move it outside of that loop (in addition to using "order by random() limit 1").