Contrived example:
use strict;
use warnings;
my $myval = 'a';
my #result = my_sub($myval);
if (#result) {
print "DEFINED\n";
}
my ($res1, $res2, $res3) = #result;
print "res1=$res1, res2=$res2, res3=$res3\n";
sub my_sub {
my $myval = shift;
if ($myval eq 'a') {
return undef;
}
return ("a","b","c");
}
How do I check if sub returned undef?
or
How do I check if sub did not return undef?
return undef in list context returns list of one element, which is undef.
#result = my_sub($myval);
if (#result == 1 && !defined($result[0])) {
warn "my_sub() returned undef";
} else {
print "my_sub() returned data\n";
}
That said, a list with one undef element is almost never what you want. See How do I return nothing from a subroutine? You will generally just want to return with no arguments. In scalar context, that returns undef and in list context it returns an empty list.
sub my_other_sub {
my $myval = shift;
if ($myval eq 'a') {
return;
}
return ("a","b","c");
}
...
#result = my_other_sub($arg1);
$result = my_other_sub($arg2);
if (#result == 0) { # or: if (!#result) ... or: unless (#result) ...
warn "my_other_sub(arg1) did not return any data";
} else {
print "my_other_sub(arg1) returned data\n";
}
if (!defined($result)) {
warn "my_other_sub(arg2) did not return any data";
} else {
print "my_other_sub(arg2) returned data\n";
}
I would recommend, as others have, to use return, or the more explicit return (). This returns an empty list. However, since the example is contrived, we can't be sure that an empty list isn't an otherwise valid return. If it is a valid return, or it might reasonably be so one day, then you have other options which, IMO, are less ideal but can be more flexible.
An obvious one is to use die, as zdim suggested, but that can be relatively heavy-handed. It may actually be really what you want - if this situation really isn't supposed to happen, die is perfect as it might cause your program to abort if you don't wrap the failure in an eval.
Another alternative is to have your sub return an array ref instead of a list. And then you can return undef directly, your caller would be able to check that easily enough: my $result = my_sub(...);. Other uses of that array would just need to go through a dereference, e.g., my ($res1, $res2, $res3) = #$result;. This is probably my preference when a simple return () cannot suffice. Bonus points in that only the reference is passed back, not the whole list. Consider doing this even when an empty list isn't valid but the list can be very large.
Other options also abound, although those are probably the simplest. You could, for example, return a hash (or array) with one entry indicating success/failure and another with the array. You could return success/failure as the first element (your contrived example would return (1, "a", "b", "c") and you'd shift that first element off to see if it was successful or not). You could embed your return in an object that encapsulated all that. Of these, only the object one would I give serious consideration to, but it would greatly depend on the rest of the architecture, and would be very rare.
A subroutine returns a list of scalars in list context which you use, then assigned to variables. So if you return an undef from it the $res variables are going to be undef -- first assigned undef and others unassigned (while the array #result will have one element, an undef)
perl -Mstrict -wE'
my $val = shift;
sub t { return undef if shift eq "bad"; return qw(a b) };
my ($v1, $v2) = t($val);
if (not defined $v1 and not defined $v2) { say "undef" }
else { say "$v1, $v2" }
' bad
The first shift takes the value off of #ARGV, so vary input of "bad" to see other cases. This can be written in more compact and clearer ways if we knew how it's used.
I appreciate that this is a test example, but it is still too convoluted, allowing for tricky edge cases; for example, your first case won't work since #result is not "false" (empty list), what the code tests for, as it does have one element, which is undef.
For such "special" returns either use return with no arguments or throw a die, in situations you consider exceptional. For context-aware returns see wantarray.
Related
I am new to perl programming and I am trying to build a script using several subroutines on it. For a start I am trying to run a short mocke script to work out subroutines behaviour, but I don't get to understand the input.
Here is my code:
sub prueba{
my (#array1, #array2)=#_;
if (scalar(#array1)<scalar(#array2)) {
print #array1,"\n";
} elsif (scalar(#array1)>scalar(#array2)){
print #array2,"\n";
}
};
my #primero=(1,5,9);
my #segundo=(1,7,8,9,6,5,6,9);
prueba(#primero,#segundo);
I am passing two arrays and I want the subroutine to retrieve the answer according to those arrays, but when I run it I get no output, not even warning errors messages... I already tried using the refference to the array, but still not working:
sub prueba{
my (#array1, #array2)=#_;
if (scalar(#array1)<scalar(#array2)) {
print #array1,"\n";
} elsif (scalar(#array1)>scalar(#array2)){
print #array2,"\n";
}
};
my #primero=(1,5,9);
my #segundo=(1,7,8,9,6,5,6,9);
prueba(\#primero,\#segundo);
You can't pass arrays to subs (and they can't return them). You can only pass a number of scalars. What you are doing is equivalent to the following:
prueba(1,5,9,1,7,8,9,6,5,6,9);
All of the arguments end up in #array1. What we do is pass references to arrays.
prueba(\#primero,\#segundo);
But that also requires changing the sub. Without change, all of the arguments still end up in #array1. See perlreftut for a start on working with references. You can use
sub prueba{
my ($array1, $array2)=#_;
if (scalar(#$array1)<scalar(#$array2)) {
print "#$array1\n";
} elsif (scalar(#$array1)>scalar(#$array2)){
print "#$array2\n";
}
}
or just
sub prueba {
my ($array1, $array2) = #_;
if (#$array1 < #$array2) { say "#$array1"; }
elsif (#$array1 > #$array2) { say "#$array2"; }
}
< and > expect a number, so they already impose scalar context. And might as well use say, though that requires use feature qw( say ); (or something like use 5.014; which does the trick as well).
You can use prototypes to make it look like you're passing multiple arrays, and have perl turn them automatically into references:
sub prueba :prototype(\#\#) {
my ($array1, $array2) = #_;
if (#$array1 < #$array2) {
print #$array1,"\n";
} elsif (#$array1 > #$array2){
print #$array2,"\n";
}
}
my #primero=(1,5,9);
my #segundo=(1,7,8,9,6,5,6,9);
prueba(#primero, #segundo);
But read the documentation carefully to understand the cases where the subroutine can be called without enforcing the prototype.
Thanks all I just figured out what I wanted. I found that I can actually pass an array to perl, however maybe I am not explaning mysfel properly.The thing is to load the arrays as follow, inside the subroutine.
my #primero=#{$_[0]};
my #segundo=#{$_[1]};
This means we are using the reference. Ehen running the function, we must write the \ before each input:
prueba(\#primero,\#segundo);
I have an array where each element comes from a line delimited by tab.
Initial code:
#!/usr/bin/perl -w
use strict;
The code below is a piece of the code.
sub parser_domains {
my #params = #_;
my $interpro_line = "";
my #interpro_vector = ( );
my $idr_sub_id = $params[0];
my $idr_sub_start = $params[1]+1;
my $idr_sub_end = $params[2]+1;
my $interpro_id = "";
my $interpro_start_location = 0;
my $interpro_end_location = 0;
my $interpro_db = "";
my $interpro_length = 0;
my $interpro_db_accession = "";
my $interpro_signature = "";
my $interpro_evalue = "";
my $interpro_vector_size = 0;
my $interpro_sub_file= "";
my $idr_sub_lenght = ($idr_sub_end-$idr_sub_start)+1;
$interpro_sub_file = "$result_directory_predictor/"."$idr_sub_id/"."$idr_sub_id".".fsa.tsv";
#open file; if it fails, print a error and exits.
unless( open(TSV_FILE_DATA, $interpro_sub_file) ) {
print "Cannot open file \"$interpro_sub_file\"\n\n";
return;
}
my #interpro_file_line = <TSV_FILE_DATA>;
close TSV_FILE_DATA;
foreach $interpro_line (#interpro_file_line) {
#interpro_vector = split('\t',$interpro_line);
$interpro_id = $interpro_vector[0];
$interpro_db = $interpro_vector[3];
$interpro_db_accession = $interpro_vector[4];
$interpro_start_location = $interpro_vector[6];
$interpro_end_location = $interpro_vector[7];
$interpro_signature = $interpro_vector[11];
$interpro_length = ($interpro_end_location-$interpro_start_location) + 1;
if ($interpro_signature eq ""){
$interpro_signature = "NOPIR";
printf IDP_REGION_FILE "\nFound a $interpro_db domain with no IPR: starts at $interpro_start_location and ends at $interpro_end_location\n";
printf IDP_REGION_FILE "The size of $interpro_db domain in the sequence is $interpro_length\n";
printf IDP_REGION_FILE "The IDR starts at $idr_sub_start and and ends at $idr_sub_end\n";
printf IDP_REGION_FILE "The size of IDR is $idr_sub_lenght\n";
domains_regions($idr_sub_start,$idr_sub_end,$interpro_start_location,$interpro_end_location,$interpro_signature,$interpro_length,$interpro_db,$idr_sub_id,$interpro_db_accession,$idr_sub_lenght);
}
else{
for $entry_line (#entry_file_line) {
#entry_vector = split('\t',$entry_line);
$entry_ac = $entry_vector[0];
$entry_type = $entry_vector[1];
$entry_name = $entry_vector[2];
chomp($entry_name);
if ($interpro_signature eq $entry_ac) {
printf IDP_REGION_FILE "\nFound a $interpro_db domain with Interpro Signature $entry_ac: starts at $interpro_start_location and ends at $interpro_end_location\n";
printf IDP_REGION_FILE "The size of $interpro_db domain in the sequence is $interpro_length\n";
printf IDP_REGION_FILE "The Interpro Signature $entry_ac belongs to type $entry_type\n";
printf IDP_REGION_FILE "The name of $entry_ac is $entry_name\n";
printf IDP_REGION_FILE "The IDR starts at $idr_sub_start and ends at $idr_sub_end\n";
printf IDP_REGION_FILE "The size of IDR is $idr_sub_lenght\n";
domains_regions($idr_sub_start,$idr_sub_end,$interpro_start_location,$interpro_end_location,$interpro_signature,$interpro_length,$interpro_db,$idr_sub_id,$interpro_db_accession,$idr_sub_lenght);
}
}
}
}
}
A example of tsv file (interproscan):
P51587 14086411a2cdf1c4cba63020e1622579 3418 Pfam PF09103 BRCA2, oligonucleotide/oligosaccharide-binding, domain 1 2670 2799 7.9E-43 T 15-03-2013
P51587 14086411a2cdf1c4cba63020e1622579 3418 ProSiteProfiles PS50138 BRCA2 repeat profile. 1002 1036 0.0 T 18-03-2013 IPR002093 BRCA2 repeat GO:0005515|GO:0006302
P51587 14086411a2cdf1c4cba63020e1622579 3418 Gene3D G3DSA:2.40.50.140 2966 3051 3.1E-52 T 15-03-2013
...
The scripts works perfectly, but the comparison $interpro_signature eq "" provides a warning.
Use of uninitialized value $interpro_signature in string eq at /home/duca/eclipse-workspace/idps/idp_parser_interpro.pl line 666.
So, I searched and tried manners to replace the empty value into the array before the comparison. I would like the empty value by "NOIPR".
I'm working with 9 completed genomes, and I have more than 324000 proteins to parse.
How can I replace the empty values in my array?
Thanks.
Your array may not have 12 elements (or the 12-th element may be undef)
my $interpro_signature = $interpro_vector[11] // 'some_default_value';
The // is the defined-or operator.
The error Use of uninitialized value means that the variable hasn't been initialized, or it's been set to undef.
See perldiag and use it regularly. Run code with perl -Mdiagnostics ... on errors, regularly.
The use warnings; is actually better than -w.
Update to a substantial edit of the question
From shown data it appears that yet other fields may not be given in the file; so proof all variables with defaults, much like for the array element at index 11 above. This is what you want to do in general anyway. For example, if there are all fields in the file but some may be empty (two tabs with nothing in between)
my #interpro_defaults = ('id_default', 'db_default', ...);
my ($interpro_id, $interpro_db, ...) =
map {
$interpro_vector[$_] // $interpro_defaults[$_]
} 0 .. $#interpro_defaults;
This relies on order (of variables) in the list, what can be error prone with variables; see below.
If some fields are simply not there there may be (far) more work to do.
There are too many separate variables, all related and named as $interpro_X (and then there are $idr_Y and $entry_Z, but fewer and perhaps manageable).
Can you not bundle them in a container-type variable or a data structure?
A hash %interpro seems suitable, with keys X (so, $interpro{id} etc). Then you can use them more easily and can perform some actions on the whole lot. You still have to watch for order when initializing since they are read sequentially, but it should be clearer this way. For example
my #interpro_vars = qw(id db db_accesssion ...);
my #interpro_vector = qw(id_default db_default ...);
my %interpro;
#interpro{#interpro_vars} = #interpro_vector;
# or simply
#interpro{qw(id db ...)} = qw(id_default db_default ...);
I've defined arrays with keys and values first and then used them, in case that you may want to later have those lists in arrays. If that's not the case you can initialize the hash with lists (the last line).
Here
my %h;
#h{LIST-keys} = LIST-values;
is a way to assign the list of LIST-values to the set of keys of the hash %h given in LIST-keys. They are assigned one for one, in the given order of both lists (which had better match in size). There is the # sigil in front of hash's keys since we are having a list (of keys) there, not a hash. Note that the hash must have been declared somewhere. See slices in perldata.
The problem is that your 3rd line contains only 9 elements. So
#interpro_vector = split('\t',$interpro_line);
for that line assigns only 9 elements to #interpro_vector but you then access $interpro_vector[11] (i.e. the 12th element) and that doesn't exist. You can now either check that #interpro_vector contains (at least) 12 elements:
if (#interpro_vector >= 12) {
...
}
Or you can use the defined-or operator as #zdim suggested to use a default value in case $interpro_vector[11] isn't defined:
$interpro_signature = $interpro_vector[11] // '';
The above line is equivalent to
if (defined $interpro_vector[11]) {
$interpro_signature = $interpro_vector[11];
} else {
$interpro_signature = '';
}
Now
if ($interpro_signature eq "") {
...
}
will work.
I'm working on a project that's scalating a lot, lately, and I'm re-writing code to make it more OOP and passing all redundant code into sub-routines.
The script checks whether a gene exists in the database (through various means) or not. It may also report possible duplicates. Before reporting a duplicate, the script makes sure it's not a "biological duplicate" (essentially the same biological data but a with different position in the genome and, hence, not an actual duplicate). In order to do so...
my #gene_ids;
my #gene_names;
while(my $gene = $geners_bychecksum->next){
my $gene_name = $gene->gene_name;
my $gene_id = $gene->gene_id;
push #gene_ids, $gene_id;
push #gene_names, $gene_name;
}
print STDERR "$id\tJ\tALERT CHECKSUM MULTI-HIT\t(".join(",",#gene_names).")\n";
my $solve_multihit = solve_multihit($id, \#gene_names, \#gene_ids, $spc, $species_directory, $dataset);
print STDERR "$id\tJ\tALERT CHECKSUM MULTI-HIT\t(".join(",",#gene_names).")\n";
if($solve_multihit){
print STDERR "$id\tM\tUPDATE \n";
print $report "$id\tM\tUPDATE \n";
$countM++;
} else {
print STDERR "$id\tJ\tALERT CHECKSUM MULTI-HIT\t(".join(",",#gene_names).")\n";
}
Here, $geners_bychecksum is a DBIC resulset with database hits from a prior search and, for this case-scenario, it always has more than 1 gene. The $id,$spc,$species_directory and $dataset are all strings that come from the config and are defined above this chunk.
The solve_multihit subroutine is a rather complicated function that tries to resolve whether the multi-hits are actual duplicates or biological duplicates. Notice that I'm passing the #gene_names and #gene_ids arrays to this function. This function will return the gene_id of the proper gene, if it was able to solve the discrepancy; or 0 if not. Simplified code for the sub can be found in the following link
https://codeshare.io/2EM8qN
THE ACTUAL QUESTION
You may have noticed that the
print STDERR "$id\tJ\tALERT CHECKSUM MULTI HIT\t(".join(",",#gene_names).")\n";
is both before and after the solve_multihit subroutine call... and the array seems to go empty after running the function, according to the STDERR:
BBOV_I005030 J ALERT CHECKSUM MULTI-HIT (XP_001609152.1,XP_001609157.1)
BBOV_I005030 J ALERT CHECKSUM MULTI-HIT ()
BBOV_I005040 J ALERT CHECKSUM MULTI-HIT (XP_001609156.1,XP_001609153.1)
BBOV_I005040 J ALERT CHECKSUM MULTI-HIT ()
BBOV_I005050 J ALERT CHECKSUM MULTI-HIT (XP_001609154.1,XP_001609155.1)
BBOV_I005050 J ALERT CHECKSUM MULTI-HIT ()
BBOV_I005060 J ALERT CHECKSUM MULTI-HIT (XP_001609154.1,XP_001609155.1)
BBOV_I005060 J ALERT CHECKSUM MULTI-HIT ()
BBOV_I005070 J ALERT CHECKSUM MULTI-HIT (XP_001609156.1,XP_001609153.1)
BBOV_I005070 J ALERT CHECKSUM MULTI-HIT ()
BBOV_I005080 J ALERT CHECKSUM MULTI-HIT (XP_001609152.1,XP_001609157.1)
BBOV_I005080 J ALERT CHECKSUM MULTI-HIT ()
Why would that happen? I'm pretty sure I could solve it by returning the arrays along with the results of the solve_multihit{} sub, but I wonder why would it go empty.
PS: The J in the report is just a case-scenario key code.
my #gene_names = splice(shift);
my #gene_ids = splice(shift);
is short for
my #gene_names = splice(#{ shift(#_) });
my #gene_ids = splice(#{ shift(#_) });
splice(#a) empties the array and returns its contents. There's no reason to do that! The above should be
my #gene_names = #{ shift(#_) };
my #gene_ids = #{ shift(#_) };
Honestly, there's no need to make a copy of the array. Just use the provided reference.
my $gene_names = shift;
my $gene_ids = shift;
I'd provide a fixed-up version of solve_multihit, but it has numerous major problems I can't fix with the information I have.
I can see two ways for your code to accomplish the data removal that it seems to be doing.
The function arguments available in #_ are aliased to data passed to it. So if you change #_ itself (or its elements) you change the data outside of the function.
More likely, as you are passing by reference, your sub probably works directly with it
sub ff {
my ($rary) = #_;
#$rary = ();
}
my #data = 1..4;
ff(\#data);
say for #data; # empty
If your processing needs to change the array it works with then make a local copy first
sub ff {
my ($rary) = #_;
my #local_ary = #$ary;
# now changes to #local_ary do not affect #data in the caller
}
This is generally safer, while it does introduce a data copy which doesn't happen when working with the reference.
The edit together with ikegami's answer clears this up: splice is destructive to the array it works with and here by curious syntax it's fed an anonymous array formed out of a dereferenced #_ argument, whereby it changes the data in the caller.
There is no reason for splice in what you do. Its purpose is to change the array.
Instead, use arrayrefs that are passed to the sub
sub solve_multihit {
my ($id, $gene_names, $gene_ids, ...) = #_;
foreach my $name (#$gene_names) {
...
}
...
}
or make a local copy if you wish
sub solve_multihit {
my $id = shift;
my #gene_names = #{ shift #_ };
...
}
where my #gene_names is a lexical variable in this scope (the sub in your case ) and changes to it do not affect the one with the same name in the calling scope.
I have a Perl-Script, which executes a recursive function. Within it compares two elements of a 2dimensional Array:
I call the routine with a 2D-Array "#data" and "0" as a starting value. First I load the parameters into a separate 2D-Array "#test"
Then I want to see, if the array contains only one Element --> Compare if the last Element == the first. And this is where the Error occurs: Modification of non creatable array value attempted, subscript -1.
You tried to make an array value spring into existence, and the subscript was probably negative, even counting from end of the array backwards.
This didn't help me much...I'm pretty sure it has to do with the if-clause "$counter-1". But I don't know what, hope you guys can help me!
routine(#data,0);
sub routine {
my #test #(2d-Array)
my $counter = $_[-1]
for(my $c=0; $_[$c] ne $_[-1]; $c++){
for (my $j=0; $j<13;$j++){ #Each element has 13 other elements
$test[$c][$j] = $_[$c][$j];
}
}
if ($test[$counter-1][1] eq $test[-1][1]{
$puffertime = $test[$counter][4];
}
else{
for (my $l=0; $l<=$counter;$l++){
$puffertime+= $test[$l][4]
}
}
}
#
#
#
if ($puffertime <90){
if($test[$counter][8]==0){
$counter++;
routine(#test,$counter);
}
else{ return (print"false");}
}
else{return (print "true");}
Weird thing is that I tried it out this morning, and it worked. After a short time of running he again came up with this error message. Might be that I didn't catch up a error constellation, which could happen by the dynamic database-entries.
Your routine() function would be easier to read if it starts off like this:
sub routine {
my #data = #_;
my $counter = pop(#data);
my #test;
for(my $c=0; $c <= $#data; $c++){
for (my $j=0; $j<13;$j++){ #Each element has 13 other elements
$test[$c][$j] = $data[$c][$j];
}
}
You can check to see if #data only has one element by doing scalar(#data) == 1 or $#data == 0. From your code snippet, I do not see why you need to copy the data to passed to routine() to #test. Seems superfluous. You can just as well skip all this copying if you are not going to modify any of the data passed to your routine.
Your next code might look like this:
if ($#test == 0) {
$puffertime = $test[0][4];
} else {
for (my $l=0; $l <= $counter; $l++) {
$puffertime += $test[$l][4];
}
}
But if your global variable $puffertime was initialized to zero then you can replace this code with:
for (my $l=0; $l <= $counter; $l++) {
$puffertime += $test[$l][4];
}
I've a question about perl that I used to not bother about in the past, but it's bugging me now.
I have a method call saveItems which takes in a value from a text log and parses the input.
so I have this few lines in the method.
$intime = $_[1];
$timeHr = substr($intime, 0,2);
$timeMin = substr($intime, 2,2);
$timeSec = substr($intime, 5,2);
$object[$_[0]]->hr($timeHr);
$object[$_[0]]->min($timeMin);
$object[$_[0]]->sec($timeSec);
$intime being the value of the time passed into this method.
Sample of $intime: 0431:12
My question is that why does the above not give me any error but when I try to shorten the lines like so :
$object[$_[0]]->hr(substr($intime, 0,2));
$object[$_[0]]->min(substr($intime, 2,2));
$object[$_[0]]->sec(substr($intime, 5,2));
Only the first one works while the rest gives me an out of string error.
I am relatively new to perl, as you can see, but can anyone give me an answer to this?
EDIT
Sample HR:
sub hr {
my $self = shift;
if (#_) { $self->{HR} = shift }
return $self->{HR};
}
EDIT
Case Closed.. Read my answer post
From the comments above, adding .'' after each substr solved your problem. The reason for this is that the ->hr, ->min, and ->sec methods are modifying their argument in some way. Without seeing it further I can't say for certain what is happening.
The substr function returns a value that is a valid lvalue. This means that it can be assigned to. So when something in those methods assigns to the slice from substr, it is interfering with the other methods.
Appending an empty string fixes the problem by breaking the alias between the slice and the original string (stored in $intime).
If you wrote the hr, min and sec methods, you should figure out why they are modifying their arguments. Adding print "[$intime]\n"; statements between each method call should be revealing.
Can you come up with self-contained runnable code that demonstrates the problem? The problem you describe doesn't quite match up with the code you show, though I don't understand #object's role in your code.
The following works just fine:
use strict;
use warnings;
package Class;
sub new { bless {} }
sub saveItems {
my $intime = $_[1];
$_[0]->hr(substr($intime, 0,2));
$_[0]->min(substr($intime, 2,2));
$_[0]->sec(substr($intime, 5,2));
}
sub hr {
my $self = shift;
if (#_) { $self->{HR} = shift }
return $self->{HR};
}
sub min {
my $self = shift;
if (#_) { $self->{MIN} = shift }
return $self->{MIN};
}
sub sec {
my $self = shift;
if (#_) { $self->{SEC} = shift }
return $self->{SEC};
}
package main;
my $object = Class->new();
$object->saveItems( '0431:12' );
print "hr: ", $object->hr(), " min: ", $object->min(), " sec: ", $object->sec(), "\n";
This matter has been resolved.
The way of using substr as follows, are able to perform normally, without errors.
$object[$_[0]]->hr(substr($intime, 0,2));
$object[$_[0]]->min(substr($intime, 2,2));
$object[$_[0]]->sec(substr($intime, 5,2));
However, it is the log file that has trailing blank lines that got this script to fail.
Thanks to #ysth for asking me to reproduce the problem, when I realized that the problem actually lies with the log file instead of the script.
Lesson learnt: Check the codes AND the source before raising an issue