Storing a Hash of Arrays into An Array? - arrays

I have to modify some Perl scripts for a piped run and write a wrapper script to run them with a given set of input parameters. Before I can do that, I have to understand what is going on in the first program. I need help deciphering this code:
# declare and initialise an empty hash
my %to_keep= ();
# an array
@line = ('some\one', 'two', 'three', 'four');
# trim the identifier
$line[0]=~s/\/[1]$//;
# store this into an array
@{$to_keep{$line[0]}{'1'}}=($line[1],$line[2]);
print #;
I'm familiar with the perl substitute function, s///. It goes:
s/text-regex_to_be_replaced/replacement/modifier.
However, I'm not too sure what the code above is doing. If I understand correctly, it replaces every occurrence of '\' with line[1], until the end of the string (indicated by the '$/'). Is this correct?
The other part I'm unsure about is the code below the 'store' comment. I think it's storing a hash of array into an array. Can someone explain how the code works and what it prints out given the variables? Also, how can I retrieve the data I store in the array?
Bonus question: Can someone explain how modifying a perl script for a piped run works?
thanks

Hmm, this is weird.
s/\/[1]$//;
will only match and remove /1 at the end of a string. Your example string does not end in /1, so the substitution has no effect on it.
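A minimal sketch of that substitution, in case it helps to see it run. The second identifier, read_42/1, is a made-up value (not from the question) chosen because it actually ends in /1:

```perl
use strict;
use warnings;

# The identifier from the question, plus a hypothetical one ending in "/1".
my @ids = ('some\one', 'read_42/1');

my @trimmed;
for my $id (@ids) {
    (my $copy = $id) =~ s/\/[1]$//;   # remove a trailing "/1", if present
    push @trimmed, $copy;
    print "$id becomes $copy\n";
}
```

The first string comes back unchanged, the second loses its trailing /1.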
@{$to_keep{$line[0]}{'1'}}=($line[1],$line[2]);
Broken down, on the left side you have
$to_keep{'some\one'}{1}, which is undefined in the example. If it held a string such as foo, then @{...} would use that string as the array name (a symbolic reference), hence @foo. Because it is undefined and is being assigned to, Perl instead autovivifies an anonymous array there and stores a reference to it in the hash.
On the right side, the second and third elements of @line are assigned as a list to that array.
So the likely intention is a nested structure: the first element of the list is the hash key, and under the sub-key '1' the script stores an array holding the next two elements of the list.
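Here is a small sketch of the store and the retrieval the question asks about. It assumes a hypothetical first element 'some/1' (so that the trim does something); the rest follows the code in the question:

```perl
use strict;
use warnings;

my %to_keep;
my @line = ('some/1', 'two', 'three', 'four');   # hypothetical input ending in "/1"

$line[0] =~ s/\/[1]$//;                           # "some/1" becomes "some"

# Autovivification creates $to_keep{some}{1} as an array reference.
@{ $to_keep{ $line[0] }{'1'} } = ($line[1], $line[2]);

# Retrieve the whole array, or a single element of it.
my @pair  = @{ $to_keep{some}{1} };
my $first = $to_keep{some}{1}[0];

print "pair: ", join(' ', @pair), "\n";
print "first: $first\n";
```

So the structure is a hash of hashes of arrays: outer key, then '1', then the stored list.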

Related

How to parse an array to a hash of arrays?

I'm a beginner (a wet lab biologist, who has to fiddle a bit with bioinformatics for the first time in my life) and today I've got stuck on one problem: how to parse an array to a hash of arrays in perl?
This doesn't work:
@myhash{$key} = @mytable;
I've finally circumvented my problem with a for loop:
for(my $i=0;$i<=$#mytable;$i++){$myhash{$key}[$i]=$mytable[$i]};
Of course it works and it does what I need to be done, but it seems to me not a solution to my problem, but just a way to circumvent it... When something doesn't work I like to understand why...
Thank you very much for your advice!
If you are asking how to put an array as one value of a hash, you do this by taking a reference to the array, since references are scalars and the values of hashes must be scalars. This is done with the backslash operator.
$myhash{$key} = \@mytable;
The for loop you describe creates such a reference through autovivification, as $myhash{$key}[0] creates an array reference at $myhash{$key} in order to assign to its index. Also note that the difference between taking a reference and copying each value is that in the former case, changes to the array after the fact will also affect the values referenced via the hash value, and vice versa.
$mytable[5] = 42; # $myhash{$key}[5] is also changed
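As a rough sketch of why the original attempt fails and what the reference does: @myhash{$key} is a hash slice with a single key, so a list assignment to it keeps only the first element. The key name and values below are made up for illustration, and Perl may emit a "better written as" warning for the one-key slice:

```perl
use strict;
use warnings;

my $key     = 'sample';          # hypothetical key
my @mytable = (10, 20, 30);      # hypothetical data
my %myhash;

# Hash slice with one key: only the first element of @mytable is stored.
@myhash{$key} = @mytable;
my $stored = $myhash{$key};

# Store a reference instead: the hash value and @mytable share the same data.
$myhash{$key} = \@mytable;
$mytable[5] = 42;

print "slice stored: $stored\n";
print "via hash: $myhash{$key}[5]\n";
```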
As Grinnz mentioned, you can save a reference to an array, but any later change to the array will be reflected in the hash (it is the same data).
For example, if you reuse the same array in a loop, then the data in the hash will reflect the last iteration of the loop.
In such a case you will want a copy of the array stored in the hash.
@{$hash{$key}} = @array;
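To make the loop pitfall concrete, here is a minimal sketch with invented hash names (%by_ref, %by_copy) and a reused array @row:

```perl
use strict;
use warnings;

my (%by_ref, %by_copy);
my @row;                              # one array, reused on every pass

for my $n (1 .. 3) {
    @row = ($n, $n * 10);
    $by_ref{$n} = \@row;              # every key points at the same array
    @{ $by_copy{$n} } = @row;         # each key gets its own copy
}

print "by_ref 1:  ", join(' ', @{ $by_ref{1} }),  "\n";
print "by_copy 1: ", join(' ', @{ $by_copy{1} }), "\n";
```

The reference version shows the values from the last pass under every key, while the copy keeps what was stored at the time. Writing $by_copy{$n} = [ @row ]; is an equivalent way to copy.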
Programming Perl: Data structures

Sorting an array of URLs

I have an array with quasar URLs stored in it
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0269/spec-0269-51581-0467.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0329/spec-0329-52056-0059.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2957/spec-2957-54807-0164.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0342/spec-0342-51691-0089.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2881/spec-2881-54502-0508.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0302/spec-0302-51616-0435.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2947/spec-2947-54533-0371.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0301/spec-0301-51942-0460.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2962/spec-2962-54774-0461.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2974/spec-2974-54592-0185.fits
I want to sort out the URL array on basis of the number next to spec- and not using alphabetic order. I sorted the array with sort but it didn't help as it always took the 3rd row and 2nd last row to the top because they have a 1.
I'd like to have an output like this
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0269/spec-0269-51581-0467.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0301/spec-0301-51942-0460.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0302/spec-0302-51616-0435.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0329/spec-0329-52056-0059.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0342/spec-0342-51691-0089.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2881/spec-2881-54502-0508.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2947/spec-2947-54533-0371.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2957/spec-2957-54807-0164.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2962/spec-2962-54774-0461.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2974/spec-2974-54592-0185.fits
If you will always have this pattern, you can try:
parts = strsplit(myUrl, '/');
fileName = parts{end};
pieces = strsplit(fileName(6:end), '.');
number = pieces{1};
Gonna walk you through this cause understanding is everything...
We start with
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0269/spec-0269-51581-0467.fits
First we split the URL on the / characters. This returns a cell array of the strings between those characters. Since the number to sort on resides after the final /, we can index with end to grab the last one. Now we have
spec-0269-51581-0467.fits
Next, let's remove that pesky spec- from the number. This step isn't actually necessary, since it's constant across all the URLs, but let's just do it for fun. We can use Matlab's substring indexing to grab the characters after the -, using fileName(6:end). This creates a string starting with the 6th character (in this case, a 0) and continuing to the end. Great, now we have
0269-51581-0467.fits
Looking good! Again, this part isn't completely necessary either, but just in case for whatever reason you may need to, I've included it. We can use the strsplit function again, but this time split on the ., and grab the first element with index 1 (Matlab indexing starts at 1). Now, we have
0269-51581-0467
Go ahead and sort that little guy and you're good to go!
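For comparison, the same extract-then-sort idea can be sketched in Perl (the language of the other examples on this page; the answer above uses Matlab). The helper name spec_number is made up, and only four of the URLs from the question are used:

```perl
use strict;
use warnings;

my @urls = (
    'http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2957/spec-2957-54807-0164.fits',
    'http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0269/spec-0269-51581-0467.fits',
    'http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0342/spec-0342-51691-0089.fits',
    'http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0301/spec-0301-51942-0460.fits',
);

# Hypothetical helper: the number right after "spec-", or -1 if the pattern is missing.
sub spec_number {
    my ($url) = @_;
    return $url =~ /spec-(\d+)-/ ? $1 : -1;
}

# Compare the extracted numbers numerically instead of comparing whole strings.
my @sorted = sort { spec_number($a) <=> spec_number($b) } @urls;
print "$_\n" for @sorted;
```

Sorting on the extracted number avoids the problem from the question, where the redux/104 URLs floated to the top under a plain string sort.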

Concatenate string && integer as array variable of double type - MATLAB

I am looking for advice on the piece of code below, which loops through a dataset (of cell type) and extracts each column as a data vector.
[i,j]=size(fimat);
k=2;
while k<=j % looping through columns
[num2str(k-1),'yr']=cell2mat(fimat(:,k)); %extract each column as vector
k=k+1;
end
My matter undeniably lies in the following statement:
[num2str(k-1),'yr']
that correctly concatenates numbers (reflected by variable k) and string name 'yr'. However the syntax fails in assigning for instance (during 1st iteration)
1yr=cell2mat(fimat(:,2))
The resulting error speaks for itself
Error: An array for multiple LHS assignment cannot contain LEX_TS_STRING.
but I'm still figuring out a way to do it. Thus any feedback would be appreciated.
Thanks
First of all, in MATLAB a variable name cannot start with a digit. It must start with a letter (a leading underscore is not allowed either), so you should modify your code such that the variable name starts with a letter.
For instance ['yr' num2str(k-1)] would be better.
Then, what you are trying to do is very strongly discouraged by everyone, including The MathWorks. It would be much better to use a cell array yr and refer to yr{k} rather than iteratively generated variable names:
yr = cell(j-1,1);
for k = 2:j
yr{k-1} = cell2mat(fimat(:,k));
end
Anyway, if you still want to do this, you can use eval
while k<=j
eval(['yr' num2str(k-1) ' = cell2mat(fimat(:,k));']);
k=k+1;
end
Best,
You cannot dynamically create variable names like you did. The left side of the = must be an identifier, not a char. The alternative I recommend is to use a cell array instead of individual variable names. For example:
yr{k-1}=cell2mat(fimat(:,k))
If you must use variable names with numbers, which I strongly recommend not to do, you have to use eval for the line. Alternatives which I strongly recommend to check before using eval are struct with dynamic field names and containers.Map
Here is my answer to the question, for sharing purposes. Hope it will help and Thanks to the contributors of this post.
[i,j]=size(fimat); %get dimension of dataset (of cell type)
numdata=cell2mat(fimat(1:i,2:j)); %extract only numeric from dataset
for k=1:j-1
eval(sprintf('yr%d = numdata(:,k)', k));
end

SAS: why no 'NOTE: variable is uninitialized' when uninitialized variable is part of an array?

Admittedly, this question is not very interesting, but since the warnings in the SAS log can be very helpful sometimes, I'd like to know what is going on here.
Consider the following minimal example. In step0 we created a dataset. In step 1 we want to copy the value of some variable in step0 to step1 but we forgot the correct name of the variable (or we remember correctly but someone changed it when we were away.) I write two versions of step1 named step1a and step1b.
Data step0;
Dog = 1;
run;
Data step1a;
value = cat;
run;
Data step1b;
array animals cat;
value = animals[1];
run;
Needless to say, both versions of step1 produce the same dataset, in this case an empty dataset with variables 'value' and 'cat'.
However: when running step1 in the way step1a is written, the SASlog will warn us that something is wrong:
NOTE: Variable cat is uninitialized.
We can go back to our code, notice that what we think was a cat was actually a dog all along, see the error of our ways and produce the correct dataset we had in mind.
When on the other hand running step1 in the way step1b is written, the SASlog will act as if everything is perfectly fine and we can go out singing and dancing in the street only to find out years later that the value of dog is lost forever.
So the question is: why does SAS think in the second case that no warning is needed?
That's because you HAVE initialized the variable in the third example, via the array declaration. When you declare an array, any variables not already existing are initialized to Numeric missing, unless you either specify $ in the array definition (in which case they are character missing (length 8)), or you specify an initialized value.

Problems with Arrays in Perl

I am new to Perl and having some difficulty with arrays. Can somebody explain to me why I am not able to print the value of an array in the script below?
$sum=();
$min = 999;
$LogEntry = '';
foreach $item (1, 2, 3, 4, 5)
{
$min = $item if $min > $item;
if ($LogEntry eq '') {
push(@sum,"1"); }
print "debugging the IF condition\n";
}
print "Array is: $sum\n";
print "Min = $min\n";
The output I get is:
debugging the IF condition
debugging the IF condition
debugging the IF condition
debugging the IF condition
debugging the IF condition
Array is:
Min = 1
Shouldn't I get Array is: 1 1 1 1 1 (5 times).
Can somebody please help?
Thanks.
You need two things:
use strict;
use warnings;
at which point the bug in your code ($sum instead of @sum) should become obvious...
$sum is not the same variable as @sum.
In this case you would benefit from starting your script with:
use strict;
use warnings;
Strict forces you to declare all variables, and warnings reports likely mistakes.
In the meantime, change the first line to:
@sum = ();
and the second-to-last line to:
print "Array is: " . join (', ', @sum) . "\n";
See join.
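Putting those changes together, a corrected version of the script might look like the sketch below. The my declarations are added so that it runs under strict, and the debugging print is dropped:

```perl
use strict;
use warnings;

my @sum;                       # the array, declared with the @ sigil
my $min      = 999;
my $LogEntry = '';

foreach my $item (1, 2, 3, 4, 5) {
    $min = $item if $min > $item;
    push @sum, "1" if $LogEntry eq '';
}

print "Array is: " . join(', ', @sum) . "\n";
print "Min = $min\n";
```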
As others have noted, you need to understand the way Perl uses sigils ($, @, %) to denote data structures and the access of the data in them.
You are using a scalar sigil ($), which will simply try to access a scalar variable named $sum, that has nothing to do with a completely distinct array variable named @sum - and you obviously want the latter.
What confuses you is likely the fact that, once the array variable @sum exists, you can access individual values in the array using $sum[0] syntax, but here the sigil+braces ($[]) act as a "unified" syntactic construct.
The first thing you need to do (after using strict and warnings) is to read the following documentation on sigils in Perl (aside from a good Perl book):
https://stackoverflow.com/a/2732643/119280 - brian d. foy's excellent summary
The rest of the answers to the same question
This SO answer
The best summary I can give you on the syntax of accessing data structures in Perl is (quoting from my older comment)
the sigil represents the amount of data from the data structure that you are retrieving ($ for 1 element, @ for a list of elements, % for the entire hash)
whereas the brace style represent what your data structure is (square for array, curly for hash).
As a special case, when there are NO braces, the sigil will represent BOTH the amount of data, as well as what the data structure is.
Please note that in your specific case, it's the last bullet point that matters. Since you're referring to the array as a whole, you won't have braces, and therefore the sigil will represent the data structure type - since it's an array, you must use the @ sigil.
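The sigil and brace rules above can be sketched with a throwaway array and hash (the names and values are invented for illustration):

```perl
use strict;
use warnings;

my @sum = (10, 20, 30);
my %age = (ann => 31, bob => 27, eve => 45);   # hypothetical data

my $one_elem   = $sum[0];             # $ with []: one element of an array
my @two_elems  = @sum[0, 1];          # @ with []: array slice, a list of elements
my $one_value  = $age{ann};           # $ with {}: one value of a hash
my @two_values = @age{'ann', 'bob'};  # @ with {}: hash slice, a list of values
my @whole      = @sum;                # no braces: the sigil names the whole array

print "one element: $one_elem\n";
print "array slice: ", join(' ', @two_elems),  "\n";
print "hash value:  $one_value\n";
print "hash slice:  ", join(' ', @two_values), "\n";
print "whole array: ", join(' ', @whole),      "\n";
```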
You push the values into the array @sum, then finish up by printing the scalar $sum. @sum and $sum are two completely independent variables. If you print "@sum\n" instead, you should get the output "1 1 1 1 1" (an array interpolated into a string has its elements separated by spaces).
print "Array is: $sum\n";
will print a non-existent scalar variable called $sum, not the array @sum and not the first item of the array.
If you 'use strict' it will flag the use of undeclared variables like this.
You should definitely add use strict; and use warnings; to your script. That would have complained about the print "Array is: $sum\n"; line (among others).
And you initialize an array with my @sum=(); not with my $sum=();
Like CFL_Jeff mentions, you can't just do a quick print. Instead, do something like:
print "Array is ".join(', ',@array);
Still would like to add some details to this picture. )
See, Perl is well-known as a Very High Level Language. And this is not just because you can replace (1,2,3,4,5) with (1..5) and get the same result.
And not because you may leave your variables without (explicitly) assigning some initial values to them: my @arr is as good as my @arr = (), and my $scal (instead of my $scal = 'some filler value') may actually save you an hour or two one day. Perl is usually (with use warnings, yes) good at spotting undefined values in unusual places - but not so lucky with 'filler values'...
The true point of VHLL is that, in my opinion, you can express a solution in Perl code just like in any human language available (and some may be even less suitable for that case).
Don't believe me? Ok, check your code - or rather your set of tasks, for example.
Need to find the lowest element in an array? Or the sum of all values in an array? The List::Util module is at your command:
use feature 'say';
use List::Util qw( min sum );
my @array_of_values = (1..10);
my $min_value = min( @array_of_values );
my $sum_of_values = sum( @array_of_values );
say "In the beginning was... @array_of_values";
say "With lowest point at $min_value";
say "Collected they give $sum_of_values";
Need to construct an array from another array, filtering out unneeded values? grep is here to save the day:
@filtered_array = grep { $filter_condition } @source_array;
See the pattern? Don't try to code your solution into some machine-codish mumbo-jumbo. ) Find a solution in your own language, then just find means to translate THAT solution into Perl code instead. It's easier than you thought. )
Disclaimer: I do understand that reinventing the wheel may be good for learning why wheels are so useful at first place. ) But I do see how often wheels are reimplemented - becoming uglier and slower in process - in production code, just because people got used to this mode of thinking.
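As a concrete instance of the grep pattern mentioned above, here is a tiny sketch. The filter (keep even values) is just an assumed example condition:

```perl
use strict;
use warnings;

my @source_array   = (1 .. 10);
my @filtered_array = grep { $_ % 2 == 0 } @source_array;   # keep even values only

print join(' ', @filtered_array), "\n";
```

Inside the block, $_ is the element being tested, and grep returns the elements for which the block is true.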

Resources