Getting an exact match in a hash, but with a twist - arrays

I have something I cannot get my head around.
Let's say I have a phone list used for receiving and dialing out, stored like below. The from and to locations are specified as well.
Country1 Country2 number1 number2
USA_Chicago USA_LA 12 14
AUS_Sydney USA_Chicago 19 15
AUS_Sydney USA_Chicago 22 21
CHI_Hong-Kong RSA_Joburg 72 23
USA_LA USA_Chicago 93 27
Now all I want to do is remove all the duplicates, keeping the country pairs as keys with every number assigned to them, but each pair needs to be treated as bi-directional.
In other words, I need to get the results back and then print them like this:
USA_Chicago-USA_LA 27 93 12 14
AUS_Sydney-USA_Chicago 19 15 22 21
CHI_Hong-Kong-RSA_Joburg 72 23
I have tried many methods, including a normal hash table, and the results seem fine, but it does not handle the bi-direction, so I get this instead:
USA_Chicago-USA_LA 12 14
AUS_Sydney-USA_Chicago 19 15 22 21
CHI_Hong-Kong-RSA_Joburg 72 23
USA_LA-USA_Chicago 93 27
So the duplicate removal works in one direction, but because there is another direction, it will not remove the duplicate "USA_LA-USA_Chicago", which already exists as "USA_Chicago-USA_LA", and will store the same numbers under a swapped name.
The hash table I tried last is something like this (not exactly, as I trashed the lot and had to rewrite it for this post):
my @input = ("USA_Chicago USA_LA 12 14",
             "AUS_Sydney USA_Chicago 19 15",
             "AUS_Sydney USA_Chicago 22 21",
             "CHI_Hong-Kong RSA_Joburg 72 23",
             "USA_LA USA_Chicago 93 27");
my %hash;
for my $line (@input) {
    my ($c1, $c2, $n1, $n2) = split / [\s\|]+ /x, $line;
    my $arr = $hash{"$c1-$c2"} ||= [];
    push @$arr, "$n1 $n2";
}
for my $key (sort keys %hash) {
    my $arr = $hash{$key};
    my $vals = join " : ", @$arr;
    print "$key $vals\n";
}
So if A-B exists and so does B-A, use only one, but assign the values from the key being removed to the remaining key. What I basically need to do is get rid of any duplicate key in either direction, but assign its values to the remaining key. So A-B and B-A would be considered duplicates, but A-C and B-C are not. -_-

Simply normalise the destinations. I chose to sort them.
use strictures;
use Hash::MultiKey qw();
my @input = (
    'USA_Chicago USA_LA 12 14',
    'AUS_Sydney USA_Chicago 19 15',
    'AUS_Sydney USA_Chicago 22 21',
    'CHI_Hong-Kong RSA_Joburg 72 23',
    'USA_LA USA_Chicago 93 27'
);
tie my %hash, 'Hash::MultiKey';
for my $line (@input) {
    my ($c1, $c2, $n1, $n2) = split / [\s\|]+ /x, $line;
    my %map = ($c1 => $n1, $c2 => $n2);
    push @{ $hash{[sort keys %map]} }, @map{sort keys %map};
}
__END__
(
['CHI_Hong-Kong', 'RSA_Joburg'] => [72, 23],
['AUS_Sydney', 'USA_Chicago'] => [19, 15, 22, 21],
['USA_Chicago', 'USA_LA'] => [12, 14, 27, 93],
)
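If you would rather not pull in Hash::MultiKey, the same normalisation works with a plain hash by joining the sorted pair into one string key. A minimal sketch of that idea (same @input as above; the "A-B" key format is my own choice):
use strict;
use warnings;

my %hash;
for my $line (@input) {
    my ($c1, $c2, $n1, $n2) = split ' ', $line;
    # Sort the two countries so A-B and B-A collapse into the same key,
    # and keep each number paired with its country in sorted order.
    my %map = ($c1 => $n1, $c2 => $n2);
    my $key = join '-', sort keys %map;
    push @{ $hash{$key} }, @map{ sort keys %map };
}
print "$_ @{ $hash{$_} }\n" for sort keys %hash;
This prints one line per pair, e.g. AUS_Sydney-USA_Chicago 19 15 22 21.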

Perl is great for creating complex data structures, but learning to use them effectively takes practice.
Try:
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};
# --------------------------------------
# skip the column headers
<DATA>;
my %bidirectional = ();
while( my $line = <DATA> ){
    chomp $line;
    my ( $country1, $country2, $number1, $number2 ) = split ' ', $line;
    push @{ $bidirectional{ $country1 }{ $country2 } }, [ $number1, $number2 ];
    push @{ $bidirectional{ $country2 }{ $country1 } }, [ $number1, $number2 ];
}
print Dumper \%bidirectional;
__DATA__
Country1 Country2 number1 number2
USA_Chicago USA_LA 12 14
AUS_Sydney USA_Chicago 19 15
AUS_Sydney USA_Chicago 22 21
CHI_Hong-Kong RSA_Joburg 72 23
USA_LA USA_Chicago 93 27

Related

Use of uninitialized value within @spl in substitution (s///)

I am getting the following error while running the script.
Use of uninitialized value in print at PreProcess.pl line 137.
Use of uninitialized value within @spl in substitution (s///) at PreProcess.pl line 137.
Is there any syntax error in the script?
(Running it on Windows - latest Strawberry Perl, 64-bit)
my $Dat = 2;
my $a = 7;
foreach (@spl) {
    if ( $_ =~ $NameInstru ) {
        print $spl[$Dat] =~ s/-/\./gr, " 00:00; ", $spl[$a], "\n"; # data
        $Dat += 87;
        $a += 87;
    }
}
Inside the array I have this type of data:
"U.S. DOLLAR INDEX - ICE FUTURES U.S."
150113
2015-01-13
098662
ICUS
01
098
128104
14111
88637
505
13200
50
269
43140
34142
1862
37355
482
180
110623
126128
17480
1976
1081
-3699
8571
-120
646
50
248
1581
-8006
319
2093
31
-30
1039
1063
42
18
100.0
11.0
69.2
0.4
10.3
0.0
0.2
33.7
26.7
1.5
29.2
0.4
0.1
86.4
98.5
13.6
1.5
215
7
.
.
16
.
.
50
16
8
116
6
4
197
34
28.6
85.1
41.3
91.3
28.2
85.1
40.8
91.2
"(U.S. DOLLAR INDEX X $1000)"
"098662"
"ICUS"
"098"
"F90"
"Combined"
"U.S. DOLLAR INDEX - ICE FUTURES U.S."
150106
2015-01-06
098662
ICUS
01
098
127023
17810
80066
625
12554
0
21
41559
42148
1544
35262
452
210
109585
125065
17438
1958
19675
486
23911
49
2717
0
-73
9262
-5037
30
5873
270
95
18439
19245
1237
431
100.0
14.0
63.0
0.5
9.9
0.0
0.0
32.7
33.2
1.2
27.8
0.4
0.2
86.3
98.5
13.7
1.5
202
7
.
.
16
0
.
48
16
9
105
6
4
185
34
29.3
83.2
43.2
90.6
28.9
83.2
42.8
90.5
"(U.S. DOLLAR INDEX X $1000)"
"098662"
"ICUS"
"098"
"F90"
"Combined"
You are probably loading a file of data sets, 87 lines each, into an array, and you get the error at the end of your data when you try to read past the last array index.
You can probably solve it by iterating over the array indexes instead of the array values, e.g.
my $Dat = 2;
my $a = 7;
my $set_size = 87;
for (my $n = 0; $n + $a < @spl; $n += $set_size) {
    if ( $spl[$n] =~ $NameInstru ) {
        print $spl[$n + $Dat] =~ s/-/\./gr, " 00:00; ", $spl[$n + $a], "\n"; # data
    }
}
While this might solve your problem, it might be better to try and find a proper way to parse your file.
If the records inside the input file are separated by a blank line, you can read whole records at once by changing the input record separator to "" or "\n\n". Then you can split each element of the resulting array on newline (\n) and get an entire record as a result. For example:
$/ = "";
my #spl;
open my $fh ...
while (<$fh>) {
push #spl, [ split "\n", $_ ];
}
...
for my $record (#spl) {
# #$record is now an 87 element array with each record in the file
}
TLP's solution of iterating over the indexes of the array, incrementing by 87 at a time, is great.
Here's a more complex solution, but one that doesn't require loading the entire file into memory.
my $lines_per_row = 87;
my @row;
while (<>) {
    chomp;
    push @row, $_;
    if (@row == $lines_per_row) {
        my ($instru, $dat, $a) = @row[0, 2, 7];
        if ($instru =~ $NameInstru) {
            print $dat =~ s/-/\./gr, " 00:00; $a\n";
        }
        @row = ();
    }
}

PowerShell script to break up a list into multiple arrays

I am very new to PowerShell. I have code a co-worker helped me build. It works on a small set of data; however, I am sending this to an SAP Business Objects query, which will only accept about 2000 pieces of data. The amount of data I have to run varies each month but is usually around 7000-8000 items. I need help updating my script to run through the list of data, create an array, add 2000 items to it, then create a new array with the next 2000 items, and so on until it reaches the end of the list.
$source = "{0}\{1}" -f $ENV:UserProfile, "Documents\Test\DataSD.xls"
$WorkbookSource = $Excel.Workbooks.Open("$source")
$WorkSheetSource = $WorkbookSource.WorkSheets.Item(1)
$WorkSheetSource.Activate()
$row = [int]2
$docArray = #()
$docArray.Clear() |Out-Null
Do
{
$worksheetSource.cells.item($row, 1).select() | Out-Null
$docArray += #($worksheetSource.cells.item($row, 1).value())
$row++
}
While ($worksheetSource.cells.item($row,1).value() -ne $null)
So for this example, I would need the script to create 4 separate arrays. The first 3 would have 2000 items each and the last would have 1200 items.
For this to work, you will need to export the data to a CSV or otherwise extract it to a collection that holds all the items. Using something like StreamReader would probably allow for faster processing, but I have never worked with it. [blush]
Once $CurBatch is generated, you can feed it into whatever process you want.
$InboundCollection = 1..100
$ProcessLimit = 22
# the "- 1" is to correct for "starts at zero"
$ProcessLimit = $ProcessLimit - 1
$BatchCount = [math]::Floor($InboundCollection.Count / $ProcessLimit)
#$End = 0
foreach ($BC_Item in 0..$BatchCount)
{
    if ($BC_Item -eq 0)
    {
        $Start = 0
    }
    else
    {
        $Start = $End + 1
    }
    $End = $Start + $ProcessLimit
    # powershell will happily slice past the end of an array
    $CurBatch = $InboundCollection[$Start..$End]
    ''
    $Start
    $End
    # the 1st item is not the _number in $Start_
    # it's the number in the array "[$Start]"
    "$CurBatch"
}
output ...
0
21
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
22
43
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
44
65
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
66
87
67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
88
109
89 90 91 92 93 94 95 96 97 98 99 100
To do this, there are a number of options. You can read everything from the Excel file into one large array and split that afterwards into smaller chunks, or you can add the Excel file values to separate arrays while reading. The code below does the latter. In any case, it is up to you when you would like to actually send the data:
1. process each array immediately (send it to the SAP Business Objects query) while reading from Excel
2. add it to a Hashtable so you keep all arrays together in memory
3. store it on disk for later use
In the code below, I chose the second option: read the data into a number of arrays and keep these in memory in a Hashtable. The advantage is that you do not need to interrupt the reading of the Excel data as with option 1, and there is no need to create and re-read 'in-between' files as with option 3.
$source = Join-Path -Path $ENV:UserProfile -ChildPath "Documents\Test\DataSD.xls"
$maxArraySize = 2000
$Excel = New-Object -ComObject Excel.Application
# It would speed up things considerably if you set $Excel.Visible = $false
$WorkBook = $Excel.Workbooks.Open($source)
$WorkSheet = $WorkBook.WorkSheets.Item(1)
$WorkSheet.Activate()
# Create a Hashtable object to store each array under its own key
# I don't know if you need to keep the order of things later,
# but it may be best to use an '[ordered]' hash here.
# If you are using a PowerShell version below 3.0, you need to create it using
# $hash = New-Object System.Collections.Specialized.OrderedDictionary
$hash = [ordered]@{}
# Create an ArrayList for better performance
$list = New-Object System.Collections.ArrayList
# Initiate a counter to use as Key in the Hashtable
$arrayCount = 0
# and maybe a counter for the total number of items to process?
$totalCount = 0
# Start reading the Excel data. Begin at row $row
$row = 2
do {
    $list.Clear()
    # Add the values of column 1 to the arraylist, but keep track of the maximum size
    while ($WorkSheet.Cells.Item($row, 1).Value() -ne $null -and $list.Count -lt $maxArraySize) {
        [void]$list.Add($WorkSheet.Cells.Item($row, 1).Value())
        $row++
    }
    if ($list.Count) {
        # Store this array in the Hashtable using the $arrayCount as Key.
        $hash.Add($arrayCount.ToString(), $list.ToArray())
        # Increment the $arrayCount variable for the next iteration
        $arrayCount++
        # Update the total items counter
        $totalCount += $list.Count
    }
} while ($list.Count)
# You're done reading Excel data, so close it and release the Com objects from memory
$Excel.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($WorkSheet) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($WorkBook) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($Excel) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
# At this point you should have all arrays stored in the hash to process
Write-Host "Processing $($hash.Count) arrays with a total of $totalCount items"
foreach ($key in $hash.Keys) {
    # Send each array to a SAP business objects query separately
    # The array itself is at $hash.$key or use $hash[$key]
}
This is not 100% but I will fine-tune it a bit later today:
$docarray = @{}
$values = @()
$i = 0
$y = 0
for ($x = 0; $x -le 100; $x++) {
    if ($i -eq 20) {
        $docarray.add($y, $values)
        $y++
        $i = 0
        $values = @()
    }
    $values += $x
    $i++
}
$docarray.add($y, $values) ## required
$docarray | Format-List
If the limit is 2000, you would set the if statement to trigger at 2000. The result will be a hash table like this:
Name : 4
Value : {80, 81, 82, 83...}
Name : 3
Value : {60, 61, 62, 63...}
Name : 2
Value : {40, 41, 42, 43...}
Name : 1
Value : {20, 21, 22, 23...}
Name : 0
Value : {0, 1, 2, 3...}
Each name in the hash table holds the number of values set by the $i counter in the if statement.
You should then be able to send this to your SAP Business Objects query by using a foreach loop over the entries in the hash table:
foreach ($item in $docarray.GetEnumerator()) {
    $item.Value
}

Add each column to array, not just whole line - Perl

I am writing a Perl script and currently working on a subroutine to sum all the values of an array. Currently, my code reads in each line and stores the entire line in each array element. I need each individual number stored in its own element.
Here's a sample of my data:
50 71 55 93 115
45 76 49 88 102
59 78 53 96 145
33 65 39 82 100
54 77 56 98 158
Here's my code:
my @array;
# bring in each line and store into array called 'array'
open(my $fh, "<", "score")
    or die "Failed to open file: $!\n";
while (<$fh>) {
    chomp;
    push @array, $_;
}
close $fh;
When I call my subroutine to sum the values of the array, my result is 241. That is the sum of the first number on each line.
Any help or suggestions?
So, you want to add all the values inside an array. Easy. But in your code, you are adding strings of values instead of the values themselves.
With push @array, $_; you are creating an array of the lines in the file score.
Try:
print Dumper(\@array);
You will see output like this:
$VAR1 = [
'50 71 55 93 115',
'45 76 49 88 102',
'59 78 53 96 145',
'33 65 39 82 100',
'54 77 56 98 158'
];
So when you add the values, Perl adds the strings:
'50 71 55 93 115' + '59 78 53 96 145' + '33 65 39 82 100' ...and so on
The moment you use + on a string, it is treated as a number: Perl converts the leading numeric portion of the string, so '50 71 55 93 115' becomes 50. If the string does not start with a number, it is treated as 0.
You should check perlop for more info.
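Here is a quick demonstration of that conversion (my own snippet, using two of the data lines above as literal strings):
use strict;
use warnings;

# Each string numifies to its leading number, so this adds 50 + 45.
# With warnings enabled, Perl also warns that the arguments
# "aren't numeric" because of the trailing text.
my $sum = '50 71 55 93 115' + '45 76 49 88 102';
print "$sum\n";    # prints 95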
The solution for this problem is to separate the numbers from every line, treat each of them individually and store them inside the array. This can be done simply using:
push @array, split;
Now when you try:
print Dumper(\@array);
It will be like this:
$VAR1 = [
'50',
'71',
'55',
'93',
'115',
'45',
'76',
'49',
'88',
'102',
'59',
'78',
'53',
'96',
'145',
'33',
'65',
'39',
'82',
'100',
'54',
'77',
'56',
'98',
'158'
];
After this just call your subroutine using:
my $total_sum = sum(@array);
print $total_sum,"\n";
and define your subroutine as:
sub sum {
    my @nums = @_;
    my $total_sum = 0;
    $total_sum += $_ foreach (@nums);
    return $total_sum;
}
The output will be 1937 as expected.
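As a side note, you do not have to write the sum routine yourself: the core module List::Util provides a sum function. A minimal sketch, assuming the same flattened @array as above:
use List::Util qw(sum);

# sum() returns undef for an empty list, so default to 0
my $total_sum = sum(@array) // 0;
print "$total_sum\n";    # 1937 for the sample data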

Array manipulation in Perl

The Scenario is as follows:
I have a dynamically changing text file which I'm reading into a variable to capture a pattern that occurs throughout the file. It looks something like this:
my @array1;
my $file = `cat <file_name>.txt`;
if ( @array1 = ( $file =~ m/<pattern_match>/g ) ) {
    print "@array1\n";
}
The array looks something like this:
10:38:49 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54 10:38:51 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54
From the above array1 output, the pattern of the array is something like this:
T1 P1 t1(1) t1(2)...t1(25) T2 P2 t2(1) t2(2)...t2(25) so on and so forth
Currently, /g in the regex returns a set of values that occurs only twice (only because the txt file contains the pattern that many times). The number of occurrences will change depending on the file that I pass in dynamically.
What I intend to achieve:
The final result should be a csv file that contains these values in the following format:
T1,P1,t1(1),t1(2),...,t1(25)
T2,P2,t2(1),t2(2),...,t2(25)
so on and so forth
For instance: My final CSV file should look like this:
10:38:49,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
10:38:51,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
The delimiter for this pattern is T1 which is time in the format \d\d:\d\d:\d\d
Example: 10:38:49, 10:38:51 etc
What I have tried so far:
use Data::Dumper;
use List::MoreUtils qw(part);

my $partitions = 2;
my $i = 0;
print Dumper part { $partitions * $i++ / @array1 } @array1;
In this particular case, my $partitions = 2; holds good since the pattern occurs in the txt file only twice, and hence I'm splitting the array in two. However, as mentioned earlier, the number of occurrences keeps changing according to the txt file I use.
The Question:
How can I make this code more generic, to achieve my final goal of splitting the array into multiple equal-sized arrays without losing the contents of the original array, and then converting these mini-arrays into one single CSV file?
If there is any other workaround for this other than array manipulation, please do let me know.
Thanks in advance.
PS: I considered a Hash of Hashes and an Array of Hashes, but that kind of data structure did not seem to be a healthy solution for the problem I'm facing right now.
As far as I can tell, all you need is splice, which will work fine as long as you know the record size and it's constant.
The data you showed has 52 fields, but the description of it requires 27 fields per record. It looks like each line has T, P, and t1 .. t24, rather than ending at t25.
Here's how it looks if I split the data into 26-element chunks
use strict;
use warnings 'all';

my @data = qw/
    10:38:49 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54 10:38:51 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54
/;

while ( @data ) {
    my @set = splice @data, 0, 26;
    print join(',', @set), "\n";
}
output
10:38:49,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
10:38:51,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
If you wanted to use List::MoreUtils instead of splice, the natatime function returns an iterator that will do the same thing as the splice above.
Like this
use List::MoreUtils qw/ natatime /;

my $iter = natatime 26, @data;
while ( my @set = $iter->() ) {
    print join(',', @set), "\n";
}
The output is identical to that of the program above
Note
It is very wrong to start a new shell process just to use cat to read a file. The standard method is to undefine the input record separator $/, like this:
my $file = do {
    open my $fh, '<', '<file_name>.txt' or die "Unable to open file for input: $!";
    local $/;
    <$fh>;
};
Or if you prefer you could use File::Slurper like this
use File::Slurper qw/ read_binary /;
my $file = read_binary '<file_name>.txt';
although you will probably have to install it as it is not a core module
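A final thought: if the record length ever varies, counting 26 fields will break. A workaround (my own sketch, not part of the solutions above) is to split on the timestamp delimiter the question mentions, since every record starts with a \d\d:\d\d:\d\d time:
# Re-join the flat list and split it before every hh:mm:ss timestamp;
# each chunk is then one complete record, whatever its length.
my $flat = join ' ', @data;
for my $record ( split /(?=\d\d:\d\d:\d\d)/, $flat ) {
    print join( ',', split ' ', $record ), "\n";
}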

How could I print @slice array elements in Perl?

I have this piece of code to catch the greatest value of an array nested in a hash. When Perl identifies the biggest value, the array is moved into the @slice array:
if ( max(map $_->[1], @$val) ) {
    my @slice = (@$val[1]);
    my @ignored = @slice;
    delete(@$val[1]);
    print "$key\t @ignored\n";
    warn Dumper \@slice;
}
Data::Dumper output:
$VAR1 = [
[
'3420',
'3446',
'13',
'39',
55
]
];
I want to print that information separated by tabs (\t) on one line, like this list:
miRNA127 dvex589433 - 131 154
miRNA154 dvex546562 + 232 259
miRNA154 dvex573491 + 297 324
miRNA154 dvex648254 + 147 172
miRNA154 dvex648254 + 287 272
miRNA32 dvex320240 - 61 83
miRNA32 dvex623745 - 141 163
miRNA79 dvex219016 + ARRAY(0x100840378)
But on the last line I always obtain this result.
How could I generate this output instead?
miRNA127 dvex589433 - 131 154
miRNA154 dvex546562 + 232 259
miRNA154 dvex573491 + 297 324
miRNA154 dvex648254 + 147 172
miRNA154 dvex648254 + 287 272
miRNA32 dvex320240 - 61 83
miRNA32 dvex623745 - 141 163
miRNA79 dvex219016 + 3420 3446
Additional explanation:
In this case, I want to catch the highest value in $VAR->[1] and check whether the difference from the minimum in $VAR->[0] is <= 55. If not, I need to eliminate this AoA (the highest value) and fill an @ignored array with it. Next, I want to print some values of @ignored, as a list. Then, with the remaining AoAs, I want to repeat the same flow...
print "$key\t $ignored[0]->[0]\t$ignored[0]->[1]";
You have an array of arrays, so each element of @ignored is an array reference. The notation $ignored[0] gets the zeroth element (an array reference), and ->[0] and ->[1] retrieve the zeroth and first elements of that array.
For example:
use strict;
use warnings;
use Data::Dumper;
my @ignored;
$ignored[0] = [ '3420', '3446', '13', '39', 55 ];
my $key = 'miRNA79 dvex219016 +';
print Dumper \@ignored;
print "\n";
print "$key\t$ignored[0]->[0]\t$ignored[0]->[1]";
Output:
$VAR1 = [
[
'3420',
'3446',
'13',
'39',
55
]
];
miRNA79 dvex219016 + 3420 3446
Another option that generates the same output is to join all the values with a \t:
print join "\t", $key, #{ $ignored[0] }[ 0, 1 ];
