Transform/pivot array in perl - arrays

Im stuck writing Perl code which transforms 2d array.
First column of the array is always date
Second column of the array is key that sorts.
Data is located in array "data" and is ordered by date and then key.
My situation should be understandable from the tables under. Unique values from the second column will be selected and later divided into columns header (green table)
It should work with and number of columns or dates/keys.
Structure before
Structure after
My code:
#creates filtered array of all the unique dates and its count
my #date = #{ $data->[0] };
my #filtDate = uniq #date;
my $countFiltDate = scalar #filtDate;
#unique list of keys
my #klic = #{ $data->[1] };
my #filtKlic = uniq #klic;
#orders filtered keys
#filtKlic = sort #filtKlic;
my $countFiltKlic = scalar #filtKlic;
#count of columns
my $columnsCount = scalar #{$data};
#test code - counts how many new number columns to make.
my $columnsCountAfter = ( $columnsCount - 2 ) * $countFiltKlic;
#inserst filtered dates into first column
my $dataGraph;
for ( my $i = 0; $i < $countFiltDate; $i++ ) {
$dataGraph->[0]->[$i] = #filtDate[$i];
}
#biggest loop with number of dates
for ( my $k = 0; $k < $countFiltDate; $k++ ) {
my $l;
my $c;
#columns sount k $i
for ( my $i = 0; $i < $columnsCount - 2; $i++ ) {
#loop for different keys k $j
for ( my $j = 0; $j < $countFiltKlic; $j++ ) {
$l++; #riadok v prvej tabulke
#EVERYTHING after this part is written horibly.
# I'm trying to make it work even
#if key values are missing.
for ( my $m = 0; $m < 5; $m++ ) {
if ( $data->[1]->[ $l - 1 ] eq $filtKlic[$j] ) {
print " [" . $data->[1]->[ ( $l - 1 ) ] . ',' . $filtKlic[$j] . "]";
$dataGraph->[ $l + $c ]->[$k] = $data->[ $i + 2 ]->[ ( ( $k * $countFiltKlic ) + $j ) ];
#print " [".$data->[1]->[$j].','.($filtKlic[($j)])."]-";
print " [" . ( $i + 2 ) . ',' . ( ( $k * $countFiltKlic ) + $j ) . "]-";
print " [" . ( $l + $c ) . ',' . $k . "]<br>";
$m = 5; #just random number... i don't want to get infinite loops during testing
} else {
if ( $m = 5 ) {
$l--;
$c++;
}
$j++;
}
}
}
}
}
my #nameStlpceKlic;
#nameStlpceKlic[0] = "date";
my $o;
for ( my $i = 0; $i < $columnsCount - 2; $i++ ) {
foreach (#filtKlic) {
my $o;
$o++;
#nameStlpceKlic[$o] = #filtKlic[ ( $o - 1 ) ] . $i;
}
}
I have 2 problems.
How to make sure that this will work even if some of the key are missing at some dates.
How to write it properly. My code is too clumsy.

Here is my general approach for solving this kind of problem.
In the second table, you're grouping your data by the date, then displaying the values for number1 and the values for number2. This should give you a hint as to how you want to organise your data structure and what you need to index for printing.
Your current data is (I assume) stored in an array of arrays. I was too lazy to copy the values, so I made my own AoA with made up values. I've sprinkled comments through the code so you can see how I worked on this.
my $arr = [
['date','key','number1','number2'],
['22.12.2013','1a','1a1-34567','1a2-1234567'],
['22.12.2013','2b','2b1-3249871','2b2-4597134'],
['22.12.2013','3c','3c1-1234567',''],
['22.12.2013','4d','4c1-3249871','4c2-4597134'],
['22.13.2013','1a','1a1-34567','1a2-1234567'],
['22.13.2013','2b','','2b2-4597134'],
['22.13.2013','3c','3c1-1234567','3c2-1234567'],
['22.13.2013','4d','4c1-3249871','4c2-4597134'],
];
# first, remove the first row, which contains the column headers.
my $col_h = shift #$arr;
my $data;
my $key_list;
foreach (#$arr) {
my %hash;
# use a hash slice with the column header array as keys
# and the array as the values
#hash{#$col_h} = #$_;
# store this hash in a data hash indexed by date then key
$data->{ $hash{date} }{ $hash{key} } = \%hash;
# compile a separate hash with the keys in it
$key_list->{ $hash{key} }++;
}
# make a sorted list of keys, ready for printing
my #key_list = sort keys %$key_list;
# remove the first two elements from the column headers ('date' and 'key')
splice(#$col_h, 0, 2);
# print out the header row for the table (I'm doing a simple tab-delim'd table)
print STDERR "Date\t";
# for each NUMBER from NUMBER1 ... NUMBERn
foreach my $c (#$col_h) {
# print "keyID NUMBERn"
map { print STDERR "$_ $c\t" } #key_list;
}
print STDERR "\n";
# Now print out the data itself. Sort by date...
foreach my $date (sort keys %$data) {
print STDERR "$date\t";
# for each NUMBER1 ... NUMBERn
foreach my $header (#$col_h) {
foreach my $key (#key_list) {
## print out the value OR - if there is no value
print STDERR ( $data->{$date}{$key}{$header} || "-" ) . "\t";
}
}
print STDERR "\n"; # end of the table row
}
Output (with tabs expanded for display purposes):
Date 1a number1 2b number1 3c number1 4d number1 1a number2 2b number2 3c number2 4d number2
22.12.2013 1a1-34567 2b1-3249871 3c1-1234567 4c1-3249871 1a2-1234567 2b2-4597134 - 4c2-4597134
22.13.2013 1a1-34567 - 3c1-1234567 4c1-3249871 1a2-1234567 2b2-4597134 3c2-1234567 4c2-4597134

I was able to put together code that works using great answer from "i alarmed alien" .
First thing that is different is that my data are formatted as array of arrays in transposed way.
$arr1 = [ '2013-12-22', '2013-12-22' ];
$arr2 = [ 'Number1','Number2'];
$arr3 = [ '2328942', '679204'];
$arr4 = [ '1450398', '436713'];
Also transformed data should be saved in an array. I've written this piece of code. ( It's far from perfect, if there are any suggestions how to improve it further I'd be happy to hear those.)
####################
#transpose data
my $datas = $args{DATA};
my $headers = $args{HEADERS};
my #rows = ();
my #transposed = ();
for my $row (#$datas) {
for my $column (0 .. $#{$row}) {
push(#{$transposed[$column]}, $row->[$column]);
}
}
#################################
my #arr = #transposed;
# first, define headers.
my $col_h = $args{HEADERS};
my $data;
my $key_list;
foreach (#arr) {
my %hash;
# use a hash slice with the column header array as keys
# and the array as the values
#hash{#$col_h} = #$_;
# store this hash in a data hash indexed by date then key
$data->{ $hash{date} }{ $hash{key} } = \%hash;
# compile a separate hash with the keys in it
$key_list->{ $hash{key} }++;
}
# make a sorted list of keys, ready for printing
my #key_list = sort keys %$key_list;
# remove the first two elements from the column headers ('date' and 'key')
splice(#$col_h, 0, 2);
my #output;
my #header;
# print out the header row for the table (I'm doing a simple tab-delim'd table)
#print STDERR "Date\t";
push(#header, "Date\t");
# for each NUMBER from NUMBER1 ... NUMBERn
foreach my $c (#$col_h) {
# print "keyID NUMBERn"
map { push (#header,"$_ $c\t" )} #key_list;
#map { print STDERR "$_ $c\t" } #key_list;
}
#print STDERR "<br>";
push (#output,\#header );
my $row;
my $column;
# Now print out the data itself. Sort by date...
foreach my $date (sort keys %$data) {
#print STDERR "$date\t";
$row++;
my #line;
push(#line, "$date");
# for each NUMBER1 ... NUMBERn
foreach my $header (#$col_h) {
foreach my $key (#key_list) {
## print out the value OR - if there is no value
$column++;
push (#line,( $data->{$date}{$key}{$header} || "-" ) . "\t");
#print STDERR ( $data->{$date}{$key}{$header} || "-" ) . "\t";
}
}
print STDERR "<br>"; # end of the table row
$column = 0;
push (#output,\#line );
}
my $x = 1;
return #output;
}
This code works but it's little ugly. Please let me know If there is cleaner/better way to do this.

Related

Is there a built in Perl Function for finding duplicate subarrays(exact order) in an array?

Lets say the array is (1,2,3,4,5,6,7,8,9),
Another subarray is (2,3,4)
Is there a function to check if the subarray pattern(full exact order) exists within array?
In this case, it would return any indicator(index) that shows it exists.
Also would need to work for duplicates if there are multiple subarrays existing in the array like (4,2,3,4,2,3,4).
If it happens to match multiple times for example:
Array = (2,3,2,3,2,2,3,2)
Sub Array = (2,3,2)
Would just return starting index of matches in order: 0,2,5
Or if it removes, would result in (3,2)
Edit: Elements don't have to be num
There's no built-in method, but it's easy to write:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
# Takes two arrayrefs of numbers.
#
# Returns the first index in the first one where the second list appears, or
# -1 if not found.
sub find_sublist(++) {
my ($haystack, $needle) = #_;
my $nlen = #$needle;
my $hlen = #$haystack;
return -1 if $hlen == 0 || $nlen == 0;
HAYSTACK_POS:
for (my $n = 0; $n <= $hlen - $nlen; $n++) {
for (my $m = 0; $m < $nlen; $m++) {
if ($haystack->[$n + $m] != $needle->[$m]) {
next HAYSTACK_POS;
}
}
return $n;
}
return -1;
}
# Takes two arrayrefs of numbers.
#
# Returns a list of the starting indexes of the first list
# of every run of the second list. Returns an empty list if
# there are no matches.
sub find_sublists(++) {
my ($haystack, $needle) = #_;
my $nlen = #$needle;
my $hlen = #$haystack;
my #positions;
return #positions if $hlen == 0 || $nlen == 0;
HAYSTACK_POS:
for (my $n = 0; $n <= $hlen - $nlen; $n++) {
for (my $m = 0; $m < $nlen; $m++) {
if ($haystack->[$n + $m] != $needle->[$m]) {
next HAYSTACK_POS;
}
}
push #positions, $n;
}
return #positions;
}
# Takes two arrayrefs of numbers.
#
# Returns a new list that is the first one with every non-overlapping run of
# the second second list removed.
sub remove_sublists(++) {
my #haystack = #{$_[0]};
my $needle = $_[1];
while ((my $pos = find_sublist #haystack, $needle) != -1) {
splice #haystack, $pos, #$needle;
}
return #haystack;
}
my #list1 = (1,2,3,4,5,6,7,8,9);
my #list2 = (4,2,3,4,2,3,4);
my #list3 = (2,3,2,3,2,2,3,2);
say find_sublist(#list1, [2, 3, 4]); # Returns 1
say find_sublist([2,9,3,4], [2,3,4]); # Returns -1
my #positions = find_sublists(#list2, [2,3,4]); # 1,4
say join(",", #positions);
#positions = find_sublists(#list3, [2,3,2]); # 0,2,5
say join(",", #positions);
say join(",", remove_sublists(#list1, [2,3,4])); # 1,5,6,7,8,9
say join(",", remove_sublists(#list3, [2,3,2])); # 3,2
If the inputs are numbers representable by your perl's integers (as shown), you can use
# Indexes
my $pattern = pack "W*", #pattern;
my $array = pack "W*", #array;
my #indexes;
push #indexes, $-[0] while $array =~ /\Q$pattern/g;
# Removal
my $pattern = pack "W*", #pattern;
my $array = pack "W*", #array;
$array =~ s/\Q$pattern//g;
#array = unpack "W*", $array;
How it handles overlaps:
/---\ /---\ Removed
2,3,2 from 2,3,2,3,2,2,3,2
\---/ Not removed
Note that this also works if you can map the inputs to numbers.
my ( %map_f, #map_r );
for ( #array, #pattern ) {
if ( !exists{ $map{ $_ } } ) {
$map_f{ $_ } = #map_r;
push #map_r, $_;
}
}
my $pattern = pack "W*", #map_f{ #pattern };
my $array = pack "W*", #map_f{ #array };
$array =~ s/\Q$pattern//g;
#array = #map_r[ unpack "W*", $array ];
It's not the best algorithm, but it should be very fast by moving the work from Perl to the regex engine.

Add all values in array for each ID in Perl

I have this table:
NAME |12/31/2016|VALUE
AAA |1/31/2017 |10
AAA |2/1/2017 |20
AAA |2/2/2017 |30
AAA |2/3/2017 |40
AAA |2/4/2017 |50
NAME |2/9/2017 |VALUE
BBB |2/10/2017 |20
BBB |2/11/2017 |30
BBB |2/12/2017 |40
BBB |2/13/2017 |50
BBB |2/14/2017 |60
and this would be my desired output:
NAME |DATE |VALUE
AAA |12/31/2016 |150
AAA |1/31/2017 |140
AAA |2/1/2017 |120
NAME |DATE |VALUE
BBB |2/9/2017 |200
BBB |2/10/2017 |180
BBB |2/11/2017 |150
What I want to do is, for each of the valid symbols, (AAA, BBB) I want to have three rows.
For the first row of each column, I want all the values added,
For example, row 1 value for AAA:
10+20+30+40+50 = 150
then for row 2 I want to just add from the second value to the last.
For example row 2 value for AAA
20+30+40+50 = 140
and so on same goes for BBB.
I want to shift the dates down so that 12/31/2016 would match AAA, then get the first three dates for each row.
I currently have this code. but this doesn't do much. it just gives me a bunch of numbers.
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
use Data::Dumper;
sub uniq {
my %seen;
grep !$seen{$_}++, #_;
}
my %cashflow;
my %fields = (
ID => 0,
DATES => 1,
VALUE => 2,
);
my #total;
my #IDs;
my #uniqueIDs;
my #dates;
my #add;
my $i = 0;
my #values;
my $counter = 3;
open( FILE, "try.CSV" );
while ( my $line = <FILE> ) {
chomp( $line );
my #lineVals = split( /\|/, $line );
if ( $lineVals[ $fields{ID} ] !~ /^SYMBOL$/i ) {
push #IDs, $lineVals[ $fields{ID} ];
}
#uniqueIDs = uniq( #IDs );
#push all CASH FLOW AMOUNTS to #cashflow
if ( looks_like_number( $lineVals[ $fields{VALUE} ] ) ) {
$lineVals[ $fields{VALUE} ] =~ s/\r//;
push #total, $lineVals[ $fields{VALUE} ];
}
if ( $lineVals[ $fields{DATES} ] =~ /(\d{1,2})\/(\d{1,2})\/(\d{4})/ ) {
$lineVals[ $fields{DATES} ] = sprintf( '%04d%02d%02d', $3, $2, $1 );
}
$cashflow{ uc $lineVals[ $fields{ID} ] }{DATES} = $lineVals[ $fields{DATES} ];
$cashflow{ uc $lineVals[ $fields{ID} ] }{VALUE} = $lineVals[ $fields{VALUE} ];
foreach my $ID ( #uniqueIDs ) {
foreach my $symb ( keys %cashflow ) {
if ( $ID = $symb ) {
if ( looks_like_number( $lineVals[ $fields{VALUE} ] ) ) {
$lineVals[ $fields{VALUE} ] =~ s/\r//;
push #total, $lineVals[ $fields{VALUE} ];
my $i = 0;
my $grand = 0;
foreach my $val ( #total ) {
while ( $i < $counter ) {
$grand += $val;
print "$grand \n";
$i++;
}
shift #total;
}
}
}
}
}
}
close FILE;
I'm really stuck with this. I don't know what to do with the problem.
A possible solution:
#!perl
use strict;
use warnings;
sub trim {
my ($str) = #_;
s!\A\s+!!, s!\s+\z!! for $str;
$str
}
my $file = 'try.CSV';
open my $fh, '<', $file or die "$0: $file: $!\n";
my ($group_name, #dates, #values);
my $sum = 0;
my $print_group = sub {
return if !defined $group_name;
my $format = " %-6s|%-11s|%s\n";
printf $format, 'NAME', 'DATE', 'VALUE';
for my $date (#dates) {
printf $format, $group_name, $date, $sum;
$sum -= shift #values if #values;
}
};
while (my $line = readline $fh) {
my ($name, $date, $value) = map trim($_), split /\|/, $line;
if ($name eq 'NAME') {
$print_group->();
$group_name = undef;
#dates = $date;
#values = ();
$sum = 0;
next;
}
$group_name ||= $name;
push #dates, $date if #dates < 3;
push #values, $value if #values < 2;
$sum += $value;
}
$print_group->();
Let's go over it.
sub trim {
my ($str) = #_;
s!\A\s+!!, s!\s+\z!! for $str;
$str
}
A helper function for removing leading/trailing whitespace from a string. We're using ! as the s delimiter here because / breaks SO's syntax highlighting. Shrug.
my $file = 'try.CSV';
open my $fh, '<', $file or die "$0: $file: $!\n";
Open our input file. Note: We use a lexical variable ($fh) instead of a bareword filehandle, and we use 3-argument open. This is strongly recommended. We also check open's return value and produce a nice error message in case of failure, including both the name of the file that couldn't be opened ($file) and the reason for failing ($!).
my ($group_name, #dates, #values);
my $sum = 0;
We set up some state variables that we want to preserve across loop iterations. $group_name is the name of the group we're currently processing, #dates is the saved dates we've seen so far, #values is the saved values we've seen so far. $sum is a running sum of all the values in the current group, and it starts at 0.
my $print_group = sub {
return if !defined $group_name;
my $format = " %-6s|%-11s|%s\n";
printf $format, 'NAME', 'DATE', 'VALUE';
for my $date (#dates) {
printf $format, $group_name, $date, $sum;
$sum -= shift #values if #values;
}
};
A helper function for printing the output for a single group. If $group_name isn't set, we haven't processed any input for the current group yet, so we do nothing and return. Otherwise we print a NAME | DATE | VALUE header, followed by a row of data for each element in #dates. For each $date we output the current group name (e.g. AAA), $date, and the sum of values (all nicely formatted using printf). Initially $sum is the sum of all group values, but after the first iteration we start subtracting the values from #values: If the list of values in the input was x1, x2, x3, x4, ..., then $sum is initially x1 + x2 + x3 + x4 + ..., and that's what's printed in the first line of output. After that we subtract x1, so the next line gets x1 + x2 + x3 + x4 + ... - x1, which is x2 + x3 + x4 + .... After that we subtract x2, so the third row of data gets x3 + x4 + ....
while (my $line = readline $fh) {
my ($name, $date, $value) = map trim($_), split /\|/, $line;
Our main loop. We read a line of input, split it on |, and trim each field.
if ($name eq 'NAME') {
$print_group->();
$group_name = undef;
#dates = $date;
#values = ();
$sum = 0;
next;
}
If $name is 'NAME', this is the start of a new group. Print the output for the current group if any ($print_group->() does nothing if there is no current group), then reset our state variables back to initial values, except for #dates, which is filled with the $date value from the header row. Then start the next iteration of the loop because we're done with this line.
$group_name ||= $name;
push #dates, $date if #dates < 3;
push #values, $value if #values < 2;
$sum += $value;
If we get here, this line is not the start of a new group. We set $group_name if it hasn't been set yet. We add $date to our list of saved dates (but we only need 3 dates, so do nothing if we already have 3). We add $value to our list of saved values (but we only need 2 of them). Finally we add $value to our total $sum within the group.
}
$print_group->();
At the end of the loop we've also just finished processing a group, so we need to call $print_group here as well.
This will do as you ask. It reads the whole data file into an array of arrays and manipulates that array before printing it. The blocks are processed backwards from the end so that the other blocks remain in place when the trailing lines are deleted
This program expects the path to the input file as a parameter on the command line and writes the result to STDOUT
use strict;
use warnings 'all';
my #data = map [ /[^|\s]+/g ], <>;
# Make a list of the indices of all the header rows
my #headers = grep { $data[$_][0] eq 'NAME' } 0 .. $#data;
# Make a list of the indices of the first
# and last lines of all the data blocks
my #blocks = map {
[
$headers[$_] + 1,
$_ == $#headers ? $#data : $headers[$_+1] - 1
]
} 0 .. $#headers;
# Shift the second column down
# Replace the col2 header with 'DATE'
#
$data[$_][1] = $data[$_-1][1] for reverse 1 .. $#data;
$data[$_][1] = 'DATE' for #headers;
# Edit each block of data
#
for my $block ( reverse #blocks ) {
my ( $beg, $end ) = #$block;
# Calculate the block total
my $total = 0;
for ( $beg ... $end ) {
$total += $data[$_][2];
}
# Calculate the first three data values
for my $i ( $beg .. $beg + 2 ) {
my $next = $total - $data[$i][2];
$data[$i][2] = $total;
$total = $next;
}
# Remove everything except those three lines
splice #data, $beg+3, $end-$beg-2;
}
print join('|', #$_), "\n" for #data;
output
NAME|DATE|VALUE
AAA|12/31/2016|150
AAA|1/31/2017|140
AAA|2/1/2017|120
NAME|DATE|VALUE
BBB|2/9/2017|200
BBB|2/10/2017|180
BBB|2/11/2017|150

Perl find similar elements from two arrays

I would like to retrieve elements from #amplicon_exon array that contain similar element (like) to #failedamplicons array. Each element in #failedamplicons is unique and can only match a single element from #amplicon_exon. I've tried two for loops but get repeat values. Is there a better way of finding and retrieving similar values from the two arrays?
#failedamplicons: example:
OCP1_FGFR3_8.87
OCP1_AR_14.89
#amplicon_exon: example:
TEST_Focus_ERBB2_2:22:ERBB2:GENE_ID=ERBB2;PURPOSE=CNV,Hotspot;CNV_ID=ERBB2;CNV_HS=1
OCP1_FGFR3_8:intron:FGFR3:GENE_ID=FGFR3;PURPOSE=CNV;CNV_ID=FGFR3;CNV_HS=1
OCP1_CDK6_14:intron:CDK6:GENE_ID=CDK6;PURPOSE=CNV;CNV_ID=CDK6;CNV_HS=1
Here is two for loop code:
my $i = 0;
my $j = 0;
for ( $i = 0; $i < #amplicon_exon; $i++ ) {
for ( $j = 0; $j < #failedamplicons; $j++ ) {
my $fail_amp = ( split /\./, $failedamplicons[$j] )[0];
#print "the failed amp before match is $fail_amp\n";
if ( index( $amplicon_exon[$i], $fail_amp ) != -1 ) {
#print "the amplicon exon that matches $amplicon_exon[$i] and sample is $sample_id\n";
print "the failed amp that matches $fail_amp and sample is $sample_id\n";
my #parts = split /:/, $amplicon_exon[$i];
my $exon_amp = $parts[1];
next unless $parts[3] =~ /Hotspot/; #includes only Hotspot amplicons
my $gene_res = $parts[2];
my $depth = ( split /\./, $failedamplicons[$j] )[1];
my #total_amps = (
$run_name, $sample_id, $gene_res, $depth, $fail_amp, $run_date, $matrix_status
);
my $lines = join "\t", #total_amps;
push( #finallines, $lines );
}
}
}
split and grep are your friends, as is the idiomatic approach to iterating over a list. Simply iterate over the first array, extract just the part you want to match on (by using split to split the element on a . character, then taking only the first entry), then using a regex, grep for that part of the string in the second array from the beginning of the element up to the ::
for my $elem (#failedamplicons){
my $to_match = (split /\./, $elem)[0];
if (my ($matched) = grep {$_ =~ /^\Q$to_match:/} #amplicon_exon){
print "$matched\n";
}
}

Perl: Inserting values into specific columns of CSV file

I have CSV data of the form:
S.No,Label,Customer1,Customer2,Customer3...
1,label1,Y,N,Y
2,label2,N,Y,N
...
I need to reproduce the "label" to the left of "customer" columns marked with Y - and have nothing ("") to the left of columns marked with N.
Expected output:
S.No,Label,Customer1,Customer1,Customer2,Customer2,Customer3,Customer3...
1,label1,label1,Y,"",N,label1,Y
2,label2,"",N,label2,Y,"",N
When opened using Excel, it would look like this:
S.No Label Customer1 Customer1 Customer2 Customer2 Customer3 Customer3...
1 label1 label1 Y N label1 Y
2 label2 N label2 Y N
The two leftmost columns, referring to S.No and the original "Label" column, are constant.
What is the simplest way to do this? I tried the following code:
use strict;
use warnings;
my $nonIncludesFile = "nonIncludes.csv";
open(my $xfh, "+>", $nonIncludesFile) or warn "Unable to open $nonIncludesFile, $!";
chomp( my $header = <$xfh> );
my #names = split ",", $header;
my #names1;
my #fields;
my #fields1;
for(my $j=0; $j< scalar(#names); $j++)
{
$names1[$j] = $names[$j];
}
while(<$xfh>)
{
my $nonIncLine = $_;
$nonIncLine = chomp($nonIncLine);
#fields = split ",", $nonIncLine;
next if $. == 1; #skip the first line
for(my $i = 0; $i < scalar(#fields) -2; $i++) #Number of "customers" = scalar(#fields) -2
{
$fields1[0] = $fields[0];
$fields1[1] = $fields[1];
if('Y' eq $fields[ $i + 2 ])
{
$fields1[$i+2] = 'Y';
substr(#fields1, $i + 1, 0, $fields[1]); #insert the label to the left - HERE
}
else
{
$fields1[$i+2] = 'N';
substr(#fields1, $i + 1, 0, "");
}
}
}
print $xfh #names1;
print $xfh #fields1;
close($xfh);
This however complains of "substr outside of string" at the line marked by "HERE".
What am I doing wrong? And is there any simpler (and better) way to do this?
Something like this maybe?
#!/usr/bin/perl
use strict;
use warnings;
#read the header row
chomp( my ( $sn, $label, #customers ) = split( /,/, <DATA> ) );
#double the 'customers' column headings (one is suffixed "_label")
print join( ",", $sn, $label, map { $_ . "_label", $_ } #customers ), "\n";
#iterate data
while (<DATA>) {
#strip trailing linefeed
chomp;
#extract fields with split - note breaks if you've quoted commas inline.
my ( $sn, $label, #row ) = split /,/;
print "$sn,$label,";
#iterate Y/N values, and either prints "Y" + label, or anything else + blank.
foreach my $value (#row) {
print join( ",", $value eq "Y" ? $label : "", $value ),",";
}
print "\n";
}
__DATA__
S.No,Label,Customer1,Customer2,Customer3
1,label1,Y,N,Y
2,label2,N,Y,N
Assumes you don't have any fruity special characters (e.g. commas) in the fields, because it'll break if you do, and you might want to consider Text::CSV instead.
It is always much better to post some usable test data than write a something like this question
However, it looks like your data has no quoted fields or escaped characters, so it looks like you can just use split and join to process the CSV data
Here's a sample Perl program that fulfils your requirement. The example output uses your data as it is. Each line of data has to be processed backwards so that the insertions don't affect the indices of elements that are yet to be processed
use strict;
use warnings 'all';
use feature 'say';
while ( <DATA> ) {
chomp;
my #fields = split /,/;
for ( my $i = $#fields; $i > 1; --$i ) {
my $newval =
$. == 1 ? $fields[$i] :
lc $fields[$i] eq 'y' ? $fields[1] :
'';
splice #fields, $i, 0, $newval;
}
say join ',', #fields;
}
__DATA__
S.No,Label,Customer1,Customer2,Customer3...
1,label1,Y,N,Y
2,label2,N,Y,N
output
S.No,Label,Customer1,Customer1,Customer2,Customer2,Customer3...,Customer3...
1,label1,label1,Y,,N,label1,Y
2,label2,,N,label2,Y,,N

ID tracking while swapping and sorting other two arrays in perl

#! /usr/bin/perl
use strict;
my (#data,$data,#data1,#diff,$diff,$tempS,$tempE, #ID,#Seq,#Start,#End, #data2);
#my $file=<>;
open(FILE, "< ./out.txt");
while (<FILE>){
chomp $_;
#next if ($line =~/Measurement count:/ or $line =~/^\s+/) ;
#push #data, [split ("\t", $line)] ;
my #data = split('\t');
push(#ID, $data[0]);
push(#Seq, $data[1]);
push(#Start, $data[2]);
push(#End, $data[3]);
# push #$data, [split ("\t", $line)] ;
}
close(FILE);
my %hash = map { my $key = "$ID[$_]"; $key => [ $Start[$_], $End[$_] ] } (0..$#ID);
for my $key ( %hash ) {
print "Key: $key contains: ";
for my $value ($hash{$key} ) {
print " $hash{$key}[0] ";
}
print "\n";
}
for (my $j=0; $j <=$#Start ; $j++)
{
if ($Start[$j] > $End[$j])
{
$tempS=$Start[$j];
$Start[$j]=$End[$j];
$End[$j]=$tempS;
}
print"$tempS\t$Start[$j]\t$End[$j]\n";
}
my #sortStart = sort { $a <=> $b } #Start;
my #sortEnd = sort { $a <=> $b } #End;
#open(OUT,">>./trial.txt");
for(my $i=1521;$i>=0;$i--)
{
print "hey";
my $diff = $sortStart[$i] - $sortStart[$i-1];
print "$ID[$i]\t$diff\n";
}
I have three arrays of same length, ID with IDs (string), Start and End with integer values (reading from a file).
I want to loop through all these arrays and also want to keep track of IDs. First am swapping elements in Start with End if Start > End, then I have to sort these two arrays for further application (as I am negating Start[0]-Start[1] for each item in that Start). While sorting, the Id values may change, and as my IDs are unique for each Start and End elements, how can I keep track of my IDs while sorting them?
Three arrays, ID, Start and End, are under my consideration.
Here is a small chunk of my input data:
DQ704383 191990066 191990037
DQ698580 191911184 191911214
DQ724878 191905507 191905532
DQ715191 191822657 191822686
DQ722467 191653368 191653339
DQ707634 191622552 191622581
DQ715636 191539187 191539157
DQ692360 191388765 191388796
DQ722377 191083572 191083599
DQ697520 189463214 189463185
DQ709562 187245165 187245192
DQ540163 182491372 182491400
DQ720940 180753033 180753060
DQ707760 178340696 178340726
DQ725442 178286164 178286134
DQ711885 178250090 178250119
DQ718075 171329314 171329344
DQ705091 171062479 171062503
The above ID, Start, End respectively. If Start > End i swapped them only between those two arrays. But after swapping the descending order may change, but i want them in descending order also their corresponding ID for negation as explained above.
Don't use different arrays, use a hash to keep the related pieces of information together.
#!/usr/bin/perl
use warnings;
use strict;
use enum qw( START END );
my %hash;
while (<>) {
my ($id, $start, $end) = split;
$hash{$id} = [ $start < $end ? ($start, $end)
: ($end, $start) ];
}
my #by_start = sort { $hash{$a}[START] <=> $hash{$b}[START] } keys %hash;
my #by_end = sort { $hash{$a}[END] <=> $hash{$b}[END] } keys %hash;
use Test::More;
is_deeply(\#by_start, \#by_end, 'same');
done_testing();
Moreover, in the data sample you provided, the order of id's is the same regardless of by what you sort them.

Resources