Trimming an array (filled via DBI) - sql-server

I'm trying to read data from a SQL Server database using Perl and the DBI module. My intention is to read the data and write it to a comma-separated text file. When I do this, I get output like this:
var1,var2,var3
40406,20 ,783
50230,78 ,680
50230,78 ,680
50230,78 ,680
50230,78 ,680
So there is whitespace between the second value and the comma. I tried to trim it using the code below, but it did not work. How should I modify my code to get rid of that whitespace?
My code is here:
#!/bin/perl
use warnings;
use strict;
use DBI;
sub trim;
my $dbs = "dbi:ODBC:DRIVER={SQL Server};SERVER={xxxx}";
my ($username, $password) = ('un', 'pwd');
my $dbh = DBI->connect($dbs, $username, $password)
or die "Can't connect to $dbs: $DBI::errstr";
my $sth = $dbh->prepare("select var1, var2, var3 from db.dbo.table")
or die "Can't prepare statement: $DBI::errstr";
$sth->execute();
my $outfile = 'temp.txt';
open OUTFILE, '>', $outfile or die "Unable to open $outfile: $!";
print OUTFILE join(",", @{$sth->{NAME}}), "\n";
while (my @re = $sth->fetchrow_array) {
print OUTFILE join(",", trim(@re)), "\n";
}
close OUTFILE;
$sth->finish();
$dbh->disconnect();
############## subroutines ##################
sub trim($) {
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}

Your trim() function does not modify the list in place, nor does it handle a list: it shifts off a single value and returns a single trimmed string.
So, in true TIMTOWTDI fashion, you can either rewrite the function to return a new array:
sub trimArray {
my @arr = @_;
my @rv;
for my $val (@arr) {
$val =~ s/^\s+//;
$val =~ s/\s+$//;
push @rv, $val;
}
return @rv;
}
#and then
print OUTFILE join(",", trimArray(@re)), "\n";
or pass a reference to your function and then modify the array in place
sub trimInPlace {
my $arrRef = shift;
for my $val (@$arrRef) {
$val =~ s/^\s+//;
$val =~ s/\s+$//;
}
}
#and then
trimInPlace(\@re); #Note the \
print OUTFILE join(",", @re), "\n";
or use map
#!/bin/perl
use warnings;
use strict;
use DBI;
#... the same
while (my @re = $sth->fetchrow_array) {
print OUTFILE join(",", map { trim($_) } @re), "\n"; # Applies trim() to each element
}
#...
############## subroutines ##################
sub trim { #Don't use prototypes
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
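On Perl 5.14 or later (an assumption about your perl version), the /r modifier makes s/// return a modified copy instead of changing its operand, so the map version can drop the helper sub entirely. A minimal sketch, with a hard-coded row standing in for $sth->fetchrow_array so it runs without a database:

```perl
use strict;
use warnings;

# Hypothetical stand-in for one row returned by $sth->fetchrow_array.
my @re = ('40406', '20 ', '783');

# s///r returns a trimmed copy; @re itself is left untouched.
# The alternation strips leading and trailing whitespace in one pass.
my @trimmed = map { s/^\s+|\s+$//gr } @re;

print join(',', @trimmed), "\n";
```

If a column can be NULL, DBI returns undef for it and `use warnings` will complain about the substitution; filter or default those values first.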
or use chomp after setting $/ to a single space; chomp then removes exactly one trailing space from each element, nothing more.
#!/bin/perl
use warnings;
use strict;
use DBI;
#... the same
my $old_sep = $/;
$/ = " ";
while (my @re = $sth->fetchrow_array) {
chomp(@re); #Modifies in place, returning the number of characters removed
print OUTFILE join(",", #re), "\n";
}
$/ = $old_sep;
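Saving and restoring $/ by hand is easy to get wrong, since an early `die` or `last` skips the restore. Declaring it with `local` inside a block restores it automatically. A minimal sketch, again with a hypothetical hard-coded row in place of the DBI fetch:

```perl
use strict;
use warnings;

my @row = ('40406', '20 ', '783');   # hypothetical fetched row

{
    local $/ = ' ';   # chomp now removes one trailing space
    chomp(@row);      # modifies @row in place
}                     # $/ reverts to its previous value here

print join(',', @row), "\n";
```

Note that a value padded with two or more spaces keeps all but one of them; the regex-based trim handles arbitrary padding.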

You could also check to see if DBD::ODBC supports the ChopBlanks attribute:
my $dbh = DBI->connect($dbs, $username, $password, { ChopBlanks => 1 } )
The ChopBlanks attribute trims trailing spaces from fixed-width CHAR fields, if your driver supports it (I'm not sure whether DBD::ODBC does).

Why does that field have trailing whitespace? Usually that points to some sort of problem with the database model, for example a fixed-width CHAR column that pads shorter values with spaces. Besides fixing your trim() function, you might investigate why the data is dirty.

Related

Convert string into array perl

I have a script which takes headers of a multi-fasta file and pushes them into an array. Then I want to loop through this array to find a specific pattern and perform some commands.
open(FH, '<', $ref_seq) or die $!;
while(<FH>){
$line = $_;
chomp $line;
if(m/^>([^\s]+)/){
$ref_header = $1;
print "$ref_header\n";
chomp $header;
if($1 eq $header){
$ref_header = $header;
#print "header is $ref_header\n";
}
}
}
This code prints headers like
chr1
chr2
chr3
How can I push these headers into an array?
I tried the following code, but it splits the string into individual letters instead of making $header_array[0] equal to chr1:
@header_array = split(/\n*/, $ref_header);
print ("Here's the first element $header_array[0]");
Any help will be appreciated.
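Shorten the code as shown below by removing the extra statements, and use push. You can combine push and the pattern match, because in list context a successful match returns its captures and a failed match returns an empty list: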
#!/usr/bin/env perl
use strict;
use warnings;
use Carp;
my $in_file = shift;
my @headers;
open my $in_fh, '<', $in_file or croak "cannot open $in_file: $!";
while ( <$in_fh> ) {
push @headers, /^>(\S+)/;
}
close $in_fh or croak "cannot close $in_file: $!";
print "@headers\n";
# Now, loop through headers and select the ones you need, for example:
for my $header ( @headers ) {
if ( $header =~ /foo/ ) {
# do something
}
}
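The push-plus-match idiom can be tried without a file on disk by opening a filehandle on an in-memory string. A minimal sketch; the FASTA content below is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical multi-FASTA content, read through an in-memory filehandle.
my $fasta = ">chr1 first\nACGT\n>chr2\nGGCC\n>chr3 third\nTTAA\n";
open my $in_fh, '<', \$fasta or die "cannot open in-memory file: $!";

my @headers;
while (<$in_fh>) {
    # In list context the match returns its capture on success
    # and an empty list on failure, so sequence lines add nothing.
    push @headers, /^>(\S+)/;
}
close $in_fh;

print "First header: $headers[0]\n";
```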
A few suggestions for fixing your original code:
# Always use strict and use warnings.
# Remove extra parens and make the error message more informative:
open(FH, '<', $ref_seq) or die $!;
while(<FH>){
$line = $_;
chomp $line;
# [^\s] is simply \S:
if(m/^>([^\s]+)/){
$ref_header = $1;
print "$ref_header\n";
# where is $header coming from?
chomp $header;
# if the condition is satisfied, this assignment does not make sense:
# $ref_header is already the same as $header:
if($1 eq $header){
$ref_header = $header;
#print "header is $ref_header\n";
}
}
}
You can use push:
push @header_array, $ref_header;
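Why the original split produced single letters: /\n*/ can match the empty string, and split with a pattern that matches the empty string separates every character. Pushing each captured header instead keeps each one whole. A small demonstration with hypothetical input lines:

```perl
use strict;
use warnings;

# /\n*/ matches zero newlines, i.e. the empty string, between characters.
my @letters = split /\n*/, 'chr1';   # ('c', 'h', 'r', '1')

# Pushing inside the loop stores one whole header per element.
my @header_array;
for my $line (">chr1 a\n", "ACGT\n", ">chr2 b\n", "GGCC\n") {
    if ($line =~ m/^>(\S+)/) {
        push @header_array, $1;
    }
}

print "Here's the first element $header_array[0]\n";
```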

Perl and Foreach loop with Splitting

I have a problem with a foreach loop and splitting in Perl.
I want to loop over the array and split each element into a name and a value.
1.) I read the file and save it to a string.
2.) I split on empty lines and save the pieces to an array.
My Script:
#!/usr/bin/perl
use strict;
my $pathconfigfile = 'config.conf';
my @configline;
open(my $configfile, "<", $pathconfigfile);
local $/;
my @configdata = split("\n\n",<$configfile>);
#print $configdata[0], "\n";
#print $configdata[1], "\n";
#print $configdata[2], "\n";
foreach my $data (@configdata){
my @editing = split /#/, $data;
my ($name, $value) = @editing[0,1];
print $name "\n";
print $value "\n";
}
close $configfile;
Configfile:
Testingtttttttttttttttttttttttt
############################################
0987654345678909876MN09uz6t56789oiuhgölkjhgfr
0987654323456789098765fgnloiuztlkjhgfrtzuiknb
MegaMixoiuzt
############################################
09876543457890098765NSUDlkjhzgtfr67899ztz9098
098765435678987t87656789876567898765679097658
TESTINGPARTS
############################################
0987654567890098765hzzasza654567uhgdjdjfacdaa
9876545678987654mchfuiaq754567898765434567876
My desired result:
$name = Testingtttttttttttttttttttttttt
$value = 0987654345678909876MN09uz6t56789oiuhgölkjhgfr
0987654323456789098765fgnloiuztlkjhgfrtzuiknb
$name = MegaMixoiuzt
$value = 09876543457890098765NSUDlkjhzgtfr67899ztz9098
098765435678987t87656789876567898765679097658
$name = TESTINGPARTS
$value = 0987654567890098765hzzasza654567uhgdjdjfacdaa
9876545678987654mchfuiaq754567898765434567876
split /#/ # A "#" separates the two.
should be
split /\n#+\n/ # A line of "#" separates the two.
With other improvements:
#!/usr/bin/perl
use strict;
use warnings;
my $config_qfn = 'config.conf';
open(my $config_fh, "<", $config_qfn )
or die("Can't open \"$config_qfn\": $!\n");
local $/ = ""; # Paragraph mode
while (my $rec = <$config_fh>) {
my ($name, $value) = split(/\n#+\n/, $rec);
print "\$name = $name\n";
print "\$value = $value\n";
}
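A self-contained variant of the paragraph-mode approach, sketched against a hypothetical in-memory config with shortened names. It adds two small refinements: `chomp`, which in paragraph mode strips all trailing newlines from the record, and a split limit of 2 so a stray separator line inside a value cannot create extra fields:

```perl
use strict;
use warnings;

# Hypothetical config: records separated by a blank line,
# name and value separated by a line of '#'.
my $config = "Alpha\n####\nv1a\nv1b\n\nBeta\n####\nv2a\nv2b\n";
open my $config_fh, '<', \$config or die "Can't open in-memory config: $!";

my %value_for;
{
    local $/ = '';    # paragraph mode, restored when the block ends
    while (my $rec = <$config_fh>) {
        chomp $rec;   # in paragraph mode this strips all trailing newlines
        my ($name, $value) = split /\n#+\n/, $rec, 2;
        $value_for{$name} = $value;
    }
}
close $config_fh;

print "\$name = $_\n\$value = $value_for{$_}\n" for sort keys %value_for;
```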
Also something like this without foreach:
use strict;
use warnings;
open my $fh, '<config.conf' or die "$!"; my $data = join '', <$fh>; close $fh;
my %hash = $data =~ /^(.+)\n#+\n(\S+\n\S+)/mg;
print "NAME: $_\nVALUE: $hash{$_}\n\n" for keys %hash
You want to split on a run of # characters, so use #+ (+ matches one or more times).
Try this:
#!/usr/bin/perl
use strict;
my $pathconfigfile = 'config.conf';
my @configline;
open(my $configfile, "<", $pathconfigfile);
local $/;
my @configdata = split("\n\n",<$configfile>);
foreach my $data (@configdata){
my ($name,$value) = split /\n#+\n/, $data;
print "$name $value\n\n";
}
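To see why the pattern matters, compare the two splits on a single hypothetical record: /#/ splits at every individual # (leaving empty fields between consecutive ones), while /\n#+\n/ splits once, at the whole separator line:

```perl
use strict;
use warnings;

# Hypothetical record in the question's layout.
my $rec = "MegaMixoiuzt\n#####\nline1\nline2";

my @by_char = split /#/, $rec;       # splits at every single '#'
my @by_line = split /\n#+\n/, $rec;  # splits once, at the whole '#' line

printf "split /#/ gives %d fields; split /\\n#+\\n/ gives %d\n",
    scalar @by_char, scalar @by_line;
```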

To remove duplicate elements from an array in Perl

I have a data set:
10-101570715-101609901-hsa-mir-3158-1 10-101600739-101609661-ENSG00000166171 10-101588288-101609668-ENSG00000166171 10-101588325-101609447-ENSG00000166171 10-101594702-101609439-ENSG00000166171 10-101570560-101596651-ENSG00000166171
10-103389007-103396515-hsa-mir-1307 10-103389041-103396023-ENSG00000173915 10-103389050-103396074-ENSG00000173915 10-103389050-103396441-ENSG00000173915 10-103389050-103396466-ENSG00000173915 10-103389050-103396466-ENSG00000173915
After the first element in each line there are multiple values, some of them duplicates, and I want to remove the duplicates. I have written some code, but I don't think it is working correctly.
open (fh, "file1");
while ($line=<fh>)
{
chomp ($line);
@array=$line;
my @unique = ();
my %Seen = ();
foreach my $elem ( @array )
{
next if $Seen{ $elem }++;
push @unique, $elem;
}
print @unique;
}
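Your code assigns the whole line to @array as a single element, so there is nothing to deduplicate; split the line first. A hash is the usual tool for duplicate detection:
my %seen;
my @removeduplicate = grep { !$seen{$_}++ } @array;
For me, the code below works fine: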
use strict;
use warnings;
open my $fh, "<", 'file.txt' or die "couldn't open : $!";
while ( my $line = <$fh>)
{
chomp $line;
my %seen; # reset for every line, so duplicates are removed per line
my @array = split (' ', $line);
my @removeduplicate = grep { !$seen{$_}++ } @array;
print "@removeduplicate\n";
}
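The same grep idiom can be wrapped in a small sub; declaring %seen inside the sub means every call (every line) starts with an empty hash. A minimal sketch, where `uniq_keep_first` is a hypothetical name and the input line is a shortened stand-in for the question's data. Recent versions of List::Util also provide a `uniq` function that keeps the first occurrence of each value.

```perl
use strict;
use warnings;

# Keep the first occurrence of each value, preserving order.
# %seen is declared inside the sub, so each call starts fresh.
sub uniq_keep_first {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Hypothetical line in the question's format (shortened).
my $line = "mir-A gene-1 gene-2 gene-1 gene-2";
my @unique = uniq_keep_first(split ' ', $line);
print "@unique\n";
```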

Perl Hashes of Arrays and Some issues

I currently have a csv file that looks like this:
a,b
a,d
a,f
c,h
c,d
So I saved these into a hash such that the key "a" maps to an array containing b, d, f and the key "c" maps to an array containing h, d. This is what I used:
while(<$fh>)
{
chomp;
my @row = split /,/;
my $cat = shift @row;
$category = $cat if (!($cat eq $category)) ;
push @{$hash{$category}}, @row;
}
close($fh);
I'm not sure about the efficiency, but it seems to work when I dump the data.
Now, the issue I'm having is this: I want to create a new file for each key, and in each file print every element of that key's array, one per line:
file "a" would look like this:
b
d
f
<end of file>
Any ideas? Nothing I've tried works; I'm not very familiar or experienced with hashes.
Thanks in advance :)
The output step is simple with the each iterator, which returns the key and value of the next hash element in a single call:
use strict;
use warnings;
use autodie;
open my $fh, '<', 'myfile.csv';
my %data;
while (<$fh>) {
chomp;
my ($cat, $val) = split /,/;
push @{ $data{$cat} }, $val;
}
while (my ($cat, $values) = each %data) {
open my $out_fh, '>', $cat;
print $out_fh "$_\n" for @$values;
}
#!/usr/bin/perl
use strict;
use warnings;
my %foos_by_cat;
{
open(my $fh_in, '<', ...) or die $!;
while (<$fh_in>) {
chomp;
my ($cat, $foo) = split /,/;
push @{ $foos_by_cat{$cat} }, $foo;
}
}
for my $cat (keys %foos_by_cat) {
open(my $fh_out, '>', $cat) or die $!;
for my $foo (@{ $foos_by_cat{$cat} }) {
print($fh_out "$foo\n");
}
}
I wrote the inner loop as I did to show the symmetry between reading and writing, but it can also be written as follows:
print($fh_out "$_\n") for @{ $foos_by_cat{$cat} };
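A runnable end-to-end sketch of the same read-then-write pattern. To stay self-contained it reads the question's sample CSV from an in-memory string and writes the per-key files into a temporary directory (both are assumptions for illustration; use your real paths instead). Sorting the keys simply makes the write order predictable:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# The question's sample data, read from an in-memory filehandle.
my $csv = "a,b\na,d\na,f\nc,h\nc,d\n";
open my $in, '<', \$csv or die "Can't open in-memory CSV: $!";

my %vals_by_key;
while (my $line = <$in>) {
    chomp $line;
    my ($key, $val) = split /,/, $line, 2;
    push @{ $vals_by_key{$key} }, $val;   # autovivifies the array ref
}
close $in;

# Temporary output directory, removed automatically at program exit.
my $out_dir = tempdir(CLEANUP => 1);
for my $key (sort keys %vals_by_key) {
    my $path = File::Spec->catfile($out_dir, $key);
    open my $out, '>', $path or die "Can't write $path: $!";
    print {$out} "$_\n" for @{ $vals_by_key{$key} };
    close $out or die "Can't close $path: $!";
}
```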

problem with the code in perl

My problem is that I can't figure out why my code takes each line of the file as a separate array element instead of taking each whole record, from AD to SS, as one element. As you can see, each record in my file starts with AD and ends with SS, and the same holds for all the records that follow. I want each array element to hold a whole record, including all the lines between AD and SS (BC..., EG..., FA..., etc.), not a single line. I tried my approach, but I get the same file back unchanged. Could anyone check my code? Thanks in advance.
AD uuu23
BC jjj
EG iii
FA vvv
SS
AD hhh25
BC kkk
EG ppp
FA aaa
SS
AD ttt26
BC xxx
FA rrr
SS
#!/usr/bin/env perl
use strict;
use warnings;
my $ifh;
my $line = '';
my @data;
my $ifn = "fac.txt";
open ($ifh, "<$ifn") || die "can't open $ifn";
my $a = "AD ";
my $b = "SS ";
my $_ = " ";
while ($line = <$ifh>)
{
chomp
if ($line =~ m/$a/g); {
$line = $_;
push @data, $line;
while ($line = <$ifh>)
{
$line .= $_;
push @data, $line;
last if
($line =~ m/$b/g);
}
}
push @data, $line; }
print @data;
If I understand your problem correctly, the issue is the way you are reading the file:
while ($line = <$ifh>)
is inherently line-by-line. It uses the input record separator variable ($/) to decide where each line ends. One easy way to change this behavior is to undefine $/:
my $oldTerminator = $/;
undef $/;
....... <your processing here>
$/ = $oldTerminator;
Then the whole file is read as a single line, although I am not sure how the rest of your code would behave.
Another approach is the following (keeping in mind what I said about the fact that you are reading the file line-by-line): instead of doing
`push @data, $line;`
at each iteration of your loop, just accumulate the lines you read in a variable
$line .= $_;
(as you already do), and push only once, at the end. This second approach is easier to apply to your code: remove the push statements inside the loop and put a single one outside it.
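One way to finish the undef $/ idea (an illustration, not the answerer's code): slurp the whole file, then let a regex pull out each AD ... SS block. The input below is a shortened, hypothetical version of the question's data, read from an in-memory filehandle:

```perl
use strict;
use warnings;

# Hypothetical input in the question's AD ... SS layout.
my $text = "AD uuu23\nBC jjj\nFA vvv\nSS\nAD hhh25\nBC kkk\nSS\n";
open my $ifh, '<', \$text or die "can't open in-memory input: $!";

# Slurp the whole file; local restores $/ when the do block ends.
my $content = do { local $/; <$ifh> };
close $ifh;

# /m: ^ and $ match at line boundaries; /s: . also matches "\n".
# Each match runs lazily from an AD line to the next SS line.
my @data = $content =~ /^(AD\b.*?^SS)$/msg;

print scalar(@data), " records\n";
```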
I believe part of your problem is here
chomp
if ($line =~ m/$a/g);
it should be
chomp;
if ($line =~ m/$a/g)
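otherwise the code does not do what you intended: without the semicolon after chomp, `chomp if (...)` is parsed as a statement modifier, the stray semicolon ends that statement, and the `{ ... }` that follows is a bare block that always executes. Please update your question if this has helped you advance.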
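A tiny demonstration of that parse, using a hypothetical input line that does not match: the chomp is skipped, yet the block still runs.

```perl
use strict;
use warnings;

my $ran_block = 0;
$_ = "BC jjj\n";          # a line that does NOT match /AD /

# Parsed as: chomp($_) if /AD /;  -- the semicolon ends the statement.
chomp if /AD /;
{                         # ...so this is a bare block, not the if's body,
    $ran_block = 1;       # and it runs whether or not the match succeeded.
}

print "block ran: $ran_block\n";
```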
Here's a way to accomplish reading the records into an array, with newlines removed:
Code:
use strict;
use warnings;
use autodie;
my @data;
my $record;
my $file = "fac.txt";
open my $fh, '<', $file;
while (<$fh>) {
chomp;
if (/^AD /) { # new record starts
$record = $_;
while (<$fh>) {
chomp;
$record .= $_;
last if /^SS\s*/;
}
push @data, $record;
} else { die "Data outside record: $_" }
}
use Data::Dumper;
print Dumper \@data;
Output:
$VAR1 = [
'AD uuu23BC jjjEG iiiFA vvvSS',
'AD hhh25BC kkkEG pppFA aaaSS',
'AD ttt26BC xxxFA rrrSS'
];
This is another version, using the input record separator $/:
use strict;
use warnings;
use autodie;
my $file = "fac.txt";
open my $fh, '<', $file;
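my @data;
local $/ = "\nSS"; # each record ends with SS at the start of a line
while (<$fh>) {
s/\n//g;
next unless length; # skip the lone "\n" left after the final SS
push @data, $_;
}
use Data::Dumper;
print Dumper \@data;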
This produces the same output with this data. It ignores the record-start marker and relies only on the end marker, SS at the beginning of a line.
