I have the following code in my while loop and it is significantly slow. Any suggestions on how to improve it?
open IN, "<$FileDir/$file" || Err( "Failed to open $file at location: $FileDir" );
my $linenum = 0;
while ( $line = <IN> ) {
if ( $linenum == 0 ) {
Log(" This is header line : $line");
$linenum++;
} else {
$linenum++;
my $csv = Text::CSV_XS->new();
my $status = $csv->parse($line);
my #val = $csv->fields();
$index = 0;
Log("number of parameters for this file is: $sth->{NUM_OF_PARAMS}");
for ( $index = 0; $index <= $#val; $index++ ) {
if ( $index < $sth->{NUM_OF_PARAMS} ) {
$sth->bind_param( $index + 1, $val[$index] );
}
}
if ( $sth->execute() ) {
$ifa_dbh->commit();
} else {
Log("line $linenum insert failed");
$ifa_dbh->rollback();
exit(1);
}
}
}
By far the most expensive operation there is accessing the database server; it's a network trip, hundreds of milliseconds or some such, each time.
Are those DB operations inserts, as they appear? If so, instead of inserting row by row, construct a string for one insert statement with multiple rows, in principle as many as there are, in that loop. Then run that as one transaction.
Test and scale down as needed if that adds up to too many rows. You can keep adding rows to the insert string up to a decided maximum number, insert that batch, then keep going.†
A few more readily seen inefficiencies
Don't construct an object every time through the loop. Build it once before the loop, then use/repopulate it as needed inside the loop. There is also no need for parse + fields here; getline does both and is a bit faster.
You don't need that if statement on every read. Read one line of data first, and that's your header; then enter the loop, without the if.
Altogether, without placeholders which now may not be needed, something like
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# There's a $table earlier, with its @fields to populate
my $qry = "INSERT INTO $table (" . join(',', @fields) . ") VALUES ";

open my $IN, '<', "$FileDir/$file"
    or Err( "Failed to open $file at location: $FileDir" );

my $header_arrayref = $csv->getline($IN);
Log( "This is header line : @$header_arrayref" );

my @sql_values;
while ( my $row = $csv->getline($IN) ) {
    # Use as many elements of the row (@$row) as there are @fields
    push @sql_values, '(' .
        join(',', map { $dbh->quote($_) } @$row[0..$#fields]) . ')';
    # May want to sanitize input further
}
$qry .= join ', ', @sql_values;

# Now $qry is ready. It is
#   INSERT INTO table_name (f1,f2,...) VALUES (v11,v12...), (v21,v22...), ...
$dbh->do($qry) or die $DBI::errstr;
I've also corrected the error handling when opening the file. The || in the question binds more tightly than the comma, so it effectively runs open IN, ( "<$FileDir/$file" || Err(...) ); the filename string is always true, so Err() would never be called. We need the low-precedence or there instead. The three-argument open with a lexical filehandle is also better; see perlopentut.
If you do need the placeholders, perhaps because you can't have a single insert but it must be broken into many or for security reasons, then you need to generate the exact ?-tuples for each row to be inserted, and later supply the right number of values for them.
Can assemble data first and then build the ?-tuples based on it
my $qry = "INSERT into $table (", join(',', #fields), ") VALUES ";
...
my #data;
while ( my $row = $csv->getline($IN) ) {
push #data, [ #$row[0..$#fields] ];
}
# Append the right number of (?,?...),... with the right number of ? in each
$qry .= join ', ', map { '(' . join(',', ('?')x#$_) . ')' } #data;
# Now $qry is ready to bind and execute
# INSERT into table_name (f1,f2,...) VALUES (?,?,...), (?,?,...), ...
$dbh->do($qry, undef, map { #$_ } #data) or die $DBI::errstr;
This may generate a very large string, which may push the limits of your RDBMS or some other resource. In that case break @data into smaller batches: prepare the statement with the right number of (?,?,...) row-values for a batch, and execute it in a loop over the batches.‡
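For illustration, a minimal batching sketch, reusing the @data, @fields, $table and $dbh from the snippets above; the batch size of 1,000 is just an assumed example value to tune.
use List::Util qw(min);

my $batch_size = 1_000;   # assumed value; tune for your RDBMS and memory

for (my $i = 0; $i <= $#data; $i += $batch_size) {
    my $end   = min($i + $batch_size - 1, $#data);
    my @batch = @data[$i .. $end];

    # One (?,?,...) group per row in this batch
    my $batch_qry = "INSERT INTO $table (" . join(',', @fields) . ") VALUES "
                  . join(', ', map { '(' . join(',', ('?') x @$_) . ')' } @batch);

    $dbh->do($batch_qry, undef, map { @$_ } @batch) or die $DBI::errstr;
}
With equal-sized batches you could prepare once and execute repeatedly; do() per batch simply keeps the sketch short.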
Finally, another way altogether is to load data directly from a file using the database's tool built for that purpose. This will be far faster than going through DBI, probably even accounting for the extra step of processing your input CSV into another file with only the needed data.
Since you don't need all data from your input CSV file, first read and process the file as above and write out a file with only the needed data (@data above). Then there are two possible ways:
Either use an SQL command for this – COPY in PostgreSQL, LOAD DATA [LOCAL] INFILE in MySQL, and the like (a PostgreSQL sketch follows this list); or,
Use a dedicated tool for importing/loading files from your RDBMS – mysqlimport (MySQL), SQL*Loader/sqlldr (Oracle), etc. I'd expect this to be the fastest way
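For the first option with PostgreSQL, here is a hedged sketch using DBD::Pg's COPY support (pg_putcopydata/pg_putcopyend), reusing the $dbh, $table, @fields and @data built above.
$dbh->do("COPY $table (" . join(',', @fields) . ") FROM STDIN WITH (FORMAT csv)");

my $csv_out = Text::CSV_XS->new({ binary => 1, eol => "\n" });
for my $row (@data) {
    $csv_out->combine(@$row) or die "combine() failed";
    $dbh->pg_putcopydata($csv_out->string);   # one CSV line per row
}
$dbh->pg_putcopyend();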
The second of these options can also be done from within a program, by running the appropriate tool as an external command via system (or, better yet, via suitable libraries).
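As a hedged sketch of that, one could write @data out with Text::CSV_XS and hand the file to a loader. The file name, $db_name and the mysqlimport options below are illustrative assumptions only; check your own tool's documentation (mysqlimport derives the target table name from the file's base name, here "transactions_load").
my $loadfile = "$FileDir/transactions_load.csv";   # assumed path and name
my $csv_out  = Text::CSV_XS->new({ binary => 1, eol => "\n" });

open my $OUT, '>', $loadfile or die "Can't open $loadfile: $!";
$csv_out->print($OUT, $_) for @data;
close $OUT or die "Can't close $loadfile: $!";

system('mysqlimport', '--local',
       '--fields-terminated-by=,', '--fields-optionally-enclosed-by="',
       $db_name, $loadfile) == 0
    or die "mysqlimport failed: $?";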
† In one application I've put together as many as millions of rows in the initial insert -- the string itself for that statement was in the high tens of MB -- and that keeps running with ~100k rows inserted in a single statement daily, for a few years by now. This is PostgreSQL on good servers, and of course your mileage may vary.
‡ Some RDBMS do not support a multi-row (batch) insert query like the one used here; in particular, Oracle seems not to. (We were informed in the end that that's the database used here.) But there are other ways to do it in Oracle; please see links in comments, and search for more. The script will then need to construct a different query, but the principle of operation is the same; one possible alternative is sketched below.
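For instance, a hedged sketch of Oracle's INSERT ALL form, built from the same @data, @fields, $table and $dbh as above; very large statements may still need batching, so test against your own limits.
my @into_clauses;
for my $row (@data) {
    push @into_clauses,
        "INTO $table (" . join(',', @fields) . ") VALUES ("
        . join(',', map { $dbh->quote($_) } @$row) . ")";
}
# INSERT ALL INTO t (...) VALUES (...) INTO t (...) VALUES (...) ... SELECT 1 FROM dual
my $ora_qry = "INSERT ALL\n" . join("\n", @into_clauses) . "\nSELECT 1 FROM dual";
$dbh->do($ora_qry) or die $DBI::errstr;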
I have 2 foreach loops:
Loop 1: goes through a JSON file
Loop 2: goes through a database
I want to compare the results from the JSON to the results in the database.
// TRANSACTIONS
$transaction = json_decode(transaction_list($wallet_id), true);

// DATABASE ROWS
$rows = $db->run('SELECT * FROM transactions');

// loop through transactions
foreach ($transaction as $tx) {
    $tx_id = $tx['id'];
}

// loop through database
foreach ($rows as $row) {
    $id = $row['id'];
}

echo $id . " | " . $tx_id . "\n";
The result is only one line; I'd like to get results for all the lines.
24d418b322e889e39d8e4bf3b8d6060e479d40032658cb9b080ff6d615eee9cf |
7c6c161695a21ad9143b1f3e242d176880b3484ebb1f6c820772c92ece916bdb
How do i get all the results from the database?
I tried a for loop to count the rows and produce one result for each one, but I just got the above line 12 times instead of 12 different results.
The goal is to be able to compare the JSON file's results to the database and, if they're different, insert the JSON that's not already in the database into the database; for example, if the $tx_id doesn't exist, add the new transaction. Maybe I'm doing this wrong?
EDIT1:
I don't actually get an empty array as stated below. Instead, I get an empty response body because of the following exception:
DBD::ODBC::st fetchrow_array failed: st_fetch/SQLFetch (long truncated DBI attribute LongTruncOk not set and/or LongReadLen too small) (SQL-HY000) [state was HY000 now 01004]
Which I can see there are posts about. I will look at those to see if I can fix this on my own. Will edit if not successful.
First let me start by saying I do not know Perl well at all; this could be a careless error (I hope it is). I am building a hash from an array that is returned from SQL or from JavaScript on the front end, and one of the keys in the hash, "short_desc", needs to hold a value that, in the code below, comes from a SQL database.
BFHHOTH 15x24S/S +2 UP-HNDWHL-UNSPOKED-GALV 15"x24" flush escape hatch w/hinge, internal handwheel, T-handle on top steel cover and ring
However, with the code below (unnecessary cases removed from the switch):
#!perl
use Switch;
use DBI;
use JSON;
use CGI qw/param/;
use CGI::Carp qw(fatalsToBrowser);
use IO::Compress::Gzip qw(gzip $GzipError);
use URI::Encode;
use URI::Escape;

my $gzip_ok;
my $accept_encoding = $ENV{HTTP_ACCEPT_ENCODING};
if ( $accept_encoding && $accept_encoding =~ /\bgzip\b/ ) {
    # $gzip_ok = 1;
}

print "Content-Type: application/json\n";
if ($gzip_ok) {
    print "Content-Encoding: gzip\n";
}
print "\n";

my $action = param('ACTION');

my %jsonData;
my @jsonArray;

my $azDSN = DBI->connect('dbi:ODBC:Driver={SQL Server Native Client 10.0};Server=myServer;Database=myDB;Uid=me;Pwd={myPass};Encrypt=yes;Connection Timeout=30;');

switch ($action) {
    case "GETINFO" {
        my $paramID = param('ID');
        getInfo($paramID);

        my $json_text = JSON->new->pretty->utf8->encode( \@jsonArray );

        if ($gzip_ok) {
            my $zipText;
            gzip \$json_text, \$zipText
                or die "gzip failed: $GzipError\n";
            print $zipText;
        }
        else {
            print $json_text;
        }
    }
}
sub getInfo {
    my $myID = $_[0];

    my $statement = <<"SQL";
    SELECT
        trefQuoteItemsID,
        quote_position,
        description,
        comments
    FROM
        myDB.dbo.myTable where tID = $myID;
SQL

    my $sti = $azDSN->prepare($statement) or die $statement;
    $sti->execute() or die $DBI::errstr;

    while ( my @row = $sti->fetchrow_array ) {
        my %tempData = (
            "tref"       => $row[0],
            "position"   => $row[1],
            "short_desc" => $row[2],
            "comments"   => $row[3]
        );
        $jsonArray[$count] = {%tempData};
        $count++;
    }
}
An empty array is returned to me on the front-end.
Oddly, if the string is:
BFHHOTH 15x24S/S +2 UP-HNDWHL-UNSPOKED-
the array contains the correct object.
But empty again if the string is:
BFHHOTH 15x24S/S +2 UP-HNDWHL-UNSPOKED-G
Have also tried with strings:
qwertyuiopasdfghjklzxcvbnm1234567890qwe #length is 39
which lets the hash get built, and:
qwertyuiopasdfghjklzxcvbnm1234567890qwer #length is 40
which returns an empty array, so the hash doesn't get built.
Are there any Perl gurus who have any suggestions?
From what I can tell, it's related to "long object" columns.
DBD::ODBC says:
You can retrieve a lob in chunks like this:
$sth->bind_col($column, undef, {TreatAsLOB=>1});
while (my $retrieved = $sth->odbc_lob_read($column, \my $data, $length)) {
    print "retrieved=$retrieved lob_data=$data\n";
}
NOTE: to retrieve a lob like this you must first bind the lob column specifying BindAsLOB or DBD::ODBC will a) bind the column as normal and it will be subject to LongReadLen and b) fail odbc_lob_read.
NOTE: Some database engines and ODBC drivers do not allow you to retrieve columns out of order (e.g., MS SQL Server unless you are using cursors). In those cases you must ensure the lob retrieved is the last (or only) column in your select list.
NOTE: You can retrieve only part of a lob but you will probably have to call finish on the statement handle before you do anything else with that statement. When only retrieving part of a large lob you could see a small delay when you call finish as some protocols used by ODBC drivers send the lob down the socket synchronously and there is no way to stop it (this means the ODBC driver needs to read all the lob from the socket even though you never retrieved it all yourself).
NOTE: If your select contains multiple lobs you cannot read part of the first lob, the second lob then return to the first lob. You must read all lobs in order and completely or read part of a lob and then do no further calls to odbc_lob_read.
There's no mention of whether you can retrieve a lob any other way, so I don't know if you have to do it this way, but at least you know one way. But I believe you can work around the problem by increasing the connection's LongReadLen attribute.
You should be able to set the attribute as follows:
my $dbh = DBI->connect($dsn, $user, $passwd, {
...,
LongReadLen => ...,
});
You should also be able to set the attribute as follows:
$dbh->{LongReadLen} = ...;
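Since the exception in your edit also mentions LongTruncOk, here is a minimal sketch of both attributes applied to the handle from your code; the 64 KB figure is only an assumed example, size it to your longest description/comments value.
# Set before prepare/execute on the handle from the question
$azDSN->{LongReadLen} = 64 * 1024;   # assumed example; max bytes fetched for long columns
$azDSN->{LongTruncOk} = 0;           # keep failing loudly rather than silently truncating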
Hopefully, someone can give you a better answer.
I have a SQL column named "details" and it contains the following data:
<changes><RoundID><new>8394</new></RoundID><RoundLeg><new>JAYS CLOSE AL6 Odds(1 - 5)</new></RoundLeg><SortType><new>1</new></SortType><SortOrder><new>230</new></SortOrder><StartDate><new>01/01/2009</new></StartDate><EndDate><new>01/01/2021</new></EndDate><RoundLegTypeID><new>1</new></RoundLegTypeID></changes>
<changes><RoundID><new>8404</new></RoundID><RoundLeg><new>HOLLY AREA AL6 (1 - 9)</new></RoundLeg><SortType><new>1</new></SortType><SortOrder><new>730</new></SortOrder><StartDate><new>01/01/2009</new></StartDate><EndDate><new>01/01/2021</new></EndDate><RoundLegTypeID><new>1</new></RoundLegTypeID></changes>
<changes><RoundID><new>8379</new></RoundID><RoundLeg><new>PRI PARK AL6 (1 - 42)</new></RoundLeg><SortType><new>1</new></SortType><SortOrder><new>300</new></SortOrder><StartDate><new>01/01/2009</new></StartDate><EndDate><new>01/01/2021</new></EndDate><RoundLegTypeID><new>1</new></RoundLegTypeID></changes>
What is the easiest way to separate this data out into individual columns? (that is all one column)
Try this:
SELECT DATA.query('/changes/RoundID/new/text()') AS RoundID
,DATA.query('/changes/RoundLeg/new/text()') AS RoundLeg
,DATA.query('/changes/SortType/new/text()') AS SortType
-- And so on and so forth
FROM (SELECT CONVERT(XML, Details) AS DATA
FROM YourTable) AS T
Once you get your result set from the SQL (MySQL or whatever), you will probably have an array of strings. As I understand your question, you want to know how to extract each of the XML nodes contained in the string stored in the column in question. You can loop through the results from the SQL query and extract the data you want. In PHP it would look like this:
// Set a counter variable for the first dimension of the array; this will
// number the result sets, so for each row in the table you will have a
// numeric identifier in the corresponding array.
$i = 0;
$output = array();

foreach ($results as $result) {
    $xml = simplexml_load_string($result);
    // Here use SimpleXML to extract the node data, just by using the names of
    // the XML nodes, and give it the same name in the array's second dimension.
    $output[$i]['RoundID'] = $xml->RoundID->new;
    $output[$i]['RoundLeg'] = $xml->RoundLeg->new;
    // Simply create more array items here for each of the elements you want
    $i++;
}

foreach ($output as $out) {
    // Step through the created array and do what you like with it.
    echo $out['RoundID']."\n";
    var_dump($out);
}
Noob here.
I have a super column family sorted by TimeUUIDType which has a number of entries. I'm trying to perform a simple get function with phpcassa that won't work. I'm trying to return a specific value from a UTF8-sorted column within a TimeUUID-sorted SC. The exact code works with a similar SC family sorted by BytesType.
Here is the info on the SCF I'm trying to get from, which I previously entered via the -cli.
ColumnFamily: testSCF (Super)
Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType/org.apache.cassandra.db.marshal.UTF8Type
RowKey: TestKey
=> (super_column=48dd0330-5bd6-11e0-adc5-343960c1b6b8,
(column=test, value=74657374, timestamp=1301603831288000))
=> (super_column=141a69b0-5c6e-11e0-bcce-343960c1b6b8,
(column=new test, value=6e657774657374, timestamp=1301669004440000))
And here is the phpcassa script I'm using to retrieve the data.
<?php
require_once('.../connection.php');
require_once('.../columnfamily.php');
$conn = new Connection('siteRoot');
$scf = 'testSCF';
$key = 'testKey';
$super = '141a69b0-5c6e-11e0-bcce-343960c1b6b8';
$col = 'new test';
$entry = new ColumnFamily($conn, $scf);
$q = ($entry->get($key, $columns=array($super)));
echo $q[$super][$col];
?>
Also, if I don't specify the SC, like so:
$q = ($entry->get($key));
print_r($q);
It returns:
Array ( [HÝ0[ÖàÅ49`Á¶¸] => Array ( [test] => test ) [i°\nà¼Î49`Á¶¸] => Array ( [new test] => newtest ) )
I know part of the issue might have been brought up in How do I insert a row with a TimeUUIDType column in Cassandra?
But it didn't really help me as I presumably have accepted timeuuidtypes.
Thanks for any help guys.
I suppose I didn't try hard enough to begin with. The answer in fact had everything to do with the link.
It appears that the -cli accepted what jbellis in the link describes as the 32-byte representation of the timeUUID (141a69b0-5c6e-11e0-bcce-343960c1b6b8) when I inserted it. This confused me.
It works fine when you 'get()' with the "raw" 16 byte form (HÝ0[ÖàÅ49`Á¶¸).
Cheers.