How do I make multiple database queries more efficient in Perl?

I have queries that reside in multiple methods, each of which can contain multiple parameters. I am trying to reduce file size and line count to make the code more maintainable. Below is one such occurrence:
$sql_update = qq{ UPDATE database.table
                  SET column = 'UPDATE!'
                  WHERE id = ?
                };
$sth_update = $dbh->prepare($sql_update);
if ($dbh->err) {
    my $error = "Could not prepare statement. Error: " . $dbh->errstr . " Exiting at line " . __LINE__;
    print "$error\n";
    die;
}
$sth_update->execute($parameter);
if ($dbh->err) {
    my $error = "Could not execute statement. Error: " . $dbh->errstr . " Exiting at line " . __LINE__;
    print "$error\n";
    die;
}
This is just one example; there are various other SELECT statements, some of which take just one parameter and some of which take two or more. I am wondering whether it would be possible to encapsulate all of this into a function/method and pass in an array of parameters. How would the parameters be populated into the execute() call?
If this were possible, I could write a method where you simply pass in the SQL query and its parameters and get back a reference to the fetched records. Does this sound safe at all?

If line count and maintainable code are your main goals, your best bet would be to use one of the several fine ORM frameworks/libraries available. Class::DBI and DBIx::Class are two fine starting points. In case you are worried about spending additional time learning these modules, don't be: it took me just one afternoon to get started and become productive. Using Class::DBI, for example, your update becomes just one line:
Table->retrieve(id => $parameter)->column('UPDATE!')->update;
The only downside (if that) of these frameworks is that very complicated SQL statements require writing custom methods, which may take some additional time (not too much) to learn.

No sense in checking for errors after every single database call. How tedious!
Instead, when you connect to the database, set the RaiseError option to true. Then, if a database error occurs, an exception will be thrown. If you do not catch it (in an eval {} block), your program will die with a message similar to the one you were printing manually above.
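For example, a minimal sketch (the DSN, credentials, and $id here are placeholders, not from the original code):
use DBI;

# RaiseError makes every DBI call throw an exception on failure,
# so there is no need to check $dbh->err after each call.
my $dbh = DBI->connect('dbi:mysql:mydb', $db_user, $db_pass,
                       { RaiseError => 1, AutoCommit => 1 });

# Catch the exception only where you can actually do something about it.
eval {
    my $sth = $dbh->prepare('UPDATE database.table SET column = ? WHERE id = ?');
    $sth->execute('UPDATE!', $id);
};
if ($@) {
    warn "Database update failed: $@";
}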

The "execute" function does accept an array containing all your parameters.
You just have to find a way to indicate which statement handle you want to execute and you're done ...
It would be much better to keep your statement handles somewhere, because if you create and prepare a new one each time, you don't really reap the benefits of "prepare" ...
As for returning all rows, you can do that (something like "while fetchrow_hashref, push"), but beware of large result sets that could eat all your memory!
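A minimal sketch of that idea (the run_query helper name is my own, and prepare_cached is one way to keep reusing the same statement handles):
# Run a SELECT with any number of bind values and return a reference
# to all fetched rows as hashrefs.
sub run_query {
    my ($dbh, $sql, @params) = @_;
    my $sth = $dbh->prepare_cached($sql);   # reuses the handle on repeated calls
    $sth->execute(@params);                 # execute() takes bind values as a list
    return $sth->fetchall_arrayref({});     # beware: loads the whole result set into memory
}

my $rows = run_query($dbh, 'SELECT id, column FROM database.table WHERE id = ?', 42);
print "$_->{id}: $_->{column}\n" for @$rows;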

Here's a simple approach using closures/anonymous subs stored in a hash by keyword name (compiles, but not tested otherwise), edited to include use of RaiseError:
# define cached SQL in hash, to access by keyword
#
sub genCachedSQL {
    my $dbh  = shift;
    my $sqls = shift;   # hashref for keyword => sql query
    my %SQL_CACHE;
    while (my ($name, $sql) = each %$sqls) {
        my $sth = $dbh->prepare($sql);
        $SQL_CACHE{$name}->{sth}  = $sth;
        $SQL_CACHE{$name}->{exec} = sub {   # closure for execute(s)
            my @parameters = @_;
            $SQL_CACHE{$name}->{sth}->execute(@parameters);
            return sub {   # closure for resultset iterator - check for undef
                my $row;
                eval { $row = $SQL_CACHE{$name}->{sth}->fetchrow_arrayref(); };
                return $row;
            }   # end resultset closure
        };   # end exec closure
    }   # end while each %$sqls
    return \%SQL_CACHE;
}   # end genCachedSQL
my $dbh = DBI->connect('dbi:...', $db_user, $db_pass, { RaiseError => 1 });   # attributes go in the fourth argument

# initialize cached SQL statements
#
my $sqlrun = genCachedSQL($dbh,
    { 'insert_table1' => qq{ INSERT INTO database.table1 (id, column) VALUES (?,?) },
      'update_table1' => qq{ UPDATE database.table1 SET column = 'UPDATE!' WHERE id = ? },
      'select_table1' => qq{ SELECT column FROM database.table1 WHERE id = ? } });

# use cached SQL
#
my $colid1 = 1;
$sqlrun->{'insert_table1'}->{exec}->($colid1, "ORIGINAL");
$sqlrun->{'update_table1'}->{exec}->($colid1);
my $result = $sqlrun->{'select_table1'}->{exec}->($colid1);
while (my $row = $result->()) {
    print join("\t", @$row), "\n";
}
my $colid2 = 2;
$sqlrun->{'insert_table1'}->{exec}->($colid2, "ORIGINAL");
# ...

I'm very impressed with bubaker's example of using a closure for this.
Just the same, if the original goal was to make the code-base smaller and more maintainable, I can't help thinking there's a lot of noise begging to be removed from the original code, before anyone embarks on a conversion to CDBI or DBIC etc (notwithstanding the great libraries they both are.)
If the $dbh had been instantiated with RaiseError set in the attributes, most of that code goes away:
$sql_update = qq{ UPDATE database.table
                  SET column = 'UPDATE!'
                  WHERE id = ?
                };
$sth_update = $dbh->prepare($sql_update);
$sth_update->execute($parameter);
I can't see that the error handling in the original code is adding much that you wouldn't get from the vanilla die produced by RaiseError, but if it's important, have a look at the HandleError attribute in the DBI manpage.
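A minimal sketch of what that might look like (the warn is purely illustrative; check the DBI manpage for the callback details):
my $dbh = DBI->connect($dsn, $db_user, $db_pass, {
    RaiseError  => 1,
    # HandleError runs first; returning false (0) lets RaiseError
    # proceed and die as usual after the failure has been logged.
    HandleError => sub {
        my ($message, $handle, $failed_value) = @_;
        warn scalar(localtime) . " DBI error: $message";
        return 0;
    },
});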
Furthermore, if such statements aren't being reused (caching how they're optimised is often the main purpose of preparing them; the other reason is to mitigate SQL injection by using placeholders), then why not use do?
$dbh->do($sql_update, \%attrs, @parameters);

Related

Filter Array For IDs Existing in Another Array with Ruby on Rails/Mongo

I need to compare the 2 arrays declared here to return records that exist only in the filtered_apps array. I am using the contents of previous_apps array to see if an ID in the record exists in filtered_apps array. I will be outputting the results to a CSV and displaying records that exist in both arrays to the console.
My question is this: How do I get the records that only exist in filtered_apps? Easiest for me would be to put those unique records into a new array to work with on the csv.
start_date = Date.parse("2022-02-05")
end_date = Date.parse("2022-05-17")
valid_year = start_date.year
dupe_apps = []
uniq_apps = []
# Finding applications that meet my criteria:
filtered_apps = FinancialAssistance::Application.where(
:is_requesting_info_in_mail => true,
:aasm_state => "determined",
:submitted_at => {
"$exists" => true,
"$gte" => start_date,
"$lte" => end_date })
# Finding applications that I want to compare against filtered_apps
previous_apps = FinancialAssistance::Application.where(
is_requesting_info_in_mail: true,
:submitted_at => {
"$exists" => true,
"$gte" => valid_year })
# I'm using this to pull the ID that I'm using for comparison just to make the comparison lighter by only storing the family_id
previous_apps.each do |y|
previous_apps_array << y.family_id
end
# This is where I'm doing my comparison and it is not working.
filtered_apps.each do |app|
if app.family_id.in?(previous_apps_array) == false
then #non_dupe_apps << app
else "No duplicate found for application #{app.hbx_id}"
end
end
end
So what am I doing wrong in the last code section?
Let's check your original method first (I fixed the indentation to make it clearer). There are quite a few issues with it:
filtered_apps.each do |app|
  if app.family_id.in?(previous_apps_array) == false
  # Where is "@non_dupe_apps" declared? It isn't anywhere in your example...
  # Also, "then" is not necessary unless you want a one-line if-statement
  then @non_dupe_apps << app
  # This doesn't do anything, it's just a string
  # You need to use "p" or "puts" to output something to the console
  # Note that the "else" is also only triggered when duplicates WERE found...
  else "No duplicate found for application #{app.hbx_id}"
  end # Extra "end" here, this will mess things up
end
end
Also, you haven't declared previous_apps_array anywhere in your example, you just start adding to it out of nowhere.
Getting the difference between 2 arrays is dead easy in Ruby: just use the - operator!
uniq_apps = filtered_apps - previous_apps
You can also do this with ActiveRecord results, since they are just arrays of ActiveRecord objects. However, this doesn't help if you specifically need to compare results using the family_id column.
TIP: Getting the values of only a specific column/columns from your database is probably best done with the pluck or select method if you don't need to store any other data about those objects. With pluck, you only get an array of values in the result, not the full objects. select works a bit differently and returns ActiveRecord objects, but filters out everything but the selected columns. select is usually better in nested queries, since it doesn't trigger a separate query when used as a part of another query, while pluck always triggers one.
# Querying straight from the database
# This is what I would recommend, but it doesn't print the values of duplicates
uniq_apps = filtered_apps.where.not(family_id: previous_apps.select(:family_id))
I highly recommend getting really familiar with at least filter/select and map from the basic array methods. They make things like this way easier. The Ruby docs are a great place to learn about them and others. A very simple example of doing something similar to what you explained in your question, using filter/select on 2 arrays, would be something like this:
arr = [1, 2, 3]
full_arr = [1, 2, 3, 4, 5]

unique_numbers = full_arr.filter do |num|
  if arr.include?(num)
    puts "Duplicates were found for #{num}"
    false
  else
    true
  end
end
# Duplicates were found for 1
# Duplicates were found for 2
# Duplicates were found for 3
=> [4, 5]
NOTE: The OP is working with ruby 2.5.9, where filter is not yet available as an array method (it was introduced in 2.6.3). However, filter is just an alias for select, which can be found on earlier versions of Ruby, so they can be used interchangeably. Personally, I prefer using filter because, as seen above, select is already used in other methods, and filter is also the more common term in other programming languages I usually work with. Of course when both are available, it doesn't really matter which one you use, as long as you keep it consistent.
EDIT: My last answer did, in fact, not work.
Here is the code all nice and working.
It turns out the issue was that when comparing family_id against the set of records, I forgot that the looped record was itself part of the set, so it would match itself, too. I added a check that excludes the looped record's own ID from the comparison, and Bob's your uncle.
I added the pass and reject arrays so I could check my work instead of downloading a csv every time. Leaving them in mostly because I'm scared to change anything else.
start_date = Date.parse(date_from)
end_date = Date.parse(date_to)
valid_year = start_date.year
date_range = (start_date)..(end_date)

comparison_apps = FinancialAssistance::Application.by_year(start_date.year).where(
  aasm_state: 'determined',
  is_requesting_voter_registration_application_in_mail: true)
apps = FinancialAssistance::Application.where(
  :is_requesting_voter_registration_application_in_mail => true,
  :submitted_at => date_range).uniq { |n| n.family_id }

@pass_array = []
@reject_array = []
apps.each do |app|
  family = app.family
  app_id = app.id
  previous_apps = comparison_apps.where(family_id: family.id, :id.ne => app.id)
  if previous_apps.count > 0
    @reject_array << app
    puts "\e[32mApplicant hbx id \e[31m#{app.primary_applicant.person_hbx_id}\e[32m in family ID \e[31m#{family.id}\e[32m has registered to vote in a previous application.\e[0m"
  else
    <csv fields here>
    csv << [csv fields here]
  end
end
Basically, I pulled the applications into the apps array, then filtered them by the family_id field in each record.
I had to do this because the issue at the bottom of everything was that there were records present in apps that were themselves duplicates, only submitted a few days apart. Since I went on the assumption that the initial apps array would be all unique, I thought the duplicates that were included were due to the rest of the code not filtering correctly.
I then loop through apps and look for matches in apps.each do; when a duplicate is found, it is added to the previous_apps collection inside the loop. Since that resets on each go-round, if it ever holds more than 0 records, the app gets called out as having been submitted already. Otherwise, it goes to my CSV report.
Thanks for the help on this, it really got my brain thinking in another direction that I needed to. It also helped improve the code even though the issue was at the very beginning.

Does the feature "execute command multiple times" result in multiple round-trips to database?

In the Dapper documentation, it says you can use an IEnumerable parameter to execute a command multiple times. It gives the following example:
connection.Execute(@"insert MyTable(colA, colB) values (@a, @b)",
    new[] { new { a=1, b=1 }, new { a=2, b=2 }, new { a=3, b=3 } }
).IsEqualTo(3); // 3 rows inserted: "1,1", "2,2" and "3,3"
Will this result in multiple round-trips to the database (i.e. one for each T in the IEnumerable<T>)? Or is Dapper smart enough to transform the multiple queries into a batch and just do one round-trip? The documentation says an example usage is batch loading, so I suspect it only does one round-trip, but I want to be sure before I use it for performance-sensitive code.
As a follow-up question, and depending on the answer to the first, I'd be curious how transactions are handled? That is, is there one transaction for the whole set of Ts, or one transaction per T?
I finally got around to looking at this again. Looking at the source code (in \Dapper\SqlMapper.cs), I found the following snippet in method ExecuteImpl:
// ...
foreach (var obj in multiExec)
{
    if (isFirst)
    {
        masterSql = cmd.CommandText;
        isFirst = false;
        identity = new Identity(command.CommandText, cmd.CommandType, cnn, null, obj.GetType(), null);
        info = GetCacheInfo(identity, obj, command.AddToCache);
    }
    else
    {
        cmd.CommandText = masterSql; // because we do magic replaces on "in" etc
        cmd.Parameters.Clear(); // current code is Add-tastic
    }
    info.ParamReader(cmd, obj);
    total += cmd.ExecuteNonQuery();
}
// ...
The interesting part is on the second-last line, where ExecuteNonQuery is called. That method is called on each iteration of the foreach loop, so the work is not batched in the sense of a set-based operation, and multiple round-trips are indeed required. However, it is batched in the sense that all operations are performed on the same connection, and within the same transaction if one was specified.
The only way I can think of to do a set-based operation is to create a custom table-valued type (in the database) for the object of interest. Then, in the .NET code pass a DataTable object containing matching names and types as a command parameter. If there were a way to do this without having to create a table-valued type for every object, I'd love to hear about it.

Generate sql query by anorm, with all nulls except one

I am developing a web application with Play Framework 2.3.8 and Scala, with a complex architecture on the backend and front-end sides. As the backend we use MS SQL with many stored procedures, which we call through Anorm. And here is one of the problems.
I need to update some fields in the database. The front end calls the Play application and sends the name of the field and its value. I parse the field name, and then I need to generate an SQL query that updates that field. I need to assign null to all parameters except the received one. I tried to do it like this:
def updateCensusPaperXX(name: String, value: String, user: User) = {
  DB.withConnection { implicit c =>
    try {
      var sqlstring = "Execute [ScXX].[updateCensusPaperXX] {login}, {domain}"
      val params = List(
        "fieldName1",
        "fieldName2",
        ...,
        "fieldNameXX"
      )
      for (p <- params) {
        sqlstring += ", "
        if (name.endsWith(p))
          sqlstring += value
        else
          sqlstring += "null"
      }
      SQL(sqlstring)
        .on(
          "login" -> user.login,
          "domain" -> user.domain
        ).execute()
    } catch {
      case e: Throwable => Logger.error("update CensusPaper04 error", e)
    }
  }
}
But actually that doesn't work in all cases. For example, when I try to save a string, it gives me an error like:
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near 'some phrase'
What is the best way to generate an SQL query with Anorm, with all nulls except one?
The reason this is happening is that when you write the string value directly into the SQL statement, it needs to be quoted. One way to solve this would be to determine which of the fields are strings and add conditional logic to decide whether to quote the value. This is probably not the best way to go about it, though. As a general rule, you should be using named parameters rather than building a string with the parameter values. This has a few benefits:
It will probably be easier for you to diagnose issues because you will get more sensible error messages back at runtime.
It protects against the possibility of SQL injection.
You get the usual performance benefit of reusing the prepared statement although this might not amount to much in the case of stored procedure invocation.
What this means is that you should treat your list of fields as named parameters as you do with user and domain. This can be accomplished with some minor changes to your code above. First, you can build your SQL statement as follows:
val params = List(
  "fieldName1",
  "fieldName2",
  ...,
  "fieldNameXX"
)
val sqlString = "Execute [ScXX].[updateCensusPaperXX] {login}, {domain}," +
  params.map("{" + _ + "}").mkString(",")
What is happening above is that you no longer insert the values directly, so you can just build the string by appending the list of parameter placeholders to the end of your query string.
Then you can go ahead and start building your parameter list. Note that the parameters to the on method of SQL are a vararg list of NamedParameter. Basically, we need to create a Seq of NamedParameters that covers "login", "domain", and the list of fields you are populating. Something like the following should work:
val userDomainParams: Seq[NamedParameter] = Seq(("login", user.login), ("domain", user.domain))
val additionalParams = params.map { p =>
  if (name.endsWith(p))
    NamedParameter(p, value)
  else
    NamedParameter(p, None)
}.toSeq
val fullParams = userDomainParams ++ additionalParams
// At this point you can execute as follows
SQL(sqlString).on(fullParams: _*).execute()
What is happening here is that you are building the full list of parameters and then using the splat operator :_* to expand the sequence into the varargs needed as arguments for the on method. Note that the None used in the NamedParameter above is converted into a JDBC NULL by Anorm.
This takes care of the issue related to strings because you are no longer writing the string directly into the query, and it has the added benefit of eliminating other issues that come with writing the SQL string yourself rather than using parameters.

Contact form is changing order of form data

I have a contact form that uses a CGI script to get form data and send through email. The script works fine except for the fact that it seems to change the order of form elements. I think I have pinpointed the block of code responsible for this.
Is there a way to alter this so that it sends the form data as-is, without re-ordering?
sub get_data {
    use CGI qw/:standard/;
    my $query = new CGI;
    foreach $key ($query->param()) {
        $data{$key} = $query->param($key);
    }
    %data;   # return associative array of name=value
}
From perldoc CGI
If the script was invoked with a parameter list (e.g. "name1=value1&name2=value2&name3=value3"), the param() method will return the parameter names as a list. If the script was invoked as an <ISINDEX> script and contains a string without ampersands (e.g. "value1+value2+value3"), there will be a single parameter named "keywords" containing the "+"-delimited keywords.
NOTE: As of version 1.5, the array of parameter names returned will be in the same order as they were submitted by the browser. Usually this order is the same as the order in which the parameters are defined in the form (however, this isn't part of the spec, and so isn't guaranteed).
So you can keep the order of the keys by storing them in an array,
my @ordered = $query->param();
or don't use a hash at all,
my @data;
foreach $key ($query->param()) {
    push @data, [ $key, $query->param($key) ];
}
return @data;
Well, you're putting your parameters into a hash, and hashes have no intrinsic ordering. The only way to get key/value pairs out of a hash in the same order as you put them in is to keep a separate array containing the order of the keys.
Do you really need the parameters in a hash? Can't you just use your foreach loop at the point when you're creating the email?
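For instance, a minimal sketch of that suggestion (the mail-sending part is elided; the point is iterating param() in submission order):
use CGI qw/:standard/;

my $query = CGI->new;

# param() returns the names in the order the browser submitted them,
# so the message body can be built directly from that list.
my $body = '';
for my $key ($query->param()) {
    $body .= sprintf "%s: %s\n", $key, scalar $query->param($key);
}
# ... pass $body to whatever sends the email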

CakePHP ODBC driver connecting to Microsoft SQL Server; how to remove backticks?

I've got a CakePHP application connecting to a remote MSSQL server through ODBC, but it's not working as planned. Every query dies because the driver is trying to put backticks around identifiers, which is not correct for MSSQL.
As an example, I have a model called Item for a table called items, and when I call
$this->Item->find('all')
it tries to use the query
SELECT `Item`.`id`, `Item`.`name`, `Item`.`descrtiption` FROM `items` AS `Item` WHERE 1
...and I get an error about invalid syntax near ` at line 1.
Is there any way to prevent this behaviour and remove the backticks? Or else use square brackets, which SQL Server seems to like?
I recently took a good look around the ODBC driver with the intention of using it against MSSQL 2008 in CakePHP 1.3. Unless you are prepared to put a considerable amount of work in, it's not feasible at present.
Your immediate problem is that you need to override the default quote characters with [ and ]. These are set at the top of the dbo_odbc.php file:
var $startQuote = "[";
var $endQuote = "]";
Once you do this, the next issue you will run into is the default use of LIMIT, so you'll need to provide your own limiting function, copied from dbo_mssql, to override it:
/**
 * Returns a limit statement in the correct format for the particular database.
 *
 * @param integer $limit Limit of results returned
 * @param integer $offset Offset from which to start results
 * @return string SQL limit/offset statement
 */
function limit($limit, $offset = null) {
    if ($limit) {
        $rt = '';
        if (!strpos(strtolower($limit), 'top') || strpos(strtolower($limit), 'top') === 0) {
            $rt = ' TOP';
        }
        $rt .= ' ' . $limit;
        if (is_int($offset) && $offset > 0) {
            $rt .= ' OFFSET ' . $offset;
        }
        return $rt;
    }
    return null;
}
You'll then run into two problems, neither of which I solved.
In the describe function, the odbc_field_type call is not returning a field type. I'm not sure how critical this is if you describe the fields in the model, but it doesn't sound promising.
More crucially, in the fields function that's used to generate a field list, Cake works by recursively exploding the . syntax to generate a series of AS aliases. This is fine if your recursion level is zero, but with deeper recursion you end up with a field list that looks something like 'this.that.other AS this_dot_that.other AS this_dot_that_dot_other', which is invalid MSSQL syntax.
Neither of these is unsolvable, but at this point I decided it was simpler to reload my server and use the MSSQL driver than to continue chasing problems with the ODBC driver, but YMMV.
ADDED: This question seems to be getting a bit of attention, so could anyone who takes this further please append their code to this answer? Hopefully we can assemble a solution between us.
Why don't you just use the MSSQL dbo? https://github.com/cakephp/cakephp/blob/master/cake/libs/model/datasources/dbo/dbo_mssql.php
