Perl performance is slow, file I/O issue or due to while loop - database

I have the following code in my while loop and it is significantly slow, any suggestions on how to improve this?
open IN, "<$FileDir/$file" || Err( "Failed to open $file at location: $FileDir" );
my $linenum = 0;
while ( $line = <IN> ) {
if ( $linenum == 0 ) {
Log(" This is header line : $line");
$linenum++;
} else {
$linenum++;
my $csv = Text::CSV_XS->new();
my $status = $csv->parse($line);
my #val = $csv->fields();
$index = 0;
Log("number of parameters for this file is: $sth->{NUM_OF_PARAMS}");
for ( $index = 0; $index <= $#val; $index++ ) {
if ( $index < $sth->{NUM_OF_PARAMS} ) {
$sth->bind_param( $index + 1, $val[$index] );
}
}
if ( $sth->execute() ) {
$ifa_dbh->commit();
} else {
Log("line $linenum insert failed");
$ifa_dbh->rollback();
exit(1);
}
}
}

By far the most expensive operation there is accessing the database server; it's a network trip, hundreds of milliseconds or some such, each time.
Are those DB operations inserts, as they appear? If so, instead of inserting row by row construct a string for an insert statement with multiple rows, in principle as many as there are, in that loop. Then run that one transaction.
Test and scale down as needed, if that adds up to too many rows. Can keep adding rows to the string for the insert statement up to a decided maximum number, insert that, then keep going.†
A few more readily seen inefficiencies
Don't construct an object every time through the loop. Build it once befor the loop, and then use/repopulate as needed in the loop. Then, there is no need for parse+fields here, while getline is also a bit faster
Don't need that if statement for every read. First read one line of data, and that's your header. Then enter the loop, without ifs
Altogether, without placeholders which now may not be needed, something like
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
# There's a $table earlier, with its #fields to populate
my $qry = "INSERT into $table (", join(',', #fields), ") VALUES ";
open my $IN, '<', "$FileDir/$file"
or Err( "Failed to open $file at location: $FileDir" );
my $header_arrayref = $csv->getline($IN);
Log( "This is header line : #$header_arrayref" );
my #sql_values;
while ( my $row = $csv->getline($IN) ) {
# Use as many elements in the row (#$row) as there are #fields
push #sql_values, '(' .
join(',', map { $dbh->quote($_) } #$row[0..$#fields]) . ')';
# May want to do more to sanitize input further
}
$qry .= join ', ', #sql_values;
# Now $qry is readye. It is
# INSERT into table_name (f1,f2,...) VALUES (v11,v12...), (v21,v22...),...
$dbh->do($qry) or die $DBI::errstr;
I've also corrected the error handling when opening the file, since that || in the question binds too tightly in this case, and there's effectively open IN, ( "<$FileDir/$file" || Err(...) ). We need or instead of || there. Then, the three-argument open is better. See perlopentut
If you do need the placeholders, perhaps because you can't have a single insert but it must be broken into many or for security reasons, then you need to generate the exact ?-tuples for each row to be inserted, and later supply the right number of values for them.
Can assemble data first and then build the ?-tuples based on it
my $qry = "INSERT into $table (", join(',', #fields), ") VALUES ";
...
my #data;
while ( my $row = $csv->getline($IN) ) {
push #data, [ #$row[0..$#fields] ];
}
# Append the right number of (?,?...),... with the right number of ? in each
$qry .= join ', ', map { '(' . join(',', ('?')x#$_) . ')' } #data;
# Now $qry is ready to bind and execute
# INSERT into table_name (f1,f2,...) VALUES (?,?,...), (?,?,...), ...
$dbh->do($qry, undef, map { #$_ } #data) or die $DBI::errstr;
This may generate a very large string, what may push the limits of your RDBMS or some other resource. In that case break #data into smaller batches. Then prepare the statement with the right number of (?,?,...) row-values for a batch, and execute in the loop over the batches.‡
Finally, another way altogether is to directly load data from a file using the database's tool for that particular purpose. This will be far faster than going through DBI, probably even including the need to process your input CSV into another one which will have only the needed data.
Since you don't need all data from your input CSV file, first read and process the file as above and write out a file with only the needed data (#data above). Then, there's two possible ways
Either use an SQL command for this – COPY in PostgreSQL, LOAD DATA [LOCAL] INFILE in MySQL and Oracle (etc); or,
Use a dedicated tool for importing/loading files from your RDBMS – mysqlimport (MySQL), SQL*Loader/sqlldr (Oracle), etc. I'd expect this to be the fastest way
The second of these options can also be done out of a program, by running the appropriate tool as an external command via system (or better yet via the suitable libraries).
† In one application I've put together as much as millions of rows in the initial insert -- the string itself for that statement was in high tens of MB -- and that keeps running with ~100k rows inserted in a single statement daily, for a few years by now. This is postgresql on good servers, and of course ymmv.
‡
Some RDBMS do not support a multi-row (batch) insert query like the one used here; in particular Oracle seems not to. (We were informed in the end that that's the database used here.) But there are other ways to do it in Oracle, please see links in comments, and search for more. Then the script will need to construct a different query but the principle of operation is the same.

Related

SQL Server query using case statement IN Clause doesn't work [duplicate]

What are the best workarounds for using a SQL IN clause with instances of java.sql.PreparedStatement, which is not supported for multiple values due to SQL injection attack security issues: One ? placeholder represents one value, rather than a list of values.
Consider the following SQL statement:
SELECT my_column FROM my_table where search_column IN (?)
Using preparedStatement.setString( 1, "'A', 'B', 'C'" ); is essentially a non-working attempt at a workaround of the reasons for using ? in the first place.
What workarounds are available?
An analysis of the various options available, and the pros and cons of each is available in Jeanne Boyarsky's Batching Select Statements in JDBC entry on JavaRanch Journal.
The suggested options are:
Prepare SELECT my_column FROM my_table WHERE search_column = ?, execute it for each value and UNION the results client-side. Requires only one prepared statement. Slow and painful.
Prepare SELECT my_column FROM my_table WHERE search_column IN (?,?,?) and execute it. Requires one prepared statement per size-of-IN-list. Fast and obvious.
Prepare SELECT my_column FROM my_table WHERE search_column = ? ; SELECT my_column FROM my_table WHERE search_column = ? ; ... and execute it. [Or use UNION ALL in place of those semicolons. --ed] Requires one prepared statement per size-of-IN-list. Stupidly slow, strictly worse than WHERE search_column IN (?,?,?), so I don't know why the blogger even suggested it.
Use a stored procedure to construct the result set.
Prepare N different size-of-IN-list queries; say, with 2, 10, and 50 values. To search for an IN-list with 6 different values, populate the size-10 query so that it looks like SELECT my_column FROM my_table WHERE search_column IN (1,2,3,4,5,6,6,6,6,6). Any decent server will optimize out the duplicate values before running the query.
None of these options are ideal.
The best option if you are using JDBC4 and a server that supports x = ANY(y), is to use PreparedStatement.setArray as described in Boris's anwser.
There doesn't seem to be any way to make setArray work with IN-lists, though.
Sometimes SQL statements are loaded at runtime (e.g., from a properties file) but require a variable number of parameters. In such cases, first define the query:
query=SELECT * FROM table t WHERE t.column IN (?)
Next, load the query. Then determine the number of parameters prior to running it. Once the parameter count is known, run:
sql = any( sql, count );
For example:
/**
* Converts a SQL statement containing exactly one IN clause to an IN clause
* using multiple comma-delimited parameters.
*
* #param sql The SQL statement string with one IN clause.
* #param params The number of parameters the SQL statement requires.
* #return The SQL statement with (?) replaced with multiple parameter
* placeholders.
*/
public static String any(String sql, final int params) {
// Create a comma-delimited list based on the number of parameters.
final StringBuilder sb = new StringBuilder(
String.join(", ", Collections.nCopies(possibleValue.size(), "?")));
// For more than 1 parameter, replace the single parameter with
// multiple parameter placeholders.
if (sb.length() > 1) {
sql = sql.replace("(?)", "(" + sb + ")");
}
// Return the modified comma-delimited list of parameters.
return sql;
}
For certain databases where passing an array via the JDBC 4 specification is unsupported, this method can facilitate transforming the slow = ? into the faster IN (?) clause condition, which can then be expanded by calling the any method.
Solution for PostgreSQL:
final PreparedStatement statement = connection.prepareStatement(
"SELECT my_column FROM my_table where search_column = ANY (?)"
);
final String[] values = getValues();
statement.setArray(1, connection.createArrayOf("text", values));
try (ResultSet rs = statement.executeQuery()) {
while(rs.next()) {
// do some...
}
}
or
final PreparedStatement statement = connection.prepareStatement(
"SELECT my_column FROM my_table " +
"where search_column IN (SELECT * FROM unnest(?))"
);
final String[] values = getValues();
statement.setArray(1, connection.createArrayOf("text", values));
try (ResultSet rs = statement.executeQuery()) {
while(rs.next()) {
// do some...
}
}
No simple way AFAIK.
If the target is to keep statement cache ratio high (i.e to not create a statement per every parameter count), you may do the following:
create a statement with a few (e.g. 10) parameters:
... WHERE A IN (?,?,?,?,?,?,?,?,?,?) ...
Bind all actuall parameters
setString(1,"foo");
setString(2,"bar");
Bind the rest as NULL
setNull(3,Types.VARCHAR)
...
setNull(10,Types.VARCHAR)
NULL never matches anything, so it gets optimized out by the SQL plan builder.
The logic is easy to automate when you pass a List into a DAO function:
while( i < param.size() ) {
ps.setString(i+1,param.get(i));
i++;
}
while( i < MAX_PARAMS ) {
ps.setNull(i+1,Types.VARCHAR);
i++;
}
You can use Collections.nCopies to generate a collection of placeholders and join them using String.join:
List<String> params = getParams();
String placeHolders = String.join(",", Collections.nCopies(params.size(), "?"));
String sql = "select * from your_table where some_column in (" + placeHolders + ")";
try ( Connection connection = getConnection();
PreparedStatement ps = connection.prepareStatement(sql)) {
int i = 1;
for (String param : params) {
ps.setString(i++, param);
}
/*
* Execute query/do stuff
*/
}
An unpleasant work-around, but certainly feasible is to use a nested query. Create a temporary table MYVALUES with a column in it. Insert your list of values into the MYVALUES table. Then execute
select my_column from my_table where search_column in ( SELECT value FROM MYVALUES )
Ugly, but a viable alternative if your list of values is very large.
This technique has the added advantage of potentially better query plans from the optimizer (check a page for multiple values, tablescan only once instead once per value, etc) may save on overhead if your database doesn't cache prepared statements. Your "INSERTS" would need to be done in batch and the MYVALUES table may need to be tweaked to have minimal locking or other high-overhead protections.
Limitations of the in() operator is the root of all evil.
It works for trivial cases, and you can extend it with "automatic generation of the prepared statement" however it is always having its limits.
if you're creating a statement with variable number of parameters, that will make an sql parse overhead at each call
on many platforms, the number of parameters of in() operator are limited
on all platforms, total SQL text size is limited, making impossible for sending down 2000 placeholders for the in params
sending down bind variables of 1000-10k is not possible, as the JDBC driver is having its limitations
The in() approach can be good enough for some cases, but not rocket proof :)
The rocket-proof solution is to pass the arbitrary number of parameters in a separate call (by passing a clob of params, for example), and then have a view (or any other way) to represent them in SQL and use in your where criteria.
A brute-force variant is here http://tkyte.blogspot.hu/2006/06/varying-in-lists.html
However if you can use PL/SQL, this mess can become pretty neat.
function getCustomers(in_customerIdList clob) return sys_refcursor is
begin
aux_in_list.parse(in_customerIdList);
open res for
select *
from customer c,
in_list v
where c.customer_id=v.token;
return res;
end;
Then you can pass arbitrary number of comma separated customer ids in the parameter, and:
will get no parse delay, as the SQL for select is stable
no pipelined functions complexity - it is just one query
the SQL is using a simple join, instead of an IN operator, which is quite fast
after all, it is a good rule of thumb of not hitting the database with any plain select or DML, since it is Oracle, which offers lightyears of more than MySQL or similar simple database engines. PL/SQL allows you to hide the storage model from your application domain model in an effective way.
The trick here is:
we need a call which accepts the long string, and store somewhere where the db session can access to it (e.g. simple package variable, or dbms_session.set_context)
then we need a view which can parse this to rows
and then you have a view which contains the ids you're querying, so all you need is a simple join to the table queried.
The view looks like:
create or replace view in_list
as
select
trim( substr (txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1)
- instr (txt, ',', 1, level) -1 ) ) as token
from (select ','||aux_in_list.getpayload||',' txt from dual)
connect by level <= length(aux_in_list.getpayload)-length(replace(aux_in_list.getpayload,',',''))+1
where aux_in_list.getpayload refers to the original input string.
A possible approach would be to pass pl/sql arrays (supported by Oracle only), however you can't use those in pure SQL, therefore a conversion step is always needed. The conversion can not be done in SQL, so after all, passing a clob with all parameters in string and converting it witin a view is the most efficient solution.
Here's how I solved it in my own application. Ideally, you should use a StringBuilder instead of using + for Strings.
String inParenthesis = "(?";
for(int i = 1;i < myList.size();i++) {
inParenthesis += ", ?";
}
inParenthesis += ")";
try(PreparedStatement statement = SQLite.connection.prepareStatement(
String.format("UPDATE table SET value='WINNER' WHERE startTime=? AND name=? AND traderIdx=? AND someValue IN %s", inParenthesis))) {
int x = 1;
statement.setLong(x++, race.startTime);
statement.setString(x++, race.name);
statement.setInt(x++, traderIdx);
for(String str : race.betFair.winners) {
statement.setString(x++, str);
}
int effected = statement.executeUpdate();
}
Using a variable like x above instead of concrete numbers helps a lot if you decide to change the query at a later time.
I've never tried it, but would .setArray() do what you're looking for?
Update: Evidently not. setArray only seems to work with a java.sql.Array that comes from an ARRAY column that you've retrieved from a previous query, or a subquery with an ARRAY column.
My workaround is:
create or replace type split_tbl as table of varchar(32767);
/
create or replace function split
(
p_list varchar2,
p_del varchar2 := ','
) return split_tbl pipelined
is
l_idx pls_integer;
l_list varchar2(32767) := p_list;
l_value varchar2(32767);
begin
loop
l_idx := instr(l_list,p_del);
if l_idx > 0 then
pipe row(substr(l_list,1,l_idx-1));
l_list := substr(l_list,l_idx+length(p_del));
else
pipe row(l_list);
exit;
end if;
end loop;
return;
end split;
/
Now you can use one variable to obtain some values in a table:
select * from table(split('one,two,three'))
one
two
three
select * from TABLE1 where COL1 in (select * from table(split('value1,value2')))
value1 AAA
value2 BBB
So, the prepared statement could be:
"select * from TABLE where COL in (select * from table(split(?)))"
Regards,
Javier Ibanez
I suppose you could (using basic string manipulation) generate the query string in the PreparedStatement to have a number of ?'s matching the number of items in your list.
Of course if you're doing that you're just a step away from generating a giant chained OR in your query, but without having the right number of ? in the query string, I don't see how else you can work around this.
You could use setArray method as mentioned in this javadoc:
PreparedStatement statement = connection.prepareStatement("Select * from emp where field in (?)");
Array array = statement.getConnection().createArrayOf("VARCHAR", new Object[]{"E1", "E2","E3"});
statement.setArray(1, array);
ResultSet rs = statement.executeQuery();
Here's a complete solution in Java to create the prepared statement for you:
/*usage:
Util u = new Util(500); //500 items per bracket.
String sqlBefore = "select * from myTable where (";
List<Integer> values = new ArrayList<Integer>(Arrays.asList(1,2,4,5));
string sqlAfter = ") and foo = 'bar'";
PreparedStatement ps = u.prepareStatements(sqlBefore, values, sqlAfter, connection, "someId");
*/
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
public class Util {
private int numValuesInClause;
public Util(int numValuesInClause) {
super();
this.numValuesInClause = numValuesInClause;
}
public int getNumValuesInClause() {
return numValuesInClause;
}
public void setNumValuesInClause(int numValuesInClause) {
this.numValuesInClause = numValuesInClause;
}
/** Split a given list into a list of lists for the given size of numValuesInClause*/
public List<List<Integer>> splitList(
List<Integer> values) {
List<List<Integer>> newList = new ArrayList<List<Integer>>();
while (values.size() > numValuesInClause) {
List<Integer> sublist = values.subList(0,numValuesInClause);
List<Integer> values2 = values.subList(numValuesInClause, values.size());
values = values2;
newList.add( sublist);
}
newList.add(values);
return newList;
}
/**
* Generates a series of split out in clause statements.
* #param sqlBefore ""select * from dual where ("
* #param values [1,2,3,4,5,6,7,8,9,10]
* #param "sqlAfter ) and id = 5"
* #return "select * from dual where (id in (1,2,3) or id in (4,5,6) or id in (7,8,9) or id in (10)"
*/
public String genInClauseSql(String sqlBefore, List<Integer> values,
String sqlAfter, String identifier)
{
List<List<Integer>> newLists = splitList(values);
String stmt = sqlBefore;
/* now generate the in clause for each list */
int j = 0; /* keep track of list:newLists index */
for (List<Integer> list : newLists) {
stmt = stmt + identifier +" in (";
StringBuilder innerBuilder = new StringBuilder();
for (int i = 0; i < list.size(); i++) {
innerBuilder.append("?,");
}
String inClause = innerBuilder.deleteCharAt(
innerBuilder.length() - 1).toString();
stmt = stmt + inClause;
stmt = stmt + ")";
if (++j < newLists.size()) {
stmt = stmt + " OR ";
}
}
stmt = stmt + sqlAfter;
return stmt;
}
/**
* Method to convert your SQL and a list of ID into a safe prepared
* statements
*
* #throws SQLException
*/
public PreparedStatement prepareStatements(String sqlBefore,
ArrayList<Integer> values, String sqlAfter, Connection c, String identifier)
throws SQLException {
/* First split our potentially big list into lots of lists */
String stmt = genInClauseSql(sqlBefore, values, sqlAfter, identifier);
PreparedStatement ps = c.prepareStatement(stmt);
int i = 1;
for (int val : values)
{
ps.setInt(i++, val);
}
return ps;
}
}
Spring allows passing java.util.Lists to NamedParameterJdbcTemplate , which automates the generation of (?, ?, ?, ..., ?), as appropriate for the number of arguments.
For Oracle, this blog posting discusses the use of oracle.sql.ARRAY (Connection.createArrayOf doesn't work with Oracle). For this you have to modify your SQL statement:
SELECT my_column FROM my_table where search_column IN (select COLUMN_VALUE from table(?))
The oracle table function transforms the passed array into a table like value usable in the IN statement.
try using the instr function?
select my_column from my_table where instr(?, ','||search_column||',') > 0
then
ps.setString(1, ",A,B,C,");
Admittedly this is a bit of a dirty hack, but it does reduce the opportunities for sql injection. Works in oracle anyway.
Sormula supports SQL IN operator by allowing you to supply a java.util.Collection object as a parameter. It creates a prepared statement with a ? for each of the elements the collection. See Example 4 (SQL in example is a comment to clarify what is created but is not used by Sormula).
Generate the query string in the PreparedStatement to have a number of ?'s matching the number of items in your list. Here's an example:
public void myQuery(List<String> items, int other) {
...
String q4in = generateQsForIn(items.size());
String sql = "select * from stuff where foo in ( " + q4in + " ) and bar = ?";
PreparedStatement ps = connection.prepareStatement(sql);
int i = 1;
for (String item : items) {
ps.setString(i++, item);
}
ps.setInt(i++, other);
ResultSet rs = ps.executeQuery();
...
}
private String generateQsForIn(int numQs) {
String items = "";
for (int i = 0; i < numQs; i++) {
if (i != 0) items += ", ";
items += "?";
}
return items;
}
instead of using
SELECT my_column FROM my_table where search_column IN (?)
use the Sql Statement as
select id, name from users where id in (?, ?, ?)
and
preparedStatement.setString( 1, 'A');
preparedStatement.setString( 2,'B');
preparedStatement.setString( 3, 'C');
or use a stored procedure this would be the best solution, since the sql statements will be compiled and stored in DataBase server
I came across a number of limitations related to prepared statement:
The prepared statements are cached only inside the same session (Postgres), so it will really work only with connection pooling
A lot of different prepared statements as proposed by #BalusC may cause the cache to overfill and previously cached statements will be dropped
The query has to be optimized and use indices. Sounds obvious, however e.g. the ANY(ARRAY...) statement proposed by #Boris in one of the top answers cannot use indices and query will be slow despite caching
The prepared statement caches the query plan as well and the actual values of any parameters specified in the statement are unavailable.
Among the proposed solutions I would choose the one that doesn't decrease the query performance and makes the less number of queries. This will be the #4 (batching few queries) from the #Don link or specifying NULL values for unneeded '?' marks as proposed by #Vladimir Dyuzhev
SetArray is the best solution but its not available for many older drivers. The following workaround can be used in java8
String baseQuery ="SELECT my_column FROM my_table where search_column IN (%s)"
String markersString = inputArray.stream().map(e -> "?").collect(joining(","));
String sqlQuery = String.format(baseSQL, markersString);
//Now create Prepared Statement and use loop to Set entries
int index=1;
for (String input : inputArray) {
preparedStatement.setString(index++, input);
}
This solution is better than other ugly while loop solutions where the query string is built by manual iterations
I just worked out a PostgreSQL-specific option for this. It's a bit of a hack, and comes with its own pros and cons and limitations, but it seems to work and isn't limited to a specific development language, platform, or PG driver.
The trick of course is to find a way to pass an arbitrary length collection of values as a single parameter, and have the db recognize it as multiple values. The solution I have working is to construct a delimited string from the values in the collection, pass that string as a single parameter, and use string_to_array() with the requisite casting for PostgreSQL to properly make use of it.
So if you want to search for "foo", "blah", and "abc", you might concatenate them together into a single string as: 'foo,blah,abc'. Here's the straight SQL:
select column from table
where search_column = any (string_to_array('foo,blah,abc', ',')::text[]);
You would obviously change the explicit cast to whatever you wanted your resulting value array to be -- int, text, uuid, etc. And because the function is taking a single string value (or two I suppose, if you want to customize the delimiter as well), you can pass it as a parameter in a prepared statement:
select column from table
where search_column = any (string_to_array($1, ',')::text[]);
This is even flexible enough to support things like LIKE comparisons:
select column from table
where search_column like any (string_to_array('foo%,blah%,abc%', ',')::text[]);
Again, no question it's a hack, but it works and allows you to still use pre-compiled prepared statements that take *ahem* discrete parameters, with the accompanying security and (maybe) performance benefits. Is it advisable and actually performant? Naturally, it depends, as you've got string parsing and possibly casting going on before your query even runs. If you're expecting to send three, five, a few dozen values, sure, it's probably fine. A few thousand? Yeah, maybe not so much. YMMV, limitations and exclusions apply, no warranty express or implied.
But it works.
No one else seems to have suggested using an off-the-shelf query builder yet, like jOOQ or QueryDSL or even Criteria Query that manage dynamic IN lists out of the box, possibly including the management of all edge cases that may arise, such as:
Running into Oracle's maximum of 1000 elements per IN list (irrespective of the number of bind values)
Running into any driver's maximum number of bind values, which I've documented in this answer
Running into cursor cache contention problems because too many distinct SQL strings are "hard parsed" and execution plans cannot be cached anymore (jOOQ and since recently also Hibernate work around this by offering IN list padding)
(Disclaimer: I work for the company behind jOOQ)
Just for completeness: So long as the set of values is not too large, you could also simply string-construct a statement like
... WHERE tab.col = ? OR tab.col = ? OR tab.col = ?
which you could then pass to prepare(), and then use setXXX() in a loop to set all the values. This looks yucky, but many "big" commercial systems routinely do this kind of thing until they hit DB-specific limits, such as 32 KB (I think it is) for statements in Oracle.
Of course you need to ensure that the set will never be unreasonably large, or do error trapping in the event that it is.
Following Adam's idea. Make your prepared statement sort of select my_column from my_table where search_column in (#)
Create a String x and fill it with a number of "?,?,?" depending on your list of values
Then just change the # in the query for your new String x an populate
There are different alternative approaches that we can use for IN clause in PreparedStatement.
Using Single Queries - slowest performance and resource intensive
Using StoredProcedure - Fastest but database specific
Creating dynamic query for PreparedStatement - Good Performance but doesn't get benefit of caching and PreparedStatement is recompiled every time.
Use NULL in PreparedStatement queries - Optimal performance, works great when you know the limit of IN clause arguments. If there is no limit, then you can execute queries in batch.
Sample code snippet is;
int i = 1;
for(; i <=ids.length; i++){
ps.setInt(i, ids[i-1]);
}
//set null for remaining ones
for(; i<=PARAM_SIZE;i++){
ps.setNull(i, java.sql.Types.INTEGER);
}
You can check more details about these alternative approaches here.
For some situations regexp might help.
Here is an example I've checked on Oracle, and it works.
select * from my_table where REGEXP_LIKE (search_column, 'value1|value2')
But there is a number of drawbacks with it:
Any column it applied should be converted to varchar/char, at least implicitly.
Need to be careful with special characters.
It can slow down performance - in my case IN version uses index and range scan, and REGEXP version do full scan.
After examining various solutions in different forums and not finding a good solution, I feel the below hack I came up with, is the easiest to follow and code:
Example: Suppose you have multiple parameters to pass in the 'IN' clause. Just put a dummy String inside the 'IN' clause, say, "PARAM" do denote the list of parameters that will be coming in the place of this dummy String.
select * from TABLE_A where ATTR IN (PARAM);
You can collect all the parameters into a single String variable in your Java code. This can be done as follows:
String param1 = "X";
String param2 = "Y";
String param1 = param1.append(",").append(param2);
You can append all your parameters separated by commas into a single String variable, 'param1', in our case.
After collecting all the parameters into a single String you can just replace the dummy text in your query, i.e., "PARAM" in this case, with the parameter String, i.e., param1. Here is what you need to do:
String query = query.replaceFirst("PARAM",param1); where we have the value of query as
query = "select * from TABLE_A where ATTR IN (PARAM)";
You can now execute your query using the executeQuery() method. Just make sure that you don't have the word "PARAM" in your query anywhere. You can use a combination of special characters and alphabets instead of the word "PARAM" in order to make sure that there is no possibility of such a word coming in the query. Hope you got the solution.
Note: Though this is not a prepared query, it does the work that I wanted my code to do.
Just for completeness and because I did not see anyone else suggest it:
Before implementing any of the complicated suggestions above consider if SQL injection is indeed a problem in your scenario.
In many cases the value provided to IN (...) is a list of ids that have been generated in a way that you can be sure that no injection is possible... (e.g. the results of a previous select some_id from some_table where some_condition.)
If that is the case you might just concatenate this value and not use the services or the prepared statement for it or use them for other parameters of this query.
query="select f1,f2 from t1 where f3=? and f2 in (" + sListOfIds + ");";
PreparedStatement doesn't provide any good way to deal with SQL IN clause. Per http://www.javaranch.com/journal/200510/Journal200510.jsp#a2 "You can't substitute things that are meant to become part of the SQL statement. This is necessary because if the SQL itself can change, the driver can't precompile the statement. It also has the nice side effect of preventing SQL injection attacks." I ended up using following approach:
String query = "SELECT my_column FROM my_table where search_column IN ($searchColumns)";
query = query.replace("$searchColumns", "'A', 'B', 'C'");
Statement stmt = connection.createStatement();
boolean hasResults = stmt.execute(query);
do {
if (hasResults)
return stmt.getResultSet();
hasResults = stmt.getMoreResults();
} while (hasResults || stmt.getUpdateCount() != -1);
OK, so I couldn't remember exactly how (or where) I did this before so I came to stack overflow to quickly find the answer. I was surprised I couldn't.
So, how I got around the IN problem a long time ago was with a statement like this:
where myColumn in ( select regexp_substr(:myList,'[^,]+', 1, level) from dual connect by regexp_substr(:myList, '[^,]+', 1, level) is not null)
set the myList parameter as a comma delimited string: A,B,C,D...
Note: You have to set the parameter twice!
This is not the ideal practice, yet it's simple and works well for me most of the time.
where ? like concat( "%|", TABLE_ID , "|%" )
Then you pass through ? the IDs in this way: |1|,|2|,|3|,...|

What causes DBD::ODBC::st fetchrow_array failed: st_fetch/SQLFetch and how to avoid it?

EDIT1:
I don't actually get an empty array as stated below. Instead, I get an empty response body because of the following exception:
DBD::ODBC::st fetchrow_array failed: st_fetch/SQLFetch (long truncated DBI attribute LongTruncOk not set and/or LongReadLen too small) (SQL-HY000) [state was HY000 now 01004]
Which I can see there are posts about. I will look at those to see if I can fix this on my on. Will edit if not successful.
First let me start by saying, I do not know Perl well at all. This could be a careless error — I hope it is. I am building a hash from an array that is returned from SQL or from JavaScript on the front-end and one of the keys in the hash, "short-desc" needs to have the value which in the code below will be coming from a SQL database.
BFHHOTH 15x24S/S +2 UP-HNDWHL-UNSPOKED-GALV 15"x24" flush escape hatch w/hinge, internal handwheel, T-handle on top steel cover and ring
However with the code (removed unnecessary cases from switch):
#!perl
use Switch;
use DBI;
use JSON;
use CGI qw /param/;
use CGI::Carp qw(fatalsToBrowser);
use IO::Compress::Gzip qw(gzip $GzipError);
use URI::Encode;
use URI::Escape;
my $gzip_ok;
my $accept_encoding = $ENV{HTTP_ACCEPT_ENCODING};
if ( $accept_encoding && $accept_encoding =~ /\bgzip\b/ ) {
# $gzip_ok = 1;
}
print "Content-Type: application/json\n";
if ($gzip_ok) {
print "Content-Encoding: gzip\n";
}
print "\n";
my $action = param('ACTION');
my %jsonData;
my #jsonArray;
my $azDSN = DBI->connect('dbi:ODBC:Driver={SQL Server Native Client 10.0};Server=myServer;Database=myDB;Uid=me;Pwd={myPass};Encrypt=yes;Connection Timeout=30;');
switch ($action) {
case "GETINFO" {
my $paramID = param('ID');
getInfo($paramID);
my $json_text = JSON->new->pretty->utf8->encode( \#jsonArray );
if ($gzip_ok) {
my $zipText;
gzip \$json_text, \$zipText,
or die "gzip failed: $GzipError\n";
print $zipText;
}
else {
print $json_text;
}
}
}
sub getInfo {
my $myID = $_[0];
my $statement = <<"SQL";
SELECT
trefQuoteItemsID,
quote_position,
description,
comments
FROM
myDB.dbo.myTable where tID = $myID;
SQL
my $sti = $azDSN->prepare($statement) or die $statement;
$sti->execute() or die $DBI::errstr;
while ( my #row = $sti->fetchrow_array ) {
my %tempData;
%tempData = (
"tref" => $row[0],
"position" => $row[1],
"short_desc" => $row[2],
"comments" => $row[3]
);
$jsonArray[$count] = {%tempData};
$count++;
}
}
An empty array is returned to me on the front-end.
Oddly, if the string is:
BFHHOTH 15x24S/S +2 UP-HNDWHL-UNSPOKED-
the array contains the correct object.
But empty again if the string is:
BFHHOTH 15x24S/S +2 UP-HNDWHL-UNSPOKED-G
Have also tried with strings:
qwertyuiopasdfghjklzxcvbnm1234567890qwe #length is 39
which lets the hash gets built and:
qwertyuiopasdfghjklzxcvbnm1234567890qwer #length is 40
which will return an empty array so hash doesn't get built.
Are there any Perl gurus who have any suggestions?
From what I can tell, it's related to "long object" columns.
DBD::ODBC says:
You can retrieve a lob in chunks like this:
$sth->bind_col($column, undef, {TreatAsLOB=>1});
while(my $retrieved = $sth->odbc_lob_read($column, \my $data, $length)) {
print "retrieved=$retrieved lob_data=$data\n";
}
NOTE: to retrieve a lob like this you must first bind the lob column specifying BindAsLOB or DBD::ODBC will 1) bind the column as normal and it will be subject to LongReadLen and b) fail odbc_lob_read.
NOTE: Some database engines and ODBC drivers do not allow you to retrieve columns out of order (e.g., MS SQL Server unless you are using cursors). In those cases you must ensure the lob retrieved is the last (or only) column in your select list.
NOTE: You can retrieve only part of a lob but you will probably have to call finish on the statement handle before you do anything else with that statement. When only retrieving part of a large lob you could see a small delay when you call finish as some protocols used by ODBC drivers send the lob down the socket synchronously and there is no way to stop it (this means the ODBC driver needs to read all the lob from the socket even though you never retrieved it all yourself).
NOTE: If your select contains multiple lobs you cannot read part of the first lob, the second lob then return to the first lob. You must read all lobs in order and completely or read part of a lob and then do no further calls to odbc_lob_read.
There's no mention if you can retrieve a lob any other way, so I don't know if you have to do it this way, but at least you know one way. But I believe you can work around the problem by increasing the connection's LongReadLen attribute.
You should be able to set the attribute as follows:
my $dbh = DBI->connect($dsn, $user, $passwd, {
...,
LongReadLen => ...,
});
You should also be able to set the attribute as follows:
$dbh->{LongReadLen} = ...;
Hopefully, someone can give you a better answer.

mysql2sqlite.sh script is not working as required

I am using mysql2sqlite.sh from script Github to change my mysql database to sqlite. But the problem i am getting is that in my table the data 'E-001' gets changed to 'E?001'.
I have no idea how to modify the script to get the required result. Please help me.
the script is
#!/bin/sh
# Converts a mysqldump file into a Sqlite 3 compatible file. It also extracts the MySQL `KEY xxxxx` from the
# CREATE block and create them in separate commands _after_ all the INSERTs.
# Awk is choosen because it's fast and portable. You can use gawk, original awk or even the lightning fast mawk.
# The mysqldump file is traversed only once.
# Usage: $ ./mysql2sqlite mysqldump-opts db-name | sqlite3 database.sqlite
# Example: $ ./mysql2sqlite --no-data -u root -pMySecretPassWord myDbase | sqlite3 database.sqlite
# Thanks to and #artemyk and #gkuenning for their nice tweaks.
mysqldump --compatible=ansi --skip-extended-insert --compact "$#" | \
awk '
BEGIN {
FS=",$"
print "PRAGMA synchronous = OFF;"
print "PRAGMA journal_mode = MEMORY;"
print "BEGIN TRANSACTION;"
}
# CREATE TRIGGER statements have funny commenting. Remember we are in trigger.
/^\/\*.*CREATE.*TRIGGER/ {
gsub( /^.*TRIGGER/, "CREATE TRIGGER" )
print
inTrigger = 1
next
}
# The end of CREATE TRIGGER has a stray comment terminator
/END \*\/;;/ { gsub( /\*\//, "" ); print; inTrigger = 0; next }
# The rest of triggers just get passed through
inTrigger != 0 { print; next }
# Skip other comments
/^\/\*/ { next }
# Print all `INSERT` lines. The single quotes are protected by another single quote.
/INSERT/ {
gsub( /\\\047/, "\047\047" )
gsub(/\\n/, "\n")
gsub(/\\r/, "\r")
gsub(/\\"/, "\"")
gsub(/\\\\/, "\\")
gsub(/\\\032/, "\032")
print
next
}
# Print the `CREATE` line as is and capture the table name.
/^CREATE/ {
print
if ( match( $0, /\"[^\"]+/ ) ) tableName = substr( $0, RSTART+1, RLENGTH-1 )
}
# Replace `FULLTEXT KEY` or any other `XXXXX KEY` except PRIMARY by `KEY`
/^ [^"]+KEY/ && !/^ PRIMARY KEY/ { gsub( /.+KEY/, " KEY" ) }
# Get rid of field lengths in KEY lines
/ KEY/ { gsub(/\([0-9]+\)/, "") }
# Print all fields definition lines except the `KEY` lines.
/^ / && !/^( KEY|\);)/ {
gsub( /AUTO_INCREMENT|auto_increment/, "" )
gsub( /(CHARACTER SET|character set) [^ ]+ /, "" )
gsub( /DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP|default current_timestamp on update current_timestamp/, "" )
gsub( /(COLLATE|collate) [^ ]+ /, "" )
gsub(/(ENUM|enum)[^)]+\)/, "text ")
gsub(/(SET|set)\([^)]+\)/, "text ")
gsub(/UNSIGNED|unsigned/, "")
if (prev) print prev ","
prev = $1
}
# `KEY` lines are extracted from the `CREATE` block and stored in array for later print
# in a separate `CREATE KEY` command. The index name is prefixed by the table name to
# avoid a sqlite error for duplicate index name.
/^( KEY|\);)/ {
if (prev) print prev
prev=""
if ($0 == ");"){
print
} else {
if ( match( $0, /\"[^"]+/ ) ) indexName = substr( $0, RSTART+1, RLENGTH-1 )
if ( match( $0, /\([^()]+/ ) ) indexKey = substr( $0, RSTART+1, RLENGTH-1 )
key[tableName]=key[tableName] "CREATE INDEX \"" tableName "_" indexName "\" ON \"" tableName "\" (" indexKey ");\n"
}
}
# Print all `KEY` creation lines.
END {
for (table in key) printf key[table]
print "END TRANSACTION;"
}
'
exit 0
I can't give a guaranteed solution, but here's a simple technique I've been using successfully to handle similar issues (See "Notes", below). I've been wrestling with this script the last few days, and figure this is worth sharing in case there are others who need to tweak it but are stymied by the awk learning curve.
The basic idea is to have the script output to a text file, edit the file, then import into sqlite (More detailed instructions below).
You might have to experiment a bit, but at least you won't have to learn awk (though I've been trying and it's pretty fun...).
HOW TO
Run the script, exporting to a file (instead of passing directly
to sqlite3):
./mysql2sqlite -u root -pMySecretPassWord myDbase > sqliteimport.sql
Use your preferred text editing technique to clean up whatever mess
you've run into. For example, search/replace in sublimetext. (See the last note, below, for a tip.)
Import the cleaned up script into sqlite:
sqlite3 database.sqlite < sqliteimport.sql
NOTES:
I suspect what you're dealing with is an encoding problem -- that '-' represents a character that isn't recognized by, or means something different to, either your shell, the script (awk), or your sqlite database. Depending on your situation, you may not be able to finesse the problem (see the next note).
Be forewarned that this is most likely only going to work if the offending characters are embedded in text data (not just as text, but actual text content stored in a text field). If they're in a machine name (foreign key field, entity id, e.g.), binary data stored as text, or text data stored in a binary field (blob, eg), be careful. You could try it, but don't get your hopes up, and even if it seems to work be sure to test the heck out of it.
If in fact that '-' represents some unusual character, you probably won't be able to just type a hyphen into the 'search' field of your search/replace tool. Copy it from the source data (eg., open the file, highlight and copy to clipboard) then paste into the tool.
Hope this helps!
To convert mysql to sqlite3 you can use Navicom Premium.

Querying multiple times in Oracle using perl returns only the first query

Note: I have corrected the variable differences and it does print the query from the first set but it returns nothing from the second set. If I use the second set only it works.
In the code below, I have some_array which is array of array the array contains text like name. So
#some_array= ([sam, jon, july],[Mike, Han,Tommy],[angie, sita, lanny]);
Now when I querying the list like 'sam jon july' first and 'mike han tommy' . Only the execute return the result from the first list others is undef. I don't know why any help will be appreciated.
my $pointer;
my $db = $db->prepare_cached("
begin
:pointer := myFun(:A1);
end;
") or die "Couldn't prepare stat: " . $db->errstr;
$db->bind_param_inout(":pointer",\$pointer,0,{ ora_type => ORA_RSET });
for (my $i=0; $i < #some_array ; $i++) {
my #firstarray = #{$some_array[$i]};
my $sql = lc(join(" ", #firstarray));
print "<pre>$sql</pre>\n";
$db->bind_param(":A1",$sql);
$db->execute();
print "<pre>".Dumper($db->execute())."</pre>\n";
}
Just like everyone told you on the last question you asked, initialize your array with parentheses, not nested brackets.
#some_array= ([sam, jon, july],[Mike, Han,Tommy],[angie, sita, lanny])
not
#some_array= [[sam, jon, july],[Mike, Han,Tommy],[angie, sita, lanny]]
You would also benefit tremendously from including
use strict;
use warnings;
at the top of all of your programs. That would catch the strange way you are trying to initialize #some_array, and it would catch your inconsistent usage of #sql and #query. update and $sdh and $db and $dbh.

array deccleration

I have written a code in php to connect and insert into a MSSQL database. i don't know much in php because am new to this. i used odbc to connect database.user can enter his details through the form. after submitting the form the details are getting stored into a database.
while inserting rows into a database am not trying to insert duplicate values . for this i have given if conditions.these conditions are able to notice the user cname and name exist in the database if the same name exist. but the else part after these conditions not working i.e rows are not getting inserted. i put everything inside the while loop. how can i correct it?
this is my code written in php
$connect = odbc_connect('ServerDB','sa', 'pwd');//connects database
$query2="select count(*) from company";//this is needer for loop through
$result2=odbc_exec($connect,$query2);
while(odbc_fetch_row($result2));
{
$count=odbc_result($result2,1);
echo "</br>","$count";
}
$query1="select * from company";
$result1 = odbc_exec($connect, $query1);
# fetch the data from the database
$index=0;
while(odbc_fetch_row($result1))
{
$compar[$count] = odbc_result($result1, 1);
$namearray[$count] = odbc_result($result1, 2);
if($compar[$count]==$_POST['cname'])
{
echo "<script> alert(\"cname Exists\") </script>";
}
else if($namearray[$count]==$_POST['name'])
{
echo "<script> alert(\"Name Exists\") </script>";
}
else {
$query=("INSERT INTO company(cname,name) VALUES ('$_POST[cname]','$_POST[name]') ");
$result = odbc_exec($connect, $query);
echo "<script> alert(\"Row Inserted\") </script>";
} }
You generally don't allocate memory in PHP. The PHP interpreter handles it for you. Just go ahead and assign elements to your array and all the memory allocation is taken care of for you.
$index = 0;
Then you do $index++ until the last index is reached.
So index further on stays e.g. 12
So anywhere you can only access the index [12]
In PHP arrays are implemented as what would be a HashMap in Java. It will be sized automatically depending on what you put into it, and there is no way you can give an initial capacity.

Resources