I'm connecting to a large PostgreSQL database (300 million records) with libdbi, running a SELECT * query and printing the result row by row. Memory and swap fill up completely, so it seems autocommit is enabled and the whole result set is loaded into memory. Is there any option to disable autocommit, or at least to hold cursors over commit like ResultSet.HOLD_CURSORS_OVER_COMMIT in Java? I didn't find any such option for dbi_conn_set_option. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <dbi/dbi.h>

int main(void)
{
    dbi_conn conn;
    dbi_result result;
    int64_t id;

    dbi_initialize(NULL);
    conn = dbi_conn_new("pgsql");
    if (conn == NULL)
    {
        printf("connection error.\n");
        return EXIT_FAILURE;
    }
    dbi_conn_set_option(conn, "host", "127.0.0.1");
    dbi_conn_set_option(conn, "username", "postgres");
    dbi_conn_set_option(conn, "password", "123456");
    dbi_conn_set_option(conn, "dbname", "backup");
    if (dbi_conn_connect(conn) < 0)
    {
        printf("could not connect to database.\n");
        return EXIT_FAILURE;
    }
    result = dbi_conn_query(conn, "SELECT * FROM tbl");
    if (result)
    {
        while (dbi_result_next_row(result))
        {
            id = dbi_result_get_longlong(result, "_id");
            printf("This is _id: %" PRId64 "\n", id);
        }
        dbi_result_free(result);
    }
    dbi_conn_close(conn);
    dbi_shutdown();
    return EXIT_SUCCESS;
}
It's not related to autocommit.
The solution to the out-of-memory problem is indeed to use a cursor and fetch N rows at a time instead of all rows in one step.
libdbi doesn't provide an abstraction for SQL cursors, so this has to be done with plain SQL queries.
The PostgreSQL documentation's page on FETCH has a complete sequence of queries in its example showing how it's done. You need to issue these queries through libdbi in C with two loops: an outer loop calling FETCH N FROM cursor_name until there's nothing left to fetch, and an inner loop processing the rows of that FETCH, just like your current code processes the results of the SELECT * itself.
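Here is a minimal sketch of that pattern with libdbi. It assumes the conn from your code; the cursor name my_cursor and the batch size of 10000 are arbitrary choices, not anything mandated by libdbi. Note that PostgreSQL cursors only exist inside a transaction.
dbi_result res;

/* cursors are only valid inside a transaction */
res = dbi_conn_query(conn, "BEGIN");
if (res) dbi_result_free(res);
res = dbi_conn_query(conn, "DECLARE my_cursor CURSOR FOR SELECT * FROM tbl");
if (res) dbi_result_free(res);

for (;;)
{
    /* outer loop: fetch the next batch of rows */
    res = dbi_conn_query(conn, "FETCH 10000 FROM my_cursor");
    if (res == NULL || dbi_result_get_numrows(res) == 0)
    {
        /* an error occurred or there is nothing left to fetch */
        if (res) dbi_result_free(res);
        break;
    }
    /* inner loop: process the rows of this batch */
    while (dbi_result_next_row(res))
    {
        int64_t id = dbi_result_get_longlong(res, "_id");
        printf("This is _id: %" PRId64 "\n", id);
    }
    dbi_result_free(res);
}

res = dbi_conn_query(conn, "CLOSE my_cursor");
if (res) dbi_result_free(res);
res = dbi_conn_query(conn, "COMMIT");
if (res) dbi_result_free(res);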
I need to find the number of pages scanned by each operator in a SQL Server query plan.
I used
SET STATISTICS IO ON
This returned the number of logical, physical, and read-ahead pages scanned per table,
but I need it per operator.
Moreover, I am unable to read the IO messages from JDBC driver programs, and
I have more than 100 queries to execute while recording the number of pages read by each operator.
Is there any method to at least get the number of pages per table that can be accessed from JDBC driver programs,
OR
is there any flag that can be set to get the number of pages scanned in the XML plan itself?
Regarding the IO messages: informational and warning messages generated during query execution can be retrieved in JDBC with getWarnings(). Below is a PreparedStatement example.
try (
    Connection con = DriverManager.getConnection("jdbc:sqlserver://yourserver:1433;databaseName=AdventureWorks;user=youruserid;password=y0urp#ssw0rd;");
    PreparedStatement ps = con.prepareStatement("SET STATISTICS IO ON;SELECT * FROM Person.Person WHERE BusinessEntityID = ?;");
) {
    ps.setInt(1, 1);
    // consume all result sets so the driver reads through to the messages
    ResultSet rs = ps.executeQuery();
    while (rs != null) {
        while (rs.next()) { /* discard rows */ }
        rs.close();
        rs = ps.getMoreResults() ? ps.getResultSet() : null;
    }
    // get info and warning messages (including STATISTICS IO messages)
    SQLWarning w = ps.getWarnings();
    while (w != null) {
        System.out.println(w.getMessage());
        w = w.getNextWarning();
    }
} catch (SQLException e) {
    throw e; // or handle/log as appropriate
}
Your other questions are better answered in your dba.stackexchange question.
I'm using R to do a statistical analysis on a SQL Server 2008 R2 database. My database client (driver) is JDBC, so I'm using the RJDBC package.
My query is pretty simple, and I'm sure it returns a lot of rows (about 2 million).
SELECT * FROM [maindb].[dbo].[users]
My R script is as follows.
library(RJDBC);
javaPackageName <- "com.microsoft.sqlserver.jdbc.SQLServerDriver";
clientJarFile <- "/home/abforce/mystuff/sqljdbc_3.0/enu/sqljdbc4.jar";
driver <- JDBC(javaPackageName, clientJarFile);
conn <- dbConnect(driver, "jdbc:sqlserver://192.168.56.101", "username", "password");
query <- "SELECT * FROM [maindb].[dbo].[users]";
result <- dbSendQuery(conn, query);
dbHasCompleted(result)
In the code above, the last line always returns TRUE. What could be wrong here?
dbHasCompleted always returning TRUE seems to be a known issue; I've found other places on the Internet where people were struggling with it.
So I came up with a workaround: instead of dbHasCompleted, we can check nrow(chunk) == 0 on each fetched chunk.
For example:
result <- dbSendQuery(conn, query);
repeat {
chunk <- dbFetch(result, n = 10);
if(nrow(chunk) == 0){
break;
}
# Do something with 'chunk';
}
dbClearResult(result);
I have a multi-statement stored procedure that first performs a select and then raises an error if certain conditions are met.
However, the RAISERROR in the stored procedure doesn't cause a JDBC SQLException as I expect.
If I remove the SELECT, then it works fine. The same behavior occurs with the PRINT statement.
I have several other ways to handle this, but for future reference I was wondering whether there is a way to check if raised errors exist.
The way the SQL Server protocol works, you first need to process the result set produced by the SELECT, and then move to the next result to get the exception.
To process all results (result sets, update counts, and exceptions), you need to do something like:
CallableStatement cstmt = ...;
boolean isResultSet = cstmt.execute();
do {
    if (isResultSet) {
        // process result set
        try (ResultSet rs = cstmt.getResultSet()) {
            while (rs.next()) {
                // ...
            }
        }
    } else {
        int updateCount = cstmt.getUpdateCount();
        if (updateCount == -1) {
            // -1 when isResultSet == false means: no more results
            break;
        } else {
            // do something with the update count
        }
    }
    isResultSet = cstmt.getMoreResults();
} while (true);
When the execution of the stored procedure reaches the error, this loop will also report the exception to your Java application (if I recall correctly, it is thrown from getMoreResults()).
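For illustration, here is a minimal self-contained sketch of where the exception surfaces; the procedure name raise_after_select, the connection URL, and the credentials are placeholders, not anything from your setup.
import java.sql.*;

public class RaiseAfterSelectDemo {
    public static void main(String[] args) {
        // placeholder URL/credentials; raise_after_select is a hypothetical
        // procedure that first SELECTs and then executes RAISERROR
        String url = "jdbc:sqlserver://localhost:1433;databaseName=test;user=sa;password=secret;";
        try (Connection con = DriverManager.getConnection(url);
             CallableStatement cstmt = con.prepareCall("{call raise_after_select}")) {
            boolean isResultSet = cstmt.execute();
            do {
                if (isResultSet) {
                    try (ResultSet rs = cstmt.getResultSet()) {
                        while (rs.next()) { /* consume rows */ }
                    }
                } else if (cstmt.getUpdateCount() == -1) {
                    break; // no more results
                }
                // with a trailing RAISERROR, the SQLException is typically
                // thrown from this call once the error is reached
                isResultSet = cstmt.getMoreResults();
            } while (true);
        } catch (SQLException e) {
            // the RAISERROR from the procedure arrives here as an SQLException
            System.err.println("Procedure raised: " + e.getMessage());
        }
    }
}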
I have an MS SQL Server stored procedure (SQL Server 2012) which returns:
a return value describing the overall result of the procedure execution (successful or not), via a RETURN @RetCode statement (@RetCode is of type int)
one result set (several records with 5 fields each)
another result set (several records with 3 fields each)
I call this procedure from my Groovy (and Java) code using Java's CallableStatement, and I cannot find the right way to handle all three outputs.
My last attempt is
CallableStatement proc = connection.prepareCall("{ ? = call Procedure_Name($param_1, $param_2)}")
proc.registerOutParameter(1, Types.INTEGER)
boolean result = proc.execute()
int returnValue = proc.getInt(1)
println(returnValue)
while(result){
ResultSet rs = proc.getResultSet()
println("rs")
result = proc.getMoreResults()
}
And now I get an exception:
Output parameters have not yet been processed. Call getMoreResults()
I tried several approaches over a few hours but didn't find the correct one; the others produced different exceptions.
Could you please help me with the issue?
Thanks in advance!
Update (for Tim):
I see rs when I launch this code:
Connection connection = dbGetter().connection
CallableStatement proc = connection.prepareCall("{ ? = call Procedure_Name($param_1, $param_2)}")
boolean result = proc.execute()
while(result){
ResultSet rs = proc.getResultSet()
println(rs)
result = proc.getMoreResults()
}
I see rs as an object: net.sourceforge.jtds.jdbc.JtdsResultSet@1937bc8
I changed my code, and it now inserts 30,000 rows per minute, but that is still too slow. Could anybody give me another idea for how to improve the speed?
Connection connection = pooledConnection.getConnection();
connection.setAutoCommit(false);
int batchCount = 0;
Statement st = connection.createStatement();
for (/* condit: loop over the rows to insert */) {
    st.addBatch(sql);
    batchCount++;
    if (batchCount >= 10000) {
        st.executeBatch();
        connection.commit();
        st.clearBatch();
        batchCount = 0;
    }
}
Since you are using Statement instead of PreparedStatement, it is likely that DB2 is doing a prepare for each of your insert statements. Doing the prepare once, instead of thousands or millions of times, will save you a significant amount of CPU time.
In order to improve the speed, you should use a single SQL statement with parameter markers and set those parameters for each row.
I'm assuming that in your example, you must be building the SQL somehow for each row. If I'm wrong, and you're using the same insert values for each row, you can skip setting the parameter values, and it will be even faster.
So for my suggested change, it would look something like this (I'm assuming this is Java):
String sql = "INSERT INTO TBL (COLS...) VALUES (?,?...)";
Connection connection = pooledConnection.getConnection();
connection.setAutoCommit(false);
int batchCount = 0;
PreparedStatement ps = connection.prepareStatement(sql);
for (MyObject object : objectList /* condit??? */) {
    ps.setString(1, object.getVal1());
    ps.setString(2, object.getVal2());
    ps.addBatch();
    batchCount++;
    if (batchCount >= 10000) {
        ps.executeBatch();
        connection.commit();
        batchCount = 0;
    }
}
/* make sure you add this to flush the last batch if it's not exactly 10k rows */
if (batchCount > 0) {
    ps.executeBatch();
    connection.commit();
}