Is it possible to iterate over a result set and if a condition is met, delete the current row?
i.e. something like
int rc;
sqlite3_stmt* statement;
sqlite3_exec(db, "BEGIN", 0, 0, 0);
sqlite3_prepare_v2(db, "SELECT id,status,filename,del FROM mytable", -1, &statement, NULL);
rc = sqlite3_step(statement);
while (rc == SQLITE_ROW){
    int id = sqlite3_column_int(statement, 0);
    int status = sqlite3_column_int(statement, 1);
    const unsigned char* filename = sqlite3_column_text(statement, 2);
    int del = sqlite3_column_int(statement, 3);
    if (status == 0 || del > 0){
        int rc = unlink((const char*)filename);
        if (rc == 0) {
            // Now delete the current row
        } else {
            // unlink failed, find out why, try again or ... ?
        }
    }
    rc = sqlite3_step(statement);
}
sqlite3_finalize(statement);
sqlite3_exec(db, "COMMIT", 0, 0, 0);
I could just call a single sql statement to delete all rows that match the criteria, but I don't want to do that if for some reason the unlink fails.
Can I call an operation to delete the current row?
EDIT:
So there is a special column called rowid. Do I just add that as a column in the previous statement and create another statement like "delete from table where rowid=?" and pass in the current rowid?
That should work right? Is this the best way of going about it?
That will work, but it's probably not the most efficient approach. If you're doing this for thousands of rows or more, you should consider doing one (or a combination) of the following:
Change your query to only consider rows whose del is > 0 (SELECT id,status,filename,del FROM mytable WHERE del > 0). You're performing a table scan with your current method, which you should always try to avoid. Also make sure you have an index on the del column.
Build up an intermediary array of row ids, and then perform a single delete of the form DELETE FROM mytable WHERE id IN (...), building the IN list from your collected ids (either one ? placeholder per id, or the integer ids written directly into the statement); see the sketch after this list. Based on the number of rows you're dealing with, you could perform the delete in batches (batch sizes of 1000, 5000, etc.); since it's SQLite, tune to the device you're running on.
Register a custom SQLite function at connection creation time using the form:
void deleteFileFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) {
    assert(argc == 1);
    const char * fileName = (const char *)sqlite3_value_text(argv[0]);
    int rc = unlink(fileName);
    sqlite3_result_int(context, rc);
}
sqlite3_create_function(db, "delete_file", 1, SQLITE_UTF8, NULL, deleteFileFunc, NULL, NULL);
and then change your database query to the form DELETE FROM mytable WHERE del > 0 AND delete_file(filename) == 0. The row will only be deleted if the delete succeeds, and you don't need to iterate over the result set. SQLite 3 create function page: http://www.sqlite.org/c3ref/create_function.html
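For the second option, a rough sketch of building the IN list might look like the following; deleted_ids and ndeleted are hypothetical names for the ids you collected after successful unlinks, and error checking and buffer growth are omitted:
/* Sketch only: deleted_ids/ndeleted are assumed to hold the ids of rows
   whose files were already unlinked successfully. Real code should grow
   the buffer or delete in fixed-size batches. */
char sql[4096];
int off = snprintf(sql, sizeof(sql), "DELETE FROM mytable WHERE id IN (");
for (int i = 0; i < ndeleted; i++)
    off += snprintf(sql + off, sizeof(sql) - off, "%s%d", i ? "," : "", deleted_ids[i]);
snprintf(sql + off, sizeof(sql) - off, ")");
sqlite3_exec(db, sql, NULL, NULL, NULL);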
It's OK to delete the row directly. The documentation says that deleting a row which has already been read does not interfere with the running SELECT, but the result is undefined if a row that is expected to be read later is deleted.
https://www.sqlite.org/isolation.html
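To make that concrete, here is a minimal sketch of the rowid approach from the edit, using a second prepared DELETE statement; the statement names and the omitted error handling are my own simplifications, and the delete only ever targets the row that was just read, which the isolation page above says is safe:
sqlite3_stmt *sel, *del;
sqlite3_prepare_v2(db, "SELECT rowid, status, filename, del FROM mytable", -1, &sel, NULL);
sqlite3_prepare_v2(db, "DELETE FROM mytable WHERE rowid = ?", -1, &del, NULL);
while (sqlite3_step(sel) == SQLITE_ROW) {
    sqlite3_int64 rowid = sqlite3_column_int64(sel, 0);
    int status = sqlite3_column_int(sel, 1);
    const char *filename = (const char *)sqlite3_column_text(sel, 2);
    int delflag = sqlite3_column_int(sel, 3);
    if ((status == 0 || delflag > 0) && unlink(filename) == 0) {
        sqlite3_bind_int64(del, 1, rowid);
        sqlite3_step(del);   /* delete the row we just read */
        sqlite3_reset(del);  /* make the DELETE reusable for the next match */
    }
}
sqlite3_finalize(del);
sqlite3_finalize(sel);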
I am using FreeTDS to process simple SELECT statements.
My problem is that I cannot get more than the first 4096 bytes of a large column value.
Let's say we have a table like this:
CREATE TABLE tab (
largecol varbinary(max),
othercol int PRIMARY KEY
);
My code looks like this (simplified and omitting error checks):
#include <sybfront.h>
#include <sybdb.h>
int main ()
{
    DBPROCESS *dbproc;
    LOGINREC *login;
    char *data;
    DBINT len;

    /* setup */
    dbinit();
    login = dblogin();
    DBSETLUSER(login, "username");
    DBSETLPWD(login, "password");
    DBSETLAPP(login, "my_program");
    DBSETLPACKET(login, 10000);
    DBSETLNATLANG(login, "us_english");
    DBSETLCHARSET(login, "UTF-8");

    /* connect */
    dbproc = dbopen(login, "hostname");
    dbuse(dbproc, "dbname");

    /* execute query */
    dbcmd(dbproc, "SELECT largecol, othercol FROM tab");
    dbsqlexec(dbproc);
    dbresults(dbproc);

    /* retrieve result */
    dbnextrow(dbproc);
    data = (char *)dbdata(dbproc, 1);
    len = dbdatlen(dbproc, 1);

    /* more processing */
}
Now no matter how large the data in largecol are, I never get more than 4096 bytes in data and len.
The only lead I have to make this work is the dbreadtext function, but I don't understand how to use it. The only bit of information I get is:
Use dbreadtext instead of dbnextrow to read SQLTEXT and SQLIMAGE values.
That function does not take a column number as argument, so I have no idea how to use it. Can it only be used with queries that retrieve only a single column?
How can I retrieve large column data?
FreeTDS has an option, text size, which can be set in freetds.conf:
See table 3.3, text size:
https://www.freetds.org/userguide/freetdsconf.html
Give that a try?
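For reference, a minimal freetds.conf excerpt might look like this; the section placement and the value here are illustrative, so check the table linked above for the exact semantics and default:
[global]
        # raise the limit so large text/image/varbinary(max) values come back whole
        text size = 2147483647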
As a sort of exercise, I'm seeing how fast I can insert bulk records into SQLite. The data set is about 50MB and contains 1M rows. Here is what I currently have:
sqlite3 *db;
int rc = sqlite3_open("MyDB.db", &db);
sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);
char* sql_buffer = malloc(200 * sizeof(char));
for (int i=0; item=row[i]; i++) {
    snprintf(sql_buffer, 200, "insert into myTable (id, format, size) VALUES (%d, '%s', %d)", item.id, item.format, item.size);
    rc = sqlite3_exec(db, sql_buffer, NULL, NULL, NULL);
}
sqlite3_exec(db, "COMMIT TRANSACTION", NULL, NULL, NULL);
Doing the above 1M inserts, it takes 3.39s. About 90% of that time is the SQLite inserts and 10% is the snprintf function. I tried the following to see if it would increase speed:
Committing after every 10K, 50K, or 100K inserts, instead of only once at the end (1M)
Writing to memory instead of a file.
Changing various pragmas, for example: PRAGMA cache_size = 400000; PRAGMA synchronous = OFF; PRAGMA journal_mode = OFF;...
None of those seemed to make more than about a 0.1s difference.
Are there any further ways that I could increase the insert speed here? If we assume the file is "parsed" and cannot just be loaded directly from something like a csv file, could it theoretically be possible to insert 1M rows in under 1s? If not, what is the limitation in doing something like that?
Appreciate that with your current approach, inserting 1 million rows means executing 1 million separate INSERT statements against SQLite. Instead, you could try using one of the following two approaches. For more recent versions of SQLite:
INSERT INTO myTable (id, format, size)
VALUES
(%d, '%s', %d),
(%d, '%s', %d),
(%d, '%s', %d),
... (more rows)
For earlier versions of SQLite, you may use an INSERT INTO ... SELECT construct:
INSERT INTO myTable (id, format, size)
SELECT %d, '%s', %d UNION ALL
SELECT %d, '%s', %d UNION ALL
... (more rows)
The basic idea here is that you can try just making a single insert call to SQLite with all of your data, instead of inserting one row at a time.
Not a C person, but here is how you might build the insert string from your C code:
const int MAX_BUF = 1000; // make this as large as is needed
char* sql_buffer = malloc(MAX_BUF * sizeof(char));
int length = 0;
length += snprintf(sql_buffer+length, MAX_BUF-length, "INSERT INTO myTable (id, format, size) VALUES");
for (int i=0; item=row[i]; i++) {
    // note the comma separating each value tuple after the first
    length += snprintf(sql_buffer+length, MAX_BUF-length, "%s (%d, '%s', %d)",
                       i == 0 ? "" : ",", item.id, item.format, item.size);
}
rc = sqlite3_exec(db, sql_buffer, NULL, NULL, NULL);
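Keep in mind that SQLite caps the size of a single SQL statement (and, for the UNION ALL form, the number of terms in a compound SELECT), so for a million rows you would still build and execute these combined inserts in batches rather than as one giant statement; see https://www.sqlite.org/limits.html.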
I am trying to check if the primary-key (entered manually by the user) exists already in the SQLite database or not (in order to decide whether to proceed with an insertion of a new record or to do an update for an existent one).
I've tried:
query.exec();
query.isEmpty();
and below I'm trying: query.isNull();
However, they all give me the same result: they all say the record doesn't exist (return 0) and go to the insertion function. They return 0 even if the ref_no does exist.
Here is my code for isNull() function:
int DatabaseManager::checkRefNoExist(QString ref_no){
    QSqlQuery query;
    query.prepare("SELECT * FROM basic_info WHERE ref_no = :ref_no");
    query.bindValue(":ref_no", ref_no);
    query.exec();
    if(query.isNull(ref_no.toInt())){
        return 0; // whatever the ref_no is, it always comes here !!
    } else {
        return 1;
    }
}
You are using ref_no.toInt() as column index. This does not make sense.
To check whether the query returned any result row, try to fetch the first result (with query.first() or query.next()).
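For example, here is a minimal sketch of that check, reusing the table and binding from the question; the SELECT 1 form and the bare-bones error handling are my own simplifications:
int DatabaseManager::checkRefNoExist(QString ref_no){
    QSqlQuery query;
    query.prepare("SELECT 1 FROM basic_info WHERE ref_no = :ref_no");
    query.bindValue(":ref_no", ref_no);
    if (!query.exec())
        return 0;                 // query failed; treat as not found
    return query.next() ? 1 : 0;  // next() succeeds only if a row came back
}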
I changed my code and it now inserts 30,000 rows/min, but that is still too slow. Can anybody give me another idea how to improve the speed?
Connection connection = poolledConnection.getConnection();
connection.setAutoCommit(false);
int bathcount = 0;
Statement st = connection.createStatement();
for (condit){
    st.addBatch(sql);
    if (bathcount >= 10000){
        st.executeBatch();
        connection.commit();
        st.clearBatch();
        bathcount = 0;
    }
    bathcount++;
}
Since you are using Statement instead of PreparedStatement, it is likely that DB2 is doing a prepare for each of your insert statements. Doing the prepare once, instead of thousands or millions of times, will save you a significant amount of CPU time.
In order to improve the speed, you should have a SQL statement with parameter markers, and set those parameters for each row.
I'm assuming that in your example, you must be building the SQL somehow for each row. If I'm wrong, and you're using the same insert values for each row, you can skip setting the parameter values, and it will be even faster.
So for my suggested change, it would look something like this (I'm assuming this is Java):
String sql = "INSERT INTO TBL (COLS...) VALUES (?,?...)";
Connection connection = poolledConnection.getConnection();
connection.setAutoCommit(false);
int batchCount = 0;
PreparedStatement ps = connection.prepareStatement(sql);
for (MyObject object : objectList /* condit??? */){
    ps.setString(1, object.getVal1());
    ps.setString(2, object.getVal2());
    ps.addBatch();
    batchCount++;
    if (batchCount >= 10000){
        ps.executeBatch();
        connection.commit();
        batchCount = 0;
    }
}
/* Make sure you add this to get the last batch if it's not exactly 10k */
if (batchCount > 0) {
    ps.executeBatch();
    connection.commit();
}
In my previous implementation, I was streaming results from a sqlite3 table directly into my output application. However, since I am changing the interface to a temporary data structure, I now need to get the number of rows. The preferred way to do that seems to be with a temporary table, so my original
sprintf(query,"SELECT %s AS x, AVG(%s) AS y, AVG((%s)*(%s)) AS ysq FROM %s WHERE %s=%s AND %s GROUP BY x;",x,y,y,y,from,across,val,where);
sqlite3_prepare_v2(db, query, -1, &acResult,NULL);
while(sqlite3_step(acResult)==SQLITE_ROW) { ... }
sqlite3_finalize(acResult);
turns into
sprintf(query,"CREATE TEMP TABLE tt AS SELECT %s AS x, AVG(%s) AS y, AVG((%s)*(%s)) AS ysq FROM %s WHERE %s=%s AND %s GROUP BY x;",x,y,y,y,from,across,val,where);
sqlite3_prepare_v2(db, query, -1, &acResult,NULL);
sqlite3_step(acResult);
sqlite3_finalize(acResult);
sqlite3_prepare_v2(db, "SELECT COUNT(*) FROM tt;", -1, &acResult, NULL);
sqlite3_step(acResult);
int length = sqlite3_column_int(acResult,0);
sqlite3_finalize(acResult);
sqlite3_prepare_v2(db, "SELECT x,y, ysq FROM tt;", -1, &acResult, NULL);
while(sqlite3_step(acResult)==SQLITE_ROW) { ... }
sqlite3_finalize(acResult);
sqlite3_prepare_v2(db, "DROP TABLE tt;", -1, &acResult, NULL);
sqlite3_step(acResult);
sqlite3_finalize(acResult);
Now, this mostly works. The problem is that I have this inside a loop across another stepping query, which seems to be responsible for the table being locked when I try to drop it. If I finalize that query, it "works" (the drop works; everything else breaks because it's part of the logic). There is no possible way the outer query could be referencing tt, because I created it within that "scope".
Is there a way of reminding sqlite that it shouldn't be locked, or am I stuck switching the outer loop away from streaming as well?
This is a read-only application (with the exception of the temp table), if that helps.