As a sort of exercise, I'm seeing how fast I can insert bulk records into SQLite. The data set is about 50MB and contains 1M rows. Here is what I currently have:
sqlite3 *db;
int rc = sqlite3_open("MyDB.db", &db);
sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);
char* sql_buffer = malloc(200 * sizeof(char));
for (int i=0; item=row[i]; i++) {
snprintf(sql_buffer, 200, "insert into myTable (id, format, size) VALUES (%d, '%s', %d)", item.id, item.format, item.size);
rc = sqlite3_exec(db, sql_buffer, NULL, NULL, NULL);
}
sqlite3_exec(db, "COMMIT TRANSACTION", NULL, NULL, NULL);
Doing the above 1M inserts, it takes 3.39s. About 90% of that time is the SQLite inserts and 10% is the snprintf function. I tried the following to see if it would increase speed:
Committing after every 10K, 50K, or 100K rows, instead of only once at the end (1M)
Writing to memory instead of a file.
Changing various pragmas, for example: PRAGMA cache_size = 400000; PRAGMA synchronous = OFF; PRAGMA journal_mode = OFF;...
None of those seemed to make more than about a 0.1s difference.
Are there any further ways I could increase the insert speed here? If we assume the data is already parsed and cannot simply be loaded directly from something like a CSV file, would it theoretically be possible to insert 1M rows in under 1s? If not, what is the limiting factor?
Note that with your current approach, inserting 1 million rows requires executing 1 million separate round trips to SQLite. Instead, you could try one of the following two approaches. For more recent versions of SQLite (3.7.11 and later), a multi-row VALUES clause:
INSERT INTO myTable (id, format, size)
VALUES
(%d, '%s', %d),
(%d, '%s', %d),
(%d, '%s', %d),
... (more rows)
For earlier versions of SQLite, you may use an INSERT INTO ... SELECT construct:
INSERT INTO myTable (id, format, size)
SELECT %d, '%s', %d UNION ALL
SELECT %d, '%s', %d UNION ALL
... (more rows)
The basic idea here is that you can try just making a single insert call to SQLite with all of your data, instead of inserting one row at a time.
Not a C person, but here is how you might build the insert string from your C code (note the comma required between row tuples):
const int MAX_BUF = 1000; // make this as large as is needed
char* sql_buffer = malloc(MAX_BUF * sizeof(char));
int length = 0;
length += snprintf(sql_buffer+length, MAX_BUF-length, "INSERT INTO myTable (id, format, size) VALUES");
for (int i=0; item=row[i]; i++) {
    length += snprintf(sql_buffer+length, MAX_BUF-length, "%s (%d, '%s', %d)", i == 0 ? "" : ",", item.id, item.format, item.size);
}
rc = sqlite3_exec(db, sql_buffer, NULL, NULL, NULL);
Related
I have C code that creates different tables based on data being encountered in a separate operation. As a further complication, some of the data needs to be inserted into the same table as other data, and can be encountered in any order. As the data does have common indices, I am handling this using indices and UPSERT statements. Below is a simple example of the setup. The prepared statement doesn't seem to be updating as I am only getting partial insertions or none at all. I think this is related to some memory allocation on the statement?
// System includes
#include <stdlib.h>
#include <stdio.h>
// Local includes
#include <sqlite3.h>
// Function to create the insertion string for sqlite
char* insert_string(int flag)
{
if(flag==1)
{
return("INSERT INTO test(?, ?, ?, ?, ?) ON CONFLICT (a,b) DO UPDATE SET c=excluded.c, d=excluded.d");
}
else if(flag==2)
{
return("INSERT INTO test(?, ?, ?, ?, ?) ON CONFLICT (a,b) DO UPDATE SET e=excluded.e");
}
}
// Function to create tables based on an integer flag
void create_table(int flag, sqlite3* sqldb)
{
if(flag==1)
{
sqlite3_exec(sqldb, "CREATE TABLE IF NOT EXISTS test(a integer, b integer, c real, d real, e real)", NULL, NULL, NULL);
sqlite3_exec(sqldb, "CREATE UNIQUE INDEX IF NOT EXISTS sqldb_idx ON test(a,b)", NULL, NULL, NULL);
}
else if(flag==2)
{
sqlite3_exec(sqldb, "CREATE TABLE IF NOT EXISTS test(a integer, b integer, c real, d real, e real)", NULL, NULL, NULL);
sqlite3_exec(sqldb, "CREATE UNIQUE INDEX IF NOT EXISTS sqldb_idx ON test(a,b)", NULL, NULL, NULL);
}
}
int main()
{
// Initialize database
sqlite3 *sqldb;
int sql_rc;
sqlite3_stmt* sql_stmt;
sqlite3_open("test.db", &sqldb);
// Loop over some integer flags
for(int i=1; i<3; i++)
{
// Create the table and begin the transaction
create_table(i, sqldb);
sqlite3_exec(sqldb, "BEGIN TRANSACTION;", NULL, NULL, NULL);
// Prepare the insertion statement
sqlite3_prepare_v2(sqldb, insert_string(i), -1, &sql_stmt, NULL);
// Insert a different amount of data depending on the flag
sqlite3_bind_int(sql_stmt, 1, 1);
sqlite3_bind_int(sql_stmt, 2, 2);
if(i==1)
{
sqlite3_bind_double(sql_stmt,3,1.0);
sqlite3_bind_double(sql_stmt,4,2.0);
}
else if(i==2)
{
sqlite3_bind_double(sql_stmt,5,3.0);
}
sqlite3_step(sql_stmt);
sqlite3_reset(sql_stmt);
// End the transaction
sqlite3_exec(sqldb, "END TRANSACTION;", NULL, NULL, NULL);
}
// Finalize and close
sqlite3_finalize(sql_stmt);
sqlite3_close(sqldb);
}
The SQL you're attempting to compile isn't valid. This will fix it:
if(flag==1)
{
return("INSERT INTO test VALUES(?, ?, ?, ?, ?) ON CONFLICT (a,b) DO UPDATE SET c=excluded.c, d=excluded.d");
}
else if(flag==2)
{
return("INSERT INTO test VALUES(?, ?, ?, ?, ?) ON CONFLICT (a,b) DO UPDATE SET e=excluded.e");
}
Note the added VALUES in the SQL strings.
I'd also highly recommend checking the return values of all sqlite3_* calls, even in test code like this. Doing so shows that, without the change here, the first call to sqlite3_prepare_v2 fails with SQLITE_ERROR, pointing at a problem with the SQL itself.
My code is like this...
char dis[20];
int tc,tac,trc;
puts("Enter the data:\n");
puts("District : ");
while((getchar())!='\n');
fgets(dis,20,stdin);
puts("Total Cases : ");
scanf("%d",&tc);
puts("Total Active Cases : ");
scanf("%d",&tac);
puts("Total Recovered Cases : ");
scanf("%d",&trc);
sql = "INSERT INTO COV VALUES (dis,tc,tac,trc);"; //won't work
sql = "INSERT INTO COV VALUES ('abc',1,1,0);"; //works
database = sqlite3_exec(db, sql,0,0,0);
I want to save the values obtained from the user in the SQLite database, but I can't do it as shown above.
It works only if I pass exact literal values (i.e. known at compile time).
How can I send values computed at runtime to the SQLite database?
Have a look at the SQLite C interface documentation.
Assuming you have a table defined like this:
CREATE TABLE COV (id PRIMARY KEY, dis VARCHAR, tc INTEGER, tac INTEGER, trc INTEGER);
You need to bind your parameters with the dedicated bind API, which also prevents SQL injection.
Prepare your INSERT string using ?N template:
char sql[512];
snprintf(sql, sizeof(sql), "INSERT INTO COV(dis, tc, tac, trc) VALUES (?1,?2,?3,?4);");
Then bind your program variables with the corresponding parameter:
sqlite3_stmt *stmt;
sqlite3_prepare_v2(db, sql, -1, &stmt, NULL);
sqlite3_bind_text(stmt, 1, dis, -1, SQLITE_STATIC);
sqlite3_bind_int(stmt, 2, tc);
sqlite3_bind_int(stmt, 3, tac);
sqlite3_bind_int(stmt, 4, trc);
int ret = sqlite3_step(stmt);
if (ret == SQLITE_DONE)
printf("record inserted!\n");
else
printf("Error: %s\n", sqlite3_errmsg(db));
sqlite3_finalize(stmt);
You need to put the actual values in using sprintf:
char sqlscript[128];
Then, since you are using sql to send to the DB, assign sqlscript to sql first:
sql = sqlscript;
snprintf(sql, sizeof(sqlscript), "INSERT INTO COV VALUES('%s', %d, %d, %d);", dis, tc, tac, trc);
Beware that formatting user input straight into SQL like this is open to SQL injection; the bind-parameter approach avoids that.
I am using JDBC to create a temporary table, add records to it (with prepared statement and batch) and then transfer everything to another table:
String createTemporaryTable = "declare global temporary table temp_table (RECORD smallint,RANDOM_INTEGER integer,RANDOM_FLOAT float,RANDOM_STRING varchar(600)) ON COMMIT PRESERVE ROWS in TEMP";
statement.execute(createTemporaryTable);
String sql = "INSERT INTO session.temp_table (RECORD,RANDOM_INTEGER,RANDOM_FLOAT,RANDOM_STRING) VALUES (?,?,?,?)";
PreparedStatement preparedStatement = connection.prepareStatement(sql);
float f = 0.7401298f;
Integer integer = 123456789;
String string = "This is a string that will be inserted into the table over and over again.";
// add however many random records you want to the temporary table
int numberOfRecordsToInsert = 35000;
for (int i = 0; i < numberOfRecordsToInsert; i++) {
preparedStatement.setInt(1, i);
preparedStatement.setInt(2, integer);
preparedStatement.setFloat(3, (float) f);
preparedStatement.setString(4, string);
preparedStatement.addBatch();
}
preparedStatement.executeBatch();
// transfer everything from the temporary table just created to the main table
String transferFromTempTableToMain = "insert into main_table select * from session.temp_table";
statement.execute(transferFromTempTableToMain);
This works fine up to about 30000 records in this example. However, if I were to insert say 35000 records I get the following error:
Invalid data conversion: Requested conversion would result in a loss
of precision of 32768. ERRORCODE=-4461, SQLSTATE=42815
The problem is that the column RECORD is a smallint. A smallint is a signed 16-bit integer with a range of -32768 to 32767.
So inserting an int value of 32768 is not allowed, as it won't fit. You need to declare RECORD as INTEGER instead.
In my previous implementation, I was streaming results from a sqlite3 table directly into my output application. However, since I am changing the interface to a temporary data structure, I now need to get the number of rows. The preferred way to do that seems to be with a temporary table, so my original
sprintf(query,"SELECT %s AS x, AVG(%s) AS y, AVG((%s)*(%s)) AS ysq FROM %s WHERE %s=%s AND %s GROUP BY x;",x,y,y,y,from,across,val,where);
sqlite3_prepare_v2(db, query, -1, &acResult,NULL);
while(sqlite3_step(acResult)==SQLITE_ROW) { ... }
sqlite3_finalize(acResult);
turns into
sprintf(query,"CREATE TEMP TABLE tt AS SELECT %s AS x, AVG(%s) AS y, AVG((%s)*(%s)) AS ysq FROM %s WHERE %s=%s AND %s GROUP BY x;",x,y,y,y,from,across,val,where);
sqlite3_prepare_v2(db, query, -1, &acResult,NULL);
sqlite3_step(acResult);
sqlite3_finalize(acResult);
sqlite3_prepare_v2(db, "SELECT COUNT(*) FROM tt;", -1, &acResult, NULL);
sqlite3_step(acResult);
int length = sqlite3_column_int(acResult,0);
sqlite3_finalize(acResult);
sqlite3_prepare_v2(db, "SELECT x,y, ysq FROM tt;", -1, &acResult, NULL);
while(sqlite3_step(acResult)==SQLITE_ROW) { ... }
sqlite3_finalize(acResult);
sqlite3_prepare_v2(db, "DROP TABLE tt;", -1, &acResult, NULL);
sqlite3_step(acResult);
sqlite3_finalize(acResult);
Now, this mostly works. The problem is that I have this inside a loop across another stepping query, which seems to be responsible for the table being locked when I try to drop it. If I finalize that query, it "works" (the drop works; everything else breaks because it's part of the logic). There is no possible way the outer query could be referencing tt, because I created it within that "scope".
Is there a way of reminding sqlite that it shouldn't be locked, or am I stuck switching the outer loop away from streaming as well?
This is a read-only application (with the exception of the temp table), if that helps.
Is it possible to iterate over a result set and if a condition is met, delete the current row?
i.e. something like
int rc;
sqlite3_stmt* statement;
sqlite3_exec(db, "BEGIN", 0, 0, 0);
sqlite3_prepare_v2(db, "SELECT id,status,filename,del FROM mytable", -1, &statement, NULL);
rc = sqlite3_step(statement);
while (rc == SQLITE_ROW){
int id = sqlite3_column_int(statement, 0);
int status = sqlite3_column_int(statement, 1);
const unsigned char* filename = sqlite3_column_text(statement, 2);
int del = sqlite3_column_int(statement, 3);
if (status == 0 || del > 0){
int rc = unlink(filename);
if (rc == 0)
// Now delete the current row
else
// unlink failed, find out why, try again or ... ?
}
rc = sqlite3_step(statement);
}
sqlite3_finalize(statement);
sqlite3_exec(db, "COMMIT", 0, 0, 0);
I could just call a single sql statement to delete all rows that match the criteria, but I don't want to do that if for some reason the unlink fails.
Can I call an operation to delete the current row?
EDIT:
So there is a special column called rowid. Do I just add that as a column in the previous statement, create another statement like "delete from table where rowid=?", and pass in the current rowid?
That should work, right? Is this the best way of going about it?
In terms of efficiency, it's probably not the most efficient. If you're doing this for something on the level of thousands or greater number of rows, you should consider doing one (or a combination) of the following:
Change your query to only consider rows whose del is > 0 (SELECT id,status,filename,del FROM mytable WHERE del > 0). You're performing a table scan with your current method, which you should always try to avoid. Also make sure you have an index on the del column.
Build up an intermediate array of row ids, and then perform a query of the form DELETE FROM table WHERE id IN (...), where the list is your collected row ids joined into a comma-separated string built into the SQL text (a list cannot be bound to a single ? parameter). Based on the number of rows you're dealing with, you could perform the delete in batches (batch sizes of 1000, 5000, etc.); since it's SQLite, tune to the device you're running on.
Register a custom SQLite function at connection creation time using the form:
void deleteFileFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) {
assert(argc == 1);
const char * fileName = (const char *) sqlite3_value_text(argv[0]);
int rc = unlink(fileName);
sqlite3_result_int(context, rc);
}
sqlite3_create_function(db, "delete_file", 1, SQLITE_UTF8, NULL, &deleteFileFunc, NULL, NULL);
and then change your database query to the form DELETE FROM mytable WHERE del > 0 AND delete_file(filename) == 0. The row will only be deleted if the delete succeeds, and you don't need to iterate over the result set. SQLite 3 create function page: http://www.sqlite.org/c3ref/create_function.html
It's OK to delete the row directly.
The documentation says a running SELECT is not disturbed when a row it has already read is deleted, but the result is undefined if a row that the SELECT is expected to read later is deleted.
https://www.sqlite.org/isolation.html