Understanding sqlite3_step and why it "takes time" between iterations

I'm wondering why sometimes the query results return quite quickly and other times it seems to take a while between the sqlite3_step() calls. For example:
rc = sqlite3_prepare_v2(db, "SELECT * FROM mytable m GROUP BY field1, field2, field3, field4, field5 LIMIT 500", -1, &stmt, NULL);
while ((rc = sqlite3_step(stmt)) == SQLITE_ROW);
Each sqlite3_step() takes about the same time according to my print statements, and the fetches add up to about 6.5s in total.
And then with a normal, fast unaggregated query, sqlite3_step is almost instant:
rc = sqlite3_prepare(db, "SELECT * FROM mytable LIMIT 500", -1, &stmt, NULL);
In the above query, everything takes under 0.1s to query and retrieve.
I understand why the query itself would be much slower, but it seems like sqlite3_prepare is not really "doing anything" and all the SQL-ing work goes into sqlite3_step and is spread out across the steps. A few questions from this:
Why isn't the result set fetched all at once, with successive steps returning instantly (for a small query set, such as the above where there are fewer than 1K records)?
What exactly is sqlite3_prepare doing? Does it touch the actual file (or in-memory) data, or is it merely "setting up" for the query to run?
What is SQLite doing between each sqlite3_step call that takes the additional time between steps (if the query has already been 'executed')?
The documentation of this method is listed here, but it doesn't say much about what it actually does other than saying what return codes it might give. From the docs:
This routine is used to evaluate a prepared statement that has been previously created by the sqlite3_prepare() interface. The statement is evaluated up to the point where the first row of results are available. To advance to the second row of results, invoke sqlite3_step() again. Continue invoking sqlite3_step() until the statement is complete. Statements that do not return results (ex: INSERT, UPDATE, or DELETE statements) run to completion on a single call to sqlite3_step().
What does "to completion" mean in this case? Does it basically just do a LIMIT 1 and the user needs to call it again to do an OFFSET 1, or what does that mean exactly?

sqlite3_prepare() compiles the statement into sqlite's internal bytecode. sqlite3_step() evaluates that bytecode up to the point where a row is returned or there are no more rows. The next call resumes evaluating at that point; it doesn't always calculate all result rows all at once, but often just one at a time. sqlite3_step() can take different times depending on how much work has to be done to get the next row (like processing groups of different sizes), if pages have to be fetched from disc instead of being present in the page cache, etc.
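For concreteness, here is a minimal sketch of that lifecycle. This is not the poster's code: the open db handle and the mytable name are assumptions, and error handling is abbreviated.

#include <stdio.h>
#include <sqlite3.h>

/* Sketch: assumes an already-open handle and a table named mytable. */
int dump_rows(sqlite3 *db)
{
    sqlite3_stmt *stmt = NULL;

    /* prepare: compiles the SQL into bytecode; no table data is read yet */
    int rc = sqlite3_prepare_v2(db, "SELECT * FROM mytable LIMIT 500",
                                -1, &stmt, NULL);
    if (rc != SQLITE_OK)
        return rc;

    /* step: each call resumes the bytecode until the next row is ready */
    while ((rc = sqlite3_step(stmt)) == SQLITE_ROW) {
        const unsigned char *text = sqlite3_column_text(stmt, 0);
        printf("%s\n", text ? (const char *)text : "(null)");
    }

    /* SQLITE_DONE is what "run to completion" means: no more rows */
    sqlite3_finalize(stmt);
    return rc == SQLITE_DONE ? SQLITE_OK : rc;
}

For the GROUP BY query, each sqlite3_step() may have to scan and accumulate an entire group before a row is ready, which is why the time is spread across the steps rather than concentrated in prepare.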

Related

SQL Server : function precedence and short circuiting in where clause

Consider this setup:
create table #test (val varchar(10))
insert into #test values ('20100101'), ('1')
Now if I run this query
select *
from #test
where ISDATE(val) = 1
and CAST(val as datetimeoffset) > '2005-03-01 00:00:00 +00:00'
it will fail with
Conversion failed when converting date and/or time from character string
which tells me that the where conditions are not short-circuited and both functions are evaluated. OK.
However if I run
select *
from #test
where LEN(val) > 2
and CAST(val as datetimeoffset) > '2005-03-01 00:00:00 +00:00'
it doesn't fail, which tells me that where clause is short-circuited in this case.
This
select *
from #test
where ISDATE(val) = 1
and CAST(val as datetimeoffset) > '2005-03-01 00:00:00 +00:00'
and LEN(val) > 2
fails again, but if I move the length check to before the cast, it works. So it looks like the functions are evaluated in the order they appear in the query.
Can anyone explain why first query fails?
It fails because SQL is declarative, so the order of your conditions is not taken into account when the plan is generated (nor is the optimizer required to take it into account).
The usual way to get around this is to use CASE which has strict rules about sequence and when to stop.
In your case you will probably need nested CASEs, something like this:
WHERE
(
    case when ISDATE(val) = 1 then
        case when CAST(val as datetimeoffset) > '2005-03-01 00:00:00 +00:00'
                  and LEN(val) > 2
             THEN 1 ELSE 0 END
    ELSE 0
    END
) = 1
(note this is unlikely to be actually correct SQL as I just typed it in).
By the way, even if you get it "working" by rearranging the conditions, I'd advise you don't. Accept that SQL simply doesn't work that way. As the data and statistics change, SQL Server is upgraded, the workload varies, or indexes are added, the query plan could change. Any attempt to "get it working" is going to be short-lived at best, so go with the CASE, which will continue to work once you've got it right (provided you nest CASE expressions where required and don't fall into the same precedence trap in the CASE conditions!)
The mystery is answered if you examine the Execution Plan. Both the CAST() and the LEN() are applied as part of the Table Scan step, while the test for IsDate() is a separate Filter test after the Table Scan.
It appears that the SQL Engine's internal optimizations use certain filtering functions as part of the retrieval of the data, and others as separate filters, almost certainly as a form of query optimization to minimize the load from disk into main memory. However, more complex functions, such as IsDate(), which is dependent on system variables such as system date format in some cases (is '01/02/2017' Jan 2nd or Feb 1st?), need to have the data retrieved before the filter is applied.
Although I have no hard information on this, I strongly suspect that any filter more resource intensive than a certain level is delegated to the Filter steps in the query plan, and anything simple/fast enough to be checked as the data is being read in is applied during the Scan/Seek steps. Also, if a filter could be applied on the data in the index, I am certain that it will be tested before any non-index data is tested, solely to minimize disk reads, which are bad performance juju (this may not apply on the Clustered index of the table). In these cases, the short-circuiting might not be straightforward, with an IsDate() test specified on a non-index field being executed after a similar test on an indexed field, no matter where they are in the list of conditions.
That said, it appears to be true that conditions short-circuit when they are executed in the same step of the query plan. If you insert a string like '201612123' into the temp table, then add a check on LEN(val) < 9 after the date comparison, it still generates an error, instead of checking both LEN() conditions at the same time as a tiny optimization.
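A sketch of that experiment, reusing the #test table from the question (the error behavior is as described above, not re-verified here):

insert into #test values ('201612123')  -- 9 characters, not a valid date

select *
from #test
where LEN(val) > 2
  and CAST(val as datetimeoffset) > '2005-03-01 00:00:00 +00:00'
  and LEN(val) < 9  -- placed after the CAST, yet the conversion error still occurs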
which tells me that where conditions are not short-circuited and both functions are evaluated.
To expand on LoztInSpace's answer, your terminology suggests you are not interpreting SQL correctly, on its own terms.
The various parts of a SELECT statement are not "functions". The entire statement is atomic. You supply the query as a unit, and the DBMS responds. There is no "before" and no "after". There is just the query.
Those are the rules. Your job in formulating the query is to supply one that is valid. It's a logical progression: valid question, valid answer, etc. The moment you step out of that frame, you might as well be asking, "why is the sky seven?".
One small clarification to LoztInSpace's answer. When he refers to the order of your statements, he's presumably talking about the phrasing of your query, which for purposes of evaluation is inconsequential. Sequential SQL statements are executed sequentially, as presented. That is guaranteed by the SQL standard.

Optimal passing of optional parameters into a query

I have a parameterized query with optional parameters.
Multiple tables are joined.
A part of the WHERE clause looks like this:
and ((x.a = @arg1) OR (@arg1 IS NULL))
and ((y.b = @arg2) OR (@arg2 IS NULL))
and ((z.c = @arg3) OR (@arg3 IS NULL))
So, the idea is: A parameter can either be used for applying a filter, or, if the parameter is NULL, then no filtering will be applied.
However, I found that the execution plan is not good for this code.
When a parameter is actually set, then it is much better to write
and x.a = @arg1
instead of
and ((x.a = @arg1) OR (@arg1 IS NULL))
Actually, I have in total 8 tables which are joined together. In both statements, all 8 tables are joined, and there are the same index seeks/scans applied on all these tables. However, the join order is different, thus resulting in different execution speeds.
Is there a way to rewrite the above statement so that the execution plan can work optimally? Possibly with some query hints?
Or is there no way around writing dynamic SQL? I want to avoid the latter, because
dynamic SQL is hard to read,
SSMS does not show dependencies,
passing the parameters into the dynamic SQL is awful
Try using COALESCE(@arg1, x.a) instead of ((x.a = @arg1) OR (@arg1 IS NULL)):
and x.a = COALESCE(@arg1, x.a)
and y.b = COALESCE(@arg2, y.b)
and z.c = COALESCE(@arg3, z.c)
Unfortunately the optimiser will tend to stick with whatever plan suits the first invocation. Your best options are to write a stored procedure for each combination (or at least the major ones) or try OPTION(RECOMPILE) which will take the time to evaluate the actual parameters, but at some cost.
To answer your question about "Is there a way to rewrite the above statement, such that the execution plan can work optimal?", the answer is probably not. There are too many variations to come up with a single plan with so many things that may not occur (your null variables).
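For reference, a sketch of the OPTION (RECOMPILE) variant mentioned above. The table names and join keys are placeholders; only the aliases x, y, z and the parameters come from the fragment in the question:

SELECT x.a, y.b, z.c
FROM tableX AS x
JOIN tableY AS y ON y.xId = x.id
JOIN tableZ AS z ON z.yId = y.id
WHERE (x.a = @arg1 OR @arg1 IS NULL)
  AND (y.b = @arg2 OR @arg2 IS NULL)
  AND (z.c = @arg3 OR @arg3 IS NULL)
OPTION (RECOMPILE)  -- plan is rebuilt each execution using the actual parameter values

The trade-off is compile time on every call in exchange for a plan that matches the parameters actually supplied.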

ColdFusion 10 error with Stored Procedures

In a .CFC file, within a CFfunction and with CFargument tags.
<cfscript>
var sp=new storedproc();
sp.setDatasource(variables.datasource);
sp.setProcedure("storedProcedure_INSERT");
sp.addParam(cfsqltype="cf_sql_integer",type="in",value=arguments.one);
sp.addParam(cfsqltype="cf_sql_integer",type="in",value=arguments.two);
sp.addParam(cfsqltype="cf_sql_integer",type="in",value=arguments.three);
sp.addParam(cfsqltype="cf_sql_integer",type="in",value=arguments.four);
sp.addProcResult(name="results",resultset=1);
//writeDump(sp);break; //This dump is reached
var spObj=sp.execute(); //blows up here; this is never reached
writeDump(spObj);break; //This is never reached, either.
var spResults=spObj.getProcResultSets().results;
A shiny nickel to anyone who can tell me why the sp.execute() is blowing up with message
"Cannot find results key in structure.
The specified key, results, does not exist in the structure."
I've used this pseudo-code many, many times in the past, and never had it do this. I'm connected to a MSSQL Server 2012 DB, everything's cricket in CF Admin, and other SPs are working properly. The stack trace doesn't even include any of MY code at all o_O
The error occurred in C:/ColdFusion10/cfusion/CustomTags/com/adobe/coldfusion/base.cfc: line 491
Called from C:/ColdFusion10/cfusion/CustomTags/com/adobe/coldfusion/storedproc.cfc: line 142
Called from //hq-devfs/development$/websites/myProject/cfc/mySOAPWSDLs.cfc: line 123
And SO is blowing up if I try and paste any more of that. Google has...not been helpful ._.
Short answer: The error means you are trying to retrieve a resultset from the stored procedure, when it does not actually return one. A simple solution is to add a SELECT to the end of your procedure, so it returns a resultset containing the data you need. Then your original code will work:
SELECT @@ROWCOUNT AS NumOfRowsAffected;
Longer answer:
The method you are using, addProcResult(), is the equivalent of <cfprocresult>. It is intended to capture a resultset returned from a stored procedure. (Due to CF's poor choice of attribute names, a lot of people think "resultset" means the storedproc "result" structure, but they are two totally different things.) A "resultset" is a "query object", in CF parlance.
While all four (4) of the primary SQL statements return some result, not all of them return a "query object":
Only SELECT statements generate a "query object".
INSERT/UPDATE/DELETE statements simply return the number of rows affected. They do not generate a "query object".
Since your stored procedure performs an INSERT, it does not generate a "query object". Hence the error when you try and grab the non-existent query here:
sp.addProcResult(name="results",resultset=1);
The simple solution is to add a SELECT statement to the end of your stored procedure, so that it does return a query object. Then your code will work as expected.
As an aside, I suspect you were actually trying to grab the "result" structure, but used the wrong method. The equivalent of <cfstoredproc result=".."> is getPrefix(). Though that would not work here anyway. According to the docs, it does not contain the number of rows affected. Probably because stored procedures can execute multiple statements, each one potentially returning a row count, so there is not just a single value to return.

TSQL Prevent "FAST N" output by default

When I execute a T-SQL query or a stored procedure that returns a large number of records (1M+), by default it begins to display results while the query is still executing.
Is there a way to prevent this and postpone returning the results until query execution is complete?
If you add a column like the following to the returned dataset, it almost certainly will not be able to do a FAST(N):
.., MAX(prevColumn) OVER(PARTITION BY 1) As Dummy, ...
Where prevColumn is any other column that you are already returning, especially if it's not an indexed column.
Alternatively, using an ORDER BY or an aggregate function in the SELECT will also help.

T-SQL Where Clause Case Statement Optimization (optional parameters to StoredProc)

I've been battling this one for a while now. I have a stored proc that takes in 3 parameters that are used to filter. If a specific value is passed in, I want to filter on that. If -1 is passed in, give me all.
I've tried it the following two ways:
First way:
SELECT field1, field2...etc
FROM my_view
WHERE
parm1 = CASE WHEN @PARM1 = -1 THEN parm1 ELSE @PARM1 END
AND parm2 = CASE WHEN @PARM2 = -1 THEN parm2 ELSE @PARM2 END
AND parm3 = CASE WHEN @PARM3 = -1 THEN parm3 ELSE @PARM3 END
Second Way:
SELECT field1, field2...etc
FROM my_view
WHERE
(@PARM1 = -1 OR parm1 = @PARM1)
AND (@PARM2 = -1 OR parm2 = @PARM2)
AND (@PARM3 = -1 OR parm3 = @PARM3)
I read somewhere that the second way will short-circuit and never evaluate the second part if the first is true. My DBA said it forces a table scan. I have not verified this, but it seems to run slower in some cases.
The main table that this view selects from has somewhere around 1.5 million records, and the view proceeds to join on about 15 other tables to gather a bunch of other information.
Both of these methods are slow...taking me from instant to anywhere from 2-40 seconds, which in my situation is completely unacceptable.
Is there a better way that doesn't involve breaking it down into each separate case of specific vs -1 ?
Any help is appreciated. Thanks.
I read somewhere that the second way will short-circuit and never evaluate the second part if the first is true. My DBA said it forces a table scan.
You read wrong; it will not short circuit. Your DBA is right; it will not play well with the query optimizer and likely force a table scan.
The first option is about as good as it gets. Your options to improve things are dynamic SQL, or a long stored procedure with every possible combination of filter columns so you get independent query plans. You might also try using the "WITH RECOMPILE" option, but I don't think it will help you.
If you are running SQL Server 2005 or above, you can use IFs to make multiple versions of the query, each with the proper WHERE clause so an index can be used. Each query plan will be placed in the plan cache.
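A sketch of that IF approach with two of the parameters. The view and column names are borrowed from the question; the procedure itself is hypothetical:

CREATE PROCEDURE dbo.Search_MyView
    @PARM1 int,
    @PARM2 int
AS
BEGIN
    -- One statement per combination, so each gets its own plan in the cache.
    IF @PARM1 = -1 AND @PARM2 = -1
        SELECT field1, field2 FROM my_view
    ELSE IF @PARM2 = -1
        SELECT field1, field2 FROM my_view WHERE parm1 = @PARM1
    ELSE IF @PARM1 = -1
        SELECT field1, field2 FROM my_view WHERE parm2 = @PARM2
    ELSE
        SELECT field1, field2 FROM my_view WHERE parm1 = @PARM1 AND parm2 = @PARM2
END

With three parameters this grows to eight branches, which is why the article below also covers hybrid and dynamic-SQL approaches.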
Also, here is a very comprehensive article on this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
It covers all the issues and methods of trying to write queries with multiple optional search conditions.
Here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = @x OR @x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = @x AND @x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
If you pass in a null value when you want everything, then you can write your where clause as
Where ColName = IsNull(@Parameter, ColName)
This is basically the same as your first method... it will work as long as the column itself is not nullable... NULL values IN the column will mess it up slightly.
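To see why NULLs in the column are a problem: for those rows, ColName = IsNull(@Parameter, ColName) compares NULL = NULL, which evaluates to UNKNOWN rather than TRUE, so they silently drop out even when you asked for everything. A small demonstration (hypothetical table, SQL Server 2008+ syntax):

DECLARE @Parameter varchar(10) = NULL;  -- NULL means "give me everything"

DECLARE @t TABLE (ColName varchar(10));
INSERT INTO @t VALUES ('a'), (NULL);

SELECT * FROM @t
WHERE ColName = ISNULL(@Parameter, ColName);  -- returns only 'a'; the NULL row is lost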
The only approach to speed it up is to add an index on the column being filtered on in the Where clause. Is there one already? If not, that will result in a dramatic improvement.
No other way I can think of than doing:
WHERE
(MyCase IS NULL OR MyCase = @MyCaseParameter)
AND ....
The second one is simpler and more readable to other developers, if you ask me.
SQL 2008 and later make some improvements to optimization for things like (MyCase IS NULL OR MyCase = @MyCaseParameter) AND ....
If you can upgrade, and if you add an OPTION (RECOMPILE) to get decent perf for all possible param combinations (this is a situation where there is no single plan that is good for all possible param combinations), you may find that this performs well.
http://blogs.msdn.com/b/bartd/archive/2009/05/03/sometimes-the-simplest-solution-isn-t-the-best-solution-the-all-in-one-search-query.aspx
