My application makes about 5 queries per second to a SQL Server database. Each query returns about 1500 rows on average. The application is written in C++/Qt, and database access is implemented using the QODBC driver. I measured that executing the query takes about 25 ms, but fetching the result takes about 800 ms. Here is what the code querying the database looks like:
QSqlQuery query(db);
query.prepare(queryStr);
query.setForwardOnly(true);
if (query.exec())
{
    while (query.next())
    {
        int v = query.value(0).toInt();
        .....
    }
}
How can I optimize result fetching?
This does not directly answer your question, as I haven't used Qt in years. In the raw ODBC API you can often speed up row retrieval by setting SQL_ATTR_ROW_ARRAY_SIZE to N, so that each call to SQLFetch returns N rows at once. I took a look at QSqlQuery in Qt and could not see a way to do this, but it may be something you can look into with Qt, or you could simply write to the ODBC API directly. You can find an example at Preparing to Return Multiple Rows.
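For reference, here is a minimal sketch of that block-fetching technique written directly against the ODBC API. It assumes you already have a valid SQLHSTMT on an open connection; the table name, column, and batch size of 100 are made up for illustration, and error handling is omitted.

// Minimal sketch: fetch rows in batches via SQL_ATTR_ROW_ARRAY_SIZE.
// Assumes hstmt is a valid statement handle on an open connection.
#include <windows.h>
#include <sql.h>
#include <sqlext.h>

#define ROWS_PER_FETCH 100   // arbitrary batch size for illustration

void fetchInBatches(SQLHSTMT hstmt)
{
    SQLINTEGER values[ROWS_PER_FETCH];      // bound buffer for column 1
    SQLLEN     indicators[ROWS_PER_FETCH];  // length/indicator per row
    SQLULEN    rowsFetched = 0;

    // Ask the driver to return ROWS_PER_FETCH rows per SQLFetch call.
    SQLSetStmtAttr(hstmt, SQL_ATTR_ROW_ARRAY_SIZE, (SQLPOINTER)ROWS_PER_FETCH, 0);
    SQLSetStmtAttr(hstmt, SQL_ATTR_ROWS_FETCHED_PTR, &rowsFetched, 0);

    // Bind column 1 to an array (column-wise binding is the default).
    SQLBindCol(hstmt, 1, SQL_C_SLONG, values, 0, indicators);

    // Hypothetical query; substitute your own statement.
    SQLExecDirect(hstmt, (SQLCHAR*)"SELECT id FROM some_table", SQL_NTS);

    while (SQL_SUCCEEDED(SQLFetch(hstmt)))
    {
        for (SQLULEN i = 0; i < rowsFetched; ++i)
        {
            int v = values[i];
            // ... process v ...
        }
    }
    SQLCloseCursor(hstmt);
}

Whether these attributes can be set from within QODBC (for example via the driver handle) is something you would have to investigate; otherwise a separate ODBC connection for the bulk fetch would work.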
I use a SqlTransaction in my C# project, and I delete rows with a DELETE statement via an ExecuteNonQuery call.
This works very well, and the number of rows to delete is always the same, but 95% of the time it takes 1 ms and roughly 5% of the time it takes between 300 and 500 ms.
My code:
using (SqlTransaction DbTrans = conn.BeginTransaction(IsolationLevel.ReadCommitted))
{
    SqlCommand dbQuery = conn.CreateCommand();
    dbQuery.Transaction = DbTrans;
    dbQuery.CommandType = CommandType.Text;
    dbQuery.CommandText = "delete from xy where id = @ID";
    dbQuery.Parameters.Add("@ID", SqlDbType.Int).Value = x.ID;
    dbQuery.ExecuteNonQuery();
    DbTrans.Commit();
}
Is something wrong with my code?
Read Understanding how SQL Server executes a query and How to analyse SQL Server performance to get started on troubleshooting issues like this.
Of course, I assume you have an index on xy.id. Your DELETE is most likely getting blocked from time to time. This can have many causes:
data locks held by other queries
I/O stalls on your hardware
log growth events
etc.
The gist of it is that, using the techniques in the articles linked above (especially the second one), you can identify the cause and address it appropriately.
Changes to your C# code will have little impact, if any at all. Using a stored procedure is not going to help either. You need to find the root cause of the problem.
I am using MS SQL Server 2008 with Hibernate. The question I have is how Hibernate implements setMaxResults.
Take the following simple scenario.
If I have a query that returns 100 rows and I pass 1 to setMaxResults, will this affect the result set returned by SQL Server itself (as if running a select top 1 statement), or does Hibernate fetch all the results first (all 100 rows in this case) and pick the top one itself?
The reason I am asking is that this would become a huge performance issue as the number of rows grows.
Thank you.
Hibernate will generate a limit-type query for all dialects that support limit queries. Since SQLServerDialect supports this (see org.hibernate.dialect.SQLServerDialect.supportsLimit() and .getLimitString()), you will get a select top 1 query.
If you would like to be absolutely sure, you can turn on debug logging, or enable the showSql option and test.
Maybe the following snippet will help. Assume we have a mapped bean class EmpBean and we want only the first 5 records. Here is the code:
public List<EmpBean> getData()
{
    Session session = null;
    try
    {
        session = HibernateUtil.getSession();
        Query qry = session.createQuery("FROM EmpBean");
        qry.setMaxResults(5);   // only the first 5 results are fetched
        return qry.list();
    }
    catch (HibernateException e)
    {
        // log/handle the exception as appropriate
    }
    finally
    {
        HibernateUtil.closeSession(session);
    }
    return null;
}
Here getSession and closeSession are static utility methods that take care of creating and closing the session.
NB: I am using db (not ndb) here. I know ndb has count_async(), but I am hoping for a solution that does not involve migrating over to ndb.
Occasionally I need an accurate count of the number of entities that match a query. With db this is simply:
q = some Query with filters
num_entities = q.count(limit=None)
It costs a small datastore operation per entity, but it gets me the information I need. The problem is that I often need to do a few of these in the same request, and it would be nice to run them asynchronously, but I don't see support for that in the db library.
I was thinking I could use run(keys_only=True, batch_size=1000), since it runs the query asynchronously and returns an iterator. I could first call run() on each query and then count the results from each iterator later. It costs the same as count(); however, run() has proven slower in testing (perhaps because it actually returns results), and in fact it seems that batch_size is capped at 300 regardless of how high I set it, which requires more RPCs to count thousands of entities than the count() method does.
My test code for run() looks like this:
queries = list of Queries with filters
iters = []
for q in queries:
    iters.append(q.run(keys_only=True, batch_size=1000))
for iter in iters:
    count_entities_from(iter)
No, there's no equivalent in db. The whole point of ndb is that it adds these sorts of capabilities, which were missing in db.
OutOfMemoryError caused when db4o database has 15000+ objects
My question is in reference to my previous question (above). It concerns the same PostedMessage model and the same query.
With 100,000 PostedMessage objects, the query takes about 1243 ms to return the first 20 PostedMessages.
Now I have saved 1,000,000 PostedMessage objects in db4o, and the same query took 342,132 ms, which is a non-linear increase.
How can I optimize the query speed?
FYR:
The timeSent and timeReceived fields are indexed.
I am using SNAPSHOT query mode.
I am not using TA/TP.
Do you sort the result? Unfortunately, db4o doesn't use the index for sorting / orderBy. That means it runs a regular sort algorithm, which is O(n*log(n)). It won't scale linearly.
Also, db4o doesn't support a TOP operator. That means that even without sorting it takes quite a bit of time to copy the ids into the result set, even when you never read the entities afterwards.
So there's no really good solution for this, except trying to use some criteria that cut down the result size.
Some adventurous people might use a different query evaluation mode, but personally I don't recommend that.
@Gamlor No, I am not sorting at all. The code is as follows:
public static ObjectSet<PostedMessage> getMessagesBetweenDates(
        Calendar after,
        Calendar before,
        ObjectContainer db) {
    if (after == null || before == null || db == null) {
        return null;
    }
    Query q = db.query(); // db is pre-configured to use SNAPSHOT mode.
    q.constrain(PostedMessage.class);
    Constraint from = q.descend("timeRecieved").constrain(new Long(after.getTimeInMillis())).greater().equal();
    q.descend("timeRecieved").constrain(new Long(before.getTimeInMillis())).smaller().equal().and(from);
    ObjectSet<PostedMessage> results = q.execute();
    return results;
}
The arguments to this method are as follows:
after = 13-09-2011 10:55:55
before = 13-09-2011 10:56:10
And I expect only 10 PostedMessages to be returned between "after" and "before". (I am generating dummy PostedMessages with timeReceived incremented by 1 second each.)
Does anybody have experience with getting random results from an index with 100,000,000+ (100 million) records?
The goal is to get 30 results ordered randomly, at least 100 times per second.
My records are actually in MySQL, but ORDER BY RAND() on huge tables is the easiest way to kill MySQL.
Sphinxsearch or whatever else: what do you recommend?
I don't have an index that big to try it on.
barry@server:~/modules/sphinx-2.0.1-beta/api# time php test.php -i gi_stemmed --sortby @random --select id
Query '' retrieved 20 of 3067775 matches in 0.081 sec.
Query stats:
Matches:
<SNIP>
real 0m0.100s
user 0m0.010s
sys 0m0.010s
This is on a reasonably powerful dedicated server that is serving live queries (~20 qps).
But to be honest, if you don't need filtering (i.e. each query having a 'WHERE' clause), you can just set up a system that returns random results; you can do this with MySQL alone. Just using ORDER BY RAND() is evil (and Sphinx, while better at sorting than MySQL, is still doing basically the same thing).
How 'sparse' is your data? If most of your ids are used, you can just do something like this:
$ids = array();
$max = getOne("SELECT MAX(id) FROM table");
foreach (range(1, 30) as $idx) {
    $ids[] = rand(1, $max);
}
$query = "SELECT * FROM table WHERE id IN (".implode(',', $ids).")";
(You may want to shuffle() the results in PHP afterwards, as you are likely to get the rows back from MySQL in id order.)
This will be much more efficient. If you do have holes, perhaps look up 33 rows instead. You will sometimes get more than you need (just discard the extras), but you should still get 30 most of the time.
(Of course you could cache '$max' somewhere, so it doesn't have to be looked up all the time.)
Otherwise you could set up a dedicated 'shuffled' list: basically a FIFO buffer, with one thread filling it with random results (perhaps using the above system, fetching 3000 ids at a time) and the consumers just reading random results directly out of this queue.
A FIFO is not particularly easy to implement with MySQL, so you might use a different system for it: maybe redis, or even just memcache.
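Just to sketch the producer/consumer shape of that idea, here is a minimal in-process version in C++, for illustration only: the id generation is a stand-in for whatever really produces random ids (e.g. the MAX(id)/rand() trick above), and in a real web setup the queue would live in an external store such as redis or memcache.

// Rough sketch of the "shuffled FIFO" idea: one producer thread keeps a
// queue topped up with random ids; consumers pop ids as they need them.
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <random>
#include <thread>
#include <vector>

class ShuffledIdQueue {
public:
    void push(const std::vector<long>& ids) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (long id : ids) queue_.push(id);
        cv_.notify_all();
    }
    long pop() {                       // blocks until an id is available
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        long id = queue_.front();
        queue_.pop();
        return id;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }
private:
    std::queue<long> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
};

// Producer: refill the queue with 3000 random ids whenever it runs low.
void producer(ShuffledIdQueue& queue, long maxId) {
    std::mt19937_64 rng(std::random_device{}());
    std::uniform_int_distribution<long> pick(1, maxId);
    for (;;) {
        if (queue.size() < 3000) {
            std::vector<long> batch;
            for (int i = 0; i < 3000; ++i)
                batch.push_back(pick(rng));   // stand-in for the real id source
            queue.push(batch);
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

The consumers then just call pop() thirty times per request, which stays cheap no matter how big the underlying table is.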