I have a SQL Server instance that is shared by several client processes. I want queries to finish in as little time as possible.
Say a call needs to read 1k to 10k records from this shared SQL Server. My natural choice would be to use ExecuteReaderAsync to take advantage of async benefits such as reusing threads.
I started wondering whether async adds some overhead, since execution might stop and resume for every call to ExecuteReaderAsync. If that is true, it seems the overall time for a query to complete would be longer compared to an implementation that uses ExecuteReader. Does that make (any) sense?
Whether you use sync or async to call SQL Server makes no difference to the work that SQL Server does or to the CPU-bound work that ADO.NET does to serialize and deserialize the request and response. So no matter which you choose, the difference will be small.
Using async is not about saving CPU time. It is about saving memory (fewer thread stacks) and about having a nice programming model in UI apps.
In fact async never saves CPU time as far as I'm aware. It adds overhead. If you want to save CPU time use a synchronous approach.
On the server using async in low-concurrency workloads adds no value whatsoever. It adds development time and CPU cost.
The difference between the async approach and the sync approach is that the async call will cause the compiler to generate a state machine, whereas the sync call will simply block while the work against the database is being done.
IRL, the best way to choose is to benchmark both approaches. As usr said, usually those differences are negligible compared to the time the query takes to execute. Async will shine brighter in places where it may save resources, such as avoiding the allocation of a new thread.
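To make the "benchmark both" advice concrete, here is a minimal sketch in Python rather than C#. It only measures the call overhead of a coroutine against a plain function; `fetch_sync` and `fetch_async` are hypothetical stand-ins that touch no database, so the numbers say nothing about real query time.

```python
import asyncio
import time


def fetch_sync(n):
    # Hypothetical stand-in for a query that returns immediately.
    return n * 2


async def fetch_async(n):
    # Same work, wrapped in a coroutine so each call goes through await.
    return n * 2


def time_sync(calls):
    # Returns (sum of results, elapsed seconds) for the blocking version.
    start = time.perf_counter()
    total = 0
    for i in range(calls):
        total += fetch_sync(i)
    return total, time.perf_counter() - start


async def _run_async(calls):
    total = 0
    for i in range(calls):
        total += await fetch_async(i)
    return total


def time_async(calls):
    # Returns (sum of results, elapsed seconds) for the coroutine version,
    # including the cost of starting the event loop.
    start = time.perf_counter()
    total = asyncio.run(_run_async(calls))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    print("sync :", time_sync(100_000)[1])
    print("async:", time_async(100_000)[1])
```

Both versions do the same work, so any gap between the two timings is pure async machinery. Against a real database you would run the same comparison with the actual query, where that gap is usually dwarfed by the round trip.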
There are many posts about async performance:
Async await performance
The zen of async: best practices for best performance
Async Performance: Understanding the Costs of Async and Await
I just wonder what the case for using an async DB connection would be.
From my perspective, it's better to tweak the query and schema first, and maybe re-think your indices.
In my understanding, the DB should take no more than a couple of ms to execute a query.
Problems with slow connections could be solved with connection pooling (and constant pinging if there is a tunnel).
If you have async parts of your code, all the code should be async too.
3.1 Not to forget, async is less robust and more error-prone if you have little experience with it.
3.2 Async code takes longer for a single execution by default, because of the event loop.
Any really long computation, including long network calls inside the business logic, is better done the old-fashioned way, with a separate queue. You lose control over the execution time of your requests anyway, so what is the point?
I could be missing something, but.
As far as I know, the Tornado creator (#bdarnell) has always said the same. Please correct me if I'm wrong.
I'm going to create a web API using pure Node.js that does CRUD operations on SQL Server and returns results to clients. The queries are fairly long running (around 3 seconds) and the request rate is high (around 30 rps). I'm using the mssql package with a callback function to return the result once it's ready.
I've already read a lot about Node and I know it's a good fit for IO-intensive rather than CPU-intensive apps, and also that the event loop shouldn't be blocked because it's single threaded...
My question: Is Node.js suitable for this (SQL-intensive) scenario? Is there any performance issue with using Node.js for this case?
Thanks
Node.js has gone all-in on non-blocking code to the degree that pretty much any function that's blocking in the Node.js API will be labelled as Sync.
Every database driver I've seen follows the model of requiring callbacks, using Promises, or in some cases both.
As a Node.js developer you must read the documentation carefully to look for any potentially blocking calls, and you need to employ the correct concurrency method to handle asynchronous operations. Normally you don't need to overly concern yourself with how long any given operation takes, but you should still be careful when doing things that are slow. Process data in smaller chunks (e.g. row by row) instead of all at once.
Even though it's just single threaded, Node.js can perform very well under load because it's very quick to switch between asynchronous operations. You can also scale up by having multiple Node.js processes working in parallel quite easily, especially if you're using a message bus or HTTP-type fan-out through a load balancer.
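As a rough illustration of why a single thread copes with many slow queries, here is a sketch using Python's asyncio instead of Node.js (both run a single-threaded event loop). `slow_query` is a hypothetical stand-in that just sleeps; the 30 concurrent calls mirror the question's load, with the 3 second duration scaled down.

```python
import asyncio
import time


async def slow_query(query_id, seconds):
    # Hypothetical stand-in for a database call that spends its time waiting on I/O.
    await asyncio.sleep(seconds)
    return query_id


async def handle_burst(count, seconds):
    # Start every "query" at once on a single thread and wait for all of them.
    return await asyncio.gather(*(slow_query(i, seconds) for i in range(count)))


def run_burst(count=30, seconds=0.05):
    # Returns (results in request order, elapsed wall-clock seconds).
    start = time.perf_counter()
    results = asyncio.run(handle_burst(count, seconds))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_burst()
    print(len(results), "queries finished in", round(elapsed, 3), "s")
```

Because the waits overlap, the burst finishes in roughly the time of one query rather than thirty. This only holds while the work is waiting on I/O; CPU-heavy processing of the rows would still block the loop.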
I work with Oracle and MySQL, and I struggle to understand why the APIs are not written such that I can issue a call, go away and do something else, and then come back and pick up the result later (e.g. as with NIO). Instead I am forced to dedicate a thread to waiting for data. It seems that the SQL interfaces are the only place where sync IO is still forced, which means tying up a thread waiting for the DB.
Can anybody explain the reasons for this? Is there something fundamental that makes this difficult?
It would be great to be able to use 1-2 threads to manage my DB query issue and result fetch, rather than use worker threads to retrieve data.
I do note that there are two experimental attempts (eg: adbcj) at implementing an async API but none seem to be ready for Production use.
Database servers should be able to handle thousands of clients. To provide an asynchronous interface, the DB server would need to keep the result set from the query in memory so you can pick it up at a later stage. It would quickly run out of resources.
A considerable problem with async is many many libraries use threadlocal for transactions.
For example, in Java much of the JDBC specification relies on synchronous behavior to achieve a single thread per transaction. That is, you write your transaction in procedural order.
To do it right, transactions would have to be done through callbacks, but they are not. I know of only node.js that does this, but it's unclear if it's really async.
Of course, even if you do async, I'm not sure it will really improve performance, as the database itself is probably doing the work synchronously.
There are lots of ways to avoid thread over-population in (Java):
Is asynchronous jdbc call possible?
Personally to get around this issue I use a Message Bus like RabbitMQ.
Setup
I have a web service that takes its inputs through a REST interface. The REST call does not return any meaningful data, so whatever is passed in to the web service is just recorded in the database and that is it. It is an analytics service which my company is using internally to do some special processing on web requests that are received on their web page. So it is very important that the response take as little time to return as possible.
I have pretty much optimized the code as far as possible to make the response as fast as possible. However, the database call still keeps the connection open for longer than I want before a response is sent back to the web client.
The code looks basically like this, by the way it is ASP.NET MVC, using Entity Framework, running on IIS 7, if that matters.
public ActionResult Add(/*..bunch of parameters..*/) {
    using (var db = new Entities()) {
        var log = new Log {
            // populate Log from parameters
        };
        db.AddToLogs(log);
        db.SaveChanges();
    }
    return File(pixelImage, "image/gif");
}
Question
Is there a way to off load the database insert in to another process, so the response to the client is returned almost instantly?
I was thinking about wrapping everything in the using block in another thread, to make the database insert asynchronous, but didn't know if that was the best way to free up the response back to the client.
What would you recommend if you were trying to accomplish this goal?
If the request has to be reliable then you need to write it into the database. Eg. if your return means 'I have paid the merchant' then you can't return before you actually commit in the database. If the processing is long then there are database based asynchronous patterns, using a table as a queue or using built-in queuing like Asynchronous procedure execution. But these apply when heavy and lengthy processing is needed, not for a simple log insert.
When you want just to insert a log record (visitor/url tracking stuff) then the simplest solution is to use CLR's thread pools and just queue the work, something like:
...
var log = new Log {
    // populate Log from parameters
};
ThreadPool.QueueUserWorkItem(stateInfo => {
    // stateInfo is the Log instance passed as the second argument below
    var queuedLog = stateInfo as Log;
    using (var db = new Entities())
    {
        db.AddToLogs(queuedLog);
        db.SaveChanges();
    }
}, log);
...
This is quick and easy and it frees the ASP handler thread to return the response as soon as possible. But it has some drawbacks:
If the incoming rate of requests exceeds the thread pool processing rate then the in-memory queue will grow until it triggers an app pool 'recycle', thus losing all items 'in progress' (as well as warm caches and other goodies).
The order of requests is not preserved (may or may not be important)
It consumes a CLR pool thread on doing nothing but waiting for a response from the DB
The last concern can be addressed by using a true asynchronous database call, via SqlCommand.BeginExecuteXXX and setting the AsynchronousProcessing on the connection to true. Unfortunately AFAIK EF doesn't yet have true asynchronous execution, so you would have to resort to the SqlClient layer (SqlConnection, SqlCommand). But this solution would not address the first concern, when the rate of page hits is so high that this logging (= writes on every page hit) becomes a critical bottleneck.
If the first concern is real then no threading and/or producer/consumer wizardry can alleviate it. If you truly have an incoming rate vs. write rate scalability concern ('pending' queue grows in memory) you have to either make the writes faster in the DB layer (faster IO, special log flush IO) and/or you have to aggregate the writes. Instead of logging every request, just increment in-memory counters and write them periodically as aggregates.
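The "increment in-memory counters and write them periodically" idea could look roughly like the following. This is a sketch in Python rather than C#, and `HitAggregator`, `record` and `flush` are hypothetical names; the caller is assumed to run `flush` on a timer and write the returned totals to the database in one batch.

```python
import threading
from collections import Counter


class HitAggregator:
    """Counts hits per URL in memory so they can be written as one batch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def record(self, url):
        # Called on every request; only touches memory, never the database.
        with self._lock:
            self._counts[url] += 1

    def flush(self):
        # Swap in an empty counter and hand back the totals gathered so far.
        # The caller (assumed to run on a timer) writes them as aggregates.
        with self._lock:
            snapshot, self._counts = self._counts, Counter()
        return dict(snapshot)


if __name__ == "__main__":
    hits = HitAggregator()
    for url in ["/home", "/home", "/about"]:
        hits.record(url)
    print(hits.flush())
```

The trade-off is the same as with any in-memory buffer: counts recorded since the last flush are lost if the process dies, and per-request detail is gone once it is aggregated.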
I've been working on multi-tier solutions mostly for the last year or so that require this sort of functionality, and that's exactly how I've been doing it.
I have a singleton that takes care of running tasks in the background based on an ITask interface. Then I just register a new ITask with my singleton and pass control from my main thread back to the client.
Create a separate thread that monitors a global, in-memory queue. Have your request put its information on the queue and return; the thread then takes the item off the queue and posts it to the DB.
Under heavy load, if the thread lags the requests, your queue will grow.
Also, if you lose the machine, you will lose any unprocessed queue entries.
Whether these limitations are acceptable is something you'd need to decide.
A more formal mechanism is using some actual middleware messaging system (JMS in Java land, dunno the equivalent in .NET, but there's certainly something).
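A minimal sketch of the in-memory queue with one background thread described in this answer, written in Python rather than .NET. `BackgroundWriter` and the `save` callable are hypothetical; `save` stands in for whatever actually performs the insert.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker thread to exit


class BackgroundWriter:
    """Request threads enqueue records; one worker thread saves them."""

    def __init__(self, save):
        self._save = save  # hypothetical callable that does the real DB insert
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, record):
        # Returns immediately, so the request can send its response.
        self._queue.put(record)

    def _run(self):
        while True:
            record = self._queue.get()
            if record is _STOP:
                return
            self._save(record)

    def close(self):
        # Drain everything queued before the sentinel, then stop the worker.
        self._queue.put(_STOP)
        self._thread.join()


if __name__ == "__main__":
    saved = []
    writer = BackgroundWriter(saved.append)
    for i in range(3):
        writer.enqueue({"hit": i})
    writer.close()
    print(saved)
```

A single worker preserves the order of requests, but the queue here is unbounded, so the growth and data-loss limitations listed above still apply.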
It depends: When you return to the client do you need to be 100% sure that the data is stored in the database?
Take this scenario:
Request comes in
A thread is started to save to the database
Response is sent to the client
Server crashes
Data was not saved to the database
You also need to check how many milliseconds you save by starting a new thread instead of saving to the database.
The added complexity and maintenance cost is probably too high compared with the savings in response time. And the savings in response time are probably so low that they will not be noticed.
Before I spent a lot of time on the optimization I'd be sure of where the time is going. Connections like these have significant latency overhead (check this out). Just for grins, make your service a NOP and see how it performs.
It seems to me that the 'async-ness' needs to be on the client - it should fire off the call to your service and move on, especially since it doesn't care about the result?
I also suspect that if the NOP performance is good-to-tolerable on your LAN it will be a different story in the wild.
I am developing an application which involves multiple user interactivity in real time. It basically involves lots of AJAX POST/GET requests from each user to the server - which in turn translates to database reads and writes. The real time result returned from the server is used to update the client side front end.
I know optimisation is quite a tricky, specialised area, but what advice would you give me to get maximum speed of operation here - speed is of paramount importance, but currently some of these POST requests take 20-30 seconds to return.
One way I have thought about optimising it is to club POST requests together and send them to the server as a group of 8-10, instead of firing individual requests. I am not currently using caching on the database side, and don't really know much about what it is or whether it would be beneficial in this case.
Also, do the AJAX POST and GET requests incur the same overhead in terms of speed?
Rather than continuously hitting the database, cache frequently used data items (with an expiry time based upon how infrequently the data changes).
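A minimal sketch of caching with an expiry time, in Python. `TtlCache` and the `load` callback are hypothetical names, with `load` standing in for the real database read; the `clock` parameter exists only so the expiry can be tested without waiting.

```python
import time


class TtlCache:
    """Keeps loaded values for ttl_seconds before asking the database again."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, load):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no database hit
        value = load(key)  # expired or missing: go to the database
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TtlCache(ttl_seconds=30)
    # The lambda is a placeholder for a real query.
    print(cache.get("user:1", lambda key: {"id": key}))
```

Pick the expiry per item based on how often the data changes, as suggested above; a short TTL on fast-changing data still removes most repeated reads under load.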
Can you reduce your communication with the server by caching some data client side?
The purpose of GET is as its name implies - to GET information. It is intended to be used when you are reading information to display on the page. Browsers will cache the result from a GET request and if the same GET request is made again then they will display the cached result rather than rerunning the entire request. This is not a flaw in the browser processing but is deliberately designed to work that way so as to make GET calls more efficient when the calls are used for their intended purpose. A GET call is retrieving data to display in the page and data is not expected to be changed on the server by such a call and so re-requesting the same data should be expected to obtain the same result.
The POST method is intended to be used where you are updating information on the server. Such a call is expected to make changes to the data stored on the server and the results returned from two identical POST calls may very well be completely different from one another since the initial values before the second POST call will be different from the initial values before the first call because the first call will have updated at least some of those values. A POST call will therefore always obtain the response from the server rather than keeping a cached copy of the prior response.
Ref.
The optimization tricks you'd use are generally the same tricks you'd use for a normal website, just with a faster turn around time. Some things you can look into doing are:
Prefetch GET requests that have high odds of being loaded by the user
Use a caching layer in between as Mitch Wheat suggests. Depending on your technology platform, you can look into memcache, it's quite common and there are libraries for just about everything
Look at denormalizing data that is going to be queried at a very high frequency. Assuming that reads are more common than writes, you should get a decent performance boost if you move the workload to the write portion of the data access (as opposed to adding database load via joins)
Use delayed inserts to give priority to reads and let the database server optimize the batching of writes
Make sure you have intelligent indexes on the table and figure out what benefit they're providing. If you're rebuilding the indexes very frequently due to a high write:read ratio, you may want to scale back the queries
Look at retrieving data in more general queries and filtering the data when it makes it to the business layer of the application. MySQL (for instance) uses a very specific query cache that matches against a specific query. It might make sense to pull all results for a given set, even if you're only going to be displaying x%.
For writes, look at running asynchronous queries to the database if it's possible within your system. Data synchronization doesn't have to be instantaneous, it just needs to appear that way (most of the time)
Cache common pages on disk/memory in a fully formatted state so that the server doesn't have to do much processing of them
All in all, there are lots of things you can do (and they generally come down to general development practices on a more bite sized scale).
The common tuning tricks would be:
- use more indexing
- use less indexing
- use more or less caching on filesystem, database, application, or content
- provide more bandwidth or more cpu power or more memory on any of your components
- minimize the overhead in any kind of communication
Of course an alternative would be to:
0 develop a set of tests, preferably automated, that can determine whether your application works correctly.
1 measure the 'speed' of your application.
2 determine how fast it has to become
3 identify the source of the performance problems:
typical problems are: network throughput, file i/o, latency, locking issues, insufficient memory, cpu
4 fix the problem
5 make sure it is actually faster
6 make sure it is still working correctly (hence the tests above)
7 return to 1
Have you tried profiling your app?
Not sure what framework you're using (if any), but frankly from your questions I doubt you have the technical skill yet to just eyeball this and figure out where things are slowing down.
Bluntly put, you should not be messing around with complicated ways to try to solve your problem, because you don't really understand what the problem is. You're more likely to make it worse than better by doing so.
What I would recommend you do is time every step. Most likely you'll find that either
you've got one or two really long running bits or
you're running a shitton of queries because of an n+1 error or the like
When you find what's going wrong, fix it. If you don't know how, post again. ;-)
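If you have no profiler at hand, "time every step" can be done with a few lines. This is a sketch in Python; `StepTimer` is a hypothetical helper, and the step names in the usage are placeholders for the stages of your own request handling.

```python
import time
from contextlib import contextmanager


class StepTimer:
    """Accumulates wall-clock time per named step."""

    def __init__(self):
        self.totals = {}  # step name -> total seconds spent in that step

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def slowest(self):
        # Name of the step that has consumed the most time so far.
        return max(self.totals, key=self.totals.get)


if __name__ == "__main__":
    timer = StepTimer()
    with timer.step("parse request"):
        pass  # placeholder for real work
    with timer.step("database"):
        time.sleep(0.05)  # placeholder for a slow query
    print(timer.slowest(), timer.totals)
```

Because times accumulate per name, wrapping a query inside a loop also exposes the n+1 case: the step's total grows with the number of iterations even though each individual call looks fast.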