Overnight Batch Processing - Inline Application Code vs Transact-SQL - sql-server

The debate over inline code vs stored procedures almost always centres on simple CRUD operations and whether a live application should call stored procedures or use inline SQL. However, it is also common for companies such as banks, hedge funds and insurance companies to run batch processing scheduled to occur after hours. These are not simple CRUD operations; we're talking about transactional, specialised business logic. One example might be the calculation of daily compounded interest.
The process needs to be efficient and scalable due to the volume of records to be processed. By running overnight, the batches can use resources that would not be available to them during the day.
It is no surprise to me that this kind of logic is often implemented in the back end using something like stored procedures in SQL Server, or its equivalent on other platforms. I would expect such an implementation to always be more efficient than inline code, even if that inline code were implemented as a service running on the database server (i.e. without network latency).
The T-SQL implementation benefits from compiled query execution plans and does not have to marshal data between processes over a connection.
Am I wrong about this? I would like to hear from people with experience in this area. Anyone who believes a back-end implementation means writing inline code via cursors in stored procedures need not comment.
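To make the example concrete, here is a minimal sketch of the kind of set-based T-SQL batch I have in mind (the table and column names are hypothetical):

    -- Hypothetical schema: dbo.Accounts(AccountId, Balance, AnnualRate, IsActive).
    -- One set-based statement compounds a day's interest for every active account,
    -- so the engine reuses a single cached plan and no rows leave the server.
    BEGIN TRANSACTION;

    UPDATE a
    SET    a.Balance = a.Balance * (1 + a.AnnualRate / 365.0)
    FROM   dbo.Accounts AS a
    WHERE  a.IsActive = 1;

    COMMIT TRANSACTION;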

Related

Effect of stored procedures on network traffic in Access/SQL setup

I am currently administering/developing an Access 2010 frontend/SQL backend database. We are trying to improve frontend performance, and one solution that has been suggested is pushing a lot of the VBA that runs the front end down into stored procedures on the server. I'm fairly proficient in VBA, but very new to SQL and network architecture. Everything I've turned up on Google has been information about splitting the database, which is already done, rather than information about the network loads resulting from running stored procedures vs running VBA.
What is the difference in network traffic between the current setup and pushing this action down to a stored procedure?
As a specific example, if I'm populating a form in the current setup, there are a few queries run to provide data to different elements on the form. With the current architecture, does Access retrieve the queried tables from the backend, query them client-side and then populate the data? How would that be different in terms of network traffic from, say, executing a SP when the form loads, and only transferring the data necessary for displaying the form?
The end goal is to reduce the chattiness between Access and SQL, and I'm mostly trying to figure out exactly what is happening where.
As a general rule, if you open a form with a where clause that restricts the form to one record, then using a bound form or adopting a stored procedure will NOT result in any difference or reduction in network traffic.
Any local Access query based on a table will simply request the one record. There is no "local" concept of processing in this regard, EVEN with a linked table. Note the word "table" is singular here.
Access does not and will not pull down a whole table unless you have forms and queries without any "where" clause to restrict the data pulled.
In other words, if you have a poorly designed form and you dump and change that design to something in which you now ONLY pull down the one record, then of course the new setup will result in reduced network traffic.
However, the above reduction is NOT DUE to adopting the stored procedure but ONLY to adopting a design in which you restrict the records requested into the form.
So doing something poorly and then improving that process is NOT a justification to adopt stored procedures.
Thus, in the case of pulling records into a form, using a stored procedure will NOT improve performance. Worse, binding a form to a stored procedure results in a form that is READ ONLY anyway!
So stored procedures don't necessarily increase performance or reduce network traffic when talking about loading a record into a form, in terms of response time or performance.
If you have to do large amounts of recordset processing then of course adopting a stored procedure can save network traffic. So in place of some VBA code to process 100,000 payroll records, then yes, moving such code server side will help (a sketch of what that might look like follows below). However, processing 100,000 payroll records is NOT a common task and is NOT a user interface issue in most cases anyway. In other words, you don't have a slow-loading form or slow response time to load such forms; such types of processing are NOT done interactively by users waiting for a form to load.
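To illustrate (a rough sketch only, with made-up table and column names), "moving such code server side" might look like a single set-based statement in place of a VBA loop:

    -- Hypothetical tables: dbo.Payroll(EmployeeId, PayPeriod, HoursWorked, GrossPay)
    -- and dbo.PayRates(EmployeeId, HourlyRate). One pass replaces a row-by-row loop,
    -- and none of the 100,000 rows ever travel across the network.
    UPDATE p
    SET    p.GrossPay = p.HoursWorked * r.HourlyRate
    FROM   dbo.Payroll  AS p
    JOIN   dbo.PayRates AS r ON r.EmployeeId = p.EmployeeId
    WHERE  p.PayPeriod = '2024-01';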
SQL server is indeed a high performance system, and also a system that can scale to many users.
If you write your application in C++, or VB, or in your case with MS Access, in GENERAL the performance of all of these tools will BE THE SAME.
In other words... SQL Server is rather nice, and is a standard system used in the IT industry.
However, SQL Server will NOT solve your performance issues without effort on your part. And, it turns out that MOST of those same efforts also make your non-SQL Server Access applications run better.
In fact, we see many posts that mention moving the back-end data to SQL Server actually slowed things down. (And in fact, on a single machine, the Access JET engine (now called ACE) is actually FASTER THAN SQL Server, so with a single user on the same machine, Access is faster than SQL Server in most cases.)
A few things:
Having a table with 75k records is quite small. Let's assume you have 12 users. With just a 100% file-based system (JET), and no SQL Server, the performance of that system should really have screamed.
I have some applications out there with 50, or 60 HIGHLY related tables. With 5 to 10 users on a network, response time is instant. I don't think any form load takes more than one second. Many of those 60+ tables are highly relational and in the 50 to 75k records range.
So, with my 5 users I see no reason why I can’t scale to 15 users with such small tables in the 75,000 record range. And this is without SQL server.
If the application did not perform with such small tables of only 75k records, then upsizing to SQL Server will do absolutely nothing to fix performance issues. In fact, in the SQL Server newsgroups you see weekly posts by people who find that upgrading to SQL actually slowed things down.
I have even seen some very interesting numbers showing that some queries were actually MORE EFFICIENT in terms of network use with JET than with SQL Server.
My point here is that technology will NOT solve performance problems. However, good designs that make careful use of limited bandwidth resources are the key here. So, if the application was not written with good performance in mind, then you are kind of stuck with a poor design!
I mean, when using a JET file share, if you grab an invoice from the 75k-record table, only the one record is transferred down the network with a file share (and SQL Server will also only transfer one record). So, at this point, you really will NOT notice any performance difference by upgrading to SQL Server. There is no magic here. And adopting a SQL stored procedure will be an even GREATER waste of time!
And adopting a stored procedure in place of the above will NOT gain you performance either!
SQL Server is a robust and more scalable product than JET. And security, backup and a host of other reasons make SQL Server a good choice. However, SQL Server will NOT solve a performance problem when dealing with such small tables of 75k records.
Of course, when efforts are made to utilize sql server, then significant advances in performance can be realized.
I will give a few tips... these apply when using MS Access as a file share (without a server), or even ODBC to SQL Server:
** Ask the user what they need before you load a form!
The above is so simple, but so often I see the above concept ignored. For example, when you walk up to an instant teller machine, does it download every account number and THEN ASK YOU what you want to do?
In Access, it is downright silly to open up a form attached to a table WITHOUT FIRST asking the user what they want! So, if it is a customer invoice, get the invoice number, and then load up the form with that ONE record. How can one record be slow? When you are done editing, the record and the form are closed, and you are back at the prompt, ready to do battle with the next customer.
You can read up on how this "flow" of a good user interface works here (and this applies to both JET, and sql server applications):
http://www.kallal.ca/Search/index.html
My only point here is to restrict the form to only the ONE record the user needs. You don't need, nor gain anything from, a stored procedure to accomplish this task. I am always dismayed how often a developer builds a nice form, attaches it to some huge table, opens it, and then tells the users to go have at it and have fun. Don't we have any kind of concern for those poor users? Often, the user will not even know how to search for something!
So prompting and asking the user also makes a HUGE leap forward in usability. And, the big bonus is reduced network traffic too! Gosh, better and faster, and less network traffic! What more do we want!
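To put the tip in SQL terms (a rough sketch; the Invoices table and column names are made up), the only difference that matters is the restriction:

    -- What an unrestricted form bound to the whole table effectively requests:
    SELECT * FROM dbo.Invoices;                    -- every row crosses the wire

    -- What a form opened with a "where" restriction effectively requests:
    DECLARE @InvoiceNumber INT = 12345;
    SELECT * FROM dbo.Invoices
    WHERE  InvoiceNumber = @InvoiceNumber;         -- one row crosses the wire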
** USE CAUTION with queries that require more than one linked table
JET has a really difficult time joining ODBC tables together. Often the Access data engine (JET/ACE) does a good job, but often such joins are slow. However, most forms for editing data are NOT based on a multi-table query (so again, a stored procedure will not speed up form load for editing of data).
The simple solution for such multi-table joins (for both forms and reports) is to build the query on the SQL Server side as a view, and then link to that view.
This view approach is MUCH less work than a stored procedure and results in the joins occurring server side. And the resulting view is updatable, as opposed to READ ONLY when you adopt stored procedures. And the performance of such views will again equal that of a stored procedure in THIS context.
So once again, adopting stored procedures DOES NOT help and is more expensive in developer cost than simply using a view. Really this just amounts to people suggesting that you rack up bills and use developer time to create something that yields nothing over a view except more billable hours.
I don't think it needs pointing out that if the query in question already runs well, then the above can be ignored, but just keep in mind that local queries with more than one table based on links to SQL Server can often run slow. So, just be aware of the above.
This view trick also applies well to combo boxes.
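A rough sketch of the view trick (all table and column names here are made up):

    -- Define the multi-table join once on the SQL Server side ...
    CREATE VIEW dbo.vwInvoiceDetails
    AS
    SELECT  i.InvoiceNumber,
            i.InvoiceDate,
            c.CustomerName,
            d.ProductCode,
            d.Quantity
    FROM    dbo.Invoices       AS i
    JOIN    dbo.Customers      AS c ON c.CustomerId    = i.CustomerId
    JOIN    dbo.InvoiceDetails AS d ON d.InvoiceNumber = i.InvoiceNumber;
    -- ... then link Access to dbo.vwInvoiceDetails like any other table:
    -- the join work happens server side, and (unlike a stored procedure
    -- result) the linked view can remain updatable for the form.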
So one can continue to use bound forms to a linked table but one simply needs to restrict the form to the ONE RECORD you need.
You can safely open up a single invoice form, etc., but simply ENSURE you open such forms (OpenForm) by restricting records via the "where" clause. No view or stored procedure is required here.
Bound forms are way less work than unbound forms, and performance is generally just as good anyway when done right.
Avoid large loading of combo boxes. A combo box is good for about 100 entries. After that you are torturing the user (what, they have to look through hundreds of entries?). So, keep things like combo boxes down to a minimum size. This is both faster and, MORE importantly, it is kinder to your users.
After all, at the end of the day what we really want is to treat users well. It seems that treating the users well, and reducing the bandwidth (amount of data) goes hand in hand.
So, better applications treat the users well and run faster! (this is good news!)
So, #1 tip is to reduce the data that you transfer into a form.
Using stored procedures is not required in the vast majority of cases and will not reduce bandwidth requirements any more than adopting where clauses and views will.

Stored procedure vs embedded SQL in SSIS performance

I recently completed an SSIS course.
One of the pieces of best practice I came away with was to ALWAYS use stored procedures in data flow tasks in SSIS.
I guess there is an argument around security; however, the tutor said that because the stored procedures perform all of the work "native" on the SQL Server, there was/is a significant performance boost.
Is there any truth to this or articles that debate the point?
Thanks
Remember - most courses are given by clueless people, because people with knowledge earn money doing consulting, which pays a LOT better than training. Most trainers live in a glass house and have never spent 9 months working on a 21 TB data warehouse ;)
This is wrong. Period.
It only makes sense when the SQL statement does not pull data out of the database - for example, merging tables etc.
Otherwise it is a question of how smartly you set up the SSIS side. SSIS can write data without using SQL, using bulk copy mechanisms. SSIS is a lot more flexible, and if you pull data from a remote database then the argument of not leaving the database (i.e. processing natively) is a stupid point to make. When I copy data from SQL Server A to SQL Server B, a stored procedure on B cannot process the data from A natively.
In general, it is only faster when you take data FROM A and push it TO A and all the processing can be done in a simple stored procedure - which is a degenerate edge case (i.e. a simplistic one).
The advantage of SSIS is the flexibility of processing data in an environment designed for data flow, which in many cases is what the project needs; doing that in stored procedures would turn into a nightmare.
Old thread, but a pertinent topic.
For a data source connection, I favor SPs over embedded queries when A) the logic is simple enough to be handled in both ways, and B) the support of the SP is easier than working with the package.
I haven't found much, if any, difference in performance for the data source if the SP returns a fairly straightforward result set.
Our shop has a more involved deploy process for packages, which makes SPs a preferred source.
I have not found very many applications for a SP being a data destination, except maybe an occasional logging SP call.

How much performance do I lose by increasing the number of trips to SQL Server?

I have a web application where the web server and SQL Server 2008 database sit on different boxes in the same server farm.
If I take a monolithic stored procedure and break it up into several smaller stored procs, thus making the client code responsible for calls to multiple stored procedures instead of just one, am I going to notice a significant performance hit in my web application?
Additional Background Info:
I have a stored procedure with several hundred lines of code containing decision logic, update statements, and finally a select statement that returns a set of data to the client.
I need to insert a piece of functionality into my client code (in this sense, the client code is the ASP web server that is calling the database server) that calls a component DLL. However, the stored procedure is updating a recordset and returning the updated data in the same call, and my code ideally needs to run after the decision logic and update statements, but before the data is returned to the client.
To get this functionality to work, I'm probably going to have to split the existing stored proc into at least two parts: one stored proc that updates the database and another that retrieves data from the database. I would then insert my new code between these stored proc calls.
When I look at this problem, I can't help but think that, from a code maintenance point of view, it would be much better to isolate all of my update and select statements into thin stored procs and leave the business logic to the client code. That way whenever I need to insert functionality or decision logic into my client code, all I need to do is change the client code instead of modifying a huge stored proc.
Although using thin stored procs might be better from a code maintenance point-of-view, how much performance pain will I experience by increasing the number of trips to the database? The net result to the data is the same, but I'm touching the database more frequently. How does this approach affect performance when the application is scaled up to handle demand?
I'm not one to place performance optimization above everything else, especially when it affects code maintenance, but I don't want to shoot myself in the foot and create headaches when the web application has to scale.
In general, as a rule of thumb, you should keep roundtrips to SQL Server to a minimum.
The "hit" on the server is very expensive; it's actually more expensive to divide the same operation into 3 parts than to do 1 hit and everything else on the server.
Regarding maintenance, you can call 1 stored proc from the client and have that proc call the other 2 procs.
I had an application with extreme search logic; that's what I did to implement it.
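As a rough sketch (the procedure names are made up), the wrapper idea looks like this - the client still makes a single call, while the work stays split into smaller, maintainable pieces:

    -- Hypothetical sketch: one round trip from the client, two inner procedures
    -- (assumed to exist) that do the update work and return the final result set.
    CREATE PROCEDURE dbo.usp_ProcessOrder
        @OrderId INT
    AS
    BEGIN
        SET NOCOUNT ON;

        EXEC dbo.usp_ApplyOrderUpdates @OrderId = @OrderId;  -- decision logic + updates
        EXEC dbo.usp_GetOrderResult    @OrderId = @OrderId;  -- result set flows back to the client
    END;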
Some benchmarking results...
I had a client a while back whose servers were falling and crumbling down. When we checked for the problem, it was too many roundtrips to SQL Server; when we minimized them, the servers got back to normal.
It will affect it. We use a Weblogic server where all the business logic is in the AppServer connected to a DB/2 database. We mostly use entity beans in our project and for most business service calls make several trips to the DB with no visible side effects. (We do tune some queries to be multi-table when needed).
It really depends on your app. You are going to need to benchmark.
A well setup SQL Server on good hardware can process many thousands of transactions per second.
In fact breaking up a large stored procedure can be beneficial, because you can only have one cached query plan per batch. Breaking into several batches means they will each get their own query plan.
You should definitely err on the side of code-maintenance, but benchmark to be sure.
Given that the query plan landscape will change, you should also be prepared to update your indexes, perhaps creating different covering indexes.
In essence, this question is closely related to tight vs. loose coupling.
At the outset: You could always take the monolithic stored procedure and break it up into several smaller stored procs, that are all called by one stored procedure, thus making the client code only responsible for calling one stored procedure.
Unless the client will do something (change the data or provide status to user) I would probably not recommend moving multiple calls to the client, since you would be more tightly coupling the client to the order of operations for the stored procedure without a significant performance increase.
Either way, I would benchmark it and adjust from there.

Many connections vs. big data queries

Hello, I am creating a Windows application that will be installed on 10 computers and will access the same database through Entity Framework.
I was wondering what's better:
Spread the queries into packets (i.e. load the contact, then attach the included navigation properties - [DataContext.Contacts.Include("Phone")]).
Load everything in one query rather than splitting it out into individual queries.
You name it.
BTW, I have a query whose trace string produced over 500 lines of SQL. I'm in doubt; maybe I should trade some user experience for performance, since performance is also part of the user experience.
You could put your SQL in stored procedures and write your Entity Framework logic to use the procedures instead of generating the SQL and sending it over the wire.
As with everything database related, it depends. Things like the connection type (LAN vs WAN), how you handle caching, database load level, type of database load (writes vs reads) etc, can all make a difference.
But in general, whenever you can reduce the number of round trips to the database that's a good thing. And remember: you can have more than one result set after executing a single SqlCommand.
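As a rough sketch of that last point (the tables and procedure name are made up), a single procedure can return both sets in one round trip, and the client then walks through them with SqlDataReader.NextResult():

    -- Hypothetical sketch: one procedure, one round trip, two result sets
    -- (the contacts, then their phones), read in sequence by the client.
    CREATE PROCEDURE dbo.usp_GetContactsWithPhones
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT ContactId, FirstName, LastName
        FROM   dbo.Contacts;

        SELECT PhoneId, ContactId, PhoneNumber
        FROM   dbo.Phones;
    END;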
Load everything in one query rather than splitting it out into individual queries.
This will normally be superior. You're usually better off writing chunkier queries than chatty ones. Fewer calls have less overhead - you need to obtain fewer connections, deal with less latency, etc.
Does the database server have to support other applications? For most business software applications, SQL server won't even break a sweat servicing ten clients - particularly performing basic entity lookups. It won't even really know you're there unless it's installed on a 486SX.

Implementing functionality/code directly in database system

RDBMS packages today offer a tremendous amount of functionality beyond standard data storage and retrieval. SQL Server for example can send emails, expose web service methods, and execute CLR code amongst other capabilities. However, I have always tried to limit the amount of processing my database server does to just data storage and retrieval as much as possible, for the following reasons:
A database server is harder to scale than web servers
In a lot of projects I've worked on, the DB server is a lot busier than the web servers, and thus has less spare capacity
It potentially exposes your database server to a security attack (web services for example)
My question is, how do you decide how much functionality or code should be implemented directly on your database server versus other servers in your architecture? What recommendations do you have for people starting new projects?
I know Microsoft SQL Server and Oracle really push using stored procedures for everything, which helps to encapsulate the relational architecture and creates a more procedural interface for the software developers, who typically aren't as fluent at writing SQL queries.
But then half your application logic is written in PL/SQL (or T-SQL or whatever) and the other half is written in your application language, Java or PHP or C#, etc. The DBA is typically responsible for coding the procedures, and the developers are responsible for everything else. No one has visibility and access to the full application logic. This tends to slow down development, testing, and future revisions to the project.
Also software development tools tend to be poor for stored procedures. Tools and best practices for debugging, source control, and testing all seem to be about 10-15 years behind the state of the art for application languages.
So I tend to stay away from stored procedures and triggers if at all possible. Except in certain cases when a well-placed stored procedure can make a complex SQL operation happen entirely in the server instead of shuffling data back and forth. This can be very effective at eliminating performance bottlenecks.
It's possible to go too far in the other direction as well. People who prefer that the application manage data versus metadata, and who employ designs like Entity-Attribute-Value or Polymorphic Associations, get themselves into trouble. Let the database manage that. Use referential integrity constraints (foreign keys). Use transactions.
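As a rough sketch of those last two points (the schema here is entirely made up), letting the database own the integrity and the atomicity looks like this:

    -- Hypothetical schema: the database, not the application, enforces the
    -- relationship and the all-or-nothing behaviour.
    CREATE TABLE dbo.Customers (
        CustomerId INT IDENTITY PRIMARY KEY,
        OrderCount INT NOT NULL DEFAULT 0
    );

    CREATE TABLE dbo.Orders (
        OrderId    INT IDENTITY PRIMARY KEY,
        CustomerId INT NOT NULL
            REFERENCES dbo.Customers (CustomerId)   -- referential integrity constraint
    );

    DECLARE @CustomerId INT;

    INSERT INTO dbo.Customers DEFAULT VALUES;       -- create a customer row
    SET @CustomerId = SCOPE_IDENTITY();

    BEGIN TRANSACTION;                              -- both statements commit or neither does
        INSERT INTO dbo.Orders (CustomerId) VALUES (@CustomerId);
        UPDATE dbo.Customers
           SET OrderCount = OrderCount + 1
         WHERE CustomerId = @CustomerId;
    COMMIT TRANSACTION;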
The vendors have one set of best practices. You, however, voice concerns with that.
Years ago I supported a Major Software Product. Major.
They said "The database is relational storage. Nothing more." Every user conference people would ask about stored procedure, triggers, and all that malarkey.
Their architect was firm. As soon as you get away from plain-old-SQL, you've got a support and maintenance nightmare. They did object-relational mapping from the DB into their product, and everything else was in their product.
This scales well. Multiple application servers can easily share a single database server.
