SQL Query optimisation - cast versus converting in ETL - sql-server

I am reviewing some slow-running processes in a data warehouse and noticed a few CAST(column AS date) conversions used when joining tables in a stored procedure that feeds the semantic layer.
I was wondering whether moving these conversions into the staging-layer ETL would make any noticeable difference.
I understand that converting them in the ETL is best practice: you do it only once and don't have to use CAST in all the other stored procedures/views/queries. But is converting in the ETL actually any faster than the CAST() function in SQL Server?
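For illustration, here is the shape of the trade-off (a minimal sketch; all table, column, and index names are hypothetical). The first query repeats the conversion on every execution that joins this way, while a staging-layer column pays that cost once at load time and can be indexed directly:

```sql
-- Conversion done at query time: runs per row, on every execution,
-- in every proc/view that joins this way (names are hypothetical).
SELECT f.sale_id, d.fiscal_period
FROM dbo.FactSales AS f
JOIN dbo.DimDate   AS d
  ON CAST(f.sale_datetime AS date) = d.date_key;

-- Conversion done once in the staging ETL: persist a typed column,
-- then every downstream join can use it (and its index) as-is.
ALTER TABLE staging.FactSales ADD sale_date date NULL;

UPDATE staging.FactSales
SET sale_date = CAST(sale_datetime AS date);

CREATE INDEX IX_FactSales_sale_date ON staging.FactSales (sale_date);
```

The per-row CAST itself is cheap; the bigger cost is usually that converting inside a join predicate limits how well the optimizer can use indexes and statistics on that column.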

Related

Good Practice Question: Using DAX vs Doing Everything in the Database

Is it better practice to do all of the calculations, etc. in SQL Server (or whatever database you are using) instead of DAX, avoiding DAX at all costs or only using it for minor things?
I'm inexperienced with DAX.
We are having some performance issues with DAX, so I was wondering what the best practice would be.
Data transformations and calculations should be performed at the lowest level they make sense, and in an environment you are comfortable and productive in.
So if all your data comes from a SQL Server you control, and you are comfortable performing data transformation and calculation tasks using TSQL, then you should do much of the prep work, modeling, and basic calculations there.
Note that TSQL is incapable of expressing complex business calculations that can be applied across arbitrary "filter contexts", so you will still use DAX for some measure calculations.
On the other hand, if you don't have a SQL Server you control, or you're mashing-up data directly in Power BI, or you don't have a TSQL skillset, then you should do the data transformation in Power Query/DAX and the business calculations in DAX.
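As a rough illustration of doing the prep work in TSQL (a sketch only; the schema and all names are hypothetical), a view can hand the model pre-shaped rows so DAX is reserved for genuine measure logic:

```sql
-- Hypothetical schema: row-level filtering, typing, and basic
-- calculations happen once here instead of in DAX.
CREATE VIEW dbo.vw_SalesForModel
AS
SELECT s.sale_id,
       CAST(s.sale_datetime AS date) AS sale_date,   -- typed once, in TSQL
       s.customer_id,
       s.quantity * s.unit_price     AS line_amount  -- basic calc in TSQL
FROM dbo.Sales AS s
WHERE s.is_cancelled = 0;                            -- filtering in TSQL
```

Measures that must respond to arbitrary filter contexts (the slicers and cross-filters a user applies) still belong in DAX.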

SSIS data flow task or stored procedure?

Which is the faster way to load data from a table in one server to another table present in a different server? Data flow task or a stored procedure in an Execute SQL Task? Say, the table has around 100 million records and has necessary indexing.
Where I work, stored procedures are used, and I would like to improve the execution time. I was wondering whether changing it to a DFT could make it any faster. And, in some cases there are a lot of JOINs involved. So, generally speaking, which offers better performance (regardless of the table structure or execution plan)?
Any help is appreciated.
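For reference, the stored procedure / Execute SQL Task approach being compared usually boils down to one set-based statement over a linked server (a sketch; the linked server, database, and table names are hypothetical):

```sql
-- Single set-based cross-server load. TABLOCK can allow minimal logging
-- under the right recovery model, which matters at 100-million-row volumes.
INSERT INTO dbo.TargetTable WITH (TABLOCK) (id, col1, col2)
SELECT id, col1, col2
FROM [SourceSrv].[SourceDb].dbo.SourceTable;  -- [SourceSrv] = hypothetical linked server
```

A data flow task instead streams the rows through the SSIS buffer pipeline into a bulk-copy destination, which avoids the linked-server hop. Which one wins depends heavily on network, indexes, and logging, so testing both against your own data is the only reliable answer.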

SSRS Best Practice - Data Calculations/Aggregation in SQL SP or in SSRS Expressions (VS/Report Builder)

Should I try to do all (or as many as possible of) the necessary calculations for an SSRS report in SQL code (stored procedures), such as sums, percentages, etc., or should I do the calculations using expressions in Report Builder/VS?
Is there an advantage to doing one over the other?
In other words, should I try to keep the data in my Datasets very granular, detailed, low level and then just use Report Builder 3.0/VS to do all the necessary calculations/aggregations?
There is no one-size-fits-all best approach. In a lot of cases, SQL will be faster at performing aggregations than SSRS. SSRS will be faster at performing the kind of operations that would cause a table scan instead of an index seek when it's done in SQL.
Experience, common sense, and testing are the best guides.
Almost always you want to do your filtering and calcs on the server side. If you do it through a stored procedure, SQL Server can optimize the query and create a well-prepared, reusable query plan. You can examine the resulting query plan and optimize it. None of this is possible if you create and run the code on the client side - how will it use indexes on the client? If your report uses a lot of data, your report will take much longer to run and your users will blame you.
The editor in BIDS is also much poorer than the one in SSMS, and procs can be backed up and managed through SVN or TFS. Unless you know for sure that it runs faster on the client (and this is very rare), learn how to create stored procedures.
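As a sketch of "filtering and calcs on the server side" (hypothetical names throughout), a report proc can return a handful of summary rows instead of detail rows for SSRS to aggregate:

```sql
-- Aggregation and percentages done in the proc; SSRS only renders.
-- DATEFROMPARTS requires SQL Server 2012 or later.
CREATE PROCEDURE dbo.rpt_SalesByRegion
    @Year int
AS
BEGIN
    SET NOCOUNT ON;

    SELECT r.region_name,
           SUM(s.amount) AS total_sales,
           SUM(s.amount) * 100.0
               / SUM(SUM(s.amount)) OVER () AS pct_of_total  -- window over aggregate
    FROM dbo.Sales  AS s
    JOIN dbo.Region AS r
      ON r.region_id = s.region_id
    WHERE s.sale_date >= DATEFROMPARTS(@Year, 1, 1)          -- sargable range,
      AND s.sale_date <  DATEFROMPARTS(@Year + 1, 1, 1)      -- not YEAR(sale_date)
    GROUP BY r.region_name;
END;
```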

Stored procedure vs embedded SQL in SSIS performance

I recently completed an SSIS course.
One of the pieces of best practice I came away with was to ALWAYS use stored procedures in data flow tasks in SSIS.
I guess there is an argument around security; however, the tutor said that because the stored procedures perform all of the work "natively" on the SQL Server, there was/is a significant performance boost.
Is there any truth to this or articles that debate the point?
Thanks
Remember - most courses are taught by clueless people, because people with knowledge earn money doing consulting, which pays a LOT better than training. Most trainers live in a glass house and have never spent 9 months working on a 21 TB data warehouse ;)
This is wrong. Period.
It only makes sense when the SQL statement does not pull data out of the database - for example, merging tables, etc.
Otherwise it is a question of how smartly you set up the SSIS side. SSIS can write data without using SQL statements, via bulk copy mechanisms. SSIS is a lot more flexible, and if you pull data from a remote database then the argument of not leaving the database (i.e. processing natively) is a stupid point to make. When I copy data from SQL Server A to SQL Server B, a SP on B cannot process the data from A natively.
In general, it is only faster when you take data FROM A and push it TO A and all the processing can be done in a simple SP - which is a degenerate edge case (i.e. a simplistic one).
The advantage of SSIS is the flexibility of processing data in an environment designed for data flow, which in many cases is needed in the project; doing that in stored procedures would turn into a nightmare.
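The "take data FROM A and push it TO A" case mentioned above looks something like this (hypothetical names; a sketch of the degenerate case, not a recommendation for cross-server loads):

```sql
-- Source and target on the same instance: one set-based statement
-- inside a proc, and the data never leaves the engine.
CREATE PROCEDURE dbo.LoadDailySnapshot
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.DailySnapshot (id, amount, load_date)
    SELECT id, amount, CAST(GETDATE() AS date)
    FROM dbo.SourceTable
    WHERE is_active = 1;   -- no round trips, no buffer copies
END;
```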
Old thread, but a pertinent topic.
For a data source connection, I favor SPs over embedded queries when A) the logic is simple enough to be handled either way, and B) supporting the SP is easier than working with the package.
I haven't found much, if any, difference in performance for the data source if the SP returns a fairly straightforward result set.
Our shop has a more involved deploy process for packages, which makes SPs a preferred source.
I have not found very many applications for a SP being a data destination, except maybe an occasional logging SP call.
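When a proc is used as a data-flow source, the usual pattern is a single, stable result set (a sketch with hypothetical names):

```sql
-- One straightforward result set with fixed column names and types,
-- so the SSIS source component can derive its metadata reliably.
CREATE PROCEDURE dbo.src_ActiveCustomers
AS
BEGIN
    SET NOCOUNT ON;   -- suppress rowcount messages that can confuse the component

    SELECT customer_id, customer_name, created_date
    FROM dbo.Customer
    WHERE is_active = 1;
END;
```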

Performance Difference between LINQ and Stored Procedures

Related
LINQ-to-SQL vs stored procedures?
I have heard a lot of talk back and forth about the advantages of stored procedures being precompiled. But what are the actual performance differences between LINQ and stored procedures on selects, inserts, updates, and deletes? Has anyone run any tests at all to see if there is any major difference? I'm also curious whether a greater number of transactions makes a difference.
My guess is that LINQ statements get cached after the first transaction and performance is probably going to be nearly identical. Thoughts?
LINQ should be close in performance, but I disagree with the statement above that says LINQ is faster. It can't be faster; it could possibly be just as fast, all other things being equal.
I think the difference is that a good SQL developer, who knows how to optimize and uses stored procedures, is always going to have a slight edge in performance. If you are not strong on SQL, let LINQ figure it out for you, and your performance is most likely going to be acceptable. If you are a strong SQL developer, use stored procedures to squeeze out a bit of extra performance if your app requires it.
It is certainly possible, if you write terrible SQL, to code up stored procedures that execute more slowly than LINQ would; but if you know what you are doing, stored procedures and a DataReader can't be beat.
Stored procedures are faster than LINQ queries because they can take full advantage of SQL features.
When a stored procedure is executed again, the database uses the cached execution plan to run it,
while a LINQ query is compiled each and every time.
Hence, a LINQ query takes more time to execute than a stored procedure.
Stored procedures are also a better way of writing complex queries than LINQ.
LINQ queries can (and should) be precompiled as well. I don't have any benchmarks to share with you, but I think everyone should read this article for reference on how to do it. I'd be especially interested to see some comparison of precompiled LINQ queries to SPROCs.
There is not much difference, except that LINQ can degrade when you have a lot of data and you need some database tuning.
LINQ2SQL queries will not perform any differently from any other ad-hoc parameterized SQL query, other than the possibility that the generator may not optimize the query in the best fashion.
The common perception is that stored procedures perform better than ad-hoc SQL queries. However, this is false:
SQL Server 2000 and SQL Server version 7.0 incorporate a number of changes to statement processing that extend many of the performance benefits of stored procedures to all SQL statements. SQL Server 2000 and SQL Server 7.0 do not save a partially compiled plan for stored procedures when they are created. A stored procedure is compiled at execution time, like any other Transact-SQL statement. SQL Server 2000 and SQL Server 7.0 retain execution plans for all SQL statements in the procedure cache, not just stored procedure execution plans.
-- SQL Server Books Online
Given the above and the fact that LINQ generates ad-hoc queries, my conclusion is that there is no performance difference between Stored Procedures & LINQ. And I am also apt to believe that SQL Server wouldn't move backwards in terms of query performance.
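You can observe this in the plan cache yourself (a sketch; it requires the VIEW SERVER STATE permission, and the LIKE filter text is hypothetical):

```sql
-- Proc, prepared (parameterized ad-hoc), and ad-hoc plans all live in
-- the same cache; usecounts > 1 shows a cached plan being reused.
SELECT cp.objtype,     -- 'Proc', 'Prepared', or 'Adhoc'
       cp.usecounts,
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%Customer%';   -- narrow to the statement under test
```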
LINQ should be used at the business logic layer, on top of views created in SQL Server or Oracle. LINQ helps you insert another layer for business logic, the maintenance of which is in the hands of coders or non-SQL people. It will definitely not perform as well as SQL, because it's not precompiled and you can do lots of different things in SPs.
But you can definitely add a programming layer and segregate the business logic from the core SQL tables and database objects using LINQ.
See LINQ-to-SQL vs stored procedures for help - I think that post has most of the info you need.
Unless you are trying to get every millisecond out of your application, whether to use a stored procedure or LINQ may need to be determined by what you expect developers to know, and by maintainability.
Stored procedures will be fast, but when you are actively developing an application you may find that the ease of using LINQ is a positive, as you can change your query and the anonymous type created from LINQ very quickly.
Once you are done writing the application and you know what you need, and start to look at optimizing it, then you can look at other technologies and if you have good unit testing then you should be able to compare different techniques and determine which solution is best.
You may find this comparison of various ways for .NET 3.5 to interact with the database useful.
http://toomanylayers.blogspot.com/2009/01/entity-framework-and-linq-to-sql.html
I don't think I would like having my database layer in compiled code. It should be a separate layer, not combined. As I develop and make use of Agile, I am constantly changing the database design, and the process goes very fast. Adding columns, removing columns, or creating new tables in SQL Server is as easy as typing into Excel. Normalizing or de-normalizing a table is also pretty fast at the database level. Now with LINQ I would also have to change the object representation every time I make a change to the database, or live with it not truly reflecting how the data is stored. That is a lot of extra work.
I have heard that LINQ entities can shelter your application from database changes, but that doesn't make sense. The database design and application design need to go hand in hand. If I normalize several tables or do some other redesign of the database, I wouldn't want a LINQ object model that no longer reflects the actual database design.
And what about the advantage of tweaking a view or stored procedure? You can do that directly at the database level, without having to recompile code and release it to production. If I have a view which shows data from several tables and I decide to change the database design, all I have to do is change that view. All my code remains the same.
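For example (a sketch, hypothetical names): if a redesign moves phone numbers out to their own table, the view keeps the old shape and no application code is touched:

```sql
-- The redesign moved phone_number off Customer; the view preserves
-- the contract the application and reports already depend on.
ALTER VIEW dbo.vw_CustomerContact
AS
SELECT c.customer_id,
       c.customer_name,
       p.phone_number
FROM dbo.Customer AS c
LEFT JOIN dbo.CustomerPhone AS p
  ON p.customer_id = c.customer_id;
```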
Consider a database table with a million entries, joined to another table with a million entries... do you honestly think that doing this on the webserver (be it in LINQ or ad-hoc SQL) is going to be faster or more efficient than letting SQL Server do it on the database?
For simple queries, LINQ is obviously better, as it will be precompiled and gives you the advantage of type checking, etc. However, for any intensive database operations, report building, or bulk data analysis that needs doing, stored procedures will win hands down.
<script>alert("hello") </script> I think that doing this on the webserver (be it in LINQ or ad-hoc SQL) is going to be faster or more efficient than letting SQL Server do it on the database?
For simple queries, then LINQ is obviously better as it will be pre-compiled, giving you the advantage of having type checking , etc. However, for any intensive database operations, report building, bulk data analysis that need doing, stored procedures will win hands dow
