Insert from memory optimized table to physical table - sql-server

Imagine this scenario in SQL Server 2016: we have two tables, A and B
A is a memory optimized table
B is a normal table
We join A and B; the join itself is fine and about 1,000 rows are returned in minimal time.
But when we want to insert this result set into another table (a memory-optimized table, a normal table, or even a temp table), the insert takes 10 to 20 seconds.
Any ideas?
UPDATE: Execution plans for the normal scenario and for the memory-optimized scenario have been added.

When a DML statement targets a Memory-Optimized table, the query cannot run in parallel, and the server will employ a serialized plan. So, your first statement runs in a single-core mode.
In the second instance, the DML statement leverages the fact that "SELECT INTO / FROM" is parallelizable. This behavior was added in SQL Server 2014. Thus, you get a parallel plan for that. Here is some information about this:
Reference: What's New (Database Engine) - SQL Server 2014
I have run into this problem countless times with Memory-Optimized targets. One solution I have found, if the I/O requirements are high on the retrieval, is to stage the result of the SELECT statement into a temporary table or other intermediate location, then insert from there into the Memory-Optimized table.
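For illustration, a minimal sketch of that staging pattern, with made-up table and column names (A is the memory-optimized source, TargetInMem the destination):

SELECT a.Col1, b.Col2
INTO #Staging                              -- SELECT ... INTO is eligible for a parallel plan
FROM dbo.A AS a                            -- memory-optimized source
JOIN dbo.B AS b ON b.Id = a.Id;

INSERT INTO dbo.TargetInMem (Col1, Col2)   -- the insert into the memory-optimized target still runs serially
SELECT Col1, Col2
FROM #Staging;

DROP TABLE #Staging;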
The third issue is that, by default, statements that merely read from a Memory-Optimized table, even if that table is not the target of DML, are also run in serialized fashion. There is a hotfix for this, which you can enable with a query hint.
The hint is used like this:
OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'))
Reference: Update enables DML query plan to scan query memory-optimized tables in parallel in SQL Server 2016
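For example, applied to a query that only reads from the memory-optimized table (table and column names here are illustrative):

SELECT a.Id, a.Col1
FROM dbo.A AS a                         -- memory-optimized table, read only
JOIN dbo.B AS b ON b.Id = a.Id
OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'));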
In either case, any DML that has a memory-optimized table as a target is going to run on a single core. This is by design. If you need to exploit parallelism, you cannot do it if the Memory-Optimized table is the target of the statement. You will need to benchmark different approaches to find the one that performs best for your scenario.

Related

SQL Server Update Statistics

I have 2 questions about SQL Server statistics, please help me. I am using SQL Server 2016.
My table TBL1 has only one column, COL1. When I use COL1 in joins with other tables, statistics are automatically created on COL1.
Next I create a non-clustered index on COL1 of TBL1, and another set of statistics is created on COL1. Now I have two sets of statistics on COL1.
Of the above two sets of statistics, which does SQL Server use for further queries? I am assuming that the statistics created by the non-clustered index will be used, am I right?
If I use the UPDATE STATISTICS TBL1 command, all the statistics for TBL1 are updated. In the MSDN documentation, I see that updating statistics causes queries to recompile; what do they mean by recompiling of queries? The MSDN link is
https://learn.microsoft.com/en-us/sql/relational-databases/statistics/update-statistics?view=sql-server-ver15
Please explain.
If there's only 1 column in your table, there's no reason to have a non-clustered index. This creates a separate copy of that data. Just create the clustered index on that column.
Yes - Since your table only has the one column and an index was created on that column, it's almost certain that SQL Server will use that index whenever joining to that table and thus the statistics for that index will be used.
In this context it means that the cached execution plan will be invalidated because of the stale statistics, and the next time the query executes the optimizer will compile a new execution plan. In other words, SQL Server assumes there may now be a better set of steps to execute the query, and the optimizer will try to assemble that better set of steps (the execution plan).
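If you want to see both statistics objects side by side and refresh them, here is a minimal sketch using the names from the question:

-- list every statistics object on TBL1: the auto-created one (_WA_Sys_...) and the one belonging to the non-clustered index
SELECT name, auto_created, user_created
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.TBL1');

-- refresh all statistics on the table
UPDATE STATISTICS dbo.TBL1;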
Recommended Reading:
SQL Server Statistics
Understanding Execution Plans
Execution Plan Caching & Reuse

Different ways to define which database to use?

I am trying to find out the differences in the way you can define which database to use in SSMS.
Is there any functional difference between using the 'Available Databases' drop-down list (the toolbar dropdown where, for example, AdventureWorks is selected),
the database being defined in the query
SELECT * FROM AdventureWorks2008.dbo.Customers
and
stating the database at the start?
USE AdventureWorks2008
GO
SELECT * FROM dbo.Customers
I'm interested to know if there is a difference in terms of performance or something that happens behind the scenes for each case.
Thank you for your help
Yes, there is. A very small overhead is added when you use "USE AdventureWorks2008", because that statement is executed against the database every time you run the batch. It also prints "Command(s) completed successfully.". However, the overhead is so small that if you are OK with the extra message you can simply ignore it.
Yes, there can be a difference.
When you execute a statement like SELECT * FROM AdventureWorks2008.dbo.Customers in the context of another database (not AdventureWorks2008), that other database's settings are applied.
First of all, every database has its own compatibility level, which can limit the code you can use; for example, you cannot use the APPLY operator in the context of a database with compatibility level 80, but you can in a database with compatibility level 90 or higher.
Second, every database has its own set of options, such as AUTO_UPDATE_STATISTICS_ASYNC and forced parameterization, that can affect your query plan.
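You can compare these settings across databases with a query like the following (a simple sketch; the database names are placeholders):

SELECT name,
       compatibility_level,
       is_auto_update_stats_async_on,
       is_parameterization_forced
FROM sys.databases
WHERE name IN ('AdventureWorks2008', 'SomeOtherDb');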
I have encountered cases where the database context influenced the plan:
The first case was a filtered index I created on one table. It was used in the plan while I executed my query in the context of a database with simple parameterization, but it was not used for the same query executed in the context of a database with forced parameterization. When I used a hint to force that index, I got an error saying the query plan could not be produced due to the query hint, so I had to investigate. I found that my query had been parameterized: instead of my condition fld = 0 the predicate was fld = @p, so the optimizer could not use my filtered index, whose definition requires the fld = 0 condition.
The second case was regarding table cardinality estimation: we use staging tables to load the data in our ETL procedures and then switch them to the actual tables like this:
insert into stg with(tablock);
...
truncate table actual;
alter table stg switch to actual;
All the staging tables are empty when the procedure compiles, but within the proc they are filled with data, so when we do joins between them they are no longer empty. Going from 0 rows to non-zero rows should trigger a statement recompilation that takes the actual number of rows into consideration, but that did not happen on the production server, so all the estimations were completely wrong (1 row for every table) and I had to investigate. The cause was AUTO_UPDATE_STATISTICS_ASYNC set to ON in the production database.
Now imagine you have two databases, db1 and db2, with this option set to ON and OFF respectively: in db1 this code will get the wrong estimates, while if you execute it in db2 (referencing db1.dbo.stg) it will get the right ones. The execution time can be very different in these two databases.
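If you want the two databases in that example to behave the same way, the simplest fix is to align the option (a sketch using the db1/db2 names from above):

ALTER DATABASE db1 SET AUTO_UPDATE_STATISTICS_ASYNC OFF;   -- make db1 match db2
-- or, going the other way:
-- ALTER DATABASE db2 SET AUTO_UPDATE_STATISTICS_ASYNC ON;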

Temp Tables in a Stored Procedure will it cause recompilation of execution plan

If I have temp tables being created in a stored procedure's definition and then drop them when I am done with them, will that result in recompilation of the execution plan
every time the stored procedure is called? Any personal experience?
Any explanation please?
Since the temp tables are dropped at the end of every call, the execution plan becomes invalid. Does SQL Server still keep hold of the execution plan and reuse it on the next call, or does it recompile it every time it's called?
Dropping a temporary table doesn't matter.
If a table is created (either permanent or temporary), all statements after that statement are recompiled (even if they don't refer to the table). Calls to executable objects using EXEC aren't recompiled. That's because SQL Server can only build the plans for those later statements once the object (in this case, the temp table) exists.
You can monitor recompilation using Extended Events (the sql_statement_recompile event) or SQL Trace / SQL Server Profiler (the SQL:StmtRecompile event). The sequence of events looks like this:
A statement starts to execute. SP:StmtStarting or SQL:StmtStarting is raised
The statement is recompiled. SQL:StmtRecompile is raised. SP:StmtStarting or SQL:StmtStarting is raised again
The statement is finished. SP:StmtCompleted or SQL:StmtCompleted is raised
The whole procedure is not recompiled; only individual statements are.
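A minimal Extended Events session for catching those recompiles might look like this (the session name is made up; adjust the target and add filters as needed):

CREATE EVENT SESSION TrackStmtRecompiles ON SERVER
ADD EVENT sqlserver.sql_statement_recompile
    (ACTION (sqlserver.sql_text, sqlserver.database_name))
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION TrackStmtRecompiles ON SERVER STATE = START;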
Generally speaking, any DDL in your stored procedure will result in recompilation, so if you use CREATE TABLE and DROP TABLE statements you are going to get recompilations.
This can be mitigated by putting the DDL as the first statements in the stored procedure (as sketched below), but you should test it and see the effect with your own eyes on your server.
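A sketch of that ordering, with made-up procedure and table names: all temp table DDL goes at the top, the work follows.

CREATE PROCEDURE dbo.usp_Demo
AS
BEGIN
    -- all DDL first, so the statements that follow are compiled after the tables exist
    CREATE TABLE #Work1 (Id INT PRIMARY KEY, Amount MONEY);
    CREATE TABLE #Work2 (Id INT PRIMARY KEY);

    INSERT INTO #Work1 (Id, Amount)
    SELECT Id, Amount
    FROM dbo.SourceTable;          -- hypothetical source

    -- ... remaining logic that reads from the temp tables ...

    DROP TABLE #Work1, #Work2;     -- optional; the tables go out of scope when the proc ends
END;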
If the data set you have to put in the temporary table is small and you don't need non-unique indexes, you should try to use table variables instead.
It's not a good idea to put too many rows in a table variable, because table variables don't have statistics; SQL Server always "thinks" they contain only one row, so the query plan could end up somewhat worse than the optimal one (but it does avoid the recompilations caused by temporary table creation).
Temp tables can cause recompilation simply because they are treated like regular tables by the SQL Server engine. When the tables that the underlying queries rely on
change significantly, SQL Server detects this change (via auto-update statistics) and marks the dependent queries for recompilation, so that the next execution can create an optimal execution plan.
Once the temp table, or the queries relying on the temp table, change, the query engine can no longer reuse the cached plan because it would not fit the query.
It should be noted that table variables inherently do not cause recompilation. In some situations these may be a better choice.
See http://sqlserverplanet.com/optimization/temp-table-recompiles for further information on temp table recompilation.

When should I use a table variable vs temporary table in sql server?

I'm learning more details about table variables. I've read that temp tables are always on disk and table variables are in memory; that is to say, the performance of a table variable is better than that of a temp table because a table variable uses fewer I/O operations.
But sometimes, if there are too many records in a table variable to be held in memory, the table variable will be put on disk like a temp table.
But I don't know what "too many records" means. 100,000 records? Or 1,000,000 records? How can I know whether a table variable I'm using is in memory or on disk? Is there any function or tool in SQL Server 2005 to measure the size of a table variable, or to let me know when the table variable has been moved from memory to disk?
Your question shows you have succumbed to some of the common misconceptions surrounding table variables and temporary tables.
I have written quite an extensive answer on the DBA site looking at the differences between the two object types. This also addresses your question about disk vs memory (I didn't see any significant difference in behaviour between the two).
Regarding the question in the title though as to when to use a table variable vs a local temporary table you don't always have a choice. In functions, for example, it is only possible to use a table variable and if you need to write to the table in a child scope then only a #temp table will do
(table-valued parameters allow readonly access).
Where you do have a choice some suggestions are below (though the most reliable method is to simply test both with your specific workload).
If you need an index that cannot be created on a table variable then you will of course need a #temporary table. The details of this are version-dependent, however. For SQL Server 2012 and below, the only indexes that could be created on table variables were those implicitly created through a UNIQUE or PRIMARY KEY constraint. SQL Server 2014 introduced inline index syntax for a subset of the options available in CREATE INDEX; this has since been extended to allow filtered index conditions. Indexes with INCLUDE-d columns or columnstore indexes are still not possible to create on table variables, however (a sketch of the inline syntax appears after these suggestions).
If you will be repeatedly adding and deleting large numbers of rows from the table then use a #temporary table. That supports TRUNCATE (which is more efficient than DELETE for large tables) and additionally subsequent inserts following a TRUNCATE can have better performance than those following a DELETE as illustrated here.
If you will be deleting or updating a large number of rows then the temp table may well perform much better than a table variable - if it is able to use rowset sharing (see "Effects of rowset sharing" below for an example).
If the optimal plan using the table will vary dependent on data then use a #temporary table. That supports creation of statistics which allows the plan to be dynamically recompiled according to the data (though for cached temporary tables in stored procedures the recompilation behaviour needs to be understood separately).
If the optimal plan for the query using the table is unlikely to ever change then you may consider a table variable to skip the overhead of statistics creation and recompiles (would possibly require hints to fix the plan you want).
If the source for the data inserted to the table is from a potentially expensive SELECT statement then consider that using a table variable will block the possibility of this using a parallel plan.
If you need the data in the table to survive a rollback of an outer user transaction then use a table variable. A possible use case for this might be logging the progress of different steps in a long SQL batch.
When using a #temp table within a user transaction locks can be held longer than for table variables (potentially until the end of transaction vs end of statement dependent on the type of lock and isolation level) and also it can prevent truncation of the tempdb transaction log until the user transaction ends. So this might favour the use of table variables.
Within stored routines, both table variables and temporary tables can be cached. The metadata maintenance for cached table variables is less than that for #temporary tables. Bob Ward points out in his tempdb presentation that this can cause additional contention on system tables under conditions of high concurrency. Additionally, when dealing with small quantities of data this can make a measurable difference to performance.
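As mentioned in the indexing point above, from SQL Server 2014 onwards a non-clustered index can be declared inline on a table variable; a small sketch (the names are arbitrary):

DECLARE @Orders TABLE
(
    OrderId    INT PRIMARY KEY,                        -- implicit unique clustered index
    CustomerId INT INDEX IX_Customer NONCLUSTERED,     -- inline index, SQL Server 2014+
    OrderDate  DATE
);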
Effects of rowset sharing
DECLARE #T TABLE(id INT PRIMARY KEY, Flag BIT);
CREATE TABLE #T (id INT PRIMARY KEY, Flag BIT);
INSERT INTO #T
output inserted.* into #T
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY ##SPID), 0
FROM master..spt_values v1, master..spt_values v2
SET STATISTICS TIME ON
/*CPU time = 7016 ms, elapsed time = 7860 ms.*/
UPDATE #T SET Flag=1;
/*CPU time = 6234 ms, elapsed time = 7236 ms.*/
DELETE FROM #T
/* CPU time = 828 ms, elapsed time = 1120 ms.*/
UPDATE #T SET Flag=1;
/*CPU time = 672 ms, elapsed time = 980 ms.*/
DELETE FROM #T
DROP TABLE #T
Use a table variable for a very small quantity of data (thousands of bytes)
Use a temporary table for a lot of data
Another way to think about it: if you think you might benefit from an index, automated statistics, or any SQL optimizer goodness, then your data set is probably too large for a table variable.
In my example, I just wanted to put about 20 rows into a format and modify them as a group, before using them to UPDATE / INSERT a permanent table. So a table variable is perfect.
But I am also running SQL to back-fill thousands of rows at a time, and I can definitely say that the temporary tables perform much better than table variables.
This is not unlike how CTEs are a concern for a similar size reason: if the data in the CTE is very small, I find a CTE performs as well as or better than what the optimizer comes up with, but if it is quite large then it hurts you badly.
My understanding is mostly based on http://www.developerfusion.com/article/84397/table-variables-v-temporary-tables-in-sql-server/, which has a lot more detail.
Microsoft says here
Table variables do not have distribution statistics and they will not trigger recompiles. Therefore, in many cases, the optimizer will build a query plan on the assumption that the table variable has no rows. For this reason, you should be cautious about using a table variable if you expect a larger number of rows (greater than 100). Temp tables may be a better solution in this case.
I totally agree with Abacus (sorry - don't have enough points to comment).
Also, keep in mind it doesn't necessarily come down to how many records you have, but the size of your records.
For instance, have you considered the performance difference between 1,000 records with 50 columns each vs 100,000 records with only 5 columns each?
Lastly, maybe you're querying/storing more data than you need? Here's a good read on SQL optimization strategies. Limit the amount of data you're pulling, especially if you're not using it all (some SQL programmers do get lazy and just select everything even though they only use a tiny subset). Don't forget the SQL query analyzer may also become your best friend.
A table variable is available only to the current batch and scope; for example, if you need to EXEC another stored procedure from the current one, you will have to pass the data as a table-valued parameter (and of course this can affect performance), whereas with a temporary table the called procedure can simply reference the temporary table by name.
To test a temporary table:
Open a Management Studio query editor
Create a temporary table
From the same session, run a nested batch or call a stored procedure that selects from the table
The temporary table is available
To test a table variable:
In the same query editor, declare a table variable
Run a nested batch or call a stored procedure that selects from it
The table variable is not available
(Note that a local #temp table is visible only to the session that created it and to its nested scopes; it is not visible from a different query editor window unless it is a global ##temp table.) A single-script version of this test is sketched below.
Something else I have experienced: if your schema doesn't have the privilege to create tables, use table variables.
Writing data into tables declared with DECLARE @tb and then joining them with other tables, I realized that the response time is much higher compared to temporary tables (tempdb..#tb).
When I join with @tb it takes much longer to return the result; with #tb the return is almost instantaneous.
I did tests with a 10,000-row join against 5 other tables.

SQL Server 2000 temp table vs table variable

What would be more efficient for storing some temp data (50k rows in one table and 50k in another) in order to perform some calculation? I'll be doing this process once, nightly.
How do you check the efficiency when comparing something like this?
The results will vary depending on which is the easier place to store the data: on disk (#temp table) or in memory (@table variable).
A few excerpts from the references below
A temporary table is created and populated on disk, in the system database tempdb.
A table variable is created in memory, and so performs slightly better than #temp tables (also because there is even less locking and logging in a table variable). A table variable might still perform I/O to tempdb (which is where the performance issues of #temp tables make themselves apparent), though the documentation is not very explicit about this.
Table variables result in fewer recompilations of a stored procedure as compared to temporary tables.
[Y]ou can create indexes on the temporary table to increase query performance.
Regarding your specific case with 50k rows:
As your data size gets larger, and/or the repeated use of the temporary data increases, you will find that the use of #temp tables makes more sense
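For a nightly 50k-row job like this, a pair of #temp tables keyed on the join column is a reasonable starting point (all names below are made up for illustration):

CREATE TABLE #Batch1 (Id INT PRIMARY KEY, Amount MONEY);
CREATE TABLE #Batch2 (Id INT PRIMARY KEY, Amount MONEY);

INSERT INTO #Batch1 (Id, Amount) SELECT Id, Amount FROM dbo.SourceA;   -- ~50k rows
INSERT INTO #Batch2 (Id, Amount) SELECT Id, Amount FROM dbo.SourceB;   -- ~50k rows

SELECT b1.Id, b1.Amount - b2.Amount AS Diff       -- the nightly calculation
FROM #Batch1 AS b1
JOIN #Batch2 AS b2 ON b2.Id = b1.Id;

DROP TABLE #Batch1, #Batch2;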
References:
Should I use a #temp table or a @table variable?
MSKB 305977 - SQL Server 2000 - Table Variables
There can be a big performance difference between using table variables and temporary tables. In most cases, temporary tables are faster than table variables. I took the following tip from the private SQL Server MVP newsgroup and received permission from Microsoft to share it with you. One MVP noticed that although queries using table variables didn't generate parallel query plans on a large SMP box, similar queries using temporary tables (local or global) and running under the same circumstances did generate parallel plans.
More from SQL Mag (subscription required unfortunately, I'll try and find more resources momentarily)
EDIT: Here is some more in depth information from CodeProject
