I am trying to find out the differences in the way you can define which database to use in SSMS.
Is there any functional difference between selecting the database in the 'Available Databases' drop-down list,
naming the database in the query itself
SELECT * FROM AdventureWorks2008.dbo.Customers
and
stating the database at the start?
USE AdventureWorks2008
GO
SELECT * FROM dbo.Customers
I'm interested to know if there is a difference in terms of performance or something that happens behind the scenes for each case.
Thank you for your help
Yes, there is, but it is tiny. A "USE AdventureWorks2008" statement is executed against the server every time you run the script, which adds a very small overhead, and SSMS also prints "Command(s) completed successfully." for it. The overhead is negligible, so if that message does not bother you, there is nothing to worry about.
Yes, there can be a difference.
When you execute a statement such as SELECT * FROM AdventureWorks2008.dbo.Customers in the context of another database (not AdventureWorks2008), that other database's settings are applied.
First, every database has its own compatibility level, which can restrict the syntax you may use. For example, you cannot use the APPLY operator in the context of a database with the compatibility level set to 80, but you can in a database with compatibility level >= 90.
Second, every database has its own set of options, such as AUTO_UPDATE_STATISTICS_ASYNC and forced parameterization, that can affect your query plan.
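To see which settings are in play, you can compare the current database context with the per-database options. This is a minimal sketch, assuming the AdventureWorks2008 database from the question exists on the instance; the columns are standard sys.databases columns.

```sql
-- Which database is this session's current context?
SELECT DB_NAME() AS current_database;

-- Compare the options discussed above for the current database
-- and for the database named in the three-part identifier.
SELECT name,
       compatibility_level,
       is_parameterization_forced,     -- 1 = FORCED, 0 = SIMPLE
       is_auto_update_stats_async_on   -- 1 = AUTO_UPDATE_STATISTICS_ASYNC is ON
FROM sys.databases
WHERE name IN (DB_NAME(), N'AdventureWorks2008');
```

If the two rows differ, the same three-part query may compile differently depending on which database it is run from.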
I have encountered cases where the database context influenced the plan.
The first case: I created a filtered index on a table, and it was used in the plan as long as I executed my query in the context of a database with simple parameterization. The same query did not use it when executed in the context of a database with forced parameterization. When I forced the index with a hint, I got the error saying that a query plan could not be produced because of the query hint. Investigating, I found that my query had been parameterized: instead of my condition fld = 0 there was fld = @p, so it could not use my filtered index defined with the condition fld = 0.
The second case concerned table cardinality estimation. We use staging tables to load the data in our ETL procedures and then switch them to the actual tables like this:
insert into stg with(tablock);
...
truncate table actual;
alter table stg switch to actual;
All the staging tables are empty when the procedure compiles, but within the procedure they are filled with data, so by the time we join them they are no longer empty. Going from 0 rows to a non-zero row count triggers a statement recompilation that should take the actual number of rows into account. That did not happen on the production server, so all the estimates were completely wrong (1 row for every table), and I had to investigate. The cause was AUTO_UPDATE_STATISTICS_ASYNC set to ON in the production database.
Now imagine you have two databases, db1 and db2, with this option set to ON and OFF respectively. In db1 this code will get wrong estimates, while if you execute it in db2 using db1.dbo.stg it will get correct ones. The execution time will be very different in the two databases.
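If asynchronous statistics updates turn out to be the culprit, the option can be inspected and changed per database. This is a sketch only, using the db1 name from the example above. The answer does not say which remedy was actually chosen, and whether turning the option off is appropriate depends on the workload.

```sql
-- Check the current setting for db1 (1 = asynchronous updates enabled).
SELECT name, is_auto_update_stats_async_on
FROM sys.databases
WHERE name = N'db1';

-- One possible remedy (an assumption, not taken from the answer):
-- make statistics updates synchronous again for this database.
ALTER DATABASE db1 SET AUTO_UPDATE_STATISTICS_ASYNC OFF;
```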
Imagine this scenario in SQL Server 2016: we have two tables, A and B.
A is a memory-optimized table.
B is a normal table.
We join A and B without any problem, and 1000 rows are returned in minimal time.
But when we insert this result set into another table (a memory-optimized table, a normal table, or even a temp table), the insert takes 10 to 20 seconds.
Any ideas?
UPDATE : Execution plans for normal scenario and memory optimized table added
When a DML statement targets a Memory-Optimized table, the query cannot run in parallel, and the server will employ a serialized plan. So, your first statement runs in a single-core mode.
In the second instance, the DML statement leverages the fact that "SELECT INTO / FROM" is parallelizable. This behavior was added in SQL Server 2014. Thus, you get a parallel plan for that. Here is some information about this:
Reference: What's New (Database Engine) - SQL Server 2014
I have run into this problem countless times with Memory-Optimized targets. One solution I have found, if the I/O requirements are high on the retrieval, is to stage the result of the SELECT statement into a temporary table or other intermediate location, then insert from there into the Memory-Optimized table.
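The staging approach can be sketched roughly as follows. A and B are the tables from the question; dbo.TargetInMemory and the columns Id and Payload are hypothetical names chosen for illustration.

```sql
-- Step 1: materialize the expensive join into a temp table.
-- SELECT ... INTO is eligible for a parallel plan (SQL Server 2014 and later).
SELECT a.Id, b.Payload
INTO #stage
FROM dbo.A AS a
JOIN dbo.B AS b
    ON b.Id = a.Id;

-- Step 2: load the memory-optimized target from the staged rows.
-- This statement still runs serially, but it only has to read #stage.
INSERT INTO dbo.TargetInMemory (Id, Payload)   -- hypothetical target table
SELECT Id, Payload
FROM #stage;

DROP TABLE #stage;
```

Whether this is faster than the single-statement insert depends on the data volumes, so it is worth measuring both variants.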
Another issue is that, by default, statements that merely read from a Memory-Optimized table, even if that table is not the target of DML, are also run in serialized fashion. There is a hotfix for this, which you can enable with a query hint.
The hint is used like this:
OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'))
Reference: Update enables DML query plan to scan query memory-optimized tables in parallel in SQL Server 2016
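In context, the hint goes at the very end of the statement. This is a sketch with hypothetical names (dbo.Target, Id, Payload), assuming A is the memory-optimized table from the question and the target is a disk-based table. The USE HINT syntax itself needs SQL Server 2016 SP1 or later.

```sql
-- dbo.Target is a hypothetical disk-based table; dbo.A is memory-optimized.
INSERT INTO dbo.Target (Id, Payload)
SELECT a.Id, b.Payload
FROM dbo.A AS a
JOIN dbo.B AS b
    ON b.Id = a.Id
OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'));
```

Comparing the actual execution plans with and without the hint shows whether the scan of the memory-optimized table has gone parallel.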
In either case, any DML that has a memory-optimized table as a target is going to run on a single core. This is by design. If you need to exploit parallelism, you cannot do it if the Memory-Optimized table is the target of the statement. You will need to benchmark different approaches to find the one that performs best for your scenario.
I'm working on updating a legacy stored procedure (which calls several other child stored procedures). Within a transaction, it manipulates data in about a dozen tables and performs lots of calculations in the process, sometimes triggering lock escalation up to a table lock. This process can take 20 minutes or more to complete in some cases. Obviously, locking tables for that long is a big no-no. So I'm working on a two-phase plan: reduce the blocking caused by this sproc in phase 1, and completely rewrite it to be more efficient and not take an inordinate amount of time in phase 2.
In order to reduce the blocking, wherever there is manipulation on the database tables, I plan to move that manipulation into a temporary table. By doing all of the work in temporary table and then updating the real tables with the final results at the very end of the process, I should be able to reduce the time spent blocking other users, significantly. (That's the "quick fix" for phase 1.)
Here's my issue: some of these temp tables might have 100,000 rows or more in them while I use them for various calculations. Because of this I would like to create indexes on the temp tables to keep performance up. And since these are temp tables that are created within a stored procedure, they need to have unique names to avoid errors if multiple users execute the sproc at the same time. I know that I can manually declare the temp tables using CREATE TABLE statements, and if I do that I can specify an index without a name and let SQL Server create the name for me. What I'm hoping to be able to do is use SELECT * INTO to generate the temp table and find another way to get SQL Server to auto-generate index names. I'm sure you're asking "Why?" My company has several changes in store for the system that I'm working with. If I can manage to use the SELECT INTO method, then when a column gets added or resized or whatever, the developers won't need to know that they have to go back into these stored procedures and change their temp table definitions to match. Using SELECT INTO will automatically keep the temp tables matching the layout of the "real" tables.
So, does anyone know of a way to get SQL Server to auto-generate the name for an index on a temp table (aside from doing it as part of the CREATE TABLE syntax)?
Thank you!
"And since these are temp tables that are created within a stored procedure, they need to have unique names to avoid errors if multiple users execute the sproc at the same time."
No, they don't. Each session gets its own temp tables, and they are cleaned up automatically.
Index names are not global in scope either (they only have to be unique within their table), so each session's temp table can use the same index name, e.g.
create procedure TempTest
as
begin
    select * into #t from sys.objects;
    create index foo on #t(name);
    waitfor delay '00:00:10';
    select * from #t;
end
And you can run
exec temptest
go 10
from multiple sessions.
I am creating a Java function that needs to use a SQL query with a lot of joins before doing a full scan of its result. Instead of hard-coding a lot of joins I decided to create a view with this complex query. Then the Java function just uses the following query to get this result:
SELECT * FROM VW_####
The program works fine, but I want to make it faster, since this SELECT takes a lot of time. After looking at its execution plan I created some indexes, which made it roughly 30% faster, but I want to make it faster still.
The problem is that every operation in the execution plan has a cost between 0% and 4% except one: a clustered index insert that accounts for roughly 50% of the execution cost. I think the system is using a temporary table to store the view's data, but an index on this view isn't useful to me because I need all of its rows.
So what can I do to optimize that insert into CWT_PrimaryKey? I don't think I can turn that index off, because it seems to be part of SQL Server's internals. I read somewhere that this operation can appear when you use cursors, but I don't think I am using one (or does the view use one?).
The command to create the view is something simple (no T-SQL, no OPTION, etc) like:
create view VW_#### as SELECTS AND JOINS HERE
And here is a picture of the problematic part from the execution plan: http://imgur.com/PO0ZnBU
EDIT: More details:
The query that creates the problematic view is a big one that joins a lot of tables. Based on a single parameter, the Java client modifies the query string before creating the view. This view represents a "data unit" from a legacy database migrated to SQL Server that didn't have any foreign or primary keys, so our team chose to follow this strategy. Because of that, the view has more than 50 columns and is built by joining seven other views.
Main view's query (with a lot of Portuguese words): http://pastebin.com/Jh5vQxzA
The other views (VW_Sintese1 through VW_Sintese7) are created like this one but without using extra views; they just join the tables that contain the data requested by the main view.
Then the Java client creates a PreparedStatement with the query "Select * from VW_Sintese####" and executes it using executeQuery, something like:
String query = "Select * from VW_Sintese####";
PreparedStatement ps = myConn.prepareStatement(query,ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
ResultSet rs = ps.executeQuery();
And then the program goes on until the end.
Thanks for the attention.
First: you should post the code of the view along with whatever is using it, for the reasons given in the rest of this answer.
Second: SQL Server substitutes the definition of a view into the query that references it. In other words, you created a view, but since (I'm assuming) it isn't an indexed view, querying it is the same as writing the original, long SELECT statement. SQL Server essentially swaps the definition into the statement.
From Microsoft's 'Querying Microsoft SQL Server 2012': T-SQL supports the following table expressions: derived tables, common table expressions (CTEs), views, inline table-valued functions.
And a direct quote:
"It’s important to note that, from a performance standpoint, when SQL Server optimizes queries involving table expressions, it first unnests the table expression’s logic, and therefore interacts with the underlying tables directly. It does not somehow persist the table expression’s result in an internal work table and then interact with that work table. This means that table expressions don’t have a performance side to them: neither good nor bad, just no side."
This is a long way of reinforcing the first statement: please include the SQL code in the view and what you're actually using as the SELECT statement. Otherwise, we can't help much :) Cheers!
Edit: Okay, so you've created a view (no performance gain there) that does 4-5 LEFT JOINs onto the main view (again, you're not helping yourself much here, since nothing eliminates rows). If there are search arguments you can use to filter the result set down to fewer rows, you should add them here. Lastly, you're ordering all of this at the top level, so the query engine has to expand those views, join them into one massive SELECT statement, and work out the correct order. I'm guessing here, but the result count is probably HUGE and SQL Server's engine is sorting it in some kind of temporary table.
The short answer: get less data (fewer columns and only the rows you need); don't order the results if the resultset is very large, just get the data to the client and then sort it there.
Again, if you want more help, you'll need to post table schemas and index strategies for all tables that are in the query (including the views that are joined) and you'll need to include all view definitions (including the views that are joined).
Every day a company drops a text file with potentially many records (350,000) onto our secure FTP. We've created a Windows service that runs early in the morning to read the text file into our SQL Server 2005 DB tables. We don't do a BULK INSERT because the data is relational and we need to check it against what's already in our DB to make sure the data remains normalized and consistent.
The problem is that the service can take a very long time (hours). It is inserting into and updating tables that constantly need to be queried and scanned by our application, so it can degrade the performance of both the DB and the application.
One solution we've thought of is to run the service on a separate DB with the same tables as our live DB. When the service is finished we can do a BCP into the live DB so it mirrors all of the new records created by the service.
I've never worked with handling millions of records in a DB before and I'm not sure what a standard approach to something like this is. Is this an appropriate way of doing this sort of thing? Any suggestions?
One mechanism I've seen is to insert the values into a temporary table with the same schema as the target table. NULL IDs signify new records and populated IDs signify updated records. Then use the SQL MERGE command to merge it into the main table. MERGE will perform better than individual inserts/updates.
Doing it row by row, you incur index maintenance on the table for every statement, which can be costly if it's tuned for selects. I believe that with MERGE it's a single bulk action.
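The MERGE idea can be sketched as below. Note that MERGE requires SQL Server 2008 or later. The names #Staging, dbo.MyTable, Id and val1..val3 are hypothetical, and Id is assumed to be an identity column on the target, so staged rows with a NULL Id never match and are inserted.

```sql
MERGE dbo.MyTable AS t
USING #Staging AS s
    ON t.Id = s.Id                 -- a NULL s.Id never matches, so the row counts as new
WHEN MATCHED THEN
    UPDATE SET t.val1 = s.val1,
               t.val2 = s.val2,
               t.val3 = s.val3
WHEN NOT MATCHED BY TARGET THEN
    INSERT (val1, val2, val3)
    VALUES (s.val1, s.val2, s.val3);   -- MERGE must end with a semicolon
```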
It's touched upon here:
What's a good alternative to firing a stored procedure 368 times to update the database?
There are MSDN articles about SQL merging, so Googling will help you there.
Update: it turns out you cannot use MERGE on SQL Server 2005 (it was introduced in SQL Server 2008). Your idea of having another database is usually handled by SQL replication. I've seen a copy of the current database used in production to perform a long-running action (reporting and aggregation of data in that instance); however, it wasn't merged back in. I don't know what merging capabilities are available in SQL replication, but it would be a good place to look.
Either that, or resolve the reason why you cannot bulk insert/update.
Update 2: as mentioned in the comments, you could stick with the temporary table idea to get the data into the database, and then run INSERT/UPDATE statements that join onto this table to populate your main table. The difference is that SQL Server is now working with a set, so it can handle the index maintenance accordingly; this should be faster, even with the joining.
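The update half of that approach might look like the following on SQL Server 2005. This is a sketch under assumptions: #Staging and dbo.MyTable are hypothetical names, and (val1, val2) is assumed to be the natural key that identifies an existing row.

```sql
-- Set-based update of rows that already exist in the main table.
UPDATE t
SET    t.val3 = s.val3
FROM   dbo.MyTable AS t
JOIN   #Staging    AS s
    ON  s.val1 = t.val1
    AND s.val2 = t.val2
WHERE  t.val3 <> s.val3;   -- skip rows that would not change (assumes val3 is not nullable)
```

Rows that have no match in the main table are then added with a separate INSERT ... SELECT over the same staging table.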
Update 3: you could possibly remove the data checking from the insert process and move it to the service. If you can stop inserts into your table while this happens, then this will allow you to solve the issue stopping you from bulk inserting (ie, you are checking for duplicates based on column values, as you don't yet have the luxury of an ID). Alternatively with the temporary table idea, you can add a WHERE condition to first see if the row exists in the database, something like:
INSERT INTO MyTable (val1, val2, val3)
SELECT s.val1, s.val2, s.val3
FROM #Tempo AS s
WHERE NOT EXISTS
(
    SELECT *
    FROM MyTable AS t
    WHERE t.val1 = s.val1 AND t.val2 = s.val2 AND t.val3 = s.val3
)
We do much larger imports than that all the time. Create an SSIS package to do the work. Personally, I prefer to create a staging table, clean it up, and then do the update or import. But SSIS can do all the cleaning in memory before inserting, if you want.
Before you start mirroring and replicating data, which is complicated and expensive, it would be worthwhile to check your existing service to make sure it is performing efficiently.
Maybe there are table scans you can get rid of by adding an index, or lookup queries you can get rid of by doing smart error handling? Analyze your execution plans for the queries that your service performs and optimize those.
I am moving a system from a VB/Access app to SQL server. One common thing in the access database is the use of tables to hold data that is being calculated and then using that data for a report.
eg.
delete from treporttable
insert into treporttable (.... this thing and that thing)
Update treporttable set x = x * price where (...etc)
and then report runs from treporttable
I have heard that SQL Server does not like it when all records in a table are deleted, as this creates huge logs etc. I tried SQL temp tables, but they don't persist long enough for the report, which runs in a different process, to read from them.
There are a number of places where this is done to different report tables in the application. The reports can be run many times a day and have a large number of records created in the report tables.
Can anyone tell me if there is a best practise for this or if my information about the logs is incorrect and this code will be fine in SQL server.
If you do not need to log the deletion activity you can use the truncate table command.
From books online:
"TRUNCATE TABLE is functionally identical to DELETE statement with no WHERE clause: both remove all rows in the table. But TRUNCATE TABLE is faster and uses fewer system and transaction log resources than DELETE."
http://msdn.microsoft.com/en-us/library/aa260621(SQL.80).aspx
delete from sometable
is going to allow you to roll back the change. So if your table is very large, this can take a lot of memory and time.
However, if you have no fear of failure then:
truncate table sometable
Will perform nearly instantly, and with minimal memory requirements. There is no rollback though.
To Nathan Feger:
You can rollback from TRUNCATE. See for yourself:
CREATE TABLE dbo.Test(i INT);
GO
INSERT dbo.Test(i) SELECT 1;
GO
BEGIN TRAN
TRUNCATE TABLE dbo.Test;
SELECT i FROM dbo.Test;
ROLLBACK
GO
SELECT i FROM dbo.Test;
GO
i
(0 row(s) affected)
i
1
(1 row(s) affected)
You could also DROP the table, and recreate it...if there are no relationships.
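Note that in SQL Server both [DROP TABLE] and [TRUNCATE TABLE] are transactional and can be rolled back inside an explicit transaction. The practical difference is that TRUNCATE keeps the table definition, indexes and permissions, whereas DROP removes them.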
So it depends on your schema which direction you want to go!!
Also, use SQL Profiler to analyze your execution times. Test it out and see which is best!!
The answer depends on the recovery model of your database. If you are using the full recovery model, the transaction log can become very large when you delete a lot of data. However, if you back up the transaction log on a regular basis to free the space, this might not be a concern for you.
Generally speaking, if the transaction logging doesn't matter to you at all, you should TRUNCATE the table instead. Be mindful, though, of any identity columns, because TRUNCATE resets the identity seed of the table.
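The reseeding behaviour is easy to see on a throwaway table. This is a minimal sketch; dbo.SeedDemo is a hypothetical name.

```sql
CREATE TABLE dbo.SeedDemo (id int IDENTITY(1,1), x int);

INSERT dbo.SeedDemo (x) VALUES (10);   -- id = 1
INSERT dbo.SeedDemo (x) VALUES (20);   -- id = 2

DELETE FROM dbo.SeedDemo;              -- rows are gone, identity counter is kept
INSERT dbo.SeedDemo (x) VALUES (30);   -- id = 3

TRUNCATE TABLE dbo.SeedDemo;           -- rows are gone, identity counter is reset
INSERT dbo.SeedDemo (x) VALUES (40);   -- id = 1 again

SELECT id, x FROM dbo.SeedDemo;

DROP TABLE dbo.SeedDemo;
```

If reports or other tables store the old id values, this restart at the seed value is what can cause surprises after switching from DELETE to TRUNCATE.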
EDIT: Note that even if the recovery model is set to Simple, your transaction logs will grow during a mass delete. The transaction logs will just be cleared afterward (without releasing the space). The idea is that DELETE will create a transaction even temporarily.
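Consider using temporary tables. Their names start with # and they are dropped automatically when the session that created them ends (or, if they were created inside a stored procedure, when the procedure finishes). Example:
create table #myreport (
    id int identity(1,1),
    col1 int,  -- example column; the type is an assumption
    ...
)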
Temporary tables are made to be thrown away, and that happens very efficiently.
Another option is using TRUNCATE TABLE instead of DELETE. TRUNCATE logs only the page deallocations rather than every deleted row, so it adds very little to the log file.
I think your example has a possible concurrency issue. What if multiple processes are using the table at the same time? Adding a JOB_ID column or something similar would allow you to clear the relevant entries in this table without clobbering the data being used by another process.
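A rough sketch of the JOB_ID idea follows. It assumes treporttable has been given an extra job_id column, and that dbo.SourceData and the columns x and price stand in for the real source and report columns; all of these are hypothetical.

```sql
-- One identifier per report run; pass it to the reporting process as well.
DECLARE @job_id uniqueidentifier;
SET @job_id = NEWID();

-- Populate only this run's rows.
INSERT INTO treporttable (job_id, x, price)
SELECT @job_id, s.x, s.price
FROM dbo.SourceData AS s;              -- hypothetical source

UPDATE treporttable
SET    x = x * price
WHERE  job_id = @job_id;

-- The report filters on the same identifier ...
SELECT x, price
FROM treporttable
WHERE job_id = @job_id;

-- ... and cleanup removes only this run's rows, not everyone else's.
DELETE FROM treporttable
WHERE job_id = @job_id;
```

An index on job_id keeps the filtered reads and deletes cheap when several runs share the table.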
Actually, tables such as treporttable do not need to be recoverable to a point in time. As such, they can live in a separate database that uses the simple recovery model. That eases the burden of logging.
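Setting up such a database takes only a couple of statements. This is a sketch; ReportWork is a hypothetical database name, and file locations and sizes are left at their defaults.

```sql
CREATE DATABASE ReportWork;            -- hypothetical name
GO
ALTER DATABASE ReportWork SET RECOVERY SIMPLE;
GO
-- Confirm the setting.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'ReportWork';
```

The report tables can then be referenced from the main database with three-part names such as ReportWork.dbo.treporttable.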
There are a number of ways to handle this. First, you can move the creation of the data into the running of the report itself. I feel this is the best way to handle it: you can then use temp tables to stage your data temporarily, and no one will have concurrency issues if multiple people try to run the report at the same time. Depending on how many reports we are talking about, this could take some time, so you may need another short-term solution as well.
Second, you could move all your reporting tables to a different database that is set to the simple recovery model, and truncate them before running the queries that populate them. This is closest to your current process, but it could be an issue if multiple users try to run the same report.
Third, you could set up a job to populate the tables (still in a separate database set to simple recovery) once a day, truncating at that time. Then anyone running a report that day will see the same data and there will be no concurrency issues. However, the data will not be up to the minute. You could also set up a reporting data warehouse, but that is probably overkill in your case.