Query optimization - T-SQL - sql-server

We have 14M rows in a source table and want to know all the possible ways to insert data from the source table into the destination table without dropping indexes. Indexes exist on the destination table but not on the source table. In an SSIS package we tried a Data Flow task, a Lookup, and an Execute SQL task, but performance is still lacking. So kindly let me know the possible ways to speed up the insertion without dropping indexes. Thanks in advance

let me know the possible ways to speed up the insertion without dropping indexes
It really depends on the complete setup:
What does the actual query or table schema look like?
How many indexes are there?
Is it a one-time operation, or will the daily average be 14M rows?
General answer :
i) Run the insert operation during downtime or a very low-traffic hour.
ii) Use the TABLOCK hint. Under the SIMPLE or BULK_LOGGED recovery model this can also qualify the insert for minimal logging:
INSERT INTO TAB1 WITH (TABLOCK)
SELECT COL1, COL2, COL3
FROM TAB2
iii) You can consider disabling and rebuilding the indexes (note that disabling a clustered index makes the table inaccessible, so in practice this applies to nonclustered indexes):
ALTER INDEX ALL ON sales.customers DISABLE;
GO
--Insert query
ALTER INDEX ALL ON sales.customers REBUILD;
GO
iv) If the source server is a different one, you can first land the source data in a parking table on the destination server that has no indexes; another job then moves the rows from the parking table into the destination table, as sketched below. A transfer within the same server is relatively faster.
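A minimal sketch of the parking-table approach, assuming a linked server and hypothetical table/column names:
-- Parking table: a heap with no indexes, so the cross-server insert is cheap
CREATE TABLE dbo.Parking_TAB1
(
    COL1 int,
    COL2 varchar(50),
    COL3 datetime
);
-- Job 1: pull from the remote source into the local heap
INSERT INTO dbo.Parking_TAB1 WITH (TABLOCK)
SELECT COL1, COL2, COL3
FROM RemoteServer.SourceDb.dbo.TAB2;
-- Job 2: local, same-server move into the indexed destination
INSERT INTO dbo.TAB1 WITH (TABLOCK)
SELECT COL1, COL2, COL3
FROM dbo.Parking_TAB1;
TRUNCATE TABLE dbo.Parking_TAB1;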

Related

How to do incremental load in SQL Server

I have DB tables with no identity column. We have client data fetched from DB2 into SQL Server, and unfortunately the DB2 design doesn't have identity columns.
Now some data has been inserted, updated, and deleted at the source (DB2/SQL Server), and I want to load these changes into the destination (SQL Server) using some incremental-load concept.
I tried SSIS lookups in a Data Flow task, but it takes a huge amount of time simply to insert one new record. Please note that in the Lookup Transformation Editor I'm mapping all "available input columns" to all "available lookup columns", as there is no identity column; I think this is why it's taking so long. I have a few tables with around 20 million records.
Is there any faster method available to do this, especially when the table has no identity column? Would EXCEPT or SQL MERGE help?
I'm open to any approach other than SSIS.
The Lookup in SSIS takes some time, so you can use an Execute SQL task and call merge procedures instead.
What you can do is use a merge procedure: it updates the destination rows that match the source and inserts the ones that don't, like this:
MERGE destination AS d
USING (SELECT primarykey, col1, col2 FROM source) AS s
    ON s.primarykey = d.primarykey
WHEN MATCHED THEN
    UPDATE SET d.col1 = s.col1,
               d.col2 = s.col2
WHEN NOT MATCHED THEN
    INSERT (primarykey, col1, col2)
    VALUES (s.primarykey, s.col1, s.col2);
With the query above, new records are inserted into the destination and matched records are updated. If you also need to know which rows were inserted versus updated (the role the updatedrecord flag column plays), you can capture that with MERGE's OUTPUT clause, as sketched below.
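A minimal sketch of capturing MERGE's per-row action with OUTPUT (table and column names are hypothetical, matching the query above):
DECLARE @changes TABLE (action_taken nvarchar(10), primarykey int);
MERGE destination AS d
USING (SELECT primarykey, col1, col2 FROM source) AS s
    ON s.primarykey = d.primarykey
WHEN MATCHED THEN
    UPDATE SET d.col1 = s.col1, d.col2 = s.col2
WHEN NOT MATCHED THEN
    INSERT (primarykey, col1, col2)
    VALUES (s.primarykey, s.col1, s.col2)
OUTPUT $action, inserted.primarykey INTO @changes;
-- How many rows were inserted vs. updated in this run
SELECT action_taken, COUNT(*) AS row_count
FROM @changes
GROUP BY action_taken;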
You can go to the following links for more on merge procedures:
https://www.sqlservercentral.com/Forums/Topic1042053-392-1.aspx
https://msdn.microsoft.com/en-us/library/bb510625.aspx
If your source is a SQL query from DB2, for instance, try adding a new column to it: a checksum value computed over the columns you expect to change or want to monitor for changes.
SELECT
    BINARY_CHECKSUM(Column1, Column2, Column3) AS ChecksumValue,
    Column1,
    Column2,
    Column3
FROM #TEMP
You would have to add this to your existing table in SQL Server as well to be able to start comparing; a sketch follows below.
Once you have this, you can do the lookup on the checksum value rather than on the columns, since numeric lookups are a lot quicker than varchar comparisons over multiple columns. I am guessing that, since there is no key, you would then have to split the data between checksum matches (which should be unchanged existing records) and non-matches. The non-matches could be new rows or just updates, but the set you have to work with should be smaller.
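A minimal sketch of adding the checksum to the existing SQL Server table as a computed column (table and column names are hypothetical; note BINARY_CHECKSUM can collide, so treat a match as "probably unchanged" rather than proof of equality):
-- Computed column: recalculated automatically as the row changes
ALTER TABLE dbo.ExistingTable
ADD ChecksumValue AS BINARY_CHECKSUM(Column1, Column2, Column3);
-- Optional: index it so lookups/joins on the checksum are fast
CREATE INDEX IX_ExistingTable_Checksum
ON dbo.ExistingTable (ChecksumValue);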
Good luck. HTH

SQL Server - simple discard of duplicate keys/ rows when inserting

I'm feeding data into a SQL Server database, and 1 out of every 1000 records is a duplicate due to matters outside my control. It's an exact duplicate: the entire record, the unique identifier, everything.
I know this can be solved with an 'update' rather than an insert step, or perhaps with 'on error, update' instead of insert.
But is there a quick and easy way to make SQL Server ignore these duplicates? I haven't created an index/unique constraint yet, and if I do, I don't want a 'duplicate key' error breaking or interrupting the ETL/data flow process. I just want SQL Server to keep executing the insert query. Is there a way to do this?
Just add a WHERE NOT EXISTS to the statement you're executing. Note that INSERT ... VALUES cannot take a WHERE clause, so switch to INSERT ... SELECT:
INSERT INTO table (unique_identifier_column, other_column)
SELECT '123', 'blah'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE unique_identifier_column = '123')
Just to be clear for anyone else hitting this issue: for the best performance (accepting that the duplicate insert is silently discarded), one should define a primary key or unique index on the table with IGNORE_DUP_KEY = ON, as sketched below.
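A minimal sketch (table and column names are hypothetical): with IGNORE_DUP_KEY = ON, an insert that would violate the unique index raises only a warning, skips the duplicate row, and lets the rest of the batch continue.
CREATE UNIQUE INDEX IX_MyTable_UniqueId
ON dbo.MyTable (unique_identifier_column)
WITH (IGNORE_DUP_KEY = ON);
-- A duplicate here is now dropped with a warning instead of an error
INSERT INTO dbo.MyTable (unique_identifier_column, other_column)
VALUES ('123', 'blah');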
If you're looking for a duplicate record on every field, just use DISTINCT in your select:
INSERT INTO DestinationTable
SELECT DISTINCT *
FROM SourceTable
EDIT:
I misinterpreted your question. You're trying to find a low-impact way to prevent adding a record that already exists in your DestinationTable.
If you want your inserts to remain fast, one way to do it is to add an identity column to your table as the primary key. Let the duplicate records get added, then run a maintenance routine during downtime or slow periods that checks all records added since the last check and deletes any duplicates, as sketched below. Otherwise there is no easy way: you will have to check on every insert.
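A minimal sketch of such a cleanup routine (names are hypothetical): ROW_NUMBER() numbers exact copies within each unique identifier, and every copy beyond the first that arrived after the watermark is deleted.
DECLARE @last_checked_id int = 0;  -- watermark saved from the previous run
WITH numbered AS
(
    SELECT id,
           ROW_NUMBER() OVER
               (PARTITION BY unique_identifier_column ORDER BY id) AS rn
    FROM dbo.MyTable
)
DELETE FROM numbered
WHERE rn > 1
  AND id > @last_checked_id;  -- only touch rows added since the last run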

Disable auto creating statistics on table

Is there any possibility of disabling auto-creation of statistics on a specific table in a database, without disabling auto-creation of statistics for the entire database?
I have a procedure written as follows:
create proc ...
as
create table #someTempTable (/* many columns, more than 100 */)
insert into #someTempTable ... -- always one or two rows
exec proc1
exec proc2
-- etc.
proc1, proc2, etc. contain many selects and updates like this:
select ..
from #someTempTable t
join someOrdinaryTable t2 on ...
update #someTempTable set col1 = somevalue
Profiler shows that before each select the server starts collecting stats on #someTempTable, and this takes more than a quarter of the proc's entire execution time. The proc is used in OLTP processing and should run very fast. I want to change this temporary table to a table variable (because SQL Server doesn't collect stats for table variables), but I can't, because that would mean rewriting all of these procedures to pass variables between them, and all of this legacy code would have to be retested. I'm searching for an alternative way to force the server to treat the temporary table like a table variable as far as collecting stats goes.
P.S. I know that stats are a useful thing, but in this case they're useless because the table always contains a small number of records.
I assume you know what you are doing; disabling statistics is generally a bad idea. Anyhow:
EXEC sp_autostats 'table_name', 'OFF'
More documentation here: https://msdn.microsoft.com/en-us/library/ms188775.aspx.
Edit: OP clarified that he wants to disable statistics for a temp table. Try this:
CREATE TABLE #someTempTable
(
ID int PRIMARY KEY WITH (STATISTICS_NORECOMPUTE = ON),
...other columns...
)
If you don't have a primary key already, use an identity column for a PK.

Permanently sorting a table in SQL Server based on pre-existing data

I have made a table in SQL Server based on pre-existing data:
SELECT pre_existing_data
INTO new_table
FROM existing_table
I am trying to get the output to permanently sort by a particular field once the table is created. I thought this would be as simple as adding an ORDER BY clause at the end of the chunk of code that makes the table, but the data still won't sort properly.
There is no way to permanently sort a table in SQL.
You can create an index on the table and queries which use the index (in an ORDER BY clause) will be returned quicker, but the order the data is stored on the disk is not controllable.
You can create an index-organized table by using a CLUSTERED INDEX, which stores the data on disk ordered by the clustering key. If you then ORDER BY the clustering key in your query, the data should come out very fast, as sketched below. Note that you still have to use ORDER BY in your query no matter what.
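A minimal sketch (table and column names are hypothetical):
CREATE CLUSTERED INDEX CIX_new_table_sort_field
ON dbo.new_table (sort_field);
-- Order is still only guaranteed when the query asks for it
SELECT *
FROM dbo.new_table
ORDER BY sort_field;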
I made a new table in SQL Server from the pre-existing one:
insert into new_table
select * from old_table
ORDER BY col ASC; -- or DESC
After that, drop the old table and rename the new table to the old table's name (SQL Server uses sp_rename rather than RENAME):
drop table old_table;
exec sp_rename 'new_table', 'old_table';
Try this trick to sort the data in your table permanently.

SSIS – Locking in Target table when SCD is implemented using Lookup

To override the built-in SCD transformation in the SSIS data flow, I used checksum values of columns and a lookup. Below is the process.
I need to implement the SCD type 1 in Target_Fact_table.
Source Query
Select key, a, b, CHECKSUM(a, b) AS new_value from Source_table
In Lookup
Select key, CHECKSUM(a, b) AS Old_value from Target_Fact_table
If no match is found, the record is inserted; if a match is found, New_value and Old_value are compared, and the record is updated if anything has changed.
The first run doesn't have any issue. But the second time, when the source has more records to update and insert, the target table gets locked because of the concurrent bulk insert and update.
I tried removing the table lock from the OLE DB Destination task, but the locking is still there.
What can I do to avoid this locking, or can I put some small delay in the update transformation?
Your optimal solution is, instead of using an OLE DB Command to update the matched values, to insert your matched rows into a staging table on the destination and then run a single set-based UPDATE joining the fact table to the staging table to update all the new values, as sketched below.
This avoids the locking, increases throughput, and can provide a cleaner audit trail for changes.
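A minimal sketch of that set-based update, reusing the column names from the question; the staging table name is hypothetical:
UPDATE f
SET    f.a = s.a,
       f.b = s.b
FROM   Target_Fact_table AS f
JOIN   Staging_Changed_Rows AS s
    ON s.[key] = f.[key];
-- Clear the staging table for the next package run
TRUNCATE TABLE Staging_Changed_Rows;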
