SqlBulkCopy is slow, doesn't utilize full network speed - sql-server

For the past couple of weeks I have been creating a generic script that can copy databases. The goal is to be able to specify any database on some server and copy it to some other location, copying only the specified content. The exact content to be copied over is specified in a configuration file. This script is going to be used on some 10 different databases and run weekly. In the end we are copying only about 3%-20% of databases which are as large as 500 GB. I have been using the SMO assemblies to achieve this. This is my first time working with SMO and it took a while to create a generic way to copy the schema objects, filegroups, etc. (it actually helped find some bad stored procs).
Overall I have a working script which is lacking in performance (and at times, times out), and I was hoping you guys would be able to help. When executing the WriteToServer command to copy a large amount of data (> 6 GB), it reaches my timeout period of 1 hour. Here is the core code for copying table data. The script is written in PowerShell.
$query = ("SELECT * FROM $selectedTable " + $global:selectiveTables.Get_Item($selectedTable)).Trim()
Write-LogOutput "Copying $selectedTable : '$query'"
$cmd = New-Object Data.SqlClient.SqlCommand -argumentList $query, $source
$cmd.CommandTimeout = 120;
$bulkData = ([Data.SqlClient.SqlBulkCopy]$destination)
$bulkData.DestinationTableName = $selectedTable;
$bulkData.BulkCopyTimeout = $global:tableCopyDataTimeout # = 3600
$reader = $cmd.ExecuteReader();
$bulkData.WriteToServer($reader); # Takes forever here on large tables
The source and target databases are located on different servers, so I kept track of the network speed as well. The network utilization never went over 1%, which was quite surprising to me. But when I just transfer some large files between the servers, the network utilization spikes up to 10%. I have tried setting $bulkData.BatchSize to 5000, but nothing really changed. Increasing the BulkCopyTimeout to an even greater amount would only avoid the timeout, not the underlying slowness. I really would like to know why the network is not being used fully.
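As a side note, the progress of the copy can be observed with SqlBulkCopy's NotifyAfter property and SqlRowsCopied event; here is a rough sketch based on the snippet above, useful for confirming whether rows flow steadily or stall:
$bulkData.NotifyAfter = 10000                 # raise a notification every 10,000 rows
$bulkData.add_SqlRowsCopied({
    param($s, $e)
    Write-LogOutput "  $($e.RowsCopied) rows copied so far"
})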
Anyone else had this problem? Any suggestions on networking or bulk copy will be appreciated. And please let me know if you need more information.
Thanks.
UPDATE
I have tweaked several options that increase the performance of SqlBulkCopy, such as setting the recovery model to simple and having SqlBulkCopy take a table lock instead of the default row locks. Also, some tables are better optimized for certain batch sizes. Overall, the duration of the copy was decreased by some 15%. What we will also do is execute the copy of each database simultaneously on different servers. But I am still having a timeout issue when copying one of the databases.
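For reference, the relevant part of that tweak looks roughly like this (assuming, as in the snippet above, that $destination is the destination connection string; if it is an open SqlConnection, the SqlBulkCopy constructor that also takes a transaction would be needed instead):
# Take a table lock for the duration of the load instead of row locks, and stream in batches
$options  = [Data.SqlClient.SqlBulkCopyOptions]::TableLock
$bulkData = New-Object Data.SqlClient.SqlBulkCopy -ArgumentList $destination, $options
$bulkData.DestinationTableName = $selectedTable
$bulkData.BatchSize            = 5000                          # tuned per table
$bulkData.BulkCopyTimeout      = $global:tableCopyDataTimeout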
When copying one of the larger databases, there is a table for which I consistently get the following exception:
System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
It is thrown about 16 minutes after it starts copying the table, which is nowhere near my BulkCopyTimeout. Even though I get the exception, that table is fully copied in the end. Also, if I truncate that table and restart my process for that table only, the table is copied over without any issues. But going through the process of copying that entire database always fails for that one table.
I have tried executing the entire process and resetting the connection before copying that faulty table, but it still errored out. My SqlBulkCopy and reader are closed after each table. Any suggestions as to what else could be causing the script to fail at that point each time?
CREATE TABLE [dbo].[badTable](
[someGUID] [uniqueidentifier] NOT NULL,
[xxx] [uniqueidentifier] NULL,
[xxx] [int] NULL,
[xxx] [tinyint] NOT NULL,
[xxx] [datetime] NOT NULL,
[xxx] [datetime] NOT NULL,
[xxx] [datetime] NOT NULL,
[xxx] [datetime] NULL,
[xxx] [uniqueidentifier] NOT NULL,
[xxx] [uniqueidentifier] NULL,
CONSTRAINT [PK_badTable] PRIMARY KEY NONCLUSTERED
(
[someGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
No indexes exist for this table on the target DB.

Have you considered removing indexes, doing the insert, and then reindexing?
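For example, a sketch of that approach using badTable from the DDL above (only relevant if the PK index does exist on the target; disabling a clustered index would make the table unreadable, which is not a concern here since PK_badTable is nonclustered):
-- Disable the nonclustered PK index on the target before the bulk load
ALTER INDEX [PK_badTable] ON [dbo].[badTable] DISABLE;
-- ... run SqlBulkCopy for this table ...
-- Rebuild afterwards to re-enable it (the rebuild fails if duplicate keys were loaded)
ALTER INDEX [PK_badTable] ON [dbo].[badTable] REBUILD;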

I've used a dataset and wonder if this would be faster:
$ds = New-Object System.Data.DataSet
$da = New-Object System.Data.SqlClient.SqlDataAdapter($cmd)
[void]$da.Fill($ds)
$bulkData.WriteToServer($ds.Tables[0])   # note: Fill loads the entire result set into memory first

SqlBulkCopy is by far the fastest way of copying data into SQL tables.
You should be getting speeds in excess of 10,000 rows per second.
In order to test the bulk copy functionality, try DBSourceTools (http://dbsourcetools.codeplex.com).
This utility is designed to script databases to disk, and then re-create them on a target server.
When copying data, DBSourceTools will first export all data to a local .xml file, and then do a bulk copy to the target database.
This will help to further identify where your bottleneck is, by breaking the process up into two passes: one for reading and one for writing.

Related

Shrink DB by replacing data with NULL?

I have an Azure SQL DB that stores photos in varchar(MAX) columns; the photos are uploaded from a PowerApp during a sign-in/out process. The DB is growing quickly. I don't need the old pictures, but I want to keep the old records for the in/out times. I thought I could shrink the DB, or at least make the space reusable, by replacing the photo data with NULL. However, in testing it seems the DB is growing, not shrinking, when I do this. I can confirm the fields are becoming NULL, but the DB keeps growing as I do this.
This is what I ran:
UPDATE [dbo].[Daily Activity Attendance]
SET Signature = NULL, Photo = NULL, SigninSig = NULL, SigninPhoto = NULL
WHERE [AttendanceDate] < '2019-03-20'
Is this just a bad idea or am I doing something wrong?
To compact your LOB pages you should run an ALTER INDEX ... REORGANIZE command and include the option WITH (LOB_COMPACTION = ON).
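For example, something along these lines against the table from the question (assuming it has at least one index; adjust names to match your schema):
-- Reorganize all indexes on the table and compact the LOB pages freed by the NULL updates
ALTER INDEX ALL ON [dbo].[Daily Activity Attendance]
    REORGANIZE WITH (LOB_COMPACTION = ON);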
I'm not sure exactly what is causing the growth. It may be worth monitoring the undocumented DMF sys.dm_db_database_page_allocations to see what is happening to your rows as you change the values to NULL:
SELECT *
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID(N'<your table name>'),1,1,'limited');

How to propagate Always encrypted column encryption to Test, Acceptance and Production?

We are using Always encrypted in a .Net core application.
The Sql Server database is maintained with EF-core migrations.
I was wondering how to propagate the column encryption for the selected columns over to Test, Acceptance and Production. I didn't find any information on this.
It would be nice if this were taken care of in migrations, so that automatic deployment would include newly added columns being encrypted immediately. As I understand it, encrypting existing data is not straightforward. So, I would rather not have time elapse between running the migration and enabling encryption on the columns, or at least no application up-time during that window.
Can key names be reused, with different keys on each client, and is that safe? If so, I think that using migrationBuilder.Sql() might help me here.
So far, however, I did not succeed.
Maybe, this should not be done this way at all.
CREATE TABLE [dbo].Encrypt(
[id] [int],
[sensitive] [nvarchar](max)
)
followed by
ALTER TABLE Encrypt ALTER COLUMN [sensitive] [nvarchar](max)
COLLATE Latin1_General_BIN2
ENCRYPTED WITH (
COLUMN_ENCRYPTION_KEY = [CEK_Auto1],
ENCRYPTION_TYPE = Deterministic,
ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
)
gives an error:
Cannot alter column 'sensitive'. The statement attempts to encrypt, decrypt or re-encrypt the column in-place using a secure enclave, but the current and/or the target column encryption key for the column is not enclave-enabled.
column and one or more of the following column properties: collation (to a different code page), data type. Such changes cannot be combined in a single statement. Try using multiple statements.
when run from SQL Server Management Studio.
What is the way to go here?
If it amounts to down-time between running a migration and running a SQL-script, so be it.

DBCC SHRINKFILE EMPTYFILE stuck because of the sysfiles1 table

I'm trying to migrate my pretty big DB on SQL Server 2008 from one drive to another with minimum downtime, and I'm having some issues.
So, basically, my plan is to use DBCC SHRINKFILE ('filename', EMPTYFILE) for the extent movement.
After some period of time, I shrink the file to avoid some space problems with the log shipping DBs on the other server.
A huge number of extents were moved successfully, but then I got this error:
DBCC SHRINKFILE: System table SYSFILES1 Page 1:21459450 could not be moved to other files because it only can reside in the primary file of the database.
Msg 2555, Level 16, State 1, Line 3
Cannot move all contents of file "filename" to other places to complete the emptyfile operation.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
So, here is what I've tried already:
manually making my DB bigger by adding empty space (just making the file bigger by altering the database)
working a little bit with the files in the SECONDARY filegroup
working with the DB after a full/transaction log backup
None of this worked.
Can someone help me to fix this?
Thanks a lot.
As the error message states, there are some things that need to reside in the PRIMARY filegroup. Use the information in sys.allocation_units to find out what user (as opposed to system) objects are still in PRIMARY and move them with create index … with (drop_existing = on) on OTHER_FILEGROUP. Once you've moved all of your objects, you should be able to shrink the file down to as small as it can possibly be. Your final step will be to incur the downtime to move the primary file (in this case, minimal downtime does not mean "no downtime"). Luckily, what actually needs to reside in PRIMARY isn't very much data, so the downtime should be small. But you'll have a good idea once you get everything out of there.
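A sketch of the kind of query that implies, plus the move itself (object and filegroup names are placeholders):
-- List user objects that still have allocations in the PRIMARY filegroup
SELECT o.name AS object_name,
       i.name AS index_name,
       SUM(au.total_pages) * 8 / 1024 AS size_mb
FROM sys.allocation_units AS au
JOIN sys.partitions AS p
    ON au.container_id = p.hobt_id        -- IN_ROW_DATA / ROW_OVERFLOW_DATA
    OR au.container_id = p.partition_id   -- LOB_DATA
JOIN sys.objects AS o ON o.object_id = p.object_id
JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.data_spaces AS ds ON ds.data_space_id = au.data_space_id
WHERE ds.name = 'PRIMARY'
  AND o.is_ms_shipped = 0
GROUP BY o.name, i.name
ORDER BY size_mb DESC;

-- Then, for each index found (hypothetical names):
-- CREATE UNIQUE CLUSTERED INDEX [PK_SomeTable] ON dbo.SomeTable (Id)
--     WITH (DROP_EXISTING = ON) ON [OTHER_FILEGROUP];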
While you're at it, set the default filegroup for your database to something other than primary to avoid putting user objects there in the future.
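Something along these lines (database and filegroup names are placeholders):
ALTER DATABASE [YourDatabase] MODIFY FILEGROUP [OTHER_FILEGROUP] DEFAULT;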

How can I fix this Access 2003 bug? Data Entry Auto-Generating a Value

I'm experiencing an odd data entry bug in MS Access and I am hoping that someone can possibly help shed a bit of light on why this might be happening and how to fix it.
I have a data table that is defined in our SQL Server database. The definition is below, with only the field names changed.
CREATE TABLE [dbo].[MyTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[TextField1] [nvarchar](10) NOT NULL,
[TextField2] [nvarchar](50) NOT NULL,
[Integer1] [int] NOT NULL CONSTRAINT [DF_MyTable_Integer1] DEFAULT (0),
[Integer2] [int] NOT NULL,
[LargerTextField] [nvarchar](300) NULL
) ON [PRIMARY]
As you can see from the definition, there is nothing special about this table. The problem that I am having is with a linked table in an MS Access 2003 database that connects through ODBC to this table.
After defining and creating the table in SQL Server, I opened my working Access database and linked to the new table. I need to manually create the records that belong in this table. However, when I started to add the data rows, I noticed that as I tabbed out of the LargerTextField to a new row, the LargerTextField was being defaulted to '2', even though I had not entered anything nor defined a default value on the field?!
Initially, I need this field to be Null. I'll come back later and populate the data with an update routine. But why would MS Access default a value into my field, even though the schema for the table clearly does not define one? Has anyone seen this or have any clue why this may happen?
EDIT
One quick correction: as soon as I tab into the LargerTextField, the value defaults to '2', not when I tab out. Small, subtle difference, but possibly important.
As a test, I also created a new, fresh MS Access database and linked the table. I'm having the exact same problem. I assume this could be a problem with either MS SQL Server or, possibly, ODBC.
Wow, problem solved. This isn't a bug but it was certainly not behavior I desire or expected.
This behavior is occurring because of the data I am manually entering in the Integer1 and Integer2 fields. I am manually entering 0 as the value of Integer1 and 1 into Integer2. I've never seen Access automatically fill in my data inputs before, but it looks like it's recognizing data that is entered sequentially.
As a test, I entered a record with Integer1 set to 1 and Integer2 set to 2. Sure enough, when I tabbed into LargerTextField, the value 3 was auto-populated.
I hate that this problem was down to user ignorance but, I'll be honest, in my past 10+ years of using MS Access I cannot recall seeing this behavior even once. I would almost prefer to delete this question to save face, but since it caught me off guard and I'm an experienced user, I might as well leave it in the StackExchange archives for others who may have the same experience. :/
As an experiment, fire up a brand-new Access DB and connect to this table to see if you get the same behavior. I suspect this Access DB was connected to a table like this in the past and had that default set. Access has trouble forgetting sometimes :)

SQL Server query execution very slow when comparing Primary Keys

I have a SQL Server 2008 R2 database table with 12k address records where I am trying to filter out duplicate phone numbers and flag them using the following query
SELECT a1.Id, a2.Id
FROM Addresses a1
INNER JOIN Addresses a2 ON a1.PhoneNumber = a2.PhoneNumber
WHERE a1.Id < a2.Id
Note: I realize that there is another way to solve this problem by using EXISTS, but this is not part of the discussion.
The table has a Primary Key on the ID field with a Clustered Index, the fragmentation level is 0 and the phone number field is not null and has about 130 duplicates out of the 12k records. To make sure it is not a server or database instance issue I ran it on 4 different systems.
Execution of the query takes several minutes, sometimes several hours. After trying almost everything as one of my last steps I removed the Primary Key and ran the query without it and voila it executed in under 1 second. I added the Primary Key back and it still ran in under one second.
Does anybody have an idea what is causing this problem?
Is it possible that the primary key gets somehow corrupted?
EDIT: My apologies, I had a couple of typos in the SQL query.
Out of date statistics. Dropping and recreating the PK will give you fresh statistics.
Too late now, but I'd have suggested running sp_updatestats to see what happened.
If you back up and restore a database onto different systems, the statistics follow the data.
I'd suspect a different plan too, given the join is on the non-indexed (I guess) columns PhoneNumber and CCAPhoneN.
I'm guessing there are no indexes on PhoneNumber or PhoneNo.
You are joining on these fields, but if they aren't indexed it's forcing TWO table scans, one for each instance of the table in the query, then probably doing a hash match to find matching records.
Next step: get an execution plan and see where the pain points are.
Then, add indexes to those fields (assuming you see a Clustered Index Scan) and see if that fixes it.
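For instance, a minimal sketch of the kind of index that suggestion implies (index name is made up; the clustered key Id is carried by every nonclustered index, so the self-join can be resolved from this index alone):
CREATE NONCLUSTERED INDEX IX_Addresses_PhoneNumber
    ON dbo.Addresses (PhoneNumber);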
I think your other issue is a red herring. The PK likely has nothing to do with it, but you may have gotten page caching (did you drop the buffers and clear the cache between runs?) which made the later runs faster.
