SqlPackage not exporting entire database - sql-server

I'm trying to move a fairly large database (50GB) to Azure. I am running this command on the local Sql Server to generate a bacpac I can upload.
SqlPackage.exe /a:Export /ssn:localhost /sdn:MDBILLING /su:sa /sp:SomePassword /tf:"D:\test.bacpac"
The export does not print any errors and finishes with "Successfully exported database and saved it to file 'D:\test.bacpac'."
When I look at the bacpac in the file system, it always comes out to be 3.7GB. There's no way a 50GB database can be compressed that small. I upload it to Azure regardless. The package upload succeeds, but when I query the Azure database most of the tables return 0 rows. It's almost as if the bacpac does not contain all my database's data.
Are there any known limitations with this export method? Database size, certain data types, etc?
I tried using the 64-bit version of SqlPackage after reading that some people experienced out-of-memory issues on large databases, but I wasn't getting that error, or any error for that matter.
UPDATE/EDIT: I made some progress after ensuring that the export is transactionally consistent by restoring a backup and then extracting a bacpac from that. However, now I have run into a new error when uploading to Azure.
I receive the following message (using S3 database):
Error encountered during the service operation. Data plan execution failed with message One or more errors occurred. One or more errors occurred. One or more errors occurred. One or more errors occurred. XML parsing: Document parsing required too much memory One or more errors occurred. XML parsing: Document parsing required too much memory

The problem is resolved. My issues were two-fold.
First, because bacpac operations are not transactionally consistent, I had to restore from backup and make a bacpac out of the restored database. This ensured users were not adding rows while the bacpac was being generated.
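For reference, the restore step looks roughly like this (the paths, logical file names, and the MDBILLING_Copy name below are hypothetical); SqlPackage is then run with /sdn pointing at the restored copy instead of the live database:
-- Hedged sketch: restore the backup under a new name, then export the bacpac
-- from this idle copy (paths and logical file names are hypothetical).
RESTORE DATABASE MDBILLING_Copy
FROM DISK = N'D:\Backups\MDBILLING.bak'
WITH MOVE 'MDBILLING' TO N'D:\Data\MDBILLING_Copy.mdf',
     MOVE 'MDBILLING_log' TO N'D:\Data\MDBILLING_Copy_log.ldf',
     STATS = 10;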
The second issue was an XML column in my database. The table has roughly 17 million rows, and of those roughly 250 had really large XML documents stored in them (200,000+ characters). Removing those 250 rows and then re-importing solved my problems. I really don't think it was the size of the XML documents that Azure had an issue with; I think those large documents contained special characters the XML parser didn't like.
It's unclear to me how SQL Server allowed unparseable XML to get into my database in the first place, but that was the other issue.
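For anyone hunting the same problem, here is a hedged sketch of the kind of query that can locate such rows; the table and column names are hypothetical, and TRY_CONVERT requires SQL Server 2012 or later.
-- Find rows with unusually large XML values (names are hypothetical).
SELECT Id, DATALENGTH(XmlData) AS xml_bytes
FROM dbo.BigXmlTable
WHERE DATALENGTH(XmlData) > 200000
ORDER BY xml_bytes DESC;

-- If the column is (n)varchar rather than typed xml, this flags values that
-- do not parse as XML at all (TRY_CONVERT returns NULL on a failed conversion).
SELECT Id
FROM dbo.BigXmlTable
WHERE XmlData IS NOT NULL
  AND TRY_CONVERT(xml, XmlData) IS NULL;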

Related

SSIS Package Full Table Load Slow

We have an SSIS package that is apparently termed 'slow' by the development team. Since they do not have anyone with SSIS/ETL experience, as a DBA I tried digging into it. Below is the information I found:
SQL Server 2014 was upgraded in-place to 2017, so it has SSIS from both versions.
They load a 200 GB SQL Server table into SSIS and then zip the data into a flat file using command-line zip functionality.
The data flow task simply runs a select * from a view; the view is nothing but the table, with no fancy joins.
While troubleshooting I found that there is hardly any load on SQL Server, possibly because the select command is running in a single thread and not utilizing the SQL Server cores.
When I run the same select * command myself (only for 5 seconds, since it is a 200 GB table), even my query is single-threaded.
The package has a configuration file, referenced by the SQL Agent job that runs it, with some connection settings.
Opening the package in BIDS shows DefaultBufferMaxRows as only 10000 (probably the default value); since neither the configuration file nor any variable supplies a custom value, I assume this is what the package uses at run time.
Both SQL Server and SSIS are on the same server. SQL Server's max memory has been set so that around 100 GB is left for SSIS and the OS.
Kindly share any ideas on how I can force SQL Server to run this select using multiple threads so that the entire table gets into the SSIS buffers faster.
Edit: I am aware that bcp can read data faster than any other process and save it to a flat file, but at this point changes to the SSIS package have to be kept to a minimum, so I am exploring options that can be incorporated within the SSIS package.
Edit2: Parallelism works perfectly on my SQL Server, as I verified with a lot of other queries. The table in question is 200 GB. It is something with SSIS only that is not hammering my DB as hard as it should.
Edit3: I have made some progress: I adjusted the buffer size to 100 MB and max rows to 100000, and now the package seems to be doing better. When I run this package on the server directly using the dtexec utility, it generates a good load of 40-50 MB per second, but through the SQL Agent job it never generates a load of more than 10 MB per second, so I am trying to figure out this behavior.
Edit4: I found that when I log on to the server and run the package directly by invoking the dtexec utility, it runs well: it generates a good load on the DB, keeping data I/O steady between 30-50 MB/sec.
The same package run from the SQL Agent job never pushes I/O above 10 MB/sec.
I even tried to run the package through the Agent with the command-line (cmdline) option, but no change. The Agent literally sucks here; any pointers on what could be wrong?
Final Try:
I am stumped at the observations I now have:
1) The same package runs 3x faster when run from a command prompt on the Windows node by invoking the dtexec utility.
2) The exact same package runs 3 times slower than the above when invoked by SQL Agent, which has sysadmin permissions on Windows as well as SQL Server.
In both cases I checked which version of dtexec they invoke, and they both invoke the same version. So why one would be so slow is beyond my understanding.
I don't think there is a general solution to this issue, since it is a particular case about which you haven't provided much information. Since there are two components in your data flow task (OLE DB Source and Flat File Destination), I will give some suggestions related to each component.
Before giving suggestions for each component, it is worth mentioning the following:
If no transformations are applied within the data flow task, using a data flow is not recommended; it is preferable to use the bcp utility.
Check TempDB and the database log size.
If a clustered index exists, try to rebuild it. If not, try to create a clustered index.
To check which component is slowing the package execution, open the package in Visual Studio, remove the Flat File Destination and replace it with a dummy Script Component (write any trivial code, for example: string s = "";). Then run the package; if it is fast enough, the problem is caused by the Flat File Destination; otherwise, you need to troubleshoot the OLE DB Source.
Try executing the query in SQL Server Management Studio and look at the execution plan (see the sketch after this list).
Check the package TargetServerVersion property within the package configuration and make sure it is correct.
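A hedged sketch of that plan/statistics check, assuming a hypothetical view name; in SSMS, also enable "Include Actual Execution Plan" before running it:
-- Capture I/O and timing statistics for the source query (view name is hypothetical).
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT TOP (100000) * FROM dbo.TheBigView;  -- TOP keeps the test short on a 200 GB table
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;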
OLE DB Source
As you mentioned, you are using a Select * from view query where data is stored in a table that contains a considerable amount of data. The SQL Server query optimizer may find that reading data using Table Scan is more efficient than reading from indexes, especially if your table does not have a clustered index (row store or column store).
There are many things you may try to improve data load:
Try replacing the Select * from view with the original query used to create the view.
Try changing the data provider used in the OLE DB Connection Manager: SQL Server Native Client, or the newer Microsoft OLE DB Driver for SQL Server (not the legacy Microsoft OLE DB Provider for SQL Server).
Try increasing the DefaultBufferMaxRows and DefaultBufferSize properties. more info
Try using the SQL Command data access mode with specific column names instead of selecting the view name with the Table or View data access mode. more info
Try to load data in chunks (see the sketch below).
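A hedged sketch of chunked reads: page through the base table on its clustered key instead of one SELECT * over the whole 200 GB view. The table and column names are hypothetical; each chunk can feed a separate OLE DB Source or package iteration.
-- Read one chunk of rows, keyed on the clustered index (names are hypothetical).
DECLARE @LastId BIGINT = 0;        -- highest Id already exported; start at 0
SELECT TOP (1000000) *
FROM dbo.BigTable
WHERE Id > @LastId                 -- Id = clustered key
ORDER BY Id;                       -- the next chunk starts after the last Id returned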
Flat File Destination
Check that the flat file directory is not on the same drive where the SQL Server instance is installed.
Check that the flat file is not located on a busy drive.
Try to export data into multiple flat files instead of one huge file (split the data into smaller files), since as the amount of data exported to a single file grows, writing to that file becomes slower and the package slows down with it. (See the fifth suggestion above.)
Any indexes on the table being loaded could slow the load. If there are any, try dropping them before the load and recreating them after; this also updates the index statistics, which would otherwise be skewed by the bulk insert.
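A hedged sketch of that drop-and-recreate pattern using DISABLE/REBUILD; the index and table names are hypothetical, and this only applies where a table is the load target.
-- Disable a nonclustered index before the bulk load, rebuild it afterwards
-- (rebuilding also refreshes its statistics). Names are hypothetical.
ALTER INDEX IX_BigTable_SomeColumn ON dbo.BigTable DISABLE;
-- ... run the bulk load here ...
ALTER INDEX IX_BigTable_SomeColumn ON dbo.BigTable REBUILD;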
Are you seeing SQL Server utilize other cores for other queries? If not, maybe someone changed the following settings.
Check these under the server configuration settings:
Maximum Degree of Parallelism
Cost Threshold for Parallelism
Whether processors are affinitized to specific CPUs (affinity mask).
Also, a MAXDOP query hint could cause this too, but you said there is no fancy stuff in the view. A query to check the instance-level settings follows below.
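A hedged way to inspect those instance-level settings in one pass:
-- Current values of the parallelism-related settings mentioned above.
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN ('max degree of parallelism',
               'cost threshold for parallelism',
               'affinity mask');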
Also, it seems you have enough memory available, so why not increase DefaultBufferMaxRows to a very large number so that SQL Server doesn't get slowed down waiting for the buffer to empty? Remember, SQL Server and SSIS are using the same disk and will have to wait for each other, which adds wait time for both. It's better that SQL Server reads the data into the buffer quickly, and then SSIS processes it and writes it to disk.
DefaultBufferSize: the default is 10 MB; the maximum possible is 2^31-1 bytes.
DefaultBufferMaxRows: the default is 10000.
You can set AutoAdjustBufferSize so that DefaultBufferSize is automatically calculated based on DefaultBufferMaxRows.
See other performance troubleshooting ideas here
https://learn.microsoft.com/en-us/sql/integration-services/data-flow/data-flow-performance-features?view=sql-server-ver15
Edit 1: Some other properties you can check out. These are explained in the above link as well
MaxConcurrentExecutables (package property): This defines how many threads a package can use.
EngineThreads (Data Flow property): how many threads the data flow engine can use
Also try running dtexec under the same proxy user that the SQL Agent job uses, to see if you get a different result with that account versus your own. You can use runas /user:... from cmd to open a command window as that user and then execute dtexec.
Try changing the proxy user used in SQL Agent to a new one and see if it helps, or try granting it elevated permissions on the directories it needs access to.
Try keeping the package on the file system and executing it through dtexec from SQL Agent directly, instead of using catalog.start_execution.
Not your case, but for other readers: if you have an Execute Package Task, make sure the child packages are set to run in-process via the ExecuteOutOfProcess property. This reduces the overhead of spawning extra processes.
Not your case, but for other readers: if you're testing in BIDS, it will run in debug mode by default and thus run slowly. Use CTRL-F5 (Start Without Debugging). Best of all is to use dtexec directly to test performance.
A data flow task may not be the best choice to move this data. SSIS Data Flow tasks are an ETL tool where you can do transformations, lookups, redirect invalid rows, add derived columns, and a lot more. If the data flow only moves data with no manipulation or redirection of rows, then ditch the Data Flow task and use a simple Execute SQL Task with OPENROWSET to import the flat file that was generated from the command line and zipped up. Assuming the flat file is a .csv file, here is a working example that queries a .csv and inserts the data into a table.
You need the [Ad Hoc Distributed Queries] option's run_value set to 1 (see the sp_configure snippet after the example).
INSERT INTO dbo.Destination
SELECT *
FROM OPENROWSET('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)};
DefaultDir=D:\;Extensions=csv;', 'select * from YourCsv.csv') AS [File];
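If that option is currently off, it can be enabled with sp_configure (this requires the appropriate server-level permissions):
-- Enable ad hoc distributed queries so OPENROWSET can be used this way.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;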
Here are some additional examples: https://sqlpowershell.blog/2015/02/09/t-sql-read-csv-files-using-openrowset/
There are suggestions in this MSDN article: MSDN DataFlow performance features
Key ones appear to be:
Check the EngineThreads property of the Data Flow task, which tells SSIS how many source and worker threads it should use.
If using an OLE DB Source to select data from a view, use the "SQL Command" access mode and write SELECT * FROM View rather than choosing Table or View.
Let us know how you get on
You may be facing an I/O bottleneck while writing the 200 GB to the flat file. I don't see any problem with the SQL query.
If possible, create multiple files and split the data (either by modifying the SSIS package or by changing the select query).

TFS Git getting full - Error TF30042: tbl_Content is full

We are running our project in TFS using Git. Recently it started giving this error:
TF30042: The database is full. Contact your Team Foundation Server
administrator. Server: ATSS-P-AAI\SqlExpress01, Error: 1105, Message:
'Could not allocate space for object
'dbo.tbl_Content'.'PK_tbl_Content' in database 'Tfs_DefaultCollection'
because the 'PRIMARY' filegroup is full. Create disk space by deleting
unneeded files, dropping objects in the filegroup, adding additional
files to the filegroup, or setting autogrowth on for existing files in
the filegroup.
I have checked and found that tbl_Content itself occupies around 9.5 GB of space, while the total DB size is 10 GB. One of my teammates mistakenly checked in a repository with huge binary files before this happened. He has deleted the repository, but it seems tbl_Content still occupies the same space.
I have tried setting autogrowth as well, but nothing seems to be working. We are now unable to use it at all.
Any solutions would be appreciated.
Changing autogrowth (Restricted File Growth) will not work in your situation, since the 10 GB limitation comes from SQL Server Express, as Daniel mentioned.
SQL Server Express: Limitations of the free version of SQL Server
The most important limitation is that SQL Server Express does not
support databases larger than 10 GB. This will prevent you from
growing your database to be large.
What you can do at present:
Clean the drive to free up space: delete transaction logs, look for extraneous test case attachments, build drops checked into source control, that sort of thing.
Restore a prior database backup.
Use SQL Server Standard instead.
This is because you're using SQL Express. SQL Express is limited to databases of up to 10 GB.
The easy answer here is that you should upgrade your SQL edition. It may be possible to remove data from the database, but doing so without explicit instructions from Microsoft is not recommended.
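As a hedged check, the query below shows how close the collection database's data file is to the 10 GB Express cap (log files do not count toward the limit); the database name is taken from the error message:
-- Data file size of the collection database, in MB (Express caps this at 10240 MB).
USE Tfs_DefaultCollection;
SELECT name, size * 8 / 1024 AS size_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';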

Receiving "not enough storage" error while reading big Excel file

While importing data from Excel to SQL Server I receive the error below. Memory usage is at its maximum. The Excel format is .xlsx and the file size is 170 MB (178,587,611 bytes). But I got a:
not enough storage error.
I would appreciate any help.
Data flow execution failed. Not enough storage is available to process
this command. (Exception from HRESULT: 0x80070008)
(Microsoft.SqlServer.DTSRuntimeWrap)
That error is coming from the SSIS runtime, not SQL Server.
Running out of space in SQL Server produces
Msg 9002, Level 17, State 4, Line 20
The transaction log for database 'XXX' is full due to 'ACTIVE_TRANSACTION'.
or
Msg 1105, Level 17, State 2, Line 20
Could not allocate space for object 'YYY' in database 'XXX' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.
It's unrelated to storage, and usually indicates a memory problem. I would first try reducing the buffer sizes in your Data Flow, and ensure that your data flow doesn't have any components that require loading large amounts of data into memory, like lookups.
See Data Flow Performance Features
This error mainly occurs when handling big Excel files through the OLE DB adapter (OLE DB connection manager or Excel connection manager), since this adapter has many limitations. I suggest reading the Excel file in chunks. In similar situations I mainly use a C# script to do that, or you can implement a For Loop container to loop over the used range in Excel; a sketch of the chunking idea follows.
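As a hedged illustration of the chunking idea (not the C# approach itself): read one worksheet row range at a time through the ACE OLE DB provider, assuming that provider is installed and ad hoc distributed queries are enabled. The file path, sheet name, and range are hypothetical.
-- Read rows 2-50001 only, instead of pulling the whole 170 MB worksheet at once.
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                'Excel 12.0 Xml;HDR=YES;Database=D:\BigFile.xlsx',
                'SELECT * FROM [Sheet1$A2:Z50001]');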
One additional suggestion is to use the 64-bit version of Microsoft Excel; this may increase the amount of data that can be handled.
For additional information, you can refer to the following answers:
How to split a large excel file into multiple small file in SSIS?
OutOfMemoryException while trying to read big Excel file into DataTable

How can I safely back up a huge database?

I need to back up a Drupal database and it is huge: it has over 1500 tables (don't blame me, it's a Drupal thing) and is 10 GB in size.
I couldn't do it with phpMyAdmin; I just got an error when it started to build the .sql file.
I want to make sure I won't break anything or take the server down when I try to back it up.
I was going to attempt a mysqldump on my server and then copy the file down locally, but realised that this may cause unforeseen problems. So my question to you is: is it safe to use mysqldump on so many tables at once, and even if it is safe, are there any problems such a huge file could lead to in the future when rebuilding the database?
Thanks for the input guys.
is it safe to use mysqldump on so many tables at once
I run daily backups with mysqldump on servers literally 10x this size: 15000+ tables, 100+ GB.
If you have not examined the contents of a file produced by mysqldump ... you should, because to see its output is to understand why it is an intrinsically safe backup utility:
The backups are human-readable, and consist entirely of the necessary SQL statements to create a database exactly like the one you backed up.
In this form, their content is easily manipulated with ubiquitous tools like sed and grep and perl, which can be used to pluck out just one table from a file for restoration, for example.
If a restoration fails, the error will indicate the line number within the file where the error occurred. This is usually related to buggy behavior in the version of the server where the backup was created (e.g. MySQL Server 5.1 allowed you to create views in some situations where the server itself would not accept the output of its own SHOW CREATE VIEW statement. The create statement was not considered -- by the same server -- to be a valid view definition, but this was not a defect in mysqldump, or in the backup file, per se.)
Restoring from a mysqldump-created backup is not lightning fast, because the server must execute all of those SQL statements, but from the perspective of safety, I would argue that there isn't a safer alternative, since it is the canonical backup tool and any bugs are likely to be found and fixed by virtue of the large user base, if nothing else.
Do not use the --force option, except in emergencies. It will cause the backup to skip over any errors encountered on the server while the backup is running, causing your backup to be incomplete with virtually no warning. Instead, find and fix any errors that occur. Typical errors during backup are related to views that are no longer valid because they reference tables or columns that have been renamed or dropped, or where the user who originally created the view has been removed from the server. Fix these by redefining the view, correctly.
Above all, test your backups by restoring them to a different server. If you haven't done this, you don't really have backups.
The output file can be compressed, usually substantially, with gzip/pigz, bzip2/pbzip2, xz/pixz, or zpaq. These are listed in approximate order by amount of space saved (gzip saves the least, zpaq saves the most) and speed (gzip is the fastest, zpaq is the slowest). pigz, pbzip2, pixz, and zpaq will take advantage of multiple cores, if you have them. The others can only use a single core at a time.
Use mysqlhotcopy; it works well with large databases.
It works only with MyISAM and ARCHIVE tables.
It works only on the server where the database is stored.
This utility is deprecated as of MySQL 5.6.20 and removed in MySQL 5.7.

Storing binary files in SQL Server

I'm writing an MVC/SQL Server application that needs to associate documents (Word, PDF, Excel, etc.) with records in the database (supporting SQL Server 2005). The consensus is that it's best to keep the files in the file system and only save a path/reference to the file in the database. However, in my scenario an audit trail is extremely important: we already have a framework in place to record audit information whenever a change is made in the system, so it would be nice to use the database to store the documents as well. If the documents were stored in their own table with a FK to the related record, would performance become an issue? I'm aware of the potential problems with backups/restores, but would DB performance start to degrade at some point if the document tables became very large? If it makes any difference, I would never expect this system to need to service anywhere near 100 concurrent requests; maybe tens of requests.
Storing the files as blobs in the database will increase the size of the DB and will definitely affect backups, which you already know and is true.
There are several things to consider, for instance whether the DB and application (code) servers are the same machine, because requests go from the application server to the DB server for the data, and then from the application server back to the client.
If the file sizes are very large, I would say go for the file system and save the file paths in the DB.
Otherwise you can keep the files as blobs in the DB; it will definitely be more secure, as well as safer from viruses, etc.
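If you do go the separate-table route the question describes, a minimal schema sketch might look like this; the table and column names are hypothetical, dbo.Record stands in for the audited parent table, and VARBINARY(MAX) works on SQL Server 2005, which predates FILESTREAM.
-- Hypothetical sketch: documents stored in their own table, linked by FK.
CREATE TABLE dbo.RecordDocument
(
    DocumentId  INT IDENTITY(1,1) PRIMARY KEY,
    RecordId    INT NOT NULL REFERENCES dbo.Record (RecordId),  -- hypothetical parent table
    FileName    NVARCHAR(260)  NOT NULL,
    ContentType NVARCHAR(100)  NOT NULL,
    Content     VARBINARY(MAX) NOT NULL,                        -- the document bytes
    UploadedAt  DATETIME       NOT NULL DEFAULT GETDATE()
);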
