Empty the contents of a database - sql-server

I have a huge data base with complicated relations, how can I delete all tables contents without violating foreign key constraints,is there a a such way to do that?
note that I am writing a SQL script file to delete tables such as the following example:
delete from A
delete from B
delete from C
delete from D
delete from E
but I don't know what table should I start with.

In SQL Server, there is no native way to do what you're asking. You do have a few options depending on your particular environment limitations:
Figure out the relationships between the tables and start deleting rows out in the appropriate order from foreigns to parents. This may be time-consuming for a large number of objects, but is the "safest" in terms of least destruction.
Disable the foreign key constraints and TRUNCATE TABLE. This will be a bit faster if you're dealing with lots of data, but you still have to to know where all your relationships are. Not too terrible if you're working with fewer tables, though option 1 becomes just as viable
Script out the database objects and DROP DATABASE/CREATE DATABASE. If you don't care about a raw teardown of the database, this is another option, however, you'll still need to be aware of object precedence for creation. SQL Server—as well as third-party tools— offer ways to script object DROP/CREATE. If you decide to go this route, the upside is that you have a scripted backup of all the objects (which I like to keep "just in case") and future tear-downs are nearly instantaneous as long as you keep your scripts synchronized with any changes.
As you can see, it's not a terribly simple process because you're trying to subvert the very reason for the existence of the constraints.

Steps can be:
disable all the constraint in all the tables
delete all the records from all the tables
enable the constraint back again.
Also see this discussion: SQL: delete all the data from all available tables

TRUNCATE TABLE tableName
Removes all rows from a table without
logging the individual row deletions.
TRUNCATE TABLE is similar to the
DELETE statement with no WHERE clause;
however, TRUNCATE TABLE is faster and
uses fewer system and transaction log
resources.
TRUNCATE TABLE (Transact-SQL)

Dude, taking your question at face value... that you want to COMPLETELY recreate the schema with NO data... forget the individual queries (too slow)... just destroydb, and then createdb (or whatever your RDBM's equivalent is)... and you might want to hire a competent DBA.

Related

Set Null works in Sqlite but not works in Sql Server when relating table itself

The problem is that "DeleteBehavior.SetNull" works only in Sqlite and doesn't work at all in Sql Server, is this some limitation of Sql Server with SET NULL?
I have the "User" model:
User.Id
User.Name
And I also have the "Partner" model:
Partner.Id
Partner.Title
Partner.ParentId
Partner.Parent (virtual)
Scenario:
I create Partner 1
I create Partner 2 and define that the ParentId is Partner 1 (1 is the father of 2)
I try to delete Partner 1 (I try to delete the parent)
At that moment, Sqlite defines NULL in the ParentId of Partner 2, that's correct, that's the behavior I want, but in SQL Server I can't do that at all, I tried innumerable ways and I fall into some errors, follow below:
Errors:
Delete Error:
Microsoft.Data.SqlClient.SqlException (0x80131904): The DELETE statement conflicted with the SAME TABLE REFERENCE constraint "FK_Partners_Partners_ParentId". The conflict occurred in database "master", table "dbo.Partners", column 'ParentId'.
Migrations Error:
Introducing FOREIGN KEY constraint 'FK_Partners_Partners_ParentId' on table 'Partners' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
Could not create constraint or index. See previous errors.
I even found some old texts saying that this is a Sql Server limitation, but it's already 2023 and this limitation still exists? Is it possible to get around this in some way that is easy and affects every table in the database?
I already tried all the DefaultBehavior and none works like Sqlite, I was programming 100% in Sqlite and I managed to develop a system and everything is working, however when generating the migration and trying to use Sql Server I came across this problem.
The same thing is asked at dba.stackexchange.com. The answers explain in detail why this isn't so easy to implement. Relational databases operate on sets of rows at a time, not individual rows. Deleting or updating rows one by one is the slowest way possible.
While SQLite is built to handle a few thousand rows for a single application running inside a watch, SQL Server has to handle thousands of concurrent operations to the same table that may contain several millions of rows spread across multiple partitions. The self-referencing ON DELETE SET NULL has to work reliably and predictably when deleting 1 row in an 1000 row table and when deleting 10K rows in a 50M row table.
As Mikael Eriksson explains in the first answer, ON DELETE SET NULL converts a DELETE operation on a table to an UPDATE operation on the same table.
This DBA question on cascading DELETEs shows what's involved in the easy case :
In this picture the server :
Finds the rows that need to be deleted in the first table,
Removes them from the parent table. That means marking rows and pages for deletion, writing records to the transaction log
Spool the deleted keys so they can be used on the related table
Repeat 1-2 on the child table
When all that finishes, commit the transaction by committing all changes in the data pages and the transaction log.
And that's a single operation. ON DELETE SET NULL on the other hand converts the DELETE operation to an DELETE and an UPDATE on the same table. The database would have to both DELETE and UPDATE index rows on the ParentID index to get this to happen. Different kinds of locks would have to be taken, and some of them could be taken
There's a similar statement that does multiple operations at once, MERGE. Aaron Bertrand's Use Caution with SQL Server's MERGE Statement shows a list of 30 bugs for that statement alone. MERGE isn't even atomic and the UPDATE/DELETE/INSERT operations are executed separately, which is the cause of some of the bugs.
I'd rather not have ON DELETE SET NULL than have a slow or unreliable one
While trying to reproduce this I found an SQLite limitation - foreign keys aren't enforced by default for compatibility with the way it worked over a decade ago. The docs warn this can change in the future:
Foreign key constraints are disabled by default (for backwards compatibility), so must be enabled separately for each database connection. (Note, however, that future releases of SQLite might change so that foreign key constraints enabled by default. Careful developers will not make any assumptions about whether or not foreign keys are enabled by default but will instead enable or disable them as necessary.) The application can also use a PRAGMA foreign_keys statement to determine if foreign keys are currently enabled.
This can seem like an illogical restriction in 2023 until one remembers that SQLite was built to run on the weakest possible devices (microcontrollers, not even processors) where the very fact of checking constraints can cause significant problems. Those devices can easily be inside a car or other hardware device with a lifetime of decades.

Strategies to modify huge database

I am testing different strategies for a incoming breaking change. The problem is that each experiment would carry some costs in Azure.
The data is huge, and can have some inconsistencies due to many years with fixes and transactions before I even knew the company.
I need to change a column in a table with million of records and dozens of indexes. This will have a big downtime.
ALTER TABLE X ALTER COLUMN A1 decimal(15, 4) --The original column is int
One of the initial ideas (Now I know this is not possible) is to have a secondary replica, do the changes there, and, when changes finish, swap primary with secondary... zero or almost zero downtime. I am referring to a "live", redundant replica, not just a "copy"
EDIT:
Throwing new ideas:
Variations to what have been mentioned in one of the answers: Create a table replica (not the whole DB, just the table), apply a INSERT INTO... SELECT and swap the tables at the end of the process. Or... do the swap early to minimize downtime in trade of a delay during the post-addition of all records from the source
I have tried this, but takes AGES to complete. Also, some null and FK violations make the process to fail after processing for several hours.
"Resuming" could be an option but it makes the process slower with each execution. Without some kind of "Resume", each failure have to be repeated from scratch
An acceptable improvement could be to IGNORE the errors (but create logs, of course) and apply fixes after migration. But afaik, AzureSql (nor SqlServer) doesn't offer an "ignore" option
Drop all indexes, constraints and dependencies to the column that needs to be modified, modify the column and apply all indexes, constraints and dependencies again.
Also tried this one. Some indexes take AGES to complete. But for now seems to be the best bet.
There is a possible variation by applying ROW COMPRESSION before the datatype change, but I think it will not improve the real deal: index re-creation
Create a new column with the target datatype, copy the data from the source column, drop the old column and rename the new one.
This strategy also requires to drop and regenerate indexes, so it will not offer lot of gain (if any) with regards #2.
A friend thought of a variation on this, which is to duplicate the needed indexes ONLINE for the column copy. In the meanwhile, trigger all changes on source column to the column copy.
For any of the mentioned strategies, some gain can be obtained by increasing the processing power. But, anyway, we consider to increase the power with any of the approaches, therefore this is common for all solutions
When you need to update A LOT of rows as a one-time event, maybe it's more effective to use the following migration technique :
create a new target table
use INSERT INTO SELECT to fill the new table with correct / updated values
rename the old and new table
create indexes for the new table
After many tests and backups, we finally used the following aproach:
Create a new column [columnName_NEW] with the desired format change. Allow NULLS
Create a trigger for INSERTS to update the new column with the value in the column to be replaced
Copy the old column value to the new column by batches
This operation is very time consuming. We ran a batch every day in a maintenance window (2h during 4 days). Our batch filled the values taking oldest rows first, we counted on the trigger filling the new ones
Once #3 is complete, don't allow NULLS anymore on the new column, but set a default value to avoid the INSERT trigger to crash
Create all the needed indexes and views on the new column. This is very time consuming but can be done ONLINE
Allow NULLS on the old column
Remove the insert trigger - start downtime now!
Rename the old column to [columnName_OLD], the new to [columnName]. This requires few downtime seconds!
--> You can consider it is finally done!
After some safe time, you can backup the result and remove [columnName_OLD] with all of its dependencies
I selected the other answer, because I think it could be also useful in most situations. This one has more steps but has a very little downtime and is reversible at any step but the last.

How to find the next SQL Server Replication Statement

I am debugging my database and I am finding that the replication is failing on a delete statement.
I look at the source and destination tables and they are the same. So someone deleted a row from the source and then put it back in. (The delete fails because of a FK reference to some manual data that I don't want to cascade delete.)
Is there a way to find out the PK of the row it is trying to delete?
(All Replication monitor will tell me is the name of the FK that is causing the delete statement to fail.)
There are a couple of ways. I'll tell you the easy one (because I'm lazy). Put a trace on your subscriber for non-successful stored procedure executions. You should get a hit for one called something like sp_MSdel_table (where table is the name of your table). The argument(s) to that procedure will be the primary key of the record that it's trying to delete.
Easy way number two is to modify the sproc identified in the previous method not to be angry at a missing row (after all, it's just going to delete it so the fact that's it's now missing isn't that big a deal). You might have other non-convergence issues, but at least you can get your commands flowing again. (EDIT: Just noticed the reason for your issue. I'd advise not having FK constraints at the subscriber since any referential integrity should be taken care of at the publisher. I'll make your replication faster when SQL doesn't have to check that each time it does an applicable insert, update, or delete).
Hard way number one involves looking at the error in replication monitor an noting that there's a transaction id and sequence number specified. You then use a sproc in the distribution database to get the text of the command being executed.
Hard way number two involved diffing the tables either with tablediff.exe, something like RedGate's SQLCompare, or a roll-your-own join over linked servers to show the difference. Keep this in your back pocket just in case one of the other one-row-at-a-time methods mentioned above doesn't do it for you. My threshold for such things is about three. YMMV.

Error handling and data integrity when changing table schema

We have a few customers with large data sets and during our upgrade procedure we need to modify the schema of various tables (adding some columns, renaming others, occasionally changing data types, but that's rare).
Previously we've been going via a temporary table with the new schema, and then dropping the original and renaming the temp table but I'm hoping to speed that up dramatically by using ALTER table ... instead.
My question is what data integrity and error handling issues do I need to consider? Should I enclose all changes to a table in a transaction (and if so, how?) or will the DBMS guarantee atomicity and integrity over an ALTER operation?
We already heavily recommend customers backup their data before starting the upgrade so that should always be a fall back option.
We need to target SQLServer 2005 and Oracle, but obviously I can add conditional code if they require different approaches.
Comments for Oracle only:
Table alterations are DDL, so the concept of a transaction doesn't apply - every DDL statement locks the table for the duration of the operation and either succeeds or fails.
Adding (nullable!) columns or renaming existing columns is a relatively lightweight process and shouldn't present any problems if the table lock can be acquired.
If you're adding/modifying constraints (either NOT NULL or other more complex check constraints) Oracle will check existing data to validate the constraints unless you add the ENABLE NOVALIDATE clause to the constraint DDL. The validation of existing data can be a lengthy process for large tables.
If you're scripting the upgrade to be run as a SQL*Plus script, save yourself a lot of headaches by using the "whenever sqlerror exit sql.sqlcode" directive to abort the script on the first failure to make the review of partially implemented upgrades easier.
If the upgrade must be performed on a live system where you can neither control transactions or afford to miss them, consider using the Oracle DBMS_REDEFINITION package, which automatically creates a temporary configuration of temp tables and triggers to capture in-flight transactions while redefining the table in the "background". Warning - lots of work and a steep learning curve for this option.
If you're using SQL Server then ddl statements are transactional, so wrap in a transaction (I don't think this applies to Oracle though).
We split upgrades into individual patches that go with a particular feature. Which patches are applied go in a database_patch_history table, and it's easy to see which patches were applied and how to roll them back.
As you say, taking a backup before you start is important.
I have had to do changes like this in the past and have always been very paranoid about data loss. To help mitigate that risk I have always done tons of testing against "sandbox" databases that mirrored the target databases in schema and data as closely as possible. Test out the process as much as possible before rolling it out, just like you would any other area of the application.
If you dramatically change any data types of columns, for instance change a VARCHAR to an INT, the DBMS will panic and you will probably loose that data. Luckily, nowadays DBMSs are intelligent enough to do some data type conversions without loosing the data, but you don't want to run the risk of damaging any of it when making the alterations.
You shouldn't loose any data by renaming columns and definitely won't by adding new columns, it's when you move the data about that you have to be concerned.
Firstly, backup the entire table, both the schema and data, so at a second's notice you can roll back to the previous schema. Secondly, look at the alterations you are trying to make, see how drastic they are - try to figure out exactly what needs to change. If you're making datatype conversions push that data to an intermediatery table first with 3 columns, the foreign key (id or whatever so you can locate the row), the old data and the new column. Then either push the old data to the new column directly, or convert it at the application-level.
When it's all in the correct types and everything's been successful, run the ALTER statements and repopulate the database! It's simple enough to do, just needs a logical thought process.

Is deleting all records in a table a bad practice in SQL Server?

I am moving a system from a VB/Access app to SQL server. One common thing in the access database is the use of tables to hold data that is being calculated and then using that data for a report.
eg.
delete from treporttable
insert into treporttable (.... this thing and that thing)
Update treportable set x = x * price where (...etc)
and then report runs from treporttable
I have heard that SQL server does not like it when all records from a table are deleted as it creates huge logs etc. I tried temp sql tables but they don't persists long enough for the report which is in a different process to run and report off of.
There are a number of places where this is done to different report tables in the application. The reports can be run many times a day and have a large number of records created in the report tables.
Can anyone tell me if there is a best practise for this or if my information about the logs is incorrect and this code will be fine in SQL server.
If you do not need to log the deletion activity you can use the truncate table command.
From books online:
TRUNCATE TABLE is functionally
identical to DELETE statement with no
WHERE clause: both remove all rows in
the table. But TRUNCATE TABLE is
faster and uses fewer system and
transaction log resources than DELETE.
http://msdn.microsoft.com/en-us/library/aa260621(SQL.80).aspx
delete from sometable
Is going to allow you to rollback the change. So if your table is very large, then this can cause a lot of memory useage and time.
However, if you have no fear of failure then:
truncate sometable
Will perform nearly instantly, and with minimal memory requirements. There is no rollback though.
To Nathan Feger:
You can rollback from TRUNCATE. See for yourself:
CREATE TABLE dbo.Test(i INT);
GO
INSERT dbo.Test(i) SELECT 1;
GO
BEGIN TRAN
TRUNCATE TABLE dbo.Test;
SELECT i FROM dbo.Test;
ROLLBACK
GO
SELECT i FROM dbo.Test;
GO
i
(0 row(s) affected)
i
1
(1 row(s) affected)
You could also DROP the table, and recreate it...if there are no relationships.
The [DROP table] statement is transactionally safe whereas [TRUNCATE] is not.
So it depends on your schema which direction you want to go!!
Also, use SQL Profiler to analyze your execution times. Test it out and see which is best!!
The answer depends on the recovery model of your database. If you are in full recovery mode, then you have transaction logs that could become very large when you delete a lot of data. However, if you're backing up transaction logs on a regular basis to free the space, this might not be a concern for you.
Generally speaking, if the transaction logging doesn't matter to you at all, you should TRUNCATE the table instead. Be mindful, though, of any key seeds, because TRUNCATE will reseed the table.
EDIT: Note that even if the recovery model is set to Simple, your transaction logs will grow during a mass delete. The transaction logs will just be cleared afterward (without releasing the space). The idea is that DELETE will create a transaction even temporarily.
Consider using temporary tables. Their names start with # and they are deleted when nobody refers to them. Example:
create table #myreport (
id identity,
col1,
...
)
Temporary tables are made to be thrown away, and that happens very efficiently.
Another option is using TRUNCATE TABLE instead of DELETE. The truncate will not grow the log file.
I think your example has a possible concurrency issue. What if multiple processes are using the table at the same time? If you add a JOB_ID column or something like that will allow you to clear the relevant entries in this table without clobbering the data being used by another process.
Actually tables such as treporttable do not need to be recovered to a point of time. As such, they can live in a separate database with simple recovery mode. That eases the burden of logging.
There are a number of ways to handle this. First you can move the creation of the data to running of the report itself. This I feel is the best way to handle, then you can use temp tables to temporarily stage your data and no one will have concurency issues if multiple people try to run the report at the same time. Depending on how many reports we are talking about, it could take some time to do this, so you may need another short term solutio n as well.
Second you could move all your reporting tables to a difffernt db that is set to simple mode and truncate them before running your queries to populate. This is closest to your current process, but if multiple users are trying to run the same report could be an issue.
Third you could set up a job to populate the tables (still in separate db set to simple recovery) once a day (truncating at that time). Then anyone running a report that day will see the same data and there will be no concurrency issues. However the data will not be up-to-the minute. You also could set up a reporting data awarehouse, but that is probably overkill in your case.

Resources