I have an issue where an entire table's worth of data was deleted. It is a child table: it contains its own Primary Key, a Foreign Key to its parent table, and some other data.
I tried using Merge, generated from a stored procedure I found here:
https://github.com/readyroll/generate-sql-merge
This generates a giant MERGE statement for your whole table. That worked OK for a while, but I've since found that records from the parent table have also been deleted, and MERGE doesn't handle this too well.
I've tried rewriting it, but I'm getting bogged down in it and it feels like something somebody else will have done before.
What I'd really like is a way to generate thousands of INSERT statements, each with an IF check above it, along the lines of:
IF NOT EXISTS (select PK from ChildTable where ID = <about to be inserted>) AND EXISTS (select FK from ParentTable where ID = <about to be inserted>)
INSERT RECORD
OUTPUT PK TO LOG TABLE
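For illustration, a minimal sketch of one such generated block, assuming hypothetical names (dbo.ChildTable, dbo.ParentTable, dbo.RestoreLog, with columns ChildId, ParentId and SomeData) rather than my real schema:
-- Sketch only: table and column names below are placeholders, not the real schema.
IF NOT EXISTS (SELECT 1 FROM dbo.ChildTable WHERE ChildId = 1001)
   AND EXISTS (SELECT 1 FROM dbo.ParentTable WHERE ParentId = 42)
BEGIN
    INSERT INTO dbo.ChildTable (ChildId, ParentId, SomeData)
    OUTPUT inserted.ChildId INTO dbo.RestoreLog (ChildId)
    VALUES (1001, 42, N'restored value');
END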
There are about 20,000 records, so it's really something I don't want to have to do by hand, and because the delete event happened several times over a few months, I need to generate the data from several different databases to recreate the whole picture.
I'd like to keep the inserted IDs in a log table, so I can tell what's been inserted, and so the data could be restored to its pre-script state for any reason.
Any advice on my approach would also be welcome.
Thanks :)
Long story short: I tried a few ways to fix this, and the best was to generate INSERT statements for the table using SQL Server's Generate Scripts feature, and bring them into a temp table.
Because I only wanted to import about 90% of the data, and exclude specific records based on a few conditions, I originally thought I should wrap each INSERT in an IF, but 20,000 IFs broke down when SQL Server tried to build a query plan.
Instead, I inserted all the records without a filter into a temp table. I then deleted all the records I didn't want from this table with a couple of Delete statements.
Lastly, I inserted all the remaining data in the temp table into the actual table the data was originally missing from.
This worked perfectly, and SQL Server Management Studio was able to run it without crashing. It was also a lot clearer what I was doing, and I didn't have to build lots of complicated IFs in a string builder.
I also used OUTPUT on the INSERT INTO to log all the record IDs I'd inserted, giving me the audit trail I needed.
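For anyone following the same route, here is a rough sketch of the flow, with placeholder names (#Staging, dbo.ChildTable, dbo.ParentTable, dbo.RestoreLog) standing in for the real schema:
-- Sketch only; adjust names and columns to the real tables.
CREATE TABLE #Staging (ChildId INT PRIMARY KEY, ParentId INT, SomeData NVARCHAR(100));

-- 1. Load the rows produced by the generated INSERT scripts.
INSERT INTO #Staging (ChildId, ParentId, SomeData)
VALUES (1001, 42, N'example row');   -- ...thousands more generated lines

-- 2. Remove the rows that should not be restored.
DELETE FROM #Staging WHERE ChildId IN (SELECT ChildId FROM dbo.ChildTable);         -- already present
DELETE FROM #Staging WHERE ParentId NOT IN (SELECT ParentId FROM dbo.ParentTable);  -- parent no longer exists

-- 3. Insert what remains into the real table, logging the restored keys.
INSERT INTO dbo.ChildTable (ChildId, ParentId, SomeData)
OUTPUT inserted.ChildId INTO dbo.RestoreLog (ChildId)
SELECT ChildId, ParentId, SomeData FROM #Staging;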
As for the string builder: the query was so long and complicated, and Excel was moaning about a 256-character limit, that I ended up using 20+ columns to build up my query after concatenating 26 columns of data. When I used the formula auto-fill drag feature in Excel, it would crash my machine going across so many records, and I have a pretty grunty machine!
Using Generate Scripts rather than the string builder also had the benefit of not altering the data in any way. It was purely like for like, so weird characters, new lines, etc. were no issue.
Related
I experienced a very strange occurrence relating to a multi-query transaction. After SQL Server was updated from 2008 to 2016 (with no warning from our host), we started dropping data after it was posted to the API. The weird thing is, some of the data arrived, and some didn’t.
In order to protect integrity, the queries are all joined in one transaction. The records can be created and then updated at a later time. They are formatted similar to this:
DELETE FROM table_1 WHERE parentID = 123 AND col2 = 321;
DELETE FROM table_2 WHERE parentID = 123 AND col2 = 321;
-- etc
INSERT INTO table_1 (parentID, col2, etc) VALUES (123, 321, 123456);
INSERT INTO table_2 (parentID, col2, etc) VALUES (123, 321, 654321);
-- etc
There could be hundreds of lines being executed. Due to design, the records in question do not have unique IDs, so the most performant way to execute the queries was to first delete the matching records, then re-insert them. Looping through the records and checking for existence is the only other option (as far as I know), and that is expensive with that many records.
Anyway, I was struggling to find a reason for this data loss, which seemed random. I had logs of the SQL queries, so I know they were being formatted correctly and had all the data intact. Finally, the only thing left I could think of was to separate the DELETE queries into their own transaction and execute them first*. That seems to have fixed the problem.
Q. Does anyone know if these queries could be executed out of the order in which they were presented? Do you see a better way I could be writing these transactions?
* I don't necessarily like this solution, because the delete queries were the main reason I wanted a transaction in the first place. If an error occurs during the second transaction, then all the older matching records have been deleted, but the newer versions are never saved. Living on the edge...
P.S. One other problem I had (and this is probably due to my ignorance of the platform): when I tried to bracket these queries with BEGIN TRAN; and COMMIT TRAN;, any following queries in the same thread got hung up for about 20-30 seconds immediately after this script finished. What am I doing wrong? Do I actually need these statements if all the queries are being executed at once?
We could use a bit more information, such as whether there is a unique constraint on your table that ignores duplicate inserts (IGNORE_DUP_KEY).
If the data is missing, it could be that an insert failed; this would register an entry in the Profiler event "User Error Message" under the "Errors and Warnings" event class. Create a trace filtered to this login only, check each statement, and see whether any user errors are raised in the trace.
If you have other processes running (other applications or threads), it is possible that after you inserted the records, another process deleted those rows without your knowledge. In that case, you might want to set up a trigger to log all update and delete actions on the table and see which user is performing those actions. In short, if you think you have lost data, either the command was not executed, it executed with an error, or the data was deleted by another process after execution.
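As a rough illustration of the logging-trigger idea (dbo.YourTable, its Id column and dbo.DeleteAudit are placeholder names, not a prescribed design):
-- Placeholder names; adapt to the real table and key column.
CREATE TABLE dbo.DeleteAudit
(
    AuditId   INT IDENTITY(1,1) PRIMARY KEY,
    DeletedId INT,
    DeletedAt DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
    DeletedBy SYSNAME   NOT NULL DEFAULT SUSER_SNAME()
);
GO
CREATE TRIGGER trg_YourTable_Delete ON dbo.YourTable
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Record which rows were removed, when and by whom.
    INSERT INTO dbo.DeleteAudit (DeletedId)
    SELECT d.Id FROM deleted AS d;
END;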
The following statement takes at least 4 seconds:
INSERT INTO [SomeSmallTable]
SELECT * FROM ComplexView
WHERE [Date] = convert(datetime, '23/09/2020',103)
However, if we only run the SELECT part without the INSERT INTO, it takes less than half a second:
SELECT *
FROM ComplexView
WHERE [Date] = convert(datetime, '23/09/2020',103)
The view selects less than 200 rows, and the table called "SomeSmallTable" holds only a few rows. I think this issue started when we updated the view called "ComplexView". ComplexView is based on other views (and some of these views are based on other views itself), as well as some tables.
I tried to refresh all views using sp_refreshview, but to no avail.
How can we determine the cause of this issue and hopefully solve it?
[EDIT]
My reply to some comments:
@Dale K: I can't post the execution plans; I think they're way too complex, and not relevant as they are identical for both statements, with or without the INSERT part, except for the Table Insert operator. But I did see that the INSERT costs 100%. For some reason SQL Server has trouble inserting the view results into the table.
@Panagiotis Kanavos: Nobody but me is using the database. It's a copy of our client's database and I'm working on it on my local machine.
@gotqn: SomeSmallTable is a table, so no table variable or temporary table. However, it is created when a user opens a specific form in our application, and deleted when the user closes this form.
@Arvo: SomeSmallTable has no keys and no triggers. The view returns less than 200 rows which are inserted into this table, and before they are inserted the table is empty.
I followed the steps in the accepted answer, and eventually compared the current "ComplexView" with the previous version, and found out what caused this issue.
Checking the execution plan is the first step, as others have said. Given that the INSERT (rather than the query) is causing the delay, you could troubleshoot that further. Here are some things you can try:
Try using STATISTICS IO to find out more, as answered here (see the sketch after this list).
Attempt an INSERT using static data (e.g. INSERT INTO [SomeSmallTable] VALUES (1, 2, '...etc');). This will tell you whether the issue affects any INSERT statement, or only inserts from the view specifically.
Check how much data the view is returning. 4s may or may not be reasonable, depending on how many rows are being inserted.
Check the table design to see how it is using primary keys, foreign keys, composite keys, indexes, triggers, etc. Some of these features optimise a table's design for selecting, but make insertion slower as a trade-off. A good answer about this can be found here.
If you know it's not a load issue (because you're the only one using this database), check whether something else might be restricting resources on the machine you're using (other resource-intensive tasks, any other queries happening at the same time, scheduled jobs within SQL Server, etc.) You can use SQL Server Profiler to watch the queries in real time.
If slow performance is not limited to this particular query, then there are other general design considerations you can look into.
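To expand on the first two points, a minimal sketch of what that testing could look like (the static column list at the end is a placeholder for the real table definition):
-- Measure I/O and timing for the slow statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

INSERT INTO [SomeSmallTable]
SELECT * FROM ComplexView
WHERE [Date] = CONVERT(datetime, '23/09/2020', 103);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- Compare with a static insert to see whether INSERT itself is slow,
-- or only inserting from the view (column names/values are placeholders).
INSERT INTO [SomeSmallTable] (Col1, Col2)
VALUES (1, N'test');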
I'm using tracing to log all delete or update queries run through the system. The problem is, if I run a query like DELETE FROM [dbo].[Artist] WHERE ArtistId>280, I know how many rows were deleted but I'm unable to find out which rows were deleted (the data they had).
I'm thinking of doing this as a logging system so it would be useful to see which rows were affected and what data they had if at all possible. I don't really want to use triggers for this job but I will if I have to (and if it's feasible).
If you need the original data and are planning on storing all the deleted data in a separate table, why not just logically delete the original data rather than physically deleting it? i.e.
UPDATE dbo.Artist SET Artist_deleted = 1 WHERE ArtistId>280
Then you only need to add one column to your current table rather than creating new tables and scripts to support them. You could then partition the current table based on the deleted flag if you are worried about disk space, performance, etc.
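A small sketch of that soft-delete approach, assuming the flag column is called Artist_deleted (the Name column below is just a placeholder):
-- Add the flag once, defaulting existing rows to "not deleted".
ALTER TABLE dbo.Artist ADD Artist_deleted BIT NOT NULL DEFAULT 0;
GO
-- "Delete" by flagging instead of removing rows, so the data is kept for auditing.
UPDATE dbo.Artist SET Artist_deleted = 1 WHERE ArtistId > 280;

-- Existing queries then filter out the flagged rows, e.g.:
SELECT ArtistId, Name
FROM dbo.Artist
WHERE Artist_deleted = 0;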
I'm currently changing the Id field of a table to be an IDENTITY field. This is simple: create a temp table, copy all the data to the temp table, adjust all references to and from the table to point to and from the new temp table, drop the old table, and rename the temp table to the original name.
Now I've got the problem that the copy step is taking too long. Actually the table doesn't have too many entries (~7.5 million rows), but it still takes multiple hours to do this.
I'm currently moving the data with a query like this:
SET IDENTITY_INSERT MyTable_Temp ON
INSERT INTO MyTable_Temp ([Fields]) SELECT [Fields] FROM MyTable
SET IDENTITY_INSERT MyTable_Temp OFF
I've had a look at bcp in combination with xp_cmdshell and a following BULK INSERT, but I don't like the solution of first writing the data to a temp file and afterwards dumping it back into the new table.
Is there a more efficient way to copy or move the data from the old to the new table? And can this be done in "pure" T-SQL?
Keep in mind, the data is correct (no external sources involved) and no changes are being made to the data during transfer.
Your approach seems fair, but the transaction generated by the insert command is too large and that is why it takes so long.
My approach when dealing with this in the past was to use a cursor and a batching mechanism: perform the operation for only 100,000 rows at a time, and you will see major improvements.
After the copy is made you can rebuild your references and eventually remove the old table... and so on. Be careful to reseed your new table accordingly after the data is copied.
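A rough batched version of the copy, using a keyed WHILE loop rather than a cursor; [OtherFields] is a placeholder for the real column list, and the batch size is just illustrative:
SET IDENTITY_INSERT dbo.MyTable_Temp ON;

DECLARE @LastId INT = 0, @BatchSize INT = 100000;

WHILE 1 = 1
BEGIN
    -- Copy the next slice of rows, keyed on Id so each batch picks up where the last stopped.
    INSERT INTO dbo.MyTable_Temp (Id, [OtherFields])
    SELECT TOP (@BatchSize) s.Id, s.[OtherFields]
    FROM dbo.MyTable AS s
    WHERE s.Id > @LastId
    ORDER BY s.Id;

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to copy

    SELECT @LastId = MAX(Id) FROM dbo.MyTable_Temp;
END

SET IDENTITY_INSERT dbo.MyTable_Temp OFF;

-- Fix the identity seed so new rows continue after the highest copied Id.
DBCC CHECKIDENT ('dbo.MyTable_Temp', RESEED);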
I have a table that contains over a million records (products).
Now, daily, I need to either update existing records and/or add new ones.
Instead of doing it one by one (which takes a couple of hours), I managed to use SqlBulkCopy to work with batches of records and do my inserts in a matter of seconds, but it can only handle new inserts. So I am thinking about creating a new table that contains both the new and the old records, and then using that temporary table (on the SQL end) to update/add to the main table.
Any advice how can I perform that update?
One of the better ways to handle this is with the MERGE statement in SQL. MSSQLTips has a good tutorial on it; it can be a bit trickier to use than some of the other commands.
Also, due to locking you may want to break this up into multiple smaller transactions, unless you know you can tolerate blocking during the update.
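A hedged sketch of what that MERGE could look like, where #IncomingProducts and the column names are placeholders for whatever the bulk-copied staging table actually holds:
MERGE dbo.Products AS target
USING #IncomingProducts AS source
    ON target.ProductId = source.ProductId
WHEN MATCHED THEN
    UPDATE SET target.Name  = source.Name,
               target.Price = source.Price
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductId, Name, Price)
    VALUES (source.ProductId, source.Name, source.Price);  -- MERGE must end with a semicolon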
We handle this situation in our code in the way you described: we have a temp table, then run an update where the ID in the temp table matches the table to be updated, then run an insert for temp-table IDs that have no match in the table to be updated (i.e. the joined ID is null). We normally do this for updates to library/program settings, though, so it is only run infrequently, on smaller tables. Performance may not be up to par for that many records, or daily runs.
The main "gotcha" I've encountered with this method is that for the update, we did a comparison to make sure at least one of several fields changed before actually running the update. (Our initial reason for this was to avoid overwriting some defaults, which could affect server behavior. Your reason for this might be performance, if your temp table could contain records that haven't actually changed). We encountered a case where we did actually want to update one of the defaults, but our old script didn't catch that. So if you do any comparisons to determine which products you want to update, make sure it is either complete from the start, or document well any fields you don't compare, and why.