I am looking into the last_user_update column in the sys.dm_db_index_usage_stats DMV.
I can see that even when I run
DELETE [table] WHERE 1=0
the last_user_update is being updated.
Ideas?
From the documentation, the user_updates column is defined as:
Number of updates by user queries. This includes Insert, Delete, and Updates representing number of operations done not the actual rows affected. For example, if you delete 1000 rows in one statement, this count increments by 1.
And last_user_update is defined as:
Time of last user update.
Therefore, if you DELETE 0 rows or 1,000,000 rows, the values will be updated, as this relates to the operation, not the rows. A DELETE that results in 0 rows being deleted is still a DELETE operation, whether the WHERE clause has something that can never be true (0 = 1) or something more "reasonable" such as WHERE SomeColumn = @SomeVariable.
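As a quick way to see this, a minimal sketch (dbo.SomeTable is a placeholder name):
SELECT user_updates, last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID()
  AND object_id = OBJECT_ID('dbo.SomeTable');

DELETE dbo.SomeTable WHERE 1 = 0;   -- affects 0 rows
Re-running the SELECT shows user_updates incremented by 1 and last_user_update bumped, even though no rows were deleted.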
I am trying to delete millions of records from 4 databases, and running into an unexpected error. I made a temp table that holds a list of all the id's I wish to delete:
CREATE TABLE #CaseList (case_id int)
INSERT INTO #CaseList
SELECT DISTINCT id
FROM my_table
WHERE <my criteria for choosing cases>
I have deleted all the associated records (with foreign key on case_id)
DELETE FROM image WHERE case_id in (SELECT case_id from #CaseList)
Then I'm deleting records from my_table in batches (so as not to blow up the transaction log, which, despite my database being in the simple recovery model, still grows when making changes like deletions):
DELETE FROM my_table WHERE id in (SELECT case_id
FROM #CaseList
ORDER by case_id
OFFSET 0 ROWS
FETCH NEXT 10000 ROWS ONLY)
This will work fine for one or three or five rounds (so I've deleted 10k-50k records), then will fail with this error message:
Msg 512, Level 16, State 1, Procedure trgd_image, Line 188
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Which is really weird because as I said, I already deleted all the associated records from the image table. Then it gets weirder because if I select smaller batches, the deletion works without error.
I generally cut the FETCH NEXT n in half (5k), then in half again (2500), then in half again (1200), etc. until it works:
DELETE FROM my_table WHERE id in (SELECT case_id
FROM #CaseList
ORDER by case_id
OFFSET 50000 ROWS
FETCH NEXT 1200 ROWS ONLY)
Then repeat that amount until I get past where it failed, then turn it back up to 10000 and it will work again for a batch or three...
DELETE FROM my_table WHERE id in (SELECT case_id
FROM #CaseList
ORDER by case_id
OFFSET 60000 ROWS
FETCH NEXT 10000 ROWS ONLY)
then fail again with the same error... rinse, wash, and repeat.
What can cause that subquery error when there are NOT related records in the image table? Why would selecting the cases in smaller batches work "around it" and then allow larger batches again?
I would really love a solution to this so I can make a WHILE loop and run this deletion through the millions of rows that way, instead of having to manage it manually, which is going to take me weeks with millions of rows needing to be deleted out of 4 databases.
The query you're showing cannot produce the error you're seeing. If you're sure it is, you have a bug report. My guess is that at trgd_image, line 188 (or somewhere nearby), you'll find you're using a scalar comparison, = instead of IN.
I also have some advice for you, free for the asking. I wrote lots of queries like yours, and never used anything like OFFSET 60000 ROWS FETCH NEXT 10000 ROWS ONLY. You don't need to, either, and your SQL will be easier to write if you don't.
First, unless your machine is seriously undersized in 2018 for the scale of data you're using, I think you'll find 100,000-row transactions are just fine. If not, at least try to understand why not. A machine managing many millions of rows ought to be able to deal with 1% of them without breaking a sweat.
When you populate #CaseList, trap @@ROWCOUNT. Then you can print/record that and compute the number of "chunks" in your work.
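For example, a sketch of that bookkeeping, reusing the INSERT from the question (the 100,000 batch size is just an illustration):
DECLARE @TotalRows int, @BatchSize int = 100000;

INSERT INTO #CaseList (case_id)
SELECT DISTINCT id
FROM my_table
WHERE <my criteria for choosing cases>;

SET @TotalRows = @@ROWCOUNT;   -- trap it immediately, before any other statement runs
PRINT CONCAT('rows to delete: ', @TotalRows,
             ', chunks: ', CEILING(@TotalRows * 1.0 / @BatchSize));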
Ideally, though, there's no temporary table. Instead, those cases probably have some logical grouping you can operate on. They might have regions or owners or dates, whatever was used to select them in the first place. Iterate over that, e.g.
delete from T where id in (select id from S where user = 1)
Once you do that, you can write a loop:
select @user = min(user) from S where ...
while @user is not null begin
    print concat('deleting cases for user ', @user)
    delete from T where id in (select id from S where user = @user)
    select @u = @user
    select @user = min(user) from S where ... and user > @u
end
That way, if the process blows up partway through -- for any reason -- you have a logical grouping of deletions and a clean break: you know all the cases for user (or whatever) less than @user are deleted, and you can look into what's wrong with the "current" one. Quite often, you'll discover that the problem isn't unique, and by solving it you'll prevent future problems with others.
I'm trying to understand the behavior of an UPDATE/REPLACE that I'm carrying out that is removing some invalid data and replacing with preferred data.
The UPDATE executes normally and does what it needs to do, but the rows affected are not what I expected in some cases (I'm carrying this out on multiple databases).
I've put part of the script below (The rest is essentially replicating the same function across multiple tables)
UPDATE TBL_HISTORY
SET DETAILS = REPLACE(DETAILS,'"','Times New Roman')
WHERE HISTORYID IN
(SELECT TOP 1000 (HISTORYID) FROM TBL_HISTORY
WHERE DETAILS LIKE '%"%')
GO
What I'd expect the script above to do is select the TOP 1000 records in TBL_HISTORY that contain the unwanted string of data and carry out the REPLACE on them.
The result has been that, in cases where there are more than 1000 matching rows, it updates all of them, returning a value of 1068 rows affected, for example.
HISTORYID is the PK on the table. Am I misunderstanding how this should work? Any guidance would be appreciated.
Try this instead (it is faster). If it still updates more than 1000 rows, it is due to a trigger. If it updates exactly 1000 rows, then HISTORYID is not the only column in the primary key (it is a composite primary key).
;WITH CTE AS
(
    SELECT TOP 1000
        DETAILS
    FROM TBL_HISTORY
    WHERE DETAILS LIKE '%"%'
)
UPDATE CTE
SET DETAILS = REPLACE(DETAILS, '"', 'Times New Roman')
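If you would rather skip the CTE, T-SQL also allows TOP directly on an UPDATE; a minimal equivalent sketch (the 1000 rows picked are arbitrary, since UPDATE TOP does not accept ORDER BY):
UPDATE TOP (1000) TBL_HISTORY
SET DETAILS = REPLACE(DETAILS, '"', 'Times New Roman')
WHERE DETAILS LIKE '%"%'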
My main question is: in a single table, does the number of records NOT included in a WHERE clause affect query performance of SELECT, INSERT, and UPDATE?
Say I have a table with 20 million rows, and this table has an indexed error string column.
Pretend 19,950,000 of those records have 0 set for this column, and 50,000 have it set to NULL.
My query does SELECT * FROM pending_emails WHERE error IS NULL.
After some logic in my app, I then need to update those same records by ID to set their error:
UPDATE "pending_emails" SET "error" = '0' WHERE "pending_emails"."id" = 46
UPDATE "pending_emails" SET "error" = '0' WHERE "pending_emails"."id" = 50
I'm trying to determine if I can leave 'completed' records in the database without affecting performance of the active records I'm working with, or if I should delete them (not preferred).
Typically no. That's the purpose of indexing. You might want to consider a partial (filtered) index for this column: https://www.postgresql.org/docs/current/static/indexes-partial.html Then your index isn't even indexing the '0' rows at all.
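A minimal sketch using the PostgreSQL syntax from that link and the table/column names from the question (the index name is just a placeholder):
CREATE INDEX idx_pending_emails_error_null
    ON pending_emails (id)
    WHERE error IS NULL;
Only the rows with error IS NULL are stored in the index, so it stays small no matter how many '0' rows accumulate.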
I have an Access database table which sometimes contains duplicate ProfileIDs. I would like to create a query that excludes one (or more, if necessary) of the duplicate records.
The condition for a duplicate record to be excluded is: if the PriceBefore and PriceAfter fields are NOT equal, they are considered duplicate. If they are equal, the duplicate field remains.
In the example table above, the records with ID 7 and 8 have the same ProfileIDs. For ID 8, PriceBefore and PriceAfter are not equal, so this record should be excluded from the query. For ID 7, the two are equal, so it remains. Also note that PriceBefore and PriceAfter for ID 4 are the same, but as the ProfileID is not a duplicate, the record must remain.
What is the best way to do this? I am happy to use multiple queries if necessary.
Create a pointer query. Call it pQuery:
SELECT ProfileID, Sum(1) as X
FROM MyTableName
GROUP BY ProfileID
HAVING Sum(1) > 1
This will give you the ProfileID of every record that's part of a dupe.
Next, find the records where prices don't match. Call this pNoMatchQuery:
SELECT MyTableName.*
FROM MyTableName
INNER JOIN pQuery
ON pQuery.ProfileID = MyTableName.ProfileID
WHERE PriceBefore <> PriceAfter
You now have a query of every record that should be excluded from your dataset. If you want to permanently delete all of these records, run a DELETE query that checks your source table against pNoMatchQuery:
Delete MyTableName.*
From MyTableName
Where Exists( Select 1 From pNoMatchQuery Where pNoMatchQuery.ID = MyTableName.ID ) = True
First, make absolutely sure that pQuery and pNoMatchQuery are returning what you expect before you delete anything from your source table, because once it's gone it's gone for good (unless you make a backup first, which I would highly suggest before you run that delete the first time).
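If you only want a query that returns the table without the offending duplicates (rather than deleting them), a sketch along the same lines, assuming the same ID column used in pNoMatchQuery above:
SELECT MyTableName.*
FROM MyTableName
WHERE Not Exists( Select 1 From pNoMatchQuery Where pNoMatchQuery.ID = MyTableName.ID )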
I need to keep a running count of rows in a very large database. The row count is needed often enough in my program that running Count(*) each time is too slow, so I will just keep a running count to get around this in SQLite.
CREATE TRIGGER RowCountUpdate AFTER INSERT ON LastSample
BEGIN
UPDATE BufferControl SET NumberOfSamples = NumberOfSamples +
(SELECT Count(*) FROM Inserted);
END;
So from here I want to take the current number of rows (NumberOfSamples) and increment it by however many rows were affected by the insert (and do the same for DELETE, decrementing). In the C API of SQLite, this is done with sqlite3_changes(). However, I cannot use that function here in this script. I looked around and saw that some were using SELECT Count(*) FROM Inserted, but I don't think SQLite supports that.
Is there any statement that Sqlite recognizes that holds the amount of rows that were affected by the INSERT and DELETE queries?
SQLite has the changes() SQL function, but like the sqlite3_changes() API function, it reports the number of rows of the last completed statement.
During trigger execution, the triggering statement is not yet completed.
Just use a FOR EACH ROW trigger, and add 1 for each row.
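A minimal sketch of that approach, using the table and column names from the question (the trigger names are illustrative; SQLite triggers fire per row, so each firing adjusts the counter by exactly 1):
CREATE TRIGGER RowCountInsert AFTER INSERT ON LastSample
FOR EACH ROW
BEGIN
    UPDATE BufferControl SET NumberOfSamples = NumberOfSamples + 1;
END;

CREATE TRIGGER RowCountDelete AFTER DELETE ON LastSample
FOR EACH ROW
BEGIN
    UPDATE BufferControl SET NumberOfSamples = NumberOfSamples - 1;
END;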