I need to keep a running count of rows in a very large SQLite database. The row count is needed often enough in my program that running Count(*) every time is too slow, so I want to keep a running count to get around this.
CREATE TRIGGER RowCountUpdate AFTER INSERT ON LastSample
BEGIN
UPDATE BufferControl SET NumberOfSamples = NumberOfSamples +
(SELECT Count(*) FROM Inserted);
END;
So from here I want to take the current number of rows (NumberOfSamples) and increment it by the number of rows affected by the insert (and do the same with DELETE, decrementing). In the SQLite C API this is done with sqlite3_changes(), but I cannot use that function here in this script. I looked around and saw that some people were using SELECT Count(*) FROM Inserted, but I don't think SQLite supports that.
Is there any statement that Sqlite recognizes that holds the amount of rows that were affected by the INSERT and DELETE queries?
SQLite has the changes() SQL function, but like the sqlite3_changes() API function, it reports the number of rows changed by the last completed statement.
During trigger execution, the triggering statement is not yet completed.
Just use a FOR EACH ROW trigger, and add 1 for each row.
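In SQLite, triggers run FOR EACH ROW by default, so a sketch along those lines (using the table names from the question; the trigger names are just examples) could look like this:
CREATE TRIGGER RowCountInsert AFTER INSERT ON LastSample
BEGIN
    -- runs once per inserted row
    UPDATE BufferControl SET NumberOfSamples = NumberOfSamples + 1;
END;
CREATE TRIGGER RowCountDelete AFTER DELETE ON LastSample
BEGIN
    -- runs once per deleted row
    UPDATE BufferControl SET NumberOfSamples = NumberOfSamples - 1;
END;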
I am trying to delete millions of records from 4 databases, and running into an unexpected error. I made a temp table that holds a list of all the id's I wish to delete:
CREATE TABLE #CaseList (case_id int)
INSERT INTO #CaseList
SELECT DISTINCT id
FROM my_table
WHERE <my criteria for choosing cases>
I have deleted all the associated records (with foreign key on case_id)
DELETE FROM image WHERE case_id in (SELECT case_id from #CaseList)
Then I'm deleting records from my_table in batches (so as not to blow up the transaction log, which, despite the database being in simple recovery mode, still grows when making changes like deletions):
DELETE FROM my_table WHERE id in (SELECT case_id
FROM #CaseList
ORDER by case_id
OFFSET 0 ROWS
FETCH NEXT 10000 ROWS ONLY)
This works fine for one or three or five rounds (so I've deleted 10k-50k records), then fails with this error message:
Msg 512, Level 16, State 1, Procedure trgd_image, Line 188
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Which is really weird because, as I said, I already deleted all the associated records from the image table. It gets weirder still: if I select smaller batches, the deletion works without error.
I generally cut the FETCH NEXT n in half (5k), then in half again (2500), then in half again (1200), etc. until it works:
DELETE FROM my_table WHERE id in (SELECT case_id
FROM #CaseList
ORDER by case_id
OFFSET 50000 ROWS
FETCH NEXT 1200 ROWS ONLY)
Then I repeat at that size until I get past where it failed, then turn it back up to 10000 and it will work again for a batch or three...
DELETE FROM my_table WHERE id in (SELECT case_id
FROM #CaseList
ORDER by case_id
OFFSET 60000 ROWS
FETCH NEXT 10000 ROWS ONLY)
then it fails again with the same error... rinse, wash, and repeat.
What can cause that subquery error when there are no related records in the image table? Why would selecting the cases in smaller batches work around it and then allow larger batches again?
I would really love a solution to this so I can wrap it in a WHILE loop and run the deletion through the millions of rows that way, instead of managing it manually, which is going to take me weeks with millions of rows to delete out of 4 databases.
The query you're showing cannot produce the error you're seeing. If you're sure it does, you have a bug report. My guess is that at trgd_image, Line 188 (or somewhere nearby) you'll find you're using a scalar comparison, = instead of IN.
I also have some advice for you, free for the asking. I wrote lots of queries like yours, and never used anything like OFFSET 60000 ROWS FETCH NEXT 10000 ROWS ONLY. You don't need to, either, and your SQL will be easier to write if you don't.
First, unless your machine is seriously undersized for 2018 for the scale of data you're using, I think you'll find 100,000-row transactions are just fine. If not, at least try to understand why not. A machine managing many millions of rows ought to be able to deal with 1% of them without breaking a sweat.
When you populate #CaseList, trap @@rowcount. Then you can print/record that, and compute the number of "chunks" in your work.
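For example, right after the INSERT INTO #CaseList above (a sketch; the 10,000 chunk size just mirrors the FETCH NEXT size you used):
DECLARE @rows int = @@ROWCOUNT;
PRINT CONCAT('rows to delete: ', @rows, ', chunks of 10000: ', (@rows + 9999) / 10000);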
Ideally, though, there's no temporary table. Instead, those cases probably have some logical grouping you can operate on. They might have regions or owners or dates, whatever was used to select them in the first place. Iterate over that, e.g.
delete from T where id in (select id from S where user = 1)
Once you do that, you can write a loop:
declare @user int, @u int
select @user = min(user) from S where ...
while @user is not null begin
    print 'deleting cases for user ' + cast(@user as varchar(20))
    delete from T where id in (select id from S where user = @user)
    select @u = @user
    select @user = min(user) from S where ... and user > @u
end
That way, if the process blows up partway through -- for any reason -- you have a logical grouping of deletions and a clean break: you know all the cases for user (or whatever) less than #user are deleted, and you can look into what's wrong with the "current" one. Quite often, you'll discover that the problem isn't unique, and by solving it you'll prevent future problems with others.
I am currently performing analysis on a client's MSSQL Server. I've already fixed many issues (unnecessary indexes, index fragmentation, NEWID() being used for identities all over the shop, etc.), but I've come across a specific situation that I haven't seen before.
Process 1 imports data into a staging table, then Process 2 copies the data from the staging table using an INSERT INTO. The first process is very quick (it uses BULK INSERT), but the second takes around 30 mins to execute. The "problem" SQL in Process 2 is as follows:
INSERT INTO ProductionTable(field1,field2)
SELECT field1, field2
FROM SourceHeapTable (nolock)
The above INSERT statement inserts hundreds of thousands of records into ProductionTable, each row allocating a UNIQUEIDENTIFIER, and inserting into about 5 different indexes. I appreciate this process is going to take a long time, so my issue is this: while this import is taking place, a 3rd process is responsible for performing constant lookups on ProductionTable - in addition to inserting an additional record into the table as such:
INSERT INTO ProductionTable(fields...)
VALUES(values...)
SELECT *
FROM ProductionTable (nolock)
WHERE ID = @Id
For the 30 or so minutes that the INSERT...SELECT above is taking place, the INSERT INTO times out.
My immediate thought is that SQL Server is locking the entire table during the INSERT...SELECT. I did quite a lot of profiling on the server during my analysis, and there are definitely locks being held for the duration of the INSERT...SELECT, though I don't remember what type they were.
Having never needed to insert records into a table from two sources at the same time - at least during an ETL process - I'm not sure how to approach this. I've been looking up INSERT table hints, but most are being made obsolete in future versions.
It looks to me like a CURSOR is the only way to go here?
You could consider BULK INSERT for Process-2 to get the data into the ProductionTable.
Another option would be to batch Process-2 into small batches of around 1000 records and use a Table Valued Parameter to do the INSERT. See: http://msdn.microsoft.com/en-us/library/bb510489.aspx#BulkInsert
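For illustration, a rough sketch of the TVP route (the type and procedure names here are made up, not from the original post; the field list comes from the INSERT above):
-- Hypothetical names: ProductionRowType / usp_InsertProductionBatch
CREATE TYPE dbo.ProductionRowType AS TABLE (field1 int, field2 int);
GO
CREATE PROCEDURE dbo.usp_InsertProductionBatch
    @Rows dbo.ProductionRowType READONLY
AS
BEGIN
    -- one small set-based insert per ~1000-row batch passed in from the client
    INSERT INTO ProductionTable (field1, field2)
    SELECT field1, field2 FROM @Rows;
END
GO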
It sounds like a table lock. Try inserting in portions in the ETL process. Something like:
while 1=1
begin
INSERT INTO ProductionTable(field1,field2)
SELECT top (1000) field1, field2
FROM SourceHeapTable sht (nolock)
where not exists (select 1 from ProductionTable pt where pt.id = sht.id)
-- optional
--waitfor delay '00:00:01.0'
if @@rowcount = 0
break;
end
I have a SQL Server 2008 table with 80,000 rows and am executing the following query:
UPDATE dbo.TableName WITH (ROWLOCK)
SET HelloWorldID = NULL
WHERE HelloWorldID = @helloWorldID
HelloWorldID is an int and the @helloWorldID parameter is also an int.
The query is taking too long and I'd like to optimize it. I created a nonclustered index on HelloWorldID, but it didn't help. I may have to redesign this... maybe move HelloWorldID to another table that links it to the TableName table?
Since the command you're waiting on is a DELETE, I have to guess that there is a trigger on dbo.TableName and that it is performing additional work you do not expect. Or perhaps some CASCADE option is affecting other tables that have triggers on them.
It all depends on how many rows will be updated by this query.
If you're updating a lot of rows, say 30% of the table, then the index will actually slow down the query (the index has to be updated along with the table, and it won't help with filtering the rows for the update). ROWLOCK will also slow it down, because the engine will issue a separate lock for each row (as opposed to the page locks that would be taken normally).
Try removing the index and running this update using WITH(TABLOCK) just to see what happens.
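For example (a sketch reusing the names from the question):
UPDATE dbo.TableName WITH (TABLOCK)
SET HelloWorldID = NULL
WHERE HelloWorldID = @helloWorldID;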
I get this problem sometimes. Your query depends on simultaneously getting a write lock on every row in the table that meets the conditions of the WHERE clause. Depending on your need for full 'ACID' behavior, you could do something like this:
SELECT getdate() -- force @@rowcount = 1
while @@rowcount > 0
    UPDATE TOP (1000) dbo.TableName
    SET HelloWorldID = NULL
    WHERE HelloWorldID = @helloWorldID
This will do the update in smaller chunks and help overcome locking issues. But remember, this method gives up on doing the query as a single transaction. You will need to tune the 1000 to a value that is right for your server.
I am updating 2 columns in a table that contains millions of rows (85 million). To update them I am using an update command like:
UPDATE Table1
SET Table1.column1 = Table2.column1 ,
Table1.column2 = Table2.column2
FROM Table1
JOIN Table2 ON <join conditions>;
Now my problem is that it is taking 23 hours. Even after batching the updates there is not much change in the time taken.
But I need to update it in less than 5 hours. Is that possible? What approach should I take to achieve it?
SQL UPDATE statements have to log all the affected rows so the statement can roll back on failure. As explained by this guy, the best way to handle millions of rows is to forget about atomicity and batch your updates into 50,000 rows (or whatever):
--Declare variable for row count
Declare @rc int
Set @rc = 50000

While @rc = 50000
Begin
    Begin Transaction

    --Use Top (50000) to limit the number of updates
    --performed in each batch to 50K rows.
    --Use tablockx and holdlock to obtain and hold
    --an immediate exclusive table lock. This usually
    --speeds up the update because only one lock is needed.
    Update Top (50000) mt
    Set UpdFlag = 0
    From MyTable mt With (tablockx, holdlock)
    Join ControlTable ct
        On mt.KeyCol = ct.PK
    --Add criteria to avoid updating rows that
    --were updated in a previous pass
    Where mt.UpdFlag <> 0

    --Get the number of rows updated.
    --The process will continue until fewer than 50000 rows are updated.
    Select @rc = @@rowcount

    --Commit the transaction
    Commit
End
This still has some problems in that you need to know which rows you've already handled, perhaps someone smarter than this guy (and me!) can figure something nicer with more MSSQL magic; but this should be a start.
I have used SSIS for doing this task.
First I took the source table in which the 2 columns have to be updated. Then I added a Lookup task that maps the source columns to the destination table columns from which the data to update them comes. Finally, I added an OLE DB destination that fills the table based on the join conditions from the lookup.
This process was much faster than executing an update script.
I want to do some calculations when my table data is changed. However, I am updating my table manually by copy-pasting about 3000 rows at once. That makes my trigger fire 3000 times, but I want it to run only once.
Is there a way to do that?
Thanks.
What's your statement?
If you have multiple insert statements, then it will fire for each insert you run. If you want it to execute only once for multiple inserts, you must:
Write your insert as a single statement, such as insert into foo select * from bar
Make sure your trigger is not defined for each row
Another possibility might be:
Disable the trigger
Perform your insertions
Put the trigger code in a stored procedure
Run your stored procedure
If by 'manually' you mean you are copying and pasting into some user interface tool (like an Access data grid) or something like that, then the tool may be issuing one insert statement per row, and in that case you are out of luck: the database trigger will be executed once per insert statement. As other answers have mentioned, if you can insert the rows directly into the database using a single insert statement, then the trigger will only fire once.
The issue is caused by you manually pasting the 3000 rows. You really have 2 solutions. You can turn off the trigger by doing this:
ALTER TABLE tablename DISABLE TRIGGER ALL
-- do work here
ALTER TABLE tablename ENABLE TRIGGER ALL
and then run the contents of your trigger at the end, or you can put your 3000 rows into a temp table and then insert them all at once. That way the trigger fires only once. If this isn't enough, please give us more info on what you are trying to do.
Your trigger will not fire 3000 times if you are modifying 3000 rows in a single statement. Your trigger will fire once, and there will be 3000 rows in the virtual inserted (or deleted) table.
If you are limited to importing the data with a tool that does a single insert per row, I would suggest importing the data into an intermediate table and then doing the insert into the final table from the intermediate one.
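A rough sketch of that staging approach (all table and column names here are illustrative, not from the question):
-- Stage the pasted rows in a table with no trigger, then move them in one
-- set-based statement so the trigger on the final table fires only once.
CREATE TABLE staging_rows (col1 int, col2 varchar(50));

-- ... paste / import the 3000 rows into staging_rows here ...

INSERT INTO final_table (col1, col2)
SELECT col1, col2 FROM staging_rows;

TRUNCATE TABLE staging_rows;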
There is a better way.
Re-join the rows your trigger receives to the table it inserts into or alters, compare them against what is already there or expected to change, and skip the trigger's work with a simple WHERE clause.
E.g. this is a trigger I use to INSERT a record, but only once, based on a column value existing (@ack).
DECLARE @ack INT
SELECT @ack = (SELECT TOP 1 i.CUSTOM_BOOL_1 AS [agent_acknowledged] FROM inserted AS i)
IF @ack = 1
BEGIN
    INSERT INTO TABLEA(
        COLA, COLB, etc
    )
    SELECT
        i.COLA, i.COLB, etc
    FROM inserted AS i
    LEFT JOIN TABLEA AS chk --re-join to TABLEA to see if the record already exists
        ON chk.COLA = i.COLA
        AND chk.COLB = i.COLB
        AND etc
    WHERE chk.ID IS NULL --and here we say: if NOT found, then continue to insert
END