Delete a few entries from a table in SQL Server

I have 42,715,078 entries in one of my tables, and I would like to delete the top 42,715,000 rows (I want to keep just 78 entries).
Does anyone know how I can do that?
PS: I don't want to drop the table, just delete the entries in it.

Probably your best bet is to select out the 78 rows you want to keep into a temporary table, then truncate the table and insert them back in.
SELECT * INTO #temp FROM TableName WHERE <Condition that gets you the 78 rows you want>
Or if you don't have a specific 78 rows
SELECT TOP 78 * INTO #temp FROM TableName
Then
TRUNCATE TABLE TableName
And last but not least
INSERT INTO TableName
SELECT * FROM #temp
Doing it this way should be considerably faster (depending on the condition you use to get the 78 rows), and you avoid bloating the log, since TRUNCATE is only minimally logged.

We have an activity log that we truncate once a month. (We keep the monthly backups, so we can get back to any old data if we want to.) If your table is growing every month and you want to keep it small like we do with ours, you can set up a SQL Agent Job to run each month.
We only remove 5,000 rows at a time to keep the load on the database low, so this job runs every two minutes for an hour. That gives it enough time to remove all of the oldest rows without locking the database.
DECLARE @LastDate DATETIME -- We remove the oldest rows by month
DECLARE @NumberOfRows INT -- Number of rows to delete per run
-- Set the date to the current date minus 3 months.
SET @LastDate = DATEADD(MM, -3, GETDATE())
-- Since it runs on the first Saturday of each month, this gets it back to the first of the month.
SET @LastDate = CAST(CAST(DATEPART(YYYY, @LastDate) AS varchar) + '-' + CAST(DATEPART(MM, @LastDate) AS varchar) + '-01' AS DATETIME)
-- We use 5000.
SET @NumberOfRows = 5000
DELETE TOP (@NumberOfRows) FROM MyTable WHERE Created < @LastDate

I got it.
DELETE TOP (42715000)
FROM <tablename>
WHERE <condition>
It worked so well!
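If the log growth from a single 42-million-row DELETE becomes a problem, the same statement can be run in batches. A sketch, using the same placeholder table and condition style as above (the batch size of 100,000 is just an assumption to tune):

```sql
-- Batched variant of the one-shot DELETE TOP.
-- Each iteration is its own implicit transaction, so under SIMPLE
-- recovery the log can reuse space between batches instead of having
-- to hold all 42.7M row deletions at once.
DECLARE @Rows INT = 1
WHILE @Rows > 0
BEGIN
    DELETE TOP (100000)
    FROM <tablename>
    WHERE <condition>
    SET @Rows = @@ROWCOUNT
END
```

The trade-off is total runtime: many small transactions are slower end-to-end than one big one, but each batch holds its locks only briefly.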

Related

SQL Server : update trigger seeming to affect wrong column

Thanks for looking. I'm trying to write a SQL Server trigger so that when a new record containing date information is added, the day of the week is added to the DayOfWeek column. Here's my table, with the columns in order:
Food table:
FoodName **varchar(20)**
CategoryID (FK) **int**
Price **smallmoney**
StoreID (FK) **int**
Date **datetime**
DayOfWeek **varchar(9)**
ShopperID (FK) **int**
Week **int**
Here is the trigger I've written:
-- Create a trigger to update day of the week when a record is inserted
CREATE TRIGGER DOW
ON Food
FOR INSERT
AS
BEGIN
-- Declare a variable to hold the date ID
DECLARE @dateID DATETIME
-- Get the date from the new record and store it in @dateID
SELECT @dateID = Date FROM Food
-- Insert day of the week based on the inserted date
INSERT INTO Food (DayOfWeek)
SELECT DATENAME(dw, @dateID)
END
GO
SQL Server seemed to accept the procedure, but when I ran another procedure to insert a new record, I got this error:
Msg 515, Level 16, State 2, Procedure DOW, Line 8 [Batch Start Line 21]
Cannot insert the value NULL into column 'Week', table *******; column does not allow nulls. INSERT fails.
I am not sure why this trigger is affecting the 'Week' column at all. The code should take the value entered for the Date and use the DATENAME(dw,...) function to return the day of the week, which should go into the DayOfWeek column. I've written a stored procedure that accepts a date as input and inserts the corresponding day of the week into the record, and it works just fine, but this trigger doesn't seem to want to cooperate. I'm stumped!
What your trigger does:
it fetches a Date from your table (the last one returned), which is not necessarily the last inserted value.
it tries to insert a new record with just the DayOfWeek of that Date specified.
it fails, because at least the Week must also be specified.
I guess that you want to update the value of DayOfWeek for the inserted row(s) instead. To do that, there must be a way to identify the rows that need to be updated in the Food table from the values of the inserted rows. To be sure you update the correct rows, there should be a primary key that identifies them. Assuming you have such a primary key and that it's named FoodID, you probably wanted to do this:
CREATE TRIGGER DOW ON Food
FOR INSERT
AS
BEGIN
SET NOCOUNT ON;
-- update the day of the week for the inserted rows
UPDATE Food
SET [DayOfWeek] = DATENAME(dw, f.[Date])
FROM Food f
INNER JOIN inserted i ON f.FoodID = i.FoodID
END
GO
There are some major problems with your trigger. Inside a trigger, there is an inserted pseudo-table (on inserts and updates) and a deleted pseudo-table (on deletes and updates). You should be using these tables to know which records need to be updated.
This is bad because a single trigger invocation can cover multiple inserted rows, so this SQL simply will not work correctly if you insert more than one row:
DECLARE @dateID DATETIME
SELECT @dateID = Date FROM Food
This SQL is trying to insert a new row which is causing your NULL error
It is not trying to update the row you are inserting
INSERT INTO Food (DayOfWeek)
SELECT DATENAME(dw, @dateID)
It would need to be an INSTEAD OF trigger to avoid the NOT NULL constraint on the column. Wolfgang's answer will still cause a null constraint error, because AFTER triggers run after the data is inserted. An INSTEAD OF trigger runs in place of the actual insert.
CREATE TRIGGER DOW ON Food
INSTEAD OF INSERT
AS
BEGIN
SET NOCOUNT ON;
-- update the day of the week for the inserted rows
INSERT INTO Food (FoodName,CategoryID,Price,StoreID,[Date],ShopperID,[Week],[DayOfWeek])
SELECT
FoodName,CategoryID,Price,StoreID,[Date],ShopperID,[Week],DATENAME(dw, [Date]) AS [DayOfWeek]
FROM inserted
END
GO
Personally, I think storing the week and day of week is a bad idea. You already have a value that can derive that information (Date). Any time you have multiple columns that are essentially duplicate data, you will run into maintenance pain.
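One way to avoid storing the derived value at all is a computed column; a sketch against the table from the question (note this drops the existing stored column, so it assumes you no longer need the old stored values):

```sql
-- Replace the stored DayOfWeek column with a computed one.
-- DATENAME(dw, ...) is evaluated whenever the column is read,
-- so it can never get out of sync with Date, and no trigger is needed.
ALTER TABLE Food DROP COLUMN [DayOfWeek];
ALTER TABLE Food ADD [DayOfWeek] AS DATENAME(dw, [Date]);
```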

MS SQL Delete records older than month

I have data stored in an MS SQL database, and I want to delete all records older than a certain date.
For this, a service sends a request once a day, like:
delete from [log].[HttpRequestLogEntries] where DateTimeUtc < dateadd(day, -3, getutcdate())
It works fine, but very slowly. My table can contain over 10 million rows, and the delete may take hours.
What is the best way to solve this problem?
If there is not an existing index with a first column of [DateTimeUtc], you might try adding one. Indexing the column in the search criteria has improved mass delete performance on some of our databases. The trade-off is that inserts and updates may take additional time to maintain index entries.
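For example (the index name here is just a placeholder):

```sql
-- An index leading on DateTimeUtc lets the delete seek directly to the
-- old rows instead of scanning the whole table.
CREATE NONCLUSTERED INDEX IX_HttpRequestLogEntries_DateTimeUtc
    ON [log].[HttpRequestLogEntries] (DateTimeUtc);
```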
Consider deleting fewer rows at a time. If you delete more than 5,000 rows at once, the delete query may attempt to escalate to a table lock. If there is a lot of concurrent activity, the attempt to acquire a table lock may block while other requests complete.
For example, this loop deletes 4,000 rows maximum at a time:
declare @RowCount int = 1
while @RowCount > 0
begin
delete top (4000)
from [log].[HttpRequestLogEntries]
where DateTimeUtc < dateadd(day, -3, getutcdate())
select @RowCount = @@rowcount
end
Also, check for database triggers. If a trigger is firing when rows are deleted, it is possible code in the trigger is causing a long delay.

Stored procedure to archive data older than 6 months (180) days

We are trying to create a stored procedure to archive data older than 6 months (180 days) from our production database into a new archive database.
We also want to delete those archived rows from the production database.
We are thinking of using a while loop, but we want to archive only 10,000 rows a day, and we need to schedule it on a daily basis.
Can you please share your experience?
Thanks
Maybe DELETE with an OUTPUT clause would work for you? Found something useful here: https://msdn.microsoft.com/en-us/library/ms177564.aspx
USE AdventureWorks2012;
GO
DECLARE @MyTableVar TABLE
(
ProductID INT NOT NULL
);
DELETE TOP (10000) ph
OUTPUT DELETED.ProductID INTO @MyTableVar
FROM Production.ProductProductPhoto AS ph
WHERE DATEDIFF(DAY, ph.YourDay, GETDATE()) > 180
--Display the results of the table variable.
SELECT *
FROM @MyTableVar
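To turn this into the archive-and-delete procedure the question asks for, the OUTPUT clause can route the deleted rows through a table variable into the archive database. A sketch with entirely hypothetical table and column names, wrapped in a transaction so rows are never lost between the delete and the archive insert:

```sql
-- Hypothetical schema: production table dbo.Events(EventID, EventDate, Payload)
-- and an archive table ArchiveDb.dbo.Events with the same columns.
DECLARE @Moved TABLE (EventID INT, EventDate DATETIME, Payload VARCHAR(MAX));

BEGIN TRANSACTION;

-- Delete one day's quota of old rows, capturing them as they go.
DELETE TOP (10000)
OUTPUT DELETED.EventID, DELETED.EventDate, DELETED.Payload INTO @Moved
FROM dbo.Events
WHERE EventDate < DATEADD(DAY, -180, GETDATE());

-- Copy the captured rows into the archive database.
INSERT INTO ArchiveDb.dbo.Events (EventID, EventDate, Payload)
SELECT EventID, EventDate, Payload FROM @Moved;

COMMIT TRANSACTION;
```

Scheduled daily via a SQL Agent job, this moves at most 10,000 rows per run, as required.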

Update Primary table with String match from another table

This is my table structure
Temp_camp(id int identity,email varchar(100),shot_id bigint)
insert into Temp_camp (email) values ('xyz@gmail.com'),('y2k@yahoo.com'),('maaki@quora.com')
and other table structure
tb_adhar(shot_id bigint,email varchar(100))
insert into tb_adhar values (100,'xyz@gmail.com'),(200,'y2k@yahoo.com')
The tb_adhar table gets a bulk load of about 10^6 records every day after 6 PM,
so I need to update Temp_camp.shot_id with the shot_id from tb_adhar.
I wrote this query, but it takes quite a long time to process!
Here is the query:
update c
set shot_id = t.Shot_id from tb_adhar t join temp_camp c on t.email = c.email
I only want to know if there are bugs in the code; I don't want to mess up the client data!
What you could do to increase performance, if you haven't done so already: since you are joining the tables on email, you could add an index on the email column of both tables, and on the tb_adhar index include the shot_id column.
I should note that adding these indexes will reduce the speed at which inserts are done.
CREATE NONCLUSTERED INDEX Temp_camp_index_email
ON Temp_camp(email)
GO
CREATE NONCLUSTERED INDEX tb_adhar_index_email
ON tb_adhar (email)
INCLUDE (Shot_id);
GO
I think your code is OK but you're doing pretty big update and SQL Server doesn't like that much. I had similar issue and I resolved it by splitting the whole operation into smaller transactions. Try this out:
DECLARE
@ChunkSize int = 100000,
@ChunkNumber int,
@ChunkID int = 0
SELECT @ChunkNumber = COUNT(*) / @ChunkSize + 1
FROM tb_adhar
WHILE @ChunkID < @ChunkNumber
BEGIN
update c
set shot_id = t.Shot_id
from (select * from (
select tt.*,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) rnum
FROM tb_adhar tt) x
WHERE rnum BETWEEN @ChunkID * @ChunkSize AND (@ChunkID + 1) * @ChunkSize) t
join temp_camp c on t.email = c.email
SET @ChunkID += 1
END
One more thing: you're saying that tb_adhar table gets 10^6 new records every day. Try to find a way to identify only new records. Otherwise your query does the same thing every time - let's say it updates 20mil old rows to the same values and 1mil new rows - the actual useful work.
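For instance, if shot_id is NULL until a row has been matched, restricting the update to unmatched rows keeps the daily run proportional to the new data (this assumes unmatched rows really are NULL, not 0 or some sentinel):

```sql
-- Only touch rows that have not been assigned a shot_id yet;
-- already-matched rows are skipped instead of being rewritten daily.
UPDATE c
SET c.shot_id = t.shot_id
FROM temp_camp c
JOIN tb_adhar t ON t.email = c.email
WHERE c.shot_id IS NULL;
```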

SQL Server best way to calculate datediff between current row and next row?

I've got the following rough structure:
Object -> Object Revisions -> Data
The Data can be shared between several Objects.
What I'm trying to do is clean out old Object Revisions. I want to keep the first, active, and a spread of revisions so that the last change for a time period is kept. The Data might be changed a lot over the course of 2 days then left alone for months, so I want to keep the last revision before the changes started and the end change of the new set.
I'm currently using a cursor and a temp table to hold the IDs and the date between changes so I can select out the low-hanging fruit to get rid of. This means using @LastID, @LastDate, updates and inserts to the temp table, etc.
Is there an easier/better way to calculate the date difference between the current row and the next row in my initial result set without using a cursor and temp table?
I'm on SQL Server 2000, but would be interested in any new features of 2005 or 2008 that could help with this as well.
Here is example SQL. If you have an Identity column, you can use this instead of "ActivityDate".
SELECT DATEDIFF(HOUR, prev.ActivityDate, curr.ActivityDate)
FROM MyTable curr
JOIN MyTable prev
ON prev.ObjectID = curr.ObjectID
WHERE prev.ActivityDate =
(SELECT MAX(maxtbl.ActivityDate)
FROM MyTable maxtbl
WHERE maxtbl.ObjectID = curr.ObjectID
AND maxtbl.ActivityDate < curr.ActivityDate)
I could remove "prev", but have it there assuming you need IDs from it for deleting.
If the identity column is sequential you can use this approach:
SELECT curr.*, DATEDIFF(MINUTE, prev.EventDateTime, curr.EventDateTime) Duration
FROM DWLog curr
JOIN DWLog prev ON prev.EventID = curr.EventID - 1
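On SQL Server 2012 and later, the LAG window function replaces the self-join entirely; a sketch against the same hypothetical DWLog table:

```sql
-- LAG fetches the previous row's EventDateTime in EventDateTime order,
-- so no self-join or correlated subquery is needed (SQL Server 2012+).
SELECT EventID,
       DATEDIFF(MINUTE,
                LAG(EventDateTime) OVER (ORDER BY EventDateTime),
                EventDateTime) AS Duration
FROM DWLog;
```

Add a PARTITION BY clause inside the OVER if the gap should be computed per object rather than across the whole log.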
Hrmm, interesting challenge. I think you can do it without a self-join if you use the new-to-2005 pivot functionality.
Here's what I've got so far, I wanted to give this a little more time before accepting an answer.
DECLARE @IDs TABLE
(
ID int,
DateBetween int
)
DECLARE @OID int
SET @OID = 6150
-- Grab the revisions, calc the datediff, and insert into the table variable.
INSERT @IDs
SELECT ID,
DATEDIFF(dd,
(SELECT MAX(ActiveDate)
FROM ObjectRevisionHistory
WHERE ObjectID=@OID AND
ActiveDate < ORH.ActiveDate), ActiveDate)
FROM ObjectRevisionHistory ORH
WHERE ObjectID=@OID
-- Hard set DateBetween for special case revisions to always keep
UPDATE @IDs SET DateBetween = 1000 WHERE ID=(SELECT MIN(ID) FROM @IDs)
UPDATE @IDs SET DateBetween = 1000 WHERE ID=(SELECT MAX(ID) FROM @IDs)
UPDATE @IDs SET DateBetween = 1000
WHERE ID=(SELECT ID
FROM ObjectRevisionHistory
WHERE ObjectID=@OID AND Active=1)
-- Select out IDs for however I need them
SELECT * FROM @IDs
SELECT * FROM @IDs WHERE DateBetween < 2
SELECT * FROM @IDs WHERE DateBetween > 2
I'm looking to extend this so that I can keep at most a set number of revisions, pruning off the older ones while still keeping the first, the last, and the active one. Should be easy enough through SELECT TOP and ORDER BY clauses, um... and tossing ActiveDate into the table variable.
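Capping the number of kept revisions could look something like this on 2005+ (a sketch; @MaxRevisions and the exact keep rules are assumptions, and TOP with a variable requires 2005 or later):

```sql
-- Keep the newest @MaxRevisions revisions for the object, plus the
-- first revision and any active one; delete everything else.
DECLARE @MaxRevisions INT = 10
DECLARE @OID INT = 6150

DELETE FROM ObjectRevisionHistory
WHERE ObjectID = @OID
  AND Active = 0
  AND ID <> (SELECT MIN(ID) FROM ObjectRevisionHistory WHERE ObjectID = @OID)
  AND ID NOT IN (SELECT TOP (@MaxRevisions) ID
                 FROM ObjectRevisionHistory
                 WHERE ObjectID = @OID
                 ORDER BY ActiveDate DESC);
```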
I got Peter's example to work, but took that and modified it into a subselect. I messed around with both, and the SQL trace shows the subselect doing fewer reads. It works, and I'll vote him up when I get my rep high enough.