We are using the technique outlined here to generate random record IDs without collisions. In short, we create a randomly-ordered table of every possible ID, and mark each record as 'Taken' as it is used.
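For context, here is a minimal sketch of how such a master list might be built. The schema matches the one described below; the pool size (1,000,000), the IDENTITY column, and the sys.all_objects row source are my assumptions, not from the original setup:

-- Sketch: a shuffled pool of candidate IDs. INSERT ... SELECT ... ORDER BY
-- guarantees IDENTITY values are assigned in the ORDER BY order, so reading
-- the table by SeqNo walks the Ids in random order.
CREATE TABLE IdMasterList
(
    SeqNo BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Id    BIGINT NOT NULL,
    Taken BIT NULL
);

WITH Numbers AS
(
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
)
INSERT INTO IdMasterList (Id)
SELECT n
FROM Numbers
ORDER BY NEWID();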
I use the following Stored Procedure to obtain an ID:
ALTER PROCEDURE spc_GetId @retVal BIGINT OUTPUT
AS
DECLARE @curUpdate TABLE (Id BIGINT);
SET NOCOUNT ON;
UPDATE IdMasterList SET Taken=1
OUTPUT DELETED.Id INTO @curUpdate
WHERE ID=(SELECT TOP 1 ID FROM IdMasterList WITH (INDEX(IX_Taken)) WHERE Taken IS NULL ORDER BY SeqNo);
SELECT TOP 1 @retVal=Id FROM @curUpdate;
RETURN;
The retrieval of the ID must be an atomic operation, as simultaneous inserts are possible.
For large inserts (10+ million rows), the process is quite slow, as I must walk the table being inserted row by row with a cursor.
The IdMasterList has a schema:
SeqNo (BIGINT, NOT NULL) (PK) -- sequence of ordered numbers
Id (BIGINT) -- sequence of random numbers
Taken (BIT, NULL) -- 1 if taken, NULL if not
The IX_Taken index is:
CREATE NONCLUSTERED INDEX IX_Taken ON IdMasterList (Taken ASC);
I generally populate a table with Ids in this manner:
DECLARE @recNo BIGINT;
DECLARE @newId BIGINT;
DECLARE newAdds CURSOR FOR SELECT recNo FROM Adds;
OPEN newAdds;
FETCH NEXT FROM newAdds INTO @recNo;
WHILE @@FETCH_STATUS=0 BEGIN
EXEC spc_GetId @newId OUTPUT;
UPDATE Adds SET id=@newId WHERE recNo=@recNo;
FETCH NEXT FROM newAdds INTO @recNo;
END;
CLOSE newAdds;
DEALLOCATE newAdds;
Questions:
Is there any way I can improve the SP to extract Ids faster?
Would a conditional (filtered) index improve performance (I've yet to test, as IdMasterList is very big)?
Is there a better way to populate a table with these Ids?
As with most things in SQL Server, if you are using cursors, you are doing it wrong.
Since you are using SQL Server 2012, you can use a SEQUENCE to keep track of what random value you already used and effectively replace the Taken column.
CREATE SEQUENCE SeqNoSequence
AS bigint
START WITH 1 -- Start with the first SeqNo that is not taken yet
CACHE 1000; -- Increase the cache size if you regularly need large blocks
Usage:
CREATE TABLE #tmp
(
recNo bigint,
SeqNo bigint
)
INSERT INTO #tmp (recNo, SeqNo)
SELECT recNo,
NEXT VALUE FOR SeqNoSequence
FROM Adds
UPDATE a
SET id = m.Id
FROM Adds a
INNER JOIN #tmp tmp ON a.recNo = tmp.recNo
INNER JOIN IdMasterList m ON tmp.SeqNo = m.SeqNo
SEQUENCE is atomic. Subsequent calls to NEXT VALUE FOR SeqNoSequence are guaranteed to return unique values, even for parallel processes. Note that there can be gaps in SeqNo, but it's a very small trade off for the huge speed increase.
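The same sequence also covers the single-ID case; a minimal sketch, assuming the SeqNoSequence and IdMasterList definitions above and a gap-free SeqNo column:

-- Sketch: fetch one random Id via the sequence instead of the stored procedure.
-- NEXT VALUE FOR has restrictions on where it may appear in a query, so
-- capture it into a variable first.
DECLARE @seq BIGINT = NEXT VALUE FOR SeqNoSequence;
DECLARE @newId BIGINT;

SELECT @newId = Id
FROM IdMasterList
WHERE SeqNo = @seq;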
Put a BIGINT PK index on each table, bulk insert the rows, then assign the IDs with a single set-based UPDATE:
insert into [user] (name)
values (...);

update u
set u.ID = i.ID
from [user] u
inner join id i
on i.PK = u.PK
where u.ID is null;
Or do it one row at a time:
insert into [user] (name) values ('justsaynotocursor');
declare @PK bigint = scope_identity();
update [user] set ID = (select ID from id where PK = @PK);
Few ideas that came to my mind:
Check whether removing the TOP and the inner SELECT improves the performance of the ID fetch (look at statistics io and the query plan):
UPDATE TOP (1) IdMasterList
SET @retVal = Id, Taken = 1
WHERE Taken IS NULL
Change the index to be a filtered index, since you don't need to fetch numbers that are already taken. (Filtered index predicates do accept IS NULL, so Taken would not even have to change to 0/1; a sketch follows.)
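For illustration, a sketch of such a filtered index; the name and the key/INCLUDE choice are mine, picked to cover the stored procedure's access pattern (seek the lowest untaken SeqNo, return its Id):

-- Sketch: index only the rows that are still available.
CREATE NONCLUSTERED INDEX IX_Taken_Filtered
ON IdMasterList (SeqNo)
INCLUDE (Id)
WHERE Taken IS NULL;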
What actually is your problem? Fetching single IDs or 10+ million IDs? Is the problem CPU / I/O etc. caused by the cursor & ID fetching logic, or are the parallel processes being blocked by other processes?
Use a sequence object to get the SeqNo, then fetch the Id from IdMasterList using the value it returns. This works only if there are no gaps in the IdMasterList sequence.
Using the READPAST hint could help with blocking; for CPU / I/O issues, you should try to optimize the SQL.
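A sketch of the READPAST variant, building on the UPDATE TOP (1) form above:

-- Sketch: skip rows another session has locked instead of blocking on them.
-- UPDATE TOP (1) picks an arbitrary qualifying row, so strict SeqNo order is
-- not guaranteed; that is harmless here, since the Ids are random anyway.
DECLARE @retVal BIGINT;

UPDATE TOP (1) IdMasterList WITH (READPAST)
SET @retVal = Id, Taken = 1
WHERE Taken IS NULL;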
If the cause is purely the table being a hotspot, and no other easy solution seems to help, split it into several tables and use some simple logic (even @@SPID, RAND() or something similar) to decide which table the ID should be fetched from. You would need extra checking that all tables still have free numbers, but it shouldn't be that bad.
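A minimal sketch of that idea; the numbered pools and procedures are illustrative, not from the original post:

-- Sketch: route each session to one of four independent ID pools by SPID.
-- spc_GetId_0 .. spc_GetId_3 are assumed copies of spc_GetId, each reading
-- its own IdMasterList_n table.
DECLARE @retVal BIGINT;
DECLARE @bucket INT = @@SPID % 4;

IF @bucket = 0 EXEC spc_GetId_0 @retVal OUTPUT;
ELSE IF @bucket = 1 EXEC spc_GetId_1 @retVal OUTPUT;
ELSE IF @bucket = 2 EXEC spc_GetId_2 @retVal OUTPUT;
ELSE EXEC spc_GetId_3 @retVal OUTPUT;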
Create different procedures (or even tables) to handle fetching of single ID, hundreds of IDs and millions of IDs.
Related
I'm currently working on a stored procedure in SQL Server 2012 using T-SQL. My problem: I have several SWOTs (e.g. for a specific client) holding several SWOTParts (strengths, weaknesses, opportunities, and threats). I store the values in a table Swot as well as in another table SwotPart.
My foreign Key link is SwotId in SwotPart, thus 1 Swot can hold N SwotParts. Hence, I store the SwotId in every SwotPart.
I can have many Swots and now need to set the SwotId correctly to create the foreign key. I set the SwotId using SCOPE_IDENTITY(), but unfortunately it only returns the last SwotId from the DB. I'm looking for something like a for loop that increments the SwotId after each row of the 1st insert: 1, 2, 3...
DECLARE @SwotId INT = 1;
-- 1st insert
SET NOCOUNT ON
INSERT INTO [MySchema].[SWOT]([SwotTypeId]) -- Type can be e.g. a specific client
SELECT SwotTypeId
FROM #SWOTS
SET @SwotId = SCOPE_IDENTITY(); -- currently e.g. 7, but should increment: 1, 2, 3...
-- 2nd insert
SET NOCOUNT ON
INSERT INTO [MySchema].[SwotPart]([SwotId], [FieldTypeId], [Label]) -- FieldType can be e.g. Strength
SELECT @SwotId, FieldTypeId, Label
FROM #SWOTPARTS
Do you know how to solve this issue? What could I use instead of SCOPE_IDENTITY()?
Thank you very much!
You can output the inserted rows into a temporary table, then join your #swotparts to the temporary table based on the natural key (whatever unique column set ties them together beyond the SwotId). This would solve the problem with resorting to loops or cursors, while also overcoming the obstacle of doing a single swot at a time.
set nocount, xact_abort on;
create table #swot (SwotId int, SwotTypeId int);
insert into MySchema.swot (SwotTypeId)
output inserted.SwotId, inserted.SwotTypeId into #swot
select SwotTypeId
from #swots;
insert into MySchema.SwotPart(SwotId, FieldTypeId, Label)
select s.SwotId, p.FieldTypeId, p.Label
from #swotparts p
inner join #swot s
on p.SwotTypeId = s.SwotTypeId;
Unfortunately I can't comment, so I'll leave you an answer that hopefully clarifies some things:
Since you need to create the correct foreign key, I don't understand why you need to increment a value instead of using the id inserted into the SWOT table.
I suggest returning the inserted id using SCOPE_IDENTITY() right after the insert statement and using it for your insert into the SWOT parts (there is plenty of info about it and how to use it):
DECLARE @SwotId INT;
-- 1st insert (one SWOT row at a time; the VALUES clause was missing in the
-- original snippet, and the literal 1 is an illustrative SwotTypeId)
INSERT INTO [MySchema].[SWOT]([SwotTypeId]) -- Type can be e.g. a specific client
VALUES (1);
SET @SwotId = SCOPE_IDENTITY();
-- 2nd insert
INSERT INTO [MySchema].[SwotPart]([SwotId], [FieldTypeId], [Label])
SELECT @SwotId, FieldTypeId, Label
FROM #SWOTPARTS
I have two tables:
existing_bacteria (may contain millions of rows)
new_bacteria (may contain millions of rows)
sample tables:
CREATE TABLE [dbo].[existing_bacteria](
[bacteria_name] [nchar](10) NULL,
[bacteria_type] [nchar](10) NULL,
[bacteria_sub_type] [nchar](10) NULL,
[bacteria_size] [nchar](10) NULL,
[bacteria_family] [nchar](10) NULL,
[bacteria_discovery_year] [date] NOT NULL
)
CREATE TABLE [dbo].[new_bacteria](
[existing_bacteria_name] [nchar](10) NULL,
[bacteria_type] [nchar](10) NULL,
[bacteria_sub_type] [nchar](10) NULL,
[bacteria_size] [nchar](10) NULL,
[bacteria_family] [nchar](10) NULL,
[bacteria_discovery_year] [date] NOT NULL
)
I need to create a stored proc to update the new_bacteria table with a possible match from existing_bacteria (updating the field new_bacteria.existing_bacteria_name)
by finding a match on the other fields from existing_bacteria (assuming only a single matching record in existing_bacteria).
Since the tables are massive (millions of records each) I would like your opinion on how to go about the solution, here is what I got so far:
Solution 1:
The obvious solution is to fetch everything into a cursor and iterate over the results, updating new_bacteria as we go.
But since there are millions of records, it's not an optimal solution.
-- pseudo code
DECLARE db_cursor CURSOR FOR SELECT * FROM new_bacteria
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @row
WHILE @@FETCH_STATUS = 0
BEGIN
IF EXISTS (
SELECT
@bacteria_name = [bacteria_name]
,@bacteria_type = [bacteria_type]
,@bacteria_size = [bacteria_size]
FROM [dbo].[existing_bacteria]
where [bacteria_type] = @row.[bacteria_type] and [bacteria_size] = @row.[bacteria_size]
)
BEGIN
PRINT 'update new_bacteria.existing_bacteria_name with the [bacteria_name] we found.';
END
-- go to next record
FETCH NEXT FROM db_cursor INTO @row
END
Solution 2:
Solution 2 is to join both tables in the stored procedure
and iterate over the results, but this is also not optimal:
-- pseudo code
select * from [new_bacteria]
inner join [existing_bacteria]
on [new_bacteria].bacteria_size = [existing_bacteria].bacteria_size
and [new_bacteria].bacteria_family = [existing_bacteria].bacteria_family
for each result update [existing_bacteria]
I am sure this is not optimal because of the table size and the iteration.
Solution 3:
Solution 3 is to let the DB handle the data and update the table directly using an inner join:
-- pseudo code
UPDATE R
SET R.existing_bacteria_name = p.[bacteria_name]
FROM [new_bacteria] AS R
inner join [existing_bacteria] P
on R.bacteria_size = P.bacteria_size
and R.bacteria_family = P.bacteria_family
I am not sure about this solution.
Based on your pseudo code, I'd go with solution 3 because it is a set based operation and should be much quicker than using a cursor or other loop.
If you are having issues with performance with solution 3...
and you don't have indexes on those tables, particularly those columns you are using to join the two tables, creating those would help.
create unique index uix_new_bacteria_bacteria_size_bacteria_family
on [new_bacteria] (bacteria_size,bacteria_family);
create unique index uix_existing_bacteria_bacteria_size_bacteria_family
on [existing_bacteria] (bacteria_size,bacteria_family) include (bacteria_name);
and then try:
update r
set r.existing_bacteria_name = p.[bacteria_name]
from [new_bacteria] AS R
inner join [existing_bacteria] P on R.bacteria_size = P.bacteria_size
and R.bacteria_family = P.bacteria_family;
Updating a few million rows should not be a problem with the right indexes.
This section is no longer relevant after an update to the question
Another issue possibly exists in that if bacteria_size and bacteria_family are not unique sets, you could have multiple matches.
(since they are nullable I would imagine they aren't unique unless you're using a filtered index)
In that case, before moving forward, I'd create a table to investigate multiple matches like this:
create table [dbo].[new_and_existing_bacteria_matches](
[existing_bacteria_name] [nchar](10) not null,
rn int not null,
[bacteria_type] [nchar](10) null,
[bacteria_sub_type] [nchar](10) null,
[bacteria_size] [nchar](10) null,
[bacteria_family] [nchar](10) null,
[bacteria_discovery_year] [date] not null,
constraint pk_new_and_existing primary key clustered ([existing_bacteria_name], rn)
);
insert into [new_and_existing_bacteria_matches]
([existing_bacteria_name],rn,[bacteria_type],[bacteria_sub_type],[bacteria_size],[bacteria_family],[bacteria_discovery_year])
select
e.[existing_bacteria_name]
, rn = row_number() over (partition by e.[existing_bacteria_name] order by n.[bacteria_type], n.[bacteria_sub_type])
, n.[bacteria_type]
, n.[bacteria_sub_type]
, n.[bacteria_size]
, n.[bacteria_family]
, n.[bacteria_discovery_year]
from [new_bacteria] as n
inner join [existing_bacteria] e on n.bacteria_size = e.bacteria_size
and n.bacteria_family = e.bacteria_family;
-- and query multiple matches with something like this:
select *
from [new_and_existing_bacteria_matches] n
where exists (
select 1
from [new_and_existing_bacteria_matches] i
where i.[existing_bacteria_name]=n.[existing_bacteria_name]
and rn>1
);
On the subject of performance I'd look at:
The "Recovery Model" of the database, if your DBA says you can have it in "simple mode" then do it, you want to have as little logging as possible.
Consider disabling some indexes on the TARGET table, then rebuilding them when you've finished. On large-scale operations the modifications to the index lead to extra logging, and the manipulation of the index takes up space in your buffer pool (see the sketch after this list).
Can you convert the NCHAR columns to CHAR? They need half the storage, which reduces IO, frees up buffer space, and reduces logging.
If your target table has no clustered index, try activating trace flag 610 (warning: this is an instance-wide setting, so talk to your DBA).
If your environment allows it, the TABLOCKX hint can remove locking overhead and also help meet the criteria for reduced logging (also shown in the sketch below).
For anyone who has to perform bulk inserts or large-scale updates, Microsoft's white paper on data loading performance is a valuable read.
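A hedged sketch of the disable/rebuild and TABLOCKX points above; the index name is illustrative (an assumed nonclustered index on the column being updated):

-- Sketch: pay the index maintenance cost once, not per row.
-- Only disable NONCLUSTERED indexes; disabling the clustered index makes
-- the table inaccessible.
ALTER INDEX IX_new_bacteria_existing_name ON dbo.new_bacteria DISABLE;

-- TABLOCKX takes one exclusive table lock up front, avoiding per-row
-- locking overhead during the bulk update.
UPDATE nb
SET nb.existing_bacteria_name = eb.bacteria_name
FROM dbo.new_bacteria AS nb WITH (TABLOCKX)
INNER JOIN dbo.existing_bacteria AS eb
ON nb.bacteria_size = eb.bacteria_size
AND nb.bacteria_family = eb.bacteria_family;

ALTER INDEX IX_new_bacteria_existing_name ON dbo.new_bacteria REBUILD;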
You can try a MERGE statement. It will perform the operation in a single pass of the data. (the problem with a merge is that it tries to do everything in one Transaction and you can end up with an unwanted Spool in the execution plan. I'd then move towards a Batch process looping through maybe 100,000 records at a time.)
(It will need some minor changes to suit your column matching/update requirements)
MERGE [dbo].[new_bacteria] T --TARGET TABLE
USING [dbo].[existing_bacteria] S --SOURCE TABLE
ON
S.[bacteria_name] = T.[existing_bacteria_name] --FIELDS TO MATCH ON
AND S.[bacteria_type] = T.[bacteria_type]
WHEN MATCHED
AND
ISNULL(T.[bacteria_sub_type],'') <> ISNULL(S.[bacteria_sub_type],'') --FIELDS WHERE YOU'RE LOOKING FOR A CHANGE
OR ISNULL(T.[bacteria_size],'') <> ISNULL(S.[bacteria_size],'')
THEN --UPDATE RECORDS THAT HAVE CHANGED
UPDATE
SET T.[bacteria_sub_type] = S.[bacteria_sub_type]
WHEN NOT MATCHED BY TARGET THEN --ANY NEW RECORDS IN THE SOURCE TABLE WILL BE INSERTED
INSERT(
[existing_bacteria_name],
[bacteria_type],
[bacteria_sub_type],
[bacteria_size],
[bacteria_family],
[bacteria_discovery_year]
)
VALUES(
s.[bacteria_name],
s.[bacteria_type],
s.[bacteria_sub_type],
s.[bacteria_size],
s.[bacteria_family],
s.[bacteria_discovery_year]
);
If the Single MERGE is too much for your system to handle, here's a method for embedding it in a loop that updates large batches. You can modify the batch size to match your Server's capabilities.
It works by using a couple of staging tables that ensure that if anything goes wrong (e.g. a server agent restart), the process can continue from where it left off. (If you have any questions, please ask.)
--CAPTURE WHAT HAS CHANGED SINCE THE LAST TIME THE SP WAS RUN
--EXCEPT is a useful command because it can compare NULLs, which removes the need for ISNULL or COALESCE
INSERT INTO [dbo].[existing_bacteria_changes]
SELECT
*
FROM
[dbo].[existing_bacteria]
EXCEPT
SELECT
*
FROM
[dbo].[new_bacteria]
--RUN FROM THIS POINT IN THE EVENT OF A FAILURE
DECLARE @R INT = 1
DECLARE @Batch INT = 100000
WHILE @R > 0
BEGIN
BEGIN TRAN --CARRY OUT A TRANSACTION WITH A SUBSET OF DATA
--USE DELETE WITH OUTPUT TO MOVE A BATCH OF RECORDS INTO A HOLDING AREA.
--The holding area will provide a rollback point so if the job fails at any point it will restart from where it last was.
DELETE TOP (@Batch)
FROM [dbo].[existing_bacteria_changes]
OUTPUT DELETED.* INTO [dbo].[existing_bacteria_Batch]
--LOG THE NUMBER OF RECORDS IN THE UPDATE SET; THE LOOP ENDS WHEN NO ROWS REMAIN
SET @R = ISNULL(@@ROWCOUNT,0)
--RUN THE MERGE STATEMENT WITH THE SUBSET OF UPDATES
MERGE [dbo].[new_bacteria] T --TARGET TABLE
USING [dbo].[existing_bacteria_Batch] S --SOURCE TABLE
ON
S.[bacteria_name] = T.[existing_bacteria_name] --FIELDS TO MATCH ON
AND S.[bacteria_type] = T.[bacteria_type]
WHEN MATCHED
AND
ISNULL(T.[bacteria_sub_type],'') <> ISNULL(S.[bacteria_sub_type],'') --FIELDS WHERE YOU'RE LOOKING FOR A CHANGE
OR ISNULL(T.[bacteria_size],'') <> ISNULL(S.[bacteria_size],'')
THEN --UPDATE RECORDS THAT HAVE CHANGED
UPDATE
SET T.[bacteria_sub_type] = S.[bacteria_sub_type]
WHEN NOT MATCHED BY TARGET THEN --ANY NEW RECORDS IN THE SOURCE TABLE WILL BE INSERTED
INSERT(
[existing_bacteria_name],
[bacteria_type],
[bacteria_sub_type],
[bacteria_size],
[bacteria_family],
[bacteria_discovery_year]
)
VALUES(
s.[bacteria_name],
s.[bacteria_type],
s.[bacteria_sub_type],
s.[bacteria_size],
s.[bacteria_family],
s.[bacteria_discovery_year]
);
COMMIT;
--No point in logging this action
TRUNCATE TABLE [dbo].[existing_bacteria_Batch]
END
Definitely option 3. Set-based always beats anything loopy.
That said, the biggest 'risk' might be that the amount of updated data 'overwhelms' your machine. More specifically, the transaction could become so big that the system takes forever to finish it. To avoid this you could try splitting the one big UPDATE into multiple smaller UPDATEs while still working mostly set-based. Good indexing and knowing your data is key here.
For instance, starting from
UPDATE R
SET R.existing_bacteria_name = p.[bacteria_name]
FROM [new_bacteria] AS R
INNER JOIN [existing_bacteria] P
ON R.bacteria_size = P.bacteria_size
AND R.bacteria_family = P.bacteria_family
You might try 'chunk' the (target) table into smaller parts. E.g. by making a loop over the bacteria_discovery_year field, assuming that said column splits the table into e.g. 50 more or less equally sized parts. (BTW: I'm no biologist so I might be totally wrong there =)
You'd then get something along the lines of:
DECLARE @c_bacteria_discovery_year date
DECLARE year_loop CURSOR LOCAL STATIC
FOR SELECT DISTINCT bacteria_discovery_year
FROM [new_bacteria]
ORDER BY bacteria_discovery_year
OPEN year_loop
FETCH NEXT FROM year_loop INTO @c_bacteria_discovery_year
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE R
SET R.existing_bacteria_name = p.[bacteria_name]
FROM [new_bacteria] AS R
INNER JOIN [existing_bacteria] P
ON R.bacteria_size = P.bacteria_size
AND R.bacteria_family = P.bacteria_family
WHERE R.bacteria_discovery_year = @c_bacteria_discovery_year
FETCH NEXT FROM year_loop INTO @c_bacteria_discovery_year
END
CLOSE year_loop
DEALLOCATE year_loop
Some remarks:
Like I said, I don't know the distribution of the bacteria_discovery_year values, if 3 years make up 95% of the data it might not be such a great choice.
This will only work if there is an index on the bacteria_discovery_year column, preferably with bacteria_size and bacteria_family included.
You could add some PRINT inside the loop to see the progress and rows affected... it won't speed up anything, but it feels better if you know it's doing something =)
All in all, don't overdo it, if you split it into too many small chunks you'll end up with something that takes forever too.
PS: in any case you'll also need an index on the 'source' table that covers the bacteria_size and bacteria_family columns, preferably including bacteria_name if the latter is not the (clustered) PK of the table.
I am struggling to find a SQL Server replacement for select for update that works.
I have a master table that contains a column used for the next order number. The application does a SELECT FOR UPDATE on this row, reads the current value (while locked), adds one to it, updates the row, and then uses the number it received. This process works perfectly on every database I've tried except SQL Server, which does not seem to have any mechanism for selecting data for exclusive use.
How do I do a locked read and update of something like a next order number from a sequence table in SQL Server?
BTW, I know I can use things like IDENTITY cols and stuff, to do this, but in this case I must read from this existing column. Get the value and inc it, and do it in a safe locked manner to avoid 2 users getting the same value.
UPDATE:
Thank you, that works for this case :)
DECLARE @Output char(30)
UPDATE scheme.sysdirm
SET @Output = key_value = cast(key_value as int)+1
WHERE system_key='OPLASTORD'
SELECT @Output
I have one other place I do something similar. I read and lock a stock record too.
SELECT STOCK
FROM PRODUCT
WHERE ID = ? FOR UPDATE.
I then do some validation and then do
UPDATE PRODUCT SET STOCK = ?
WHERE ID=?
I can't just use your above method here, as the value I update is based on things I do from the stock I read. But I need to ensure no one else can mess with the stock while I do this. Again, easy on other DB's with SELECT FOR UPDATE... is there a SQL Server workaround?? :)
You can simply do an UPDATE that also reads out the new value into a SQL Server variable:
DECLARE @Output INT
UPDATE dbo.YourTable
SET @Output = YourColumn = YourColumn + 1
WHERE ID = ????
SELECT @Output
Since it's an atomic UPDATE statement, it's safe against concurrency issues (only one connection can hold the update lock at any given time). A second session that wants to get the incremented value at the same time will have to wait until the first one completes, and will thus get the next value from the table.
As an alternative you can use the OUTPUT clause of the UPDATE statement, although this will insert into a table variable.
Create table YourTable
(
ID int,
YourColumn int
)
GO
INSERT INTO YourTable VALUES (1, 1)
GO
DECLARE @Output TABLE
(
YourColumn int
)
UPDATE YourTable
SET YourColumn = YourColumn + 1
OUTPUT inserted.YourColumn INTO @Output
WHERE ID = 1
SELECT TOP 1 YourColumn
FROM @Output
**** EDIT
If you want to ensure that no one can change the data after you have read it, you can use REPEATABLE READ isolation. Be aware that every row you read will stay locked until the end of the transaction (pessimistic locking), which may cause deadlocking. You can also use a SELECT ... FROM table WITH (UPDLOCK) hint within a transaction (see the sketch after the example below).
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
SELECT STOCK
FROM PRODUCT
WHERE ID = ?
.....
...
UPDATE Product
SET Stock = nnn
WHERE ID = ?
COMMIT TRANSACTION
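And a sketch of the UPDLOCK alternative mentioned above; @id and @newStock stand in for the application's parameters:

BEGIN TRANSACTION;

DECLARE @id INT = 42; -- placeholder parameter
DECLARE @stock INT, @newStock INT;

-- UPDLOCK is held until COMMIT, so no other session can acquire a
-- conflicting lock on this row while the validation runs.
SELECT @stock = STOCK
FROM PRODUCT WITH (UPDLOCK, ROWLOCK)
WHERE ID = @id;

-- ... validation / business logic on @stock here ...
SET @newStock = @stock - 1;

UPDATE PRODUCT
SET STOCK = @newStock
WHERE ID = @id;

COMMIT TRANSACTION;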
I have a licensing scenario where, when a person activates a new system, it adds the old activations to a lockout table so they can only have their latest X systems activated. I need to pass a parameter of how many recent activations to keep, and all older activations should be added to the lockout table if they are not already locked out. I'm not sure how best to do this, e.g. with a temp table (which I've never used).
For example, an activation comes in from John Doe on System XYZ. I would then need to query the activations table for all activations by John Doe and sort it by DATE DESC. John Doe may have a license allowing two systems in this case so I need all records older than the top 2 deactivated, i.e. inserted into a lockouts table.
Thanks in advance for your assistance.
Something like this perhaps?
insert into lockouts
(<column list>)
select <column list>
from (select <column list>,
row_number() over (order by date desc) as RowNum
from activations) t
where t.RowNum > @NumLicenses
It'd probably be easiest to couple row_number() over() with a view or table-valued function:
WITH ActivationRank AS
(
SELECT SystemId,ProductId,CreatedDate,ROW_NUMBER() OVER(PARTITION BY ProductId ORDER BY CreatedDate DESC) AS RANK
FROM [Activations]
)
SELECT SystemId, ProductId, CASE WHEN RANK < @lockoutParameterOrConstant THEN 0 ELSE 1 END AS LockedOut
FROM ActivationRank
Before you invest time to read and try my approach, I want to say that Joe Stefanelli's answer is an excellent one - short, compact, advanced and probably better than mine, especially in terms of performance. On the other hand, performance might not be your first concern (how many activations do you expect per day? per hour? per minute?) and my example may be easier to read and understand.
As I don't know how your database schema is set up, I had to make some assumptions about it. You probably won't be able to use this code as a copy-and-paste template, but it should give you an idea of how to do it.
You were talking about a lockout table, so I reckon you have a reason to duplicate portions of the data into a second table. If possible, I would rather use a lockout flag in the table containing the systems data, but obviously that depends on your scenario.
Please be aware that I currently do not have access to a SQL Server, so I could not check the validity of the code. I tried my best, but there may be typos in it even so.
First assumption: A minimalistic "registered systems" table:
CREATE TABLE registered_systems
(id INT NOT NULL IDENTITY,
owner_id INT NOT NULL,
system_id VARCHAR(MAX) NOT NULL,
activation_date DATETIME NOT NULL)
Second assumption: A minimalistic "locked out systems" table:
CREATE TABLE locked_out_systems
(id INT NOT NULL,
lockout_date DATETIME NOT NULL)
Then we can define a stored procedure to activate a new system. It takes the owner_id, the number of allowed systems and of course the new system id as parameters.
CREATE PROCEDURE register_new_system
@owner_id INT,
@allowed_systems_count INT,
@new_system_id VARCHAR(MAX)
AS
BEGIN TRANSACTION
-- Variable declaration
DECLARE @sid INT -- Storage for a system id
-- Insert the new system
INSERT INTO registered_systems
(owner_id, system_id, activation_date)
VALUES
(@owner_id, @new_system_id, GETDATE())
-- Use a cursor to query all registered-and-not-locked-out systems for this
-- owner. Skip the first @allowed_systems_count systems, then insert the
-- remaining ones into the lockout table.
-- (The join is on the surrogate id, since locked_out_systems stores only
-- id and lockout_date.)
DECLARE c_systems CURSOR FAST_FORWARD FOR
SELECT r.id FROM
registered_systems r
LEFT OUTER JOIN
locked_out_systems l
ON r.id = l.id
WHERE l.id IS NULL
AND r.owner_id = @owner_id
ORDER BY r.activation_date DESC
OPEN c_systems
FETCH NEXT FROM c_systems INTO @sid
WHILE @@FETCH_STATUS = 0
BEGIN
IF @allowed_systems_count > 0
-- System still allowed, just decrement the counter
SET @allowed_systems_count = @allowed_systems_count - 1
ELSE
-- All allowed systems used up, insert this one into lockout table
INSERT INTO locked_out_systems
(id, lockout_date)
VALUES
(@sid, GETDATE())
FETCH NEXT FROM c_systems INTO @sid
END
CLOSE c_systems
DEALLOCATE c_systems
COMMIT
I've got the following rough structure:
Object -> Object Revisions -> Data
The Data can be shared between several Objects.
What I'm trying to do is clean out old Object Revisions. I want to keep the first, active, and a spread of revisions so that the last change for a time period is kept. The Data might be changed a lot over the course of 2 days then left alone for months, so I want to keep the last revision before the changes started and the end change of the new set.
I'm currently using a cursor and temp table to hold the IDs and the date between changes so I can select out the low-hanging fruit to get rid of. This means using @LastID, @LastDate, updates and inserts to the temp table, etc...
Is there an easier/better way to calculate the date difference between the current row and the next row in my initial result set without using a cursor and temp table?
I'm on sql server 2000, but would be interested in any new features of 2005, 2008 that could help with this as well.
Here is example SQL. If you have an Identity column, you can use this instead of "ActivityDate".
SELECT DATEDIFF(HOUR, prev.ActivityDate, curr.ActivityDate)
FROM MyTable curr
JOIN MyTable prev
ON prev.ObjectID = curr.ObjectID
WHERE prev.ActivityDate =
(SELECT MAX(maxtbl.ActivityDate)
FROM MyTable maxtbl
WHERE maxtbl.ObjectID = curr.ObjectID
AND maxtbl.ActivityDate < curr.ActivityDate)
I could remove "prev", but have it there assuming you need IDs from it for deleting.
If the identity column is sequential you can use this approach:
SELECT curr.*,
DATEDIFF(MINUTE, prev.EventDateTime, curr.EventDateTime) AS Duration
FROM DWLog curr
JOIN DWLog prev ON prev.EventID = curr.EventID - 1
Hrmm, interesting challenge. I think you can do it without a self-join if you use the new-to-2005 pivot functionality.
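The pivot route was never spelled out; for reference, the usual 2005-era alternative pairs each row with its predecessor via ROW_NUMBER() (a sketch reusing MyTable/ObjectID/ActivityDate from the earlier answer):

-- Sketch: number the rows per object, then join row n to row n-1.
WITH Ordered AS
(
SELECT ObjectID,
ActivityDate,
ROW_NUMBER() OVER (PARTITION BY ObjectID ORDER BY ActivityDate) AS rn
FROM MyTable
)
SELECT curr.ObjectID,
DATEDIFF(HOUR, prev.ActivityDate, curr.ActivityDate) AS HoursBetween
FROM Ordered curr
JOIN Ordered prev
ON prev.ObjectID = curr.ObjectID
AND prev.rn = curr.rn - 1;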
Here's what I've got so far, I wanted to give this a little more time before accepting an answer.
DECLARE @IDs TABLE
(
ID int ,
DateBetween int
)
DECLARE @OID int
SET @OID = 6150
-- Grab the revisions, calc the datediff, and insert into temp table var.
INSERT INTO @IDs
SELECT ID,
DATEDIFF(dd,
(SELECT MAX(ActiveDate)
FROM ObjectRevisionHistory
WHERE ObjectID=@OID AND
ActiveDate < ORH.ActiveDate), ActiveDate)
FROM ObjectRevisionHistory ORH
WHERE ObjectID=@OID
-- Hard set DateBetween for special case revisions to always keep
UPDATE @IDs SET DateBetween = 1000 WHERE ID=(SELECT MIN(ID) FROM @IDs)
UPDATE @IDs SET DateBetween = 1000 WHERE ID=(SELECT MAX(ID) FROM @IDs)
UPDATE @IDs SET DateBetween = 1000
WHERE ID=(SELECT ID
FROM ObjectRevisionHistory
WHERE ObjectID=@OID AND Active=1)
-- Select out IDs however I need them
SELECT * FROM @IDs
SELECT * FROM @IDs WHERE DateBetween < 2
SELECT * FROM @IDs WHERE DateBetween > 2
I'm looking to extend this so that I can keep at most so many revisions, and prune off the older ones while still keeping the first, last, and active. Should be easy enough through SELECT TOP and ORDER BY clauses, plus tossing ActiveDate into the temp table (a sketch follows).
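A sketch of that extension, run in the same batch as the example above so @OID and @IDs are still in scope; @KeepCount and the reuse of the DateBetween = 1000 marker are my additions, and TOP (@var) needs SQL Server 2005+:

-- Sketch: delete everything past the newest @KeepCount revisions, except
-- rows flagged as always-keep (first, last, active).
DECLARE @KeepCount INT = 10;

DELETE FROM ObjectRevisionHistory
WHERE ObjectID = @OID
AND ID NOT IN (SELECT ID FROM @IDs WHERE DateBetween = 1000)
AND ID NOT IN (SELECT TOP (@KeepCount) ID
FROM ObjectRevisionHistory
WHERE ObjectID = @OID
ORDER BY ActiveDate DESC);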
I got Peter's example to work, but took that and modified it into a subselect. I messed around with both, and the SQL trace shows the subselect doing fewer reads. It does work, and I'll vote him up when I get my rep high enough.