SQL Server Join issue / alternate ways - sql-server

I have a query that tries to find chain relationships between customer IDs. It currently takes approx. 7 minutes for 80k records. Could you please suggest some alternative, faster approaches?
Sample data is shown below. Records are grouped when they are related to each other through a chain (a = b = c).
Create table #chaintable
(
CustID int,
MatchCustID int,
FN varchar(10),
LN varchar(10),
PhoneNo int,
Email varchar(50),
dtAppointment int
)
insert into #chaintable
Select 1,2,'Global','Chain',123,'',1
union all
Select 2,3,'Global','Chain',123,'a@a.com',2
union all
Select 3,2,'Global','Chain',567,'a@a.com',3
union all
Select 4,5,'Global1','Chain1',123,'a@a.com',1
union all
Select 5,4,'Global1','Chain1',123,'a@a.com',2
Select distinct
A.CustID, A.MatchCustID, A.GroupID
from
(select
c1.CustID, c1.MatchCustID, C1.dtAppointment,
case
when C1.CustID = C2.MatchCustID and C1.MatchCustID <> C2.CustID
then C1.CustID
when C1.CustID <> C2.MatchCustID and C1.MatchCustID = C2.CustID
then c1.MatchCustID
when C1.CustID = C2.MatchCustID and C1.MatchCustID = C2.CustID
then
case
when c1.CustID < C1.MatchCustID
then c1.CustID
else c1.MatchCustID
end
end GroupID
from
#chaintable C1, #chaintable C2
where
c1.CustID = c2.MatchCustID
or c1.MatchCustID = c2.CustID) A
Output:
CustID  MatchCustID  FN       LN      PhoneNo  Email    dtAppointment
---------------------------------------------------------------------
1       2            Global   Chain   123               1
2       3            Global   Chain   123      a@a.com  2
3       2            Global   Chain   567      a@a.com  3
4       5            Global1  Chain1  123      a@a.com  1
5       4            Global1  Chain1  123      a@a.com  2

First, it is hard to help improve the performance of a query without seeing its execution plan.
There are some things here that I do not understand. For example, why do you join the table to itself when all of the output columns come from the first table? Is the join really necessary, or is it just there for testing?
I suggest the "logically equivalent" rewrite below, which avoids the OR in the join condition and is easier for a human to follow (and if the optimizer feels the same way, it might run faster).
SELECT DISTINCT c1.CustID, c1.MatchCustID,
CASE WHEN (C1.MatchCustID <> c2.CustID)
OR (c1.CustID < c1.MatchCustID) THEN c1.CustID
ELSE c1.MatchCustID END AS GroupID
FROM #chaintable C1 JOIN #chaintable C2
ON c1.CustID = c2.MatchCustID
UNION
SELECT c1.CustID, c1.MatchCustID,
c1.MatchCustID AS GroupID
FROM #chaintable C1 JOIN #chaintable C2
ON c2.CustID = c1.MatchCustID AND C1.CustID<>C2.MatchCustID
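The grouping described in the question (a = b = c) is a connected-components problem over the (CustID, MatchCustID) pairs. As a rough sketch outside SQL, assuming the pairs can be pulled into application code, a union-find pass can label each chain. The function name `group_chains` and the choice of the smallest ID as the group label are assumptions made for illustration, not something the question specifies.

```python
def group_chains(pairs):
    """Label each customer ID with the smallest ID in its chain (union-find)."""
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller ID as the root so it becomes the group label.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    return {x: find(x) for x in parent}


# (CustID, MatchCustID) pairs from the sample data in the question
pairs = [(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)]
groups = group_chains(pairs)
print(groups)
```

With the sample rows, customers 1, 2 and 3 share one label and customers 4 and 5 share another, which matches the chains in the question.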

First, try to stick to the standard when adding rows to a table. While UNION ALL may be efficient enough for your few sample rows, it is rather verbose for a large set of inserts like the one you mention. However you do this, make sure you treat the rows as sets of relational data and avoid unnecessary steps.
Furthermore, the comma-separated FROM list is old-style join syntax and, combined with the OR condition, behaves much like a Cartesian join. Explicit INNER and OUTER JOINs are clearer and less error-prone, so the old style is only really useful in niche cases. This is not one of them.
Looking at your table and the results, I observe the following about your table structure:
CustID is independent of this table.
MatchCustID has no bearing on the order.
dtAppointment is the durable key.
FN and LN together form an ID that dictates the GroupID.
So then, a solution may be as follows:
Create table #chaintable
(
CustID int, MatchCustID int, FN varchar(10), LN varchar(10)
, PhoneNo int, Email varchar(50), dtAppointment int
)
INSERT INTO #chaintable
VALUES (1,2,'Global','Chain',123,'',1)
, (2,3,'Global','Chain',123,'a@a.com',2)
, (3,2,'Global','Chain',567,'a@a.com',3)
, (4,5,'Global1','Chain1',123,'a@a.com',1)
, (5,4,'Global1','Chain1',123,'a@a.com',2)
SELECT CustID
,MatchCustID
,dtAppointment
, FN
, LN
, DENSE_RANK() OVER (ORDER BY FN + LN DESC ) AS GroupID
FROM #chaintable
Results:
CustID  MatchCustID  dtAppointment  FN       LN      GroupID
1       2            1              Global   Chain   1
2       3            2              Global   Chain   1
3       2            3              Global   Chain   1
4       5            1              Global1  Chain1  2
5       4            2              Global1  Chain1  2
The only catch here is how the unique ID is chosen. In this example, since no column uniquely identifies the event chains, I used FN + LN to derive an order.
This has several advantages:
You avoid costly Cartesian joins by passing through your rows once.
GroupID will always be the same for each group in your final table.
Performing prechecks will not be difficult:
DECLARE @GROUPID int = (SELECT MAX(GROUPID) FROM <SourceTable> )
However, this also has drawbacks:
Adding to this table may require you to break up the insert statements to check for existing GroupIDs (which sadly your data does not already have). In your case, the columns that determine GroupID are FN + LN.
I would suggest a temp table that grabs the unique FN + LN values that are not yet in the target table, using a NOT EXISTS check such as
Example
SELECT DISTINCT A.FN + A.LN FROM #chaintable A
WHERE NOT EXISTS (SELECT 1
FROM <SourceTable> S
WHERE A.FN = S.FN
AND A.LN = S.LN)
Then run an insert statement that adds the pre-value we checked earlier to each new rank:
DECLARE @GROUPID int = (SELECT ISNULL(MAX(GROUPID), 0) FROM <SourceTable> )
INSERT INTO FINAL_TABLE (CustID, MatchCustID, FN, LN, PhoneNo, Email, dtAppointment, GroupID)
SELECT CustID
, MatchCustID
, FN
, LN
, PhoneNo
, Email
, dtAppointment
, @GROUPID + DENSE_RANK() OVER (ORDER BY FN + LN DESC ) AS GroupID
FROM #chaintable_sub
This will always result in each new GroupID being greater than those of the previous entries.
Finally, I would advise you to treat this data as what it really is: dirty data. You will have to perform ETL transformations on it, particularly since you have a durable key alongside a composite ID key, so essentially a fact table.

Related

How can I access a specific field in a named subquery when the field name might not be unique?

I am trying to create a routine that can accept an SQL query as a string and the [table].[primaryKey] of the primary record in the returned dataset, then wrap that original query to implement pagination (return records 40-49 when requesting page 4 and 10 records per page).
The dataset returned by the original queries will frequently contain multiple instances of the primary record, one for each occurrence of supporting records. For the example provided, if a customer has three phone numbers on record the results for that customer in the original query would look like:
{5; John Smith; 205 W. Fort St; 17; Home; 123-123-4587}
{5; John Smith; 205 W. Fort St; 18; Work; 123-123-8547}
{5; John Smith; 205 W. Fort St; 19; Mobile; 123-123-1147}
I'm almost there, I think, with the following query:
DECLARE @PageNumber int = 4;
DECLARE @RecordsPerPage int = 10;
WITH OriginalQuery AS (
SELECT [Customer].[Id],
[Customer].[Name],
[Customer].[Address],
[Phone].[Id],
[Phone].[Type],
[Phone].[Number]
FROM [Customer] INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
)
SELECT [WrappedQuery].[RowNumber], [OriginalQuery].* FROM (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) [RowNumber], *
FROM (
SELECT DISTINCT [OriginalQuery].[{Customer.Id}] [PrimaryKey]
FROM [OriginalQuery]
) [RowNumberQuery]
) [WrappedQuery]
INNER JOIN [OriginalQuery] ON [WrappedQuery].[PrimaryKey] = [OriginalQuery].[{Customer.Id}]
WHERE [WrappedQuery].[RowNumber] >= @PageNumber
AND [WrappedQuery].[RowNumber] < @PageNumber + @RecordsPerPage
This solution performs a SELECT DISTINCT on the primary key for the Primary (Customer) record and uses the SQL routine Row_Number() then joins the result with the results of the original query such that each unique primary (customer) record is numbered 1 - {end of file}, and I can pull only the RowNumber counts that I want.
But because OriginalQuery may have multiple fields named Id (from different tables), I can't figure out how to properly access [Customer].[Id] in my SELECT DISTINCT clause of [RowNumberQuery] or in the INNER JOIN.
Is there a better way to implement pagination at the SQL level, or a more direct method of accessing the field I need from within the subquery based on the table to which it belongs?
EDIT:
I've caused confusion in the pagination I am looking for. I am using Dapper in C# to compile the resulting dataset into individual complex objects, so the goal in the example would be to retrieve customers 31-40 in the list regardless of how many individual records exist for each customer. If Customer 31 had five phone records, Customer 32 had three phone records, Customer 33 had 1 phone record, and the remaining seven customers had two phone records each, I would expect the resulting dataset to contain 23 records total, but only 10 distinct customers.
SOLUTION
Thank you for all of the assistance, and I apologize for those areas I should have clarified sooner. I am creating a toolset that will allow C# Data Access Libraries to implement a set of standard parameters. If I have an option to implement the pagination in an internal function that can accept the SQL statement, I can defer to the toolset and not have to remember (or count on others to remember) to add the appropriate text each time. I'll set it up to return the finished objects, but if I were going to just modify the original query string it would look like:
public static string AddPagination(string sql, string primaryKey, Parameter requestParameters)
{
return $"WITH OriginalQuery AS ({sql.Replace("SELECT ", $"SELECT DENSE_RANK() OVER (ORDER BY {primaryKey}) AS PrimaryRecordCount, ",StringComparison.OrdinalIgnoreCase)}) " +
$"SELECT TOP ({requestParameters.MaxRecords}) * " +
$"FROM OriginalQuery " +
$"WHERE PrimaryRecordCount >= 1 + (({requestParameters.PageNumber} - 1) * {requestParameters.RecordsPerPage})" +
$" AND PrimaryRecordCount <= {requestParameters.PageNumber} * {requestParameters.RecordsPerPage}";
}
Just give your columns a different alias in your original query, e.g. [Customer].[Id] AS CustomerId, [Phone].[Id] AS PhoneId..., then you can reference OriginalQuery.CustomerId, or OriginalQuery.PhoneId
e.g.
DECLARE @PageNumber int = 4;
DECLARE @RecordsPerPage int = 10;
WITH OriginalQuery AS (
SELECT [Customer].[Id] AS CustomerId,
[Customer].[Name],
[Customer].[Address],
[Phone].[Id] AS PhoneId,
[Phone].[Type],
[Phone].[Number]
FROM [Customer] INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
)
SELECT [WrappedQuery].[RowNumber], [OriginalQuery].* FROM (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) [RowNumber], *
FROM (
SELECT DISTINCT [OriginalQuery].[CustomerId] [PrimaryKey]
FROM [OriginalQuery]
) [RowNumberQuery]
) [WrappedQuery]
INNER JOIN [OriginalQuery] ON [WrappedQuery].[PrimaryKey] = [OriginalQuery].[CustomerId]
WHERE [WrappedQuery].[RowNumber] >= @PageNumber
AND [WrappedQuery].[RowNumber] < @PageNumber + @RecordsPerPage
It's worth noting that your paging logic is wrong too. Currently you use the page number itself as the first row number, so you are searching for:
Page 1: Customers 1 - 10
Page 2: Customers 2 - 11
Page 3: Customers 3 - 12
Your logic should be:
WHERE [WrappedQuery].[RowNumber] >= 1 + ((@PageNumber - 1) * @RecordsPerPage)
AND [WrappedQuery].[RowNumber] <= (@PageNumber * @RecordsPerPage)
Page 1: Customers 1 - 10
Page 2: Customers 11 - 20
Page 3: Customers 21 - 30
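The corrected bounds are plain arithmetic, so they are easy to check outside the database. This is a minimal sketch; the function name `page_bounds` is made up for illustration and pages are assumed to be 1-based, as in the answer.

```python
def page_bounds(page_number, records_per_page):
    """Return the inclusive (first, last) row numbers for a 1-based page."""
    if page_number < 1 or records_per_page < 1:
        raise ValueError("page_number and records_per_page must be >= 1")
    first = 1 + (page_number - 1) * records_per_page
    last = page_number * records_per_page
    return first, last


for page in (1, 2, 3, 4):
    print(page, page_bounds(page, 10))
```

Page 4 with 10 records per page covers rows 31 to 40, which is the range the question's EDIT asks for.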
With that being said, you could just use DENSE_RANK() rather than ROW_NUMBER(), which would simplify everything. I think this would give you the same result:
DECLARE @PageNumber int = 4;
DECLARE @RecordsPerPage int = 10;
WITH OriginalQuery AS (
SELECT c.Id AS CustomerId,
c.Name,
c.Address,
p.Id AS PhoneId,
p.Type,
p.Number,
DENSE_RANK() OVER(ORDER BY c.Id) AS RowNumber
FROM Customer AS c INNER JOIN Phone AS p ON c.Id = p.CustomerId
)
SELECT oq.CustomerId, oq.Name, oq.Address, oq.PhoneId, oq.Type, oq.Number
FROM OriginalQuery AS oq
WHERE oq.RowNumber >= 1 +((@PageNumber - 1) * @RecordsPerPage)
AND oq.RowNumber <= (@PageNumber * @RecordsPerPage);
I've added table aliases to try to make the code a bit cleaner, and also removed all the unnecessary square brackets. This is not required, but I personally find them quite hard on the eye, and only use them to escape keywords.
Another difference is that by adding ORDER BY c.Id you ensure consistent results for your paging. Using ORDER BY (SELECT NULL) implies that you don't care about the order, but you should if you are using it for paging.
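To see the DENSE_RANK() approach page by customer rather than by row, here is a small runnable sketch. It uses SQLite through Python's sqlite3 module instead of SQL Server, so it assumes a SQLite build with window functions (3.25 or later). The table contents, the `fetch_page` helper and the `CustomerRank` column name are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Phone (Id INTEGER PRIMARY KEY, CustomerId INTEGER, Type TEXT);
    INSERT INTO Customer VALUES (1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E');
    INSERT INTO Phone VALUES
        (10,1,'Home'),(11,1,'Work'),
        (12,2,'Home'),
        (13,3,'Home'),(14,3,'Work'),(15,3,'Mobile'),
        (16,4,'Home'),
        (17,5,'Home'),(18,5,'Work');
""")


def fetch_page(page_number, customers_per_page):
    """Return every phone row for the customers that fall on the given page."""
    first = 1 + (page_number - 1) * customers_per_page
    last = page_number * customers_per_page
    sql = """
        WITH OriginalQuery AS (
            SELECT c.Id AS CustomerId, c.Name, p.Id AS PhoneId, p.Type,
                   DENSE_RANK() OVER (ORDER BY c.Id) AS CustomerRank
            FROM Customer AS c
            INNER JOIN Phone AS p ON c.Id = p.CustomerId
        )
        SELECT CustomerId, Name, PhoneId, Type
        FROM OriginalQuery
        WHERE CustomerRank BETWEEN ? AND ?
        ORDER BY CustomerId, PhoneId
    """
    return conn.execute(sql, (first, last)).fetchall()


rows = fetch_page(2, 2)
for row in rows:
    print(row)
```

Page 2 with two customers per page returns four rows here (three for customer 3, one for customer 4), so the row count varies while the number of distinct customers stays at the page size.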
There are many concerns with what you are trying to do, and you might be better off explaining why you are building this process.
SQL query as a string
You are receiving a SQL query as a string; how are you parsing that string into the OriginalQuery CTE? This raises concerns about SQL injection, and about global temp tables if you are using those.
Secondly, your example isn't doing pagination as it is commonly understood. If someone were to request page 1 with 10 records per page, the calling application would expect to receive the first 10 records of the result set, but your example will return all records for the first 10 customers. That means the result could contain 40+ rows if each customer had 4 phone numbers.
You should take a look at OFFSET and FETCH NEXT, and reconsider why you need to parse an arbitrary SQL string. There is probably a better way to do that.
Here is a rough example using OFFSET and FETCH NEXT on a static query, returning only @RecordsPerPage records.
DECLARE @PageNumber int = 1;
DECLARE @RecordsPerPage int = 10;
SELECT [Customer].[Id],
[Customer].[Name],
[Customer].[Address],
[Phone].[Id],
[Phone].[Type],
[Phone].[Number]
FROM [Customer] INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
ORDER BY [Customer].[Id]
OFFSET (@PageNumber-1)*@RecordsPerPage ROWS
FETCH NEXT @RecordsPerPage ROWS ONLY
If you wanted to return all records for a page of @RecordsPerPage customers that have a corresponding phone number, then it would be something like...
DECLARE @PageNumber int = 1;
DECLARE @RecordsPerPage int = 10;
SELECT [Customer].[Id],
[Customer].[Name],
[Customer].[Address],
[Phone].[Id],
[Phone].[Type],
[Phone].[Number]
FROM [Customer] INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
WHERE Customer.ID IN (
SELECT DISTINCT Customer.ID FROM Customer INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
ORDER BY [Customer].[Id]
OFFSET (@PageNumber-1)*@RecordsPerPage ROWS
FETCH NEXT @RecordsPerPage ROWS ONLY
)
This does leave a question, what is the point of this query when the calling application can just use their own OFFSET and FETCH NEXT? They already have the SQL to generate the initial dataset, all they need to do is add OFFSET / FETCH NEXT to the end of it and they have their own pagination without trying to wrap it in a procedure of some sort.
To create a comparison, would you create a stored procedure that accepts a SQL string and then filters specific fields by specific values? Or would the people calling that stored procedure just add a Where clause to their own queries instead?
You can use an alias for each of the duplicated column names.
For example:
WITH OriginalQuery AS (
SELECT [Customer].[Id] as CustomerID,
[Customer].[Name],
[Customer].[Address],
[Phone].[Id] as PhoneID,
[Phone].[Type],
[Phone].[Number]
FROM [Customer] INNER JOIN [Phone] ON [Customer].[Id] = [Phone].[CustomerId]
)
Now you can use the two IDs by their alias names in the next query.

Getting non-deterministic results from WITH RECURSIVE cte

I'm trying to create a recursive CTE that traverses all the records for a given ID, and does some operations between ordered records. Let's say I have customers at a bank who get charged a uniquely identifiable fee, and a customer can pay that fee in any number of installments:
WITH recursive payments (
id
, index
, fees_paid
, fees_owed
)
AS (
SELECT id
, index
, fees_paid
, fee_charged
FROM table
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, t.fees_paid
, p.fees_owed - p.fees_paid
FROM table t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
The join logic seems sound, but when I join the output of this query to the source table, I'm getting non-deterministic and incorrect results.
This is my first foray into Snowflake's recursive CTEs. What am I missing in the intermediate result logic that is leading to the non-determinism here?
I assume this is edited code, because in the anchor of your CTE you select a fourth column, fee_charged, which does not exist, and in the recursion you don't sum the fees paid, so the logic seems rather strange.
So creating some random data, that has two different id streams to recurse over:
create or replace table data (id number, index number, val text);
insert into data
select * from values (1,1,'a'),(2,1,'b')
,(1,2,'c'), (2,2,'d')
,(1,3,'e'), (2,3,'f')
v(id, index, val);
Now altering your CTE just a little bit to concatenate the strings together:
WITH RECURSIVE payments AS
(
SELECT id
, index
, val
FROM data
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, p.val || t.val as val
FROM data t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
we get:
ID INDEX VAL
1 1 a
1 2 ac
1 3 ace
2 1 b
2 2 bd
2 3 bdf
Which is exactly as I would expect. So as for your "it gets strange when I join to other stuff", either the output of your CTE is not what you expect, or your join to the other data is not working as you expect, or there is a bug in Snowflake.
Which all comes down to this: if the CTE results are exactly what you expect, create a table from them and join that to your other table, to eliminate some form of CTE vs JOIN bug and to debug why your join is not working.
But if your CTE output is not what you expect, then let's help debug that.
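The "materialize, then join" debugging step can be sketched as follows. This runs on SQLite through Python's sqlite3 module rather than Snowflake, so it only illustrates the workflow and says nothing about Snowflake's behavior. The column is named `idx` here because INDEX is a reserved word in SQLite, and `payments_snapshot` is a hypothetical table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER, idx INTEGER, val TEXT);
    INSERT INTO data VALUES
        (1,1,'a'),(2,1,'b'),(1,2,'c'),(2,2,'d'),(1,3,'e'),(2,3,'f');

    -- Materialize the recursive result so it can be inspected on its own.
    CREATE TABLE payments_snapshot AS
    WITH RECURSIVE payments(id, idx, val) AS (
        SELECT id, idx, val FROM data WHERE idx = 1
        UNION ALL
        SELECT t.id, t.idx, p.val || t.val
        FROM data AS t
        JOIN payments AS p ON t.id = p.id AND t.idx = p.idx + 1
    )
    SELECT id, idx, val FROM payments;
""")

# Step 1: check the CTE output by itself.
snapshot = conn.execute(
    "SELECT id, idx, val FROM payments_snapshot ORDER BY id, idx"
).fetchall()

# Step 2: join the snapshot back to the source on the full key (id, idx).
joined = conn.execute("""
    SELECT s.id, s.idx, s.val, d.val
    FROM payments_snapshot AS s
    JOIN data AS d ON d.id = s.id AND d.idx = s.idx
    ORDER BY s.id, s.idx
""").fetchall()

print(snapshot)
print(len(joined))
```

If the joined row count differs from the snapshot row count, the join key is not unique in one of the tables, which is a common source of results that look non-deterministic.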

SQL Function to return sequential id's

Consider this simple INSERT
INSERT INTO Assignment (CustomerId,UserId)
SELECT CustomerId,123 FROM Customers
That will obviously assign UserId=123 to all customers.
What I need to do is assign them to 3 userId's sequentially, so 3 users get one third of the accounts equally.
INSERT INTO Assignment (CustomerId,UserId)
SELECT CustomerId,fnGetNextId() FROM Customers
Could I create a function to return sequentially from a list of 3 ID's?, i.e. each time the function is called it returns the next one in the list?
Thanks
Could I create a function to return sequentially from a list of 3 ID's?,
If you create a SEQUENCE, then you can assign incremental numbers with the NEXT VALUE FOR (Transact-SQL) expression.
This is a strange requirement, but the modulus operator (%) should help you out without the need for functions, sequences, or altering your database structure. This assumes that the IDs are integers. If they're not, you can use ROW_NUMBER or a number of other tactics to get a distinct number value for each customer.
Obviously, you would replace the SELECT statement with an INSERT once you're satisfied with the code, but it's good practice to always select when developing before inserting.
SETUP WITH SAMPLE DATA:
DECLARE @Users TABLE (ID int, [Name] varchar(50))
DECLARE @Customers TABLE (ID int, [Name] varchar(50))
DECLARE @Assignment TABLE (CustomerID int, UserID int)
INSERT INTO @Customers
VALUES
(1, 'Joe'),
(2, 'Jane'),
(3, 'Jon'),
(4, 'Jake'),
(5, 'Jerry'),
(6, 'Jesus')
INSERT INTO @Users
VALUES
(1, 'Ted'),
(2, 'Ned'),
(3, 'Fred')
QUERY:
SELECT C.Name AS [CustomerName], U.Name AS [UserName]
FROM @Customers C
JOIN @Users U
ON
CASE WHEN C.ID % 3 = 0 THEN 1
WHEN C.ID % 3 = 1 THEN 2
WHEN C.ID % 3 = 2 THEN 3
END = U.ID
You would change THEN 1 to whatever your first UserID is, THEN 2 to the second UserID, and THEN 3 to the third UserID. If you end up with another user and want to split the customers 4 ways, you would replace the CASE expression with the following:
CASE WHEN C.ID % 4 = 0 THEN 1
WHEN C.ID % 4 = 1 THEN 2
WHEN C.ID % 4 = 2 THEN 3
WHEN C.ID % 4 = 3 THEN 4
END = U.ID
OUTPUT:
CustomerName UserName
-------------------------------------------------- --------------------------------------------------
Joe Ned
Jane Fred
Jon Ted
Jake Ned
Jerry Fred
Jesus Ted
(6 row(s) affected)
Lastly, you will want to select the IDs for your actual insert, but I selected the names so the results are easier to understand. Please let me know if this needs clarification.
Here's one way to produce Assignment as an automatically rebalancing view:
CREATE VIEW dbo.Assignment WITH SCHEMABINDING AS
WITH SeqUsers AS (
SELECT UserID, ROW_NUMBER() OVER (ORDER BY UserID) - 1 AS _ord
FROM dbo.Users
), SeqCustomers AS (
SELECT CustomerID, ROW_NUMBER() OVER (ORDER BY CustomerID) - 1 AS _ord
FROM dbo.Customers
)
-- INSERT Assignment(CustomerID, UserID)
SELECT SeqCustomers.CustomerID, SeqUsers.UserID
FROM SeqUsers
JOIN SeqCustomers ON SeqUsers._ord = SeqCustomers._ord % (SELECT COUNT(*) FROM SeqUsers)
;
This shifts assignments around if you insert a new user, which could be quite undesirable, and it's also not efficient if you had to JOIN on it. You can easily repurpose the query it contains for one-time inserts (the commented-out INSERT). The key technique there is joining on ROW_NUMBER()s.
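The technique of joining on row numbers (customer position modulo the number of users) can be mirrored in a few lines of application code. This is only a sketch of the idea; the function name `assign_round_robin` and the sample IDs are made up.

```python
def assign_round_robin(customer_ids, user_ids):
    """Pair each customer with a user by position modulo the number of users."""
    users = sorted(user_ids)
    if not users:
        raise ValueError("at least one user is required")
    return [
        (customer_id, users[position % len(users)])
        for position, customer_id in enumerate(sorted(customer_ids))
    ]


assignments = assign_round_robin([101, 102, 103, 104, 105, 106, 107], [7, 20, 33])
for customer_id, user_id in assignments:
    print(customer_id, user_id)
```

Because positions are used instead of the ID values themselves, the split stays even when customer IDs have gaps or user IDs are not 1, 2, 3.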

How do I get the "Next available number" from an SQL Server? (Not an Identity column)

Technologies: SQL Server 2008
So I've tried a few options that I've found on SO, but nothing really provided me with a definitive answer.
I have a table with two columns, (Transaction ID, GroupID) where neither has unique values. For example:
TransID | GroupID
-----------------
23 | 4001
99 | 4001
63 | 4001
123 | 4001
77 | 2113
2645 | 2113
123 | 2113
99 | 2113
Originally, the groupID was just chosen at random by the user, but now we're automating it. Thing is, we're keeping the existing DB without any changes to the existing data(too much work, for too little gain)
Is there a way to query "GroupID" on table "GroupTransactions" for the next available value of GroupID > 2000?
I think from the question you're after the next available value, which may not be the same as max+1, right? In that case:
Start with a list of integers and look for those that aren't in the GroupID column, for example:
;WITH CTE_Numbers AS (
SELECT n = 2001
UNION ALL
SELECT n + 1 FROM CTE_Numbers WHERE n < 4000
)
SELECT top 1 n
FROM CTE_Numbers num
WHERE NOT EXISTS (SELECT 1 FROM MyTable tab WHERE num.n = tab.groupid)
ORDER BY n
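Note: you need to tweak the 2001/4000 values in the CTE to cover the range you want. I assumed the name of your table to be MyTable. SQL Server stops a recursive CTE after 100 levels by default, so a wider range may also need OPTION (MAXRECURSION n) on the final SELECT.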
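The same "first integer that is not already used" search can be sketched in application code, assuming the existing GroupID values have been read into a collection. The function name `next_available` and the default floor of 2000 are illustrative choices based on the question.

```python
def next_available(used_ids, floor=2000):
    """Return the smallest integer greater than `floor` not present in `used_ids`."""
    used = set(used_ids)
    candidate = floor + 1
    while candidate in used:
        candidate += 1
    return candidate


# GroupID values from the sample table in the question
print(next_available([4001, 4001, 2113, 2113]))
```

Like the query, this only finds a candidate; in a multi-user system the check and the insert still need to happen inside one transaction to avoid two sessions picking the same value.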
select max(groupid) + 1 from GroupTransactions
The following will find the next gap above 2000:
SELECT MIN(t.GroupID)+1 AS NextID
FROM GroupTransactions t (updlock)
WHERE NOT EXISTS
(SELECT NULL FROM GroupTransactions n WHERE n.GroupID=t.GroupID+1 AND n.GroupID>2000)
AND t.GroupID>2000
There are always many ways to do everything. I solved this problem like this:
declare @i int = null
declare @t table (i int)
insert into @t values (1)
insert into @t values (2)
--insert into @t values (3)
--insert into @t values (4)
insert into @t values (5)
--insert into @t values (6)
--get the first missing number
select @i = min(RowNumber)
from (
select ROW_NUMBER() OVER(ORDER BY i) AS RowNumber, i
from (
--select distinct in case a number is in there multiple times
select distinct i
from @t
--start after 0 in case there are negative or 0 numbers
where i > 0
) as a
) as b
where RowNumber <> i
--if there are no missing numbers or no records, get the max record
if @i is null
begin
select @i = isnull(max(i),0) + 1 from @t
end
select @i
In my situation I have a system that generates message numbers, or file/case/reservation numbers, sequentially from 1 every year. But in some situations a number does not get used (the user was testing or practicing, or for whatever reason) and the number is deleted.
You can use a WHERE clause to filter by year if all entries are in the same table, and make it dynamic (my example is hardcoded). If you archive your yearly data, this is not needed. The subquery parts for mID and mID2 must be identical.
The "union select 0 as seq" for mID is there in case your table is empty; this is the base seed number. It can be anything, e.g. 3000000 or {prefix}0000. The field is an integer. If you omit the "union select 0 as seq", the query will not work on an empty table, and when the table is missing ID 1 it will give you the ID after the first existing one (if the first number is 4, the value returned will be 5).
This query is very quick (hint: the field must be indexed); it was tested on a table of 100,000+ rows. I found that using a domain aggregate gets slower as the table grows.
If you remove the "top 1" you will get a list of 'next numbers', but not all the missing numbers in a sequence; i.e. if you have 1 2 4 7 the result will be 3 5 8.
set @newID = (select top 1 mID.seq + 1 as seq from
(select a.[msg_number] as seq from [tblMSG] a --where a.[msg_date] between '2023-01-01' and '2023-12-31'
union select 0 as seq ) as mID
left outer join
(Select b.[msg_number] as seq from [tblMSG] b --where b.[msg_date] between '2023-01-01' and '2023-12-31'
) as mID2 on mID.seq + 1 = mID2.seq where mID2.seq is null order by mID.seq)
-- Next: a statement to insert a row with @newID immediately in tblMSG (in a transaction block).
-- Then the row can be updated by your app.
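The left-join logic above (keep every number whose successor is missing, with a seed row so an empty table still works) can be checked with a short sketch. The function name `next_numbers` and the `seed` parameter are illustrative; passing `seed=None` stands in for omitting the "union select 0 as seq" row.

```python
def next_numbers(used, seed=0):
    """Return n + 1 for every n (including the seed) whose successor is unused."""
    present = set(used)
    if seed is not None:
        present.add(seed)  # plays the role of "union select 0 as seq"
    return sorted(n + 1 for n in present if n + 1 not in present)


print(next_numbers([1, 2, 4, 7]))      # candidates when "top 1" is removed
print(next_numbers([]))                # empty table, seed row present
print(next_numbers([4], seed=None))    # no seed row, first number is 4
```

The first element of the returned list corresponds to what the query yields with "top 1" in place.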

Assign Unique ID within groups of records

I have a situation where I need to add an arbitrary unique id to each of a group of records. It's easier to visualize this below.
Edited 11:26 est:
Currently the lineNum field contains garbage. This is running on SQL Server 2000. The sample that follows is what the results should look like, but the actual values aren't important; the numbers could be anything as long as the two fields combined can be used as a unique key.
OrderID lineNum
AAA 1
AAA 2
AAA 3
BBB 1
CCC 1
CCC 2
The value of lineNum is not important, but the field is only 4 characters. This needs to be done in a SQL Server stored procedure. I have no problem doing it programmatically.
Assuming you're using SQL Server 2005 or later, you can use ROW_NUMBER():
select orderId,
row_number() over(PARTITION BY orderId ORDER BY orderId) as lineNum
from [Order]
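What ROW_NUMBER() with PARTITION BY orderId produces is a counter that restarts for each order. As a plain illustration of that numbering, assuming the order IDs are available as a list, it looks like this; the function name `number_lines` is made up.

```python
from collections import defaultdict


def number_lines(order_ids):
    """Give each row a 1-based line number that restarts for every order ID."""
    counters = defaultdict(int)
    numbered = []
    for order_id in order_ids:
        counters[order_id] += 1
        numbered.append((order_id, counters[order_id]))
    return numbered


result = number_lines(["AAA", "AAA", "AAA", "BBB", "CCC", "CCC"])
for order_id, line_num in result:
    print(order_id, line_num)
```

Each (OrderID, lineNum) pair is unique, which is the only property the question requires of the values.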
While adding a record to the table, you could compute the lineNum field dynamically.
In Transact-SQL, something like this:
DECLARE @lineNum AS INT
-- Get next linenum; COALESCE outside MAX so an order with no rows yet starts at 0
SELECT @lineNum = COALESCE(MAX(linenum), 0) FROM Orders WHERE OrderID = @OrderID
SET @lineNum = @lineNum + 1
INSERT INTO ORDERS (OrderID, linenum, .....)
VALUES (@OrderID, @lineNum, ....)
You could create a cursor that reads all values in sorted order, resets the counter to 1 at each change in value, and otherwise increments it at each step.
E.g.:
AAA reset 1
AAA set 1 + 1 = 2
AAA set 2 + 1 = 3
BBB reset 1
CCC reset 1
CCC set 1 + 1 = 2
Hmmmmm, could you create a view that returns the line number information in order and group it based on your order ID? Making sure the line number is always returned in the same order.
Either that or you could use a trigger and on the insert calculate the max id for the order?
Or perhaps you could use a select from max statement on the insert?
Perhaps none of these are satisfactory?
If you're not using SQL 2005 this is a slightly more set based way to do this (I don't like temp tables much but I like cursors less):
declare @out table (id int identity(1,1), orderid char(4))
insert @out select orderid from THESOURCETABLE order by orderid
select
o.orderid, o.id - omin.minid + 1 as linenum
from @out o
inner join
(select orderid, min(id) minid from @out group by orderid) as omin on
o.orderid = omin.orderid
