Assign Unique ID within groups of records - sql-server

I have a situation where I need to add an arbitrary unique id to each of a group of records. It's easier to visualize this below.
Edited 11:26 est:
Currently the lineNum field has garbage. This is running on sql server 2000. The sample that follows is what the results should look like but the actual values aren't important, the numbers could anything as long as the two combined fields can be used for a unique key.
OrderID lineNum
AAA 1
AAA 2
AAA 3
BBB 1
CCC 1
CCC 2
The value of line num is not important, but the field is only 4 characters. This needs to by done in a sql server stored procedure. I have no problem doing it programatically.

Assuming your using SQL Server 2005 or better you can use Row_Number()
select orderId,
row_number() over(PARTITION BY orderId ORDER BY orderId) as lineNum
from Order

While adding a record to the table, you could create the "linenum" field dynamically:
In Transact-SQL, something like this:
Declare #lineNum AS INT
-- Get next linenum
SELECT #lineNum = MAX(COALESCE(linenum, 0)) FROM Orders WHERE OrderID = #OrderID
SET #lineNum = #lineNum + 1
INSERT INTO ORDERS (OrderID, linenum, .....)
VALUES (#OrderID, #lineNum, ....)

You could create a cursor that reads all values sorted, then at each change in value resets the 1 then steps through incrementing each time.
E.g.:
AAA reset 1
AAA set 1 + 1 = 2
AAA set 2 + 1 = 3
BBB reset 1
CCC reset 1
CCC set 1 + 1 = 1

Hmmmmm, could you create a view that returns the line number information in order and group it based on your order ID? Making sure the line number is always returned in the same order.
Either that or you could use a trigger and on the insert calculate the max id for the order?
Or perhaps you could use a select from max statement on the insert?
Perhaps none of these are satisfactory?

If you're not using SQL 2005 this is a slightly more set based way to do this (I don't like temp tables much but I like cursors less):
declare #out table (id tinyint identity(1,1), orderid char(4))
insert #out select orderid from THESOURCETABLE
select
o.orderid, o.id - omin.minid + 1 as linenum
from #out o
inner join
(select orderid, min(id) minid from #out group by orderid) as omin on
o.orderid = omin.orderid

Related

T-SQL : Cleaning up data, merging rows into columns

I'm trying to clean up some Active Directory data in SQL Server. I have managed to read the raw LFD file into a table. Now I need to clean up some attributes where values are spread out over multiple rows. I can identify records that need to be appended to the prior record by the fact the have a leading space.
Example:
ID Record IsPartOfPrior
3114 memberOf: 0
3115 CN=Sharepoint-Members-home 1
3116 memberOf: 0
3117 This is 1
3118 part of the 1
3119 next line. 1
Ultimately, I would like to have the following table generated:
ID Record
3114 memberOf:CN=Sharepoint-Members-home
3116 memberOf:This is part of the next line
I could write it through a cursor, setting variables, working with temp tables and populating a table. But there has to be a set based (maybe recursive?) approach to this?
I could use the STUFF method to combine various rows together, but how am I about to group the various sets together? I'm thinking that I first have to define groupID's per record, and then stuff them together per groupID?
Thanks for any help.
Batch with comments below. Should work starting with SQL Server 2008.
--Here I emulate your table
DECLARE #yourtable TABLE (ID int, Record nvarchar(max), IsPartOfPrior bit)
INSERT INTO #yourtable VALUES
(3114,'memberOf:',0),(3115,'CN=Sharepoint-Members-home',1),(3116,'memberOf:',0),(3117,'This is',1),(3118,'part of the',1),(3119,'next line.',1)
--Getting max ID
DECLARE #max_id int
SELECT #max_id = MAX(ID)+1
FROM #yourtable
--We get next prior for each prior record
--And use STUFF and FOR XML PATH to build new Record
SELECT y.ID,
y.Record + b.Record as Record
FROM #yourtable y
OUTER APPLY (
SELECT TOP 1 ID as NextPrior
FROM #yourtable
WHERE IsPartOfPrior = 0 and y.ID < ID
ORDER BY ID ASC
) as t
OUTER APPLY (
SELECT STUFF((
SELECT ' '+Record
FROM #yourtable
WHERE ID > y.ID and ID < ISNULL(t.NextPrior,#max_id)
ORDER BY id ASC
FOR XML PATH('')
),1,1,'') as Record
) as b
WHERE y.IsPartOfPrior = 0
The output:
ID Record
----------- -----------------------------------------
3114 memberOf:CN=Sharepoint-Members-home
3116 memberOf:This is part of the next line.
This will work if ID are numeric and ascending.
Yet another option if 2012+
Example
Declare #YourTable Table ([ID] int,[Record] varchar(50),[IsPartOfPrior] int)
Insert Into #YourTable Values
(3114,'memberOf:',0)
,(3115,'CN=Sharepoint-Members-home',1)
,(3116,'memberOf:',0)
,(3117,'This is',1)
,(3118,'part of the',1)
,(3119,'next line.',1)
;with cte as (
Select *,Grp = sum(IIF([IsPartOfPrior]=0,1,0)) over (Order By ID)
From #YourTable
)
Select ID
,Record = Stuff((Select ' ' +Record From cte Where Grp=A.Grp Order by ID For XML Path ('')),1,1,'')
From (Select Grp,ID=min(ID)from cte Group By Grp ) A
Returns
ID Record
3114 memberOf: CN=Sharepoint-Members-home
3116 memberOf: This is part of the next line.
If it Helps with the Visualization, the cte Produces:
ID Record IsPartOfPrior Grp << Notice Grp Values
3114 memberOf: 0 1
3115 CN=Sharepoint-Members-home 1 1
3116 memberOf: 0 2
3117 This is 1 2
3118 part of the 1 2
3119 next line. 1 2

SQL Server Join issue / alternate ways

I have a query which I am trying to find out chain relationship between customer id. Currently 80k records are taking approx. 7 minutes. Could you please suggest some alternate improved ways?
Sample format is shown below. Here we are grouping based on records having relationships among them (a = b = c)
Create table #chaintable
(
CustID int,
MatchCustID int,
FN varchar(10),
LN varchar(10),
PhoneNo int,
Email varchar(50),
dtAppointment int
)
insert into #chaintable
Select 1,2,'Global','Chain',123,'',1
union all
Select 2,3,'Global','Chain',123,'a#a.com',2
union all
Select 3,2,'Global','Chain',567,'a#a.com',3
union all
Select 4,5,'Global1','Chain1',123,'a#a.com',1
union all
Select 5,4,'Global1','Chain1',123,'a#a.com',2
Select distinct
A.CustID, A.MatchCustID, A.GroupID
from
(select
c1.CustID, c1.MatchCustID, C1.dtAppointment,
case
when C1.CustID = C2.MatchCustID and C1.MatchCustID <> C2.CustID
then C1.CustID
when C1.CustID <> C2.MatchCustID and C1.MatchCustID = C2.CustID
then c1.MatchCustID
when C1.CustID = C2.MatchCustID and C1.MatchCustID = C2.CustID
then
case
when c1.CustID < C1.MatchCustID
then c1.CustID
else c1.MatchCustID
end
end GroupID
from
#chaintable C1, #chaintable C2
where
c1.CustID = c2.MatchCustID
or c1.MatchCustID = c2.CustID) A
Output:
CustID MatchCustID FN LN PhoneNo Email dtAppointment
---------------------------------------------------------
1 2 Global Chain 123 1
2 3 Global Chain 123 a#a.com 2
3 2 Global Chain 567 a#a.com 3
4 5 Global1 Chain1 123 a#a.com 1
5 4 Global1 Chain1 123 a#a.com 2
First, it is impossible to help improve performance of a query without knowing the execution plan.
There are certain problem here that I do not understand. Example, why do you have a join to the table itself and all of the outputs are values of the first table. Is the join really necessary? Or is it just for the sake of testing?
I suggest the below "logically equivalent" way of rewriting the query without using the OR in the JOIN and less work to understand the query to human (and if the computer feel the same, then it might improve).
SELECT DISTINCT c1.CustID, c1.MatchCustID,
CASE WHEN (C1.MatchCustID <> c2.CustID)
OR (c1.CustID < c1.MatchCustID) THEN c1.CustID
ELSE c1.MatchCustID END AS GroupID
FROM #chaintable C1 JOIN #chaintable C2
ON c1.CustID = c2.MatchCustID
UNION
SELECT c1.CustID, c1.MatchCustID,
c1.MatchCustID AS GroupID
FROM #chaintable C1 JOIN #chaintable C2
ON c2.CustID = c1.MatchCustID AND C1.CustID<>C2.MatchCustID
First, try to stick to the standard when adding rows to a table. While UNION ALL may be efficient enough to handle your simple rows, it seems rather verbose for a large set of inserts like you mentioned. However you do this, make sure you treat them as sets of relational data and avoid unnecessary steps.
Furthermore, CARTESIAN JOIN is old SQL syntax, today's OUTER and INNER JOINs more proficient, and as such this old style of join is only really useful in niche cases. This not being one of them.
Looking at your table and the results, the following about your table structure are observed:
CustID is independent of this table.
MatchCustID has no barring on the order
Appointment is the durable Key.
FN and LN together form an ID that dictates the GROUPID
So then, a solution may be as follows:
Create table #chaintable
(
CustID int, MatchCustID int, FN varchar(10), LN varchar(10)
, PhoneNo, Email varchar(50), dtAppointment int
)
INSERT INTO #chaintable
VALUES (1,2,'Global','Chain',123,'',1)
, (2,3,'Global','Chain',123,'a#a.com',2)
, (3,2,'Global','Chain',567,'a#a.com',3)
, (4,5,'Global1','Chain1',123,'a#a.com',1)
, (5,4,'Global1','Chain1',123,'a#a.com',2)
SELECT CustID
,MatchCustID
,dtAppointment
, FN
, LN
, DENSE_RANK() OVER (ORDER BY FN + LN DESC ) AS GroupID
FROM #chaintable
Resuls:
CustID MatchCustID dtAppointment FN LN GroupID
1 2 1 Global Chain 1
2 3 2 Global Chain 1
3 2 3 Global Chain 1
4 5 1 Global1 Chain1 2
5 4 2 Global1 Chain1 2
The only catch here is how the unique ID was used. In this example, since I have no value that uniquely identifies the event chains, I used FN + LN to bring back an order.
This has several advantages:
You avoid costly Cartesian JOINs by passing through your rows once.
GROUPID Will always be the same for each group in your final table.
Performing prechecks will not be difficult:
DECLARE #GROUPID = (SELECT MAX(GROUPID) FROM <SourceTable> )
However, this also has drawbacks:
Adding to this table may require you break up the insert statements to check for existing GROUPIDs (which sadly your data does not already have). Which in your case, the columns that determines GROUPID is FN + LN.
I would suggest a temp table that grabs the unique FN + LN values and then runs an outer apply operation such as
Example
SELECT FN + LN FROM #chaintable A
WHERE NOT EXISTS (SELECT 1
FROM #chaintable
WHERE A.FN = FN
AND A.LN = LN)
Before running an insert statement that adds the pre-value we checked earlier in the insert statement:
DECLARE #GROUPID = (SELECT ISNULL(MAX(GROUPID), 0) FROM <SourceTable> )
INSERT INTO FINAL_TABLE (CustID, MatchCustID, FN, LN, PhoneNo, Email, dtAppointment)
SELECT CustID
, MatchCustID
, FN
, LN
, PhoneNo
, Email
, dtAppointment
, #GROUPID + DENSE_RANK() OVER (ORDER BY FN + LN DESC ) AS GroupID
FROM #chaintable_sub
This will always result in the next GROUPID being of greater number than the previous entries.
Finally, I would advise you treat this data as it really is: Dirty Data. You have to perform ETL transformations on it, particularly since you have a durable key with a composite ID key...so essentially a FACT table.

How to check for a specific condition by looping through every record in SQL Server?

I do have following table
ID Name
1 Jagan Mohan Reddy868
2 Jagan Mohan Reddy869
3 Jagan Mohan Reddy
Name column size is VARCHAR(55).
Now for some other task we need to take only 10 varchar length i.e. VARCHAR(10).
My requirement is to check that after taking the only 10 bits length of Name column value for eg if i take Name value of ID 1 i.e. Jagan Mohan Reddy868 by SUBSTRING(Name, 0,11) if it equals with another row value. here in this case the final value of SUBSTRING(Jagan Mohan Reddy868, 0,11) is equal to Name value of ID 3 row whose Name is 'Jagan Mohan Reddy'. I need to make a list of those kind rows. Can somebody help me out on how can i achieve in SQL Server.
My main check is that the truncated values of my Name column should not match with any non truncated values of Name column. If so i need to get those records.
Assuming I understand the question, I think you are looking for something like this:
Create and populate sample data (Please save us this step in your future questions)
DECLARE #T as TABLE
(
Id int identity(1,1),
Name varchar(15)
)
INSERT INTO #T VALUES
('Hi, I am Zohar.'),
('Hi, I am Peled.'),
('Hi, I am Z'),
('I''m Zohar peled')
Use a cte with a self inner join to get the list of ids that match the first 10 chars:
;WITH cte as
(
SELECT T2.Id As Id1, T1.Id As Id2
FROM #T T1
INNER JOIN #T T2 ON LEFT(T1.Name, 10) = t2.Name AND T1.Id <> T2.Id
)
Select the records from the original table, inner joined with a union of the Id1 and Id2 from the cte:
SELECT T.Id, Name
FROM #T T
INNER JOIN
(
SELECT Id1 As Id
FROM CTE
UNION
SELECT Id2
FROM CTE
) U ON T.Id = U.Id
Results:
Id Name
----------- ---------------
1 Hi, I am Zohar.
3 Hi, I am Z
Try this
SELECT Id,Name
FROM(
SELECT *,ROW_NUMBER() OVER(PARTITION BY Name, LEFT(Name,11) ORDER BY ID) RN
FROM Tbale1 T
) Tmp
WHERE Tmp.RN = 1
loop over your column for all the values and put your substring() function inside this loop and I think in Sql index of string starts from 1 instead of 0. If you pass your string to charindex() like this
CHARINDEX('Y', 'Your String')
thus you will come to know whether it is starting from 0 or 1
and you can save your substring value as value of other column with length 10
I hope it will help you..
I think this should cover all the cases you are looking for.
-- Create Table
DECLARE #T as TABLE
(
Id int identity(1,1),
Name varchar(55)
)
-- Create Data
INSERT INTO #T VALUES
('Jagan Mohan Reddy868'),
('Jagan Mohan Reddy869'),
('Jagan Mohan Reddy'),
('Mohan Reddy'),
('Mohan Reddy123551'),
('Mohan R')
-- Get Matching Items
select *, SUBSTRING(name, 0, 11) as ShorterName
from #T
where SUBSTRING(name, 0, 11) in
(
-- get all shortnames with a count > 1
select SUBSTRING(name, 0, 11) as ShortName
from #T
group by SUBSTRING(name, 0, 11)
having COUNT(*) > 1
)
order by Name, LEN(Name)

How do I exclude rows when an incremental value starts over?

I am a newbie poster but have spent a lot of time researching answers here. I can't quite figure out how to create a SQL result set using SQL Server 2008 R2 that should probably be using lead/lag from more modern versions. I am trying to aggregate data based on sequencing of one column, but there can be varying numbers of instances in each sequence. The only way I know a sequence has ended is when the next row has a lower sequence number. So it may go 1-2, 1-2-3-4, 1-2-3, and I have to figure out how to make 3 aggregates out of that.
Source data is joined tables that look like this (please help me format):
recordID instanceDate moduleID iResult interactionNum
1356 10/6/15 16:14 1 68 1
1357 10/7/15 16:22 1 100 2
1434 10/9/15 16:58 1 52 1
1435 10/11/15 17:00 1 60 2
1436 10/15/15 16:57 1 100 3
1437 10/15/15 16:59 1 100 4
I need to find a way to separate the first 2 rows from the last 4 rows in this example, based on values in the last column.
What I would love to ultimately get is a result set that looks like this, which averages the iResult column based on the grouping and takes the first instanceDate from the grouping:
instanceDate moduleID iResult
10/6/15 1 84
10/9/15 1 78
I can aggregate to get this result using MIN and AVG if I can just find a way to separate the groups. The data is ordered by instanceDate (please ignore the date formatting here) then interactionNum and the group separation should happen when the query finds a row where the interactionNum is <= than the previous row (will usually start over with '1' but not always, so prefer just to separate on a lower or equal integer value).
Here is the query I have so far (includes the joins that give the above data set):
SELECT
X.*
FROM
(SELECT TOP 100 PERCENT
instanceDate, b.ModuleID, iResult, b.interactionNum
FROM
(firstTable a
INNER JOIN
secondTable b ON b.someID = a.someID)
WHERE
a.someID = 2
AND b.otherID LIKE 'xyz'
AND a.ModuleID = 1
ORDER BY
instanceDate) AS X
OUTER APPLY
(SELECT TOP 1
*
FROM
(SELECT
instanceDate, d.ModuleID, iResult, d.interactionNum
FROM
(firstTable c
INNER JOIN
secondTable d ON d.someID = c.someID)
WHERE
c.someID = 2
AND d.otherID LIKE 'xyz'
AND c.ModuleID = 1
AND d.interactionNum = X.interactionNum
AND c.instanceDate < X.instanceDate) X2
ORDER BY
instanceDate DESC) Y
WHERE
NOT EXISTS (SELECT Y.interactionNum INTERSECT SELECT X.interactionNum)
But this is returning an interim result set like this:
instanceDate ModuleID iResult interactionNum
10/6/15 16:10 1 68 1
10/6/15 16:14 1 100 2
10/15/15 16:57 1 100 3
10/15/15 16:59 1 100 4
and the problem is that interactionNum 3, 4 do not belong in this result set. They would go in the next result set when I loop over this query. How do I keep them out of the result set in this iteration? I need the result set from this query to just include the first two rows, 'seeing' that row 3 of the source data has a lower value for interactionNum than row 2 has.
Not sure what ModuleID was supposed to be used, but I guess you're looking for something like this:
select min (instanceDate), [moduleID], avg([iResult])
from (
select *,row_number() over (partition by [moduleID] order by instanceDate) as RN
from Table1
) X
group by [moduleID], RN - [interactionNum]
The idea here is to create a running number with row_number for each moduleid, and then use the difference between that and InteractionNum as grouping criteria.
Example in SQL Fiddle
Here is my solution, although it should be said, I think #JamesZ answer is cleaner.
I created a new field called newinstance which is 1 wherever your instanceNumber is 1. I then created a rolling sum(newinstance) called rollinginstance to group on.
Change the last select to SELECT * FROM cte2 to show all the fields I added.
IF OBJECT_ID('tempdb..#tmpData') IS NOT NULL
DROP TABLE #tmpData
CREATE TABLE #tmpData (recordID INT, instanceDate DATETIME, moduleID INT, iResult INT, interactionNum INT)
INSERT INTO #tmpData
SELECT 1356,'10/6/15 16:14',1,68,1 UNION
SELECT 1357,'10/7/15 16:22',1,100,2 UNION
SELECT 1434,'10/9/15 16:58',1,52,1 UNION
SELECT 1435,'10/11/15 17:00',1,60,2 UNION
SELECT 1436,'10/15/15 16:57',1,100,3 UNION
SELECT 1437,'10/15/15 16:59',1,100,4
;WITH cte1 AS
(
SELECT *,
CASE WHEN interactionNum=1 THEN 1 ELSE 0 END AS newinstance,
ROW_NUMBER() OVER(ORDER BY recordID) as rowid
FROM #tmpData
), cte2 AS
(
SELECT *,
(select SUM(newinstance) from cte1 b where b.rowid<=a.rowid) as rollinginstance
FROM cte1 a
)
SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
FROM cte2
GROUP BY moduleID, rollinginstance

How do I get the "Next available number" from an SQL Server? (Not an Identity column)

Technologies: SQL Server 2008
So I've tried a few options that I've found on SO, but nothing really provided me with a definitive answer.
I have a table with two columns, (Transaction ID, GroupID) where neither has unique values. For example:
TransID | GroupID
-----------------
23 | 4001
99 | 4001
63 | 4001
123 | 4001
77 | 2113
2645 | 2113
123 | 2113
99 | 2113
Originally, the groupID was just chosen at random by the user, but now we're automating it. Thing is, we're keeping the existing DB without any changes to the existing data(too much work, for too little gain)
Is there a way to query "GroupID" on table "GroupTransactions" for the next available value of GroupID > 2000?
I think from the question you're after the next available, although that may not be the same as max+1 right? - In that case:
Start with a list of integers, and look for those that aren't there in the groupid column, for example:
;WITH CTE_Numbers AS (
SELECT n = 2001
UNION ALL
SELECT n + 1 FROM CTE_Numbers WHERE n < 4000
)
SELECT top 1 n
FROM CTE_Numbers num
WHERE NOT EXISTS (SELECT 1 FROM MyTable tab WHERE num.n = tab.groupid)
ORDER BY n
Note: you need to tweak the 2001/4000 values int the CTE to allow for the range you want. I assumed the name of your table to by MyTable
select max(groupid) + 1 from GroupTransactions
The following will find the next gap above 2000:
SELECT MIN(t.GroupID)+1 AS NextID
FROM GroupTransactions t (updlock)
WHERE NOT EXISTS
(SELECT NULL FROM GroupTransactions n WHERE n.GroupID=t.GroupID+1 AND n.GroupID>2000)
AND t.GroupID>2000
There are always many ways to do everything. I resolved this problem by doing like this:
declare #i int = null
declare #t table (i int)
insert into #t values (1)
insert into #t values (2)
--insert into #t values (3)
--insert into #t values (4)
insert into #t values (5)
--insert into #t values (6)
--get the first missing number
select #i = min(RowNumber)
from (
select ROW_NUMBER() OVER(ORDER BY i) AS RowNumber, i
from (
--select distinct in case a number is in there multiple times
select distinct i
from #t
--start after 0 in case there are negative or 0 number
where i > 0
) as a
) as b
where RowNumber <> i
--if there are no missing numbers or no records, get the max record
if #i is null
begin
select #i = isnull(max(i),0) + 1 from #t
end
select #i
In my situation I have a system to generate message numbers or a file/case/reservation number sequentially from 1 every year. But in some situations a number does not get use (user was testing/practicing or whatever reason) and the number was deleted.
You can use a where clause to filter by year if all entries are in the same table, and make it dynamic (my example is hardcoded). if you archive your yearly data then not needed. The sub-query part for mID and mID2 must be identical.
The "union 0 as seq " for mID is there in case your table is empty; this is the base seed number. It can be anything ex: 3000000 or {prefix}0000. The field is an integer. If you omit " Union 0 as seq " it will not work on an empty table or when you have a table missing ID 1 it will given you the next ID ( if the first number is 4 the value returned will be 5).
This query is very quick - hint: the field must be indexed; it was tested on a table of 100,000+ rows. I found that using a domain aggregate get slower as the table increases in size.
If you remove the "top 1" you will get a list of 'next numbers' but not all the missing numbers in a sequence; ie if you have 1 2 4 7 the result will be 3 5 8.
set #newID = select top 1 mID.seq + 1 as seq from
(select a.[msg_number] as seq from [tblMSG] a --where a.[msg_date] between '2023-01-01' and '2023-12-31'
union select 0 as seq ) as mID
left outer join
(Select b.[msg_number] as seq from [tblMSG] b --where b.[msg_date] between '2023-01-01' and '2023-12-31'
) as mID2 on mID.seq + 1 = mID2.seq where mID2.seq is null order by mID.seq
-- Next: a statement to insert a row with #newID immediately in tblMSG (in a transaction block).
-- Then the row can be updated by your app.

Resources